Visual-Based Navigation (VBN): The Complete Guide to UAV Navigation
Globally, significant research effort is currently focused on localization and navigation technologies for unmanned aerial vehicles (UAVs) in GNSS-denied conditions. Among these, Visual-Based Navigation (VBN) has emerged as a research hotspot thanks to its core advantages: strong anti-interference capability, low power consumption, cost-effectiveness, compact size, simple device structure, and high localization accuracy. In recent years especially, rapid advances in AI for computer vision have not only overcome long-standing bottlenecks in visual technology but also significantly enhanced image-understanding capability, further propelling the development of VBN. However, VBN still faces critical issues that require urgent resolution.
1. Core Principles of VBN
As early as 2014, the United States launched five research and development projects for non-GPS navigation technologies, including Micro-PNT and ANS. Northrop Grumman developed the Assured PNT system, which integrates multiple auxiliary navigation solutions for GPS-denied scenarios—such as celestial navigation, terrain matching, LiDAR, magnetometers, and odometers—providing diverse options for localization in complex environments.
In terms of technical principles, VBN operates by using UAV-mounted visual devices (including visible light, infrared, and SAR types) to capture ground or environmental images. These images are then matched with reference maps containing geographic location information via image-matching algorithms, ultimately enabling precise UAV localization without relying on GNSS signals.
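To make that matching step concrete, here is a minimal sketch in Python using OpenCV ORB features and a RANSAC homography. The image file names, the map tile's affine geotransform, and all numeric parameters are illustrative assumptions, not values from any fielded system.

```python
import cv2
import numpy as np

# Illustrative inputs: a live UAV frame and a georeferenced reference map tile.
# The affine geotransform (map pixels -> world metres) is an assumed example.
uav_frame = cv2.imread("uav_frame.png", cv2.IMREAD_GRAYSCALE)
ref_map = cv2.imread("reference_tile.png", cv2.IMREAD_GRAYSCALE)
PIXEL_TO_WORLD = np.array([[0.5, 0.0, 368000.0],     # 0.5 m/px, tile origin E
                           [0.0, -0.5, 4520000.0]])  # tile origin N

# Detect and describe keypoints in both images.
orb = cv2.ORB_create(nfeatures=2000)
kp_f, des_f = orb.detectAndCompute(uav_frame, None)
kp_m, des_m = orb.detectAndCompute(ref_map, None)

# Match descriptors and keep the strongest correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_f, des_m), key=lambda m: m.distance)[:200]

# Estimate the frame-to-map homography with RANSAC to reject outliers.
src = np.float32([kp_f[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_m[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Project the frame centre into map pixels, then into world coordinates.
h, w = uav_frame.shape
centre = cv2.perspectiveTransform(np.float32([[[w / 2, h / 2]]]), H)[0, 0]
east, north = PIXEL_TO_WORLD @ np.array([centre[0], centre[1], 1.0])
print(f"Estimated UAV ground position: E={east:.1f} m, N={north:.1f} m")
```

In practice the reference map would be tiled and pre-indexed, and the homography assumption only holds for near-planar ground viewed from altitude; the sketch is meant to show the matching-then-georeferencing pipeline, not a production implementation.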
2. Two Main Technical Types of VBN
VBN is primarily categorized into "map-based" and "mapless" types, each adapted to different application scenarios:
Map-Based Visual Navigation: Requires pre-stored navigation maps carrying high-precision geographic information (e.g., scene maps, topographic maps) and achieves absolute localization by matching the UAV's real-time imagery against them. Scene-matching navigation offers roughly an order of magnitude higher accuracy than terrain-matching navigation, so terrain matching is often used in the mid-course guidance phase while scene matching is employed in terminal guidance, where high-precision localization is required (see the scene-matching sketch after this list).
Mapless Visual Navigation: Centered on Visual SLAM (Simultaneous Localization and Mapping), it encompasses functions such as loop closure detection, visual relocalization, visual scene recognition, visual relative terrain navigation (georegistration), and image retrieval. In recent years, driven by rapid advances in SLAM, deep learning, and computer vision, mapless VBN has made significant progress and become a key R&D focus for universities, UAV makers, and autonomous driving companies worldwide (a minimal visual-odometry sketch follows this list).
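For the map-based branch, the simplest form of scene matching is template matching: slide the UAV's downward-looking view across the reference image and score every offset. The sketch below uses OpenCV's normalized cross-correlation; the file names and the 0.5 m/px map resolution are assumed placeholders.

```python
import cv2

# Illustrative inputs: file names and map resolution are assumptions.
ref_map = cv2.imread("scene_reference_map.png", cv2.IMREAD_GRAYSCALE)
live_patch = cv2.imread("uav_downward_view.png", cv2.IMREAD_GRAYSCALE)
METRES_PER_PIXEL = 0.5

# Slide the live patch over the reference map and score each offset with
# normalized cross-correlation, which tolerates global brightness changes.
scores = cv2.matchTemplate(ref_map, live_patch, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_xy = cv2.minMaxLoc(scores)

# The best offset plus half the patch size gives the patch centre in map pixels.
ph, pw = live_patch.shape
cx, cy = best_xy[0] + pw / 2, best_xy[1] + ph / 2
print(f"Match score {best_score:.2f}; map position "
      f"({cx * METRES_PER_PIXEL:.1f} m, {cy * METRES_PER_PIXEL:.1f} m)")
```

For the mapless branch, the relative-motion building block underneath Visual SLAM is two-frame visual odometry. A minimal monocular sketch follows; the camera intrinsic matrix and frame files are assumptions, and note that monocular odometry recovers translation only up to an unknown scale, which is one reason full SLAM adds mapping and loop closure on top.

```python
import cv2
import numpy as np

# Two consecutive UAV frames and an assumed pinhole intrinsic matrix K.
frame0 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
frame1 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])

# Track sparse corners from frame0 into frame1 with pyramidal optical flow.
p0 = cv2.goodFeaturesToTrack(frame0, maxCorners=500,
                             qualityLevel=0.01, minDistance=8)
p1, status, _ = cv2.calcOpticalFlowPyrLK(frame0, frame1, p0, None)
p0, p1 = p0[status.ravel() == 1], p1[status.ravel() == 1]

# Recover relative rotation R and unit-scale translation t via the
# essential matrix, with RANSAC rejecting bad tracks.
E, mask = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=mask)
print("Relative rotation:\n", R)
print("Translation direction (scale-free):", t.ravel())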

3. Commercial and Industrial Application Progress of VBN
In the commercial and industrial UAV sectors, VBN has achieved tangible progress in specific scenarios such as autonomous landing, obstacle avoidance, and follow-me flight:
The U.S.-based Skydio 2 UAV is equipped with NVIDIA Jetson TX2 embedded AI computing hardware that processes image data from six 4K cameras in real time, delivering fully autonomous obstacle avoidance and significantly enhancing flight safety.
DJI's Flight Autonomy system integrates six visual sensors, a main camera, two sets of infrared sensors, one set of ultrasonic sensors, a GPS/GLONASS dual-mode satellite positioning system, and dual-redundant IMUs and compasses. When GPS signals are lost, the system fuses the visual and other sensor data to maintain basic global localization and navigation.
While no commercial or industrial UAV or autonomous-driving product currently relies entirely on VBN, several breakthrough attempts have been made. For example, Tesla's FSD (Full Self-Driving) Version 10.1 forgoes high-precision maps and LiDAR, relying on pure vision plus AI to achieve autonomous driving in some complex scenarios; its road-test results demonstrate the enormous application potential of VBN.
4. Three Core Challenges Facing VBN
Despite its rapid development, VBN still encounters three major technical bottlenecks:
Small-Scene Limitation: In Visual SLAM applications, landmark descriptors demand substantial memory. Localization methods that store the complete scene model on UAV hardware are therefore typically limited to small exploration spaces (≤ 200 m × 200 m) and cannot meet the needs of large-scale scenarios (a back-of-envelope memory estimate follows this list).
Large-Scene Link Dependence: In large-scale scenarios (e.g., wide-area inspections), UAV-captured images must be transmitted back to ground servers. These servers then perform real-time map reconstruction, pose estimation, localization, and tracking before sending results back to the UAV. However, in GNSS-jammed environments, the reliability of data transmission links cannot be guaranteed, easily leading to localization interruptions.
Perceptual Confusion and Algorithm Adaptation Issues: As scene scale expands, environmental complexity increases dramatically, leading to "perceptual confusion": similar visual features appear in different regions and cause localization errors (e.g., a single image matching multiple locations on the map). Additionally, while mainstream loop closure detection algorithms such as SeqSLAM can handle changes in lighting, weather, and time of day, they struggle to adapt to UAV flight at varying altitudes and angles and to free aerial maneuvering, and require further optimization (a minimal sequence-matching sketch also follows this list).
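The small-scene limitation is easiest to see with arithmetic. The estimate below is purely illustrative: the landmark density and per-landmark byte count are assumed values, not measurements from any particular system.

```python
# Back-of-envelope landmark-map memory estimate. Landmark density and
# per-landmark byte count are illustrative assumptions, not measurements.
AREA_SMALL = 200 * 200            # m^2: the "small scene" bound above
AREA_LARGE = 10_000 * 10_000      # m^2: a wide-area inspection scenario
LANDMARKS_PER_M2 = 0.5            # assumed density of retained landmarks
BYTES_PER_LANDMARK = 512 + 32     # assumed: 128-D float32 descriptor + 3-D point

for name, area in (("200 m x 200 m", AREA_SMALL), ("10 km x 10 km", AREA_LARGE)):
    mb = area * LANDMARKS_PER_M2 * BYTES_PER_LANDMARK / 1e6
    print(f"{name}: ~{mb:,.0f} MB of landmark storage")
```

Under these assumptions the small scene needs on the order of 10 MB, while the wide-area scene needs tens of gigabytes, far beyond typical embedded memory; that gap is precisely what the breakthrough directions in Section 5 target.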
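To illustrate the sequence-matching idea behind SeqSLAM-style loop closure detection, the sketch below scores short constant-velocity trajectories through a frame-dissimilarity matrix instead of single frames, which is what suppresses single-image perceptual aliasing. The dissimilarity matrix is random stand-in data and the velocity hypotheses are assumed values.

```python
import numpy as np

# D[i, j] holds the dissimilarity between live frame i and map frame j
# (e.g., sum of absolute differences of patch-normalized thumbnails).
# Random data stands in for real imagery in this sketch.
rng = np.random.default_rng(0)
D = rng.random((50, 400))
SEQ_LEN = 10  # number of frames in each matched sequence

def best_map_match(D, i, seq_len=SEQ_LEN):
    """Score each candidate map frame j by summing D along a straight,
    constant-velocity trajectory ending at (i, j); matching whole sequences
    rather than single frames is what suppresses perceptual aliasing."""
    live = np.arange(i - seq_len + 1, i + 1)   # last seq_len live frames
    offsets = np.arange(seq_len - 1, -1, -1)   # steps back from frame i
    best_score, best_j = np.inf, -1
    for j in range(seq_len - 1, D.shape[1]):
        for v in (0.8, 1.0, 1.2):              # assumed speed-ratio hypotheses
            maps = np.clip(np.round(j - v * offsets).astype(int),
                           0, D.shape[1] - 1)
            score = D[live, maps].sum()
            if score < best_score:
                best_score, best_j = score, j
    return best_score, best_j

score, j = best_map_match(D, i=49)
print(f"Live frame 49 best matches map frame {j} (sequence score {score:.2f})")
```

The adaptation problem noted above shows up directly in this formulation: the straight-trajectory assumption fits a ground vehicle revisiting a route, but breaks down when a UAV changes altitude, viewing angle, or maneuvers freely.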
5. Breakthrough Directions for VBN’s Technical Bottlenecks
To address the above issues, technical innovations can be pursued in three key areas:
Develop Hierarchical Matching Technology: First, use semantic cues and traditional image retrieval to coarsely screen a set of map images similar to the UAV's real-time on-board imagery. Then, within this candidate set, combine aerial imaging conditions, map information, and geometric data to compute the UAV's absolute position and pose in real time. In parallel, apply deep-learning object detection and semantic segmentation to exclude invalid scene regions (e.g., clouds in aerial views), and construct stable image descriptors around landmark buildings to accelerate map matching and search (a retrieval-then-pose sketch follows this list).
Optimize Large-Scale Map Processing: To support long-endurance autonomous flight, resolve the bottlenecks in navigation-map compression and storage, strengthen the robustness of image features across seasons, lighting, and viewing angles, and improve the generalization and accuracy of image retrieval and matching algorithms so that large-scale maps can be searched quickly in real time (see the descriptor-compression sketch after this list).
Deploy AI Acceleration Chips: While deep learning has improved VBN performance, it involves massive computation and models with millions of parameters, demands that traditional on-board computing platforms (such as CPUs and FPGAs) cannot meet in real time. By adopting low-cost, low-power AI acceleration chips and using heterogeneous acceleration to cut deep-learning inference latency, VBN can support situational awareness and target recognition and tracking, and can leverage image semantic analysis to improve adaptability for autonomous flight in dynamic environments (see the quantized-inference sketch after this list).
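A minimal sketch of the hierarchical (coarse-to-fine) matching idea from the first direction: shortlist map tiles by global-descriptor similarity, then solve an absolute camera pose inside the shortlist with PnP and RANSAC. The embeddings and the 2D-3D correspondences below are random placeholders standing in for the outputs of a real retrieval network and feature matcher.

```python
import cv2
import numpy as np

# Stage 1 (coarse): shortlist map tiles whose global descriptors are most
# similar to the live frame's. The 256-D embeddings are placeholders for
# any retrieval network's output; random data stands in here.
rng = np.random.default_rng(1)
map_embeddings = rng.random((5000, 256))    # one embedding per map tile
live_embedding = rng.random(256)

sims = map_embeddings @ live_embedding / (
    np.linalg.norm(map_embeddings, axis=1) * np.linalg.norm(live_embedding))
candidates = np.argsort(sims)[-10:][::-1]   # top-10 tiles, best first

# Stage 2 (fine): within the shortlisted tiles, solve the absolute camera
# pose from 2D-3D correspondences (image keypoints vs. the tile's
# georeferenced 3-D points). These correspondences are placeholders too.
object_pts = (rng.random((30, 3)) * 100).astype(np.float32)   # tile points (m)
image_pts = (rng.random((30, 2)) * 1000).astype(np.float32)   # matched pixels
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None)
print(f"Candidate tiles: {candidates[:3]}...; pose solved: {ok}, "
      f"inliers: {0 if inliers is None else len(inliers)}")
```

The coarse stage keeps the expensive geometric solve off the full map, which is the efficiency argument behind hierarchical matching.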
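For the map-compression direction, one widely used option is product quantization of the map's feature descriptors, sketched below. The descriptor dimensions, code sizes, and random training data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Product quantization of map descriptors: split each 128-D float32
# descriptor (512 bytes) into 8 sub-vectors and code each with one byte,
# for roughly 64x compression. All sizes here are assumptions.
rng = np.random.default_rng(2)
descriptors = rng.random((20_000, 128)).astype(np.float32)
N_SUB, N_CENTROIDS = 8, 256
sub_dim = descriptors.shape[1] // N_SUB

codebooks, codes = [], []
for s in range(N_SUB):
    sub = descriptors[:, s * sub_dim:(s + 1) * sub_dim]
    km = KMeans(n_clusters=N_CENTROIDS, n_init=1, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_)      # shared lookup tables
    codes.append(km.labels_.astype(np.uint8))  # one byte per sub-vector
codes = np.stack(codes, axis=1)                # each descriptor -> 8 bytes

print(f"Raw: {descriptors.nbytes / 1e6:.1f} MB -> PQ codes: "
      f"{codes.nbytes / 1e6:.2f} MB (+ "
      f"{sum(c.nbytes for c in codebooks) / 1e6:.2f} MB codebooks)")
```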
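And for the acceleration direction, a small illustration of how model quantization cuts on-board compute: PyTorch dynamic int8 quantization applied to a stand-in perception head. The architecture is invented for illustration, not a real VBN model; actual edge deployments would typically go through a vendor toolchain (TensorRT on Jetson-class hardware, for example).

```python
import torch
import torch.nn as nn

# A stand-in perception head; the architecture is invented for illustration.
# Dynamic int8 quantization of the Linear layers is one of the simplest ways
# to cut model size and CPU inference latency on embedded hardware.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 10),
).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)   # weights stored as int8

x = torch.randn(1, 1, 64, 64)                # dummy single-channel frame
with torch.no_grad():
    print("fp32 output:", model(x)[0, :3])
    print("int8 output:", quantized(x)[0, :3])
```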
Conclusion
VBN is currently limited in application because image capture is affected by season, lighting, viewing angle, and sensor type, and because large-scale map construction and search remain unresolved. Going forward, it will be essential to draw on advances from the autonomous driving, commercial UAV, and robotics fields. By aligning with the sensor characteristics, practical requirements, and computing-resource constraints of long-endurance autonomous UAV flight, continued technical research and field testing can drive the large-scale application of VBN in GNSS-denied environments and solidify its role as a core supporting technology for UAV localization and navigation.