
Visual-Inertial Odometry (VIO) is an advanced form of Visual SLAM (VSLAM), which is itself a further development of 2D SLAM. Note that the other major branch of SLAM, laser SLAM based on point-cloud matching, is not discussed here.

Core Challenge: Balancing Accuracy and Cost

The fundamental challenge of VIO is balancing accuracy against computational cost. In recent years, work on reducing cost has come mainly from engineering optimization (GPU, DSP, and ASIC acceleration), while the pursuit of accuracy remains concentrated in academia, now reaching into cutting-edge fields such as AI and semantic understanding.

Inherent Disadvantages and Key Research Directions for VIO

Compared to laser SLAM, VIO is inherently weaker at mapping, and this remains a crucial area for in-depth research. VIO is also genuinely expensive: sufficient computation is needed to guarantee accuracy, and without it, scale errors can crash the system in moments.

Core Operations and Difficulties of VIO

VIO’s primary function is to provide precise 3D pose estimation for robots or XR devices while maintaining a complete trajectory through space. The hardest things to control are extreme scenarios (hence the introduction of ZUPT) and the uncontrolled scale drift that results from them. Initialization is also a problem, though a smaller one. As for bag-of-words loop closure, its effectiveness in practice is limited; correcting the final few meters of drift typically requires a small-scale AI model or other calibration methods.
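To make the bag-of-words idea concrete, here is a minimal sketch, not taken from any production system: a query keyframe is scored against past keyframes by cosine similarity of visual-word histograms. The vocabulary size, word IDs, and threshold are illustrative assumptions.

```python
import numpy as np

def bow_vector(word_ids, vocab_size):
    """Build a normalized bag-of-words histogram from visual word IDs."""
    v = np.bincount(word_ids, minlength=vocab_size).astype(float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def detect_loop(query_ids, database, vocab_size, threshold=0.8):
    """Return the index of the best-matching past keyframe, or None."""
    q = bow_vector(query_ids, vocab_size)
    best_idx, best_score = None, threshold
    for idx, past_ids in enumerate(database):
        score = float(q @ bow_vector(past_ids, vocab_size))  # cosine similarity
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx

# usage: each keyframe's descriptors quantized to word IDs 0..vocab_size-1
db = [np.array([1, 2, 3, 3, 7]), np.array([0, 4, 4, 9, 9])]
query = np.array([1, 2, 3, 7, 7])
match = detect_loop(query, db, vocab_size=10)  # matches the first keyframe
```

Real systems (e.g. DBoW-style vocabularies) add TF-IDF weighting and geometric verification on top of this scoring, which is exactly where the "limited effectiveness" shows up: a high histogram score does not guarantee a metrically correct loop.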

Solutions: Backend Optimization and ZUPT

Scientists have conducted extensive research to address these issues.
Backend Optimization: The “Science” and the “Magic”
One primary approach is deep investment in backend optimization. Thanks to its robustness, backend optimization can significantly improve accuracy and keep scale under control while running at a moderate frequency suitable for real-time operation.
The most representative achievements in this area are ICE-BA and DM-VIO. Although both share similar underlying principles, each has its own intricacies:
- ICE-BA constructs a long sliding window of 50 keyframes for Local Bundle Adjustment (LBA), combined with powerful relative-marginalization techniques, and pairs it with a low-frequency Global Bundle Adjustment (GBA).
- DM-VIO, on the other hand, constructs two Bundle Adjustments (BAs), enabling the system to revert to an earlier state, as far back as 100 keyframes, in order to incorporate IMU factors.
Both methods effectively manage the most troublesome issues of marginalization and relinearization, standing out as leaders in their respective fields. However, each also has its own set of challenges.
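The sliding-window-plus-marginalization machinery both systems rely on can be illustrated in miniature. The following toy sketch (scalar states, nothing like the actual ICE-BA or DM-VIO code) keeps a joint information matrix over a fixed-size window and eliminates the oldest state with a Schur complement, so that its information survives as a prior on the remaining states instead of being discarded:

```python
import numpy as np

class SlidingWindowBA:
    """Toy sliding-window estimator: states are scalars; the joint
    information matrix H and vector b summarize all factors. When the
    window exceeds max_size, the oldest state is marginalized via a
    Schur complement, leaving a prior on the remaining states."""

    def __init__(self, max_size=5):
        self.max_size = max_size
        self.H = np.zeros((0, 0))
        self.b = np.zeros(0)

    def add_state(self, prior_info=1e-6):
        n = self.H.shape[0]
        H = np.zeros((n + 1, n + 1)); H[:n, :n] = self.H
        H[n, n] = prior_info  # weak prior keeps H invertible (fixes gauge)
        self.H, self.b = H, np.append(self.b, 0.0)
        if n + 1 > self.max_size:
            self._marginalize_oldest()

    def add_relative_factor(self, i, j, measurement, info=1.0):
        """Factor encoding x_j - x_i = measurement, with scalar information."""
        J = np.zeros(self.H.shape[0]); J[i], J[j] = -1.0, 1.0
        self.H += info * np.outer(J, J)
        self.b += info * J * measurement

    def _marginalize_oldest(self):
        # Schur complement: eliminate state 0, folding its effect into a prior
        H, b = self.H, self.b
        Haa, Hab, Hbb = H[0, 0], H[0, 1:], H[1:, 1:]
        self.H = Hbb - np.outer(Hab, Hab) / Haa
        self.b = b[1:] - Hab * (b[0] / Haa)

    def solve(self):
        n = self.H.shape[0]
        return np.linalg.solve(self.H + 1e-9 * np.eye(n), self.b)
```

Note that after marginalization the prior is linearized at the old estimate; the "relinearization" headache mentioned above is precisely that this frozen linearization point can no longer move with the rest of the window.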
ZUPT: The Ultimate Engineering Solution
The other approach is the Zero Velocity Update (ZUPT). As an engineering solution taken to its extreme, it shares ideas with the backend-optimization methods above: its core is introducing a third sensor for correction, such as loose coupling with a wheel odometer.
It works as follows: under normal conditions, the system relies on VIO’s estimate. When an extreme situation breaks the scale, the entire system state is rolled back to an earlier point in time (similar to DM-VIO, going back 10 seconds or even 100 keyframes). The system writes a large amount of state to memory, storing prior information and linearization points (similar to ICE-BA). The missing data from the intervening period is then filled in by the wheel odometer, and BA optimization is restarted.
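A minimal sketch of this rollback-and-refill scheme, under assumed interfaces (a planar robot with forward/turn wheel increments); the class name and thresholds are hypothetical, not from any of the systems named above:

```python
import numpy as np
from collections import deque

class ZuptFallback:
    """Keep a short history of (time, pose, scale) checkpoints; when the
    scale estimate diverges, roll back to the oldest retained checkpoint
    and dead-reckon wheel-odometry increments over the gap."""

    def __init__(self, history_secs=10.0):
        self.history_secs = history_secs
        self.checkpoints = deque()  # (t, pose = [x, y, theta], scale)
        self.wheel_log = deque()    # (t, forward increment, heading increment)

    def record(self, t, pose, scale):
        self.checkpoints.append((t, np.array(pose, float), scale))
        while self.checkpoints and t - self.checkpoints[0][0] > self.history_secs:
            self.checkpoints.popleft()

    def record_wheel(self, t, dx, dtheta):
        self.wheel_log.append((t, dx, dtheta))

    def scale_broken(self, scale, lo=0.5, hi=2.0):
        """Crude health check: metric scale drifting far from 1 is a failure."""
        return not (lo < scale < hi)

    def recover(self, t_now):
        # roll back to the oldest retained checkpoint (the "10 seconds ago" state)
        t0, pose, _ = self.checkpoints[0]
        x, y, th = pose
        # fill the gap by dead-reckoning the logged wheel increments
        for t, dx, dth in self.wheel_log:
            if t0 < t <= t_now:
                x += dx * np.cos(th)
                y += dx * np.sin(th)
                th += dth
        return np.array([x, y, th])
```

In a full system the recovered pose would seed a fresh BA over the refilled window, with the stored priors and linearization points re-attached; this sketch stops at producing the dead-reckoned state.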



The development path of VIO is full of challenges. While its core concepts are relatively easy to grasp, practical implementation is exceptionally complex and labor-intensive. Nevertheless, these are all essential steps in the maturation of VIO technology.

RobotBaton-VIOBOT2 provides pure-vision spatial perception cameras designed specifically for robot vision to enhance a robot’s environmental awareness. The cameras deliver real-time spatial perception data, including depth maps, position, and posture, helping robots achieve more efficient spatial localization, object recognition, path planning, dynamic scene understanding, and obstacle avoidance. They are a core hardware component for boosting robot vision performance.
