This article covers the core technologies and engineering practices behind Visual-Inertial Odometry (VIO) systems. Implementations and engineering habits vary across teams, so we focus on the fundamentals of VIO and on our practical experience with VINS-MONO.
Overview of VIO System Backend Core Constraints
Backend optimization in a VIO system rests on three key constraints: prior constraints, visual constraints, and IMU pre-integration constraints. Our engineering practice is based mainly on VINS-MONO because its structure and logic are among the clearest of current VIO systems, which makes engineering issues easier to locate and handle.
VIO (Visual-Inertial Odometry) is widely regarded as a practical and cost-effective approach to pose estimation. However, current VIO systems still face a series of challenges, mainly:
- FEJ (First-Estimate Jacobian) Problem: During sliding window filter (SWF) optimization, the VIO system's information matrix H is divided into two parts, H1 and H2. The linearization point of one part changes, while that of the other stays fixed. If different residuals compute Jacobians for the same state at different linearization points, the null space of H can degenerate and unobservable variables can appear observable, so the linearization points should be kept consistent. VINS-MONO does not employ an FEJ strategy, yet this has not caused significant negative effects in practical applications.
- Yaw Angle and Three-Degrees-of-Freedom Unobservability: The system has four unobservable degrees of freedom (yaw and the three translational degrees of freedom), which in the ideal case helps the estimator reach the optimal solution. In practice, however, especially when anomalies occur and behavior departs from what is seen on datasets, the trajectory can spike suddenly and quickly become unusable because of scale uncertainty, to the point that even loop closure struggles to correct it.
- Complete Visual Loss Scenarios:
  - Suddenly entering a texture-less area and turning.
  - Instantaneous severe impact causing abrupt direction changes.
  - Using a rolling shutter camera at high speed while experiencing severe shaking and bumps.
  These situations can all lead to visual tracking loss (e.g., in LK optical flow, new tracked points (blue) become lost points (red)). Once visual constraints are lost, if hardware performance is weak, the system can quickly introduce incorrect prior constraints, which, in conjunction with IMU pre-integration constraints, lead to severe trajectory deviation.
- Severe Visual Disturbances During Camera Motion: When unidentifiable dynamic obstacles (e.g., people) appear in front of the camera and significantly disrupt feature points, VIO continuously updates erroneous keyframes, quickly filling the sliding window. In such cases, even if the IMU’s actual movement is minor, the weight matrix Σ of the visual constraints dominates, leading to degraded system performance.
- IMU Drift at Zero Velocity: This is one of VIO’s common problems. When the system rapidly enters a zero-velocity state (slow entry has less impact), it usually stops generating new keyframes (KF), discards visual measurements, and continues to propagate pre-integration. At this point, VIO is highly susceptible to scale drift due to accumulated pre-integration errors.
- Visual Disturbances at Zero Velocity: Even if the system successfully enters zero velocity and stabilizes its trajectory, a series of disturbances in front of the camera (similar to the fourth problem above, severe visual disturbances during camera motion, but in a zero-velocity state) can still make the system prone to scale drift.
- Large Pitch Angle Due to Prolonged Absence of Yaw Input: During prolonged static observation, VIO systems may experience significant pitch angle drift, which in turn causes scale drift. The same applies to the roll angle. We are still tracking down the root cause, but the issue does occur in practical applications.
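To make the FEJ item in the list above more concrete, one way to write the split of H is sketched below. The symbols $\bar{\mathbf{x}}$ (the estimate at marginalization time), $\hat{\mathbf{x}}$ (the current estimate), $\mathbf{J}_i$ and $\boldsymbol{\Sigma}_i$ are notation assumed here for illustration; they are not taken from the VINS-MONO code.

$$
\mathbf{H} \;=\; \underbrace{\mathbf{H}_p(\bar{\mathbf{x}})}_{\mathbf{H}_1:\ \text{fixed linearization point}} \;+\; \underbrace{\sum_i \mathbf{J}_i(\hat{\mathbf{x}})^{\top}\, \boldsymbol{\Sigma}_i^{-1}\, \mathbf{J}_i(\hat{\mathbf{x}})}_{\mathbf{H}_2:\ \text{re-linearized at every iteration}}
$$

Under an FEJ strategy, the Jacobians $\mathbf{J}_i$ of any state that also appears in the prior would be evaluated at $\bar{\mathbf{x}}$ instead of $\hat{\mathbf{x}}$, so that both parts share a single linearization point for that state.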
Engineering Strategies for Solving VIO Problems



The issues listed above show that getting a VIO system to work correctly, let alone ideally, is inherently difficult. Understanding its principles in depth takes a significant amount of time, and even with that mastery these problems are hard to avoid completely. So what is their root cause? The answer lies in the constraints used by the pose estimation component and method you have chosen.
Taking VIO as an example, the most crucial elements are prior constraints, visual constraints, and IMU constraints. Our objective function is essentially a nonlinear least-squares problem over these three types of constraints. The weight matrix Σ within each constraint, together with the strategy that controls it, is what largely determines system robustness and usability. A perfect strategy does not exist, but specific failure cases can be handled through targeted engineering practices.
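For reference, the three-constraint objective can be written roughly as follows. This is a sketch in notation we assume for illustration (it follows the general shape of tightly coupled sliding-window VIO formulations, not the exact symbols of any particular codebase): $\mathcal{X}$ is the sliding-window state, $\mathbf{r}_p, \mathbf{H}_p$ are the prior from marginalization, $\mathbf{r}_{I,k}$ are the IMU pre-integration residuals between consecutive frames, $\mathbf{r}_{C,lj}$ are the visual reprojection residuals of feature $l$ in frame $j$, and $\rho$ is a robust loss.

$$
\min_{\mathcal{X}} \left\{ \left\| \mathbf{r}_p - \mathbf{H}_p \mathcal{X} \right\|^2 \;+\; \sum_{k} \left\| \mathbf{r}_{I,k}(\mathcal{X}) \right\|^2_{\boldsymbol{\Sigma}_{I,k}} \;+\; \sum_{(l,j)} \rho\!\left( \left\| \mathbf{r}_{C,lj}(\mathcal{X}) \right\|^2_{\boldsymbol{\Sigma}_{C,lj}} \right) \right\},
\qquad
\left\| \mathbf{r} \right\|^2_{\boldsymbol{\Sigma}} = \mathbf{r}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{r}
$$

Because each residual is weighted by $\boldsymbol{\Sigma}^{-1}$, enlarging a covariance reduces the influence of that constraint, which is the lever the strategies in this section pull.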
1. Zero-Velocity Update (ZUPT) Strategy: For problems 6 and 7, a zero-velocity update is an effective engineering strategy. We add a check before backend optimization: decide whether the system is stationary (at zero velocity) from the average Euclidean distance of the 6-DoF IMU input over a short window. For instance, if the IMU outputs data at 200 Hz (one sample every 5 ms), a window of 40-100 ms (8 to 20 IMU samples) is enough to determine the state reliably.
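A minimal sketch of such a stationary check is shown below. It takes one possible reading of "average Euclidean distance": the mean distance of each 6-DoF sample from the window mean, which cancels gravity and constant biases. The class name `StationaryDetector`, the default window of 20 samples, and the threshold value are all hypothetical and would need tuning for a real sensor.

```python
import math
from collections import deque


class StationaryDetector:
    """Flags a stationary state from a short window of 6-DoF IMU samples."""

    def __init__(self, window_size=20, threshold=0.05):
        # 20 samples is 100 ms at 200 Hz; the threshold is an assumed value.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, gyro, accel):
        """Add one sample (gyro xyz, accel xyz); return True if stationary."""
        # Gyro (rad/s) and accel (m/s^2) are mixed without unit scaling here;
        # a real system would likely normalize or threshold them separately.
        self.window.append(tuple(gyro) + tuple(accel))
        if len(self.window) < self.window.maxlen:
            return False  # not enough samples to decide yet

        n = len(self.window)
        mean = [sum(s[i] for s in self.window) / n for i in range(6)]
        # Average Euclidean distance of the samples from the window mean.
        avg_dist = sum(math.dist(s, mean) for s in self.window) / n
        return avg_dist < self.threshold
```

Note that an IMU alone cannot tell rest from constant-velocity motion, so a check like this is best treated as a gate that is confirmed by other cues (for example, near-zero feature displacement in the image) where possible.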
When the system activates the zero-velocity state, you can adjust the original Σ weighting strategy, for example, by directly reducing the visual weight to a minimal value, which stops the visual sliding window (SWF) from updating. There are multiple corresponding IMU strategies:
- Strategy 1: Continuously use the latest set of IMU data for pre-integration, effectively converting the VIO system into pure IMU Odometry (IO). This method might be more complex and refined in engineering implementation.
- Strategy 2: When the zero-velocity update judgment takes effect, directly convert the VIO system to Visual Odometry (VO), stopping IMU pre-integration. However, trajectory drift due to obstacle disturbances can still occur in this scenario.
- Important Note: During engineering processing, it is crucial not to disrupt the core logic of the sliding window and backend optimization itself!
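The Σ adjustment described above can be sketched as a small lookup from operating mode to per-constraint multipliers. This is an illustration only: `Mode`, `constraint_scales`, `scale_information`, and the `min_scale` default are hypothetical names and values, and the multipliers are meant to be applied to the information matrices (the inverse of Σ) when residual blocks are built, leaving the sliding window and marginalization code untouched.

```python
from enum import Enum


class Mode(Enum):
    VIO = "vio"  # normal operation: all three constraints active
    IO = "io"    # IMU odometry: visual constraints suppressed (Strategy 1)
    VO = "vo"    # visual odometry: IMU constraints suppressed (Strategy 2)


def constraint_scales(mode, min_scale=1e-6):
    """Return multipliers for the prior, visual, and IMU information matrices.

    min_scale stands in for the "minimal value" mentioned in the text; the
    prior is never rescaled so the window logic itself is not disturbed.
    """
    scales = {"prior": 1.0, "visual": 1.0, "imu": 1.0}
    if mode is Mode.IO:
        scales["visual"] = min_scale
    elif mode is Mode.VO:
        scales["imu"] = min_scale
    return scales


def scale_information(info, scale):
    """Scale an information matrix (inverse of Σ) given as nested lists."""
    return [[scale * v for v in row] for row in info]
```

For example, `scale_information(visual_info, constraint_scales(Mode.IO)["visual"])` would shrink the visual information matrix so that the IMU and prior terms dominate the solution while the system is judged stationary.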
2. Brute-Force Strategy for Violent Motion: For problem 3, we use a simple but very effective "brute-force" strategy that makes its decision from high-speed IMU motion input (e.g., the Euclidean distance check described above):
- Strategy 1: Directly convert the system from VIO to pure IMU Odometry (IO). This method works well for visual loss caused by severe bumps and rapid shaking. The prerequisite remains: do not disrupt the sliding window and backend optimization itself!
- Strategy 2: In the backend optimization, add a callback that makes the decision directly from the number of red/blue feature points reported by the frontend's LK optical flow tracker. This is a vision-based method, whereas Strategy 1 is IMU-based. In essence, both establish new Σ adjustment strategies for the three constraints in extreme situations.
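The two triggers above can be combined in a single check, sketched below under stated assumptions. The function names, the `min_ratio` and `max_motion` thresholds, and the scalar `imu_motion` input (for instance, the average Euclidean distance computed over the IMU window) are hypothetical; the tracked and lost counts are assumed to come from the frontend.

```python
def tracked_ratio(num_tracked, num_lost):
    """Fraction of feature points that are still being tracked."""
    total = num_tracked + num_lost
    return num_tracked / total if total > 0 else 0.0


def should_fall_back_to_imu(num_tracked, num_lost, imu_motion,
                            min_ratio=0.3, max_motion=5.0):
    """Return True when vision looks too unreliable to trust.

    Vision-based trigger: too few features survive tracking.
    IMU-based trigger: motion intensity exceeds max_motion.
    """
    vision_lost = tracked_ratio(num_tracked, num_lost) < min_ratio
    violent_motion = imu_motion > max_motion
    return vision_lost or violent_motion
```

In practice some hysteresis (requiring the condition to hold for several frames before switching, and again before switching back) would likely be needed to avoid rapid toggling between modes.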
3. FEJ and Complex Moving Obstacle Handling:
Problem 1 (FEJ): We have tried adding FEJ manually, but the effect was not significant.
Problem 2 (yaw angle unobservability) is one of the causes of problems 3-7, so it is not elaborated on here.
Problem 4 (dynamic obstacles during camera motion) is the most challenging issue, for instance when the system is moving and people keep moving in front of the camera. So far we have not found a perfect solution. Conceptually, the system could switch to pure IO optimization, but defining the trigger condition is difficult. If the main system integrates an NPU (Neural Processing Unit), a person-detection condition can be established with machine learning methods and used to trigger IO mode, which would greatly simplify the handling.
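As a rough illustration of such a trigger, the sketch below assumes a detector (running on the NPU or elsewhere) that returns person bounding boxes as `(x0, y0, x1, y1)` in pixel coordinates, and switches to IO mode when too many tracked features fall inside them. The function names and the `max_fraction` threshold are hypothetical, and no particular detection model or API is implied.

```python
def fraction_in_boxes(points, boxes):
    """Fraction of feature points (u, v) lying inside any detection box."""
    if not points:
        return 0.0

    def inside(p, b):
        return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]

    hits = sum(1 for p in points if any(inside(p, b) for b in boxes))
    return hits / len(points)


def trigger_io_mode(points, boxes, max_fraction=0.5):
    """Return True when features on detected people dominate the image."""
    return fraction_in_boxes(points, boxes) > max_fraction
```

A gentler alternative under the same assumptions would be to discard only the features inside the boxes and keep running VIO on the rest, falling back to IO only when too few static features remain.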
By incorporating the above engineering strategies, we have successfully solved issues such as scale drift. Of course, you can also explore constraint strategies for extreme situations that suit your own needs and experience.
Common Problem Scenarios and Robustness Improvement
In summary, the scenarios where VIO systems are most prone to problems include:
- Transitioning from static to motion, and motion to static.
- Visual loss due to high-speed shaking, texture-less areas, or large Pitch/Roll angles.
- Dynamic obstacle interference.
Conventional handling methods include precise calibration of intrinsic and extrinsic parameters, hardware synchronization to keep the camera-IMU time offset (td) accurate, and the use of global shutter cameras. Many believe that if this groundwork is done well enough, a single unified Σ strategy can solve all problems, but this is not achievable in practice.
Improving the robustness of a VIO system as a whole, by adjusting and controlling the three constraints within backend optimization, takes sustained effort and exploration.
