How to Solve VIO Deployment Issues: 8 Engineering Challenges
Readers are advised to master the foundational knowledge in Robot State Estimation (Part 4): Growth Paths and Capability Enhancement before diving into this article; otherwise, understanding subsequent content may be challenging.
Previously, we mentioned that an ideal VIO system must meet multiple stringent requirements simultaneously, as follows:
- It must possess high-precision pose estimation capabilities (e.g., ORB-SLAM3, VINS-MONO systems);
- It must support semi-dense mapping (e.g., DSO, DM-VIO systems);
- It must have engineering deployment capabilities, handling various extreme scenarios and integrating technologies like ZUPT (referencing the PR-MONO1 solution);
- It must meet low-overhead requirements — systems that can only run on high-performance computing platforms have limited commercial value (except in autonomous driving), which also contradicts the original design intent of VINS/VIO.
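As a rough illustration of the ZUPT (zero-velocity update) idea mentioned in the list above, the sketch below flags a window of IMU samples as stationary and then clamps the velocity estimate. The thresholds and function names are illustrative assumptions, not values from any particular system. A production filter would normally apply a zero-velocity pseudo-measurement with a small covariance rather than the hard reset shown here.

```python
import math
from statistics import mean, pvariance

GRAVITY = 9.81  # m/s^2, nominal value (assumption)


def _norm(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def is_stationary(accel, gyro, accel_var_max=0.05, gravity_tol=0.3,
                  gyro_norm_max=0.02):
    """Return True if a window of IMU samples looks like a standstill.

    accel: list of (x, y, z) accelerometer samples in m/s^2
    gyro:  list of (x, y, z) gyroscope samples in rad/s
    All thresholds are hypothetical and need tuning per sensor.
    """
    accel_norms = [_norm(a) for a in accel]
    gyro_norms = [_norm(g) for g in gyro]
    steady = pvariance(accel_norms) < accel_var_max             # no vibration
    gravity_only = abs(mean(accel_norms) - GRAVITY) < gravity_tol
    quiet = max(gyro_norms) < gyro_norm_max                     # no rotation
    return steady and gravity_only and quiet


def apply_zupt(velocity, stationary):
    """Simplified ZUPT: force the velocity estimate to zero at standstill."""
    return (0.0, 0.0, 0.0) if stationary else velocity
```

Calling `is_stationary` on a sliding window (for example the last 20 samples) and passing the result to `apply_zupt` keeps velocity drift from accumulating while the robot is parked.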
RobotBaton-VIOBOT2 provides pure-vision spatial perception cameras designed specifically for robot vision to enhance a robot's environmental awareness. The cameras deliver real-time spatial perception data, including depth maps, position, and orientation, helping robots achieve more efficient spatial localization, object recognition, path planning, dynamic scene understanding, and obstacle avoidance. They are a core hardware component for boosting robot vision performance.



Progressive Difficulty in Engineering Implementation
1. 2D SLAM
A wealth of directly deployable open-source code is available. The mainstream solution fuses wheel odometry and IMU data with a Kalman filter, then uses a single-point LiDAR or ToF sensor to build a 2D grid map. The technical barrier is relatively low.
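To make the "wheel odometry + IMU Kalman filtering" idea concrete, here is a minimal one-dimensional sketch that tracks heading only: the gyro yaw rate drives the prediction, and a heading derived from wheel odometry corrects it. The class name, the noise values, and the choice of heading as the only state are assumptions made for illustration. Real 2D SLAM stacks usually estimate the full planar pose and handle angle wrap-around, which this sketch ignores.

```python
class HeadingKalmanFilter:
    """Scalar Kalman filter on heading (rad). Illustrative only."""

    def __init__(self, theta0=0.0, p0=1.0, q=1e-4, r=1e-2):
        self.theta = theta0  # heading estimate
        self.p = p0          # estimate variance
        self.q = q           # process noise added per prediction (assumed)
        self.r = r           # wheel-odometry heading noise (assumed)

    def predict(self, gyro_z, dt):
        """Propagate heading with the IMU yaw rate."""
        self.theta += gyro_z * dt
        self.p += self.q

    def update(self, odom_theta):
        """Correct heading with the wheel-odometry heading measurement."""
        k = self.p / (self.p + self.r)            # Kalman gain
        self.theta += k * (odom_theta - self.theta)
        self.p *= (1.0 - k)
        return self.theta


def diff_drive_yaw_increment(d_left, d_right, track_width):
    """Heading change of a differential-drive base from wheel travel (m)."""
    return (d_right - d_left) / track_width
```

In use, `predict` runs at the IMU rate and `update` runs whenever a new odometry heading (the running sum of `diff_drive_yaw_increment` values) arrives.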
2. Binocular VSLAM + Loose Coupling with Partial Sensors
Implementation difficulty is moderate. Many mature hardware ranging operators (hardware-accelerated stereo depth computation) are already on the market; without one, system overhead increases significantly, making it hard to meet the low-overhead requirement (requirement 4 above). This solution is challenging to debug outdoors: each scenario needs a stereo baseline matched to its working distance, and drift is common. Coupling RTK outdoors, or wheel odometry and an IMU indoors, can improve stability but further increases system overhead.
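The remark about matching baselines follows from standard stereo geometry: depth is Z = f·B/d, so for a fixed disparity error the depth error grows with Z² and shrinks with the baseline B. The sketch below evaluates that relationship. The focal length, baselines, and quarter-pixel disparity error are example numbers chosen for illustration, not figures from the article.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo depth: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_err_px=0.25):
    """First-order depth uncertainty: dZ = Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


if __name__ == "__main__":
    focal_px = 400.0  # assumed focal length in pixels
    for baseline_m in (0.05, 0.20):
        err = depth_error(focal_px, baseline_m, depth_m=10.0)
        print(f"baseline {baseline_m:.2f} m -> depth error at 10 m: {err:.2f} m")
```

Under these assumptions a 5 cm baseline is off by more than a meter at 10 m range while a 20 cm baseline is off by about 0.3 m, which is why a rig tuned for indoor use tends to drift outdoors.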
3. Monocular VSLAM + Loose Coupling with Partial Sensors
System overhead is lower than with binocular solutions and debugging is simpler, but accuracy is lower and the system is prone to various issues. Its initialization step often draws questions from users and partners, yet overall complexity remains lower than VIO.
4. Multi-camera / Panoramic VSLAM + Loose Coupling with Partial Sensors
Technical advantages are significant: when properly implemented, system robustness is extremely high (many teams are exploring this field). However, sensor configurations are more complex, leading to significantly higher system overhead, and coupling with other sensors is still necessary.
5. VIO + Loose Coupling with Partial Sensors
It is generally recommended to couple a depth camera (D-camera) for mapping or obstacle avoidance. Other sensors to be coupled are similar to those in the above solutions. The main challenge lies in extremely high implementation difficulty; tight coupling modes also result in high system overhead. Teams with resources are advised to integrate wheel odometers into the tight coupling framework.
6. Binocular + VIO + Loose Coupling with Partial Sensors
Implementation difficulty is higher than for solution 5, and the workflow is cumbersome. Its application value is limited outside L2-L3 assisted driving scenarios; the joint solution by DJI and Wuling is one reference.
Engineering Challenges of Mainstream Technical Routes
1. ORB-SLAM2 and ORB-SLAM3
These systems are mature from an engineering standpoint: parameters are easy to tune and few issues arise. Effective loop closure constraints keep accuracy high, and the author rates their overall performance at about 80/100, which explains their wide adoption. From the perspective of senior engineers and product managers, however, they have limitations in engineering deployment. Their completeness can lead R&D personnel to over-rely on parameter tuning and neglect underlying technical optimization. For example, rapid deployment is possible using an NVIDIA Xavier NX with a RealSense D435i, but the systems lack flexibility: modifying core functions such as loop closure is extremely difficult. Over-reliance on them tends to restrict R&D to parameter tuning, which suits competition scenarios but is not conducive to breakthroughs in core technologies, and the barrier to secondary development is high.
2. VINS-MONO (developed by HKUST)
As a significant academic achievement, the author should ideally fully endorse it, but an objective analysis of its pros and cons is necessary:
- Advantages:
  - The architecture is standardized and easy to learn, with clear code logic and ample room for secondary development;
  - The backend, implemented on Ceres Solver, performs excellently and leaves limited room for further optimization.
- Disadvantages:
  - The frontend has significant room for optimization: the original algorithm has high overhead, the feature points are of average usability, and parallelization based on FAST features is relatively simple.
  - Additionally, there are multiple approaches to implementing photometric calibration (BCs circle optimization).
  - The system cannot achieve ideal mapping functionality; sparse mapping fails to meet the all-round requirements of VIO. Previous attempts to integrate direct methods have yielded poor results.
3. TUM Series (DSO→VI-DSO→DM-VIO)
This is the only technical route with the potential to achieve an all-round VIO system, though it still suffers from high overhead. Its technical advancement is outstanding and need not be elaborated here.
- Disadvantages:
  - Extremely high barriers to entry, making it the "ceiling" of VSLAM/VIO: DSO itself is one of the most challenging VSLAM systems in both theory and code. Developing VI-DSO or DM-VIO directly without mastering DSO will pose enormous challenges.
  - High sensitivity to photometric errors, requiring R&D personnel to deeply understand the characteristics of optical hardware; those unfamiliar with Kalibr or Zhang Zhengyou’s calibration method are advised to avoid this field.
  - Extremely high difficulty in code implementation.
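The sensitivity to photometric errors is easier to see with the image formation model commonly used alongside DSO, I(x) = G(t · V(x) · B(x)), where G is the camera response, V the vignette attenuation, t the exposure time, and B the scene irradiance. The sketch below inverts that model per pixel. The function name and the plain-list image representation are assumptions for illustration; a real pipeline would load the calibrated response and vignette from files and operate on arrays.

```python
def photometric_correct(image, inv_response, vignette, exposure=None):
    """Undo camera response and vignetting for one grayscale frame.

    image:        rows of integer pixel values in 0..255
    inv_response: 256-entry lookup table for G^-1
    vignette:     rows of attenuation factors in (0, 1], same shape as image
    exposure:     exposure time t; if given, the result is B(x), otherwise t*B(x)
    """
    corrected = []
    for img_row, vig_row in zip(image, vignette):
        row = []
        for pixel, v in zip(img_row, vig_row):
            energy = inv_response[pixel] / v  # t * B(x)
            if exposure:
                energy /= exposure            # B(x)
            row.append(energy)
        corrected.append(row)
    return corrected
```

With a linear response table and a vignette of 0.5 at the image corner, a corner pixel reading 50 is restored to the same value as a center pixel reading 100. A direct method that skips this step would treat those two pixels as photometrically inconsistent.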
Core Difficulties in Engineering Deployment
Beyond mastering VSLAM/VIO fundamentals, engineering deployment requires overcoming the following challenges:
- Possessing solid engineering thinking and practical capabilities.
- Familiarity with various processing cores, including CPU, GPU, NPU, DSP, and FPGA.
- In-depth understanding of the characteristics of different cameras.
- Mastery of parallelization and multi-core optimization technologies, such as proficient use of registers and warp-level primitives on NVIDIA platforms.
- Familiarity with various interface protocols, including VI, DPC, SDI, MIPI, USB, CAN, RS-232, and RS-485.
- Removing visualization tools like Pangolin and RVIZ from open-source frameworks to avoid occupying system resources.
- Converting raw data output in open-source systems to hardware encoding; all OSD (on-screen display) functions should be hardware-accelerated, with independent runtime libraries compiled for pose and point cloud output.
- Optimizing the use of pointers and shared memory, prioritizing manual implementation over calling library functions.
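One way to read the items above on interface protocols and standalone pose output is that the pose should leave the VIO process as a compact binary frame that any serial link can carry, with no ROS or visualization dependency. The sketch below packs a timestamp, position, and quaternion into a fixed 42-byte frame with a CRC. The frame layout, the header value, and the function names are hypothetical and are not a format defined by the article or by any product.

```python
import struct
import zlib

HEADER = 0xA55A  # hypothetical sync word

# little-endian: header (u16), timestamp_us (u64), px py pz qx qy qz qw (f32)
_BODY = struct.Struct("<HQ7f")
_CRC = struct.Struct("<I")


def pack_pose(timestamp_us, position, quaternion):
    """Serialize one pose sample into a fixed-size frame with a CRC32 trailer."""
    body = _BODY.pack(HEADER, timestamp_us, *position, *quaternion)
    return body + _CRC.pack(zlib.crc32(body))


def unpack_pose(frame):
    """Parse a frame; raise ValueError on length, CRC, or header mismatch."""
    if len(frame) != _BODY.size + _CRC.size:
        raise ValueError("unexpected frame length")
    body, crc_bytes = frame[:_BODY.size], frame[_BODY.size:]
    (crc,) = _CRC.unpack(crc_bytes)
    if crc != zlib.crc32(body):
        raise ValueError("CRC mismatch")
    header, timestamp_us, *values = _BODY.unpack(body)
    if header != HEADER:
        raise ValueError("bad header")
    return timestamp_us, tuple(values[:3]), tuple(values[3:])
```

A fixed-size frame keeps parsing trivial on a microcontroller at the other end of an RS-485 or USB link. Point clouds would need a separate, length-prefixed message.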
Of course, if budget allows, directly adopting high-performance platforms like Xavier, TX2, or i7, paired with high-end RealSense sensors, can resolve some of the above difficulties — this remains an efficient choice for current engineering deployment.