Robot State Estimation | Does Robot Perception Technology Really Matter?
This article delves into the field of robot state estimation, drawing on recent reflections from practical work. It is not purely a technical piece but incorporates a commercial perspective—after all, any technology must ultimately confront the market and issues of survival.



Key Characteristics of Current Robot State Estimation
The field of robot state estimation currently exhibits several distinct traits:
- Heavy Investment in Perception for Complex Scenarios
In complex application environments, robot perception systems are often "bulky," relying on one or more LiDARs, high-performance computing platforms, and numerous visual sensors. Beyond the cost, this design makes SLAM (Simultaneous Localization and Mapping) very hard to implement in such environments, whether the task is global or local, and especially on the ground. Aerial scenarios are slightly more manageable but face strict low-altitude flight restrictions and safety requirements.
- Intensifying Inward Competition in SLAM Academia
SLAM research keeps trending toward greater complexity, with no sign of reversing. Examples include semantic mapping, semantic segmentation, GANs (Generative Adversarial Networks), LSTMs (Long Short-Term Memory networks), more sophisticated processing of line and surface features, and multi-robot (cluster) control and collaboration. The field has become increasingly esoteric: technical barriers keep rising and the overlap with artificial intelligence (AI) keeps growing, which deters many newcomers.
- UAV Disruptive Potential in Specific Scenarios
Unmanned Aerial Vehicles (UAVs) are likely to replace many traditional robot applications in the future. This is no exaggeration: in "inspection" tasks such as power line inspection, UAVs are already beginning to fully replace ground-based robots.
- Widening Gap Between Academia and Industry
Worryingly, the divide between academic research in robotics and the actual needs of industry is not narrowing but growing.
Reflections and Recommendations for Robot Development
Based on these observations, here are key reflections:
- Robotics Should Evolve Gradually
Robot technology should not aim to replace humans overnight. For example, even DJI’s most advanced drones remain tools and toys for humans. The development path should follow: "assistive" → "semi-autonomous" → "partially fully autonomous."
- Lessons from Cleaning Tools
Consider cleaning tools: brooms and mops have served as human aids for thousands of years. Fully autonomous floor-cleaning robots, after nearly a decade of development, have achieved full automation, largely because their use case is extremely simple. Yet, surprisingly, expensive "semi-autonomous" steam mops still achieve substantial annual sales. The reasons are straightforward:
- Even the most automated cleaning robots struggle to clean certain areas thoroughly, which frustrates germophobic users.
- Steam mops are "fun" to use—even self-proclaimed "lazy people" like me enjoy using them.
- Reconsidering the Core Value of Robots
This raises an interesting question. No matter how advanced DJI’s drone technology becomes, its core value lies in giving humans unprecedented aerial perspectives and high playability. So what should we focus on in the overall development of robots? Beyond unique cases like Robosen robots, I believe the core value of robots lies in:
- Making us lazier (improving efficiency and freeing up human labor).
- Making tasks more enjoyable, turning tedious work into fun.
- Replacing dangerous or undesirable tasks.
Among these, only the third scenario (replacing dangerous/undesirable work) is suitable for developing fully autonomous robots. The first two scenarios (making us lazier and more entertained) hold enormous commercial potential!
Solutions and Technical Considerations
The complexity of fully autonomous robots for the third scenario exceeds the scope of this discussion, so we will not delve deeper here.
Instead, let’s focus on the first two scenarios: assistive/entertainment robots. Some technical details follow—if these are difficult to grasp, this field may not be the right fit for you.
Compared to the third scenario, planning and control in the first two are far simpler. For example, for tool or assistant robots, the perception layer needs little more than a camera paired with a 32-bit SoC (System on Chip), which is already enough to enable a wide range of business applications. However, such devices have low barriers to entry and risk ending up in red-ocean competition. Their playability is also limited, often restricted to remote-controlled toys with simple tricks or follow functions.
A key focus here is semi-autonomous remote-controlled robots. DJI’s recent products, particularly the DJI Avata, offer a prime example: their design is nearly flawless (it is backed by profound technological accumulation, which we do not recommend that average teams blindly imitate).
These robots excel at meeting the needs of "making us lazier" and "high playability." They typically require no LiDAR because humans, as powerful operators, provide high-level perception and decision-making. Combined with capable VIO/VDIO (Visual-Inertial / Visual-Depth-Inertial Odometry) or lower-end VSLAM components, plus a solid 64-bit computing platform, they can fulfill all functions. Perception sensors at most include global shutter/rolling shutter cameras, depth cameras, and IMUs, which can be tightly coupled with wheel encoders/RTK.
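To make the idea of combining an IMU with wheel encoders more concrete, here is a minimal Python sketch for a differential-drive ground robot. It is not a tightly coupled VIO pipeline; it is a much simpler, loosely coupled illustration that blends the heading change reported by the gyro with the one implied by the encoders. The names `Pose2D` and `fuse_step`, the 0.30 m track width, and the 0.98 blend weight are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float = 0.0    # metres
    y: float = 0.0    # metres
    yaw: float = 0.0  # radians


def fuse_step(pose, d_left, d_right, gyro_z, dt, track_width=0.30, alpha=0.98):
    """Advance a planar pose by one time step.

    d_left, d_right: wheel travel since the last step (metres, from encoders)
    gyro_z:          IMU yaw rate (rad/s)
    dt:              step duration (seconds)
    track_width:     distance between the wheels (metres, assumed value)
    alpha:           weight given to the gyro heading change (assumed value)
    """
    d_center = 0.5 * (d_left + d_right)
    dyaw_wheels = (d_right - d_left) / track_width  # heading change from encoders
    dyaw_gyro = gyro_z * dt                         # heading change from the gyro
    dyaw = alpha * dyaw_gyro + (1.0 - alpha) * dyaw_wheels

    # Integrate the travelled distance along the mid-step heading.
    mid_yaw = pose.yaw + 0.5 * dyaw
    return Pose2D(
        x=pose.x + d_center * math.cos(mid_yaw),
        y=pose.y + d_center * math.sin(mid_yaw),
        yaw=pose.yaw + dyaw,
    )


# Example: drive straight for ten steps of 0.1 m each.
pose = Pose2D()
for _ in range(10):
    pose = fuse_step(pose, d_left=0.1, d_right=0.1, gyro_z=0.0, dt=0.1)
```

In a real product the camera, depth, and RTK measurements would enter a proper estimator (a filter or an optimizer); the sketch only shows why cheap sensors complement each other: the gyro is trusted for short-term rotation, the encoders for distance travelled.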
Crucially, as extensions of human capabilities, these devices do not need elaborate global map construction or deep integration of global and local mapping, which sidesteps the most "involuted" (intensely competitive) aspects of SLAM.
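One way to picture this division of labor is a local safety filter: the human decides where to go, and the robot only uses its nearest depth reading to scale the commanded speed, with no global map involved. The Python sketch below is a hypothetical illustration; the function name `limit_forward_speed` and the 0.5 m and 2.0 m thresholds are assumptions, not values from any particular product.

```python
def limit_forward_speed(cmd_speed, nearest_obstacle_m, stop_dist=0.5, slow_dist=2.0):
    """Scale the operator's forward speed by the nearest obstacle distance.

    cmd_speed:          operator command (m/s, positive means forward)
    nearest_obstacle_m: closest depth reading ahead of the robot (metres)
    stop_dist:          at or below this distance, forward motion is blocked
    slow_dist:          at or above this distance, the command passes unchanged
    """
    if cmd_speed <= 0.0:
        return cmd_speed  # stopping or reversing is passed through untouched
    if nearest_obstacle_m <= stop_dist:
        return 0.0
    if nearest_obstacle_m >= slow_dist:
        return cmd_speed
    # Linear ramp between the stop distance and the slow-down distance.
    scale = (nearest_obstacle_m - stop_dist) / (slow_dist - stop_dist)
    return cmd_speed * scale


# Example: the operator asks for 1.0 m/s with an obstacle 1.25 m ahead.
safe_speed = limit_forward_speed(1.0, 1.25)
```

The operator remains the planner; the robot only needs local perception to keep the command safe.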
Of course, such robots must thoroughly address transmission issues, requiring in-depth knowledge of communication technologies (e.g., near-field Mesh/Wi-SUN). For cross-internet management and operation, cloud integration and latency must also be considered. However, compared to the daunting, overly competitive knowledge system in SLAM, communication-related expertise will not become a major barrier to commercial implementation.
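To see why latency deserves attention, it helps to add up a rough end-to-end budget and convert it into the distance the robot travels before the operator's reaction takes effect. The Python sketch below does that arithmetic; the stage names and millisecond figures are illustrative assumptions, not measurements of any real link.

```python
def total_latency_ms(stages):
    """Sum the per-stage latencies (milliseconds) of a teleoperation loop."""
    return sum(stages.values())


def blind_distance_m(speed_mps, latency_ms):
    """Distance travelled while a command is still in flight."""
    return speed_mps * latency_ms / 1000.0


# Hypothetical budget for one video-downlink plus control-uplink round trip.
budget_ms = {
    "camera_capture": 17,
    "video_encode": 10,
    "radio_link": 20,
    "video_decode": 10,
    "display": 11,
    "control_uplink": 7,
}

latency = total_latency_ms(budget_ms)      # 75 ms in this example
distance = blind_distance_m(8.0, latency)  # robot moving at 8 m/s
```

With these assumed numbers, a robot moving at 8 m/s covers about 0.6 m before the operator's input arrives. Adding a cloud relay to the path increases the budget, which is why cross-internet operation usually calls for lower speeds or more on-board assistance.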
While I cannot detail all specific applications of these robots (many involve industry and partner trade secrets), I firmly believe this direction is bright, vast, and full of promise.