Dense Mapping for Autonomous Navigation

We develop real-time methods for generating dense maps for large-scale autonomous navigation of aerial robots. We investigate monocular and multi-camera dense mapping methods, with special attention to the tight integration between mapping and motion planning modules.

Without any prior knowledge of the environment, our dense mapping module uses an inverse depth labeling method to extract a 3D cost volume through temporal aggregation on synchronized camera poses. After semi-global optimization and post-processing, a dense depth image is calculated and fed into our uncertainty-aware truncated signed distance function (TSDF) fusion approach, from which a live dense 3D map is produced.
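As a rough illustration of inverse depth labeling, the sketch below spaces candidate depth hypotheses uniformly in inverse depth, so that nearby structure is sampled more finely than distant structure. The function name `inverse_depth_labels`, the depth range, and the label count are assumptions made for this example, not values taken from the system described here.

```python
def inverse_depth_labels(d_min, d_max, num_labels):
    """Return candidate depths spaced uniformly in inverse depth.

    d_min, d_max: nearest and farthest depth hypotheses in meters (assumed values).
    num_labels:   number of depth hypotheses in the cost volume (assumed value).
    The list is ordered from the farthest depth to the nearest.
    """
    if num_labels < 2 or d_min <= 0.0 or d_max <= d_min:
        raise ValueError("need num_labels >= 2 and 0 < d_min < d_max")
    inv_far = 1.0 / d_max
    inv_near = 1.0 / d_min
    step = (inv_near - inv_far) / (num_labels - 1)
    return [1.0 / (inv_far + i * step) for i in range(num_labels)]


# Hypothetical range: 0.5 m to 10 m with 5 labels.
labels = inverse_depth_labels(0.5, 10.0, 5)
for depth in labels:
    print(round(depth, 3))
```

Each label corresponds to one slice of the cost volume; the gap between consecutive depths shrinks toward the camera, which matches the way disparity resolution falls off with distance.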

Autonomous aerial navigation using monocular visual-inertial fusion


We present a real-time monocular visual-inertial dense mapping and autonomous navigation system. The whole system is implemented on a compact, lightweight quadrotor, where all modules run onboard and in real time. By properly coordinating three major system modules (state estimation, dense mapping, and trajectory planning), we validate our system in both cluttered indoor and outdoor environments via multiple autonomous flight experiments. A tightly-coupled monocular visual-inertial state estimator is developed to provide high-accuracy odometry, which is used for both feedback control and dense mapping. Our estimator supports on-the-fly initialization and estimates vehicle velocity, metric scale, and IMU biases online.
Without any prior knowledge of the environment, our dense mapping module uses a plane-sweeping-based method to extract a 3D cost volume through temporal aggregation on synchronized camera poses. After semi-global optimization and post-processing, a dense depth image is calculated and fed into our uncertainty-aware TSDF fusion approach, from which a live dense 3D map is produced. Using this map, our planning module first generates an initial collision-free trajectory with our sampling-based path searching method. A gradient-based optimization method is then applied to ensure trajectory smoothness and dynamic feasibility. Following the trend of rapid increases in mobile computing power, we believe our minimal sensor setup suggests a feasible solution for fully autonomous miniaturized aerial robots.
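The uncertainty-aware TSDF fusion step can be pictured as a per-voxel weighted running average in which less certain depth measurements contribute less. The sketch below is a minimal version under stated assumptions: the observation weight is taken to be the inverse of the depth variance, and `fuse_voxel`, `trunc`, and `max_weight` are hypothetical names and parameters chosen for illustration rather than the actual implementation.

```python
def truncate_sdf(sdf, trunc):
    """Scale a signed distance by the truncation band and clamp it to [-1, 1]."""
    return max(-1.0, min(1.0, sdf / trunc))


def fuse_voxel(tsdf, weight, sdf_obs, obs_variance, trunc=0.3, max_weight=100.0):
    """Fuse one signed-distance observation into a voxel.

    tsdf, weight: current voxel state (weight 0.0 means "never observed").
    sdf_obs:      observed signed distance to the surface in meters.
    obs_variance: variance of the depth measurement (assumed weighting model:
                  observation weight = 1 / variance).
    Returns the updated (tsdf, weight); the weight is capped at max_weight.
    """
    if obs_variance <= 0.0:
        raise ValueError("obs_variance must be positive")
    d = truncate_sdf(sdf_obs, trunc)
    w_obs = 1.0 / obs_variance
    new_weight = weight + w_obs
    new_tsdf = (tsdf * weight + d * w_obs) / new_weight
    return new_tsdf, min(new_weight, max_weight)


# A noisy observation followed by a more certain one that disagrees with it.
state = fuse_voxel(0.0, 0.0, sdf_obs=0.15, obs_variance=1.0)
state = fuse_voxel(state[0], state[1], sdf_obs=-0.15, obs_variance=0.25)
print(state)
```

Because the second observation has a quarter of the variance, it carries four times the weight and pulls the fused value toward its side of the surface. Capping the weight keeps the map responsive to later changes.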


High-precision online markerless stereo extrinsic calibration

By Yonggen LING

Stereo cameras and dense stereo matching algorithms are core components of many robotic applications due to their ability to directly obtain dense depth measurements and their robustness against changes in lighting conditions. However, the performance of dense depth estimation relies heavily on accurate stereo extrinsic calibration. In this work, we present a real-time markerless approach for obtaining high-precision stereo extrinsic calibration using a novel 5-DOF (degrees-of-freedom) nonlinear optimization on a manifold, which captures the observability property of vision-only stereo calibration. Our method minimizes epipolar errors between sparse natural features matched between the two cameras within each frame. It does not require temporal feature correspondences, making it not only invariant to dynamic scenes and illumination changes, but also able to run significantly faster than standard bundle adjustment-based approaches. We introduce a principled method to determine whether the calibration has converged to the required level of accuracy, and show through online experiments that our approach achieves a level of accuracy comparable to offline marker-based calibration methods. Our method refines the stereo extrinsics to an accuracy that is sufficient for block matching-based dense disparity computation. It provides a cost-effective way to improve the reliability of stereo vision systems for long-term autonomy.
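To make the epipolar error concrete, the sketch below evaluates the standard epipolar constraint for one left-right feature match. It assumes the five degrees of freedom are a 3-DOF rotation plus a 2-DOF unit baseline direction (the baseline length cannot be observed from image correspondences alone); this parameterization and all function names are assumptions for illustration, not necessarily the formulation used in this work.

```python
import math


def unit_translation(azimuth, elevation):
    """2-DOF unit baseline direction from two angles in radians."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation))


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def mat_vec(R, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return tuple(dot(row, v) for row in R)


def epipolar_residual(R, t_dir, x_left, x_right):
    """Epipolar error x_right^T [t]_x R x_left.

    R, t_dir map left-camera coordinates to right-camera coordinates
    (X_right = R X_left + b * t_dir for some baseline length b).
    x_left, x_right are normalized homogeneous image points (x, y, 1).
    """
    return dot(x_right, cross(t_dir, mat_vec(R, x_left)))


# Ideal rectified pair: identity rotation, baseline along the x axis.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_dir = unit_translation(0.0, 0.0)
print(epipolar_residual(identity, t_dir, (0.1, 0.05, 1.0), (0.05, 0.05, 1.0)))
```

A calibration routine would sum the squares of such residuals over all matches in a frame and adjust the five parameters to reduce that sum; only spatial matches are needed, so no feature has to be tracked over time.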


Real-time monocular dense mapping on aerial robots using visual-inertial fusion

By Zhenfei YANG

In this work, we present a solution to real-time monocular dense mapping. A tightly-coupled visual-inertial localization module is designed to provide metric, high-accuracy odometry. A motion stereo algorithm takes the video input from one camera and produces local depth measurements with semi-global regularization. The local measurements are then integrated into a global map for noise filtering and map refinement. Indoor and outdoor experiments verify that the resulting global map supports navigation and obstacle avoidance for aerial robots. Our system runs at 10 Hz on an Nvidia Jetson TX1 by properly distributing computation between the CPU and GPU. Through onboard experiments, we demonstrate its ability to close the perception-action loop for autonomous aerial robots. We release our implementation as open-source software.


Building maps for autonomous navigation using sparse visual SLAM features

By Yonggen LING

Autonomous navigation, which consists of a systematic integration of localization, mapping, motion planning and control, is the core capability of mobile robotic systems. However, most research considers only isolated technical modules. There exist significant gaps between maps generated by SLAM algorithms and maps required for motion planning. Our work presents a complete online system that consists of three modules: incremental SLAM, real-time dense mapping, and free space extraction. The obtained free-space volume (i.e. a tessellation of tetrahedra) can serve as regular geometric constraints for motion planning. Our system runs in real time thanks to engineering decisions made to increase system efficiency. We conduct extensive experiments on the KITTI dataset to demonstrate the run-time performance. Qualitative and quantitative results on mapping accuracy are also shown. For the benefit of the community, we make the source code public.


Visual-Inertial State Estimation

Monocular visual-inertial state estimation with online initialization and camera-IMU extrinsic calibration

By Zhenfei YANG

There have been increasing demands for developing micro aerial vehicles with vision-based autonomy for search and rescue missions in complex environments. In particular, the monocular visual-inertial system (VINS), which consists of only an inertial measurement unit (IMU) and a camera, forms an attractive sensor suite due to its low weight and small footprint. In this paper, we address two challenges for rapid deployment of monocular VINS: 1) the initialization problem and 2) the calibration problem. We propose a methodology that is able to initialize velocity, gravity, visual scale, and camera-IMU extrinsic calibration on the fly. Our approach operates in natural environments and does not use any artificial markers. It also does not require any prior knowledge about the mechanical configuration of the system. It is a significant step toward plug-and-play and highly customizable visual navigation for mobile robots. We show through online experiments that our method leads to accurate calibration of the camera-IMU transformation, with errors less than 0.02 m in translation and 1° in rotation. We compare our method with a state-of-the-art marker-based offline calibration method and show superior results. We also demonstrate the performance of the proposed approach in large-scale indoor and outdoor experiments.


Self-calibrating multi-camera visual-inertial fusion for autonomous MAVs

By Zhenfei YANG

We address the important problem of achieving robust and easy-to-deploy visual state estimation for micro aerial vehicles (MAVs) operating in complex environments. We use a sensor suite consisting of multiple cameras and an IMU to maximize perceptual awareness of the surroundings and provide sufficient redundancy against sensor failures. Our approach starts with an online initialization procedure that simultaneously estimates the transformation between each camera and the IMU, as well as the initial velocity and attitude of the platform, without any prior knowledge about the mechanical configuration of the sensor suite. Based on the initial calibrations, a tightly-coupled, optimization-based, generalized multi-camera-inertial fusion method runs onboard the MAV with online camera-IMU calibration refinement and identification of sensor failures. Our approach dynamically configures the system into monocular, stereo, or other multi-camera visual-inertial settings, with their respective perceptual advantages, based on the availability of visual measurements. We show that even under random camera failures, our method can be used for feedback control of the MAV. We highlight our approach in challenging indoor-outdoor navigation tasks with large variations in vehicle height and speed, scene depth, and illumination.


Aggressive quadrotor flight using dense visual-inertial fusion

By Yonggen LING

In this work, we address the problem of aggressive flight of a quadrotor aerial vehicle using cameras and IMUs as the only sensing modalities. We present a fully integrated quadrotor system and demonstrate through online experiments the capability of autonomous flight with linear velocities up to 4.2 m/s, linear accelerations up to 9.6 m/s², and angular velocities up to 245.1 degrees/s. Central to our approach is a dense visual-inertial state estimator for reliable tracking of aggressive motions. An uncertainty-aware direct dense visual tracking module provides camera pose tracking that takes inverse depth uncertainty into account and is resistant to motion blur. Measurements from IMU pre-integration and multi-constrained dense visual tracking are fused probabilistically using an optimization-based sensor fusion framework. Extensive statistical analysis and comparison are presented to verify the performance of the proposed approach. We also release our code as open-source ROS packages.


High altitude monocular visual-inertial state estimation: initialization and sensor fusion

By Tianbo LIU

Obtaining reliable state estimates in high-altitude, GPS-denied environments, such as between high-rise buildings or in the middle of deep canyons, is known to be challenging due to the lack of direct distance measurements. Monocular visual-inertial systems provide a possible way to recover the metric distance through proper integration of visual and inertial measurements. However, the nonlinear optimization problem for state estimation suffers from poor numerical conditioning or even degeneracy, because it is difficult to observe visual features with sufficient parallax and because inertial measurements must be integrated over excessively long periods. Here we propose a spline-based high-altitude estimator initialization method for monocular visual-inertial navigation systems (VINS), with special attention to the numerical issues. Our formulation uses only inertial measurements that contain sufficient excitation, and drops uninformative measurements such as those obtained during hovering. In addition, our method explicitly reduces the number of parameters to be estimated in order to achieve earlier convergence. Based on the initialization results, a complete closed-loop system is constructed for high-altitude navigation. Extensive experiments are conducted to validate our approach.


Robust initialization of monocular visual-inertial estimation on aerial robots

By Tong QIN

We propose a robust on-the-fly estimator initialization algorithm to provide high-quality initial states for monocular visual-inertial systems (VINS). Due to the non-linearity of VINS, a poor initialization can severely impact the performance of both filtering-based and graph-based methods. Our approach starts with vision-only structure from motion (SfM) to build the up-to-scale structure of camera poses and feature positions. By loosely aligning this structure with pre-integrated IMU measurements, our approach recovers the metric scale, velocity, gravity vector, and gyroscope bias, which are treated as initial values to bootstrap the nonlinear tightly-coupled optimization framework. We highlight that our approach can perform on-the-fly initialization in various scenarios without using any prior information about system states and movement. The performance of the proposed approach is verified on a public UAV dataset and in real-time onboard experiments. We make our implementation open source; it is the initialization component integrated in VINS-Mono.


Quadrotor Test Beds & Demonstration

Authors: Kunyue SU, Tianbo LIU

We present a method allowing a quadrotor equipped with only onboard cameras and an IMU to catch a flying ball. Our system runs without any external infrastructure and with reasonable computational complexity. Central to our approach is an online monocular vision-based ball trajectory estimator that recovers and predicts the 3-D motion of a flying ball using only noisy 2-D observations. Our method eliminates the need for direct range sensing via stereo correspondences, making it robust against noisy or erroneous measurements. Our system is composed of a simple 2-D visual ball tracker, a UKF-based state estimator that fuses optical flow and inertial data, and a nonlinear tracking controller. We perform extensive analysis of system performance by studying both the system dynamics and the ball trajectory estimation accuracy. Through online experiments, we show the first mid-air interception of a flying ball with an aerial robot using only onboard sensors.
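Predicting where the ball will be is what makes interception possible. The sketch below propagates an estimated 3-D ball state forward under a drag-free ballistic model. The model, the function name `predict_ball_position`, and the z-up world frame are assumptions for illustration; the published estimator may use a different motion model.

```python
GRAVITY = 9.81  # m/s^2, acting along -z in the assumed z-up world frame


def predict_ball_position(position, velocity, dt):
    """Predict the ball position dt seconds ahead, ignoring air drag.

    position: (x, y, z) in meters.
    velocity: (vx, vy, vz) in meters per second.
    dt:       prediction horizon in seconds (non-negative).
    """
    if dt < 0.0:
        raise ValueError("dt must be non-negative")
    x, y, z = position
    vx, vy, vz = velocity
    return (x + vx * dt,
            y + vy * dt,
            z + vz * dt - 0.5 * GRAVITY * dt * dt)


# Hypothetical state: ball 1 m above the origin, moving forward and upward.
print(predict_ball_position((0.0, 0.0, 1.0), (2.0, 0.0, 3.0), 0.5))
```

In a complete system the position and velocity would come from the trajectory estimator, and the predicted point would be handed to the tracking controller as an interception target.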

Related publications:
K. Su and S. Shen. Catching a flying ball with a vision-based quadrotor. In Proc. of the International Symposium on Experimental Robotics (ISER), Tokyo, Japan, October 2016. To Appear.

Blur-Aware Motion Estimation

Author: Yi LIN

Visual simultaneous localization and mapping (SLAM) has matured considerably in recent years, and both feature-based and direct methods show impressive performance. However, in extreme conditions such as low-light environments, long exposure times combined with aggressive motion cause serious motion blur. In almost all visual odometry systems, blurry images drastically impede feature detection and feature matching. Blur also violates the frame-to-frame photometric consistency assumption that photometric methods rely on. A popular remedy is to deblur the raw camera images with a blur kernel as a preprocessing step. An alternative is to model the blurry images directly instead of using a blur kernel. Because motion blur is caused by camera motion during the exposure period, we model how each pixel intensity is formed during exposure. We propose a blur-aware motion estimation method that estimates the trajectory from an initial unblurred frame to a blurred frame, given the depth map from stereo cameras.

Edge-Based Motion Estimation

Authors: Manohar KUSE, Yonggen LING

There has been a paradigm-shifting trend towards feature-less methods, driven by their elegant formulation, their accuracy, and ever-increasing computational power. In this work, we present a direct edge alignment approach for 6-DOF tracking. We argue that photo-consistency based methods are plagued by a much smaller convergence basin and are extremely sensitive to noise, changing illumination, and fast motion. We propose to use the Distance Transform in the energy formulation, which can significantly extend the influence of the edges for tracking. We address the non-differentiability of our cost function, and of those in previous methods, by using a sub-gradient method. Through extensive experiments, we show that the proposed method gives performance comparable to the previous method under nominal conditions and is able to run at 30 Hz in single-threaded mode. In addition, we demonstrate that under large motion our method outperforms previous methods under the same runtime configuration.
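The role of the Distance Transform can be shown with a toy example. The sketch below computes a brute-force Euclidean distance transform of a reference edge mask and scores a candidate alignment by summing the transform at the shifted edge pixels. For brevity, the 6-DOF warp is replaced by a 2-D integer pixel shift, which is a deliberate simplification; `distance_transform` and `alignment_cost` are hypothetical names, not functions from the released code.

```python
import math


def distance_transform(edge_mask):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel.

    edge_mask: list of rows, nonzero entries mark edge pixels (at least one).
    """
    edges = [(r, c) for r, row in enumerate(edge_mask)
             for c, value in enumerate(row) if value]
    if not edges:
        raise ValueError("edge_mask contains no edge pixels")
    return [[min(math.hypot(r - er, c - ec) for er, ec in edges)
             for c in range(len(edge_mask[0]))]
            for r in range(len(edge_mask))]


def alignment_cost(dt, edge_points, shift):
    """Sum of reference distances at edge points moved by (d_row, d_col).

    Points that fall outside the image after shifting are skipped.
    """
    rows, cols = len(dt), len(dt[0])
    total = 0.0
    for r, c in edge_points:
        rr, cc = r + shift[0], c + shift[1]
        if 0 <= rr < rows and 0 <= cc < cols:
            total += dt[rr][cc]
    return total


# Reference image: a vertical edge at column 4 of a 6 x 10 grid.
mask = [[1 if c == 4 else 0 for c in range(10)] for _ in range(6)]
dt = distance_transform(mask)
current_edges = [(r, 4) for r in range(6)]
for shift in [(0, 0), (0, 1), (0, 3)]:
    print(shift, alignment_cost(dt, current_edges, shift))
```

The cost grows steadily with the misalignment even when the shifted edges no longer overlap the reference edges, which is the sense in which the Distance Transform extends the influence of the edges. A photometric cost, by comparison, gives little guidance once the images stop overlapping locally.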

Related publications:
M. Kuse and S. Shen. Robust camera motion estimation using direct edge alignment and sub-gradient method. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), pages 573–579, Stockholm, Sweden, May 2016.


Invited Talk: ICRA 2016 Tutorial

Professor Shaojie SHEN gave a tutorial lecture on sensor fusion for state estimation at the 2016 IEEE International Conference on Robotics and Automation (ICRA), held in Stockholm, Sweden. The tutorial provided an introduction to the theory and practice of aerial robots, with a mix of fundamentals and applications. It exposed participants to the state of the art in robot design, mechanics, control, estimation, perception, and planning. The lecture aimed to answer three questions: “What sensors besides cameras are required for state estimation?”, “How do we fuse information from different cameras and ensure redundancy?”, and “How do we calibrate different systems?”

Suggested readings:
• Z. Yang and S. Shen, “Monocular Visual-Inertial State Estimation with Online Initialization and Camera-IMU Extrinsic Calibration”, IEEE Transactions on Automation Science and Engineering, 2016
• S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, “Multi-Sensor Fusion for Robust Autonomous Flight in Indoor and Outdoor Environments with a Rotorcraft MAV”, IEEE International Conference on Robotics and Automation, Hong Kong, China, May 2014


Related links

ICRA 2016 in Stockholm

ICRA 2016 Tutorial: Aerial Robotics

Presentation slide: Sensor Fusion for Aerial Robots