# VDO-SLAM: A Visual Dynamic Object-aware SLAM System --- # Resources - Github: https://github.com/halajun/VDO_SLAM - [Paper](https://arxiv.org/pdf/2005.11052v2.pdf) - [Video Demo](https://drive.google.com/file/d/1PbL4KiJ3sUhxyJSQPZmRP6mgi9dIC0iu/view) - 【泡泡图灵智库】[VDO-SLAM：动态对象感知的视觉SLAM系统](https://www.ershicimi.com/p/e548fc78345dc313e614e7ff03f07c84) - [暑期研习系列：动态室外场景SLAM研究补充](https://zhuanlan.zhihu.com/p/163816831) - [VDO-SLAM ：一种动态目标感知的视觉SLAM系统](https://blog.csdn.net/electech6/article/details/108506856) --- # Demo # contributions - a novel formulation to model dynamic scenes in a unified estimation framework over robot poses, static and dynamic 3D points, and object motions. - accurate estimation for SE(3) pose change of dynamic objects that outperforms state-of-the-art algorithms, as well as a way to extract objects’ velocity in the scene, - a robust method for tracking moving objects exploiting semantic information with the ability to handle indirect occlusions resulting from the failure of semantic object segmentation, - a demonstrable system in complex and compelling real-world scenarios. # Background and Notation ## Coordinate Frames: - ${}^0X_k$: 3D pose of robot/camera - ${}^0L_k \in SE(3)$: 3D pose of the object - $k\in T$ the set of time steps - 0: global reference frame ## Points: - ${}^0m^i_k =[m^i_x,m^i_y,m^i_z,1] \in IE^3$ : the homogeneous coordinates of the i-th 3D point at time k - ${}^{X_k} m^i_k = {}^0 X^{−1}_k {}^0m^i_k$: the point in robot/camera frame - $I_k$ the reference frame associated with the image captured by the camera at time k - ${}^{I_k} p^i_k = [u^i,v^i,1] \in IE^2$ : the pixel location on frame $I_k$ corresponding to the homogeneous 3D point $${}^{I_k} p^i_k = \pi(^{X_k} m^i_k) = K^{X_k}m^i_k$$ where K is the camera intrinsics matrix. - $^{I_k}\phi^i \in IR^2$: optical flow that is the displacement vector indicating the motion of pixel $I_{k−1}p^i_{k−1}$ from image frame $I_{k−1}$ to $I_k$ $$^{I_k}\phi^i = ^{I_k}\tilde{p}^i_k −^{I_{k−1}}p^i_{k−1}$$ Here $^{I_k} \tilde {p}^i_k$ is the correspondence of $^{I_{k−1}}p^i_{k−1}$ in $I_k$. we leverage optical flow to find correspondences between consecutive frames. ## Object and 3D Point Motions: The object motion between times $k−1$ and $k$ is described by the homogeneous transformation $^{L_{k−1}}_{k−1} H_k \in SE(3)$ according to: $$^{L_{k−1}}_{k−1} H_k = ^0L^{-1}_{k-1} {}^{0}L_k$$ We write a point in its corresponding object frame as $${}^{L_k} m^i_k = ^0L^{−1}_k {}^0 m^i_k$$ substituting the object pose at time k, this becomes: $${}^0 m^i_k = ^0L^{−1}_k {}^{L_k}m^i_k = ^0L^{−1}_{k-1} {}^{L_{k−1}}_{k−1} H_k {}^{L_k} m^i_k$$ Note that for rigid body objects, $L_k m^i_k$ stays constant , and ${}^Lm^i = {}^0L_k^{−1} {}^0m^i_k = {}^0L^{−1}_{k+n} {}^0m^i_{k+n}$ for any integer $n\in Z$. Then, for rigid objects with $n =−1$: $${}^0 m^i_k = ^0L^{−1}_k {}^{L_k}m^i_k \\\ = ^0L^{−1}_{k-1} {}^{L_{k−1}}_{k−1} H_k {}^{L_k} m^i_k \\\ = ^0L^{−1}_{k-1} {}^{L_{k−1}}_{k−1} H_k {}^0L_{K-1}^{-1} {}^0m_{k-1}^i$$ So $${}^0_{k-1}H_k = ^0L^{−1}_{k-1} {}^{L_{k−1}}_{k−1} H_k {}^0L_{K-1}^{-1}$$ $${}^0 m^i_k = {}^0_{k-1}H_k {}^0m_{k-1}^i$$ It expresses the rigid object pose change in terms of the points that reside on the object in a model-free manner without the need to include the object 3D pose as a random variable in the estimation. ![vdo_slam_feagure2](https://cdn.jsdelivr.net/gh/yubaoliu/assets@image/vdo_slam_feagure2.png) # Camera Pose and Object Motion Estimation ## Camera Pose Estimation - ${}^0m^i_{k−1}$ observed 3D points at time $k-1$ in global reference frame - ${}^{I_k} \tilde p^i_k$: 2D correspondences in image $I_k$ - ${}^0X_k$: camera pose $$e_i({}^0X_k) = {}^{I_k} \tilde {p}^i_k −\pi({}^0X^{−1}_k {}^0m^i_{k−1})$$ We parameterise the SE(3) camera pose by elements of the Lie-algebra $x_k \in se(3)$: $${}^0X_k = exp({}^0x_k)$$ the solution of the least squares cost is given by: $${}^0x_k^{*\vee} = argmin \sum^{n_b}_i \rho(e_i^T({}^0x_k)\Sigma_p^{-1}e_i({}^0x_k))$$ $n_b$: ass visible 3D-2D static background point correspondences ## Object Motion Estimation Analogous to the camera pose estimation, a cost function based on re-projection error is constructed to solve for the object motion $$e_i({}^0_{k-1}H_k) := {}^{I_k} \tilde {p}^i_k − \pi({}^0X^{−1}_k {}^0_{k-1}H_k {}^0m^i_{k-1}) \\\ = {}^{I_k} \tilde {p}^i_k − \pi({}^0_{k-1}G_k {}^0m^i_{k-1})$$ the optimal solution: $${}^0_{k-1}g_k^{*\vee} = argmin \sum_i^{n_d} \rho( e_i^T({}^0_{k-1}g_k)\Sigma e_i({}^0_{k-1}g_k) )$$ $n_d$: visible 3D-2D dynamic point correspondences on an object between frames k−1 and k. ## Joint Estimation with Optical Flow In order to ensure a robust tracking of points, the technique proposed in this paper aims at refining the estimation of the optical flow jointly with the motion estimation. For camera pose estimation, the error term in is reformulated considering as: $$e_i({}^0X_k, {}^{I_{k-1}}\phi) = {}^{I_{k-1}}p_{k-1}^i + {}^{I_k}\phi^i - \pi({}^0X_k^{-1} {}^0m_{k-1}^i)$$ # Dynamic Object Tracking Ideally, the magnitude of the scene flow vector should be zero for all static 3D points. However, noise or error in depth and matching complicates the situation in real scenarios. If the magnitude of the scene flow of a certain point is greater than a predefined threshold, the point is considered dynamic. An object is then recognised dynamic if the proportion of “dynamic” points is above a certain level (30% of total number of points), otherwise static. # Graph Optimisation - 3D point meausrement factors - odometric factors - motion model factors - object smooth motion factor ![vdo_slam_factor_graph](https://cdn.jsdelivr.net/gh/yubaoliu/assets@image/vdo_slam_factor_graph.png) ![vdo_slam_cost_function](https://cdn.jsdelivr.net/gh/yubaoliu/assets@image/vdo_slam_cost_function.png) --- # Oxford Multi-motion Dataset (OMD) ## OMD Demo dataset sh demo-omd tree -L 1 . ├── depth ├── flow ├── image_0 ├── object_pose.txt ├── pose_gt.txt ├── semantic └── times.txt  ## Object pose sh 0 1 -0.778713066 0.183287918 2.070737279 0.153768414 -0.104345249 0.998078694 0 2 0.847724327 0.048528209 2.237562961 0.086078779 -0.011562935 0.245478135 0 3 -0.908852074 -0.077190375 0.641690553 -0.133886905 0.093544253 -2.995389714 0 4 0.952205539 0.058495007 0.960234314 -0.132837389 -0.078633387 3.002592906 1 1 -0.783350563 0.160257176 2.065534576 0.140000859 -0.091259926 0.998728005 1 2 0.844175202 0.045846580 2.236999366 0.087844636 -0.005751488 0.218618708 1 3 -0.891143146 -0.079981796 0.634832470 -0.067910809 0.085304838 -2.997665692 1 4 0.949186056 0.059351346 0.961655677 -0.135802240 -0.075755599 3.031812546  ## Pose sh 0 1.000000 0.000000 0.000000 0.000000 -0.000000 1.000000 -0.000000 -0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1 0.999989 0.003302 0.003453 0.000042 -0.003319 0.999979 0.005679 0.000545 -0.003433 -0.005692 0.999977 0.011220 0.000000 0.000000 0.000000 1.000000 2 0.999979 0.003718 0.005381 0.000532 -0.003778 0.999927 0.011609 0.001680 -0.005337 -0.011632 0.999916 0.021692 0.000000 0.000000 0.000000 1.000000  ## times sh 0000 5.000000e-02 1.000000e-01 1.500000e-01 2.000000e-01 2.500000e-01  --- # Source Code # Possible errors ## cs.h: No such file or directory Error message: sh [ 95%] Building CXX object CMakeFiles/g2o.dir/g2o/solvers/csparse_extension.cpp.o In file included from /root/catkin_ws/src/vdo-slam/dependencies/g2o/g2o/solvers/csparse_extension.cpp:22:0: /root/catkin_ws/src/vdo-slam/dependencies/g2o/g2o/solvers/csparse_extension.h:27:10: fatal error: cs.h: No such file or directory #include ^~~~~~  Solution sh apt install libsuitesparse-dev  ## SOLVEPNP_AP3P is not a member of 'cv' Error Message: sh /root/catkin_ws/src/vdo-slam/src/Tracking.cc:1657:63: error: 'SOLVEPNP_AP3P' is not a member of 'cv' iter_num, reprojectionError, confidence, inliers, cv::SOLVEPNP_AP3P); // AP3P EPNP P3P ITERATIVE DLS ^~~~~~~~~~~~~ /root/catkin_ws/src/vdo-slam/src/Tracking.cc:1657:63: note: suggested alternative: 'SOLVEPNP_P3P' iter_num, reprojectionError, confidence, inliers, cv::SOLVEPNP_AP3P); // AP3P EPNP P3P ITERATIVE DLS ^~~~~~~~~~~~~ SOLVEPNP_P3P  Solution: Upgrade OpenCV version to larger than 3.3.0 ## opencv2/xfeatures2d.hpp: No such file or directory Error Message: sh [ 50%] Building CXX object CMakeFiles/ObjSLAM.dir/src/Frame.cc.o /root/catkin_ws/src/vdo-slam/src/Frame.cc:9:10: fatal error: opencv2/xfeatures2d.hpp: No such file or directory #include ^~~~~~~~~~~~~~~~~~~~~~~~~  Solution: Install opencv_contrib ## error: 'CV_CPU_HAS_SUPPORT_SSE2' was not declared in this scope Error Message: sh /root/opencv/modules/core/include/opencv2/core/hal/intrin_sse.hpp:1910:13: error: 'CV_CPU_HAS_SUPPORT_SSE2' was not declared in this scope return (CV_CPU_HAS_SUPPORT_SSE2) ? true : false; ^~~~~~~~~~~~~~~~~~~~~~~ /root/catkin_ws/src/vdo-slam/example/vdo_slam.cc: In function 'int main(int, char**)':  Solution: Install opencv_contrib.