A Benchmark Dataset for 6DoF Object Pose Tracking




Abstract

Accurately tracking the six degree-of-freedom pose of an object in real scenes is an important task in computer vision and augmented reality with numerous applications. Although a variety of algorithms for this task have been proposed, it remains difficult to evaluate existing methods in the literature as oftentimes different sequences are used and no large benchmark datasets close to real-world scenarios are available. In this paper, we present a large object pose tracking benchmark dataset consisting of RGB-D video sequences of 2D and 3D targets with ground-truth information. The videos are recorded under various lighting conditions, different motion patterns and speeds with the help of a programmable robotic arm. We present extensive quantitative evaluation results of the state-of-the-art methods on this benchmark dataset and discuss the potential research directions in this field.

Paper

Paper (7.58 MB)
Supplementary Material (67.1 MB)
Poster (2.50 MB)

Citation

Po-Chen Wu, Yueh-Ying Lee, Hung-Yu Tseng, Hsuan-I Ho, Ming-Hsuan Yang, and Shao-Yi Chien, "A Benchmark Dataset for 6DoF Object Pose Tracking." In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR Adjunct), 2017.

Bibtex

@inproceedings{OPT2017,
    author    = {Wu, Po-Chen and Lee, Yueh-Ying and Tseng, Hung-Yu and Ho, Hsuan-I and Yang, Ming-Hsuan and Chien, Shao-Yi}, 
    title     = {A Benchmark Dataset for 6DoF Object Pose Tracking}, 
    booktitle = {IEEE International Symposium on Mixed and Augmented Reality (ISMAR Adjunct)},
    year      = {2017}
}

Notes

  • (2017/11/07) The runtimes of the IPPE method (0.044s → 0.001s) and the OPnP method (0.156s → 0.008s) have been corrected.

Download

Model 2D 3D
File
Dataset 1920 ✕ 1080 512 ✕ 424
Focal Length   fx 1060.197 366.736
Focal Length   fy 1060.273 366.458
Principal Point   cx 965.809* 254.026*
Principal Point   cy 561.952* 207.470*
2D Dataset
3D Dataset
Pose Viewer
*The principal points are given for 1-indexed programming languages (e.g., MATLAB). Shift them by -1 for 0-indexed programming languages (e.g., C++).
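
As a minimal sketch of how these intrinsics might be used from a 0-indexed language, the Python snippet below builds a pinhole camera matrix for each resolution and applies the -1 shift to the principal point. It assumes a zero-skew pinhole model. The dictionary keys and the function names `camera_matrix` and `project` are illustrative and not part of the dataset.

```python
# Intrinsics copied from the table above; principal points are 1-indexed there.
INTRINSICS_1_INDEXED = {
    "1920x1080": {"fx": 1060.197, "fy": 1060.273, "cx": 965.809, "cy": 561.952},
    "512x424": {"fx": 366.736, "fy": 366.458, "cx": 254.026, "cy": 207.470},
}


def camera_matrix(resolution, zero_indexed=True):
    """Return the 3x3 pinhole matrix K (nested lists) for a resolution key.

    Assumes zero skew; the keys and this helper are hypothetical names.
    """
    p = INTRINSICS_1_INDEXED[resolution]
    shift = 1.0 if zero_indexed else 0.0  # the -1 shift for 0-indexed languages
    return [
        [p["fx"], 0.0, p["cx"] - shift],
        [0.0, p["fy"], p["cy"] - shift],
        [0.0, 0.0, 1.0],
    ]


def project(K, point_cam):
    """Project a 3D point (x, y, z) in camera coordinates to a pixel (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


if __name__ == "__main__":
    K = camera_matrix("1920x1080")
    # A point on the optical axis projects onto the (shifted) principal point.
    print(project(K, (0.0, 0.0, 1.0)))
```

Passing `zero_indexed=False` keeps the values exactly as listed, which would be the appropriate choice when porting the result back to MATLAB.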

Notes

  • This dataset can also be downloaded via FTP:
    Host: 140.112.48.121
    Port: 25253
    Username: opt
    Password: dataset
  • The dataset contains lossless PNG image sequences of color, depth, and mask frames for both 2D and 3D models.
  • All images are rectified according to their distortion coefficients (radial and tangential distortions).
  • The transformation matrix between the depth camera coordinate system and the color camera coordinate system is shown below.
  • We provide pose viewer software (written in MATLAB) for checking poses. The GUI is shown below (1080p case).
  • The folder structure is shown below (1080p case).
  • The coordinate system is shown below.
  • The evaluated motion patterns are shown below.
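
The depth-to-color transformation mentioned in the notes above could be applied along the following lines. This is only a sketch under stated assumptions: `T_DEPTH_TO_COLOR` is an identity placeholder and NOT the dataset's actual matrix, the direction (depth frame to color frame) is assumed, the 512 ✕ 424 intrinsics are assumed to describe the depth images, and the depth value is taken in whatever unit the depth PNGs store. The helper names are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth value `depth` to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def apply_transform(T, point):
    """Apply a 4x4 homogeneous rigid transform (nested lists) to a 3D point."""
    x, y, z = point
    return tuple(
        T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3)
    )


# PLACEHOLDER: identity matrix, not the dataset's real depth-to-color transform.
T_DEPTH_TO_COLOR = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    # Assumed depth intrinsics: the 512 x 424 values, shifted by -1 for 0-indexing.
    fx, fy, cx, cy = 366.736, 366.458, 254.026 - 1.0, 207.470 - 1.0
    p_depth = backproject(300.0, 200.0, 1.5, fx, fy, cx, cy)
    p_color = apply_transform(T_DEPTH_TO_COLOR, p_depth)
    print(p_depth, p_color)
```

Because the images are already rectified, no distortion model is applied before back-projection in this sketch.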

Results

Images of different motion patterns with 2D targets and annotated ground truth poses. From left to right: Translation (wing), Zoom (duck), In-plane rotation (city), Out-of-plane rotation (beach), Flashing light (firework), and Moving light (maple).
Images of different motion patterns with 3D targets and wire-frame models rendered according to the annotated ground truth poses. From left to right: Translation (soda), Zoom (chest), In-plane rotation (ironman), Out-of-plane rotation (house), Flashing light (bike), and Moving light (jet).
Images of motion pattern "free motion" with 2D targets.
Images of motion pattern "free motion" with 3D targets.
Overall performance evaluation with 2D targets.
Performance by attributes with different speeds (2D targets).
Precision plots for Translation, Zoom, In-plane Rotation, and Out-of-plane Rotation sub-datasets (2D targets). The number in the plot title stands for the speed level.
Precision plots for Flashing Light, Moving Light, and Free Motion (2D targets).
Overall performance evaluation with 3D targets.
Performance by attributes with different speeds (3D targets).
Precision plots for Translation, Zoom, In-plane Rotation, and Out-of-plane Rotation sub-datasets (3D targets). The number in the plot title stands for the speed level.
Precision plots for Flashing Light, Moving Light, and Free Motion (3D targets).

Acknowledgement

The authors wish to thank Professor Shih-Chung Kang and Ci-Jyun Liang from RLab, NTUCE for providing their programmable robotic arm and for the fruitful discussions. We would also like to thank Po-Hao Hsu for sharing the photos used in this work.