OPT Dataset

Abstract

Accurately tracking the six degree-of-freedom pose of an object in real scenes is an important task in computer vision and augmented reality with numerous applications. Although a variety of algorithms for this task have been proposed, it remains difficult to evaluate existing methods in the literature as oftentimes different sequences are used and no large benchmark datasets close to real-world scenarios are available. In this paper, we present a large object pose tracking benchmark dataset consisting of RGB-D video sequences of 2D and 3D targets with ground-truth information. The videos are recorded under various lighting conditions, different motion patterns and speeds with the help of a programmable robotic arm. We present extensive quantitative evaluation results of the state-of-the-art methods on this benchmark dataset and discuss the potential research directions in this field.

Paper

Paper (7.58 MB)

Supplementary Material (67.1 MB)

Poster (2.50 MB)

Citation

Po-Chen Wu, Yueh-Ying Lee, Hung-Yu Tseng, Hsuan-I Ho, Ming-Hsuan Yang, and Shao-Yi Chien, "A Benchmark Dataset for 6DoF Object Pose Tracking." In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR Adjunct), 2017.

Bibtex

@inproceedings{OPT2017,
    author    = {Wu, Po-Chen and Lee, Yueh-Ying and Tseng, Hung-Yu and Ho, Hsuan-I and Yang, Ming-Hsuan and Chien, Shao-Yi}, 
    title     = {A Benchmark Dataset for 6DoF Object Pose Tracking}, 
    booktitle = {IEEE International Symposium on Mixed and Augmented Reality (ISMAR Adjunct)},
    year      = {2017}
}

Notes

(2017/11/07) The reported runtimes of the IPPE method (0.044 s → 0.001 s) and the OPnP method (0.156 s → 0.008 s) have been corrected.

Download

Model	2D	3D
File

Dataset	1920 ✕ 1080	512 ✕ 424
Focal Length f_x	1060.197	366.736
Focal Length f_y	1060.273	366.458
Principal Point c_x	965.809*	254.026*
Principal Point c_y	561.952*	207.470*
2D Dataset
3D Dataset
Pose Viewer

*The principal points are given for 1-indexed programming languages (e.g., MATLAB). Shift them by -1 for 0-indexed programming languages (e.g., C++).
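As an illustration of the index shift described in the footnote, the following minimal Python sketch builds a 3 ✕ 3 pinhole camera matrix from the values in the table and projects a 3D point given in camera coordinates. The function names, the dictionary keyed by image resolution, and the zero-skew pinhole model are assumptions made for this example and are not part of the dataset tools.

```python
# Intrinsics copied from the table above, in pixels.
# The principal points are the 1-indexed (MATLAB-style) values.
INTRINSICS_1_INDEXED = {
    "1920x1080": {"fx": 1060.197, "fy": 1060.273, "cx": 965.809, "cy": 561.952},
    "512x424": {"fx": 366.736, "fy": 366.458, "cx": 254.026, "cy": 207.470},
}


def camera_matrix(resolution, zero_indexed=True):
    """Return a 3x3 pinhole camera matrix as nested lists.

    Assumption: zero skew. With zero_indexed=True the principal point
    is shifted by -1, as the footnote prescribes for languages such as C++.
    """
    p = INTRINSICS_1_INDEXED[resolution]
    shift = 1.0 if zero_indexed else 0.0
    return [
        [p["fx"], 0.0, p["cx"] - shift],
        [0.0, p["fy"], p["cy"] - shift],
        [0.0, 0.0, 1.0],
    ]


def project(K, point):
    """Project a 3D point (x, y, z) in camera coordinates to pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v
```

A point on the optical axis, such as `(0.0, 0.0, 1.0)`, projects to the principal point itself, which is a quick way to check which indexing convention a given matrix uses.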

Notes

This dataset can also be downloaded from FTP:

Host	Port	Username	Password
140.112.48.121	25253	opt	dataset
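For scripted downloads, the connection details in the table can be used with Python's standard `ftplib` module. The sketch below is only an illustration: the helper names and the `remote_path` argument are hypothetical, the directory layout on the server is not described here, and the server may be unavailable or require a different transfer mode.

```python
from ftplib import FTP

# Connection details copied from the table above.
OPT_FTP = {
    "host": "140.112.48.121",
    "port": 25253,
    "user": "opt",
    "passwd": "dataset",
}


def list_remote_files(directory=".", timeout=30):
    """Return the file names in a remote directory (layout is not documented here)."""
    ftp = FTP()
    ftp.connect(OPT_FTP["host"], OPT_FTP["port"], timeout=timeout)
    try:
        ftp.login(OPT_FTP["user"], OPT_FTP["passwd"])
        return ftp.nlst(directory)
    finally:
        ftp.close()


def download_file(remote_path, local_path, timeout=30):
    """Download one file in binary mode; remote_path is a hypothetical server path."""
    ftp = FTP()
    ftp.connect(OPT_FTP["host"], OPT_FTP["port"], timeout=timeout)
    try:
        ftp.login(OPT_FTP["user"], OPT_FTP["passwd"])
        with open(local_path, "wb") as fh:
            ftp.retrbinary("RETR " + remote_path, fh.write)
    finally:
        ftp.close()
```

Calling `list_remote_files()` first shows which archive names the server actually offers before passing one of them to `download_file`.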

You can check the file name and file size by moving your mouse over the corresponding download icon.
The dataset contains lossless PNG image sequences of color, depth, and mask images for both 2D and 3D models.
All images are rectified according to their distortion coefficients (radial and tangential distortions).
The transformation matrix between the depth camera coordinate system and the color camera coordinate system is shown below.
We provide pose viewer software (written in MATLAB) for checking poses. The GUI is shown below (1080p case).
The folder structure is shown below (1080p case).
The coordinate system is shown below.
The evaluated motion patterns are shown below.
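To illustrate how the intrinsics and the depth-to-color transformation fit together, the following Python sketch maps a depth pixel to the color image: back-project the pixel with the depth intrinsics, apply the rigid transformation, and project with the color intrinsics. Several things here are assumptions for the example: the 512 ✕ 424 column is taken to be the depth camera and the 1920 ✕ 1080 column the color camera, the principal points are already shifted to 0-indexed values, the depth value uses the same length unit as the translation part of the transformation, and `T_DEPTH_TO_COLOR_PLACEHOLDER` is an identity stand-in that must be replaced with the actual matrix provided with the dataset.

```python
# (fx, fy, cx, cy) with principal points shifted by -1 for 0-indexed use.
# Assumption: 512x424 is the depth camera, 1920x1080 the color camera.
DEPTH_INTRINSICS = (366.736, 366.458, 254.026 - 1.0, 207.470 - 1.0)
COLOR_INTRINSICS = (1060.197, 1060.273, 965.809 - 1.0, 561.952 - 1.0)

# Placeholder only: replace with the 4x4 depth-to-color matrix from the dataset.
T_DEPTH_TO_COLOR_PLACEHOLDER = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth along the optical axis to a 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(T, point):
    """Apply a 4x4 rigid transformation (nested lists) to a 3D point."""
    x, y, z = point
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


def project(point, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def depth_pixel_to_color_pixel(u, v, depth, T=T_DEPTH_TO_COLOR_PLACEHOLDER):
    """Map a depth-image pixel to the corresponding color-image pixel."""
    p_depth = backproject(u, v, depth, *DEPTH_INTRINSICS)
    p_color = transform(T, p_depth)
    return project(p_color, *COLOR_INTRINSICS)
```

Because all images are already rectified, no distortion model is applied in this sketch.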

Results

Images of different motion patterns with 2D targets and annotated ground truth poses. From left to right: Translation (wing), Zoom (duck), In-plane rotation (city), Out-of-plane rotation (beach), Flashing light (firework), and Moving light (maple).

Images of different motion patterns with 3D targets and wire-frame models rendered according to the annotated ground truth poses. From left to right: Translation (soda), Zoom (chest), In-plane rotation (ironman), Out-of-plane rotation (house), Flashing light (bike), and Moving light (jet).

Images of motion pattern "free motion" with 2D targets.

Images of motion pattern "free motion" with 3D targets.

Overall performance evaluation with 2D targets.

Performance by attributes with different speeds (2D targets).

Precision plots for the Translation, Zoom, In-plane Rotation, and Out-of-plane Rotation sub-datasets (2D targets). The number in each plot title is the speed level.

Precision plots for Flashing Light, Moving Light, and Free Motion (2D targets).

Overall performance evaluation with 3D targets.

Performance by attributes with different speeds (3D targets).

Precision plots for the Translation, Zoom, In-plane Rotation, and Out-of-plane Rotation sub-datasets (3D targets). The number in each plot title is the speed level.

Precision plots for Flashing Light, Moving Light, and Free Motion (3D targets).

Acknowledgement

The authors wish to thank Professor Shih-Chung Kang and Ci-Jyun Liang from RLab, NTUCE for providing their programmable robotic arm and for fruitful discussions. We would also like to thank Po-Hao Hsu for sharing the photos used in this work.