Google’s AI division has announced the release of the Objectron dataset, a corpus of short video clips that capture common objects from various angles — each accompanied by augmented-reality session data, including sparse point clouds and manually annotated 3D bounding boxes.
“Understanding objects in 3D remains a challenging task due to the lack of large real-world datasets compared to 2D tasks (e.g., ImageNet, COCO, and Open Images),” explain Google Research software engineers Adel Ahmadyan and Liangkai Zhang. “To empower the research community for continued advancement in 3D object understanding, there is a strong need for the release of object-centric video datasets, which capture more of the 3D structure of an object, while matching the data format used for many vision tasks (i.e., video or camera streams), to aid in the training and benchmarking of machine learning models.”