Abstract:
This paper presents a novel method to estimate 3D keypoints from single-view RGB images.
Our network is trained in two steps using a knowledge distillation framework. In the first step, the teacher is trained to extract 3D features from point cloud data, which are used in combination with 2D features to estimate the 3D keypoints.
In the second step, the teacher guides the student module to hallucinate, from RGB images alone, 3D features similar to those extracted from the point clouds.
This procedure enables the network to extract both 2D and 3D features directly from images at inference time, without requiring point clouds as input.
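As a rough illustration of the second training step, the sketch below shows feature mimicry in PyTorch: a frozen teacher produces 3D features from point clouds, and the student is trained to reproduce them from the RGB image alone. The tiny MLP encoders, feature dimensions, and the L2 mimicry loss are illustrative assumptions, not the paper's published architecture.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in encoders; shapes and sizes are illustrative only.
teacher = nn.Sequential(nn.Linear(1024 * 3, 256), nn.ReLU(), nn.Linear(256, 128))
student = nn.Sequential(nn.Linear(3 * 64 * 64, 256), nn.ReLU(), nn.Linear(256, 128))

# Step 2: the teacher (trained in step 1 on point clouds) is frozen; only
# the student, which sees the RGB image, is updated.
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

mimic_loss = nn.MSELoss()  # assumed L2 feature-mimicry loss
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

rgb = torch.randn(8, 3 * 64 * 64)       # dummy flattened RGB batch
point_cloud = torch.randn(8, 1024 * 3)  # dummy flattened point clouds

with torch.no_grad():
    target_feat = teacher(point_cloud)  # 3D features from real geometry
hallucinated = student(rgb)             # 3D features hallucinated from RGB
loss = mimic_loss(hallucinated, target_feat)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

At inference only the student branch is needed, which is why point clouds can be dropped as input.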
Moreover, the network also predicts a confidence score for every keypoint, which is used to select the valid ones from a set of N predicted keypoints.
This allows the prediction of a varying number of keypoints depending on the object’s geometry.
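A minimal sketch of such confidence-based selection follows; the hard 0.5 threshold is an assumption for illustration, not necessarily the paper's selection criterion.

```python
import torch

def select_keypoints(keypoints, confidences, threshold=0.5):
    """Keep keypoints whose confidence exceeds the threshold.

    keypoints: (N, 3) predicted 3D keypoints; confidences: (N,) in [0, 1].
    Returns a variable-length subset, so different objects can yield
    different numbers of valid keypoints.
    """
    mask = confidences > threshold
    return keypoints[mask]

kps = torch.randn(10, 3)             # N = 10 predicted keypoints
conf = torch.rand(10)                # per-keypoint confidence scores
valid = select_keypoints(kps, conf)  # only the confident ones survive
```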
We use the estimated keypoints for computing the relative pose between two views of an object.
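One standard way to recover the relative rotation from matched 3D keypoints is the Kabsch/Procrustes solver, evaluated with the geodesic angular distance used in the results below; whether the paper uses exactly this solver is an assumption, and keypoint correspondences are assumed given.

```python
import torch

def kabsch(p, q):
    """Rotation R with R @ p_i ~ q_i, from two (N, 3) matched point sets."""
    p = p - p.mean(dim=0)                      # center both sets
    q = q - q.mean(dim=0)
    u, _, vt = torch.linalg.svd(p.T @ q)
    d = torch.sign(torch.linalg.det(vt.T @ u.T)).item()
    s = torch.diag(torch.tensor([1.0, 1.0, d]))  # guard against reflections
    return vt.T @ s @ u.T

def angular_distance_deg(r_est, r_gt):
    """Geodesic distance between two rotations, in degrees."""
    cos = (torch.trace(r_est.T @ r_gt) - 1.0) / 2.0
    return torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0)))

# Usage: a random rotation relating two noise-free views of 6 keypoints.
r_true, _ = torch.linalg.qr(torch.randn(3, 3))
if torch.linalg.det(r_true) < 0:
    r_true = -r_true                 # ensure a proper rotation (det = +1)
kps_a = torch.randn(6, 3)            # keypoints in view A
kps_b = kps_a @ r_true.T             # same keypoints in view B
err = angular_distance_deg(kabsch(kps_a, kps_b), r_true)  # ~0 here
```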
The results are compared with those of KP-Net [1] and StarMap [2], which are the state of the art for estimating 3D keypoints from a single-view RGB image.
The average angular distance error of our approach (5.94°) is 8.46° and 55.26° lower than that of KP-Net (14.40°) and StarMap (61.20°), respectively.
Published:
05 February 2023
RAISE Affiliate:
Spoke 4
Name of the Journal:
Pattern Recognition
Publication type:
Contribution in journal
DOI: