The curation of large-scale datasets is still costly
and requires much time and resources. Data is often manually
labeled, and the challenge of creating high-quality datasets
remains. In this work, we fill the research gap using active
learning for multi-modal 3D object detection. We propose
ActiveAnno3D, an active learning framework to select data
samples for labeling that are of maximum informativeness for
training. We explore various continuous training methods and
integrate the most efficient method regarding computational
demand and detection performance. Furthermore, we perform
extensive experiments and ablation studies with BEVFusion
and PV-RCNN on the nuScenes and TUM Traffic Intersection
(TUMTraf-I) dataset. We show that we can achieve almost the
same performance with PV-RCNN and the entropy-based query
strategy when using only half of the training data (77.25 mAP
compared to 83.50 mAP) of the TUMTraf-I dataset. BEVFusion
achieved an mAP of 64.31 when using half of the training data
and 52.88 mAP when using the complete nuScenes dataset.
We integrate our active learning framework into the proAnno
labeling tool to enable AI-assisted data selection and labeling
and minimize the labeling costs. Finally, we provide code,
weights, and visualization results on our website: https://active3d-framework.github.io/active3d-framework.