Object detection, particularly open-vocabulary object detection, plays a crucial role in Earth-science applications such as environmental monitoring, natural disaster assessment, and land-use planning. However, existing open-vocabulary detectors, primarily trained on natural-world images, struggle to generalize to remote sensing images due to a significant domain gap. This paper therefore aims to advance open-vocabulary object detection for the remote sensing community.
LAE-1M Dataset powered by LAE-Label Engine
LAE-DINO Open-Vocabulary Detector
In addition to the visual examples shown in the benchmark figure, we provide more information here. All the target datasets can be found in our GitHub repo.
LAE-COD dataset examples: raw data labelled by the LAE-Label engine, without rule-based filtering.
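To illustrate what rule-based filtering of such raw pseudo-labels might look like, here is a minimal sketch. The function name, the detection dict layout, and all thresholds are illustrative assumptions, not the LAE-Label engine's actual rules or settings.

```python
def filter_detections(dets, min_score=0.3, min_area=16, img_w=1024, img_h=1024):
    """Keep only detections that pass simple sanity rules.

    dets: list of dicts with 'bbox' = (x1, y1, x2, y2) and 'score'.
    All thresholds are illustrative, not the engine's actual settings.
    """
    kept = []
    for d in dets:
        x1, y1, x2, y2 = d["bbox"]
        w, h = x2 - x1, y2 - y1
        if d["score"] < min_score:
            continue  # drop low-confidence pseudo-labels
        if w <= 0 or h <= 0 or w * h < min_area:
            continue  # drop degenerate or tiny boxes
        if x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
            continue  # drop boxes extending outside the image
        kept.append(d)
    return kept
```

LAE-COD keeps the raw, unfiltered outputs, so examples like the tiny or low-confidence boxes above remain in the dataset.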
We propose a novel LAE-DINO detector for LAE, featuring two novel modules: Dynamic Vocabulary Construction (DVC) and Visual-Guided Text Prompt Learning (VisGT). The overall framework of our LAE-DINO:
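The idea behind dynamic vocabulary construction can be sketched as follows: instead of scoring against the full label set, each training batch sees its ground-truth category names plus a sample of negatives. This is a minimal illustration under assumed names and defaults (the function, `vocab_size`, and sampling scheme are not the paper's exact implementation).

```python
import random

def build_dynamic_vocabulary(positive_labels, label_pool, vocab_size=60, seed=None):
    """Assemble a per-batch vocabulary: the batch's ground-truth (positive)
    category names plus negatives sampled from a larger label pool.

    Names and the vocab_size default are illustrative assumptions.
    """
    rng = random.Random(seed)
    positives = list(dict.fromkeys(positive_labels))  # dedupe, keep order
    negatives = [c for c in label_pool if c not in set(positives)]
    n_neg = max(0, vocab_size - len(positives))
    sampled = rng.sample(negatives, min(n_neg, len(negatives)))
    vocab = positives + sampled
    rng.shuffle(vocab)  # avoid positional bias toward positives
    return vocab
```

Keeping the per-batch vocabulary small makes text-feature computation tractable while still exposing the detector to hard negative category names.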
@inproceedings{pan2025locate,
title={Locate anything on earth: Advancing open-vocabulary object detection for remote sensing community},
author={Pan, Jiancheng and Liu, Yanxing and Fu, Yuqian and Ma, Muyuan and Li, Jiahao and Paudel, Danda Pani and Van Gool, Luc and Huang, Xiaomeng},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={6},
pages={6281--6289},
year={2025}
}