EarthSynth: Generating Informative Earth Observation with Diffusion Models
Remote sensing image (RSI) interpretation is fundamentally constrained by severe class imbalance and the limited availability of high-quality labeled data, which significantly impedes the development of robust models for downstream tasks. Moreover, labeling RSIs typically requires domain-specific expertise and substantial manual effort, making large-scale annotation time-consuming and costly. An important research objective is therefore to exploit existing labeled Earth observation (EO) datasets effectively by uncovering latent relationships among samples to improve data efficiency.
- EarthSynth and EarthSynth-180K
- 🛰️ Counterfactual Composition (CF-Comp) and R-Filter
EarthSynth-180K is derived from the OEM, LoveDA, DeepGlobe, SAMRS, and LAE-1M datasets and is further enhanced with mask and text-prompt conditions, making it suitable for training a foundation diffusion-based generative model. The dataset is constructed using Random Cropping and Category Augmentation strategies.
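A minimal sketch of what random cropping with category-aware text prompting could look like for aligned image/mask pairs (the helper names and prompt template here are illustrative assumptions, not the actual EarthSynth-180K implementation):

```python
import numpy as np

def random_crop(image, mask, size, rng):
    """Crop an aligned (image, mask) pair at a random location."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return (image[top:top + size, left:left + size],
            mask[top:top + size, left:left + size])

def category_text_prompt(mask, class_names):
    """Build a text prompt from the category labels present in a mask."""
    present = sorted(int(c) for c in np.unique(mask) if int(c) in class_names)
    return "a remote sensing image of " + ", ".join(class_names[c] for c in present)

rng = np.random.default_rng(0)
image = rng.integers(0, 255, (512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:300, 100:300] = 1  # hypothetical class id 1, e.g. "building"

crop_img, crop_mask = random_crop(image, mask, 256, rng)
prompt = category_text_prompt(crop_mask, {0: "background", 1: "building"})
```

Pairing each crop with a prompt derived from its own mask is one simple way to produce the mask-plus-text conditions described above.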
EarthSynth is trained with the CF-Comp strategy on a mixed distribution of real and logically "unrealistic" (counterfactually composed) data; it learns pixel-level remote sensing properties along multiple dimensions and unifies conditional diffusion training and synthesis in a single pipeline.
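One way to picture counterfactual composition is transplanting labeled object pixels from one scene into another, producing samples that may be semantically implausible yet keep a pixel-accurate mask. The following is a hedged sketch under that assumption; function and class names are hypothetical and do not reproduce the paper's exact CF-Comp procedure:

```python
import numpy as np

def counterfactual_compose(scene_img, scene_mask, obj_img, obj_mask, top, left):
    """Paste the labeled pixels of one sample into another scene.

    The composite can be logically unrealistic (e.g. a ship on farmland),
    but image and mask stay aligned, which is what a mixed
    real/counterfactual training distribution relies on.
    """
    out_img, out_mask = scene_img.copy(), scene_mask.copy()
    h, w = obj_mask.shape
    region = obj_mask > 0  # transplant only the labeled object pixels
    out_img[top:top + h, left:left + w][region] = obj_img[region]
    out_mask[top:top + h, left:left + w][region] = obj_mask[region]
    return out_img, out_mask

scene_img = np.zeros((64, 64, 3), dtype=np.uint8)   # empty background scene
scene_mask = np.zeros((64, 64), dtype=np.uint8)
obj_img = np.full((16, 16, 3), 200, dtype=np.uint8)  # object patch
obj_mask = np.full((16, 16), 3, dtype=np.uint8)      # hypothetical class id 3

img, mask = counterfactual_compose(scene_img, scene_mask, obj_img, obj_mask, 10, 10)
```

A filtering step such as R-Filter could then score these composites and keep only the informative ones for training.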
This project references and builds upon several open-source projects, including Diffusers, ControlNet, MM-Grounding-DINO, CLIP, and GSNet, and uses data from the OpenEarthMap, LoveDA, DeepGlobe, SAMRS, and LAE-1M datasets. We sincerely thank the authors and maintainers of these resources for supporting this work.
```bibtex
@article{pan2025earthsynth,
  title={EarthSynth: Generating Informative Earth Observation with Diffusion Models},
  author={Pan, Jiancheng and Lei, Shiye and Fu, Yuqian and Li, Jiahao and Liu, Yanxing and Sun, Yuze and He, Xiao and Peng, Long and Huang, Xiaomeng and Zhao, Bo},
  journal={arXiv preprint arXiv:2505.12108},
  year={2025}
}
```