PASC 2026
Heidelberg Institute for Theoretical Studies (HITS)
The amount of astronomical data is growing exponentially
Machine learning methods are needed to explore this volume of data and extract knowledge from it
Our goal: to develop modular open-source tools for self-supervised knowledge discovery and interactive visualization of large-scale cosmological data
PEST preprocesses universal cosmological simulation data into multi-channel images, data cubes, and point clouds
ETL (Extract \(\rightarrow\) Transform \(\rightarrow\) Load) pipeline driven by YAML
Apache Parquet efficiently stores multi-modal data in a columnar format
Upload to Hugging Face or Zenodo for easy sharing and integration with ML frameworks
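
The Extract, Transform, Load flow described above can be sketched in a few lines. This is a minimal standard-library illustration under stated assumptions, not PEST's actual API: the `config` dict stands in for a parsed YAML file, the particle data is randomly generated, and the names `extract`, `transform`, `load`, `run_pipeline`, and all configuration keys are hypothetical. The dict of lists returned by `load` only mimics the column-by-column layout of a format such as Parquet.

```python
import random

# Hypothetical configuration; in a YAML-driven pipeline this dict would be parsed from a file.
config = {
    "extract": {"n_objects": 3, "particles_per_object": 200, "seed": 0},
    "transform": {"image_size": 8},
}


def extract(cfg):
    """Yield (object_id, particles); each particle is (x, y, mass) with x, y in [0, 1)."""
    rng = random.Random(cfg["seed"])
    for object_id in range(cfg["n_objects"]):
        particles = [
            (rng.random(), rng.random(), rng.uniform(0.5, 1.5))
            for _ in range(cfg["particles_per_object"])
        ]
        yield object_id, particles


def transform(particles, cfg):
    """Bin particle masses onto a square grid, giving a single-channel image."""
    size = cfg["image_size"]
    image = [[0.0] * size for _ in range(size)]
    for x, y, mass in particles:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] += mass
    return image


def load(records):
    """Store records column by column (one list per field), as a columnar format would."""
    table = {"object_id": [], "image": [], "total_mass": []}
    for object_id, image in records:
        table["object_id"].append(object_id)
        table["image"].append(image)
        table["total_mass"].append(sum(map(sum, image)))
    return table


def run_pipeline(cfg):
    """Chain the three stages; records are streamed, so only one object is held at a time."""
    records = (
        (object_id, transform(particles, cfg["transform"]))
        for object_id, particles in extract(cfg["extract"])
    )
    return load(records)


table = run_pipeline(config)
```

Keeping each stage a plain function of its configuration section means a new output type (for example data cubes or point clouds) would only need a different `transform`, which is one plausible reading of a config-driven design.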
Polsterer et al. (2024); Doser et al. (2026)

Cao and Aziz (2020)
Optimal latent space dimensionality selected by reconstruction quality.
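
One way to make selection by reconstruction quality concrete is sketched below. The selection rule (smallest dimensionality whose mean squared error stays within a tolerance) is an assumption, since the poster does not state the exact criterion. The `truncate` function is a toy stand-in for a trained autoencoder, and all names and values are hypothetical.

```python
def mean_squared_error(original, reconstruction):
    """Mean squared pixel difference between two equally sized flat images."""
    if len(original) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)


def truncate(image, latent_dim):
    """Toy 'model': keep the first latent_dim values and zero the rest."""
    return image[:latent_dim] + [0.0] * (len(image) - latent_dim)


def select_latent_dim(errors_by_dim, tolerance):
    """Return the smallest latent dimensionality whose error is within tolerance.

    Falls back to the dimensionality with the lowest error if none qualifies.
    """
    acceptable = [dim for dim, err in errors_by_dim.items() if err <= tolerance]
    if acceptable:
        return min(acceptable)
    return min(errors_by_dim, key=errors_by_dim.get)


# Toy input, not survey or simulation data.
image = [4.0, 2.0, 1.0, 0.5]
errors_by_dim = {
    dim: mean_squared_error(image, truncate(image, dim)) for dim in (1, 2, 3, 4)
}
chosen = select_latent_dim(errors_by_dim, tolerance=0.1)
```

In practice a perceptual or structural metric might replace the mean squared error, because pixel-wise errors tend to reward blurry reconstructions.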
Original IllustrisTNG SKIRT SDSS images

ResNet-18 autoencoder, 512 features: Sufficient reconstruction

ResNet-18 VAE-S\(^{128}\): Details are present, but blurry

ResNet-18 VAE-S\(^{2}\): Details are lost, but the overall structure is preserved
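
Assuming that S\(^{2}\) denotes a latent space constrained to the unit 2-sphere (the poster does not spell this out), each latent vector corresponds to a direction that can be read as a longitude and latitude. The sketch below shows only that geometric step. It is not the authors' implementation, and the function names are hypothetical.

```python
import math


def to_unit_sphere(vector):
    """Scale a non-zero vector to unit length so it lies on the sphere."""
    norm = math.sqrt(sum(component ** 2 for component in vector))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    return [component / norm for component in vector]


def sphere_to_lon_lat(point):
    """Convert a unit 3-vector to (longitude, latitude) in degrees."""
    x, y, z = point
    lon = math.degrees(math.atan2(y, x)) % 360.0
    # Clamp z to guard against rounding slightly outside [-1, 1].
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return lon, lat


# A hypothetical 3-component encoder output, normalized onto the sphere.
latent = to_unit_sphere([0.0, 3.0, 0.0])
lon, lat = sphere_to_lon_lat(latent)
```

With only two angular degrees of freedom, such a latent space can be displayed directly as an all-sky map, which is consistent with the loss of fine detail noted above.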



Fernique et al. (2015)
Projected and generated views: valhalla/emoji-dataset (2,749 emojis)
Projected and projected (zoomed) views: valhalla/emoji-dataset (2,749 emojis)
Projected and generated views: tonyassi/celebrity-1000 (18,184 images of 1,000 celebrities)
Largest and most uniform all-sky spectrophotometric survey (over 220 million sources)
Doser et al. (2026)



Funded by the European Union. This work has received funding from the European High Performance Computing Joint Undertaking (JU) and Belgium, Czech Republic, France, Germany, Greece, Italy, Norway, and Spain under grant agreement No 101093441.
Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European High Performance Computing Joint Undertaking (JU) and Belgium, Czech Republic, France, Germany, Greece, Italy, Norway, and Spain. Neither the European Union nor the granting authority can be held responsible for them.


From Visualization to Knowledge Discovery (B. Doser & S. Trujillo-Gomez)