Abstract
Tokens are discrete representations that allow modern deep learning to scale by transforming high-dimensional data into sequences that can be learned efficiently. Since existing tokenizers are designed for natural-image perception, we investigate whether they are optimal for scientific images, which exhibit a large dynamic range and require token embeddings to retain physical properties. We propose Phaedra, a tokenizer inspired by classical shape-gain quantization and proper orthogonal decomposition. We demonstrate that Phaedra consistently improves reconstruction across a range of PDE datasets and shows strong out-of-distribution generalization to unseen PDEs and real-world Earth observation data.
Phaedra captures the fine details and precise magnitudes critical for scientific simulation by splitting embeddings into shape and gain components, enabling high-fidelity tokenization that retains physical properties even in out-of-distribution scenarios.
Methodology
Phaedra leverages a disentangled hierarchical representation inspired by shape-gain quantization. The tokenizer factorizes latent embeddings into two distinct components: morphology ($z_\mu$), which captures spatial structures and topological features via Finite Scalar Quantization (FSQ), and amplitude ($z_\alpha$), which preserves the dynamic range and physical magnitudes. By incorporating an approximately continuous channel for amplitude, Phaedra avoids the precision loss common in standard discrete codebooks. This architecture effectively mirrors Proper Orthogonal Decomposition (POD) by separating spatial modes from their scalar coefficients, ensuring that physical properties such as conservation laws and sharp gradients are maintained even at high compression ratios.
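The POD analogy can be made concrete with a small sketch: an SVD of a snapshot matrix separates spatial modes (the "shape") from their scalar energies (the "gain"). This is an illustrative analogy only, not the paper's implementation; the array sizes and rank are arbitrary choices.

```python
import numpy as np

# Illustrative analogy: Proper Orthogonal Decomposition (POD) of a stack of
# flattened 2D field snapshots via SVD. The rows of vt are spatial modes
# ("shape"); the singular values s carry their magnitudes ("gain").
rng = np.random.default_rng(0)
snapshots = rng.standard_normal((16, 32 * 32))  # 16 flattened 32x32 fields

u, s, vt = np.linalg.svd(snapshots, full_matrices=False)

# Low-rank reconstruction: modes provide structure, singular values scale it.
k = 4
recon = (u[:, :k] * s[:k]) @ vt[:k]
print(recon.shape)  # (16, 1024)
```

The key observation is that structure and magnitude live in separate factors, which is the same separation Phaedra enforces between $z_\mu$ and $z_\alpha$.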
The Phaedra pipeline uses a single encoder to compute two sets of embeddings: the shape embeddings are quantized with sparse, high-dimensional FSQ to capture spatial structures, while the gain embeddings pass through a dense, one-dimensional quantizer that approximates a continuous channel, preserving physical magnitudes. After quantization, a learned recombination operator merges the two embeddings and passes the result to the decoder for the final reconstruction.
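A minimal sketch of this quantization scheme, under stated assumptions: the function names, level counts, and the recombination-by-product below are illustrative choices, not Phaedra's actual configuration (the paper uses a learned recombination operator).

```python
import numpy as np

def quantize_shape(z, levels=5):
    """FSQ-style quantization: bound each channel, then round to a coarse grid."""
    bounded = np.tanh(z)                    # squash each channel into (-1, 1)
    half = (levels - 1) / 2
    return np.round(bounded * half) / half  # snap to `levels` values per channel

def quantize_gain(g, step=1 / 256):
    """Dense 1-D quantizer: a fine uniform grid approximates a continuous channel."""
    return np.round(g / step) * step

z = np.random.default_rng(1).standard_normal(8)
gain = np.linalg.norm(z)   # scalar amplitude component
shape = z / gain           # unit-norm morphology component

# Recombination sketch: a simple product stands in for the learned operator.
z_hat = quantize_gain(gain) * quantize_shape(shape)
print(z_hat.shape)  # (8,)
```

The asymmetry is the point: the shape channel is coarsely discretized per dimension, while the fine gain grid keeps the quantization error on magnitudes small.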
Interactive Latent Space Exploration
Use the sliders below to independently interpolate the morphology ($z_\mu$) and amplitude ($z_\alpha$) tokens between two timesteps of the Kelvin-Helmholtz (KH) instability dataset. This example illustrates the disentanglement of features in the amplitude-morphology latent representation. Interpolating the amplitude tokens produces larger changes to the overall structure of the flow, whereas interpolating the morphology tokens produces finer changes to the turbulent structures while preserving the overall dynamic range. By combining the morphology of one sample with the amplitude of another, we can generate reconstructions that retain the fine-scale turbulent structure of one sample while strictly obeying the physical dynamic range of the other. Note that this example is purely illustrative and does not represent a true interpolation in the latent space, as the model was not trained with an explicit disentanglement loss.
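The slider operations above can be sketched as token-space arithmetic. The token shapes below and the element-wise product used for recombination are assumptions for illustration; the actual demo operates on Phaedra's quantized tokens.

```python
import numpy as np

# Hypothetical token grids: 4x4 spatial positions, 8-dim morphology tokens
# and scalar amplitude tokens per position.
rng = np.random.default_rng(2)
morph_a = rng.standard_normal((4, 4, 8))         # morphology tokens, sample A
morph_b = rng.standard_normal((4, 4, 8))         # morphology tokens, sample B
amp_b = np.abs(rng.standard_normal((4, 4, 1)))   # amplitude tokens, sample B

def lerp(x, y, t):
    """Linear interpolation, i.e. a slider position t in [0, 1]."""
    return (1 - t) * x + t * y

morph_mid = lerp(morph_a, morph_b, 0.3)  # morphology slider at t = 0.3

# Mix-and-match: structure of A, physical magnitudes of B.
mixed = morph_a * amp_b
print(mixed.shape)  # (4, 4, 8)
```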
Active Reconstruction
PDE Reconstruction Benchmarks
All models are trained on the same data: Compressible Euler time series defined by 4 classes of initial conditions and Incompressible Navier-Stokes defined by 2 classes of initial conditions. While we present some competitive tokenizers below, a comprehensive comparison against additional baselines is detailed in the main paper. We evaluate Phaedra across three levels of difficulty:
- ID (In-Distribution): Test samples from the same PDE family and parameters as training.
- OD1 (Out-of-Distribution): Shifts in the PDE coefficients and initial conditions.
- OD2 (Out-of-Distribution): Shifts in the PDEs that define the problem.
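The tables below report normalized errors. One common way to define nMAE and nRMSE is to divide by the mean magnitude of the reference field; this is an assumption for illustration, and the paper's exact normalization (as well as the Δσ²loc and γmin metrics) may differ.

```python
import numpy as np

def nmae(pred, true):
    """Normalized mean absolute error (one common convention, assumed here)."""
    return np.mean(np.abs(pred - true)) / np.mean(np.abs(true))

def nrmse(pred, true):
    """Normalized root-mean-square error (same normalization convention)."""
    return np.sqrt(np.mean((pred - true) ** 2)) / np.sqrt(np.mean(true ** 2))

true = np.linspace(1.0, 2.0, 100)  # toy reference field
pred = true + 0.01                 # toy reconstruction with a constant bias
print(round(nmae(pred, true), 4))  # 0.0067  (= 0.01 / mean(|true|) = 0.01 / 1.5)
```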
| Model | Dataset | nMAE ↓ | nRMSE ↓ | Δσ²loc ↓ | γmin ↑ |
|---|---|---|---|---|---|
| VQ-VAE-2 | ID | 3.024 | 5.069 | 15.02 | 79.1% |
| VQ-VAE-2 | OD1 | 2.113 | 5.854 | 21.55 | 87.5% |
| VQ-VAE-2 | OD2 | 4.449 | 5.833 | 17.06 | 68.9% |
| FSQ | ID | 2.603 | 4.292 | 11.29 | 85.3% |
| FSQ | OD1 | 1.876 | 4.314 | 20.55 | 93.8% |
| FSQ | OD2 | 3.831 | 4.997 | 11.65 | 68.0% |
| Phaedra (Ours) | ID | 1.522 | 2.489 | 5.96 | 93.6% |
| Phaedra (Ours) | OD1 | 1.217 | 2.442 | 6.47 | 98.0% |
| Phaedra (Ours) | OD2 | 2.500 | 3.363 | 5.82 | 79.9% |
Sentinel-2 L1C Earth Observation Data
| Group | Model | rL₁ ↓ | rL₂ ↓ | Δσ²loc ↓ | γmin ↑ |
|---|---|---|---|---|---|
| Baseline | Continuous | 7.426 | 8.100 | 62.61 | 90.02% |
| Compression: 4$\times$4 | Phaedra₄ (Ours) | 8.895 | 9.749 | 128.5 | 93.17% |
| Compression: 4$\times$4 | FSQ₄ | 11.053 | 12.405 | 215.8 | 77.62% |
| Compression: 8$\times$8 | Phaedra₈ (Ours) | 9.900 | 11.475 | 163.9 | 62.72% |
| Compression: 8$\times$8 | Cosmos₈ | 16.717 | 19.245 | 19,566 | 79.70% |
Metrics calculated on native 13-band resolutions (10m, 20m, 60m).
Conclusion & Discussion
Our findings demonstrate that Phaedra successfully closes the "fidelity gap" that typically limits discrete tokenization in scientific applications. By balancing structural discretization with magnitude preservation, the model achieves superior reconstruction across diverse PDE families and exhibits remarkable zero-shot transfer to complex Earth observation data. This suggests that the shape-gain prior serves as a fundamental inductive bias for physical systems, enabling models to generalize across scales and disciplines. As we move toward larger scientific foundation models, Phaedra provides a robust framework for transforming continuous physical fields into discrete sequences without sacrificing the precision required for rigorous scientific analysis.
BibTeX
@misc{lingsch2026phaedralearninghighfidelitydiscrete,
title={Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Science},
author={Levi Lingsch and Georgios Kissas and Johannes Jakubik and Siddhartha Mishra},
year={2026},
eprint={2602.03915},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.03915},
}