Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Sciences

1 ETH AI Center, 2 IBM Research Europe, 3 SAM, ETH Zurich, 4 SDSC

Disentangled Latent Control: Phaedra factorizes fields into independent $z_\mu$ (Morphology) and $z_\alpha$ (Amplitude) tokens. By combining, for example, the local vortices from Sample B ($z_{\mu,B}$) with the global magnitude profile of Sample A ($z_{\alpha,A}$), the reconstruction maintains the fine-scale turbulent structure of one while strictly obeying the physical dynamic range of the other.

Abstract

Tokens are discrete representations that allow modern deep learning to scale by transforming high-dimensional data into sequences that can be efficiently learned. Since existing tokenizers are designed for realistic visual perception, we investigate whether they are optimal for scientific images, which exhibit a large dynamic range and require token embeddings that retain physical properties. We propose Phaedra, inspired by classical shape-gain quantization and proper orthogonal decomposition. We demonstrate that Phaedra consistently improves reconstruction across a range of PDE datasets and shows strong out-of-distribution generalization to unknown PDEs and real-world Earth observation data.

Phaedra Comparisons

Phaedra consistently improves reconstruction across PDE datasets, capturing fine details and precise magnitudes critical for scientific simulation. This is achieved by splitting the embeddings into shape and gain components, allowing for high-fidelity tokenization that retains physical properties, even in out-of-distribution scenarios.

Data distributions

Data distributions after normalization. Natural images have a fixed range and uniform distribution, even after normalization. Physical datasets, however, have a much larger range of values with outliers far outside the nominal range.
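This difference is easy to reproduce. The following NumPy sketch is purely illustrative (the `standard_t` draw is a stand-in for a heavy-tailed physical field, not data from the paper): after z-score normalization, bounded pixel intensities stay within a narrow band, while a heavy-tailed field keeps outliers far outside the nominal range.

```python
import numpy as np

rng = np.random.default_rng(0)

def zscore(x):
    """Standard z-score normalization."""
    return (x - x.mean()) / x.std()

# bounded, roughly uniform pixel intensities vs. a heavy-tailed field
natural = rng.uniform(0.0, 255.0, size=100_000)
physical = rng.standard_t(df=2, size=100_000)  # stand-in for a PDE field

# after normalization the natural image stays within roughly +/- sqrt(3),
# while the physical field retains extreme outliers
print(np.abs(zscore(natural)).max())   # close to sqrt(3) ~ 1.73
print(np.abs(zscore(physical)).max())  # typically far larger
```

A uniform distribution on a fixed range maps into a fixed normalized band, so a codebook can cover it densely; the heavy-tailed field cannot be covered this way without losing either the bulk or the outliers.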

Methodology

Phaedra leverages a disentangled hierarchical representation inspired by shape-gain quantization. The tokenizer factorizes latent embeddings into two distinct components: morphology ($z_\mu$), which captures spatial structures and topological features via Finite Scalar Quantization (FSQ), and amplitude ($z_\alpha$), which preserves the dynamic range and physical magnitudes. By incorporating an approximately continuous channel for amplitude, Phaedra avoids the precision loss common in standard discrete codebooks. This architecture effectively mirrors Proper Orthogonal Decomposition (POD) by separating spatial modes from their scalar coefficients, ensuring that physical properties such as conservation laws and sharp gradients are maintained even at high compression ratios.
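To make the two ingredients concrete, here is a minimal NumPy sketch, not the paper's implementation: `fsq_quantize` follows the standard FSQ rule (bound each channel with tanh, then round to a fixed grid), and `shape_gain_split` shows the classical shape-gain factorization that motivates the $z_\mu$/$z_\alpha$ split. Function names and the level count are illustrative.

```python
import numpy as np

def fsq_quantize(z, n_levels=5):
    """Finite Scalar Quantization: bound each latent channel with tanh,
    then round it to one of `n_levels` fixed values in [-1, 1]."""
    half = (n_levels - 1) / 2.0
    return np.round(half * np.tanh(z)) / half

def shape_gain_split(z, eps=1e-8):
    """Classical shape-gain factorization: a unit-norm 'shape' vector
    (analogous to z_mu) and a scalar 'gain' (analogous to z_alpha)."""
    gain = np.linalg.norm(z, axis=-1, keepdims=True)
    shape = z / np.maximum(gain, eps)
    return shape, gain
```

The split mirrors POD in miniature: `shape` carries the spatial mode, while `gain` carries the scalar coefficient, so quantizing the shape coarsely need not corrupt the magnitude.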

Phaedra Pipeline

The Phaedra pipeline uses a single encoder to compute two sets of embeddings: the shape embeddings are quantized using sparse, high-dimensional FSQ to capture spatial structures, while the gain embeddings pass through a dense, 1-dimensional quantizer that approximates a continuous channel, preserving physical magnitudes. Following quantization, a learned recombination operator combines the two embeddings and passes them to the decoder for the final reconstruction.
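The pipeline can be sketched end to end as follows. This is a schematic only, using random linear maps as stand-ins for the learned encoder, recombination operator, and decoder; all names, dimensions, and level counts here are hypothetical, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fsq(z, n_levels=5):
    # sparse, high-dimensional quantizer for the shape branch
    half = (n_levels - 1) / 2.0
    return np.round(half * np.tanh(z)) / half

def dense_1d_quantize(g, n_levels=4096, lim=8.0):
    # dense 1-D quantizer: enough levels to approximate a continuous channel
    step = 2.0 * lim / (n_levels - 1)
    return np.round(np.clip(g, -lim, lim) / step) * step

# hypothetical stand-ins for learned weights
d_in, d_mu, d_alpha = 16, 8, 2
W_enc_mu = rng.standard_normal((d_in, d_mu))
W_enc_alpha = rng.standard_normal((d_in, d_alpha))
W_recomb = rng.standard_normal((d_mu + d_alpha, d_in))

def phaedra_like_forward(x):
    z_mu = fsq(x @ W_enc_mu)                      # morphology tokens
    z_alpha = dense_1d_quantize(x @ W_enc_alpha)  # amplitude tokens
    z = np.concatenate([z_mu, z_alpha], axis=-1)  # recombination input
    return z @ W_recomb                           # stand-in for the decoder
```

The key design point visible even in this toy version: quantization error on the amplitude branch is bounded by half a step of a very fine grid, so dynamic range survives discretization.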

Interactive Latent Space Exploration

Use the sliders below to independently interpolate the morphology ($z_\mu$) and amplitude ($z_\alpha$) tokens between two timesteps of the Kelvin-Helmholtz (KH) instability dataset. This example illustrates the disentanglement of features via the amplitude-morphology latent representation. Interpolating the amplitude tokens produces larger changes in the overall structure of the flow, whereas changing the morphology tokens yields finer changes to the turbulent structures while maintaining the same overall dynamic range. By combining the morphology of one sample with the amplitude of another, we can generate reconstructions that maintain the fine-scale turbulent structure of one sample while strictly obeying the physical dynamic range of the other. Note that this example is for illustrative purposes only and does not represent a true interpolation in the latent space, as the model was not trained with an explicit disentanglement loss.
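The two operations behind the sliders can be sketched as follows, assuming a simple FSQ codebook and a pure shape-gain model; this is illustrative only and does not reproduce the trained model's latent space.

```python
import numpy as np

def fsq(z, n_levels=5):
    half = (n_levels - 1) / 2.0
    return np.round(half * np.tanh(z)) / half

def interpolate_tokens(z_a, z_b, t, quantize=fsq):
    """Blend two pre-quantization embeddings and snap back to the
    codebook, mimicking a slider: t=0 gives A's tokens, t=1 gives B's."""
    return quantize((1.0 - t) * z_a + t * z_b)

def recombine(shape_b, gain_a):
    """Swap experiment under a pure shape-gain model: sample B's
    unit-norm structure scaled by sample A's magnitude."""
    return shape_b * gain_a
```

Because `recombine` rescales a unit-norm shape, the mixed output inherits exactly A's magnitude while keeping B's spatial pattern.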

Interactive panels: KH t=0.95 (A) and KH t=1.00 (B), with the active reconstruction updating as the sliders move.

PDE Reconstruction Benchmarks

All models are trained on the same data: Compressible Euler time series defined by 4 different classes of initial conditions, and Incompressible Navier-Stokes time series defined by 2 different classes of initial conditions. While we present some competitive tokenizers below, a comprehensive comparison against additional baselines is detailed in the main paper. We evaluate Phaedra across three levels of difficulty.

  • ID (In-Distribution): Test samples from the same PDE family and parameters as training.
  • OD1 (Out-of-Distribution): Shifts in the PDE coefficients and initial conditions.
  • OD2 (Out-of-Distribution): Shifts in the PDEs that define the problem.
Qualitative examples: ID (CEU Curved Riemann), OD1 (Airfoil), and OD2 (Acoustic Wave); each shows the input field alongside the Phaedra reconstruction.
| Model          | Dataset | nMAE ↓ | nRMSE ↓ | Δσ²_loc | γ_min |
|----------------|---------|--------|---------|---------|-------|
| VQ-VAE-2       | ID      | 3.024  | 5.069   | 15.02   | 79.1% |
| VQ-VAE-2       | OD1     | 2.113  | 5.854   | 21.55   | 87.5% |
| VQ-VAE-2       | OD2     | 4.449  | 5.833   | 17.06   | 68.9% |
| FSQ            | ID      | 2.603  | 4.292   | 11.29   | 85.3% |
| FSQ            | OD1     | 1.876  | 4.314   | 20.55   | 93.8% |
| FSQ            | OD2     | 3.831  | 4.997   | 11.65   | 68.0% |
| Phaedra (Ours) | ID      | 1.522  | 2.489   | 5.96    | 93.6% |
| Phaedra (Ours) | OD1     | 1.217  | 2.442   | 6.47    | 98.0% |
| Phaedra (Ours) | OD2     | 2.500  | 3.363   | 5.82    | 79.9% |
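For reference, the normalized error metrics reported above can be computed as follows. These are common conventions (errors relative to the target's mean absolute value and RMS, respectively); the paper's exact normalization may differ in detail.

```python
import numpy as np

def nmae(pred, target):
    # normalized mean absolute error, relative to the target's
    # mean absolute value (normalization choice is an assumption)
    return np.mean(np.abs(pred - target)) / np.mean(np.abs(target))

def nrmse(pred, target):
    # normalized root-mean-square error, relative to the target's RMS
    return np.sqrt(np.mean((pred - target) ** 2) / np.mean(target ** 2))
```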

Sentinel-2 L1C Earth Observation Data

Zero-Shot Evaluation: All models were evaluated without fine-tuning on Earth Observation data, relying on representations learned from synthetic physical simulations.
| Group            | Model           | rL₁ ↓  | rL₂ ↓  | Δσ²_loc | γ_min  |
|------------------|-----------------|--------|--------|---------|--------|
| Baseline         | Continuous      | 7.426  | 8.100  | 62.61   | 90.02% |
| Compression 4×4  | Phaedra₄ (Ours) | 8.895  | 9.749  | 128.5   | 93.17% |
| Compression 4×4  | FSQ₄            | 11.053 | 12.405 | 215.8   | 77.62% |
| Compression 8×8  | Phaedra₈ (Ours) | 9.900  | 11.475 | 163.9   | 62.72% |
| Compression 8×8  | Cosmos₈         | 16.717 | 19.245 | 19,566  | 79.70% |

Metrics calculated on native 13-band resolutions (10m, 20m, 60m).

Sentinel-2 Reconstruction results
Figure 1: Comparison of Ground Truth vs. Phaedra Reconstruction for Sentinel-2 L1C imagery.

Conclusion & Discussion

Our findings demonstrate that Phaedra successfully closes the "fidelity gap" that typically limits discrete tokenization in scientific applications. By balancing structural discretization with magnitude preservation, the model achieves superior reconstruction across diverse PDE families and exhibits remarkable zero-shot transfer to complex Earth observation data. This suggests that the shape-gain prior serves as a fundamental inductive bias for physical systems, enabling models to generalize across scales and disciplines. As we move toward larger scientific foundation models, Phaedra provides a robust framework for transforming continuous physical fields into discrete sequences without sacrificing the precision required for rigorous scientific analysis.

BibTeX

@misc{lingsch2026phaedralearninghighfidelitydiscrete,
      title={Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Sciences}, 
      author={Levi Lingsch and Georgios Kissas and Johannes Jakubik and Siddhartha Mishra},
      year={2026},
      eprint={2602.03915},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.03915}, 
}