Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Sciences

1 ETH AI Center, 2 IBM Research Europe, 3 SAM, ETH Zurich, 4 SDSC

Disentangled Latent Control: Phaedra factorizes fields into independent $z_\mu$ (Morphology) and $z_\alpha$ (Amplitude) tokens. By combining, for example, the local vortices from Sample B ($z_{\mu,B}$) with the global magnitude profile of Sample A ($z_{\alpha,A}$), the reconstruction maintains the fine-scale turbulent structure of one while strictly obeying the physical dynamic range of the other.

Abstract

Tokens are discrete representations that allow modern deep learning to scale by transforming high-dimensional data into sequences that can be efficiently learned. Since existing tokenizers are designed for realistic visual perception, we investigate whether they are optimal for scientific images, which exhibit a large dynamic range and require token embeddings that retain physical properties. We propose Phaedra, inspired by classical shape-gain quantization and proper orthogonal decomposition. We demonstrate that Phaedra consistently improves reconstruction across a range of PDE datasets and shows strong out-of-distribution generalization to unknown PDEs and real-world Earth observation data.

Phaedra Comparisons

Phaedra consistently improves reconstruction across PDE datasets, capturing fine details and precise magnitudes critical for scientific simulation. This is achieved by splitting the embeddings into shape and gain components, allowing for high-fidelity tokenization that retains physical properties, even in out-of-distribution scenarios.

Data distributions

Data distributions after normalization. Natural images have a fixed range and uniform distribution, even after normalization. Physical datasets, however, have a much larger range of values with outliers far outside the nominal range.
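This difference is easy to reproduce. The following NumPy sketch is purely illustrative (the `standard_t` draw is a stand-in for a heavy-tailed physical field, not data from the paper): after z-score normalization, bounded pixel intensities stay within a narrow band, while a heavy-tailed field keeps outliers far outside the nominal range.

```python
import numpy as np

rng = np.random.default_rng(0)

def zscore(x):
    """Standard z-score normalization."""
    return (x - x.mean()) / x.std()

# bounded, roughly uniform pixel intensities vs. a heavy-tailed field
natural = rng.uniform(0.0, 255.0, size=100_000)
physical = rng.standard_t(df=2, size=100_000)  # stand-in for a PDE field

# after normalization the natural image stays within roughly +/- sqrt(3),
# while the physical field retains extreme outliers
print(np.abs(zscore(natural)).max())   # close to sqrt(3) ~ 1.73
print(np.abs(zscore(physical)).max())  # typically far larger
```

A uniform distribution on a fixed range maps into a fixed normalized band, so a codebook can cover it densely; the heavy-tailed field cannot be covered this way without losing either the bulk or the outliers.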

Methodology

Phaedra leverages a disentangled hierarchical representation inspired by shape-gain quantization. The tokenizer factorizes latent embeddings into two distinct components: morphology ($z_\mu$), which captures spatial structures and topological features via Finite Scalar Quantization (FSQ), and amplitude ($z_\alpha$), which preserves the dynamic range and physical magnitudes. By incorporating an approximately continuous channel for amplitude, Phaedra avoids the precision loss common in standard discrete codebooks. This architecture effectively mirrors Proper Orthogonal Decomposition (POD) by separating spatial modes from their scalar coefficients, ensuring that physical properties such as conservation laws and sharp gradients are maintained even at high compression ratios.
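To make the two ingredients concrete, here is a minimal NumPy sketch, not the paper's implementation: `fsq_quantize` follows the standard FSQ rule (bound each channel with tanh, then round to a fixed grid), and `shape_gain_split` shows the classical shape-gain factorization that motivates the $z_\mu$/$z_\alpha$ split. Function names and the level count are illustrative.

```python
import numpy as np

def fsq_quantize(z, n_levels=5):
    """Finite Scalar Quantization: bound each latent channel with tanh,
    then round it to one of `n_levels` fixed values in [-1, 1]."""
    half = (n_levels - 1) / 2.0
    return np.round(half * np.tanh(z)) / half

def shape_gain_split(z, eps=1e-8):
    """Classical shape-gain factorization: a unit-norm 'shape' vector
    (analogous to z_mu) and a scalar 'gain' (analogous to z_alpha)."""
    gain = np.linalg.norm(z, axis=-1, keepdims=True)
    shape = z / np.maximum(gain, eps)
    return shape, gain
```

The split mirrors POD in miniature: `shape` carries the spatial mode, while `gain` carries the scalar coefficient, so quantizing the shape coarsely need not corrupt the magnitude.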

Phaedra Pipeline

The Phaedra pipeline uses a single encoder to compute two sets of embeddings: the shape embeddings are quantized using sparse, high-dimensional FSQ to capture spatial structures, while the gain embeddings pass through a dense, 1-dimensional quantizer that approximates a continuous channel, preserving physical magnitudes. Following quantization, a learned recombination operator combines the two embeddings and passes them to the decoder for the final reconstruction.
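The pipeline can be sketched end to end as follows. This is a schematic only, using random linear maps as stand-ins for the learned encoder, recombination operator, and decoder; all names, dimensions, and level counts here are hypothetical, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fsq(z, n_levels=5):
    # sparse, high-dimensional quantizer for the shape branch
    half = (n_levels - 1) / 2.0
    return np.round(half * np.tanh(z)) / half

def dense_1d_quantize(g, n_levels=4096, lim=8.0):
    # dense 1-D quantizer: enough levels to approximate a continuous channel
    step = 2.0 * lim / (n_levels - 1)
    return np.round(np.clip(g, -lim, lim) / step) * step

# hypothetical stand-ins for learned weights
d_in, d_mu, d_alpha = 16, 8, 2
W_enc_mu = rng.standard_normal((d_in, d_mu))
W_enc_alpha = rng.standard_normal((d_in, d_alpha))
W_recomb = rng.standard_normal((d_mu + d_alpha, d_in))

def phaedra_like_forward(x):
    z_mu = fsq(x @ W_enc_mu)                      # morphology tokens
    z_alpha = dense_1d_quantize(x @ W_enc_alpha)  # amplitude tokens
    z = np.concatenate([z_mu, z_alpha], axis=-1)  # recombination input
    return z @ W_recomb                           # stand-in for the decoder
```

The key design point visible even in this toy version: quantization error on the amplitude branch is bounded by half a step of a very fine grid, so dynamic range survives discretization.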

Interactive Latent Space Exploration

Use the sliders below to independently interpolate the morphology ($z_\mu$) and amplitude ($z_\alpha$) tokens between two timesteps of the Kelvin-Helmholtz (KH) instability dataset. This example illustrates the disentanglement of features via the amplitude-morphology latent representation. Interpolating the amplitude tokens produces larger changes in the overall structure of the flow, whereas changing the morphology tokens yields finer changes to the turbulent structures while maintaining the same overall dynamic range. By combining the morphology of one sample with the amplitude of another, we can generate reconstructions that maintain the fine-scale turbulent structure of one sample while strictly obeying the physical dynamic range of the other. Note that this example is for illustrative purposes only and does not represent a true interpolation in the latent space, as the model was not trained with an explicit disentanglement loss.
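The two operations behind the sliders can be sketched as follows, assuming a simple FSQ codebook and a pure shape-gain model; this is illustrative only and does not reproduce the trained model's latent space.

```python
import numpy as np

def fsq(z, n_levels=5):
    half = (n_levels - 1) / 2.0
    return np.round(half * np.tanh(z)) / half

def interpolate_tokens(z_a, z_b, t, quantize=fsq):
    """Blend two pre-quantization embeddings and snap back to the
    codebook, mimicking a slider: t=0 gives A's tokens, t=1 gives B's."""
    return quantize((1.0 - t) * z_a + t * z_b)

def recombine(shape_b, gain_a):
    """Swap experiment under a pure shape-gain model: sample B's
    unit-norm structure scaled by sample A's magnitude."""
    return shape_b * gain_a
```

Because `recombine` rescales a unit-norm shape, the mixed output inherits exactly A's magnitude while keeping B's spatial pattern.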

Interactive panels: KH t=0.95 (A) and KH t=1.00 (B), with the active reconstruction updating as the sliders move.

PDE Reconstruction Benchmarks

All models are trained on the same data: Compressible Euler time series defined by 4 different classes of initial conditions, and Incompressible Navier-Stokes time series defined by 2 different classes of initial conditions. While we present some competitive tokenizers below, a comprehensive comparison against additional baselines is detailed in the main paper. We evaluate Phaedra across three levels of difficulty.

  • ID (In-Distribution): Test samples from the same PDE family and parameters as training.
  • OD1 (Out-of-Distribution): Shifts in the PDE coefficients and initial conditions.
  • OD2 (Out-of-Distribution): Shifts in the PDEs that define the problem.
Qualitative examples: ID (CEU Curved Riemann), OD1 (Airfoil), and OD2 (Acoustic Wave); each shows the input field alongside the Phaedra reconstruction.
| Model          | Dataset | nMAE ↓ | nRMSE ↓ | Δσ²_loc | γ_min |
|----------------|---------|--------|---------|---------|-------|
| VQ-VAE-2       | ID      | 3.024  | 5.069   | 15.02   | 79.1% |
| VQ-VAE-2       | OD1     | 2.113  | 5.854   | 21.55   | 87.5% |
| VQ-VAE-2       | OD2     | 4.449  | 5.833   | 17.06   | 68.9% |
| FSQ            | ID      | 2.603  | 4.292   | 11.29   | 85.3% |
| FSQ            | OD1     | 1.876  | 4.314   | 20.55   | 93.8% |
| FSQ            | OD2     | 3.831  | 4.997   | 11.65   | 68.0% |
| Phaedra (Ours) | ID      | 1.522  | 2.489   | 5.96    | 93.6% |
| Phaedra (Ours) | OD1     | 1.217  | 2.442   | 6.47    | 98.0% |
| Phaedra (Ours) | OD2     | 2.500  | 3.363   | 5.82    | 79.9% |
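For reference, the normalized error metrics reported above can be computed as follows. These are common conventions (errors relative to the target's mean absolute value and RMS, respectively); the paper's exact normalization may differ in detail.

```python
import numpy as np

def nmae(pred, target):
    # normalized mean absolute error, relative to the target's
    # mean absolute value (normalization choice is an assumption)
    return np.mean(np.abs(pred - target)) / np.mean(np.abs(target))

def nrmse(pred, target):
    # normalized root-mean-square error, relative to the target's RMS
    return np.sqrt(np.mean((pred - target) ** 2) / np.mean(target ** 2))
```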

Sentinel-2 L1C Earth Observation Data

Zero-Shot Evaluation: All models were evaluated without fine-tuning on Earth Observation data, relying on representations learned from synthetic physical simulations.
| Group            | Model           | rL₁ ↓  | rL₂ ↓  | Δσ²_loc | γ_min  |
|------------------|-----------------|--------|--------|---------|--------|
| Baseline         | Continuous      | 7.426  | 8.100  | 62.61   | 90.02% |
| Compression 4×4  | Phaedra₄ (Ours) | 8.895  | 9.749  | 128.5   | 93.17% |
| Compression 4×4  | FSQ₄            | 11.053 | 12.405 | 215.8   | 77.62% |
| Compression 8×8  | Phaedra₈ (Ours) | 9.900  | 11.475 | 163.9   | 62.72% |
| Compression 8×8  | Cosmos₈         | 16.717 | 19.245 | 19,566  | 79.70% |

Metrics calculated on native 13-band resolutions (10m, 20m, 60m).

Sentinel-2 Reconstruction results
Figure 1: Comparison of Ground Truth vs. Phaedra Reconstruction for Sentinel-2 L1C imagery.

Conclusion & Discussion

Our findings demonstrate that Phaedra successfully closes the "fidelity gap" that typically limits discrete tokenization in scientific applications. By balancing structural discretization with magnitude preservation, the model achieves superior reconstruction across diverse PDE families and exhibits remarkable zero-shot transfer to complex Earth observation data. This suggests that the shape-gain prior serves as a fundamental inductive bias for physical systems, enabling models to generalize across scales and disciplines. As we move toward larger scientific foundation models, Phaedra provides a robust framework for transforming continuous physical fields into discrete sequences without sacrificing the precision required for rigorous scientific analysis.

BibTeX

@misc{lingsch2026phaedralearninghighfidelitydiscrete,
      title={Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Sciences}, 
      author={Levi Lingsch and Georgios Kissas and Johannes Jakubik and Siddhartha Mishra},
      year={2026},
      eprint={2602.03915},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.03915}, 
}