Abstract
We present a unified algorithmic framework for the numerical solution, constrained optimization, and physics-informed learning of PDEs with a variational structure. Our framework is based on a Galerkin discretization of the underlying variational forms, and its high efficiency stems from a novel, highly optimized, GPU-compatible TensorGalerkin framework for linear system assembly (stiffness matrices and load vectors). TensorGalerkin tensorizes element-wise operations in a Python-level map stage and then performs a global reduction via a sparse matrix multiplication that implements message passing on the mesh-induced sparsity graph. It can be seamlessly employed downstream as i) a highly efficient numerical PDE solver, ii) an end-to-end differentiable framework for PDE-constrained optimization, and iii) a physics-informed operator learning algorithm for PDEs. On multiple benchmarks, including 2D and 3D elliptic, parabolic, and hyperbolic PDEs on unstructured meshes, we demonstrate that the proposed framework provides significant gains in computational efficiency and accuracy over a variety of baselines across all targeted downstream applications.
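The map-then-reduce assembly pattern described in the abstract can be sketched in plain NumPy/SciPy for the simplest case, 1D piecewise-linear (P1) stiffness assembly. This is our own illustration under stated assumptions, not the TensorGalerkin API: the function and variable names are invented, and the sparse "message passing" reduction is realized here by COO duplicate-entry summation rather than an explicit sparse matrix multiplication.

```python
import numpy as np
import scipy.sparse as sp

def assemble_stiffness_1d(nodes, elements):
    """Assemble the P1 stiffness matrix for -u'' on a 1D mesh.

    Map stage: local 2x2 element matrices are computed for ALL elements
    at once with vectorized (tensorized) NumPy operations.
    Reduce stage: local entries are scattered into a global sparse
    matrix; COO construction sums duplicate (row, col) pairs, which is
    the reduction over the mesh-induced sparsity graph.
    """
    # --- Map: per-element lengths and local stiffness matrices ---
    h = nodes[elements[:, 1]] - nodes[elements[:, 0]]   # (n_el,)
    local = np.array([[1.0, -1.0], [-1.0, 1.0]])        # reference element matrix
    K_loc = local[None, :, :] / h[:, None, None]        # (n_el, 2, 2)

    # --- Reduce: scatter-add via COO duplicate summation ---
    rows = np.repeat(elements, 2, axis=1).ravel()       # row index per local entry
    cols = np.tile(elements, (1, 2)).ravel()            # col index per local entry
    n = len(nodes)
    return sp.coo_matrix((K_loc.ravel(), (rows, cols)), shape=(n, n)).tocsr()
```

On a GPU backend, the map stage maps directly onto batched tensor ops and the reduce stage onto a segment-sum or sparse matmul; the NumPy version above only illustrates the data flow.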
Numerical PDE Solver
To demonstrate the capability of TensorMesh as a numerical PDE solver, we consider two PDE benchmarks: the Poisson equation and the linear elasticity equations. The figures below report runtimes for TensorMesh, on both CPU and GPU, as the number of degrees of freedom increases. As baselines, we report runtimes for the widely used FEniCS framework on CPU, the scikit-fem (SKFEM) framework, JAX-FEM on CPU and GPU, and a PINN.
Poisson Equation (3D)
Linear Elasticity (3D)
Poisson Equation — Solution Visualization
TensorMesh
JAX-FEM
scikit-fem
FEniCS
PINN
Linear Elasticity — Solution Visualization
TensorMesh
JAX-FEM
scikit-fem
FEniCS
PINN
Neural PDE Solver
We consider a two-dimensional Poisson equation with checkerboard forcing terms at different scales to investigate the performance of the proposed TensorPils framework as a stand-alone neural PDE solver. We compare it against three neural PDE solvers, PINNs, VPINNs, and the Deep Ritz method, and present the errors and solution visualizations below.
We report the relative L2 error (%) for different forcing frequencies K and the training throughput (it/s). All models were trained for 10,000 Adam steps followed by 200 L-BFGS steps.
| Method | Rel. L2 (%), K=2 | Rel. L2 (%), K=4 | Rel. L2 (%), K=8 | Adam (it/s) | L-BFGS (it/s) |
|---|---|---|---|---|---|
| PINN | 0.90 | 6.30 | 34.77 | 20.1 | 1.0 |
| VPINN | 11.58 | 36.88 | 154.10 | 54.9 | 50.5 |
| Deep Ritz | 3.34 | 4.50 | 10.60 | 58.7 | 55.7 |
| TensorPils | 0.56 | 2.24 | 10.05 | 117.8 | 99.7 |
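For clarity on the metric in the table above, the relative L2 error is, in our reading, the discrete quantity ‖u_pred − u_ref‖₂ / ‖u_ref‖₂ expressed as a percentage, evaluated on a fixed set of test points. A minimal helper (names are our own illustration):

```python
import numpy as np

def relative_l2_percent(u_pred, u_ref):
    """Relative L2 error ||u_pred - u_ref||_2 / ||u_ref||_2, in percent."""
    u_pred = np.asarray(u_pred, dtype=float)
    u_ref = np.asarray(u_ref, dtype=float)
    return 100.0 * np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref)
```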
Solution Comparison
Physics-informed Neural Operator
In the next set of experiments, we test the entire TensorPils pipeline for physics-informed operator learning. To this end, we consider two time-dependent PDEs: the linear acoustic wave equation on a circular domain and the nonlinear Allen-Cahn equation on an L-shaped domain.
We report the relative L2 errors on both in-distribution (ID) and out-of-distribution (OOD) test sets.
| Method | Wave (ID) | Wave (OOD) | AC (ID) | AC (OOD) |
|---|---|---|---|---|
| Data-Driven | 0.089±0.013 | 0.230±0.017 | 0.135±0.042 | 0.152±0.080 |
| PI-DeepONet | 0.626±0.033 | 0.863±0.018 | 0.743±0.163 | 8.536±6.306 |
| TensorPils | 0.085±0.010 | 0.090±0.006 | 0.110±0.014 | 0.083±0.013 |
Wave Equation
Allen-Cahn Equation
Topology Optimization
We consider a challenging inverse-design problem from the field of topology optimization: compliance minimization for a 2D cantilever beam governed by the linear elasticity equations. The design domain is a rectangle with fixed support on one edge and a point load applied at a corner of the domain. We compare against JAX-FEM using the same setup.
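Compliance minimization relies on the fact that the objective C = fᵀu with K(ρ)u = f is self-adjoint, so the design sensitivity dC/dρₑ = −uᵀ(∂K/∂ρₑ)u needs no extra linear solve. The sketch below illustrates this on a toy 1D spring chain with SIMP-style penalized stiffness, a stand-in for the 2D elasticity problem; all names, the penalization exponent, and the stiffness floor `eps` are our own assumptions, not the paper's setup.

```python
import numpy as np

def compliance_and_grad(rho, f, penal=3.0, eps=1e-3):
    """Compliance C = f.T @ u and its adjoint gradient for a 1D spring
    chain (node 0 fixed) with SIMP-interpolated element stiffness
    k_e = eps + rho_e**penal.
    """
    n = len(rho)                       # elements; nodes 0..n, node 0 fixed
    k = eps + rho ** penal             # penalized element stiffnesses
    Ke0 = np.array([[1.0, -1.0], [-1.0, 1.0]])
    # Assemble global stiffness on the free nodes 1..n (tridiagonal)
    K = np.zeros((n, n))
    for e in range(n):                 # element e connects nodes e and e+1
        dofs = [d - 1 for d in (e, e + 1) if d >= 1]   # drop fixed node 0
        K[np.ix_(dofs, dofs)] += (k[e] * Ke0)[-len(dofs):, -len(dofs):]
    u = np.linalg.solve(K, f)
    C = f @ u
    # Adjoint sensitivity: dC/drho_e = -penal*rho_e^(penal-1) * u_e.T Ke0 u_e
    grad = np.empty(n)
    for e in range(n):
        ue = np.array([0.0 if e == 0 else u[e - 1], u[e]])
        grad[e] = -penal * rho[e] ** (penal - 1) * (ue @ Ke0 @ ue)
    return C, grad
```

In an end-to-end differentiable pipeline this gradient comes out of automatic differentiation of the assembly-plus-solve graph rather than a hand-derived adjoint, but the resulting sensitivities agree.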
Performance comparison on the 2D Cantilever Beam task (51 iterations). Time is measured in seconds.
| Stage | JAX-FEM | TensorMesh (Ours) | Speedup |
|---|---|---|---|
| Setup Time | 2.62 s | 0.58 s | 4.5× |
| Optimization Loop | 28.51 s | 7.77 s | 3.7× |
| Total Time | 31.13 s | 8.35 s | 3.7× |
BibTeX
@article{wen2026tensorgalerkin,
title={Learning, Solving and Optimizing PDEs with TensorGalerkin:
an Efficient High-Performance Galerkin Assembly Algorithm},
author={Wen, Shizheng and Chi, Mingyuan and Yu, Tianwei and Moseley, Ben and Michelis, Mike Yan and Ren, Pu and Sun, Hao and Mishra, Siddhartha},
journal={arXiv preprint arXiv:2602.05052},
year={2026}
}