Learning, Solving and Optimizing PDEs with TensorGalerkin: an Efficient High-Performance Galerkin Assembly Algorithm

1ETH Zurich    2Imperial College London    3Northeastern University    4Renmin University of China
*Indicates Equal Contribution
Overview of TensorGalerkin: Map-Reduce Galerkin Assembly

Overview of TensorGalerkin. Stage I (Batch-Map) computes element-wise operators via a fully tensorized einsum kernel; Stage II (Sparse-Reduce) assembles global sparse values via routing matrices and a single SpMM. For comparison, the white box illustrates traditional FEM assembly via per-element loops and scatter-add (atomics) into the global system. The same assembly engine powers TensorMesh, TensorPils, and TensorOpt.

Abstract

We present a unified algorithmic framework for the numerical solution, constrained optimization, and physics-informed learning of PDEs with a variational structure. Our framework is based on a Galerkin discretization of the underlying variational forms, and its high efficiency stems from TensorGalerkin, a novel, highly optimized, GPU-friendly algorithm for linear-system assembly (stiffness matrices and load vectors). TensorGalerkin tensorizes element-wise operations in a Python-level Map stage and then carries out the global reduction with a single sparse matrix multiplication, which realizes message passing on the mesh-induced sparsity graph. It can be seamlessly employed downstream as i) a highly efficient numerical PDE solver, ii) an end-to-end differentiable framework for PDE-constrained optimization, and iii) a physics-informed operator learning algorithm for PDEs. Across multiple benchmarks, including 2D and 3D elliptic, parabolic, and hyperbolic PDEs on unstructured meshes, we demonstrate that the proposed framework provides significant gains in computational efficiency and accuracy over a variety of baselines in all targeted downstream applications.
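The two-stage Map-Reduce assembly can be illustrated with a minimal sketch for 1D P1 elements on the Poisson problem. This is not the TensorGalerkin API, only an illustration of the idea under assumed names: Stage I computes all local stiffness matrices in one batched einsum; Stage II routes the flattened local entries to the global sparsity pattern with a single sparse product, instead of a per-element scatter-add loop.

```python
import numpy as np
import scipy.sparse as sp

def assemble_map_reduce(nodes, conn):
    """Two-stage Galerkin assembly for 1D P1 Poisson (illustrative sketch)."""
    n = len(nodes)
    h = nodes[conn[:, 1]] - nodes[conn[:, 0]]        # element lengths
    # Stage I (Batch-Map): all local stiffness matrices via one einsum.
    Kref = np.array([[1.0, -1.0], [-1.0, 1.0]])      # reference stiffness
    Ke = np.einsum('e,ij->eij', 1.0 / h, Kref)       # shape (n_elem, 2, 2)
    # Stage II (Sparse-Reduce): route local entries to global slots with
    # one sparse matrix product instead of atomic scatter-add.
    rows = np.repeat(conn, 2, axis=1).reshape(-1)    # global row per local entry
    cols = np.tile(conn, (1, 2)).reshape(-1)         # global col per local entry
    lin = rows * n + cols                            # linearized (i, j) index
    uniq, inv = np.unique(lin, return_inverse=True)  # global sparsity pattern
    R = sp.coo_matrix((np.ones(inv.size), (inv, np.arange(inv.size))),
                      shape=(uniq.size, inv.size))   # 0/1 routing matrix
    vals = R @ Ke.reshape(-1)                        # single sparse reduce
    return sp.coo_matrix((vals, (uniq // n, uniq % n)), shape=(n, n)).tocsr()

nodes = np.linspace(0.0, 1.0, 5)                     # 4 uniform elements
conn = np.stack([np.arange(4), np.arange(1, 5)], axis=1)
A = assemble_map_reduce(nodes, conn)
```

Both stages are dense, batched array operations plus one sparse product, which is what makes the pattern GPU-friendly.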

Numerical PDE Solver

To demonstrate the capability of TensorMesh as a numerical PDE solver, we consider two PDE benchmarks: the Poisson equation and the linear elasticity equations. In the following figures, we present runtimes for TensorMesh as the number of degrees of freedom increases, for both the CPU and GPU versions. As baselines, we report runtimes for the widely used FEniCS framework on CPU, the scikit-fem (SKFEM) framework, JAX-FEM on CPU and GPU, and a PINN.
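As a self-contained reference for what such a benchmark solves, here is a minimal 1D Poisson problem with P1 elements (an illustration, not the TensorMesh or baseline code): assemble the tridiagonal stiffness matrix, impose homogeneous Dirichlet conditions, and solve -u'' = 1 on (0, 1), whose exact solution is u(x) = x(1 - x)/2.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Minimal 1D Poisson benchmark: -u'' = 1, u(0) = u(1) = 0, P1 elements.
n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
main = 2.0 / h * np.ones(n)
off = -1.0 / h * np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1]).tolil()
b = h * np.ones(n)                 # consistent load for f = 1
b[0] = b[-1] = h / 2.0             # boundary hat functions have half mass
# Dirichlet boundary conditions by row replacement.
A[0, :] = 0.0;  A[0, 0] = 1.0;   b[0] = 0.0
A[-1, :] = 0.0; A[-1, -1] = 1.0; b[-1] = 0.0
u = spsolve(A.tocsc(), b)
err = np.max(np.abs(u - x * (1 - x) / 2))   # nodally exact in 1D
```

For 1D Poisson with exact load integration, the P1 Galerkin solution interpolates the exact solution at the nodes, so `err` is at machine precision; the 3D benchmarks in the figures measure the same pipeline at scale.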

Poisson Equation (3D)

Poisson Equation 3D Timing Comparison

Linear Elasticity (3D)

Linear Elasticity 3D Timing Comparison

Poisson Equation — Solution Visualization

Solution visualizations from TensorMesh, JAX-FEM, scikit-fem, FEniCS, and PINN.

Linear Elasticity — Solution Visualization

Solution visualizations from TensorMesh, JAX-FEM, scikit-fem, FEniCS, and PINN.

Neural PDE Solver

We consider a two-dimensional Poisson equation with checkerboard forcing terms at different scales to investigate the performance of the proposed TensorPils framework as a stand-alone neural PDE solver. We compare it against three neural PDE solvers (PINNs, VPINNs, and the Deep Ritz method) and report errors and visualizations below.
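The exact form of the forcing is not reproduced on this page; one plausible checkerboard pattern at frequency K, shown purely as an assumed illustration, is the sign of a product of sines:

```python
import numpy as np

# Assumed checkerboard forcing at frequency K (illustrative only; the
# paper's exact definition may differ): a K x K grid of alternating signs.
def checkerboard(x, y, K):
    return np.sign(np.sin(K * np.pi * x) * np.sin(K * np.pi * y))

xs, ys = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
f2 = checkerboard(xs, ys, 2)   # K = 2: a 2x2 alternating sign pattern
```

Larger K produces finer-scale forcing, which is what makes the high-frequency cases (K = 8) hard for pointwise-residual methods like PINNs.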

We report relative L2 error (%) for different frequencies K and training throughput (it/s). All models were trained for 10,000 Adam + 200 LBFGS steps.

Method       Rel. L2 Error (%)          Speed (it/s)
             K=2     K=4     K=8        Adam    LBFGS
PINN         0.90    6.30    34.77      20.1    1.0
VPINN        11.58   36.88   154.10     54.9    50.5
Deep Ritz    3.34    4.50    10.60      58.7    55.7
TensorPils   0.56    2.24    10.05      117.8   99.7
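The reported metric is the standard relative L2 error, expressed as a percentage; a minimal sketch of this definition (not the paper's evaluation script) is:

```python
import numpy as np

# Relative L2 error (%): ||pred - ref||_2 / ||ref||_2, scaled by 100.
def rel_l2_percent(pred, ref):
    pred, ref = np.asarray(pred), np.asarray(ref)
    return 100.0 * np.linalg.norm(pred - ref) / np.linalg.norm(ref)

e = rel_l2_percent([1.1, 2.0], [1.0, 2.0])   # ~4.47 %
```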

Solution Comparison

Solution comparisons for K = 2, K = 4, and K = 8.

Physics-informed Neural Operator

In the next set of experiments, we test the full TensorPils pipeline for physics-informed operator learning. To this end, we consider two time-dependent PDEs: the linear acoustic wave equation on a circular domain and the non-linear Allen-Cahn equation on an L-shaped domain.

We report the relative L2 errors on both In-distribution (ID) and Out-of-distribution (OOD) test sets.

Method         Wave                        AC
               ID           OOD            ID           OOD
Data-Driven    0.089±0.013  0.230±0.017    0.135±0.042  0.152±0.080
PI-DeepONet    0.626±0.033  0.863±0.018    0.743±0.163  8.536±6.306
TensorPils     0.085±0.010  0.090±0.006    0.110±0.014  0.083±0.013

Wave Equation

Allen-Cahn Equation

Topology Optimization

We consider a challenging inverse-design problem from the field of topology optimization: compliance minimization for a 2D cantilever beam governed by the elasticity equations. The design domain is a rectangle with fixed support on one edge and a point load applied at a corner. We compare with JAX-FEM using the same setup.
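The core of compliance minimization can be sketched on a toy problem. Assuming a standard SIMP-style parameterization K(ρ) = Σ_e ρ_e^p K_e with compliance c = fᵀu and K u = f, the problem is self-adjoint and the sensitivity is dc/dρ_e = -p ρ_e^{p-1} u_eᵀ K_e u_e. The sketch below verifies this on a 1D bar of springs (an illustration, not the TensorMesh/TensorOpt implementation):

```python
import numpy as np

# SIMP-style compliance and its analytic sensitivity on a clamped 1D bar
# with a unit end load (illustrative toy model).
def compliance_and_grad(rho, p=3.0, k=1.0, load=1.0):
    ne = rho.size
    n = ne + 1
    Ke = k * np.array([[1.0, -1.0], [-1.0, 1.0]])     # element stiffness
    K = np.zeros((n, n))
    for e in range(ne):
        K[e:e + 2, e:e + 2] += rho[e]**p * Ke          # SIMP interpolation
    f = np.zeros(n); f[-1] = load
    free = np.arange(1, n)                             # node 0 is clamped
    u = np.zeros(n)
    u[free] = np.linalg.solve(K[np.ix_(free, free)], f[free])
    c = f @ u                                          # compliance f^T u
    # Self-adjoint sensitivity: dc/drho_e = -p rho_e^(p-1) u_e^T K_e u_e.
    grad = np.array([-p * rho[e]**(p - 1) * (u[e:e + 2] @ Ke @ u[e:e + 2])
                     for e in range(ne)])
    return c, grad

rho = np.array([0.5, 0.8, 1.0, 0.6])
c, g = compliance_and_grad(rho)
```

In a differentiable-assembly framework the same gradient comes from automatic differentiation through the assembled system, which is what makes the optimization loop end-to-end.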

Performance comparison on the 2D Cantilever Beam task (51 iterations). Time is measured in seconds.

Stage              JAX-FEM   TensorMesh (Ours)   Speedup
Setup Time         2.62 s    0.58 s              4.5×
Optimization Loop  28.51 s   7.77 s              3.7×
Total Time         31.13 s   8.35 s              3.7×

BibTeX

@article{wen2026tensorgalerkin,
  title={Learning, Solving and Optimizing PDEs with TensorGalerkin: 
         an Efficient High-Performance Galerkin Assembly Algorithm},
  author={Wen, Shizheng and Chi, Mingyuan and Yu, Tianwei and Moseley, Ben and Michelis, Mike Yan and Ren, Pu and Sun, Hao and Mishra, Siddhartha},
  journal={arXiv preprint arXiv:2602.05052},
  year={2026}
}