
CFD Surrogate Transformer 

Introduction:

For the final project in my CS7150: Deep Learning class, my team developed a surrogate transformer model to replace, or supplement, Computational Fluid Dynamics (CFD), a computationally heavy simulation tool. CFD is critical software for applications such as intravenous injections, plane manufacturing, and weather forecasting. I was personally exposed to CFD in industry, where my mechanical group would run simulations to analyze laminar flow at the working plane in biopharmaceutical cleanroom spaces. Depending on scale, these simulations can take anywhere from a minute for small environments with reduced meshes to days or weeks for larger, more complex analyses. We sought to design a transformer model to serve as a significantly lighter-weight alternative to CFD simulation.

Presentation

Code

  • GitHub

Repo is currently private. Please request access.

Question 1: Can we encode scientific priors into our model to improve its performance?


Context: The Reynolds Number

This question arose out of necessity. A flow's dynamics are characterized by its Reynolds number, a dimensionless quantity expressing the ratio of inertial to viscous forces; low values correspond to laminar flow, high values to turbulent flow. Every simulation in our dataset had a different Reynolds number, making it a priority to encode this global parameter per simulation.
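For reference, the Reynolds number in symbols (the standard fluid-mechanics definition, not taken from our write-up):

```latex
Re = \frac{\rho \, u \, L}{\mu} = \frac{u L}{\nu}
```

where \(\rho\) is fluid density, \(u\) a characteristic flow speed, \(L\) a characteristic length, \(\mu\) the dynamic viscosity, and \(\nu = \mu / \rho\) the kinematic viscosity.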


Our Method: A Custom Positional Encoding, ReyPR

Our teammate Chris has spent much of his PhD researching the impact of various positional encoding methods and their benefits across problem domains. He spearheaded our custom PE attention mechanism, which incorporates the Reynolds number by adding a learned, per-head bias that scales with r. A vector u encodes how each head's attention should change across different flows, and the term r · (q · u) is added to the attention logits. This lets the model adapt its attention patterns to different Reynolds numbers, improving its ability to capture laminar-turbulent variations. Below is Chris's formulaic breakdown of the method:

[Formula image: Chris's derivation of the per-head Reynolds bias term r · (q · u) added to the attention logits.]
We deemed ReyPR Pareto-optimal after a study comparing its performance against other PE methods on a strict, next-frame, encoder-only model. The ANTI_METHOD control fills the r-term with random noise, confirming that the Reynolds number itself is actually relevant to the improved performance.
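The bias described above can be sketched roughly as follows. This is a minimal NumPy sketch under my own assumptions: the function name, tensor shapes, and the scaled-dot-product base are illustrative, and the team's exact formulation (shown in Chris's breakdown) may differ in how the bias enters the logits.

```python
import numpy as np

def reypr_attention_logits(Q, K, u, r):
    """Scaled dot-product attention logits with a Reynolds-conditioned bias.

    Q, K: (heads, seq, d) queries and keys
    u:    (heads, d) learned per-head vector (hypothetical shape)
    r:    scalar (normalized) Reynolds number for this simulation
    """
    d = Q.shape[-1]
    # standard scaled dot-product logits, per head
    logits = np.einsum("hqd,hkd->hqk", Q, K) / np.sqrt(d)
    # ReyPR: add r * (q . u) for each query position
    bias = r * np.einsum("hqd,hd->hq", Q, u)
    return logits + bias[:, :, None]
```

Setting r to random noise, as in the ANTI_METHOD control, keeps the bias term's capacity but removes its flow information.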


Question 2: Can we achieve optimal performance in a parameter-reduced state?

Context: Nodes Per Frame 

One of the critical issues with CFD is its complexity and memory footprint, driven by the dense mesh discretization of the simulation environment. In our dataset, each frame has 3 parameters (x-velocity, y-velocity, pressure) at each of 1,699 nodes, or 5,097 parameters per frame. This is already a very large parameter space, and it grows rapidly with larger environments or denser meshes.

Our Method: Low-Rank Approximation of Keys and Values

Inspired by LoRA, we implemented low-rank adaptations of our Key and Value matrices to reduce parameters and improve gradient flow with limited informational loss. We kept Queries full-rank to preserve one of the original full-attention representations. This not only significantly reduced the parameter count of the encoder-only model, it improved performance as well.
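The idea can be sketched like this (a minimal NumPy sketch with hypothetical names; the rank, shapes, and factor matrices are illustrative, not the team's exact code):

```python
import numpy as np

def project_qkv_low_rank(x, Wq, Wk_down, Wk_up, Wv_down, Wv_up):
    """Full-rank Query projection with low-rank factored Key/Value projections.

    x:       (seq, d_model) input tokens
    Wq:      (d_model, d_head) full-rank query weights
    W*_down: (d_model, rank) and W*_up: (rank, d_head) factor pairs that
             replace a dense (d_model, d_head) matrix with only
             rank * (d_model + d_head) parameters.
    """
    Q = x @ Wq                    # queries keep the full-attention representation
    K = (x @ Wk_down) @ Wk_up     # keys pass through a rank-r bottleneck
    V = (x @ Wv_down) @ Wv_up     # values pass through a rank-r bottleneck
    return Q, K, V
```

For example, with d_model = 256, d_head = 64, and rank = 8, each dense projection of 256 × 64 = 16,384 parameters shrinks to 8 × (256 + 64) = 2,560.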

Question 3: Is an autoregressive decoder-only method viable for full simulation sequence prediction?

Context: Full Simulation Capabilities 

At the end of the day, the industry application of this method would be to replace as much of the CFD simulation as possible.

Our Method: A Decoder-Only Autoregressive Model

Given the nature of multi-frame autoregression, we opted for a decoder-style transformer with causal masking and a sliding-window frame context. This fixed-length window trades longer-term attention for efficiency. We achieved some successes and some failures. As the left sample shows, over shorter autoregressive horizons the error of every future frame improved steadily. On the right, however, you can see that the model is not able to converge consistently over longer horizons.
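The rollout loop itself is simple. Here is a sketch under my own assumptions: `model` stands in for the trained decoder's next-frame prediction, and the names and window size are mine.

```python
import numpy as np

def autoregressive_rollout(model, context, n_future, window=8):
    """Roll a next-frame model forward with a fixed-length sliding window.

    model:    callable mapping a (window, features) context to the next frame
    context:  (t, features) array of ground-truth seed frames
    n_future: number of frames to predict autoregressively
    """
    frames = list(context)
    for _ in range(n_future):
        ctx = np.stack(frames[-window:])    # keep only the last `window` frames
        frames.append(model(ctx))           # feed the prediction back in
    return np.stack(frames[len(context):])  # return predicted frames only
```

Because each predicted frame becomes input for the next step, small per-frame errors compound, which is consistent with the degradation we saw over longer horizons.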

Question 4: Would a physics-informed loss function improve our model?

Context: Navier-Stokes

Although state-of-the-art CFD solvers use more robust formulations, the Navier-Stokes equations are thematically the foundation of the simulation's calculations. From our research into PINNs (physics-informed neural networks), we know they can help models converge more efficiently and with higher accuracy. For our context, we would also incorporate the relative positions of our mesh's nodes, conceivably training the model to recognize spatiotemporal mesh relationships and enabling it to generalize to new environments or new meshes.

Our Method: Weighted Sum Loss Function 

After building our Navier-Stokes loss function, we added it to MSE as a weighted-sum loss. However, even after physics-loss clipping, normalization, and weighting MSE to physics at 99:1, the physics-informed model was never able to outperform the MSE-only model. Perhaps with further parameter tweaking and a loss-weight scheduler, we could improve this method and introduce the generalizability that is ultimately required for an industry-viable surrogate to CFD. Physics loss is shown in gray, MSE in green.
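The combined objective looked roughly like this. A sketch, not our exact implementation: `physics_residual`, the clipping style, and all names here are my assumptions.

```python
import numpy as np

def weighted_surrogate_loss(pred, target, physics_residual,
                            w_mse=0.99, w_phys=0.01, clip=1.0):
    """Weighted sum of a data term (MSE) and a physics term.

    physics_residual: callable(pred) -> per-node residual of the governing
    equations, treated here as a black box supplied by the training code.
    """
    mse = np.mean((pred - target) ** 2)
    # clip the physics residual so outliers cannot dominate the gradient
    phys = np.mean(np.minimum(np.abs(physics_residual(pred)), clip))
    return w_mse * mse + w_phys * phys
```

A loss-weight scheduler would vary `w_phys` over training instead of fixing it at 0.01.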


Collaborative project with Christopher Curtis (PhD, AI), Brent Garey (MSc, AI), and Liam Langert (MSc, AI) for CS7150: Deep Learning at Northeastern University.
