Lattice Documentation
GPU-accelerated wind tunnel simulation using Lattice Boltzmann Methods. Runs on consumer GPUs, outputs drag and lift coefficients, and includes an ML surrogate for instant predictions.
Quick Start
Build from source
sudo apt install build-essential cmake pkg-config \
libsdl2-dev libsdl2-ttf-dev mesa-common-dev \
libgl1-mesa-dev libegl1-mesa-dev
# Build
cmake -B build -S simulation -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
# Run interactively
cd simulation
../build/3d_fluid_simulation_car --wind=1.0
Headless render
mkdir -p frames
../build/3d_fluid_simulation_car \
--wind=2.0 --duration=10 \
--model=assets/3d-files/ahmed_25deg_m.obj \
--output=frames --grid=256x128x128
Web frontend
cd website
npm ci && npm run dev
How It Works
Each frame runs five GPU compute passes in sequence:
- Collision -- relax distribution functions toward equilibrium (BGK or regularized operator, with optional Smagorinsky SGS turbulence model)
- Streaming -- propagate distributions to neighboring cells
- Force computation -- accumulate drag/lift via momentum exchange at solid boundaries
- Particle advection -- move tracer particles through the velocity field
- Rendering -- draw particles as points or streamline trails
The LBM solver uses a D3Q19 lattice (19 velocity directions in 3D). Solid geometry is voxelized from OBJ meshes using ray casting. Boundary conditions are Zou-He velocity inlet, Zou-He pressure outlet, and Bouzidi interpolated bounce-back on solid walls. The Bouzidi scheme stores the fractional distance from each fluid cell to the actual mesh surface along every lattice link, then applies quadratic or linear interpolation during streaming. This gives second-order wall accuracy without refining the grid.
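The two-branch Bouzidi interpolation described above can be sketched in plain C. This is an illustrative version only; the function and parameter names are invented here, and the real implementation lives in the streaming compute shader with q stored per lattice link.

```c
#include <assert.h>

/* Linear Bouzidi interpolated bounce-back for one lattice link.
 * q        : fractional distance from the fluid node to the wall (0 < q < 1)
 * f_i      : post-collision distribution leaving the node toward the wall
 * f_i_prev : same direction's post-collision value one cell upstream
 * f_opp    : post-collision distribution in the opposite direction at the node
 * Returns the reflected distribution entering the node after streaming. */
double bouzidi_linear(double q, double f_i, double f_i_prev, double f_opp) {
    if (q < 0.5) {
        /* Wall closer than the cell midpoint: interpolate between the node
         * and its upstream neighbor. */
        return 2.0 * q * f_i + (1.0 - 2.0 * q) * f_i_prev;
    }
    /* Wall beyond the midpoint: blend with the opposite-direction value. */
    return (1.0 / (2.0 * q)) * f_i
         + ((2.0 * q - 1.0) / (2.0 * q)) * f_opp;
}
```

Note that at q = 0.5 both branches reduce to plain bounce-back (the reflected value equals f_i), so the scheme degrades gracefully to the standard midpoint wall.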
Architecture
lattice/
simulation/
src/main.c # Render loop, CLI, GL setup
src/lbm.c # LBM grid creation, force readback
src/ml_predict.c # ML inference forward pass (pure C)
shaders/
lbm_collide.comp # BGK + Smagorinsky + BCs
lbm_stream.comp # Pull-based streaming
lbm_force.comp # Momentum exchange drag
particle.comp # Particle advection + collision
trail_update.comp # Streamline trail history
lib/ # Headers
assets/ # OBJ models, fonts
website/
src/app/page.tsx # Main simulator UI
src/app/api/ # Next.js API routes
src/components/ # React components
ml/
train.cpp # Training driver
data_gen.py # Modal sweep for training data
framework/ # C++ autodiff, layers, optimizer
CLI Reference
| Flag | Default | Description |
|---|---|---|
| -w, --wind=SPEED | 1.0 | Wind speed 0-5 (maps to Re) |
| -d, --duration=SECS | 0 | Headless render duration (0 = interactive) |
| -o, --output=PATH | | Output directory for PPM frames |
| -m, --model=PATH | car-model.obj | Path to OBJ mesh file |
| -a, --angle=DEG | | Ahmed body slant angle (25 or 35) |
| -g, --grid=XxYxZ | 128x64x64 | LBM grid resolution |
| -r, --reynolds=RE | 0 | Target Reynolds number (0 = derive from wind) |
| -s, --scale=S | 0.05 | Model scale factor |
| -S, --smagorinsky=CS | 0.1 | Smagorinsky constant (0-0.5) |
| -v, --viz=MODE | 1 | Visualization mode (0-7) |
| -c, --collision=MODE | 2 | Collision: 0=off, 1=AABB, 2=mesh, 3=voxel |
When --reynolds is not set, Re = 200 × windSpeed, capped by the grid's minimum stable τ (0.52). Larger grids support higher Re. Voxel collision (mode 3) requires LBM and falls back to AABB if LBM is disabled.
API Reference
The web frontend talks to a Modal GPU worker via a REST endpoint.
POST /api/render
Request body:
{
  "wind_speed": 1.0,
  "duration": 10,
  "model": "ahmed25",
  "viz_mode": 1,
  "collision_mode": 2,
  "reynolds": 0
}
Response:
{
  "status": "complete",
  "video_url": "https://...",
  "cd_value": 0.295,
  "cl_value": -0.012,
  "cd_series": [0.31, 0.30, 0.295],
  "effective_re": 480,
  "grid_size": "256x128x128"
}
The worker runs on an NVIDIA T4 GPU via Modal. Cold starts take 30-60 seconds; warm starts respond in ~15 seconds for a 10-second simulation.
Visualization Modes
| Mode | Key | Description |
|---|---|---|
| 0 | 3 | Depth -- distance-based coloring |
| 1 | 4 | Velocity magnitude -- jet colormap (blue to red) |
| 2 | 5 | Velocity direction -- RGB = normalized XYZ |
| 3 | 6 | Particle lifetime -- viridis colormap with fade |
| 4 | 7 | Turbulence -- cool-warm diverging map |
| 5 | 8 | Flow progress -- rainbow by x-position |
| 6 | 9 | Vorticity -- purple (laminar) to orange (turbulent) |
| 7 | V | Streamlines -- particle trails as line strips, 32-point history |
Press V to cycle through modes in interactive mode. Streamline mode (7) stores a 32-frame position trail per particle and renders as GL_LINE_STRIP with velocity-based coloring and alpha fade.
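As an illustration of mode 1's jet colormap, here is a minimal piecewise-linear approximation in C. The project's shader may use a different polynomial fit; the function names here are invented for the sketch.

```c
#include <assert.h>
#include <math.h>

static double clamp01(double x) { return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x); }

/* Piecewise-linear jet approximation: t=0 -> blue, t=0.5 -> green, t=1 -> red.
 * t is the velocity magnitude normalized to [0, 1]. */
double jet_r(double t) { return clamp01(1.5 - fabs(4.0 * t - 3.0)); }
double jet_g(double t) { return clamp01(1.5 - fabs(4.0 * t - 2.0)); }
double jet_b(double t) { return clamp01(1.5 - fabs(4.0 * t - 1.0)); }
```

Slow fluid renders blue, mid-range green, and fast fluid red, matching the blue-to-red description in the table.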
Physics Details
Governing Equations
Fluid flow is governed by the Navier-Stokes equations for incompressible flow:
∂u/∂t + (u·∇)u = −(1/ρ)∇p + ν∇²u,  ∇·u = 0
where u is the velocity field, p is pressure, ρ is density, and ν is kinematic viscosity. The Reynolds number Re = UL/ν characterizes the flow regime. For automotive aerodynamics at highway speeds, Re reaches 10⁶ or higher, indicating fully turbulent flow.
Lattice Boltzmann Method
Rather than solving Navier-Stokes directly, the LBM evolves particle distribution functions on a D3Q19 lattice (3 dimensions, 19 velocities): 1 rest, 6 face-connected, and 12 edge-connected neighbors. The BGK collision operator relaxes distributions toward equilibrium:
f_i(x + e_i Δt, t + Δt) = f_i(x, t) − (1/τ)[f_i(x, t) − f_i^eq(x, t)]
Equilibrium Distribution
The Maxwell-Boltzmann equilibrium distribution, truncated at second order in velocity, is:
f_i^eq = w_i ρ [1 + (e_i·u)/cs² + (e_i·u)²/(2cs⁴) − |u|²/(2cs²)]
with weights w0 = 1/3, w1-6 = 1/18, w7-18 = 1/36 and speed of sound cs² = 1/3.
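The equilibrium translates directly into code. A self-contained C sketch with the D3Q19 weights and velocity set (with cs² = 1/3, the coefficients become 3, 4.5, and 1.5); the helper names are illustrative:

```c
#include <assert.h>
#include <math.h>

/* D3Q19 weights: index 0 = rest, 1-6 = face neighbors, 7-18 = edge neighbors. */
static const double W[19] = {
    1.0/3.0,
    1.0/18.0, 1.0/18.0, 1.0/18.0, 1.0/18.0, 1.0/18.0, 1.0/18.0,
    1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0,
    1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0
};
static const int E[19][3] = {
    {0,0,0},
    {1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1},
    {1,1,0},{-1,-1,0},{1,-1,0},{-1,1,0},
    {1,0,1},{-1,0,-1},{1,0,-1},{-1,0,1},
    {0,1,1},{0,-1,-1},{0,1,-1},{0,-1,1}
};

/* Second-order Maxwell-Boltzmann equilibrium with cs^2 = 1/3. */
double feq(int i, double rho, double ux, double uy, double uz) {
    double eu = E[i][0]*ux + E[i][1]*uy + E[i][2]*uz;   /* e_i . u */
    double uu = ux*ux + uy*uy + uz*uz;                  /* u . u   */
    return W[i] * rho * (1.0 + 3.0*eu + 4.5*eu*eu - 1.5*uu);
}

/* Sum of all 19 equilibria; should recover rho exactly (mass conservation). */
double feq_sum(double rho, double ux, double uy, double uz) {
    double s = 0.0;
    for (int i = 0; i < 19; i++) s += feq(i, rho, ux, uy, uz);
    return s;
}
```

A quick sanity check: summing the 19 equilibria recovers ρ for any velocity, because the first-order term is odd in e_i and the quadratic terms cancel against the −1.5|u|² correction.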
Collision Kernel
#version 430 core
layout(local_size_x = 8, local_size_y = 8, local_size_z = 8) in;

uniform float tau;
uniform uint NX, NY;  // grid dimensions
// f[i] : per-direction distribution buffers; e[i] : D3Q19 velocity set.
// SSBO declarations and the equilibrium() helper are omitted for brevity.

void main() {
    uvec3 pos = gl_GlobalInvocationID;
    uint idx = pos.x + pos.y*NX + pos.z*NX*NY;

    // Moments: density and velocity from the 19 distributions
    float rho = 0.0;
    vec3 u = vec3(0.0);
    for (int i = 0; i < 19; i++) {
        float fi = f[i][idx];
        rho += fi;
        u += fi * e[i];
    }
    u /= rho;

    // BGK collision: relax each distribution toward equilibrium
    float omega = 1.0 / tau;
    for (int i = 0; i < 19; i++) {
        float feq = equilibrium(i, rho, u);
        f[i][idx] -= omega * (f[i][idx] - feq);
    }
}
Smagorinsky SGS Model
For turbulent flows, the Smagorinsky subgrid-scale model computes a local eddy viscosity from the non-equilibrium stress tensor:
ν_t = (Cs Δ)² |S̄|,   τ_eff = 3(ν + ν_t) + 1/2   (lattice units, Δ = 1)
where Cs is the Smagorinsky constant (default 0.1, tunable via --smagorinsky) and |S̄| is the local strain-rate magnitude, derived from |Π|, the Frobenius norm of the non-equilibrium stress. This increases τ locally in turbulent regions, preventing instability without damping the entire domain.
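In lattice units the viscosity-τ relation makes the effective relaxation time a one-liner. A hedged C sketch, taking the strain-rate magnitude as input rather than deriving it from |Π| (the constant folding in that step varies between formulations, so it is left to the shader):

```c
#include <assert.h>
#include <math.h>

/* Effective relaxation time with Smagorinsky eddy viscosity.
 * Lattice units (dx = dt = 1, cs^2 = 1/3). s_mag is the local strain-rate
 * magnitude |S|; cs_smag is the Smagorinsky constant (--smagorinsky).
 * Illustrative sketch, not the project's shader code. */
double tau_effective(double tau0, double cs_smag, double s_mag) {
    double nu0  = (tau0 - 0.5) / 3.0;          /* molecular viscosity      */
    double nu_t = cs_smag * cs_smag * s_mag;   /* (Cs*Delta)^2 |S|, Delta=1 */
    return 3.0 * (nu0 + nu_t) + 0.5;
}
```

With zero strain the eddy viscosity vanishes and τ_eff reduces to the base τ, which is why quiescent regions are left undamped.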
Drag Computation
Surface forces use the Mei-Luo-Shyy momentum exchange method, which pairs with the Bouzidi boundary to account for the actual wall position rather than assuming it sits on the grid midpoint:
F = Σ over boundary links of e_i [f_i*(x_f, t) + f_ī(x_f, t + Δt)]
where f_i* is the post-collision distribution heading toward the wall and f_ī is the Bouzidi-reflected distribution coming back. The drag and lift coefficients are then:
Cd = 2 F_drag / (ρ U² A),   Cl = 2 F_lift / (ρ U² A)
where A is the projected frontal area of the body.
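The coefficient normalization is the same for drag and lift, so a single helper covers both. A minimal C sketch (names are illustrative, not the simulator's actual API; the axis convention x = streamwise, z = vertical is an assumption):

```c
#include <assert.h>

/* Force coefficient C = 2F / (rho * U^2 * A).
 * Pass the streamwise force component for Cd, the vertical one for Cl.
 * rho   : freestream density
 * u_inf : freestream speed
 * area  : projected frontal area of the body */
double force_coefficient(double force, double rho, double u_inf, double area) {
    return 2.0 * force / (rho * u_inf * u_inf * area);
}
```

For example, a force of 1.0 at ρ = 1, U = 2, A = 0.5 gives a coefficient of exactly 1.0.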
ML Surrogate Model
A small MLP predicts Cd and Cl in under a millisecond, giving instant estimates before the LBM has time to converge. The entire training framework is written from scratch in C++ with reverse-mode autodiff, so there are no Python ML dependencies at inference time. The same model runs in the C simulation binary, in the browser via TypeScript, and in the evaluation pipeline.
Architecture
The network takes three z-score normalized inputs (wind speed, Reynolds number, model ID) and outputs two values (Cd, Cl). SwiGLU activations replace standard ReLU for smoother gradient flow.
Linear(3, 256) -> SwiGLU(256, 512) -> Linear(256, 128) -> SwiGLU(128, 256) -> Linear(128, 2)
525,698 parameters total
fc1: 3 x 256 + bias = 1,024
act1: SwiGLU(256, 512) = 393,216 (gate + up + down projections)
fc2: 256 x 128 + bias = 32,896
act2: SwiGLU(128, 256) = 98,304
fc3: 128 x 2 + bias = 258
Training
Training data comes from the LBM simulation itself. A parameter sweep over wind speeds, Reynolds numbers, and model geometries generates (input, Cd, Cl) pairs. The data generator (data_gen.py) submits render jobs to Modal, validates convergence, and writes a binary dataset.
Optimizer: AdamW (lr=1e-3, weight_decay=1e-4, clip_norm=1.0)
Loss: MSE on [Cd, Cl]
Batch size: 64
Epochs: 500 (early stopping, patience=50)
Train/val: 80/20 split
Data format: binary -- 16-byte header + N records of 5 float32s
[wind_speed, reynolds, model_id, cd, cl]
Weight Format
Weights are stored in a custom binary format (LTWS) shared across all three inference targets. The normalizer file stores 3 means and 3 standard deviations as raw float32s.
model.bin: LTWS header (magic + version + count) + 12 parameter tensors
model_norm.bin: 6 x float32 (mean[3], std[3])
Inference
The C simulation loads the weights at startup and prints an instant Cd/Cl estimate before the LBM loop begins. The browser loads the same files from /models/ and runs a pure TypeScript forward pass with no dependencies. Both paths take under 1ms.
# Generate training data
cd ml && python data_gen.py --config sweep_config.yaml --endpoint $MODAL_URL
# Train the model
cmake -B build && cmake --build build && ./build/train dataset/training_data.bin
# Evaluate
python evaluate.py --weights model.bin --norm model_norm.bin --data dataset/training_data.bin
Validation
The solver uses a sphere drag test to validate the force computation pipeline (momentum exchange, Bouzidi boundary, Cd formula). At Re=100 with 16 cells across the sphere diameter, the measured Cd falls within the expected range from Clift, Grace & Weber (1978).
Sphere reference test
| Re | Grid | Cd (sim) | Cd (ref) | Status |
|---|---|---|---|---|
| 100 | 128×64×64 | 1.1 – 1.3 | 1.09 | Pass (30% tol) |
Why Cd differs from wind tunnel data
Published Ahmed body data (Ahmed 1984, Lienhart 2003) is measured at Re > 500,000 where the flow is fully turbulent. The LBM solver on typical grids (128-256 cells) operates at Re = 50-200 where the flow is laminar. At laminar Re the boundary layer is thicker, separation occurs earlier, and Cd is 3-10x higher than at turbulent Re. This is physically correct behavior, not a bug.
The Smagorinsky SGS turbulence model adds local eddy viscosity that partially compensates, but matching high-Re experimental data requires grids with hundreds of cells across the body -- beyond what consumer GPUs can handle in real time.
Grid size and Reynolds number
The achievable Reynolds number is limited by the grid resolution. In lattice units the viscosity is ν = (τ − 0.5)/3, and τ must stay above 0.52 for numerical stability, which caps Re = U·L/ν for a given body size L in cells. This relationship determines the maximum Re for each grid:
| Grid | Body cells | Re (max) | Cd range | GPU VRAM |
|---|---|---|---|---|
| 128×64×64 | ~38 | ~115 | 2 – 5 | 86 MB |
| 128×96×96 | ~38 | ~115 | 2 – 5 | 240 MB |
| 256×128×128 | ~77 | ~230 | 1 – 3 | 960 MB |
| 512×256×256 | ~153 | ~460 | 0.5 – 2 | 7.6 GB |
Performance
Performance depends on grid size and GPU. The LBM kernel is memory-bandwidth-bound: each cell reads and writes 19 float32 distributions per step. Typical throughput on modern GPUs:
| GPU | Grid | Steps/sec | MLUPS |
|---|---|---|---|
| A10G (Modal) | 128×96×96 | ~400 | ~470 |
| A100 40GB (Modal) | 256×128×128 | ~100 | ~420 |
| RTX 3070 | 128×64×64 | ~800 | ~420 |
| RTX 4090 | 256×128×128 | ~200 | ~840 |
MLUPS = Million Lattice Updates Per Second. Memory usage per cell is approximately 200 bytes (19 distributions × 2 buffers + velocity + solid + Bouzidi q). A 256×128×128 grid needs about 960 MB of GPU buffer space.
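The MLUPS figures follow directly from grid size and step rate, e.g. the RTX 3070 row: 128×64×64 = 524,288 cells at ~800 steps/sec gives ~420 MLUPS. A trivial C helper:

```c
#include <assert.h>
#include <math.h>

/* Million Lattice Updates Per Second: every cell updates once per step. */
double mlups(int nx, int ny, int nz, double steps_per_sec) {
    return (double)nx * ny * nz * steps_per_sec / 1.0e6;
}
```

The same arithmetic reproduces the RTX 4090 row: 256×128×128 cells at ~200 steps/sec is roughly 840 MLUPS.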
Convergence time
The drag coefficient needs 3-5 flow-throughs to converge. A flow-through is the time for a fluid element to traverse the domain: N_x / U_lattice steps. For a 256-cell domain at U = 0.05, one flow-through takes 5,120 steps. The simulation reports an exponential moving average (EMA) of Cd that stabilizes after approximately 50 samples.
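The flow-through count above is simple enough to express as a helper; this sketch just encodes N_x / U_lattice (function name is illustrative):

```c
#include <assert.h>
#include <math.h>

/* Steps per flow-through: domain length in cells over lattice velocity.
 * E.g. a 256-cell domain at U = 0.05 needs 5,120 steps per flow-through,
 * so 3-5 flow-throughs is 15k-26k steps before Cd is trustworthy. */
double flow_through_steps(int nx, double u_lattice) {
    return (double)nx / u_lattice;
}
```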
Contributing
Lattice is open source under the MIT license. Contributions are welcome -- see CONTRIBUTING.md for the full style guide, testing, and PR process.
Code style
The simulation is C11, formatted with clang-format (LLVM style, 80 char limit). The website is TypeScript/React with Tailwind. Tests run in CI via GitHub Actions.
Commit sign-off
Every commit must carry a sign-off line certifying the Developer Certificate of Origin. Use git commit -s to add it automatically:
git commit -s -m "lbm: fix bounce-back at concave corners"
# lbm: fix bounce-back at concave corners
#
# Signed-off-by: Your Name <your@email.com>
Running tests
cd simulation && xvfb-run ../build/test_lbm # LBM unit tests
xvfb-run bash test/test_integration.sh # Integration test
cd website && npm test # Frontend tests
cd ml/framework && ./build/test_ml # ML framework tests