Performance and Benchmarking
This section describes the performance characteristics of the wildfire level-set solver and provides guidance on benchmarking, profiling, and optimization.
Overview
The wildfire level-set solver is designed for high-performance wildfire simulations with scalable algorithms and efficient data structures. Performance depends on:
Grid resolution: Computational cost scales as O(N²) in 2D, O(N³) in 3D
Fire spread model: Physics-based models (Balbi, Lautenberger) are more expensive than empirical models (Rothermel, FBP)
Propagation method: FARSITE elliptical expansion is faster than level-set advection for coarse grids
Feature complexity: Spotting, crown fire, radiation preheating add overhead
Timing Benchmarks
The solver includes automated timing benchmarks in regtest/misc/timing_benchmark/ to track performance across code versions and detect regressions.
Running Benchmarks
From the build directory:
cd build
# Default benchmarks (level-set and FARSITE)
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
--exe ./levelset --dim 2 --nsteps 30
# All fire spread models
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
--exe ./levelset --dim 2 --nsteps 30 \
--scenarios levelset farsite balbi cruz_crown cheney_gould fbp_o1a lautenberger
# Custom resolutions
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
--exe ./levelset --dim 2 --nsteps 50 \
--resolutions 64 128 256 512
Benchmark Output
The benchmark script generates:
Console output: Real-time progress and summary table
timing_results.csv: Detailed timing data with columns:
scenario: Fire spread model name
n_cells: Grid cells per dimension
total_cells: Total grid cells (n_cells^dim)
wall_time_s: Wall-clock execution time
nsteps: Number of timesteps executed
steps_per_second: Throughput metric
cells_per_step_per_s: Overall performance metric
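The CSV can be post-processed with the Python standard library. The sketch below assumes the column names listed above appear verbatim in the header row. The sample rows are copied from the example output in this section rather than from a real run, and `load_timings` and `cell_updates_per_second` are illustrative helpers, not part of the benchmark script. The derived metric (total cells × steps / wall time) is an assumption that roughly matches the Cells/step/s column of the example output.

```python
import csv
import io

# Sample data in the assumed timing_results.csv layout (values are illustrative).
SAMPLE_CSV = """scenario,n_cells,total_cells,wall_time_s,nsteps,steps_per_second,cells_per_step_per_s
levelset,32,1024,1.23,30,24.4,24976
levelset,64,4096,4.82,30,6.2,25393
farsite,32,1024,0.95,30,31.6,32375
"""

def load_timings(fileobj):
    """Read benchmark rows and convert the numeric columns from strings."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "scenario": row["scenario"],
            "n_cells": int(row["n_cells"]),
            "total_cells": int(row["total_cells"]),
            "wall_time_s": float(row["wall_time_s"]),
            "nsteps": int(row["nsteps"]),
        })
    return rows

def cell_updates_per_second(row):
    """Cells advanced per wall-clock second: total_cells * nsteps / wall_time_s."""
    return row["total_cells"] * row["nsteps"] / row["wall_time_s"]

rows = load_timings(io.StringIO(SAMPLE_CSV))
for r in rows:
    rate = cell_updates_per_second(r)
    print(f'{r["scenario"]:>10} N={r["n_cells"]:<4} {rate:,.0f} cell updates/s')
```

For a real run, replace the `io.StringIO(...)` argument with `open("timing_results.csv", newline="")`.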
Example output:
Scenario N Cells Time(s) Steps/s Cells/step/s
----------------------------------------------------------------
levelset 32 1,024 1.23 24.4 24,976
levelset 64 4,096 4.82 6.2 25,393
levelset 128 16,384 19.45 1.5 24,784
levelset 256 65,536 78.12 0.4 25,042
farsite 32 1,024 0.95 31.6 32,375
farsite 64 4,096 3.54 8.5 34,790
...
Scaling Analysis
The benchmark computes an empirical scaling exponent α, where wall time T ∝ N^α and N is the number of cells per dimension:
Expected range: [0.8·dim, 1.6·dim]
Ideal scaling: α = dim (linear in total cells)
Sub-linear scaling (α < dim): Cache efficiency, vectorization
Super-linear scaling (α > dim): Memory bandwidth limitations
Example:
Estimated scaling exponent α = 2.03 (expected [1.6, 3.2]) ✓
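One common way to estimate such an exponent is a least-squares fit of log T against log N. The sketch below uses that approach on the level-set timings from the example output; it is an assumption about how α could be computed, not a description of what run_benchmark.py does internally, and both function names are hypothetical.

```python
import math

def scaling_exponent(n_cells, wall_times):
    """Least-squares slope of log(T) versus log(N)."""
    xs = [math.log(n) for n in n_cells]
    ys = [math.log(t) for t in wall_times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def in_expected_range(alpha, dim):
    """Check alpha against the documented range [0.8*dim, 1.6*dim]."""
    return 0.8 * dim <= alpha <= 1.6 * dim

# Level-set timings from the example output (2D, 30 steps each).
n_cells = [32, 64, 128, 256]
wall_times = [1.23, 4.82, 19.45, 78.12]

alpha = scaling_exponent(n_cells, wall_times)
print(f"alpha = {alpha:.2f}, within expected range: {in_expected_range(alpha, dim=2)}")
```

For these sample timings the fitted exponent comes out close to 2, which is the ideal value for a 2D run.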
Fire Spread Model Performance
Relative computational costs (normalized to Rothermel level-set):
| Fire Spread Model | Relative Cost | Notes |
|---|---|---|
| Rothermel (level-set) | 1.0× (baseline) | WENO5-Z + RK3 advection |
| FARSITE ellipse | 0.7-0.9× | Faster for coarse grids |
| Balbi physics-based | 1.2-1.5× | Additional radiation terms |
| Cruz crown fire | 0.9-1.1× | Simple algebraic model |
| Cheney-Gould grassfire | 0.8-1.0× | Piecewise-linear formula |
| FBP (Canadian) | 0.9-1.1× | Lookup tables, efficient |
| Lautenberger semi-phys | 1.1-1.4× | Physics-based coefficients |
Note: Costs vary with grid resolution, timestep size, and feature configuration.
Feature Overhead
Performance impact of optional features (% overhead relative to baseline):
| Feature | Overhead | Notes |
|---|---|---|
| Albini spotting | 5-15% | Depends on spot frequency |
| Ember accumulation | 2-5% | Additional field updates |
| Radiation preheating | 10-20% | View factor calculations |
| Crown fire (Van Wagner) | 3-8% | Initiation criteria checks |
| Bulk fuel consumption | 5-10% | Post-frontal burnout tracking |
| FMC phenology | 1-3% | Seasonal moisture updates |
| McArthur moisture scaling | <2% | Temperature/RH modulation |
| Periodic wind gusts | <1% | Sinusoidal wind updates |
Recommendation: Enable only features needed for your simulation to minimize overhead.
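To get a rough budget for a given feature set, the ranges in the table can be summed. The sketch below assumes the overheads are independent and additive, which this section does not state, so treat the result as a first-order estimate only. The dictionary keys are hypothetical labels, not solver input parameters.

```python
# Overhead ranges in percent, copied from the table above.
# "<2%" and "<1%" are stored as (0, 2) and (0, 1).
FEATURE_OVERHEAD_PCT = {
    "albini_spotting": (5, 15),
    "ember_accumulation": (2, 5),
    "radiation_preheating": (10, 20),
    "crown_fire": (3, 8),
    "bulk_fuel_consumption": (5, 10),
    "fmc_phenology": (1, 3),
    "mcarthur_moisture": (0, 2),
    "wind_gusts": (0, 1),
}

def combined_overhead_pct(features):
    """Sum the low and high ends of each enabled feature's range (additive assumption)."""
    low = sum(FEATURE_OVERHEAD_PCT[f][0] for f in features)
    high = sum(FEATURE_OVERHEAD_PCT[f][1] for f in features)
    return low, high

low, high = combined_overhead_pct(["albini_spotting", "radiation_preheating"])
print(f"Estimated overhead: {low}-{high}% over baseline")
```

Measured timings from the benchmark script remain the reliable way to confirm the actual cost of a configuration.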
Optimization Guidelines
Grid Resolution
Choose grid resolution based on:
Feature scale: Cell size should resolve fire perimeter features (typically 5-50 m)
Computational budget: Runtime scales as O(N²) in 2D
Accuracy requirements: Finer grids reduce numerical diffusion
Example: For 1 km × 1 km domain:
32×32 grid (31 m cells): Fast, coarse features only
64×64 grid (16 m cells): Good balance for most applications
128×128 grid (8 m cells): High accuracy, slower
256×256 grid (4 m cells): Very high accuracy, expensive
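The cell sizes above follow from Δx = L / N, and the per-step cost from the O(N^dim) scaling. A minimal sketch that reproduces these numbers for a square domain is shown here; the function names are illustrative.

```python
def cell_size_m(domain_length_m, n_cells):
    """Cell edge length for a domain split into n_cells per dimension."""
    return domain_length_m / n_cells

def relative_step_cost(n_cells, reference_n=32, dim=2):
    """Per-step cost relative to a reference grid, assuming O(N^dim) scaling."""
    return (n_cells / reference_n) ** dim

for n in (32, 64, 128, 256):
    dx = cell_size_m(1000.0, n)       # 1 km domain
    cost = relative_step_cost(n)      # relative to the 32x32 grid
    print(f"{n:>3}x{n:<3} dx = {dx:5.1f} m   cost per step = {cost:3.0f}x")
```

Because the CFL limit ties Δt to Δx, covering a fixed simulated time at a fixed CFL number also takes proportionally more steps on finer grids, so total runtime can grow faster than the per-step figure.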
Timestep Selection
The CFL condition limits timestep size:
Δt ≤ CFL × Δx / max(ROS)
Recommendations:
CFL = 0.5: Safe default, stable for all scenarios
CFL = 0.7-0.9: Faster, use for well-tested scenarios
CFL < 0.5: Use if stability issues occur
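As a worked example of the CFL limit, the sketch below computes the largest stable timestep and the number of steps needed to reach a given simulated time. The function names and the peak rate of spread of 0.5 m/s are illustrative assumptions, not values taken from the solver.

```python
import math

def max_stable_dt(cfl, dx_m, max_ros_m_per_s):
    """Largest timestep allowed by dt <= CFL * dx / max(ROS), in seconds."""
    if max_ros_m_per_s <= 0.0:
        raise ValueError("max_ros_m_per_s must be positive")
    return cfl * dx_m / max_ros_m_per_s

def steps_needed(final_time_s, dt_s):
    """Number of steps of size dt_s required to reach final_time_s."""
    return math.ceil(final_time_s / dt_s)

dx = 1000.0 / 64   # 64x64 grid on a 1 km domain
max_ros = 0.5      # assumed peak rate of spread in m/s (hypothetical)

for cfl in (0.5, 0.9):
    dt = max_stable_dt(cfl, dx, max_ros)
    print(f"CFL={cfl}: dt <= {dt:.2f} s, {steps_needed(300.0, dt)} steps for 300 s")
```

Raising the CFL number reduces the step count roughly in proportion, which is why the larger values are attractive for scenarios already known to be stable.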
Reinitialization Frequency
Level-set reinitialization maintains the signed-distance property:
reinit_int = -1: Never reinitialize (fastest, may drift over time)
reinit_int = 10-20: Good balance for most scenarios
reinit_int = 5: Frequent reinitialization (most accurate, slower)
Cost: Reinitialization typically adds 20-30% overhead per occurrence.
Plotfile Output
Plotfile I/O can dominate runtime for large grids and frequent output:
Disable during benchmarking: plot_int = -1
Minimize output frequency: plot_int = 20 or higher
Reduce output fields (future feature)
Example: On a 256×256 grid, plotfile writing can take 1-5 seconds per output.
Parallelization
Current Status
The solver is primarily serial with some parallelization:
AMReX-level parallelism: MPI domain decomposition (future)
Thread-level parallelism: OpenMP for some kernels (partial)
Future work: Full MPI+OpenMP hybrid parallelization for HPC systems.
Memory Usage
Approximate memory requirements (per grid cell, 2D):
Base solver: ~100-200 bytes/cell (level-set φ, ROS, intensity, etc.)
With spotting: +50-100 bytes/cell (ember density, spot tracking)
With crown fire: +50 bytes/cell (crown fuel fields)
AMReX overhead: ~50 bytes/cell (metadata, ghost cells)
Example: 256×256 grid (65,536 cells):
Base (including AMReX overhead): ~10-16 MB
Full features: ~16-26 MB
3D scaling: Memory scales as O(N³); limit 3D grids to fewer than 100³ cells without HPC resources.
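A quick estimate for other grid sizes can be made by summing the per-cell figures listed above and multiplying by the cell count. The sketch below does this for a 2D grid; the constant and function names are illustrative, and 1 MB is taken as 10^6 bytes.

```python
# Per-cell memory ranges in bytes (low, high), taken from the list above.
BASE = (100, 200)
SPOTTING = (50, 100)
CROWN_FIRE = (50, 50)
AMREX_OVERHEAD = (50, 50)

def memory_mb(n_cells, dim, bytes_per_cell):
    """Memory in MB (10^6 bytes) for n_cells per dimension."""
    return n_cells ** dim * bytes_per_cell / 1e6

def estimate_range_mb(n_cells, dim, components):
    """Sum the low and high per-cell figures, then scale by the cell count."""
    low = sum(c[0] for c in components)
    high = sum(c[1] for c in components)
    return memory_mb(n_cells, dim, low), memory_mb(n_cells, dim, high)

base = estimate_range_mb(256, 2, [BASE, AMREX_OVERHEAD])
full = estimate_range_mb(256, 2, [BASE, SPOTTING, CROWN_FIRE, AMREX_OVERHEAD])
print(f"Base: {base[0]:.1f}-{base[1]:.1f} MB")
print(f"Full features: {full[0]:.1f}-{full[1]:.1f} MB")
```

The per-cell figures are quoted for 2D; reusing them for a 3D estimate is an additional assumption.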
Profiling
AMReX Built-in Profiling
AMReX includes performance profiling tools. Build with:
cmake -S . -B build -DAMReX_TINY_PROFILE=ON
Run the simulation; profiling output appears in the console:
TinyProfiler total time across processes [min...avg...max]
LevelSet::Advect3D [0.123, 0.125, 0.128]
LevelSet::Reinit [0.045, 0.046, 0.047]
Rothermel::ComputeROS [0.089, 0.090, 0.091]
...
External Profiling Tools
Linux perf:
perf record -g ./levelset inputs.i
perf report
gprof (build with -pg):
gfortran -pg ... # or gcc -pg
./levelset inputs.i
gprof levelset gmon.out > profile.txt
Valgrind callgrind:
valgrind --tool=callgrind ./levelset inputs.i
kcachegrind callgrind.out.*
GPU Acceleration
Status: GPU support via AMReX GPU backends is under development.
Future capabilities:
CUDA/HIP backends for NVIDIA/AMD GPUs
GPU-accelerated fire spread kernels
Hybrid CPU/GPU execution
Estimated speedup: 5-20× for large grids (>128² cells) on modern GPUs.
Note: GPU scaling benchmarks are deferred pending infrastructure development.
Best Practices
For Performance-Critical Simulations
Profile first: Identify actual bottlenecks before optimizing
Right-size grid: Balance accuracy and performance
Disable unused features: Turn off spotting, crown fire if not needed
Reduce plotfile frequency: I/O can dominate runtime
Use a cheaper propagation method: FARSITE can be faster than level-set for operational forecasts
For Accuracy-Critical Simulations
Grid convergence study: Test multiple resolutions
Enable reinitialization: reinit_int = 10 or less
Conservative CFL: cfl = 0.5
Full physics: Enable all relevant features
For Development and Testing
Small grids: 32×32 or 64×64 for rapid iteration
Short runs: nsteps = 50 or final_time = 300
Disable I/O: plot_int = -1
Single model: Test one fire spread model at a time
Performance Reporting
When reporting performance issues or regressions:
System specs: CPU model, RAM, compiler version
Build configuration: Debug/Release, compiler flags
Test case: Input file or scenario description
Timing data: timing_results.csv from benchmark script
Profiling data: If available
Example issue report:
System: Intel i9-12900K, 64 GB RAM, GCC 11.3, Ubuntu 22.04
Build: Release mode, -DCMAKE_BUILD_TYPE=Release
Test: 256×256 Rothermel level-set, 100 timesteps
Performance: 0.45 steps/s (expected: 0.6 steps/s based on v1.0)
Regression: 25% lower throughput than previous version
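When quoting a regression, it helps to say whether the percentage refers to throughput or to time per step, since the two differ. The sketch below computes both from steps/s figures such as those in the report above; the function names are illustrative.

```python
def throughput_change_pct(baseline_steps_per_s, current_steps_per_s):
    """Percent change in steps/s relative to the baseline (negative = slower)."""
    return 100.0 * (current_steps_per_s - baseline_steps_per_s) / baseline_steps_per_s

def time_per_step_change_pct(baseline_steps_per_s, current_steps_per_s):
    """Percent change in wall time per step (time per step = 1 / throughput)."""
    return 100.0 * (baseline_steps_per_s / current_steps_per_s - 1.0)

baseline, current = 0.6, 0.45   # steps/s from the example report
print(f"Throughput change: {throughput_change_pct(baseline, current):+.1f}%")
print(f"Time per step change: {time_per_step_change_pct(baseline, current):+.1f}%")
```

For the example figures, a 25% drop in throughput corresponds to roughly a one-third increase in time per step.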
References
AMReX documentation: https://amrex-codes.github.io/amrex/docs_html/
WENO schemes: Jiang & Shu (1996), “Efficient Implementation of Weighted ENO Schemes”
Level-set methods: Osher & Fedkiw (2003), “Level Set Methods and Dynamic Implicit Surfaces”
See Also
Regression Tests - Regression test suite including timing benchmarks
Building the Code - Build configuration options for performance
Usage Guide - Runtime parameters affecting performance