Performance and Benchmarking

This section describes the performance characteristics of the wildfire level-set solver and provides guidance on benchmarking, profiling, and optimization.

Overview 

The wildfire level-set solver is designed for high-performance wildfire simulations with scalable algorithms and efficient data structures. Performance depends on:

Grid resolution: Computational cost scales as O(N²) in 2D, O(N³) in 3D
Fire spread model: Physics-based models (Balbi, Lautenberger) are more expensive than empirical models (Rothermel, FBP)
Propagation method: FARSITE elliptical expansion is faster than level-set advection for coarse grids
Feature complexity: Spotting, crown fire, radiation preheating add overhead

Timing Benchmarks 

The solver includes automated timing benchmarks in regtest/misc/timing_benchmark/ to track performance across code versions and detect regressions.

Running Benchmarks 

From the build directory:

cd build

# Default benchmarks (level-set and FARSITE)
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
    --exe ./levelset --dim 2 --nsteps 30

# All fire spread models
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
    --exe ./levelset --dim 2 --nsteps 30 \
    --scenarios levelset farsite balbi cruz_crown cheney_gould fbp_o1a lautenberger

# Custom resolutions
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
    --exe ./levelset --dim 2 --nsteps 50 \
    --resolutions 64 128 256 512

Benchmark Output 

The benchmark script generates:

Console output: Real-time progress and summary table
timing_results.csv: Detailed timing data with columns:
- scenario: Fire spread model name
- n_cells: Grid cells per dimension
- total_cells: Total grid cells (n_cells^dim)
- wall_time_s: Wall-clock execution time
- nsteps: Number of timesteps executed
- steps_per_second: Throughput metric
- cells_per_step_per_s: Overall performance metric

Example output:

Scenario        N  Cells  Time(s)  Steps/s  Cells/step/s
----------------------------------------------------------------
levelset       32   1,024      1.23    24.4      24,976
levelset       64   4,096      4.82    6.2      25,393
levelset      128  16,384     19.45    1.5      24,784
levelset      256  65,536     78.12    0.4      25,042
farsite        32   1,024      0.95    31.6      32,375
farsite        64   4,096      3.54    8.5      34,790
...

Scaling Analysis 

The benchmark computes an empirical scaling exponent α where T ∝ N^α:

Expected range: [0.8·dim, 1.6·dim]
Ideal scaling: α = dim (linear in total cells)
Sub-linear scaling (α < dim): Cache efficiency, vectorization
Super-linear scaling (α > dim): Memory bandwidth limitations

Example:

Estimated scaling exponent α = 2.03  (expected [1.6, 3.2])  ✓

Fire Spread Model Performance 

Relative computational costs (normalized to Rothermel level-set):

Fire Spread Model	Relative Cost	Notes
Rothermel (level-set)	1.0× (baseline)	WENO5-Z + RK3 advection
FARSITE ellipse	0.7-0.9×	Faster for coarse grids
Balbi physics-based	1.2-1.5×	Additional radiation terms
Cruz crown fire	0.9-1.1×	Simple algebraic model
Cheney-Gould grassfire	0.8-1.0×	Piecewise-linear formula
FBP (Canadian)	0.9-1.1×	Lookup tables, efficient
Lautenberger semi-phys	1.1-1.4×	Physics-based coefficients

Note: Costs vary with grid resolution, timestep size, and feature configuration.

Feature Overhead 

Performance impact of optional features (% overhead relative to baseline):

Feature	Overhead	Notes
Albini spotting	5-15%	Depends on spot frequency
Ember accumulation	2-5%	Additional field updates
Radiation preheating	10-20%	View factor calculations
Crown fire (Van Wagner)	3-8%	Initiation criteria checks
Bulk fuel consumption	5-10%	Post-frontal burnout tracking
FMC phenology	1-3%	Seasonal moisture updates
McArthur moisture scaling	<2%	Temperature/RH modulation
Periodic wind gusts	<1%	Sinusoidal wind updates

Recommendation: Enable only features needed for your simulation to minimize overhead.

Optimization Guidelines 

Grid Resolution 

Choose grid resolution based on:

Feature scale: Cell size should resolve fire perimeter features (typically 5-50 m)
Computational budget: Runtime scales as O(N²) in 2D
Accuracy requirements: Finer grids reduce numerical diffusion

Example: For 1 km × 1 km domain:

32×32 grid (31 m cells): Fast, coarse features only
64×64 grid (16 m cells): Good balance for most applications
128×128 grid (8 m cells): High accuracy, slower
256×256 grid (4 m cells): Very high accuracy, expensive

Timestep Selection 

The CFL condition limits timestep size:

Δt ≤ CFL × Δx / max(ROS)

Recommendations:

CFL = 0.5: Safe default, stable for all scenarios
CFL = 0.7-0.9: Faster, use for well-tested scenarios
CFL < 0.5: Use if stability issues occur

Reinitialization Frequency 

Level-set reinitialization maintains the signed-distance property:

reinit_int = -1: Never reinitialize (fastest, may drift over time)
reinit_int = 10-20: Good balance for most scenarios
reinit_int = 5: Frequent reinitialization (most accurate, slower)

Cost: Reinitialization typically adds 20-30% overhead per occurrence.

Plotfile Output 

Plotfile I/O can dominate runtime for large grids and frequent output:

Disable during benchmarking: plot_int = -1
Minimize output frequency: plot_int = 20 or higher
Reduce output fields (future feature)

Example: 256×256 grid, plotfile writing can take 1-5 seconds per output.

Parallelization 

Current Status 

The solver is primarily serial with some parallelization:

AMReX-level parallelism: MPI domain decomposition (future)
Thread-level parallelism: OpenMP for some kernels (partial)

Future work: Full MPI+OpenMP hybrid parallelization for HPC systems.

Memory Usage 

Approximate memory requirements (per grid cell, 2D):

Base solver: ~100-200 bytes/cell (level-set φ, ROS, intensity, etc.)
With spotting: +50-100 bytes/cell (ember density, spot tracking)
With crown fire: +50 bytes/cell (crown fuel fields)
AMReX overhead: ~50 bytes/cell (metadata, ghost cells)

Example: 256×256 grid (65,536 cells):

Base: ~13-20 MB
Full features: ~20-30 MB

3D scaling: Memory scales as O(N³), limit 3D grids to <100³ cells without HPC resources.

Profiling 

AMReX Built-in Profiling 

AMReX includes performance profiling tools. Build with:

cmake -S . -B build -DAMReX_TINY_PROFILE=ON

Run simulation, profiling output appears in console:

TinyProfiler total time across processes [min...avg...max]
LevelSet::Advect3D        [0.123, 0.125, 0.128]
LevelSet::Reinit          [0.045, 0.046, 0.047]
Rothermel::ComputeROS     [0.089, 0.090, 0.091]
...

External Profiling Tools 

Linux perf:

perf record -g ./levelset inputs.i
perf report

gprof (build with -pg):

gfortran -pg ...  # or gcc -pg
./levelset inputs.i
gprof levelset gmon.out > profile.txt

Valgrind callgrind:

valgrind --tool=callgrind ./levelset inputs.i
kcachegrind callgrind.out.*

GPU Acceleration 

Status: GPU support via AMReX GPU backends is under development.

Future capabilities:

CUDA/HIP backends for NVIDIA/AMD GPUs
GPU-accelerated fire spread kernels
Hybrid CPU/GPU execution

Estimated speedup: 5-20× for large grids (>128² cells) on modern GPUs.

Note: GPU scaling benchmarks are deferred pending infrastructure development.

Best Practices 

For Performance-Critical Simulations 

Profile first: Identify actual bottlenecks before optimizing
Right-size grid: Balance accuracy and performance
Disable unused features: Turn off spotting, crown fire if not needed
Reduce plotfile frequency: I/O can dominate runtime
Use coarser propagation: FARSITE can be faster than level-set for operational forecasts

For Accuracy-Critical Simulations 

Grid convergence study: Test multiple resolutions
Enable reinitialization: reinit_int = 10 or less
Conservative CFL: cfl = 0.5
Full physics: Enable all relevant features

For Development and Testing 

Small grids: 32×32 or 64×64 for rapid iteration
Short runs: nsteps = 50 or final_time = 300
Disable I/O: plot_int = -1
Single model: Test one fire spread model at a time

Performance Reporting 

When reporting performance issues or regressions:

System specs: CPU model, RAM, compiler version
Build configuration: Debug/Release, compiler flags
Test case: Input file or scenario description
Timing data: timing_results.csv from benchmark script
Profiling data: If available

Example issue report:

System: Intel i9-12900K, 64 GB RAM, GCC 11.3, Ubuntu 22.04
Build: Release mode, -DCMAKE_BUILD_TYPE=Release
Test: 256×256 Rothermel level-set, 100 timesteps
Performance: 0.45 steps/s (expected: 0.6 steps/s based on v1.0)
Regression: 25% slower than previous version

References 

AMReX documentation: https://amrex-codes.github.io/amrex/docs_html/
WENO schemes: Jiang & Shu (1996), “Efficient Implementation of Weighted ENO Schemes”
Level-set methods: Osher & Fedkiw (2003), “Level Set Methods and Dynamic Implicit Surfaces”