Performance and Benchmarking

This section describes the performance characteristics of the wildfire level-set solver and provides guidance on benchmarking, profiling, and optimization.

Overview

The wildfire level-set solver is designed for high-performance wildfire simulations with scalable algorithms and efficient data structures. Performance depends on:

  • Grid resolution: Computational cost per timestep scales as O(N²) in 2D and O(N³) in 3D, where N is the number of cells per dimension

  • Fire spread model: Physics-based models (Balbi, Lautenberger) are more expensive than empirical models (Rothermel, FBP)

  • Propagation method: FARSITE elliptical expansion is faster than level-set advection for coarse grids

  • Feature complexity: Spotting, crown fire, and radiation preheating add overhead

Timing Benchmarks

The solver includes automated timing benchmarks in regtest/misc/timing_benchmark/ to track performance across code versions and detect regressions.

Running Benchmarks

From the project root, enter the build directory and run:

cd build

# Default benchmarks (level-set and FARSITE)
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
    --exe ./levelset --dim 2 --nsteps 30

# All fire spread models
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
    --exe ./levelset --dim 2 --nsteps 30 \
    --scenarios levelset farsite balbi cruz_crown cheney_gould fbp_o1a lautenberger

# Custom resolutions
python3 ../regtest/misc/timing_benchmark/run_benchmark.py \
    --exe ./levelset --dim 2 --nsteps 50 \
    --resolutions 64 128 256 512

Benchmark Output

The benchmark script generates:

  • Console output: Real-time progress and summary table

  • timing_results.csv: Detailed timing data with columns:

    • scenario: Fire spread model name

    • n_cells: Grid cells per dimension

    • total_cells: Total grid cells (n_cells^dim)

    • wall_time_s: Wall-clock execution time

    • nsteps: Number of timesteps executed

    • steps_per_second: Throughput (nsteps / wall_time_s)

    • cells_per_step_per_s: Overall performance metric, in cell updates per second (approximately total_cells × steps_per_second)

Example output:

Scenario        N  Cells  Time(s)  Steps/s  Cells/step/s
----------------------------------------------------------------
levelset       32   1,024      1.23    24.4      24,976
levelset       64   4,096      4.82    6.2      25,393
levelset      128  16,384     19.45    1.5      24,784
levelset      256  65,536     78.12    0.4      25,042
farsite        32   1,024      0.95    31.6      32,375
farsite        64   4,096      3.54    8.5      34,790
...
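The CSV can be post-processed with a few lines of Python. The sketch below is illustrative only: the column names are those listed above, the sample row is copied from the example output (assuming --nsteps 30), and the formulas used to recompute the derived columns are assumptions inferred from that row rather than taken from run_benchmark.py.

```python
import csv
import io

# Sample data in the documented column layout (first row of the example
# table, assuming --dim 2 --nsteps 30). To read a real file, replace
# io.StringIO(SAMPLE_CSV) with open("timing_results.csv", newline="").
SAMPLE_CSV = """scenario,n_cells,total_cells,wall_time_s,nsteps,steps_per_second,cells_per_step_per_s
levelset,32,1024,1.23,30,24.4,24976
"""


def load_rows(handle):
    """Parse benchmark rows, converting the numeric columns from strings."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "scenario": raw["scenario"],
            "n_cells": int(raw["n_cells"]),
            "total_cells": int(raw["total_cells"]),
            "wall_time_s": float(raw["wall_time_s"]),
            "nsteps": int(raw["nsteps"]),
        })
    return rows


def derived_metrics(row):
    """Recompute throughput using the assumed definitions (see lead-in)."""
    steps_per_second = row["nsteps"] / row["wall_time_s"]
    cell_updates_per_second = row["total_cells"] * steps_per_second
    return steps_per_second, cell_updates_per_second


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    sps, cups = derived_metrics(row)
    print(f"{row['scenario']:10s} N={row['n_cells']:4d} "
          f"{sps:6.1f} steps/s {cups:10,.0f} cell updates/s")
```

Recomputing the derived columns from wall_time_s and nsteps is a cheap consistency check when comparing CSV files produced by different code versions.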

Scaling Analysis

The benchmark computes an empirical scaling exponent α where T ∝ N^α, with T the wall-clock time for a fixed number of steps and N the number of cells per dimension:

  • Expected range: [0.8·dim, 1.6·dim]

  • Ideal scaling: α = dim (linear in total cells)

  • Sub-linear scaling (α < dim): Usually attributable to cache efficiency and vectorization gains

  • Super-linear scaling (α > dim): Usually attributable to memory bandwidth limitations

Example:

Estimated scaling exponent α = 2.03  (expected [1.6, 3.2])  ✓
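One common way to estimate α is a least-squares fit of log T against log N. The sketch below applies that fit to the level-set rows of the example output table; the fitting method is an assumption, and run_benchmark.py may use a different estimator.

```python
import math


def scaling_exponent(n_cells, wall_times):
    """Least-squares slope of log(T) against log(N)."""
    xs = [math.log(n) for n in n_cells]
    ys = [math.log(t) for t in wall_times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def in_expected_range(alpha, dim):
    """Check alpha against the documented range [0.8*dim, 1.6*dim]."""
    return 0.8 * dim <= alpha <= 1.6 * dim


# Level-set rows from the example output table (2D, same nsteps per run).
n_cells = [32, 64, 128, 256]
wall_times = [1.23, 4.82, 19.45, 78.12]

alpha = scaling_exponent(n_cells, wall_times)
print(f"alpha = {alpha:.2f}, within expected range: {in_expected_range(alpha, 2)}")
```

For these sample timings the fitted exponent comes out close to 2, which is the ideal value for a 2D run.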

Fire Spread Model Performance

Relative computational costs (normalized to Rothermel level-set):

Fire Spread Model         Relative Cost      Notes
----------------------------------------------------------------------
Rothermel (level-set)     1.0× (baseline)    WENO5-Z + RK3 advection
FARSITE ellipse           0.7-0.9×           Faster for coarse grids
Balbi physics-based       1.2-1.5×           Additional radiation terms
Cruz crown fire           0.9-1.1×           Simple algebraic model
Cheney-Gould grassfire    0.8-1.0×           Piecewise-linear formula
FBP (Canadian)            0.9-1.1×           Lookup tables, efficient
Lautenberger semi-phys    1.1-1.4×           Physics-based coefficients

Note: Costs vary with grid resolution, timestep size, and feature configuration.

Feature Overhead

Performance impact of optional features (% overhead relative to baseline):

Feature                      Overhead    Notes
----------------------------------------------------------------------
Albini spotting              5-15%       Depends on spot frequency
Ember accumulation           2-5%        Additional field updates
Radiation preheating         10-20%      View factor calculations
Crown fire (Van Wagner)      3-8%        Initiation criteria checks
Bulk fuel consumption        5-10%       Post-frontal burnout tracking
FMC phenology                1-3%        Seasonal moisture updates
McArthur moisture scaling    <2%         Temperature/RH modulation
Periodic wind gusts          <1%         Sinusoidal wind updates

Recommendation: Enable only features needed for your simulation to minimize overhead.
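For a rough budget, the per-feature figures above can be summed. The sketch below assumes the overheads are independent and additive, which is a simplification; the dictionary keys are hypothetical labels for illustration, not input-file parameter names.

```python
# Overhead ranges in percent (low, high), copied from the figures above.
# "<2%" and "<1%" are represented as 0-2 and 0-1.
FEATURE_OVERHEAD_PCT = {
    "albini_spotting": (5, 15),
    "ember_accumulation": (2, 5),
    "radiation_preheating": (10, 20),
    "crown_fire_van_wagner": (3, 8),
    "bulk_fuel_consumption": (5, 10),
    "fmc_phenology": (1, 3),
    "mcarthur_moisture_scaling": (0, 2),
    "periodic_wind_gusts": (0, 1),
}


def combined_overhead(enabled):
    """Sum low and high overhead estimates, assuming additivity."""
    low = sum(FEATURE_OVERHEAD_PCT[name][0] for name in enabled)
    high = sum(FEATURE_OVERHEAD_PCT[name][1] for name in enabled)
    return low, high


low, high = combined_overhead(["albini_spotting", "radiation_preheating"])
print(f"Spotting + radiation preheating: {low}-{high}% over baseline")

low_all, high_all = combined_overhead(FEATURE_OVERHEAD_PCT)
print(f"All features enabled: {low_all}-{high_all}% over baseline")
```

Under the additivity assumption, enabling every feature lands at roughly 26-64% above baseline, so profiling the actual configuration remains the more reliable guide.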

Optimization Guidelines

Grid Resolution

Choose grid resolution based on:

  • Feature scale: Cell size should resolve fire perimeter features (typically 5-50 m)

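  • Computational budget: Cost per timestep scales as O(N²) in 2D; for a fixed simulated time, the CFL limit also increases the number of steps in proportion to N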

  • Accuracy requirements: Finer grids reduce numerical diffusion

Example: For a 1 km × 1 km domain:

  • 32×32 grid (31 m cells): Fast, coarse features only

  • 64×64 grid (16 m cells): Good balance for most applications

  • 128×128 grid (8 m cells): High accuracy, slower

  • 256×256 grid (4 m cells): Very high accuracy, expensive
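The cell sizes in this example follow from dividing the domain length by the number of cells. A small sketch that also reports the per-step cost relative to the coarsest grid, assuming ideal O(N^dim) scaling; the function names are illustrative.

```python
def cell_size_m(domain_length_m, n_cells):
    """Cell edge length for a square domain split into n_cells per side."""
    return domain_length_m / n_cells


def relative_step_cost(n_cells, reference_n_cells, dim=2):
    """Per-timestep cost relative to a reference grid, assuming O(N^dim)."""
    return (n_cells / reference_n_cells) ** dim


for n in (32, 64, 128, 256):
    dx = cell_size_m(1000.0, n)
    cost = relative_step_cost(n, 32)
    print(f"{n:4d} x {n:<4d} dx = {dx:6.2f} m  "
          f"cost per step = {cost:3.0f}x the 32x32 grid")
```

Each doubling of N quadruples the per-step cost in 2D, so the 256×256 grid costs about 64 times as much per step as the 32×32 grid before any change in timestep is considered.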

Timestep Selection

The CFL condition limits timestep size:

Δt ≤ CFL × Δx / max(ROS)

Recommendations:

  • CFL = 0.5: Safe default, stable for all scenarios

  • CFL = 0.7-0.9: Faster, use for well-tested scenarios

  • CFL < 0.5: Use if stability issues occur
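The CFL bound translates directly into a step count for a given simulated time. The sketch below evaluates the formula above for a hypothetical case (1 km domain, 64 cells, peak rate of spread of 0.5 m/s, 300 s of simulated time); the units (metres and seconds) and the function names are assumptions made for illustration.

```python
import math


def cfl_timestep(cfl, dx_m, max_ros_m_per_s):
    """Largest timestep allowed by dt <= CFL * dx / max(ROS)."""
    if max_ros_m_per_s <= 0.0:
        raise ValueError("max rate of spread must be positive")
    return cfl * dx_m / max_ros_m_per_s


def steps_to_reach(final_time_s, dt_s):
    """Number of steps of size dt needed to cover final_time."""
    return math.ceil(final_time_s / dt_s)


# Hypothetical case: 1 km domain, 64 cells per side, peak ROS of 0.5 m/s.
dx = 1000.0 / 64
for cfl in (0.5, 0.9):
    dt = cfl_timestep(cfl, dx, 0.5)
    nsteps = steps_to_reach(300.0, dt)
    print(f"CFL = {cfl}: dt <= {dt:.2f} s, {nsteps} steps to reach 300 s")
```

Raising the CFL number from 0.5 to 0.9 roughly halves the step count in this case, which is where the speedup for well-tested scenarios comes from.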

Reinitialization Frequency

Level-set reinitialization maintains the signed-distance property:

  • reinit_int = -1: Never reinitialize (fastest, may drift over time)

  • reinit_int = 10-20: Good balance for most scenarios

  • reinit_int = 5: Frequent reinitialization (most accurate, slower)

Cost: Reinitialization typically adds 20-30% overhead per occurrence.
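To compare settings, the per-occurrence cost can be averaged over the steps between reinitializations. The sketch below assumes that "20-30% per occurrence" means each reinitialization costs 20-30% of one regular timestep; that reading is an interpretation, not something confirmed by the solver documentation.

```python
def amortized_reinit_overhead(reinit_int, cost_per_reinit=(0.20, 0.30)):
    """Average overhead per step as a (low, high) fraction of step cost.

    Assumes one reinitialization every reinit_int steps, each costing a
    fraction cost_per_reinit of a regular timestep. A non-positive
    reinit_int means reinitialization is disabled.
    """
    if reinit_int <= 0:
        return (0.0, 0.0)
    low, high = cost_per_reinit
    return (low / reinit_int, high / reinit_int)


for reinit_int in (-1, 5, 10, 20):
    low, high = amortized_reinit_overhead(reinit_int)
    print(f"reinit_int = {reinit_int:3d}: "
          f"{100 * low:.1f}-{100 * high:.1f}% average overhead per step")
```

Under this assumption, reinit_int = 10 averages out to about 2-3% per step, and reinit_int = 5 to about 4-6%.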

Plotfile Output

Plotfile I/O can dominate runtime for large grids and frequent output:

  • Disable during benchmarking: plot_int = -1

  • Minimize output frequency: plot_int = 20 or higher

  • Reduce output fields (future feature)

Example: On a 256×256 grid, writing a plotfile can take 1-5 seconds per output.
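The share of runtime spent on plotfiles can be estimated before launching a long run. The sketch below is a rough model: it assumes plot_int is the number of steps between plotfiles, treats a non-positive value as output disabled (as with plot_int = -1), and ignores any initial or final plotfile. The timings passed in are hypothetical.

```python
def plotfile_io_fraction(nsteps, plot_int, seconds_per_step, seconds_per_plot):
    """Fraction of total wall time spent writing plotfiles."""
    if plot_int <= 0:
        return 0.0  # output disabled
    n_plots = nsteps // plot_int
    compute_time = nsteps * seconds_per_step
    io_time = n_plots * seconds_per_plot
    return io_time / (compute_time + io_time)


# Hypothetical run: 100 steps at 2.0 s per step, 4.0 s per plotfile.
for plot_int in (-1, 1, 20):
    frac = plotfile_io_fraction(100, plot_int, 2.0, 4.0)
    print(f"plot_int = {plot_int:3d}: {100 * frac:.1f}% of runtime in plotfile I/O")
```

With these numbers, writing a plotfile every step makes I/O about two thirds of the runtime, while plot_int = 20 reduces it to under 10%.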

Parallelization

Current Status

The solver is primarily serial with some parallelization:

  • AMReX-level parallelism: MPI domain decomposition (future)

  • Thread-level parallelism: OpenMP for some kernels (partial)

Future work: Full MPI+OpenMP hybrid parallelization for HPC systems.

Memory Usage

Approximate memory requirements (per grid cell, 2D):

  • Base solver: ~100-200 bytes/cell (level-set φ, ROS, intensity, etc.)

  • With spotting: +50-100 bytes/cell (ember density, spot tracking)

  • With crown fire: +50 bytes/cell (crown fuel fields)

  • AMReX overhead: ~50 bytes/cell (metadata, ghost cells)

Example: 256×256 grid (65,536 cells):

  • Base solver plus AMReX overhead (150-250 bytes/cell): ~10-16 MB

  • Full features (250-400 bytes/cell): ~16-26 MB

3D scaling: Memory scales as O(N³); without HPC resources, limit 3D grids to fewer than 100³ cells.
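The per-cell figures above can be turned into a quick memory estimate for any grid. The sketch below applies them to a 100³ grid; reusing the 2D per-cell figures in 3D is an assumption, and the component labels are illustrative rather than solver option names.

```python
# Approximate bytes per cell (low, high), from the per-cell figures above.
BYTES_PER_CELL = {
    "base": (100, 200),
    "spotting": (50, 100),
    "crown_fire": (50, 50),
    "amrex_overhead": (50, 50),
}


def memory_range_mb(cells_per_dim, dim, components):
    """Estimated memory (low, high) in MB for the selected components."""
    total_cells = cells_per_dim ** dim
    low = sum(BYTES_PER_CELL[c][0] for c in components) * total_cells
    high = sum(BYTES_PER_CELL[c][1] for c in components) * total_cells
    return low / 1e6, high / 1e6


low_mb, high_mb = memory_range_mb(100, 3, list(BYTES_PER_CELL))
print(f"100^3 grid, all components: {low_mb:.0f}-{high_mb:.0f} MB")
```

A 100³ grid with every component enabled comes to roughly 250-400 MB under these assumptions, and each doubling of N multiplies that by eight.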

Profiling

AMReX Built-in Profiling

AMReX includes performance profiling tools. Build with:

cmake -S . -B build -DAMReX_TINY_PROFILE=ON

Run the simulation; profiling output appears in the console:

TinyProfiler total time across processes [min...avg...max]
LevelSet::Advect3D        [0.123, 0.125, 0.128]
LevelSet::Reinit          [0.045, 0.046, 0.047]
Rothermel::ComputeROS     [0.089, 0.090, 0.091]
...

External Profiling Tools

Linux perf:

perf record -g ./levelset inputs.i
perf report

gprof (build with -pg):

gfortran -pg ...  # or gcc -pg
./levelset inputs.i
gprof levelset gmon.out > profile.txt

Valgrind callgrind:

valgrind --tool=callgrind ./levelset inputs.i
kcachegrind callgrind.out.*

GPU Acceleration

Status: GPU support via AMReX GPU backends is under development.

Future capabilities:

  • CUDA/HIP backends for NVIDIA/AMD GPUs

  • GPU-accelerated fire spread kernels

  • Hybrid CPU/GPU execution

Estimated speedup: 5-20× for large grids (>128² cells) on modern GPUs.

Note: GPU scaling benchmarks are deferred pending infrastructure development.

Best Practices

For Performance-Critical Simulations

  1. Profile first: Identify actual bottlenecks before optimizing

  2. Right-size grid: Balance accuracy and performance

  3. Disable unused features: Turn off spotting, crown fire if not needed

  4. Reduce plotfile frequency: I/O can dominate runtime

  5. Use coarser propagation: FARSITE can be faster than level-set for operational forecasts

For Accuracy-Critical Simulations

  1. Grid convergence study: Test multiple resolutions

  2. Enable reinitialization: reinit_int = 10 or less

  3. Conservative CFL: cfl = 0.5

  4. Full physics: Enable all relevant features

For Development and Testing

  1. Small grids: 32×32 or 64×64 for rapid iteration

  2. Short runs: nsteps = 50 or final_time = 300

  3. Disable I/O: plot_int = -1

  4. Single model: Test one fire spread model at a time

Performance Reporting

When reporting performance issues or regressions:

  1. System specs: CPU model, RAM, compiler version

  2. Build configuration: Debug/Release, compiler flags

  3. Test case: Input file or scenario description

  4. Timing data: timing_results.csv from benchmark script

  5. Profiling data: If available

Example issue report:

System: Intel i9-12900K, 64 GB RAM, GCC 11.3, Ubuntu 22.04
Build: Release mode, -DCMAKE_BUILD_TYPE=Release
Test: 256×256 Rothermel level-set, 100 timesteps
Performance: 0.45 steps/s (expected: 0.6 steps/s based on v1.0)
Regression: 25% slower than previous version
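When quoting a regression, it helps to state whether the percentage refers to throughput or to time per step, since the two differ. A small sketch using the numbers from the example report; the function names are illustrative.

```python
def throughput_change_pct(baseline_steps_per_s, current_steps_per_s):
    """Percent change in throughput relative to the baseline."""
    return 100.0 * (current_steps_per_s - baseline_steps_per_s) / baseline_steps_per_s


def time_per_step_change_pct(baseline_steps_per_s, current_steps_per_s):
    """Percent change in wall time per step (time per step = 1 / throughput)."""
    return 100.0 * (baseline_steps_per_s / current_steps_per_s - 1.0)


baseline, current = 0.6, 0.45  # steps/s, from the example report
print(f"Throughput change:    {throughput_change_pct(baseline, current):+.1f}%")
print(f"Time per step change: {time_per_step_change_pct(baseline, current):+.1f}%")
```

For the example report, throughput is 25% lower, which corresponds to about 33% more wall time per step.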

References

  • AMReX documentation: https://amrex-codes.github.io/amrex/docs_html/

  • WENO schemes: Jiang & Shu (1996), “Efficient Implementation of Weighted ENO Schemes”

  • Level-set methods: Osher & Fedkiw (2003), “Level Set Methods and Dynamic Implicit Surfaces”

See Also