Wind Solver Implementation

This page describes the implementation details of the mass-consistent 3-D wind solver in src/wind_solver.cpp.

Source File: `src/wind_solver.cpp`

The entire solver is implemented as a single self-contained C++ source file. It uses AMReX for grid management, data distribution, GPU-portable field operations, and the MLMG linear solver.

Execution Flow

The main() function performs the following steps in order:

1. Parse inputs (ParmParse)

All solver parameters are read from an inputs.i file or from command-line key=value pairs via AMReX ParmParse.

2. Read terrain

The terrain point cloud (X, Y, Z) is read from a whitespace- or comma-separated CSV file. Lines beginning with # are treated as comments. The horizontal domain extents (x_lo, x_hi, y_lo, y_hi) are derived from the bounding box of the terrain data.

3. Build grid

Grid dimensions (nx, ny, nz) are computed from the domain extents and the requested cell spacings (dx, dy, dz). The vertical domain spans [z_lo, z_hi] where z_lo = min terrain elevation and z_hi = max terrain elevation + domain_height.

An AMReX Geometry, BoxArray, and DistributionMapping are constructed for parallel execution.

4. IDW terrain interpolation

For each horizontal column (i, j) the terrain elevation at the column centre is estimated by inverse-distance-weighting (IDW) using the six nearest terrain data points with inverse-square-distance weights. The result is stored in a host vector and copied to the device for use in GPU kernels.

5. Log-law initialisation (GPU kernel)

For every cell (i, j, k):

\[z_\text{agl} = z_\text{lo} + (k+0.5)\,\Delta z - z_\text{terrain}(i,j)\]

Cells where z_agl ≤ 0 are inside the terrain (set to zero). For cells above the terrain:

\[u_0(i,j,k) = \frac{u_*}{\kappa}\ln\!\left(\frac{z_\text{agl}+z_0}{z_0}\right)\hat{u}_x, \quad v_0(i,j,k) = \frac{u_*}{\kappa}\ln\!\left(\frac{z_\text{agl}+z_0}{z_0}\right)\hat{u}_y, \quad w_0(i,j,k) = 0\]

6. Compute divergence (GPU kernel)

The RHS of the Poisson equation is computed as:

\[\text{rhs}(i,j,k) = -\nabla\cdot\mathbf{u}_0\]

using centred differences in the interior and one-sided differences at domain boundaries. Sub-surface cells are skipped (rhs = 0).

7. MLMG Poisson solve

An MLABecLaplacian operator is set up with:

α_a = 0 (no identity term)
β_b = 1 (full diffusion)
B coefficients: bx = by = α_h², bz = α_v²
Boundary conditions: Dirichlet on x-faces, Neumann on y- and z-faces

MLMG solves for the Lagrange multiplier λ to the requested relative tolerance (default 1e-8).

8. Correct velocity (GPU kernel)

\[u = u_0 - \alpha_h^2\,\partial\lambda/\partial x, \quad v = v_0 - \alpha_h^2\,\partial\lambda/\partial y, \quad w = w_0 - \alpha_v^2\,\partial\lambda/\partial z\]

One-sided gradients are used at domain boundaries. Sub-surface cells are reset to zero.

9. Compute divergence diagnostics

The divergence before and after correction is computed for diagnostic purposes. The solver prints the maximum absolute divergence (as max|div|) before and after correction; the post-correction value should be at or below the MLMG tolerance multiplied by the maximum divergence before correction.

10. Write output

The corrected wind field, initial wind field, Lagrange multiplier, divergence diagnostics, and terrain elevation are written to an AMReX plotfile via WriteSingleLevelPlotfile.

If extract_agl or extract_k is set, a terrain-aligned 2-D CSV slice is also written.

Key Data Structures

Variable	Type	Description
`vel0`	`MultiFab(ba, dm, 3, 1)`	Initial log-law wind field (3 components, 1 ghost cell)
`vel_c`	`MultiFab(ba, dm, 3, 0)`	Mass-corrected wind field (no ghost cells needed for output)
`lam`	`MultiFab(ba, dm, 1, 1)`	Lagrange multiplier λ (1 ghost cell for gradient stencil)
`rhs`	`MultiFab(ba, dm, 1, 0)`	Poisson RHS = −∇·u₀
`d_terr`	`Gpu::DeviceVector<Real>`	Per-column terrain elevation on device (size nx × ny)

GPU Portability

All field-level loops use amrex::ParallelFor with AMREX_GPU_DEVICE lambdas, making the code portable across CUDA, HIP, and SYCL backends. The terrain elevation vector is copied to the device once before the kernel launches and accessed via a raw device pointer.

The source file must be compiled as CUDA (LANGUAGE CUDA) when the CUDA backend is selected. This is handled automatically by CMake when MASSCONSISTENT_GPU_BACKEND=CUDA.

Solver Settings

The MLMG solver is configured with:

Max iterations: 200
Max FMG iterations: 20
Pre/post smoothing steps: 16 each
Bottom solver verbosity: 0

These defaults are conservative and suitable for most single-level problems. For very large domains, reducing smoothing steps or increasing max_grid_size can improve throughput.

References

Sherman, C.A. (1978). A mass-consistent model for wind fields over complex terrain. Journal of Applied Meteorology, 17(3), 312–319.
Mathiesen, M. (1987). Simulation of wind fields in complex terrain. Boundary-Layer Meteorology, 38, 213–226.
AMReX MLMG documentation: https://amrex-codes.github.io/amrex/docs_html/LinearSolvers.html