.. _wind_solver:

Wind Solver Implementation
==========================

This page describes the implementation details of the mass-consistent 3-D
wind solver in ``src/wind_solver.cpp``.

Source File: ``src/wind_solver.cpp``
-------------------------------------

The entire solver is implemented as a single self-contained C++ source file.
It uses AMReX for grid management, data distribution, GPU-portable field
operations, and the MLMG linear solver.

Execution Flow
--------------

The ``main()`` function performs the following steps in order:

**1. Parse inputs** (``ParmParse``)

All solver parameters are read from an ``inputs.i`` file or from command-line
``key=value`` pairs via AMReX ``ParmParse``.

**2. Read terrain**

The terrain point cloud (X, Y, Z) is read from a whitespace- or
comma-separated CSV file.  Lines beginning with ``#`` are treated as comments.
The horizontal domain extents (x_lo, x_hi, y_lo, y_hi) are derived from the
bounding box of the terrain data.

**3. Build grid**

Grid dimensions ``(nx, ny, nz)`` are computed from the domain extents and the
requested cell spacings ``(dx, dy, dz)``.  The vertical domain spans
[z_lo, z_hi] where z_lo = min terrain elevation and z_hi = max terrain
elevation + ``domain_height``.

An AMReX ``Geometry``, ``BoxArray``, and ``DistributionMapping`` are
constructed for parallel execution.

**4. IDW terrain interpolation**

For each horizontal column (i, j) the terrain elevation at the column centre
is estimated by inverse-distance-weighting (IDW) using the six nearest terrain
data points with inverse-square-distance weights.  The result is stored in a
host vector and copied to the device for use in GPU kernels.

**5. Log-law initialisation** (GPU kernel)

For every cell (i, j, k):

.. math::

   z_\text{agl} = z_\text{lo} + (k+0.5)\,\Delta z - z_\text{terrain}(i,j)

Cells where z_agl ≤ 0 are inside the terrain (set to zero).  For cells above
the terrain:

.. math::

   u_0(i,j,k) = \frac{u_*}{\kappa}\ln\!\left(\frac{z_\text{agl}+z_0}{z_0}\right)\hat{u}_x, \quad
   v_0(i,j,k) = \frac{u_*}{\kappa}\ln\!\left(\frac{z_\text{agl}+z_0}{z_0}\right)\hat{u}_y, \quad
   w_0(i,j,k) = 0

**6. Compute divergence** (GPU kernel)

The RHS of the Poisson equation is computed as:

.. math::

   \text{rhs}(i,j,k) = -\nabla\cdot\mathbf{u}_0

using centred differences in the interior and one-sided differences at domain
boundaries.  Sub-surface cells are skipped (rhs = 0).

**7. MLMG Poisson solve**

An ``MLABecLaplacian`` operator is set up with:

* α_a = 0 (no identity term)
* β_b = 1 (full diffusion)
* B coefficients: bx = by = α_h², bz = α_v²
* Boundary conditions: Dirichlet on x-faces, Neumann on y- and z-faces

``MLMG`` solves for the Lagrange multiplier λ to the requested relative
tolerance (default 1e-8).

**8. Correct velocity** (GPU kernel)

.. math::

   u = u_0 - \alpha_h^2\,\partial\lambda/\partial x, \quad
   v = v_0 - \alpha_h^2\,\partial\lambda/\partial y, \quad
   w = w_0 - \alpha_v^2\,\partial\lambda/\partial z

One-sided gradients are used at domain boundaries.  Sub-surface cells are
reset to zero.

**9. Compute divergence diagnostics**

The divergence before and after correction is computed for diagnostic
purposes.  The solver prints the maximum absolute divergence (as ``max|div|``)
before and after correction; the post-correction value should be at or below
the MLMG tolerance multiplied by the maximum divergence before correction.

**10. Write output**

The corrected wind field, initial wind field, Lagrange multiplier, divergence
diagnostics, and terrain elevation are written to an AMReX plotfile via
``WriteSingleLevelPlotfile``.

If ``extract_agl`` or ``extract_k`` is set, a terrain-aligned 2-D CSV slice
is also written.

Key Data Structures
-------------------

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Variable
     - Type
     - Description
   * - ``vel0``
     - ``MultiFab(ba, dm, 3, 1)``
     - Initial log-law wind field (3 components, 1 ghost cell)
   * - ``vel_c``
     - ``MultiFab(ba, dm, 3, 0)``
     - Mass-corrected wind field (no ghost cells needed for output)
   * - ``lam``
     - ``MultiFab(ba, dm, 1, 1)``
     - Lagrange multiplier λ (1 ghost cell for gradient stencil)
   * - ``rhs``
     - ``MultiFab(ba, dm, 1, 0)``
     - Poisson RHS = −∇·\ **u**₀
   * - ``d_terr``
     - ``Gpu::DeviceVector<Real>``
     - Per-column terrain elevation on device (size nx × ny)

GPU Portability
---------------

All field-level loops use ``amrex::ParallelFor`` with
``AMREX_GPU_DEVICE`` lambdas, making the code portable across CUDA, HIP,
and SYCL backends.  The terrain elevation vector is copied to the device
once before the kernel launches and accessed via a raw device pointer.

The source file must be compiled as CUDA (``LANGUAGE CUDA``) when the CUDA
backend is selected.  This is handled automatically by CMake when
``MASSCONSISTENT_GPU_BACKEND=CUDA``.

Solver Settings
---------------

The MLMG solver is configured with:

* Max iterations: 200
* Max FMG iterations: 20
* Pre/post smoothing steps: 16 each
* Bottom solver verbosity: 0

These defaults are conservative and suitable for most single-level problems.
For very large domains, reducing smoothing steps or increasing ``max_grid_size``
can improve throughput.

References
----------

* Sherman, C.A. (1978). A mass-consistent model for wind fields over complex
  terrain.  *Journal of Applied Meteorology*, 17(3), 312–319.
* Mathiesen, M. (1987). Simulation of wind fields in complex terrain.
  *Boundary-Layer Meteorology*, 38, 213–226.
* AMReX MLMG documentation:
  https://amrex-codes.github.io/amrex/docs_html/LinearSolvers.html