AI-Based Hydrological Forecasting for Ungauged Catchments

Task

Build a machine learning rainfall-runoff model that forecasts river discharge for ungauged catchments using only meteorological observations and static catchment attributes.

Technologies

LSTM neural networks, DuckDB, Apache Parquet, Dagster, Amazon EKS, Amazon S3, GPU compute, Numba, HAND/D8/D-Infinity terrain analysis.

Result

A validated LSTM model that generates 72-hour discharge forecasts for the Nysa Kłodzka catchment directly from meteorological inputs, ready to be extended to nationwide deployment.

Executive Summary

Five-month research collaboration with IMGW-PIB on hydrological forecasting for ungauged catchments
LSTM rainfall-runoff model trained on 8 TB of heterogeneous hydrometeorological data
32 input features combining dynamic meteorological forcing with 27 static catchment attributes
Orchestrated on AWS with Dagster, Amazon EKS, Amazon S3, and scale-to-zero GPU compute
DuckDB plus Apache Parquet as the analytical backbone for geospatial data
Validated on the Nysa Kłodzka catchment, one of Poland's most flood-prone regions
Flood extent pipeline combining rating curves, HAND terrain analysis, D8 and D-Infinity flow routing

1. What is Hydrological Forecasting for Ungauged Catchments?

Hydrological forecasting predicts river discharge and water levels ahead of time so that emergency response teams, reservoir operators, and infrastructure managers can act before a flood event. An ungauged catchment is a watershed without a permanent streamflow gauge, a description that fits most headwater catchments in mountainous regions.

Predicting discharge where there is no river gauge is one of the defining problems in modern hydrology. Traditional hydraulic models answer it with high accuracy at a single location, but they do not scale to thousands of catchments. Machine learning offers an alternative: a model trained on gauged catchments can be transferred to ungauged ones using only meteorological forcing and static catchment attributes.

Why this matters for flood response

Most flood events originate in upper river reaches, where observations are sparse. A forecasting system that works for ungauged catchments extends the lead time for emergency response from “water is already rising here” to “water will rise here in 72 hours.”

2. Initial State: Why Traditional Hydraulic Models Do Not Scale

Conventional hydraulic models such as HEC-RAS produce highly accurate water level predictions for individual rivers. They do so by combining detailed cross-sections, Manning roughness coefficients, and extensive environmental calibration.

What makes traditional models expensive at scale

Detailed hydraulic cross-sections must be surveyed or LiDAR-derived for every river segment
Manning roughness coefficients require site-specific calibration against observed events
Two-dimensional hydraulic simulators are computationally expensive, especially for large floodplains
Significant compute resources are required for ensemble forecasts across many catchments

These constraints make traditional hydraulic modeling effective for individual rivers but expensive and difficult to scale when monitoring thousands of locations simultaneously. The project set out to test whether modern machine learning could match the predictive capability of those models at a fraction of the computational cost.

3. Project Objectives

The goal was to develop an AI-driven rainfall-runoff model that forecasts river discharge directly from meteorological observations and forecasts, providing an alternative to traditional hydraulic modeling for large-scale operational deployment.

Core objectives

Forecast river discharge in ungauged catchments, with the upper reaches of southern Polish rivers as the primary testbed
Match the predictive capability of hydraulic models while reducing computational cost per forecast
Build a portable data and training platform that runs identically on a laptop and in the cloud
Extend discharge predictions into flood extent maps for emergency response teams
Establish the foundation for nationwide deployment through regionalization

4. Machine Learning Approach: LSTM Rainfall-Runoff Model

Following an extensive literature review and analysis of research conducted by NOAA and other hydrological institutions, the project selected an LSTM (Long Short-Term Memory) neural network architecture.

Why LSTM

LSTMs are a recurrent neural network variant that retains information across long time horizons through gated memory cells. Hydrological processes are inherently sequential: a catchment remembers weeks of antecedent rainfall, snow accumulation, and soil moisture state. LSTM networks were selected for their proven effectiveness in modeling sequential environmental data and capturing long-term hydrological processes.

What the model learns

Dynamic meteorological forcing: how weather inputs propagate into streamflow over hours and days
Static catchment characteristics: how soil, terrain, geology, and vegetation shape the watershed's response
Long-term temporal dependencies: how antecedent conditions across weeks influence current discharge
Multi-step discharge forecasting: how to roll predictions forward 72 hours with stable error growth
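The gated memory described above can be illustrated with a single LSTM cell step. This is a minimal sketch in plain Python with a scalar hidden state and hand-picked, hypothetical weights. It is not the project's trained network, and the names `lstm_step`, `run_sequence`, and `WEIGHTS` are assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of an LSTM cell with a scalar hidden state.

    x is a list of input features; w maps each gate name to a tuple of
    (input weights, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, bias = w[gate]
        return sum(wi * xi for wi, xi in zip(w_x, x)) + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))        # share of old memory kept
    i = sigmoid(pre_activation("input"))         # share of new signal stored
    g = math.tanh(pre_activation("candidate"))   # new signal itself
    o = sigmoid(pre_activation("output"))        # share of memory exposed
    c = f * c_prev + i * g                       # updated cell state
    h = o * math.tanh(c)                         # updated hidden state
    return h, c


def run_sequence(inputs, w):
    """Roll the cell over a sequence, returning (h, c) after every step."""
    h, c = 0.0, 0.0
    states = []
    for x in inputs:
        h, c = lstm_step(x, h, c, w)
        states.append((h, c))
    return states


# Hypothetical weights: a forget-gate bias of 4 keeps about 98% of the
# cell state per step, so a single rainfall pulse fades slowly.
WEIGHTS = {
    "forget": ([0.0], 0.0, 4.0),
    "input": ([0.0], 0.0, 0.0),
    "candidate": ([1.0], 0.0, 0.0),
    "output": ([0.0], 0.0, 0.0),
}

# One feature per step (precipitation); the pulse arrives at the third step.
precipitation = [[0.0], [0.0], [1.0], [0.0], [0.0], [0.0]]
states = run_sequence(precipitation, WEIGHTS)
```

With these weights the cell state stays at zero until the pulse arrives and then decays slowly, which is the mechanism that lets a trained network carry antecedent rainfall forward in time.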

5. Input Features: 32 Signals per Prediction Step

Each prediction step receives a total of 32 input features, split into two categories.

Dynamic features (hourly meteorological time series)

Air temperature (minimum and maximum)
Precipitation
Shortwave solar radiation
Vapor pressure

Static catchment attributes (27 descriptors)

Climate statistics and aridity index
Precipitation seasonality
Soil properties and hydraulic conductivity
Vegetation indices and forest coverage
Topographic features and geological permeability
Catchment area and elevation

The combination of dynamic forcing and static attributes is the standard recipe for regionalization in rainfall-runoff modeling: train on gauged catchments, then transfer the learned behavior to ungauged ones whose attributes fall inside the training distribution.
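The 32-feature layout can be sketched as repeating the static attributes at every hourly step. The example below assumes that simple concatenation, and the min/max envelope check is only one possible way to test whether an ungauged catchment falls inside the training distribution. All function names and sample values are hypothetical.

```python
N_DYNAMIC = 5    # min temperature, max temperature, precipitation,
                 # shortwave radiation, vapor pressure
N_STATIC = 27    # static catchment descriptors


def build_input_sequence(dynamic_steps, static_attributes):
    """Append the static attributes to the dynamic forcing of every step."""
    if len(static_attributes) != N_STATIC:
        raise ValueError(f"expected {N_STATIC} static attributes")
    sequence = []
    for step in dynamic_steps:
        if len(step) != N_DYNAMIC:
            raise ValueError(f"expected {N_DYNAMIC} dynamic features per step")
        sequence.append(list(step) + list(static_attributes))
    return sequence


def inside_training_range(static_attributes, train_min, train_max):
    """True if every attribute lies within the range seen during training."""
    return all(
        low <= value <= high
        for value, low, high in zip(static_attributes, train_min, train_max)
    )


# Hypothetical values for three hourly steps of one catchment.
dynamic_steps = [[2.0, 9.5, 0.4, 180.0, 0.9] for _ in range(3)]
static_attributes = [0.1 * k for k in range(N_STATIC)]

sequence = build_input_sequence(dynamic_steps, static_attributes)
in_range = inside_training_range(
    static_attributes, [0.0] * N_STATIC, [3.0] * N_STATIC
)
```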

6. Data Engineering: 8 TB Across Ten Formats

The largest challenge was not model development but data acquisition, normalization, and integration. Training and validation datasets exceeded 8 TB and originated from numerous heterogeneous sources.

Source systems

IMGW national hydrometeorological datasets
Copernicus services for European-wide climate and land data
Public geospatial repositories for terrain and land cover
Meteorological archives and reanalysis products

File formats processed

The pipeline ingested and normalized ten distinct formats:

CSV, Excel, TXT for tabular observations
GRIB for gridded meteorological data and GML for XML-encoded geographic data
GeoTIFF for raster terrain and land cover
SHP, GPKG, GPX, GeoJSON for vector geospatial data
Legacy DAN meteorological files from older Polish monitoring systems

Standardized access to these sources would have shortened the project by months.

7. Analytical Backbone: DuckDB and Apache Parquet

To efficiently process large geospatial datasets, the project adopted DuckDB as the analytical database engine. All analytical datasets were stored in Apache Parquet, providing efficient columnar compression and optimized analytical performance.

Why DuckDB

In-process execution avoids the operational overhead of a separate database server
SQL interface keeps transformation logic readable for hydrologists
Native support for Parquet, including direct reads from S3-compatible object storage
High-performance geospatial extensions for raster and vector joins

The combination of DuckDB and Parquet on S3 meant that the same queries used during interactive analysis could be replayed at training time on the full 8 TB corpus, without copying data between systems.

8. Cloud Architecture: AWS, Dagster, and Scale-to-Zero GPU

The complete platform was deployed on Amazon Web Services. The infrastructure consisted of Amazon EKS for managed Kubernetes, Amazon S3 for object storage, GPU-enabled compute nodes for model training, and Dagster as the orchestration layer.

What Dagster coordinated

Data ingestion from the ten source formats
Preprocessing pipelines that normalized units, timezones, and spatial references
Feature engineering that produced the 32 input features for each prediction step
Model training runs with full experiment lineage
Model validation against held-out catchments

Scale-to-zero compute

The Kubernetes environment implemented a scale-to-zero strategy for worker nodes, allowing Dagster to provision compute resources dynamically based on workload requirements. Each pipeline could specify required CPU, memory, and GPU resources through execution tags. Training runs consumed GPU nodes only for the duration of the experiment, then released them back to the cluster.
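As an illustration of per-pipeline resource requests, the sketch below builds a tag dictionary in the shape Dagster's Kubernetes integration reads from the `dagster-k8s/config` tag. The source only says "execution tags", so the choice of this tag, the helper name `k8s_resource_tags`, and the resource values are all assumptions.

```python
def k8s_resource_tags(cpu: str, memory: str, gpus: int = 0) -> dict:
    """Build a Dagster tag dict requesting CPU, memory, and optional GPUs."""
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    if gpus > 0:
        # Kubernetes expects extended resources such as GPUs under limits.
        resources["limits"]["nvidia.com/gpu"] = str(gpus)
    return {"dagster-k8s/config": {"container_config": {"resources": resources}}}


# Hypothetical sizing for a GPU training run and a CPU-only preprocessing run.
training_tags = k8s_resource_tags(cpu="8", memory="32Gi", gpus=1)
preprocessing_tags = k8s_resource_tags(cpu="2", memory="8Gi")
```

A dictionary like this can be passed as the `tags` argument of a Dagster job or op. With an autoscaled node group, a pending pod that requests a GPU is what would trigger a GPU node to start, and the node can be removed once the pod finishes.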

No idle GPU costs. No manual cluster sizing for each experiment.

9. Local Development Environment

The entire platform was designed for portability. Every component could be executed locally using Docker Compose with compatible infrastructure services such as MinIO for object storage.

This architecture enabled identical execution environments across local development and cloud deployment. The same Dagster definitions, the same dbt-style transformations, and the same training code ran on a laptop and on EKS. Hydrologists and engineers could iterate on a model in a notebook locally and promote the same code to GPU training on AWS without modification.

10. Inference: 14 Days of History, 72 Hours of Forecast

The trained model predicts river discharge in cubic meters per second (m³/s) for selected catchments. Each inference call requires:

14 days of historical hourly observations as warm-up context
72 hours of hourly weather forecasts as the dynamic forcing horizon

The output consists of hourly discharge predictions for the next three days. Model performance was evaluated using dedicated Jupyter notebooks and continuously validated against available hydrological observations.
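The input window for one inference call works out to 14 × 24 = 336 hourly warm-up steps followed by 72 forecast steps, 408 steps in total. The sketch below lays out those timestamps and joins the two blocks. It assumes the forecast horizon starts at the issue time itself; the function names and the sample issue time are hypothetical.

```python
from datetime import datetime, timedelta

HISTORY_HOURS = 14 * 24   # 336 hourly warm-up steps
FORECAST_HOURS = 72       # hourly forecast horizon


def inference_timestamps(issue_time):
    """Hourly timestamps for the warm-up window and the forecast horizon."""
    start = issue_time - timedelta(hours=HISTORY_HOURS)
    history = [start + timedelta(hours=k) for k in range(HISTORY_HOURS)]
    forecast = [issue_time + timedelta(hours=k) for k in range(FORECAST_HOURS)]
    return history, forecast


def build_inference_input(observed_rows, forecast_rows):
    """Join observed and forecast forcing into one 408-step sequence."""
    if len(observed_rows) != HISTORY_HOURS:
        raise ValueError(f"expected {HISTORY_HOURS} observed rows")
    if len(forecast_rows) != FORECAST_HOURS:
        raise ValueError(f"expected {FORECAST_HOURS} forecast rows")
    return list(observed_rows) + list(forecast_rows)


issue_time = datetime(2024, 9, 13, 6, 0)   # hypothetical forecast issue time
history, forecast = inference_timestamps(issue_time)

# Placeholder forcing rows with five dynamic features each.
observed_rows = [[0.0] * 5 for _ in history]
forecast_rows = [[0.0] * 5 for _ in forecast]
model_input = build_inference_input(observed_rows, forecast_rows)
```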

11. Flood Simulation Pipeline: From Discharge to Water Level to Extent

Predicting discharge alone is insufficient for flood visualization. Two additional computational problems had to be addressed before the forecast became useful to emergency response teams.

Step 1: Converting discharge to water level

Flood extent algorithms require water level rather than discharge. To estimate water level, discharge predictions were transformed using rating curves Q(H), derived from river cross-sections and Manning roughness coefficients.
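One way to picture the discharge-to-level conversion is Manning's equation, Q = (1/n) · A · R^(2/3) · S^(1/2), inverted numerically for depth. The sketch below assumes an idealized rectangular channel and hypothetical width, slope, and roughness values. The project's rating curves were derived from real cross-sections, so this only illustrates the principle.

```python
def manning_discharge(depth, width, slope, roughness):
    """Discharge in m³/s for a rectangular channel (Manning's equation, SI)."""
    if depth <= 0.0:
        return 0.0
    area = width * depth
    wetted_perimeter = width + 2.0 * depth
    hydraulic_radius = area / wetted_perimeter
    return area * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5 / roughness


def depth_from_discharge(q, width, slope, roughness, max_depth=20.0, tol=1e-6):
    """Invert the rating curve by bisection: find the depth that carries q."""
    if q <= 0.0:
        return 0.0
    if manning_discharge(max_depth, width, slope, roughness) < q:
        raise ValueError("discharge exceeds channel capacity at max_depth")
    low, high = 0.0, max_depth
    # Discharge grows monotonically with depth here, so bisection converges.
    while high - low > tol:
        mid = 0.5 * (low + high)
        if manning_discharge(mid, width, slope, roughness) < q:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Hypothetical channel: 20 m wide, slope 0.002, Manning n = 0.035.
CHANNEL = {"width": 20.0, "slope": 0.002, "roughness": 0.035}
depth_at_50 = depth_from_discharge(50.0, **CHANNEL)
depth_at_100 = depth_from_discharge(100.0, **CHANNEL)
```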

Step 2: Terrain-based flood propagation

Instead of computationally expensive two-dimensional hydraulic simulators such as HEC-RAS, the solution combines three established terrain analysis techniques:

D8 flow direction algorithm for single-direction flow routing on raster terrain
D-Infinity flow routing for fractional flow across multiple downslope neighbors
HAND (Height Above Nearest Drainage) terrain analysis for fast flood depth estimation per cell
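A toy version of two of these techniques is sketched below: D8 picks the steepest downslope neighbor for each cell, and HAND follows those directions to the first drainage cell and takes the elevation difference. It is a plain-Python illustration on a hypothetical 3 × 5 valley, not the project's Numba implementation. D-Infinity is left out, and using the water level as a stage above the drainage cell is an assumption.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem):
    """For each cell, the (row, col) offset of its steepest downslope neighbor."""
    rows, cols = len(dem), len(dem[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbors are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        direction[r][c] = (dr, dc)
    return direction


def hand(dem, direction, drainage):
    """Height of each cell above the drainage cell it flows into."""
    rows, cols = len(dem), len(dem[0])
    result = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cr, cc = r, c
            # Every step moves strictly downhill, so the walk terminates.
            while not drainage[cr][cc] and direction[cr][cc] is not None:
                dr, dc = direction[cr][cc]
                cr, cc = cr + dr, cc + dc
            if drainage[cr][cc]:
                result[r][c] = dem[r][c] - dem[cr][cc]
    return result


def flood_depth(hand_grid, stage):
    """Water depth per cell for a given stage above the drainage channel."""
    return [
        [max(0.0, stage - h) if h is not None else 0.0 for h in row]
        for row in hand_grid
    ]


# Hypothetical V-shaped valley; the channel runs along the middle column.
dem = [
    [5.0, 3.0, 1.0, 3.0, 5.0],
    [4.8, 2.8, 0.8, 2.8, 4.8],
    [4.6, 2.6, 0.6, 2.6, 4.6],
]
drainage = [[col == 2 for col in range(5)] for _ in range(3)]

direction = d8_flow_direction(dem)
hand_grid = hand(dem, direction, drainage)
depth = flood_depth(hand_grid, stage=2.5)
```

Once the HAND grid exists, the flood depth for any forecast stage is a single subtraction per cell, which is why the approach is so much cheaper than a two-dimensional simulation.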

Initial Python implementations proved insufficient for large-scale processing. Performance evaluation using Rust prototypes demonstrated that the required throughput was achievable, but at the cost of development velocity. The final implementation combined Python with Numba JIT compilation, providing near-native execution performance while maintaining Python development flexibility.

12. Results: Nysa Kłodzka Catchment Validation

The project initially focused on the Nysa Kłodzka catchment, one of Poland's most flood-prone regions. The objective was to demonstrate that machine learning can accurately forecast river discharge in complex mountainous catchments before extending the methodology to nationwide deployment.

The resulting model successfully generated discharge forecasts directly from meteorological forecast data and demonstrated promising predictive performance. Validation against available hydrological observations confirmed that the LSTM approach captured both seasonal patterns and event-scale response in the catchment.
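The source does not state which skill metric was used for validation. A score commonly used for rainfall-runoff models is the Nash-Sutcliffe efficiency (NSE), and the sketch below assumes it purely for illustration. The discharge values are hypothetical.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    A value of 1 is a perfect match; a value of 0 means the model is no
    better than always predicting the observed mean.
    """
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0.0:
        raise ValueError("observed series has no variance")
    return 1.0 - residual / variance


# Hypothetical hourly discharge in m³/s around a flood peak.
observed = [10.0, 12.0, 30.0, 55.0, 40.0, 22.0]
simulated = [11.0, 13.0, 27.0, 50.0, 42.0, 24.0]

nse_model = nash_sutcliffe_efficiency(observed, simulated)
nse_perfect = nash_sutcliffe_efficiency(observed, observed)
```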

13. Key Findings

Several important conclusions emerged during the project.

Regionalization is essential

Catchment characteristics vary significantly across Poland. Future nationwide deployment will require regionalized models adapted to local hydrological conditions rather than a single national model.

Data quality matters more than data volume

Processing terabytes of heterogeneous datasets proved technically manageable. The primary bottleneck was obtaining consistent, high-quality observational data. Machine learning models cannot compensate for missing or inaccurate measurements.

Domain expertise remains critical

Hydrological expertise was essential for identifying measurement stations whose operational behavior introduced significant noise into the training data. Stations affected by controlled reservoir releases, for example, negatively impacted model accuracy and had to be excluded from training sets.

Machine learning delivers the most value in headwater catchments

The greatest operational value is achieved in upper river reaches, where traditional hydrological observations are sparse and flood events originate. This is exactly where the LSTM-plus-terrain approach is meant to operate.

Data standardization would accelerate development

Hydrological datasets remain distributed across numerous formats and protocols, including FTP, REST APIs, proprietary formats, and GIS repositories. Standardized access mechanisms would significantly improve both research efficiency and operational deployment.

Frequently Asked Questions

What is an ungauged catchment in hydrology?

An ungauged catchment is a watershed without a permanent streamflow measurement station. Most headwater catchments in mountainous regions fall into this category, which makes flood forecasting there significantly harder than at gauged locations.

Why use LSTM networks for rainfall-runoff modeling?

LSTMs are recurrent neural networks that retain information across long time horizons through gated memory cells. Hydrological processes are inherently sequential, with antecedent rainfall, snow, and soil moisture influencing streamflow days or weeks later. LSTMs capture these long-term dependencies better than feed-forward architectures.

How accurate is AI-based discharge forecasting compared to hydraulic models?

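The project set out to test whether an AI-based rainfall-runoff model could match the predictive capability of well-calibrated hydraulic models at a fraction of the computational cost. On the Nysa Kłodzka catchment the LSTM model showed promising predictive performance against available observations. The approach is most relevant for event-scale forecasting in ungauged catchments, where the surveyed cross-sections and site-specific calibration that hydraulic models depend on are usually unavailable. Its main limitation is that it learns from observed events rather than physical laws.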

What is HAND terrain analysis?

HAND (Height Above Nearest Drainage) is a terrain analysis method that computes the vertical distance from each cell in a digital elevation model to the nearest drainage channel. It is widely used for fast flood extent and depth estimation without running a full two-dimensional hydraulic simulation.

Can this approach scale to nationwide deployment?

The architecture is designed for nationwide scale-out, but the models themselves need regionalization. Catchment characteristics vary across Poland, and the project concluded that regionalized models adapted to local hydrological conditions will be required for operational nationwide use.

What data is required to forecast discharge with this approach?

Each prediction requires 14 days of historical hourly meteorological observations as warm-up context and 72 hours of hourly weather forecasts as the dynamic forcing horizon, plus 27 static catchment attributes describing the watershed's physical properties.

Architecture diagram of the AI-based hydrological forecasting platform: ten heterogeneous data sources on the left, Apache Parquet on Amazon S3 with DuckDB, Dagster orchestration on Amazon EKS with scale-to-zero GPU, the LSTM rainfall-runoff model in the middle, and the flood simulation pipeline combining rating curves, D8, D-Infinity and HAND terrain analysis on the right, ending in 72-hour discharge and flood extent outputs

LSTM Architecture

Long Short-Term Memory network combining dynamic meteorological forcing with 27 static catchment attributes for 72-hour discharge forecasting.

AWS Foundation

Amazon EKS for managed Kubernetes, Amazon S3 for object storage, GPU-enabled compute nodes, and Dagster as the orchestration backbone.

Scale-to-Zero Compute

Worker nodes scale to zero between experiments. Each pipeline declares its CPU, memory, and GPU requirements through Dagster execution tags.

DuckDB and Parquet

In-process analytical engine with native Parquet support and direct reads from S3-compatible object storage across 8 TB of training data.

Portable Local Environment

Every component runs locally via Docker Compose with MinIO, so the same definitions execute on a laptop and on AWS.

Dagster Lineage

Every ingestion, preprocessing, feature engineering, training, and validation step is a Dagster asset with full experiment history and reproducibility.

Continuous Validation

Model performance evaluated continuously in Jupyter notebooks against available hydrological observations.

Regionalization Path

Architecture ready for nationwide scale-out through regionalized models adapted to local hydrological conditions.

Hydrological Domain Curation

Domain expertise filters out measurement stations affected by reservoir releases and other non-natural behaviors before training.
