Executive Summary
- Five-month research collaboration with IMGW-PIB on hydrological forecasting for ungauged catchments
- LSTM rainfall-runoff model trained on more than 8 TB of heterogeneous hydrometeorological data
- 32 input features combining dynamic meteorological forcing with 27 static catchment attributes
- Orchestrated on AWS with Dagster, Amazon EKS, Amazon S3, and scale-to-zero GPU compute
- DuckDB plus Apache Parquet as the analytical backbone for geospatial data
- Validated on the Nysa Kłodzka catchment, one of Poland's most flood-prone regions
- Flood extent pipeline combining rating curves, HAND terrain analysis, D8 and D-Infinity flow routing
1. What is Hydrological Forecasting for Ungauged Catchments?
Hydrological forecasting predicts river discharge and water levels ahead of time so that emergency response teams, reservoir operators, and infrastructure managers can act before a flood event. An ungauged catchment is a watershed without permanent streamflow measurement, a description that fits the majority of headwater catchments in mountainous regions.
Predicting discharge where there is no river gauge is one of the defining problems in modern hydrology. Traditional hydraulic models answer it with high accuracy at a single location, but they do not scale to thousands of catchments. Machine learning offers an alternative: a model trained on gauged catchments can be transferred to ungauged ones using only meteorological forcing and static catchment attributes.
Why this matters for flood response
Most flood events originate in upper river reaches, where observations are sparse. A forecasting system that works for ungauged catchments extends the lead time for emergency response from “water is already rising here” to “water will rise here in 72 hours.”
2. Initial State: Why Traditional Hydraulic Models Do Not Scale
Conventional hydraulic models such as HEC-RAS produce highly accurate water level predictions for individual rivers. They do so by combining detailed cross-sections, Manning roughness coefficients, and extensive environmental calibration.
What makes traditional models expensive at scale
- Detailed hydraulic cross-sections must be surveyed or LiDAR-derived for every river segment
- Manning roughness coefficients require site-specific calibration against observed events
- Two-dimensional hydraulic simulators are computationally expensive, especially for large floodplains
- Significant compute resources are required for ensemble forecasts across many catchments
These constraints make traditional hydraulic modeling effective for individual rivers but expensive and difficult to scale when monitoring thousands of locations simultaneously. The project set out to test whether modern machine learning could match the predictive capability of those models at a fraction of the computational cost.
3. Project Objectives
The goal was to develop an AI-driven rainfall-runoff model that forecasts river discharge directly from meteorological observations and forecasts, providing an alternative to traditional hydraulic modeling for large-scale operational deployment.
Core objectives
- Forecast river discharge in ungauged catchments, with the upper reaches of southern Polish rivers as the primary testbed
- Match the predictive capability of hydraulic models while reducing computational cost per forecast
- Build a portable data and training platform that runs identically on a laptop and in the cloud
- Extend discharge predictions into flood extent maps for emergency response teams
- Establish the foundation for nationwide deployment through regionalization
4. Machine Learning Approach: LSTM Rainfall-Runoff Model
Following an extensive literature review and analysis of research conducted by NOAA and other hydrological institutions, the project selected an LSTM (Long Short-Term Memory) neural network architecture.
Why LSTM
LSTMs are a recurrent neural network variant that retains information across long time horizons through gated memory cells. Hydrological processes are inherently sequential: a catchment remembers weeks of antecedent rainfall, snow accumulation, and soil moisture state. LSTM networks were selected for their proven effectiveness in modeling sequential environmental data and capturing long-term hydrological processes.
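To make the idea of gated memory concrete, the following is a minimal sketch of a single-unit LSTM cell in plain Python. It is not the project's model: the function names and all weight values are hypothetical, chosen only to show how an open forget gate lets the cell state survive 14 days of hourly steps.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """Advance a single-unit LSTM cell by one time step.

    weights maps each gate name to (w_x, w_h, bias); all values are illustrative.
    """
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))   # share of old memory to keep
    write = sigmoid(pre_activation("input"))     # share of new signal to store
    expose = sigmoid(pre_activation("output"))   # share of memory to reveal
    candidate = math.tanh(pre_activation("candidate"))

    c = forget * c_prev + write * candidate      # updated cell state (memory)
    h = expose * math.tanh(c)                    # updated hidden state (output)
    return h, c


# Hypothetical weights: forget gate held open, input gate nearly shut.
weights = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0            # pretend the cell stored 1.0 after a rain event
for _ in range(14 * 24):   # 14 days of hourly steps with no new input
    h, c = lstm_step(0.0, h, c, weights)
```

With these illustrative weights the cell state stays close to its initial value after 336 steps, which is the mechanism that lets a trained network carry antecedent conditions forward. A real model learns the gate weights from data and uses many units per layer.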
What the model learns
- Dynamic meteorological forcing: how weather inputs propagate into streamflow over hours and days
- Static catchment characteristics: how soil, terrain, geology, and vegetation shape the watershed's response
- Long-term temporal dependencies: how antecedent conditions across weeks influence current discharge
- Multi-step discharge forecasting: how to roll predictions forward 72 hours with stable error growth
5. Input Features: 32 Signals per Prediction Step
Each prediction step receives a total of 32 input features, split into two categories: 5 dynamic meteorological features and 27 static catchment attributes.
Dynamic features (hourly meteorological time series)
- Air temperature (minimum and maximum)
- Precipitation
- Shortwave solar radiation
- Vapor pressure
Static catchment attributes (27 descriptors)
- Climate statistics and aridity index
- Precipitation seasonality
- Soil properties and hydraulic conductivity
- Vegetation indices and forest coverage
- Topographic features and geological permeability
- Catchment area and elevation
The combination of dynamic forcing and static attributes is the standard recipe for regionalization in rainfall-runoff modeling: train on gauged catchments, then transfer the learned behavior to ungauged ones whose attributes fall inside the training distribution.
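One common way to feed both kinds of input to a sequence model is to repeat the static attributes at every time step and append them to the dynamic features. The sketch below assumes that layout; the source does not state how the project combined them, and the function name and sample values are hypothetical.

```python
N_DYNAMIC = 5    # temperature min, temperature max, precipitation, radiation, vapor pressure
N_STATIC = 27    # static catchment descriptors


def assemble_features(dynamic_steps, static_attributes):
    """Return one 32-value row per time step: dynamic forcing plus repeated statics."""
    if len(static_attributes) != N_STATIC:
        raise ValueError(f"expected {N_STATIC} static attributes")
    rows = []
    for step in dynamic_steps:
        if len(step) != N_DYNAMIC:
            raise ValueError(f"expected {N_DYNAMIC} dynamic features per step")
        # The static part is identical at every step; only the forcing changes.
        rows.append(list(step) + list(static_attributes))
    return rows


# Illustrative values only: three hourly steps for one catchment.
dynamic = [
    [2.1, 6.4, 0.0, 120.0, 610.0],
    [2.0, 6.1, 1.5, 90.0, 640.0],
    [1.8, 5.9, 3.2, 40.0, 655.0],
]
static = [0.0] * N_STATIC
features = assemble_features(dynamic, static)
```

Because the static columns are the same for every step of a given catchment, they act as a fingerprint that lets one set of weights behave differently across watersheds, which is what makes transfer to ungauged catchments possible.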
6. Data Engineering: 8 TB Across Eleven Formats
The largest challenge was not model development but data acquisition, normalization, and integration. Training and validation datasets exceeded 8 TB and originated from numerous heterogeneous sources.
Source systems
- IMGW national hydrometeorological datasets
- Copernicus services for European-wide climate and land data
- Public geospatial repositories for terrain and land cover
- Meteorological archives and reanalysis products
File formats processed
The pipeline ingested and normalized eleven distinct formats:
- CSV, Excel, TXT for tabular observations
- GRIB and GML for meteorological and XML-encoded data
- GeoTIFF for raster terrain and land cover
- SHP, GPKG, GPX, GeoJSON for vector geospatial data
- Legacy DAN meteorological files from older Polish monitoring systems
Standardized access to these sources would have shortened the project by months.
7. Analytical Backbone: DuckDB and Apache Parquet
To efficiently process large geospatial datasets, the project adopted DuckDB as the analytical database engine. All analytical datasets were stored in Apache Parquet, providing efficient columnar compression and optimized analytical performance.
Why DuckDB
- In-process execution avoids the operational overhead of a separate database server
- SQL interface keeps transformation logic readable for hydrologists
- Native support for Parquet, including direct reads from S3-compatible object storage
- High-performance geospatial extensions for raster and vector joins
The combination of DuckDB and Parquet on S3 meant that the same queries used during interactive analysis could be replayed at training time on the full 8 TB corpus, without copying data between systems.
8. Cloud Architecture: AWS, Dagster, and Scale-to-Zero GPU
The complete platform was deployed on Amazon Web Services. The infrastructure consisted of Amazon EKS for managed Kubernetes, Amazon S3 for object storage, GPU-enabled compute nodes for model training, and Dagster as the orchestration layer.
What Dagster coordinated
- Data ingestion from all source formats
- Preprocessing pipelines that normalized units, timezones, and spatial references
- Feature engineering that produced the 32 input features for each prediction step
- Model training runs with full experiment lineage
- Model validation against held-out catchments
Scale-to-zero compute
The Kubernetes environment implemented a scale-to-zero strategy for worker nodes, allowing Dagster to provision compute resources dynamically based on workload requirements. Each pipeline could specify required CPU, memory, and GPU resources through execution tags. Training runs consumed GPU nodes only for the duration of the experiment, then released them back to the cluster.
No idle GPU costs. No manual cluster sizing for each experiment.
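As an illustration of resource requests expressed through tags, the sketch below builds a tag dictionary in the shape that the dagster-k8s integration reads under the "dagster-k8s/config" key. The helper name and the resource values are hypothetical, and the "nvidia.com/gpu" resource name assumes the cluster runs the NVIDIA device plugin; the project's actual tags are not shown in the source.

```python
def k8s_resource_tags(cpu: str, memory: str, gpus: int = 0) -> dict:
    """Build Dagster tags that request Kubernetes resources for a run or step."""
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    if gpus > 0:
        # Assumed resource name; GPUs are requested through limits in Kubernetes.
        resources["limits"]["nvidia.com/gpu"] = gpus
    return {"dagster-k8s/config": {"container_config": {"resources": resources}}}


# Hypothetical sizes for a CPU preprocessing job and a GPU training job.
preprocess_tags = k8s_resource_tags(cpu="4", memory="16Gi")
training_tags = k8s_resource_tags(cpu="8", memory="64Gi", gpus=1)
```

A dictionary like this would typically be passed as the tags argument of a Dagster job or op. In a scale-to-zero setup, a pending pod that requests a GPU is what prompts the cluster autoscaler to add a GPU node, and the node can be removed again once the pod finishes.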
9. Local Development Environment
The entire platform was designed for portability. Every component could be executed locally using Docker Compose with compatible infrastructure services such as MinIO for object storage.
This architecture enabled identical execution environments across local development and cloud deployment. The same Dagster definitions, the same dbt-style transformations, and the same training code ran on a laptop and on EKS. Hydrologists and engineers could iterate on a model in a notebook locally and promote the same code to GPU training on AWS without modification.
10. Inference: 14 Days of History, 72 Hours of Forecast
The trained model predicts river discharge in cubic meters per second (m³/s) for selected catchments. Each inference call requires:
- 14 days of historical hourly observations as warm-up context
- 72 hours of hourly weather forecasts as the dynamic forcing horizon
The output consists of hourly discharge predictions for the next three days. Model performance was evaluated using dedicated Jupyter notebooks and continuously validated against available hydrological observations.
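The window arithmetic can be sketched as follows: 14 days of hourly history is 336 steps, and the 72 forecast hours bring the input sequence to 408 steps. The function name is hypothetical, and concatenating history and forecast into one sequence is an assumption about the input layout rather than a documented detail of the project.

```python
HISTORY_HOURS = 14 * 24   # 336 hourly steps of observed warm-up context
FORECAST_HOURS = 72       # hourly weather forecast steps


def build_inference_window(observed, forecast):
    """Join the latest observations and the forecast into one input sequence."""
    if len(observed) < HISTORY_HOURS:
        raise ValueError(f"need at least {HISTORY_HOURS} hours of observations")
    if len(forecast) < FORECAST_HOURS:
        raise ValueError(f"need at least {FORECAST_HOURS} hours of forecast")
    # Keep only the most recent history and the first 72 forecast hours.
    return observed[-HISTORY_HOURS:] + forecast[:FORECAST_HOURS]


# Illustrative placeholders: one value per hour instead of full feature rows.
observed = [f"obs-{i}" for i in range(500)]
forecast = [f"fc-{i}" for i in range(96)]
window = build_inference_window(observed, forecast)
```

Only the last 72 positions of such a window correspond to predicted hours; the first 336 serve to bring the model's internal state in line with current catchment conditions.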
11. Flood Simulation Pipeline: From Discharge to Water Level to Extent
Predicting discharge alone is insufficient for flood visualization. Two additional computational problems had to be addressed before the forecast became useful to emergency response teams.
Step 1: Converting discharge to water level
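Flood extent algorithms require water level rather than discharge. To estimate water level, discharge predictions were converted using rating curves Q(H), which relate discharge Q to water level H and were derived from river cross-sections and Manning roughness coefficients. Because the model predicts Q, each curve is read in reverse to find the water level that corresponds to a forecast discharge.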
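A rating curve built from Manning's equation can be inverted numerically. The sketch below assumes a simple rectangular channel, which is a simplification of a surveyed cross-section, and uses bisection to find the depth that yields a given discharge. The function names, channel width, slope, and roughness value are all hypothetical.

```python
def manning_discharge(depth_m, width_m, slope, roughness_n):
    """Discharge in m^3/s for a rectangular channel, from Manning's equation."""
    area = width_m * depth_m
    wetted_perimeter = width_m + 2.0 * depth_m
    hydraulic_radius = area / wetted_perimeter
    return (1.0 / roughness_n) * area * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5


def depth_from_discharge(q, width_m, slope, roughness_n, max_depth_m=20.0, tol=1e-6):
    """Invert the rating curve by bisection: find the depth H with Q(H) = q."""
    if q < 0 or q > manning_discharge(max_depth_m, width_m, slope, roughness_n):
        raise ValueError("discharge outside the range covered by the rating curve")
    low, high = 0.0, max_depth_m
    while high - low > tol:
        mid = 0.5 * (low + high)
        # Q grows with depth in this channel, so the search interval can be halved.
        if manning_discharge(mid, width_m, slope, roughness_n) < q:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Illustrative channel: 15 m wide, slope 0.002, Manning n = 0.035.
q_at_2m = manning_discharge(2.0, 15.0, 0.002, 0.035)
recovered_depth = depth_from_discharge(q_at_2m, 15.0, 0.002, 0.035)
```

Bisection is used here because discharge increases monotonically with depth in a rectangular channel. A curve tabulated from a real cross-section could be inverted the same way, or by interpolating a precomputed table of (H, Q) pairs.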
Step 2: Terrain-based flood propagation
Instead of computationally expensive two-dimensional hydraulic simulators such as HEC-RAS, the solution combines three established terrain analysis techniques:
- D8 flow direction algorithm for single-direction flow routing on raster terrain
- D-Infinity flow routing, which treats flow direction as a continuous angle and splits flow fractionally between up to two adjacent downslope neighbors
- HAND (Height Above Nearest Drainage) terrain analysis for fast flood depth estimation per cell
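The first and third techniques can be sketched together on a tiny grid: D8 picks the steepest downslope neighbor for each cell, and HAND follows those pointers to the first channel cell and takes the elevation difference. This is a teaching sketch in plain Python with hypothetical names and a made-up 3x3 elevation grid; D-Infinity is not shown, and the project's implementation is not reproduced here.

```python
import math

# Offsets to the 8 neighbours of a raster cell (row, col).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_directions(dem):
    """For each cell, return the neighbour with the steepest descent, or None."""
    rows, cols = len(dem), len(dem[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        direction[r][c] = (nr, nc)
    return direction


def hand(dem, direction, channel):
    """Height of each cell above the channel cell it drains to (None if none)."""
    rows, cols = len(dem), len(dem[0])
    result = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cell = (r, c)
            # D8 only steps strictly downhill, so this walk cannot loop.
            while cell is not None and cell not in channel:
                cell = direction[cell[0]][cell[1]]
            if cell is not None:
                result[r][c] = dem[r][c] - dem[cell[0]][cell[1]]
    return result


# Made-up valley: the channel runs down the middle column.
dem = [
    [5.0, 4.0, 5.0],
    [4.0, 2.0, 4.0],
    [3.0, 1.0, 3.0],
]
channel = {(0, 1), (1, 1), (2, 1)}
directions = d8_directions(dem)
hand_grid = hand(dem, directions, channel)
```

In this example the top-left cell drains diagonally to the channel cell at elevation 2.0, so its HAND value is 3.0, while channel cells have a HAND of zero. Real elevation models also need pit filling and flat resolution before flow directions are reliable, which this sketch leaves out.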
Initial Python implementations proved insufficient for large-scale processing. Performance evaluation using Rust prototypes demonstrated that the required throughput was achievable, but at the cost of development velocity. The final implementation combined Python with Numba JIT compilation, providing near-native execution performance while maintaining Python development flexibility.
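The per-cell kernels involved are simple loops, which is the kind of code that JIT compilation speeds up well. As a hedged illustration, the function below turns a HAND grid and a water level into flood depth per cell. It is plain Python on nested lists so that it runs without dependencies; using NumPy arrays and a Numba decorator on such a loop is an assumption about how it would be accelerated, not the project's code.

```python
def flood_depth(hand_grid, stage_m):
    """Flood depth per cell: water level above the channel minus the cell's HAND."""
    rows, cols = len(hand_grid), len(hand_grid[0])
    depth = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            height = hand_grid[r][c]
            # Cells at or above the water level, or with no HAND value, stay dry.
            if height is not None and height < stage_m:
                depth[r][c] = stage_m - height
    return depth


# Illustrative HAND values in meters and a hypothetical water level of 2.5 m.
example_hand = [
    [3.0, 0.0, 3.0],
    [3.0, 0.0, None],
    [2.0, 0.0, 2.0],
]
depths = flood_depth(example_hand, 2.5)
flooded_cells = sum(1 for row in depths for d in row if d > 0.0)
```

The flood extent is then the set of cells with positive depth. Because each cell is handled independently, the loop parallelizes easily over large rasters.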
12. Results: Nysa Kłodzka Catchment Validation
The project initially focused on the Nysa Kłodzka catchment, one of Poland's most flood-prone regions. The objective was to demonstrate that machine learning can accurately forecast river discharge in complex mountainous catchments before extending the methodology to nationwide deployment.
The resulting model successfully generated discharge forecasts directly from meteorological forecast data and demonstrated promising predictive performance. Validation against available hydrological observations confirmed that the LSTM approach captured both seasonal patterns and event-scale response in the catchment.
13. Key Findings
Several important conclusions emerged during the project.
Regionalization is essential
Catchment characteristics vary significantly across Poland. Future nationwide deployment will require regionalized models adapted to local hydrological conditions rather than a single national model.
Data quality matters more than data volume
Processing terabytes of heterogeneous datasets proved technically manageable. The primary bottleneck was obtaining consistent, high-quality observational data. Machine learning models cannot compensate for missing or inaccurate measurements.
Domain expertise remains critical
Hydrological expertise was essential for identifying measurement stations whose operational behavior introduced significant noise into the training data. Stations affected by controlled reservoir releases, for example, negatively impacted model accuracy and had to be excluded from training sets.
Machine learning delivers the most value in headwater catchments
The greatest operational value is achieved in upper river reaches, where traditional hydrological observations are sparse and flood events originate. This is exactly where the LSTM-plus-terrain approach is meant to operate.
Data standardization would accelerate development
Hydrological datasets remain distributed across numerous formats and protocols, including FTP, REST APIs, proprietary formats, and GIS repositories. Standardized access mechanisms would significantly improve both research efficiency and operational deployment.
Frequently Asked Questions
What is an ungauged catchment in hydrology?
An ungauged catchment is a watershed without a permanent streamflow measurement station. Most headwater catchments in mountainous regions fall into this category, which makes flood forecasting there significantly harder than at gauged locations.
Why use LSTM networks for rainfall-runoff modeling?
LSTMs are recurrent neural networks that retain information across long time horizons through gated memory cells. Hydrological processes are inherently sequential, with antecedent rainfall, snow, and soil moisture influencing streamflow days or weeks later. LSTMs capture these long-term dependencies better than feed-forward architectures.
How accurate is AI-based discharge forecasting compared to hydraulic models?
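The project set out to test whether AI-based rainfall-runoff models can match the predictive capability of well-calibrated hydraulic models at a fraction of the computational cost. On the Nysa Kłodzka catchment, the LSTM showed promising predictive performance and captured both seasonal patterns and event-scale response. The approach is most valuable in ungauged catchments, where there are no observed events against which to calibrate a hydraulic model. Its main limitation is that it learns from observed events rather than physical laws.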
What is HAND terrain analysis?
HAND (Height Above Nearest Drainage) is a terrain analysis method that computes the vertical distance from each cell in a digital elevation model to the nearest drainage channel. It is widely used for fast flood extent and depth estimation without running a full two-dimensional hydraulic simulation.
Can this approach scale to nationwide deployment?
The architecture is designed for nationwide scale-out, but the models themselves need regionalization. Catchment characteristics vary across Poland, and the project concluded that regionalized models adapted to local hydrological conditions will be required for operational nationwide use.
What data is required to forecast discharge with this approach?
Each prediction requires 14 days of historical hourly meteorological observations as warm-up context and 72 hours of hourly weather forecasts as the dynamic forcing horizon, plus 27 static catchment attributes describing the watershed's physical properties.

