Technical
Satellite detection + Random Forest model performance
Two technical pillars. (1) Satellite detection: for each historical event we generate a flood mask from Sentinel-2 optical imagery (Prithvi-EO-2.0, IBM/NASA) on clear-sky days and from Sentinel-1 SAR for cloudy storms. (2) Impact regression: a Random Forest trained on 64 Catalonia events with 20 hazard (HZR) and exposure (EXP) features.
Key result: about 59% of detected flood pixels fall inside our High/Medium FloodPotential zones. The susceptibility map is therefore broadly consistent with what the satellites observed, which supports using the FloodPotential model for prospective assessment.
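The agreement figure is an overlap ratio: flooded pixels inside the target zones divided by all flooded pixels. A minimal sketch, assuming the flood mask and the FloodPotential raster are already co-registered on the same grid and that zones are stored as integer codes. The zone codes and the `zone_agreement` helper are hypothetical, not the project's actual encoding or code.

```python
import numpy as np

# Hypothetical integer codes for the FloodPotential raster (not the project's real encoding).
VERY_LOW, LOW, MEDIUM, HIGH = 1, 2, 3, 4


def zone_agreement(flood_mask, zones, target_zones=(HIGH, MEDIUM)):
    """Fraction of detected flood pixels that fall inside the target zones.

    Assumes flood_mask (bool) and zones (int) share the same shape and grid.
    """
    if flood_mask.shape != zones.shape:
        raise ValueError("flood mask and zone raster must be co-registered")
    flooded = flood_mask.astype(bool)
    n_flooded = int(flooded.sum())
    if n_flooded == 0:
        return float("nan")  # no detected water: agreement is undefined
    inside = flooded & np.isin(zones, target_zones)
    return float(inside.sum()) / n_flooded


# Toy example: 4 flooded pixels, 3 of them in High/Medium zones.
zones = np.array([[HIGH, MEDIUM, LOW], [VERY_LOW, HIGH, LOW]])
mask = np.array([[True, True, True], [False, True, False]])
print(zone_agreement(mask, zones))  # 0.75
```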
How to read this page: the detection block shows per-event sensor scores; the model block shows real Gini feature-importance values exported from the trained .pkl files, benchmark metrics against Prithvi/SAR baselines, and a confusion matrix for pixel-level detection.
The .pkl Random Forest models (feature importances, 64 Catalonia training rows, FloodPotential zones) are imported directly from the project repository.
Satellite detection
Sentinel-2 optical (Prithvi-EO-2.0) for clear sky · Sentinel-1 SAR for cloudy storm events
| Sensor | Events | Mean High/Medium zone agreement |
|---|---|---|
| Sentinel-2 | 5 clear-sky events | 56.3% |
| Sentinel-1 | 3 Storm Gloria events | 62.3% |
| Overall | 8 events | 58.6% |
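The Overall value matches an event-weighted mean of the two sensor means. This is an inference from the numbers on this page, not a documented formula:

$$
\bar{a}_{\text{overall}} = \frac{5 \times 56.3\% + 3 \times 62.3\%}{5 + 3} = \frac{468.4\%}{8} = 58.55\% \approx 58.6\%
$$

A pixel-weighted average would generally give a different number, because events with larger flood extents would count more.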
Sentinel-2 (Prithvi-EO-2.0): optical 10 m flood mask. Cloud masking uses the Scene Classification Layer (SCL) when cloud contamination exceeds 5%.
- Bands: B02, B03, B04, B08 (native 10 m), B11, B12 (native 20 m)
- Output: binary water/no-water raster aligned to the FloodPotential grid
- Best on clear-sky daylight passes (5-day revisit)
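One reading of the ">5% contamination" rule is: measure the cloud fraction from the SCL and exclude cloudy pixels only when that fraction exceeds the limit. A minimal sketch, assuming the SCL has already been resampled from 20 m to the 10 m flood-mask grid. The SCL class codes are the standard Sentinel-2 L2A values; `apply_scl_cloud_mask` and its parameters are hypothetical.

```python
import numpy as np

# Sentinel-2 L2A Scene Classification Layer (SCL) classes treated as cloud here:
# 3 = cloud shadows, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus.
CLOUD_SCL_CLASSES = (3, 8, 9, 10)
NO_DATA = 0


def apply_scl_cloud_mask(flood_mask, scl, max_cloud_fraction=0.05):
    """Drop cloud-contaminated pixels when the scene's cloud fraction exceeds the limit.

    Assumes `scl` is already on the same grid as `flood_mask`.
    Returns (masked_flood, valid, cloud_fraction); `valid` marks pixels kept for scoring.
    """
    cloudy = np.isin(scl, CLOUD_SCL_CLASSES)
    observed = scl != NO_DATA
    n_observed = int(observed.sum())
    cloud_fraction = float(cloudy[observed].sum()) / n_observed if n_observed else 0.0

    valid = observed.copy()
    if cloud_fraction > max_cloud_fraction:
        valid &= ~cloudy  # exclude cloudy pixels only above the contamination limit
    masked_flood = flood_mask.astype(bool) & valid
    return masked_flood, valid, cloud_fraction


# Toy scene: 2 of 10 pixels are cloud (classes 9 and 8), i.e. 20% > 5%.
scl = np.array([[4, 4, 6, 6, 9], [4, 5, 6, 8, 4]])
flood = np.array([[0, 0, 1, 1, 1], [0, 0, 1, 1, 0]], dtype=bool)
masked, valid, frac = apply_scl_cloud_mask(flood, scl)
print(frac, int(masked.sum()))
```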
Sentinel-1 SAR: VV+VH GRD imagery, speckle-filtered; water threshold = mean − 2.0 × std of the backscatter. It works through clouds, so it is used for Storm Gloria.
- Polarisations: VV (primary), VH (secondary)
- Pre-processing: refined Lee speckle filter, terrain correction
- Limitation: cannot distinguish floodwater from permanent water without a pre-flood reference image
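The thresholding rule and the pre-flood limitation can be sketched together: water is any pixel darker than mean − k × std, and flood is post-event water that was not already water before the event. A minimal sketch, assuming speckle-filtered, terrain-corrected VV backscatter in dB with scene-wide statistics; the function names and the differencing step are illustrative assumptions, not the project's pipeline.

```python
import numpy as np


def sar_water_mask(backscatter_db, k=2.0):
    """Flag pixels darker than mean - k * std as open water.

    Assumes `backscatter_db` is filtered, terrain-corrected VV backscatter in dB.
    NaN pixels are ignored in the statistics and never flagged as water.
    """
    mu = np.nanmean(backscatter_db)
    sigma = np.nanstd(backscatter_db)
    threshold = mu - k * sigma
    # Comparisons with NaN evaluate to False, so NaN pixels never count as water.
    return backscatter_db < threshold, float(threshold)


def sar_flood_mask(post_db, pre_db=None, k=2.0):
    """Post-event water minus water already present in a pre-flood reference scene."""
    post_water, _ = sar_water_mask(post_db, k)
    if pre_db is None:
        return post_water  # cannot separate flood from permanent water
    pre_water, _ = sar_water_mask(pre_db, k)
    return post_water & ~pre_water


# Toy scenes: land at -8 dB, water at -22 dB; 4 water pixels are permanent.
post = np.full(100, -8.0)
post[:10] = -22.0
pre = np.full(100, -8.0)
pre[:4] = -22.0
water, thr = sar_water_mask(post)
print(round(thr, 2), int(water.sum()), int(sar_flood_mask(post, pre).sum()))
```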
Random Forest — model performance
Real Gini importances exported from the trained .pkl files in the FloodCat repo
| Setting | Value | Note |
|---|---|---|
| Algorithm | RandomForestRegressor | scikit-learn 1.5 |
| Trees | 100 | ensemble size |
| Features | 20 | HZR + EXP per event |
| Training rows | 64 | Catalonia events |
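A minimal sketch of how a model with these settings can be pickled, reloaded and have its impurity-based (Gini) importances ranked into a table like the one on this page. Only `n_estimators=100` and the 64 × 20 shape come from this page; the synthetic data, feature names, `random_state` and the `ranked_importances` helper are illustrative assumptions. Files written with `joblib.dump` would be read with `joblib.load` instead of `pickle`.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def ranked_importances(model, feature_names):
    """Pair impurity-based (Gini) importances with feature names, highest first."""
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic stand-in for the 64 x 20 Catalonia training matrix.
rng = np.random.default_rng(0)
feature_names = [f"feature_{j:02d}" for j in range(20)]  # hypothetical names
X = rng.normal(size=(64, 20))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=64)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Round-trip through pickle, as with the exported .pkl files.
restored = pickle.loads(pickle.dumps(model))
ranking = ranked_importances(restored, feature_names)
for rank, (name, value) in enumerate(ranking[:5], start=1):
    print(rank, name, f"{value:.4f}")
```

The importances are normalized, so the full column sums to 1.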
Catalonia_EconomicLoss.pkl: R² = 20.5% (houses) · 28.8% (people)
| Accuracy | Precision | Recall | F1 |
|---|---|---|---|
| 90.6% | 88.2% | 83.5% | 85.8% |
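Assuming these four scores are computed from the pixel-level detection confusion matrix mentioned at the top of the page, they follow directly from the four counts (true/false positives and negatives). A minimal sketch comparing a boolean predicted mask against a boolean reference mask on the same grid; `pixel_metrics` is a hypothetical helper. The last line checks that the reported F1 is the harmonic mean of the reported precision and recall.

```python
import numpy as np


def pixel_metrics(predicted, reference):
    """Accuracy, precision, recall and F1 from two co-registered boolean masks."""
    p = np.asarray(predicted, dtype=bool)
    r = np.asarray(reference, dtype=bool)
    tp = int(np.sum(p & r))    # flood predicted, flood observed
    fp = int(np.sum(p & ~r))   # flood predicted, no flood observed
    fn = int(np.sum(~p & r))   # flood missed
    tn = int(np.sum(~p & ~r))  # dry predicted, dry observed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / p.size
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
reference = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
metrics = pixel_metrics(predicted, reference)
print(metrics)  # {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}

# Consistency check on the reported values: 2PR / (P + R) with P = 0.882, R = 0.835.
f1_check = 2 * 0.882 * 0.835 / (0.882 + 0.835)  # about 0.858
```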
| # | Feature | Group | Importance |
|---|---|---|---|
| 1 | HZR_mm-0-to-7-days | rain | 0.2922 |
| 2 | HZR_mm-7-to-14-days | rain | 0.2499 |
| 3 | HZR_mm-14-to-21-days | rain | 0.1729 |
| 4 | EXP_FPVeryLowFloodPotential_average-pop-dens | exposure | 0.1107 |
| 5 | EXP_FPVeryLowFloodPotential_UrbanHa | exposure | 0.0609 |
| 6 | HZR_mm-21-to-28-days | rain | 0.0598 |
| 7 | EXP_FPVeryLowFloodPotential_sq2Buildings | exposure | 0.0536 |
| 8 | HZR_SM-0-to-7-days | soil | 0.0000 |
| 9 | HZR_SM-7-to-14-days | soil | 0.0000 |
| 10 | HZR_SM-14-to-21-days | soil | 0.0000 |
| 11 | HZR_SM-21-to-28-days | soil | 0.0000 |
| 12 | EXP_FPHighFloodPotential_average-pop-dens | exposure | 0.0000 |
| 13 | EXP_FPMediumFloodPotential_average-pop-dens | exposure | 0.0000 |
| 14 | EXP_FPLowFloodPotential_average-pop-dens | exposure | 0.0000 |
| 15 | EXP_FPHighFloodPotential_sq2Buildings | exposure | 0.0000 |
Why are several rows 0.0000? In the 64 training events, the soil-moisture (HZR_SM) columns and three of the four FloodPotential exposure bands (High, Medium, Low) are constant across every row; only the VeryLow band carries non-zero, varying values. A Random Forest cannot split on a zero-variance feature, so these columns correctly receive 0 importance. The seven non-zero importances in the table already sum to 1.0000, so the five features not listed also score 0. Adding AOIs that fall in the other potential bands and ingesting the ERA5 soil-moisture series should give those features variance and let the model use them.
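The zero-importance behaviour is easy to reproduce. A minimal sketch on synthetic data: the column layout, coefficients and seeds are illustrative assumptions, not the FloodCat training set. Two all-zero columns stand in for the constant HZR_SM and High/Medium/Low exposure features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n_rows = 64
varying = rng.normal(size=(n_rows, 3))  # stand-ins for rain / VeryLow exposure columns
constant = np.zeros((n_rows, 2))        # stand-ins for constant HZR_SM / High-band columns
X = np.hstack([varying, constant])
y = varying @ np.array([2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=n_rows)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = model.feature_importances_
print(np.round(importances, 4))  # the last two entries are exactly 0.0

# Zero-variance columns can also be detected before training.
zero_variance = np.flatnonzero(X.var(axis=0) == 0)
print(zero_variance)  # [3 4]
```

The converse does not hold: a feature that does vary can still score 0 if no tree ever selects it for a split.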