Technical

Satellite detection + Random Forest model performance

Pipeline ready · 8 events validated

Two technical pillars: (1) Satellite detection — for each historical event we generate a flood mask from Sentinel-2 (Prithvi-EO-2.0, IBM/NASA) on clear-sky days and Sentinel-1 SAR for cloudy storms. (2) Impact regression — a Random Forest trained on 64 Catalonia events with 20 hazard + exposure features.

Key result: 59% of detected flood pixels fall inside our High/Medium FloodPotential zones, so the susceptibility map is broadly consistent with what the satellites observed. This supports using it for prospective assessment.

How to read this page: the detection block shows per-event sensor scores; the model block shows impurity-based (mean decrease in impurity) feature importances exported from the trained .pkl files, benchmark metrics against Prithvi/SAR baselines, and a confusion matrix for pixel-level detection.

Results are computed with the FloodCat algorithm and the trained Random Forest .pkl models; the feature importances, the 64 Catalonia training rows and the FloodPotential zones are imported directly from the project repository.

Satellite detection

Sentinel-2 optical (Prithvi-EO-2.0) for clear sky · Sentinel-1 SAR for cloudy storm events

Sentinel-2 mean: 56.3% (5 clear-sky events)

Sentinel-1 mean: 62.3% (3 Storm Gloria events)

Overall: 58.6% (High/Medium zone agreement)
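
The Overall figure is consistent with an event-weighted mean of the two sensor means. A minimal check, assuming Overall gives each of the 8 events equal weight (the individual per-event values are not listed here):

```python
# Per-sensor means (%) and event counts, as reported above.
sensor_means = {
    "Sentinel-2": (56.3, 5),
    "Sentinel-1": (62.3, 3),
}

# Assumption: "Overall" gives every event equal weight, so each sensor
# mean is weighted by the number of events it covers.
total_events = sum(n for _, n in sensor_means.values())
overall = sum(mean * n for mean, n in sensor_means.values()) / total_events

print(f"Overall: {overall:.2f}% over {total_events} events")
```

This gives about 58.55%, in line with the displayed 58.6% once rounding of the two sensor means is taken into account.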

Validation by event

% of detected flood pixels in High/Medium FloodPotential zones

Series: Sentinel-2 (Prithvi) · Sentinel-1 SAR
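
The per-event score is the share of detected flood pixels that land in High/Medium zones. A minimal sketch, assuming the flood mask and the FloodPotential raster are already co-registered NumPy arrays on the same grid; the integer zone codes and the function name are hypothetical:

```python
import numpy as np

# Hypothetical integer codes for the FloodPotential raster.
VERY_LOW, LOW, MEDIUM, HIGH = 1, 2, 3, 4


def zone_agreement(flood_mask: np.ndarray, zones: np.ndarray) -> float:
    """Percentage of detected flood pixels that fall in Medium/High zones."""
    if flood_mask.shape != zones.shape:
        raise ValueError("flood mask and zone raster must be aligned")
    flooded = flood_mask.astype(bool)
    n_flooded = flooded.sum()
    if n_flooded == 0:
        return float("nan")  # no detected flood pixels: agreement undefined
    in_high_medium = np.isin(zones, (MEDIUM, HIGH)) & flooded
    return 100.0 * in_high_medium.sum() / n_flooded


# Toy 3x3 example: 4 flooded pixels, 3 of them in Medium/High zones.
mask = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [1, 0, 0]])
zones = np.array([[HIGH, MEDIUM, LOW],
                  [MEDIUM, VERY_LOW, LOW],
                  [LOW, VERY_LOW, VERY_LOW]])
print(zone_agreement(mask, zones))
```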

Sentinel-2 — Prithvi-EO-2.0

IBM/NASA 300M-parameter geospatial foundation model, fine-tuned on Sen1Floods11

Produces a 10 m optical flood mask. When cloud contamination exceeds 5%, cloudy pixels are masked using the Scene Classification Layer (SCL).

• Bands: B02, B03, B04, B08, B11, B12
• Output: binary water/no-water raster aligned to FloodPotential
• Best on clear-sky daylight passes (5-day revisit)
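
The cloud-masking rule can be sketched with the SCL band of a Sentinel-2 L2A product. A minimal sketch, assuming the SCL has already been resampled to the band grid; which SCL classes FloodCat treats as cloud is an assumption here, and `mask_clouds` is a hypothetical helper:

```python
import numpy as np

# Sentinel-2 L2A SCL classes treated as cloud in this sketch:
# 3 = cloud shadow, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus.
CLOUD_CLASSES = (3, 8, 9, 10)
CLOUD_THRESHOLD = 0.05  # mask only when contamination exceeds 5%


def mask_clouds(bands: np.ndarray, scl: np.ndarray) -> tuple[np.ndarray, float]:
    """Set cloudy pixels to NaN in a (bands, rows, cols) stack if cloud > 5%.

    Returns the (possibly masked) stack and the scene's cloud fraction.
    """
    cloudy = np.isin(scl, CLOUD_CLASSES)
    cloud_fraction = float(cloudy.mean())
    out = bands.astype(np.float32, copy=True)
    if cloud_fraction > CLOUD_THRESHOLD:
        out[:, cloudy] = np.nan  # apply the 2-D mask to every band
    return out, cloud_fraction


# Toy example: 6 bands (B02, B03, B04, B08, B11, B12) on a 10x10 grid.
stack = np.ones((6, 10, 10))
scl = np.full((10, 10), 4)  # 4 = vegetation
scl[:2, :5] = 9             # 10 of 100 pixels flagged as high-probability cloud
masked, frac = mask_clouds(stack, scl)
print(frac, int(np.isnan(masked).sum()))
```

Here 10% of the scene is cloud, above the 5% threshold, so those 10 pixels are masked in all 6 bands.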

Sentinel-1 — SAR

Adaptive VV backscatter thresholding (Copernicus EMS style)

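VV+VH GRD imagery, speckle-filtered; pixels whose VV backscatter falls below the adaptive threshold (mean − 2.0 × std) are classified as water. SAR works through clouds, so it is used for the Storm Gloria events.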

• Polarisations: VV (primary), VH (secondary)
• Pre-processing: refined Lee filter, terrain correction
• Limit: cannot distinguish flood water from permanent water without a pre-flood reference
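
The adaptive threshold can be sketched in a few lines. A minimal sketch, assuming VV is calibrated, speckle-filtered sigma0 in linear units that is converted to dB, that the mean and standard deviation are taken over the whole scene (or a supplied validity mask), and that an optional pre-flood permanent-water mask is available; `sar_water_mask` and its arguments are hypothetical:

```python
from typing import Optional

import numpy as np

K = 2.0  # threshold = mean - K * std, as stated above


def sar_water_mask(
    vv_linear: np.ndarray,
    valid: Optional[np.ndarray] = None,
    permanent_water: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Flag open water as VV backscatter (dB) below mean - K * std."""
    vv_db = 10.0 * np.log10(np.clip(vv_linear, 1e-6, None))  # avoid log10(0)
    sample = vv_db[valid.astype(bool)] if valid is not None else vv_db.ravel()
    threshold = sample.mean() - K * sample.std()
    water = vv_db < threshold  # calm open water is dark in VV
    if permanent_water is not None:
        # Without a pre-flood reference, rivers and reservoirs count as flood.
        water &= ~permanent_water.astype(bool)
    return water


# Toy scene: 1000 pixels, land at about -8 dB, 50 water pixels at -22 dB.
vv = np.full((10, 100), 10 ** (-8 / 10))
vv[:, :5] = 10 ** (-22 / 10)
perm = np.zeros(vv.shape, dtype=bool)
perm[:2, :5] = True  # 10 of the water pixels belong to a known reservoir

flood = sar_water_mask(vv)
flood_excl = sar_water_mask(vv, permanent_water=perm)
print(int(flood.sum()), int(flood_excl.sum()))
```

The second call shows how the pre-flood reference addresses the limit listed above: known permanent water is removed from the flood mask.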

Random Forest — model performance

Impurity-based (mean decrease in impurity) importances exported from the trained .pkl files in the FloodCat repo

Algorithm: RandomForestRegressor (scikit-learn 1.5)

Trees: 100 (ensemble size)

Features: 20 (HZR + EXP per event)

Training rows: 64 (Catalonia events)
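
The configuration above maps onto scikit-learn as follows. A minimal sketch: the random matrix stands in for the 64 × 20 Catalonia training table, which is not reproduced here, and `oob_score=True` and the fixed `random_state` are assumptions rather than confirmed FloodCat settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data shaped like the training table: 64 events x 20 features.
rng = np.random.default_rng(0)
X = rng.random((64, 20))
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=64)

model = RandomForestRegressor(
    n_estimators=100,   # "Trees: 100" above
    oob_score=True,     # assumption: out-of-bag R^2 as a built-in hold-out estimate
    random_state=42,    # assumption: fixed seed for reproducible retraining
)
model.fit(X, y)
print(f"trees={len(model.estimators_)}  features={model.n_features_in_}  "
      f"OOB R^2={model.oob_score_:.3f}")
```

With only 64 rows, the out-of-bag score is a convenient hold-out estimate that does not cost any training data.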

Feature importance — Economic Loss model

From Catalonia_EconomicLoss.pkl

Groups: Rainfall (HZR_mm) · Soil moisture · Exposure (FloodPotential)
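
Reading the importances back out of a .pkl file and ranking them takes only a few lines. A minimal sketch, assuming the files were written with `joblib.dump` and, if no names are passed, that the model was fitted on a DataFrame (so `feature_names_in_` exists); `load_model` and `top_importances` are hypothetical helpers:

```python
from types import SimpleNamespace

import joblib
import numpy as np


def load_model(path):
    """Load a trained model from a .pkl file written with joblib.dump.

    A file written with the standard pickle module can instead be opened
    in binary mode and passed to pickle.load.
    """
    return joblib.load(path)


def top_importances(model, feature_names=None, k=15):
    """Return (feature, importance) pairs for the k largest importances."""
    if feature_names is None:
        # Set by scikit-learn only when the model was fitted on a DataFrame.
        feature_names = list(model.feature_names_in_)
    importances = np.asarray(model.feature_importances_, dtype=float)
    if len(importances) != len(feature_names):
        raise ValueError("feature_names must match the model's input columns")
    order = np.argsort(importances)[::-1][:k]  # highest first
    return [(feature_names[i], float(importances[i])) for i in order]


# With the real file: top_importances(load_model("Catalonia_EconomicLoss.pkl"))
demo = SimpleNamespace(feature_importances_=[0.1, 0.6, 0.3])
print(top_importances(demo, ["a", "b", "c"], k=2))
```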

In-browser fit (R²)

Ridge regression on the same 20 features

R² = 20.5% (houses) · 28.8% (people)
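
The in-browser fit is a linear baseline on the same feature matrix. A minimal Python sketch of the equivalent computation, assuming standardised features and `alpha=1.0` (neither is stated for the browser implementation) and using placeholder data. It reports both in-sample and 5-fold cross-validated R², because in-sample R² with 20 features and 64 rows tends to be optimistic:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data shaped like the training table: 64 events x 20 features.
rng = np.random.default_rng(1)
X = rng.random((64, 20))
y = X[:, 0] + rng.normal(scale=0.5, size=64)

# Standardise first so the ridge penalty treats every feature equally.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
in_sample_r2 = ridge.fit(X, y).score(X, y)
cv_r2 = cross_val_score(ridge, X, y, cv=5, scoring="r2").mean()
print(f"in-sample R^2 = {in_sample_r2:.3f}, 5-fold CV R^2 = {cv_r2:.3f}")
```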

Detection benchmark — IoU / F1 / Precision / Recall

Per-event mean over 8 validation samples (82 AGORA episodes available)
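
The four benchmark metrics all come from the same per-mask pixel counts. A minimal sketch, assuming "per-event mean" means scoring each event separately and then averaging each metric (a macro average); an event with no flood pixels in either mask yields NaN and is skipped by `np.nanmean`. Both helper names are hypothetical:

```python
import numpy as np


def detection_scores(pred: np.ndarray, truth: np.ndarray) -> dict:
    """IoU, F1, precision and recall for one binary flood mask vs reference."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")  # undefined without positives

    return {
        "iou": ratio(tp, tp + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


def mean_over_events(events):
    """Macro average: score each (pred, truth) event, then average each metric."""
    scores = [detection_scores(p, t) for p, t in events]
    return {k: float(np.nanmean([s[k] for s in scores])) for k in scores[0]}


pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
print(detection_scores(pred, truth))
print(mean_over_events([(pred, truth), (truth, truth)]))
```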

Confusion matrix (Prithvi-EO-2.0)

Pixel-level on 500 sampled tiles

	Pred Flood	Pred Dry
Actual Flood	142	28
Actual Dry	19	311

Accuracy 90.6% · Precision 88.2% · Recall 83.5% · F1 85.8%
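
The headline percentages follow directly from the four cells of the confusion matrix. In this sketch, TP = 142 and TN = 311 are the counts shown on the page, while FN = 28 and FP = 19 are inferred because they are the only integer counts that reproduce the reported precision and recall (they also bring the total to 500):

```python
# Confusion-matrix counts for Prithvi-EO-2.0 on the 500 sampled tiles.
tp, fn = 142, 28   # actual flood: predicted flood / predicted dry
fp, tn = 19, 311   # actual dry:   predicted flood / predicted dry

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # = 2TP / (2TP + FP + FN)
iou = tp / (tp + fp + fn)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("F1", f1), ("IoU", iou)]:
    print(f"{name:9s} {100 * value:.1f}%")
```

This reproduces the four reported figures. The pixel-level IoU it implies, about 75.1%, is a pooled value and need not match the per-event IoU in the benchmark.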

Training learning curve

Train vs out-of-bag fit during retraining
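
A train-versus-OOB curve can be produced by growing the forest in steps with `warm_start=True` and recording training R² and `oob_score_` after each step. A minimal sketch on placeholder data; the step size and hyperparameters are illustrative, not the ones used for the FloodCat retraining:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data shaped like the training table (64 events x 20 features).
rng = np.random.default_rng(0)
X = rng.random((64, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.2, size=64)

# warm_start=True keeps the trees already grown and only adds new ones
# each time n_estimators is raised and fit() is called again.
model = RandomForestRegressor(warm_start=True, oob_score=True, random_state=0)
curve = []
for n_trees in range(20, 101, 20):
    model.set_params(n_estimators=n_trees)
    model.fit(X, y)
    train_r2 = model.score(X, y)   # R^2 on rows the forest was trained on
    curve.append((n_trees, train_r2, model.oob_score_))

for n_trees, train_r2, oob_r2 in curve:
    print(f"{n_trees:3d} trees  train R^2 = {train_r2:.3f}  OOB R^2 = {oob_r2:.3f}")
```

The gap between the two lines is the useful signal: training R² on a fully grown forest is close to 1 almost by construction, while the OOB score estimates performance on unseen events.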

Top-15 features — AffectedHouses model

#	Feature	Group	Importance
1	HZR_mm-0-to-7-days	rain	0.2922
2	HZR_mm-7-to-14-days	rain	0.2499
3	HZR_mm-14-to-21-days	rain	0.1729
4	EXP_FPVeryLowFloodPotential_average-pop-dens	exposure	0.1107
5	EXP_FPVeryLowFloodPotential_UrbanHa	exposure	0.0609
6	HZR_mm-21-to-28-days	rain	0.0598
7	EXP_FPVeryLowFloodPotential_sq2Buildings	exposure	0.0536
8	HZR_SM-0-to-7-days	soil	0.0000
9	HZR_SM-7-to-14-days	soil	0.0000
10	HZR_SM-14-to-21-days	soil	0.0000
11	HZR_SM-21-to-28-days	soil	0.0000
12	EXP_FPHighFloodPotential_average-pop-dens	exposure	0.0000
13	EXP_FPMediumFloodPotential_average-pop-dens	exposure	0.0000
14	EXP_FPLowFloodPotential_average-pop-dens	exposure	0.0000
15	EXP_FPHighFloodPotential_sq2Buildings	exposure	0.0000
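
Aggregating the Group column shows how much of the model's importance each feature family carries. A minimal sketch using the values listed above; the prefix-to-group mapping is inferred from the table rather than taken from FloodCat code:

```python
# Top-15 importances from the AffectedHouses table above.
importances = {
    "HZR_mm-0-to-7-days": 0.2922,
    "HZR_mm-7-to-14-days": 0.2499,
    "HZR_mm-14-to-21-days": 0.1729,
    "EXP_FPVeryLowFloodPotential_average-pop-dens": 0.1107,
    "EXP_FPVeryLowFloodPotential_UrbanHa": 0.0609,
    "HZR_mm-21-to-28-days": 0.0598,
    "EXP_FPVeryLowFloodPotential_sq2Buildings": 0.0536,
    "HZR_SM-0-to-7-days": 0.0,
    "HZR_SM-7-to-14-days": 0.0,
    "HZR_SM-14-to-21-days": 0.0,
    "HZR_SM-21-to-28-days": 0.0,
    "EXP_FPHighFloodPotential_average-pop-dens": 0.0,
    "EXP_FPMediumFloodPotential_average-pop-dens": 0.0,
    "EXP_FPLowFloodPotential_average-pop-dens": 0.0,
    "EXP_FPHighFloodPotential_sq2Buildings": 0.0,
}


def feature_group(name: str) -> str:
    """Map a column name to the table's Group column by its prefix."""
    if name.startswith("HZR_mm"):
        return "rain"
    if name.startswith("HZR_SM"):
        return "soil"
    if name.startswith("EXP_"):
        return "exposure"
    return "other"


totals: dict = {}
for name, value in importances.items():
    group = feature_group(name)
    totals[group] = totals.get(group, 0.0) + value

for group, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{group:9s} {total:.4f}")
```

Rain accounts for 0.7748, exposure for 0.2252 and soil moisture for 0. Because impurity-based importances are normalised to sum to 1 and these 15 already do, the five features outside the top 15 also have zero importance.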

Why are several rows 0.0000? In the 64 training events, the soil-moisture (HZR_SM) columns and three of the four FloodPotential bands (High / Medium / Low) are constant across every row; only the "VeryLow" band carries non-zero values. A Random Forest cannot split on a zero-variance feature, so these features correctly receive zero importance. Adding AOIs that fall in the other FloodPotential bands and ingesting the ERA5 soil-moisture series should activate those slots.
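
The zero-importance behaviour is easy to reproduce. A minimal sketch on synthetic data (the column roles are illustrative, not the FloodCat features): a column that is constant across all rows never produces a split, so its impurity-based importance is exactly zero, and scikit-learn's `VarianceThreshold` flags it before training:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n_rows = 64
X = np.column_stack([
    rng.random(n_rows),      # varying and informative (plays the role of rainfall)
    rng.random(n_rows),      # varying but uninformative noise
    np.full(n_rows, 0.35),   # constant in every row (like HZR_SM here)
])
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=n_rows)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.feature_importances_)

# The default threshold=0.0 keeps only features with non-zero variance.
selector = VarianceThreshold().fit(X)
print(selector.get_support())
```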