// GEOSPATIAL AI

Khandava — Forest Fire Prediction

Satellite-based forest fire prediction for Uttarakhand using multi-source geospatial data and XGBoost.

XGBoost · GDAL · Python · Pandas · NumPy · RasterIO · ERA5 · SMOTE

Overview

Architected a predictive system for forest fire risk in Uttarakhand by integrating multi-source geospatial data from NASA FIRMS, ISRO Bhuvan/Bhoonidhi, and ECMWF ERA5. An XGBoost classifier was optimized for extreme class imbalance across 450,000+ samples, achieving 75% Recall for fire events.

Problem

Uttarakhand is one of India's most fire-prone states, with dense forest cover and challenging terrain making ground monitoring infeasible. Early prediction from satellite data can enable timely forest department response and evacuation. The core challenge is extreme class imbalance — fire events are rare relative to non-fire observations.

Dataset Context

Name

Multi-Source Geospatial Fusion Dataset

Metadata Matrix Size

450,000+ samples

Telemetry Source

NASA FIRMS, ISRO Bhuvan, ISRO Bhoonidhi, ECMWF ERA5

Pipeline Preprocessing Steps

  • ERA5 meteorological reanalysis (temperature, wind, humidity, precipitation) at 0.25° resolution
  • DEM (Digital Elevation Model): slope, aspect, elevation from ISRO Bhoonidhi
  • LULC (Land Use Land Cover) classification rasters from ISRO Bhuvan
  • NASA FIRMS VIIRS active fire detections as binary labels
  • GDAL-based raster alignment and reprojection to a common CRS
  • SMOTE / class-weight tuning for extreme fire/non-fire imbalance

Architecture & Technical Foundation

Core Pattern

XGBoost Classifier

Technology Stack

scikit-learn + XGBoost

Key Components

  • Multi-source raster fusion and feature extraction via GDAL + RasterIO
  • XGBoost with scale_pos_weight tuning for class imbalance
  • Recall-optimized threshold selection (prioritizing early detection)
  • Spatial cross-validation to prevent geographic data leakage

ML Pipeline

01

Data Acquisition

Download ERA5, FIRMS, DEM, and LULC datasets for Uttarakhand bounding box.

02

Raster Preprocessing

GDAL-based CRS alignment, resampling, and clipping to study region.
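
A minimal sketch of how this alignment step could look with GDAL's Python bindings. The bounding box, target CRS, and 0.01° output resolution are illustrative assumptions, not the project's actual settings, and `warp_kwargs` / `align_raster` are hypothetical helper names.

```python
from typing import Dict, Tuple

# Approximate Uttarakhand extent as (min_lon, min_lat, max_lon, max_lat).
# Assumed for illustration only.
UTTARAKHAND_BOUNDS: Tuple[float, float, float, float] = (77.5, 28.7, 81.1, 31.5)


def warp_kwargs(
    bounds: Tuple[float, float, float, float] = UTTARAKHAND_BOUNDS,
    crs: str = "EPSG:4326",      # assumed common CRS
    res_deg: float = 0.01,       # assumed output pixel size in degrees
    categorical: bool = False,
) -> Dict[str, object]:
    """Build gdal.Warp keyword arguments so every raster lands on one grid."""
    return {
        "dstSRS": crs,               # reproject to the common CRS
        "outputBounds": bounds,      # clip to the study region
        "xRes": res_deg,             # resample to a shared pixel size
        "yRes": res_deg,
        # Nearest neighbour keeps class codes intact (e.g. LULC);
        # bilinear suits continuous fields (e.g. ERA5, DEM).
        "resampleAlg": "near" if categorical else "bilinear",
    }


def align_raster(src_path: str, dst_path: str, categorical: bool = False) -> None:
    """Reproject, resample, and clip one raster onto the shared grid."""
    from osgeo import gdal  # imported lazily so warp_kwargs works without GDAL

    gdal.UseExceptions()
    gdal.Warp(dst_path, src_path, **warp_kwargs(categorical=categorical))
```

Using the same bounds and resolution for every input means the output rasters share shape and georeferencing, so they can be stacked pixel by pixel.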

03

Feature Engineering

Per-pixel feature extraction: wind speed, humidity, NDVI, slope, elevation, LULC class.
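
Two of these features can be derived with plain NumPy, sketched below. It assumes the DEM is on a projected grid with a known cell size in metres and that wind arrives as ERA5-style eastward/northward components; the function names are hypothetical.

```python
import numpy as np


def slope_degrees(dem, cell_size_m):
    """Terrain slope in degrees from an elevation grid (finite differences)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size_m)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def wind_speed(u, v):
    """Wind speed magnitude from eastward (u) and northward (v) components."""
    return np.hypot(u, v)


# Toy DEM rising 1 m per 1 m cell towards the east: a uniform 45 degree slope.
dem = np.tile(np.arange(5.0), (5, 1))
slope = slope_degrees(dem, cell_size_m=1.0)
speed = wind_speed(3.0, 4.0)
```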

04

Label Generation

FIRMS fire hotspot labels matched to spatial grid.
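
One way to match point detections to a grid is to convert each longitude/latitude into a row/column index and mark that cell as positive. The sketch below assumes a north-up grid in geographic coordinates and ignores the time dimension; `rasterize_hotspots` is a hypothetical helper, not the project's code.

```python
import numpy as np


def rasterize_hotspots(lons, lats, bounds, res_deg):
    """Return a binary (rows, cols) grid with 1 where any hotspot falls."""
    min_lon, min_lat, max_lon, max_lat = bounds
    n_cols = int(round((max_lon - min_lon) / res_deg))
    n_rows = int(round((max_lat - min_lat) / res_deg))

    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    cols = np.floor((lons - min_lon) / res_deg).astype(int)
    rows = np.floor((max_lat - lats) / res_deg).astype(int)  # row 0 is the northern edge

    # Drop detections that fall outside the study region.
    inside = (cols >= 0) & (cols < n_cols) & (rows >= 0) & (rows < n_rows)

    labels = np.zeros((n_rows, n_cols), dtype=np.uint8)
    labels[rows[inside], cols[inside]] = 1
    return labels


# Toy 4x4 grid with two detections inside and one outside the bounds.
labels = rasterize_hotspots(
    lons=[0.5, 3.5, 10.0],
    lats=[3.5, 0.5, 10.0],
    bounds=(0.0, 0.0, 4.0, 4.0),
    res_deg=1.0,
)
```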

05

Class Balancing

SMOTE + XGBoost scale_pos_weight for 450k+ imbalanced samples.
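
For reference, the XGBoost documentation suggests the ratio of negative to positive samples as a typical starting value for `scale_pos_weight`. A small sketch of that calculation follows; the toy labels are made up, and if SMOTE has already rebalanced the training set, the ratio would be computed on the resampled labels instead.

```python
import numpy as np


def scale_pos_weight(y):
    """Ratio of negative to positive samples in a binary label array."""
    y = np.asarray(y)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    if n_pos == 0:
        raise ValueError("no positive (fire) samples in y")
    return n_neg / n_pos


# Toy labels: 98 non-fire pixels and 2 fire pixels.
y_toy = np.array([0] * 98 + [1] * 2)
weight = scale_pos_weight(y_toy)
```

The result would then be passed as `XGBClassifier(scale_pos_weight=weight)` and treated as a starting point for tuning rather than a final value.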

06

Training

XGBoost with early stopping, Recall-focused threshold tuning.
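
Recall-focused threshold tuning can be sketched as picking the highest probability cut-off that still catches a target share of true fire samples. The function below is an illustrative assumption about how this could be done, not the project's exact procedure; in practice it would be run on validation-set probabilities.

```python
import numpy as np


def threshold_for_recall(y_true, y_prob, target_recall=0.75):
    """Highest threshold t such that predicting `prob >= t` reaches the target recall."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)

    pos_scores = np.sort(y_prob[y_true])[::-1]  # positive-class scores, descending
    if pos_scores.size == 0:
        raise ValueError("no positive samples to tune on")

    # Number of positives that must be caught to reach the target recall.
    k = max(1, int(np.ceil(target_recall * pos_scores.size)))
    return float(pos_scores[k - 1])


# Toy validation set: four fire samples, two non-fire samples.
y_true = np.array([1, 1, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.8, 0.4, 0.2, 0.7, 0.1])

thr = threshold_for_recall(y_true, y_prob, target_recall=0.75)
recall = float(((y_prob >= thr) & (y_true == 1)).sum() / (y_true == 1).sum())
```

Lowering the threshold raises recall at the cost of more false alarms, which is the trade-off this page chooses in favour of early detection.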

07

Evaluation

Spatial holdout evaluation on unseen geographic tiles.
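
A tile-based holdout can be sketched by grouping pixels into square tiles and reserving whole tiles for testing, so neighbouring pixels never straddle the train/test split. Tile size, test fraction, and the helper names below are assumptions for illustration.

```python
import numpy as np


def tile_ids(rows, cols, tile_size):
    """Map each sample's pixel (row, col) to the id of the square tile containing it."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    n_tile_cols = int(cols.max()) // tile_size + 1
    return (rows // tile_size) * n_tile_cols + (cols // tile_size)


def spatial_holdout_mask(tiles, test_fraction=0.2, seed=0):
    """Boolean mask that is True for samples in randomly chosen held-out tiles."""
    rng = np.random.default_rng(seed)
    unique = np.unique(tiles)
    n_test = max(1, int(round(test_fraction * unique.size)))
    test_tiles = rng.choice(unique, size=n_test, replace=False)
    return np.isin(tiles, test_tiles)


# Toy 8x8 pixel grid split into four 4x4 tiles, one of which is held out.
rr, cc = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
tiles = tile_ids(rr.ravel(), cc.ravel(), tile_size=4)
test_mask = spatial_holdout_mask(tiles, test_fraction=0.25)
```

The same tile ids could also be passed as `groups` to scikit-learn's `GroupKFold` for spatial cross-validation.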

Results & Outcomes

75%

Recall (Fire Events)

450,000+

Samples Processed

4

Data Sources

  • 75% Recall on fire events, prioritizing early detection over overall accuracy
  • Successfully fused 4 heterogeneous geospatial data sources into a unified tabular dataset
  • GDAL-based preprocessing pipeline aligns ERA5 (0.25°), DEM, and LULC rasters with differing native resolutions to a common CRS