Semi-Blind Hyperspectral Unmixing using Diffusion Models
End-to-end system combining hybrid CNNs and diffusion models for semi-blind hyperspectral unmixing, jointly estimating abundance maps and endmember spectra under physically consistent constraints.
Research Problem
Hyperspectral images record a mixed spectral signature at each pixel, making it difficult to recover the pure material spectra (endmembers) and their spatial distributions (abundances). Traditional unmixing methods struggle with noise, non-linearity, and a lack of labeled data.
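The standard formulation behind this problem is the linear mixing model, where each pixel is a convex combination of endmember spectra plus noise. A minimal sketch (all sizes and values below are illustrative, not from this project):

```python
import numpy as np

rng = np.random.default_rng(0)

bands, n_endmembers = 100, 3            # illustrative sizes
E = rng.random((bands, n_endmembers))   # endmember spectra, one per column
a = np.array([0.5, 0.3, 0.2])           # abundances: non-negative, sum to one

# Linear mixing model: pixel spectrum = convex combination of endmembers + noise
y = E @ a + 0.01 * rng.standard_normal(bands)

# Physical constraints on abundances (non-negativity, sum-to-one)
assert a.min() >= 0 and np.isclose(a.sum(), 1.0)
```

Unmixing inverts this model: given pixels `y`, recover both `E` and the per-pixel `a`.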
Motivation
Accurate unmixing is critical for remote sensing, environmental monitoring, and scientific imaging. Existing approaches either rely on strong assumptions or fail under noisy and complex real-world conditions.
Hypothesis
Combining spectral-spatial feature extraction (via hybrid CNNs) with diffusion-based denoising can improve robustness and enable semi-blind recovery of both abundance maps and endmember spectra.
System Design
Designed a multi-stage pipeline integrating supervised feature extraction and generative refinement. The system consists of three core components: (1) HybridSTU CNN for initial abundance estimation, (2) 2D diffusion UNet for refining abundance maps, and (3) 1D diffusion UNet for reconstructing endmember spectra.
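The three-stage flow can be sketched as plain function composition; the function names and stub bodies below are illustrative placeholders for the repo's actual modules, not its API:

```python
import numpy as np

def hybrid_stu(patch):
    """(1) Coarse abundance estimation (stub: uniform abundances)."""
    h, w, _ = patch.shape
    return np.full((h, w, 3), 1 / 3)

def refine_abundances_2d(a, steps=10):
    """(2) 2D diffusion refinement of abundance maps (stub: identity)."""
    return a

def reconstruct_endmembers_1d(y_mean, k=3, steps=10):
    """(3) 1D diffusion reconstruction of k endmember spectra (stub)."""
    return np.tile(y_mean, (k, 1))

patch = np.random.rand(16, 16, 100)          # H x W x bands
a = refine_abundances_2d(hybrid_stu(patch))  # stages (1) then (2)
E = reconstruct_endmembers_1d(patch.mean(axis=(0, 1)))  # stage (3)
```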
Architecture
HybridSTU uses 3D convolutions to capture spectral-spatial correlations, followed by 2D convolutions for spatial reasoning. Outputs are constrained with a softmax to enforce physically valid abundance distributions (non-negative, summing to one per pixel). The diffusion models (ResUNet2D and ResUNet1D) are conditioned on timestep embeddings and iteratively denoise abundance maps and spectral vectors.
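A minimal PyTorch sketch of this 3D-then-2D pattern with a softmax head; layer sizes and the class name are assumptions for illustration, not the project's HybridSTU definition:

```python
import torch
import torch.nn as nn

class HybridCNN(nn.Module):
    """Sketch of a hybrid 3D->2D CNN abundance estimator (illustrative sizes)."""

    def __init__(self, bands=100, n_endmembers=3):
        super().__init__()
        # 3D conv treats the spectral axis as depth to learn spectral-spatial features
        self.spectral = nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1))
        # flatten (channels x bands) into 2D feature maps for spatial reasoning
        self.spatial = nn.Sequential(
            nn.Conv2d(8 * bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_endmembers, 1),
        )

    def forward(self, x):                                # x: (B, bands, H, W)
        b, _, h, w = x.shape
        f = torch.relu(self.spectral(x.unsqueeze(1)))    # (B, 8, bands, H, W)
        logits = self.spatial(f.reshape(b, -1, h, w))    # (B, n_endmembers, H, W)
        # softmax enforces non-negativity and sum-to-one per pixel
        return torch.softmax(logits, dim=1)

a = HybridCNN()(torch.randn(2, 100, 16, 16))
assert torch.allclose(a.sum(dim=1), torch.ones(2, 16, 16), atol=1e-5)
```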
Approach
First, estimate coarse abundances with HybridSTU. Then apply the diffusion models to progressively refine the abundance maps and reconstruct clean endmember spectra, using the DDPM objective for training and DDIM sampling for efficient inference.
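The DDPM/DDIM split can be summarized in two generic routines: the standard noise-prediction training loss and one deterministic DDIM update. This is a textbook sketch, not the repo's diffusion core:

```python
import torch

def ddpm_loss(model, x0, alpha_bar, t):
    """Standard DDPM objective: predict the noise added at timestep t."""
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps   # forward noising
    return torch.mean((model(x_t, t) - eps) ** 2)

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0), enabling few-step inference."""
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps_pred
```

Because the DDIM update is deterministic, inference can skip most timesteps and still land near the DDPM distribution, which is why it is the cheaper choice at test time.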
Implementation
Implemented a modular pipeline in PyTorch with separate training scripts for each model. Integrated a diffusion core (DDPM/DDIM), custom losses (spectral angle mapper (SAM) + MSE), and evaluation scripts. Built a CLI-based workflow to run the full pipeline from training to inference.
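A SAM + MSE composite loss can be written in a few lines; the weighting `w_sam` below is an illustrative assumption, not the project's tuned value:

```python
import torch

def sam_loss(pred, target, eps=1e-7):
    """Mean spectral angle (radians) between predicted and target spectra."""
    cos = torch.nn.functional.cosine_similarity(pred, target, dim=-1)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps)).mean()

def composite_loss(pred, target, w_sam=0.1):
    """MSE penalizes amplitude error; SAM penalizes spectral-shape error."""
    return torch.mean((pred - target) ** 2) + w_sam * sam_loss(pred, target)
```

SAM complements MSE because it is invariant to per-spectrum scaling, so it keeps the spectral shape right even when overall brightness differs.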
Experiments
Trained models on hyperspectral patches to improve data efficiency. Evaluated performance using reconstruction metrics (MSE, RMSE, PSNR), abundance accuracy, and spectral similarity (SAM). Compared diffusion-refined outputs against baseline CNN estimates.
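The evaluation metrics named above are standard and compact to define; a NumPy sketch (the `data_range` default is an assumption for normalized data):

```python
import numpy as np

def mse(x, y):
    return float(np.mean((x - y) ** 2))

def rmse(x, y):
    return float(np.sqrt(mse(x, y)))

def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB; assumes data normalized to data_range."""
    return float(10 * np.log10(data_range ** 2 / mse(x, y)))

def sam_deg(x, y, eps=1e-12):
    """Mean spectral angle in degrees over the last (spectral) axis."""
    cos = np.sum(x * y, axis=-1) / (
        np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + eps)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())
```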
Observations
Initial CNN outputs capture coarse abundance structure but suffer from noise. Diffusion models significantly improve spatial coherence and spectral fidelity, especially under noisy conditions.
Insights
Separating abundance and spectral denoising into dedicated diffusion models improves stability. Time-conditioned UNets enable controlled refinement across multiple noise levels.
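The time conditioning mentioned here is typically a sinusoidal timestep embedding fed to the UNet; a generic sketch of that embedding (dimension is illustrative):

```python
import math
import torch

def timestep_embedding(t, dim):
    """Sinusoidal timestep embedding used to condition denoising UNets."""
    half = dim // 2
    # geometric frequency ladder, as in transformer positional encodings
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # (B, dim)
```

Injecting this embedding at each residual block lets one network handle all noise levels, which is what makes the staged refinement controllable.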
Challenges
High computational cost of multi-stage training, maintaining the stability of diffusion training, and ensuring physically consistent outputs across both the spatial and spectral domains.
Limitations
Pipeline complexity increases training time. Requires careful hyperparameter tuning and significant compute for diffusion steps.
Future Work
Extend to fully end-to-end joint training, explore diffusion-based direct unmixing, and integrate self-supervised learning for improved scalability on unlabeled hyperspectral datasets.