Long-Term Solar Irradiance Forecasting for Koshi Province, Nepal
This study presents a 5-year (2025–2029) projection of Global Horizontal Irradiance (GHI) for the Koshi Province of Nepal, employing a Unified Regional Long Short-Term Memory (LSTM) Framework. Historical solar and meteorological data from six topographically diverse sites were concatenated into a single training matrix of 630,720 hourly observations spanning 12 years (2013–2024), sourced from the Solcast commercial satellite dataset.
To mitigate the exponential error accumulation inherent to multi-step autoregressive inference, the projection loop is anchored to a deterministic clear-sky physics engine (pvlib), which computes exact orbital trajectory features at strict Δt = 1 h intervals across 43,824 future timesteps. Three architectures (LSTM, GRU, and CNN-LSTM) were benchmarked under identical training conditions. The 2-layer LSTM achieved the lowest test RMSE (25.1 W/m²) and the highest R² (94.29%), and was selected for the final projection.
Methodology
Unified Regional Training Dataset
Rather than developing highly localised models for individual coordinate points, data from all six measurement sites were merged into a single training matrix of 630,720 rows at strict Δt = 1 h resolution (8,760 records × 12 years × 6 sites). This compels the LSTM to extract fundamental atmospheric attenuation physics rather than memorise site-specific micro-climate signatures, substantially improving spatial transferability to unmonitored prospective sites.
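The stated row count can be checked with a few lines of arithmetic. The sketch below assumes every year contributes exactly 8,760 hourly records, as the text states. Because 2016, 2020 and 2024 are leap years, that figure implies leap-day hours were excluded or otherwise normalised; this is an inference here, not something the source confirms. All constant names are illustrative.

```python
# Row-count check for the unified training matrix (names are illustrative).
HOURS_PER_YEAR = 8_760          # 365 days x 24 h, as stated in the text
YEARS = range(2013, 2025)       # 2013-2024 inclusive
N_SITES = 6

rows_per_site = HOURS_PER_YEAR * len(YEARS)
total_rows = rows_per_site * N_SITES

# Leap years in the span would add 24 h each if leap days had been kept.
leap_years = [y for y in YEARS if y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)]
rows_with_leap_days = total_rows + 24 * len(leap_years) * N_SITES

print(rows_per_site)        # 105120
print(total_rows)           # 630720
print(leap_years)           # [2016, 2020, 2024]
print(rows_with_leap_days)  # 631152
```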
Feature Engineering & Pearson Screening
Thirteen kinematic and meteorological features were retained following Pearson correlation screening (|r| > 0.05) against the GHI target.
Primary predictors: clearsky_ghi, zenith, dni, cloud_opacity, air_temp, relative_humidity.
All features were normalised to [0, 1] via MinMaxScaler prior to sequence generation.
Sliding window: n_past = 24 h, exactly one diurnal cycle.
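The three preprocessing steps described here (correlation screening, min-max scaling, and 24 h windowing) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the hand-written functions stand in for MinMaxScaler and for whatever correlation routine was actually used, the data are synthetic, and the `constant_flag` column is a hypothetical feature added only to show one being rejected.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)

def screen_features(columns, target, threshold=0.05):
    """Keep feature names whose |r| against the target exceeds the threshold."""
    return [name for name, col in columns.items()
            if abs(pearson_r(col, target)) > threshold]

def min_max_scale(col):
    """Scale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(col), max(col)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in col]

def make_windows(rows, target, n_past=24):
    """Pair each block of n_past consecutive rows with the next target value."""
    X, y = [], []
    for end in range(n_past, len(rows)):
        X.append(rows[end - n_past:end])
        y.append(target[end])
    return X, y

# Tiny synthetic example (values are made up for illustration only).
hours = list(range(72))
ghi = [max(0.0, 800.0 * math.sin(math.pi * ((h % 24) - 6) / 12)) for h in hours]
columns = {
    "clearsky_ghi": [g * 1.2 for g in ghi],   # strongly correlated with GHI
    "constant_flag": [1.0] * len(hours),      # hypothetical, carries no signal
}
kept = screen_features(columns, ghi)
scaled = {name: min_max_scale(columns[name]) for name in kept}
rows = [[scaled[name][i] for name in kept] for i in range(len(hours))]
X, y = make_windows(rows, min_max_scale(ghi), n_past=24)

print(kept)            # ['clearsky_ghi']
print(len(X), len(y))  # 48 48
print(len(X[0]))       # 24
```

In a real pipeline the scaling bounds would normally be fitted on the training partition only and reused for validation and test data, so that no information leaks from the held-out sets.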
LSTM Architecture & Training Protocol
2-layer stacked LSTM (64 → 32 units), Adam optimiser, MSE loss function.
Early stopping with patience = 5, monitoring val_loss, was used to prevent overfitting.
Three architectures (LSTM, GRU, CNN-LSTM) were benchmarked under identical splits and hyperparameters.
The LSTM returned the lowest test RMSE (25.1 W/m²) and was selected for the 5-year autoregressive projection.
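The early-stopping rule (patience = 5 on val_loss) can be illustrated without any deep-learning framework. The sketch below is an assumption about how such a rule typically behaves: training halts once the validation loss has failed to improve for five consecutive epochs. The function name and the loss values are hypothetical, and the source does not say which framework's callback was used.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once val_loss has failed to improve for `patience`
    consecutive epochs; if that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation curve: improves for four epochs, then plateaus.
history = [0.090, 0.061, 0.048, 0.045, 0.046, 0.047, 0.045, 0.049, 0.046, 0.044]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=5)
print(stop_epoch, best_epoch)  # 8 3
```

Here the loss at epoch 9 would have been a new minimum, but training has already stopped at epoch 8. That is the usual trade-off of a small patience value.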
Deterministic pvlib Physics Anchor
To prevent exponential error accumulation during multi-step inference, the autoregressive loop was anchored to the pvlib deterministic clear-sky model. Exact orbital zenith angles and theoretical irradiance ceilings were computed at Δt = 1 h for all 43,824 future timesteps (5 × 8,760 hours + 24 leap-day hours). The LSTM operates as a non-linear atmospheric transfer function rather than a free-running statistical extrapolator, bounding predictions within physical constraints.
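A minimal sketch of the anchoring idea follows, under several stated assumptions. The clear-sky function is a toy half-sine, not pvlib. The model is a hypothetical stand-in for the trained LSTM. Bounding is implemented as a simple clamp to the clear-sky ceiling, which is one plausible reading of "bounding predictions within physical constraints". Timestamps are in UTC for simplicity. The step count for 2025–2029 is computed from the calendar and matches the stated 43,824.

```python
import math
from datetime import datetime, timedelta, timezone

def hourly_steps(start, end):
    """Hourly timestamps in the half-open interval [start, end)."""
    n = (end - start) // timedelta(hours=1)
    return [start + timedelta(hours=i) for i in range(n)]

def toy_clearsky_ghi(ts):
    """Stand-in for a clear-sky value (NOT pvlib): a half-sine daytime curve."""
    return max(0.0, 1000.0 * math.sin(math.pi * (ts.hour - 6) / 12))

def toy_model(window, clearsky_now):
    """Hypothetical stand-in for the trained LSTM; may overshoot the ceiling."""
    return 0.5 * window[0] + 0.7 * clearsky_now

def anchored_forecast(history, steps, n_past=24):
    """Autoregressive loop with each prediction bounded by the clear-sky ceiling."""
    window = list(history[-n_past:])
    preds = []
    for ts in steps:
        ceiling = toy_clearsky_ghi(ts)     # deterministic input, never fed back
        raw = toy_model(window, ceiling)
        ghi = min(max(raw, 0.0), ceiling)  # clamp to [0, ceiling]
        preds.append(ghi)
        window = window[1:] + [ghi]        # slide the 24 h window forward
    return preds

start = datetime(2025, 1, 1, tzinfo=timezone.utc)
steps = hourly_steps(start, datetime(2030, 1, 1, tzinfo=timezone.utc))
print(len(steps))  # 43824

# Seed the window with the 24 hours before the forecast start (synthetic values).
history = [0.8 * toy_clearsky_ghi(start - timedelta(hours=h)) for h in range(24, 0, -1)]
preds = anchored_forecast(history, steps[:48])
print(all(0.0 <= p <= toy_clearsky_ghi(t) for p, t in zip(preds, steps)))  # True
```

The point of the design is that the clear-sky input is recomputed from the timestamp at every step, so only the model's own GHI output is recycled. Errors in that output cannot push a prediction above the physical ceiling.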
Daily GHI Yield — 2025 to 2029
Fig. 1. Daily integrated GHI yield (kWh/m²/day) from LSTM autoregressive inference, 2025–2029. Each trace corresponds to one calendar year; dashed amber curve is a centred 30-day moving average. Anchored to pvlib at Δt = 1 h, N = 43,824 steps.
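The quantities plotted in this figure can be derived from the hourly series as sketched below. It assumes each hourly value is a mean GHI in W/m², so that with Δt = 1 h the daily sum divided by 1,000 gives kWh/m²/day. The function names, the edge handling of the moving average, and the sample values are illustrative assumptions.

```python
def daily_yield_kwh(hourly_ghi_wm2):
    """Integrate hourly mean GHI (W/m^2) into daily yield (kWh/m^2/day).

    With a 1 h step, each hourly value contributes GHI x 1 h watt-hours.
    """
    if len(hourly_ghi_wm2) % 24 != 0:
        raise ValueError("expected a whole number of days")
    return [sum(hourly_ghi_wm2[i:i + 24]) / 1000.0
            for i in range(0, len(hourly_ghi_wm2), 24)]

def centred_moving_average(values, window=30):
    """Centred moving average; edges use whatever neighbours are available.

    For an even window the span is i - window/2 .. i + window/2 - 1.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + (window - half))
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# One synthetic day of hourly GHI in W/m^2 (made-up values), repeated twice.
one_day = [0] * 6 + [100, 300, 500, 700, 800, 900,
                     900, 800, 700, 500, 300, 100] + [0] * 6
print(daily_yield_kwh(one_day * 2))                       # [6.6, 6.6]
print(centred_moving_average([1, 2, 3, 4, 5], window=3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```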
| Year | Total Yield (kWh/m²) | Daily Average (kWh/m²/day) | Peak Day Yield (kWh/m²) | Monsoon Avg (kWh/m²/day) |
|---|---|---|---|---|
Seasonal & Diurnal Irradiance Profile
Attenuation below the dashed clear-sky ceiling reflects cloud opacity and atmospheric aerosol scattering. Monsoon months (Jun–Aug) exhibit systematic suppression of GHI, consistent with historical Koshi Province climatology.
Monthly mean daily GHI by year reveals a consistent seasonal signal. The June–August trough corresponds to the South Asian monsoon, reducing average daily yield by approximately 38% versus the dry season.
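A seasonal reduction of this kind is a relative difference between two seasonal means. The sketch below shows the arithmetic only. The monthly values are placeholders and not results from the study, and treating every non-monsoon month as the "dry season" is an assumption, since the source does not define that baseline.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def seasonal_reduction(monthly_mean, monsoon_months=(6, 7, 8)):
    """Fractional drop of the monsoon mean relative to the remaining months."""
    monsoon = mean([v for m, v in monthly_mean.items() if m in monsoon_months])
    other = mean([v for m, v in monthly_mean.items() if m not in monsoon_months])
    return (other - monsoon) / other

# Placeholder monthly mean daily GHI in kWh/m^2/day (NOT values from the study).
placeholder = {month: 5.0 for month in range(1, 13)}
for month in (6, 7, 8):
    placeholder[month] = 3.0

print(round(100 * seasonal_reduction(placeholder)))  # 40
```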
Model Performance Comparison
All three architectures were trained under identical conditions: same 80/10/10 dataset splits, same optimiser (Adam, lr = 1×10⁻³), same early-stopping criterion (patience = 5, monitor = val_loss). Performance metrics are reported on the held-out test partition.
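For scale, an 80/10/10 partition of the 630,720-row matrix can be sketched as follows. The source does not say whether the split was chronological, random, or made per site. The version below assumes contiguous, unshuffled index ranges purely for illustration, and the function name is hypothetical. The number of usable 24 h windows would be slightly smaller than the row count.

```python
def contiguous_split(n_samples, train_pct=80, val_pct=10):
    """Return (train, val, test) index ranges in order, without shuffling."""
    train_end = n_samples * train_pct // 100
    val_end = train_end + n_samples * val_pct // 100
    return (range(0, train_end),
            range(train_end, val_end),
            range(val_end, n_samples))

train, val, test = contiguous_split(630_720)
print(len(train), len(val), len(test))  # 504576 63072 63072
```

Integer arithmetic is used for the boundaries so that the three partitions always cover every row exactly once, with any rounding remainder falling into the test range.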
| Architecture | R² Score | RMSE (W/m²) | Training Time | Status |
|---|---|---|---|---|
Fig. 4. R² scores (%) for LSTM, GRU, and CNN-LSTM on identical held-out test sequences. RMSE annotations shown inside bars. The standard 2-layer stacked LSTM achieves the highest R² (94.29%) and lowest hourly RMSE (25.1 W/m²), and was selected as the primary architecture for the 5-year autoregressive projection.
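For reference, the two metrics reported here follow their standard definitions: RMSE is the square root of the mean squared error, and R² is one minus the ratio of residual to total sum of squares. The sketch below uses made-up values in W/m² to show the calculation and does not reproduce the study's results.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up hourly GHI values in W/m^2, for illustration only.
y_true = [0.0, 200.0, 400.0, 600.0]
y_pred = [10.0, 190.0, 410.0, 590.0]

print(rmse(y_true, y_pred))                # 10.0
print(round(r2_score(y_true, y_pred), 3))  # 0.998
```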