§ Overview

Long-Term Solar Irradiance Forecasting for Koshi Province, Nepal

This study presents a 5-year (2025–2029) projection of Global Horizontal Irradiance (GHI) for the Koshi Province of Nepal, employing a Unified Regional Long Short-Term Memory (LSTM) Framework. Historical solar and meteorological data from six topographically diverse sites were concatenated into a single training matrix of 630,720 hourly observations spanning 12 years (2013–2024), sourced from the Solcast commercial satellite dataset.

To mitigate the exponential error accumulation inherent to multi-step autoregressive inference, the projection loop is anchored to a deterministic clear-sky physics engine (pvlib), which computes solar-geometry features and clear-sky irradiance at strict Δt = 1 h intervals across 43,824 future timesteps. Three architectures (LSTM, GRU, and CNN-LSTM) were benchmarked under identical training conditions. The 2-layer LSTM returned the lowest test RMSE (25.1 W/m²) and the highest R² (94.29%), and was selected for the final projection.

Annual Average Yield: 2,375 kWh/m²/yr

5-Year Cumulative: 11,877 kWh/m²

Peak Daily Yield: 8.31 kWh/m²/day

LSTM Accuracy: 98.78% R² (daily integrated)
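The headline figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the annual average from the 5-year cumulative yield and derives a mean daily yield over the 2025–2029 horizon. The mean daily value is derived here for orientation and is not a figure quoted by the study.

```python
from datetime import date

# Headline figure quoted in the overview.
cumulative_kwh_m2 = 11_877   # 5-year cumulative yield, kWh/m^2
years = 5

# Days in the projection horizon, 1 Jan 2025 to 31 Dec 2029 inclusive
# (2028 is a leap year).
horizon_days = (date(2030, 1, 1) - date(2025, 1, 1)).days

annual_average = cumulative_kwh_m2 / years        # kWh/m^2/yr
mean_daily = cumulative_kwh_m2 / horizon_days     # kWh/m^2/day (derived, not quoted)

print(f"horizon: {horizon_days} days")
print(f"annual average: {annual_average:.1f} kWh/m^2/yr")
print(f"mean daily yield: {mean_daily:.2f} kWh/m^2/day")
```

The recomputed annual average rounds to the quoted 2,375 kWh/m²/yr, so the two cumulative figures are mutually consistent.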

§ 1 — Study Design

Methodology

01. Unified Regional Training Dataset

Rather than developing highly localised models for individual coordinate points, data from all six measurement sites were merged into a single training matrix of 630,720 rows at strict Δt = 1 h resolution (8,760 records per year × 12 years × 6 sites). The intent is for the LSTM to learn general atmospheric attenuation behaviour rather than memorise site-specific micro-climate signatures, which should improve spatial transferability to unmonitored prospective sites.
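As a rough check on the matrix size, the sketch below reproduces the stated row count and compares it with the number of calendar hours in 2013–2024. The stated figure matches 365-day years; whether leap days were dropped or handled some other way is not stated in the source, so the comparison is only an observation. The `stack_sites` helper and its site labels are hypothetical and simply illustrate row-wise concatenation.

```python
from datetime import datetime

SITES = 6                    # from the text
YEARS = range(2013, 2025)    # 2013-2024 inclusive, 12 years
HOURS_PER_YEAR = 8_760       # 365 days x 24 h, as stated in the text

rows_stated = HOURS_PER_YEAR * len(YEARS) * SITES

# Calendar hours in the same period, leap days (2016, 2020, 2024) included.
span = datetime(2025, 1, 1) - datetime(2013, 1, 1)
calendar_hours = span.days * 24
rows_calendar = calendar_hours * SITES


def stack_sites(per_site_rows):
    """Concatenate per-site row lists into one matrix, in sorted site order.

    per_site_rows: dict mapping a (hypothetical) site label to a list of rows.
    """
    matrix = []
    for site in sorted(per_site_rows):
        matrix.extend(per_site_rows[site])
    return matrix


# Tiny illustration with two made-up sites and three rows each.
demo = stack_sites({"site_b": [[4], [5], [6]], "site_a": [[1], [2], [3]]})

print(rows_stated, rows_calendar, rows_calendar - rows_stated)
```

The difference between the two counts is 432 rows, i.e. three leap days × 24 h × 6 sites.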

02. Feature Engineering & Pearson Screening

Thirteen kinematic and meteorological features were retained following Pearson correlation screening (|r| > 0.05) against the GHI target. The primary predictors are clearsky_ghi, zenith, dni, cloud_opacity, air_temp, and relative_humidity. All features were normalised to [0, 1] via MinMaxScaler prior to sequence generation. The sliding window is n_past = 24 h, exactly one diurnal cycle.
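The three preprocessing steps (screening, scaling, windowing) can be sketched in plain Python as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the min-max formula mirrors what MinMaxScaler does for a single column, and pairing each 24-step window with the target of the following hour is an assumption about the forecasting setup.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def screen_features(columns, target, threshold=0.05):
    """Keep feature names whose |r| against the target exceeds the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson_r(values, target)) > threshold]


def min_max_scale(values):
    """Scale one column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def sliding_windows(rows, target, n_past=24):
    """Pair each run of n_past consecutive rows with the next step's target."""
    X, y = [], []
    for end in range(n_past, len(rows)):
        X.append(rows[end - n_past:end])
        y.append(target[end])
    return X, y


# Toy data: one informative column and one constant column.
ghi = [0.0, 100.0, 200.0, 300.0]
kept = screen_features({"clearsky_ghi": [0.0, 120.0, 230.0, 350.0],
                        "constant": [1.0, 1.0, 1.0, 1.0]}, ghi)
X, y = sliding_windows(list(range(30)), list(range(30)), n_past=24)
```

In practice the scaling bounds would normally be fitted on the training partition only and reused for validation and test data; the source does not say how this was handled.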

03. LSTM Architecture & Training Protocol

The model is a 2-layer stacked LSTM (64 → 32 units) trained with the Adam optimiser and an MSE loss function. Early stopping (patience = 5, monitoring val_loss) is used to prevent overfitting. Three architectures (LSTM, GRU, CNN-LSTM) were benchmarked under identical splits and hyperparameters. The LSTM returned the lowest test RMSE (25.1 W/m²) and was selected for the 5-year autoregressive projection.
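The early-stopping rule can be illustrated without a deep-learning framework. The sketch below applies a patience of 5 to a list of validation losses and reports where training would halt. The loss values are made up for illustration, the function name is hypothetical, and whether the study restored the best weights after stopping is not stated in the source.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where val_loss has failed to improve
    on its best value for `patience` consecutive epochs. If that never
    happens, stop_epoch is the last epoch in the list.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Strict improvement resets the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative validation losses only; not results from the study.
losses = [0.90, 0.50, 0.40, 0.41, 0.42, 0.405, 0.43, 0.44, 0.45]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch, best_epoch)
```

With these values the best loss occurs at epoch 2 and training halts at epoch 7, five non-improving epochs later.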

04. Deterministic pvlib Physics Anchor

To prevent exponential error accumulation during multi-step inference, the autoregressive loop was anchored to the pvlib deterministic clear-sky model. Exact orbital zenith angles and theoretical irradiance ceilings were computed at Δt = 1 h for all 43,824 future timesteps (5 × 8,760 hours + 24 leap-day hours). The LSTM operates as a non-linear atmospheric transfer function rather than a free-running statistical extrapolator, bounding predictions within physical constraints.
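The anchoring idea can be sketched as a loop in which the clear-sky value for every future hour is computed deterministically and only the GHI itself is fed back. Everything below is a stand-in under stated assumptions: `toy_clearsky_ghi` replaces the pvlib clear-sky call with a simple half-sine, `toy_model` replaces the trained LSTM and deliberately over-predicts by 5% per step, and clipping each prediction to the clear-sky ceiling is one plausible way to enforce the bound. The source does not state the exact mechanism.

```python
import math
from datetime import datetime, timedelta

START = datetime(2025, 1, 1)
# Hourly steps from 1 Jan 2025 through 31 Dec 2029 (includes the 2028 leap day).
N_STEPS = int((datetime(2030, 1, 1) - START) / timedelta(hours=1))


def toy_clearsky_ghi(ts):
    """Stand-in for pvlib: half-sine from 06:00 to 18:00 peaking at 1000 W/m^2."""
    h = ts.hour
    return 1000.0 * math.sin(math.pi * (h - 6) / 12) if 6 <= h <= 18 else 0.0


def toy_model(window, ceiling_next):
    """Stand-in for the LSTM: recent clearness ratio, inflated by 5% per step."""
    lit = [(g, c) for g, c in window if c > 0]
    ratio = sum(g for g, _ in lit) / sum(c for _, c in lit) if lit else 0.7
    return 1.05 * ratio * ceiling_next


def project(history, start, n_steps, model, n_past=24):
    """Autoregressive projection anchored to a deterministic clear-sky ceiling."""
    window = list(history[-n_past:])
    out = []
    for k in range(n_steps):
        ts = start + timedelta(hours=k)
        ceiling = toy_clearsky_ghi(ts)        # computed, never predicted
        raw = model(window, ceiling)
        ghi = min(max(raw, 0.0), ceiling)     # clip to [0, ceiling] (assumption)
        out.append((ts, ghi, ceiling))
        window = window[1:] + [(ghi, ceiling)]  # only GHI is fed back
    return out


# Seed window: the last 24 h of 2024 at a made-up clearness of 0.7.
seed = []
for k in range(24):
    c = toy_clearsky_ghi(START - timedelta(hours=24 - k))
    seed.append((0.7 * c, c))

projection = project(seed, START, N_STEPS, toy_model)
print(N_STEPS, projection[-1][0])
```

Because the toy model inflates its own output at every step, an unanchored loop would drift upward without limit. Here the drift saturates at the clear-sky ceiling, which is the behaviour the anchor is meant to guarantee.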

§ 2 — Multi-Year Projection

Daily GHI Yield — 2025 to 2029

Fig. 1. Daily integrated GHI yield (kWh/m²/day) from LSTM autoregressive inference, 2025–2029. Each trace corresponds to one calendar year; dashed amber curve is a centred 30-day moving average. Anchored to pvlib at Δt = 1 h, N = 43,824 steps.
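The two transformations behind Fig. 1 can be sketched as follows: hourly GHI in W/m² is integrated to a daily yield in kWh/m²/day, and the daily series is smoothed with a centred 30-day moving average. The function names are hypothetical, and both the placement of an even-length window and the shrinking window at the series ends are assumptions, since the source does not describe them.

```python
def daily_yield_kwh(hourly_w_m2):
    """Integrate hourly mean GHI (W/m^2) into daily yield (kWh/m^2/day).

    With dt = 1 h, each hourly value contributes value * 1 h / 1000.
    """
    if len(hourly_w_m2) % 24:
        raise ValueError("expected whole days of hourly data")
    return [sum(hourly_w_m2[i:i + 24]) / 1000.0
            for i in range(0, len(hourly_w_m2), 24)]


def centred_moving_average(values, window=30):
    """Centred moving average covering indices i - window//2 .. i + window - window//2 - 1.

    Near the two ends the window shrinks to the values that exist.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + window - half)
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


# One day at a constant 500 W/m^2 integrates to 12 kWh/m^2.
one_day = daily_yield_kwh([500.0] * 24)
smoothed = centred_moving_average([1, 2, 3, 4, 5], window=3)
print(one_day, smoothed)
```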

Table 1 — Annual Energy Yield Summary

Year	Total Yield (kWh/m²)	Daily Average (kWh/m²/day)	Peak Day Yield (kWh/m²)	Monsoon Avg (kWh/m²/day)

§ 3 — Temporal Analysis

Seasonal & Diurnal Irradiance Profile

Fig. 2 — Hourly GHI · 7-Day Snapshot

LSTM prediction (solid) vs. pvlib clear-sky ceiling (dashed)

Attenuation below the dashed clear-sky ceiling reflects cloud opacity and atmospheric aerosol scattering. Monsoon months (Jun–Aug) exhibit systematic suppression of GHI, consistent with historical Koshi Province climatology.

Fig. 3 — Monthly Average Daily Yield (kWh/m²/day)

Dark cells indicate monsoon-season irradiance attenuation

Monthly mean daily GHI by year reveals a consistent seasonal signal. The June–August trough corresponds to the South Asian monsoon, reducing average daily yield by approximately 38% versus the dry season.
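A relative seasonal reduction of this kind is typically computed as one minus the ratio of the monsoon mean to the dry-season mean. The sketch below shows that calculation. The monthly values are placeholders chosen for easy arithmetic and are not the study's results, and treating every non-monsoon month as "dry season" is an assumption, since the source does not define the comparison period.

```python
MONSOON_MONTHS = (6, 7, 8)   # Jun-Aug, as in the text


def monsoon_reduction(monthly_mean, monsoon_months=MONSOON_MONTHS):
    """Fractional drop of the monsoon mean relative to the remaining months.

    monthly_mean: dict mapping month number (1-12) to mean daily yield.
    Months are weighted equally, ignoring their differing lengths.
    """
    monsoon = [v for m, v in monthly_mean.items() if m in monsoon_months]
    dry = [v for m, v in monthly_mean.items() if m not in monsoon_months]
    monsoon_avg = sum(monsoon) / len(monsoon)
    dry_avg = sum(dry) / len(dry)
    return 1.0 - monsoon_avg / dry_avg


# Placeholder values only (kWh/m^2/day); not taken from the study.
example = {m: (4.0 if m in MONSOON_MONTHS else 5.0) for m in range(1, 13)}
reduction = monsoon_reduction(example)
print(f"{reduction:.0%}")
```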

§ 4 — Architecture Ablation Study

Model Performance Comparison

All three architectures were trained under identical conditions: same 80/10/10 dataset splits, same optimiser (Adam, lr = 1×10⁻³), same early-stopping criterion (patience = 5, monitor = val_loss). Performance metrics are reported on the held-out test partition.
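An 80/10/10 partition can be sketched with integer arithmetic as below. The split here is chronological over a single index range, which is an assumption: the source gives the proportions but not whether rows were shuffled, split per site, or split in time order. The function name is hypothetical.

```python
def chronological_split(n_samples, percents=(80, 10, 10)):
    """Return (train, val, test) index ranges in their original order."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    train_pct, val_pct, _ = percents
    train_end = n_samples * train_pct // 100
    val_end = train_end + n_samples * val_pct // 100
    return (range(0, train_end),
            range(train_end, val_end),
            range(val_end, n_samples))


# Applied to the stated row count of the training matrix.
train_idx, val_idx, test_idx = chronological_split(630_720)
print(len(train_idx), len(val_idx), len(test_idx))
```

The number of usable training sequences is slightly smaller than the row counts shown, because each sequence needs 24 preceding rows.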

Table 2 — Ablation Study Results

Architecture	R² Score	RMSE (W/m²)	Training Time	Status

Fig. 4. R² scores (%) for LSTM, GRU, and CNN-LSTM on identical held-out test sequences. RMSE annotations shown inside bars. The standard 2-layer stacked LSTM achieves the highest R² (94.29%) and lowest hourly RMSE (25.1 W/m²), and was selected as the primary architecture for the 5-year autoregressive projection.
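For reference, the two metrics reported here follow the standard definitions: RMSE is the square root of the mean squared error, and R² is one minus the ratio of residual to total sum of squares. The sketch below uses made-up values to show the calculation and does not reproduce any figure from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the inputs (W/m^2 for hourly GHI)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (W/m^2).
observed = [0.0, 100.0, 200.0, 300.0]
predicted = [10.0, 90.0, 210.0, 290.0]
print(rmse(observed, predicted), r2_score(observed, predicted))
```

Applying the same functions to daily-integrated series rather than hourly values generally gives a higher R², since hourly errors partly cancel within a day. That is consistent with the daily-integrated R² (98.78%) exceeding the hourly figure (94.29%).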
