[ PROJECT_ID: 0x7F1A ]

Recoltes: Digital Twin Framework

A capstone digital twin framework for agricultural monitoring spanning two complementary repositories. Recoltes is the core PHP/Blade web application handling operational interfaces and day-to-day usage. RecoltesAPI is the Python companion service where notebook-driven ML experimentation and per-sensor XGBoost/LightGBM/HistGradientBoosting models are developed. Together, the platform ingests data from 13 IoT sensor types via MQTT, trains models with 89 engineered features (lag, rolling mean, delta, cyclical time), and achieves R² = 0.9765 on soil moisture prediction.

PHP · Blade · Python · FastAPI · XGBoost · PostgreSQL · IoT · MQTT

Best R²: 0.9765
Features: 89 engineered
Sensors: 13 types
Actuators: 6 controlled

System Overview

01. Sensor Ingestion

Readings from 13 IoT sensor types stream over MQTT into PostgreSQL landing tables at 30-minute intervals, with automated schema validation on ingest.
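
As a rough illustration of the validation and 30-minute stepping described here, the sketch below checks one incoming payload and rounds its timestamp down to a 30-minute slot. The payload fields (sensor_type, value, timestamp), the function names, and the sample reading are all hypothetical; the project's real schema and MQTT topic layout are not shown on this page.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload schema: field name -> expected Python type(s).
REQUIRED_FIELDS = {
    "sensor_type": str,
    "value": (int, float),
    "timestamp": str,  # ISO 8601 with a UTC offset
}

STEP_SECONDS = 30 * 60  # the 30-minute interval mentioned in the text


def validate_reading(raw_payload: bytes) -> dict:
    """Parse a JSON payload and check it against REQUIRED_FIELDS."""
    reading = json.loads(raw_payload)
    for field, expected in REQUIRED_FIELDS.items():
        if field not in reading:
            raise ValueError(f"missing field: {field}")
        value = reading[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"bad type for field: {field}")
    return reading


def floor_to_step(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to its 30-minute slot (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % STEP_SECONDS, tz=timezone.utc)


# Made-up sample reading, for illustration only.
payload = b'{"sensor_type": "soil_moisture", "value": 41.7, "timestamp": "2024-05-01T10:47:12+00:00"}'
reading = validate_reading(payload)
slot = floor_to_step(datetime.fromisoformat(reading["timestamp"]))
print(reading["sensor_type"], slot.isoformat())
# prints: soil_moisture 2024-05-01T10:30:00+00:00
```

In a live deployment a function like validate_reading would presumably run inside the MQTT client's message callback before the row is written to PostgreSQL.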

02. Operational Platform

PHP/Blade web application providing interfaces for routine operations, process flow management, and day-to-day monitoring of the lettuce growth environment.

03. ML Pipeline

Per-sensor XGBoost, LightGBM, and HistGradientBoosting models trained with 89 features (lag-1/2/3, rolling mean-3, deltas, cyclical time encodings). Hyperparameter tuning via RandomizedSearchCV with 5-fold CV on CUDA GPU.
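
The feature families named here (lags, a 3-step rolling mean, deltas, cyclical time encodings) can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not the project's notebook code: it assumes the target is the next reading, that the rolling mean includes the current reading, and that the delta is the change since the previous step. The function name and sample values are made up.

```python
import math


def build_features(values, hours):
    """Build one feature row per step that has three earlier readings
    and one later reading (used as the prediction target).

    values: readings of one sensor at consecutive 30-minute steps.
    hours:  hour of day (0-23) for each reading.
    """
    rows = []
    for t in range(3, len(values) - 1):
        angle = 2 * math.pi * hours[t] / 24
        rows.append({
            "value": values[t],
            "lag1": values[t - 1],
            "lag2": values[t - 2],
            "lag3": values[t - 3],
            # Mean of the current reading and the two before it.
            "rmean3": sum(values[t - 2:t + 1]) / 3,
            # Change since the previous step.
            "delta": values[t] - values[t - 1],
            # Cyclical encoding keeps 23:00 and 00:00 close together.
            "hour_sin": math.sin(angle),
            "hour_cos": math.cos(angle),
            # Assumed target: the next reading.
            "target": values[t + 1],
        })
    return rows


# Made-up soil moisture readings at 30-minute steps.
values = [40.0, 41.0, 43.0, 42.0, 44.0, 45.0]
hours = [0, 0, 1, 1, 2, 2]
rows = build_features(values, hours)
print(len(rows), rows[0]["rmean3"], rows[0]["delta"], rows[0]["target"])
```

Repeating this per sensor column is one plausible way a set of 13 sensor types grows into dozens of engineered features.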

04. Digital Twin Modeling

Per-sensor digital twin models with lag feature engineering (lag-1/2/3, rolling means, deltas) for improved forecasting accuracy, plus multi-label actuator classification across 6 control outputs.

Technical Specifications

MANIFEST_VER_2.0.0

// Core Stack

Web Runtime: PHP / Blade
API Runtime: FastAPI / Uvicorn
ML Models: XGBoost / LightGBM / HistGB
Storage: PostgreSQL
Protocol: MQTT / REST
Compute: CUDA GPU

// Performance Metrics

Best R² (XGBoost): 0.9765
Best MAE: 1.10
Engineered Features: 89
CV R² (LightGBM): 0.801
Train Time (HistGB): 124.9 s

inference_pipeline.py
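from ultralytics import YOLO


def parse_detections(result):
    """Illustrative helper (not shown in the original snippet):
    turn one ultralytics Results object into plain dicts."""
    classes = result.boxes.cls.tolist()
    scores = result.boxes.conf.tolist()
    return [
        {"stage": result.names[int(c)], "confidence": s}
        for c, s in zip(classes, scores)
    ]


def run_inference(image_path, model_config):
    """YOLOv8 growth stage detection pipeline."""
    # Load the weights named in the config.
    model = YOLO(model_config["weights_path"])

    # Run detection on a single image at 640 px input size.
    results = model.predict(
        source=image_path,
        conf=model_config["confidence_threshold"],
        imgsz=640,
    )

    detections = parse_detections(results[0])
    return {"status": "SUCCESS", "stages": detections}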

Analysis & Results

[Figure: XGBoost soil moisture model: actual vs predicted scatter, residual distribution, and top 20 feature importances]

XGBoost: Soil Moisture Prediction (R² = 0.9765)

The XGBoost model achieves the highest R² of 0.9765 with a near-zero residual mean (0.181) and tight standard deviation (1.564). Feature importance is dominated by the target's own lag values (soil_moisture, soil_moisture_rmean3, lag1/2/3), confirming strong temporal autocorrelation. Day-of-month and actuator states (natural_water_valve, water_pump) also contribute, validating the irrigation-driven dynamics.
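
For readers who want to see how figures such as R², MAE, residual mean, and residual standard deviation relate to each other, here is a minimal sketch using only the standard library. It assumes residuals are defined as actual minus predicted, so a positive residual mean means the model predicts too low on average. The function name and the four sample points are made up and are not project data.

```python
from statistics import fmean, pstdev


def regression_report(actual, predicted):
    """Summary metrics for one regression model."""
    # Assumed convention: residual = actual - predicted.
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = fmean(actual)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": fmean([abs(r) for r in residuals]),
        "residual_mean": fmean(residuals),
        "residual_std": pstdev(residuals),
    }


# Made-up values, for illustration only.
actual = [40.0, 42.0, 44.0, 46.0]
predicted = [39.0, 42.0, 45.0, 45.0]
report = regression_report(actual, predicted)
print(report)
```

Under this convention, a residual mean close to zero indicates little systematic bias, while the standard deviation describes how widely individual errors scatter around that mean.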

[Figure: LightGBM soil moisture model: actual vs predicted scatter, residual distribution, and top 20 feature importances]

LightGBM: Soil Moisture Prediction (R² = 0.9732)

LightGBM achieves R² = 0.9732 with slightly higher residual bias (mean 0.515) than XGBoost, indicating a mild underprediction tendency. Its feature importance (split-based) is more evenly distributed across soil_moisture_delta, lag features, hour_sin, and nitrogen_rmean3, suggesting it discovers broader temporal and nutrient interactions even at a shallower max_depth of 3.

[Figure: HistGradientBoosting soil moisture model: actual vs predicted scatter, residual distribution, and top 20 permutation feature importances]

HistGradientBoosting: Soil Moisture Prediction (R² = 0.9759)

HistGB matches XGBoost closely at R² = 0.9759 while training about 2.5x faster (124.9s vs 315.1s). Permutation importance shows that only soil_moisture and soil_moisture_rmean3 carry meaningful predictive power. Every other feature contributes almost nothing when permuted, indicating the model relies almost entirely on the target's recent history.
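
Permutation importance measures how much a score drops when one feature column is shuffled, which breaks its link to the target while leaving the model untouched. The sketch below is a generic, standard-library version of that idea, not the project's code; scikit-learn ships a production implementation as sklearn.inspection.permutation_importance. The toy model, feature rows, and function names here are hypothetical.

```python
import random


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def permutation_importance(predict, rows, target, n_repeats=5, seed=0):
    """Mean drop in R² when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2_score(target, [predict(row) for row in rows])
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            column = [row[name] for row in rows]
            rng.shuffle(column)
            shuffled = [{**row, name: v} for row, v in zip(rows, column)]
            score = r2_score(target, [predict(row) for row in shuffled])
            drops.append(baseline - score)
        importances[name] = sum(drops) / n_repeats
    return importances


def toy_predict(row):
    """Stand-in for a fitted model: it only uses the latest reading."""
    return row["soil_moisture"]


# Made-up feature rows and targets, for illustration only.
rows = [{"soil_moisture": 40.0 + i, "hour_sin": (i % 4) / 4} for i in range(20)]
target = [row["soil_moisture"] + 0.5 for row in rows]
scores = permutation_importance(toy_predict, rows, target)
print(scores)
```

Because the toy model ignores hour_sin, shuffling that column leaves the score unchanged and its importance is exactly zero, the same pattern the text describes for features outside the target's own history.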

Core Architecture

Sensor Mesh Network

MQTT broker ingesting 13 sensor types, including temperature, humidity, soil moisture, NPK, pH, EC, light intensity, water flow, and rain detection, at 30-minute intervals into the processing pipeline.

MQTT_PROTOCOL

ML Inference Engine

Per-sensor gradient boosting models (XGBoost, LightGBM, HistGB) with 89 engineered features, tuned via RandomizedSearchCV on CUDA GPU, achieving R² up to 0.9765 on soil moisture.

ML_INFERENCE

Digital Twin Pipeline

Per-sensor digital twin models with lag feature engineering and multi-label actuator classification, enabling predictive monitoring across 13 sensor types and 6 control outputs.

TWIN_PIPELINE