[ PROJECT_ID: 0x7F1A ]
Recoltes: Digital Twin Framework
Recoltes is a capstone digital twin framework for agricultural monitoring that spans two complementary repositories. Recoltes, the core PHP/Blade web application, handles operational interfaces and day-to-day usage. RecoltesAPI, the Python companion service, hosts notebook-driven ML experimentation and the per-sensor XGBoost, LightGBM, and HistGradientBoosting models. Together, they ingest data from 13 IoT sensor types via MQTT, train models on 89 engineered features (lag, rolling mean, delta, cyclical time), and reach R² = 0.9765 on soil moisture prediction.

Best R²: 0.9765
Features: 89 Engineered
Sensors: 13 Types
Actuators: 6 Controlled
System Overview
01. Sensor Ingestion
Real-time streams from 13 IoT sensor types via MQTT into PostgreSQL landing tables with automated schema validation and 30-minute step intervals.
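The validation and 30-minute alignment described above could be sketched roughly as follows. The payload fields (`sensor_type`, `value`, `ts`), the allowed sensor names, and the function name are assumptions made for illustration; the actual Recoltes schema is not shown on this page.

```python
import json
from datetime import datetime, timezone

# Hypothetical subset of sensor types; the real list of 13 is not given here.
ALLOWED_SENSORS = {"temperature", "humidity", "soil_moisture", "ph", "ec"}
REQUIRED_FIELDS = {"sensor_type", "value", "ts"}  # assumed payload shape
STEP_SECONDS = 30 * 60  # 30-minute step interval


def _is_number(x) -> bool:
    """True for int/float but not bool (bool is a subclass of int)."""
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def validate_reading(raw: bytes) -> dict:
    """Parse one MQTT payload, check its shape, and snap it to a 30-minute step."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_FIELDS - set(msg)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if msg["sensor_type"] not in ALLOWED_SENSORS:
        raise ValueError(f"unknown sensor type: {msg['sensor_type']}")
    if not _is_number(msg["value"]) or not _is_number(msg["ts"]):
        raise ValueError("value and ts must be numeric")
    # Floor the Unix timestamp (seconds) to the start of its 30-minute window.
    bucket = int(msg["ts"]) // STEP_SECONDS * STEP_SECONDS
    return {
        "sensor_type": msg["sensor_type"],
        "value": float(msg["value"]),
        "step_start": datetime.fromtimestamp(bucket, tz=timezone.utc),
    }
```

A row returned by `validate_reading` would then be ready for insertion into a landing table keyed on `sensor_type` and `step_start`; rejected payloads raise `ValueError` so the caller can log or dead-letter them.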
02. Operational Platform
PHP/Blade web application providing interfaces for routine operations, process flow management, and day-to-day monitoring of the lettuce growth environment.
03. ML Pipeline
Per-sensor XGBoost, LightGBM, and HistGradientBoosting models trained with 89 features (lag-1/2/3, rolling mean-3, deltas, cyclical time encodings). Hyperparameter tuning via RandomizedSearchCV with 5-fold CV on CUDA GPU.
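The feature families listed above can be illustrated with a short pandas sketch. The `_rmean3`, `_delta`, and `hour_sin` names follow the feature names reported in the results; the `_lag{k}` and `hour_cos` names, the function itself, and the choice to include the current reading in the rolling window are assumptions, since the project's actual feature code is not shown here.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Add lag, rolling-mean, delta, and cyclical time features.

    Expects a DatetimeIndex at a regular 30-minute step.
    """
    out = df.copy()
    for col in columns:
        for k in (1, 2, 3):
            # Value observed k steps (k * 30 minutes) earlier.
            out[f"{col}_lag{k}"] = out[col].shift(k)
        # Mean of the current and two previous readings (assumed window).
        out[f"{col}_rmean3"] = out[col].rolling(window=3).mean()
        # Change since the previous reading.
        out[f"{col}_delta"] = out[col].diff()
    # Encode time of day on a circle so 23:30 and 00:00 end up close together.
    hour = out.index.hour.to_numpy() + out.index.minute.to_numpy() / 60.0
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24.0)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24.0)
    # The first three rows lack a full lag history and are dropped.
    return out.dropna()
```

Applying this to every sensor column plus a few more time encodings is one plausible way to arrive at a feature count in the range of the 89 reported, although the exact breakdown is not stated in the source.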
04. Digital Twin Modeling
Per-sensor digital twin models with lag feature engineering (lag-1/2/3, rolling means, deltas) for improved forecasting accuracy, plus multi-label actuator classification across 6 control outputs.
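For the multi-label actuator task, each sample carries one on/off label per actuator, so a prediction can be partly right. A minimal sketch of the label encoding and two common scores follows. Only `natural_water_valve` and `water_pump` are named on this page; the other four actuator names are placeholders, and both helper functions are hypothetical.

```python
# Two names come from this page; the remaining four are placeholders.
ACTUATORS = [
    "natural_water_valve",
    "water_pump",
    "actuator_3",
    "actuator_4",
    "actuator_5",
    "actuator_6",
]


def encode_states(active: set[str]) -> list[int]:
    """Turn a set of active actuator names into a fixed-order 0/1 label vector."""
    unknown = active - set(ACTUATORS)
    if unknown:
        raise ValueError(f"unknown actuators: {sorted(unknown)}")
    return [1 if name in active else 0 for name in ACTUATORS]


def multilabel_scores(y_true: list[list[int]], y_pred: list[list[int]]):
    """Return (per-label accuracy list, exact-match ratio) for 0/1 label vectors."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n_labels = len(y_true[0])
    # Fraction of samples where label j alone is predicted correctly.
    per_label = [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n
        for j in range(n_labels)
    ]
    # Fraction of samples where all labels are correct at once (strictest score).
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred)) / n
    return per_label, exact
```

Per-label accuracy shows which actuator is hardest to predict, while the exact-match ratio reflects how often the full control state is reproduced.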
Technical Specifications
MANIFEST_VER_2.0.0
// Core Stack
// Performance Metrics
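from ultralytics import YOLO

def run_inference(image_path, model_config):
    """YOLOv8 growth stage detection pipeline."""
    # Load weights from the path given in the config dict.
    model = YOLO(model_config["weights_path"])
    results = model.predict(
        source=image_path,
        conf=model_config["confidence_threshold"],
        imgsz=640,
    )
    # predict() returns one Results object per input image; parse_detections
    # is a project helper that is not shown in this excerpt.
    detections = parse_detections(results[0])
    return {"status": "SUCCESS", "stages": detections}

Analysis & Results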

XGBoost: Soil Moisture Prediction (R² = 0.9765)
The XGBoost model achieves the highest R² of 0.9765 with a near-zero residual mean (0.181) and tight standard deviation (1.564). Feature importance is dominated by the target's own lag values (soil_moisture, soil_moisture_rmean3, lag1/2/3), confirming strong temporal autocorrelation. Day-of-month and actuator states (natural_water_valve, water_pump) also contribute, validating the irrigation-driven dynamics.
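The three summary numbers quoted here (R², residual mean, residual standard deviation) can be computed as in the sketch below. It assumes residuals are defined as actual minus predicted, so a positive mean indicates underprediction, and it uses the population standard deviation; the source does not state which conventions were used.

```python
from statistics import mean, pstdev


def regression_summary(y_true: list[float], y_pred: list[float]) -> dict:
    """Return R², residual mean, and residual standard deviation."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed convention: residual = actual - predicted.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_res = sum(r * r for r in residuals)           # unexplained variation
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total variation
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "residual_mean": mean(residuals),
        "residual_std": pstdev(residuals),
    }
```

A residual mean far from zero relative to the residual spread would point to systematic bias, which is why the two figures are reported together.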

LightGBM: Soil Moisture Prediction (R² = 0.9732)
LightGBM achieves R² = 0.9732 with slightly higher residual bias (mean 0.515) than XGBoost, indicating a mild underprediction tendency. Its feature importance (split-based) is more evenly distributed across soil_moisture_delta, lag features, hour_sin, and nitrogen_rmean3, suggesting it discovers broader temporal and nutrient interactions even at a shallower max_depth of 3.

HistGradientBoosting: Soil Moisture Prediction (R² = 0.9759)
HistGB matches XGBoost closely at R² = 0.9759 while training 2.5x faster (124.9s vs 315.1s). Permutation importance reveals that only soil_moisture and soil_moisture_rmean3 carry meaningful predictive power — all other features contribute near-zero when permuted, indicating the model relies almost entirely on the target's recent history.
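Permutation importance, as used in this comparison, measures how much the error grows when a single feature column is shuffled. The dependency-free sketch below shows the idea; it scores with mean squared error and takes any `predict` callable, both of which are assumptions. scikit-learn provides `sklearn.inspection.permutation_importance` for fitted estimators, though the source does not say which implementation produced the reported figures.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    X is a list of rows (each a list of floats); predict maps one row to a float.
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero under this measure, which matches the pattern described above where everything except `soil_moisture` and `soil_moisture_rmean3` contributes almost nothing.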
Core Architecture
Sensor Mesh Network
MQTT broker ingesting 13 sensor types, including temperature, humidity, soil moisture, NPK, pH, EC, light intensity, water flow, and rain detection, at 30-minute intervals into the processing pipeline.
MQTT_PROTOCOL
ML Inference Engine
Per-sensor gradient boosting models (XGBoost, LightGBM, HistGB) with 89 engineered features, tuned via RandomizedSearchCV on CUDA GPU, achieving R² up to 0.9765 on soil moisture.
ML_INFERENCE
Digital Twin Pipeline
Per-sensor digital twin models with lag feature engineering and multi-label actuator classification, enabling predictive monitoring across 13 sensor types and 6 control outputs.
TWIN_PIPELINE