From Regression to Machine Learning: Modeling Height–Diameter Relationships in Crimean Juniper Stands Without Calibration Overhead

Diamantopoulou, Maria; ÖZÇELİK, Ramazan; Eler, Ünal; KOPARAN, Burak

doi:10.3390/f16060972

Diamantopoulou M. J., ÖZÇELİK R., Eler Ü., KOPARAN B.

Forests, vol. 16, no. 6, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16, Issue: 6
Publication Date: 2025
DOI: 10.3390/f16060972
Journal Name: Forests
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, Environment Index, Geobase
Keywords: calibration, extreme gradient boost, mixed-effects, quantile regression, random forest, shallow multilayer perceptron, tree height
Affiliated with Isparta Uygulamalı Bilimler Üniversitesi: Yes

Abstract

Accurate modeling of height–diameter (h–d) relationships is critical for forest inventory and management, particularly in complex forest ecosystems such as natural and pure Crimean juniper (Juniperus excelsa Bieb.) stands. This study evaluates both traditional parametric and modern machine learning (ML) approaches to develop reliable h–d models based on 2135 sample trees measured in southern Türkiye. The modeling approaches include fixed-effects (FE) and mixed-effects (ME) models, three quantile regression (QR) models based on three, five, and nine quantile levels, and three non-parametric ML methods: shallow multilayer perceptron (S_MLP), extreme gradient boost (XGBoost), and random forest (RF). According to the assessment metrics for the fitting and test datasets, the XGBoost approach was the most accurate. For the fitting dataset, it achieved root mean square error (RMSE) values of 1.11 m and 1.21 m. For the test dataset, the corresponding values were 1.16 m and 1.24 m; the RF and S_MLP models followed closely. A key practical advantage of ML approaches is that they do not depend on calibration scenarios, meaning they can be applied without preliminary parameter configuration. In contrast, the ME model showed the highest accuracy among the parametric methods when calibration was applied. When ME models are used, the study recommends calibrating them by measuring four randomly selected trees per plot to balance prediction accuracy and field sampling effort.
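
The abstract reports model accuracy as root mean square error (RMSE) in metres. As a minimal sketch of how such a value is computed from observed and predicted tree heights, the Python snippet below may help. The function name `rmse` and the height values are hypothetical and invented for illustration only. They are not data from the study, and this is not the authors' code.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical tree heights in metres (illustrative only, not study data).
observed_heights = [6.0, 8.0, 10.0, 12.0]
predicted_heights = [7.0, 7.0, 11.0, 11.0]

error_m = rmse(observed_heights, predicted_heights)
print(f"RMSE = {error_m:.2f} m")
```

Because RMSE is expressed in the units of the response variable, values such as those quoted in the abstract can be read directly as a typical height prediction error in metres.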