A comprehensive benchmarking of different outlier detection techniques for univariate datasets by Monte Carlo simulations


ERGİN M., Aksu Y., KOŞKAN Ö.

Communications in Statistics: Simulation and Computation, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI Number: 10.1080/03610918.2025.2571982
  • Journal Name: Communications in Statistics: Simulation and Computation
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, MathSciNet, zbMATH
  • Keywords: Analysis of variance, Monte Carlo simulation, Outlier detection, Sample size, Univariate data
  • Affiliated with Isparta Uygulamalı Bilimler Üniversitesi (Isparta University of Applied Sciences): Yes

Abstract

In scientific studies, outliers are among the major factors that can alter experimental results, particularly in analysis of variance (ANOVA). To address this problem, researchers have developed various outlier detection techniques. Selecting a suitable technique for practical use is critical to ensure accurate and reliable statistical inference. The aim of this simulation study was to benchmark nine outlier detection techniques (2σ, Z score, Modified Z score, Median Absolute Deviation [2MADE and 3MADE], IQR, Grubbs, Rosner, HMSDHM) in terms of type I error probability in ANOVA and their ability to detect injected outliers. A Monte Carlo simulation program was constructed using random numbers generated from a normal distribution. Small, medium, and large sample sizes, various numbers of injected outliers, and various outlier magnitudes were considered. In the results, the 3MADE and Grubbs tests outperformed the others in small samples. For medium samples, the 3MADE and Rosner tests showed reliable results. Furthermore, the Z score, Modified Z score, and Rosner tests performed best in large samples in terms of both detection accuracy for injected outliers and type I error probability. Overall, the results indicated that the best outlier detection technique depends on the sample size, with 3MADE, Grubbs, and Rosner showing superior performance.
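
To make the benchmarked idea concrete, the following is a minimal Python sketch of four of the simpler rules named in the abstract (an SD-based rule covering 2σ and the Z score, the Modified Z score, the MADe rule, and the IQR rule), together with a small Monte Carlo loop that injects outliers into normal samples. It is an illustration only, not the authors' simulation program. The cutoffs (3.5 for the Modified Z score, the 1.483 scaling for MADe, 1.5 for the IQR fences), the function names, and the way outliers are injected are assumptions based on common textbook conventions, since the abstract does not state the exact settings used. The ANOVA type I error part of the study is not reproduced here.

```python
import random
import statistics


def flag_sd(data, k):
    """SD rule: flag values more than k sample SDs from the mean
    (k=2 for the 2-sigma rule, k=3 is a common Z-score cutoff)."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [abs(x - mean) > k * sd for x in data]


def flag_made(data, k):
    """MADe rule: flag values more than k * MADe from the median,
    where MADe = 1.483 * MAD (assumed scaling; k=2 or k=3)."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    return [abs(x - med) > k * 1.483 * mad for x in data]


def flag_modified_z(data, cutoff=3.5):
    """Modified Z score: |0.6745 * (x - median) / MAD| > cutoff.
    Written without division so MAD == 0 does not raise."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    return [0.6745 * abs(x - med) > cutoff * mad for x in data]


def flag_iqr(data, k=1.5):
    """IQR rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x < q1 - k * iqr or x > q3 + k * iqr for x in data]


def simulate(rule, n=30, n_outliers=2, magnitude=5.0, reps=500, seed=1):
    """Hypothetical Monte Carlo loop: draw N(0, 1) samples, inject
    n_outliers values at `magnitude`, and return (detection rate for
    injected points, false-flag rate for clean points)."""
    rng = random.Random(seed)
    hits = 0
    false_flags = 0
    for _ in range(reps):
        clean = [rng.gauss(0.0, 1.0) for _ in range(n - n_outliers)]
        flags = rule([magnitude] * n_outliers + clean)
        hits += sum(flags[:n_outliers])
        false_flags += sum(flags[n_outliers:])
    return hits / (reps * n_outliers), false_flags / (reps * (n - n_outliers))


if __name__ == "__main__":
    rules = {
        "2 sigma": lambda d: flag_sd(d, 2),
        "Z score": lambda d: flag_sd(d, 3),
        "Modified Z": flag_modified_z,
        "2MADe": lambda d: flag_made(d, 2),
        "3MADe": lambda d: flag_made(d, 3),
        "IQR": flag_iqr,
    }
    for name, rule in rules.items():
        detect, false_rate = simulate(rule)
        print(f"{name:>10}: detection={detect:.3f} false flags={false_rate:.3f}")
```

Varying `n`, `n_outliers`, and `magnitude` in `simulate` mirrors the factors the abstract lists (sample size, number of injected outliers, outlier magnitude). The Grubbs, Rosner, and HMSDHM procedures are omitted because they need critical values or definitions that the abstract does not give.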