Title
Research on the Effectiveness of Different Outlier Detection Methods in Common Data Distribution Types
Authors
Qingqing Song
Belarusian State University
Shaoliang Xia
Belarusian State University
Published In
Journal of Computer Technology and Applied Mathematics, 1(1), 13–25.
Published Date
2024-04-27
Abstract
Outlier detection is widely applied in areas such as network performance optimization and the pre-processing of machine learning data. In machine learning, its objective is to enhance data quality, thereby improving the performance of subsequent statistical analyses or machine learning models. Numerous effective and reliable outlier analysis methods exist, but their effectiveness varies significantly across different types of data distributions, so it is essential to select an appropriate method. In this study, we conducted outlier detection on sample data from five continuous probability distributions (Normal, Chi-square, Exponential, Gamma, and t) and four discrete probability distributions (Binomial, Poisson, Geometric, and Hypergeometric). This paper employs five outlier detection methods, namely Z-Score, IQR, DBSCAN, Isolation Forest, and Random Forest, and evaluates their detection effectiveness. Through comparison and analysis, this paper summarizes the characteristics of these methods when applied to sample data from different types of distributions. These findings will support more rational method selection across different outlier detection scenarios.
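To make the comparison in the abstract concrete, the following is a minimal sketch of the two simplest methods it names, Z-Score and IQR, applied to samples from a Normal and an Exponential distribution. It is an illustration only, not the authors' experimental setup: the threshold of 3 standard deviations, the 1.5 × IQR fence, the sample size of 1000, the seed, and the function names `zscore_outliers` and `iqr_outliers` are all assumptions that the abstract does not specify.

```python
import random
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute Z-score exceeds `threshold` (assumed 3.0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All values identical: nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k assumed 1.5)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
    samples = {
        "normal": [rng.gauss(0.0, 1.0) for _ in range(1000)],
        "exponential": [rng.expovariate(1.0) for _ in range(1000)],
    }
    # Print how many points each method flags per distribution.
    for name, values in samples.items():
        n_z = sum(zscore_outliers(values))
        n_iqr = sum(iqr_outliers(values))
        print(f"{name}: z-score flags {n_z}, IQR flags {n_iqr}")
```

Running the script prints the number of points each rule flags per distribution. Because both rules are built around symmetric spread, the counts on a skewed sample such as the Exponential can be expected to differ from those on the Normal sample, which is the kind of distribution dependence the study sets out to measure.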
Identifiers
- DOI: doi.org/10.5281/zenodo.10888672
Links
SUAS Press
https://www.suaspress.org/ojs/index.php/JCTAM/article/view/v1n1a03