Journal of Range and Watershed Management (Scientific-Research Journal)

Article type: Research article

Author

Assistant Professor, Department of Rangeland and Watershed Management, and member of the Drought and Climate Change Research Group, Faculty of Natural Resources and Environment, University of Birjand, Birjand, Iran

10.22059/jrwm.2023.350908.1682

Abstract

This study compares univariate outlier detection methods applied to vegetation cover percentage data from a study assessing the effect of grazing intensity in arid rangelands. To this end, after vegetation cover percentage was measured in the rangeland and before statistical analysis, the presence or absence of outliers was examined as an assumption of parametric comparative hypothesis tests. Eight methods were used: the boxplot and interquartile range (Tukey's method), standard deviation from the mean (three-sigma rule), median absolute deviation (Hampel's method), trimmed mean, the 1st and 99th percentile values, the chi-square test (χ²), the Grubbs test (ESD), and the Rosner test (generalised ESD). The results showed that vegetation cover percentage data from rangelands under light and moderate grazing intensity were not normally distributed (Shapiro-Wilk test: P ≤ 0.05). Even removing the outliers did not make the data normal, but it did make the error variance homogeneous (Levene's test: P ≥ 0.05). Of the eight methods used, the modified Z method and the Grubbs and Rosner tests (P ≥ 0.05) did not identify any of the vegetation cover percentage values as outliers. Among the methods studied, the boxplot and the median absolute deviation method, which do not depend on the mean, are more suitable for vegetation cover data. Therefore, before any comparative hypothesis test is carried out, combined use of visual and statistical methods is recommended to check for the presence or absence of outliers.
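To make the contrast between mean-based and median-based rules concrete, the following is a minimal Python sketch of three of the methods named above: Tukey's boxplot fences, the three-sigma rule, and the Hampel (MAD) rule. The cover values are hypothetical and are not the study's data. The function names, the cutoffs (1.5 × IQR, 3 standard deviations, 3 scaled MADs), and the 1.4826 scaling constant are conventional choices assumed here for illustration; the abstract does not state which settings the author used.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def three_sigma_outliers(values, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


def hampel_outliers(values, k=3.0):
    """Flag values more than k scaled MADs from the median."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    # 1.4826 makes the MAD comparable to the SD under normality (assumed convention).
    mad = 1.4826 * statistics.median(abs_dev)
    return [v for v in values if abs(v - med) > k * mad]


# Hypothetical vegetation cover percentages for one grazing treatment.
cover = [12, 14, 15, 16, 17, 18, 19, 20, 21, 62]

print("Tukey:      ", tukey_outliers(cover))
print("Three-sigma:", three_sigma_outliers(cover))
print("Hampel:     ", hampel_outliers(cover))
```

In this hypothetical sample the value 62 is flagged by the boxplot fences and by the Hampel rule but not by the three-sigma rule, because the extreme value itself inflates the mean and the standard deviation. This is one way to see why methods that do not depend on the mean can behave differently on small field samples.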

Keywords

Article title [English]

Comparison of outlier detection methods and their impact on rangeland measurement and assessment studies

Author [English]

  • Moslem Rostampour

Assistant Professor, Department of Rangeland and Watershed Management and Research Group of Drought and Climate Change, Faculty of Natural Resources and Environment, University of Birjand, Birjand, Iran.

Abstract [English]

This study compared univariate outlier detection methods applied to vegetation cover data from a study of the effect of grazing intensity in the rangelands of arid regions. For this purpose, after vegetation cover was measured in the rangeland and before statistical analysis, the presence of outliers was examined as an assumption of parametric comparison tests. Eight methods were used: the boxplot and IQR (Tukey method), standard deviation from the mean (three-sigma rule), median absolute deviation (Hampel method), trimmed mean, the 1st and 99th percentiles, the chi-square test (χ²), the Grubbs test (ESD), and the Rosner test (generalized ESD). The results showed that the vegetation cover of rangelands under light and moderate grazing intensity was not normally distributed (Shapiro-Wilk test: p ≤ 0.05). Even deleting the outliers did not produce a normal distribution, but it did result in homogeneity of variances (Levene's test: p ≥ 0.05). The modified Z-score and the Grubbs and Rosner tests (p ≥ 0.05) did not identify any outliers in the vegetation cover data. Among the methods evaluated, the boxplot and the MAD method, which do not depend on the mean, are more suitable for vegetation cover data. Therefore, before any comparison test is performed, a combination of visual and statistical methods is recommended to evaluate the presence of outliers.
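Two of the remaining methods in the list, the trimmed mean and the 1st/99th percentile rule, can be sketched in a few lines of Python. This is an illustrative sketch only: the sample values are hypothetical, the 10% trimming proportion and the helper names `trimmed_mean` and `percentile_outliers` are assumptions introduced here, and the abstract does not say how the author configured either method.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)  # number of values removed per tail
    kept = ordered[cut:len(ordered) - cut]
    return statistics.fmean(kept)


def percentile_outliers(values, lower=1, upper=99):
    """Flag values below the lower or above the upper percentile."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower - 1], cuts[upper - 1]
    return [v for v in values if v < low or v > high]


# Hypothetical vegetation cover percentages (not the study's data).
cover = [12, 14, 15, 16, 17, 18, 19, 20, 21, 62]

print("Mean:             ", statistics.fmean(cover))
print("10% trimmed mean: ", trimmed_mean(cover))
print("Percentile flags: ", percentile_outliers(cover))
```

With these hypothetical values, trimming moves the location estimate from about 21.4 to 17.5, and the percentile rule flags both the smallest and the largest observation, even though the smallest one sits close to the rest of the sample. A percentile cutoff flags the tails of any sample regardless of how extreme they are, which is worth keeping in mind when comparing it with the other methods.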

Keywords [English]

  • mean
  • outliers
  • parametric statistics
  • rangeland
  • vegetation