Scientific Research Journal of Range and Watershed Management

Article type: Research article

Authors

1 Department of Soil Science, Faculty of Agriculture, University of Guilan, Rasht, Iran

2 Soil and Water Research Institute, Agricultural Research, Education and Extension Organization, Karaj, Iran

10.22059/jrwm.2023.354333.1692

Abstract

In recent years, digital mapping methods based on machine learning algorithms have been widely developed for producing maps of soil classes. These methods predict soil classes or properties by modeling the relationships between them and environmental variables that serve as proxies for the soil-forming factors. One challenge of this approach is the imbalanced nature of soil distribution in nature, which leads to overfitting of high-frequency classes and underfitting of low-frequency classes and, as a result, reduces the accuracy of the soil mapping process. The present study was therefore carried out to evaluate the ability of two algorithms, random forest and support vector machine, in the digital mapping of soil family classes with an imbalanced distribution, derived from 95 studied soil profiles in 4000 hectares of land in the Honam sub-basin, Lorestan province. The imbalance in the frequency of soil classes was examined in the R software environment using six data sets: the original data set and five data sets created by resampling the original data, comprising two manual classification approaches and three algorithms (over-sampling, under-sampling, and synthetic minority over-sampling). The results showed that, despite the low values of the validation statistics, the extent of high-frequency soils in the maps obtained from the random forest model with the original data set, and also with the synthetic minority over-sampling algorithm, was notably similar to the soil map prepared by the conventional method. Therefore, the low frequency of the other soil classes, and the resulting failure to train the models properly for them, can be considered one of the main reasons for the low overall accuracy of the models used.
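To illustrate how rare classes can drag down the validation statistics mentioned in the abstract above, the following minimal Python sketch computes overall accuracy together with per-class recall. The abstract names overall accuracy only; per-class recall, the class labels "A", "B", "C", and the observed and predicted values are hypothetical and added here purely as an assumption for illustration. They are not data from the study, which was carried out in R.

```python
from collections import Counter


def overall_accuracy(observed, predicted):
    """Share of validation points whose predicted class equals the observed class."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def per_class_recall(observed, predicted):
    """For each observed class, the share of its points that were predicted correctly."""
    totals = Counter(observed)
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical validation set: class "A" is frequent, "B" and "C" are rare.
observed = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
# Hypothetical model output that leans heavily toward the frequent class.
predicted = ["A"] * 6 + ["A", "B", "A"] + ["A"]

accuracy = overall_accuracy(observed, predicted)
recall = per_class_recall(observed, predicted)

print(f"overall accuracy: {accuracy:.2f}")
for label in sorted(recall):
    print(f"recall for class {label}: {recall[label]:.2f}")
```

Reading the per-class values next to the overall figure shows which classes the model has effectively not learned, which is the pattern the abstract attributes to the low frequency of some soil classes.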

Keywords

Article title [English]

Investigating the effectiveness of resampling algorithms in improving the classification of unbalanced data in digital soil mapping

Authors [English]

  • Fatemeh Ebrahimi Meymand 1
  • Hasan Ramezanpour 1
  • Nafiseh Yaghmaeian 1
  • Kamran Eftekhari 2

1 Soil Science Department, College of Agriculture, University of Guilan, Rasht, Iran.

2 Soil and Water Research Institute, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran

Abstract [English]

In recent years, digital soil mapping (DSM) based on machine learning algorithms has become a widespread way of preparing soil maps. It predicts soil classes by modeling the relationships between them and environmental variables. One challenge of this method is the imbalanced nature of soil distribution in the landscape, which leads to overfitting of frequent classes and underfitting of rare classes and, as a result, reduces the accuracy of many of the models used. This study evaluated the ability of two machine learning algorithms, random forest and support vector machine, to digitally map soil classes from an imbalanced data set. It was based on 95 soil profiles classified at the family level, in 4000 hectares of land in the Honam sub-basin, Lorestan province. The issue of imbalance in soil classes was investigated in the R software using six data sets: the original soil data set and five data sets created by resampling, comprising two manual classifications and three algorithms (over-sampling, under-sampling, and the Synthetic Minority Over-Sampling Technique). The results showed that, despite the low values of overall accuracy, the geographical distribution of high-frequency soils in the digital soil maps obtained from the random forest with the original data set, and with the Synthetic Minority Over-Sampling Technique, was notably similar to the conventional soil map of the study area. Therefore, the small number of observations of the other soil classes, and the resulting incorrect training of the models, can be considered one of the main reasons for the low accuracy of the models used.
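The three resampling algorithms named in the abstract can be sketched roughly as follows. This is a simplified Python illustration using only the standard library, not the authors' R workflow. The function names, the two-feature profiles, the class labels, and the neighbour count `k` are all hypothetical, and `smote_like` is a bare-bones interpolation in the spirit of the Synthetic Minority Over-Sampling Technique rather than a full implementation.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples):
    """Group (features, label) pairs into a dict of label -> list of feature tuples."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    return groups


def random_oversample(samples, rng):
    """Duplicate randomly chosen points until every class is as large as the largest one."""
    groups = group_by_class(samples)
    target = max(len(rows) for rows in groups.values())
    balanced = []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        balanced.extend((row, label) for row in rows + extra)
    return balanced


def random_undersample(samples, rng):
    """Randomly keep only as many points per class as the smallest class has."""
    groups = group_by_class(samples)
    target = min(len(rows) for rows in groups.values())
    balanced = []
    for label, rows in groups.items():
        balanced.extend((row, label) for row in rng.sample(rows, target))
    return balanced


def smote_like(samples, rng, k=2):
    """Add synthetic points on the segment between a point and a near same-class neighbour."""
    groups = group_by_class(samples)
    target = max(len(rows) for rows in groups.values())
    balanced = list(samples)
    for label, rows in groups.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two points in the class
        for _ in range(target - len(rows)):
            i = rng.randrange(len(rows))
            base = rows[i]
            others = [rows[j] for j in range(len(rows)) if j != i]
            # Keep the k closest same-class points by squared Euclidean distance.
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()
            synthetic = tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
            balanced.append((synthetic, label))
    return balanced


# Hypothetical profiles: (elevation in m, slope in %) with a soil class label.
profiles = [
    ((1510.0, 2.0), "A"), ((1525.0, 3.5), "A"), ((1540.0, 4.0), "A"),
    ((1555.0, 2.5), "A"), ((1570.0, 5.0), "A"), ((1585.0, 3.0), "A"),
    ((1700.0, 12.0), "B"), ((1720.0, 14.0), "B"), ((1745.0, 11.0), "B"),
    ((1900.0, 25.0), "C"), ((1930.0, 28.0), "C"),
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
over = random_oversample(profiles, rng)
under = random_undersample(profiles, rng)
synth = smote_like(profiles, rng)

for name, data in (("over", over), ("under", under), ("smote-like", synth)):
    print(name, dict(Counter(label for _, label in data)))
```

Over-sampling and the SMOTE-like variant both grow the rare classes to the size of the largest class, but the former only repeats existing profiles while the latter creates new feature combinations between neighbouring profiles of the same class. Under-sampling instead discards profiles from the frequent classes, which can be costly when the whole data set is small.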

Keywords [English]

  • Machine learning
  • Oversampling
  • R software
  • Soil map
  • Undersampling