Fatemeh Ebrahimi Meymand; Hasan Ramezanpour; Nafiseh Yaghmaeian; Kamran Eftekhari
Abstract
In recent years, digital soil mapping (DSM) based on machine learning algorithms has become a widespread way of preparing soil maps: soil classes are predicted by modeling the relationships between the classes and environmental variables. One challenge of this method is the imbalanced distribution of soils in the landscape, which leads to overfitting and underfitting of classes and, as a result, reduces the accuracy of many of the models used. This study evaluated the ability of two machine learning algorithms, random forests and support vector machines, to digitally map soil classes from an imbalanced data set. The study used 95 soil profiles classified at the family level, covering 4000 hectares of land in the Honam sub-basin, Lorestan province. The class imbalance was investigated using six data sets: the original soil data set and five data sets created with resampling approaches in the R software, namely two manual classifications and three techniques (over-sampling, under-sampling, and the Synthetic Minority Over-sampling Technique, SMOTE). The results showed that, despite low overall accuracy values, the geographical distribution of the most frequent soils in the study area on the digital soil maps produced by random forest from the original data set and from the SMOTE data set agreed significantly with the conventional soil map of the study area. Therefore, the small number of observations of the other soil classes, and the resulting inadequate training of the models, can be considered one of the main reasons for the low accuracy of the models used.
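The abstract names the resampling techniques but does not describe their implementation, and the study itself worked in R. As a rough illustration only, the Python sketch below shows the general idea behind random over-sampling, random under-sampling, and SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The function names, the value of k, and the toy covariates are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class feature
    tuples by interpolating towards one of the k nearest minority neighbours."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )
        neighbour = rng.choice(others[:k])
        u = rng.random()  # random position along the segment x -> neighbour
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def random_oversample(samples, target_size, seed=0):
    """Duplicate randomly chosen samples until the class has target_size members."""
    rng = random.Random(seed)
    extra = [rng.choice(samples) for _ in range(target_size - len(samples))]
    return list(samples) + extra


def random_undersample(samples, target_size, seed=0):
    """Keep a random subset of target_size samples (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(list(samples), target_size)


def balance_with_smote(data_by_class, k=5, seed=0):
    """Top up every class to the size of the largest class using SMOTE."""
    target = max(len(v) for v in data_by_class.values())
    return {
        label: list(samples) + smote(samples, target - len(samples), k, seed)
        for label, samples in data_by_class.items()
    }


# Hypothetical covariates per profile, e.g. (elevation_km, slope_fraction).
profiles = {
    "family_A": [(1.2, 0.10), (1.3, 0.12), (1.1, 0.08), (1.25, 0.15), (1.4, 0.11)],
    "family_B": [(2.0, 0.30), (2.1, 0.35)],
}
balanced = balance_with_smote(profiles, k=3)
print({label: len(v) for label, v in balanced.items()})  # {'family_A': 5, 'family_B': 5}
```

Because SMOTE only interpolates between existing observations, a class with a single profile cannot be augmented this way, which is consistent with the abstract's point that very rare soil classes remain hard to learn.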
Serveh Darvand; Hassan Khosravi; Hamidreza Keshtkar; Gholamreza Zehtabian; Omid Rahmati
Abstract
The purpose of this study was to compare machine learning models, including Support Vector Machine, Classification and Regression Tree, Random Forest, and Multivariate Discriminant Analysis, for prioritizing areas susceptible to dust production. To identify dust days, hourly meteorological data for Alborz and Qazvin provinces and satellite images of the same days over the period 2000 to 2019 were used. A total of 420 dust source points were identified and their distribution map was prepared. Maps of the factors affecting dust occurrence were prepared, including land use, soil orders, slope, slope aspect, elevation, vegetation, topographic surface moisture, topographic surface ratio, and geology. Using these models, the importance of each factor was determined and prioritization maps of dust source areas were prepared. The models were evaluated using the ROC curve. According to the results, elevation was the most important factor in all models. The modeling results also showed that the Random Forest (RF) and Multivariate Discriminant Analysis (MDA) models achieved the best values of accuracy (0.96), precision (0.94), Probability of Detection (POD) (0.98), and False Alarm Ratio (FAR) (0.051) compared with the other models. The RF and MDA models performed best, followed by the Support Vector Machine (SVM) and Classification and Regression Tree (CART) models, respectively. In the evaluation using the Receiver Operating Characteristic (ROC) curve, the RF model was selected as the best model.
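The abstract reports accuracy, precision, POD, and FAR without giving their formulas. A minimal sketch, assuming the standard 2×2 contingency-table definitions used in forecast verification, is shown below; the class and function names and the toy labels are hypothetical. Under this definition FAR equals 1 − precision. Some studies instead report the false alarm rate FP/(FP + TN), so the paper's exact definition may differ.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    tp: int  # hits: observed dust source, predicted as source
    fp: int  # false alarms: predicted as source, not observed
    fn: int  # misses: observed source, not predicted
    tn: int  # correct negatives


def contingency(observed, predicted):
    """Build a 2x2 contingency table from paired 0/1 (or boolean) labels."""
    pairs = list(zip(observed, predicted))
    return Contingency(
        tp=sum(1 for o, p in pairs if o and p),
        fp=sum(1 for o, p in pairs if not o and p),
        fn=sum(1 for o, p in pairs if o and not p),
        tn=sum(1 for o, p in pairs if not o and not p),
    )


def verification_scores(c):
    """Accuracy, precision, probability of detection and false alarm ratio.
    Raises ZeroDivisionError if there are no predicted or no observed positives."""
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
        "precision": c.tp / (c.tp + c.fp),
        "pod": c.tp / (c.tp + c.fn),  # also called hit rate or recall
        "far": c.fp / (c.tp + c.fp),  # false alarm ratio = 1 - precision
    }


# Toy labels: 1 = dust source, 0 = not a source.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
print(verification_scores(contingency(observed, predicted)))
# {'accuracy': 0.625, 'precision': 0.6, 'pod': 0.75, 'far': 0.4}
```

Higher accuracy, precision, and POD are better, while a lower FAR is better, which is why the best model is the one with the highest values of the first three and the lowest FAR.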
Zahra Barati; Ebrahim Omidvar; Ataollah Shirzadi
Abstract
Landslide susceptibility mapping is considered the first important step in landslide risk assessment. The main purpose of this study is to compare the performance of a machine learning algorithm (a logistic model tree) and a statistical model (a logistic regression) for landslide susceptibility modeling in the Sarkhoon watershed, Chaharmahal and Bakhtiari province. First, a landslide inventory map containing a total of 98 landslide locations was constructed from historical landslide records and extensive field surveys. In addition, a total of 100 non-landslide locations were identified to build the database. The landslide and non-landslide locations were randomly divided into two groups in a 70/30 ratio for modeling and validation. Twenty conditioning factors were selected based on a literature review and the geo-environmental properties of the study area. The logistic model tree (LMT) and logistic regression (LR) models were then applied to identify the influence of the conditioning factors on landslide occurrence. Finally, the performance of the models in landslide susceptibility mapping was assessed using the area under the receiver operating characteristic curve (AUC). The results showed that the LR model (AUC = 0.797) outperformed the LMT model (AUC = 0.740) in the study area. Although both models were reliable tools for spatial prediction of landslide susceptibility, the LR model was more accurate and can be proposed as a tool for better management of landslide-prone areas in the study area.
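The abstract does not describe how the 70/30 split or the AUC were computed. The sketch below assumes a simple random split applied to the landslide and non-landslide groups separately, and computes AUC with the Mann-Whitney pairwise formulation, that is, the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen non-landslide location. The function names, seeds, location IDs, and scores are hypothetical and are not the authors' data or code.

```python
import random


def split_70_30(points, seed=0):
    """Randomly split a list into ~70% training and ~30% validation items."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation:
    the fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical location IDs: 98 landslides and 100 non-landslides,
# each group split separately so both appear in training and validation.
landslides = [("L", i) for i in range(98)]
non_landslides = [("N", i) for i in range(100)]
train_l, valid_l = split_70_30(landslides, seed=1)
train_n, valid_n = split_70_30(non_landslides, seed=2)
print(len(train_l) + len(train_n), len(valid_l) + len(valid_n))  # 139 59

# Toy validation scores from some susceptibility model (made-up numbers).
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop costs O(n·m) comparisons, which is adequate for a validation set of a few dozen points; a rank-based formula gives the same value and scales better for large sets. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which frames the reported values of 0.797 (LR) and 0.740 (LMT).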