Credit scoring using machine learning algorithms
Keywords:
Machine Learning, Credit Risk, Random Forests, Lasso regression, Support Vector Machine, Logit regression
Abstract
Credit risk mitigation has attracted renewed interest since the 2007-2008 financial crisis, and financial institutions consequently
collect masses of data. This leaves risk analysts with the daunting task of adequately determining the creditworthiness of an
individual. In the search for highly efficient credit scoring models, financial institutions can adopt sophisticated machine learning
techniques. We compare machine learning classification methods by their area under the receiver operating characteristic curve
(AUROC), using 10-fold cross-validation for model selection on the German Credit data set from the UCI Machine Learning
Repository. The results show that
Lasso regression provides the best estimate of default, with an AUROC of 0.8048, followed by the Random Forest model with an AUROC
of 0.7869. The widely used logit model performed better than the linear Support Vector Machine, with AUROCs of 0.7678 and 0.7581,
respectively. Moreover, the Kolmogorov-Smirnov test shows that the other machine learning techniques outperform the widely used
logit model in how well they separate the “good” class from the “bad” class.
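
As a minimal illustration of the two evaluation criteria named in the abstract, the following sketch computes the AUROC and the two-sample Kolmogorov-Smirnov statistic from a list of model scores. The function names and the toy labels and scores are assumptions made purely for illustration. They are not the German Credit data or the models evaluated here, and the label 1 is assumed to denote the “bad” (default) class.

```python
def split_by_class(labels, scores):
    """Separate scores into those of class 1 ("bad") and class 0 ("good")."""
    bad = [s for y, s in zip(labels, scores) if y == 1]
    good = [s for y, s in zip(labels, scores) if y == 0]
    return bad, good


def auroc(labels, scores):
    """AUROC as the probability that a random "bad" case scores higher
    than a random "good" case, counting ties as one half."""
    bad, good = split_by_class(labels, scores)
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def ks_statistic(labels, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    bad, good = split_by_class(labels, scores)
    gap = 0.0
    for t in sorted(set(scores)):
        cdf_bad = sum(s <= t for s in bad) / len(bad)
        cdf_good = sum(s <= t for s in good) / len(good)
        gap = max(gap, abs(cdf_bad - cdf_good))
    return gap


# Hypothetical toy data: 1 = "bad" (default), 0 = "good".
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print("AUROC:", auroc(labels, scores))
print("KS statistic:", ks_statistic(labels, scores))
```

Both quantities measure how well the scores separate the two classes: the AUROC summarizes ranking quality over all thresholds, while the Kolmogorov-Smirnov statistic reports the separation at the single best threshold. In a cross-validated comparison, one would compute them on the held-out fold of each split and average across folds.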