A XGBOOST RISK MODEL
VIA FEATURE SELECTION AND BAYESIAN HYPER-PARAMETER OPTIMIZATION
Yan Wang1 and Xuelei Sherry Ni2
1Graduate College, Kennesaw State University, Kennesaw, USA
2Department of Statistics and Analytical Sciences, Kennesaw State University, Kennesaw, USA
ABSTRACT
This paper explores models based on the
extreme gradient boosting (XGBoost) approach for business risk classification.
Feature selection (FS) algorithms and hyper-parameter optimization are
considered simultaneously during model training. The five most commonly used FS
methods, namely weight by Gini, weight by Chi-square, hierarchical variable
clustering, weight by correlation, and weight by information, are applied to
alleviate the effect of redundant features. Two hyper-parameter optimization
approaches, random search (RS) and the Bayesian tree-structured Parzen estimator
(TPE), are applied in XGBoost. The effects of the different FS and hyper-parameter
optimization methods on model performance are investigated using the Wilcoxon
signed-rank test. The performance of XGBoost is compared with that of the
traditionally used logistic regression (LR) model in terms of classification accuracy,
area under the curve (AUC), recall, and F1 score obtained from 10-fold
cross-validation. Results show that hierarchical clustering is the optimal FS
method for LR, while weight by Chi-square achieves the best performance in
XGBoost. Both TPE and RS optimization in XGBoost outperform LR significantly.
TPE optimization is superior to RS, since it yields a
significantly higher accuracy and marginally higher AUC, recall, and F1 score.
Furthermore, XGBoost with TPE tuning shows lower variability than XGBoost with RS
tuning. Finally, the ranking of feature importance based on XGBoost enhances
model interpretation. Therefore, XGBoost with Bayesian TPE hyper-parameter
optimization serves as an operative yet powerful approach for business risk
modeling.
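As an illustration of the statistical comparison mentioned above, the following is a minimal sketch of a paired Wilcoxon signed-rank test applied to per-fold scores from 10-fold cross-validation. It is not the authors' code. The function names (`signed_ranks`, `wilcoxon_signed_rank`) and the per-fold accuracy values are hypothetical and chosen only for illustration; they are not results from the paper. The sketch assumes zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value is computed exactly by enumerating all sign assignments, which is feasible for 10 folds.

```python
from itertools import product


def signed_ranks(diffs):
    """Rank non-zero differences by absolute value (average ranks for ties)
    and attach the sign of each difference to its rank."""
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values share the average.
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    return [rank_of[abs(d)] if d > 0 else -rank_of[abs(d)] for d in nonzero]


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, exact two-sided p-value) for paired samples x and y."""
    # Rounding removes floating-point noise so equal differences tie exactly.
    diffs = [round(a - b, 6) for a, b in zip(x, y)]
    ranks = signed_ranks(diffs)
    w_plus = sum(r for r in ranks if r > 0)
    w_minus = -sum(r for r in ranks if r < 0)
    center = (w_plus + w_minus) / 2
    observed = abs(w_plus - center)
    abs_ranks = [abs(r) for r in ranks]
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(abs_ranks)):
        w = sum(r for r, s in zip(abs_ranks, signs) if s)
        if abs(w - center) >= observed:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** len(abs_ranks)


# Hypothetical per-fold accuracies for two tuning strategies (illustrative only).
tpe_acc = [0.82, 0.80, 0.81, 0.83, 0.79, 0.82, 0.81, 0.80, 0.84, 0.82]
rs_acc = [0.80, 0.79, 0.81, 0.80, 0.78, 0.81, 0.79, 0.80, 0.81, 0.80]

w_plus, w_minus, p_value = wilcoxon_signed_rank(tpe_acc, rs_acc)
print(f"W+ = {w_plus}, W- = {w_minus}, two-sided p = {p_value:.4f}")
```

A paired test is used here because both tuning strategies would be scored on the same folds, so the fold-wise differences, rather than the raw scores, carry the comparison. In practice a library routine such as `scipy.stats.wilcoxon` would normally be preferred over a hand-written version.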
KEYWORDS
Extreme gradient boosting; XGBoost; feature selection; Bayesian tree-structured Parzen estimator; risk modeling
Original Source URL: http://aircconline.com/ijdms/V11N1/11119ijdms01.pdf