Beijing Institute of Technology, Zhuhai: Undergraduate Thesis, Class of 2020
Research on the Model of Corporate Dishonesty Recognition Based on Data Mining

Abstract
How to detect whether an enterprise has broken the law has become a challenge in the era of big data. This paper uses data mining methods to predict whether an enterprise is dishonest. First, we use Python's pandas package to compute the missing-value rate of every indicator, remove the indicators whose missing rate exceeds 30%, and then remove a further 19 indicators that have no bearing on enterprise dishonesty, leaving 12 indicators. The remaining missing values are then filled by k-nearest-neighbour (KNN) imputation using the DMwR package in R. Next, we visualize four of the indicators: enterprise type, registration authority, enterprise status, and jurisdiction authority. Finally, decision tree, random forest, and gradient boosting decision tree (GBDT) models are built to identify dishonest enterprises, and each model is evaluated with accuracy, recall, the confusion matrix, and the ROC curve. The decision tree, random forest, and GBDT models reach prediction accuracies of 90.9%, 92%, and 92.57%, respectively. However, their AUC values lie only between 0.51 and 0.61, which shows that their predictions are not stable enough. We therefore also try a multilayer perceptron (MLP) model, which achieves a prediction accuracy of 92% with an AUC that is stable at 0.89.

Keywords: Corporate discredit; Decision tree; Data mining; Receiver Operating Characteristic
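The missing-rate screening step described in the abstract can be sketched with pandas as follows. This is a minimal illustration, not the thesis's own code: the helper name `drop_sparse_columns` and the toy column names are hypothetical, and only the 30% threshold comes from the text. A column is dropped when its share of missing values is strictly greater than the threshold.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Drop every column whose fraction of missing values exceeds max_missing."""
    missing_rate = df.isna().mean()  # per-column share of NaN/None values
    keep = missing_rate[missing_rate <= max_missing].index
    return df[keep]


# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "reg_capital": [100.0, None, 250.0, 80.0],   # 25% missing -> kept
    "employees": [None, None, None, 12.0],       # 75% missing -> dropped
    "status": ["active", "active", None, "revoked"],  # 25% missing -> kept
})

print(df.isna().mean())
clean = drop_sparse_columns(df)
print(list(clean.columns))  # ['reg_capital', 'status']
```

The second screening step (removing indicators with no bearing on dishonesty) depends on domain judgement and is not shown here.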
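The thesis fills missing values with the DMwR package in R. As a rough Python illustration of the same KNN principle, the sketch below fills each gap with the mean of the k nearest complete rows, measuring Euclidean distance only on the columns the incomplete row actually has. It is an assumption-laden sketch, not a port of DMwR: the function name `knn_impute` is hypothetical, and it omits refinements such as feature scaling that a full implementation would normally apply.

```python
import numpy as np


def knn_impute(X, k: int = 3) -> np.ndarray:
    """Fill NaNs in each row with the mean of its k nearest complete rows.

    Distances are computed only on the columns observed in the row being filled.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]  # candidate neighbours: rows with no gaps
    if len(complete) == 0:
        raise ValueError("need at least one complete row")
    k = min(k, len(complete))
    for i, row in enumerate(X):
        gaps = np.isnan(row)
        if not gaps.any():
            continue
        observed = ~gaps
        # Euclidean distance to every complete row, using observed columns only
        d = np.sqrt(((complete[:, observed] - row[observed]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(d)[:k]]
        filled[i, gaps] = nearest[:, gaps].mean(axis=0)
    return filled


# Toy example: the last row is missing its second value.
X = np.array([[1.0, 2.0], [2.0, 3.0], [10.0, 20.0], [1.5, np.nan]])
result = knn_impute(X, k=2)
print(result[3])  # [1.5 2.5]
```

The two nearest complete rows to `[1.5, ?]` are `[1.0, 2.0]` and `[2.0, 3.0]`, so the gap is filled with their average, 2.5, while the distant row `[10.0, 20.0]` is ignored.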
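The abstract compares a decision tree, a random forest, a GBDT, and an MLP by accuracy, recall, and AUC, but it does not say which library was used for modelling. The sketch below assumes scikit-learn and uses synthetic, deliberately imbalanced data (about 8% positives) in place of the thesis's 12 enterprise indicators; the hyperparameters are arbitrary choices, and the printed numbers will not match the thesis. Hard predictions feed accuracy and recall, while the positive-class probability feeds the AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 12 features, roughly 8% of samples in class 1 ("dishonest").
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.92], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
    # MLPs are sensitive to feature scale, so standardize inside the pipeline.
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                       random_state=0)),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)               # hard labels for accuracy and recall
    y_score = model.predict_proba(X_te)[:, 1]  # positive-class probability for AUC
    results[name] = {
        "accuracy": accuracy_score(y_te, y_pred),
        "recall": recall_score(y_te, y_pred),
        "auc": roc_auc_score(y_te, y_score),
    }

for name, m in results.items():
    print(f"{name:14s} acc={m['accuracy']:.3f} "
          f"recall={m['recall']:.3f} auc={m['auc']:.3f}")
```

Stratified splitting keeps the rare positive class represented in the test set, which matters when recall and AUC are the metrics of interest.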
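The gap between the tree models' accuracy (above 90%) and their AUC (0.51 to 0.61) may be easier to see with the metrics computed by hand. When dishonest firms are rare, a model can score high accuracy simply by labelling almost every firm honest, while its ranking of firms is close to chance. The pure-Python sketch below uses invented toy labels and scores, not data from the thesis; the helper names are hypothetical. AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half.

```python
def binary_confusion(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = dishonest."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    tn, fp, fn, tp = binary_confusion(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def recall(y_true, y_pred):
    tn, fp, fn, tp = binary_confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true, scores):
    """P(random positive outscores random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy example: 10 firms, only the last one is dishonest.
y_true = [0] * 9 + [1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.55]
y_pred = [1 if s >= 0.95 else 0 for s in scores]  # strict threshold: nobody flagged

print(binary_confusion(y_true, y_pred))   # (9, 0, 1, 0)
print(accuracy(y_true, y_pred))           # 0.9
print(recall(y_true, y_pred))             # 0.0
print(round(roc_auc(y_true, scores), 3))  # 0.556
```

Here accuracy is 90% even though the single dishonest firm is missed (recall 0) and its score beats only 5 of the 9 honest firms, giving an AUC of about 0.56. This is why the abstract reports recall, the confusion matrix, and AUC alongside accuracy.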