Named Entity Recognition Algorithm Design for Chinese News Text

Abstract: With the development of science and technology in China, more and more people read news online, but the sheer volume of news is overwhelming, and many readers cannot find the news they care about. It is therefore useful to identify and divide news by type, so that users can easily browse the types of news they like by category name.

This study mainly uses the Bayesian algorithm, word vector transformation, and the jieba library to identify the category names of news texts and to classify the news by type. First, a news text data set with 5000 records is identified and named. Because the data set mixes different kinds of news, naming recognition is carried out first. This step identifies 10 category names: automobile, finance, science and technology, health, sports, education, culture, military, entertainment, and fashion. Then the content of the identified news is divided according to these 10 names. Finally, the Bayesian algorithm is used to classify and score the news. 4000 samples are used in the training set, and more than 5000 samples are used in the test set. The Bayesian algorithm scores more than 86% on the training set and 79% on the test set. Overall, the naming and classification results are good, and different types of news can be separated.

The techniques used in this study include the jieba word segmentation library, word cloud drawing, stop word filtering, and a conversion step that turns the word vectors into data sets. Finally, the confusion matrix is used with the Bayesian algorithm to score and classify the training set and test set for news naming recognition and classification. After testing, the implementation runs successfully.

Keywords: Bayesian algorithm; Confusion matrix; News naming, identification and classification
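The pipeline described in the abstract (segmentation, stop word filtering, word counts, Bayesian classification, confusion matrix scoring) can be illustrated with a minimal pure-Python sketch. It is not the study's code. It assumes a multinomial Naive Bayes with Laplace smoothing, since the abstract does not say which Bayesian variant is used. The tokenizer is a whitespace placeholder standing in for jieba segmentation, and the stop word list, toy documents, and all function and class names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop word list; the study loads its own list.
STOP_WORDS = {"the", "a", "of"}


def tokenize(text):
    """Placeholder tokenizer: whitespace split plus stop word filtering.

    For Chinese news text, a segmenter such as jieba would be used here
    instead of str.split (assumption, not shown).
    """
    return [t for t in text.lower().split() if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / self.n_docs)
            for t in tokens:
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Toy data, made up for illustration only.
train_docs = [
    "the team won the match",
    "stock market shares fell",
    "the coach praised the team",
    "bank raised interest rates",
]
train_labels = ["sports", "finance", "sports", "finance"]
test_docs = ["team match coach", "market interest bank"]
test_labels = ["sports", "finance"]

model = NaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
matrix = confusion_matrix(test_labels, predictions, model.classes)
print(predictions, matrix, accuracy(matrix))
```

With 10 news categories, the same confusion matrix would be 10 by 10, and the off-diagonal cells would show which categories are mistaken for each other.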