A Chinese Polysemous Word Disambiguation Method Based on the AdaBoost.MH Algorithm

Abstract

Word sense disambiguation (WSD) plays an important role in many areas of natural language processing, such as machine translation, information retrieval, sentence analysis, and speech recognition. Research on WSD therefore has great theoretical and practical significance. The main work of this dissertation is to study supervised learning algorithms that learn WSD knowledge from multiple kinds of resources, based on a large sense-tagged Chinese corpus.

An approach to Chinese word sense disambiguation based on the supervised AdaBoost.MH learning algorithm is presented. AdaBoost.MH is employed to learn WSD knowledge from multiple kinds of resources. It boosts the accuracy of weak decision-stump rules by repeatedly calling a weak learner and combining the resulting rules into a single, more accurate rule. A simple stopping criterion is also presented, in view of the efficiency of learning and the utility of the system.

For Chinese WSD, in order to extract more contextual information, we introduce a new WSD knowledge source, semantic categorization, alongside two classical knowledge sources: the part of speech of neighboring words and local collocations. Experimental results show that semantic categorization knowledge is useful for improving both the learning efficiency of the algorithm and the accuracy of disambiguation.

Because building a broad-coverage semantically annotated corpus is difficult and complex, an approach that uses WWW search engines to automatically obtain an annotated corpus for Chinese WSD is also presented.

In open tests, AdaBoost.MH achieves high disambiguation accuracy: 85.75% for 6 typical polysemous Chinese words and 75.84% for 20 polysemous words from the SENSEVAL-3 Chinese corpus, respectively.

Key Words: Natural Language Processing; Word sense disambiguation; AdaBoost.MH algorithm; Multiple knowledge sources
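The boosting procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes binary context features (a feature string is either present in a context or not), one decision stump per round with real-valued per-sense predictions, a smoothing constant of 1/(n·k), and a fixed number of rounds in place of the stopping criterion, which the abstract does not specify. The function names and the feature strings are hypothetical.

```python
import math


def train_adaboost_mh(X, y, n_labels, n_rounds=10):
    """Train AdaBoost.MH with one-feature decision stumps.

    X: list of sets of active feature strings (one set per context).
    y: list of integer sense labels in range(n_labels).
    Returns a list of stumps (feature, c) where c[j][l] is the score
    given to sense l when the feature is absent (j=0) or present (j=1).
    """
    n = len(X)
    features = sorted(set().union(*X))
    eps = 1.0 / (n * n_labels)  # smoothing constant (an assumption)
    # D[i][l]: weight on the (example, sense) pair, initially uniform.
    D = [[1.0 / (n * n_labels)] * n_labels for _ in range(n)]
    stumps = []
    for _ in range(n_rounds):  # fixed round count stands in for a stopping rule
        best = None
        for f in features:
            # W[j][l][b]: weight in block j for sense l, where b=1 if the
            # sense is the correct one for the example and b=0 otherwise.
            W = [[[0.0, 0.0] for _ in range(n_labels)] for _ in range(2)]
            for i in range(n):
                j = 1 if f in X[i] else 0
                for l in range(n_labels):
                    W[j][l][1 if y[i] == l else 0] += D[i][l]
            # Normalization factor Z; the stump with the smallest Z is chosen.
            Z = 2.0 * sum(math.sqrt(W[j][l][0] * W[j][l][1])
                          for j in range(2) for l in range(n_labels))
            if best is None or Z < best[0]:
                c = [[0.5 * math.log((W[j][l][1] + eps) / (W[j][l][0] + eps))
                      for l in range(n_labels)] for j in range(2)]
                best = (Z, f, c)
        _, f, c = best
        stumps.append((f, c))
        # Reweight: pairs the stump scores wrongly gain weight.
        total = 0.0
        for i in range(n):
            j = 1 if f in X[i] else 0
            for l in range(n_labels):
                sign = 1.0 if y[i] == l else -1.0
                D[i][l] *= math.exp(-sign * c[j][l])
                total += D[i][l]
        for i in range(n):
            for l in range(n_labels):
                D[i][l] /= total
    return stumps


def predict(stumps, x, n_labels):
    """Return the sense with the highest summed stump score for context x."""
    scores = [0.0] * n_labels
    for f, c in stumps:
        j = 1 if f in x else 0
        for l in range(n_labels):
            scores[l] += c[j][l]
    return max(range(n_labels), key=scores.__getitem__)


# Toy contexts: hypothetical collocation and part-of-speech features.
X_toy = [{"w-1=river"}, {"w-1=river", "pos+1=n"},
         {"w-1=money"}, {"w-1=money", "pos+1=n"}]
y_toy = [0, 0, 1, 1]
model = train_adaboost_mh(X_toy, y_toy, n_labels=2, n_rounds=5)
print(predict(model, {"w-1=river"}, 2), predict(model, {"w-1=money"}, 2))
```

In this sketch each knowledge source (part of speech of neighboring words, local collocations, semantic category) would simply contribute its own feature strings to the set for a context, so adding a source does not change the learner itself.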