Abstract

Preliminary research on web page data mining began as early as the early 1980s. With the rapid growth of the Internet and the arrival of the big-data era, the challenge has become finding potentially valuable information hidden in the "iceberg" of massive data. Data mining technology plays a role here that cannot be ignored and has become one of the most active research hotspots. In recent years the technology has advanced rapidly and achieved notable results in industries such as engineering, medicine and science, and its research value has grown accordingly.

A traditional web crawler, also known as a web spider, is a program that downloads web pages in bulk. It usually works by following the hyperlinks in each page outward, gathering information about pages across the entire Internet. A Python-based web crawler needs to analyze the links between the nodes (pages) of a website in order to build a node-relationship graph of the whole site. Python is one of the most commonly used languages for crawler development: its rich open-source libraries and good code encapsulation give it particular advantages in crawler design, which is why Python crawlers have become a trend.

This graduation design was carried out in this context. The program is written mainly in Python: its rich libraries are used to parse the XML structure of web pages, and regular expressions are used to filter the extracted data. Finally, the data is stored in a MySQL database for subsequent processing. This work aims to make it easier to mine information on postgraduate entrance examination adjustment (admission transfer) and to give applicants a more convenient and practical way to look up that information after the examination.

Key words: Data mining; postgraduate; Python; crawler
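The pipeline described above (follow hyperlinks across one site, record the page-to-page link graph, filter content with regular expressions, then store the results) can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the thesis program: it uses only the Python standard library (`html.parser` for link extraction, `re` for filtering), and it substitutes `sqlite3` for MySQL so it runs without a database server (with a MySQL driver such as PyMySQL the SQL would use `%s` placeholders instead of `?`). The function names `fetch_page`, `crawl` and `save`, the `notice` table and the regex pattern are all hypothetical, and the sketch ignores robots.txt, rate limiting and retries.

```python
import re
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags while parsing an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Hypothetical helper: download a page as text (no retries or robots.txt)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, pattern, get_page=fetch_page, max_pages=50):
    """Breadth-first crawl restricted to the host of start_url.

    Returns (graph, matches): graph maps each visited page to the same-site
    pages it links to; matches holds (page, text) pairs captured by `pattern`,
    which is assumed to contain exactly one capturing group.
    """
    host = urlparse(start_url).netloc
    graph, matches = {}, []
    queue, seen = deque([start_url]), {start_url}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = get_page(url)
        except (OSError, ValueError):  # network errors or malformed URLs
            continue
        parser = LinkParser()
        parser.feed(html)
        targets = []
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc != host or target in targets:
                continue  # stay on one site; skip repeated edges
            targets.append(target)
            if target not in seen:
                seen.add(target)
                queue.append(target)
        graph[url] = targets  # edges of the site's node-relationship graph
        matches.extend((url, text) for text in re.findall(pattern, html))
    return graph, matches


def save(matches, db_path=":memory:"):
    """Store the filtered records; sqlite3 stands in for MySQL here."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS notice (url TEXT, title TEXT)")
    conn.executemany("INSERT INTO notice VALUES (?, ?)", matches)
    conn.commit()
    return conn
```

A breadth-first queue with a `seen` set visits each page once, and restricting links to the starting host keeps the graph to a single website rather than the whole Internet. Passing `get_page` as a parameter lets the crawler be exercised on an in-memory dictionary of pages before pointing it at a real site.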