Design and implementation of web crawler based on Python

Abstract

In the Internet era, search engines have become increasingly essential. In the age of big data, however, general-purpose search engines cannot meet users' precise needs. People attach growing importance to the efficiency of searching for specific information, and web crawler technology has emerged in response. This design first analyzes the web pages linked from a specified URL to find the URL pattern of the target information within those pages; it then uses the Beautiful Soup module or the HTML module of lxml to write functions that crawl these URLs hierarchically; finally, the information in the web pages corresponding to the URLs is classified and saved in text files. The jieba module is then used to analyze the crawled text based on the TF-IDF index and to find the high-frequency words for further analysis. Using Python, we first analyze news about the novel coronavirus: we find the high-frequency words in the news and draw a word cloud. Then, in response to the epidemic caused by the novel coronavirus, this design crawls epidemic-related information from Tencent News Network and draws an epidemic distribution map from that information. The two crawler examples show the feasibility and effectiveness of the design.

Keywords: Crawler, Internet, campus, epidemic situation
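The pipeline summarized in the abstract (hierarchical crawling of linked URLs, followed by TF-IDF scoring of the crawled text) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay runnable without third-party packages, it swaps in the standard library's `html.parser` for Beautiful Soup or lxml, and computes TF-IDF directly from the crawled pages rather than through jieba. The names `LinkAndTextParser`, `crawl`, `tf_idf`, and the `fetch` and `max_depth` parameters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collect href targets of <a> tags and the text nodes of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl from start_url, following links up to max_depth levels.

    fetch is any callable that maps a URL to HTML text, or None on failure
    (an assumption made here so the sketch can run without network access).
    Returns a dict {url: page_text}.
    """
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        pages[url] = " ".join(parser.text_parts)
        if depth < max_depth:
            for href in parser.links:
                target = urljoin(url, href)  # resolve relative links
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return pages


def tf_idf(docs):
    """docs: {name: list of tokens}. Returns {name: {token: tf-idf score}}.

    IDF is computed over the given documents only, as log(N / df).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each token once per document
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores[name] = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
    return scores
```

In practice `fetch` would wrap an HTTP client, and for Chinese news text the whitespace tokenization implied by `text.split()` would be replaced by a word segmenter such as jieba before calling `tf_idf`. The depth limit is what makes the crawl hierarchical: pages reached at level `max_depth` are saved, but their links are not followed.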