Design and Implementation of a Web Crawler Based on Python


Published May 11, 11:06



Abstract

Since the advent of the Internet era, search engines have become increasingly essential. In the age of big data, general-purpose search engines cannot meet users' precise needs; people care about how efficiently specific information can be found, and web crawler technology has emerged in response. This design first analyzes the web pages linked from a specified URL to find the URL pattern of the target information in those pages. It then uses the Beautiful Soup module or the html module of lxml to write functions that crawl these URLs level by level. Finally, the information in the pages corresponding to those URLs is classified and saved to text files. The jieba module is then used to analyze the crawled text with the TF-IDF index and to identify high-frequency words for further analysis. Using Python, news about the novel coronavirus is analyzed first: the high-frequency words in the news are identified and a word cloud is drawn. Then, in response to the epidemic caused by the novel coronavirus, the design crawls epidemic-related information from Tencent News and draws an epidemic distribution map from it. The two crawler examples demonstrate the feasibility and effectiveness of the design.

Keywords: crawler, Internet, campus, epidemic situation
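The two core steps named in the abstract, collecting the URLs found on a page and ranking words by TF-IDF, can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: it swaps in the standard library's html.parser for Beautiful Soup or lxml so that it runs without third-party packages, it computes TF-IDF by hand instead of calling jieba, and it assumes the text has already been split into tokens. The names LinkParser, extract_links, and tf_idf are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in one page; a crawler would repeat this per level."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


# Hypothetical sample data, standing in for a fetched page and tokenized news text.
page = '<a href="/news/1.html">A</a> <a href="http://example.com/2">B</a>'
links = extract_links(page, "http://example.com/index.html")
scores = tf_idf([["epidemic", "news", "campus"], ["epidemic", "map"]])
```

With this unsmoothed formula, a term that occurs in every document gets a score of zero, so only words that distinguish one text from the others rank highly. Library implementations usually apply different weighting and a precomputed IDF table, so their scores will not match this sketch.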
