Rama Vadakattu wrote:
> Is there any python library to solve the below problem?
>
> For the below URL:
> --------------------------
> http://tinyurl.com/dzcwbg
>
> Summarized text is:
> ---------------------------
> By Roy Mark With sales plummeting and its smart phones failing to woo
> new customers, Sony Ericsson follows its warning that first quarter
> sales will be disappointing with the announcement that Najmi Jarwala,
> president of Sony Ericsson USA and head of ...
>
> ~~~~~~~~~~~~~~
> Usually summarized text is a 2 to 3 line description of the URL which
> we usually obtain by fetching that html page, examining the content
> and figuring out short description from that html markup.
> ~~~~~~~~~~~~~
>
> Are there any python libraries which give summarized text for a given
> url?

BeautifulSoup makes it easy to access parts of a web page.

    import urllib2
    from BeautifulSoup import BeautifulSoup

    # Fetch the page and parse it (Python 2, BeautifulSoup 3).
    data = urllib2.urlopen("http://tinyurl.com/dzcwbg").read()
    bs = BeautifulSoup(data)
    # Print the content attribute of the <meta name="description"> tag.
    print bs.find("meta", dict(name="description"))["content"]

> It is ok even if the library just gives initial two lines of text
> from the given URL instead of summarization.

The problem is how you identify the summary. Different web sites will put it in different places using different markup.

Peter
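
Since different sites put the summary in different places, one possible approach is to try the meta description first and fall back to the first non-empty paragraph, which roughly matches the "initial two lines of text" request. The following is a minimal Python 3 sketch of that heuristic using only the standard library. The names SummaryParser, summarize_html, fetch_summary and the max_chars parameter are made up for illustration and do not come from any library, and the fallback is a guess at what counts as a summary, not real summarization.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class SummaryParser(HTMLParser):
    """Collect the meta description and the first non-empty <p> text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.description = None
        self.first_paragraph = None
        self._in_p = False
        self._chunks = []

    def finish_paragraph(self):
        # Close the paragraph being collected, keeping it only if it has text.
        if self._in_p:
            self._in_p = False
            text = " ".join("".join(self._chunks).split())
            if text and self.first_paragraph is None:
                self.first_paragraph = text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "description" and self.description is None:
                content = (attrs.get("content") or "").strip()
                if content:
                    self.description = content
        elif tag == "p":
            # HTML allows </p> to be omitted, so a new <p> ends the old one.
            self.finish_paragraph()
            if self.first_paragraph is None:
                self._in_p = True
                self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.finish_paragraph()

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def summarize_html(html, max_chars=200):
    """Return the meta description, else the first paragraph, else ''."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    parser.finish_paragraph()  # handles a trailing <p> that was never closed
    text = parser.description or parser.first_paragraph or ""
    if len(text) > max_chars:
        text = text[:max_chars].rstrip() + " ..."
    return text


def fetch_summary(url, max_chars=200):
    """Download a page and summarize it (assumes the URL returns HTML)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return summarize_html(html, max_chars)
```

Calling fetch_summary("http://tinyurl.com/dzcwbg") would apply this to the URL from the question, assuming the link still resolves. Pages that keep their blurb somewhere else (an Open Graph tag, a particular div) would need their own rule added to the parser.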