Rama Vadakattu wrote:

> Is there any python library to solve the below problem?
> 
> For the below URL:
> --------------------------
> http://tinyurl.com/dzcwbg
> 
> Summarized text is :
> ---------------------------
> By Roy Mark With sales plummeting and its smart phones failing to woo
> new customers, Sony Ericsson follows its warning that first quarter
> sales will be disappointing with the announcement that Najmi Jarwala,
> president of Sony Ericsson USA and head of ...
> 
> ~~~~~~~~~~~~~~
> Usually summarized text is a 2 to 3 line description of the URL which
> we usually obtain by fetching that html page, examining the content
> and figuring out a short description from that html markup.
> ~~~~~~~~~~~~~
> 
> Are there any python libraries which give summarized text for a given
> url ?

BeautifulSoup makes it easy to access parts of a web page. For this page
the summary is in the content attribute of the "description" meta tag:

import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the page and parse it.
data = urllib2.urlopen("http://tinyurl.com/dzcwbg").read()
bs = BeautifulSoup(data)
# Print the content of the <meta name="description"> tag.
print bs.find("meta", dict(name="description"))["content"]

> It is ok even if the library just gives the initial two lines of text
> from the given URL instead of summarization.

The problem is how you identify the summary. Different web sites will put it
in different places using different markup.
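
One way to cope with that is to try several likely places in order and
fall back to the first paragraph of body text. The following is only a
rough sketch, not a real summarizer: it uses the standard library's
html.parser (Python 3) instead of BeautifulSoup, and the names
SummaryParser, summarize and max_chars are made up for this example. The
choice of "description" and then "og:description" as the preferred sources
is an assumption about common markup, not something every site follows.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collect candidate summaries: <meta> descriptions and the first <p>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}               # e.g. {"description": "..."}
        self.first_paragraph = None  # text of the first non-empty <p>
        self._in_p = False
        self._buf = []

    def _finish_paragraph(self):
        # Collapse whitespace; keep the text only if it is non-empty.
        self._in_p = False
        text = " ".join("".join(self._buf).split())
        if text and self.first_paragraph is None:
            self.first_paragraph = text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Assumption: summaries live under name=... or property=...
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content:
                self.meta.setdefault(key.lower(), content.strip())
        elif tag == "p":
            if self._in_p:           # previous <p> was never closed
                self._finish_paragraph()
            if self.first_paragraph is None:
                self._in_p = True
                self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._finish_paragraph()

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def summarize(html, max_chars=200):
    """Return a short description for an HTML string, or "" if none found."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    # Prefer an explicit description supplied by the page author.
    for key in ("description", "og:description"):
        if parser.meta.get(key):
            return parser.meta[key]
    # Otherwise fall back to the start of the first paragraph.
    text = parser.first_paragraph or ""
    if len(text) > max_chars:
        text = text[:max_chars].rstrip() + " ..."
    return text
```

summarize() takes the page source as a string, so you would fetch and
decode the page yourself first (for instance with urllib.request.urlopen
in Python 3). Pages that build their content with JavaScript or do not use
<p> tags will still come back empty.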

Peter
--
http://mail.python.org/mailman/listinfo/python-list
