On Wed, Jan 13, 2010 at 5:46 AM, yamamoto <blueskykin...@gmail.com> wrote:
> Hi,
> I am new to Python. I'd like to extract "a" tag from a website by
> using "beautifulsoup" module.
> but it doesnt work!
>
> //sample.py
>
> from BeautifulSoup import BeautifulSoup as bs
> import urllib
> url="http://www.d-addicts.com/forum/torrents.php"
> doc=urllib.urlopen(url).read()
> soup=bs(doc)
> result=soup.findAll("a")
> for i in result:
>     print i
>
>
> Traceback (most recent call last):
>   File "C:\Users\falcon\workspace\p\pyqt\ex1.py", line 8, in <module>
>     soup=bs(doc)
>   File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 1499, in __init__
>     BeautifulStoneSoup.__init__(self, *args, **kwargs)
>   File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 1230, in __init__
>     self._feed(isHTML=isHTML)
>   File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 1263, in _feed
>     self.builder.feed(markup)
>   File "C:\Python26\lib\HTMLParser.py", line 108, in feed
>     self.goahead(0)
>   File "C:\Python26\lib\HTMLParser.py", line 148, in goahead
>     k = self.parse_starttag(i)
>   File "C:\Python26\lib\HTMLParser.py", line 226, in parse_starttag
>     endpos = self.check_for_whole_start_tag(i)
>   File "C:\Python26\lib\HTMLParser.py", line 301, in check_for_whole_start_tag
>     self.error("malformed start tag")
>   File "C:\Python26\lib\HTMLParser.py", line 115, in error
>     raise HTMLParseError(message, self.getpos())
> HTMLParser.HTMLParseError: malformed start tag, at line 276, column 36
>
> any suggestion?
> thanks in advance
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
BeautifulSoup is overkill for this anyway.

#!/bin/python
from urllib import urlopen

html = urlopen("http://www.d-addicts.com/forum/torrents.php").read()
# The chunk before the first href=" is not a link, so skip it with [1:].
links = set([link.split('"')[0] for link in html.split('href="')[1:]])
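
Note that splitting on href=" collects the href of every tag (for example <link>), not only "a" tags, and it misses single-quoted or unquoted attributes. If that matters, here is a rough sketch of a stricter variant. It assumes Python 3, where the standard parser lives in html.parser and tolerates malformed markup instead of raising HTMLParseError as in the 2.6 traceback above. The names LinkCollector and extract_links and the sample markup are made up for illustration; the sketch parses a string rather than fetching the real page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags only (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html):
    """Return the set of href values found on <a> tags in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup, not the actual page from the thread.
sample = (
    '<link href="style.css">'
    '<a href="a.html">A</a>'
    '<a class=x href="b.html">B</a>'
    '<a href="a.html">A again</a>'
)
print(sorted(extract_links(sample)))
```

On the real page you would pass the decoded response body to extract_links in place of sample.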