Re: Trying to understand html.parser.HTMLParser

2011-05-19 Thread Ethan Furman
Andrew Berg wrote: ElementTree doesn't seem to have been updated in a long time, so I'll assume it won't work with Python 3. I don't know how to use it, but you'll find ElementTree as xml.etree in Python 3. ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
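
As a rough illustration of the point above, here is a minimal sketch of using the xml.etree package that ships with Python 3. The sample markup and the variable names are made up for this example. Note that ElementTree expects well-formed XML, so it is an assumption here that the input parses cleanly; real-world HTML often will not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (not from the original thread).
doc = ET.fromstring(
    '<root><a href="x.html">X</a><a href="y.html">Y</a></root>'
)

# iter('a') walks every <a> element in document order;
# get('href') returns the attribute value, or None if it is absent.
hrefs = [a.get('href') for a in doc.iter('a')]
print(hrefs)
```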

Re: Trying to understand html.parser.HTMLParser

2011-05-19 Thread Karim
On 05/19/2011 11:35 PM, Andrew Berg wrote: On 2011.05.16 02:26 AM, Karim wrote: Use a regular expression for bad HTML, or BeautifulSoup (google it); below is an example to extract all HTML links: Actually, using regex wasn't so bad: import re import urllib.request url = 'http://x264.nl/x264/?dir=./6

Re: Trying to understand html.parser.HTMLParser

2011-05-19 Thread Andrew Berg
On 2011.05.16 02:26 AM, Karim wrote: > Use a regular expression for bad HTML, or BeautifulSoup (google it); below > is an example to extract all HTML links: Actually, using regex wasn't so bad: > import re > import urllib.request > > url = 'http://x264.nl/x264/?dir=./64bit/8bit_depth' > page = str(urllib

Re: Trying to understand html.parser.HTMLParser

2011-05-18 Thread Stefan Behnel
Andrew Berg, 19.05.2011 02:39: On 2011.05.18 03:30 AM, Stefan Behnel wrote: Well, it pretty clearly states that on the PyPI page, but I also added it to the project home page now. lxml 2.3 works with any CPython version from 2.3 to 3.2. Thank you. I never would've looked at PyPI for info on a p

Re: Trying to understand html.parser.HTMLParser

2011-05-18 Thread Andrew Berg
On 2011.05.18 03:30 AM, Stefan Behnel wrote: > Well, it pretty clearly states that on the PyPI page, but I also added it > to the project home page now. lxml 2.3 works with any CPython version from > 2.3 to 3.2. Thank you. I never would've looked at PyPI for info on a project that has its own sit

Re: Trying to understand html.parser.HTMLParser

2011-05-18 Thread Stefan Behnel
Andrew Berg, 17.05.2011 03:05: lxml looks promising, but it doesn't say anywhere whether it'll work on Python 3 or not Well, it pretty clearly states that on the PyPI page, but I also added it to the project home page now. lxml 2.3 works with any CPython version from 2.3 to 3.2. Stefan --

Re: Trying to understand html.parser.HTMLParser

2011-05-17 Thread Karim
On 05/17/2011 03:05 AM, Andrew Berg wrote: On 2011.05.16 02:26 AM, Karim wrote: Use a regular expression for bad HTML, or BeautifulSoup (google it); below is an example to extract all HTML links: linksList = re.findall('.*?',htmlSource) for link in linksList: print link I was afraid I might hav

Re: Trying to understand html.parser.HTMLParser

2011-05-16 Thread Andrew Berg
On 2011.05.16 02:26 AM, Karim wrote: > Use a regular expression for bad HTML, or BeautifulSoup (google it); below > is an example to extract all HTML links: > > linksList = re.findall('.*?',htmlSource) > for link in linksList: > print link I was afraid I might have to use regexes (mostly because I c
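
The pattern in the quoted snippet seems to have lost its markup in the archive, so what follows is not necessarily the expression Karim posted. It is only a sketch of the general regex approach, written for Python 3. The names HREF_RE and find_links and the sample string are invented for illustration, and a regex like this will miss unquoted attribute values and other irregular markup.

```python
import re

# Assumed pattern: capture the quoted href value of each <a ...> start tag.
# It does not handle unquoted values or '>' inside attribute values.
HREF_RE = re.compile(
    r'<a\s[^>]*?href\s*=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)

def find_links(html_source):
    """Return the href values found in html_source, in order of appearance."""
    return HREF_RE.findall(html_source)

# Hypothetical sample input, not taken from the thread.
html_source = '<a href="one.html">1</a> <A class="x" HREF=\'two.html\'>2</A>'
for link in find_links(html_source):
    print(link)
```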

Re: Trying to understand html.parser.HTMLParser

2011-05-16 Thread Karim
On 05/16/2011 03:06 AM, David Robinow wrote: On Sun, May 15, 2011 at 4:45 PM, Andrew Berg wrote: I'm trying to understand why HTMLParser.feed() isn't returning the whole page. My test script is this: import urllib.request import html.parser class MyHTMLParser(html.parser.HTMLParser): def h

Re: Trying to understand html.parser.HTMLParser

2011-05-15 Thread David Robinow
On Sun, May 15, 2011 at 4:45 PM, Andrew Berg wrote: > I'm trying to understand why HTMLParser.feed() isn't returning the whole > page. My test script is this: > > import urllib.request > import html.parser > class MyHTMLParser(html.parser.HTMLParser): >    def handle_starttag(self, tag, attrs): >

Trying to understand html.parser.HTMLParser

2011-05-15 Thread Andrew Berg
I'm trying to understand why HTMLParser.feed() isn't returning the whole page. My test script is this: import urllib.request import html.parser class MyHTMLParser(html.parser.HTMLParser): def handle_starttag(self, tag, attrs): if tag == 'a' and attrs: print(tag,'-',attrs
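
For context on the question: feed() itself returns None, and the parser reports what it finds by calling handler methods such as handle_starttag(). A minimal sketch of collecting link targets that way is below. LinkCollector, collect_links, and the sample markup are made-up names for this illustration and are not the rest of the truncated script above.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper: gather href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        # for attributes written without a value.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

def collect_links(html_text):
    """Feed html_text to a fresh parser and return the hrefs it saw."""
    parser = LinkCollector()
    parser.feed(html_text)   # returns None; results arrive via the handlers
    parser.close()           # flush any buffered input
    return parser.links

# Made-up sample input for the sketch.
sample = '<p><a href="a.html">A</a> <a name="x">no href</a> <a href="b.html">B</a></p>'
print(collect_links(sample))
```

Keeping the results on the parser instance, rather than printing inside the handler, makes it easier to check afterwards whether every expected tag was actually seen.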