Re: parse HTML by class rather than tag

gatti Fri, 23 Feb 2007 00:37:30 -0800

On Feb 23, 8:54 am, [EMAIL PROTECTED] wrote:
> Hello,
>
> I would be interested in parsing an HTML file by its corresponding
> opening and closing tags, but taking into account the class
> attributes and their values,
[...]
> so I am wondering if I should go with regular expressions, but I do not
> think so, as I must jump past the inner closing div, or with a simple
> parser. I've searched and found
> http://www.diveintopython.org/html_processing/basehtmlprocessor.html
> but I would like the parser not to change anything at all (no
> lowercasing).


Parsing HTML with regular expressions is a horribly brittle idea. Use
a robust HTML parser (e.g.
http://www.crummy.com/software/BeautifulSoup/) to build a document
tree, then visit it top down and look at the value of the 'class'
attributes.
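
A minimal sketch of the "build a tree, then visit it top down" approach
follows. It is not BeautifulSoup: to keep it runnable without extra
packages it uses the standard library's html.parser as a stand-in, and
the names Node, TreeBuilder, parse and find_by_class are made up for
this illustration. Note that html.parser lowercases tag and attribute
names, while attribute values such as class names keep their case.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not stay open on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the document tree (hypothetical helper class)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def classes(self):
        # A class attribute may hold several space-separated names,
        # or be missing or valueless (None).
        return (self.attrs.get("class") or "").split()

    def text(self):
        return "".join(c if isinstance(c, str) else c.text()
                       for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name;
        # a stray end tag with no open match is ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_by_class(node, wanted):
    """Yield nodes, top down, whose class list contains `wanted`."""
    if wanted in node.classes():
        yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from find_by_class(child, wanted)


if __name__ == "__main__":
    sample = ('<div class="Outer"><p>a</p>'
              '<div class="inner note">b<div class="inner">c</div></div>'
              '</div>')
    for hit in find_by_class(parse(sample), "inner"):
        print(hit.tag, hit.classes(), repr(hit.text()))
```

Because each match is a tree node, nested divs with the same class need
no special handling: the inner closing tag belongs to the inner node,
which is exactly the case that trips up regular expressions. With a
current BeautifulSoup (the bs4 package) the same lookup is usually
written as soup.find_all("div", class_="inner"), and that parser also
copes better with badly broken markup than this sketch does.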

Regards,
Lorenzo Gatti

-- 
http://mail.python.org/mailman/listinfo/python-list
