Re: Extracting xml from html

kyosohma Mon, 17 Sep 2007 14:36:25 -0700

On Sep 17, 4:01 pm, Paul Boddie <[EMAIL PROTECTED]> wrote:
> On 17 Sep, 22:31, [EMAIL PROTECTED] wrote:
>
> > What's the best way to get at the XML? Do I need to somehow parse it
> > using the HTMLParser and then parse that with minidom or what?
>
> Probably easiest is to use an XML processing toolkit or library which
> supports HTML parsing. Since the libxml2 library (written in C) makes
> a fairly good job of HTML parsing, I would suggest either libxml2dom
> (for a DOM-like API) or lxml (for an ElementTree-like API) as suitable
> Python wrappers of libxml2. Of course, HTMLParser or SGMLParser should
> work, but the programming style is a bit more convoluted unless you're
> used to XML processing using a SAX-like API.
>
> Paul
>
> P.S. I'm biased towards libxml2dom, being the developer, but I use it
> routinely and it generally does the job for me.
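
For the HTMLParser route mentioned above, here is a minimal sketch in Python 3. It uses the standard library's `html.parser` (called `HTMLParser` in the Python 2 of this thread's era) and `xml.dom.minidom`. It assumes the XML sits inside an `<xml>` wrapper element in the page and uses lower-case tag names, because HTMLParser reports tag names in lower case. `XMLIslandExtractor`, its `wrapper` parameter and the sample page are hypothetical. The callbacks rebuild the markup found inside the wrapper, which shows the SAX-like bookkeeping Paul describes, and minidom then parses the result. That is the two-step approach asked about in the original question.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


class XMLIslandExtractor(HTMLParser):
    """Rebuild the markup found inside one wrapper element of an HTML page.

    Assumptions (hypothetical, for illustration): the XML is wrapped in a
    <xml> element, is well-formed, and uses lower-case element names,
    since HTMLParser lower-cases tag and attribute names.
    """

    def __init__(self, wrapper="xml"):
        super().__init__(convert_charrefs=True)
        self.wrapper = wrapper
        self.depth = 0   # > 0 while inside the wrapper element
        self.parts = []  # pieces of the reconstructed XML text

    def _attrs(self, attrs):
        # Re-serialise attributes; HTML allows valueless ones (value None).
        return "".join(
            " %s=%s" % (name, quoteattr(value if value is not None else name))
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append("<%s%s>" % (tag, self._attrs(attrs)))
            self.depth += 1
        elif tag == self.wrapper:
            self.depth = 1  # entering the wrapper; the wrapper itself is dropped

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <gift/>.
        if self.depth:
            self.parts.append("<%s%s/>" % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.depth -= 1
        if self.depth:  # depth 0 means this was the wrapper's own end tag
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # convert_charrefs=True hands us unescaped text, so escape it again.
        if self.depth:
            self.parts.append(escape(data))

    def xml_text(self):
        return "".join(self.parts)


# Hypothetical sample page with an XML "island" inside the HTML body.
page = """<html><body>
<p>Some HTML text</p>
<xml id="orders">
  <orders>
    <order id="1"><item>Widget &amp; bolt</item></order>
    <order id="2"><item>Sprocket</item><gift/></order>
  </orders>
</xml>
</body></html>"""

parser = XMLIslandExtractor()
parser.feed(page)
parser.close()

# Second step: hand the recovered XML to minidom.
doc = minidom.parseString(parser.xml_text().strip())
for order in doc.getElementsByTagName("order"):
    item = order.getElementsByTagName("item")[0]
    print(order.getAttribute("id"), item.firstChild.data)
# prints:
# 1 Widget & bolt
# 2 Sprocket
```

With lxml, the same extraction could instead work on the tree returned by `lxml.html.fromstring`, for example through its `xpath` method, which avoids the depth bookkeeping above.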


I have lxml installed and I appear to also have libxml2dom installed.
I know lxml has decent docs, but I don't see much for yours. Is this
the only place to go: http://www.boddie.org.uk/python/libxml2dom.html ?

Mike

-- 
http://mail.python.org/mailman/listinfo/python-list
