On Sep 17, 4:51 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> On Mon, 17 Sep 2007 17:31:19 -0300, <[EMAIL PROTECTED]> wrote:
>
> > I am attempting to extract some XML from an HTML document that gets
> > returned from a form-based web page. For some reason, I cannot figure
> > out how to do this. I thought I could use the minidom module to do it,
> > but all I get is a screwy traceback:
> >
> > Traceback (most recent call last):
> >   File "C:\Python24\lib\xml\dom\expatbuilder.py", line 207, in parseFile
> >     parser.Parse(buffer, 0)
> > ExpatError: mismatched tag: line 1, column 357
>
> So your HTML is not a well-formed XML document, like many HTML pages, and
> you can't use an XML parser on it (even a valid HTML document may not be
> valid XML). Let's try with some mismatched tags:
>
> Depending on your document, you may prefer to extract the XML blocks using
> BeautifulSoup, and then parse each one using BeautifulStoneSoup (the XML
> parser) or xml.etree.ElementTree.
>
> --
> Gabriel Genellina

Thanks for the reply. I already knew about BeautifulSoup, but I was hoping
to avoid installing *yet another module* on my PC. I got it to work with
lxml, but it's not very pretty. See my reply to Stefan.

Mike

--
http://mail.python.org/mailman/listinfo/python-list
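For anyone in the same spot who wants to stay within the standard library, here is a rough sketch of Gabriel's "extract the blocks, then parse each one" idea. It swaps BeautifulSoup for a plain regular expression, which is fragile on arbitrary HTML. It also assumes (hypothetically) that the XML payload sits inside non-nesting `<xml>...</xml>` tags. The tag name, the `is_well_formed` and `extract_xml_blocks` helpers, and the sample page are illustrative only. Each extracted block is then parsed with `xml.etree.ElementTree` (part of the standard library since Python 2.5; this sketch is written for Python 3).

```python
import re
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical assumption: the XML payload is wrapped in <xml>...</xml>
# tags inside otherwise sloppy HTML, and these blocks never nest.
XML_BLOCK = re.compile(r"<xml\b[^>]*>(.*?)</xml>", re.DOTALL | re.IGNORECASE)


def is_well_formed(text):
    """Return True if minidom (backed by expat) accepts the text as XML."""
    try:
        minidom.parseString(text)
    except ExpatError:
        # Same exception class as in the original traceback.
        return False
    return True


def extract_xml_blocks(html):
    """Cut each <xml>...</xml> block out of the page and parse it."""
    elements = []
    for match in XML_BLOCK.finditer(html):
        # Each block must be well-formed on its own, even if the page is not;
        # ET.fromstring raises ET.ParseError otherwise.
        elements.append(ET.fromstring(match.group(1).strip()))
    return elements


page = """<html><body><p>Results<br>
<xml><order id="42"><item>widget</item></order></xml>
</body></html>"""

print(is_well_formed(page))  # False: <p> and <br> are never closed
blocks = extract_xml_blocks(page)
print(blocks[0].get("id"), blocks[0].find("item").text)  # 42 widget
```

The whole page fails with an `ExpatError` for the same reason as in the original traceback. The embedded block still parses cleanly once it has been cut out. If the real page nests these blocks or uses some other wrapper, a proper HTML parser (BeautifulSoup, or lxml as Mike ended up using) is the safer choice.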