On Sep 18, 3:31 pm, [EMAIL PROTECTED] wrote:
> On Sep 17, 4:51 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
> wrote:
>
> > On Mon, 17 Sep 2007 17:31:19 -0300, <[EMAIL PROTECTED]> wrote:
>
> > > I am attempting to extract some XML from an HTML document that I get
> > > back from a form-based web page. For some reason, I cannot figure
> > > out how to do this. I thought I could use the minidom module to do it,
> > > but all I get is a screwy traceback:
>
> > > Traceback (most recent call last):
> > >   File "C:\Python24\lib\xml\dom\expatbuilder.py", line 207, in parseFile
> > >     parser.Parse(buffer, 0)
> > > ExpatError: mismatched tag: line 1, column 357
>
> > So your HTML is not a well-formed XML document, like many HTML pages,
> > and you can't use an XML parser on it (even a valid HTML document may
> > not be valid XML). Let's try with some mismatched tags:
> > Depending on your document, you may prefer to extract the XML blocks
> > using BeautifulSoup, and then parse each one using BeautifulStoneSoup
> > (the XML parser) or xml.etree.ElementTree.
>
> > --
> > Gabriel Genellina
>
> Thanks for the reply. I already knew about BeautifulSoup, but I was
> hoping to avoid installing *yet another module* on my PC.
That's a poor excuse when the module in question is a single, self-contained file. "Installing" it can be as simple as putting it in the same directory as the module that imports it. Given that you can do in 2 lines what took you around 15 with lxml, I wouldn't think twice.

George
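For reference, here is a minimal Python 3 sketch of Gabriel's point using only the standard library: minidom rejects an HTML page containing an unclosed tag (raising ExpatError, as in the traceback above), while an XML block cut out of that same page parses fine with xml.etree.ElementTree. The page content, the `<data>` wrapper tag and the helper names `parses_as_xml` and `extract_block` are all hypothetical. The plain string slicing is only a crude stand-in for the BeautifulSoup extraction step, not how BeautifulSoup works, and it would break on nested `<data>` elements or on tags whose names merely start with `data`.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical page: HTML with an unclosed <br> tag, plus an embedded
# well-formed XML block wrapped in a made-up <data> element.
PAGE = """<html><body>
<p>Results<br>
<data><item id="1">alpha</item><item id="2">beta</item></data>
</p></body></html>"""


def parses_as_xml(text):
    """Return True if minidom accepts the text as a well-formed XML document."""
    try:
        minidom.parseString(text)
    except ExpatError:  # e.g. "mismatched tag: line ..., column ..."
        return False
    return True


def extract_block(page, tag="data"):
    """Slice out the first <tag>...</tag> block with a plain string search.

    A crude stand-in for the BeautifulSoup extraction step: it assumes the
    block is not nested inside another element with the same tag name.
    """
    start = page.find("<" + tag)
    end = page.find("</" + tag + ">", start)
    if start == -1 or end == -1:
        raise ValueError("no <%s> block found" % tag)
    return page[start:end + len(tag) + 3]  # include the closing "</tag>"


block = ET.fromstring(extract_block(PAGE))
print(parses_as_xml(PAGE))                         # False: <br> is never closed
print([item.text for item in block.iter("item")])  # ['alpha', 'beta']
```

With BeautifulSoup in the same directory, the extraction step would be the short part George mentions; the ElementTree half stays the same either way.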