Hi, I am attempting to extract some XML from an HTML document that is returned by a form-based web page. For some reason, I cannot figure out how to do this. I thought I could use the minidom module to do it, but all I get is a screwy traceback:

Traceback (most recent call last):
  File "\\mcisnt1\repl$\Scripts\PythonPackages\Development\clippy\xml_parser.py", line 69, in ?
    inst = ApptParser(url)
  File "\\mcisnt1\repl$\Scripts\PythonPackages\Development\clippy\xml_parser.py", line 19, in __init__
    xml = self.getXml(url)
  File "\\mcisnt1\repl$\Scripts\PythonPackages\Development\clippy\xml_parser.py", line 30, in getXml
    doc = xml.dom.minidom.parse(f)
  File "C:\Python24\lib\xml\dom\minidom.py", line 1915, in parse
    return expatbuilder.parse(file)
  File "C:\Python24\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python24\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
ExpatError: mismatched tag: line 1, column 357

Here's a sample of the html:

<html>
<body>
lots of screwy text including divs and spans
<Row status="o">
  <RecordNum>1126264</RecordNum>
  <Make>Mitsubishi</Make>
  <Model>Mirage DE</Model>
</Row>
</body>
</html>

What's the best way to get at the XML? Do I need to somehow parse it using the HTMLParser and then parse that with minidom or what?

Thanks a lot!

Mike
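
One possible approach, sketched here under assumptions rather than offered as the definitive answer: the ExpatError suggests the surrounding HTML is not well-formed XML, so minidom cannot parse the whole page. If each <Row> element is itself well-formed and rows are not nested, the fragments can be cut out with a regular expression and only those handed to minidom. The names SAMPLE_HTML, ROW_PATTERN and extract_rows are hypothetical, the sample page is modeled on the one above with an unclosed tag added, and the code targets Python 3 rather than the Python 2.4 shown in the traceback.

```python
import re
import xml.dom.minidom

# Hypothetical sample modeled on the page in the post; the unclosed <br>
# and <div> are the kind of markup that makes minidom reject the full page.
SAMPLE_HTML = """<html> <body> <div>lots of screwy text<br> including divs and spans
<Row status="o"> <RecordNum>1126264</RecordNum> <Make>Mitsubishi</Make> <Model>Mirage DE</Model> </Row>
</body> </html>"""

# Assumption: <Row> elements are well-formed and never nested.
ROW_PATTERN = re.compile(r"<Row\b.*?</Row>", re.DOTALL)


def extract_rows(html):
    """Return one dict per <Row> fragment found in the HTML text."""
    rows = []
    for fragment in ROW_PATTERN.findall(html):
        # Parse only the fragment, not the surrounding HTML.
        doc = xml.dom.minidom.parseString(fragment)
        row = doc.documentElement
        record = {"status": row.getAttribute("status")}
        for child in row.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                # Join the text nodes directly under each child element.
                record[child.tagName] = "".join(
                    node.data
                    for node in child.childNodes
                    if node.nodeType == node.TEXT_NODE
                )
        rows.append(record)
    return rows


if __name__ == "__main__":
    for record in extract_rows(SAMPLE_HTML):
        print(record)
```

If the embedded XML can nest or span unpredictable markup, a regular expression is likely too fragile, and walking the page with an HTML parser first may be the safer route.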