"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > I understand that the web is full of ill-formed XHTML web pages but > this is Microsoft:
Yes... And Microsoft is responsible for a lot of the ill-formed pages on the web, be it on their website or made by their applications.

> http://moneycentral.msn.com/companyreport?Symbol=BBBY
>
> I can't validate it and xml.minidom.dom.parseString won't work on it.
>
> If this was just some teenager's web site I'd move on.  Is there any
> hope avoiding regular expression hacks to extract the data from this
> page?

It all depends on what data you want.  Probably a non-validating parser would be able to extract some things.  Another option is to pass the page through some validator that can fix the page, like tidy...

--
Jorge Godoy <[EMAIL PROTECTED]>
--
http://mail.python.org/mailman/listinfo/python-list
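
To illustrate the non-validating route mentioned above, here is a minimal sketch using html.parser from the Python standard library, which does not require well-formed input. The class name, the helper function and the sample markup are all hypothetical; the sample is deliberately broken and is not taken from the MSN page, whose real structure would need to be inspected first.

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Collect the text of every <td> cell, tolerating ill-formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.cells = []       # finished cell texts
        self._buffer = None   # text pieces of the open cell, or None

    def _flush(self):
        # Close the current cell, if any, and store its whitespace-normalised text.
        if self._buffer is not None:
            self.cells.append(" ".join("".join(self._buffer).split()))
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._flush()     # a new <td> implicitly ends an unclosed one
            self._buffer = []
        elif tag == "tr":
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()         # keep a cell left open at end of input


def extract_cells(markup):
    """Return the text of all table cells found in markup."""
    parser = CellTextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.cells


# Hypothetical sample: unclosed <td>, <b>, <tr> and <table> elements.
SAMPLE = "<table><tr><td>Last Price<td><b>38.21</td><tr><td>Volume<td>1.2 Mil"
print(extract_cells(SAMPLE))
```

Because the parser only reports tags and text as it meets them, missing end tags do not stop it; the handler simply treats the next <td> or <tr> as the end of the previous cell. Whether this is enough depends on what data you want from the page.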