[EMAIL PROTECTED] wrote:

>     Chris> http://moneycentral.msn.com/companyreport?Symbol=BBBY
>
>     Chris> I can't validate it and xml.dom.minidom.parseString won't work on
>     Chris> it.
>
>     Chris> If this was just some teenager's web site I'd move on. Is there
>     Chris> any hope of avoiding regular expression hacks to extract the data
>     Chris> from this page?
>
> Tidy it perhaps or use BeautifulSoup? ElementTree can use tidy if it's
> available.

ElementTree can also use BeautifulSoup:

    http://effbot.org/zone/element-soup.htm

As noted on that page, tidy is a bit too picky for this kind of use; it's better suited for "normalizing" HTML that you're producing yourself than for parsing arbitrary HTML.

</F>
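
To illustrate the difference between a strict XML parser and a forgiving HTML parser, here is a minimal sketch. It does not use BeautifulSoup or the element-soup module; it swaps in the standard library's html.parser so that it runs without third-party packages, and BeautifulSoup would play the same tolerant role. The MESSY_HTML string, the CellCollector class, and the helper names are all hypothetical, made up for the example, and are not taken from the MSN page.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical stand-in for a sloppy page: unquoted attribute,
# unclosed <td>, <tr> and <br> tags.
MESSY_HTML = (
    "<html><body><table><tr><td class=label>Last Price"
    "<td>35.10<br></table></body></html>"
)


def strict_parse_fails(markup):
    """Return True if the strict XML parser rejects the markup."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        return True
    return False


class CellCollector(HTMLParser):
    """Collect the text of each <td>, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # A new <td> implicitly ends the previous one.
        if tag == "td":
            self.cells.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(markup):
    """Return the stripped text of every table cell in the markup."""
    parser = CellCollector()
    parser.feed(markup)
    parser.close()
    return [cell.strip() for cell in parser.cells]
```

Calling strict_parse_fails(MESSY_HTML) shows why minidom gives up on this kind of input, while extract_cells(MESSY_HTML) still pulls the cell text out without any regular expressions. For a real page, a tree-building parser such as BeautifulSoup is likely more convenient than writing handler callbacks by hand.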