Re: How use XML parsing tools on this one specific URL?

Jorge Godoy Sun, 04 Mar 2007 09:56:01 -0800

"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:

> I understand that the web is full of ill-formed XHTML web pages but
> this is Microsoft:


Yes...  And Microsoft is responsible for a lot of the ill-formed pages on the
web, whether on their own website or generated by their applications.
>
> http://moneycentral.msn.com/companyreport?Symbol=BBBY
>
> I can't validate it and xml.dom.minidom.parseString won't work on it.
>
> If this was just some teenager's web site I'd move on.  Is there any
> hope avoiding regular expression hacks to extract the data from this
> page?

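It all depends on what data you want.  A more lenient parser, one that does
not insist on well-formed input, would probably be able to extract some
things.  Another option is to pass the page through a tool that can repair the
markup, such as tidy, and then feed the cleaned-up output to your XML parser.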
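
To illustrate the first suggestion, here is a minimal sketch, not a tested
recipe for the MSN page.  It assumes a current Python where the lenient parser
lives in the standard library as html.parser.HTMLParser; the SAMPLE markup and
the names CellCollector, strict_parse_ok and extract_cells are made up for the
example.  It shows xml.dom.minidom rejecting an ill-formed fragment while the
lenient parser still recovers the table cells.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical ill-formed fragment (NOT the real MSN page):
# the <td> and <br> elements are never closed.
SAMPLE = "<html><body><table><tr><td>Symbol<td>BBBY<br></tr></table></body></html>"


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # A new <td> implicitly ends the previous one and starts a new cell.
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        # Closing the cell, row or table ends the current cell.
        if tag in ("td", "tr", "table"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def strict_parse_ok(markup):
    """Return True if minidom (a strict XML parser) accepts the markup."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False


def extract_cells(markup):
    """Return the stripped text of each table cell found by the lenient parser."""
    collector = CellCollector()
    collector.feed(markup)
    collector.close()
    return [cell.strip() for cell in collector.cells]


if __name__ == "__main__":
    print("strict parser accepts it:", strict_parse_ok(SAMPLE))
    print("cells found leniently:", extract_cells(SAMPLE))
```

For a real page you would fetch the document first and adapt the handlers to
whichever elements hold the data you want; how well this works depends
entirely on how badly the page is broken.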


-- 
Jorge Godoy      <[EMAIL PROTECTED]>
-- 
http://mail.python.org/mailman/listinfo/python-list
