Agreed that the web sites are probably broken. Try running the HTML through HTMLTidy (http://tidy.sourceforge.net/). Doing that has allowed me to parse pages where I had problems such as yours.
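
Roughly what I mean, as a sketch rather than tested production code: clean the page with the tidy command-line tool first, then hand the result to the parser. This assumes a `tidy` executable is on your PATH and uses the Python 3 name `html.parser` for the HTMLParser module; `clean_with_tidy`, `LinkCollector`, and `collect_links` are names I made up for illustration, and pulling out links is only a stand-in for whatever you actually want from the page.

```python
import shutil
import subprocess
from html.parser import HTMLParser


def clean_with_tidy(html):
    """Return html cleaned by the tidy command-line tool.

    Assumption: a `tidy` executable is on PATH. If it is missing or
    writes nothing, the input is returned unchanged.
    """
    tidy = shutil.which("tidy")
    if tidy is None:
        return html
    result = subprocess.run(
        [tidy, "-q", "-asxhtml", "-utf8", "--force-output", "yes"],
        input=html,
        capture_output=True,
        encoding="utf-8",
    )
    # tidy exits with 1 for warnings and 2 for errors, so the exit code
    # is not checked; keep whatever markup it managed to write.
    return result.stdout or html


class LinkCollector(HTMLParser):
    """Hypothetical example parser: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(html):
    """Parse html and return the list of link targets found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately sloppy markup: unclosed <p> and <a>, unquoted attribute.
BROKEN = '<html><body><p>First <a href="a.html">one<p>Second <a href=b.html>two</body>'

if __name__ == "__main__":
    print(collect_links(clean_with_tidy(BROKEN)))
```

The cleaning step is kept separate from the parsing step so you can swap in a different cleaner, or skip it entirely for pages that already parse.
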
I have also had luck with BeautifulSoup, which also includes a tidy function.

Just Another Victim of the Ambient Morality wrote:
> "Just Another Victim of the Ambient Morality" <[EMAIL PROTECTED]> wrote
> in message news:[EMAIL PROTECTED]
> >
> > Okay, I think I found what I'm looking for in HTMLParser in the
> > HTMLParser module.
>
> Except it appears to be buggy or, at least, not very robust. There are
> websites for which it falsely terminates early in the parsing. I have a
> sneaking feeling the sgml parser will be more robust, if only it had that
> one feature I am looking for.
> Can someone help me out here?
> Thank you...