Hi all, I'm using TidyHTMLTreeBuilder to model the syntactic structure of HTML documents. I've been trying to feed it pages from Yahoo and CNN, but the parser fails with the following traceback:
" File "C:\Python23\Lib\site-packages\elementtidy\TidyHTMLTreeBuilder.py", line 89, in parse return ElementTree.parse(source, TreeBuilder()) File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 865, in parse tree.parse(source, parser) File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 590, in parse self._root = parser.close() File "C:\Python23\Lib\site-packages\elementtidy\TidyHTMLTreeBuilder.py", line 75, in close return ElementTree.XML(stdout) File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 879, in XML return parser.close() File "C:\Python23\lib\site-packages\elementtree\ElementTree.py", line 1169, in close self._parser.Parse("", 1) # end of data ExpatError: no element found: line 1, column 0" Could someone else please try it on their system and see if they also have the same problem? I suspect this problem relates to <form> inside <table>. Thank you very much for any help. Cheers, Michael -- http://mail.python.org/mailman/listinfo/python-list