So I see that elementtidy doesn't like strings with \0 characters in them:

>>> import urllib
>>> from elementtidy import TidyHTMLTreeBuilder
>>> url = 'http://news.bbc.co.uk/1/hi/world/europe/492215.stm'
>>> url_file = urllib.urlopen(url)
>>> tree = TidyHTMLTreeBuilder.parse(url_file)
Traceback (most recent call last):
  ...
  File "...elementtidy\TidyHTMLTreeBuilder.py", line 90, in close
    stdout, stderr = _elementtidy.fixup(*args)
TypeError: fixup() argument 1 must be string without null bytes, not str

The obvious solution would be to call str.replace('\0', '') on the file's text, but I'm not sure how to ask elementtidy to parse from a string instead of a file-like object. Do I need to wrap the string in a StringIO, or is there a better way?

STeVe
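
For reference, the StringIO idea could look roughly like the sketch below. It is only an illustration, not something elementtidy provides: strip_null_bytes is a made-up helper name, the sample markup stands in for the downloaded page, and it assumes the parser only needs a read() method on whatever it is handed.

```python
import io


def strip_null_bytes(file_obj):
    """Hypothetical helper: read everything from file_obj, drop the
    NUL characters, and return a new in-memory file-like object."""
    data = file_obj.read()
    if isinstance(data, bytes):
        # Byte input (e.g. a urlopen response): filter and wrap as bytes.
        return io.BytesIO(data.replace(b"\0", b""))
    # Text input: filter and wrap as text.
    return io.StringIO(data.replace("\0", ""))


# Stand-in for the page contents, with an embedded NUL byte.
raw = io.BytesIO(b"<html><body>he\0llo</body></html>")
cleaned = strip_null_bytes(raw)
cleaned_bytes = cleaned.getvalue()
```

If that assumption holds, the result would be passed where url_file was passed above, i.e. TidyHTMLTreeBuilder.parse(strip_null_bytes(url_file)). Note that this reads the whole response into memory, which should be fine for a single page.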