So I see that elementtidy doesn't like strings with \0 characters in them:

 >>> import urllib
 >>> from elementtidy import TidyHTMLTreeBuilder
 >>> url = 'http://news.bbc.co.uk/1/hi/world/europe/492215.stm'
 >>> url_file = urllib.urlopen(url)
 >>> tree = TidyHTMLTreeBuilder.parse(url_file)
Traceback (most recent call last):
   ...
   File "...elementtidy\TidyHTMLTreeBuilder.py", line 90, in close
     stdout, stderr = _elementtidy.fixup(*args)
TypeError: fixup() argument 1 must be string without null bytes, not str

The obvious solution would be to call str.replace('\0', '') on the file's
text, but I'm not sure how to ask elementtidy to parse from a string
instead of a file-like object. Do I need to wrap it in a StringIO, or
is there a better way?
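Here is a minimal sketch of the StringIO route. The traceback shows that TidyHTMLTreeBuilder.parse already accepts a file-like object (the urllib response), so one option is to read the page, drop the NUL bytes, and hand parse an in-memory file instead. The helper name strip_null_bytes is hypothetical. The sketch also assumes that silently deleting NUL bytes is harmless for the page being parsed. io.BytesIO is used here; in Python 2, StringIO.StringIO would work the same way on a str.

```python
import io


def strip_null_bytes(data):
    """Hypothetical helper: wrap `data` in a file-like object with NULs removed.

    `data` is the raw page content as a byte string, e.g. url_file.read().
    Removing every b'\\0' avoids the "must be string without null bytes"
    TypeError; we assume dropping them does not change the page's meaning.
    """
    return io.BytesIO(data.replace(b'\0', b''))


# Hypothetical usage with elementtidy (not executed here):
#   url_file = urllib.urlopen(url)
#   tree = TidyHTMLTreeBuilder.parse(strip_null_bytes(url_file.read()))

if __name__ == "__main__":
    sample = b"<html><body>a\0b</body></html>"
    print(strip_null_bytes(sample).read())
```

This reads the whole page into memory before parsing, which is usually fine for a single news article but would not suit very large inputs.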

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list