Tim Arnold wrote:
Hi, I have some html files that I want to validate by using an external
script 'validate'. The html files need a doctype header attached before
validation. The files are in utf8 encoding. My code:
---------------
import os,sys
import codecs,subprocess
HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
filename = 'mytest.html'
fd = codecs.open(filename,'rb',encoding='utf8')
s = HEADER + fd.read()
fd.close()
p = subprocess.Popen(['validate'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
validate = p.communicate(unicode(s,encoding='utf8'))
print validate
---------------
I get lots of lines like this:
Error at line 1, character 66:\tillegal character number 0
etc etc.
But I can give the command in a terminal 'cat mytest.html | validate' and
get reasonable output. My subprocess code must be wrong, but I could use
some help to see what the problem is.
python2.5.1, freebsd6
thanks,
--Tim
The usual rule in debugging: split the problem into two parts, and test
each one separately, starting with the one you think most likely to be
the culprit.
In this case the obvious place to split is with the data you're passing
to the communicate call. I expect it's already wrong, long before you
hand it to the subprocess. So write it to a file instead, and inspect
it with a binary file viewer. And of course test it manually with your
validate program. Is validate really expecting a Unicode stream on stdin?
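
A rough sketch of that split, in case it helps. It is written for a
current Python 3 rather than the 2.5.1 in the original post, and the
helper names (build_payload, hex_head, count_nuls_in_child) are made up
for illustration. The child process is only a stand-in for the real
'validate' script, which I have not seen. The guess being tested is that
"illegal character number 0" means NUL bytes are reaching the pipe, as
they would if the text went out in a wide encoding instead of UTF-8.

```python
import binascii
import subprocess
import sys

HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'


def build_payload(html_text, encoding="utf-8"):
    """Prepend the doctype and encode the result to the bytes a pipe carries."""
    return (HEADER + html_text).encode(encoding)


def hex_head(data, count=16):
    """Return the first `count` bytes as hex, a stand-in for a binary viewer."""
    return binascii.hexlify(data[:count]).decode("ascii")


def count_nuls_in_child(data):
    """Pipe `data` to a child process that reports how many NUL bytes it read.

    The child is a hypothetical stand-in for the real 'validate' script.
    """
    child = "import sys; print(sys.stdin.buffer.read().count(b'\\0'))"
    p = subprocess.Popen([sys.executable, "-c", child],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    out, _ = p.communicate(data)
    return int(out.decode("ascii").strip())


if __name__ == "__main__":
    body = "<html><body>caf\u00e9</body></html>"
    good = build_payload(body)                # UTF-8 bytes
    wide = build_payload(body, "utf-16-le")   # two bytes per character here
    # Part 1: look at the data before any subprocess is involved.
    print(hex_head(good))
    print(hex_head(wide))
    # Part 2: see what actually arrives on the child's stdin.
    print(count_nuls_in_child(good), count_nuls_in_child(wide))
```

If the hex dump of your real data shows 00 bytes between the characters,
the problem is in the data and not in the Popen call. On Python 2 the
same idea would be to pass s.encode('utf8') to communicate, since the
text read through codecs.open is already decoded.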