"Dave Angel" <da...@ieee.org> wrote in message news:mailman.25.1242113076.8015.python-l...@python.org... > Tim Arnold wrote: >> Hi, I have some html files that I want to validate by using an external >> script 'validate'. The html files need a doctype header attached before >> validation. The files are in utf8 encoding. My code: >> --------------- >> import os,sys >> import codecs,subprocess >> HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 >> Transitional//EN">' >> >> filename = 'mytest.html' >> fd = codecs.open(filename,'rb',encoding='utf8') >> s = HEADER + fd.read() >> fd.close() >> >> p = subprocess.Popen(['validate'], >> stdin=subprocess.PIPE, >> stdout=subprocess.PIPE, >> stderr=subprocess.STDOUT) >> validate = p.communicate(unicode(s,encoding='utf8')) >> print validate >> --------------- >> >> I get lots of lines like this: >> Error at line 1, character 66:\tillegal character number 0 >> etc etc. >> >> But I can give the command in a terminal 'cat mytest.html | validate' and >> get reasonable output. My subprocess code must be wrong, but I could use >> some help to see what the problem is. >> >> python2.5.1, freebsd6 >> thanks, >> --Tim >> >> >> >> > The usual rule in debugging: split the problem into two parts, and test > each one separately, starting with the one you think most likely to be the > culprit > > In this case the obvious place to split is with the data you're passing to > the communicate call. I expect it's already wrong, long before you hand > it to the subprocess. So write it to a file instead, and inspect it with > a binary file viewer. And of course test it manually with your validate > program. Is validate really expecting a Unicode stream in stdin ? >
Good advice from everyone. The example was simpler than my actual situation,
but it did show the problem. Dave's final question was the right one: I
needed to pass the html content as a byte string, not a unicode object:
---------------
HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'

filename = 'mytest.html'
fd = codecs.open(filename,'rb',encoding='utf8')
s = HEADER + fd.read().encode('utf8')  # <- made the difference
fd.close()

p = subprocess.Popen(['validate',],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
validate = p.communicate(s)
print validate
---------------
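For anyone reading this on a newer Python, here is a minimal sketch of the same fix, assuming Python 3 (the thread itself used 2.5.1). The idea is unchanged: encode the text to UTF-8 bytes before handing it to communicate(). The helper name run_with_header and the ECHO command are hypothetical; ECHO is only a stand-in for 'validate' so the snippet runs anywhere, and it simply echoes its stdin back.

```python
import subprocess
import sys

HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'


def run_with_header(body_text, command):
    """Prepend HEADER to body_text and pipe the result to command as UTF-8 bytes.

    Hypothetical helper, not part of the original post.
    """
    # The pipe carries bytes, so encode the text explicitly before sending it.
    payload = (HEADER + body_text).encode('utf8')
    p = subprocess.Popen(command,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    output, _ = p.communicate(payload)
    return output


# Assumed stand-in for the 'validate' script: a child Python process that
# copies its stdin to stdout unchanged.
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']

result = run_with_header('<p>caf\u00e9</p>\n', ECHO)
print(result)  # raw bytes as received back from the child process
```

To use it with the real script, pass ['validate'] as the command. Checking the returned bytes (or the payload itself) for b'\x00' is a quick way to follow Dave's suggestion of inspecting the data before blaming the subprocess call.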