Tim Arnold wrote:
Hi, I have some html files that I want to validate by using an external script 'validate'. The html files need a doctype header attached before validation. The files are in utf8 encoding. My code:
---------------
import os,sys
import codecs,subprocess
HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'

filename  = 'mytest.html'
fd = codecs.open(filename,'rb',encoding='utf8')
s = HEADER + fd.read()
fd.close()

p = subprocess.Popen(['validate'],
                    stdin=subprocess.PIPE,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT)
validate = p.communicate(unicode(s,encoding='utf8'))
print validate
---------------

I get lots of lines like this:
Error at line 1, character 66:\tillegal character number 0
etc etc.

But I can give the command in a terminal 'cat mytest.html | validate' and get reasonable output. My subprocess code must be wrong, but I could use some help to see what the problem is.

python2.5.1, freebsd6
thanks,
--Tim



The usual rule in debugging: split the problem into two parts and test each one separately, starting with the one you think is most likely to be the culprit.

In this case the obvious place to split is at the data you're passing to the communicate call. I expect it's already wrong, long before you hand it to the subprocess. So write it to a file instead, and inspect it with a binary file viewer. And of course test that file manually with your validate program. Is validate really expecting a Unicode stream on stdin?
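
A rough sketch of that split, in case it helps. It assumes Python 2.6 or later (it also runs on Python 3), and it assumes validate wants UTF-8 bytes on stdin, which is a guess based on the 'cat mytest.html | validate' test working. The helper names and the debug_payload.html file name are made up for illustration. One plausible cause of the "character number 0" errors, not confirmed in this thread, is that a unicode object rather than encoded bytes reaches the pipe; checking the dump for NUL bytes would show that.

```python
import binascii
import io
import subprocess

HEADER = u'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'


def build_payload(html_text):
    """Prepend the doctype and return UTF-8 encoded bytes for the pipe."""
    return (HEADER + html_text).encode('utf-8')


def hex_preview(data, count=80):
    """Return the first `count` bytes as hex, for a quick look without a viewer."""
    return binascii.hexlify(data[:count]).decode('ascii')


def dump_payload(filename, dump_name='debug_payload.html'):
    """First half of the split: build the data and save it for inspection."""
    with io.open(filename, encoding='utf-8') as fd:
        payload = build_payload(fd.read())
    # Binary mode, so the file holds exactly the bytes the pipe would get.
    with open(dump_name, 'wb') as out:
        out.write(payload)
    return payload


def run_validator(payload, command=('validate',)):
    """Second half of the split: feed the bytes to the external command."""
    p = subprocess.Popen(list(command),
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    output, _ = p.communicate(payload)
    return output
```

Call dump_payload('mytest.html') first, look at hex_preview(payload) and at whether b'\x00' appears in payload, and run 'cat debug_payload.html | validate' by hand. Only when that file validates sensibly is it worth calling run_validator(payload) to test the subprocess half.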


