Re: piping input to an external script

norseman Mon, 11 May 2009 22:19:40 -0700

Tim Arnold wrote:

Hi, I have some html files that I want to validate by using an external script 'validate'. The html files need a doctype header attached before validation. The files are in utf8 encoding. My code:
---------------
import os,sys
import codecs,subprocess
HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
filename  = 'mytest.html'
fd = codecs.open(filename,'rb',encoding='utf8')
s = HEADER + fd.read()
fd.close()

p = subprocess.Popen(['validate'],
                    stdin=subprocess.PIPE,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT)
validate = p.communicate(unicode(s,encoding='utf8'))
print validate
---------------

I get lots of lines like this:
Error at line 1, character 66:\tillegal character number 0
etc etc.
But I can give the command in a terminal 'cat mytest.html | validate' and get reasonable output. My subprocess code must be wrong, but I could use some help to see what the problem is.
python2.5.1, freebsd6
thanks,
--Tim

============================

If you search through the recent Python-List for UTF-8 things you might get the same understanding I have come to.

The problem is the use of Python's 'print' statement, or whatever it is. It 'cooks' things, and someone decided that it would only handle 1/2 of a byte (the x'00' to x'7f' range) and ignore or send error messages against anything else. I guess the person doing the deciding read the part that says ASCII printables are in the 7-bit range and chose to ignore the part about the rest of the byte being undefined. That is undefined, not disallowed, which means the high-bit half can be used as wanted since it isn't already taken. Nor did whoever it was take a look around the computer world and realize the conflict that was going to be generated by using only 1/2 of a byte in a 1-byte+ world.

If you can modify your code to use read and write you can bypass print and be OK. Or just have Python do the 'cat mytest.html | validate' for you. Apply a var for html and let Python accomplish the equivalent of Unix's:

   for f in *.html; do cat $f | validate; done
                        or
    for f in *.html; do validate $f; done  #file name available this way
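A minimal sketch of having Python do that piping itself. It assumes a present-day Python 3 rather than the 2.5 in the original post, and the function names pipe_to_validator and validate_all are made up for illustration. The file is read as raw bytes and handed to the child process unchanged, so nothing is decoded or re-encoded on the way; the doctype header from the original post is prepended as bytes.

```python
import glob
import subprocess

# Doctype header taken from the original post, kept as bytes.
HEADER = b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'


def pipe_to_validator(path, command=("validate",)):
    """Send HEADER plus the raw bytes of `path` to `command` on stdin.

    Returns the combined stdout/stderr of the command as bytes.
    `command` defaults to the poster's external 'validate' script.
    """
    # Binary mode: the file is already UTF-8 on disk, so pass it through as-is.
    with open(path, "rb") as fd:
        data = HEADER + fd.read()

    proc = subprocess.Popen(
        list(command),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    output, _ = proc.communicate(data)
    return output


def validate_all(pattern="*.html", command=("validate",)):
    """Rough equivalent of: for f in *.html; do cat $f | validate; done"""
    return {
        path: pipe_to_validator(path, command)
        for path in sorted(glob.glob(pattern))
    }
```

Calling validate_all() in a directory of html files returns a dict mapping each file name to the validator's output as bytes; decode it with .decode('utf-8', 'replace') before displaying it if needed.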

If you still have problems, take a look at os.popen2 (and its sibling os.popen3).
Also take a look at the os.spawn* family.

HTH

Steve
--
http://mail.python.org/mailman/listinfo/python-list
