At Thursday 11/1/2007 18:27, [EMAIL PROTECTED] wrote:

>HELP!
>Guy who was here before me wrote a script to parse files in Python.
>
>Includes line:
>print u
>where u is a line from a file we are parsing.
>However, we have started receiving data from Brazil. If I open file to
>parse in VI, looks like:
>
><Utt id="3" transcribe="yes" audioRoot="A1"
>audio="313-20070102144528.wav" grammarSet="G3" rawText="n&#227;o"
>recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
>transcribedText="n&#227;o" parsableText="n&#227;o"/

Is this part of an XML document? If so, you should use a real XML
parser instead of parsing it by hand.
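
As a rough sketch of what that could look like, assuming the element is
well-formed (I closed it with "/>" below, which the pasted text lacks)
and that xml.etree.ElementTree is available (Python 2.5 or later), the
parser resolves character references such as &#227; for you. The names
SAMPLE and utterance_text are made up for illustration:

```python
import xml.etree.ElementTree as ET

# The element from the question, closed with "/>" so that it parses.
# (Assumption: the real file contains well-formed XML.)
SAMPLE = ('<Utt id="3" transcribe="yes" audioRoot="A1" '
          'audio="313-20070102144528.wav" grammarSet="G3" rawText="n&#227;o" '
          'recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0" '
          'transcribedText="n&#227;o" parsableText="n&#227;o"/>')


def utterance_text(xml_fragment):
    """Return the transcribedText attribute of a single Utt element.

    The parser converts character references (&#227;) into real
    Unicode characters, so no hand-written decoding is needed.
    """
    element = ET.fromstring(xml_fragment)
    return element.get("transcribedText")


text = utterance_text(SAMPLE)
```

Here text holds the word with a real U+00E3 character in it, not the
seven-character sequence "&#227;". Printing it still requires an
encoding your terminal understands.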

>Clearly those "n&#227" are some non-Ascii characters, but how do I get
>print to understand that?

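"&#227;" is an XML character reference for U+00E3, the same character
(u'\xe3') named in your error message. Understanding how Unicode works
may be very useful: http://www.amk.ca/python/howto/unicode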

>I keep getting:
>"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
>position 40:
>  ordinal not in range(128)"

py> u = u"áéíóú"
py> print u, repr(u)
áéíóú u'\xe1\xe9\xed\xf3\xfa'
py> print str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
py> print u.encode('cp850')
áéíóú

(cp850 is my console encoding)
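
In a script you may not know the output encoding in advance. A minimal
sketch, assuming it is acceptable to replace characters the target
encoding cannot represent, and falling back to UTF-8 when the stream
reports no encoding (both are my assumptions, and encode_for_output is
a hypothetical helper name):

```python
import sys


def encode_for_output(text, encoding=None):
    """Encode a Unicode string explicitly instead of relying on 'ascii'.

    Characters missing from the target encoding become '?' rather than
    raising UnicodeEncodeError.
    """
    # sys.stdout.encoding can be None, for example when output is piped;
    # UTF-8 is used as the fallback here (an assumption, adjust as needed).
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(target, "replace")


as_ascii = encode_for_output(u"n\u00e3o", "ascii")
as_utf8 = encode_for_output(u"n\u00e3o", "utf-8")
```

With "ascii" the unsupported character is replaced by "?", while
"utf-8" keeps it as a two-byte sequence. Pass the result to whatever
writes your output.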


-- 
Gabriel Genellina
Softlab SRL 


        

        
                