At Thursday 11/1/2007 18:27, [EMAIL PROTECTED] wrote:

>HELP!
>Guy who was here before me wrote a script to parse files in Python.
>
>Includes line:
>print u
>where u is a line from a file we are parsing.
>However, we have started receiving data from Brazil. If I open file to
>parse in VI, looks like:
>
><Utt id="3" transcribe="yes" audioRoot="A1"
>audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
>recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
>transcribedText="não" parsableText="não"/>

Is this part of an XML document? You should use a true XML parser
instead of doing that by hand.

>Clearly those "nã" are some non-Ascii characters, but how do I get
>print to understand that?

Understanding how Unicode works may be very useful:
http://www.amk.ca/python/howto/unicode

>I keep getting:
>"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
>position 40:
>  ordinal not in range(128)"

py> u = u"áéíóú"
py> print u, repr(u)
áéíóú u'\xe1\xe9\xed\xf3\xfa'
py> print str(u)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
py> print u.encode('cp850')
áéíóú

(cp850 is my console encoding)

-- 
Gabriel Genellina
Softlab SRL

-- 
http://mail.python.org/mailman/listinfo/python-list
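
A minimal sketch of the two suggestions in this reply (use a real XML parser, and encode explicitly instead of letting print fall back to ASCII). It assumes the data file is UTF-8, which the post does not actually state, and that each record is a well-formed <Utt .../> element. It uses xml.etree.ElementTree from the standard library (Python 2.5 and later). The sample string, its trimmed attribute list, and the helper names raw_text and encode_for_output are illustrative only and are not part of the original script.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET

# One record shaped like the element quoted in the post, with fewer
# attributes. ASSUMPTION: the file on disk is UTF-8 encoded.
SAMPLE = (u'<Utt id="3" transcribe="yes" rawText="n\xe3o" '
          u'conf="970" transcribedText="n\xe3o"/>').encode('utf-8')


def raw_text(xml_bytes):
    """Parse one <Utt .../> element and return its rawText attribute."""
    elem = ET.fromstring(xml_bytes)   # the parser decodes the bytes for us
    return elem.get('rawText')        # Unicode text, not raw bytes


def encode_for_output(text, encoding='utf-8'):
    """Encode Unicode text explicitly for a console or file."""
    return text.encode(encoding)


text = raw_text(SAMPLE)

# Encoding to ASCII is what the failing print was doing implicitly.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Choosing the encoding explicitly avoids the error.
encoded = encode_for_output(text)     # b'n\xc3\xa3o' for UTF-8
```

On a console like the one in the session above, one would pass encoding='cp850' instead of the default; sys.stdout.encoding usually reports what the terminal expects, though it may be unset when output is piped.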