[EMAIL PROTECTED] wrote:
> HELP!
> Guy who was here before me wrote a script to parse files in Python.
>
> Includes line:
> print u
> where u is a line from a file we are parsing.
> However, we have started receiving data from Brazil. If I open file to
> parse in VI, looks like:
>
> <Utt id="3" transcribe="yes" audioRoot="A1"
> audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
> recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
> transcribedText="não" parsableText="não"/>
>
> Clearly those "nã" are some non-Ascii characters, but how do I get
> print to understand that?
>
> I keep getting:
> "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
> position 40:
> ordinal not in range(128)"

Does the error happen at the print u line? If so, you are trying to print
a unicode object, which first has to be converted (the right term is
encoded) to a byte string. If you don't do that explicitly, it is done
implicitly using the default encoding, which is ASCII. With non-ASCII
characters in the data, that produces the error you see.

What to do? Use something like this instead:

    print u.encode('utf-8')

Diez
--
http://mail.python.org/mailman/listinfo/python-list
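
To make the implicit-versus-explicit encoding point above concrete, here is a minimal sketch. The variable `line` and its sample value are made up for illustration (they stand in for the `u` from the original script), and UTF-8 is only an assumption about what the terminal or downstream consumer expects. The snippet avoids the print statement so that it should run unchanged on Python 2.6+ and Python 3.

```python
# Hypothetical sample: stands in for one line read from the parsed file.
# \xe3 is the code point of the character a-with-tilde, as in the traceback.
line = u'rawText="n\xe3o"'

# Roughly what the implicit conversion does when the default codec is
# ASCII: the codec has no byte for \xe3, so encoding fails.
try:
    line.encode('ascii')
    ascii_ok = True
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_ok = False
    ascii_error = str(exc)

# Encoding explicitly with UTF-8 works, because UTF-8 can represent every
# Unicode character; \xe3 becomes the two bytes 0xC3 0xA3.
encoded = line.encode('utf-8')
```

Whether UTF-8 is the right target depends on where the output goes; a terminal set to Latin-1, for example, would need `line.encode('latin-1')` instead.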