On 11 Jan 2007 13:28:14 -0800, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> HELP!
> Guy who was here before me wrote a script to parse files in Python.
>
> Includes line:
> print u
> where u is a line from a file we are parsing.
> However, we have started receiving data from Brazil. If I open file to
> parse in VI, looks like:
>
> <Utt id="3" transcribe="yes" audioRoot="A1"
> audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
> recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
> transcribedText="não" parsableText="não"/
>
> Clearly those "nã" are some non-Ascii characters, but how do I get
> print to understand that?
>
> I keep getting:
> "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
> position 40:
> ordinal not in range(128)"

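Find out what encoding the files are in and modify the script to use it. The error means print is falling back to the 'ascii' codec to encode a unicode string, so decode the input with the files' real encoding when reading it and encode explicitly when writing it out.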
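
A minimal sketch of that advice, not the original script: the encoding name "latin-1", the output encoding "utf-8", and the helper names read_lines and emit are all assumptions made for illustration. Substitute whatever encoding the Brazilian files actually turn out to use.

```python
import io
import sys

# Assumption: the data files are Latin-1 (ISO-8859-1). Replace this with
# the real encoding once you have found out what it is.
SOURCE_ENCODING = "latin-1"


def read_lines(path, encoding=SOURCE_ENCODING):
    """Return the file's lines as decoded text, without trailing newlines."""
    # io.open decodes bytes to text while reading, so no implicit
    # 'ascii' conversion happens later on.
    with io.open(path, "r", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


def emit(line, stream=None, encoding="utf-8"):
    """Write one line as explicitly encoded bytes.

    Encoding by hand avoids depending on the default codec that print
    picks, which is what raised the UnicodeEncodeError.
    """
    if stream is None:
        # Python 3 exposes the byte stream as sys.stdout.buffer;
        # older versions accept encoded bytes on sys.stdout directly.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write((line + "\n").encode(encoding))
```

In the script, the "print u" line would then become something like emit(u), with u coming from read_lines(...) rather than from an undecoded file object. Pick the output encoding to match the terminal or file the output goes to.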