On Jan 11, 8:53 pm, Jeremy <jlcon...@gmail.com> wrote:
> I have a file that has unicode escape sequences, i.e.,
>
> J\u00e9r\u00f4me
>
> and I want to replace all of them in a file and write the results to a new
> file. The simple script I've created is copied below. However, I am getting
> the following error:
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position
> 947: ordinal not in range(128)
>
> It appears that the data isn't being converted when writing to the file. Can
> someone please help?

Are you _sure_ that your file contains the six characters '\', 'u', '0', '0',
'e' and '9'? I suspect that your file actually contains a single byte with
value 0xe9, and that you inspected the file using Python, which printed that
byte as an escape sequence. Open the file in a text editor or hex editor and
look at the value at offset 947 to be sure. If it does turn out to be a single
0xe9 byte, you need to replace 'unicode-escape' with the actual encoding of
the file.

> if __name__ == "__main__":
>     f = codecs.open(filename, 'r', 'unicode-escape')
>     lines = f.readlines()
>     line = ''.join(lines)
>     f.close()
>
>     utFound = re.sub('STRINGDECODE\((.+?)\)', r'\1', line)
>     print(utFound[:1000])
>
>     o = open('newDice.sql', 'w')
>     o.write(utFound.decode('utf-8'))
>     o.close()

--
http://mail.python.org/mailman/listinfo/python-list
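
If you would rather check from Python than from a hex editor, the inspection suggested in the reply can be sketched roughly as follows. This is only an illustration, assuming Python 3: the helper name classify and the two sample byte strings are made up here, and latin-1 is just a guess at what the "actual encoding" might be for a file holding a raw 0xe9 byte.

```python
import re

# Matches the six ASCII characters of a literal \uXXXX escape in raw bytes.
ESCAPE_RE = re.compile(rb'\\u[0-9a-fA-F]{4}')


def classify(raw):
    """Report what kind of non-ASCII representation the raw bytes use.

    Hypothetical helper for illustration only.
    """
    if ESCAPE_RE.search(raw):
        return 'literal escapes'        # backslash, 'u', then four hex digits
    if any(byte > 0x7F for byte in raw):
        return 'raw non-ASCII bytes'    # e.g. a single 0xe9 byte
    return 'plain ASCII'


# Sample data (assumed, not taken from the real file).
escaped_sample = b'J\\u00e9r\\u00f4me'  # literal backslash-u sequences
raw_sample = b'J\xe9r\xf4me'            # single bytes 0xe9 and 0xf4

# Literal escapes are what the 'unicode_escape' codec is meant for.
from_escapes = escaped_sample.decode('unicode_escape')

# Raw bytes must be decoded with the file's real encoding (latin-1 assumed).
from_raw = raw_sample.decode('latin-1')
```

In practice you would read the real file with open(filename, 'rb').read() and pass the result to classify. Either way, once the text is decoded, writing it out with an explicit encoding (for example open('newDice.sql', 'w', encoding='utf-8')) avoids falling back to the 'ascii' codec.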