OK, I apologise for not being clearer.

1. Here is my input data file, line 2:

    gn1:1,1.2 R")$I73YT R")[EMAIL PROTECTED]

2. Here is my output data file, line 2:

    u'gn', u'1', u'1', u'1', u'2', u'-', u'R")$I73YT', u'R")$IYT', u'R")$IYT', u'@', u'ncfsa', u'nc', '', '', '', u'f', u's', u'a', '', '', '', '', '', '', '', '', u'B.:R")$I^YT', u'b.:cv)cv^yc', '\xc9\x94'

3. Here is my main program:

    # -*- coding: UTF-8 -*-
    import codecs
    import splitFunctions
    import surfaceIPA

    # Constants for file location
    # Working directory constants
    dir_root = 'E:\\'
    dir_relative = '2 Core\\2b Data\\Data Working\\'

    # Input file constants
    input_file_name = 'in.grab.txt'
    input_file_loc = dir_root + dir_relative + input_file_name
    # Initialise input file
    input_file = codecs.open(input_file_loc, 'r', 'utf-8')

    # Output file constants
    output_file_name = 'out.grab.txt'
    output_file_loc = dir_root + dir_relative + output_file_name
    # Initialise output file
    output_file = codecs.open(output_file_loc, 'w', 'utf-8') # unicode

    i = 0
    for line in input_file:
        if line[0] != '>': # Ignore headers
            i += 1
            if i != 1:
                word_info = splitFunctions.splitGrab(line, i)
                parse=splitFunctions.splitParse(word_info[10])
                gloss=surfaceIPA.surfaceIPA(word_info[6],word_info[8],word_info[9],parse)
                a=str(word_info + parse + gloss).encode('utf-8')
                a=a[1:len(a)-1]
                output_file.write(a)
                output_file.write('\n')

    input_file.close()
    output_file.close()
    print 'done'

4. Here is my problem:

At the end of my output file, where my unicode character \u0254 (OPEN O) appears, the file has '\xc9\x94'

What I want is an output file like:

    'gn', '1', '1', '1', '2', '-', ..... 'ɔ'

where ɔ is an open O, and would display correctly in the appropriate font. Once I can get it to display properly, I will rewrite gloss so that it returns a proper translation of 'R")$I73YT', which will be a string of unicode characters.

Is this clearer?

The other two functions are basic. splitGrab turns 'gn1:1,1.2 R")$I73YT R")[EMAIL PROTECTED]' into 'gn 1 1 1 2 R")$I73YT R")$IYT @ ncfsa' and splitParse turns the final piece of this 'ncfsa' into 'n c f s a'.
They have to be done separately, as splitParse involves some translation and program logic. surfaceIPA reads in 'R")$I73YT' and other data to produce the unicode string. At the moment it just returns two dummy strings and u'\u0254'.encode('utf-8').

All help is appreciated!

Thanks

--
http://mail.python.org/mailman/listinfo/python-list
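
A minimal sketch of the symptom described in point 4, separate from the program above. str() on a list uses repr() for each element, and repr() of a UTF-8 encoded byte string spells the bytes out as escape text, which would explain the literal '\xc9\x94' in the file. One possible way around it, assuming the gloss can be kept as unicode (that is, without the .encode('utf-8') call), is to join the fields by hand and let the file object do the encoding. The names format_fields, write_line, read_back and demo_path are hypothetical, the two lists are stand-ins for the real return values, and io.open is used here in place of codecs.open so that the sketch runs on both Python 2 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import os
import tempfile

# Hypothetical stand-ins for what splitGrab/splitParse/surfaceIPA return.
word_info = ['gn', '1', '1', '1', '2']
gloss = ['B.:R")$I^YT', 'b.:cv)cv^yc', '\u0254']  # kept as unicode, not encoded

# The symptom: repr() of the encoded bytes contains the escape text \xc9\x94.
encoded = '\u0254'.encode('utf-8')
symptom = repr(encoded)


def format_fields(fields):
    """Quote and join unicode fields without going through str()/repr()."""
    return ", ".join("'{0}'".format(field) for field in fields)


def write_line(path, fields):
    """Write one formatted line; the file object performs the UTF-8 encoding."""
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(format_fields(fields) + '\n')


def read_back(path):
    """Read the file back as unicode text for checking."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        return handle.read()


# Hypothetical output location in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'out.grab.txt')
write_line(demo_path, word_info + gloss)
result = read_back(demo_path)

print(symptom)                            # ASCII-only escape text
print(result.endswith("'\u0254'\n"))      # whether the line ends with the real character
```

Whether the character then displays as an open O still depends on the viewer reading the file as UTF-8 and on the font having that glyph.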