manstey wrote:
> a=str(word_info + parse + gloss).encode('utf-8')
> a=a[1:len(a)-1]
>
> Is this clearer?

Indeed. The problem is your usage of str() to "render" the output. As
word_info+parse+gloss is a list (or is it a tuple?), str() will already
produce "Python source code", i.e. an ASCII byte string that can be read
back into the interpreter; all Unicode is gone from that string, because
every non-ASCII character has been replaced by an escape sequence.

If you want comma-separated output, you should do this:

    def comma_separated_utf8(items):
        result = []
        for item in items:
            result.append(item.encode('utf-8'))
        return ", ".join(result)

and then

    a = comma_separated_utf8(word_info + parse + gloss)

Then you don't have to drop the parentheses from a anymore, as it won't
have parentheses in the first place.

As the output file already does the encoding, the following should also
work:

    a = u", ".join(word_info + parse + gloss)

This would make "a" a comma-separated unicode string, so that the
subsequent output_file.write(a) encodes it as UTF-8.

If that doesn't work, I would like to know the exact value of gloss; do

    print "GLOSS IS", repr(gloss)

to print it out.

Regards,
Martin
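
For reference, here is a minimal sketch of the second approach, written so that it should run on both Python 2 and Python 3. The sample values and the in-memory buffer are assumptions: they stand in for the real word_info, parse and gloss lists and for an output_file that was opened with an encoding (for example through codecs.open), neither of which is shown in the thread.

```python
import codecs
import io

# Hypothetical sample data; the real lists come from the original script.
word_info = [u"caf\u00e9"]
parse = [u"noun"]
gloss = [u"coffee"]

# Join the unicode items directly instead of calling str() on the list,
# so no brackets, quotes or escape sequences end up in the result.
a = u", ".join(word_info + parse + gloss)

# Stand-in for an output file that encodes on write: a UTF-8 stream
# writer wrapped around an in-memory byte buffer.
raw = io.BytesIO()
output_file = codecs.getwriter("utf-8")(raw)
output_file.write(a)

# The bytes that would have been written to disk.
data = raw.getvalue()
```

Because the join happens on unicode strings and the encoding happens only once, at write time, there is nothing to strip from the ends of a afterwards.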