Re: A Unicode problem -HELP

Martin v. Löwis Tue, 16 May 2006 23:10:46 -0700

manstey wrote:
>                       a=str(word_info + parse + gloss).encode('utf-8')
>                       a=a[1:len(a)-1]
> 
> Is this clearer?


Indeed. The problem is your use of str() to "render" the output.
As word_info+parse+gloss is a list (or is it a tuple?), str()
already produces "Python source code", i.e. an ASCII byte string
that can be read back into the interpreter. All non-ASCII characters
in that string have been replaced by escape sequences such as \u0101,
so encoding it as UTF-8 afterwards changes nothing. If you want
comma-separated output, you should do this:

def comma_separated_utf8(items):
    result = []
    for item in items:
        result.append(item.encode('utf-8'))
    return ", ".join(result)

and then
             a = comma_separated_utf8(word_info + parse + gloss)

Then you no longer have to strip the enclosing brackets (or
parentheses) from a, as it won't have any in the first place.
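
To make the effect concrete, here is a minimal sketch of the same
helper applied to made-up data. The sample values of word_info, parse
and gloss are hypothetical, since the real lists are not shown in this
thread. The helper is restated with a b", " separator so that the
sketch also runs unchanged on Python 3; this assumes Python 2.6 or
later, where the b prefix is accepted.

```python
# Hypothetical sample data; the real lists come from the poster's program.
word_info = [u"wh\u0101nau"]
parse = [u"noun"]
gloss = [u"family"]


def comma_separated_utf8(items):
    # Encode each unicode item to UTF-8, then join the byte strings.
    # b", " is a plain str on Python 2 and a bytes object on Python 3.
    return b", ".join(item.encode("utf-8") for item in items)


a = comma_separated_utf8(word_info + parse + gloss)
# a now holds UTF-8 bytes with no surrounding brackets or quotes,
# so no slicing such as a[1:len(a)-1] is needed.
```

The a-macron (U+0101) ends up as the two bytes C4 81 in the result,
rather than as a backslash escape.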

As the output file already takes care of the encoding,
the following should also work:

              a = u", ".join(word_info + parse + gloss)

This would make "a" a comma-separated unicode string, so that
the subsequent output_file.write(a) encodes it as UTF-8.
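
As a rough end-to-end sketch of this second variant: how output_file
is opened is not shown in this message, so the io.open() call with
encoding="utf-8", the temporary path and the sample data below are all
assumptions made for illustration. codecs.open(path, "w", "utf-8")
would play the same role.

```python
import io
import os
import tempfile

# Hypothetical sample data standing in for the poster's lists.
word_info = [u"wh\u0101nau"]
parse = [u"noun"]
gloss = [u"family"]

# Join on a unicode separator; the result stays a unicode string.
a = u", ".join(word_info + parse + gloss)

# Assumed setup: a file object that encodes to UTF-8 on write.
path = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical path
with io.open(path, "w", encoding="utf-8") as output_file:
    output_file.write(a)

# Read the raw bytes back to check what was actually written.
with io.open(path, "rb") as f:
    raw = f.read()
```

Here the encoding happens exactly once, inside write(), which is why
a must stay a unicode string until then.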

If that doesn't work, I would like to know the exact
value of gloss; do

  print "GLOSS IS", repr(gloss)

to print it out.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list