Re: Unicode again ... default codec ...

Gabriel Genellina Tue, 20 Oct 2009 20:55:24 -0700

On Tue, 20 Oct 2009 17:13:52 -0300, Stef Mientki <[email protected]>
wrote:

From the thread "how to write a unicode string to a file ?"
and my specific situation:
- reading data from Excel, Delphi and other Windows programs and unicode Python
- using wxPython, which forces unicode
- writing to Excel and other Windows programs

almost all answers pointed to the following solution:
- in the python program, turn every string into unicode as soon as possible
- in Python all processing is done in unicode
- at the end, translate unicode into the windows specific character set (if necessary)


Yes. That's the way to go; if you follow the above guidelines when working
with character data, you should not encounter big unicode problems.
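
To make those guidelines concrete, here is a minimal sketch of the
decode-early / encode-late pattern. The file names, the helper functions
read_text and write_text, and the choice of windows-1252 are only
assumptions for illustration; use whatever encoding your data really has.

```python
import io
import os
import shutil
import tempfile


def read_text(path, encoding):
    """Decode bytes into unicode right at the input boundary."""
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


def write_text(path, text, encoding):
    """Encode unicode back into bytes only at the output boundary."""
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(text)


tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'in.txt')    # hypothetical input file
dst = os.path.join(tmpdir, 'out.txt')   # hypothetical output file

# Simulate a file written by a Windows program: raw windows-1252 bytes.
with open(src, 'wb') as f:
    f.write(b'my_w\xf6')

text = read_text(src, 'windows-1252')     # unicode from here on
result = text + u'my_u'                   # all processing done in unicode
write_text(dst, result, 'windows-1252')   # encode only on the way out

with open(dst, 'rb') as f:
    raw = f.read()

shutil.rmtree(tmpdir)
```

Only the two helper functions ever need to know about the encoding; the
code in between deals with unicode objects exclusively.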

The above approach seems to work nicely,
but when you manipulate string-like objects heavily, it's a crime.
It's impossible to change all my modules from strings to unicode at once,
and it's very tempting to do just the opposite: convert everything into strings!


Wide is the road to hell...

# adding unicode string and windows strings, results in an error:
my_u = u'my_u'
my_w = 'my_w' + chr ( 246 )
x = my_s + my_u


(I guess you meant my_w + my_u). Formally:

x = my_w.decode('windows-1252') + my_u  # [1]

but why are you using a byte string in the first place? Why not:

my_w = u'my_w' + u'ö'

so you can compute my_w + my_u directly?

# to correctly handle the above (in my situation), I need to write the following code (which makes my code quite unreadable)
my_u = u'my_u'
my_w = 'my_w' + chr ( 246 )
x = unicode ( my_w, 'windows-1252' )  + my_u

# converting to strings gives much more readable code:
my_u = u'my_u'
my_w = 'my_w' + chr ( 246 )
x = my_w + str(my_u)


But it's not the same thing, i.e., in the former case x is a unicode
object, in the latter x is a byte string. Also, str(my_u) only works if it
contains just ascii characters. The counterpart of my code [1] above would
be:

x = my_w + my_u.encode('windows-1252')

That is, you use some_unicode_object.encode("desired-encoding") to do the
unicode->bytestring conversion, and
some_string_object.decode("known-encoding") to convert in the opposite
direction.
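
A short sketch of both directions, written so that it runs unchanged on
Python 2.6+ and Python 3. The variable names are mine, and the explicit
.encode('ascii') call stands in for what str(my_u) does implicitly in
Python 2, where the default codec is ascii:

```python
raw = b'my_w\xf6'                     # byte string, as a Windows program might produce it

text = raw.decode('windows-1252')     # bytestring -> unicode
back = text.encode('windows-1252')    # unicode -> bytestring, same bytes again
as_utf8 = text.encode('utf-8')        # same characters, different byte sequence

# Encoding with a codec that cannot represent the character fails loudly.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

The same unicode object yields different bytes depending on the codec,
which is why the encoding always has to be named explicitly at the
boundary.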

until I found this website:
  http://diveintopython.org/xml_processing/unicode.html

By setting the default encoding:
I can now move to unicode much more elegantly and almost fully automatically:
(and I guess the writing to a file problem is also solved)
# now the manipulations of strings and unicode works OK:
my_u = u'my_u'
my_w = 'my_w' + chr ( 246 )
x = my_w + my_u

The only disadvantage is that you have to put a specially named file into the Python directory !!

So if someone knows a more elegant way to set the default codec,
I would be much obliged.


DON'T do that. Really. Changing the default encoding is a horrible,
horrible hack and causes a lot of problems. 'Dive into Python' is a great
book, but suggesting to alter the default character encoding is very, very
bad advice:

   - site.py and sitecustomize.py contain *global* settings, affecting *all*
users and *all* scripts running on that machine. Other users may get very
angry at you when their own programs break or give incorrect results when
run with a different encoding.
   - you must have administrative rights to alter those files.
   - you won't be able to distribute your code, since almost everyone else
in the world won't be using *your* default encoding.
   - what if another library/package/application wants to set a different
default encoding?
   - the default encoding for Python>=3.0 is now 'utf-8' instead of 'ascii'

More reasons:
http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-is-evil/
See also this recent thread in python-dev:
http://comments.gmane.org/gmane.comp.python.devel/106134
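
If the real obstacle is that you cannot convert every module at once, one
possible alternative is a small helper that decodes explicitly at the
points where old and new code meet. This is only a sketch: ensure_text and
its default of windows-1252 are hypothetical names and choices of mine,
not anything from the standard library.

```python
def ensure_text(value, encoding='windows-1252'):
    """Return value as unicode, decoding byte strings explicitly.

    Hypothetical helper: the encoding is named here, in your own code,
    instead of being changed globally for the whole interpreter.
    """
    if isinstance(value, bytes):      # bytes is an alias of str on Python 2.6+
        return value.decode(encoding)
    return value                      # already unicode, pass through untouched


my_u = u'my_u'
my_w = b'my_w' + b'\xf6'              # byte string coming from not-yet-converted code

x = ensure_text(my_w) + ensure_text(my_u)
```

Unlike a changed default encoding, this affects only the calls where you
use it, so other users, scripts and libraries on the same machine keep
working as before.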

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
