Karen, thanks for your answer. Here are some more details on what I am trying to do. I over-simplified, I'm sorry:

1 - I would like to implement a search on name fields encoded in utf8, so that a search with the key Remi entered by the user returns all entries that are equivalent to it in canonical form (Rémi would be returned in our example).
For this I pull the data from the DB and use unicodedata's normalize function on unicode_data:

    unic = string.decode('utf-8')
    normalized = unicodedata.normalize('NFKD', unic)
    return normalized.encode('ASCII', 'ignore')

This one is throwing the error on the first decode.
Your suggestion that it is already unicode is really interesting. But I tried without the first decode and it still crashes.

2 - I am trying to pass some strings to javascript, so I would like to replace every ' with \'

    ########################################################
    # stringEditApostrophe
    # Find all apostrophes and slashes and delimit with a slash
    ########################################################
    def stringEditApostrophe( vsText):
        lsMethod = className + '.stringEditApostrophe'
        # Fall back to the input so lsFinal is always defined at the return
        lsFinal = vsText
        try:
            lsFinal = vsText.replace("\\", "\\\\")
            lsFinal = lsFinal.replace("'", "\\'")
            lsFinal = lsFinal.replace('%', '%%')
        except Exception, e:
            #Log what happened
            logging.debug(lsMethod + ".Error: " + str(e))
        return lsFinal

It seems that this method is not recognizing any of the ' chars.

But ABOVE ALL please look at this strange thing: I tried to simply recreate your example to understand how it works and I got an error even on a basic decode. Am I missing something? (note the 2.4.4 version)

Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = 'Rémi'
>>> type(s)
<type 'str'>
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

On Sep 1, 7:26 pm, "Karen Tracey" <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 1, 2008 at 4:23 PM, Max <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I've been trying to figure out why I can't handle utf-8 properly on my
> > production server, while it works perfectly on my local django dev
> > server
>
> > When I try to run this line
> > unicode_data.decode("utf-8")
>
> What is 'unicode_data' here? A model field?
>
> > With data coming from DB
> > " Rémi "
>
> > I get on production only
> > 'ascii' codec can't encode character u'\xe9'
>
> I am confused why you are trying to decode() data coming from the DB. You
> decode() a string to turn it into a unicode object, but assuming you are
> running a Django from post-Unicode branch merge (which '0.97' could be, but
> '0.97' is not actually a release, just some level from SVN after 0.96 so it
> is hard to be sure), the data should already be unicode when you see it.
> The error you are getting on production is what I would expect for the case
> where the data read from the DB has already been transformed from string to
> unicode:
>
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> s = 'Rémi'
> >>> type(s)
> <type 'str'>
> >>> u1 = s.decode('utf-8')
> >>> type(u1)
> <type 'unicode'>
> >>> u1.decode('utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "encodings/utf_8.py", line 16, in decode
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position
> 1: ordinal not in range(128)
>
> So my question would not be why is this failing on production but why is it
> working on development?
> The one case I know of where data is returned as
> strings instead of unicode for MySQL is when you have set a binary collation
> on the column in the database, but from what you list below you have not
> done that. You are running an alpha version of MySQLdb on development, so
> perhaps that is causing trouble.
>
> To debug further, please elaborate on what 'unicode_data' is (a model
> field? if so, provide the model field definition). It would be nice to
> know what type it is on each machine -- if it is a model field it should be
> <type 'unicode'>, assuming it's a character-type field. And you should not
> have to be calling decode() on it at all.
>
> Karen
>
> > I spent a lot of time on the forums and I was able to fix that rather
> > easily on my dev server with the addition of DEFAULT_CHARSET = 'utf-8'
> > in settings.py and re-defining collations and charsets in DB.
> > But for some reason there is still some ascii conversion in production
> > and it's driving me crazy!!!
> > I tried it all, I don't know what to do next. Please help!
>
> > DEV ENV
> > ------------------------------------------------------------------------------
> > python 2.5.1
> > Mysql: 6.0.3-alpha-community MySQL Community Server (GPL)
> > django: 0.97
> > os: vista
>
> > PROD ENV
> > ------------------------------------------------------------------------------
> > python 2.4.4
> > Mysql: 5.0.32-Debian_7etch1-log Debian etch distribution
> > django: 0.97
> > os: linux
>
> > DEV DB
> > ------------------------------------------------------------------------------
> > character_set_client      utf8
> > character_set_connection  utf8
> > character_set_database    latin1
> > character_set_filesystem  binary
> > character_set_results     utf8
> > character_set_server      latin1
> > character_set_system      utf8
> > character_sets_dir        C:\\Program Files\\MySQL\\MySQL Server 6.0\\share\\charsets\\
>
> > collation_connection      utf8_general_ci
> > collation_database        latin1_swedish_ci
> > collation_server          latin1_swedish_ci
>
> > On every table i did a
> > alter table tablename CONVERT TO CHARACTER SET utf8 collate utf8_general_ci
>
> > PROD DB
> > ------------------------------------------------------------------------------
> > character_set_client      utf8
> > character_set_connection  utf8
> > character_set_database    utf8
> > character_set_filesystem  binary
> > character_set_results     utf8
> > character_set_server      utf8
> > character_set_system      utf8
> > character_sets_dir        /usr/share/mysql/charsets/
>
> > collation_connection      utf8_general_ci
> > collation_database        utf8_general_ci
> > collation_server          utf8_unicode_ci
>
> > On every table i did a
> > alter table tablename CONVERT TO CHARACTER SET utf8 collate utf8_general_ci
>
> > DEV SERVER: django dev server
> > ------------------------------------------------------------------------------
> > no special setting
>
> > PROD SERVER : apache2
> > ------------------------------------------------------------------------------
> > AddDefaultCharset utf8
>
> > DJANGO SETTINGS (prod and serv)
> > ------------------------------------------------------------------------------
> > DEFAULT_CHARSET = 'utf-8'
> > TIME_ZONE = 'America/New York'
> > LANGUAGE_CODE = 'en-us'
> > USE_I18N = True
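
For reference, the accent-insensitive match described in point 1 at the top of this message could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the application: the names fold_accents and matches are hypothetical, and the byte input is assumed to be UTF-8. The one deliberate difference from the snippet in point 1 is that decode() is called only when the value is not already a unicode object, which is the situation Karen describes for data coming back from Django.

```python
import unicodedata

# The text type: 'unicode' on Python 2, 'str' on Python 3.
TEXT_TYPE = type(u'')


def fold_accents(value, encoding='utf-8'):
    """Return value as text with accents stripped (hypothetical helper)."""
    # Decode only raw byte strings; values that are already text are left
    # alone, which avoids the implicit ASCII encode seen in the traceback.
    if not isinstance(value, TEXT_TYPE):
        value = value.decode(encoding)
    # NFKD splits u'\xe9' into 'e' followed by a combining accent (U+0301).
    decomposed = unicodedata.normalize('NFKD', value)
    # Encoding to ASCII with 'ignore' drops the combining accent.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def matches(search_key, candidate):
    """True when both values are equal once accents are removed."""
    return fold_accents(search_key) == fold_accents(candidate)


print(matches(u'Remi', u'R\xe9mi'))
```

Byte strings in some other encoding (Latin-1, for example) would still raise UnicodeDecodeError in the decode step, so the encoding argument has to match how the bytes were actually produced.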