Karen,
thanks for your answer.
Here are some more details on what I am trying to do. I over-simplified,
I'm sorry:

1 - I would like to run a search on name fields encoded in utf8 so
that a search with the key Remi entered by the user returns all entries
that are equivalent to it in canonical form (Rémi would be returned in
our example).
For this I pull the data from the DB and use unicodedata's normalize
function on both the search key and unicode_data:

      unic = string.decode('utf-8')
      normalized = unicodedata.normalize('NFKD', unic)
      return normalized.encode('ASCII', 'ignore')

This one throws the error on the first decode.
Your suggestion that it is already unicode is really interesting, but
I tried without the first decode and it still crashes.
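
A minimal sketch of the folding step that accepts either kind of input and decodes only when it is handed a byte string. The name fold_to_ascii is made up for illustration, and the isinstance check uses bytes, which on Python 2.4 would have to be spelled str. It has not been tried against the production setup.

```python
import unicodedata

def fold_to_ascii(value):
    """Return an accent-free ASCII version of value (hypothetical helper)."""
    # Decode only byte strings; text that is already unicode is left alone,
    # because in Python 2 calling decode() on unicode first triggers an
    # implicit ASCII encode.
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    # NFKD splits an accented letter into the base letter plus a combining mark.
    normalized = unicodedata.normalize('NFKD', value)
    # Encoding to ASCII with 'ignore' drops the combining marks.
    return normalized.encode('ascii', 'ignore').decode('ascii')

# The accented spelling gives the same key whether it arrives as text or as bytes.
folded_text = fold_to_ascii(u'R\xe9mi')
folded_bytes = fold_to_ascii(u'R\xe9mi'.encode('utf-8'))
```

Folding both the user's key and the stored name this way and comparing the results should make Remi match Rémi.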

2 - I am trying to pass some strings to JavaScript, so I would like to
replace every ' with \'
import logging

########################################################
# stringEditApostrophe
# Escape backslashes and apostrophes with a backslash,
# and double every percent sign
########################################################
def stringEditApostrophe(vsText):
    # className is assumed to be defined at module level
    lsMethod = className + '.stringEditApostrophe'
    # Fall back to the unmodified text so the return below never
    # hits an unbound lsFinal if a replace fails
    lsFinal = vsText

    try:
        lsFinal = vsText.replace("\\", "\\\\")
        lsFinal = lsFinal.replace("'", "\\'")
        lsFinal = lsFinal.replace('%', '%%')

    except Exception, e:
        # Log what happened
        logging.debug(lsMethod + ".Error: " + str(e))

    return lsFinal

It seems that this method does not match any of the ' characters.
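
One way to check whether replace() itself is at fault is to run the same two replacements on literal strings, outside the try/except. The helper name escape_for_js is made up for this sketch. The second sample uses the typographic apostrophe U+2019, which is only a guess at what the data might contain, not something I have confirmed.

```python
def escape_for_js(text):
    """Escape backslashes and straight apostrophes (hypothetical helper)."""
    # Backslashes go first, so the ones added for apostrophes are not doubled.
    text = text.replace("\\", "\\\\")
    return text.replace("'", "\\'")

# A straight apostrophe (U+0027) is matched and escaped.
straight = escape_for_js(u"O'Brien")
# A typographic apostrophe (U+2019) is a different character and is left alone.
curly = escape_for_js(u"O\u2019Brien")
```

If the straight apostrophe is escaped here but not in the real data, comparing repr() of a stored value against these samples would show which character is actually in the DB.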


But ABOVE ALL please look at this strange thing: I tried to simply
recreate your example to understand how it works, and I got an error
even on a basic decode. Am I missing something? (Note the 2.4.4
version.)

Python 2.4.4 (#2, Apr  5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> s = 'Rémi'
>>> type(s)
<type 'str'>
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3:
invalid data
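
One check that might narrow this down is to look at the raw bytes the terminal produced. The sketch below assumes, without confirmation, that the production shell sends Latin-1 and not UTF-8. In that case é arrives as the single byte 0xE9, which UTF-8 treats as the start of a three-byte sequence, so the decode would fail on that byte and the two after it. The helper name decodes_as_utf8 is made up for illustration.

```python
name = u'R\xe9mi'

latin1_bytes = name.encode('latin-1')  # 4 bytes, the accented e is the single byte 0xE9
utf8_bytes = name.encode('utf-8')      # 5 bytes, the accented e is the pair 0xC3 0xA9

def decodes_as_utf8(raw):
    """Return True if raw is valid UTF-8 (hypothetical helper)."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

utf8_ok = decodes_as_utf8(utf8_bytes)      # the UTF-8 bytes decode cleanly
latin1_ok = decodes_as_utf8(latin1_bytes)  # the Latin-1 bytes do not
```

Typing repr(s) at the prompt would show which of the two byte sequences the session actually holds.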



On Sep 1, 7:26 pm, "Karen Tracey" <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 1, 2008 at 4:23 PM, Max <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I ve been trying to figure out why I can t handle utf-8 properly on my
> > production server, while it works perfectly on my local django dev
> > server
>
> > When I try to run this line
> > unicode_data.decode("utf-8")
>
> What is 'unicode_data' here?  A model field?
>
>
>
> > With data coming from DB
> > " Rémi "
>
> > I get on production only
> > 'ascii' codec can't encode character u'\xe9'
>
> I am confused why you are trying to decode() data coming from the DB.  You
> decode() a string to turn it into a unicode object, but assuming you are
> running a Django from post-Unicode branch merge (which '0.97' could be, but
> '0.97' is not actually a release, just some level from SVN after 0.96 so it
> is hard to be sure), the data should already be unicode when you see it.
> The error you are getting on production is what I would expect for the case
> where the data read from the DB has already been transformed from string to
> unicode:
>
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> s = 'Rémi'
> >>> type(s)
> <type 'str'>
> >>> u1 = s.decode('utf-8')
> >>> type(u1)
> <type 'unicode'>
> >>> u1.decode('utf-8')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "encodings/utf_8.py", line 16, in decode
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position
> 1: ordinal not in range(128)
>
>
>
> So my question would not be why is this failing on production but why is it
> working on development?  The one case I know of where data is returned as
> strings instead of unicode for MySQL is when you have set a binary collation
> on the column in the database, but from what you list below you have not
> done that.  You are running an alpha version of MySQLdb on development, so
> perhaps that is causing trouble.
>
> To debug further, please elaborate on what 'unicode_data' is (a model
> field?  if so, provide the model field definition).  It would be nice to
> know what type it is on each machine -- if it is a model field it should be
> <type 'unicode'>, assuming it's a character-type field.  And you should not
> have to be calling decode() on it at all.
>
> Karen
>
>
>
> > I spent a lot of time on the forums and I could fixed that rather
> > easily on my dev server with the addition of DEFAULT_CHARSET = 'utf-8'
> > in settings.py and re-defining collations and charsets in DB.
> > But for some reason there is still some ascii conversion in production
> > and it s driving me Crazy!!!
> > I tried it all, I don t know what to do next. Please help!
>
> > DEV ENV
>
> > ------------------------------------------------------------------------------
> > python 2.5.1
> > Mysql: 6.0.3-alpha-community MySQL Community Server (GPL)
> > django: 0.97
> > os: vista
>
> > PROD ENV
>
> > ------------------------------------------------------------------------------
> > python 2.4.4
> > Mysql: 5.0.32-Debian_7etch1-log Debian etch distribution
> > django: 0.97
> > os: linux
>
> > DEV DB
>
> > -------------------------------------------------------------------------------
> > character_set_client                utf8
> > character_set_connection        utf8
> > character_set_database           latin1
> > character_set_filesystem         binary
> > character_set_results              utf8
> > character_set_server               latin1
> > character_set_system             utf8
> > character_sets_dir                  C:\\Program Files\\MySQL\\MySQL Server
> > 6.0\\share\\charsets\\
>
> > collation_connection               utf8_general_ci
> > collation_database                 latin1_swedish_ci
> > collation_server                      latin1_swedish_ci
>
> > On every table i did a
> > alter table tablename CONVERT TO CHARACTER SET utf8 collate
> > utf8_general_ci
>
> > PROD DB
>
> > -------------------------------------------------------------------------------
> > character_set_client                utf8
> > character_set_connection        utf8
> > character_set_database           utf8
> > character_set_filesystem          binary
> > character_set_results               utf8
> > character_set_server                utf8
> > character_set_system              utf8
> > character_sets_dir                    /usr/share/mysql/charsets/
>
> > collation_connection                utf8_general_ci
> > collation_database                   utf8_general_ci
> > collation_server                        utf8_unicode_ci
>
> > On every table i did a
> > alter table tablename CONVERT TO CHARACTER SET utf8 collate
> > utf8_general_ci
>
> > DEV SERVER: django dev server
>
> > -----------------------------------------------------------------------------------
> > no special setting
>
> > PROD SERVER : apache2
>
> > -----------------------------------------------------------------------------------
> > AddDefaultCharset utf8
>
> > DJANGO SETTINGS (prod and serv)
>
> > -----------------------------------------------------------------------------------
> > DEFAULT_CHARSET = 'utf-8'
> > TIME_ZONE = 'America/New York'
> > LANGUAGE_CODE = 'en-us'
> > USE_I18N = True
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
