After researching it some more, it seems that Django uses UTF-8 byte
strings internally (as opposed to the actual Unicode strings that
Python supports).  So the following regular expression actually does
work:

        r"^name/(?P<name>[^/]+)/$"

What is passed in the `name` parameter is a UTF-8 byte string, so if
you need an actual Unicode string, you can do the following:

        unicode_name = name.decode('utf8')
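A minimal sketch of the two steps above in Python 3 terms, assuming a `bytes` path stands in for the UTF-8 byte string Django handed to the view back then. It uses the `re` module directly, not Django's URL resolver, and `extract_name` plus the sample path are made up for illustration:

```python
import re

# The same pattern as above, compiled as a bytes pattern so that it is
# matched byte by byte, the way it was against Django's UTF-8 byte strings.
URL_PATTERN = re.compile(rb"^name/(?P<name>[^/]+)/$")


def extract_name(path: bytes) -> str:
    """Match a UTF-8 encoded path and return the captured name as Unicode."""
    match = URL_PATTERN.match(path)
    if match is None:
        raise ValueError("path does not match the name/<name>/ pattern")
    raw_name = match.group("name")  # still a UTF-8 byte string
    return raw_name.decode("utf-8")  # the equivalent of name.decode('utf8')


path = "name/张三/".encode("utf-8")
print(extract_name(path))  # 张三
```

`[^/]+` excludes only the single byte `0x2F`, so the three bytes of each Chinese character pass through untouched and are decoded only at the end.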

The only thing I was worried about initially was the possibility that
one of the bytes making up a Chinese character might be `0x2F`, which
is the code for a forward slash.  Since the regular expression matches
against the UTF-8 byte string, it treats each byte as an independent
character and thus would treat such a byte as a forward slash.  But
after reading a bit about UTF-8, it turns out that `0x2F` is never used
in anything but the forward slash: every byte of a multi-byte UTF-8
sequence is in the range `0x80` to `0xFF`, so bytes below `0x80` only
ever stand for the ASCII characters themselves.
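The slash concern can be checked directly instead of taken on faith. This Python 3 sketch (the function names are made up) encodes every non-ASCII code point, skipping surrogates because strict UTF-8 cannot encode them, and reports whether any resulting byte falls below `0x80`. Since `0x2F` is below `0x80`, a `False` result means a Chinese character can never contain a stray slash byte:

```python
def non_ascii_code_points():
    """Yield every code point from U+0080 to U+10FFFF except surrogates."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not valid in UTF-8
        yield cp


def any_multibyte_char_contains_ascii_byte() -> bool:
    """Return True if any non-ASCII character's UTF-8 bytes include one < 0x80."""
    for cp in non_ascii_code_points():
        encoded = chr(cp).encode("utf-8")
        # Lead and continuation bytes of multi-byte sequences are all >= 0x80.
        if min(encoded) < 0x80:
            return True
    return False


print("中".encode("utf-8"))  # b'\xe4\xb8\xad'
print(any_multibyte_char_contains_ascii_byte())  # False
```

The loop covers about 1.1 million code points, so it takes a moment to run.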

Now my issue switches to the analogous one for the model layer.  It
seems that SQLite, at least, works fine with the UTF-8 byte strings
Django gives it, and faithfully returns them when asked.  But again,
one has to worry about remembering to decode them into Unicode strings
when needed, which is a bit annoying.  And of course you have to
remember to make your database fields three or four times longer than
needed, since each Chinese character takes up three bytes in UTF-8
(four for characters outside the Basic Multilingual Plane).
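A small Python 3 sketch of the SQLite behaviour described above, using the standard `sqlite3` module and an in-memory database (the table and sample name are made up). UTF-8 `bytes` are stored as a BLOB, come back unchanged, and still have to be decoded by hand. SQLite's `length()` counts bytes for a BLOB but characters for TEXT. SQLite also ignores the `4` in `VARCHAR(4)`, which is why neither insert is rejected:

```python
import sqlite3

name = "中文名字"                # four Chinese characters
encoded = name.encode("utf-8")  # the UTF-8 byte string handed to the database

print(len(name), len(encoded))  # 4 12

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name VARCHAR(4))")
# bytes are stored as a BLOB, str as TEXT; the declared size is not enforced.
conn.execute("INSERT INTO person (name) VALUES (?)", (encoded,))
conn.execute("INSERT INTO person (name) VALUES (?)", (name,))

rows = conn.execute(
    "SELECT name, typeof(name), length(name) FROM person ORDER BY rowid"
).fetchall()
for value, storage_class, length in rows:
    print(repr(value), storage_class, length)

# The BLOB comes back as the same bytes and must still be decoded.
print(rows[0][0].decode("utf-8"))
conn.close()
```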

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django 
users" group.
To post to this group, send email to django-users@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
