On Wed, Aug 24, 2011 at 2:50 PM, Michel30 <forerunn...@gmail.com> wrote:
>
>
> On Aug 24, 3:22 pm, Tom Evans <tevans...@googlemail.com> wrote:
>> On Wed, Aug 24, 2011 at 1:47 PM, Michel30 <forerunn...@gmail.com> wrote:
>> > Hi all,
>>
>> > I have written an application using Django 1.3, apache2 and a mysql
>> > db.
>> > I'm using the db to store filepaths and filenames for legacy purposes
>> > while serving them to users with apache.
>>
>> > Now mysql is using latin-1 (with the filenames most likely stored in
>> > CP-1252) while Django uses utf-8.
>>
>> That is not going to fly. You will likely need to ensure you have a
>> consistent character encoding across your website, database and file
>> system.
>>
>> Cheers
>>
>> Tom
>
> Tom,
>
> that looks like it would be best, yes (this is my first exposure to
> encoding problems)
>
> I cannot change the filesystem or mysql encoding since the legacy
> application is still using it. I assumed that with utf-8 I would be
> good as it covers all(?) and I understood mysql translates itself from
> latin-1 to utf-8 and vice versa.
>
> As far as I can see this only hurts my hyperlinks, more specifically
> only file.filename so wouldn't translating only these work?
>
Trusting MySQL to do the right thing (DTRT) with character encoding does not work well in my experience.

For starters, if your database is latin1, there is a huge range of characters that UTF-8 can represent but that cannot be encoded to latin1. If your website is presented in UTF-8, as is the default for Django, then input submitted by your users will be in UTF-8 as well, and it can quite easily contain characters that cannot be stored in the database. For example, many browsers will submit \u2019 (’) instead of a simple ' character, and that will not fit in latin1.

When it comes to serving your files, Apache URL-decodes the request, but it doesn't assume anything about the character encoding of the resulting bytes; it simply opens that location in the file system. If your files are stored in the file system with latin1 names, the requested file name must be encoded in latin1. So sure, you could latin1-encode each filename and then URL-encode the result.

You are opening yourself up to a world of pain by not using consistent character encodings. It will hurt you eventually.

Cheers

Tom

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
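Tom's two technical points in the reply (that characters such as U+2019 cannot be stored as latin1, and that a link to a latin1-named file must carry the latin1 bytes, percent-encoded) can be sketched in Python. This is a minimal illustration assuming Python 3's standard library (the thread itself dates from the Django 1.3 / Python 2 era); the helper names `fits_latin1` and `legacy_file_url` and the `/files/` URL prefix are hypothetical, not part of Django or Apache.

```python
from urllib.parse import quote


def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` can be encoded as latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def legacy_file_url(prefix: str, filename: str, encoding: str = "latin-1") -> str:
    """Hypothetical helper: percent-encode `filename` as it is stored on disk.

    Per Tom's description, Apache URL-decodes the request path and opens the
    resulting bytes unchanged, so the URL must carry the legacy-encoded bytes
    rather than UTF-8. Raises UnicodeEncodeError if `filename` contains
    characters the legacy encoding cannot represent.
    """
    # Encode to the on-disk byte encoding first, then percent-encode the bytes.
    return prefix + quote(filename.encode(encoding))


if __name__ == "__main__":
    print(fits_latin1("it's"))             # True
    print(fits_latin1("it\u2019s"))        # False: U+2019 is not in latin-1
    print("it\u2019s".encode("cp1252"))    # b'it\x92s': CP-1252 does have it
    print(legacy_file_url("/files/", "café.pdf"))  # /files/caf%E9.pdf
    print(quote("café.pdf"))               # caf%C3%A9.pdf (UTF-8 bytes instead)
```

The last two lines show why "translating only the hyperlinks" is fragile: the same name yields different URLs depending on which encoding is assumed. Whether the on-disk names are latin-1 or CP-1252 (the original poster suspects the latter) also matters for characters in the 0x80 to 0x9F range, so `encoding` would have to match what the legacy application actually wrote.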