Keith Hughitt wrote:
Hi all,
I ran into a problem recently when trying to add support for earlier
versions of Python (2.4 and 2.5) to some database related code which
uses MySQLdb, and was wondering if anyone has any suggestions.
With later versions of Python (2.6), inserting Unicode is very simple,
e.g.:
# -*- coding: utf-8 -*-
...
cursor.execute('''INSERT INTO `table` VALUES (0,
'Ångström'),...''')
When the same code is run on earlier versions, however, the result is
either garbled text (e.g. "Ã" or "?" instead of "Å" in Python 2.5), or
an exception being thrown (Python 2.4):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
position 60: ordinal not in range(128)
So far I've tried a number of different things, including:
1. Using Unicode strings (e.g. u"\u212B")
2. Manually specifying the encoding using
   sys.setdefaultencoding('utf-8')
3. Manually enabling Unicode support in MySQLdb
   (use_unicode=False, charset = "utf8")
No, that's backwards. Try:
db = MySQLdb.connect(host="localhost",
                     use_unicode=True, charset="utf8",
                     user=username, passwd=password, db=database)
"use_unicode" means that you want MySQLdb to accept and return
Unicode strings. 'charset="utf8"' means you want MySQLdb to
negotiate with the server to use UTF-8 on the socket connecting
it to the database. This works fine in Python 2.4 and 2.5.
Returned strings will be in Unicode.
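As a rough sketch of how such a connection might then be used: the helper
below inserts a Unicode value with a parameterized query and reads it back.
The function name, the table "names" and its columns are made up for
illustration, and "conn" is assumed to be a connection opened as shown above.
Passing the value as a parameter, rather than pasting it into the SQL text,
leaves the encoding to the driver.

```python
# -*- coding: utf-8 -*-
# Hypothetical table, assumed to exist already:
#   CREATE TABLE names (id INT, name VARCHAR(64)) CHARACTER SET utf8;


def insert_and_fetch(conn, row_id, name):
    """Insert one row and return the stored name.

    `conn` is assumed to be a MySQLdb connection opened with
    use_unicode=True and charset="utf8".
    """
    cursor = conn.cursor()
    try:
        # MySQLdb uses %s placeholders; the values are passed separately
        # so the driver can encode the Unicode string itself.
        cursor.execute(
            "INSERT INTO names (id, name) VALUES (%s, %s)",
            (row_id, name),
        )
        conn.commit()
        cursor.execute("SELECT name FROM names WHERE id = %s", (row_id,))
        return cursor.fetchone()[0]
    finally:
        cursor.close()
```

With use_unicode=True, a call such as
insert_and_fetch(db, 0, u"\u00c5ngstr\u00f6m") would be expected to hand
back a Unicode string rather than raw bytes.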
At the database end, you have to make sure that 1) MySQL was
built with Unicode support (it usually is), 2) the database
fields of interest are in Unicode. I suggest
ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;
before doing any CREATE TABLE operations. Then strings
will be UTF8 in the database.
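To make "UTF8 in the database" concrete, here is a small sketch that needs
no database at all. It assumes the "Å" in the original INSERT is U+00C5
(the 0xc3 byte in the traceback suggests so, since U+212B encodes
differently). The variable names are only for illustration. It shows that
the same bytes give the "Ã" garbling when read as Latin-1, and that the
ascii codec rejects them, which is consistent with both symptoms reported
above.

```python
# -*- coding: utf-8 -*-
# Assumption: the text is "Angstrom" written with U+00C5 and U+00F6.
text = u"\u00c5ngstr\u00f6m"

# These are the bytes a utf8 column or a utf8 connection would carry.
utf8_bytes = text.encode("utf-8")

# The first byte of the encoded A-ring is 0xc3, the byte named in the
# UnicodeDecodeError message.
first_byte = utf8_bytes[:1]

# Reading the UTF-8 bytes as Latin-1 turns each two-byte character into
# two characters, the first of which is U+00C3 (A with tilde).
garbled = utf8_bytes.decode("latin-1")

# The ascii codec cannot handle any byte above 0x7f, so it raises.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Decoding with "utf-8" instead of "latin-1" or "ascii" returns the original
text, which is what use_unicode=True with charset="utf8" arranges for you.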
Read this: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
It all works quite well.
John Nagle