hi, i have a few questions concerning unicode and utf-8 handling and would appreciate any insights.

1) i have an xml document, utf-8 encoded, and have been trying to use etree to parse it and then commit the data to a mysql db. with etree, everything i extract is returned as a str, except values containing non-ascii characters (code points > 127), which come back as unicode. using minidom on the same document, however, i get all unicode. is there a way to 'force' etree to use unicode?

2) i'm using mysql 5.x on *nix (mac, linux) and after much messing around i have things working, i.e. i get unicode from the (minidom) parser and have set all the mysql and MySQLdb charset attributes, but i still get <str> back from mysql. is that expected behavior?

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    from xml.dom import minidom
    import MySQLdb
    import codecs
    from onix_model_01 import *

    db = MySQLdb.connect(host='localhost', user='root', passwd='',
                         db='lsi', charset='utf8')
    cur = db.cursor()
    #cur.execute('SET NAMES utf8')
    #cur.execute('SET CHARACTER SET utf8')
    cur.execute('SET character_set_connection=utf8')
    cur.execute('SET character_set_server=utf8')
    cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
    ...
    print 'firstname, lastname types from xml: ', type(a.firstname), type(a.lastname)
    # output: firstname, lastname types from xml: <type 'unicode'> <type 'unicode'>
    ...
    cur.execute('''INSERT INTO encoding_test VALUES(null, %s, %s)''',
                (a.firstname, a.lastname))
    ...

now i'm getting the results back from mysql:

    >>> cur.execute('SELECT * FROM encoding_test')
    >>> query = cur.fetchall()
    >>> for q in query:
    ...     print q, type(q[0]), type(q[1]), type(q[2])
    ...     print q[1], q[2]
    ...     print repr(q[1]), repr(q[2])
    (24L, 'Bront\xc3\xab', 'Charlotte ') <type 'long'> <type 'str'> <type 'str'>
    Brontë Charlotte
    'Bront\xc3\xab' 'Charlotte '

so everything is coming back as it should, but i thought i would get the sql results back as unicode, not str ... what gives?

finally, from a utf-8 perspective, is there any advantage to using innodb over myisam?

thx
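
p.s. to make the str vs unicode part concrete: the str values coming back look like the utf-8 bytes of the unicode i inserted, so decoding them by hand should give the unicode back. a minimal sketch, assuming the bytes really are utf-8. `decode_row` is a hypothetical helper of mine, not part of MySQLdb, and the sample row is copied from the SELECT output above (with the id written as a plain int):

```python
# -*- coding: utf-8 -*-
# hypothetical helper (not part of MySQLdb): decode the byte strings in a
# fetched row, assuming they hold utf-8 encoded text.

def decode_row(row, encoding='utf-8'):
    """Return a copy of row with byte strings decoded to unicode text."""
    return tuple(
        value.decode(encoding) if isinstance(value, bytes) else value
        for value in row
    )

# sample row copied from the SELECT output above: (id, firstname, lastname)
row = (24, b'Bront\xc3\xab', b'Charlotte ')
decoded = decode_row(row)
```

this only works around the symptom, of course. i'd still like to know whether the driver is supposed to hand back unicode on its own.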