I am seeing different outcomes from the same simple requests against a common database when they are run from a FreeBSD machine and from a win32 box.
The test script is

#######################
import MySQLdb, sys
print sys.version
print MySQLdb.__version__
db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',use_unicode=True)
cur=db.cursor()
cur.execute('select * from sc_accomodation where id=31')
data=cur.fetchall()
for i,t in enumerate(data[0]):
    if isinstance(t,(str,unicode)):
        print i,repr(t)
#######################

The table in question is charset='latin1', however the original owners put some special windows characters in, e.g. 0x92 (a quote).

In the windows version I see this kind of string in the output

2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
.........
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of living in the midst of nature on the C\xf4te d\x92Or beach which stretches along the island\x92s northern coast.\r\n\r\nThe hotel\x92s 24 standard and 4 superior ro........

The FreeBSD machine produces

2.4.3 (#2, Sep 7 2006, 09:34:29) [GCC 3.4.4 [FreeBSD] 20050518]
...........
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of living in the midst of nature on the C\xf4te d\u2019Or beach which stretches along the island\u2019s northern coast.\r\n\r\nThe hotel\u2019s 24 standard and 4 superior rooms.......

So the windows version seems to leave the \x92 as is and the FreeBSD version converts it to its correct value. This is already bad enough as I expected the outcomes to be the same, but given that the encoding of the database is wrong I expected some problems.

However, if I don't have use_unicode=True in the above script I get back strings, but this time the difference is larger.

windows
2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
.......
2 "C\xf4te d'Or\r\nPraslin"
.......

unix
2.4.3 (#2, Sep 7 2006, 09:34:29) [GCC 3.4.4 [FreeBSD] 20050518]
1.2.1_p2
......
2 "C\xc3\xb4te d'Or\r\nPraslin"
.......

So here the returned string appears to have been automatically converted to utf8.
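As a sanity check on the outputs above, here is a minimal sketch that does not touch MySQLdb or the database at all. It only shows that the two unicode results are what you get from decoding the same 0x92 byte with two different codecs, and that the two byte-string results are the latin-1 and utf8 encodings of the same text. The variable names are made up for illustration, and the bytes literals need a newer Python than the 2.4 used above.

```python
# Illustration only: no MySQLdb, no database, just the standard codecs.

raw = b"d\x92Or"  # the byte 0x92 as it appears in the stored text

# ISO-8859-1 maps 0x92 to the C1 control character U+0092,
# which repr() shows as \x92 (the windows result above).
as_latin1 = raw.decode("latin-1")

# Windows-1252 maps 0x92 to U+2019, the right single quotation mark
# (the FreeBSD result above).
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))

# The same text encoded two ways, matching the two byte-string results.
name = u"C\xf4te"
latin1_bytes = name.encode("latin-1")  # o-circumflex is the single byte 0xF4
utf8_bytes = name.encode("utf-8")      # o-circumflex is the two bytes 0xC3 0xB4

print(repr(latin1_bytes))
print(repr(utf8_bytes))
```

This only confirms which codecs are consistent with each output. It says nothing about which layer (server, client library, compiled extension or python) applies them on each machine.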
My questions are

1) why the difference in the unicode version?
2) why does the unix version convert to utf8?

The database being common it seems it's either the underlying libraries or the compiled extension or python that causes these differences, but which?
--
Robin Becker