Thank you, Martin and John, for your excellent explanations. I think I understand the basic principles of Unicode; what confuses me is how different applications make use of it.

For example, I got that EN DASH out of a web page which states <?xml version="1.0" encoding="ISO-8859-1"?> at the beginning. That's why I went for that encoding. But if the browser can properly decode that character using that encoding, how come other applications can't?

I might need to go for Python's htmllib to avoid this, not sure. But if I don't, and I only want to copy and paste some web page text into a tkinter Text widget, what should I do to successfully make every single character go all the way from the widget and out of tkinter into a Python string variable?

How did my browser know it should render an EN DASH instead of a circumflexed lowercase u?

This is the web page in case you are interested; the EN DASH is on the 4th line of the first paragraph:
http://www.pagina12.com.ar/diario/elmundo/subnotas/102453-32303-2008-04-15.html

Thanks a lot.

On Wed, 16 Apr 2008 10:27:26 -0700
John Nagle <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
> > Hello guys & girls
> >
> > I'm pasting an "en dash"
> > (http://www.fileformat.info/info/unicode/char/2013/index.htm) character into
> > a tkinter widget, expecting it to be properly stored into a MySQL database.
> >
> > I'm getting this error:
> > *****************************************************************************
> > Exception in Tkinter callback
> > Traceback (most recent call last):
> >   File "C:\Python24\lib\lib-tk\Tkinter.py", line 1345, in __call__
> >     return self.func(*args)
> >   File "chupadato.py", line 25, in guardar
> >     cursor.execute(a)
> >   File "C:\Python24\Lib\site-packages\MySQLdb\cursors.py", line 149, in execute
> >     query = query.encode(charset)
> > UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 52: ordinal not in range(256)
> > *****************************************************************************
>
> Python and MySQL will do end to end Unicode quite well. But that's
> not what you're doing. How did "latin-1" get involved?
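
For reference, the error in the traceback above can be reproduced without MySQL or tkinter. This is only a sketch under two assumptions of mine that the thread does not confirm: that the page really sends the dash as the single byte 0x96 (browsers commonly treat a declared ISO-8859-1 as Windows-1252), and that the "circumflexed lowercase u" came from something decoding that byte with a DOS code page such as cp437. The `u""` prefixes keep it valid on both old and current Python versions.

```python
# Assumption: this is the byte the page sends for the dash.
raw = b"\x96"

# Strict ISO-8859-1 maps 0x96 to an invisible C1 control character.
as_latin1 = raw.decode("latin-1")

# Windows-1252 maps the same byte to EN DASH (U+2013).
as_cp1252 = raw.decode("cp1252")

# The DOS code page cp437 maps it to u with circumflex (U+00FB).
as_cp437 = raw.decode("cp437")

# EN DASH has no latin-1 encoding, which is the failure in the traceback.
try:
    u"\u2013".encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can encode it, using three bytes.
as_utf8 = u"\u2013".encode("utf-8")
```

If those assumptions hold, the browser and the other applications disagree only about which single-byte table to apply to 0x96, and the database error is a separate step: once the character is U+2013, latin-1 has no slot for it.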
>
> If you want to use MySQL in Unicode, there are several things to be done.
> First, the connection has to be opened in Unicode:
>
>     db = MySQLdb.connect(host="localhost",
>         use_unicode = True, charset = "utf8",
>         user=username, passwd=password, db=database)
>
> Yes, you have to specify both "use_unicode=True", which tells the client
> to talk Unicode, and set "charset" to "utf8", which tells the server
> to talk Unicode encoded as UTF-8.
>
> Then the tables need to be in Unicode. In SQL,
>
>     ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;
>
> before creating the tables. You can also change the types of
> existing tables and even individual fields to utf8, if necessary.
> (This takes time for big tables; the table is copied. But it works.)
>
> It's possible to get MySQL to store character sets other than
> ASCII or Unicode; you can store data in "latin1" if you want. This
> might make sense if, for example, all your data is in French or German,
> which maps well to "latin1". Unless that's your situation, go with
> either all-ASCII or all-Unicode. It's less confusing.
>
> John Nagle
> --
> http://mail.python.org/mailman/listinfo/python-list
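
On the latin1-versus-utf8 choice John describes, a quick way to see whether a given piece of pasted text would survive a latin1 column is to try encoding it. This is a minimal sketch; `fits_charset` is a hypothetical helper name of my own, not part of MySQLdb or the standard library, and the sample strings are made up for illustration.

```python
def fits_charset(text, charset="latin-1"):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Made-up samples: French and German words, and a year range with an EN DASH.
samples = [u"caf\xe9", u"Stra\xdfe", u"2007\u20132008"]

# For each sample: (text, fits latin-1?, fits utf-8?)
results = [(s, fits_charset(s), fits_charset(s, "utf-8")) for s in samples]
```

The accented French and German samples fit latin-1, while the one containing the EN DASH only fits utf-8, which seems to match the advice to go all-Unicode unless the data is known to stay within latin1.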