Thank you, Martin and John, for your excellent explanations.

I think I understand the basic principles of Unicode; what confuses me is how 
different applications make use of it.

For example, I got that EN DASH from a web page which states <?xml 
version="1.0" encoding="ISO-8859-1"?> at the beginning. That's why I went for 
that encoding. But if the browser can properly decode that character using that 
encoding, how come other applications can't?

I might need to go for Python's htmllib to avoid this, not sure. But if I 
don't, if I just want to copy and paste some web page text into a Tkinter 
Text widget, what should I do to make every single character get all the way 
from the widget, out of Tkinter, and into a Python string variable? How did my 
browser know it should render an EN DASH instead of a lowercase u with a 
circumflex?
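
One guess I can at least test: maybe the page's bytes are really Windows-1252 
(cp1252) despite the ISO-8859-1 declaration, and the browser decodes them that 
way. The sketch below is only that, a sketch. The byte value 0x96 and the 
choice of cp850 as "the other application's" code page are my assumptions, not 
something I have confirmed from the page itself. It uses the b"" prefix, so it 
needs a newer Python than my 2.4.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes the page stores the dash as the single byte 0x96,
# i.e. that it is really Windows-1252 and not ISO-8859-1 as declared.

raw = b"\x96"

as_cp1252 = raw.decode("cp1252")   # Windows-1252 reading of the byte
as_latin1 = raw.decode("latin-1")  # the encoding the page declares
as_cp850 = raw.decode("cp850")     # a DOS code page, for comparison

print(repr(as_cp1252))  # U+2013, EN DASH
print(repr(as_latin1))  # U+0096, an invisible C1 control character
print(repr(as_cp850))   # U+00FB, lowercase u with circumflex

# Going the other way: latin-1 has no slot for EN DASH, UTF-8 does.
en_dash = u"\u2013"
try:
    en_dash.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Same failure as in my traceback from MySQLdb.
    latin1_ok = False

utf8_bytes = en_dash.encode("utf-8")  # three bytes: E2 80 93
print(latin1_ok, repr(utf8_bytes))
```

If that guess is right, the same byte is an EN DASH in one application and a 
circumflexed u in another, depending only on which code page each one assumes.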

This is the web page, in case you are interested; the EN DASH is on the 4th 
line of the first paragraph: 
http://www.pagina12.com.ar/diario/elmundo/subnotas/102453-32303-2008-04-15.html

Thanks a lot.


On Wed, 16 Apr 2008 10:27:26 -0700
John Nagle <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
> > Hello guys & girls
> > 
> > I'm pasting an "en dash"
> > (http://www.fileformat.info/info/unicode/char/2013/index.htm) character into
> > a tkinter widget, expecting it to be properly stored into a MySQL database.
> > 
> > I'm getting this error: 
> > *****************************************************************************
> >  Exception in Tkinter callback Traceback (most recent call last): File
> > "C:\Python24\lib\lib-tk\Tkinter.py", line 1345, in __call__ return
> > self.func(*args) File "chupadato.py", line 25, in guardar cursor.execute(a) 
> > File "C:\Python24\Lib\site-packages\MySQLdb\cursors.py", line 149, in 
> > execute
> >  query = query.encode(charset) UnicodeEncodeError: 'latin-1' codec can't
> > encode character u'\u2013' in position 52: ordinal not in range(256) 
> > *****************************************************************************
> 
>      Python and MySQL will do end to end Unicode quite well.  But that's
> not what you're doing.  How did "latin-1" get involved?
> 
>      If you want to use MySQL in Unicode, there are several things to be done.
> First, the connection has to be opened in Unicode:
> 
>       db = MySQLdb.connect(host="localhost",
>               use_unicode = True, charset = "utf8",
>               user=username, passwd=password, db=database)
> 
> Yes, you have to specify both "use_unicode=True", which tells the client
> to talk Unicode, and set "charset" to "utf8", which tells the server
> to talk Unicode encoded as UTF-8.
> 
> Then the tables need to be in Unicode.  In SQL,
> 
>      ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;
> 
> before creating the tables.  You can also change the types of
> existing tables and even individual fields to utf8, if necessary.
> (This takes time for big tables; the table is copied.  But it works.)
> 
>      It's possible to get MySQL to store character sets other than
> ASCII or Unicode; you can store data in "latin1" if you want. This
> might make sense if, for example, all your data is in French or German,
> which maps well to "latin1".  Unless that's your situation, go with
> either all-ASCII or all-Unicode.  It's less confusing.
> 
>                                       John Nagle
> -- 
> http://mail.python.org/mailman/listinfo/python-list
