Unicode chr(150) en dash
Hello guys & girls

I'm pasting an "en dash" (http://www.fileformat.info/info/unicode/char/2013/index.htm) character into a tkinter widget, expecting it to be properly stored into a MySQL database.

I'm getting this error:

*
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python24\lib\lib-tk\Tkinter.py", line 1345, in __call__
    return self.func(*args)
  File "chupadato.py", line 25, in guardar
    cursor.execute(a)
  File "C:\Python24\Lib\site-packages\MySQLdb\cursors.py", line 149, in execute
    query = query.encode(charset)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 52: ordinal not in range(256)
*

Variable 'a' in 'cursor.execute(a)' contains a proper SQL statement, which includes the 'en dash' character just pasted into the Text widget.

When I type 'print chr(150)' into a python command line window I get a LATIN SMALL LETTER U WITH CIRCUMFLEX (http://www.fileformat.info/info/unicode/char/00fb/index.htm), but when I do the same in an IDLE window I get a hyphen (chr(45)).

The funny thing I don't quite understand is that when I paste that 'en dash' character into a python command window (I'm using MSWindows), the character is conveniently converted to chr(45), which is a hyphen. (I wouldn't mind if I could do that in code, I mean 'adapting' by visual similarity.)

I tried searching for "en dash" or even "dash" in the encodings folder of the python Lib, but I couldn't find anything.

I'm using Windows Vista English, Python 2.4, and the latest MySQLdb. I changed the default encoding (while testing) to "iso-8859-1".

Thanks for any help.

--
http://mail.python.org/mailman/listinfo/python-list
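The different results for chr(150) reported above can be reproduced by decoding the single byte 150 (0x96) under several code pages. This is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.4); the function names are made up for illustration, and the guess that the Windows console uses an OEM code page such as cp437 or cp850 is an assumption, not something the thread confirms.

```python
# Byte value 150 (0x96) maps to different characters in different code pages.
BYTE_150 = bytes([150])


def decode_byte_150():
    """Return the character that byte 150 decodes to under several codecs."""
    codecs_to_try = ["cp1252", "cp437", "cp850", "latin-1"]
    return {name: BYTE_150.decode(name) for name in codecs_to_try}


def can_encode(text, codec):
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for name, char in decode_byte_150().items():
        print(name, "U+%04X" % ord(char))
    # EN DASH is U+2013: absent from latin-1, present in cp1252.
    print(can_encode("\u2013", "latin-1"), can_encode("\u2013", "cp1252"))
```

Under cp1252 the byte is EN DASH, under the two OEM code pages it is the u with circumflex, and under latin-1 it is a C1 control character. That is consistent with the 'latin-1' codec refusing to encode u'\u2013' in the traceback.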
Re: Unicode chr(150) en dash
Thank you Martin and John, for your excellent explanations.

I think I understand the basic principles of Unicode; what confuses me is the use different applications make of it.

For example, I got that EN DASH out of a web page which states its encoding at the beginning. That's why I went for that encoding. But if the browser can properly decode that character using that encoding, how come other applications can't?

I might need to go for python's htmllib to avoid this, not sure. But if I don't, if I only want to copy and paste some web page text into a tkinter Text widget, what should I do to successfully make every single character go all the way from the widget, out of tkinter, and into a python string variable? How did my browser know it should render an EN DASH instead of a circumflexed lowercase u?

This is the webpage in case you are interested; the EN DASH is on the 4th line of the first paragraph:
http://www.pagina12.com.ar/diario/elmundo/subnotas/102453-32303-2008-04-15.html

Thanks a lot.

On Wed, 16 Apr 2008 10:27:26 -0700 John Nagle <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
> > Hello guys & girls
> >
> > I'm pasting an "en dash"
> > (http://www.fileformat.info/info/unicode/char/2013/index.htm) character into
> > a tkinter widget, expecting it to be properly stored into a MySQL database.
> >
> > I'm getting this error:
> > *
> > Exception in Tkinter callback Traceback (most recent call last): File
> > "C:\Python24\lib\lib-tk\Tkinter.py", line 1345, in __call__ return
> > self.func(*args) File "chupadato.py", line 25, in guardar cursor.execute(a)
> > File "C:\Python24\Lib\site-packages\MySQLdb\cursors.py", line 149, in
> > execute
> > query = query.encode(charset) UnicodeEncodeError: 'latin-1' codec can't
> > encode character u'\u2013' in position 52: ordinal not in range(256)
> > *
>
> Python and MySQL will do end to end Unicode quite well. But that's
> not what you're doing. How did "latin-1" get involved?
>
> If you want to use MySQL in Unicode, there are several things to be done.
> First, the connection has to be opened in Unicode:
>
>     db = MySQLdb.connect(host="localhost",
>         use_unicode = True, charset = "utf8",
>         user=username, passwd=password, db=database)
>
> Yes, you have to specify both "use_unicode=True", which tells the client
> to talk Unicode, and set "charset" to "utf8", which tells the server
> to talk Unicode encoded as UTF-8.
>
> Then the tables need to be in Unicode. In SQL,
>
>     ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;
>
> before creating the tables. You can also change the types of
> existing tables and even individual fields to utf8, if necessary.
> (This takes time for big tables; the table is copied. But it works.)
>
> It's possible to get MySQL to store character sets other than
> ASCII or Unicode; you can store data in "latin1" if you want. This
> might make sense if, for example, all your data is in French or German,
> which maps well to "latin1". Unless that's your situation, go with
> either all-ASCII or all-Unicode. It's less confusing.
>
> John Nagle
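On the question of how a browser ends up showing an EN DASH: a rough sketch, assuming the page's bytes were written as Windows-1252 but labelled ISO-8859-1, and assuming (as browsers commonly do) that the ISO-8859-1 label is decoded as Windows-1252. The helper `decode_like_browser` and the sample bytes are hypothetical, and the code is Python 3 rather than the Python 2.4 of the thread. The last step shows that the resulting text encodes cleanly as UTF-8, which is the encoding John recommends for the connection and tables.

```python
# Labels that browsers commonly treat as windows-1252 (an assumption based on
# common browser behaviour, not on anything stated in this thread).
TREATED_AS_CP1252 = {"iso-8859-1", "latin-1", "latin1"}


def decode_like_browser(raw, declared):
    """Decode raw bytes, substituting cp1252 for a latin-1 style label."""
    codec = "cp1252" if declared.lower() in TREATED_AS_CP1252 else declared
    return raw.decode(codec)


# Hypothetical page fragment containing byte 0x96 (decimal 150).
sample = b"2003 \x96 2008"

strict = sample.decode("iso-8859-1")                 # 0x96 becomes U+0096
lenient = decode_like_browser(sample, "ISO-8859-1")  # 0x96 becomes U+2013
stored = lenient.encode("utf-8")                     # EN DASH as three bytes
```

A strict ISO-8859-1 decode yields an invisible control character, while the lenient decode yields the EN DASH that the latin-1 codec later refuses to encode.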
MySQL hardcoding?
I've got this error (see the path in the last line):

db=MySQLdb.connect(host='localhost',use_unicode = True, charset = "Windows-1251",user='root',passwd='12',db='articulos')
  File "C:\Python24\Lib\site-packages\MySQLdb\__init__.py", line 74, in Connect
    return Connection(*args, **kwargs)
  File "C:\Python24\lib\site-packages\MySQLdb\connections.py", line 198, in __init__
    self.set_character_set(charset)
  File "C:\Python24\lib\site-packages\MySQLdb\connections.py", line 277, in set_character_set
    super(Connection, self).set_character_set(charset)
OperationalError: (2019, "Can't initialize character set Windows-1251 (path: C:\\mysqlshare\\charsets\\)")

The truth of the matter is, MySQL is not installed in that path, but in Program Files. I don't know where the hardcoding is, but it is certainly somewhere. Unless MySQL is simply reporting a wrong installation path.

I haven't found any other topic on the list about this problem. I'm using Python 2.4 and the latest MySQLdb. Has anyone heard of this issue and how to fix it?

Thanks a lot.
Re: Unicode chr(150) en dash
On Thu, 17 Apr 2008 20:57:21 -0700 (PDT) hdante <[EMAIL PROTECTED]> wrote:

> Don't use old 8-bit encodings. Use UTF-8.

Yes, I'll try. But it is a problem when I only want to read the content; I'm not the one writing or creating it.

I suppose Microsoft's commercial success is to blame. They won't adhere to standards if that doesn't make sense for their business.

I'll change the approach: I'll try to filter the contents with htmllib and map those troublesome characters on my own.

Anyway, this has been a very instructive dive into Unicode for me; things are cleared up now. Thanks to everyone for the great help.
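The "mapping on my own" idea above, and the earlier wish to adapt characters "by visual similarity", might look something like the following. It is only a sketch in Python 3 syntax: the table `LOOKALIKES` and the function `to_ascii_lookalikes` are invented for illustration, and the choice of replacements is a matter of taste rather than any standard.

```python
# Hypothetical table: code point -> plain ASCII stand-in.
LOOKALIKES = {
    0x2013: "-",    # EN DASH
    0x2014: "--",   # EM DASH
    0x2018: "'",    # LEFT SINGLE QUOTATION MARK
    0x2019: "'",    # RIGHT SINGLE QUOTATION MARK
    0x201C: '"',    # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',    # RIGHT DOUBLE QUOTATION MARK
    0x2026: "...",  # HORIZONTAL ELLIPSIS
}


def to_ascii_lookalikes(text):
    """Replace common Windows-1252 punctuation with ASCII look-alikes."""
    return text.translate(LOOKALIKES)


if __name__ == "__main__":
    pasted = "2003\u20132008 \u201cinforme\u201d"
    cleaned = to_ascii_lookalikes(pasted)
    # After the substitution the text fits in latin-1 again.
    print(cleaned.encode("latin-1"))
```

The substitution loses information (the database would no longer hold a real en dash), so it only makes sense if the storage has to stay in an 8-bit encoding.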
Re: MySQL hardcoding?
On Thu, 17 Apr 2008 22:00:21 GMT John Machin <[EMAIL PROTECTED]> wrote:

> The empirical evidence from other recent postings is that you are
> mucking about with Spanish-language newspaper "articulos" on the web ...
> so why charset = "Windows-1251", which is Cyrillic (i.e. Russian etc)??
> Perhaps you mean 1252 which is Microsoft's latin1 with extras.
>
> HTH,
> John

Yes John, thanks. The only problem is that MySQL doesn't include a cp1252 or Windows-1252 or ansi. I'm trying to find my way with different approaches.

But there is certainly a problem if an application goes to the wrong folder to get data, as MySQL seems to be doing.

Thanks.
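One possible reason for the "Can't initialize character set" error is that MySQL uses its own character set names, which differ from the Python and IANA names. The sketch below (Python 3 syntax) translates a few Python codec names into the MySQL names as I understand them: MySQL lists `latin1` as "cp1252 West European", and its Cyrillic Windows set is called `cp1251`. The function `mysql_charset` and the table are hypothetical helpers, not part of MySQLdb, and the table is deliberately incomplete.

```python
import codecs

# Canonical Python codec name -> MySQL character set name (partial table).
MYSQL_NAME_FOR = {
    "utf-8": "utf8",
    "cp1252": "latin1",     # MySQL describes latin1 as cp1252 West European
    "iso8859-1": "latin1",
    "cp1251": "cp1251",
}


def mysql_charset(python_name):
    """Map a Python encoding name or alias to a MySQL character set name."""
    canonical = codecs.lookup(python_name).name
    try:
        return MYSQL_NAME_FOR[canonical]
    except KeyError:
        raise LookupError("no MySQL name known for %r" % python_name)


if __name__ == "__main__":
    for name in ("Windows-1251", "Windows-1252", "UTF-8"):
        print(name, "->", mysql_charset(name))
```

Passing the MySQL name to `charset=` should at least get past the name lookup. Which Python codec the client library then uses to encode the query is a separate matter; the earlier traceback in this thread shows MySQLdb encoding with 'latin-1', which has no EN DASH.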