On 9/11/10 4:45 PM, Stef Mientki wrote:
  On 11-09-2010 21:11, Robert Kern wrote:
SQLite internally stores its strings as UTF-8 or UTF-16 encoded Unicode. So 
it's not clear what
you mean when you say the database is "windows-1252". Can you be more specific?
I doubt that, but I'm not sure ...

From the documentation, it looks like SQLite does not attempt to validate the input as UTF-8 encoded, so it is possible that someone pushed in raw bytes. See "Support for UTF-8 and UTF-16" in the following page:

  http://www.sqlite.org/version3.html

For some databases written by other programs and
written with Python, with
     cursor = self.conn.cursor ()
     self.conn.text_factory = str

Can only be read back with with text_factory = str
then the resulting string columns contains normal strings with windows 1252 
coding, like character 0xC3

You can probably use

  self.conn.text_factory = lambda x: x.decode('windows-1252')

to read the data, though I've never tried to use that API myself.

You will need to write a program yourself that opens one connection to your existing database for reading and another connection to another database (using the defaults) for writing. Then iterate over your tables and copy data from one database to the other.

You may also be able to simply dump the database to a text file using "sqlite3 bad-database.db .dump > bad-sql.sql", read the text file into Python as a string, decode it from windows-1252 to unicode and then encode it as utf-8 and write it back out. Then use "sqlite3 good-database.db .read good-sql.sql" to create the new database. I've never tried such a thing, so it may not work.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to