On 7/15/05, Bruce Dembecki <[EMAIL PROTECTED]> wrote:
> This process has worked for us taking our latin1 4.0 databases and
> turning them into utf8 4.1 databases. UTF8 data we had already
put in
> our 4.0 database despite it's latin1 encoding was correctly exported
> out of 4.0 and correctly converted on it's way in to 4.1
Wow! Really? This part amazes me.
So the MySQL importing process seems to do the converting of the
special characters into the newly-defined encoding format? (from
latin1 to utf-8 in my case)
See - we do webhosting for clients around the world, and right now our
default-encoded MySQL 4.0 databases have Swedish and Hebrew characters
in them.
I'm concerned that if I dumped them as latin1/default in 4.0, but then
imported as utf-8 in 4.1 that the non-ASCII characters would get
imported as the wrong encoding.
(Assuming, yes, that I would set our new 4.1 databases to do ALL utf-8
in the /etc/my.cnf and gladly take the small performance/size hit.)
OK, this is where we are at, and what it does for us... I can't speak
to where you are at and what it will do for you, however, we have this:
4.0 data everything is encoded by MySQL as Latin1. However the web
app and the JDBC both speak UTF8, so the data being given to the
database is UTF8 data which the database then stores in it's Latin1
table. When the JDBC extracts the data from 4.0 and gives it to the
web app it displays as we would expect in the language it was
entered... Was very cool and worked great for us.
Then we went to 4.1... just changed the Binary, expecting things to
work the way they did... wrong. Suddenly all our databases with
strong non latin1 character sets were in trouble. Our Chinese boards
were a mess, as were our German boards... even our Australian board
which should have been fine was a mess because one of the areas was
titled "Australia Café!" and the word was in there a lot. Took about
three months and many hours with MYSQL folks to figure it out. The
upshot however was if you went about the conversion by doing a dump
and an import, and made sure to tell the import to treat the specific
columns as UTF8, everything arrived in fine shape. (There was one set
of data loss because it turned out one of the Chinese boards was set
to Big8 and the JDBC was set to UTF8 and the database Latin1... Get's
to a point where there's only so much encoding it can take)
That said. the reality is if your application is anything like most
the majority of the columns are for the application to work with,
very few actually deal with text that needs to be encoded. The rest
are time stamps, id#s, references to help you track the data...
there's likely not much if any benefit to having them UTF8 encoded.
For us we have 80 databases with almost that many tables with many
columns.. figuring out one by one which way is up on everything
represents a challenge. We're going to go back and do it now. But if
I had one or two databases, I'd start by leaving the database the
default (Latin1) and doing the database create and importing the .sql
files (which creates the tables), then go through and run ALTER TABLE
a few times and set the specific columns to UTF8... do the import the
same as in my script... Latin1 text encodes in UTF8 to... Latin1
text, so you shouldn't have any problem, but I'm guessing here of
course. Dump your data to disk and bring it up on another server, try
different things and fidn out what works best for you before you do
the real import.
It SHOULD be pretty painless if you manage the import and use the
mysql tools with the appropriate flag set for character sets on the
way in... the only thing that will be painful about the process is if
you just upgrade the binaries and expect the thing to work the way it
did before, then you'll have some pain :-)
Best Regards, Bruce
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]