Excerpts from Ben Nemec's message of 2014-03-10 13:02:47 -0700:
> On 2014-03-10 12:24, Chris Friesen wrote:
> > Hi,
> >
> > I'm using havana and recently we ran into an issue with heat related
> > to character sets.
> >
> > In heat/db/sqlalchemy/api.py in user_creds_get() we call _decrypt()
> > on an encrypted password stored in the database and then try to
> > convert the result to unicode. Today we hit a case where this
> > errored out with the following message:
> >
> > UnicodeDecodeError: 'utf8' codec can't decode byte 0xf2 in position
> > 0: invalid continuation byte
> >
> > We're using postgres and currently all the databases are using
> > SQL_ASCII as the charset.
> >
> > I see that in icehouse heat will complain if you're using mysql and
> > not using UTF-8. There don't seem to be any checks for other
> > databases though.
> >
> > It looks like devstack creates most databases as UTF-8 but uses
> > latin1 for nova/nova_bm/nova_cell. I assume this is because nova
> > expects to migrate the db to UTF-8 later. Given that those
> > migrations specify a character set only for mysql, when using
> > postgres should we explicitly default to UTF-8 for everything?
> >
> > Thanks,
> > Chris
>
> We just had a discussion about this in #openstack-oslo too. See the
> discussion starting at 2014-03-10T16:32:26:
> http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2014-03-10.log
>
> While it seems Heat does require utf8 (or at least matching character
> sets) across all tables, I'm not sure the current solution is good.
> It seems like we may want a migration to help with this for anyone
> who might already have mismatched tables. There's a lot of overlap
> between that discussion and how to handle Postgres with this, I
> think.
>
> I don't have a definite answer for any of this yet, but I think it is
> something we need to figure out, so hopefully we can get some input
> from people who know more about the encoding requirements of the Heat
> and other projects' databases.
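The failure Chris hit is easy to reproduce outside of Heat. A minimal
sketch in plain Python (not Heat's actual code) of what
user_creds_get() effectively does with the output of _decrypt():

    # _decrypt() hands back raw bytes; if they are not valid UTF-8
    # (latin1 text, or bytes passed straight through a SQL_ASCII
    # database), the unicode conversion blows up. 0xf2 opens a
    # four-byte UTF-8 sequence, so the decoder demands continuation
    # bytes (0x80-0xbf) immediately after it:
    raw = b'\xf2secret'
    raw.decode('utf-8')
    # UnicodeDecodeError: 'utf8' codec can't decode byte 0xf2 in
    # position 0: invalid continuation byte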
Doing a migration for this is hazardous. MySQL has _four_ places that
govern the character set of any operation:

  - server charset
  - client charset
  - db charset
  - table charset

There are also per-column charsets, but those basically trump all the
others (the inspection sketch below shows how to see each layer).

But MySQL can't possibly know what you _meant_ when you were inserting
data. So, if you _assumed_ that the database was UTF-8, and inserted
UTF-8 with all of those things accidentally set to latin1, then you
will have UTF-8 in your db, but MySQL will think it is latin1. So if
you now try to alter the table to UTF-8, all of your high-byte strings
will be double-encoded (the conversion sketch below walks through
exactly that). It unfortunately takes analysis to determine what the
right course of action is.

That is why we added the check to Heat, so that it would complain very
early if your tables and/or server configuration were going to
disagree with the assumptions. It would likely be best to have a more
generally available solution that stops and complains loudly whenever
a badly configured database is encountered.
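All four of those layers are visible from SQL, so auditing an existing
installation is straightforward. A sketch using the plain MySQLdb
DB-API (connection parameters are made up):

    import MySQLdb

    conn = MySQLdb.connect(host='localhost', user='heat',
                           passwd='secret', db='heat')
    cur = conn.cursor()

    # Server, client, connection and database charsets in one shot:
    cur.execute("SHOW VARIABLES LIKE 'character_set_%'")
    for name, value in cur.fetchall():
        print('%s = %s' % (name, value))

    # The table-level default (and any per-column overrides, which
    # trump everything else) shows up in the DDL:
    cur.execute('SHOW CREATE TABLE user_creds')
    print(cur.fetchone()[1])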
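The double-encoding trap is easiest to see in pure Python. This mimics
what ALTER TABLE ... CONVERT TO CHARACTER SET utf8 does to bytes that
are already UTF-8 but live in a table labelled latin1:

    original = u'caf\xe9'              # u'café'
    stored = original.encode('utf-8')  # 'caf\xc3\xa9', correct UTF-8,
                                       # but the table says latin1

    # The conversion trusts the label: it decodes the existing bytes
    # as latin1 and re-encodes the result as UTF-8.
    converted = stored.decode('latin1').encode('utf-8')

    print(repr(converted))             # 'caf\xc3\x83\xc2\xa9', i.e.
                                       # u'cafÃ©', double-encoded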
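As for the "more generally available solution", the shape of the check
is simple enough to live in shared code. A sketch (names are made up,
and this is not Heat's actual check) against MySQL's
information_schema:

    def assert_utf8_tables(cursor, schema):
        """Refuse to start if any table in `schema` is not utf8."""
        cursor.execute(
            "SELECT table_name, table_collation"
            " FROM information_schema.tables"
            " WHERE table_schema = %s", (schema,))
        bad = [name for name, collation in cursor.fetchall()
               if collation is not None
               and not collation.startswith('utf8')]
        if bad:
            raise RuntimeError('tables not using a utf8 collation: %s'
                               % ', '.join(bad))

The postgres side of Chris's question could be handled the same way by
checking pg_encoding_to_char(encoding) in pg_database, which is
exactly where an SQL_ASCII database like his would be caught.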