Hi All,

Last week I tried to upgrade my production system and ran into https://bugs.launchpad.net/nova/+bug/1245502 (after having run the test upgrade against a clean grizzly schema, which turned out to be insufficient). The fix for this is in head (now backported to stable/havana) and only involves one file, 185_rename_unique_constraints.py, which I thought I copied in; I then reverted the DB from a previous dump and hit the same error. (I'm not 100% sure I did what I thought I did, since I can't reproduce that failure in testing, but we'll get to that later.)
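Something along these lines should confirm whether the copy actually took (just a sketch: /path/to/nova is a placeholder for a git checkout of nova, and the dist-packages path is my guess for the cloud archive install):

# sketch only: adjust both paths for the actual install
MIGRATIONS=/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/migrate_repo/versions
# compare the installed 185 against the fixed version from the fix commit
( cd /path/to/nova && git show c620cafb700ca195db0bd0ef9d62a0c9459bdc38:nova/db/sqlalchemy/migrate_repo/versions/185_rename_unique_constraints.py ) \
    | diff -u - "$MIGRATIONS/185_rename_unique_constraints.py" \
    && echo "installed 185 matches the fixed version"
# also check for a stale compiled copy that could mask the edited file
ls -l "$MIGRATIONS"/185_rename_unique_constraints.py*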
Eventually I gave up on the production upgrade, reverted everything to the pre-upgrade state and moved back into my testing world, but using the dump of my production DB as the base rather than a clean, empty grizzly schema. The production and test systems are both Ubuntu 12.04 using cloud archive packages and community puppet modules for management. The production system was originally installed with essex and upgraded through folsom and grizzly in turn. Including the shadow tables the DB has history for approx 500k instances.

I've run into a fair number of issues in testing, but I'm dubious about my test environment, since the first failure in testing was in v183, which was sooner than I saw in production, so clearly that step had worked there. Also, after kludging my way through that, v185 did apply properly (which may just mean I screwed up in my previous attempts). Most strangely, after hacking through as far as v208 and attempting a fix for some breakage in v209, it started failing way back in v187. I'd blame my last kludge for screwing something up, but it complains that table instance_groups exists, whereas my last hack was deleting some rows from instance_actions_events. I'm stuck at this point since, while instance_groups is empty, I can't drop it due to existing constraints. But since the early testing steps do not match my experience with the production attempt, I fear I may be chasing ghosts that may not even exist in production or, worse, missing issues that do.

Here's a step by step of what I've attempted and brief results at each stage:

----------------------------------------------------------------------
Test upgrade

1) install a Grizzly based controller node on an OpenStack instance using the production puppet config, modulo IP addrs & hostnames

2) reload production DBs into the test system

3) fix endpoint URLs to point back to test rather than production

4) stop all nova services:

for i in nova-api nova-cert nova-conductor nova-consoleauth \
    nova-novncproxy nova-scheduler nova-objectstore; do
    service $i stop
done

5) mysqldump --all-databases    # or at least the nova db

6) snapshot the instance

7) run puppet with the test environment (changes the cloud archive source to havana, installs new packages and fixes configs). Expected to fail since the bug fix isn't packaged yet, but I expected it to fail at v184, not v182!

-> fails, ending at v182:

2014-01-07 19:18:22.193 1463 TRACE nova.db.sqlalchemy.utils OperationalError: (OperationalError) (1050, "Table 'shadow_security_group_default_rules' already exists") '\nCREATE TABLE shadow_security_group_default_rules (\n\tcreated_at DATETIME, \n\tupdated_at DATETIME, \n\tdeleted_at DATETIME, \n\tdeleted INTEGER(11), \n\tid INTEGER(11) NOT NULL AUTO_INCREMENT, \n\tprotocol VARCHAR(5), \n\tfrom_port INTEGER(11), \n\tto_port INTEGER(11), \n\tcidr VARCHAR(43), \n\tPRIMARY KEY (id)\n)ENGINE=InnoDB\n\n' ()
2014-01-07 19:18:22.193 1463 TRACE nova.db.sqlalchemy.utils Command failed, please check log for more info
2014-01-07 19:18:22.197 1463 CRITICAL nova [-] Shadow table with name shadow_security_group_default_rules already exists.

/usr/bin/nova-manage db version
182
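A quick look at which shadow_* tables already exist before running the sync would at least flag this class of problem up front; a sketch (read-only, nothing is changed):

mysql nova <<'SQL'
-- list any shadow_* tables already present before the sync (row counts are
-- approximate for InnoDB); leftovers like shadow_security_group_default_rules
-- show up here before the sync trips over them
SELECT table_name, table_rows
  FROM information_schema.tables
 WHERE table_schema = 'nova'
   AND table_name LIKE 'shadow\_%';
SQL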
8) stop all nova services again

9) grab the latest 185_rename_unique_constraints.py from git:

git log 185_rename_unique_constraints.py | head -5
2014-01-07 14:45:59 jon pts/15
commit c620cafb700ca195db0bd0ef9d62a0c9459bdc38
Author: Joshua Hesketh <j...@nitrotech.org>
Date: Tue Oct 29 09:40:41 2013 +1100

    Fix migration 185 to work with old fkey names

10) reload the nova database as dumped at step 5

/usr/bin/nova-manage db version
161

11) nova-manage db sync still fails in the same way.

11.1) I don't care at all about the contents of this table, so let's be brutal:

mysql -e 'drop table shadow_security_group_default_rules;' nova

11.2) try again: nova-manage db sync fails in a new way (notably, 185 now succeeds):

2014-01-07 20:05:29.157 8499 CRITICAL nova [-] (IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`nova`.`block_device_mapping`, CONSTRAINT `block_device_mapping_instance_uuid_fkey` FOREIGN KEY (`instance_uuid`) REFERENCES `instances` (`uuid`))') 'INSERT INTO block_device_mapping (instance_uuid, source_type, destination_type, device_type, boot_index, image_id) VALUES (%s, %s, %s, %s, %s, %s)' ('0acda551-e1f8-4e29-a7b3-2c8fe9d2fb72', 'image', 'local', 'disk', -1, 'aee1d242-730f-431f-88c1-87630c0f07ba')

root@test:~# nova-manage db version
185

Sure enough there is no instance with uuid 0acda551-e1f8-4e29-a7b3-2c8fe9d2fb72, but there was (it's now in shadow_instances); also, the block_device_mapping this is trying to insert into is currently a shadow_block_device_mapping.

11.3) OK, I don't really care about that table either; let's revert and drop it along with shadow_security_group_default_rules:

root@test:~# mysql nova < nova.sql
root@test:~# mysql -e 'drop table shadow_security_group_default_rules; drop table shadow_block_device_mapping;' nova
root@test:~# nova-manage db sync

11.4) that didn't work because it needs the table; let's try just clearing it instead:

root@test:~# mysql nova < nova.sql
root@test:~# mysql -e 'drop table shadow_security_group_default_rules; TRUNCATE TABLE shadow_block_device_mapping;' nova
root@test-nimbus:~# nova-manage db sync

Failure, but progress:

Command failed, please check log for more info
2014-01-07 21:41:05.407 28650 CRITICAL nova [-] (IntegrityError) (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`nova`.`instance_actions_events`, CONSTRAINT `instance_actions_events_ibfk_1` FOREIGN KEY (`action_id`) REFERENCES `instance_actions` (`id`))') 'DELETE FROM instance_actions WHERE instance_actions.instance_uuid NOT IN (SELECT instances.uuid \nFROM instances)' ()

root@test:~# nova-manage db version
208
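Before rewinding, a count along these lines (again only a sketch, and it only selects; nothing is deleted) would show how many rows that failing DELETE is tangled up with:

mysql nova <<'SQL'
-- instance_actions rows pointing at instances that no longer exist
-- (these are what the migration's DELETE is trying to remove)
SELECT COUNT(*) AS orphaned_actions
  FROM instance_actions
 WHERE instance_uuid NOT IN (SELECT uuid FROM instances);
-- instance_actions_events rows still referencing those actions
-- (these are what blocks the DELETE via the foreign key)
SELECT COUNT(*) AS referencing_events
  FROM instance_actions_events
 WHERE action_id IN (SELECT id FROM instance_actions
                     WHERE instance_uuid NOT IN (SELECT uuid FROM instances));
SQL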
11.5) rewind and delete all the instance_actions_events that reference the instance_actions this wants to delete:

root@test:~# mysql nova < nova.sql
root@test:~# mysql -e 'drop table shadow_security_group_default_rules; TRUNCATE TABLE shadow_block_device_mapping; DELETE FROM instance_actions_events WHERE action_id IN (SELECT id FROM instance_actions WHERE instance_actions.instance_uuid NOT IN (SELECT instances.uuid FROM instances));' nova
root@test-nimbus:~# nova-manage db sync

Insanely, this is now failing earlier:

root@test-nimbus:~# nova-manage db sync
Command failed, please check log for more info
2014-01-07 22:09:00.229 1898 CRITICAL nova [-] (OperationalError) (1050, "Table 'instance_groups' already exists") '\nCREATE TABLE instance_groups (\n\tcreated_at DATETIME, \n\tupdated_at DATETIME, \n\tdeleted_at DATETIME, \n\tdeleted INTEGER, \n\tid INTEGER NOT NULL AUTO_INCREMENT, \n\tuser_id VARCHAR(255), \n\tproject_id VARCHAR(255), \n\tuuid VARCHAR(36) NOT NULL, \n\tname VARCHAR(255), \n\tPRIMARY KEY (id), \n\tCONSTRAINT uniq_instance_groups0uuid0deleted UNIQUE (uuid, deleted)\n)ENGINE=InnoDB CHARSET=utf8\n\n' ()

root@test-nimbus:~# nova-manage db version
186

Since this is all in test and virtualized I can try any weird thing anyone might suggest without repercussion, but I'm fairly out of ideas on my own. I'm particularly interested in seeing whether anyone can spot a flaw in the initial setup of the test environment that might make it diverge from my production system in ways I haven't seen.

Thanks,
-Jon