Reviewed:  https://review.opendev.org/706331
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=001f3a7bfe6b2c8af135daff8e154a708792070e
Submitter: Zuul
Branch:    master

commit 001f3a7bfe6b2c8af135daff8e154a708792070e
Author: Dan Smith <[email protected]>
Date:   Thu Feb 6 09:21:38 2020 -0800

    Fix instance.hidden migration and querying

    It was discovered that default= on a Column definition in a schema
    migration will attempt to update the table with the provided value,
    instead of just translating on read, which is often the assumption.
    The Instance.hidden=False change introduced in Train[1] used such a
    default on the new column, which caused at least one real-world
    deployment to time out rewriting the instances table due to size.
    Apparently SQLAlchemy-migrate also does not consider such a timeout
    to be a failure and proceeds on. The end result is that some
    existing instances in the database have hidden=NULL values, and the
    DB model layer does not convert those to hidden=False when we
    read/query them, causing those instances to be excluded from the
    API list view.

    This change alters the 399 schema migration to remove the
    default=False specification. This does not actually change the
    schema, but /will/ prevent users who have not yet upgraded to Train
    from rewriting the table.

    This change also makes the instance_get_all_by_filters() code
    handle hidden specially, including false and NULL in a query for
    non-hidden instances.

    A future change should add a developer trap test to ensure that
    future migrations do not add default= values to new columns to
    avoid this situation in the future.

    [1] Iaffb27bd8c562ba120047c04bb62619c0864f594

    Change-Id: Iace3f653b42c20887b40ee0105c8e9a4edeff1f7
    Closes-Bug: #1862205

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862205

Title:
  Instances not visible when hidden=NULL

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  During an upgrade of a cloud from Stein to Train, there is a
  migration which adds the `hidden` field to the database. In that
  migration, it was assumed that it does not backfill all of the rows.
  However, upon verifying, it actually does backfill all rows and the
  order of operations *seems* to be:

  1. Create new column for `hidden`
  2. Update database migration version
  3. Start backfilling all existing instances with hidden=0

  In my case, the migration did create the column but failed to
  backfill all existing instances because of the large number of
  instances. However, running the migrations again seems to simply
  continue and not block on that migration, leaving all rows with
  hidden=NULL.

  ====================
  Feb 06 14:06:13 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:06:13.566 10596 INFO migrate.versioning.api [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] 398 -> 399...
  Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.129 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1180, 'Got error 90 "Message too long" during COMMIT')
  Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.132 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DB exception wrapped.: sqlalchemy.exc.ResourceClosedError: This Connection is closed
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.930 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 398 -> 399...
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 399 -> 400...
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 400 -> 401...
  Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 401 -> 402...
  Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.244 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  ====================

  This issue is two-part, because it now seems that Nova does not
  assume that hidden=NULL means that the instance is not hidden, so it
  no longer displays the instance via the API or any other operations.

  The "very silly" confirmation of this backfilling behaviour was that
  my attempt at patching things up resulted in the same error:

  ==================
  MariaDB [nova]> update instances set hidden=0;
  ERROR 1180 (HY000): Got error 90 "Message too long" during COMMIT
  ===================

  Ideally, Nova shouldn't try to backfill values, and it should treat
  hidden=NULL as 0.
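
The two points in the description (treating hidden=NULL as not hidden when querying, and avoiding one huge UPDATE) can be illustrated with a small sketch. This is not nova's code: it uses Python's stdlib sqlite3 as a stand-in for MariaDB, the table is reduced to two columns, and the names make_db, list_visible and backfill_hidden are hypothetical. It also assumes the COMMIT error is caused by the size of the single transaction, which the report suggests but does not confirm.

```python
import sqlite3


def make_db(n):
    """Create an in-memory 'instances' table with n rows where hidden is NULL.

    This mimics the state after the column was added but the backfill failed.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances (id INTEGER PRIMARY KEY, hidden BOOLEAN)")
    conn.executemany(
        "INSERT INTO instances (hidden) VALUES (?)", [(None,)] * n)
    conn.commit()
    return conn


def list_visible(conn):
    """Return ids of non-hidden instances, counting NULL as not hidden.

    A plain 'hidden = 0' filter never matches NULL rows, so the NULL case
    has to be included explicitly.
    """
    rows = conn.execute(
        "SELECT id FROM instances "
        "WHERE hidden = 0 OR hidden IS NULL ORDER BY id")
    return [row[0] for row in rows]


def backfill_hidden(conn, batch_size=1000):
    """Set hidden=0 on NULL rows in small batches, committing after each.

    Returns the total number of rows updated. Each transaction touches at
    most batch_size rows instead of the whole table.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE instances SET hidden = 0 WHERE id IN "
            "(SELECT id FROM instances WHERE hidden IS NULL LIMIT ?)",
            (batch_size,))
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


if __name__ == "__main__":
    conn = make_db(5)
    naive = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE hidden = 0").fetchone()[0]
    print("naive filter matches:", naive)
    print("NULL-aware filter matches:", len(list_visible(conn)))
    print("rows backfilled:", backfill_hidden(conn, batch_size=2))
```

A suitable batch size for a real deployment depends on the database configuration and is not something this sketch can determine.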

