The archiving has issues since very long time [1], something like this [2] is expected to replace it.
The archiving just move trash to the other side of the desk, usually just permanently deleting everything what is deleted for more than 7 day is better for everyone. For now, maybe just wiping out the shadow tables and the existing nova-mange functionality is better choice. [3] [1] https://bugs.launchpad.net/nova/+bug/1305892 [2] https://blueprints.launchpad.net/nova/+spec/db-purge-engine [3] https://bugs.launchpad.net/nova/+bug/1426873 ----- Original Message ----- > From: "Mike Bayer" <mba...@redhat.com> > To: "OpenStack Development Mailing List (not for usage questions)" > <openstack-dev@lists.openstack.org> > Sent: Friday, March 13, 2015 12:29:55 AM > Subject: [openstack-dev] [nova] if by "archived" you mean, "wipes out your > tables completely", then sure, it works > fine > > Hello Nova - > > Not sure if I’m just staring at this for too long, or if > archive_deleted_rows_for_table() is just not something we ever use. > Because it looks like it’s really, really broken very disastrously, and I’m > wondering if I’m just missing something in front of me. > > Let’s look at what it does! > > First, archive_deleted_rows() calls it with a table name. These names are > taken by collecting every single table name from nova.db.sqlalchemy.models. > > Then, the function uses table reflection (that is, doesn’t look in the model > at all, just goes right to the database) to load the table definitions: > > table = Table(tablename, metadata, autoload=True) > shadow_tablename = _SHADOW_TABLE_PREFIX + tablename > rows_archived = 0 > try: > shadow_table = Table(shadow_tablename, metadata, autoload=True) > except NoSuchTableError: > # No corresponding shadow table; skip it. > return rows_archived > > this is pretty heavy handed and wasteful from an efficiency point of view, > and I’d like to fix this too, but let’s go with it. Now we have the two > tables. > > Then we do this: > > deleted_column = table.c.deleted > query_insert = sql.select([table], > deleted_column != deleted_column.default).\ > order_by(column).limit(max_rows) > query_delete = sql.select([column], > deleted_column != deleted_column.default).\ > order_by(column).limit(max_rows) > > We make some SELECT statements that we’re going to use to find “soft > deleted” rows, and these will be embedded into an INSERT > and a DELETE. It is trying to make a statement like “SELECT .. FROM > table WHERE deleted != <deleted_default>”, so that it finds rows where > “deleted” has been changed to something, e.g. the row was > soft deleted. > > But what’s the value of “deleted_default” ? Remember, all this > table knows is what the database just told us about it, because it only > uses reflection. Let’s see what the “deleted” column in a table like > instance_types looks like: > > MariaDB [nova]> show create table instance_types; > | instance_types | CREATE TABLE `instance_types` ( > `created_at` datetime DEFAULT NULL, > > … [omitted] ... > > `deleted` int(11) DEFAULT NULL, > ) > > The default that we get for this column is NULL. That is very interesting! > Because, if we look at the *Python-side value of deleted*, we see something > that is quite the opposite of NULL, e.g. a thing that is most certainly not > null: > > class SoftDeleteMixin(object): > deleted_at = Column(DateTime) > deleted = Column(Integer, default=0) > > See that zero there? That’s a ***Python-side default***. It is **not the > server default**!! You will **not** get it from reflection, the database has > no clue about it (oddly enough, this entire subject matter is fully > documented in SQLAlchemy’s documentation, and guess what, the docs are free! > Read them all you like, I won’t ask for a dime, no questions asked!). > > So, all of our INSERTS **will** put a zero, not NULL, into that column. > Let’s look in instance_types and see: > > MariaDB [nova]> select id, name, deleted from instance_types; > +----+-----------+---------+ > | id | name | deleted | > +----+-----------+---------+ > | 3 | m1.large | 0 | > | 1 | m1.medium | 0 | > | 7 | m1.micro | 0 | > | 6 | m1.nano | 0 | > | 5 | m1.small | 0 | > | 2 | m1.tiny | 0 | > | 4 | m1.xlarge | 0 | > +----+-----------+---------+ > 7 rows in set (0.00 sec) > > No NULLs. The value of non-deleted rows is zero. > > What does this all mean? > > It means, when this archival routine runs, it runs queries like this: > > INSERT INTO shadow_quota_usages SELECT quota_usages.created_at, > quota_usages.updated_at, quota_usages.deleted_at, quota_usages.id, > quota_usages.project_id, quota_usages.resource, quota_usages.in_use, > quota_usages.reserved, quota_usages.until_refresh, quota_usages.deleted, > quota_usages.user_id > FROM quota_usages > WHERE quota_usages.deleted IS NOT NULL ORDER BY quota_usages.id > LIMIT ? OFFSET ? > 2015-03-12 17:01:01,218 INFO [sqlalchemy.engine.base.Engine] (7, 0) > 2015-03-12 17:01:01,219 INFO [sqlalchemy.engine.base.Engine] DELETE FROM > quota_usages WHERE quota_usages.id in (SELECT T1.id FROM (SELECT > quota_usages.id > FROM quota_usages > WHERE quota_usages.deleted IS NOT NULL ORDER BY quota_usages.id > LIMIT ? OFFSET ?) as T1) > > The second query is to DELETE rows from a table like quota_usages based on > looking at rows where the “deleted” column is “NOT NULL”. Which means, all > of them! They are all zeros, not NULL! > > If we assume that all of our tables are filled up with zeroes for those > deleted columns, because that’s the default, this **wipes the whole table > clean**. > > How do the tests pass? Well the tests are in test_db_api->ArchiveTestCase, > and actually, they don’t. But they don’t fail every time, because the test > suite here runs with a database that is almost completely empty anyway, so > the broken archival routine doesn’t find too many rows to blow away except > for the rows in “instance_types”, which it only finds sometimes because the > tests are only running it with a small number of things to delete and the > order of the tables is non-deterministic. > > I’ve posted the bug report at https://bugs.launchpad.net/nova/+bug/1431571 > where I started out not knowing much about how this worked except that my > tests were failing, and slowly stumbled my way to come to this conclusion. A > patch is at https://review.openstack.org/#/c/164009/, where we look at the > actual Python-side default. However I’d recommend that we just hardcode the > zero here, since that’s how our soft-delete columns work. > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev