+1. Let's do it. If we need to add some extra tests to protect against regressions, then so be it. I will help. I also think better use could be made of the notifications system. A properly defined topic namespace would go a long way to assist that.
-----Original Message-----
From: openstack-bounces+dave.haynes=hp....@lists.launchpad.net [mailto:openstack-bounces+dave.haynes=hp....@lists.launchpad.net] On Behalf Of Pitucha, Stanislaw Izaak
Sent: 02 October 2012 17:43
To: openstack@lists.launchpad.net
Subject: [Openstack] Discussion / proposal: deleted column marker

Hi all,

I'd like to open a discussion on a topic that's been bugging me for a number of reasons - soft deletes (by that I mean marking rows with deleted=1 in the db) and, related to that, auditing of actions.

Some research and speculation first...

To be honest I could not find any reason why the feature is there in the first place. Here's the commit that introduced the 'deleted' columns: https://github.com/openstack/nova/commit/ae6905b9f1ef97206ee3c8722cec3b26fc064f38 - unfortunately the description says only "Refactored orm to support atomic actions". So the guessing part starts here.

These are the possible uses for soft-deletion of database records that I could come up with:

1. safety net (recover data that was deleted by accident)
2. audit / log (preserve the information about past data)
3. some kind of micro-optimisation where an update is more useful than a deletion - be it speed or ease of handling foreign key constraints (or, more likely, not handling them straight away)
4. ... no... that's all

But I think there are a number of issues with that approach. First, the issues with the possible uses above; then the issues I can see otherwise. Point by point:

1. Soft-deletion probably makes some restoration possible, but I doubt there's much that could be done without a full analysis of the situation. Mainly because the database only holds metainformation - the actual data users care about either goes away (ephemeral disks, memory, ...) and is not recoverable, or does not go away (volumes, networks, ...). Since resources like IPs and volumes can simply be reused in other instances, not all recovery is possible anyway.
Most hardcore fixes could be done by reinserting the original/reconstructed data just as easily as by verifying what's safe to undelete. Both actions require looking at existing data and locking out information so it doesn't get reused while we're messing with the previous state.

2. Soft-deleted records are not great as a source of old information. This is connected to the previous point - some resources are just reused / rewritten instead of created and deleted. For example, there's no record of what happened to old floating IPs - the information gets overwritten when the IP is reassigned to a new instance, so the useful bits are gone.

3. This is the only thing I could come up with related to the commit message itself and the "support atomic actions" part. Maybe it was sometimes easier to mark something as deleted than to manage and properly order the deletes of a number of related entries.

So with that out of the way, here are a number of issues related to soft-deletes that I ran into myself:

4. Indexing all this data on a busy system is getting a bit silly. Unless you do your own cleanup of old entries, you will end up in a situation where looking up instances on a host actually scans thousands of "deleted" rows even if only around 20 or so can be live and interesting. I know it's not a huge deal, but it still burns CPU cycles unnecessarily.

5. Some things are just not possible to do in a safe and portable way at the moment. For example, adding a new network and fixed IPs (there's a bug for that: https://bugs.launchpad.net/nova/+bug/755138). I tried to fix this, but discovered that it is not possible using only sessions with the 'deleted' column in place. There are ways to do it in a specific database (you can lock the whole table in MySQL, for example), but then it's not portable. The best you can do easily is limit the issue and hope that two inserts in different sessions won't happen at the same time.
This could easily be solved with a unique constraint if the 'deleted' column wasn't there. I haven't checked, but I guess that anything that can be named (and should have a unique name) has the same problem - security groups, keys, instances, ...

6. The amount of data grows pretty quickly in a busy environment. It has to be cleaned up, but due to some constraints, it can't easily be done in one go. Cleanup triggers help here, but that's additional work that needs maintenance during schema changes. Schema changes themselves get interesting when you're spending most of the time converting rows you really don't care about. There were also cases where a migration over many steps failed for some reason on very old rows (virtual interface related, I can't recall which step it was at the moment).

7. Not directly related, but I'll get back to that in the summary: owners of bigger deployments will either want to or be required to hold some record of various events and customer information. For example, to handle security abuse reports, it would be great to know who owned a specific floating IP at a specific moment.

So what's my point? Any use case I can find right now is not really improved by the current schema. It doesn't look like there are many benefits, but there are definitely some downsides. Does anyone know why soft-delete is still in place? Are there any reasons it can't / shouldn't be removed at this time? If it's possible to remove it, would you miss it?

If the answers are all "no", my proposal for a future release is to:

- Check if the `deleted` column can be removed. Make sure related rows are either disconnected (update ref to null) or deleted as needed, and that all data is really being removed.
- Add a system similar to notifications, but for auditable events - who did what with which resource at what time - in some semi-structured way that allows reviewing and summaries (basic information as separate columns + a description as a text message).
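
To make the two proposal items above a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration only: the `fixed_ips` and `audit_events` tables, their columns, and the helper functions are hypothetical and are not nova's actual schema or API. It shows a table without a 'deleted' column, where a plain unique constraint rejects a duplicate address, and an audit table that keeps "who did what with which resource at what time" after the row itself is really deleted.

```python
import sqlite3

# Hypothetical schema for illustration; not nova's real tables.
SCHEMA = """
CREATE TABLE fixed_ips (
    id       INTEGER PRIMARY KEY,
    address  TEXT NOT NULL UNIQUE,  -- enforceable: no 'deleted' rows linger
    instance TEXT
);
CREATE TABLE audit_events (
    id            INTEGER PRIMARY KEY,
    occurred_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    actor         TEXT NOT NULL,
    action        TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_ref  TEXT NOT NULL,
    description   TEXT
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, actor, action, resource_type, resource_ref, description=""):
    """Append one audit row: basic facts as columns plus a free-text message."""
    conn.execute(
        "INSERT INTO audit_events "
        "(actor, action, resource_type, resource_ref, description) "
        "VALUES (?, ?, ?, ?, ?)",
        (actor, action, resource_type, resource_ref, description),
    )


def allocate_ip(conn, actor, address, instance):
    """Insert the address; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO fixed_ips (address, instance) VALUES (?, ?)",
                (address, instance),
            )
            record_event(conn, actor, "allocate", "fixed_ip", address,
                         f"assigned to {instance}")
        return True
    except sqlite3.IntegrityError:
        return False


def release_ip(conn, actor, address):
    """Really delete the row, leaving only the audit trail behind."""
    with conn:
        row = conn.execute(
            "SELECT instance FROM fixed_ips WHERE address = ?", (address,)
        ).fetchone()
        if row is None:
            return False
        conn.execute("DELETE FROM fixed_ips WHERE address = ?", (address,))
        record_event(conn, actor, "release", "fixed_ip", address,
                     f"released from {row[0]}")
    return True


def history(conn, address):
    """Return (actor, action, description) for one address, oldest first."""
    return conn.execute(
        "SELECT actor, action, description FROM audit_events "
        "WHERE resource_type = 'fixed_ip' AND resource_ref = ? ORDER BY id",
        (address,),
    ).fetchall()
```

In this sketch a second `allocate_ip` call for an address that is still held returns False because the database itself refuses the insert, and after `release_ip` the same address can be allocated again while `history` still reports the earlier owner. The audit row is written in the same transaction as the change it describes, so the two cannot drift apart.
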
I saw there was some blueprint for a cloudaudit api (https://blueprints.launchpad.net/openstack-common/+spec/cloud-audit-api), but not much happened with it for a year, so I'm assuming it's dead now.

This would allow both proper cleanup of the data and retention of what's really necessary. It would also make it possible to use unique constraints where they're really needed (mainly IP descriptions) to prevent silly mistakes. Any additional external processing of deleted records would be easier to do because the database trigger could be just set on the delete action.

Thoughts, comments and critique welcome :) Let me know what you think about these issues.

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp