Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

Andrew Laski Wed, 26 Nov 2014 13:34:33 -0800


On 11/26/2014 03:39 PM, Belmiro Moreira wrote:

Hi,
my experience is that "soft delete" is important to keep record ofdeleted instances and its characteristics.In fact in my organization we are obliged to keep these records forseveral months.However, it would be nice that after few months we were able to purgethe DB with a nova tool.

I think that any solution for this needs to keep the deleted dataavailable in some form. Is it important for you that the deleted databe in the same table as non deleted rows, or could they be moved intoanother table? And would it matter if the format of the row changedduring a move?

In the particular case of this cells table my major concern is thathaving a "delete" field maybe means that top and children databasesneed to be synchronized. Looking into the current cells design havingduplicated information in different databases is one of the main issues.

Agreed. I think this can be solved by ensuring that instance deletionis only about setting the deleted column in the cell instance table.The instance mapping being deleted makes no statement about whether ornot the instance is deleted, though it would be a bug to delete itbefore the instance was deleted.


Belmiro

On Wed, Nov 26, 2014 at 4:40 PM, Andrew Laski<andrew.la...@rackspace.com <mailto:andrew.la...@rackspace.com>> wrote:



    On 11/25/2014 11:54 AM, Solly Ross wrote:

            I can't comment on other projects, but Nova definitely
            needs the soft
            delete in the main nova database. Perhaps not for every
            table, but
            there is definitely code in the code base which uses it
            right now.
            Search for read_deleted=True if you're curious.

        Just to save people a bit of time, it's actually
        `read_deleted='yes'`
        or `read_deleted="yes"` for many cases.

        Just to give people a quick overview:

        A cursory glance (no pun intended) seems to indicate that
        quite a few of
        these are reading potentially deleted flavors.  For this case,
        it makes
        sense to keep things in one table (as we do).

        There are also quite a few that seem to be making sure deleted
        "things"
        are properly cleaned up.  In this case, 'deleted' acts as a
        "CLEANUP"
        state, so it makes just as much sense to keep the deleted rows
        in a
        separate table.

            For this case in particular, the concern is that operators
            might need
            to find where an instance was running once it is deleted
            to be able to
            diagnose issues reported by users. I think that's a valid
            use case of
            this particular data.

                    This is a new database, so its our big chance to
                    get this right. So,
                    ideas welcome...

                    Some initial proposals:

                    - we do what we do in the current nova database --
                    we have a deleted
                    column, and we set it to true when we delete the
                    instance.

                    - we have shadow tables and we move delete rows to
                    a shadow table.


                Both approaches are viable, but as the soft-delete
                column is widespread, it
                would be thorny for this new app to use some totally
                different scheme,
                unless the notion is that all schemes should move to
                the audit table
                approach (which I wouldn’t mind, but it would be a big
                job).    FTR, the
                audit table approach is usually what I prefer for
                greenfield development,
                if all that’s needed is forensic capabilities at the
                database inspection
                level, and not as much active GUI-based “deleted”
                flags.   That is, if you
                really don’t need to query the history tables very
                often except when
                debugging an issue offline.  The reason its preferable
                is because those
                rows are still “deleted” from your main table, and
                they don’t get in the
                way of querying.   But if you need to refer to these
                history rows in
                context of the application, that means you need to get
                them mapped in such
                a way that they behave like the primary rows, which
                overall is a more
                difficult approach than just using the soft delete column.

        I think it does really come down here to how you intend to use
        the soft-delete
        functionality in Cells.  If you just are using it to debug or
        audit, then I think
        the right way to go would be either the audit table
        (potentially can store more
        lifecycle data, but could end up taking up more space) or a
        separate shadow
        table (takes up less space).

        If you are going to use the soft delete for application
        functionality, I would
        consider differentiating between "deleted" and "we still have
        things left to
        clean up", since this seems to be mixing two different
        requirements into one.


    The case that spawned this discussion is one where deleted rows
    are not needed for application functionality.  So I'm going to
    update the proposed schema there to not include a 'deleted'
    column. Fortunately there's still some time before the question of
    how to handle deletes needs to be fully sorted out.


                That said, I have a lot of plans to send improvements
                down the way of the
                existing approach of “soft delete column” into
                projects, from the querying
                POV, so that criteria to filter out soft delete can be
                done in a much more
                robust fashion (see
                
https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
                But this is still more complex and less performant
                than if the rows are
                just gone totally, off in a history table somewhere
                (again, provided you
                really don’t need to look at those history rows in an
                application context,
                otherwise it gets all complicated again).

            Interesting. I hadn't seen consistency between the two
            databases as
            trumping doing this less horribly, but it sounds like its
            more of a
            thing that I thought.

            Thanks,
            Michael

            --
            Rackspace Australia

            _______________________________________________
            OpenStack-dev mailing list
            OpenStack-dev@lists.openstack.org
            <mailto:OpenStack-dev@lists.openstack.org>
            http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

        _______________________________________________
        OpenStack-dev mailing list
        OpenStack-dev@lists.openstack.org
        <mailto:OpenStack-dev@lists.openstack.org>
        http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



    _______________________________________________
    OpenStack-dev mailing list
    OpenStack-dev@lists.openstack.org
    <mailto:OpenStack-dev@lists.openstack.org>
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

Reply via email to