On Wed, Apr 25, 2018 at 9:14 AM, Dmitry Tantsur wrote:
> Hi all,
>
> I'd like to restart conversation on enabling node automated cleaning by
> default for the undercloud. This process wipes partitioning tables
> (optionally, all the data) from overcloud nodes each time they move to
> "available" state (i.e. on initial enrolling and after each tear down).
>
> We have had it disabled for a few reasons:
> - it was not possible to skip time-consuming wiping of data from disks
> - the way our workflows used to work required going between manageable and
>   available steps several times
>
> However, having cleaning disabled has several issues:
> - a configdrive left from a previous deployment may confuse cloud-init
> - a bootable partition left from a previous deployment may take precedence
>   in some BIOS
> - a UEFI boot partition left from a previous deployment is likely to
>   confuse UEFI firmware
> - apparently ceph does not work correctly without cleaning (I'll defer to
>   the storage team to comment)
>
> For these reasons we don't recommend having cleaning disabled, and I propose
> to re-enable it.
>
> It has the following drawbacks:
> - The default workflow will require another node boot, thus becoming several
>   minutes longer (incl. the CI)
> - It will no longer be possible to easily restore a deleted overcloud node.

I'm trending towards -1, for the exact reasons you list as drawbacks.

There has been no shortage of users who have ended up with accidentally
deleted overclouds. These are usually caused by user error or
unintended/unpredictable Heat operations. Until we have a way to guarantee
that Heat will never delete a node, or Heat is entirely out of the picture
for Ironic provisioning, I'd prefer that we not enable automated cleaning
by default.

I believe we had done something with policy.json at one time to prevent
node delete, but I don't recall whether that protected against both
user-initiated actions and Heat actions. And even that was not enabled by
default.

IMO, we need to keep "safe" defaults, even if it means manually documenting
that you should clean to prevent the issues you point out above. The
alternative is to have no way to recover deleted nodes by default.

--
-- James Slagle
--

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
