Sorry for the delay in responding to this, Gibi and Eric. Comments inline.
tl;dr: go with option a)

On 08/16/2018 11:34 AM, Eric Fried wrote:
> Thanks for this, gibi. TL;DR: a). I didn't look, but I'm pretty sure we're not caching allocations in the report client. Today, nobody outside of nova (specifically the resource tracker via the report client) is supposed to be mucking with instance allocations, right? And given the global lock in the resource tracker, it should be pretty difficult to race e.g. a resize and a delete in any meaningful way.
It's not a global (i.e., multi-node) lock; it's a semaphore for just that compute node. Migrations (mostly) involve more than one compute node, so the compute node semaphore is useless in that regard. Hence the need to go with option a) and bail out if the generation of any of the consumers involved in the migration operation changes.
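To make the multi-consumer point concrete, here is a minimal sketch of that bail-out behaviour against a toy in-memory model. Everything in it (`InMemoryPlacement`, `replace_many`, `move_allocations`, `GenerationConflict`) is hypothetical and only loosely modeled on consumer generations; it is not the report client or the placement API. In this sketch a migration touches two consumers (the instance and a migration record), and the write checks both generations before changing anything, so a concurrent change to either one aborts the whole operation with no retry.

```python
from dataclasses import dataclass, field


class GenerationConflict(Exception):
    """Raised when a consumer's generation no longer matches what the caller read."""


@dataclass
class Consumer:
    # Hypothetical stand-in for a consumer record: allocations plus a generation.
    allocations: dict = field(default_factory=dict)
    generation: int = 0


class InMemoryPlacement:
    """Toy model: every successful write bumps the consumer's generation."""

    def __init__(self):
        self.consumers = {}

    def get(self, consumer_id):
        c = self.consumers.setdefault(consumer_id, Consumer())
        return dict(c.allocations), c.generation

    def replace_many(self, updates):
        """Replace allocations for several consumers, all-or-nothing.

        ``updates`` maps consumer_id -> (new_allocations, expected_generation).
        Every generation is checked before anything is written.
        """
        for cid, (_, expected) in updates.items():
            actual = self.consumers.setdefault(cid, Consumer()).generation
            if actual != expected:
                raise GenerationConflict(
                    f"consumer {cid}: expected generation {expected}, found {actual}")
        for cid, (allocs, _) in updates.items():
            c = self.consumers[cid]
            c.allocations = dict(allocs)
            c.generation += 1


def move_allocations(placement, instance_id, migration_id):
    """Hand the instance's allocations to the migration consumer.

    No retry: if either consumer changed since it was read, this raises.
    """
    inst_allocs, inst_gen = placement.get(instance_id)
    _, mig_gen = placement.get(migration_id)
    placement.replace_many({
        migration_id: (inst_allocs, mig_gen),
        instance_id: ({}, inst_gen),
    })
```

Because every generation is checked before any write, a conflict leaves both consumers untouched; turning that into a user-visible error is left to the caller.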
> So short term, IMO it is reasonable to treat any generation conflict as an error. No retries. Possible wrinkle on delete, where it should be a failure unless forced.
Agreed for all migration and deletion operations.
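A minimal sketch of the short-term policy for deletion: a stale generation is an error and is not retried, and only an explicit force bypasses the check. `ConsumerStore`, `delete_allocations`, the `force` flag and `GenerationConflict` are hypothetical names for illustration. What "forced" means in nova is also an assumption here, taken to mean "ignore the caller's possibly stale view and delete whatever is current".

```python
class GenerationConflict(Exception):
    """Raised when the expected generation does not match the stored one."""


class ConsumerStore:
    """Hypothetical in-memory store: consumer_id -> (allocations, generation)."""

    def __init__(self):
        self._data = {}

    def read(self, consumer_id):
        return self._data.get(consumer_id, ({}, 0))

    def write(self, consumer_id, allocations, expected_generation):
        _, current = self.read(consumer_id)
        if current != expected_generation:
            raise GenerationConflict(
                f"{consumer_id}: expected {expected_generation}, found {current}")
        self._data[consumer_id] = (dict(allocations), current + 1)


def delete_allocations(store, consumer_id, expected_generation, force=False):
    """Remove a consumer's allocations; a conflict is an error, never retried.

    With force=True the caller's generation is discarded and the current one
    is used instead (assumed meaning of a "forced" delete).
    """
    if force:
        _, expected_generation = store.read(consumer_id)
    store.write(consumer_id, {}, expected_generation)
```

Note that the forced path in this toy is a read followed by a write; in a real concurrent system that gap could still race, which is one more reason to reserve it for explicit operator action.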
> Long term, I also can't come up with any scenario where it would be appropriate to do a narrowly-focused GET+merge/replace+retry. But implementing the above short-term plan shouldn't prevent us from adding retries for individual scenarios later if we do uncover places where it makes sense.
Neither do I. Safety first, IMHO.

Best,
-jay

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev