Sorry for the delay in responding to this, Gibi and Eric. Comments inline.

tl;dr: go with option a)

On 08/16/2018 11:34 AM, Eric Fried wrote:
Thanks for this, gibi.

TL;DR: a).

I didn't look, but I'm pretty sure we're not caching allocations in the
report client. Today, nobody outside of nova (specifically the resource
tracker via the report client) is supposed to be mucking with instance
allocations, right? And given the global lock in the resource tracker,
it should be pretty difficult to race e.g. a resize and a delete in any
meaningful way.

It's not a global (i.e. multi-node) lock. It's a semaphore for just that compute node. Migrations (mostly) involve more than one compute node, so the compute node semaphore is useless in that regard. Thus the need to go with option a) and bail out on any change to the generation of any consumer involved in the migration operation.

So short term, IMO it is reasonable to treat any generation conflict
as an error. No retries. Possible wrinkle on delete, where it should
be a failure unless forced.

Agreed for all migration and deletion operations.
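
To make the "conflict is an error, no retries" behaviour concrete, here is a rough sketch. It is only an in-memory toy: FakePlacement, GenerationConflict and the force flag are hypothetical names made up for illustration, not nova's report client or the real placement API. The assumption is simply that every write carries the consumer generation the caller last read, and that a mismatch fails immediately.

```python
class GenerationConflict(Exception):
    """Raised when a consumer changed since the caller last read it."""


class FakePlacement:
    """Hypothetical in-memory stand-in for a placement-like service."""

    def __init__(self):
        # consumer id -> (generation, allocations)
        self._consumers = {}

    def get(self, consumer):
        """Return (generation, allocations); unknown consumers start at 0."""
        generation, allocations = self._consumers.get(consumer, (0, {}))
        return generation, dict(allocations)

    def put(self, consumer, allocations, generation):
        """Replace allocations only if the caller's generation is current."""
        current, _ = self._consumers.get(consumer, (0, {}))
        if generation != current:
            # Option a): no GET+merge+retry, just surface the conflict.
            raise GenerationConflict(consumer)
        self._consumers[consumer] = (current + 1, dict(allocations))

    def delete(self, consumer, generation, force=False):
        """Remove allocations; a stale generation fails unless forced."""
        current, _ = self._consumers.get(consumer, (0, {}))
        if generation != current and not force:
            raise GenerationConflict(consumer)
        self._consumers.pop(consumer, None)
```

A caller on one compute node that read the generation before another node wrote would get GenerationConflict on its next put() or delete() and abort the operation, rather than relying on a per-node semaphore that the other node never sees.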

Long term, I also can't come up with any scenario where it would be
appropriate to do a narrowly-focused GET+merge/replace+retry. But
implementing the above short-term plan shouldn't prevent us from adding
retries for individual scenarios later if we do uncover places where it
makes sense.

Neither do I. Safety first, IMHO.

Best,
-jay

