Re: [openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict

Jay Pipes Mon, 27 Aug 2018 07:28:51 -0700

On 08/22/2018 08:55 AM, Balázs Gibizer wrote:

On Fri, Aug 17, 2018 at 5:40 PM, Eric Fried <openst...@fried.cc> wrote:
gibi-
- On migration, when we transfer the allocations in either direction, a
conflict means someone managed to resize (or otherwise change
allocations?) since the last time we pulled data. Given the global lock
in the report client, this should have been tough to do. If it does
happen, I would think any retry would need to be done all the way back
at the claim, which I imagine is higher up than we should go. So again,
I think we should fail the migration and make the user retry.
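For readers less familiar with consumer generations, here is a minimal toy model of the check that produces the conflict, assuming a simple in-memory store. In recent Placement microversions the caller sends the consumer generation it last saw, and Placement rejects a stale write with HTTP 409. The classes below (ToyPlacement, ConsumerGenerationConflict) are hypothetical and model only the generation check, not the real HTTP API or Nova's report client. Following Eric's proposal, the sketch does not retry on conflict; it simply marks the migration as failed.

```python
# Toy in-memory model of Placement's consumer generation check.
# ToyPlacement and ConsumerGenerationConflict are hypothetical names.


class ConsumerGenerationConflict(Exception):
    """Raised when a write is based on a stale consumer generation."""


class ToyPlacement:
    def __init__(self):
        # consumer -> (generation, {provider: {resource_class: amount}})
        self._consumers = {}

    def get_allocations(self, consumer):
        # Unknown consumers have no generation yet (None) and no allocations.
        return self._consumers.get(consumer, (None, {}))

    def put_allocations(self, consumer, allocations, consumer_generation):
        current_gen, _ = self.get_allocations(consumer)
        if consumer_generation != current_gen:
            # Someone else changed this consumer since we last read it.
            raise ConsumerGenerationConflict(
                f"{consumer}: expected generation {current_gen}, "
                f"got {consumer_generation}"
            )
        new_gen = 0 if current_gen is None else current_gen + 1
        self._consumers[consumer] = (new_gen, allocations)
        return new_gen


placement = ToyPlacement()
gen = placement.put_allocations("instance-1", {"rp-a": {"VCPU": 2}}, None)

# A concurrent writer updates the same consumer first...
placement.put_allocations("instance-1", {"rp-a": {"VCPU": 4}}, gen)

# ...so our move, based on the generation we read earlier, is rejected.
# As proposed above, we do not retry here; the migration simply fails.
migration_status = "migrating"
try:
    placement.put_allocations("instance-1", {"rp-b": {"VCPU": 2}}, gen)
except ConsumerGenerationConflict:
    migration_status = "error"
```

Retrying at this layer would only reapply a decision made from stale data, which is why the proposal pushes the retry back to the user.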
Do we want to fail the whole migration or just the migration step (e.g.
confirm, revert)?
The latter means that a failure during confirm or revert would put the
instance back to VERIFY_RESIZE, while the former would mean that in case of a conflict at confirm we try an automatic revert. But for a conflict at
revert we can only put the instance into ERROR state.
This again should be "impossible" to come across. What would the
behavior be if we hit, say, ValueError in this spot?
I might not totally follow you. I see two options to choose from for the revert case:
a) An allocation manipulation error during revert of a migration causes the instance to go to ERROR. -> The end user cannot retry the revert; the instance needs to be deleted.

I would say this one is correct, but not because the user did anything wrong. Rather, *something inside Nova failed* because technically Nova shouldn't allow resource allocation to change while a server is in CONFIRMING_RESIZE task state. If we didn't make the server go to an ERROR state, I'm afraid we'd have no indication anywhere that this improper situation ever happened and we'd end up hiding some serious data corruption bugs.

b) An allocation manipulation error during revert of a migration causes the instance to go back to VERIFY_RESIZE state. -> The end user can retry the revert via the API.
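The two revert options can be sketched as a single policy switch. This is a minimal illustration assuming a hypothetical AllocationError for any failure while manipulating allocations and a hypothetical move_allocations callable standing in for the allocation transfer done during revert; it is not Nova's actual code. It also assumes the server was ACTIVE before the resize, so a successful revert returns it to ACTIVE.

```python
from enum import Enum


class AllocationError(Exception):
    """Hypothetical: any failure while manipulating allocations in placement."""


class OnRevertFailure(Enum):
    ERROR = "a"          # option a): instance goes to ERROR; it must be deleted
    VERIFY_RESIZE = "b"  # option b): back to VERIFY_RESIZE; user can retry via API


def revert_resize(move_allocations, policy):
    """Run the revert step and return the resulting server status."""
    try:
        move_allocations()
    except AllocationError:
        if policy is OnRevertFailure.ERROR:
            # Surface the "impossible" state loudly instead of hiding it.
            return "ERROR"
        return "VERIFY_RESIZE"
    # Assumes the server was ACTIVE before the resize.
    return "ACTIVE"


def conflicting_move():
    raise AllocationError("consumer generation conflict")
```

With option a), `revert_resize(conflicting_move, OnRevertFailure.ERROR)` leaves the server in ERROR, which matches the argument that the failure signals a Nova bug rather than a user mistake.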
I see three options to choose from for the confirm case:
a) An allocation manipulation error during confirm of a migration causes the instance to go to ERROR. -> The end user cannot retry the confirm; the instance needs to be deleted.


For the same reasons outlined above, I think this is the only safe option.

Best,
-jay

b) An allocation manipulation error during confirm of a migration causes the instance to go back to VERIFY_RESIZE state. -> The end user can retry the confirm via the API.
c) An allocation manipulation error during confirm of a migration causes nova to automatically try to revert the migration. (For a failure during this revert, the same options are available as for the generic revert case, see above.)
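The three confirm options, including the automatic revert in c), can be sketched the same way. As before, AllocationError and the confirm_allocations / revert_allocations callables are hypothetical stand-ins, not Nova's API. For a failure during the automatic revert, the caller picks the resulting status (ERROR by default), mirroring "the same options as for the generic revert case". The sketch assumes the server was ACTIVE before the resize.

```python
from enum import Enum


class AllocationError(Exception):
    """Hypothetical: any failure while manipulating allocations in placement."""


class OnConfirmFailure(Enum):
    ERROR = "a"          # instance goes to ERROR; it must be deleted
    VERIFY_RESIZE = "b"  # back to VERIFY_RESIZE; user can retry the confirm
    AUTO_REVERT = "c"    # nova tries to revert the migration automatically


def confirm_resize(confirm_allocations, revert_allocations, policy,
                   revert_failure_status="ERROR"):
    """Run the confirm step and return the resulting server status."""
    try:
        confirm_allocations()
    except AllocationError:
        if policy is OnConfirmFailure.ERROR:
            return "ERROR"
        if policy is OnConfirmFailure.VERIFY_RESIZE:
            return "VERIFY_RESIZE"
        # Option c): fall through to the automatic revert below.
    else:
        return "ACTIVE"
    try:
        revert_allocations()
    except AllocationError:
        # Handled like the generic revert case; the caller chooses the status.
        return revert_failure_status
    # Revert succeeded: the server is back on its original flavor.
    return "ACTIVE"


def fail():
    raise AllocationError("consumer generation conflict")


def ok():
    pass
```

Option c) adds a second allocation change on top of one that just failed, so under Jay's reasoning it risks compounding whatever inconsistency caused the first conflict.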
We also need to consider live migration. It is similar in the sense that it also uses move_allocations. But it is different, as the end user doesn't explicitly confirm or revert a live migration.
I'm looking for opinions about which option we should take in each case.

gibi
-efried
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev