[ https://issues.apache.org/jira/browse/CLOUDSTACK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Vazquez updated the description of CLOUDSTACK-10326.

Prevent hosts fall into Maintenance when there are running VMs on it
--------------------------------------------------------------------
Key: CLOUDSTACK-10326
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10326
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Affects Versions: 4.11.0.0
Reporter: Nicolas Vazquez
Assignee: Nicolas Vazquez
Priority: Major
Fix For: 4.11.1.0
Attachments: CLOUDSTACK-10326-Debug.png, CLOUDSTACK-10326-InitialState.png, CLOUDSTACK-10326-Migrating.png, CLOUDSTACK-10326-MigrationFailed.png

Description:
This issue was discovered, fixed and tested on KVM, but it applies to every hypervisor.

h2. Background

When maintenance mode is enabled on a host, the host state is set to 'PrepareForMaintenance' and its running VMs are migrated to other hosts. Once every VM has been migrated, the host moves to the 'Maintenance' state.

The checks are performed in the ResourceManagerImpl.checkAndMaintain() method:
* List VMs with host_id = HOST_ID
* List VMs with last_host_id = HOST_ID and state = Migrating

When both queries return no rows, the host can be put into Maintenance.

When a VM is being migrated to DEST_HOST, its host_id column is set to DEST_HOST, its last_host_id to ORIGIN_HOST, and its state to Migrating.
If the migration then fails, host_id is reset to ORIGIN_HOST, leaving host_id = last_host_id = ORIGIN_HOST.

h2. Issue

The following sequence puts a host into Maintenance while VMs are still running on it:
* Enable maintenance mode on ORIGIN_HOST.
* VMs start being migrated to another host, say DEST_HOST.
* checkAndMaintain() starts:
** The first check passes: no VM has host_id = ORIGIN_HOST, because those VMs are being migrated.
** Before the second check runs, one or more migrations fail.
** The second check also passes, because the failed VMs are no longer in the Migrating state, even though they are now running on ORIGIN_HOST again.
* The host goes into the Maintenance state.

Screenshots attached, with the following query executed in each case:
select id, name, instance_name, state, host_id, last_host_id from vm_instance;
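The race in the Issue section comes down to two checks reading the vm_instance rows at different moments. The Java sketch below models it in memory. The Vm class, the numeric host ids and the racyCheck(), snapshotCheck() and hostWouldEnterMaintenance() helpers are hypothetical simplifications, not CloudStack's ResourceManagerImpl code, and the post-failure state "Running" is an assumption. It also shows one possible way to close the window, evaluating both conditions against the same snapshot of the rows; this is an illustrative assumption, not necessarily how the issue was fixed for 4.11.1.0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the checkAndMaintain() race; not CloudStack code.
public class MaintenanceRaceSketch {

    static final long ORIGIN_HOST = 1L;
    static final long DEST_HOST = 2L;

    // Simplified stand-in for a vm_instance row (host_id, last_host_id, state).
    public static final class Vm {
        long hostId;
        Long lastHostId;
        String state;

        Vm(long hostId, Long lastHostId, String state) {
            this.hostId = hostId;
            this.lastHostId = lastHostId;
            this.state = state;
        }

        Vm copy() {
            return new Vm(hostId, lastHostId, state);
        }
    }

    // Check 1: no VM with host_id = HOST_ID.
    static boolean noVmsOnHost(List<Vm> vms, long hostId) {
        return vms.stream().noneMatch(vm -> vm.hostId == hostId);
    }

    // Check 2: no VM with last_host_id = HOST_ID and state = Migrating.
    static boolean noVmsMigratingAway(List<Vm> vms, long hostId) {
        return vms.stream().noneMatch(vm -> vm.lastHostId != null
                && vm.lastHostId == hostId
                && "Migrating".equals(vm.state));
    }

    // Racy variant: each check reads the live rows at a different moment.
    static boolean racyCheck(List<Vm> vms, long hostId, Runnable betweenChecks) {
        boolean first = noVmsOnHost(vms, hostId);
        betweenChecks.run(); // e.g. a migration fails here
        return first && noVmsMigratingAway(vms, hostId);
    }

    // Snapshot variant (assumed fix): both predicates see the same copy of the rows.
    static boolean snapshotCheck(List<Vm> vms, long hostId, Runnable betweenChecks) {
        List<Vm> snapshot = new ArrayList<>();
        for (Vm vm : vms) {
            snapshot.add(vm.copy());
        }
        betweenChecks.run(); // the failure only changes the live rows, not the snapshot
        return noVmsOnHost(snapshot, hostId) && noVmsMigratingAway(snapshot, hostId);
    }

    // One VM migrating ORIGIN_HOST -> DEST_HOST whose migration fails between the checks.
    public static boolean hostWouldEnterMaintenance(boolean useSnapshot) {
        List<Vm> vms = new ArrayList<>();
        vms.add(new Vm(DEST_HOST, ORIGIN_HOST, "Migrating"));
        Runnable migrationFails = () -> {
            Vm vm = vms.get(0);
            vm.hostId = ORIGIN_HOST; // host_id = last_host_id = ORIGIN_HOST
            vm.state = "Running";    // assumption: the VM keeps running on the origin host
        };
        return useSnapshot
                ? snapshotCheck(vms, ORIGIN_HOST, migrationFails)
                : racyCheck(vms, ORIGIN_HOST, migrationFails);
    }

    public static void main(String[] args) {
        System.out.println("racy: " + hostWouldEnterMaintenance(false));
        System.out.println("snapshot: " + hostWouldEnterMaintenance(true));
    }
}
```

Running main should print `racy: true` (the host wrongly qualifies for Maintenance) and `snapshot: false`. In the snapshot variant every VM row is observed in exactly one state, either still Migrating or back on ORIGIN_HOST, so one of the two conditions always catches it.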