jgotteswinter opened a new issue, #13010:
URL: https://github.com/apache/cloudstack/issues/13010

   ### problem
   
   While enabling maintenance mode, I see random instances getting stopped while 
the host is being evacuated. The majority are migrated without any issues, but 
sometimes an instance that should have been live migrated is stopped instead.
   
   The management server log shows the following:
   
   `2026-04-13 10:26:54,986 INFO  [c.c.h.HighAvailabilityManagerExtImpl] 
(HA-Worker-1:[ctx-7abbe53d, work-3314]) (logid:5ce65c99) Migration attempt: for 
VM VM instance 
{"id":4930,"instanceName":"i-55-4930-VM","state":"Running","type":"User","uuid":"cf19-00b6-465e-98f1-c63b4860498d"}from
 host Host 
{"id":18,"name":"XXXch02","type":"Routing","uuid":"dc51-a18d-4f7d-9a2e-7dfbb7a1b908"}.
 Starting attempt: 1/5 times.
   2026-04-13 10:42:32,197 INFO  [c.c.v.ClusteredVirtualMachineManagerImpl] 
(Work-Job-Executor-21:[ctx-1e4a3543, job-742712/job-743693, ctx-ff1f267b]) 
(logid:279e8d1b) Migrating VM instance 
{"id":4930,"instanceName":"i-55-4930-VM","state":"Running","type":"User","uuid":"cf19-00b6-465e-98f1-c63b4860498d"}
 to 
Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] 
: Dest[Zone(3)-Pod(3)-Cluster(3)-Host(18)-Storage()]
   2026-04-13 10:42:32,349 WARN  [c.c.v.ClusteredVirtualMachineManagerImpl] 
(Work-Job-Executor-21:[ctx-1e4a3543, job-742712/job-743693, ctx-ff1f267b]) 
(logid:279e8d1b) Unable to migrate VM instance 
{"id":4930,"instanceName":"i-55-4930-VM","state":"Running","type":"User","uuid":"cf19-00b6-465e-98f1-c63b4860498d"}
 to Host 
{"id":18,"name":"XXXch02","type":"Routing","uuid":"dc51-a18d-4f7d-9a2e-7dfbb7a1b908"}
 due to [Resource [Host:18] is unreachable: Host 18: Operation timed out] 
com.cloud.exception.AgentUnavailableException: Resource [Host:18] is 
unreachable: Host 18: Operation timed out
   2026-04-13 10:43:27,247 INFO  [c.c.r.ResourceManagerImpl] 
(AgentMonitor-1:[ctx-6e6b2b3f]) (logid:afd387b5) Attempting maintenance for 
Host 
{"id":21,"name":"XXXch03","type":"Routing","uuid":"eacf-b3e7-4aa9-b4ae-ff5a41862c06"}
 found pending migration for VM instance 
{"id":4930,"instanceName":"i-55-4930-VM","state":"Stopping","type":"User","uuid":"cf19-00b6-465e-98f1-c63b4860498d"}.
   2026-04-13 10:43:40,248 ERROR [c.c.v.VmWorkJobHandlerProxy] 
(Work-Job-Executor-21:[ctx-1e4a3543, job-742712/job-743693, ctx-ff1f267b]) 
(logid:279e8d1b) Invocation exception, caused by: 
com.cloud.utils.exception.CloudRuntimeException: Unable to migrate VM instance 
{"id":4930,"instanceName":"i-55-4930-VM","state":"Running","type":"User","uuid":"cf19-00b6-465e-98f1-c63b4860498d"}
   2026-04-13 10:43:40,248 INFO  [c.c.v.VmWorkJobHandlerProxy] 
(Work-Job-Executor-21:[ctx-1e4a3543, job-742712/job-743693, ctx-ff1f267b]) 
(logid:279e8d1b) Rethrow exception 
com.cloud.utils.exception.CloudRuntimeException: Unable to migrate VM instance 
{"id":4930,"instanceName":"i-55-4930-VM","state":"Running","type":"User","uuid":"cf19-00b6-465e-98f1-c63b4860498d"}
   2026-04-13 10:43:40,248 ERROR [c.c.v.VmWorkJobDispatcher] 
(Work-Job-Executor-21:[ctx-1e4a3543, job-742712/job-743693]) (logid:279e8d1b) 
Unable to complete AsyncJob 
{"accountId":1,"cmd":"com.cloud.vm.VmWorkMigrateAway","cmdInfo":"rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAATQnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAFQ","cmdVersion":0,"completeMsid":null,"created":"Mon
 Apr 13 10:42:31 UTC 
2026","id":743693,"initMsid":90520733699643,"instanceId":null,"instanceType":null,"lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":1,"uuid":"1401-8cf9-4276-ab57-c6a844371dd2"},
 job origin: 742712 com.cloud.utils.exception.CloudRuntimeException: Unable to 
migrate VM instance 
{"id":4930,"instanceName":"i-55-4930-VM","state":"Running","type":"User","uuid":"cf19-00b6-465e-98f1-c63b4860498d"}`
   
   I would expect the instance to be left alone, up and running on its origin 
host, and the maintenance mode request to fail instead.
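
For anyone trying to spot the same pattern in their own management server log, here is a minimal sketch that prints the state reported for one instance over time. It is not part of CloudStack; the function name `vm_state_timeline` and the regular expressions are assumptions based only on the log format shown in the excerpt above (a timestamp, a log level, and an embedded `"instanceName":"...","state":"..."` pair).

```python
import re

# Assumed record header, e.g. "2026-04-13 10:26:54,986 INFO  [...]".
RECORD_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(TRACE|DEBUG|INFO|WARN|ERROR)\b"
)
# Assumed shape of the VM description embedded in each message.
VM_STATE = re.compile(r'"instanceName":"([^"]+)","state":"([^"]+)"')


def vm_state_timeline(log_text, instance_name):
    """Return (timestamp, level, state) for each record mentioning the instance."""
    starts = list(RECORD_START.finditer(log_text))
    timeline = []
    for i, match in enumerate(starts):
        # A record runs until the next record header (or the end of the text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        # Join hard-wrapped lines so the JSON fragment is on one line again.
        record = log_text[match.start():end].replace("\n", "")
        for name, state in VM_STATE.findall(record):
            if name == instance_name:
                timeline.append((match.group(1), match.group(2), state))
                break  # one entry per record is enough
    return timeline


if __name__ == "__main__":
    sample = (
        "2026-04-13 10:42:32,349 WARN  [x] Unable to migrate VM instance \n"
        '{"id":4930,"instanceName":"i-55-4930-VM","state":"Running"}\n'
        "2026-04-13 10:43:27,247 INFO  [y] found pending migration for VM instance \n"
        '{"id":4930,"instanceName":"i-55-4930-VM","state":"Stopping"}.\n'
    )
    for timestamp, level, state in vm_state_timeline(sample, "i-55-4930-VM"):
        print(timestamp, level, state)
```

A change from `Running` to `Stopping` shortly after an `Unable to migrate` warning, as in the excerpt above, is the sequence described in this report.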
   
   ### versions
   
   ACS 4.22
   Ubuntu 24.04
   KVM
   
   
   ### The steps to reproduce the bug
   
   1.
   2.
   3.
   ...
   
   
   ### What to do about it?
   
   _No response_

