[ https://issues.apache.org/jira/browse/CLOUDSTACK-8949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rafael Weingärtner updated CLOUDSTACK-8949: ------------------------------------------- Affects Version/s: (was: 4.3.0) Environment: (was: XenServer 6.2) Fix Version/s: Future Component/s: Doc Issue Type: Improvement (was: Bug) > Job timeout caused by improper/inconsistent application of wait parameters > -------------------------------------------------------------------------- > > Key: CLOUDSTACK-8949 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8949 > Project: CloudStack > Issue Type: Improvement > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Doc, Hypervisor Controller > Reporter: Ryan Farrington > Priority: Minor > Fix For: Future > > > when creating a "MigrateVolumeCommand" a timeout should be set, this way the > ACS would not use the default value of parameter "wait". That timeout value > should be the same as the one used on CitrixResourceBase and its children to > control the migration of volumes. > Environment: XenServer 6.2, multiple pre-setup primary storage locations > that are iscsi SRs in XenServer. > Summary of actions: Attempt to move a volume that is attached to a > running instance to another primary storage location. > What *I* assume should happen: Cloudstack communicates with the host of > the VM in question and issues a xe vdi-pool-migrate. The command is > synchronous and doesn't return control back until it has performed the > action. Cloudstack should know this is a synchronous call and wait the > configured amount of time for the migration to be performed. > What *I* think is happening(with some education from Rafael Weingärtner): > There are multiple timers being set in the process (please reference > https://issues.apache.org/jira/browse/CLOUDSTACK-8946 for the strangely > documented and implemented wait parameters) when the timer that should be > used for all the timeouts is the one that is currently defined as the > migratewait. > Here is a full transaction of a 1TB volume migration for instance i-34-311. > {noformat} > 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d > ctx-c6b92515) Seq 38-996939857: Sending { Cmd , MgmtId: 42756806312036, via: > 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, > [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] > } > 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d > ctx-c6b92515) Seq 38-996939857: Executing: { Cmd , MgmtId: 42756806312036, > via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, > [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] > } > 2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache] > (DirectAgent-494:ctx-5aac3f29) Seq 38-996939857: Executing request > 2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some > more time because this is the current command > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some > more time because this is the current command > 2015-10-12 18:41:20,457 INFO [c.c.u.e.CSExceptionErrorCode] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception: > com.cloud.exception.OperationTimedoutException in error code list for > exceptions > 2015-10-12 18:41:20,465 WARN [c.c.a.m.AgentAttache] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on > Seq 38-996939857: { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), > Ver: v1, Flags: 100111, > [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}] > } > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling. > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more > commands found > 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to > Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 > to Host 38 timed out after 7200 > 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed > com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due > to Agent:38, com.cloud.exception.OperationTimedoutException: Commands > 996939857 to Host 38 timed out after 7200 > at > org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116) > at > org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382) > at > org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421) > at > org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70) > at > org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931) > at > com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680) > at > com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666) > at > com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622) > at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:622) > at > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) > at > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) > at > org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) > at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source) > at > org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103) > at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161) > at > com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109) > at > com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) > at > com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63) > at > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:701) > 2015-10-12 18:41:20,479 WARN [o.a.c.s.d.ObjectInDataStoreManagerImpl] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, > org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no > need to delete from object in store ref table > 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume > failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send > command, due to Agent:38, com.cloud.exception.OperationTimedoutException: > Commands 996939857 to Host 38 timed out after 7200 > 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, > resultCode: 530, result: > org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed > to migrate volume"} > 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] > (Job-Executor-63:ctx-f7b6817d) Done executing > org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257 > 2015-10-12 18:41:20,489 INFO [o.a.c.f.j.i.AsyncJobMonitor] > (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)