[
https://issues.apache.org/jira/browse/CLOUDSTACK-8949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rafael Weingärtner updated CLOUDSTACK-8949:
-------------------------------------------
Affects Version/s: (was: 4.3.0)
Environment: (was: XenServer 6.2)
Fix Version/s: Future
Component/s: Doc
Issue Type: Improvement (was: Bug)
> Job timeout caused by improper/inconsistent application of wait parameters
> --------------------------------------------------------------------------
>
> Key: CLOUDSTACK-8949
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8949
> Project: CloudStack
> Issue Type: Improvement
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Doc, Hypervisor Controller
> Reporter: Ryan Farrington
> Priority: Minor
> Fix For: Future
>
>
> when creating a "MigrateVolumeCommand" a timeout should be set, this way the
> ACS would not use the default value of parameter "wait". That timeout value
> should be the same as the one used on CitrixResourceBase and its children to
> control the migration of volumes.
> Environment: XenServer 6.2, multiple pre-setup primary storage locations
> that are iscsi SRs in XenServer.
> Summary of actions: Attempt to move a volume that is attached to a
> running instance to another primary storage location.
> What *I* assume should happen: Cloudstack communicates with the host of
> the VM in question and issues a xe vdi-pool-migrate. The command is
> synchronous and doesn't return control back until it has performed the
> action. Cloudstack should know this is a synchronous call and wait the
> configured amount of time for the migration to be performed.
> What *I* think is happening(with some education from Rafael Weingärtner):
> There are multiple timers being set in the process (please reference
> https://issues.apache.org/jira/browse/CLOUDSTACK-8946 for the strangely
> documented and implemented wait parameters) when the timer that should be
> used for all the timeouts is the one that is currently defined as the
> migratewait.
> Here is a full transaction of a 1TB volume migration for instance i-34-311.
> {noformat}
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d
> ctx-c6b92515) Seq 38-996939857: Sending { Cmd , MgmtId: 42756806312036, via:
> 38(xen-nc-bc2b7), Ver: v1, Flags: 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d
> ctx-c6b92515) Seq 38-996939857: Executing: { Cmd , MgmtId: 42756806312036,
> via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-494:ctx-5aac3f29) Seq 38-996939857: Executing request
> 2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some
> more time because this is the current command
> 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some
> more time because this is the current command
> 2015-10-12 18:41:20,457 INFO [c.c.u.e.CSExceptionErrorCode]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception:
> com.cloud.exception.OperationTimedoutException in error code list for
> exceptions
> 2015-10-12 18:41:20,465 WARN [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on
> Seq 38-996939857: { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
> Ver: v1, Flags: 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more
> commands found
> 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to
> Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857
> to Host 38 timed out after 7200
> 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due
> to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
> 996939857 to Host 38 timed out after 7200
> at
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
> at
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
> at
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
> at
> org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
> at
> org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
> at
> com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
> at
> com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
> at
> com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
> at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:622)
> at
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> at
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> at
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
> at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
> at
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
> at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
> at
> com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
> at
> com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at
> com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
> at
> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:701)
> 2015-10-12 18:41:20,479 WARN [o.a.c.s.d.ObjectInDataStoreManagerImpl]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME,
> org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no
> need to delete from object in store ref table
> 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl]
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume
> failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send
> command, due to Agent:38, com.cloud.exception.OperationTimedoutException:
> Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED,
> resultCode: 530, result:
> org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
> to migrate volume"}
> 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Job-Executor-63:ctx-f7b6817d) Done executing
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> 2015-10-12 18:41:20,489 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)