[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-8949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafael Weingärtner updated CLOUDSTACK-8949:
-------------------------------------------
    Affects Version/s:     (was: 4.3.0)
          Environment:     (was: XenServer 6.2)
        Fix Version/s: Future
          Component/s: Doc
           Issue Type: Improvement  (was: Bug)

> Job timeout caused by improper/inconsistent application of wait parameters
> --------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-8949
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8949
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Doc, Hypervisor Controller
>            Reporter: Ryan Farrington
>            Priority: Minor
>             Fix For: Future
>
>
> when creating a "MigrateVolumeCommand" a timeout should be set, this way the 
> ACS would not use the default value of parameter "wait".  That timeout value 
> should be the same as the one used on CitrixResourceBase and its children to 
> control the migration of volumes.
> Environment:    XenServer 6.2, multiple pre-setup primary storage locations 
> that are iscsi SRs in XenServer.
> Summary of actions:      Attempt to move a volume that is attached to a 
> running instance to another primary storage location.
> What *I* assume should happen:    Cloudstack communicates with the host of 
> the VM in question and issues a xe vdi-pool-migrate.   The command is 
> synchronous and doesn't return control back until it has performed the 
> action.   Cloudstack should know this is a synchronous call and wait the 
> configured amount of time for the migration to be performed.
> What *I* think is happening(with some education from Rafael Weingärtner):    
> There are multiple timers being set in the process (please reference  
> https://issues.apache.org/jira/browse/CLOUDSTACK-8946 for the strangely 
> documented and implemented wait parameters) when the timer that should be 
> used for all the timeouts is the one that is currently defined as the 
> migratewait.  
> Here is a full transaction of a 1TB volume migration for instance i-34-311.
> {noformat}
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d 
> ctx-c6b92515) Seq 38-996939857: Sending  { Cmd , MgmtId: 42756806312036, via: 
> 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, 
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
>  }
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d 
> ctx-c6b92515) Seq 38-996939857: Executing:  { Cmd , MgmtId: 42756806312036, 
> via: 38(xen-nc-bc2b7), Ver: v1, Flags: 100111, 
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
>  }
> 2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache] 
> (DirectAgent-494:ctx-5aac3f29) Seq 38-996939857: Executing request
> 2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some 
> more time because this is the current command
> 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Waiting some 
> more time because this is the current command
> 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find exception: 
> com.cloud.exception.OperationTimedoutException in error code list for 
> exceptions
> 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Timed out on 
> Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), 
> Ver: v1, Flags: 100111, 
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
>  }
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: Cancelling.
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857: No more 
> commands found
> 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send command, due to 
> Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 
> to Host 38 timed out after 7200
> 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due 
> to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 
> 996939857 to Host 38 timed out after 7200
>         at 
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
>         at 
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
>         at 
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
>         at 
> org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
>         at 
> org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
>         at 
> com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
>         at 
> com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
>         at 
> com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
>         at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at 
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at 
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
>         at 
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
>         at 
> com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
>         at 
> com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
>         at 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at 
> com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
>         at 
> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
>         at 
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at 
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:701)
> 2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data object (VOLUME, 
> org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f), no 
> need to delete from object in store ref table
> 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] 
> (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume 
> failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send 
> command, due to Agent:38, com.cloud.exception.OperationTimedoutException: 
> Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] 
> (Job-Executor-63:ctx-f7b6817d) Complete async job-5257, jobStatus: FAILED, 
> resultCode: 530, result: 
> org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
>  to migrate volume"}
> 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] 
> (Job-Executor-63:ctx-f7b6817d) Done executing 
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> 2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] 
> (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job monitoring
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to