I would prefer “A bigger design fix would be to make management server asynchronous of agent side answer/response handling”. However, I understand the volume of changes that would require.
I looked at the PR, and everything seems fine there. Still, we may need more time to review it and to think through the possible consequences of such a change.

On Fri, May 11, 2018 at 7:55 AM, Rohit Yadav <rohit.ya...@shapeblue.com> wrote:

> All,
>
> Historically, when the agent (kvm, ssvm, cpvm) is disconnected from the
> management server (say, due to a mgmt server restart), the reconnection
> logic waits for any pending tasks/commands to complete before reconnection
> attempts are made. I tried to search the git history but could not find a
> reason. Can anyone share why we may need this?
>
> Based on the reported issue:
>
> https://github.com/apache/cloudstack/issues/2633
>
> I have a working patch which removes this limitation:
>
> https://github.com/apache/cloudstack/pull/2638
>
> From testing with various combinations of tasks, I found that when this
> happens, even if the pending task succeeds, the agent fails to send an
> Answer to the mgmt server; therefore, from the control plane's
> perspective, that task is still pending/ongoing.
>
> When the mgmt server comes back online and the agent finally reconnects
> (depending on how long the pending task took), the executed operation is
> still pending in the mgmt server's view and may sometimes require manual
> cleanup in the database. By removing the limitation in the above PR, at
> least the agent reconnects faster, while the failure/fault behaviours
> remain the same. A bigger design fix would be to make management server
> asynchronous of agent side answer/response handling.
>
> - Rohit
>
> <https://cloudstack.apache.org>
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> @shapeblue

--
Rafael Weingärtner