[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901633#comment-14901633
 ] 

ASF GitHub Bot commented on CLOUDSTACK-8883:
--------------------------------------------

GitHub user borisroman opened a pull request:

    https://github.com/apache/cloudstack/pull/863

    [BLOCKER][4.6]CLOUDSTACK-8883: Resolved connect/reconnect issue.

    Hi!
    
    @wilderrodrigues by implementing Callable you switched a couple of methods 
and fields. I switched them some more!
    
    The Agent wouldn't reconnect for two reasons.
    
    Problem 1: Selector was blocking.
    In the while loop at [1], _selector.select(); kept blocking after the 
connection was lost. This means _isStartup = false; at [2] was never executed. 
Therefore the call to isStartup() at [3] always returned true, resulting in an 
infinite loop.
    
    Resolution 1: Move the call to cleanUp() [4] before the loop that waits for 
isStartup() to turn false. cleanUp() will close() the _selector, which results 
in _isStartup being set to false.
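
A minimal sketch of Problem 1 and Resolution 1, not the actual Agent or NioConnection code. The class and method names (ReconnectOrderSketch, SelectorLoop, cleanUpThenWait, demo) are hypothetical stand-ins. It assumes only the behaviour described above: the flag is cleared after the select loop exits, and closing the selector is what unblocks select().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the selector loop and the reconnect wait; not CloudStack code.
final class ReconnectOrderSketch {

    /** Models the loop at [1]: the flag is only cleared once select() stops blocking. */
    static final class SelectorLoop implements Runnable {
        private final Selector selector;
        private volatile boolean startup = true;

        SelectorLoop(Selector selector) {
            this.selector = selector;
        }

        @Override
        public void run() {
            try {
                while (selector.isOpen()) {
                    // Blocks until a key is ready, or the selector is woken up or closed.
                    selector.select();
                }
            } catch (IOException | ClosedSelectorException e) {
                // Selector failed or was closed underneath us; fall through to the reset.
            } finally {
                startup = false; // plays the role of "_isStartup = false" at [2]
            }
        }

        boolean isStartup() {
            return startup;
        }

        /** Plays the role of cleanUp(): closing the selector unblocks select(). */
        void cleanUp() {
            try {
                selector.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Fixed order: close the selector first, then wait (bounded) for the flag to clear. */
    static boolean cleanUpThenWait(SelectorLoop loop, long timeoutMillis)
            throws InterruptedException {
        loop.cleanUp();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (loop.isStartup() && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        return !loop.isStartup();
    }

    /** Returns true if the flag stayed set while blocked and cleared after cleanUp(). */
    static boolean demo() {
        try {
            SelectorLoop loop = new SelectorLoop(Selector.open());
            Thread worker = new Thread(loop, "selector-sketch");
            worker.setDaemon(true);
            worker.start();
            Thread.sleep(200); // give the loop time to block in select()
            boolean stuckWhileBlocked = loop.isStartup();
            boolean clearedAfterCleanUp = cleanUpThenWait(loop, 5000);
            return stuckWhileBlocked && clearedAfterCleanUp;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flag cleared only after cleanUp(): " + demo());
    }
}
```

Waiting on isStartup() before calling cleanUp() would never finish in this sketch, because nothing else closes the selector. The bounded wait is my own addition so that the example cannot hang.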
    
    Problem 2: _isStartup & _isRunning were set to true even when init() threw 
an exception (ConnectException).
    The exception was caught, but only logged. No action was taken, so 
_isStartup & _isRunning were still set to true and the Agent thought it had 
connected successfully, though it hadn't.
    
    Resolution 2: Add a return to the catch block [5]. This way _isStartup 
& _isRunning aren't set to true when init() fails.
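
A minimal sketch of Resolution 2, again with hypothetical names (StartGuardSketch, Initializer, Connection) rather than the real NioConnection. It only illustrates the control flow: returning from the catch block keeps the two flags false when init() fails with a ConnectException.

```java
import java.net.ConnectException;

// Hypothetical stand-in for the start() logic; not CloudStack code.
final class StartGuardSketch {

    /** Stand-in for init(): may fail with ConnectException when no server is listening. */
    interface Initializer {
        void init() throws ConnectException;
    }

    static final class Connection {
        private final Initializer initializer;
        private boolean startup;
        private boolean running;

        Connection(Initializer initializer) {
            this.initializer = initializer;
        }

        void start() {
            try {
                initializer.init();
            } catch (ConnectException e) {
                System.err.println("Unable to connect: " + e.getMessage());
                // Without this return, the flags below would be set although
                // the connection attempt failed.
                return;
            }
            startup = true;
            running = true;
        }

        boolean isStartup() {
            return startup;
        }

        boolean isRunning() {
            return running;
        }
    }

    public static void main(String[] args) {
        Connection failed = new Connection(() -> {
            throw new ConnectException("Connection refused");
        });
        failed.start();
        System.out.println("failed init: startup=" + failed.isStartup()
                + " running=" + failed.isRunning());

        Connection ok = new Connection(() -> { });
        ok.start();
        System.out.println("successful init: startup=" + ok.isStartup()
                + " running=" + ok.isRunning());
    }
}
```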
    
    Steps to test:
    1. Deploy ACS.
    2. Try all combinations of stopping/starting management server/agent.
    
    
    
[1]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L128
    
[2]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L176
    
[3]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/agent/src/com/cloud/agent/Agent.java#L404
    
[4]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/agent/src/com/cloud/agent/Agent.java#L399
    
[5]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L91

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/borisroman/cloudstack CLOUDSTACK-8883

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #863
    
----
commit 9693b97c2147b3fdb9579a1ebb33597cd3bf1d11
Author: Boris Schrijver <bo...@pcextreme.nl>
Date:   2015-09-21T14:54:56Z

    Call cleanUp() before looping isStartup().

commit b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e
Author: Boris Schrijver <bo...@pcextreme.nl>
Date:   2015-09-21T22:38:16Z

    Added return statement to stop start() if there has been an 
ConnectException.

----


> [Blocker] KVM host goes into disconnected state when MS is restarted
> --------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-8883
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8883
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>    Affects Versions: 4.6.0
>            Reporter: Raja Pullela
>            Assignee: Boris Schrijver
>            Priority: Blocker
>             Fix For: 4.6.0
>
>
> steps to reproduce:
> - restart MS
> - see the KVM host status
> Expected
> - Agent should reconnect
> Actual
> - Host stays in disconnected state and Agent does not reconnect
> Apparently a recent commit broke this, and BVTs for KVM are all failing because 
> Hosts go into a disconnected state and the SSVM/CPVMs don't come up.  
> Current Agent Log - during the MS restart
> 2015-09-18 07:05:37,301 INFO  [kvm.storage.LibvirtStorageAdaptor] 
> (agentRequest-Handler-5:null) Asking libvirt to refresh 
> storage pool c8bd627f-101f-3215-8545-72f7ce50f2c6
> 2015-09-18 07:06:37,452 INFO  [kvm.storage.LibvirtStorageAdaptor] 
> (agentRequest-Handler-1:null) Trye pool c8bd627f-101f-3215-8545-72f7ce50f2c6 
> from libvirt
> 2015-09-18 07:06:37,469 INFO  [kvm.storage.LibvirtStorageAdaptor] 
> (agentRequest-Handler-1:null) Asking libvirt to refresh storage pool 
> c8bd627f-101f-3215-8545-72f7ce50f2c6
> 2015-09-18 07:07:32,417 INFO  [cloud.agent.Agent] (Agent-Handler-5:null) Lost 
> connection to the server. Dealing with the remaining commands...
> Previously Agent used to reconnect -
> 2015-09-18 12:15:11,902 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Reconnecting...
> 2015-09-18 12:15:11,903 INFO  [utils.nio.NioClient] (Agent-Selector:null) 
> Connecting to 10.147.28.47:8250
> 2015-09-18 12:15:11,904 WARN  [utils.nio.NioConnection] (Agent-Selector:null) 
> Unable to connect to remote: is there a server running on port 8250



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
