GitHub user remibergsma opened a pull request:

    https://github.com/apache/cloudstack/pull/211

    return a state instead of null in AbstractInvestigatorImpl

    When a full cluster is down or unreachable,
    CloudStack currently keeps reporting everything
    in its last known state, which is usually Up.
    When it cannot reach a host and cannot reach
    another host in the same cluster either, the
    investigator returns null, meaning "I don't
    know". This prevents CloudStack from reporting
    the problem. With this change, we return an
    Alert or Disconnected state instead, so proper
    action can be taken.
    
    Logging was also added, so we know which part
    of the code set the state to Alert or
    Disconnected.
    
    When the host is available again, it goes
    from Alert state back to Up and CloudStack
    starts HA work to recover the VMs. I tested
    it on 4.6/master and it works fine now.
    
    As this is a nasty bug, we might also want to
    fix it in 4.5 and 4.4.
    
    Thanks to @dahn and @snuf for their
    help solving this issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/remibergsma/cloudstack investigator_null_state_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #211
    
----
commit 78e095e64b2344a49e96a7939ca6edd3b36d93dd
Author: Remi Bergsma <git...@remi.nl>
Date:   2015-04-29T18:14:14Z

    return a state instead of null
    
    When a full cluster is down or unreachable,
    CloudStack currently keeps reporting everything
    in its last known state, which is usually Up.
    When it cannot reach a host and cannot reach
    another host in the same cluster either, the
    investigator returns null, meaning "I don't
    know". This prevents CloudStack from reporting
    the problem. With this change, we return an
    Alert or Disconnected state instead, so proper
    action can be taken.
    
    Logging was also added, so we know which part
    of the code set the state to Alert or
    Disconnected.
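
The idea described in the commit message can be illustrated with a minimal,
self-contained Java sketch. It is not the code from the patch: the class
`InvestigatorSketch`, the `HostStatus` enum, the `probe` callback and the
`fallback` parameter are all hypothetical names chosen for illustration, and
the rule that a peer's negative answer maps to DOWN is an assumption. The
point it shows is only that, when no peer in the cluster can be asked, the
method returns an explicit state rather than null.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical sketch; none of these names come from the CloudStack code base.
final class InvestigatorSketch {

    // Simplified stand-in for a host state enum (assumption).
    enum HostStatus { UP, DOWN, ALERT, DISCONNECTED }

    /**
     * Asks each peer whether it can reach the target host.
     *
     * probe.apply(peer, target) returns:
     *   Optional.of(true)  - the peer answered and can reach the target
     *   Optional.of(false) - the peer answered and cannot reach the target
     *   Optional.empty()   - the peer itself could not be contacted
     *
     * fallback is the state returned when no peer could be contacted at all,
     * which is the case that would otherwise have produced null.
     */
    static HostStatus investigate(
            String target,
            List<String> peers,
            BiFunction<String, String, Optional<Boolean>> probe,
            HostStatus fallback) {
        boolean anyPeerAnswered = false;
        for (String peer : peers) {
            Optional<Boolean> reachable = probe.apply(peer, target);
            if (!reachable.isPresent()) {
                continue; // this peer is unreachable too, try the next one
            }
            if (reachable.get()) {
                return HostStatus.UP; // at least one peer still sees the host
            }
            anyPeerAnswered = true;
        }
        if (anyPeerAnswered) {
            return HostStatus.DOWN; // assumption: a peer's "no" means Down
        }
        // Nobody could be asked (for example, the whole cluster is down).
        // Return an explicit state instead of null so the caller can act on it.
        return fallback;
    }
}
```

A caller would pass `HostStatus.ALERT` or `HostStatus.DISCONNECTED` as the
fallback; the description above does not say which state applies in which
situation, so the sketch leaves that choice to the caller.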

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
