Re: KVM HA BUG 4.11.1.0 centos 7

2018-08-13 Thread Thomas Heil
Hi,

I've added more hosts and enabled HA on all of them. I then took down
node cs-hv-06, which was running r-199. Here are the logs I am getting.

--
2018-08-13 12:12:51,402 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(pool-5-thread-1:null) (logid:b71a09c7) Notifying HA Mgr of to restart
vm 199-r-199-VM
2018-08-13 12:12:51,410 INFO  [c.c.h.HighAvailabilityManagerImpl]
(pool-5-thread-1:null) (logid:b71a09c7) Schedule vm for HA: 
VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,418 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Processing work
HAWork[48-HA-199-Running-Investigating]
2018-08-13 12:12:51,421 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) HA on
VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,424 DEBUG [c.c.h.CheckOnAgentInvestigator]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Unable to reach the
agent for VM[DomainRouter|r-199-VM]: Resource [Host:32] is unreachable:
Host 32: Host with specified id is not in the right state: Down
2018-08-13 12:12:51,424 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) SimpleInvestigator
could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,424 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3)
XenServerInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,426 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) KVMInvestigator
found VM[DomainRouter|r-199-VM] to be alive? true
2018-08-13 12:12:51,426 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) VM r-199-VM is found
to be alive by KVMInvestigator
2018-08-13 12:12:51,426 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Rescheduling work
HAWork[48-HA-199-Running-Investigating] to try again at Mon Aug 13
12:13:52 CEST 2018
2018-08-13 12:14:51,431 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Processing work
HAWork[48-HA-199-Running-Investigating]
2018-08-13 12:14:51,433 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) HA on
VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,436 DEBUG [c.c.h.CheckOnAgentInvestigator]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Unable to reach the
agent for VM[DomainRouter|r-199-VM]: Resource [Host:32] is unreachable:
Host 32: Host with specified id is not in the right state: Down
2018-08-13 12:14:51,436 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) SimpleInvestigator
could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,436 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e)
XenServerInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,438 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) KVMInvestigator
found VM[DomainRouter|r-199-VM] to be alive? true
2018-08-13 12:14:51,438 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) VM r-199-VM is found
to be alive by KVMInvestigator
2018-08-13 12:14:51,438 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Rescheduling work
HAWork[48-HA-199-Running-Investigating] to try again at Mon Aug 13
12:15:52 CEST 2018
--

The router is dead, but it is still detected as alive. It also does not
seem to be the bug in https://issues.apache.org/jira/browse/CLOUDSTACK-3535.
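
For anyone comparing their own management-server log against the excerpt
above, here is a minimal Python sketch that joins the wrapped log lines back
into entries and counts what each investigator reported per VM. The regular
expressions are assumptions derived only from the message wording in this
excerpt, and the function names are made up for illustration; other
CloudStack versions may word these messages differently.

```python
import re
from collections import Counter

# Assumption: every log entry starts with a timestamp like the ones above.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
# Patterns taken from the wording in this excerpt only.
VERDICT = re.compile(
    r"(\w+Investigator) found (VM\[[^\]]+\]) to be alive\? (true|false)")
NO_VERDICT = re.compile(r"(\w+Investigator) could not find (VM\[[^\]]+\])")


def join_entries(lines):
    """Merge mail-wrapped lines into one string per log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    # Collapse runs of whitespace left over from the wrapping.
    return [" ".join(entry.split()) for entry in entries]


def summarize(lines):
    """Count (investigator, vm, verdict) triples seen in the log."""
    verdicts = Counter()
    for entry in join_entries(lines):
        match = VERDICT.search(entry)
        if match:
            verdicts[match.groups()] += 1
            continue
        match = NO_VERDICT.search(entry)
        if match:
            verdicts[(match.group(1), match.group(2), "unknown")] += 1
    return verdicts


SAMPLE = """\
2018-08-13 12:12:51,424 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) SimpleInvestigator
could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,426 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) KVMInvestigator
found VM[DomainRouter|r-199-VM] to be alive? true
"""

if __name__ == "__main__":
    for key, count in summarize(SAMPLE.splitlines()).items():
        print(key, count)
```

Run against the full excerpt, a summary like this makes the pattern easy to
spot: KVMInvestigator reports "true" on every pass, so the HA work item is
only ever rescheduled.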

kind regards,
thomas


On 02.08.2018 16:46, Thomas Heil wrote:
> Hi,
>
> I have a setup with one advanced zone, one cluster and two hosts. The
> hosts are KVM and use one NFS storage for primary and one for
> secondary.
>
> Everything runs smoothly until I remove power from one host.
>
> In my opinion CloudStack should now declare the faulty host as
> dead, declare the VMs etc. that were running there as dead, and
> start them on the remaining KVM host.
>
> But nothing happens. The VMs' states remain Running, the system VMs
> become 'Agent State: Disconnected' and that's all.
>
> The only way I found to fix this was to set the state of all
> VMs that were running on the faulty node to 'Stopped'.
>
> Could anybody confirm that this is a reproducible problem?
>
> kind regards,
> thomas




Hybrid Event Bus for CloudStack (for CloudStack users who use Kafka or RMQ)

2018-08-13 Thread Ivan Kudryavtsev
Hello, users, devs.

Today we would like to share a new event bus plug-in (hybrid), which combines
the inMemory (default) event bus with the Kafka or RabbitMQ event bus:
inMemory is used for intra-CloudStack purposes, and RMQ or Kafka is used for
external consumers.

We use the Kafka Event Bus in our deployment, and while developing a new
plug-in (which we will publish soon) we discovered that this bus lacks
functionality: internal event receivers are not able to subscribe to events
when that bus is used. This is quite nasty, because it cuts off certain
CloudStack functionality and makes it impossible to subscribe to events
even inside the CloudStack core, e.g. to implement delayed processing. So
with the standard Kafka Event Bus these functions stop working, and the
regression is not easy to discover. I know that the RabbitMQ implementation
also has certain subscription problems.

To overcome this, we implemented a "hybrid" event bus, which uses inMemory
for internals and any other bus for external communications.
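
To illustrate the idea only, here is a minimal Python sketch of such a
fan-out. It is not the plug-in's code (the plug-in targets CloudStack's own
event bus interface); all class and method names below are hypothetical,
and the external side is a stand-in that merely records what would be sent
to Kafka or RabbitMQ.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]
Handler = Callable[[Event], None]


class InMemoryBus:
    """Hypothetical in-process bus: supports both subscribe and publish."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in list(self._subscribers.get(topic, [])):
            handler(event)


class RecordingExternalBus:
    """Stand-in for a Kafka/RabbitMQ publisher: publish-only, no subscribe."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, Event]] = []

    def publish(self, topic: str, event: Event) -> None:
        # A real implementation would hand the event to a broker client here.
        self.sent.append((topic, event))


class HybridBus:
    """Subscriptions go to the internal bus; publishes go to both buses."""

    def __init__(self, internal: InMemoryBus,
                 external: RecordingExternalBus) -> None:
        self._internal = internal
        self._external = external

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._internal.subscribe(topic, handler)

    def publish(self, topic: str, event: Event) -> None:
        self._internal.publish(topic, event)
        self._external.publish(topic, event)
```

With this shape, an internal receiver that subscribes through the hybrid bus
still gets its events, while the same events are also forwarded to the
external bus.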

You can find that bus on GitHub:
https://github.com/bwsw/cloud-plugin-event-bus-hybrid

License: Apache 2

If the developers and community wish to merge it into the main tree, please
feel free; we grant permission. We can also submit it ourselves if the
project leaders find it worthwhile.



-- 
With best regards, Ivan Kudryavtsev
Bitworks LLC
Cell: +7-923-414-1515
WWW: http://bitworks.software/