Hi Li,

You might want to take a look at the following PR from Wido -
https://github.com/apache/cloudstack/pull/2985 - this is 4.12 only.

In other words, you are using Mimic, a non-LTS release of Ceph - and I have a
hard time believing that anyone is using this in production with CloudStack
(since it's a fairly recent Ceph release).

Test with ACS 4.12 and see if your problem goes away.
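
In the meantime, the manual workaround you describe (removing the stale
lock and restarting the VM) could be scripted roughly as below. This is only
a sketch: build_unlock_cmd is a made-up helper name, it assumes the
"rbd lock list" output format shown in your mail, and it only prints the
"rbd lock remove" command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph or CloudStack): reads the output of
# "rbd lock list <image>" on stdin and prints a matching "rbd lock remove"
# command for every locker it finds. It does NOT remove anything itself.
build_unlock_cmd() {
    image="$1"
    awk -v img="$image" '
        # Locker rows start with "client."; the header rows do not.
        $1 ~ /^client\./ {
            # The lock ID may contain spaces (e.g. "auto 94464726847232"):
            # it is everything between the locker and the address column.
            id = $2
            for (i = 3; i < NF; i++) id = id " " $i
            printf "rbd lock remove %s \"%s\" %s\n", img, id, $1
        }'
}

# Example (needs a working Ceph client, so it is left commented out):
# rbd lock list a93010b0-2be2-49bd-b25e-ec89b3a98b4b \
#     | build_unlock_cmd a93010b0-2be2-49bd-b25e-ec89b3a98b4b
```

Review the printed command before running it, and only remove a lock once
you are sure the failed host is really down (fenced). Otherwise two clients
may end up writing to the same image.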

@Wido den Hollander <w...@42on.com> , any thoughts?

Regards,
Andrija

On Tue, 28 May 2019 at 06:24, li jerry <div...@hotmail.com> wrote:

> Hello guys
>
> We’ve deployed an environment with CloudStack 4.11.2 and KVM (CentOS 7.6),
> with Ceph 13.2.5 as the primary storage.
> We found some issues with the HA solution, and we are here to ask for your
> suggestions.
>
> We’ve enabled both the VM HA and Host HA features in CloudStack, and the
> compute offering is tagged as ha.
> When we perform a power failure test (unplugging 1 node of 4), the
> running VMs on the removed node are automatically rescheduled to the other
> living nodes after 5 minutes, but none of them can boot into the OS. We
> found that the boot process gets stuck on IO read/write failures.
>
>
>
> The following is displayed after the VM starts:
>
> Generating "/run/initramfs/rdsosreport.txt"
>
> Entering emergency mode. Exit the shell to continue.
> Type "journalctl" to view system logs.
> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or
> /boot
> after mounting them and attach it to a bug report
>
> :/#
>
>
>
> We found this is caused by the lock on the image:
> [root@cn01-nodea ~]# rbd lock list a93010b0-2be2-49bd-b25e-ec89b3a98b4b
> There is 1 exclusive lock on this image.
> Locker         ID                  Address
> client.1164351 auto 94464726847232 10.226.16.128:0/3002249644
>
> If we remove the lock from the image, and restart the VM under CloudStack,
> this VM will boot successfully.
>
> We know that disabling the Exclusive Lock feature in Ceph (by setting
> rbd_default_features = 3) would solve this problem. But we don’t
> think it’s the best solution for HA, so could you please give us some
> ideas about how you handle this and what the best practice is for this
> feature?
>
> Thanks.
>
>

-- 

Andrija Panić
