Thx Wido!

On Tue, 28 May 2019 at 13:51, Wido den Hollander <w...@widodh.nl> wrote:
>
> On 5/28/19 1:48 PM, li jerry wrote:
> > Hi Wido
> >
> > The key I entered in CloudStack is the following:
> >
> > [root@cn01-nodeb ~]# ceph auth get client.cloudstack
> > exported keyring for client.cloudstack
> > [client.cloudstack]
> >         key = AQDTh7pcIJjNIhAAwk8jtxilJWXQR7osJRFMLw==
> >         caps mon = "allow r"
> >         caps osd = "allow rwx pool=rbd"
> >
>
> That's the problem :-) Your user needs to be updated.
>
> The caps should be:
>
> [client.cloudstack]
>         key = AQDTh7pcIJjNIhAAwk8jtxilJWXQR7osJRFMLw==
>         caps mon = "profile rbd"
>         caps osd = "profile rbd pool=rbd"
>
> See: http://docs.ceph.com/docs/master/rbd/rbd-cloudstack/
>
> This will allow the client to blacklist the other and take over the
> exclusive-lock.
>
> Wido
>
> >
> > From: Wido den Hollander <w...@widodh.nl>
> > Sent: 28 May 2019, 19:42
> > To: dev@cloudstack.apache.org; li jerry <div...@hotmail.com>;
> >     us...@cloudstack.apache.org
> > Subject: Re: RBD primary storage VM encounters Exclusive Lock after
> >     triggering HA
> >
> >
> > On 5/28/19 6:16 AM, li jerry wrote:
> >> Hello guys
> >>
> >> We’ve deployed an environment with CloudStack 4.11.2 and
> >> KVM (CentOS 7.6), with Ceph 13.2.5 as the primary storage.
> >> We found some issues with the HA solution, and we are here to ask
> >> for your suggestions.
> >>
> >> We’ve enabled both the VM HA and Host HA features in CloudStack,
> >> and the compute offering is tagged as ha.
> >> When we perform a power failure test (unplugging 1 node of 4), the
> >> VMs running on the removed node are automatically rescheduled to
> >> the other living nodes after 5 minutes, but none of them can boot
> >> into the OS. We found that the boot procedure gets stuck on I/O
> >> read/write failures.
> >>
> >> The following information is shown after the VM starts:
> >>
> >> Generating "/run/initramfs/rdsosreport.txt"
> >>
> >> Entering emergency mode. Exit the shell to continue.
> >> Type "journalctl" to view system logs.
> >> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
> >> after mounting them and attach it to a bug report
> >>
> >> :/#
> >>
> >> We found this is caused by the lock on the image:
> >> [root@cn01-nodea ~]# rbd lock list a93010b0-2be2-49bd-b25e-ec89b3a98b4b
> >> There is 1 exclusive lock on this image.
> >> Locker         ID                  Address
> >> client.1164351 auto 94464726847232 10.226.16.128:0/3002249644
> >>
> >> If we remove the lock from the image and restart the VM from
> >> CloudStack, the VM boots successfully.
> >>
> >> We know that disabling the Exclusive Lock feature for Ceph (by
> >> setting rbd_default_features = 3) would solve this problem. But we
> >> don’t think it’s the best solution for HA, so could you please give
> >> us some ideas about how you handle this and what the best practice
> >> is for this feature?
> >>
> >
> > exclusive-lock is something to prevent a split-brain and having two
> > clients write to it at the same time.
> >
> > The lock should be released to the other client if this is requested,
> > but I have the feeling that you might have a cephx problem there.
> >
> > Can you post the output of:
> >
> > $ ceph auth get client.X
> >
> > Where you replace X by the user you are using for CloudStack? Also
> > remove the 'key', I don't need that.
> >
> > I want to look at the caps of the user.
> >
> > Wido
> >
> >> Thanks.
> >>
> >
>

--

Andrija Panić
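
The caps check discussed in this thread can be sketched as a small script. This is only an illustration, not something posted by the participants: it parses keyring text in the format printed by `ceph auth get`, reports whether the mon and osd caps use the rbd profile, and prints the `ceph auth caps` command that would set them. The helper names (`parse_caps`, `needs_rbd_profile`, `caps_update_command`) are hypothetical, the script never runs `ceph` itself, and it assumes Python 3.8+ for `shlex.join`.

```python
import shlex


def parse_caps(keyring_text):
    """Return a dict such as {'mon': ..., 'osd': ...} from keyring text."""
    caps = {}
    for line in keyring_text.splitlines():
        line = line.strip()
        if line.startswith("caps ") and "=" in line:
            name, value = line[len("caps "):].split("=", 1)
            caps[name.strip()] = value.strip().strip('"')
    return caps


def needs_rbd_profile(caps):
    """True if the mon or osd cap does not start with 'profile rbd'.

    This is a coarse prefix check (assumption): it does not tell
    'profile rbd' apart from variants such as 'profile rbd-read-only'.
    """
    return not all(
        caps.get(kind, "").startswith("profile rbd") for kind in ("mon", "osd")
    )


def caps_update_command(user, pool):
    """Build (but do not execute) the argv that sets the rbd profile caps."""
    return [
        "ceph", "auth", "caps", user,
        "mon", "profile rbd",
        "osd", f"profile rbd pool={pool}",
    ]


# Caps as posted in the thread, with the key omitted.
OLD_KEYRING = """[client.cloudstack]
        caps mon = "allow r"
        caps osd = "allow rwx pool=rbd"
"""

caps = parse_caps(OLD_KEYRING)
if needs_rbd_profile(caps):
    # Print the command for review instead of running it.
    print(shlex.join(caps_update_command("client.cloudstack", "rbd")))
```

For the user in this thread the script prints `ceph auth caps client.cloudstack mon 'profile rbd' osd 'profile rbd pool=rbd'`. Note that `ceph auth caps` replaces all existing caps of the entity, so every cap the user should keep has to be listed in the same command.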