Hi everybody,
We need to fail over VMs using a disk image on RBD, without losing data, and we want to keep the downtime as short as possible.
We have:
- Two hypervisors, each running a Ceph Monitor and a Ceph OSD.
- A third machine running a Ceph Monitor and a Ceph Manager.
The VMs run under qemu. The VM disks are on a replicated rbd pool backed by the two OSDs.
Ceph version: Nautilus
Distribution: Yocto Zeus
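For completeness, this is roughly how we check the cluster layout and the pool replication settings (the pool name "rbd" below is just an example, ours may differ):

    ceph -s                          # overall cluster health
    ceph osd tree                    # one OSD per hypervisor
    ceph osd pool get rbd size       # expect 2
    ceph osd pool get rbd min_size   # expect 1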
The test is the following: we electrically power off one hypervisor (and therefore one Ceph Monitor and one Ceph OSD), which causes its VMs to be restarted on the second hypervisor.
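While the test runs we watch the cluster from the surviving node, roughly like this:

    ceph -w              # follow the cluster log during the power-off
    ceph health detail   # confirm the lost OSD and monitor are reported down
    ceph osd tree        # the lost OSD should show as "down"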
My main issue is that mounting a partition read-write is very slow in the failover case (i.e. after the loss of an OSD and its monitor).
With failover we can write on the device after ~25s:

    [   25.609074] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null)

In a normal boot we can write on the device after ~4s:

    [    3.087412] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null)
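The timings above are read from the guest kernel log once the VM is up again, e.g.:

    # inside the guest, after it has been restarted on the surviving hypervisor
    dmesg | grep 'EXT4-fs (vda3)'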
I haven't been able to reduce this time by tuning Ceph settings, and I am wondering if someone could help me with that.
Here is our configuration.
ceph.conf:

    [global]
    fsid = fa7a17d1-5351-459e-bf0e-07e7edc9a625
    mon initial members = hypervisor1,hypervisor2,observer
    mon host = 192.168.217.131,192.168.217.132,192.168.217.133
    public network = 192.168.217.0/24
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx
    osd journal size = 1024
    osd pool default size = 2
    osd pool default min size = 1
    osd crush chooseleaf type = 1
    mon osd adjust heartbeat grace = false
    mon osd min down reporters = 1

    [mon.hypervisor1]
    host = hypervisor1
    mon addr = 192.168.217.131:6789

    [mon.hypervisor2]
    host = hypervisor2
    mon addr = 192.168.217.132:6789

    [mon.observer]
    host = observer
    mon addr = 192.168.217.133:6789

    [osd.0]
    host = hypervisor1
    public_addr = 192.168.217.131
    cluster_addr = 192.168.217.131

    [osd.1]
    host = hypervisor2
    public_addr = 192.168.217.132
    cluster_addr = 192.168.217.13
    # ceph config dump
    WHO     MASK  LEVEL     OPTION                            VALUE     RO
    global        advanced  mon_osd_adjust_down_out_interval  false
    global        advanced  mon_osd_adjust_heartbeat_grace    false
    global        advanced  mon_osd_down_out_interval         5
    global        advanced  mon_osd_report_timeout            4
    global        advanced  osd_beacon_report_interval        1
    global        advanced  osd_heartbeat_grace               2
    global        advanced  osd_heartbeat_interval            1
    global        advanced  osd_mon_ack_timeout               1.000000
    global        advanced  osd_mon_heartbeat_interval        2
    global        advanced  osd_mon_report_interval           3
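For reference, the overrides shown in the dump live in the monitors' configuration database and can be set (or adjusted further) at runtime with "ceph config set", for example:

    # tighten failure detection; values match the dump above
    ceph config set global osd_heartbeat_interval 1
    ceph config set global osd_heartbeat_grace 2
    ceph config set global mon_osd_down_out_interval 5
    ceph config dump   # verify the result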
Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io