Hi everybody,
We need to fail over VMs using a disk image on RBD, without losing data, and we want to keep the downtime as short as possible.
We have:
- Two hypervisors, each running a Ceph Monitor and a Ceph OSD.
- A third machine running a Ceph Monitor and a Ceph Manager.
The VMs run under qemu. The VM disks are on a replicated rbd pool backed by the two OSDs.
Ceph version: Nautilus
Distribution: Yocto Zeus
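For completeness, this is roughly how we check the cluster layout and the pool replication settings (the pool name "rbd" below is just an example, ours may differ):

    ceph -s                          # overall cluster health
    ceph osd tree                    # one OSD per hypervisor
    ceph osd pool get rbd size       # expect 2
    ceph osd pool get rbd min_size   # expect 1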
The test is the following: we electrically power off one hypervisor (and therefore one Ceph Monitor and one Ceph OSD), which causes its VMs to be restarted on the second hypervisor.
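While the test runs we watch the cluster from the surviving node, roughly like this:

    ceph -w              # follow the cluster log during the power-off
    ceph health detail   # confirm the lost OSD and monitor are reported down
    ceph osd tree        # the lost OSD should show as "down"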
My main issue is that mounting a partition read-write is very slow in the failover case (i.e. after the loss of an OSD and its monitor).
With failover we can write on the device after ~25s:

    [   25.609074] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null)

In a normal boot we can write on the device after ~4s:

    [    3.087412] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null)
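The timings above are read from the guest kernel log once the VM is up again, e.g.:

    # inside the guest, after it has been restarted on the surviving hypervisor
    dmesg | grep 'EXT4-fs (vda3)'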
I haven't been able to reduce this time by tuning Ceph settings, and I am wondering if someone could help me with that.
Here is our configuration.
ceph.conf:

    [global]
    fsid = fa7a17d1-5351-459e-bf0e-07e7edc9a625
    mon initial members = hypervisor1,hypervisor2,observer
    mon host = 192.168.217.131,192.168.217.132,192.168.217.133
    public network = 192.168.217.0/24
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx
    osd journal size = 1024
    osd pool default size = 2
    osd pool default min size = 1
    osd crush chooseleaf type = 1
    mon osd adjust heartbeat grace = false
    mon osd min down reporters = 1

    [mon.hypervisor1]
    host = hypervisor1
    mon addr = 192.168.217.131:6789

    [mon.hypervisor2]
    host = hypervisor2
    mon addr = 192.168.217.132:6789

    [mon.observer]
    host = observer
    mon addr = 192.168.217.133:6789

    [osd.0]
    host = hypervisor1
    public_addr = 192.168.217.131
    cluster_addr = 192.168.217.131

    [osd.1]
    host = hypervisor2
    public_addr = 192.168.217.132
    cluster_addr = 192.168.217.13
    # ceph config dump
    WHO     MASK  LEVEL     OPTION                            VALUE     RO
    global        advanced  mon_osd_adjust_down_out_interval  false
    global        advanced  mon_osd_adjust_heartbeat_grace    false
    global        advanced  mon_osd_down_out_interval         5
    global        advanced  mon_osd_report_timeout            4
    global        advanced  osd_beacon_report_interval        1
    global        advanced  osd_heartbeat_grace               2
    global        advanced  osd_heartbeat_interval            1
    global        advanced  osd_mon_ack_timeout               1.000000
    global        advanced  osd_mon_heartbeat_interval        2
    global        advanced  osd_mon_report_interval           3
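For reference, the overrides shown in the dump live in the monitors' configuration database and can be set (or adjusted further) at runtime with "ceph config set", for example:

    # tighten failure detection; values match the dump above
    ceph config set global osd_heartbeat_interval 1
    ceph config set global osd_heartbeat_grace 2
    ceph config set global mon_osd_down_out_interval 5
    ceph config dump   # verify the result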
Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io