What are your pool size and min_size settings? An object with fewer than
min_size active replicas will not receive I/O (
http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas).
So with size=2 and min_size=2, an OSD failure blocks operations on every
object that had a replica on the failed OSD until the data has been
re-replicated elsewhere; with size=2 and min_size=1, I/O should resume as
soon as the failed OSD has been marked down.
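
You can check (and change) both settings per pool from the ceph CLI. A quick
sketch, assuming your RBD images live in the default "rbd" pool -- adjust the
pool name to whatever your KVM hosts actually use:

  # list the pools if you are unsure of the name
  ceph osd lspools
  # show the current replica counts
  ceph osd pool get rbd size
  ceph osd pool get rbd min_size
  # example: allow I/O with a single surviving replica (less safe, but writes
  # are not blocked while the second copy is rebuilt)
  ceph osd pool set rbd min_size 1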

On Sat, Nov 5, 2016 at 9:04 AM, fcid <f...@altavoz.net> wrote:

> Dear ceph community,
>
> I'm working on a small Ceph deployment for testing purposes, in which I
> want to test the high-availability features of Ceph and how clients are
> affected during outages in the cluster.
>
> This small cluster is deployed on 3 servers, each running 2 OSDs and 1
> monitor, and we are using it to serve RADOS block devices to KVM
> hypervisors on other hosts. The Ceph software was installed using
> ceph-deploy.
>
> For HA testing we are simulating disk failures by physically detaching OSD
> disks from the servers, and also by cutting power to the servers we want to
> fail.
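
For what it is worth, you can exercise the same failure handling without
pulling hardware. A rough sketch, assuming systemd-managed daemons as set up
by ceph-deploy (use the upstart/sysvinit equivalent if that is what your
distribution runs), with osd.1 as an example id:

  # stop one OSD daemon to simulate a dead disk
  systemctl stop ceph-osd@1
  # mark it out to start re-replication of its data onto the remaining OSDs
  ceph osd out 1
  # watch the cluster react
  ceph -w
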
>
> I have some questions regarding the behavior during OSD and disk failures
> under light workloads.
>
> During disk failures, the cluster takes a long time to promote the
> secondary OSD to primary, blocking all disk operations of the virtual
> machines using RBD until the cluster map is updated with the failed OSD
> (which can take up to 10 minutes in our cluster). Is this the expected
> behavior of the OSD cluster, or should it be transparent to clients when a
> disk fails?
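
How quickly a failed OSD is actually marked down (and later out, which is
what triggers re-replication) is governed by the heartbeat and monitor
timers rather than by the pool settings alone, so it is worth checking what
your cluster is running with. A rough sketch using the admin socket; the
daemon names below (osd.0, mon.ceph00) are only examples, adjust them to
your hosts:

  # on the host carrying osd.0: how long peers go without heartbeat replies
  # before reporting it as failed
  ceph daemon osd.0 config show | grep -E 'osd_heartbeat_interval|osd_heartbeat_grace'
  # on a monitor host: how the mon decides to mark OSDs down and out
  ceph daemon mon.ceph00 config show | grep -E 'mon_osd_min_down_reporters|mon_osd_report_timeout|mon_osd_down_out_interval'
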
>
> Thanks in advance, kind regards.
>
> Configuration and version of our ceph cluster:
>
> root@ceph00:~# cat /etc/ceph/ceph.conf
> [global]
> fsid = 440fce60-3097-4f1c-a489-c170e65d8e09
> mon_initial_members = ceph00
> mon_host = 192.168.x1.x1
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public network = 192.168.x.x/x
> cluster network = y.y.y.y/y
> [osd]
> osd mkfs options = -f -i size=2048 -n size=64k
> osd mount options xfs = inode64,noatime,logbsize=256k
> osd journal size = 20480
> filestore merge threshold = 40
> filestore split multiple = 8
> filestore xattr use omap = true
>
> root@ceph00:~# ceph -v
> ceph version 10.2.3
>
> --
> Fernando Cid O.
> Ingeniero de Operaciones
> AltaVoz S.A.
>  http://www.altavoz.net
> Viña del Mar, Valparaiso:
>  2 Poniente 355 of 53
>  +56 32 276 8060
> Santiago:
>  San Pío X 2460, oficina 304, Providencia
>  +56 2 2585 4264
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
