What are your pool size and min_size settings? An object with fewer than min_size replicas will not receive I/O (http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas). So if size=2 and min_size=2, a single OSD failure blocks operations on all objects located on the failed OSD until they have been replicated again; with size=2 and min_size=1 the cluster keeps serving I/O, degraded, from the surviving replica.
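You can check and change these per pool from the CLI. A minimal sketch (the pool name "rbd" below is just an example; substitute your own pools):

    # list pools and show their current replication settings
    ceph osd pool ls
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # a common production choice: 3 copies, keep serving I/O with 2
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2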
On Sat, Nov 5, 2016 at 9:04 AM, fcid <f...@altavoz.net> wrote:
> Dear ceph community,
>
> I'm working on a small ceph deployment for testing purposes, in which I
> want to test the high-availability features of Ceph and how clients are
> affected during outages in the cluster.
>
> This small cluster is deployed using 3 servers, each running 2 OSDs and
> 1 monitor, and we are using it to serve RADOS block devices to KVM
> hypervisors on other hosts. The Ceph software was installed using
> ceph-deploy.
>
> For HA testing we are simulating disk failures by physically detaching
> OSD disks from servers and also by cutting power to servers we want to
> fail.
>
> I have some doubts regarding the behavior during OSD and disk failures
> under light workloads.
>
> During disk failures, the cluster takes a long time to promote the
> secondary OSD to primary, thus blocking all the disk operations of
> virtual machines using RBD until the cluster map is updated with the
> failed OSD (which can take up to 10 minutes in our cluster). Is this the
> expected behavior of the OSD cluster, or should it be transparent to
> clients when a disk fails?
>
> Thanks in advance, kind regards.
>
> Configuration and version of our ceph cluster:
>
> root@ceph00:~# cat /etc/ceph/ceph.conf
> [global]
> fsid = 440fce60-3097-4f1c-a489-c170e65d8e09
> mon_initial_members = ceph00
> mon_host = 192.168.x1.x1
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public network = 192.168.x.x/x
> cluster network = y.y.y.y/y
>
> [osd]
> osd mkfs options = -f -i size=2048 -n size=64k
> osd mount options xfs = inode64,noatime,logbsize=256k
> osd journal size = 20480
> filestore merge threshold = 40
> filestore split multiple = 8
> filestore xattr use omap = true
>
> root@ceph00:~# ceph -v
> ceph version 10.2.3
>
> --
> Fernando Cid O.
> Operations Engineer
> AltaVoz S.A.
> http://www.altavoz.net
> Viña del Mar, Valparaiso:
> 2 Poniente 355 of 53
> +56 32 276 8060
> Santiago:
> San Pío X 2460, oficina 304, Providencia
> +56 2 2585 4264
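As an aside on the ~10 minutes you observed: that matches the default mon osd down out interval (600 s), which controls when a down OSD is marked "out" and re-replication of its data begins; the window during which client I/O to a dead primary blocks is governed by the down detection itself. Roughly, the relevant knobs look like this in ceph.conf (the values are what I believe the Jewel defaults to be; verify against your running config with "ceph daemon osd.N config show", where N is one of your OSD ids):

    [global]
    # peers report an OSD down after this many seconds without a heartbeat
    osd heartbeat grace = 20
    # monitors mark a down OSD out after this long; only then does
    # re-replication of its data begin (600 s = 10 minutes)
    mon osd down out interval = 600

While testing, "ceph -s" and "ceph health detail" will show whether PGs are merely degraded (still serving I/O) or actually inactive (blocked).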