This is because of the min_size specification. I would bet you have it set at 2 (which is good).
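For completeness: if staying writable on a single surviving replica ever matters more to you than the extra safety, min_size can be lowered (and later restored) per pool. A minimal sketch, assuming the pool really is named "rbd"; while lowered, writes proceed with no redundancy margin, so treat it as an illustration rather than a recommendation:

# ceph osd pool set rbd min_size 1
# ceph osd pool set rbd min_size 2

The second command simply puts it back once the failed hosts rejoin.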
`ceph osd pool get rbd min_size` will tell you what it is actually set to.

With 4 hosts and a size of 3, removing 2 of the hosts (or 2 drives, 1 from each host) results in some of the objects having only 1 replica, and min_size dictates that IO freezes for those objects until min_size is achieved again.
http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas

I can't tell if you're under the impression that your RBD device is a single object. It is not. It is chunked up into many objects and spread throughout the cluster, as Kjetil mentioned earlier (see the P.S. at the bottom for a quick way to look at this yourself).

On Mon, Mar 20, 2017 at 8:48 PM, Kjetil Jørgensen <kje...@medallia.com> wrote:
> Hi,
>
> rbd_id.vm-100-disk-1 is only a "meta object"; IIRC, its contents will get
> you a "prefix", which then gets you on to rbd_header.<prefix>;
> rbd_header.<prefix> contains block size, striping, etc. The actual
> data-bearing objects will be named something like rbd_data.<prefix>.%-016x.
>
> Example: vm-100-disk-1 has the prefix 86ce2ae8944a, so the first <block
> size> of that image will be named rbd_data.86ce2ae8944a.000000000000, the
> second <block size> will be rbd_data.86ce2ae8944a.000000000001, and so on.
> Chances are that one of these objects is mapped to a pg which has both
> host3 and host4 among its replicas.
>
> An rbd image will end up scattered across most/all osds of the pool it's
> in.
>
> Cheers,
> -KJ
>
> On Fri, Mar 17, 2017 at 12:30 PM, Adam Carheden <carhe...@ucar.edu> wrote:
>
>> I have a 4 node cluster shown by `ceph osd tree` below. Monitors are
>> running on hosts 1, 2 and 3. It has a single replicated pool of size 3.
>> I have a VM with its hard drive replicated to OSDs 11 (host3), 5 (host1)
>> and 3 (host2).
>>
>> I can 'fail' any one host by disabling the SAN network interface, and
>> the VM keeps running with a simple slowdown in I/O performance, just as
>> expected. However, if I 'fail' both nodes 3 and 4, I/O hangs on the VM
>> (i.e. `df` never completes, etc.). The monitors on hosts 1 and 2 still
>> have quorum, so that shouldn't be an issue. The placement group still
>> has 2 of its 3 replicas online.
>>
>> Why does I/O hang even though host4 isn't running a monitor and
>> doesn't have anything to do with my VM's hard drive?
>>
>>
>> Size?
>> # ceph osd pool get rbd size
>> size: 3
>>
>> Where's rbd_id.vm-100-disk-1?
>> # ceph osd getmap -o /tmp/map && osdmaptool --pool 0 --test-map-object rbd_id.vm-100-disk-1 /tmp/map
>> got osdmap epoch 1043
>> osdmaptool: osdmap file '/tmp/map'
>> object 'rbd_id.vm-100-disk-1' -> 0.1ea -> [11,5,3]
>>
>> # ceph osd tree
>> ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 8.06160 root default
>> -7 5.50308     room A
>> -3 1.88754         host host1
>>  4 0.40369             osd.4       up  1.00000          1.00000
>>  5 0.40369             osd.5       up  1.00000          1.00000
>>  6 0.54008             osd.6       up  1.00000          1.00000
>>  7 0.54008             osd.7       up  1.00000          1.00000
>> -2 3.61554         host host2
>>  0 0.90388             osd.0       up  1.00000          1.00000
>>  1 0.90388             osd.1       up  1.00000          1.00000
>>  2 0.90388             osd.2       up  1.00000          1.00000
>>  3 0.90388             osd.3       up  1.00000          1.00000
>> -6 2.55852     room B
>> -4 1.75114         host host3
>>  8 0.40369             osd.8       up  1.00000          1.00000
>>  9 0.40369             osd.9       up  1.00000          1.00000
>> 10 0.40369             osd.10      up  1.00000          1.00000
>> 11 0.54008             osd.11      up  1.00000          1.00000
>> -5 0.80737         host host4
>> 12 0.40369             osd.12      up  1.00000          1.00000
>> 13 0.40369             osd.13      up  1.00000          1.00000
>>
>>
>> --
>> Adam Carheden
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Kjetil Joergensen <kje...@medallia.com>
> SRE, Medallia Inc
> Phone: +1 (650) 739-6580
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

--
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, MA 02138 | Room 210
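P.S. If you want to see the chunking Kjetil describes for yourself, here is a rough sketch. It assumes the pool is named "rbd" and reuses the example prefix 86ce2ae8944a from his mail; substitute whatever block_name_prefix `rbd info` actually reports on your cluster:

# rbd info rbd/vm-100-disk-1 | grep block_name_prefix
# rados -p rbd ls | grep rbd_data.86ce2ae8944a | head -5
# for obj in $(rados -p rbd ls | grep rbd_data.86ce2ae8944a | head -5); do ceph osd map rbd $obj; done

Each `ceph osd map` line prints the PG and acting OSD set for one data object. With enough of them you should find objects whose PGs include OSDs on both host3 and host4; those are the ones that drop below min_size (and freeze) when you fail both of those hosts at once.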