I was going to submit this as a bug, but thought I would put it here for
discussion first. I have a feeling that it could be behavior by design.

ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

I'm using a cache pool and was playing around with the size and min_size on
the pool to see the effects of replication. I set size/min_size to 1, then
I ran "ceph osd pool set ssd size 3; ceph osd pool set ssd min_size 2".
Client I/O immediately blocked since there were not yet 2 copies (as expected).
However, after the degraded objects were cleaned up, several PGs remained
in the remapped+incomplete state and client I/O continued to be blocked
even though all OSDs were up and healthy (even when left overnight). If I
set min_size back down to 1, the cluster recovered and client I/O resumed.
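For reference, the sequence I ran was roughly the following (pool name
"ssd" as above; this is just a sketch of the reproduction steps and needs
a live cluster, so the exact PG states you see may vary):

```shell
# Start from a single-copy cache pool
ceph osd pool set ssd size 1
ceph osd pool set ssd min_size 1

# Raise replication; client I/O blocks until 2 copies exist (expected)
ceph osd pool set ssd size 3
ceph osd pool set ssd min_size 2

# After recovery settles, some PGs stay remapped+incomplete
ceph health detail
ceph pg dump_stuck inactive

# Workaround: dropping min_size back down unblocks client I/O
ceph osd pool set ssd min_size 1
```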

I expected that as long as there is at least one copy of the data, the
cluster would replicate it back up to min_size and operations would resume.

Where I think it could be by design is the case where min_size was already
set to 2 and you lose enough OSDs fast enough to dip below that level.
There is a chance that the sole surviving OSD could have bad data (though
we wouldn't know that anyway at the moment). That bad data would then be
replicated, and the ability to recover any good data would be lost.

However, if Ceph immediately replicated from the sole surviving OSD to get
back to min_size, then when the other OSD(s) came back online it could
backfill and simply discard the extra copies.

Immediately replicating to keep the cluster operational seems like a good
thing overall. Am I missing something?

Thanks,
Robert
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
