Hello,

I recently had an issue with an OSD process running on a dying disk:
the disk suddenly started remapping clusters, so the OSD was stale for
a couple of minutes. Unfortunately, flapping prevention was not
triggered, since writes were merely degraded, not frozen. Maybe it
would be worth introducing a self-marking mechanism that runs in a
separate thread, watches the queue of non-flushed operations, and
raises a flag when the queue stays above a watermark for a long time,
say, minutes. It would be helpful in combination with a relatively
high down_out interval and in very large setups, where one degraded
storage device can bring the entire data placement to its knees (when
flaps do not occur for some reason).
Right now I can do this job with an orchestrator that watches
per-socket statistics, but that is not very reliable.
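
To make the idea more concrete, here is a rough sketch in Python of
the kind of watchdog I have in mind. It is not Ceph code: the class
FlushWatchdog, its methods, and the 120 second threshold are all
hypothetical names and values chosen for illustration. It tracks when
each operation was submitted and sets a sticky "unhealthy" flag once
the oldest non-flushed operation is older than the threshold.

```python
import threading
import time


class FlushWatchdog:
    """Hypothetical sketch: flag the daemon when an op stays unflushed too long."""

    def __init__(self, max_age_s=120.0, poll_s=5.0, clock=time.monotonic):
        self.max_age_s = max_age_s      # assumed threshold, "say, minutes"
        self.poll_s = poll_s            # how often the watcher thread checks
        self.clock = clock              # injectable so the logic can be tested
        self.pending = {}               # op id -> submit timestamp
        self.lock = threading.Lock()
        self.unhealthy = threading.Event()  # sticky flag, never cleared here
        self._stop = threading.Event()

    def submitted(self, op_id):
        # Record the time at which an operation entered the queue.
        with self.lock:
            self.pending[op_id] = self.clock()

    def flushed(self, op_id):
        # Forget an operation once it has reached stable storage.
        with self.lock:
            self.pending.pop(op_id, None)

    def oldest_age(self):
        # Age in seconds of the oldest non-flushed operation, 0 if none.
        with self.lock:
            if not self.pending:
                return 0.0
            return self.clock() - min(self.pending.values())

    def check(self):
        # Raise the flag if the oldest pending op crossed the threshold.
        if self.oldest_age() > self.max_age_s:
            self.unhealthy.set()
        return self.unhealthy.is_set()

    def run(self):
        # Poll until stop() is called; wait() returns True once stopped.
        while not self._stop.wait(self.poll_s):
            self.check()

    def start(self):
        thread = threading.Thread(target=self.run, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

In a real daemon the flag would presumably be turned into a request to
mark the OSD down; how to do that safely is exactly the part I am
unsure about, so the sketch stops at raising the flag.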
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
