> On 17 Feb 2023, at 23:20, Anthony D'Atri <anthony.da...@gmail.com> wrote:
>
>> * if rebalance will starts due EDAC or SFP degradation, is faster to fix the
>> issue via DC engineers and put node back to work
>
> A judicious mon_osd_down_out_subtree_limit setting can also do this by not
> rebalancing when an entire node is detected down.

Yes. But when a single disk goes down, it may not actually be dead. Examples:

* the disk is just stuck - a reboot and/or a physical eject/insert brings it back to life
* disk read errors - such errors take the OSD down, but after an OSD restart it works normally (Pending Sectors -> Reallocated)

Filling a single 16TB OSD may take 7-10 days, while the duty engineer may fix the problem in 10-20 minutes.

>
>> * noout prevents unwanted OSD's fills and the run out of space => outage of
>> services
>
> Do you run your clusters very full?

We provide public services. This means a client can rent 1000 disks x 1000GB with one terraform command at 02:00 on a Saturday night. It is simply physically impossible to add nodes in that case. Any data movement without upmap is highly undesirable.

k