Re: [ceph-users] ceph (jewel) unable to recover after node failure

2020-01-10 Thread Eugen Block
Hi,

> A. will ceph be able to recover over time? I am afraid that the 14 PGs that are down will not recover.

if all OSDs come back (stable) the recovery should eventually finish.

> B. what caused the OSDs going down and up during recovery after the failed OSD node came back online? (step 2 above

[ceph-users] ceph (jewel) unable to recover after node failure

2020-01-07 Thread Hanspeter Kunz
here is the output of ceph health detail: HEALTH_ERR 16 pgs are stuck inactive for more than 300 seconds; 134 pgs backfill_wait; 11 pgs backfilling; 69 pgs degraded; 14 pgs down; 2 pgs incomplete; 14 pgs peering; 6 pgs recovery_wait; 69 pgs stuck degraded; 16 pgs stuck inactive; 167 pgs stuck
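A summary line like the one above is easier to compare between checks once it is reduced to per-state counts. The following is a minimal sketch, assuming the summary keeps the `<n> pgs <state>` form separated by semicolons; the function name `parse_health_summary` is hypothetical and not part of any ceph tooling, and the sample string contains only the complete items quoted in the post.

```python
import re

def parse_health_summary(line: str) -> dict[str, int]:
    """Map each PG state in a health summary line to its PG count.

    Hypothetical helper: assumes items look like '<n> pgs <state>'
    and are separated by ';' after the leading status word.
    """
    counts: dict[str, int] = {}
    # Drop the leading status word (e.g. HEALTH_ERR).
    _, _, body = line.partition(" ")
    for item in body.split(";"):
        match = re.match(r"\s*(\d+) pgs (?:are )?(.+?)\s*$", item)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts

# Complete items copied from the post; the truncated tail is left out.
sample = (
    "HEALTH_ERR 16 pgs are stuck inactive for more than 300 seconds; "
    "134 pgs backfill_wait; 11 pgs backfilling; 69 pgs degraded; "
    "14 pgs down; 2 pgs incomplete; 14 pgs peering; 6 pgs recovery_wait; "
    "69 pgs stuck degraded; 16 pgs stuck inactive"
)

counts = parse_health_summary(sample)
for state, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{n:5d}  {state}")
```

Running this at intervals and watching whether `backfill_wait` and `degraded` shrink gives a rough indication of whether recovery is progressing, while `down` and `incomplete` are the states to watch for PGs that are not recovering on their own.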

[ceph-users] ceph (jewel) unable to recover after node failure

2020-01-07 Thread Hanspeter Kunz
Hi, after a node failure ceph is unable to recover, i.e. unable to reintegrate the failed node back into the cluster. what happened? 1. a node with 11 osds crashed, the remaining 4 nodes (also with 11 osds each) re-balanced, although reporting the following error condition: too many PGs per OSD
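The "too many PGs per OSD" condition follows from simple arithmetic: the same PG replicas are spread over fewer OSDs once a node drops out. A minimal sketch, assuming all PG replicas are remapped onto the remaining OSDs; the node and OSD counts come from the post, while `total_pg_replicas` is a made-up illustrative figure, not a value from this cluster.

```python
NODES = 5            # 1 crashed node + 4 remaining, per the post
OSDS_PER_NODE = 11   # per the post

osds_before = NODES * OSDS_PER_NODE        # all nodes up
osds_after = (NODES - 1) * OSDS_PER_NODE   # one node lost

# Relative increase in PGs per OSD once the replicas are rebalanced.
growth = osds_before / osds_after

# Hypothetical total (sum over pools of pg_num * replica size),
# chosen only to make the numbers easy to read.
total_pg_replicas = 11000

per_osd_before = total_pg_replicas / osds_before
per_osd_after = total_pg_replicas / osds_after

print(f"OSDs: {osds_before} -> {osds_after}")
print(f"PGs per OSD: {per_osd_before:.0f} -> {per_osd_after:.0f} (x{growth:.2f})")
```

Whatever the real PG total is, losing one of five equally sized nodes raises the per-OSD PG count by a factor of 1.25, which can push a cluster that was already close to the monitor's warning threshold over it.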