Hello, I have a cluster with 6 OSD nodes, each with 10 x 8 TB SATA drives. Node 6 was just added. All nodes have 10 Gbps networking with jumbo frames. S3 application access is working as expected, but recovery is extremely slow. Based on past posts I attempted the following:
- Altered osd_recovery_sleep_hdd. I tried 0 and 0.1; 0 seems to improve the speed slightly, but it is still very slow.
- Changed osd_max_backfills from 8 to 16 and osd_recovery_max_active from 4 to 8. This showed no noticeable improvement. (Roughly how I applied these at runtime is shown after the status output below.)

The cluster is running 13.2.5. Here is the output from ceph -s:

  cluster:
    id:     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    health: HEALTH_ERR
            3 large omap objects
            67164650/268993641 objects misplaced (24.969%)
            Degraded data redundancy: 612258/268993641 objects degraded (0.228%), 8 pgs degraded, 8 pgs undersized
            Degraded data redundancy (low space): 9 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon2(active), standbys: mon1
    osd: 55 osds: 50 up, 50 in; 531 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   15 pools, 1476 pgs
    objects: 89.66 M objects, 49 TiB
    usage:   159 TiB used, 205 TiB / 364 TiB avail
    pgs:     612258/268993641 objects degraded (0.228%)
             67164650/268993641 objects misplaced (24.969%)
             945 active+clean
             507 active+remapped+backfill_wait
             9   active+remapped+backfill_wait+backfill_toofull
             7   active+remapped+backfilling
             4   active+undersized+degraded+remapped+backfill_wait
             4   active+undersized+degraded+remapped+backfilling

  io:
    client:   5.3 MiB/s rd, 3.9 MiB/s wr, 844 op/s rd, 81 op/s wr
    recovery: 19 MiB/s, 33 objects/s

Any clue as to what I can look at further to investigate the slow recovery would be appreciated.
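For reference, this is roughly how I applied the settings above at runtime (standard ceph injectargs, with the values mentioned earlier; adjust to taste):

  # runtime-only change; it reverts when the OSDs restart unless also set in ceph.conf
  ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0'
  ceph tell osd.* injectargs '--osd_max_backfills 16 --osd_recovery_max_active 8'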