On 8/14/19 9:48 AM, Simon Oosthoek wrote:
> Hi all,
>
> Yesterday I marked out all the osds on one node in our new cluster to
> reconfigure them with WAL/DB on their NVMe devices, but it is taking
> ages to rebalance. The whole cluster (and thus the osds) is only ~1%
> full, so the full ratio is nowhere in sight.
>
> We have 14 osd nodes with 12 disks each; one of them was marked out
> yesterday around noon. It is still not complete and, all the while, the
> cluster is in ERROR state, even though this is a normal maintenance
> operation.
>
> We are still experimenting with the cluster, and it remains operational
> while in ERROR state. However, it is slightly worrying to consider that
> this could take (50x?) longer once the cluster holds 50x the amount of
> data. The OSDs are mostly flatlined in the dashboard graphs, so I think
> it could potentially go much faster.
>
> Below are a few outputs of ceph -s (and ceph -w):
>
> Yesterday afternoon (~16:00)
> # ceph -w
> cluster:
> id: b489547c-ba50-4745-a914-23eb78e0e5dc
> health: HEALTH_ERR
> Degraded data redundancy (low space): 139 pgs backfill_toofull
>
> services:
> mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 4h)
> mgr: cephmon1(active, since 4h), standbys: cephmon2, cephmon3
> mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
> osd: 168 osds: 168 up (since 3h), 156 in (since 3h); 1588 remapped pgs
> rgw: 1 daemon active (cephs3.rgw0)
>
> data:
> pools: 12 pools, 4116 pgs
> objects: 14.04M objects, 11 TiB
> usage: 20 TiB used, 1.7 PiB / 1.8 PiB avail
> pgs: 16188696/109408503 objects misplaced (14.797%)
> 2528 active+clean
> 1422 active+remapped+backfill_wait
> 139 active+remapped+backfill_wait+backfill_toofull
> 27 active+remapped+backfilling
>
> io:
> recovery: 205 MiB/s, 198 objects/s
>
> progress:
> Rebalancing after osd.47 marked out
> [=====================.........]
> Rebalancing after osd.5 marked out
> [===================...........]
> Rebalancing after osd.132 marked out
> [=====================.........]
> Rebalancing after osd.90 marked out
> [=====================.........]
> Rebalancing after osd.76 marked out
> [=====================.........]
> Rebalancing after osd.157 marked out
> [==================............]
> Rebalancing after osd.19 marked out
> [=====================.........]
> Rebalancing after osd.118 marked out
> [====================..........]
> Rebalancing after osd.146 marked out
> [=================.............]
> Rebalancing after osd.104 marked out
> [====================..........]
> Rebalancing after osd.62 marked out
> [=======================.......]
> Rebalancing after osd.33 marked out
> [======================........]
>
>
> This morning:
> # ceph -s
> cluster:
> id: b489547c-ba50-4745-a914-23eb78e0e5dc
> health: HEALTH_ERR
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
> services:
> mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 22h)
> mgr: cephmon1(active, since 22h), standbys: cephmon2, cephmon3
> mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
> osd: 168 osds: 168 up (since 22h), 156 in (since 21h); 189 remapped pgs
> rgw: 1 daemon active (cephs3.rgw0)
>
> data:
> pools: 12 pools, 4116 pgs
> objects: 14.11M objects, 11 TiB
> usage: 21 TiB used, 1.7 PiB / 1.8 PiB avail
> pgs: 4643284/110159565 objects misplaced (4.215%)
> 3927 active+clean
> 162 active+remapped+backfill_wait
> 19 active+remapped+backfilling
> 8 active+remapped+backfill_wait+backfill_toofull
>
> io:
> client: 32 KiB/s rd, 0 B/s wr, 31 op/s rd, 21 op/s wr
> recovery: 198 MiB/s, 149 objects/s
>
It seems it is still recovering, at 149 objects/second.
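At that rate the remaining ~4.6M misplaced objects would take very roughly
4643284 / 149 ≈ 31000 seconds, i.e. on the order of 8-9 more hours.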
> progress:
> Rebalancing after osd.47 marked out
> [=============================.]
> Rebalancing after osd.5 marked out
> [=============================.]
> Rebalancing after osd.132 marked out
> [=============================.]
> Rebalancing after osd.90 marked out
> [=============================.]
> Rebalancing after osd.76 marked out
> [=============================.]
> Rebalancing after osd.157 marked out
> [=============================.]
> Rebalancing after osd.19 marked out
> [=============================.]
> Rebalancing after osd.146 marked out
> [=============================.]
> Rebalancing after osd.104 marked out
> [=============================.]
> Rebalancing after osd.62 marked out
> [=============================.]
>
>
> I found some hints at this URL, though I'm not sure they're right for us:
> https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/
>> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
>> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
>
> Since the cluster is currently hardly loaded, backfilling can take up
> all the unused bandwidth as far as I'm concerned...
>
> Is it a good idea to run the above commands, or other commands, to speed
> up the backfilling (e.g. increasing "osd max backfills")?
>
Yes. Right now the OSDs aren't doing that many backfills, while you still
have a large queue of PGs waiting to be backfilled.
$ ceph tell 'osd.*' config set osd_max_backfills 5
By default only one (1) backfill runs at a time per OSD. Setting it to 5
speeds up the process by increasing the concurrency. This will, however,
add load to the system and thus reduce the I/O available to clients.
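Once the backfilling has finished you can drop it back to the default.
A minimal sketch, assuming a Nautilus-era cluster; note that values set
via 'tell' only last until an OSD restarts:
$ ceph tell 'osd.*' config set osd_max_backfills 1
To spot-check the running value on a single OSD (osd.0 is just an example
id, run this on its host):
$ ceph daemon osd.0 config get osd_max_backfills
If you want the higher value to survive OSD restarts, you could set it in
the centralized config database instead:
$ ceph config set osd osd_max_backfills 5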
Wido
> Cheers
>
> /Simon
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com