Hi,
We are observing a new, strange behaviour on our production cluster: we
increased the number of PGs (from 256 to 2048) in an EC pool after a
warning about a very high number of objects per PG in this pool (the
pool has 52M objects).
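For reference, the increase was done with something like the following
(on recent releases pgp_num follows pg_num automatically, so the second
command may be redundant):
    ceph osd pool set <pool> pg_num 2048
    ceph osd pool set <pool> pgp_num 2048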
Background: this is happening on the cluster that had a strange problem
last week, discussed in the thread "Production cluster in bad shape
after several OSD crashes". The PG increase was done after the cluster
had returned to a normal state.
The increase in the number of PGs resulted in ~20% misplaced objects and
~160 remapped PGs (out of 256). As there is not much user activity on
this cluster these days (except on this pool), we decided to set the
mclock profile to high_recovery_ops. We also disabled the autoscaler on
this pool (it was enabled, and it is not clear why we got the warning
with the autoscaler enabled). The pool was created with --bulk.
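Roughly, the changes were made with commands like these (shown here as a
cluster-wide OSD setting; the mclock profile could also be set per OSD):
    ceph config set osd osd_mclock_profile high_recovery_ops
    ceph osd pool set <pool> pg_autoscale_mode off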
The remapping progressed steadily for 2-3 days (as far as we can tell),
but each time it gets close to the end (between 0.5% and 1% misplaced
objects, ~10 remapped PGs), new remapped PGs appear (visible with 'ceph
pg dump_stuck'), all belonging to the pool affected by the increase.
This has already happened 3-4 times and it is very unclear why. No
specific problem was reported on the cluster that could explain it (no
OSD down). I was wondering if the balancer could be responsible, but I
don't have the feeling that it is: first, the balancer doesn't report
doing anything (though I may be missing the history); second, the
balancer would probably affect PGs from different pools (there are 50
pools in the cluster).
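For reference, we are checking the stuck/remapped PGs and the balancer
with commands like these:
    ceph pg dump_stuck unclean
    ceph balancer status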
There are 2 warnings that may or may not be related:
- All mons have their data size > 15 GB (mon_data_size_warn), currently
16 GB. The growth happened progressively during the remapping on all 3
mons; I guess it is due to the operation in progress and is harmless.
Can you confirm?
- Since yesterday, we have 4 PGs that have not been deep scrubbed in
time, belonging to different pools. Again, I tend to attribute this to
the remapping in progress adding too much load (or other constraints),
as there are a lot of deep scrubs per day. The current age distribution
of deep scrubs is:
4 "2025-03-19
21 "2025-03-20
46 "2025-03-21
35 "2025-03-22
81 "2025-03-23
597 "2025-03-24
1446 "2025-03-25
2234 "2025-03-26
2256 "2025-03-27
1625 "2025-03-28
1980 "2025-03-29
2993 "2025-03-30
3871 "2025-03-31
1113 "2025-04-01
Should we worry about the situation? If so, what would you advise
looking at or doing? To clear the problem last week we had to restart
all OSDs, but we didn't restart the mons. Do they play a role in
deciding the remapping plan? Might restarting them help?
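If a mon restart is worth trying, I assume we would do it one mon at a
time, with something like (depending on the deployment):
    ceph orch daemon restart mon.<hostname>    # cephadm
    systemctl restart ceph-mon@<hostname>      # package/systemd install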
As usual, thanks in advance for any help/hint.
Best regards,
Michel