Hi Francois,

What is the output of `ceph balancer status`? Also, can you increase debug_mgr to 4/5 and then share the log file of the active mgr?
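For reference, something along these lines should do it (the exact log path may differ depending on how your cluster was deployed):

    # show the balancer mode, whether it is active, and any queued plans
    ceph balancer status

    # raise the mgr debug level on the fly; revert later with "ceph config rm mgr debug_mgr"
    ceph config set mgr debug_mgr 4/5

    # find which mgr is active, then grab its log from that host,
    # usually /var/log/ceph/ceph-mgr.<name>.log
    ceph mgr stat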
Best,
Dan

On Fri, Jan 29, 2021 at 10:54 AM Francois Legrand <f...@lpnhe.in2p3.fr> wrote:
>
> Thanks for your suggestion. I will have a look!
>
> But I am a bit surprised that the "official" balancer seems so inefficient!
>
> F.
>
> On 28/01/2021 at 12:00, Jonas Jelten wrote:
> > Hi!
> >
> > We also suffer heavily from this, so I wrote a custom balancer which yields
> > much better results:
> > https://github.com/TheJJ/ceph-balancer
> >
> > After you run it, it echoes the PG movements it suggests. You can then just
> > run those commands and the cluster will balance further.
> > It's kind of a work in progress, so I'm glad about your feedback.
> >
> > Maybe it helps you :)
> >
> > -- Jonas
> >
> > On 27/01/2021 17.15, Francois Legrand wrote:
> >> Hi all,
> >> I have a cluster with 116 disks (24 new 16 TB disks added in December,
> >> the rest 8 TB) running Nautilus 14.2.16.
> >> I moved (8 months ago) from crush-compat to upmap balancing.
> >> But the cluster does not seem well balanced: the number of PGs on the
> >> 8 TB disks varies from 26 to 52, and their utilization from 35% to 69%.
> >> The recent 16 TB disks are more homogeneous, with 48 to 61 PGs and usage
> >> between 30% and 43%.
> >> Last week, I realized that some OSDs were maybe not using upmap, because
> >> I ran "ceph osd crush weight-set ls" and got (compat) as a result.
> >> So I ran "ceph osd crush weight-set rm-compat", which triggered some
> >> rebalancing. There has been no more recovery for 2 days, but the cluster
> >> is still unbalanced.
> >> As far as I understand, upmap is supposed to reach an equal number of PGs
> >> on all the disks (weighted by their capacity, I guess).
> >> So I would expect roughly 30 PGs on the 8 TB disks, 60 on the 16 TB ones,
> >> and around 50% usage on all of them. Which is not the case (by far).
> >> The problem is that this impacts the free space available in the pools
> >> (264 TiB, while there is more than 578 TiB free in the cluster), because
> >> free space seems to be based on the space available before the first OSD
> >> becomes full!
> >> Is this normal? Did I miss something? What could I do?
> >>
> >> F.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io