> On Jan 2, 2025, at 11:18 AM, Nicola Mori <m...@fi.infn.it> wrote:
>
> Hi Anthony, thanks for your insights. I actually used df -h from the bash
> shell of a machine mounting the CephFS with the kernel module, and here's
> the current result:
>
> wizardfs_rootsquash@b1029256-7bb3-11ec-a8ce-ac1f6b627b45.wizardfs=/  217T  78T  139T  36%  /wizard/ceph
>
> So it seems the fs size is 217 TiB, which is about 66% of the total amount
> of raw disk space (320 TiB), as I wrote before.
>
> Then I tried the command you suggested:
>
> # ceph df
> --- RAW STORAGE ---
> CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
> hdd    320 TiB  216 TiB  104 TiB   104 TiB      32.56
> TOTAL  320 TiB  216 TiB  104 TiB   104 TiB      32.56
>
> --- POOLS ---
> POOL             ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
> .mgr              1    1  242 MiB       62  726 MiB      0     62 TiB
> wizard_metadata   2   16  1.2 GiB   85.75k  3.5 GiB      0     62 TiB
> wizard_data       3  512   78 TiB   27.03M  104 TiB  36.06    138 TiB
>
> To find the total size of the data pool I don't understand how to interpret
> the "MAX AVAIL" column: should it be added to "STORED" or to "USED"?
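MAX AVAIL is expressed in the same units as STORED, i.e. user-visible data
after replication/EC overhead, so it adds to STORED. A quick sanity check
using the numbers you pasted:

  78 TiB (STORED) + 138 TiB (MAX AVAIL) = 216 TiB, which matches the 217T
  size that df reports for the filesystem.

  104 TiB (USED) / 78 TiB (STORED) ~= 1.33x on-disk overhead for wizard_data,
  which looks more like an EC profile than 3x replication. You can confirm
  with the commands below (pool name taken from your output; the second one
  only returns something useful for EC pools):

# ceph osd pool get wizard_data size
# ceph osd pool get wizard_data erasure_code_profile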
Do you have a lot of small files?

> In the first case I'd get 216 TiB, which corresponds to what df -h says and
> thus to 66%; in the second case I'd get 242 TiB, which is very close to
> 75%... But I guess the first option is the right one.
>
> Then I looked at the weights of my failure domain (host):
>
> # ceph osd tree | grep host
>  -7  25.51636  host aka
>  -3  25.51636  host balin
> -13  29.10950  host bifur
> -17  29.10950  host bofur
> -21  29.10371  host dwalin
> -23  21.83276  host fili
> -25  29.10950  host kili
>  -9  25.51636  host ogion
> -19  25.51636  host prestno
> -15  29.10522  host remolo
>  -5  25.51636  host rokanan
> -11  27.29063  host romolo
>
> They seem quite even and closely reflect the actual total size of each host:
>
> # ceph orch host ls --detail
> HOST     . . .  HDD
> aka             9/28.3TB
> balin           9/28.3TB
> bifur           9/32.5TB
> bofur           8/32.0TB
> dwalin          16/32.0TB
> fili            12/24.0TB
> kili            8/32.0TB
> ogion           8/28.0TB
> prestno         9/28.3TB
> remolo          16/32.0TB
> rokanan         9/28.5TB
> romolo          16/30.0TB
>
> so I see no problem here (in fact, making these even is the idea behind the
> disk upgrade strategy I am pursuing).
>
> About the OSD outlier: there seems to be no such OSD; the maximum OSD
> occupancy is 38% and it decreases smoothly down to a minimum of 27% with no
> jumps.

That's a very high variance. If the balancer is working it should be more
like +/- 1-2%. Available space in the cluster will be reported as though all
OSDs are at 38%.

> About PGs: I have 512 PGs in the data pool and 124 OSDs in total. Maybe the
> count is too low, but I'm hesitant to increase it since my cluster has very
> low specs and I fear running out of memory on the oldest machines.
>
> About CRUSH rules: I don't know exactly what to search for, so if you
> believe it's important then I'd need some advice.
>
> Thank you again for your precious help,
>
> Nicola
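On the variance: to see the per-OSD spread and check whether the balancer is
actually doing anything, something like the following should help (standard
CLI; upmap mode requires all clients to be Luminous or newer):

# ceph osd df          # per-OSD %USE and VAR; VAR close to 1.00 everywhere is the goal
# ceph balancer status # is the balancer on, and in which mode?
# ceph balancer mode upmap   # upmap typically converges much tighter than crush-compat
# ceph balancer on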
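On the PG count, a rough back-of-the-envelope against the usual target of
roughly 100 PGs per OSD (the shard count here is an assumption; substitute
your pool's actual replica size or k+m):

  512 PGs x 4 shards / 124 OSDs ~= 17 PGs per OSD

which is well below the target, so there is likely headroom to grow pg_num.
As for the memory worry, each OSD steers its usage toward osd_memory_target,
so it's worth checking what the low-RAM nodes are set to before deciding:

# ceph osd pool autoscale-status        # what the autoscaler would suggest per pool
# ceph config get osd osd_memory_target # current per-OSD memory target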
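And for the CRUSH rules question: dumping the rules and checking which one
the data pool uses is usually enough. Look for a step with type "host" (your
failure domain) and the expected device class:

# ceph osd pool get wizard_data crush_rule
# ceph osd crush rule dump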