Hello Eugenio,

Every "it just hangs" issue that I have seen so far came down to some
network problem. Please check that you can ping all OSDs, MDSs, and MONs
from the client. Then retest with large pings that are not allowed to be
fragmented (ping -M do -s 8972 192.168.12.34); a payload of 8972 bytes
plus 28 bytes of ICMP and IP headers adds up to exactly 9000 bytes, so
this catches MTU mismatches on jumbo-frame networks. Please inspect the
firewalls on all hosts. If multiple network cards are used, make sure
that the cables are not accidentally swapped.
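For example, something like the following loop, run from the client,
checks both plain reachability and the jumbo-frame path in one pass.
This is only a sketch: the host list is taken from the 'ceph -s' output
quoted below, so adjust it to cover every MON, MDS, and OSD node, and
substitute IP addresses if name resolution differs from what Ceph uses.

  # Host names assumed from the quoted 'ceph -s' output below -
  # replace with the actual addresses the client uses to reach Ceph.
  for host in ceph-mon-a ceph-mon-b ceph-storage-a ceph-storage-c ceph-storage-d; do
      echo "=== $host ==="
      ping -c 3 "$host"                 # basic reachability
      ping -c 3 -M do -s 8972 "$host"   # 9000-byte frames, DF bit set
  done

If the small pings succeed but the 8972-byte ones fail or time out, some
hop has a smaller MTU than the endpoints assume. (The 8972-byte test
only makes sense on networks meant to run with MTU 9000; on a standard
MTU of 1500, use -s 1472 instead.)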
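To inspect the firewalls from the client side, you can also probe the
TCP ports Ceph listens on. A sketch, assuming the default port
assignments and a netcat binary that supports -z (scan without sending
data); host names as above:

  nc -zv ceph-mon-a 3300        # MON msgr2
  nc -zv ceph-mon-a 6789        # MON legacy msgr1
  # OSDs, MDSs, and MGRs bind to ports in the 6800-7300 range, so that
  # whole range must be open towards the storage nodes, e.g.:
  nc -zv ceph-storage-a 6800

A refused connection usually means a wrong address or a daemon that is
not running; a probe that hangs silently is the classic signature of a
firewall dropping packets, which would also match a ceph-fuse mount
that just hangs.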
On Wed, Sep 4, 2024 at 6:50 PM Eugenio Tampieri
<eugenio.tampi...@readydigital.it> wrote:
>
> > Has it worked before or did it just stop working at some point? What's the
> > exact command that fails (and error message if there is)?
>
> It was working using the NFS gateway, I never tried with the Ceph FUSE mount.
> The command is ceph-fuse --id migration /mnt/repo. No error message, it just
> hangs.
>
> > > For the "too many PGs per OSD" I suppose I have to add some other
> > > OSDs, right?
>
> > Either that or reduce the number of PGs. If you had only a few pools I'd
> > suggest to leave it to the autoscaler, but not for 13 pools. You can paste
> > 'ceph osd df' and 'ceph osd pool ls detail' if you need more input for that.
>
> I already have the autoscaler enabled. Here is the output you asked for
> ---
> ceph osd df
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>  2    hdd  0.90970   1.00000  932 GiB  332 GiB  330 GiB  1.7 MiB  1.4 GiB  600 GiB  35.63  0.88  329      up
>  4    hdd  0.90970   1.00000  932 GiB  400 GiB  399 GiB  1.6 MiB  1.5 GiB  531 GiB  42.94  1.07  331      up
>  3    hdd  0.45479   1.00000  466 GiB  203 GiB  202 GiB  1.0 MiB  988 MiB  263 GiB  43.57  1.08  206      up
>  5    hdd  0.93149   1.00000  932 GiB  379 GiB  378 GiB  1.6 MiB  909 MiB  552 GiB  40.69  1.01  321      up
>                      TOTAL    3.2 TiB  1.3 TiB  1.3 TiB  5.9 MiB  4.8 GiB  1.9 TiB  40.30
> MIN/MAX VAR: 0.88/1.08  STDDEV: 3.15
> ---
> ceph osd pool ls detail
> pool 1 '.mgr' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 24150 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
> pool 2 'kubernetes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/92 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/123 flags hashpspool stripe_width 0 application rgw
> pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/132 flags hashpspool stripe_width 0 application rgw
> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/132 flags hashpspool stripe_width 0 application rgw
> pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/134 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
> pool 7 'repo_data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 30692 lfor 0/30692/30690 flags hashpspool stripe_width 0 application cephfs
> pool 8 'repo_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/150 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
> pool 9 '.nfs' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/169 flags hashpspool stripe_width 0 application nfs
> pool 11 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/592 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
> pool 12 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/592 flags hashpspool stripe_width 0 application rgw
> pool 13 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/644 flags hashpspool stripe_width 0 application rgw
> pool 19 'kubernetes-lan' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/15682 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> ---
> Regards
>
> Quoting Eugenio Tampieri <eugenio.tampi...@readydigital.it>:
>
> > Hi Eugen,
> > Sorry, but I had some trouble when I signed up and then I was away so
> > I missed your reply.
> >
> >> ceph auth export client.migration
> >> [client.migration]
> >> key = redacted
> >> caps mds = "allow rw fsname=repo"
> >> caps mon = "allow r fsname=repo"
> >> caps osd = "allow rw tag cephfs data=repo"
> >
> > For the "too many PGs per OSD" I suppose I have to add some other
> > OSDs, right?
> >
> > Thanks,
> >
> > Eugenio
> >
> > -----Original message-----
> > From: Eugen Block <ebl...@nde.ag>
> > Sent: Wednesday, September 4, 2024 10:07
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Re: CephFS troubleshooting
> >
> > Hi, I already responded to your first attempt:
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GS7KJRJP7BAOF66KJM255G27TJ4KG656/
> >
> > Please provide the requested details.
> >
> >
> > Quoting Eugenio Tampieri <eugenio.tampi...@readydigital.it>:
> >
> >> Hello,
> >> I'm writing to troubleshoot an otherwise functional Ceph quincy
> >> cluster that has issues with cephfs.
> >> I cannot mount it with ceph-fuse (it gets stuck), and if I mount it
> >> with NFS I can list the directories but I cannot read or write
> >> anything.
> >> Here's the output of ceph -s
> >>   cluster:
> >>     id:     3b92e270-1dd6-11ee-a738-000c2937f0ec
> >>     health: HEALTH_WARN
> >>             mon ceph-storage-a is low on available space
> >>             1 daemons have recently crashed
> >>             too many PGs per OSD (328 > max 250)
> >>
> >>   services:
> >>     mon:        5 daemons, quorum ceph-mon-a,ceph-storage-a,ceph-mon-b,ceph-storage-c,ceph-storage-d (age 105m)
> >>     mgr:        ceph-storage-a.ioenwq(active, since 106m), standbys: ceph-mon-a.tiosea
> >>     mds:        1/1 daemons up, 2 standby
> >>     osd:        4 osds: 4 up (since 104m), 4 in (since 24h)
> >>     rbd-mirror: 2 daemons active (2 hosts)
> >>     rgw:        2 daemons active (2 hosts, 1 zones)
> >>
> >>   data:
> >>     volumes: 1/1 healthy
> >>     pools:   13 pools, 481 pgs
> >>     objects: 231.83k objects, 648 GiB
> >>     usage:   1.3 TiB used, 1.8 TiB / 3.1 TiB avail
> >>     pgs:     481 active+clean
> >>
> >>   io:
> >>     client: 1.5 KiB/s rd, 8.6 KiB/s wr, 1 op/s rd, 0 op/s wr
> >>
> >> Best regards,
> >>
> >> Eugenio Tampieri


--
Alexander Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io