Hello Eugenio,

Every "it just hangs" issue that I have seen so far came down to some
network problem. Please check that you can ping all OSDs, MDSs, and MONs
from the client. Then retest with large pings that are not allowed to be
fragmented (ping -M do -s 8972 192.168.12.34); a payload of 8972 bytes
plus 28 bytes of ICMP and IP headers adds up to exactly 9000 bytes, so
this catches MTU mismatches on jumbo-frame networks. Please inspect the
firewalls on all hosts. If multiple network cards are used, make sure
that the cables are not accidentally swapped.
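For example, something like the following loop, run from the client,
checks both plain reachability and the jumbo-frame path in one pass.
This is only a sketch: the host list is taken from the 'ceph -s' output
quoted below, so adjust it to cover every MON, MDS, and OSD node, and
substitute IP addresses if name resolution differs from what Ceph uses.

  # Host names assumed from the quoted 'ceph -s' output below -
  # replace with the actual addresses the client uses to reach Ceph.
  for host in ceph-mon-a ceph-mon-b ceph-storage-a ceph-storage-c ceph-storage-d; do
      echo "=== $host ==="
      ping -c 3 "$host"                 # basic reachability
      ping -c 3 -M do -s 8972 "$host"   # 9000-byte frames, DF bit set
  done

If the small pings succeed but the 8972-byte ones fail or time out, some
hop has a smaller MTU than the endpoints assume. (The 8972-byte test
only makes sense on networks meant to run with MTU 9000; on a standard
MTU of 1500, use -s 1472 instead.)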
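To inspect the firewalls from the client side, you can also probe the
TCP ports Ceph listens on. A sketch, assuming the default port
assignments and a netcat binary that supports -z (scan without sending
data); host names as above:

  nc -zv ceph-mon-a 3300        # MON msgr2
  nc -zv ceph-mon-a 6789        # MON legacy msgr1
  # OSDs, MDSs, and MGRs bind to ports in the 6800-7300 range, so that
  # whole range must be open towards the storage nodes, e.g.:
  nc -zv ceph-storage-a 6800

A refused connection usually means a wrong address or a daemon that is
not running; a probe that hangs silently is the classic signature of a
firewall dropping packets, which would also match a ceph-fuse mount
that just hangs.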
On Wed, Sep 4, 2024 at 6:50 PM Eugenio Tampieri
<eugenio.tampi...@readydigital.it> wrote:
>
> > Has it worked before or did it just stop working at some point? What's the
> > exact command that fails (and error message if there is)?
>
> It was working using the NFS gateway, I never tried with the Ceph FUSE mount.
> The command is ceph-fuse --id migration /mnt/repo. No error message, it just
> hangs.
>
> > > For the "too many PGs per OSD" I suppose I have to add some other
> > > OSDs, right?
>
> > Either that or reduce the number of PGs. If you had only a few pools I'd
> > suggest to leave it to the autoscaler, but not for 13 pools. You can paste
> > 'ceph osd df' and 'ceph osd pool ls detail' if you need more input for that.
>
> I already have the autoscaler enabled. Here is the output you asked for
> ---
> ceph osd df
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>  2    hdd  0.90970   1.00000  932 GiB  332 GiB  330 GiB  1.7 MiB  1.4 GiB  600 GiB  35.63  0.88  329      up
>  4    hdd  0.90970   1.00000  932 GiB  400 GiB  399 GiB  1.6 MiB  1.5 GiB  531 GiB  42.94  1.07  331      up
>  3    hdd  0.45479   1.00000  466 GiB  203 GiB  202 GiB  1.0 MiB  988 MiB  263 GiB  43.57  1.08  206      up
>  5    hdd  0.93149   1.00000  932 GiB  379 GiB  378 GiB  1.6 MiB  909 MiB  552 GiB  40.69  1.01  321      up
>                      TOTAL    3.2 TiB  1.3 TiB  1.3 TiB  5.9 MiB  4.8 GiB  1.9 TiB  40.30
> MIN/MAX VAR: 0.88/1.08  STDDEV: 3.15
> ---
> ceph osd pool ls detail
> pool 1 '.mgr' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 24150 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
> pool 2 'kubernetes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/92 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/123 flags hashpspool stripe_width 0 application rgw
> pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/132 flags hashpspool stripe_width 0 application rgw
> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/132 flags hashpspool stripe_width 0 application rgw
> pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/134 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
> pool 7 'repo_data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 30692 lfor 0/30692/30690 flags hashpspool stripe_width 0 application cephfs
> pool 8 'repo_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/150 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
> pool 9 '.nfs' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/169 flags hashpspool stripe_width 0 application nfs
> pool 11 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/592 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
> pool 12 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/592 flags hashpspool stripe_width 0 application rgw
> pool 13 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/644 flags hashpspool stripe_width 0 application rgw
> pool 19 'kubernetes-lan' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/15682 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> ---
> Regards
>
> Quoting Eugenio Tampieri <eugenio.tampi...@readydigital.it>:
>
> > Hi Eugen,
> > Sorry, but I had some trouble when I signed up and then I was away so
> > I missed your reply.
> >
> >> ceph auth export client.migration
> >> [client.migration]
> >> key = redacted
> >> caps mds = "allow rw fsname=repo"
> >> caps mon = "allow r fsname=repo"
> >> caps osd = "allow rw tag cephfs data=repo"
> >
> > For the "too many PGs per OSD" I suppose I have to add some other
> > OSDs, right?
> >
> > Thanks,
> >
> > Eugenio
> >
> > -----Original message-----
> > From: Eugen Block <ebl...@nde.ag>
> > Sent: Wednesday, September 4, 2024 10:07
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Re: CephFS troubleshooting
> >
> > Hi, I already responded to your first attempt:
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GS7KJRJP7BAOF66KJM255G27TJ4KG656/
> >
> > Please provide the requested details.
> >
> >
> > Quoting Eugenio Tampieri <eugenio.tampi...@readydigital.it>:
> >
> >> Hello,
> >> I'm writing to troubleshoot an otherwise functional Ceph quincy
> >> cluster that has issues with cephfs.
> >> I cannot mount it with ceph-fuse (it gets stuck), and if I mount it
> >> with NFS I can list the directories but I cannot read or write
> >> anything.
> >> Here's the output of ceph -s
> >>   cluster:
> >>     id:     3b92e270-1dd6-11ee-a738-000c2937f0ec
> >>     health: HEALTH_WARN
> >>             mon ceph-storage-a is low on available space
> >>             1 daemons have recently crashed
> >>             too many PGs per OSD (328 > max 250)
> >>
> >>   services:
> >>     mon:        5 daemons, quorum ceph-mon-a,ceph-storage-a,ceph-mon-b,ceph-storage-c,ceph-storage-d (age 105m)
> >>     mgr:        ceph-storage-a.ioenwq(active, since 106m), standbys: ceph-mon-a.tiosea
> >>     mds:        1/1 daemons up, 2 standby
> >>     osd:        4 osds: 4 up (since 104m), 4 in (since 24h)
> >>     rbd-mirror: 2 daemons active (2 hosts)
> >>     rgw:        2 daemons active (2 hosts, 1 zones)
> >>
> >>   data:
> >>     volumes: 1/1 healthy
> >>     pools:   13 pools, 481 pgs
> >>     objects: 231.83k objects, 648 GiB
> >>     usage:   1.3 TiB used, 1.8 TiB / 3.1 TiB avail
> >>     pgs:     481 active+clean
> >>
> >>   io:
> >>     client: 1.5 KiB/s rd, 8.6 KiB/s wr, 1 op/s rd, 0 op/s wr
> >>
> >> Best regards,
> >>
> >> Eugenio Tampieri


--
Alexander Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io