Hi,

Sorry for poking this old thread, but does this issue still persist in
the 6.3 kernels?
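
If anyone is still hitting it, a quick way to see which kernel versions the
affected clients run is to dump the MDS sessions and look at the client
metadata they report (rough sketch, assuming jq is available; replace
<mds-name> with your active MDS):

  ceph tell mds.<mds-name> session ls | \
    jq -r '.[].client_metadata.kernel_version // empty' | sort | uniq -c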

Cheers, Dan

______________________________
Clyso GmbH | https://www.clyso.com


On Wed, Dec 7, 2022 at 3:42 AM William Edwards <wedwa...@cyberfusion.nl> wrote:
>
>
> > On 7 Dec 2022, at 11:59, Stefan Kooman <ste...@bit.nl> wrote:
> >
> > On 5/13/22 09:38, Xiubo Li wrote:
> >> On 5/12/22 12:06 AM, Stefan Kooman wrote:
> >>> Hi List,
> >>>
> >>> We have quite a few Linux kernel clients for CephFS. One of our customers 
> >>> has been running mainline kernels (CentOS 7 elrepo) for the past two 
> >>> years. They started out with 3.x kernels (the CentOS 7 default), but 
> >>> upgraded to mainline when those kernels would frequently generate MDS 
> >>> warnings like "failing to respond to capability release". That worked 
> >>> fine until the 5.14 kernel. 5.14 and up would use a lot of CPU and *way* 
> >>> more bandwidth on CephFS than older kernels (an order of magnitude more). 
> >>> After the MDS was upgraded from Nautilus to Octopus that behavior was 
> >>> gone (CPU / bandwidth usage comparable to older kernels). However, the 
> >>> newer kernels are now the ones that give "failing to respond to 
> >>> capability release", and worse, clients get evicted (unresponsive as far 
> >>> as the MDS is concerned). Even the latest 5.17 kernels have that. No 
> >>> difference is observed between messenger v1 and v2. MDS version is 
> >>> 15.2.16.
> >>> Surprisingly, the latest stable kernels from CentOS 7 now work 
> >>> flawlessly. Although that is good news, newer operating systems come 
> >>> with newer kernels.
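> >>>
> >>> For reference, the warning also shows up in cluster health, so a rough 
> >>> way to check whether clients are affected at a given moment (the grep 
> >>> patterns below are just illustrative) is:
> >>>
> >>>   # health warning behind "failing to respond to capability release"
> >>>   ceph health detail | grep -i 'capability release'
> >>>   # recent cluster log entries, where client evictions usually show up
> >>>   ceph log last 100 | grep -i evict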
> >>>
> >>> Does anyone else observe the same behavior with newish kernel clients?
> >> There are some known bugs that have recently been fixed, or are still 
> >> being fixed, in mainline, and I am not sure whether they are related; 
> >> see for example [1][2][3][4]. For more detail, please see the 
> >> ceph-client repo testing branch [5].
> >
> > None of the issues you mentioned were related. We gained some more 
> > experience with newer kernel clients, specifically on Ubuntu Focal / Jammy 
> > (5.15). Performance issues seem to arise in certain workloads, in 
> > particular load-balanced Apache shared web hosting clusters on CephFS. We 
> > have tested Linux kernel clients from 5.8 up to and including 6.0 with a 
> > production workload and the short summary is:
> >
> > < 5.13: everything works fine
> > 5.13 and up: gives issues
>
> I see this issue on 6.0.0 as well.
>
> >
> > We tested 5.13-rc1 as well, and that kernel already shows the issue. So 
> > something changed in 5.13 that results in a performance regression for 
> > certain workloads. I wonder if it has something to do with the 
> > fscache-related changes that have been, and still are, happening in the 
> > kernel. These web servers might access the same directories / files 
> > concurrently.
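> >
> > If fscache is the suspect: as far as I understand, the kernel client only 
> > talks to FS-Cache when a mount uses the 'fsc' option, although the netfs 
> > read helper conversion (which, if I recall correctly, landed around 5.13) 
> > changed the read path regardless. A rough way to rule 'fsc' in or out on 
> > an affected node (monitor address, client name and paths below are just 
> > placeholders):
> >
> >   # see whether 'fsc' appears in the mount options
> >   grep ceph /proc/mounts
> >   # if it does, unmount and remount without it, e.g.:
> >   umount /mnt/cephfs
> >   mount -t ceph mon1:6789:/ /mnt/cephfs -o name=www,secretfile=/etc/ceph/www.secret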
> >
> > Note: we have quite a few 5.15 kernel clients not doing any (load-balanced) 
> > web-based workload (container clusters on CephFS) that don't have any 
> > performance issues running these kernels.
> >
> > Issue: poor CephFS performance
> > Symptom / result: excessive CephFS network usage (an order of magnitude 
> > higher than for older kernels without this issue). Within a minute there 
> > are a bunch of slow web service processes claiming loads of virtual 
> > memory, which results in heavy swap usage and renders the node unusably 
> > slow.
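> >
> > For what it's worth, on an affected client the kernel client's debugfs 
> > files give a rough idea of where the extra traffic and any stuck requests 
> > are (sketch; assumes debugfs is mounted at /sys/kernel/debug):
> >
> >   ls /sys/kernel/debug/ceph/          # one directory per mount (<fsid>.client<id>)
> >   cat /sys/kernel/debug/ceph/*/caps   # capabilities currently held by this client
> >   cat /sys/kernel/debug/ceph/*/mdsc   # in-flight MDS (metadata) requests
> >   cat /sys/kernel/debug/ceph/*/osdc   # in-flight OSD (data) requests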
> >
> > Other users that replied to this thread experienced similar symptoms. It 
> > is reproducible both on CentOS (EPEL mainline kernels) and on Ubuntu (HWE 
> > as well as the default release kernel).
> >
> > MDS version used: 15.2.16 (with a backported patch from 15.2.17) (single 
> > active / standby-replay)
> >
> > Does this ring a bell?
> >
> > Gr. Stefan
> >
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
