Hi,

Sorry for poking this old thread, but does this issue still persist in the 6.3 kernels?
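If anyone re-tests this on 6.x, something along the lines of the sketch below might help tie the MDS-side "failing to respond to capability release" warnings to specific client kernel versions: it lists each MDS session together with the kernel version and cap count the client reports. This is only a rough sketch and not something from this thread: the MDS name is taken as an argument, and the field names (client_metadata, kernel_version, num_caps) are assumed from recent Octopus+ `session ls` output, so adjust to what your cluster actually prints.

#!/usr/bin/env python3
"""Rough sketch (assumptions, not from this thread): list CephFS clients as
seen by one MDS, with kernel version and cap count, so that
"failing to respond to capability release" warnings can be matched to
specific kernels. Field names are assumed from recent Octopus+ output."""

import json
import subprocess
import sys


def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output."""
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)


def main():
    # e.g. "mds.<name>" of the active MDS (placeholder, adjust to your cluster)
    mds_name = sys.argv[1]
    sessions = ceph_json("tell", mds_name, "session", "ls")
    for s in sessions:
        meta = s.get("client_metadata", {})
        print(
            f"client.{s.get('id')} "
            f"kernel={meta.get('kernel_version', 'n/a')} "
            f"hostname={meta.get('hostname', 'n/a')} "
            f"caps={s.get('num_caps', 'n/a')}"
        )


if __name__ == "__main__":
    main()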
Cheers, Dan

______________________________
Clyso GmbH | https://www.clyso.com


On Wed, Dec 7, 2022 at 3:42 AM William Edwards <wedwa...@cyberfusion.nl> wrote:
>
> > On 7 Dec 2022 at 11:59, Stefan Kooman <ste...@bit.nl> wrote the following:
> >
> > On 5/13/22 09:38, Xiubo Li wrote:
> >>> On 5/12/22 12:06 AM, Stefan Kooman wrote:
> >>> Hi List,
> >>>
> >>> We have quite a few Linux kernel clients for CephFS. One of our customers has been running mainline kernels (CentOS 7 elrepo) for the past two years. They started out with 3.x kernels (default CentOS 7), but upgraded to mainline when those kernels would frequently generate MDS warnings like "failing to respond to capability release". That worked fine until the 5.14 kernel: 5.14 and up would use a lot of CPU and *way* more bandwidth on CephFS than older kernels (an order of magnitude more). After the MDS was upgraded from Nautilus to Octopus that behaviour was gone (CPU / bandwidth usage comparable to older kernels). However, the newer kernels are now the ones that give "failing to respond to capability release", and worse, clients get evicted (unresponsive as far as the MDS is concerned). Even the latest 5.17 kernels have that. No difference is observed between using messenger v1 or v2. MDS version is 15.2.16.
> >>> Surprisingly, the latest stable kernels from CentOS 7 work flawlessly now. Although that is good news, newer operating systems come with newer kernels.
> >>>
> >>> Does anyone else observe the same behaviour with newish kernel clients?
> >> There are some known bugs which have recently been fixed, or are being fixed, even in mainline, and I am not sure whether they are related. Such as [1][2][3][4]. For more detail, please see the ceph-client repo testing branch [5].
> >
> > None of the issues you mentioned were related. We have gained some more experience with newer kernel clients, specifically on Ubuntu Focal / Jammy (5.15). Performance issues seem to arise in certain workloads, specifically load-balanced Apache shared web hosting clusters on CephFS. We have tested Linux kernel clients from 5.8 up to and including 6.0 with a production workload, and the short summary is:
> >
> > < 5.13: everything works fine
> > 5.13 and up: giving issues
>
> I see this issue on 6.0.0 as well.
>
> >
> > We tested 5.13-rc1 as well, and already that kernel is giving issues. So something has changed in 5.13 that results in a performance regression in certain workloads, and I wonder whether it has something to do with the changes related to fscache that have been, and are, happening in the kernel. These web servers might access the same directories / files concurrently.
> >
> > Note: we have quite a few 5.15 kernel clients not doing any (load-balanced) web-based workload (container clusters on CephFS) that don't have any performance issue running these kernels.
> >
> > Issue: poor CephFS performance.
> > Symptom / result: excessive CephFS network usage (an order of magnitude higher than for older kernels not having this issue); within a minute there are a bunch of slow web service processes, claiming loads of virtual memory, which results in heavy swap usage and basically renders the node unusably slow.
> >
> > Other users that replied to this thread experienced similar symptoms. It is reproducible on both CentOS (EPEL mainline kernels) as well as on Ubuntu (HWE as well as default release kernels).
> >
> > MDS version used: 15.2.16 (with a backported patch from 15.2.17), single active / standby-replay.
> >
> > Does this ring a bell?
> >
> > Gr. Stefan
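On the fscache speculation quoted above: a quick client-side sanity check is whether the affected mounts were even mounted with the 'fsc' option, and what the kernel client's cap counts look like locally. Below is a minimal sketch; note it is not from this thread, and the debugfs path and the 'caps' file format are assumptions that differ between kernel versions, so treat it as illustrative only (needs root and a mounted debugfs).

#!/usr/bin/env python3
"""Minimal client-side sketch (assumptions, not from this thread): report
whether each CephFS kernel mount carries the 'fsc' (fscache) option, and
dump the kernel client's cap counts from debugfs. The debugfs layout
(/sys/kernel/debug/ceph/<fsid.clientXXXX>/caps) is assumed and varies
between kernel versions."""

import glob
import os


def cephfs_mounts():
    """Yield (device, mountpoint, options) for ceph mounts in /proc/mounts."""
    with open("/proc/mounts") as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "ceph":
                yield fields[0], fields[1], fields[3].split(",")


def main():
    for dev, mnt, opts in cephfs_mounts():
        fsc = "fsc" in opts or any(o.startswith("fsc=") for o in opts)
        print(f"{mnt} (from {dev}): fscache {'enabled' if fsc else 'disabled'}")

    # Cap usage as the kernel client sees it (path/format assumed).
    for caps in glob.glob("/sys/kernel/debug/ceph/*/caps"):
        client = os.path.basename(os.path.dirname(caps))
        with open(caps) as f:
            print(f"--- {client} ---")
            print(f.read().strip())


if __name__ == "__main__":
    main()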