** Description changed: Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command: | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e- 583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" + Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the server. + * The bcache btree size fluctuation seems "normal" [5] + * I noticed that, when the shrinker was enabled,a lot of time was spent in the locks during "bch_btree_insert_node". + + I decided to check if one of the function called during + bch_btree_insert_node was taking longer than usual when the shrinker was + enabled. + + I finally found the "funclatency" tool and tried do have the same approach I had with the klockstat [7]. However, that was inconclusive. I could see there that the bch_btree_insert_node was barely called during the whole duration of the test. + Which made me think it's amount of time spent in lock is more due to another process acquiring the lock. + + I'm going to try to have another go with some perf/klockstat/funclatency + focused on bch_mca_scan and the function called there. + + Also, here are some memory related metrics [8]. + [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ + [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/ + [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/ + [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/ + [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices: total 0 crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions: linux-restricted-modules-4.15.0-118-generic N/A linux-backports-modules-4.15.0-118-generic N/A linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP
-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1898786 Title: Issue with bcache bch_mca_scan causing huge IO wait To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs