TL;DR

@seth-arnold, as a test can you try to set the following options?

 $ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_bytes
 $ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_background_bytes

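While the test is running, it may help to watch how much dirty memory has accumulated. A minimal sketch, assuming the standard Dirty and Writeback fields of /proc/meminfo; the helper name dirty_stats is hypothetical and only used for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the Dirty and Writeback lines from a
# meminfo-style file (defaults to /proc/meminfo).
# The trailing ':' in the pattern keeps WritebackTmp out of the output.
dirty_stats() {
    awk '/^(Dirty|Writeback):/ { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}

# Print the current values once, if /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    dirty_stats
fi
```

Running it in a loop (for example once per second, next to vmstat 1) should show whether the amount of dirty memory stays near the configured limit while the writer is active.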
Repeat the test and see if the system is still unresponsive.

Details below.

This is what I think is happening in this last scenario (interactive performance killed when a large I/O writer is running).

The large I/O writer generates a lot of dirty pages, and nothing forces those pages to be synced to the backing store until the dirty_ratio (=20%) / dirty_background_ratio (=10%) thresholds are hit. With the default settings these thresholds can be quite high on systems with a lot of RAM.

For example, in a system with 16GB of free/reclaimable memory, the amount of dirty memory that is allowed before a writer is actively forced to flush those pages to the backing store is 16GB * 20 / 100 = 3.2GB. Flusher threads are started when the amount of dirty memory reaches 16GB * 10 / 100 = 1.6GB.

So, if the writer doesn't stop, it will consume all the free pages in the system, and at that point we are going to have a lot of dirty pages. Then the kernel needs to decide what to do to free up some pages.

Reclaimable memory is the first choice: cached clean pages that already have a copy on the corresponding backing store are easy to reclaim, because they just need to be dropped from the page cache (no I/O involved). Dirty pages are more expensive to reclaim, because they need to be flushed to the backing store before the page can be freed. The same applies to anonymous memory, which needs to be written to the swap device before the page can be re-used.

So when the system starts to reclaim pages, we see some swap activity and we also see some I/O due to the flushing of the dirty pages.

I think the system becomes sluggish because there are too many dirty pages: the kernel spends too much time selecting the right pages to reclaim, and interactive performance is killed.

This looks like a bug/regression in the kernel and I think we should definitely investigate more and track down the cause of the problem.

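The threshold arithmetic in the example above can be sketched as follows. This is only the simplified formula used in this comment (memory * ratio / 100), not the kernel's exact accounting; the helper name dirty_threshold_bytes is hypothetical, and 16GB is taken as 16GiB for the calculation:

```shell
#!/bin/sh
# Hypothetical helper: threshold in bytes = memory in bytes * ratio / 100.
# $1 = free/reclaimable memory in bytes, $2 = ratio in percent.
dirty_threshold_bytes() {
    echo $(( $1 * $2 / 100 ))
}

# 16GiB of free/reclaimable memory, as in the example above.
avail=$(( 16 * 1024 * 1024 * 1024 ))

dirty_threshold_bytes "$avail" 20   # dirty_ratio=20: writers are throttled
dirty_threshold_bytes "$avail" 10   # dirty_background_ratio=10: flushers start
```

With these inputs the two results are 3435973836 and 1717986918 bytes, roughly 3.2GiB and 1.6GiB, compared with the 33554432 bytes (32MiB) proposed for the test.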
In the meantime, as a test to prove this theory, I think we could try to reduce the amount of dirty pages allowed in the system by tuning the dirty thresholds vm.dirty_bytes and vm.dirty_background_bytes (using the *_bytes tunables for finer-grained control over those thresholds) and see if that brings any benefit in the specific scenario reported by Seth.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1861359

Title:
swap storms kills interactive use

Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Focal:
Confirmed

Bug description:
[Impact]

High watermark boosting can cause large swap activity under certain memory intensive workloads, making the system very unresponsive (screen does not refresh, keyboard not responding, etc.).

This large swap activity seems to be prevented by disabling high watermark boosting.

[Test case]

Opening this web page in chrome seems to be a good reproducer of the problem:
https://platform.leolabs.space/visualizations/conjunction?type=conjunction&reportId=2004981040

When this page is opened we can clearly see from 'top' (for example) that the used swap is going up very quickly.

With the fix applied swap is not used at all and the system is always responsive.

[Fix]

Set vm.watermark_boost_factor to 0, disabling watermark boosting by default.

[Regression potential]

Regression potential is minimal: setting vm.watermark_boost_factor to 0 by default restores the old kernel behavior before watermark boosting was introduced. In case of unexpected regressions we can always fix this in user-space via sysctl.

[Original report]

Hello, several times since upgrading to focal from 19.04 I've found my computer entirely unresponsive for periods of twenty or thirty seconds. No mouse movement, no keyboard input, the screen output does not change.

My computer was using swap space and despite very slow writeout speeds well below what the NVME drive can handle, the computer was unusable.

I've captured some vmstat 1 output and top output that I started collecting during the event. (Normally one very long painful period is followed by several shorter periods of uselessness.)

Thanks

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-12-generic 5.4.0-12.15
ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
Uname: Linux 5.4.0-12-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu15
Architecture: amd64
Date: Wed Jan 29 23:44:05 2020
ProcEnviron:
 TERM=rxvt-unicode-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-5.4
UpgradeStatus: Upgraded to focal on 2020-01-24 (5 days ago)
---
ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version k5.4.0-12-generic.
ApportVersion: 2.20.11-0ubuntu16
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: sarnold 2734 F.... pulseaudio
 /dev/snd/controlC1: sarnold 2734 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0x2fe1028000 irq 145'
 Mixer name : 'Realtek ALC285'
 Components : 'HDA:10ec0285,17aa225c,00100002 HDA:8086280b,80860101,00100000'
 Controls : 53
 Simple ctrls : 15
Card1.Amixer.info:
 Card hw:1 'Audio'/'Generic ThinkPad Dock USB Audio at usb-0000:00:14.0-4.2.4, high speed'
 Mixer name : 'USB Mixer'
 Components : 'USB17ef:306f'
 Controls : 9
 Simple ctrls : 4
DistroRelease: Ubuntu 20.04
HibernationDevice: RESUME=none
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: LENOVO 20KHCTO1WW
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
ProcEnviron:
 TERM=rxvt-unicode-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu@/vmlinuz-5.4.0-12-generic root=ZFS=rpool/ROOT/ubuntu ro root=ZFS=rpool/ROOT/ubuntu quiet splash acpi_osi=! "acpi_osi=Windows 2015" vt.handoff=1
ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-12-generic N/A
 linux-backports-modules-5.4.0-12-generic N/A
 linux-firmware 1.185
Tags: focal
Uname: Linux 5.4.0-12-generic x86_64
UpgradeStatus: Upgraded to focal on 2020-01-24 (5 days ago)
UserGroups: adm cdrom libvirt lpadmin plugdev sambashare sbuild sudo
_MarkForUpload: True
dmi.bios.date: 11/25/2019
dmi.bios.vendor: LENOVO
dmi.bios.version: N23ET69W (1.44 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20KHCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40709 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN23ET69W(1.44):bd11/25/2019:svnLENOVO:pn20KHCTO1WW:pvrThinkPadX1Carbon6th:rvnLENOVO:rn20KHCTO1WW:rvrSDK0J40709WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad X1 Carbon 6th
dmi.product.name: 20KHCTO1WW
dmi.product.sku:
LENOVO_MT_20KH_BU_Think_FM_ThinkPad X1 Carbon 6th
dmi.product.version: ThinkPad X1 Carbon 6th
dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp