Launchpad has imported 55 comments from the remote bug at https://bugzilla.kernel.org/show_bug.cgi?id=65201.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2013-11-19T19:40:40+00:00 nleo wrote:

kswapd0 randomly load one core of CPU by 100%

Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013 x86_64 GNU/Linux

No swap enabled

Before, on the same laptop, Ubuntu 12.04 with kernel 3.2 32-bit pae was installed, and there was no such problem.

[root@localhost ~]# free -mh
             total       used       free     shared    buffers     cached
Mem:          3.8G       2.4G       1.3G         0B       150M       508M
-/+ buffers/cache:       1.8G       2.0G
Swap:           0B         0B         0B

[root@localhost ~]# cat /proc/meminfo
MemTotal: 3935792 kB
MemFree: 1381360 kB
Buffers: 154216 kB
Cached: 533096 kB
SwapCached: 0 kB
Active: 1958896 kB
Inactive: 438004 kB
Active(anon): 1740916 kB
Inactive(anon): 136292 kB
Active(file): 217980 kB
Inactive(file): 301712 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 2064 kB
Writeback: 0 kB
AnonPages: 1709628 kB
Mapped: 196696 kB
Shmem: 167620 kB
Slab: 81516 kB
SReclaimable: 61312 kB
SUnreclaim: 20204 kB
KernelStack: 1696 kB
PageTables: 13088 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1967896 kB
Committed_AS: 3498576 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 361304 kB
VmallocChunk: 34359300731 kB
HardwareCorrupted: 0 kB
AnonHugePages: 157696 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 18476 kB
DirectMap2M: 4059136 kB

And I can't kill it.
I heard that it's not a good idea, but just for lulz)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/0

------------------------------------------------------------------------
On 2013-11-20T23:32:02+00:00 atomlin wrote:

(In reply to nleo from comment #0)
> kswapd0 randomly load one core of CPU by 100%

You cannot issue a SIGKILL to 'kswapd' since it is a kernel thread.

> CommitLimit: 1967896 kB
> Committed_AS: 3498576 kB
                ^^^^^^^
Seems to be overcommitting memory.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/1

------------------------------------------------------------------------
On 2013-11-22T00:57:01+00:00 akpm wrote:

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Tue, 19 Nov 2013 19:40:40 +0000 bugzilla-dae...@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=65201
>
> Bug ID: 65201
> Summary: kswapd0 randomly high cpu load
> Product: Memory Management
> Version: 2.5
> Kernel Version: 3.12
> Hardware: x86-64
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> Assignee: a...@linux-foundation.org
> Reporter: n...@nm.ru
> Regression: No
>
> kswapd0 randomly load one core of CPU by 100%
>
> Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013
> x86_64
> GNU/Linux
>
> No swap enabled
>
> Before, on the same laptop, Ubuntu 12.04 with kernel 3.2 32-bit pae was
> installed, and there was no such problem.
>
> [root@localhost ~]# free -mh
>              total       used       free     shared    buffers     cached
> Mem:          3.8G       2.4G       1.3G         0B       150M       508M
> -/+ buffers/cache:       1.8G       2.0G
> Swap:           0B         0B         0B

hm, I wonder what kswapd is up to. Could you please make it happen again and then

dmesg -n 7
dmesg -c
echo m > /proc/sysrq-trigger
echo t > /proc/sysrq-trigger
dmesg -s 1000000 > foo

then send us foo?

>
> [root@localhost ~]# cat /proc/meminfo
> MemTotal: 3935792 kB
> MemFree: 1381360 kB
> Buffers: 154216 kB
> Cached: 533096 kB
> SwapCached: 0 kB
> Active: 1958896 kB
> Inactive: 438004 kB
> Active(anon): 1740916 kB
> Inactive(anon): 136292 kB
> Active(file): 217980 kB
> Inactive(file): 301712 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 2064 kB
> Writeback: 0 kB
> AnonPages: 1709628 kB
> Mapped: 196696 kB
> Shmem: 167620 kB
> Slab: 81516 kB
> SReclaimable: 61312 kB
> SUnreclaim: 20204 kB
> KernelStack: 1696 kB
> PageTables: 13088 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 1967896 kB
> Committed_AS: 3498576 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 361304 kB
> VmallocChunk: 34359300731 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 157696 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 18476 kB
> DirectMap2M: 4059136 kB
>
> And I can't kill it. I heard that it's not a good idea, but just for lulz)
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/2

------------------------------------------------------------------------
On 2015-04-21T14:54:19+00:00 mihail.zenkov wrote:

Created attachment 174671
kmsg dump

Sometimes I have the same problem. I don't have swap. I have kernel 3.19.0 (i686) compiled without CONFIG_SWAP.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/3

------------------------------------------------------------------------
On 2015-04-30T04:24:52+00:00 sakhnik wrote:

My Acer C720 too suffers occasionally. Turning swap on/off doesn't help. Dropping caches *does* help:

# echo 3 > /proc/sys/vm/drop_caches # 1 isn't enough

Next my guess would be to try to deactivate zswap.
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/4

------------------------------------------------------------------------
On 2015-05-03T06:33:59+00:00 sakhnik wrote:

Zswap isn't to blame, dropping caches may help or may not. Here's the output of `sudo perf top`:

  26,24%  [kernel]               [k] _raw_spin_lock
  14,72%  [kernel]               [k] _raw_spin_unlock
   6,62%  [kernel]               [k] super_cache_count
   4,97%  [kernel]               [k] shrink_slab.part.12
   4,92%  [kernel]               [k] list_lru_count_one
   2,15%  [i2c_designware_core]  [k] 0x0000000000000099
   1,86%  [kernel]               [k] shrink_lruvec
   1,74%  [kernel]               [k] mem_cgroup_iter
   1,61%  [kernel]               [k] native_read_tsc
   1,55%  [kernel]               [k] delay_tsc
   1,52%  [kernel]               [k] kswapd

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/5

------------------------------------------------------------------------
On 2015-11-09T20:45:11+00:00 ponymarzanna wrote:

(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.

I have the same hardware. After a system upgrade (current running kernel version 4.2.0) I get high CPU usage after "heavy" web sites open. If the suggested workaround (dropping caches) doesn't help, I just quit the web browser and everything returns back to normal.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/6

------------------------------------------------------------------------
On 2015-11-10T19:40:15+00:00 samkostka wrote:

Same here, also on an Acer C720 running arch. kswapd0 takes up a whole core whenever swap is being used. I run the Arch kernel, with a small patch to the chromeos_laptop driver to enable my trackpad.

The weird thing is memory and swap both aren't that full. Memory is at 50% utilization, and swap is only at 8%, according to xfce4-taskmanager.

It seems like Google Docs is the worst offender for triggering this issue.
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/7

------------------------------------------------------------------------
On 2015-11-16T20:18:55+00:00 mvanross wrote:

I had this bug, and for me it turned out to be my /tmp directory that is a tmpfs (to gain speed and save my ssd).

df /tmp gave
tmpfs 3880480 2449036 1431444 95% /tmp

After removing junk from /tmp/ the system returned to normal. Also in my case I had no swap, and sufficient free memory.

Would be interested to know if this works for you.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/8

------------------------------------------------------------------------
On 2016-01-19T06:19:40+00:00 serianox wrote:

same problem here, c720p chromebook, happens on several different distros like arch, ubuntu, xubuntu. I downgraded to the 4.1.x kernel and the issue is less frequent (needs much more memory pressure to trigger). then I downgraded to the 3.17 kernel and the issue is gone completely. all the previous suggestions and workarounds didn't work for me. only downgrading the kernel did.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/83

------------------------------------------------------------------------
On 2016-02-09T11:23:59+00:00 liststuff wrote:

Same problem here on Acer C720 Chromebook. I have 2GB of swap space on the SSD (I replaced the original 16GB M2 SSD with a 256GB version) and whenever swap is used I get this problem.

Linux localhost 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.10
Release: 15.10
Codename: wily

echo 3 > /proc/sys/vm/drop_caches # 1 isn't enough

works around the issue for me too

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/86

------------------------------------------------------------------------
On 2016-02-09T12:49:06+00:00 sakhnik wrote:

I didn't suffer from the bug since compiled kernel myself: https://aur.archlinux.org/packages/linux-c720/ . Apparently, I compiled out something causing the trouble, but I didn't try to bisect what was the culprit.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/87

------------------------------------------------------------------------
On 2016-02-09T19:23:59+00:00 serianox wrote:

(In reply to Anatoli Sakhnik from comment #11)
> I didn't suffer from the bug since compiled kernel myself:
> https://aur.archlinux.org/packages/linux-c720/ . Apparently, I compiled out
> something causing the trouble, but I didn't try to bisect what was the
> culprit.

This bug seems to affect 2Gb models only. Do you have the 2Gb or 4Gb version? What are the changes you made on your kernel?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/88

------------------------------------------------------------------------
On 2016-02-09T20:22:17+00:00 sakhnik wrote:

Mine is 2G. I didn't change anything in the kernel source code, but switched off many options in the config file: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .

Even today, if I boot stock arch kernel, the bug regresses; if I boot linux-c720, kswapd0 is still. In theory, I could experiment with different configurations in between stock's and mine to triage the issue.
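A possible first step for that kind of comparison, offered as a rough sketch and not something anyone in the thread ran: list the options that one config sets and the other does not. The function name `config_only_in_first` and the file names in the usage comment are made up for illustration; both inputs are assumed to be ordinary kernel .config files.

```shell
# Sketch only: print CONFIG_ lines present in the first file but absent
# (as an identical line) from the second. Order follows the first file.
config_only_in_first() {
    awk 'FILENAME == ARGV[1] { if (/^CONFIG_/) seen[$0] = 1; next }
         /^CONFIG_/ && !($0 in seen)' "$2" "$1"
}

# Hypothetical usage, with placeholder file names:
#   config_only_in_first config.stock config.c720
```

Each line printed is a candidate to switch off in the stock config (or back on in the custom one) while narrowing down the culprit.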
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/89

------------------------------------------------------------------------
On 2016-02-09T20:32:45+00:00 serianox wrote:

perhaps you removed something related to http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html ?

also relevant: https://github.com/GalliumOS/galliumos-distro/issues/52#issuecomment-174261443

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/90

------------------------------------------------------------------------
On 2016-02-09T20:39:48+00:00 sakhnik wrote:

I have no idea yet.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/91

------------------------------------------------------------------------
On 2016-02-10T10:45:39+00:00 ponymarzanna wrote:

To avoid this bug I installed ChromeOS on my C720 (with 2GB RAM). I was happy with performance. Until today. I noticed lags. For some reason this bug appeared suddenly. There was no update. Kernel version is 3.8.11. Stock ChromeOS kernel.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/92

------------------------------------------------------------------------
On 2016-02-14T06:12:55+00:00 serianox wrote:

(In reply to Anatoli Sakhnik from comment #13)
> Mine is 2G. I didn't change anything in the kernel source code, but switched
> off many options in the config file:
> https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .
>
> Even today, if I boot stock arch kernel, the bug regresses; if I boot
> linux-c720, kswapd0 is still. In theory, I could experiment with different
> configurations in between stock's and mine to triage the issue.

could you please share your configuration for the kernel so I can try your AUR package and solve this issue once for all :) ?
thanks in advance

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/93

------------------------------------------------------------------------
On 2016-02-14T07:25:17+00:00 sakhnik wrote:

There it is: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/94

------------------------------------------------------------------------
On 2016-02-15T12:23:55+00:00 jonathan wrote:

We encounter this regularly on AWS, but only on t2.small instances, which indeed are the only ones we run which have 2GB of RAM. We use the latest Ubuntu 15.10 AMIs as found here https://cloud-images.ubuntu.com/locator/ec2/.

Please let me know if we can do anything to help track this down.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/95

------------------------------------------------------------------------
On 2016-02-21T07:26:28+00:00 liststuff wrote:

The workaround suggested above (echo 3 > /proc/sys/vm/drop_caches) doesn't work consistently for me on kernel 4.2.0 (Ubuntu 15.10) on an Acer C720 Chromebook.

I've found another workaround that works well for me so far: create a file /etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents and reboot:

vm.min_free_kbytes=67584

The idea behind this workaround is a post by Kirill A.
Shutemov on LKML (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html) and this Gallium OS bug report: https://github.com/GalliumOS/galliumos-distro/issues/52

Would be interesting to know if this helps others

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/97

------------------------------------------------------------------------
On 2016-03-04T20:23:01+00:00 sgnn7 wrote:

Same problem here:
- No swap machine
- Wily (U15.10)
- 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- 1GB RAM
- `meminfo` - Should have enough RAM to not swap though buffers do seem high

MemTotal: 1014932 kB
MemFree: 231296 kB
MemAvailable: 871180 kB
Buffers: 580684 kB
Cached: 47812 kB
SwapCached: 0 kB
Active: 547952 kB
Inactive: 164364 kB
Active(anon): 84280 kB
Inactive(anon): 4288 kB
Active(file): 463672 kB
Inactive(file): 160076 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 224 kB
Writeback: 0 kB
AnonPages: 83800 kB
Mapped: 39688 kB
Shmem: 4768 kB
Slab: 48008 kB
SReclaimable: 31172 kB
SUnreclaim: 16836 kB
KernelStack: 1936 kB
PageTables: 3844 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 507464 kB
Committed_AS: 314640 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 13524 kB
VmallocChunk: 34359717628 kB
HardwareCorrupted: 0 kB
AnonHugePages: 49152 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 53248 kB
DirectMap2M: 1126400 kB

- kernel config: https://gist.github.com/sgnn7/cbb41ce21d3a927eca27
- strace shows nothing interesting
- `perf` report:

Samples: 12K of event 'cpu-clock', Event count (approx.): 3245250000
Overhead  Command  Shared Object      Symbol
  19.34%  kswapd0  [kernel.kallsyms]  [k] shrink_lruvec
  17.04%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_iter
   8.60%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_zone_lruvec
   6.57%  kswapd0  [kernel.kallsyms]  [k] shrink_slab
   5.47%  kswapd0  [kernel.kallsyms]  [k] global_dirty_limits
   4.18%  kswapd0  [kernel.kallsyms]  [k] domain_dirty_limits
   3.71%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_get_lru_size
   3.59%  kswapd0  [kernel.kallsyms]  [k] super_cache_count
   3.27%  kswapd0  [kernel.kallsyms]  [k] get_lru_size
   3.26%  kswapd0  [kernel.kallsyms]  [k] throttle_vm_writeout
   2.20%  kswapd0  [kernel.kallsyms]  [k] css_next_descendant_pre
   2.15%  kswapd0  [kernel.kallsyms]  [k] blk_flush_plug_list
   1.96%  kswapd0  [kernel.kallsyms]  [k] shrink_zone
   1.73%  kswapd0  [kernel.kallsyms]  [k] _raw_spin_lock
   1.59%  kswapd0  [kernel.kallsyms]  [k] __list_lru_count_one.isra.2
   1.43%  kswapd0  [kernel.kallsyms]  [k] list_lru_count_one
   1.37%  kswapd0  [kernel.kallsyms]  [k] memcg_kmem_is_active
   1.27%  kswapd0  [kernel.kallsyms]  [k] __raw_callee_save___pv_queued_spin_unlock
...

I'm going to try gdb, changing swappiness, changing vm.min_free_kbytes, and reducing buffer limits, in that order, and report back, but most likely I'll have one shot before the bug goes away for the next few days.
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/98

------------------------------------------------------------------------
On 2016-03-04T21:39:59+00:00 sgnn7 wrote:

Cont'd from previous post

In order of attempts on a live system:
- gdb didn't work at all since kernel wasn't built w/ debugging flags
- hotload of 10 and 0 swappiness (from 60) didn't make the kswapd process reduce cpu usage
- hotload of vm.min_free_kbytes=64K (from 4K) didn't make the process reduce cpu usage
- hotload of vm.dirty_background_ratio=5 (from 10) didn't make the process reduce cpu usage
- hotload of vm.dirty_ratio=10 (from 20) didn't make the process reduce cpu usage
- hotload of vm.dirty_background_ratio=15 (from 5) didn't make the process reduce cpu usage
- hotload of vm.dirty_ratio=25 (from 10) didn't make the process reduce cpu usage
- live swapon on a new 256MB swapfile didn't reduce process use
- live swapoff and swapon after that also didn't drop cpu usage

Sidenote: We're using Docker so I'm not sure if that is contributing to the situation.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/99

------------------------------------------------------------------------
On 2016-03-08T04:28:41+00:00 cdlscpmv wrote:

Good news! I was able to get rid of the bug completely by setting the `mem` kernel parameter to a value slightly less than physical memory. I own an Acer C720 (2GB model), and setting `mem=1920M` does the job.

The idea sprung up in my head after reading the aforementioned bug report on github[1]. I hope this might give some clue to the issue.
[1]: https://github.com/GalliumOS/galliumos-distro/issues/52

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/103

------------------------------------------------------------------------
On 2016-03-09T15:30:59+00:00 ivanov.maxim wrote:

Created attachment 208411
ftrace (function_graph)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/104

------------------------------------------------------------------------
On 2016-03-09T15:31:36+00:00 ivanov.maxim wrote:

Created attachment 208421
ftrace (vmscan tracepoints)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/105

------------------------------------------------------------------------
On 2016-03-09T15:32:36+00:00 ivanov.maxim wrote:

Created attachment 208431
/proc/vmstat (time 0)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/106

------------------------------------------------------------------------
On 2016-03-09T15:33:01+00:00 ivanov.maxim wrote:

Created attachment 208441
/proc/vmstat (time 5s)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/107

------------------------------------------------------------------------
On 2016-03-09T15:33:30+00:00 ivanov.maxim wrote:

Created attachment 208451
/proc/zoneinfo

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/108

------------------------------------------------------------------------
On 2016-03-09T15:33:47+00:00 ivanov.maxim wrote:

Created attachment 208461
/proc/pagetypeinfo

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/109

------------------------------------------------------------------------
On 2016-03-09T15:34:09+00:00 ivanov.maxim wrote:

Created attachment 208471
/proc/buddyinfo

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/110

------------------------------------------------------------------------
On 2016-03-09T15:34:45+00:00 ivanov.maxim wrote:

Created attachment 208481
vmstat -m (time 0)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/111

------------------------------------------------------------------------
On 2016-03-09T15:35:35+00:00 ivanov.maxim wrote:

Created attachment 208491
vmstat -m (time 5s)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/112

------------------------------------------------------------------------
On 2016-03-09T15:59:37+00:00 ivanov.maxim wrote:

I am able to semi-reliably reproduce this (or very similar?) problem on a setup very close to one in comment #21

- kernel: 4.2.0-30-generic (ubuntu 15.10)
- 2 GB RAM, 1 CPU, running under Xen (EC2 t2.small instance)
- docker with LVM thin-pool storage backend, running 3 containers, no memory limits set for their memcg's
- server is mostly idling (load average 0.0-0.1)

To reproduce it I have to:

1. set vm.overcommit_memory=1

2. initiate some disk activity:

find / -xdev -type f | xargs -P10 -n1 md5sum &>/dev/null &
find /var/lib/docker -type f | xargs -P10 -n1 md5sum &>/dev/null &

3. run some memory allocations until you hit OOM

for x in {1..200}; do ./memalloc & : ; done

memalloc above is a simple C program which allocates 100MB and memsets it with 'x':

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int block_mb = 100;
    char *buf;

    printf("allocing %dMB: ", block_mb);
    buf = malloc(block_mb * 1024 * 1000);
    if (! buf) {
        printf("FAILED!\n");
        exit(EXIT_FAILURE);
    }
    printf("ok\n");
    /* write to every byte so the pages are really allocated */
    memset(buf, 'x', block_mb * 1024 * 1000);
    /* hold the memory for a while before exiting */
    sleep(180);
    return 0;
}

once you hit OOM, console slows down, it is time to CTRL+C, pkill memalloc and then check top. many times it spins `kswapd0` then recovers within tens of seconds, but once in a while it stays there for hours (didn't have patience to check for longer).
Once I triggered the bug, I tried to get as much information as possible from the running system. I am attaching /proc/*info files (some taken 5 s apart), ftrace output for the event tracer (vmscan events only), and ftrace output for the function_graph tracer. Let me know if you need more information.

To recover from the situation I need to free enough memory in a short period of time: sometimes dropping caches helps, sometimes I needed to close applications/containers as well, but I never had to reboot to recover.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/113

------------------------------------------------------------------------
On 2016-03-09T16:05:18+00:00 ivanov.maxim wrote:

It would be very helpful if there was a way to get output similar to the ftrace function_graph tracer, but with function args and return values. From the look of it, `pgdat_balance` for some reason keeps returning false even though /proc/zoneinfo shows that the number of free pages is much higher than any watermark.

The problem description and recovery method very closely resemble the discussion around kernel 3.7 (https://lkml.org/lkml/2012/11/28/88):

> The zonelist reclaim in kswapd would do
> nothing because all high watermarks are met, but the compaction logic
> would find its own requirements unmet and loop over the zones again.
> Indefinitely, until some third party would free enough memory to help
> meet the higher compaction watermark.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/114

------------------------------------------------------------------------
On 2016-04-30T13:38:28+00:00 hdefendme wrote:

(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.
> Dropping caches *does* help:
>
> # echo 3 > /proc/sys/vm/drop_caches # 1 isn't enough
>
> Next my guess would be to try to deactivate zswap.

above work around works for me, kernel 4.4.2 debian jessie.
bug happens randomly after heavy web browsers for kernel 4.5
downgrade to 3.16 stable jessie kernel, bug gone.
upgrade 4.4.2 bug came again

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/124

------------------------------------------------------------------------
On 2016-07-25T18:46:14+00:00 mail+kernel-bugzilla wrote:

Same thing on Thinkpad X220 with 8 GB RAM running Ubuntu 14.04, with Ubuntu's Kernel 3.16.0-77-generic.

Swap is disabled.

kswapd0 runs on high CPU and the HD light is on all the time during this (no idea why). After 20 (!) minutes the OOM killer manages to kill a process to resolve the situation.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/149

------------------------------------------------------------------------
On 2016-08-25T06:51:45+00:00 n.sherlock wrote:

Same problem on Amazon's t2.nano instance (512MB of RAM). Seemed to be triggered by doing a bunch of file IO. This is a brand new install of Ubuntu 16.04.

I have no swap enabled, and yet:

top - 06:42:57 up  1:58,  1 user,  load average: 2.43, 2.66, 2.31
Tasks: 125 total,   3 running, 122 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.1 us,  6.9 sy,  0.0 ni,  0.0 id,  0.9 wa,  0.0 hi,  0.0 si, 90.1 st
KiB Mem :   498416 total,   348096 free,    49772 used,   100548 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   411900 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
   29 root      20   0       0      0      0 R 65.0  0.0 103:16.64 kswapd0
14343 root      20   0       0      0      0 R  2.9  0.0   0:00.82 python

Running "echo 1 > /proc/sys/vm/drop_caches" didn't fix the problem, but it did fix it immediately with "3".

Also, my /tmp isn't full at all (6.5GB / 85% left on root).
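For context on why "1" and "3" can behave differently: per the kernel's vm sysctl documentation, writing 1 to drop_caches frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. A minimal sketch for checking how much reclaimable slab a system reports before and after the workaround is below. The helper name `sreclaimable_kb` is made up for illustration, and it only assumes input in /proc/meminfo format.

```shell
# Sketch only: read /proc/meminfo-formatted text on stdin and print the
# SReclaimable value in kB (the slab memory that drop_caches 2 or 3 targets).
sreclaimable_kb() {
    awk '$1 == "SReclaimable:" { print $2 }'
}

# Hypothetical usage (as root), comparing before and after:
#   sreclaimable_kb < /proc/meminfo
#   sync; echo 3 > /proc/sys/vm/drop_caches
#   sreclaimable_kb < /proc/meminfo
```

If the number drops sharply and kswapd0 calms down at the same time, that would point toward slab shrinking rather than page cache, though this is only a hint and not a diagnosis.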
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/157

------------------------------------------------------------------------
On 2016-08-25T07:10:24+00:00 n.sherlock wrote:

A workaround for machines running under Xen has been found over on Ubuntu's bug tracker, see comment #69:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

The workaround is to disable hot-add of memory:

touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/158

------------------------------------------------------------------------
On 2016-08-30T16:43:55+00:00 dek94 wrote:

I tried the same Ubuntu inspired "disable hot-add of memory" (and CPU) workaround under AWS EC2 HVM, Centos 7.x with mainline (elrepo) 4.4.15 kernel: no such luck, I still see this occasionally.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/159

------------------------------------------------------------------------
On 2016-10-01T17:51:02+00:00 ddstreet wrote:

I detailed why this bug happens here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126

this appears to be fixed by Mel Gorman's patch series to change memory reclaim from "per zone" to "per node":
https://marc.info/?l=linux-mm&m=146797052519026

So this bug should be fixed with the latest kernel.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/168

------------------------------------------------------------------------
On 2016-10-02T20:18:14+00:00 mail+kernel-bugzilla wrote:

(In reply to Dan Streetman from comment #40)
> I detailed why this bug happens here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
>
> So this bug should be fixed with the latest kernel.

Can you clarify, the link you mention seems to talk mainly about Xen. Do you think the latest kernel will fix it also for non-Xen machines?
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/170

------------------------------------------------------------------------
On 2016-10-02T20:30:38+00:00 ddstreet wrote:

(In reply to mail+kernel-bugzilla from comment #41)
> (In reply to Dan Streetman from comment #40)
> > I detailed why this bug happens here:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
> >
> > So this bug should be fixed with the latest kernel.
>
> Can you clarify, the link you mention seems to talk mainly about Xen. Do you
> think the latest kernel will fix it also for non-Xen machines?

what does your /proc/zoneinfo look like? do you have a system with (approx) <= 4g and Normal zone with few managed pages?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/171

------------------------------------------------------------------------
On 2016-10-02T20:45:13+00:00 mail+kernel-bugzilla wrote:

(In reply to Dan Streetman from comment #42)
> what does your /proc/zoneinfo look like? do you have a system with (approx)
> <= 4g and Normal zone with few managed pages?

My zoneinfo file right now looks like this:
https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94

(I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment #36.)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/172

------------------------------------------------------------------------
On 2016-10-02T21:20:24+00:00 ddstreet wrote:

(In reply to mail+kernel-bugzilla from comment #43)
> (In reply to Dan Streetman from comment #42)
> > what does your /proc/zoneinfo look like? do you have a system with
> (approx)
> > <= 4g and Normal zone with few managed pages?
>
> My zoneinfo file right now looks like this:
> https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94
>
> (I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment
> #36.)
That zoneinfo doesn't look like you're seeing the same problem, so if you are seeing consistent, sustained (not just transient) 100% cpu from kswapd, I think it's a different problem from what I described in comment 40.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/173

------------------------------------------------------------------------
On 2016-10-13T21:25:14+00:00 samkostka wrote:

I'm assuming by latest kernel you mean 4.8? If so I'm looking forward to Arch pushing it through testing :)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/175

------------------------------------------------------------------------
On 2016-11-15T21:32:23+00:00 jc wrote:

I am having the same issue on Fedora 24 with kernel 4.8.6. So I guess it has not been pushed there, or it does not fix anything.

It is a huge job stopper as I need to transfer many files between two USB disks. kswapd0 appears at the top of the process list after a while, and slowly degrades overall performance until I have to hard reboot the machine in the middle of some transfer.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/185

------------------------------------------------------------------------
On 2016-11-15T21:36:22+00:00 samkostka wrote:

My guess is Fedora didn't put the changes through or something, because 4.8 has DEFINITELY fixed it for me. I used to have to reboot about twice daily due to this, but ever since I upgraded to 4.8 it hasn't happened once.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/186

------------------------------------------------------------------------
On 2016-11-20T21:49:36+00:00 me wrote:

I'm on openSUSE with 4.8.8 and still have this issue.
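For anyone still seeing this on 4.8, the condition asked about in comment #42 (a Normal zone with few managed pages) can be checked quickly. The following is only a sketch: the function name `zone_managed` is made up, and it assumes the usual /proc/zoneinfo layout with "Node N, zone NAME" headers and a "managed" line per zone.

```shell
# Sketch only: read /proc/zoneinfo-formatted text on stdin and print
# "<zone> <managed pages>" for every zone.
zone_managed() {
    awk '/^Node/ { zone = $4 }
         $1 == "managed" { print zone, $2 }'
}

# Hypothetical usage:
#   zone_managed < /proc/zoneinfo
```

A Normal zone whose managed count is tiny next to DMA32 would match the pattern described in that comment; anything else probably means a different problem.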
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/187

------------------------------------------------------------------------
On 2016-12-09T01:27:14+00:00 00cpxxx wrote:

I'm on Debian with 4.8.7 and still have this issue.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/190

------------------------------------------------------------------------
On 2017-01-06T04:39:29+00:00 Wilhelm.Buchmueller wrote:

4.8.13-100.fc23.i686+PAE #1
/dev/sda is Samsung SSD 850 EVO 250GB

swapoff -va
sysctl vm.drop_caches=3

Problem, causes always heavy kswapd0 load:
cat /dev/sda >> /dev/zero
hdparm -t /dev/sda
ddrescue /dev/sda /dev/zero -vf
hexdump /dev/sda
dd if=/dev/sda of=/dev/zero
etc.

No problem (read speed ~500MB/s, except hdparm ):
hdparm --direct -t /dev/sda
dd iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
ddrescue --direct /dev/sda /dev/zero -vf -b 4096 -c 8192

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/195

------------------------------------------------------------------------
On 2017-01-08T23:03:59+00:00 dclowes1 wrote:

I am not sure if this is the same bug, but for me kswapd0 goes high-cpu following a page allocation failure in xhci_segment_alloc and I think that this has been occurring since moving to 4.8 on Fedora 24. I don't remember experiencing it before that. Currently on 4.8.15.

I normally boot with 3 or 4 USB 3.0 disks attached and, after the upgrade to 4.8.x, noticed that kswapd0 was running at 100%. I went back to 4.7.x and no problem. Searches on this issue frequently referred to USB disks so I unplugged and rebooted.

If I unplug all of my USB 3.0 devices I get a normal boot, even with a USB weather station, keyboard, mouse. Sometimes, one or two USB 3.0 disks is OK too.

If I boot with all of the USB 3.0 disks included, I get a kworker page allocation failure and after boot kswapd0 is high-cpu, usually split across 2-4 cores.

If I boot with two USB 3.0 disks and get a normal boot (no page
allocation failure and normal kswapd) and then plug in a hub with the
rest of the disks (and a USB 3.0 card reader), I get the page allocation
failure at that point and kswapd0 goes high-cpu.

I have not looked at them all, but whenever I see kswapd0 high-cpu and I
do look, there is the page allocation failure in the log.

The 'perf top' command seems to show different information from time to
time, but the top contenders are frequently 'shrink_inactive_list',
'inactive_list_is_low', 'find_next_bit', 'shrink_node_memcg' and
'_raw_spin_lock', to name a few.

Makes me wonder if the xhci allocation failure is the trigger, and fails
to clean up on the error exit path, and kswapd0 is just a hapless
victim.

There is a stack trace (on an Ubuntu kernel) of the page allocation
failure in the dmesg attached to
https://bugzilla.redhat.com/show_bug.cgi?id=1395825 on this issue, but I
have more if it would help.

I have 19GiB free on a 24GiB machine, so there should be no memory
shortage to prompt swapping or the page allocation failure.

I had also noticed frequently that not all of my USB disks were mounted
after boot and that I had to remove and reinsert a disk to use it. IIRC
this affected my USB 2.0 disks too, and it predates the upgrade to 4.8.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/196

------------------------------------------------------------------------
On 2017-01-12T20:14:07+00:00 ddstreet wrote:

> Problem, causes always heavy kswapd0 load:
> cat /dev/sda >> /dev/zero
> hdparm -t /dev/sda
> ddrescue /dev/sda /dev/zero -vf
> hexdump /dev/sda
> dd if=/dev/sda of=/dev/zero
> etc.

of course those cause kswapd work, all those commands will fill your
page cache and kswapd is responsible for clearing those pages out.

kswapd running isn't a problem, if it's doing work. kswapd running
*without* doing work is the problem.
When you stop running those commands, does kswapd catch up and stop
using cpu? If so, that's normal. If not, and it never stops using cpu,
that's the problem.

> No problem (read speed ~500MB/s, except hdparm ):
> hdparm --direct -t /dev/sda
> dd iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
> ddrescue --direct /dev/sda /dev/zero -vf -b 4096 -c 8192

the difference is those commands bypass the page cache - so the page
cache doesn't fill up and kswapd doesn't need to clear it out.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/197

------------------------------------------------------------------------
On 2017-01-12T20:41:51+00:00 ddstreet wrote:

> I am not sure if this is the same bug, but for me kswapd0 goes high-cpu
> following a page allocation failure in xhci_segment_alloc and I think that
> this has been occurring since moving to 4.8 on Fedora 24

from your dmesg, it certainly doesn't look like the same bug.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/198

------------------------------------------------------------------------
On 2018-12-18T17:55:52+00:00 xpaint wrote:

(In reply to Dan Streetman from comment #52)
> of course those cause kswapd work, all those commands will fill your page
> cache and kswapd is responsible for clearing those pages out.
>
> kswapd running isn't a problem, if it's doing work. kswapd running
> *without* doing work is the problem. When you stop running those commands,
> does kswapd catch up and stop using cpu? If so, that's normal. If not, and
> it never stops using cpu, that's the problem.

but why does kswapd write so aggressively to storage when there is no
data to flush (swap is not set up)?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/207

** Changed in: linux
       Status: Unknown => Confirmed

** Changed in: linux
   Importance: Unknown => Medium

** Bug watch added: github.com/GalliumOS/galliumos-distro/issues #52
   https://github.com/GalliumOS/galliumos-distro/issues/52

** Bug watch added: Red Hat Bugzilla #1395825
   https://bugzilla.redhat.com/show_bug.cgi?id=1395825

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1518457

Title:
  kswapd0 100% CPU usage

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  As per bug 721896 and various others:

  I'm on an AWS t2.micro instance (Xeon E5-2670, 991MiB of memory).
  Occasionally (about once a day), kswapd0 falls into a busy loop and
  spins on 100% CPU usage indefinitely. This can be provoked by
  copying/writing large files (e.g. dding a 256MB file), but it happens
  occasionally otherwise. System memory usage (not including
  buffers/caches) currently sits at 36%, which is typical[1].

  Initially I had no swap space configured; I've since tried enabling a
  256MB swap file, but the problem continues to occur and no swap space
  is used.

  The system can be recovered with `echo 1 > /proc/sys/vm/drop_caches`.

  Happy to provide further information/take further debugging actions.
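
  The 36% figure above can be reproduced from the `free` output in
  footnote [1]. The following is a minimal sketch in Python, assuming
  the older procps layout in which the "used" column of the "Mem:" row
  still includes buffers and page cache; the function name is
  hypothetical and only for illustration.

```python
def used_excluding_cache_pct(total_kb, used_kb, buffers_kb, cached_kb):
    """Percentage of RAM in use once buffers and page cache are excluded."""
    app_used_kb = used_kb - buffers_kb - cached_kb
    return 100.0 * app_used_kb / total_kb


# Figures (in kB) copied from the "Mem:" row of the free output in footnote [1].
total_kb, used_kb, buffers_kb, cached_kb = 1014936, 483448, 9756, 112700

# This should match the "used" column of the "-/+ buffers/cache:" row.
app_used_kb = used_kb - buffers_kb - cached_kb
pct = used_excluding_cache_pct(total_kb, used_kb, buffers_kb, cached_kb)

print(f"{app_used_kb} kB used excluding buffers/cache ({pct:.0f}% of RAM)")
```

  The subtraction gives 360992 kB, which agrees with the "-/+
  buffers/cache" row, so roughly two thirds of RAM is either free or
  reclaimable cache while kswapd0 is spinning.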

  [1] Full output from `free`:
               total       used       free     shared    buffers     cached
  Mem:       1014936     483448     531488      28556       9756     112700
  -/+ buffers/cache:     360992     653944
  Swap:       262140          0     262140

  ProblemType: Bug
  DistroRelease: Ubuntu 15.10
  Package: linux-image-4.2.0-18-generic 4.2.0-18.22
  ProcVersionSignature: Ubuntu 4.2.0-18.22-generic 4.2.3
  Uname: Linux 4.2.0-18-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Nov 19 19:40 seq
   crw-rw---- 1 root audio 116, 33 Nov 19 19:40 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.19.1-0ubuntu5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  Date: Fri Nov 20 20:44:30 2015
  Ec2AMI: ami-1c552a76
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: t2.micro
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: Xen HVM domU
  PciMultimedia:

  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 xen
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-18-generic root=UUID=35bc01f4-4602-4823-976e-508edef899df ro console=tty1 console=ttyS0 net.ifnames=0
  RelatedPackageVersions:
   linux-restricted-modules-4.2.0-18-generic N/A
   linux-backports-modules-4.2.0-18-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 05/06/2015
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd05/06/2015:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1518457/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp