[Kernel-packages] [Bug 1518457] Re: kswapd0 100% CPU usage

Bug Watch Updater Wed, 19 Dec 2018 02:23:31 -0800

Launchpad has imported 55 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=65201.


If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2013-11-19T19:40:40+00:00 nleo wrote:

kswapd0 randomly loads one CPU core to 100%

Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013
x86_64 GNU/Linux

No swap enabled

Before, the same laptop had Ubuntu 12.04 with a 32-bit PAE 3.2 kernel
installed, and there was no such problem.

[root@localhost ~]# free -mh
             total       used       free     shared    buffers     cached
Mem:          3.8G       2.4G       1.3G         0B       150M       508M
-/+ buffers/cache:       1.8G       2.0G
Swap:           0B         0B         0B


[root@localhost ~]# cat /proc/meminfo
MemTotal:        3935792 kB
MemFree:         1381360 kB
Buffers:          154216 kB
Cached:           533096 kB
SwapCached:            0 kB
Active:          1958896 kB
Inactive:         438004 kB
Active(anon):    1740916 kB
Inactive(anon):   136292 kB
Active(file):     217980 kB
Inactive(file):   301712 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              2064 kB
Writeback:             0 kB
AnonPages:       1709628 kB
Mapped:           196696 kB
Shmem:            167620 kB
Slab:              81516 kB
SReclaimable:      61312 kB
SUnreclaim:        20204 kB
KernelStack:        1696 kB
PageTables:        13088 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1967896 kB
Committed_AS:    3498576 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      361304 kB
VmallocChunk:   34359300731 kB
HardwareCorrupted:     0 kB
AnonHugePages:    157696 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       18476 kB
DirectMap2M:     4059136 kB

And I can't kill it. I've heard that trying is not a good idea, but I
did it just for lulz :)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/0

------------------------------------------------------------------------
On 2013-11-20T23:32:02+00:00 atomlin wrote:

(In reply to nleo from comment #0)
> kswapd0 randomly load one core of CPU by 100%

You cannot issue a SIGKILL to 'kswapd' since it is
a kernel thread.

> CommitLimit:     1967896 kB
> Committed_AS:    3498576 kB
                   ^^^^^^^

You seem to be overcommitting memory: Committed_AS is well above
CommitLimit.
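For reference, CommitLimit is derived from RAM, swap and vm.overcommit_ratio (default 50), so with no swap it is simply half of MemTotal here. Committed_AS exceeding it only makes allocations fail under strict accounting (vm.overcommit_memory=2); in the default heuristic mode the value is informational. A minimal C sketch of the calculation, assuming 4 KiB pages and no reserved hugetlb pages; the function names are illustrative. Under those assumptions it reproduces both CommitLimit values quoted in this thread (1967896 kB here and 507464 kB for the 1 GB machine further down).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages, as on x86_64. */
#define PAGE_KB 4ULL

/* CommitLimit in kB, following the formula documented for /proc/meminfo:
 *   (RAM pages - hugetlb pages) * vm.overcommit_ratio / 100 + swap pages
 * The kernel works in whole pages, so the division truncates per page.
 * Assumption: no hugetlb pages are reserved (HugePages_Total: 0). */
static unsigned long long commit_limit_kb(unsigned long long mem_total_kb,
                                          unsigned long long swap_total_kb,
                                          unsigned int overcommit_ratio)
{
    unsigned long long ram_pages, swap_pages;

    assert(mem_total_kb % PAGE_KB == 0); /* MemTotal is a whole number of pages */
    ram_pages = mem_total_kb / PAGE_KB;
    swap_pages = swap_total_kb / PAGE_KB;
    return (ram_pages * overcommit_ratio / 100 + swap_pages) * PAGE_KB;
}

/* True when more address space is committed than the limit allows.
 * This only causes allocation failures when vm.overcommit_memory=2. */
static bool is_overcommitted(unsigned long long committed_as_kb,
                             unsigned long long commit_limit)
{
    return committed_as_kb > commit_limit;
}
```

For the figures above, commit_limit_kb(3935792, 0, 50) returns 1967896, and is_overcommitted(3498576, 1967896) is true.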

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/1

------------------------------------------------------------------------
On 2013-11-22T00:57:01+00:00 akpm wrote:

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 19 Nov 2013 19:40:40 +0000 bugzilla-dae...@bugzilla.kernel.org
wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=65201
> 
>             Bug ID: 65201
>            Summary: kswapd0 randomly high cpu load
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.12
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: a...@linux-foundation.org
>           Reporter: n...@nm.ru
>         Regression: No
> 
> kswapd0 randomly load one core of CPU by 100%
> 
> Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013
> x86_64
> GNU/Linux
> 
> No swap enabled
> 
> Befor on same laptop was installed Ubuntu 12.04 and kernel 3.2 32-bit pae,
> and
> there is no such problem.
> 
> [root@localhost ~]# free -mh
>              total       used       free     shared    buffers     cached
> Mem:          3.8G       2.4G       1.3G         0B       150M       508M
> -/+ buffers/cache:       1.8G       2.0G
> Swap:           0B         0B         0B

hm, I wonder what kswapd is up to.

Could you please make it happen again and then

dmesg -n 7
dmesg -c
echo m > /proc/sysrq-trigger
echo t > /proc/sysrq-trigger
dmesg -s 1000000 > foo

then send us foo?


Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/2

------------------------------------------------------------------------
On 2015-04-21T14:54:19+00:00 mihail.zenkov wrote:

Created attachment 174671
kmsg dump

Sometimes I have the same problem. I don't have swap; my kernel is
3.19.0 (i686), compiled without CONFIG_SWAP.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/3

------------------------------------------------------------------------
On 2015-04-30T04:24:52+00:00 sakhnik wrote:

My Acer C720 suffers from this occasionally too. Turning swap on/off
doesn't help. Dropping caches *does* help:

# echo 3 > /proc/sys/vm/drop_caches  # 1 isn't enough

Next I'd try deactivating zswap.
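For context, the kernel's vm sysctl documentation defines the values written to drop_caches: 1 frees the page cache, 2 frees reclaimable slab objects (dentries and inodes), and 3 does both. It also recommends running sync first, since dirty objects are not dropped. That is one reason writing 1 alone may not help if the memory in question sits in slab caches. A minimal C sketch of the same operation; the function name and the path parameter are illustrative (the path is configurable only so the helper can be exercised on an ordinary file), and writing to the real /proc/sys/vm/drop_caches requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Mode values accepted by /proc/sys/vm/drop_caches:
 *   1 = page cache, 2 = reclaimable slab (dentries, inodes), 3 = both.
 * Returns 0 on success, -1 on error. */
static int drop_caches_at(const char *path, int mode)
{
    FILE *f;

    assert(path != NULL);
    if (mode < 1 || mode > 3)
        return -1;

    /* Dirty pages are not dropped, so write them back first. */
    sync();

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%d\n", mode) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Called as drop_caches_at("/proc/sys/vm/drop_caches", 3) by root, this mirrors the echo above.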

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/4

------------------------------------------------------------------------
On 2015-05-03T06:33:59+00:00 sakhnik wrote:

Zswap isn't to blame, and dropping caches may or may not help. Here's
the output of `sudo perf top`:

  26,24%  [kernel]                         [k] _raw_spin_lock
  14,72%  [kernel]                         [k] _raw_spin_unlock
   6,62%  [kernel]                         [k] super_cache_count
   4,97%  [kernel]                         [k] shrink_slab.part.12
   4,92%  [kernel]                         [k] list_lru_count_one
   2,15%  [i2c_designware_core]            [k] 0x0000000000000099
   1,86%  [kernel]                         [k] shrink_lruvec
   1,74%  [kernel]                         [k] mem_cgroup_iter
   1,61%  [kernel]                         [k] native_read_tsc
   1,55%  [kernel]                         [k] delay_tsc
   1,52%  [kernel]                         [k] kswapd

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/5

------------------------------------------------------------------------
On 2015-11-09T20:45:11+00:00 ponymarzanna wrote:

(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.

I have the same hardware. After a system upgrade (currently running
kernel 4.2.0) I get high CPU usage after opening a "heavy" website. If
the suggested workaround (dropping caches) doesn't help, I just quit the
web browser and everything returns to normal.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/6

------------------------------------------------------------------------
On 2015-11-10T19:40:15+00:00 samkostka wrote:

Same here, also on an Acer C720 running Arch. kswapd0 takes up a whole
core whenever swap is being used. I run the Arch kernel, with a small
patch to the chromeos_laptop driver to enable my trackpad.

The weird thing is that neither memory nor swap is that full. Memory is
at 50% utilization, and swap is only at 8%, according to
xfce4-taskmanager. Google Docs seems to be the worst offender for
triggering this issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/7

------------------------------------------------------------------------
On 2015-11-16T20:18:55+00:00 mvanross wrote:

I had this bug, and for me it turned out to be my /tmp directory, which
is a tmpfs (to gain speed and save my SSD).

df /tmp 
gave
tmpfs            3880480 2449036   1431444  95% /tmp

After removing junk from /tmp/ the system returned to normal.

Also in my case I had no swap, and sufficient free memory.

Would be interested to know if this works for you.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/8

------------------------------------------------------------------------
On 2016-01-19T06:19:40+00:00 serianox wrote:

Same problem here on a C720P Chromebook; it happens on several
different distros (Arch, Ubuntu, Xubuntu). I downgraded to the 4.1.x
kernel and the issue became less frequent (it needs much more memory
pressure to trigger). Then I downgraded to the 3.17 kernel and the issue
went away completely. None of the previous suggestions and workarounds
worked for me; only downgrading the kernel did.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/83

------------------------------------------------------------------------
On 2016-02-09T11:23:59+00:00 liststuff wrote:

Same problem here on Acer C720 Chromebook. I have 2GB of swap space on
the SSD (I replaced the original 16GB M2 SSD with a 256GB version) and
whenever swap is used I get this problem.

Linux localhost 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.10
Release:        15.10
Codename:       wily

`echo 3 > /proc/sys/vm/drop_caches` (1 isn't enough) works around the
issue for me too.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/86

------------------------------------------------------------------------
On 2016-02-09T12:49:06+00:00 sakhnik wrote:

I haven't suffered from the bug since I compiled the kernel myself:
https://aur.archlinux.org/packages/linux-c720/ . Apparently I compiled
out something that causes the trouble, but I didn't try to bisect which
option was the culprit.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/87

------------------------------------------------------------------------
On 2016-02-09T19:23:59+00:00 serianox wrote:

(In reply to Anatoli Sakhnik from comment #11)
> I didn't suffer from the bug since compiled kernel myself:
> https://aur.archlinux.org/packages/linux-c720/ . Apparently, I compiled out
> something causing the trouble, but I didn't try to bisect what was the
> culprit.

This bug seems to affect 2GB models only. Do you have the 2GB or 4GB
version? What changes did you make to your kernel?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/88

------------------------------------------------------------------------
On 2016-02-09T20:22:17+00:00 sakhnik wrote:

Mine is the 2GB model. I didn't change anything in the kernel source
code, but I switched off many options in the config file:
https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .

Even today, if I boot the stock Arch kernel, the bug comes back; if I
boot linux-c720, kswapd0 stays quiet. In theory, I could experiment with
configurations in between the stock one and mine to narrow down the
issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/89

------------------------------------------------------------------------
On 2016-02-09T20:32:45+00:00 serianox wrote:

perhaps you removed something related to 
http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html ?
also relevant:
https://github.com/GalliumOS/galliumos-distro/issues/52#issuecomment-174261443

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/90

------------------------------------------------------------------------
On 2016-02-09T20:39:48+00:00 sakhnik wrote:

I have no idea yet.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/91

------------------------------------------------------------------------
On 2016-02-10T10:45:39+00:00 ponymarzanna wrote:

To avoid this bug I installed ChromeOS on my C720 (with 2GB RAM). I was
happy with performance. Until today. I noticed lags. For some reason
this bug appeared suddenly. There was no update. Kernel version is
3.8.11. Stock ChromeOS kernel.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/92

------------------------------------------------------------------------
On 2016-02-14T06:12:55+00:00 serianox wrote:

(In reply to Anatoli Sakhnik from comment #13)
> Mine is 2G. I didn't change anything in the kernel source code, but switched
> off many options in the config file:
> https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .
> 
> Even today, if I boot stock arch kernel, the bug regresses; if I boot
> linux-c720, kswapd0 is still. In theory, I could experiment with different
> configurations in between stock's and mine to triage the issue.

Could you please share your kernel configuration so I can try your AUR
package and solve this issue once and for all :) ? Thanks in advance.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/93

------------------------------------------------------------------------
On 2016-02-14T07:25:17+00:00 sakhnik wrote:

There it is:
https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/94

------------------------------------------------------------------------
On 2016-02-15T12:23:55+00:00 jonathan wrote:

We encounter this regularly on AWS, but only on t2.small instances,
which indeed are the only ones we run which have 2GB of RAM.

We use the latest Ubuntu 15.10 AMIs as found here:
https://cloud-images.ubuntu.com/locator/ec2/. Please let me know if we
can do anything to help track this down.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/95

------------------------------------------------------------------------
On 2016-02-21T07:26:28+00:00 liststuff wrote:

The workaround suggested above (echo 3 > /proc/sys/vm/drop_caches)
doesn't work consistently for me on kernel 4.2.0 (Ubuntu 15.10) on an
Acer C720 Chromebook.

I've found another workaround that works well for me so far: create a file 
/etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents and 
reboot:
vm.min_free_kbytes=67584

The idea behind this workaround comes from a post by Kirill A. Shutemov
on LKML (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html)
and this GalliumOS bug report:
https://github.com/GalliumOS/galliumos-distro/issues/52

It would be interesting to know if this helps others.
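For scale, the kernel's own default for vm.min_free_kbytes is much smaller. In kernels of this era it is computed at boot in init_per_zone_wmark_min() (mm/page_alloc.c) as roughly sqrt(lowmem_kbytes * 16), clamped to 128..65536 kB, where lowmem_kbytes counts usable low-zone memory. A minimal C sketch of that formula; passing MemTotal instead of the kernel's exact low-memory count is an assumption that slightly overestimates the result, and the function names are illustrative. For about 2 GB of RAM it gives somewhere around 5.5 to 6 MB, so 67584 kB raises the reserve more than tenfold and is above the 65536 kB ceiling the kernel would pick on its own.

```c
#include <assert.h>
#include <limits.h>

/* Floor of the square root, like the kernel's int_sqrt(). */
static unsigned long long isqrt(unsigned long long x)
{
    unsigned long long lo = 0;
    unsigned long long hi = x < 0xFFFFFFFFULL ? x : 0xFFFFFFFFULL;

    /* Binary search for the largest r with r * r <= x. */
    while (lo < hi) {
        unsigned long long mid = lo + (hi - lo + 1) / 2;

        if (mid * mid <= x)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Boot-time default for vm.min_free_kbytes in 4.x-era kernels:
 * int_sqrt(lowmem_kbytes * 16), clamped to [128, 65536] kB. */
static unsigned long long default_min_free_kbytes(unsigned long long lowmem_kbytes)
{
    unsigned long long v;

    assert(lowmem_kbytes <= ULLONG_MAX / 16); /* avoid overflow */
    v = isqrt(lowmem_kbytes * 16);
    if (v < 128)
        v = 128;
    if (v > 65536)
        v = 65536;
    return v;
}
```

For the 1 GB machine later in this thread (MemTotal 1014932 kB) it returns 4029, consistent with the "from 4K" value reported there.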

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/97

------------------------------------------------------------------------
On 2016-03-04T20:23:01+00:00 sgnn7 wrote:

Same problem here:
- No swap machine
- Wily (U15.10) - 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 
x86_64 x86_64 x86_64 GNU/Linux
- 1GB RAM

- `meminfo` - Should have enough RAM to not swap though buffers do seem
high

MemTotal:        1014932 kB
MemFree:          231296 kB
MemAvailable:     871180 kB
Buffers:          580684 kB
Cached:            47812 kB
SwapCached:            0 kB
Active:           547952 kB
Inactive:         164364 kB
Active(anon):      84280 kB
Inactive(anon):     4288 kB
Active(file):     463672 kB
Inactive(file):   160076 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               224 kB
Writeback:             0 kB
AnonPages:         83800 kB
Mapped:            39688 kB
Shmem:              4768 kB
Slab:              48008 kB
SReclaimable:      31172 kB
SUnreclaim:        16836 kB
KernelStack:        1936 kB
PageTables:         3844 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      507464 kB
Committed_AS:     314640 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       13524 kB
VmallocChunk:   34359717628 kB
HardwareCorrupted:     0 kB
AnonHugePages:     49152 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       53248 kB
DirectMap2M:     1126400 kB

- kernel config: https://gist.github.com/sgnn7/cbb41ce21d3a927eca27

- strace shows nothing interesting

- `perf` report:
Samples: 12K of event 'cpu-clock', Event count (approx.): 3245250000

Overhead  Command  Shared Object      Symbol
  19.34%  kswapd0  [kernel.kallsyms]  [k] shrink_lruvec
  17.04%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_iter
   8.60%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_zone_lruvec
   6.57%  kswapd0  [kernel.kallsyms]  [k] shrink_slab
   5.47%  kswapd0  [kernel.kallsyms]  [k] global_dirty_limits
   4.18%  kswapd0  [kernel.kallsyms]  [k] domain_dirty_limits
   3.71%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_get_lru_size
   3.59%  kswapd0  [kernel.kallsyms]  [k] super_cache_count
   3.27%  kswapd0  [kernel.kallsyms]  [k] get_lru_size
   3.26%  kswapd0  [kernel.kallsyms]  [k] throttle_vm_writeout
   2.20%  kswapd0  [kernel.kallsyms]  [k] css_next_descendant_pre
   2.15%  kswapd0  [kernel.kallsyms]  [k] blk_flush_plug_list
   1.96%  kswapd0  [kernel.kallsyms]  [k] shrink_zone
   1.73%  kswapd0  [kernel.kallsyms]  [k] _raw_spin_lock
   1.59%  kswapd0  [kernel.kallsyms]  [k] __list_lru_count_one.isra.2
   1.43%  kswapd0  [kernel.kallsyms]  [k] list_lru_count_one
   1.37%  kswapd0  [kernel.kallsyms]  [k] memcg_kmem_is_active
   1.27%  kswapd0  [kernel.kallsyms]  [k] __raw_callee_save___pv_queued_spin_unlock
...


I'm going to try gdb, changing swappiness, changing vm.min_free_kbytes, and 
reducing buffer limits in that order and report back but most likely I'll have 
one shot before the bug goes away for the next few days.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/98

------------------------------------------------------------------------
On 2016-03-04T21:39:59+00:00 sgnn7 wrote:

Cont'd from previous post

In order of attempts on a live system:
- gdb didn't work at all since the kernel wasn't built with debugging flags
- hotloading vm.swappiness=10 and then 0 (from 60) didn't reduce kswapd's CPU usage
- hotloading vm.min_free_kbytes=64K (from 4K) didn't reduce CPU usage
- hotloading vm.dirty_background_ratio=5 (from 10) didn't reduce CPU usage
- hotloading vm.dirty_ratio=10 (from 20) didn't reduce CPU usage
- hotloading vm.dirty_background_ratio=15 (from 5) didn't reduce CPU usage
- hotloading vm.dirty_ratio=25 (from 10) didn't reduce CPU usage
- a live swapon of a new 256MB swapfile didn't reduce CPU usage
- a live swapoff followed by swapon didn't reduce CPU usage either
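Each of these hotloads is the runtime equivalent of sysctl -w: a dotted sysctl name such as vm.dirty_ratio maps to a file under /proc/sys with the dots replaced by slashes, and writing a new value there takes effect immediately but does not persist across reboots (unlike a file in /etc/sysctl.d). A minimal C sketch of that mapping; the helper names are illustrative, and writing vm.* keys requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the /proc/sys path for a dotted sysctl name, e.g.
 * "vm.dirty_ratio" -> "/proc/sys/vm/dirty_ratio".
 * Returns 0 on success, -1 if the buffer is too small. */
static int sysctl_proc_path(const char *name, char *out, size_t out_len)
{
    int n;
    char *p;

    assert(name != NULL && out != NULL);
    n = snprintf(out, out_len, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    /* Replace dots only in the part after the fixed prefix. */
    for (p = out + strlen("/proc/sys/"); *p != '\0'; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Rough equivalent of `sysctl -w name=value`. Returns 0 on success. */
static int sysctl_write(const char *name, const char *value)
{
    char path[256];
    FILE *f;

    if (sysctl_proc_path(name, path, sizeof path) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, sysctl_write("vm.dirty_ratio", "10") run as root corresponds to the fifth attempt in the list above.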


Sidenote: We're using Docker so I'm not sure if that is contributing to the 
situation.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/99

------------------------------------------------------------------------
On 2016-03-08T04:28:41+00:00 cdlscpmv wrote:

Good news! I was able to get rid of the bug completely by setting the
`mem` kernel parameter to a value slightly less than physical memory. I
own an Acer C720 (2GB model), and setting `mem=1920M` does the job.

The idea sprang up in my head after reading the aforementioned bug
report on github[1]. I hope this might give some clue to the issue.

[1]: https://github.com/GalliumOS/galliumos-distro/issues/52

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/103

------------------------------------------------------------------------
On 2016-03-09T15:30:59+00:00 ivanov.maxim wrote:

Created attachment 208411
ftrace (function_graph)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/104

------------------------------------------------------------------------
On 2016-03-09T15:31:36+00:00 ivanov.maxim wrote:

Created attachment 208421
ftrace (vmscan tracepoints)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/105

------------------------------------------------------------------------
On 2016-03-09T15:32:36+00:00 ivanov.maxim wrote:

Created attachment 208431
/proc/vmstat (time 0)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/106

------------------------------------------------------------------------
On 2016-03-09T15:33:01+00:00 ivanov.maxim wrote:

Created attachment 208441
/proc/vmstat (time 5s)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/107

------------------------------------------------------------------------
On 2016-03-09T15:33:30+00:00 ivanov.maxim wrote:

Created attachment 208451
/proc/zoneinfo

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/108

------------------------------------------------------------------------
On 2016-03-09T15:33:47+00:00 ivanov.maxim wrote:

Created attachment 208461
/proc/pagetypeinfo

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/109

------------------------------------------------------------------------
On 2016-03-09T15:34:09+00:00 ivanov.maxim wrote:

Created attachment 208471
/proc/buddyinfo

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/110

------------------------------------------------------------------------
On 2016-03-09T15:34:45+00:00 ivanov.maxim wrote:

Created attachment 208481
vmstat -m (time 0)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/111

------------------------------------------------------------------------
On 2016-03-09T15:35:35+00:00 ivanov.maxim wrote:

Created attachment 208491
vmstat -m (time 5s)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/112

------------------------------------------------------------------------
On 2016-03-09T15:59:37+00:00 ivanov.maxim wrote:

I am able to semi-reliably reproduce this (or a very similar) problem on
a setup very close to the one in comment #21:

- kernel: 4.2.0-30-generic (ubuntu 15.10)
- 2 GB RAM, 1 CPU, running under Xen (EC2 t2.small instance)
- docker with LVM thin-pool storage backend, running 3 containers, no memory 
limits set for their memcg's
- server is mostly idling (load average 0.0-0.1)

To reproduce it I have to:

1. set vm.overcommit_memory=1
2. initiate some disk activity:
     find / -xdev -type f -print0 | xargs -0 -P10 -n1 md5sum &>/dev/null &
     find /var/lib/docker -type f -print0 | xargs -0 -P10 -n1 md5sum &>/dev/null &

3. run some memory allocations until you hit OOM:
    for x in {1..200}; do ./memalloc & : ; done

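memalloc is a simple C program that allocates 100MB and fills it with
'x':

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  /* Size of the block to allocate, in MiB. */
  const size_t block_mb = 100;
  const size_t block_bytes = block_mb * 1024 * 1024;
  char *buf;

  printf("allocating %zuMB: ", block_mb);
  fflush(stdout);
  buf = malloc(block_bytes);
  if (buf == NULL) {
    printf("FAILED!\n");
    exit(EXIT_FAILURE);
  }
  printf("ok\n");
  /* Touch every page so the memory is actually used, not just reserved. */
  memset(buf, 'x', block_bytes);
  /* Hold on to the memory for three minutes before exiting. */
  sleep(180);
  free(buf);
  return 0;
}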


Once you hit OOM and the console slows down, it is time to press Ctrl+C,
pkill memalloc, and then check top. Often `kswapd0` spins and then
recovers within tens of seconds, but once in a while it stays there for
hours (I didn't have the patience to check for longer).

Once I had triggered the bug, I tried to get as much information as
possible from the running system. I am attaching /proc/*info files (some
taken 5 s apart), ftrace output from the event tracer (vmscan events
only), and ftrace output from the function_graph tracer. Let me know if
you need more information.


To recover, enough memory needs to be freed in a short period of time.
Sometimes dropping caches helps; sometimes I also had to close
applications/containers. I never had to reboot to recover.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/113

------------------------------------------------------------------------
On 2016-03-09T16:05:18+00:00 ivanov.maxim wrote:

It would be very helpful to have output similar to the ftrace
function_graph tracer but with function arguments and return values.
From the look of it, though, `pgdat_balanced` keeps returning false for
some reason, even though /proc/zoneinfo shows that the number of free
pages is much higher than any watermark.


Problem description and recovery method very closely resembles discussion 
around kernel 3.7 (https://lkml.org/lkml/2012/11/28/88):

> The zonelist reclaim in kswapd would do
> nothing because all high watermarks are met, but the compaction logic
> would find its own requirements unmet and loop over the zones again.
> Indefinitely, until some third party would free enough memory to help
> meet the higher compaction watermark.
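The effect described in that quote can be illustrated with the order-aware watermark test that kswapd's balance check relies on. In the 4.2-era code (__zone_watermark_ok() in mm/page_alloc.c), a zone passes the check for an order-N request only if, after discounting every free block smaller than order N, the remaining free memory still exceeds a watermark that is halved at each order. A zone with plenty of free but fragmented order-0 pages therefore fails for order > 0 even though its total free count is far above the watermark. The following is a simplified C model under those assumptions: it omits lowmem_reserve, CMA pages and the ALLOC_HIGH/ALLOC_HARDER adjustments, and the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 11 buddy orders (0..10), the x86_64 default of this era. */
#define NR_ORDERS 11

/* Total free pages implied by per-order free block counts,
 * i.e. one row of /proc/buddyinfo. */
static long total_free_pages(const unsigned long nr_free[NR_ORDERS])
{
    long total = 0;
    unsigned int o;

    for (o = 0; o < NR_ORDERS; o++)
        total += (long)(nr_free[o] << o);
    return total;
}

/* Simplified model of the 4.2-era __zone_watermark_ok() check.
 * mark is the watermark in pages (e.g. the zone's "high" value). */
static bool zone_watermark_ok_model(const unsigned long nr_free[NR_ORDERS],
                                    long free_pages, unsigned int order,
                                    long mark)
{
    long min = mark;
    unsigned int o;

    assert(order < NR_ORDERS);

    /* The allocation itself takes 2^order pages. */
    free_pages -= (1L << order) - 1;
    if (free_pages <= min)
        return false;

    for (o = 0; o < order; o++) {
        /* Free blocks of a lower order cannot serve this request... */
        free_pages -= (long)(nr_free[o] << o);
        /* ...but fewer higher-order pages are required at each step. */
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}
```

For example, a zone whose free memory is 10000 order-0 pages passes this check for order 0 with a watermark of 1000 pages but fails it for order 2, while the same 10000 pages held as 2500 order-2 blocks pass.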

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/114

------------------------------------------------------------------------
On 2016-04-30T13:38:28+00:00 hdefendme wrote:

(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.
> Dropping caches *does* help:
> 
> # echo 3 > /proc/sys/vm/drop_caches  # 1 isn't enough
> 
> Next my guess would be to try to deactivate zswap.

The above workaround works for me on kernel 4.4.2 (Debian Jessie).

With kernel 4.5 the bug happens randomly after heavy web browsing.
After downgrading to the stable Jessie 3.16 kernel, the bug was gone.
After upgrading to 4.4.2, it came back.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/124

------------------------------------------------------------------------
On 2016-07-25T18:46:14+00:00 mail+kernel-bugzilla wrote:

Same thing on a ThinkPad X220 with 8 GB RAM running Ubuntu 14.04, with
Ubuntu's kernel 3.16.0-77-generic.

Swap is disabled.

kswapd0 uses a lot of CPU, and the hard disk activity light stays on the
whole time (no idea why).

After 20 (!) minutes the OOM killer manages to kill a process to resolve
the situation.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/149

------------------------------------------------------------------------
On 2016-08-25T06:51:45+00:00 n.sherlock wrote:

Same problem on Amazon's t2.nano instance (512MB of RAM). Seemed to be
triggered by doing a bunch of file IO. This is a brand new install of
Ubuntu 16.04. I have no swap enabled, and yet:

top - 06:42:57 up  1:58,  1 user,  load average: 2.43, 2.66, 2.31
Tasks: 125 total,   3 running, 122 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.1 us,  6.9 sy,  0.0 ni,  0.0 id,  0.9 wa,  0.0 hi,  0.0 si, 90.1 st
KiB Mem :   498416 total,   348096 free,    49772 used,   100548 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   411900 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
   29 root      20   0       0      0      0 R 65.0  0.0 103:16.64 kswapd0
14343 root      20   0       0      0      0 R  2.9  0.0   0:00.82 python

Running "echo 1 > /proc/sys/vm/drop_caches" didn't fix the problem, but
"echo 3 > /proc/sys/vm/drop_caches" fixed it immediately.
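The values written to /proc/sys/vm/drop_caches are documented in the kernel's sysctl vm documentation: 1 frees the page cache, 2 frees reclaimable slab objects (such as dentries and inodes), and 3 frees both. The same documentation notes that dirty objects are not freed, so running sync first lets more memory be dropped. Below is a minimal Python sketch of the "sync; echo 3" workaround, assuming a Linux host and root privileges. The helper name drop_caches() and its path parameter are hypothetical. The path parameter exists only so the function can be pointed at an ordinary file for testing.

```python
import os

# Meanings as described in the kernel's vm.drop_caches documentation.
DROP_CACHES_LEVELS = {
    1: "page cache",
    2: "reclaimable slab objects (dentries and inodes)",
    3: "page cache and reclaimable slab objects",
}


def drop_caches(level: int, path: str = "/proc/sys/vm/drop_caches") -> str:
    """Hypothetical helper: write back dirty data, then ask the kernel to
    drop clean caches. Writing to the default path requires root.

    Returns a description of what was requested.
    """
    if level not in DROP_CACHES_LEVELS:
        raise ValueError(f"level must be 1, 2 or 3, got {level!r}")
    # drop_caches does not free dirty objects, so flush them to disk first.
    os.sync()
    # Same bytes that `echo 3 > /proc/sys/vm/drop_caches` would write.
    with open(path, "w") as f:
        f.write(f"{level}\n")
    return DROP_CACHES_LEVELS[level]
```

As root, drop_caches(3) does the same as running sync followed by echo 3 > /proc/sys/vm/drop_caches. Like the shell version, it is a workaround rather than a fix.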

Also, my /tmp isn't full at all (6.5GB / 85% left on root).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/157

------------------------------------------------------------------------
On 2016-08-25T07:10:24+00:00 n.sherlock wrote:

A workaround for machines running under Xen has been found on
Ubuntu's bug tracker; see comment #69:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

The workaround is to disable hot-add of memory:

touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/158

------------------------------------------------------------------------
On 2016-08-30T16:43:55+00:00 dek94 wrote:

I tried the same Ubuntu-inspired "disable hot-add of memory" (and CPU)
workaround on AWS EC2 HVM, CentOS 7.x with the mainline (ELRepo) 4.4.15
kernel: no luck, I still see this occasionally.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/159

------------------------------------------------------------------------
On 2016-10-01T17:51:02+00:00 ddstreet wrote:

I detailed why this bug happens here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126

This appears to be fixed by Mel Gorman's patch series that changes memory
reclaim from "per zone" to "per node":
https://marc.info/?l=linux-mm&m=146797052519026

So this bug should be fixed with the latest kernel.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/168

------------------------------------------------------------------------
On 2016-10-02T20:18:14+00:00 mail+kernel-bugzilla wrote:

(In reply to Dan Streetman from comment #40)
> I detailed why this bug happens here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
>
> So this bug should be fixed with the latest kernel.

Can you clarify? The link you mention seems to talk mainly about Xen. Do
you think the latest kernel will also fix it for non-Xen machines?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/170

------------------------------------------------------------------------
On 2016-10-02T20:30:38+00:00 ddstreet wrote:

(In reply to mail+kernel-bugzilla from comment #41)
> (In reply to Dan Streetman from comment #40)
> > I detailed why this bug happens here:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
> >
> > So this bug should be fixed with the latest kernel.
> 
> Can you clarify, the link you mention seems to talk mainly about Xen. Do you
> think the latest kernel will fix it also for non-Xen machines?

What does your /proc/zoneinfo look like? Do you have a system with
(approx.) <= 4 GB of RAM and a Normal zone with few managed pages?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/171

------------------------------------------------------------------------
On 2016-10-02T20:45:13+00:00 mail+kernel-bugzilla wrote:

(In reply to Dan Streetman from comment #42)
> what does your /proc/zoneinfo look like?  do you have a system with (approx)
> <= 4g and Normal zone with few managed pages?

My zoneinfo file right now looks like this:
https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94

(I upgraded from 8 GB to 16 GB memory recently though, after I wrote
comment #36.)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/172

------------------------------------------------------------------------
On 2016-10-02T21:20:24+00:00 ddstreet wrote:

(In reply to mail+kernel-bugzilla from comment #43)
> (In reply to Dan Streetman from comment #42)
> > what does your /proc/zoneinfo look like?  do you have a system with (approx)
> > <= 4g and Normal zone with few managed pages?
> 
> My zoneinfo file right now looks like this:
> https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94
> 
> (I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment
> #36.)

That zoneinfo doesn't look like you're seeing the same problem, so if
you are seeing consistent, sustained (not just transient) 100% cpu from
kswapd, I think it's a different problem from what I described in
comment 40.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/173

------------------------------------------------------------------------
On 2016-10-13T21:25:14+00:00 samkostka wrote:

I'm assuming that by "latest kernel" you mean 4.8? If so, I'm looking forward
to Arch pushing it through testing :)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/175

------------------------------------------------------------------------
On 2016-11-15T21:32:23+00:00 jc wrote:

I am having the same issue on Fedora 24 with kernel 4.8.6, so I guess the fix
has not been pushed there, or it does not fix anything.
It is a huge blocker, as I need to transfer many files between two USB disks.
kswapd0 appears at the top of the process list after a while, and slowly
degrades overall performance until I have to hard-reboot the machine in the
middle of a transfer.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/185

------------------------------------------------------------------------
On 2016-11-15T21:36:22+00:00 samkostka wrote:

My guess is Fedora didn't put the changes through or something, because
4.8 has DEFINITELY fixed it for me.  I used to have to reboot about
twice daily due to this, but ever since I upgraded to 4.8 it hasn't
happened once.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/186

------------------------------------------------------------------------
On 2016-11-20T21:49:36+00:00 me wrote:

I'm on openSUSE with 4.8.8 and still have this issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/187

------------------------------------------------------------------------
On 2016-12-09T01:27:14+00:00 00cpxxx wrote:

I'm on Debian with 4.8.7 and still have this issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/190

------------------------------------------------------------------------
On 2017-01-06T04:39:29+00:00 Wilhelm.Buchmueller wrote:

4.8.13-100.fc23.i686+PAE #1
/dev/sda is Samsung SSD 850 EVO 250GB 

swapoff -va
sysctl vm.drop_caches=3

Problem, causes always heavy kswapd0 load:
  cat /dev/sda >> /dev/zero
  hdparm -t /dev/sda
  ddrescue /dev/sda /dev/zero  -vf
  hexdump /dev/sda
  dd if=/dev/sda of=/dev/zero
  etc.

No problem (read speed ~500MB/s, except hdparm ):
  hdparm   --direct -t /dev/sda
  dd   iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
  ddrescue --direct    /dev/sda /dev/zero -vf  -b 4096 -c 8192

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/195

------------------------------------------------------------------------
On 2017-01-08T23:03:59+00:00 dclowes1 wrote:

I am not sure if this is the same bug, but for me kswapd0 goes high-cpu
following a page allocation failure in xhci_segment_alloc and I think
that this has been occurring since moving to 4.8 on Fedora 24. I don't
remember experiencing it before that. Currently on 4.8.15.

I normally boot with 3 or 4 USB 3.0 disks attached and, after the
upgrade to 4.8.x, noticed that kswapd0 was running at 100%. I went back
to 4.7.x and had no problem. Searches on this issue frequently referred to
USB disks, so I unplugged them and rebooted.

If I unplug all of my USB 3.0 devices I get a normal boot, even with a
USB weather station, keyboard and mouse attached. Sometimes one or two USB 3.0
disks are OK too. If I boot with all of the USB 3.0 disks attached, I get
a kworker page allocation failure, and after boot kswapd0 is high-CPU,
usually split across 2-4 cores.

If I boot with two USB 3.0 disks and get a normal boot (no page
allocation failure and normal kswapd) and then plug in a hub with the
rest of the disks (and a USB 3.0 card reader) I get the page allocation
failure at that point and kswapd0 goes high-cpu.

I have not looked at them all, but whenever I see kswapd0 high-cpu and I
do look, there is the page allocation failure in the log.

The 'perf top' command seems to show different information from time to
time, but the top contenders are frequently 'shrink_inactive_list',
'inactive_list_is_low', 'find_next_bit', 'shrink_node_memcg' and
'_raw_spin_lock', to name a few.

It makes me wonder whether the xhci allocation failure is the trigger and
fails to clean up on its error exit path, with kswapd0 just a hapless
victim. There is a stack trace (on an Ubuntu kernel) of the page allocation
failure in the dmesg attached to
https://bugzilla.redhat.com/show_bug.cgi?id=1395825 on this issue, but I
have more if it would help.

I have 19GiB free on a 24GiB machine so there should be no memory
shortage to prompt swapping or the page allocation failure.

I had also frequently noticed that not all of my USB disks were mounted
after boot and that I had to remove and reinsert a disk to use it. IIRC
this also affected my USB 2.0 disks, and it started before the upgrade
to 4.8.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/196

------------------------------------------------------------------------
On 2017-01-12T20:14:07+00:00 ddstreet wrote:

> Problem, causes always heavy kswapd0 load:
>  cat /dev/sda >> /dev/zero
>  hdparm -t /dev/sda
>  ddrescue /dev/sda /dev/zero  -vf
>  hexdump /dev/sda
>  dd if=/dev/sda of=/dev/zero
>  etc.

of course those cause kswapd work, all those commands will fill your
page cache and kswapd is responsible for clearing those pages out.

kswapd running isn't a problem, if it's doing work.  kswapd running
*without* doing work is the problem.  When you stop running those
commands, does kswapd catch up and stop using cpu?  If so, that's
normal.  If not, and it never stops using cpu, that's the problem.

> No problem (read speed ~500MB/s, except hdparm ):
>  hdparm   --direct -t /dev/sda
>  dd   iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
>  ddrescue --direct    /dev/sda /dev/zero -vf  -b 4096 -c 8192

The difference is that those commands bypass the page cache, so the page
cache doesn't fill up and kswapd doesn't need to clear it out.
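Whether kswapd is actually doing work can be checked by sampling two things over an interval: kswapd's CPU time from /proc/<pid>/stat, and the pages it reclaimed from the pgsteal_kswapd counters in /proc/vmstat. Below is a minimal Python sketch under those assumptions. The function names and the 100-tick threshold are hypothetical. Treating "zero pages reclaimed" as "not doing work" is one simplification of the distinction drawn above, not a definitive test.

```python
from typing import Dict


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat ("name value" per line) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def pages_reclaimed_by_kswapd(before: Dict[str, int],
                              after: Dict[str, int]) -> int:
    """Sum the deltas of every counter whose name starts with pgsteal_kswapd.

    Assumption: newer (per-node reclaim) kernels expose one pgsteal_kswapd
    counter, while older per-zone kernels expose pgsteal_kswapd_dma,
    pgsteal_kswapd_normal and so on. Matching the prefix covers both.
    """
    return sum(value - before.get(name, 0)
               for name, value in after.items()
               if name.startswith("pgsteal_kswapd"))


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (fields 14 and 15) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces, so split
    # after the last ')'. The remaining tokens start at field 3 (state),
    # which puts field N at index N - 3.
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])


def kswapd_spinning(cpu_ticks_delta: int, reclaimed_delta: int,
                    min_ticks: int = 100) -> bool:
    """True if kswapd burned CPU over the interval but reclaimed nothing."""
    return cpu_ticks_delta >= min_ticks and reclaimed_delta == 0
```

For example, you could read /proc/vmstat and /proc/<kswapd0 pid>/stat, wait ten seconds, and read both again. Then pass the two deltas to kswapd_spinning(). CPU times in /proc/<pid>/stat are counted in clock ticks (os.sysconf("SC_CLK_TCK"), typically 100 per second), so the default threshold is roughly one second of CPU.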

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/197

------------------------------------------------------------------------
On 2017-01-12T20:41:51+00:00 ddstreet wrote:

> I am not sure if this is the same bug, but for me kswapd0 goes high-cpu
> following a page allocation failure in xhci_segment_alloc and I think that
> this has been occurring since moving to 4.8 on Fedora 24

from your dmesg, it certainly doesn't look like the same bug.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/198

------------------------------------------------------------------------
On 2018-12-18T17:55:52+00:00 xpaint wrote:

(In reply to Dan Streetman from comment #52)

> of course those cause kswapd work, all those commands will fill your page
> cache and kswapd is responsible for clearing those pages out.
> 
> kswapd running isn't a problem, if it's doing work.  kswapd running
> *without* doing work is the problem.  When you stop running those commands,
> does kswapd catch up and stop using cpu?  If so, that's normal.  If not, and
> it never stops using cpu, that's the problem.

But why does kswapd write to storage so aggressively when there is no data
to flush (no swap configured)?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/207


** Changed in: linux
       Status: Unknown => Confirmed

** Changed in: linux
   Importance: Unknown => Medium

** Bug watch added: github.com/GalliumOS/galliumos-distro/issues #52
   https://github.com/GalliumOS/galliumos-distro/issues/52

** Bug watch added: Red Hat Bugzilla #1395825
   https://bugzilla.redhat.com/show_bug.cgi?id=1395825

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1518457

Title:
  kswapd0 100% CPU usage

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  As per bug 721896 and various others:

  I'm on an AWS t2.micro instance (Xeon E5-2670, 991MiB of memory).
  Occasionally (about once a day), kswapd0 falls into a busy loop and
  spins on 100% CPU usage indefinitely. This can be provoked by
  copying/writing large files (e.g. dding a 256MB file), but it happens
  occasionally otherwise. System memory usage (not including
  buffers/caches) currently sits at 36%, which is typical[1]. Initially
  I had no swap space configured; I've since tried enabling a 256MB swap
  file, but the problem continues to occur and no swap space is used.
  The system can be recovered with `echo 1 > /proc/sys/vm/drop_caches`.

  Happy to provide further information/take further debugging actions.

  
  [1] Full output from `free`:
               total       used       free     shared    buffers     cached
  Mem:       1014936     483448     531488      28556       9756     112700
  -/+ buffers/cache:     360992     653944
  Swap:       262140          0     262140

  ProblemType: Bug
  DistroRelease: Ubuntu 15.10
  Package: linux-image-4.2.0-18-generic 4.2.0-18.22
  ProcVersionSignature: Ubuntu 4.2.0-18.22-generic 4.2.3
  Uname: Linux 4.2.0-18-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Nov 19 19:40 seq
   crw-rw---- 1 root audio 116, 33 Nov 19 19:40 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.19.1-0ubuntu5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  Date: Fri Nov 20 20:44:30 2015
  Ec2AMI: ami-1c552a76
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: t2.micro
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: Xen HVM domU
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 xen
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-18-generic root=UUID=35bc01f4-4602-4823-976e-508edef899df ro console=tty1 console=ttyS0 net.ifnames=0
  RelatedPackageVersions:
   linux-restricted-modules-4.2.0-18-generic N/A
   linux-backports-modules-4.2.0-18-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 05/06/2015
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd05/06/2015:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1518457/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
