On Sun 2017-01-15, Ben Hutchings <b...@decadent.org.uk> wrote: > On Sun, 2017-01-15 at 11:05 +0100, Florian Siegesmund wrote:
>> Package: src:linux >> Version: 4.8.15-2 >> Severity: critical >> Tags: upstream >> Justification: breaks the whole system >> >> [...] >> >> It seems to me that commit 7838fbe25a95ce2cd6e8ae27a76d369365da89d4 >> (kernel ChangeLog-4.8.12) was meant to deal with this problem, but it >> doesn't work for me. >> >> I will add a sample syslog for this issue. > > That includes an RCU error before the OOM. Was that the first message > relating to RCU? No. LOGHOST:/var/log # grep -i 'LXCONTAINER.*RCU' syslog* syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] INFO: rcu_sched detected stalls on CPUs/tasks: syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] All QSes seen, last rcu_sched kthread activity 5260 (9621692-9616432), jiffies_till_next_fqs=1, root ->qsmask 0x0 syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790 syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] rcu_sched kthread starved for 5260 jiffies! g1427280 c1427279 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] rcu_sched R running 0 7 2 0x00000000 syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0 syslog:Jan 15 02:03:03 LXCONTAINER kernel: [38786.739883] [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100 syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] INFO: rcu_sched detected stalls on CPUs/tasks: syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] All QSes seen, last rcu_sched kthread activity 5253 (9792768-9787515), jiffies_till_next_fqs=1, root ->qsmask 0x0 syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790 syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] rcu_sched kthread starved for 5253 jiffies! g1435292 c1435291 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] rcu_sched R running 0 7 2 0x00000000 syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0 syslog:Jan 15 02:14:13 LXCONTAINER kernel: [39471.046708] [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100 syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] INFO: rcu_sched detected stalls on CPUs/tasks: syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] All QSes seen, last rcu_sched kthread activity 5254 (9890282-9885028), jiffies_till_next_fqs=1, root ->qsmask 0x0 syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790 syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] rcu_sched kthread starved for 5254 jiffies! g1443629 c1443628 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] rcu_sched R running 0 7 2 0x00000000 syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0 syslog:Jan 15 02:21:12 LXCONTAINER kernel: [39861.120914] [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100 LOGHOST:/var/log # grep -i 'HOSTNAME.*RCU' syslog* syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] INFO: rcu_sched detected stalls on CPUs/tasks: syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] All QSes seen, last rcu_sched kthread activity 5258 (9945547-9940289), jiffies_till_next_fqs=1, root ->qsmask 0x0 syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790 syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] rcu_sched kthread starved for 5258 jiffies! g1445441 c1445440 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] rcu_sched R running 0 7 2 0x00000000 syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0 syslog:Jan 15 02:24:30 HOSTNAME kernel: [40082.167882] [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100 syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] INFO: rcu_sched detected stalls on CPUs/tasks: syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] All QSes seen, last rcu_sched kthread activity 5254 (14628655-14623401), jiffies_till_next_fqs=1, root ->qsmask 0x0 syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] [<cb0d0653>] ? rcu_check_callbacks+0x773/0x790 syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] rcu_sched kthread starved for 5254 jiffies! g1582915 c1582914 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] rcu_sched R running 0 7 2 0x00000000 syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] [<cb0cedc2>] ? rcu_gp_kthread+0x4a2/0x7d0 syslog:Jan 15 07:36:36 HOSTNAME kernel: [58814.608747] [<cb0ce920>] ? rcu_note_context_switch+0x100/0x100 LOGHOST:/var/log # Oldest syslog message in the archive is from January 8th, 00:00:00 local time, so no RCU related messages when running 3.16.x. HOSTNAME:~# last | grep boot reboot system boot 4.9.0-1-686-pae Sun Jan 15 11:42 still running reboot system boot 4.8.0-2-686-pae Sun Jan 15 10:16 - 11:40 (01:23) reboot system boot 3.16.0-4-686-pae Sun Jan 15 08:44 - 10:15 (01:31) reboot system boot 4.8.0-2-686-pae Sat Jan 14 15:16 - 10:15 (18:59) reboot system boot 3.16.0-4-686-pae Fri Jan 6 23:58 - 15:15 (7+15:17) HOSTNAME:~# FWIW RCU related messages are logged by the LXC container running on HOSTNAME first. > Linux 4.9 is said to have better OOM behaviour and is now available in > unstable. Can you test whether it works better on this system? We will see: HOSTNAME:~# uname -a ; uptime Linux kyrrdis 4.9.0-1-686-pae #1 SMP Debian 4.9.2-2 (2017-01-12) i686 GNU/Linux 22:35:41 up 10:53, 2 users, load average: 0.28, 0.11, 0.04 HOSTNAME:~# Florian -- :wq