You have been subscribed to a public bug:

Description:
------------
Call traces are dumped continuously on the console after 10+ hours of a
regression run of the Leaf IO and SMT tests with the SMT fix (140718)
applied, and the shell prompt cannot be reached.

Steps to re-create:
-------------------
> cap installed with the latest Ubuntu 16.04.1 kernel, 4.4.0-38-generic.

> Applied the SMT kernel patch for issue 140718 on system cap

root@cap:~# ls -l
total 56572
-rw-r--r-- 1 root root 18838772 Sep 22 12:24 linux-image-4.4.0-21-generic_4.4.0-21.37+smt_ppc64el.deb
-rw-r--r-- 1 root root 39081588 Sep 22 12:24 linux-image-extra-4.4.0-21-generic_4.4.0-21.37+smt_ppc64el.deb
-rw------- 1 root root       70 Sep 21 04:24 nohup.out

> Booted with the above kernel

root@cap:~# uname -a
Linux cap 4.4.0-21-generic #37+smt SMP Mon Aug 29 15:07:28 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
root@cap:~# uname -r
4.4.0-21-generic

> Enabled sysrq and xmon before starting the tests

root@cap:~# cat /proc/sys/kernel/sysrq
1
root@cap:~# cat /proc/cmdline
root=UUID=4114e1ef-5e30-45ae-a5fb-a5429946434c ro xmon=on splash quiet crashkernel=384M-:128M
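
The two checks above can be scripted before a long run. This is only a
minimal sketch: the function names cmdline_has and precheck are
hypothetical helpers, not part of any tool used in this bug, and the file
arguments exist only so the checks can be tried against saved copies of
the procfs files.

```shell
#!/bin/sh
# cmdline_has: succeed if the kernel command line string ($1) contains
# the exact whitespace-separated token ($2), e.g. "xmon=on".
cmdline_has() {
    for tok in $1; do
        [ "$tok" = "$2" ] && return 0
    done
    return 1
}

# precheck: report sysrq and xmon state. Defaults are the real procfs
# paths; both can be overridden (assumption: for offline testing only).
precheck() {
    sysrq_file=${1:-/proc/sys/kernel/sysrq}
    cmdline_file=${2:-/proc/cmdline}

    # 0 means sysrq is disabled; any other value enables some or all keys.
    sysrq=$(cat "$sysrq_file")
    if [ "$sysrq" != "0" ]; then
        echo "sysrq: enabled ($sysrq)"
    else
        echo "sysrq: disabled"
    fi

    if cmdline_has "$(cat "$cmdline_file")" "xmon=on"; then
        echo "xmon: on"
    else
        echo "xmon: not on cmdline"
    fi
}
```

Running precheck with no arguments on the test system reads the live
values shown above.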

root@cap:~/fix_140718# ppc64_cpu --smt
SMT=8

> Started the Leaf IO and SMT tests. After 10+ hours of running, the IPMI
> console dumps call traces continuously and the prompt cannot be reached.
> ssh to cap hangs; ping to cap works fine.

[ipjoga@kte ~]$ ssh root@cap


[ipjoga@kte ~]$ ping cap
PING cap.isst.aus.stglabs.ibm.com (10.33.17.16) 56(84) bytes of data.
64 bytes from cap.isst.aus.stglabs.ibm.com (10.33.17.16): icmp_seq=1 ttl=64 time=0.095 ms
64 bytes from cap.isst.aus.stglabs.ibm.com (10.33.17.16): icmp_seq=2 ttl=64 time=0.055 ms
^C
 
> Call traces are attached

> Memory on this system:

root@cap:/kte/tools/setup.d# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.0T        4.4G        1.0T         37M        9.4G        1.0T
Swap:           37G          0B         37G
root@cap:/kte/tools/setup.d# 

                                      
UBUNTU  BUILD:                         4.4.0-38-generic

SL Firmware Version :                  IBM-garrison-ibm-OP8_v1.10_2.17

The IO team thinks this is related to, and fixed by, commit 135e8c9250dd
("sched/core: Fix a race between try_to_wake_up() and a woken up task").
We built a kernel with that patch applied and asked Indira to restart
the tests.

The developer provided a fix for the above issue. Applied it and restarted
the Leaf IO and SMT tests on a kernel that has both the SMT fix and the
memory barrier fix.

root@cap:~# uname -r
4.4.0-38.58+ibm-smt1-generic

The run went fine for 60+ hours without any system hang.

Canonical, we believe this issue is fixed by:

commit 135e8c9250dd ("sched/core: Fix a race between try_to_wake_up()
and a woken up task")

which was marked for the -stable tree. Can you pull it into your kernel?
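
One rough way to check whether a given kernel git tree already carries
the change is to search its history for the commit subject, since a
backport normally gets a different hash from the upstream 135e8c9250dd.
This is a sketch only: has_fix is a hypothetical helper, and it assumes
the backport kept the upstream subject line unchanged.

```shell
#!/bin/sh
# Subject line of the upstream commit, as quoted in this bug.
FIX_SUBJECT='sched/core: Fix a race between try_to_wake_up() and a woken up task'

# has_fix: read commit subject lines on stdin and succeed if the
# fix's subject appears among them (fixed-string match, no regex).
has_fix() {
    grep -qF -- "$FIX_SUBJECT"
}

# Example use inside a kernel git tree:
#   git log --format=%s | has_fix && echo "fix present"
```

A match only shows that a commit with this subject is in the history; it
does not prove that the running kernel was built from that tree.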

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bugnameltc-146713 severity-critical 
targetmilestone-inin16041
-- 
ISST-LTE:pNV:cap: Call traces dumping continuously after 10+ hours of 
regression with Leaf IO and SMT tests with SMT fix
https://bugs.launchpad.net/bugs/1629872
You received this bug notification because you are a member of Kernel Packages, 
which is subscribed to linux in Ubuntu.
