Description:
------------
Call traces are dumped continuously after 10+ hours of a regression run with the Leaf IO and SMT tests (with the SMT fix for issue 140718 applied), and the prompt cannot be reached.
Steps to re-create:
------------------
> cap installed with the latest ubuntu160401 kernel, 4.4.0-38-generic.

> Applied the SMT kernel patch for issue 140718 on system cap:

root@cap:~# ls -l
total 56572
-rw-r--r-- 1 root root 18838772 Sep 22 12:24 linux-image-4.4.0-21-generic_4.4.0-21.37+smt_ppc64el.deb
-rw-r--r-- 1 root root 39081588 Sep 22 12:24 linux-image-extra-4.4.0-21-generic_4.4.0-21.37+smt_ppc64el.deb
-rw------- 1 root root       70 Sep 21 04:24 nohup.out

> Booted with the above kernel:

root@cap:~# uname -a
Linux cap 4.4.0-21-generic #37+smt SMP Mon Aug 29 15:07:28 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
root@cap:~# uname -r
4.4.0-21-generic

> Enabled sysrq and xmon before starting the tests:

root@cap:~# cat /proc/sys/kernel/sysrq
1
root@cap:~# cat /proc/cmdline
root=UUID=4114e1ef-5e30-45ae-a5fb-a5429946434c ro xmon=on splash quiet crashkernel=384M-:128M

> SMT mode:

root@cap:~/fix_140718# ppc64_cpu --smt
SMT=8

> Started the tests with Leaf IO and SMT. After 10+ hours of running, the IPMI console is continuously dumping call traces and the prompt cannot be reached. ssh to cap hangs, while ping to cap works fine:

[ipjoga@kte ~]$ ssh root@cap
[ipjoga@kte ~]$ ping cap
PING cap.isst.aus.stglabs.ibm.com (10.33.17.16) 56(84) bytes of data.
64 bytes from cap.isst.aus.stglabs.ibm.com (10.33.17.16): icmp_seq=1 ttl=64 time=0.095 ms
64 bytes from cap.isst.aus.stglabs.ibm.com (10.33.17.16): icmp_seq=2 ttl=64 time=0.055 ms
^C

> Attached call traces.

> Memory in this system:

root@cap:/kte/tools/setup.d# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.0T        4.4G        1.0T         37M        9.4G        1.0T
Swap:           37G          0B         37G
root@cap:/kte/tools/setup.d#

UBUNTU BUILD: 4.4.0-38-generic
SL Firmware Version: IBM-garrison-ibm-OP8_v1.10_2.17

The IO team thinks this is related to, and fixed by, commit 135e8c9250dd ("sched/core: Fix a race between try_to_wake_up() and a woken up task"). We built a kernel with that patch applied and asked Indira to restart the tests. The developer provided the fix for the above issue.
Applied it and restarted the Leaf IO and SMT tests on a kernel that has both the SMT fix and the memory barrier fix:

root@cap:~# uname -r
4.4.0-38.58+ibm-smt1-generic

The run went fine for more than 60 hours without any system hang.

Canonical, we believe this issue is fixed by:

commit 135e8c9250dd ("sched/core: Fix a race between try_to_wake_up() and a woken up task")

which was marked for the -stable tree. Can you pull it into your kernel?

** Affects: linux (Ubuntu)
     Importance: Undecided
       Assignee: Taco Screen team (taco-screen-team)
         Status: New

** Tags: architecture-ppc64le bugnameltc-146713 severity-critical targetmilestone-inin16041

--
ISST-LTE:pNV:cap: Call traces dumping continuously after 10+ hours of regression with Leaf IO and SMT tests with SMT fix
https://bugs.launchpad.net/bugs/1629872