*** This bug is a duplicate of bug 1827755 ***
    https://bugs.launchpad.net/bugs/1827755

** This bug has been marked a duplicate of bug 1827755
   nx842 - CRB request time out (-110) when uninstall NX modules and initiate NX request

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1831536

Title:
  crash with "Data Access Out of Range" when using nx-842 zswap on
  POWER9

Status in The Ubuntu-power-systems project:
  New
Status in linux package in Ubuntu:
  Incomplete

Bug description:
  On my two-socket POWER9 system (powernv) with 842 zswap set up, I
  recently got a crash with the Ubuntu kernel (I haven't tried with
  upstream, and this is the first time the system has died like this, so
  I'm not sure how repeatable it is).

  [    2.891463] zswap: loaded using pool 842-nx/zbud
  ...
  [15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 5000000 us, giving up : 00 00 00 00 00000000
  [16868.932913] Unable to handle kernel paging request for data at address 0x6655f67da816cdb8
  [16868.933726] Faulting instruction address: 0xc000000000391600

  
  cpu 0x68: Vector: 380 (Data Access Out of Range) at [c000001c9d98b9a0]
      pc: c000000000391600: kmem_cache_alloc+0x2e0/0x340
      lr: c0000000003915ec: kmem_cache_alloc+0x2cc/0x340
      sp: c000001c9d98bc20
     msr: 900000000280b033
     dar: 6655f67da816cdb8
    current = 0xc000001ad43cb400
    paca    = 0xc00000000fac7800   softe: 0        irq_happened: 0x01
      pid   = 8319, comm = make
  Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 (Ubuntu 4.15.0-50.54-generic 4.15.18)

  68:mon> t
  [c000001c9d98bc20] c0000000003914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable)
  [c000001c9d98bc80] c0000000003b1e14 __khugepaged_enter+0x54/0x220
  [c000001c9d98bcc0] c00000000010f0ec copy_process.isra.5.part.6+0xebc/0x1a10
  [c000001c9d98bda0] c00000000010fe4c _do_fork+0xec/0x510
  [c000001c9d98be30] c00000000000b584 ppc_clone+0x8/0xc
  --- Exception: c00 (System Call) at 00007afe9daf87f4
  SP (7fffca606880) is in userspace

  So, it looks like there could be a problem in the error path, plausibly
  fixed by this patch:

  commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
  Author: Haren Myneni <ha...@linux.vnet.ibm.com>
  Date:   Wed Jun 13 00:32:40 2018 -0700

      crypto/nx: Initialize 842 high and normal RxFIFO control registers
      
      NX increments readOffset by FIFO size in receive FIFO control register
      when CRB is read. But the index in RxFIFO has to match with the
      corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
      may be processing incorrect CRBs and can cause CRB timeout.
      
      VAS FIFO offset is 0 when the receive window is opened during
      initialization. When the module is reloaded or in kexec boot, readOffset
      in FIFO control register may not match with VAS entry. This patch adds
      nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
      control register for both high and normal FIFOs.
      
      Signed-off-by: Haren Myneni <ha...@us.ibm.com>
      [mpe: Fixup uninitialized variable warning]
      Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

  $ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
  v4.19-rc1~24^2~50

  
  This commit was never backported to any stable release, so it probably
  needs to be backported to v4.14 through v4.18. Notably, Ubuntu is on
  v4.15 and does not seem to have picked up the patch.

  Reported to upstream (and there may be further discussion) over at
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/191438.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1831536/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
