[Kernel-packages] [Bug 1777736] Re: hisi_sas_v3_hw: internal task abort: timeout and not done.

Launchpad Bug Tracker Wed, 25 Jul 2018 22:27:24 -0700

This bug was fixed in the package linux - 4.15.0-29.31

---------------
linux (4.15.0-29.31) bionic; urgency=medium


  * linux: 4.15.0-29.31 -proposed tracker (LP: #1782173)

  * [SRU Bionic][Cosmic] kernel panic in ipmi_ssif at msg_done_handler
    (LP: #1777716)
    - ipmi_ssif: Fix kernel panic at msg_done_handler

  * Update to ocxl driver for 18.04.1 (LP: #1775786)
    - misc: ocxl: use put_device() instead of device_unregister()
    - powerpc: Add TIDR CPU feature for POWER9
    - powerpc: Use TIDR CPU feature to control TIDR allocation
    - powerpc: use task_pid_nr() for TID allocation
    - ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
    - ocxl: Expose the thread_id needed for wait on POWER9
    - ocxl: Add an IOCTL so userspace knows what OCXL features are available
    - ocxl: Document new OCXL IOCTLs
    - ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()

  * Critical upstream bugfix missing in Ubuntu 18.04 - frequent Xorg crash after
    suspend (LP: #1776887)
    - ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL

  * Hard LOCKUP observed on stressing Ubuntu 18 04 (LP: #1777194)
    - powerpc: use NMI IPI for smp_send_stop
    - powerpc: Fix smp_send_stop NMI IPI handling

  * IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
    CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n (LP: #1773964)
    - rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

  * [Regression] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383:
    comm stress-ng: bg 4705: bad block bitmap checksum (LP: #1781709)
    - SAUCE: Revert "UBUNTU: SAUCE: ext4: fix ext4_validate_inode_bitmap: comm
      stress-ng: Corrupt inode bitmap"
    - SAUCE: ext4: check for allocation block validity with block group locked

linux (4.15.0-28.30) bionic; urgency=medium

  * linux: 4.15.0-28.30 -proposed tracker (LP: #1781433)

  * Cannot set MTU higher than 1500 in Xen instance (LP: #1781413)
    - xen-netfront: Fix mismatched rtnl_unlock
    - xen-netfront: Update features after registering netdev

linux (4.15.0-27.29) bionic; urgency=medium

  * linux: 4.15.0-27.29 -proposed tracker (LP: #1781062)

  * [Regression] EXT4-fs error (device sda1): ext4_validate_inode_bitmap:99:
    comm stress-ng: Corrupt inode bitmap (LP: #1780137)
    - SAUCE: ext4: fix ext4_validate_inode_bitmap: comm stress-ng: Corrupt inode
      bitmap

linux (4.15.0-26.28) bionic; urgency=medium

  * linux: 4.15.0-26.28 -proposed tracker (LP: #1780112)

  * failure to boot with linux-image-4.15.0-24-generic (LP: #1779827) // Cloud-
    init causes potentially huge boot delays with 4.15 kernels (LP: #1780062)
    - random: Make getrandom() ready earlier

linux (4.15.0-25.27) bionic; urgency=medium

  * linux: 4.15.0-25.27 -proposed tracker (LP: #1779354)

  * hisi_sas_v3_hw: internal task abort: timeout and not done. (LP: #1777736)
    - scsi: hisi_sas: Update a couple of register settings for v3 hw

  * hisi_sas: Add missing PHY spinlock init (LP: #1777734)
    - scsi: hisi_sas: Add missing PHY spinlock init

  * hisi_sas: improve read performance by pre-allocating slot DMA buffers
    (LP: #1777727)
    - scsi: hisi_sas: use dma_zalloc_coherent()
    - scsi: hisi_sas: Use dmam_alloc_coherent()
    - scsi: hisi_sas: Pre-allocate slot DMA buffers

  * hisi_sas: Failures during host reset (LP: #1777696)
    - scsi: hisi_sas: Only process broadcast change in phy_bcast_v3_hw()
    - scsi: hisi_sas: Fix the conflict between dev gone and host reset
    - scsi: hisi_sas: Adjust task reject period during host reset
    - scsi: hisi_sas: Add a flag to filter PHY events during reset
    - scsi: hisi_sas: Release all remaining resources in clear nexus ha

  * Fake SAS addresses for SATA disks on HiSilicon D05 are non-unique
    (LP: #1776750)
    - scsi: hisi_sas: make SAS address of SATA disks unique

  * Vcs-Git header on bionic linux source package points to zesty git tree
    (LP: #1766055)
    - [Packaging]: Update Vcs-Git

  * large KVM instances run out of IRQ routes (LP: #1778261)
    - SAUCE: kvm -- increase KVM_MAX_IRQ_ROUTES to 2048 on x86

 -- Stefan Bader <[email protected]>  Tue, 17 Jul 2018 10:57:50 +0200

** Changed in: linux (Ubuntu)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1777736

Title:
  hisi_sas_v3_hw: internal task abort: timeout and not done.

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  [Impact]
  On deployments with lots of disks, timeouts can occur that escalate into
  nexus resets. This can cause disk devices to disappear from the system,
  possibly requiring a reboot to recover:

  [18324.951189] cq: iptt:892, task:ffff8026fbde5000, cmp_st:3, err_rcrd_xfrd:1,rspns_xfrd:0,error_phase:6,devid:16,io_cfg_err_code:0,err_code:0, ft = 0x0, ata_st=0x0, tgt_io_st=0x0,disk_err=0x0
  [18324.951190] sb dw0:0x8001,dw1:0x0,dw2:0x0,dw3:0x0
  [18324.951191] cmd table: 0x0,0x0,0x0,0x0,0x0
  [18324.951192] itct: 0x12fa0345,0x5000cca25d31dac1,0x1000000001388,0x0,0x0,0x0,0x0,0x0,0x0,0x0
  [18324.951334] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff8026fbde5000) ignored

  [18325.039774] sb dw0:0x8001,dw1:0x0,dw2:0x0,dw3:0x0
  [18325.044467] cmd table: 0x0,0x0,0x0,0x0,0x0
  [18325.048553] itct: 0x12fa0345,0x5000c50094c65c55,0x1000000001388,0x0,0x0,0x0,0x0,0x0,0x0,0x0
  [18325.057058] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff8027dc8e7500) ignored

  [18326.951312] cq: iptt:1705, task:ffff8027820d0200, cmp_st:3, err_rcrd_xfrd:1,rspns_xfrd:0,error_phase:6,devid:18,io_cfg_err_code:0,err_code:0, ft = 0x0, ata_st=0x0, tgt_io_st=0x0,disk_err=0x0
  [18326.968247] sb dw0:0x8001,dw1:0x0,dw2:0x0,dw3:0x0
  [18326.972938] cmd table: 0x0,0x0,0x0,0x0,0x0
  [18326.977023] itct: 0x12fa0345,0x5000cca0803e9c1d,0x1000000001388,0x0,0x0,0x0,0x0,0x0,0x0,0x0
  [18326.985496] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff8027820d0200) ignored

  [18329.384695] hisi_sas_v3_hw 0000:74:02.0: internal task abort: timeout and not done.
  [18329.392344] hisi_sas_v3_hw 0000:74:02.0: start dump all regs,reason:abort timeout!
  [18329.399904] ***************DUMP IS DISABLED***************
  [18329.405467] dump reg fail.
  [18329.408162] hisi_sas_v3_hw 0000:74:02.0: I_T nexus reset: internal abort (-5)
  [18329.936017] cq: iptt:649, task:ffff8027981f8500, cmp_st:3, err_rcrd_xfrd:1,rspns_xfrd:0,error_phase:6,devid:19,io_cfg_err_code:0,err_code:0, ft = 0x0, ata_st=0x0, tgt_io_st=0x0,disk_err=0x0
  [18329.936154] cq: iptt:1091, task:ffff8026ff666d00, cmp_st:3, err_rcrd_xfrd:1,rspns_xfrd:0,error_phase:6,devid:49,io_cfg_err_code:0,err_code:0, ft = 0x0, ata_st=0x0, tgt_io_st=0x0,disk_err=0x0
  [18329.936155] sb dw0:0x8001,dw1:0x0,dw2:0x0,dw3:0x0
  [18329.936156] cmd table: 0x0,0x0,0x0,0x0,0x0
  [18329.936158] itct: 0x12fa0345,0x5000cca2552b2855,0x1000000001388,0x0,0x0,0x0,0x0,0x0,0x0,0x0
  [18329.936301] hisi_sas_v3_hw 0000:74:02.0: slot complete: task(ffff8026ff666d00) ignored
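
  To check whether a system is hitting this failure, the messages above can be
  counted in saved kernel log text. The following is only a minimal sketch: the
  patterns are copied from this excerpt, the function name `summarize` and the
  result keys are hypothetical, and other kernel versions may word these
  messages differently.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt in this report (assumption: the
# driver prints the same wording on the kernel being examined).
ABORT_RE = re.compile(
    r"hisi_sas_v3_hw (\S+): internal task abort: timeout and not done")
RESET_RE = re.compile(
    r"hisi_sas_v3_hw (\S+): I_T nexus reset: internal abort")
# [^\[]*? keeps the match inside one log record (stops at the next timestamp).
CQ_RE = re.compile(r"cq: iptt:(\d+), task:([0-9a-f]+),[^\[]*?devid:(\d+)")


def summarize(log_text):
    """Count abort timeouts, nexus resets, and cq error records.

    Whitespace is collapsed first so that records which were hard-wrapped
    (as in an email) still match.
    """
    text = " ".join(log_text.split())
    return {
        # keyed by PCI address of the controller, e.g. "0000:74:02.0"
        "aborts": Counter(m.group(1) for m in ABORT_RE.finditer(text)),
        "resets": Counter(m.group(1) for m in RESET_RE.finditer(text)),
        # keyed by the devid field of each cq record
        "cq_errors_by_devid": Counter(
            m.group(3) for m in CQ_RE.finditer(text)),
    }


if __name__ == "__main__":
    import sys
    # Example: dmesg | python3 this_script.py
    print(summarize(sys.stdin.read()))
```

  A non-zero "resets" count for a controller suggests that timeouts have
  escalated to I_T nexus resets as described above; it does not by itself show
  that a disk has disappeared.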

  [Test Case]
  This was seen on a system with 100s of disks, something I don't have access
  to, so verification testing will be regression-only.

  [Fix]
  A fix queued in the SCSI maintainer's tree adjusts some magic registers in
  the controller, and that somehow fixes the problem (I don't have programming
  docs for this controller, so I can only hand-wave here).

  [Regression Risk]
  The fix is localized to the hisi_sas_v3_hw driver, which is only used in
  Ubuntu for the D06 platform.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777736/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
