Patches posted to kernel-team mailing list [1].

[1] https://lists.ubuntu.com/archives/kernel-team/2018-October/096072.html
    [SRU Xenial][PATCH 0/2] Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)
** Description changed:

  [Impact]

-  * Detaching virtio-scsi disk in Xenial guest can cause
-    CPU soft lockup in guest (and take 100% CPU in host).
+  * Detaching virtio-scsi disk in Xenial guest can cause
+    CPU soft lockup in guest (and take 100% CPU in host).

-  * It may prevent further progress on other tasks that
-    depend on resources locked earlier in the SCSI target
-    removal stack, and/or impact other SCSI functionality.
+  * It may prevent further progress on other tasks that
+    depend on resources locked earlier in the SCSI target
+    removal stack, and/or impact other SCSI functionality.

-  * The fix resolves a corner case in the requests counter
-    in the virtio SCSI target, which impacts a downstream
-    (SAUCE) patch in the virtio-scsi target removal handler
-    that depends on the requests counter.
+  * The fix resolves a corner case in the requests counter
+    in the virtio SCSI target, which impacts a downstream
+    (SAUCE) patch in the virtio-scsi target removal handler
+    that depends on the requests counter value to be zero.

  [Test Case]

-  * See LP #1798110 (this bug)'s comment #3 (too long for
-    this section -- synthetic case with GDB+QEMU) and
-    comment #4 (organic test case in cloud instance).
+  * See LP #1798110 (this bug)'s comment #3 (too long for
+    this section -- synthetic case with GDB+QEMU) and
+    comment #4 (organic test case in cloud instance).

  [Regression Potential]

-  * It seem low -- this only affects the SCSI command requeue
-    path with regards to the reference counter, which is only
-    used with real chance of problems in our downstream patch
-    (which is now passing this testcase).
+  * It seem low -- this only affects the SCSI command requeue
+    path with regards to the reference counter, which is only
+    used with real chance of problems in our downstream patch
+    (which is now passing this testcase).

-  * The other less serious issue would be decrementing it to
-    a negative / < 0 value, which is not possible with this
-    driver logic (see commit message), because the reqs counter
-    is always incremented before calling virtscsi_queuecommand(),
-    where this decrement operation is inserted.
+  * The other less serious issue would be decrementing it to
+    a negative / < 0 value, which is not possible with this
+    driver logic (see commit message), because the reqs counter
+    is always incremented before calling virtscsi_queuecommand(),
+    where this decrement operation is inserted.

  [Original Description]

  A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
  when detaching a virtio-scsi drive, and provided a crashdump that shows
  2 things:

  1) The soft locked up CPU is waiting for another CPU to finish
  something, and that does not happen because the other CPU is infinitely
  looping in virtscsi_target_destroy().

  2) The loop happens because the 'tgt->reqs' counter is non-zero, and
  that probably happened due to a missing decrement in SCSI command
  requeue path, exercised when the virtio ring is full.

  The reported problem itself happens because of a downstream/SAUCE
  patch, coupled with the problem of the missing decrement for the reqs
  counter.

  Introducing a decrement in the SCSI command requeue path resolves the
  problem, verified synthetically with QEMU+GDB and with test-case/loop
  provided by the customer as problem reproducer.
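For context, here is a rough sketch of the code paths the [Regression Potential]
section above refers to. This is illustrative only, not the literal SRU patch: the
names (virtscsi_queuecommand(), virtscsi_kick_cmd(), tgt->reqs,
SCSI_MLQUEUE_HOST_BUSY) follow the upstream drivers/scsi/virtio_scsi.c driver, but
the code is simplified and the exact placement of the decrement is inferred from
the description above.

    /* Sketch (simplified, not the literal patch).
     *
     * 1) Every submission path increments the per-target counter before
     *    calling virtscsi_queuecommand():
     */
            atomic_inc(&tgt->reqs);
            return virtscsi_queuecommand(vscsi, req_vq, sc);

    /*
     * 2) Inside virtscsi_queuecommand(), if adding the request to the
     *    virtqueue fails (virtio ring full), the command is handed back
     *    to the SCSI midlayer for requeueing.  The fix pairs this path
     *    with a decrement so the counter can drop back to zero:
     */
            if (virtscsi_kick_cmd(req_vq, cmd, req_size, resp_size) != 0) {
                    atomic_dec(&tgt->reqs);    /* the decrement being added */
                    return SCSI_MLQUEUE_HOST_BUSY;
            }
            return 0;

With the decrement in place, every increment on the submission side is balanced by
either a completion or a requeue, so a requeued command can no longer leave the
counter stuck above zero.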
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1798110

Title:
  xenial: virtio-scsi: CPU soft lockup due to loop in
  virtscsi_target_destroy()

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]

   * Detaching a virtio-scsi disk in a Xenial guest can cause a CPU
     soft lockup in the guest (and take 100% CPU in the host).

   * It may prevent further progress on other tasks that depend on
     resources locked earlier in the SCSI target removal stack, and/or
     impact other SCSI functionality.

   * The fix resolves a corner case in the requests counter of the
     virtio SCSI target, which impacts a downstream (SAUCE) patch in
     the virtio-scsi target removal handler that depends on the
     requests counter value reaching zero.

  [Test Case]

   * See comment #3 of LP #1798110 (this bug) for a synthetic test case
     with GDB+QEMU (too long for this section), and comment #4 for an
     organic test case in a cloud instance.

  [Regression Potential]

   * It seems low -- this only affects the SCSI command requeue path
     with regard to the reference counter, which only has a real chance
     of causing problems in our downstream patch (and that patch now
     passes this test case).

   * The other, less serious issue would be decrementing the counter to
     a negative (< 0) value, which is not possible with this driver
     logic (see the commit message), because the reqs counter is always
     incremented before calling virtscsi_queuecommand(), where this
     decrement operation is inserted.

  [Original Description]

  A customer reported a CPU soft lockup on the Trusty HWE kernel from
  Xenial when detaching a virtio-scsi drive, and provided a crashdump
  that shows two things:

  1) The soft-locked-up CPU is waiting for another CPU to finish
     something, which never happens because that other CPU is looping
     forever in virtscsi_target_destroy().

  2) The loop happens because the 'tgt->reqs' counter is non-zero,
     which probably resulted from a missing decrement in the SCSI
     command requeue path, exercised when the virtio ring is full.

  The reported problem itself happens because of a downstream/SAUCE
  patch, coupled with the missing decrement for the reqs counter.

  Introducing a decrement in the SCSI command requeue path resolves the
  problem, verified synthetically with QEMU+GDB and with the test
  case/loop provided by the customer as a reproducer.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1798110/+subscriptions
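As a rough illustration of why a stuck counter turns into a soft lockup: the
downstream (SAUCE) patch mentioned above makes the target removal handler wait for
outstanding requests to drain. The sketch below shows the general shape assumed
from this bug's description and the upstream driver; it is not the literal Ubuntu
patch.

    /* Sketch of the target removal handler with a SAUCE-style wait
     * (assumed shape, not the literal Ubuntu patch). */
    static void virtscsi_target_destroy(struct scsi_target *starget)
    {
            struct virtio_scsi_target_state *tgt = starget->hostdata;

            /* Wait for outstanding requests before freeing the target
             * state.  If a requeued command left tgt->reqs non-zero,
             * this loop never exits and the CPU spins here -- the soft
             * lockup reported above. */
            while (atomic_read(&tgt->reqs))
                    cpu_relax();

            kfree(tgt);
    }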