Well, we do drive our storage subsystem to its limits from time to time - 
especially when we do parallel LPAR deployments for OpenStack environments.
But that's on a z13 and a DS8k - and so far we have never seen such issues in 
this environment.

Further investigation in Launchpad did not turn up any other reports
similar to this one, with SCSI / wbt (or wbt in general) on focal.

However, I found that there were wbt (more precisely blk-wbt) issues in the 
past with kernels > v4.10 and < v4.19 that in some cases led to CPU hard 
lockups on heavy writes (largely reported on NVMe drives).
But those bugs were only reported on bionic (and cosmic) - which matches the 
kernel range above - and got fixed quite some time ago.
The bionic (and cosmic) kernels were patched via backports of:
2887e41b910b - "blk-wbt: Avoid lock contention and thundering herd issue in 
wbt_wait"
38cfb5a45ee0 - "blk-wbt: improve waking of tasks"
I just double checked that the fixes from those tickets are (still) in, and 
they are.

Since this bug is the only place where we have heard about this problem, I 
agree that a general recommendation to turn WBT off would not be good - even 
when preferring stability over performance.
(I still suspect that it could be XIV-related, rather than a general block or 
SCSI layer issue...)

However, for now we may add a statement to the s390x section of the
release notes pointing to WBT and the udev rule for disabling it on
block devices, in case one hits such issues under high disk I/O stress.
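
For illustration only, here is a minimal sketch of what disabling WBT comes
down to at runtime. It is not the udev rule from this bug. It relies on the
kernel's documented sysfs attribute queue/wbt_lat_usec, where writing 0
disables writeback throttling for that device. The function name disable_wbt
and its sysfs_block parameter are my own and purely hypothetical, and the
setting does not survive a reboot (that is what the udev rule is for).

```python
from pathlib import Path


def disable_wbt(sysfs_block="/sys/block"):
    """Write 0 to queue/wbt_lat_usec for every block device that has it.

    Hypothetical helper, not taken from the bug report.
    Returns the sorted names of the devices that were changed.
    """
    changed = []
    for dev in sorted(Path(sysfs_block).iterdir()):
        knob = dev / "queue" / "wbt_lat_usec"
        if not knob.is_file():
            # Device (or kernel) does not expose the WBT latency attribute.
            continue
        try:
            # 0 turns writeback throttling off for this device.
            knob.write_text("0\n")
        except OSError:
            # Not permitted or rejected by the kernel: skip this device.
            continue
        changed.append(dev.name)
    return changed


if __name__ == "__main__":
    # Needs root on a real system; prints the devices that were switched.
    print(disable_wbt())
```

Taking the sysfs path as a parameter keeps the sketch testable against a
scratch directory instead of the live /sys tree.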

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1881109

Title:
  [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be
  triggered by scsi errors.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1881109/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
