** Attachment added: "lspci-vnvn.log"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821738/+attachment/5249466/+files/lspci-vnvn.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821738

Title:
  A userspace process hangs in d-state forever in a virtual machine
  environment with a virtio-scsi disk

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Ubuntu 4.15.0-46.49-generic 4.15.18

  
  The process hangs because it is waiting for the completion of its request,
  and that completion never happens.

  The request hangs because of a race condition inside the block layer
  that affects long-running requests.

  Each request has a timer. When the timer fires, the timeout handler sets
  REQ_ATOM_COMPLETE and clears it again once the handler has finished.

  The request completion path checks REQ_ATOM_COMPLETE. If the flag is set,
  the completion returns without doing anything and is never executed again,
  on the assumption that the request has already been completed and needs no
  further attention.
   
  Thus, if the request completion starts executing while the timer handler is
  in progress, it sees that the complete flag is set and simply returns. The
  timer handler then clears the complete flag, and the request stays in the
  system forever: the timer handler runs again and again, and each time it
  just rearms the timer.
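
  The interleaving described above can be replayed deterministically in a
  small model. This is only an illustrative sketch, not kernel code: the
  Request class and the functions timeout_begin, timeout_end, complete,
  racy_scenario and normal_scenario are hypothetical names, and a plain
  boolean stands in for the REQ_ATOM_COMPLETE bit.

```python
class Request:
    """Toy model of a block request (hypothetical, not a kernel structure)."""

    def __init__(self):
        self.complete_flag = False  # stands in for REQ_ATOM_COMPLETE
        self.finished = False       # True once the waiter has been notified
        self.timer_rearms = 0       # how many times the timer was rearmed

    def test_and_set_complete(self):
        """Set the flag and return its previous value (models test-and-set)."""
        old = self.complete_flag
        self.complete_flag = True
        return old


def timeout_begin(req):
    """Timer fired: try to claim the request. True if the handler owns it."""
    return not req.test_and_set_complete()


def timeout_end(req):
    """Handler chose to keep waiting: rearm the timer and clear the flag."""
    req.timer_rearms += 1
    req.complete_flag = False


def complete(req):
    """Completion path: returns without doing anything if the flag is set."""
    if req.test_and_set_complete():
        return False  # completion dropped and never retried
    req.finished = True
    return True


def racy_scenario(extra_expirations=3):
    """Completion arrives while the timeout handler is still running."""
    req = Request()
    owned = timeout_begin(req)   # timer fires on a long-running request
    delivered = complete(req)    # device completion arrives mid-handler
    if owned:
        timeout_end(req)         # handler finishes and clears the flag
    for _ in range(extra_expirations):
        if timeout_begin(req):   # later expirations only rearm the timer
            timeout_end(req)
    return req, delivered


def normal_scenario():
    """Completion arrives before the timer fires."""
    req = Request()
    delivered = complete(req)
    return req, delivered
```

  In racy_scenario the completion is dropped, the request is never marked
  finished, and every later timer expiration only increments the rearm
  counter. In normal_scenario the completion is delivered on the first
  attempt.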

  This happens with long-running requests only. By default, the request
  timeout is 30 seconds, so the race requires a request whose execution time
  exceeds 30 seconds. This is rare for local storage hardware but may occur
  more often when the storage is accessed over a network.
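
  To see which timeout applies to a given disk, the value can be read from
  sysfs. The sketch below assumes the disk is exposed as a SCSI device with
  a device/timeout attribute under /sys/block; the function name
  read_scsi_timeout and the device name "sda" in the usage note are
  illustrative assumptions, not taken from this report.

```python
from pathlib import Path


def read_scsi_timeout(device, sysfs_root="/sys"):
    """Return the command timeout in seconds for a SCSI disk, or None.

    Reads <sysfs_root>/block/<device>/device/timeout. None is returned when
    the attribute is missing or does not contain an integer.
    """
    path = Path(sysfs_root) / "block" / device / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

  For example, read_scsi_timeout("sda") returns the configured timeout for
  sda if that device exists, and None otherwise.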

  The behavior described affects mainline 4.13, 4.14 and 4.15 kernels, as
  well as systems based on the rh7-3.10.0-957.5.1.el7 kernel.
   
  Before 4.13 the timer didn't rearm itself and just aborted the request. The
  patch that rearms the timer was introduced in 4.13: e72c9a2a67a6400c
  "scsi: virtio_scsi: let host do exception handling"

  After 4.15 the block layer switched to the multi-queue (MQ) scheme, which
  isn't prone to this kind of race. In recent kernels (>= 5.0) only the MQ
  scheme remains, and the legacy race-prone block layer code has been
  removed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821738/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
