** Attachment added: "lspci-vnvn.log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821738/+attachment/5249466/+files/lspci-vnvn.log
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821738

Title:
  A userspace process hangs in d-state forever in a virtual machine
  environment with a virtio-scsi disk

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Ubuntu 4.15.0-46.49-generic 4.15.18

  The process hangs because it is waiting for the completion of its
  request, which never happens. The request hangs because of a race
  condition inside the block layer, between the request timeout handler
  and the request completion path.

  Each request has a timer. When the timer fires, the timeout handler
  sets REQ_ATOM_COMPLETE and clears it again when it finishes. The
  request completion path checks REQ_ATOM_COMPLETE; if the flag is set,
  it returns without doing anything and never runs again, on the
  assumption that the request has already been completed and needs no
  further attention.

  Thus, if the request completion starts executing while the timeout
  handler is in progress, it sees the flag set and simply returns. The
  timeout handler then clears the flag, and the request stays in the
  system forever: the timeout handler runs again and again, and each
  time it only rearms the timer.

  This happens with long-running requests only. By default, the request
  timeout is 30 seconds, so there has to be a request whose execution
  time exceeds 30 seconds. This is rare for local hardware storage but
  may occur more often when the storage is accessed over a network.

  The behavior described affects mainline 4.13, 4.14, and 4.15 kernels,
  and systems based on the rh7-3.10.0-957.5.1.el7 kernel.

  Before 4.13, the timer did not rearm itself and just aborted the
  request. The patch that rearms the timer was introduced in 4.13:
  e72c9a2a67a6400c "scsi: virtio_scsi: let host do exception handling"

  After 4.15, the block layer switched to the MQ scheme, which is not
  prone to this kind of race.
  In recent kernels (>= 5.0), only the MQ scheme remains; the legacy,
  race-prone block layer code has been removed.
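
The interleaving described in the bug report can be replayed with a small single-threaded model. This is only a sketch: `RequestModel`, `mark_complete`, `timeout_begin`, `timeout_end_rearm`, `complete_request`, `replay_race`, and `replay_normal` are hypothetical names made up for illustration, not kernel functions, and the model assumes the flag behaves like a simple test-and-set bit as the description suggests.

```python
from dataclasses import dataclass


@dataclass
class RequestModel:
    """Hypothetical, simplified stand-in for a legacy block-layer request."""
    complete_flag: bool = False  # models REQ_ATOM_COMPLETE
    finished: bool = False       # True once the completion path really ran
    timer_rearms: int = 0        # times the timeout handler rearmed the timer


def mark_complete(rq):
    """Model a test-and-set of the flag; return the previous value."""
    was_set = rq.complete_flag
    rq.complete_flag = True
    return was_set


def timeout_begin(rq):
    """Timer fires: claim the request. Returns False if already claimed."""
    return not mark_complete(rq)


def timeout_end_rearm(rq):
    """Timeout handler finishes: rearm the timer and clear the flag."""
    rq.timer_rearms += 1
    rq.complete_flag = False


def complete_request(rq):
    """Completion path: returns without doing anything if the flag is set."""
    if mark_complete(rq):
        return False
    rq.finished = True
    return True


def replay_race(timeouts=3):
    """Replay the bad ordering: completion arrives during the first timeout."""
    rq = RequestModel()
    for i in range(timeouts):
        if not timeout_begin(rq):
            break
        if i == 0:
            complete_request(rq)  # sees the flag set, so it is dropped
        timeout_end_rearm(rq)     # flag cleared, timer rearmed
    return rq


def replay_normal():
    """Replay the harmless ordering: completion runs before the timer fires."""
    rq = RequestModel()
    complete_request(rq)
    timer_claimed = timeout_begin(rq)
    return rq, timer_claimed


if __name__ == "__main__":
    stuck = replay_race()
    print("finished:", stuck.finished, "rearms:", stuck.timer_rearms)
```

In the racy ordering the request is never marked finished while the rearm counter keeps growing, which corresponds to the process sitting in D state; in the normal ordering the completion wins and the later timeout finds the flag already set.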