On Thu, Apr 10, 2025 at 05:02:00PM +0900, Damien Le Moal wrote: > On 4/10/25 4:54 PM, Christoph Hellwig wrote: > > On Thu, Apr 10, 2025 at 01:33:11PM +0900, Damien Le Moal wrote: > >> When a dm-delay devie is being suspended, the .presuspend() operation is > > > > s/devie/device/ > > > >> first executed (delay_presuspend()) to immediately issue all the BIOs > >> present in the delayed list of the device and also sets the device > >> may_delay boolean to false. At the same time, any new BIO is issued to > >> the device will not be delayed and immediately issued with delay_bio() > >> returning DM_MAPIO_REMAPPED. This creates a situation where potentially > >> 2 different contexts may be issuing write BIOs to the same zone of a > >> zone device without respecting the issuing order from the user, that is, > >> a newly issued write BIO may be issued before other write BIOs for the > >> same target zone that are in the device delayed list. If such situation > >> occurs, write BIOs may be failed by the underlying zoned device due to > >> an unaligned write error. > >> > >> Prevent this situation from happening by always handling newly issued > >> write BIOs using the delayed list of BIOs, even when the device is being > >> suspended. This is done by forcing the use of the worker kthread for > >> zoned devices, and by modifying flush_worker_fn() to always flush all > >> delayed BIOs if the device may_delay boolean is false. > > > > Is that scenario specific to dm-delay? I think the same applies to > > any other target that supports passing on bios to zoned devices. > > Don't think so. The delayed BIO list is a dm-delay thing only. dm-linear, > dm-error or dm-crypt do not delay BIOs like dm-delay so we do not have BIOs > being issued in the pre-suspend context. > > > > >> spin_lock(&dc->delayed_bios_lock); > >> if (unlikely(!dc->may_delay)) { > >> - spin_unlock(&dc->delayed_bios_lock); > >> - return DM_MAPIO_REMAPPED; > >> + /* > >> + * Issue the BIO immediately if the device is not zoned. FOr a > >> + * zoned device, preserver the ordering of write operations by > >> + * using the delay list. > >> + */ > >> + if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) { > >> + spin_unlock(&dc->delayed_bios_lock); > >> + return DM_MAPIO_REMAPPED; > > > > Don't we only need to do that if anything is queued up due to a > > suspension? Then again having a different patch might be premature > > optimization, it's not like anyone cares about dm-delay performance > > almost by definition :) > > True. I can add a list_empty check here. Will send a v2 with that.
I may be pointing out the obvious, but we can't just check if dc->delayed_bios is empty, since we pull the bios off the list before we submit them. We can only skip adding the bios if the list is empty and flush_delay_bios() isn't currently processing any bios. -Ben > > > -- > Damien Le Moal > Western Digital Research