For a zoned dm-crypt device, native support for zone append operations
(REQ_OP_ZONE_APPEND) is not enabled; instead, dm-crypt relies on the
emulation done by the block layer using regular write operations. This
emulation still requires the written sector to be returned as the BIO
sector of the original BIO. However, this can be done correctly if and
only if a single clone BIO is used to process the original zone append
operation issued by the user. If the size of a zone append operation is
larger than dm-crypt's max_write_size, then the original BIO will be
split and processed as a chain of regular write operations. Such
chaining results in an incorrect written sector being returned to the
zone append issuer through the original BIO sector, which in turn
results in file system data corruption with xfs or btrfs.

Fix this by modifying crypt_io_hints() to cap the max_hw_sectors limit
for the dm-crypt device to the max_write_size limit. As this limit also
caps the device max_zone_append_sectors limit, this forces the caller to
issue zone append operations smaller than this limit, thus moving the
splitting of large append operations to the caller instead of the
incorrect internal splitting.

This change does not have any effect on the size of BIOs received by
dm-crypt, since the block layer does not automatically split BIOs for
BIO-based devices. So there is no impact on the performance of regular
read and write operations.

Fixes: f211268ed1f9 ("dm: Use the block layer zone append emulation")
Cc: sta...@vger.kernel.org
Signed-off-by: Damien Le Moal <dlem...@kernel.org>
---
 drivers/md/dm-crypt.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 9dfdb63220d7..7b1e88781a82 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3730,6 +3730,18 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
                max_t(unsigned int, limits->physical_block_size, cc->sector_size);
        limits->io_min = max_t(unsigned int, limits->io_min, cc->sector_size);
        limits->dma_alignment = limits->logical_block_size - 1;
+
+       /*
+        * For zoned devices, we cannot split write operations used to emulate
+        * zone append operations. And since all write requests are going to be
+        * split on get_max_request_size(cc, true) size, apply this limit to the
+        * maximum hardware I/O size so that we have a cap on the
+        * max_zone_append_sectors limit when the zone limits are validated by
+        * the block layer.
+        */
+       if (ti->emulate_zone_append)
+               limits->max_hw_sectors = min(limits->max_hw_sectors,
+                                            get_max_request_size(cc, true));
 }
 
 static struct target_type crypt_target = {
-- 
2.49.0

