For a zoned dm-crypt device, the native support for zone append operations (REQ_OP_ZONE_APPEND) is not enabled and instead dm-crypt relies on the emulation done by the block layer using regular write operations. This still requires to properly return the written sector as the BIO sector of the original BIO. However, this can be done correctly only and only if there is a single clone BIO used for processing the original zone append operation issued by the user. If the size of a zone append operation is larger than dm-crypt max_write_size, then the orginal BIO will be split and processed as a chain of regular write operations. Such chaining result in an incorrect written sector being returned to the zone append issuer using the original BIO sector. This in turn results in file system data corruptions using xfs or btrfs.
Fix this by modifying crypt_io_hints() to cap the max_hw_sectors limit for the dm-crypt device to the max_write_size limit. As this limit also caps the device max_zone_append_sectors limit, this forces the caller to issue zone append operations smaller than this limit, thus moving the splitting of large append operations to the caller instead of the incorrect internal splitting. This change does not have any effect on the size of BIOs received by dm-crypt since the block layer does not automatically splits BIOs for BIO-based devices. So there is no impact on the performance of regular read and write operations. Fixes: f211268ed1f9 ("dm: Use the block layer zone append emulation") Cc: sta...@vger.kernel.org Signed-off-by: Damien Le Moal <dlem...@kernel.org> --- drivers/md/dm-crypt.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 9dfdb63220d7..7b1e88781a82 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -3730,6 +3730,18 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits) max_t(unsigned int, limits->physical_block_size, cc->sector_size); limits->io_min = max_t(unsigned int, limits->io_min, cc->sector_size); limits->dma_alignment = limits->logical_block_size - 1; + + /* + * For zoned devices, we cannot split write operations used to emulate + * zone append operations. And since all write requests are going to be + * split on get_max_request_size(cc, true) size, apply this limit to the + * maximum hardware I/O size so that we have a cap on the + * max_zone_append_sectors limit when the zone limits are validated by + * the block layer. + */ + if (ti->emulate_zone_append) + limits->max_hw_sectors = min(limits->max_hw_sectors, + get_max_request_size(cc, true)); } static struct target_type crypt_target = { -- 2.49.0