On Mon, Dec 08, 2025 at 06:07:14PM +0800, Yongpeng Yang wrote:
> From: Yongpeng Yang <[email protected]>
> 
> Currently, the max_hw_discard_sectors of a stripe target is set to the
> minimum max_hw_discard_sectors among all sub devices. When the discard
> bio is larger than max_hw_discard_sectors, the stripe device may split
> the discard bio unnecessarily, because max_hw_discard_sectors caps
> max_discard_sectors, which equals
> min(max_hw_discard_sectors, max_user_discard_sectors).
> 
> For example:
> root@vm:~# echo '0 33554432 striped 2 256 /dev/vdd 0 /dev/vde 0' | dmsetup create stripe_dev
> root@vm:~# cat /sys/block/dm-1/queue/discard_max_bytes
> 536870912
> root@vm:~# cat /sys/block/dm-1/slaves/vdd/queue/discard_max_bytes
> 536870912
> root@vm:~# blkdiscard -o 0 -l 1073741824 -p 1073741824 /dev/mapper/stripe_dev
> 
> dm-1 is the stripe device, and its discard_max_bytes is equal to
> each sub device’s discard_max_bytes. Since the requested discard
> length exceeds discard_max_bytes, the block layer splits the discard bio:
> 
> block_bio_queue: 252,1 DS 0 + 2097152 [blkdiscard]
> block_split: 252,1 DS 0 / 1048576 [blkdiscard]
> block_rq_issue: 253,48 DS 268435456 () 0 + 524288 be,0,4 [blkdiscard]
> block_bio_queue: 253,64 DS 524288 + 524288 [blkdiscard]
> 
> However, both vdd and vde can actually handle a discard bio of 536870912
> bytes, so this split is not necessary.
> 
> This patch updates the stripe target’s q->limits.max_hw_discard_sectors
> to be the minimum max_hw_discard_sectors of the sub devices multiplied
> by the number of stripes. This enables the stripe device to handle
> larger discard bios without incurring unnecessary splitting.
> 
> Signed-off-by: Yongpeng Yang <[email protected]>
> ---
>  drivers/md/dm-stripe.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
> index 1461dc740dae..799d6def699b 100644
> --- a/drivers/md/dm-stripe.c
> +++ b/drivers/md/dm-stripe.c
> @@ -38,6 +38,9 @@ struct stripe_c {
>       uint32_t chunk_size;
>       int chunk_size_shift;
>  
> +     /* The minimum max_hw_discard_sectors of all sub devices. */
> +     unsigned int max_hw_discard_sectors;
> +
>       /* Needed for handling events */
>       struct dm_target *ti;
>  
> @@ -169,6 +172,8 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>        * Get the stripe destinations.
>        */
>       for (i = 0; i < stripes; i++) {
> +             struct request_queue *q;
> +
>               argv += 2;
>  
>               r = get_stripe(ti, sc, i, argv);
> @@ -180,6 +185,13 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>                       return r;
>               }
>               atomic_set(&(sc->stripe[i].error_count), 0);
> +
> +             q = bdev_get_queue(sc->stripe[i].dev->bdev);
> +             if (i == 0)
> +                     sc->max_hw_discard_sectors = q->limits.max_hw_discard_sectors;
> +             else
> +                     sc->max_hw_discard_sectors = min_not_zero(sc->max_hw_discard_sectors,
> +                                     q->limits.max_hw_discard_sectors);

I don't think any of the above is necessary. When stripe_io_hints() is
called, dm_set_device_limits() will already have been called on all the
underlying stripe devices to combine the limits. So
limits->max_hw_discard_sectors should already be set to the same value
as you're computing for sc->max_hw_discard_sectors. Right?
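
I.e. as far as I can tell you could drop the stripe_ctr() changes and
the new stripe_c field entirely, and work off the combined limit in
stripe_io_hints() directly. Untested, just to illustrate what I mean:

	/*
	 * By the time stripe_io_hints() runs, dm_set_device_limits() has
	 * already folded every underlying device's limit into
	 * limits->max_hw_discard_sectors, so it holds the same minimum
	 * that stripe_ctr() computes above.
	 */
	if (!check_mul_overflow(limits->max_hw_discard_sectors, sc->stripes,
				&max_hw_discard_sectors))
		limits->max_hw_discard_sectors = max_hw_discard_sectors;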

>       }
>  
>       ti->private = sc;
> @@ -456,7 +468,7 @@ static void stripe_io_hints(struct dm_target *ti,
>                           struct queue_limits *limits)
>  {
>       struct stripe_c *sc = ti->private;
> -     unsigned int io_min, io_opt;
> +     unsigned int io_min, io_opt, max_hw_discard_sectors;
>  
>       limits->chunk_sectors = sc->chunk_size;
>  
> @@ -465,6 +477,8 @@ static void stripe_io_hints(struct dm_target *ti,
>               limits->io_min = io_min;
>               limits->io_opt = io_opt;
>       }
> +     if (!check_mul_overflow(sc->max_hw_discard_sectors, sc->stripes, &max_hw_discard_sectors))

sc->max_hw_discard_sectors should be the same as
limits->max_hw_discard_sectors here.

> +             limits->max_hw_discard_sectors = max_hw_discard_sectors;

I see a couple of issues with this calculation. First, it only works
if max_hw_discard_sectors is greater than sc->chunk_size. But more than
that, before you multiply the original limit by the number of stripes,
you must round it down to a multiple of chunk_size. Otherwise, you can
end up discarding too much on some of the underlying devices. To show
this with simple numbers, imagine you had 3 stripes with a chunk_size of
4 and all the underlying devices had a max_hw_discard_sectors of 5. You
would set max_hw_discard_sectors to 5 * 3 = 15. But if you discarded 15
sectors from the beginning of the device, you would discard the first 4
sectors of each underlying device, and then loop around and discard 3
more sectors from the first device. That means the first device gets a
7-sector discard, more than the 5 it can handle. Rounding
max_hw_discard_sectors down to a multiple of chunk_size fixes that.

Lastly, if multiplying max_hw_discard_sectors by the number of stripes
does overflow, you should probably just set
limits->max_hw_discard_sectors to UINT_MAX >> SECTOR_SHIFT instead of
leaving it at whatever it was.
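
Putting the rounding and the overflow fallback together, I'd expect
something roughly like this in stripe_io_hints() (untested sketch; the
chunk-size guard is just my reading of the first point above):

	unsigned int max_hw_discard_sectors;

	/* Only worth doing if at least one whole chunk fits in the limit. */
	if (limits->max_hw_discard_sectors >= sc->chunk_size) {
		/*
		 * Round down to a chunk boundary first so that no single
		 * underlying device ever sees more than its own limit.
		 */
		max_hw_discard_sectors = rounddown(limits->max_hw_discard_sectors,
						   sc->chunk_size);
		if (check_mul_overflow(max_hw_discard_sectors, sc->stripes,
				       &max_hw_discard_sectors))
			max_hw_discard_sectors = rounddown(UINT_MAX >> SECTOR_SHIFT,
							   sc->chunk_size);
		limits->max_hw_discard_sectors = max_hw_discard_sectors;
	}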

-Ben

>  }
>  
>  static struct target_type stripe_target = {
> -- 
> 2.43.0

