On Wed, 2015-10-21 at 14:18 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at 1:33pm -0400,
> Ming Lin <m...@kernel.org> wrote:
>
> > On Wed, 2015-10-21 at 12:19 -0400, Mike Snitzer wrote:
> > > On Wed, Oct 21 2015 at 12:02pm -0400,
> > > Mike Snitzer <snit...@redhat.com> wrote:
> > >
> > > > On Wed, Oct 14 2015 at 9:27am -0400,
> > > > Christoph Hellwig <h...@infradead.org> wrote:
> > > >
> > > > > On Tue, Oct 13, 2015 at 10:44:11AM -0700, Ming Lin wrote:
> > > > > > I just did a quick test with a Samsung 900G NVMe device.
> > > > > > mkfs.xfs is OK on 4.3-rc5.
> > > > > >
> > > > > > What's your device model? I may find a similar one to try.
> > > > >
> > > > > This is a HGST Ultrastar SN100
> > > > >
> > > > > Analysis and tentative fix below:
> > > > >
> > > > > blktrace for before the commit:
> > > > >
> > > > > 259,0    1        2     0.000002543  2394  G   D 0 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        3     0.000008230  2394  I   D 0 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        4     0.000031090   207  D   D 0 + 8388607 [kworker/1:1H]
> > > > > 259,0    1        5     0.000044869  2394  Q   D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        6     0.000045992  2394  G   D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        7     0.000049559  2394  I   D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        8     0.000061551   207  D   D 8388607 + 8388607 [kworker/1:1H]
> > > > >
> > > > > .. and so on.
> > > > >
> > > > > blktrace with the commit:
> > > > >
> > > > > 259,0    2        1     0.000000000  1228  Q   D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        2     0.000002543  1228  G   D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        3     0.000010080  1228  I   D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        4     0.000082187   267  D   D 0 + 4194304 [kworker/2:1H]
> > > > > 259,0    2        5     0.000224869  1228  Q   D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        6     0.000225835  1228  G   D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        7     0.000229457  1228  I   D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        8     0.000238507   267  D   D 4194304 + 4194304 [kworker/2:1H]
> > > > >
> > > > > So discards are smaller, but better aligned. Now if I tweak a single
> > > > > line in blk-lib.c to be able to use all of bi_size I get the old I/O
> > > > > pattern back and everything works fine again:
> > > > >
> > > > > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > > > > index bd40292..65b61dc 100644
> > > > > --- a/block/blk-lib.c
> > > > > +++ b/block/blk-lib.c
> > > > > @@ -82,7 +82,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> > > > >  			break;
> > > > >  		}
> > > > >
> > > > > -		req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
> > > > > +		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> > > > >  		end_sect = sector + req_sects;
> > > > >
> > > > >  		bio->bi_iter.bi_sector = sector;
> > > >
> > > > Can we change UINT_MAX >> 9 to rounddown to the first factor of
> > > > minimum_io_size?
> > > >
> > > > That should work for all devices and for dm-thinp (and dm-cache) in
> > > > particular will ensure that all discards that are issued will be a
> > > > multiple of the underlying device's blocksize.
> > >
> > > Jeff Moyer pointed out having req_sects be a factor of
> > > discard_granularity makes more sense. And I agree. Same difference in
> > > the end (since dm-thinp sets discard_granularity to the thinp
> > > blocksize).
> >
> > An old version of this patch did use discard_granularity
> > https://www.redhat.com/archives/dm-devel/2015-August/msg00000.html
> >
> > But you didn't agree.
> > https://www.redhat.com/archives/dm-devel/2015-August/msg00001.html
> >
> > Maybe we can re-add discard_granularity now?
>
> I disagreed on a more generic level than discard_granularity shaping the
> split boundary.
>
> But we are where we are. If we're going to split (due to 32-bit limits
> in bio->bi_iter.bi_size) then we should at least do so in terms of the
> supported discard_granularity.

How about the patch below?
It effectively reverts commit b49a0871 and adds the patch from
https://www.redhat.com/archives/dm-devel/2015-August/msg00000.html

Christoph, could you give it a try?

commit 122bf0a43cb1611ed62aaf945f25b649c27a71ed
Author: Ming Lin <m...@kernel.org>
Date:   Wed Oct 21 11:24:48 2015 -0700

    block: check discard_granularity and alignment

    Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 block/blk-lib.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index bd40292..9ebf653 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,13 +26,6 @@ static void bio_batch_end_io(struct bio *bio)
 	bio_put(bio);
 }
 
-/*
- * Ensure that max discard sectors doesn't overflow bi_size and hopefully
- * it is of the proper granularity as long as the granularity is a power
- * of two.
- */
-#define MAX_BIO_SECTORS ((1U << 31) >> 9)
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:	blockdev to issue discard for
@@ -50,6 +43,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
+	unsigned int granularity;
+	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -61,6 +56,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
+	/* Zero-sector (unknown) and one-sector granularities are the same. */
+	granularity = max(q->limits.discard_granularity >> 9, 1U);
+	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
+
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -74,7 +73,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	blk_start_plug(&plug);
 	while (nr_sects) {
 		unsigned int req_sects;
-		sector_t end_sect;
+		sector_t end_sect, tmp;
 
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
@@ -82,8 +81,22 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			break;
 		}
 
-		req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
+		/* Make sure bi_size doesn't overflow */
+		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
+
+		/*
+		 * If splitting a request, and the next starting sector would be
+		 * misaligned, stop the discard at the previous aligned sector.
+		 */
 		end_sect = sector + req_sects;
+		tmp = end_sect;
+		if (req_sects < nr_sects &&
+		    sector_div(tmp, granularity) != alignment) {
+			end_sect = end_sect - alignment;
+			sector_div(end_sect, granularity);
+			end_sect = end_sect * granularity + alignment;
+			req_sects = end_sect - sector;
+		}
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_end_io = bio_batch_end_io;
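
For anyone who wants to check the split arithmetic outside the kernel, here is a rough userspace sketch of the same rounding. It is not the kernel code: next_discard_sects() and MAX_REQ_SECTS are names made up for illustration, sector_t is a plain uint64_t stand-in, sector_div() is replaced by ordinary / and %, and it assumes a 32-bit unsigned int and a granularity much smaller than the maximum request size.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's sector_t (assumption). */
typedef uint64_t sector_t;

/*
 * Largest sector count whose byte length still fits in a 32-bit bi_size.
 * With a 32-bit unsigned int this is 8388607, the size seen in the
 * "before the commit" blktrace.
 */
#define MAX_REQ_SECTS ((sector_t)(UINT_MAX >> 9))

/*
 * Hypothetical helper: number of sectors the next discard should cover.
 * granularity and alignment are in 512-byte sectors, as in the patch.
 */
static sector_t next_discard_sects(sector_t sector, sector_t nr_sects,
				   sector_t granularity, sector_t alignment)
{
	sector_t req_sects, end_sect;

	assert(granularity > 0 && alignment < granularity);

	/* Cap the request so the byte count cannot overflow 32 bits. */
	req_sects = nr_sects < MAX_REQ_SECTS ? nr_sects : MAX_REQ_SECTS;
	end_sect = sector + req_sects;

	/*
	 * Only when the request is being split: if the next start sector
	 * would be misaligned, pull the end back to the previous sector
	 * that satisfies (end_sect % granularity) == alignment.
	 */
	if (req_sects < nr_sects && end_sect % granularity != alignment) {
		end_sect = (end_sect - alignment) / granularity * granularity
			   + alignment;
		req_sects = end_sect - sector;
	}

	return req_sects;
}
```

With sector = 0, nr_sects = 20000000, an 8-sector (4 KiB) granularity and zero alignment, the first chunk is trimmed from 8388607 to 8388600 sectors, so the following discard starts on a granularity boundary. The removed MAX_BIO_SECTORS cap, (1U << 31) >> 9, works out to 4194304 sectors, which matches the "with the commit" trace.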