On Tue, Mar 07 2017 at 3:29pm -0500, NeilBrown <ne...@suse.com> wrote:
> On Tue, Mar 07 2017, Mike Snitzer wrote:
>
> > On Tue, Mar 07 2017 at 12:05pm -0500,
> > Jens Axboe <ax...@kernel.dk> wrote:
> >
> >> On 03/07/2017 09:52 AM, Mike Snitzer wrote:
> >> >
> >> > In addition to Jack's MD raid test there is a DM snapshot deadlock test,
> >> > albeit unpolished/needy to get running, see:
> >> > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
> >>
> >> Can you run this patch with that test, reverting your DM workaround?
> >
> > Yeap, will do.  Last time Mikulas tried a similar patch it still
> > deadlocked.  But I'll give it a go (likely tomorrow).
>
> I don't think this will fix the DM snapshot deadlock by itself.
> Rather, it make it possible for some internal changes to DM to fix it.
> The DM change might be something vaguely like:
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3086da5664f3..06ee0960e415 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
>
>         len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
>
> +       if (len < ci->sector_count) {
> +               struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
> +               bio_chain(split, bio);
> +               generic_make_request(bio);
> +               bio = split;
> +               ci->sector_count = len;
> +       }
> +
>         r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
>         if (r < 0)
>                 return r;
>
> Instead of looping inside DM, this change causes the remainder to be
> passed to generic_make_request() and DM only handles or region at a
> time.  So there is only one loop, in the top generic_make_request().
> That loop will not reliable handle bios in the "right" order.

s/not reliable/now reliably/ ? ;)

But thanks for the suggestion Neil.  Will dig in once I get through a
backlog of other DM target code I have queued for 4.12 review.

Mike
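
For anyone following along, the control flow Neil describes (handle one
region, hand the remainder back to the top-level loop) can be modelled in
user space. This is only a rough sketch under stated assumptions: every
toy_* name below is hypothetical, there is no real struct bio, no
bio_split()/bio_chain(), and no locking. It is not the DM code, and it
says nothing about whether the real change avoids the snapshot deadlock.
It only shows that a single FIFO loop ends up handling the pieces in
ascending sector order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a bio: a run of sectors. Not the kernel's struct bio. */
struct toy_bio {
    unsigned long sector;   /* first sector */
    unsigned long count;    /* number of sectors */
};

#define TOY_QUEUE_MAX 64

/* FIFO of pending bios; plays the role of the list that the top-level
 * submission loop drains. */
struct toy_queue {
    struct toy_bio items[TOY_QUEUE_MAX];
    size_t head, tail;
};

static void toy_submit(struct toy_queue *q, struct toy_bio b)
{
    assert(q->tail < TOY_QUEUE_MAX);
    q->items[q->tail++] = b;
}

/* Sectors from 'sector' up to the next region boundary. Hypothetical
 * analogue of max_io_len(); assumes fixed-size regions. */
static unsigned long toy_max_io_len(unsigned long sector, unsigned long region)
{
    assert(region > 0);
    return region - (sector % region);
}

/* One pass through the driver: keep at most one region's worth and
 * resubmit the remainder instead of looping here. */
static struct toy_bio toy_make_request(struct toy_queue *q, struct toy_bio b,
                                       unsigned long region)
{
    unsigned long len = toy_max_io_len(b.sector, region);

    if (len < b.count) {
        struct toy_bio rest = { b.sector + len, b.count - len };
        toy_submit(q, rest);    /* remainder goes back to the top loop */
        b.count = len;          /* this call handles only the front piece */
    }
    return b;
}

/* The single top-level loop. Records each handled piece in 'out' in the
 * order it was handled and returns the number of pieces. */
static size_t toy_run(struct toy_bio first, unsigned long region,
                      struct toy_bio *out, size_t out_max)
{
    struct toy_queue q = { .head = 0, .tail = 0 };
    size_t n = 0;

    toy_submit(&q, first);
    while (q.head < q.tail) {
        struct toy_bio piece = toy_make_request(&q, q.items[q.head++], region);
        assert(n < out_max);
        out[n++] = piece;
    }
    return n;
}
```

Calling toy_run() with a bio that starts at sector 5, spans 20 sectors,
and uses 8-sector regions gives four pieces, each cut at a region
boundary, with no loop inside toy_make_request() itself.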