Hi Kuai I found one thing. The interface ->blocks_synced doesn't work in the write io path. So there is a risk of data corruption. For example:
mdadm -CR /dev/md0 -l5 -n5 /dev/loop[0-4] --bitmap=lockless (all bits are unwritten because lazy initial recovery) 1. D1 D2 D3 D4 P1, a small write hits D2. rmw is 2 (need to read old data of D2 and P1), rcw is 3 (need to read D1 D3 and D4). 2. D2 disk fails 3. read data from disk2. It needs to calculate the data from other disks. But the result is not the real data which was written to D2 So ->blocks_synced needs to be checked in handle_stripe_dirtying. Best Regards Xiao On Fri, Aug 1, 2025 at 3:11 PM Yu Kuai <yuku...@huaweicloud.com> wrote: > > From: Yu Kuai <yuku...@huawei.com> > > Currently, raid456 must perform a whole array initial recovery to build > initail xor data, then IO to the array won't have to read all the blocks > in underlying disks. > > This behavior will affect IO performance a lot, and nowadays there are > huge disks and the initial recovery can take a long time. Hence llbitmap > will support lazy initial recovery in following patches. This method is > used to check if data blocks is synced or not, if not then IO will still > have to read all blocks for raid456. > > Signed-off-by: Yu Kuai <yuku...@huawei.com> > Reviewed-by: Christoph Hellwig <h...@lst.de> > Reviewed-by: Hannes Reinecke <h...@suse.de> > Reviewed-by: Xiao Ni <x...@redhat.com> > Reviewed-by: Li Nan <linan...@huawei.com> > --- > drivers/md/md-bitmap.h | 1 + > drivers/md/raid5.c | 6 ++++++ > 2 files changed, 7 insertions(+) > > diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h > index 95453696c68e..5f41724cbcd8 100644 > --- a/drivers/md/md-bitmap.h > +++ b/drivers/md/md-bitmap.h > @@ -90,6 +90,7 @@ struct bitmap_operations { > md_bitmap_fn *end_discard; > > sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset); > + bool (*blocks_synced)(struct mddev *mddev, sector_t offset); > bool (*start_sync)(struct mddev *mddev, sector_t offset, > sector_t *blocks, bool degraded); > void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t > *blocks); > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 5285e72341a2..2121f0ff5e30 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -3748,6 +3748,7 @@ static int want_replace(struct stripe_head *sh, int > disk_idx) > static int need_this_block(struct stripe_head *sh, struct stripe_head_state > *s, > int disk_idx, int disks) > { > + struct mddev *mddev = sh->raid_conf->mddev; > struct r5dev *dev = &sh->dev[disk_idx]; > struct r5dev *fdev[2] = { &sh->dev[s->failed_num[0]], > &sh->dev[s->failed_num[1]] }; > @@ -3762,6 +3763,11 @@ static int need_this_block(struct stripe_head *sh, > struct stripe_head_state *s, > */ > return 0; > > + /* The initial recover is not done, must read everything */ > + if (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced && > + !mddev->bitmap_ops->blocks_synced(mddev, sh->sector)) > + return 1; > + > if (dev->toread || > (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags))) > /* We need this block to directly satisfy a request */ > -- > 2.39.2 >