On February 26, 2026 at 21:21, Jan Kara <[email protected]> wrote:

> On Thu 26-02-26 16:40:07, Jiayuan Chen wrote:
> 
> > 
> > From: Jiayuan Chen <[email protected]>
> >  
> >  KCSAN reports a data race when concurrent readers access the same
> >  struct file:
> >  
> >  BUG: KCSAN: data-race in filemap_read / filemap_splice_read
> >  
> >  write to 0xffff88811a6f8228 of 8 bytes by task 10061 on cpu 0:
> >  filemap_splice_read+0x523/0x780 mm/filemap.c:3125
> >  ...
> >  
> >  write to 0xffff88811a6f8228 of 8 bytes by task 10066 on cpu 1:
> >  filemap_read+0x98d/0xa10 mm/filemap.c:2873
> >  ...
> >  
> >  Both filemap_read() and filemap_splice_read() update f_ra.prev_pos
> >  without synchronization. This is a benign race since prev_pos is only
> >  used as a hint for readahead heuristics in page_cache_sync_ra(), and a
> >  stale or torn value merely results in a suboptimal readahead decision,
> >  not a correctness issue.
> >  
> >  Use WRITE_ONCE/READ_ONCE to annotate all accesses to prev_pos across
> >  the tree for consistency and silence KCSAN.
> >  
> >  Reported-by: [email protected]
> >  Link: https://syzkaller.appspot.com/bug?extid=6880f676b265dbd42d63
> >  Signed-off-by: Jiayuan Chen <[email protected]>
> > 
> Given this, I think it would be much less intrusive and also more
> explanatory to just mark prev_pos with __data_racy with appropriate reason
> you're mentioning in the changelog.


Thanks for the suggestion. I'm fine either way; marking the field with
__data_racy is indeed cleaner and less intrusive for a purely heuristic
value like this.

I'll wait a bit to see if Andrew or other mm folks have a preference
before resending. Happy to go with whichever approach they prefer.

>  Honza
> 
> > 
> > ---
> >  fs/ext4/dir.c | 2 +-
> >  fs/ntfs3/fsntfs.c | 2 +-
> >  include/trace/events/readahead.h | 2 +-
> >  mm/filemap.c | 6 +++---
> >  mm/readahead.c | 4 ++--
> >  mm/shmem.c | 2 +-
> >  6 files changed, 9 insertions(+), 9 deletions(-)
> >  
> >  diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> >  index 28b2a3deb954..1ddf7acce5ca 100644
> >  --- a/fs/ext4/dir.c
> >  +++ b/fs/ext4/dir.c
> >  @@ -200,7 +200,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
> >  sb->s_bdev->bd_mapping,
> >  &file->f_ra, file, index,
> >  1 << EXT4_SB(sb)->s_min_folio_order);
> >  - file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
> >  + WRITE_ONCE(file->f_ra.prev_pos, (loff_t)index << PAGE_SHIFT);
> >  bh = ext4_bread(NULL, inode, map.m_lblk, 0);
> >  if (IS_ERR(bh)) {
> >  err = PTR_ERR(bh);
> >  diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c
> >  index 0df2aa81d884..d1232fc03c08 100644
> >  --- a/fs/ntfs3/fsntfs.c
> >  +++ b/fs/ntfs3/fsntfs.c
> >  @@ -1239,7 +1239,7 @@ int ntfs_read_run_nb_ra(struct ntfs_sb_info *sbi, const struct runs_tree *run,
> >  if (!ra_has_index(ra, index)) {
> >  page_cache_sync_readahead(mapping, ra, NULL,
> >  index, 1);
> >  - ra->prev_pos = (loff_t)index << PAGE_SHIFT;
> >  + WRITE_ONCE(ra->prev_pos, (loff_t)index << PAGE_SHIFT);
> >  }
> >  }
> >  
> >  diff --git a/include/trace/events/readahead.h b/include/trace/events/readahead.h
> >  index 0997ac5eceab..63d8df6c2983 100644
> >  --- a/include/trace/events/readahead.h
> >  +++ b/include/trace/events/readahead.h
> >  @@ -101,7 +101,7 @@ DECLARE_EVENT_CLASS(page_cache_ra_op,
> >  __entry->async_size = ra->async_size;
> >  __entry->ra_pages = ra->ra_pages;
> >  __entry->mmap_miss = ra->mmap_miss;
> >  - __entry->prev_pos = ra->prev_pos;
> >  + __entry->prev_pos = READ_ONCE(ra->prev_pos);
> >  __entry->req_count = req_count;
> >  ),
> >  
> >  diff --git a/mm/filemap.c b/mm/filemap.c
> >  index 63f256307fdd..d3e2d4b826b9 100644
> >  --- a/mm/filemap.c
> >  +++ b/mm/filemap.c
> >  @@ -2771,7 +2771,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
> >  int i, error = 0;
> >  bool writably_mapped;
> >  loff_t isize, end_offset;
> >  - loff_t last_pos = ra->prev_pos;
> >  + loff_t last_pos = READ_ONCE(ra->prev_pos);
> >  
> >  if (unlikely(iocb->ki_pos < 0))
> >  return -EINVAL;
> >  @@ -2870,7 +2870,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
> >  } while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
> >  
> >  file_accessed(filp);
> >  - ra->prev_pos = last_pos;
> >  + WRITE_ONCE(ra->prev_pos, last_pos);
> >  return already_read ? already_read : error;
> >  }
> >  EXPORT_SYMBOL_GPL(filemap_read);
> >  @@ -3122,7 +3122,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
> >  len -= n;
> >  total_spliced += n;
> >  *ppos += n;
> >  - in->f_ra.prev_pos = *ppos;
> >  + WRITE_ONCE(in->f_ra.prev_pos, *ppos);
> >  if (pipe_is_full(pipe))
> >  goto out;
> >  }
> >  diff --git a/mm/readahead.c b/mm/readahead.c
> >  index 7b05082c89ea..de49b35b0329 100644
> >  --- a/mm/readahead.c
> >  +++ b/mm/readahead.c
> >  @@ -142,7 +142,7 @@ void
> >  file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
> >  {
> >  ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
> >  - ra->prev_pos = -1;
> >  + WRITE_ONCE(ra->prev_pos, -1);
> >  }
> >  EXPORT_SYMBOL_GPL(file_ra_state_init);
> >  
> >  @@ -584,7 +584,7 @@ void page_cache_sync_ra(struct readahead_control *ractl,
> >  }
> >  
> >  max_pages = ractl_max_pages(ractl, req_count);
> >  - prev_index = (unsigned long long)ra->prev_pos >> PAGE_SHIFT;
> >  + prev_index = (unsigned long long)READ_ONCE(ra->prev_pos) >> PAGE_SHIFT;
> >  /*
> >  * A start of file, oversized read, or sequential cache miss:
> >  * trivial case: (index - prev_index) == 1
> >  diff --git a/mm/shmem.c b/mm/shmem.c
> >  index 5e7dcf5bc5d3..03569199baf4 100644
> >  --- a/mm/shmem.c
> >  +++ b/mm/shmem.c
> >  @@ -3642,7 +3642,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
> >  len -= n;
> >  total_spliced += n;
> >  *ppos += n;
> >  - in->f_ra.prev_pos = *ppos;
> >  + WRITE_ONCE(in->f_ra.prev_pos, *ppos);
> >  if (pipe_is_full(pipe))
> >  break;
> >  
> >  -- 
> >  2.43.0
> > 
> -- 
> Jan Kara <[email protected]>
> SUSE Labs, CR
>