On Sat 12-09-20 09:19:11, Amir Goldstein wrote: > On Tue, Jun 23, 2020 at 8:21 AM Dave Chinner <da...@fromorbit.com> wrote: > > > > From: Dave Chinner <dchin...@redhat.com> > > > > The page faultround path ->map_pages is implemented in XFS via > > filemap_map_pages(). This function checks that pages found in page > > cache lookups have not raced with truncate based invalidation by > > checking page->mapping is correct and page->index is within EOF. > > > > However, we've known for a long time that this is not sufficient to > > protect against races with invalidations done by operations that do > > not change EOF. e.g. hole punching and other fallocate() based > > direct extent manipulations. The way we protect against these > > races is we wrap the page fault operations in a XFS_MMAPLOCK_SHARED > > lock so they serialise against fallocate and truncate before calling > > into the filemap function that processes the fault. > > > > Do the same for XFS's ->map_pages implementation to close this > > potential data corruption issue. > > > > Signed-off-by: Dave Chinner <dchin...@redhat.com> > > --- > > fs/xfs/xfs_file.c | 15 ++++++++++++++- > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index 7b05f8fd7b3d..4b185a907432 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -1266,10 +1266,23 @@ xfs_filemap_pfn_mkwrite( > > return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true); > > } > > > > +static void > > +xfs_filemap_map_pages( > > + struct vm_fault *vmf, > > + pgoff_t start_pgoff, > > + pgoff_t end_pgoff) > > +{ > > + struct inode *inode = file_inode(vmf->vma->vm_file); > > + > > + xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); > > + filemap_map_pages(vmf, start_pgoff, end_pgoff); > > + xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); > > +} > > + > > static const struct vm_operations_struct xfs_file_vm_ops = { > > .fault = xfs_filemap_fault, > > .huge_fault = xfs_filemap_huge_fault, > > - .map_pages = filemap_map_pages, > > + .map_pages = xfs_filemap_map_pages, > > .page_mkwrite = xfs_filemap_page_mkwrite, > > .pfn_mkwrite = xfs_filemap_pfn_mkwrite, > > }; > > -- > > 2.26.2.761.g0e0b3e54be > > > > It appears that ext4, f2fs, gfs2, orangefs, zonefs also need this fix > > zonefs does not support hole punching, so it may not need to use > mmap_sem at all.
So I've written an ext4 fix for this but before actually posting it Nikolay working on btrfs fix asked why exactly is filemap_map_pages() actually a problem and I think he's right it actually isn't a problem. The thing is: filemap_map_pages() never does any page mapping or IO. It only creates PTEs for uptodate pages that are already in page cache. As such it is a rather different beast compared to the fault handler from fs POV and does not need protection from hole punching (current serialization on page lock and checking of page->mapping is enough). That being said I agree this is subtle and the moment someone adds e.g. a readahead call into filemap_map_pages() we have a real problem. I'm not sure how to prevent this risk... Honza -- Jan Kara <j...@suse.com> SUSE Labs, CR