[Bug 1787089] [NEW] [AEP-bug] ext4: more rare direct I/O vs unmap failures

quanxian Tue, 14 Aug 2018 21:46:15 -0700

Private bug reported:

Description:
Even with the ext4_break_layouts() support added by "ext4: handle layout
changes to pinned DAX mappings", we are still seeing occasional cases in the
unit test where we truncate a page that has an elevated reference count.
Investigate.


—

The root cause of this issue is that while the ei->i_mmap_sem provides
synchronization between ext4_break_layouts() and page faults, it doesn't
synchronize us with the direct I/O path. This exact same issue exists
in XFS AFAICT, with the synchronization tool there being the XFS_MMAPLOCK.

This allows the direct I/O path to do I/O and raise & lower page->_refcount
while we're executing a truncate/hole punch. This leads to us trying to free
a page with an elevated refcount.

Here's one instance of the race:

CPU 0                                   CPU 1
-----                                   -----
ext4_punch_hole()
  ext4_break_layouts() # all pages have refcount=1

                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page() # elevates refcount

  truncate_pagecache_range()
    ... a few layers ...
    dax_disassociate_entry() # sees elevated refcount, WARN_ON_ONCE()
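The interleaving above is a check-then-act race. As a rough illustration, here is a single-threaded userspace sketch that replays the three steps in order. It is not kernel code: struct page is reduced to a bare counter, and page_is_idle(), model_get_page(), model_put_page() and replay_check_then_pin() are hypothetical names made up for this example.

```c
#include <stdbool.h>

/* Userspace stand-in for struct page; only the reference count is modeled. */
struct page {
    int refcount;
};

/* Models the idle test used by the layout-break path: idle means refcount 1. */
static bool page_is_idle(const struct page *p)
{
    return p->refcount == 1;
}

/* Hypothetical stand-ins for get_page()/put_page() in the direct I/O path. */
static void model_get_page(struct page *p) { p->refcount++; }
static void model_put_page(struct page *p) { p->refcount--; }

/*
 * Replays the interleaving from the diagram in one thread.
 * Returns true if the "truncate" step would find an elevated refcount,
 * which is the situation where the kernel hits WARN_ON_ONCE().
 */
static bool replay_check_then_pin(void)
{
    struct page page = { .refcount = 1 };

    /* CPU 0: ext4_break_layouts() finds nothing to wait for. */
    bool idle_at_check = page_is_idle(&page);

    /* CPU 1: direct I/O pins the page after the check has passed. */
    model_get_page(&page);

    /* CPU 0: truncate proceeds based on the now-stale check. */
    bool warn = idle_at_check && !page_is_idle(&page);

    /* CPU 1: the I/O completes and the pin is released. */
    model_put_page(&page);
    return warn;
}
```

In this model replay_check_then_pin() returns true: nothing prevents the pin from landing between the check and the truncate, which is the gap that i_mmap_sem does not cover for direct I/O.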

A similar race occurs when the refcount is being dropped while we're running
ext4_break_layouts(), and this is the one that my test was actually hitting:

CPU 0                                   CPU 1
-----                                   -----
                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page()
                                            # elevates refcount of page X
ext4_punch_hole()
  ext4_break_layouts() # two pages, X & Y, have refcount == 2
    __wait_var_event() # called for page X
                                        __put_devmap_managed_page()
                                        # drops refcount of X to 1

    # __wait_var_event() checks X's refcount in "if (condition)", and breaks.
    # We never actually called ext4_wait_dax_page(), so 'retry' in
    # ext4_break_layouts() is still false. Exit do/while loop in
    # ext4_break_layouts(), never attempting to wait on page Y which still
    # has an elevated refcount of 2.

  truncate_pagecache_range()
    ... a few layers ...
    dax_disassociate_entry() # sees elevated refcount for Y, WARN_ON_ONCE()
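To make the 'retry' problem concrete, here is a minimal userspace model of the loop, written as a sketch under stated assumptions rather than a copy of the kernel source. The do/while shape and the "set retry only if we really slept" behaviour follow the description above. Everything else is hypothetical: first_busy_page() stands in for the busy-page scan, wait_until_idle() for the wait, and the racing_put parameter marks the page whose reference CPU 1 drops just before the condition check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct page; only the reference count is modeled. */
struct page {
    int refcount;
};

/* Hypothetical stand-in for the busy scan: first page with refcount > 1. */
static struct page *first_busy_page(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].refcount > 1)
            return &pages[i];
    return NULL;
}

/*
 * Hypothetical stand-in for the wait. 'racing_put' is the page whose
 * reference CPU 1 drops between the scan and the condition check
 * (NULL means no race). 'retry' is set only when a sleep is needed.
 */
static void wait_until_idle(struct page *page, struct page *racing_put,
                            bool *retry)
{
    if (page == racing_put)
        page->refcount--;       /* CPU 1 drops its reference */
    if (page->refcount == 1)
        return;                 /* condition already true: no sleep */
    *retry = true;              /* models ext4_wait_dax_page() */
    page->refcount = 1;         /* models sleeping until the I/O finishes */
}

/* Models the do/while loop described for ext4_break_layouts(). */
static void model_break_layouts(struct page *pages, size_t n,
                                struct page *racing_put)
{
    bool retry;

    do {
        retry = false;
        struct page *page = first_busy_page(pages, n);
        if (!page)
            return;
        wait_until_idle(page, racing_put, &retry);
    } while (retry);
}

/* Returns how many pages still have an elevated refcount afterwards. */
static size_t busy_after_break_layouts(bool race)
{
    struct page pages[2] = { { 2 }, { 2 } };    /* pages X and Y */
    size_t busy = 0;

    model_break_layouts(pages, 2, race ? &pages[0] : NULL);
    for (size_t i = 0; i < 2; i++)
        if (pages[i].refcount > 1)
            busy++;
    return busy;
}
```

In this model busy_after_break_layouts(false) returns 0, because each wait sleeps, sets 'retry', and forces a rescan that picks up Y. busy_after_break_layouts(true) returns 1: the early put on X makes the condition true without a sleep, 'retry' stays false, and the loop exits with Y still pinned.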

Essentially the solution will most likely involve adding synchronization
between the direct I/O path and truncate/hole punch type operations, and
it'll need to happen for both ext4 and XFS, so the filesystem folks need
to be involved.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: intel-kernel-18.10

** Information type changed from Public to Private

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1787089


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
