** Changed in: linux (Ubuntu)
Status: In Progress => Fix Committed
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1787089
Title:
[AEP-bug] ext4: more rare direct I/O vs unmap failures
Status in intel:
Triaged
Status in linux package in Ubuntu:
Fix Committed
Bug description:
Description:
Even with the ext4_break_layouts() support added by "ext4: handle layout
changes to pinned DAX mappings", we are still seeing occasional cases in the
unit test where we truncate a page that has an elevated reference count.
Investigate.
--
The root cause of this issue is that while the ei->i_mmap_sem provides
synchronization between ext4_break_layouts() and page faults, it doesn't
synchronize us with the direct I/O path. This exact same issue exists
in XFS AFAICT, with the synchronization tool there being the XFS_MMAPLOCK.
This allows the direct I/O path to do I/O and raise & lower page->_refcount
while we're executing a truncate/hole punch. This leads to us trying to free
a page with an elevated refcount.
Here's one instance of the race:
CPU 0                                   CPU 1
-----                                   -----
ext4_punch_hole()
  ext4_break_layouts() # all pages have refcount=1
                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page() # elevates refcount
  truncate_pagecache_range()
    ... a few layers ...
    dax_disassociate_entry() # sees elevated refcount, WARN_ON_ONCE()
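The window in the trace above can be modelled in user space. This is only a single-threaded sketch of the check-then-act gap, not kernel code: `struct model_page`, `model_layouts_idle()` and `model_punch_hole_race()` are hypothetical names invented for illustration, and the "CPU 1" step is simply run at the point where the interleaving would place it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_page {
    int refcount; /* 1 == idle, >1 == pinned by someone else */
};

/* Stands in for the ext4_break_layouts() check: true if no page is pinned. */
static bool model_layouts_idle(const struct model_page *pages, int n)
{
    for (int i = 0; i < n; i++)
        if (pages[i].refcount > 1)
            return false;
    return true;
}

/*
 * Replays the interleaving from the trace. Returns how many pages were
 * still pinned when the modelled truncate reached them (the situation
 * in which dax_disassociate_entry() would warn).
 */
static int model_punch_hole_race(bool dio_runs_in_window)
{
    struct model_page pages[2] = { { 1 }, { 1 } };
    int warnings = 0;

    /* CPU 0: the check passes, all pages have refcount == 1. */
    assert(model_layouts_idle(pages, 2));

    /* CPU 1: direct I/O pins a page after the check, before the truncate. */
    if (dio_runs_in_window)
        pages[0].refcount++;

    /* CPU 0: the truncate now meets a page with an elevated refcount. */
    for (int i = 0; i < 2; i++)
        if (pages[i].refcount > 1)
            warnings++;

    return warnings;
}
```

Calling `model_punch_hole_race(true)` reports one pinned page at truncate time, while `model_punch_hole_race(false)` reports none; nothing in the model stops the pin from landing inside the window.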
A similar race occurs when the refcount is being dropped while we're running
ext4_break_layouts(), and this is the one that my test was actually hitting:
CPU 0                                   CPU 1
-----                                   -----
                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page()
                                            # elevates refcount of page X
ext4_punch_hole()
  ext4_break_layouts() # two pages, X & Y, have refcount == 2
    __wait_var_event() # called for page X
                                        __put_devmap_managed_page()
                                        # drops refcount of X to 1
  # __wait_var_event() checks X's refcount in "if (condition)", and breaks.
  # We never actually called ext4_wait_dax_page(), so 'retry' in
  # ext4_break_layouts() is still false. Exit do/while loop in
  # ext4_break_layouts(), never attempting to wait on page Y which still has
  # an elevated refcount of 2.
  truncate_pagecache_range()
    ... a few layers ...
    dax_disassociate_entry() # sees elevated refcount for Y, WARN_ON_ONCE()
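The control flow that lets page Y slip through can be sketched in user space as follows. This is a deterministic, single-threaded model under stated assumptions, not the ext4 code: all `sim_*` names are hypothetical, the wait helper only imitates the "check the condition first, run the wait callback only if it is false" behaviour described in the trace, and the flag `drop_before_wait` injects CPU 1's refcount drop at the point shown there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a DAX page; not a kernel structure. */
struct sim_page {
    int refcount; /* 1 == idle, 2 == pinned by direct I/O */
};

/* Stands in for the busy-page scan: first page with an elevated refcount. */
static struct sim_page *sim_busy_page(struct sim_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].refcount > 1)
            return &pages[i];
    return NULL;
}

/*
 * Imitates the wait: the condition is tested first, and the wait callback
 * (the only place that sets *retry) runs only if the page is still busy.
 */
static void sim_wait_for_idle(struct sim_page *page, bool *retry)
{
    if (page->refcount == 1)
        return;          /* condition already true: callback never runs */
    *retry = true;       /* stands in for ext4_wait_dax_page() */
    page->refcount = 1;  /* the pin is released while we "sleep" */
}

/*
 * Loop driven by the retry flag, as described in the trace.
 * Returns the number of pages still pinned when the loop exits.
 */
static size_t sim_break_layouts_flag(struct sim_page *pages, size_t n,
                                     bool drop_before_wait)
{
    bool retry;
    size_t still_pinned = 0;

    assert(pages != NULL);
    do {
        retry = false;
        struct sim_page *page = sim_busy_page(pages, n);
        if (!page)
            break;
        if (drop_before_wait)
            page->refcount = 1; /* CPU 1 drops its reference right here */
        sim_wait_for_idle(page, &retry);
    } while (retry);

    for (size_t i = 0; i < n; i++)
        if (pages[i].refcount > 1)
            still_pinned++;
    return still_pinned;
}
```

With two pages at refcount 2 and `drop_before_wait` set, the loop exits after looking at X only and Y is still pinned; without the injected drop, the callback runs for each page and the loop keeps scanning until none is busy.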
Essentially the solution will most likely involve adding
synchronization between the direct I/O path and truncate/hole punch
type operations, and it'll need to happen for both ext4 and XFS, so
the filesystem folks need to be involved.
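As a rough illustration of what "adding synchronization" could mean, the sketch below models a lock that both paths must respect. It is a single-threaded toy under loud assumptions: `struct lk_inode`, its `layout_locked` flag, `lk_dio_try_pin()` and `lk_punch_hole()` are all hypothetical, the flag merely stands in for whatever lock the filesystem developers might choose, and nothing here claims to be the eventual ext4 or XFS fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; not kernel structures. */
struct lk_page {
    int refcount; /* 1 == idle, >1 == pinned */
};

struct lk_inode {
    bool layout_locked;  /* assumed lock shared by both paths */
    struct lk_page page;
};

/*
 * Direct I/O side: pin the page only when the lock is free.
 * Returns false where a real implementation would have to wait.
 */
static bool lk_dio_try_pin(struct lk_inode *inode)
{
    if (inode->layout_locked)
        return false;
    inode->page.refcount++;
    return true;
}

/*
 * Hole punch side: optionally hold the lock across both the idle check
 * and the truncate. Returns 1 if the page was pinned at truncate time.
 */
static int lk_punch_hole(struct lk_inode *inode, bool use_lock)
{
    int warnings = 0;

    if (use_lock)
        inode->layout_locked = true;

    /* Idle check, as in the first trace: the page starts unpinned. */
    assert(inode->page.refcount == 1);

    /* Direct I/O shows up inside the window between check and truncate. */
    (void)lk_dio_try_pin(inode);

    /* Modelled truncate: an elevated refcount here is the reported bug. */
    if (inode->page.refcount > 1)
        warnings++;

    inode->layout_locked = false;
    return warnings;
}
```

In this model the pin attempt is refused while the lock is held, so the locked variant reaches the truncate with refcount 1, whereas the unlocked variant reproduces the warning case.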
To manage notifications about this bug go to:
https://bugs.launchpad.net/intel/+bug/1787089/+subscriptions