On 4/8/21 6:08 AM, Miaohe Lin wrote:
> When I was investigating the swap code, I found the below possible race
> window:
>
> CPU 1 CPU 2
> ----- -----
> do_swap_page
> synchronous swap_readpage
> alloc_page_vma
> swapoff
> release swap_file, bdev, or ...
Perhaps I'm missing something. The release of swap_file, bdev etc
happens after we have cleared the SWP_VALID bit in si->flags in
destroy_swap_extents
if I read the swapoff code correctly.
> swap_readpage
> check sis->flags is ok
> access swap_file, bdev...[oops!]
> si->flags = 0
This happens after we clear the si->flags
synchronize_rcu()
release swap_file, bdev, in
destroy_swap_extents()
So I think if we have get_swap_device/put_swap_device in do_swap_page,
it should fix the race you've pointed out here.
Then synchronize_rcu() will wait till we have completed do_swap_page and
call put_swap_device.
>
> Using current get/put_swap_device() to guard against concurrent swapoff for
> swap_readpage() looks terrible because swap_readpage() may take really long
> time. And this race may not be really pernicious because swapoff is usually
> done when system shutdown only. To reduce the performance overhead on the
> hot-path as much as possible, it appears we can use the percpu_ref to close
> this race window(as suggested by Huang, Ying).
I think it is better to break this patch into two.
One patch is to fix the race in do_swap_page and swapoff
by adding get_swap_device/put_swap_device in do_swap_page.
The second patch is to modify get_swap_device and put_swap_device
with percpu_ref. But swapoff is a relatively rare events.
I am not sure making percpu_ref change for performance is really beneficial.
Did you encounter a real use case where you see a problem with swapoff?
The delay in swapoff is primarily in try_to_unuse to bring all
the swapped off pages back into memory. Synchronizing with other
CPU for paging in probably is a small component in overall scheme
of things.
Thanks.
Tim