On Wed, Jul 9, 2025 at 10:47 AM Suren Baghdasaryan <sur...@google.com> wrote: > > On Wed, Jul 9, 2025 at 4:12 PM Liam R. Howlett <liam.howl...@oracle.com> > wrote: > > > > * Suren Baghdasaryan <sur...@google.com> [250709 11:06]: > > > On Wed, Jul 9, 2025 at 3:03 PM Vlastimil Babka <vba...@suse.cz> wrote: > > > > > > > > On 7/9/25 16:43, Suren Baghdasaryan wrote: > > > > > On Wed, Jul 9, 2025 at 1:57 AM Vlastimil Babka <vba...@suse.cz> wrote: > > > > >> > > > > >> On 7/8/25 01:10, Suren Baghdasaryan wrote: > > > > >> >>> + rcu_read_unlock(); > > > > >> >>> + vma = lock_vma_under_mmap_lock(mm, iter, address); > > > > >> >>> + rcu_read_lock(); > > > > >> >> OK I guess we hold the RCU lock the whole time as we traverse > > > > >> >> except when > > > > >> >> we lock under mmap lock. > > > > >> > Correct. > > > > >> > > > > >> I wonder if it's really necessary? Can't it be done just inside > > > > >> lock_next_vma()? It would also avoid the unlock/lock dance quoted > > > > >> above. > > > > >> > > > > >> Even if we later manage to extend this approach to smaps and employ > > > > >> rcu > > > > >> locking to traverse the page tables, I'd think it's best to separate > > > > >> and > > > > >> fine-grain the rcu lock usage for vma iterator and page tables, if > > > > >> only to > > > > >> avoid too long time under the lock. > > > > > > > > > > I thought we would need to be in the same rcu read section while > > > > > traversing the maple tree using vma_next() but now looking at it, > > > > > maybe we can indeed enter only while finding and locking the next > > > > > vma... > > > > > Liam, would that work? I see struct ma_state containing a node field. > > > > > Can it be freed from under us if we find a vma, exit rcu read section > > > > > then re-enter rcu and use the same iterator to find the next vma? > > > > > > > > If the rcu protection needs to be contigous, and patch 8 avoids the > > > > issue by > > > > always doing vma_iter_init() after rcu_read_lock() (but does it really > > > > avoid > > > > the issue or is it why we see the syzbot reports?) then I guess in the > > > > code > > > > quoted above we also need a vma_iter_init() after the rcu_read_lock(), > > > > because although the iterator was used briefly under mmap_lock > > > > protection, > > > > that was then unlocked and there can be a race before the > > > > rcu_read_lock(). > > > > > > Quite true. So, let's wait for Liam's confirmation and based on his > > > answer I'll change the patch by either reducing the rcu read section > > > or adding the missing vma_iter_init() after we switch to mmap_lock. > > > > You need to either be under rcu or mmap lock to ensure the node in the > > maple state hasn't been freed (and potentially, reallocated). > > > > So in this case, in the higher level, we can hold the rcu read lock for > > a series of walks and avoid re-walking the tree then the performance > > would be better. > > Got it. Thanks for confirming! > > > > > When we return to userspace, then we should drop the rcu read lock and > > will need to vma_iter_set()/vma_iter_invalidate() on return. I thought > > this was being done (through vma_iter_init()), but syzbot seems to > > indicate a path that was missed? > > We do that in m_start()/m_stop() by calling > lock_vma_range()/unlock_vma_range() but I think I have two problems > here: > 1. As Vlastimil mentioned I do not reset the iterator when falling > back to mmap_lock and exiting and then re-entering rcu read section; > 2. I do not reset the iterator after exiting rcu read section in > m_stop() and re-entering it in m_start(), so the later call to > lock_next_vma() might be using an iterator with a node that was freed > (and possibly reallocated). > > > > > This is the same thing that needed to be done previously with the mmap > > lock, but now under the rcu lock. > > > > I'm not sure how to mitigate the issue with the page table, maybe we > > guess on the number of vmas that we were doing for 4k blocks of output > > and just drop/reacquire then. Probably a problem for another day > > anyways. > > > > Also, I think you can also change the vma_iter_init() to vma_iter_set(), > > which is slightly less code under the hood. Vlastimil asked about this > > and it's probably a better choice. > > Ack. > I'll update my series with these fixes and all comments I received so > far, will run the reproducers to confirm no issues and repost them > later today.
I have the patchset ready but would like to test it some more. Will post it tomorrow. > Thanks, > Suren. > > > > > Thanks, > > Liam > >