On Thu, Apr 24, 2025 at 8:20 AM Suren Baghdasaryan <sur...@google.com> wrote: > > On Wed, Apr 23, 2025 at 5:24 PM Liam R. Howlett <liam.howl...@oracle.com> > wrote: > > > > * Andrii Nakryiko <andrii.nakry...@gmail.com> [250423 18:06]: > > > On Wed, Apr 23, 2025 at 2:49 PM Suren Baghdasaryan <sur...@google.com> > > > wrote: > > > > > > > > On Tue, Apr 22, 2025 at 3:49 PM Andrii Nakryiko > > > > <andrii.nakry...@gmail.com> wrote: > > > > > > > > > > On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan > > > > > <sur...@google.com> wrote: > > > > > > > > > > > > With maple_tree supporting vma tree traversal under RCU and vma and > > > > > > its important members being RCU-safe, /proc/pid/maps can be read > > > > > > under > > > > > > RCU and without the need to read-lock mmap_lock. However vma content > > > > > > can change from under us, therefore we make a copy of the vma and we > > > > > > pin pointer fields used when generating the output (currently only > > > > > > vm_file and anon_name). Afterwards we check for concurrent address > > > > > > space modifications, wait for them to end and retry. While we take > > > > > > the mmap_lock for reading during such contention, we do that > > > > > > momentarily > > > > > > only to record new mm_wr_seq counter. This change is designed to > > > > > > reduce > > > > > > > > > > This is probably a stupid question, but why do we need to take a lock > > > > > just to record this counter? uprobes get away without taking mmap_lock > > > > > even for reads, and still record this seq counter. And then detect > > > > > whether there were any modifications in between. Why does this change > > > > > need more heavy-weight mmap_read_lock to do speculative reads? > > > > > > > > Not a stupid question. mmap_read_lock() is used to wait for the writer > > > > to finish what it's doing and then we continue by recording a new > > > > sequence counter value and call mmap_read_unlock. This is what > > > > get_vma_snapshot() does. But your question made me realize that we can > > > > optimize m_start() further by not taking mmap_read_lock at all. > > > > Instead of taking mmap_read_lock then doing drop_mmap_lock() we can > > > > try mmap_lock_speculate_try_begin() and only if it fails do the same > > > > dance we do in the get_vma_snapshot(). I think that should work. > > > > > > Ok, yeah, it would be great to avoid taking a lock in a common case! > > > > We can check this counter once per 4k block and maintain the same > > 'tearing' that exists today instead of per-vma. Not that anyone said > > they had an issue with changing it, but since we're on this road anyways > > I'd thought I'd point out where we could end up. > > We would need to run that check on the last call to show_map() right > before seq_file detects the overflow and flushes the page. On > contention we will also be throwing away more prepared data (up to a > page worth of records) vs only the last record. All in all I'm not > convinced this is worth doing unless increased chances of data tearing > is identified as a problem. >
Yep, I agree, with filling out 4K of data we run into much higher chances of conflict, IMO. Not worth it, I'd say. > > > > I am concerned about live locking in either scenario, but I haven't > > looked too deep into this pattern. > > > > I also don't love (as usual) the lack of ensured forward progress. > > Hmm. Maybe we should add a retry limit on > mmap_lock_speculate_try_begin() and once the limit is hit we just take > the mmap_read_lock and proceed with it? That would prevent a > hyperactive writer from blocking the reader's forward progress > indefinitely. Came here to say the same. I'd add a small number of retries (3-5?) and then fallback to the read-locked approach. The main challenge is to keep all this logic nicely isolated from the main VMA search/printing logic. For a similar pattern in uprobes, we don't even bother to rety, we just fallback to mmap_read_lock and proceed, under the assumption that this is going to be very rare and thus not important from the overall performance perspective. > > > > > It seems like we have four cases for the vm area state now: > > 1. we want to read a stable vma or set of vmas (per-vma locking) > > 2. we want to read a stable mm state for reading (the very short named > > mmap_lock_speculate_try_begin) > > and we don't mind retrying on contention. This one should be done > under RCU protection. > > > 3. we ensure a stable vma/mm state for reading (mmap read lock) > > 4. we are writing - get out of my way (mmap write lock). > > I wouldn't call #2 a vma state. More of a usecase when we want to read > vma under RCU (valid but can change from under us) and then retry if > it might have been modified from under us. > > > > > Cheers, > > Liam > >