On Wed, Jul 16, 2025 at 6:57 AM Vlastimil Babka <vba...@suse.cz> wrote:
>
> On 7/16/25 05:05, Suren Baghdasaryan wrote:
> > With maple_tree supporting vma tree traversal under RCU and per-vma
> > locks, /proc/pid/maps can be read while holding individual vma locks
> > instead of locking the entire address space.
> > A completely lockless approach (walking vma tree under RCU) would be
> > quite complex with the main issue being get_vma_name() using callbacks
> > which might not work correctly with a stable vma copy, requiring
> > original (unstable) vma - see special_mapping_name() for example.
> >
> > When per-vma lock acquisition fails, we take the mmap_lock for reading,
> > lock the vma, release the mmap_lock and continue. This fallback to mmap
> > read lock guarantees the reader to make forward progress even during
> > lock contention. This will interfere with the writer but for a very
> > short time while we are acquiring the per-vma lock and only when there
> > was contention on the vma reader is interested in.
> >
> > We shouldn't see a repeated fallback to mmap read locks in practice, as
> > this requires a very unlikely series of lock contentions (for instance
> > due to repeated vma split operations). However even if this did somehow
> > happen, we would still progress.
> >
> > One case requiring special handling is when a vma changes between the
> > time it was found and the time it got locked. A problematic case would
> > be if a vma got shrunk so that its vm_start moved higher in the address
> > space and a new vma was installed at the beginning:
> >
> > reader found:               |--------VMA A--------|
> > VMA is modified:            |-VMA B-|----VMA A----|
> > reader locks modified VMA A
> > reader reports VMA A:       |  gap  |----VMA A----|
> >
> > This would result in reporting a gap in the address space that does not
> > exist. To prevent this we retry the lookup after locking the vma, however
> > we do that only when we identify a gap and detect that the address space
> > was changed after we found the vma.
> >
> > This change is designed to reduce mmap_lock contention and prevent a
> > process reading /proc/pid/maps files (often a low priority task, such
> > as monitoring/data collection services) from blocking address space
> > updates. Note that this change has a userspace visible disadvantage:
> > it allows for sub-page data tearing as opposed to the previous mechanism
> > where data tearing could happen only between pages of generated output
> > data. Since current userspace considers data tearing between pages to be
> > acceptable, we assume it will be able to handle sub-page data tearing
> > as well.
> >
> > Signed-off-by: Suren Baghdasaryan <sur...@google.com>
>
> Reviewed-by: Vlastimil Babka <vba...@suse.cz>
>
> Nit: the previous patch changed lines with e.g. -2UL to -2 and this seems
> to be changing the same lines to add a comment e.g. *ppos = -2; /* -2
> indicates gate vma */
>
> That comment could have been added in the previous patch already. Also if
> you feel the need to add the comments, maybe it's time to just name those
> special values with a #define or something :)

Good point. I'll see if I can fit that into the next version.
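
For illustration only, naming the special position values might look roughly like the userspace sketch below. The macro names, the pos_t typedef and both helpers are hypothetical and not taken from the patch; only "-2 indicates gate vma" comes from the comment quoted above, and the meaning given to -1 is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical names: the patch under review writes the bare values.
 * Only "-2 indicates gate vma" is stated in the thread; the meaning
 * given to -1 here is an assumption made for illustration.
 */
#define MAPS_POS_END       (-1LL)  /* assumed: no more vmas to report */
#define MAPS_POS_GATE_VMA  (-2LL)  /* -2 indicates gate vma */

/* Stand-in for the kernel's loff_t so the sketch builds in userspace. */
typedef long long pos_t;

/* True when pos is one of the special markers rather than an address. */
bool maps_pos_is_special(pos_t pos)
{
	return pos == MAPS_POS_END || pos == MAPS_POS_GATE_VMA;
}

/* Short label for a position, e.g. for debugging output. */
const char *maps_pos_name(pos_t pos)
{
	switch (pos) {
	case MAPS_POS_END:
		return "end";
	case MAPS_POS_GATE_VMA:
		return "gate vma";
	default:
		return "address";
	}
}
```

With names like these, an assignment such as *ppos = MAPS_POS_GATE_VMA; would no longer need the trailing comment.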