On Mon, Mar 09, 2026 at 01:10:46PM +0000, Matthew Wilcox wrote:
> On Fri, Feb 27, 2026 at 04:00:22PM +0000, Dmitry Ilvokhin wrote:
> > Zone lock contention can significantly impact allocation and
> > reclaim latency, as it is a central synchronization point in
> > the page allocator and reclaim paths. Improved visibility into
> > its behavior is therefore important for diagnosing performance
> > issues in memory-intensive workloads.
> >
> > On some production workloads at Meta, we have observed noticeable
> > zone lock contention. Deeper analysis of lock holders and waiters
> > is currently difficult with existing instrumentation.
> >
> > While generic lock contention_begin/contention_end tracepoints
> > cover the slow path, they do not provide sufficient visibility
> > into lock hold times. In particular, the lack of a release-side
> > event makes it difficult to identify long lock holders and
> > correlate them with waiters. As a result, distinguishing between
> > short bursts of contention and pathological long hold times
> > requires additional instrumentation.
> >
> > This patch series adds dedicated tracepoint instrumentation to
> > zone lock, following the existing mmap_lock tracing model.
>
> I don't like this at all. We have CONFIG_LOCK_STAT. That should be
> improved instead of coming up with one-offs for every single lock
> that someone deems "special".

Thanks for the feedback, Matthew.

CONFIG_LOCK_STAT provides useful statistics, but it is primarily a
debug facility and is generally too heavyweight for production
environments. The motivation for this series was to provide
lightweight observability for the zone lock in production workloads.

I agree that improving generic lock instrumentation would be
preferable. I did consider whether something similar could be done
generically for spinlocks, but the unlock path there is typically
just a single atomic store, so adding generic lightweight
instrumentation without affecting the fast path is difficult.

In parallel, I've been experimenting with improving observability for
sleepable locks by adding a contended_release tracepoint, which would
allow correlating lock holders and waiters in a more generic way.
I've posted an RFC here:

https://lore.kernel.org/all/[email protected]/

I'd appreciate feedback on whether that direction makes sense for
improving the generic lock tracing infrastructure.
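
To make the spinlock point above a bit more concrete, here is a minimal
userspace sketch using C11 atomics. It is not kernel code: every toy_*
name is hypothetical, the lock is a simple test-and-set lock rather than
the kernel's queued spinlock, and a plain boolean stands in for whatever
mechanism a real tracepoint would use to stay cheap when disabled. It
only shows that the plain unlock is a single release store, while an
instrumented unlock has to put extra work next to that store.

```c
/* Illustrative userspace sketch only; all toy_* names are hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>

struct toy_spinlock {
	atomic_bool locked;
};

/* Assumed stand-in for a tracepoint enable switch; set before use. */
static bool toy_trace_enabled;
/* Counts release events; only touched while the lock is held. */
static unsigned long toy_release_events;

static inline void toy_lock(struct toy_spinlock *l)
{
	/* Spin until the exchange observes "unlocked" and takes the lock. */
	while (atomic_exchange_explicit(&l->locked, true,
					memory_order_acquire))
		;
}

/* Plain unlock: the entire path is one release store. */
static inline void toy_unlock(struct toy_spinlock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Instrumented unlock: a check now sits in front of the store. */
static inline void toy_unlock_traced(struct toy_spinlock *l)
{
	if (toy_trace_enabled)
		toy_release_events++;	/* a real hook would record more */
	atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

In this toy version the extra check is paid on every unlock, whether or
not anyone is tracing, which is the kind of cost that is hard to justify
generically for every spinlock.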
