On 6/24/25 17:30, Christoph Berg wrote:
> Re: Tomas Vondra
>> If it's a reliable fix, then I guess we can do it like this. But won't
>> that be a performance penalty on everyone? Or does the system split the
>> array into 16-element chunks anyway, so this makes no difference?
> 
> There's still the overhead of the syscall itself. But no idea how
> costly it is to have this 16-step loop in user or kernel space.
> 
> We could claim that on 32-bit systems, shared_buffers would be smaller
> anyway, so there the overhead isn't that big. And the step size should
> be larger (if at all) on 64-bit.
> 
>> Anyway, maybe we should start by reporting this to the kernel people. Do
>> you want me to do that, or shall one of you take care of that? I suppose
>> that'd be better, as you already wrote a fix / know the code better.
> 
> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
> 

Thanks! Now we wait ...
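
FWIW the chunking discussed above might look roughly like the sketch
below - this assumes libnuma's numa_move_pages() and the 16-page chunk
size from the discussion, with a hypothetical wrapper name, so it's not
the actual fix:

#include <numa.h>			/* numa_move_pages(), link with -lnuma */

/* query at most this many pages per move_pages(2) call */
#define NUMA_QUERY_CHUNK_SIZE	16

static int
pg_numa_query_pages_chunked(int pid, unsigned long count,
							void **pages, int *status)
{
	unsigned long next = 0;
	int			ret = 0;

	while (next < count)
	{
		unsigned long chunk = count - next;

		if (chunk > NUMA_QUERY_CHUNK_SIZE)
			chunk = NUMA_QUERY_CHUNK_SIZE;

		/*
		 * With a NULL "nodes" argument move_pages(2) moves nothing,
		 * it only reports the NUMA node of each page in "status".
		 */
		ret = numa_move_pages(pid, chunk, &pages[next], NULL,
							  &status[next], 0);
		if (ret < 0)
			break;

		next += chunk;
	}

	return ret;
}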

Attached is a minor tweak of the valgrind suppression rules, adding
rules for the two places touching the memory. I was hoping I could add
a single rule for pg_numa_touch_mem_if_required, but that does not work
- it's a macro, not a function. So I had to add a separate rule for
each of the two functions querying the NUMA status. That's a bit
disappointing, because it means it'll hide all other failures (of
Memcheck:Addr8 type) in those functions.

Perhaps it'd be better to turn pg_numa_touch_mem_if_required into a
proper (inlined) function, at least with USE_VALGRIND defined. Something
like the v2 patch - that needs more testing, to make sure the inlined
function doesn't break the touching (the volatile read should prevent
the compiler from optimizing it away) or something silly like that.
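
To verify the suppressions actually work, the server needs to be run
under valgrind with the suppression file, something like this (the
flags and paths are illustrative):

  valgrind --quiet --trace-children=yes --leak-check=no \
    --suppressions=src/tools/valgrind.supp \
    postgres -D /path/to/data

and then run queries hitting the two functions touching the memory.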

regards

-- 
Tomas Vondra
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..36bf3253f76 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,22 @@
    Memcheck:Cond
    fun:PyObject_Realloc
 }
+
+# Querying the NUMA node for shared memory requires touching the memory
+# first, so that it gets allocated in the process. But the memory backing
+# shared buffers may be marked as noaccess for buffers that are not
+# pinned. Ignore that - we're not really accessing the buffers - in both
+# places querying the NUMA status.
+{
+   pg_buffercache_numa_pages
+   Memcheck:Addr8
+   fun:pg_buffercache_numa_pages
+   fun:ExecMakeTableFunctionResult
+}
+
+{
+   pg_get_shmem_allocations_numa
+   Memcheck:Addr8
+   fun:pg_get_shmem_allocations_numa
+   fun:ExecMakeTableFunctionResult
+}
diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h
index 40f1d324dcf..3b9a5b42898 100644
--- a/src/include/port/pg_numa.h
+++ b/src/include/port/pg_numa.h
@@ -24,9 +24,23 @@ extern PGDLLIMPORT int pg_numa_get_max_node(void);
  * This is required on Linux, before pg_numa_query_pages() as we
  * need to page-fault before move_pages(2) syscall returns valid results.
  */
+#ifdef USE_VALGRIND
+
+/* Accepts (and ignores) "tmp" to keep the macro-compatible signature. */
+static inline void
+pg_numa_touch_mem_if_required(uint64 tmp, char *ptr)
+{
+	volatile uint64 ro_volatile_var pg_attribute_unused();
+	ro_volatile_var = *(volatile uint64 *) ptr;
+}
+
+#else
+
 #define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
 	ro_volatile_var = *(volatile uint64 *) ptr
 
+#endif
+
 #else
 
 #define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..6b9a8998f82 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,14 @@
    Memcheck:Cond
    fun:PyObject_Realloc
 }
+
+# Querying the NUMA node for shared memory requires touching the memory
+# first, so that it gets allocated in the process. But the memory backing
+# shared buffers may be marked as noaccess for buffers that are not
+# pinned. Ignore that - we're not really accessing the buffers - in all
+# places querying the NUMA status.
+{
+   pg_numa_touch_mem_if_required
+   Memcheck:Addr8
+   fun:pg_numa_touch_mem_if_required
+}
