On 12/11/25 13:29, Christoph Berg wrote:
> Re: Tomas Vondra
>>>> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
>>>> attached patch. It still calls numa_available(), so that we don't
>>>> silently miss future libnuma changes.
>>>>
>>>> Can you check this makes it work inside the docker container?
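(Side note for anyone reproducing this outside the tree: a rough,
simplified sketch of the kind of check being described -- hypothetical,
not the actual committed code; numa_usable() is just an illustrative
name -- would look like this.)

/*
 * Hypothetical, simplified sketch (not the committed patch): keep the
 * numa_available() check, but additionally treat EPERM from a
 * get_mempolicy() probe as "NUMA not usable", which is what a blocked
 * syscall (e.g. a container seccomp profile) can produce.
 * Build with -lnuma.
 */
#include <errno.h>
#include <stdbool.h>
#include <numa.h>       /* numa_available() */
#include <numaif.h>     /* get_mempolicy() */

static bool
numa_usable(void)
{
    /* libnuma's own availability check */
    if (numa_available() < 0)
        return false;

    /* direct probe; a blocked syscall typically fails with EPERM */
    errno = 0;
    if (get_mempolicy(NULL, NULL, 0, NULL, 0) < 0 && errno == EPERM)
        return false;

    return true;
}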
>>>
>>> Yes your patch works. (Sorry I meant to test earlier, but RL...)
>>
>> Thanks. I've pushed the fix (and backpatched to 18).
>
> It looks like we are not done here yet :(
>
> postgresql-18 is failing here intermittently with this diff:
>
> 12:20:24 --- /build/reproducible-path/postgresql-18-18.1/src/test/regress/expected/numa.out	2025-11-10 21:52:06.000000000 +0000
> 12:20:24 +++ /build/reproducible-path/postgresql-18-18.1/build/src/test/regress/results/numa.out	2025-12-11 11:20:22.618989603 +0000
> 12:20:24 @@ -6,8 +6,4 @@
> 12:20:24 -- switch to superuser
> 12:20:24 \c -
> 12:20:24 SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
> 12:20:24 - ok
> 12:20:24 -----
> 12:20:24 - t
> 12:20:24 -(1 row)
> 12:20:24 -
> 12:20:24 +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
>
> That's REL_18_STABLE @ 580b5c, with the Debian packaging on top.
>
> I've seen it on unstable/amd64, unstable/arm64, and Ubuntu
> questing/amd64, where libnuma should take care of this itself, without
> the extra patch in PG. There was another case on bullseye/amd64 which
> has the old libnuma.
>
> It's been frequent enough that it killed 4 out of the 10 builds
> currently visible on
> https://jengus.postgresql.org/job/postgresql-18-binaries-snapshot/.
> (Though to be fair, only one distribution/arch combination was failing
> for each of them.)
>
> There is also one instance of it in
> https://jengus.postgresql.org/job/postgresql-19-binaries-snapshot/
>
> I currently have no idea what's happening.
>
Hmmm, strange. -2 is -ENOENT, which (per move_pages(2)) should mean this:

    -ENOENT
           The page is not present.

But what does "not present" mean in this context? And why would that be
only intermittent? Presumably this is still running in Docker, so maybe
it's another weird consequence of that?
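
For anyone who wants to poke at the -ENOENT case outside of PostgreSQL,
here is a minimal standalone sketch of move_pages(2) in query mode,
which is where per-page status values like this come from (illustration
only, not PostgreSQL code; the file name and output are made up): with
nodes == NULL the kernel reports a node id for each page, and a page
that has never been faulted in comes back as -ENOENT instead of a node
id.

/*
 * numa_query.c -- illustration only, not PostgreSQL code.
 *
 * Query the placement of two anonymous pages with move_pages(2) in
 * "query" mode (nodes == NULL).  A page that was never faulted in is
 * typically reported as -ENOENT ("the page is not present"), while a
 * touched page reports a node id >= 0.
 *
 * Build on Linux with the libnuma headers: gcc numa_query.c -lnuma
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <numaif.h>     /* move_pages() */

int
main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    char *buf;
    void *pages[2];
    int status[2];

    /* two anonymous pages; neither is faulted in until written to */
    buf = mmap(NULL, 2 * pagesize, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    pages[0] = buf;             /* left untouched */
    pages[1] = buf + pagesize;  /* faulted in by the write below */
    buf[pagesize] = 1;

    /* nodes == NULL: just report the current node of each page */
    if (move_pages(0, 2, pages, NULL, status, 0) != 0)
    {
        perror("move_pages");
        return 1;
    }

    for (int i = 0; i < 2; i++)
    {
        if (status[i] < 0)
            printf("page %d: status %d (%s)\n", i, status[i],
                   strerror(-status[i]));
        else
            printf("page %d: on node %d\n", i, status[i]);
    }

    return 0;
}

On a NUMA-capable host I'd expect the untouched page to report status
-2 and the touched one a node id; whether that mechanism is really what
makes the build-log failures intermittent is a separate question.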
regards

--
Tomas Vondra