A nearby thread [1] reminded me to wonder why we seem to have so many false-positive leaks reported by Valgrind these days. For example, at exit of a backend that's executed a couple of trivial queries, I see
==00:00:00:25.515 260013== LEAK SUMMARY:
==00:00:00:25.515 260013==    definitely lost: 3,038 bytes in 90 blocks
==00:00:00:25.515 260013==    indirectly lost: 4,431 bytes in 61 blocks
==00:00:00:25.515 260013==      possibly lost: 390,242 bytes in 852 blocks
==00:00:00:25.515 260013==    still reachable: 579,139 bytes in 1,457 blocks
==00:00:00:25.515 260013==         suppressed: 0 bytes in 0 blocks

so about a thousand "leaked" blocks, all but a couple of which are false positives --- including nearly all the "definitely" leaked ones.

Some testing and reading of the Valgrind manual [2] turned up a number of answers, which mostly boil down to us using very Valgrind-unfriendly data structures.  Per [2]:

    There are two ways a block can be reached.  The first is with a
    "start-pointer", i.e. a pointer to the start of the block.  The second
    is with an "interior-pointer", i.e. a pointer to the middle of the
    block.  [ A block is reported as "possibly lost" if ] a chain of one
    or more pointers to the block has been found, but at least one of the
    pointers is an interior-pointer.

We have a number of places that allocate space in such a way that blocks can only be reached by interior pointers, leading to those blocks being reported as possibly lost:

* MemoryContextAllocAligned does this more or less by definition.

* Dynahash tables often end up looking like this, since the first element in each block created by element_alloc will be the tail end of its freelist, and thus will be reachable only via interior pointers later in the block, except when it's currently allocated.

* Blocks that are reached via slists or dlists are like this unless the slist_node or dlist_node is at the front of the struct, which is not our typical practice.

(There may be more cases, but those are what I identified in the leak report quoted above.)

Another odd thing, which I have not found the explanation for, is that strings made by MemoryContextCopyAndSetIdentifier() show up as "definitely lost".
That "definitely lost" classification is nonsense, because those strings are surely referenced by the context's "ident" field; but apparently Valgrind isn't counting that reference for some reason.

I'd be okay with using a suppression pattern to hide the MemoryContextCopyAndSetIdentifier cases, but that doesn't seem like a very palatable answer for the other cases: too much risk of hiding actual leaks.

I don't see a way to avoid the problem for MemoryContextAllocAligned with the current context-type-agnostic implementation of that function.  We could probably fix it if we pushed the alignment logic down a layer, so that the alignment padding space could be treated as part of the chunk header.  We might be able to waste less space that way, too.

Dynahash tables could be fixed by expending a little extra storage to chain all the element-pool blocks together explicitly, which seems acceptable to do in USE_VALGRIND builds.  (Maybe while we are at it, we could also mark currently-unused element slots as NOACCESS.)

I don't have an answer for the slist/dlist usages other than rearranging all the related structs to put the list node first.  Anybody see a better way?

Or we could do nothing, but I think there is value in having less clutter in Valgrind reports.

Thoughts?

			regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAA9OW9e4RbpgQd8NSzpW6BgJQNpKGEFoohWhkbEL8%3DZ5obvD0Q%40mail.gmail.com
[2] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks