On Fri, May 9, 2025 at 7:04 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> A nearby thread [1] reminded me to wonder why we seem to have
> so many false-positive leaks reported by Valgrind these days.
> For example, at exit of a backend that's executed a couple of
> trivial queries, I see
>
> ==00:00:00:25.515 260013== LEAK SUMMARY:
> ==00:00:00:25.515 260013==    definitely lost: 3,038 bytes in 90 blocks
> ==00:00:00:25.515 260013==    indirectly lost: 4,431 bytes in 61 blocks
> ==00:00:00:25.515 260013==      possibly lost: 390,242 bytes in 852 blocks
> ==00:00:00:25.515 260013==    still reachable: 579,139 bytes in 1,457 blocks
> ==00:00:00:25.515 260013==         suppressed: 0 bytes in 0 blocks
>
> so about a thousand "leaked" blocks, all but a couple of which
> are false positives --- including nearly all the "definitely"
> leaked ones.
>
> Some testing and reading of the Valgrind manual [2] turned up a
> number of answers, which mostly boil down to us using very
> Valgrind-unfriendly data structures.  Per [2],
>
>     There are two ways a block can be reached.  The first is with a
>     "start-pointer", i.e. a pointer to the start of the block.  The
>     second is with an "interior-pointer", i.e. a pointer to the middle
>     of the block.
>
>     [ A block is reported as "possibly lost" if ] a chain of one or
>     more pointers to the block has been found, but at least one of the
>     pointers is an interior-pointer.
>
> We have a number of places that allocate space in such a way that
> blocks can only be reached by "interior pointers", leading to
> those blocks being reported as possibly lost:
>
> * MemoryContextAllocAligned does this more or less by definition.
>
> * Dynahash tables often end up looking like this, since the first
> element in each block created by element_alloc will be the tail end of
> its freelist, and thus will be reachable only via interior pointers
> later in the block, except when it's currently allocated.
>
> * Blocks that are reached via slists or dlists are like this
> unless the slist_node or dlist_node is at the front, which is not
> our typical practice.
>
> (There may be more cases, but those are what I identified in the
> leak report quoted above.)
>
> Another odd thing, which I have not found the explanation for, is
> that strings made by MemoryContextCopyAndSetIdentifier() show up
> as "definitely lost".  This is nonsense, because they are surely
> referenced by the context's "ident" field; but apparently Valgrind
> isn't counting that for some reason.
>
> I'd be okay with using a suppression pattern to hide the
> MemoryContextCopyAndSetIdentifier cases, but that doesn't seem
> like a very palatable answer for these other cases: too much
> risk of hiding actual leaks.
>
> I don't see a way to avoid the problem for MemoryContextAllocAligned
> with the current context-type-agnostic implementation of that.  We
> could probably fix it though if we pushed it down a layer, so that the
> alignment padding space could be treated as part of the chunk header.
> Might be able to waste less space that way, too.
>
> Dynahash tables could be fixed by expending a little extra storage
> to chain all the element-pool blocks together explicitly, which
> seems probably acceptable to do in USE_VALGRIND builds.  (Maybe
> while we are at that, we could fix things so that currently-unused
> element slots are marked NOACCESS.)
>
> I don't have an answer for slist/dlist usages other than rearranging
> all the related structs.  Anybody see a better way?
>
> Or we could do nothing, but I think there is value in having
> less clutter in Valgrind reports.  Thoughts?
>
>                         regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CAA9OW9e4RbpgQd8NSzpW6BgJQNpKGEFoohWhkbEL8%3DZ5obvD0Q%40mail.gmail.com
> [2] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks

Thanks for the detailed analysis; this was very informative. I agree that reducing the noise in Valgrind reports would be valuable, especially for catching real leaks: a clearer signal from Valgrind would definitely help in the long term.
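
To convince myself of the interior-pointer behavior you describe for the
slist/dlist case, I put together a minimal standalone sketch (not
PostgreSQL code --- the struct and variable names here are made up for
illustration). With the list link placed after the payload, the only
live pointer into the block is an interior one:

/* interior.c --- illustration only; all names are hypothetical */
#include <stdlib.h>

typedef struct ListNode
{
    struct ListNode *next;      /* plays the role of slist_node/dlist_node */
} ListNode;

typedef struct Item
{
    int         payload;        /* payload ahead of the link, per our usual
                                 * struct layout ... */
    ListNode    node;           /* ... so &item->node points mid-block */
} Item;

static ListNode head;           /* root pointer that Valgrind can see */

int
main(void)
{
    Item       *item = malloc(sizeof(Item));

    item->payload = 42;
    item->node.next = NULL;

    /*
     * The only live pointer to the malloc'd block is &item->node, an
     * interior pointer, so memcheck classifies the block as "possibly
     * lost".  Making ListNode the first member of Item would turn
     * head.next into a start-pointer, and the block would then be
     * counted as merely "still reachable".
     */
    head.next = &item->node;
    return 0;
}

Built with -g and run under "valgrind --leak-check=full", this reports
one block possibly lost; swapping the member order so the link sits at
offset 0 makes the complaint disappear, which matches your conclusion
that rearranging the related structs is the only obvious fix.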
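As for the MemoryContextCopyAndSetIdentifier() oddity: if a suppression
does end up being the pragmatic answer there, I'd expect something along
these lines to match (a sketch only --- the frame list is an assumption
on my part and would need to be verified against a real stack, e.g. with
--gen-suppressions=all):

# Hypothetical suppression for the context-identifier strings; the
# fun: frames below are guesses and must be checked against an actual
# report before use.
{
   context_ident_false_positive
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   ...
   fun:MemoryContextCopyAndSetIdentifier
}

That said, I share your concern that suppressions are too blunt an
instrument for the other cases.

Regards,
Yasir
Data Bene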