Hi,

David, there's a question about a commit of yours below, hence adding
you.

On 2021-03-16 19:36:10 -0400, Tom Lane wrote:
> Out of curiosity I poked at this for a little while.

Cool.


> It doesn't appear to me that we leak much at all, at least not if you
> are willing to take "still reachable" blocks as not-leaked.

Well, I think for any sort of automated testing - which would be
useful - we'd really need *no* leaks. I already get a few bleats
whenever I forget to set --leak-check=no. It's also not just postgres
itself, but some of the helper tools...

And it's not just valgrind, but also the gcc/clang sanitizers...


> What I found out is that we have a lot of issues that seem to devolve
> to valgrind not being sure that a block is referenced.  I identified
> two main causes of that:
>
> (1) We have a pointer, but it doesn't actually point right at the start
> of the block.  A primary culprit here is lists of thingies that use the
> slist and dlist infrastructure.  As an experiment, I moved the dlist_node
> fields of some popular structs to the beginning, and verified that that
> silences associated complaints.  I'm not sure that we want to insist on
> put-the-link-first as policy (although if we did, it could provide some
> notational savings perhaps).  However, unless someone knows of a way to
> teach valgrind about this situation, there may be no other way to silence
> those leakage complaints.  A secondary culprit is people randomly applying
> CACHELINEALIGN or the like to a palloc'd address, so that the address we
> have isn't pointing right at the block start.

Hm, do you still have a backtrace / suppression for one of those? I
didn't see any in a quick (*) serial installcheck I just ran - or at
least I wasn't able to pin them on this issue.
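
For reference, my mental model of the interior-pointer case is the
standalone sketch below (plain C, not PostgreSQL code): the only
surviving reference points past the start of the block - think of a
dlist_node that isn't the first field, or a CACHELINEALIGN'd address -
so memcheck can't prove the block is referenced and reports it as
"possibly lost" under --leak-check=full.

#include <stdlib.h>

struct thingy
{
    int         payload;
    struct thingy *next;    /* stand-in for a link field that isn't first */
};

/* file-scope, so the pointer is still live when the leak check runs */
static struct thingy **interior_ref;

int
main(void)
{
    struct thingy *t = malloc(sizeof(struct thingy));

    t->payload = 42;
    t->next = NULL;
    interior_ref = &t->next;    /* only an interior pointer survives */

    return 0;
}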


I think the run might have shown a genuine leak:

==2048803== 16 bytes in 1 blocks are definitely lost in loss record 139 of 906
==2048803==    at 0x89D2EA: palloc (mcxt.c:975)
==2048803==    by 0x2392D3: heap_beginscan (heapam.c:1198)
==2048803==    by 0x264E8F: table_beginscan_strat (tableam.h:918)
==2048803==    by 0x265994: systable_beginscan (genam.c:453)
==2048803==    by 0x83C2D1: SearchCatCacheMiss (catcache.c:1359)
==2048803==    by 0x83C197: SearchCatCacheInternal (catcache.c:1299)
==2048803==    by 0x83BE9A: SearchCatCache1 (catcache.c:1167)
==2048803==    by 0x85876A: SearchSysCache1 (syscache.c:1134)
==2048803==    by 0x84CDB3: RelationInitTableAccessMethod (relcache.c:1795)
==2048803==    by 0x84F807: RelationBuildLocalRelation (relcache.c:3554)
==2048803==    by 0x303C9D: heap_create (heap.c:395)
==2048803==    by 0x305790: heap_create_with_catalog (heap.c:1291)
==2048803==    by 0x41A327: DefineRelation (tablecmds.c:885)
==2048803==    by 0x6C96B6: ProcessUtilitySlow (utility.c:1131)
==2048803==    by 0x6C948A: standard_ProcessUtility (utility.c:1034)
==2048803==    by 0x6C865F: ProcessUtility (utility.c:525)
==2048803==    by 0x6C7409: PortalRunUtility (pquery.c:1159)
==2048803==    by 0x6C7636: PortalRunMulti (pquery.c:1305)
==2048803==    by 0x6C6B11: PortalRun (pquery.c:779)
==2048803==    by 0x6C05AB: exec_simple_query (postgres.c:1173)
==2048803==
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:palloc
   fun:heap_beginscan
   fun:table_beginscan_strat
   fun:systable_beginscan
   fun:SearchCatCacheMiss
   fun:SearchCatCacheInternal
   fun:SearchCatCache1
   fun:SearchSysCache1
   fun:RelationInitTableAccessMethod
   fun:RelationBuildLocalRelation
   fun:heap_create
   fun:heap_create_with_catalog
   fun:DefineRelation
   fun:ProcessUtilitySlow
   fun:standard_ProcessUtility
   fun:ProcessUtility
   fun:PortalRunUtility
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
}

Since 56788d2156fc heap_beginscan() allocates
        scan->rs_base.rs_private =
                palloc(sizeof(ParallelBlockTableScanWorkerData));
but never frees it in heap_endscan().

In most places heap scans are begun inside transient memory contexts,
but not always. In the above trace, for example,
RelationBuildLocalRelation switched to CacheMemoryContext, and nothing
switched to a shorter-lived context before heap_beginscan() allocated.

I'm a bit confused about the precise design of rs_private /
ParallelBlockTableScanWorkerData, specifically why it was added to
TableScanDesc instead of HeapScanDesc, and why it is allocated
unconditionally rather than only for parallel scans?
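
Just to illustrate what I mean, a sketch (not a patch I'm proposing,
and assuming rs_parallel is the right thing to test here): allocate the
worker state only for parallel scans, and release it again in
heap_endscan(), so a scan begun in a long-lived context doesn't leave
the allocation behind.

    /* in heap_beginscan(), sketch only */
    if (scan->rs_base.rs_parallel != NULL)
        scan->rs_base.rs_private =
            palloc(sizeof(ParallelBlockTableScanWorkerData));
    else
        scan->rs_base.rs_private = NULL;

    /* in heap_endscan(), sketch only */
    if (scan->rs_base.rs_private != NULL)
        pfree(scan->rs_base.rs_private);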


I don't think this is a false positive, even though it theoretically
could be freed by resetting CacheMemoryContext (see below)?


I saw a lot of false positives from autovacuum workers, because
AutovacMemCxt is never deleted, and because table_toast_map is created
in TopMemoryContext.  Adding an explicit
MemoryContextDelete(AutovacMemCxt) and parenting table_toast_map in
AutovacMemCxt shut those up.
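
Roughly what I did, in case it's useful - a sketch of the workaround
only, with the hash_create() arguments other than the context flag
copied from memory and therefore possibly off:

    /* in do_autovacuum(): parent the hash in AutovacMemCxt instead of
     * TopMemoryContext */
    ctl.hcxt = AutovacMemCxt;
    table_toast_map = hash_create("TOAST to main relid map",
                                  100,
                                  &ctl,
                                  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

    /* ... and shortly before the worker exits */
    MemoryContextDelete(AutovacMemCxt);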


Which brings me to the question of why allocations in CacheMemoryContext
or AutovacMemCxt are considered "lost", even though they're still
"reachable" via a context reset.  I think valgrind ends up treating
memory allocated via memory contexts that still exist at process end as
lost, regardless of it being reachable via the memory pool (from
valgrind's view). Which I guess actually makes sense for things like
TopMemoryContext and CacheMemoryContext - anything not reachable by
means other than a context reset is effectively lost for those.
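
If I understand memcheck's mempool handling correctly, the standalone
example below (again not PostgreSQL code) shows the same effect: the
superblock stays reachable via a global pointer, but the chunk handed
out via VALGRIND_MEMPOOL_ALLOC is leak-checked as a block of its own,
and is reported as lost once no pointer to it survives - analogous to a
chunk in a still-existing memory context whose only reference is gone.

#include <stdlib.h>
#include <valgrind/valgrind.h>

static char *superblock;        /* stays reachable, like a context's block */

int
main(void)
{
    char       *chunk;

    superblock = malloc(8192);
    VALGRIND_CREATE_MEMPOOL(superblock, 0, 0);

    chunk = superblock + 64;
    VALGRIND_MEMPOOL_ALLOC(superblock, chunk, 128);

    chunk = NULL;               /* drop the only reference to the chunk */

    return 0;
}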

For autovac launcher / worker it seems like a sensible thing to just
delete AutovacMemCxt.



> (2) The only pointer to the start of a block is actually somewhere within
> the block.  This is common in dynahash tables, where we allocate a slab
> of entries in a single palloc and then thread them together.  Each entry
> should have exactly one referencing pointer, but that pointer is more
> likely to be elsewhere within the same palloc block than in the external
> hash bucket array.  AFAICT, all cases except where the slab's first entry
> is pointed to by a hash bucket pointer confuse valgrind to some extent.
> I was able to hack around this by preventing dynahash from allocating
> more than one hash entry per palloc, but I wonder if there's a better way.

Hm. For me the number of leaks seems to stay the same with or without
your changes related to this. Is this a USE_VALGRIND build?

I'm using valgrind-3.16.1.


> Attached is a very crude hack, not meant for commit, that hacks things
> up enough to greatly reduce the number of complaints with
> "--leak-check=full".
>
> One thing I've failed to silence so far is a bunch of entries like
>
> ==00:00:03:56.088 3467702== 1,861 bytes in 67 blocks are definitely lost in loss record 1,290 of 1,418
> ==00:00:03:56.088 3467702==    at 0x950650: MemoryContextAlloc (mcxt.c:827)
> ==00:00:03:56.088 3467702==    by 0x951710: MemoryContextStrdup (mcxt.c:1179)
> ==00:00:03:56.088 3467702==    by 0x91C86E: RelationInitIndexAccessInfo (relcache.c:1444)
> ==00:00:03:56.088 3467702==    by 0x91DA9C: RelationBuildDesc (relcache.c:1200)
>
> which is complaining about the memory context identifiers for system
> indexes' rd_indexcxt contexts.  Those are surely not being leaked in
> any real sense.  I suspect that this has something to do with valgrind
> not counting the context->ident fields as live pointers, but I don't
> have enough valgrind-fu to fix that.

Yea. I suspect it's related to the fact that we mark the memory as a
valgrind mempool; I'll try to investigate.


I do see a bunch of leak bleats below fun:plpgsql_compile that I don't
yet understand. E.g.

==2054558== 32 bytes in 1 blocks are definitely lost in loss record 284 of 913
==2054558==    at 0x89D389: palloc (mcxt.c:975)
==2054558==    by 0x518732: new_list (list.c:134)
==2054558==    by 0x518C0C: lappend (list.c:341)
==2054558==    by 0x83CAE7: SearchCatCacheList (catcache.c:1691)
==2054558==    by 0x859A9C: SearchSysCacheList (syscache.c:1447)
==2054558==    by 0x313192: FuncnameGetCandidates (namespace.c:975)
==2054558==    by 0x313D91: FunctionIsVisible (namespace.c:1450)
==2054558==    by 0x7C2891: format_procedure_extended (regproc.c:375)
==2054558==    by 0x7C27C3: format_procedure (regproc.c:324)
==2054558==    by 0xA7693E1: do_compile (pl_comp.c:348)
==2054558==    by 0xA769130: plpgsql_compile (pl_comp.c:224)

and

==2054558== 30 bytes in 4 blocks are definitely lost in loss record 225 of 913
==2054558==    at 0x89D389: palloc (mcxt.c:975)
==2054558==    by 0x3ADDAE: downcase_identifier (scansup.c:52)
==2054558==    by 0x3ADD85: downcase_truncate_identifier (scansup.c:39)
==2054558==    by 0x3AB5E4: core_yylex (scan.l:1032)
==2054558==    by 0xA789B2D: internal_yylex (pl_scanner.c:321)
==2054558==    by 0xA7896E3: plpgsql_yylex (pl_scanner.c:152)
==2054558==    by 0xA780015: plpgsql_yyparse (pl_gram.c:1945)
==2054558==    by 0xA76A652: do_compile (pl_comp.c:788)
==2054558==    by 0xA769130: plpgsql_compile (pl_comp.c:224)
==2054558==    by 0xA78948F: plpgsql_validator (pl_handler.c:539)

Based on the quick look I had (dinner is calling), I don't yet
understand how plpgsql_compile_tmp_cxt error handling works.

Greetings,

Andres Freund


(*) or not so quick - I had to figure out why valgrind was so slow. It
turned out that I had typo'd shared_buffers=32MB as
shared_buffers=32GB...

