[ starting a new thread for this ]

Andres Freund <and...@anarazel.de> writes:
> I wonder if it'd be worth starting to explicitly annotate all the places
> that do allocations and are fine with leaking them.  E.g. by introducing
> malloc_permanently() or such.  Right now it's hard to use valgrind et al
> to detect leaks because of all the false positives due to such "ok to
> leak" allocations.
Out of curiosity I poked at this for a little while.  It doesn't appear
to me that we leak much at all, at least not if you are willing to take
"still reachable" blocks as not-leaked.  Most of the problem is more
subtle than that.

I found just a couple of things that really seem like leaks of permanent
data structures to valgrind:

* Where ps_status.c copies the original "environ" array (on
PS_USE_CLOBBER_ARGV platforms), valgrind thinks that's all leaked,
implying that it doesn't count the "environ" global as a valid reference
to leakable data.  I was able to shut that up by also saving the pointer
into an otherwise-unused static variable.  (This is sort of a poor man's
implementation of your "malloc_permanently" idea; but I doubt it's worth
working harder, given the small number of use-cases.)

* The postmaster's sock_paths and lock_files lists appear to be leaked,
but we're doing that to ourselves by throwing away the pointers to them
without physically freeing the lists.  We can just not do that.

What I found out is that we have a lot of issues that seem to devolve to
valgrind not being sure that a block is referenced.  I identified two
main causes of that:

(1) We have a pointer, but it doesn't actually point right at the start
of the block.  A primary culprit here is lists of thingies that use the
slist and dlist infrastructure.  As an experiment, I moved the
dlist_node fields of some popular structs to the beginning, and verified
that that silences associated complaints.  I'm not sure that we want to
insist on put-the-link-first as policy (although if we did, it could
provide some notational savings perhaps).  However, unless someone knows
of a way to teach valgrind about this situation, there may be no other
way to silence those leakage complaints.  A secondary culprit is people
randomly applying CACHELINEALIGN or the like to a palloc'd address, so
that the address we have isn't pointing right at the block start.
(2) The only pointer to the start of a block is actually somewhere
within the block.  This is common in dynahash tables, where we allocate
a slab of entries in a single palloc and then thread them together.
Each entry should have exactly one referencing pointer, but that pointer
is more likely to be elsewhere within the same palloc block than in the
external hash bucket array.  AFAICT, all cases except where the slab's
first entry is pointed to by a hash bucket pointer confuse valgrind to
some extent.  I was able to hack around this by preventing dynahash from
allocating more than one hash entry per palloc, but I wonder if there's
a better way.

Attached is a very crude hack, not meant for commit, that hacks things
up enough to greatly reduce the number of complaints with
"--leak-check=full".  One thing I've failed to silence so far is a
bunch of entries like

==00:00:03:56.088 3467702== 1,861 bytes in 67 blocks are definitely lost in loss record 1,290 of 1,418
==00:00:03:56.088 3467702==    at 0x950650: MemoryContextAlloc (mcxt.c:827)
==00:00:03:56.088 3467702==    by 0x951710: MemoryContextStrdup (mcxt.c:1179)
==00:00:03:56.088 3467702==    by 0x91C86E: RelationInitIndexAccessInfo (relcache.c:1444)
==00:00:03:56.088 3467702==    by 0x91DA9C: RelationBuildDesc (relcache.c:1200)

which is complaining about the memory context identifiers for system
indexes' rd_indexcxt contexts.  Those are surely not being leaked in any
real sense.  I suspect that this has something to do with valgrind not
counting the context->ident fields as live pointers, but I don't have
enough valgrind-fu to fix that.

Anyway, the bottom line is that I do not think that we have all that
many uses of the pattern you postulated originally.  It's more that
we've designed some valgrind-unfriendly data structures.  We need to
improve that situation to make much progress here.

			regards, tom lane
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4c7b1e7bfd..cd984929a6 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -888,7 +888,6 @@ RemoveSocketFiles(void)
 		(void) unlink(sock_path);
 	}
 
 	/* Since we're about to exit, no need to reclaim storage */
-	sock_paths = NIL;
 }
 
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 55c9445898..2abdd44190 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -814,7 +814,7 @@ InitCatCache(int id,
 	 * Note: we rely on zeroing to initialize all the dlist headers correctly
 	 */
 	sz = sizeof(CatCache) + PG_CACHE_LINE_SIZE;
-	cp = (CatCache *) CACHELINEALIGN(palloc0(sz));
+	cp = (CatCache *) palloc0(sz);
 	cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
 
 	/*
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 6546e3c7c7..df39bc77df 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1713,6 +1713,10 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
 	if (hashp->isfixed)
 		return false;
 
+	/* Force separate allocations to de-confuse valgrind */
+	if (!hashp->isshared)
+		nelem = 1;
+
 	/* Each element has a HASHELEMENT header plus user data. */
 	elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(hctl->entrysize);
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 8b73850d0d..be37e8b312 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -944,7 +944,6 @@ UnlinkLockFiles(int status, Datum arg)
 		/* Should we complain if the unlink fails? */
 	}
 	/* Since we're about to exit, no need to reclaim storage */
-	lock_files = NIL;
 
 	/*
 	 * Lock file removal should always be the last externally visible action
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 5819faaf2d..2d0ef37f7f 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -113,6 +113,11 @@
 static size_t ps_buffer_fixed_size; /* size of the constant prefix */
 static int	save_argc;
 static char **save_argv;
 
+/* We use this to convince Valgrind that replacement environ is referenced */
+#ifdef PS_USE_CLOBBER_ARGV
+static char ** volatile fake_environ;
+#endif
+
 /*
  * Call this early in startup to save the original argc/argv values.
@@ -192,6 +197,8 @@ save_ps_display_args(int argc, char **argv)
 		}
 		new_environ[i] = NULL;
 		environ = new_environ;
+		/* Valgrind tends to think this memory is leaked, so fool it */
+		fake_environ = new_environ;
 	}
 #endif							/* PS_USE_CLOBBER_ARGV */
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index fc7706314b..f32a13f786 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -32,6 +32,7 @@
 */
 typedef struct RegisteredBgWorker
 {
+	slist_node	rw_lnode;		/* list link (first to placate valgrind) */
 	BackgroundWorker rw_worker; /* its registry entry */
 	struct bkend *rw_backend;	/* its BackendList entry, or NULL */
 	pid_t		rw_pid;			/* 0 if not running */
@@ -39,7 +40,6 @@ typedef struct RegisteredBgWorker
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-	slist_node	rw_lnode;		/* list link */
 } RegisteredBgWorker;
 
 extern slist_head BackgroundWorkerList;
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index ddc2762eb3..4f2e631562 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -85,6 +85,13 @@ typedef struct catcache
 
 typedef struct catctup
 {
+	/*
+	 * Each tuple in a cache is a member of a dlist that stores the elements
+	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
+	 * lookups.
+	 */
+	dlist_node	cache_elem;		/* list member of per-bucket list */
+
 	int			ct_magic;		/* for identifying CatCTup entries */
 #define CT_MAGIC	0x57261502
@@ -96,13 +103,6 @@ typedef struct catctup
 	 */
 	Datum		keys[CATCACHE_MAXKEYS];
 
-	/*
-	 * Each tuple in a cache is a member of a dlist that stores the elements
-	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
-	 * lookups.
-	 */
-	dlist_node	cache_elem;		/* list member of per-bucket list */
-
 	/*
 	 * A tuple marked "dead" must not be returned by subsequent searches.
 	 * However, it won't be physically deleted from the cache until its
@@ -156,13 +156,13 @@ typedef struct catctup
 	 */
 typedef struct catclist
 {
+	dlist_node	cache_elem;		/* list member of per-catcache list */
+
 	int			cl_magic;		/* for identifying CatCList entries */
#define CL_MAGIC	0x52765103
 
 	uint32		hash_value;		/* hash value for lookup keys */
 
-	dlist_node	cache_elem;		/* list member of per-catcache list */
-
 	/*
 	 * Lookup keys for the entry, with the first nkeys elements being valid.
 	 * All by-reference are separately allocated.