On Mon, Dec 16, 2019 at 03:35:12PM -0800, Andres Freund wrote:
Hi,

I was responding to a question about postgres' per-backend memory usage,
making me look at the various contexts below CacheMemoryContext.  There
is pretty much always a significant number of contexts below, one for
each index:

 CacheMemoryContext: 524288 total in 7 blocks; 8680 free (0 chunks); 515608 used
   index info: 2048 total in 2 blocks; 568 free (1 chunks); 1480 used: pg_class_tblspc_relfilenode_index
   index info: 2048 total in 2 blocks; 960 free (0 chunks); 1088 used: pg_statistic_ext_relid_index
   index info: 2048 total in 2 blocks; 976 free (0 chunks); 1072 used: blarg_pkey
   index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: pg_index_indrelid_index
   index info: 2048 total in 2 blocks; 600 free (1 chunks); 1448 used: pg_attrdef_adrelid_adnum_index
   index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_db_role_setting_databaseid_rol_index
   index info: 2048 total in 2 blocks; 544 free (2 chunks); 1504 used: pg_opclass_am_name_nsp_index
   index info: 2048 total in 2 blocks; 928 free (2 chunks); 1120 used: pg_foreign_data_wrapper_name_index
   index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_enum_oid_index
   index info: 2048 total in 2 blocks; 600 free (1 chunks); 1448 used: pg_class_relname_nsp_index
   index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_foreign_server_oid_index
   index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_publication_pubname_index
...
   index info: 3072 total in 2 blocks; 1144 free (2 chunks); 1928 used: pg_conversion_default_index
...

While I also think we could pretty easily reduce the amount of memory
used for each index, I want to focus on something else here:

We waste a lot of space due to all these small contexts. Even leaving
aside the overhead of the context and its blocks - not insignificant -
they are mostly between ~1/4 and ~1/2 empty.

At the same time we probably don't want to inline all of them into
CacheMemoryContext - too likely to introduce bugs, and too hard to
keep leak-free.


But what if we had a new type of memory context that did not itself
manage the memory underlying its allocations, but instead delegated
that to its parent?  If such a context tracked all its live allocations
in some form of list, it could then free them from the parent at reset
time. In other words, it'd proxy all memory management to the parent,
adding only a separate name and tracking of all live chunks.
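
Roughly something like this (a standalone toy sketch, with plain
malloc/free standing in for the parent context; all names made up):

    #include <stdlib.h>

    /* Every allocation gets a small header linking it into the proxy's
     * list of live chunks; the payload follows the header. */
    typedef struct ProxyChunk
    {
        struct ProxyChunk *prev;
        struct ProxyChunk *next;
    } ProxyChunk;

    typedef struct ProxyContext
    {
        const char *name;           /* separate name, e.g. for stats output */
        ProxyChunk  head;           /* circular list of live chunks */
    } ProxyContext;

    static void
    proxy_init(ProxyContext *cxt, const char *name)
    {
        cxt->name = name;
        cxt->head.prev = cxt->head.next = &cxt->head;
    }

    /* Allocate from the parent (malloc here) and track the chunk. */
    static void *
    proxy_alloc(ProxyContext *cxt, size_t size)
    {
        ProxyChunk *chunk = malloc(sizeof(ProxyChunk) + size);

        if (chunk == NULL)
            return NULL;
        chunk->next = cxt->head.next;
        chunk->prev = &cxt->head;
        cxt->head.next->prev = chunk;
        cxt->head.next = chunk;
        return chunk + 1;           /* hand out the payload only */
    }

    /* Unlink the chunk and return it to the parent. */
    static void
    proxy_free(void *ptr)
    {
        ProxyChunk *chunk = (ProxyChunk *) ptr - 1;

        chunk->prev->next = chunk->next;
        chunk->next->prev = chunk->prev;
        free(chunk);
    }

    /* Reset: free every live chunk from the parent, one by one. */
    static void
    proxy_reset(ProxyContext *cxt)
    {
        while (cxt->head.next != &cxt->head)
            proxy_free(cxt->head.next + 1);
    }

The reset obviously degenerates to one free per live chunk, which is
where a plain aset.c context wins - but the index cache entries above
only contain a handful of chunks each.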

Obviously such a context would be less efficient to reset than a plain
aset.c one - but I don't think that'd matter much for this type of
use case.  The big advantage here would be that we wouldn't have two
separate "blocks" for each index cache entry; instead the allocations
could all be done within CacheMemoryContext.
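
For the relcache, the hypothetical constructor might slot in where
RelationInitIndexAccessInfo() currently creates the per-index context,
i.e. roughly

    /* today (relcache.c) */
    indexcxt = AllocSetContextCreate(CacheMemoryContext,
                                     "index info",
                                     ALLOCSET_SMALL_SIZES);

would become something like

    /* with the proxy type (ProxyContextCreate is made up) */
    indexcxt = ProxyContextCreate(CacheMemoryContext, "index info");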

Does that sound like a sensible idea?


I do think it's an interesting idea, worth exploring.

I agree it's probably OK if the proxy contexts are a bit less efficient,
but I think we can restrict their use to places where that's not an
issue (i.e. low frequency of resets, small number of allocated chunks,
etc.). And if needed we can probably find ways to improve the
efficiency, e.g. by replacing the linked list with a small hash table
(to speed up pfree etc.).
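
For example, a minimal open-addressing table keyed by the chunk pointer
might look like the toy below (fixed size, all names made up; in core we
could presumably just reuse simplehash.h). Unlike embedding list links
in every chunk, this keeps the tracking data out of the chunks
themselves:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_SIZE 256          /* power of two; toy-sized, no growth */

    static void *slots[TABLE_SIZE];
    static char deleted;            /* tombstone sentinel */

    static uint32_t
    hash_pointer(const void *p)
    {
        uint64_t v = (uint64_t) (uintptr_t) p;

        v *= UINT64_C(0x9e3779b97f4a7c15);  /* Fibonacci hashing */
        return (uint32_t) (v >> 32) & (TABLE_SIZE - 1);
    }

    /* Track a live chunk; O(1) on average. */
    static bool
    track_chunk(void *p)
    {
        uint32_t i = hash_pointer(p);

        for (int n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1))
        {
            if (slots[i] == NULL || slots[i] == &deleted)
            {
                slots[i] = p;
                return true;
            }
        }
        return false;               /* full; a real one would grow */
    }

    /* Forget a chunk at pfree() time; O(1) instead of a list walk. */
    static bool
    untrack_chunk(void *p)
    {
        uint32_t i = hash_pointer(p);

        for (int n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1))
        {
            if (slots[i] == NULL)
                return false;       /* never tracked */
            if (slots[i] == p)
            {
                slots[i] = &deleted;    /* tombstone keeps probes alive */
                return true;
            }
        }
        return false;
    }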

I think the big question is what this would mean for the parent context.
Suddenly it contains a mix of chunks with different life spans, which
would previously have been segregated into different malloc-ed blocks.
Now that would no longer be true, so e.g. after deleting the child
context the memory would not be freed but just moved to the parent's
freelist.
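
To illustrate what I mean (a toy version of aset.c's recycling, names
made up): pfree() just pushes the chunk onto a freelist inside the
context, so the underlying malloc-ed memory sticks around until the
whole context is reset or deleted:

    #include <stdlib.h>

    /* aset.c keeps one freelist per power-of-two size class (8 bytes
     * up to 8 kB); this toy shows just the recycling behavior. */
    #define NUM_SIZE_CLASSES 11

    typedef struct FreeChunk
    {
        struct FreeChunk *next;
    } FreeChunk;

    static FreeChunk *freelist[NUM_SIZE_CLASSES];

    /* "Freeing" only moves the chunk onto its size class's freelist. */
    static void
    toy_pfree(void *ptr, int size_class)
    {
        FreeChunk *chunk = (FreeChunk *) ptr;

        chunk->next = freelist[size_class];
        freelist[size_class] = chunk;
    }

    /* Allocation reuses a recycled chunk before asking for more. */
    static void *
    toy_alloc(int size_class)
    {
        FreeChunk *chunk = freelist[size_class];

        if (chunk != NULL)
        {
            freelist[size_class] = chunk->next;
            return chunk;
        }
        /* the real aset.c carves this from a larger malloc-ed block */
        return malloc((size_t) 8 << size_class);
    }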

It would also confuse MemoryContextStats, which would suddenly not
realize that some of the chunks are actually "owned" by the child
context. Maybe this could be improved, but only partially (unless we
wanted a per-chunk flag saying whether the chunk is owned by the
context itself or by a proxy).
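
The per-chunk info might look something like this (pseudo-header,
assumptions mine - the real chunk headers are packed much more
tightly):

    #include <stddef.h>

    /* Hypothetical chunk header carrying ownership information.  A
     * single flag bit could only say "this chunk belongs to some
     * proxy"; storing the owner pointer would let MemoryContextStats()
     * attribute the chunk to the right proxy, at the cost of another
     * pointer per chunk. */
    typedef struct OwnedChunkHeader
    {
        size_t  size;               /* requested size, as today */
        void   *owner;              /* owning proxy context, or NULL if
                                     * the parent owns the chunk itself */
    } OwnedChunkHeader;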

I'm also not sure how this would interact with memory accounting (e.g.
what if someone's custom aggregate created a separate proxy context per
group - would that work or not?).

Also, would this need to support nested proxy contexts? That might
complicate things quite a bit, I'm afraid.

FWIW I don't know the answers to these questions.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
