Hi Rahila, Thanks for the updated and rebased patch. I've tried the pgbench test again, to see if it gets stuck somewhere, and I'm observing this on a new / idle cluster:
$ pgbench -n -f test.sql -P 1 test -T 60 pgbench (18devel) progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed ... There's always a period of 10-15 seconds when everything seems to be working fine, and then a couple seconds when it gets stuck, with the usual LOG: Wait for 69454 process to publish stats timed out, trying again The PIDs I've seen were for checkpointer, autovacuum launcher, ... all of that are processes that should be handling the signal, so how come it gets stuck every now and then? The system is entirely idle, there's no contention for the shmem stuff, etc. Could it be forgetting about the signal in some cases, or something like that? The test.sql is super simple: SELECT * FROM pg_get_process_memory_contexts( (SELECT pid FROM pg_stat_activity WHERE pid != pg_backend_pid() ORDER BY random() LIMIT 1) , false); Aside from this, I went through the patch to do a regular review, so here's the main comments in somewhat random order: 1) The SGML docs talk about "contexts at level" but I don't think that's defined/explained anywhere, there are different ways to assign levels in a tree-like structure, so it's unclear if levels are assigned from the top or bottom. 2) volatile sig_atomic_t PublishMemoryContextPending = false; I'd move this right after LogMemoryContextPending (to match the other places that add new stuff). 3) typedef enum PrintDetails I suppose this should have some comments, explaining what the typedef is for. Also, "details" sounds pretty generic, perhaps "destination" or maybe "target" would be better? 4) The memcpy here seems unnecessary - the string is going to be static in the binary, no need to copy it. In which case the whole switch is going to be the same as in PutMemoryContextsStatsTupleStore, so maybe move that into a separate function? + switch (context->type) + { + case T_AllocSetContext: + type = "AllocSet"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + case T_GenerationContext: + type = "Generation"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + case T_SlabContext: + type = "Slab"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + case T_BumpContext: + type = "Bump"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + default: + type = "???"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + } 5) The comment about hash table in ProcessGetMemoryContextInterrupt seems pretty far from hash_create(), so maybe move it. 6) ProcessGetMemoryContextInterrupt seems pretty long / complex, with multiple nested loops, it'd be good to split it into smaller parts that are easier to understand. 7) I'm not sure if/why we need to move MemoryContextId to memutils.h. 8) The new stuff in memutils.h is added to the wrong place, into a section labeled "Memory-context-type-specific functions" (which it certainly is not) 9) autovacuum.c adds the ProcessGetMemoryContextInterrupt() call after ProcessCatchupInterrupt() - that's not wrong, but I'd move it right after ProcessLogMemoryContextInterrupt(), just like everywhere else. 10) The pg_get_process_memory_contexts comment says: Signal a backend or an auxiliary process to send its ... But this is not just about the signal, it also waits for the results and produces the result set. 11) pg_get_process_memory_contexts - Wouldn't it be better to move the InitMaterializedSRF() call until after the privilege check, etc.? 12) The pg_get_process_memory_contexts comment should explain why it's superuser-only function. Presumably it has similar DoS risks as the other functions, because if not why would we have the restriction? 13) I reworded and expanded the pg_get_process_memory_contexts comment a bit, and re-wrapped it too. But I think it also needs to explain how it communicates with the other process (sending signal, sending data through a DSA, ...). And also how the timeouts work. 14) I'm a bit confused about the DSA allocations (but I also haven't worked with DSA very much, so maybe it's fine). Presumably the 16MB is upper limit, we won't use that all the time. We allocate 1MB, but allow it to grow up to 16MB, correct? 16MB seems like a lot, certainly enough for this purpose - if it's not, I don't think we can come up with a better limit. 15) In any case, I don't think the 16 should be hardcoded as a magic constant in multiple places. That's bound to be error-prone. 16) I've reformatted / reindented / wrapped the code in various places, to make it easier to read and more consistent with the nearby code. I also added a bunch of comments explaining what the block of code is meant to do (I mean, what it aims to do). 16) A comment in pg_get_process_memory_contexts says: Pin the mapping so that it doesn't throw a warning That doesn't seem very useful. It's not clear what kind of warning this hides, but more importantly - we're not doing stuff to hide some sort of warning, we do it to prevent what the warning is about. 17) pg_get_process_memory_contexts has a bunch of error cases, where we need to detach the DSA and return NULL. Would be better to do a label with a goto, I think. 18) I think pg_get_process_memory_contexts will have issues if this happens in the first loop: if ((memCtxState[procNumber].proc_id == pid) && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)) break; Because then we end up with memctx_info pointing to garbage after the loop. I don't know how hard is to hit this, I guess it can happen in many processes calling pg_get_process_memory_contexts? 19) Minor comment and formatting of MemCtxShmemSize / MemCtxShmemInit. 20) MemoryContextInfo etc. need to be added to typedefs.list, so that pgindent can do the right thing. 21) I think ProcessGetMemoryContextInterrupt has a bug because it uses get_summary before reading it from the shmem. Attached are two patches - 0001 is the original patch, 0002 has most of my review comments (mentioned above), and a couple additional changes to comments/formatting, etc. Those are suggestions rather than issues. regards -- Tomas Vondra
From be12803e8dbc671595f7945693b3f70abb2b8745 Mon Sep 17 00:00:00 2001 From: Rahila Syed <rahilasyed...@gmail.com> Date: Sun, 15 Sep 2024 17:56:06 +0530 Subject: [PATCH vtomas 1/2] Function to report memory context stats of any backend This function sends a signal to a backend to publish statistics of all its memory contexts. Signal handler sets a flag, which causes the relevant backend to copy its MemoryContextStats to a DSA, as part of next CHECK_FOR_INTERRUPTS(). It there are more that 16MB worth of statistics, the remaining statistics are copied as a cumulative total of the remaining contexts. Once its done, it signals the client backend using a condition variable. The client backend then wakes up, reads the shared memory and returns these values in the form of set of records, one for each memory context, to the user, followed by a cumulative total of the remaining contexts, if any. Each backend and auxiliary process has its own slot for reporting the stats. There is an array of such memory slots of size MaxBackends+NumofAuxiliary processes in fixed shared memory. Each of these slots point to a DSA, which contains the stats to be shared by the corresponding process. Each slot has its own LW lock and condition variable for synchronization and communication between the publishing process and the client backend. --- doc/src/sgml/func.sgml | 26 ++ src/backend/postmaster/autovacuum.c | 4 + src/backend/postmaster/checkpointer.c | 4 + src/backend/postmaster/interrupt.c | 4 + src/backend/postmaster/pgarch.c | 4 + src/backend/postmaster/startup.c | 4 + src/backend/postmaster/walsummarizer.c | 4 + src/backend/storage/ipc/ipci.c | 2 + src/backend/storage/ipc/procsignal.c | 3 + src/backend/tcop/postgres.c | 3 + .../utils/activity/wait_event_names.txt | 1 + src/backend/utils/adt/mcxtfuncs.c | 274 ++++++++++- src/backend/utils/init/globals.c | 1 + src/backend/utils/mmgr/mcxt.c | 424 +++++++++++++++++- src/include/catalog/pg_proc.dat | 10 + src/include/miscadmin.h | 1 + src/include/storage/procsignal.h | 1 + src/include/utils/memutils.h | 51 +++ src/test/regress/expected/sysviews.out | 12 + src/test/regress/sql/sysviews.sql | 12 + 20 files changed, 822 insertions(+), 23 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 47370e581ae..5d0399508ea 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28358,6 +28358,32 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres} </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_get_process_memory_contexts</primary> + </indexterm> + <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> ) + <returnvalue>setof record</returnvalue> + </para> + <para> + This function handles requests to display the memory contexts of a + PostgreSQL process with the specified process ID (PID). It takes two + arguments: PID and a boolean, get_summary. The function can send requests + to both backend and auxiliary processes. + + After receiving memory context statistics from the target process, it + returns the results as one row per context. The num_agg_contexts column + indicates the number of contexts aggregated in the displayed statistics. + + When get_summary is set to true, memory context statistics at levels 1 and 2, + are displayed with each level 2 context showing a cumulative total of all + its child contexts. + When get_summary is set to false, the num_agg_contexts value is 1, indicating + that individual statistics are being displayed. + </para></entry> + </row> + <row> <entry role="func_table_entry"><para role="func_signature"> <indexterm> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 3f826532b88..eb4c98a17bc 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void) /* Process sinval catchup interrupts that happened while sleeping */ ProcessCatchupInterrupt(); + + /* Publish memory contexts of this process */ + if (PublishMemoryContextPending) + ProcessGetMemoryContextInterrupt(); } /* diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 9bfd0fd665c..ee8360ad6fa 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void) /* Perform logging of memory contexts of this process */ if (LogMemoryContextPending) ProcessLogMemoryContextInterrupt(); + + /* Publish memory contexts of this process */ + if (PublishMemoryContextPending) + ProcessGetMemoryContextInterrupt(); } /* diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c index be69e4c7136..9481a5cd241 100644 --- a/src/backend/postmaster/interrupt.c +++ b/src/backend/postmaster/interrupt.c @@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void) /* Perform logging of memory contexts of this process */ if (LogMemoryContextPending) ProcessLogMemoryContextInterrupt(); + + /* Publish memory contexts of this process */ + if (PublishMemoryContextPending) + ProcessGetMemoryContextInterrupt(); } /* diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 12ee815a626..cd1ecb6b93d 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -865,6 +865,10 @@ HandlePgArchInterrupts(void) if (LogMemoryContextPending) ProcessLogMemoryContextInterrupt(); + /* Publish memory contexts of this process */ + if (PublishMemoryContextPending) + ProcessGetMemoryContextInterrupt(); + if (ConfigReloadPending) { char *archiveLib = pstrdup(XLogArchiveLibrary); diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c index 59d213031b3..d670954c4e9 100644 --- a/src/backend/postmaster/startup.c +++ b/src/backend/postmaster/startup.c @@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void) /* Perform logging of memory contexts of this process */ if (LogMemoryContextPending) ProcessLogMemoryContextInterrupt(); + + /* Publish memory contexts of this process */ + if (PublishMemoryContextPending) + ProcessGetMemoryContextInterrupt(); } diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c index ffbf0439358..b1a5e86a85c 100644 --- a/src/backend/postmaster/walsummarizer.c +++ b/src/backend/postmaster/walsummarizer.c @@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void) /* Perform logging of memory contexts of this process */ if (LogMemoryContextPending) ProcessLogMemoryContextInterrupt(); + + /* Publish memory contexts of this process */ + if (PublishMemoryContextPending) + ProcessGetMemoryContextInterrupt(); } /* diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 174eed70367..4a70eabf7f3 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -50,6 +50,7 @@ #include "storage/sinvaladt.h" #include "utils/guc.h" #include "utils/injection_point.h" +#include "utils/memutils.h" /* GUCs */ int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE; @@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void) StatsShmemInit(); WaitEventCustomShmemInit(); InjectionPointShmemInit(); + MemCtxShmemInit(); } /* diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 7401b6e625e..e425b9eeb03 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS) if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT)) HandleLogMemoryContextInterrupt(); + if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT)) + HandleGetMemoryContextInterrupt(); + if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE)) HandleParallelApplyMessageInterrupt(); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index c01cff9d650..0eae9be122b 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3497,6 +3497,9 @@ ProcessInterrupts(void) if (LogMemoryContextPending) ProcessLogMemoryContextInterrupt(); + if (PublishMemoryContextPending) + ProcessGetMemoryContextInterrupt(); + if (ParallelApplyMessagePending) HandleParallelApplyMessages(); } diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt index 0b53cba807d..68a17699675 100644 --- a/src/backend/utils/activity/wait_event_names.txt +++ b/src/backend/utils/activity/wait_event_names.txt @@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit." WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication." WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated." XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end." +MEM_CTX_PUBLISH "Waiting for backend to publish memory information." ABI_compatibility: diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c index 396c2f223b4..c067cdf8709 100644 --- a/src/backend/utils/adt/mcxtfuncs.c +++ b/src/backend/utils/adt/mcxtfuncs.c @@ -17,28 +17,25 @@ #include "funcapi.h" #include "mb/pg_wchar.h" +#include "miscadmin.h" +#include "access/twophase.h" +#include "catalog/pg_authid_d.h" +#include "nodes/pg_list.h" #include "storage/proc.h" #include "storage/procarray.h" +#include "utils/acl.h" #include "utils/array.h" #include "utils/builtins.h" #include "utils/hsearch.h" +#include "utils/memutils.h" +#include "utils/wait_event_types.h" /* ---------- * The max bytes for showing identifiers of MemoryContext. * ---------- */ -#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024 -/* - * MemoryContextId - * Used for storage of transient identifiers for - * pg_get_backend_memory_contexts. - */ -typedef struct MemoryContextId -{ - MemoryContext context; - int context_id; -} MemoryContextId; +struct MemoryContextState *memCtxState = NULL; /* * int_list_to_array @@ -71,7 +68,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, TupleDesc tupdesc, MemoryContext context, HTAB *context_id_lookup) { -#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10 +#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11 Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; @@ -305,3 +302,256 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) PG_RETURN_BOOL(true); } + +/* + * pg_get_process_memory_contexts + * Signal a backend or an auxiliary process to send its memory contexts. + * + * On receipt of this signal, a backend or an auxiliary process sets the flag + * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS() + * or process-specific interrupt handler to copy the memory context statistics + * in a dynamic shared memory space. The statistics for contexts that do not fit in + * shared memory area are stored as a cumulative total of those contexts, + * at the end in the dynamic shared memory. + * Wait for the backend to send signal on the condition variable after + * writing statistics to a shared memory. + * Once condition variable comes out of sleep, check if the required + * backends statistics are available to read and display. + */ +Datum +pg_get_process_memory_contexts(PG_FUNCTION_ARGS) +{ + int pid = PG_GETARG_INT32(0); + bool get_summary = PG_GETARG_BOOL(1); + PGPROC *proc; + ProcNumber procNumber = INVALID_PROC_NUMBER; + int i; + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + dsa_area *area; + dsa_handle handle; + MemoryContextInfo *memctx_info; + MemoryContext oldContext; + int num_retries = 0; + + InitMaterializedSRF(fcinfo, 0); + + /* + * Only superusers or users with pg_read_all_stats privileges can view the + * memory context statistics of another process + */ + if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("memory context statistics privilege error"))); + + /* + * See if the process with given pid is a backend or an auxiliary process. + */ + proc = BackendPidGetProc(pid); + if (proc == NULL) + proc = AuxiliaryPidGetProc(pid); + + /* + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid + * isn't valid; but by the time we reach kill(), a process for which we + * get a valid proc here might have terminated on its own. There's no way + * to acquire a lock on an arbitrary process to prevent that. But since + * this mechanism is usually used to debug a backend or an auxiliary + * process running and consuming lots of memory, that it might end on its + * own first and its memory contexts are not logged is not a problem. + */ + if (proc == NULL) + { + /* + * This is just a warning so a loop-through-resultset will not abort + * if one backend terminated on its own during the run. + */ + ereport(WARNING, + (errmsg("PID %d is not a PostgreSQL server process", pid))); + PG_RETURN_NULL(); + } + + procNumber = GetNumberFromPGProc(proc); + if (procNumber == MyProcNumber) + { + ereport(WARNING, + (errmsg("cannot return statistics for local backend"), + errhint("Use pg_backend_memory_contexts view instead"))); + PG_RETURN_NULL(); + } + + /* + * Return statistics of top level 1 and 2 contexts, if get_summary is + * true. + */ + LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE); + memCtxState[procNumber].get_summary = get_summary; + + /* + * Create a DSA segment with maximum size of 16MB, send handle to the + * publishing process for storing the stats. If number of contexts exceed + * 16MB, a cumulative total is stored for such contexts. + */ + if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID) + { + oldContext = MemoryContextSwitchTo(TopMemoryContext); + area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE, + 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE); + MemoryContextSwitchTo(oldContext); + handle = dsa_get_handle(area); + memCtxState[procNumber].memstats_dsa_handle = handle; + /* Pin the mapping so that it doesn't throw a warning */ + dsa_pin(area); + dsa_pin_mapping(area); + } + else + { + area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle); + dsa_pin_mapping(area); + } + LWLockRelease(&memCtxState[procNumber].lw_lock); + if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0) + { + ereport(WARNING, + (errmsg("could not send signal to process %d: %m", pid))); + dsa_detach(area); + PG_RETURN_NULL(); + } + + /* + * Wait for a backend to publish stats, indicated by a valid dsa pointer + * set by the backend. + */ + ConditionVariablePrepareToSleep(&memCtxState[procNumber].memctx_cv); + while (1) + { + proc = BackendPidGetProc(pid); + if (proc == NULL) + proc = AuxiliaryPidGetProc(pid); + if (proc == NULL) + { + ereport(WARNING, + (errmsg("PID %d is not a PostgreSQL server process", pid))); + dsa_detach(area); + PG_RETURN_NULL(); + } +#define MEMSTATS_WAIT_TIMEOUT 5000 +#define MAX_RETRIES 20 + if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, MEMSTATS_WAIT_TIMEOUT, + WAIT_EVENT_MEM_CTX_PUBLISH)) + { + ereport(LOG, + (errmsg("Wait for %d process to publish stats timed out, trying again", pid))); + if (num_retries > MAX_RETRIES) + { + dsa_detach(area); + PG_RETURN_NULL(); + } + num_retries = num_retries + 1; + } + + /* + * We expect to come out of sleep when the requested process has + * finished publishing the statistics, verified using the a valid dsa + * pointer. + * + * Make sure that the information belongs to pid we requested + * information for, Otherwise loop back and wait for the server + * process to finish publishing statistics. + */ + LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE); + + if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)) + break; + else + LWLockRelease(&memCtxState[procNumber].lw_lock); + } + if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)) + memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer); + /* Backend has finished publishing the stats, read them */ + for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++) + { + ArrayType *path_array; + int path_length; + Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; + bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; + + memset(values, 0, sizeof(values)); + memset(nulls, 0, sizeof(nulls)); + + if (strlen(memctx_info[i].name) != 0) + values[0] = CStringGetTextDatum(memctx_info[i].name); + else + nulls[0] = true; + if (strlen(memctx_info[i].ident) != 0) + values[1] = CStringGetTextDatum(memctx_info[i].ident); + else + nulls[1] = true; + + values[2] = CStringGetTextDatum(memctx_info[i].type); + path_length = memctx_info[i].path_length; + path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID); + values[3] = PointerGetDatum(path_array); + values[4] = Int64GetDatum(memctx_info[i].totalspace); + values[5] = Int64GetDatum(memctx_info[i].nblocks); + values[6] = Int64GetDatum(memctx_info[i].freespace); + values[7] = Int64GetDatum(memctx_info[i].freechunks); + values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace); + values[9] = Int32GetDatum(memctx_info[i].num_contexts); + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + } + /* If there are more contexts, display a cumulative total of those */ + if (memCtxState[procNumber].total_stats > i) + { + Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; + bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; + + values[0] = CStringGetTextDatum(memctx_info[i].name); + nulls[1] = true; + nulls[2] = true; + nulls[3] = true; + values[4] = Int64GetDatum(memctx_info[i].totalspace); + values[5] = Int64GetDatum(memctx_info[i].nblocks); + values[6] = Int64GetDatum(memctx_info[i].freespace); + values[7] = Int64GetDatum(memctx_info[i].freechunks); + values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace); + values[9] = Int32GetDatum(memctx_info[i].num_contexts); + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + } + LWLockRelease(&memCtxState[procNumber].lw_lock); + ConditionVariableCancelSleep(); + dsa_detach(area); + PG_RETURN_NULL(); +} + +static Size +MemCtxShmemSize(void) +{ + Size size; + Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts)); + + size = TotalProcs * sizeof(MemoryContextState); + return size; +} + +void +MemCtxShmemInit(void) +{ + bool found; + Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts)); + + memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState", + MemCtxShmemSize(), + &found); + if (!found) + { + for (int i = 0; i < TotalProcs; i++) + { + ConditionVariableInit(&memCtxState[i].memctx_cv); + LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId()); + LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting"); + memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID; + memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer; + } + } +} diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c index b844f9fdaef..6bc253da5da 100644 --- a/src/backend/utils/init/globals.c +++ b/src/backend/utils/init/globals.c @@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false; volatile uint32 InterruptHoldoffCount = 0; volatile uint32 QueryCancelHoldoffCount = 0; volatile uint32 CritSectionCount = 0; +volatile sig_atomic_t PublishMemoryContextPending = false; int MyProcPid; pg_time_t MyStartTime; diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c index aa6da0d0352..245aba5987c 100644 --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -19,16 +19,22 @@ *------------------------------------------------------------------------- */ +#include <math.h> #include "postgres.h" #include "mb/pg_wchar.h" #include "miscadmin.h" +#include "nodes/pg_list.h" +#include "storage/fd.h" +#include "storage/lwlock.h" +#include "storage/dsm.h" +#include "utils/dsa.h" +#include "utils/hsearch.h" #include "utils/memdebug.h" #include "utils/memutils.h" #include "utils/memutils_internal.h" #include "utils/memutils_memorychunk.h" - static void BogusFree(void *pointer); static void *BogusRealloc(void *pointer, Size size, int flags); static MemoryContext BogusGetChunkContext(void *pointer); @@ -135,6 +141,12 @@ static const MemoryContextMethods mcxt_methods[] = { }; #undef BOGUS_MCTX +typedef enum PrintDetails +{ + PRINT_STATS_TO_STDERR = 0, + PRINT_STATS_TO_LOGS, + PRINT_STATS_NONE +} PrintDetails; /* * CurrentMemoryContext @@ -162,10 +174,11 @@ static void MemoryContextCallResetCallbacks(MemoryContext context); static void MemoryContextStatsInternal(MemoryContext context, int level, int max_level, int max_children, MemoryContextCounters *totals, - bool print_to_stderr); + PrintDetails print_location, int *num_contexts); static void MemoryContextStatsPrint(MemoryContext context, void *passthru, const char *stats_string, bool print_to_stderr); +static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts); /* * You should not do memory allocations within a critical section, because @@ -831,11 +844,19 @@ MemoryContextStatsDetail(MemoryContext context, bool print_to_stderr) { MemoryContextCounters grand_totals; + int num_contexts; + PrintDetails print_location; memset(&grand_totals, 0, sizeof(grand_totals)); + if (print_to_stderr) + print_location = PRINT_STATS_TO_STDERR; + else + print_location = PRINT_STATS_TO_LOGS; + + /* num_contexts report number of contexts aggregated in the output */ MemoryContextStatsInternal(context, 0, max_level, max_children, - &grand_totals, print_to_stderr); + &grand_totals, print_location, &num_contexts); if (print_to_stderr) fprintf(stderr, @@ -876,18 +897,43 @@ static void MemoryContextStatsInternal(MemoryContext context, int level, int max_level, int max_children, MemoryContextCounters *totals, - bool print_to_stderr) + PrintDetails print_location, int *num_contexts) { MemoryContext child; int ichild; + bool print_to_stderr = true; Assert(MemoryContextIsValid(context)); - /* Examine the context itself */ - context->methods->stats(context, - MemoryContextStatsPrint, - &level, - totals, print_to_stderr); + if (print_location == PRINT_STATS_TO_STDERR) + print_to_stderr = true; + else if (print_location == PRINT_STATS_TO_LOGS) + print_to_stderr = false; + + if (print_location != PRINT_STATS_NONE) + { + /* Examine the context itself */ + context->methods->stats(context, + MemoryContextStatsPrint, + &level, + totals, print_to_stderr); + } + /* Do not print the statistics */ + + /* + * print_to_stderr is a no-op if no statistics are going to be printed i.e + * print_location == PRINT_STATS_NONE + */ + else + { + /* Examine the context itself */ + context->methods->stats(context, + NULL, + NULL, + totals, print_to_stderr); + } + /* Increment the context count */ + *num_contexts = *num_contexts + 1; /* * Examine children. @@ -907,7 +953,7 @@ MemoryContextStatsInternal(MemoryContext context, int level, MemoryContextStatsInternal(child, level + 1, max_level, max_children, totals, - print_to_stderr); + print_location, num_contexts); } } @@ -925,6 +971,7 @@ MemoryContextStatsInternal(MemoryContext context, int level, ichild++; child = MemoryContextTraverseNext(child, context); } + *num_contexts = *num_contexts + ichild; if (print_to_stderr) { @@ -939,7 +986,7 @@ MemoryContextStatsInternal(MemoryContext context, int level, local_totals.freechunks, local_totals.totalspace - local_totals.freespace); } - else + else if (print_location != PRINT_STATS_NONE) ereport(LOG_SERVER_ONLY, (errhidestmt(true), errhidecontext(true), @@ -1276,6 +1323,21 @@ HandleLogMemoryContextInterrupt(void) /* latch will be set by procsignal_sigusr1_handler */ } +/* + * HandleGetMemoryContextInterrupt + * Handle receipt of an interrupt indicating publishing of memory + * contexts. + * + * All the actual work is deferred to ProcessLogMemoryContextInterrupt() + */ +void +HandleGetMemoryContextInterrupt(void) +{ + InterruptPending = true; + PublishMemoryContextPending = true; + /* latch will be set by procsignal_sigusr1_handler */ +} + /* * ProcessLogMemoryContextInterrupt * Perform logging of memory contexts of this backend process. @@ -1313,6 +1375,346 @@ ProcessLogMemoryContextInterrupt(void) MemoryContextStatsDetail(TopMemoryContext, 100, 100, false); } +/* + * Run by each backend to publish their memory context + * statistics. It performs a breadth first search + * on the memory context tree, so that the parents + * get a chance to report stats before their children. + * + * Statistics are shared via dynamic shared memory which + * can hold statistics of approx 6700 contexts. Remaining + * contexts statistics is captured as a cumulative total. + */ +void +ProcessGetMemoryContextInterrupt(void) +{ + /* Store the memory context details in shared memory */ + + List *contexts; + + HASHCTL ctl; + HTAB *context_id_lookup; + int context_id = 0; + bool found; + MemoryContext stat_cxt; + MemoryContextInfo *meminfo; + bool get_summary = false; + dsa_area *area; + int num_stats; + int idx = MyProcNumber; + int stats_count = 0; + MemoryContextCounters stat; + + PublishMemoryContextPending = false; + + /* + * The hash table is used for constructing "path" column of + * pg_get_process_memory_context is view, similar to its local backend + * couterpart. + */ + + /* + * Make a new context that will contain the hash table, to ease the + * cleanup + */ + + stat_cxt = AllocSetContextCreate(CurrentMemoryContext, + "Memory context statistics", + ALLOCSET_DEFAULT_SIZES); + + ctl.keysize = sizeof(MemoryContext); + ctl.entrysize = sizeof(MemoryContextId); + ctl.hcxt = stat_cxt; + + context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts", + 256, + &ctl, + HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + + contexts = list_make1(TopMemoryContext); + + /* Compute the number of stats that can fit in the DSM seg */ + num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo)); + + /* + * Traverse the memory context tree to find total number of contexts. If + * summary is requested find the total number of contexts at level 1 and + * 2. + */ + foreach_ptr(MemoryContextData, cur, contexts) + { + MemoryContextId *entry; + + entry = (MemoryContextId *) hash_search(context_id_lookup, &cur, + HASH_ENTER, &found); + stats_count = stats_count + 1; + /* context id starts with 1 */ + entry->context_id = stats_count; + + /* Append the children of the current context to the main list */ + for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild) + { + if (get_summary) + { + entry = (MemoryContextId *) hash_search(context_id_lookup, &c, + HASH_ENTER, &found); + stats_count = stats_count + 1; + entry->context_id = stats_count; + } + contexts = lappend(contexts, c); + } + /* In summary only the first level contexts are displayed */ + if (get_summary) + break; + } + + /* + * Allocate memory in this process's dsa for storing statistics of the the + * memory contexts upto num_stats, for contexts that don't fit in the DSA + * segment, a cumulative total is written as the last record in the DSA + * segment. + */ + stats_count = stats_count > num_stats ? num_stats : stats_count; + + /* Attach to DSA segment */ + LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE); + area = dsa_attach(memCtxState[idx].memstats_dsa_handle); + memCtxState[idx].proc_id = MyProcPid; + get_summary = memCtxState[idx].get_summary; + + /* Free the memory allocated previously by the same process */ + if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer)) + { + dsa_free(area, memCtxState[idx].memstats_dsa_pointer); + memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer; + } + memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo)); + meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer); + + if (get_summary) + { + int ctx_id = 0; + List *path = NIL; + + /* Copy TopMemoryContext statistics to DSA */ + path = lcons_int(1, path); + memset(&stat, 0, sizeof(stat)); + (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL, &stat, true); + PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, NULL, stat, 1); + ctx_id = ctx_id + 1; + + /* + * Copy statistics for each of TopMemoryContexts children(XXX. Make it + * capped at 100). This includes statistics of all of their children + * upto level 100 + */ + for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild) + { + MemoryContextCounters grand_totals; + int num_contexts = 0; + char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE]; + + path = NIL; + memset(&grand_totals, 0, sizeof(grand_totals)); + + MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals, PRINT_STATS_NONE, &num_contexts); + + /* + * Figure out the transient context_id of this context and each of + * its ancestors. + */ + for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent) + { + MemoryContextId *cur_entry; + + cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found); + + if (!found) + { + elog(LOG, "hash table corrupted, can't construct path value"); + break; + } + path = lcons_int(cur_entry->context_id, path); + } + /* Trim and copy the identifier if it is not set to NULL */ + if (c->ident != NULL) + { + int idlen = strlen(c->ident); + + /* + * Some identifiers such as SQL query string can be very long, + * truncate oversize identifiers. + */ + if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE) + idlen = pg_mbcliplen(c->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1); + + memcpy(clipped_ident, c->ident, idlen); + clipped_ident[idlen] = '\0'; + } + PublishMemoryContext(meminfo, ctx_id, c, path, (c->ident != NULL ? clipped_ident : NULL), grand_totals, num_contexts); + ctx_id = ctx_id + 1; + } + /* For summary mode, total_stats and in_memory_stats remain the same */ + memCtxState[idx].in_memory_stats = ctx_id; + memCtxState[idx].total_stats = ctx_id; + goto cleanup; + } + + foreach_ptr(MemoryContextData, cur, contexts) + { + List *path = NIL; + char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE]; + + /* + * Figure out the transient context_id of this context and each of its + * ancestors. + */ + for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent) + { + MemoryContextId *cur_entry; + + cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found); + + if (!found) + { + elog(LOG, "hash table corrupted, can't construct path value"); + break; + } + path = lcons_int(cur_entry->context_id, path); + } + /* Trim and copy the identifier if it is not set to NULL */ + if (cur->ident != NULL) + { + int idlen = strlen(cur->ident); + + /* + * Some identifiers such as SQL query string can be very long, + * truncate oversize identifiers. + */ + if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE) + idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1); + + memcpy(clipped_ident, cur->ident, idlen); + clipped_ident[idlen] = '\0'; + } + if (context_id <= (num_stats - 2)) + { + /* Examine the context stats */ + memset(&stat, 0, sizeof(stat)); + (*cur->methods->stats) (cur, NULL, NULL, &stat, true); + /* Copy statistics to DSM memory */ + PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL), stat, 1); + } + else + { + /* Examine the context stats */ + memset(&stat, 0, sizeof(stat)); + (*cur->methods->stats) (cur, NULL, NULL, &stat, true); + + meminfo[num_stats - 1].totalspace += stat.totalspace; + meminfo[num_stats - 1].nblocks += stat.nblocks; + meminfo[num_stats - 1].freespace += stat.freespace; + meminfo[num_stats - 1].freechunks += stat.freechunks; + } + + /* + * DSA max limit is reached, write total of the remaining statistics. + */ + if (context_id == (num_stats - 2) && context_id < (stats_count - 1)) + { + memCtxState[idx].in_memory_stats = context_id + 1; + strncpy(meminfo[num_stats - 1].name, "Remaining Totals", 16); + } + context_id++; + } + if (context_id < (num_stats - 2)) + { + memCtxState[idx].in_memory_stats = context_id; + } + /* Report number of aggregated memory contexts */ + else + { + meminfo[num_stats - 1].num_contexts = context_id - memCtxState[idx].in_memory_stats; + } + memCtxState[idx].total_stats = context_id; +cleanup: + + /* + * Signal all the waiting client backends after setting the exit condition + * flag + */ + LWLockRelease(&memCtxState[idx].lw_lock); + ConditionVariableBroadcast(&memCtxState[idx].memctx_cv); + /* Delete the hash table memory context */ + MemoryContextDelete(stat_cxt); + + dsa_detach(area); +} + +static void +PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts) +{ + char *type; + + if (context->name != NULL) + { + Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE); + strncpy(memctx_info[curr_id].name, context->name, strlen(context->name)); + } + else + memctx_info[curr_id].name[0] = '\0'; + + if (clipped_ident != NULL) + { + /* + * To be consistent with logging output, we label dynahash contexts + * with just the hash table name as with MemoryContextStatsPrint(). + */ + if (!strncmp(context->name, "dynahash", 8)) + { + strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident)); + memctx_info[curr_id].ident[0] = '\0'; + } + else + strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident)); + } + else + memctx_info[curr_id].ident[0] = '\0'; + + memctx_info[curr_id].path_length = list_length(path); + foreach_int(i, path) + memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i); + + switch (context->type) + { + case T_AllocSetContext: + type = "AllocSet"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + case T_GenerationContext: + type = "Generation"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + case T_SlabContext: + type = "Slab"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + case T_BumpContext: + type = "Bump"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + default: + type = "???"; + strncpy(memctx_info[curr_id].type, type, strlen(type)); + break; + } + memctx_info[curr_id].totalspace = stat.totalspace; + memctx_info[curr_id].nblocks = stat.nblocks; + memctx_info[curr_id].freespace = stat.freespace; + memctx_info[curr_id].freechunks = stat.freechunks; + memctx_info[curr_id].num_contexts = num_contexts; +} + void * palloc(Size size) { diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index b37e8a6f882..4d6ae0728ac 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8449,6 +8449,16 @@ prorettype => 'bool', proargtypes => 'int4', prosrc => 'pg_log_backend_memory_contexts' }, +# publishing memory contexts of the specified postgres process +{ oid => '2173', descr => 'publish memory contexts of the specified backend', + proname => 'pg_get_process_memory_contexts', provolatile => 'v', + prorows => '100', proretset => 't', proparallel => 'r', + prorettype => 'record', proargtypes => 'int4 bool', + proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4}', + proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}', + proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}', + prosrc => 'pg_get_process_memory_contexts' }, + # non-persistent series generator { oid => '1066', descr => 'non-persistent series generator', proname => 'generate_series', prorows => '1000', diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index d016a9c9248..fc75ea143c3 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending; extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending; extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending; extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending; +extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending; extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending; extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost; diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h index 022fd8ed933..477ab993386 100644 --- a/src/include/storage/procsignal.h +++ b/src/include/storage/procsignal.h @@ -35,6 +35,7 @@ typedef enum PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */ PROCSIG_BARRIER, /* global barrier interrupt */ PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */ + PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to log the memory contexts */ PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */ /* Recovery conflict reasons */ diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 8abc26abce2..9fac394aad3 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -18,6 +18,9 @@ #define MEMUTILS_H #include "nodes/memnodes.h" +#include "storage/condition_variable.h" +#include "storage/lmgr.h" +#include "utils/dsa.h" /* @@ -48,7 +51,11 @@ #define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize) +#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024 +#define MEM_CONTEXT_SHMEM_STATS_SIZE 30 +#define MEM_CONTEXT_MAX_LEVEL 64 +#define MAX_TYPE_STRING_LENGTH 64 /* * Standard top-level memory contexts. * @@ -115,6 +122,50 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent, Size initBlockSize, Size maxBlockSize); +/* Dynamic shared memory state for Memory Context Statistics reporting */ +typedef struct MemoryContextInfo +{ + char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE]; + char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE]; + Datum path[MEM_CONTEXT_MAX_LEVEL]; + char type[MAX_TYPE_STRING_LENGTH]; + int path_length; + int64 totalspace; + int64 nblocks; + int64 freespace; + int64 freechunks; + int num_contexts; +} MemoryContextInfo; + +typedef struct MemoryContextState +{ + ConditionVariable memctx_cv; + LWLock lw_lock; + int proc_id; + int in_memory_stats; + int total_stats; + bool get_summary; + dsa_handle memstats_dsa_handle; + dsa_pointer memstats_dsa_pointer; + +} MemoryContextState; + +/* + * MemoryContextId + * Used for storage of transient identifiers for + * pg_get_backend_memory_contexts. + */ +typedef struct MemoryContextId +{ + MemoryContext context; + int context_id; +} MemoryContextId; + +extern PGDLLIMPORT MemoryContextState * memCtxState; +extern void ProcessGetMemoryContextInterrupt(void); +extern void HandleGetMemoryContextInterrupt(void); +extern void MemCtxShmemInit(void); + /* * This wrapper macro exists to check for non-constant strings used as context * names; that's no longer supported. (Use MemoryContextSetIdentifier if you diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 91089ac215f..5e3382132c3 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -223,3 +223,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs; t (1 row) +DO $$ +DECLARE + checkpointer_pid int; + r RECORD; +BEGIN + SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid; + + select type, name, ident + from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r; + RAISE NOTICE '%', r; +END $$; +NOTICE: (AllocSet,TopMemoryContext,) diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql index b2a79237543..f3127fea400 100644 --- a/src/test/regress/sql/sysviews.sql +++ b/src/test/regress/sql/sysviews.sql @@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia'; select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs; set timezone_abbreviations = 'India'; select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs; + +DO $$ +DECLARE + checkpointer_pid int; + r RECORD; +BEGIN + SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid; + + select type, name, ident + from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r; + RAISE NOTICE '%', r; +END $$; -- 2.47.1
From 0ef536a26403973b081fd14de79e047cd0cd1a13 Mon Sep 17 00:00:00 2001 From: Tomas Vondra <to...@vondra.me> Date: Mon, 6 Jan 2025 19:22:14 +0100 Subject: [PATCH vtomas 2/2] review --- src/backend/utils/adt/mcxtfuncs.c | 109 +++++++++++++++++++++++++----- src/backend/utils/mmgr/mcxt.c | 93 +++++++++++++++++-------- src/include/utils/memutils.h | 5 +- 3 files changed, 159 insertions(+), 48 deletions(-) diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c index c067cdf8709..f3ffd6937a5 100644 --- a/src/backend/utils/adt/mcxtfuncs.c +++ b/src/backend/utils/adt/mcxtfuncs.c @@ -307,16 +307,26 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) * pg_get_process_memory_contexts * Signal a backend or an auxiliary process to send its memory contexts. * + * By default, only superusers are allowed to signal to return the memory + * contexts because allowing any users to issue this request at an unbounded + * rate would cause lots of log messages and which can lead to denial of + * service. Additional roles can be permitted with GRANT. + * * On receipt of this signal, a backend or an auxiliary process sets the flag * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS() - * or process-specific interrupt handler to copy the memory context statistics - * in a dynamic shared memory space. The statistics for contexts that do not fit in - * shared memory area are stored as a cumulative total of those contexts, - * at the end in the dynamic shared memory. - * Wait for the backend to send signal on the condition variable after - * writing statistics to a shared memory. - * Once condition variable comes out of sleep, check if the required - * backends statistics are available to read and display. + * or process-specific interrupt handler to copy the memory context details + * to a dynamic shared memory space. + * + * The shared memory buffer has a limited size - it the process has too many + * memory contexts, the memory contexts into that do not fit are summarized + * and represented as cumulative total at the end of the buffer. + * + * Once condition variable comes out of sleep, check if the memory context + * information is available for read and display. + * + * XXX Explain how the backends communicate through condition variables. + * + * XXX Explain what happens with timeouts, etc. */ Datum pg_get_process_memory_contexts(PG_FUNCTION_ARGS) @@ -333,6 +343,7 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS) MemoryContext oldContext; int num_retries = 0; + /* XXX Shouldn't this be after the privilege check etc.? */ InitMaterializedSRF(fcinfo, 0); /* @@ -383,24 +394,50 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS) /* * Return statistics of top level 1 and 2 contexts, if get_summary is * true. + * + * XXX Is this comment still accurate? Or are we returning information + * about more contexts? Or more precisely, isn't that irrelevant here? + * We just process whatever the process puts into the DSA, right? + * + * XXX I'd move this comment until after we wake up and are ready to + * process the information, close to the comment: + * + * Backend has finished publishing the stats, read them */ - LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE); - memCtxState[procNumber].get_summary = get_summary; /* * Create a DSA segment with maximum size of 16MB, send handle to the * publishing process for storing the stats. If number of contexts exceed * 16MB, a cumulative total is stored for such contexts. + * + * XXX 16MB seems like an awfully large amount of memory, particularly + * for small machines. Maybe it should be configurable as a parameter + * of the SQL function? In any case, it should not be hardcoded as a + * magic constant. Maybe add a #define constant? */ + LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE); + + memCtxState[procNumber].get_summary = get_summary; + if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID) { oldContext = MemoryContextSwitchTo(TopMemoryContext); - area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE, + + area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, + DSA_DEFAULT_INIT_SEGMENT_SIZE, 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE); + MemoryContextSwitchTo(oldContext); + handle = dsa_get_handle(area); + memCtxState[procNumber].memstats_dsa_handle = handle; - /* Pin the mapping so that it doesn't throw a warning */ + + /* Pin the mapping so that it doesn't throw a warning + * + * XXX We don't pin stuff "so that it doesn't throw a warning". Surely + * the warning has a reason, so maybe mention that? + */ dsa_pin(area); dsa_pin_mapping(area); } @@ -409,11 +446,20 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS) area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle); dsa_pin_mapping(area); } + LWLockRelease(&memCtxState[procNumber].lw_lock); + + /* + * Send a signal to the auxiliary process, informing it we want it to + * produce information about memory contexts. + */ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0) { ereport(WARNING, (errmsg("could not send signal to process %d: %m", pid))); + + /* XXX We do exactly this in a number of places. Maybe it'd be better + * to define an "error" label at the end, and goto to it? */ dsa_detach(area); PG_RETURN_NULL(); } @@ -461,13 +507,22 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS) */ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE); - if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)) + /* + * XXX Explain how could it happen that the PID does not match. + */ + if ((memCtxState[procNumber].proc_id == pid) && + DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)) break; else LWLockRelease(&memCtxState[procNumber].lw_lock); } + if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)) memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer); + + /* XXX What if the memstats_dsa_pointer is not valid? Is it even possible? + * If it is, we have garbage in memctx_info. Maybe should be an Assert()? */ + /* Backend has finished publishing the stats, read them */ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++) { @@ -489,17 +544,21 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS) nulls[1] = true; values[2] = CStringGetTextDatum(memctx_info[i].type); + path_length = memctx_info[i].path_length; path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID); values[3] = PointerGetDatum(path_array); + values[4] = Int64GetDatum(memctx_info[i].totalspace); values[5] = Int64GetDatum(memctx_info[i].nblocks); values[6] = Int64GetDatum(memctx_info[i].freespace); values[7] = Int64GetDatum(memctx_info[i].freechunks); values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace); values[9] = Int32GetDatum(memctx_info[i].num_contexts); + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); } + /* If there are more contexts, display a cumulative total of those */ if (memCtxState[procNumber].total_stats > i) { @@ -516,29 +575,41 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS) values[7] = Int64GetDatum(memctx_info[i].freechunks); values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace); values[9] = Int32GetDatum(memctx_info[i].num_contexts); + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); } + LWLockRelease(&memCtxState[procNumber].lw_lock); + ConditionVariableCancelSleep(); + dsa_detach(area); PG_RETURN_NULL(); } +/* + * Shared memory sizing for reporting memory context information. + */ static Size MemCtxShmemSize(void) { - Size size; - Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts)); + Size TotalProcs = + add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts)); - size = TotalProcs * sizeof(MemoryContextState); - return size; + return mul_size(TotalProcs, sizeof(MemoryContextState)); } +/* + * Init shared memory for reporting memory context information. + * + * XXX Should this check IsUnderPostmaster, similarly to e.g. CommitTsShmemInit? + */ void MemCtxShmemInit(void) { bool found; - Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts)); + Size TotalProcs = + add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts)); memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState", MemCtxShmemSize(), @@ -548,8 +619,10 @@ MemCtxShmemInit(void) for (int i = 0; i < TotalProcs; i++) { ConditionVariableInit(&memCtxState[i].memctx_cv); + LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId()); LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting"); + memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID; memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer; } diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c index 245aba5987c..8992a01ee32 100644 --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -178,7 +178,11 @@ static void MemoryContextStatsInternal(MemoryContext context, int level, static void MemoryContextStatsPrint(MemoryContext context, void *passthru, const char *stats_string, bool print_to_stderr); -static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts); +static void PublishMemoryContext(MemoryContextInfo * memctx_infos, + int curr_id, MemoryContext context, + List *path, char *clipped_ident, + MemoryContextCounters stat, + int num_contexts); /* * You should not do memory allocations within a critical section, because @@ -1376,20 +1380,22 @@ ProcessLogMemoryContextInterrupt(void) } /* - * Run by each backend to publish their memory context - * statistics. It performs a breadth first search - * on the memory context tree, so that the parents - * get a chance to report stats before their children. + * ProcessGetMemoryContextInterrupt + * Generate information about memory contexts used by the process. * - * Statistics are shared via dynamic shared memory which - * can hold statistics of approx 6700 contexts. Remaining - * contexts statistics is captured as a cumulative total. + * Performs a breadth first search on the memory context tree, so that the + * parents get a chance to report stats before their children. + * + * Statistics are shared via dynamic shared memory which can hold statistics + * of approx 6700 contexts. Remaining contexts statistics is captured as a + * cumulative total. + * + * XXX Seems a bit fragile to mention the number of contexts here - if the + * DSA size changes (in mcxtfuncs.c), this will get stale. */ void ProcessGetMemoryContextInterrupt(void) { - /* Store the memory context details in shared memory */ - List *contexts; HASHCTL ctl; @@ -1407,21 +1413,18 @@ ProcessGetMemoryContextInterrupt(void) PublishMemoryContextPending = false; - /* - * The hash table is used for constructing "path" column of - * pg_get_process_memory_context is view, similar to its local backend - * couterpart. - */ - /* * Make a new context that will contain the hash table, to ease the - * cleanup + * cleanup. */ - stat_cxt = AllocSetContextCreate(CurrentMemoryContext, "Memory context statistics", ALLOCSET_DEFAULT_SIZES); + /* + * The hash table used for constructing "path" column of the view, similar + * to its local backend counterpart. + */ ctl.keysize = sizeof(MemoryContext); ctl.entrysize = sizeof(MemoryContextId); ctl.hcxt = stat_cxt; @@ -1431,39 +1434,65 @@ ProcessGetMemoryContextInterrupt(void) &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + /* List of contexts to process in the next round - start at the top. */ contexts = list_make1(TopMemoryContext); /* Compute the number of stats that can fit in the DSM seg */ + /* + * XXX Don't hardcode the size in two places. Define a constant or + * something like that, so that we can change one place and it stays + * in sync. Or even better, define it just in mcxtfuncs.c, and store + * the size in the shmem. + * + * XXX The name is misleading - this is not the number of stats we're + * about to produce, it's the maximum number of entries we can fit into + * the shmem. I'd name this max_stats. + * + * XXX Also, what if we fill exactly this number of contexts? Won't we + * lose the last entry because it will be overwitten by the summary? + */ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo)); /* * Traverse the memory context tree to find total number of contexts. If * summary is requested find the total number of contexts at level 1 and * 2. + * + * XXX I'm confused about how this interacts with the get_summary flag. + * In fact, this always uses get_summary=false, because we only read the + * flag from shmem later. Seems like a bug. */ foreach_ptr(MemoryContextData, cur, contexts) { - MemoryContextId *entry; + MemoryContextId *entry; entry = (MemoryContextId *) hash_search(context_id_lookup, &cur, HASH_ENTER, &found); - stats_count = stats_count + 1; + Assert(!found); + /* context id starts with 1 */ - entry->context_id = stats_count; + entry->context_id = (++stats_count); - /* Append the children of the current context to the main list */ + /* Append the children of the current context to the main list. */ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild) { + /* XXX I don't understand why we need to check get_summary here? */ if (get_summary) { entry = (MemoryContextId *) hash_search(context_id_lookup, &c, HASH_ENTER, &found); - stats_count = stats_count + 1; - entry->context_id = stats_count; + Assert(!found); + + entry->context_id = (++stats_count); } + contexts = lappend(contexts, c); } - /* In summary only the first level contexts are displayed */ + + /* In summary only the first level contexts are displayed + * + * XXX Probably should say "first two levels"? + */ if (get_summary) break; } @@ -1474,23 +1503,30 @@ ProcessGetMemoryContextInterrupt(void) * segment, a cumulative total is written as the last record in the DSA * segment. */ - stats_count = stats_count > num_stats ? num_stats : stats_count; + stats_count = (stats_count > num_stats) ? num_stats : stats_count; /* Attach to DSA segment */ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE); area = dsa_attach(memCtxState[idx].memstats_dsa_handle); + memCtxState[idx].proc_id = MyProcPid; + + /* XXX this is where we get the get_summary flag */ get_summary = memCtxState[idx].get_summary; - /* Free the memory allocated previously by the same process */ + /* Free the memory allocated previously by the same process. */ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer)) { dsa_free(area, memCtxState[idx].memstats_dsa_pointer); memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer; } + memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo)); meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer); + /* XXX I find this really hard to understand, with the nested loops etc. + * I suggest breaking this up into smaller functions, and calling them + * (easier to understand) than huge lump of code. */ if (get_summary) { int ctx_id = 0; @@ -1506,7 +1542,7 @@ ProcessGetMemoryContextInterrupt(void) /* * Copy statistics for each of TopMemoryContexts children(XXX. Make it * capped at 100). This includes statistics of all of their children - * upto level 100 + * upto level 100. */ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild) { @@ -1651,6 +1687,7 @@ cleanup: dsa_detach(area); } +/* XXX this really needs some better formatting and comments */ static void PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts) { diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 9fac394aad3..49c7bf5c376 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -123,8 +123,9 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent, Size maxBlockSize); /* Dynamic shared memory state for Memory Context Statistics reporting */ -typedef struct MemoryContextInfo +typedef struct MemoryContextInfo /* XXX I'd name this MemoryContextEntry */ { + /* XXX isn't 2 x 1kB for every context a bit too much? Maybe better to make it variable-length? */ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE]; char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE]; Datum path[MEM_CONTEXT_MAX_LEVEL]; @@ -135,7 +136,7 @@ typedef struct MemoryContextInfo int64 freespace; int64 freechunks; int num_contexts; -} MemoryContextInfo; +} MemoryContextInfo; /* XXX needs to be added to typedefs, so that pgindent works */ typedef struct MemoryContextState { -- 2.47.1