Updated patches attached.
====================================================================
pg-stat-activity-backend-memory-allocated
====================================================================
DSM allocations created by a process and not destroyed prior to its exit are
considered long lived and are tracked in global_dsm_allocated_bytes.

Created 2 new system views (see below):

pg_stat_global_memory_allocation view
  Displays datid, shared_memory_size, shared_memory_size_in_huge_pages,
  global_dsm_allocated_bytes. shared_memory_size and
  shared_memory_size_in_huge_pages display the calculated read-only values
  for these GUCs.

pg_stat_memory_allocation view
  Migrated allocated_bytes out of the pg_stat_activity view into this view.
  pg_stat_memory_allocation also contains a breakdown of allocation by
  allocator type (aset, dsm, generation, slab). The view displays datid, pid,
  allocated_bytes, aset_allocated_bytes, dsm_allocated_bytes,
  generation_allocated_bytes, slab_allocated_bytes by process.

Reduced calls to initialize allocation counters by moving the initialization
call into InitPostmasterChild.

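To make the per-allocator breakdown and the long-lived DSM rule concrete, here is a minimal standalone sketch. It is not the patch's code: ProcAllocCounters, counters_increase, counters_decrease, counters_on_exit and global_dsm_bytes are hypothetical names used only for illustration, and the clamp-at-zero behaviour on an unmatched free is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Allocator types named in the changelog above. */
typedef enum
{
    ALLOC_ASET,
    ALLOC_DSM,
    ALLOC_GENERATION,
    ALLOC_SLAB,
    ALLOC_NTYPES
} AllocType;

/* Hypothetical per-process counters; not the patch's actual layout. */
typedef struct
{
    uint64_t    total;                  /* allocated_bytes */
    uint64_t    by_type[ALLOC_NTYPES];  /* <type>_allocated_bytes */
} ProcAllocCounters;

/* Stand-in for the shared global_dsm_allocated_bytes value. */
uint64_t    global_dsm_bytes = 0;

/* Record an allocation against both the allocator type and the total. */
void
counters_increase(ProcAllocCounters *c, AllocType type, uint64_t nbytes)
{
    assert(type < ALLOC_NTYPES);
    c->by_type[type] += nbytes;
    c->total += nbytes;
}

/* Record a free; clamp at zero instead of wrapping on an unmatched free. */
void
counters_decrease(ProcAllocCounters *c, AllocType type, uint64_t nbytes)
{
    assert(type < ALLOC_NTYPES);
    c->by_type[type] = (c->by_type[type] > nbytes) ? c->by_type[type] - nbytes : 0;
    c->total = (c->total > nbytes) ? c->total - nbytes : 0;
}

/* At process exit, DSM bytes still outstanding are treated as long lived. */
void
counters_on_exit(ProcAllocCounters *c)
{
    global_dsm_bytes += c->by_type[ALLOC_DSM];
    *c = (ProcAllocCounters) {0};
}
```

In this sketch allocated_bytes is always the sum of the per-allocator counters, for example 5382824 aset bytes plus 8040 generation bytes gives a total of 5390864.
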
postgres=# select * from pg_stat_global_memory_allocation;
 datid | shared_memory_size | shared_memory_size_in_huge_pages | global_dsm_allocated_bytes
-------+--------------------+----------------------------------+----------------------------
     5 | 192MB              |                               96 |                    1048576
(1 row)

postgres=# select * from pg_stat_memory_allocation;
 datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 981842 |          771512 |               771512 |                   0 |                          0 |                    0
       | 981843 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 981913 |         4274792 |              4274792 |                   0 |                          0 |                    0
       | 981838 |          107216 |               107216 |                   0 |                          0 |                    0
       | 981837 |          123600 |               123600 |                   0 |                          0 |                    0
       | 981841 |          107216 |               107216 |                   0 |                          0 |                    0
(6 rows)

postgres=# select ps.datid, ps.pid, state,application_name,backend_type, pa.* from pg_stat_activity ps join pg_stat_memory_allocation pa on (pa.pid = ps.pid) order by dsm_allocated_bytes, pa.pid;
 datid |  pid   | state  | application_name |         backend_type         | datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+--------+------------------+------------------------------+-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 981837 |        |                  | checkpointer                 |       | 981837 |          123600 |               123600 |                   0 |                          0 |                    0
       | 981838 |        |                  | background writer            |       | 981838 |          107216 |               107216 |                   0 |                          0 |                    0
       | 981841 |        |                  | walwriter                    |       | 981841 |          107216 |               107216 |                   0 |                          0 |                    0
       | 981842 |        |                  | autovacuum launcher          |       | 981842 |          771512 |               771512 |                   0 |                          0 |                    0
       | 981843 |        |                  | logical replication launcher |       | 981843 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 981913 | active | psql             | client backend               |     5 | 981913 |         5390864 |              5382824 |                   0 |                       8040 |                    0
(6 rows)

====================================================================
dev-max-memory
====================================================================
Include shared_memory_size in max_total_backend_memory calculations.
max_total_backend_memory is reduced by shared_memory_size at startup.

When a process's local allowance is consumed, it is refilled from the global
max_total_bkend_mem_bytes_available.

pg_stat_global_memory_allocation view: added columns
max_total_backend_memory_bytes and max_total_bkend_mem_bytes_available.
max_total_backend_memory_bytes displays a byte representation of
max_total_backend_memory. max_total_bkend_mem_bytes_available tracks the
balance of max_total_backend_memory_bytes available to backend processes.

postgres=# select * from pg_stat_global_memory_allocation;
 datid | shared_memory_size | shared_memory_size_in_huge_pages | max_total_backend_memory_bytes | max_total_bkend_mem_bytes_available | global_dsm_allocated_bytes
-------+--------------------+----------------------------------+--------------------------------+-------------------------------------+----------------------------
     5 | 192MB              |                               96 |                     2147483648 |                          1874633712 |                    5242880
(1 row)

postgres=# select * from pg_stat_memory_allocation ;
 datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 534528 |          812472 |               812472 |                   0 |                          0 |                    0
       | 534529 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 556271 |         4458088 |              4458088 |                   0 |                          0 |                    0
     5 | 534942 |         1298680 |              1298680 |                   0 |                          0 |                    0
     5 | 709283 |         7985464 |              7985464 |                   0 |                          0 |                    0
     5 | 718693 |         8809240 |              8612504 |              196736 |                          0 |                    0
     5 | 752113 |        25803192 |             25803192 |                   0 |                          0 |                    0
     5 | 659886 |         9042232 |              9042232 |                   0 |                          0 |                    0
       | 534525 |         2491088 |              2491088 |                   0 |                          0 |                    0
       | 534524 |         4465360 |              4465360 |                   0 |                          0 |                    0
       | 534527 |          107216 |               107216 |                   0 |                          0 |                    0
(11 rows)

postgres=# select ps.datid, ps.pid, state,application_name,backend_type, pa.* from pg_stat_activity ps join pg_stat_memory_allocation pa on (pa.pid = ps.pid) order by dsm_allocated_bytes, pa.pid;
 datid |  pid   | state  | application_name |         backend_type         | datid |  pid   | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+--------+------------------+------------------------------+-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
       | 534524 |        |                  | checkpointer                 |       | 534524 |         4465360 |              4465360 |                   0 |                          0 |                    0
       | 534525 |        |                  | background writer            |       | 534525 |         2491088 |              2491088 |                   0 |                          0 |                    0
       | 534527 |        |                  | walwriter                    |       | 534527 |          107216 |               107216 |                   0 |                          0 |                    0
       | 534528 |        |                  | autovacuum launcher          |       | 534528 |          812472 |               812472 |                   0 |                          0 |                    0
       | 534529 |        |                  | logical replication launcher |       | 534529 |          736696 |               736696 |                   0 |                          0 |                    0
     5 | 534942 | idle   | psql             | client backend               |     5 | 534942 |         1298680 |              1298680 |                   0 |                          0 |                    0
     5 | 556271 | active | psql             | client backend               |     5 | 556271 |         4866576 |              4858536 |                   0 |                       8040 |                    0
     5 | 659886 | active |                  | autovacuum worker            |     5 | 659886 |         8993080 |              8993080 |                   0 |                          0 |                    0
     5 | 709283 | active |                  | autovacuum worker            |     5 | 709283 |         7928120 |              7928120 |                   0 |                          0 |                    0
     5 | 752113 | active |                  | autovacuum worker            |     5 | 752113 |        27935608 |             27935608 |                   0 |                          0 |                    0
     5 | 718693 | active | psql             | client backend               |     5 | 718693 |         8669976 |              8473240 |              196736 |                          0 |                    0
(11 rows)

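For readers following the allowance/refill description above, here is a minimal, self-contained sketch of the idea using C11 atomics rather than PostgreSQL's pg_atomic_* API. The names global_available, local_allowance, pool_init, take_from_pool and reserve are hypothetical and exist only for this illustration; the 1MB refill quantity is the value the patch uses. It is a simplification under those assumptions, not the code in the attached patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Refill quantity: 1MB, the value chosen in the patch. */
#define REFILL_QTY ((uint64_t) 1024 * 1024)

/* Hypothetical stand-ins for the shared pool and the per-process allowance. */
_Atomic uint64_t global_available;
uint64_t    local_allowance;

/* At startup the pool is the configured limit minus shared memory (both MB). */
void
pool_init(uint64_t limit_mb, uint64_t shared_mb)
{
    assert(limit_mb > shared_mb);
    atomic_store(&global_available, (limit_mb - shared_mb) * 1024 * 1024);
    local_allowance = 0;
}

/* Try to move "want" bytes from the shared pool into the local allowance. */
bool
take_from_pool(uint64_t want)
{
    uint64_t    avail = atomic_load(&global_available);

    while (avail >= want)
    {
        /* On failure, avail is reloaded with the pool's current value. */
        if (atomic_compare_exchange_weak(&global_available, &avail, avail - want))
        {
            local_allowance += want;
            return true;
        }
    }
    return false;               /* pool cannot cover "want" */
}

/* Returns true if the request fits; charges it to the local allowance. */
bool
reserve(uint64_t request)
{
    if (request > local_allowance)
    {
        /* Refill by 1MB, or by the full request when it is larger. */
        uint64_t    want = (request > REFILL_QTY) ? request : REFILL_QTY;

        /* If a full refill is not possible, fall back to the exact request. */
        if (!take_from_pool(want) && !take_from_pool(request))
            return false;       /* limit exceeded */
    }
    local_allowance -= request;
    return true;
}
```

With max_total_backend_memory = 2048MB and shared_memory_size = 192MB, pool_init(2048, 192) in this sketch leaves 1946157056 bytes in the pool before any process takes its allowance. Most requests are served from the local allowance, so the shared counter is only touched on a refill.
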
From 4dd47f04764b5df9c3962d9fdb4096398bf85dfd Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthomp...@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
 allocated to backends.

This builds on the work that adds backend memory allocated tracking.

Add GUC variable max_total_backend_memory.

Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
exhaust max_total_backend_memory memory will be denied with an out of memory
error causing that backend's current query/transaction to fail. Further
requests will not be allocated until dropping below the limit. Keep this in
mind when setting this value. Due to the dynamic nature of memory allocations,
this limit is not exact. This limit does not affect auxiliary backend
processes. Backend memory allocations are displayed in the
pg_stat_memory_allocation and pg_stat_global_memory_allocation views.
--- doc/src/sgml/config.sgml | 28 +++ doc/src/sgml/monitoring.sgml | 48 ++++- src/backend/catalog/system_views.sql | 6 +- src/backend/storage/ipc/dsm_impl.c | 18 ++ src/backend/storage/lmgr/proc.c | 45 +++++ src/backend/utils/activity/backend_status.c | 173 ++++++++++++++++++ src/backend/utils/adt/pgstatfuncs.c | 16 +- src/backend/utils/hash/dynahash.c | 3 +- src/backend/utils/misc/guc_tables.c | 11 ++ src/backend/utils/misc/postgresql.conf.sample | 3 + src/backend/utils/mmgr/aset.c | 33 ++++ src/backend/utils/mmgr/generation.c | 16 ++ src/backend/utils/mmgr/slab.c | 16 +- src/include/catalog/pg_proc.dat | 6 +- src/include/storage/proc.h | 7 + src/include/utils/backend_status.h | 87 ++++++++- src/test/regress/expected/rules.out | 4 +- 17 files changed, 499 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 481f93cea1..9f37f6f070 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2113,6 +2113,34 @@ include_dir 'conf.d' </listitem> </varlistentry> + <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory"> + <term><varname>max_total_backend_memory</varname> (<type>integer</type>) + <indexterm> + <primary><varname>max_total_backend_memory</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies a limit to the amount of memory (MB) that may be allocated to + backends in total (i.e. this is not a per user or per backend limit). + If unset, or set to 0 it is disabled. At databse startup + max_total_backend_memory is reduced by shared_memory_size_mb + (shared buffers). Each backend process is intialized with a 1MB local + allowance which also reduces max_total_bkend_mem_bytes_available. Keep + this in mind when setting this value. A backend request that would + exhaust the limit will be denied with an out of memory error causing + that backend's current query/transaction to fail. 
Further requests will + not be allocated until dropping below the limit. This limit does not + affect auxiliary backend processes + <xref linkend="glossary-auxiliary-proc"/>. Backend memory allocations + (<varname>allocated_bytes</varname>) are displayed in the + <link linkend="monitoring-pg-stat-memory-allocation-view"><structname>pg_stat_memory_allocation</structname></link> + view. Due to the dynamic nature of memory allocations, this limit is + not exact. + </para> + </listitem> + </varlistentry> + </variablelist> </sect2> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index d943821071..a67bd484f2 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5643,9 +5643,13 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <para> The <structname>pg_stat_memory_allocation</structname> view will have one row per server process, showing information related to the current memory - allocation of that process. Use <function>pg_size_pretty</function> - described in <xref linkend="functions-admin-dbsize"/> to make these values - more easily readable. + allocation of that process in total and by allocator type. Dynamic shared + memory allocations are included only in the value displayed for the backend + that created them, they are not included in the value for backends that are + attached to them to avoid double counting. Use + <function>pg_size_pretty</function> described in + <xref linkend="functions-admin-dbsize"/> to make these values more easily + readable. </para> <table id="pg-stat-memory-allocation-view" xreflabel="pg_stat_memory_allocation"> @@ -5687,10 +5691,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </para> <para> Memory currently allocated to this backend in bytes. This is the balance - of bytes allocated and freed by this backend. 
Dynamic shared memory - allocations are included only in the value displayed for the backend that - created them, they are not included in the value for backends that are - attached to them to avoid double counting. + of bytes allocated and freed by this backend. </para></entry> </row> @@ -5803,6 +5804,39 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </para></entry> </row> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>max_total_backend_memory_bytes</structfield> <type>bigint</type> + </para> + <para> + Reports the user defined backend maximum allowed shared memory in bytes. + 0 if disabled or not set. See + <xref linkend="guc-max-total-backend-memory"/>. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>max_total_bkend_mem_bytes_available</structfield> <type>bigint</type> + </para> + <para> + Tracks max_total_backend_memory (in bytes) available for allocation. At + database startup, max_total_bkend_mem_bytes_available is reduced by the + byte equivalent of shared_memory_size_mb. Each backend process is + intialized with a 1MB local allowance which also reduces + max_total_bkend_mem_bytes_available. A process's allocation requests + reduce it's local allowance. If a process's allocation request exceeds + it's remaining allowance, an attempt is made to refill the local + allowance from max_total_bkend_mem_bytes_available. If the refill request + fails, then the requesting process will fail with an out of memory error + resulting in the cancellation of that process's active query/transaction. + The default refill allocation quantity is 1MB. If a request is greater + than 1MB, an attempt will be made to allocate the full amount. If + max_total_backend_memory is disabled, this will be -1. + <xref linkend="guc-max-total-backend-memory"/>. 
+ </para></entry> + </row> + <row> <entry role="catalog_table_entry"><para role="column_definition"> <structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 4bbd992311..86bde2a44c 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1346,8 +1346,10 @@ CREATE VIEW pg_stat_memory_allocation AS CREATE VIEW pg_stat_global_memory_allocation AS SELECT S.datid AS datid, - current_setting('shared_memory_size'::text, true) AS shared_memory_size, - (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages, + current_setting('shared_memory_size', true) as shared_memory_size, + (current_setting('shared_memory_size_in_huge_pages', true))::integer as shared_memory_size_in_huge_pages, + pg_size_bytes(current_setting('max_total_backend_memory', true)) as max_total_backend_memory_bytes, + S.max_total_bkend_mem_bytes_available, S.global_dsm_allocated_bytes FROM pg_stat_get_global_memory_allocation() AS S LEFT JOIN pg_database AS D ON (S.datid = D.oid); diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c index 16e2bded59..68780de717 100644 --- a/src/backend/storage/ipc/dsm_impl.c +++ b/src/backend/storage/ipc/dsm_impl.c @@ -254,6 +254,16 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size, return true; } + /* Do not exceed maximum allowed memory allocation */ + if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size)) + { + ereport(elevel, + (errcode_for_dynamic_shared_memory(), + errmsg("out of memory for segment \"%s\" - exceeds max_total_backend_memory: %m", + name))); + return false; + } + /* * Create new segment or open an existing one for attach. 
* @@ -522,6 +532,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size, int flags = IPCProtection; size_t segsize; + /* Do not exceed maximum allowed memory allocation */ + if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size)) + return false; + /* * Allocate the memory BEFORE acquiring the resource, so that we don't * leak the resource if memory allocation fails. @@ -716,6 +730,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size, return true; } + /* Do not exceed maximum allowed memory allocation */ + if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size)) + return false; + /* Create new segment or open an existing one for attach. */ if (op == DSM_OP_CREATE) { diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index d86fbdfd9b..80db49d775 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -51,6 +51,7 @@ #include "storage/procsignal.h" #include "storage/spin.h" #include "storage/standby.h" +#include "utils/guc.h" #include "utils/timeout.h" #include "utils/timestamp.h" @@ -182,6 +183,50 @@ InitProcGlobal(void) pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO); pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0); + /* Setup backend memory limiting if configured */ + if (max_total_bkend_mem > 0) + { + /* + * Convert max_total_bkend_mem to bytes, account for shared_memory_size, + * and initialize max_total_bkend_mem_bytes. + */ + int result = 0; + + /* Get integer value of shared_memory_size */ + if (parse_int(GetConfigOption("shared_memory_size", true, false), &result, 0, NULL)) + { + /* + * Error on startup if backend memory limit is less than shared + * memory size. Warn on startup if backend memory available is less + * than arbitrarily picked value of 100MB. 
+ */ + + if (max_total_bkend_mem - result <= 0) + { + ereport(ERROR, + errmsg("configured max_total_backend_memory %dMB is <= shared_memory_size %dMB", + max_total_bkend_mem, result), + errhint("Disable or increase the configuration parameter \"max_total_backend_memory\".")); + } + else if (max_total_bkend_mem - result <= 100) + { + ereport(WARNING, + errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <= 100MB", + max_total_bkend_mem, result), + errhint("Consider increasing the configuration parameter \"max_total_backend_memory\".")); + } + + /* + * Account for shared memory size and initialize + * max_total_bkend_mem_bytes. + */ + pg_atomic_init_u64(&ProcGlobal->max_total_bkend_mem_bytes, + max_total_bkend_mem * 1024 * 1024 - result * 1024 * 1024); + } + else + ereport(ERROR, errmsg("max_total_backend_memory initialization is unable to parse shared_memory_size")); + } + /* * Create and initialize all the PGPROC structures we'll need. There are * five separate consumers: (1) normal backends, (2) autovacuum workers diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index f921c4bbde..a4f9c6eb35 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -45,6 +45,12 @@ bool pgstat_track_activities = false; int pgstat_track_activity_query_size = 1024; +/* + * Max backend memory allocation allowed (MB). 0 = disabled. + * Centralized bucket ProcGlobal->max_total_bkend_mem is initialized + * as a byte representation of this value in InitProcGlobal(). + */ +int max_total_bkend_mem = 0; /* exposed so that backend_progress.c can access it */ PgBackendStatus *MyBEEntry = NULL; @@ -68,6 +74,31 @@ uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes; uint64 local_my_slab_allocated_bytes = 0; uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes; +/* + * Define initial allocation allowance for a backend. 
+ * + * NOTE: initial_allocation_allowance && allocation_allowance_refill_qty + * may be candidates for future GUC variables. Arbitrary 1MB selected initially. + */ +uint64 initial_allocation_allowance = 1024 * 1024; +uint64 allocation_allowance_refill_qty = 1024 * 1024; + +/* + * Local counter to manage shared memory allocations. At backend startup, set to + * initial_allocation_allowance via pgstat_init_allocated_bytes(). Decrease as + * memory is malloc'd. When exhausted, atomically refill if available from + * ProcGlobal->max_total_bkend_mem via exceeds_max_total_bkend_mem(). + */ +uint64 allocation_allowance = 0; + +/* + * Local counter of free'd shared memory. Return to global + * max_total_bkend_mem when return threshold is met. Arbitrary 1MB bytes + * selected initially. + */ +uint64 allocation_return = 0; +uint64 allocation_return_threshold = 1024 * 1024; + static PgBackendStatus *BackendStatusArray = NULL; static char *BackendAppnameBuffer = NULL; static char *BackendClientHostnameBuffer = NULL; @@ -1271,6 +1302,8 @@ pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes, my_slab_allocated_bytes = slab_allocated_bytes; *slab_allocated_bytes = local_my_slab_allocated_bytes; + + return; } /* @@ -1294,6 +1327,23 @@ pgstat_reset_allocated_bytes_storage(void) *my_dsm_allocated_bytes); } + /* + * When limiting maximum backend memory, return this backend's memory + * allocations to global. 
+ */ + if (max_total_bkend_mem) + { + volatile PROC_HDR *procglobal = ProcGlobal; + + pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes, + *my_allocated_bytes + allocation_allowance + + allocation_return); + + /* Reset memory allocation variables */ + allocation_allowance = 0; + allocation_return = 0; + } + /* Reset memory allocation variables */ *my_allocated_bytes = local_my_allocated_bytes = 0; *my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0; @@ -1307,4 +1357,127 @@ pgstat_reset_allocated_bytes_storage(void) my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes; my_generation_allocated_bytes = &local_my_generation_allocated_bytes; my_slab_allocated_bytes = &local_my_slab_allocated_bytes; + + return; +} + +/* + * Determine if allocation request will exceed max backend memory allowed. + * Do not apply to auxiliary processes. + * Refill allocation request bucket when needed/possible. + */ +bool +exceeds_max_total_bkend_mem(uint64 allocation_request) +{ + bool result = false; + + /* + * When limiting maximum backend memory, attempt to refill allocation + * request bucket if needed. + */ + if (max_total_bkend_mem && allocation_request > allocation_allowance) + { + volatile PROC_HDR *procglobal = ProcGlobal; + uint64 available_max_total_bkend_mem = 0; + bool sts = false; + + /* + * If allocation request is larger than memory refill quantity then + * attempt to increase allocation allowance with requested amount, + * otherwise fall through. If this refill fails we do not have enough + * memory to meet the request. 
+ */ + if (allocation_request >= allocation_allowance_refill_qty) + { + while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_request) + { + if ((result = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes, + &available_max_total_bkend_mem, + available_max_total_bkend_mem - allocation_request))) + { + allocation_allowance = allocation_allowance + allocation_request; + break; + } + } + + /* + * If the atomic exchange fails, we do not have enough reserve + * memory to meet the request. Negate result to return the proper + * value. + */ + return !result; + } + + /* + * Attempt to increase allocation allowance by memory refill quantity. + * If available memory is/becomes less than memory refill quantity, + * fall through to attempt to allocate remaining available memory. + */ + while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_allowance_refill_qty) + { + if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes, + &available_max_total_bkend_mem, + available_max_total_bkend_mem - allocation_allowance_refill_qty))) + { + allocation_allowance = allocation_allowance + allocation_allowance_refill_qty; + break; + } + } + + if (!sts) + { + /* + * If available_max_total_bkend_mem is 0, no memory is currently + * available to refill with, otherwise attempt to allocate + * remaining memory available if it exceeds the requested amount + * or the requested amount if more than requested amount gets + * returned while looping. + */ + while ((available_max_total_bkend_mem = (int64) pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) > 0) + { + uint64 newval = 0; + + /* + * If available memory is less than requested allocation we + * cannot fulfil request. 
+ */ + if (available_max_total_bkend_mem < allocation_request) + break; + + /* + * If we happen to loop and a large chunk of memory has been + * returned to global, allocate request amount only. + */ + if (available_max_total_bkend_mem > allocation_request) + newval = available_max_total_bkend_mem - allocation_request; + + /* Allocate memory */ + if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes, + &available_max_total_bkend_mem, + newval))) + { + allocation_allowance = allocation_allowance + + newval == 0 ? available_max_total_bkend_mem : allocation_request; + + break; + } + } + } + + /* + * If refill is not successful, we return true, memory limit exceeded + */ + if (!sts) + result = true; + } + + /* + * Exclude auxiliary processes from the check. Return false. While we want + * to exclude them from the check, we do not want to exclude them from the + * above allocation handling. + */ + if (MyAuxProcType != NotAnAuxProcess) + result = false; + + return result; } diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index be973b1bdb..73cf3be4e3 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -2128,7 +2128,7 @@ pg_stat_get_memory_allocation(PG_FUNCTION_ARGS) Datum pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS) { -#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2 +#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 3 TupleDesc tupdesc; Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0}; bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0}; @@ -2138,15 +2138,23 @@ pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS) tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS); TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid", OIDOID, -1, 0); - TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes", + TupleDescInitEntry(tupdesc, (AttrNumber) 2, "max_total_bkend_mem_bytes_available", + INT8OID, 
-1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "global_dsm_allocated_bytes", INT8OID, -1, 0); BlessTupleDesc(tupdesc); /* datid */ values[0] = ObjectIdGetDatum(MyDatabaseId); - /* get global_dsm_allocated_bytes */ - values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation)); + /* Get max_total_bkend_mem_bytes - return -1 if disabled */ + if (max_total_bkend_mem == 0) + values[1] = Int64GetDatum(-1); + else + values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)); + + /* Get global_dsm_allocated_bytes */ + values[2] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation)); /* Returns the record as Datum */ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls))); diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c index 012d4a0b1f..cd68e5265a 100644 --- a/src/backend/utils/hash/dynahash.c +++ b/src/backend/utils/hash/dynahash.c @@ -104,7 +104,6 @@ #include "utils/dynahash.h" #include "utils/memutils.h" - /* * Constants * @@ -359,7 +358,6 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags) Assert(flags & HASH_ELEM); Assert(info->keysize > 0); Assert(info->entrysize >= info->keysize); - /* * For shared hash tables, we have a local hash header (HTAB struct) that * we allocate in TopMemoryContext; all else is in shared memory. 
@@ -377,6 +375,7 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags) } else { + /* Set up to allocate the hash header */ /* Create the hash table's private memory context */ if (flags & HASH_CONTEXT) CurrentDynaHashCxt = info->hcxt; diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 1c0583fe26..639b63138b 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -3468,6 +3468,17 @@ struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + { + {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM, + gettext_noop("Restrict total backend memory allocations to this max."), + gettext_noop("0 turns this feature off."), + GUC_UNIT_MB + }, + &max_total_bkend_mem, + 0, 0, INT_MAX, + NULL, NULL, NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index d06074b86f..bc2d449c87 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -156,6 +156,9 @@ # mmap # (change requires restart) #min_dynamic_shared_memory = 0MB # (change requires restart) +#max_total_backend_memory = 0MB # Restrict total backend memory allocations + # to this max (in MB). 0 turns this feature + # off. 
# - Disk - diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index f3f5945fdf..4a83a2f60f 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -440,6 +440,18 @@ AllocSetContextCreateInternal(MemoryContext parent, else firstBlockSize = Max(firstBlockSize, initBlockSize); + /* Do not exceed maximum allowed memory allocation */ + if (exceeds_max_total_bkend_mem(firstBlockSize)) + { + if (TopMemoryContext) + MemoryContextStats(TopMemoryContext); + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory - exceeds max_total_backend_memory"), + errdetail("Failed while creating memory context \"%s\".", + name))); + } + /* * Allocate the initial block. Unlike other aset.c blocks, it starts with * the context header and its block header follows that. @@ -741,6 +753,11 @@ AllocSetAlloc(MemoryContext context, Size size) #endif blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ; + + /* Do not exceed maximum allowed memory allocation */ + if (exceeds_max_total_bkend_mem(blksize)) + return NULL; + block = (AllocBlock) malloc(blksize); if (block == NULL) return NULL; @@ -938,6 +955,10 @@ AllocSetAlloc(MemoryContext context, Size size) while (blksize < required_size) blksize <<= 1; + /* Do not exceed maximum allowed memory allocation */ + if (exceeds_max_total_bkend_mem(blksize)) + return NULL; + /* Try to allocate it */ block = (AllocBlock) malloc(blksize); @@ -1176,6 +1197,18 @@ AllocSetRealloc(void *pointer, Size size) blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ; oldblksize = block->endptr - ((char *) block); + /* + * Do not exceed maximum allowed memory allocation. NOTE: checking for + * the full size here rather than just the amount of increased + * allocation to prevent a potential underflow of *my_allocation + * allowance in cases where blksize - oldblksize does not trigger a + * refill but blksize is greater than *my_allocation_allowance. 
+ * Underflow would occur with the call below to + * pgstat_report_allocated_bytes_increase() + */ + if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize)) + return NULL; + block = (AllocBlock) realloc(block, blksize); if (block == NULL) { diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c index 5708e8da7a..584b2ec8ef 100644 --- a/src/backend/utils/mmgr/generation.c +++ b/src/backend/utils/mmgr/generation.c @@ -201,6 +201,16 @@ GenerationContextCreate(MemoryContext parent, else allocSize = Max(allocSize, initBlockSize); + if (exceeds_max_total_bkend_mem(allocSize)) + { + MemoryContextStats(TopMemoryContext); + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory - exceeds max_total_backend_memory"), + errdetail("Failed while creating memory context \"%s\".", + name))); + } + /* * Allocate the initial block. Unlike other generation.c blocks, it * starts with the context header and its block header follows that. @@ -380,6 +390,9 @@ GenerationAlloc(MemoryContext context, Size size) { Size blksize = required_size + Generation_BLOCKHDRSZ; + if (exceeds_max_total_bkend_mem(blksize)) + return NULL; + block = (GenerationBlock *) malloc(blksize); if (block == NULL) return NULL; @@ -483,6 +496,9 @@ GenerationAlloc(MemoryContext context, Size size) if (blksize < required_size) blksize = pg_nextpower2_size_t(required_size); + if (exceeds_max_total_bkend_mem(blksize)) + return NULL; + block = (GenerationBlock *) malloc(blksize); if (block == NULL) diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c index 31814901f3..80e8b95071 100644 --- a/src/backend/utils/mmgr/slab.c +++ b/src/backend/utils/mmgr/slab.c @@ -356,9 +356,19 @@ SlabContextCreate(MemoryContext parent, elog(ERROR, "block size %zu for slab is too small for %zu-byte chunks", blockSize, chunkSize); - + /* Do not exceed maximum allowed memory allocation */ + if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock))) + { 
+ MemoryContextStats(TopMemoryContext); + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory - exceeds max_total_backend_memory"), + errdetail("Failed while creating memory context \"%s\".", + name))); + } slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock)); + if (slab == NULL) { MemoryContextStats(TopMemoryContext); @@ -560,6 +570,10 @@ SlabAlloc(MemoryContext context, Size size) } else { + /* Do not exceed maximum allowed memory allocation */ + if (exceeds_max_total_bkend_mem(slab->blockSize)) + return NULL; + block = (SlabBlock *) malloc(slab->blockSize); if (unlikely(block == NULL)) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index d6fbca4a1e..8937764a46 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5440,9 +5440,9 @@ descr => 'statistics: global memory allocation information', proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f', provolatile => 's', proparallel => 'r', prorettype => 'record', - proargtypes => '', proallargtypes => '{oid,int8}', - proargmodes => '{o,o}', - proargnames => '{datid,global_dsm_allocated_bytes}', + proargtypes => '', proallargtypes => '{oid,int8,int8}', + proargmodes => '{o,o,o}', + proargnames => '{datid,max_total_bkend_mem_bytes_available,global_dsm_allocated_bytes}', prosrc =>'pg_stat_get_global_memory_allocation' }, { oid => '2022', descr => 'statistics: information about currently active backends', diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index c2c878219d..a2a5364a85 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -406,6 +406,13 @@ typedef struct PROC_HDR int startupBufferPinWaitBufId; /* Global dsm allocations */ pg_atomic_uint64 global_dsm_allocation; + + /* + * Max backend memory allocation tracker. Used/Initialized when + * max_total_bkend_mem > 0 as max_total_bkend_mem (MB) converted to bytes. 
+ * Decreases on malloc and increases on free of backend memory. + */ + pg_atomic_uint64 max_total_bkend_mem_bytes; } PROC_HDR; extern PGDLLIMPORT PROC_HDR *ProcGlobal; diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 6434ece1ef..bca6fe10f3 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -15,6 +15,7 @@ #include "libpq/pqcomm.h" #include "miscadmin.h" /* for BackendType */ #include "storage/backendid.h" +#include "storage/proc.h" #include "utils/backend_progress.h" @@ -304,6 +305,7 @@ typedef struct LocalPgBackendStatus */ extern PGDLLIMPORT bool pgstat_track_activities; extern PGDLLIMPORT int pgstat_track_activity_query_size; +extern PGDLLIMPORT int max_total_bkend_mem; /* ---------- @@ -316,6 +318,10 @@ extern PGDLLIMPORT uint64 *my_aset_allocated_bytes; extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes; extern PGDLLIMPORT uint64 *my_generation_allocated_bytes; extern PGDLLIMPORT uint64 *my_slab_allocated_bytes; +extern PGDLLIMPORT uint64 allocation_allowance; +extern PGDLLIMPORT uint64 initial_allocation_allowance; +extern PGDLLIMPORT uint64 allocation_return; +extern PGDLLIMPORT uint64 allocation_return_threshold; /* ---------- @@ -363,6 +369,7 @@ extern int pgstat_fetch_stat_numbackends(void); extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid); extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid); extern char *pgstat_clip_activity(const char *raw_activity); +extern bool exceeds_max_total_bkend_mem(uint64 allocation_request); /* ---------- * pgstat_report_allocated_bytes_decrease() - @@ -384,6 +391,10 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes, /* On overflow, set pgstat count of allocated bytes to zero */ *my_allocated_bytes = 0; + /* Add freed memory to allocation return counter. 
*/ + allocation_return += proc_allocated_bytes; + + /* On overflow, set allocator type bytes to zero */ switch (pg_allocator_type) { case PG_ALLOC_ASET: @@ -399,13 +410,35 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes, *my_slab_allocated_bytes = 0; break; } + + /* + * Return freed memory to the global counter if return threshold is + * met. + */ + if (max_total_bkend_mem && allocation_return >= allocation_return_threshold) + { + if (ProcGlobal) + { + volatile PROC_HDR *procglobal = ProcGlobal; + + /* Add to global tracker */ + pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes, + allocation_return); + + /* Restart the count */ + allocation_return = 0; + } + } } else { /* decrease allocation */ *my_allocated_bytes -= proc_allocated_bytes; - /* Decrease allocator type allocated bytes. */ + /* Add freed memory to allocation return counter */ + allocation_return += proc_allocated_bytes; + + /* Decrease allocator type allocated bytes */ switch (pg_allocator_type) { case PG_ALLOC_ASET: @@ -427,6 +460,25 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes, *my_slab_allocated_bytes -= proc_allocated_bytes; break; } + + /* + * Return freed memory to the global counter if return threshold is + * met. 
+ */ + if (max_total_bkend_mem && allocation_return >= allocation_return_threshold) + { + if (ProcGlobal) + { + volatile PROC_HDR *procglobal = ProcGlobal; + + /* Add to global tracker */ + pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes, + allocation_return); + + /* Restart the count */ + allocation_return = 0; + } + } } return; @@ -444,6 +496,9 @@ static inline void pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes, int pg_allocator_type) { + /* Remove allocated memory from local allocation allowance */ + allocation_allowance -= proc_allocated_bytes; + *my_allocated_bytes += proc_allocated_bytes; /* Increase allocator type allocated bytes */ @@ -488,6 +543,36 @@ pgstat_init_allocated_bytes(void) *my_generation_allocated_bytes = 0; *my_slab_allocated_bytes = 0; + /* If we're limiting backend memory */ + if (max_total_bkend_mem) + { + volatile PROC_HDR *procglobal = ProcGlobal; + uint64 available_max_total_bkend_mem = 0; + + allocation_return = 0; + allocation_allowance = 0; + + /* Account for the initial allocation allowance */ + while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= initial_allocation_allowance) + { + /* + * On success populate allocation_allowance. Failure here will + * result in the backend's first invocation of + * exceeds_max_total_bkend_mem allocating requested, default, or + * available memory or result in an out of memory error. 
+ */ + if (pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes, + &available_max_total_bkend_mem, + available_max_total_bkend_mem - + initial_allocation_allowance)) + { + allocation_allowance = initial_allocation_allowance; + + break; + } + } + } + return; } diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 9cf035a74a..0edd7d387c 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1874,8 +1874,10 @@ pg_stat_database_conflicts| SELECT oid AS datid, pg_stat_global_memory_allocation| SELECT s.datid, current_setting('shared_memory_size'::text, true) AS shared_memory_size, (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages, + pg_size_bytes(current_setting('max_total_backend_memory'::text, true)) AS max_total_backend_memory_bytes, + s.max_total_bkend_mem_bytes_available, s.global_dsm_allocated_bytes - FROM (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes) + FROM (pg_stat_get_global_memory_allocation() s(datid, max_total_bkend_mem_bytes_available, global_dsm_allocated_bytes) LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_gssapi| SELECT pid, gss_auth AS gss_authenticated, -- 2.25.1
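The allowance handling in the patch above (reserve a chunk of the global budget with a compare-and-exchange loop in pgstat_init_allocated_bytes(), then hand freed memory back in batches once allocation_return reaches allocation_return_threshold) can be sketched in isolation as follows. This is a minimal illustration, not the patch's code: it uses C11 <stdatomic.h> in place of PostgreSQL's pg_atomic_* API, and the names global_available_bytes, local_allowance, local_return, reserve_allowance() and return_freed() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ProcGlobal->max_total_bkend_mem_bytes. */
static _Atomic uint64_t global_available_bytes;

/* Hypothetical per-process counters (cf. allocation_allowance, allocation_return). */
static uint64_t local_allowance;
static uint64_t local_return;

/*
 * Try to move "want" bytes from the shared budget into the local allowance.
 * Returns false, leaving the shared budget untouched, if too little remains.
 * Unlike the patch, which assigns the allowance once at startup, this sketch
 * accumulates so that it can be called repeatedly.
 */
static bool
reserve_allowance(uint64_t want)
{
    uint64_t available = atomic_load(&global_available_bytes);

    assert(want > 0);

    while (available >= want)
    {
        /* On failure, "available" is refreshed with the current shared value. */
        if (atomic_compare_exchange_weak(&global_available_bytes,
                                         &available, available - want))
        {
            local_allowance += want;
            return true;
        }
    }
    return false;
}

/*
 * Record "freed" bytes locally and only touch the shared counter once the
 * accumulated total reaches "threshold", which limits contention on it.
 */
static void
return_freed(uint64_t freed, uint64_t threshold)
{
    local_return += freed;

    if (local_return >= threshold)
    {
        atomic_fetch_add(&global_available_bytes, local_return);
        local_return = 0;       /* restart the count */
    }
}
```

Batching the returns trades accuracy of the shared counter for fewer atomic operations: between flushes, the shared value can understate the memory actually available by up to the threshold per process.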
From 752d40bcefa66afc8c73976990d3d5943c35bf0d Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthomp...@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated

Add tracking of backend memory allocated in total and by allocation
type (aset, dsm, generation, slab) by process.

allocated_bytes tracks the current bytes of memory allocated to the
backend process. aset_allocated_bytes, dsm_allocated_bytes,
generation_allocated_bytes and slab_allocated_bytes track the
allocation by type for the backend process. They are updated for the
process as memory is malloc'd/freed. Memory allocated to items on the
freelist is included. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them; to avoid
double counting, they are not included in the value for backends that
are attached to them. DSM allocations that are not destroyed by the
creating process prior to its exit are considered long-lived and are
tracked in the global counter global_dsm_allocated_bytes. Allocation
counters are clamped so that they never drop below zero. Created views
pg_stat_global_memory_allocation and pg_stat_memory_allocation for
access to these trackers.
--- doc/src/sgml/monitoring.sgml | 188 ++++++++++++++++++++ src/backend/catalog/system_views.sql | 21 +++ src/backend/storage/ipc/dsm.c | 11 +- src/backend/storage/ipc/dsm_impl.c | 78 ++++++++ src/backend/storage/lmgr/proc.c | 1 + src/backend/utils/activity/backend_status.c | 114 ++++++++++++ src/backend/utils/adt/pgstatfuncs.c | 84 +++++++++ src/backend/utils/init/miscinit.c | 3 + src/backend/utils/mmgr/aset.c | 17 ++ src/backend/utils/mmgr/generation.c | 15 ++ src/backend/utils/mmgr/slab.c | 23 +++ src/include/catalog/pg_proc.dat | 17 ++ src/include/storage/proc.h | 2 + src/include/utils/backend_status.h | 156 +++++++++++++++- src/test/regress/expected/rules.out | 15 ++ src/test/regress/expected/stats.out | 36 ++++ src/test/regress/sql/stats.sql | 20 +++ 17 files changed, 799 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 21e6ce2841..d943821071 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5633,6 +5633,194 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </sect2> + <sect2 id="monitoring-pg-stat-memory-allocation-view"> + <title><structname>pg_stat_memory_allocation</structname></title> + + <indexterm> + <primary>pg_stat_memory_allocation</primary> + </indexterm> + + <para> + The <structname>pg_stat_memory_allocation</structname> view will have one + row per server process, showing information related to the current memory + allocation of that process. Use <function>pg_size_pretty</function> + described in <xref linkend="functions-admin-dbsize"/> to make these values + more easily readable. 
+ </para> + + <table id="pg-stat-memory-allocation-view" xreflabel="pg_stat_memory_allocation"> + <title><structname>pg_stat_memory_allocation</structname> View</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + Column Type + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>datid</structfield> <type>oid</type> + </para> + <para> + OID of the database this backend is connected to + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>pid</structfield> <type>integer</type> + </para> + <para> + Process ID of this backend + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>allocated_bytes</structfield> <type>bigint</type> + </para> + <para> + Memory currently allocated to this backend in bytes. This is the balance + of bytes allocated and freed by this backend. Dynamic shared memory + allocations are included only in the value displayed for the backend that + created them, they are not included in the value for backends that are + attached to them to avoid double counting. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>aset_allocated_bytes</structfield> <type>bigint</type> + </para> + <para> + Memory currently allocated to this backend in bytes via the allocation + set allocator. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>dsm_allocated_bytes</structfield> <type>bigint</type> + </para> + <para> + Memory currently allocated to this backend in bytes via the dynamic + shared memory allocator. 
Upon process exit, dsm allocations that have + not been freed are considered long lived and added to + <structfield>global_dsm_allocated_bytes</structfield> found in the + pg_stat_global_memory_allocation view. See + <xref linkend="monitoring-pg-stat-global-memory-allocation-view"/>. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>generation_allocated_bytes</structfield> <type>bigint</type> + </para> + <para> + Memory currently allocated to this backend in bytes via the generation + allocator. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>slab_allocated_bytes</structfield> <type>bigint</type> + </para> + <para> + Memory currently allocated to this backend in bytes via the slab + allocator. + </para></entry> + </row> + + </tbody> + </tgroup> + </table> + + </sect2> + + <sect2 id="monitoring-pg-stat-global-memory-allocation-view"> + <title><structname>pg_stat_global_memory_allocation</structname></title> + + <indexterm> + <primary>pg_stat_global_memory_allocation</primary> + </indexterm> + + <para> + The <structname>pg_stat_global_memory_allocation</structname> view will + have one row showing information related to current shared memory + allocations. 
+ </para> + + <table id="pg-stat-global-memory-allocation-view" xreflabel="pg_stat_global_memory_allocation"> + <title><structname>pg_stat_global_memory_allocation</structname> View</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + Column Type + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>datid</structfield> <type>oid</type> + </para> + <para> + OID of the database this backend is connected to + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>shared_memory_size</structfield> <type>text</type> + </para> + <para> + Reports the size of the main shared memory area, rounded up to the + nearest megabyte. See <xref linkend="guc-shared-memory-size"/>. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>shared_memory_size_in_huge_pages</structfield> <type>integer</type> + </para> + <para> + Reports the number of huge pages that are needed for the main shared + memory area based on the specified huge_page_size. If huge pages are not + supported, this will be -1. See + <xref linkend="guc-shared-memory-size-in-huge-pages"/>. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type> + </para> + <para> + Long-lived dynamically allocated memory currently allocated to the + database. Use <function>pg_size_pretty</function> described in + <xref linkend="functions-admin-dbsize"/> to make this value more easily + readable. 
+ </para></entry> + </row> + + </tbody> + </tgroup> + </table> + + </sect2> + <sect2 id="monitoring-stats-functions"> <title>Statistics Functions</title> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 8ea159dbde..4bbd992311 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1330,3 +1330,24 @@ CREATE VIEW pg_stat_subscription_stats AS ss.stats_reset FROM pg_subscription as s, pg_stat_get_subscription_stats(s.oid) as ss; + +CREATE VIEW pg_stat_memory_allocation AS + SELECT + S.datid AS datid, + S.pid, + S.allocated_bytes, + S.aset_allocated_bytes, + S.dsm_allocated_bytes, + S.generation_allocated_bytes, + S.slab_allocated_bytes + FROM pg_stat_get_memory_allocation(NULL) AS S + LEFT JOIN pg_database AS D ON (S.datid = D.oid); + +CREATE VIEW pg_stat_global_memory_allocation AS + SELECT + S.datid AS datid, + current_setting('shared_memory_size'::text, true) AS shared_memory_size, + (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages, + S.global_dsm_allocated_bytes + FROM pg_stat_get_global_memory_allocation() AS S + LEFT JOIN pg_database AS D ON (S.datid = D.oid); diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c index 10b029bb16..64b1fecd1c 100644 --- a/src/backend/storage/ipc/dsm.c +++ b/src/backend/storage/ipc/dsm.c @@ -775,6 +775,15 @@ dsm_detach_all(void) void dsm_detach(dsm_segment *seg) { + /* + * Retain mapped_size to pass into destroy call in cases where the detach + * is the last reference. mapped_size is zeroed as part of the detach + * process, but is needed later in these cases for dsm_allocated_bytes + * accounting. + */ + Size local_seg_mapped_size = seg->mapped_size; + Size *ptr_local_seg_mapped_size = &local_seg_mapped_size; + /* * Invoke registered callbacks. 
Just in case one of those callbacks * throws a further error that brings us back here, pop the callback @@ -855,7 +864,7 @@ dsm_detach(dsm_segment *seg) */ if (is_main_region_dsm_handle(seg->handle) || dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private, - &seg->mapped_address, &seg->mapped_size, WARNING)) + &seg->mapped_address, ptr_local_seg_mapped_size, WARNING)) { LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE); if (is_main_region_dsm_handle(seg->handle)) diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c index f0965c3481..16e2bded59 100644 --- a/src/backend/storage/ipc/dsm_impl.c +++ b/src/backend/storage/ipc/dsm_impl.c @@ -66,6 +66,7 @@ #include "postmaster/postmaster.h" #include "storage/dsm_impl.h" #include "storage/fd.h" +#include "utils/backend_status.h" #include "utils/guc.h" #include "utils/memutils.h" @@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size, name))); return false; } + + /* + * Detach and destroy pass through here, only decrease the memory + * shown allocated in pg_stat_activity when the creator destroys the + * allocation. + */ + if (op == DSM_OP_DESTROY) + pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM); *mapped_address = NULL; *mapped_size = 0; if (op == DSM_OP_DESTROY && shm_unlink(name) != 0) @@ -332,6 +341,33 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size, name))); return false; } + + /* + * Attach and create pass through here, only update backend memory + * allocated in pg_stat_activity for the creator process. + */ + if (op == DSM_OP_CREATE) + { + /* + * Posix creation calls dsm_impl_posix_resize implying that resizing + * occurs or may be added in the future. As implemented + * dsm_impl_posix_resize utilizes fallocate or truncate, passing the + * whole new size as input, growing the allocation as needed (only + * truncate supports shrinking). We update by replacing the old + * allocation with the new. 
+ */ +#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__) + /* + * posix_fallocate does not shrink allocations, adjust only on + * allocation increase. + */ + if (request_size > *mapped_size) + pgstat_report_allocated_bytes_increase(request_size - *mapped_size, PG_ALLOC_DSM); +#else + pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM); + pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM); +#endif + } *mapped_address = address; *mapped_size = request_size; close(fd); @@ -537,6 +573,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size, name))); return false; } + + /* + * Detach and destroy pass through here, only decrease the memory + * shown allocated in pg_stat_activity when the creator destroys the + * allocation. + */ + if (op == DSM_OP_DESTROY) + pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM); *mapped_address = NULL; *mapped_size = 0; if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0) @@ -584,6 +628,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size, name))); return false; } + + /* + * Attach and create pass through here, only update backend memory + * allocated in pg_stat_activity for the creator process. + */ + if (op == DSM_OP_CREATE) + pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM); *mapped_address = address; *mapped_size = request_size; @@ -652,6 +703,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size, return false; } + /* + * Detach and destroy pass through here, only decrease the memory + * shown allocated in pg_stat_activity when the creator destroys the + * allocation. 
+ */ + if (op == DSM_OP_DESTROY) + pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM); *impl_private = NULL; *mapped_address = NULL; *mapped_size = 0; @@ -768,6 +826,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size, return false; } + /* + * Attach and create pass through here, only update backend memory + * allocated in pg_stat_activity for the creator process. + */ + if (op == DSM_OP_CREATE) + pgstat_report_allocated_bytes_increase(info.RegionSize, PG_ALLOC_DSM); *mapped_address = address; *mapped_size = info.RegionSize; *impl_private = hmap; @@ -812,6 +876,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size, name))); return false; } + + /* + * Detach and destroy pass through here, only decrease the memory + * shown allocated in pg_stat_activity when the creator destroys the + * allocation. + */ + pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM); *mapped_address = NULL; *mapped_size = 0; if (op == DSM_OP_DESTROY && unlink(name) != 0) @@ -933,6 +1004,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size, name))); return false; } + + /* + * Attach and create pass through here, only update backend memory + * allocated in pg_stat_activity for the creator process. + */ + if (op == DSM_OP_CREATE) + pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM); *mapped_address = address; *mapped_size = request_size; diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 22b4278610..d86fbdfd9b 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -180,6 +180,7 @@ InitProcGlobal(void) ProcGlobal->checkpointerLatch = NULL; pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO); pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO); + pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0); /* * Create and initialize all the PGPROC structures we'll need. 
There are diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 608d01ea0d..f921c4bbde 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -49,6 +49,24 @@ int pgstat_track_activity_query_size = 1024; /* exposed so that backend_progress.c can access it */ PgBackendStatus *MyBEEntry = NULL; +/* + * Memory allocated to this backend prior to pgstats initialization. Migrated to + * shared memory on pgstats initialization. + */ +uint64 local_my_allocated_bytes = 0; +uint64 *my_allocated_bytes = &local_my_allocated_bytes; + +/* Memory allocated to this backend by type prior to pgstats initialization. + * Migrated to shared memory on pgstats initialization + */ +uint64 local_my_aset_allocated_bytes = 0; +uint64 *my_aset_allocated_bytes = &local_my_aset_allocated_bytes; +uint64 local_my_dsm_allocated_bytes = 0; +uint64 *my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes; +uint64 local_my_generation_allocated_bytes = 0; +uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes; +uint64 local_my_slab_allocated_bytes = 0; +uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes; static PgBackendStatus *BackendStatusArray = NULL; static char *BackendAppnameBuffer = NULL; @@ -400,6 +418,32 @@ pgstat_bestart(void) lbeentry.st_progress_command_target = InvalidOid; lbeentry.st_query_id = UINT64CONST(0); + /* Alter allocation reporting from local storage to shared memory */ + pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes, + &MyBEEntry->aset_allocated_bytes, + &MyBEEntry->dsm_allocated_bytes, + &MyBEEntry->generation_allocated_bytes, + &MyBEEntry->slab_allocated_bytes); + + /* + * Populate sum of memory allocated prior to pgstats initialization to + * pgstats and zero the local variable. 
This is a += assignment because + * InitPostgres allocates memory after pgstat_beinit but prior to + * pgstat_bestart so we have allocations to both local and shared memory + * to combine. + */ + lbeentry.allocated_bytes += local_my_allocated_bytes; + local_my_allocated_bytes = 0; + lbeentry.aset_allocated_bytes += local_my_aset_allocated_bytes; + local_my_aset_allocated_bytes = 0; + + lbeentry.dsm_allocated_bytes += local_my_dsm_allocated_bytes; + local_my_dsm_allocated_bytes = 0; + lbeentry.generation_allocated_bytes += local_my_generation_allocated_bytes; + local_my_generation_allocated_bytes = 0; + lbeentry.slab_allocated_bytes += local_my_slab_allocated_bytes; + local_my_slab_allocated_bytes = 0; + /* * we don't zero st_progress_param here to save cycles; nobody should * examine it until st_progress_command has been set to something other @@ -459,6 +503,9 @@ pgstat_beshutdown_hook(int code, Datum arg) { volatile PgBackendStatus *beentry = MyBEEntry; + /* Stop reporting memory allocation changes to shared memory */ + pgstat_reset_allocated_bytes_storage(); + /* * Clear my status entry, following the protocol of bumping st_changecount * before and after. We use a volatile pointer here to ensure the @@ -1194,3 +1241,70 @@ pgstat_clip_activity(const char *raw_activity) return activity; } + +/* + * Configure bytes allocated reporting to report allocated bytes to + * shared memory. + * + * Expected to be called during backend startup (in pgstat_bestart), to point + * allocated bytes accounting into shared memory. 
+ */ +void +pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes, + uint64 *aset_allocated_bytes, + uint64 *dsm_allocated_bytes, + uint64 *generation_allocated_bytes, + uint64 *slab_allocated_bytes) +{ + /* Map allocations to shared memory */ + my_allocated_bytes = allocated_bytes; + *allocated_bytes = local_my_allocated_bytes; + + my_aset_allocated_bytes = aset_allocated_bytes; + *aset_allocated_bytes = local_my_aset_allocated_bytes; + + my_dsm_allocated_bytes = dsm_allocated_bytes; + *dsm_allocated_bytes = local_my_dsm_allocated_bytes; + + my_generation_allocated_bytes = generation_allocated_bytes; + *generation_allocated_bytes = local_my_generation_allocated_bytes; + + my_slab_allocated_bytes = slab_allocated_bytes; + *slab_allocated_bytes = local_my_slab_allocated_bytes; +} + +/* + * Reset allocated bytes storage location. + * + * Expected to be called during backend shutdown, before the locations set up + * by pgstat_set_allocated_bytes_storage become invalid. + */ +void +pgstat_reset_allocated_bytes_storage(void) +{ + if (ProcGlobal) + { + volatile PROC_HDR *procglobal = ProcGlobal; + + /* + * Add dsm allocations that have not been freed to global dsm + * accounting + */ + pg_atomic_add_fetch_u64(&procglobal->global_dsm_allocation, + *my_dsm_allocated_bytes); + } + + /* Reset memory allocation variables */ + *my_allocated_bytes = local_my_allocated_bytes = 0; + *my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0; + *my_dsm_allocated_bytes = local_my_dsm_allocated_bytes = 0; + *my_generation_allocated_bytes = local_my_generation_allocated_bytes = 0; + *my_slab_allocated_bytes = local_my_slab_allocated_bytes = 0; + + /* Point my_{*_}allocated_bytes from shared memory back to local */ + my_allocated_bytes = &local_my_allocated_bytes; + my_aset_allocated_bytes = &local_my_aset_allocated_bytes; + my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes; + my_generation_allocated_bytes = &local_my_generation_allocated_bytes; + my_slab_allocated_bytes = 
&local_my_slab_allocated_bytes;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 56119737c8..be973b1bdb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2067,3 +2067,87 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
 }
+
+/*
+ * Get the memory allocation of PG backends.
+ */
+Datum
+pg_stat_get_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_MEMORY_ALLOCATION_COLS 7
+	int num_backends = pgstat_fetch_stat_numbackends();
+	int curr_backend;
+	int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	InitMaterializedSRF(fcinfo, 0);
+
+	/* 1-based index */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		/* for each row */
+		Datum values[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+		bool nulls[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_fetch_stat_local_beentry(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* If looking for specific PID, ignore all the others */
+		if (pid != -1 && beentry->st_procpid != pid)
+			continue;
+
+		/* Values available to all callers */
+		if (beentry->st_databaseid != InvalidOid)
+			values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+		else
+			nulls[0] = true;
+
+		values[1] = Int32GetDatum(beentry->st_procpid);
+		values[2] = UInt64GetDatum(beentry->allocated_bytes);
+		values[3] = UInt64GetDatum(beentry->aset_allocated_bytes);
+		values[4] = UInt64GetDatum(beentry->dsm_allocated_bytes);
+		values[5] = UInt64GetDatum(beentry->generation_allocated_bytes);
+		values[6] = UInt64GetDatum(beentry->slab_allocated_bytes);
+
+		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+		/* If only a single backend was requested, and we found it, break. */
+		if (pid != -1)
+			break;
+	}
+
+	return (Datum) 0;
+}
+
+/*
+ * Get the global memory allocation statistics.
+ */
+Datum
+pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2
+	TupleDesc tupdesc;
+	Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+	bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+	volatile PROC_HDR *procglobal = ProcGlobal;
+
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid",
+					   OIDOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes",
+					   INT8OID, -1, 0);
+	BlessTupleDesc(tupdesc);
+
+	/* datid */
+	values[0] = ObjectIdGetDatum(MyDatabaseId);
+
+	/* get global_dsm_allocated_bytes */
+	values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..7b8eeb7dbb 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -171,6 +171,9 @@ InitPostmasterChild(void)
 				(errcode_for_socket_access(),
 				 errmsg_internal("could not set postmaster death monitoring pipe to FD_CLOEXEC mode: %m")));
 #endif
+
+	/* Init allocated bytes to avoid double counting parent allocation */
+	pgstat_init_allocated_bytes();
 }
 
 /*
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 2589941ec4..f3f5945fdf 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
 #include "postgres.h"
 
 #include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 #include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
 						name);
 
 	((MemoryContext) set)->mem_allocated = firstBlockSize;
+	pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_ASET);
 
 	return (MemoryContext) set;
 }
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
 	AllocSet set = (AllocSet) context;
 	AllocBlock block;
 	Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+	uint64 deallocation = 0;
 
 	Assert(AllocSetIsValid(set));
 
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
 		{
 			/* Normal case, release the block */
 			context->mem_allocated -= block->endptr - ((char *) block);
+			deallocation += block->endptr - ((char *) block);
 
 #ifdef CLOBBER_FREED_MEMORY
 			wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
 	}
 
 	Assert(context->mem_allocated == keepersize);
+	pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
 
 	/* Reset block size allocation sequence, too */
 	set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
 	AllocSet set = (AllocSet) context;
 	AllocBlock block = set->blocks;
 	Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+	uint64 deallocation = 0;
 
 	Assert(AllocSetIsValid(set));
 
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
 
 				freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
 				freelist->num_free--;
+				deallocation += oldset->header.mem_allocated;
 
 				/* All that remains is to free the header/initial block */
 				free(oldset);
 			}
 			Assert(freelist->num_free == 0);
+			pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
 		}
 
 		/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
 		AllocBlock next = block->next;
 
 		if (block != set->keeper)
+		{
 			context->mem_allocated -= block->endptr - ((char *) block);
+			deallocation += block->endptr - ((char *) block);
+		}
 
 #ifdef CLOBBER_FREED_MEMORY
 		wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
 	}
 
 	Assert(context->mem_allocated == keepersize);
+	pgstat_report_allocated_bytes_decrease(deallocation + context->mem_allocated, PG_ALLOC_ASET);
 
 	/* Finally, free the context header, including the keeper block */
 	free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 			return NULL;
 
 		context->mem_allocated += blksize;
+		pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
 
 		block->aset = set;
 		block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 			return NULL;
 
 		context->mem_allocated += blksize;
+		pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
 
 		block->aset = set;
 		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1041,6 +1055,7 @@ AllocSetFree(void *pointer)
 			block->next->prev = block->prev;
 
 		set->header.mem_allocated -= block->endptr - ((char *) block);
+		pgstat_report_allocated_bytes_decrease(block->endptr - ((char *) block), PG_ALLOC_ASET);
 
 #ifdef CLOBBER_FREED_MEMORY
 		wipe_mem(block, block->freeptr - ((char *) block));
@@ -1171,7 +1186,9 @@ AllocSetRealloc(void *pointer, Size size)
 
 		/* updated separately, not to underflow when (oldblksize > blksize) */
 		set->header.mem_allocated -= oldblksize;
+		pgstat_report_allocated_bytes_decrease(oldblksize, PG_ALLOC_ASET);
 		set->header.mem_allocated += blksize;
+		pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
 
 		block->freeptr = block->endptr = ((char *) block) + blksize;
 
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index ebcb61e9b6..5708e8da7a 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
 
 #include "lib/ilist.h"
 #include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 #include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
 						name);
 
 	((MemoryContext) set)->mem_allocated = firstBlockSize;
+	pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_GENERATION);
 
 	return (MemoryContext) set;
 }
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
 {
 	GenerationContext *set = (GenerationContext *) context;
 	dlist_mutable_iter miter;
+	uint64 deallocation = 0;
 
 	Assert(GenerationIsValid(set));
 
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
 		if (block == set->keeper)
 			GenerationBlockMarkEmpty(block);
 		else
+		{
+			deallocation += block->blksize;
 			GenerationBlockFree(set, block);
+		}
 	}
 
+	pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_GENERATION);
+
 	/* set it so new allocations to make use of the keeper block */
 	set->block = set->keeper;
 
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
 {
 	/* Reset to release all releasable GenerationBlocks */
 	GenerationReset(context);
+
+	pgstat_report_allocated_bytes_decrease(context->mem_allocated, PG_ALLOC_GENERATION);
+
 	/* And free the context header and keeper block */
 	free(context);
 }
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
 			return NULL;
 
 		context->mem_allocated += blksize;
+		pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
 
 		/* block with a single (used) chunk */
 		block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
 				return NULL;
 
 			context->mem_allocated += blksize;
+			pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
 
 			/* initialize the new block */
 			GenerationBlockInit(set, block, blksize);
@@ -729,6 +742,8 @@ GenerationFree(void *pointer)
 	dlist_delete(&block->node);
 
 	set->header.mem_allocated -= block->blksize;
+	pgstat_report_allocated_bytes_decrease(block->blksize, PG_ALLOC_GENERATION);
+
 	free(block);
 }
 
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 33dca0f37c..31814901f3 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -69,6 +69,7 @@
 #include "postgres.h"
 
 #include "lib/ilist.h"
+#include "utils/backend_status.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 #include "utils/memutils_memorychunk.h"
@@ -413,6 +414,13 @@ SlabContextCreate(MemoryContext parent,
 						parent,
 						name);
 
+	/*
+	 * If SlabContextCreate is updated to add context header size to
+	 * context->mem_allocated, then update here and SlabDelete appropriately
+	 */
+	pgstat_report_allocated_bytes_increase(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
+										   PG_ALLOC_SLAB);
+
 	return (MemoryContext) slab;
 }
 
@@ -429,6 +437,7 @@ SlabReset(MemoryContext context)
 	SlabContext *slab = (SlabContext *) context;
 	dlist_mutable_iter miter;
 	int i;
+	uint64 deallocation = 0;
 
 	Assert(SlabIsValid(slab));
 
@@ -449,6 +458,7 @@ SlabReset(MemoryContext context)
 #endif
 		free(block);
 		context->mem_allocated -= slab->blockSize;
+		deallocation += slab->blockSize;
 	}
 
 	/* walk over blocklist and free the blocks */
@@ -465,9 +475,11 @@ SlabReset(MemoryContext context)
 #endif
 			free(block);
 			context->mem_allocated -= slab->blockSize;
+			deallocation += slab->blockSize;
 		}
 	}
 
+	pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_SLAB);
 	slab->curBlocklistIndex = 0;
 
 	Assert(context->mem_allocated == 0);
@@ -480,8 +492,17 @@ SlabReset(MemoryContext context)
 void
 SlabDelete(MemoryContext context)
 {
+
 	/* Reset to release all the SlabBlocks */
 	SlabReset(context);
+
+	/*
+	 * Until context header allocation is included in context->mem_allocated,
+	 * cast to slab and decrement the header allocation
+	 */
+	pgstat_report_allocated_bytes_decrease(Slab_CONTEXT_HDRSZ(((SlabContext *) context)->chunksPerBlock),
+										   PG_ALLOC_SLAB);
+
 	/* And free the context header */
 	free(context);
 }
@@ -546,6 +567,7 @@ SlabAlloc(MemoryContext context, Size size)
 
 			block->slab = slab;
 			context->mem_allocated += slab->blockSize;
+			pgstat_report_allocated_bytes_increase(slab->blockSize, PG_ALLOC_SLAB);
 
 			/* use the first chunk in the new block */
 			chunk = SlabBlockGetChunk(slab, block, 0);
@@ -732,6 +754,7 @@ SlabFree(void *pointer)
 #endif
 			free(block);
 			slab->header.mem_allocated -= slab->blockSize;
+			pgstat_report_allocated_bytes_decrease(slab->blockSize, PG_ALLOC_SLAB);
 		}
 
 		/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c358cff16..d6fbca4a1e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5427,6 +5427,23 @@
   proname => 'pg_stat_get_backend_idset', prorows => '100', proretset => 't',
   provolatile => 's', proparallel => 'r', prorettype => 'int4',
   proargtypes => '', prosrc => 'pg_stat_get_backend_idset' },
+{ oid => '9890',
+  descr => 'statistics: memory allocation information for backends',
+  proname => 'pg_stat_get_memory_allocation', prorows => '100', proisstrict => 'f',
+  proretset => 't', provolatile => 's', proparallel => 'r',
+  prorettype => 'record', proargtypes => 'int4',
+  proallargtypes => '{int4,oid,int4,int8,int8,int8,int8,int8}',
+  proargmodes => '{i,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,allocated_bytes,aset_allocated_bytes,dsm_allocated_bytes,generation_allocated_bytes,slab_allocated_bytes}',
+  prosrc => 'pg_stat_get_memory_allocation' },
+{ oid => '9891',
+  descr => 'statistics: global memory allocation information',
+  proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f',
+  provolatile => 's', proparallel => 'r', prorettype => 'record',
+  proargtypes => '', proallargtypes => '{oid,int8}',
+  proargmodes => '{o,o}',
+  proargnames => '{datid,global_dsm_allocated_bytes}',
+  prosrc =>'pg_stat_get_global_memory_allocation' },
 { oid => '2022',
   descr => 'statistics: information about currently active backends',
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4258cd92c9..c2c878219d 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -404,6 +404,8 @@ typedef struct PROC_HDR
 	int spins_per_delay;
 	/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
 	int startupBufferPinWaitBufId;
+	/* Global dsm allocations */
+	pg_atomic_uint64 global_dsm_allocation;
 } PROC_HDR;
 
 extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index f7bd83113a..6434ece1ef 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -10,6 +10,7 @@
 #ifndef BACKEND_STATUS_H
 #define BACKEND_STATUS_H
 
+#include "common/int.h"
 #include "datatype/timestamp.h"
 #include "libpq/pqcomm.h"
 #include "miscadmin.h" /* for BackendType */
@@ -32,6 +33,14 @@ typedef enum BackendState
 	STATE_DISABLED
 } BackendState;
 
+/* Enum helper for reporting memory allocator type */
+enum pg_allocator_type
+{
+	PG_ALLOC_ASET = 1,
+	PG_ALLOC_DSM,
+	PG_ALLOC_GENERATION,
+	PG_ALLOC_SLAB
+};
 
 /* ----------
  * Shared-memory data structures
@@ -169,6 +178,15 @@ typedef struct PgBackendStatus
 
 	/* query identifier, optionally computed using post_parse_analyze_hook */
 	uint64 st_query_id;
+
+	/* Current memory allocated to this backend */
+	uint64 allocated_bytes;
+
+	/* Current memory allocated to this backend by type */
+	uint64 aset_allocated_bytes;
+	uint64 dsm_allocated_bytes;
+	uint64 generation_allocated_bytes;
+	uint64 slab_allocated_bytes;
 } PgBackendStatus;
 
 
@@ -293,6 +311,11 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
  * ----------
  */
 extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_aset_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_generation_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_slab_allocated_bytes;
 
 
 /* ----------
@@ -324,7 +347,12 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
 extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
 													   int buflen);
 extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
+											   uint64 *aset_allocated_bytes,
+											   uint64 *dsm_allocated_bytes,
+											   uint64 *generation_allocated_bytes,
+											   uint64 *slab_allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -336,5 +364,131 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
 extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
 extern char *pgstat_clip_activity(const char *raw_activity);
 
+/* ----------
+ * pgstat_report_allocated_bytes_decrease() -
+ *  Called to report decrease in memory allocated for this backend.
+ *
+ * my_{*_}allocated_bytes initially points to local memory, making it safe to
+ * call this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
+									   int pg_allocator_type)
+{
+	uint64 temp;
+
+	/* Avoid allocated_bytes unsigned integer overflow on decrease */
+	if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp))
+	{
+		/* On overflow, set pgstat count of allocated bytes to zero */
+		*my_allocated_bytes = 0;
+
+		switch (pg_allocator_type)
+		{
+			case PG_ALLOC_ASET:
+				*my_aset_allocated_bytes = 0;
+				break;
+			case PG_ALLOC_DSM:
+				*my_dsm_allocated_bytes = 0;
+				break;
+			case PG_ALLOC_GENERATION:
+				*my_generation_allocated_bytes = 0;
+				break;
+			case PG_ALLOC_SLAB:
+				*my_slab_allocated_bytes = 0;
+				break;
+		}
+	}
+	else
+	{
+		/* decrease allocation */
+		*my_allocated_bytes -= proc_allocated_bytes;
+
+		/* Decrease allocator type allocated bytes. */
+		switch (pg_allocator_type)
+		{
+			case PG_ALLOC_ASET:
+				*my_aset_allocated_bytes -= proc_allocated_bytes;
+				break;
+			case PG_ALLOC_DSM:
+
+				/*
+				 * Some dsm allocations live beyond process exit. These are
+				 * accounted for in a global counter in
+				 * pgstat_reset_allocated_bytes_storage at process exit.
+				 */
+				*my_dsm_allocated_bytes -= proc_allocated_bytes;
+				break;
+			case PG_ALLOC_GENERATION:
+				*my_generation_allocated_bytes -= proc_allocated_bytes;
+				break;
+			case PG_ALLOC_SLAB:
+				*my_slab_allocated_bytes -= proc_allocated_bytes;
+				break;
+		}
+	}
+
+	return;
+}
+
+/* ----------
+ * pgstat_report_allocated_bytes_increase() -
+ *  Called to report increase in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
+									   int pg_allocator_type)
+{
+	*my_allocated_bytes += proc_allocated_bytes;
+
+	/* Increase allocator type allocated bytes */
+	switch (pg_allocator_type)
+	{
+		case PG_ALLOC_ASET:
+			*my_aset_allocated_bytes += proc_allocated_bytes;
+			break;
+		case PG_ALLOC_DSM:
+
+			/*
+			 * Some dsm allocations live beyond process exit. These are
+			 * accounted for in a global counter in
+			 * pgstat_reset_allocated_bytes_storage at process exit.
+			 */
+			*my_dsm_allocated_bytes += proc_allocated_bytes;
+			break;
+		case PG_ALLOC_GENERATION:
+			*my_generation_allocated_bytes += proc_allocated_bytes;
+			break;
+		case PG_ALLOC_SLAB:
+			*my_slab_allocated_bytes += proc_allocated_bytes;
+			break;
+	}
+
+	return;
+}
+
+/* ---------
+ * pgstat_init_allocated_bytes() -
+ *
+ * Called to initialize allocated bytes variables after fork and to
+ * avoid double counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_init_allocated_bytes(void)
+{
+	*my_allocated_bytes = 0;
+	*my_aset_allocated_bytes = 0;
+	*my_dsm_allocated_bytes = 0;
+	*my_generation_allocated_bytes = 0;
+	*my_slab_allocated_bytes = 0;
+
+	return;
+}
 
 #endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 996d22b7dd..9cf035a74a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1871,6 +1871,12 @@ pg_stat_database_conflicts| SELECT oid AS datid,
     pg_stat_get_db_conflict_bufferpin(oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_global_memory_allocation| SELECT s.datid,
+    current_setting('shared_memory_size'::text, true) AS shared_memory_size,
+    (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+    s.global_dsm_allocated_bytes
+   FROM (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_gssapi| SELECT pid,
     gss_auth AS gss_authenticated,
     gss_princ AS principal,
@@ -1889,6 +1895,15 @@ pg_stat_io| SELECT backend_type,
     fsyncs,
     stats_reset
    FROM pg_stat_get_io() b(backend_type, io_object, io_context, reads, writes, extends, op_bytes, evictions, reuses, fsyncs, stats_reset);
+pg_stat_memory_allocation| SELECT s.datid,
+    s.pid,
+    s.allocated_bytes,
+    s.aset_allocated_bytes,
+    s.dsm_allocated_bytes,
+    s.generation_allocated_bytes,
+    s.slab_allocated_bytes
+   FROM (pg_stat_get_memory_allocation(NULL::integer) s(datid, pid, allocated_bytes, aset_allocated_bytes, dsm_allocated_bytes, generation_allocated_bytes, slab_allocated_bytes)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 55b4c6df01..5fad38d49d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1469,4 +1469,40 @@ SELECT COUNT(*) FROM brin_hot_3 WHERE a = 2;
 
 DROP TABLE brin_hot_3;
 SET enable_seqscan = on;
+-- ensure that allocated_bytes exist for backends
+SELECT
+    allocated_bytes > 0 AS result
+FROM
+    pg_stat_activity ps
+    JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+    backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+    datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+    pg_stat_global_memory_allocation;
+ ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------
+ t        | t        | t        | t
+(1 row)
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+    pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+    pg_stat_memory_allocation limit 1;
+ ?column? | ?column? | ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------+----------+----------
+ t        | t        | t        | t        | t        | t
+(1 row)
+
 -- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d958e70a86..e768f3df84 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -763,4 +763,24 @@ DROP TABLE brin_hot_3;
 
 SET enable_seqscan = on;
 
+-- ensure that allocated_bytes exist for backends
+SELECT
+    allocated_bytes > 0 AS result
+FROM
+    pg_stat_activity ps
+    JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+    backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+    datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+    pg_stat_global_memory_allocation;
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+    pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+    pg_stat_memory_allocation limit 1;
 -- End of Stats Test
-- 
2.25.1
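For anyone reviewing the counter bookkeeping without a PostgreSQL tree at hand, here is a minimal standalone sketch of what the inline helpers in backend_status.h do: keep a total plus one counter per allocator type, and never let a decrease wrap below zero. This is not code from the patch. AllocCounters, alloc_type, report_increase, report_decrease and saturating_sub are hypothetical names chosen for illustration, and the counters live in a plain struct rather than behind the my_*_allocated_bytes pointers. One deliberate difference: the patch tests only the total for underflow, while this sketch clamps the total and the per-type counter independently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's enum pg_allocator_type. */
enum alloc_type
{
	ALLOC_ASET,
	ALLOC_DSM,
	ALLOC_GENERATION,
	ALLOC_SLAB,
	ALLOC_NTYPES
};

/* Hypothetical per-process counters (the patch keeps these in PgBackendStatus). */
typedef struct AllocCounters
{
	uint64_t	total;
	uint64_t	by_type[ALLOC_NTYPES];
} AllocCounters;

/* Subtract without wrapping: clamp to zero if more is freed than was counted. */
static inline uint64_t
saturating_sub(uint64_t value, uint64_t bytes)
{
	return (bytes > value) ? 0 : value - bytes;
}

/* Record an allocation of 'bytes' made by the given allocator type. */
static inline void
report_increase(AllocCounters *c, uint64_t bytes, enum alloc_type type)
{
	assert(type < ALLOC_NTYPES);
	c->total += bytes;
	c->by_type[type] += bytes;
}

/* Record a release of 'bytes'; both counters are clamped at zero. */
static inline void
report_decrease(AllocCounters *c, uint64_t bytes, enum alloc_type type)
{
	assert(type < ALLOC_NTYPES);
	c->total = saturating_sub(c->total, bytes);
	c->by_type[type] = saturating_sub(c->by_type[type], bytes);
}
```

A caller would zero an AllocCounters once per process (the analogue of calling pgstat_init_allocated_bytes from InitPostmasterChild) and then pair every malloc of a block with report_increase and every free with report_decrease.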