On Fri, Mar 14, 2025 at 1:08 PM Bertrand Drouvot <bertranddrouvot...@gmail.com> wrote:
> On Fri, Mar 14, 2025 at 11:05:28AM +0100, Jakub Wartak wrote:
> > On Thu, Mar 13, 2025 at 3:15 PM Bertrand Drouvot
> > <bertranddrouvot...@gmail.com> wrote:
> >
> > Hi,
> >
> > Thank you very much for the review! I'm answering to both reviews in
> > one go and the results is attached v12, seems it all should be solved
> > now:
>
> Thanks for v12!
>
> I'll review 0001 and 0003 later, but want to share what I've done for 0002.
>
> I did prepare a patch file (attached as .txt to not disturb the cfbot) to apply
> on top of v11 0002 (I just rebased it a bit so that it now applies on top of
> v12 0002).
Hey Bertrand,

all LGTM (good ideas), so here's v13 attached with all of that applied
(rebased, tested). BTW: I'm sending this to make cfbot happy, as it still
tried to apply that .patch (on my side it was a .patch, not a .txt).

> === 9
>
> -static bool firstUseInBackend = true;
> +static bool firstNumaTouch = true;
>
> Looks better to me but still not 100% convinced by the name.

IMHO, yes, it looks much better.

> === 10
>
> static BufferCachePagesContext *
> -pg_buffercache_init_entries(FuncCallContext *funcctx, PG_FUNCTION_ARGS)
> +pg_buffercache_init_entries(FuncCallContext *funcctx, FunctionCallInfo fcinfo)
>
> as PG_FUNCTION_ARGS is usually used for fmgr-compatible function and there is
> a lot of examples in the code that make use of "FunctionCallInfo" for non
> fmgr-compatible function.

Cool, thanks.

> and also:
>
> === 11
>
> I don't like the fact that we iterate 2 times over NBuffers in
> pg_buffercache_numa_pages().
>
> But I'm having a hard time finding a better approach given the fact that
> pg_numa_query_pages() needs all the pointers "prepared" before it can be
> called...
>
> Those 2 loops are probably the best approach, unless someone has a better
> idea.

IMHO, it doesn't hurt and I've also not been able to think of any better idea.

> === 12
>
> Upthread you asked "Can you please take a look again on this, is this up to the
> project standards?"
>
> Was the question about using pg_buffercache_numa_prepare_ptrs() as an inlined
> wrapper?

Yes, this was for an earlier doubt regarding question "19" about reviewing
the code after removal of the `query_numa` variable. This is the same code
for 2 loops, IMHO it is good now.

> What do you think? The comments, doc and code changes are just proposals and are
> fully open to discussion.

They are great, thank you!

-J.
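For readers following point 11, the two-loop shape can be sketched roughly as
below. This is only an illustration, not the code from 0002: the names
collect_page_nodes, query_pages_fn and fake_query are hypothetical, and the
callback merely stands in for pg_numa_query_pages(), which needs every page
pointer prepared before the one batched call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_numa_query_pages(): fills status[i] per page. */
typedef int (*query_pages_fn) (unsigned long count, void **pages, int *status);

/*
 * Loop 1 prepares one pointer per OS page; only then can the batched query
 * run. The caller performs loop 2 over status[] to consume the node IDs.
 * Returns whatever the query callback returns (0 on success here).
 */
static int
collect_page_nodes(char *base, size_t page_size, unsigned long npages,
				   void **pages, int *status, query_pages_fn query)
{
	unsigned long i;

	assert(base != NULL && pages != NULL && status != NULL && query != NULL);

	/* Loop 1: compute the address of every page in the range. */
	for (i = 0; i < npages; i++)
		pages[i] = base + i * page_size;

	/* A single call covers all pages, which is why the loops are split. */
	return query(npages, pages, status);
}

/* Demonstration-only query: pretends pages alternate between nodes 0 and 1. */
static int
fake_query(unsigned long count, void **pages, int *status)
{
	unsigned long i;

	(void) pages;
	for (i = 0; i < count; i++)
		status[i] = (int) (i % 2);
	return 0;
}
```

A caller would pass a real query wrapper in place of fake_query and then walk
status[] in a second loop, which matches the "prepare, then query" constraint
mentioned above.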
From 1a55056446fff06e0441d8d05a9e84832dbdc821 Mon Sep 17 00:00:00 2001 From: Jakub Wartak <jakub.wartak@enterprisedb.com> Date: Fri, 21 Feb 2025 10:19:35 +0100 Subject: [PATCH v13 1/3] Add optional dependency to libnuma (Linux-only) for basic NUMA awareness routines and add minimal src/port/pg_numa.c portability wrapper. Other platforms can be added later. This also adds function pg_numa_available() that can be used to check if the server was linked with NUMA support. libnuma is unavailable on 32-bit builds, so due to lack of i386 shared object, we disable it there (it does not make sense anyway on i386 it is very memory limited platform even with PAE) Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com --- .cirrus.tasks.yml | 12 +- configure | 87 ++++++++++++++ configure.ac | 13 +++ doc/src/sgml/func.sgml | 13 +++ doc/src/sgml/installation.sgml | 21 ++++ meson.build | 23 ++++ meson_options.txt | 3 + src/Makefile.global.in | 1 + src/backend/utils/misc/guc_tables.c | 2 +- src/include/catalog/pg_proc.dat | 4 + src/include/pg_config.h.in | 3 + src/include/port/pg_numa.h | 46 ++++++++ src/include/storage/pg_shmem.h | 1 + src/makefiles/meson.build | 3 + src/port/Makefile | 1 + src/port/meson.build | 1 + src/port/pg_numa.c | 168 ++++++++++++++++++++++++++++ 17 files changed, 397 insertions(+), 5 deletions(-) create mode 100644 src/include/port/pg_numa.h create mode 100644 src/port/pg_numa.c diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index 5849cbb839a..7010dff7aef 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -445,8 +445,10 @@ task: EOF setup_additional_packages_script: | - #apt-get update - #DEBIAN_FRONTEND=noninteractive apt-get -y install ... 
+ apt-get update + DEBIAN_FRONTEND=noninteractive apt-get -y install \ + libnuma1 \ + libnuma-dev matrix: # SPECIAL: @@ -471,6 +473,7 @@ task: --enable-cassert --enable-injection-points --enable-debug \ --enable-tap-tests --enable-nls \ --with-segsize-blocks=6 \ + --with-libnuma \ \ ${LINUX_CONFIGURE_FEATURES} \ \ @@ -519,6 +522,7 @@ task: -Dllvm=disabled \ --pkg-config-path /usr/lib/i386-linux-gnu/pkgconfig/ \ -DPERL=perl5.36-i386-linux-gnu \ + -Dlibnuma=disabled \ build-32 EOF @@ -835,8 +839,8 @@ task: folder: $CCACHE_DIR setup_additional_packages_script: | - #apt-get update - #DEBIAN_FRONTEND=noninteractive apt-get -y install ... + apt-get update + DEBIAN_FRONTEND=noninteractive apt-get -y install libnuma1 libnuma-dev ### # Test that code can be built with gcc/clang without warnings diff --git a/configure b/configure index 93fddd69981..23c33dd9971 100755 --- a/configure +++ b/configure @@ -711,6 +711,7 @@ with_libxml LIBCURL_LIBS LIBCURL_CFLAGS with_libcurl +with_libnuma with_uuid with_readline with_systemd @@ -868,6 +869,7 @@ with_libedit_preferred with_uuid with_ossp_uuid with_libcurl +with_libnuma with_libxml with_libxslt with_system_tzdata @@ -1581,6 +1583,7 @@ Optional Packages: --with-uuid=LIB build contrib/uuid-ossp using LIB (bsd,e2fs,ossp) --with-ossp-uuid obsolete spelling of --with-uuid=ossp --with-libcurl build with libcurl support + --with-libnuma build with libnuma support --with-libxml build with XML support --with-libxslt use XSLT support when building contrib/xml2 --with-system-tzdata=DIR @@ -9140,6 +9143,33 @@ fi +# +# NUMA +# + + + +# Check whether --with-libnuma was given. +if test "${with_libnuma+set}" = set; then : + withval=$with_libnuma; + case $withval in + yes) + +$as_echo "#define USE_LIBNUMA 1" >>confdefs.h + + ;; + no) + : + ;; + *) + as_fn_error $? 
"no argument expected for --with-libnuma option" "$LINENO" 5 + ;; + esac + +else + with_libnuma=no + +fi @@ -12378,6 +12408,63 @@ fi fi +if test "$with_libnuma" = yes ; then + + ac_fn_c_check_header_mongrel "$LINENO" "numa.h" "ac_cv_header_numa_h" "$ac_includes_default" +if test "x$ac_cv_header_numa_h" = xyes; then : + +else + as_fn_error $? "header file <numa.h> is required for --with-libnuma" "$LINENO" 5 +fi + + + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for numa_available in -lnuma" >&5 +$as_echo_n "checking for numa_available in -lnuma... " >&6; } +if ${ac_cv_lib_numa_numa_available+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_check_lib_save_LIBS=$LIBS +LIBS="-lnuma $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +#ifdef __cplusplus +extern "C" +#endif +char numa_available (); +int +main () +{ +return numa_available (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + ac_cv_lib_numa_numa_available=yes +else + ac_cv_lib_numa_numa_available=no +fi +rm -f core conftest.err conftest.$ac_objext \ + conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi + +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_numa_numa_available" >&5 +$as_echo "$ac_cv_lib_numa_numa_available" >&6; } +if test "x$ac_cv_lib_numa_numa_available" = xyes; then : + + LIBS="-lnuma $LIBS" + +else + as_fn_error $? "library 'numa' does not provide numa_available" "$LINENO" 5 +fi + +fi + # XXX libcurl must link after libgssapi_krb5 on FreeBSD to avoid segfaults # during gss_acquire_cred(). This is possibly related to Curl's Heimdal # dependency on that platform? 
diff --git a/configure.ac b/configure.ac index b6d02f5ecc7..1a394dfc077 100644 --- a/configure.ac +++ b/configure.ac @@ -1041,6 +1041,19 @@ if test "$with_libcurl" = yes ; then fi +# +# libnuma +# +AC_MSG_CHECKING([whether to build with libnuma support]) +PGAC_ARG_BOOL(with, libnuma, no, [use libnuma for NUMA awareness], + [AC_DEFINE([USE_LIBNUMA], 1, [Define to build with NUMA awareness support. (--with-libnuma)])]) +AC_MSG_RESULT([$with_libnuma]) +AC_SUBST(with_libnuma) + +if test "$with_libnuma" = yes ; then + AC_CHECK_LIB(numa, numa_available, [], [AC_MSG_ERROR([library 'libnuma' is required for NUMA awareness])]) +fi + # # XML # diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 1c3810e1a04..113588defdd 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25078,6 +25078,19 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_numa_available</primary> + </indexterm> + <function>pg_numa_available</function> () + <returnvalue>boolean</returnvalue> + </para> + <para> + Returns true if the server has been compiled with <acronym>NUMA</acronym> support. + </para></entry> + </row> + <row> <entry role="func_table_entry"><para role="func_signature"> <indexterm> diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index e076cefa3b9..9f56205a1d7 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -1156,6 +1156,16 @@ build-postgresql: </listitem> </varlistentry> + <varlistentry id="configure-option-with-libnuma"> + <term><option>--with-libnuma</option></term> + <listitem> + <para> + Build with libnuma support for basic NUMA support. + Only supported on platforms for which the libnuma library is implemented. 
+ </para> + </listitem> + </varlistentry> + <varlistentry id="configure-option-with-libxml"> <term><option>--with-libxml</option></term> <listitem> @@ -2611,6 +2621,17 @@ ninja install </listitem> </varlistentry> + <varlistentry id="configure-with-libnuma-meson"> + <term><option>-Dlibnuma={ auto | enabled | disabled }</option></term> + <listitem> + <para> + Build with libnuma support for basic NUMA support. + Only supported on platforms for which the libnuma library is implemented. + The default for this option is auto. + </para> + </listitem> + </varlistentry> + <varlistentry id="configure-with-libxml-meson"> <term><option>-Dlibxml={ auto | enabled | disabled }</option></term> <listitem> diff --git a/meson.build b/meson.build index 13c13748e5d..4106c4b13f5 100644 --- a/meson.build +++ b/meson.build @@ -949,6 +949,27 @@ else endif +############################################################### +# Library: libnuma +############################################################### + +libnumaopt = get_option('libnuma') +if not libnumaopt.disabled() + # via pkg-config + libnuma = dependency('numa', required: libnumaopt) + if not libnuma.found() + libnuma = cc.find_library('numa', required: libnumaopt) + endif + if not cc.has_header('numa.h', dependencies: libnuma, required: libnumaopt) + libnuma = not_found_dep + endif + if libnuma.found() + cdata.set('USE_LIBNUMA', 1) + endif +else + libnuma = not_found_dep +endif + ############################################################### # Library: libxml @@ -3168,6 +3189,7 @@ backend_both_deps += [ icu_i18n, ldap, libintl, + libnuma, libxml, lz4, pam, @@ -3823,6 +3845,7 @@ if meson.version().version_compare('>=0.57') 'icu': icu, 'ldap': ldap, 'libcurl': libcurl, + 'libnuma': libnuma, 'libxml': libxml, 'libxslt': libxslt, 'llvm': llvm, diff --git a/meson_options.txt b/meson_options.txt index 702c4517145..adaadb5faf1 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -106,6 +106,9 @@ option('libcurl', type : 'feature', 
value: 'auto', option('libedit_preferred', type: 'boolean', value: false, description: 'Prefer BSD Libedit over GNU Readline') +option('libnuma', type: 'feature', value: 'auto', + description: 'NUMA awareness support') + option('libxml', type: 'feature', value: 'auto', description: 'XML support') diff --git a/src/Makefile.global.in b/src/Makefile.global.in index 3b620bac5ac..0bd4b2d7d32 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -191,6 +191,7 @@ with_gssapi = @with_gssapi@ with_krb_srvnam = @with_krb_srvnam@ with_ldap = @with_ldap@ with_libcurl = @with_libcurl@ +with_libnuma = @with_libnuma@ with_libxml = @with_libxml@ with_libxslt = @with_libxslt@ with_llvm = @with_llvm@ diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 9c0b10ad4dc..c5e8ce06c97 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -563,7 +563,7 @@ static int ssl_renegotiation_limit; */ int huge_pages = HUGE_PAGES_TRY; int huge_page_size; -static int huge_pages_status = HUGE_PAGES_UNKNOWN; +int huge_pages_status = HUGE_PAGES_UNKNOWN; /* * These variables are all dummies that don't do anything, except in some diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 890822eaf79..85902903653 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8492,6 +8492,10 @@ proargnames => '{name,off,size,allocated_size}', prosrc => 'pg_get_shmem_allocations' }, +{ oid => '9685', descr => 'Is NUMA compilation available?', + proname => 'pg_numa_available', provolatile => 'v', prorettype => 'bool', + proargtypes => '', prosrc => 'pg_numa_available' }, + # memory context of local backend { oid => '2282', descr => 'information about all memory contexts of local backend', diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index db6454090d2..8894f800607 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -672,6 
+672,9 @@ /* Define to 1 to build with libcurl support. (--with-libcurl) */ #undef USE_LIBCURL +/* Define to 1 to build with NUMA awareness support. (--with-libnuma) */ +#undef USE_LIBNUMA + /* Define to 1 to build with XML support. (--with-libxml) */ #undef USE_LIBXML diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h new file mode 100644 index 00000000000..986152e0942 --- /dev/null +++ b/src/include/port/pg_numa.h @@ -0,0 +1,46 @@ +/*------------------------------------------------------------------------- + * + * pg_numa.h + * Basic NUMA portability routines + * + * + * Copyright (c) 2025, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/include/port/pg_numa.h + * + *------------------------------------------------------------------------- + */ +#ifndef PG_NUMA_H +#define PG_NUMA_H + +#include "c.h" +#include "postgres.h" +#include "fmgr.h" + +extern PGDLLIMPORT int pg_numa_init(void); +extern PGDLLIMPORT int pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status); +extern PGDLLIMPORT int pg_numa_get_max_node(void); +extern PGDLLIMPORT Size pg_numa_get_pagesize(void); +extern PGDLLIMPORT Datum pg_numa_available(PG_FUNCTION_ARGS); + +#ifdef USE_LIBNUMA + +/* + * This is required on Linux, before pg_numa_query_pages() as we + * need to page-fault before move_pages(2) syscall returns valid results. + */ +#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \ + ro_volatile_var = *(uint64 *)ptr + +extern void numa_warn(int num, char *fmt,...) 
pg_attribute_printf(2, 3); +extern void numa_error(char *where); + +#else + +#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \ + do {} while(0) + +#endif + +#endif /* PG_NUMA_H */ diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h index b99ebc9e86f..5f7d4b83a60 100644 --- a/src/include/storage/pg_shmem.h +++ b/src/include/storage/pg_shmem.h @@ -45,6 +45,7 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */ extern PGDLLIMPORT int shared_memory_type; extern PGDLLIMPORT int huge_pages; extern PGDLLIMPORT int huge_page_size; +extern PGDLLIMPORT int huge_pages_status; /* Possible values for huge_pages and huge_pages_status */ typedef enum diff --git a/src/makefiles/meson.build b/src/makefiles/meson.build index 60e13d50235..f786c191605 100644 --- a/src/makefiles/meson.build +++ b/src/makefiles/meson.build @@ -199,6 +199,8 @@ pgxs_empty = [ 'PTHREAD_CFLAGS', 'PTHREAD_LIBS', 'ICU_LIBS', + + 'LIBNUMA_CFLAGS', 'LIBNUMA_LIBS' ] if host_system == 'windows' and cc.get_argument_syntax() != 'msvc' @@ -230,6 +232,7 @@ pgxs_deps = { 'icu': icu, 'ldap': ldap, 'libcurl': libcurl, + 'libnuma': libnuma, 'libxml': libxml, 'libxslt': libxslt, 'llvm': llvm, diff --git a/src/port/Makefile b/src/port/Makefile index 4c224319512..a68a29d5414 100644 --- a/src/port/Makefile +++ b/src/port/Makefile @@ -44,6 +44,7 @@ OBJS = \ noblock.o \ path.o \ pg_bitutils.o \ + pg_numa.o \ pg_popcount_avx512.o \ pg_strong_random.o \ pgcheckdir.o \ diff --git a/src/port/meson.build b/src/port/meson.build index 7fcfa728d43..7ffbd4d88d2 100644 --- a/src/port/meson.build +++ b/src/port/meson.build @@ -7,6 +7,7 @@ pgport_sources = [ 'noblock.c', 'path.c', 'pg_bitutils.c', + 'pg_numa.c', 'pg_popcount_avx512.c', 'pg_strong_random.c', 'pgcheckdir.c', diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c new file mode 100644 index 00000000000..7d905ef31f5 --- /dev/null +++ b/src/port/pg_numa.c @@ -0,0 +1,168 @@ 
+/*------------------------------------------------------------------------- + * + * pg_numa.c + * Basic NUMA portability routines + * + * + * Copyright (c) 2025, PostgreSQL Global Development Group + * + * + * IDENTIFICATION + * src/port/pg_numa.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres.h" +#include <unistd.h> + +#ifdef WIN32 +#include <windows.h> +#endif + +#include "fmgr.h" +#include "port/pg_numa.h" +#include "storage/pg_shmem.h" + +/* + * At this point we provide support only for Linux thanks to libnuma, but in + * future support for other platforms e.g. Win32 or FreeBSD might be possible + * too. For Win32 NUMA APIs see + * https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support + */ +#ifdef USE_LIBNUMA + +#include <numa.h> +#include <numaif.h> + +/* libnuma requires initialization as per numa(3) on Linux */ +int +pg_numa_init(void) +{ + int r = numa_available(); + + return r; +} + +int +pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status) +{ + return numa_move_pages(pid, count, pages, NULL, status, 0); +} + +int +pg_numa_get_max_node(void) +{ + return numa_max_node(); +} + +Size +pg_numa_get_pagesize(void) +{ + Size os_page_size = sysconf(_SC_PAGESIZE); + + if (huge_pages_status == HUGE_PAGES_ON) + GetHugePageSize(&os_page_size, NULL); + + return os_page_size; +} + +#ifndef FRONTEND +/* + * XXX: not really tested as there is no way to trigger this in our + * current usage of libnuma. + * + * The libnuma built-in code can be seen here: + * https://github.com/numactl/numactl/blob/master/libnuma.c + * + */ +void +numa_warn(int num, char *fmt,...) 
+{ + va_list ap; + int olde = errno; + int needed; + StringInfoData msg; + + initStringInfo(&msg); + + va_start(ap, fmt); + needed = appendStringInfoVA(&msg, fmt, ap); + va_end(ap); + if (needed > 0) + { + enlargeStringInfo(&msg, needed); + va_start(ap, fmt); + appendStringInfoVA(&msg, fmt, ap); + va_end(ap); + } + + ereport(WARNING, + (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), + errmsg_internal("libnuma: WARNING: %s", msg.data))); + + pfree(msg.data); + + errno = olde; +} + +void +numa_error(char *where) +{ + int olde = errno; + + /* + * XXX: for now we issue just WARNING, but long-term that might depend on + * numa_set_strict() here. + */ + elog(WARNING, "libnuma: ERROR: %s", where); + errno = olde; +} +#endif /* FRONTEND */ + +#else + +/* Empty wrappers */ +int +pg_numa_init(void) +{ + /* We state that NUMA is not available */ + return -1; +} + +int +pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status) +{ + return 0; +} + +int +pg_numa_get_max_node(void) +{ + return 0; +} + +Size +pg_numa_get_pagesize(void) +{ +#ifndef WIN32 + Size os_page_size = sysconf(_SC_PAGESIZE); +#else + Size os_page_size; + SYSTEM_INFO sysinfo; + + GetSystemInfo(&sysinfo); + os_page_size = sysinfo.dwPageSize; +#endif + if (huge_pages_status == HUGE_PAGES_ON) + GetHugePageSize(&os_page_size, NULL); + return os_page_size; +} + +#endif + +Datum +pg_numa_available(PG_FUNCTION_ARGS) +{ + PG_RETURN_BOOL(pg_numa_init() != -1); +} -- 2.39.5
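The comment on pg_numa_touch_mem_if_required() in 0001 says pages must be
page-faulted before move_pages(2) reports valid results. A minimal sketch of
that idea in plain C follows, assuming the start address is page-aligned;
touch_pages is a hypothetical helper and not part of the patch, which does the
touch through a macro inside the caller's own loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read one byte from each page in [start, start + len) so the kernel has to
 * fault the page in. The volatile qualifier keeps the compiler from dropping
 * the otherwise unused reads. Returns the number of pages touched.
 *
 * Assumption: start is aligned to page_size.
 */
static size_t
touch_pages(const void *start, size_t len, size_t page_size)
{
	const volatile unsigned char *p = start;
	size_t		off;
	size_t		touched = 0;

	assert(page_size > 0);

	for (off = 0; off < len; off += page_size)
	{
		(void) p[off];			/* volatile read, result discarded */
		touched++;
	}

	return touched;
}
```

The patch reads a uint64 rather than a single byte; either should be enough to
trigger the fault, since any access to the page maps it.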
From 3110606f68cc40e02b8ab4670c66089be4e2e305 Mon Sep 17 00:00:00 2001 From: Jakub Wartak <jakub.wartak@enterprisedb.com> Date: Fri, 21 Feb 2025 14:20:18 +0100 Subject: [PATCH v13 3/3] Add pg_shmem_numa_allocations to show NUMA zones for shared memory allocations Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com --- doc/src/sgml/system-views.sgml | 78 ++++++++++++++ src/backend/catalog/system_views.sql | 8 ++ src/backend/storage/ipc/shmem.c | 125 +++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 8 ++ src/test/regress/expected/numa.out | 12 +++ src/test/regress/expected/numa_1.out | 3 + src/test/regress/expected/privileges.out | 16 ++- src/test/regress/expected/rules.out | 4 + src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/numa.sql | 9 ++ src/test/regress/sql/privileges.sql | 6 +- 11 files changed, 266 insertions(+), 5 deletions(-) create mode 100644 src/test/regress/expected/numa.out create mode 100644 src/test/regress/expected/numa_1.out create mode 100644 src/test/regress/sql/numa.sql diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index 3f5a306247e..5164083131a 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -176,6 +176,11 @@ <entry>shared memory allocations</entry> </row> + <row> + <entry><link linkend="view-pg-shmem-numa-allocations"><structname>pg_shmem_numa_allocations</structname></link></entry> + <entry>NUMA mappings for shared memory allocations</entry> + </row> + <row> <entry><link linkend="view-pg-stats"><structname>pg_stats</structname></link></entry> <entry>planner statistics</entry> @@ -3746,6 +3751,79 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx </para> </sect1> + <sect1 id="view-pg-shmem-numa-allocations"> + 
<title><structname>pg_shmem_numa_allocations</structname></title> + + <indexterm zone="view-pg-shmem-numa-allocations"> + <primary>pg_shmem_numa_allocations</primary> + </indexterm> + + <para> + The <structname>pg_shmem_numa_allocations</structname> view shows how allocations + made from the server's main shared memory segment are distributed across NUMA nodes. + This includes both memory allocated by <productname>PostgreSQL</productname> + itself and memory allocated by extensions using the mechanisms detailed in + <xref linkend="xfunc-shared-addin" />. + </para> + + <para> + Note that this view does not include memory allocated using the dynamic + shared memory infrastructure. + </para> + + <table> + <title><structname>pg_shmem_numa_allocations</structname> Columns</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + Column Type + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>name</structfield> <type>text</type> + </para> + <para> + The name of the shared memory allocation. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>numa_zone_id</structfield> <type>int4</type> + </para> + <para> + ID of the NUMA node + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>numa_size</structfield> <type>int8</type> + </para> + <para> + Size of the allocation on this particular NUMA node, in bytes + </para></entry> + </row> + + </tbody> + </tgroup> + </table> + + <para> + By default, the <structname>pg_shmem_numa_allocations</structname> view can be + read only by superusers or roles with privileges of the + <literal>pg_read_all_stats</literal> role.
+ </para> + </sect1> + <sect1 id="view-pg-stats"> <title><structname>pg_stats</structname></title> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index a4d2cfdcaf5..cc014a62dc2 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -658,6 +658,14 @@ GRANT SELECT ON pg_shmem_allocations TO pg_read_all_stats; REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC; GRANT EXECUTE ON FUNCTION pg_get_shmem_allocations() TO pg_read_all_stats; +CREATE VIEW pg_shmem_numa_allocations AS + SELECT * FROM pg_get_shmem_numa_allocations(); + +REVOKE ALL ON pg_shmem_numa_allocations FROM PUBLIC; +GRANT SELECT ON pg_shmem_numa_allocations TO pg_read_all_stats; +REVOKE EXECUTE ON FUNCTION pg_get_shmem_numa_allocations() FROM PUBLIC; +GRANT EXECUTE ON FUNCTION pg_get_shmem_numa_allocations() TO pg_read_all_stats; + CREATE VIEW pg_backend_memory_contexts AS SELECT * FROM pg_get_backend_memory_contexts(); diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index 895a43fb39e..9331a5760f6 100644 --- a/src/backend/storage/ipc/shmem.c +++ b/src/backend/storage/ipc/shmem.c @@ -68,6 +68,7 @@ #include "fmgr.h" #include "funcapi.h" #include "miscadmin.h" +#include "port/pg_numa.h" #include "storage/lwlock.h" #include "storage/pg_shmem.h" #include "storage/shmem.h" @@ -89,6 +90,8 @@ slock_t *ShmemLock; /* spinlock for shared memory and LWLock static HTAB *ShmemIndex = NULL; /* primary index hashtable for shmem */ +/* To get reliable results for NUMA inquiry we need to "touch pages" once */ +static bool firstUseInBackend = true; /* * InitShmemAccess() --- set up basic pointers to shared memory.
@@ -568,3 +571,125 @@ pg_get_shmem_allocations(PG_FUNCTION_ARGS) return (Datum) 0; } + +/* SQL SRF showing NUMA zones for allocated shared memory */ +Datum +pg_get_shmem_numa_allocations(PG_FUNCTION_ARGS) +{ +#define PG_GET_SHMEM_NUMA_SIZES_COLS 3 + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + HASH_SEQ_STATUS hstat; + ShmemIndexEnt *ent; + Datum values[PG_GET_SHMEM_NUMA_SIZES_COLS]; + bool nulls[PG_GET_SHMEM_NUMA_SIZES_COLS]; + Size os_page_size; + void **page_ptrs; + int *pages_status; + int shm_total_page_count, + shm_ent_page_count, + max_zones; + Size *zones; + + InitMaterializedSRF(fcinfo, 0); + + if (pg_numa_init() == -1) + { + elog(NOTICE, "libnuma initialization failed or NUMA is not supported on this platform, some NUMA data might be unavailable."); + return (Datum) 0; + } + max_zones = pg_numa_get_max_node(); + zones = palloc(sizeof(Size) * (max_zones + 1)); + + /* + * This is for gathering some NUMA statistics. We might be using various + * DB block sizes (4kB, 8kB, ..
32kB) that end up being allocated in + * various different OS memory pages sizes, so first we need to understand + * the OS memory page size before calling move_pages() + */ + os_page_size = pg_numa_get_pagesize(); + + /* + * Preallocate memory all at once without going into details which shared + * memory segment is the biggest (technically min s_b can be as low as + * 16xBLCKSZ) + */ + shm_total_page_count = ShmemSegHdr->totalsize / os_page_size; + page_ptrs = palloc(sizeof(void *) * shm_total_page_count); + pages_status = palloc(sizeof(int) * shm_total_page_count); + memset(page_ptrs, 0, sizeof(void *) * shm_total_page_count); + + if (firstUseInBackend) + elog(DEBUG1, "NUMA: page-faulting shared memory segments for proper NUMA readouts"); + + LWLockAcquire(ShmemIndexLock, LW_SHARED); + + hash_seq_init(&hstat, ShmemIndex); + + /* output all allocated entries */ + memset(nulls, 0, sizeof(nulls)); + while ((ent = (ShmemIndexEnt *) hash_seq_search(&hstat)) != NULL) + { + int i; + + shm_ent_page_count = ent->allocated_size / os_page_size; + /* It is always at least 1 page */ + shm_ent_page_count = shm_ent_page_count == 0 ? 1 : shm_ent_page_count; + + /* + * If we get ever 0xff back from kernel inquiry, then we probably have + * bug in our buffers to OS page mapping code here + */ + memset(pages_status, 0xff, sizeof(int) * shm_ent_page_count); + + for (i = 0; i < shm_ent_page_count; i++) + { + /* + * In order to get reliable results we also need to touch memory + * pages so that inquiry about NUMA zone doesn't return -2. 
+ */ + volatile uint64 touch pg_attribute_unused(); + + page_ptrs[i] = (char *) ent->location + (i * os_page_size); + if (firstUseInBackend) + pg_numa_touch_mem_if_required(touch, page_ptrs[i]); + + CHECK_FOR_INTERRUPTS(); + } + + if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, pages_status) == -1) + elog(ERROR, "failed NUMA pages inquiry status: %m"); + + memset(zones, 0, sizeof(Size) * (max_zones + 1)); + /* Count the pages of this shared memory entry on each NUMA zone */ + for (i = 0; i < shm_ent_page_count; i++) + { + int s = pages_status[i]; + + /* Ensure we are adding only valid index to the array */ + if (s >= 0 && s <= max_zones) + zones[s]++; + } + + for (i = 0; i <= max_zones; i++) + { + values[0] = CStringGetTextDatum(ent->key); + values[1] = Int32GetDatum(i); + values[2] = Int64GetDatum(zones[i] * os_page_size); + + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, + values, nulls); + } + } + + /* + * XXX: We are ignoring in NUMA version reporting of the following regions + * (compare to pg_get_shmem_allocations() case): 1. output shared memory + * allocated but not counted via the shmem index 2.
output as-of-yet + * unused shared memory + */ + + LWLockRelease(ShmemIndexLock); + firstUseInBackend = false; + + return (Datum) 0; +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 85902903653..55ff305a713 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8496,6 +8496,14 @@ proname => 'pg_numa_available', provolatile => 'v', prorettype => 'bool', proargtypes => '', prosrc => 'pg_numa_available' }, +# shared memory usage with NUMA info +{ oid => '9686', descr => 'NUMA mappings for the main shared memory segment', + proname => 'pg_get_shmem_numa_allocations', prorows => '50', proretset => 't', + provolatile => 'v', prorettype => 'record', proargtypes => '', + proallargtypes => '{text,int4,int8}', proargmodes => '{o,o,o}', + proargnames => '{name,numa_zone_id,numa_size}', + prosrc => 'pg_get_shmem_numa_allocations' }, + # memory context of local backend { oid => '2282', descr => 'information about all memory contexts of local backend', diff --git a/src/test/regress/expected/numa.out b/src/test/regress/expected/numa.out new file mode 100644 index 00000000000..fb882c5b771 --- /dev/null +++ b/src/test/regress/expected/numa.out @@ -0,0 +1,12 @@ +SELECT NOT(pg_numa_available()) AS skip_test \gset +\if :skip_test +\quit +\endif +-- switch to superuser +\c - +SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_numa_allocations; + ok +---- + t +(1 row) + diff --git a/src/test/regress/expected/numa_1.out b/src/test/regress/expected/numa_1.out new file mode 100644 index 00000000000..6dd6824b4e4 --- /dev/null +++ b/src/test/regress/expected/numa_1.out @@ -0,0 +1,3 @@ +SELECT NOT(pg_numa_available()) AS skip_test \gset +\if :skip_test +\quit diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out index 954f549555e..d9d62470cdc 100644 --- a/src/test/regress/expected/privileges.out +++ b/src/test/regress/expected/privileges.out @@ -3127,8 +3127,8 @@ REVOKE MAINTAIN ON 
lock_table FROM regress_locktable_user; -- clean up DROP TABLE lock_table; DROP USER regress_locktable_user; --- test to check privileges of system views pg_shmem_allocations and --- pg_backend_memory_contexts. +-- test to check privileges of system views pg_shmem_allocations, +-- pg_shmem_numa_allocations and pg_backend_memory_contexts. -- switch to superuser \c - CREATE ROLE regress_readallstats; @@ -3144,6 +3144,12 @@ SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations','SELECT f (1 row) +SELECT has_table_privilege('regress_readallstats','pg_shmem_numa_allocations','SELECT'); -- no + has_table_privilege +--------------------- + f +(1 row) + GRANT pg_read_all_stats TO regress_readallstats; SELECT has_table_privilege('regress_readallstats','pg_backend_memory_contexts','SELECT'); -- yes has_table_privilege @@ -3157,6 +3163,12 @@ SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations','SELECT t (1 row) +SELECT has_table_privilege('regress_readallstats','pg_shmem_numa_allocations','SELECT'); -- yes + has_table_privilege +--------------------- + t +(1 row) + -- run query to ensure that functions within views can be executed SET ROLE regress_readallstats; SELECT COUNT(*) >= 0 AS ok FROM pg_backend_memory_contexts; diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 62f69ac20b2..b63c6e0f744 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1740,6 +1740,10 @@ pg_shmem_allocations| SELECT name, size, allocated_size FROM pg_get_shmem_allocations() pg_get_shmem_allocations(name, off, size, allocated_size); +pg_shmem_numa_allocations| SELECT name, + numa_zone_id, + numa_size + FROM pg_get_shmem_numa_allocations() pg_get_shmem_numa_allocations(name, numa_zone_id, numa_size); pg_stat_activity| SELECT s.datid, d.datname, s.pid, diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 37b6d21e1f9..c07a4c7633a 100644 --- 
a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr # The stats test resets stats, so nothing else needing stats access can be in # this group. # ---------- -test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate +test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate numa # event_trigger depends on create_am and cannot run concurrently with # any test that runs DDL diff --git a/src/test/regress/sql/numa.sql b/src/test/regress/sql/numa.sql new file mode 100644 index 00000000000..fddb21a260a --- /dev/null +++ b/src/test/regress/sql/numa.sql @@ -0,0 +1,9 @@ +SELECT NOT(pg_numa_available()) AS skip_test \gset +\if :skip_test +\quit +\endif + +-- switch to superuser +\c - + +SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_numa_allocations; diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql index b81694c24f2..f93d4829702 100644 --- a/src/test/regress/sql/privileges.sql +++ b/src/test/regress/sql/privileges.sql @@ -1911,8 +1911,8 @@ REVOKE MAINTAIN ON lock_table FROM regress_locktable_user; DROP TABLE lock_table; DROP USER regress_locktable_user; --- test to check privileges of system views pg_shmem_allocations and --- pg_backend_memory_contexts. +-- test to check privileges of system views pg_shmem_allocations, +-- pg_shmem_numa_allocations and pg_backend_memory_contexts. 
-- switch to superuser \c - @@ -1921,11 +1921,13 @@ CREATE ROLE regress_readallstats; SELECT has_table_privilege('regress_readallstats','pg_backend_memory_contexts','SELECT'); -- no SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations','SELECT'); -- no +SELECT has_table_privilege('regress_readallstats','pg_shmem_numa_allocations','SELECT'); -- no GRANT pg_read_all_stats TO regress_readallstats; SELECT has_table_privilege('regress_readallstats','pg_backend_memory_contexts','SELECT'); -- yes SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations','SELECT'); -- yes +SELECT has_table_privilege('regress_readallstats','pg_shmem_numa_allocations','SELECT'); -- yes -- run query to ensure that functions within views can be executed SET ROLE regress_readallstats; -- 2.39.5
From 661c36d0a98e572ad0d3d47174f273f5fa3943c4 Mon Sep 17 00:00:00 2001 From: Jakub Wartak <jakub.wartak@enterprisedb.com> Date: Fri, 21 Feb 2025 11:17:28 +0100 Subject: [PATCH v13 2/3] Extend pg_buffercache with new view pg_buffercache_numa to show NUMA zone for individual buffer. Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com --- contrib/pg_buffercache/Makefile | 3 +- .../expected/pg_buffercache_numa.out | 28 + .../expected/pg_buffercache_numa_1.out | 3 + contrib/pg_buffercache/meson.build | 2 + .../pg_buffercache--1.5--1.6.sql | 42 ++ contrib/pg_buffercache/pg_buffercache.control | 2 +- contrib/pg_buffercache/pg_buffercache_pages.c | 479 +++++++++++++----- .../sql/pg_buffercache_numa.sql | 20 + doc/src/sgml/pgbuffercache.sgml | 61 ++- 9 files changed, 506 insertions(+), 134 deletions(-) create mode 100644 contrib/pg_buffercache/expected/pg_buffercache_numa.out create mode 100644 contrib/pg_buffercache/expected/pg_buffercache_numa_1.out create mode 100644 contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql create mode 100644 contrib/pg_buffercache/sql/pg_buffercache_numa.sql diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile index eae65ead9e5..2a33602537e 100644 --- a/contrib/pg_buffercache/Makefile +++ b/contrib/pg_buffercache/Makefile @@ -8,7 +8,8 @@ OBJS = \ EXTENSION = pg_buffercache DATA = pg_buffercache--1.2.sql pg_buffercache--1.2--1.3.sql \ pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql \ - pg_buffercache--1.3--1.4.sql pg_buffercache--1.4--1.5.sql + pg_buffercache--1.3--1.4.sql pg_buffercache--1.4--1.5.sql \ + pg_buffercache--1.5--1.6.sql PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time" REGRESS = pg_buffercache diff --git 
a/contrib/pg_buffercache/expected/pg_buffercache_numa.out b/contrib/pg_buffercache/expected/pg_buffercache_numa.out new file mode 100644 index 00000000000..d4de5ea52fc --- /dev/null +++ b/contrib/pg_buffercache/expected/pg_buffercache_numa.out @@ -0,0 +1,28 @@ +SELECT NOT(pg_numa_available()) AS skip_test \gset +\if :skip_test +\quit +\endif +select count(*) = (select setting::bigint + from pg_settings + where name = 'shared_buffers') +from pg_buffercache_numa; + ?column? +---------- + t +(1 row) + +-- Check that the functions / views can't be accessed by default. To avoid +-- having to create a dedicated user, use the pg_database_owner pseudo-role. +SET ROLE pg_database_owner; +SELECT count(*) > 0 FROM pg_buffercache_numa; +ERROR: permission denied for view pg_buffercache_numa +RESET role; +-- Check that pg_monitor is allowed to query view / function +SET ROLE pg_monitor; +SELECT count(*) > 0 FROM pg_buffercache_numa; + ?column? +---------- + t +(1 row) + +RESET role; diff --git a/contrib/pg_buffercache/expected/pg_buffercache_numa_1.out b/contrib/pg_buffercache/expected/pg_buffercache_numa_1.out new file mode 100644 index 00000000000..6dd6824b4e4 --- /dev/null +++ b/contrib/pg_buffercache/expected/pg_buffercache_numa_1.out @@ -0,0 +1,3 @@ +SELECT NOT(pg_numa_available()) AS skip_test \gset +\if :skip_test +\quit diff --git a/contrib/pg_buffercache/meson.build b/contrib/pg_buffercache/meson.build index 12d1fe48717..7cd039a1df9 100644 --- a/contrib/pg_buffercache/meson.build +++ b/contrib/pg_buffercache/meson.build @@ -23,6 +23,7 @@ install_data( 'pg_buffercache--1.2.sql', 'pg_buffercache--1.3--1.4.sql', 'pg_buffercache--1.4--1.5.sql', + 'pg_buffercache--1.5--1.6.sql', 'pg_buffercache.control', kwargs: contrib_data_args, ) @@ -34,6 +35,7 @@ tests += { 'regress': { 'sql': [ 'pg_buffercache', + 'pg_buffercache_numa', ], }, } diff --git a/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql b/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql new file mode 100644 
index 00000000000..42a693aa4d4 --- /dev/null +++ b/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql @@ -0,0 +1,42 @@ +/* contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION pg_buffercache" to load this file. \quit + +-- Register the new function. +DROP VIEW pg_buffercache; +DROP FUNCTION pg_buffercache_pages(); + +CREATE OR REPLACE FUNCTION pg_buffercache_pages() +RETURNS SETOF RECORD +AS 'MODULE_PATHNAME', 'pg_buffercache_pages' +LANGUAGE C PARALLEL SAFE; + +CREATE OR REPLACE FUNCTION pg_buffercache_numa_pages() +RETURNS SETOF RECORD +AS 'MODULE_PATHNAME', 'pg_buffercache_numa_pages' +LANGUAGE C PARALLEL SAFE; + +-- Create a view for convenient access. +CREATE OR REPLACE VIEW pg_buffercache AS + SELECT P.* FROM pg_buffercache_pages() AS P + (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, + relforknumber int2, relblocknumber int8, isdirty bool, usagecount int2, + pinning_backends int4); + +CREATE OR REPLACE VIEW pg_buffercache_numa AS + SELECT P.* FROM pg_buffercache_numa_pages() AS P + (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, + relforknumber int2, relblocknumber int8, isdirty bool, usagecount int2, + pinning_backends int4, zone_id int4); + +-- Don't want these to be available to public. 
+REVOKE ALL ON FUNCTION pg_buffercache_pages() FROM PUBLIC; +REVOKE ALL ON FUNCTION pg_buffercache_numa_pages() FROM PUBLIC; +REVOKE ALL ON pg_buffercache FROM PUBLIC; +REVOKE ALL ON pg_buffercache_numa FROM PUBLIC; + +GRANT EXECUTE ON FUNCTION pg_buffercache_pages() TO pg_monitor; +GRANT EXECUTE ON FUNCTION pg_buffercache_numa_pages() TO pg_monitor; +GRANT SELECT ON pg_buffercache TO pg_monitor; +GRANT SELECT ON pg_buffercache_numa TO pg_monitor; diff --git a/contrib/pg_buffercache/pg_buffercache.control b/contrib/pg_buffercache/pg_buffercache.control index 5ee875f77dd..b030ba3a6fa 100644 --- a/contrib/pg_buffercache/pg_buffercache.control +++ b/contrib/pg_buffercache/pg_buffercache.control @@ -1,5 +1,5 @@ # pg_buffercache extension comment = 'examine the shared buffer cache' -default_version = '1.5' +default_version = '1.6' module_pathname = '$libdir/pg_buffercache' relocatable = true diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c index 3ae0a018e10..c5cfa32fa07 100644 --- a/contrib/pg_buffercache/pg_buffercache_pages.c +++ b/contrib/pg_buffercache/pg_buffercache_pages.c @@ -11,12 +11,12 @@ #include "access/htup_details.h" #include "catalog/pg_type.h" #include "funcapi.h" +#include "port/pg_numa.h" #include "storage/buf_internals.h" #include "storage/bufmgr.h" - #define NUM_BUFFERCACHE_PAGES_MIN_ELEM 8 -#define NUM_BUFFERCACHE_PAGES_ELEM 9 +#define NUM_BUFFERCACHE_PAGES_ELEM 10 #define NUM_BUFFERCACHE_SUMMARY_ELEM 5 #define NUM_BUFFERCACHE_USAGE_COUNTS_ELEM 4 @@ -43,6 +43,7 @@ typedef struct * because of bufmgr.c's PrivateRefCount infrastructure. */ int32 pinning_backends; + int32 numa_zone_id; } BufferCachePagesRec; @@ -61,84 +62,258 @@ typedef struct * relation node/tablespace/database/blocknum and dirty indicator. 
*/ PG_FUNCTION_INFO_V1(pg_buffercache_pages); +PG_FUNCTION_INFO_V1(pg_buffercache_numa_pages); PG_FUNCTION_INFO_V1(pg_buffercache_summary); PG_FUNCTION_INFO_V1(pg_buffercache_usage_counts); PG_FUNCTION_INFO_V1(pg_buffercache_evict); -Datum -pg_buffercache_pages(PG_FUNCTION_ARGS) +/* Only need to touch memory once per backend process lifetime */ +static bool firstNumaTouch = true; + +/* + * Helper routine to map Buffers into addresses that are used by + * pg_numa_query_pages(). + * + * When database block size (BLCKSZ) is smaller than the OS page size (4kB), + * multiple database buffers will map to the same OS memory page. In this case, + * we only need to query the NUMA zone for the first memory address of each + * unique OS page rather than for every buffer. + * + * In order to get reliable results we also need to touch memory pages, so that + * inquiry about NUMA zone doesn't return -2 (which indicates unmapped/unallocated + * pages) + */ +static inline void +pg_buffercache_numa_prepare_ptrs(int buffer_id, float pages_per_blk, + Size os_page_size, + void **os_page_ptrs) +{ + size_t blk2page = (size_t) (buffer_id * pages_per_blk); + + for (size_t j = 0; j < pages_per_blk; j++) + { + size_t blk2pageoff = blk2page + j; + + if (os_page_ptrs[blk2pageoff] == 0) + { + volatile uint64 touch pg_attribute_unused(); + + /* Buffer IDs start from 1 */ + os_page_ptrs[blk2pageoff] = (char *) BufferGetBlock(buffer_id + 1) + + (os_page_size * j); + + /* Only need to touch memory once per backend process lifetime */ + if (firstNumaTouch) + pg_numa_touch_mem_if_required(touch, os_page_ptrs[blk2pageoff]); + + } + + CHECK_FOR_INTERRUPTS(); + } +} + +/* + * Helper routine for pg_buffercache_pages() and pg_buffercache_numa_pages(). + * + * Sets up the cross-call function context: checks the expected result type, + * builds the tuple descriptor and allocates the per-buffer records. 
+ */ +static BufferCachePagesContext * +pg_buffercache_init_entries(FuncCallContext *funcctx, FunctionCallInfo fcinfo) { - FuncCallContext *funcctx; - Datum result; - MemoryContext oldcontext; BufferCachePagesContext *fctx; /* User function context. */ + MemoryContext oldcontext; TupleDesc tupledesc; TupleDesc expected_tupledesc; - HeapTuple tuple; - if (SRF_IS_FIRSTCALL()) - { - int i; + /* + * Switch context when allocating stuff to be used in later calls + */ + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); - funcctx = SRF_FIRSTCALL_INIT(); + /* Create a user function context for cross-call persistence */ + fctx = (BufferCachePagesContext *) palloc(sizeof(BufferCachePagesContext)); - /* Switch context when allocating stuff to be used in later calls */ - oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); + /* + * To smoothly support upgrades from version 1.0 of this extension + * transparently handle the (non-)existence of the pinning_backends + * column. We unfortunately have to get the result type for that... - we + * can't use the result type determined by the function definition without + * potentially crashing when somebody uses the old (or even wrong) + * function definition though. + */ + if (get_call_result_type(fcinfo, NULL, &expected_tupledesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, "return type must be a row type"); - /* Create a user function context for cross-call persistence */ - fctx = (BufferCachePagesContext *) palloc(sizeof(BufferCachePagesContext)); + if (expected_tupledesc->natts < NUM_BUFFERCACHE_PAGES_MIN_ELEM || + expected_tupledesc->natts > NUM_BUFFERCACHE_PAGES_ELEM) + elog(ERROR, "incorrect number of output arguments"); + + /* Construct a tuple descriptor for the result rows. 
*/ + tupledesc = CreateTemplateTupleDesc(expected_tupledesc->natts); + TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid", + INT4OID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 2, "relfilenode", + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 3, "reltablespace", + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 4, "reldatabase", + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 5, "relforknumber", + INT2OID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 6, "relblocknumber", + INT8OID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 7, "isdirty", + BOOLOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 8, "usage_count", + INT2OID, -1, 0); + + if (expected_tupledesc->natts >= NUM_BUFFERCACHE_PAGES_ELEM - 1) + TupleDescInitEntry(tupledesc, (AttrNumber) 9, "pinning_backends", + INT4OID, -1, 0); + if (expected_tupledesc->natts == NUM_BUFFERCACHE_PAGES_ELEM) + TupleDescInitEntry(tupledesc, (AttrNumber) 10, "numa_zone_id", + INT4OID, -1, 0); + + fctx->tupdesc = BlessTupleDesc(tupledesc); + + /* Allocate NBuffers worth of BufferCachePagesRec records. */ + fctx->record = (BufferCachePagesRec *) + MemoryContextAllocHuge(CurrentMemoryContext, + sizeof(BufferCachePagesRec) * NBuffers); + + /* Set max calls and remember the user function context. */ + funcctx->max_calls = NBuffers; + funcctx->user_fctx = fctx; + + /* + * Return to original context when allocating transient memory + */ + MemoryContextSwitchTo(oldcontext); + return fctx; +} + +/* + * Helper routine for pg_buffercache_pages() and pg_buffercache_numa_pages(). + * + * Build buffer cache information for a single buffer. + */ +static void +pg_buffercache_build_tuple(int record_id, BufferCachePagesContext *fctx) +{ + BufferDesc *bufHdr; + uint32 buf_state; + + bufHdr = GetBufferDescriptor(record_id); + /* Lock each buffer header before inspecting. 
*/ + buf_state = LockBufHdr(bufHdr); + + fctx->record[record_id].bufferid = BufferDescriptorGetBuffer(bufHdr); + fctx->record[record_id].relfilenumber = BufTagGetRelNumber(&bufHdr->tag); + fctx->record[record_id].reltablespace = bufHdr->tag.spcOid; + fctx->record[record_id].reldatabase = bufHdr->tag.dbOid; + fctx->record[record_id].forknum = BufTagGetForkNum(&bufHdr->tag); + fctx->record[record_id].blocknum = bufHdr->tag.blockNum; + fctx->record[record_id].usagecount = BUF_STATE_GET_USAGECOUNT(buf_state); + fctx->record[record_id].pinning_backends = BUF_STATE_GET_REFCOUNT(buf_state); + + if (buf_state & BM_DIRTY) + fctx->record[record_id].isdirty = true; + else + fctx->record[record_id].isdirty = false; + + /* + * Note if the buffer is valid, and has storage created + */ + if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID)) + fctx->record[record_id].isvalid = true; + else + fctx->record[record_id].isvalid = false; + + fctx->record[record_id].numa_zone_id = -1; + + UnlockBufHdr(bufHdr, buf_state); +} + +/* + * Helper routine for pg_buffercache_pages() and pg_buffercache_numa_pages(). + * + * Format and return a tuple for a single buffer cache entry. + */ +static Datum +get_buffercache_tuple(int record_id, BufferCachePagesContext *fctx) +{ + Datum values[NUM_BUFFERCACHE_PAGES_ELEM]; + bool nulls[NUM_BUFFERCACHE_PAGES_ELEM]; + HeapTuple tuple; + + values[0] = Int32GetDatum(fctx->record[record_id].bufferid); + nulls[0] = false; + + /* + * Set all fields except the bufferid to null if the buffer is unused or + * not valid. 
+ */ + if (fctx->record[record_id].blocknum == InvalidBlockNumber || + fctx->record[record_id].isvalid == false) + { + nulls[1] = true; + nulls[2] = true; + nulls[3] = true; + nulls[4] = true; + nulls[5] = true; + nulls[6] = true; + nulls[7] = true; + + /* + * unused for v1.0 callers, but the array is always long enough + */ + nulls[8] = true; + nulls[9] = true; + } + else + { + values[1] = ObjectIdGetDatum(fctx->record[record_id].relfilenumber); + nulls[1] = false; + values[2] = ObjectIdGetDatum(fctx->record[record_id].reltablespace); + nulls[2] = false; + values[3] = ObjectIdGetDatum(fctx->record[record_id].reldatabase); + nulls[3] = false; + values[4] = ObjectIdGetDatum(fctx->record[record_id].forknum); + nulls[4] = false; + values[5] = Int64GetDatum((int64) fctx->record[record_id].blocknum); + nulls[5] = false; + values[6] = BoolGetDatum(fctx->record[record_id].isdirty); + nulls[6] = false; + values[7] = Int16GetDatum(fctx->record[record_id].usagecount); + nulls[7] = false; /* - * To smoothly support upgrades from version 1.0 of this extension - * transparently handle the (non-)existence of the pinning_backends - * column. We unfortunately have to get the result type for that... - - * we can't use the result type determined by the function definition - * without potentially crashing when somebody uses the old (or even - * wrong) function definition though. + * unused for v1.0 callers, but the array is always long enough */ - if (get_call_result_type(fcinfo, NULL, &expected_tupledesc) != TYPEFUNC_COMPOSITE) - elog(ERROR, "return type must be a row type"); + values[8] = Int32GetDatum(fctx->record[record_id].pinning_backends); + nulls[8] = false; + values[9] = Int32GetDatum(fctx->record[record_id].numa_zone_id); + nulls[9] = false; + } - if (expected_tupledesc->natts < NUM_BUFFERCACHE_PAGES_MIN_ELEM || - expected_tupledesc->natts > NUM_BUFFERCACHE_PAGES_ELEM) - elog(ERROR, "incorrect number of output arguments"); + /* Build and return the tuple. 
*/ + tuple = heap_form_tuple(fctx->tupdesc, values, nulls); + return HeapTupleGetDatum(tuple); +} - /* Construct a tuple descriptor for the result rows. */ - tupledesc = CreateTemplateTupleDesc(expected_tupledesc->natts); - TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid", - INT4OID, -1, 0); - TupleDescInitEntry(tupledesc, (AttrNumber) 2, "relfilenode", - OIDOID, -1, 0); - TupleDescInitEntry(tupledesc, (AttrNumber) 3, "reltablespace", - OIDOID, -1, 0); - TupleDescInitEntry(tupledesc, (AttrNumber) 4, "reldatabase", - OIDOID, -1, 0); - TupleDescInitEntry(tupledesc, (AttrNumber) 5, "relforknumber", - INT2OID, -1, 0); - TupleDescInitEntry(tupledesc, (AttrNumber) 6, "relblocknumber", - INT8OID, -1, 0); - TupleDescInitEntry(tupledesc, (AttrNumber) 7, "isdirty", - BOOLOID, -1, 0); - TupleDescInitEntry(tupledesc, (AttrNumber) 8, "usage_count", - INT2OID, -1, 0); - - if (expected_tupledesc->natts == NUM_BUFFERCACHE_PAGES_ELEM) - TupleDescInitEntry(tupledesc, (AttrNumber) 9, "pinning_backends", - INT4OID, -1, 0); - - fctx->tupdesc = BlessTupleDesc(tupledesc); - - /* Allocate NBuffers worth of BufferCachePagesRec records. */ - fctx->record = (BufferCachePagesRec *) - MemoryContextAllocHuge(CurrentMemoryContext, - sizeof(BufferCachePagesRec) * NBuffers); - - /* Set max calls and remember the user function context. */ - funcctx->max_calls = NBuffers; - funcctx->user_fctx = fctx; - - /* Return to original context when allocating transient memory */ - MemoryContextSwitchTo(oldcontext); +Datum +pg_buffercache_pages(PG_FUNCTION_ARGS) +{ + FuncCallContext *funcctx; + BufferCachePagesContext *fctx; /* User function context. */ + + if (SRF_IS_FIRSTCALL()) + { + int i; + + funcctx = SRF_FIRSTCALL_INIT(); + fctx = pg_buffercache_init_entries(funcctx, fcinfo); /* * Scan through all the buffers, saving the relevant fields in the @@ -149,36 +324,7 @@ pg_buffercache_pages(PG_FUNCTION_ARGS) * locks, so the information of each buffer is self-consistent. 
*/ for (i = 0; i < NBuffers; i++) - { - BufferDesc *bufHdr; - uint32 buf_state; - - bufHdr = GetBufferDescriptor(i); - /* Lock each buffer header before inspecting. */ - buf_state = LockBufHdr(bufHdr); - - fctx->record[i].bufferid = BufferDescriptorGetBuffer(bufHdr); - fctx->record[i].relfilenumber = BufTagGetRelNumber(&bufHdr->tag); - fctx->record[i].reltablespace = bufHdr->tag.spcOid; - fctx->record[i].reldatabase = bufHdr->tag.dbOid; - fctx->record[i].forknum = BufTagGetForkNum(&bufHdr->tag); - fctx->record[i].blocknum = bufHdr->tag.blockNum; - fctx->record[i].usagecount = BUF_STATE_GET_USAGECOUNT(buf_state); - fctx->record[i].pinning_backends = BUF_STATE_GET_REFCOUNT(buf_state); - - if (buf_state & BM_DIRTY) - fctx->record[i].isdirty = true; - else - fctx->record[i].isdirty = false; - - /* Note if the buffer is valid, and has storage created */ - if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID)) - fctx->record[i].isvalid = true; - else - fctx->record[i].isvalid = false; - - UnlockBufHdr(bufHdr, buf_state); - } + pg_buffercache_build_tuple(i, fctx); } funcctx = SRF_PERCALL_SETUP(); @@ -188,59 +334,130 @@ pg_buffercache_pages(PG_FUNCTION_ARGS) if (funcctx->call_cntr < funcctx->max_calls) { + Datum result; uint32 i = funcctx->call_cntr; - Datum values[NUM_BUFFERCACHE_PAGES_ELEM]; - bool nulls[NUM_BUFFERCACHE_PAGES_ELEM]; - values[0] = Int32GetDatum(fctx->record[i].bufferid); - nulls[0] = false; + result = get_buffercache_tuple(i, fctx); + SRF_RETURN_NEXT(funcctx, result); + } + else + { + SRF_RETURN_DONE(funcctx); + } +} + +/* + * This is almost identical to the above, but performs + * NUMA inquiry about memory mappings + */ +Datum +pg_buffercache_numa_pages(PG_FUNCTION_ARGS) +{ + FuncCallContext *funcctx; + BufferCachePagesContext *fctx; /* User function context. 
*/ + + if (SRF_IS_FIRSTCALL()) + { + int i; + Size os_page_size = 0; + void **os_page_ptrs = NULL; + int *os_pages_status = NULL; + uint64 os_page_count = 0; + float pages_per_blk = 0; + + funcctx = SRF_FIRSTCALL_INIT(); + + if (pg_numa_init() == -1) + elog(ERROR, "libnuma initialization failed or NUMA is not supported on this platform"); + + fctx = pg_buffercache_init_entries(funcctx, fcinfo); /* - * Set all fields except the bufferid to null if the buffer is unused - * or not valid. + * Different database block sizes (4kB, 8kB, ..., 32kB) can be used, + * while the OS may have different memory page sizes. + * + * To correctly map between them, we need to: - Determine the OS + * memory page size - Calculate how many OS pages are used by all + * buffer blocks - Calculate how many OS pages are contained within + * each database block + * + * This information is needed before calling move_pages() for NUMA + * zone inquiry. + */ + os_page_size = pg_numa_get_pagesize(); + os_page_count = ((uint64) NBuffers * BLCKSZ) / os_page_size; + pages_per_blk = (float) BLCKSZ / os_page_size; + + elog(DEBUG1, "NUMA: os_page_count=%lu os_page_size=%zu pages_per_blk=%.2f", + (unsigned long) os_page_count, os_page_size, pages_per_blk); + + os_page_ptrs = palloc(sizeof(void *) * os_page_count); + os_pages_status = palloc(sizeof(int) * os_page_count); + memset(os_page_ptrs, 0, sizeof(void *) * os_page_count); + + /* + * If we ever get 0xff back from kernel inquiry, then we probably have + * a bug in our buffers to OS page mapping code here + */ + memset(os_pages_status, 0xff, sizeof(int) * os_page_count); + + if (firstNumaTouch) + elog(DEBUG1, "NUMA: page-faulting the buffercache for proper NUMA readouts"); + + /* + * Scan through all the buffers, saving the relevant fields in the + * fctx->record structure. 
+ * + * We don't hold the partition locks, so we don't get a consistent + * snapshot across all buffers, but we do grab the buffer header + * locks, so the information of each buffer is self-consistent. */ - if (fctx->record[i].blocknum == InvalidBlockNumber || - fctx->record[i].isvalid == false) + for (i = 0; i < NBuffers; i++) { - nulls[1] = true; - nulls[2] = true; - nulls[3] = true; - nulls[4] = true; - nulls[5] = true; - nulls[6] = true; - nulls[7] = true; - /* unused for v1.0 callers, but the array is always long enough */ - nulls[8] = true; + pg_buffercache_build_tuple(i, fctx); + pg_buffercache_numa_prepare_ptrs(i, pages_per_blk, os_page_size, + os_page_ptrs); } - else + + if (pg_numa_query_pages(0, os_page_count, os_page_ptrs, os_pages_status) == -1) + elog(ERROR, "failed NUMA pages inquiry: %m"); + + for (i = 0; i < NBuffers; i++) { - values[1] = ObjectIdGetDatum(fctx->record[i].relfilenumber); - nulls[1] = false; - values[2] = ObjectIdGetDatum(fctx->record[i].reltablespace); - nulls[2] = false; - values[3] = ObjectIdGetDatum(fctx->record[i].reldatabase); - nulls[3] = false; - values[4] = ObjectIdGetDatum(fctx->record[i].forknum); - nulls[4] = false; - values[5] = Int64GetDatum((int64) fctx->record[i].blocknum); - nulls[5] = false; - values[6] = BoolGetDatum(fctx->record[i].isdirty); - nulls[6] = false; - values[7] = Int16GetDatum(fctx->record[i].usagecount); - nulls[7] = false; - /* unused for v1.0 callers, but the array is always long enough */ - values[8] = Int32GetDatum(fctx->record[i].pinning_backends); - nulls[8] = false; + int blk2page = (int) i * pages_per_blk; + + /* + * Set the NUMA zone ID for this buffer based on the first OS page + * it maps to. + * + * Note: We could check for errors in os_pages_status and report + * them. Also, a single DB block might span multiple NUMA zones if + * it crosses OS pages on zone boundaries, but we only record the + * zone of the first page. 
This is a simplification but should be + * sufficient for most analyses. + */ + fctx->record[i].numa_zone_id = os_pages_status[blk2page]; } + } - /* Build and return the tuple. */ - tuple = heap_form_tuple(fctx->tupdesc, values, nulls); - result = HeapTupleGetDatum(tuple); + funcctx = SRF_PERCALL_SETUP(); + /* Get the saved state */ + fctx = funcctx->user_fctx; + + if (funcctx->call_cntr < funcctx->max_calls) + { + Datum result; + uint32 i = funcctx->call_cntr; + + result = get_buffercache_tuple(i, fctx); SRF_RETURN_NEXT(funcctx, result); } else + { + firstNumaTouch = false; SRF_RETURN_DONE(funcctx); + } } Datum diff --git a/contrib/pg_buffercache/sql/pg_buffercache_numa.sql b/contrib/pg_buffercache/sql/pg_buffercache_numa.sql new file mode 100644 index 00000000000..2225b879f58 --- /dev/null +++ b/contrib/pg_buffercache/sql/pg_buffercache_numa.sql @@ -0,0 +1,20 @@ +SELECT NOT(pg_numa_available()) AS skip_test \gset +\if :skip_test +\quit +\endif + +select count(*) = (select setting::bigint + from pg_settings + where name = 'shared_buffers') +from pg_buffercache_numa; + +-- Check that the functions / views can't be accessed by default. To avoid +-- having to create a dedicated user, use the pg_database_owner pseudo-role. 
+SET ROLE pg_database_owner; +SELECT count(*) > 0 FROM pg_buffercache_numa; +RESET role; + +-- Check that pg_monitor is allowed to query view / function +SET ROLE pg_monitor; +SELECT count(*) > 0 FROM pg_buffercache_numa; +RESET role; diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml index 802a5112d77..4b49bb2974a 100644 --- a/doc/src/sgml/pgbuffercache.sgml +++ b/doc/src/sgml/pgbuffercache.sgml @@ -30,7 +30,9 @@ <para> This module provides the <function>pg_buffercache_pages()</function> function (wrapped in the <structname>pg_buffercache</structname> view), - the <function>pg_buffercache_summary()</function> function, the + <function>pg_buffercache_numa_pages()</function> function (wrapped in the + <structname>pg_buffercache_numa</structname> view), the + <function>pg_buffercache_summary()</function> function, the <function>pg_buffercache_usage_counts()</function> function and the <function>pg_buffercache_evict()</function> function. </para> @@ -42,6 +44,14 @@ convenient use. </para> + <para> + The <function>pg_buffercache_numa_pages()</function> provides the same information + as <function>pg_buffercache_pages()</function> but is slower because it also + provides the <acronym>NUMA</acronym> node ID per shared buffer entry. + The <structname>pg_buffercache_numa</structname> view wraps the function for + convenient use. + </para> + <para> The <function>pg_buffercache_summary()</function> function returns a single row summarizing the state of the shared buffer cache. @@ -200,6 +210,55 @@ </para> </sect2> + <sect2 id="pgbuffercache-pg-buffercache_numa"> + <title>The <structname>pg_buffercache_numa</structname> View</title> + + <para> + The definitions of the columns exposed are identical to the + <structname>pg_buffercache</structname> view, except that this one includes + one additional <structfield>zone_id</structfield> column as defined in + <xref linkend="pgbuffercache-numa-columns"/>. 
+ </para> + + <table id="pgbuffercache-numa-columns"> + <title><structname>pg_buffercache_numa</structname> Extra column</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + Column Type + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>zone_id</structfield> <type>integer</type> + </para> + <para> + <acronym>NUMA</acronym> node ID. NULL if the shared buffer + has not been used yet. On systems without <acronym>NUMA</acronym> support + this returns 0. + </para></entry> + </row> + + </tbody> + </tgroup> + </table> + + <para> + As <acronym>NUMA</acronym> node ID inquiry for each page requires memory pages + to be paged-in, the first execution of this function can take a noticeable + amount of time. In all the cases (first execution or not), retrieving this + information is costly and querying the view at a high frequency is not recommended. + </para> + + </sect2> + <sect2 id="pgbuffercache-summary"> <title>The <function>pg_buffercache_summary()</function> Function</title> -- 2.39.5
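For reviewers who want to check the buffer-to-OS-page index arithmetic in pg_buffercache_numa_pages() and pg_buffercache_numa_prepare_ptrs() outside the server, here is a minimal stand-alone sketch. The helper names (sketch_os_page_count, sketch_first_os_page) are hypothetical and not part of the patch, and plain C types stand in for PostgreSQL's Size/uint64; the float ratio is kept only to mirror what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the patch: they only mirror the
 * index arithmetic used by pg_buffercache_numa_pages().
 */

/* Number of OS pages backing nbuffers blocks of blcksz bytes each. */
static uint64_t
sketch_os_page_count(uint64_t nbuffers, size_t blcksz, size_t os_page_size)
{
	assert(os_page_size > 0);
	return (nbuffers * blcksz) / os_page_size;
}

/*
 * Index of the first OS page that a 0-based buffer_id maps to.
 * pages_per_blk is > 1 when a block spans several OS pages and
 * < 1 when several blocks share one OS page (e.g. huge pages).
 */
static size_t
sketch_first_os_page(int buffer_id, size_t blcksz, size_t os_page_size)
{
	float		pages_per_blk;

	assert(buffer_id >= 0);
	assert(os_page_size > 0);

	pages_per_blk = (float) blcksz / os_page_size;
	return (size_t) (buffer_id * pages_per_blk);
}
```

With 8kB blocks on 4kB OS pages, buffer 3 starts at OS page 6; with 2MB huge pages, buffers 0..255 all land on OS page 0, which is why the patch fills each os_page_ptrs slot only once.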