Re: pg_stat_statements oddity with track = all
On Fri, Dec 04, 2020 at 06:09:13PM +0800, Julien Rouhaud wrote: > On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote: > > Hello > > > > Seems we need also change PGSS_FILE_HEADER. > > Indeed, thanks! v2 attached. There was a conflict on PGSS_FILE_HEADER since some recent commit, v3 attached. >From 832b1a81cfba8a38c6b58b0a9212a6a95fc231a8 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Fri, 4 Dec 2020 13:33:51 +0800 Subject: [PATCH v3] Add a bool toplevel column to pg_stat_statements. As top level statements include resource consumption for underlying statements, it's not possible to compute the total resource consumption accurately. Fix that by adding a new toplevel boolean field that indicates whether the counters were cumulated for queries executed at top level or not. It can lead to more entries being stored for the same workload if pg_stat_statements.track is set to all, but it seems unlikely to have the same queries being executed both at top level and as nested statements. --- contrib/pg_stat_statements/Makefile | 3 +- .../expected/pg_stat_statements.out | 40 + .../pg_stat_statements--1.9--1.10.sql | 57 +++ .../pg_stat_statements/pg_stat_statements.c | 44 +++--- .../pg_stat_statements.control| 2 +- .../sql/pg_stat_statements.sql| 21 +++ doc/src/sgml/pgstatstatements.sgml| 9 +++ 7 files changed, 166 insertions(+), 10 deletions(-) create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 3ec627b956..cab4f626ad 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -6,7 +6,8 @@ OBJS = \ pg_stat_statements.o EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \ +DATA = pg_stat_statements--1.4.sql \ +pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \ pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \ pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \ diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 16158525ca..fb97f68737 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -876,4 +876,44 @@ SELECT dealloc FROM pg_stat_statements_info; 0 (1 row) +-- +-- top level handling +-- +SET pg_stat_statements.track = 'top'; +DELETE FROM test; +DO $$ +BEGIN +DELETE FROM test; +END; +$$ LANGUAGE plpgsql; +SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel; + query | toplevel | plans | calls +---+--+---+--- + DELETE FROM test | t| 1 | 1 + DO $$+| t| 0 | 1 + BEGIN+| | | + DELETE FROM test;+| | | + END; +| | | + $$ LANGUAGE plpgsql | | | +(2 rows) + +SET pg_stat_statements.track = 'all'; +DELETE FROM test; +DO $$ +BEGIN +DELETE FROM test; +END; +$$ LANGUAGE plpgsql; +SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel; + query | toplevel | plans | calls +---+--+---+--- + DELETE FROM test | f| 1 | 1 + DELETE FROM test | t| 2 | 2 + DO $$+| t| 0 | 2 + BEGIN+| | | + DELETE FROM test;+| | | + END; +| | | + $$ LANGUAGE plpgsql | | | +(3 rows) + DROP EXTENSION pg_stat_statements; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql 
b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql new file mode 100644 index 00..f97d16497d --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql @@ -0,0 +1,57 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit + +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean); + +/* Then we can drop them */ +DROP VIEW pg_stat_statements; +DROP FUNCTION pg_stat_statements(boolean); + +/* Now redefine */ +CREATE FUNCTION pg_stat_statement
Re: Parallel Inserts in CREATE TABLE AS
On Sat, Dec 26, 2020 at 9:20 PM vignesh C wrote:
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on normal table"
>
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create unlogged table parallel_write as select length(stringu1) from tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on unlogged table"
> Similar comments need to be handled in other places also.

I think the existing comments look fine. The info such as the table type and whether the query is CTAS or CMV is visible by looking at the test case itself. What I wanted the comments to convey is whether we support parallel inserts or not, and if not, why, so that it is easy to read. I tried to keep them as succinct as possible.

> If possible try to make a common function for both and use.

Yes, you are right. The function explain_pictas is the same as explain_parallel_append from partition_prune.sql. It's a test function, and I also see that we have serial_schedule and parallel_schedule, which means that these sql files can run in any order. I'm not quite sure whether we can put it in a common test sql file and use it across other test sql files. AFAICS, I didn't find any function being used in such a manner. Thoughts?

> + if (intoclausestr && OidIsValid(objectid))
> + fpes->objectid = objectid;
> + else
> + fpes->objectid = InvalidOid;
> Here OidIsValid(objectid) check is not required intoclausestr will be
> set only if OidIsValid.

Removed the OidIsValid check in the latest v16 patch set posted upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
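For reference, a rough sketch of what such a shared EXPLAIN-filtering helper could look like, modelled on explain_parallel_append from partition_prune.sql (the function name and the exact masking rules below are illustrative, not taken from the patch):

create or replace function explain_filter(query text) returns setof text
language plpgsql as
$$
declare
    ln text;
begin
    for ln in
        execute format('explain (analyze, costs off, summary off, timing off) %s', query)
    loop
        -- mask run-to-run variation such as worker counts and row counts
        ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
        return next ln;
    end loop;
end;
$$;

Whether a helper like this can live somewhere that both partition_prune.sql and the new CTAS tests can rely on, given the schedules, is exactly the open question above.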
Re: range_agg
On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov wrote:
>
> I think this patch is very close to committable. I'm going to spend
> some more time further polishing it and commit (if I don't find a
> major issue or face objections).

The main patch is committed. I've prepared a set of improvements.
0001 Fixes bugs in the bsearch comparison functions
0002 Implements the missing @> (range,multirange) operator and its commutator
0003 Refactors the signatures of the *_internal() multirange functions
0004 Adds cross-type (range, multirange) operator handling to the existing range GiST opclass
0005 Adds support for GiST multirange indexing by approximating a multirange as its union range with no gaps

The patchset is quite trivial. I'm going to push it if there are no objections.

The SP-GiST handling is more tricky and requires substantial work.

--
Regards,
Alexander Korotkov

0001-Fix-bugs-in-comparison-functions-for-multirange_bsea.patch
Description: Binary data
0002-Implement-operators-for-checking-if-the-range-contai.patch
Description: Binary data
0003-Improve-the-signature-of-internal-multirange-functio.patch
Description: Binary data
0005-Add-GiST-indexes-for-multiranges.patch
Description: Binary data
0004-Add-support-of-multirange-matching-to-the-existing-r.patch
Description: Binary data
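For readers following along, a small illustration of what 0004/0005 are aiming at, assuming the patches are committed as described (the table, column and values below are made up):

create table reservations (during int4multirange);
create index reservations_during_idx on reservations using gist (during);
-- with the cross-type operator support from 0002/0004, a query like this
-- could be answered via the GiST index, using the union-range approximation from 0005
select * from reservations where during && int4range(10, 20);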
Re: Add table access method as an option to pgbench
Hello Justin, src/test/regress/sql/create_am.sql:CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler; ... src/test/regress/sql/create_am.sql:DROP ACCESS METHOD heap2; Or maybe using SET default_tablespace instead of modifying the CREATE sql. That'd need to be done separately for indexes, and RESET after creating the tables, to avoid accidentally affecting indexes, too. Why should it not affect indexes? @Fabien: I think the table should be created and populated within the same transaction, to allow this optimization in v13 to apply during the "initialization" phase. c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal. Possibly. This would be a change of behavior: currently only filling tables is under an explicit transaction. That would be a matter for another patch, probably with an added option. As VACUUM is not transaction compatible, it might be a little bit tricky to add such a feature. Also I'm not sure about some constraint declarations. ISTM that I submitted a patch a long time ago to allow "()" to control transaction from the command line, but somehow it got lost or rejected. I redeveloped it, see attached. I cannot see reliable performance improvement by playing with (), though. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index b03d0cc50f..8d9b642fdd 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -170,7 +170,7 @@ pgbench options d init_steps specifies the initialization steps to be performed, using one character per step. Each step is invoked in the specified order. -The default is dtgvp. +The default is dt(g)vp. The available steps are: @@ -226,6 +226,22 @@ pgbench options d + + ( (Begin) + + +Begin a transaction. + + + + + ) (Commit) + + +Commit a transaction. 
+ + + v (Vacuum) diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 3057665bbe..f77e33056c 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -132,8 +132,8 @@ static int pthread_join(pthread_t th, void **thread_return); / * some configurable parameters */ -#define DEFAULT_INIT_STEPS "dtgvp" /* default -I setting */ -#define ALL_INIT_STEPS "dtgGvpf" /* all possible steps */ +#define DEFAULT_INIT_STEPS "dt(g)vp" /* default -I setting */ +#define ALL_INIT_STEPS "dtgGvpf()" /* all possible steps */ #define LOG_STEP_SECONDS 5 /* seconds between log messages */ #define DEFAULT_NXACTS 10 /* default nxacts */ @@ -3823,12 +3823,6 @@ initGenerateDataClientSide(PGconn *con) fprintf(stderr, "generating data (client-side)...\n"); - /* - * we do all of this in one transaction to enable the backend's - * data-loading optimizations - */ - executeStatement(con, "begin"); - /* truncate away any old data */ initTruncateTables(con); @@ -3940,8 +3934,6 @@ initGenerateDataClientSide(PGconn *con) } termPQExpBuffer(&sql); - - executeStatement(con, "commit"); } /* @@ -3958,12 +3950,6 @@ initGenerateDataServerSide(PGconn *con) fprintf(stderr, "generating data (server-side)...\n"); - /* - * we do all of this in one transaction to enable the backend's - * data-loading optimizations - */ - executeStatement(con, "begin"); - /* truncate away any old data */ initTruncateTables(con); @@ -3989,8 +3975,6 @@ initGenerateDataServerSide(PGconn *con) executeStatement(con, sql.data); termPQExpBuffer(&sql); - - executeStatement(con, "commit"); } /* @@ -4076,6 +4060,8 @@ initCreateFKeys(PGconn *con) static void checkInitSteps(const char *initialize_steps) { + bool in_tx = false; + if (initialize_steps[0] == '\0') { pg_log_fatal("no initialization steps specified"); @@ -4090,6 +4076,36 @@ checkInitSteps(const char *initialize_steps) pg_log_info("Allowed step characters are: \"" ALL_INIT_STEPS "\"."); exit(1); } + + if (*step == '(') + { + if (in_tx) + { +pg_log_fatal("opening a transaction inside a transaction"); +exit(1); + } + in_tx = true; + } + else if (*step == ')') + { + if (!in_tx) + { +pg_log_fatal("closing a transaction without opening it"); +exit(1); + } + in_tx = false; + } + else if (*step == 'v' && in_tx) + { + pg_log_fatal("cannot run vacuum within a transaction"); + exit(1); + } + } + + if (in_tx) + { + pg_log_fatal("uncommitted transaction"); + exit(1); } } @@ -4150,6 +4166,14 @@ runInitSteps(const char *initialize_steps) op = "foreign keys"; initCreateFKeys(con); break; + case '(': +op = "begin"; +executeStatement(con, "begin"); +break; + case ')': +op = "commit"; +
Re: Add session statistics to pg_stat_database
On Fri, 2020-12-25 at 20:28 +0900, Masahiro Ikeda wrote: > As a user, I want this feature to know whether > clients' session activities are as expected. > > I have some comments about the patch. > > 1. pg_proc.dat > > The unit of "session time" and so on says "in seconds". > But, is "in milliseconds" right? > > 2. monitoring.sgml > > IIUC, "active_time" includes the time executes a fast-path function and > "idle in transaction" includes "idle in transaction(aborted)" time. > > Why don't you reference pg_stat_activity's "state" column and > "active_time" is the total time when the state is "active" and "fast > path"? > "idle in transaction" is as same too. > > 3. pgstat.h > > The comment of PgStat_MsgConn says "Sent by pgstat_connection". > I thought "pgstat_connection" is a function, but it doesn't exist. > > Is "Sent by the backend" right? > > Although this is a trivial thing, the following row has too many tabs. > Other structs have only one space. > // }Pgstat_MsgConn; Thanks for the feedback. I am currently on vacations and will take a look after January 7. Yours, Laurenz Albe
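For context, a hypothetical query against the columns this patch adds to pg_stat_database (column names follow the review above and may still change before commit):

select datname,
       session_time,               -- per the first review point, reported in milliseconds
       active_time,
       idle_in_transaction_time
from pg_stat_database
where datname = current_database();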
Re: PoC Refactor AM analyse API
> On Dec 8, 2020, at 16:44, Denis Smirnov wrote:
>
> Andrey, thanks for your feedback!
>
> I agree that AMs with fixed-size blocks can have very similar code in
> acquire_sample_rows() (though it is not a rule). But there are several points
> about the current master sampling.
>
> * It is not perfect - AM developers may want to improve it with other
> sampling algorithms.
> * It is designed with a big influence of the heap AM - for example,
> RelationGetNumberOfBlocks() returns uint32 while other AMs can have a bigger
> number of blocks.
> * heapam_acquire_sample_rows() is a small function - I don't think it is
> a big trouble to write something alike for any AM developer.
> * Some AMs may have single-level sampling (only algorithm Z from Vitter, for
> example) - why not?
>
> As a result we get a single and clear method to acquire rows for statistics.
> If we don't modify but rather extend the current API (for example in the manner it
> is done for FDWs), the code becomes more complicated and difficult to
> understand.

This makes sense. The purpose of the API is to provide a flexible abstraction. The current table_scan_analyze_next_block()/table_scan_analyze_next_tuple() API assumes too much about the AM implementation.

But why do you pass int natts and VacAttrStats **stats to acquire_sample_rows()? Are they of any use? That seems to break the abstraction too.

Best regards,
Andrey Borodin.
Re: Proposed patch for key managment
On Sat, Dec 19, 2020 at 11:45:08AM +, Alastair Turner wrote:
> On Fri, 18 Dec 2020 at 21:36, Stephen Frost wrote:
> > These ideas don't seem too bad but I'd think we'd pass which key we want
> > on the command-line using a %i or something like that, rather than using
> > stdin, unless there's some reason that would be an issue..? Certainly
> > the CLI utilities I've seen tend to expect the name of the secret that
> > you're asking for to be passed on the command line.
>
> Agreed. I was working on the assumption that the calling process
> (initdb or pg_ctl) would have access to the challenge material and be
> passing it to the utility, so putting it in a command line would allow
> it to leak. If the utility is accessing the challenge material
> directly, then just an indicator of which key to work with would be a
> lot simpler, true.

I want to repeat here what I said in another thread:

> I think ultimately we will need three commands to control the keys.
> First, there is the cluster_key_command, which we have now. Second, I
> think we will need an optional command which returns random bytes ---
> this would allow users to get random bytes from a different source than
> that used by the server code.
>
> Third, we will probably need a command that returns the data encryption
> keys directly, either heap/index or WAL keys, probably based on key
> number --- you pass the key number you want, and the command returns the
> data key. There would not be a cluster key in this case, but the
> command could still prompt the user for perhaps a password to the KMS
> server. It could not be used if either of the previous two commands is
> used. I assume an HMAC would still be stored in the pg_cryptokeys
> directory to check that the right key has been returned.
>
> I thought we should implement the first command, because it will
> probably be the most common and easiest to use, and then see what people
> want added.

There is also a fourth option, where the command returns multiple keys, one per line of hex digits. That could be written in shell script, but it would be fragile and complex. It could be written in Perl, but that would add a new language requirement for this feature. It could be written in C, but that would limit its flexibility because changes would require a recompile, and you would probably need that C file to call external scripts to tailor input like we do now from the server. You could actually write a full implementation of what we do on the server side in client code, but pg_alterckey would not work, since it would not know the data format, so we would need another cluster-key-alter tool for that. It could be written as a C extension, but that would also have C's limitations.

In summary, having the server do most of the complex work for the default case seems best, and eventually allowing the client to do everything seems ideal. I think we need more input before we go beyond what we do now.

--
Bruce Momjian  https://momjian.us
EnterpriseDB   https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
Re: Add table access method as an option to pgbench
On Sun, Dec 27, 2020 at 09:14:53AM -0400, Fabien COELHO wrote: > > src/test/regress/sql/create_am.sql:CREATE ACCESS METHOD heap2 TYPE TABLE > > HANDLER heap_tableam_handler; > > ... > > src/test/regress/sql/create_am.sql:DROP ACCESS METHOD heap2; > > > Or maybe using SET default_tablespace instead of modifying the CREATE sql. > > That'd need to be done separately for indexes, and RESET after creating the > > tables, to avoid accidentally affecting indexes, too. > > Why should it not affect indexes? I rarely use pgbench, and probably never looked at its source before, but I saw that it has a separate --tablespace and --index-tablespace, and that --tablespace is *not* the default for indexes. So if we changed it to use SET default_tablespace instead of amending the DDL sql, we'd need to make sure the SET applied only to the intended CREATE command, and not all following create commands. In the case that --index-tablespace is not specified, it would be buggy to do this: SET default_tablespace=foo; CREATE TABLE ... CREATE INDEX ... CREATE TABLE ... CREATE INDEX ... ... -- Justin PS. Thanks for patching it to work with partitioned tables :)
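To make that concrete, a sketch of the statement sequence pgbench would have to emit if it went the default_tablespace route (column list abbreviated; 'some_ts' stands in for whatever --tablespace was given):

SET default_tablespace = 'some_ts';
CREATE TABLE pgbench_accounts (aid integer not null, bid integer, abalance integer, filler char(84));
RESET default_tablespace;
-- the RESET above matters: without it, and without --index-tablespace,
-- the primary key index below would silently land in 'some_ts' as well
ALTER TABLE pgbench_accounts ADD PRIMARY KEY (aid);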
Re: range_agg
Hi, This is not an ideal way to index multirages, but something we can easily have. typo: multiranges Cheers On Sun, Dec 27, 2020 at 1:50 AM Alexander Korotkov wrote: > On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov > wrote: > > > > I think this patch is very close to committable. I'm going to spend > > some more time further polishing it and commit (if I don't find a > > major issue or face objections). > > The main patch is committed. I've prepared a set of improvements. > 0001 Fixes bug in bsearch comparison functions > 0002 Implements missing @> (range,multirange) operator and its commutator > 0003 Does refactors signatures of *_internal() multirange functions > 0004 Adds cross-type (range, multirange) operators handling to > existing range GiST opclass > 0005 Adds support for GiST multirange indexing by approximation of > multirange as the union range with no gaps > > The patchset is quite trivial. I'm going to push it if there are no > objections. > > The SP-GiST handling is more tricky and requires substantial work. > > -- > Regards, > Alexander Korotkov >
Re: range_agg
On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote: > Hi, > > This is not an ideal way to index multirages, but something we can > easily have. What sort of indexing improvements do you have in mind? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: pg_upgrade test for binary compatibility of core data types
On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote: > On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: > > I meant to notice if the binary format is accidentally changed again, which > > was > > what happened here: > > 7c15cef86 Base information_schema.sql_identifier domain on name, not > > varchar. > > > > I added a table to the regression tests so it's processed by pg_upgrade > > tests, > > run like: > > > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 > > oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin > > Per cfbot, this avoids testing ::xml (support for which may not be enabled) > And also now tests oid types. > > I think the per-version hacks should be grouped by logical change, rather than > by version. Which I've started doing here. rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f -- Justin >From a1114c3db36891f169122383db136fd8fb47cb10 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 5 Dec 2020 22:31:19 -0600 Subject: [PATCH v3 1/2] WIP: pg_upgrade/test.sh: changes needed to allow testing upgrade from v11 --- src/bin/pg_upgrade/test.sh | 92 ++ 1 file changed, 84 insertions(+), 8 deletions(-) diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh index 04aa7fd9f5..9733217535 100644 --- a/src/bin/pg_upgrade/test.sh +++ b/src/bin/pg_upgrade/test.sh @@ -23,7 +23,7 @@ standard_initdb() { # To increase coverage of non-standard segment size and group access # without increasing test runtime, run these tests with a custom setting. # Also, specify "-A trust" explicitly to suppress initdb's warning. - "$1" -N --wal-segsize 1 -g -A trust + "$1" -N -A trust if [ -n "$TEMP_CONFIG" -a -r "$TEMP_CONFIG" ] then cat "$TEMP_CONFIG" >> "$PGDATA/postgresql.conf" @@ -108,6 +108,9 @@ export EXTRA_REGRESS_OPTS mkdir "$outputdir" mkdir "$outputdir"/testtablespace +mkdir "$outputdir"/sql +mkdir "$outputdir"/expected + logdir=`pwd`/log rm -rf "$logdir" mkdir "$logdir" @@ -172,16 +175,83 @@ if "$MAKE" -C "$oldsrc" installcheck-parallel; then fix_sql="" case $oldpgversion in 804??) -fix_sql="DROP FUNCTION public.myfunc(integer); DROP FUNCTION public.oldstyle_length(integer, text);" +fix_sql="$fix_sql DROP FUNCTION public.myfunc(integer);" ;; - *) -fix_sql="DROP FUNCTION public.oldstyle_length(integer, text);" + esac + + # Removed in v10 commit 5ded4bd21 + case $oldpgversion in + 804??|9) +fix_sql="$fix_sql DROP FUNCTION public.oldstyle_length(integer, text);" +;; + esac + + # commit 068503c76511cdb0080bab689662a20e86b9c845 + case $oldpgversion in + 10) # XXX +fix_sql="$fix_sql DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;" +;; + esac + + # commit db3af9feb19f39827e916145f88fa5eca3130cb2 + case $oldpgversion in + 10) # XXX +fix_sql="$fix_sql DROP FUNCTION boxarea(box);" +fix_sql="$fix_sql DROP FUNCTION funny_dup17();" ;; esac + + # commit cda6a8d01d391eab45c4b3e0043a1b2b31072f5f + case $oldpgversion in + 10) # XXX +fix_sql="$fix_sql DROP TABLE abstime_tbl;" +fix_sql="$fix_sql DROP TABLE reltime_tbl;" +fix_sql="$fix_sql DROP TABLE tinterval_tbl;" +;; + esac + + # Various things removed for v14 + case $oldpgversion in + 804??|9|10|11|12|13) +# commit 76f412ab3 +# This one is only needed for v11+ ?? 
+# (see below for more operators removed that also apply to older versions) +fix_sql="$fix_sql DROP OPERATOR public.!=- (pg_catalog.int8, NONE);" +;; + esac + case $oldpgversion in + 804??|9|10|11|12|13) +# commit 76f412ab3 +fix_sql="$fix_sql DROP OPERATOR public.#@# (pg_catalog.int8, NONE);" +fix_sql="$fix_sql DROP OPERATOR public.#%# (pg_catalog.int8, NONE);" +fix_sql="$fix_sql DROP OPERATOR public.#@%# (pg_catalog.int8, NONE);" + +# commit 9e38c2bb5 and 97f73a978 +# fix_sql="$fix_sql DROP AGGREGATE array_larger_accum(anyarray);" +fix_sql="$fix_sql DROP AGGREGATE array_cat_accum(anyarray);" +fix_sql="$fix_sql DROP AGGREGATE first_el_agg_any(anyelement);" + +# commit 76f412ab3 +#fix_sql="$fix_sql DROP OPERATOR @#@(bigint,NONE);" +fix_sql="$fix_sql DROP OPERATOR @#@(NONE,bigint);" +;; + esac + + # commit 578b22971: OIDS removed in v12 + case $oldpgversion in + 804??|9|10|11) +fix_sql="$fix_sql ALTER TABLE public.tenk1 SET WITHOUT OIDS;" +fix_sql="$fix_sql ALTER TABLE public.tenk1 SET WITHOUT OIDS;" +#fix_sql="$fix_sql ALTER TABLE public.stud_emp SET WITHOUT OIDS;" # inherited +fix_sql="$fix_sql ALTER TABLE public.emp SET WITHOUT OIDS;" +fix_sql="$fix_sql ALTER TABLE public.tt7 SET WITHOUT OIDS;" +;; + esac + psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$? fi - pg_dumpall --no-sync -f "$temp_root"/dump1.sql || pg_dumpall1_status=$? + pg_dumpall --extra-float-digits=0 --no-sync -f "$temp_root"/dump1.sql || pg_d
Re: range_agg
On Sun, Dec 27, 2020 at 8:52 PM Zhihong Yu wrote: > This is not an ideal way to index multirages, but something we can easily > have. > > typo: multiranges Thanks for catching. I will revise the commit message before committing. -- Regards, Alexander Korotkov
Re: range_agg
On Sun, Dec 27, 2020 at 9:07 PM David Fetter wrote:
> On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote:
> > This is not an ideal way to index multirages, but something we can
> > easily have.
>
> What sort of indexing improvements do you have in mind?

Approximating a multirange as a single range can cause false positives. That is fine if the gaps are small, but what if they aren't? Ideally, we should split the multirange into its ranges and index them separately. So we would need a GIN-like index. The problem is that the GIN entry tree is a B-tree, which is not very useful for searching for ranges. If we could replace the GIN entry tree with GiST or SP-GiST, that would be good: we could index the multirange parts separately, and big gaps wouldn't be a problem. Similar work was already prototyped (under the name "vodka", though I'm not a big fan of that name).

FWIW, such a new access method would need a lot of work to bring it to commit. I don't think that would be reasonable before multiranges get popular. Regarding the GiST opclass, it seems to be the best we can do in GiST.

--
Regards,
Alexander Korotkov
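A tiny illustration of the false-positive concern with the union-range approximation (values are arbitrary):

-- the multirange is {[1,10), [1000000,1000010)}; its union range with no gaps
-- is [1,1000010), so an index built on that approximation would also have to
-- fetch and recheck rows that fall entirely inside the gap
select int4multirange(int4range(1, 10), int4range(1000000, 1000010))
       && int4range(500, 600);   -- the recheck says false, the approximation says true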
Re: Parallel Inserts in CREATE TABLE AS
For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch: + if (ignore && + (root->parse->CTASParallelInsInfo & +CTAS_PARALLEL_INS_TUP_COST_CAN_IGN)) I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the above if since when ignore_parallel_tuple_cost returns true, CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is set already. + * In this function we only care Append and Gather nodes. 'care' -> 'care about' + for (int i = 0; i < aps->as_nplans; i++) + { + parallel |= PushDownCTASParallelInsertState(dest, + aps->appendplans[i], + gather_exists); It seems the loop termination condition can include parallel since we can come out of the loop once parallel is true. + if (!allow && tuple_cost_flags && gather_exists) As the above code shows, gather_exists is only checked when allow is false. +* We set the flag for two cases when there is no parent path will +* be created(such as : limit,sort,distinct...): Please correct the grammar : there are two verbs following 'when' For set_append_rel_size: + { + root->parse->CTASParallelInsInfo |= + CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND; + } + } + + if (root->parse->CTASParallelInsInfo & + CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) + { + root->parse->CTASParallelInsInfo &= + ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND; In the if block for childrel->rtekind == RTE_SUBQUERY, CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND maybe set. Why is it cleared immediately after ? + /* Set to this in case tuple cost needs to be ignored for Append cases. */ + CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3 Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn on' or similar term in the comment. Because 'set to' normally means assignment. Cheers On Sun, Dec 27, 2020 at 12:50 AM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar > wrote: > > I have reviewed part of v15-0001 patch, I have a few comments, I will > > continue to review this. > > Thanks a lot. > > > 1. > > Why is this temporary hack? and what is the plan for removing this hack? > > The changes in xact.c, xact.h and heapam.c are common to all the > parallel insert patches - COPY, INSERT INTO SELECT. That was the > initial comment, I forgot to keep it in sync with the other patches. > Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan > was to have these code in all the parallel inserts patch, whichever > gets to review and commit first, others will update their patches > accordingly. > > > 2. > > +/* > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > + * insertion is possible, if yes set the parallel insert state i.e. > push down > > + * the dest receiver to the Gather nodes. > > + */ > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > +{ > > +if (!IS_CTAS(into)) > > +return; > > > > When will this hit? The functtion name suggest that it is from CTAS > > but now you have a check that if it is > > not for CTAS then return, can you add the comment that when do you > > expect this case? > > Yes it will hit for explain cases, but I choose to remove this and > check outside in the explain something like: > if (into) > ChooseParallelInsertsInCTAS() > > > Also the function name should start in a new line > > i.e > > void > > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > Ah, missed that. Modified now. > > > 3. 
> > +/* > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > + * insertion is possible, if yes set the parallel insert state i.e. > push down > > + * the dest receiver to the Gather nodes. > > + */ > > > > Push down to the Gather nodes? I think the right statement will be > > push down below the Gather node. > > Modified. > > > 4. > > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) > > { > > DR_intorel *myState = (DR_intorel *) self; > > > > -- Comment ->in parallel worker we don't need to crease dest recv > blah blah > > +if (myState->is_parallel_worker) > > { > > --parallel worker handling-- > > return; > > } > > > > --non-parallel worker code stay right there, instead of moving to > else > > Done. > > > 5. > > +/* > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > + * insertion is possible, if yes set the parallel insert state i.e. > push down > > + * the dest receiver to the Gather nodes. > > + */ > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > +{ > > > > From function name and comments it appeared that this function will > > return boolean saying whether > > Parallel insert should be selected or not. I t
hash_extension(text)
Hi,

When developing extensions, I find myself spending time on manually checking whether my update scripts (invoked via ALTER EXTENSION ... UPDATE) give the same result as CREATE EXTENSION ... would if installing the latest version from scratch without any previous version.

I thought it would be efficient if one could quickly detect such a difference. If there were a way to concatenate all extension objects and their definitions in a deterministic order, a hash could be created, which could then be used to detect differences.

Below is my attempt to implement this idea. The strategy is fragile as it doesn't handle all the regclasses, so false negatives are possible, but I don't think false positives should be possible, if implemented correctly.

Is there a better way to solve my problem already?

Feedback welcome.

Best regards,

Joel

SELECT extversion FROM pg_extension WHERE extname = 'uniphant';
 extversion
------------
 1.0
(1 row)

SELECT hash_extension('uniphant');
 hash_extension
----------------
    -1425682867
(1 row)

ALTER EXTENSION uniphant UPDATE;

SELECT hash_extension('uniphant');
 hash_extension
----------------
    -1615520783
(1 row)

DROP EXTENSION uniphant;
CREATE EXTENSION uniphant;

SELECT hash_extension('uniphant');
 hash_extension
----------------
    -1615520783
(1 row)

CREATE OR REPLACE FUNCTION hash_extension(text)
RETURNS integer
STABLE
LANGUAGE sql
AS $$
--
-- Constructs a text string containing most of all the extension objects
-- and their create definitions.
--
-- This is useful to detect a diff between the result of
--
--    ALTER EXTENSION ... UPDATE;
--    SELECT hash_extension(...);
--
-- compared to if one would install the latest version
-- of the extension from scratch using
--
--    CREATE EXTENSION ...;
--    SELECT hash_extension(...);
--
-- This could happen if the author of the extension
-- made a mistake in the update scripts.
--
-- This function is meant to be useful to check
-- the correctness of such update scripts.
--
SELECT hashtext(jsonb_agg(jsonb_build_array(
  pg_describe_object,
  CASE classid
    WHEN 'pg_namespace'::regclass THEN
    (
      SELECT jsonb_build_array(pg_roles.rolname, pg_namespace.nspacl)
      FROM pg_namespace
      JOIN pg_roles ON pg_roles.oid = pg_namespace.nspowner
      WHERE pg_namespace.oid = q.objid
    )
    WHEN 'pg_proc'::regclass THEN
      jsonb_build_array(pg_get_functiondef(objid))
    WHEN 'pg_class'::regclass THEN
    (
      SELECT jsonb_agg(jsonb_build_array(
        a.attname,
        pg_catalog.format_type(a.atttypid, a.atttypmod),
        (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid, true) for 128)
         FROM pg_catalog.pg_attrdef d
         WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
        a.attnotnull,
        (SELECT c.collname
         FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
         WHERE c.oid = a.attcollation AND t.oid = a.atttypid
           AND a.attcollation <> t.typcollation),
        a.attidentity,
        a.attgenerated
      ) ORDER BY a.attnum)
      FROM pg_catalog.pg_attribute a
      WHERE a.attrelid = q.objid AND a.attnum > 0 AND NOT a.attisdropped
    )
  END,
  classid::regclass
) ORDER BY pg_describe_object)::text)
FROM
(
  SELECT pg_describe_object(classid, objid, 0), classid::regclass, objid
  FROM pg_depend
  WHERE refclassid = 'pg_extension'::regclass
    AND refobjid = (SELECT oid FROM pg_extension WHERE extname = $1)
) AS q
$$;
vacuum_cost_page_miss default value and modern hardware
vacuum_cost_page_miss has a default value of 10, while vacuum_cost_page_dirty has a default value of 20. This has been the case since cost-based delays were introduced by commit f425b605f4e back in 2004. The obvious implication is that dirtying a page is on average only twice as expensive as a single shared_buffers miss (that doesn't dirty the page). While that might have made sense back in 2004, when magnetic disks were the norm, it seems questionable now.

The trend these days is that the total number of dirty pages is the limiting factor for OLTP workloads. This is a broad trend among all disk-based RDBMSs with an ARIES style design. It is more or less a result of long term trends in main memory size scaling and flash storage. It's rare for OLTP workloads to be truly I/O bound, and yet the backpressure from page cleaning/checkpointing becomes a bottleneck. In short, writes are way more expensive than reads -- the *relative* cost of writes has increased significantly (our use of the OS FS cache makes this even worse).

I suspect that this trend will become even more important for Postgres in the coming years, but that's not what I want to talk about right now. I just want to talk about vacuum_cost_page_miss on this thread.

Simply decreasing vacuum_cost_page_miss seems like a low risk way of making the VACUUM costing more useful within autovacuum workers. Halving vacuum_cost_page_miss to 5 would be a good start, though I think that a value as low as 2 would be better. That would make it only 2x vacuum_cost_page_hit's default (i.e. 2x the cost of processing a page that is in shared_buffers but did not need to be dirtied), which seems sensible to me when considered in the context in which the value is actually applied (and not some abstract theoretical context).

There are a few reasons why this seems like a good idea now:

* Throttling/delaying VACUUM is only useful as a way of smoothing the impact on production queries, which is an important goal, but currently we don't discriminate against the cost that we really should keep under control (dirtying new pages within VACUUM) very well. This is due to the aforementioned trends, the use of a strategy ring buffer by VACUUM, the fact that indexes are usually vacuumed in sequential physical order these days, and many other things that were not a factor in 2004.

* There is a real downside to throttling VACUUM unnecessarily, and the effects are *non-linear*. On a large table, the oldest xmin cutoff may become very old by the time we're only (say) half way through the initial table scan in lazy_scan_heap(). There may be relatively little work to do because most of the dead tuples won't be before the oldest xmin cutoff by that time (VACUUM just cannot keep up). Excessive throttling for simple page misses may actually *increase* the amount of I/O that VACUUM has to do over time -- we should try to get to the pages that actually need to be vacuumed quickly, which are probably already dirty anyway (and thus are probably going to add little to the cost delay limit in practice). Everything is connected to everything else.

* vacuum_cost_page_miss is very much not like random_page_cost, and the similar names confuse the issue -- this is not an optimization problem. Thinking about VACUUM as unrelated to the workload itself is obviously wrong. Changing the default is also an opportunity to clear that up.
Even if I am wrong to suggest that a miss within VACUUM should only be thought of as 2x as expensive as a hit in some *general* sense, I am concerned about *specific* consequences. There is no question about picking the best access path here -- we're still going to have to access the same blocks in the same way sooner or later. In general I think that we should move in the direction of more frequent, cheaper VACUUM operations [1], though we've already done a lot of work in that direction (e.g. freeze map work). * Some impact from VACUUM on query performance may in fact be a good thing. Deferring the cost of vacuuming can only make sense if we'll eventually be able to catch up because we're experiencing a surge in demand, which seems kind of optimistic -- it seems more likely that the GC debt will just grow and grow. Why should the DBA not expect to experience some kind of impact, which could be viewed as a natural kind of backpressure? The most important thing is consistent performance. * Other recent work such as the vacuum_cleanup_index_scale_factor patch has increased the relative cost of index vacuuming in some important cases: we don't have a visibility/freeze map for indexes, but index vacuuming that doesn't dirty any pages and has a TID kill list that's concentrated at the end of the heap/table is pretty cheap (the TID binary search is cache efficient/cheap). This change will help these workloads by better reflecting the way in which index vacuuming can be cheap for append-only tables with a small amount of garbage for recently inserted tuples that also got u
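For anyone who wants to experiment with the lower setting ahead of any change to the built-in default, a minimal sketch (the value 2 is the proposal above, not the current default):

ALTER SYSTEM SET vacuum_cost_page_miss = 2;
SELECT pg_reload_conf();
SHOW vacuum_cost_page_miss;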
Re: doc review for v14
On Thu, Dec 24, 2020 at 05:12:02PM +0900, Michael Paquier wrote: > I have applied most of it on HEAD, except 0011 and the things noted > above. Thanks again. Thank you. I see that I accidentally included ZSTD_COMPRESSION in pg_backup_archiver.h while cherry-picking from the branch where I first fixed this. Sorry :( > 0001-pgindent-typos.not-a-patch touches pg_bsd_indent. I'm hoping that someone will apply it there, but I realize that access to its repository is tightly controlled :) On Thu, Dec 24, 2020 at 05:12:02PM +0900, Michael Paquier wrote: > Restraining more the set of options is something to consider, though > it could be annoying. I have discarded this one for now. Even though its -d is unused, I guess since wouldn't serve any significant purpose, we shouldn't make pg_restore -l -d fail for no reason. I think a couple of these should be backpatched. doc/src/sgml/ref/pg_dump.sgml doc/src/sgml/sources.sgml doc/src/sgml/cube.sgml? doc/src/sgml/func.sgml? -- Justin >From 9c24fa421e423edb29fc866a70e935843dab2804 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 14 Nov 2020 23:09:21 -0600 Subject: [PATCH 1/7] typos in master --- doc/src/sgml/datatype.sgml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 58d168c763..55d79c02f5 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -639,7 +639,7 @@ NUMERIC The NaN (not a number) value is used to represent - undefined calculational results. In general, any operation with + undefined computational results. In general, any operation with a NaN input yields another NaN. The only exception is when the operation's other inputs are such that the same output would be obtained if the NaN were to -- 2.17.0 >From b4f8290d322a2d783352aff3a4b134acafa0a13c Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 25 Oct 2020 13:53:08 -0500 Subject: [PATCH 2/7] producement? fcec6caafa2346b6c9d3ad5065e417733bd63cd9 Previously discussed at https://www.postgresql.org/message-id/20201102062203.GA15770%40paquier.xyz Should backpatch --- src/backend/utils/adt/xml.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 4c299057a6..ab3151cefb 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -4535,11 +4535,11 @@ XmlTableFetchRow(TableFuncScanState *state) xtCxt = GetXmlTableBuilderPrivateData(state, "XmlTableFetchRow"); /* - * XmlTable returns table - set of composite values. The error context, is - * used for producement more values, between two calls, there can be - * created and used another libxml2 error context. It is libxml2 global - * value, so it should be refreshed any time before any libxml2 usage, - * that is finished by returning some value. + * XmlTable returns a table-set of composite values. The error context is + * used for providing more detail. Between two calls, other libxml2 + * error contexts might have been created and used ; since they're libxml2 + * global values, they should be refreshed each time before any libxml2 usage + * that finishes by returning some value. 
*/ xmlSetStructuredErrorFunc((void *) xtCxt->xmlerrcxt, xml_errorHandler); -- 2.17.0 >From e2b22163feb706ecba670108b6760526ddb1d57b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 15 Nov 2020 10:23:32 -0600 Subject: [PATCH 3/7] cannot --- doc/src/sgml/protocol.sgml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 4899bacda7..8fc1427d0a 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1532,8 +1532,8 @@ SELCT 1/0; support to PostgreSQL. In this case the connection must be closed, but the frontend might choose to open a fresh connection and proceed without requesting GSSAPI -encryption. Given the length limits specified above, the ErrorMessage can -not be confused with a proper response from the server with an appropriate +encryption. Given the length limits specified above, the ErrorMessage +cannot be confused with a proper response from the server with an appropriate length. -- 2.17.0 >From 4f78473ed9063464e2510bbd433e3ebc3b9a773e Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 19 Dec 2020 03:30:25 -0600 Subject: [PATCH 4/7] Doc review for min_dynamic_shared_memory: 84b1c63ad --- doc/src/sgml/config.sgml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index ce1fed7dfb..6b2acab9f4 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1941,13 +1941,13 @@ include_dir 'conf.d' Specifies the amount of memory that should be allocated at server -startup time for use by parallel queries.
Re: Feature request: Connection string parsing for postgres_fdw
Whoa that's perfect! Thank you so much. On Thu, Dec 24, 2020 at 4:59 PM Ian Lawrence Barwick wrote: > 2020年12月23日(水) 22:05 Eric Hanson : > > > > I'm trying to store connection to postgres_fdw in the database I want > to be able to store the full breadth of connection styles and all the > different types of connections that libpq supports. But having some > troubles. > > > > Postgres_fdw wants options passed into CREATE SERVER, all broken out > into separate variables, but this format is neither a connection URI, nor a > keyword/value string. One could imagine writing some parser that extracts > all the variables and honors collisions in the same way libpq does (like > when both the path and the query string specify a port), but I really don't > want to recreate that. > > > > It would be really nice if I could tap into libpq's connection string > parser from SQL and use it to extract all the variables in a given > connection string, so that I can then pipe those into CREATE SERVER > options. Either that, or teach postgres_fdw how to consume standard > connection strings. > > Does something like this do what you're looking for? > > postgres=# SELECT * FROM conninfo_parse('host=node1 user=foo'); > keyword | value > -+--- > user| foo > host| node1 > (2 rows) > > postgres=# SELECT * FROM conninfo_parse('postgresql://localhost:5433'); > keyword | value > -+--- > host| localhost > port| 5433 > (2 rows) > > Basically a wrapper around PQconninfoParse(), I've had the code knocking > around > for a while now and finally got round to packaging it into an > extension [1]. It's > on my todo list to submit a patch based on this to core, as it seems > Generally > Useful To Have. > > [1] https://github.com/ibarwick/conninfo > > Regards > > Ian Barwick > -- > EnterpriseDB: https://www.enterprisedb.com >
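For the archives, a sketch of how the extension's output could be turned into the postgres_fdw DDL Eric was after (this assumes the conninfo extension above is installed; the server name and URI are placeholders, and user/password settings belong in a user mapping rather than in the server options):

SELECT format(
         'CREATE SERVER remote FOREIGN DATA WRAPPER postgres_fdw OPTIONS (%s)',
         string_agg(format('%I %L', keyword, value), ', ')
       ) AS ddl
FROM conninfo_parse('postgresql://db.example.com:5433/mydb')
WHERE keyword IN ('host', 'port', 'dbname');
-- the generated statement can then be run with \gexec in psql or EXECUTE in PL/pgSQL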
Re: Rethinking plpgsql's assignment implementation
On Sat, Dec 26, 2020 at 19:00, Pavel Stehule wrote:
> Hi
>
> I repeated the tests. I wrote a set of simple functions. It is a synthetic
> test, but I think it can identify potential problems well.
>
> I calculated the average of 3 cycles and I checked the performance of each
> function. I didn't find any problem. The total execution time looks good too.
> The patched code is about 11% faster than master (14 sec vs 15.8 sec). So there is
> new important functionality with nice performance benefits.
>
> make check-world passed

I played with the plpgsql_check tests and again I didn't find any significant issue with this patch. I am very satisfied with the implementation.

Now, SELECT INTO lags behind the assignment statement, and this fact should be documented. Usually we don't need to use an array's fields here, but somebody can try it.

Regards

Pavel

> Regards
>
> Pavel
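For readers following along, a trivial sketch of the two forms in question; the SELECT INTO form is the one noted above as lagging behind (this only shows the syntax, it is not Pavel's benchmark):

DO $$
DECLARE
  a int[] := '{}';
  x int;
BEGIN
  a[1] := 10;          -- handled by the new assignment implementation
  SELECT 20 INTO x;    -- SELECT INTO goes through the older code path
  RAISE NOTICE 'a = %, x = %', a, x;
END
$$;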
Re: Rethinking plpgsql's assignment implementation
Pavel Stehule writes: > Now, the behavior of SELECT INTO is behind the assign statement and this > fact should be documented. Usually we don't need to use array's fields > here, but somebody can try it. It's been behind all along --- this patch didn't really change that. But I don't mind documenting it more clearly. regards, tom lane
Re: Cache relation sizes?
On Thu, Dec 24, 2020 at 6:59 AM Thomas Munro wrote:
> On Thu, Dec 17, 2020 at 10:22 PM Andy Fan wrote:
> > Let me try to understand your point. Suppose process 1 extends a file to
> > 2 blocks from 1 block, and fsync is not called, then a). the lseek *may* still
> > return 1 based on the comments in the ReadBuffer_common ("because
> > of buggy Linux kernels that sometimes return an seek SEEK_END result
> > that doesn't account for a recent write"). b). When this patch is introduced,
> > we would get 2 blocks for sure if we can get the size from cache. c). After
> > user reads the 2 blocks from cache and then the cache is evicted, and user
> > reads the blocks again, it is possible to get 1 again.
> >
> > Do we need to consider case a) in this patch? and Is case c). the situation you
> > want to avoid and do we have to introduce fsync to archive this? Basically I
> > think case a) should not happen by practice so we don't need to handle case
> > c) and finally we don't need the fsync/SR_SYNCING/SR_JUST_DIRTIED flag.
> > I'm not opposed to adding them, I am just interested in the background in case
> > you are also willing to discuss it.
>
> Sorry for the lack of background -- there was some discussion on the
> thread "Optimize dropping of relation buffers using dlist", but it's
> long and mixed up with other topics. I'll try to summarise here.
>
> It's easy to reproduce files shrinking asynchronously on network
> filesystems that reserve space lazily[1],

Thanks for the explanation. After reading that thread, I am more confused now. In [1], we have the following test result.

$ cat magic_shrinking_file.c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
	int fd;
	char buffer[8192] = {0};

	fd = open("/mnt/test_loopback_remote/dir/file", O_RDWR | O_APPEND);
	if (fd < 0) {
		perror("open");
		return EXIT_FAILURE;
	}
	printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
	printf("write(...) = %zd\n", write(fd, buffer, sizeof(buffer)));
	printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
	printf("fsync(...) = %d\n", fsync(fd));
	printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));

	return EXIT_SUCCESS;
}
$ cc magic_shrinking_file.c
$ ./a.out
lseek(..., SEEK_END) = 9670656
write(...) = 8192
lseek(..., SEEK_END) = 9678848
fsync(...) = -1
lseek(..., SEEK_END) = 9670656

I got two pieces of information from the above. a) Before the fsync, lseek(fd, 0, SEEK_END) returns a correct value, but after the fsync it returns a wrong value. b) The fsync failed (its return value is -1) in the above case.

I'm more confused because of point a). Since fsync can't correct the wrong value, what is the point of calling fsync in this patch? (I agree that it is good to fix this reliability problem within this patch, but I'm doubtful that fsync can help in this case.) Am I missing something obvious?

I read your patch in detail some days ago and feel it is in pretty good shape, and I think you understand the existing issues very well, such as: a) smgr_alloc_sr uses a FIFO design. b) SR_PARTITION_LOCK doesn't really use a partition lock. c) you seem to have some concerns about the concurrency handling, although I didn't find anything wrong with it. d) how to handle a DIRTY sr during eviction. I planned to recheck item c) before this reply, but time did not permit.

May I know what your plan is for this patch?

[1] https://www.postgresql.org/message-id/CA%2BhUKGLZTfKuXir9U4JQkY%3Dk3Tb6M_3toGrGOK_fa2d4MPQQ_w%40mail.gmail.com

--
Best Regards
Andy Fan
Tying an object's ownership to datdba
https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave $SUBJECT as one of the constituent projects for changing the public schema default ACL. This feature stands on its own, hence the new thread. I showed syntax "ALTER SCHEMA public OWNER TO DATABASE_OWNER" and anticipated a new RoleSpecType. That was fine for in-memory representation, but it lacked a value to store in pg_namespace.nspowner. That led me to instead add a default role, "pg_database_owner". That way, user queries joining owner columns to pg_authid need no modification. Also, user.c changes were trivial, and pg_dump needed no changes. The role's membership consists, implicitly, of the current database owner. The first patch refactors acl.c to reduce code duplication, and the second patch adds the feature. I ended up blocking DDL that creates role memberships involving the new role; see reasons in user.c comments. Lifting those restrictions looked feasible, but it was inessential to the mission, and avoiding unintended consequences would have been tricky. Views "information_schema.enabled_roles" and "information_schema.applicable_roles" list any implicit membership in pg_database_owner, but pg_catalog.pg_group and psql \dgS do not. If this leads any reviewer to look closely at applicable_roles, note that its behavior doesn't match its documentation (https://postgr.es/m/flat/20060728170615.gy20...@kenobi.snowman.net). This patch makes us read pg_database when reading pg_auth_members. IndexScanOK() has been saying "during backend startup we have to be able to use the pg_authid and pg_auth_members syscaches for authentication". While that's true of pg_authid, I found no sign of pg_auth_members reads that early. (The read in InitPostgres() -> CheckMyDatabase() -> pg_database_aclcheck() happens after RelationCacheInitializePhase3(). In a physical walsender, which never has a mature relcache, some SHOW commands make guc.c check DEFAULT_ROLE_READ_ALL_SETTINGS. The walsender case, though it happens after authentication, may necessitate IndexScanOK()'s treatment of pg_auth_members.) For now, just in case I missed the early read, I've made IndexScanOK() treat pg_database like pg_auth_members. Thanks, nm Author: Noah Misch Commit: Noah Misch Merge similar algorithms into roles_is_member_of(). The next commit would have complicated two or three algorithms, so take this opportunity to consolidate. No functional changes. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index fe6c444..1adacb9 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -50,32 +50,24 @@ typedef struct /* * We frequently need to test whether a given role is a member of some other * role. In most of these tests the "given role" is the same, namely the - * active current user. So we can optimize it by keeping a cached list of - * all the roles the "given role" is a member of, directly or indirectly. - * - * There are actually two caches, one computed under "has_privs" rules - * (do not recurse where rolinherit isn't true) and one computed under - * "is_member" rules (recurse regardless of rolinherit). + * active current user. So we can optimize it by keeping cached lists of all + * the roles the "given role" is a member of, directly or indirectly. * * Possibly this mechanism should be generalized to allow caching membership * info for multiple roles? * - * The has_privs cache is: - * cached_privs_role is the role OID the cache is for. 
- * cached_privs_roles is an OID list of roles that cached_privs_role - * has the privileges of (always including itself). - * The cache is valid if cached_privs_role is not InvalidOid. - * - * The is_member cache is similarly: - * cached_member_role is the role OID the cache is for. - * cached_membership_roles is an OID list of roles that cached_member_role - * is a member of (always including itself). - * The cache is valid if cached_member_role is not InvalidOid. + * Each element of cached_roles is an OID list of constituent roles for the + * corresponding element of cached_role (always including the cached_role + * itself). One cache has ROLERECURSE_PRIVS semantics, and the other has + * ROLERECURSE_MEMBERS semantics. */ -static Oid cached_privs_role = InvalidOid; -static List *cached_privs_roles = NIL; -static Oid cached_member_role = InvalidOid; -static List *cached_membership_roles = NIL; +enum RoleRecurseType +{ + ROLERECURSE_PRIVS = 0, /* recurse if rolinherit */ + ROLERECURSE_MEMBERS = 1 /* recurse unconditionally */ +}; +static Oid cached_role[] = {InvalidOid, InvalidOid}; +static List *cached_roles[] = {NIL, NIL}; static const char *getid(const char *s, char *n); @@ -4675,8 +4667,8 @@ initialize_acl(void) { /* * In normal mode, set a callbac
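To make the intended usage concrete, a hedged sketch of what the feature would allow, based on the description above (not an excerpt from the patch):

ALTER SCHEMA public OWNER TO pg_database_owner;
-- membership in pg_database_owner is resolved implicitly to the owner of the
-- current database, so the effective owner of "public" now follows datdba
-- in each database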
Re: Parallel Inserts in CREATE TABLE AS
On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy wrote: > > On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar wrote: > > I have reviewed part of the v15-0001 patch, I have a few comments, and I will > > continue to review this. > > Thanks a lot. > > > 1. > > Why is this a temporary hack? And what is the plan for removing this hack? > > The changes in xact.c, xact.h and heapam.c are common to all the > parallel insert patches - COPY, INSERT INTO SELECT. That was the > initial comment, and I forgot to keep it in sync with the other patches. > Now, I used the comment from the INSERT INTO SELECT patch. IIRC, the plan > was to have this code in all the parallel insert patches; whichever > gets reviewed and committed first, the others will update their patches > accordingly. > > > 2. > > +/* > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > + * insertion is possible, if yes set the parallel insert state i.e. push down > > + * the dest receiver to the Gather nodes. > > + */ > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > +{ > > +if (!IS_CTAS(into)) > > +return; > > > > When will this hit? The function name suggests that it is for CTAS, > > but now you have a check that if it is > > not for CTAS then return; can you add a comment about when you > > expect this case? > > Yes, it will hit for explain cases, but I chose to remove this and > check outside in the explain code, something like: > if (into) > ChooseParallelInsertsInCTAS() > > > Also the function name should start on a new line, > > i.e. > > void > > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > Ah, missed that. Modified now. > > > 3. > > +/* > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > + * insertion is possible, if yes set the parallel insert state i.e. push down > > + * the dest receiver to the Gather nodes. > > + */ > > > > Push down to the Gather nodes? I think the right statement would be > > push down below the Gather node. > > Modified. > > > 4. > > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) > > { > > DR_intorel *myState = (DR_intorel *) self; > > > > -- Comment -> in a parallel worker we don't need to create the dest recv, blah > > blah > > +if (myState->is_parallel_worker) > > { > > --parallel worker handling-- > > return; > > } > > > > --non-parallel worker code stays right there, instead of moving to an else > > Done. > > > 5. > > +/* > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > + * insertion is possible, if yes set the parallel insert state i.e. push down > > + * the dest receiver to the Gather nodes. > > + */ > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > +{ > > > > From the function name and comments it appeared that this function would > > return a boolean saying whether > > parallel insert should be selected or not. I think the name/comment > > should be better for this. > > Yeah, that function can still return void because there is no point in returning a > bool there, since the intention is to see if parallel inserts can be > performed; if yes, set the state, otherwise exit. I changed the > function name to TryParallelizingInsertsInCTAS(). Let me know your > suggestions if that doesn't work out. > > > 6. > > /* > > + * For parallelizing inserts in CTAS i.e. making each parallel > > worker > > + * insert the tuples, we must send information such as the into clause (for > > + * each worker to build a separate dest receiver) and the object id (for each > > + * worker to open the created table). > > > > The comment says we need to pass the object id, but the code under this > > comment is not doing so. > > Improved the comment. > > > 7. > > +/* > > + * Since there are no rows that are transferred from workers to the Gather > > + * node, we set it to 0 to be visible in the estimated row count of > > + * explain plans. > > + */ > > +queryDesc->planstate->plan->plan_rows = 0; > > > > This seems a bit hackish. Why is it done after planning? I mean, the plan > > must know that it is returning 0 rows. > > This exists to show the estimated row count (in the case of EXPLAIN CTAS > without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS, actual tuples > are shown correctly as 0 because Gather doesn't receive any tuples. > if (es->costs) > { > if (es->format == EXPLAIN_FORMAT_TEXT) > { > appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d)", > plan->startup_cost, plan->total_cost, > plan->plan_rows, plan->plan_width); > > Since it's an estimated row count (which may not always be correct), we > will let the EXPLAIN plan show that, and I think we can remove that > part. Thoughts? > > I removed it in the v6 patch set. > > 8. > > +char *intoc
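For reference, the restructuring agreed in points 2 and 4 above amounts to roughly the following shape. This is only a sketch based on the fragments quoted in the review (the function was later renamed to TryParallelizingInsertsInCTAS() per point 5), with the elided parts summarized as comments:

/* Caller side (e.g. the CTAS/EXPLAIN path): consult the helper only when an
 * INTO clause is actually present, instead of testing IS_CTAS() inside it. */
if (into)
    ChooseParallelInsertsInCTAS(into, queryDesc);

/* intorel_startup(): parallel workers take an early exit; the existing
 * leader-only setup code stays where it is rather than moving into an else. */
static void
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
    DR_intorel *myState = (DR_intorel *) self;

    if (myState->is_parallel_worker)
    {
        /* worker-only handling: open the already-created table, etc. */
        return;
    }

    /* leader-only path: create the table and so on (unchanged) */
}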
Re: [HACKERS] Custom compression methods
On Sun, Dec 27, 2020 at 12:40 PM Andrey Borodin wrote: > > > On 25 Dec 2020, at 14:34, Dilip Kumar wrote: > > Maybe add Lz4\Zlib WAL FPI compression on top of this patchset? I'm not > insisting on anything, it just would be so cool to have it... > > BTW currently there are Oid collisions in the original patchset. Thanks for the patch. Maybe we can allow setting custom compression methods for WAL compression as well. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Rethinking plpgsql's assignment implementation
On Mon, Dec 28, 2020 at 0:55, Tom Lane wrote: > Pavel Stehule writes: > > Now, the behavior of SELECT INTO lags behind the assignment statement, and this > > fact should be documented. Usually we don't need to use array elements as targets > > here, but somebody can try it. > > It's been behind all along --- this patch didn't really change that. > But I don't mind documenting it more clearly. > ok Pavel > regards, tom lane
Re: Disable WAL logging to speed up data loading
On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com wrote: > > Hello > > > I've made a new patch v05 that took in the comments > to filter out WAL records more strictly and addressed some minor fixes > that were discussed within the past few days. > Also, I changed the documentation, considering those modifications. > From a backup management tool's perspective, how can it detect that wal_level has been changed to ‘none' (and back to normal)? IIUC once we change wal_level to none, old backups that were taken before setting it to ‘none’ can be used only for restoring the database to a point before the LSN where 'wal_level = none' was set. Users can neither restore the database to any point within the period of 'wal_level = none' nor use an old backup to restore the database to a point after the LSN where 'wal_level = none’ was set. I think we might need to provide a way to detect the changes other than reading XLOG_PARAMETER_CHANGE. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: Parallel Inserts in CREATE TABLE AS
On Sun, Dec 27, 2020 at 2:28 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Sat, Dec 26, 2020 at 9:20 PM vignesh C wrote: > > +-- parallel inserts must occur > > +select explain_pictas( > > +'create table parallel_write as select length(stringu1) from tenk1;'); > > +select count(*) from parallel_write; > > +drop table parallel_write; > > > > We can change the comment "parallel inserts must occur" to something like "parallel > > insert must be selected for CTAS on a normal table" > > > > +-- parallel inserts must occur > > +select explain_pictas( > > +'create unlogged table parallel_write as select length(stringu1) from tenk1;'); > > +select count(*) from parallel_write; > > +drop table parallel_write; > > > > We can change the comment "parallel inserts must occur" to something like "parallel > > insert must be selected for CTAS on an unlogged table" > > Similar comments need to be handled in other places also. > > I think the existing comments look fine. The info like the table type and > the query (CTAS or CMV) is visible by looking at the test case. What I > wanted from the comments is whether we support parallel inserts or not, > and if not, why, so that it will be easy to read. I tried to keep them as > succinct as possible. > I saw a few inconsistencies in the patch: +-- parallel inserts must occur +select explain_pictas( +'create table parallel_write as select length(stringu1) from tenk1;'); + explain_pictas +-- parallel inserts must not occur as the table is temporary +select explain_pictas( +'create temporary table parallel_write as select length(stringu1) from tenk1;'); + explain_pictas +-- parallel inserts must occur, as there is init plan that gets executed by +-- each parallel worker +select explain_pictas( +'create table parallel_write as select two col1, +(select two from (select * from tenk2) as tt limit 1) col2 +from tenk1 where tenk1.four = 3;'); + explain_pictas +-- the top node is Gather under which merge join happens, so parallel inserts +-- must occur +set enable_nestloop to off; +set enable_mergejoin to on; +-- parallel hash join happens under Gather node, so parallel inserts must occur +set enable_mergejoin to off; +set enable_hashjoin to on; +select explain_pictas( The test comments are detailed in a few cases and not detailed in a few others for similar kinds of parallelism-selected tests. I felt we could make the test comments consistent across the file. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Custom compression methods
> On 28 Dec 2020, at 10:20, Dilip Kumar wrote: > > On Sun, Dec 27, 2020 at 12:40 PM Andrey Borodin wrote: >> >>> On 25 Dec 2020, at 14:34, Dilip Kumar wrote: >> >> Maybe add Lz4\Zlib WAL FPI compression on top of this patchset? I'm not >> insisting on anything, it just would be so cool to have it... >> >> BTW currently there are Oid collisions in the original patchset. > > Thanks for the patch. Maybe we can allow setting custom compression > methods for WAL compression as well. No, unfortunately, we can't use truly custom methods. Custom compression handlers are WAL-logged, so we can only use a static set of hardcoded compression methods. Thanks! Best regards, Andrey Borodin.
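To make that constraint concrete: because custom handlers are themselves defined through WAL-logged catalog changes, recovering a full-page image cannot depend on them, so WAL compression has to pick from a compiled-in set. A purely illustrative sketch; the enum and names below are invented here, not taken from any posted patch:

/* Invented, compiled-in set of FPI compression methods that recovery could
 * always rely on, as opposed to catalog-defined custom handlers. */
typedef enum WalFpiCompression
{
    WAL_FPI_COMPRESSION_PGLZ,
    WAL_FPI_COMPRESSION_LZ4,
    WAL_FPI_COMPRESSION_ZLIB
} WalFpiCompression;

static const char *
wal_fpi_compression_name(WalFpiCompression method)
{
    switch (method)
    {
        case WAL_FPI_COMPRESSION_PGLZ:
            return "pglz";
        case WAL_FPI_COMPRESSION_LZ4:
            return "lz4";
        case WAL_FPI_COMPRESSION_ZLIB:
            return "zlib";
    }
    return "unknown";
}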
Re: New IndexAM API controlling index vacuum strategies
On Thu, Dec 24, 2020 at 12:59 PM Peter Geoghegan wrote: > > On Tue, Dec 22, 2020 at 2:54 PM Masahiko Sawada wrote: > > I've started this separate thread from [1] for discussing the general > > API design of index vacuum. > > This is a very difficult and very important problem. Clearly defining > the problem is probably the hardest part. This prototype patch seems > like a good start, though. > > Private discussion between Masahiko and myself led to a shared > understanding of what the best *general* direction is for VACUUM now. > It is necessary to deal with several problems all at once here, and to > at least think about several more problems that will need to be solved > later. If anybody reading the thread initially finds it hard to see > the connection between the specific items that Masahiko has > introduced, they should note that that's *expected*. > > > Summary: > > > > * Call ambulkdelete and amvacuumcleanup even when INDEX_CLEANUP is > > false, and leave it to the index AM whether or not skip them. > > Makes sense. I like the way you unify INDEX_CLEANUP and the > vacuum_cleanup_index_scale_factor stuff in a way that is now quite > explicit and obvious in the code. > > > The second and third points are to introduce a general framework for > > future extensibility. User-visible behavior is not changed by this > > change. > > In some ways the ideas in your patch might be considered radical, or > at least novel: they introduce the idea that bloat can be a > qualitative thing. But at the same time the design is quite > conservative: these are fairly isolated changes, at least code-wise. I > am not 100% sure that this approach will be successful in > vacuumlazy.c, in the end (I'm ~95% sure). But I am 100% sure that our > collective understanding of the problems in this area will be > significantly improved by this effort. A fundamental rethink does not > necessarily require a fundamental redesign, and yet it might be just > as effective. > > This is certainly what I see when testing my bottom-up index deletion > patch, which adds an incremental index deletion mechanism that merely > intervenes in a precise, isolated way. Despite my patch's simplicity, > it manages to practically eliminate an entire important *class* of > index bloat (at least once you make certain mild assumptions about the > duration of snapshots). Sometimes it is possible to solve a hard > problem by thinking about it only *slightly* differently. > > This is a tantalizing possibility for VACUUM, too. I'm willing to risk > sounding grandiose if that's what it takes to get more hackers > interested in these questions. With that in mind, here is a summary of > the high level hypothesis behind this VACUUM patch: > > VACUUM can and should be reimagined as a top-down mechanism that > complements various bottom-up mechanisms (including the stuff from my > deletion patch, heap pruning, and possibly an enhanced version of heap > pruning based on similar principles). This will be possible without > changing any of the fundamental invariants of the current vacuumlazy.c > design. VACUUM's problems are largely pathological behaviors of one > kind or another, that can be fixed with specific well-targeted > interventions. Workload characteristics can naturally determine how > much of the cleanup is done by VACUUM itself -- large variations are > possible within a single database, and even across indexes on the same > table. Agreed. Ideally, the bottom-up mechanism works well and reclaim almost all garbage. 
VACUUM should be a feature that complements these works if the bottom-up mechanism cannot work well for some reason, and also is used to make sure that all collected garbage has been vacuumed. For heaps, we already have such a mechanism: opportunistically hot-pruning and lazy vacuum. For indexes especially btree indexes, the bottom-up index deletion and ambulkdelete() would have a similar relationship. > > > The new index AM API, amvacuumstrategy(), which is called before > > bulkdelete() for each index and asks the index bulk-deletion strategy. > > On this API, lazy vacuum asks, "Hey index X, I collected garbage heap > > tuples during heap scanning, how urgent is vacuuming for you?", and > > the index answers either "it's urgent" when it wants to do > > bulk-deletion or "it's not urgent, I can skip it". The point of this > > proposal is to isolate heap vacuum and index vacuum for each index so > > that we can employ different strategies for each index. Lazy vacuum > > can decide whether or not to do heap clean based on the answers from > > the indexes. > > Right -- workload characteristics (plus appropriate optimizations at > the local level) make it possible that amvacuumstrategy() will give > *very* different answers from different indexes *on the same table*. > The idea that all indexes on the table are more or less equally > bloated at any given point in time is mostly wrong. Actually, > *sometimes* it really is correct! But other times
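As a hypothetical illustration of the division of labour described above (the posted patch's actual types and names may differ): each index AM answers the "how urgent is vacuuming for you?" question, and lazy vacuum aggregates the answers before deciding whether to run ambulkdelete() and the second heap pass at all.

#include <stdbool.h>

/* Invented names, for illustration of the amvacuumstrategy() idea only. */
typedef enum IndexVacuumStrategy
{
    INDEX_VACUUM_SKIP,          /* "not urgent, I can skip it" */
    INDEX_VACUUM_BULKDELETE     /* "it's urgent, please do ambulkdelete()" */
} IndexVacuumStrategy;

/* Lazy vacuum might only run the index and heap-vacuum phases when at least
 * one index asked for them (heap-side overrides are discussed later in the
 * thread). */
static bool
any_index_wants_bulkdelete(const IndexVacuumStrategy *answers, int nindexes)
{
    for (int i = 0; i < nindexes; i++)
    {
        if (answers[i] == INDEX_VACUUM_BULKDELETE)
            return true;
    }
    return false;
}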
RE: Disable WAL logging to speed up data loading
Hello Sawada-San On Monday, December 28, 2020 2:29 PM Masahiko Sawada wrote: > On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com > wrote: > > > > I've made a new patch v05 that took in the comments to filter out WAL records > > more strictly and addressed some minor fixes that were discussed > > within the past few days. > > Also, I changed the documentation, considering those modifications. > > From a backup management tool's perspective, how can it detect that > wal_level has been changed to ‘none' (and back to normal)? IIUC once > we change wal_level to none, old backups that were taken before setting it to > ‘none’ can be used only for restoring the database to a point before the > LSN where 'wal_level = none' was set. Users can neither restore the > database to any point within the period of 'wal_level = none' nor use an old backup > to restore the database to a point after the LSN where 'wal_level = > none’ was set. I think we might need to provide a way to detect the changes other > than reading XLOG_PARAMETER_CHANGE. In the past, we discussed the backup management tool aspect in [1] and concluded that this should be a separate patch from this thread, because comparing wal_level changes between snapshots also applies to wal_level = minimal. Please have a look at the "second idea" in the e-mail in [1] and the responses to it. [1] - https://www.postgresql.org/message-id/OSBPR01MB4888B34B81A6E0DD46B5D063EDE00%40OSBPR01MB4888.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
Re: New IndexAM API controlling index vacuum strategies
On Sun, Dec 27, 2020 at 10:55 PM Masahiko Sawada wrote: > > As you said, the next question must be: How do we teach lazy vacuum to > > not do what gets requested by amvacuumcleanup() when it cannot respect > > the wishes of one individual indexes, for example when the > > accumulation of LP_DEAD items in the heap becomes a big problem in > > itself? That really could be the thing that forces full heap > > vacuuming, even with several indexes. > > You mean requested by amvacuumstreategy(), not by amvacuumcleanup()? I > think amvacuumstrategy() affects only ambulkdelete(). But when all > ambulkdelete() were skipped by the requests by index AMs we might want > to skip amvacuumcleanup() as well. No, I was asking about how we should decide to do a real VACUUM even (a real ambulkdelete() call) when no index asks for it because bottom-up deletion works very well in every index. Clearly we will need to eventually remove remaining LP_DEAD items from the heap at some point if nothing else happens -- eventually LP_DEAD items in the heap alone will force a traditional heap vacuum (which will still have to go through indexes that have not grown, just to be safe/avoid recycling a TID that's still in the index). Postgres heap fillfactor is 100 by default, though I believe it's 90 in another well known DB system. If you set Postgres heap fill factor to 90 you can fit a little over 200 LP_DEAD items in the "extra space" left behind in each heap page after initial bulk loading/INSERTs take place that respect our lower fill factor setting. This is about 4x the number of initial heap tuples in the pgbench_accounts table -- it's quite a lot! If we pessimistically assume that all updates are non-HOT updates, we'll still usually have enough space for each logical row to get updated several times before the heap page "overflows". Even when there is significant skew in the UPDATEs, the skew is not noticeable at the level of individual heap pages. We have a surprisingly large general capacity to temporarily "absorb" extra garbage LP_DEAD items in heap pages this way. Nobody really cared about this extra capacity very much before now, because it did not help with the big problem of index bloat that you naturally see with this workload. But that big problem may go away soon, and so this extra capacity may become important at the same time. I think that it could make sense for lazy_scan_heap() to maintain statistics about the number of LP_DEAD items remaining in each heap page (just local stack variables). From there, it can pass the statistics to the choose_vacuum_strategy() function from your patch. Perhaps choose_vacuum_strategy() will notice that the heap page with the most LP_DEAD items encountered within lazy_scan_heap() (among those encountered so far in the event of multiple index passes) has too many LP_DEAD items -- this indicates that there is a danger that some heap pages will start to "overflow" soon, which is now a problem that lazy_scan_heap() must think about. Maybe if the "extra space" left by applying heap fill factor (with settings below 100) is insufficient to fit perhaps 2/3 of the LP_DEAD items needed on the heap page that has the most LP_DEAD items (among all heap pages), we stop caring about what amvacuumstrategy()/the indexes say. So we do the right thing for the heap pages, while still mostly avoiding index vacuuming and the final heap pass. I experimented with this today, and I think that it is a good way to do it. 
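The capacity figure above can be checked with simple arithmetic: with the default 8kB block size, fillfactor 90 reserves about 819 bytes per heap page, and a dead line pointer (ItemIdData) costs 4 bytes, so roughly 200 LP_DEAD items fit in the reserved space. A standalone back-of-the-envelope sketch, which deliberately ignores the page header and alignment (so real capacity is slightly lower):

#include <stdio.h>

#define BLOCK_SIZE       8192   /* default PostgreSQL block size */
#define LINE_POINTER_SZ  4      /* sizeof(ItemIdData): what an LP_DEAD item costs */

/* How many LP_DEAD line pointers fit in the space reserved by fillfactor? */
static int
lp_dead_capacity(int fillfactor)
{
    int reserved = BLOCK_SIZE * (100 - fillfactor) / 100;

    return reserved / LINE_POINTER_SZ;
}

int
main(void)
{
    /* fillfactor 90 => ~204 items, i.e. "a little over 200" as stated above */
    printf("fillfactor 90: ~%d LP_DEAD items per page\n", lp_dead_capacity(90));
    printf("fillfactor 80: ~%d LP_DEAD items per page\n", lp_dead_capacity(80));
    return 0;
}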
I like the idea of choose_vacuum_strategy() understanding that heap pages that are subject to many non-HOT updates have a "natural extra capacity for LP_DEAD items" that it must care about directly (at least with non-default heap fill factor settings). My early testing shows that it will often take a surprisingly long time for the most heavily updated heap page to have more than about 100 LP_DEAD items. > > I will need to experiment in order to improve my understanding of how > > to make this cooperate with bottom-up index deletion. But that's > > mostly just a question for my patch (and a relatively easy one). > > Yeah, I think we might need something like statistics about garbage > per index so that individual index can make a different decision based > on their status. For example, a btree index might want to skip > ambulkdelete() if it has a few dead index tuples in its leaf pages. It > could be on stats collector or on btree's meta page. Right. I think that even a very conservative approach could work well. For example, maybe we teach nbtree's amvacuumstrategy() routine to ask to do a real ambulkdelete(), except in the extreme case where the index is *exactly* the same size as it was after the last VACUUM. This will happen regularly with bottom-up index deletion. Maybe that approach is a bit too conservative, though. -- Peter Geoghegan
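The very conservative nbtree policy floated above could be as small as the following sketch; how the previous size would be remembered (metapage, stats collector, etc.) is left open in the thread, and the function name and parameters here are invented for illustration.

#include <stdbool.h>
#include <stdint.h>

/* Ask for a real ambulkdelete() pass unless the index has not grown at all
 * since the last VACUUM, which suggests bottom-up deletion is keeping up. */
static bool
btree_wants_bulkdelete(uint32_t pages_now, uint32_t pages_after_last_vacuum)
{
    if (pages_now == pages_after_last_vacuum)
        return false;           /* exactly the same size: skip this round */

    return true;
}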