Re: Bloom filter Pushdown Optimization for Merge Join
On Fri, Sep 30, 2022 at 9:20 PM Zhihong Yu wrote:
>
> On Fri, Sep 30, 2022 at 8:40 PM Zhihong Yu wrote:
>>
>> On Fri, Sep 30, 2022 at 3:44 PM Zheng Li wrote:
>>>
>>> Hello,
>>>
>>> A bloom filter provides early filtering of rows that cannot be joined
>>> before they would reach the join operator; the optimization is also
>>> called a semi join filter (SJF) pushdown. Such a filter can be created
>>> when one child of the join operator must materialize its derived table
>>> before the other child is evaluated.
>>>
>>> For example, a bloom filter can be created using the join keys for
>>> the build side/inner side of a hash join or the outer side of a merge
>>> join; the bloom filter can then be used to pre-filter rows on the
>>> other side of the join operator during the scan of the base relation.
>>> The thread "Hash Joins vs. Bloom Filters / take 2" [1] is a good
>>> discussion of using such an optimization for hash join, without going
>>> into the pushdown of the filter, where its performance gain could be
>>> further increased.
>>>
>>> We worked on prototyping bloom filter pushdown for both hash join and
>>> merge join. Attached is a patch set for bloom filter pushdown for
>>> merge join. We also plan to send the patch for hash join once we have
>>> it rebased.
>>>
>>> Here is a summary of the patch set:
>>> 1. Bloom Filter Pushdown optimizes Merge Join by filtering rows early,
>>> during the table scan, instead of later on.
>>>    - The bloom filter is pushed down along the execution tree to
>>>      the target SeqScan nodes.
>>>    - Experiments show that this optimization can speed up Merge
>>>      Join by up to 36%.
>>>
>>> 2. The planner makes the decision to use the bloom filter based on the
>>> estimated filtering rate and the expected performance gain.
>>>    - The planner accomplishes this by estimating four numbers per
>>>      variable: the total number of rows of the relation, the number of
>>>      distinct values for a given variable, and the minimum and maximum
>>>      value of the variable (when applicable). Using these numbers, the
>>>      planner estimates the filtering rate of a potential filter.
>>>    - Because actually creating and applying the filter adds more
>>>      operations, there is a minimum threshold of filtering below which
>>>      the filter would not be useful. Based on testing, we check whether
>>>      the estimated filtering rate is higher than 35%, and that informs
>>>      our decision to use a filter or not.
>>>
>>> 3. If using a bloom filter, the planner also adjusts the expected cost
>>> of Merge Join based on the expected performance gain.
>>>
>>> 4. Capability to build the bloom filter in parallel in case of a
>>> parallel SeqScan. This is done efficiently by populating a local bloom
>>> filter for each parallel worker and then taking a bitwise OR over all
>>> the local bloom filters to form a shared bloom filter at the end of
>>> the parallel SeqScan.
>>>
>>> 5. The optimization is GUC controlled, with settings of
>>> enable_mergejoin_semijoin_filter and force_mergejoin_semijoin_filter.
>>>
>>> We found in experiments that there is a significant improvement
>>> when using the bloom filter during Merge Join. One experiment involved
>>> joining two large tables while varying the theoretical filtering rate
>>> (TFR) between the two tables; the TFR is defined as the percentage by
>>> which the two datasets are disjoint. Both tables in the merge join were
>>> the same size. We varied the TFR to see the change in the filtering
>>> optimization.
>>> For example, let's imagine t0 has 10 million rows, which contain the
>>> numbers 1 through 10 million randomly shuffled. Also, t1 has the
>>> numbers 4 million through 14 million randomly shuffled. Then the TFR
>>> for a join of these two tables is 40%, since 40% of each table is
>>> disjoint from the other table (1 through 4 million for t0, 10 million
>>> through 14 million for t1).
>>>
>>> Here is the performance test result of joining two tables:
>>> TFR: theoretical filtering rate
>>> EFR: estimated filtering rate
>>> AFR: actual filtering rate
>>> HJ: hash join
>>> MJ Default: default merge join
>>> MJ Filter: merge join with bloom filter optimization enabled
>>> MJ Filter Forced: merge join with bloom filter optimization forced
>>>
>>> TFR   EFR     AFR     HJ     MJ Default   MJ Filter   MJ Filter Forced
>>> ----------------------------------------------------------------------
>>> 10    33.46    7.41   6529   22638        21949       23160
>>> 20    37.27   14.85   6483   22290        21928       21930
>>> 30    41.32   22.25   6395   22374        20718       20794
>>> 40    45.67   29.7    6272   21969        19449       19410
>>> 50    50.41   37.1    6210   21412        18222       18224
>>> 60    55.64   44.51   6052   21108        17060       17018
>>> 70    61.59   51.98   5947   21020        15682       15737
>>> 80    68.64   59.36   5761   20812        14411       14
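The parallel build in point 4 relies on a basic property of bloom filters: two filters built with the same bitmap size and the same hash functions can be combined by OR-ing their bitmaps. A minimal sketch of that merge step, not taken from the patch (the struct layout and names here are made up for illustration):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical bloom filter; every worker must use the same geometry
 * (nwords) and the same hash functions for the merge below to be valid. */
typedef struct BloomFilter
{
	size_t		nwords;			/* bitmap length, in 64-bit words */
	uint64_t   *words;			/* the bitmap itself */
} BloomFilter;

/*
 * Fold a worker's local filter into the shared one.  A key "hits" a bloom
 * filter iff all of its hash bits are set; OR-ing the bitmaps preserves
 * every bit set by every worker, so the merged filter accepts everything
 * any local filter accepted: false positives may increase, but false
 * negatives still cannot occur, which is all a pre-filter needs.
 */
static void
bloom_merge(BloomFilter *shared, const BloomFilter *local)
{
	for (size_t i = 0; i < shared->nwords; i++)
		shared->words[i] |= local->words[i];
}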
Re: [PATCH] Log details for client certificate failures
On 29.09.22 06:52, Masahiko Sawada wrote:
While this seems a future-proof idea, I wonder if it might be overkill since we don't need to worry about accumulation of leaked memory in this case. Given that check_cluster_name is the only case where we found a small memory leak, I think it's adequate to fix just that. Fixing this issue suppresses valgrind's complaint, but since the boot value of cluster_name is "" the memory leak we can avoid is only 1 byte.

I have committed this. I think it's better to keep the code locally robust and not have to rely on complex analysis of how GUC memory management works.
Re: kerberos/001_auth test fails on arm CPU darwin
On 27.09.22 17:35, Nazir Bilal Yavuz wrote:

I updated my patch regarding these reviews. The current logic is that it will try to find all executables in this order (if it finds all executables, it won't try the remaining steps):
1 - 'krb5-config --prefix'
2 - hardcoded paths (I added arm and MacPorts paths for darwin)
3 - from PATH
Also, I did some refactoring to make it easier to add other search paths in the future and to make sure all executables are found.

This patch could use some more in-code comments. For example, this

+# get prefix for kerberos executables and try to find them at this path
+sub test_krb5_paths

is not helpful. What does it "get", where does it put it, how does it "try", and what does it do if it fails? What are the inputs and outputs of this function?

+   # remove '\n' since 'krb5-config --prefix' returns path ends with '\n'
+   $krb5_path =~ s/\n//g;

use chomp
Re: ICU for global collation
On 22.09.22 20:06, Marina Polyakova wrote:
On 2022-09-21 17:53, Peter Eisentraut wrote:
Committed with that test, thanks. I think that covers all the ICU issues you reported for PG15 for now?

I thought about the order of the ICU checks - if it is ok to check that the selected encoding is supported by ICU after printing all the locale & encoding information, why not move almost all the ICU checks here?..

It's possible that we can do better, but I'm not going to add things like that to PG 15 at this point unless it fixes a faulty behavior.
Re: postgres_fdw: deadlock in the same transaction when the postgres_fdw server is loopback
On Sat, 2022-10-01 at 04:02 +0000, Xiaoran Wang wrote:
> I created a postgres_fdw server loopback as the test does. Then I ran the
> following SQLs
>
> [create a foreign server via loopback and manipulate the same data locally
> and via foreign table]
>
> Then the transaction got stuck. Should the "loopback" server be disabled in
> postgres_fdw?

It shouldn't; there are good use cases for that ("autonomous transactions"). At most, some cautioning documentation could be added, but I am not convinced that even that is necessary. I'd say that this is a pretty obvious case of pilot error.

Yours,
Laurenz Albe
Re: longfin and tamandua aren't too happy but I'm not sure why
On Wed, Sep 28, 2022 at 08:45:31PM -0700, Andres Freund wrote: > Hi, > > On 2022-09-28 21:22:26 +0200, Alvaro Herrera wrote: > > I have an additional, unrelated complaint about CI, which is that we > > don't have anything for past branches. I have a partial hack(*), but > > I wish we had something we could readily use. > > > > (*) I just backpatched the commit that added the .cirrus.yml file, plus > > some later fixes to it, and I keep that as a separate branch which I > > merge with whatever other changes I want to test. I then push that to > > github, and ignore the windows results when looking at cirrus-ci.com. You wouldn't need to ignore Windows tap test failures if you also backpatch 76e38b37a, and either disable PG_TEST_USE_UNIX_SOCKETS, or include 45f52709d. > I'd not be against backpatching the ci stuff if there were sufficient demand > for it. It'd probably be a decent bit of initial work, but after that it > shouldn't be too bad. I just tried this, which works fine at least for v11-v14: | git checkout origin/REL_15_STABLE .cirrus.yml src/tools/ci https://cirrus-ci.com/task/5742859943936000 v15a https://cirrus-ci.com/task/6725412431593472 v15b https://cirrus-ci.com/task/5105320283340800 v13 https://cirrus-ci.com/task/4809469463887872 v12 https://cirrus-ci.com/task/6659971021537280 v11 (I still suggest my patches to run all tests using vcregress. The number of people who remember that, for v15, cirrusci runs incomplete tests is probably fewer than five.) https://www.postgresql.org/message-id/20220623193125.GB22452%40telsasoft.com https://www.postgresql.org/message-id/2022082817.GA21897%40telsasoft.com If cirrusci were backpatched, it'd be kind of nice to use a ccache key that includes the branch name (but maybe the overhead of compilation is unimportant compared to the workload induced by cfbot). A gripe from me: the regression.diffs and other logs from the SQL regression tests are in a directory called "main" (same for "isolation"). I imagine I won't be the last person to spend minutes looking through the list of test dirs for the entry called "regress", conclude that it's inexplicably absent, and locate it only after reading src/test/regress/meson.build. -- Justin
Re: missing indexes in indexlist with partitioned tables
On Tue, Sep 20, 2022 at 4:53 PM David Rowley wrote:
> Arne sent me an off-list
> message to say he's planning on working on the patch that uses the
> existing field instead of the new one he originally added. Let's hold
> off for that patch.

I wouldn't say I explicitly stated that. But I ended up doing something that resulted in the attached patch. :)

For my own sanity I grepped one last time for the usage of indexlist. Most of the (untouched) usages have comments noting that they are only called for baserels/plain tables. Namely, all but the cluster of partitioned tables. I had to reread that section. There we are just traversing the tree and omitting partitioned tables.

There is now a test section in join.sql for partitioned tables that tests something very similar to the baserel case. That's more thorough than what I originally went for.

Further feedback would be appreciated!

Regards
Arne

From 290252bab5837c1a6f42bd53cf788c8696d5d0ec Mon Sep 17 00:00:00 2001 From: Arne Roland Date: Sat, 1 Oct 2022 18:14:34 +0200 Subject: [PATCH v8_indexlist_contains_partitioned_indexes] v8 --- src/backend/optimizer/util/plancat.c | 226 ++- src/backend/storage/buffer/bufmgr.c | 6 +- src/backend/utils/adt/selfuncs.c | 4 + src/test/regress/expected/inherit.out| 6 + src/test/regress/expected/join.out | 124 +- src/test/regress/expected/partition_join.out | 30 +-- src/test/regress/sql/inherit.sql | 2 + src/test/regress/sql/join.sql| 69 +- src/test/regress/sql/partition_join.sql | 6 +- 9 files changed, 348 insertions(+), 125 deletions(-) diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 6d5718ee4c..7545ae862d 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -106,10 +106,12 @@ static void set_baserel_partition_constraint(Relation relation, * cases these are left as zeroes, but sometimes we need to compute attr * widths here, and we may as well cache the results for costsize.c. * - * If inhparent is true, all we need to do is set up the attr arrays: - * the RelOptInfo actually represents the appendrel formed by an inheritance - * tree, and so the parent rel's physical size and index information isn't - * important for it. + * If inhparent is true, we don't care about physical size, index am + * and estimates. + * We still need index uniqueness for join removal. + * When it comes to the path, the RelOptInfo actually represents + * the appendrel formed by an inheritance tree, and so the parent + * rel's physical size isn't important for it. */ void get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, @@ -157,10 +159,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* * Make list of indexes. Ignore indexes on system catalogs if told to. - * Don't bother with indexes for an inheritance parent, either. + * Don't bother with indexes from traditional inheritance parents. + * We care about partitioned indexes for join pruning, + * even if we don't have any am for them. 
*/ - if (inhparent || - (IgnoreSystemIndexes && IsSystemRelation(relation))) + if ((inhparent && relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + || (IgnoreSystemIndexes && IsSystemRelation(relation))) hasindex = false; else hasindex = relation->rd_rel->relhasindex; @@ -191,8 +195,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, IndexAmRoutine *amroutine; IndexOptInfo *info; int ncolumns, - nkeycolumns; - int i; + nkeycolumns, + i; /* * Extract info from the relation descriptor for the index. @@ -213,16 +217,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, continue; } - /* - * Ignore partitioned indexes, since they are not usable for - * queries. - */ - if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) - { -index_close(indexRelation, NoLock); -continue; - } - /* * If the index is valid, but cannot yet be used, ignore it; but * mark the plan we are generating as transient. See @@ -267,108 +261,133 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->relam = indexRelation->rd_rel->relam; - /* We copy just the fields we need, not all of rd_indam */ - amroutine = indexRelation->rd_indam; - info->amcanorderbyop = amroutine->amcanorderbyop; - info->amoptionalkey = amroutine->amoptionalkey; - info->amsearcharray = amroutine->amsearcharray; - info->amsearchnulls = amroutine->amsearchnulls; - info->amcanparallel = amroutine->amcanparallel; - info->amhasgettuple = (amroutine->amgettuple != NULL); - info->amhasgetbitmap = amroutine->amgetbitmap != NULL && -relation->rd_tableam->scan_bitmap_next_block != NULL; - info->amcanma
Re: Question: test "aggregates" failed in 32-bit machine
I wrote:
> So at this point I've lost all faith in the estimates being meaningful
> at all.

I spent some time today looking into the question of what our qsort code actually does. I wrote a quick-n-dirty little test module (attached) to measure the number of comparisons qsort really uses for assorted sample inputs. The results will move a bit from run to run because of randomization, but the average counts should be pretty stable I think. I got results like these:

regression=# create temp table data as select * from qsort_comparisons(100000);
SELECT 10
regression=# select n * log(groups)/log(2) as est, 100*(n * log(groups)/log(2) - avg_cmps)/avg_cmps as pct_err, * from data;
        est         |      pct_err       |   n    | groups | avg_cmps | min_cmps | max_cmps |          note
--------------------+--------------------+--------+--------+----------+----------+----------+------------------------
                  0 |               -100 | 100000 |      1 |    99999 |    99999 |    99999 | all values the same
 1660964.0474436812 | -5.419880052975057 | 100000 | 100000 |  1756145 |  1722569 |  1835627 | all values distinct
             100000 | -33.33911061041376 | 100000 |      2 |   150013 |   150008 |   150024 | 2 distinct values
             400000 | 11.075628618635713 | 100000 |     16 |   360115 |   337586 |   431376 | 16 distinct values
             600000 |  8.369757612975473 | 100000 |     64 |   553660 |   523858 |   639492 | 64 distinct values
             800000 |  4.770461016221087 | 100000 |    256 |   763574 |   733898 |   844450 | 256 distinct values
            1000000 | 1.5540821186618827 | 100000 |   1024 |   984697 |   953830 |      384 | 1024 distinct values
 1457116.0087927429 |  41.97897366170798 | 100000 |  24342 |  1026290 |   994694 |  1089503 | Zipfian, parameter 1.1
 1150828.9986140348 | 158.28880094758154 | 100000 |   2913 |   445559 |   426575 |   511214 | Zipfian, parameter 1.5
  578135.9713524659 |  327.6090378488971 | 100000 |     55 |   135202 |   132541 |   213467 | Zipfian, parameter 3.0
(10 rows)

So "N * log(NumberOfGroups)" is a pretty solid estimate for uniformly-sized groups ... except when NumberOfGroups = 1 ... but it is a significant overestimate if the groups aren't uniformly sized. Now a factor of 2X or 3X isn't awful --- we're very happy to accept estimates only that good in other contexts --- but I still wonder whether this is reliable enough to justify the calculations being done in compute_cpu_sort_cost. I'm still very afraid that the conclusions we're drawing about the sort costs for different column orders are mostly junk.

In any case, something's got to be done about the failure at NumberOfGroups = 1. Instead of this:

    correctedNGroups = Max(1.0, ceil(correctedNGroups));
    per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);

I suggest

    if (correctedNGroups > 1.0)
        per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);
    else
        /* Sorting N all-alike tuples takes only N-1 comparisons */
        per_tuple_cost += totalFuncCost;

(Note that the ceil() here is a complete waste, because all paths leading to this produced integral estimates already. Even if they didn't, I see no good argument why ceil() makes the result better.)

I'm still of the opinion that we need to revert this code for now.

			regards, tom lane

/*
create function qsort_comparisons(N int) returns table
  (N int, groups int, avg_cmps int8, min_cmps int8, max_cmps int8, note text)
  strict volatile language c as '/path/to/count_comparisons.so';
select * from qsort_comparisons(1);
*/

#include "postgres.h"

#include <stdint.h>

#include "common/pg_prng.h"
#include "fmgr.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "utils/tuplestore.h"

PG_MODULE_MAGIC;

/*
 * qsort_arg comparator function for integers.
 * The number of calls is tracked in *arg.
 */
static int
cmp_func(const void *a, const void *b, void *arg)
{
	int			aa = *(const int *) a;
	int			bb = *(const int *) b;
	size_t	   *count_ptr = (size_t *) arg;

	(*count_ptr)++;
	if (aa < bb)
		return -1;
	if (aa > bb)
		return 1;
	return 0;
}

/*
 * Randomly shuffle an array of integers.
 */
static void
shuffle(int *x, int n)
{
	/*
	 * Shuffle array using Fisher-Yates algorithm. Iterate array and swap
	 * head (i) with a randomly chosen same-or-later item (j) at each
	 * iteration.
	 */
	for (int i = 0; i < n; i++)
	{
		int			j = (int) pg_prng_uint64_range(&pg_global_prng_state,
												   i, n - 1);
		int			tmp = x[i];

		x[i] = x[j];
		x[j] = tmp;
	}
}

/*
 * Test qsort for the given test case.
 * Shuffle the given array, sort it using qsort_arg,
 * and report the numbers of comparisons made.
 */
static void
qsort_test_case(int *array, int n, char *note, int trials,
				Tuplestorestate *tupstore, AttInMetadata *attinmeta)
{
	size_t		tot_count = 0;
	size_t		min_count = SIZE_MAX;
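Tom's module above is cut off, but the measurement idea it implements needs nothing PostgreSQL-specific. As an aside, here is a standalone sketch of the same counting-comparator trick using plain C qsort() with a file-scope counter standing in for qsort_arg's pass-through argument (hypothetical demo code, not part of the attachment):

#include <stdio.h>
#include <stdlib.h>

static size_t ncompares;		/* stands in for the *arg counter above */

static int
counting_cmp(const void *a, const void *b)
{
	int			aa = *(const int *) a;
	int			bb = *(const int *) b;

	ncompares++;
	return (aa > bb) - (aa < bb);
}

int
main(void)
{
	int			data[1000];

	/* all values distinct, in reverse order */
	for (int i = 0; i < 1000; i++)
		data[i] = 1000 - i;

	ncompares = 0;
	qsort(data, 1000, sizeof(int), counting_cmp);

	/* n * log2(n) for n = 1000 is about 9966; compare the actual count */
	printf("comparisons: %zu\n", ncompares);
	return 0;
}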
Re: Question: test "aggregates" failed in 32-bit machine
On Sat, Oct 1, 2022 at 12:14 PM Tom Lane wrote: > I spent some time today looking into the question of what our qsort > code actually does. I wrote a quick-n-dirty little test module > (attached) to measure the number of comparisons qsort really uses > for assorted sample inputs. Reminds me of the other sort testing program that you wrote when the B&M code first went in: https://www.postgresql.org/message-id/18732.1142967...@sss.pgh.pa.us This was notable for recreating the tests from the original B&M paper. The paper uses various types of test inputs with characteristics that were challenging to the implementation and worth specifically getting right. For example, "saw tooth" input. -- Peter Geoghegan
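For readers who don't have the paper at hand: B&M's "saw tooth" input is simply the values 0..m-1 repeated cyclically, which produces many duplicate keys in runs and stresses the equal-key handling of the partitioning code. A tiny generator in the spirit of the paper's test harness (hypothetical illustration, not code from either test program):

/*
 * Fill x[0..n-1] with a Bentley & McIlroy style "saw tooth" pattern:
 * 0, 1, ..., m-1, 0, 1, ...  Small m yields many duplicates; m >= n
 * degenerates to already-sorted distinct input.
 */
static void
fill_sawtooth(int *x, int n, int m)
{
	for (int i = 0; i < n; i++)
		x[i] = i % m;
}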
Re: [PATCH v1] [meson] add a default option prefix=/usr/local/pgsql
Hi, On 2022-09-30 19:07:21 -0700, Andres Freund wrote: > The attached 0001 calls into a meson helper command to do this. Not > particularly pretty, but less code than before, and likely more reliable. I pushed Junwang Zhao's patch together with this change, after adding a comment to the setting of the default prefix explaining how it affects windows. Greetings, Andres Freund
Re: Question: test "aggregates" failed in 32-bit machine
Peter Geoghegan writes: > Reminds me of the other sort testing program that you wrote when the > B&M code first went in: > https://www.postgresql.org/message-id/18732.1142967...@sss.pgh.pa.us Ha, I'd totally forgotten about that ... regards, tom lane
Re: pg_regress: lookup shellprog in $PATH
On Thu, Aug 25, 2022 at 04:04:39PM -0400, Tom Lane wrote: > Robert Haas writes: > > On Thu, Aug 25, 2022 at 10:48 AM Tom Lane wrote: > >> If we were executing a program that the user needs to have some control > >> over, sure, but what we have here is an implementation detail that I > >> doubt anyone cares about. The fact that we're using a shell at all is > >> only because nobody has cared to manually implement I/O redirection logic > >> in these places; otherwise we'd be execl()'ing the server or psql directly. > >> Maybe the best answer would be to do that, and get out of the business > >> of knowing where the shell is? > The Windows side of this is completely untested and may be broken; also, > perhaps Windows has something more nearly equivalent to execvp() that we > could use instead of reconstructing a command line? It's annoying that Windows has nothing like execvp(), unfortunately. > the patch removes all shell-quoting hazards on the Unix side but they > are still there on the Windows side. It's feasible to take cmd.exe out of the loop. One could then eliminate cmd.exe quoting (the "^" characters). Can't avoid the rest of the quoting (https://docs.microsoft.com/en-us/cpp/cpp/main-function-command-line-args#parsing-c-command-line-arguments). Bypassing cmd.exe would also make it easy to remove the ban on newlines and carriage returns in arguments.
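To make the Unix side of that concrete: taking the shell out of the loop means doing by hand the redirection that "prog args... > logfile 2>&1" would otherwise delegate to /bin/sh, then execvp()'ing the target directly. A rough sketch, not the committed patch (error handling trimmed, names invented):

#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a program with an argv[] directly -- no shell, hence no quoting
 * hazards -- sending its stdout and stderr to a log file.
 */
static int
run_logged(char *const argv[], const char *logfile)
{
	pid_t		pid = fork();

	if (pid == 0)
	{
		int			fd = open(logfile, O_WRONLY | O_CREAT | O_APPEND, 0644);

		if (fd < 0)
			_exit(127);
		dup2(fd, STDOUT_FILENO);	/* the "> logfile" part */
		dup2(fd, STDERR_FILENO);	/* the "2>&1" part */
		close(fd);
		execvp(argv[0], argv);		/* searches $PATH; returns only on error */
		_exit(127);
	}
	else if (pid > 0)
	{
		int			status;

		waitpid(pid, &status, 0);
		return status;
	}
	return -1;
}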
Re: interrupted tap tests leave postgres instances around
Hi,

On 2022-09-30 11:17:00 +0200, Alvaro Herrera wrote:
> But on testing, some nodes linger after being sent a shutdown signal.
> I'm not clear why this is -- I think it's due to the fact that we send
> the signal just as the node is starting up, which means the signal
> doesn't reach the process.

I suspect it's when a test gets interrupted while pg_ctl is starting the backend. The start() routine only does _update_pid() after pg_ctl has finished, and terminate()->stop() returns before doing anything if pid isn't defined.

Perhaps the END{} routine should call $node->_update_pid(-1); if $exit_code != 0 and _pid is undefined? That does seem to reduce the incidence of "leftover" postgres instances. 001_start_stop.pl leaves some behind, but that makes sense, because it's bypassing the whole node management.

But I still occasionally see some remaining processes if I crank up test concurrency. Ah! At least part of the problem is that sub stop() does BAIL_OUT, and of course it can fail as part of the shutdown. But there are still some that survive, where your perl.trace doesn't contain the node getting shut down...

Greetings,

Andres Freund
Re: Question: test "aggregates" failed in 32-bit machine
On 10/1/22 3:13 PM, Tom Lane wrote:
I'm still of the opinion that we need to revert this code for now.

[RMT hat, but speaking just for me] Reading through Tom's analysis, this seems to be the safest path forward. I have a few questions to better understand:

1. How invasive would the revert be?
2. Are there other user-visible items that would be impacted?
3. Is disabling the feature by default a viable option?

Thanks,

Jonathan
Re: longfin and tamandua aren't too happy but I'm not sure why
Hi,

On 2022-10-01 11:14:20 -0500, Justin Pryzby wrote:
> I just tried this, which works fine at least for v11-v14:
> | git checkout origin/REL_15_STABLE .cirrus.yml src/tools/ci
>
> https://cirrus-ci.com/task/5742859943936000 v15a
> https://cirrus-ci.com/task/6725412431593472 v15b
> https://cirrus-ci.com/task/5105320283340800 v13
> https://cirrus-ci.com/task/4809469463887872 v12
> https://cirrus-ci.com/task/6659971021537280 v11

Cool, thanks for trying that! I wonder if there are any problems on other branches...

> (I still suggest my patches to run all tests using vcregress. The number of
> people who remember that, for v15, cirrusci runs incomplete tests is probably
> fewer than five.)
> https://www.postgresql.org/message-id/20220623193125.GB22452%40telsasoft.com
> https://www.postgresql.org/message-id/2022082817.GA21897%40telsasoft.com

Andrew, the de facto maintainer of src/tools/msvc, kind of NACKed those. But the reasoning might not hold with vcregress being on life support.

OTOH, to me the basic advantage is to have *any* CI coverage. We don't need to put the bar for the backbranches higher than where we were at ~2 weeks ago.

> If cirrusci were backpatched, it'd be kind of nice to use a ccache key
> that includes the branch name (but maybe the overhead of compilation is
> unimportant compared to the workload induced by cfbot).

Hm. The branch name in general sounds like it might be too broad, particularly for cfbot. I think we possibly should just put the major version into .cirrus.yml and use that as the cache key. I think that'd also solve some of the "diff against what" arguments we've had around your CI improvements.

> A gripe from me: the regression.diffs and other logs from the SQL regression
> tests are in a directory called "main" (same for "isolation"). I imagine I
> won't be the last person to spend minutes looking through the list of test dirs
> for the entry called "regress", conclude that it's inexplicably absent, and
> locate it only after reading src/test/regress/meson.build.

I'd have no problem renaming main/isolation to isolation/isolation and main/regress to pg_regress/regress or such.

FWIW, if you add --print-errorlogs meson test will show you the output of just failed tests, which for pg_regress style tests will include the path to regression.diffs:

... The differences that caused some tests to fail can be viewed in the file "/srv/dev/build/m/testrun/cube/regress/regression.diffs". A copy of the test summary that you see above is saved in the file "/srv/dev/build/m/testrun/cube/regress/regression.out".

It's too bad the default of --print-errorlogs can't be changed.

Unfortunately we don't print something as useful in the case of tap tests. I wonder if we should do something like

diff --git i/src/test/perl/PostgreSQL/Test/Utils.pm w/src/test/perl/PostgreSQL/Test/Utils.pm
index 99d33451064..acc18ca7c85 100644
--- i/src/test/perl/PostgreSQL/Test/Utils.pm
+++ w/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -239,6 +239,8 @@ END
 	#
 	# Preserve temporary directories after (1) and after (2).
 	$File::Temp::KEEP_ALL = 1 unless $? == 0 && all_tests_passing();
+
+	diag("test logfile: $test_logfile");
 }

 =pod

Potentially doing so only if $? != 0. This would make the output for a failing test end like this:

✀ ―
stderr:
# Failed test at /home/andres/src/postgresql/contrib/amcheck/t/001_verify_heapam.pl line 20.
# Failed test at /home/andres/src/postgresql/contrib/amcheck/t/001_verify_heapam.pl line 22.
# test logfile: /srv/dev/build/m/testrun/amcheck/001_verify_heapam/log/regress_log_001_verify_heapam # Looks like you failed 2 tests of 275. (test program exited with status code 2) ― which should make it a lot easier to find the log? Greetings, Andres Freund
Re: Question: test "aggregates" failed in 32-bit machine
"Jonathan S. Katz" writes: > On 10/1/22 3:13 PM, Tom Lane wrote: >> I'm still of the opinion that we need to revert this code for now. > [RMT hat, but speaking just for me] reading through Tom's analysis, this > seems to be the safest path forward. I have a few questions to better > understand: > 1. How invasive would the revert be? I've just finished constructing a draft full-reversion patch. I'm not confident in this yet; in particular, teasing it apart from 1349d2790 ("Improve performance of ORDER BY / DISTINCT aggregates") was fairly messy. I need to look through the regression test changes and make sure that none are surprising. But this is approximately the right scope if we rip it out entirely. I plan to have a look tomorrow at the idea of reverting only the cost_sort changes, and rewriting get_cheapest_group_keys_order() to just sort the keys by decreasing numgroups estimates as I suggested upthread. That might be substantially less messy, because of fewer interactions with 1349d2790. > 2. Are the other user-visible items that would be impacted? See above. (But note that 1349d2790 is HEAD-only, not in v15.) > 3. Is there an option of disabling the feature by default viable? Not one that usefully addresses my concerns. The patch did add an enable_group_by_reordering GUC which we could change to default-off, but it does nothing about the cost_sort behavioral changes. I would be a little inclined to rip out that GUC in either case, because I doubt that we need it with the more restricted change. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 2e4e82a94f..cc9e39c4a5 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -2862,13 +2862,16 @@ select c2 * (random() <= 1)::int as c2 from ft2 group by c2 * (random() <= 1)::i -- GROUP BY clause in various forms, cardinal, alias and constant expression explain (verbose, costs off) select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2; - QUERY PLAN - - Foreign Scan + QUERY PLAN +--- + Sort Output: (count(c2)), c2, 5, 7.0, 9 - Relations: Aggregate on (public.ft1) - Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY 2, 3, 5 ORDER BY c2 ASC NULLS LAST -(4 rows) + Sort Key: ft1.c2 + -> Foreign Scan + Output: (count(c2)), c2, 5, 7.0, 9 + Relations: Aggregate on (public.ft1) + Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY 2, 3, 5 +(7 rows) select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2; w | x | y | z diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index d8848bc774..d750290f13 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5062,20 +5062,6 @@ ANY num_sync ( - enable_group_by_reordering (boolean) - - enable_group_by_reordering configuration parameter - - - - -Enables or disables reordering of keys in a GROUP BY -clause. The default is on. 
- - - - enable_hashagg (boolean) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index f486d42441..5ef29eea69 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -1814,327 +1814,6 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm) rterm->pathtarget->width); } -/* - * is_fake_var - * Workaround for generate_append_tlist() which generates fake Vars with - * varno == 0, that will cause a fail of estimate_num_group() call - * - * XXX Ummm, why would estimate_num_group fail with this? - */ -static bool -is_fake_var(Expr *expr) -{ - if (IsA(expr, RelabelType)) - expr = (Expr *) ((RelabelType *) expr)->arg; - - return (IsA(expr, Var) && ((Var *) expr)->varno == 0); -} - -/* - * get_width_cost_multiplier - * Returns relative complexity of comparing two values based on its width. - * The idea behind is that the comparison becomes more expensive the longer the - * value is. Return value is in cpu_operator_cost units. - */ -static double -get_width_cost_multiplier(PlannerInfo *root, Expr *expr) -{ - double width = -1.0; /* fake value */ - - if (IsA(expr, RelabelType)) - expr = (Expr *) ((RelabelType *) expr)->arg; - - /* Try to find actual stat in corresponding relation */ - if (IsA(expr, Var)) - { - Var *var = (Var *) expr; - - if (var->varno > 0 && var->varno < root->simple_rel_array
Re: longfin and tamandua aren't too happy but I'm not sure why
On Sat, Oct 01, 2022 at 03:15:14PM -0700, Andres Freund wrote:
> On 2022-10-01 11:14:20 -0500, Justin Pryzby wrote:
> > (I still suggest my patches to run all tests using vcregress. The number of
> > people who remember that, for v15, cirrusci runs incomplete tests is probably
> > fewer than five.)
> > https://www.postgresql.org/message-id/20220623193125.GB22452%40telsasoft.com
> > https://www.postgresql.org/message-id/2022082817.GA21897%40telsasoft.com
>
> Andrew, the de facto maintainer of src/tools/msvc, kind of NACKed those. But
> the reasoning might not hold with vcregress being on life support.

I think you're referring to the comment here:
87a81b91-87bf-c0bc-7e4f-06dffadcf...@dunslane.net
...which I tried to discuss here:
20220528153741.gk19...@telsasoft.com
| I think there was some confusion about the vcregress "alltaptests"
| target. I said that it's okay to add it and make cirrus use it (and
| that the buildfarm could use it too). Andrew responded that the
| buildfarm wants to run different tests separately. But Andres seems
| to have interpreted that as an objection to the addition of an
| "alltaptests" target, which I think isn't what's intended - it's fine
| if the buildfarm prefers not to use it.

> OTOH, to me the basic advantage is to have *any* CI coverage. We don't need to
> put the bar for the backbranches higher than where we were at ~2 weeks ago.

I agree that something is frequently better than nothing. But it could be worse if it gives the impression that "CI showed that everything was green", when in fact it hadn't run 10% of the tests:
https://www.postgresql.org/message-id/CA%2BhUKGLneD%2Bq%2BE7upHGwn41KGvbxhsKbJ%2BM-y9nvv7_Xjv8Qog%40mail.gmail.com

> I'd have no problem renaming main/isolation to isolation/isolation and
> main/regress to pg_regress/regress or such.

+1

--
Justin
Re: ci: reduce macos test concurrency
Hi,

On 2022-09-26 21:02:08 -0700, Andres Freund wrote:
> The x86 mac VMs from cirrus-ci claim to have 12 CPUs, but when working on
> initdb caching for tests I noticed that using all those CPUs for tests hurts
> the test times noticeably.
>
> See [1] (note that the overall time is influenced by different degrees of
> cache hit ratios):
>
> concurrency    test time:
> 4              05:58
> 6              05:09
> 8              04:58
> 10             05:58
> 12 (current)   06:58
>
> There's a fair bit of run-to-run variance, but the rough shape of these
> looks repeatable.
>
> I suspect the VMs might be overcommitted a fair bit - or macos just scales
> poorly. Cirrus-ci apparently is switching to M1 based macs, which could be
> related. It'd be good for us to do that switch, as it'd give us ARM coverage
> for CI / cfbot. See also [3].
>
> In 15 (and thus autoconf) the timings differ a bit less [2]:
>
> concurrency    test time:
> 4              06:54
> 6              05:43
> 8              06:09
> 10             06:01
> 12 (current)   06:38
>
> Looks like changing TEST_JOBS=6 or 8 would be a good idea.

Set it to 8 now.

Greetings,

Andres Freund
Re: CI and test improvements
Hi, On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote: > From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001 > From: Justin Pryzby > Date: Sun, 3 Apr 2022 00:10:20 -0500 > Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats > > Since v4.0, ccache enables zstd compression by default, saving roughly > 2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable > ccache compression, allowing cirrus to gzip the uncompressed data > (better than ccache's default of zstd-1). I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's default IIRC). It'd be good if we could increase cache utilization. > From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby > Date: Thu, 14 Apr 2022 06:27:07 -0500 > Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and > freebsd > > windows is slower than freebsd and mac, so it's okay to enable options which > will slow them down some. Also, the cirrusci mac instances always have lot of > cores available. > See: > https://www.postgresql.org/message-id/20211217193159.pwrelhiyx7kev...@alap3.anarazel.de > https://www.postgresql.org/message-id/20211213211223.vkgg3wwiss2tragj%40alap3.anarazel.de > https://www.postgresql.org/message-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA%40mail.gmail.com > https://www.postgresql.org/message-id/20220325000933.vgazz7pjk2ytj...@alap3.anarazel.de > > ci-os-only: freebsd, macos > --- > .cirrus.yml | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/.cirrus.yml b/.cirrus.yml > index 183e8746ce6..4ad20892eeb 100644 > --- a/.cirrus.yml > +++ b/.cirrus.yml > @@ -113,7 +113,9 @@ task: > \ > CC="ccache cc" \ > CXX="ccache c++" \ > -CFLAGS="-Og -ggdb" > +CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES > -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST" \ > +CXXFLAGS="-Og -ggdb -march=native -mtune=native" \ > +CFLAGS="-Og -ggdb -march=native -mtune=native" What's reason for -march=native -mtune=native here? > EOF >build_script: | > su postgres -c "ccache --zero-stats" > @@ -336,8 +338,8 @@ task: >CC="ccache cc" \ >CXX="ccache c++" \ >CLANG="ccache ${brewpath}/llvm/bin/ccache" \ > - CFLAGS="-Og -ggdb" \ > - CXXFLAGS="-Og -ggdb" \ > + CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ > + CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ >\ >LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \ >PYTHON=python3 I'd also use CPPFLAGS here, given you'd used it above... I'm planning to commit an updated version of this change soon, without the -march=native -mtune=native bit, unless somebody protests... Greetings, Andres Freund
Re: CI and test improvements
On Sat, Oct 01, 2022 at 05:45:01PM -0700, Andres Freund wrote: > Hi, > > On 2022-09-10 15:05:42 -0500, Justin Pryzby wrote: > > From 4ed5eb427de4508a4c3422e60891b45c8512814a Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby > > Date: Sun, 3 Apr 2022 00:10:20 -0500 > > Subject: [PATCH 03/23] cirrus/ccache: disable compression and show stats > > > > Since v4.0, ccache enables zstd compression by default, saving roughly > > 2x-3x. But, cirrus caches are compressed as tar.gz, so we could disable > > ccache compression, allowing cirrus to gzip the uncompressed data > > (better than ccache's default of zstd-1). > > I wonder whether we could instead change CCACHE_COMPRESSLEVEL (maybe 3, zstd's > default IIRC). It'd be good if we could increase cache utilization. I considered that (and I think that's what I wrote initially). I figured that if cirrus is going to use gzip-6 (tar.gz) in any case, we might as well disable compression. Then, all the tasks are also doing the same thing (half the tasks have ccache before 4.0). > > From 0bd5f51b8c143ed87a867987309d66b8554b1fd6 Mon Sep 17 00:00:00 2001 > > From: Justin Pryzby > > Date: Thu, 14 Apr 2022 06:27:07 -0500 > > Subject: [PATCH 05/23] cirrus: enable various runtime checks on macos and > > freebsd > > > > windows is slower than freebsd and mac, so it's okay to enable options which > > will slow them down some. Also, the cirrusci mac instances always have lot > > of > > cores available. > > > See: > > https://www.postgresql.org/message-id/20211217193159.pwrelhiyx7kev...@alap3.anarazel.de > > https://www.postgresql.org/message-id/20211213211223.vkgg3wwiss2tragj%40alap3.anarazel.de > > https://www.postgresql.org/message-id/CAH2-WzmevBhKNEtqX3N-Tkb0gVBHH62C0KfeTxXzqYES_PiFiA%40mail.gmail.com > > https://www.postgresql.org/message-id/20220325000933.vgazz7pjk2ytj...@alap3.anarazel.de > > > > ci-os-only: freebsd, macos > > --- > > .cirrus.yml | 8 +--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/.cirrus.yml b/.cirrus.yml > > index 183e8746ce6..4ad20892eeb 100644 > > --- a/.cirrus.yml > > +++ b/.cirrus.yml > > @@ -113,7 +113,9 @@ task: > > \ > > CC="ccache cc" \ > > CXX="ccache c++" \ > > -CFLAGS="-Og -ggdb" > > +CPPFLAGS="-DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES > > -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST" \ > > +CXXFLAGS="-Og -ggdb -march=native -mtune=native" \ > > +CFLAGS="-Og -ggdb -march=native -mtune=native" > > What's reason for -march=native -mtune=native here? No particular reason, and my initial patch didn't have it. I suppose I added it to test its effect and never got rid of it. > > EOF > >build_script: | > > su postgres -c "ccache --zero-stats" > > @@ -336,8 +338,8 @@ task: > >CC="ccache cc" \ > >CXX="ccache c++" \ > >CLANG="ccache ${brewpath}/llvm/bin/ccache" \ > > - CFLAGS="-Og -ggdb" \ > > - CXXFLAGS="-Og -ggdb" \ > > + CFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ > > + CXXFLAGS="-Og -ggdb -DRANDOMIZE_ALLOCATED_MEMORY" \ > >\ > >LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \ > >PYTHON=python3 > > I'd also use CPPFLAGS here, given you'd used it above... > > I'm planning to commit an updated version of this change soon, without the > -march=native -mtune=native bit, unless somebody protests... One other thing is that your -m32 changes caused the linux/meson task to take an additional 3+ minutes (total ~8). That's no issue, except that the Warnings task depends on the linux/mason task, and itself can take up to 15 minutes. So those two potentially take as long as the windows task. 
I suggested that CompileWarnings could instead "Depend on: Freebsd", which currently takes 6-7min (and could take 4-5min if given more CPUs). -- Justin
Re: TAP output format in pg_regress
Hi, On 2022-09-01 14:21:18 +0200, Daniel Gustafsson wrote: > Attached is a v8 which fixes a compiler warning detected by the CFBot. cfbot at the moment does show a warning. A bit surprised to see this warning enabled by default in gcc, but it seems correct here: [20:57:02.892] make -s -j${BUILD_JOBS} clean [20:57:03.326] time make -s -j${BUILD_JOBS} world-bin [20:57:12.882] pg_regress.c: In function ‘bail’: [20:57:12.882] pg_regress.c:219:2: error: function ‘bail’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] [20:57:12.882] 219 | vsnprintf(tmp, sizeof(tmp), fmt, ap); [20:57:12.882] | ^ Now that meson is merged, it'd be worthwhile to change the invocation for pg_regress in the toplevel meson.build (search for "Test Generation"). Greetings, Andres Freund
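For anyone hitting the same warning elsewhere: gcc is asking for the variadic wrapper to carry a printf-format attribute so that format strings can be type-checked against their arguments at each call site (PostgreSQL wraps this as pg_attribute_printf()). A hedged, standalone illustration of the shape of such a fix -- the actual change in the patch may differ:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* The attribute says: argument 1 is a printf-style format string,
 * consumed by the varargs starting at argument 2.  This silences
 * -Wsuggest-attribute=format and enables format/argument checking. */
static void bail(const char *fmt,...) __attribute__((format(printf, 1, 2)));

static void
bail(const char *fmt,...)
{
	char		tmp[4096];
	va_list		ap;

	va_start(ap, fmt);
	vsnprintf(tmp, sizeof(tmp), fmt, ap);
	va_end(ap);

	fprintf(stderr, "Bail out! %s\n", tmp);
	exit(2);
}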
Re: Tightening behaviour for non-immutable behaviour in immutable functions
Hi, On 2022-06-16 12:04:12 -0400, Greg Stark wrote: > Of course this patch is still very WIP. Only one or the other function > makes sense to keep. And I'm not opposed to having a GUC to > enable/disable the enforcement or warnings. And the code itself needs > to be cleaned up with parts of it moving to guc.c and/or namespace.c. This currently obviously doesn't pass tests - are you planning to work on this further? As is I'm not really clear what the CF entry is for. Given the current state it doesn't look like it's actually looking for review? Greetings, Andres Freund
Re: Eliminating SPI from RI triggers - take 2
Hi,

On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
> So, here's a final revision for today. Sorry for the noise.

This appears to fail on 32bit systems. Seems the new test is indeed worthwhile...

https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406

[19:12:24.452] Summary of Failures:
[19:12:24.452]
[19:12:24.452]  2/243 postgresql:main / main/regress                     FAIL  45.08s (exit status 1)
[19:12:24.452]  4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade  ERROR 71.96s
[19:12:24.452] 32/243 postgresql:recovery / recovery/027_stream_regress  ERROR 45.84s

Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems not to have succeeded in capturing the test files of the 32bit build (and perhaps broke it for 64bit builds as well?), so I can't see the regression.diffs contents.

[19:12:24.387] alter_table ... FAILED 4546 ms
...
[19:12:24.387]
[19:12:24.387] 1 of 211 tests failed.
[19:12:24.387]
[19:12:24.387] ...

Greetings,

Andres Freund
Re: Eliminating SPI from RI triggers - take 2
Hi,

On 2022-10-01 18:21:15 -0700, Andres Freund wrote:
> On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
> > So, here's a final revision for today. Sorry for the noise.
>
> This appears to fail on 32bit systems. Seems the new test is indeed
> worthwhile...
>
> https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
>
> [19:12:24.452] Summary of Failures:
> [19:12:24.452]
> [19:12:24.452]  2/243 postgresql:main / main/regress FAIL 45.08s (exit status 1)
> [19:12:24.452]  4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade ERROR 71.96s
> [19:12:24.452] 32/243 postgresql:recovery / recovery/027_stream_regress ERROR 45.84s
>
> Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems not to have
> succeeded in capturing the test files of the 32bit build (and perhaps broke it
> for 64bit builds as well?), so I can't see the regression.diffs contents.

Oh, that appears to have been an issue on the CI side (*), while uploading the logs. The previous run did catch the error:

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out	2022-09-30 15:05:49.930613669 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out	2022-09-30 15:11:21.050383258 +0000
@@ -672,6 +672,8 @@
 ALTER TABLE FKTABLE ADD FOREIGN KEY(ftest1) references pktable;
 -- Check it actually works
 INSERT INTO FKTABLE VALUES(42);-- should succeed
+ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
+DETAIL: Key (ftest1)=(42) is not present in table "pktable".
 INSERT INTO FKTABLE VALUES(43);-- should fail
 ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
 DETAIL: Key (ftest1)=(43) is not present in table "pktable".

Greetings,

Andres Freund

(*) Error from upload stream: rpc error: code = Unknown desc =
Re: CI and test improvements
Hi, On 2022-10-01 19:58:01 -0500, Justin Pryzby wrote: > One other thing is that your -m32 changes caused the linux/meson task to > take an additional 3+ minutes (total ~8). That's no issue, except that > the Warnings task depends on the linux/mason task, and itself can take > up to 15 minutes. > So those two potentially take as long as the windows task. > I suggested that CompileWarnings could instead "Depend on: Freebsd", > which currently takes 6-7min (and could take 4-5min if given more CPUs). I am wondering if we should instead introduce a new "quickcheck" task that just compiles and runs maybe one test and have *all* other tests depend on that. Wasting a precious available windows instance to just fail to build or immediately fail during tests doesn't really make sense. Greetings, Andres Freund
Re: Documentation building fails on HTTPS redirect (again)
Hi,

On 2022-09-30 11:35:36 +0200, Daniel Gustafsson wrote:
> Installing the stylesheets locally as we document solves the issue of course,
> but maybe it's time to move to using --nonet as we discussed in [0] and require
> the stylesheets locally? It's a shame that casual contributions require a big
> investment in installation, but it seems hard to get around.

docbook-xml and docbook-xsl aren't that big (adding 8MB to a minimal debian install). However, a) we document installing fop as well, even though it's not needed for the html docs build, and b) the dependencies recommended by the debian packages increase the size a lot. Just using our documented line ends up with 550MB.

Perhaps separating out fop and using --no-install-recommends (and other similar flags) makes it less of an issue?

We probably should work to deliver a more usable error than what just using --nonet gives you...

Greetings,

Andres Freund
Re: proposal: possibility to read dumped table's name from file
Hi, On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote: > Correct, fixed in the attached. Updated patch adding meson compatibility attached. Greetings, Andres Freund >From 5d3ba4d2d6567626ccc0019208ea4c0ea91ac866 Mon Sep 17 00:00:00 2001 From: Daniel Gustafsson Date: Wed, 7 Sep 2022 15:22:05 +0200 Subject: [PATCH v5] Add include/exclude filtering via file in pg_dump Author: Pavel Stehule Discussion: https://postgr.es/m/CAFj8pRB10wvW0CC9Xq=1XDs=zcqxer3cblcnza+qix4cuh-...@mail.gmail.com --- src/bin/pg_dump/.gitignore| 4 + src/bin/pg_dump/Makefile | 17 +++- src/bin/pg_dump/filter.h | 44 ++ src/bin/pg_dump/filterparse.y | 64 ++ src/bin/pg_dump/filterscan.l | 139 ++ src/bin/pg_dump/meson.build | 18 src/bin/pg_dump/pg_backup_utils.c | 33 +++ src/bin/pg_dump/pg_backup_utils.h | 1 + src/bin/pg_dump/pg_dump.c | 56 src/bin/pg_dump/pg_dumpall.c | 65 +- src/bin/pg_dump/pg_restore.c | 58 + doc/src/sgml/ref/pg_dump.sgml | 88 +++ doc/src/sgml/ref/pg_dumpall.sgml | 22 + doc/src/sgml/ref/pg_restore.sgml | 25 ++ src/tools/msvc/Mkvcbuild.pm | 4 +- 15 files changed, 634 insertions(+), 4 deletions(-) create mode 100644 src/bin/pg_dump/filter.h create mode 100644 src/bin/pg_dump/filterparse.y create mode 100644 src/bin/pg_dump/filterscan.l diff --git a/src/bin/pg_dump/.gitignore b/src/bin/pg_dump/.gitignore index e6d78127793..11f2d68bea0 100644 --- a/src/bin/pg_dump/.gitignore +++ b/src/bin/pg_dump/.gitignore @@ -2,4 +2,8 @@ /pg_dumpall /pg_restore +# Local generated source files +/filterparse.c +/filterscan.c + /tmp_check/ diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile index 9dc5a784dd2..e3befdc9b1f 100644 --- a/src/bin/pg_dump/Makefile +++ b/src/bin/pg_dump/Makefile @@ -26,6 +26,8 @@ OBJS = \ $(WIN32RES) \ compress_io.o \ dumputils.o \ + filterparse.o \ + filterscan.o \ parallel.o \ pg_backup_archiver.o \ pg_backup_custom.o \ @@ -37,14 +39,23 @@ OBJS = \ all: pg_dump pg_restore pg_dumpall +# See notes in src/backend/parser/Makefile about the following two rules +filterparse.h: filterparse.c + touch $@ + +filterparse.c: BISONFLAGS += -d + +# Force these dependencies to be known even without dependency info built: +filterparse.o filterscan.o: filterparse.h + pg_dump: pg_dump.o common.o pg_dump_sort.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils $(CC) $(CFLAGS) pg_dump.o common.o pg_dump_sort.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) pg_restore: pg_restore.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils $(CC) $(CFLAGS) pg_restore.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) -pg_dumpall: pg_dumpall.o dumputils.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils - $(CC) $(CFLAGS) pg_dumpall.o dumputils.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) +pg_dumpall: pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils + $(CC) $(CFLAGS) pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) install: all installdirs $(INSTALL_PROGRAM) pg_dump$(X) '$(DESTDIR)$(bindir)'/pg_dump$(X) @@ -63,6 +74,8 @@ installcheck: uninstall: rm -f $(addprefix '$(DESTDIR)$(bindir)'/, pg_dump$(X) pg_restore$(X) pg_dumpall$(X)) +distprep: filterparse.c filterscan.c + clean distclean maintainer-clean: rm -f pg_dump$(X) pg_restore$(X) pg_dumpall$(X) $(OBJS) pg_dump.o common.o pg_dump_sort.o pg_restore.o pg_dumpall.o rm -rf tmp_check diff --git a/src/bin/pg_dump/filter.h b/src/bin/pg_dump/filter.h new file mode 100644 index 
000..5dff4161f02 --- /dev/null +++ b/src/bin/pg_dump/filter.h @@ -0,0 +1,44 @@ +/*- + * + * filter.h + * Common header file for the parser of filter file + * + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/bin/pg_dump/filter.h + * + *- + */ +#ifndef FILTER_H +#define FILTER_H +#include "c.h" + +/* + * List of objects that can be specified in filter file + */ +typedef enum +{ + FILTER_OBJECT_TYPE_NONE, + FILTER_OBJECT_TYPE_DATA, + FILTER_OBJECT_TYPE_DATABASE, + FILTER_OBJECT_TYPE_FOREIGN_DATA, + FILTER_OBJECT_TYPE_FUNCTION, + FILTER_OBJECT_TYPE_INDEX, + FILTER_OBJECT_TYPE_SCHEMA, + FILTER_OBJECT_TYPE_TABLE, + FILTER_OBJECT_TYPE_TRIGGER, +} FilterObjectType; + +extern FILE *filter_yyin; + +extern int filter_yylex(void); +extern void filter_yyerror(void *priv, const char *msg); +extern void filter_scanner_init(void); +extern void filter_scanner_finish(void); +extern int filter_yyparse(void *pri