Re: pgbench logging broken by time logic changes
Hi Fabien,

I committed the code change without the new TAP tests, because I didn't
want to leave the open item hanging any longer.  As for the test, ...

On Sat, Jul 10, 2021 at 9:36 PM Fabien COELHO wrote:
> >> I hoped we were done here but I realised that your check for 1-3 log
> >> lines will not survive the harsh environment of the build farm.
> >> Adding sleep(2) before the final doLog() confirms that.  I had two
> >> ideas:
>
> I misread your point. You think that it should fail, but it is not
> tried yet. I'm rather optimistic that it should not fail, but I'm okay
> with averting the risk anyway.

... I know it can fail, and your v18 didn't fix that, because...

+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
                                                              ^
                                                              |

... this range can be exceeded.  That's because extra aggregations are
created based on doLog() reading the clock after a transaction is
finished, entirely independently of the -T mechanism deciding when to
stop the benchmark, and potentially many seconds later in adverse
conditions.  As I mentioned, you can see it fail with your own eyes if
you hack the code like so:

        if (agg_interval > 0)
        {
+               /*
+                * XXX: simulate an overloaded raspberry pi swapping to a microsd
+                * card or other random delays as we can expect in the build farm
+                */
+               sleep(3);
                /* log aggregated but not yet reported transactions */
                doLog(thread, state, &aggs, false, 0, 0);
        }

> I stand by this solution which should allow to get some data from the
> field, as v18 attached. If all is green then the TODO could be removed
> later.

I suspect the number of aggregates could be made deterministic, as I
described in an earlier message.  What do you think about doing
something like that first for the next release, before trying to add
assertions about the number of aggregates?  I'm with you on the
importance of testing, but it seems better to start by making the thing
more testable.
Re: Enhanced error message to include hint messages for redundant options error
On Sat, 10 Jul 2021 at 17:03, vignesh C wrote:
>
> > Also, I don't think this function should be marked inline -- using a
> > normal function ought to help make the compiled code smaller.
>
> inline instructs the compiler to attempt to embed the function content
> into the calling code instead of executing an actual call. I think we
> should keep it inline to reduce the function call.

Hmm, I'd say that inline should generally be used sparingly, and only
for small functions that are called very often, to avoid the function
call overhead, and generate a faster and possibly smaller executable.
(Though I think sometimes it can still be faster if the executable is
larger.)

In this case, it's a function that is only called under error
conditions, so it's not commonly called, and we don't care so much
about performance when we're about to throw an error.

Also, if you look at an ereport() such as

    ereport(ERROR,
            (errcode(ERRCODE_SYNTAX_ERROR),
             errmsg("conflicting or redundant options"),
             parser_errposition(pstate, defel->location)));

This is a macro that's actually expanded into 5 separate function calls:

- errstart() / errstart_cold()
- errcode()
- errmsg()
- parser_errposition()
- errfinish()

so it's a non-trivial amount of code. Whereas, if it's not inlined, it
becomes just one function call at each call-site, making for smaller,
faster code in the typical case where an error is not being raised.

Of course, it's possible the compiler might still decide to inline the
function, if it thinks that's preferable. In some cases, we explicitly
mark this type of function with pg_noinline, to avoid that, and reduce
code bloat where it's used in lots of small, fast functions (see, for
example, float_overflow_error()).

In general though, I think inline and noinline should be reserved for
special cases where they give a clear, measurable benefit, and that in
general it's better to not mark the function and just let the compiler
decide.

Regards,
Dean
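For reference, the float_overflow_error() helper mentioned above is
essentially the following (paraphrased from src/backend/utils/adt/float.c,
so treat the exact form as illustrative): a deliberately non-inlined
function that collapses the multi-call ereport() sequence into a single
call instruction at each call site.

    /*
     * Non-inlined error helper: callers pay only one function call, and
     * the ereport() expansion lives in exactly one place in the binary.
     */
    pg_noinline void
    float_overflow_error(void)
    {
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("value out of range: overflow")));
    }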
Re: Enhanced error message to include hint messages for redundant options error
On Sat, 10 Jul 2021 at 18:09, vignesh C wrote:
>
> I'm planning to handle conflicting errors separately after this
> current work is done; once the patch is changed to have just the valid
> scenarios (removing the scenarios you have pointed out) the existing
> function can work as is without any changes. Thoughts?

Ah OK, that might be reasonable. Perhaps, then, errorDuplicateDefElem()
and errorConflictingDefElem() would be better than what I originally
suggested.

BTW, another case I spotted was this:

copy (select 1) to stdout csv csv header;
ERROR:  option "format" specified more than once
LINE 1: copy (select 1) to stdout csv csv header;
                                      ^

which isn't good, because there is no option called "format".

Regards,
Dean
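For illustration, such a helper could be as small as the following
sketch. The name comes from the suggestion above, but the body is an
assumption on my part, not code from the patch:

    /*
     * Hypothetical shared helper for duplicated options, so every call
     * site raises the same message with an error cursor.  The real
     * signature and wording would be decided by the patch.
     */
    static void
    errorDuplicateDefElem(ParseState *pstate, DefElem *defel)
    {
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("option \"%s\" specified more than once", defel->defname),
                 parser_errposition(pstate, defel->location)));
    }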
Re: psql - factor out echo code
On Sat, Jul 10, 2021 at 10:25 PM Fabien COELHO wrote:
>
> Hello Vignesh,
>
> > I am changing the status to "Needs review" as the review is not
> > completed for this patch and also there are some tests failing, that
> > need to be fixed:
> > test test_extdepend ... FAILED 50 ms
>
> Indeed,
>
> Attached v4 simplifies the format and fixes this one.
> I ran check-world, this time.

Thanks for posting an updated patch, the tests are passing now. I have
changed the status back to Ready For Committer.

Regards,
Vignesh
Re: pgbench logging broken by time logic changes
Hello Thomas,

> I committed the code change without the new TAP tests, because I
> didn't want to leave the open item hanging any longer.

Ok. Good.

> As for the test, ... [...]

Argh, so there are no tests that would have caught the regressions :-(

> ... I know it can fail, and your v18 didn't fix that, because...
>
> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
>
> ... this range can be exceeded.

Indeed. I meant to move that one into the TODO section as well, not
just the previous call, so that all time-sensitive tests are fully
ignored but reported, which would be enough for me.

> I suspect the number of aggregates could be made deterministic, as I
> described in an earlier message. What do you think about doing
> something like that first for the next release, before trying to add
> assertions about the number of aggregates?

I think that the last time I did something to get more deterministic
results in pgbench, which involved a few lines of hocus-pocus in
pgbench, the patch got rejected :-) An "ignored" test looked like a
good compromise to check how things are going in the farm and to be
able to check for more non-regressions when developing pgbench,
without introducing behavioral changes.

> I'm with you on the importance of testing, but it seems better to
> start by making the thing more testable.

I'm used to my test patches being rejected, including those modifying
pgbench behavior to make it more testable. Am I mad enough to retry?
Maybe, maybe not.

Attached is the fully "ignored" version of the time features test as a
patch.

-- 
Fabien.

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3aa9d5d753..f2a1c861b2 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -8,6 +8,7 @@ use PostgresNode;
 use TestLib;
 use Test::More;
 use Config;
+use Time::HiRes qw(time);
 
 # start a pgbench specific server
 my $node = get_new_node('main');
@@ -54,12 +55,14 @@ sub pgbench
 
 	push @cmd, @args;
 
+	my $start = time();
 	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
+	my $stop = time();
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
 
-	return;
+	return $stop - $start;
 }
 
 # tablespace for testing, because partitioned tables cannot use pg_default
@@ -1187,7 +1190,7 @@ sub check_pgbench_logs
 	# $prefix is simple enough, thus does not need escaping
 	my @logs = list_files($dir, qr{^$prefix\..*$});
 
-	ok(@logs == $nb, "number of log files");
+	ok(@logs == $nb, "number of log files (@logs)");
 	ok(grep(/\/$prefix\.\d+(\.\d+)?$/, @logs) == $nb, "file name format");
 
 	my $log_number = 0;
@@ -1219,7 +1222,58 @@ sub check_pgbench_logs
 
 my $bdir = $node->basedir;
 
-# Run with sampling rate, 2 clients with 50 transactions each.
+TODO: {
+	#
+	# Test time-sensitive features on a light read-only transaction
+	#
+	local $TODO = "possibly unreliable on slow hosts or unlucky runs";
+
+	# Run with sampling rate, 2 clients with 50 transactions each.
+	#
+	# -T: bench duration, 2 seconds to exercise progress & logs
+	# -P: progress report
+	# --aggregate-interval: periodic aggregated logs
+	# --rate: schedule load
+	# --latency-limit: max delay, not deeply exercice
+	#
+	# note: the --rate behavior is probabilistic in nature.
+	# note: --progress-timestamp is not tested.
+	my $delay = pgbench(
+		'-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
+		  . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
+		  . ' -c 3 -r',
+		0,
+		[   qr{type: multiple},
+			qr{clients: 3},
+			qr{threads: $nthreads},
+			qr{duration: 2 s},
+			qr{script 1: .* select only},
+			qr{script 2: .* select only},
+			qr{statement latencies in milliseconds},
+			qr{FROM pgbench_accounts} ],
+		[ qr{vacuum}, qr{progress: 1\b} ],
+		'pgbench progress', undef,
+		"--log-prefix=$bdir/001_pgbench_log_1");
+
+	# The rate may results in an unlucky schedule which triggers
+	# an early exit, hence the loose bound.
+
+	# also, the delay may totally fail on very slow or overloaded hosts,
+	# valgrind runs...
+
+	ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds");
+
+	# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+	# only 1 or as many as 3 progress reports per thread.
+	# aggregate log format is:
+	# unix_epoch_time #tx sum sum2 min max [sum sum2 min max [skipped]]
+	# first series about latency; second about lag (--rate) ;
+	# skipped only if --latency-limit is set.
+	check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+		qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
+}
+
+# with sampling rate, 2 clients with 50 tx each
 pgbench(
 	"-n -S -t 50 -c 2 --log --sampling-rate=0.5",
 	0,
 	[ qr{select only}, qr{processed: 100/100} ],
 	[qr{^$}],
Re: Enhanced error message to include hint messages for redundant options error
On Sun, Jul 11, 2021 at 2:57 PM Dean Rasheed wrote:
>
> On Sat, 10 Jul 2021 at 17:03, vignesh C wrote:
> >
> > > Also, I don't think this function should be marked inline -- using a
> > > normal function ought to help make the compiled code smaller.
> >
> > inline instructs the compiler to attempt to embed the function content
> > into the calling code instead of executing an actual call. I think we
> > should keep it inline to reduce the function call.
>
> Hmm, I'd say that inline should generally be used sparingly, and only
> for small functions that are called very often, to avoid the function
> call overhead, and generate a faster and possibly smaller executable.
> (Though I think sometimes it can still be faster if the executable is
> larger.)
>
> In this case, it's a function that is only called under error
> conditions, so it's not commonly called, and we don't care so much
> about performance when we're about to throw an error.
>
> Also, if you look at an ereport() such as
>
>     ereport(ERROR,
>             (errcode(ERRCODE_SYNTAX_ERROR),
>              errmsg("conflicting or redundant options"),
>              parser_errposition(pstate, defel->location)));
>
> This is a macro that's actually expanded into 5 separate function calls:
>
> - errstart() / errstart_cold()
> - errcode()
> - errmsg()
> - parser_errposition()
> - errfinish()
>
> so it's a non-trivial amount of code. Whereas, if it's not inlined, it
> becomes just one function call at each call-site, making for smaller,
> faster code in the typical case where an error is not being raised.
>
> Of course, it's possible the compiler might still decide to inline the
> function, if it thinks that's preferable. In some cases, we explicitly
> mark this type of function with pg_noinline, to avoid that, and reduce
> code bloat where it's used in lots of small, fast functions (see, for
> example, float_overflow_error()).
>
> In general though, I think inline and noinline should be reserved for
> special cases where they give a clear, measurable benefit, and that in
> general it's better to not mark the function and just let the compiler
> decide.

Ok, that makes sense. As this flow is mainly in the error path it is
ok. I will change it.

Regards,
Vignesh
Re: Slow standby snapshot
Hello, Kirill.

> Also, maybe it is better to reduce the invasiveness by using a simpler
> approach. For example, use the first bit to mark the xid as valid and
> the last 7 bits (128 values) as an optimistic offset to the next valid
> xid (jumping by 127 steps in the worst scenario).
> What do you think?

I have tried such an approach, but it looks like it is not effective,
probably because of CPU caching issues.

I have looked again at your patch, and it seems like it has a lot of
issues at the moment:

* error in KnownAssignedXidsGetOldestXmin: `i` is uninitialized and the
  logic is wrong
* error in the compressing function
  (```KnownAssignedXidsValidDLL[compress_index].prv = prv;``` -- `prv`
  is never updated)
* probably other errors?
* compilation warnings
* the logic around `KAX_DLL_ENTRY_INVALID` looks a little complex
* variable/method placement is bad (see `KAX_E_INVALID` and others)
* the documentation about KnownAssignedXidsValid needs updating, see
  ```To keep individual deletions cheap, we need to allow gaps in the
  array``` in procarray.c
* formatting is broken

Do you have plans to update it? If not, I could try to rewrite it.

Also, what about adding the patch to the commitfest?

Thanks,
Michail.
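For what it's worth, the bit-packing idea quoted at the top could look
something like this standalone sketch. All names here are invented for
illustration; this is not code from either patch:

    /* One byte per xid slot: top bit = "valid", low 7 bits = optimistic
     * distance to the next valid entry, so a scan can jump up to 127
     * slots at a time (worst case: jump 127 and keep scanning). */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define KAX_VALID       0x80
    #define KAX_OFFSET_MASK 0x7F

    static inline bool
    kax_is_valid(uint8_t e)
    {
        return (e & KAX_VALID) != 0;
    }

    static inline uint8_t
    kax_make_entry(bool valid, unsigned next_offset)
    {
        if (next_offset > KAX_OFFSET_MASK)
            next_offset = KAX_OFFSET_MASK;
        return (uint8_t) ((valid ? KAX_VALID : 0) | next_offset);
    }

    int
    main(void)
    {
        uint8_t e = kax_make_entry(false, 42);

        printf("valid=%d skip=%u\n", kax_is_valid(e), e & KAX_OFFSET_MASK);
        return 0;
    }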
Re: enable_resultcache confusion
On Mon, Jul 12, 2021 at 02:56:58AM +1200, David Rowley wrote:
> On Sat, 10 Jul 2021 at 07:30, Robert Haas wrote:
> >
> > On Fri, Jul 9, 2021 at 11:35 AM David Rowley wrote:
> > > I really like that name.
> > >
> > > I'll wait to see if anyone else wants to voice their opinion before I
> > > do any renaming work.
> >
> > I like it, too.
>
> Great. I've attached my first draft patch to do the renaming.

In REL_14, maybe you'd also want to update the release notes reference:

|This is useful if only a small percentage of rows is checked on
|the inner side and is controlled by <xref linkend="guc-enable-resultcache"/>.

-- 
Justin
Re: pg_stats and range statistics
Hello,

This should have been added with [1]. Excerpt from the documentation:

"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must be
extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.

Attached is v2 of this patch with some cosmetic changes. Renamed the
columns a bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac,
range_bounds_histogram -> range_bounds_histograms)

One question: We do have the option of representing the histogram of
lower bounds separately from the histogram of upper bounds, as two
separate view columns. Don't know if there is much utility though, and
there is a fair bit of added complexity: see below. Thoughts?

My attempts via SQL (unnest -> lower|upper -> array_agg) were futile,
given unnest does not play nice with anyarray. For instance:

select unnest(stavalues1) from pg_statistic;
ERROR:  cannot determine element type of "anyarray" argument

Maybe the only option is to write a UDF
pg_get_{lower|upper}_bounds_histogram which can do something similar to
what calc_hist_selectivity does:

	/*
	 * Convert histogram of ranges into histograms of its lower and upper
	 * bounds.
	 */
	nhist = hslot.nvalues;
	hist_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
	hist_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
	for (i = 0; i < nhist; i++)
	{
		bool		empty;

		range_deserialize(rng_typcache, DatumGetRangeTypeP(hslot.values[i]),
						  &hist_lower[i], &hist_upper[i], &empty);
		/* The histogram should not contain any empty ranges */
		if (empty)
			elog(ERROR, "bounds histogram contains an empty range");
	}

This is looking good and ready.

[1] https://github.com/postgres/postgres/commit/918eee0c497c88260a2e107318843c9b1947bc6f
[2] https://www.postgresql.org/docs/devel/view-pg-stats.html

Regards,
Soumyadeep (VMware)

From 1f2ed0911eb3f7a53b16047cefb499692daf5ef6 Mon Sep 17 00:00:00 2001
From: Egor Rogov
Date: Sun, 11 Jul 2021 11:09:45 -0700
Subject: [PATCH v2 1/1] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when
these slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml           | 32
 src/backend/catalog/system_views.sql | 23 +++++++++++++++++++-
 src/include/catalog/pg_statistic.h   |  3 +++
 src/test/regress/expected/rules.out  | 26 +++++++++++++++++++++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f517a7d4af..7168ff9a13 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -12877,6 +12877,38 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       non-null elements.  (Null for scalar types.)
+
+      range_length_histogram anyarray
+
+      A histogram of the lengths of non-empty and non-null range values of a
+      range type column.  (Null for non-range types.)
+
+      empty_range_frac float4
+
+      Fraction of column entries whose values are empty ranges.
+      (Null for non-range types.)
+
+      range_bounds_histograms anyarray
+
+      Histograms of lower and upper bounds of non-empty, non-null ranges,
+      combined into a single array of range values.  The lower and upper
+      bounds of each value correspond to the histograms of lower and upper
+      bounds respectively.  (Null for non-range types.)
+
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..d8bc622ad5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS empty_range_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+
Re: row filtering for logical replication
On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
> Hi.
>
> I have been looking at the latest patch set (v16). Below are my review
> comments and some patches.

Peter, thanks for your detailed review. Comments are inline.

> 1. Patch 0001 comment - typo
>
> you can optionally filter rows that does not satisfy a WHERE condition
>
> typo: does/do

Fixed.

> 2. Patch 0001 comment - typo
>
> The WHERE clause should probably contain only columns that are part of
> the primary key or that are covered by REPLICA IDENTITY. Otherwise,
> and DELETEs won't be replicated.
>
> typo: "Otherwise, and DELETEs" ??

Fixed.

> 3. Patch 0001 comment - typo and clarification
>
> If your publication contains partitioned table, the parameter
> publish_via_partition_root determines if it uses the partition row
> filter (if the parameter is false -- default) or the partitioned table
> row filter.
>
> Typo: "contains partitioned table" -> "contains a partitioned table"

Fixed.

> Also, perhaps the text "or the partitioned table row filter." should
> say "or the root partitioned table row filter." to disambiguate the
> case where there are more levels of partitions like A->B->C. e.g. What
> filter does C use?

I agree it can be confusing. BTW, CREATE PUBLICATION does not mention
that the root partitioned table is used. We should improve that
sentence too.

> 4. src/backend/catalog/pg_publication.c - misleading names
>
> -publication_add_relation(Oid pubid, Relation targetrel,
> +publication_add_relation(Oid pubid, PublicationRelationInfo *targetrel,
>                           bool if_not_exists)
>
> Leaving this parameter name as "targetrel" seems a bit misleading now
> in the function code. Maybe this should be called something like "pri"
> which is consistent with other places where you have declared
> PublicationRelationInfo.
>
> Also, consider declaring some local variables so that the patch may
> have less impact on existing code. e.g.
> Oid relid = pri->relid
> Relation *targetrel = relationinfo->relation

Done.

> 5. src/backend/commands/publicationcmds.c - simplify code
>
> - rels = OpenTableList(stmt->tables);
> + if (stmt->tableAction == DEFELEM_DROP)
> +     rels = OpenTableList(stmt->tables, true);
> + else
> +     rels = OpenTableList(stmt->tables, false);
>
> Consider writing that code more simply as just:
>
> rels = OpenTableList(stmt->tables, stmt->tableAction == DEFELEM_DROP);

It is not a common pattern to use an expression as a function argument
in Postgres. I prefer to use a variable with a suggestive name.

> 6. src/backend/commands/publicationcmds.c - bug?
>
> - CloseTableList(rels);
> + CloseTableList(rels, false);
>   }
>
> Is this a potential bug? When you called OpenTableList the 2nd param
> was maybe true/false, so is it correct to be unconditionally false
> here? I am not sure.

Good catch.

> 7. src/backend/commands/publicationcmds.c - OpenTableList function comment.
>
> * Open relations specified by a RangeVar list.
> + * AlterPublicationStmt->tables has a different list element, hence,
> + * is_drop indicates if it has a RangeVar (true) or PublicationTable
> + * (false).
> * The returned tables are locked in ShareUpdateExclusiveLock mode in
> * order to add them to a publication.
>
> I am not sure about this. Should that comment instead say "indicates
> if it has a Relation (true) or PublicationTable (false)"?

Fixed.

> 8. src/backend/commands/publicationcmds.c - OpenTableList
> >8
> For some reason it feels kind of clunky to me for this function to be
> processing the list differently according to the 2nd param. e.g. the
> name "is_drop" seems quite unrelated to the function code, and more to
> do with where it was called from. Sorry, I don't have any better ideas
> for improvement atm.

My suggestion is to rename it to "pub_drop_table".

> 9. src/backend/commands/publicationcmds.c - OpenTableList bug?
> >8
> I felt maybe this is a possible bug here because there seems no code
> explicitly assigning the whereClause = NULL if "is_drop" is true so
> maybe it can have a garbage value which could cause problems later.
> Maybe this is fixed by using palloc0.

Fixed.

> 10. src/backend/commands/publicationcmds.c - CloseTableList function comment
> >8
> Probably the meaning of "is_drop" should be described in this function
> comment.

Done.

> 11. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> signature.
>
> -static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
> +static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Relation rel);
>
> I see that this function signature is modified but I did not see how
> this parameter refactoring is actually related to the RowFilter patch.
> Perhaps I am mistaken, but IIUC this only changes the relid =
> RelationGetRelid(rel); to be done inside this function instead of
> being done outside by the callers.

It is not critical for this patch so I removed it.

> 12. src/backend/replication/pgoutput/pgoutput.c - missing function comme
Re: row filtering for logical replication
On Mon, Jul 5, 2021, at 12:14 AM, Greg Nancarrow wrote:
> I have some review comments on the "Row filter for logical replication" patch:
>
> (1) Suggested update to patch comment:
> (There are some missing words and things which could be better expressed)

I incorporated all your wording suggestions.

> (2) Some inconsistent error message wording:
>
> Currently:
> err = _("cannot use subquery in publication WHERE expression");
>
> Suggest changing it to:
> err = _("subqueries are not allowed in publication WHERE expressions");

The same expression "cannot use subquery in ..." is used in the other
switch cases. If you think this message can be improved, I suggest that
you submit a separate patch to change all such sentences.

> Other examples from the patch:
> err = _("aggregate functions are not allowed in publication WHERE expressions");
> err = _("grouping operations are not allowed in publication WHERE expressions");
> err = _("window functions are not allowed in publication WHERE expressions");
> errmsg("functions are not allowed in publication WHERE expressions"),
> err = _("set-returning functions are not allowed in publication WHERE expressions");

This is a different function. I just followed the same wording from
similar sentences around it.

> (3) The current code still allows arbitrary code execution, e.g. via a
> user-defined operator:

I fixed it in v18.

> Perhaps add the following after the existing shell error-check in make_op():
>
> /* User-defined operators are not allowed in publication WHERE clauses */
> if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
>     opform->oid >= FirstNormalObjectId)
>     ereport(ERROR,
>             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>              errmsg("user-defined operators are not allowed in publication WHERE expressions"),
>              parser_errposition(pstate, location)));

I'm still working on a way to accept built-in functions, but while we
don't have it, let's forbid custom operators too.

> Also, I believe it's also allowing user-defined CASTs (so could add a
> similar check to above in transformTypeCast()).
> Ideally, it would be preferable to validate/check publication WHERE
> expressions in one central place, rather than scattered all over the
> place, but that might be easier said than done.
> You need to update the patch comment accordingly.

I forgot to mention it in the patch I sent a few minutes ago. I'm not
sure we need to mention every error condition (especially one that will
be rarely used).

> (4) src/backend/replication/pgoutput/pgoutput.c
> pgoutput_change()
>
> The 3 added calls to pgoutput_row_filter() are returning from
> pgoutput_change(), if false is returned, but instead they should break
> from the switch, otherwise cleanup code is missed. This is surely a
> bug.

Fixed.

In summary, v18 contains:

* Peter Smith's review
* Greg Nancarrow's review
* cache ExprState
* cache TupleTableSlot
* forbid custom operators
* various fixes

-- 
Euler Taveira
EDB   https://www.enterprisedb.com/
Protect against possible memory corruption (src/backend/access/nbtree/nbtxlog.c)
Hi,

While analyzing a possible use of an uninitialized variable, I found
that *_bt_restore_page* can lead to memory corruption, by not checking
the maximum limit of array items, which is MaxIndexTuplesPerPage. It
can also generate a dangling pointer by incrementing it beyond the
limits it can point to.

While there, I promoted a reduction of scope and adapted the type of
the *len* parameter to match the XLogRecGetBlockData function.

Passes regress check on Windows and check-world on Linux.

regards,
Ranier Vilela

0001-_bt_restore_page-have-issues-can-lead-a-memory-corru.patch
Description: Binary data
Re: Protect against possible memory corruption (src/backend/access/nbtree/nbtxlog.c)
On 11/07/2021 22:51, Ranier Vilela wrote:
> Hi,
>
> While analyzing a possible use of an uninitialized variable, I checked
> that *_bt_restore_page* can lead to memory corruption,
> by not checking the maximum limit of array items which is
> MaxIndexTuplesPerPage.

> +	/* Protect against corrupted recovery file */
> +	nitems = (len / sizeof(IndexTupleData));
> +	if (nitems < 0 || nitems > MaxIndexTuplesPerPage)
> +		elog(PANIC, "_bt_restore_page: cannot restore %d items to page", nitems);
> +

That's not right. You don't get the number of items by dividing like
that. 'len' includes the tuple data as well, not just the
IndexTupleData header.

> @@ -73,12 +79,9 @@ _bt_restore_page(Page page, char *from, int len)
>  		nitems = i;
>
>  	for (i = nitems - 1; i >= 0; i--)
> -	{
>  		if (PageAddItem(page, items[i], itemsizes[i], nitems - i,
>  						false, false) == InvalidOffsetNumber)
>  			elog(PANIC, "_bt_restore_page: cannot add item to page");
> -		from += itemsz;
> -	}
>  }

I agree with this change (except that I would leave the braces in
place). The 'from' that's calculated here is plain wrong; oversight in
commit 7e30c186da. Fortunately it's not used, so it can just be
removed.

- Heikki
Re: proposal - psql - use pager for \watch command
On Sun, Jul 11, 2021 at 1:18 AM vignesh C wrote:
> On Wed, May 12, 2021 at 5:45 PM Pavel Stehule wrote:
> > looks so with your patch psql doesn't work on ms

Here's a fix for Windows. The pqsignal() calls are #ifdef'd out. I
also removed a few lines that were added after commit 3a513067 but
aren't needed anymore after fae65629.

From b6b3b420e10f389ebbf4ae1d21e34b390b62ce67 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 12 Jul 2021 10:34:40 +1200
Subject: [PATCH v8] Add PSQL_WATCH_PAGER for psql's \watch command.

Allow a pager to be used by the \watch command. This works but isn't
very useful with traditional pagers like "less", so use a different
environment variable. The popular open source tool "pspg" (written by
the first author of this patch) knows how to display the output if you
set PSQL_WATCH_PAGER="pspg --stream".

To make \watch react quickly when the user quits the pager or presses
^C, and also to increase the accuracy of its timing and decrease the
rate of useless context switches, change the main loop of the \watch
command to use sigwait() rather than a sleeping/polling loop, on Unix.

Supported on Unix only for now (like pspg).

Author: Pavel Stehule
Author: Thomas Munro
Discussion: https://postgr.es/m/CAFj8pRBfzUUPz-3gN5oAzto9SDuRSq-TQPfXU_P6h0L7hO%2BEhg%40mail.gmail.com
---
 doc/src/sgml/ref/psql-ref.sgml |  28 +++
 src/bin/psql/command.c         | 133 ++---
 src/bin/psql/common.c          |  11 ++-
 src/bin/psql/common.h          |   2 +-
 src/bin/psql/help.c            |   6 +-
 src/bin/psql/startup.c         |  19 +
 6 files changed, 184 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a8dfc41e40..da53306857 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3002,6 +3002,16 @@ lo_import 152801
        (such as more) is used.
+
+       When using the \watch command to execute a query
+       repeatedly, the environment variable PSQL_WATCH_PAGER
+       is used to find the pager program instead, on Unix systems. This is
+       configured separately because it may confuse traditional pagers, but
+       can be used to send output to tools that understand
+       psql's output format (such as
+       pspg --stream).
+
        When the pager option is off, the pager
        program is not used. When the pager option is
@@ -4672,6 +4682,24 @@
 PSQL_EDITOR_LINENUMBER_ARG='--line '
+
+       PSQL_WATCH_PAGER
+
+
+
+       When a query is executed repeatedly with the \watch
+       command, a pager is not used by default. This behavior can be changed
+       by setting PSQL_WATCH_PAGER to a pager command, on Unix
+       systems. The pspg pager (not part of
+       PostgreSQL but available in many open source
+       software distributions) can display the output of
+       \watch if started with the option
+       --stream.
+
+
+
+
+
+
     PSQLRC

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 543401c6d6..cc79da9013 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -13,6 +13,7 @@
 #include <time.h>
 #ifndef WIN32
 #include <sys/stat.h>			/* for stat() */
+#include <sys/time.h>			/* for setitimer() */
 #include <fcntl.h>				/* open() flags */
 #include <unistd.h>				/* for geteuid(), getpid(), stat() */
 #else
@@ -4894,8 +4895,17 @@ do_watch(PQExpBuffer query_buf, double sleep)
 	const char *strftime_fmt;
 	const char *user_title;
 	char	   *title;
+	const char *pagerprog = NULL;
+	FILE	   *pagerpipe = NULL;
 	int			title_len;
 	int			res = 0;
+#ifndef WIN32
+	sigset_t	sigalrm_sigchld_sigint;
+	sigset_t	sigalrm_sigchld;
+	sigset_t	sigint;
+	struct itimerval interval;
+	bool		done = false;
+#endif
 
 	if (!query_buf || query_buf->len <= 0)
 	{
@@ -4903,6 +4913,58 @@ do_watch(PQExpBuffer query_buf, double sleep)
 		return false;
 	}
 
+#ifndef WIN32
+	sigemptyset(&sigalrm_sigchld_sigint);
+	sigaddset(&sigalrm_sigchld_sigint, SIGCHLD);
+	sigaddset(&sigalrm_sigchld_sigint, SIGALRM);
+	sigaddset(&sigalrm_sigchld_sigint, SIGINT);
+
+	sigemptyset(&sigalrm_sigchld);
+	sigaddset(&sigalrm_sigchld, SIGCHLD);
+	sigaddset(&sigalrm_sigchld, SIGALRM);
+
+	sigemptyset(&sigint);
+	sigaddset(&sigint, SIGINT);
+
+	/*
+	 * Block SIGALRM and SIGCHLD before we start the timer and the pager (if
+	 * configured), to avoid races.  sigwait() will receive them.
+	 */
+	sigprocmask(SIG_BLOCK, &sigalrm_sigchld, NULL);
+
+	/*
+	 * Set a timer to interrupt sigwait() so we can run the query at the
+	 * requested intervals.
+	 */
+	interval.it_value.tv_sec = sleep_ms / 1000;
+	interval.it_value.tv_usec = (sleep_ms % 1000) * 1000;
+	interval.it_interval = interval.it_value;
+	if (setitimer(ITIMER_REAL, &interval, NULL) < 0)
+	{
+		pg_log_error("could not set timer: %m");
+		done = true;
+	}
+#endif
+
+	/*
+	 * For \watch, we ignore the size of the result and always use the pager
+
Re: row filtering for logical replication
Hi,

I took a look at this patch, which seems to be in the CF since 2018. I
have only some basic comments and observations at this point:

1) alter_publication.sgml

I think "expression is executed" sounds a bit strange, perhaps
"evaluated" would be better?

2) create_publication.sgml

Why is the patch changing publish_via_partition_root docs? That seems
like a rather unrelated bit.

  The WHERE clause should probably contain only columns that are part
  of the primary key or be covered by REPLICA ...

I'm not sure what exactly this is trying to say. What does "should
probably ..." mean in practice for the users? Does that mean something
bad will happen for other columns, or what? I'm sure this wording will
be quite confusing for users.

It may also be unclear whether the condition is evaluated on the old or
new row, so perhaps add an example illustrating that & a more detailed
comment, or something. E.g. what will happen with

  UPDATE departments SET active = false WHERE active;

3) publication_add_relation

Does this need to build the parse state even for whereClause == NULL?

4) AlterPublicationTables

I wonder if this new reworked code might have issues with subscriptions
containing many tables, but I haven't tried.

5) OpenTableList

I really dislike that the list can have two different node types
(Relation and PublicationTable). In principle we don't actually need
the extra flag, we can simply check the node type directly by IsA() and
act based on that. However, I think it'd be better to just use a single
node type from all places.

I don't see why not to set whereClause every time. I don't think the
extra if saves anything, it's just a bit more complex.

6) CloseTableList

The comment about node types seems pointless, this function has no flag
and the element type does not matter.

7) parse_agg.c

  ... are not allowed in publication WHERE expressions

I think all similar cases use "WHERE conditions" instead.

8) transformExprRecurse

The check at the beginning seems rather awkward / misplaced - it's way
too specific for this location (there are no other p_expr_kind
references in this function). Wouldn't transformFuncCall (or maybe
ParseFuncOrColumn) be a more appropriate place?

Initially I was wondering why not to allow function calls in WHERE
conditions, but I see that was discussed in the past as problematic.
But that reminds me that I don't see any docs describing what
expressions are allowed in WHERE conditions - maybe we should
explicitly list what expressions are allowed?

9) pgoutput.c

I have not reviewed this in detail yet, but there seems to be something
wrong, because `make check-world` fails in subscription/010_truncate.pl
after hitting an assert (backtrace attached) during "START_REPLICATION
SLOT" in get_rel_sync_entry in this code:

    /* Release tuple table slot */
    if (entry->scantuple != NULL)
    {
        ExecDropSingleTupleTableSlot(entry->scantuple);
        entry->scantuple = NULL;
    }

So there seems to be something wrong with how the slot is created.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Program terminated with signal SIGABRT, Aborted.
#0  0x7e8ed2ec29e5 in ?? () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.32-8.fc33.x86_64 libgcc-10.3.1-1.fc33.x86_64
(gdb) bt
#0  0x7e8ed2ec29e5 in ?? () from /lib64/libc.so.6
#1  0x7e8ed2eab8a4 in ?? () from /lib64/libc.so.6
#2  0x00afffe1 in ExceptionalCondition (conditionName=0xb7f8e5 "tupdesc->tdrefcount > 0", errorType=0xb7f8a3 "FailedAssertion", fileName=0xb7f810 "tupdesc.c", lineNumber=386) at assert.c:69
#3  0x004a55cb in DecrTupleDescRefCount (tupdesc=0x7e8ed23b4578) at tupdesc.c:386
#4  0x0073fd84 in ExecDropSingleTupleTableSlot (slot=0x7e8ed23b8910) at execTuples.c:1261
#5  0x7e8ed23f2aed in get_rel_sync_entry (data=0x2875440, relation=0x7e8ed23b86c0) at pgoutput.c:1258
#6  0x7e8ed23f20f8 in pgoutput_truncate (ctx=0x286cc70, txn=0x2888c48, nrelations=1, relations=0x2892dd8, change=0x288b138) at pgoutput.c:890
#7  0x008cf03b in truncate_cb_wrapper (cache=0x286ec80, txn=0x2888c48, nrelations=1, relations=0x2892dd8, change=0x288b138) at logical.c:1083
#8  0x008d9605 in ReorderBufferApplyTruncate (rb=0x286ec80, txn=0x2888c48, nrelations=1, relations=0x2892dd8, change=0x288b138, streaming=false) at reorderbuffer.c:1928
#9  0x008da08c in ReorderBufferProcessTXN (rb=0x286ec80, txn=0x2888c48, commit_lsn=23903752, snapshot_now=0x286f168, command_id=2, streaming=false) at reorderbuffer.c:2314
#10 0x008da845 in ReorderBufferReplay (txn=0x2888c48, rb=0x286ec80, xid=744, commit_lsn=23903752, end_lsn=23903960, commit_time=679358346883121, origin_id=0, origin_lsn=0) at reorderbuffer.c:2619
#11 0x008da8c3 in ReorderBufferCommit (rb=0x286ec80, xid=744, commit_lsn=23903752, end_lsn=23903960
Re: row filtering for logical replication
On Sun, Jul 11, 2021, at 4:39 PM, Euler Taveira wrote:
> with cache (v18)
> ---
>
> mean:            0.63 us
> stddev:          1.07 us
> median:          0.55 us
> min-max:         [0.29 .. 844.87] us
> percentile(99):  1.38 us
> mode:            0.41 us
>
> It represents -57%. It is a really good optimization for just a few
> extra lines of code.

cfbot seems to be unhappy with v18 on some of the hosts. Cirrus/FreeBSD
failed in the test 010_truncate. It also failed in a Cirrus/Linux box.
I failed to reproduce it in my local FreeBSD box. Since it passes
appveyor and Cirrus/macos, it could probably be a transient issue.

$ uname -a
FreeBSD freebsd12 12.2-RELEASE FreeBSD 12.2-RELEASE r366954 GENERIC amd64
$ PROVE_TESTS="t/010_truncate.pl" gmake check
gmake -C ../../../src/backend generated-headers
gmake[1]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend'
gmake -C catalog distprep generated-header-symlinks
gmake[2]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend/catalog'
gmake[2]: Nothing to be done for 'distprep'.
gmake[2]: Nothing to be done for 'generated-header-symlinks'.
gmake[2]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend/catalog'
gmake -C utils distprep generated-header-symlinks
gmake[2]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend/utils'
gmake[2]: Nothing to be done for 'distprep'.
gmake[2]: Nothing to be done for 'generated-header-symlinks'.
gmake[2]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend/utils'
gmake[1]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend'
rm -rf '/home/euler/pglr-row-filter-v17'/tmp_install
/bin/sh ../../../config/install-sh -c -d '/home/euler/pglr-row-filter-v17'/tmp_install/log
gmake -C '../../..' DESTDIR='/home/euler/pglr-row-filter-v17'/tmp_install install >'/home/euler/pglr-row-filter-v17'/tmp_install/log/install.log 2>&1
gmake -j1 checkprep >>'/home/euler/pglr-row-filter-v17'/tmp_install/log/install.log 2>&1
rm -rf '/usr/home/euler/pglr-row-filter-v17/src/test/subscription'/tmp_check
/bin/sh ../../../config/install-sh -c -d '/usr/home/euler/pglr-row-filter-v17/src/test/subscription'/tmp_check
cd . && TESTDIR='/usr/home/euler/pglr-row-filter-v17/src/test/subscription' PATH="/home/euler/pglr-row-filter-v17/tmp_install/home/euler/pgrf18/bin:$PATH" LD_LIBRARY_PATH="/home/euler/pglr-row-filter-v17/tmp_install/home/euler/pgrf18/lib" LD_LIBRARY_PATH_RPATH=1 PGPORT='6' PG_REGRESS='/usr/home/euler/pglr-row-filter-v17/src/test/subscription/../../../src/test/regress/pg_regress' /usr/local/bin/prove -I ../../../src/test/perl/ -I . t/010_truncate.pl
t/010_truncate.pl .. ok
All tests successful.
Files=1, Tests=14, 5 wallclock secs ( 0.02 usr 0.00 sys + 1.09 cusr 0.99 csys = 2.10 CPU)
Result: PASS

-- 
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: Protect against possible memory corruption (src/backend/access/nbtree/nbtxlog.c)
On Sun, Jul 11, 2021 at 7:19 PM Heikki Linnakangas wrote:
> On 11/07/2021 22:51, Ranier Vilela wrote:
> > Hi,
> >
> > While analyzing a possible use of an uninitialized variable, I checked
> > that *_bt_restore_page* can lead to memory corruption,
> > by not checking the maximum limit of array items which is
> > MaxIndexTuplesPerPage.
>
> > +	/* Protect against corrupted recovery file */
> > +	nitems = (len / sizeof(IndexTupleData));
> > +	if (nitems < 0 || nitems > MaxIndexTuplesPerPage)
> > +		elog(PANIC, "_bt_restore_page: cannot restore %d items to page", nitems);
> > +
>
> That's not right. You don't get the number of items by dividing like
> that. 'len' includes the tuple data as well, not just the
> IndexTupleData header.

Thanks for the quick review.

Not totally wrong, though: if it is not possible to know the upper
limit before the loop, it is necessary to do the check inside the loop.

Attached is v1 of the patch.

regards,
Ranier Vilela

v1-0001-_bt_restore_page-have-issues-can-lead-a-memory-co.patch
Description: Binary data
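A sketch of what that in-loop check could look like, following the
structure of the existing unpacking loop in _bt_restore_page() (my
illustration of the approach, not necessarily what the attached v1
does):

    i = 0;
    while (from < end)
    {
        /*
         * Bound the item count while unpacking, instead of trying
         * (incorrectly) to derive it from len up front: each IndexTuple
         * is variable-length.
         */
        if (i >= MaxIndexTuplesPerPage)
            elog(PANIC, "_bt_restore_page: cannot restore more than %d items to page",
                 MaxIndexTuplesPerPage);

        /* Need to copy tuple header due to alignment considerations */
        memcpy(&itupdata, from, sizeof(IndexTupleData));
        itemsz = IndexTupleSize(&itupdata);
        itemsz = MAXALIGN(itemsz);

        items[i] = (IndexTuple) from;
        itemsizes[i] = itemsz;
        i++;
        from += itemsz;
    }
    nitems = i;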
Re: enable_resultcache confusion
On Mon, 12 Jul 2021 at 03:22, Justin Pryzby wrote:
> |This is useful if only a small percentage of rows is checked on
> |the inner side and is controlled by <xref
> |linkend="guc-enable-resultcache"/>.

You might be right there, but I'm not too sure. If I changed that, it
might cause a mention of the rename to be missed in the
changes-since-beta2 notes.

Additionally, I was unsure about touching typedefs.list. In the patch I
changed it, but wasn't too sure if that was the correct thing to do. In
normal circumstances, i.e. writing new code, I'd not touch it.

David
Re: row filtering for logical replication
Hi

Andres complained about the safety of doing general expression
evaluation in pgoutput; that was first in
https://postgr.es/m/20210128022032.eq2qqc6zxkqn5...@alap3.anarazel.de
where he described a possible approach to handle it by restricting
expressions to have limited shape; and later in
http://postgr.es/m/20210331191710.kqbiwe73lur7j...@alap3.anarazel.de

I was just scanning the patch trying to see if some sort of protection
had been added for this, but I couldn't find anything. (Some functions
are under-commented, though.) So, is it there already, and if so what
is it? And if it isn't, then I think it should definitely be put there
in some form.

Thanks

-- 
Álvaro Herrera       Valdivia, Chile — https://www.EnterpriseDB.com/
Re: Cosmic ray hits integerset
Fwiw, yes, it could be a cosmic ray. It could also just be marginally
bad RAM. Bad RAM is notoriously hard to reliably test for. It can be
very sensitive to the exact bit pattern stored in it, the timing of
reads and writes, and other factors. The whole point of the rowhammer
attacks is to push some of those timing factors hard, but the same
failures can happen randomly.

On Wed, 7 Jul 2021 at 08:14, Joe Conway wrote:
>
> On 7/7/21 2:53 AM, Jakub Wartak wrote:
> > Hi, Asking out of pure technical curiosity about "the rhinoceros" -
> > what kind of animal is it? Physical box or VM? How could one get
> > dmidecode(1) / dmesg(1) / mcelog(1) from what's out there (e.g. does
> > it run ECC or not?)
>
> Rhinoceros is just a VM on a simple desktop machine. Nothing fancy.
>
> Joe
>
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development

-- 
greg
Re: row filtering for logical replication
On Mon, Jul 12, 2021 at 9:31 AM Euler Taveira wrote:
>
> cfbot seems to be unhappy with v18 on some of the hosts. Cirrus/FreeBSD
> failed in the test 010_truncate. It also failed in a Cirrus/Linux box.
> I failed to reproduce it in my local FreeBSD box. Since it passes
> appveyor and Cirrus/macos, it could probably be a transient issue.

I don't think it's a transient issue. I also get a test failure in
subscription/010_truncate.pl when I run "make check-world" with the v18
patches applied.

The problem can be avoided with the following change (to match what was
originally in my v17-0005 performance-improvement patch):

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 08c018a300..800bae400b 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1256,8 +1256,8 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 	}
 
 	/* create a tuple table slot for row filter */
-	tupdesc = RelationGetDescr(relation);
 	oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+	tupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
 	entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
 	MemoryContextSwitchTo(oldctx);

This creates a TupleDesc copy in CacheMemoryContext that is not
refcounted, so it side-steps the problem.

At this stage I am not sure why the original v18 patch code doesn't
work correctly for the TupleDesc refcounting here. The TupleDesc
refcount is zero when it's time to dealloc the tuple slot (thus causing
that Assert to fire), yet when the slot was created, the TupleDesc
refcount was incremented, so it seems something else has already
decremented the refcount by the time it comes to deallocate the slot.
Perhaps there's an order-of-cleanup or MemoryContext issue here or some
buggy code somewhere, not sure yet.

Regards,
Greg Nancarrow
Fujitsu Australia
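For what it's worth, a plausible explanation of why the copy is exempt
(my reading, not verified against this failure): CreateTupleDescCopy()
returns a descriptor with tdrefcount = -1, and slot teardown only
releases refcounted descriptors. The check lives in the
ReleaseTupleDesc() macro in src/include/access/tupdesc.h, roughly:

    /* Quoted from memory, so check the source; the shape is: only
     * refcounted descriptors (tdrefcount >= 0) are released. */
    #define ReleaseTupleDesc(tupdesc) \
        do { \
            if ((tupdesc)->tdrefcount >= 0) \
                DecrTupleDescRefCount(tupdesc); \
        } while (0)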
Re: bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.
On Sat, Jul 10, 2021 at 07:09:10PM +0530, Bharath Rupireddy wrote:
> Thanks. The patch looks good to me, except a minor comment - isn't it
> "int2 for these fields" as the fields still exist?
>
> +	/* pageinspect >= 1.10 uses int4 instead of int2 for those fields */

This comment looks fine to me as-is, so applied on HEAD.
--
Michael

signature.asc
Description: PGP signature
Re: Record a Bitmapset of non-pruned partitions
On Sat, 10 Jul 2021 at 03:24, David Rowley wrote:
> The good news is that the code in partitions_are_ordered() became even
> more simple as a result of this change. We can do ordered scan simply
> when !bms_overlap(live_parts, boundinfo->interleaved_parts).

I've spent a bit more time revising the 0002 patch so that we're a bit
more strict about when we mark a partition as interleaved. For example,
if the DEFAULT partition happens to be the only partition, then I'm no
longer classing that as interleaved, as there's nothing for it to be
interleaved with.

This also fixes up the not-so-robust check that I had to check if the
NULL partition allowed other Datums.

David

v3-0001-Track-non-pruned-partitions-in-RelOptInfo.patch
Description: Binary data

v3-0002-Allow-ordered-partition-scans-in-more-cases.patch
Description: Binary data
Re: Diagnostic comment in LogicalIncreaseXminForSlot
On Mon, Jul 5, 2021 at 12:54 PM Masahiko Sawada wrote:
>
> On Fri, May 21, 2021 at 6:00 PM Ashutosh Bapat wrote:
> >
> > It's there in CF. I am fine with PG-15. It will be good to patch the
> > back-branches to have this extra diagnostic information available.
>
> The patch looks good to me.

	{
		slot->candidate_catalog_xmin = xmin;
		slot->candidate_xmin_lsn = current_lsn;
+		elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
+			 LSN_FORMAT_ARGS(current_lsn));
	}
	SpinLockRelease(&slot->mutex);

Today, again looking at this patch, I don't think doing elog inside a
spinlock is a good idea. We can either release the spinlock before it
or use some variable to remember that we need to write such an elog
and do it after releasing the lock. What do you think?

I have noticed that a nearby function
LogicalIncreaseRestartDecodingForSlot() logs similar information after
releasing the spinlock, so it is better to follow the same here as
well.

-- 
With Regards,
Amit Kapila.
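To illustrate the second option in a generic, standalone form (plain
pthreads rather than PostgreSQL's spinlock API; all names here are
invented for the example), the pattern is to capture the state under
the lock and emit the message only after releasing it:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned candidate_xmin = 0;

    static void
    update_candidate(unsigned xmin)
    {
        bool        updated = false;

        pthread_mutex_lock(&lock);
        if (xmin > candidate_xmin)
        {
            candidate_xmin = xmin;
            updated = true;     /* remember that we need to log */
        }
        pthread_mutex_unlock(&lock);

        /* Potentially slow I/O happens outside the critical section. */
        if (updated)
            fprintf(stderr, "got new candidate_xmin %u\n", xmin);
    }

    int
    main(void)
    {
        update_candidate(42);
        update_candidate(41);   /* no log: not an advance */
        return 0;
    }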
Re: Skipping logical replication transactions on subscriber side
On Fri, Jul 9, 2021 at 5:58 AM Masahiko Sawada wrote:
>
> On Thu, Jul 8, 2021 at 6:28 PM Amit Kapila wrote:
> >
> > On Wed, Jul 7, 2021 at 11:47 AM Masahiko Sawada wrote:
> > >
> > > On Tue, Jul 6, 2021 at 6:33 PM Amit Kapila wrote:
> > > >
> > > > >
> > > > > According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
> > > > > parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
> > > > > specify a subset of parameters that can be specified by CREATE
> > > > > SUBSCRIPTION. It makes sense to me for 'disable_on_error' since it
> > > > > can be specified by CREATE SUBSCRIPTION. Whereas SKIP TRANSACTION
> > > > > stuff cannot be done. Are you concerned about adding a syntax to
> > > > > ALTER SUBSCRIPTION?
> > > > >
> > > >
> > > > Both for additional syntax and consistency with disable_on_error.
> > > > Isn't it just the current implementation that Alter only allows
> > > > changing parameters supported by Create? Is there a reason why we
> > > > can't allow Alter to set/change some parameters not supported by
> > > > Create?
> > >
> > > I think there is no reason for that, but looking at ALTER TABLE I
> > > thought there is such a policy.
> > >
> >
> > If we are looking for a precedent then I think we allow setting
> > configuration parameters via Alter Database but not via Create
> > Database. Does that address your concern?
>
> Thank you for the info! But it seems like CREATE DATABASE doesn't
> support SET in the first place. Also interestingly, ALTER SUBSCRIPTION
> supports both ENABLE/DISABLE and SET (enabled = on/off).
>

I think that is redundant, but I am not sure if there is any reason
behind doing so.

> I'm not sure
> from the point of view of consistency with other CREATE, ALTER
> commands, and disable_on_error, but it might be better to avoid adding
> additional syntax.
>

If we can avoid introducing new syntax, that in itself is a good reason
to introduce it as an option.

-- 
With Regards,
Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Fri, Jul 9, 2021 at 9:02 AM Alexey Lesovsky wrote:
>
> On Fri, Jul 9, 2021 at 5:43 AM Masahiko Sawada wrote:
> >
> > > I also would like to clarify: when a conflict is resolved, is the
> > > error record cleared or kept in the view? If it is cleared, the error
> > > counter is required (because we don't want to lose all history of
> > > errors). If it is kept, a flag telling that the error is resolved is
> > > needed (or set xid to NULL). I mean, when the user is watching the
> > > view, he should be able to identify whether the error has already
> > > been resolved or not.
> >
> > With the current patch, once the conflict is resolved by skipping the
> > transaction in question, its entry in the stats view is cleared. As
> > you suggested, if we have the total error counts in that view, it
> > would be good to keep the count and clear other fields such as xid,
> > last_failure, and command etc.
>
> Ok, looks nice. But I am curious how this will work in the case when
> there are two (or more) errors in the same subscription, but different
> relations?
>

We can't proceed unless the first error is resolved, so there shouldn't
be multiple unresolved errors. However, there is an exception to it,
which is during initial table sync, and I think the view should have
separate rows for each table sync.

-- 
With Regards,
Amit Kapila.
Re: [HACKERS] logical decoding of two-phase transactions
On Sun, Jul 11, 2021 at 8:20 PM Amit Kapila wrote:
>
> On Fri, Jul 9, 2021 at 4:43 AM Peter Smith wrote:
> >
> > > The patch looks good to me, I don't have any comments.
> >
> > I tried the v95-0001 patch.
> >
> > - The patch applied cleanly and all build / testing was OK.
> > - The documentation also builds OK.
> > - I checked all v95-0001 / v93-0001 differences and found no problems.
> > - Furthermore, I noted that the v95-0001 patch is passing the cfbot [1].
> >
> > So this patch LGTM.
> >
>
> Thanks, I took another pass over it and made a few changes in docs and
> comments. I am planning to push this next week sometime (by 14th July)
> unless there are more comments from you or someone else. Just to
> summarize, this patch will add support for prepared transactions to
> built-in logical replication. To add support for streaming
> transactions at prepare time into the built-in logical replication, we
> need to do the following things: (a) Modify the output plugin
> (pgoutput) to implement the new two-phase API callbacks, by leveraging
> the extended replication protocol. (b) Modify the replication apply
> worker, to properly handle two-phase transactions by replaying them on
> prepare. (c) Add a new SUBSCRIPTION option "two_phase" to allow users
> to enable two-phase transactions. We enable the two_phase once the
> initial data sync is over. Refer to comments atop worker.c in the
> patch and the commit message to see further details about this patch.
> After this patch, there is a follow-up patch to allow streaming and
> two-phase options together which I feel needs some more review and can
> be committed separately.
>

FYI - I repeated the same verification of the v96-0001 patch as I did
previously for v95-0001:

- The v96 patch applied cleanly and all build / testing was OK.
- The documentation also builds OK.
- I checked the v95-0001 / v96-0001 differences and found no problems.
- Furthermore, I noted that the v96-0001 patch is passing the cfbot.

LGTM.

-- 
Kind Regards,
Peter Smith.
Fujitsu Australia
RE: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
Dear Horiguchi-san,

Thank you for reviewing! I attached a new version. Sorry for the
delayed reply.

> Since we don't allow descriptors with the same name even if they are
> for different connections, I think we can set the current connection
> if any (which is set either by the AT option or a statement-bound one)
> to i->connection silently if i->connection is NULL in
> lookup_descriptor(). What do you think about this?

I tried to implement it. Is it correct?

> connection is "conn1" at the error time. The parser relies on
> output_statement and friends for connection name reset. So the rules
> that don't call the functions need to reset it by themselves.

Oh, I didn't notice that. Fixed. I'm wondering why an output function
is not implemented, like output_describe_statement(), but anyway I put
a connection reset in ecpg.addons.

> Similarly, the following sequence doesn't yield an error, which is
> expected.
>
> > EXEC SQL AT conn1 DECLARE stmt STATEMENT;
> > EXEC SQL AT conn2 EXECUTE stmt INTO ..;
>
> In this case "conn2" set by the AT option is silently overwritten with
> "conn1" by check_declared_list(). I think we should reject the AT
> option (with a different connection) in that case.

Actually, this comes from Oracle's specification. The Pro*C
precompiler overwrites the connection in this situation, hence I
followed that. But I agree this might be confusing, so I added a
warning report. What do you think? Is it still strange?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

v03-0001-fix-deallocate-describe.patch
Description: v03-0001-fix-deallocate-describe.patch

v03-0002-add-test.patch
Description: v03-0002-add-test.patch
Re: More time spending with "delete pending"
On Fri, Jul 09, 2021 at 09:00:00PM +0300, Alexander Lakhin wrote:
> Thank you! I agree with your improvement. I can't remember why I
> injected 'include "port.h"' in genfile.c.
> Today I've rechecked the whole chain of includes and I see that stat()
> is redefined as _pgstat64() in win32_port.h, which includes
> <sys/stat.h>. genfile.c includes "postgres.h" (which includes
> win32_port.h indirectly) and then includes <sys/stat.h> again, but the
> latter include should be ignored due to the "#pragma once" in stat.h.
> So I have no objection to the removal of that include.

Thanks, that matches my impression. There was one comment at the top of
_pgstat64() that was still incorrect. I have spent more time checking
and testing this stuff today, and it looks fine.

One question pending is whether we should increase the timeout used by
WIN32's stat() and open(), but the buildfarm should be able to tell us
more here.
--
Michael

signature.asc
Description: PGP signature
Re: Skipping logical replication transactions on subscriber side
On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila wrote:
> >
> > Ok, looks nice. But I am curious how this will work in the case when
> > there are two (or more) errors in the same subscription, but
> > different relations?
> >
>
> We can't proceed unless the first error is resolved, so there
> shouldn't be multiple unresolved errors.

Ok. I thought multiple errors were possible when many tables are
initialized using parallel workers (with
max_sync_workers_per_subscription > 1).

--
Regards, Alexey
Re: Skipping logical replication transactions on subscriber side
On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky wrote:
>
> On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila wrote:
>>
>> >
>> > Ok, looks nice. But I am curious how this will work in the case when
>> > there are two (or more) errors in the same subscription, but
>> > different relations?
>> >
>>
>> We can't proceed unless the first error is resolved, so there
>> shouldn't be multiple unresolved errors.
>
> Ok. I thought multiple errors were possible when many tables are
> initialized using parallel workers (with
> max_sync_workers_per_subscription > 1).
>

Yeah, that is possible, but it is covered under the second condition I
mentioned, and in such cases I think we should have separate rows for
each tablesync. Is that right, Sawada-san, or do you have something else
in mind?

--
With Regards,
Amit Kapila.
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Wed, 7 Jul 2021 at 20:37, David Rowley wrote:
>
> On Wed, 7 Jul 2021 at 00:06, Greg Nancarrow wrote:
> > I think if you're going to reject this patch, a brief comment should
> > be added to that code to justify why that existing superfluous check
> > is worthwhile.
>
> It seems strange to add a comment to explain why it's there. If we're
> going to the trouble of doing that, then we should just remove it and
> add a very small comment to mention why INT8 sequences don't need to
> be checked.

Any thoughts on this, Greg?

David
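PS: For the archives, the reason INT8 sequences don't need the check is
that a sequence's bounds are already stored as int64, so they can never
fall outside the int64 range. A toy standalone demonstration of why the
comparison is vacuous (this is illustrative only, not the sequence.c
code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* seqmax for an INT8 sequence is itself an int64 value */
        int64_t seqmax = INT64_MAX;

        /* The shape of the removed check: it can never be true, because
         * an int64 cannot lie outside its own representable range. */
        if (seqmax > INT64_MAX || seqmax < INT64_MIN)
            printf("unreachable\n");
        else
            printf("range check is vacuous for INT8 sequences\n");
        return 0;
    }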
Re: row filtering for logical replication
On Mon, Jul 12, 2021 at 7:19 AM Alvaro Herrera wrote:
>
> Hi
>
> Andres complained about the safety of doing general expression
> evaluation in pgoutput; that was first in
> https://postgr.es/m/20210128022032.eq2qqc6zxkqn5...@alap3.anarazel.de
> where he described a possible approach to handle it by restricting
> expressions to have limited shape; and later in
> http://postgr.es/m/20210331191710.kqbiwe73lur7j...@alap3.anarazel.de
>
> I was just scanning the patch trying to see if some sort of protection
> had been added for this, but I couldn't find anything. (Some functions
> are under-commented, though.) So, is it there already, and if so what
> is it?
>

I think the patch is trying to prohibit arbitrary expressions in the
WHERE clause via transformWhereClause(..EXPR_KIND_PUBLICATION_WHERE..).
You can notice that at various places the expressions are prohibited via
EXPR_KIND_PUBLICATION_WHERE. I am not sure that the checks are correct
and sufficient, but I think there is some attempt to do it. For example,
the below sort of ad-hoc check for func_call doesn't seem to be a good
idea:

@@ -119,6 +119,13 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();

+	/* Functions are not allowed in publication WHERE clauses */
+	if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
+		nodeTag(expr) == T_FuncCall)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("functions are not allowed in publication WHERE expressions"),
+				 parser_errposition(pstate, exprLocation(expr))));

Now, the other idea I had in mind was to traverse the WHERE clause
expression in publication_add_relation and identify whether it contains
anything other than an ANDed list of 'foo.bar op constant' expressions.
OTOH, for index WHERE clause expressions or policy check expressions, we
use a technique similar to what we have in the patch to prohibit certain
kinds of expressions. Do you have any preference on how this should be
addressed?

--
With Regards,
Amit Kapila.
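PS: For anyone following along, the feature under discussion is the row
filter clause the patch adds to publications, which (as proposed) looks
roughly like this — a sketch, since the syntax may still change:

    -- only rows satisfying the WHERE expression are replicated
    CREATE PUBLICATION active_customers_pub
        FOR TABLE customers WHERE (active IS TRUE);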
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Mon, Jul 12, 2021 at 2:26 PM David Rowley wrote:
>
> > It seems strange to add a comment to explain why it's there. If we're
> > going to the trouble of doing that, then we should just remove it and
> > add a very small comment to mention why INT8 sequences don't need to
> > be checked.
>
> Any thoughts on this, Greg?

The patch LGTM (it's the same as my original patch, but with short
comments).

Regards,
Greg Nancarrow
Fujitsu Australia
Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
On Fri, Jul 9, 2021 at 7:30 PM Alvaro Herrera wrote:
>
> On 2021-Jul-09, Amul Sul wrote:
>
> > On Tue, Jul 6, 2021 at 11:06 PM Tom Lane wrote:
> >
> > > The point of the static-inline function idea was to be cheap enough
> > > that it isn't worth worrying about this sort of risky optimization.
> > > Given that an smgr function is sure to involve some kernel calls,
> > > I doubt it's worth sweating over an extra test-and-branch beforehand.
> > > So where I was hoping to get to is that smgr objects are *only*
> > > referenced by RelationGetSmgr() calls and nobody ever keeps any
> > > other pointers to them across any non-smgr operations.
> >
> > Herewith, the attached version does the same, thanks.
>
> I think it would be valuable to have a comment in that function to
> point out what the function is there for.

Thanks for the suggestion; I added one in the attached version.

Regards,
Amul

Attachment: v5_Add-RelationGetSmgr-inline-function.patch
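PS: For reference, the inline function under discussion has roughly this
shape (a sketch of the idea, not necessarily the verbatim v5 code; it
assumes the usual rel.h/smgr.h declarations):

    /*
     * RelationGetSmgr
     *		Returns smgr file handle for a relation, opening it if needed.
     *
     * Per Tom's point above, callers should never cache the result across
     * anything that could flush the relcache; always re-fetch it here.
     */
    static inline SMgrRelation
    RelationGetSmgr(Relation rel)
    {
        if (unlikely(rel->rd_smgr == NULL))
            smgrsetowner(&(rel->rd_smgr),
                         smgropen(rel->rd_node, rel->rd_backend));
        return rel->rd_smgr;
    }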
Re: More time spending with "delete pending"
On Mon, Jul 12, 2021 at 01:07:17PM +0900, Michael Paquier wrote:
> Thanks, that matches my impression. There was one comment at the top
> of _pgstat64() that was still incorrect. I have spent more time
> checking and testing this stuff today, and it looks fine. One question
> pending is whether we should increase the timeout used by WIN32's
> stat() and open(), but the buildfarm should be able to tell us more
> here.

And fairywren, which uses MinGW, is unhappy:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-07-12%2004%3A47%3A13

Looking at it now.
--
Michael
Re: Teach pg_receivewal to use lz4 compression
On Thu, Jul 8, 2021 at 7:48 PM wrote:
>
> Dilip Kumar wrote:
>
> > Wouldn't it be better to call it compression method instead of
> > compression program?
>
> Agreed. This is in line with the suggestions of other reviewers.
> Find the change in the attached patch.

Thanks, I will have a look.

> > I think we can somehow use the "acceleration" parameter of lz4
> > compression to map onto the compression level. It is not a direct
> > mapping, but can't we create some internal mapping instead of
> > completely ignoring this option for lz4, or we can provide another
> > option for lz4?
>
> We can, though I am not in favour of doing so. There is seemingly
> little benefit for the added complexity.

I am really not sure what complexity you are talking about. Do you mean
that since with pglz we were already providing the compression level we
should leave that as it is, but that with the new compression method you
don't see much benefit in providing a compression level or speed?

> > Should we also support LZ4 compression using a dictionary?
>
> I would say we should not do that. If my understanding is correct,
> decompression would require the dictionary to be passed along.
> The algorithm seems to be very competitive with the current
> compression as is.

I agree, we might not go for a dictionary because we would need the
dictionary to decompress as well, so that would add extra complexity for
the user.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
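PS: To illustrate the kind of internal mapping I had in mind — purely a
sketch, with an invented map_level_to_acceleration() helper, not
something from the patch:

    #include <lz4.h>

    /*
     * Hypothetical mapping from a user-visible compression level (1..9,
     * as with zlib) onto lz4's "acceleration" knob, where 1 gives the
     * best ratio and larger values trade ratio for speed.
     */
    static int
    map_level_to_acceleration(int level)
    {
        /* level 9 -> acceleration 1 (best ratio); level 1 -> 9 (fastest) */
        return 10 - level;
    }

    static int
    compress_block(const char *src, int srclen,
                   char *dst, int dstcap, int level)
    {
        return LZ4_compress_fast(src, dst, srclen, dstcap,
                                 map_level_to_acceleration(level));
    }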
Re: Skipping logical replication transactions on subscriber side
On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila wrote:
>
> On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky wrote:
> >
> > On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila wrote:
> >>
> >> >
> >> > Ok, looks nice. But I am curious how this will work in the case
> >> > when there are two (or more) errors in the same subscription, but
> >> > different relations?
> >> >
> >>
> >> We can't proceed unless the first error is resolved, so there
> >> shouldn't be multiple unresolved errors.
> >
> > Ok. I thought multiple errors were possible when many tables are
> > initialized using parallel workers (with
> > max_sync_workers_per_subscription > 1).
> >
>
> Yeah, that is possible, but it is covered under the second condition I
> mentioned, and in such cases I think we should have separate rows for
> each tablesync. Is that right, Sawada-san, or do you have something
> else in mind?

Yeah, I agree with having separate rows for each table sync. The table
should not be processed by both the table sync worker and the apply
worker at the same time, so the pair of subscription OID and relation
OID will be unique. I think we could have a boolean column in the view,
indicating whether the error entry was reported by the table sync worker
or the apply worker, or maybe the action column could show "TABLE SYNC"
if the error was reported by the table sync worker. When it comes to
removing the subscription errors in pgstat_vacuum_stat(), I think we
need to do a seq scan on the hash table and send messages to purge the
subscription error entries.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
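PS: Concretely, I imagine querying the view would look something like
this — a sketch only, with hypothetical view and column names, since
nothing is settled yet:

    -- hypothetical shape of the proposed error view (all names are
    -- placeholders)
    SELECT subid,          -- subscription OID
           relid,          -- relation OID (unique together with subid)
           by_table_sync,  -- boolean: table sync worker vs apply worker
           action,         -- e.g. INSERT / UPDATE, or "TABLE SYNC"
           error_message
    FROM pg_stat_subscription_errors;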
Re: More time spending with "delete pending"
On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:
> And fairywren, which uses MinGW, is unhappy:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-07-12%2004%3A47%3A13
> Looking at it now.

I am not sure what to do here to cool down MinGW, so the patch has been
reverted for now:
- Plain stat() is not enough to do a proper detection of the files
pending for deletion on MSVC, so reverting only the part with
microsoft_native_stat() is not going to help.
- We are going to need an evaluation of the stat() implementation in
MinGW and check whether it is possible to rely on it more. If yes, we
could get away with more tweaks based on __MINGW32__/__MINGW64__.
--
Michael
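PS: By "tweaks based on __MINGW32__/__MINGW64__" I mean something of
this shape — a sketch only, with an invented USE_MICROSOFT_NATIVE_STAT
macro, and not tested on MinGW:

    /* sketch: only use the MSVC-specific delete-pending detection when
     * building with MSVC; fall back to plain stat() under MinGW */
    #if defined(WIN32) && !defined(__MINGW32__) && !defined(__MINGW64__)
    #define USE_MICROSOFT_NATIVE_STAT 1
    #endif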
Re: Teach pg_receivewal to use lz4 compression
On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> On Thu, Jul 8, 2021 at 7:48 PM wrote:
>> We can, though I am not in favour of doing so. There is seemingly
>> little benefit for the added complexity.
>
> I am really not sure what complexity you are talking about. Do you mean
> that since with pglz we were already providing the compression level we
> should leave that as it is, but that with the new compression method
> you don't see much benefit in providing a compression level or speed?

You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has in
mind, but my opinion stands on the latter: there is little benefit in
making lz4 faster than the default while reducing compression, as the
default is already a rather low CPU user.
--
Michael
Re: More time spending with "delete pending"
Hello Michael,

12.07.2021 08:52, Michael Paquier wrote:
> On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:
>> And fairywren, which uses MinGW, is unhappy:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-07-12%2004%3A47%3A13
>> Looking at it now.
> I am not sure what to do here to cool down MinGW, so the patch has
> been reverted for now:
> - Plain stat() is not enough to do a proper detection of the files
> pending for deletion on MSVC, so reverting only the part with
> microsoft_native_stat() is not going to help.
> - We are going to need an evaluation of the stat() implementation in
> MinGW and check whether it is possible to rely on it more. If yes, we
> could get away with more tweaks based on __MINGW32__/__MINGW64__.

I'll take care of it and will prepare a MinGW-compatible patch tomorrow.

Best regards,
Alexander
Re: More time spending with "delete pending"
On Mon, Jul 12, 2021 at 09:06:01AM +0300, Alexander Lakhin wrote:
> I'll take care of it and will prepare a MinGW-compatible patch
> tomorrow.

Thanks. Do you have an environment with 32-bit or 64-bit MinGW?
--
Michael
Re: More time spending with "delete pending"
12.07.2021 09:23, Michael Paquier wrote:
> On Mon, Jul 12, 2021 at 09:06:01AM +0300, Alexander Lakhin wrote:
>> I'll take care of it and will prepare a MinGW-compatible patch
>> tomorrow.
> Thanks. Do you have an environment with 32-bit or 64-bit MinGW?

Yes, I have VMs with 32- and 64-bit MinGW environments.

Best regards,
Alexander
Re: Introduce pg_receivewal gzip compression tests
On Fri, Jul 09, 2021 at 11:26:58AM +0000, Georgios wrote:
> As suggested on a different thread [1], pg_receivewal can increase its
> test coverage. There exists a non-trivial amount of code that handles
> gzip compression. The current patch introduces tests that cover
> creation of gzip-compressed WAL files and the handling of gzip partial
> segments. Finally, the integrity of the compressed files is verified.

+   # Verify compressed file's integrity
+   my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
+   is($gzip_is_valid, 0, "program gzip verified file's integrity");

libz and gzip are usually split across different packages, hence there
is no guarantee that this command is always available (same comment as
for LZ4 from a couple of days ago).

+   [
+       'pg_receivewal', '-D', $stream_dir, '--verbose',
+       '--endpos', $nextlsn, '-Z', '5'
+   ],

I would keep the compression level to a minimum here, to limit CPU usage
but still compress something faster.

+   # Verify compressed file's integrity
+   my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
+   is($gzip_is_valid, 0, "program gzip verified file's integrity");

Shouldn't this be coded as a loop going through @gzip_wals?
--
Michael
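PS: Something along these lines is what I had in mind for the loop — an
untested sketch:

    # Verify each compressed file's integrity, not just the first one.
    foreach my $gzip_wal (@gzip_wals)
    {
        is(system_log('gzip', '--test', $gzip_wal),
            0, "program gzip verified integrity of $gzip_wal");
    }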
Re: Speed up transaction completion faster after many relations are accessed in a transaction
Thanks for having a look at this.

On Mon, 21 Jun 2021 at 05:02, Zhihong Yu wrote:
> + * GH_ELEMENT_TYPE defines the data type that the hashtable stores. Each
> + * instance of GH_ELEMENT_TYPE which is stored in the hash table is done so
> + * inside a GH_SEGMENT.
>
> I think the second sentence can be written as (since done means stored,
> it is redundant):
>
> Each instance of GH_ELEMENT_TYPE is stored in the hash table inside a
> GH_SEGMENT.

I've reworded this entire paragraph slightly.

> + * Macros for translating a bucket's index into the segment and another to
> + * determine the item number within the segment.
> + */
> +#define GH_INDEX_SEGMENT(i)	(i) / GH_ITEMS_PER_SEGMENT
>
> into the segment -> into the segment number (in the code I see segidx,
> but I wonder if "segment index" may cause slight confusion).

I've adjusted this comment.

> + GH_BITMAP_WORD used_items[GH_BITMAP_WORDS]; /* A 1-bit for each used item
> +                                              * in the items array */
>
> 'A 1-bit' -> One bit (A and 1 mean the same)

I think you might have misread this. We're storing a 1-bit for each used
item rather than a 0-bit. If I remove the 'A' then it's not clear what
the meaning of each bit's value is.

> + uint32 first_free_segment;
>
> Since the segment may not be totally free, maybe name the field
> first_segment_with_free_slot

I don't really like that. It feels too long to me.

> + * This is similar to GH_NEXT_ONEBIT but flips the bits before operating on
> + * each GH_BITMAP_WORD.
>
> It seems the only difference from GH_NEXT_ONEBIT is in this line:
>
> + GH_BITMAP_WORD word = ~(words[wordnum] & mask); /* flip bits */
>
> If a 4th parameter is added to signify whether the flipping should be
> done, these two functions can be unified.

I don't want to do that. I'd rather have them separate to ensure the
compiler does not create any additional needless branching. Those
functions are pretty hot.

> + * next insert will store in this segment. If it's already pointing to an
> + * earlier segment, then leave it be.
>
> The last sentence is confusing: first_free_segment cannot point to some
> segment and an earlier segment at the same time.
> Maybe drop the last sentence.

I've adjusted this comment to become:

 * Check if we need to lower the first_free_segment. We want new inserts
 * to fill the lower segments first, so only change the first_free_segment
 * if the removed entry was stored in an earlier segment.

I'll attach an updated patch soon.

David
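PS: For readers without the patch handy, the bitmap search being
discussed has roughly this shape — a simplified standalone sketch, not
the patch's code (the zero-bit variant is identical except the word is
complemented first, which is why keeping them as two functions avoids a
branch in hot code):

    #include <stdint.h>

    #define BITMAP_WORD_BITS 64

    /*
     * Sketch of a GH_NEXT_ONEBIT-style search: return the index of the
     * next 1-bit at or after 'start', or -1 if there is none.
     */
    static int
    next_onebit(const uint64_t *words, int nwords, int start)
    {
        for (int i = start; i < nwords * BITMAP_WORD_BITS;)
        {
            int      wordnum = i / BITMAP_WORD_BITS;
            /* mask away bits below 'i' within the current word */
            uint64_t mask = ~UINT64_C(0) << (i % BITMAP_WORD_BITS);
            uint64_t word = words[wordnum] & mask;

            if (word != 0)
                return wordnum * BITMAP_WORD_BITS + __builtin_ctzll(word);
            i = (wordnum + 1) * BITMAP_WORD_BITS;   /* next word */
        }
        return -1;
    }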