Re: pgbench logging broken by time logic changes
Hi Fabien,

I committed the code change without the new TAP tests, because I didn't
want to leave the open item hanging any longer.  As for the test, ...

On Sat, Jul 10, 2021 at 9:36 PM Fabien COELHO wrote:
> >> I hoped we were done here but I realised that your check for 1-3 log
> >> lines will not survive the harsh environment of the build farm.
> >> Adding sleep(2) before the final doLog() confirms that.  I had two
> >> ideas:
>
> I misread your point. You think that it should fail, but it is not
> tried yet. I'm rather optimistic that it should not fail, but I'm okay
> with averting the risk anyway.

... I know it can fail, and your v18 didn't fix that, because...

+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
                                                              ^
                                                              |

... this range can be exceeded.  That's because extra aggregations are
created based on doLog() reading the clock after a transaction is
finished, entirely independently of the -T mechanism deciding when to
stop the benchmark, and potentially many seconds later in adverse
conditions.  As I mentioned, you can see it fail with your own eyes if
you hack the code like so:

        if (agg_interval > 0)
        {
+               /*
+                * XXX: simulate an overloaded raspberry pi swapping to a microsd
+                * card or other random delays as we can expect in the build farm
+                */
+               sleep(3);
                /* log aggregated but not yet reported transactions */
                doLog(thread, state, &aggs, false, 0, 0);
        }

> I stand by this solution which should allow to get some data from the
> field, as v18 attached. If all is green then the TODO could be removed
> later.

I suspect the number of aggregates could be made deterministic, as I
described in an earlier message.  What do you think about doing
something like that first for the next release, before trying to add
assertions about the number of aggregates?  I'm with you on the
importance of testing, but it seems better to start by making the thing
more testable.
Re: Enhanced error message to include hint messages for redundant options error
On Sat, 10 Jul 2021 at 17:03, vignesh C wrote:
>
> > Also, I don't think this function should be marked inline -- using a
> > normal function ought to help make the compiled code smaller.
>
> inline instructs the compiler to attempt to embed the function content
> into the calling code instead of executing an actual call. I think we
> should keep it inline to reduce the function call.

Hmm, I'd say that inline should generally be used sparingly, and only
for small functions that are called very often, to avoid the function
call overhead, and generate a faster and possibly smaller executable.
(Though I think sometimes it can still be faster if the executable is
larger.)

In this case, it's a function that is only called under error
conditions, so it's not commonly called, and we don't care so much
about performance when we're about to throw an error.

Also, if you look at an ereport() such as

    ereport(ERROR,
            (errcode(ERRCODE_SYNTAX_ERROR),
             errmsg("conflicting or redundant options"),
             parser_errposition(pstate, defel->location)));

This is a macro that's actually expanded into 5 separate function calls:

- errstart() / errstart_cold()
- errcode()
- errmsg()
- parser_errposition()
- errfinish()

so it's a non-trivial amount of code. Whereas, if it's not inlined, it
becomes just one function call at each call-site, making for smaller,
faster code in the typical case where an error is not being raised.

Of course, it's possible the compiler might still decide to inline the
function, if it thinks that's preferable. In some cases, we explicitly
mark this type of function with pg_noinline, to avoid that, and reduce
code bloat where it's used in lots of small, fast functions (see, for
example, float_overflow_error()).

In general though, I think inline and noinline should be reserved for
special cases where they give a clear, measurable benefit, and that in
general it's better to not mark the function and just let the compiler
decide.

Regards,
Dean
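For reference, the float_overflow_error() helper mentioned above is
essentially the following (paraphrased from src/backend/utils/adt/float.c,
so treat the exact form as illustrative): a deliberately non-inlined
function that collapses the multi-call ereport() sequence into a single
call instruction at each call site.

    /*
     * Non-inlined error helper: callers pay only one function call, and
     * the ereport() expansion lives in exactly one place in the binary.
     */
    pg_noinline void
    float_overflow_error(void)
    {
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("value out of range: overflow")));
    }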
Re: Enhanced error message to include hint messages for redundant options error
On Sat, 10 Jul 2021 at 18:09, vignesh C wrote:
>
> I'm planning to handle conflicting errors separately after this
> current work is done; once the patch is changed to have just the valid
> scenarios (removing the scenarios you have pointed out) the existing
> function can work as is without any changes. Thoughts?

Ah OK, that might be reasonable. Perhaps, then, errorDuplicateDefElem()
and errorConflictingDefElem() would be better than what I originally
suggested.

BTW, another case I spotted was this:

copy (select 1) to stdout csv csv header;
ERROR:  option "format" specified more than once
LINE 1: copy (select 1) to stdout csv csv header;
                                      ^

which isn't good, because there is no option called "format".

Regards,
Dean
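For illustration, such a helper could be as small as the following
sketch. The name comes from the suggestion above, but the body is an
assumption on my part, not code from the patch:

    /*
     * Hypothetical shared helper for duplicated options, so every call
     * site raises the same message with an error cursor.  The real
     * signature and wording would be decided by the patch.
     */
    static void
    errorDuplicateDefElem(ParseState *pstate, DefElem *defel)
    {
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("option \"%s\" specified more than once", defel->defname),
                 parser_errposition(pstate, defel->location)));
    }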
Re: psql - factor out echo code
On Sat, Jul 10, 2021 at 10:25 PM Fabien COELHO wrote:
>
> Hello Vignesh,
>
> > I am changing the status to "Needs review" as the review is not
> > completed for this patch and also there are some tests failing, that
> > need to be fixed:
> > test test_extdepend ... FAILED 50 ms
>
> Indeed,
>
> Attached v4 simplifies the format and fixes this one.
> I ran check-world, this time.

Thanks for posting an updated patch, the tests are passing now. I have
changed the status back to Ready For Committer.

Regards,
Vignesh
Re: pgbench logging broken by time logic changes
Hello Thomas,

> I committed the code change without the new TAP tests, because I
> didn't want to leave the open item hanging any longer.

Ok. Good.

> As for the test, ... [...]

Argh, so there are no tests that would have caught the regressions :-(

> ... I know it can fail, and your v18 didn't fix that, because...
>
> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
>
> ... this range can be exceeded.

Indeed. I meant to move that one into the TODO section as well, not
just the previous call, so that all time-sensitive tests are fully
ignored but reported, which would be enough for me.

> I suspect the number of aggregates could be made deterministic, as I
> described in an earlier message. What do you think about doing
> something like that first for the next release, before trying to add
> assertions about the number of aggregates?

I think that the last time I did something to get more deterministic
results in pgbench, which involved a few lines of hocus-pocus in
pgbench, the patch got rejected :-) An "ignored" test looked like a
good compromise to check how things are going in the farm and to be
able to check for more non-regressions when developing pgbench,
without introducing behavioral changes.

> I'm with you on the importance of testing, but it seems better to
> start by making the thing more testable.

I'm used to my test patches being rejected, including those modifying
pgbench behavior to make it more testable. Am I mad enough to retry?
Maybe, maybe not.

Attached is the fully "ignored" version of the time features test as a
patch.

-- 
Fabien.

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3aa9d5d753..f2a1c861b2 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -8,6 +8,7 @@ use PostgresNode;
 use TestLib;
 use Test::More;
 use Config;
+use Time::HiRes qw(time);
 
 # start a pgbench specific server
 my $node = get_new_node('main');
@@ -54,12 +55,14 @@ sub pgbench
 
 	push @cmd, @args;
 
+	my $start = time();
 	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
+	my $stop = time();
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
 
-	return;
+	return $stop - $start;
 }
 
 # tablespace for testing, because partitioned tables cannot use pg_default
@@ -1187,7 +1190,7 @@ sub check_pgbench_logs
 	# $prefix is simple enough, thus does not need escaping
 	my @logs = list_files($dir, qr{^$prefix\..*$});
 
-	ok(@logs == $nb, "number of log files");
+	ok(@logs == $nb, "number of log files (@logs)");
 	ok(grep(/\/$prefix\.\d+(\.\d+)?$/, @logs) == $nb, "file name format");
 
 	my $log_number = 0;
@@ -1219,7 +1222,58 @@ sub check_pgbench_logs
 
 my $bdir = $node->basedir;
 
-# Run with sampling rate, 2 clients with 50 transactions each.
+TODO: {
+	#
+	# Test time-sensitive features on a light read-only transaction
+	#
+	local $TODO = "possibly unreliable on slow hosts or unlucky runs";
+
+	# Run with sampling rate, 2 clients with 50 transactions each.
+	#
+	# -T: bench duration, 2 seconds to exercise progress & logs
+	# -P: progress report
+	# --aggregate-interval: periodic aggregated logs
+	# --rate: schedule load
+	# --latency-limit: max delay, not deeply exercice
+	#
+	# note: the --rate behavior is probabilistic in nature.
+	# note: --progress-timestamp is not tested.
+	my $delay = pgbench(
+		'-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
+		  . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
+		  . ' -c 3 -r',
+		0,
+		[   qr{type: multiple},
+			qr{clients: 3},
+			qr{threads: $nthreads},
+			qr{duration: 2 s},
+			qr{script 1: .* select only},
+			qr{script 2: .* select only},
+			qr{statement latencies in milliseconds},
+			qr{FROM pgbench_accounts} ],
+		[ qr{vacuum}, qr{progress: 1\b} ],
+		'pgbench progress', undef,
+		"--log-prefix=$bdir/001_pgbench_log_1");
+
+	# The rate may results in an unlucky schedule which triggers
+	# an early exit, hence the loose bound.
+
+	# also, the delay may totally fail on very slow or overloaded hosts,
+	# valgrind runs...
+
+	ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds");
+
+	# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+	# only 1 or as many as 3 progress reports per thread.
+	# aggregate log format is:
+	# unix_epoch_time #tx sum sum2 min max [sum sum2 min max [skipped]]
+	# first series about latency; second about lag (--rate) ;
+	# skipped only if --latency-limit is set.
+	check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+		qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
+}
+
+# with sampling rate, 2 clients with 50 tx each
 pgbench(
 	"-n -S -t 50 -c 2 --log --sampling-rate=0.5",
 	0,
 	[ qr{select only}, qr{processed: 100/100} ],
 	[qr{^$}],
Re: Enhanced error message to include hint messages for redundant options error
On Sun, Jul 11, 2021 at 2:57 PM Dean Rasheed wrote:
>
> On Sat, 10 Jul 2021 at 17:03, vignesh C wrote:
> >
> > > Also, I don't think this function should be marked inline -- using a
> > > normal function ought to help make the compiled code smaller.
> >
> > inline instructs the compiler to attempt to embed the function content
> > into the calling code instead of executing an actual call. I think we
> > should keep it inline to reduce the function call.
>
> Hmm, I'd say that inline should generally be used sparingly, and only
> for small functions that are called very often, to avoid the function
> call overhead, and generate a faster and possibly smaller executable.
> (Though I think sometimes it can still be faster if the executable is
> larger.)
>
> In this case, it's a function that is only called under error
> conditions, so it's not commonly called, and we don't care so much
> about performance when we're about to throw an error.
>
> Also, if you look at an ereport() such as
>
>     ereport(ERROR,
>             (errcode(ERRCODE_SYNTAX_ERROR),
>              errmsg("conflicting or redundant options"),
>              parser_errposition(pstate, defel->location)));
>
> This is a macro that's actually expanded into 5 separate function calls:
>
> - errstart() / errstart_cold()
> - errcode()
> - errmsg()
> - parser_errposition()
> - errfinish()
>
> so it's a non-trivial amount of code. Whereas, if it's not inlined, it
> becomes just one function call at each call-site, making for smaller,
> faster code in the typical case where an error is not being raised.
>
> Of course, it's possible the compiler might still decide to inline the
> function, if it thinks that's preferable. In some cases, we explicitly
> mark this type of function with pg_noinline, to avoid that, and reduce
> code bloat where it's used in lots of small, fast functions (see, for
> example, float_overflow_error()).
>
> In general though, I think inline and noinline should be reserved for
> special cases where they give a clear, measurable benefit, and that in
> general it's better to not mark the function and just let the compiler
> decide.

Ok, that makes sense. As this flow is mainly in the error path it is
ok. I will change it.

Regards,
Vignesh
Re: Slow standby snapshot
Hello, Kirill.

> Also, maybe it is better to reduce the invasiveness by using a simpler
> approach. For example, use the first bit to mark the xid as valid and
> the last 7 bits (128 values) as an optimistic offset to the next valid
> xid (jumping by 127 steps in the worst scenario).
> What do you think?

I have tried such an approach, but it looks like it is not effective,
probably because of CPU caching issues.

I have looked again at your patch, and it seems like it has a lot of
issues at the moment:

* error in KnownAssignedXidsGetOldestXmin: `i` is uninitialized and the
  logic is wrong
* error in the compressing function
  (```KnownAssignedXidsValidDLL[compress_index].prv = prv;``` -- `prv`
  is never updated)
* probably other errors?
* compilation warnings
* the logic around `KAX_DLL_ENTRY_INVALID` looks a little complex
* variable/method placement is bad (see `KAX_E_INVALID` and others)
* the documentation about KnownAssignedXidsValid needs updating, see
  ```To keep individual deletions cheap, we need to allow gaps in the
  array``` in procarray.c
* formatting is broken

Do you have plans to update it? If not, I could try to rewrite it.

Also, what about adding the patch to the commitfest?

Thanks,
Michail.
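For what it's worth, the bit-packing idea quoted at the top could look
something like this standalone sketch. All names here are invented for
illustration; this is not code from either patch:

    /* One byte per xid slot: top bit = "valid", low 7 bits = optimistic
     * distance to the next valid entry, so a scan can jump up to 127
     * slots at a time (worst case: jump 127 and keep scanning). */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define KAX_VALID       0x80
    #define KAX_OFFSET_MASK 0x7F

    static inline bool
    kax_is_valid(uint8_t e)
    {
        return (e & KAX_VALID) != 0;
    }

    static inline uint8_t
    kax_make_entry(bool valid, unsigned next_offset)
    {
        if (next_offset > KAX_OFFSET_MASK)
            next_offset = KAX_OFFSET_MASK;
        return (uint8_t) ((valid ? KAX_VALID : 0) | next_offset);
    }

    int
    main(void)
    {
        uint8_t e = kax_make_entry(false, 42);

        printf("valid=%d skip=%u\n", kax_is_valid(e), e & KAX_OFFSET_MASK);
        return 0;
    }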
Re: enable_resultcache confusion
On Mon, Jul 12, 2021 at 02:56:58AM +1200, David Rowley wrote:
> On Sat, 10 Jul 2021 at 07:30, Robert Haas wrote:
> >
> > On Fri, Jul 9, 2021 at 11:35 AM David Rowley wrote:
> > > I really like that name.
> > >
> > > I'll wait to see if anyone else wants to voice their opinion before I
> > > do any renaming work.
> >
> > I like it, too.
>
> Great. I've attached my first draft patch to do the renaming.

In REL_14, maybe you'd also want to update the release notes reference:

|This is useful if only a small percentage of rows is checked on
|the inner side and is controlled by <xref linkend="guc-enable-resultcache"/>.

-- 
Justin
Re: pg_stats and range statistics
Hello,

This should have been added with [1]. Excerpt from the documentation:

"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must be
extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.

Attached is v2 of this patch with some cosmetic changes. Renamed the
columns a bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac,
range_bounds_histogram -> range_bounds_histograms)

One question: We do have the option of representing the histogram of
lower bounds separately from the histogram of upper bounds, as two
separate view columns. Don't know if there is much utility though, and
there is a fair bit of added complexity: see below. Thoughts?

My attempts via SQL (unnest -> lower|upper -> array_agg) were futile,
given unnest does not play nice with anyarray. For instance:

select unnest(stavalues1) from pg_statistic;
ERROR:  cannot determine element type of "anyarray" argument

Maybe the only option is to write a UDF
pg_get_{lower|upper}_bounds_histogram which can do something similar to
what calc_hist_selectivity does:

	/*
	 * Convert histogram of ranges into histograms of its lower and upper
	 * bounds.
	 */
	nhist = hslot.nvalues;
	hist_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
	hist_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
	for (i = 0; i < nhist; i++)
	{
		bool		empty;

		range_deserialize(rng_typcache, DatumGetRangeTypeP(hslot.values[i]),
						  &hist_lower[i], &hist_upper[i], &empty);
		/* The histogram should not contain any empty ranges */
		if (empty)
			elog(ERROR, "bounds histogram contains an empty range");
	}

This is looking good and ready.

[1] https://github.com/postgres/postgres/commit/918eee0c497c88260a2e107318843c9b1947bc6f
[2] https://www.postgresql.org/docs/devel/view-pg-stats.html

Regards,
Soumyadeep (VMware)

From 1f2ed0911eb3f7a53b16047cefb499692daf5ef6 Mon Sep 17 00:00:00 2001
From: Egor Rogov
Date: Sun, 11 Jul 2021 11:09:45 -0700
Subject: [PATCH v2 1/1] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when
these slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml           | 32
 src/backend/catalog/system_views.sql | 23 +++++++++++++++++++-
 src/include/catalog/pg_statistic.h   |  3 +++
 src/test/regress/expected/rules.out  | 26 +++++++++++++++++++++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f517a7d4af..7168ff9a13 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -12877,6 +12877,38 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       non-null elements.  (Null for scalar types.)
+
+      range_length_histogram anyarray
+
+      A histogram of the lengths of non-empty and non-null range values of a
+      range type column.  (Null for non-range types.)
+
+      empty_range_frac float4
+
+      Fraction of column entries whose values are empty ranges.
+      (Null for non-range types.)
+
+      range_bounds_histograms anyarray
+
+      Histograms of lower and upper bounds of non-empty, non-null ranges,
+      combined into a single array of range values.  The lower and upper
+      bounds of each value correspond to the histograms of lower and upper
+      bounds respectively.  (Null for non-range types.)
+
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..d8bc622ad5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS empty_range_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+
Re: row filtering for logical replication
On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
> Hi.
>
> I have been looking at the latest patch set (v16). Below are my review
> comments and some patches.

Peter, thanks for your detailed review. Comments are inline.

> 1. Patch 0001 comment - typo
>
> you can optionally filter rows that does not satisfy a WHERE condition
>
> typo: does/do

Fixed.

> 2. Patch 0001 comment - typo
>
> The WHERE clause should probably contain only columns that are part of
> the primary key or that are covered by REPLICA IDENTITY. Otherwise,
> and DELETEs won't be replicated.
>
> typo: "Otherwise, and DELETEs" ??

Fixed.

> 3. Patch 0001 comment - typo and clarification
>
> If your publication contains partitioned table, the parameter
> publish_via_partition_root determines if it uses the partition row
> filter (if the parameter is false -- default) or the partitioned table
> row filter.
>
> Typo: "contains partitioned table" -> "contains a partitioned table"

Fixed.

> Also, perhaps the text "or the partitioned table row filter." should
> say "or the root partitioned table row filter." to disambiguate the
> case where there are more levels of partitions like A->B->C. e.g. What
> filter does C use?

I agree it can be confusing. BTW, CREATE PUBLICATION does not mention
that the root partitioned table is used. We should improve that
sentence too.

> 4. src/backend/catalog/pg_publication.c - misleading names
>
> -publication_add_relation(Oid pubid, Relation targetrel,
> +publication_add_relation(Oid pubid, PublicationRelationInfo *targetrel,
>                           bool if_not_exists)
>
> Leaving this parameter name as "targetrel" seems a bit misleading now
> in the function code. Maybe this should be called something like "pri"
> which is consistent with other places where you have declared
> PublicationRelationInfo.
>
> Also, consider declaring some local variables so that the patch may
> have less impact on existing code. e.g.
> Oid relid = pri->relid
> Relation *targetrel = relationinfo->relation

Done.

> 5. src/backend/commands/publicationcmds.c - simplify code
>
> - rels = OpenTableList(stmt->tables);
> + if (stmt->tableAction == DEFELEM_DROP)
> +     rels = OpenTableList(stmt->tables, true);
> + else
> +     rels = OpenTableList(stmt->tables, false);
>
> Consider writing that code more simply as just:
>
> rels = OpenTableList(stmt->tables, stmt->tableAction == DEFELEM_DROP);

It is not a common pattern to use an expression as a function argument
in Postgres. I prefer to use a variable with a suggestive name.

> 6. src/backend/commands/publicationcmds.c - bug?
>
> - CloseTableList(rels);
> + CloseTableList(rels, false);
>   }
>
> Is this a potential bug? When you called OpenTableList the 2nd param
> was maybe true/false, so is it correct to be unconditionally false
> here? I am not sure.

Good catch.

> 7. src/backend/commands/publicationcmds.c - OpenTableList function comment.
>
> * Open relations specified by a RangeVar list.
> + * AlterPublicationStmt->tables has a different list element, hence,
> + * is_drop indicates if it has a RangeVar (true) or PublicationTable
> + * (false).
> * The returned tables are locked in ShareUpdateExclusiveLock mode in
> * order to add them to a publication.
>
> I am not sure about this. Should that comment instead say "indicates
> if it has a Relation (true) or PublicationTable (false)"?

Fixed.

> 8. src/backend/commands/publicationcmds.c - OpenTableList
> >8
> For some reason it feels kind of clunky to me for this function to be
> processing the list differently according to the 2nd param. e.g. the
> name "is_drop" seems quite unrelated to the function code, and more to
> do with where it was called from. Sorry, I don't have any better ideas
> for improvement atm.

My suggestion is to rename it to "pub_drop_table".

> 9. src/backend/commands/publicationcmds.c - OpenTableList bug?
> >8
> I felt maybe this is a possible bug here because there seems no code
> explicitly assigning the whereClause = NULL if "is_drop" is true so
> maybe it can have a garbage value which could cause problems later.
> Maybe this is fixed by using palloc0.

Fixed.

> 10. src/backend/commands/publicationcmds.c - CloseTableList function comment
> >8
> Probably the meaning of "is_drop" should be described in this function
> comment.

Done.

> 11. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> signature.
>
> -static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
> +static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Relation rel);
>
> I see that this function signature is modified but I did not see how
> this parameter refactoring is actually related to the RowFilter patch.
> Perhaps I am mistaken, but IIUC this only changes the relid =
> RelationGetRelid(rel); to be done inside this function instead of
> being done outside by the callers.

It is not critical for this patch so I removed it.

> 12. src/backend/replication/pgoutput/pgoutput.c - missing function comme
Re: row filtering for logical replication
On Mon, Jul 5, 2021, at 12:14 AM, Greg Nancarrow wrote:
> I have some review comments on the "Row filter for logical replication" patch:
>
> (1) Suggested update to patch comment:
> (There are some missing words and things which could be better expressed)

I incorporated all your wording suggestions.

> (2) Some inconsistent error message wording:
>
> Currently:
> err = _("cannot use subquery in publication WHERE expression");
>
> Suggest changing it to:
> err = _("subqueries are not allowed in publication WHERE expressions");

The same expression "cannot use subquery in ..." is used in the other
switch cases. If you think this message can be improved, I suggest that
you submit a separate patch to change all such sentences.

> Other examples from the patch:
> err = _("aggregate functions are not allowed in publication WHERE expressions");
> err = _("grouping operations are not allowed in publication WHERE expressions");
> err = _("window functions are not allowed in publication WHERE expressions");
> errmsg("functions are not allowed in publication WHERE expressions"),
> err = _("set-returning functions are not allowed in publication WHERE expressions");

This is a different function. I just followed the same wording from
similar sentences around it.

> (3) The current code still allows arbitrary code execution, e.g. via a
> user-defined operator:

I fixed it in v18.

> Perhaps add the following after the existing shell error-check in make_op():
>
> /* User-defined operators are not allowed in publication WHERE clauses */
> if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
>     opform->oid >= FirstNormalObjectId)
>     ereport(ERROR,
>             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>              errmsg("user-defined operators are not allowed in publication WHERE expressions"),
>              parser_errposition(pstate, location)));

I'm still working on a way to accept built-in functions, but while we
don't have it, let's forbid custom operators too.

> Also, I believe it's also allowing user-defined CASTs (so could add a
> similar check to above in transformTypeCast()).
> Ideally, it would be preferable to validate/check publication WHERE
> expressions in one central place, rather than scattered all over the
> place, but that might be easier said than done.
> You need to update the patch comment accordingly.

I forgot to mention it in the patch I sent a few minutes ago. I'm not
sure we need to mention every error condition (especially one that will
be rarely used).

> (4) src/backend/replication/pgoutput/pgoutput.c
> pgoutput_change()
>
> The 3 added calls to pgoutput_row_filter() are returning from
> pgoutput_change(), if false is returned, but instead they should break
> from the switch, otherwise cleanup code is missed. This is surely a
> bug.

Fixed.

In summary, v18 contains:

* Peter Smith's review
* Greg Nancarrow's review
* cache ExprState
* cache TupleTableSlot
* forbid custom operators
* various fixes

-- 
Euler Taveira
EDB   https://www.enterprisedb.com/
Protect against possible memory corruption (src/backend/access/nbtree/nbtxlog.c)
Hi,

While analyzing a possible use of an uninitialized variable, I found
that *_bt_restore_page* can lead to memory corruption, by not checking
the maximum limit of array items, which is MaxIndexTuplesPerPage. It
can also generate a dangling pointer by incrementing it beyond the
limits it can point to.

While there, I promoted a reduction of scope and adapted the type of
the *len* parameter to match the XLogRecGetBlockData function.

Passes regress check on Windows and check-world on Linux.

regards,
Ranier Vilela

0001-_bt_restore_page-have-issues-can-lead-a-memory-corru.patch
Description: Binary data
Re: Protect against possible memory corruption (src/backend/access/nbtree/nbtxlog.c)
On 11/07/2021 22:51, Ranier Vilela wrote:
> Hi,
>
> While analyzing a possible use of an uninitialized variable, I checked
> that *_bt_restore_page* can lead to memory corruption,
> by not checking the maximum limit of array items which is
> MaxIndexTuplesPerPage.

> +	/* Protect against corrupted recovery file */
> +	nitems = (len / sizeof(IndexTupleData));
> +	if (nitems < 0 || nitems > MaxIndexTuplesPerPage)
> +		elog(PANIC, "_bt_restore_page: cannot restore %d items to page", nitems);
> +

That's not right. You don't get the number of items by dividing like
that. 'len' includes the tuple data as well, not just the
IndexTupleData header.

> @@ -73,12 +79,9 @@ _bt_restore_page(Page page, char *from, int len)
>  		nitems = i;
>
>  	for (i = nitems - 1; i >= 0; i--)
> -	{
>  		if (PageAddItem(page, items[i], itemsizes[i], nitems - i,
>  						false, false) == InvalidOffsetNumber)
>  			elog(PANIC, "_bt_restore_page: cannot add item to page");
> -		from += itemsz;
> -	}
>  }

I agree with this change (except that I would leave the braces in
place). The 'from' that's calculated here is plain wrong; oversight in
commit 7e30c186da. Fortunately it's not used, so it can just be
removed.

- Heikki
Re: proposal - psql - use pager for \watch command
On Sun, Jul 11, 2021 at 1:18 AM vignesh C wrote:
> On Wed, May 12, 2021 at 5:45 PM Pavel Stehule wrote:
> > looks so with your patch psql doesn't work on ms

Here's a fix for Windows. The pqsignal() calls are #ifdef'd out. I
also removed a few lines that were added after commit 3a513067 but
aren't needed anymore after fae65629.

From b6b3b420e10f389ebbf4ae1d21e34b390b62ce67 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 12 Jul 2021 10:34:40 +1200
Subject: [PATCH v8] Add PSQL_WATCH_PAGER for psql's \watch command.

Allow a pager to be used by the \watch command. This works but isn't
very useful with traditional pagers like "less", so use a different
environment variable. The popular open source tool "pspg" (written by
the first author of this patch) knows how to display the output if you
set PSQL_WATCH_PAGER="pspg --stream".

To make \watch react quickly when the user quits the pager or presses
^C, and also to increase the accuracy of its timing and decrease the
rate of useless context switches, change the main loop of the \watch
command to use sigwait() rather than a sleeping/polling loop, on Unix.

Supported on Unix only for now (like pspg).

Author: Pavel Stehule
Author: Thomas Munro
Discussion: https://postgr.es/m/CAFj8pRBfzUUPz-3gN5oAzto9SDuRSq-TQPfXU_P6h0L7hO%2BEhg%40mail.gmail.com
---
 doc/src/sgml/ref/psql-ref.sgml |  28 +++
 src/bin/psql/command.c         | 133 ++---
 src/bin/psql/common.c          |  11 ++-
 src/bin/psql/common.h          |   2 +-
 src/bin/psql/help.c            |   6 +-
 src/bin/psql/startup.c         |  19 +
 6 files changed, 184 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a8dfc41e40..da53306857 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3002,6 +3002,16 @@ lo_import 152801
        (such as more) is used.
+
+       When using the \watch command to execute a query
+       repeatedly, the environment variable PSQL_WATCH_PAGER
+       is used to find the pager program instead, on Unix systems. This is
+       configured separately because it may confuse traditional pagers, but
+       can be used to send output to tools that understand
+       psql's output format (such as
+       pspg --stream).
+
        When the pager option is off, the pager
        program is not used. When the pager option is
@@ -4672,6 +4682,24 @@
 PSQL_EDITOR_LINENUMBER_ARG='--line '
+
+       PSQL_WATCH_PAGER
+
+
+
+       When a query is executed repeatedly with the \watch
+       command, a pager is not used by default. This behavior can be changed
+       by setting PSQL_WATCH_PAGER to a pager command, on Unix
+       systems. The pspg pager (not part of
+       PostgreSQL but available in many open source
+       software distributions) can display the output of
+       \watch if started with the option
+       --stream.
+
+
+
+
+
+
     PSQLRC

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 543401c6d6..cc79da9013 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -13,6 +13,7 @@
 #include <time.h>
 #ifndef WIN32
 #include <sys/stat.h>			/* for stat() */
+#include <sys/time.h>			/* for setitimer() */
 #include <fcntl.h>				/* open() flags */
 #include <unistd.h>				/* for geteuid(), getpid(), stat() */
 #else
@@ -4894,8 +4895,17 @@ do_watch(PQExpBuffer query_buf, double sleep)
 	const char *strftime_fmt;
 	const char *user_title;
 	char	   *title;
+	const char *pagerprog = NULL;
+	FILE	   *pagerpipe = NULL;
 	int			title_len;
 	int			res = 0;
+#ifndef WIN32
+	sigset_t	sigalrm_sigchld_sigint;
+	sigset_t	sigalrm_sigchld;
+	sigset_t	sigint;
+	struct itimerval interval;
+	bool		done = false;
+#endif
 
 	if (!query_buf || query_buf->len <= 0)
 	{
@@ -4903,6 +4913,58 @@ do_watch(PQExpBuffer query_buf, double sleep)
 		return false;
 	}
 
+#ifndef WIN32
+	sigemptyset(&sigalrm_sigchld_sigint);
+	sigaddset(&sigalrm_sigchld_sigint, SIGCHLD);
+	sigaddset(&sigalrm_sigchld_sigint, SIGALRM);
+	sigaddset(&sigalrm_sigchld_sigint, SIGINT);
+
+	sigemptyset(&sigalrm_sigchld);
+	sigaddset(&sigalrm_sigchld, SIGCHLD);
+	sigaddset(&sigalrm_sigchld, SIGALRM);
+
+	sigemptyset(&sigint);
+	sigaddset(&sigint, SIGINT);
+
+	/*
+	 * Block SIGALRM and SIGCHLD before we start the timer and the pager (if
+	 * configured), to avoid races.  sigwait() will receive them.
+	 */
+	sigprocmask(SIG_BLOCK, &sigalrm_sigchld, NULL);
+
+	/*
+	 * Set a timer to interrupt sigwait() so we can run the query at the
+	 * requested intervals.
+	 */
+	interval.it_value.tv_sec = sleep_ms / 1000;
+	interval.it_value.tv_usec = (sleep_ms % 1000) * 1000;
+	interval.it_interval = interval.it_value;
+	if (setitimer(ITIMER_REAL, &interval, NULL) < 0)
+	{
+		pg_log_error("could not set timer: %m");
+		done = true;
+	}
+#endif
+
+	/*
+	 * For \watch, we ignore the size of the result and always use the pager
+
Re: row filtering for logical replication
Hi,

I took a look at this patch, which seems to be in the CF since 2018. I
have only some basic comments and observations at this point:

1) alter_publication.sgml

I think "expression is executed" sounds a bit strange, perhaps
"evaluated" would be better?

2) create_publication.sgml

Why is the patch changing publish_via_partition_root docs? That seems
like a rather unrelated bit.

  The WHERE clause should probably contain only columns that are part
  of the primary key or be covered by REPLICA ...

I'm not sure what exactly this is trying to say. What does "should
probably ..." mean in practice for the users? Does that mean something
bad will happen for other columns, or what? I'm sure this wording will
be quite confusing for users.

It may also be unclear whether the condition is evaluated on the old or
new row, so perhaps add an example illustrating that & a more detailed
comment, or something. E.g. what will happen with

  UPDATE departments SET active = false WHERE active;

3) publication_add_relation

Does this need to build the parse state even for whereClause == NULL?

4) AlterPublicationTables

I wonder if this new reworked code might have issues with subscriptions
containing many tables, but I haven't tried.

5) OpenTableList

I really dislike that the list can have two different node types
(Relation and PublicationTable). In principle we don't actually need
the extra flag, we can simply check the node type directly by IsA() and
act based on that. However, I think it'd be better to just use a single
node type from all places.

I don't see why not to set whereClause every time. I don't think the
extra if saves anything, it's just a bit more complex.

6) CloseTableList

The comment about node types seems pointless, this function has no flag
and the element type does not matter.

7) parse_agg.c

  ... are not allowed in publication WHERE expressions

I think all similar cases use "WHERE conditions" instead.

8) transformExprRecurse

The check at the beginning seems rather awkward / misplaced - it's way
too specific for this location (there are no other p_expr_kind
references in this function). Wouldn't transformFuncCall (or maybe
ParseFuncOrColumn) be a more appropriate place?

Initially I was wondering why not to allow function calls in WHERE
conditions, but I see that was discussed in the past as problematic.
But that reminds me that I don't see any docs describing what
expressions are allowed in WHERE conditions - maybe we should
explicitly list what expressions are allowed?

9) pgoutput.c

I have not reviewed this in detail yet, but there seems to be something
wrong, because `make check-world` fails in subscription/010_truncate.pl
after hitting an assert (backtrace attached) during "START_REPLICATION
SLOT" in get_rel_sync_entry in this code:

    /* Release tuple table slot */
    if (entry->scantuple != NULL)
    {
        ExecDropSingleTupleTableSlot(entry->scantuple);
        entry->scantuple = NULL;
    }

So there seems to be something wrong with how the slot is created.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Program terminated with signal SIGABRT, Aborted.
#0  0x7e8ed2ec29e5 in ?? () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.32-8.fc33.x86_64 libgcc-10.3.1-1.fc33.x86_64
(gdb) bt
#0  0x7e8ed2ec29e5 in ?? () from /lib64/libc.so.6
#1  0x7e8ed2eab8a4 in ?? () from /lib64/libc.so.6
#2  0x00afffe1 in ExceptionalCondition (conditionName=0xb7f8e5 "tupdesc->tdrefcount > 0", errorType=0xb7f8a3 "FailedAssertion", fileName=0xb7f810 "tupdesc.c", lineNumber=386) at assert.c:69
#3  0x004a55cb in DecrTupleDescRefCount (tupdesc=0x7e8ed23b4578) at tupdesc.c:386
#4  0x0073fd84 in ExecDropSingleTupleTableSlot (slot=0x7e8ed23b8910) at execTuples.c:1261
#5  0x7e8ed23f2aed in get_rel_sync_entry (data=0x2875440, relation=0x7e8ed23b86c0) at pgoutput.c:1258
#6  0x7e8ed23f20f8 in pgoutput_truncate (ctx=0x286cc70, txn=0x2888c48, nrelations=1, relations=0x2892dd8, change=0x288b138) at pgoutput.c:890
#7  0x008cf03b in truncate_cb_wrapper (cache=0x286ec80, txn=0x2888c48, nrelations=1, relations=0x2892dd8, change=0x288b138) at logical.c:1083
#8  0x008d9605 in ReorderBufferApplyTruncate (rb=0x286ec80, txn=0x2888c48, nrelations=1, relations=0x2892dd8, change=0x288b138, streaming=false) at reorderbuffer.c:1928
#9  0x008da08c in ReorderBufferProcessTXN (rb=0x286ec80, txn=0x2888c48, commit_lsn=23903752, snapshot_now=0x286f168, command_id=2, streaming=false) at reorderbuffer.c:2314
#10 0x008da845 in ReorderBufferReplay (txn=0x2888c48, rb=0x286ec80, xid=744, commit_lsn=23903752, end_lsn=23903960, commit_time=679358346883121, origin_id=0, origin_lsn=0) at reorderbuffer.c:2619
#11 0x008da8c3 in ReorderBufferCommit (rb=0x286ec80, xid=744, commit_lsn=23903752, end_lsn=23903960
Re: row filtering for logical replication
On Sun, Jul 11, 2021, at 4:39 PM, Euler Taveira wrote:
> with cache (v18)
> ---
>
> mean:            0.63 us
> stddev:          1.07 us
> median:          0.55 us
> min-max:         [0.29 .. 844.87] us
> percentile(99):  1.38 us
> mode:            0.41 us
>
> It represents -57%. It is a really good optimization for just a few
> extra lines of code.

cfbot seems to be unhappy with v18 on some of the hosts. Cirrus/FreeBSD
failed in the test 010_truncate. It also failed in a Cirrus/Linux box.
I failed to reproduce it in my local FreeBSD box. Since it passes
appveyor and Cirrus/macos, it could probably be a transient issue.

$ uname -a
FreeBSD freebsd12 12.2-RELEASE FreeBSD 12.2-RELEASE r366954 GENERIC amd64
$ PROVE_TESTS="t/010_truncate.pl" gmake check
gmake -C ../../../src/backend generated-headers
gmake[1]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend'
gmake -C catalog distprep generated-header-symlinks
gmake[2]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend/catalog'
gmake[2]: Nothing to be done for 'distprep'.
gmake[2]: Nothing to be done for 'generated-header-symlinks'.
gmake[2]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend/catalog'
gmake -C utils distprep generated-header-symlinks
gmake[2]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend/utils'
gmake[2]: Nothing to be done for 'distprep'.
gmake[2]: Nothing to be done for 'generated-header-symlinks'.
gmake[2]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend/utils'
gmake[1]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend'
rm -rf '/home/euler/pglr-row-filter-v17'/tmp_install
/bin/sh ../../../config/install-sh -c -d '/home/euler/pglr-row-filter-v17'/tmp_install/log
gmake -C '../../..' DESTDIR='/home/euler/pglr-row-filter-v17'/tmp_install install >'/home/euler/pglr-row-filter-v17'/tmp_install/log/install.log 2>&1
gmake -j1 checkprep >>'/home/euler/pglr-row-filter-v17'/tmp_install/log/install.log 2>&1
rm -rf '/usr/home/euler/pglr-row-filter-v17/src/test/subscription'/tmp_check
/bin/sh ../../../config/install-sh -c -d '/usr/home/euler/pglr-row-filter-v17/src/test/subscription'/tmp_check
cd . && TESTDIR='/usr/home/euler/pglr-row-filter-v17/src/test/subscription' PATH="/home/euler/pglr-row-filter-v17/tmp_install/home/euler/pgrf18/bin:$PATH" LD_LIBRARY_PATH="/home/euler/pglr-row-filter-v17/tmp_install/home/euler/pgrf18/lib" LD_LIBRARY_PATH_RPATH=1 PGPORT='6' PG_REGRESS='/usr/home/euler/pglr-row-filter-v17/src/test/subscription/../../../src/test/regress/pg_regress' /usr/local/bin/prove -I ../../../src/test/perl/ -I . t/010_truncate.pl
t/010_truncate.pl .. ok
All tests successful.
Files=1, Tests=14, 5 wallclock secs ( 0.02 usr 0.00 sys + 1.09 cusr 0.99 csys = 2.10 CPU)
Result: PASS

-- 
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: Protect against possible memory corruption (src/backend/access/nbtree/nbtxlog.c)
On Sun, Jul 11, 2021 at 7:19 PM Heikki Linnakangas wrote:
> On 11/07/2021 22:51, Ranier Vilela wrote:
> > Hi,
> >
> > While analyzing a possible use of an uninitialized variable, I checked
> > that *_bt_restore_page* can lead to memory corruption,
> > by not checking the maximum limit of array items which is
> > MaxIndexTuplesPerPage.
>
> > +	/* Protect against corrupted recovery file */
> > +	nitems = (len / sizeof(IndexTupleData));
> > +	if (nitems < 0 || nitems > MaxIndexTuplesPerPage)
> > +		elog(PANIC, "_bt_restore_page: cannot restore %d items to page", nitems);
> > +
>
> That's not right. You don't get the number of items by dividing like
> that. 'len' includes the tuple data as well, not just the
> IndexTupleData header.

Thanks for the quick review.

Not totally wrong, though: if it is not possible to know the upper
limit before the loop, it is necessary to do the check inside the loop.

Attached is v1 of the patch.

regards,
Ranier Vilela

v1-0001-_bt_restore_page-have-issues-can-lead-a-memory-co.patch
Description: Binary data
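A sketch of what that in-loop check could look like, following the
structure of the existing unpacking loop in _bt_restore_page() (my
illustration of the approach, not necessarily what the attached v1
does):

    i = 0;
    while (from < end)
    {
        /*
         * Bound the item count while unpacking, instead of trying
         * (incorrectly) to derive it from len up front: each IndexTuple
         * is variable-length.
         */
        if (i >= MaxIndexTuplesPerPage)
            elog(PANIC, "_bt_restore_page: cannot restore more than %d items to page",
                 MaxIndexTuplesPerPage);

        /* Need to copy tuple header due to alignment considerations */
        memcpy(&itupdata, from, sizeof(IndexTupleData));
        itemsz = IndexTupleSize(&itupdata);
        itemsz = MAXALIGN(itemsz);

        items[i] = (IndexTuple) from;
        itemsizes[i] = itemsz;
        i++;
        from += itemsz;
    }
    nitems = i;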
Re: enable_resultcache confusion
On Mon, 12 Jul 2021 at 03:22, Justin Pryzby wrote:
> |This is useful if only a small percentage of rows is checked on
> |the inner side and is controlled by <xref
> |linkend="guc-enable-resultcache"/>.

You might be right there, but I'm not too sure. If I changed that, it
might cause a mention of the rename to be missed in the
changes-since-beta2 notes.

Additionally, I was unsure about touching typedefs.list. In the patch I
changed it, but wasn't too sure if that was the correct thing to do. In
normal circumstances, i.e. writing new code, I'd not touch it.

David
Re: row filtering for logical replication
Hi

Andres complained about the safety of doing general expression
evaluation in pgoutput; that was first in
https://postgr.es/m/20210128022032.eq2qqc6zxkqn5...@alap3.anarazel.de
where he described a possible approach to handle it by restricting
expressions to have limited shape; and later in
http://postgr.es/m/20210331191710.kqbiwe73lur7j...@alap3.anarazel.de

I was just scanning the patch trying to see if some sort of protection
had been added for this, but I couldn't find anything. (Some functions
are under-commented, though.) So, is it there already, and if so what
is it? And if it isn't, then I think it should definitely be put there
in some form.

Thanks

-- 
Álvaro Herrera       Valdivia, Chile — https://www.EnterpriseDB.com/
Re: Cosmic ray hits integerset
Fwiw, yes, it could be a cosmic ray. It could also just be marginally
bad RAM. Bad RAM is notoriously hard to reliably test for. It can be
very sensitive to the exact bit pattern stored in it, the timing of
reads and writes, and other factors. The whole point of the rowhammer
attacks is to push some of those timing factors hard, but the same
failures can happen randomly.

On Wed, 7 Jul 2021 at 08:14, Joe Conway wrote:
>
> On 7/7/21 2:53 AM, Jakub Wartak wrote:
> > Hi, Asking out of pure technical curiosity about "the rhinoceros" -
> > what kind of animal is it? Physical box or VM? How could one get
> > dmidecode(1) / dmesg(1) / mcelog(1) from what's out there (e.g. does
> > it run ECC or not?)
>
> Rhinoceros is just a VM on a simple desktop machine. Nothing fancy.
>
> Joe
>
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development

-- 
greg
Re: row filtering for logical replication
On Mon, Jul 12, 2021 at 9:31 AM Euler Taveira wrote:
>
> cfbot seems to be unhappy with v18 on some of the hosts. Cirrus/FreeBSD
> failed in the test 010_truncate. It also failed in a Cirrus/Linux box.
> I failed to reproduce it in my local FreeBSD box. Since it passes
> appveyor and Cirrus/macos, it could probably be a transient issue.

I don't think it's a transient issue. I also get a test failure in
subscription/010_truncate.pl when I run "make check-world" with the v18
patches applied.

The problem can be avoided with the following change (to match what was
originally in my v17-0005 performance-improvement patch):

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 08c018a300..800bae400b 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1256,8 +1256,8 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 	}
 
 	/* create a tuple table slot for row filter */
-	tupdesc = RelationGetDescr(relation);
 	oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+	tupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
 	entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
 	MemoryContextSwitchTo(oldctx);

This creates a TupleDesc copy in CacheMemoryContext that is not
refcounted, so it side-steps the problem.

At this stage I am not sure why the original v18 patch code doesn't
work correctly for the TupleDesc refcounting here. The TupleDesc
refcount is zero when it's time to dealloc the tuple slot (thus causing
that Assert to fire), yet when the slot was created, the TupleDesc
refcount was incremented, so it seems something else has already
decremented the refcount by the time it comes to deallocate the slot.
Perhaps there's an order-of-cleanup or MemoryContext issue here or some
buggy code somewhere, not sure yet.

Regards,
Greg Nancarrow
Fujitsu Australia
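For what it's worth, a plausible explanation of why the copy is exempt
(my reading, not verified against this failure): CreateTupleDescCopy()
returns a descriptor with tdrefcount = -1, and slot teardown only
releases refcounted descriptors. The check lives in the
ReleaseTupleDesc() macro in src/include/access/tupdesc.h, roughly:

    /* Quoted from memory, so check the source; the shape is: only
     * refcounted descriptors (tdrefcount >= 0) are released. */
    #define ReleaseTupleDesc(tupdesc) \
        do { \
            if ((tupdesc)->tdrefcount >= 0) \
                DecrTupleDescRefCount(tupdesc); \
        } while (0)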
Re: bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.
On Sat, Jul 10, 2021 at 07:09:10PM +0530, Bharath Rupireddy wrote:
> Thanks. The patch looks good to me, except a minor comment - isn't it
> "int2 for these fields" as the fields still exist?
>
> +	/* pageinspect >= 1.10 uses int4 instead of int2 for those fields */

This comment looks fine to me as-is, so applied on HEAD.
--
Michael

signature.asc
Description: PGP signature
Re: Record a Bitmapset of non-pruned partitions
On Sat, 10 Jul 2021 at 03:24, David Rowley wrote:
> The good news is that the code in partitions_are_ordered() became even
> more simple as a result of this change. We can do ordered scan simply
> when !bms_overlap(live_parts, boundinfo->interleaved_parts).

I've spent a bit more time revising the 0002 patch so that we're a bit
more strict about when we mark a partition as interleaved. For example,
if the DEFAULT partition happens to be the only partition, then I'm no
longer classing that as interleaved, as there's nothing for it to be
interleaved with.

This also fixes up the not-so-robust check that I had to check if the
NULL partition allowed other Datums.

David

v3-0001-Track-non-pruned-partitions-in-RelOptInfo.patch
Description: Binary data

v3-0002-Allow-ordered-partition-scans-in-more-cases.patch
Description: Binary data
Re: Diagnostic comment in LogicalIncreaseXminForSlot
On Mon, Jul 5, 2021 at 12:54 PM Masahiko Sawada wrote:
>
> On Fri, May 21, 2021 at 6:00 PM Ashutosh Bapat wrote:
> >
> > It's there in CF. I am fine with PG-15. It will be good to patch the
> > back-branches to have this extra diagnostic information available.
>
> The patch looks good to me.

	{
		slot->candidate_catalog_xmin = xmin;
		slot->candidate_xmin_lsn = current_lsn;
+		elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
+			 LSN_FORMAT_ARGS(current_lsn));
	}
	SpinLockRelease(&slot->mutex);

Today, again looking at this patch, I don't think doing elog inside a
spinlock is a good idea. We can either release the spinlock before it
or use some variable to remember that we need to write such an elog
and do it after releasing the lock. What do you think?

I have noticed that a nearby function
LogicalIncreaseRestartDecodingForSlot() logs similar information after
releasing the spinlock, so it is better to follow the same here as
well.

-- 
With Regards,
Amit Kapila.
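To illustrate the second option in a generic, standalone form (plain
pthreads rather than PostgreSQL's spinlock API; all names here are
invented for the example), the pattern is to capture the state under
the lock and emit the message only after releasing it:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned candidate_xmin = 0;

    static void
    update_candidate(unsigned xmin)
    {
        bool        updated = false;

        pthread_mutex_lock(&lock);
        if (xmin > candidate_xmin)
        {
            candidate_xmin = xmin;
            updated = true;     /* remember that we need to log */
        }
        pthread_mutex_unlock(&lock);

        /* Potentially slow I/O happens outside the critical section. */
        if (updated)
            fprintf(stderr, "got new candidate_xmin %u\n", xmin);
    }

    int
    main(void)
    {
        update_candidate(42);
        update_candidate(41);   /* no log: not an advance */
        return 0;
    }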
Re: Skipping logical replication transactions on subscriber side
On Fri, Jul 9, 2021 at 5:58 AM Masahiko Sawada wrote:
>
> On Thu, Jul 8, 2021 at 6:28 PM Amit Kapila wrote:
> >
> > On Wed, Jul 7, 2021 at 11:47 AM Masahiko Sawada wrote:
> > >
> > > On Tue, Jul 6, 2021 at 6:33 PM Amit Kapila wrote:
> > > >
> > > > >
> > > > > According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
> > > > > parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
> > > > > specify a subset of parameters that can be specified by CREATE
> > > > > SUBSCRIPTION. It makes sense to me for 'disable_on_error' since it
> > > > > can be specified by CREATE SUBSCRIPTION. Whereas SKIP TRANSACTION
> > > > > stuff cannot be done. Are you concerned about adding a syntax to
> > > > > ALTER SUBSCRIPTION?
> > > > >
> > > >
> > > > Both for additional syntax and consistency with disable_on_error.
> > > > Isn't it just the current implementation that Alter only allows
> > > > changing parameters supported by Create? Is there a reason why we
> > > > can't allow Alter to set/change some parameters not supported by
> > > > Create?
> > >
> > > I think there is no reason for that, but looking at ALTER TABLE I
> > > thought there is such a policy.
> > >
> >
> > If we are looking for a precedent then I think we allow setting
> > configuration parameters via Alter Database but not via Create
> > Database. Does that address your concern?
>
> Thank you for the info! But it seems like CREATE DATABASE doesn't
> support SET in the first place. Also interestingly, ALTER SUBSCRIPTION
> supports both ENABLE/DISABLE and SET (enabled = on/off).
>

I think that is redundant, but I am not sure if there is any reason
behind doing so.

> I'm not sure
> from the point of view of consistency with other CREATE, ALTER
> commands, and disable_on_error, but it might be better to avoid adding
> additional syntax.
>

If we can avoid introducing new syntax, that in itself is a good reason
to introduce it as an option.

-- 
With Regards,
Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Fri, Jul 9, 2021 at 9:02 AM Alexey Lesovsky wrote:
>
> On Fri, Jul 9, 2021 at 5:43 AM Masahiko Sawada wrote:
> >
> > > I also would like to clarify: when a conflict is resolved, is the
> > > error record cleared or kept in the view? If it is cleared, the error
> > > counter is required (because we don't want to lose all history of
> > > errors). If it is kept, a flag telling that the error is resolved is
> > > needed (or set xid to NULL). I mean, when the user is watching the
> > > view, he should be able to identify whether the error has already
> > > been resolved or not.
> >
> > With the current patch, once the conflict is resolved by skipping the
> > transaction in question, its entry in the stats view is cleared. As
> > you suggested, if we have the total error counts in that view, it
> > would be good to keep the count and clear other fields such as xid,
> > last_failure, and command etc.
>
> Ok, looks nice. But I am curious how this will work in the case when
> there are two (or more) errors in the same subscription, but different
> relations?
>

We can't proceed unless the first error is resolved, so there shouldn't
be multiple unresolved errors. However, there is an exception to it,
which is during initial table sync, and I think the view should have
separate rows for each table sync.

-- 
With Regards,
Amit Kapila.
Re: [HACKERS] logical decoding of two-phase transactions
On Sun, Jul 11, 2021 at 8:20 PM Amit Kapila wrote:
>
> On Fri, Jul 9, 2021 at 4:43 AM Peter Smith wrote:
> >
> > > The patch looks good to me, I don't have any comments.
> >
> > I tried the v95-0001 patch.
> >
> > - The patch applied cleanly and all build / testing was OK.
> > - The documentation also builds OK.
> > - I checked all v95-0001 / v93-0001 differences and found no problems.
> > - Furthermore, I noted that the v95-0001 patch is passing the cfbot [1].
> >
> > So this patch LGTM.
> >
>
> Thanks, I took another pass over it and made a few changes in docs and
> comments. I am planning to push this next week sometime (by 14th July)
> unless there are more comments from you or someone else. Just to
> summarize, this patch will add support for prepared transactions to
> built-in logical replication. To add support for streaming
> transactions at prepare time into the built-in logical replication, we
> need to do the following things: (a) Modify the output plugin
> (pgoutput) to implement the new two-phase API callbacks, by leveraging
> the extended replication protocol. (b) Modify the replication apply
> worker, to properly handle two-phase transactions by replaying them on
> prepare. (c) Add a new SUBSCRIPTION option "two_phase" to allow users
> to enable two-phase transactions. We enable the two_phase once the
> initial data sync is over. Refer to comments atop worker.c in the
> patch and the commit message to see further details about this patch.
> After this patch, there is a follow-up patch to allow streaming and
> two-phase options together which I feel needs some more review and can
> be committed separately.
>

FYI - I repeated the same verification of the v96-0001 patch as I did
previously for v95-0001:

- The v96 patch applied cleanly and all build / testing was OK.
- The documentation also builds OK.
- I checked the v95-0001 / v96-0001 differences and found no problems.
- Furthermore, I noted that the v96-0001 patch is passing the cfbot.

LGTM.

-- 
Kind Regards,
Peter Smith.
Fujitsu Australia
RE: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
Dear Horiguchi-san,

Thank you for reviewing! I attached a new version. Sorry for the
delayed reply.

> Since we don't allow descriptors with the same name even if they are
> for different connections, I think we can set the current connection
> if any (which is set either by the AT option or a statement-bound one)
> to i->connection silently if i->connection is NULL in
> lookup_descriptor(). What do you think about this?

I tried to implement it. Is it correct?

> connection is "conn1" at the error time. The parser relies on
> output_statement and friends for connection name reset. So the rules
> that don't call the functions need to reset it by themselves.

Oh, I didn't notice that. Fixed. I'm wondering why an output function
is not implemented, like output_describe_statement(), but anyway I put
a connection reset in ecpg.addons.

> Similarly, the following sequence doesn't yield an error, which is
> expected.
>
> > EXEC SQL AT conn1 DECLARE stmt STATEMENT;
> > EXEC SQL AT conn2 EXECUTE stmt INTO ..;
>
> In this case "conn2" set by the AT option is silently overwritten with
> "conn1" by check_declared_list(). I think we should reject the AT
> option (with a different connection) in that case.

Actually, this comes from Oracle's specification. The Pro*C
precompiler overwrites the connection in this situation, hence I
followed that. But I agree this might be confusing, so I added a
warning report. What do you think? Is it still strange?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

v03-0001-fix-deallocate-describe.patch
Description: v03-0001-fix-deallocate-describe.patch

v03-0002-add-test.patch
Description: v03-0002-add-test.patch
Re: More time spending with "delete pending"
On Fri, Jul 09, 2021 at 09:00:00PM +0300, Alexander Lakhin wrote:
> Thank you! I agree with your improvement. I can't remember why I
> injected 'include "port.h"' in genfile.c.
> Today I've rechecked the whole chain of includes and I see that stat()
> is redefined as _pgstat64() in win32_port.h, which includes
> <sys/stat.h>. genfile.c includes "postgres.h" (which includes
> win32_port.h indirectly) and then includes <sys/stat.h> again, but the
> latter include should be ignored due to the "#pragma once" in stat.h.
> So I have no objection to the removal of that include.

Thanks, that matches my impression. There was one comment at the top of
_pgstat64() that was still incorrect. I have spent more time checking
and testing this stuff today, and it looks fine.

One question pending is whether we should increase the timeout used by
WIN32's stat() and open(), but the buildfarm should be able to tell us
more here.
--
Michael

signature.asc
Description: PGP signature
Re: Skipping logical replication transactions on subscriber side
On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila wrote:
> >
> > Ok, looks nice. But I am curious how this will work in the case when
> > there are two (or more) errors in the same subscription, but
> > different relations?
> >
>
> We can't proceed unless the first error is resolved, so there
> shouldn't be multiple unresolved errors.

Ok. I thought multiple errors were possible when many tables are
initialized using parallel workers (with
max_sync_workers_per_subscription > 1).

--
Regards, Alexey
Re: Skipping logical replication transactions on subscriber side
On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky wrote:
>
> On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila wrote:
>>
>> >
>> > Ok, looks nice. But I am curious how this will work in the case when
>> > there are two (or more) errors in the same subscription, but
>> > different relations?
>> >
>>
>> We can't proceed unless the first error is resolved, so there
>> shouldn't be multiple unresolved errors.
>
> Ok. I thought multiple errors were possible when many tables are
> initialized using parallel workers (with
> max_sync_workers_per_subscription > 1).
>

Yeah, that is possible, but it is covered under the second condition I
mentioned, and in such cases I think we should have separate rows for
each tablesync. Is that right, Sawada-san, or do you have something else
in mind?

--
With Regards,
Amit Kapila.
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Wed, 7 Jul 2021 at 20:37, David Rowley wrote:
>
> On Wed, 7 Jul 2021 at 00:06, Greg Nancarrow wrote:
> > I think if you're going to reject this patch, a brief comment should
> > be added to that code to justify why that existing superfluous check
> > is worthwhile.
>
> It seems strange to add a comment to explain why it's there. If we're
> going to the trouble of doing that, then we should just remove it and
> add a very small comment to mention why INT8 sequences don't need to
> be checked.

Any thoughts on this, Greg?

David
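PS: For the archives, the reason INT8 sequences don't need the check is
that a sequence's bounds are already stored as int64, so they can never
fall outside the int64 range. A toy standalone demonstration of why the
comparison is vacuous (this is illustrative only, not the sequence.c
code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* seqmax for an INT8 sequence is itself an int64 value */
        int64_t seqmax = INT64_MAX;

        /* The shape of the removed check: it can never be true, because
         * an int64 cannot lie outside its own representable range. */
        if (seqmax > INT64_MAX || seqmax < INT64_MIN)
            printf("unreachable\n");
        else
            printf("range check is vacuous for INT8 sequences\n");
        return 0;
    }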
Re: row filtering for logical replication
On Mon, Jul 12, 2021 at 7:19 AM Alvaro Herrera wrote:
>
> Hi
>
> Andres complained about the safety of doing general expression
> evaluation in pgoutput; that was first in
> https://postgr.es/m/20210128022032.eq2qqc6zxkqn5...@alap3.anarazel.de
> where he described a possible approach to handle it by restricting
> expressions to have limited shape; and later in
> http://postgr.es/m/20210331191710.kqbiwe73lur7j...@alap3.anarazel.de
>
> I was just scanning the patch trying to see if some sort of protection
> had been added for this, but I couldn't find anything. (Some functions
> are under-commented, though.) So, is it there already, and if so what
> is it?
>

I think the patch is trying to prohibit arbitrary expressions in the
WHERE clause via transformWhereClause(..EXPR_KIND_PUBLICATION_WHERE..).
You can notice that at various places the expressions are prohibited via
EXPR_KIND_PUBLICATION_WHERE. I am not sure that the checks are correct
and sufficient, but I think there is some attempt to do it. For example,
the below sort of ad-hoc check for func_call doesn't seem to be a good
idea:

@@ -119,6 +119,13 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 	/* Guard against stack overflow due to overly complex expressions */
 	check_stack_depth();

+	/* Functions are not allowed in publication WHERE clauses */
+	if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
+		nodeTag(expr) == T_FuncCall)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("functions are not allowed in publication WHERE expressions"),
+				 parser_errposition(pstate, exprLocation(expr))));

Now, the other idea I had in mind was to traverse the WHERE clause
expression in publication_add_relation and identify whether it contains
anything other than an ANDed list of 'foo.bar op constant' expressions.
OTOH, for index WHERE clause expressions or policy check expressions, we
use a technique similar to what we have in the patch to prohibit certain
kinds of expressions. Do you have any preference on how this should be
addressed?

--
With Regards,
Amit Kapila.
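PS: For anyone following along, the feature under discussion is the row
filter clause the patch adds to publications, which (as proposed) looks
roughly like this — a sketch, since the syntax may still change:

    -- only rows satisfying the WHERE expression are replicated
    CREATE PUBLICATION active_customers_pub
        FOR TABLE customers WHERE (active IS TRUE);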
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Mon, Jul 12, 2021 at 2:26 PM David Rowley wrote:
>
> > It seems strange to add a comment to explain why it's there. If we're
> > going to the trouble of doing that, then we should just remove it and
> > add a very small comment to mention why INT8 sequences don't need to
> > be checked.
>
> Any thoughts on this, Greg?

The patch LGTM (it's the same as my original patch, but with short
comments).

Regards,
Greg Nancarrow
Fujitsu Australia
Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
On Fri, Jul 9, 2021 at 7:30 PM Alvaro Herrera wrote:
>
> On 2021-Jul-09, Amul Sul wrote:
>
> > On Tue, Jul 6, 2021 at 11:06 PM Tom Lane wrote:
> >
> > > The point of the static-inline function idea was to be cheap enough
> > > that it isn't worth worrying about this sort of risky optimization.
> > > Given that an smgr function is sure to involve some kernel calls,
> > > I doubt it's worth sweating over an extra test-and-branch beforehand.
> > > So where I was hoping to get to is that smgr objects are *only*
> > > referenced by RelationGetSmgr() calls and nobody ever keeps any
> > > other pointers to them across any non-smgr operations.
> >
> > Herewith, the attached version does the same, thanks.
>
> I think it would be valuable to have a comment in that function to
> point out what the function is there for.

Thanks for the suggestion; I added one in the attached version.

Regards,
Amul

Attachment: v5_Add-RelationGetSmgr-inline-function.patch
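PS: For reference, the inline function under discussion has roughly this
shape (a sketch of the idea, not necessarily the verbatim v5 code; it
assumes the usual rel.h/smgr.h declarations):

    /*
     * RelationGetSmgr
     *		Returns smgr file handle for a relation, opening it if needed.
     *
     * Per Tom's point above, callers should never cache the result across
     * anything that could flush the relcache; always re-fetch it here.
     */
    static inline SMgrRelation
    RelationGetSmgr(Relation rel)
    {
        if (unlikely(rel->rd_smgr == NULL))
            smgrsetowner(&(rel->rd_smgr),
                         smgropen(rel->rd_node, rel->rd_backend));
        return rel->rd_smgr;
    }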
Re: More time spending with "delete pending"
On Mon, Jul 12, 2021 at 01:07:17PM +0900, Michael Paquier wrote:
> Thanks, that matches my impression. There was one comment at the top
> of _pgstat64() that was still incorrect. I have spent more time
> checking and testing this stuff today, and it looks fine. One question
> pending is whether we should increase the timeout used by WIN32's
> stat() and open(), but the buildfarm should be able to tell us more
> here.

And fairywren, which uses MinGW, is unhappy:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-07-12%2004%3A47%3A13

Looking at it now.
--
Michael
Re: Teach pg_receivewal to use lz4 compression
On Thu, Jul 8, 2021 at 7:48 PM wrote:
>
> Dilip Kumar wrote:
>
> > Wouldn't it be better to call it compression method instead of
> > compression program?
>
> Agreed. This is in line with the suggestions of other reviewers.
> Find the change in the attached patch.

Thanks, I will have a look.

> > I think we can somehow use the "acceleration" parameter of lz4
> > compression to map onto the compression level. It is not a direct
> > mapping, but can't we create some internal mapping instead of
> > completely ignoring this option for lz4, or we can provide another
> > option for lz4?
>
> We can, though I am not in favour of doing so. There is seemingly
> little benefit for the added complexity.

I am really not sure what complexity you are talking about. Do you mean
that since with pglz we were already providing the compression level we
should leave that as it is, but that with the new compression method you
don't see much benefit in providing a compression level or speed?

> > Should we also support LZ4 compression using a dictionary?
>
> I would say we should not do that. If my understanding is correct,
> decompression would require the dictionary to be passed along.
> The algorithm seems to be very competitive with the current
> compression as is.

I agree, we might not go for a dictionary because we would need the
dictionary to decompress as well, so that would add extra complexity for
the user.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
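PS: To illustrate the kind of internal mapping I had in mind — purely a
sketch, with an invented map_level_to_acceleration() helper, not
something from the patch:

    #include <lz4.h>

    /*
     * Hypothetical mapping from a user-visible compression level (1..9,
     * as with zlib) onto lz4's "acceleration" knob, where 1 gives the
     * best ratio and larger values trade ratio for speed.
     */
    static int
    map_level_to_acceleration(int level)
    {
        /* level 9 -> acceleration 1 (best ratio); level 1 -> 9 (fastest) */
        return 10 - level;
    }

    static int
    compress_block(const char *src, int srclen,
                   char *dst, int dstcap, int level)
    {
        return LZ4_compress_fast(src, dst, srclen, dstcap,
                                 map_level_to_acceleration(level));
    }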
Re: Skipping logical replication transactions on subscriber side
On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila wrote:
>
> On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky wrote:
> >
> > On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila wrote:
> >>
> >> >
> >> > Ok, looks nice. But I am curious how this will work in the case
> >> > when there are two (or more) errors in the same subscription, but
> >> > different relations?
> >> >
> >>
> >> We can't proceed unless the first error is resolved, so there
> >> shouldn't be multiple unresolved errors.
> >
> > Ok. I thought multiple errors were possible when many tables are
> > initialized using parallel workers (with
> > max_sync_workers_per_subscription > 1).
> >
>
> Yeah, that is possible, but it is covered under the second condition I
> mentioned, and in such cases I think we should have separate rows for
> each tablesync. Is that right, Sawada-san, or do you have something
> else in mind?

Yeah, I agree with having separate rows for each table sync. The table
should not be processed by both the table sync worker and the apply
worker at the same time, so the pair of subscription OID and relation
OID will be unique. I think we could have a boolean column in the view,
indicating whether the error entry was reported by the table sync worker
or the apply worker, or maybe the action column could show "TABLE SYNC"
if the error was reported by the table sync worker. When it comes to
removing the subscription errors in pgstat_vacuum_stat(), I think we
need to do a seq scan on the hash table and send messages to purge the
subscription error entries.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
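PS: Concretely, I imagine querying the view would look something like
this — a sketch only, with hypothetical view and column names, since
nothing is settled yet:

    -- hypothetical shape of the proposed error view (all names are
    -- placeholders)
    SELECT subid,          -- subscription OID
           relid,          -- relation OID (unique together with subid)
           by_table_sync,  -- boolean: table sync worker vs apply worker
           action,         -- e.g. INSERT / UPDATE, or "TABLE SYNC"
           error_message
    FROM pg_stat_subscription_errors;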
Re: More time spending with "delete pending"
On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:
> And fairywren, which uses MinGW, is unhappy:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-07-12%2004%3A47%3A13
> Looking at it now.

I am not sure what to do here to cool down MinGW, so the patch has been
reverted for now:
- Plain stat() is not enough to do a proper detection of the files
pending for deletion on MSVC, so reverting only the part with
microsoft_native_stat() is not going to help.
- We are going to need an evaluation of the stat() implementation in
MinGW and check whether it is possible to rely on it more. If yes, we
could get away with more tweaks based on __MINGW32__/__MINGW64__.
--
Michael
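PS: By "tweaks based on __MINGW32__/__MINGW64__" I mean something of
this shape — a sketch only, with an invented USE_MICROSOFT_NATIVE_STAT
macro, and not tested on MinGW:

    /* sketch: only use the MSVC-specific delete-pending detection when
     * building with MSVC; fall back to plain stat() under MinGW */
    #if defined(WIN32) && !defined(__MINGW32__) && !defined(__MINGW64__)
    #define USE_MICROSOFT_NATIVE_STAT 1
    #endif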
Re: Teach pg_receivewal to use lz4 compression
On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> On Thu, Jul 8, 2021 at 7:48 PM wrote:
>> We can, though I am not in favour of doing so. There is seemingly
>> little benefit for the added complexity.
>
> I am really not sure what complexity you are talking about. Do you mean
> that since with pglz we were already providing the compression level we
> should leave that as it is, but that with the new compression method
> you don't see much benefit in providing a compression level or speed?

You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has in
mind, but my opinion stands on the latter: there is little benefit in
making lz4 faster than the default while reducing compression, as the
default is already a rather low CPU user.
--
Michael
Re: More time spending with "delete pending"
Hello Michael,

12.07.2021 08:52, Michael Paquier wrote:
> On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:
>> And fairywren, which uses MinGW, is unhappy:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-07-12%2004%3A47%3A13
>> Looking at it now.
> I am not sure what to do here to cool down MinGW, so the patch has
> been reverted for now:
> - Plain stat() is not enough to do a proper detection of the files
> pending for deletion on MSVC, so reverting only the part with
> microsoft_native_stat() is not going to help.
> - We are going to need an evaluation of the stat() implementation in
> MinGW and check whether it is possible to rely on it more. If yes, we
> could get away with more tweaks based on __MINGW32__/__MINGW64__.

I'll take care of it and will prepare a MinGW-compatible patch tomorrow.

Best regards,
Alexander
Re: More time spending with "delete pending"
On Mon, Jul 12, 2021 at 09:06:01AM +0300, Alexander Lakhin wrote:
> I'll take care of it and will prepare a MinGW-compatible patch
> tomorrow.

Thanks. Do you have an environment with 32-bit or 64-bit MinGW?
--
Michael
Re: More time spending with "delete pending"
12.07.2021 09:23, Michael Paquier wrote:
> On Mon, Jul 12, 2021 at 09:06:01AM +0300, Alexander Lakhin wrote:
>> I'll take care of it and will prepare a MinGW-compatible patch
>> tomorrow.
> Thanks. Do you have an environment with 32-bit or 64-bit MinGW?

Yes, I have VMs with 32- and 64-bit MinGW environments.

Best regards,
Alexander
Re: Introduce pg_receivewal gzip compression tests
On Fri, Jul 09, 2021 at 11:26:58AM +0000, Georgios wrote:
> As suggested on a different thread [1], pg_receivewal can increase its
> test coverage. There exists a non-trivial amount of code that handles
> gzip compression. The current patch introduces tests that cover
> creation of gzip-compressed WAL files and the handling of gzip partial
> segments. Finally, the integrity of the compressed files is verified.

+   # Verify compressed file's integrity
+   my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
+   is($gzip_is_valid, 0, "program gzip verified file's integrity");

libz and gzip are usually split across different packages, hence there
is no guarantee that this command is always available (same comment as
for LZ4 from a couple of days ago).

+   [
+       'pg_receivewal', '-D', $stream_dir, '--verbose',
+       '--endpos', $nextlsn, '-Z', '5'
+   ],

I would keep the compression level to a minimum here, to limit CPU usage
but still compress something faster.

+   # Verify compressed file's integrity
+   my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
+   is($gzip_is_valid, 0, "program gzip verified file's integrity");

Shouldn't this be coded as a loop going through @gzip_wals?
--
Michael
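PS: Something along these lines is what I had in mind for the loop — an
untested sketch:

    # Verify each compressed file's integrity, not just the first one.
    foreach my $gzip_wal (@gzip_wals)
    {
        is(system_log('gzip', '--test', $gzip_wal),
            0, "program gzip verified integrity of $gzip_wal");
    }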
Re: Speed up transaction completion faster after many relations are accessed in a transaction
Thanks for having a look at this.

On Mon, 21 Jun 2021 at 05:02, Zhihong Yu wrote:
> + * GH_ELEMENT_TYPE defines the data type that the hashtable stores. Each
> + * instance of GH_ELEMENT_TYPE which is stored in the hash table is done so
> + * inside a GH_SEGMENT.
>
> I think the second sentence can be written as (since done means stored,
> it is redundant):
>
> Each instance of GH_ELEMENT_TYPE is stored in the hash table inside a
> GH_SEGMENT.

I've reworded this entire paragraph slightly.

> + * Macros for translating a bucket's index into the segment and another to
> + * determine the item number within the segment.
> + */
> +#define GH_INDEX_SEGMENT(i)	(i) / GH_ITEMS_PER_SEGMENT
>
> into the segment -> into the segment number (in the code I see segidx,
> but I wonder if "segment index" may cause slight confusion).

I've adjusted this comment.

> + GH_BITMAP_WORD used_items[GH_BITMAP_WORDS]; /* A 1-bit for each used item
> +                                              * in the items array */
>
> 'A 1-bit' -> One bit (A and 1 mean the same)

I think you might have misread this. We're storing a 1-bit for each used
item rather than a 0-bit. If I remove the 'A' then it's not clear what
the meaning of each bit's value is.

> + uint32 first_free_segment;
>
> Since the segment may not be totally free, maybe name the field
> first_segment_with_free_slot

I don't really like that. It feels too long to me.

> + * This is similar to GH_NEXT_ONEBIT but flips the bits before operating on
> + * each GH_BITMAP_WORD.
>
> It seems the only difference from GH_NEXT_ONEBIT is in this line:
>
> + GH_BITMAP_WORD word = ~(words[wordnum] & mask); /* flip bits */
>
> If a 4th parameter is added to signify whether the flipping should be
> done, these two functions can be unified.

I don't want to do that. I'd rather have them separate to ensure the
compiler does not create any additional needless branching. Those
functions are pretty hot.

> + * next insert will store in this segment. If it's already pointing to an
> + * earlier segment, then leave it be.
>
> The last sentence is confusing: first_free_segment cannot point to some
> segment and an earlier segment at the same time.
> Maybe drop the last sentence.

I've adjusted this comment to become:

 * Check if we need to lower the first_free_segment. We want new inserts
 * to fill the lower segments first, so only change the first_free_segment
 * if the removed entry was stored in an earlier segment.

I'll attach an updated patch soon.

David
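PS: For readers without the patch handy, the bitmap search being
discussed has roughly this shape — a simplified standalone sketch, not
the patch's code (the zero-bit variant is identical except the word is
complemented first, which is why keeping them as two functions avoids a
branch in hot code):

    #include <stdint.h>

    #define BITMAP_WORD_BITS 64

    /*
     * Sketch of a GH_NEXT_ONEBIT-style search: return the index of the
     * next 1-bit at or after 'start', or -1 if there is none.
     */
    static int
    next_onebit(const uint64_t *words, int nwords, int start)
    {
        for (int i = start; i < nwords * BITMAP_WORD_BITS;)
        {
            int      wordnum = i / BITMAP_WORD_BITS;
            /* mask away bits below 'i' within the current word */
            uint64_t mask = ~UINT64_C(0) << (i % BITMAP_WORD_BITS);
            uint64_t word = words[wordnum] & mask;

            if (word != 0)
                return wordnum * BITMAP_WORD_BITS + __builtin_ctzll(word);
            i = (wordnum + 1) * BITMAP_WORD_BITS;   /* next word */
        }
        return -1;
    }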