Re: Summary Sort workers Stats in EXPLAIN ANALYZE

2022-03-21 Thread Jian Guo
There was a problem with the last patch; I have removed the
`ExplainOpenWorker` call to fix it.

I have also added a test case in explain.sql to cover the code change.

From: Jian Guo 
Sent: Monday, March 21, 2022 11:36
To: pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Summary Sort workers Stats in EXPLAIN ANALYZE


In the current EXPLAIN ANALYZE implementation, the Sort node stats from each
worker are not summarized:
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2762

When the number of workers is large, it prints out a huge amount of node
detail in the plan. I have created this patch to summarize the tuplesort
stats as AverageSpaceUsed / PeakSpaceUsed, making it behave just like
`show_incremental_sort_group_info()`:
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2890
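
With this change, the TEXT format prints a single summarized line for all
workers instead of one "Sort Method" line per worker, something like the
following (values are illustrative):

    Sort Method: external merge  Average Disk: 4510kB  Peak Disk: 4720kB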

From 5b044523ee16bdae7b998ddd31fca92434e5028a Mon Sep 17 00:00:00 2001
From: Aegeaner Guo 
Date: Mon, 21 Mar 2022 11:19:46 +0800
Subject: [PATCH] Summary Sort workers Stats.

Signed-off-by: Jian Guo 
---
 src/backend/commands/explain.c        |  44 +--
 src/test/regress/expected/explain.out | 523 ++
 src/test/regress/sql/explain.sql      |  13 +-
 3 files changed, 392 insertions(+), 188 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..57409bdea2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2758,40 +2758,42 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
+		const char *sortMethod;
+		const char *spaceType;
+		int64		peakSpaceUsed = 0;
+		int64		totalSpaceUsed = 0;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
 			TuplesortInstrumentation *sinstrument;
-			const char *sortMethod;
-			const char *spaceType;
-			int64		spaceUsed;
 
 			sinstrument = &sortstate->shared_info->sinstrument[n];
 			if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
 				continue;		/* ignore any unfilled slots */
 			sortMethod = tuplesort_method_name(sinstrument->sortMethod);
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
-			spaceUsed = sinstrument->spaceUsed;
+			peakSpaceUsed = Max(peakSpaceUsed, sinstrument->spaceUsed);
+			totalSpaceUsed += sinstrument->spaceUsed;
+		}
 
-			if (es->workers_state)
-				ExplainOpenWorker(n, es);
+		int64		avgSpaceUsed = sortstate->shared_info->num_workers > 0 ?
+			totalSpaceUsed / sortstate->shared_info->num_workers : 0;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				ExplainIndentText(es);
-				appendStringInfo(es->str,
-								 "Sort Method: %s  %s: " INT64_FORMAT "kB\n",
-								 sortMethod, spaceType, spaceUsed);
-			}
-			else
-			{
-				ExplainPropertyText("Sort Method", sortMethod, es);
-				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
-				ExplainPropertyText("Sort Space Type", spaceType, es);
-			}
+		ExplainPropertyInteger("Workers planned", NULL, sortstate->shared_info->num_workers, es);
 
-			if (es->workers_state)
-				ExplainCloseWorker(n, es);
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str,
+							 "Sort Method: %s  Average %s: " INT64_FORMAT "kB  Peak %s: " INT64_FORMAT "kB\n",
+							 sortMethod, spaceType, avgSpaceUsed, spaceType, peakSpaceUsed);
+		}
+		else
+		{
+			ExplainPropertyText("Sort Method", sortMethod, es);
+			ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpaceUsed, es);
+			ExplainPropertyInteger("Peak Sort Space Used", "kB", peakSpaceUsed, es);
+			ExplainPropertyText("Sort Space Type", spaceType, es);
 		}
 	}
 }
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index bc36175921..25a5744a30 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -

Re: Patch proposal - parameter to limit amount of FPW because of hint bits per second

2022-03-21 Thread Michail Nikolaev
Hello, Peter.

> * Instead of avoiding the FPI when this happens, proactively call
> _bt_simpledel_pass() just before _bt_killitems() returns. Accept the
> immediate cost of setting an LP_DEAD bit, just like today, but avoid
> repeated FPIs.

Hm, not sure here.
AFAIK the current implementation does not produce repeated FPIs. The page is
marked as dirty on the first bit, so other LP_DEAD bits (if not set by a
single scan) do not generate an FPI until the next checkpoint.
Also, the issue affects GiST and HASH indexes and HEAP pages.

Best regards,
Michail.




Re: Support logical replication of DDLs

2022-03-21 Thread Dilip Kumar
On Thu, Mar 17, 2022 at 2:47 AM Zheng Li  wrote:
>
> Hi,
>
> >If you don't mind, would you like to share the POC or the branch for this 
> >work?
>
> The POC patch is attached. It currently supports the following 
> functionalities:

Thanks for sharing, I will look into it.

> >In such cases why don't we just log the table creation WAL for DDL
> >instead of a complete statement which creates the table and inserts
> >the tuple? Because we are already WAL logging individual inserts and
> >once you make sure of replicating the table creation I think the exact
> >data insertion on the subscriber side will be taken care of by the
> >insert WALs no?
>
> The table creation WAL and table insert WAL are available. The tricky
> part is how we break down this command into two parts (a normal
> CREATE TABLE followed by insertions), either from the parse tree or
> the WALs. I'll have to dig more into this.

I agree that this is a bit tricky; anyway, I will also put more thought into this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: WIP: WAL prefetch (another approach)

2022-03-21 Thread Julien Rouhaud
Hi,

On Sun, Mar 20, 2022 at 05:36:38PM +1300, Thomas Munro wrote:
> On Fri, Mar 18, 2022 at 9:59 AM Thomas Munro  wrote:
> > I'll push 0001 today to let the build farm chew on it for a few days
> > before moving to 0002.
> 
> Clearly 018_wal_optimize.pl is flapping and causing recoveryCheck to
> fail occasionally, but that predates the above commit.  I didn't
> follow the existing discussion on that, so I'll try to look into that
> tomorrow.
> 
> Here's a rebase of the 0002 patch, now called 0001

So I finally finished looking at this patch.  Here again, AFAICS the feature is
working as expected and I didn't find any problem.  I just have some minor
comments, like for the previous patch.

For the docs:

+Whether to try to prefetch blocks that are referenced in the WAL that
+are not yet in the buffer pool, during recovery.  Valid values are
+off (the default), on and
+try.  The setting try enables
+prefetching only if the operating system provides the
+posix_fadvise function, which is currently used
+to implement prefetching.  Note that some operating systems provide the
+function, but don't actually perform any prefetching.

Is there any reason not to change the default to try?  I'm wondering if some
systems say that the function exists but simply raise an error if you
actually try to use it.  I think that at least WSL does that for some
functions.
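
For what it's worth, the kind of runtime probe I have in mind would be
something like this (a sketch only, not actual patch code; the function name
and usage are made up):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdbool.h>

    /*
     * Hypothetical check: posix_fadvise() may be declared but return ENOSYS
     * at runtime on some platforms, in which case "try" could downgrade to
     * "off" instead of failing on every WAL block.
     */
    static bool
    prefetch_usable(int fd)
    {
        /* posix_fadvise returns an errno value directly; 0 means success */
        return posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) != ENOSYS;
    }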

+  
+   The  parameter can
+   be used to improve I/O performance during recovery by instructing
+   PostgreSQL to initiate reads
+   of disk blocks that will soon be needed but are not currently in
+   PostgreSQL's buffer pool.
+   The  and
+settings limit prefetching
+   concurrency and distance, respectively.
+   By default, prefetching in recovery is disabled.
+  

I think that "improving I/O performance" is a bit misleading, maybe reduce I/O
wait time or something like that?  Also, I don't know if we need to be that
precise, but maybe we should say that it's the underlying kernel that will
(asynchronously) initiate the reads, and postgres will simply notifies it.


+  
+   The pg_stat_prefetch_recovery view will contain only
+   one row.  It is filled with nulls if recovery is not running or WAL
+   prefetching is not enabled.  See 
+   for more information.
+  

That's not the implemented behavior as far as I can see.  It just prints 
whatever is in SharedStats
regardless of the recovery state or the prefetch_wal setting (assuming that
there's no pending reset request).  Similarly, there's a mention that
pg_stat_reset_shared('wal') will reset the stats, but I don't see anything
calling XLogPrefetchRequestResetStats().

Finally, I think we should document which counters in that view are
cumulative (and should get reset) and which are dynamic (and shouldn't get
reset).

For the code:

 bool
 XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
   RelFileNode *rnode, ForkNumber *forknum, BlockNumber *blknum)
+{
+   return XLogRecGetBlockInfo(record, block_id, rnode, forknum, blknum, NULL);
+}
+
+bool
+XLogRecGetBlockInfo(XLogReaderState *record, uint8 block_id,
+   RelFileNode *rnode, ForkNumber *forknum,
+   BlockNumber *blknum,
+   Buffer *prefetch_buffer)
 {

It's missing comments on that function.  XLogRecGetBlockTag comments should
probably be reworded at the same time.
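
Something along these lines, perhaps (just a rough draft, to be adjusted to
whatever the function actually guarantees):

    /*
     * XLogRecGetBlockInfo
     *		Like XLogRecGetBlockTag(), but also returns the buffer that the
     *		prefetcher may already have pinned for this block reference
     *		(InvalidBuffer if none), via *prefetch_buffer.
     */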

+ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
   bool fetching_ckpt, TimeLineID replayTLI)
 {
XLogRecord *record;
+   XLogReaderState *xlogreader = XLogPrefetcherReader(xlogprefetcher);

nit: maybe name it XLogPrefetcherGetReader()?

  * containing it (if not open already), and returns true. When end of standby
  * mode is triggered by the user, and there is no more WAL available, returns
  * false.
+ *
+ * If nonblocking is true, then give up immediately if we can't satisfy the
+ * request, returning XLREAD_WOULDBLOCK instead of waiting.
  */
-static bool
+static XLogPageReadResult
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,

The comment still mentions a couple of times returning true/false rather than
XLREAD_*, same for at least XLogPageRead().

@@ -3350,6 +3392,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
randAccess,
 */
if (lastSourceFailed)
{
+   /*
+* Don't allow any retry loops to occur during nonblocking
+* readahead.  Let the caller process everything that has been
+* decoded already first.
+*/
+   if (nonblocking)
+   return XLREAD_WOULDBLOCK;

Is that really enough?  I'm wondering if the code path in ReadRecord() that
forces lastSourceFailed to false even though it actually failed when switching
into archive recovery (xlogrecovery.c around line 3044) can be problematic here.


{"wal_decode_buffer_size", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
gettext_noo

[PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Aleksander Alekseev
Hi hackers,

I learned from Tom [1] that we can simplify the code like:

```
char buf[32];
snprintf(buf, sizeof(buf), INT64_FORMAT, ...);
ereport(WARNING, (errmsg("%s ...", buf)));
```

... and rely on %lld/%llu now as long as we explicitly cast the
argument to long long int / unsigned long long. This was previously
addressed in 6a1cd8b9 and d914eb34, but I see more places where we
still use an old approach.
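
For example, the snippet above can then become just this (a sketch, with a
made-up variable name):

```
ereport(WARNING, (errmsg("%lld ...", (long long) value)));
```

and similarly %llu with an (unsigned long long) cast for uint64 values.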

Suggested patch fixes this. Tested locally - no warnings; passes all the tests.

[1] https://www.postgresql.org/message-id/771048.1647528068%40sss.pgh.pa.us

-- 
Best regards,
Aleksander Alekseev


v1-0001-Remove-workarounds-to-format-u-int64-s.patch
Description: Binary data


Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Pavel Borisov
On Mon, 21 Mar 2022 at 12:52, Aleksander Alekseev  wrote:

> Hi hackers,
>
> I learned from Tom [1] that we can simplify the code like:
>
> ```
> char buf[32];
> snprintf(buf, sizeof(buf), INT64_FORMAT, ...);
> ereport(WARNING, (errmsg("%s ...", buf)));
> ```
>
> ... and rely on %lld/%llu now as long as we explicitly cast the
> argument to long long int / unsigned long long. This was previously
> addressed in 6a1cd8b9 and d914eb34, but I see more places where we
> still use an old approach.
>
> Suggested patch fixes this. Tested locally - no warnings; passes all the
> tests.
>
> [1]
> https://www.postgresql.org/message-id/771048.1647528068%40sss.pgh.pa.us
>

Hi, Aleksander!
Probably you can do (long long) instead of (long long int). It is shorter
and this is used elsewhere in the code.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Aleksander Alekseev
Hi Pavel,

> Probably you can do (long long) instead of (long long int). It is shorter and 
> this is used elsewhere in the code.

Thanks! Here is the updated patch. I also added Reviewed-by: and
Discussion: to the commit message.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Remove-workarounds-to-format-u-int64-s.patch
Description: Binary data


Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Pavel Borisov
>
> > Probably you can do (long long) instead of (long long int). It is
> shorter and this is used elsewhere in the code.
>
> Thanks! Here is the updated patch. I also added Reviewed-by: and
> Discussion: to the commit message.
>
Thanks, Aleksander!
I suggest the patch is in good shape to be committed.
(Maybe some strings that don't fit the screen could be reflowed:
 (long long int)seqdataform->last_value, (long long int)seqform->seqmax)));
)

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Problem with moderation of messages with patched attached.

2022-03-21 Thread Dave Page
On Sun, 20 Mar 2022 at 13:52, Andrew Dunstan  wrote:

>
> On 3/19/22 14:48, Andres Freund wrote:
> > Hi,
> >
> > On 2022-03-03 13:37:35 +, Dave Page wrote:
> >> On Thu, 3 Mar 2022 at 13:28, Pavel Borisov 
> wrote:
> >>
> >>>> The mail system doesn't have the capability to apply different
> >>>> moderation rules for people in that way I'm afraid.
> 
> >>> Maybe then 2MB for everyone? Otherwise it's not so convenient. Lead to
> >>> answers before the questions in the thread [1], seems weird.
> >>>
> >> Then someone will complain if their patch is 2.1MB! How often are
> >> messages legitimately over 1MB anyway, even with a patch? I don't
> >> usually moderate -hackers, so I don't know if this is a common thing
> >> or not.
> > I don't think it's actually that rare. But most contributors writing
> > that large patchsets know about the limit and work around it - I gzip
> > patches when I see the email getting too large. But it's more annoying
> > to work with for reviewers.
> >
> > It's somewhat annoying. If you e.g. append a few graphs of performance
> > changes and a patch it's pretty easy to get into the range where
> > compressing won't help anymore.
> >
> > And sure, any limit may be hit by somebody. But 1MB across the whole
> > email seems pretty low these days.
> >
>
> Of course we could get complaints no matter what level we set the limit
> at. I think raising it to 2Mb would be a reasonable experiment. If no
> observable evil ensues then leave it that way. If it does then roll it
> back. I agree that plain uncompressed patches are easier to deal with in
> general.
>

Thanks for the reminder :-)

I've bumped the limit to 2MB.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Re: [PATCH] Add reloption for views to enable RLS

2022-03-21 Thread Laurenz Albe
On Sat, 2022-03-19 at 01:10 +, Dean Rasheed wrote:
> I have been hacking on it a bit, and attached is an updated version.
> Aside from some general copy editing, the most notable changes are:
> [...]

Thanks for your diligent work on this, and the patch looks good to me.
It is good that you found the oversight in LOCK - I wasn't even
aware that views could be locked.

Yours,
Laurenz Albe





Re: logical replication empty transactions

2022-03-21 Thread Amit Kapila
On Sat, Mar 19, 2022 at 9:10 AM Ajin Cherian  wrote:
>
> On Thu, Mar 17, 2022 at 10:43 PM Amit Kapila  wrote:
>
> > 3. Can we add a simple test for it in one of the existing test
> > files(say in 001_rep_changes.pl)?
>
> added a simple test.
>

This doesn't verify that the transaction is skipped. I think we should
extend this test to check for a DEBUG message in the logs (you probably
need to set log_min_messages to DEBUG1 for this test). As an example,
you can check the patch [1]. Also, it seems by mistake you have added
wait_for_catchup() twice.

Few other comments:
=
1. Let's keep the parameter name as skipped_empty_xact in
OutputPluginUpdateProgress so as not to confuse it with the other
patch's [2] keep_alive parameter. I think in this case we must send the
keep_alive message so as not to make syncrep wait, whereas in the other
patch we only need to send it periodically based on the
wal_sender_timeout parameter.
2. The new function SyncRepEnabled() seems confusing to me as the
comments in SyncRepWaitForLSN() clearly state why we need to first
read the parameter 'sync_standbys_defined' without any lock then read
it again with a lock if the parameter is true. So, I just put that
check back and also added a similar check in WalSndUpdateProgress.
3.
@@ -1392,11 +1481,21 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
  continue;

  relids[nrelids++] = relid;
+
+ /* Send BEGIN if we haven't yet */
+ if (txndata && !txndata->sent_begin_txn)
+ pgoutput_send_begin(ctx, txn);
  maybe_send_schema(ctx, change, relation, relentry);
  }

  if (nrelids > 0)
  {
+ txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ /* Send BEGIN if we haven't yet */
+ if (txndata && !txndata->sent_begin_txn)
+ pgoutput_send_begin(ctx, txn);
+

Why do we need to try sending the begin in the second check? I think
it should be sufficient to do it in the above loop.

I have made these and a number of other changes in the attached patch.
Do let me know what you think of the attached.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JbLRj6pSUENfDFsqj0%2BadNob_%3DRPXpnUnWFBskVi5JhA%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1LGnaPuWs2M4sDfpd6JQZjoh4DGAsgUvNW%3DOr8i9z6K8w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.


v27-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Japin Li

On Mon, 21 Mar 2022 at 17:23, Aleksander Alekseev  
wrote:
> Hi Pavel,
>
>> Probably you can do (long long) instead of (long long int). It is shorter 
>> and this is used elsewhere in the code.
>
> Thanks! Here is the updated patch. I also added Reviewed-by: and
> Discussion: to the commit message.

Hi,

After applying the patch, I found that pg_checksums.c also has similar code.

In progress_report(), I'm not sure we can do this replacement for this code:

snprintf(total_size_str, sizeof(total_size_str), INT64_FORMAT,
 total_size / (1024 * 1024));
snprintf(current_size_str, sizeof(current_size_str), INT64_FORMAT,
 current_size / (1024 * 1024));

fprintf(stderr, _("%*s/%s MB (%d%%) computed"),
(int) strlen(current_size_str), current_size_str, total_size_str,
percent);
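
(If we did convert it, presumably the string buffers would stay, since they
are needed for the width computation, and only the format would change; a
minimal sketch, following the explicit-cast rule from upthread:)

    snprintf(total_size_str, sizeof(total_size_str), "%lld",
             (long long) (total_size / (1024 * 1024)));
    snprintf(current_size_str, sizeof(current_size_str), "%lld",
             (long long) (current_size / (1024 * 1024)));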

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 7e69475947..e45e436683 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -657,11 +657,11 @@ main(int argc, char *argv[])
 			progress_report(true);
 
 		printf(_("Checksum operation completed\n"));
-		printf(_("Files scanned:   %s\n"), psprintf(INT64_FORMAT, files_scanned));
-		printf(_("Blocks scanned:  %s\n"), psprintf(INT64_FORMAT, blocks_scanned));
+		printf(_("Files scanned:   %lld\n"), files_scanned);
+		printf(_("Blocks scanned:  %lld\n"), blocks_scanned);
 		if (mode == PG_MODE_CHECK)
 		{
-			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Bad checksums:  %lld\n"), badblocks);
 			printf(_("Data checksum version: %u\n"), ControlFile->data_checksum_version);
 
 			if (badblocks > 0)
@@ -669,8 +669,8 @@ main(int argc, char *argv[])
 		}
 		else if (mode == PG_MODE_ENABLE)
 		{
-			printf(_("Files written:  %s\n"), psprintf(INT64_FORMAT, files_written));
-			printf(_("Blocks written: %s\n"), psprintf(INT64_FORMAT, blocks_written));
+			printf(_("Files written:  %lld\n"), files_written);
+			printf(_("Blocks written: %lld\n"), blocks_written);
 		}
 	}
 


Re: Detaching a partition with a FK on itself is not possible

2022-03-21 Thread Jehan-Guillaume de Rorthais
Hi,

On Thu, 17 Mar 2022 17:58:04 +
Arne Roland  wrote:

> I don't think this a bug, but a feature request. I therefore think hackers
> would be more appropriate.

+1

I changed the list destination

> I don't see how an additional syntax to modify the constraint should help.

Me neither.

> If I'd want to fix this, I'd try to teach the detach partition code about
> self referencing foreign keys. It seems to me like that would be the cleanest
> solution, because the user doesn't need to care about this at all.

Teaching the detach partition code about self-referencing FKs means either:

* it's safe to remove the FK, or
* we can rewrite the FK to be self-referencing.

Neither solution is ideal from the original schema and user perspective.

Another solution could be to teach partition creation to detect a
self-referencing FK and actually create a self-referencing FK, not one
pointing to the partitioned table, of course issuing a NOTICE to the client.





Re: Column Filtering in Logical Replication

2022-03-21 Thread Amit Kapila
On Sun, Mar 20, 2022 at 4:53 PM Tomas Vondra
 wrote:
>
> On 3/20/22 07:23, Amit Kapila wrote:
> > On Sun, Mar 20, 2022 at 8:41 AM Amit Kapila  wrote:
> >>
> >> On Fri, Mar 18, 2022 at 10:42 PM Tomas Vondra
> >>  wrote:
> >>
> >>> So the question is why those two sync workers never complete - I guess
> >>> there's some sort of lock wait (deadlock?) or infinite loop.
> >>>
> >>
> >> It would be a bit tricky to reproduce this even if the above theory is
> >> correct but I'll try it today or tomorrow.
> >>
> >
> > I am able to reproduce it with the help of a debugger. Firstly, I have
> > added the LOG message and some While (true) loops to debug sync and
> > apply workers. Test setup
> >
> > Node-1:
> > create table t1(c1 int);
> > create table t2(c1 int);
> > insert into t1 values(1);
> > create publication pub1 for table t1;
> > create publication pub2;
> >
> > Node-2:
> > change max_sync_workers_per_subscription to 1 in postgresql.conf
> > create table t1(c1 int);
> > create table t2(c1 int);
> > create subscription sub1 connection 'dbname = postgres' publication pub1;
> >
> > Till this point, just let the debuggers in both workers continue.
> >
> > Node-1:
> > alter publication pub1 add table t2;
> > insert into t1 values(2);
> >
> > Here, we have to debug the apply worker such that when it tries to
> > apply the insert, stop the debugger in function apply_handle_insert()
> > after doing begin_replication_step().
> >
> > Node-2:
> > alter subscription sub1 set pub1, pub2;
> >
> > Now, continue the debugger of apply worker, it should first start the
> > sync worker and then exit because of parameter change. All of these
> > debugging steps are to just ensure the point that it should first
> > start the sync worker and then exit. After this point, table sync
> > worker never finishes and log is filled with messages: "reached
> > max_sync_workers_per_subscription limit" (a newly added message by me
> > in the attached debug patch).
> >
> > Now, it is not completely clear to me how exactly '013_partition.pl'
> > leads to this situation but there is a possibility based on the LOGs
> > it shows.
> >
>
> Thanks, I'll take a look later. From the description it seems this is an
> issue that existed before any of the patches, right? It might be more
> likely to hit due to some test changes, but the root cause is older.
>

Yes, your understanding is correct. If so, we probably just need some
changes in the new test to make it behave as per the current code.

-- 
With Regards,
Amit Kapila.




Re: Detaching a partition with a FK on itself is not possible

2022-03-21 Thread Arne Roland
From: Jehan-Guillaume de Rorthais 

Sent: Monday, March 21, 2022 11:36
Subject: Re: Detaching a partition with a FK on itself is not possible
 > I changed the list destination
Thanks

> Another solution could be to teach partition creation to detect a
> self-referencing FK and actually create a self-referencing FK, not one
> pointing to the partitioned table, of course issuing a NOTICE to the client.
That's what I meant. I didn't think about the NOTICE, but that's a good idea.

Regards
Arne



Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Aleksander Alekseev
Hi Japin,

> After applying the patch, I found that pg_checksums.c also has similar code.

Thanks for noticing it.

> In progress_report(), I'm not sure we can do this replacement for this code.

I added the corresponding change as a separate commit so it can be
easily reverted if necessary.

Here is a complete patchset with some additional changes by me.

-- 
Best regards,
Aleksander Alekseev


v3-0001-Remove-workarounds-to-format-u-int64-s.patch
Description: Binary data


v3-0002-Remove-workarounds-to-format-int64-s-in-pg_checks.patch
Description: Binary data


Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Aleksander Alekseev
Hi Japin,

> As Tom said in [1], we don't need to touch the *.po files, since those
> files are managed by the translation team.
>
> [1] https://www.postgresql.org/message-id/1110708.1647623560%40sss.pgh.pa.us

True, but I figured that simplifying the work of the translation team would
not harm either. In any case, the committer can easily exclude these
changes from the patch, if necessary.

-- 
Best regards,
Aleksander Alekseev


Re: Column Filtering in Logical Replication

2022-03-21 Thread Amit Kapila
On Fri, Mar 18, 2022 at 8:13 PM Tomas Vondra
 wrote:
>
> Ah, thanks for reminding me - it's hard to keep track of all the issues
> in threads as long as this one.
>
> BTW do you have any opinion on the SET COLUMNS syntax? Peter Smith
> proposed to get rid of it in [1] but I'm not sure that's a good idea.
> Because if we ditch it, then removing the column list would look like this:
>
> ALTER PUBLICATION pub ALTER TABLE tab;
>
> And if we happen to add other per-table options, this would become
> pretty ambiguous.
>
> Actually, do we even want to allow resetting column lists like this? We
> don't allow this for row filters, so if you want to change a row filter
> you have to re-add the table, right?
>

We can use syntax like "alter publication pub1 set table t1 where
(c2 > 10);" to reset the existing row filter. It seems a similar thing
works for the column list as well ("alter publication pub1 set table t1
(c2) where (c2 > 10)"). If I am not missing anything, I don't think we
need additional ALTER TABLE syntax.

> So maybe we should just ditch ALTER
> TABLE entirely.
>

Yeah, I agree, especially if my above understanding is correct.

-- 
With Regards,
Amit Kapila.




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-21 Thread Alvaro Herrera
On 2022-Mar-04, Michael Paquier wrote:

> d6d317d as solved the issue of tablespace paths across multiple nodes
> with the new GUC called allow_in_place_tablespaces, and is getting
> successfully used in the recovery tests as of 027_stream_regress.pl.

OK, but that means that the test suite is now not backpatchable.  The
implication here is that either we're going to commit the fix without
any tests at all on older branches, or that we're going to fix it only
in the master branch.  Are you thinking that it's okay to leave this bug
unfixed in older branches?  That seems embarrassing.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No me acuerdo, pero no es cierto.  No es cierto, y si fuera cierto,
 no me acuerdo." (Augusto Pinochet a una corte de justicia)




Re: Column Filtering in Logical Replication

2022-03-21 Thread Amit Kapila
On Sat, Mar 19, 2022 at 3:56 AM Tomas Vondra
 wrote:
>
> On 3/18/22 15:43, Tomas Vondra wrote:
> >
>
> As for the issue reported by Shi-San about replica identity full and
> column filters, presumably you're referring to this:
>
>   create table tbl (a int, b int, c int);
>   create publication pub for table tbl (a, b, c);
>   alter table tbl replica identity full;
>
>   postgres=# delete from tbl;
>   ERROR:  cannot delete from table "tbl"
>   DETAIL:  Column list used by the publication does not cover the
>replica identity.
>
> I believe not allowing column lists with REPLICA IDENTITY FULL is
> expected / correct behavior. I mean, for that to work the column list
> has to always include all columns anyway, so it's pretty pointless. Of
> course, we might check that the column list contains everything, but
considering the list does always have to contain all columns, and it
breaks as soon as you add any columns, it seems reasonable (cheaper) to
> just require no column lists.
>

Fair point. We can leave this as it is.

> I also went through the patch and made the naming more consistent. The
> comments used both "column filter" and "column list" randomly, and I
> think the agreement is to use "list" so I adopted that wording.
>
>
> However, while looking at how pgoutput works, I realized one thing - for row
> filters we track them "per operation", depending on which operations are
> defined for a given publication. Shouldn't we do the same thing for
> column lists, really?
>
> I mean, if there are two publications with different column lists, one
> for inserts and the other one for updates, isn't it wrong to merge these
> two column lists?
>

The reason we can't combine row filters for inserts with
updates/deletes is that if the insert filter has some column that is
not present in the RI, then during update filtering (for the old
tuple) it will give an error, as the column won't be present in the
WAL log.

OTOH, the same problem won't be there for the column list/filter patch
because all the RI columns are there in the column list (for
update/delete) and we don't need to apply a column filter for old
tuples in either update or delete.

Basically, the filter rules are slightly different for row filters and
column lists, so we need to combine the filters for one but not for
the other. Now, for the sake of consistency with row filters, we could
do it, but as such there won't be any problem; or maybe we can just add
a comment about this in the code.

-- 
With Regards,
Amit Kapila.




Re: Fix unsigned output for signed values in SLRU error reporting

2022-03-21 Thread Pavel Borisov
>
> Afaics offset etc can't be negative, so I don't think this really improves
> matters. I think there's quite a few other places where we use %u to print
> integers that we know aren't negative.
>
> If anything I think we should change the signed integers to unsigned ones.
> It
> might be worth doing that as part of
>
> https://www.postgresql.org/message-id/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com


That was one of my intentions in the mentioned patch, but I couldn't
confirm that the page number (and offset) in SLRU being signed was not
intentional. Thank you for confirming this. I will try to replace int
with unsigned where it is relevant in SLRU as part of the mentioned
thread, though it could be a big change, maybe worth a separate patch.

Again thanks!
Pavel


Re: Add pg_freespacemap extension sql test

2022-03-21 Thread Dong Wook Lee
On Sun, 20 Mar 2022 at 03:13, Fabrízio de Royes Mello wrote:

>
>
> On Sat, Mar 19, 2022 at 1:18 PM Dong Wook Lee  wrote:
> >
> > > Well, my guess is that you basically just care about being able to
> > > detect if there is free space in the map or not, which goes down to
> > > detecting if pg_freespace() returns 0 or a number strictly higher than
> > > 0, so wouldn't it be enough to stick some > 0 in your test queries?
> >
> > I edited the previous patch file.
> > Am I correct in understanding that?
> >
>
> I think what Michael meant is something like attached.
>
> Regards,
>
> --
> Fabrízio de Royes Mello
>

I think you're right; thank you for sending it on my behalf.



Re: Column Filtering in Logical Replication

2022-03-21 Thread Alvaro Herrera
Hello,

Please add me to the list of authors of this patch.  I made a large
number of nontrivial changes to it early on.  Thanks.  I have modified
the entry in the CF app (which sorts alphabetically, it was not my
intention to put my name first.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Skipping logical replication transactions on subscriber side

2022-03-21 Thread Euler Taveira
On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
> I have fixed all the above comments as per your suggestion in the
> attached. Do let me know if something is missed?
Looks good to me.

> > src/test/subscription/t/029_disable_on_error.pl |  94 --
> > src/test/subscription/t/029_on_error.pl | 183 +++
> >
> > It seems you are removing a test for 
> > 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
> > I should also name 029_on_error.pl to something else such as 
> > 030_skip_lsn.pl or
> > a generic name 030_skip_option.pl.
> >
> 
> As explained in my previous email, I don't think any change is
> required for this comment but do let me know if you still think so?
Oh, sorry about the noise. I saw mixed tests between the 2 new features and I
was unsure whether it was intentional or not.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: [PATCH] Add reloption for views to enable RLS

2022-03-21 Thread Laurenz Albe
On Mon, 2022-03-21 at 18:09 +0800, Japin Li wrote:
> After applying the patch, I found that pg_checksums.c also has similar code.
> 
> In progress_report(), I'm not sure we can do this replacement for this code:
> 
>     snprintf(total_size_str, sizeof(total_size_str), INT64_FORMAT,
>  total_size / (1024 * 1024));
>     snprintf(current_size_str, sizeof(current_size_str), INT64_FORMAT,
>  current_size / (1024 * 1024));
> 
>     fprintf(stderr, _("%*s/%s MB (%d%%) computed"),
>     (int) strlen(current_size_str), current_size_str, total_size_str,
>     percent);

I think you replied to the wrong thread...

Yours,
Laurenz Albe





Re: logical decoding and replication of sequences

2022-03-21 Thread Peter Eisentraut

On 20.03.22 23:55, Tomas Vondra wrote:

Attached is a rebased patch, addressing most of the remaining issues.


This looks okay to me, if the two FIXMEs are addressed.  Remember to 
also update protocol.sgml if you change LOGICAL_REP_MSG_SEQUENCE.





Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Justin Pryzby
On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > This patch also needs to update the other user-facing docs.
> 
> Which ones exactly?

I mean pg_basebackup -Z

-Z level
-Z [{client|server}-]method[:level]
--compress=level
--compress=[{client|server}-]method[:level]




Re: [PATCH] Add reloption for views to enable RLS

2022-03-21 Thread Japin Li


On Mon, 21 Mar 2022 at 20:40, Laurenz Albe  wrote:
> On Mon, 2022-03-21 at 18:09 +0800, Japin Li wrote:
>> After applying the patch, I found that pg_checksums.c also has similar code.
>>
>> In progress_report(), I'm not sure we can do this replacement for this code:
>>
>> snprintf(total_size_str, sizeof(total_size_str), INT64_FORMAT,
>>  total_size / (1024 * 1024));
>> snprintf(current_size_str, sizeof(current_size_str), INT64_FORMAT,
>>  current_size / (1024 * 1024));
>>
>> fprintf(stderr, _("%*s/%s MB (%d%%) computed"),
>> (int) strlen(current_size_str), current_size_str, total_size_str,
>> percent);
>
> I think you replied to the wrong thread...
>


I'm sorry!  There is a problem with my email client and I didn't notice the
subject of the reply email.

Again, sorry for the noise!

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-21 Thread Robert Haas
On Sun, Mar 20, 2022 at 1:34 AM Dilip Kumar  wrote:
> I thought that way because IIUC, when we are locking the database
> tuple we are ensuring that we are calling
> ReceiveSharedInvalidMessages() right?   And IIUC
> ReceiveSharedInvalidMessages(), is designed such a way that it will
> consume all the outstanding messages and that's the reason it loops
> multiple times if it identifies that the queue is full.  And if my
> assumption here is correct then I think it is also correct that now we
> only need to worry about anyone generating new invalidations and that
> is not possible in this case.

Well, I don't see how that chain of logic addresses my concern about
sinval reset.

Mind you, I'm not sure there's an actual problem here, because I tried
testing the patch with debug_discard_caches=1 and nothing failed. But
I still don't understand WHY nothing failed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Tom Lane
Aleksander Alekseev  writes:
>> As Tom said in [1], we don't need to touch the *.po files, since those
>> files are managed by the translation team.

> True, but I figured that simplifying the work of the translation team would
> not harm either.

It would not simplify things for them at all, just mess it up.
The master copies of the .po files are kept in a different repo.
Also, I believe that extraction of new message strings is automated
already.

https://www.postgresql.org/docs/devel/nls.html

https://wiki.postgresql.org/wiki/NLS

regards, tom lane




Re: Column Filtering in Logical Replication

2022-03-21 Thread Amit Kapila
On Sat, Mar 19, 2022 at 11:11 PM Tomas Vondra
 wrote:
>
> On 3/19/22 18:11, Tomas Vondra wrote:
> > Fix a compiler warning reported by cfbot.
>
> Apologies, I failed to actually commit the fix. So here we go again.
>

Few comments:
===
1.
+/*
+ * Gets a list of OIDs of all partial-column publications of the given
+ * relation, that is, those that specify a column list.
+ */
+List *
+GetRelationColumnPartialPublications(Oid relid)
{
...
}

...
+/*
+ * For a relation in a publication that is known to have a non-null column
+ * list, return the list of attribute numbers that are in it.
+ */
+List *
+GetRelationColumnListInPublication(Oid relid, Oid pubid)
{
...
}

Both these functions are not required now. So, we can remove them.

2.
@@ -464,11 +478,11 @@ logicalrep_write_update(StringInfo out,
TransactionId xid, Relation rel,
  pq_sendbyte(out, 'O'); /* old tuple follows */
  else
  pq_sendbyte(out, 'K'); /* old key follows */
- logicalrep_write_tuple(out, rel, oldslot, binary);
+ logicalrep_write_tuple(out, rel, oldslot, binary, columns);
  }

As mentioned previously, here we should pass NULL, similar to
logicalrep_write_delete, as we don't need to use the column list for
old tuples.

3.
+ * XXX The name is a bit misleading, because we don't really transform
+ * anything here - we merely check the column list is compatible with the
+ * definition of the publication (with publish_via_partition_root=false)
+ * we only allow column lists on the leaf relations. So maybe rename it?
+ */
+static void
+TransformPubColumnList(List *tables, const char *queryString,
+bool pubviaroot)

The second parameter is not used in this function. As noted in the
comments, I also think it is better to rename this. How about
ValidatePubColumnList?

4.
@@ -821,6 +942,9 @@ fetch_remote_table_info(char *nspname, char *relname,
  *
  * 3) one of the subscribed publications is declared as ALL TABLES IN
  * SCHEMA that includes this relation
+ *
+ * XXX Does this actually handle puballtables and schema publications
+ * correctly?
  */
  if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 15)

Why is this comment added in the row filter code? Now, both row filter
and column list are fetched in the same way, so not sure what exactly
this comment is referring to.

5.
+/* qsort comparator for attnums */
+static int
+compare_int16(const void *a, const void *b)
+{
+ int av = *(const int16 *) a;
+ int bv = *(const int16 *) b;
+
+ /* this can't overflow if int is wider than int16 */
+ return (av - bv);
+}

The exact same code exists in statscmds.c. Do we need a second copy of the same?

6.
 static void pgoutput_row_filter_init(PGOutputData *data,
  List *publications,
  RelationSyncEntry *entry);
+
 static bool pgoutput_row_filter_exec_expr(ExprState *state,

Spurious line addition.

7. The tests in 030_column_list.pl take a long time as compared to all
other similar individual tests in the subscription folder. I haven't
checked whether there is any need to reduce some tests but it seems
worth checking.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Aleksander Alekseev
Hi Tom,

> It would not simplify things for them at all, just mess it up.
> The master copies of the .po files are kept in a different repo.
> Also, I believe that extraction of new message strings is automated
> already.

Got it, thanks. Here is the corrected patch. It includes all the
changes by me and Japin, and doesn't touch PO files.

-- 
Best regards,
Aleksander Alekseev


v4-0001-Remove-workarounds-to-format-u-int64-s.patch
Description: Binary data


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-21 Thread Dilip Kumar
On Mon, Mar 21, 2022 at 7:07 PM Robert Haas  wrote:
>
> On Sun, Mar 20, 2022 at 1:34 AM Dilip Kumar  wrote:
> > I thought that way because IIUC, when we are locking the database
> > tuple we are ensuring that we are calling
> > ReceiveSharedInvalidMessages() right?   And IIUC
> > ReceiveSharedInvalidMessages(), is designed such a way that it will
> > consume all the outstanding messages and that's the reason it loops
> > multiple times if it identifies that the queue is full.  And if my
> > assumption here is correct then I think it is also correct that now we
> > only need to worry about anyone generating new invalidations and that
> > is not possible in this case.
>
> Well, I don't see how that chain of logic addresses my concern about
> sinval reset.
>
> Mind you, I'm not sure there's an actual problem here, because I tried
> testing the patch with debug_discard_caches=1 and nothing failed. But
> I still don't understand WHY nothing failed.

Okay, I see what you are saying.  Yeah this looks like a problem to me
as well.  I will try to reproduce this issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-21 Thread Tom Lane
Aleksander Alekseev  writes:
> Got it, thanks. Here is the corrected patch. It includes all the
> changes by me and Japin, and doesn't touch PO files.

Pushed.  I removed now-unnecessary braces, reflowed some lines
as suggested by Pavel, and pgindent'ed (which insisted on adding
spaces after the casts, as is project style).

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-21 Thread Dilip Kumar
On Mon, Mar 21, 2022 at 8:29 PM Dilip Kumar  wrote:
>
> On Mon, Mar 21, 2022 at 7:07 PM Robert Haas  wrote:
> >
> > On Sun, Mar 20, 2022 at 1:34 AM Dilip Kumar  wrote:
> > > I thought that way because IIUC, when we are locking the database
> > > tuple we are ensuring that we are calling
> > > ReceiveSharedInvalidMessages() right?   And IIUC
> > > ReceiveSharedInvalidMessages(), is designed such a way that it will
> > > consume all the outstanding messages and that's the reason it loops
> > > multiple times if it identifies that the queue is full.  And if my
> > > assumption here is correct then I think it is also correct that now we
> > > only need to worry about anyone generating new invalidations and that
> > > is not possible in this case.
> >
> > Well, I don't see how that chain of logic addresses my concern about
> > sinval reset.
> >
> > Mind you, I'm not sure there's an actual problem here, because I tried
> > testing the patch with debug_discard_caches=1 and nothing failed. But
> > I still don't understand WHY nothing failed.
>
> Okay, I see what you are saying.  Yeah this looks like a problem to me
> as well.  I will try to reproduce this issue.

I tried to debug the case, but I realized that somehow
CHECK_FOR_INTERRUPTS() is not calling AcceptInvalidationMessages(), and
I could not find such a call while looking into the code either.  While
debugging I noticed that AcceptInvalidationMessages() is called
multiple times, but only through LockRelationId(), and while locking
the relation we had already closed the previous smgr, because at a time
we keep only one smgr open.  And that's the reason it is not hitting
the issue we thought it could.  Is there any condition under which
AcceptInvalidationMessages() will be called through
CHECK_FOR_INTERRUPTS()?  Because I could not see one while debugging,
nor in the code.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: New Object Access Type hooks

2022-03-21 Thread Andrew Dunstan


On 3/17/22 23:21, Mark Dilger wrote:
> Hackers,
>
> Over in [1], Joshua proposed a new set of Object Access Type hooks based on 
> strings rather than Oids.
>
> His patch was written to be applied atop my patch for granting privileges on 
> gucs.
>
> On review of his patch, I became uncomfortable with the complete lack of 
> regression test coverage.  To be fair, he did paste a bit of testing logic to 
> the thread, but it appears to be based on pgaudit, and it is unclear how to 
> include such a test in the core project, where pgaudit is not assumed to be 
> installed.
>
> First, I refactored his patch to work against HEAD and not depend on my GUCs 
> patch.  Find that as v1-0001.  The refactoring exposed a bit of a problem.  
> To call the new hook for SET and ALTER SYSTEM commands, I need to pass in the 
> Oid of a catalog table.  But since my GUC patch isn't applied yet, there 
> isn't any such table (pg_setting_acl or whatnot) to pass.  So I'm passing 
> InvalidOid, but I don't know if that is right.  In any event, if we want a 
> new API like this, we should think a bit harder about whether it can be used 
> to check operations where no table Oid is applicable.


My first inclination is to say it's probably ok. The immediately obvious
alternative would be to create yet another set of functions that don't
have classId parameters. That doesn't seem attractive.

Modulo that issue I think patch 1 is basically ok, but we should fix the
comments in objectaccess.c.  Rather than "It is [the] entrypoint ..." we
should have something like "Oid variant entrypoint ..." and "Name
variant entrypoint ...", and also fix the function names in the comments.
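
Something like this, perhaps (a rough draft only; the name of the string
variant is just for illustration, use whatever the patch actually calls it):

    /*
     * RunObjectPostCreateHook
     *		Oid variant entrypoint of the OAT_POST_CREATE event
     */

    /*
     * RunObjectPostCreateHookStr
     *		Name variant entrypoint of the OAT_POST_CREATE event
     */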


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-21 Thread David Christensen
On Sun, Mar 20, 2022 at 11:56 PM Thomas Munro 
wrote:

> On Mon, Mar 21, 2022 at 4:36 PM Thomas Munro 
> wrote:
> > On Sat, Feb 26, 2022 at 7:58 AM David Christensen
> >  wrote:
> > > Attached is V2 with additional feedback from this email, as well as
> the specification of the
> > > ForkNumber and FPW as specifiable options.
> >
> > Trivial fixup needed after commit 3f1ce973.
>
> [04:30:50.630] pg_waldump.c:963:26: error: format ‘%u’ expects
> argument of type ‘unsigned int *’, but argument 3 has type ‘ForkNumber
> *’ [-Werror=format=]
> [04:30:50.630] 963 | if (sscanf(optarg, "%u",
> &config.filter_by_relation_forknum) != 1 ||
> [04:30:50.630] | ~^ ~~
> [04:30:50.630] | | |
> [04:30:50.630] | | ForkNumber *
> [04:30:50.630] | unsigned int *
>
> And now that this gets to the CompilerWarnings CI task, it looks like
> GCC doesn't like an enum as a scanf %u destination (I didn't see that
> warning locally when I compiled the above fixup because clearly Clang
> is cool with it...).  Probably needs a temporary unsigned int to
> sscanf into first.
>

Do you need me to fix this, or are you incorporating that into a V4 of this
patch? (Similar to your fixup prior in this thread?)
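
For reference, something like this is presumably all it needs (an untested
sketch against the V3 field names):

    unsigned int	forknum;

    if (sscanf(optarg, "%u", &forknum) != 1 || forknum > MAX_FORKNUM)
    {
        pg_log_error("invalid fork number: \"%s\"", optarg);
        goto bad_argument;
    }
    config.filter_by_relation_forknum = (ForkNumber) forknum;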


Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Robert Haas
On Mon, Mar 21, 2022 at 9:18 AM Justin Pryzby  wrote:
> On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > > This patch also needs to update the other user-facing docs.
> >
> > Which ones exactly?
>
> I mean pg_basebackup -Z
>
> -Z level
> -Z [{client|server}-]method[:level]
> --compress=level
> --compress=[{client|server}-]method[:level]

Ah, right. Thanks.

Here's v3. I have updated that section of the documentation. I also
went and added a bunch more test cases for validation of compression
detail strings, many inspired by your examples, and fixed all the bugs
that I found in the process. I think the crashes you complained about
are now fixed, but please let me know if I have missed any. I also
added _() calls as you suggested. I searched for the "contain a an"
typo that you mentioned but was not able to find it. Can you give me a
more specific pointer?

I looked a little bit more at the compression method vs. compression
algorithm thing. I agree that there is some inconsistency in
terminology here, but I'm still not sure that we are well-served by
trying to make it totally uniform, especially if we pick the word
"method" as the standard rather than "algorithm". In my opinion,
"method" is less specific than "algorithm". If someone asks me to
choose a compression algorithm, I know that I should give an answer
like "lz4" or "zstd". If they ask me to pick a compression method, I'm
not quite sure whether they want that kind of answer or whether they
want something more detailed, like "use lz4 with compression level 3
and a 1MB block size". After all, that is (at least according to my
understanding of how English works) a perfectly valid answer to the
question "what method should I use to compress this data?" -- but not
to the question "what algorithm should I use to compress this data?".
The latter can ONLY be properly answered by saying something like
"lz4". And I think that's really the root of my hesitation to make the
kinds of changes you want here. If it's just a question of specifying
a compression algorithm and a level, I don't think using the name
"method" for the algorithm is going to be too bad. But as we enrich
the system with multiple compression algorithms each of which may have
multiple and different parameters, I think the whole thing becomes
murkier and the need for precision in language goes up.

Now that is of course an arguable position and you're welcome to
disagree with it, but I think that's part of why I'm hesitating.
Another part of it, at least for me, is that complete uniformity is
not always a positive. I suppose all of us have had the experience at
some point of reading a manual that says something like "to activate
the boil water function, press and release the 'boil water' button"
and rolled our eyes at how useless it was. It's important to me that
we don't fall into that trap. We clearly don't want to go ballistic
and have random inconsistencies in language for no reason, but at the
same time, it's not useful to tell people that METHOD should be
replaced with a compression method and LEVEL with a compression level.
I mean, if you end up saying something like that interspersed with
non-obvious information, that is OK, and I don't want to overstate the
point I'm trying to make. But it seems to me that if there's a little
variation in phrasing and we end up saying that METHOD means the
compression algorithm or that ALGORITHM means the compression method
or whatever, that can actually make things more clear. Here again it's
debatable: how much variation in phraseology is helpful, and at what
point does it just start to seem inconsistent? Well, everyone may have
their own opinion.

I'm not trying to pretend that this patch (or the existing code base)
gets this all right. But I do think that, to the extent that we have a
considered position on what to do here, we can make that change later,
perhaps even after getting some user feedback on what does and does
not make sense to other people. And I also think that what we end up
doing here may well end up being more nuanced than a blanket
search-and-replace. I'm not saying we couldn't make a blanket
search-and-replace. I just don't see it as necessarily creating value,
or being all that closely connected to the goal of this patch, which
is to quickly clean up a forward-compatibility risk before we hit
feature freeze.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v3-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: pgsql: Add option to use ICU as global locale provider

2022-03-21 Thread Christoph Berg
Re: Peter Eisentraut
> > Since the intended usage seems to be that databases should either be
> > using libc, or the ICU locales, but probably not both at the same
> > time, does it make sense to clutter the already very wide `psql -l`
> > output with two new extra columns?
> 
> Good point, let me think about that.

A possible solution might be to rip out all the locale columns except
"Encoding" from \l, and leave them in place for \l+.

For \l+, I'd suggest moving the database size and the tablespace to
the front, after owner.

Christoph




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-21 Thread David Christensen
Updated to include the V3 fixes as well as the unsigned int/enum fix.



v4-0001-Add-additional-filtering-options-to-pg_waldump.patch
Description: Binary data


Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Justin Pryzby
On Mon, Mar 21, 2022 at 12:57:36PM -0400, Robert Haas wrote:
> > typo: contain a an
> I searched for the "contain a an" typo that you mentioned but was not able to
> find it. Can you give me a more specific pointer?

Here:

+ * during parsing, and will otherwise contain a an appropriate error message.

> I looked a little bit more at the compression method vs. compression
> algorithm thing. I agree that there is some inconsistency in
> terminology here, but I'm still not sure that we are well-served by
> trying to make it totally uniform, especially if we pick the word
> "method" as the standard rather than "algorithm". In my opinion,
> "method" is less specific than "algorithm". If someone asks me to
> choose a compression algorithm, I know that I should give an answer
> like "lz4" or "zstd". If they ask me to pick a compression method, I'm
> not quite sure whether they want that kind of answer or whether they
> want something more detailed, like "use lz4 with compression level 3
> and a 1MB block size". After all, that is (at least according to my
> understanding of how English works) a perfectly valid answer to the
> question "what method should I use to compress this data?" -- but not
> to the question "what algorithm should I use to compress this data?".
> The latter can ONLY be properly answered by saying something like
> "lz4". And I think that's really the root of my hesitation to make the
> kinds of changes you want here.

I think "algorithm" could be much more nuanced than "lz4", but I also think
we've spent more than enough time on it now :)

-- 
Justin




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-21 Thread Robert Haas
On Mon, Mar 21, 2022 at 11:21 AM Dilip Kumar  wrote:
> I tried to debug the case but I realized that somehow
> CHECK_FOR_INTERRUPTS() is not calling the
> AcceptInvalidationMessages() and I could not find the same while
> looking into the code as well.   While debugging I noticed that
> AcceptInvalidationMessages() is called multiple times but that is only
> through LockRelationId() but while locking the relation we had already
> closed the previous smgr because at a time we keep only one smgr open.
> And that's the reason it is not hitting the issue which we think it
> could. Is there any condition under which it will call
> AcceptInvalidationMessages() through CHECK_FOR_INTERRUPTS() ? because
> I could not see while debugging as well as in code.

Yeah, I think the reason you can't find it is that it's not there. I
was confused in what I wrote earlier. I think we only process sinval
catchups when we're idle, not at every CHECK_FOR_INTERRUPTS(). And I
think the reason for that is precisely that it would be hard to write
correct code otherwise, since invalidations might then get processed
in a lot more places. So ... I guess all we really need to do here is
avoid assuming that the results of smgropen() are valid across any
code that might acquire relation locks. Which I think the code already
does.

But on a related note, why doesn't CreateDatabaseUsingWalLog() acquire
locks on both the source and destination relations? It looks like
you're only taking locks for the source template database ... but I
thought the intention here was to make sure that we didn't pull pages
into shared_buffers without holding a lock on the relation and/or the
database? I suppose the point is that while the template database
might be concurrently dropped, nobody can be doing anything
concurrently to the target database because nobody knows that it
exists yet. Still, I think that this would be the only case where we
let pages into shared_buffers without a relation or database lock,
though maybe I'm confused about this point, too. If not, perhaps we
should consider locking the target database OID and each relation OID
as we are copying it?

I guess I'm imagining that there might be more code pathways in the
future that want to ensure that there are no remaining buffers for
some particular database or relation OID. It seems natural to want to
be able to take some lock that prevents buffers from being added, and
then go and get rid of all the ones that are there already. But I
admit I can't quite think of a concrete case where we'd want to do
something like this where the patch as coded would be a problem. I'm
just thinking perhaps taking locks is fairly harmless and might avoid
some hypothetical problem later.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Robert Haas
On Mon, Mar 21, 2022 at 2:22 PM Justin Pryzby  wrote:
> + * during parsing, and will otherwise contain a an appropriate error message.

OK, thanks. v4 attached.

> I think "algorithm" could be much more nuanced than "lz4", but I also think
> we've spent more than enough time on it now :)

Oh dear. But yes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v4-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: pgsql: Add option to use ICU as global locale provider

2022-03-21 Thread Tom Lane
Christoph Berg  writes:
> A possible solution might be to rip out all the locale columns except
> "Encoding" from \l, and leave them in place for \l+.

I'd rather see a single column summarizing the locale situation.
Perhaps it could be COALESCE(daticulocale, datcollate), or
something using a CASE on datlocprovider?
Then \l+ could replace that with all the underlying columns.
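For instance (a sketch only, assuming the pg_database columns added by that
commit), the summary column might be built as:

    SELECT datname,
           CASE datlocprovider
                WHEN 'i' THEN daticulocale
                ELSE datcollate
           END AS locale
    FROM pg_database;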

> For \l+, I'd suggest moving the database size and the tablespace to
> the front, after owner.

I think it's confusing if the + and non-+ versions of a command
present their columns in inconsistent orders.  I'm not dead set
against this, but -0.5 or so.

regards, tom lane




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-21 Thread Peter Eisentraut

On 18.03.22 16:14, Maxim Orlov wrote:
Here is v22. It addresses things mentioned by Tom and Kyotaro. Also 
added Aleksander's changes from v21.


The v22-0002-Update-XID-formatting-in-the-.po-files.patch is not 
necessary.  That is done by a separate procedure.


I'm wondering about things like this:

- psprintf("xmax %u equals or exceeds next valid transaction ID %u:%u",
-  xmax,
+ psprintf("xmax %llu equals or exceeds next valid transaction ID %u:%llu",
+  (unsigned long long) xmax,
   EpochFromFullTransactionId(ctx->next_fxid),
-  XidFromFullTransactionId(ctx->next_fxid)));
+  (unsigned long long) XidFromFullTransactionId(ctx->next_fxid)));

This %u:%u business is basically an existing workaround for the lack of 
64-bit transaction identifiers.  Presumably, when those are available, 
all of this will be replaced by a single number display, possibly a 
single %llu.  So these sites you change here will have to be touched 
again, and so changing this now doesn't make sense.  At least that's my 
guess.  Maybe there needs to be a fuller explanation of how this is 
meant to be transitioned.


As a more general point, I don't like plastering these bulky casts all 
over the place.  Casts hide problems.  For example, if we currently have


elog(LOG, "xid is %u", xid);

and then xid is changed to a 64-bit type, this will give a compiler 
warning until you change the format.  If we add a (long long unsigned) 
cast here now and then somehow forget to change the type of xid, nothing 
will warn us about that.  (Note that there is also third-party code 
affected by this.)  Besides, these casts are quite ugly anyway, and I 
don't think the solution for all time should be that we have to add 
these casts just to print xids.
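(Purely for illustration, not something in the patch: one conceivable
direction is a central format macro defined next to the typedef, so that call
sites need neither casts nor later editing; the USE_64BIT_XIDS symbol here is
made up.)

    #ifdef USE_64BIT_XIDS
    typedef uint64 TransactionId;
    #define XID_FORMAT UINT64_FORMAT
    #else
    typedef uint32 TransactionId;
    #define XID_FORMAT "%u"
    #endif

    elog(LOG, "xid is " XID_FORMAT, xid);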


I think there needs to be a bit more soul searching here on how to 
handle that in the future and how to transition it.  I don't think 
targeting this patch for PG15 is useful at this point.





Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Dagfinn Ilmari Mannsåker
Robert Haas  writes:

> On Mon, Mar 21, 2022 at 2:22 PM Justin Pryzby  wrote:
>> + * during parsing, and will otherwise contain a an appropriate error 
>> message.
>
> OK, thanks. v4 attached.

I haven't read the whole patch, but I noticed an omission in the
documentation changes:

> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index 9178c779ba..00c593f1af 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
> 
> -COMPRESSION_LEVEL 
> level
> +COMPRESSION_DETAIL 
> detail
>  
>   
>Specifies the compression level to be used.

This is no longer accurate. How about something like "Specifies
details of the chosen compression method"?

- ilmari




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-21 Thread Alvaro Herrera
I had a look at this latest version of the patch, and found some things
to tweak.  Attached is v21 with three main changes from Kyotaro's v20:

1. the XLogFlush is only done if consistent state has not been reached.
As I understand, it's not needed in normal mode.  (In any case, if we do
call XLogFlush in normal mode, what it does is not advance the recovery
point, so the comment would be incorrect.)

2. use %u to print OIDs rather than %d

3. I changed the warning message wording to this:

+   ereport(WARNING,
+   (errmsg("skipping replay of database creation WAL record"),
+errdetail("The source database directory \"%s\" was not 
found.",
+  src_path),
+errhint("A future WAL record that removes the directory 
before reaching consistent mode is expected.")));

I also renamed the function XLogReportMissingDir to
XLogRememberMissingDir (which matches the "forget" part) and changed the
DEBUG2 messages in that function to DEBUG1 (all the calls in other
functions remain DEBUG2, because ISTM they are not as interesting).
Finally, I made the TAP test search the WARNING line in the log.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
 (Carlos Caszeli)
>From 6a6fc73a93768a44ec026720c115f77c67d5cda2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 21 Mar 2022 12:34:34 +0100
Subject: [PATCH v21] Fix replay of create database records on standby

Crash recovery on standby may encounter missing directories when
replaying create database WAL records.  Prior to this patch, the
standby would fail to recover in such a case.  However, the
directories could be legitimately missing.  Consider a sequence of WAL
records as follows:

CREATE DATABASE
DROP DATABASE
DROP TABLESPACE

If, after replaying the last WAL record and removing the tablespace
directory, the standby crashes and has to replay the create database
record again, the crash recovery must be able to move on.

This patch adds mechanism similar to invalid page hash table, to track
missing directories during crash recovery.  If all the missing
directory references are matched with corresponding drop records at
the end of crash recovery, the standby can safely enter archive
recovery.

Diagnosed-by: Paul Guo 
Author: Paul Guo 
Author: Kyotaro Horiguchi 
Author: Asim R Praveen 
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com
---
 src/backend/access/transam/xlogrecovery.c   |   6 +
 src/backend/access/transam/xlogutils.c  | 159 +++-
 src/backend/commands/dbcommands.c   |  57 +++
 src/backend/commands/tablespace.c   |  17 +++
 src/include/access/xlogutils.h  |   4 +
 src/test/recovery/t/029_replay_tsp_drops.pl |  67 +
 src/tools/pgindent/typedefs.list|   2 +
 7 files changed, 311 insertions(+), 1 deletion(-)
 create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 9feea3e6ec..f48d8d51fb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -2043,6 +2043,12 @@ CheckRecoveryConsistency(void)
 		 */
 		XLogCheckInvalidPages();
 
+		/*
+		 * Check if the XLOG sequence contained any unresolved references to
+		 * missing directories.
+		 */
+		XLogCheckMissingDirs();
+
 		reachedConsistency = true;
 		ereport(LOG,
 (errmsg("consistent recovery state reached at %X/%X",
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 511f2f186f..8c1b8216be 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -54,6 +54,164 @@ bool		InRecovery = false;
 /* Are we in Hot Standby mode? Only valid in startup process, see xlogutils.h */
 HotStandbyState standbyState = STANDBY_DISABLED;
 
+
+/*
+ * If a create database WAL record is being replayed more than once during
+ * crash recovery on a standby, it is possible that either the tablespace
+ * directory or the template database directory is missing.  This happens when
+ * the directories are removed by replay of subsequent drop records.  Note
+ * that this problem happens only on standby and not on master.  On master, a
+ * checkpoint is created at the end of create database operation. On standby,
+ * however, such a strategy (creating restart points during replay) is not
+ * viable because it will slow down WAL replay.
+ *
+ * The alternative is to track references to each missing directory
+ * encountered when performing crash recovery in the following hash table.
+ * Similar to invalid page table above, the expectation is that each missing
+ * directory entry should be matched with a drop database or drop tablespace
+ * WAL record by the end

Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-03-21 Thread Andres Freund
Hi,

On 2022-03-21 11:27:15 +0530, Bharath Rupireddy wrote:
> On Sat, Mar 19, 2022 at 4:39 AM Andres Freund  wrote:
> >
> > before we further change this (as done in this patch) we should deduplicate
> > these huge statements in a separate patch / commit.
>
> Something like attached
> v6-0001-Deduplicate-checkpoint-restartpoint-complete-mess.patch?

Mostly. I don't see a reason for the use of the stringinfo. And I think
LogCheckpointStart() should be dealt with similarly.

I'd just make it a restartpoint ? _("restartpoint") : _("checkpoint") or such
in the argument? Then translators don't need to translate the two messages
separately.

Or we could just not translate restartpoint/checkpoint - after all we don't
translate the flags in LogCheckpointStart() either. But on balance I'd lean
towards the above.
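(A minimal sketch of that shape, with the stats list abridged and the
surrounding function omitted:)

    ereport(LOG,
            (errmsg("%s complete: wrote %d buffers (%.1f%%); "
                    "%d WAL file(s) added, %d removed, %d recycled",
                    restartpoint ? _("restartpoint") : _("checkpoint"),
                    CheckpointStats.ckpt_bufs_written,
                    (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                    CheckpointStats.ckpt_segs_added,
                    CheckpointStats.ckpt_segs_removed,
                    CheckpointStats.ckpt_segs_recycled)));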


> > This practically doubles the size of the log message - doesn't that seem a 
> > bit
> > disproportionate? Can we make this more dense? "logical decoding rewrite
> > mapping file(s) removed=" and "logical decoding snapshot file(s) removed=" 
> > is
> > quite long.
>
> Do you suggest something like below? Or some other better wording like
> "logical decoding rewrite map files" and "logical decoding snapshot
> files" or "logical rewrite map files" and "logical snapshot files" or
> just "rewrite mapping files" or "snapshot files"  ?

Both seem still very long. I still am doubtful this level of detail is
appropriate. Seems more like a thing for a tracepoint or such. How about just
printing the time for the logical decoding operations in aggregate, without
breaking down into files, adding LSNs etc?

Greetings,

Andres Freund




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-21 Thread Alvaro Herrera
On 2022-Mar-14, Robert Haas wrote:

> 2. Why not instead change the code so that the operation can succeed,
> by creating the prerequisite parent directories? Do we not have enough
> information for that? I'm not saying that we definitely should do it
> that way rather than this way, but I think we do take that approach in
> some cases.

It seems we can choose freely between these two implementations -- I
mean I don't see any upsides or downsides to either one.

The current one has the advantage that it never makes the datadir
"dirty", to use Kyotaro's term.  It verifies that the creation/drop form
a pair.  A possible downside is that if there's a bug, we could end up
with a spurious PANIC at the end of recovery, and no way to recover.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: New Object Access Type hooks

2022-03-21 Thread Mark Dilger


> On Mar 18, 2022, at 3:04 PM, Andrew Dunstan  wrote:
> 
> I haven't looked at it in detail, but regarding the test code I'm not
> sure why there's a .control file, since this isn't a loadable extension,
> nor why there's a test_oat_hooks.h file.

The .control file exists because the test defines a loadable module which 
defines the hooks.  The test_oat_hooks.h file was extraneous, and has been 
removed in v2.



v2-0001-Add-String-object-access-hooks.patch
Description: Binary data


v2-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-21 Thread Mark Dilger



> On Mar 21, 2022, at 8:41 AM, Andrew Dunstan  wrote:
> 
> My first inclination is to say it's probably ok. The immediately obvious
> alternative would be to create yet another set of functions that don't
> have classId parameters. That doesn't seem attractive.
> 
> Modulo that issue I think patch 1 is basically ok, but we should fix the
> comments in objectaccess.c.  Rather than "It is [the] entrypoint ..." we
> should have something like "Oid variant entrypoint ..." and "Name
> variant entrypoint ...", and also fix the function names in the comments.

Joshua,

Do you care to create a new version of this, perhaps based on the v2-0001 patch 
I just posted?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: jsonpath syntax extensions

2022-03-21 Thread Greg Stark
This patch seems to be getting ignored. Like David I'm a bit puzzled
because it doesn't seem like an especially obscure or difficult patch
to review. Yet it's been multiple years without even a superficial
"does it meet the coding requirements" review let alone a design
review.

Can we get a volunteer to at least give it a quick once-over? I don't
think it's ideal to be doing this in the last CF but neither is it
very appetizing to just shift it to the next CF without a review after
two years...

On Thu, 27 Feb 2020 at 10:58, Nikita Glukhov  wrote:
>
> Hi, hackers!
>
> Attached patches implement several useful jsonpath syntax extensions.
> I already published them two years ago in the original SQL/JSON thread,
> but then after creation of separate threads for SQL/JSON functions and
> JSON_TABLE I forgot about them.
>
> A brief description of the patches:
>
> 1. Introduced new jsonpath modifier 'pg' which is used for enabling
> PostgreSQL-specific extensions.  This feature was already proposed in the
> discussion of jsonpath's like_regex implementation.
>
> 2. Added support for raw jbvObject and jbvArray JsonbValues inside jsonpath
> engine.  Now, jsonpath can operate with JSON arrays and objects only in
> jbvBinary form.  But with introduction of array and object constructors in
> patches #4 and #5 raw in-memory jsonb containers can appear in jsonpath 
> engine.
> In some places we can iterate through jbvArrays, in others we need to encode
> jbvArrays and jbvObjects into jbvBinary.
>
> 3. SQL/JSON sequence construction syntax. A simple comma-separated list can be
> used to concatenate single values or sequences into a single resulting 
> sequence.
>
>  SELECT jsonb_path_query('[1, 2, 3]', 'pg $[*], 4, 2 + 3');
>   jsonb_path_query
>  --
>   1
>   2
>   3
>   4
>   5
>
>  SELECT jsonb_path_query('{ "a": [1, 2, 3], "b": [4, 5] }',
> 'pg ($.a[*], $.b[*]) ? (@ % 2 == 1)');
>jsonb_path_query
>  --
>   1
>   3
>   5
>
>
> Patches #4-#6 implement ECMAScript-like syntax constructors and accessors:
>
> 4. Array construction syntax.
> This can also be considered as enclosing a sequence constructor into brackets.
>
>  SELECT jsonb_path_query('[1, 2, 3]', 'pg [$[*], 4, 2 + 3]');
>   jsonb_path_query
>  --
>   [1, 2, 3, 4, 5]
>
> Having this feature, jsonb_path_query_array() becomes somewhat redundant.
>
>
> 5. Object construction syntax.  It is useful for constructing derived objects
> from the interesting parts of the original object.  (But this is not 
> sufficient
> to "project" each object in array, item method like '.map()' is needed here.)
>
>  SELECT jsonb_path_query('{"b": 2}', 'pg { a : 1, b : $.b, "x y" : $.b + 3 
> }');
>  jsonb_path_query
>  ---
>   { "a" : 1, "b": 3, "x y": 5 }
>
> Fields with empty values are simply skipped regardless of lax/strict mode:
>
>  SELECT jsonb_path_query('{"a": 1}', 'pg { b : $.b, a : $.a ? (@ > 1) }');
>   jsonb_path_query
>  --
>   {}
>
>
> 6. Object subscription syntax.  This gives us the ability to specify what key
> to extract at runtime.  The syntax is the same as ordinary array subscription
> syntax.
>
>  -- non-existent $.x is simply skipped in lax mode
>  SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$.a, "x", "a"]');
>   jsonb_path_query
>  --
>   "c"
>   "b"
>
>  SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$fld]', '{"fld": 
> "b"}');
>   jsonb_path_query
>  --
>   "c"
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company



-- 
greg




Re: jsonpath syntax extensions

2022-03-21 Thread Greg Stark
Hm. Actually... These changes were split off from the JSON_TABLE
patches? Are they still separate or have they been merged into those
other patches since? I see the JSON_TABLE thread is getting more
comments; do those reviews include these patches?

On Mon, 21 Mar 2022 at 16:09, Greg Stark  wrote:
>
> This patch seems to be getting ignored. Like David I'm a bit puzzled
> because it doesn't seem like an especially obscure or difficult patch
> to review. Yet it's been multiple years without even a superficial
> "does it meet the coding requirements" review let alone a design
> review.
>
> Can we get a volunteer to at least give it a quick once-over? I don't
> think it's ideal to be doing this in the last CF but neither is it
> very appetizing to just shift it to the next CF without a review after
> two years...
>
> On Thu, 27 Feb 2020 at 10:58, Nikita Glukhov  wrote:
> >
> > Hi, hackers!
> >
> > Attached patches implement several useful jsonpath syntax extensions.
> > I already published them two years ago in the original SQL/JSON thread,
> > but then after creation of separate threads for SQL/JSON functions and
> > JSON_TABLE I forgot about them.
> >
> > A brief description of the patches:
> >
> > 1. Introduced new jsonpath modifier 'pg' which is used for enabling
> > PostgreSQL-specific extensions.  This feature was already proposed in the
> > discussion of jsonpath's like_regex implementation.
> >
> > 2. Added support for raw jbvObject and jbvArray JsonbValues inside jsonpath
> > engine.  Now, jsonpath can operate with JSON arrays and objects only in
> > jbvBinary form.  But with introduction of array and object constructors in
> > patches #4 and #5 raw in-memory jsonb containers can appear in jsonpath 
> > engine.
> > In some places we can iterate through jbvArrays, in others we need to encode
> > jbvArrays and jbvObjects into jbvBinary.
> >
> > 3. SQL/JSON sequence construction syntax. A simple comma-separated list can 
> > be
> > used to concatenate single values or sequences into a single resulting 
> > sequence.
> >
> >  SELECT jsonb_path_query('[1, 2, 3]', 'pg $[*], 4, 2 + 3');
> >   jsonb_path_query
> >  --
> >   1
> >   2
> >   3
> >   4
> >   5
> >
> >  SELECT jsonb_path_query('{ "a": [1, 2, 3], "b": [4, 5] }',
> > 'pg ($.a[*], $.b[*]) ? (@ % 2 == 1)');
> >jsonb_path_query
> >  --
> >   1
> >   3
> >   5
> >
> >
> > Patches #4-#6 implement ECMAScript-like syntax constructors and accessors:
> >
> > 4. Array construction syntax.
> > This can also be considered as enclosing a sequence constructor into 
> > brackets.
> >
> >  SELECT jsonb_path_query('[1, 2, 3]', 'pg [$[*], 4, 2 + 3]');
> >   jsonb_path_query
> >  --
> >   [1, 2, 3, 4, 5]
> >
> > Having this feature, jsonb_path_query_array() becomes somewhat redundant.
> >
> >
> > 5. Object construction syntax.  It is useful for constructing derived 
> > objects
> > from the interesting parts of the original object.  (But this is not 
> > sufficient
> > to "project" each object in array, item method like '.map()' is needed 
> > here.)
> >
> >  SELECT jsonb_path_query('{"b": 2}', 'pg { a : 1, b : $.b, "x y" : $.b + 3 
> > }');
> >  jsonb_path_query
> >  ---
> >   { "a" : 1, "b": 3, "x y": 5 }
> >
> > Fields with empty values are simply skipped regardless of lax/strict mode:
> >
> >  SELECT jsonb_path_query('{"a": 1}', 'pg { b : $.b, a : $.a ? (@ > 1) }');
> >   jsonb_path_query
> >  --
> >   {}
> >
> >
> > 6. Object subscription syntax.  This gives us the ability to specify what
> > key to extract at runtime.  The syntax is the same as ordinary array subscription
> > syntax.
> >
> >  -- non-existent $.x is simply skipped in lax mode
> >  SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$.a, "x", "a"]');
> >   jsonb_path_query
> >  --
> >   "c"
> >   "b"
> >
> >  SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$fld]', '{"fld": 
> > "b"}');
> >   jsonb_path_query
> >  --
> >   "c"
> >
> > --
> > Nikita Glukhov
> > Postgres Professional: http://www.postgrespro.com
> > The Russian Postgres Company
>
>
>
> --
> greg



-- 
greg




Re: [PATCH v2] use has_privs_for_role for predefined roles

2022-03-21 Thread Joe Conway

On 3/20/22 12:38, Stephen Frost wrote:

Greetings,

On Sun, Mar 20, 2022 at 18:31 Joshua Brindle  wrote:

On Sun, Mar 20, 2022 at 12:27 PM Joe Conway  wrote:
 >
 > On 3/3/22 11:26, Joshua Brindle wrote:
 > > On Thu, Feb 10, 2022 at 2:37 PM Joe Conway  wrote:
 > >>
 > >> On 2/10/22 14:28, Nathan Bossart wrote:
 > >> > On Wed, Feb 09, 2022 at 04:39:11PM -0500, Joe Conway wrote:
 > >> >> On 2/9/22 13:13, Nathan Bossart wrote:
 > >> >>> I do wonder if users find the differences between predefined roles
 > >> >>> and role attributes confusing.  INHERIT doesn't govern role
 > >> >>> attributes, but it will govern predefined roles when this patch is
 > >> >>> applied.  Maybe the role attribute system should eventually be
 > >> >>> deprecated in favor of using predefined roles for everything.  Or
 > >> >>> perhaps the predefined roles should be converted to role attributes.
 > >> >>
 > >> >> Yep, I was suggesting that the latter would have been preferable to
 > >> >> me while Robert seemed to prefer the former. Honestly I could be
 > >> >> happy with either of those solutions, but as I alluded to that is
 > >> >> probably a discussion for the next development cycle since I don't
 > >> >> see us doing that big a change in this one.
 > >> >
 > >> > I agree.  I still think Joshua's proposed patch is a worthwhile
 > >> > improvement for v15.
 > >>
 > >> +1
 > >>
 > >> I am planning to get into it in detail this weekend. So far I have
 > >> really just ensured it merges cleanly and passes make world.
 > >
 > > Rebased patch to apply to master attached.
 >
 > Well longer than I planned, but finally took a closer look.
 >
 > I made one minor editorial fix to Joshua's patch, rebased to current
 > master, and added two missing call sites that presumably are related to
 > recent commits for pg_basebackup.

FWIW I pinged Stephen when I saw the basebackup changes included
is_member_of and he didn't think they necessarily needed to be changed
since they aren't accessible to humans and you can't SET ROLE on a
replication connection in order to access the role where inheritance
was blocked.

Yeah, though that should really be very clearly explained in comments
around that code, as anything that uses is_member should really be viewed
as an exception.  I also wouldn’t be against using has_privs there
anyway and saying that you shouldn’t be trying to connect as one role on
a replication connection and using the privs of another when you don’t
automatically inherit those rights, but on the assumption that the
committer intended is_member there because SET ROLE isn’t something one
does on replication connections, I’m alright with it being as is.



Robert -- any opinion on this? If I am not mistaken it is code that you 
are actively working on.


Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-03-21 Thread Nathan Bossart
On Mon, Mar 21, 2022 at 11:27:15AM +0530, Bharath Rupireddy wrote:
> On Sat, Mar 19, 2022 at 3:16 AM Nathan Bossart  
> wrote:
>> /* buffer stats */
>> appendStringInfo(&logmsg, "wrote %d buffers (%.1f%%); ",
>>                  CheckpointStats.ckpt_bufs_written,
>>                  (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers);
>>
>> /* WAL file stats */
>> appendStringInfo(&logmsg, "%d WAL file(s) added, %d removed, %d recycled; ",
>>                  CheckpointStats.ckpt_segs_added,
>>                  CheckpointStats.ckpt_segs_removed,
>>                  CheckpointStats.ckpt_segs_recycled);
> 
> Do we actually need to granularize the message like this? I actually
> don't think so, because we print the stats even if they are 0 (buffers
> written is 0 or WAL segments added is 0 and so on).

I suggested it because I thought it would improve readability and simplify
optionally adding new stats to the message.  If there is another way to
accomplish those things, that is fine by me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: jsonpath syntax extensions

2022-03-21 Thread Erik Rijkers



On 21-03-2022 at 21:13, Greg Stark wrote:

Hm. Actually... These changes were split off from the JSON_TABLE
patches? Are they still separate or have they been merged into those
other patches since? I see the JSON_TABLE thread is getting more
comments; do those reviews include these patches?



They are separate.

FWIW, I've done all my JSON_PATH testing both without and with these 
syntax extensions (but I've done no code review.)  I like these 
extensions, but as you say -- there seems to be little interest.



Erik


On Mon, 21 Mar 2022 at 16:09, Greg Stark  wrote:


This patch seems to be getting ignored. Like David I'm a bit puzzled
because it doesn't seem like an especially obscure or difficult patch
to review. Yet it's been multiple years without even a superficial
"does it meet the coding requirements" review let alone a design
review.

Can we get a volunteer to at least give it a quick once-over? I don't
think it's ideal to be doing this in the last CF but neither is it
very appetizing to just shift it to the next CF without a review after
two years...

On Thu, 27 Feb 2020 at 10:58, Nikita Glukhov  wrote:


Hi, hackers!

Attached patches implement several useful jsonpath syntax extensions.
I already published them two years ago in the original SQL/JSON thread,
but then after creation of separate threads for SQL/JSON functions and
JSON_TABLE I forgot about them.

A brief description of the patches:

1. Introduced new jsonpath modifier 'pg' which is used for enabling
PostgreSQL-specific extensions.  This feature was already proposed in the
discussion of jsonpath's like_regex implementation.

2. Added support for raw jbvObject and jbvArray JsonbValues inside jsonpath
engine.  Now, jsonpath can operate with JSON arrays and objects only in
jbvBinary form.  But with introduction of array and object constructors in
patches #4 and #5 raw in-memory jsonb containers can appear in jsonpath engine.
In some places we can iterate through jbvArrays, in others we need to encode
jbvArrays and jbvObjects into jbvBinary.

3. SQL/JSON sequence construction syntax. A simple comma-separated list can be
used to concatenate single values or sequences into a single resulting sequence.

  SELECT jsonb_path_query('[1, 2, 3]', 'pg $[*], 4, 2 + 3');
   jsonb_path_query
  --
   1
   2
   3
   4
   5

  SELECT jsonb_path_query('{ "a": [1, 2, 3], "b": [4, 5] }',
 'pg ($.a[*], $.b[*]) ? (@ % 2 == 1)');
jsonb_path_query
  --
   1
   3
   5


Patches #4-#6 implement ECMAScript-like syntax constructors and accessors:

4. Array construction syntax.
This can also be considered as enclosing a sequence constructor into brackets.

  SELECT jsonb_path_query('[1, 2, 3]', 'pg [$[*], 4, 2 + 3]');
   jsonb_path_query
  --
   [1, 2, 3, 4, 5]

Having this feature, jsonb_path_query_array() becomes somewhat redundant.


5. Object construction syntax.  It is useful for constructing derived objects
from the interesting parts of the original object.  (But this is not sufficient
to "project" each object in array, item method like '.map()' is needed here.)

  SELECT jsonb_path_query('{"b": 2}', 'pg { a : 1, b : $.b, "x y" : $.b + 3 }');
  jsonb_path_query
  ---
   { "a" : 1, "b": 3, "x y": 5 }

Fields with empty values are simply skipped regardless of lax/strict mode:

  SELECT jsonb_path_query('{"a": 1}', 'pg { b : $.b, a : $.a ? (@ > 1) }');
   jsonb_path_query
  --
   {}


6. Object subscription syntax.  This gives us the ability to specify what key
to extract at runtime.  The syntax is the same as ordinary array subscription
syntax.

  -- non-existent $.x is simply skipped in lax mode
  SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$.a, "x", "a"]');
   jsonb_path_query
  --
   "c"
   "b"

  SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$fld]', '{"fld": "b"}');
   jsonb_path_query
  --
   "c"

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




--
greg









Re: New Object Access Type hooks

2022-03-21 Thread Andrew Dunstan


On 3/21/22 15:57, Mark Dilger wrote:
>> On Mar 18, 2022, at 3:04 PM, Andrew Dunstan  wrote:
>>
>> I haven't looked at it in detail, but regarding the test code I'm not
>> sure why there's a .control file, since this isn't a loadable extension,
>> nor why there's a test_oat_hooks.h file.
> The .control file exists because the test defines a loadable module which 
> defines the hooks. 



To the best of my knowledge .control files are only used by extensions,
not by other modules. They are only referenced in
src/backend/commands/extension.c in the backend code. For example,
auto_explain, which is a loadable module but not an extension, does not
have one, and I bet if you remove it you'll find this will work just fine.


cheers


andrew


-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Patch proposal - parameter to limit amount of FPW because of hint bits per second

2022-03-21 Thread Peter Geoghegan
On Mon, Mar 21, 2022 at 12:58 AM Michail Nikolaev
 wrote:
> Hm, not sure here
> AFAIK current implementation does not produce repeated FPIs. Page is
> marked as dirty on the first bit. So, others LP_DEAD (if not set by
> single scan) do not generate FPI until checkpoint is ready.

There is one FPI per checkpoint for any leaf page that is modified
during that checkpoint. The difference between having that happen once
or twice per leaf page and having that happen many more times per leaf
page could be very large.

Of course it's true that that might not make that much difference. Who
knows? But if you're not willing to measure it then we'll never know.
What version are you using here? How frequently were checkpoints
occurring in the period in question, and how does that compare to
normal? You didn't even include this basic information.

Many things have changed in this area already, and it's rather unclear
how much just upgrading to Postgres 14 would help. I think that it's
possible that it would help you here a great deal. I also think it's
possible that it wouldn't help at all. I don't know which it is, and I
wouldn't expect to know without careful testing -- it's too
complicated, and likely would be even if all of the information about
the application is available.

The main reason that this can be so complex is that FPIs are caused by
more frequent checkpoints, but *also* cause more frequent checkpoints
in turn. So you could have a "death spiral" with FPIs -- the effect is
nonlinear, which has the potential to lead to pathological, chaotic
behavior. The impact on response time is *also* nonlinear and chaotic,
in turn.

Sometimes it's possible to address things like this quite well with
relatively simple solutions, that at least work well in most cases --
just avoiding getting into a "death spiral" might be all it takes. As
I said, maybe that won't be possible here, but it should be carefully
considered first. Not setting LP_DEAD bits because there are currently
"too many FPIs" requires defining what that actually means, which
seems very difficult because of these nonlinear dynamics. What do you
do when there were too many FPIs for a long time, but also too much
avoiding them earlier on? It's very complicated.

That's why I'm emphasizing solutions that focus on limiting the
downside of not setting LP_DEAD bits, which is local information (not
system wide information) that is much easier to understand and target
in the implementation.

-- 
Peter Geoghegan




Re: Support logical replication of DDLs

2022-03-21 Thread Zheng Li
Hi Japin,

> You should use a different user whose name has a different length from your
> current one.
> For example:
>
> px@localhost$ make check-world

This is fixed in the latest commit:
https://github.com/zli236/postgres/commits/ddl_replication

Thanks,
Zheng




RE: WIP: Aggregation push-down

2022-03-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi everyone.

I develop PostgreSQL extensions such as FDWs in my work.
I'm interested in using PostgreSQL for OLAP.

I think that this patch is really useful for OLAP queries.
Furthermore, I think it would be even more useful if it also worked on foreign
tables.  I changed this patch a little and confirmed that this idea holds.
The following are the things I found and the differences between my prototype
and this patch.
  1. Things I found
   I executed a query containing a join of a postgres_fdw foreign table with a
local table, plus aggregation of the join result.
   In my setting, my prototype reduced this query's response time by 93%.
  2. Differences between my prototype and this patch
   (1) Aggregation on a foreign table is pushed down if the FDW can push down
partial aggregation
   (2) postgres_fdw pushes down some partial aggregations
I attached my prototype source code and the details of my experiment.
I would like to resume development of this patch if there is some possibility
of its functionality being accepted.  I contacted Mr. Houska about resuming
development of this patch, and he advised me to ask on pgsql-hackers whether
any reviewers or committers are interested in working on the patch.
Is anyone interested in my work?

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation



v17-0004-pushdown-aggregation-foreign-table.patch
Description: v17-0004-pushdown-aggregation-foreign-table.patch
1. Restrictions of my prototype
(1) aggregate functions are avg, sum, count, min, max
(2) the argument of sum or avg is a bigint var

2. My experiment settings
(1) hardware settings
  cpu:Intel Corei5-9500 processor(3.0 GHz/9MB cache)(6 cores)
  ram:DDR4 3200Mhz 128GB
  storage:512GB SSD(M.2 NVMe)
(2) software
  os: CentOS7
  postgresql source code:14(in development) with commit-id 
947456a823d6b0973b68c6b38c8623a0504054e7
  # build by "make world"
  patches I applied:
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch
(3) PostgreSQL settings
  max_parallel_workers_per_gather=6
  work_mem=100
(4) tables and data
  execute the following commands using postgresql's client tool such as psql
--
\c tmp;
create table log as select 1::bigint as id, 1::bigint as size, 1::bigint as 
total from generate_series(1,10) t;

create table master as select t as id, 'hoge' || mod(t, 1000) as name from 
generate_series(1,100) t;
create index master_idx on master(id);

create extension postgres_fdw;
create server local_server foreign data wrapper postgres_fdw OPTIONS (host 
'localhost', port '5432', dbname 'tmp');
create user mapping for user server local_server options (user 'postgres', 
password 'postgres');
create foreign table f_log(id bigint, size bigint, total bigint) server 
local_server options (table_name 'log');

analyze;

set enable_agg_pushdown=true;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;

set enable_agg_pushdown=false;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;
--


Re: freeing bms explicitly

2022-03-21 Thread Zhihong Yu
>
> Hi,
> I was looking at calls to bms_free() in PG code.
>
> e.g. src/backend/commands/publicationcmds.c line 362
>
> bms_free(bms);
>
> The above is just an example, there're other calls to bms_free().
> Since the bms is allocated from some execution context, I wonder why this
> call is needed.
>
> When the underlying execution context wraps up, isn't the bms freed ?
>
> Cheers
>
>
>
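(For reference, a minimal sketch of the behavior in question: a Bitmapset
palloc'd in a context is released when that context is, with no explicit
bms_free().)

    MemoryContext tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
                                                 "bms demo",
                                                 ALLOCSET_DEFAULT_SIZES);
    MemoryContext oldcxt = MemoryContextSwitchTo(tmpcxt);
    Bitmapset  *bms = bms_add_member(NULL, 42);     /* palloc'd in tmpcxt */

    MemoryContextSwitchTo(oldcxt);
    MemoryContextDelete(tmpcxt);    /* frees bms along with everything else */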


Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-21 Thread Thomas Munro
On Tue, Mar 22, 2022 at 6:14 AM David Christensen
 wrote:
> Updated to include the V3 fixes as well as the unsigned int/enum fix.

Hi David,

I ran this through pgindent and adjusted some remaining
non-project-style whitespace, and took it for a spin.  Very minor
comments:

pg_waldump: error: could not parse valid relation from ""/ (expecting
"tablespace OID/database OID/relation filenode")
-> There was a stray "/" in that message, which I've removed in the attached.

pg_waldump: error: could not parse valid relation from "1664/0/1262"
(expecting "tablespace OID/database OID/relation filenode")
-> Why not?  Shared relations like pg_database have invalid database
OID, so I think it should be legal to write --relation=1664/0/1262.  I
took out that restriction.

+			if (sscanf(optarg, "%u", &forknum) != 1 ||
+				forknum >= MAX_FORKNUM)
+			{
+				pg_log_error("could not parse valid fork number (0..%d) \"%s\"",
+							 MAX_FORKNUM - 1, optarg);
+				goto bad_argument;
+			}

I guess you did this because init fork references aren't really
expected in the WAL, but I think it's more consistent to allow up to
MAX_FORKNUM, not least because your documentation mentions 3 as a
valid value.  So I adjust this to allow MAX_FORKNUM.  Make sense?

Here are some more details I noticed, as a likely future user of this
very handy feature, which I haven't changed, because they seem more
debatable and you might disagree...

1.  I think it'd be less surprising if the default value for --fork
wasn't 0... why not show all forks?
2.  I think it'd be less surprising if --fork without --relation
either raised an error (like --block without --relation), or were
allowed, with the meaning "show me this fork of all relations".
3.  It seems funny to have no short switch for --fork when everything
else has one... what about -F?
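For illustration, a combined invocation with these filters might look like
this (option names taken from the patch; the segment path and OIDs are made
up):

    pg_waldump --relation=1663/16384/16385 --block=7 --fullpage \
        pg_wal/000000010000000000000001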
From 596181e9dfe2db9d5338b3ac3c899d230fe0fc78 Mon Sep 17 00:00:00 2001
From: David Christensen 
Date: Fri, 25 Feb 2022 12:52:56 -0600
Subject: [PATCH v5] Add additional filtering options to pg_waldump.

This feature allows you to only output records that touch a specific
RelFileNode and optionally a BlockNumber and ForkNum in that relation.  We
also add the independent ability to filter Full Page Write records.

Author: David Christensen 
Reviewed-by: Peter Geoghegan 
Reviewed-by: Japin Li 
Reviewed-by: Bharath Rupireddy 
Reviewed-by: Cary Huang 
Reviewed-by: Thomas Munro 
Discussion: https://postgr.es/m/lzzgmgm6e5.fsf%40veeddrois.attlocal.net
---
 doc/src/sgml/ref/pg_waldump.sgml |  48 +++
 src/bin/pg_waldump/pg_waldump.c  | 132 ++-
 2 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 5735a161ce..f157175764 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -100,6 +100,44 @@ PostgreSQL documentation
   
  
 
+ 
+  -k block
+  --block=block
+  
+   
+Display only records touching the given block. (Requires also
+providing the relation via --relation.)
+   
+  
+ 
+
+ 
+  --fork=fork
+  
+   
+When using the --relation filter, output only records
+from the given fork.  The valid values here are 0
+for the main fork, 1 for the Free Space
+Map, 2 for the Visibility Map,
+and 3 for the Init fork.  If unspecified, defaults
+to the main fork.
+   
+  
+ 
+
+ 
+  -l tbl/db/rel
+  --relation=tbl/db/rel
+  
+   
+Display only records touching the given relation.  The relation is
+specified via tablespace OID, database OID, and relfilenode separated
+by slashes, for instance, 1234/12345/12345.  This
+is the same format used for relations in the WAL dump output.
+   
+  
+ 
+
  
   -n limit
   --limit=limit
@@ -183,6 +221,16 @@ PostgreSQL documentation

  
 
+ 
+   -w
+   --fullpage
+   
+   
+   Filter records to only those which have full page writes.
+   
+   
+ 
+
  
   -x xid
   --xid=xid
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index fc081adfb8..eb0c9b2dcb 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -55,6 +55,12 @@ typedef struct XLogDumpConfig
 	bool		filter_by_rmgr_enabled;
 	TransactionId filter_by_xid;
 	bool		filter_by_xid_enabled;
+	RelFileNode filter_by_relation;
+	bool		filter_by_relation_enabled;
+	BlockNumber filter_by_relation_block;
+	bool		filter_by_relation_block_enabled;
+	ForkNumber	filter_by_relation_forknum;
+	bool		filter_by_fpw;
 } XLogDumpConfig;
 
 typedef struct Stats

Re: shared-memory based stats collector - v66

2022-03-21 Thread Melanie Plageman
On Sun, Mar 20, 2022 at 4:56 PM Melanie Plageman
 wrote:
>
> Addressed all of these points in
> v2-0001-add-replica-cleanup-tests.patch
>
> also added a new test file in
> v2-0002-Add-TAP-test-for-discarding-stats-after-crash.patch
> testing correct behavior after a crash and when stats file is invalid
>

Attached is the last of the tests confirming clean up for stats in the
shared stats hashtable (these are for the subscription stats).

I thought that maybe these tests could now use
pg_stat_force_next_flush() instead of poll_query_until() but I wasn't
sure how to ensure that the error has happened and the pending entry has
been added before setting force_next_flush.

I also added in tests that resetting subscription stats works as
expected.

- Melanie
From ffb83cc6ad2941f1d01b42b55dd0615a011d59cf Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 21 Mar 2022 14:52:55 -0400
Subject: [PATCH] add subscriber stats reset and drop tests

---
 src/test/subscription/t/026_stats.pl | 303 ---
 1 file changed, 230 insertions(+), 73 deletions(-)

diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index a42ea3170e..e86bfb4fea 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -18,83 +18,240 @@ my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 $node_subscriber->init(allows_streaming => 'logical');
 $node_subscriber->start;
 
-# Initial table setup on both publisher and subscriber. On subscriber we
-# create the same tables but with primary keys. Also, insert some data that
-# will conflict with the data replicated from publisher later.
-$node_publisher->safe_psql(
-	'postgres',
-	qq[
-BEGIN;
-CREATE TABLE test_tab1 (a int);
-INSERT INTO test_tab1 VALUES (1);
-COMMIT;
-]);
-$node_subscriber->safe_psql(
-	'postgres',
-	qq[
-BEGIN;
-CREATE TABLE test_tab1 (a int primary key);
-INSERT INTO test_tab1 VALUES (1);
-COMMIT;
-]);
-
-# Setup publication.
-my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
-$node_publisher->safe_psql('postgres',
-	"CREATE PUBLICATION tap_pub FOR TABLE test_tab1;");
+
+sub create_sub_pub_w_errors
+{
+	my ($publisher, $subscriber, $db, $table_name) = @_;
+	# Initial table setup on both publisher and subscriber. On subscriber we
+	# create the same tables but with primary keys. Also, insert some data that
+	# will conflict with the data replicated from publisher later.
+	$publisher->safe_psql(
+		$db,
+		qq[
+	BEGIN;
+	CREATE TABLE $table_name(a int);
+	INSERT INTO $table_name VALUES (1);
+	COMMIT;
+	]);
+	$subscriber->safe_psql(
+		$db,
+		qq[
+	BEGIN;
+	CREATE TABLE $table_name(a int primary key);
+	INSERT INTO $table_name VALUES (1);
+	COMMIT;
+	]);
+
+	# Set up publication.
+	my $pub_name = $table_name . '_pub';
+	my $publisher_connstr = $publisher->connstr . qq( dbname=$db);
+
+	$publisher->safe_psql($db,
+		qq(CREATE PUBLICATION $pub_name FOR TABLE $table_name));
+
+	# Create subscription. The tablesync for table on subscription will enter into
+	# infinite error loop due to violating the unique constraint.
+	my $sub_name = $table_name . '_sub';
+	$subscriber->safe_psql($db,
+		qq(CREATE SUBSCRIPTION $sub_name CONNECTION '$publisher_connstr' PUBLICATION $pub_name)
+	);
+
+	$publisher->wait_for_catchup($sub_name);
+
+	# TODO: can this be replaced with pg_stat_force_next_flush() and a test
+	# that sync error > 0?
+	# Wait for the tablesync error to be reported.
+	$node_subscriber->poll_query_until(
+		$db,
+		qq[
+	SELECT sync_error_count > 0
+	FROM pg_stat_subscription_stats
+	WHERE subname = '$sub_name'
+	]) or die qq(Timed out while waiting for tablesync errors for subscription '$sub_name');
+
+	# Truncate test_tab1 so that tablesync worker can continue.
+	$subscriber->safe_psql($db, qq(TRUNCATE $table_name));
+
+	# Wait for initial tablesync to finish.
+	$subscriber->poll_query_until(
+		$db,
+		qq[
+	SELECT count(1) = 1 FROM pg_subscription_rel
+	WHERE srrelid = '$table_name'::regclass AND srsubstate in ('r', 's')
+	]) or die qq(Timed out while waiting for subscriber to synchronize data for table '$table_name'.);
+
+	# Check test table on the subscriber has one row.
+	my $result = $subscriber->safe_psql($db, qq(SELECT a FROM $table_name));
+	is($result, qq(1), qq(Check that table '$table_name' now has 1 row.));
+
+	# Insert data to test table on the publisher, raising an error on the
+	# subscriber due to violation of the unique constraint on test table.
+	$publisher->safe_psql($db, qq(INSERT INTO $table_name VALUES (1)));
+
+	# TODO: can this be replaced with a pg_stat_force_next_flush() and a test
+	# that apply error > 0?
+	# Wait for the apply error to be reported.
+	$subscriber->poll_query_until(
+		$db,
+		qq[
+	SELECT apply_error_count > 0
+	FROM pg_stat_subscription_stats
+	WHERE subname = '$sub_name'
+	]) or die qq(Timed out while waiting for apply error for subscription '$sub_name');
+
+	# Truncate test table so that apply work

Re: logical decoding and replication of sequences

2022-03-21 Thread Tomas Vondra
On 3/21/22 14:05, Peter Eisentraut wrote:
> On 20.03.22 23:55, Tomas Vondra wrote:
>> Attached is a rebased patch, addressing most of the remaining issues.
> 
> This looks okay to me, if the two FIXMEs are addressed.  Remember to
> also update protocol.sgml if you change LOGICAL_REP_MSG_SEQUENCE.

Thanks. Do we want to use a different constant for the sequence message?
I've used 'X' for the WIP patch, but maybe there's a better value?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: freeing bms explicitly

2022-03-21 Thread Tom Lane
Zhihong Yu  writes:
>> I was looking at calls to bms_free() in PG code.
>> e.g. src/backend/commands/publicationcmds.c line 362
>>  bms_free(bms);
>> The above is just an example, there're other calls to bms_free().
>> Since the bms is allocated from some execution context, I wonder why this
>> call is needed.
>> 
>> When the underlying execution context wraps up, isn't the bms freed ?

Yeah, that's kind of pointless --- and the pfree(rfnode) after it is even
more pointless, since it'll free only the top node of that expression
tree.  Not to mention the string returned by TextDatumGetCString, and
whatever might be leaked during the underlying catalog accesses.

If we were actually worried about transient space consumption of this
function, it'd be necessary to do a lot more than this.  It doesn't
look to me like it's worth worrying about though -- it doesn't seem
like it could be hit more than once per query in normal cases.

regards, tom lane




Re: freeing bms explicitly

2022-03-21 Thread Zhihong Yu
On Mon, Mar 21, 2022 at 3:05 PM Tom Lane  wrote:

> Zhihong Yu  writes:
> >> I was looking at calls to bms_free() in PG code.
> >> e.g. src/backend/commands/publicationcmds.c line 362
> >>  bms_free(bms);
> >> The above is just an example, there're other calls to bms_free().
> >> Since the bms is allocated from some execution context, I wonder why
> this
> >> call is needed.
> >>
> >> When the underlying execution context wraps up, isn't the bms freed ?
>
> Yeah, that's kind of pointless --- and the pfree(rfnode) after it is even
> more pointless, since it'll free only the top node of that expression
> tree.  Not to mention the string returned by TextDatumGetCString, and
> whatever might be leaked during the underlying catalog accesses.
>
> If we were actually worried about transient space consumption of this
> function, it'd be necessary to do a lot more than this.  It doesn't
> look to me like it's worth worrying about though -- it doesn't seem
> like it could be hit more than once per query in normal cases.
>
> regards, tom lane
>

Thanks Tom for replying.

What do you think of the following patch ?

Cheers


rfcolumn-free.patch
Description: Binary data


Re: Estimating HugePages Requirements?

2022-03-21 Thread Nathan Bossart
On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
> A simple approach could be to just set log_min_messages to PANIC before
> exiting.  I've attached a patch for this.  With this patch, we'll still see
> a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
> running server, and there will be no extra output as long as the user sets
> log_min_messages to INFO or higher (i.e., not a DEBUG* value).  For
> comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
> extra output as long as the user sets log_min_messages to DEBUG2 or higher.
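(Presumably something along these lines; this is a guess at the shape of the
change, not a quote from the patch:)

    /* suppress shutdown chatter after printing the runtime-computed value */
    SetConfigOption("log_min_messages", "PANIC",
                    PGC_INTERNAL, PGC_S_OVERRIDE);
    exit(0);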

I created a commitfest entry for this:

https://commitfest.postgresql.org/38/3596/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

2022-03-21 Thread Jimmy Yih
Tom Lane  wrote:
> Hence, I propose the attached.  0001 is pure refactoring: it hopefully
> clears up the confusion about which "relkind" is which, and it also
> saves a couple of redundant syscache fetches in RemoveRelations.
> Then 0002 adds the actual bug fix as well as a test case that does
> deadlock with unpatched code.

The proposed patches look great and make much more sense. I see you've
already squashed and committed in
7b6ec86532c2ca585d671239bba867fe380448ed. Thanks!

--
Jimmy Yih (VMware)
Gaurab Dey (VMware)



Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-21 Thread David Christensen
On Mon, Mar 21, 2022 at 4:39 PM Thomas Munro  wrote:

[snip]

I guess you did this because init fork references aren't really
> expected in the WAL, but I think it's more consistent to allow up to
> MAX_FORKNUM, not least because your documentation mentions 3 as a
> valid value.  So I adjust this to allow MAX_FORKNUM.  Make sense?
>

Makes sense, but I think I'd actually thought it was +1 of the max forks,
so you give me more credit than I deserve in this case... :-)


> Here are some more details I noticed, as a likely future user of this
> very handy feature, which I haven't changed, because they seem more
> debatable and you might disagree...
>
> 1.  I think it'd be less surprising if the default value for --fork
> wasn't 0... why not show all forks?
>

Agreed; made it default to all, with the ability to filter down if desired.


> 2.  I think it'd be less surprising if --fork without --relation
> either raised an error (like --block without --relation), or were
> allowed, with the meaning "show me this fork of all relations".
>

Agreed; reworked to support the use case of only showing target forks.


> 3.  It seems funny to have no short switch for --fork when everything
> else has one... what about -F?
>

Good idea; I hadn't seen capitals in the getopt list so didn't consider
them, but I like this.

Enclosed is v6, incorporating these fixes and docs tweaks.

Best,

David


v6-0001-Add-additional-filtering-options-to-pg_waldump.patch
Description: Binary data


Re: New Object Access Type hooks

2022-03-21 Thread Mark Dilger


> On Mar 21, 2022, at 1:30 PM, Andrew Dunstan  wrote:
> 
> To the best of my knowledge .control files are only used by extensions,
> not by other modules. They are only referenced in
> src/backend/commands/extension.c in the backend code. For example,
> auto_explain which is a loadable module but not en extension does not
> have one, and I bet if you remove it you'll find this will work just fine.

Fixed, also with adjustments to Joshua's function comments.



v3-0001-Add-String-object-access-hooks.patch
Description: Binary data


v3-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: PoC: using sampling to estimate joins / complex conditions

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-21 01:06:37 +0100, Tomas Vondra wrote:
> Yeah, I haven't updated some of the test output because some of those
> changes are a bit wrong (and I think that's fine for a PoC patch). I
> should have mentioned that in the message, though. Sorry about that.

Given that the patch hasn't been updated since January and that it's a PoC in
the final CF, it seems like it should at least be moved to the next CF? Or
perhaps returned?

I've just marked it as waiting-on-author for now - iirc that leads to fewer
reruns by cfbot once it's failing...


> 2) The correlated samples are currently built using a query, executed
> through SPI in a loop. So given a "driving" sample of 30k rows, we do
> 30k lookups - that'll take time, even if we do that just once and cache
> the results.

Ugh, yea, that's going to increase overhead by at least a few factors.


> I'm sure there there's room for some improvement, though - for example
> we don't need to fetch all columns included in the statistics object,
> but just stuff referenced by the clauses we're estimating. That could
> improve chance of using IOS etc.

Yea. Even just avoiding SPI / planner + executor seems likely to be a
big win.


It seems like one more of the cases where we really need logic to recognize "cheap"
vs "expensive" plans, so that we only do sampling when useful. I don't think
that's solved just by having a declarative syntax.


Greetings,

Andres Freund




Re: Add sub-transaction overflow status in pg_stat_activity

2022-03-21 Thread Andres Freund
On 2022-01-14 11:25:45 -0500, Tom Lane wrote:
> Julien Rouhaud  writes:
> > Like many I previously had to investigate a slowdown due to sub-transaction
> > overflow, and even with the information available in a monitoring view (I 
> > had
> > to rely on a quick hackish extension as I couldn't patch postgres) it's not
> > terribly fun to do this way.  On top of that log analyzers like pgBadger 
> > could
> > help to highlight such a problem.
> 
> It feels to me like far too much effort is being invested in fundamentally
> the wrong direction here.  If the subxact overflow business is causing
> real-world performance problems, let's find a way to fix that, not put
> effort into monitoring tools that do little to actually alleviate anyone's
> pain.

There seems to be some agreement on this (I certainly do agree). Thus it seems
we should mark the CF entry as rejected?

It's been failing on cfbot for weeks... 
https://cirrus-ci.com/task/5289336424890368?logs=docs_build#L347
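
For context: a backend's subxid cache holds PGPROC_MAX_CACHED_SUBXIDS (64)
entries, and a top-level transaction that assigns xids to more
subtransactions than that is marked overflowed, forcing snapshots to consult
pg_subtrans. A minimal reproducer sketch (table name illustrative):

    CREATE TEMP TABLE t (i int);

    DO $$
    BEGIN
      FOR i IN 1..70 LOOP
        BEGIN                        -- each EXCEPTION block is a subtransaction
          INSERT INTO t VALUES (i);  -- the write forces an xid assignment
        EXCEPTION WHEN OTHERS THEN
          NULL;
        END;
      END LOOP;
    END $$;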




Re: TAP output format in pg_regress

2022-03-21 Thread Andres Freund
Hi,

On 2022-02-22 15:10:11 +0100, Daniel Gustafsson wrote:
> > On 22 Feb 2022, at 00:08, Daniel Gustafsson  wrote:
> 
> > I'll fix that.
> 
> The attached v3 fixes that thinko, and cleans up a lot of the output which
> isn't diagnostic per the TAP spec to make it less noisy.  It also fixes the
> tag support used in the ECPG tests and includes a few small cleanups.  There
> is a blank line printed which needs to be fixed, but I'm running out of time
> and wanted to get a non-broken version on the list before putting it aside
> for today.
> 
> The error paths that exit(2) the testrun should be converted to "bail out"
> lines when running with TAP output, but apart from that I think it's fairly
> spec compliant.
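
For reference, the TAP constructs being discussed look like this — an
illustrative fragment, not output from the patch:

    1..3
    ok 1 - boolean
    not ok 2 - text
    # diagnostic lines are prefixed with '#'
    Bail out! terminating testrun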

This is failing with segmentation faults on cfbot:
https://cirrus-ci.com/task/4879227009892352?logs=test_world#L21

Looks like something around isolationtester is broken?

Unfortunately there are no useful backtraces for isolationtester. Not sure
what's up there.


Seems like we might not have energy to push this forward in the next two
weeks, so maybe the CF entry should be marked returned or moved for now?

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-03-21 Thread Andres Freund
Hi,

On 2022-03-07 13:10:32 +0100, Magnus Hagander wrote:
> Meanwhile here is an updated version based on your other comments above, as
> well as those from Julien.
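
Going by the GUC's name, usage would presumably be along these lines (the
exact semantics are defined by the patch; the value here is illustrative):

    -- warn whenever JIT work exceeds 25% of total query runtime
    SET jit_warn_above_fraction = 0.25;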

This fails on cfbot, due to compiler warnings:
https://cirrus-ci.com/task/5127667648299008?logs=mingw_cross_warning#L390

Greetings,

Andres Freund




Re: pgcrypto: Remove internal padding implementation

2022-03-21 Thread Nathan Bossart
On Wed, Mar 16, 2022 at 11:12:06AM +0100, Peter Eisentraut wrote:
> Right, the previous behaviors were clearly faulty.  I have updated the
> commit message to call out the behavior change more clearly.
> 
> This patch is now complete from my perspective.

I took a look at this patch and found nothing of concern.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Fast COPY FROM based on batch insert

2022-03-21 Thread Andres Freund
On 2021-06-07 16:16:58 +0500, Andrey Lepikhov wrote:
> The second version of the patch fixes problems detected by the FDW regression
> tests and shows the differences in error reporting between the tuple-by-tuple
> and batched COPY approaches.

Patch doesn't apply and likely hasn't for a while...




Re: Identify missing publications from publisher while create/alter subscription.

2022-03-21 Thread Andres Freund
On 2022-02-13 19:34:05 +0530, vignesh C wrote:
> Thanks for the comments; the attached v14 patch has the corresponding changes.

The patch needs a rebase, it currently fails to apply:
http://cfbot.cputube.org/patch_37_2957.log




Re: Make mesage at end-of-recovery less scary.

2022-03-21 Thread Andres Freund
Hi,

On 2022-03-04 09:43:59 +0900, Kyotaro Horiguchi wrote:
> On second thought, the two seem to repeat the same things.  Thus I
> merged the two comments together.  In this version 16 it looks like
> this.

Patch currently fails to apply, needs a rebase:
http://cfbot.cputube.org/patch_37_2490.log

Greetings,

Andres Freund




Re: Parameter for planner estimate of recursive queries

2022-03-21 Thread Andres Freund
Hi,

On 2022-03-10 17:42:14 +, Simon Riggs wrote:
> Shall I set this as Ready For Committer?

Currently this CF entry fails on cfbot: 
https://cirrus-ci.com/task/4531771134967808?logs=test_world#L1158

[16:27:35.772] #   Failed test 'no parameters missing from 
postgresql.conf.sample'
[16:27:35.772] #   at t/003_check_guc.pl line 82.
[16:27:35.772] #  got: '1'
[16:27:35.772] # expected: '0'
[16:27:35.772] # Looks like you failed 1 test of 3.
[16:27:35.772] [16:27:35] t/003_check_guc.pl ..
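
The usual fix for this failure is a matching entry in postgresql.conf.sample
for the new parameter, along these lines (parameter name and value
illustrative):

    #recursive_worktable_factor = 10.0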

Marked as waiting on author.

- Andres




Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-25 19:21:01 +0800, Julien Rouhaud wrote:
> I rebased the patchset in the attached v9.  Please double check that I didn't
> miss anything in the rebase.

Fails to apply at the moment: http://cfbot.cputube.org/patch_37_2947.log

Marked as waiting for author.

- Andres




Re: [PATCH] Add TOAST support for several system tables

2022-03-21 Thread Andres Freund
On 2022-02-28 18:08:48 -0500, Tom Lane wrote:
> Sofia Kopikova  writes:
> > ACL lists in tables may potentially be large in size. I suggest adding 
> > TOAST support for system tables, namely pg_class, pg_attribute and 
> > pg_largeobject_metadata, for they include ACL columns.
> 
> TBH, the idea of adding a toast table to pg_class scares the daylights
> out of me.  I do not for one minute believe that you've fixed every
> problem that will cause, nor do I think "allow wider ACLs" is a
> compelling enough reason to take the risk.
> 
> I wonder if it'd be practical to move the ACLs for relations
> and attributes into some new table, indexed like pg_depend or
> pg_description, so that we could dodge that whole problem.
> pg_attrdef is a prototype for how this could work.
> 
> (Obviously, once we had such a table the ACLs for other things
> could be put in it, but I'm not sure that the ensuing breakage
> would be justified for other sorts of objects.)

As there has been no progress since this email, I'm marking this CF entry as
returned with feedback for now.

- Andres




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-21 Thread Yugo NAGATA
Hi Ishii-san,

On Sun, 20 Mar 2022 09:52:06 +0900 (JST)
Tatsuo Ishii  wrote:

> Hi Yugo,
> 
> I have looked into the patch and I noticed that <xref linkend=... endterm=...> is used in pgbench.sgml. e.g.
> 
> 
> 
> AFAIK this is the only place where "endterm" is used. In other places
> the "link" tag is used instead.

Thank you for pointing it out.

I've checked other places using <xref> to refer to other elements, and found
that "xreflabel"s are used in the referenced tags. So, I'll fix it
in this style.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-21 Thread Yugo NAGATA
On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
Tatsuo Ishii  wrote:

> > Hi Yugo,
> > 
> > I tested with serialization error scenario by setting:
> > default_transaction_isolation = 'repeatable read'
> > The result was:
> > 
> > $ pgbench -t 10 -c 10 --max-tries=10 test
> > transaction type: <builtin: TPC-B (sort of)>
> > scaling factor: 10
> > query mode: simple
> > number of clients: 10
> > number of threads: 1
> > maximum number of tries: 10
> > number of transactions per client: 10
> > number of transactions actually processed: 100/100
> > number of failed transactions: 0 (0.000%)
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> > latency average = 5.306 ms
> > initial connection time = 15.575 ms
> > tps = 1884.516810 (without initial connection time)
> > 
> > I had a hard time understanding what those numbers mean:
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> > 
> > It seems "total number of retries" matches the number of ERRORs
> > reported by PostgreSQL. Good. What I am not sure about is "number of
> > transactions retried". What does this mean?
> 
> Oh, ok. I see it now. It turns out that "number of transactions
> retried" does not actually mean the number of transactions
> retried. Suppose pgbench executes the following in a session:
> 
> BEGIN;-- transaction A starts
> :
> (ERROR)
> ROLLBACK; -- transaction A aborts
> 
> (retry)
> 
> BEGIN;-- transaction B starts
> :
> (ERROR)
> ROLLBACK; -- transaction B aborts
> 
> (retry)
> 
> BEGIN;-- transaction C starts
> :
> END;  -- finally succeeds
> 
> In this case "total number of retries:" = 2 and "number of
> transactions retried:" = 1. In this patch transactions A, B and C are
> regarded as the "same" transaction, so the retried transaction count
> becomes 1. But it's confusing to use the word "transaction" here
> because A, B and C are different transactions. I would think it's
> better to use a different word instead of "transaction", something
> like "cycle"? i.e.
> 
> number of cycles retried: 35 (35.000%)

In the original patch by Marina Polyakova it was "number of retried",
but I changed it to "number of transactions retried" because I felt
it was confusing with "number of retries". I chose the word "transaction"
because a transaction ends in any one of successful commit, skip, or
failure, after possible retries.

Well, I agree that it is somewhat confusing wording. If we can find a
nice word to resolve the confusion, I don't mind changing it. Maybe we
can use "executions" as well as "cycles". However, I am not sure that
the situation is improved by such a word, because what it exactly means
would still be unclear to users.

Another idea is to instead report only "the number of successfully
retried transactions", which does not include "failed transactions",
that is, transactions that failed after retries, like this:

 number of transactions actually processed: 100/100
 number of failed transactions: 0 (0.000%)
 number of successfully retried transactions: 35 (35.000%)
 total number of retries: 74 

The meaning is clear and there seems to be no confusion.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Add connection active, idle time to pg_stat_activity

2022-03-21 Thread Andres Freund
Hi,

On 2022-02-04 10:58:24 +0100, Sergey Dudoladov wrote:
> Thank you for the contribution. I included both of your diffs with
> minor changes.

This currently doesn't apply: http://cfbot.cputube.org/patch_37_3405.log

Could you rebase? Marking as waiting on author for now.

- Andres




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-03-21 Thread Andres Freund
Hi,

On 2022-02-19 11:06:18 -0500, Melanie Plageman wrote:
> v21 rebased with compile errors fixed is attached.

This currently doesn't apply (mea culpa likely): 
http://cfbot.cputube.org/patch_37_3272.log

Could you rebase? Marked as waiting-on-author for now.

- Andres




Re: Mingw task for Cirrus CI

2022-03-21 Thread Andres Freund
Hi,

On 2022-02-25 19:44:27 +0300, Melih Mutlu wrote:
> I've been working on adding Windows+MinGW environment into cirrus-ci tasks
> (discussion about ci is here [1]).

This doesn't apply anymore: http://cfbot.cputube.org/patch_37_3575.log

Could you rebase?

Greetings,

Andres Freund




Re: PoC: using sampling to estimate joins / complex conditions

2022-03-21 Thread Tomas Vondra



On 3/22/22 00:35, Andres Freund wrote:
> Hi,
> 
> On 2022-01-21 01:06:37 +0100, Tomas Vondra wrote:
>> Yeah, I haven't updated some of the test output because some of those
>> changes are a bit wrong (and I think that's fine for a PoC patch). I
>> should have mentioned that in the message, though. Sorry about that.
> 
> Given that the patch hasn't been updated since January and that it's a PoC in
> the final CF, it seems like it should at least be moved to the next CF? Or
> perhaps returned?
> 
> I've just marked it as waiting-on-author for now - iirc that leads to fewer
> reruns by cfbot once it's failing...
> 

Either option works for me.

> 
>> 2) The correlated samples are currently built using a query, executed
>> through SPI in a loop. So given a "driving" sample of 30k rows, we do
>> 30k lookups - that'll take time, even if we do that just once and cache
>> the results.
> 
> Ugh, yea, that's going to increase overhead by at least a few factors.
> 
> 
>> I'm sure there's room for some improvement, though - for example
>> we don't need to fetch all columns included in the statistics object,
>> but just the stuff referenced by the clauses we're estimating. That could
>> improve the chances of using IOS etc.
> 
> Yea. Even just avoiding SPI / planner + executor seems likely to be a
> big win.
> 
> 
> It seems like one more of the cases where we really need logic to recognize
> "cheap" vs "expensive" plans, so that we only do sampling when useful. I don't
> think that's solved just by having a declarative syntax.
> 

Right.

I was thinking about walking the first table, collecting all the values,
and then doing a single IN () query for the second table - a bit like a
custom join (which seems a bit terrifying, TBH).
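
Roughly this shape, as a sketch (table and column names invented):

    -- one batched lookup instead of one SPI query per driving-sample row
    SELECT *
      FROM second_table
     WHERE join_key = ANY ($1);  -- $1: array of keys from the driving sample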

But even if we manage to make this much cheaper, there will still be
simple queries where it's going to be prohibitively expensive.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-14 01:39:30 +0100, Tomas Vondra wrote:
> Are you interested / willing to do some of this work?

This patch hasn't moved in the last two months. I think it may be time to
mark it as returned with feedback for now?

It's also failing tests, and has done so for months:

https://cirrus-ci.com/task/5308087077699584
https://api.cirrus-ci.com/v1/artifact/task/5308087077699584/log/src/test/regress/regression.diffs

Greetings,

Andres Freund




Re: Extensible storage manager API - smgr hooks

2022-03-21 Thread Andres Freund
Hi,

On 2021-06-30 05:36:11 +0300, Yura Sokolov wrote:
> Anastasia Lubennikova wrote on 2021-06-30 00:49:
> > Hi, hackers!
> > 
> > Many recently discussed features can make use of an extensible storage
> > manager API. Namely, storage level compression and encryption [1],
> > [2], [3], disk quota feature [4], SLRU storage changes [5], and any
> > other features that may want to substitute PostgreSQL storage layer
> > with their implementation (i.e. lazy_restore [6]).
> > 
> > Attached is a proposal to change the smgr API to make it extensible.  The
> > idea is to add a hook for plugins to get control in smgr and define
> > custom storage managers. The patch replaces the smgrsw[] array and smgr_sw
> > selector with an smgr() function that loads the f_smgr implementation.
> > 
> > As before it has only one implementation - smgr_md, which is wrapped
> > into smgr_standard().
> > 
> > To create a custom implementation, a developer needs to implement the
> > smgr API functions:
> > static const struct f_smgr smgr_custom =
> > {
> > .smgr_init = custominit,
> > ...
> > };
> > 
> > create a hook function
> > 
> > const f_smgr *
> > smgr_custom_hook(BackendId backend, RelFileNode rnode)
> > {
> > 	/* here we can also add some logic and choose which smgr
> > 	 * to use, based on rnode and backend */
> > 	return &smgr_custom;
> > }
> > 
> > and finally set the hook:
> > smgr_hook = smgr_custom_hook;
> > 
> > [1]
> > https://www.postgresql.org/message-id/flat/11996861554042...@iva4-dd95b404a60b.qloud-c.yandex.net
> > [2]
> > https://www.postgresql.org/message-id/flat/272dd2d9.e52a.17235f2c050.Coremail.chjischj%40163.com
> > [3] https://postgrespro.com/docs/enterprise/9.6/cfs
> > [4]
> > https://www.postgresql.org/message-id/flat/CAB0yre%3DRP_ho6Bq4cV23ELKxRcfhV2Yqrb1zHp0RfUPEWCnBRw%40mail.gmail.com
> > [5]
> > https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
> > [6]
> > https://wiki.postgresql.org/wiki/PGCon_2021_Fun_With_WAL#Lazy_Restore
> > 
> > --
> > 
> > Best regards,
> > Lubennikova Anastasia
> 
> Good day, Anastasia.
> 
> I also think smgr should be extended with different implementations aside
> from md. But how will a concrete implementation be chosen for a particular
> relation? I believe it should be an (immutable!) property of the tablespace,
> and should be passed to smgropen. The patch in its current state doesn't
> show a clear way to distinguish different implementations per relation.
> 
> I don't think the patch should be that invasive. smgrsw could be a pointer
> to an array instead of a static array as it is now, and then reln->smgr_which
> would keep the same meaning. Yes, it would then need a way to select a
> specific implementation, but something like a `char smgr_name[NAMEDATALEN]`
> field with a linear search in the (I believe) small smgrsw array should be
> enough.
> 
> Maybe I'm missing something?
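
A rough sketch of the registry shape described above (untested; names
invented):

    typedef struct f_smgr_entry
    {
        char          smgr_name[NAMEDATALEN];   /* e.g. "md" */
        const f_smgr *smgr;
    } f_smgr_entry;

    static f_smgr_entry *smgrsw;    /* now a pointer, not a static array */
    static int           n_smgrsw;

    static const f_smgr *
    smgr_by_name(const char *name)
    {
        for (int i = 0; i < n_smgrsw; i++)
            if (strcmp(smgrsw[i].smgr_name, name) == 0)
                return smgrsw[i].smgr;
        return NULL;                /* caller falls back to md, or errors */
    }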

There has been no activity on this thread for > 6 months. Therefore I'm
marking it as returned with feedback. Anastasia, if you want to work on this,
please do, but there's obviously no way it can be merged into 15...

Greetings,

Andres




Re: Allow batched insert during cross-partition updates

2022-03-21 Thread Andres Freund
Hi,

On 2021-08-24 12:03:59 +0900, Amit Langote wrote:
> Tomas committed the bug-fix, so attaching a rebased version of the
> patch for $subject.

This patch is in the current CF, but doesn't apply: 
http://cfbot.cputube.org/patch_37_2992.log
I've marked the entry as waiting-on-author.

Greetings,

Andres Freund




Re: Fix overflow in DecodeInterval

2022-03-21 Thread Tom Lane
Joseph Koshakow  writes:
> [ v8-0001-Check-for-overflow-when-decoding-an-interval.patch ]

This isn't applying per the cfbot; looks like it got sideswiped
by 9e9858389.  Here's a quick rebase.  I've not reviewed it, but
I did notice (because git was in my face about this) that it's
got whitespace issues.  Please try to avoid unnecessary whitespace
changes ... pgindent will clean those up, but it makes reviewing
harder.

regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index ba0ec35ac5..014ec88e0d 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -21,6 +21,7 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_type.h"
+#include "common/int.h"
 #include "common/string.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -38,16 +39,20 @@ static int	DecodeNumberField(int len, char *str,
 			  int fmask, int *tmask,
 			  struct pg_tm *tm, fsec_t *fsec, bool *is2digits);
 static int	DecodeTime(char *str, int fmask, int range,
-	   int *tmask, struct pg_tm *tm, fsec_t *fsec);
+	   int *tmask, struct pg_tm *tm, int64 *hour, fsec_t *fsec);
+static int DecodeTimeForInterval(char *str, int fmask, int range,
+ int *tmask, struct pg_itm_in *itm_in);
 static const datetkn *datebsearch(const char *key, const datetkn *base, int nel);
 static int	DecodeDate(char *str, int fmask, int *tmask, bool *is2digits,
 	   struct pg_tm *tm);
-static char *AppendSeconds(char *cp, int sec, fsec_t fsec,
+static char *AppendSeconds(char *cp, int sec, int64 fsec,
 		   int precision, bool fillzeros);
-static void AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec,
-			   int scale);
-static void AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec,
-			int scale);
+static bool AdjustFractMicroseconds(long double frac, struct pg_itm_in *itm_in, int64 scale);
+static bool AdjustFractDays(double frac, struct pg_itm_in *pg_itm_in, int scale);
+static bool AdjustFractMonths(double frac, struct pg_itm_in *itm_in, int scale);
+static bool AdjustMicroseconds(int64 val, struct pg_itm_in *itm_in, int64 multiplier, double fval);
+static bool AdjustDays(int val, struct pg_itm_in *itm_in, int multiplier);
+static bool AdjustYears(int val, struct pg_itm_in *itm_in, int multiplier);
 static int	DetermineTimeZoneOffsetInternal(struct pg_tm *tm, pg_tz *tzp,
 			pg_time_t *tp);
 static bool DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t,
@@ -428,7 +433,7 @@ GetCurrentTimeUsec(struct pg_tm *tm, fsec_t *fsec, int *tzp)
  * Note that any sign is stripped from the input seconds values.
  */
 static char *
-AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros)
+AppendSeconds(char *cp, int sec, int64 fsec, int precision, bool fillzeros)
 {
 	Assert(precision >= 0);
 
@@ -437,10 +442,9 @@ AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros)
 	else
 		cp = pg_ultostr(cp, Abs(sec));
 
-	/* fsec_t is just an int32 */
 	if (fsec != 0)
 	{
-		int32		value = Abs(fsec);
+		int64		value = Abs(fsec);
 		char	   *end = &cp[precision + 1];
 		bool		gotnonzero = false;
 
@@ -453,8 +457,8 @@ AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros)
 		 */
 		while (precision--)
 		{
-			int32		oldval = value;
-			int32		remainder;
+			int64		oldval = value;
+			int64		remainder;
 
 			value /= 10;
 			remainder = oldval - value * 10;
@@ -475,7 +479,7 @@ AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros)
 		 * which will generate a correct answer in the minimum valid width.
 		 */
 		if (value)
-			return pg_ultostr(cp, Abs(fsec));
+			return pg_ulltostr(cp, Abs(fsec));
 
 		return end;
 	}
@@ -497,36 +501,96 @@ AppendTimestampSeconds(char *cp, struct pg_tm *tm, fsec_t fsec)
 }
 
 /*
- * Multiply frac by scale (to produce seconds) and add to *tm & *fsec.
- * We assume the input frac is less than 1 so overflow is not an issue.
+ * Multiply frac by scale (to produce microseconds) and add to *itm.
+ * We assume the input frac is less than 1 so overflow of frac is not an issue.
  */
-static void
-AdjustFractSeconds(double frac, struct pg_tm *tm, fsec_t *fsec, int scale)
+static bool
+AdjustFractMicroseconds(long double frac, struct pg_itm_in *itm_in, int64 scale)
 {
-	int			sec;
+	int64		usec;
+	int64		round = 0;
 
 	if (frac == 0)
-		return;
+		return true;
 	frac *= scale;
-	sec = (int) frac;
-	tm->tm_sec += sec;
-	frac -= sec;
-	*fsec += rint(frac * 100);
+	usec = (int64) frac;
+	if (pg_add_s64_overflow(itm_in->tm_usec, usec, &itm_in->tm_usec))
+		return false;
+	
+	frac = frac - usec;
+	if (frac > 0.5)
+		round = 1;
+	else if (frac < -0.5)
+		round = -1;
+
+	return !pg_add_s64_overflow(itm_in->tm_usec, round, &itm_in->tm_usec);
 }
 
 /* As above, but initial scale produces days */
-static void
-AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, int scale)
+stati

Re: Supply restore_command to pg_rewind via CLI argument

2022-03-21 Thread Andres Freund
Hi,

On 2021-11-05 15:10:29 +0500, Andrey Borodin wrote:
> > On 4 Nov 2021, at 17:55, Daniel Gustafsson  wrote:
> > 
> > The patch no longer applies, can you submit a rebased version please?
> 
> Thanks, Daniel! PFA rebase.

Doesn't apply once more: http://cfbot.cputube.org/patch_37_3213.log

Greetings,

Andres Freund




Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2022-03-21 Thread Tomas Vondra
On 3/22/22 01:18, Andres Freund wrote:
> Hi,
> 
> On 2022-01-14 01:39:30 +0100, Tomas Vondra wrote:
>> Are you interested / willing to do some of this work?
> 
> This patch hasn't moved in the last two months. I think it may be time to
> mark it as returned with feedback for now?
> 
> It's also failing tests, and has done so for months:
> 
> https://cirrus-ci.com/task/5308087077699584
> https://api.cirrus-ci.com/v1/artifact/task/5308087077699584/log/src/test/regress/regression.diffs
> 
> Greetings,
> 

Yeah. I think it's a useful improvement, but it needs much more work
than is doable by the end of this CF. RwF seems about right.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [WIP] Allow pg_upgrade to copy segments of the same relfilenode in parallel

2022-03-21 Thread Andres Freund
Hi,

On 2022-02-01 21:57:00 -0500, Jaime Casanova wrote:
> This patch adds a new option (-J num, --jobs-per-disk=num) to
> pg_upgrade to speed up copy mode. This generates up to ${num}
> processes per tablespace to copy segments of the same relfilenode
> in parallel.
> 
> This can help when you have many multi-gigabyte tables (each segment
> is 1GB by default) in different tablespaces (each tablespace on a
> different disk) and multiple processors.
> 
> In a customer's database (~20Tb) it went down from 6h to 4h 45min.
> 
> It lacks documentation and I need help with the WIN32 part of it; I
> created this new mail to put the patch on the next commitfest.
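
For reference, an invocation using the new switch would look something like
this (directories illustrative; -j is pg_upgrade's existing --jobs option):

    pg_upgrade -b /usr/pgsql-14/bin -B /usr/pgsql-15/bin \
               -d /srv/pg14/data -D /srv/pg15/data \
               -j 4 -J 4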

The patch currently fails on cfbot due to warnings, likely related to the
win32 issue:
https://cirrus-ci.com/task/4566046517493760?logs=mingw_cross_warning#L388

As it's a new patch submitted to the last CF, hasn't gotten any review yet and
misses some platform support, it seems like there's no chance it can make it
into 15?

Greetings,

Andres Freund




Re: [PATCH] Proof of concept for GUC improvements

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-12 12:57:02 -0600, David Christensen wrote:
> > Hi,
> > 
> > According to the cfbot, the patch doesn't apply anymore and needs a
> > rebase: http://cfbot.cputube.org/patch_36_3290.log
> 
> V4 rebased attached. 

Doesn't apply anymore, again: http://cfbot.cputube.org/patch_37_3290.log

My impression is that there's not a lot of enthusiasm for the concept? If
that's true we maybe ought to mark the CF entry as rejected?

Greetings,

Andres Freund




Re: Proposal: allow database-specific role memberships

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-22 22:56:44 +0800, Julien Rouhaud wrote:
> On Sat, Jan 22, 2022 at 05:28:05AM -0800, Kenaniah Cerny wrote:
> > Thank you so much for the backtrace!
> > 
> This latest patch should address this by moving the dumpRoleMembership call
> to before the pointer is closed.
> 
> Thanks!  The cfbot turned green since:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3374

red again: https://cirrus-ci.com/task/5516269981007872?logs=test_world#L1480

Marked as waiting-on-author.

- Andres




Re: Logical insert/update/delete WAL records for custom table AMs

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-17 01:05:14 -0800, Jeff Davis wrote:
> Great, I attached a rebased version.

Currently this doesn't apply: http://cfbot.cputube.org/patch_37_3394.log

- Andres




Re: Map WAL segment files on PMEM as WAL buffers

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-20 14:55:13 +0900, Takashi Menjo wrote:
> Here is patchset v8. It makes "make check-world" and Cirrus
> pass.

This unfortunately does not apply anymore: 
http://cfbot.cputube.org/patch_37_3181.log

Could you rebase?

- Andres




Re: [PATCH] New [relation] option engine

2022-03-21 Thread Andres Freund
Hi,

On 2022-02-14 00:43:36 +0300, Nikolay Shaplov wrote:
> I'd like to introduce a patch that reworks options processing. 

This doesn't apply anymore: http://cfbot.cputube.org/patch_37_3536.log

Given that this patch has been submitted just to the last CF and that there's
been no action on it, I don't see this going into 15. Therefore I'd like to
move it to the next CF?

Greetings,

Andres Freund




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-21 Thread Tatsuo Ishii
> On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
> Tatsuo Ishii  wrote:
> 
>> > Hi Yugo,
>> > 
>> > I tested with serialization error scenario by setting:
>> > default_transaction_isolation = 'repeatable read'
>> > The result was:
>> > 
>> > $ pgbench -t 10 -c 10 --max-tries=10 test
>> > transaction type: <builtin: TPC-B (sort of)>
>> > scaling factor: 10
>> > query mode: simple
>> > number of clients: 10
>> > number of threads: 1
>> > maximum number of tries: 10
>> > number of transactions per client: 10
>> > number of transactions actually processed: 100/100
>> > number of failed transactions: 0 (0.000%)
>> > number of transactions retried: 35 (35.000%)
>> > total number of retries: 74
>> > latency average = 5.306 ms
>> > initial connection time = 15.575 ms
>> > tps = 1884.516810 (without initial connection time)
>> > 
>> > I had a hard time understanding what those numbers mean:
>> > number of transactions retried: 35 (35.000%)
>> > total number of retries: 74
>> > 
>> > It seems "total number of retries" matches the number of ERRORs
>> > reported by PostgreSQL. Good. What I am not sure about is "number of
>> > transactions retried". What does this mean?
>> 
>> Oh, ok. I see it now. It turns out that "number of transactions
>> retried" does not actually mean the number of transactions
>> retried. Suppose pgbench executes the following in a session:
>> 
>> BEGIN;   -- transaction A starts
>> :
>> (ERROR)
>> ROLLBACK; -- transaction A aborts
>> 
>> (retry)
>> 
>> BEGIN;   -- transaction B starts
>> :
>> (ERROR)
>> ROLLBACK; -- transaction B aborts
>> 
>> (retry)
>> 
>> BEGIN;   -- transaction C starts
>> :
>> END; -- finally succeeds
>> 
>> In this case "total number of retries:" = 2 and "number of
>> transactions retried:" = 1. In this patch transactions A, B and C are
>> regarded as the "same" transaction, so the retried transaction count
>> becomes 1. But it's confusing to use the word "transaction" here
>> because A, B and C are different transactions. I would think it's
>> better to use a different word instead of "transaction", something
>> like "cycle"? i.e.
>> 
>> number of cycles retried: 35 (35.000%)

I realized that the same argument applies even to "number of
transactions actually processed", because with the retry feature a
"transaction" could comprise multiple transactions.

But if we go ahead and replace those "transactions" with "cycles"
(or whatever) altogether, it could well confuse users who have been
using pgbench. Probably we should give up changing the language and
instead redefine "transaction" when the retry feature is enabled,
along the lines of "when the retry feature is enabled, each
transaction can consist of multiple retried transactions."

> In the original patch by Marina Polyakova it was "number of retried",
> but I changed it to "number of transactions retried" because I felt
> it was confusing with "number of retries". I chose the word "transaction"
> because a transaction ends in any one of successful commit, skip, or
> failure, after possible retries.

Ok.

> Well, I agree that it is somewhat confusing wording. If we can find a
> nice word to resolve the confusion, I don't mind changing it. Maybe we
> can use "executions" as well as "cycles". However, I am not sure that
> the situation is improved by such a word, because what it exactly means
> would still be unclear to users.
> 
> Another idea is to instead report only "the number of successfully
> retried transactions", which does not include "failed transactions",
> that is, transactions that failed after retries, like this:
> 
>  number of transactions actually processed: 100/100
>  number of failed transactions: 0 (0.000%)
>  number of successfully retried transactions: 35 (35.000%)
>  total number of retries: 74 
> 
> The meaning is clear and there seems to be no confusion.

Thank you for the suggestion. But I think it would be better to leave
it as it is, for the reason I mentioned above.

Best reagards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



