Question about HEAP_XMIN_COMMITTED
Hi, I did the following steps on PG.

1. Built a synchronous streaming replication environment.
2. Executed the following SQL statements on the primary:
(1) postgres=# CREATE EXTENSION pageinspect;
(2) postgres=# begin;
(3) postgres=# select txid_current();
(4) postgres=# create table mytest6(i int);
(6) postgres=# insert into mytest6 values(1);
(7) postgres=# commit;
3. Executed the following SQL statements on the standby:
(8) postgres=# select * from mytest6;
 i
---
 1
(1 row)
(9) postgres=# SELECT t_infomask FROM heap_page_items(get_raw_page('pg_class', 0)) where t_xmin=502※;
 t_infomask
------------
       2049
(1 row)
※502 is the transaction ID returned by step (3) above.

In the result of step (9), the value of the t_infomask field is 2049 (0x801), which means that the HEAP_XMAX_INVALID and HEAP_HASNULL flags were set, but the HEAP_XMIN_COMMITTED flag was not set. According to the source, when step (8) was executed, the SetHintBits function was called to set HEAP_XMIN_COMMITTED; however, the minRecoveryPoint value had not been updated, so the HEAP_XMIN_COMMITTED flag was not set. After a checkpoint, selecting from mytest6 again in another session shows that the HEAP_XMIN_COMMITTED flag was set.

So my question is: before the checkpoint, the HEAP_XMIN_COMMITTED flag is expected not to be set, right? Or do we need to move minRecoveryPoint forward so that the HEAP_XMIN_COMMITTED flag is set correctly on the first select from mytest6?

Best Regards,
LiuHuailing
--
Regards,
Liu Huailing
--
Liu Huailing
Development Department III
Software Division II
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
ADDR.: No.6 Wenzhu Road, Software Avenue, Nanjing, 210012, China
TEL : +86+25-86630566-8439 COINS: 7998-8439 FAX : +86+25-83317685
MAIL : liuhuail...@cn.fujitsu.com
--
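For reference, below is a paraphrased sketch of the hint-bit logic in SetHintBits() (src/backend/access/heap/heapam_visibility.c). It is not the exact source, but it shows why the hint is skipped while the commit record's LSN is not yet covered by the flushed/replayed WAL position, which on a standby is bounded by minRecoveryPoint:

static inline void
SetHintBits(HeapTupleHeader tuple, Buffer buffer,
            uint16 infomask, TransactionId xid)
{
    if (TransactionIdIsValid(xid))
    {
        /* NB: xid must be known committed here! */
        XLogRecPtr  commitLSN = TransactionIdGetCommitLSN(xid);

        if (BufferIsPermanent(buffer) && XLogNeedsFlush(commitLSN) &&
            BufferGetLSNAtomic(buffer) < commitLSN)
        {
            /*
             * The commit record is not flushed yet (on a standby: not yet
             * covered by minRecoveryPoint) and the page LSN provides no
             * interlock, so the hint bit is silently not set -- this is the
             * behaviour observed in step (8) above.
             */
            return;
        }
    }

    tuple->t_infomask |= infomask;
    MarkBufferDirtyHint(buffer, true);
}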
Re: Skipping logical replication transactions on subscriber side
On Tue, Dec 14, 2021 at 1:07 PM Dilip Kumar wrote: > > On Tue, Dec 14, 2021 at 8:20 AM Amit Kapila wrote: > > > > On Mon, Dec 13, 2021 at 6:55 PM Masahiko Sawada > > wrote: > > > > In streaming cases, we don’t know when stream-commit or stream-abort > > > comes and another conflict could occur on the subscription in the > > > meanwhile. But given that (we expect) this feature is used after the > > > apply worker enters into an error loop, this is unlikely to happen in > > > practice unless the user sets the wrong XID. Similarly, in 2PC cases, > > > we don’t know when commit-prepared or rollback-prepared comes and > > > another conflict could occur in the meanwhile. But this could occur in > > > practice even if the user specified the correct XID. Therefore, if we > > > disallow to change skip_xid until the subscriber receives > > > commit-prepared or rollback-prepared, we cannot skip the second > > > transaction that conflicts with data on the subscriber. > > > > > > > I agree with this theory. Can we reflect this in comments so that in > > the future we know why we didn't pursue this direction? > > I might be missing something here, but for streaming, transaction > users can decide whether they want to skip or not only once we start > applying no? I mean only once we start applying the changes we can > get some errors and by that time we must be having all the changes for > the transaction. > That is right and as per my understanding, the patch is trying to accomplish the same. > So I do not understand the point we are trying to > discuss here? > The point is whether we can skip the changes while streaming itself, i.e., when we get the changes and write them to a stream file. Now, it is possible that streams from multiple transactions can be interleaved and users can change the skip_xid in between. It is not that we can't handle this but that would require a more complex design and it doesn't seem worth it because we can anyway skip the changes while applying as you mentioned in the previous paragraph. -- With Regards, Amit Kapila.
Re: Adding CI to our tree
> On 14 Dec 2021, at 05:11, Andres Freund wrote: > On 2021-12-14 16:51:58 +1300, Thomas Munro wrote: >> I'd better go and figure out how to fix cfbot when this lands... > > I assume it'd be: > - stop adding the CI stuff > - adjust links to CI tasks, appveyor wouldn't be used anymore > - perhaps reference individual tasks from the cfbot page? +1 on leveraging the CI tasks in the tree in the CFBot. For a patch like the libnss TLS backend one it would be a great help to both developer and reviewer to have that codepath actually built and tested as part of the CFBot; being able to tweak the CI tasks used in the CFBot per patch would be very helpful. -- Daniel Gustafsson https://vmware.com/
Re: Skipping logical replication transactions on subscriber side
On Fri, Dec 10, 2021 at 11:14 AM Masahiko Sawada wrote: > > On Thu, Dec 9, 2021 at 6:16 PM Amit Kapila wrote: > > > > On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada > > wrote: > > > > > > On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila > > > wrote: > > > > > > > > I am thinking that we can start a transaction, update the catalog, > > > > commit that transaction. Then start a new one to update > > > > origin_lsn/timestamp, finishprepared, and commit it. Now, if it > > > > crashes after the first transaction, only commit prepared will be > > > > resent again and this time we don't need to update the catalog as that > > > > entry would be already cleared. > > > > > > Sounds good. In the crash case, it should be fine since we will just > > > commit an empty transaction. The same is true for the case where > > > skip_xid has been changed after skipping and preparing the transaction > > > and before handling commit_prepared. > > > > > > Regarding the case where the user specifies XID of the transaction > > > after it is prepared on the subscriber (i.g., the transaction is not > > > empty), we won’t skip committing the prepared transaction. But I think > > > that we don't need to support skipping already-prepared transaction > > > since such transaction doesn't conflict with anything regardless of > > > having changed or not. > > > > > > > Yeah, this makes sense to me. > > > > I've attached an updated patch. The new syntax is like "ALTER > SUBSCRIPTION testsub SKIP (xid = '123')". > > I’ve been thinking we can do something safeguard for the case where > the user specified the wrong xid. For example, can we somewhat use the > stats in pg_stat_subscription_workers? An idea is that logical > replication worker fetches the xid from the stats when reading the > subscription and skips the transaction if the xid matches to > subskipxid. That is, the worker checks the error reported by the > worker previously working on the same subscription. The error could > not be a conflict error (e.g., connection error etc.) or might have > been cleared by the reset function, But given the worker is in an > error loop, the worker can eventually get xid in question. We can > prevent an unrelated transaction from being skipped unexpectedly. It > seems not a stable solution though. Or it might be enough to warn > users when they specified an XID that doesn’t match to last_error_xid. > Anyway, I think it’s better to have more discussion on this. Any > ideas? Few comments: 1) Should we check if conflicting option is specified like others above: + else if (strcmp(defel->defname, "xid") == 0) + { + char *xid_str = defGetString(defel); + TransactionId xid; + + if (strcmp(xid_str, "-1") == 0) + { + /* Setting -1 to xid means to reset it */ + xid = InvalidTransactionId; + } + else + { 2) Currently only superusers can set skip xid, we can add this in the documentation: + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to set %s", "skip_xid"))); 3) There is an extra tab before "The resolution can be done ...", it can be removed. + Skip applying changes of the particular transaction. If incoming data + violates any constraints the logical replication will stop until it is + resolved. The resolution can be done either by changing data on the + subscriber so that it doesn't conflict with incoming change or by skipping + the whole transaction. The logical replication worker skips all data 4) xid with -2 is currently allowed, may be it is ok. 
If it is fine we can remove it from the fail section. +-- fail +ALTER SUBSCRIPTION regress_testsub SKIP (xid = 1.1); +ERROR: invalid transaction id: 1.1 +ALTER SUBSCRIPTION regress_testsub SKIP (xid = -2); +ALTER SUBSCRIPTION regress_testsub SKIP (xid = 0); +ERROR: invalid transaction id: 0 +ALTER SUBSCRIPTION regress_testsub SKIP (xid = 1); Regards, Vignesh
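For readers following the thread, here is a minimal usage sketch of the syntax under review (taken from the examples above; the exact behaviour is still being discussed, so treat this as illustrative only):

-- The apply worker keeps failing on remote transaction 123, so tell the
-- subscription to skip that transaction:
ALTER SUBSCRIPTION testsub SKIP (xid = '123');

-- Per the patch snippet above, -1 resets the setting once the transaction
-- has been skipped:
ALTER SUBSCRIPTION testsub SKIP (xid = '-1');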
Re: Skipping logical replication transactions on subscriber side
On Tue, Dec 14, 2021 at 2:36 PM Amit Kapila wrote: > > > > > > > I agree with this theory. Can we reflect this in comments so that in > > > the future we know why we didn't pursue this direction? > > > > I might be missing something here, but for streaming, transaction > > users can decide whether they wants to skip or not only once we start > > applying no? I mean only once we start applying the changes we can > > get some errors and by that time we must be having all the changes for > > the transaction. > > > > That is right and as per my understanding, the patch is trying to > accomplish the same. > > > So I do not understand the point we are trying to > > discuss here? > > > > The point is that whether we can skip the changes while streaming > itself like when we get the changes and write to a stream file. Now, > it is possible that streams from multiple transactions can be > interleaved and users can change the skip_xid in between. It is not > that we can't handle this but that would require a more complex design > and it doesn't seem worth it because we can anyway skip the changes > while applying as you mentioned in the previous paragraph. Actually, I was trying to understand the use case for skipping while streaming. Actually, during streaming we are not doing any database operation that means this will not generate any error. So IIUC, there is no use case for skipping while streaming itself? Is there any use case which I am not aware of? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Add client connection check during the execution of the query
On Tue, Dec 14, 2021 at 11:18 AM Thomas Munro wrote: > On Tue, Dec 14, 2021 at 7:53 AM Andres Freund wrote: > > On 2021-12-13 17:51:00 +1300, Thomas Munro wrote: > > > I tried that. It seems OK, and gets rid of the PG_FINALLY(), which is > > > nice. Latches still have higher priority, and still have the fast > > > return if already set and you asked for only one event, but now if you > > > ask for nevents > 1 we'll make the syscall too so we'll see the > > > WL_SOCKET_CLOSED. > > > > Isn't a certain postgres committer that cares a lot about unnecessary > > syscalls > > going to be upset about this one? Even with the nevents > 1 optimization? > > Yes, > > right now there's no other paths that do so, but I don't like the corner > > this > > paints us in. > > Well, I was trying to avoid bikeshedding an API change just for a > hypothetical problem we could solve when the time comes (say, after > fixing the more egregious problems with Append's WES usage), but here > goes: we could do something like AddWaitEventToSet(FeBeWaitSet, > WL_LATCH_SET_LOPRIO, ...) that is translated to WL_LATCH_SET > internally but also sets a flag to enable this > no-really-please-poll-all-the-things-if-there-is-space behaviour. Here's one like that. From 6b994807c29157ac7053ec347a51268d60f2b3b4 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 30 Apr 2021 10:38:40 +1200 Subject: [PATCH v5 1/3] Add WL_SOCKET_CLOSED for socket shutdown events. Provide a way for WaitEventSet to report that the remote peer has shut down its socket, independently of whether there is any buffered data remaining to be read. This works only on systems where the kernel exposes that information, namely: * WAIT_USE_POLL builds using POLLRDHUP, if available * WAIT_USE_EPOLL builds using EPOLLRDHUP * WAIT_USE_KQUEUE builds using EV_EOF Reviewed-by: Zhihong Yu Reviewed-by: Maksim Milyutin Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru --- src/backend/storage/ipc/latch.c | 79 + src/include/storage/latch.h | 6 ++- 2 files changed, 74 insertions(+), 11 deletions(-) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index 1d893cf863..54e928c564 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -841,6 +841,7 @@ FreeWaitEventSet(WaitEventSet *set) * - WL_SOCKET_CONNECTED: Wait for socket connection to be established, * can be combined with other WL_SOCKET_* events (on non-Windows * platforms, this is the same as WL_SOCKET_WRITEABLE) + * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer. 
* - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies * * Returns the offset in WaitEventSet->events (starting from 0), which can be @@ -1043,12 +1044,16 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action) else { Assert(event->fd != PGINVALID_SOCKET); - Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)); + Assert(event->events & (WL_SOCKET_READABLE | +WL_SOCKET_WRITEABLE | +WL_SOCKET_CLOSED)); if (event->events & WL_SOCKET_READABLE) epoll_ev.events |= EPOLLIN; if (event->events & WL_SOCKET_WRITEABLE) epoll_ev.events |= EPOLLOUT; + if (event->events & WL_SOCKET_CLOSED) + epoll_ev.events |= EPOLLRDHUP; } /* @@ -1087,12 +1092,18 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event) } else { - Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)); + Assert(event->events & (WL_SOCKET_READABLE | +WL_SOCKET_WRITEABLE | +WL_SOCKET_CLOSED)); pollfd->events = 0; if (event->events & WL_SOCKET_READABLE) pollfd->events |= POLLIN; if (event->events & WL_SOCKET_WRITEABLE) pollfd->events |= POLLOUT; +#ifdef POLLRDHUP + if (event->events & WL_SOCKET_CLOSED) + pollfd->events |= POLLRDHUP; +#endif } Assert(event->fd != PGINVALID_SOCKET); @@ -1165,7 +1176,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events) Assert(event->events != WL_LATCH_SET || set->latch != NULL); Assert(event->events == WL_LATCH_SET || event->events == WL_POSTMASTER_DEATH || - (event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))); + (event->events & (WL_SOCKET_READABLE | + WL_SOCKET_WRITEABLE | + WL_SOCKET_CLOSED))); if (event->events == WL_POSTMASTER_DEATH) { @@ -1188,9 +1201,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events) * old event mask to the new event mask, since kevent treats readable * and writable as separate events. */ - if (old_events & WL_SOCKET_READABLE) + if (old_events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED)) old_filt_read = true; - if (event->events & WL_SOCKET_READABLE) + if (event->events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED)) new_filt_read = true; if (old_events & WL_SOCKET_WRITEABLE) old_filt_write = true; @@ -1210,7 +1223,10 @@
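As a usage illustration only (not part of the patch), a caller could probe for a closed peer socket with the existing WaitEventSet API plus the new event; the wrapper name below is made up:

/* Sketch: poll once (zero timeout) for WL_SOCKET_CLOSED on a socket. */
static bool
socket_peer_closed(pgsocket sock)
{
    WaitEventSet *wes = CreateWaitEventSet(CurrentMemoryContext, 1);
    WaitEvent    event;
    bool         closed = false;

    AddWaitEventToSet(wes, WL_SOCKET_CLOSED, sock, NULL, NULL);
    if (WaitEventSetWait(wes, 0 /* don't block */, &event, 1, 0) == 1)
        closed = (event.events & WL_SOCKET_CLOSED) != 0;
    FreeWaitEventSet(wes);

    return closed;
}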
Re: row filtering for logical replication
On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila wrote: > > On Tue, Dec 14, 2021 at 4:44 AM Peter Smith wrote: > > Few other comments: > === Few more comments: == v46-0001/0002 === 1. After rowfilter_walker() why do we need EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify the expressions that are not allowed in rowfilter which we are now able to detect upfront with the help of a walker. Can't we instead use EXPR_KIND_WHERE? 2. +Node * +GetTransformedWhereClause(ParseState *pstate, PublicationRelInfo *pri, + bool bfixupcollation) Can we add comments atop this function? 3. In GetTransformedWhereClause, can we change the name of variables (a) bfixupcollation to fixup_collation or assign_collation, (b) transformedwhereclause to whereclause. I think that will make the function more readable. v46-0002 4. + else if (IsA(node, List) || IsA(node, Const) || IsA(node, BoolExpr) || IsA(node, NullIfExpr) || + IsA(node, NullTest) || IsA(node, BooleanTest) || IsA(node, CoalesceExpr) || + IsA(node, CaseExpr) || IsA(node, CaseTestExpr) || IsA(node, MinMaxExpr) || + IsA(node, ArrayExpr) || IsA(node, ScalarArrayOpExpr) || IsA(node, XmlExpr)) Can we move this to a separate function say IsValidRowFilterExpr() or something on those lines and use Switch (nodetag(node)) to identify these nodes? -- With Regards, Amit Kapila.
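Regarding point 4, a rough sketch of the suggested refactoring (the function name comes from the comment above; this is hypothetical, not code from the posted patch):

/* Return true if the given node type is allowed in a row filter expression. */
static bool
IsValidRowFilterExpr(Node *node)
{
    switch (nodeTag(node))
    {
        case T_List:
        case T_Const:
        case T_BoolExpr:
        case T_NullIfExpr:
        case T_NullTest:
        case T_BooleanTest:
        case T_CoalesceExpr:
        case T_CaseExpr:
        case T_CaseTestExpr:
        case T_MinMaxExpr:
        case T_ArrayExpr:
        case T_ScalarArrayOpExpr:
        case T_XmlExpr:
            return true;
        default:
            return false;
    }
}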
Re: Skipping logical replication transactions on subscriber side
On Tue, Dec 14, 2021 at 3:41 PM Dilip Kumar wrote: > > On Tue, Dec 14, 2021 at 2:36 PM Amit Kapila wrote: > > > > > > > > > > I agree with this theory. Can we reflect this in comments so that in > > > > the future we know why we didn't pursue this direction? > > > > > > I might be missing something here, but for streaming, transaction > > > users can decide whether they wants to skip or not only once we start > > > applying no? I mean only once we start applying the changes we can > > > get some errors and by that time we must be having all the changes for > > > the transaction. > > > > > > > That is right and as per my understanding, the patch is trying to > > accomplish the same. > > > > > So I do not understand the point we are trying to > > > discuss here? > > > > > > > The point is that whether we can skip the changes while streaming > > itself like when we get the changes and write to a stream file. Now, > > it is possible that streams from multiple transactions can be > > interleaved and users can change the skip_xid in between. It is not > > that we can't handle this but that would require a more complex design > > and it doesn't seem worth it because we can anyway skip the changes > > while applying as you mentioned in the previous paragraph. > > Actually, I was trying to understand the use case for skipping while > streaming. Actually, during streaming we are not doing any database > operation that means this will not generate any error. > Say, there is an error the first time when we start to apply changes for such a transaction. So, such a transaction will be streamed again. Say, the user has set the skip_xid before we stream a second time, so this time, we can skip it either during the stream phase or apply phase. I think the patch is skipping it during apply phase. Sawada-San, please confirm if my understanding is correct? -- With Regards, Amit Kapila.
Confused comment about drop replica identity index
Hi hackers,

While doing development based on PG, I found that the following comment in src/include/catalog/pg_class.h has a small problem.

/*
 * an explicitly chosen candidate key's columns are used as replica identity.
 * Note this will still be set if the index has been dropped; in that case it
 * has the same meaning as 'd'.
 */
#define REPLICA_IDENTITY_INDEX 'i'

The last sentence makes me a little confused: [..in that case it has the same meaning as 'd'.]

Currently, the PG docs do not describe this clearly.

But if I drop the relation's replica identity index as described in the comment, the behavior is not the same as the default. For example, execute the following SQL:

create table tbl (col1 int primary key, col2 int not null);
create unique INDEX ON tbl(col2);
alter table tbl replica identity using INDEX tbl_col2_idx;
drop index tbl_col2_idx;
create publication pub for table tbl;
delete from tbl;

Actual result:
ERROR: cannot delete from table "tbl" because it does not have a replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.

Expected result according to the comment:
DELETE 0

I found that in the function CheckCmdReplicaIdentity, the case described in the comment is not handled: when the relation's replica identity index is found to be InvalidOid, an error is reported.

Is the comment here not accurate enough? Or do we need to adjust the code according to the comment?

Regards,
Wang wei
Re: Add sub-transaction overflow status in pg_stat_activity
Hi, I have looked into the v2 patch and here are my comments: + PG_RETURN_INT32(local_beentry->subxact_overflowed); +} Should this be PG_RETURN_BOOL instead of PG_RETURN_INT32?? -- +{ oid => '6107', descr => 'statistics: cached subtransaction count of backend', + proname => 'pg_stat_get_backend_subxact_count', provolatile => 's', proparallel => 'r', + prorettype => 'int4', proargtypes => 'int4', + prosrc => 'pg_stat_get_backend_subxact_count' }, +{ oid => '6108', descr => 'statistics: subtransaction cache of backend overflowed', + proname => 'pg_stat_get_backend_subxact_overflow', provolatile => 's', proparallel => 'r', + prorettype => 'bool', proargtypes => 'int4', + prosrc => 'pg_stat_get_backend_subxact_overflow' }, The description says that the two new functions show the statistics for "cached subtransaction count of backend" and "subtransaction cache of backend overflowed". But, when these functions are called it shows these stats for the non-backend processes like checkpointer, walwriter etc as well. Should that happen? -- typedef struct LocalPgBackendStatus * not. */ TransactionId backend_xmin; + + /* +* Number of cached subtransactions in the current session. +*/ + int subxact_count; + + /* +* The number of subtransactions in the current session exceeded the cached +* subtransaction limit. +*/ + bool subxact_overflowed; All the variables inside this LocalPgBackendStatus structure are prefixed with "backend" word. Can we do the same for the newly added variables as well? -- + * Get the xid and xmin, nsubxid and overflow status of the backend. The Should this be put as - "xid, xmin, nsubxid and overflow" instead of "xid and xmin, nsubxid and overflow"? -- With Regards, Ashutosh Sharma. On Mon, Dec 13, 2021 at 7:58 PM Dilip Kumar wrote: > On Tue, Dec 7, 2021 at 11:11 AM Justin Pryzby > wrote: > > > > > You added this to pg_stat_activity, which already has a lot of fields. > > We talked a few months ago about not adding more fields that weren't > commonly > > used. > > > https://www.postgresql.org/message-id/flat/20210426191811.sp3o77doinphyjhu%40alap3.anarazel.de#d96d0a116f0344301eead2676ea65b2e > > > > Since I think this field is usually not interesting to most users of > > pg_stat_activity, maybe this should instead be implemented as a function > like > > pg_backend_get_subxact_status(pid). > > > > People who want to could use it like: > > SELECT * FROM pg_stat_activity psa, pg_backend_get_subxact_status(pid) > sub; > > I have provided two function, one for subtransaction counts and other > whether subtransaction cache is overflowed or not, we can use like > this, if we think this is better way to do it then we can also add > another function for the lastOverflowedXid > > postgres[43994]=# select id, pg_stat_get_backend_pid(id) as pid, > pg_stat_get_backend_subxact_count(id) as nsubxact, > pg_stat_get_backend_subxact_overflow(id) as overflowed from > pg_stat_get_backend_idset() as id; > id | pid | nsubxact | overflowed > +---+--+ > 1 | 43806 |0 | f > 2 | 43983 | 64 | t > 3 | 43994 |0 | f > 4 | 44323 | 22 | f > 5 | 43802 |0 | f > 6 | 43801 |0 | f > 7 | 43804 |0 | f > (7 rows) > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com >
RE: Failed transaction statistics to measure the logical replication progress
On Tues, Dec 14, 2021 10:28 AM Amit Kapila wrote: > On Mon, Dec 13, 2021 at 5:48 PM osumi.takami...@fujitsu.com > wrote: > > > > On Monday, December 13, 2021 6:19 PM Amit Kapila > wrote: > > > On Tue, Dec 7, 2021 at 3:12 PM osumi.takami...@fujitsu.com > > > wrote: > > > > > > Few questions and comments: > > Thank you for your comments ! > > > > > > > > 1. > > > one row per subscription worker on which errors have occurred, for > > > workers > > > applying logical replication changes and workers handling the initial > > > data > > > - copy of the subscribed tables. The statistics entry is removed when > > > the > > > - corresponding subscription is dropped. > > > + copy of the subscribed tables. Also, the row corresponding to the > > > apply > > > + worker shows all transaction statistics of both types of workers on > > > the > > > + subscription. The statistics entry is removed when the corresponding > > > + subscription is dropped. > > > > > > Why did you choose to show stats for both types of workers in one row? > > This is because if we have hundreds or thousands of tables for table sync, > > we need to create many entries to cover them and store the entries for all > > tables. > > > > If we fear a large number of entries for such workers then won't it be > better to show the value of these stats for apply workers. I > think normally the table sync workers perform only copy operation or > maybe a fixed number of xacts, so, one might not be interested in the > transaction stats of these workers. I find merging only specific stats > of two different types of workers confusing. > > What do others think about this? Personally, I agreed that merging two types of stats into one row might not be a good idea. And the xact stats of table sync workers are usually less interesting than the apply worker's, So, it's seems acceptable to me if we show stats only for apply workers and document about this. Best regards, Hou zj
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy wrote: > > On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera wrote: > > > > On 2021-Dec-01, Bharath Rupireddy wrote: > > > > > The active_pid of ReplicationSlot structure, which tells whether a > > > replication slot is active or inactive, isn't persisted to the disk > > > i.e has no entry in ReplicationSlotPersistentData structure. Isn't it > > > better if we add that info to ReplicationSlotPersistentData structure > > > and persist to the disk? This will help to know what were the inactive > > > replication slots in case the server goes down or crashes for some > > > reason. Currently, we don't have a way to interpret the replication > > > slot info in the disk but there's a patch for pg_replslotdata tool at > > > [1]. This way, one can figure out the reasons for the server > > > down/crash and figure out which replication slots to remove to bring > > > the server up and running without touching the other replication > > > slots. > > > > I think the PIDs are log-worthy for sure, but it's not clear to me that > > it is desirable to write them to the persistent state file. In case of > > crashes, the log should serve just fine to aid root cause investigation > > -- in fact even better than the persistent file, where the data would be > > lost as soon as the next client acquires that slot. > > Thanks. +1 to log a message at LOG level whenever a replication slot > becomes active (gets assigned a valid pid to active_pid) and becomes > inactive(gets assigned 0 to active_pid). Having said that, isn't it > also helpful if we write a bool (1 byte character) whenever the slot > becomes active and inactive to the disk? Here's the patch that adds a LOG message whenever a replication slot becomes active and inactive. These logs will be extremely useful on production servers to debug and analyze inactive replication slot issues. Thoughts? Regards, Bharath Rupireddy. v1-0001-Add-LOG-messages-when-replication-slots-become-ac.patch Description: Binary data
Re: Confused comment about drop replica identity index
On Tue, Dec 14, 2021 at 6:08 PM wangw.f...@fujitsu.com wrote: > > Hi hackers, > > When I doing development based by PG, I found the following comment have a > little problem in file src/include/catalog/pg_class.h. > > /* > * an explicitly chosen candidate key's columns are used as replica identity. > * Note this will still be set if the index has been dropped; in that case it > * has the same meaning as 'd'. > */ > #define REPLICA_IDENTITY_INDEX'i' > > The last sentence makes me a little confused : > [..in that case it as the same meaning as 'd'.] > > Now, pg-doc didn't have a clear style to describe this. > > > But if I drop relation's replica identity index like the comment, the action > is not as same as default. > > For example: > Execute the following SQL: > create table tbl (col1 int primary key, col2 int not null); > create unique INDEX ON tbl(col2); > alter table tbl replica identity using INDEX tbl_col2_idx; > drop index tbl_col2_idx; > create publication pub for table tbl; > delete from tbl; > > Actual result: > ERROR: cannot delete from table "tbl" because it does not have a replica > identity and publishes deletes > HINT: To enable deleting from the table, set REPLICA IDENTITY using ALTER > TABLE. I think I see where's the confusion. The table has a primary key and so when the replica identity index is dropped, per the comment in code, you expect that primary key will be used as replica identity since that's what 'd' or default means. > > Expected result in comment: > DELETE 0 > > > I found that in the function CheckCmdReplicaIdentity, the operation described > in the comment is not considered, > When relation's replica identity index is found to be InvalidOid, an error is > reported. This code in RelationGetIndexList() is not according to that comment. if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex)) relation->rd_replidindex = pkeyIndex; else if (replident == REPLICA_IDENTITY_INDEX && OidIsValid(candidateIndex)) relation->rd_replidindex = candidateIndex; else relation->rd_replidindex = InvalidOid; > > Are the comment here not accurate enough? > Or we need to adjust the code according to the comments? > Comment in code is one thing, but I think PG documentation is not covering the use case you tried. What happens when a replica identity index is dropped has not been covered either in ALTER TABLE https://www.postgresql.org/docs/13/sql-altertable.html or DROP INDEX https://www.postgresql.org/docs/14/sql-dropindex.html documentation. -- Best Wishes, Ashutosh Bapat
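To make the mismatch concrete, the behaviour the pg_class.h comment seems to describe would look roughly like this in RelationGetIndexList() (a hypothetical change for illustration only, not an agreed proposal):

if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex))
    relation->rd_replidindex = pkeyIndex;
else if (replident == REPLICA_IDENTITY_INDEX)
{
    if (OidIsValid(candidateIndex))
        relation->rd_replidindex = candidateIndex;
    else if (OidIsValid(pkeyIndex))
        relation->rd_replidindex = pkeyIndex;    /* "same meaning as 'd'" */
    else
        relation->rd_replidindex = InvalidOid;
}
else
    relation->rd_replidindex = InvalidOid;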
Re: more descriptive message for process termination due to max_slot_wal_keep_size
On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi wrote: > > Hello. > > As complained in pgsql-bugs [1], when a process is terminated due to > max_slot_wal_keep_size, the related messages don't mention the root > cause for *the termination*. Note that the third message does not > show for temporary replication slots. > > [pid=a] LOG: terminating process x to release replication slot "s" > [pid=x] LOG: FATAL: terminating connection due to administrator command > [pid=a] LOG: invalidting slot "s" because its restart_lsn X/X exceeds > max_slot_wal_keep_size > > The attached patch attaches a DETAIL line to the first message. > > > [17605] LOG: terminating process 17614 to release replication slot "s1" > + [17605] DETAIL: The slot's restart_lsn 0/2CA0 exceeds > max_slot_wal_keep_size. > > [17614] FATAL: terminating connection due to administrator command > > [17605] LOG: invalidating slot "s1" because its restart_lsn 0/2CA0 > > exceeds max_slot_wal_keep_size > > Somewhat the second and fourth lines look inconsistent each other but > that wouldn't be such a problem. I don't think we want to concatenate > the two lines together as the result is a bit too long. > > > LOG: terminating process 17614 to release replication slot "s1" because > > it's restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size. > > What do you think about this? Agree. I think we should also specify the restart_lsn value which would be within max_slot_wal_keep_size for better understanding. -- Best Wishes, Ashutosh Bapat
Re: conchuela has some SSL issues
On 2021-12-13 20:53, Andrew Dunstan wrote: Any thoughts? It's also failing on REL_14_STABLE, so I think it must be an environmental change. I did a pkg update && pkg upgrade and it messed up the SSL libraries. It had both libressl and openssl, and when I upgraded it somehow removed the libressl libraries. One would think it would still work as it would pick up the openssl library instead. But apparently not. I will take a look at it. At the moment I have disabled new builds until I have fixed it. Sorry for the inconvenience. /Mikael
Re: Defining (and possibly skipping) useless VACUUM operations
On Sun, Dec 12, 2021 at 8:47 PM Peter Geoghegan wrote: > I am currently working on decoupling advancing relfrozenxid from tuple > freezing [1]. That is, I'm teaching VACUUM to keep track of > information that it uses to generate an "optimal value" for the > table's final relfrozenxid: the most recent XID value that might still > be in the table. This patch is based on the observation that we don't > actually have to use the FreezeLimit cutoff for our new > pg_class.relfrozenxid. We need only obey the basic relfrozenxid > invariant, which is that the final value must be <= any extant XID in > the table. Using FreezeLimit is needlessly conservative. Right. > It now occurs to me to push this patch in another direction, on top of > all that: the OldestXmin behavior hints at a precise, robust way of > defining "useless vacuuming". We can condition skipping a VACUUM (i.e. > whether a VACUUM is considered "definitely won't be useful if allowed > to execute") on whether or not our preexisting pg_class.relfrozenxid > precisely equals our newly-acquired OldestXmin for an about-to-begin > VACUUM operation. (We'd also want to add an "unchangeable > pg_class.relminmxid" test, I think.) I think this is a reasonable line of thinking, but I think it's a little imprecise. In general, we could be vacuuming a relation to advance relfrozenxid, but we could also be vacuuming a relation to advance relminmxid, or we could be vacuuming a relation to fight bloat, or set pages all-visible. It is possible that there's no hope of advancing relfrozenxid but that we can still accomplish one of the other goals. In that case, the vacuuming is not useless. I think the place to put logic around this would be in the triggering logic for autovacuum. If we're going to force a relation to be vacuumed because of (M)XID wraparound danger, we could first check whether there seems to be any hope of advancing relfrozenxid(minmxid). If not, we discount that as a trigger for vacuum, but may still decide to vacuum if some other trigger warrants it. In most cases, if there's no hope of advancing relfrozenxid, there won't be any bloat to remove either, but aborted transactions are a counterexample. And the XID and MXID horizons can advance at completely different rates. One reason I haven't pursued this kind of optimization is that it doesn't really feel like it's fixing the whole problem. It would be a little bit sad if we did a perfect job preventing useless vacuuming but still allowed almost-useless vacuuming. Suppose we have a 1TB relation and we trigger autovacuum. It cleans up a few things but relfrozenxid is still old. On the next pass, we see that the system-wide xmin has not advanced, so we don't trigger autovacuum again. Then on the pass after that we see that the system-wide xmin has advanced by 1. Shall we trigger an autovacuum of the whole relation now, to be able to do relfrozenxid++? Seems dubious. Part of the problem here, for both vacuuming-for-bloat and vacuuming-for-relfrozenxid-advancement, we would really like to know the distribution of old XIDs in the table. If we knew that a lot of the inserts, updates, and deletes that are causing us to vacuum for bloat containment were in a certain relatively narrow range, then we'd probably want to not autovacuum for either purpose until the system-wide xmin has crossed through at least a good chunk of that range. 
And once it has fully crossed over that range, an immediate vacuum looks extremely appealing: we'll both remove a bunch of dead tuples and reclaim the associated line pointers, and at the same time we'll be able to advance relfrozenxid. Nice! But we have no such information. So I'm not certain of the way forward here. Just because we can't prevent almost-useless vacuuming is not a sufficient reason to continue allowing entirely-useless vacuuming that we can prevent. And it seems like we need a bunch of new bookkeeping to do any better than that, which seems like a lot of work. So maybe it's the most practical path forward for the time being, but it feels like more of a special-purpose kludge than a truly high-quality solution. -- Robert Haas EDB: http://www.enterprisedb.com
Re: more descriptive message for process termination due to max_slot_wal_keep_size
On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi wrote: > > Hello. > > As complained in pgsql-bugs [1], when a process is terminated due to > max_slot_wal_keep_size, the related messages don't mention the root > cause for *the termination*. Note that the third message does not > show for temporary replication slots. > > [pid=a] LOG: "terminating process %d to release replication slot \"%s\"" > [pid=x] LOG: FATAL: terminating connection due to administrator command > [pid=a] LOG: invalidting slot "s" because its restart_lsn X/X exceeds > max_slot_wal_keep_size > > The attached patch attaches a DETAIL line to the first message. > > > [17605] LOG: terminating process 17614 to release replication slot "s1" > + [17605] DETAIL: The slot's restart_lsn 0/2CA0 exceeds > max_slot_wal_keep_size. > > [17614] FATAL: terminating connection due to administrator command > > [17605] LOG: invalidating slot "s1" because its restart_lsn 0/2CA0 > > exceeds max_slot_wal_keep_size > > Somewhat the second and fourth lines look inconsistent each other but > that wouldn't be such a problem. I don't think we want to concatenate > the two lines together as the result is a bit too long. > > > LOG: terminating process 17614 to release replication slot "s1" because > > it's restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size. > > What do you think about this? > > [1] > https://www.postgresql.org/message-id/20211214.101137.379073733372253470.horikyota.ntt%40gmail.com +1 to give more context to the "terminating process %d to release replication slot \"%s\"" message. How about having below, instead of adding errdetail: "terminating process %d to release replication slot \"%s\" whose restart_lsn %X/%X exceeds max_slot_wal_keep_size"? I think we can keep the "invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size" message as-is. We may not see "terminating process ..." and "invalidation slot ..." messages together for the same slot, so having slightly different wording is fine IMO. Regards, Bharath Rupireddy.
Re: cutting down the TODO list thread
On Wed, Dec 8, 2021 at 1:40 PM John Naylor wrote: > > It's been a while, but here are a few more suggested > removals/edits/additions to the TODO list. Any objections or new > information, let me know: > > - Auto-fill the free space map by scanning the buffer cache or by > checking pages written by the background writer > http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php > https://www.postgresql.org/message-id/200603011716.16984.pete...@gmx.net > > Both these threads are from 2006, so have nothing to do with the current FSM. Moved to the Not Worth Doing list. > - Allow concurrent inserts to use recently created pages rather than > creating new ones > http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php > > Skimming the first few messages, I believe this has been covered by > commit 719c84c1b? (Extend relations multiple blocks at a time to > improve scalability.) Removed. > - Allow VACUUM FULL and CLUSTER to update the visibility map > > This topic has a current CF entry which seems to have stalled, so that > newer thread would be better to list here than the one from 2013. Added. > - Bias FSM towards returning free space near the beginning of the heap > file, in hopes that empty pages at the end can be truncated by VACUUM > http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php > https://www.postgresql.org/message-id/20150424190403.gp4...@alvh.no-ip.org > > I'm not sure what to think of this, but independently of that, the > second thread is actually talking about bringing back something like > the pre-9.0 vacuum full, so maybe it should be its own entry? Done. > - Consider a more compact data representation for dead tuple locations > within VACUUM > http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php > > Great, but let's link to this more recent thread instead: > https://www.postgresql.org/message-id/flat/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com Done. > > The second thread is really about autovacuum launcher scheduling. > > Probably still relevant, but the thread is very long and doesn't seem > > terribly helpful to someone trying to get up to speed on the issues > > that are still relevant. I don't see any more recent discussion, > > either. Thoughts? Split into two entries. On Wed, Dec 8, 2021 at 8:12 PM Masahiko Sawada wrote: > There is another discussion on autovacuum scheduling in 2018 here: > > https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05 > > Some algorithms were proposed there and I implemented a PoC patch: > > https://www.postgresql.org/message-id/CAD21AoBUaSRBypA6pd9ZD%3DU-2TJCHtbyZRmrS91Nq0eVQ0B3BA%40mail.gmail.com Added, thanks! -- John Naylor EDB: http://www.enterprisedb.com
Re: Commitfest 2021-11 Patch Triage - Part 3
> On Dec 13, 2021, at 10:18 PM, Jeff Davis wrote: > >>This is implemented but I still need to update the documentation >> before >>posting. > > We also discussed how much of the insert path we want to include in the > apply worker. The immediate need is for the patch to support RLS, but > it raised the question: "why not just use the entire ExecInsert() > path?". I went a different direction with this. The need is to prevent non-superuser subscription owners from *circumventing* RLS. For this patch set, I'm just checking whether RLS should be enforced against the subscription owner, and if so, raising an ERROR, same as for a privilege violation. This should be sufficient in practice, as the table owner, roles with bypassrls privilege, and superusers should still be able to replicate into an RLS enabled table. The problem with actually *supporting* RLS is that the current logical replication implementation is a mix of different practical (rather than theoretical) design choices. After working to get RLS working as part of that, I became concerned that there may be subtle differences between how I was making RLS work in logical replication vs. how it works for regular inserts/updates/deletes. Rather than create a mess that would need to be supported going forward, I bailed out and just forbade it. As far as completeness and a clean implementation (but not performance) are concerned, using ExecInsert is quite appealing. I think that would require reworking the logical replication protocol to send more than one row at a time, so that the overhead of such a choice is not paid *per row*. That seems quite out of scope for this release cycle, though. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Column Filtering in Logical Replication
On 2021-Dec-13, Alvaro Herrera wrote: > I think this means we need a new OBJECT_PUBLICATION_REL_COLUMN value in > the ObjectType (paralelling OBJECT_COLUMN), and no new ObjectClass > value. Looking now to confirm. After working on this a little bit more, I realized that this is a bad idea overall. It causes lots of complications and it's just not worth it. So I'm back at my original thought that we need to throw an ERROR at ALTER TABLE .. DROP COLUMN time if the column is part of a replication column filter, and suggest the user to remove the column from the filter first and reattempt the DROP COLUMN. This means that we need to support changing the column list of a table in a publication. I'm looking at implementing some form of ALTER PUBLICATION for that. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Find a bug in a program, and fix it, and the program will work today. Show the program how to find and fix a bug, and the program will work forever" (Oliver Silfridge)
Re: O(n) tasks cause lengthy startups and checkpoints
On Mon, Dec 13, 2021 at 11:05:46PM +, Bossart, Nathan wrote: > On 12/13/21, 12:37 PM, "Robert Haas" wrote: > > On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan wrote: > >> > But against all that, if these tasks are slowing down checkpoints and > >> > that's avoidable, that seems pretty important too. Interestingly, I > >> > can't say that I've ever seen any of these things be a problem for > >> > checkpoint or startup speed. I wonder why you've had a different > >> > experience. > >> > >> Yeah, it's difficult for me to justify why users should suffer long > >> periods of downtime because startup or checkpointing is taking a very > >> long time doing things that are arguably unrelated to startup and > >> checkpointing. > > > > Well sure. But I've never actually seen that happen. > > I'll admit that surprises me. As noted elsewhere [0], we were seeing > this enough with pgsql_tmp that we started moving the directory aside > before starting the server. Discussions about handling this usually > prompt questions about why there are so many temporary files in the > first place (which is fair). FWIW all four functions noted in my > original message [1] are things I've seen firsthand affecting users. Have we changed temporary file handling in any recent major releases, meaning is this a current problem or one already improved in PG 14. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
pg_upgrade operation failed if table created in --single user mode
Hi, Please refer to this scenario where the pg_upgrade operation is failing if the table is created in single-user mode. PG v13 --connect to PG v13 using single user mode ( ./postgres --single -D /tmp/data13 postgres ) --create table ( backend> create table r(n int); ) --exit ( ctrl + D) -- Perform pg_upgrade ( PG v13->PG v15) (./pg_upgrade -d data13 -D data15 -b /usr/psql-12/bin -B . ) it will fail with these messages: Restoring global objects in the new cluster ok Restoring database schemas in the new cluster postgres *failure* Consult the last few lines of "pg_upgrade_dump_14174.log" for the probable cause of the failure. Failure, exiting --cat pg_upgrade_dump_14174.log -- -- -- -- pg_restore: creating TABLE "public.r" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 200; 1259 14180 TABLE r edb pg_restore: error: could not execute query: ERROR: pg_type array OID value not set when in binary upgrade mode Command was: -- For binary upgrade, must preserve pg_type oid SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('14181'::pg_catalog.oid); -- For binary upgrade, must preserve pg_class oids and relfilenodes SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('14180'::pg_catalog.oid); SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('14180'::pg_catalog.oid); CREATE TABLE "public"."r" ( "n" integer ); -- For binary upgrade, set heap's relfrozenxid and relminmxid UPDATE pg_catalog.pg_class SET relfrozenxid = '492', relminmxid = '1' WHERE oid = '"public"."r"'::pg_catalog.regclass; Is this expected? -- regards, tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: Defining (and possibly skipping) useless VACUUM operations
On Tue, Dec 14, 2021 at 6:05 AM Robert Haas wrote: > I think this is a reasonable line of thinking, but I think it's a > little imprecise. In general, we could be vacuuming a relation to > advance relfrozenxid, but we could also be vacuuming a relation to > advance relminmxid, or we could be vacuuming a relation to fight > bloat, or set pages all-visible. It is possible that there's no hope > of advancing relfrozenxid but that we can still accomplish one of the > other goals. In that case, the vacuuming is not useless. I think the > place to put logic around this would be in the triggering logic for > autovacuum. If we're going to force a relation to be vacuumed because > of (M)XID wraparound danger, we could first check whether there seems > to be any hope of advancing relfrozenxid(minmxid). If not, we discount > that as a trigger for vacuum, but may still decide to vacuum if some > other trigger warrants it. In most cases, if there's no hope of > advancing relfrozenxid, there won't be any bloat to remove either, but > aborted transactions are a counterexample. And the XID and MXID > horizons can advance at completely different rates. I think that you'd agree that the arguments in favor of skipping are strongest for an aggressive anti-wraparound autovacuum (as opposed to any other kind of aggressive VACUUM, including aggressive autovacuum). Aside from the big benefit I pointed out already (avoiding blocking useful anti-wraparound vacuums that starts a little later by not starting a conflicting useless anti-wraparound vacuum now), there is also more certainty about downsides. We can know the following things for sure: * We only launch an (aggressive) anti-wraparound autovacuum because we need to advance relfrozenxid. In other words, if we didn't need to advance relfrozenxid then (for better or worse) we definitely wouldn't be launching anything. * Our would-be OldestXmin exactly matches the preexisting pg_class.relfrozenxid (and pg_class.relminmxid). And so it follows that we're definitely not going to be able to do the thing that is ostensibly the whole point of anti-wraparound vacuum (advance relfrozenxid/relminmxid). > One reason I haven't pursued this kind of optimization is that it > doesn't really feel like it's fixing the whole problem. It would be a > little bit sad if we did a perfect job preventing useless vacuuming > but still allowed almost-useless vacuuming. Suppose we have a 1TB > relation and we trigger autovacuum. It cleans up a few things but > relfrozenxid is still old. On the next pass, we see that the > system-wide xmin has not advanced, so we don't trigger autovacuum > again. Then on the pass after that we see that the system-wide xmin > has advanced by 1. Shall we trigger an autovacuum of the whole > relation now, to be able to do relfrozenxid++? Seems dubious. I can see what you mean, but just fixing the most extreme case can be a useful goal. It's often enough to stop the system from going into a tailspin, which is the real underlying goal here. Things that approach the most extreme case (but don't quite hit it) don't have that quality. An anti-wraparound vacuum is supposed to be a mechanism that the system escalates to when nothing else triggers an autovacuum worker to run (which is aggressive but not anti-wraparound). That's not really true in practice, of course; anti-wraparound av often becomes a routine thing. But I think that it's a good ideal to strive for -- it should be rare. 
The draft patch series now adds opportunistic freezing -- I should be able to post a new version in a few days time, once I've tied up some loose ends. My testing shows an interesting effect, when opportunistic freezing is applied on top of the relfrozenxid thing: every autovacuum manages to advance relfrozenxid, and so we'll never have to run an aggressive autovacuum (much less an aggressive anti-wraparound autovacuum) in practice. And so (for example) when autovacuum runs against the pgbench_history table, it always sets its relfrozenxid to a value very close to the OldestXmin -- usually the exact OldestXmin. Opportunistic freezing makes us avoid setting the all-visible bit for a heap page without also setting the all-frozen bit -- when we're about to do that, we go freeze the heap tuples and then set the entire page all-frozen (so we freeze anything <= OldestXmin, not <= FreezeLimit). We also freeze based on this more aggressive <= OldestXmin cutoff when pruning had to delete some tuples. The patch still needs more polishing, but I think that we can make anti-wraparound vacuums truly exceptional with this design -- which would make autovacuum a lot easier to deal with operationally. This seems like a feasible goal for Postgres 15, even (though still quite ambitious). The opportunistic freezing stuff isn't free (the WAL records aren't tiny), but it's still not all that expensive. Plus I think that the cost can be further reduced, with a little more work. > Part of the problem here, for bo
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Tue, Dec 14, 2021 at 2:35 AM Robert Haas wrote: > > On Mon, Dec 13, 2021 at 9:40 AM Shruthi Gowda wrote: > > > I am reviewing another patch > > > "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well > > > and will provide the comments soon if any... > > I spent much of today reviewing 0001. Here's an updated version, so > far only lightly tested. Please check whether I've broken anything. > Here are the changes: Thanks, Robert for the updated version. I reviewed the changes and it looks fine. I also tested the patch. The patch works as expected. > - I adjusted the function header comment for heap_create. Your > proposed comment seemed like it was pretty detailed but not 100% > correct. It also made one of the lines kind of long because you didn't > wrap the text in the surrounding style. I decided to make it simpler > and shorter instead of longer still and 100% correct. The comment update looks fine. However, I still feel it would be good to mention on which (rare) circumstance a valid relfilenode can get passed. > - I removed a one-line comment that said /* Override the toast > relfilenode */ because it preceded an error check, not a line of code > that would have done what the comment claimed. > > - I removed a one-line comment that said /* Override the relfilenode > */ because the following code would only sometimes override the > relfilenode. The code didn't seem complex enough to justify a a longer > and more accurate comment, so I just took it out. Fine > - I changed a test for (relkind == RELKIND_RELATION || relkind == > RELKIND_SEQUENCE || relkind == RELKIND_MATVIEW) to use > RELKIND_HAS_STORAGE(). It's true that not all of the storage types > that RELKIND_HAS_STORAGE() tests are possible here, but that's not a > reason to avoiding using the macro. If somebody adds a new relkind > with storage in the future, they might miss the need to manually > update this place, but they will not likely miss the need to update > RELKIND_HAS_STORAGE() since, if they did, their code probably wouldn't > work at all. I agree. > - I changed the way that you were passing create_storage down to > heap_create. I think I said before that you should EITHER fix it so > that we set create_storage = true only when the relation actually has > storage OR ELSE have heap_create() itself override the value to false > when there is no storage. You did both. There are times when it's > reasonable to ensure the same thing in multiple places, but this > doesn't seem to be one of them. So I took that out. I chose to retain > the code in heap_create() that overrides the value to false, added a > comment explaining that it does that, and then adjusted the callers to > ignore the storage type. I then added comments, and in one place an > assertion, to make it clearer what is happening. The changes are fine. Thanks for the fine-tuning. > - In pg_dump.c, I adjusted the comment that says "Not every relation > has storage." and the test that immediately follows, to ignore the > relfilenode when relkind says it's a partitioned table. Really, > partitioned tables should never have had relfilenodes, but as it turns > out, they used to have them. > Fine. Understood. Thanks & Regards, Shruthi KC EnterpriseDB: http://www.enterprisedb.com
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On 12/14/21 2:35 AM, Robert Haas wrote: I spent much of today reviewing 0001. Here's an updated version, so far only lightly tested. Please check whether I've broken anything. Thanks Robert, I tested from v9.6/12/13/14 -> v15 (with patch); things are working fine, i.e., the table/index relfilenode is preserved and does not change after pg_upgrade. -- regards, tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: pg_upgrade operation failed if table created in --single user mode
tushar writes: > Please refer to this scenario where pg_upgrade operation is failing if > the table is create in single-user mode. Fixed in v14: https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=f7f70d5e2 regards, tom lane
Re: Defining (and possibly skipping) useless VACUUM operations
On Tue, Dec 14, 2021 at 1:16 PM Peter Geoghegan wrote: > I think that you'd agree that the arguments in favor of skipping are > strongest ... Well I just don't understand why you insist on using the word "skipping." I think what we're talking about - or at least what we should be talking about - is whether relation_needs_vacanalyze() sets *wraparound = true right after the comment that says /* Force vacuum if table is at risk of wraparound */. And adding some kind of exception to the logic that's there now. > What I see with the draft patch series is that the oldest XID just > isn't that old anymore, consistently -- we literally never fail to > advance relfrozenxid, in any autovacuum, for any table. And the value > that we end up with is consistently quite recent. This is something > that I see both with BenchmarkSQL, and pgbench. There is a kind of > virtuous circle, which prevents us from ever getting anywhere near > having any table age in the tens of millions of XIDs. Yeah, I hadn't thought about it from that perspective, but that does seem very good. I think it's inevitable that there will be cases where that doesn't work out - e.g. you can always force the bad case by holding a table lock until your newborn heads off to college, or just by overthrottling autovacuum so that it can't get through the database in any reasonable amount of time - but it will be nice when it does work out, for sure. > I guess that that makes avoiding useless vacuuming seem like less of a > priority. ISTM that it should be something that is squarely aimed at > keeping things stable in truly pathological cases. Yes. I think "pathological cases" is a good summary of what's wrong with autovacuum. When there's nothing too crazy happening, it actually does pretty well. But, when resources are tight or other corner cases occur, really dumb things start to happen. So it's reasonable to think about how we can install guard rails that prevent complete insanity. -- Robert Haas EDB: http://www.enterprisedb.com
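To make the shape of that exception concrete, here is a rough sketch of what it could look like inside relation_needs_vacanalyze(); this is not actual PostgreSQL source, and oldest_removable_xid is an assumed variable standing for the OldestXmin a vacuum started right now would compute:

/*
 * Hypothetical exception: if vacuuming now could not advance
 * relfrozenxid, don't treat wraparound as a reason to force a vacuum;
 * other triggers (bloat thresholds, analyze) still apply.
 */
if (force_vacuum &&
    !TransactionIdPrecedes(classForm->relfrozenxid, oldest_removable_xid))
{
    force_vacuum = false;
    *wraparound = false;
}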
Re: Commitfest 2021-11 Patch Triage - Part 3
On Tue, 2021-12-14 at 07:41 -0800, Mark Dilger wrote:
> I went a different direction with this. The need is to prevent non-superuser subscription owners from *circumventing* RLS. For this patch set, I'm just checking whether RLS should be enforced against the subscription owner, and if so, raising an ERROR, same as for a privilege violation. This should be sufficient in practice, as the table owner, roles with bypassrls privilege, and superusers should still be able to replicate into an RLS-enabled table.

I agree with this. Let's consider subscription targets involving RLS a separate feature that is just blocked for now (except for bypassrls roles and superusers).

> As far as completeness and a clean implementation (but not performance) are concerned, using ExecInsert is quite appealing. I think that would require reworking the logical replication protocol to send more than one row at a time, so that the overhead of such a choice is not paid *per row*. That seems quite out of scope for this release cycle, though.

Are you sure overhead is a problem? INSERT INTO ... SELECT goes through that path. And the existing logical replication insert path does some extra stuff, like opening the table for each row, so it's not clear to me that logical replication is much faster.

Also, the current patch does per-row ACL checks. Did you consider making the checks a part of logicalrep_rel_open()? That already has a path to consider relcache invalidation, so it would also be a good place to check if the table has RLS enabled.

Regards, Jeff Davis
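[To illustrate the kind of check being described, here is a minimal sketch. The placement inside logicalrep_rel_open(), the variable names, and the error wording are assumptions for illustration, not the patch's actual code.]

    /* Hypothetical: refuse to apply changes if RLS would apply to the owner */
    if (check_enable_rls(RelationGetRelid(rel), InvalidOid, false) == RLS_ENABLED)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot replicate into table \"%s\" with row-level security enabled",
                        RelationGetRelationName(rel))));

[check_enable_rls() already answers "does RLS apply to this relation for the current user?", so a once-per-relation-open check along these lines would avoid paying the cost per row.]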
Re: Add id's to various elements in protocol.sgml
On Sun, Dec 5, 2021 at 11:15 AM Daniel Gustafsson wrote:
> > On 5 Dec 2021, at 16:51, Brar Piening wrote:
> > The attached patch adds id's to various elements in protocol.sgml to make them more accessible via the public html documentation interface.
>
> Off the cuff without having checked the compiled results yet, it seems like a good idea.
>
> —
> Daniel Gustafsson

I wanted to do something similar for every function specification in the docs. This may inspire me to take another shot at that.
Re: Column Filtering in Logical Replication
On 12/14/21 17:43, Alvaro Herrera wrote:
> On 2021-Dec-13, Alvaro Herrera wrote:
> > I think this means we need a new OBJECT_PUBLICATION_REL_COLUMN value in the ObjectType (paralleling OBJECT_COLUMN), and no new ObjectClass value. Looking now to confirm.
>
> After working on this a little bit more, I realized that this is a bad idea overall. It causes lots of complications and it's just not worth it. So I'm back at my original thought that we need to throw an ERROR at ALTER TABLE .. DROP COLUMN time if the column is part of a replication column filter, and suggest the user to remove the column from the filter first and reattempt the DROP COLUMN.
>
> This means that we need to support changing the column list of a table in a publication. I'm looking at implementing some form of ALTER PUBLICATION for that.

Yeah. I think it's not clear if this should behave more like an index or a view. When an indexed column gets dropped we simply drop the index. But if you drop a column referenced by a view, we fail with an error. I think we should handle this more like a view, because publications are externally visible objects too (while indexes are pretty much just an implementation detail).

But why would it be easier not to add a new object type? We still need to check there is no publication referencing the column - either you do that automatically through a dependency, or you do that by custom code. Using a dependency seems better to me, but I don't know what are the complications you mentioned.

regards

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Column Filtering in Logical Replication
On 2021-Dec-14, Tomas Vondra wrote:
> Yeah. I think it's not clear if this should behave more like an index or a view. When an indexed column gets dropped we simply drop the index. But if you drop a column referenced by a view, we fail with an error. I think we should handle this more like a view, because publications are externally visible objects too (while indexes are pretty much just an implementation detail).

I agree -- I think it's more like a view than like an index. (The original proposal was that if you dropped a column that was part of the column list of a relation in a publication, the entire relation is dropped from the publication, but that doesn't seem very friendly behavior -- you break the replication stream immediately if you do that, and the only way to fix it is to send a fresh copy of the remaining subset of columns.)

> But why would it be easier not to add new object type? We still need to check there is no publication referencing the column - either you do that automatically through a dependency, or you do that by custom code. Using a dependency seems better to me, but I don't know what are the complications you mentioned.

The problem is that we need a way to represent the object "column of a table in a publication". I found myself adding a lot of additional code to support OBJECT_PUBLICATION_REL_COLUMN and that seemed like too much.

-- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Add id's to various elements in protocol.sgml
On 2021-Dec-05, Brar Piening wrote:
> When working with the Frontend/Backend Protocol implementation in Npgsql and discussing things with the team, I often struggle with the fact that you can't set deep links to individual message formats in the somewhat lengthy html docs pages.
>
> The attached patch adds id's to various elements in protocol.sgml to make them more accessible via the public html documentation interface.
>
> I've tried to follow the style that was already there in a few of the elements.

Hmm, I think we tend to avoid xreflabels; see https://www.postgresql.org/message-id/8315c0ca-7758-8823-fcb6-f37f9413e...@2ndquadrant.com

-- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: O(n) tasks cause lengthy startups and checkpoints
On 12/14/21, 9:00 AM, "Bruce Momjian" wrote: > Have we changed temporary file handling in any recent major releases, > meaning is this a current problem or one already improved in PG 14. I haven't noticed any recent improvements while working in this area. Nathan
Re: O(n) tasks cause lengthy startups and checkpoints
On 12/14/21, 12:09 PM, "Bossart, Nathan" wrote: > On 12/14/21, 9:00 AM, "Bruce Momjian" wrote: >> Have we changed temporary file handling in any recent major releases, >> meaning is this a current problem or one already improved in PG 14. > > I haven't noticed any recent improvements while working in this area. On second thought, the addition of the remove_temp_files_after_crash GUC is arguably an improvement since it could prevent files from accumulating after repeated crashes. Nathan
Re: WIN32 pg_import_system_collations
On Mon, Dec 13, 2021 at 9:54 PM Thomas Munro wrote:
> I haven't tested yet but +1 for the feature. I guess the API didn't exist at the time collation support was added.

Good to hear.

> This conversion makes sense, to keep the user experience the same across platforms. Nitpick on the comment: why ANSI? I think we can call "en_NZ" a POSIX locale identifier[1], and I think we can call "en-NZ" a BCP 47 language tag.

POSIX also works for me.

> When would the full set of locales not be installed on a Windows system, and why does this need Visual Studio? Wondering if this test will work with some of the frankenstein/cross tool chains (not objecting if it doesn't and could be skipped, just trying to understand the comment).

What I meant to say is that to run the test, you need a database that has successfully run pg_import_system_collations. This would also be possible in MinGW for _WIN32_WINNT >= 0x0600, but the current value in src\include\port\win32.h is _WIN32_WINNT = 0x0501 when compiling with MinGW.

Regards, Juan José Santamaría Flecha
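[For context on the version dependency: the Windows locale-enumeration API involved is only declared for _WIN32_WINNT >= 0x0600 (Vista), so any such code has to be guarded roughly like this. This is only a sketch; the callback name and error handling are made up.]

    #if defined(WIN32) && _WIN32_WINNT >= 0x0600
        /* EnumSystemLocalesEx() reports BCP 47 names such as "en-NZ" */
        if (!EnumSystemLocalesEx(win32_locale_callback, LOCALE_ALL,
                                 (LPARAM) &param, NULL))
            elog(WARNING, "could not enumerate system locales");
    #endif

[With _WIN32_WINNT still at 0x0501 for MinGW builds, the declaration simply isn't visible there, which is why a MinGW-built server cannot be assumed to have imported the system collations.]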
Re: Defining (and possibly skipping) useless VACUUM operations
On Tue, Dec 14, 2021 at 10:47 AM Robert Haas wrote: > Well I just don't understand why you insist on using the word > "skipping." I think what we're talking about - or at least what we > should be talking about - is whether relation_needs_vacanalyze() sets > *wraparound = true right after the comment that says /* Force vacuum > if table is at risk of wraparound */. And adding some kind of > exception to the logic that's there now. Actually, I agree. Skipping is the wrong term, especially because the phrase "VACUUM skips..." is already too overloaded. Not necessarily in vacuumlazy.c itself, but certainly on the mailing list. > Yeah, I hadn't thought about it from that perspective, but that does > seem very good. I think it's inevitable that there will be cases where > that doesn't work out - e.g. you can always force the bad case by > holding a table lock until your newborn heads off to college, or just > by overthrottling autovacuum so that it can't get through the database > in any reasonable amount of time - but it will be nice when it does > work out, for sure. Right. But when the patch doesn't manage to totally prevent anti-wraparound VACUUMs, things still work out a lot better than they would now. I would expect that in practice this will usually only happen when non-aggressive autovacuums keep getting canceled. And sure, it's still not ideal that things have come to that. But because we now do freezing earlier (when it's relatively inexpensive), and because we set all-frozen bits incrementally, the anti-wraparound autovacuum will at least be able to reuse any freezing that we manage to do in all those canceled autovacuums. I think that this tends to make anti-wraparound VACUUMs mostly about not being cancelable -- not so much about reliably advancing relfrozenxid. I mean it doesn't change the basic rules (there is no change to the definition of aggressive VACUUM), but in practice I think that it'll just work that way. Which makes a great deal of sense. I hope to be able to totally get rid of vacuum_freeze_table_age. The freeze map work in PostgreSQL 9.6 was really great, and very effective. But I think that it had an undesirable interaction with vacuum_freeze_min_age: if we set a heap page as all-visible (but not all frozen) before some of its tuples reached that age (which is very likely), then tuples < vacuum_freeze_min_age aren't going to get frozen until whenever we do an aggressive autovacuum. Very often, this will only happen when we next do an anti-wraparound VACUUM (at least before Postgres 13). I suspect we risk running into a "debt cliff" in the eventual anti-wraparound autovacuum. And so while vacuum_freeze_min_age kinda made sense prior to 9.6, it now seems to make a lot less sense. > > I guess that that makes avoiding useless vacuuming seem like less of a > > priority. ISTM that it should be something that is squarely aimed at > > keeping things stable in truly pathological cases. > > Yes. I think "pathological cases" is a good summary of what's wrong > with autovacuum. This is 100% my focus, in general. The main goal of the patch I'm working on isn't so much improving performance as making it more predictable over time. Focussing on freezing while costs are low has a natural tendency to spread the costs out over time. The system should never "get in over its head" with debt that vacuum is expected to eventually deal with. > When there's nothing too crazy happening, it actually > does pretty well. 
But, when resources are tight or other corner cases > occur, really dumb things start to happen. So it's reasonable to think > about how we can install guard rails that prevent complete insanity. Another thing that I really want to stamp out is anything involving a tiny, seemingly-insignificant adverse event that has the potential to cause disproportionate impact over time. For example, right now a non-aggressive VACUUM will never be able to advance relfrozenxid when it cannot get a cleanup lock on one heap page. It's actually extremely unlikely that that should have much of any impact, at least when you determine the new relfrozenxid for the table intelligently. Not acquiring one cleanup lock on one heap page on a huge table should not have such an extreme impact. It's even worse when the systemic impact over time is considered. Let's say you only have a 20% chance of failing to acquire one or more cleanup locks during a non-aggressive autovacuum for a given large table, meaning that you'll fail to advance relfrozenxid in at least 20% of all non-aggressive autovacuums. I think that that might be a lot worse than it sounds, because the impact compounds over time -- I'm not sure that 20% is much worse than 60%, or much better than 5% (very hard to model it). But if we make the high-level, abstract idea of "aggressiveness" more of a continuous thing, and not something that's defined by sharp (and largely meaningless) XID-based cutoffs, we have every chance of nipping these problems i
Life cycles of tuple descriptors
Hi,

There are some things about the life cycle of the common TupleDesc that I'm not 100% sure about.

1. In general, if you get one from relcache or typcache, it is reference-counted, right?

2. The only exception I know of is if you ask the typcache for a blessed one (RECORD+typmod), and it is found in the shared-memory cache. It is not refcounted then. If it is found only locally, it is refcounted.

3. Is that shared case the only way you could see a non-refcounted TupleDesc handed to you by the typcache?

4. How long can such a non-refcounted TupleDesc from the typcache be counted on to live? There is a comment in expandedrecord.c: "If it's not refcounted, just assume it will outlive the expanded object." That sounds like confidence, but is it confidence in the longevity of shared TupleDescs, or in the fleeting lives of expanded records? The same comment says "(This can happen for shared record types, for instance.)" Does the "for instance" mean there are now, or just in future might be, other cases where the typcache hands you a non-refcounted TupleDesc?

5. When a constructed TupleDesc is blessed, the copy placed in the cache by assign_record_type_typmod is born with a refcount of 1. Assuming every later user of that TupleDesc plays nicely with balanced pin and release, what event(s), if any, could ever occur to decrease that initial 1 to 0?

Thanks for any help getting these points straight!

Regards, -Chap
Re: Remove pg_strtouint64(), use strtoull() directly
On 13.12.21 15:44, Tom Lane wrote: Our current hard-coded uses of long long are all written on the assumption that it's*at least* 64 bits, so we'd survive OK on such a platform so long as we don't start confusing it with *exactly* 64 bits. OK, makes sense. Here is an alternative patch. It introduces two light-weight macros strtoi64() and strtou64() (compare e.g., strtoimax() in POSIX) in c.h and removes pg_strtouint64(). This moves the portability layer from numutils.c to c.h, so it's closer to the rest of the int64 portability code. And that way it is available to not just server code. And it resolves the namespace collision with the pg_strtointNN() functions in numutils.c.From e723f9480be4c466f9c3f59a75af951d94f2 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 14 Dec 2021 08:38:35 +0100 Subject: [PATCH v2] Simplify the general-purpose 64-bit integer parsing APIs pg_strtouint64() is a wrapper around strtoull/strtoul/_strtoui64, but it seems no longer necessary to have this indirection. msvc/Solution.pm claims HAVE_STRTOULL, so the "MSVC only" part seems unnecessary. Also, we have code in c.h to substitute alternatives for strtoull() if not found, and that would appear to cover all currently supported platforms, so having a further fallback in pg_strtouint64() seems unnecessary. Therefore, we could remove pg_strtouint64(), and use strtoull() directly in all call sites. However, it seems useful to keep a separate notation for parsing exactly 64-bit integers, matching the type definition int64/uint64. For that, add new macros strtoi64() and strtou64() in c.h as thin wrappers around strtol()/strtoul() or strtoll()/stroull(). This makes these functions available everywhere instead of just in the server code, and it makes the function naming notably different from the pg_strtointNN() functions in numutils.c, which have a different API. --- src/backend/nodes/readfuncs.c | 2 +- src/backend/utils/adt/numutils.c | 22 -- src/backend/utils/adt/xid.c | 2 +- src/backend/utils/adt/xid8funcs.c | 6 +++--- src/backend/utils/misc/guc.c | 2 +- src/include/c.h | 13 + src/include/utils/builtins.h | 1 - 7 files changed, 19 insertions(+), 29 deletions(-) diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index dcec3b728f..76cff8a2b1 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -80,7 +80,7 @@ #define READ_UINT64_FIELD(fldname) \ token = pg_strtok(&length); /* skip :fldname */ \ token = pg_strtok(&length); /* get field value */ \ - local_node->fldname = pg_strtouint64(token, NULL, 10) + local_node->fldname = strtou64(token, NULL, 10) /* Read a long integer field (anything written as ":fldname %ld") */ #define READ_LONG_FIELD(fldname) \ diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index b93096f288..6a9c00fdd3 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -606,25 +606,3 @@ pg_ultostr(char *str, uint32 value) return str + len; } - -/* - * pg_strtouint64 - * Converts 'str' into an unsigned 64-bit integer. - * - * This has the identical API to strtoul(3), except that it will handle - * 64-bit ints even where "long" is narrower than that. - * - * For the moment it seems sufficient to assume that the platform has - * such a function somewhere; let's not roll our own. 
- */ -uint64 -pg_strtouint64(const char *str, char **endptr, int base) -{ -#ifdef _MSC_VER/* MSVC only */ - return _strtoui64(str, endptr, base); -#elif defined(HAVE_STRTOULL) && SIZEOF_LONG < 8 - return strtoull(str, endptr, base); -#else - return strtoul(str, endptr, base); -#endif -} diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c index 24c1c93732..a09096d018 100644 --- a/src/backend/utils/adt/xid.c +++ b/src/backend/utils/adt/xid.c @@ -158,7 +158,7 @@ xid8in(PG_FUNCTION_ARGS) { char *str = PG_GETARG_CSTRING(0); - PG_RETURN_FULLTRANSACTIONID(FullTransactionIdFromU64(pg_strtouint64(str, NULL, 0))); + PG_RETURN_FULLTRANSACTIONID(FullTransactionIdFromU64(strtou64(str, NULL, 0))); } Datum diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c index f1870a7366..68985dca5a 100644 --- a/src/backend/utils/adt/xid8funcs.c +++ b/src/backend/utils/adt/xid8funcs.c @@ -295,12 +295,12 @@ parse_snapshot(const char *str) char *endp; StringInfo buf; - xmin = FullTransactionIdFromU64(pg_strtouint64(str, &endp, 10)); + xmin = FullTransactionIdFromU64(strtou64(str, &endp, 10)); if (*endp != ':') goto bad_format; str = endp + 1; - xmax = FullTransactionIdFromU64(pg_strtouint64(str, &endp, 10)); + xmax = FullTransactionIdFromU64
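[The hunk that actually adds the new macros to c.h is cut off in the quoted patch above; going by the commit message, the definitions are presumably thin wrappers along these lines. This is a sketch based on the description, not the verbatim patch text.]

    /* Parse exactly-64-bit integers, regardless of how wide "long" is */
    #ifdef HAVE_LONG_INT_64
    #define strtoi64(str, endptr, base) ((int64) strtol(str, endptr, base))
    #define strtou64(str, endptr, base) ((uint64) strtoul(str, endptr, base))
    #else
    #define strtoi64(str, endptr, base) ((int64) strtoll(str, endptr, base))
    #define strtou64(str, endptr, base) ((uint64) strtoull(str, endptr, base))
    #endif

[That keeps the call sites spelled in terms of the int64/uint64 typedefs while deferring to whichever libc function has the right width on the platform.]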
Re: Column Filtering in Logical Replication
On 12/14/21 20:35, Alvaro Herrera wrote: On 2021-Dec-14, Tomas Vondra wrote: Yeah. I think it's not clear if this should behave more like an index or a view. When an indexed column gets dropped we simply drop the index. But if you drop a column referenced by a view, we fail with an error. I think we should handle this more like a view, because publications are externally visible objects too (while indexes are pretty much just an implementation detail). I agree -- I think it's more like a view than like an index. (The original proposal was that if you dropped a column that was part of the column list of a relation in a publication, the entire relation is dropped from the view, but that doesn't seem very friendly behavior -- you break the replication stream immediately if you do that, and the only way to fix it is to send a fresh copy of the remaining subset of columns.) Right, that's my reasoning too. But why would it be easier not to add new object type? We still need to check there is no publication referencing the column - either you do that automatically through a dependency, or you do that by custom code. Using a dependency seems better to me, but I don't know what are the complications you mentioned. The problem is that we need a way to represent the object "column of a table in a publication". I found myself adding a lot of additional code to support OBJECT_PUBLICATION_REL_COLUMN and that seemed like too much. My experience with dependencies is pretty limited, but can't we simply make a dependency between the whole publication and the column? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
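[For what it's worth, a per-column dependency of that kind would presumably be just another pg_depend entry recorded when the column list is stored, something like this sketch. Variable names are illustrative only.]

    /* Make the publication membership depend on each listed column */
    ObjectAddress myself, referenced;

    ObjectAddressSet(myself, PublicationRelRelationId, pubreloid);
    ObjectAddressSubSet(referenced, RelationRelationId, relid, attnum);
    recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);

[With a NORMAL dependency, ALTER TABLE ... DROP COLUMN would then refuse to drop the column unless CASCADE is given, which is essentially the view-like behavior discussed above.]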
Re: Column Filtering in Logical Replication
Hi,

I went through the v9 patch, and I have a couple of comments / questions. Apologies if some of this was already discussed earlier; it's hard to cross-check in such a long thread. Most of the comments are in 0002 to make them easier to locate, and it also makes the proposed code changes clearer, I think.

1) check_publication_add_relation - the "else" branch is not really needed, because the "if (replidentfull)" always errors out

2) publication_add_relation has a FIXME about handling cases with different column lists

So what's the right behavior for ADD TABLE with a different column list? I'd say we should allow that, and that it should be mostly the same thing as adding/removing columns to the list incrementally, i.e. we should replace the column lists. We could also prohibit such changes, but that seems like a really annoying limitation, forcing people to remove/add the relation.

I added some comments to the attmap translation block, and replaced the <0 check with AttrNumberIsForUserDefinedAttr. But I wonder if we could get rid of the offset, considering we're dealing with just user-defined attributes. That'd make the code clearer, but it would break if we're comparing it to other bitmaps with offsets. But I don't think we do.

3) I doubt "att_map" is the right name, though. AFAICS it's just a list of columns for the relation, not a map, right? So maybe attr_list?

4) AlterPublication talks about "publication status" for a column, but do we actually track that? Or what does that mean?

5) PublicationDropTables does a check

    if (pubrel->columns)
        ereport(ERROR,
                errcode(ERRCODE_SYNTAX_ERROR),

Shouldn't this be prevented by the grammar, really? Also, it should be in regression tests.

6) Another thing that should be in the test is a partitioned table with attribute mapping and a column list, to see how map and attr_map interact.

7) There's a couple of places doing this

    if (att_map != NULL &&
        !bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber, att_map) &&
        !bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber, idattrs) &&
        !replidentfull)

which is really hard to understand (even if we get rid of the offset), so maybe let's move that to a function with a sensible name (see the sketch after the patch below). Also, some places don't check idattrs - seems a bit suspicious.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom fb5ce02d36b46f92ab01c9a823cc4e315cfcb73c Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 14 Dec 2021 20:53:28 +0100 Subject: [PATCH 1/2] v9 --- doc/src/sgml/ref/alter_publication.sgml | 4 +- doc/src/sgml/ref/create_publication.sgml | 11 +- src/backend/catalog/dependency.c | 8 +- src/backend/catalog/objectaddress.c | 8 + src/backend/catalog/pg_depend.c | 50 ++ src/backend/catalog/pg_publication.c | 106 +++- src/backend/commands/publicationcmds.c | 94 ++- src/backend/commands/tablecmds.c | 10 +- src/backend/nodes/copyfuncs.c| 1 + src/backend/nodes/equalfuncs.c | 1 + src/backend/parser/gram.y| 36 - src/backend/replication/logical/proto.c | 97 --- src/backend/replication/logical/tablesync.c | 117 +- src/backend/replication/pgoutput/pgoutput.c | 80 +++-- src/bin/pg_dump/pg_dump.c| 41 - src/bin/pg_dump/pg_dump.h| 1 + src/bin/psql/describe.c | 26 ++- src/bin/psql/tab-complete.c | 2 + src/include/catalog/dependency.h | 3 + src/include/catalog/pg_publication.h | 1 + src/include/catalog/pg_publication_rel.h | 3 + src/include/commands/publicationcmds.h | 2 +- src/include/nodes/parsenodes.h | 1 + src/include/replication/logicalproto.h | 6 +- src/test/regress/expected/publication.out| 39 - src/test/regress/sql/publication.sql | 19 ++- src/test/subscription/t/021_column_filter.pl | 162 +++ 27 files changed, 857 insertions(+), 72 deletions(-) create mode 100644 src/test/subscription/t/021_column_filter.pl diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml index bb4ef5e5e2..c86055b93c 100644 --- a/doc/src/sgml/ref/alter_publication.sgml +++ b/doc/src/sgml/ref/alter_publication.sgml @@ -30,7 +30,7 @@ ALTER PUBLICATION name RENAME TO where publication_object is one of: -TABLE [ ONLY ] table_name [ * ] [, ... ] +TABLE [ ONLY ] table_name [ * ] [ ( column_name, [, ... ] ) ] [, ... ] ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ] @@ -110,6 +110,8 @@ ALTER PUBLICATION name RENAME TO * can be specified after the table name to explicitly indicate that descendant tables
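[Regarding comment 7 above, a possible shape for such a helper, derived from the quoted condition. The function name and signature are invented for illustration; FirstLowInvalidHeapAttributeNumber comes from access/sysattr.h.]

    /*
     * Hypothetical helper: should this attribute be published, given the
     * publication's column list (NULL means "all columns")?
     */
    static bool
    column_in_column_list(Form_pg_attribute att, Bitmapset *att_map,
                          Bitmapset *idattrs, bool replidentfull)
    {
        if (replidentfull || att_map == NULL)
            return true;

        /* listed explicitly, or part of the replica identity */
        return bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber, att_map) ||
               bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber, idattrs);
    }

[Call sites in proto.c and pgoutput.c could then share the same test, including the same treatment of idattrs, which would also address the "some places don't check idattrs" concern.]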
Re: WIN32 pg_import_system_collations
On Wed, Dec 15, 2021 at 9:14 AM Juan José Santamaría Flecha wrote: > What I meant to say is that to run the test, you need a database that has > successfully run pg_import_system_collations. This would be also possible in > Mingw for _WIN32_WINNT> = 0x0600, but the current value in > src\include\port\win32.h is _WIN32_WINNT = 0x0501 when compiling with Mingw. Ah, right. I hope we can make the leap to 0x0A00 (Win10) soon and just stop thinking about these old ghosts, as mentioned by various people in various threads. Do you happen to know if there are complications for that, with the non-MSVC tool chains?
Re: Synchronizing slots from primary to standby
On 31.10.21 11:08, Peter Eisentraut wrote: I want to reactivate $subject. I took Petr Jelinek's patch from [0], rebased it, added a bit of testing. It basically works, but as mentioned in [0], there are various issues to work out. The idea is that the standby runs a background worker to periodically fetch replication slot information from the primary. On failover, a logical subscriber would then ideally find up-to-date replication slots on the new publisher and can just continue normally. So, again, this isn't anywhere near ready, but there is already a lot here to gather feedback about how it works, how it should work, how to configure it, and how it fits into an overall replication and HA architecture. Here is an updated patch. The main changes are that I added two configuration parameters. The first, synchronize_slot_names, is set on the physical standby to specify which slots to sync from the primary. By default, it is empty. (This also fixes the recovery test failures that I had to disable in the previous patch version.) The second, standby_slot_names, is set on the primary. It holds back logical replication until the listed physical standbys have caught up. That way, when failover is necessary, the promoted standby is not behind the logical replication consumers. In principle, this works now, I think. I haven't made much progress in creating more test cases for this; that's something that needs more attention. It's worth pondering what the configuration language for standby_slot_names should be. Right now, it's just a list of slots that all need to be caught up. More complicated setups are conceivable. Maybe you have standbys S1 and S2 that are potential failover targets for logical replication consumers L1 and L2, and also standbys S3 and S4 that are potential failover targets for logical replication consumers L3 and L4. Viewed like that, this setting could be a replication slot setting. The setting might also have some relationship with synchronous_standby_names. Like, if you have synchronous_standby_names set, then that's a pretty good indication that you also want some or all of those standbys in standby_slot_names. (But note that one is slots and one is application names.) 
So there are a variety of possibilities.From f1d306bfe3eb657b5d14b0ef3024586083beb4ed Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 14 Dec 2021 22:20:59 +0100 Subject: [PATCH v2] Synchronize logical replication slots from primary to standby Discussion: https://www.postgresql.org/message-id/flat/514f6f2f-6833-4539-39f1-96cd1e011f23%40enterprisedb.com --- doc/src/sgml/config.sgml | 34 ++ src/backend/commands/subscriptioncmds.c | 4 +- src/backend/postmaster/bgworker.c | 3 + .../libpqwalreceiver/libpqwalreceiver.c | 96 src/backend/replication/logical/Makefile | 1 + src/backend/replication/logical/launcher.c| 202 ++--- .../replication/logical/reorderbuffer.c | 85 src/backend/replication/logical/slotsync.c| 412 ++ src/backend/replication/logical/tablesync.c | 13 +- src/backend/replication/repl_gram.y | 32 +- src/backend/replication/repl_scanner.l| 1 + src/backend/replication/slotfuncs.c | 2 +- src/backend/replication/walsender.c | 196 - src/backend/utils/activity/wait_event.c | 3 + src/backend/utils/misc/guc.c | 23 + src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/commands/subscriptioncmds.h | 3 + src/include/nodes/nodes.h | 1 + src/include/nodes/replnodes.h | 9 + src/include/replication/logicalworker.h | 9 + src/include/replication/slot.h| 4 +- src/include/replication/walreceiver.h | 16 + src/include/replication/worker_internal.h | 8 +- src/include/utils/wait_event.h| 1 + src/test/recovery/t/030_slot_sync.pl | 58 +++ 25 files changed, 1148 insertions(+), 70 deletions(-) create mode 100644 src/backend/replication/logical/slotsync.c create mode 100644 src/test/recovery/t/030_slot_sync.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index afbb6c35e3..2b2a21a251 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4406,6 +4406,23 @@ Primary Server + + standby_slot_names (string) + + standby_slot_names configuration parameter + + + + +List of physical replication slots that logical replication waits for. +If a logical replication connection is meant to switch to a physical +standby after the standby is promoted, the physical replication slot +for the standby should be listed here. This ensures that logical +replication is not ahead of the physical standby. + + +
Re: Synchronizing slots from primary to standby
On 24.11.21 07:11, Masahiko Sawada wrote: I haven’t looked at the patch deeply but regarding 007_sync_rep.pl, the tests seem to fail since the tests rely on the order of the wal sender array on the shared memory. Since a background worker for synchronizing replication slots periodically connects to the walsender on the primary and disconnects, it breaks the assumption of the order. Regarding 010_logical_decoding_timelines.pl, I guess that the patch breaks the test because the background worker for synchronizing slots on the replica periodically advances the replica's slot. I think we need to have a way to disable the slot synchronization or to specify the slot name to sync with the primary. I'm not sure we already discussed this topic but I think we need it at least for testing purposes. This has been addressed by patch v2 that adds such a setting.
Re: pg_dump versus ancient server versions
I wrote: > Anyway, it seems like there's some consensus that 9.2 is a good > stopping place for today. I'll push forward with > (1) back-patching as necessary to make 9.2 and up build cleanly > on the platforms I have handy; > (2) ripping out pg_dump's support for pre-9.2 servers; > (3) ripping out psql's support for pre-9.2 servers. I've completed the pg_dump/pg_dumpall part of that, but while updating the docs I started to wonder whether we shouldn't nuke pg_dump's --no-synchronized-snapshots option. As far as I can make out, the remaining use case for that is to let you perform an unsafe parallel dump from a standby server of an out-of-support major version. I'm not very clear why we allowed that at all, ever, rather than saying you can't parallelize in such cases. But for sure that remaining use case is paper thin, and leaving the option available seems way more likely to let people shoot themselves in the foot than to let them do anything helpful. Barring objections, I'll remove that option in a day or two. regards, tom lane
Re: Synchronizing slots from primary to standby
On 24.11.21 17:25, Dimitri Fontaine wrote: Is there a case to be made about doing the same thing for physical replication slots too? It has been considered. At the moment, I'm not doing it, because it would add more code and complexity and it's not that important. But it could be added in the future. Given the admitted state of the patch, I didn't focus on tests. I could successfully apply the patch on-top of current master's branch, and cleanly compile and `make check`. Then I also updated pg_auto_failover to support Postgres 15devel [2] so that I could then `make NODES=3 cluster` there and play with the new replication command: $ psql -d "port=5501 replication=1" -c "LIST_SLOTS;" psql:/Users/dim/.psqlrc:24: ERROR: XX000: cannot execute SQL commands in WAL sender for physical replication LOCATION: exec_replication_command, walsender.c:1830 ... I'm not too sure about this idea of running SQL in a replication protocol connection that you're mentioning, but I suppose that's just me needing to brush up on the topic. FWIW, the way the replication command parser works, if there is a parse error, it tries to interpret the command as a plain SQL command. But that only works for logical replication connections. So in physical replication, if you try to run anything that does not parse, you will get this error. But that has nothing to do with this feature. The above command works for me, so maybe something else went wrong in your situation. Maybe the first question about configuration would be about selecting which slots a standby should maintain from the primary. Is it all of the slots that exists on both the nodes, or a sublist of that? Is it possible to have a slot with the same name on a primary and a standby node, in a way that the standby's slot would be a completely separate entity from the primary's slot? If yes (I just don't know at the moment), well then, should we continue to allow that? This has been added in v2. Also, do we want to even consider having the slot management on a primary node depend on the ability to sync the advancing on one or more standby nodes? I'm not sure to see that one as a good idea, but maybe we want to kill it publically very early then ;-) I don't know what you mean by this.
Re: Synchronizing slots from primary to standby
On 28.11.21 07:52, Bharath Rupireddy wrote:
> 1) Instead of a new LIST_SLOT command, can't we use READ_REPLICATION_SLOT (slight modifications need to be done to make it support logical replication slots and to get more information from the subscriber)?

I looked at that but didn't see an obvious way to consolidate them. This is something we could look at again later.

> 2) How frequently is the new bg worker going to sync the slot info? How can it ensure that the latest information exists, say, when the subscriber is down/crashed before it picks up the latest slot information?

The interval is currently hardcoded, but could be a configuration setting. In the v2 patch, there is a new setting that orders physical replication before logical so that the logical subscribers cannot get ahead of the physical standby.

> 3) Instead of the subscriber pulling the slot info, why can't the publisher (via the walsender or a new bg worker maybe?) push the latest slot info? I'm not sure we want to add more functionality to the walsender; if yes, isn't it going to be much simpler?

This sounds like the failover slot feature, which was rejected.
Re: Granting SET and ALTER SYSTE privileges for GUCs
On Mon, Dec 13, 2021 at 5:34 PM Mark Dilger wrote:
> > On Dec 13, 2021, at 1:33 PM, Mark Dilger wrote:
> > but will repost in a few hours
>
> ... and here it is:

Currently there is a failure in check-world (not sure if it's known):

diff -U3 /src/postgres/src/test/regress/expected/guc_privs.out /src/postgres/src/test/regress/results/guc_privs.out
--- /src/postgres/src/test/regress/expected/guc_privs.out 2021-12-14 14:11:45.0 +
+++ /src/postgres/src/test/regress/results/guc_privs.out 2021-12-14 15:50:52.219583421 +
@@ -39,8 +39,10 @@
 ALTER SYSTEM RESET autovacuum; -- ok
 -- PGC_SUSET
 SET lc_messages = 'en_US.UTF-8'; -- ok
+ERROR: invalid value for parameter "lc_messages": "en_US.UTF-8"
 RESET lc_messages; -- ok
 ALTER SYSTEM SET lc_messages = 'en_US.UTF-8'; -- ok
+ERROR: invalid value for parameter "lc_messages": "en_US.UTF-8"
 ALTER SYSTEM RESET lc_messages; -- ok
 -- PGC_SU_BACKEND
 SET jit_debugging_support = OFF; -- fail, cannot be set after connection start

Aside from that, I've tested this and it seems to function as advertised, and in my view is worth adding to PG.

One thing that seems like an omission to me is the absence of an InvokeObjectPostAlterHook in pg_setting_acl_aclcheck or pg_setting_acl_aclmask, so that MAC extensions can also block this. InvokeObjectPostCreateHook is already in the create path, so a PostAlter hook seems appropriate.

Thank you.
Re: Remove pg_strtouint64(), use strtoull() directly
Peter Eisentraut writes: > OK, makes sense. Here is an alternative patch. It introduces two > light-weight macros strtoi64() and strtou64() (compare e.g., strtoimax() > in POSIX) in c.h and removes pg_strtouint64(). This moves the > portability layer from numutils.c to c.h, so it's closer to the rest of > the int64 portability code. And that way it is available to not just > server code. And it resolves the namespace collision with the > pg_strtointNN() functions in numutils.c. Works for me. I'm not in a position to verify that this'll work on Windows, but the buildfarm will tell us that quickly enough. regards, tom lane
Re: daitch_mokotoff module
Tomas Vondra writes: [...] > > Thanks, looks interesting. A couple generic comments, based on a quick > code review. Thank you for the constructive review! > > 1) Can the extension be marked as trusted, just like fuzzystrmatch? I have now moved the daitch_mokotoff function into the fuzzystrmatch module, as suggested by Andrew Dunstan. > > 2) The docs really need an explanation of what the extension is for, > not just a link to fuzzystrmatch. Also, a couple examples would be > helpful, I guess - similarly to fuzzystrmatch. The last line in the > docs is annoyingly long. Please see the updated documentation for the fuzzystrmatch module. > > 3) What's daitch_mokotov_header.pl for? I mean, it generates the > header, but when do we need to run it? It only has to be run if the soundex rules are changed. I have now made the dependencies explicit in the fuzzystrmatch Makefile. > > 4) It seems to require perl-open, which is a module we did not need > until now. Not sure how well supported it is, but maybe we can use a > more standard module? I believe Perl I/O layers have been part of Perl core for two decades now :-) > > 5) Do we need to keep DM_MAIN? It seems to be meant for some kind of > testing, but our regression tests certainly don't need it (or the > palloc mockup). I suggest to get rid of it. Done. BTW this was modeled after dmetaphone.c > > 6) I really don't understand some of the comments in > daitch_mokotov.sql, like for example: > > -- testEncodeBasic > -- Tests covered above are omitted. > > Also, comments with names of Java methods seem pretty confusing. It'd > be better to actually explain what rules are the tests checking. The tests were copied from various web sites and implementations. I have cut down on the number of tests and made the comments more to the point. > > 7) There are almost no comments in the .c file (ignoring the comment > on top). Short functions like initialize_node are probably fine > without one, but e.g. update_node would deserve one. More comments are added to both the .h and the .c file. > > 8) Some of the lines are pretty long (e.g. the update_node signature > is almost 170 chars). That should be wrapped. Maybe try running > pgindent on the code, that'll show which parts need better formatting > (so as not to get broken later). Fixed. I did run pgindent earlier, however it didn't catch those long lines. > > 9) I'm sure there's better way to get the number of valid chars than this: > > for (i = 0, ilen = 0; (c = read_char(&str[i], &ilen)) && (c < 'A' || > c > ']'); i += ilen) > { > } > > Say, a while loop or something? The code gets to the next encodable character, skipping any other characters. I have now added a comment which should hopefully make this clearer, and broken up the for loop for readability. Please find attached the revised patch. Best regards Dag Lem diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile index 0704894f88..826e529e3e 100644 --- a/contrib/fuzzystrmatch/Makefile +++ b/contrib/fuzzystrmatch/Makefile @@ -3,11 +3,12 @@ MODULE_big = fuzzystrmatch OBJS = \ $(WIN32RES) \ + daitch_mokotoff.o \ dmetaphone.o \ fuzzystrmatch.o EXTENSION = fuzzystrmatch -DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.0--1.1.sql +DATA = fuzzystrmatch--1.2.sql fuzzystrmatch--1.1--1.2.sql fuzzystrmatch--1.0--1.1.sql PGFILEDESC = "fuzzystrmatch - similarities and distance between strings" REGRESS = fuzzystrmatch @@ -22,3 +23,8 @@ top_builddir = ../.. 
include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk endif + +daitch_mokotoff.o: daitch_mokotoff.h + +daitch_mokotoff.h: daitch_mokotoff_header.pl + perl $< > $@ diff --git a/contrib/fuzzystrmatch/daitch_mokotoff.c b/contrib/fuzzystrmatch/daitch_mokotoff.c new file mode 100644 index 00..302e9a6d86 --- /dev/null +++ b/contrib/fuzzystrmatch/daitch_mokotoff.c @@ -0,0 +1,516 @@ +/* + * Daitch-Mokotoff Soundex + * + * Copyright (c) 2021 Finance Norway + * Author: Dag Lem + * + * This implementation of the Daitch-Mokotoff Soundex System aims at high + * performance. + * + * - The processing of each phoneme is initiated by an O(1) table lookup. + * - For phonemes containing more than one character, a coding tree is traversed + * to process the complete phoneme. + * - The (alternate) soundex codes are produced digit by digit in-place in + * another tree structure. + * + * References: + * + * https://www.avotaynu.com/soundex.htm + * https://www.jewishgen.org/InfoFiles/Soundex.html + * https://familypedia.fandom.com/wiki/Daitch-Mokotoff_Soundex + * https://stevemorse.org/census/soundex.html (dmlat.php, dmsoundex.php) + * https://github.com/apache/commons-codec/ (dmrules.txt, DaitchMokotoffSoundex.java) + * https://metacpan.org/pod/Text::Phonetic (DaitchMokotoff.pm) + * + * A few notes on other implementations: + * + * - "J" is considered a vowel in dmlat.php + * - The official rules for "RS" are commented out in dmlat
Re: Life cycles of tuple descriptors
Chapman Flack writes: > There are some things about the life cycle of the common TupleDesc > that I'm not 100% sure about. > 1. In general, if you get one from relcache or typcache, it is >reference-counted, right? Tupdescs for named composite types should be, since those are potentially modifiable by DDL. The refcount allows a stale tupdesc to go away when it's no longer referenced anywhere. Tupdescs for RECORD types are a different story: there's no way to change them once created, so they'll live as long as the process does (at least for tupdescs kept in the typcache). Refcounting isn't terribly necessary in that case; and at least for the shared tupdesc case, we don't do it, to avoid questions of modifying a piece of shared state. > 3. Is that shared case the only way you could see a non-refcounted >TupleDesc handed to you by the typcache? I'm not sure what happens for a non-shared RECORD tupdesc, but it probably wouldn't be wise to assume anything either way. > 5. When a constructed TupleDesc is blessed, the copy placed in the cache >by assign_record_type_typmod is born with a refcount of 1. Assuming every >later user of that TupleDesc plays nicely with balanced pin and release, >what event(s), if any, could ever occur to decrease that initial 1 to 0? That refcount is describing the cache's own reference, so as long as that reference remains it'd be incorrect to decrease the refcount to 0. regards, tom lane
Re: Life cycles of tuple descriptors
On 12/14/21 18:03, Tom Lane wrote: > Tupdescs for RECORD types are a different story: ... Refcounting > isn't terribly necessary in that case; and at least for the shared > tupdesc case, we don't do it, to avoid questions of modifying a > piece of shared state. Ok, that's kind of what I was getting at here: >> 5. When a constructed TupleDesc is blessed, the copy placed in the cache >>by assign_record_type_typmod is born with a refcount of 1. ... >>what event(s), if any, could ever occur to decrease that initial 1 ... > > That refcount is describing the cache's own reference, so as long as > that reference remains it'd be incorrect to decrease the refcount to 0. In the case of a blessed RECORD+typmod tupdesc, *is* there any event that could ever cause the cache to drop that reference? Or is the tupdesc just going to live there for the life of the backend, its refcount sometimes going above and back down to but never below 1? That would fit with your "refcounting isn't terribly necessary in that case". If that's how it works, it's interesting having the two different patterns: if it's a shared one, it has refcount -1 and you never fuss with it and it never goes away; if it's a local one it has a non-negative refcount and you go through all the motions and it never goes away anyway. >> 3. Is that shared case the only way you could see a non-refcounted >>TupleDesc handed to you by the typcache? > > I'm not sure what happens for a non-shared RECORD tupdesc, but it > probably wouldn't be wise to assume anything either way. If I'm reading this right, for the non-shared case, the copy that goes into the cache is made refcounted. (The copy you presented for blessing gets the assigned typmod written in it, and no change to its refcount field.) There's really just one thing I'm interested in assuming: *In general*, if I encounter a tupdesc with -1 refcount, I had better not assume much about its longevity. It might be in a context that's about to go away. If I'll be wanting it later, I had better defensively copy it into some context I've chosen. (Ok, I guess if it's a tupdesc provided to me in a function call, I can assume it is good for the duration of the call, or if it's part of an SPI result, I can assume it's good until SPI_finish.) But if I have gone straight to the typcache to ask for a RECORD tupdesc, and the one it gives me has -1 refcount, is it reasonable to assume I can retain a reference to that without the defensive copy? Regards, -Chap
Re: Question about 001_stream_rep.pl recovery test
Thanks a lot, Michael, for the explanation.

On 2021-12-13 4:05 a.m., Michael Paquier wrote:
> On Fri, Dec 10, 2021 at 01:44:40PM -0800, David Zhang wrote:
> > Inside the test script `src/test/recovery/t/001_stream_rep.pl`, a comment at line 30 says `my_backup` is "not mandatory",
> >
> > 30 # Take backup of standby 1 (not mandatory, but useful to check if
> > 31 # pg_basebackup works on a standby).
> > 32 $node_standby_1->backup($backup_name);
> >
> > however if I remove the backup folder "my_backup" after line 32, something like below,
>
> Well, the comment is not completely incorrect IMO. In this context, I read that as: we don't need a backup from a standby and we could just take a backup from the primary instead. If I were to fix something, I would suggest to reword the comment as follows:
>
> # Take backup of standby 1 (could be taken from the primary, but this
> # is useful to check if pg_basebackup works on a standby).
>
> > And then the test script takes another backup `my_backup_2`, but it seems this backup is not used anywhere.
>
> This had better stay around, per commit f267c1c. And as far as I can see, we don't have an equivalent test in a different place, where pg_basebackup runs on a standby while its *primary* is stopped.
> --
> Michael

-- David Software Engineer Highgo Software Inc. (Canada) www.highgo.ca
Re: Skipping logical replication transactions on subscriber side
On Tue, Dec 14, 2021 at 8:24 PM Amit Kapila wrote: > > On Tue, Dec 14, 2021 at 3:41 PM Dilip Kumar wrote: > > > > On Tue, Dec 14, 2021 at 2:36 PM Amit Kapila wrote: > > > > > > > > > > > > > I agree with this theory. Can we reflect this in comments so that in > > > > > the future we know why we didn't pursue this direction? > > > > > > > > I might be missing something here, but for streaming, transaction > > > > users can decide whether they wants to skip or not only once we start > > > > applying no? I mean only once we start applying the changes we can > > > > get some errors and by that time we must be having all the changes for > > > > the transaction. > > > > > > > > > > That is right and as per my understanding, the patch is trying to > > > accomplish the same. > > > > > > > So I do not understand the point we are trying to > > > > discuss here? > > > > > > > > > > The point is that whether we can skip the changes while streaming > > > itself like when we get the changes and write to a stream file. Now, > > > it is possible that streams from multiple transactions can be > > > interleaved and users can change the skip_xid in between. It is not > > > that we can't handle this but that would require a more complex design > > > and it doesn't seem worth it because we can anyway skip the changes > > > while applying as you mentioned in the previous paragraph. > > > > Actually, I was trying to understand the use case for skipping while > > streaming. Actually, during streaming we are not doing any database > > operation that means this will not generate any error. > > > > Say, there is an error the first time when we start to apply changes > for such a transaction. So, such a transaction will be streamed again. > Say, the user has set the skip_xid before we stream a second time, so > this time, we can skip it either during the stream phase or apply > phase. I think the patch is skipping it during apply phase. > Sawada-San, please confirm if my understanding is correct? My understanding is the same. The patch doesn't skip the streaming phase but starts skipping when starting to apply changes. That is, we receive streamed changes and write them to the stream file anyway regardless of skip_xid. When receiving the stream-commit message, we check whether or not we skip this transaction, and if so we apply all messages in the stream file other than all data modification messages. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: generalized conveyor belt storage
On Tue, Dec 14, 2021 at 3:00 PM Robert Haas wrote: > I got interested in this problem again because of the > idea discussed in > https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com > of having a "dead TID" relation fork in which to accumulate TIDs that > have been marked as dead in the table but not yet removed from the > indexes, so as to permit a looser coupling between table vacuum and > index vacuum. That's yet another case where you accumulate new data > and then at a certain point the oldest data can be thrown away because > its intended purpose has been served. Thanks for working on this! It seems very strategically important to me. > So here's a patch. Basically, it lets you initialize a relation fork > as a "conveyor belt," and then you can add pages of basically > arbitrary data to the conveyor belt and then throw away old ones and, > modulo bugs, it will take care of recycling space for you. There's a > fairly detailed README in the patch if you want a more detailed > description of how the whole thing works. How did you test this? I ask because it would be nice if there was a convenient way to try this out, as somebody with a general interest. Even just a minimal test module, that you used for development work. > When I was chatting with Andres about this, he jumped to the question > of whether this could be used to replace SLRUs. To be honest, it's not > really designed for applications that are quite that intense. I personally think that SLRUs (and the related problem of hint bits) are best addressed by tracking which transactions have modified what heap blocks (perhaps only approximately), and then eagerly cleaning up aborted transaction IDs, using a specialized version of VACUUM that does something like heap pruning for aborted xacts. It just seems weird that we keep around clog in order to not have to run VACUUM too frequently, which (among other things) already cleans up after aborted transactions. Having a more efficient data structure for commit status information doesn't seem all that promising, because the problem is actually our insistence on remembering which XIDs aborted almost indefinitely. There is no fundamental reason why it has to work that way. I don't mean that it could in principle be changed (that's almost always true); I mean that it seems like an accident of history, that could have easily gone another way: the ancestral design of clog existing in a world without MVCC. It seems like a totally vestigial thing to me, which I wouldn't say about a lot of other things (just clog and freezing). Something like the conveyor belt seems like it would help with this other kind of VACUUM, mind you. We probably don't want to do anything like index vacuuming for these aborted transactions (actually maybe we want to do retail deletion, which is much more likely to work out there). But putting the TIDs into a store used for dead TIDs could make sense. Especially in extreme cases. -- Peter Geoghegan
Re: Life cycles of tuple descriptors
Chapman Flack writes: > But if I have gone straight to the typcache to ask for a RECORD tupdesc, > and the one it gives me has -1 refcount, is it reasonable to assume > I can retain a reference to that without the defensive copy? The API contract for lookup_rowtype_tupdesc specifies that you must "call ReleaseTupleDesc or DecrTupleDescRefCount when done using the tupdesc". It's safe to assume that the tupdesc will stick around as long as you haven't done that. APIs that don't mention a refcount are handing you a tupdesc of uncertain lifespan (no more than the current query, likely), so if you want the tupdesc to last a long time you'd better copy it into storage you control. regards, tom lane
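[Spelling the contract out as code, the usual pattern is roughly the following. This is a sketch; typid/typmod stand for whatever row type is being looked up.]

    TupleDesc   tupdesc;
    TupleDesc   mycopy;

    tupdesc = lookup_rowtype_tupdesc(typid, typmod);

    /* ... use tupdesc for the current operation ... */

    /* To keep the information past the current query, take a copy you own. */
    mycopy = CreateTupleDescCopy(tupdesc);

    /*
     * Then drop the cache reference; this is safe whether or not the
     * descriptor the typcache handed back was actually refcounted.
     */
    ReleaseTupleDesc(tupdesc);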
Re: row filtering for logical replication
On Tue, Dec 14, 2021 at 10:12 PM Amit Kapila wrote: > > On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila wrote: > > > > On Tue, Dec 14, 2021 at 4:44 AM Peter Smith wrote: > > > > Few other comments: > > === > > Few more comments: > == > v46-0001/0002 > === > 1. After rowfilter_walker() why do we need > EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify > the expressions that are not allowed in rowfilter which we are now > able to detect upfront with the help of a walker. Can't we instead use > EXPR_KIND_WHERE? FYI - I have tried this locally and all tests pass. ~~ If the EXPR_KIND_PUBLICATION_WHERE is removed then there will be some differences: - we would get errors for aggregate/grouping functions from the EXPR_KIND_WHERE - we would get errors for windows functions from the EXPR_KIND_WHERE - we would get errors for set-returning functions from the EXPR_KIND_WHERE Actually, IMO this would be a *good* change because AFAIK those are not all being checked by the row-filter walker. I think the only reason all tests pass is that there are no specific regression tests for these cases. OTOH, there would also be a difference where an error message would not be as nice. Please see the review comment from Vignesh. [1] The improved error message is only possible by checking the EXPR_KIND_PUBLICATION_WHERE. ~~ I think the best thing to do here is to leave the EXPR_KIND_PUBLICATION_WHERE but simplify code so that the improved error message remains as the *only* difference in behaviour from the EXPR_KIND_WHERE. i.e. we should let the other aggregate/grouping/windows/set function checks give errors exactly the same as for the EXPR_KIND_WHERE case. -- [1] https://www.postgresql.org/message-id/CALDaNm08Ynr_FzNg%2BdoHj%3D_nBet%2BKZAvNbqmkEEw7M2SPpPEAw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
On Mon, Dec 13, 2021 at 10:14:49PM -0800, Andres Freund wrote: > Tom's point was that the buildfarm scripts do > if ($self->{bfconf}->{using_msvc}) > @checklog = run_log("perl vcregress.pl upgradecheck"); > else > "cd $self->{pgsql}/src/bin/pg_upgrade && $make > $instflags check"; > > if we don't want to break every buildfarm member that has TestUpgrade enabled > the moment this is committed, we need to have a backward compat path. Missed that, thanks! I'll think about all that a bit more before sending a long-overdue rebased version. -- Michael signature.asc Description: PGP signature
Re: Life cycles of tuple descriptors
On 12/14/21 20:02, Tom Lane wrote: > The API contract for lookup_rowtype_tupdesc specifies that you must "call > ReleaseTupleDesc or DecrTupleDescRefCount when done using the tupdesc". > It's safe to assume that the tupdesc will stick around as long as you > haven't done that. I think what threw me was having a function whose API contract mentions reference counts, but that sometimes gives me things that don't have them. But I guess, making the between-the-lines of the contract explicit, if lookup_rowtype_tupdesc is contracted to give me a tupdesc that sticks around for as long as I haven't called ReleaseTupleDesc, and it sometimes elects to give me one for which ReleaseTupleDesc is a no-op, the contract is still that the thing sticks around for (at least) as long as I haven't done that. Cool. :) Oh, hmm, maybe one thing in that API comment ought to be changed. It says I must call ReleaseTupleDesc *or* DecrTupleDescRefCount. Maybe that dates from before the shared registry? ReleaseTupleDesc is safe, but anybody who uses DecrTupleDescRefCount on a lookup_rowtype_tupdesc result could be in for an assertion failure if a non-refcounted tupdesc is returned. Regards, -Chap
Re: Life cycles of tuple descriptors
Chapman Flack writes: > On 12/14/21 20:02, Tom Lane wrote: >> The API contract for lookup_rowtype_tupdesc specifies that you must "call >> ReleaseTupleDesc or DecrTupleDescRefCount when done using the tupdesc". >> It's safe to assume that the tupdesc will stick around as long as you >> haven't done that. > I think what threw me was having a function whose API contract mentions > reference counts, but that sometimes gives me things that don't have them. That's supposed to be hidden under ReleaseTupleDesc; you shouldn't have to think about it. > Oh, hmm, maybe one thing in that API comment ought to be changed. It says > I must call ReleaseTupleDesc *or* DecrTupleDescRefCount. Maybe that dates > from before the shared registry? ReleaseTupleDesc is safe, but anybody who > uses DecrTupleDescRefCount on a lookup_rowtype_tupdesc result could be > in for an assertion failure if a non-refcounted tupdesc is returned. Yeah, I was just wondering the same. I think DecrTupleDescRefCount is safe if you know you are looking up a named composite type, but maybe that's still too much familiarity with typcache innards. regards, tom lane
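The distinction is visible in the macro itself; roughly (paraphrased from src/include/access/tupdesc.h, not copied verbatim):

/* Roughly what tupdesc.h provides (paraphrase): */
#define ReleaseTupleDesc(tupdesc) \
    do { \
        if ((tupdesc)->tdrefcount >= 0) \
            DecrTupleDescRefCount(tupdesc); \
    } while (0)

/*
 * DecrTupleDescRefCount, by contrast, asserts that the tupdesc really is
 * refcounted (tdrefcount > 0), so calling it on a registry-owned RECORD
 * tupdesc with tdrefcount == -1 trips an assertion in a cassert build.
 */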
Re: Skipping logical replication transactions on subscriber side
On Tue, Dec 14, 2021 at 2:35 PM Greg Nancarrow wrote: > > On Tue, Dec 14, 2021 at 3:23 PM vignesh C wrote: > > > > While the worker is skipping one of the skip transactions specified by > > the user and immediately if the user specifies another skip > > transaction while the skipping of the transaction is in progress this > > new value will be reset by the worker while clearing the skip xid. I > > felt once the worker has identified the skip xid and is about to skip > > the xid, the worker can acquire a lock to prevent concurrency issues: > > That's a good point. > If only the last_error_xid could be skipped, then this wouldn't be an > issue, right? > If a different xid to skip is specified while the worker is currently > skipping a transaction, should that even be allowed? > We don't expect such usage but yes, it could happen and seems not good. I thought we can acquire Share lock on pg_subscription during the skip but not sure it's a good idea. It would be better if we can find a way to allow users to specify only XID that has failed. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: WIN32 pg_import_system_collations
On Wed, Dec 15, 2021 at 10:45:28AM +1300, Thomas Munro wrote: > Ah, right. I hope we can make the leap to 0x0A00 (Win10) soon and > just stop thinking about these old ghosts, as mentioned by various > people in various threads. Seeing your message here... My apologies for the short digression. Would that mean that we could use CreateSymbolicLinkA() as a mapper for pgreadlink() rather than junction points? I am wondering how much code in src/port/ such a move would allow us to remove. -- Michael
Re: parallel vacuum comments
On Mon, Dec 13, 2021 at 7:03 PM Amit Kapila wrote: > Thanks, I can take care of this before committing. The v9-0001* looks > good to me as well, so, I am planning to commit that tomorrow unless I > see more comments or any objection to that. I would like to thank both Masahiko and yourself for working on this. It's important. > There is still pending > work related to moving parallel vacuum code to a separate file and a > few other pending comments that are still under discussion. We can > take care of those in subsequent patches. Do, let me know if you or > others think differently? I'm +1 on moving it into a new file. I think that that division makes perfect sense. It will make the design of parallel VACUUM easier to understand. I believe that index vacuuming (whether or not it involves parallel workers) ought to be a more or less distinct operation to heap vacuuming. With a distinct autovacuum schedule (well, the schedule would be related, but still distinct). -- Peter Geoghegan
Re: parallel vacuum comments
On Wed, Dec 15, 2021 at 8:23 AM Peter Geoghegan wrote: > > On Mon, Dec 13, 2021 at 7:03 PM Amit Kapila wrote: > > Thanks, I can take care of this before committing. The v9-0001* looks > > good to me as well, so, I am planning to commit that tomorrow unless I > > see more comments or any objection to that. > > I would like to thank both Masahiko and yourself for working on this. > It's important. > > > There is still pending > > work related to moving parallel vacuum code to a separate file and a > > few other pending comments that are still under discussion. We can > > take care of those in subsequent patches. Do, let me know if you or > > others think differently? > > I'm +1 on moving it into a new file. I think that that division makes > perfect sense. It will make the design of parallel VACUUM easier to > understand. > Agreed and thanks for your support. -- With Regards, Amit Kapila.
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
At Tue, 14 Dec 2021 19:04:09 +0530, Bharath Rupireddy wrote in > On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy > wrote: > > > > On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera > > wrote: > > > I think the PIDs are log-worthy for sure, but it's not clear to me that > > > it is desirable to write them to the persistent state file. In case of > > > crashes, the log should serve just fine to aid root cause investigation > > > -- in fact even better than the persistent file, where the data would be > > > lost as soon as the next client acquires that slot. > > > > Thanks. +1 to log a message at LOG level whenever a replication slot > > becomes active (gets assigned a valid pid to active_pid) and becomes > > inactive(gets assigned 0 to active_pid). Having said that, isn't it > > also helpful if we write a bool (1 byte character) whenever the slot > > becomes active and inactive to the disk? > > Here's the patch that adds a LOG message whenever a replication slot > becomes active and inactive. These logs will be extremely useful on > production servers to debug and analyze inactive replication slot > issues. > > Thoughts? When I created a replication slot, I saw the following lines in the server log. [22054:client backend] LOG: replication slot "s1" becomes active [22054:client backend] DETAIL: The process with PID 22054 acquired it. [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); [22054:client backend] LOG: replication slot "s1" becomes inactive [22054:client backend] DETAIL: The process with PID 22054 released it. [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); They seem too verbose, as if they were DEBUG3 lines. The PID shown is that of the process performing the slot operation itself, so I think it conveys no information. The STATEMENT lines are also noisy for non-ERROR messages. Couldn't we hide that line? That is, how about making the log lines as simple as the following? [17156:walsender] LOG: acquired replication slot "s1" [17156:walsender] LOG: released replication slot "s1" In the first place, I don't think we need these log lines at slot creation, since that is not really an acquisition or release of a slot. It behaves in a strange way when executing pg_basebackup. [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active [22864:walsender] DETAIL: The process with PID 22864 acquired it. [22864:walsender] STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_22864" TEMPORARY PHYSICAL ( RESERVE_WAL) [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active [22864:walsender] DETAIL: The process with PID 22864 acquired it. [22864:walsender] STATEMENT: START_REPLICATION SLOT "pg_basebackup_22864" 0/600 TIMELINE 1 [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes inactive [22864:walsender] DETAIL: The process with PID 22864 released it. The slot is acquired twice, then released once. That is because the patch doesn't emit the "becomes inactive" line when releasing a temporary slot. However, I'd rather say we don't need the first "becomes active" line, as in the previous example. @@ -658,6 +690,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot) slot->active_pid = 0; slot->in_use = false; LWLockRelease(ReplicationSlotControlLock); + + if (pid > 0) + ereport(LOG, + (errmsg("replication slot \"%s\" becomes inactive", + NameStr(slot->data.name)), +errdetail("The process with PID %d released it.", pid))); + This is wrong: I see a "becomes inactive" message when I drop an "inactive" replication slot.
The reason the inactive slot looks as if it were acquired is that it is temporarily acquired as a preparatory step of dropping. Even assuming that the log lines are simplified to this extent, I still find it a bit strange that the "becomes active (or acquired)" message appears alone, without a message like "replication connection accepted". But that would be a separate issue even if it is true. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
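A sketch of what the simplified logging suggested above could look like; the helper functions are illustrative only, assuming a ReplicationSlot pointer in scope, and errhidestmt() is the existing way to keep the STATEMENT line out of such messages.

#include "postgres.h"
#include "replication/slot.h"

/* Illustrative helpers, not the patch's actual code. */
static void
log_slot_acquired(ReplicationSlot *slot)
{
    ereport(LOG,
            (errmsg("acquired replication slot \"%s\"",
                    NameStr(slot->data.name)),
             errhidestmt(true)));   /* suppress the STATEMENT line */
}

static void
log_slot_released(ReplicationSlot *slot)
{
    ereport(LOG,
            (errmsg("released replication slot \"%s\"",
                    NameStr(slot->data.name)),
             errhidestmt(true)));
}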
Re: Confused comment about drop replica identity index
On Tue, Dec 14, 2021 at 07:10:49PM +0530, Ashutosh Bapat wrote: > This code in RelationGetIndexList() is not according to that comment. > >if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex)) > relation->rd_replidindex = pkeyIndex; > else if (replident == REPLICA_IDENTITY_INDEX && > OidIsValid(candidateIndex)) > relation->rd_replidindex = candidateIndex; > else > relation->rd_replidindex = InvalidOid; Yeah, the comment is wrong. If the index of a REPLICA_IDENTITY_INDEX is dropped, I recall that the behavior is the same as REPLICA_IDENTITY_NOTHING. > Comment in code is one thing, but I think PG documentation is not > covering the use case you tried. What happens when a replica identity > index is dropped has not been covered either in ALTER TABLE > https://www.postgresql.org/docs/13/sql-altertable.html or DROP INDEX > https://www.postgresql.org/docs/14/sql-dropindex.html documentation. Not sure about the DROP INDEX page, but I'd be fine with mentioning that in the ALTER TABLE page in the paragraph related to REPLICA IDENTITY. While on it, I would be tempted to switch this stuff to use a proper list for all the option values. That would be much easier to read. [ ... thinks a bit ... ] FWIW, this brings back some memories, as in this thread: https://www.postgresql.org/message-id/20200522035028.go2...@paquier.xyz See also commit fe7fd4e from August 2020, where some tests have been added. I recall seeing this incorrect comment in last year's thread, and it may have been mentioned in one of the surrounding threads. Maybe I just let it go back then. I don't know. -- Michael
Re: row filtering for logical replication
On Wed, Dec 15, 2021 at 6:47 AM Peter Smith wrote: > > On Tue, Dec 14, 2021 at 10:12 PM Amit Kapila wrote: > > > > On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila > > wrote: > > > > > > On Tue, Dec 14, 2021 at 4:44 AM Peter Smith wrote: > > > > > > Few other comments: > > > === > > > > Few more comments: > > == > > v46-0001/0002 > > === > > 1. After rowfilter_walker() why do we need > > EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify > > the expressions that are not allowed in rowfilter which we are now > > able to detect upfront with the help of a walker. Can't we instead use > > EXPR_KIND_WHERE? > > FYI - I have tried this locally and all tests pass. > > ~~ > > If the EXPR_KIND_PUBLICATION_WHERE is removed then there will be some > differences: > - we would get errors for aggregate/grouping functions from the > EXPR_KIND_WHERE > - we would get errors for windows functions from the EXPR_KIND_WHERE > - we would get errors for set-returning functions from the EXPR_KIND_WHERE > > Actually, IMO this would be a *good* change because AFAIK those are > not all being checked by the row-filter walker. I think the only > reason all tests pass is that there are no specific regression tests > for these cases. > > OTOH, there would also be a difference where an error message would > not be as nice. Please see the review comment from Vignesh. [1] The > improved error message is only possible by checking the > EXPR_KIND_PUBLICATION_WHERE. > > ~~ > > I think the best thing to do here is to leave the > EXPR_KIND_PUBLICATION_WHERE but simplify code so that the improved > error message remains as the *only* difference in behaviour from the > EXPR_KIND_WHERE. i.e. we should let the other > aggregate/grouping/windows/set function checks give errors exactly the > same as for the EXPR_KIND_WHERE case. > I am not sure if "the better error message" is a good enough reason to introduce this new kind. I thought it is better to deal with that in rowfilter_walker. -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Tue, Dec 14, 2021 at 4:54 PM Amit Kapila wrote: > > > Actually, I was trying to understand the use case for skipping while > > streaming. Actually, during streaming we are not doing any database > > operation that means this will not generate any error. > > > > Say, there is an error the first time when we start to apply changes > for such a transaction. So, such a transaction will be streamed again. > Say, the user has set the skip_xid before we stream a second time, so > this time, we can skip it either during the stream phase or apply > phase. I think the patch is skipping it during apply phase. > Sawada-San, please confirm if my understanding is correct? > Got it, thanks for clarifying. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: WIN32 pg_import_system_collations
On Wed, Dec 15, 2021 at 3:52 PM Michael Paquier wrote: > On Wed, Dec 15, 2021 at 10:45:28AM +1300, Thomas Munro wrote: > > Ah, right. I hope we can make the leap to 0x0A00 (Win10) soon and > > just stop thinking about these old ghosts, as mentioned by various > > people in various threads. > > Seeing your message here.. My apologies for the short digression. > Would that mean that we could use CreateSymbolicLinkA() as a mapper > for pgreadlink() rather than junction points? I am wondering how much > code in src/port/ such a move could allow us to do. Sadly, (1) it wouldn't work unless running with a special privilege or as admin, and (2) it wouldn't work on non-NTFS filesystems. I think it's mostly intended to allow things like unpacking tarballs, checking out git repos etc etc etc that came from Unix systems, which is why it works with 'developer mode' enabled[1], though obviously it wouldn't be totally impossible for us to require that privilege. Didn't seem great to me, though, that's why I gave up on it over in https://commitfest.postgresql.org/36/3090/ where this was recently discussed. [1] https://blogs.windows.com/windowsdeveloper/2016/12/02/symlinks-windows-10/
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 15, 2021 at 8:19 AM Masahiko Sawada wrote: > > On Tue, Dec 14, 2021 at 2:35 PM Greg Nancarrow wrote: > > > > On Tue, Dec 14, 2021 at 3:23 PM vignesh C wrote: > > > > > > While the worker is skipping one of the skip transactions specified by > > > the user and immediately if the user specifies another skip > > > transaction while the skipping of the transaction is in progress this > > > new value will be reset by the worker while clearing the skip xid. I > > > felt once the worker has identified the skip xid and is about to skip > > > the xid, the worker can acquire a lock to prevent concurrency issues: > > > > That's a good point. > > If only the last_error_xid could be skipped, then this wouldn't be an > > issue, right? > > If a different xid to skip is specified while the worker is currently > > skipping a transaction, should that even be allowed? > > > > We don't expect such usage but yes, it could happen and seems not > good. I thought we can acquire Share lock on pg_subscription during > the skip but not sure it's a good idea. It would be better if we can > find a way to allow users to specify only XID that has failed. > Yeah, but as we don't have a definite way to allow specifying only failed XID, I think it is better to use share lock on that particular subscription. We are already using it for add/update rel state (see, AddSubscriptionRelState, UpdateSubscriptionRelState), so this will be another place to use a similar technique. -- With Regards, Amit Kapila.
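For reference, the locking pattern being referred to looks roughly like the sketch below; the function is illustrative only, and it assumes that subscription DDL takes a conflicting (exclusive) lock on the same shared catalog object, as it does around pg_subscription updates.

#include "postgres.h"
#include "catalog/pg_subscription.h"
#include "storage/lmgr.h"

/*
 * Sketch only (not the patch's actual code): hold a shared lock on the
 * subscription object while skipping a transaction, in the same way
 * AddSubscriptionRelState/UpdateSubscriptionRelState lock it, so that a
 * concurrent ALTER SUBSCRIPTION cannot change skip_xid underneath the
 * apply worker.
 */
static void
lock_subscription_for_skip(Oid subid)
{
    LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);

    /* ... skip the remote transaction's changes and clear skip_xid ... */

    /* The lock is released automatically at transaction end. */
}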
Re: more descriptive message for process termination due to max_slot_wal_keep_size
At Tue, 14 Dec 2021 19:31:21 +0530, Ashutosh Bapat wrote in > On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi > wrote: > > > [17605] LOG: terminating process 17614 to release replication slot "s1" > > + [17605] DETAIL: The slot's restart_lsn 0/2CA0 exceeds > > max_slot_wal_keep_size. > > > [17614] FATAL: terminating connection due to administrator command > > > [17605] LOG: invalidating slot "s1" because its restart_lsn 0/2CA0 > > > exceeds max_slot_wal_keep_size > > > > Somewhat the second and fourth lines look inconsistent each other but > > that wouldn't be such a problem. I don't think we want to concatenate > > the two lines together as the result is a bit too long. > > > > > LOG: terminating process 17614 to release replication slot "s1" because > > > it's restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size. > > > > What do you think about this? > > Agree. I think we should also specify the restart_lsn value which > would be within max_slot_wal_keep_size for better understanding. Thanks! It seems to me the main message of the "invalidating" log has no room for further detail. So I split the reason out to DETAILS line the same way with the "terminating" message in the attached second patch. (It is separated from the first patch just for review) I believe someone can make the DETAIL message simpler or more natural. The attached patch set emits the following message. > LOG: invalidating slot "s1" > DETAIL: The slot's restart_lsn 0/1D68 is behind the limit 0/1100 > defined by max_slot_wal_keep_size. The second line could be changed like the following or anything other. > DETAIL: The slot's restart_lsn 0/1D68 got behind the limit 0/1100 > determined by max_slot_wal_keep_size. . regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 6909d854a0d48f2504cbb6dfbc7eb571fd32bcd8 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 14 Dec 2021 10:50:00 +0900 Subject: [PATCH v2 1/2] Make an error message about process termination more descriptive If checkpointer kills a process due to a temporary replication slot exceeding max_slot_wal_keep_size, the messages fails to describe the cause of the termination. It is finally shown for persistent slots in a subsequent message, but that message doesn't show for temporary slots. It's better to attach an explainaion to the termination message directly. 
Reported-by: Alex Enachioaie Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org --- src/backend/replication/slot.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 90ba9b417d..cba9a29113 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1254,7 +1254,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, { ereport(LOG, (errmsg("terminating process %d to release replication slot \"%s\"", -active_pid, NameStr(slotname; +active_pid, NameStr(slotname)), + errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn; (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; -- 2.27.0 >From 265a84fa6a20ee7c5178ace41096df1a4b00465c Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 15 Dec 2021 13:08:20 +0900 Subject: [PATCH v2 2/2] Make the message further detailed --- src/backend/replication/slot.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index cba9a29113..695151e484 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1255,7 +1255,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, ereport(LOG, (errmsg("terminating process %d to release replication slot \"%s\"", active_pid, NameStr(slotname)), - errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn; + errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by max_slot_wal_keep_size.", + LSN_FORMAT_ARGS(restart_lsn), + LSN_FORMAT_ARGS(oldestLSN; (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; @@ -1292,9 +1294,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, ReplicationSlotRelease(); ereport(LOG, - (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", - NameStr(slotname), - LSN_FORMAT_ARGS(restart_lsn; + (errmsg("invalidating slot \"%s\"", + NameStr(slotname)), + errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by max_slot_wal_keep_size.", + LSN_FORMAT_ARGS(restart_lsn), + LSN_FORMAT_ARGS(oldestLSN; /* done with this
Re: Skipping logical replication transactions on subscriber side
On Tue, Dec 14, 2021 at 11:40 AM Dilip Kumar wrote: > > On Fri, Dec 3, 2021 at 12:12 PM Masahiko Sawada wrote: > > > > Skipping a whole transaction by specifying xid would be a good start. > > Ideally, we'd like to automatically skip only operations within the > > transaction that fail but it seems not easy to achieve. If we allow > > specifying operations and/or relations, probably multiple operations > > or relations need to be specified in some cases. Otherwise, the > > subscriber cannot continue logical replication if the transaction has > > multiple operations on different relations that fail. But similar to > > the idea of specifying multiple xids, we need to note the fact that > > user wouldn't know of the second operation failure unless the apply > > worker applies the change. So I'm not sure there are many use cases in > > practice where users can specify multiple operations and relations in > > order to skip applies that fail. > > I think there would be use cases for specifying the relations or > operation, e.g. if the user finds an issue in inserting in a > particular relation then maybe based on some manual investigation he > founds that the table has some constraint due to that it is failing on > the subscriber side but on the publisher side that constraint is not > there so maybe the user is okay to skip the changes for this table and > not for other tables, or there might be a few more tables which are > designed based on the same principle and can have similar error so > isn't it good to provide an option to give the list of all such > tables. > That's right and I agree there could be some use case for it and even specifying the operation but I think we can always extend the existing feature for it if the need arises. Note that the user can anyway only specify a single relation or an operation because there is a way to know only one error and till that is resolved, the apply process won't proceed. We have discussed providing these additional options in this thread but thought of doing it later once we have the base feature and based on the feedback from users. -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 15, 2021 at 9:46 AM Amit Kapila wrote: > > On Tue, Dec 14, 2021 at 11:40 AM Dilip Kumar wrote: > > That's right and I agree there could be some use case for it and even > specifying the operation but I think we can always extend the existing > feature for it if the need arises. Note that the user can anyway only > specify a single relation or an operation because there is a way to > know only one error and till that is resolved, the apply process won't > proceed. We have discussed providing these additional options in this > thread but thought of doing it later once we have the base feature and > based on the feedback from users. Yeah, I only wanted to make the point that this could be useful, it seems we are on the same page. I agree we can extend it in the future as well. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: row filtering for logical replication
On Mon, Dec 13, 2021 at 8:49 PM Peter Smith wrote: > > PSA the v46* patch set. > 0001 (1) "If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher." Won't this lead to data inconsistencies or errors that otherwise wouldn't happen? Should such subscriptions be allowed? (2) In the 0001 patch comment, the term "publication filter" is used in one place, and in others "row filter" or "row-filter". src/backend/catalog/pg_publication.c (3) GetTransformedWhereClause() is missing a function comment. (4) The following comment seems incomplete: + /* Fix up collation information */ + whereclause = GetTransformedWhereClause(pstate, pri, true); src/backend/parser/parse_relation.c (5) wording? consistent? Shouldn't it be "publication WHERE expression" for consistency? + errmsg("publication row-filter WHERE invalid reference to table \"%s\"", + relation->relname), src/backend/replication/logical/tablesync.c (6) (i) Improve wording: BEFORE: /* * Get information about remote relation in similar fashion the RELATION - * message provides during replication. + * message provides during replication. This function also returns the relation + * qualifications to be used in COPY command. */ AFTER: /* - * Get information about remote relation in similar fashion the RELATION - * message provides during replication. + * Get information about a remote relation, in a similar fashion to how the RELATION + * message provides information during replication. This function also returns the relation + * qualifications to be used in the COPY command. */ (ii) fetch_remote_table_info() doesn't currently account for ALL TABLES and ALL TABLES IN SCHEMA. src/backend/replication/pgoutput/pgoutput.c (7) pgoutput_tow_filter() I think that the "ExecDropSingleTupleTableSlot(entry->scantuple);" is not needed in pgoutput_tow_filter() - I don't think it can be non-NULL when entry->exprstate_valid is false (8) I am a little unsure about this "combine filters on copy (irrespective of pubaction)" functionality. What if a filter is specified and the only pubaction is DELETE? 0002 src/backend/catalog/pg_publication.c (1) rowfilter_walker() One of the errdetail messages doesn't begin with an uppercase letter: + errdetail_msg = _("user-defined types are not allowed"); src/backend/executor/execReplication.c (2) CheckCmdReplicaIdentity() Strictly speaking, the following: + if (invalid_rfcolnum) should be: + if (invalid_rfcolnum != InvalidAttrNumber) 0003 src/backend/replication/logical/tablesync.c (1) Column name in comment should be "puballtables" not "puballtable": + * If any publication has puballtable true then all row-filtering is (2) pgoutput_row_filter_init() There should be a space before the final "*/" (so the asterisks align). Also, should say "... treated the same". /* + * If the publication is FOR ALL TABLES then it is treated same as if this + * table has no filters (even if for some other publication it does). + */ Regards, Greg Nancarrow Fujitsu Australia
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 15, 2021 at 1:49 PM Masahiko Sawada wrote: > > We don't expect such usage but yes, it could happen and seems not > good. I thought we can acquire Share lock on pg_subscription during > the skip but not sure it's a good idea. It would be better if we can > find a way to allow users to specify only XID that has failed. > Yes, I agree that would be better. If you didn't do that, I think you'd need to queue the XIDs to be skipped (rather than locking). Regards, Greg Nancarrow Fujitsu Australia
Re: Failed transaction statistics to measure the logical replication progress
On Tue, Dec 14, 2021 at 11:28 AM Amit Kapila wrote: > > On Mon, Dec 13, 2021 at 5:48 PM osumi.takami...@fujitsu.com > wrote: > > > > On Monday, December 13, 2021 6:19 PM Amit Kapila > > wrote: > > > On Tue, Dec 7, 2021 at 3:12 PM osumi.takami...@fujitsu.com > > > wrote: > > > > > > Few questions and comments: > > Thank you for your comments ! > > > > > > > > 1. > > > The pg_stat_subscription_workers view will > > > contain > > > one row per subscription worker on which errors have occurred, for > > > workers > > > applying logical replication changes and workers handling the initial > > > data > > > - copy of the subscribed tables. The statistics entry is removed when > > > the > > > - corresponding subscription is dropped. > > > + copy of the subscribed tables. Also, the row corresponding to the > > > apply > > > + worker shows all transaction statistics of both types of workers on > > > the > > > + subscription. The statistics entry is removed when the corresponding > > > + subscription is dropped. > > > > > > Why did you choose to show stats for both types of workers in one row? > > This is because if we have hundreds or thousands of tables for table sync, > > we need to create many entries to cover them and store the entries for all > > tables. > > > > If we fear a large number of entries for such workers then won't it be > better to show the value of these stats only for apply workers. I > think normally the table sync workers perform only copy operation or > maybe a fixed number of xacts, so, one might not be interested in the > transaction stats of these workers. I find merging only specific stats > of two different types of workers confusing. > > What do others think about this? I understand the concern to have a large number of entries but I agree that merging only specific stats would confuse users. As Amit suggested, it'd be better to show only apply workers' transaction stats. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: generalized conveyor belt storage
On Wed, Dec 15, 2021 at 6:33 AM Peter Geoghegan wrote: > > How did you test this? I ask because it would be nice if there was a > convenient way to try this out, as somebody with a general interest. > Even just a minimal test module, that you used for development work. > I have tested this using a simple extension over the conveyor belt APIs; the extension provides wrapper APIs over them. These are basic APIs which can be extended further for more detailed testing; e.g., as of now I have provided an API to read a complete page from the conveyor belt, but that could easily be extended to read from a particular offset and to control the amount of data to read. In parallel, I am also testing this by integrating it with vacuum, which is still a completely WIP patch and needs a lot of design-level improvement, so I am not sharing it yet; once it is in better shape I will post it in a separate thread. Basically, we have decoupled the different vacuum phases (only for manual vacuum), something like below: VACUUM (first_pass) t; VACUUM idx; VACUUM (second_pass) t; So in the first pass we just do the first pass of vacuum, and wherever we would call lazy_vacuum() we store those dead tids in the conveyor belt. In the index pass, the user can vacuum an individual index; it fetches the last conveyor belt point up to which that index has already been vacuumed, loads from there as many dead tids as fit in maintenance_work_mem, and calls the index bulk delete (this is done in a loop until the index vacuum is complete). In the second pass, we check all the indexes and find the minimum conveyor belt point up to which all indexes have been vacuumed. We also fetch the last point where we left off the second pass of the heap. Then we fetch the dead tids from the conveyor belt (as many as fit in maintenance_work_mem) from the heap's last vacuum point up to the minimum index vacuum point, and perform the second heap pass. I have given the highlights of the decoupling work just to show what sort of testing we are doing for the conveyor belt. But we can discuss this on a separate thread when I am ready to post that patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From b6affd2a778b7b4cff5738ad99f34ea21a816562 Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 15 Dec 2021 10:28:49 +0530 Subject: [PATCH v1] Conveyor belt testing extention This extention provide wrapper over the conveyor belt infrastructure for testing the conveyor belt.
--- contrib/pg_conveyor/Makefile | 23 +++ contrib/pg_conveyor/expected/pg_conveyor.out | 185 contrib/pg_conveyor/pg_conveyor--1.0.sql | 32 + contrib/pg_conveyor/pg_conveyor.c| 207 +++ contrib/pg_conveyor/pg_conveyor.control | 5 + contrib/pg_conveyor/sql/pg_conveyor.sql | 125 src/common/relpath.c | 3 +- src/include/common/relpath.h | 5 +- 8 files changed, 582 insertions(+), 3 deletions(-) create mode 100644 contrib/pg_conveyor/Makefile create mode 100644 contrib/pg_conveyor/expected/pg_conveyor.out create mode 100644 contrib/pg_conveyor/pg_conveyor--1.0.sql create mode 100644 contrib/pg_conveyor/pg_conveyor.c create mode 100644 contrib/pg_conveyor/pg_conveyor.control create mode 100644 contrib/pg_conveyor/sql/pg_conveyor.sql diff --git a/contrib/pg_conveyor/Makefile b/contrib/pg_conveyor/Makefile new file mode 100644 index 000..8c29ffd --- /dev/null +++ b/contrib/pg_conveyor/Makefile @@ -0,0 +1,23 @@ +# contrib/pg_conveyor/Makefile + +MODULE_big = pg_conveyor +OBJS = \ + $(WIN32RES) \ + pg_conveyor.o + +EXTENSION = pg_conveyor +DATA = pg_conveyor--1.0.sql +PGFILEDESC = "pg_conveyor - conveyor belt test" + +REGRESS = pg_conveyor + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_conveyor +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_conveyor/expected/pg_conveyor.out b/contrib/pg_conveyor/expected/pg_conveyor.out new file mode 100644 index 000..6ae5cd3 --- /dev/null +++ b/contrib/pg_conveyor/expected/pg_conveyor.out @@ -0,0 +1,185 @@ +CREATE EXTENSION pg_conveyor; +CREATE TABLE test(a int); +SELECT pg_conveyor_init('test'::regclass::oid, 4); + pg_conveyor_init +-- + +(1 row) + +SELECT pg_conveyor_insert('test'::regclass::oid, 'test_data'); + pg_conveyor_insert + + +(1 row) + +SELECT pg_conveyor_read('test'::regclass::oid, 0); + pg_conveyor_read +-- + test_data +(1 row) + +--CASE1 +do $$ +<> +declare + i int := 0; + data varchar; +begin + for i in 1..1000 loop + data := 'test_dataa' || i; + PERFORM pg_conveyor_insert('test'::regclass::oid, data); + end loop; +end first_block $$; +-- read from some random blocks +
Re: row filtering for logical replication
On Wed, Dec 15, 2021 at 10:20 AM Greg Nancarrow wrote: > > On Mon, Dec 13, 2021 at 8:49 PM Peter Smith wrote: > > > > PSA the v46* patch set. > > > > 0001 > > (1) > > "If a subscriber is a pre-15 version, the initial table > synchronization won't use row filters even if they are defined in the > publisher." > > Won't this lead to data inconsistencies or errors that otherwise > wouldn't happen? > How? The subscribers will get all the initial data. > Should such subscriptions be allowed? > I am not sure what you have in mind here? How can we change the already released code pre-15 for this new feature? -- With Regards, Amit Kapila.
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Kato-san, Thank you for the comments! And sorry for the late reply. I rebased my patches. > Even for local-only transaction, I thought it useless to execute > CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I > make it so that it determines outside whether it contains SQL to the > remote or not? Yeah, the remote-checking timeout will be enabled even if a local transaction is opened. In my understanding, postgres cannot distinguish whether open transactions are using only local objects or not. My first idea was to turn on the timeout when the GetFdwRoutineXXX functions were called, but in some cases FDWs may not use a remote connection even if they call such a handler function, e.g. postgresExplainForeignScan(). Hence I tried another approach that unregisters all checking callbacks at the end of each transaction. Only FDWs can know whether a transaction is remote or local, so they must register the callback when they really connect to another database. This implementation avoids the remote check in the case of a local transaction, but the repeated registering and unregistering may introduce another overhead. I attached patches which implement that. > The following points are minor. > 1. In foreign.c, some of the lines are too long and should be broken. > 2. In pqcomm.c, the newline have been removed, what is the intention of > this? > 3. In postgres.c, > 3-1. how about inserting a comment between lines 2713 and 2714, similar > to line 2707? > 3-2. the line breaks in line 3242 and line 3243 should be aligned. > 3-3. you should change > /* > * Skip checking foreign servers while reading > messages. > */ > to > /* >* Skip checking foreign servers while reading > messages. >*/ > 4. In connection.c, There is a typo in line 1684, so "fucntion" should > be changed to "function". Maybe all of them have been fixed. Thanks! What do you think? Best Regards, Hayato Kuroda FUJITSU LIMITED Attachments: v05_0001_add_checking_infrastracture.patch, v05_0002_add_doc.patch
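A rough sketch of the register-on-connect / unregister-at-transaction-end flow described above; every function and type name here is a placeholder, not the actual API in the attached patches.

#include "postgres.h"
#include "access/xact.h"

/* Placeholder types/functions -- not the names used in the attached patches. */
typedef void (*RemoteServerCheckCallback) (void *arg);
extern void RegisterRemoteServerCheck(RemoteServerCheckCallback cb, void *arg);
extern void UnregisterRemoteServerCheck(void *arg);

static void my_fdw_check_server(void *conn_handle);    /* pings the remote side */

/* Register the check only once a remote connection really exists. */
static void
my_fdw_get_connection(void *conn_handle)
{
    /* ... establish or reuse the remote connection ... */
    RegisterRemoteServerCheck(my_fdw_check_server, conn_handle);
}

/*
 * Drop the check again at end of transaction, so purely local transactions
 * that never connected to a foreign server are never checked.
 */
static void
my_fdw_xact_callback(XactEvent event, void *arg)
{
    if (event == XACT_EVENT_COMMIT || event == XACT_EVENT_ABORT)
        UnregisterRemoteServerCheck(arg);
}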