Re: RFC: Logging plan of the running query
At Tue, 8 Feb 2022 01:13:44 +0900, Fujii Masao wrote in
> > > On 2022/02/02 21:59, torikoshia wrote:
> >> This may cause users to misunderstand that pg_log_query_plan() fails
> >> while the target backend is holding *any* locks? Isn't it better to
> >> mention "page-level locks", instead? So how about the following?
> >>
> >> --
> >> Note that the request to log the query plan will be ignored if it's
> >> received during a short period while the target backend is holding a
> >> page-level lock.
> >> --
> > Agreed.
>
> On second thought, this note is confusing rather than helpful? Because
> the users don't know when and what operation needs page-level lock. So
> now I'm thinking it's better to remove this note.

*I* agree to removing the note.

And the following error message looks as mysterious as the note is, and the DETAIL doesn't help.

	ereport(LOG_SERVER_ONLY,
+			errmsg("could not log the query plan"),
+			errdetail("Cannot log the query plan while holding page-level lock."));
+	hash_seq_term(&status);

We should tell that the command can be retried soon, like this?

"LOG: ignored request for logging query plan due to lock conflicts"
"HINT: You can try again in a moment."

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
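For illustration, here is a minimal sketch of how that friendlier wording could look in the ereport() call. This is only a hypothetical rewrite of the snippet quoted above, not the committed patch; the exact message and hint text are assumptions.

```
/*
 * Hypothetical sketch: report the skipped request at LOG level and hint
 * that it can simply be retried.  The wording is an assumption.
 */
ereport(LOG_SERVER_ONLY,
		errmsg("ignored request for logging query plan due to lock conflicts"),
		errhint("You can try again in a moment."));
hash_seq_term(&status);
```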
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Mmm.. checkpoint and checkpointer are quite confusing in this context.. At Tue, 08 Feb 2022 16:58:22 +0900 (JST), Kyotaro Horiguchi wrote in > At Mon, 7 Feb 2022 13:51:31 +0900, Fujii Masao > wrote in > > > > > > On 2022/02/07 12:02, Kyotaro Horiguchi wrote: > > > - If any later checkpoint/restartpoint has been established, just skip > > >remaining task then return false. (!chkpt_was_latest) > > >(I'm not sure this can happen, though.) > > > - we update control file only when archive recovery is still ongoing. > > > > This comment seems to conflict with what your PoC patch does. Because > > with the patch, ControlFile->checkPoint and > > ControlFile->checkPointCopy seem to be updated even when > > ControlFile->state != DB_IN_ARCHIVE_RECOVERY. > > Ah, yeah, by "update" I meant that "move forward". Sorry for confusing > wording. Please ignore the "that". > > I agree with what your PoC patch does for now. When we're not in > > archive recovery state, checkpoint and REDO locations in pg_control > > should be updated but min recovery point should be reset to invalid > > one (which instruments that subsequent crash recovery should replay > > all available WAL files). > > Yes. All buffers before the last recovery point's end have been > flushed out so the recovery point is valid as a checkpoint. On the > other hand minRecoveryPoint is no longer needed and actually is > ignored at the next crash recovery. We can leave it alone but it is > consistent that it is cleared. > > > > - Otherwise reset minRecoveryPoint then continue. > > > Do you have any thoughts or opinions? > > > > Regarding chkpt_was_latest, whether the state is > > DB_IN_ARCHIVE_RECOVERY or not, if checkpoint and redo locations in > > pg_control are updated, IMO we don't need to skip the "remaining > > tasks". Since those locations are updated and subsequent crash > > recovery will start from that redo location, for example, ISTM that we > > can safely delete old WAL files prior to the redo location as the > > "remaining tasks". Thought? > > If I read you correctly, the PoC works that way. It updates pg_control > if the restart point is latest then performs the remaining cleanup > tasks in that case. Recovery state doesn't affect this process. > > I reexamined about the possibility of concurrent checkpoints. > > Both CreateCheckPoint and CreateRestartPoint are called from > checkpointer loop, shutdown handler of checkpointer and standalone > process. So I can't see a possibility of concurrent checkpoints. > > In the past we had a time when startup process called CreateCheckPoint - directly in the crash recovery case where checkpoint is not running - but since 7ff23c6d27 checkpoint is started before startup process + directly in the crash recovery case where checkpointer is not running + but since 7ff23c6d27 checkpointer is launched before startup process > starts. So I conclude that that cannot happen. > > So the attached takes away the path for the case where the restart > point is overtaken by a concurrent checkpoint. > > Thus.. the attached removes the ambiguity of of the proposed patch > about the LSNs in the restartpoint-ending log message. Thoughts? -- Kyotaro Horiguchi NTT Open Source Software Center
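To make the pg_control handling being discussed a bit more concrete, here is a rough, hedged sketch of what the update in CreateRestartPoint() could look like under this scheme. This is not the attached PoC patch; the variable names (ControlFile, lastCheckPoint, lastCheckPointRecPtr) follow xlog.c, but the exact code is an assumption.

```
/*
 * Hedged sketch only: advance the checkpoint/REDO locations whenever this
 * restart point is newer than what pg_control records, and reset
 * minRecoveryPoint once we are no longer in archive recovery so that a
 * later crash recovery replays all available WAL.
 */
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
{
	ControlFile->checkPoint = lastCheckPointRecPtr;
	ControlFile->checkPointCopy = lastCheckPoint;

	if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY)
	{
		/* Archive recovery has ended; minRecoveryPoint is no longer needed. */
		ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
		ControlFile->minRecoveryPointTLI = 0;
	}
	UpdateControlFile();
}
LWLockRelease(ControlFileLock);
```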
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
On Tue, Feb 8, 2022 at 2:10 PM Kyotaro Horiguchi wrote:
> > Thus.. the attached removes the ambiguity of of the proposed patch
> > about the LSNs in the restartpoint-ending log message.
> > Thoughts?

Thanks for the patch. I have a few comments on the v1-0001-Get-rid-of-unused-path-to-handle-concurrent-check.patch:

1) Can we have this Assert right after the "skipping restartpoint, already performed at %X/%X" error message block? Does it make any difference? My point is that if at all we were to assert this, why can't we do it before CheckPointGuts?

+	/* We mustn't have a concurrent checkpoint that advances checkpoint LSN */
+	Assert(lastCheckPoint.redo > ControlFile->checkPointCopy.redo);
+

2) Related to the above Assert, do we really need an assertion or a FATAL error?

3) Let's be consistent with "crash recovery" - replace "archive-recovery" with "archive recovery"?

+	 * We have exited from archive-recovery mode after this restartpoint
+	 * started. Crash recovery ever after should always recover to the end

4) Isn't it enough to say "Crash recovery should always recover to the end of WAL."?

+	 * started. Crash recovery ever after should always recover to the end

5) Is there a reliable test case covering this code? Please point me to it if the test case is shared upthread somewhere.

6) So, with this patch, the v8 patch-set posted at [1] doesn't need any changes IIUC. If that's the case, please feel free to post all the patches together such that they get tested in cfbot.

[1] - https://www.postgresql.org/message-id/CALj2ACUtZhTb%3D2ENkF3BQ3wi137uaGi__qzvXC-qFYC0XwjALw%40mail.gmail.com

Regards,
Bharath Rupireddy.
RE: Logical replication timeout problem
Dear Wang,

Thank you for making a patch. I applied your patch and confirmed that the code passes the regression tests. Here is a short review:

```
+	static int	skipped_changes_count = 0;
+	/*
+	 * Conservatively, at least 150,000 changes can be skipped in 1s.
+	 *
+	 * Because we use half of wal_sender_timeout as the threshold, and the unit
+	 * of wal_sender_timeout in process is ms, the final threshold is
+	 * wal_sender_timeout * 75.
+	 */
+	int			skipped_changes_threshold = wal_sender_timeout * 75;
```

I'm not sure, but could you tell me the background of this calculation? Is this assumption reasonable?

```
@@ -654,20 +663,62 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 {
 		case REORDER_BUFFER_CHANGE_INSERT:
 			if (!relentry->pubactions.pubinsert)
+			{
+				if (++skipped_changes_count >= skipped_changes_threshold)
+				{
+					OutputPluginUpdateProgress(ctx, true);
+
+					/*
+					 * After sending keepalive message, reset
+					 * skipped_changes_count.
+					 */
+					skipped_changes_count = 0;
+				}
 				return;
+			}
 			break;
```

Is the if-statement needed? In the walsender case OutputPluginUpdateProgress() leads to WalSndUpdateProgress(), and that function also has the threshold for ping-ing.

```
 static void
-WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
+WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool send_keep_alive)
 {
-	static TimestampTz sendTime = 0;
+	static TimestampTz trackTime = 0;
 	TimestampTz now = GetCurrentTimestamp();

+	if (send_keep_alive)
+	{
+		/*
+		 * If half of wal_sender_timeout has lapsed without send message standby,
+		 * send a keep-alive message to the standby.
+		 */
+		static TimestampTz sendTime = 0;
+		TimestampTz ping_time = TimestampTzPlusMilliseconds(sendTime,
+											wal_sender_timeout / 2);
+		if (now >= ping_time)
+		{
+			WalSndKeepalive(false);
+
+			/* Try to flush pending output to the client */
+			if (pq_flush_if_writable() != 0)
+				WalSndShutdown();
+			sendTime = now;
+		}
+	}
+
```

* +1 about renaming to trackTime.
* `/2` might be a magic number. How about the following? Renaming is very welcome:

```
+#define WALSND_LOGICAL_PING_FACTOR	0.5
+	static TimestampTz sendTime = 0;
+	TimestampTz ping_time = TimestampTzPlusMilliseconds(sendTime,
+										wal_sender_timeout * WALSND_LOGICAL_PING_FACTOR)
```

Could you add a commitfest entry for cfbot?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: [PATCH] Add min() and max() aggregate functions for xid8
On 2022-02-08 15:34, Fujii Masao wrote: Thanks for updating the patch! It basically looks good to me. I applied the following small changes to the patch. Updated version of the patch attached. Could you review this version? Thank you for the patch! It looks good to me! I'm not sure how strict coding conventions are, but the following line is over 80 characters. +insert into xid8_t1 values ('0'), ('010'), ('42'), ('0x'), ('-1'); Therefore, I made a patch which removed ('010') just to fit in 80 characters. Best wishes, -- Ken Kato Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATIONdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 8754f2f89b..1b064b4feb 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19973,7 +19973,7 @@ SELECT NULLIF(value, '(none)') ... values. Available for any numeric, string, date/time, or enum type, as well as inet, interval, money, oid, pg_lsn, -tid, +tid, xid8, and arrays of any of these types. Yes @@ -19992,7 +19992,7 @@ SELECT NULLIF(value, '(none)') ... values. Available for any numeric, string, date/time, or enum type, as well as inet, interval, money, oid, pg_lsn, -tid, +tid, xid8, and arrays of any of these types. Yes diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c index 9b4ceaea47..e4b4952a28 100644 --- a/src/backend/utils/adt/xid.c +++ b/src/backend/utils/adt/xid.c @@ -286,6 +286,30 @@ xid8cmp(PG_FUNCTION_ARGS) PG_RETURN_INT32(-1); } +Datum +xid8_larger(PG_FUNCTION_ARGS) +{ + FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0); + FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1); + + if (FullTransactionIdFollows(fxid1, fxid2)) + PG_RETURN_FULLTRANSACTIONID(fxid1); + else + PG_RETURN_FULLTRANSACTIONID(fxid2); +} + +Datum +xid8_smaller(PG_FUNCTION_ARGS) +{ + FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0); + FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1); + + if (FullTransactionIdPrecedes(fxid1, fxid2)) + PG_RETURN_FULLTRANSACTIONID(fxid1); + else + PG_RETURN_FULLTRANSACTIONID(fxid2); +} + /* * COMMAND IDENTIFIER ROUTINES * */ diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat index 137f6eef69..2843f4b415 100644 --- a/src/include/catalog/pg_aggregate.dat +++ b/src/include/catalog/pg_aggregate.dat @@ -149,6 +149,9 @@ { aggfnoid => 'max(pg_lsn)', aggtransfn => 'pg_lsn_larger', aggcombinefn => 'pg_lsn_larger', aggsortop => '>(pg_lsn,pg_lsn)', aggtranstype => 'pg_lsn' }, +{ aggfnoid => 'max(xid8)', aggtransfn => 'xid8_larger', + aggcombinefn => 'xid8_larger', aggsortop => '>(xid8,xid8)', + aggtranstype => 'xid8' }, # min { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller', @@ -214,6 +217,9 @@ { aggfnoid => 'min(pg_lsn)', aggtransfn => 'pg_lsn_smaller', aggcombinefn => 'pg_lsn_smaller', aggsortop => '<(pg_lsn,pg_lsn)', aggtranstype => 'pg_lsn' }, +{ aggfnoid => 'min(xid8)', aggtransfn => 'xid8_smaller', + aggcombinefn => 'xid8_smaller', aggsortop => '<(xid8,xid8)', + aggtranstype => 'xid8' }, # count { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any', diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 7024dbe10a..62f36daa98 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -203,6 +203,12 @@ { oid => '5071', descr => 'convert xid8 to xid', proname => 'xid', prorettype => 'xid', proargtypes => 'xid8', prosrc => 'xid8toxid' }, +{ oid => '5097', descr => 'larger of two', + proname => 'xid8_larger', prorettype => 'xid8', 
proargtypes => 'xid8 xid8', + prosrc => 'xid8_larger' }, +{ oid => '5098', descr => 'smaller of two', + proname => 'xid8_smaller', prorettype => 'xid8', proargtypes => 'xid8 xid8', + prosrc => 'xid8_smaller' }, { oid => '69', proname => 'cideq', proleakproof => 't', prorettype => 'bool', proargtypes => 'cid cid', prosrc => 'cideq' }, @@ -6564,6 +6570,9 @@ { oid => '4189', descr => 'maximum value of all pg_lsn input values', proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn', proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' }, +{ oid => '5099', descr => 'maximum value of all xid8 input values', + proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8', + proargtypes => 'xid8', prosrc => 'aggregate_dummy' }, { oid => '2131', descr => 'minimum value of all bigint input values', proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8', @@ -6631,6 +6640,9 @@ { oid => '4190', descr => 'minimum value of all pg_lsn input values', proname => 'min', prokind => 'a', p
Re: [BUG]Update Toast data failure in logical replication
On Tue, Feb 8, 2022 at 12:48 AM Andres Freund wrote: > > On 2022-02-07 08:44:00 +0530, Amit Kapila wrote: > > Right, and it is getting changed. We are just printing the first 200 > > characters (by using SQL [1]) from the decoded tuple so what is shown > > in the results is the initial 200 bytes. > > Ah, I knew I must have been missing something. > > > > The complete decoded data after the patch is as follows: > > Hm. I think we should change the way the strings are shortened - otherwise we > don't really verify much in that test. Perhaps we could just replace the long > repetitive strings with something shorter in the output? > > E.g. using something like regexp_replace(data, > '(1234567890|9876543210){200}', '\1{200}','g') > inside the substr(). > This sounds like a good idea. Shall we do this as part of this patch itself or as a separate improvement? > Wonder if we should deduplicate the number of different toasted strings in the > file to something that'd allow us to have a single "redact_toast" function or > such. There's too many different ones to have a reasonbly simple redaction > function right now. > I think this is also worth trying. > But that's perhaps better done separately. > +1. -- With Regards, Amit Kapila.
Should pg_restore vacuum the tables before the post-data stage?
Hello,

I was wondering if pg_restore should call VACUUM ANALYZE for all tables, after the "COPY" stage, and before the "post-data" stage. Indeed, without such a VACUUM, the visibility map isn't available. Depending on the size of the tables and on the configuration, a foreign key constraint check would have to perform either:

* a seq scan of the referencing table, even if an index exists for it;
* an index-only scan, with 100% heap fetches.

And it's the same for the referenced table, of course, except that there the index always exists. In both cases, it could be very slow. What's more, the seq scan isn't parallelized [1].

It seems to me that in most cases, autovacuum doesn't have enough time to process the tables before the post-data stage, or only processes some of them.

What do you think? If the matter has already been discussed previously, can you point me to the thread discussing it?

Best regards,
Frédéric

[1] https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com
Re: Database-level collation version tracking
On 07.02.22 17:08, Julien Rouhaud wrote: There's so limited testing in collate.* regression tests, so I thought it would be ok to add it there. At least some ALTER DATABASE ... REFRESH VERSION would be good, similarly to collation-level versioning. I don't think you can run ALTER DATABASE from the regression test scripts, since the database name is not fixed. You'd have to paste the command together using psql tricks or something. I guess it could be done, but maybe there is a better solution. We could put it into the createdb test suite, or write a new TAP test suite somewhere. There is no good precedent for this. That code takes a RowExclusiveLock on pg_database. Did you have something else in mind? But that lock won't prevent a concurrent DROP DATABASE, so it's totally expected to hit that cache lookup failed error. There should either be a shdepLockAndCheckObject(), or changing the error message to some errmsg("data xxx does not exist"). I was not familiar with that function. AFAICT, it is only used for database and role settings (AlterDatabaseSet(), AlterRoleSet()), but not when just updating the role or database catalog (e.g., AlterRole(), RenameRole(), RenameDatabase()). So I don't think it is needed here. Maybe I'm missing something.
Re: [PATCH v2] use has_privs_for_role for predefined roles
On 2/7/22 12:09, Robert Haas wrote: On Mon, Feb 7, 2022 at 11:13 AM Joe Conway wrote: It is confusing and IMHO dangerous that the predefined roles currently work differently than regular roles eith respect to privilege inheritance. I feel like that's kind of a conclusory statement, as opposed to making an argument. I mean that this tells me something about how you feel, but it doesn't really help me understand why you feel that way. The argument is that we call these things "predefined roles", but they do not behave the same way normal "roles" behave. Someone not intimately familiar with that fact could easily make bad assumptions, and therefore wind up with misconfigured security settings. In other words, it violates the principle of least astonishment (POLA). As Joshua said nearby, it simply jumps out at me as a bug. --- After more thought, perhaps the real problem is that these things should not have been called "predefined roles" at all. I know, the horse has already left the barn on that, but in any case... They are (to me at least) similar in concept to Linux capabilities in that they allow roles other than superusers to do a certain subset of the things historically reserved for superusers through a special mechanism (hardcoded) rather than through the normal privilege system (GRANTS/ACLs). As an example, the predefined role pg_read_all_settings allows a non-superuser to read GUC normally reserved for superuser access only. If I create a new user "bob" with the default INHERIT attribute, and I grant postgres to bob, bob must SET ROLE to postgres in order to access the capability to read superuser settings. This is similar to bob's access to the default superuser privilege to read data in someone else's table (must SET ROLE to access that capability). But it is different from bob's access to inherited privileges which are GRANTed: 8<-- psql nmx psql (15devel) Type "help" for help. nmx=# create user bob; CREATE ROLE nmx=# grant postgres to bob; GRANT ROLE nmx=# \q 8<-- -and- 8<-- psql -U bob nmx psql (15devel) Type "help" for help. nmx=> select current_user; current_user -- bob (1 row) nmx=> show stats_temp_directory; ERROR: must be superuser or have privileges of pg_read_all_settings to examine "stats_temp_directory" nmx=> set role postgres; SET nmx=# show stats_temp_directory; stats_temp_directory -- pg_stat_tmp (1 row) nmx=# select current_user; current_user -- postgres (1 row) nmx=# select * from foo; id 42 (1 row) nmx=# reset role; RESET nmx=> select current_user; current_user -- bob (1 row) nmx=> select * from foo; ERROR: permission denied for table foo nmx=> set role postgres; SET nmx=# grant select on table foo to postgres; GRANT nmx=# reset role; RESET nmx=> select current_user; current_user -- bob (1 row) nmx=> select * from foo; id 42 (1 row) 8<-- Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Temporary file access API
Here I'm starting a new thread to discuss a topic that's related to Transparent Data Encryption (TDE), but could be useful even without that.

The problem has been addressed somehow in the Cybertec TDE fork, and I can post the code here if it helps. However, after reading [1] (and the posts upthread), I've got another idea, so let's try to discuss it first.

It makes sense to me if we first implement the buffering (i.e. writing/reading a certain amount of data at a time) and make the related functions aware of encryption later: as long as we use a block cipher, we also need to read/write (suitably sized) chunks rather than individual bytes (or arbitrary amounts of data). (In theory, someone might need encryption but reject buffering, but I'm not sure if this is a realistic use case.)

For the buffering, I imagine a "file stream" object that the user creates on top of a file descriptor, such as

	FileStream *FileStreamCreate(File file, int buffer_size)

or

	FileStream *FileStreamCreateFD(int fd, int buffer_size)

and uses functions like

	int FileStreamWrite(FileStream *stream, char *buffer, int amount)

and

	int FileStreamRead(FileStream *stream, char *buffer, int amount)

to write and read data respectively.

Besides functions to close the streams explicitly (e.g. FileStreamClose() / FileStreamFDClose()), we'd need to ensure automatic closing where that happens to the file. For example, if OpenTemporaryFile() was used to obtain the file descriptor, the user expects that the file will be closed and deleted on transaction boundary, so the corresponding stream should be freed automatically as well.

To avoid code duplication, buffile.c should use these streams internally as well, as it also performs buffering. (Here we'd also need functions to change the reading/writing position.)

Once we implement the encryption, we might need to add an argument to the FileStreamCreate...() functions that helps to generate a unique IV, but the ...Read() / ...Write() functions would stay intact. And possibly one more argument to specify the kind of cipher, in case we support more than one.

I think that's enough to start the discussion. Thanks for feedback in advance.

[1] https://www.postgresql.org/message-id/CA%2BTgmoYGjN_f%3DFCErX49bzjhNG%2BGoctY%2Ba%2BXhNRWCVvDY8U74w%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
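To make the proposed interface a little more tangible, here is a hedged usage sketch built only from the signatures above. FileStream, FileStreamCreate(), FileStreamWrite() and FileStreamClose() are the hypothetical API being proposed, not existing PostgreSQL functions; the buffer size and the error handling are likewise assumptions.

```
/*
 * Hypothetical caller of the proposed file-stream API.  Everything named
 * FileStream* below is part of the proposal, not existing code.
 */
File		file = OpenTemporaryFile(false);
FileStream *stream = FileStreamCreate(file, 8192);	/* 8 kB buffer, arbitrary */
char		buf[1024];

memset(buf, 0, sizeof(buf));
if (FileStreamWrite(stream, buf, sizeof(buf)) != sizeof(buf))
	elog(ERROR, "could not write to temporary file stream");

/* Explicit close; otherwise the stream would go away with the file. */
FileStreamClose(stream);
```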
Re: storing an explicit nonce
Stephen Frost wrote: > Perhaps this is all too meta and we need to work through some specific > ideas around just what this would look like. In particular, thinking > about what this API would look like and how it would be used by > reorderbuffer.c, which builds up changes in memory and then does a bare > write() call, seems like a main use-case to consider. The gist there > being "can we come up with an API to do all these things that doesn't > require entirely rewriting ReorderBufferSerializeChange()?" > > Seems like it'd be easier to achieve that by having something that looks > very close to how write() looks, but just happens to have the option to > run the data through a stream cipher and maybe does better error > handling for us. Making that layer also do block-based access to the > files underneath seems like a much larger effort that, sure, may make > some things better too but if we could do that with the same API then it > could also be done later if someone's interested in that. My initial proposal is in this new thread: https://www.postgresql.org/message-id/4987.1644323098%40antos -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: Database-level collation version tracking
On Tue, Feb 08, 2022 at 12:14:02PM +0100, Peter Eisentraut wrote: > On 07.02.22 17:08, Julien Rouhaud wrote: > > There's so limited testing in collate.* regression tests, so I thought it > > would > > be ok to add it there. At least some ALTER DATABASE ... REFRESH VERSION > > would > > be good, similarly to collation-level versioning. > > I don't think you can run ALTER DATABASE from the regression test scripts, > since the database name is not fixed. You'd have to paste the command > together using psql tricks or something. I guess it could be done, but > maybe there is a better solution. We could put it into the createdb test > suite, or write a new TAP test suite somewhere. There is no good precedent > for this. I was thinking using a simple DO command, as there are already some other usage for that for non deterministic things (like TOAST tables). If it's too problematic I'm fine with not having tests for that for now. > > > That code takes a RowExclusiveLock on pg_database. Did you have something > > > else in mind? > > > > But that lock won't prevent a concurrent DROP DATABASE, so it's totally > > expected to hit that cache lookup failed error. There should either be a > > shdepLockAndCheckObject(), or changing the error message to some > > errmsg("data > > xxx does not exist"). > > I was not familiar with that function. AFAICT, it is only used for database > and role settings (AlterDatabaseSet(), AlterRoleSet()), but not when just > updating the role or database catalog (e.g., AlterRole(), RenameRole(), > RenameDatabase()). So I don't think it is needed here. Maybe I'm missing > something. I'm just saying that without such a lock you can easily trigger the "cache lookup" error, and that's something that's supposed to happen with normal usage I think. So it should be a better message saying that the database has been concurrently dropped, or actually simply does not exist like it's done in AlterDatabaseOwner() for the same pattern: [...] tuple = systable_getnext(scan); if (!HeapTupleIsValid(tuple)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_DATABASE), errmsg("database \"%s\" does not exist", dbname))); [...] Apart from that I still think that we should check the collation version of the source database when creating a new database. It won't cost much but will give the DBA a chance to recreate the indexes before risking invalid index usage.
Re: Make mesage at end-of-recovery less scary.
Hi, Here are some of my review comments on the v11 patch: - (errmsg_internal("reached end of WAL in pg_wal, entering archive recovery"))); + (errmsg_internal("reached end of WAL at %X/%X on timeline %u in %s during crash recovery, entering archive recovery", +LSN_FORMAT_ARGS(ErrRecPtr), +replayTLI, +xlogSourceNames[currentSource]))); Why crash recovery? Won't this message get printed even during PITR? I just did a PITR and could see these messages in the logfile. 2022-02-08 18:00:44.367 IST [86185] LOG: starting point-in-time recovery to WAL location (LSN) "0/5227790" 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not properly shut down; automatic recovery in progress 2022-02-08 18:00:44.369 IST [86185] LOG: redo starts at 0/14DC8D8 2022-02-08 18:00:44.978 IST [86185] DEBUG1: reached end of WAL at 0/3D0 on timeline 1 in pg_wal during crash recovery, entering archive recovery == + /* +* If we haven't emit an error message, we have safely reached the +* end-of-WAL. +*/ + if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG) + { + char *fmt; + + if (StandbyMode) + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during standby mode"); + else if (InArchiveRecovery) + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during archive recovery"); + else + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during crash recovery"); + + ereport(LOG, + (errmsg(fmt, LSN_FORMAT_ARGS(ErrRecPtr), replayTLI, + xlogSourceNames[currentSource]), +(errormsg ? errdetail_internal("%s", errormsg) : 0))); + } Doesn't it make sense to add an assert statement inside this if-block that will check for xlogreader->EndOfWAL? == -* We only end up here without a message when XLogPageRead() -* failed - in that case we already logged something. In -* StandbyMode that only happens if we have been triggered, so we -* shouldn't loop anymore in that case. +* If we get here for other than end-of-wal, emit the error +* message right now. Otherwise the message if any is shown as a +* part of the end-of-WAL message below. */ For consistency, I think we can replace "end-of-wal" with "end-of-WAL". Please note that everywhere else in the comments you have used "end-of-WAL". So why not the same here? == ereport(LOG, - (errmsg("replication terminated by primary server"), -errdetail("End of WAL reached on timeline %u at %X/%X.", - startpointTLI, - LSN_FORMAT_ARGS(LogstreamResult.Write; + (errmsg("replication terminated by primary server on timeline %u at %X/%X.", + startpointTLI, + LSN_FORMAT_ARGS(LogstreamResult.Write; Is this change really required? I don't see any issue with the existing error message. == Lastly, are we also planning to backport this patch? -- With Regards, Ashutosh Sharma. On Wed, Feb 2, 2022 at 11:05 AM Kyotaro Horiguchi wrote: > > At Tue, 1 Feb 2022 12:38:01 +0400, Pavel Borisov > wrote in > > Maybe it can be written little bit shorter: > > pe = (char *) record + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1)); > > as > > pe = p + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1)); > > ? > > That difference would be a matter of taste, but I found it looks > cleaner that definition and assignment is separated for both p and pe. > Now it is like the following. > > > char *p; > > char *pe; > > > > /* scan from the beginning of the record to the end of block */ > > p = (char *) record; > > pe = p + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1)); > > > > The problem that pgindent sometimes reflow formatting of unrelated blocks > > is indeed existing. 
But I think it's right to manually leave pgindent-ed > > code only on what is related to the patch. The leftover is pgindent-ed in a > > scheduled manner sometimes, so don't need to bother. > > Yeah, I meant that it is a bit annoying to unpginden-ting unrelated > edits:p > > > I'd like to set v10 as RfC. > > Thanks! The suggested change is done in the attached v11. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center
Re: row filtering for logical replication
On Tue, Feb 8, 2022 at 8:01 AM houzj.f...@fujitsu.com wrote: > > > > > 12. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry > > > > + /* > > + * Initialize the row filter after getting the final publish_as_relid > > + * as we only evaluate the row filter of the relation which we publish > > + * change as. > > + */ > > + pgoutput_row_filter_init(data, active_publications, entry); > > > > The comment "which we publish change as" seems strangely worded. > > > > Perhaps it should be: > > "... only evaluate the row filter of the relation which being published." > > Changed. > I don't know if this change is an improvement. If you want to change then I don't think 'which' makes sense in the following part of the comment: "...relation which being published." Few other comments: 1. Can we save sending schema change messages if the row filter doesn't match by moving maybe_send_schema after row filter checks? 2. commit message/docs: "The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined collations, non-immutable built-in functions, or references to system columns." "user-defined types" is missing in this sentence. 3. + /* + * For all the supported nodes, check the functions and collations used in + * the nodes. + */ Again 'types' is missing in this comment. -- With Regards, Amit Kapila.
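To illustrate the first comment, here is a hedged sketch of the reordering in pgoutput_change()'s INSERT path. The function names follow the patch under discussion, but the argument lists and the surrounding code are simplified assumptions.

```
/*
 * Hedged sketch only: evaluate the row filter before sending any schema
 * message, so a filtered-out change costs neither a schema message nor a
 * data message.  Argument lists are illustrative, not the patch's.
 */
if (!relentry->pubactions.pubinsert)
	return;

/* Row filter first: a filtered-out tuple needs neither schema nor data. */
if (!pgoutput_row_filter(relation, change, relentry))
	return;

/* Only changes that will actually be published get the schema message. */
maybe_send_schema(ctx, change, relation, relentry);
```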
Re: [RFC] building postgres with meson - autogenerated headers
On 07.02.22 20:24, Andres Freund wrote:
> To be honest, I do not really understand the logic behind when autoconf
> ends up with #defines that define a macro to 0/1 and when a macro ends
> defined/or not and when we end up with a macro defined to 1 or not
> defined at all.

The default is to define to 1 or not at all. The reason for this is presumably that originally, autoconf (or its predecessor practices) just populated the command line with a few -DHAVE_THIS options. Creating a header file came later. And -DFOO is equivalent to #define FOO 1. Also, this behavior allows code to use both the #ifdef HAVE_THIS and the #if HAVE_THIS style. The cases that deviate from this have a special reason for this.

One issue to consider is that depending on how the configure script is set up or structured, a test might not run at all. But for example, if you have a check for a declaration of a function, and the test doesn't run in a particular configuration, the fallback in your own code would normally be to then manually declare the function yourself. But if you didn't even run the test, then adding a declaration of a function you didn't want in the first place might be bad. In that case, you can check with #ifdef whether the test was run, and then check the value of the macro for the test outcome.
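A small illustration of the two resulting usage patterns; HAVE_STRLCPY and HAVE_DECL_STRLCPY are just familiar example names here, and the declaration fallback is a sketch rather than a quote of port.h.

```
/* Feature macros are defined to 1 or left undefined, so both tests work. */
#ifdef HAVE_STRLCPY
/* ... strlcpy() is available ... */
#endif

#if HAVE_STRLCPY				/* an undefined macro evaluates to 0 in #if */
/* ... strlcpy() is available ... */
#endif

/*
 * Declaration checks are defined to 0 or 1, so "was the test run?" and
 * "what was the result?" can be told apart before adding our own
 * declaration.
 */
#if defined(HAVE_DECL_STRLCPY) && !HAVE_DECL_STRLCPY
extern size_t strlcpy(char *dst, const char *src, size_t siz);
#endif
```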
Re: [PATCH v2] use has_privs_for_role for predefined roles
On Tue, Feb 8, 2022 at 6:59 AM Joe Conway wrote: > This is similar to bob's access to the default superuser privilege to > read data in someone else's table (must SET ROLE to access that capability). > > But it is different from bob's access to inherited privileges which are > GRANTed: Yeah. I think right here you've put your finger on what's been bugging me about this: it's similar to one thing, and it's different from another. To you and Joshua and Stephen, it seems 100% obvious that these roles should work like grants of other roles. But I think of them as capabilities derived from the superuser account, and so I'm sort of tempted to think that they should work the way the superuser bit does. And that's why I don't think the fact that they work the other way is "just a bug" -- it's one of two possible ways that someone could think that it ought to work based on how other things in the system actually do work. I'm not hard stuck on the idea that the current behavior is right, but I don't think that we can really say that we've made things fully consistent unless we make things like SUPERUSER and BYPASSRLS work the same way that you want to make predefined roles work. And probably do something about the INHERIT flag too because the current situation seems like a hot mess. -- Robert Haas EDB: http://www.enterprisedb.com
Improve correlation names in sanity tests
I had to do some analysis on the "sanity" tests in the regression test suite (opr_sanity, type_sanity) recently, and I found some of the queries very confusing. One main stumbling block is that for some probably ancient reason many of the older queries are written with correlation names p1, p2, etc. independent of the name of the catalog. This one is a good example: SELECT p1.oid, p1.oprname, p2.oid, p2.proname FROM pg_operator AS p1, pg_proc AS p2 <-- HUH?!? WHERE p1.oprcode = p2.oid AND p1.oprkind = 'l' AND (p2.pronargs != 1 OR NOT binary_coercible(p2.prorettype, p1.oprresult) OR NOT binary_coercible(p1.oprright, p2.proargtypes[0]) OR p1.oprleft != 0); I think this is better written as SELECT o1.oid, o1.oprname, p1.oid, p1.proname FROM pg_operator AS o1, pg_proc AS p1 WHERE o1.oprcode = p1.oid AND o1.oprkind = 'l' AND (p1.pronargs != 1 OR NOT binary_coercible(p1.prorettype, o1.oprresult) OR NOT binary_coercible(o1.oprright, p1.proargtypes[0]) OR o1.oprleft != 0); Attached is a patch that cleans up all the queries in this manner. (As in the above case, I kept the digits like o1 and p1 even in cases where only one of each letter is used in a query. This is mainly to keep the style consistent, but if people don't like that at all, it could be changed.)From a194456df5f56042b3dd2b6c5bd95b27a771761a Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 8 Feb 2022 14:37:56 +0100 Subject: [PATCH] Improve correlation names in sanity tests --- src/test/regress/expected/opr_sanity.out | 374 - src/test/regress/expected/type_sanity.out | 466 +++--- src/test/regress/sql/opr_sanity.sql | 376 - src/test/regress/sql/type_sanity.sql | 466 +++--- 4 files changed, 841 insertions(+), 841 deletions(-) diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out index 562b586d8e..4ce6c039b4 100644 --- a/src/test/regress/expected/opr_sanity.out +++ b/src/test/regress/expected/opr_sanity.out @@ -991,9 +991,9 @@ WHERE c.castmethod = 'b' AND -- pg_conversion -- Look for illegal values in pg_conversion fields. -SELECT p1.oid, p1.conname -FROM pg_conversion as p1 -WHERE p1.conproc = 0 OR +SELECT c.oid, c.conname +FROM pg_conversion as c +WHERE c.conproc = 0 OR pg_encoding_to_char(conforencoding) = '' OR pg_encoding_to_char(contoencoding) = ''; oid | conname @@ -1025,8 +1025,8 @@ WHERE p.oid = c.conproc AND -- them directly from SQL. But there are no non-default built-in -- conversions anyway. -- (Similarly, this doesn't cope with any search path issues.) -SELECT p1.oid, p1.conname -FROM pg_conversion as p1 +SELECT c.oid, c.conname +FROM pg_conversion as c WHERE condefault AND convert('ABC'::bytea, pg_encoding_to_char(conforencoding), pg_encoding_to_char(contoencoding)) != 'ABC'; @@ -1036,32 +1036,32 @@ WHERE condefault AND -- pg_operator -- Look for illegal values in pg_operator fields. 
-SELECT p1.oid, p1.oprname -FROM pg_operator as p1 -WHERE (p1.oprkind != 'b' AND p1.oprkind != 'l') OR -p1.oprresult = 0 OR p1.oprcode = 0; +SELECT o1.oid, o1.oprname +FROM pg_operator as o1 +WHERE (o1.oprkind != 'b' AND o1.oprkind != 'l') OR +o1.oprresult = 0 OR o1.oprcode = 0; oid | oprname -+- (0 rows) -- Look for missing or unwanted operand types -SELECT p1.oid, p1.oprname -FROM pg_operator as p1 -WHERE (p1.oprleft = 0 and p1.oprkind != 'l') OR -(p1.oprleft != 0 and p1.oprkind = 'l') OR -p1.oprright = 0; +SELECT o1.oid, o1.oprname +FROM pg_operator as o1 +WHERE (o1.oprleft = 0 and o1.oprkind != 'l') OR +(o1.oprleft != 0 and o1.oprkind = 'l') OR +o1.oprright = 0; oid | oprname -+- (0 rows) -- Look for conflicting operator definitions (same names and input datatypes). -SELECT p1.oid, p1.oprcode, p2.oid, p2.oprcode -FROM pg_operator AS p1, pg_operator AS p2 -WHERE p1.oid != p2.oid AND -p1.oprname = p2.oprname AND -p1.oprkind = p2.oprkind AND -p1.oprleft = p2.oprleft AND -p1.oprright = p2.oprright; +SELECT o1.oid, o1.oprcode, o2.oid, o2.oprcode +FROM pg_operator AS o1, pg_operator AS o2 +WHERE o1.oid != o2.oid AND +o1.oprname = o2.oprname AND +o1.oprkind = o2.oprkind AND +o1.oprleft = o2.oprleft AND +o1.oprright = o2.oprright; oid | oprcode | oid | oprcode -+-+-+- (0 rows) @@ -1070,14 +1070,14 @@ WHERE p1.oid != p2.oid AND -- DEFINITIONAL NOTE: If A.oprcom = B, then x A y has the same result as y B x. -- We expect that B will always say that B.oprcom = A as well; that's not -- inherently essential, but it would be inefficient not to mark it so. -SELECT p1.oid, p1.oprcode, p2.oid, p2.oprcode -FROM pg_operator AS p1, pg_operator AS p2 -WHERE p1.oprcom = p2.oid AND -(p1.oprkind != 'b' OR - p1.oprleft != p2.oprright O
Re: Database-level collation version tracking
On Mon, Feb 07, 2022 at 04:44:24PM +0100, Peter Eisentraut wrote:
> On 07.02.22 11:29, Julien Rouhaud wrote:
> > - that's not really something new with this patch, but should we output the
> >   collation version info or mismatch info in \l / \dO?
>
> It's a possibility. Perhaps there is a question of performance if we show
> it in \l and people have tons of databases and they have to make a locale
> call for each one. As you say, it's more an independent feature, but it's
> worth looking into.

Ok, but \l+ shows, among other things, the database size, so I guess showing this at that level should be fine? (Or is that already the case?)

Michael
Re: [RFC] building postgres with meson - perl embedding
On 2/7/22 21:40, Tom Lane wrote: > I wrote: >> Andres Freund writes: >>> What is the reason behind subtracting ccdlflags? >> It looks like the coding actually originated here: >> commit f5d0c6cad5bb2706e0e63f3f8f32e431ea428100 > Ah, here's the thread leading up to that: > > https://www.postgresql.org/message-id/flat/200106191206.f5JC6R108371%40candle.pha.pa.us > > The use of ldopts rather than hand-hacked link options seems to date to > 0ed7864d6, only a couple days before that. I don't think we had a > buildfarm then, but I'd bet against the problem being especially > widespread even then, or more people would've complained. The buildfarm's first entry is from 22 Oct 2004. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] Add min() and max() aggregate functions for xid8
On 2022/02/08 18:43, Ken Kato wrote: On 2022-02-08 15:34, Fujii Masao wrote: Thanks for updating the patch! It basically looks good to me. I applied the following small changes to the patch. Updated version of the patch attached. Could you review this version? Thank you for the patch! It looks good to me! Thanks for the review! I'm not sure how strict coding conventions are, but the following line is over 80 characters. +insert into xid8_t1 values ('0'), ('010'), ('42'), ('0x'), ('-1'); Therefore, I made a patch which removed ('010') just to fit in 80 characters. If you want to avoid the line longer than 80 columns, you should break it into two or more rather than remove the test code, I think. What to test is more important than formatting. Also the following descriptions about formatting would be helpful. --- https://www.postgresql.org/docs/devel/source-format.html Limit line lengths so that the code is readable in an 80-column window. (This doesn't mean that you must never go past 80 columns. For instance, breaking a long error message string in arbitrary places just to keep the code within 80 columns is probably not a net gain in readability.) --- Therefore I'm ok with the patch that I posted upthread. Also I'm ok if you will break that longer line into two and post new patch. Or if the value '010' is really useless for the test purpose, I'm also ok if you remove it. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Refactoring SSL tests
> On 7 Feb 2022, at 17:29, Andrew Dunstan wrote: > On 2/2/22 14:50, Daniel Gustafsson wrote: >>> On 2 Feb 2022, at 17:09, Andrew Dunstan wrote: >>> On 2/2/22 08:26, Daniel Gustafsson wrote: Thoughts? I'm fairly sure there are many crimes against Perl in this patch, I'm happy to take pointers on how to improve that. >>> It feels a bit odd to me from a perl POV. I think it needs to more along >>> the lines of standard OO patterns. I'll take a stab at that based on >>> this, might be a few days. >> That would be great, thanks! > > Here's the result of that surgery. It's a little incomplete in that it > needs some POD adjustment, but I think the code is right - it passes > testing for me. Confirmed, it passes all tests for me as well. > One of the advantages of this, apart from being more idiomatic, is that > by avoiding the use of package level variables you can have two > SSL::Server objects, one for OpenSSL and (eventually) one for NSS. This > was the original motivation for the recent install_path additions to > PostgreSQL::Test::Cluster, so it complements that work nicely. Agreed, this version is a clear improvement over my attempt. Thanks! The attached v2 takes a stab at fixing up the POD sections. -- Daniel Gustafsson https://vmware.com/ v2-0001-ssl-test-refactoring.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Tue, Feb 8, 2022 at 2:02 AM Andres Freund wrote: > > Hi, > > On 2022-02-07 13:38:38 +0530, Ashutosh Sharma wrote: > > Are you talking about this scenario - what if the logical replication > > slot on the publisher is dropped, but is being referenced by the > > standby where the slot is synchronized? > > It's a bit hard to say, because neither in this thread nor in the patch I've > found a clear description of what the syncing needs to & tries to > guarantee. It might be that that was discussed in one of the precursor > threads, but... > > Generally I don't think we can permit scenarios where a slot can be in a > "corrupt" state, i.e. missing required catalog entries, after "normal" > administrative commands (i.e. not mucking around in catalog entries / on-disk > files). Even if the sequence of commands may be a bit weird. All such cases > need to be either prevented or detected. > > > As far as I can tell, the way this patch keeps slots on physical replicas > "valid" is solely by reorderbuffer.c blocking during replay via > wait_for_standby_confirmation(). > > Which means that if e.g. the standby_slot_names GUC differs from > synchronize_slot_names on the physical replica, the slots synchronized on the > physical replica are not going to be valid. Or if the primary drops its > logical slots. > > > > Should the redo function for the drop replication slot have the capability > > to drop it on standby and its subscribers (if any) as well? > > Slots are not WAL logged (and shouldn't be). > > I think you pretty much need the recovery conflict handling infrastructure I > referenced upthread, which recognized during replay if a record has a conflict > with a slot on a standby. And then ontop of that you can build something like > this patch. > OK. Understood, thanks Andres. -- With Regards, Ashutosh Sharma.
Re: [PATCH v2] use has_privs_for_role for predefined roles
On Tue, Feb 8, 2022 at 8:46 AM Robert Haas wrote: > > On Tue, Feb 8, 2022 at 6:59 AM Joe Conway wrote: > > This is similar to bob's access to the default superuser privilege to > > read data in someone else's table (must SET ROLE to access that capability). > > > > But it is different from bob's access to inherited privileges which are > > GRANTed: > > Yeah. I think right here you've put your finger on what's been bugging > me about this: it's similar to one thing, and it's different from > another. To you and Joshua and Stephen, it seems 100% obvious that > these roles should work like grants of other roles. But I think of > them as capabilities derived from the superuser account, and so I'm > sort of tempted to think that they should work the way the superuser > bit does. And that's why I don't think the fact that they work the > other way is "just a bug" -- it's one of two possible ways that > someone could think that it ought to work based on how other things in > the system actually do work. > > I'm not hard stuck on the idea that the current behavior is right, but > I don't think that we can really say that we've made things fully > consistent unless we make things like SUPERUSER and BYPASSRLS work the > same way that you want to make predefined roles work. And probably do > something about the INHERIT flag too because the current situation > seems like a hot mess. I think hot mess is an apt description of the current situation, for example consider that: src/backend/catalog/aclchk.c 3931: has_privs_of_role(roleid, ROLE_PG_READ_ALL_DATA)) 3943: has_privs_of_role(roleid, ROLE_PG_WRITE_ALL_DATA)) 4279: (has_privs_of_role(roleid, ROLE_PG_READ_ALL_DATA) || 4280: has_privs_of_role(roleid, ROLE_PG_WRITE_ALL_DATA))) src/backend/storage/ipc/signalfuncs.c 82: if (!has_privs_of_role(GetUserId(), proc->roleId) && 83: !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND)) src/backend/storage/ipc/procarray.c 3843: if (!has_privs_of_role(GetUserId(), proc->roleId) && 3844: !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND)) src/backend/tcop/utility.c 943: if (!has_privs_of_role(GetUserId(), ROLE_PG_CHECKPOINTER)) 4 predefined roles currently use has_privs_of_role in master. Further, pg_monitor, as an SQL-only predefined role, also behaves consistently with the INHERIT rules that other roles do. In order for SQL-only predefined roles to ignore INHERIT we would need to hardcode bypasses for them, which IMO seems like the worst possible solution to the current inconsistency.
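For readers following along, a hedged illustration (not taken from the patch) of the semantic difference the thread keeps circling around:

```
/*
 * is_member_of_role() answers "was the role granted at all?", ignoring the
 * INHERIT attribute; has_privs_of_role() answers "does the role currently
 * have the privileges?", which respects INHERIT/NOINHERIT.  The predefined
 * role used here is just an example.
 */
Oid			roleid = GetUserId();

if (is_member_of_role(roleid, ROLE_PG_READ_ALL_DATA))
{
	/* true for any member, even a NOINHERIT one that has not done SET ROLE */
}

if (has_privs_of_role(roleid, ROLE_PG_READ_ALL_DATA))
{
	/* true only if the membership's privileges are actually inherited */
}
```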
Re: libpq async duplicate error results
On 2022-Feb-07, Tom Lane wrote: > In any case, the particular example we're looking at here is connection > loss, which is not something that should greatly concern us as far as > pipeline semantics are concerned, because you're not getting any more > pipelined results anyway if that happens. In the non-I/O-error case, > I see that PQgetResult does a hacky "resetPQExpBuffer(&conn->errorMessage)" > in between pipelined queries. It seems like if there are real usability > issues in this area, they probably come from that not being well placed. > In particular, I wonder why that's done with the pqPipelineProcessQueue > call in the PGASYNC_IDLE stanza, yet not with the pqPipelineProcessQueue > call in the PGASYNC_READY stanza (where we're returning a PIPELINE_SYNC > result). It kinda looks to me like maybe pqPipelineProcessQueue > ought to have the responsibility for doing that, rather than having > two out of the three call sites do it. Also it would seem more natural > to associate it with the reinitialization that happens inside > pqPipelineProcessQueue. Yeah, I agree that the placement of error message reset in pipelined libpq is more ad-hoc to the testing I was doing than carefully principled. I didn't test changing it, but on a quick look your proposed change seems reasonable to me. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)
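For concreteness, a rough sketch of the kind of rearrangement being suggested; this is not a patch, and the exact placement within the function is an assumption.

```
/*
 * Hedged sketch: make pqPipelineProcessQueue() itself clear the accumulated
 * error message whenever it advances to the next pipelined query, instead
 * of resetting conn->errorMessage at only some of its call sites.
 */
void
pqPipelineProcessQueue(PGconn *conn)
{
	/* ... existing state checks and queue bookkeeping ... */

	/*
	 * Reset the error buffer as part of reinitializing for the next query
	 * in the pipeline, so every call site gets the same behavior.
	 */
	resetPQExpBuffer(&conn->errorMessage);

	/* ... advance the command queue and set asyncStatus for the next entry ... */
}
```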
Re: [PATCH v2] use has_privs_for_role for predefined roles
On Tue, Feb 8, 2022 at 10:00 AM Joshua Brindle wrote: > 4 predefined roles currently use has_privs_of_role in master. > > Further, pg_monitor, as an SQL-only predefined role, also behaves > consistently with the INHERIT rules that other roles do. > > In order for SQL-only predefined roles to ignore INHERIT we would need > to hardcode bypasses for them, which IMO seems like the worst possible > solution to the current inconsistency. I agree we need to make the situation consistent. But if you think there's exactly one problem here and this patch fixes it, I emphatically disagree. -- Robert Haas EDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On 2022-02-03 17:09, Robert Treat wrote: On Wed, Feb 2, 2022 at 7:59 AM torikoshia wrote: 2022-02-01 01:51, Fujii Masao wrote: > +Note that nested statements (statements executed inside a > function) are not > +considered for logging. Only the plan of the most deeply nested > query is logged. > > Now the plan of even nested statement can be logged. So this > description needs to be updated? Modified it as below: -Note that nested statements (statements executed inside a function) are not -considered for logging. Only the plan of the most deeply nested query is logged. +Note that when the statements are executed inside a function, only the +plan of the most deeply nested query is logged. Minor nit, but I think the "the" is superfluous.. ie. "Note that when statements are executed inside a function, only the plan of the most deeply nested query is logged" Thanks! Modified it. On 2022-02-08 01:13, Fujii Masao wrote: Thanks for the comments! On 2022/02/02 21:59, torikoshia wrote: This may cause users to misunderstand that pg_log_query_plan() fails while the target backend is holding *any* locks? Isn't it better to mention "page-level locks", instead? So how about the following? -- Note that the request to log the query plan will be ignored if it's received during a short period while the target backend is holding a page-level lock. -- Agreed. On second thought, this note is confusing rather than helpful? Because the users don't know when and what operation needs page-level lock. So now I'm thinking it's better to remove this note. Removed it. As you pointed out offlist, the issue could occur even when both pg_log_backend_memory_contexts and pg_log_query_plan are called and it may be better to make another patch. OK. You also pointed out offlist that it would be necessary to call LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR. AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I added PG_TRY/CATCH and make it call LockErrorCleanup() when ERROR occurs. As we discussed off-list, this treatment might not be necessary. Because when ERROR or FATAL error happens, AbortTransaction() is called and LockErrorCleanup() is also called inside there. Agreed. Since I'm not sure how long it will take to discuss this point, the attached patch is based on the current HEAD at this time. Thanks for updating the patch! @@ -5048,6 +5055,12 @@ AbortSubTransaction(void) */ PG_SETMASK(&UnBlockSig); + /* + * When ActiveQueryDesc is referenced after abort, some of its elements +* are freed. To avoid accessing them, reset ActiveQueryDesc here. +*/ + ActiveQueryDesc = NULL; AbortSubTransaction() should reset ActiveQueryDesc to save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise ActiveQueryDesc of top-level statement will be unavailable after subtransaction is aborted in the nested statements. For example, no plan is logged while the following "select pg_sleep(test())" is running, because the exception inside test() function aborts the subtransaction and resets ActiveQueryDesc to NULL. create or replace function test () returns int as $$ begin perform 1/0; exception when others then return 30; end; $$ language plpgsql; select pg_sleep(test()); Agreed. BTW, since the above example results in calling ExecutorRun() only once, the output didn't differ even after ActiveQueryDesc is reset to save_ActiveQueryDesc. The below definition of test() worked as expected. 
create or replace function test () returns int as $$ begin perform 1; perform 1/0; exception when others then return 30; end; $$ language plpgsql; -CREATE ROLE regress_log_memory; +CREATE ROLE regress_log; Isn't this name too generic? How about regress_log_function, for example? Agreed. On 2022-02-08 17:18, Kyotaro Horiguchi wrote: At Tue, 8 Feb 2022 01:13:44 +0900, Fujii Masao wrote in On 2022/02/02 21:59, torikoshia wrote: >> This may cause users to misunderstand that pg_log_query_plan() fails >> while the target backend is holding *any* locks? Isn't it better to >> mention "page-level locks", instead? So how about the following? >> >> -- >> Note that the request to log the query plan will be ignored if it's >> received during a short period while the target backend is holding a >> page-level lock. >> -- > Agreed. On second thought, this note is confusing rather than helpful? Because the users don't know when and what operation needs page-level lock. So now I'm thinking it's better to remove this note. *I* agree to removing the note. And the following error message looks as mysterious as the note is, and the DETAIL doesn't help.. ereport(LOG_SERVER_ONLY, + errmsg("could
Re: Refactoring SSL tests
On 2/8/22 09:24, Daniel Gustafsson wrote:
>
> The attached v2 takes a stab at fixing up the POD sections.

There's a capitalization typo in SSL/Backend/OpenSSL.pm - looks like that's my fault:

+	my $backend = SSL::backend::OpenSSL->new();

Also, I think we should document that SSL::Server::new() takes an optional flavor parameter, in the absence of which it uses $ENV{with_ssl}, and that 'openssl' is the only currently supported value.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
is the base backup protocol used by out-of-core tools?
Hi,

Commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454 added a new syntax for the BASE_BACKUP command, and commit cc333f32336f5146b75190f57ef587dff225f565 added a new COPY subprotocol for taking base backups. In both cases, I preserved backward compatibility. However, nothing in PostgreSQL itself cares about this, because if you try to run an older version of pg_basebackup against a v15 server, it's going to fail anyway:

pg_basebackup: error: incompatible server version 15devel

From that point of view, there's no downside to removing from the server the old syntax for BASE_BACKUP and the old protocol for taking backups. We can't remove anything from pg_basebackup, because it is our practice to make new versions of pg_basebackup work with old versions of the server. But the reverse is not true: an older pg_basebackup will categorically refuse to work with a newer server version. Therefore keeping the code for this stuff around in the server has no value ... unless there is out-of-core code that (a) uses the BASE_BACKUP command and (b) wouldn't immediately adopt the new syntax and protocol anyway. If there is, we might want to keep the backward-compatibility code around in the server for a few releases. If not, we should probably nuke that code to simplify things and reduce the maintenance burden.

Patches for the nuking are attached. If nobody writes back, I'm going to assume that means nobody cares, and commit these some time before feature freeze. If one or more people do write back, then my plan is to see what they have to say and proceed accordingly.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

0002-Remove-legacy-base-backup-protocol.patch Description: Binary data
0001-Remove-old-base-backup-syntax.patch Description: Binary data
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Mon, Dec 6, 2021 at 12:45 PM Robert Haas wrote: > So for example, imagine tests with 1GB of shared_buffers, 8GB, and > 64GB. And template databases with sizes of whatever the default is, > 1GB, 10GB, 100GB. Repeatedly make 75% of the pages dirty and then > create a new database from one of the templates. And then just measure > the performance. Maybe for large databases this approach is just > really the pits -- and if your max_wal_size is too small, it > definitely will be. But, I don't know, maybe with reasonable settings > it's not that bad. Writing everything to disk twice - once to WAL and > once to the target directory - has to be more expensive than doing it > once. But on the other hand, it's all sequential I/O and the data > pages don't need to be fsync'd, so perhaps the overhead is relatively > mild. I don't know. I have been tied up with other things for a bit now and have not had time to look at this thread; sorry about that. I have a little more time available now so I thought I would take a look at this again and see where things stand. Sadly, it doesn't appear to me that anyone has done any performance testing of this patch, along the lines suggested above or otherwise, and I think it's a crucial question for the patch. My reading of this thread is that nobody really likes the idea of maintaining two methods for performing CREATE DATABASE, but nobody wants to hose people who are using it to clone large databases, either. To some extent those things are inexorably in conflict. If we postulate that the 10TB template database is on a local RAID array with 40 spindles, while pg_wal is on an iSCSI volume that we access via a 128kB ISDN link, then the new system is going to be infinitely worse. But real situations aren't likely to be that bad, and it would be useful in my opinion to have an idea how bad they actually are. I'm somewhat inclined to propose that we keep the existing method around along with the new method. Even though nobody really likes that, we don't necessarily have to maintain both methods forever. If, say, we use the new method by default in all cases, but add an option to get the old method back if you need it, we could leave it that way for a few years and then propose removing the old method (and the switch to activate it) and see if anyone complains. That way, if the new method turns out to suck in certain cases, users have a way out. However, I still think doing some performance testing would be a really good idea. It's not a great plan to make decisions about this kind of thing in an information vacuum. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Wed, Dec 22, 2021 at 9:32 AM Ashutosh Sharma wrote: > I couldn't find the mdpostchkpt() function. Are you talking about > SyncPostCheckpoint() ? Anyway, as you have rightly said, we need to > unlink all the files available inside the dst_tablespaceoid/dst_dboid/ > directory by scanning the pendingUnlinks list. And finally we don't > want the next checkpoint to unlink this file again and PANIC so for > that we have to update the entry for this unlinked rel file in the > hash table i.e. cancel the sync request for it. Until commit 3eb77eba5a51780d5cf52cd66a9844cd4d26feb0 in April 2019, there was an mdpostckpt function, which is probably what was meant here. -- Robert Haas EDB: http://www.enterprisedb.com
Re: decoupling table and index vacuum
On Sun, Feb 6, 2022 at 11:25 PM Dilip Kumar wrote: > > One thing we could try doing in order to make that easier would be: > > tweak things so that when autovacuum vacuums the table, it only > > vacuums the indexes if they meet some threshold for bloat. I'm not > > sure exactly what happens with the heap vacuuming then - do we do > > phases 1 and 2 always, or a combined heap pass, or what? But if we > > pick some criteria that vacuums indexes sometimes and not other times, > > we can probably start doing some meaningful measurement of whether > > this patch is making bloat better or worse, and whether it's using > > fewer or more resources to do it. > > I think we can always trigger phase 1 and 2 and phase 2 will only > vacuum conditionally based on if all the indexes are vacuumed for some > conveyor belt pages so we don't have risk of scanning without marking > anything unused. Not sure what you mean about a risk of scanning without marking any LP_DEAD items as LP_UNUSED. If VACUUM always does some amount of this, then it follows that the new mechanism added by the patch just can't safely avoid any work at all, making it all pointless. We have to expect heap vacuuming to take place much less often with the patch. Simply because that's what the invariant described in comments above lazy_scan_heap() requires. Note that this is not the same thing as saying that we do less *absolute* heap vacuuming with the conveyor belt -- my statement about less heap vacuuming taking place is *only* true relative to the amount of other work that happens in any individual "shortened" VACUUM operation. We could do exactly the same total amount of heap vacuuming as before (in a version of Postgres without the conveyor belt but with the same settings), but much *more* index vacuuming (at least for one or two problematic indexes). > And we can try to measure with other approaches as > well where we completely avoid phase 2 and it will be done only along > with phase 1 whenever applicable. I believe that the main benefit of the dead TID conveyor belt (outside of global index use cases) will be to enable us to do more (much more) index vacuuming for one index in particular. So it's not really about doing less index vacuuming or less heap vacuuming -- it's about doing a *greater* amount of *useful* index vacuuming, in less time. There is often some way in which failing to vacuum one index for a long time does lasting damage to the index structure. -- Peter Geoghegan
Re: decoupling table and index vacuum
On Tue, Feb 8, 2022 at 12:12 PM Peter Geoghegan wrote: > I believe that the main benefit of the dead TID conveyor belt (outside > of global index use cases) will be to enable us to do more (much more) > index vacuuming for one index in particular. So it's not really about > doing less index vacuuming or less heap vacuuming -- it's about doing > a *greater* amount of *useful* index vacuuming, in less time. There is > often some way in which failing to vacuum one index for a long time > does lasting damage to the index structure. This makes sense to me, and I think it's a good insight. It's not clear to me that we have enough information to make good decisions about which indexes to vacuum and which indexes to skip. -- Robert Haas EDB: http://www.enterprisedb.com
Re: decoupling table and index vacuum
On Tue, Feb 8, 2022 at 9:33 AM Robert Haas wrote: > On Tue, Feb 8, 2022 at 12:12 PM Peter Geoghegan wrote: > > I believe that the main benefit of the dead TID conveyor belt (outside > > of global index use cases) will be to enable us to do more (much more) > > index vacuuming for one index in particular. So it's not really about > > doing less index vacuuming or less heap vacuuming -- it's about doing > > a *greater* amount of *useful* index vacuuming, in less time. There is > > often some way in which failing to vacuum one index for a long time > > does lasting damage to the index structure. > > This makes sense to me, and I think it's a good insight. > > It's not clear to me that we have enough information to make good > decisions about which indexes to vacuum and which indexes to skip. What if "extra vacuuming, not skipping vacuuming" was not just an abstract goal, but an actual first-class part of the implementation, and the index AM API? Then the question we're asking the index/index AM is no longer "Do you [an index] *not* require index vacuuming, even though you are entitled to it according to the conventional rules of autovacuum scheduling?". The question is instead more like "Could you use an extra, early VACUUM?". if we invert the question like this then we have something that makes more sense at the index AM level, but requires significant improvements at the level of autovacuum scheduling. On the other hand I think that you already need to do at least some work in that area. -- Peter Geoghegan
Re: is the base backup protocol used by out-of-core tools?
On Tue, Feb 08, 2022 at 11:26:41AM -0500, Robert Haas wrote: > Patches for the nuking are attached. If nobody writes back, I'm going > to assume that means nobody cares, and commit these some time before > feature freeze. If one or more people do write back, then my plan is > to see what they have to say and proceed accordingly. I created this and added that for visibility and so it's not forgotten. ISTM that could be done post-freeze, although I don't know if that's useful or important. https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items -- Justin
Re: is the base backup protocol used by out-of-core tools?
On Tue, Feb 8, 2022 at 1:03 PM Justin Pryzby wrote: > I created this and added that for visibility and so it's not forgotten. > ISTM that could be done post-freeze, although I don't know if that's useful or > important. > > https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items Thanks. I feel like this is in the department of things that are unlikely to be an issue for anyone, but could be. But the set of people who could care is basically just backup tool authors, so hopefully they'll notice this thread, or that list. -- Robert Haas EDB: http://www.enterprisedb.com
Re: is the base backup protocol used by out-of-core tools?
On 2/8/22 12:39, Robert Haas wrote: On Tue, Feb 8, 2022 at 1:03 PM Justin Pryzby wrote: I created this and added that for visibility and so it's not forgotten. ISTM that could be done post-freeze, although I don't know if that's useful or important. https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items Thanks. I feel like this is in the department of things that are unlikely to be an issue for anyone, but could be. But the set of people who could care is basically just backup tool authors, so hopefully they'll notice this thread, or that list. I'm aware of several tools that use pg_basebackup but they are using the command-line tool, not the server protocol directly. Personally, I'm in favor of simplifying the code on the server side. If anyone is using the protocol directly (which I kind of doubt) I imagine they would want to take advantage of all the new features that have been added. Regards, -David
Re: decoupling table and index vacuum
On Tue, Feb 8, 2022 at 12:50 PM Peter Geoghegan wrote: > > It's not clear to me that we have enough information to make good > > decisions about which indexes to vacuum and which indexes to skip. > > What if "extra vacuuming, not skipping vacuuming" was not just an > abstract goal, but an actual first-class part of the implementation, > and the index AM API? Then the question we're asking the index/index > AM is no longer "Do you [an index] *not* require index vacuuming, even > though you are entitled to it according to the conventional rules of > autovacuum scheduling?". The question is instead more like "Could you > use an extra, early VACUUM?". > > if we invert the question like this then we have something that makes > more sense at the index AM level, but requires significant > improvements at the level of autovacuum scheduling. On the other hand > I think that you already need to do at least some work in that area. Right, that's why I asked the question. If we're going to ask the index AM whether it would like to be vacuumed right now, we're going to have to put some logic into the index AM that knows how to answer that question. But if we don't have any useful statistics that would let us answer the question correctly, then we have problems. While I basically agree with everything that you just wrote, I'm somewhat inclined to think that the question is not best phrased as either extra-vacuum or skip-a-vacuum. Either of those supposes a normative amount of vacuuming from which we could deviate in one direction or the other. I think it would be better to phrase it in a way that doesn't make such a supposition. Maybe something like: "Hi, we are vacuuming the heap right now and we are also going to vacuum any indexes that would like it, and does that include you?" The point is that it's a continuum. If we decide that we're asking the index "do you want extra vacuuming?" then that phrasing suggests that you should only say yes if you really need it. If we decide we're asking the index "can we skip vacuuming you this time?" then the phrasing suggests that you should not feel bad about insisting on a vacuum right now, and only surrender your claim if you're sure you don't need it. But in reality, no bias either way is warranted. It is either better that this index should be vacuumed right now, or better that it should not be vacuumed right now, and whichever is better should be what we choose to do. To expand on that just a bit, if I'm a btree index and someone asks me "can we skip vacuuming you this time?" I might say "return dead_tups < tiny_amount" and if they ask me "do you want extra vacuuming" I might say "return dead_tups > quite_large_amount". But if they ask me "should we vacuum you now?" then I might say "return dead_tups > moderate_amount" which feels like the correct thing here. -- Robert Haas EDB: http://www.enterprisedb.com
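As one way to picture what such a question could look like at the index AM level, here is a hypothetical sketch -- none of this is an existing API; the struct, the callback name, and the 10% threshold are all assumptions:

#include "postgres.h"

#include "storage/block.h"
#include "utils/rel.h"

/*
 * Hypothetical information the vacuum machinery could pass when asking an
 * index "should we vacuum you now?".  Purely illustrative.
 */
typedef struct IndexVacuumQuestion
{
	double		dead_tuples;	/* TIDs accumulated (e.g. on the conveyor belt) */
	double		index_tuples;	/* approximate number of live index entries */
	BlockNumber index_pages;	/* current physical size of the index */
} IndexVacuumQuestion;

/*
 * A btree-ish answer along the lines sketched above: neither "only if I'm
 * desperate" nor "only if it costs nothing", just a moderate threshold.
 */
static bool
btree_should_vacuum_now(Relation index, const IndexVacuumQuestion *q)
{
	const double moderate_fraction = 0.10;	/* assumed, not tuned */

	if (q->index_tuples <= 0)
		return false;

	return q->dead_tuples > moderate_fraction * q->index_tuples;
}

The interesting part is not the particular threshold but that the same callback shape can express all three phrasings above simply by changing the comparison.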
Re: is the base backup protocol used by out-of-core tools?
On Tue, Feb 8, 2022 at 1:52 PM David Steele wrote: > I'm aware of several tools that use pg_basebackup but they are using the > command-line tool, not the server protocol directly. Good to know. > Personally, I'm in favor of simplifying the code on the server side. If > anyone is using the protocol directly (which I kind of doubt) I imagine > they would want to take advantage of all the new features that have been > added. I agree, and I'm glad you do too, but I thought it was worth asking. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Documentation about PL transforms
On 02/07/22 15:14, Chapman Flack wrote: > It has since occurred to me that another benefit of having a > transform_validator per PL would be immediate error reporting > if someone, for whatever reason, tries out CREATE TRANSFORM > for a PL that doesn't grok transforms. The same could be achieved, I guess, by an event trigger, though that would take positive effort to set up, where the benefit of a per-PL transform_validator would be that if a given PL does not bother to provide one, CREATE TRANSFORM for it would automatically fail. (And such a validator would not have to spend most of its life ignoring other DDL.) That does reveal another documentation gap: table 40.1 does not show CREATE or DROP TRANSFORM being supported for event triggers. I've confirmed they work, though. I'll tack that onto the doc patch. I notice our transforms lack the named groups of 9075-2. With those (plus our LANGUAGE clause), I could write, for a made-up example: CREATE TRANSFORM FOR hstore LANGUAGE plpython3u asdict ( FROM SQL WITH FUNCTION hstore_to_plpython3dict, TO SQL WITH FUNCTION plpython3dict_to_hstore) asnamedtuple ( FROM SQL WITH FUNCTION hstore_to_plpython3namedtup, TO SQL WITH FUNCTION plpython3namedtup_to_hstore); CREATE FUNCTION f1(val hstore) RETURNS int LANGUAGE plpython3u TRANSFORM GROUP asdict FOR TYPE hstore ... CREATE FUNCTION f2(val hstore) RETURNS int LANGUAGE plpython3u TRANSFORM GROUP asnamedtuple FOR TYPE hstore ... It seems to me that could be useful, in cases where a PL offers more than one good choice for representing a PostgreSQL type and the preferred one could depend on the function. Was that considered and rejected for our transforms, or were ours just based on an earlier 9075-2 without the named groups? Also, I am doubtful of our Compatibility note, "There is a CREATE TRANSFORM command in the SQL standard, but it is for adapting data types to client languages." In my reading of 9075-2, I do see the transforms used for client languages (all the s), but I also see the to-sql and from-sql functions being applied in whenever "R is an external routine" and the type "is a user-defined type". The latter doesn't seem much different from our usage. The differences I see are (1) our LANGUAGE clause, (2) we don't have a special "user-defined type" category limiting what types can have transforms (and (3), we don't have the named groups). And we are applying them /only/ for server-side routines, rather than for server and client code. The ISO transforms work by mapping the ("user-defined") type to some existing SQL type for which the PL has a standard mapping already. Ours work by mapping the type directly to some suitable type in the PL. Am I reading this accurately? Regards, -Chap
Re: Improve correlation names in sanity tests
Peter Eisentraut writes: > I had to do some analysis on the "sanity" tests in the regression test > suite (opr_sanity, type_sanity) recently, and I found some of the > queries very confusing. One main stumbling block is that for some > probably ancient reason many of the older queries are written with > correlation names p1, p2, etc. independent of the name of the catalog. I think that was at least partially my fault, and no there isn't any very good reason for it. Your proposal seems fine. regards, tom lane
Re: decoupling table and index vacuum
On Tue, Feb 8, 2022 at 10:58 AM Robert Haas wrote: > Right, that's why I asked the question. If we're going to ask the > index AM whether it would like to be vacuumed right now, we're going > to have to put some logic into the index AM that knows how to answer > that question. But if we don't have any useful statistics that would > let us answer the question correctly, then we have problems. I have very little faith in the use of statistical sampling for anything involving vacuuming. In fact, I think that the current way in which ANALYZE counts dead tuples is a misapplication of statistics. It isn't even wrong. One of the things that I really like about this project is that it can plausibly solve that problem by splitting up the work of VACUUM, at low cost -- it's less top-down. Not only do you get the obvious benefits with preventing bloat; you also get *continual* feedback about the actual physical reality in the table (and indexes, to a lesser extent). As I said recently, right now the more bloat we have, the more uncertainty about the total amount of bloat exists. We need to control both the bloat, and the uncertainty about the bloat. The basic high level idea behind how the optimizer uses statistics involves the assumption that *all* the rows in the table are *themselves* a sample taken from some larger distribution -- something from the real physical world (meeting this assumption is one reason why database/schema normalization really matters). And so on a good week it probably won't matter too much to the optimizer if ANALYZE doesn't run until the table size doubles (for a table that was already quite large). These are pretty delicate assumptions, that (from the point of view of the optimizer) work out surprisingly well in practice. Bloat just isn't like that. Dead tuples are fundamentally cyclic and dynamic in nature -- conventional statistics just won't work with something like that. Worst of all, the process that counts dead tuples (ANALYZE) is really an active participant in the system -- the whole entire purpose of even looking is to *reduce* the number of dead tuples by making an autovacuum run. That's deeply weird. > The point is that it's a continuum. If we decide that we're asking the > index "do you want extra vacuuming?" then that phrasing suggests that > you should only say yes if you really need it. If we decide we're > asking the index "can we skip vacuuming you this time?" then the > phrasing suggests that you should not feel bad about insisting on a > vacuum right now, and only surrender your claim if you're sure you > don't need it. But in reality, no bias either way is warranted. Actually, I think that this particular bias *is* warranted. We should openly and plainly be biased in the direction of causing the least harm. What's wrong with that? Having accurate information is not an intrinsic good. I even think that having more information can be strictly worse, because you might actually believe it. Variance matters a lot -- the bias/variance tradeoff is pretty fundamental here. I'm also saying some of this stuff because of broader VACUUM design considerations. VACUUM fundamentally has to work at the table level, and I don't see that changing. The approach of making autovacuum do something akin to a plain VACUUM command in the simplest cases, and only later some extra "dynamic mini vacuums" (that pick up where the VACUUM command style VACUUM left off) has a lot to recommend it.
This approach allows most of the current autovacuum settings to continue to work in roughly the same way. They just need to have their documentation updated to make it clear that they're about the worst case. > To expand on that just a bit, if I'm a btree index and someone asks me > "can we skip vacuuming you this time?" I might say "return dead_tups < > tiny_amount" and if they ask me "do you want extra vacuuming" I might > say "return dead_tups > quite_large_amount". But if they ask me > "should we vacuum you now?" then I might say "return dead_tups > > moderate_amount" which feels like the correct thing here. The btree side of this shouldn't care at all about dead tuples (in general we focus way too much on dead tuples, and way too little on pages). With bottom-up index deletion the number of dead tuples in the index is just about completely irrelevant. It's entirely possible and often even likely that 20%+ of all index tuples will be dead at any one time, when the optimization perfectly preserves the index structure. The btree side of the index AM API should be focussing on the growth in index size, relative to some expectation (like maybe the growth for whatever index on the same table has grown the least since last time, accounting for obvious special cases like partial indexes). Perhaps we'd give some consideration to bulk deletes, too. Overall, it should be pretty simple, and should sometimes force us to do one of these "dynamic mini vacuums" of the index just because we're not quite sure wh
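A rough sketch of that growth-relative signal, assuming we can see each index's size now and as of its last vacuum; the function, its inputs, and the 10% slack factor are all assumptions for illustration, not an existing API:

#include "postgres.h"

#include "storage/block.h"

/*
 * Hypothetical growth-based trigger: compare an index's growth since its
 * last vacuum against the least growth seen among indexes on the same
 * table, and only flag the clear outliers.
 */
static bool
index_growth_looks_excessive(BlockNumber pages_now, BlockNumber pages_before,
							 double least_growth_on_table)
{
	double		growth;

	if (pages_before == 0)
		return false;			/* brand-new index, nothing to compare against */

	growth = (double) pages_now / (double) pages_before;

	/* Flag the index if it grew noticeably more than its best-behaved peer. */
	return growth > least_growth_on_table * 1.10;
}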
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Sun, Dec 12, 2021 at 3:09 AM Dilip Kumar wrote: > Correct, I have done this cleanup, apart from this we have dropped the > fsync request in create database failure case as well and also need to > drop buffer in error case of createdb as well as movedb. I have also > fixed the other issue for which you gave the patch (a bit differently) > basically, in case of movedb the source and destination dboid are same > so we don't need an additional parameter and also readjusted the > conditions to avoid nested if. Amazingly to me given how much time has passed, these patches still apply, although I think there are a few outstanding issues that you promised to fix in the next version and haven't yet addressed. In 0007, I think you will need to work a bit harder. I don't think that you can just add a second argument to ForgetDatabaseSyncRequests() that makes it do something other than what the name of the function suggests but without renaming the function or updating any comments. Elsewhere we have things like TablespaceCreateDbspace and ResetUnloggedRelationsInDbspaceDir so perhaps we ought to just add a new function with a name inspired by those precedents alongside the existing one, rather than doing it this way. In 0008, this is a bit confusing: + PageInit(dstPage, BufferGetPageSize(dstBuf), 0); + memcpy(dstPage, srcPage, BLCKSZ); After a minute, I figured out that the point here was to force log_newpage() to actually set the LSN, but how about a comment? I kind of wonder whether GetDatabaseRelationList should be broken into two functions so that we don't have quite such deep nesting. And I wonder if maybe the return value of GetActiveSnapshot() should be cached in a local variable. On the whole I think there aren't huge code-level issues here, even if things need to be tweaked here and there and bugs fixed. The real key is arriving at a set of design trade-offs that doesn't make anyone too upset. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Refactoring the regression tests for more independence
Julien Rouhaud writes: > On Mon, Feb 07, 2022 at 02:00:25PM -0500, Tom Lane wrote: >> Not too surprisingly, these patches broke during the commitfest. >> Here's a rebased version. >> I'm not sure that anyone wants to review these in detail ... >> should I just go ahead and push them? > I don't see anything shocking after a quick glance, and I don't think any > review is going to give any more confidence compared to the script-dep-testing > script, so +1 for pushing them since the cf bot is green again. Done, will watch the farm. regards, tom lane
Re: Move replication slot structures/enums/macros to a new header file for better usability by external tools/modules
On Mon, Feb 7, 2022 at 4:22 PM Bharath Rupireddy wrote: > > Hi, > > While working on pg_replslotdata tool [1], it was observed that some > of the replication slot structures/enums/macros such as > ReplicationSlotPersistentData, ReplicationSlotPersistency, > ReplicationSlotOnDisk, ReplicationSlotOnDisk etc. are currently in > replication/slot.h and replication/slot.c. This makes the replication > slot on disk data structures unusable by the external tools/modules. > How about moving these structures to a new header file called > slot_common.h as attached in the patch here? > > Thoughts? > > PS: I'm planning to have the tool separately, as it was rejected to be in > core. > > [1] > https://www.postgresql.org/message-id/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com > > Regards, > Bharath Rupireddy. Recently I was also looking to add some new enums but I found it was difficult to find any good place to put them where they could be shared by the replication code and the pg_recvlogical tool. So +1 to your suggestion to have a common header, but I wonder can it have a more generic name (e.g. repl_common.h? ...) since the stuff I wanted to put there was not really "slot" related. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Fix BUG #17335: Duplicate result rows in Gather node
Thanks for having a look at this. On Fri, 4 Feb 2022 at 13:48, Robert Haas wrote: > I think the actual rule is: every path under a Gather or GatherMerge > must be parallel-safe. I've adjusted the patch so that it counts parallel_aware and parallel_safe Paths independently and verifies everything below a Gather[Merge] is parallel_safe. The diff stat currently looks like: src/backend/optimizer/plan/createplan.c | 230 1 file changed, 230 insertions(+) I still feel this is quite a bit of code for what we're getting here. I'd be more for it if the path traversal function existed for some other reason and I was just adding the callback functions and Asserts. I'm keen to hear what others think about that. David diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index cd6d72c763..898046ca07 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -313,6 +313,20 @@ static ModifyTable *make_modifytable(PlannerInfo *root, Plan *subplan, List *rowMarks, OnConflictExpr *onconflict, int epqParam); static GatherMerge *create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path); +static bool contains_a_parallel_aware_path(Path *path); +static bool contains_only_parallel_safe_paths(Path *path); + +/* + * PathTypeCount + * Used for various checks to assert plans are sane in assert enabled + * builds. + */ +typedef struct PathTypeCount +{ + uint64 count; + uint64 parallel_safe_count; + uint64 parallel_aware_count; +} PathTypeCount; /* @@ -389,6 +403,10 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags) /* Guard against stack overflow due to overly complex plans */ check_stack_depth(); + /* Parallel aware paths should contain only parallel safe subpaths. */ + Assert(!best_path->parallel_aware || + contains_only_parallel_safe_paths(best_path)); + switch (best_path->pathtype) { case T_SeqScan: @@ -481,6 +499,14 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags) case T_Gather: plan = (Plan *) create_gather_plan(root, (GatherPath *) best_path); + + /* +* We expect a Gather to contain at least one parallel aware path +* unless running in single_copy mode. +*/ + Assert(((GatherPath *) best_path)->single_copy || + contains_a_parallel_aware_path(((GatherPath *) + best_path)->subpath)); break; case T_Sort: plan = (Plan *) create_sort_plan(root, @@ -537,6 +563,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags) case T_GatherMerge: plan = (Plan *) create_gather_merge_plan(root, (GatherMergePath *) best_path); + /* GatherMerge must contain at least one parallel aware path */ + Assert(contains_a_parallel_aware_path(((GatherMergePath *) + best_path)->subpath)); break; default: elog(ERROR, "unrecognized node type: %d", @@ -7052,6 +7081,207 @@ make_modifytable(PlannerInfo *root, Plan *subplan, return node; } +/* + * path_tree_walker + * Walk a path tree beginning with 'path' and call the 'walker' function + * for that path and each of its subpaths recursively. 
+ */ +static void +path_tree_walker(Path *path, void (*walker) (), void *context) + +{ + if (path == NULL) + return; + + /* Guard against stack overflow due to overly complex path trees */ + check_stack_depth(); + + walker(path, context); + + switch (path->pathtype) + { + case T_SeqScan: + case T_SampleScan: + case T_IndexScan: + case T_IndexOnlyScan: + case T_BitmapHeapScan: + case T_TidScan: + case T_TidRangeScan: + case T_SubqueryScan: + case T_FunctionScan: + case T_TableFuncScan: + case T_ValuesScan: + case T_CteScan: + case T_WorkTableScan: + case T_NamedTuplestoreSc
Re: make MaxBackends available in _PG_init
On Fri, Feb 4, 2022 at 3:27 PM Robert Haas wrote: > Great. I'll take a look at this next week, but not right now, mostly > because it's Friday afternoon and if I commit it and anything breaks I > don't want to end up having to fix it on the weekend if I can avoid > it. After some investigation I've determined that it's no longer Friday afternoon. I also spent time investigating whether the patch had problems that would make me uncomfortable with the idea of committing it, and I did not find any. So, I committed it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Fix BUG #17335: Duplicate result rows in Gather node
On Tue, Feb 8, 2022 at 1:11 PM David Rowley wrote: > Thanks for having a look at this. > > On Fri, 4 Feb 2022 at 13:48, Robert Haas wrote: > > I think the actual rule is: every path under a Gather or GatherMerge > > must be parallel-safe. > > I've adjusted the patch so that it counts parallel_aware and > parallel_safe Paths independently and verifies everything below a > Gather[Merge] is parallel_safe. > > The diff stat currently looks like: > > src/backend/optimizer/plan/createplan.c | 230 > 1 file changed, 230 insertions(+) > > I still feel this is quite a bit of code for what we're getting here. > I'd be more for it if the path traversal function existed for some > other reason and I was just adding the callback functions and Asserts. > > I'm keen to hear what others think about that. > > David > Hi, + break; + case T_MergeAppend: The case for T_MergeAppend should be left indented. + case T_Result: + if (IsA(path, ProjectionPath)) Since the remaining sub-cases don't have subpath, they are covered by the final `else` block - MinMaxAggPath and GroupResultPath don't need to be checked. For contains_a_parallel_aware_path(), it seems path_type_counter() can return bool indicating whether the walker should return early (when parallel aware count reaches 1). Cheers
Re: Fix BUG #17335: Duplicate result rows in Gather node
On Tue, Feb 8, 2022 at 4:11 PM David Rowley wrote: > I still feel this is quite a bit of code for what we're getting here. > I'd be more for it if the path traversal function existed for some > other reason and I was just adding the callback functions and Asserts. > > I'm keen to hear what others think about that. My view is that functions like path_tree_walker are good things to have on general principle. I find it likely that it will find other uses, and that if we don't add as part of this patch, someone will add it for some other reason in the future. So I would not really count that in deciding how big this patch is, and the rest of what you have here is pretty short and to the point. There is the more difficult philosophical question of whether it's worth expending any code on this at all. I think it is pretty clear that this has positive value: it could easily prevent >0 future bugs, which IMHO is not bad for such a small patch. However, it does feel a little bit primitive somehow, in the sense that there are a lot of things you could do wrong which this wouldn't catch. For example, a Gather with no parallel-aware node under it is probably busted, unless someone invents new kinds of parallel operators that work differently from what we have now. But a join beneath a Gather that is not itself parallel-aware should have a parallel-aware node under exactly one side of the join. If there's a parallel scan on both sides or neither side, even with stuff on top of it, that's wrong. But a parallel-aware join could do something else, e.g. Parallel Hash Join expects a parallel path on both sides. Some other parallel-aware join type could expect a parallel path on exactly one side without caring which one, or on one specific side, or maybe even on neither side. What we're really reasoning about here is whether the input is going to be partitioned across multiple executions of the plan in a proper way. A Gather is going to run the same plan in all of its workers, so it wants a subplan that when run in all workers will together produce all output rows. Parallel-aware scans partition the results across workers, so they behave that way. A non-parallel aware join will work that way if it joins a partition the input on one side to all of the input from the other side, hence the rule I describe above. For aggregates, you can't safely apply a plain old Aggregate operation either to a regular scan or to a parallel-aware scan and get the right answer, which is why we need Partial and Finalize stages for parallel query. But for a lot of other nodes, like Materialize, their output will have the same properties as the input: if the subplan of a Materialize node produces all the rows on each execution, the Materialize node will too; if it produces a partition of the output rows each time it's executed, once per worker, the Materialize node will do the same. And I think it's that kind of case that leads to the check we have here, that there ought to be a parallel-aware node in there someplace. It might be the case that there's some more sophisticated check we could be doing here that would be more satisfying than the one you've written, but I'm not sure. Such a check might end up needing to know the behavior of the existing nodes in a lot of detail, which then wouldn't help with finding bugs in new functionality we add in the future. 
In that sense, the kind of simple check you've got here has something to recommend it: it won't catch everything people can do wrong, but when it does trip, chances are good it's found a bug, and it's got a good chance of continuing to work as well as it does today even in the face of future additions. So I guess I'm mildly in favor of it, but I would also find it entirely reasonable if you were to decide it's not quite worth it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: WIP: System Versioned Temporal Table
On 1/24/22 00:16, Corey Huinker wrote: - Table schemas change, and all (SV active) AV items would logically need to fit the active schema or be updated to do so. Different story for SV, nothing there should ever need to be changed. Yeah, there's a mess (which you state below) about what happens if you create a table and then rename a column, or drop a column and add a same-named column back of another type at a later date, etc. In theory, this means that the valid set of columns and their types changes according to the time range specified. I may not be remembering correctly, but Vik stated that the SQL spec seemed to imply that you had to track all those things. The spec does not allow schema changes at all on a system versioned table, except to change the system versioning itself. -- Vik Fearing
Re: [RFC] building postgres with meson - perl embedding
Andres Freund writes: > On 2022-02-07 20:42:09 -0500, Tom Lane wrote: >> ... Peter just copied the logic in 7662419f1. Considering that >> the point of 7662419f1 was to get rid of MakeMaker, maybe we no >> longer needed that at that point. > Yea. And maybe what was broken in 2001 isn't broken anymore either ;) Yeah --- note that Bruce was complaining about a problem on Perl 5.005, which was already a bit over-the-hill in 2001. > AIX is the one exception. Specifying -bE... doesn't seem right for building > plperl etc. So possibly the subtraction accidentally does work for us there... I tried this on AIX 7.2 (using the gcc farm, same build options as hoverfly). The build still works and passes regression tests, but you get a warning about each symbol exported by Perl itself: ... ld: 0711-415 WARNING: Symbol PL_veto_cleanup is already exported. ld: 0711-415 WARNING: Symbol PL_warn_nl is already exported. ld: 0711-415 WARNING: Symbol PL_warn_nosemi is already exported. ld: 0711-415 WARNING: Symbol PL_warn_reserved is already exported. ld: 0711-415 WARNING: Symbol PL_warn_uninit is already exported. ld: 0711-415 WARNING: Symbol PL_WB_invlist is already exported. ld: 0711-415 WARNING: Symbol PL_XPosix_ptrs is already exported. ld: 0711-415 WARNING: Symbol PL_Yes is already exported. ld: 0711-415 WARNING: Symbol PL_Zero is already exported. So there's about 1200 such warnings for plperl, and then the same again for each contrib foo_plperl module. Maybe that's annoying enough that we should keep the logic. OTOH, it seems entirely accidental that it has that effect. I'd be a little inclined to replace it with some rule about stripping '-bE:' switches out of the ldopts result. regards, tom lane
Re: [PATCH] Add min() and max() aggregate functions for xid8
On 2022-02-08 23:16, Fujii Masao wrote: If you want to avoid the line longer than 80 columns, you should break it into two or more rather than remove the test code, I think. What to test is more important than formatting. Also the following descriptions about formatting would be helpful. --- https://www.postgresql.org/docs/devel/source-format.html Limit line lengths so that the code is readable in an 80-column window. (This doesn't mean that you must never go past 80 columns. For instance, breaking a long error message string in arbitrary places just to keep the code within 80 columns is probably not a net gain in readability.) --- Therefore I'm ok with the patch that I posted upthread. Also I'm ok if you will break that longer line into two and post new patch. Or if the value '010' is really useless for the test purpose, I'm also ok if you remove it. Thought? Thank you for the explanation! Even though the line is over 80 characters, it makes more sense to put it in one line and it enhances readability IMO. Also, '010' is good to have since it is the only octal value in the test. Therefore, I think min_max_aggregates_for_xid8_v4.patch is the best one to go. Best wishes, -- Ken Kato Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: Support escape sequence for cluster_name in postgres_fdw.application_name
Hi, Thank you for updating the patch. I agree with the documentation and program. How about adding the test for %c (Session ID)? (Adding the test for %C (cluster_name) seems difficult.) Regards, Ryohei Takahashi
Re: [PATCH v2] use has_privs_for_role for predefined roles
On 2/8/22 10:07, Robert Haas wrote: On Tue, Feb 8, 2022 at 10:00 AM Joshua Brindle wrote: 4 predefined roles currently use has_privs_of_role in master. Further, pg_monitor, as an SQL-only predefined role, also behaves consistently with the INHERIT rules that other roles do. In order for SQL-only predefined roles to ignore INHERIT we would need to hardcode bypasses for them, which IMO seems like the worst possible solution to the current inconsistency. I agree we need to make the situation consistent. But if you think there's exactly one problem here and this patch fixes it, I emphatically disagree. If we were to start all over again with this feature my vote would be to do things differently than we have done. I would not have called them predefined roles, and I would have used attributes of roles (e.g. make rolsuper into a bitmap rather than a boolean) rather than role membership to implement them. But I didn't find time to participate in the original discussion or review/write the code, so I have little room to complain. However since we did call these things predefined roles, and used role membership to implement them, I think they ought to behave both self-consistently as a group, and like other real roles. That is what this patch does unless I am missing something. I guess an alternative is to discuss a "proper fix", but it seems to me that would be a version 16 thing given how late we are in this development cycle and how invasive it is likely to be. And doing nothing for pg15 is not a very satisfying proposition :-/ Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [PATCH] Accept IP addresses in server certificate SANs
On Mon, 2022-02-07 at 17:29 +0900, Kyotaro Horiguchi wrote: > At Fri, 4 Feb 2022 17:06:53 +, Jacob Champion > wrote in > > That works a lot better than what I had in my head. Done that way in > > v4. Thanks! > > Thanks! > > 0002: > > +#define PGSQL_AF_INET (AF_INET + 0) > +#define PGSQL_AF_INET6 (AF_INET + 1) > .. > -#define PGSQL_AF_INET (AF_INET + 0) > -#define PGSQL_AF_INET6 (AF_INET + 1) > > I feel this should be a part of 0001. (But the patches will be > finally merged so maybe no need to bother moving it). Okay. I can move it easily if you feel like it would help review, but for now I've kept it in 0002. > > * The use of inet_aton() instead of inet_pton() is deliberate; the > > * latter cannot handle alternate IPv4 notations ("numbers-and-dots"). > > I think we should be consistent in handling IP addresses. We have > both inet_pton and inet_aton to parse IPv4 addresses. > > We use inet_pton in the inet type (network_in). > We use inet_aton in server addresses. > > # Hmm. I'm surprised to see listen_addresses accepts "0x7f.1". > # I think we should accept the same by network_in but it is another > # issue. Yeah, that's an interesting inconsistency. > So, inet_aton there seems to be the right choice but the comment > doesn't describe the reason for that behavior. I think we should add > an explanation about the reason for the behavior, maybe something like > this: > > > We accept alternative IPv4 address notations that are accepted by > > inet_aton but not by inet_pton as server address. I've pulled this wording into the comment in v5, attached. > +* GEN_IPADD is an OCTET STRING containing an IP address in network > byte > +* order. > > + /* OK to cast from unsigned to plain char, since it's all ASCII. */ > + return pq_verify_peer_name_matches_certificate_ip(conn, (const char > *) addrdata, len, store_name); > > Aren't the two comments contradicting each other? The retruned general > name looks like an octet array, which is not a subset of ASCII > string. So pq_verify_peer_name_matches_certificate_ip should receive > addrdata as "const unsigned char *", without casting. Bad copy-paste on my part; thanks for the catch. Fixed. > + if (name->type == host_type) > + check_cn = false; > > Don't we want a concise coment for this? Added one; see what you think. > - if (*names_examined == 0) > + if ((rc == 0) && check_cn) > > To me, it seems a bit hard to understand. We can set false to > check_cn in the rc != 0 path in the loop on i, like this: > > > if (rc != 0) > > + { > > + /* > > + * don't fall back to CN when we have a match > > or have an error > > + */ > > + check_cn = false; > > break; > > + } > ... > > - if ((rc == 0) && check_cn) > > + if (check_cn) If I understand right, that's not quite equivalent (and the new tests fail if I implement it that way). We have to disable fallback if the SAN exists, whether it matches or not. > The following existing code (CN fallback) > > > rc = openssl_verify_peer_name_matches_certificate_name(conn, > > X509_NAME_ENTRY_get_data(X509_NAME_get_entry(subject_name, cn_index)), > > first_name); > > is expecting that first_name has not been set when it is visited. > However, with this patch, first_name can be set when the cert has any > SAN of unmatching type (DNS/IPADD) and the already-set name leaks. We > need to avoid that memory leak since the path can be visited multiple > times from the user-program of libpq. I came up with two directions. > > 1. Completely ignore type-unmatching entries. first_name is not set by > such entries. 
Such unmatching entreis doesn't increase > *names_examined. > > 2. Avoid overwriting first_name there. > > I like 1, but since we don't make distinction between DNS and IPADDR > in the error message emited by the caller, we would take 2? Great catch, thanks! I implemented option 2 to start. Option 1 might make things difficult to debug if you're connecting to a server by IP address but its certificate only has DNS names. Thanks! --Jacob commit 8c427e3289a28cb683eff5d05b2e8770c3c07662 Author: Jacob Champion Date: Tue Feb 8 16:26:27 2022 -0800 squash! libpq: allow IP address SANs in server certs diff --git a/src/interfaces/libpq/fe-secure-common.c b/src/interfaces/libpq/fe-secure-common.c index cfdde58e67..4d78715756 100644 --- a/src/interfaces/libpq/fe-secure-common.c +++ b/src/interfaces/libpq/fe-secure-common.c @@ -160,7 +160,8 @@ pq_verify_peer_name_matches_certificate_name(PGconn *conn, */ int pq_verify_peer_name_matches_certificate_ip(PGconn *conn, - const char *ipdata, size_
RE: [BUG]Update Toast data failure in logical replication
On Mon, Feb 7, 2022 2:55 PM Amit Kapila wrote: > > On Sat, Feb 5, 2022 at 6:10 AM Amit Kapila wrote: > > > > On Fri, Feb 4, 2022 at 9:06 PM Alvaro Herrera > > wrote: > > > > > > > > > I have some suggestions > > > on the comments and docs though. > > > > > > > Thanks, your suggestions look good to me. I'll take care of these in > > the next version. > > > > Attached please find the modified patches. > Thanks for your patch. I tried it and it works well. Two small comments: 1) +static Bitmapset *HeapDetermineColumnsInfo(Relation relation, + Bitmapset *interesting_cols, + Bitmapset *external_cols, + HeapTuple oldtup, HeapTuple newtup, + bool *id_has_external); +HeapDetermineColumnsInfo(Relation relation, +Bitmapset *interesting_cols, +Bitmapset *external_cols, +HeapTuple oldtup, HeapTuple newtup, +bool *has_external) The declaration and the definition of this function use different parameter names for the last parameter (id_has_external and has_external), it's better to be consistent. 2) + /* +* Check if the old tuple's attribute is stored externally and is a +* member of external_cols. +*/ + if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) && + bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, + external_cols)) + *has_external = true; If has_external is already true, it seems we don't need this check, so should we check has_external first? Regards, Tang
Re: GUC flags
On Tue, Feb 08, 2022 at 01:06:29PM +0900, Michael Paquier wrote: > Makes sense. check_guc also checks after this pattern. Okay, I have done all the adjustments you mentioned, added a couple of comments and applied the patch. If the buildfarm is happy, I'll go retire check_guc. -- Michael signature.asc Description: PGP signature
Re: row filtering for logical replication
I did a review of the v79 patch. Below are my review comments: == 1. doc/src/sgml/ref/create_publication.sgml - CREATE PUBLICATION The commit message for v79-0001 says: If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition row filter (if the parameter is false, the default) or the root partitioned table row filter. I think that the same information should also be mentioned in the PG DOCS for CREATE PUBLICATION note about the WHERE clause. ~~~ 2. src/backend/commands/publicationcmds.c - contain_mutable_or_ud_functions_checker +/* check_functions_in_node callback */ +static bool +contain_mutable_or_user_functions_checker(Oid func_id, void *context) +{ + return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE || + func_id >= FirstNormalObjectId); +} I was wondering why is the checking for user function and mutable functions combined in one function like this. IMO it might be better to have 2 "checker" callback functions instead of just one - then the error messages can be split too so that only the relevant one is displayed to the user. BEFORE contain_mutable_or_user_functions_checker --> "User-defined or built-in mutable functions are not allowed." AFTER contain_user_functions_checker --> "User-defined functions are not allowed." contain_mutable_function_checker --> "Built-in mutable functions are not allowed." ~~~ 3. src/backend/commands/publicationcmds.c - check_simple_rowfilter_expr_walker + case T_Const: + case T_FuncExpr: + case T_BoolExpr: + case T_RelabelType: + case T_CollateExpr: + case T_CaseExpr: + case T_CaseTestExpr: + case T_ArrayExpr: + case T_CoalesceExpr: + case T_MinMaxExpr: + case T_XmlExpr: + case T_NullTest: + case T_BooleanTest: + case T_List: + break; Perhaps a comment should be added here simply saying "OK, supported" just to make it more obvious? ~~~ 4. src/test/regress/sql/publication.sql - test comment +-- fail - user-defined types disallowed For consistency with the nearby comments it would be better to reword this one: "fail - user-defined types are not allowed" ~~~ 5. src/test/regress/sql/publication.sql - test for \d +-- test \d+ (now it displays filter information) +SET client_min_messages = 'ERROR'; +CREATE PUBLICATION testpub_dplus_rf_yes FOR TABLE testpub_rf_tbl1 WHERE (a > 1) WITH (publish = 'insert'); +CREATE PUBLICATION testpub_dplus_rf_no FOR TABLE testpub_rf_tbl1; +RESET client_min_messages; +\d+ testpub_rf_tbl1 Actually, the \d (without "+") will also display filters but I don't think that has been tested anywhere. So suggest updating the comment and adding one more test AFTER -- test \d+ and \d (now these display filter information) ... \d+ testpub_rf_tbl1 \d testpub_rf_tbl1 ~~~ 6. 
src/test/regress/sql/publication.sql - tests for partitioned table +-- Tests for partitioned table +ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (a > 99); +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0); +-- ok - partition does not have row filter +UPDATE rf_tbl_abcd_part_pk SET a = 1; +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1); +-- ok - "a" is a OK col +UPDATE rf_tbl_abcd_part_pk SET a = 1; +ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (b > 99); +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0); +-- ok - partition does not have row filter +UPDATE rf_tbl_abcd_part_pk SET a = 1; +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1); +-- fail - "b" is not in REPLICA IDENTITY INDEX +UPDATE rf_tbl_abcd_part_pk SET a = 1; Those comments and the way the code is arranged did not make it very clear to me what exactly these tests are doing. I think all the changes to the publish_via_partition_root belong BELOW those test comments don't they? Also the same comment "-- ok - partition does not have row filter" appears 2 times so that can be made more clear too. e.g. IIUC it should be changed to something a bit like this (Note - I did not change the SQL, I only moved it a bit and changed the comments): AFTER (??) -- Tests for partitioned table ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (a > 99); -- ok - PUBLISH_VIA_PARTITION_ROOT is false -- Here the partition does not have a row filter -- Col "a" is in replica identity. ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0); UPDATE rf_tbl_abcd_part_pk SET a = 1; -- ok - PUBLISH_VIA_PARTITION_ROOT is true -- Here the partition does not have a row filter, so the root filter will be used. -- Col "a" is in replica identity. ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1); UPDATE rf_tbl_abcd_part_pk SET a = 1; -- Now change the root filter to use a column "b" (which is not in the replica identity) ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (b > 99); -- ok - PUBLISH_VIA_PARTITION_ROOT is false -- Here the partition does not have a row filter -- Col "a" is in replica identity. AL
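To make review comment 2 above concrete, the split could look roughly like this (a sketch based on the combined callback quoted in that comment; only the error reporting at the call sites would then differ):

#include "postgres.h"

#include "access/transam.h"
#include "catalog/pg_proc.h"
#include "utils/lsyscache.h"

/* check_functions_in_node callback: reject user-defined functions */
static bool
contain_user_functions_checker(Oid func_id, void *context)
{
	return (func_id >= FirstNormalObjectId);
}

/* check_functions_in_node callback: reject mutable built-in functions */
static bool
contain_mutable_functions_checker(Oid func_id, void *context)
{
	return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE);
}

The trade-off is that the row filter expression then has to be walked once per checker in order to report the two cases separately.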
Re: [BUG]Update Toast data failure in logical replication
On Tue, Feb 8, 2022, at 10:18 PM, tanghy.f...@fujitsu.com wrote: > 2) > + /* > + * Check if the old tuple's attribute is stored externally and is a > + * member of external_cols. > + */ > + if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) && > + bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, > + external_cols)) > + *has_external = true; > > If has_external is already true, it seems we don't need this check, so should > we > check has_external first? Is it worth it? I don't think so. It complicates a non-critical path. In general, the condition will be executed once or twice. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: [PATCH] Add min() and max() aggregate functions for xid8
On 2022/02/09 8:49, Ken Kato wrote: On 2022-02-08 23:16, Fujii Masao wrote: If you want to avoid the line longer than 80 columns, you should break it into two or more rather than remove the test code, I think. What to test is more important than formatting. Also the following descriptions about formatting would be helpful. --- https://www.postgresql.org/docs/devel/source-format.html Limit line lengths so that the code is readable in an 80-column window. (This doesn't mean that you must never go past 80 columns. For instance, breaking a long error message string in arbitrary places just to keep the code within 80 columns is probably not a net gain in readability.) --- Therefore I'm ok with the patch that I posted upthread. Also I'm ok if you will break that longer line into two and post new patch. Or if the value '010' is really useless for the test purpose, I'm also ok if you remove it. Thought? Thank you for the explanation! Even though the line is over 80 characters, it makes more sense to put in one line and it enhances readability IMO. Also, '010' is good to have since it is the only octal value in the test. Therefore, I think min_max_aggregates_for_xid8_v4.patch is the best one to go. Agreed. So barring any objection, I will commit that patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: row filtering for logical replication
On Wed, Feb 9, 2022 at 7:07 AM Peter Smith wrote: > > 2. src/backend/commands/publicationcmds.c - > contain_mutable_or_ud_functions_checker > > +/* check_functions_in_node callback */ > +static bool > +contain_mutable_or_user_functions_checker(Oid func_id, void *context) > +{ > + return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE || > + func_id >= FirstNormalObjectId); > +} > > I was wondering why is the checking for user function and mutable > functions combined in one function like this. IMO it might be better > to have 2 "checker" callback functions instead of just one - then the > error messages can be split too so that only the relevant one is > displayed to the user. > For that, we need to invoke the checker function multiple times for a node and or expression. So, not sure if it is worth it. -- With Regards, Amit Kapila.
Re: is the base backup protocol used by out-of-core tools?
On Tue, Feb 08, 2022 at 11:26:41AM -0500, Robert Haas wrote: > From that point of view, there's no downside to removing from the > server the old syntax for BASE_BACKUP and the old protocol for taking > backups. We can't remove anything from pg_basebackup, because it is > our practice to make new versions of pg_basebackup work with old > versions of the server. But the reverse is not true: an older > pg_basebackup will categorically refuse to work with a newer server > version. Therefore keeping the code for this stuff around in the > server has no value ... unless there is out-of-core code that (a) uses > the BASE_BACKUP command and (b) wouldn't immediately adopt the new > syntax and protocol anyway. If there is, we might want to keep the > backward-compatibility code around in the server for a few releases. > If not, we should probably nuke that code to simplify things and > reduce the maintenance burden. This line of arguments looks sensible from here, so +1 for this cleanup in the backend as of 15~. I am not sure if we should worry about out-of-core tools that use replication commands, either, and the new grammar is easy to adapt to. FWIW, one backup tool maintained by NTT is pg_rman, which does not use the replication protocol AFAIK: https://github.com/ossc-db/pg_rman Perhaps Horiguchi-san or Fujita-san have an opinion on that. -- Michael signature.asc Description: PGP signature
Re: [BUG]Update Toast data failure in logical replication
On Wed, Feb 9, 2022 at 7:16 AM Euler Taveira wrote: > > On Tue, Feb 8, 2022, at 10:18 PM, tanghy.f...@fujitsu.com wrote: > > 2) > + /* > + * Check if the old tuple's attribute is stored externally and is a > + * member of external_cols. > + */ > + if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value1)) && > + bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, > + external_cols)) > + *has_external = true; > > If has_external is already true, it seems we don't need this check, so should > we > check has_external first? > > Is it worth it? I don't think so. > I also don't think it is worth adding such a check. -- With Regards, Amit Kapila.
Re: make MaxBackends available in _PG_init
On Tue, Feb 08, 2022 at 04:12:26PM -0500, Robert Haas wrote: > After some investigation I've determined that it's no longer Friday > afternoon. I also spent time investigating whether the patch had > problems that would make me uncomfortable with the idea of committing > it, and I did not find any. So, I committed it. @@ -1641,8 +1642,8 @@ SignalBackends(void) * XXX in principle these pallocs could fail, which would be bad. Maybe * preallocate the arrays? They're not that large, though. */ - pids = (int32 *) palloc(MaxBackends * sizeof(int32)); - ids = (BackendId *) palloc(MaxBackends * sizeof(BackendId)); + pids = (int32 *) palloc(GetMaxBackends() * sizeof(int32)); + ids = (BackendId *) palloc(GetMaxBackends() * sizeof(BackendId)); You could have optimized this one while at it, as well as the ones in pgstat_beinit() and pg_safe_snapshot_blocking_pids(). It is not hot, but you did that for all the other callers of GetMaxBackends(). Just saying... -- Michael signature.asc Description: PGP signature
Re: is the base backup protocol used by out-of-core tools?
2022年2月9日(水) 11:21 Michael Paquier : > > On Tue, Feb 08, 2022 at 11:26:41AM -0500, Robert Haas wrote: > > From that point of view, there's no downside to removing from the > > server the old syntax for BASE_BACKUP and the old protocol for taking > > backups. We can't remove anything from pg_basebackup, because it is > > our practice to make new versions of pg_basebackup work with old > > versions of the server. But the reverse is not true: an older > > pg_basebackup will categorically refuse to work with a newer server > > version. Therefore keeping the code for this stuff around in the > > server has no value ... unless there is out-of-core code that (a) uses > > the BASE_BACKUP command and (b) wouldn't immediately adopt the new > > syntax and protocol anyway. If there is, we might want to keep the > > backward-compatibility code around in the server for a few releases. > > If not, we should probably nuke that code to simplify things and > > reduce the maintenance burden. > > This line of arguments looks sensible from here, so +1 for this > cleanup in the backend as of 15~. I am not sure if we should worry > about out-of-core tools that use replication commands, either, and the > new grammar is easy to adapt to. FWIW the out-of-core utility I have some involvement in and which uses base backup functionality, uses this by executing pg_basebackup, so no problem there. But even if it used the replication protocol directly, it'd just be a case of adding another bit of Pg version-specific code handling; I certainly wouldn't expect core to hold itself back for my convenience. Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com
Unnecessary call to resetPQExpBuffer in getIndexes
Hi, I just noticed that e2c52beecd (adding PeterE in Cc) added a resetPQExpBuffer() which seems unnecessary since the variable is untouched since the initial createPQExpBuffer(). Simple patch attached. >From 1ebddb696af3b77f7d373034b938a358529a9ea1 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Wed, 9 Feb 2022 10:36:07 +0800 Subject: [PATCH v1] Remove unnecessary resetPQExpBuffer call. Oversight in e2c52beecdea152ca680a22ef35c6a7da55aa30f. --- src/bin/pg_dump/pg_dump.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 3499c0a4d5..3b4b63d897 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -6524,8 +6524,6 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) } appendPQExpBufferChar(tbloids, '}'); - resetPQExpBuffer(query); - appendPQExpBuffer(query, "SELECT t.tableoid, t.oid, i.indrelid, " "t.relname AS indexname, " -- 2.35.0
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Hi, Bharath. At Tue, 8 Feb 2022 14:33:01 +0530, Bharath Rupireddy wrote in > On Tue, Feb 8, 2022 at 2:10 PM Kyotaro Horiguchi > wrote: > > > Thus.. the attached removes the ambiguity of of the proposed patch > > > about the LSNs in the restartpoint-ending log message. > > > > Thoughts? > > Thanks for the patch. I have few comments on the > v1-0001-Get-rid-of-unused-path-to-handle-concurrent-check.patch Thanks for checking this! > 1) Can we have this Assert right after "skipping restartpoint, already > performed at %X/%X" error message block? Does it make any difference? > My point is that if at all, we were to assert this, why can't we do it > before CheckPointGuts? > + /* We mustn't have a concurrent checkpoint that advances checkpoint LSN */ > + Assert(lastCheckPoint.redo > ControlFile->checkPointCopy.redo); No. The assertion checks whether something wrong has happened while CheckPointGuts - which may take a long time - is running. If we need to do that, it should be after CheckPointGuts. (However, I removed it in the end. See below.) > 2) Related to the above Assert, do we really need an assertion or a FATAL > error? It's just to make sure, in case that happens by any chance in the future. But on second thought, as I mentioned, CreateRestartPoint is called from a standalone process or from the checkpointer. I added that assertion at the beginning of CreateRestartPoint. I think that assertion is logically equal to the old assertion. I remember that I felt uncomfortable with the lock-less behavior on ControlFile, which makes the code a bit complex to read. So I moved the "PriorRedoPtr = " line into the lock section just below. In addition to that, I was confused by the parallel use of lastCheckPoint.redo and RedoRecPtr, so after assigning the former to the latter I replaced lastCheckPoint.redo with RedoRecPtr. > 3) Let's be consistent with "crash recovery" - replace > "archive-recovery" with "archive recovery"? > + * We have exited from archive-recovery mode after this restartpoint > + * started. Crash recovery ever after should always recover to the end That's sensible. I found several existing uses of "archive-recovery" in xlog.c and a few other files, but the fix for them is separated out as another patch (0005). > 4) Isn't it enough to say "Crash recovery should always recover to the > end of WAL."? > + * started. Crash recovery ever after should always recover to the end I think we need to explicitly say something like "we have exited archive recovery while *this* restartpoint is running". I simplified the sentence as follows. + * Archive recovery has ended. Crash recovery ever after should + * always recover to the end of WAL. > 5) Is there a reliable test case covering this code? Please point me > if the test case is shared upthread somewhere. Nothing. Looking at it from the opposite side, the existing code for the competing restartpoint/checkpoint case should not have been exercised for at least the last several major versions. Instead, I added an assertion at the beginning of CreateRestartPoint that asserts that "this should be called only by a standalone process or by the checkpointer". If that assert doesn't fire during the whole test run, that is proof that the premise of this patch is correct. > 6) So, with this patch, the v8 patch-set posted at [1] doesn't need > any changes IIUC. If that's the case, please feel free to post all the > patches together such that they get tested in cfbot. The two are different fixes, so I don't think they ought to be merged together.
> [1] - > https://www.postgresql.org/message-id/CALj2ACUtZhTb%3D2ENkF3BQ3wi137uaGi__qzvXC-qFYC0XwjALw%40mail.gmail.com As for the old 0001 (though I said it's fine :p), I added a comment that reading ControlFile->checkPoint* is safe here. The old 0002 (attached 0003) looks fine to me. The old 0003 (attached 0004): +++ b/src/backend/access/rmgrdesc/xlogdesc.c - appendStringInfo(buf, "redo %X/%X; " + appendStringInfo(buf, "redo lsn %X/%X; " It is shown in the context of a checkpoint record, so I think it is not needed and rather lengthens the dump line uselessly. +++ b/src/backend/access/transam/xlog.c - (errmsg("request to flush past end of generated WAL; request %X/%X, current position %X/%X", + (errmsg("request to flush past end of generated WAL; request lsn %X/%X, current lsn %X/%X", +++ b/src/backend/replication/walsender.c - (errmsg("requested starting point %X/%X is ahead of the WAL flush position of this server %X/%X", + (errmsg("requested starting point %X/%X is ahead of the WAL flush LSN of this server %X/%X", "WAL" is upper-cased, so it seems rather strange that "lsn" is lower-cased. In the first place, the message doesn't look like a user-facing error message, and I feel we don't need "position" or "lsn" there at all. +++ b/src/bin/pg_rewin
Re: Plug minor memleak in pg_dump
On Wed, Feb 02, 2022 at 10:06:13AM +0100, Daniel Gustafsson wrote: > The leak itself is clearly not something to worry about wrt memory pressure. > We do read into tmp and free it in other places in the same function though > (as > you note above), so for code consistency alone this is worth doing IMO (and it > reduces the risk of static analyzers flagging this). > > Unless objected to I will go ahead with getting this committed. Looks like you forgot to apply that? -- Michael signature.asc Description: PGP signature
Re: [PATCH] Add min() and max() aggregate functions for xid8
At Wed, 9 Feb 2022 11:01:57 +0900, Fujii Masao wrote in > Agreed. So barring any objection, I will commit that patch. Sorry for being late, but I don't like the function names. +xid8_larger(PG_FUNCTION_ARGS) +xid8_smaller(PG_FUNCTION_ARGS) Wouldn't they conventionally be named like xid8gt and xid8lt? At the least, the names lack any mention of equality. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [PATCH v2] use has_privs_for_role for predefined roles
On Tue, Feb 8, 2022 at 7:38 PM Joe Conway wrote: > If we were to start all over again with this feature my vote would be to > do things differently than we have done. I would not have called them > predefined roles, and I would have used attributes of roles (e.g. make > rolsuper into a bitmap rather than a boolean) rather than role > membership to implement them. But I didn't find time to participate in > the original discussion or review/write the code, so I have little room > to complain. Yep, fair. I kind of like the predefined role concept myself. I find it sort of elegant, mostly because I think it scales better than a bitmask, which can run out of bits surprisingly rapidly. But opinions can vary, of course. > However since we did call these things predefined roles, and used role > membership to implement them, I think they ought to behave both > self-consistently as a group, and like other real roles. > > That is what this patch does unless I am missing something. Yes, I see that. > I guess an alternative is to discuss a "proper fix", but it seems to me > that would be a version 16 thing given how late we are in this > development cycle and how invasive it is likely to be. And doing nothing > for pg15 is not a very satisfying proposition :-/ True. The fact that we don't have consistency among existing predefined roles is IMHO the best argument for this patch. That surely can't be right. -- Robert Haas EDB: http://www.enterprisedb.com
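As a concrete illustration of the inconsistency being debated, a small sketch using made-up role names (it shows the ordinary-role semantics that the patch wants predefined roles to follow):

CREATE ROLE alice LOGIN NOINHERIT;
GRANT pg_read_all_stats TO alice;
-- With a bare role-membership check, alice can read other sessions' query
-- text in pg_stat_activity immediately, despite NOINHERIT.
-- With the has_privs_for_role style check proposed here, she must first run:
SET ROLE pg_read_all_stats;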
Re: make MaxBackends available in _PG_init
On Tue, Feb 8, 2022 at 9:38 PM Michael Paquier wrote: > On Tue, Feb 08, 2022 at 04:12:26PM -0500, Robert Haas wrote: > > After some investigation I've determined that it's no longer Friday > > afternoon. I also spent time investigating whether the patch had > > problems that would make me uncomfortable with the idea of committing > > it, and I did not find any. So, I committed it. > > @@ -1641,8 +1642,8 @@ SignalBackends(void) > * XXX in principle these pallocs could fail, which would be bad. > Maybe > * preallocate the arrays? They're not that large, though. > */ > - pids = (int32 *) palloc(MaxBackends * sizeof(int32)); > - ids = (BackendId *) palloc(MaxBackends * sizeof(BackendId)); > + pids = (int32 *) palloc(GetMaxBackends() * sizeof(int32)); > + ids = (BackendId *) palloc(GetMaxBackends() * sizeof(BackendId)); > > You could have optimized this one, while on it, as well as the ones in > pgstat_beinit() and pg_safe_snapshot_blocking_pids(). It is not hot, > but you did that for all the other callers of GetMaxBackends(). Just > saying.. Well I didn't do anything myself except review and commit Nathan's patch, so I suppose you mean he could have done that, but fair enough. I don't mind if you want to change it around. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Add min() and max() aggregate functions for xid8
At Wed, 09 Feb 2022 12:04:51 +0900 (JST), Kyotaro Horiguchi wrote in > At Wed, 9 Feb 2022 11:01:57 +0900, Fujii Masao > wrote in > > Agreed. So barring any objection, I will commit that patch. > > Sorry for being late, but I don't like the function names. > > +xid8_larger(PG_FUNCTION_ARGS) > +xid8_smaller(PG_FUNCTION_ARGS) > > Aren't they named like xid8gt and xid8lt conventionally? At least the > name lacks a mention to equality. Ah, sorry. The functions return the larger/smaller one of their parameters. Sorry for the noise. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Tue, Feb 8, 2022 at 10:39 PM Robert Haas wrote: > > I have been tied up with other things for a bit now and have not had > time to look at this thread; sorry about that. I have a little more > time available now so I thought I would take a look at this again and > see where things stand. Thanks for looking into this. > > Sadly, it doesn't appear to me that anyone has done any performance > testing of this patch, along the lines suggested above or otherwise, > and I think it's a crucial question for the patch. Yeah, some performance testing actually started, as shared by Ashutosh [1], and soon after that we got sidetracked by another issue [2] which we thought had to be fixed before we proceed with this feature. > My reading of this > thread is that nobody really likes the idea of maintaining two methods > for performing CREATE DATABASE, but nobody wants to hose people who > are using it to clone large databases, either. To some extent those > things are inexorably in conflict. If we postulate that the 10TB > template database is on a local RAID array with 40 spindles, while > pg_wal is on an iSCSI volume that we access via a 128kB ISDN link, > then the new system is going to be infinitely worse. But real > situations aren't likely to be that bad, and it would be useful in my > opinion to have an idea how bad they actually are. Yeah, that makes sense. I will work on performance testing along these lines and also on the previous ideas you suggested. > I'm somewhat inclined to propose that we keep the existing method > around along with the new method. Even though nobody really likes > that, we don't necessarily have to maintain both methods forever. If, > say, we use the new method by default in all cases, but add an option > to get the old method back if you need it, we could leave it that way > for a few years and then propose removing the old method (and the > switch to activate it) and see if anyone complains. That way, if the > new method turns out to suck in certain cases, users have a way out. > However, I still think doing some performance testing would be a > really good idea. It's not a great plan to make decisions about this > kind of thing in an information vacuum. Yeah, that makes sense to me. Now, one bigger question is whether we can proceed with this patch without fixing [2]. If we decide to keep the old method intact as well, one option could be to change only CREATE DATABASE for now so that it supports both the old and new ways of creating a database, and for the time being leave ALTER DATABASE SET TABLESPACE alone so that it works only with the old method. Another option is to first fix the issue related to the tombstone files and then come back to this. IMHO, the first option could be better, given that we have already made more progress on this patch and it is in better shape than the other patch we are trying to write for removing the tombstone files. [1]https://www.postgresql.org/message-id/CAE9k0Pkg20tHq8oiJ%2BxXa9%3Daf3QZCSYTw99aBaPthA1UMKhnTg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BTgmobM5FN5x0u3tSpoNvk_TZPFCdbcHxsXCoY1ytn1dXROvg%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
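To make the "keep the old method behind a switch" idea concrete, a purely hypothetical sketch (the option name and values below are invented for illustration, not committed syntax):

-- new WAL-logged strategy used by default
CREATE DATABASE clonedb TEMPLATE bigdb;
-- hypothetical escape hatch back to the old checkpoint-based file copy
CREATE DATABASE clonedb TEMPLATE bigdb STRATEGY = FILE_COPY;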
Re: Plug minor memleak in pg_dump
On Wed, Feb 9, 2022 at 8:26 AM Michael Paquier wrote: > > On Wed, Feb 02, 2022 at 10:06:13AM +0100, Daniel Gustafsson wrote: > > The leak itself is clearly not something to worry about wrt memory pressure. > > We do read into tmp and free it in other places in the same function though > > (as > > you note above), so for code consistency alone this is worth doing IMO (and > > it > > reduces the risk of static analyzers flagging this). > > > > Unless objected to I will go ahead with getting this committed. > > Looks like you forgot to apply that? Attaching the patch that I suggested above, also the original patch proposed by Georgios is at [1], leaving the decision to the committer to pick up the best one. [1] https://www.postgresql.org/message-id/oZwKiUxFsVaetG2xOJp7Hwao8F1AKIdfFDQLNJrnwoaxmjyB-45r_aYmhgXHKLcMI3GT24m9L6HafSi2ns7WFxXe0mw2_tIJpD-Z3vb_eyI%3D%40pm.me Regards, Bharath Rupireddy. From 335db3331a99a6cfc35608cdbec204d843b8ac55 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 9 Feb 2022 03:25:50 + Subject: [PATCH v1] Fix a memory leak while reading Table of Contents ReadStr() returns a malloc'ed pointer. Using it directly in a function call results in a memleak. Rewrite to use a temporary buffer which is then freed. --- src/bin/pg_dump/pg_backup_archiver.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c index 49bf0907cd..e62be78982 100644 --- a/src/bin/pg_dump/pg_backup_archiver.c +++ b/src/bin/pg_dump/pg_backup_archiver.c @@ -2500,6 +2500,8 @@ ReadToc(ArchiveHandle *AH) for (i = 0; i < AH->tocCount; i++) { + bool is_supported = true; + te = (TocEntry *) pg_malloc0(sizeof(TocEntry)); te->dumpId = ReadInt(AH); @@ -2574,7 +2576,20 @@ ReadToc(ArchiveHandle *AH) te->tableam = ReadStr(AH); te->owner = ReadStr(AH); - if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0) + + if (AH->version < K_VERS_1_9) + is_supported = false; + else + { + tmp = ReadStr(AH); + + if (strcmp(tmp, "true") == 0) +is_supported = false; + + pg_free(tmp); + } + + if (!is_supported) pg_log_warning("restoring tables WITH OIDS is not supported anymore"); /* Read TOC entry dependencies */ -- 2.25.1
Re: make MaxBackends available in _PG_init
On Tue, Feb 08, 2022 at 10:57:37PM -0500, Robert Haas wrote: > Well I didn't do anything myself except review and commit Nathan's > patch, so I suppose you mean he could have done that, but fair enough. > I don't mind if you want to change it around. Okay, I'd rather apply the same rule everywhere for consistency, then, like in the attached. That's minimal, still. -- Michael diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index d44001a49f..455d895a44 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -1633,6 +1633,7 @@ SignalBackends(void) int32 *pids; BackendId *ids; int count; + int max_backends = GetMaxBackends(); /* * Identify backends that we need to signal. We don't want to send @@ -1642,8 +1643,8 @@ SignalBackends(void) * XXX in principle these pallocs could fail, which would be bad. Maybe * preallocate the arrays? They're not that large, though. */ - pids = (int32 *) palloc(GetMaxBackends() * sizeof(int32)); - ids = (BackendId *) palloc(GetMaxBackends() * sizeof(BackendId)); + pids = (int32 *) palloc(max_backends * sizeof(int32)); + ids = (BackendId *) palloc(max_backends * sizeof(BackendId)); count = 0; LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE); diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 1528d788d0..ee2e15c17e 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -2924,6 +2924,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp) LWLock *partitionLock; int count = 0; int fast_count = 0; + int max_backends = GetMaxBackends(); if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods)) elog(ERROR, "unrecognized lock method: %d", lockmethodid); @@ -2942,12 +2943,12 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp) vxids = (VirtualTransactionId *) MemoryContextAlloc(TopMemoryContext, sizeof(VirtualTransactionId) * - (GetMaxBackends() + max_prepared_xacts + 1)); + (max_backends + max_prepared_xacts + 1)); } else vxids = (VirtualTransactionId *) palloc0(sizeof(VirtualTransactionId) * - (GetMaxBackends() + max_prepared_xacts + 1)); + (max_backends + max_prepared_xacts + 1)); /* Compute hash code and partition lock, and look up conflicting modes. 
*/ hashcode = LockTagHashCode(locktag); @@ -3104,7 +3105,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp) LWLockRelease(partitionLock); - if (count > GetMaxBackends() + max_prepared_xacts) /* should never happen */ + if (count > max_backends + max_prepared_xacts) /* should never happen */ elog(PANIC, "too many conflicting locks found"); vxids[count].backendId = InvalidBackendId; diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c index 4e517b28e1..944cd6df03 100644 --- a/src/backend/utils/adt/lockfuncs.c +++ b/src/backend/utils/adt/lockfuncs.c @@ -559,13 +559,14 @@ pg_safe_snapshot_blocking_pids(PG_FUNCTION_ARGS) int *blockers; int num_blockers; Datum *blocker_datums; + int max_backends = GetMaxBackends(); /* A buffer big enough for any possible blocker list without truncation */ - blockers = (int *) palloc(GetMaxBackends() * sizeof(int)); + blockers = (int *) palloc(max_backends * sizeof(int)); /* Collect a snapshot of processes waited for by GetSafeSnapshot */ num_blockers = - GetSafeSnapshotBlockingPids(blocked_pid, blockers, GetMaxBackends()); + GetSafeSnapshotBlockingPids(blocked_pid, blockers, max_backends); /* Convert int array to Datum array */ if (num_blockers > 0) signature.asc Description: PGP signature
RE: [BUG]Update Toast data failure in logical replication
On Tue, Feb 8, 2022 3:18 AM Andres Freund wrote: > > On 2022-02-07 08:44:00 +0530, Amit Kapila wrote: > > Right, and it is getting changed. We are just printing the first 200 > > characters (by using SQL [1]) from the decoded tuple so what is shown > > in the results is the initial 200 bytes. > > Ah, I knew I must have been missing something. > > > > The complete decoded data after the patch is as follows: > > Hm. I think we should change the way the strings are shortened - otherwise we > don't really verify much in that test. Perhaps we could just replace the long > repetitive strings with something shorter in the output? > > E.g. using something like regexp_replace(data, > '(1234567890|9876543210){200}', '\1{200}','g') > inside the substr(). > > Wonder if we should deduplicate the number of different toasted strings in the > file to something that'd allow us to have a single "redact_toast" function or > such. There's too many different ones to have a reasonbly simple redaction > function right now. But that's perhaps better done separately. > I tried to make the output shorter using your suggestion, with SQL like the following; please see the attached patch, which is based on the v8 patch [1]. SELECT substr(regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g'), 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); Note that some strings are still longer than 200 characters even after being shortened, so they can't be shown entirely, e.g. table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[te The entire string is: table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'9876543210{200}' Maybe it's better to change the substr length to 250 to show the entire string, or we can do it as a separate HEAD-only improvement where we deduplicate some of the other long strings as well. Thoughts? [1] https://www.postgresql.org/message-id/CAA4eK1L_Z_2LDwMNbGrwoO%2BFc-2Q04YORQSA9UfGUTMQpy2O1Q%40mail.gmail.com Regards, Tang improve_toast_test.diff Description: improve_toast_test.diff
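As a rough sketch of the single "redact_toast" helper Andres mentions (hypothetical, only to illustrate the idea; the pattern below covers just the two strings used in this test):

CREATE FUNCTION redact_toast(data text) RETURNS text AS $$
  -- collapse the long repetitive test strings into a short marker
  SELECT regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}', 'g')
$$ LANGUAGE sql IMMUTABLE;

SELECT substr(redact_toast(data), 1, 250)
FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
  'include-xids', '0', 'skip-empty-xacts', '1');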
Re: decoupling table and index vacuum
On Tue, Feb 8, 2022 at 10:42 PM Peter Geoghegan wrote: > > On Sun, Feb 6, 2022 at 11:25 PM Dilip Kumar wrote: > > > One thing we could try doing in order to make that easier would be: > > > tweak things so that when autovacuum vacuums the table, it only > > > vacuums the indexes if they meet some threshold for bloat. I'm not > > > sure exactly what happens with the heap vacuuming then - do we do > > > phases 1 and 2 always, or a combined heap pass, or what? But if we > > > pick some criteria that vacuums indexes sometimes and not other times, > > > we can probably start doing some meaningful measurement of whether > > > this patch is making bloat better or worse, and whether it's using > > > fewer or more resources to do it. > > > > I think we can always trigger phase 1 and 2 and phase 2 will only > > vacuum conditionally based on if all the indexes are vacuumed for some > > conveyor belt pages so we don't have risk of scanning without marking > > anything unused. > > Not sure what you mean about a risk of scanning without marking any > LP_DEAD items as LP_UNUSED. I mean, for testing purposes, we could integrate with autovacuum such that 1) we always do the first pass of the vacuum, 2) index vacuum is done only for the indexes which have bloated more than some threshold, and then 3) we always trigger the heap vacuum second pass. So my point was that even if autovacuum triggers the second vacuum pass every time, it will not do anything unless all the indexes have been vacuumed. > If VACUUM always does some amount of this, > then it follows that the new mechanism added by the patch just can't > safely avoid any work at all, making it all pointless. We have to > expect heap vacuuming to take place much less often with the patch. > Simply because that's what the invariant described in comments above > lazy_scan_heap() requires. In the second pass we make sure that we don't mark any LP_DEAD item as LP_UNUSED for which index vacuuming has not been done. Basically, we store dead items in the conveyor belt, and whenever we do an index pass we remember up to which conveyor belt page that index has been vacuumed. Before starting the second heap pass we find the minimum conveyor belt page up to which all the indexes have been vacuumed. > Note that this is not the same thing as saying that we do less > *absolute* heap vacuuming with the conveyor belt -- my statement about > less heap vacuuming taking place is *only* true relative to the amount > of other work that happens in any individual "shortened" VACUUM > operation. We could do exactly the same total amount of heap vacuuming > as before (in a version of Postgres without the conveyor belt but with > the same settings), but much *more* index vacuuming (at least for one > or two problematic indexes). > > > And we can try to measure with other approaches as > > well where we completely avoid phase 2 and it will be done only along > > with phase 1 whenever applicable. > > I believe that the main benefit of the dead TID conveyor belt (outside > of global index use cases) will be to enable us to do more (much more) > index vacuuming for one index in particular. So it's not really about > doing less index vacuuming or less heap vacuuming -- it's about doing > a *greater* amount of *useful* index vacuuming, in less time. There is > often some way in which failing to vacuum one index for a long time > does lasting damage to the index structure. I agree with the point. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: decoupling table and index vacuum
On Wed, Feb 9, 2022 at 1:21 AM Peter Geoghegan wrote: > > The btree side of this shouldn't care at all about dead tuples (in > general we focus way too much on dead tuples, and way too little on > pages). With bottom-up index deletion the number of dead tuples in the > index is just about completely irrelevant. It's entirely possible and > often even likely that 20%+ of all index tuples will be dead at any > one time, when the optimization perfectly preserves the index > structure. > > The btree side of the index AM API should be focussing on the growth > in index size, relative to some expectation (like maybe the growth for > whatever index on the same table has grown the least since last time, > accounting for obvious special cases like partial indexes). Perhaps > we'd give some consideration to bulk deletes, too. Overall, it should > be pretty simple, and should sometimes force us to do one of these > "dynamic mini vacuums" of the index just because we're not quite sure > what to do. There is nothing wrong with admitting the uncertainty. I agree with the point that we should be focusing more on index size growth than on dead tuples. But I don't think that we can completely ignore the number of dead tuples. Although we have bottom-up index deletion, whether the index structure will be preserved or not depends upon what keys we insert next. So, for example, if there are 80% dead tuples but so far the index size is fine, can we avoid vacuuming? If we avoid vacuuming, it is very much possible that in some cases we will create huge bloat, e.g. if we are inserting keys which cannot take advantage of bottom-up deletion. So IMHO the decision should be based on a combination of index size bloat and the percentage of dead tuples. Maybe we can give more weight to the size bloat and less to the dead-tuple percentage, but we should not completely ignore it. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Make mesage at end-of-recovery less scary.
Hi, Ashutosh. At Tue, 8 Feb 2022 18:35:34 +0530, Ashutosh Sharma wrote in > Here are some of my review comments on the v11 patch: Thank you for taking a look at this. > - (errmsg_internal("reached end of WAL in > pg_wal, entering archive recovery"))); > + (errmsg_internal("reached end of WAL at %X/%X > on timeline %u in %s during crash recovery, entering archive > recovery", > +LSN_FORMAT_ARGS(ErrRecPtr), > +replayTLI, > +xlogSourceNames[currentSource]))); > > Why crash recovery? Won't this message get printed even during PITR? It is in the if-block with the following condition. >* If archive recovery was requested, but we were still doing >* crash recovery, switch to archive recovery and retry using the >* offline archive. We have now replayed all the valid WAL in >* pg_wal, so we are presumably now consistent. ... >if (!InArchiveRecovery && ArchiveRecoveryRequested) This means archive recovery has been requested but has not started yet. That is, we've just finished crash recovery. The existing comment, cited above, mentions that. At the end of PITR (or archive recovery), the other code works. > /* > * If we haven't emit an error message, we have safely reached the > * end-of-WAL. > */ > if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG) > { > char *fmt; > > if (StandbyMode) > fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u > in %s during standby mode"); > else if (InArchiveRecovery) > fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u > in %s during archive recovery"); > else > fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u > in %s during crash recovery"); The last of the above messages is chosen when archive recovery is not requested at all. > I just did a PITR and could see these messages in the logfile. Yeah, the log lines describe that the server is starting with crash recovery in order to run PITR. > 2022-02-08 18:00:44.367 IST [86185] LOG: starting point-in-time > recovery to WAL location (LSN) "0/5227790" > 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not > properly shut down; automatic recovery in progress Well, I guess "automatic recovery" is ambiguous. Would it make more sense if the second line were like the following instead? + 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not properly shut down; crash recovery in progress > 2022-02-08 18:00:44.369 IST [86185] LOG: redo starts at 0/14DC8D8 > 2022-02-08 18:00:44.978 IST [86185] DEBUG1: reached end of WAL at > 0/3D0 on timeline 1 in pg_wal during crash recovery, entering > archive recovery (I don't include this change in this patch since it would be another issue.) > == > > + /* > +* If we haven't emit an error message, we have safely reached the > +* end-of-WAL. > +*/ > + if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG) > + { > + char *fmt; > + > + if (StandbyMode) > + fmt = gettext_noop("reached end of WAL at %X/%X on > timeline %u in %s during standby mode"); > + else if (InArchiveRecovery) > + fmt = gettext_noop("reached end of WAL at %X/%X on > timeline %u in %s during archive recovery"); > + else > + fmt = gettext_noop("reached end of WAL at %X/%X on > timeline %u in %s during crash recovery"); > + > + ereport(LOG, > + (errmsg(fmt, LSN_FORMAT_ARGS(ErrRecPtr), replayTLI, > + xlogSourceNames[currentSource]), > +(errormsg ? errdetail_internal("%s", errormsg) : > 0))); > + } > > Doesn't it make sense to add an assert statement inside this if-block > that will check for xlogreader->EndOfWAL? Good point.
On second thought, the condition there is flat wrong. The message is "reached end of WAL" so the condition should be EndOfWAL. On the other hand we didn't make sure that the error message for the stop is emitted anywhere. Thus I don't particularly want to be strict on that point. I made the following change for this. - if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG) + if (xlogreader->EndOfWAL) > == > > -* We only end up here without a message when XLogPageRead() > -* failed - in that case we already logged something. In > -* StandbyMode that only happens if we have been triggered, so we > -* shouldn't loop anymore in that case. > +* If we get here for other than end-of-wal, emit the error > +* message right now. Otherwise the message if any is shown as a >
Re: Support escape sequence for cluster_name in postgres_fdw.application_name
Sorry for missing this. At Thu, 27 Jan 2022 19:26:39 +0900, Fujii Masao wrote in > > On 2022/01/27 17:10, Kyotaro Horiguchi wrote: > > I don't object to adding more meaningful replacements, but more escape > > sequence makes me anxious about the increased easiness of exceeding > > the size limit of application_name. > > If this is really an issue, it might be time to reconsider the size > limit of application_name. If it's considered too short, the patch > that enlarges it should be proposed separately. That makes sense. > > Considering that it is used to > > identify fdw-initinator server, we might need to add padding (or > > rather truncating) option in the escape sequence syntax, then warn > > about truncated application_names for safety. > > I failed to understand this. Could you tell me why we might need to > add padding option here? My point was the "truncating" option, which would limit the length of the replacement string. But expanding the application_name limit is more sensible. > > Is the reason for 'C' in upper-case to avoid possible conflict with > > 'c' of log_line_prefix? > > Yes. > > > I'm not sure that preventive measure is worth > > doing. Looking the escape-sequence spec alone, it seems to me rather > > strange that an upper-case letter is used in spite of its lower-case > > is not used yet. > > I have no strong opinion about using %C. If there is better character > for the escape sequence, I'm happy to use it. So what character is > more proper? %c? I think so. > > Otherwise all looks fine to me except the lack of documentation. > > The patch updated postgres-fdw.sgml, but you imply there are other > documents that the patch should update? Could you tell me where the > patch should update? Mmm. I must have missed that part. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
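For context, a small sketch of how such an escape would be used, assuming the cluster-name escape ends up spelled %C as in the current patch and that a user-name escape %u is also available (both are assumptions; the exact letters are still being discussed above):

-- on the local server, before accessing postgres_fdw foreign tables
SET postgres_fdw.application_name TO 'fdw_%C_%u';
-- with cluster_name = 'pgprimary' and local user alice, the remote session
-- would then appear in pg_stat_activity with application_name 'fdw_pgprimary_alice'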