Re: Add jsonlog log_destination for JSON server logs
On Wed, Sep 29, 2021 at 11:02:10AM +0900, Michael Paquier wrote:
> This happens to not be a problem as only 32 bits are significant for
> handles for both Win32 and Win64.  This also means that we should be
> able to remove the use of "long" in this code, making the routines
> more symmetric.  I have done more tests with Win32 and Win64, and
> applied it.  I don't have MinGW environments at hand, but looking at
> the upstream code that should not matter.  The buildfarm will let
> us know soon enough if there is a problem thanks to the TAP tests of
> pg_ctl.

So, I have been looking at the rest of the patch set for the last couple
of days, and I think that I have spotted all the code paths that need to
be smarter when it comes to multiple file-based log destinations.
Attached is a new patch set:

- 0001 does some refactoring of the file rotation in syslogger.c;
  that's the same patch as previously posted.
- 0002 is more refactoring of elog.c, adding routines for the start
  timestamp, the log timestamp, the backend type, and an extra one to
  check whether a query can be logged or not.
- 0003 is a change to send_message_to_server_log() to be smarter about
  the fallback to stderr when a csvlog (or a jsonlog!) entry cannot be
  logged because the redirection is not ready yet.  The code of HEAD
  processes stderr first, then csvlog, with csvlog falling back to
  stderr if the redirection is not done yet.  That's a bit strange
  because, for example, on WIN32 we would lose any csvlog entry for a
  service.  I propose here to do csvlog first and fall back to stderr,
  so that it gets done in one code path instead of two.  I have spent
  quite a bit of time thinking about the best way to handle the case of
  multiple file log destinations here, because we don't want to log
  multiple times to stderr if csvlog and jsonlog are both enabled, and
  I think that this is the simplest thing we could do.
- 0004 moves the CSV-specific code into its own file.  This includes
  some refactoring of elog.c that should be moved to 0002, as this
  requires more routines of elog.c to be published:
  -- write_pipe_chunks()
  -- error_severity()
- 0005 is the main meat, which introduces JSON as a log_destination.
  This compiles and passes all my tests, but I have not really done an
  in-depth review of this code yet.

0002 and 0004 could be more polished, and most of their pieces had
better be squashed together.  0003, though, would improve the case of
WIN32 where only csvlog is enabled, so that log entries are properly
redirected to the event logs if the redirection is not done yet.  I'd
like to move on with 0001 and 0003 as independent pieces.

Sehrope, any thoughts?
--
Michael

From 4641f8535c2be4b75285ac34d87e883bec250e48 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 28 Sep 2021 11:48:18 +0900
Subject: [PATCH v4 1/5] Refactor per-destination file rotation in syslogger

---
 src/backend/postmaster/syslogger.c | 242 ++---
 1 file changed, 119 insertions(+), 123 deletions(-)

diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index c5f9c5202d..4a019db7f4 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -87,7 +87,7 @@ static bool rotation_disabled = false;
 static FILE *syslogFile = NULL;
 static FILE *csvlogFile = NULL;
 NON_EXEC_STATIC pg_time_t first_syslogger_file_time = 0;
-static char *last_file_name = NULL;
+static char *last_sys_file_name = NULL;
 static char *last_csv_file_name = NULL;

 /*
@@ -274,7 +274,7 @@ SysLoggerMain(int argc, char *argv[])
 	 * time because passing down just the pg_time_t is a lot cheaper than
 	 * passing a whole file path in the EXEC_BACKEND case.
 	 */
-	last_file_name = logfile_getname(first_syslogger_file_time, NULL);
+	last_sys_file_name = logfile_getname(first_syslogger_file_time, NULL);
 	if (csvlogFile != NULL)
 		last_csv_file_name = logfile_getname(first_syslogger_file_time, ".csv");

@@ -1241,16 +1241,118 @@ logfile_open(const char *filename, const char *mode, bool allow_errors)
 	return fh;
 }

+/*
+ * Do logfile rotation for a single destination, as specified by target_dest.
+ * The information stored in *last_file_name and *logFile is updated on a
+ * successful file rotation.
+ *
+ * Returns false if the rotation has been stopped, and true to move on to
+ * the processing of other formats.
+ */
+static bool
+logfile_rotate_dest(bool time_based_rotation, int size_rotation_for,
+					pg_time_t fntime, int target_dest,
+					char **last_file_name, FILE **logFile)
+{
+	char	   *logFileExt;
+	char	   *filename;
+	FILE	   *fh;
+
+	/*
+	 * If the target destination was just turned off, close the previous
+	 * file and unregister its data.  This cannot happen for stderr as
+	 * syslogFile is assumed to be always opened even if
+	 * LOG_DESTINATION_STDERR is disabled.
+	 */
+	if ((Log_destination & target_dest) == 0 &&
+		target_dest != LOG_DESTINATION_STDERR)
+	{
+		if (*logFile != NULL)
+			fclose(*logFile);
+		*logFile = NULL;
+		if (*last_file_name !=
Re: jsonb crash
On Thu, 30 Sept 2021 at 11:54, Tom Lane wrote:
>
> David Rowley writes:
> > This allows us to memoize any join expression, not just equality
> > expressions.
>
> I am clearly failing to get through to you.  Do I need to build
> an example?

Taking your -0.0 / +0.0 float example, if I understand correctly, due to
-0.0 and +0.0 hashing to the same value and being classed as equal, we're
really only guaranteed to get the same rows if the join condition uses
the float value (in this example) directly.

If, for example, there was a function we could pass that float value
through that was able to distinguish -0.0 from +0.0, then that could
cause issues, as the return value of that function could be anything and
have completely different join partners on the other side of the join.

A function something like:

create or replace function text_sign(f float) returns text as $$
begin
  if f::text like '-%' then
    return 'neg';
  else
    return 'pos';
  end if;
end;
$$ language plpgsql;

would be enough to do it, if the join condition was text_sign(t1.f) =
t2.col and the cache key was t1.f rather than text_sign(t1.f).

On Thu, 30 Sept 2021 at 10:54, Tom Lane wrote:
> So I'm now thinking you weren't that far wrong to be looking at
> hashability of the top-level qual operator.  What is missing is
> that you have to restrict candidate cache keys to be the *direct*
> arguments of such an operator.  Looking any further down in the
> expression introduces untenable assumptions.

I think I also follow you now with the above.  The problem is that if
the join operator is able to distinguish something that the equality
operator and hash function cannot, then we have the same problem.
Restricting the join operator to hash equality ensures that the join
condition cannot distinguish anything that we cannot distinguish in the
cache hash table.

David
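The distinction being relied on above is visible even in a tiny
standalone C program (an illustration, not from the thread): IEEE 754
defines -0.0 and +0.0 as comparing equal, while their textual forms
differ, so any expression that inspects the text can tell apart values
the hash and equality machinery treat as identical.

#include <stdio.h>

int
main(void)
{
	double		neg = -0.0;
	double		pos = 0.0;

	/* IEEE 754: the two zeros compare equal... */
	printf("equal: %d\n", neg == pos);	/* prints 1 */

	/* ...but their textual representations differ */
	printf("%g vs %g\n", neg, pos);		/* prints -0 vs 0 */

	return 0;
}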
RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Friday, September 24, 2021 5:03 PM I wrote:
> On Tuesday, March 16, 2021 1:35 AM Oh, Mike wrote:
> > We noticed that the logical replication could fail when the
> > Standby::RUNNING_XACT record is generated in the middle of a catalog
> > modifying transaction and if the logical decoding has to restart from
> > the RUNNING_XACT WAL entry.
> ...
> > Proposed solution:
> > If we're decoding a catalog modifying commit record, then check
> > whether it's part of the RUNNING_XACT xids processed at the
> > restart_lsn.  If so, then add its xid & subxacts in the committed
> > txns list in the logical decoding snapshot.
> >
> > Please refer to the attachment for the proposed patch.
>
> Let me share some review comments for the patch.
> (3) suggestion of small readability improvement
>
> We calculate the same size twice here and in DecodeCommit.
> I suggest you declare a variable that stores the computed result of the
> size, which might shorten this code.
>
> +		/*
> +		 * xl_running_xacts contains a xids Flexible Array
> +		 * and its size is subxcnt + xcnt.
> +		 * Take that into account while allocating
> +		 * the memory for last_running.
> +		 */
> +		last_running = (xl_running_xacts *) malloc(sizeof(xl_running_xacts)
> +							+ sizeof(TransactionId) * (running->subxcnt + running->xcnt));
> +		memcpy(last_running, running, sizeof(xl_running_xacts)
> +			   + (sizeof(TransactionId) * (running->subxcnt + running->xcnt)));

Let me add one more basic review comment in DecodeStandbyOp().

Why do you call raw malloc() directly?  There is no check for whether
the return value is NULL; did you intend to call palloc() here instead?

Best Regards,
	Takamichi Osumi
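For reference, a minimal sketch of the shape this review comment seems
to suggest (the variable name is illustrative; in backend code palloc()
raises an ERROR on allocation failure instead of returning NULL, which
also addresses the missing NULL check):

	Size		sz;

	/* compute the flexible-array size once and reuse it */
	sz = sizeof(xl_running_xacts) +
		sizeof(TransactionId) * (running->subxcnt + running->xcnt);

	last_running = (xl_running_xacts *) palloc(sz);
	memcpy(last_running, running, sz);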
Re: plperl: update ppport.h and fix configure version check
> On 5 Oct 2021, at 05:12, Tom Lane wrote:
>
> In short: (a) we're not testing against anything older than 5.8.3
> and (b) it seems quite unlikely that anybody cares about 5.8.x anyway.
> So if we want to mess with this, maybe we should set the cutoff
> to 5.8.3 not 5.8.1.

Not being able to test against older versions in the buildfarm seems
like a pretty compelling reason to set 5.8.3 as the required version.

--
Daniel Gustafsson		https://vmware.com/
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Mon, Oct 4, 2021 at 2:51 PM Dilip Kumar wrote:

I have implemented the patch with approach 2 as well, i.e. instead of
scanning pg_class, we scan the directory.  IMHO, we have already
discussed most of the advantages and disadvantages of both approaches,
so I don't want to mention those again.  But I have noticed one more
issue with approach 2: basically, if we scan the directory then we don't
have any way to identify the relation OID, and that is required in order
to acquire the relation lock before copying it, right?

Patch details:
0001 to 0006 implement approach 1.
0007 removes the pg_class-scanning code and adds the directory scan.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v5-0001-Refactor-relmap-load-and-relmap-write-functions.patch
Description: Binary data

v5-0003-Refactor-index_copy_data.patch
Description: Binary data

v5-0004-Extend-bufmgr-interfaces.patch
Description: Binary data

v5-0005-New-interface-to-lock-relation-id.patch
Description: Binary data

v5-0002-Extend-relmap-interfaces.patch
Description: Binary data

v5-0006-WAL-logged-CREATE-DATABASE.patch
Description: Binary data

v5-0007-POC-WAL-LOG-CREATE-DATABASE-APPROACH-2.patch
Description: Binary data
Re: Duplicate-word typos in code comments
> On 4 Oct 2021, at 21:54, Tom Lane wrote:
>
> Daniel Gustafsson writes:
>>> On 4 Oct 2021, at 15:56, Tom Lane wrote:
>>> I used to think it was better to go ahead and manually reflow, if you
>>> use an editor that makes that easy.  That way there are fewer commits
>>> touching any one line of code, which is good when trying to review
>>> code history.  However, now that we've got the ability to make "git
>>> blame" ignore pgindent commits, maybe it's better to leave that sort
>>> of mechanical cleanup to pgindent, so that the substantive patch is
>>> easier to review.
>
>> Yeah, that's precisely why I did it.  Since we can skip over pgindent
>> sweeps it makes sense to try and minimize such changes to make code
>> archaeology easier.  There are of course cases when the result will be
>> such an eyesore that we'd prefer to have it done sooner, but in cases
>> like these, where a line just got one word shorter, it seemed an easy
>> choice.
>
> Actually though, there's another consideration: if you leave
> not-correctly-pgindented code lying around, it causes problems
> for the next hacker who modifies that file and wishes to neaten
> up their own work by pgindenting it.  They can either tediously
> reverse out part of the delta, or commit a patch that includes
> entirely-unrelated cosmetic changes, neither of which is
> pleasant.

Right, this is mainly targeting comments, where changing a word on the
first line of an N-line comment can have the knock-on effect of changing
the other N-1 lines just due to reflowing.  This is analogous to
wrapping existing code in a new block, causing a re-indentation to
happen, except that for comments it can sometimes be OK to leave it (as
in this particular case).  At the end of the day, it's all a
case-by-case trade-off call.

--
Daniel Gustafsson		https://vmware.com/
Fix pg_log_backend_memory_contexts()'s delay
Hi,

Log output takes anywhere from several seconds to a few tens of seconds
when using ‘SELECT pg_log_backend_memory_contexts(1234)’ with the PID of
the autovacuum launcher.  I made a patch for this problem.

regards,
Koyu Tanigawa

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3b3df8fa8c..7ebe5ae8dc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -836,6 +836,9 @@ HandleAutoVacLauncherInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();

+	if (LogMemoryContextPending)
+		ProcessLogMemoryContextInterrupt();
+
 	/* Process sinval catchup interrupts that happened while sleeping */
 	ProcessCatchupInterrupt();
 }
Re: refactoring basebackup.c
Hi Robert,

I have fixed the autoFlush issue.  Basically, I was wrongly initializing
the lz4 preferences in bbsink_lz4_begin_archive() instead of
bbsink_lz4_begin_backup().  I have fixed the issue in the attached
patch; please have a look at it.

Regards,
Jeevan Ladhe

On Fri, Sep 24, 2021 at 6:27 PM Jeevan Ladhe wrote:
>> Still, there's got to be a simple way to make this work, and it can't
>> involve setting autoFlush.  Like, look at this:
>>
>> https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c
>>
>> That uses the same APIs that we're using here, and a fixed-size input
>> buffer and a fixed-size output buffer, just as we have here, to
>> compress a file.  And it probably works, because otherwise it likely
>> wouldn't be in the "examples" directory.  And it sets autoFlush to 0.
>
> Thanks, Robert.  I have seen this example, and it is similar to what
> we have.  I went through each of the steps and it appears that I have
> done it correctly.  I am still trying to debug and figure out where it
> is going wrong.
>
> I am going to try hooking pg_basebackup up with the lz4 source and
> debugging both.
>
> Regards,
> Jeevan Ladhe

lz4_compress_v4.patch
Description: Binary data
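A hedged sketch of the fix being described -- the per-sink state
handling is an assumption, and only the LZ4F types and constants come
from lz4frame.h.  The point is that the frame preferences belong in
begin_backup (run once per backup), not in begin_archive (run once per
tar archive):

#include <string.h>
#include <lz4frame.h>

static LZ4F_preferences_t prefs;	/* assumed per-sink state */

static void
bbsink_lz4_begin_backup_sketch(void)
{
	/* set up the LZ4 frame preferences once, before any archive starts */
	memset(&prefs, 0, sizeof(prefs));
	prefs.frameInfo.blockSizeID = LZ4F_max256KB;	/* example setting */
	prefs.autoFlush = 0;	/* per the referenced frameCompress.c example */
}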
Re: plperl: update ppport.h and fix configure version check
Daniel Gustafsson writes:

>> On 5 Oct 2021, at 05:12, Tom Lane wrote:
>
>> In short: (a) we're not testing against anything older than 5.8.3
>> and (b) it seems quite unlikely that anybody cares about 5.8.x anyway.
>> So if we want to mess with this, maybe we should set the cutoff
>> to 5.8.3 not 5.8.1.
>
> Not being able to test against older versions in the buildfarm seems
> like a pretty compelling reason to set 5.8.3 as the required version.

Looking at the list of Perl versions shipped with various OSes
(https://www.cpan.org/ports/binaries.html), bumping the minimum
requirement from 5.8.1 to 5.8.3 will affect the following OS versions,
which shipped 5.8.1 or 5.8.2:

  AIX: 5.3, 6.1
  Fedora: 1 (Yarrow)
  macOS: 10.3 (Panther)
  Redhat: 2.1
  Slackware: 9.0, 9.1
  OpenSUSE: 8.2

The only one of these that I can imagine we might possibly care about is
AIX, but I don't know what versions we claim to support or people
actually run PostgreSQL on (and want to upgrade to 15).  The docs at
https://www.postgresql.org/docs/current/installation-platform-notes.html
just say that "AIX versions before about 6.1 […] are not recommended".

For reference, 6.1 was released on 2007-11-09 and went EOL on
2017-04-30, and 7.1 was released on 2010-09-10 and is supported until
2023-04-30.

- ilmari
Re: Fix pg_log_backend_memory_contexts()'s delay
On Tue, Oct 5, 2021 at 2:50 PM bt21tanigaway wrote:
>
> Hi,
>
> Log output takes anywhere from several seconds to a few tens of
> seconds when using ‘SELECT pg_log_backend_memory_contexts(1234)’ with
> the PID of the autovacuum launcher.
> I made a patch for this problem.

Thanks for the patch.  Do we also need to make the same change in
HandleMainLoopInterrupts, HandleCheckpointerInterrupts,
HandlePgArchInterrupts, and HandleWalWriterInterrupts, since we don't
call CHECK_FOR_INTERRUPTS() there?  And there are also other processes:
the pgstat process/statistics collector, syslogger, walreceiver,
walsender, background workers, parallel workers and so on.  I think we
need to make the change in all the processes that don't call
CHECK_FOR_INTERRUPTS() in their main loops.

Before doing that, we need to be sure whether
pg_log_backend_memory_contexts should support only user
sessions/backend processes, or any process forked by the postmaster,
i.e. auxiliary processes too.  The function currently supports only the
processes for which BackendPidGetProc returns a PGPROC structure; it
doesn't attempt to get one from AuxiliaryPidGetProc.

Regards,
Bharath Rupireddy.
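As an illustration of the change being proposed for the other handlers
(the function name exists in checkpointer.c, but its body and the exact
placement here are assumptions, mirroring the autovacuum launcher patch
above):

static void
HandleCheckpointerInterrupts(void)
{
	/* ... existing interrupt checks ... */

	/* assumed addition, mirroring HandleAutoVacLauncherInterrupts() */
	if (LogMemoryContextPending)
		ProcessLogMemoryContextInterrupt();
}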
Re: Next Steps with Hash Indexes
On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote:
>
> On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar wrote:
> >
> > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro wrote:
> > >
> > > And to get the multi-column hash index selected, we may set
> > > enable_hashjoin =off, to avoid any condition become join condition,
> > > saw similar behaviors in other DBs as well...
> >
> > This may be related to Tom's point that, if some of the quals are
> > removed due to optimization or converted to join quals, then now,
> > even if the user has given qual on all the key columns the index scan
> > will not be selected because we will be forcing that the hash index
> > can only be selected if it has quals on all the key attributes?
> >
> > I don't think suggesting enable_hashjoin =off is a solution,
>
> Yeah, this doesn't sound like a good idea.  How about instead try to
> explore the idea where the hash (bucket assignment and search) will be
> based on the first index key and the other columns will be stored as
> payload?  I think this might pose some difficulty in the consecutive
> patch to enable a unique index because it will increase the chance of
> traversing more buckets for uniqueness checks.  If we see such
> problems, then I have another idea to minimize the number of buckets
> that we need to lock during uniqueness check which is by lock chaining
> as is used during hash bucket clean up where at a time we don't need
> to lock more than two buckets at a time.

I have presented a simple, almost trivial, patch to allow multi-col hash
indexes.  It hashes the first column only, which can be a downside in
*some* cases.  If that is clearly documented, it would not cause many
issues, IMHO.  However, it does not have any optimization issues or
complexities, which is surely a very good thing.

Trying to involve *all* columns in the hash index is a secondary
optimization.  It requires subtle changes in optimizer code, as Tom
points out.  It also needs fine tuning to make the all-column approach
beneficial for the additional cases without losing against what the
"first column" approach gives.

I did consider both approaches and after this discussion I am still in
favour of committing the very simple "first column" approach to
multi-col hash indexes now.

--
Simon Riggs                http://www.EnterpriseDB.com/
Re: [Patch] ALTER SYSTEM READ ONLY
On Mon, Oct 4, 2021 at 1:57 PM Rushabh Lathia wrote:
>
> On Fri, Oct 1, 2021 at 2:29 AM Robert Haas wrote:
>>
>> On Thu, Sep 30, 2021 at 7:59 AM Amul Sul wrote:
>> > To find the value of InRecovery after we clear it, patch still uses
>> > ControlFile's DBState, but now the check condition changed to a more
>> > specific one which is less confusing.
>> >
>> > In casual off-list discussion, the point was made to check
>> > SharedRecoveryState to find out the InRecovery value afterward, and
>> > check that using RecoveryInProgress().  But we can't depend on
>> > SharedRecoveryState because at the start it gets initialized to
>> > RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
>> > Therefore, we can't use RecoveryInProgress() which always returns
>> > true if SharedRecoveryState != RECOVERY_STATE_DONE.
>>
>> Uh, this change has crept into 0002, but it should be in 0004 with the
>> rest of the changes to remove dependencies on variables specific to
>> the startup process.  Like I said before, we should really be trying
>> to separate code movement from functional changes.

Well, I have to replace the InRecovery flag in that patch since we are
moving code past the point where the InRecovery flag gets cleared.  If I
didn't, then the 0002 patch would be wrong, since InRecovery would
always be false and the behaviour wouldn't be the same as it was before
that patch.

>> Also, 0002 doesn't actually apply for me.  Did you generate these
>> patches with 'git format-patch'?
>>
>> [rhaas pgsql]$ patch -p1 < ~/Downloads/v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
>> patching file src/backend/access/transam/xlog.c
>> Hunk #1 succeeded at 889 (offset 9 lines).
>> Hunk #2 succeeded at 939 (offset 12 lines).
>> Hunk #3 succeeded at 5734 (offset 37 lines).
>> Hunk #4 succeeded at 8038 (offset 70 lines).
>> Hunk #5 succeeded at 8248 (offset 70 lines).
>> [rhaas pgsql]$ patch -p1 < ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
>> patching file src/backend/access/transam/xlog.c
>> Reversed (or previously applied) patch detected!  Assume -R? [n]
>> Apply anyway? [n] y
>> Hunk #1 FAILED at 7954.
>> Hunk #2 succeeded at 8079 (offset 70 lines).
>> 1 out of 2 hunks FAILED -- saving rejects to file src/backend/access/transam/xlog.c.rej
>> [rhaas pgsql]$ git reset --hard
>> HEAD is now at b484ddf4d2 Treat ETIMEDOUT as indicating a
>> non-recoverable connection failure.
>> [rhaas pgsql]$ patch -p1 < ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
>> patching file src/backend/access/transam/xlog.c
>> Reversed (or previously applied) patch detected!  Assume -R? [n]
>> Apply anyway? [n]
>> Skipping patch.
>> 2 out of 2 hunks ignored -- saving rejects to file src/backend/access/transam/xlog.c.rej
>
> I tried to apply the patch on the master branch head and it's failing
> with conflicts.

Thanks, Rushabh, for the quick check.  I have attached a rebased version
for the latest master head, commit # f6b5d05ba9a.

> Later applied patch on below commit and it got applied cleanly:
>
> commit 7d1aa6bf1c27bf7438179db446f7d1e72ae093d0
> Author: Tom Lane
> Date:   Mon Sep 27 18:48:01 2021 -0400
>
>     Re-enable contrib/bloom's TAP tests.
>
> rushabh@rushabh:postgresql$ git apply v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
> rushabh@rushabh:postgresql$ git apply v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
> rushabh@rushabh:postgresql$ git apply v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:34: space before tab in indent.
>         /*
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:38: space before tab in indent.
>         */
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:39: space before tab in indent.
>        Insert->fullPageWrites = lastFullPageWrites;
> warning: 3 lines add whitespace errors.
> rushabh@rushabh:postgresql$ git apply v36-0004-Remove-dependencies-on-startup-process-specifica.patch
>
> There are whitespace errors on patch 0003.

Fixed.

>> It seems to me that the approach you're pursuing here can't work,
>> because the long-term goal is to get to a place where, if the system
>> starts up read-only, XLogAcceptWrites() might not be called until
>> later, after StartupXLOG() has exited.  But in that case the control
>> file state would be DB_IN_PRODUCTION.  But my idea of using
>> RecoveryInProgress() won't work either, because we set
>> RECOVERY_STATE_DONE just after we set DB_IN_PRODUCTION.  Put
>> differently, the question we want to answer is not "are we in recovery
>> now?" but "did we perform recovery?".  After studying the code a bit,
>> I think a good test might be
>> !XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr).  If InRecovery
>> gets set to true, then we're certain to enter the if (InRecovery)
>> block that contains the main redo loop.  And
Re: Next Steps with Hash Indexes
On Fri, 13 Aug 2021 at 05:01, Amit Kapila wrote:
>
> On Thu, Aug 12, 2021 at 8:30 PM Robert Haas wrote:
> >
> > On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila wrote:
> > > The design of the patch has changed since the initial proposal.  It
> > > tries to perform unique inserts by holding a write lock on the
> > > bucket page to avoid duplicate inserts.
> >
> > Do you mean that you're holding a buffer lwlock while you search the
> > whole bucket?  If so, that's surely not OK.
>
> I think here you are worried that after holding lwlock we might
> perform reads of overflow pages which is not a good idea.  I think
> there are fewer chances of having overflow pages for unique indexes so
> we don't expect such cases in common and as far as I can understand
> this can happen in btree as well during uniqueness check.  Now, I think
> the other possibility could be that we do some sort of lock chaining
> where we grab the lock of the next bucket before releasing the lock of
> the current bucket as we do during bucket clean up but not sure about
> the same.
>
> I haven't studied the patch in detail so it is better for Simon to
> pitch in here to avoid any incorrect information or if he has a
> different understanding/idea.

That is correct.  After analysis of their behavior, I think further
simple work on hash indexes is worthwhile.

With unique data, starting at 1 and monotonically ascending, hash
indexes will grow very nicely from 0 to 10E7 rows without causing >1
overflow block to be allocated for any bucket.  This keeps the search
time for such data to just 2 blocks (bucket plus, if present, 1 overflow
block).  The small number of overflow blocks is because of the regular
and smooth way that splits occur, which works very nicely without
significant extra latency.

The probability of bucket collision while we hold the lock is fairly
low.  This is because even with adjacent data values the hash values
would be spread across multiple buckets, so we would expect the
contention to be less than we would get on a monotonically increasing
btree.

So I don't now see any problem from holding the buffer lwlock on the
bucket while we do multi-buffer operations.

--
Simon Riggs                http://www.EnterpriseDB.com/
Re: Triage on old commitfest entries
On 10/5/21 4:29 AM, Stephen Frost wrote:
> Greetings,
>
> * Peter Geoghegan (p...@bowt.ie) wrote:
>> On Sun, Oct 3, 2021 at 1:30 PM Tom Lane wrote:
>>> Fair.  My concern here is mostly that we not just keep kicking the
>>> can down the road.  If we see that a patch has been hanging around
>>> this long without reaching commit, we should either kill it or
>>> form a specific plan for how to advance it.
>>
>> Also fair.
>>
>> The pandemic has made the kind of coordination I refer to harder in
>> practice.  It's the kind of thing that face to face communication
>> really helps with.
>
> Entirely agree with this.  Index skip scan is actually *ridiculously*
> useful in terms of an improvement, and we need to get the right people
> together to work on it and get it implemented.  I'd love to see this
> done for v15, in particular.  Who do we need to coordinate getting
> together to make it happen?  I doubt that I'm alone in wanting to make
> this happen and I'd be pretty surprised if we weren't able to bring
> the right folks together this fall to make it a reality.

I don't have the skills to work on either side of this, but I would like
to voice my support in favor of having this feature and I would be happy
to help test it on a user level (as opposed to reviewing code).

--
Vik Fearing
Re: Added schema level support for publication.
On Tue, Oct 5, 2021 at 11:10 AM Greg Nancarrow wrote:
>
> On Tue, Oct 5, 2021 at 3:11 PM Amit Kapila wrote:
> >
> > > Code has been added to prevent a table being set (via ALTER TABLE)
> > > to UNLOGGED if it is part of a publication, but I found that I
> > > could still add all tables of a schema having a table that is
> > > UNLOGGED:
> > >
> > > postgres=# create schema sch;
> > > CREATE SCHEMA
> > > postgres=# create unlogged table sch.test(i int);
> > > CREATE TABLE
> > > postgres=# create publication pub for table sch.test;
> > > ERROR:  cannot add relation "test" to publication
> > > DETAIL:  Temporary and unlogged relations cannot be replicated.
> > > postgres=# create publication pub for all tables in schema sch;
> > > CREATE PUBLICATION
> >
> > What about when you use "create publication pub for all tables;"?  I
> > think that also works, now on similar lines shouldn't the behavior of
> > "all tables in schema" publication be the same?  I mean if we want we
> > can detect and give an error, but is it advisable to give an error if
> > there are just one or a few tables in the schema that are unlogged?
>
> ..
> Yes, I agree that ALL TABLES IN SCHEMA should behave the same as the
> ALL TABLES case.
> Problem is, shouldn't setting UNLOGGED on a table only then be
> disallowed if that table was publicised using FOR TABLE?

Right, I also think so.  Let us see what Vignesh or others think on this
matter.

--
With Regards,
Amit Kapila.
Re: Next Steps with Hash Indexes
On Tue, Oct 5, 2021 at 4:08 PM Simon Riggs wrote:
>
> On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote:
> >
> > On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar wrote:
> > >
> > > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro wrote:
> > > >
> > > > And to get the multi-column hash index selected, we may set
> > > > enable_hashjoin =off, to avoid any condition become join
> > > > condition, saw similar behaviors in other DBs as well...
> > >
> > > This may be related to Tom's point that, if some of the quals are
> > > removed due to optimization or converted to join quals, then now,
> > > even if the user has given qual on all the key columns the index
> > > scan will not be selected because we will be forcing that the hash
> > > index can only be selected if it has quals on all the key
> > > attributes?
> > >
> > > I don't think suggesting enable_hashjoin =off is a solution,
> >
> > Yeah, this doesn't sound like a good idea.  How about instead try to
> > explore the idea where the hash (bucket assignment and search) will
> > be based on the first index key and the other columns will be stored
> > as payload?  I think this might pose some difficulty in the
> > consecutive patch to enable a unique index because it will increase
> > the chance of traversing more buckets for uniqueness checks.  If we
> > see such problems, then I have another idea to minimize the number
> > of buckets that we need to lock during uniqueness check which is by
> > lock chaining as is used during hash bucket clean up where at a time
> > we don't need to lock more than two buckets at a time.
>
> I have presented a simple, almost trivial, patch to allow multi-col
> hash indexes.  It hashes the first column only, which can be a
> downside in *some* cases.  If that is clearly documented, it would not
> cause many issues, IMHO.  However, it does not have any optimization
> issues or complexities, which is surely a very good thing.
>
> Trying to involve *all* columns in the hash index is a secondary
> optimization.  It requires subtle changes in optimizer code, as Tom
> points out.  It also needs fine tuning to make the all-column approach
> beneficial for the additional cases without losing against what the
> "first column" approach gives.
>
> I did consider both approaches and after this discussion I am still in
> favour of committing the very simple "first column" approach to
> multi-col hash indexes now.

But what about the other approach suggested by Tom: basically, we hash
only based on the first column for identifying the bucket, but we also
store the hash value for the other columns?  With that, we don't need
changes in the optimizer and we can also avoid a lot of disk fetches,
because after finding the bucket we can match the secondary columns
before fetching the disk tuple.  I agree, one downside with this
approach is we will increase the index size.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: using an end-of-recovery record in all cases
I was trying to understand the v1 patch and found that at the end
RequestCheckpoint() is called unconditionally; I think it should be
called only if REDO had been performed.  Here is the snip from the v1
patch:

 	/*
-	 * If this was a promotion, request an (online) checkpoint now. This isn't
-	 * required for consistency, but the last restartpoint might be far back,
-	 * and in case of a crash, recovering from it might take a longer than is
-	 * appropriate now that we're not in standby mode anymore.
+	 * Request an (online) checkpoint now. Note that, until this is complete,
+	 * a crash would start replay from the same WAL location we did, or from
+	 * the last restartpoint that completed.  We don't want to let that
+	 * situation persist for longer than necessary, since users don't like
+	 * long recovery times.  On the other hand, they also want to be able to
+	 * start doing useful work again as quickly as possible.  Therfore, we
+	 * don't pass CHECKPOINT_IMMEDIATE to avoid bogging down the system.
+	 *
+	 * Note that the consequence of requesting a checkpoint here only after
+	 * we've allowed WAL writes is that a single checkpoint cycle can span
+	 * multiple server lifetimes.  So for example if you want to something to
+	 * happen at least once per checkpoint cycle or at most once per
+	 * checkpoint cycle, you have to consider what happens if the server
+	 * is restarted someplace in the middle.
 	 */
-	if (promoted)
-		RequestCheckpoint(CHECKPOINT_FORCE);
+	RequestCheckpoint(CHECKPOINT_FORCE);

When I try to call that conditionally, as attached, I don't see any
regression failures; correct me if I am missing something here.

Regards,
Amul

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index eddb13d13a7..6d48a1047c6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6549,7 +6549,7 @@ StartupXLOG(void)
 	DBState		dbstate_at_startup;
 	XLogReaderState *xlogreader;
 	XLogPageReadPrivate private;
-	bool		promoted = false;
+	bool		written_end_of_recovery = false;
 	struct stat st;

 	/*
@@ -7955,45 +7955,9 @@ StartupXLOG(void)

 	if (InRecovery)
 	{
-		/*
-		 * Perform a checkpoint to update all our recovery activity to disk.
-		 *
-		 * Note that we write a shutdown checkpoint rather than an on-line
-		 * one. This is not particularly critical, but since we may be
-		 * assigning a new TLI, using a shutdown checkpoint allows us to have
-		 * the rule that TLI only changes in shutdown checkpoints, which
-		 * allows some extra error checking in xlog_redo.
-		 *
-		 * In promotion, only create a lightweight end-of-recovery record
-		 * instead of a full checkpoint. A checkpoint is requested later,
-		 * after we're fully out of recovery mode and already accepting
-		 * queries.
-		 */
-		if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-			LocalPromoteIsTriggered)
-		{
-			promoted = true;
-
-			/*
-			 * Insert a special WAL record to mark the end of recovery, since
-			 * we aren't doing a checkpoint. That means that the checkpointer
-			 * process may likely be in the middle of a time-smoothed
-			 * restartpoint and could continue to be for minutes after this.
-			 * That sounds strange, but the effect is roughly the same and it
-			 * would be stranger to try to come out of the restartpoint and
-			 * then checkpoint. We request a checkpoint later anyway, just for
-			 * safety.
-			 */
-			CreateEndOfRecoveryRecord();
-		}
-		else
-		{
-			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-							  CHECKPOINT_IMMEDIATE |
-							  CHECKPOINT_WAIT);
-		}
+		CreateEndOfRecoveryRecord();
+		written_end_of_recovery = true;
 	}
-
 	if (ArchiveRecoveryRequested)
 	{
 		/*
@@ -8177,12 +8141,9 @@ StartupXLOG(void)
 	WalSndWakeup();

 	/*
-	 * If this was a promotion, request an (online) checkpoint now. This isn't
-	 * required for consistency, but the last restartpoint might be far back,
-	 * and in case of a crash, recovering from it might take a longer than is
-	 * appropriate now that we're not in standby mode anymore.
+	 * TODO
 	 */
-	if (promoted)
+	if (written_end_of_recovery)
 		RequestCheckpoint(CHECKPOINT_FORCE);
 }
Re: plperl: update ppport.h and fix configure version check
Dagfinn Ilmari Mannsåker writes:

> Daniel Gustafsson writes:
>> Not being able to test against older versions in the buildfarm seems
>> like a pretty compelling reason to set 5.8.3 as the required version.

> Looking at the list of Perl versions shipped with various OSes
> (https://www.cpan.org/ports/binaries.html), bumping the minimum
> requirement from 5.8.1 to 5.8.3 will affect the following OS versions,
> which shipped 5.8.1 or 5.8.2:

> AIX: 5.3, 6.1
> Fedora: 1 (Yarrow)
> macOS: 10.3 (Panther)
> Redhat: 2.1
> Slackware: 9.0, 9.1
> OpenSUSE: 8.2

> The only one of these that I can imagine we might possibly care about
> is AIX, but I don't know what versions we claim to support or people
> actually run PostgreSQL on (and want to upgrade to 15).

We do have a couple of buildfarm animals on AIX 7.1, but nothing older.
The other systems you mention are surely dead and buried.

Interestingly, although cpan's table says AIX 7.1 shipped with perl
5.10.1, what's actually on those buildfarm animals is

tgl@gcc111:[/home/tgl]which perl
/usr/bin/perl
tgl@gcc111:[/home/tgl]ls -l /usr/bin/perl
lrwxrwxrwx    1 root     system           29 Nov 09 2020  /usr/bin/perl -> /usr/opt/perl5/bin/perl5.28.1

Hard to tell if that is a local update or official IBM distribution.

> For reference, 6.1 was released on 2007-11-09 and EOL on 2017-04-30,
> and 7.1 was released on 2010-09-10 and is supported until 2023-04-30.

So 6.1 will be five years out of support by the time we release PG 15.
I'm inclined to just update the docs to say we don't support anything
older than 7.1.

			regards, tom lane
Re: plperl: update ppport.h and fix configure version check
Tom Lane writes:

> Interestingly, although cpan's table says AIX 7.1 shipped with perl
> 5.10.1, what's actually on those buildfarm animals is
>
> tgl@gcc111:[/home/tgl]which perl
> /usr/bin/perl
> tgl@gcc111:[/home/tgl]ls -l /usr/bin/perl
> lrwxrwxrwx    1 root     system           29 Nov 09 2020  /usr/bin/perl -> /usr/opt/perl5/bin/perl5.28.1
>
> Hard to tell if that is a local update or official IBM distribution.

Looks like they update the Perl version in OS updates and service packs:
https://www.ibm.com/support/pages/aix-perl-updates-and-support-perlrte

>> For reference, 6.1 was released on 2007-11-09 and EOL on 2017-04-30,
>> and 7.1 was released on 2010-09-10 and is supported until 2023-04-30.
>
> So 6.1 will be five years out of support by the time we release PG 15.

And PG 14 will be supported until nine years after the 6.1 EOL date.

> I'm inclined to just update the docs to say we don't support anything
> older than 7.1.

I concur.

- ilmari
Re: plperl: update ppport.h and fix configure version check
Dagfinn Ilmari Mannsåker writes:

> Tom Lane writes:
>> Hard to tell if that is a local update or official IBM distribution.
>
> Looks like they update the Perl version in OS updates and service
> packs:
> https://www.ibm.com/support/pages/aix-perl-updates-and-support-perlrte

Oh, interesting.  So even if someone still had AIX 6.1 in the wild,
they'd likely have some newer-than-5.8.x Perl on it.

			regards, tom lane
Re: Extension relocation vs. schema qualification
"Verona, Luiz" writes: > I am writing to resurrect this 3-year-old thread. Attached is a patch to > address earthdistance related failures during pg_restore. > The proposed patch will: > - Create a new version of earthdistance (1.2) and make this new version > default > - Change earthdistance relocatable from true to false That part seems like kind of a nonstarter. People may already rely on it being relocatable. Also, if cube is still relocatable, it could still get broken post-installation. I spent some time awhile ago on fixing this via new-style SQL functions [1]. That didn't get finished for reasons explained in the thread, but I think that's probably a more productive direction to go in. regards, tom lane [1] https://www.postgresql.org/message-id/flat/3395418.1618352794%40sss.pgh.pa.us
Re: Fix pg_log_backend_memory_contexts()'s delay
Bharath Rupireddy writes:

> On Tue, Oct 5, 2021 at 2:50 PM bt21tanigaway wrote:
>> Log output takes anywhere from several seconds to a few tens of
>> seconds when using ‘SELECT pg_log_backend_memory_contexts(1234)’ with
>> the PID of the autovacuum launcher.
>> I made a patch for this problem.

> Thanks for the patch.  Do we also need to make the change in
> HandleMainLoopInterrupts, HandleCheckpointerInterrupts,
> HandlePgArchInterrupts, HandleWalWriterInterrupts as we don't call
> CHECK_FOR_INTERRUPTS() there?

It's not real clear to me why we need to care about this in those
processes' idle loops.  Their memory consumption is unlikely to be
very interesting in that state, nor could it change before they
wake up.

			regards, tom lane
Re: proposal: possibility to read dumped table's name from file
> On 1 Oct 2021, at 15:19, Daniel Gustafsson wrote:
>
> I'm still not happy with the docs, I need to take another look there
> and see if I can make them more readable, but otherwise I don't think
> there are any open issues with this.

Attached is a rebased version which has rewritten docs which I think are
more in line with the pg_dump documentation.  I've also added tests for
--strict-names operation, as well as subjected it to pgindent and
pgperltidy.

Unless there are objections, I think this is pretty much ready to go in.

--
Daniel Gustafsson		https://vmware.com/

pg_dump-filteropt-20211005.patch
Description: Binary data
Re: proposal: possibility to read dumped table's name from file
On Tue, 5 Oct 2021 at 14:30, Daniel Gustafsson wrote:

>> On 1 Oct 2021, at 15:19, Daniel Gustafsson wrote:
>>
>> I'm still not happy with the docs, I need to take another look there
>> and see if I can make them more readable, but otherwise I don't think
>> there are any open issues with this.
>
> Attached is a rebased version which has rewritten docs which I think
> are more in line with the pg_dump documentation.  I've also added
> tests for --strict-names operation, as well as subjected it to
> pgindent and pgperltidy.
>
> Unless there are objections, I think this is pretty much ready to go
> in.

great, thank you

Pavel

> --
> Daniel Gustafsson		https://vmware.com/
Re: Role Self-Administration
Greetings,

* Noah Misch (n...@leadboat.com) wrote:
> On Mon, Oct 04, 2021 at 10:57:46PM -0400, Stephen Frost wrote:
> > "A role is not considered to hold WITH ADMIN OPTION on itself, but it
> > may grant or revoke membership in itself from a database session
> > where the session user matches the role."
>
> > Here's the thing - having looked back through the standard, it seems
> > we're missing a bit that's included there and that makes a heap of
> > difference.  Specifically, the SQL standard basically says that to
> > revoke a privilege, you need to have been able to grant that
> > privilege in the first place (as Andrew Dunstan actually also brought
> > up in a recent thread about related CREATEROLE things-
> > https://www.postgresql.org/message-id/837cc50a-532a-85f5-a231-9d68f2184e52%40dunslane.net
> > ) and that isn't something we've been considering when it comes to
> > role 'self administration' thus far, at least as it relates to the
> > particular field of the "grantor".
>
> Which SQL standard clauses are you paraphrasing?  (A reference could
> take the form of a spec version number, section number, and rule
> number.  Alternately, a page number and URL to a PDF would suffice.)

12.7

Specifically, the bit about how a role authorization is said to be
identified if it defines the grant of the role revoked to the grantee
*with grantor A*.  Reading it again these many years later, that seems
to indicate that you need to actually be the grantor, or able to be the
grantor, who performed the original grant in order to revoke it,
something that wasn't done in the original implementation of roles.

> > We can't possibly make things like EVENT TRIGGERs or CREATEROLE work
> > with role trees if a given role can basically just 'opt out' of being
> > part of the tree to which they were assigned by the user who created
> > them.  Therefore, I suggest we contemplate two changes in this area:
>
> I suspect we'll regret using the GRANT system to modify behaviors
> other than whether or not one gets "permission denied".  Hence, -1 on
> using role membership to control event trigger firing, whether or not
> $SUBJECT changes.

I've not been entirely sure if that's a great idea or not either, but I
didn't see any particular holes in Tom's suggestion that we use this as
a way to identify a tree of roles, except for this particular issue that
a role is currently able to 'opt out', which seems like a mistake in the
original role implementation and not an issue with Tom's actual idea to
use it in this way.

I do think that getting the role management sorted out with just the
general concepts of 'tenant' and 'landlord' as discussed in the thread
with Mark about changes to CREATEROLE and adding of other predefined
roles is a necessary first step, and only after we feel like we've
solved that should we come back to the idea of using that for other
things, such as event trigger firing.

> > - Allow a user who is able to create roles to decide if the role
> >   created is able to 'self administer' (that is, GRANT their own role
> >   to someone else) itself.
> >
> > - Disallow roles from being able to REVOKE role membership that they
> >   didn't GRANT in the first place.
>
> Either of those could be reasonable.  Does the SQL standard take a
> position relevant to the decision?  A third option is to track each
> role's creator and make is_admin_of_role() return true for the
> creator, whether or not the creator remains a member.  That would also
> benefit cases where the creator is rolinherit and wants its ambient
> privileges to shed the privileges of the role it's creating.

It's a bit dense, but my take on the revoke statement description is
that the short answer is "yes, the standard does take a position on
this", at least as it relates to role memberships.

As for whether a role would have the ability to control it for
themselves, that seems like a natural extension of the general approach
whereby a role can't grant themselves admin rights on their own role if
they don't already have them, but some other, appropriately privileged
role could.

I don't feel it's necessary to track additional information about who
created a specific role.  Simply having, when that role is created, the
creator be automatically granted admin rights on the created role seems
like it'd be sufficient.

> > We should probably do a more thorough review to see if there's other
> > cases where a given role is able to REVOKE rights that have been
> > GRANT'd by some other role on a particular object, as it seems like
> > we should probably be consistent in this regard across everything and
> > not just for roles.  That might be a bit of a pain but it seems
> > likely to be worth it in the long run and feels like it'd bring us
> > more in-line with the SQL standard too.
>
> Does the SQL standard take a position on whether REVOKE SELECT should
> work that way?

In my reading, yes, it's much the same.  I invite others to try and read
through it and see if they
Re: plperl: update ppport.h and fix configure version check
On 10/4/21 11:12 PM, Tom Lane wrote:
>
> In short: (a) we're not testing against anything older than 5.8.3
> and (b) it seems quite unlikely that anybody cares about 5.8.x anyway.
> So if we want to mess with this, maybe we should set the cutoff
> to 5.8.3 not 5.8.1.

Seems OK.  Note that the Msys DTK perl currawong uses to build with is
ancient (5.6.1).  That's going to stay as it is until it goes completely
out of scope in about 13 months.  The perl it builds plperl against is
much more modern - 5.16.3.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: RfC entries in CF 2021-09
On Tue, Oct 05, 2021 at 03:24:40PM +0900, Etsuro Fujita wrote:
> Hi Jaime,
>
> On Tue, Oct 5, 2021 at 4:09 AM Jaime Casanova wrote:
> > - Fast COPY FROM command for the foreign tables
> >   Last patch was on Jun-2021, no further activity after that.
> >   Etsuro-san, are you going to commit this soon?
>
> Unfortunately, I didn't have time for this in the September
> commitfest.  I'm planning on working on it in the next commitfest.

Thanks.  Moving to next CF.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
Re: plperl: update ppport.h and fix configure version check
Andrew Dunstan writes:

> Seems OK.  Note that the Msys DTK perl currawong uses to build with is
> ancient (5.6.1).  That's going to stay as it is until it goes
> completely out of scope in about 13 months.  The perl it builds plperl
> against is much more modern - 5.16.3.

That brings up something I was intending to ask you about -- any special
tips about running the buildfarm script with a different Perl version
than is used in the PG build itself?  I'm trying to modernize a couple
of my buildfarm animals to use non-stone-age SSL, but I don't really
want to move the goalposts on what they're testing.

			regards, tom lane
Re: proposal: possibility to read dumped table's name from file
On 05-10-2021 at 14:30, Daniel Gustafsson wrote:

> Unless there are objections, I think this is pretty much ready to go
> in.

Agreed.  One typo:

  'This keyword can only be with the exclude keyword.'

should be

  'This keyword can only be used with the exclude keyword.'

thanks,

Erik Rijkers
Re: RfC entries in CF 2021-09
On Mon, Oct 4, 2021 at 12:09 PM Jaime Casanova wrote:
> - Extending amcheck to check toast size and compression
>   Last patch from may-2021, also it seems there is no activity since
>   Jul-2021.  Peter Geoghegan, are you planning to look at this one?

I didn't plan on it, no.

--
Peter Geoghegan
Re: Support for NSS as a libpq TLS backend
On Tue, 2021-10-05 at 15:08 +0200, Daniel Gustafsson wrote:
> Thanks!  These changes look good.  Since you accidentally based this
> on v43 and not the v44 I posted with the cryptohash fix in, the
> attached is a v45 with both your v44 and the previous one, all rebased
> over HEAD.

Thanks, and sorry about that.

--Jacob
Re: RfC entries in CF 2021-09
On Mon, 4 Oct 2021 at 15:09, Jaime Casanova wrote:
>
> - Extending amcheck to check toast size and compression
>   Last patch from may-2021, also it seems there is no activity since
>   Jul-2021.  Peter Geoghegan, are you planning to look at this one?

I'll look at this if nobody minds.

Other patches I could maybe look at might be these two:

> - Simplify some RI checks to reduce SPI overhead
>   Last patch is from Jul-2021, little activity since then.  Peter
>   Eisentraut, you're marked as reviewer here, do you intend to take
>   the patch as the committer?
>
> - global temporary table
>   This has activity.  And seems a good improvement.  Comments?

--
greg
Re: corruption of WAL page header is never reported
On 2021/10/05 10:58, Kyotaro Horiguchi wrote:
> At Tue, 5 Oct 2021 00:59:46 +0900, Fujii Masao wrote in
>> I think that it's better to comment why "retry" is not necessary when
>> not in standby mode.
>>
>> ---
>> When not in standby mode, an invalid page header should cause recovery
>> to end, not retry reading the page, so we don't need to validate the
>> page header here for the retry.  Instead, ReadPageInternal() is
>> responsible for the validation.
>
> LGTM.

Thanks for the review!  I updated the comment and pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: using an end-of-recovery record in all cases
On Tue, Oct 5, 2021 at 7:44 AM Amul Sul wrote:
> I was trying to understand the v1 patch and found that at the end
> RequestCheckpoint() is called unconditionally; I think it should be
> called only if REDO had been performed.

You're right.  But I don't think we need an extra variable like this,
right?  We can just test InRecovery?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Fix pg_log_backend_memory_contexts()'s delay
On Tue, Oct 5, 2021 at 8:28 AM Tom Lane wrote:
> It's not real clear to me why we need to care about this in those
> processes' idle loops.  Their memory consumption is unlikely to be
> very interesting in that state, nor could it change before they
> wake up.

Perhaps that's so, but it doesn't seem like a good reason not to make
them more responsive.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Role Self-Administration
On Mon, Oct 4, 2021 at 10:57 PM Stephen Frost wrote:
> - Disallow roles from being able to REVOKE role membership that they
>   didn't GRANT in the first place.

I think that's not quite the right test.  For example, if alice and bob
are superusers and alice grants pg_monitor to doug, bob should be able
to revoke that grant even though he is not alice.  I think the rule
should be: roles shouldn't be able to REVOKE role memberships unless
they can become the grantor.

But I think maybe it should be even more general than that and apply to
all sorts of grants, rather than just roles and role memberships: roles
shouldn't be able to REVOKE any granted permission unless they can
become the grantor.

For example, if bob grants SELECT on one of his tables to alice, he
should be able to revoke the grant, too.  But if the superuser performs
the grant, why should bob be able to revoke it?  The superuser has
spoken, and bob shouldn't get to interfere ... unless of course he's
also a superuser.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Allow escape in application_name
On 2021/10/04 21:53, kuroda.hay...@fujitsu.com wrote:
>> if (*p == '-' ? isdigit(p[1]) : isdigit(p[0]))
>> {
>> 	char	   *endptr;
>>
>> 	padding = strtol(p, &endptr, 10);
>> 	if (p == endptr)
>> 		break;
>> 	p = endptr;
>> }
>> else
>> 	padding = 0;
>
> I removed the support function based on your suggestion, but added an
> if-statement in order to treat the special case: '%-p'.
>
> +			else if (*p == '-' ? isdigit(p[1]) : isdigit(p[0]))
> +			{
> +				char	   *endptr;
> +				padding = strtol(p, &endptr, 10);

Maybe isdigit() and strtol() work differently depending on the locale
setting?  If so, I'm afraid that the behavior of this new code would be
different from process_log_prefix_padding().

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
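If locale sensitivity is the worry, one possible way to sidestep it
entirely (purely a sketch, not something from the patch) is an explicit
ASCII range test, whose result by construction cannot vary with the
locale:

/* locale-independent replacement for isdigit() on ASCII input */
static inline bool
is_ascii_digit(char c)
{
	return c >= '0' && c <= '9';
}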
Re: Next Steps with Hash Indexes
On Tue, 5 Oct 2021 at 12:24, Dilip Kumar wrote: > > On Tue, Oct 5, 2021 at 4:08 PM Simon Riggs > wrote: > > > > On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote: > > > > > > On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar > > > wrote: > > > > > > > > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro > > > > wrote: > > > > > > > > > > And to get the multi-column hash index selected, we may set > > > > > enable_hashjoin =off, to avoid any condition become join condition, > > > > > saw similar behaviors in other DBs as well... > > > > > > > > This may be related to Tom's point that, if some of the quals are > > > > removed due to optimization or converted to join quals, then now, even > > > > if the user has given qual on all the key columns the index scan will > > > > not be selected because we will be forcing that the hash index can > > > > only be selected if it has quals on all the key attributes? > > > > > > > > I don't think suggesting enable_hashjoin =off is a solution, > > > > > > > > > > Yeah, this doesn't sound like a good idea. How about instead try to > > > explore the idea where the hash (bucket assignment and search) will be > > > based on the first index key and the other columns will be stored as > > > payload? I think this might pose some difficulty in the consecutive > > > patch to enable a unique index because it will increase the chance of > > > traversing more buckets for uniqueness checks. If we see such > > > problems, then I have another idea to minimize the number of buckets > > > that we need to lock during uniqueness check which is by lock chaining > > > as is used during hash bucket clean up where at a time we don't need > > > to lock more than two buckets at a time. > > > > I have presented a simple, almost trivial, patch to allow multi-col > > hash indexes. It hashes the first column only, which can be a downside > > in *some* cases. If that is clearly documented, it would not cause > > many issues, IMHO. However, it does not have any optimization issues > > or complexities, which is surely a very good thing. > > > > Trying to involve *all* columns in the hash index is a secondary > > optimization. It requires subtle changes in optimizer code, as Tom > > points out. It also needs fine tuning to make the all-column approach > > beneficial for the additional cases without losing against what the > > "first column" approach gives. > > > > I did consider both approaches and after this discussion I am still in > > favour of committing the very simple "first column" approach to > > multi-col hash indexes now. > > But what about the other approach suggested by Tom, basically we hash > only based on the first column for identifying the bucket, but we also > store the hash value for other columns? With that, we don't need > changes in the optimizer and we can also avoid a lot of disk fetches > because after finding the bucket we can match the secondary columns > before fetching the disk tuple. I agree, one downside with this > approach is we will increase the index size. Identifying the bucket is the main part of a hash index's work, so that part would be identical. Once we have identified the bucket, we sort the bucket page by hash, so having an all-col hash would help de-duplicate multi-col hash collisions, but not enough to be worth it, IMHO, given that storing an extra 4 bytes per index tuple is a significant size increase which would cause extra overflow pages etc.. The same thought applies to 8-byte hashes. 
IMHO, multi-column hash collisions are a secondary issue, given that we can already select the column order for an index and hash indexes would only be used by explicit user choice. If there are some minor sub-optimal aspects of using hash indexes, then btree was already the default and a great choice for many cases. If btree didn't already exist I would care more about making hash indexes perfect. I just want to make them usable. -- Simon Riggs http://www.EnterpriseDB.com/
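To make the trade-off concrete, here is a small SQL sketch of the behavior under discussion (hypothetical throughout: released PostgreSQL rejects multi-column hash indexes, and the "first column" bucketing is what the patch proposes):

    CREATE TABLE t (k1 int, k2 int, v text);
    CREATE INDEX t_k1_k2 ON t USING hash (k1, k2);  -- proposed syntax

    -- Only k1 determines the bucket, so this query can use the index,
    -- with the k2 condition checked after the bucket is located:
    SELECT * FROM t WHERE k1 = 42 AND k2 = 7;

    -- A predicate on k2 alone cannot use the index at all:
    SELECT * FROM t WHERE k2 = 7;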
Re: Role Self-Administration
> On Oct 5, 2021, at 9:23 AM, Robert Haas wrote: > >> - Disallow roles from being able to REVOKE role membership that they >> didn't GRANT in the first place. > > I think that's not quite the right test. For example, if alice and bob > are superusers and alice grants pg_monitor to doug, bob should be able > to revoke that grant even though he is not alice. Additionally, role "alice" might not exist anymore, which would leave the privilege irrevocable. It's helpful to think in terms of role ownership rather than role creation:

    superuser
    +---> alice
    |     +---> charlie
    |           +---> diane
    +---> bob

It makes sense that alice can take ownership of diane and drop charlie, but not that bob can do so. Nor should charlie be able to transfer ownership of diane to alice. Nor should charlie be able to drop himself. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: using an end-of-recovery record in all cases
On Tue, 5 Oct 2021 at 9:04 PM, Robert Haas wrote: > On Tue, Oct 5, 2021 at 7:44 AM Amul Sul wrote: > > I was trying to understand the v1 patch and found that at the end > > RequestCheckpoint() is called unconditionally, I think that should > > have been called if REDO had performed, > > You're right. But I don't think we need an extra variable like this, > right? We can just test InRecovery? No, the InRecovery flag gets cleared before this point. I think we can use lastReplayedEndRecPtr, which you suggested in the other thread. -- Regards, Amul Sul EDB: http://www.enterprisedb.com
Re: BUG #17212: pg_amcheck fails on checking temporary relations
On Mon, Oct 4, 2021 at 7:10 PM Peter Geoghegan wrote: > All of the underlying errors are cases that were clearly intended to > catch user error -- every single one. But apparently pg_amcheck is > incapable of error, by definition. Like HAL 9000. After some thought, I agree with the idea that pg_amcheck ought to skip relations that can't be expected to be valid -- which includes both unlogged relations while in recovery, and also invalid indexes left behind by failed index builds. Otherwise it can only find non-problems, which we don't want to do. But this comment seems like mockery to me, and I don't like that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: extensible options syntax for replication parser?
On Mon, Sep 27, 2021 at 3:15 AM tushar wrote: > It would be great if we throw an error rather than silently ignoring > this parameter , I opened a separate email for this > > https://www.postgresql.org/message-id/CAC6VRoY3SAFeO7kZ0EOVC6mX%3D1ZyTocaecTDTh209W20KCC_aQ%40mail.gmail.com Hearing no further comments, I've gone ahead and committed these patches. I'm still slightly nervous that I may have missed some issue, but I think at this point having the patches in the tree is more likely to turn it up than any other course of action. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Triage on old commitfest entries
On Mon, Oct 04, 2021 at 02:12:49AM -0500, Jaime Casanova wrote: > On Sun, Oct 03, 2021 at 03:14:58PM -0400, Tom Lane wrote: > [...] > > > > Here's what I found, along with some commentary about each one. > > > > Patch Age in CFs > > > > Protect syscache from bloating with negative cache entries 23 > > Last substantive discussion 2021-01, currently passing cfbot > > > > It's well known that I've never liked this patch, so I can't > > claim to be unbiased. But what I see here is a lot of focus > > on specific test scenarios with little concern for the > > possibility that other scenarios will be made worse. > > I think we need some new ideas to make progress. > > Proposed action: RWF > > if we RwF this patch we should add the thread to the TODO entry > it refers to > done this way > > > Remove self join on a unique column 16 > > Last substantive discussion 2021-07, currently passing cfbot > > > > I'm not exactly sold that this has a good planning-cost-to- > > usefulness ratio. > > Proposed action: RWF > > > > It seems there is no proof that this will increase performance in the > thread. > David, you're the reviewer on this patch; what is your opinion on it? > The last action here was a rebased patch. So, I will try to follow up on this one and will run some performance and functional tests. Based on that, I will move this to the next CF and put myself as reviewer. But of course, I will be happy if some committer/more experienced dev could look at the design/planner bits. > > Index Skip Scan 16 > > Last substantive discussion 2021-05, currently passing cfbot > > > > Seems possibly useful, but we're not making progress. > > > > Peter G mentioned this would be useful. What do we need to advance this > one? > Moved to next CF based on several comments > > Fix up partitionwise join on how equi-join conditions between the partition > > keys are identified 11 > > Last substantive discussion 2021-07, currently passing cfbot > > > > This is another one where I feel we need new ideas to make > > progress. > > Proposed action: RWF > > It seems there has been no activity since the last version of the patch so I > don't think RwF is correct. What do we need to advance on this one? > Ok. You're a reviewer on that patch and know the problems that we mere mortals are not able to understand. So I will do as you suggest, and then will write to Richard to send the new version he was talking about as a new entry in the CF. > > > > A hook for path-removal decision on add_path 11 > > Last substantive discussion 2021-03, currently passing cfbot > > > > I don't think this is a great idea: a hook there will be > > costly, and it's very unclear how multiple extensions could > > interact correctly. > > Proposed action: Reject > > > > Any other comments on this one? > Will do as you suggest > > Implement INSERT SET syntax 11 > > Last substantive discussion 2020-03, currently passing cfbot > > > > This one is clearly stalled. I don't think it's necessarily > > a bad idea, but we seem not to be very interested. > > Proposed action: Reject for lack of interest > > > > Again, no activity after the last patch. > I'm not a fan of non-SQL-standard syntax, but it seems there was some interest in this. I will follow this one as reviewer. > > > psql - add SHOW_ALL_RESULTS option 11 > > Last substantive discussion 2021-09, currently passing cfbot > > > > This got committed and reverted once already. I have to be > > suspicious of whether this is a good design. 
> > > > No activity after last patch > Moved to next CF -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: BUG #17212: pg_amcheck fails on checking temporary relations
On Tue, Oct 5, 2021 at 9:41 AM Robert Haas wrote: > On Mon, Oct 4, 2021 at 7:10 PM Peter Geoghegan wrote: > > All of the underlying errors are cases that were clearly intended to > > catch user error -- every single one. But apparently pg_amcheck is > > incapable of error, by definition. Like HAL 9000. > But this comment seems like mockery to me, and I don't like that. It was certainly not a constructive way of getting my point across. I apologize to Mark. -- Peter Geoghegan
style for typedef of function that will be pointed to
Hi, From everything I've seen, the PostgreSQL style seems to be to include the * in a typedef for a function type to which pointers will be held: typedef void (*Furbinator)(char *furbee); struct Methods { Furbinator furbinate; }; An alternative I've sometimes used elsewhere is to typedef the function type itself, and use the * when declaring a pointer to it: typedef void Furbinator(char *furbee); struct Methods { Furbinator *furbinate; }; What I like about that form is it allows reusing the typedef to prototype any implementing function: static Furbinator _furbinator0; static void _furbinator0(char *furbee) { } It doesn't completely eliminate repeating myself, because the function definition still has to be spelled out. But it's a bit less repetitive, and makes it visibly explicit that this function is to be a Furbinator, and if I get the repeating-myself part wrong, the compiler catches it right on the spot, not only when I try to assign it later to some *Furbinator-typed field. Use of the thing doesn't look any different, thanks to the equivalence of a function name and its address: methods.furbinate = _furbinator0; Naturally, I'm not proposing any change of existing usages, nor would I presume to ever submit a patch using the different style. If anything, maybe I'd consider adding some new code in this style in PL/Java, which as an out-of-tree extension maybe isn't bound by every jot and tittle of PG style, but generally has followed the documented coding conventions. They seem to be silent on this one point. So what I'm curious about is: is there a story to how PG settled on the style it uses? Is the typedef-the-function-itself style considered objectionable? For any reason other than being different? If there were compilers at one time that didn't like it, are there still any? Any that matter? I've found two outside references taking different positions. The Ghostscript project has coding guidelines [0] recommending against, saying "Many compilers don't handle this correctly -- they will give errors, or do the wrong thing, ...". I can't easily tell what year that guideline was written. Ghostscript goes back a long way. The SquareSpace OpenSync coding standard [1] describes both styles (p. 34) and the benefits of the typedef-the-function-itself style (p. 35), without seeming to quite take any final position between them. Regards, -Chap [0] https://www.ghostscript.com/doc/9.50/C-style.htm [1] https://www.opensync.io/s/EDE-020-041-501_OpenSync_Coding_Standard.pdf
Re: BUG #17212: pg_amcheck fails on checking temporary relations
> On Oct 5, 2021, at 9:58 AM, Peter Geoghegan wrote: > > I apologize to Mark. I took no offense. Actually, I owe you a thank-you for having put so much effort into debating the behavior with me. I think the patch to fix bug #17212 will be better for it. (And thanks to Robert for the concern.) — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Role Self-Administration
Greetings, On Tue, Oct 5, 2021 at 12:23 Robert Haas wrote: > On Mon, Oct 4, 2021 at 10:57 PM Stephen Frost wrote: > > - Disallow roles from being able to REVOKE role membership that they > > didn't GRANT in the first place. > > I think that's not quite the right test. For example, if alice and bob > are superusers and alice grants pg_monitor to doug, bob should be able > to revoke that grant even though he is not alice. > > I think the rule should be: roles shouldn't be able to REVOKE role > memberships unless they can become the grantor. Yes, role membership still equating to “being” that role still holds with this, even though I didn’t say so explicitly. > But I think maybe it should even be more general than that and > apply to all sorts of grants, rather than just roles and role > memberships: roles shouldn't be able to REVOKE any granted permission > unless they can become the grantor. Right, this was covered towards the end of my email, though again evidently not clearly enough, sorry about that. > For example, if bob grants SELECT on one of his tables to alice, he > should be able to revoke the grant, too. But if the superuser performs > the grant, why should bob be able to revoke it? The superuser has > spoken, and bob shouldn't get to interfere ... unless of course he's > also a superuser. Mostly agreed, except I’d exclude the explicit “superuser” flag bit and just say that if r1 granted the right, r2 shouldn’t be the one allowed to revoke it unless r2 happens to also be a member of r1. Thanks, Stephen
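As a concrete sketch of that rule (illustrative only; this is the proposed behavior, not what PostgreSQL enforces today):

    -- alice performs a grant, so alice is recorded as the grantor:
    SET ROLE alice;
    GRANT SELECT ON foo TO bob;
    RESET ROLE;

    -- Under the proposal, charlie may run this only if charlie can
    -- become alice, e.g. after a prior GRANT alice TO charlie:
    SET ROLE charlie;
    REVOKE SELECT ON foo FROM bob;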
Re: Role Self-Administration
On Tue, Oct 5, 2021 at 12:38 PM Mark Dilger wrote: > Additionally, role "alice" might not exist anymore, which would leave the > privilege irrevocable. I thought that surely this couldn't be right, but apparently we have absolutely no problem with leaving the "grantor" column in pg_auth_members as a dangling reference to a pg_authid role that no longer exists:

rhaas=# select * from pg_auth_members where grantor not in (select oid from pg_authid);
 roleid | member | grantor | admin_option
--------+--------+---------+--------------
   3373 |  16412 |   16410 | f
(1 row)

Yikes. We'd certainly have to do something about that if we want to use the grantor field for anything security-sensitive, since otherwise hilarity would ensue if that OID got recycled for a new role at any later point in time.

This seems weirdly inconsistent with what we do in other cases:

rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# grant select on table foo to alice with grant option;
GRANT
rhaas=# \c rhaas alice
You are now connected to database "rhaas" as user "alice".
rhaas=> grant select on table foo to bob;
GRANT
rhaas=> \c - rhaas
You are now connected to database "rhaas" as user "rhaas".
rhaas=# drop role alice;
ERROR:  role "alice" cannot be dropped because some objects depend on it
DETAIL:  privileges for table foo
rhaas=#

Here, because the ACL on table foo records alice as a grantor, alice cannot be dropped. But when alice is the grantor of a role, the same rule doesn't apply. I think the behavior shown in this example, where alice can't be dropped, is the right behavior, and the behavior for roles is just plain broken. -- Robert Haas EDB: http://www.enterprisedb.com
Re: using an end-of-recovery record in all cases
On Tue, Oct 5, 2021 at 12:41 PM Amul Sul wrote: > No, InRecovery flag get cleared before this point. I think, we can use > lastReplayedEndRecPtr what you have suggested in other thread. Hmm, right, that makes sense. Perhaps I should start remembering what I said in my own emails. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Role Self-Administration
Greetings, On Tue, Oct 5, 2021 at 12:38 Mark Dilger wrote: > > > > On Oct 5, 2021, at 9:23 AM, Robert Haas wrote: > > > >> - Disallow roles from being able to REVOKE role membership that they > >> didn't GRANT in the first place. > > > > I think that's not quite the right test. For example, if alice and bob > > are superusers and alice grants pg_monitor to doug, bob should be able > > to revoke that grant even though he is not alice. > > Additionally, role "alice" might not exist anymore, which would leave the > privilege irrevocable. Do we actually allow that case to happen today..? I didn’t think we did and instead there’s a dependency from the grant on to the Alice role. If that doesn’t exist today then I would think we’d need that and therefore this concern isn’t an issue.

> It's helpful to think in terms of role ownership rather than role creation:
>
>     superuser
>     +---> alice
>     |     +---> charlie
>     |           +---> diane
>     +---> bob
>
> It makes sense that alice can take ownership of diane and drop charlie,
> but not that bob can do so. Nor should charlie be able to transfer
> ownership of diane to alice. Nor should charlie be able to drop himself.

I dislike moving away from the ADMIN OPTION when it comes to roles as it puts us outside of the SQL standard. Having the ADMIN OPTION for a role seems, at least to me, to basically mean the things you’re suggesting “ownership” to mean- so why have two different things, especially when one doesn’t exist as a concept in the standard..? I agree that Charlie shouldn’t be able to drop themselves in general, but I don’t think we need an “ownership” concept for that. We also prevent loops already which I think is called for in the standard already (would need to go reread and make sure though) which already prevents Charlie from granting Diane to Alice. What does the “ownership” concept actually buy us then? Thanks, Stephen
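For reference, the existing SQL-standard mechanism referred to above looks like this:

    GRANT alice TO charlie WITH ADMIN OPTION;

    -- charlie can now administer membership in alice:
    SET ROLE charlie;
    GRANT alice TO diane;
    REVOKE alice FROM diane;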
Re: Role Self-Administration
Greetings, On Tue, Oct 5, 2021 at 13:09 Robert Haas wrote: > On Tue, Oct 5, 2021 at 12:38 PM Mark Dilger > wrote: > > Additionally, role "alice" might not exist anymore, which would leave > the privilege irrevocable. > > I thought that surely this couldn't be right, but apparently we have > absolutely no problem with leaving the "grantor" column in pg_auth_members > as a dangling reference to a pg_authid role that no longer exists:
>
> rhaas=# select * from pg_auth_members where grantor not in (select oid from pg_authid);
>  roleid | member | grantor | admin_option
> --------+--------+---------+--------------
>    3373 |  16412 |   16410 | f
> (1 row)
>
> Yikes. We'd certainly have to do something about that if we want to > use the grantor field for anything security-sensitive, since otherwise > hilarity would ensue if that OID got recycled for a new role at any > later point in time.

Yeah, ew. We should just fix this.

> This seems weirdly inconsistent with what we do in other cases: > > rhaas=# create table foo (a int, b text); > CREATE TABLE > rhaas=# grant select on table foo to alice with grant option; > GRANT > rhaas=# \c rhaas alice > You are now connected to database "rhaas" as user "alice". > rhaas=> grant select on table foo to bob; > GRANT > rhaas=> \c - rhaas > You are now connected to database "rhaas" as user "rhaas". > rhaas=# drop role alice; > ERROR: role "alice" cannot be dropped because some objects depend on it > DETAIL: privileges for table foo > rhaas=# > > Here, because the ACL on table foo records alice as a grantor, alice > cannot be dropped. But when alice is the grantor of a role, the same > rule doesn't apply. I think the behavior shown in this example, where > alice can't be dropped, is the right behavior, and the behavior for > roles is just plain broken.

Agreed.

Thanks, Stephen
Re: Role Self-Administration
> On Oct 5, 2021, at 10:14 AM, Stephen Frost wrote: > > What does the “ownership” concept actually buy us then? DROP ... CASCADE. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: A problem about partitionwise join
On Wed, Jul 21, 2021 at 04:44:53PM +0800, Richard Guo wrote: > On Fri, Nov 27, 2020 at 8:05 PM Ashutosh Bapat > wrote: > > > > > In the example you gave earlier, the equi join on partition key was > > there but it was replaced by individual constant assignment clauses. > > So if we keep the original restrictclause in there with a new flag > > indicating that it's redundant, have_partkey_equi_join will still be > > able to use it without much change. Depending upon where all we need > > to use avoid restrictclauses with the redundant flag, this might be an > > easier approach. However, with Tom's idea partition-wise join may be > > used even when there is no equi-join between partition keys but there > > are clauses like pk = const for all tables involved and const is the > > same for all such tables. > > > > Correct. So with Tom's idea partition-wise join can cope with clauses > such as 'foo.k1 = bar.k1 and foo.k2 = 16 and bar.k2 = 16'. > > > > > > In the spirit of small improvement made to the performance of > > have_partkey_equi_join(), pk_has_clause should be renamed as > > pk_known_equal and pks_known_equal as num_equal_pks. > > > > Thanks for the suggestion. Will do that in the new version of patch. > Hi Richard, We are marking this CF entry as "Returned with Feedback", which means you are encouraged to send a new patch (and create a new entry for a future CF for it) with the suggested changes. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: Role Self-Administration
Greetings, On Tue, Oct 5, 2021 at 13:17 Mark Dilger wrote: > > On Oct 5, 2021, at 10:14 AM, Stephen Frost wrote: > > > > What does the “ownership” concept actually buy us then? > > DROP ... CASCADE

I’m not convinced that we need to invent the concept of ownership in order to find a sensible way to make this work- though it would be helpful to first get everyone’s idea of just what *would* this command do if run on a role who “owns” or has “admin rights” of another role?

Thanks, Stephen
Re: BUG #17212: pg_amcheck fails on checking temporary relations
On Tue, Oct 5, 2021 at 10:03 AM Mark Dilger wrote: > I took no offense. Actually, I owe you a thank-you for having put so much > effort into debating the behavior with me. I think the patch to fix bug > #17212 will be better for it. Glad that you think so. And, thanks for working on the issue so promptly. This was a question of fundamental definitions. Those are often very tricky. -- Peter Geoghegan
Re: How to retain lesser paths at add_path()?
On Sat, Mar 06, 2021 at 06:50:13PM +0900, Kohei KaiGai wrote: > > On the other hand, I'm uncertain whether the pfree(new_path) at the tail > of add_path makes sense on the modern hardware, because they allow to > recycle just small amount of memory, then entire memory consumption > by the optimizer shall be released by MemoryContext mechanism. > If add_path does not release path-node, the above portion is much simpler. > Hi Kaigai-san, Do you have an updated patch? Please feel free to resubmit to next CF. Current CF entry has been marked as RwF. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: storing an explicit nonce
On Mon, Oct 4, 2021 at 10:00 PM Stephen Frost wrote: > I do want to point out, as I think I did when we discussed this but want > to be sure it's also captured here- I don't think that temporary file > access should be forced to be block-oriented when it's naturally (in > very many cases) sequential. To that point, I'm thinking that we need a > temp file access API through which various systems work that's > sequential and therefore relatively similar to the existing glibc, et > al, APIs, but by going through our own internal API (which more > consistently works with the glibc APIs and provides better error > reporting in the event of issues, etc) we can then extend it to work as > an encrypted stream instead. Regarding this, would it use block-oriented access on the backend? I agree that we need a better API layer through which all filesystem access is routed. One of the notable weaknesses of the Cybertec patch is that it has too large a code footprint, -- Robert Haas EDB: http://www.enterprisedb.com
Re: Polyphase merge is obsolete
On Wed, Sep 15, 2021 at 5:35 PM Heikki Linnakangas wrote: > Thanks, here's another rebase. > > - Heikki I've had a chance to review and test out the v5 patches. 0001 is a useful simplification. Nothing in 0002 stood out as needing comment. I've done some performance testing of master versus both patches applied. The full results and test script are attached, but I'll give a summary here. A variety of value distributions were tested, with work_mem from 1MB to 16MB, plus 2GB which will not use external sort at all. I settled on 2 million records for the sort, to have something large enough to work with but also keep the test time reasonable. That works out to about 130MB on disk. We have recent improvements to datum sort, so I used both single values and all values in the SELECT list. The system was a Westmere-era Xeon with gcc 4.8. pg_prewarm was run on the input tables. The raw measurements were reduced to the minimum of five runs. I can confirm that sort performance is improved with small values of work_mem. That was not the motivating reason for the patch, but it's a nice bonus. Even as high as 16MB work_mem, it's possible some of the 4-6% differences represent real improvement and not just noise or binary effects, but it's much more convincing at 4MB and below, with non-datum integer sorts 25-30% faster at 1MB work_mem. The nominal regressions seem within the noise level, with one exception that only showed up in one set of measurements (-10.89% in the spreadsheet). I'm not sure what to make of that since it only happens in one combination of factors and nowhere else. On Sat, Sep 11, 2021 at 4:28 AM Zhihong Yu wrote: > + * Before PostgreSQL 14, we used the polyphase merge algorithm (Knuth's > + * Algorithm 5.4.2D), > > I think the above 'Before PostgreSQL 14' should be 'Before PostgreSQL 15' now that PostgreSQL 14 has been released. > > +static int64 > +merge_read_buffer_size(int64 avail_mem, int nInputTapes, int nInputRuns, > + int maxOutputTapes) > > For memory to allocate, I think uint64 can be used (instead of int64). int64 is used elsewhere in this file, and I see no reason to do otherwise. Aside from s/14/15/ for the version as noted above, I've marked it Ready for Committer. 
-- John Naylor EDB: http://www.enterprisedb.com

[Attachment: sort benchmark shell script. It creates int_test and txt_test tables of $ROWS rows, loads several value distributions (random, sorted, almost sorted, reversed, organ pipe, 0/1), prewarms and analyzes them, then times ORDER BY queries at work_mem settings of 1MB, 2MB, 4MB, 8MB, 16MB and 2GB, appending EXPLAIN output to explains.log and timings to results.csv.]

kway-merge-test-1.ods Description: application/vnd.oasis.opendocument.spreadsheet
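The shape of the timed queries is still recoverable from the script; reconstructed (using the 2 million row count mentioned above):

    SET work_mem = '1MB';   -- then 2MB, 4MB, 8MB, 16MB, 2GB
    SELECT a FROM int_test ORDER BY 1 OFFSET 2000000;  -- datum sort
    SELECT * FROM int_test ORDER BY 1 OFFSET 2000000;
    SELECT a FROM txt_test ORDER BY 1 OFFSET 2000000;
    SELECT * FROM txt_test ORDER BY 1 OFFSET 2000000;

The OFFSET equal to the row count discards the output, so roughly only the sort itself is measured.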
can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
Hi, While working on one of the internal features, I figured out that we don't have a subscription TAP tests option in the "vcregress" tool for MSVC builds. Is there any specific reason that we didn't add a "vcregress subscriptioncheck" option similar to "vcregress recoverycheck"? It looks like one can run these with the "vcregress taptest" option and the PROVE_FLAGS environment variable (I haven't tried it myself), but having subscriptioncheck makes life easier. Here's a small patch for that. Thoughts? Regards, Bharath Rupireddy. v1-0001-add-subscriptioncheck-option-for-vcregress-for-ms.patch Description: Binary data
Re: storing an explicit nonce
On Tue, Oct 5, 2021 at 1:24 PM Robert Haas wrote: > On Mon, Oct 4, 2021 at 10:00 PM Stephen Frost wrote: > > I do want to point out, as I think I did when we discussed this but want > > to be sure it's also captured here- I don't think that temporary file > > access should be forced to be block-oriented when it's naturally (in > > very many cases) sequential. To that point, I'm thinking that we need a > > temp file access API through which various systems work that's > > sequential and therefore relatively similar to the existing glibc, et > > al, APIs, but by going through our own internal API (which more > > consistently works with the glibc APIs and provides better error > > reporting in the event of issues, etc) we can then extend it to work as > > an encrypted stream instead. > > Regarding this, would it use block-oriented access on the backend? > > I agree that we need a better API layer through which all filesystem > access is routed. One of the notable weaknesses of the Cybertec patch > is that it has too large a code footprint, (sent too soon) ...precisely because PostgreSQL doesn't have such a layer. But I think ultimately we do want to encrypt and decrypt in blocks, so if we create such a layer, it should expose byte-oriented APIs but combine the actual I/Os somehow. That's also good for cutting down the number of system calls, which is a benefit unto itself. -- Robert Haas EDB: http://www.enterprisedb.com
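As a rough illustration of the kind of layer meant here (all names invented; a user-space sketch, not code from any patch), a byte-oriented writer can buffer into fixed-size blocks so that both the eventual encryption and the actual I/O happen block-at-a-time:

    #include <stdio.h>
    #include <string.h>

    #define STREAM_BLCKSZ 8192

    typedef struct ByteStream
    {
        FILE   *fp;      /* underlying file */
        size_t  used;    /* bytes buffered so far */
        char    buf[STREAM_BLCKSZ];
    } ByteStream;

    static void
    stream_flush_block(ByteStream *bs)
    {
        if (bs->used == 0)
            return;
        /* a block cipher (e.g. AES-XTS) would transform bs->buf here */
        fwrite(bs->buf, 1, bs->used, bs->fp);
        bs->used = 0;
    }

    /* Callers write arbitrary byte ranges; I/O happens per full block. */
    static void
    stream_write(ByteStream *bs, const void *data, size_t len)
    {
        const char *p = data;

        while (len > 0)
        {
            size_t  n = STREAM_BLCKSZ - bs->used;

            if (n > len)
                n = len;
            memcpy(bs->buf + bs->used, p, n);
            bs->used += n;
            p += n;
            len -= n;
            if (bs->used == STREAM_BLCKSZ)
                stream_flush_block(bs);
        }
    }

A real implementation would also need read-side buffering and care with the final partial block, but the point is that callers keep a sequential byte interface while the layer below sees only whole blocks.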
Re: RfC entries in CF 2021-09
On Mon, Oct 04, 2021 at 02:08:58PM -0500, Jaime Casanova wrote: > Hi, > > Here's the list mentioned in ${SUBJECT}, please can the committers > mention what they want to do with those? > To move forward I have moved all RfC entries to next CF -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: style for typedef of function that will be pointed to
Chapman Flack writes: > From everything I've seen, the PostgreSQL style seems to be to include > the * in a typedef for a function type to which pointers will be held: > typedef void (*Furbinator)(char *furbee); Yup. > An alternative I've sometimes used elsewhere is to typedef the function > type itself, and use the * when declaring a pointer to it: > typedef void Furbinator(char *furbee); Is that legal C? I doubt that it was before C99 or so. As noted in the Ghostscript docs you came across, it certainly wouldn't have been portable back in the day. > So what I'm curious about is: is there a story to how PG settled on > the style it uses? See above. regards, tom lane
Re: storing an explicit nonce
Robert Haas wrote: > On Tue, Oct 5, 2021 at 1:24 PM Robert Haas wrote: > > On Mon, Oct 4, 2021 at 10:00 PM Stephen Frost wrote: > > > I do want to point out, as I think I did when we discussed this but want > > > to be sure it's also captured here- I don't think that temporary file > > > access should be forced to be block-oriented when it's naturally (in > > > very many cases) sequential. To that point, I'm thinking that we need a > > > temp file access API through which various systems work that's > > > sequential and therefore relatively similar to the existing glibc, et > > > al, APIs, but by going through our own internal API (which more > > > consistently works with the glibc APIs and provides better error > > > reporting in the event of issues, etc) we can then extend it to work as > > > an encrypted stream instead. > > > > Regarding this, would it use block-oriented access on the backend? > > > > I agree that we need a better API layer through which all filesystem > > access is routed. One of the notable weaknesses of the Cybertec patch > > is that it has too large a code footprint, > > (sent too soon) > > ...precisely because PostgreSQL doesn't have such a layer. I'm just trying to make our changes to buffile.c less invasive. Or do you mean that this module should be reworked regardless of the encryption? > But I think ultimately we do want to encrypt and decrypt in blocks, so > if we create such a layer, it should expose byte-oriented APIs but > combine the actual I/Os somehow. That's also good for cutting down the > number of system calls, which is a benefit unto itself. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: style for typedef of function that will be pointed to
On 10/05/21 13:47, Tom Lane wrote: >> An alternative I've sometimes used elsewhere is to typedef the function >> type itself, and use the * when declaring a pointer to it: >> typedef void Furbinator(char *furbee); > > Is that legal C? I doubt that it was before C99 or so. As noted > in the Ghostscript docs you came across, it certainly wouldn't have > been portable back in the day. It compiles silently for me with gcc --std=c89 -Wpedantic I think that's the oldest standard I can ask gcc about. Per the manpage, 'c89' is ISO C90 without its amendment 1, and without any gnuisms. Regards, -Chap
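Putting the thread's example together, the form in question is essentially this single compilation unit (which is what builds cleanly under -std=c89 -Wpedantic per the above):

    typedef void Furbinator(char *furbee);

    static Furbinator _furbinator0;     /* prototype via the typedef */

    static void
    _furbinator0(char *furbee)
    {
        furbee[0] = '\0';
    }

    static Furbinator *pointer = _furbinator0;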
Re: Next Steps with Hash Indexes
On 10/5/21 18:28, Simon Riggs wrote: On Tue, 5 Oct 2021 at 12:24, Dilip Kumar wrote: On Tue, Oct 5, 2021 at 4:08 PM Simon Riggs wrote: On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote: On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar wrote: On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro wrote: And to get the multi-column hash index selected, we may set enable_hashjoin =off, to avoid any condition become join condition, saw similar behaviors in other DBs as well... This may be related to Tom's point that, if some of the quals are removed due to optimization or converted to join quals, then now, even if the user has given qual on all the key columns the index scan will not be selected because we will be forcing that the hash index can only be selected if it has quals on all the key attributes? I don't think suggesting enable_hashjoin =off is a solution, Yeah, this doesn't sound like a good idea. How about instead try to explore the idea where the hash (bucket assignment and search) will be based on the first index key and the other columns will be stored as payload? I think this might pose some difficulty in the consecutive patch to enable a unique index because it will increase the chance of traversing more buckets for uniqueness checks. If we see such problems, then I have another idea to minimize the number of buckets that we need to lock during uniqueness check which is by lock chaining as is used during hash bucket clean up where at a time we don't need to lock more than two buckets at a time. I have presented a simple, almost trivial, patch to allow multi-col hash indexes. It hashes the first column only, which can be a downside in *some* cases. If that is clearly documented, it would not cause many issues, IMHO. However, it does not have any optimization issues or complexities, which is surely a very good thing. Trying to involve *all* columns in the hash index is a secondary optimization. It requires subtle changes in optimizer code, as Tom points out. It also needs fine tuning to make the all-column approach beneficial for the additional cases without losing against what the "first column" approach gives. I did consider both approaches and after this discussion I am still in favour of committing the very simple "first column" approach to multi-col hash indexes now. But what about the other approach suggested by Tom, basically we hash only based on the first column for identifying the bucket, but we also store the hash value for other columns? With that, we don't need changes in the optimizer and we can also avoid a lot of disk fetches because after finding the bucket we can match the secondary columns before fetching the disk tuple. I agree, one downside with this approach is we will increase the index size. Identifying the bucket is the main part of a hash index's work, so that part would be identical. Once we have identified the bucket, we sort the bucket page by hash, so having an all-col hash would help de-duplicate multi-col hash collisions, but not enough to be worth it, IMHO, given that storing an extra 4 bytes per index tuple is a significant size increase which would cause extra overflow pages etc.. The same thought applies to 8-byte hashes. IMO it'd be nice to show some numbers to support the claims that storing the extra hashes and/or 8B hashes is not worth it ... I'm sure there are cases where it'd be a net loss, but imagine for example a case when the first column has a lot of duplicate values. 
Which is not all that unlikely - duplicates seem like one of the natural reasons why people want multi-column hash indexes. And those duplicates are quite expensive, due to having to access the heap. Being able to eliminate those extra accesses cheaply might be a clear win, even if it makes the index a bit larger (shorter hashes might be enough).

> IMHO, multi-column hash collisions are a secondary issue, given that
> we can already select the column order for an index and hash indexes
> would only be used by explicit user choice. If there are some minor
> sub-optimal aspects of using hash indexes, then btree was already the
> default and a great choice for many cases.

But we can't select arbitrary column order, because the first column is used to select the bucket. Which means it's also about what columns are used by the queries. If the query is not using the first column, it can't use the index. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Hi Stefan, On 2021-Oct-03, Stefan Keller wrote: > Just for my understanding - and perhaps as input for the documentation of > this: > Are Foreign Key Arrays a means to implement "Generic Foreign Keys" as > in Oracle [1] and Django [2], and of "Polymorphic Associations" as > they call this in Ruby on Rails? No -- at least as far as I was able to understand the pages you linked to. It's intended for array elements of one column to reference values in a scalar column. These are always specific columns, not "generic" or "polymorphic" (which I understand to mean one of several possible columns). Thanks, -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: .ready and .done files considered harmful
On 9/27/21, 11:06 AM, "Bossart, Nathan" wrote: > On 9/24/21, 9:29 AM, "Robert Haas" wrote: >> So what I am inclined to do is commit >> v1-0001-Improve-performance-of-pgarch_readyXlog-with-many.patch. >> However, v6-0001-Do-fewer-directory-scans-of-archive_status.patch has >> perhaps evolved a bit more than the other one, so I thought I should >> first ask whether any of those changes have influenced your thinking >> about the batching approach and whether you want to make any updates >> to that patch first. I don't really see that this is needed, but I >> might be missing something. > > Besides sprucing up the comments a bit, I don't think there is > anything that needs to be changed. The only other thing I considered > was getting rid of the arch_files array. Instead, I would swap the > comparator function the heap uses with a reverse one, rebuild the > heap, and then have pgarch_readyXlog() return files via > binaryheap_remove_first(). However, this seemed like a bit more > complexity than necessary. > > Attached is a new version of the patch with some expanded comments. I just wanted to gently bump this thread in case there is any additional feedback. I should have some time to work on it this week. Also, it's looking more and more like this patch will nicely assist the batching/loadable backup module stuff [0]. Nathan [0] https://postgr.es/m/E9035E94-EC76-436E-B6C9-1C03FBD8EF54%40amazon.com
Windows crash / abort handling
Hi, As threatened in [1]... For CI, originally in the AIO project but now more generally, I wanted to get windows backtraces as part of CI. I also was confused why visual studio's "just in time debugging" (i.e. a window popping up offering to debug a process when it crashes) didn't work with postgres. My first attempt was to try to use the existing crashdump stuff in pgwin32_install_crashdump_handler(). That's not really quite what I want, because it only handles postmaster rather than any binary, but I thought it'd be a good start. But outside of toy situations it didn't work for me. A bunch of debugging later I figured out that the reason neither the SetUnhandledExceptionFilter() nor JIT debugging works is that the SEM_NOGPFAULTERRORBOX in the SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); we do in startup_hacks() prevents the paths dealing with crashes from being reached. The SEM_NOGPFAULTERRORBOX hails from: commit 27bff7502f04ee01237ed3f5a997748ae43d3a81 Author: Bruce Momjian Date: 2006-06-12 16:17:20 + Prevent Win32 from displaying a popup box on backend crash. Instead let the postmaster deal with it. Magnus Hagander I actually see error popups despite SEM_NOGPFAULTERRORBOX, at least for paths reaching abort() (and thus our assertions). The reason for abort() error boxes not being suppressed appears to be that in debug mode a separate facility is responsible for that: [2], [3] "The default behavior is to print the message. _CALL_REPORTFAULT, if set, specifies that a Watson crash dump is generated and reported when abort is called. By default, crash dump reporting is enabled in non-DEBUG builds." We apparently need _set_abort_behavior(_CALL_REPORTFAULT) to have abort() behave the same between debug and release builds. [4] To prevent the error popups we appear to at least need to call _CrtSetReportMode(). The docs say: If you do not call _CrtSetReportMode to define the output destination of messages, then the following defaults are in effect: Assertion failures and errors are directed to a debug message window. We can configure it so that that stuff goes to stderr, by calling _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG); _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR); (and the same for _CRT_ERROR and perhaps _CRT_WARNING) which removes the default _CRTDBG_MODE_WNDW. It's possible that we'd need to do more than this, but this was sufficient to get crash reports for segfaults and abort() in both assert and release builds, without seeing an error popup. To actually get the crash reports I ended up doing the following on the OS level [5]: Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name 'Debugger' -Value '\"C:\Windows Kits\10\Debuggers\x64\cdb.exe\" -p %ld -e %ld -g -kqm -c \".lines -e; .symfix+ ;.logappend c:\cirrus\crashlog.txt ; !peb; ~*kP ; .logclose ; q \"' ; ` New-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name 'Auto' -Value 1 -PropertyType DWord ; ` Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug' -Name Debugger; ` This requires 'cdb' to be present, which is included in the Windows 10 SDK (or other OS versions, it doesn't appear to have changed much). Whenever there's an unhandled crash, cdb.exe is invoked with the parameters above, which appends the crash report to crashlog.txt. 
Alternatively we can generate "minidumps" [6], but that doesn't appear to be more helpful for CI purposes at least - all we'd do is to create a backtrace using the same tool. But it might be helpful for local development, to e.g. analyze crashes in more detail. The above ends up dumping all crashes into a single file, but that can probably be improved. But cdb is so gnarly that I wanted to stop looking once I got this far... Andrew, I wonder if something like this could make sense for windows BF animals? Greetings, Andres Freund [1] https://postgr.es/m/20211001222752.wrz7erzh4cajvgp6%40alap3.anarazel.de [2] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/crtsetreportmode?view=msvc-160 [3] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/set-abort-behavior?view=msvc-160 [4] If anybody can explain to me what the two different parameters to _set_abort_behavior() do, I'd be all ears [5] https://docs.microsoft.com/en-us/windows/win32/debug/configuring-automatic-debugging [6] https://docs.microsoft.com/en-us/windows/win32/wer/wer-settings
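Pulling the pieces mentioned above together, the startup_hacks() additions would presumably look something like this (a sketch only; Windows-specific, and the _Crt* calls only have an effect with the debug CRT):

    #include <crtdbg.h>
    #include <stdlib.h>

    static void
    setup_win32_crash_reporting(void)
    {
        /* Route CRT assertions, errors and warnings to stderr
         * instead of the default popup window (_CRTDBG_MODE_WNDW). */
        _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
        _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
        _CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
        _CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
        _CrtSetReportMode(_CRT_WARNING, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
        _CrtSetReportFile(_CRT_WARNING, _CRTDBG_FILE_STDERR);

        /* Make abort() report a fault in debug builds too, so the
         * AeDebug post-mortem debugger configured above gets invoked. */
        _set_abort_behavior(_CALL_REPORTFAULT, _CALL_REPORTFAULT);
    }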
Re: Role Self-Administration
> On Oct 5, 2021, at 10:20 AM, Stephen Frost wrote: > > Greetings, > > On Tue, Oct 5, 2021 at 13:17 Mark Dilger wrote: > > On Oct 5, 2021, at 10:14 AM, Stephen Frost wrote: > > > > What does the “ownership” concept actually buy us then? > > DROP ... CASCADE > > I’m not convinced that we need to invent the concept of ownership in order to > find a sensible way to make this work- though it would be helpful to first > get everyone’s idea of just what *would* this command do if run on a role who > “owns” or has “admin rights” of another role? Ok, I'll start. Here is how I envision it: If roles have owners, then DROP ROLE bob CASCADE drops bob, bob's objects, roles owned by bob, their objects and any roles they own, recursively. Roles which bob merely has admin rights on are unaffected, excepting that they are administered by one fewer roles once bob is gone. This design allows you to delegate to a new role some task, and you don't have to worry what network of other roles and objects they create, because in the end you just drop the one role cascade and all that other stuff is guaranteed to be cleaned up without any leaks. If roles do not have owners, then DROP ROLE bob CASCADE drops role bob plus all objects that bob owns. It doesn't cascade to other roles because the concept of "roles that bob owns" doesn't exist. If bob created other roles, those will be left around. Objects that bob created and then transferred to these other roles are also left around. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
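In other words (hypothetical syntax throughout; DROP ROLE has no CASCADE today, and "ownership" is the proposed concept, not existing behavior):

    CREATE ROLE bob CREATEROLE;
    SET ROLE bob;
    CREATE ROLE carol;              -- under the proposal, bob owns carol
    CREATE TABLE bobs_table (a int);
    RESET ROLE;

    -- Proposed: drops bob, bobs_table, carol, and anything carol owns;
    -- roles bob merely has admin rights on are untouched.
    DROP ROLE bob CASCADE;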
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
On 10/5/21 1:25 PM, Bharath Rupireddy wrote: > Hi, > > While working one of the internal features, I figured out that we > don't have subscription TAP tests option for "vcregress" tool for msvc > builds. Is there any specific reason that we didn't add "vcregress > subscriptioncheck" option similar to "vcregress recoverycheck"? It > looks like one can run with "vcregress taptest" option and PROVE_FLAGS > environment variable(I haven't tried it myself), but having > subscriptioncheck makes life easier. > > Here's a small patch for that. Thoughts? > I would actually prefer to reduce the number of special things in vcregress.pl rather than add more. We should be able to add a new set of TAP tests somewhere without having to do anything to vcregress.pl. That's more or less why I added the taptest option in the first place. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: TAP test for recovery_end_command
Hi, On 2021-09-14 10:34:09 +0530, Amul Sul wrote: > +# recovery_end_command_file executed only on recovery end which can happen on > +# promotion. > +$standby3->promote; > +ok(-f "$recovery_end_command_file", > + 'recovery_end_command executed after promotion'); It'd be good to test what happens when recovery_end_command fails... Greetings, Andres Freund
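A sketch of such a test (untested; the node name, the 'false' command and the log pattern are assumptions, and whether the failure ends up as a WARNING or something stronger is exactly what the test would pin down):

    $standby4->append_conf('postgresql.conf',
        "recovery_end_command = 'false'");
    $standby4->start;
    $standby4->promote;
    my $log = slurp_file($standby4->logfile);
    like($log, qr/recovery_end_command/,
        'failing recovery_end_command is reported in the log');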
Re: storing an explicit nonce
On Tue, Sep 28, 2021 at 12:30:02PM +0300, Ants Aasma wrote: > On Mon, 27 Sept 2021 at 23:34, Bruce Momjian wrote: > We are still working on our TDE patch. Right now the focus is on refactoring > temporary file access to make the TDE patch itself smaller. Reconsidering > encryption mode choices given concerns expressed is next. Currently a viable > option seems to be AES-XTS with LSN added into the IV. XTS doesn't have an > issue with predictable IV and isn't totally broken in case of IV reuse. Uh, yes, AES-XTS has benefits, but since it is a block cipher, previous 16-byte blocks affect later blocks, meaning that hint bit changes would also affect later blocks. I think this means we would need to write WAL full page images for hint bit changes to avoid torn pages. Right now hint bit (single bit) changes can be lost without causing torn pages. This was another of the advantages of using a stream cipher like CTR. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
Hi, On 2021-10-05 16:03:53 -0400, Andrew Dunstan wrote: > I would actually prefer to reduce the number of special things in > vcregress.pl rather than add more. We should be able to add a new set of > TAP tests somewhere without having to do anything to vcregress.pl. > That's more or less why I added the taptest option in the first place. My problem with that is that that means there's no convenient way to discover what one needs to do to run all tests. Perhaps we could have one all-taptest target or such? Greetings, Andres Freund
Re: storing an explicit nonce
On Tue, Oct 5, 2021 at 04:29:25PM -0400, Bruce Momjian wrote: > On Tue, Sep 28, 2021 at 12:30:02PM +0300, Ants Aasma wrote: > > On Mon, 27 Sept 2021 at 23:34, Bruce Momjian wrote: > > We are still working on our TDE patch. Right now the focus is on refactoring > > temporary file access to make the TDE patch itself smaller. Reconsidering > > encryption mode choices given concerns expressed is next. Currently a viable > > option seems to be AES-XTS with LSN added into the IV. XTS doesn't have an > > issue with predictable IV and isn't totally broken in case of IV reuse. > > Uh, yes, AES-XTS has benefits, but since it is a block cipher, previous > 16-byte blocks affect later blocks, meaning that hint bit changes would > also affect later blocks. I think this means we would need to write WAL > full page images for hint bit changes to avoid torn pages. Right now > hint bit (single bit) changes can be lost without causing torn pages. > This was another of the advantages of using a stream cipher like CTR. Another problem caused by block mode ciphers is that to use the LSN as part of the nonce, the LSN must not be encrypted, but you then have to find a 16-byte block in the page that you don't need to encrypt. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Polyphase merge is obsolete
On Tue, Oct 5, 2021 at 10:25 AM John Naylor wrote: > int64 is used elsewhere in this file, and I see now reason to do otherwise. Right. The point of using int64 in tuplesort.c is that the values may become negative in certain edge-cases. The whole LACKMEM() concept that tuplesort.c uses to enforce work_mem limits sometimes allows the implementation to use slightly more memory than theoretically allowable. That's how we get negative values. -- Peter Geoghegan
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
On Wed, Sep 8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote: > Hi, > > While working on one of the internal features, we found that it is a > bit difficult to run pg_waldump for a normal user to know WAL info and > stats of a running postgres database instance in the cloud. Many a > times users or DBAs or developers would want to get and analyze > following: Uh, are we going to implement everything that is only available at the command line as an extension just for people who are using managed cloud services where the command line is not available and the cloud provider has not made that information accessible? Seems like this might lead to a lot of duplicated effort. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
On 10/5/21 4:38 PM, Andres Freund wrote: > Hi, > > On 2021-10-05 16:03:53 -0400, Andrew Dunstan wrote: >> I would actually prefer to reduce the number of special things in >> vcregress.pl rather than add more. We should be able to add a new set of >> TAP tests somewhere without having to do anything to vcregress.pl. >> That's more or less why I added the taptest option in the first place. > My problem with that is that that means there's no convenient way to discover > what one needs to do to run all tests. Perhaps we could have one all-taptest > target or such? > Yeah. That's a much better proposal. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
On 10/05/21 18:07, Bruce Momjian wrote: > Uh, are we going to implement everything that is only available at the > command line as an extension just for people who are using managed cloud > services One extension that runs a curated menu of command-line tools for you and returns their stdout? Regards, -Chap
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
On 10/5/21 15:07, Bruce Momjian wrote: > On Wed, Sep 8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote: >> While working one of the internal features, we found that it is a >> bit difficult to run pg_waldump for a normal user to know WAL info and >> stats of a running postgres database instance in the cloud. Many a >> times users or DBAs or developers would want to get and analyze >> following: > > Uh, are we going to implement everything that is only available at the > command line as an extension just for people who are using managed cloud > services where the command line is not available and the cloud provider > has not made that information accessible? Seems like this might lead to > a lot of duplicated effort. No. For most command line utilities, there's no reason to expose them in SQL or they already are exposed in SQL. For example, everything in pg_controldata is already available via SQL functions. Specifically exposing pg_waldump functionality in SQL could speed up finding bugs in the PG logical replication code. We found and fixed a few over this past year, but there are more lurking out there. Having the extension in core is actually the only way to avoid duplicated effort, by having shared code which the pg_waldump binary and the extension both wrap or call. -Jeremy -- http://about.me/jeremy_schneider
Re: Fix pg_log_backend_memory_contexts() 's delay
On Tue, Oct 05, 2021 at 12:16:06PM -0400, Robert Haas wrote: > Perhaps that's so, but it doesn't seem like a good reason not to make > them more responsive. Yeah, that's still some information that the user asked for. Looking at the things that have a PGPROC entry, should we worry about the main loop of the logical replication launcher? -- Michael signature.asc Description: PGP signature
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
On Tue, Oct 05, 2021 at 06:18:47PM -0400, Andrew Dunstan wrote: > On 10/5/21 4:38 PM, Andres Freund wrote: >> My problem with that is that that means there's no convenient way to discover >> what one needs to do to run all tests. Perhaps we could have one all-taptest >> target or such? >> > > Yeah. That's a much better proposal. +1. It is so easy to forget one or more targets when running this script. -- Michael signature.asc Description: PGP signature
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
On Tue, Oct 5, 2021 at 06:22:19PM -0400, Chapman Flack wrote: > On 10/05/21 18:07, Bruce Momjian wrote: > > Uh, are we going to implement everything that is only available at the > > command line as an extension just for people who are using managed cloud > > services > > One extension that runs a curated menu of command-line tools for you > and returns their stdout? Yes, that would make sense, and something the cloud service providers would write. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
On Tue, Oct 5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote: > On 10/5/21 15:07, Bruce Momjian wrote: > > On Wed, Sep 8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote: > >> While working on one of the internal features, we found that it is a > >> bit difficult to run pg_waldump for a normal user to know WAL info and > >> stats of a running postgres database instance in the cloud. Many a > >> times users or DBAs or developers would want to get and analyze > >> following: > > > > Uh, are we going to implement everything that is only available at the > > command line as an extension just for people who are using managed cloud > > services where the command line is not available and the cloud provider > > has not made that information accessible? Seems like this might lead to > > a lot of duplicated effort. > > No. For most command line utilities, there's no reason to expose them in > SQL or they already are exposed in SQL. For example, everything in > pg_controldata is already available via SQL functions. That's a good example. > Specifically exposing pg_waldump functionality in SQL could speed up > finding bugs in the PG logical replication code. We found and fixed a > few over this past year, but there are more lurking out there. Uh, why is pg_waldump more important than other command line tool information? > Having the extension in core is actually the only way to avoid > duplicated effort, by having shared code which the pg_dump binary and > the extension both wrap or call. Uh, aren't you duplicating code by having pg_waldump as a command-line tool and an extension? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
On 2021-Oct-06, Michael Paquier wrote: > On Tue, Oct 05, 2021 at 06:18:47PM -0400, Andrew Dunstan wrote: > > On 10/5/21 4:38 PM, Andres Freund wrote: > >> My problem with that is that that means there's no convenient way to > >> discover > >> what one needs to do to run all tests. Perhaps we could have one > >> all-taptest > >> target or such? > > > > Yeah. That's a much better proposal. So how about "vcregress check-world"? -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Doing what he did amounts to sticking his fingers under the hood of the implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
On Wed, Oct 6, 2021 at 6:53 AM Alvaro Herrera wrote: > > On 2021-Oct-06, Michael Paquier wrote: > > > On Tue, Oct 05, 2021 at 06:18:47PM -0400, Andrew Dunstan wrote: > > > On 10/5/21 4:38 PM, Andres Freund wrote: > > >> My problem with that is that that means there's no convenient way to > > >> discover > > >> what one needs to do to run all tests. Perhaps we could have one > > >> all-taptest > > >> target or such? > > > > > > Yeah. That's a much better proposal. > > So how about "vcregress check-world"? I was thinking the same. +1 for "vcregress check-world", which is more in sync with its peer "make check-world", rather than "vcregress all-taptest". I'm not sure whether we can also have "vcregress installcheck-world" as well. Having said that, with these new options, are we going to have only the below?

vcregress check
vcregress installcheck
vcregress check-world
vcregress installcheck-world (?)

And remove the others?

vcregress plcheck
vcregress contribcheck
vcregress modulescheck
vcregress ecpgcheck
vcregress isolationcheck
vcregress bincheck
vcregress recoverycheck
vcregress upgradecheck

Regards, Bharath Rupireddy.
Re: Fix pg_log_backend_memory_contexts() 's delay
On Wed, Oct 6, 2021 at 5:10 AM Michael Paquier wrote: > > On Tue, Oct 05, 2021 at 12:16:06PM -0400, Robert Haas wrote: > > Perhaps that's so, but it doesn't seem like a good reason not to make > > them more responsive. > > Yeah, that's still some information that the user asked for. Looking > at the things that have a PGPROC entry, should we worry about the main > loop of the logical replication launcher? IMHO, we can support all the processes for which a PGPROC entry can be obtained via either BackendPidGetProc or AuxiliaryPidGetProc, where AuxiliaryPidGetProc would cover the following processes. I'm not sure whether anyone is interested in the memory context info of auxiliary processes, though.

/*
 * We set aside some extra PGPROC structures for auxiliary processes,
 * ie things that aren't full-fledged backends but need shmem access.
 *
 * Background writer, checkpointer, WAL writer and archiver run during normal
 * operation.  Startup process and WAL receiver also consume 2 slots, but WAL
 * writer is launched only after startup has exited, so we only need 5 slots.
 */
#define NUM_AUXILIARY_PROCS		5

Regards, Bharath Rupireddy.
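For illustration, the lookup order being proposed could look like the following minimal sketch. BackendPidGetProc() and AuxiliaryPidGetProc() are the existing server routines; the wrapper itself and its WARNING wording are hypothetical, and the exact shape in the eventual patch may differ.

#include "postgres.h"
#include "storage/proc.h"
#include "storage/procarray.h"

/*
 * Sketch: resolve a PID to its PGPROC, checking regular backends first
 * and falling back to auxiliary processes, so that processes such as
 * the checkpointer or WAL writer are also covered.
 */
static PGPROC *
find_proc_by_pid(int pid)
{
	PGPROC	   *proc = BackendPidGetProc(pid);

	if (proc == NULL)
		proc = AuxiliaryPidGetProc(pid);

	if (proc == NULL)
		ereport(WARNING,
				(errmsg("PID %d is not a PostgreSQL server process", pid)));

	return proc;
}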
Re: Skipping logical replication transactions on subscriber side
On Mon, Oct 4, 2021 at 4:31 PM Amit Kapila wrote: > > I think here the main point is that does this addresses Peter's > concern for this Patch to use a separate syntax? Peter E., can you > please confirm? Do let us know if you have something else going in > your mind? > Peter's concern seemed to be that the use of a subscription option, though convenient, isn't an intuitive natural fit for providing this feature (i.e. ability to skip a transaction by xid). I tend to have that feeling about using a subscription option for this feature. I'm not sure what possible alternative syntax he had in mind and currently can't really think of a good one myself that fits the purpose. I think that the 1st and 2nd patch are useful in their own right, but couldn't this feature (i.e. the 3rd patch) be provided instead as an additional Replication Management function (see 9.27.6)? e.g. pg_replication_skip_xid Regards, Greg Nancarrow Fujitsu Australia
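To make the function-based alternative concrete, a C-level skeleton for such a function might look like the sketch below. This is purely illustrative: pg_replication_skip_xid does not exist, and how the skip-XID would be stored for the apply worker (the elided part) is exactly what the patch set would have to design.

#include "postgres.h"
#include "fmgr.h"
#include "access/transam.h"

/*
 * Hypothetical skeleton for a replication management function that
 * records a transaction ID the logical replication apply worker should
 * skip.  Argument validation is shown; the persistence mechanism is not.
 */
Datum
pg_replication_skip_xid(PG_FUNCTION_ARGS)
{
	TransactionId xid = DatumGetTransactionId(PG_GETARG_DATUM(0));

	if (!TransactionIdIsNormal(xid))
		ereport(ERROR,
				(errmsg("invalid transaction id: %u", xid)));

	/* ... store xid where the apply worker can see it (not shown) ... */

	PG_RETURN_VOID();
}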
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
On Wed, Oct 06, 2021 at 07:19:04AM +0530, Bharath Rupireddy wrote: > I was thinking the same. +1 for "vcregress check-world" which is > more in sync with its peer "make check-world" instead of "vcregress > all-taptest". I'm not sure whether we can also have "vcregress > installcheck-world" as well. check-world, if it spins up new instances for each contrib/ test, would be infinitely slower than installcheck-world. I recall that's why Andrew has been doing an installcheck for contribcheck, to minimize the load. If you run the TAP tests, perhaps you don't really care anyway, but I think that there is a case for making this set of targets run as fast as we can, if we can, when TAP is disabled. > Having said that, with these new options, are we going to have only below? > > vcregress check > vcregress installcheck > vcregress check-world > vcregress installcheck-world (?) > > And remove others? My take is that there is value for both installcheck-world and check-world, depending on whether we want to test an installed instance or not. For CIs, check-world makes things simpler perhaps? > vcregress plcheck > vcregress contribcheck > vcregress modulescheck > vcregress ecpgcheck > vcregress isolationcheck > vcregress bincheck > vcregress recoverycheck > vcregress upgradecheck I don't really see why we should do that; the code paths are the same, and the sub-routines would still be around but don't cost much in maintenance. -- Michael
Re: Added schema level support for publication.
On Tue, Oct 5, 2021 at 4:41 PM Amit Kapila wrote: > > On Tue, Oct 5, 2021 at 11:10 AM Greg Nancarrow wrote: > > > > On Tue, Oct 5, 2021 at 3:11 PM Amit Kapila wrote: > > > > > > > Code has been added to prevent a table being set (via ALTER TABLE) to > > > > UNLOGGED if it is part of a publication, but I found that I could > > > > still add all tables of a schema having a table that is UNLOGGED: > > > > > > > > postgres=# create schema sch; > > > > CREATE SCHEMA > > > > postgres=# create unlogged table sch.test(i int); > > > > CREATE TABLE > > > > postgres=# create publication pub for table sch.test; > > > > ERROR: cannot add relation "test" to publication > > > > DETAIL: Temporary and unlogged relations cannot be replicated. > > > > postgres=# create publication pub for all tables in schema sch; > > > > CREATE PUBLICATION > > > > > > > > > > What about when you use "create publication pub for all tables;"? I > > > think that also works; now, on similar lines, shouldn't the behavior of > > > "all tables in schema" publication be the same? I mean, if we want, we > > > can detect and give an error, but is it advisable to give an error if > > > there are just one or a few tables in the schema that are unlogged? > > > > > > > > .. > > Yes, I agree that ALL TABLES IN SCHEMA should behave the same as the > > ALL TABLES case. > > The problem is, shouldn't setting UNLOGGED on a table only be > > disallowed if that table was published using FOR TABLE? > > > > Right, I also think so. Let us see what Vignesh or others think on this matter. I too feel that ALL TABLES IN SCHEMA should behave the same way as the ALL TABLES case. I will keep the CREATE PUBLICATION behavior as it is, i.e., allow it even if unlogged tables are present, and change the ALTER TABLE behavior below, which currently throws an error, to succeed, keeping it similar to ALL TABLES publications: alter table sch.test3 set unlogged; ERROR: cannot change table "test3" to unlogged because it is part of a publication DETAIL: Unlogged relations cannot be replicated. Regards, Vignesh
Re: Transactions involving multiple postgres foreign servers, take 2
On 2021/10/05 10:38, Masahiko Sawada wrote: > Hi, > > On Tue, Oct 5, 2021 at 9:56 AM k.jami...@fujitsu.com wrote: > > Hi Sawada-san, I noticed that this thread and its set of patches have been > > marked with "Returned with Feedback" by yourself. I find the feature (atomic > > commit for foreign transactions) very useful and it will pave the road for > > having distributed transaction management in Postgres. Although we have not > > arrived at a consensus on which approach is best, there were significant > > reviews and major patch changes in the past 2 years. By any chance, do you > > have any plans to continue this from where you left off? > > As I could not reply to the review comments from Fujii-san for almost three > months, I don't have enough time to move this project forward, at least for now. > That's why I marked this patch as RWF. I'd like to continue working on this > project in my spare time, but I know this is not a project that can be completed > using only my spare time. If someone wants to work on this project, I'd > appreciate it and am happy to help. Probably it's time to rethink the approach. The patch introduces a foreign transaction manager into PostgreSQL core, but as far as I have reviewed the patch, its changes look overkill and too complicated. This seems to be one of the reasons why we have not been able to commit the feature even after several years. Another concern about the approach of the patch is that it needs to change a backend so that it additionally waits for replication during the commit phase before executing PREPARE TRANSACTION on the foreign servers, which would decrease commit performance even further. So I wonder if it's worth revisiting the original approach, i.e., adding the atomic commit into postgres_fdw. One disadvantage of this is that it supports atomic commit only between foreign PostgreSQL servers, not other data sources like MySQL. But I'm not sure if we really want to do atomic commit between various FDWs. Maybe supporting only postgres_fdw is enough for most users. Thoughts? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
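For context on what atomic commit in postgres_fdw means mechanically, the core of it is driving two-phase commit over every foreign connection. The heavily simplified libpq sketch below is illustrative only: the function name is made up, and error recovery plus crash-safe resolution of in-doubt prepared transactions, which are the genuinely hard parts, are omitted.

#include <stdio.h>
#include "libpq-fe.h"

/*
 * Sketch of the two-phase commit sequence: PREPARE TRANSACTION on all
 * foreign servers first; only if every prepare succeeds, COMMIT
 * PREPARED everywhere.  A real implementation must persist the GID
 * before preparing and resolve leftover prepared transactions after a
 * crash.
 */
static int
atomic_commit_foreign(PGconn **conns, int nconns, const char *gid)
{
	char		sql[128];
	PGresult   *res;
	int			i;

	for (i = 0; i < nconns; i++)
	{
		snprintf(sql, sizeof(sql), "PREPARE TRANSACTION '%s'", gid);
		res = PQexec(conns[i], sql);
		if (PQresultStatus(res) != PGRES_COMMAND_OK)
		{
			PQclear(res);
			return -1;			/* caller must ROLLBACK PREPARED the rest */
		}
		PQclear(res);
	}

	for (i = 0; i < nconns; i++)
	{
		snprintf(sql, sizeof(sql), "COMMIT PREPARED '%s'", gid);
		PQclear(PQexec(conns[i], sql));
	}
	return 0;
}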
Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
On Wed, Oct 6, 2021 at 7:52 AM Michael Paquier wrote: > My take is that there is value for both installcheck-world and > check-world, depending on if we want to test on an installed instance > or not. For CIs, check-world makes things simpler perhaps? > > > vcregress plcheck > > vcregress contribcheck > > vcregress modulescheck > > vcregress ecpgcheck > > vcregress isolationcheck > > vcregress bincheck > > vcregress recoverycheck > > vcregress upgradecheck > > I don't really see why we should do that, the code paths are the same > and the sub-routines would still be around, but don't cost much in > maintenance. Yeah, they can also be useful if someone wants to run tests selectively. I'm just thinking that the "vcregress subscriptioncheck" as proposed in my first email in this thread will also be useful (?) along with the "vcregress check-world" and "vcregress installcheck-world". Thoughts? Regards, Bharath Rupireddy.
Re: Adding CI to our tree
On Sat, Oct 2, 2021 at 11:27 AM Andres Freund wrote: > - runs check-world on FreeBSD, Linux, macOS - all using gcc Small correction: on macOS and FreeBSD it's using the vendor compiler, which is some kind of clang. BTW, on those two OSes there are some messages like this each time a submake dumps its output to the log: [03:36:16.591] fcntl(): Bad file descriptor It seems worth putting up with these compared to the alternatives: not using -j; not using -Otarget, which leaves the output of parallel tests all mashed up and unreadable (that still happens sometimes but rarely, because the submakes write() whole output chunks at infrequent intervals); or redirecting to a file, so you can't see the realtime test output on the main CI page (not so fun: you have to wait until it's finished and view it as an 'artifact'). I tried to write a patch for GNU make to fix that[1], let's see if something happens. [1] https://savannah.gnu.org/bugs/?52922
Re: Timeout failure in 019_replslot_limit.pl
On Sat, Oct 02, 2021 at 07:00:01PM -0300, Alvaro Herrera wrote: > A patch was proposed on that thread on September 22nd, can you try with > that and see if this problem still reproduces? Yes, the failure still shows up, even with a timeout set at 30s, which is the default of the patch. -- Michael
Re: Adding CI to our tree
Hi, On 2021-10-06 17:01:53 +1300, Thomas Munro wrote: > On Sat, Oct 2, 2021 at 11:27 AM Andres Freund wrote: > > - runs check-world on FreeBSD, Linux, macOS - all using gcc > > Small correction: on macOS and FreeBSD it's using the vendor compiler, > which is some kind of clang. Oh, oops. I guess that's even better ;). > I tried to write a patch for GNU make to fix that[1], let's see if something > happens. > > [1] https://savannah.gnu.org/bugs/?52922 It'd be nice to get rid of these... They're definitely confusing. Greetings, Andres Freund
Lost logs with csvlog redirected to stderr under WIN32 service
Hi all, While reviewing the code of elog.c to plug in JSON as a file-based log destination, I have found what looks like a bug in send_message_to_server_log(). If LOG_DESTINATION_CSVLOG is enabled, we would do the following to make sure that the log entry is not missed: - If redirection_done or in the logging collector, call write_csvlog() to write the CSV entry using the piped protocol, or write directly if the logging collector does the call. - If the log redirection is not available yet, we'd just call write_console() to redirect the message to stderr, which would be done if it was not already done in the stderr code block before handling CSV, to avoid duplicates. This uses a condition that matches the one based on Log_destination and whereToSendOutput. Now, in the stderr code path, we would actually do more than that: - write_pipe_chunks() for a non-syslogger process if redirection is done. - If there is no redirection, redirect to the event log when running as a service on WIN32, or simply to stderr with write_console(). So at the end, if one enables only csvlog, we would not capture any logs on WIN32 when running as a service if the redirection is not ready yet, meaning that we could lose some precious information if there is, for example, a startup failure. This choice comes from fd801f4 in 2007, which introduced csvlog as a log_destination. I think that there is a good argument for back-patching a fix, but I don't recall seeing anybody complaining about that, and I just need that for the business with JSON. I have thought about various ways to fix that, and finished with a solution where we handle csvlog first and fall back to stderr afterwards, so that there is only one code path for stderr, as of the attached. This also reduces a bit the confusion around the handling of the stderr data that gets free()'d in more code paths than really needed. Thoughts or objections? -- Michael From a7b10adbe871a2330ddde9e593480171a3ef83f2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 6 Oct 2021 14:07:20 +0900 Subject: [PATCH] Fix fallback to stderr when using only csvlog send_message_to_server_log() would force a redirection of a log entry to stderr in some cases, like the syslogger not being available yet. If this happens, csvlog falls back to stderr to log something. The code was organized so that stderr was handled before csvlog, with csvlog checking the same conditions as the stderr code path a second time, and with log handling that is buggy on Windows. Instead, move the csvlog handling to be before stderr, tracking whether it is necessary to log something to stderr afterwards. This reduces the handling of stderr to a single code path, and fixes one issue with the case of a Windows service, where stderr is not available and we would lose log entries that should have been redirected from csvlog to stderr.
---
 src/backend/utils/error/elog.c | 56 --
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 2af87ee3bd..2a88ef9e3b 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -3008,6 +3008,7 @@ static void
 send_message_to_server_log(ErrorData *edata)
 {
 	StringInfoData buf;
+	bool		fallback_to_stderr = false;
 
 	initStringInfo(&buf);
@@ -3159,8 +3160,29 @@
 	}
 #endif							/* WIN32 */
 
-	/* Write to stderr, if enabled */
-	if ((Log_destination & LOG_DESTINATION_STDERR) || whereToSendOutput == DestDebug)
+	/* Write to CSV log, if enabled */
+	if ((Log_destination & LOG_DESTINATION_CSVLOG) != 0)
+	{
+		/*
+		 * Send CSV data if it's safe to do so (syslogger doesn't need the
+		 * pipe).  If this is not possible, fallback to an entry written
+		 * to stderr.
+		 */
+		if (redirection_done || MyBackendType == B_LOGGER)
+		{
+			write_csvlog(edata);
+		}
+		else
+			fallback_to_stderr = true;
+	}
+
+	/*
+	 * Write to stderr, if enabled or if required because of a previous
+	 * limitation.
+	 */
+	if ((Log_destination & LOG_DESTINATION_STDERR) ||
+		whereToSendOutput == DestDebug ||
+		fallback_to_stderr)
 	{
 		/*
 		 * Use the chunking protocol if we know the syslogger should be
@@ -3189,34 +3211,8 @@
 	if (MyBackendType == B_LOGGER)
 		write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_STDERR);
 
-	/* Write to CSV log if enabled */
-	if (Log_destination & LOG_DESTINATION_CSVLOG)
-	{
-		if (redirection_done || MyBackendType == B_LOGGER)
-		{
-			/*
-			 * send CSV data if it's safe to do so (syslogger doesn't need the
-			 * pipe).  First get back the space in the message buffer.
-			 */
-			pfree(buf.data);
-			write_csvlog(edata);
-		}
-		else
-		{
-			/*
-			 * syslogger not up (yet), so just dump the message to stderr,
-			 * unless we already did so above.
-			 */
-			if (!(Log_destination & LOG_DESTINATION_STDERR) &&
-				whereToSendOutput != DestDebug)
-				write_c
wrapping CF 2021-09
Hi everyone, At this time I'm looking to close the CF. So, for the patches currently failing on the cfbot, I will write to their threads and mark them as RwF; all other patches will be moved to the next CF.

RwF:
- ALTER SYSTEM READ { ONLY | WRITE }
- GUC to disable cancellation of awaiting for synchronous replication
- Introduce ProcessInterrupts_hook for C extensions
- Make message at end-of-recovery less scary

-- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Improved tab completion for PostgreSQL parameters in enums
Hi, If you do tab completion in a situation like A, you will see ["on"] instead of [on]. A: "ALTER SYSTEM SET wal_compression TO " I made a patch for this problem. regards, Kosei Masumura

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 4155d81252..86087e790b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -969,7 +969,7 @@ Query_for_index_of_table \
 
 #define Query_for_enum \
 " SELECT name FROM ( "\
-"   SELECT pg_catalog.quote_ident(pg_catalog.unnest(enumvals)) AS name "\
+"   SELECT pg_catalog.unnest(enumvals) AS name "\
 "     FROM pg_catalog.pg_settings "\
 "    WHERE pg_catalog.lower(name)=pg_catalog.lower('%s') "\
 "    UNION ALL " \
Re: Improved tab completion for PostgreSQL parameters in enums
On Wed, Oct 06, 2021 at 02:24:40PM +0900, bt21masumurak wrote: > If you do tab completion in a situation like A, you will see ["on"] instead > of [on]. > > A: "ALTER SYSTEM SET wal_compression TO " > > I made a patch for this problem. This would break the completion of enum entries that require quotes to work properly for some of their values, like default_transaction_isolation: for example, "read committed" contains a space, so completing it without the quotes that quote_ident() adds would produce invalid syntax. -- Michael
Re: Added schema level support for publication.
On Tue, Oct 5, 2021 at 6:57 AM Greg Nancarrow wrote: > > On Mon, Oct 4, 2021 at 4:55 AM vignesh C wrote: > > > > Attached v36 patch has the changes for the same. > > > > I have some comments on the v36-0001 patch: > > src/backend/commands/publicationcmds.c > > (1) > GetPublicationSchemas() > > + /* Find all publications associated with the schema */ > + pubschsrel = table_open(PublicationNamespaceRelationId, AccessShareLock); > > I think the comment is not correct. It should say: > > + /* Find all schemas associated with the publication */ Modified > (2) > AlterPublicationSchemas > > I think that a comment should be added for the following lines, > something like the comment used in the similar check in > AlterPublicationTables(): > > + if (!schemaidlist && stmt->action != DEFELEM_SET) + return; Modified > (3) > CheckAlterPublication > > Minor comment fix suggested: > > BEFORE: > + * Check if relations and schemas can be in given publication and throws > AFTER: > + * Check if relations and schemas can be in a given publication and throw Modified > (4) > LockSchemaList() > > Suggest re-wording the comment to match the imperative comment style used > elsewhere in this code. > > BEFORE: > + * The schemas specified in the schema list are locked in AccessShareLock mode > AFTER: > + * Lock the schemas specified in the schema list in AccessShareLock mode Modified > > src/backend/commands/tablecmds.c > > (5) > > Code has been added to prevent a table being set (via ALTER TABLE) to > UNLOGGED if it is part of a publication, but I found that I could > still add all tables of a schema having a table that is UNLOGGED: > > postgres=# create schema sch; > CREATE SCHEMA > postgres=# create unlogged table sch.test(i int); > CREATE TABLE > postgres=# create publication pub for table sch.test; > ERROR: cannot add relation "test" to publication > DETAIL: Temporary and unlogged relations cannot be replicated. > postgres=# create publication pub for all tables in schema sch; > CREATE PUBLICATION I have changed the ALTER TABLE behavior to allow setting a published table to UNLOGGED, keeping the behavior similar to "ALL TABLES" publications. I have kept the CREATE PUBLICATION behavior as it is; it is similar to "ALL TABLES" publications, i.e., publication creation is allowed even if unlogged tables are present. These comments are handled in the v37 patch attached at [1]. [1] - https://www.postgresql.org/message-id/CALDaNm0ON%3D012jGC3oquSVVWTWXhHG0q8yOyRROVGFR9PjWa-g%40mail.gmail.com Regards, Vignesh
Running tests under valgrind is getting slower at an alarming pace
Hi, After a recent migration of skink and a few other animals (sorry for the false reports on the BF, I forgot to adjust a path), I looked at the time it takes to complete a valgrind run:

9.6: Consumed 4h 53min 18.518s CPU time
10: Consumed 5h 32min 50.839s CPU time
11: Consumed 7h 7min 17.455s CPU time
14: still going at 11h 51min 57.951s
HEAD: 14h 32min 29.571s CPU time

I changed it so that HEAD will be built in parallel, separately from the other branches, so that HEAD gets results within a useful timeframe. But even with that, the test times are increasing at a rate we're not going to be able to keep up with. Part of this is caused by a lot of tests running serially rather than in parallel. I was pondering setting PROVE_FLAGS=-j5 or something to reduce the impact of the tap tests a bit. Greetings, Andres Freund