Re: do only critical work during single-user vacuum?
On Sun, Feb 20, 2022 at 02:15:37PM -0800, Andres Freund wrote: > On 2022-02-19 20:57:57 -0800, Noah Misch wrote: > > On Wed, Feb 16, 2022 at 03:43:12PM +0700, John Naylor wrote: > > > On Wed, Feb 16, 2022 at 6:17 AM Peter Geoghegan wrote: > > > > On Tue, Feb 15, 2022 at 9:28 AM Peter Geoghegan wrote: > > > > > > > > I did notice from my own testing of the failsafe (by artificially > > > > > inducing wraparound failure using an XID burning C function) that > > > > > autovacuum seemed to totally correct the problem, even when the system > > > > > had already crossed xidStopLimit - it came back on its own. I wasn't > > > > > completely sure of how robust this effect was, though. > > > > > > I'll put some effort in finding any way that it might not be robust. > > > > A VACUUM may create a not-trivially-bounded number of multixacts via > > FreezeMultiXactId(). In a cluster at multiStopLimit, completing VACUUM > > without error needs preparation something like: > > > > 1. Kill each XID that might appear in a multixact. > > 2. Resolve each prepared transaction that might appear in a multixact. > > 3. Run VACUUM. At this point, multiStopLimit is blocking new multixacts > > from > >other commands, and the lack of running multixact members removes the > > need > >for FreezeMultiXactId() to create multixacts. > > > > Adding to the badness of single-user mode so well described upthread, one > > can > > enter it without doing (2) and then wrap the nextMXact counter. > > If we collected the information along the lines of I proposed in the second > half of > https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de > we should be able to handle such cases more intelligently, I think? > > We could e.g. add an error if FreezeMultiXactId() needs to create a new > multixact for a far-in-the-past xid. That's not great, of course, but if we > include the precise cause (pid of backend / prepared xact name / slot name / > ...) necessitating creating a new multi, it'd still be a significant > improvement over the status quo. Yes, exactly.
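To restate steps (1)-(3) as commands, roughly (the pg_stat_activity predicate is
illustrative; an operator would target exactly the sessions whose XIDs could
appear in a multixact):

  -- (1) End sessions whose transactions hold XIDs that might be multixact members.
  SELECT pg_terminate_backend(pid) FROM pg_stat_activity
   WHERE backend_xid IS NOT NULL;

  -- (2) Resolve every prepared transaction; repeat per gid reported here.
  SELECT gid FROM pg_prepared_xacts;
  ROLLBACK PREPARED 'gid_from_previous_query';   -- or COMMIT PREPARED

  -- (3) With no running multixact members left, FreezeMultiXactId() has no
  -- reason to create new multixacts, so VACUUM can complete even though
  -- multiStopLimit blocks new multixacts from other commands.
  VACUUM FREEZE;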
Re: XLogReadRecord() error in XlogReadTwoPhaseData()
On Sun, Jan 16, 2022 at 01:02:41PM -0800, Noah Misch wrote:
> My next steps:
>
> - Report a Debian bug for the sparc64+ext4 zeros problem.

Reported to Debian, then upstream:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006157
https://marc.info/?t=16453926991

Last week, someone confirmed the behavior on kernel 5.17.0-rc5.  No other news.
Re: Replication slot drop message is sent after pgstats shutdown.
On Tue, Feb 15, 2022 at 08:58:56AM -0800, Andres Freund wrote: > Pushed the test yesterday evening, after Tom checked if it is likely to be > problematic. Seems to worked without problems so far. wrasse│ 2022-02-15 09:29:06 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-02-15%2009%3A29%3A06 flaviventris │ 2022-02-24 15:17:30 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-02-24%2015%3A17%3A30 calliphoridae │ 2022-03-08 01:14:51 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2022-03-08%2001%3A14%3A51 The buildfarm failed to convey adequate logs for this particular test suite. Here's regression.diffs from the wrasse case (saved via keep_error_builds): === diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out --- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out Tue Feb 15 06:58:14 2022 +++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out Tue Feb 15 11:38:14 2022 @@ -29,16 +29,17 @@ t (1 row) -step s2_init: <... completed> -ERROR: canceling statement due to user request step s1_view_slot: SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' -slot_name|slot_type|active --+-+-- -(0 rows) +slot_name |slot_type|active +---+-+-- +slot_creation_error|logical |t +(1 row) step s1_c: COMMIT; +step s2_init: <... completed> +ERROR: canceling statement due to user request starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot step s1_b: BEGIN; === I can make it fail that way by injecting a 1s delay here: --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3339,6 +3339,7 @@ ProcessInterrupts(void) */ if (!DoingCommandRead) { + pg_usleep(1 * 1000 * 1000); LockErrorCleanup(); ereport(ERROR, (errcode(ERRCODE_QUERY_CANCELED), I plan to fix this as attached, similar to how commit c04c767 fixed the same challenge in detach-partition-concurrently-[34]. Author: Noah Misch Commit: Noah Misch Close race condition in slot_creation_error.spec. Use the pattern from detach-partition-concurrently-3.spec. Per buildfarm member wrasse. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/contrib/test_decoding/expected/slot_creation_error.out b/contrib/test_decoding/expected/slot_creation_error.out index 043bdae..25883b5 100644 --- a/contrib/test_decoding/expected/slot_creation_error.out +++ b/contrib/test_decoding/expected/slot_creation_error.out @@ -23,14 +23,15 @@ step s1_cancel_s2: SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE application_name = 'isolation/slot_creation_error/s2'; - + +step s2_init: <... completed> +ERROR: canceling statement due to user request +step s1_cancel_s2: <... completed> pg_cancel_backend - t (1 row) -step s2_init: <... completed> -ERROR: canceling statement due to user request step s1_view_slot: SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' @@ -90,18 +91,19 @@ step s1_terminate_s2: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE application_name = 'isolation/slot_creation_error/s2'; - -pg_terminate_backend - -t -(1 row) - + step s2_init: <... 
completed> FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. +step s1_terminate_s2: <... completed> +pg_terminate_backend + +t +(1 row) + step s1_c: COMMIT; step s1_view_slot: SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec index 6816696..d1e35bf 100644 --- a/contrib/test_decoding/specs/slot_creation_error.spec +++ b/contrib/test_decoding/specs/slot_creation_error.spec @@ -35,7 +35,7 @@ step s2_init { # The tests first start a transaction with an xid assigned in s1, then create # a slot in s2. The slot creation waits for s1
Re: Replication slot drop message is sent after pgstats shutdown.
On Fri, Mar 18, 2022 at 01:24:15PM -0700, Andres Freund wrote: > On 2022-03-18 00:28:37 -0700, Noah Misch wrote: > > === > > diff -U3 > > /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out > > > > /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out > > --- > > /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out > > Tue Feb 15 06:58:14 2022 > > +++ > > /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out > > Tue Feb 15 11:38:14 2022 > > @@ -29,16 +29,17 @@ > > t > > (1 row) > > > > -step s2_init: <... completed> > > -ERROR: canceling statement due to user request > > step s1_view_slot: > > SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE > > slot_name = 'slot_creation_error' > > > > -slot_name|slot_type|active > > --+-+-- > > -(0 rows) > > +slot_name |slot_type|active > > +---+-+-- > > +slot_creation_error|logical |t > > +(1 row) > > > > step s1_c: COMMIT; > > +step s2_init: <... completed> > > +ERROR: canceling statement due to user request > > > > starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot > > step s1_b: BEGIN; > > === > > > > I can make it fail that way by injecting a 1s delay here: > > > > --- a/src/backend/tcop/postgres.c > > +++ b/src/backend/tcop/postgres.c > > @@ -3339,6 +3339,7 @@ ProcessInterrupts(void) > > */ > > if (!DoingCommandRead) > > { > > + pg_usleep(1 * 1000 * 1000); > > LockErrorCleanup(); > > ereport(ERROR, > > (errcode(ERRCODE_QUERY_CANCELED), > > So isolationtester still sees the blocking condition from before the cancel > processing is finished and thus proceeds to the next statement - which it > normally should only do once the other running step is detected as still > blocking? Essentially. It called s1_view_slot too early. s2_init can remain blocked arbitrarily long after pg_cancel_backend returns. Writing s1_cancel_s2(s2_init) directs the isolationtester to send the cancel, then wait for s2_init to finish, then wait for the cancel to finish. > I wonder if we should emit everytime a step is detected anew as > being blocked to make it easier to understand issues like this. Good question. > > diff --git a/contrib/test_decoding/specs/slot_creation_error.spec > > b/contrib/test_decoding/specs/slot_creation_error.spec > > index 6816696..d1e35bf 100644 > > --- a/contrib/test_decoding/specs/slot_creation_error.spec > > +++ b/contrib/test_decoding/specs/slot_creation_error.spec > > @@ -35,7 +35,7 @@ step s2_init { > > # The tests first start a transaction with an xid assigned in s1, then > > create > > # a slot in s2. The slot creation waits for s1's transaction to end. > > Instead > > # we cancel / terminate s2. > > -permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c > > +permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2(s2_init) > > s1_view_slot s1_c > > permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check > > slot creation still works > > -permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot > > +permutation s1_b s1_xid s2_init s1_terminate_s2(s2_init) s1_c s1_view_slot > > # can't run tests after this, due to s2's connection failure > > That looks good to me. Pushed. Kyotaro Horiguchi had posted a patch that also changed the pg_terminate_backend call in temp-schema-cleanup.spec. 
I think that one is fine as-is, because it does pg_terminate_backend(pg_backend_pid()). There's no way for a backend running that particular command to report that it's ready for another query, so the problem doesn't arise.
Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
On Mon, Jan 10, 2022 at 04:25:27PM -0500, Tom Lane wrote:
> Apropos of that, it's worth noting that wait_for_catchup *is*
> dependent on up-to-date stats, and here's a recent run where
> it sure looks like the timeout cause is AWOL stats collector:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2022-01-10%2004%3A51%3A34
>
> I wonder if we should refactor wait_for_catchup to probe the
> standby directly instead of relying on the upstream's view.

It would be nice.  For logical replication tests, do we have a monitoring API
independent of the stats collector?  If not and we don't want to add one, a
hacky alternative might be for wait_for_catchup to run a WAL-writing command
every ~20s.  That way, if the stats collector misses the datagram about the
standby reaching a certain LSN, it would get more chances.
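Any statement guaranteed to write WAL would serve as the periodic nudge; one
possibility (not something the tests do today):

  -- Forces an XID assignment, so the ensuing commit writes a WAL record the
  -- standby must replay and report, generating fresh stats traffic.
  SELECT txid_current();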
Re: Recent eelpout failures on 9.x branches
On Tue, Dec 01, 2020 at 06:07:17PM -0500, Tom Lane wrote: > Thomas Munro writes: > > Unfortunately, eelpout got kicked off the nice shiny ARM server it was > > running on so last week I moved it to a rack mounted Raspberry Pi. It > > seems to be totally I/O starved causing some timeouts to be reached, > > and I'm looking into fixing that by adding fast storage. This may > > take a few days. Sorry for the noise. > > Ah-hah. Now that I look, eelpout is very clearly slower overall > than it was a couple weeks ago, so all is explained. > > It might still be reasonable to raise wal_sender_timeout in the > buildfarm environment, though. We usually try to make sure that > buildfarm timeouts border on ridiculous, not just because of > underpowered critters but also for cases like CLOBBER_CACHE_ALWAYS > animals. My buildfarm animals override these: extra_config =>{ DEFAULT => [ "authentication_timeout = '600s'", "wal_receiver_timeout = '18000s'", "wal_sender_timeout = '18000s'", ], }, build_env =>{ PGCTLTIMEOUT => 18000, }, Each of those timeouts caused failures before I changed it. For animals fast enough to make them irrelevant, I've not yet encountered a disadvantage. > I'm also wondering a bit why the issue isn't affecting the newer > branches. It's certainly not because we made the test shorter ... That is peculiar.
Re: convert elog(LOG) calls to ereport
On Wed, Dec 02, 2020 at 02:26:24PM +0100, Peter Eisentraut wrote: > There are a number of elog(LOG) calls that appear to be user-facing, so they > should be ereport()s. > @@ -8591,25 +8604,46 @@ LogCheckpointEnd(bool restartpoint) > CheckpointStats.ckpt_sync_rels; > average_msecs = (long) ((average_sync_time + 999) / 1000); > > - elog(LOG, "%s complete: wrote %d buffers (%.1f%%); " > - "%d WAL file(s) added, %d removed, %d recycled; " > - "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; " > - "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; " > - "distance=%d kB, estimate=%d kB", +1 to this change. > @@ -1763,7 +1771,8 @@ pq_setkeepalivesidle(int idle, Port *port) > #else > if (idle != 0) > { > - elog(LOG, "setting the keepalive idle time is not supported"); > + ereport(LOG, > + (errmsg("setting the keepalive idle time is not > supported"))); +1 > --- a/src/backend/optimizer/geqo/geqo_erx.c > +++ b/src/backend/optimizer/geqo/geqo_erx.c > @@ -420,7 +420,8 @@ edge_failure(PlannerInfo *root, Gene *gene, int index, > Edge *edge_table, int num > } > } > > - elog(LOG, "no edge found via random decision and total_edges == > 4"); > + ereport(LOG, > + (errmsg("no edge found via random decision and > total_edges == 4"))); The user can't act upon this without reading the source code. This and the other messages proposed in this file are better as elog(). > @@ -343,7 +346,8 @@ PGSharedMemoryCreate(Size size, >* care. >*/ > if (!CloseHandle(hmap)) > - elog(LOG, "could not close handle to shared memory: error code > %lu", GetLastError()); > + ereport(LOG, > + (errmsg("could not close handle to shared > memory: error code %lu", GetLastError(; The numerous messages proposed in src/backend/port/ files are can't-happen events, and there's little a DBA can do without reading the source code. They're better as elog(). > @@ -6108,8 +6111,9 @@ backend_read_statsfile(void) > /* Copy because timestamptz_to_str returns a > static buffer */ > filetime = pstrdup(timestamptz_to_str(file_ts)); > mytime = pstrdup(timestamptz_to_str(cur_ts)); > - elog(LOG, "stats collector's time %s is later > than backend local time %s", > - filetime, mytime); > + ereport(LOG, > + (errmsg("statistics collector's > time %s is later than backend local time %s", > + filetime, > mytime))); +1 > @@ -769,10 +769,11 @@ StartupReplicationOrigin(void) > replication_states[last_state].remote_lsn = > disk_state.remote_lsn; > last_state++; > > - elog(LOG, "recovered replication state of node %u to %X/%X", > - disk_state.roident, > - (uint32) (disk_state.remote_lsn >> 32), > - (uint32) disk_state.remote_lsn); > + ereport(LOG, > + (errmsg("recovered replication state of node %u > to %X/%X", > + disk_state.roident, > + (uint32) (disk_state.remote_lsn > >> 32), > + (uint32) > disk_state.remote_lsn))); +1 > @@ -1914,7 +1914,8 @@ FileClose(File file) > > /* in any case do the unlink */ > if (unlink(vfdP->fileName)) > - elog(LOG, "could not unlink file \"%s\": %m", > vfdP->fileName); > + ereport(LOG, > + (errmsg("could not unlink file \"%s\": > %m", vfdP->fileName))); +1 > > /* and last report the stat results */ > if (stat_errno == 0) > @@ -1922,7 +1923,8 @@ FileClose(File file) > else > { > errno = stat_errno; > - elog(LOG, "could not stat file \"%s\": %m", > vfdP->fileName); > + ereport(LOG, > + (errmsg("could not stat file \"%s\": > %m", vfdP->fileName))); +1 For the changes I didn't mention explicitly (most of them), I'm -0.5. 
Many of them "can't happen", use source code terms of art, and/or breach guideline "Avoid mentioning called function names, either; instead say what the code was trying to do" (https://www.postgresql.org/docs/devel/error-style-guide.html).
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Dec 02, 2020 at 01:50:25PM +0530, Amit Kapila wrote: > On Wed, Dec 2, 2020 at 1:20 PM Dilip Kumar wrote: > > > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila > > > wrote: > > > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch wrote: > > > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote: > > > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to > > > > > > see the buildfarm reports > > > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14 > > > > > failed the new 015_stream.pl test with the subscriber looping like > > > > > this: > > > > > > > > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication > > > > > apply worker for subscription "tap_sub" has started > > > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open > > > > > temporary file "16393-510.changes.0" from BufFile > > > > > "16393-510.changes": No such file or directory > > > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication > > > > > apply worker for subscription "tap_sub" has started > > > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker > > > > > "logical replication worker" (PID 13959252) exited with exit code 1 > > > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open > > > > > temporary file "16393-510.changes.0" from BufFile > > > > > "16393-510.changes": No such file or directory > > > > > ... > > > > The above kind of error can happen due to the following reasons: (a) > > > > the first time we sent the stream and created the file and that got > > > > removed before the second stream reached the subscriber. (b) from the > > > > publisher-side, we never sent the indication that it is the first > > > > stream and the subscriber directly tries to open the file thinking it > > > > is already there. Further testing showed it was a file location problem, not a deletion problem. The worker tried to open base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these were the files actually existing: [nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find src/test/subscription/tmp_check -name '*sharedfileset*') src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset: total 408 drwx--2 nm usr 256 Dec 08 03:20 . drwx--4 nm usr 256 Dec 08 03:20 .. -rw---1 nm usr 207806 Dec 08 03:20 16393-510.changes.0 src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset: total 0 drwx--2 nm usr 256 Dec 08 03:20 . drwx--4 nm usr 256 Dec 08 03:20 .. -rw---1 nm usr 0 Dec 08 03:20 16393-511.changes.0 > > I have executed "make check" in the loop with only this file. I have > > repeated it 5000 times but no failure, I am wondering shall we try to > > execute in the same machine in a loop where it failed once? > > Yes, that might help. Noah, would it be possible for you to try that The problem is xidhash using strcmp() to compare keys; it needs memcmp(). For this to matter, xidhash must contain more than one element. Existing tests rarely exercise the multi-element scenario. Under heavy load, on this system, the test publisher can have two active transactions at once, in which case it does exercise multi-element xidhash. (The publisher is sensitive to timing, but the subscriber is not; once WAL contains interleaved records of two XIDs, the subscriber fails every time.) 
This would be much harder to reproduce on a little-endian system, where strcmp(&xid, &xid_plus_one)!=0. On big-endian, every small XID has zero in the first octet; they all look like empty strings. The attached patch has the one-line fix and some test suite changes that make this reproduce frequently on any big-endian system. I'm currently planning to drop the test suite changes from the commit, but I could keep them if folks like them. (They'd need more comments and timeout handling.) Author: Noah Misch Commit: Noah Misch Use HASH_BLOBS for xidhash. This caused BufFile errors on buildfarm member sungazer, and SIGSEGV was possible. Conditions for reaching those symptoms were more frequent on big-endian systems.
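To see the multi-element xidhash case without the test suite changes, it's
enough to have two overlapping streamed transactions on the publisher.  A
sketch (table name and row counts are arbitrary; assumes a table already in
the publication, a subscription created with streaming = on, and
logical_decoding_work_mem small enough that both transactions stream):

  -- publisher, session 1:
  BEGIN;
  INSERT INTO stream_test SELECT repeat('x', 2000) FROM generate_series(1, 20000);

  -- publisher, session 2, before session 1 commits:
  BEGIN;
  INSERT INTO stream_test SELECT repeat('y', 2000) FROM generate_series(1, 20000);
  COMMIT;

  -- publisher, session 1:
  COMMIT;

Once WAL interleaves the two XIDs' records, the subscriber's xidhash holds two
entries, and the strcmp()-based key comparison misbehaves on big-endian
hardware.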
Re: pg_basebackup test coverage
On Thu, Dec 10, 2020 at 12:32:52PM -0500, Robert Haas wrote: > It would probably be good to fix as much of this as we can, but there > are a couple of cases I think would be particularly good to cover. One > is 'pg_basebackup -Ft -Xnone -D -', which tries to write the output as > a single tar file on standard output, injecting the backup_manifest > file into the tar file instead of writing it out separately as we > normally would. This case requires special handling in a few places > and it would be good to check that it actually works. The other is the > -z or -Z option, which produces a compressed tar file. > > Now, there's nothing to prevent us from running commands like this > from the pg_basebackup tests, but it doesn't seem like we could really > check anything meaningful. Perhaps we'd notice if the command exited > non-zero or didn't produce any output, but it would be nice to verify > that the resulting backups are actually correct. The way to do that > would seem to be to extract the tar file (and decompress it first, in > the -z/-Z case) and then run pg_verifybackup on the result and check > that it passes. However, as far as I can tell there's no guarantee > that the user has 'tar' or 'gunzip' installed on their system, so I > don't see a clean way to do this. A short (but perhaps incomplete) > search didn't turn up similar precedents in the existing tests. > > Any ideas? I would probe for the commands with "tar -cf- anyfile | tar -xf-" and "echo foo | gzip | gunzip". If those fail, skip any test that relies on an unavailable command.
Re: pg_basebackup test coverage
On Fri, Dec 11, 2020 at 12:23:10PM -0500, Robert Haas wrote: > On Fri, Dec 11, 2020 at 3:04 AM Michael Paquier wrote: > > Why don't you just use Archive::Tar [1] instead of looking for some system > > commands? Combining list_files() with extract(), it is possible to > > extract an archive, with or without compression, without hoping for an > > equivalent to exist on Windows. That would not be the first time > > either that there is a TAP test that skips some tests if a module does > > not exist. See for example what psql does with IO::Pty. > > Well, either this or Noah's method has the disadvantage that not > everyone will get the benefit of the tests, and that those who wish to > get that benefit must install more stuff. But, all three of the > computers I have within arm's reach (yeah, I might have a problem) > have Archive::Tar installed, so maybe it's not a big concern in > practice. I am slightly inclined to think that the perl package > approach might be better than shell commands because perhaps it is > more likely to work on Windows, but I'm not positive. Outside Windows, Archive::Tar is less portable. For example, in the forty-two systems of the GCC Compile Farm, five lack Archive::Tar. (Each of those five is a CentOS 7 system. Every system does have tar, gzip and gunzip.) Either way is fine with me. Favoring Archive::Tar, a Windows-specific bug is more likely than a CentOS/RHEL-specific bug. Favoring shell commands, they can catch PostgreSQL writing a tar file that the system's tar can't expand.
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On Sun, Dec 13, 2020 at 11:49:31AM -0500, Tom Lane wrote: > But what jumps out at me here is that this sort of error seems way > too easy to make, and evidently way too hard to detect. What can we > do to make it more obvious if one has incorrectly used or omitted > HASH_BLOBS? Both directions of error might easily escape notice on > little-endian hardware. > > I thought of a few ideas, all of which have drawbacks: > > 1. Invert the sense of the flag, ie HASH_BLOBS becomes the default. > This seems to just move the problem somewhere else, besides which > it'd require touching an awful lot of callers, and would silently > break third-party callers. > > 2. Don't allow a default: invent a new HASH_STRING flag, and > require that hash_create() calls specify exactly one of HASH_BLOBS, > HASH_STRING, or HASH_FUNCTION. This doesn't completely fix the > hazard of mindless-copy-and-paste, but I think it might make it > a little more obvious. Still requires touching a lot of calls. I like (2), for making the bug harder and for greppability. Probably pluralize it to HASH_STRINGS, for the parallel with HASH_BLOBS. > 3. Add some sort of heuristic restriction on keysize. A keysize > that's only 4 or 8 bytes almost certainly is not a string. > This doesn't give us much traction for larger keysizes, though. > > 4. Disallow empty string keys, ie something like "Assert(s_len > 0)" > in string_hash(). I think we could get away with that given that > SQL disallows empty identifiers. However, it would only help to > catch one direction of error (omitting HASH_BLOBS), and it would > only help on big-endian hardware, which is getting harder to find. > Still, we could hope that the buildfarm would detect errors. It's nontrivial to confirm that the empty-string key can't happen for a given hash table. (In contrast, what (3) asserts on is usually a compile-time constant.) I would stop short of adding (4), though it could be okay. > A quick count of grep hits suggest that the large majority of > existing hash_create() calls use HASH_BLOBS, and there might be > only order-of-ten calls that would need to be touched if we > required an explicit HASH_STRING flag. So option #2 is seeming > kind of attractive. Maybe that together with an assertion that > string keys have to exceed 8 or 16 bytes would be enough protection. Agreed. I expect (2) gives most of the benefit. Requiring 8-byte capacity should be harmless, and most architectures can zero 8 bytes in one instruction. Requiring more bytes trades specificity for sensitivity. > A different angle we could think about is that the name "HASH_BLOBS" > is kind of un-obvious. Maybe we should deprecate that spelling in > favor of something like "HASH_BINARY". With (2) in place, I wouldn't worry about renaming HASH_BLOBS. It's hard to confuse with HASH_STRINGS or HASH_FUNCTION. If anything, HASH_BLOBS conveys something more specific. HASH_FUNCTION cases see binary data, but that data has structure that promotes it out of "blob" status.
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On Mon, Dec 14, 2020 at 01:59:03PM -0500, Tom Lane wrote: > * Should we just have a blanket insistence that all callers supply > HASH_ELEM? The default sizes that dynahash.c uses without that are > undocumented and basically useless. +1 > we should just rip out all those memsets as pointless, since there's > basically no case where you'd use the memset to fill a field that > you meant to pass as zero. The fact that hash_create() doesn't > read fields it's not told to by a flag means we should not need > the memsets to avoid uninitialized-memory reads. On Mon, Dec 14, 2020 at 06:55:20PM -0500, Tom Lane wrote: > Here's a rolled-up patch that does some further documentation work > and gets rid of the unnecessary memset's as well. +1 on removing the memset() calls. That said, it's not a big deal if more creep in over time; it doesn't qualify as a project policy violation. > @@ -329,6 +328,11 @@ InitShmemIndex(void) > * whose maximum size is certain, this should be equal to max_size; that > * ensures that no run-time out-of-shared-memory failures can occur. > * > + * *infoP and hash_flags should specify at least the entry sizes and key s/should/must/
Re: [PATCH] Logical decoding of TRUNCATE
On Sat, Apr 07, 2018 at 07:40:11PM -0400, Peter Eisentraut wrote: > Committed with those changes. Since commit 039eb6e added logical replication support for TRUNCATE, logical apply of the TRUNCATE fails if it chooses a parallel index build: cat >/tmp/most_parallel.conf <
Re: [PATCH] Logical decoding of TRUNCATE
On Sun, Dec 20, 2020 at 03:54:31PM -0800, Peter Geoghegan wrote: > On Sun, Dec 20, 2020 at 3:13 PM Andres Freund wrote: > > Hm. Do I understand correctly that this problem is hit solely because > > the parallel mode code relies on there already have been a transaction > > snapshot set, thus avoiding the error? And that the code normally only > > works because GetTransactionSnapshot() will already have been called > > somewhere, before EnterParallelMode()? I think so. > It seems unlikely that InitializeParallelDSM() was ever intended to be > run in a background worker. That wouldn't surprise me. Nonetheless, when worker_spi runs parallel queries, they work fine. The logical replication worker experiences novel scenarios, because it calls ExecuteTruncateGuts() directly, not as part of an actual TRUNCATE query. That bypasses some of the usual once-per-query setup. On Mon, Dec 21, 2020 at 12:29:37PM +0530, Amit Kapila wrote: > I think the TRUNCATE operation should not use parallelism either via > apply worker or without it because there is nothing to scan in heap. That's fair. > Additionally, we can have an Assert or elog in InitializeParallelDSM > to ensure that it is never invoked by parallel worker. I don't know whether InitializeParallelDSM() operates correctly from inside a parallel worker. That is orthogonal to the bug here.
Invalidate acl.c caches for pg_authid.rolinherit changes
Backends reflect "GRANT role_name" changes rather quickly, due to a syscache invalidation callback. Let's register an additional callback to reflect "ALTER ROLE ... [NO]INHERIT" with equal speed. I propose to back-patch this. While pg_authid changes may be more frequent than pg_auth_members changes, I expect neither is frequent enough to worry about the resulting acl.c cache miss rate. pg_authid changes don't affect cached_membership_roles, so I could have invalidated cached_privs_roles only. That felt like needless complexity. I expect cached_privs_role gets the bulk of traffic, since SELECT, INSERT, UPDATE and DELETE use it. cached_membership_roles pertains to DDL and such. Author: Noah Misch Commit: Noah Misch Invalidate acl.c caches when pg_authid changes. This makes existing sessions reflect "ALTER ROLE ... [NO]INHERIT" as quickly as they have been reflecting "GRANT role_name". Back-patch to 9.5 (all supported versions). Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index f97489f..fe6c444 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -52,7 +52,6 @@ typedef struct * role. In most of these tests the "given role" is the same, namely the * active current user. So we can optimize it by keeping a cached list of * all the roles the "given role" is a member of, directly or indirectly. - * The cache is flushed whenever we detect a change in pg_auth_members. * * There are actually two caches, one computed under "has_privs" rules * (do not recurse where rolinherit isn't true) and one computed under @@ -4675,12 +4674,16 @@ initialize_acl(void) if (!IsBootstrapProcessingMode()) { /* -* In normal mode, set a callback on any syscache invalidation of -* pg_auth_members rows +* In normal mode, set a callback on any syscache invalidation of rows +* of pg_auth_members (for each AUTHMEM search in this file) or +* pg_authid (for has_rolinherit()) */ CacheRegisterSyscacheCallback(AUTHMEMROLEMEM, RoleMembershipCacheCallback, (Datum) 0); + CacheRegisterSyscacheCallback(AUTHOID, + RoleMembershipCacheCallback, + (Datum) 0); } } diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out index 0a2dd37..7754c20 100644 --- a/src/test/regress/expected/privileges.out +++ b/src/test/regress/expected/privileges.out @@ -350,6 +350,13 @@ SET SESSION AUTHORIZATION regress_priv_user1; SELECT * FROM atest3; -- fail ERROR: permission denied for table atest3 DELETE FROM atest3; -- ok +BEGIN; +RESET SESSION AUTHORIZATION; +ALTER ROLE regress_priv_user1 NOINHERIT; +SET SESSION AUTHORIZATION regress_priv_user1; +DELETE FROM atest3; +ERROR: permission denied for table atest3 +ROLLBACK; -- views SET SESSION AUTHORIZATION regress_priv_user3; CREATE VIEW atestv1 AS SELECT * FROM atest1; -- ok diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql index e0c1a29..4911ad4 100644 --- a/src/test/regress/sql/privileges.sql +++ b/src/test/regress/sql/privileges.sql @@ -220,6 +220,12 @@ SET SESSION AUTHORIZATION regress_priv_user1; SELECT * FROM atest3; -- fail DELETE FROM atest3; -- ok +BEGIN; +RESET SESSION AUTHORIZATION; +ALTER ROLE regress_priv_user1 NOINHERIT; +SET SESSION AUTHORIZATION regress_priv_user1; +DELETE FROM atest3; +ROLLBACK; -- views
Re: [PATCH] Logical decoding of TRUNCATE
On Mon, Dec 21, 2020 at 09:42:47AM -0800, Andres Freund wrote: > On 2020-12-20 15:54:31 -0800, Peter Geoghegan wrote: > > On Sun, Dec 20, 2020 at 3:13 PM Andres Freund wrote: > > > Hm. Do I understand correctly that this problem is hit solely because > > > the parallel mode code relies on there already have been a transaction > > > snapshot set, thus avoiding the error? And that the code normally only > > > works because GetTransactionSnapshot() will already have been called > > > somewhere, before EnterParallelMode()? > > > > It seems unlikely that InitializeParallelDSM() was ever intended to be > > run in a background worker. > > IDK, the environment in a bgworker shouldn't be that different from the > normal query environment in a normal connection. And it's far from > insane to want to be able to run a paralell query in a bgworker (and I > *think* I have seen that work before). This case here seems more like > an accidental dependency than anything to me, once that could perhaps > even hint at problems in normal backends too. Yeah. SPI_execute("TRUNCATE foo", false, 0) has no trouble doing a parallel index build in a bgworker. Let's ignore intention and be pleased about that.
Re: Temporary tables versus wraparound... again
On Tue, Nov 10, 2020 at 04:10:57PM +0900, Masahiko Sawada wrote: > On Mon, Nov 9, 2020 at 3:23 PM Greg Stark wrote: > > On Mon, 9 Nov 2020 at 00:17, Noah Misch wrote: > > > > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel) > > > > > > > > /* Truncate the underlying relation */ > > > > table_relation_nontransactional_truncate(rel); > > > > + ResetVacStats(rel); > > > > > > I didn't test, but I expect this will cause a stats reset for the second > > > TRUNCATE here: > > > > > > CREATE TABLE t (); > > > ... > > > BEGIN; > > > TRUNCATE t; > > > TRUNCATE t; -- inplace relfrozenxid reset > > > ROLLBACK; -- inplace reset survives > > > > > > Does that indeed happen? > > > > Apparently no, see below. I have to say I was pretty puzzled by the > > actual behaviour which is that the rollback actually does roll back > > the inplace update. But I *think* what is happening is that the first > > truncate does an MVCC update so the inplace update happens only to the > > newly created tuple which is never commited. > > I think in-plase update that the patch introduces is not used because > TRUNCATE doesn't use heap_truncate_one_rel() to truncate a table in > that scenario. It does MVCC update the pg_class tuple for a new > relfilenode with new relfrozenxid and other stats, see > RelationSetNewRelfilenode(). If we create and truncate a table within > the transaction it does in-place update that the patch introduces but > I think it's no problem in this case either. Agreed. Rolling back a heap_truncate_one_rel() always implies rolling back to an earlier version of the entire pg_class tuple. (That may not be true of mapped relations, but truncating them is unreasonable.) Thanks for checking. > > Thinking about things a bit this does worry me a bit. I wonder if > > inplace update is really safe outside of vacuum where we know we're > > not in a transaction that can be rolled back. But IIRC doing a > > non-inplace update on pg_class for these columns breaks other things. > > I don't know if that's still true. > > heap_truncate_one_rel() is not a transaction-safe operation. Doing > in-place updates during that operation seems okay to me unless I'm > missing something. Yep.
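Spelled out as SQL (sketch; t is a pre-existing table, t2 a new one):

  -- The quoted scenario: the first TRUNCATE swaps in a new relfilenode via an
  -- ordinary MVCC pg_class update (RelationSetNewRelfilenode()); the second
  -- TRUNCATE truncates in place, but it touches only the new, uncommitted
  -- pg_class tuple, so ROLLBACK discards the reset along with everything else.
  BEGIN;
  TRUNCATE t;
  TRUNCATE t;
  ROLLBACK;

  -- Same principle when the table is created in the same transaction:
  BEGIN;
  CREATE TABLE t2 ();
  TRUNCATE t2;   -- heap_truncate_one_rel(), in-place reset of an uncommitted tuple
  ROLLBACK;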
Tying an object's ownership to datdba
https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave $SUBJECT as one of the constituent projects for changing the public schema default ACL. This feature stands on its own, hence the new thread. I showed syntax "ALTER SCHEMA public OWNER TO DATABASE_OWNER" and anticipated a new RoleSpecType. That was fine for in-memory representation, but it lacked a value to store in pg_namespace.nspowner. That led me to instead add a default role, "pg_database_owner". That way, user queries joining owner columns to pg_authid need no modification. Also, user.c changes were trivial, and pg_dump needed no changes. The role's membership consists, implicitly, of the current database owner. The first patch refactors acl.c to reduce code duplication, and the second patch adds the feature. I ended up blocking DDL that creates role memberships involving the new role; see reasons in user.c comments. Lifting those restrictions looked feasible, but it was inessential to the mission, and avoiding unintended consequences would have been tricky. Views "information_schema.enabled_roles" and "information_schema.applicable_roles" list any implicit membership in pg_database_owner, but pg_catalog.pg_group and psql \dgS do not. If this leads any reviewer to look closely at applicable_roles, note that its behavior doesn't match its documentation (https://postgr.es/m/flat/20060728170615.gy20...@kenobi.snowman.net). This patch makes us read pg_database when reading pg_auth_members. IndexScanOK() has been saying "during backend startup we have to be able to use the pg_authid and pg_auth_members syscaches for authentication". While that's true of pg_authid, I found no sign of pg_auth_members reads that early. (The read in InitPostgres() -> CheckMyDatabase() -> pg_database_aclcheck() happens after RelationCacheInitializePhase3(). In a physical walsender, which never has a mature relcache, some SHOW commands make guc.c check DEFAULT_ROLE_READ_ALL_SETTINGS. The walsender case, though it happens after authentication, may necessitate IndexScanOK()'s treatment of pg_auth_members.) For now, just in case I missed the early read, I've made IndexScanOK() treat pg_database like pg_auth_members. Thanks, nm Author: Noah Misch Commit: Noah Misch Merge similar algorithms into roles_is_member_of(). The next commit would have complicated two or three algorithms, so take this opportunity to consolidate. No functional changes. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index fe6c444..1adacb9 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -50,32 +50,24 @@ typedef struct /* * We frequently need to test whether a given role is a member of some other * role. In most of these tests the "given role" is the same, namely the - * active current user. So we can optimize it by keeping a cached list of - * all the roles the "given role" is a member of, directly or indirectly. - * - * There are actually two caches, one computed under "has_privs" rules - * (do not recurse where rolinherit isn't true) and one computed under - * "is_member" rules (recurse regardless of rolinherit). + * active current user. So we can optimize it by keeping cached lists of all + * the roles the "given role" is a member of, directly or indirectly. * * Possibly this mechanism should be generalized to allow caching membership * info for multiple roles? * - * The has_privs cache is: - * cached_privs_role is the role OID the cache is for. 
- * cached_privs_roles is an OID list of roles that cached_privs_role - * has the privileges of (always including itself). - * The cache is valid if cached_privs_role is not InvalidOid. - * - * The is_member cache is similarly: - * cached_member_role is the role OID the cache is for. - * cached_membership_roles is an OID list of roles that cached_member_role - * is a member of (always including itself). - * The cache is valid if cached_member_role is not InvalidOid. + * Each element of cached_roles is an OID list of constituent roles for the + * corresponding element of cached_role (always including the cached_role + * itself). One cache has ROLERECURSE_PRIVS semantics, and the other has + * ROLERECURSE_MEMBERS semantics. */ -static Oid cached_privs_role = InvalidOid; -static List *cached_privs_roles = NIL; -static Oid cached_member_role = InvalidOid; -static List *cached_membership_roles = NIL; +enum RoleRecurseType +{ + ROLERECURSE_PRIVS = 0, /* recurse if rolinherit */ + ROLERECURSE_MEMBERS = 1 /* recurse unconditionally */ +}; +static Oid cached_role[] = {InvalidOid, InvalidOid}; +static List *cached_roles[] = {NIL, NIL}; static const char *getid(c
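As a usage sketch (role and database names invented), the intended behavior of
the second patch is:

  ALTER SCHEMA public OWNER TO pg_database_owner;

  -- Membership in pg_database_owner is implicit: within each database, the
  -- role owning that database acts as the sole member.  Hence, after
  ALTER DATABASE mydb OWNER TO alice;
  -- alice passes ownership checks on schema public while connected to mydb,
  -- with no explicit GRANT pg_database_owner TO alice (which the patch
  -- indeed blocks).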
Dump public schema ownership & seclabels
https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave $SUBJECT as one of the constituent projects for changing the public schema default ACL. This ended up being simple. Attached. I chose to omit the "ALTER SCHEMA public OWNER TO" when the owner is the bootstrap superuser, like how we skip acl GRANT/REVOKE when the ACL matches the one recorded in pg_init_privs. I waffled on that; would it be better to make the OWNER TO unconditional? Like ownership, we've not been dumping security labels on the public schema. The way I fixed ownership fixed security labels implicitly. If anyone thinks I should unbundle these two, let me know. All this is arguably a fix for an ancient bug. Some sites may need to compensate for the behavior change, so I plan not to back-patch. Thanks, nm Author: Noah Misch Commit: Noah Misch Dump public schema ownership and security labels. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c index 1f82c64..a16d149 100644 --- a/src/bin/pg_dump/pg_backup_archiver.c +++ b/src/bin/pg_dump/pg_backup_archiver.c @@ -3597,10 +3597,13 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData) * Actually print the definition. * * Really crude hack for suppressing AUTHORIZATION clause that old pg_dump -* versions put into CREATE SCHEMA. We have to do this when --no-owner -* mode is selected. This is ugly, but I see no other good way ... +* versions put into CREATE SCHEMA. Don't mutate the modern definition +* for schema "public", which consists of a comment. We have to do this +* when --no-owner mode is selected. This is ugly, but I see no other +* good way ... */ - if (ropt->noOwner && strcmp(te->desc, "SCHEMA") == 0) + if (ropt->noOwner && + strcmp(te->desc, "SCHEMA") == 0 && strncmp(te->defn, "--", 2) != 0) { ahprintf(AH, "CREATE SCHEMA %s;\n\n\n", fmtId(te->tag)); } @@ -3612,11 +3615,16 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData) /* * If we aren't using SET SESSION AUTH to determine ownership, we must -* instead issue an ALTER OWNER command. We assume that anything without -* a DROP command is not a separately ownable object. All the categories -* with DROP commands must appear in one list or the other. +* instead issue an ALTER OWNER command. Schema "public" is special; a +* dump never creates it, so we use ALTER OWNER even when using SET +* SESSION for all other objects. We assume that anything without a DROP +* command is not a separately ownable object. All the categories with +* DROP commands must appear in one list or the other. */ - if (!ropt->noOwner && !ropt->use_setsessauth && + if (!ropt->noOwner && + (!ropt->use_setsessauth || +(strcmp(te->tag, "public") == 0 + && strcmp(te->desc, "SCHEMA") == 0)) && te->owner && strlen(te->owner) > 0 && te->dropStmt && strlen(te->dropStmt) > 0) { diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 1ab98a2..c633644 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -44,6 +44,7 @@ #include "catalog/pg_aggregate_d.h" #include "catalog/pg_am_d.h" #include "catalog/pg_attribute_d.h" +#include "catalog/pg_authid_d.h" #include "catalog/pg_cast_d.h" #include "catalog/pg_class_d.h" #include "catalog/pg_collation_d.h" @@ -1566,10 +1567,14 @@ selectDumpableNamespace(NamespaceInfo *nsinfo, Archive *fout) * no-mans-land between being a system object and a user object. 
We * don't want to dump creation or comment commands for it, because * that complicates matters for non-superuser use of pg_dump. But we -* should dump any ACL changes that have occurred for it, and of -* course we should dump contained objects. +* should dump any ownership changes, security labels, and ACL +* changes, and of course we should dump contained objects. pg_dump +* ties ownership to DUMP_COMPONENT_DEFINITION, so dump a "definition" +* bearing just a comment. */ - nsinfo->dobj.dump = DUMP_COMPONENT_ACL; + nsinfo->dobj.dump = DUMP_COMPONENT_ALL & ~DUMP_COMPONENT_COMMENT; +
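Concretely, for a database where someone had run "ALTER SCHEMA public OWNER TO
admin" (role name invented), the dump now carries something like:

  -- pg_dump still never emits CREATE SCHEMA public; the dumped "definition"
  -- is just a comment, and ownership is restored separately:
  ALTER SCHEMA public OWNER TO admin;

When public still belongs to the bootstrap superuser, no ALTER SCHEMA ...
OWNER TO appears, per the choice described above.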
Re: Dump public schema ownership & seclabels
On Tue, Dec 29, 2020 at 05:49:24AM -0800, Noah Misch wrote:
> https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave $SUBJECT as
> one of the constituent projects for changing the public schema default ACL.
> This ended up being simple.  Attached.

This is defective; it fails to reproduce nspacl after "ALTER SCHEMA public
OWNER TO pg_write_server_files; REVOKE ALL ON SCHEMA public FROM
pg_write_server_files;".  I will try again later.
Re: Buildfarm's cross-version-upgrade tests
On Wed, Dec 30, 2020 at 04:28:44PM -0500, Andrew Dunstan wrote:
> On 12/30/20 3:42 PM, Tom Lane wrote:
> > Since I'd just seen Noah's commit 52202bb39 go by, I tried
> > modifying src/bin/pg_upgrade/test.sh to do the drop, but that
> > isn't helping.  I recall now from prior discussions that we
> > have to modify code that's embedded in the buildfarm client
> > script to make this go.  (I wonder what test scenario Noah had
> > in mind, exactly.)

Commit 52202bb39 affected the cross-version test harness described in
src/bin/pg_upgrade/TESTING.  That harness is disused, having been broken at
head for at least 40 months.  (One can work around the remaining breakage by
locally back-patching e78900a into REL_13_STABLE.)

> > I'm now wondering if we can't devise a fixup method that is
> > controllable by committers and doesn't require a buildfarm
> > client rollout to update.

Ideally, the buildfarm code should curate per-version source and installation
trees, provide their locations to the test harness, and collect logs.  Other
test logic (i.e. the kinds of steps that appear in test.sh) should live in
postgresql.git code.  (That's the separation of concerns for all
run-by-default buildfarm tests.)
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Fri, Jan 01, 2021 at 11:05:29PM +0500, Andrey Borodin wrote: > I've found this thread in CF looking for something to review. Thanks for taking a look. > > 9 нояб. 2020 г., в 09:53, Noah Misch написал(а): > > > > Rebased both patches, necessitated by commit c732c3f (a repair of commit > > dee663f). As I mentioned on another branch of the thread, I'd be content if > > someone reviews the slru-truncate-modulo patch and disclaims knowledge of > > the > > slru-truncate-insurance patch; I would then abandon the latter patch. > > > > Commit c732c3f adds some SYNC_FORGET_REQUESTs. > slru-truncate-modulo-v5.patch fixes off-by-one error in functions like > *PagePrecedes(int page1, int page2). > slru-truncate-t-insurance-v4.patch ensures that off-by-one errors do not > inflict data loss. > > While I agree that fixing error is better than hiding it, I could not figure > out how c732c3f is connected to these patches. > Can you please give me few pointers how to understand this connection? Commit c732c3f is the last commit that caused a merge conflict. There's no other connection to this thread, and one can review patches on this thread without studying commit c732c3f. Specifically, this thread's slru-truncate-modulo patch and commit c732c3f modify adjacent lines in SlruScanDirCbDeleteCutoff(); here's the diff after merge conflict resolution: --- a/src/backend/access/transam/slru.c +++ b/src/backend/access/transam/slru.c @@@ -1525,4 -1406,4 +1517,4 @@@ SlruScanDirCbDeleteCutoff(SlruCtl ctl, - if (ctl->PagePrecedes(segpage, cutoffPage)) + if (SlruMayDeleteSegment(ctl, segpage, cutoffPage)) - SlruInternalDeleteSegment(ctl, filename); + SlruInternalDeleteSegment(ctl, segpage / SLRU_PAGES_PER_SEGMENT);
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Sat, Jan 02, 2021 at 12:31:45PM +0500, Andrey Borodin wrote:
> Do I understand correctly that this is bugfix that needs to be back-patched?

The slru-truncate-modulo patch fixes a bug.  The slru-truncate-t-insurance
patch does not.  Neither _needs_ to be back-patched, though I'm proposing to
back-patch both.  I welcome opinions about that.

> Thus we should not refactor 4 identical *PagePrecedes(int page1, int page2)
> into 1 generic function?

I agree with not refactoring that way, in this case.

> Since functions are not symmetric anymore, maybe we should have better names
> for arguments than "page1" and "page2"? At least in dev branch.

That works for me.  What names would you suggest?

> Is it common practice to embed tests into assert checking like in
> SlruPagePrecedesUnitTests()?

No; it's neither common practice nor a policy breach.
Re: pg_upgrade fails with non-standard ACL
On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote: > On 08.06.2020 19:31, Alvaro Herrera wrote: > >I'm thinking what's a good way to have a test that's committable. Maybe > >it would work to add a TAP test to pg_upgrade that runs initdb, does a > >few GRANTS as per your attachment, then runs pg_upgrade? Currently the > >pg_upgrade tests are not TAP ... we'd have to revive > >https://postgr.es/m/20180126080026.gi17...@paquier.xyz > >(Some progress was already made on the buildfarm front to permit this) > > I would be glad to add some test, but it seems to me that the infrastructure > changes for cross-version pg_upgrade test is much more complicated task than > this modest bugfix. Agreed. > --- a/src/bin/pg_upgrade/check.c > +++ b/src/bin/pg_upgrade/check.c > +static void > +check_for_changed_signatures(void) > +{ > + snprintf(output_path, sizeof(output_path), "revoke_objects.sql"); If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a database in which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public" happened requires a GRANT. Can you use a file name that will still make sense if someone enhances pg_upgrade to generate such GRANT statements? > + if (script == NULL && (script = > fopen_priv(output_path, "w")) == NULL) > + pg_fatal("could not open file \"%s\": > %s\n", > + output_path, > strerror(errno)); Use %m instead of passing sterror(errno) to %s. > + } > + > + /* Handle columns separately */ > + if (strstr(aclinfo->obj_type, "column") != NULL) > + { > + char *pdot = > last_dot_location(aclinfo->obj_ident); I'd write strrchr(aclinfo->obj_ident, '.') instead of introducing last_dot_location(). This assumes column names don't contain '.'; how difficult would it be to remove that assumption? If it's difficult, a cheap approach would be to ignore any entry containing too many dots. We're not likely to create such column names at initdb time, but v13 does allow: ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b"; GRANT SELECT ("a.b") ON pg_locks TO backup; After which revoke_objects.sql has: psql:./revoke_objects.sql:9: ERROR: syntax error at or near "") ON pg_catalog.pg_locks."" LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup"; While that ALTER VIEW probably should have required allow_system_table_mods, we need to assume such databases exist. > --- a/src/bin/pg_upgrade/info.c > +++ b/src/bin/pg_upgrade/info.c > +get_non_default_acl_infos(ClusterInfo *cluster) > +{ > + res = executeQueryOrDie(conn, > + /* > + * Get relations, attributes, functions and procedures. > Some system > + * objects like views are not pinned, but these type of > objects are > + * created in pg_catalog schema. > + */ > + "SELECT obj.type, obj.identity, shd.refobjid::regrole > rolename, " > + "CASE WHEN shd.classid = 'pg_class'::regclass THEN > true " > + "ELSE false " > + "END is_relation " > + "FROM pg_catalog.pg_shdepend shd, " > + "LATERAL pg_catalog.pg_identify_object(" > + "shd.classid, shd.objid, shd.objsubid) obj " > + /* 'a' is for SHARED_DEPENDENCY_ACL */ > + "WHERE shd.deptype = 'a' AND shd.dbid = %d " > + "AND shd.classid IN ('pg_proc'::regclass, > 'pg_class'::regclass) " > + /* get only system objects */ > + "AND obj.schema = 'pg_catalog' " Overall, this patch predicts a subset of cases where pg_dump will emit a failing GRANT or REVOKE that targets a pg_catalog object. Can you write a code comment stating what it does and does not detect? I think it's okay to not predict every failure, but let's record the intended coverage. 
Here are a few examples I know; there may be more. The above query only finds GRANTs to non-pinned roles. pg_dump dumps the following, but the above query doesn't find them: REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public; GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend; The above query should test refclassid. This function should not run queries against servers older than 9.6, because pg_dump emits GRANT/REVOKE for pg_catalog objects only when targeting 9.6 or later. An upgrade from 8.4 to master is failing on the above query: === Checking for large objects ok SQL command failed SELECT obj.type, obj.identity, shd.refobjid::regrole rolename, CASE WHEN shd.classid = 'pg_class'::re
Re: Context diffs
On Tue, Jan 05, 2021 at 11:21:07AM +1300, Thomas Munro wrote:
> On Tue, Jan 5, 2021 at 8:07 AM Bruce Momjian wrote:
> > * "git apply" and "git am" can't process context diffs (they throw an
> >   error once a context-like section of the diff is hit; simple
> >   adding/removing lines in a block works)
> >
> > * the commit-fest doesn't recognized context diff attachments as patches:
> >
> >   https://commitfest.postgresql.org/31/2912/
> >
> > * cfbot can't process file renames/add/delete from context diffs
>
> For the record, cfbot just uses plain old GNU patch, because that
> seems to accept nearly everything that anyone posts here (after a step
> that tries to unpack tarballs etc).  Several people have suggested I
> change it to use git apply instead (IIRC it works better for patches
> containing binary files such as cryptographic keys?)

It does work better for binary files, though there's little benefit in storing
binary cryptographic keys as opposed to ASCII ones.  Unfortunately for the
cfbot, "git apply" forces the equivalent of "patch -F0", so it rejects patches
needlessly.  If you do change the cfbot, I recommend having it start with
"git apply -3" (able to succeed when plain "git apply" fails), then fall back
to "patch" when "git apply -3" fails.
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Wed, Jan 06, 2021 at 11:28:36AM +0500, Andrey Borodin wrote: > First patch fixes a bug with possible SLRU truncation around wrapping point > too early. > Basic idea of the patch is "If we want to delete a range we must be eligible > to delete it's beginning and ending". > So to test if page is deletable it tests that first and last xids(or other > SLRU's unit) are of no interest. To test if a segment is deletable it tests > if first and last pages can be deleted. Yes. > I'm a little suspicious of implementation of *PagePrecedes(int page1, int > page2) functions. Consider following example from the patch: > > static bool > CommitTsPagePrecedes(int page1, int page2) > { > TransactionId xid1; > TransactionId xid2; > > xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE; > xid1 += FirstNormalTransactionId + 1; > xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE; > xid2 += FirstNormalTransactionId + 1; > > return (TransactionIdPrecedes(xid1, xid2) && > TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1)); > } > > We are adding FirstNormalTransactionId to xids to avoid them being special > xids. > We add COMMIT_TS_XACTS_PER_PAGE - 1 to xid2 to shift it to the end of the > page. But due to += FirstNormalTransactionId we shift slightly behind page > boundaries and risk that xid2 + COMMIT_TS_XACTS_PER_PAGE - 1 can become > FrozenTransactionId (FirstNormalTransactionId - 1). Thus we add +1 to all > values in scope. While the logic is correct, coding is difficult. Maybe we > could just use Right. The overall objective is to compare the first XID of page1 to the first and last XIDs of page2. The FirstNormalTransactionId+1 addend operates at a lower level. It just makes TransactionIdPrecedes() behave like NormalTransactionIdPrecedes() without the latter's assertion. > page1_first_normal_xid = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE > + FirstNormalTransactionId; > page2_first_normal_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE > + FirstNormalTransactionId; > page2_last_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE + > CLOG_XACTS_PER_PAGE - 1; > But I'm not insisting on this. I see your point, but I doubt using different addends on different operands makes this easier to understand. If anything, I'd lean toward adding more explicit abstraction between "the XID we intend to test" and "the XID we're using to fool some general-purpose API". > Following comment is not correct for 1Kb and 4Kb pages > + * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128. Fixed, thanks. > All above notes are not blocking the fix, I just wanted to let committer know > about this. I think that it's very important to have this bug fixed. > > Second patch converts binary *PagePreceeds() functions to *PageDiff()s and > adds logic to avoid deleting pages in suspicious cases. This logic depends on > the scale of returned by diff value: it expects that overflow happens between > INT_MIN\INT_MAX. This it prevents page deletion if page_diff <= INT_MIN / 2 > (too far from current cleaning point; and in normal cases, of cause). > > It must be comparison here, not equality test. > - ctl->PagePrecedes(segpage, trunc->earliestExistingPage)) > + ctl->PageDiff(segpage, trunc->earliestExistingPage)) That's bad. Fixed, thanks. > This > int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1, > seems to be functional equivalent of > int diff_max = ((QUEUE_MAX_PAGE - 1) / 2), Do you think one conveys the concept better than the other? 
> AFAICS the overall purpose of the 2nd patch is to help corrupted by other > bugs clusters avoid deleting SLRU segments. Yes. > I'm a little bit afraid that this kind of patch can hide bugs (while > potentially saving some users data). Besides this patch seems like a useful > precaution. Maybe we could emit scary warnings if SLRU segments do not stack > into continuous range? Scary warnings are good for an observation that implies a bug, but the slru-truncate-t-insurance patch causes such an outcome in non-bug cases where it doesn't happen today. In other words, discontinuous ranges of SLRU segments would be even more common after that patch. For example, it would happen anytime oldestXID advances by more than ~1B at a time. Thanks, nm
pg_dump vs. GRANT OPTION among initial privileges
pg_dump can generate invalid SQL syntax if an aclitem in pg_init_privs contains a GRANT OPTION (an asterisk in the aclitemout() representation). Also, pg_dump uses REVOKE GRANT OPTION FOR where it needs plain REVOKE. For more details, see the log message in the attached patch. I recommend reviewing "diff -w" output first. I'm not certain the test suite will always find those REVOKE statements in the same order; if that order varies, one regex will need more help. Author: Noah Misch Commit: Noah Misch Fix pg_dump for GRANT OPTION among initial privileges. The context is an object that no longer bears some aclitem that it bore initially. (A user issued REVOKE or GRANT statements upon the object.) pg_dump is forming SQL to reproduce the object ACL. Since initdb creates no ACL bearing GRANT OPTION, reaching this bug requires an extension where the creation script establishes such an ACL. No PGXN extension does that. If an installation did reach the bug, pg_dump would have omitted a semicolon, causing a REVOKE and the next SQL statement to fail. Separately, since the affected code exists to eliminate an entire aclitem, it wants plain REVOKE, not REVOKE GRANT OPTION FOR. Back-patch to 9.6, where commit 23f34fa4ba358671adab16773e79c17c92cbc870 first appeared. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c index 536c9ff..2129301 100644 --- a/src/bin/pg_dump/dumputils.c +++ b/src/bin/pg_dump/dumputils.c @@ -168,48 +168,28 @@ buildACLCommands(const char *name, const char *subname, const char *nspname, for (i = 0; i < nraclitems; i++) { if (!parseAclItem(raclitems[i], type, name, subname, remoteVersion, - grantee, grantor, privs, privswgo)) + grantee, grantor, privs, NULL)) { ok = false; break; } - if (privs->len > 0 || privswgo->len > 0) + if (privs->len > 0) { - if (privs->len > 0) - { - appendPQExpBuffer(firstsql, "%sREVOKE %s ON %s ", - prefix, privs->data, type); - if (nspname && *nspname) - appendPQExpBuffer(firstsql, "%s.", fmtId(nspname)); - appendPQExpBuffer(firstsql, "%s FROM ", name); - if (grantee->len == 0) - appendPQExpBufferStr(firstsql, "PUBLIC;\n"); - else if (strncmp(grantee->data, "group ", - strlen("group ")) == 0) - appendPQExpBuffer(firstsql, "GROUP %s;\n", - fmtId(grantee->data + strlen("group "))); - else - appendPQExpBuffer(firstsql, "%s;\n", - fmtId(grantee->data)); - } - if (privswgo->len > 0) - { - appendPQExpBuffer(firstsql, - "%sREVOKE GRANT OPTION FOR %s ON %s ", - prefix, privswgo->data, type); - if (nspname && *nspname) - appendPQExpBuffer(firstsql, "%s.", fmtId(nspname)); - appendPQExpBuffer(firstsql, "%s FROM ", name); - if (grantee->len == 0) - appendPQExpBufferStr(firstsql, "PUBLIC"); - else if (strncmp(grantee->data, "group ", - strlen("group ")) == 0) - appendPQExpBuffer(firstsql, "GROUP %s", -
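For anyone wanting to reproduce the scenario outside the attached test, here is a rough sketch of the setup the commit message describes. The extension, table, and role names are hypothetical; the GRANT would need to run inside the extension's creation script so it lands in pg_init_privs:
===
-- inside a hypothetical extension script (e.g. my_ext--1.0.sql):
CREATE TABLE my_ext_tab (c int);
GRANT SELECT ON my_ext_tab TO my_role WITH GRANT OPTION;  -- initial aclitem gains "r*"

-- later, outside the extension, a user removes that aclitem entirely:
REVOKE SELECT ON my_ext_tab FROM my_role;
-- To reproduce this state, pg_dump must emit a plain REVOKE (with its trailing
-- semicolon), not REVOKE GRANT OPTION FOR.
===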
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Sat, Jan 09, 2021 at 08:25:39PM +0500, Andrey Borodin wrote: > > On 9 Jan 2021, at 15:17, Noah Misch wrote: > >> I'm a little bit afraid that this kind of patch can hide bugs (while > >> potentially saving some users data). Besides this patch seems like a > >> useful precaution. Maybe we could emit scary warnings if SLRU segments do > >> not stack into continuous range? > > > > Scary warnings are good for an observation that implies a bug, but the > > slru-truncate-t-insurance patch causes such an outcome in non-bug cases > > where > > it doesn't happen today. In other words, discontinuous ranges of SLRU > > segments would be even more common after that patch. For example, it would > > happen anytime oldestXID advances by more than ~1B at a time. > > Uhm, I thought that if there is going to be more than ~1B xids - we are going > to keep all segements forever and range still will be continuous. Or am I > missing something? No; it deletes the most recent ~1B and leaves the older segments. An exception is multixact, as described in the commit message and the patch's change to a comment in TruncateMultiXact().
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Sun, Jan 10, 2021 at 11:44:14AM +0500, Andrey Borodin wrote: > > 10 янв. 2021 г., в 03:15, Noah Misch написал(а): > > > > No; it deletes the most recent ~1B and leaves the older segments. An > > exception is multixact, as described in the commit message and the patch's > > change to a comment in TruncateMultiXact(). > > Thanks for clarification. > One more thing: retention point at 3/4 of overall space (half of wraparound) > seems more or less random to me. Why not 5/8 or 9/16? No reason for that exact value. The purpose of that patch is to mitigate bugs that cause the server to write data into a region of the SLRU that we permit truncation to unlink. If the patch instead tested "diff > INT_MIN * .99", the new behavior would get little testing, because xidWarnLimit would start first. Also, the new behavior wouldn't mitigate bugs that trespass >~20M XIDs into unlink-eligible space. If the patch tested "diff > INT_MIN * .01", more sites would see disk consumption grow. I think reasonable multipliers range from 0.5 (in the patch today) to 0.9, but it's a judgment call. > Can you please send revised patches with fixes? Attached. Author: Noah Misch Commit: Noah Misch Prevent excess SimpleLruTruncate() deletion. Every core SLRU wraps around. With the exception of pg_notify, the wrap point can fall in the middle of a page. Account for this in the PagePrecedes callback specification and in SimpleLruTruncate()'s use of said callback. Update each callback implementation to fit the new specification. This changes SerialPagePrecedesLogically() from the style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes(). (Whereas pg_clog and pg_serial share a key space, pg_serial is nothing like pg_notify.) The bug fixed here has the same symptoms and user followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch to 9.5 (all supported versions). Reviewed by Andrey Borodin and Tom Lane. Discussion: https://postgr.es/m/20190202083822.gc32...@gust.leadboat.com diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 69a81f3..410d02a 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -694,6 +694,7 @@ CLOGShmemInit(void) SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, SYNC_HANDLER_CLOG); + SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE); } /* @@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid) /* - * Decide which of two CLOG page numbers is "older" for truncation purposes. + * Decide whether a CLOG page number is "older" for truncation purposes. * * We need to use comparison of TransactionIds here in order to do the right - * thing with wraparound XID arithmetic. However, if we are asked about - * page number zero, we don't want to hand InvalidTransactionId to - * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So, - * offset both xids by FirstNormalTransactionId to avoid that. + * thing with wraparound XID arithmetic. However, TransactionIdPrecedes() + * would get weird about permanent xact IDs. So, offset both such that xid1, + * xid2, and xid2 + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset + * is relevant to page 0 and to the page preceding page 0. + * + * The page containing oldestXact-2^31 is the important edge case. The + * portion of that page equaling or following oldestXact-2^31 is expendable, + * but the portion preceding oldestXact-2^31 is not. 
When oldestXact-2^31 is + * the first XID of a page and segment, the entire page and segment is + * expendable, and we could truncate the segment. Recognizing that case would + * require making oldestXact, not just the page containing oldestXact, + * available to this callback. The benefit would be rare and small, so we + * don't optimize that edge case. */ static bool CLOGPagePrecedes(int page1, int page2) @@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2) TransactionId xid2; xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE; - xid1 += FirstNormalTransactionId; + xid1 += FirstNormalTransactionId + 1; xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE; - xid2 += FirstNormalTransactionId; + xid2 += FirstNormalTransactionId + 1; - return TransactionIdPrecedes(xid1, xid2); + return (TransactionIdPrecedes(xid1, xid2) && + TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1)); } diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index
Re: Inconsistent "" use
On Sun, Jan 10, 2021 at 08:22:42PM +0900, Tatsuo Ishii wrote: > In doc/src/sgml/func.sgml description of SHOW command use > "<acronym>SQL</acronym>", while SET command description the same > section does not use "<acronym>". Shouldn't the description of SET use > "<acronym>" for "SQL" as well? Patch attached. https://tdg.docbook.org/tdg/5.2/acronym.html suggests docbook formatters either ignore that or use it as a signal to substitute small caps. I don't consider small caps an improvement for "SQL", so I'd prefer to never use <acronym>SQL</acronym>. <acronym> also makes the markup longer (though one could mitigate that with an entity like &SQL). However, standardizing on either way is better than varying within the manual.
Re: Dump public schema ownership & seclabels
On Thu, Feb 11, 2021 at 04:08:34AM -0800, Noah Misch wrote: > On Sun, Jan 17, 2021 at 12:00:06PM +0100, Vik Fearing wrote: > > On 1/17/21 10:41 AM, Noah Misch wrote: > > > On Sat, Jan 16, 2021 at 02:05:43PM +0100, Vik Fearing wrote: > > >> On 12/30/20 12:59 PM, Noah Misch wrote: > > >>> On Tue, Dec 29, 2020 at 05:49:24AM -0800, Noah Misch wrote: > > >>>> https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave > > >>>> $SUBJECT as > > >>>> one of the constituent projects for changing the public schema default > > >>>> ACL. > > >>>> This ended up being simple. Attached. > > >>> > > >>> This is defective; it fails to reproduce nspacl after "ALTER SCHEMA > > >>> public > > >>> OWNER TO pg_write_server_files; REVOKE ALL ON SCHEMA public FROM > > >>> pg_write_server_files;". I will try again later. > > Fixed. The comment added to getNamespaces() explains what went wrong. > > Incidentally, --no-acl is fragile without --no-owner, because any REVOKE > statements assume a particular owner. Since one can elect --no-owner at > restore time, we can't simply adjust the REVOKE statements constructed at dump > time. That's not new with this patch or specific to initdb-created objects. > > > >> Could I ask you to also include COMMENTs when you try again, please? > > > > > > That may work. I had not expected to hear of a person changing the > > > comment on > > > schema public. To what do you change it? > > > > It was a while ago and I don't remember because it didn't appear in the > > dump so I stopped doing it. :( > > > > Mine was an actual comment, but there are some tools out there that > > (ab)use COMMENTs as crude metadata for what they do. For example: > > https://postgresql-anonymizer.readthedocs.io/en/stable/declare_masking_rules/#declaring-rules-with-comments > > I've attached a separate patch for this, which applies atop the ownership > patch. This makes more restores fail for non-superusers, which is okay. Oops, I botched a refactoring late in the development of that patch. Here's a fixed pair of patches. Author: Noah Misch Commit: Noah Misch Dump public schema ownership and security labels. As a side effect, this corrects dumps of public schema ACLs in databases where the DBA dropped and recreated that schema. Reviewed by FIXME. Discussion: https://postgr.es/m/20201229134924.ga1431...@rfd.leadboat.com diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c index 60d306e..ea67e52 100644 --- a/src/bin/pg_dump/dumputils.c +++ b/src/bin/pg_dump/dumputils.c @@ -725,6 +725,7 @@ void buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery, PQExpBuffer init_acl_subquery, PQExpBuffer init_racl_subquery, const char *acl_column, const char *acl_owner, + const char *initprivs_expr, const char *obj_kind, bool binary_upgrade) { /* @@ -765,23 +766,25 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery, "WITH ORDINALITY AS perm(acl,row_n) " "WHERE NOT EXISTS ( " "SELECT 1 FROM " - "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) " + "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) " "AS init(init_acl) WHERE acl = init_acl)) as foo)", acl_column, obj_kind, acl_owner, + initprivs_expr, obj_kind, acl_owner); printfPQExpBuffer(racl_subquery, "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM " "(SELECT acl, row_n FROM " - "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) " + "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) " "WITH ORDINALITY AS initp(acl,row_n) "
Re: public schema default ACL
I'm attaching the patch for $SUBJECT, which applies atop the four patches from the two other threads below. For convenience of testing, I've included a rollup patch, equivalent to applying all five patches. On Sat, Oct 31, 2020 at 09:35:18AM -0700, Noah Misch wrote: > More details on the semantics I'll use: > > 1. initdb will change like this: >@@ -1721 +1721 @@ setup_privileges(FILE *cmdfd) >- "GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n\n", >+ "GRANT USAGE ON SCHEMA public TO PUBLIC;\n\n", >+ "ALTER SCHEMA public OWNER TO DATABASE_OWNER;\n\n", (I ended up assigning the ownership via pg_namespace.dat, not here.) > 2. If schema public does not exist, pg_dump will emit nothing about it. This >is what happens today. (I suspect it would be better for pg_dump to emit >DROP SCHEMA public RESTRICT, but that is drifting offtopic for $SUBJECT.) >Otherwise, when dumping from v13 or earlier, pg_dump will always emit >REVOKE and/or GRANT statements to reproduce the old ACL. More precisely, it diffs the source database ownership and ACL to the v14 defaults, then emits ALTER, GRANT, and/or REVOKE as needed. That yields no GRANT or REVOKE if the source database is an early adopter of the new default. >When dumping from >v14 or later, pg_dump will use pg_init_privs to compute GRANT and REVOKE >statements, as it does today. (It doesn't actually use pg_init_privs, but the effect is similar.) >(This may interfere with cross-version >pg_upgrade testing. I haven't looked at how best to fix that. Perhaps add >more fix_sql in test.sh.) src/bin/pg_upgrade/test.sh doesn't need changes. Upgrades from 9.6 (the first version having pg_init_privs) or later get no new diffs. Upgrades from v8.4 or v9.5 to v14 have a relevant diff before or after this change. In master: -REVOKE ALL ON SCHEMA public FROM PUBLIC; -REVOKE ALL ON SCHEMA public FROM nm; -GRANT ALL ON SCHEMA public TO nm; -GRANT ALL ON SCHEMA public TO PUBLIC; After $SUBJECT: -REVOKE ALL ON SCHEMA public FROM PUBLIC; -REVOKE ALL ON SCHEMA public FROM nm; -GRANT ALL ON SCHEMA public TO nm; +REVOKE USAGE ON SCHEMA public FROM PUBLIC; GRANT ALL ON SCHEMA public TO PUBLIC; > 3. pg_upgrade from v13 to later versions will transfer template1's ACL for >schema public, even if that ACL was unchanged since v13 initdb. (This is >purely a consequence of the pg_dump behavior decision.) template0 will >keep the new default. > > 4. OWNER TO DATABASE_OWNER will likely be available for schemas only, though I >might propose it for all object classes if class-specific complexity proves >negligible. Class-specific complexity was negligible, so I made it available for all objects. The syntax is "OWNER TO pg_database_owner", because it's a special predefined role. That patch has its own thread: https://postgr.es/m/20201228043148.ga1053...@rfd.leadboat.com > 5. ALTER DATABASE OWNER TO changes access control decisions involving >nspowner==DATABASE_OWNER. Speed of nspacl checks is more important than >reacting swiftly to ALTER DATABASE OWNER TO. Sessions running concurrently >will be eventually-consistent with respect to the ALTER DATABASE. >(Existing access control decisions, too, allow this sort of anomaly.) > > 6. pg_dump hasn't been reproducing ALTER SCHEMA public OWNER TO. That's a >mild defect today, but it wouldn't be mild anymore. We'll need pg_dump of >v13 databases to emit "ALTER SCHEMA public OWNER TO postgres" and for a v14 >=> v15 upgrade to propagate that. This project can stand by itself; would >anyone else like to own it? 
That patch has its own thread: https://postgr.es/m/20201229134924.ga1431...@rfd.leadboat.com Changing this ACL caused 13 of 202 tests to fail in "make check". I first intended to modify tests as needed for that suite to keep the default ACL. For complicated cases, my strategy was to make a test create a schema and change search_path. However, that created large expected output diffs (e.g. ~120 lines in updatable_views.out), mostly in EXPLAIN and \d output bearing the schema name. I didn't want that kind of obstacle to future back-patched test updates, so I did make the first test install the old ACL. All other in-tree suites do test the new default. Thanks, nm Author: Noah Misch Commit: Noah Misch Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner. This switches the default ACL to what the documentation has recommended since CVE-2018-1058. Upgrades will carry forward any old ownership and ACL. Sites that declined the 2018 recommendation should take a fresh look. Recipes for commissioning a n
Re: increase size of pg_commit_ts buffers
On Fri, Jan 15, 2021 at 07:07:44PM -0300, Alvaro Herrera wrote: > I wrote this patch last year in response to a customer issue and I > thought I had submitted it here, but evidently I didn't. So here it is. > > The short story is: in commit 5364b357fb11 we increased the size of > pg_commit (née pg_clog) but we didn't increase the size of pg_commit_ts > to match. When commit_ts is in use, this can lead to significant buffer > thrashing and thus poor performance. > > Since commit_ts entries are larger than pg_commit, my proposed patch uses > twice as many buffers. This is a step in the right direction. With commit_ts entries being forty times as large as pg_xact, it's not self-evident that just twice as many buffers is appropriate. Did you try other numbers? I'm fine with proceeding even if not, but the comment should then admit that the new number was a guess that solved problems for one site. > --- a/src/backend/access/transam/commit_ts.c > +++ b/src/backend/access/transam/commit_ts.c > @@ -530,7 +530,7 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS) The comment right above here is outdated. > Size > CommitTsShmemBuffers(void) > { > - return Min(16, Max(4, NBuffers / 1024)); > + return Min(256, Max(4, NBuffers / 512)); > } > > /*
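To make the scale of the change concrete, here is the arithmetic of the two formulas at one example setting, shared_buffers = 128MB (NBuffers = 16384); the setting is only illustrative:
===
SELECT least(16,  greatest(4, 16384 / 1024)) AS old_commit_ts_buffers,  -- 16
       least(256, greatest(4, 16384 / 512))  AS new_commit_ts_buffers;  -- 32
===
Either way, the ceiling is what matters most on large systems; NBuffers / 512 already reaches the proposed cap of 256 at shared_buffers = 1GB.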
Re: We should stop telling users to "vacuum that database in single-user mode"
On Mon, Mar 01, 2021 at 04:32:23PM +0100, Hannu Krosing wrote: > It looks like we are unnecessarily instructing our usiers to vacuum their > databases in single-user mode when just vacuuming would be enough. > When I started looking at improving the situation I discovered, that there > already is no need to run VACUUM in single user mode in any currently > supported > PostgreSQL version as you can run VACUUM perfectly well when the wraparound > protection is active. A comment in SetTransactionIdLimit() says, "VACUUM requires an XID if it truncates at wal_level!=minimal." Hence, I think plain VACUUM will fail some of the time; VACUUM (TRUNCATE false) should be reliable. In general, I like your goal of replacing painful error message advice with less-painful advice. At the same time, it's valuable for the advice to reliably get the user out of the bad situation.
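For reference, a minimal sketch of the command form this suggests, i.e. a freeze-oriented vacuum that skips relation truncation and therefore should not need an XID per the SetTransactionIdLimit() comment quoted above:
===
VACUUM (FREEZE true, TRUNCATE false);
===
Whether that removes the need for single-user mode in every case is exactly the question under discussion; the point is only that the TRUNCATE option (available since v12) exists for sidestepping the XID requirement.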
Re: Some regular-expression performance hacking
On Sat, Feb 13, 2021 at 06:19:34PM +0100, Joel Jacobson wrote: > To test the correctness of the patches, > I thought it would be nice with some real-life regexes, > and just as important, some real-life text strings, > to which the real-life regexes are applied to. > > I therefore patched Chromium's v8 regexes engine, > to log the actual regexes that get compiled when > visiting websites, and also the text strings that > are the regexes are applied to during run-time > when the regexes are executed. > > I logged the regex and text strings as base64 encoded > strings to STDOUT, to make it easy to grep out the data, > so it could be imported into PostgreSQL for analytics. > > In total, I scraped the first-page of some ~50k websites, > which produced 45M test rows to import, > which when GROUP BY pattern and flags was reduced > down to 235k different regex patterns, > and 1.5M different text string subjects. It's great to see this kind of testing. Thanks for doing it.
Re: pg_upgrade fails with non-standard ACL
On Thu, Feb 11, 2021 at 08:16:30PM +0300, Anastasia Lubennikova wrote: > On 28.01.2021 09:55, Noah Misch wrote: > >On Wed, Jan 27, 2021 at 07:32:42PM +0300, Anastasia Lubennikova wrote: > >>On 27.01.2021 14:21, Noah Misch wrote: > >>>On Mon, Jan 25, 2021 at 10:14:43PM +0300, Anastasia Lubennikova wrote: > >>>>1) Could you please clarify, what do you mean by REVOKE failures? > >>>> > >>>>I tried following example: > >>>> > >>>>Start 9.6 cluster. > >>>> > >>>>REVOKE ALL ON function pg_switch_xlog() FROM public; > >>>>REVOKE ALL ON function pg_switch_xlog() FROM backup; > >>>> > >>>>The upgrade to the current master passes with and without patch. > >>>>It seems that current implementation of pg_upgrade doesn't take into > >>>>account > >>>>revoke ACLs. > >>>I think you can observe it by adding "revoke all on function > >>>pg_stat_get_subscription from public;" to > >>>test_add_acl_to_catalog_objects.sql > >>>and then rerunning your test script. pg_dump will reproduce that REVOKE, > >>>which would fail if postgresql.org had removed that function. That's > >>>fine, so > >>>long as a comment mentions the limitation. I've now tested exactly that. It didn't cause a test script failure, because pg_upgrade_ACL_test.sh runs "pg_upgrade --check" but does not run the final pg_upgrade (without --check). The failure happens only without --check. I've attached a modified version of your test script that reproduces the problem. It uses a different function, timenow(), so the test does not depend on test_rename_catalog_objects_v14. The attached script fails with this output: === Consult the last few lines of "pg_upgrade_dump_13325.log" for the probable cause of the failure. Failure, exiting === That log file contains: === command: "/tmp/workdir/postgresql_bin_test_new/bin/pg_dump" --host /home/nm/src/pg/backbranch/extra --port 50432 --username nm --schema-only --quote-all-identifiers --binary-upgrade --format=custom --file="pg_upgrade_dump_13325.custom" 'dbname=postgres' >> "pg_upgrade_dump_13325.log" 2>&1 command: "/tmp/workdir/postgresql_bin_test_new/bin/pg_restore" --host /home/nm/src/pg/backbranch/extra --port 50432 --username nm --clean --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_13325.custom" >> "pg_upgrade_dump_13325.log" 2>&1 pg_restore: connecting to database for restore pg_restore: dropping DATABASE PROPERTIES postgres pg_restore: dropping DATABASE postgres pg_restore: creating DATABASE "postgres" pg_restore: connecting to new database "postgres" pg_restore: creating COMMENT "DATABASE "postgres"" pg_restore: creating DATABASE PROPERTIES "postgres" pg_restore: connecting to new database "postgres" pg_restore: creating pg_largeobject "pg_largeobject" pg_restore: creating ACL "pg_catalog.FUNCTION "timenow"()" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 3047; 0 0 ACL FUNCTION "timenow"() nm pg_restore: error: could not execute query: ERROR: function pg_catalog.timenow() does not exist Command was: REVOKE ALL ON FUNCTION "pg_catalog"."timenow"() FROM PUBLIC; === > >>In the updated patch, I implemented generation of both GRANT ALL and REVOKE > >>ALL for problematic objects. If I understand it correctly, these calls will > >>clean object's ACL completely. And I see no harm in doing this, because the > >>objects don't exist in the new cluster anyway. 
Here's fix_system_objects_ACL.sql with your v14 test script: === \connect postgres GRANT ALL ON function pg_catalog.pg_stat_get_subscription(pg_catalog.oid) TO "backup" ; REVOKE ALL ON function pg_catalog.pg_stat_get_subscription(pg_catalog.oid) FROM "backup" ; GRANT ALL ON pg_catalog.pg_stat_subscription TO "backup" ; REVOKE ALL ON pg_catalog.pg_stat_subscription FROM "backup" ; GRANT ALL ON pg_catalog.pg_subscription TO "backup","test" ; REVOKE ALL ON pg_catalog.pg_subscription FROM "backup","test" ; GRANT ALL (subenabled) ON pg_catalog.pg_subscription TO "backup" ; REVOKE ALL (subenabled) ON pg_catalog.pg_subscription FROM "backup" ; === Considering the REVOKE statements, those new GRANT statements have no effect. To prevent the final pg_upgrade failure, fix_system_objects_ACL.sql would need "GRANT ALL ON FUNCTION pg_catalog.pg_stat_get_s
Re: Procedures versus the "fastpath" API
On Wed, Mar 10, 2021 at 10:03:24AM +0100, Laurenz Albe wrote: > On Tue, 2021-03-09 at 14:15 -0500, Tom Lane wrote: > > As for procedures, I'm of the opinion that we should just reject those > > too, but some other security team members were not happy with that > > idea. Conceivably we could attempt to make the case actually work, > > but is it worth the trouble? Given the lack of field complaints > > about the "invalid transaction termination" failure, it seems unlikely > > that it's occurred to anyone to try to call procedures this way. > > We'd need special infrastructure to test the case, too, since psql > > offers no access to fastpath calls. > > > > A compromise suggestion was to prohibit calling procedures via > > fastpath as of HEAD, but leave existing releases as they are, > > in case anyone is using a case that happens to work. (That was my suggestion.) > The "invalid transaction termination" failure alone doesn't > worry or surprise me - transaction handling in procedures only works > under rather narrow conditions anyway (no SELECT on the call stack, > no explicit transaction was started outside the procedure). > > If that is the only problem, I'd just document it. The hard work is > of course that there is no other problem with calling procedures that > way. If anybody wants to do that work, and transaction handling is > the only thing that doesn't work with the fastpath API, we can call > it supported and document the exception. I'd be fine with that, too. > In case of doubt, I would agree with you and forbid it in HEAD > as a corner case with little real-world use. The PQfn(some-procedure) feature has no known bugs and no known users, so I think the decision carries little weight. Removing the feature would look wise if we later discover some hard-to-fix bug therein. Removal also obeys the typical pattern that a given parse context accepts either procedures or functions, not both. Keeping the feature, at least in back branches, would look wise if it avoids urgent s/PQfn/PQexecParams/ work for some user trying to update to 13.3.
Re: pg_amcheck contrib application
On Sat, Mar 13, 2021 at 01:07:15AM -0500, Tom Lane wrote: > I wrote: > >> ... btw, prairiedog (which has a rather old Perl) has a > >> different complaint: > >> Invalid type 'q' in unpack at t/004_verify_heapam.pl line 104. > > > Hmm ... "man perlfunc" on that system quoth > >q A signed quad (64-bit) value. > >Q An unsigned quad value. > > (Quads are available only if your system supports > > 64-bit > > integer values _and_ if Perl has been compiled to > > support those. > > Causes a fatal error otherwise.) > > It does not seem unreasonable for us to rely on Perl having that > > in 2021, so I'll see about upgrading this perl installation. > > Hm, wait a minute: hoverfly is showing the same failure, even though > it claims to be running a 64-bit Perl. Now I'm confused. On that machine: [nm@power8-aix 7:0 2021-03-13T06:09:08 ~ 0]$ /usr/bin/perl64 -e 'unpack "q", ""' [nm@power8-aix 7:0 2021-03-13T06:09:10 ~ 0]$ /usr/bin/perl -e 'unpack "q", ""' Invalid type 'q' in unpack at -e line 1. hoverfly does configure with PERL=perl64. /usr/bin/prove is from the 32-bit Perl, so I suspect the TAP suites get 32-bit Perl that way. (There's no "prove64".) This test should unpack the field as two 32-bit values, not a 64-bit value, to avoid requiring more from the Perl installation.
Re: pg_amcheck contrib application
On Sat, Mar 13, 2021 at 01:36:11AM -0500, Tom Lane wrote: > Mark Dilger writes: > > On Mar 12, 2021, at 10:22 PM, Tom Lane wrote: > >> Coping with both endiannesses might be painful. > > > Not too bad if the bigint value is zero, as both the low and high 32bits > > will be zero, regardless of endianness. The question is whether that gives > > up too much in terms of what the test is trying to do. I'm not sure that > > it does, but if you'd rather solve this by upgrading perl, that's ok by me. > > I don't mind updating the perl installations on prairiedog and gaur, > but Noah might have some difficulty with his AIX flotilla, as I believe > he's not sysadmin there. The AIX animals have Perl v5.28.1. hoverfly, in particular, got a big update package less than a month ago. Hence, I doubt it's a question of applying routine updates. The puzzle would be to either (a) compile a 32-bit Perl that handles unpack('q') or (b) try a PostgreSQL configuration like "./configure ... PROVE='perl64 /usr/bin/prove --'" to run the TAP suites under perl64. (For hoverfly, it's enough to run "prove" under $PERL. mandrill, however, needs a 32-bit $PERL for plperl, regardless of what it needs for "prove".) Future AIX packagers would face doing the same. With v5-0001-pg_amcheck-continuing-to-fix-portability-problems.patch being so self-contained, something like it is the way to go.
Re: pg_amcheck contrib application
On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: > > On Mar 12, 2021, at 3:24 PM, Mark Dilger > > wrote: > > > > and the second deals with an apparent problem with IPC::Run shell expanding > > an asterisk on some platforms but not others. That second one, if true, > > seems like a problem with scope beyond the pg_amcheck project, as > > TestLib::command_checks_all uses IPC::Run, and it would be desirable to > > have consistent behavior across platforms. > > One of pg_amcheck's regression tests was passing an asterisk through > > TestLib's command_checks_all() command, which gets through to > > pg_amcheck without difficulty on most platforms, but appears to get > > shell expanded on Windows (jacana) and AIX (hoverfly). For posterity, I can't reproduce this on hoverfly. The suite fails the same way at f371a4c and f371a4c^. More-recently (commit 58f5749), the suite passes even after reverting f371a4c. Self-contained IPC::Run usage also does not corroborate the theory: [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e 'IPC::Run::run "printf", "%s\n", "*"' * [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e 'IPC::Run::run "sh", "-c", "printf %sn *"' COPYRIGHT GNUmakefile.in HISTORY Makefile README README.git aclocal.m4 config configure configure.ac contrib doc src > there is a similar symptom caused by an unrelated problem > Subject: [PATCH v3] Avoid use of non-portable option ordering in > command_checks_all(). On a glibc system, "env POSIXLY_CORRECT=1 make check ..." tests this.
Re: pg_amcheck contrib application
On Sat, Mar 13, 2021 at 10:51:27AM -0800, Mark Dilger wrote: > > On Mar 13, 2021, at 10:46 AM, Noah Misch wrote: > > On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: > >>> On Mar 12, 2021, at 3:24 PM, Mark Dilger > >>> wrote: > >>> and the second deals with an apparent problem with IPC::Run shell > >>> expanding an asterisk on some platforms but not others. That second one, > >>> if true, seems like a problem with scope beyond the pg_amcheck project, > >>> as TestLib::command_checks_all uses IPC::Run, and it would be desirable > >>> to have consistent behavior across platforms. > > > >>> One of pg_amcheck's regression tests was passing an asterisk through > >>> TestLib's command_checks_all() command, which gets through to > >>> pg_amcheck without difficulty on most platforms, but appears to get > >>> shell expanded on Windows (jacana) and AIX (hoverfly). > > > > For posterity, I can't reproduce this on hoverfly. The suite fails the same > > way at f371a4c and f371a4c^. More-recently (commit 58f5749), the suite > > passes > > even after reverting f371a4c. Self-contained IPC::Run usage also does not > > corroborate the theory: > > > > [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e > > 'IPC::Run::run "printf", "%s\n", "*"' > > * > > [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e > > 'IPC::Run::run "sh", "-c", "printf %sn *"' > > COPYRIGHT > > GNUmakefile.in > > HISTORY > > Makefile > > README > > README.git > > aclocal.m4 > > config > > configure > > configure.ac > > contrib > > doc > > src > The reason I suspected that passing the '*' through IPC::Run had to do with > the error that pg_amcheck gave. It complained that too many arguments where > being passed to it, and that the first such argument was "pg_amcheck.c". > There's no reason pg_amcheck should know it's source file name, nor that the > regression test should know that, which suggested that the asterisk was being > shell expanded within the src/bin/pg_amcheck/ directory and the file listing > was being passed into pg_amcheck as arguments. I agree. I can reproduce the problem on Windows. Commit f371a4c fixed http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-03-12%2020%3A12%3A44 and I see logs of that kind of failure only on fairywren and jacana.
Re: Tying an object's ownership to datdba
A cfbot failure showed I had missed ORDER BY in some test queries. On Sun, Dec 27, 2020 at 08:31:48PM -0800, Noah Misch wrote: > I ended up blocking DDL that creates role memberships involving the new role; > see reasons in user.c comments. Lifting those restrictions looked feasible, > but it was inessential to the mission, and avoiding unintended consequences > would have been tricky. Later, I pondered the case of pg_database_owner owning a shared object (database or tablespace). The behavior is certainly odd. For a tablespace, any database owner can act as the tablespace owner (but only when connected to a database that the role owns). For a database, likewise. When connected to a database having datdba=pg_database_owner, no particular role acts as the owner, just superusers and SECURITY DEFINER functions owned by pg_database_owner. I don't have high hopes for that being useful, but I couldn't quite convince myself to ban it. Attached v2 does expand an AddRoleMems() comment to discuss this, though. Author: Noah Misch Commit: Noah Misch Merge similar algorithms into roles_is_member_of(). The next commit would have complicated two or three algorithms, so take this opportunity to consolidate. No functional changes. Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index fe6c444..1adacb9 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -50,32 +50,24 @@ typedef struct /* * We frequently need to test whether a given role is a member of some other * role. In most of these tests the "given role" is the same, namely the - * active current user. So we can optimize it by keeping a cached list of - * all the roles the "given role" is a member of, directly or indirectly. - * - * There are actually two caches, one computed under "has_privs" rules - * (do not recurse where rolinherit isn't true) and one computed under - * "is_member" rules (recurse regardless of rolinherit). + * active current user. So we can optimize it by keeping cached lists of all + * the roles the "given role" is a member of, directly or indirectly. * * Possibly this mechanism should be generalized to allow caching membership * info for multiple roles? * - * The has_privs cache is: - * cached_privs_role is the role OID the cache is for. - * cached_privs_roles is an OID list of roles that cached_privs_role - * has the privileges of (always including itself). - * The cache is valid if cached_privs_role is not InvalidOid. - * - * The is_member cache is similarly: - * cached_member_role is the role OID the cache is for. - * cached_membership_roles is an OID list of roles that cached_member_role - * is a member of (always including itself). - * The cache is valid if cached_member_role is not InvalidOid. + * Each element of cached_roles is an OID list of constituent roles for the + * corresponding element of cached_role (always including the cached_role + * itself). One cache has ROLERECURSE_PRIVS semantics, and the other has + * ROLERECURSE_MEMBERS semantics. 
*/ -static Oid cached_privs_role = InvalidOid; -static List *cached_privs_roles = NIL; -static Oid cached_member_role = InvalidOid; -static List *cached_membership_roles = NIL; +enum RoleRecurseType +{ + ROLERECURSE_PRIVS = 0, /* recurse if rolinherit */ + ROLERECURSE_MEMBERS = 1 /* recurse unconditionally */ +}; +static Oid cached_role[] = {InvalidOid, InvalidOid}; +static List *cached_roles[] = {NIL, NIL}; static const char *getid(const char *s, char *n); @@ -4675,8 +4667,8 @@ initialize_acl(void) { /* * In normal mode, set a callback on any syscache invalidation of rows -* of pg_auth_members (for each AUTHMEM search in this file) or -* pg_authid (for has_rolinherit()) +* of pg_auth_members (for roles_is_member_of()) or pg_authid (for +* has_rolinherit()) */ CacheRegisterSyscacheCallback(AUTHMEMROLEMEM, RoleMembershipCacheCallback, @@ -4695,8 +4687,8 @@ static void RoleMembershipCacheCallback(Datum arg, int cacheid, uint32 hashvalue) { /* Force membership caches to be recomputed on next use */ - cached_privs_role = InvalidOid; - cached_member_role = InvalidOid; + cached_role[ROLERECURSE_PRIVS] = InvalidOid; + cached_role[ROLERECURSE_MEMBERS] = InvalidOid; } @@ -4718,113 +4710,33 @@ has_rolinherit(Oid roleid) /* - * Get a list of roles that the specified roleid has the privileges of - * - * This is defined not to recurse through roles that don't have rolinherit - * set; for such roles, membership implies the abi
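As an illustration of the shared-object case discussed above, here is a sketch of the semantics described, not a test of the patch; the tablespace name and path are hypothetical:
===
CREATE TABLESPACE shared_ts LOCATION '/srv/pgts';
ALTER TABLESPACE shared_ts OWNER TO pg_database_owner;
-- Any role that owns the current database now passes owner checks on shared_ts,
-- but only while connected to a database it owns; in other databases, it does not.
===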
Re: pg_amcheck contrib application
On Mon, Mar 15, 2021 at 02:57:20PM -0400, Tom Lane wrote: > Mark Dilger writes: > > On Mar 15, 2021, at 10:04 AM, Tom Lane wrote: > >> These animals have somewhat weird alignment properties: MAXALIGN is 8 > >> but ALIGNOF_DOUBLE is only 4. I speculate that that is affecting their > >> choices about whether an out-of-line TOAST value is needed, breaking > >> this test case. That machine also has awful performance for filesystem metadata operations, like open(O_CREAT). Its CPU and read()/write() performance are normal. > > The logic in verify_heapam only looks for a value in the toast table if > > the tuple it gets from the main table (in this case, from pg_statistic) > > has an attribute that claims to be toasted. The error message we're > > seeing that you quoted above simply means that no entry exists in the > > toast table. > > Yeah, that could be phrased better. Do we have a strong enough lock on > the table under examination to be sure that autovacuum couldn't remove > a dead toast entry before we reach it? But this would only be an > issue if we are trying to check validity of toasted fields within > known-dead tuples, which I would argue we shouldn't, since lock or > no lock there's no guarantee the toast entry is still there. > > Not sure that I believe the theory that this is from bad luck of > concurrent autovacuum timing, though. With autovacuum_naptime=1s, on hornet, the failure reproduced twice in twelve runs. With v6-0001-Turning-off-autovacuum-during-corruption-tests.patch applied, 196 runs all succeeded. > The fact that we're seeing > this on just those two animals suggests strongly to me that it's > architecture-correlated, instead. That is possible.
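For completeness, one way to apply the setting used in those reproduction runs (a sketch; a buildfarm animal would normally carry this in its extra_config instead):
===
ALTER SYSTEM SET autovacuum_naptime = '1s';
SELECT pg_reload_conf();
===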
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
On Tue, Oct 27, 2020 at 07:14:14AM +, Hubert Zhang wrote: > Libpq has supported to specify multiple hosts in connection string and enable > auto failover when the previous PostgreSQL instance cannot be accessed. > But when I tried to enable this feature for a non-hot standby, it cannot do > the failover with the following messages. > > psql: error: could not connect to server: FATAL: the database system is > starting up > > Document says ' If a connection is established successfully, but > authentication fails, the remaining hosts in the list are not tried.' > I'm wondering is it a feature by design or a bug? If it's a bug, I plan to > fix it. I felt it was a bug, but the community as a whole may or may not agree: https://postgr.es/m/flat/16508-1a63222835164566%40postgresql.org
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote: > On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote: > > On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote: > > > On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote: > > > > Noah Misch writes: > > > > > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote: > > > > >> So I think what we're actually trying to accomplish here is to > > > > >> ensure that instead of deleting up to half of the SLRU space > > > > >> before the cutoff, we delete up to half-less-one-segment. > > > > >> Maybe it should be half-less-two-segments, just to provide some > > > > >> cushion against edge cases. Reading the first comment in > > > > >> SetTransactionIdLimit makes one not want to trust too much in > > > > >> arguments based on the exact value of xidWrapLimit, while for > > > > >> the other SLRUs it was already unclear whether the edge cases > > > > >> were exactly right. > > > > > > > > > That could be interesting insurance. While it would be sad for us to > > > > > miss an > > > > > edge case and print "must be vacuumed within 2 transactions" when > > > > > wrap has > > > > > already happened, reaching that message implies the DBA burned ~1M > > > > > XIDs, all > > > > > in single-user mode. More plausible is FreezeMultiXactId() > > > > > overrunning the > > > > > limit by tens of segments. Hence, if we do buy this insurance, let's > > > > > skip far > > > > > more segments. For example, instead of unlinking segments > > > > > representing up to > > > > > 2^31 past XIDs, we could divide that into an upper half that we > > > > > unlink and a > > > > > lower half. The lower half will stay in place; eventually, XID > > > > > consumption > > > > > will overwrite it. Truncation behavior won't change until the region > > > > > of CLOG > > > > > for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold, > > > > > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. > > > > > CLOG > > > > > maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile? > > > > > > Temporarily wasting some disk > > > > space is a lot more palatable than corrupting data, and these code > > > > paths are necessarily not terribly well tested. So +1 for more > > > > insurance. > > > > > > Okay, I'll give that a try. I expect this will replace the PagePrecedes > > > callback with a PageDiff callback such that PageDiff(a, b) < 0 iff > > > PagePrecedes(a, b). PageDiff callbacks shall distribute return values > > > uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments > > > where INT_MIN/2 < PageDiff(candidate, cutoff) < 0. > > > > While doing so, I found that slru-truncate-modulo-v2.patch did get edge > > cases > > wrong, as you feared. In particular, if the newest XID reached xidStopLimit > > and was in the first page of a segment, TruncateCLOG() would delete its > > segment. Attached slru-truncate-modulo-v3.patch fixes that; as > > restitution, I > > added unit tests covering that and other scenarios. Reaching the bug via > > XIDs > > was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in > > single-user mode. I expect the bug was easier to reach via pg_multixact. > > > > The insurance patch stacks on top of the bug fix patch. It does have a > > negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest > > to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives > > nonsense answers if "future" segments exist. 
That can happen today, but the > > patch creates new ways to make it happen. The symptom is wasting yet more > > space in pg_multixact. I am okay with this, since it arises only after one > > fills pg_multixact 50% full. There are alternatives. We could weaken the > > corruption defense in TruncateMultiXact() or look for another implementation > > of equivalent defense. We could unlink, say, 75% or 95% of the "past" > > instead > > of 50% (this patch) or >99.99% (today's behavior). > > Rebased the second patch. The first patch did not need
Re: public schema default ACL
On Thu, Aug 06, 2020 at 12:48:17PM +0200, Magnus Hagander wrote: > On Mon, Aug 3, 2020 at 5:26 PM Stephen Frost wrote: > > * Noah Misch (n...@leadboat.com) wrote: > > > Between (b)(2)(X) and (b)(3)(X), what are folks' preferences? Does anyone > > > strongly favor some other option (including the option of changing > > > nothing) > > > over both of those two? > > > > Having the database owner own the public schema makes the most sense to > > me- that this doesn't happen today has always seemed a bit odd to me as, > > notionally, you'd imagine the "owner" of a database as, well, owning the > > objects in that database (clearly they shouldn't actually own system > > catalogs or functions or such, but the public schema isn't some internal > > thing like the system catalogs and such). Having the database owner not > > have to jump through hoops to create objects immediately upon connection > > to a new database also seems like it reduces the compatibility impact > > that this will have. > > > > +1. This feels mostly like a weird quirk in the current system. Having the > database owner own it would feel a lot more logical. > > > In general, I'm still in favor of the overall change and moving to > > better and more secure defaults. > > +. (b)(2)(X) got no votes. (b)(3)(X) got votes from Stephen Frost and Magnus Hagander. I'll pick it, too. Peter Eisentraut did not vote, but I'm counting him as +0.2 for it in light of this comment: On Mon, Aug 03, 2020 at 07:46:02PM +0200, Peter Eisentraut wrote: > The important things in my mind are that you keep an easy onboarding > experience (you can do SQL things without having to create and unlock a > bunch of things first) and that advanced users can do the things they want > to do *somehow*. Robert Haas did not vote, but this seems more consistent with a request to wait for better ideas and change nothing for now: On Mon, Aug 03, 2020 at 09:46:23AM -0400, Robert Haas wrote: > I don't think we have any options here that are secure but do not > break backward compatibility. Overall, that's 3.2 votes for (b)(3)(X) and 0.0 to 1.0 votes for changing nothing. That suffices to proceed with (b)(3)(X). However, given the few votes and the conspicuous non-responses, work in this area has a high risk of failure. Hence, I will place it at a low-priority position in my queue. Would anyone else would like to take over implementation? More details on the semantics I'll use: 1. initdb will change like this: @@ -1721 +1721 @@ setup_privileges(FILE *cmdfd) -"GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n\n", +"GRANT USAGE ON SCHEMA public TO PUBLIC;\n\n", +"ALTER SCHEMA public OWNER TO DATABASE_OWNER;\n\n", 2. If schema public does not exist, pg_dump will emit nothing about it. This is what happens today. (I suspect it would be better for pg_dump to emit DROP SCHEMA public RESTRICT, but that is drifting offtopic for $SUBJECT.) Otherwise, when dumping from v13 or earlier, pg_dump will always emit REVOKE and/or GRANT statements to reproduce the old ACL. When dumping from v14 or later, pg_dump will use pg_init_privs to compute GRANT and REVOKE statements, as it does today. (This may interfere with cross-version pg_upgrade testing. I haven't looked at how best to fix that. Perhaps add more fix_sql in test.sh.) 3. pg_upgrade from v13 to later versions will transfer template1's ACL for schema public, even if that ACL was unchanged since v13 initdb. (This is purely a consequence of the pg_dump behavior decision.) template0 will keep the new default. 4. 
OWNER TO DATABASE_OWNER will likely be available for schemas only, though I might propose it for all object classes if class-specific complexity proves negligible. 5. ALTER DATABASE OWNER TO changes access control decisions involving nspowner==DATABASE_OWNER. Speed of nspacl checks is more important than reacting swiftly to ALTER DATABASE OWNER TO. Sessions running concurrently will be eventually-consistent with respect to the ALTER DATABASE. (Existing access control decisions, too, allow this sort of anomaly.) 6. pg_dump hasn't been reproducing ALTER SCHEMA public OWNER TO. That's a mild defect today, but it wouldn't be mild anymore. We'll need pg_dump of v13 databases to emit "ALTER SCHEMA public OWNER TO postgres" and for a v14 => v15 upgrade to propagate that. This project can stand by itself; would anyone else like to own it? Thanks, nm
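Spelled out as SQL, the practical effect of the initdb default sketched in item 1 above would look roughly like this for a freshly created v14-style database (role and table names are hypothetical):
===
-- A role that only queries existing objects keeps working unchanged:
SET ROLE app_reader;
SELECT * FROM public.some_table;   -- still allowed, subject to the table's own ACL

-- Creating in public now requires the database owner, or a grant from it:
SET ROLE app_writer;
CREATE TABLE public.t (c int);     -- fails unless the database owner granted CREATE
RESET ROLE;
GRANT CREATE ON SCHEMA public TO app_writer;  -- issued by the database owner or a superuser
===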
Re: public schema default ACL
On Mon, Nov 02, 2020 at 12:42:26PM -0500, Tom Lane wrote: > Robert Haas writes: > > On Mon, Nov 2, 2020 at 5:51 AM Peter Eisentraut > > wrote: > >> I'm not convinced, however, that this would would really move the needle > >> in terms of the general security-uneasiness about the public schema and > >> search paths. AFAICT, in any of your proposals, the default would still > >> be to have the public schema world-writable and in the path. > > > Noah's proposed change to initdb appears to involve removing CREATE > > permission by default, so I don't think this is true. > > I assume that means removing *public* CREATE permissions, not the > owner's (which'd be the DB owner with the proposed changes). My plan is for the default to become: GRANT USAGE ON SCHEMA public TO PUBLIC; ALTER SCHEMA public OWNER TO DATABASE_OWNER; -- new syntax Hence, the dbowner can create objects in the schema or grant that ability to others. Anyone can e.g. SELECT/UPDATE tables in the schema or call functions in the schema, subject to per-table/per-function ACLs. ACK that this wasn't explicit on the thread until a few days ago. I kept universal USAGE because the schema wouldn't be very "public" without that. > > It's hard to predict how many users that might inconvenience, but I > > suppose it's probably a big number. On the other hand, the only > > alternative is to continue shipping a configuration that, by default, > > is potentially insecure. It's hard to decide which thing we should > > care more about. > > Yeah. The thing is, if we make it harder to create stuff in "public", > that's going to result in the path-of-least-resistance being to run > everything as the DB owner. Which is better than running everything as > superuser (at least if DB owner != postgres), but still not exactly great. > Second least difficult thing is to re-grant public CREATE permissions, > putting things right back where they were. That is factual; whenever a strategy is easier to start than its alternatives, folks will overconsume that strategy. One can mitigate that by introducing artificial obstacles to use of the discouraged strategy, but that will tend to harm the onboarding experience. Option (b)(2)(X) would have done that. When folks end up creating all objects as the database owner, we still get the win for roles that don't create permanent objects. It's decently common to have an app run as a user that queries existing permanent objects, not issuing permanent-object DDL. That works under both v13 and future defaults. > What potentially could move the needle is separate search paths for > relation lookup and function/operator lookup. We have sort of stuck > our toe in that pond already by discriminating against pg_temp for > function/operator lookup, but we could make that more formalized and > controllable if there were distinct settings. I'm not sure offhand > how much of a compatibility problem that produces. Stephen raised a good point about this. Separately, regarding compatibility, suppose a v13 database has: CREATE FUNCTION f() RETURNS int LANGUAGE sql SECURITY DEFINER AS 'SELECT inner_f()' SET search_path = a, b; For compatibility, no value of the function-search-path setting should break this function's ability to find a.inner_f(void). Which definition of function-search-path achieves this?
Re: Rethinking LOCK TABLE's behavior on views
On Sat, Nov 07, 2020 at 11:57:20AM -0500, Tom Lane wrote: > The problems discussed in bug #16703 [1] show that pg_dump needs a > version of LOCK TABLE that behaves differently for views than > what we have now. Since v11, LOCK TABLE on a view recurses to all > tables and views named in the view, and it does so using the view > owner's permissions, meaning that a view that would have permissions > failures if executed will also have permissions failures when locked. > That's probably fine for ordinary usage, but it's disastrous for > pg_dump --- even a superuser can't lock such a view. > > Moreover, pg_dump doesn't really need the recursive behavior. It just > needs the view's definition to hold still; and in any case, a typical > pg_dump run would have independently acquired locks on all the other > relations anyway. The recursion is buying us nothing, except perhaps > an increased risk of deadlocks against concurrent DDL operations. The getTables() locking aims to take the locks that will be taken later. That avoids failing after expensive work. For views, the later lock-taker is pg_get_viewdef(), which locks more than just the view but less than[2] LOCK TABLE. Recursion buys us more than nothing for "pg_dump --table=viewname", so abandoning recursion unconditionally is a step in the wrong direction. I don't expect --table to be as excellent as complete dumps, but a change that makes it worse does lose points. I want to keep the recursion. > (I'm not quite sure if that's significant, given that pg_dump pays > no attention to the order in which it locks things. But it sure as > heck isn't *decreasing* the risk; and it's a behavior that we could > not hope to improve with more smarts about pg_dump's lock ordering.) Reordering to avoid deadlocks would be best-effort, so it's fine not to have full control over the order. > Closely related to this is whether pg_dump ought to be using ONLY for > locking regular tables too. I tend to think that it should be, again > on the grounds that any child tables we may be interested in will get > locked separately, so that we're not doing anything by recursing except > expending extra cycles and perhaps increasing the chance of a deadlock. Agreed. "pg_dump --table=inheritance_parent" never queries inheritance children, so it's nice not to lock them. > A completely different approach we could consider is to weaken the > permissions requirements for LOCK on a view, say "allow it if either > the calling user or the view owner has the needed permission". This > seems generally pretty messy and so I don't much like it, but we > should consider as many solutions as we can think of. This is the best of what you've listed by a strong margin, and I don't know of better options you've not listed. +1 for it. Does it work for you? I think the mess arises from LOCK TABLE serving "get locks sufficient for $ACTIONS" as a family of use cases. For views only, different $ACTIONS want different behavior. $ACTIONS==SELECT wants today's behavior; pg_get_viewdef() wants shallower recursion and caller permissions; DROP VIEW wants no recursion. > [1] > https://www.postgresql.org/message-id/flat/16703-e348f58aab3cf6cc%40postgresql.org [2] For example, pg_get_viewdef('pg_user') locks pg_shadow, but "LOCK TABLE pg_user" additionally locks pg_authid and pg_db_role_setting.
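To make the failure mode concrete, here is a rough sketch of the kind of scenario bug #16703 describes, as I understand the v11+ behavior; the object and role names are hypothetical:
===
CREATE ROLE view_owner;
CREATE TABLE base_t (c int);
GRANT SELECT ON base_t TO view_owner;
SET ROLE view_owner;
CREATE VIEW v AS SELECT c FROM base_t;
RESET ROLE;
REVOKE SELECT ON base_t FROM view_owner;

BEGIN;
-- Even a superuser now gets a permission failure here, because LOCK recurses
-- into base_t using view_owner's privileges:
LOCK TABLE v IN ACCESS SHARE MODE;
ROLLBACK;
===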
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Wed, Oct 28, 2020 at 09:01:59PM -0700, Noah Misch wrote: > On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote: > > On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote: > > > [last non-rebase change] > > > > Rebased the second patch. The first patch did not need a rebase. > > Rebased both patches, necessitated by commit dee663f changing many of the same > spots. Rebased both patches, necessitated by commit c732c3f (a repair of commit dee663f). As I mentioned on another branch of the thread, I'd be content if someone reviews the slru-truncate-modulo patch and disclaims knowledge of the slru-truncate-insurance patch; I would then abandon the latter patch. Author: Noah Misch Commit: Noah Misch Prevent excess SimpleLruTruncate() deletion. Every core SLRU wraps around. With the exception of pg_notify, the wrap point can fall in the middle of a page. Account for this in the PagePrecedes callback specification and in SimpleLruTruncate()'s use of said callback. Update each callback implementation to fit the new specification. This changes SerialPagePrecedesLogically() from the style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes(). (Whereas pg_clog and pg_serial share a key space, pg_serial is nothing like pg_notify.) The bug fixed here has the same symptoms and user followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch to 9.5 (all supported versions). Reviewed by Tom Lane. Discussion: https://postgr.es/m/20190202083822.gc32...@gust.leadboat.com diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 034349a..55bdac4 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -694,6 +694,7 @@ CLOGShmemInit(void) SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, SYNC_HANDLER_CLOG); + SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE); } /* @@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid) /* - * Decide which of two CLOG page numbers is "older" for truncation purposes. + * Decide whether a CLOG page number is "older" for truncation purposes. * * We need to use comparison of TransactionIds here in order to do the right - * thing with wraparound XID arithmetic. However, if we are asked about - * page number zero, we don't want to hand InvalidTransactionId to - * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So, - * offset both xids by FirstNormalTransactionId to avoid that. + * thing with wraparound XID arithmetic. However, TransactionIdPrecedes() + * would get weird about permanent xact IDs. So, offset both such that xid1, + * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is + * relevant to page 0 and to the page preceding page 0. + * + * The page containing oldestXact-2^31 is the important edge case. The + * portion of that page equaling or following oldestXact-2^31 is expendable, + * but the portion preceding oldestXact-2^31 is not. When oldestXact-2^31 is + * the first XID of a page and segment, the entire page and segment is + * expendable, and we could truncate the segment. Recognizing that case would + * require making oldestXact, not just the page containing oldestXact, + * available to this callback. The benefit would be rare and small, so we + * don't optimize that edge case. 
*/ static bool CLOGPagePrecedes(int page1, int page2) @@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2) TransactionId xid2; xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE; - xid1 += FirstNormalTransactionId; + xid1 += FirstNormalTransactionId + 1; xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE; - xid2 += FirstNormalTransactionId; + xid2 += FirstNormalTransactionId + 1; - return TransactionIdPrecedes(xid1, xid2); + return (TransactionIdPrecedes(xid1, xid2) && + TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1)); } diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 2fe551f..ae45777 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -557,6 +557,7 @@ CommitTsShmemInit(void) CommitTsSLRULock, "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER, SYNC_HANDLER_COMMIT_TS); + SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE); commitTsShared = ShmemInitStruct("CommitTs shared",
Re: Temporary tables versus wraparound... again
On Sun, Nov 08, 2020 at 06:19:57PM -0500, Greg Stark wrote: > However in the case of ON COMMIT DELETE ROWS we can do better. Why not > just reset the relfrozenxid and other stats as if the table was > freshly created when it's truncated? That concept is sound. > 1) Should we update relpages and reltuples. I think so but an argument > could be made that people might be depending on running analyze once > when the data is loaded and then not having to run analyze on every > data load. I'd wager no, we should not. An app that ever analyzes an ON COMMIT DELETE ROWS temp table probably analyzes it every time. If not, it's fair to guess that similar statistics recur in each xact. > 2) adding the dependency on heapam.h to heap.c makes sense because of > heap_inplace_update but it may be a bit annoying because I suspect > that's a useful sanity check that the tableam stuff hasn't been > bypassed That is not terrible. How plausible would it be to call vac_update_relstats() for this, instead of reimplementing part of it? > 3) I added a check to the regression tests but I'm not sure it's a > good idea to actually commit this. It could fail if there's a parallel > transaction going on and even moving the test to the serial schedule > might not guarantee that never happens due to autovacuum running > analyze? Right. > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel) > > /* Truncate the underlying relation */ > table_relation_nontransactional_truncate(rel); > + ResetVacStats(rel); I didn't test, but I expect this will cause a stats reset for the second TRUNCATE here: CREATE TABLE t (); ... BEGIN; TRUNCATE t; TRUNCATE t; -- inplace relfrozenxid reset ROLLBACK; -- inplace reset survives Does that indeed happen?
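One way to observe that, assuming the patch is applied (t is the table from the example above):

SELECT relfrozenxid, relpages, reltuples FROM pg_class WHERE relname = 't';  -- before
BEGIN;
TRUNCATE t;
TRUNCATE t;
ROLLBACK;
SELECT relfrozenxid, relpages, reltuples FROM pg_class WHERE relname = 't';  -- after; a change here would show the inplace reset surviving the aborted transaction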
Re: Rethinking LOCK TABLE's behavior on views
On Mon, Nov 09, 2020 at 11:42:33AM -0300, Alvaro Herrera wrote: > On 2020-Nov-07, Noah Misch wrote: > > On Sat, Nov 07, 2020 at 11:57:20AM -0500, Tom Lane wrote: > > > A completely different approach we could consider is to weaken the > > > permissions requirements for LOCK on a view, say "allow it if either > > > the calling user or the view owner has the needed permission". This > > > seems generally pretty messy and so I don't much like it, but we > > > should consider as many solutions as we can think of. > > > > This is the best of what you've listed by a strong margin, and I don't know > > of > > better options you've not listed. +1 for it. Does it work for you? > > It does sound attractive from a user complexity perspective, even if it > does sound messy from an implementation perspective. > > > I think > > the mess arises from LOCK TABLE serving "get locks sufficient for $ACTIONS" > > as > > a family of use cases. For views only, different $ACTIONS want different > > behavior. $ACTIONS==SELECT wants today's behavior; pg_get_viewdef() wants > > shallower recursion and caller permissions; DROP VIEW wants no recursion. > > Maybe we can tackle this problem directly, by adding a clause to LOCK > TABLE to indicate a purpose for the lock that the server can use to > determine the level of recursion. For example > LOCK TABLE xyz IN <lockmode> FOR <purpose> > where <purpose> can be READ, DROP, DEFINE. Possible. Regrettably, we're not set up for it; running pg_get_viewdef() to completion is today's way to determine what it will lock. Each of these modes probably would have condensed copies of the operation they mimic, which I'd find sadder than locking somewhat more than pg_dump needs (via today's "LOCK TABLE viewname" behavior). Is it plausible to do without that duplication?
Re: public schema default ACL
On Mon, Nov 09, 2020 at 02:56:53PM -0500, Bruce Momjian wrote: > On Mon, Nov 2, 2020 at 11:05:15PM -0800, Noah Misch wrote: > > My plan is for the default to become: > > > > GRANT USAGE ON SCHEMA public TO PUBLIC; > > ALTER SCHEMA public OWNER TO DATABASE_OWNER; -- new syntax > > Seems it would be better to create a predefined role that owns the > public schema, or at least has create permission for the public schema > --- that way, when you are creating a role, you can decide if the role > should have creation permissions in the public schema, rather than > having people always using the database owner for this purpose. Defaulting to a specific predefined role empowers the role's members in all databases simultaneously. Folks who want it like that can create a role and issue "ALTER SCHEMA public OWNER TO that_role" in template1. What's the better default? I think that depends on whether you regard this schema as a per-database phenomenon or a per-cluster phenomenon.
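Concretely, a minimal sketch of that setup (public_schema_admin and alice are made-up names):

CREATE ROLE public_schema_admin;                   -- once per cluster
ALTER SCHEMA public OWNER TO public_schema_admin;  -- in template1 and in each existing database
GRANT public_schema_admin TO alice;                -- whichever roles should be able to create in public, cluster-wide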
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote: > Thanks, I have pushed the last patch. Let's wait for a day or so to > see the buildfarm reports https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14 failed the new 015_stream.pl test with the subscriber looping like this: 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication apply worker for subscription "tap_sub" has started 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply worker for subscription "tap_sub" has started 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker "logical replication worker" (PID 13959252) exited with exit code 1 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory ... What happened there?
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Mon, Jan 11, 2021 at 11:22:05AM +0500, Andrey Borodin wrote: > I'm marking patch as ready for committer. Thanks. > I can't tell whether we should backpatch the insurance patch or not: it potentially fixes > unknown bugs, and potentially contains unknown bugs. I can't reason because > of such uncertainty. I've tried to look for any potential problem and as for > now I see none. Chances are it is making the code less error-prone. What do you think of abandoning slru-truncate-t-insurance entirely? As of https://postgr.es/m/20200330052809.gb2324...@rfd.leadboat.com I liked the idea behind it, despite its complicating the system for hackers and DBAs. The TruncateMultiXact() interaction rendered it less appealing. In v14+, commit cd5e822 mitigates the kind of bugs that slru-truncate-t-insurance mitigates, further reducing the latter's value. slru-truncate-t-insurance does mitigate larger trespasses into unlink-eligible space, though. > Fix certainly worth backpatching. I'll push it on Saturday, probably.
Re: Wrong usage of RelationNeedsWAL
On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote: > The definition of the macro RelationNeedsWAL has been changed by > c6b92041d3 to include conditions related to the WAL-skip optimization > but some uses of the macro are not relevant to the optimization. That > misuses are harmless for now as they are only executed while wal_level > >= replica or WAL-skipping is inactive. However, this should be > corrected to prevent future hazard. I see user-visible consequences: > --- a/src/backend/catalog/pg_publication.c > +++ b/src/backend/catalog/pg_publication.c > @@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel) >errdetail("System tables cannot be added to > publications."))); > > /* UNLOGGED and TEMP relations cannot be part of publication. */ > - if (!RelationNeedsWAL(targetrel)) > + if (!RelationIsWalLogged(targetrel)) Today, the following fails needlessly under wal_level=minimal: BEGIN; SET client_min_messages = 'ERROR'; CREATE TABLE skip_wal (); CREATE PUBLICATION p FOR TABLE skip_wal; ROLLBACK; Could you add that test to one of the regress scripts? > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >errmsg("table \"%s\" cannot be replicated", > diff --git a/src/backend/optimizer/util/plancat.c > b/src/backend/optimizer/util/plancat.c > index da322b453e..0500efcdb9 100644 > --- a/src/backend/optimizer/util/plancat.c > +++ b/src/backend/optimizer/util/plancat.c > @@ -126,7 +126,7 @@ get_relation_info(PlannerInfo *root, Oid > relationObjectId, bool inhparent, > relation = table_open(relationObjectId, NoLock); > > /* Temporary and unlogged relations are inaccessible during recovery. */ > - if (!RelationNeedsWAL(relation) && RecoveryInProgress()) > + if (!RelationIsWalLogged(relation) && RecoveryInProgress()) This has no user-visible consequences, but I agree you've clarified it. > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >errmsg("cannot access temporary or unlogged > relations during recovery"))); > diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h > index 10b63982c0..810806a542 100644 > --- a/src/include/utils/rel.h > +++ b/src/include/utils/rel.h > @@ -552,16 +552,23 @@ typedef struct ViewOptions > (relation)->rd_smgr->smgr_targblock = (targblock); \ > } while (0) > > +/* > + * RelationIsPermanent > + * True if relation is WAL-logged. > + */ > +#define RelationIsWalLogged(relation) > \ > + ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT) To me, the names RelationIsWalLogged() and RelationNeedsWAL() don't convey their behavior difference. How about one of the following spellings? - Name the new macro RelationEverNeedsWAL(). - Just test "relpersistence == RELPERSISTENCE_PERMANENT", without a macro. > + > /* > * RelationNeedsWAL > - * True if relation needs WAL. > + * True if relation needs WAL at the time. > * > * Returns false if wal_level = minimal and this relation is created or > * truncated in the current transaction. See "Skipping WAL for New > * RelFileNode" in src/backend/access/transam/README. 
> */ > #define RelationNeedsWAL(relation) > \ > - ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT && > \ > + (RelationIsWalLogged(relation) && > \ >(XLogIsNeeded() || > \ > (relation->rd_createSubid == InvalidSubTransactionId && > \ > relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId))) > @@ -619,7 +626,7 @@ typedef struct ViewOptions > */ > #define RelationIsAccessibleInLogicalDecoding(relation) \ > (XLogLogicalInfoActive() && \ > - RelationNeedsWAL(relation) && \ > + RelationIsWalLogged(relation) && \ Today's condition expands to: wal_level >= WAL_LEVEL_LOGICAL && relation->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT && (wal_level >= WAL_LEVEL_REPLICA || ...) The proposed change doesn't affect semantics, and it likely doesn't change optimized code. I slightly prefer to leave this line unchanged. >(IsCatalogRelation(relation) || > RelationIsUsedAsCatalogTable(relation))) > > /* > @@ -635,7 +642,7 @@ typedef struct ViewOptions > */ > #define RelationIsLogicallyLogged(relation) \ > (XLogLogicalInfoActive() && \ > - RelationNeedsWAL(relation) && \ > + RelationIsWalLogged(relation) && \ Likewise. >!IsCatalogRelation(relation)) > > /* routine
Re: Pg14, pg_dumpall and "password_encryption=true"
On Fri, Jan 15, 2021 at 01:35:50PM +0900, Ian Lawrence Barwick wrote: > $ tail -3 pg_upgrade_utility.log > ALTER ROLE "postgres" SET "password_encryption" TO 'true'; > psql:pg_upgrade_dump_globals.sql:75: ERROR: invalid value for > parameter "password_encryption": "true" > HINT: Available values: md5, scram-sha-256. > > This is a consequence of commit c7eab0e97, which removed support for the > legacy > values "on"/"true"/"yes"/"1". Right. This can happen anytime we reduce the domain of a setting having GucContext PGC_SU_BACKEND, PGC_BACKEND, PGC_SUSET, or PGC_USERSET. > I see the following options to resolve this issue. > > 1. Re-add support for a "true" as an alias for "md5". > 2. Transparently rewrite "true" to "md5" > 3. Have pg_dumpall map "true" to "md5" > 4. Add an option to pg_dumpall to specify the target version I expect rather few databases override this particular setting in pg_proc.proconfig or pg_db_role_setting.setconfig. The restore failure has a clear error message, which is enough. Each of the above would be overkill. > 5. Have pg_upgrade emit a warning/hint If done in a way not specific to this one setting, +1 for this. That is to say, do something like the following. Retrieve every pg_proc.proconfig and pg_db_role_setting.setconfig value from the old cluster. Invoke the new postmaster binary to test acceptance of each value. I'm not generally a fan of adding pg_upgrade code to predict whether the dump will fail to restore, because such code will never be as good as just trying the restore. That said, this checking of GUC acceptance could be self-contained and useful for the long term. > 6. Document this as a backwards-compatibility thing > > > Add an item in the documentation (release notes, pg_upgrade, pg_dumpall) > stating > that any occurrences of "password_encryption" which are not valid in Pg14 > should > be updated before performing an upgrade. The release notes will document the password_encryption change, though they probably won't specifically mention the interaction with pg_dump. I would not document it elsewhere.
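For anyone wanting to spot such overrides before upgrading, a rough sketch limited to the two catalogs named above (it makes no attempt to be exhaustive):

SELECT d.datname, r.rolname, s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_database d ON d.oid = s.setdatabase
LEFT JOIN pg_roles r ON r.oid = s.setrole
WHERE s.setconfig::text LIKE '%password_encryption%';

SELECT p.oid::regprocedure, p.proconfig
FROM pg_proc p
WHERE p.proconfig::text LIKE '%password_encryption%';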
Re: Alter timestamp without timezone to with timezone rewrites rows
On Wed, Jan 13, 2021 at 10:28:26AM -0500, Tom Lane wrote: > So a non-rewriting conversion would only be possible if local time is > identical to UTC; which is true for few enough people that nobody has > bothered with attempting the optimization. PostgreSQL 12 and later do have that optimization. Example DDL: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3c59263#patch4
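A quick illustration of that no-rewrite path (table and column names are arbitrary):

SET timezone = 'UTC';
CREATE TABLE ev (created timestamp);
ALTER TABLE ev ALTER COLUMN created TYPE timestamptz;  -- v12+ skips the table rewrite when the session TimeZone is UTC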
Re: Alter timestamp without timezone to with timezone rewrites rows
On Sat, Jan 16, 2021 at 12:03:19PM -0500, Tom Lane wrote: > Noah Misch writes: > > On Wed, Jan 13, 2021 at 10:28:26AM -0500, Tom Lane wrote: > >> So a non-rewriting conversion would only be possible if local time is > >> identical to UTC; which is true for few enough people that nobody has > >> bothered with attempting the optimization. > > > PostgreSQL 12 and later do have that optimization. Example DDL: > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3c59263#patch4 > > Ah, I'd forgotten about that patch (obviously). It is a kluge, without > a doubt, since it has to hard-wire knowledge about the behavior of two > specific conversions into what ought to be general-purpose code. But > I guess it is useful often enough to justify its existence. > > I wonder whether it'd be worth moving this logic into a planner support > function attached to those conversion functions? I wouldn't bother > right now, but if somebody else comes along with a similar proposal, > we should think hard about that. Yes.
Re: Dump public schema ownership & seclabels
On Sat, Jan 16, 2021 at 02:05:43PM +0100, Vik Fearing wrote: > On 12/30/20 12:59 PM, Noah Misch wrote: > > On Tue, Dec 29, 2020 at 05:49:24AM -0800, Noah Misch wrote: > >> https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave > >> $SUBJECT as > >> one of the constituent projects for changing the public schema default ACL. > >> This ended up being simple. Attached. > > > > This is defective; it fails to reproduce nspacl after "ALTER SCHEMA public > > OWNER TO pg_write_server_files; REVOKE ALL ON SCHEMA public FROM > > pg_write_server_files;". I will try again later. > > Could I ask you to also include COMMENTs when you try again, please? That may work. I had not expected to hear of a person changing the comment on schema public. To what do you change it?
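For reference, the combined state such a dump patch would then need to reproduce looks something like this (the comment text is made up; the owner/ACL pair is the failing case noted above):

ALTER SCHEMA public OWNER TO pg_write_server_files;
REVOKE ALL ON SCHEMA public FROM pg_write_server_files;
COMMENT ON SCHEMA public IS 'shared application objects';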
Re: Pg14, pg_dumpall and "password_encryption=true"
On Sun, Jan 17, 2021 at 01:51:35PM +0100, Magnus Hagander wrote: > On Sat, Jan 16, 2021 at 8:27 AM Noah Misch wrote: > > On Fri, Jan 15, 2021 at 01:35:50PM +0900, Ian Lawrence Barwick wrote: > > > $ tail -3 pg_upgrade_utility.log > > > ALTER ROLE "postgres" SET "password_encryption" TO 'true'; > > > psql:pg_upgrade_dump_globals.sql:75: ERROR: invalid value for > > > parameter "password_encryption": "true" > > > HINT: Available values: md5, scram-sha-256. > > > > > > This is a consequence of commit c7eab0e97, which removed support for the > > > legacy > > > values "on"/"true"/"yes"/"1". > > > > Right. This can happen anytime we reduce the domain of a setting having > > GucContext PGC_SU_BACKEND, PGC_BACKEND, PGC_SUSET, or PGC_USERSET. > > > > > I see the following options to resolve this issue. > > > > > > 1. Re-add support for a "true" as an alias for "md5". > > > 2. Transparently rewrite "true" to "md5" > > > 3. Have pg_dumpall map "true" to "md5" > > > 4. Add an option to pg_dumpall to specify the target version > > > > I expect rather few databases override this particular setting in > > pg_proc.proconfig or pg_db_role_setting.setconfig. The restore failure has > > a > > clear error message, which is enough. Each of the above would be overkill. > > Option 3 would be the closest to how other things work though, > wouldn't it? Normally, it's the job of pg_dump (or pg_dumpall) to > adapt the dump to the new version of PostgreSQL. I didn't do a precedent search, but I can't think of an instance of those programs doing (3). We have things like guessConstraintInheritance() that make up for lack of information in old versions, but dumpFunc() doesn't mutate any proconfig setting values. Is there a particular pg_dump behavior you had in mind? > And pg_dump[all] always generates a dump that should reload in the > same version of postgres as the dump program -- not the source > postgresql. Thus doing (4) would mean changing that, and would have to > teach pg_dump[all] about *all* version differences in all directions > -- which would be a huge undertaking. Doing that for just one > individual parameter would be very strange. Agreed. > In a lot of ways, I think (3) seems like the reasonable thing to do. > That's basically what we do when things change in the table creation > commands etc, so it seems like the logical place. This would be interpreting setconfig='{password_encryption=on}' as "opt out of future password security increases". I expect that will tend not to match the intent of the person entering the setting. That said, if v14 were already behaving this way, I wouldn't dislike it enough to complain.
Re: Wrong usage of RelationNeedsWAL
On Mon, Jan 18, 2021 at 03:08:38PM +0900, Kyotaro Horiguchi wrote: > At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch wrote in > > On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote: > > > --- a/src/include/utils/snapmgr.h > > > +++ b/src/include/utils/snapmgr.h > > > @@ -37,7 +37,7 @@ > > > */ > > > #define RelationAllowsEarlyPruning(rel) \ > > > ( \ > > > - RelationNeedsWAL(rel) \ > > > + RelationIsWalLogged(rel) \ > > > > I suspect this is user-visible for a scenario like: > > > > CREATE TABLE t AS SELECT ...; DELETE FROM t; > > -- ... time passes, rows of t are now eligible for early pruning ... > > BEGIN; > > ALTER TABLE t SET TABLESPACE something; -- start skipping WAL > > SELECT count(*) FROM t; > > > > After this patch, the SELECT would do early pruning, as it does in the > > absence > > of the ALTER TABLE. When pruning doesn't update the page LSN, > > TestForOldSnapshot() will be unable to detect that early pruning changed the > > query results. Hence, RelationAllowsEarlyPruning() must not change this > > way. > > Does that sound right? > > Mmm, maybe no. The pruning works on XID (or timestamp), not on LSN, so > it seems to work well even if pruning happened at the SELECT. > Conversely that should work after old_snapshot_threshold elapsed. > > Am I missing something? I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in TestForOldSnapshot(). If the LSN isn't important, what else explains RelationAllowsEarlyPruning() checking RelationNeedsWAL()?
Re: Wrong usage of RelationNeedsWAL
On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote: > On Mon, Jan 18, 2021 at 03:08:38PM +0900, Kyotaro Horiguchi wrote: > > At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch wrote in > > > On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote: > > > > --- a/src/include/utils/snapmgr.h > > > > +++ b/src/include/utils/snapmgr.h > > > > @@ -37,7 +37,7 @@ > > > > */ > > > > #define RelationAllowsEarlyPruning(rel) \ > > > > ( \ > > > > -RelationNeedsWAL(rel) \ > > > > +RelationIsWalLogged(rel) \ > > > > > > I suspect this is user-visible for a scenario like: > > > > > > CREATE TABLE t AS SELECT ...; DELETE FROM t; > > > -- ... time passes, rows of t are now eligible for early pruning ... > > > BEGIN; > > > ALTER TABLE t SET TABLESPACE something; -- start skipping WAL > > > SELECT count(*) FROM t; > > > > > > After this patch, the SELECT would do early pruning, as it does in the > > > absence > > > of the ALTER TABLE. When pruning doesn't update the page LSN, > > > TestForOldSnapshot() will be unable to detect that early pruning changed > > > the > > > query results. Hence, RelationAllowsEarlyPruning() must not change this > > > way. > > > Does that sound right? > > > > Mmm, maybe no. The pruning works on XID (or timestamp), not on LSN, so > > it seems to work well even if pruning happened at the SELECT. > > Conversely that should work after old_snapshot_threshold elapsed. > > > > Am I missing something? > > I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in > TestForOldSnapshot(). If the LSN isn't important, what else explains > RelationAllowsEarlyPruning() checking RelationNeedsWAL()? Thinking about it more, some RelationAllowsEarlyPruning() callers need different treatment. Above, I was writing about the case of deciding whether to do early pruning. The other RelationAllowsEarlyPruning() call sites are deciding whether the relation might be lacking old data. For that case, we should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL(). Otherwise, we could fail to report an old-snapshot error in a case like this: setup: CREATE TABLE t AS SELECT ...; xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot xact2: DELETE FROM t; (plenty of time passes) xact3: SELECT count(*) FROM t; -- early pruning xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q; -- "snapshot too old" xact1: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL xact1: SELECT count(*) FROM t; -- no error, wanted "snapshot too old" Is that plausible?
Re: Wrong usage of RelationNeedsWAL
On Tue, Jan 19, 2021 at 01:48:31PM +0900, Kyotaro Horiguchi wrote: > At Mon, 18 Jan 2021 17:30:22 +0900 (JST), Kyotaro Horiguchi > wrote in > > At Sun, 17 Jan 2021 23:02:18 -0800, Noah Misch wrote in > > > On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote: > > > > I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" > > > > check in > > > > TestForOldSnapshot(). If the LSN isn't important, what else explains > > > > RelationAllowsEarlyPruning() checking RelationNeedsWAL()? > > > > > > Thinking about it more, some RelationAllowsEarlyPruning() callers need > > > different treatment. Above, I was writing about the case of deciding > > > whether > > > to do early pruning. The other RelationAllowsEarlyPruning() call sites > > > are > > > deciding whether the relation might be lacking old data. For that case, > > > we > > > should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL(). > > > Otherwise, we > > > could fail to report an old-snapshot error in a case like this: > > > > A> > setup: CREATE TABLE t AS SELECT ...; > B> > xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot > C> > xact2: DELETE FROM t; > D> > (plenty of time passes) > E> > xact3: SELECT count(*) FROM t; -- early pruning > F> > xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q; -- "snapshot > too old" > G> > xact1: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL > H> > xact1: SELECT count(*) FROM t; -- no error, wanted "snapshot too old" > > > > > > Is that plausible? > > > > Thank you for the consideration and yes. But I get "snapshot too old" > > from the last query with the patched version. maybe I'm missing > > something. I'm going to investigate the case. > > Ah. I took that in reverse way. (caught by the discussion on page > LSN.) Ok, the "current" code works that way. So we need to perform the > check the new way in RelationAllowsEarlyPruning. So in a short answer > for the last question in regard to the reference side is yes. Right. I expect the above procedure shows a bug in git master, but your patch versions suffice to fix that bug. > I understand that you are suggesting that at least > TransactionIdLimitedForOldSnapshots should follow not only relation > persistence but RelationNeedsWAL, based on the discussion on the > check on LSN of TestForOldSnapshot(). > > I still don't think that LSN in the WAL-skipping-created relfilenode > harm. ALTER TABLE SET TABLESPACE always makes a dead copy of every > block (except checksum) including page LSN regardless of wal_level. In > the scenario above, the last select at H correctly reads page LSN set > by E then copied by G, which is larger than the snapshot LSN at B. So > doesn't go the fast-return path before actual check performed by > RelationAllowsEarlyPruning. I agree the above procedure doesn't have a problem under your patch versions. See https://postgr.es/m/20210116043816.ga1644...@rfd.leadboat.com for a different test case. In more detail: setup: CREATE TABLE t AS SELECT ...; xact1: BEGIN; DELETE FROM t; xact2: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot xact1: COMMIT; (plenty of time passes, rows of t are now eligible for early pruning) xact3: BEGIN; xact3: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL xact3: SELECT count(*) FROM t; -- early pruning w/o WAL, w/o LSN changes xact3: COMMIT; xact2: SELECT count(*) FROM t; -- no error, wanted "snapshot too old" I expect that shows no bug for git master, but I expect it does show a bug with your patch versions. 
Could you try implementing both test procedures in src/test/modules/snapshot_too_old? There's no need to make the test use wal_level=minimal explicitly; it's sufficient to catch these bugs when wal_level=minimal is in the TEMP_CONFIG file.
Re: pg_upgrade fails with non-standard ACL
On Thu, Jan 21, 2021 at 01:03:58AM +0300, Anastasia Lubennikova wrote: > On 03.01.2021 14:29, Noah Misch wrote: > >On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote: > Thank you for the review. > New version of the patch is attached, though I haven't tested it properly > yet. I am planning to do in a couple of days. Once that testing completes, please change the commitfest entry status to Ready for Committer. > >>+ snprintf(output_path, sizeof(output_path), "revoke_objects.sql"); > >If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a database > >in > >which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public" happened > >requires a GRANT. Can you use a file name that will still make sense if > >someone enhances pg_upgrade to generate such GRANT statements? > I changed the name to 'fix_system_objects_ACL.sql'. Does it look good to > you? That name is fine with me. > > ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b"; > > GRANT SELECT ("a.b") ON pg_locks TO backup; > > > >After which revoke_objects.sql has: > > > > psql:./revoke_objects.sql:9: ERROR: syntax error at or near "") ON > > pg_catalog.pg_locks."" > > LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup"; > > > >While that ALTER VIEW probably should have required allow_system_table_mods, > >we need to assume such databases exist. > > I've fixed it by using pg_identify_object_as_address(). Now we don't have to > parse obj_identity. Nice.
Re: Wrong usage of RelationNeedsWAL
On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote: > At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi > wrote in > > Anyway, it seems actually dangerous that cause pruning on wal-skipped > > relation. > > > > > with your patch versions. Could you try implementing both test > > > procedures in > > > src/test/modules/snapshot_too_old? There's no need to make the test use > > > wal_level=minimal explicitly; it's sufficient to catch these bugs when > > > wal_level=minimal is in the TEMP_CONFIG file. > > > > In the attached, TestForOldSnapshot() considers XLogIsNeeded(), > > instead of moving the condition on LSN to TestForOldSnapshot_impl for > > performance. > > > > I'll add the test part in the next version. That test helped me. I now see "there's not a single tuple removed due to old_snapshot_threshold in src/test/modules/snapshot_too_old"[1], which limits our ability to test using this infrastructure. > However, with the previous patch, two existing tests sto_using_cursor > and sto_using_select behaves differently from the master. That change > is coming from the omission of actual LSN check in TestForOldSnapshot > while wal_level=minimal. So we have no choice other than actually > updating page LSN. > > In the scenario under discussion there are two places we need to do > that. one is heap_page_prune, and the other is RelationCopyStorge. As > a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the > attached third file. Fake LSNs make the system harder to understand, so I prefer not to spread fake LSNs to more access methods. What I had in mind is to simply suppress early pruning when the relation is skipping WAL. Attached. Is this reasonable? It passes the older tests. While it changes the sto_wal_optimized.spec output, I think it preserves the old_snapshot_threshold behavior contract. [1] https://www.postgresql.org/message-id/20200403001235.e6jfdll3gh2ygbuc%40alap3.anarazel.de diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c index 5f8e1c6..84d2efc 100644 --- a/src/backend/catalog/pg_publication.c +++ b/src/backend/catalog/pg_publication.c @@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel) errdetail("System tables cannot be added to publications."))); /* UNLOGGED and TEMP relations cannot be part of publication. */ - if (!RelationNeedsWAL(targetrel)) + if (targetrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("table \"%s\" cannot be replicated", diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index da322b4..177e6e3 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -126,7 +126,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, relation = table_open(relationObjectId, NoLock); /* Temporary and unlogged relations are inaccessible during recovery. 
*/ - if (!RelationNeedsWAL(relation) && RecoveryInProgress()) + if (relation->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT && + RecoveryInProgress()) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot access temporary or unlogged relations during recovery"))); diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c index ae16c3e..9570426 100644 --- a/src/backend/utils/time/snapmgr.c +++ b/src/backend/utils/time/snapmgr.c @@ -1764,7 +1764,11 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin, Assert(OldSnapshotThresholdActive()); Assert(limit_ts != NULL && limit_xid != NULL); - if (!RelationAllowsEarlyPruning(relation)) + /* +* TestForOldSnapshot() assumes early pruning advances the page LSN, so we +* can't prune early when skipping WAL. +*/ + if (!RelationAllowsEarlyPruning(relation) || !RelationNeedsWAL(relation)) return false; ts = GetSnapshotCurrentTimestamp(); diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h index 579be35..c21ee3c 100644 --- a/src/include/utils/snapmgr.h +++ b/src/include/utils/snapmgr.h @@ -37,7 +37,7 @@ */ #define RelationAllowsEarlyPruning(rel) \ ( \ -RelationNeedsWAL(rel) \ +(rel)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT \ && !IsCatalogRelation(rel) \ && !RelationIsAccessibleInLogicalDecoding(rel) \ ) diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl index 0680f44..ed9b48e 100644 --- a/src/test/subscription/t/001_rep_changes.pl +++ b/src/test/subscription/t/001_rep_changes.pl @@ -3,7 +3,7 @@ use strict; use warnin
Re: Wrong usage of RelationNeedsWAL
On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote: > At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch wrote in > > On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote: > > > However, with the previous patch, two existing tests sto_using_cursor > > > and sto_using_select behaves differently from the master. That change > > > is coming from the omission of actual LSN check in TestForOldSnapshot > > > while wal_level=minimal. So we have no choice other than actually > > > updating page LSN. > > > > > > In the scenario under discussion there are two places we need to do > > > that. one is heap_page_prune, and the other is RelationCopyStorge. As > > > a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the > > > attached third file. > > > > Fake LSNs make the system harder to understand, so I prefer not to spread > > fake > > LSNs to more access methods. What I had in mind is to simply suppress early > > pruning when the relation is skipping WAL. Attached. Is this reasonable? > > It > > passes the older tests. While it changes the sto_wal_optimized.spec > > output, I > > think it preserves the old_snapshot_threshold behavior contract. > > Perhaps I'm missing something, but the patch doesn't pass the v5-0001 > test with wal_level=minimal? Correct. The case we must avoid is letting an old snapshot read an early-pruned page without error. v5-0001 expects "ERROR: snapshot too old". The patch suspends early pruning, so that error is not applicable.
Re: Very misleading documentation for PQreset()
On Thu, Jan 21, 2021 at 05:32:56PM -0500, Tom Lane wrote: > I happened to notice that PQreset is documented thus: > > This function will close the connection to the server and attempt to > reestablish a new connection to the same server, using all the same > parameters previously used. > > Since we invented multi-host connection parameters, a reasonable person > would assume that "to the same server" means we promise to reconnect to > the same host we selected the first time. There is no such guarantee > though; the new connection attempt is done just like the first one, > so it will select the first suitable server in the list. > > I think we should just drop that phrase. I agree that dropping the phrase strictly improves that documentation.
Re: pg_upgrade fails with non-standard ACL
On Thu, Jan 21, 2021 at 01:03:58AM +0300, Anastasia Lubennikova wrote: > On 03.01.2021 14:29, Noah Misch wrote: > >Overall, this patch predicts a subset of cases where pg_dump will emit a > >failing GRANT or REVOKE that targets a pg_catalog object. Can you write a > >code comment stating what it does and does not detect? I think it's okay to > >not predict every failure, but let's record the intended coverage. Here are > >a > >few examples I know; there may be more. The above query only finds GRANTs to > >non-pinned roles. pg_dump dumps the following, but the above query doesn't > >find them: > > > > REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public; > > GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend; I see a new comment listing object types. Please also explain the lack of preventing REVOKE failures (first example query above) and the limitation around non-pinned roles (second example). > >The above query should test refclassid. Please do so. > + /* Handle table column objects */ > + if (strstr(aclinfo->obj_type, "column") != NULL) > + { > + char *name_pos = > strstr(aclinfo->obj_ident, > + > aclinfo->obj_name); > + if (*name_pos == '\"') > + name_pos--; This solves the problem affecting a column named "a.b", but it fails for a column named "pg_catalog" or "a""b". I recommend solving this by retrieving all three elements of the pg_identify_object_as_address array, then quoting each of them on the client side.
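To illustrate the recommendation, here is roughly what retrieving and using the three elements looks like, using attribute 3 of pg_locks purely as an example:

SELECT type, object_names, object_args
FROM pg_identify_object_as_address('pg_class'::regclass, 'pg_catalog.pg_locks'::regclass, 3);
-- For a table column this returns type 'table column', with object_names holding
-- {schema, relation, column} and object_args empty.  Quote each element with
-- quote_ident() on the client side before assembling the GRANT/REVOKE text, so
-- names like "a.b", "pg_catalog", or "a""b" survive intact.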
Re: PG vs LLVM 12 on seawasp, next round
On Mon, Jan 18, 2021 at 09:29:53PM +0100, Fabien COELHO wrote: > >>>The "no such file" error seems more like a machine local issue to me. > >> > >>I'll look into it when I have time, which make take some time. Hopefully > >>over the holidays. > > > >This is still happening ... Any chance you can have a look at it? > > Indeed. I'll try to look (again) into it soon. I had a look but did not find > anything obvious in the short time frame I had. Last two months were a > little overworked for me so I let slip quite a few things. If you want to > disable the animal as Tom suggests, do as you want. Perhaps he was suggesting that you (buildfarm owner) disable the cron job that initiates new runs. That's what I do when one of my animals needs my intervention.
Re: Add SQL function for SHA1
On Mon, Jan 25, 2021 at 10:12:28PM +0900, Michael Paquier wrote: > SHA-1 is now an option available for cryptohashes, and like the > existing set of functions of SHA-2, I don't really see a reason why we > should not have a SQL function for SHA1. NIST deprecated SHA1 over ten years ago. It's too late to be adding this.
Re: Wrong usage of RelationNeedsWAL
On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote: > On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote: > > At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch wrote in > > > On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote: > > > > However, with the previous patch, two existing tests sto_using_cursor > > > > and sto_using_select behaves differently from the master. That change > > > > is coming from the omission of actual LSN check in TestForOldSnapshot > > > > while wal_level=minimal. So we have no choice other than actually > > > > updating page LSN. > > > > > > > > In the scenario under discussion there are two places we need to do > > > > that. one is heap_page_prune, and the other is RelationCopyStorge. As > > > > a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the > > > > attached third file. > > > > > > Fake LSNs make the system harder to understand, so I prefer not to spread > > > fake > > > LSNs to more access methods. What I had in mind is to simply suppress > > > early > > > pruning when the relation is skipping WAL. Attached. Is this > > > reasonable? It > > > passes the older tests. While it changes the sto_wal_optimized.spec > > > output, I > > > think it preserves the old_snapshot_threshold behavior contract. > > > > Perhaps I'm missing something, but the patch doesn't pass the v5-0001 > > test with wal_level=minimal? > > Correct. The case we must avoid is letting an old snapshot read an > early-pruned page without error. v5-0001 expects "ERROR: snapshot too old". > The patch suspends early pruning, so that error is not applicable. I think the attached version is ready. The changes since v6nm are cosmetic: - Wrote log messages - Split into two patches, since the user-visible bugs are materially different - Fixed typos - Ran perltidy Is it okay if I push these on Saturday, or would you like more time to investigate? Author: Noah Misch Commit: Noah Misch Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables. CREATE PUBLICATION has failed spuriously when applied to a permanent relation created or rewritten in the current transaction. Make the same change to another site having the same semantic intent; the second instance has no user-visible consequences. Back-patch to v13, where commit c6b92041d38512a4176ed76ad06f713d2e6c01a8 broke this. Kyotaro Horiguchi Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota@gmail.com diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c index 5f8e1c6..84d2efc 100644 --- a/src/backend/catalog/pg_publication.c +++ b/src/backend/catalog/pg_publication.c @@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel) errdetail("System tables cannot be added to publications."))); /* UNLOGGED and TEMP relations cannot be part of publication. */ - if (!RelationNeedsWAL(targetrel)) + if (targetrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("table \"%s\" cannot be replicated", diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index da322b4..177e6e3 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -126,7 +126,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, relation = table_open(relationObjectId, NoLock); /* Temporary and unlogged relations are inaccessible during recovery. 
*/ - if (!RelationNeedsWAL(relation) && RecoveryInProgress()) + if (relation->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT && + RecoveryInProgress()) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot access temporary or unlogged relations during recovery"))); diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl index 0680f44..3d90f81 100644 --- a/src/test/subscription/t/001_rep_changes.pl +++ b/src/test/subscription/t/001_rep_changes.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 23; +use Test::More tests => 24; # Initialize publisher node my $node_publisher = get_new_node('publisher'); @@ -358,3 +358,21 @@ is($result, qq(0),
Re: pg_upgrade fails with non-standard ACL
On Mon, Jan 25, 2021 at 10:14:43PM +0300, Anastasia Lubennikova wrote: > On 24.01.2021 11:39, Noah Misch wrote: > >On Thu, Jan 21, 2021 at 01:03:58AM +0300, Anastasia Lubennikova wrote: > >>On 03.01.2021 14:29, Noah Misch wrote: > >>>Overall, this patch predicts a subset of cases where pg_dump will emit a > >>>failing GRANT or REVOKE that targets a pg_catalog object. Can you write a > >>>code comment stating what it does and does not detect? I think it's okay > >>>to > >>>not predict every failure, but let's record the intended coverage. Here > >>>are a > >>>few examples I know; there may be more. The above query only finds GRANTs > >>>to > >>>non-pinned roles. pg_dump dumps the following, but the above query doesn't > >>>find them: > >>> > >>> REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public; > >>> GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend; > >I see a new comment listing object types. Please also explain the lack of > >preventing REVOKE failures (first example query above) and the limitation > >around non-pinned roles (second example). > > 1) Could you please clarify, what do you mean by REVOKE failures? > > I tried following example: > > Start 9.6 cluster. > > REVOKE ALL ON function pg_switch_xlog() FROM public; > REVOKE ALL ON function pg_switch_xlog() FROM backup; > > The upgrade to the current master passes with and without patch. > It seems that current implementation of pg_upgrade doesn't take into account > revoke ACLs. I think you can observe it by adding "revoke all on function pg_stat_get_subscription from public;" to test_add_acl_to_catalog_objects.sql and then rerunning your test script. pg_dump will reproduce that REVOKE, which would fail if postgresql.org had removed that function. That's fine, so long as a comment mentions the limitation. > 2) As for pinned roles, I think we should fix this behavior, rather than > adding a comment. Because such roles can have grants on system objects. > > In earlier versions of the patch, I gathered ACLs directly from system > catalogs: pg_proc.proacl, pg_class.relacl pg_attribute.attacl and > pg_type.typacl. > > The only downside of this approach is that it cannot be easily extended to > other object types, as we need to handle every object type separately. > I don't think it is a big deal, as we already do it in > check_for_changed_signatures() > > And also the query to gather non-standard ACLs won't look as nice as now, > because of the need to parse arrays of aclitems. > > What do you think? Hard to say for certain without seeing the code both ways. I'm not generally enthusiastic about adding pg_upgrade code to predict whether the dump will fail to restore, because such code will never be as good as just trying the restore. The patch has 413 lines of code, which is substantial. I would balk if, for example, the patch tripled in size to catch more cases.
Re: pg_upgrade fails with non-standard ACL
On Wed, Jan 27, 2021 at 07:32:42PM +0300, Anastasia Lubennikova wrote: > On 27.01.2021 14:21, Noah Misch wrote: > >On Mon, Jan 25, 2021 at 10:14:43PM +0300, Anastasia Lubennikova wrote: > > > >>1) Could you please clarify, what do you mean by REVOKE failures? > >> > >>I tried following example: > >> > >>Start 9.6 cluster. > >> > >>REVOKE ALL ON function pg_switch_xlog() FROM public; > >>REVOKE ALL ON function pg_switch_xlog() FROM backup; > >> > >>The upgrade to the current master passes with and without patch. > >>It seems that current implementation of pg_upgrade doesn't take into account > >>revoke ACLs. > >I think you can observe it by adding "revoke all on function > >pg_stat_get_subscription from public;" to test_add_acl_to_catalog_objects.sql > >and then rerunning your test script. pg_dump will reproduce that REVOKE, > >which would fail if postgresql.org had removed that function. That's fine, > >so > >long as a comment mentions the limitation. > > In the updated patch, I implemented generation of both GRANT ALL and REVOKE > ALL for problematic objects. If I understand it correctly, these calls will > clean object's ACL completely. And I see no harm in doing this, because the > objects don't exist in the new cluster anyway. > > To test it I attempted to reproduce the problem, using attached > test_revoke.sql in the test. Still, pg_upgrade works fine without any > adjustments. I'd be grateful if you test it some more. test_revoke.sql has "revoke all on function pg_stat_get_subscription() from test", which does nothing. You need something that causes a REVOKE in pg_dump output. Worked example: === shell script set -x createuser test pg_dump | grep -E 'GRANT|REVOKE' psql -Xc 'revoke all on function pg_stat_get_subscription() from test;' psql -Xc 'revoke all on function pg_stat_get_subscription from test;' pg_dump | grep -E 'GRANT|REVOKE' psql -Xc 'revoke all on function pg_stat_get_subscription from public;' pg_dump | grep -E 'GRANT|REVOKE' === output + createuser test + pg_dump + grep -E 'GRANT|REVOKE' + psql -Xc 'revoke all on function pg_stat_get_subscription() from test;' ERROR: function pg_stat_get_subscription() does not exist + psql -Xc 'revoke all on function pg_stat_get_subscription from test;' REVOKE + pg_dump + grep -E 'GRANT|REVOKE' + psql -Xc 'revoke all on function pg_stat_get_subscription from public;' REVOKE + pg_dump + grep -E 'GRANT|REVOKE' REVOKE ALL ON FUNCTION pg_catalog.pg_stat_get_subscription(subid oid, OUT subid oid, OUT relid oid, OUT pid integer, OUT received_lsn pg_lsn, OUT last_msg_send_time timestamp with time zone, OUT last_msg_receipt_time timestamp with time zone, OUT latest_end_lsn pg_lsn, OUT latest_end_time timestamp with time zone) FROM PUBLIC; That REVOKE is going to fail if the upgrade target cluster lacks the function in question, and your test_rename_catalog_objects_v13 does simulate pg_stat_get_subscription not existing.
Re: Wrong usage of RelationNeedsWAL
On Thu, Jan 28, 2021 at 12:06:27PM +0900, Kyotaro Horiguchi wrote: > At Wed, 27 Jan 2021 02:48:48 -0800, Noah Misch wrote in > > On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote: > > > On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote: > > > > Perhaps I'm missing something, but the patch doesn't pass the v5-0001 > > > > test with wal_level=minimal? > > > > > > Correct. The case we must avoid is letting an old snapshot read an > > > early-pruned page without error. v5-0001 expects "ERROR: snapshot too > > > old". > > > The patch suspends early pruning, so that error is not applicable. > I studied the sto feature further and concluded that the checker side > is fine in that it always follows the changes of page LSN. > > So what we can do for the issue is setting a seemingly correct page LSN > at pruning or refraining from early-pruning while we are skipping > WAL. The reason I took the former is I thought that the latter might > be a problem since early-pruning would be postponed by a long-running > wal-skipping transaction. Yes, that's an accurate summary. > So the patch looks fine to me. The commit message makes sense. > > However, is it ok that the existing tests (modules/snapshot_too_old) > fail when wal_level=minimal? That would not be okay, but I'm not seeing that. How did your setup differ from the following? [nm@rfd 6:1 2021-01-27T23:06:33 postgresql 0]$ cat /nmscratch/minimal.conf log_statement = all wal_level = minimal max_wal_senders = 0 log_line_prefix = '%m [%p %l %x] %q%a ' [nm@rfd 6:1 2021-01-27T23:06:38 postgresql 0]$ make -C src/test/modules/snapshot_too_old check TEMP_CONFIG=/nmscratch/minimal.conf == creating temporary instance== == initializing database system == == starting postmaster== running on port 58080 with PID 2603099 == creating database "isolation_regression" == CREATE DATABASE ALTER DATABASE == running regression test queries== test sto_using_cursor ... ok 30168 ms test sto_using_select ... ok 24197 ms test sto_using_hash_index ... ok 6089 ms == shutting down postmaster == == removing temporary instance== = All 3 tests passed. =
Re: Why does creating logical replication subscriptions require superuser?
On Fri, Jan 22, 2021 at 02:08:02PM -0800, Paul Martinez wrote: > Some of the original justifications for requiring superuser to create > subscriptions include: > - Replication inherently adds significant network traffic and extra background > process, and we wouldn't want unprivileged users to be able to add such > drastic load to then database. I think you're referring to these messages: https://postgr.es/m/CA+TgmoahEoM2zZO71yv4883HFarXcBcOs3if6fEdRcRs8Fs=z...@mail.gmail.com https://postgr.es/m/ca+tgmobqxge_7dcx1_dv8+kaf3neoe5sy4nxgb9ayfm5ydj...@mail.gmail.com A permanent background process bypasses authentication, so mechanisms like pg_hba.conf and expiration of the auth SSL certificate don't stop access. Like that thread discussed, this justifies some privilege enforcement. (Autovacuum also bypasses authentication, but those are less predictable.) Since we already let users drive the database to out-of-memory, I wouldn't worry about load. In other words, the quantity of network traffic and number of background processes don't matter, just the act of allowing them at all. > - Running the apply process as a superuser drastically simplifies the number > of possible errors that might arise due to not having sufficient permissions > to write to target tables, and may have simplified the initial > implementation. I think you're referring to this: https://postgr.es/m/ca+tgmoye1x21zlycqovl-kdd9djsvz4v8nnmfdscjrwv9qf...@mail.gmail.com wrote: > It seems more likely that there is a person whose job it is to set up > replication but who doesn't normally interact with the table data > itself. In that kind of case, you just want to give the person > permission to create subscriptions, without needing to also give them > lots of privileges on individual tables (and maybe having whatever > they are trying to do fail if you miss a table someplace). Exposure to permission checks is a chief benefit of doing anything as a non-superuser, so I disagree with this. (I've bcc'd the author of that message, in case he wants to comment.) We could add a pg_write_any_table special role. DBAs should be more cautious granting pg_write_any_table than granting subscription privilege. (For this use case, grant both.) > - Subscriptions store plaintext passwords, which are sensitive, and we > wouldn't want unprivileged users to see these. Only allowing superusers > to create subscriptions and view the subconninfo column is a way to resolve > this. pg_user_mapping.umoptions has the same security considerations; one should be able to protect it and subconninfo roughly the same way. > Are there any other major reasons that we require superuser? As another prerequisite for non-superuser-owned subscriptions, the connection to the publisher must enforce the equivalent of dblink_security_check(). > Notably one may > wonder why we didn't check for the REPLICATION attribute, but even replication > users could run into table ownership and access issues. REPLICATION represents the authority to read all bytes of the data directory. Compared to the implications of starting a subscriber, REPLICATION carries a lot of power. I would not reuse REPLICATION here. > One message in the thread mentioned somehow tricking Postgres into replicating > system catalog tables: > > https://www.postgresql.org/message-id/109201553163096%40myt5-68ad52a76c91.qloud-c.yandex.net > > I'm unsure whether this is allowed by default, but it doesn't seem like too > much trouble to run a modified publisher node that does allow it. 
Then > something like 'UPDATE pg_authid SET rolsuper = TRUE' could result in > privilege > escalation on the subscriber side. But, again, if the apply process isn't > running as superuser, then presumably applying the change in the subscriber > would fail, stopping replication, and neutralizing the attack. This is a special case of the need for ordinary ACL checks in the subscriber. Treating system catalogs differently would be insufficient and unnecessary. Thanks, nm
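For illustration, here is roughly what the dblink_security_check() equivalent mentioned above could look like; the function name and its placement are hypothetical, not existing PostgreSQL code, and a real version would live in the subscription connection path:

/* Hypothetical sketch, not existing code: refuse to use a publisher
 * connection on behalf of a non-superuser subscription owner unless the
 * publisher authenticated it with a password, so the apply worker cannot
 * ride on the credentials of the server environment. */
#include "postgres.h"

#include "libpq-fe.h"
#include "miscadmin.h"

static void
subscription_security_check(PGconn *conn, Oid subowner)
{
    if (!superuser_arg(subowner) && !PQconnectionUsedPassword(conn))
    {
        PQfinish(conn);
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("password is required"),
                 errdetail("Non-superuser subscription owners must specify a password in the connection string.")));
    }
}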
Race between KeepFileRestoredFromArchive() and restartpoint
A race with KeepFileRestoredFromArchive() can cause a restartpoint to fail, as seen once on the buildfarm[1]. The attached patch adds a test case; it applies atop the "stop events" patch[2]. We have two systems for adding long-term pg_wal directory entries. KeepFileRestoredFromArchive() adds them during archive recovery, while InstallXLogFileSegment() does so at all times. Unfortunately, InstallXLogFileSegment() happens throughout archive recovery, via the checkpointer recycling segments and calling PreallocXlogFiles(). Multiple processes can run InstallXLogFileSegment(), which uses ControlFileLock to represent the authority to modify the directory listing of pg_wal. KeepFileRestoredFromArchive() just assumes it controls pg_wal. Recycling and preallocation are wasteful during archive recovery, because KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to fix the race by adding an XLogCtl flag indicating which regime currently owns the right to add long-term pg_wal directory entries. In the archive recovery regime, the checkpointer will not preallocate and will unlink old segments instead of recycling them (like wal_recycle=off). XLogFileInit() will fail. Notable alternatives: - Release ControlFileLock at the end of XLogFileInit(), not at the end of InstallXLogFileSegment(). Add ControlFileLock acquisition to KeepFileRestoredFromArchive(). This provides adequate mutual exclusion, but XLogFileInit() could return a file descriptor for an unlinked file. That's fine for PreallocXlogFiles(), but it feels wrong. - During restartpoints, never preallocate or recycle segments. (Just delete obsolete WAL.) By denying those benefits, this presumably makes streaming recovery less efficient. - Make KeepFileRestoredFromArchive() call XLogFileInit() to open a segment, then copy bytes. This is simple, but it multiplies I/O. That might be negligible on account of caching, or it might not be. A variant, incurring extra fsyncs, would be to use durable_rename() to replace the segment we get from XLogFileInit(). - Make KeepFileRestoredFromArchive() rename without first unlinking. This avoids checkpoint failure, but a race could trigger noise from the LOG message in InstallXLogFileSegment -> durable_rename_excl. Does anyone prefer some alternative? It's probably not worth back-patching anything for a restartpoint failure this rare, because most restartpoint outcomes are not user-visible. Thanks, nm [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17 [2] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com Author: Noah Misch Commit: Noah Misch Not for commit; for demonstration only. 
Before applying this, apply https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 199d911..1c0a988 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -71,6 +71,7 @@ #include "storage/reinit.h" #include "storage/smgr.h" #include "storage/spin.h" +#include "storage/stopevent.h" #include "storage/sync.h" #include "utils/builtins.h" #include "utils/guc.h" @@ -3583,6 +3584,23 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno, elog(ERROR, "InstallXLogFileSegment should not have failed"); } +static Jsonb * +InstallXLogFileSegmentEndStopEventParams(XLogSegNo segno) +{ + MemoryContext oldCxt; + JsonbParseState *state = NULL; + Jsonb *res; + + stopevents_make_cxt(); + oldCxt = MemoryContextSwitchTo(stopevents_cxt); + pushJsonbValue(&state, WJB_BEGIN_OBJECT, NULL); + jsonb_push_int8_key(&state, "segno", segno); + res = JsonbValueToJsonb(pushJsonbValue(&state, WJB_END_OBJECT, NULL)); + MemoryContextSwitchTo(oldCxt); + + return res; +} + /* * Install a new XLOG segment file as a current or future log segment. * @@ -3664,6 +3682,9 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, if (use_lock) LWLockRelease(ControlFileLock); + STOPEVENT(InstallXLogFileSegmentEndStopEvent, + InstallXLogFileSegmentEndStopEventParams(*segno)); + return true; } diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 1c5a4f8..d9c5a3a 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -30,6 +30,7 @@ #include "storage/ipc.h" #include "storage/lwlock.h" #include "storage/pmsignal.h" +#include "storage/stopevent.h" /* * Attempt to retrieve the s
Re: 2021-02-11 release announcement draft
On Mon, Feb 08, 2021 at 05:40:41PM -0500, Jonathan S. Katz wrote: > This update also fixes over 80 bugs that were reported in the last several > months. Some of these issues only affect version 13, but may also apply to > other > supported versions. Did you want s/may/many/? > * Fix an issue with GiST indexes where concurrent insertions could lead to a > corrupt index with entries placed in the wrong pages. You should `REINDEX` any > affected GiST indexes. For what it's worth, there's little way for a user to confirm whether an index is affected. (If you've never had more than one connection changing the table at a time, the table's indexes would be unaffected.) > * Fix `CREATE INDEX CONCURRENTLY` to ensure rows from concurrent prepared > transactions are included in the index. Consider adding a sentence like "Installations that have enabled prepared transactions should `REINDEX` any concurrently-built indexes." The release notes say: + In installations that have enabled prepared transactions + (max_prepared_transactions > 0), + it's recommended to reindex any concurrently-built indexes in + case this problem occurred when they were built. > * Fix a failure when a PL/pgSQL procedure used `CALL` on another procedure > that > has `OUT` parameters did not call execute a `COMMIT` or `ROLLBACK`. The release notes say the failure happened when the callee _did_ execute a COMMIT or ROLLBACK: + + A CALL in a PL/pgSQL procedure, to another + procedure that has OUT parameters, would fail if the called + procedure did a COMMIT + or ROLLBACK. + > For more details, please see the > [release notes](https://www.postgresql.org/docs/current/release.html). I recommend pointing this to https://www.postgresql.org/docs/release/, since the above link now contains only v13 notes.
Re: Tightening up allowed custom GUC names
On Tue, Feb 09, 2021 at 05:34:37PM -0500, Tom Lane wrote: > Now granting that the best answer is just to forbid these cases, > there are still a couple of decisions about how extensive the > prohibition ought to be: > > * We could forbid these characters only when you try to actually > put such a GUC into pg_db_role_setting, and otherwise allow them. > That seems like a weird nonorthogonal choice though, so I'd > rather just forbid them period. Agreed. > * A case could be made for tightening things up a lot more, and not > allowing anything that doesn't look like an identifier. I'm not > pushing for that, as it seems more likely to break existing > applications than the narrow restriction proposed here. But I could > live with it if people prefer that way. I'd prefer that. Characters like backslash, space, and double quote have significant potential to reveal bugs, while having negligible application beyond revealing bugs.
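As a standalone illustration of that stricter rule (an illustration only, not the check the server applies; it also ignores the separate requirement that custom names contain a dot), each dot-separated piece of a custom GUC name would have to look like an unquoted identifier:

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Accept only ASCII letter/underscore followed by letters, digits, underscores. */
static bool
name_piece_ok(const char *s, int len)
{
    if (len == 0 || !(isalpha((unsigned char) s[0]) || s[0] == '_'))
        return false;
    for (int i = 1; i < len; i++)
        if (!(isalnum((unsigned char) s[i]) || s[i] == '_'))
            return false;
    return true;
}

static bool
guc_name_ok(const char *name)
{
    const char *start = name;

    for (const char *p = name;; p++)
    {
        if (*p == '.' || *p == '\0')
        {
            if (!name_piece_ok(start, (int) (p - start)))
                return false;
            if (*p == '\0')
                return true;
            start = p + 1;
        }
    }
}

int
main(void)
{
    printf("%d\n", guc_name_ok("myext.work_limit"));    /* 1 */
    printf("%d\n", guc_name_ok("bad name.with space")); /* 0 */
    printf("%d\n", guc_name_ok("trailingdot."));        /* 0 */
    return 0;
}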
Re: Dump public schema ownership & seclabels
On Sun, Jan 17, 2021 at 12:00:06PM +0100, Vik Fearing wrote: > On 1/17/21 10:41 AM, Noah Misch wrote: > > On Sat, Jan 16, 2021 at 02:05:43PM +0100, Vik Fearing wrote: > >> On 12/30/20 12:59 PM, Noah Misch wrote: > >>> On Tue, Dec 29, 2020 at 05:49:24AM -0800, Noah Misch wrote: > >>>> https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave > >>>> $SUBJECT as > >>>> one of the constituent projects for changing the public schema default > >>>> ACL. > >>>> This ended up being simple. Attached. > >>> > >>> This is defective; it fails to reproduce nspacl after "ALTER SCHEMA public > >>> OWNER TO pg_write_server_files; REVOKE ALL ON SCHEMA public FROM > >>> pg_write_server_files;". I will try again later. Fixed. The comment added to getNamespaces() explains what went wrong. Incidentally, --no-acl is fragile without --no-owner, because any REVOKE statements assume a particular owner. Since one can elect --no-owner at restore time, we can't simply adjust the REVOKE statements constructed at dump time. That's not new with this patch or specific to initdb-created objects. > >> Could I ask you to also include COMMENTs when you try again, please? > > > > That may work. I had not expected to hear of a person changing the comment > > on > > schema public. To what do you change it? > > It was a while ago and I don't remember because it didn't appear in the > dump so I stopped doing it. :( > > Mine was an actual comment, but there are some tools out there that > (ab)use COMMENTs as crude metadata for what they do. For example: > https://postgresql-anonymizer.readthedocs.io/en/stable/declare_masking_rules/#declaring-rules-with-comments I've attached a separate patch for this, which applies atop the ownership patch. This makes more restores fail for non-superusers, which is okay. Author: Noah Misch Commit: Noah Misch Dump public schema ownership and security labels. As a side effect, this corrects dumps of public schema ACLs in databases where the DBA dropped and recreated that schema. Reviewed by FIXME. Discussion: https://postgr.es/m/20201229134924.ga1431...@rfd.leadboat.com diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c index 60d306e..ea67e52 100644 --- a/src/bin/pg_dump/dumputils.c +++ b/src/bin/pg_dump/dumputils.c @@ -725,6 +725,7 @@ void buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery, PQExpBuffer init_acl_subquery, PQExpBuffer init_racl_subquery, const char *acl_column, const char *acl_owner, + const char *initprivs_expr, const char *obj_kind, bool binary_upgrade) { /* @@ -765,23 +766,25 @@ buildACLQueries(PQExpBuffer acl_subquery, PQExpBuffer racl_subquery, "WITH ORDINALITY AS perm(acl,row_n) " "WHERE NOT EXISTS ( " "SELECT 1 FROM " - "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) " + "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) " "AS init(init_acl) WHERE acl = init_acl)) as foo)", acl_column, obj_kind, acl_owner, + initprivs_expr, obj_kind, acl_owner); printfPQExpBuffer(racl_subquery, "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM " "(SELECT acl, row_n FROM " - "pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) " + "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) " "WITH ORDINALITY AS initp(acl,row_n) " "WHERE NOT EXISTS ( " "SELECT 1 FROM " "pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) " "AS permp(orig_acl) WHERE acl = orig_acl)) as foo)", + initprivs
Re: enable certain TAP tests for MSVC builds
On Sun, Dec 05, 2021 at 11:57:31AM -0500, Andrew Dunstan wrote: > Certain TAP tests rely on settings that the Make files provide for them. > However vcregress.pl doesn't provide those settings. This patch remedies > that, and I propose to apply it shortly (when we have a fix for the SSL > tests that I will write about separately) and backpatch it appropriately. > --- a/src/tools/msvc/vcregress.pl > +++ b/src/tools/msvc/vcregress.pl > @@ -59,6 +59,12 @@ copy("$Config/autoinc/autoinc.dll", > "src/test/regress"); > copy("$Config/regress/regress.dll", "src/test/regress"); > copy("$Config/dummy_seclabel/dummy_seclabel.dll", "src/test/regress"); > > +# Settings used by TAP tests > +$ENV{with_ssl} = $config->{openssl} ? 'openssl' : 'no'; > +$ENV{with_ldap} = $config->{ldap} ? 'yes' : 'no'; > +$ENV{with_icu} = $config->{icu} ? 'yes' : 'no'; > +$ENV{with_gssapi} = $config->{gss} ? 'yes' : 'no'; That's appropriate. There are more variables to cover: $ git grep -n ^export ':(glob)**/Makefile' src/bin/pg_basebackup/Makefile:22:export LZ4 src/bin/pg_basebackup/Makefile:23:export TAR src/bin/pg_basebackup/Makefile:27:export GZIP_PROGRAM=$(GZIP) src/bin/psql/Makefile:20:export with_readline src/test/kerberos/Makefile:16:export with_gssapi with_krb_srvnam src/test/ldap/Makefile:16:export with_ldap src/test/modules/ssl_passphrase_callback/Makefile:3:export with_ssl src/test/recovery/Makefile:20:export REGRESS_SHLIB src/test/ssl/Makefile:18:export with_ssl src/test/subscription/Makefile:18:export with_icu
Re: enable certain TAP tests for MSVC builds
On Sun, Dec 05, 2021 at 06:00:08PM -0500, Andrew Dunstan wrote: > On 12/5/21 14:47, Noah Misch wrote: > > On Sun, Dec 05, 2021 at 11:57:31AM -0500, Andrew Dunstan wrote: > >> Certain TAP tests rely on settings that the Make files provide for them. > >> However vcregress.pl doesn't provide those settings. This patch remedies > >> that, and I propose to apply it shortly (when we have a fix for the SSL > >> tests that I will write about separately) and backpatch it appropriately. > >> --- a/src/tools/msvc/vcregress.pl > >> +++ b/src/tools/msvc/vcregress.pl > >> @@ -59,6 +59,12 @@ copy("$Config/autoinc/autoinc.dll", > >> "src/test/regress"); > >> copy("$Config/regress/regress.dll", "src/test/regress"); > >> copy("$Config/dummy_seclabel/dummy_seclabel.dll", "src/test/regress"); > >> > >> +# Settings used by TAP tests > >> +$ENV{with_ssl} = $config->{openssl} ? 'openssl' : 'no'; > >> +$ENV{with_ldap} = $config->{ldap} ? 'yes' : 'no'; > >> +$ENV{with_icu} = $config->{icu} ? 'yes' : 'no'; > >> +$ENV{with_gssapi} = $config->{gss} ? 'yes' : 'no'; > > That's appropriate. There are more variables to cover: > > > > $ git grep -n ^export ':(glob)**/Makefile' > > src/bin/pg_basebackup/Makefile:22:export LZ4 > > src/bin/pg_basebackup/Makefile:23:export TAR > > src/bin/pg_basebackup/Makefile:27:export GZIP_PROGRAM=$(GZIP) > > src/bin/psql/Makefile:20:export with_readline > > src/test/kerberos/Makefile:16:export with_gssapi with_krb_srvnam > > src/test/ldap/Makefile:16:export with_ldap > > src/test/modules/ssl_passphrase_callback/Makefile:3:export with_ssl > > src/test/recovery/Makefile:20:export REGRESS_SHLIB > > src/test/ssl/Makefile:18:export with_ssl > > src/test/subscription/Makefile:18:export with_icu > > LZ4/TAR/GZIP_PROGAM: It's not clear what these should be set to. The TAP > tests skip tests that use them if they are not set. Could add config.pl entries for those. Preventing those skips on Windows may or may not be worth making config.pl readers think about them. > with_readline: we don't build with readline on Windows, period. I guess > we could just set it to "no". > with_krb_srvnam: the default is "postgres", we could just set it to that > I guess. Works for me.
Re: Do we need pre-allocate WAL files during end-of-recovery checkpoint?
On Mon, Dec 06, 2021 at 06:21:40PM +0530, Bharath Rupireddy wrote: > The function PreallocXlogFiles doesn't get called during > end-of-recovery checkpoint in CreateCheckPoint, see [1]. The server > becomes operational after the end-of-recovery checkpoint and may need > WAL files. PreallocXlogFiles() is never a necessity; it's just an attempted optimization. I expect preallocation at end-of-recovery would do more harm than good, because the system would accept no writes at all while waiting for it. > However, I'm not sure how beneficial it is going to be if > the WAL is pre-allocated (as PreallocXlogFiles just allocates only 1 > extra WAL file). Yeah, PreallocXlogFiles() feels like a relict from the same era that gave us checkpoint_segments=3. It was more useful before commit 63653f7 (2002).
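For readers unfamiliar with it, here is a rough standalone paraphrase of the preallocation policy (not the actual xlog.c code, and the exact threshold there may differ): at most one future segment gets pre-created, and only when the latest write position is already in the last quarter of the current segment:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

static bool
want_prealloc(XLogRecPtr endptr, uint64_t wal_segment_size, uint64_t *next_segno)
{
    /* bytes already used in the current segment */
    uint64_t offset = (endptr - 1) % wal_segment_size;

    if (offset >= wal_segment_size * 3 / 4)
    {
        /* the single future segment worth pre-creating */
        *next_segno = (endptr - 1) / wal_segment_size + 1;
        return true;
    }
    return false;
}

int
main(void)
{
    uint64_t segno;

    /* 16MB segments: 13MB into segment 5 -> pre-create segment 6 */
    if (want_prealloc((XLogRecPtr) 5 * 16777216 + 13000000, 16777216, &segno))
        printf("pre-create segment %llu\n", (unsigned long long) segno);
    return 0;
}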
Re: Probable memory leak with ECPG and AIX
On Fri, Dec 10, 2021 at 03:40:50PM +0100, Guillaume Lelarge wrote: > After some time, the client > crashes with a segfault error. According to him, it consumed around 256MB. > What's weird is that it works great on Linux, but crashed on AIX. That almost certainly means he's using a 32-bit binary with the default heap size. To use more heap on AIX, build 64-bit or override the heap size. For example, "env LDR_CNTRL=MAXDATA=0x80000000 ./a.out" gives 2GiB of heap. See https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX for more ways to control heap size. While that documentation focuses on the server, the same techniques apply to clients like your test program. That said, I don't know why your test program reaches 256MB on AIX. On GNU/Linux, it uses a lot less. What version of PostgreSQL provided your client libraries?
Re: Probable memory leak with ECPG and AIX
On Wed, Dec 15, 2021 at 04:20:42PM +0100, Benoit Lobréau wrote: > * with LDR_CNTRL=MAXDATA=0x10000000, we reach 256Mo but there is no > segfault, the program just continues running ; > * with LDR_CNTRL=MAXDATA=0x80000000, we reach 2Go and there is no segfault > either, the program just continues running. I get the same results. The leak arises because AIX freelocale() doesn't free all memory allocated in newlocale(). The following program uses trivial memory on GNU/Linux, but it leaks like you're seeing on AIX: #include <locale.h> int main(int argc, char **argv) { while (1) freelocale(newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0)); return 0; } If you have access to file an AIX bug, I recommend doing so. If we want PostgreSQL to work around this, one idea is to have ECPG do this newlocale() less often. For example, do it once per process or once per connection instead of once per ecpg_do_prologue().
Re: Probable memory leak with ECPG and AIX
On Sat, Jan 01, 2022 at 11:35:02AM -0500, Tom Lane wrote: > Noah Misch writes: > > I get the same results. The leak arises because AIX freelocale() doesn't > > free > > all memory allocated in newlocale(). The following program uses trivial > > memory on GNU/Linux, but it leaks like you're seeing on AIX: > > Bleah. > > > If you have access to file an AIX bug, I recommend doing so. If we want > > PostgreSQL to work around this, one idea is to have ECPG do this newlocale() > > less often. For example, do it once per process or once per connection > > instead of once per ecpg_do_prologue(). > > It's worse than that: see also ECPGget_desc(). Seems like a case > could be made for doing something about this just on the basis > of cycles expended, never mind freelocale() bugs. Agreed. Once per process seems best. I only hesitated before since it means nothing will free this storage, which could be annoying in the context of Valgrind and similar. However, ECPG already has bits of never-freed memory in the form of pthread_key_create() calls having no pthread_key_delete(), so I don't mind adding a bit more.
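A minimal sketch of the once-per-process idea (illustrative names, not the shipped ecpglib code; a real version would also need thread-safety around the first call):

#include <locale.h>

static locale_t ecpg_clocale;           /* 0 until first use */

static locale_t
ecpg_get_clocale(void)
{
    if (ecpg_clocale == (locale_t) 0)
        ecpg_clocale = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    return ecpg_clocale;                /* intentionally never freed */
}

int
main(void)
{
    /* Each statement-level use reuses the same locale_t; nothing per call. */
    for (int i = 0; i < 1000000; i++)
        if (ecpg_get_clocale() == (locale_t) 0)
            return 1;
    return 0;
}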
Re: null iv parameter passed to combo_init()
On Fri, Jan 07, 2022 at 04:32:01PM -0800, Zhihong Yu wrote: > In contrib/pgcrypto/pgcrypto.c : > > err = px_combo_init(c, (uint8 *) VARDATA_ANY(key), klen, NULL, 0); > > Note: NULL is passed as iv. > > When combo_init() is called, > > if (ivlen > ivs) > memcpy(ivbuf, iv, ivs); > else > memcpy(ivbuf, iv, ivlen); > > It seems we need to consider the case of null being passed as iv for > memcpy() because of this: > > /usr/include/string.h:44:28: note: nonnull attribute specified here I agree it's time to fix cases like this, given https://postgr.es/m/flat/20200904023648.gb3426...@rfd.leadboat.com. However, it should be one patch fixing all (or at least many) of them. > --- a/contrib/pgcrypto/px.c > +++ b/contrib/pgcrypto/px.c > @@ -198,10 +198,13 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned > klen, > if (ivs > 0) > { > ivbuf = palloc0(ivs); > - if (ivlen > ivs) > - memcpy(ivbuf, iv, ivs); > - else > - memcpy(ivbuf, iv, ivlen); > + if (iv != NULL) > + { > + if (ivlen > ivs) > + memcpy(ivbuf, iv, ivs); > + else > + memcpy(ivbuf, iv, ivlen); > + } > } If someone were to pass NULL iv with nonzero ivlen, that will silently malfunction. I'd avoid that risk by writing this way: --- a/contrib/pgcrypto/px.c +++ b/contrib/pgcrypto/px.c @@ -202,3 +202,3 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen, memcpy(ivbuf, iv, ivs); - else + else if (ivlen > 0) memcpy(ivbuf, iv, ivlen); That also gives the compiler an additional optimization strategy.
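As a self-contained illustration of that last point (an illustration of the pattern only, not the pgcrypto code itself):

#include <string.h>

/* Guard the copy on the length, not the pointer.  A NULL "iv" with a nonzero
 * "ivlen" is a caller bug and will crash loudly inside memcpy, while a
 * compiler that can prove ivlen is nonzero may drop the branch entirely,
 * because a zero-length copy would have had no effect anyway. */
void
copy_iv(unsigned char *ivbuf, const unsigned char *iv, size_t ivs, size_t ivlen)
{
    if (ivlen > ivs)
        memcpy(ivbuf, iv, ivs);
    else if (ivlen != 0)
        memcpy(ivbuf, iv, ivlen);
}

int
main(void)
{
    unsigned char buf[16] = {0};

    copy_iv(buf, (const unsigned char *) "0123456789abcdef", sizeof(buf), 8);
    copy_iv(buf, NULL, sizeof(buf), 0);  /* fine: the length guard skips the copy */
    return 0;
}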
Re: null iv parameter passed to combo_init()
On Sat, Jan 08, 2022 at 06:52:14PM -0800, Zhihong Yu wrote: > On Sat, Jan 8, 2022 at 5:52 PM Noah Misch wrote: > > On Fri, Jan 07, 2022 at 04:32:01PM -0800, Zhihong Yu wrote: > > I agree it's time to fix cases like this, given > > https://postgr.es/m/flat/20200904023648.gb3426...@rfd.leadboat.com. > > However, > > it should be one patch fixing all (or at least many) of them. > > > --- a/contrib/pgcrypto/px.c > > > +++ b/contrib/pgcrypto/px.c > > > @@ -198,10 +198,13 @@ combo_init(PX_Combo *cx, const uint8 *key, > > unsigned klen, > > > if (ivs > 0) > > > { > > > ivbuf = palloc0(ivs); > > > - if (ivlen > ivs) > > > - memcpy(ivbuf, iv, ivs); > > > - else > > > - memcpy(ivbuf, iv, ivlen); > > > + if (iv != NULL) > > > + { > > > + if (ivlen > ivs) > > > + memcpy(ivbuf, iv, ivs); > > > + else > > > + memcpy(ivbuf, iv, ivlen); > > > + } > > > } > > > > If someone were to pass NULL iv with nonzero ivlen, that will silently > > > Hi, > If iv is NULL, none of the memcpy() would be called (based on my patch). > Can you elaborate your suggestion in more detail ? On further thought, I would write it this way: --- a/contrib/pgcrypto/px.c +++ b/contrib/pgcrypto/px.c @@ -202,3 +202,3 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen, memcpy(ivbuf, iv, ivs); - else + else if (ivlen != 0) memcpy(ivbuf, iv, ivlen); That helps in two ways. First, if someone passes iv==NULL and ivlen!=0, my version will tend to crash, but yours will treat that like ivlen==0. Since this would be a programming error, crashing is better. Second, a compiler can opt to omit the "ivlen != 0" test from the generated assembly, because the compiler can know that memcpy(any_value_a, any_value_b, 0) is a no-op. > Since the referenced email was old, line numbers have changed. > It would be nice if an up-to-date list is provided in case more places > should be changed. To check whether you've gotten them all, configure with CC='gcc -fsanitize=undefined -fsanitize-undefined-trap-on-error' and run check-world.
Re: null iv parameter passed to combo_init()
On Sun, Jan 09, 2022 at 04:37:32AM -0800, Zhihong Yu wrote: > On Sat, Jan 8, 2022 at 11:32 PM Tom Lane wrote: > > Noah Misch writes: > > > On further thought, I would write it this way: > > > > > - else > > > + else if (ivlen != 0) > > > memcpy(ivbuf, iv, ivlen); > > > > FWIW, I liked the "ivlen > 0" formulation better. They should be > > equivalent, because ivlen is unsigned, but it just seems like "> 0" > > is more natural. If I were considering the one code site in isolation, I'd pick "ivlen > 0". But of the four sites identified so far, three have signed length variables. Since we're likely to get more examples of this pattern, some signed and some unsigned, I'd rather use a style that does the optimal thing whether or not the variable is signed. What do you think? > Patch v4 is attached. Does this pass the test procedure shown upthread?
Re: null iv parameter passed to combo_init()
On Mon, Jan 10, 2022 at 03:34:27PM -0800, Zhihong Yu wrote: > On Sun, Jan 9, 2022 at 6:45 PM Zhihong Yu wrote: > > gcc -Wall -Wmissing-prototypes -Wpointer-arith > > -Wdeclaration-after-statement -Werror=vla -Wendif-labels > > -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type > > -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard > > -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined > > -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND > > -I../../src/include -D_GNU_SOURCE -c -o path.o path.c > > Patch v3 passes `make check-world` The patch uses the "LENGTH_VAR != 0" style in px.c, but it uses "POINTER_VAR != NULL" style in the other files. Please use "LENGTH_VAR != 0" style in each place you're changing. Assuming the next version looks good, I'll likely back-patch it to v10. Would anyone like to argue for a back-patch all the way to 9.2?
Re: null iv parameter passed to combo_init()
On Sun, Jan 09, 2022 at 06:45:09PM -0800, Zhihong Yu wrote: > On Sun, Jan 9, 2022 at 1:27 PM Zhihong Yu wrote: > > After installing gcc-11, ./configure passed (with 0003-memcpy-null.patch). > > In the output of `make check-world`, I don't see `runtime error`. That's expected. With -fsanitize-undefined-trap-on-error, the program will generate SIGILL when UBSan detects undefined behavior. To get "runtime error" messages in the postmaster log, drop -fsanitize-undefined-trap-on-error. Both ways of running the tests have uses. -fsanitize-undefined-trap-on-error is better when you think the code is clean, because a zero "make check-world" exit status confirms the code is clean. Once you know the code is unclean in some way, dropping -fsanitize-undefined-trap-on-error is better for getting details. > > Though there was a crash (maybe specific to my machine): > > > > Core was generated by > > `/nfusr/dev-server/zyu/postgres/tmp_install/usr/local/pgsql/bin/postgres > > --singl'. > > Program terminated with signal SIGILL, Illegal instruction. > > #0 0x0050642d in write_item.cold () > > Missing separate debuginfos, use: debuginfo-install > > glibc-2.17-325.el7_9.x86_64 nss-pam-ldapd-0.8.13-25.el7.x86_64 > > sssd-client-1.16.5-10.el7_9.10.x86_64 > > (gdb) bt > > #0 0x0050642d in write_item.cold () > > #1 0x00ba9d1b in write_relcache_init_file () > > #2 0x00bb58f7 in RelationCacheInitializePhase3 () > > #3 0x00bd5cb5 in InitPostgres () > > #4 0x00a0a9ea in PostgresMain () That is UBSan detecting undefined behavior. A successful patch version will fix write_item(), among many other places that are currently making check-world fail. I get the same when testing your v5 under "gcc (Debian 11.2.0-13) 11.2.0". I used the same host as buildfarm member thorntail, and I configured like this: ./configure -C --with-lz4 --prefix=$HOME/sw/nopath/pghead --enable-tap-tests --enable-debug --enable-depend --enable-cassert CC='ccache gcc-11 -fsanitize=undefined -fsanitize-undefined-trap-on-error' CFLAGS='-O2 -funwind-tables' > Earlier I was using devtoolset-11 which had an `Illegal instruction` error. > > I compiled / installed gcc-11 from source (which took whole afternoon). > `make check-world` passed with patch v3. > In tmp_install/log/install.log, I saw: > > gcc -Wall -Wmissing-prototypes -Wpointer-arith > -Wdeclaration-after-statement -Werror=vla -Wendif-labels > -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type > -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard > -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined > -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND > -I../../src/include -D_GNU_SOURCE -c -o path.o path.c > rm -f libpgport.a Perhaps this self-compiled gcc-11 is defective, being unable to detect the instances of undefined behavior that other builds detect. If so, use the "devtoolset-11" gcc instead. You're also building without optimization; that might be the problem.
Re: XLogReadRecord() error in XlogReadTwoPhaseData()
On Fri, Nov 19, 2021 at 09:18:23PM -0800, Noah Misch wrote: > On Wed, Nov 17, 2021 at 11:05:06PM -0800, Noah Misch wrote: > > On Wed, Nov 17, 2021 at 05:47:10PM -0500, Tom Lane wrote: > > > Noah Misch writes: > > > > Each of the three failures happened on a sparc64 Debian+gcc machine. I > > > > had > > > > tried ~8000 iterations on thorntail, another sparc64 Debian+gcc animal, > > > > without reproducing this. > > > > > # 'pgbench: error: client 0 script 1 aborted in command > > > 4 query 0: ERROR: could not read two-phase state from WAL at 0/159EF88: > > > unexpected pageaddr 0/0 in log segment 00010001, offset > > > 5890048 > > > [1] > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-17%2013%3A01%3A24 > > > > Two others: > > ERROR: could not read two-phase state from WAL at 0/16F1850: invalid > > record length at 0/16F1850: wanted 24, got 0 > > -- > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-12%2013%3A01%3A15 > > ERROR: could not read two-phase state from WAL at 0/1668020: incorrect > > resource manager data checksum in record at 0/1668020 > > -- > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-11-16%2015%3A00%3A52 > > I don't have a great theory, but here are candidates to examine next: > > > > - Run with wal_debug=on to confirm logged write location matches read > > location. > > - Run > > "PGDATA=contrib/amcheck/tmp_check/t_003_cic_2pc_CIC_2PC_test_data/pgdata > > pg_waldump -s 0/0100" at the end of the test. > > - Dump WAL page binary image at the point of failure. > > - Log which branches in XLogReadRecord() are taken. > > Tom Turelinckx, are you able to provide remote access to kittiwake or > tadarida? I'd use it to attempt the above things. All else being equal, > kittiwake is more relevant since it's still supported upstream. Thanks for setting up access. Summary: this kernel has a bug in I/O syscalls. How practical is it to update that kernel? (Userland differs across these animals, but the kernel does not.) The kernel on buildfarm member thorntail doesn't exhibit the bug. For specifics of the kernel bug, see the attached test program. In brief, the bug arises if one process is write()ing or pwrite()ing a file at about the same time that another process is read()ing or pread()ing the same. POSIX says the reader should see the data as it existed before the write or the newly-written data. On this kernel, the reader can see zeros instead. That leads to the $SUBJECT failure. PostgreSQL processes write out a given WAL block 20-30 times in ~10ms, and COMMIT PREPARED reads that block. The writers aren't changing the bytes of interest to COMMIT PREPARED, but the zeros from the kernel bug yield the failure. We could opt to work around that by writing only the not-already-written portion of a WAL block, but I doubt that's worthwhile unless it happens to be a performance win anyway. Separately, while I don't know of relevance to PostgreSQL, I was interested to see that CentOS 7 pwrite()/pread() fail to have the POSIX-required atomicity. > The setup of your buildfarm animals provides a clue. I understand kittiwake > is the same as ibisbill except for build options, and tadarida is the same as > mussurana except for build options. 
ibisbill and mussurana haven't failed, so > one of these is likely needed to reproduce: > > absence of --enable-cassert > CFLAGS='-g -O2 -fstack-protector -Wformat -Werror=format-security ' > CPPFLAGS='-Wdate-time -D_FORTIFY_SOURCE=2' > LDFLAGS='-Wl,-z,relro -Wl,-z,now' That was a red herring. ibisbill and mussurana don't use --with-tap-tests. Adding --with-tap-tests has been enough to make their configurations witness the same kinds of failures. nm /* * Stress-test pread(), pwrite(), read(), and write() to detect a few problems * with their handling of regular files: * * - Lack of atomicity. * https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07 * requires atomicity. (An implementation may always return <= 1; if it * chooses to return higher values, it must maintain atomicity.) * * - Transiently making readers see zeros when they read concurrently with a * write, even if the file had no zero at that offset before or after the * write. * * By default, this program will print "mismatch" messages if pwrite() is * non-atomic. Build with other options to test other behaviors: * -DCHANGE_CONTENT=0 tests the zeros bug instead of plain atomicity * -D
Re: XLogReadRecord() error in XlogReadTwoPhaseData()
Cancel that kernel upgrade idea. I no longer expect it to help... On Sun, Jan 16, 2022 at 10:19:30PM +1300, Thomas Munro wrote: > On Sun, Jan 16, 2022 at 8:12 PM Noah Misch wrote: > > For specifics of the kernel bug, see the attached test program. In brief, > > the > > bug arises if one process is write()ing or pwrite()ing a file at about the > > same time that another process is read()ing or pread()ing the same. POSIX > > says the reader should see the data as it existed before the write or the > > newly-written data. On this kernel, the reader can see zeros instead. That > > leads to the $SUBJECT failure. PostgreSQL processes write out a given WAL > > block 20-30 times in ~10ms, and COMMIT PREPARED reads that block. The > > writers > > aren't changing the bytes of interest to COMMIT PREPARED, but the zeros from > > the kernel bug yield the failure. The difference between kittiwake and thorntail comes from thorntail using xfs and kittiwake using ext4. Running the io-rectitude.c tests on an ext4 partition on thorntail, I see the zeros bug just like I do on kittiwake. I don't see the zeros bug on ppc64 or x86_64, just sparc64 so far: * ext4, Linux 3.10.0-1160.49.1.el7.x86_64 (CentOS 7.9.2009): * pwrite/pread is non-atomic if count>16 (no -D switches) * write/read is atomic (-DUSE_SEEK -DXLOG_BLCKSZ=8192000) * pwrite/pread is free from zeros bug (-DCHANGE_CONTENT=0) * write/read is free from zeros bug (-DUSE_SEEK -DCHANGE_CONTENT=0) * * ext4, Linux 4.9.0-13-sparc64-smp (Debian): * pwrite/pread is non-atomic if count>4 (no -D switches) * write/read is non-atomic if count>4 (-DUSE_SEEK) * write/read IS atomic w/o REOPEN (-DUSE_SEEK -DREOPEN=0 -DXLOG_BLCKSZ=8192000) * pwrite/pread has zeros bug for count>127 (-DCHANGE_CONTENT=0) * pwrite/pread w/ O_SYNC has zeros bug (-DCHANGE_CONTENT=0 -DOPEN_FLAGS=O_SYNC) *far less frequent w/ O_SYNC, but it still happens * pwrite/pread w/o REOPEN also has zeros bug for count>127 (-DCHANGE_CONTENT=0 -DREOPEN=0) * write/read has zeros bug for count>127 (-DUSE_SEEK -DCHANGE_CONTENT=0) * write/read w/ O_SYNC has zeros bug (-DUSE_SEEK -DCHANGE_CONTENT=0 -DOPEN_FLAGS=O_SYNC) * write/read w/o REOPEN is free from zeros bug (-DUSE_SEEK -DCHANGE_CONTENT=0 -DREOPEN=0) * * ext4, Linux 5.15.0-2-sparc64-smp (Debian bookworm/sid): * [behaviors match the previous kernel exactly] * * ext4, Linux 5.15.0-2-powerpc64 (Debian bookworm/sid): * [atomicity matches previous kernel, but zeros bug does not] * pwrite/pread is non-atomic if count>4 (no -D switches) * write/read is non-atomic if count>4 (-DUSE_SEEK) * write/read IS atomic w/o REOPEN (-DUSE_SEEK -DREOPEN=0 -DXLOG_BLCKSZ=8192000) * pwrite/pread is free from zeros bug (-DCHANGE_CONTENT=0) * write/read is free from zeros bug (-DUSE_SEEK -DCHANGE_CONTENT=0) * * ext4, Linux 5.15.5-0-virt x86_64 (Alpine): * [behaviors match the previous kernel exactly] * * xfs, Linux 5.15.0-2-sparc64-smp (Debian bookworm/sid): * pwrite/pread is atomic (-DXLOG_BLCKSZ=8192000) * write/read is atomic (-DUSE_SEEK -DXLOG_BLCKSZ=8192000) * pwrite/pread is free from zeros bug (-DCHANGE_CONTENT=0) * write/read is free from zeros bug (-DUSE_SEEK -DCHANGE_CONTENT=0) > > We could opt to work around that by writing > > only the not-already-written portion of a WAL block, but I doubt that's > > worthwhile unless it happens to be a performance win anyway. My next steps: - Report a Debian bug for the sparc64+ext4 zeros problem. - Try to falsify the idea that "write only the not-already-written portion of a WAL block" is an effective workaround. 
Specifically, modify the test program to have the writer process mutate offsets [N-k,N-1] and [N+1,N+k] while the reader process reads offset N. If the reader sees a zero, that workaround is ineffective. - Implement the workaround, if I didn't falsify its effectiveness. If it doesn't hurt performance on x86_64, we can use it unconditionally. Otherwise, limit its use to sparc64 Linux. > > Separately, while I don't know of relevance to PostgreSQL, I was interested > > to > > see that CentOS 7 pwrite()/pread() fail to have the POSIX-required > > atomicity. > > FWIW there was some related discussion over here: > > https://www.postgresql.org/message-id/flat/17064-bb0d7904ef72add3%40postgresql.org That gave me the idea to test different filesystems. Thanks. Incidentally, I find https://utcc.utoronto.ca/~cks/space/blog/unix/WriteNotVeryAtomic is mistaken about POSIX requirements. There's no precedent for POSIX writing "two threads" when it means "two threads of the same process". Moreover, the part about "shall also apply whenever a file descriptor is successfully closed, however caused (for example [...] process termination)"
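To make the planned check concrete, here is a rough standalone sketch of the reader/writer shape described above (the real test program is the io-rectitude.c attached earlier in this thread, not this): the writer keeps rewriting nonzero bytes on both sides of offset N while never touching N itself, and the reader fails if it ever reads a zero at N:

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 8192              /* offset the reader watches */
#define K 128               /* writer rewrites [N-K, N-1] and [N+1, N+K] */

int
main(void)
{
    unsigned char ones[2 * K + 1];
    int         fd = open("zeros-test.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
    pid_t       writer;

    if (fd < 0)
        return 1;
    memset(ones, 0xFF, sizeof(ones));
    pwrite(fd, ones, sizeof(ones), N - K);  /* no zero anywhere near N, ever */

    writer = fork();
    if (writer < 0)
        return 1;
    if (writer == 0)
    {
        for (;;)
        {
            pwrite(fd, ones, K, N - K);     /* rewrite [N-K, N-1] */
            pwrite(fd, ones, K, N + 1);     /* rewrite [N+1, N+K] */
        }
    }

    for (long i = 0; i < 10 * 1000 * 1000; i++)
    {
        unsigned char c = 0xFF;

        if (pread(fd, &c, 1, N) == 1 && c == 0)
        {
            printf("workaround ineffective: read a zero at offset %d on iteration %ld\n", N, i);
            break;
        }
    }
    kill(writer, SIGKILL);
    waitpid(writer, NULL, 0);
    return 0;
}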
Re: A test for replay of regression tests
On Mon, Jan 17, 2022 at 05:25:19PM +1300, Thomas Munro wrote: > Here's how it failed on fairywren, in case someone knowledgeable of > MSYS path translation etc can spot the problem: > > psql::1: ERROR: directory > "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/modules/test_misc/tmp_check/t_002_tablespace_main_data/ts1" > does not exist > not ok 1 - create tablespace with absolute path > > I think that means chmod() failed with ENOENT. That's weird, because > the .pl does: > > +my $TS1_LOCATION = $node->basedir() . "/ts1"; > +mkdir($TS1_LOCATION); You likely need a PostgreSQL::Test::Utils::perl2host() call. MSYS Perl understands Cygwin-style names like /home/... as well as Windows-style names, but this PostgreSQL configuration understands only Windows-style names.
Re: cleaning up a few CLOG-related things
On Wed, Jan 27, 2021 at 12:35:30PM -0500, Robert Haas wrote: > On Mon, Jan 25, 2021 at 2:11 PM Heikki Linnakangas wrote: > > Having a separate FullTransactionIdToLatestPageNumber() function for > > this seems like overkill to me. > > I initially thought so too, but it turned out to be pretty useful for > writing debugging cross-checks and things, so I'm inclined to keep it > even though I'm not at present proposing to commit any such debugging > cross-checks. For example I tried making the main redo loop check > whether XactCtl->shared->latest_page_number == > FullTransactionIdToLatestPageNumber(nextXid) which turned out to be > super-helpful in understanding things. > +/* > + * Based on ShmemVariableCache->nextXid, compute the latest CLOG page that > + * is expected to exist. > + */ > +static int > +FullTransactionIdToLatestPageNumber(FullTransactionId nextXid) > +{ > + /* > + * nextXid is the next XID that will be used, but we want to set > + * latest_page_number according to the last XID that's already been > used. > + * So retreat by one. See also GetNewTransactionId(). > + */ > + FullTransactionIdRetreat(&nextXid); > + return TransactionIdToPage(XidFromFullTransactionId(nextXid)); > +} I don't mind the arguably-overkill function. I'd probably have named it FullTransactionIdPredecessorToPage(), to focus on its behavior as opposed to its caller's behavior, but static function naming isn't a weighty matter. Overall, the patch looks fine. If nextXid is the first on a page, the next GetNewTransactionId() -> ExtendCLOG() -> ZeroCLOGPage() -> SimpleLruZeroPage() is responsible for updating latest_page_number.
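A worked example of that boundary case, assuming the default BLCKSZ of 8192 (so CLOG holds 8192 * 4 = 32768 transaction statuses per page):

#include <stdint.h>
#include <stdio.h>

#define CLOG_XACTS_PER_PAGE 32768       /* BLCKSZ 8192, 4 xacts per byte */

int
main(void)
{
    uint32_t nextXid = 65536;           /* first XID on CLOG page 2, not yet used */
    uint32_t lastUsed = nextXid - 1;    /* retreat by one, as the function does */

    printf("page of next XID     = %u\n", nextXid / CLOG_XACTS_PER_PAGE);   /* 2 */
    printf("expected latest page = %u\n", lastUsed / CLOG_XACTS_PER_PAGE);  /* 1 */
    /* The later ExtendCLOG() for XID 65536 is what advances latest_page_number to 2. */
    return 0;
}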
Re: pgsql: Move tablespace path re-creation from the makefiles to pg_regres
On Thu, Mar 25, 2021 at 07:44:02AM +0900, Michael Paquier wrote: > On Wed, Mar 24, 2021 at 10:50:50AM -0400, Robert Haas wrote: > > On Wed, Mar 24, 2021 at 5:56 AM Christoph Berg wrote: > >> Maybe creating the tablespace directory from within the testsuite > >> would suffice? > >> > >> CREATE TABLE foo (t text); > >> COPY foo FROM PROGRAM 'mkdir @testtablespace@'; > >> CREATE TABLESPACE regress_tblspace LOCATION '@testtablespace@'; > > > > Would that work on Windows? That would entail forbidding various shell metacharacters in @testtablespace@. One could avoid imposing such a restriction by adding a mkdir() function to regress.c and writing it like: CREATE FUNCTION mkdir(text) RETURNS void AS '@libdir@/regress@DLSUFFIX@', 'regress_mkdir' LANGUAGE C STRICT\; REVOKE ALL ON FUNCTION mkdir FROM public; SELECT mkdir('@testtablespace@'); CREATE TABLESPACE regress_tblspace LOCATION '@testtablespace@'; > I doubt that all the Windows environments would be able to get their > hands on that. > And I am not sure either that this would work when it > comes to the CI case, again on Windows. How might a CI provider break that?
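A hypothetical C-level shape for the regress.c helper sketched in the SQL above (the name regress_mkdir comes from that SQL; nothing like it exists in regress.c today):

/* Hypothetical helper; regress.c already declares PG_MODULE_MAGIC. */
#include "postgres.h"

#include <errno.h>
#include <sys/stat.h>

#include "fmgr.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(regress_mkdir);

Datum
regress_mkdir(PG_FUNCTION_ARGS)
{
    char       *path = text_to_cstring(PG_GETARG_TEXT_PP(0));

    if (mkdir(path, S_IRWXU) < 0 && errno != EEXIST)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not create directory \"%s\": %m", path)));

    PG_RETURN_VOID();
}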
Re: Tying an object's ownership to datdba
On Wed, Mar 24, 2021 at 11:57:37AM -0400, John Naylor wrote: > On Mon, Dec 28, 2020 at 12:32 AM Noah Misch wrote: > > [v2] > > Hi Noah, > > In the refactoring patch, there is a lingering comment reference to > roles_has_privs_of(). Aside from that, it looks good to me. A possible thing > to consider is an assert that is_admin is not null where we expect that. Thanks. The next version will contain the comment fix and "Assert(OidIsValid(admin_of) == PointerIsValid(is_admin));". > The database owner role patch looks good as well. Do you plan to put the CF entry into Ready for Committer, or should the patches wait for another review? > > I ended up blocking DDL that creates role memberships involving the new > > role; > > see reasons in user.c comments. Lifting those restrictions looked feasible, > > but it was inessential to the mission, and avoiding unintended consequences > > would have been tricky. Views "information_schema.enabled_roles" and > > "information_schema.applicable_roles" list any implicit membership in > > pg_database_owner, but pg_catalog.pg_group and psql \dgS do not. If this > > leads any reviewer to look closely at applicable_roles, note that its > > behavior > > doesn't match its documentation > > (https://postgr.es/m/flat/20060728170615.gy20...@kenobi.snowman.net). > > Is this something that needs fixing separately? It is a bug, but I think fixing it is not very urgent and should happen separately, if at all.
Re: non-HOT update not looking at FSM for large tuple update
I gather this is important when large_upd_rate=rate(cross-page update bytes for tuples larger than fillfactor) exceeds small_ins_rate=rate(insert bytes for tuples NOT larger than fillfactor). That is a plausible outcome when inserts are rare, and table bloat then accrues at large_upd_rate-small_ins_rate. I agree this patch improves behavior. Does anyone have a strong opinion on whether to back-patch? I am weakly inclined not to back-patch, because today's behavior might happen to perform better when large_upd_rate-small_ins_rate<0. Besides the usual choices of back-patching or not back-patching, we could back-patch with a stricter threshold. Suppose we accepted pages for larger-than-fillfactor tuples when the pages have at least BLCKSZ-SizeOfPageHeaderData-sizeof(ItemIdData)-MAXALIGN(MAXALIGN(SizeofHeapTupleHeader)+1)+1 bytes of free space. That wouldn't reuse a page containing a one-column tuple, but it would reuse a page having up to eight line pointers. On Fri, Mar 19, 2021 at 02:16:22PM -0400, John Naylor wrote: > --- a/src/backend/access/heap/hio.c > +++ b/src/backend/access/heap/hio.c > @@ -335,11 +335,24 @@ RelationGetBufferForTuple(Relation relation, Size len, > + const Size maxPaddedFsmRequest = MaxHeapTupleSize - > + (MaxHeapTuplesPerPage / 8 * sizeof(ItemIdData)); In evaluating whether this is a good choice of value, I think about the expected page lifecycle. A tuple barely larger than fillfactor (roughly len=1+BLCKSZ*fillfactor/100) will start on a roughly-empty page. As long as the tuple exists, the server will skip that page for inserts. Updates can cause up to floor(99/fillfactor) same-size versions of the tuple to occupy the page simultaneously, creating that many line pointers. At the fillfactor=10 minimum, it's good to accept otherwise-empty pages having at least nine line pointers, so a page can restart the aforementioned lifecycle. Tolerating even more line pointers helps when updates reduce tuple size or when the page was used for smaller tuples before it last emptied. At the BLCKSZ=8192 default, this maxPaddedFsmRequest allows 36 line pointers (good or somewhat high). At the BLCKSZ=1024 minimum, it allows 4 line pointers (low). At the BLCKSZ=32768 maximum, 146 (likely excessive). I'm not concerned about optimizing non-default block sizes, so let's keep your proposal. Comments and the maxPaddedFsmRequest variable name use term "fsm" for things not specific to the FSM. For example, the patch's test case doesn't use the FSM. (That is fine. Ordinarily, RelationGetTargetBlock() furnishes its block. Under CLOBBER_CACHE_ALWAYS, the "try the last page" logic does so. An FSM-using test would contain a VACUUM.) I plan to commit the attached version; compared to v5, it updates comments and renames this variable. Thanks, nm Author: Noah Misch Commit: Noah Misch Accept slightly-filled pages for tuples larger than fillfactor. We always inserted a larger-than-fillfactor tuple into a newly-extended page, even when existing pages were empty or contained nothing but an unused line pointer. This was unnecessary relation extension. Start tolerating page usage up to 1/8 the maximum space that could be taken up by line pointers. This is somewhat arbitrary, but it should allow more cases to reuse pages. This has no effect on tables with fillfactor=100 (the default). John Naylor and Floris van Nee. Reviewed by Matthias van de Meent. Reported by Floris van Nee. 
Discussion: https://postgr.es/m/6e263217180649339720afe2176c5...@opammb0562.comp.optiver.com diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c index 75223c9..08b4e1b 100644 --- a/src/backend/access/heap/hio.c +++ b/src/backend/access/heap/hio.c @@ -317,10 +317,10 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate) * BULKWRITE buffer selection strategy object to the buffer manager. * Passing NULL for bistate selects the default behavior. * - * We always try to avoid filling existing pages further than the fillfactor. - * This is OK since this routine is not consulted when updating a tuple and - * keeping it on the same page, which is the scenario fillfactor is meant - * to reserve space for. + * We don't fill existing pages further than the fillfactor, except for large + * tuples in nearly-empty tables. This is OK since this routine is not + * consulted when updating a tuple and keeping it on the same page, which is + * the scenario fillfactor is meant to reserve space for. * * ereport(ERROR) is allowed here, so this routine *must* be called * before any (unlogged) changes are made in buffer pool. @@ -334,8 +334,10 @@ RelationGetBufferForTuple(Relation relation, Size len, booluse_fsm = !(options & HEAP_INSERT_SKIP_FSM);
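For reference, the arithmetic behind the line-pointer counts quoted earlier in this message (36, 4, and 146) can be reproduced with a small standalone program, assuming 8-byte MAXALIGN, a 24-byte page header, a 23-byte (24 after alignment) heap tuple header, and 4-byte line pointers:

#include <stdio.h>

static void
report(int blcksz)
{
    int     page_header = 24;                   /* SizeOfPageHeaderData */
    int     tuple_hdr   = 24;                   /* MAXALIGN(SizeofHeapTupleHeader) */
    int     lp_size     = 4;                    /* sizeof(ItemIdData) */
    int     max_tuple   = blcksz - page_header; /* MaxHeapTupleSize */
    int     max_tuples  = (blcksz - page_header) / (tuple_hdr + lp_size);
    int     padding     = max_tuples / 8 * lp_size;
    int     request     = max_tuple - padding;  /* maxPaddedFsmRequest */

    printf("BLCKSZ=%5d: request %5d bytes; tolerates %3d line pointers\n",
           blcksz, request, padding / lp_size);
}

int
main(void)
{
    report(1024);       /* 4 line pointers */
    report(8192);       /* 36 line pointers */
    report(32768);      /* 146 line pointers */
    return 0;
}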
Re: public schema default ACL
On Sat, Feb 13, 2021 at 04:56:29AM -0800, Noah Misch wrote: > I'm attaching the patch for $SUBJECT, which applies atop the four patches from > the two other threads below. For convenience of testing, I've included a > rollup patch, equivalent to applying all five patches. I committed prerequisites from one thread, so here's a rebased rollup patch. Author: Noah Misch Commit: Noah Misch Rollup of three patches, for easy testing. diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 0649b6b..1bcc29b 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8925,7 +8925,7 @@ $d$; -- But creation of user mappings for non-superusers should fail CREATE USER MAPPING FOR public SERVER loopback_nopw; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback_nopw; -CREATE FOREIGN TABLE ft1_nopw ( +CREATE FOREIGN TABLE pg_temp.ft1_nopw ( c1 int NOT NULL, c2 int NOT NULL, c3 text, diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 2b525ea..803e288 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -2595,7 +2595,7 @@ $d$; CREATE USER MAPPING FOR public SERVER loopback_nopw; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback_nopw; -CREATE FOREIGN TABLE ft1_nopw ( +CREATE FOREIGN TABLE pg_temp.ft1_nopw ( c1 int NOT NULL, c2 int NOT NULL, c3 text, diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index f073fba..550b3d0 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -2976,20 +2976,18 @@ SELECT 3 OPERATOR(pg_catalog.+) 4; By default, users cannot access any objects in schemas they do not own. To allow that, the owner of the schema must grant the -USAGE privilege on the schema. To allow users -to make use of the objects in the schema, additional privileges -might need to be granted, as appropriate for the object. +USAGE privilege on the schema. By default, everyone +has that privilege on the schema public. To allow +users to make use of the objects in a schema, additional privileges might +need to be granted, as appropriate for the object. -A user can also be allowed to create objects in someone else's -schema. To allow that, the CREATE privilege on -the schema needs to be granted. Note that by default, everyone -has CREATE and USAGE privileges on -the schema -public. This allows all users that are able to -connect to a given database to create objects in its -public schema. +A user can also be allowed to create objects in someone else's schema. To +allow that, the CREATE privilege on the schema needs to +be granted. In databases upgraded from +PostgreSQL 13 or earlier, everyone has that +privilege on the schema public. Some usage patterns call for revoking that privilege: @@ -3062,20 +3060,25 @@ REVOKE CREATE ON SCHEMA public FROM PUBLIC; database owner attack. --> Constrain ordinary users to user-private schemas. To implement this, - issue REVOKE CREATE ON SCHEMA public FROM PUBLIC, - and create a schema for each user with the same name as that user. - Recall that the default search path starts - with $user, which resolves to the user name. - Therefore, if each user has a separate schema, they access their own - schemas by default. After adopting this pattern in a database where - untrusted users had already logged in, consider auditing the public - schema for objects named like objects in + first issue REVOKE CREATE ON SCHEMA public FROM + PUBLIC. 
Then, for every user needing to create non-temporary + objects, create a schema with the same name as that user. Recall that + the default search path starts with $user, which + resolves to the user name. Therefore, if each user has a separate + schema, they access their own schemas by default. After adopting this + pattern in a database where untrusted users had already logged in, + consider auditing the public schema for objects named like objects in schema pg_catalog. This pattern is a secure schema usage pattern unless an untrusted user is the database owner or holds the CREATEROLE privilege, in which case no secure schema usage pattern exists. + If the database originated in an upgrade + from PostgreSQL 13 or earlier, + the REVOKE is essential. Otherwise, the default + configuration follows this pattern; ordinary users can create only + temporary objects until a privileged user furnishes a schema. @@ -3084,10 +3087,10 @@ REVOKE CREATE ON SCHEMA public FROM PUBLIC;
Re: non-HOT update not looking at FSM for large tuple update
On Sat, Mar 27, 2021 at 11:26:47AM -0400, John Naylor wrote: > On Sat, Mar 27, 2021 at 3:00 AM Noah Misch wrote: > > Does anyone have a strong opinion on whether to back-patch? I am weakly > > inclined not to back-patch, because today's behavior might happen to perform > > better when large_upd_rate-small_ins_rate<0. > > It's not a clear case. The present behavior is clearly a bug, but only > manifests in rare situations. The risk of the fix affecting other situations > is not zero, as you mention, but (thinking briefly about this and I could be > wrong) the consequences don't seem as big as the reported case of growing > table size. I agree sites that are hurting now will see a net benefit. I can call it a bug that we treat just-extended pages differently from existing zero-line-pointer pages (e.g. pages left by RelationAddExtraBlocks()). Changing how we treat pages having 100 bytes of data feels different to me. It's more like a policy decision, not a clear bug fix. I'm open to back-patching, but I plan to do so only if a few people report being firmly in favor. > > Besides the usual choices of > > back-patching or not back-patching, we could back-patch with a stricter > > threshold. Suppose we accepted pages for larger-than-fillfactor tuples when > > the pages have at least > > BLCKSZ-SizeOfPageHeaderData-sizeof(ItemIdData)-MAXALIGN(MAXALIGN(SizeofHeapTupleHeader)+1)+1 > > bytes of free space. That wouldn't reuse a page containing a one-column > > tuple, but it would reuse a page having up to eight line pointers. > > I'm not sure how much that would help in the reported case that started this > thread. I'm not sure, either. The thread email just before yours (27 Mar 2021 10:24:00 +) does suggest it would help less than the main proposal.
Re: policies with security definer option for allowing inline optimization
On Fri, Apr 02, 2021 at 02:24:59PM -0700, Dan Lynch wrote: > Does anyone know details of, or where to find more information about the > implications of the optimizer on the quals/checks for the policies being > functions vs inline? Roughly, the PostgreSQL optimizer treats LANGUAGE SQL functions like a C compiler treats "extern inline" functions. Other PostgreSQL functions behave like C functions in a shared library. Non-SQL functions can do arbitrary things, and the optimizer knows only facts like their volatility and the value given in CREATE FUNCTION ... COST. > I suppose if the > get_group_ids_of_current_user() function is marked as STABLE, would the > optimizer cache this value for every row in a SELECT that returned > multiple rows? While there was a patch to implement caching, it never finished. The optimizer is allowed to, and sometimes does, choose plan shapes that reduce the number of function calls. > Is it possible that if the function is sql vs plpgsql it > makes a difference? Yes; see inline_function() in the PostgreSQL source. The hard part of $SUBJECT is creating the infrastructure to inline across a SECURITY DEFINER boundary. Currently, a single optimizable statement operates under just one user identity. Somehow, the optimizer would need to translate the SECURITY DEFINER call into a list of moments where the executor shall switch user ID, then maintain that list across further optimization steps. security_barrier views are the most-similar thing, but as Joe Conway mentioned, views differ from SECURITY DEFINER in crucial ways.