Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG

2023-01-11 Thread John Naylor
On Wed, Jan 11, 2023 at 1:38 PM Ankit Kumar Pandey wrote:
>
>
> > On 11/01/23 09:57, Tom Lane wrote:
> > IME it's typically a lot more productive to approach things via
> > "scratch your own itch".  If a problem is biting you directly, then
> > at least you have some clear idea of what it is that needs to be fixed.
> > You might have to work up to an understanding of how to fix it, but
> > you have a clear goal.
>
>
> Question is, how should newcomers start contributing if they are not
> coming with a problem in hand?

I would say find something that gets you excited. Worked for me, at least.

> Todo list is possibly the first thing anyone who is willing to contribute
> is going to read, and for a new

Yeah, that's a problem we need to address.

> That being said, I think this is part of learning process and okay to
> come up with ideas and fail.

Of course it is! A key skill in engineering is to fail as quickly as
possible, preferably before doing any actual work.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-11 Thread Melih Mutlu
Hi hackers,

Rebased the patch to resolve conflicts.

Best,
-- 
Melih Mutlu
Microsoft


v3-0001-Add-replication-protocol-cmd-to-create-a-snapshot.patch
Description: Binary data


v7-0002-Reuse-Logical-Replication-Background-worker.patch
Description: Binary data


Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2023-01-11 Thread Richard Guo
On Wed, Jan 11, 2023 at 12:13 PM David Rowley  wrote:

> postgres=# set enable_presorted_aggregate=0;
> SET
> postgres=# select string_agg(random()::text, ',' order by random())
> from generate_series(1,3);
> string_agg
> ---
>  0.8659110018246505,0.15612649559563474,0.2022878955613403
> (1 row)
>
> I'd have expected those random numbers to be concatenated in ascending
> order.


select string_agg(
random()::text, -- position 1
','
order by random()   -- position 2
)
from generate_series(1,3);

I traced this query a bit and found that when executing the aggregation
the random() function in the aggregate expression (position 1) and in
the order by clause (position 2) are calculated separately.  And the
sorting is performed based on the function results from the order by
clause.  In the final output, what we see is the function results from
the aggregate expression.  Thus we'll notice the output is not sorted.

I'm not sure if this is expected or broken though.

BTW, if we explicitly add ::text for random() in the order by clause, as

select string_agg(
random()::text,
','
order by random()::text
)
from generate_series(1,3);

The random() function will be calculated only once for each tuple, and
we can get a sorted output.

Thanks
Richard


RE: Allow logical replication to copy tables in binary format

2023-01-11 Thread shiy.f...@fujitsu.com
On Mon, Nov 14, 2022 8:08 PM Melih Mutlu  wrote:
>
> Attached patch with updated version of this patch.

Thanks for your patch.

I tried to do a performance test for this patch, the result looks good to me.
(The steps are similar to what Melih shared [1].)

The time to synchronize about 1GB of data in binary (the average of the middle
eight runs was taken):
HEAD: 16.854 s
Patched: 6.805 s

Besides, here are some comments.

1.
+# Binary enabled subscription should fail
+$node_subscriber_binary->wait_for_log("ERROR:  insufficient data left in 
message");

Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like in the other subscription tests?

2.
+# Binary disabled subscription should succeed
+$node_publisher->wait_for_catchup('tap_sub');

If we want to wait for table synchronization to finish, should we call
wait_for_subscription_sync()?

3.
I also think it might be better to support copy binary only for publishers of
v16 or later. Do you plan to implement it in the patch?
  
[1] 
https://www.postgresql.org/message-id/CAGPVpCQEKDVKQPf6OFQ-9WiRYB1YRejm--YJTuwgzuvj1LEJ2A%40mail.gmail.com

Regards,
Shi yu



Re: Allow +group in pg_ident.conf

2023-01-11 Thread Jelte Fennema
I'm working on a new patchset for my commitfest entry. I'll make sure
to include a third patch for the +group support, and I'll include you
(Andrew) in the thread when I send it.

On Wed, 11 Jan 2023 at 02:14, Michael Paquier  wrote:
>
> On Tue, Jan 10, 2023 at 09:42:19AM -0500, Andrew Dunstan wrote:
> > Ok, that sounds reasonable, but the cfbot doesn't like patches that
> > depend on other patches in a different email. Maybe you can roll this up
> > as an extra patch in your next version? It's pretty small.
>
> This can go two ways if both of you agree, by sending an updated patch
> on this thread based on the other one...  And actually, Jelte's patch
> has less C code than this thread's proposal, even though it lacks
> tests.
> --
> Michael




Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-11 Thread Jelte Fennema
> The confusion that 0001 is addressing is fair (cough, fc579e1, cough),
> still I am wondering whether we could do a bit better to be more

Yeah, even after 0001 it's definitely suboptimal. I tried to keep the changes
minimal to not distract from the main purpose of this patch, but I'll update
the patch to include some more. I'll respond to your other question first.

> In what is your proposal different from the following
> entry in pg_ident.conf?  As of:
> mapname /^(.*)$ \1

It's very different. I think easiest is to explain by example:

If there exist three users on the postgres server: admin, jelte and michael

Then this rule (your suggested rule):
mapname /^(.*)$ \1

Is equivalent to:
mapname admin admin
mapname jelte jelte
mapname michael michael

While with the "all" keyword you can create a rule like this:
mapname admin all

which is equivalent to:
mapname admin admin
mapname admin jelte
mapname admin michael



Re: Schema variables - new implementation for Postgres 15 (typo)

2023-01-11 Thread Julien Rouhaud
On Tue, Jan 10, 2023 at 08:35:16PM +0100, Pavel Stehule wrote:
> On Tue, 10 Jan 2023 at 3:20, Julien Rouhaud wrote:
> >
> > Another new behavior I see is the new rowtype_only parameter for
> > LookupVariable.  Has this been discussed?
> >
>
> I think so, it was discussed in connection with table shadowing
>
> without this filter, I lost the message "missing FROM-clause entry for ..."
>
>  -- should fail
>  SELECT varx.xxx;
> -ERROR:  missing FROM-clause entry for table "varx"
> -LINE 1: SELECT varx.xxx;
> -   ^
> +ERROR:  type text is not composite
>  -- don't allow multi column query
>  CREATE TYPE vartesttp AS (a1 int, b1 int, c1 int);
>  CREATE VARIABLE v1 AS vartesttp;
> @@ -1421,9 +1419,7 @@
>  DROP TYPE ab;
>  CREATE VARIABLE myvar AS int;
>  SELECT myvar.blabla;
> -ERROR:  missing FROM-clause entry for table "myvar"
> -LINE 1: SELECT myvar.blabla;
> -   ^
> +ERROR:  type integer is not composite
>  DROP VARIABLE myvar;
>  -- the result of view should be same in parallel mode too
>  CREATE VARIABLE v1 AS int;
>
> My original idea was to try to reduce possible conflicts (in old versions
> of this patch, a conflict was disallowed). But it is true that these "new"
> error messages are sensible too, and by eliminating rowtype_only I can
> reduce the code.

Ok!  Another problem is that the error message as-is is highly unhelpful as
it's not clear at all that the problem is coming from an unsuitable variable.
Maybe change makeParamSessionVariable to use lookup_rowtype_tupdesc_noerror()
and emit a friendlier error message?  Something like

variable "X.Y" is of type Z, which is not composite

> I modified the IdentifyVariable function a little bit. With new argument
> noerror I am able to ensure so no error will be raised when this function
> is called just for shadowing detection.

I locally modified IdentifyVariable to emit WARNING reports when noerror is set
to quickly see how it was used and didn't get any regression test error.  This
definitely needs to be covered by regression tests.  Looking at
session_variables.sql, the session_variables_ambiguity_warning GUC doesn't have
a lot of tests in general.




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-01-11 Thread Aleksander Alekseev
Hi Maxim,

> Here is a new patch set.
> I've added comments and made use of the GetClogDirName call in copy_subdir_files.

Jacob Champion pointed out (offlist, cc:'ed) that we may be wrong on this one:

> 0001 patch changes the SLRU internals without affecting the callers.

> 0001 - should make SLRU internally 64-bit, no effects from "outside"

... and apparently Jacob is right.

Among other things, 0001 modifies the CLOG_ZEROPAGE and CLOG_TRUNCATE WAL
records - it makes changes to WriteZeroPageXlogRec() and
WriteTruncateXlogRec() and corresponding changes to clog_desc() and
clog_redo().

Firstly, it means that the patch doesn't change what it claims to
change. I think these changes should be part of 0002.

Secondly, shouldn't we introduce a new WAL record type in order to
make the code backward compatible with previous PG versions? I'm not
100% sure how the upgrade procedure works in all the details. If it
requires the DBMS to be gracefully shut down before the upgrade then
we are probably fine here.

-- 
Best regards,
Aleksander Alekseev




Flush SLRU counters in checkpointer process

2023-01-11 Thread Anthonin Bonnefoy
Hello hackers,

Currently, the Checkpointer process only reports SLRU statistics at server
shutdown, leading to delayed statistics for SLRU flushes. This patch adds a
flush of SLRU stats to the end of checkpoints.
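
As a quick way to see the effect (a sketch, not part of the patch): with the
patch applied, the SLRU counters in pg_stat_slru should be up to date right
after a checkpoint, instead of only after a clean shutdown:

-- Force a checkpoint, then inspect the SLRU statistics view; with this
-- patch the checkpointer's counters should be reflected immediately.
CHECKPOINT;
SELECT name, blks_written, flushes, truncates, stats_reset
FROM pg_stat_slru
ORDER BY name;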

Best regards,
Anthonin


flush-slru-counters.patch
Description: Binary data


Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-01-11 Thread Aleksander Alekseev
Hi Maxim,

> Secondly, shouldn't we introduce a new WAL record type in order to
> make the code backward compatible with previous PG versions? I'm not
> 100% sure how the upgrade procedure works in all the details. If it
> requires the DBMS to be gracefully shut down before the upgrade then
> we are probably fine here.

After reading [1] carefully it looks like we shouldn't worry about
this. The upgrade procedure explicitly requires running `pg_ctl stop`
at step 8, i.e. not in immediate mode [2]. It also has explicit
instructions regarding the replicas. From
what I can tell there is no way they will see WAL records they
wouldn't understand.

[1]: https://www.postgresql.org/docs/current/pgupgrade.html
[2]: https://www.postgresql.org/docs/current/app-pg-ctl.html

-- 
Best regards,
Aleksander Alekseev




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-11 Thread John Naylor
I wrote:

> I see. IIUC from a brief re-reading of the code, saving that chunk would
> only save us from re-loading "parent->shift" from L1 cache and shifting the
> key. The cycles spent doing that seem small compared to the rest of the
> work involved in growing a node. Expressions like "if (idx < 0) return
> false;" refer to an asserts-only variable, so in production builds, I
> would hope that branch gets elided (I haven't checked).

On further reflection, this is completely false and I'm not sure what I was
thinking. However, for the update-inner case maybe we can assert that we
found a valid slot.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread shveta malik
On Tue, Jan 10, 2023 at 7:42 PM Takamichi Osumi (Fujitsu) wrote:
>
> On Tuesday, January 3, 2023 4:01 PM vignesh C  wrote:
> Hi, thanks for your review !
>
>
> > 1) This global variable can be removed as it is used only in send_feedback 
> > which
> > is called from maybe_delay_apply so we could pass it as a function argument:
> > + * delay, avoid having positions of the flushed and apply LSN
> > +overwritten by
> > + * the latest LSN.
> > + */
> > +static bool in_delaying_apply = false;
> > +static XLogRecPtr last_received = InvalidXLogRecPtr;
> > +
> I have removed the first variable and made it one of the arguments for
> send_feedback().
>
> > 2) -1 gets converted to -1000
> >
> > +int64
> > +interval2ms(const Interval *interval)
> > +{
> > +   int64   days;
> > +   int64   ms;
> > +   int64   result;
> > +
> > +   days = interval->month * INT64CONST(30);
> > +   days += interval->day;
> > +
> > +   /* Detect whether the value of interval can cause an overflow. */
> > +   if (pg_mul_s64_overflow(days, MSECS_PER_DAY, &result))
> > +   ereport(ERROR,
> > +
> > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > +errmsg("bigint out of range")));
> > +
> > +   /* Adds portion time (in ms) to the previous result. */
> > +   ms = interval->time / INT64CONST(1000);
> > +   if (pg_add_s64_overflow(result, ms, &result))
> > +   ereport(ERROR,
> > +
> > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > +errmsg("bigint out of range")));
> >
> > create subscription sub7 connection 'dbname=regression host=localhost
> > port=5432' publication pub1 with (min_apply_delay = '-1');
> > ERROR:  -1000 ms is outside the valid range for parameter "min_apply_delay"
> Good catch! Fixed in order to make input '-1' interpreted as -1 ms.
>
> > 3) This can be slightly reworded:
> > +  
> > +   The length of time (ms) to delay the application of changes.
> > +  
> > to:
> > Delay applying the changes by a specified amount of time(ms).
> This has been suggested in [1] by Peter Smith. So, I'd like to keep the 
> current patch's description.
> Then, I didn't change this.
>
> > 4)  maybe_delay_apply can be moved from apply_handle_stream_prepare to
> > apply_spooled_messages so that it is consistent with
> > maybe_start_skipping_changes:
> > @@ -1120,6 +1240,19 @@ apply_handle_stream_prepare(StringInfo s)
> >
> > elog(DEBUG1, "received prepare for streamed transaction %u",
> > prepare_data.xid);
> >
> > +   /*
> > +* Should we delay the current prepared transaction?
> > +*
> > +* Although the delay is applied in BEGIN PREPARE messages,
> > streamed
> > +* prepared transactions apply the delay in a STREAM PREPARE
> > message.
> > +* That's ok because no changes have been applied yet
> > +* (apply_spooled_messages() will do it). The STREAM START message
> > does
> > +* not contain a prepare time (it will be available when the 
> > in-progress
> > +* prepared transaction finishes), hence, it was not possible to 
> > apply a
> > +* delay at that time.
> > +*/
> > +   maybe_delay_apply(prepare_data.prepare_time);
> >
> >
> > That way the call from apply_handle_stream_commit can also be removed.
> Sounds good. I moved the call of maybe_delay_apply() to the 
> apply_spooled_messages().
> Now it's aligned with maybe_start_skipping_changes().
>
> > 5) typo transfering should be transferring
> > +  publisher and the current time on the subscriber. Time
> > spent in logical
> > +  decoding and in transfering the transaction may reduce the
> > actual wait
> > +  time. If the system clocks on publisher and subscriber are
> > + not
> Fixed.
>
> > 6) feedbacks can be changed to feedback messages
> > + * it's necessary to keep sending feedbacks during the delay from the
> > + worker
> > + * process. Meanwhile, the feature delays the apply before starting the
> Fixed.
>
> > 7)
> > + /*
> > + * Suppress overwrites of flushed and writtten positions by the lastest
> > + * LSN in send_feedback().
> > + */
> >
> > 7a) typo writtten should be written
> >
> > 7b) lastest should latest
> I have removed this sentence. So, those typos are removed.
>
> Please have a look at the updated patch.
>
> [1] - 
> https://www.postgresql.org/message-id/CAHut%2BPttQdFMNM2c6WqKt2c9G6r3ZKYRGHm04RR-4p4fyA4WRg%40mail.gmail.com
>
>

Hi,

1.
+ errmsg("min_apply_delay must not be set when streaming = parallel")));
We give the same error msg for both the cases:
a. when the subscription is created with streaming=parallel but we are
trying to alter the subscription to set min_apply_delay > 0
b. when the subscription is created with some min_apply_delay and we are
trying to alter the subscription to make it streaming=parallel.
For case a, the error msg looks fine, but for case b, I think the error msg
should be changed slightly.

Re: Strengthen pg_waldump's --save-fullpage tests

2023-01-11 Thread Drouvot, Bertrand

Hi,

On 1/11/23 5:17 AM, Bharath Rupireddy wrote:
> On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier wrote:
>>
>> On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote:
>>> I like the idea of comparing the full page (and not just the LSN) but
>>> I'm not sure that adding the pageinspect dependency is a good thing.
>>>
>>> What about extracting the block directly from the relation file and
>>> comparing it with the one extracted from the WAL? (We'd need to skip the
>>> first 8 bytes to skip the LSN though).
>>
>> Byte-by-byte counting for the page hole?

What I have in mind is to use diff on the whole page (minus the LSN).

>> The page checksum would
>> matter as well,

Right, but the TAP test is done without checksums, and we could also
skip the checksum on the page if we really want to.

> Right. LSN of FPI from the WAL record and page from the table won't be
> the same, essentially FPI LSN <= table page.

Right, that's why I proposed to exclude it from the comparison.

What about something like the attached?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/bin/pg_waldump/t/002_save_fullpage.pl b/src/bin/pg_waldump/t/002_save_fullpage.pl
index 5072703a3d..21adee7771 100644
--- a/src/bin/pg_waldump/t/002_save_fullpage.pl
+++ b/src/bin/pg_waldump/t/002_save_fullpage.pl
@@ -10,6 +10,19 @@ use PostgreSQL::Test::Utils;
 use Test::More;
 
 my ($blocksize, $walfile_name);
+my $node = PostgreSQL::Test::Cluster->new('main');
+my $pgdata = $node->data_dir;
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+my ($relname) = @_;
+
+my $rel= $node->safe_psql('postgres',
+qq(SELECT pg_relation_filepath('$relname')));
+die "path not found for relation $relname" unless defined $rel;
+return "$pgdata/$rel";
+}
 
 # Function to extract the LSN from the given block structure
 sub get_block_lsn
@@ -29,7 +42,6 @@ sub get_block_lsn
return ($lsn_hi, $lsn_lo);
 }
 
-my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
 $node->append_conf(
'postgresql.conf', q{
@@ -45,7 +57,8 @@ $node->safe_psql(
 CREATE TABLE test_table AS SELECT generate_series(1,100) a;
 -- Force FPWs on the next writes.
 CHECKPOINT;
-UPDATE test_table SET a = a + 1;
+UPDATE test_table SET a = a + 1 where a = 1;
+CHECKPOINT;
 ");
 
 ($walfile_name, $blocksize) = split '\|' => $node->safe_psql('postgres',
@@ -108,4 +121,62 @@ for my $fullpath (glob "$tmp_folder/raw/*")
 
 ok($file_count > 0, 'verify that at least one block has been saved');
 
+# Check that the pg_waldump saves FPI file.
+my @fpi_files = glob "$tmp_folder/raw/*_main";
+is(scalar(@fpi_files), 1, 'one FPI file was created');
+
+# Now extract the block from the relation file and compare with the FPI
+my $relpath = relation_filepath('test_table');
+my $blk;
+my $blkfpi;
+
+my $frel;
+my $blkfrel;
+my $blkfpifh;
+my $blkfpinolsn;
+
+open($frel, '+<', $relpath)
+  or BAIL_OUT("open failed: $!");
+
+open($blkfrel, '+>', "$tmp_folder/test_table.blk0")
+  or BAIL_OUT("open failed: $!");
+
+open($blkfpifh, '+<', $fpi_files[0])
+  or BAIL_OUT("open failed: $!");
+
+open($blkfpinolsn, '+>', "$tmp_folder/fpinolsn")
+  or BAIL_OUT("open failed: $!");
+
+binmode $frel;
+binmode $blkfrel;
+binmode $blkfpifh;
+binmode $blkfpinolsn;
+
+# Extract the binary data without the LSN from the relation's block
+sysseek($frel, 8, 0); #bypass the LSN
+sysread($frel, $blk, 8184) or die "sysread failed: $!";
+syswrite($blkfrel, $blk) or die "syswrite failed: $!";
+
+# Extract the binary data without the LSN from the FPI
+sysseek($blkfpifh, 8, 0); #bypass the LSN
+sysread($blkfpifh, $blkfpi, 8184) or die "sysread failed: $!";
+syswrite($blkfpinolsn, $blkfpi) or die "syswrite failed: $!";
+
+close($frel)
+  or BAIL_OUT("close failed: $!");
+
+close($blkfrel)
+  or BAIL_OUT("close failed: $!");
+
+close($blkfpifh)
+  or BAIL_OUT("close failed: $!");
+
+close($blkfpinolsn)
+  or BAIL_OUT("close failed: $!");
+
+# Compare the blocks (LSN excluded)
+command_ok(
+[ 'diff', $tmp_folder . '/fpinolsn', $tmp_folder . '/test_table.blk0' ],
+'compare both pages');
+
 done_testing();


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-11 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

I was not sure why, but the cfbot could not accept the previous version.
I made the patch again from HEAD (5f6401) without any changes,
so I did not bump the version number.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
Description:  v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch


v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v24-0003-add-test.patch
Description: v24-0003-add-test.patch


Re: Common function for percent placeholder replacement

2023-01-11 Thread Peter Eisentraut

On 09.01.23 18:53, Nathan Bossart wrote:
> +   nativePath = pstrdup(path);
> +   make_native_path(nativePath);
>
> +   nativePath = pstrdup(xlogpath);
> +   make_native_path(nativePath);
>
> Should these be freed?

committed with that fixed





Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-01-11 Thread Sandro Santilli
On Tue, Jan 10, 2023 at 06:50:31PM -0500, Tom Lane wrote:


> With the proposed % feature, if foo--%--3.0.sql exists then the
> system will invoke it and expect the end result to be a valid
> 3.0 installation, whether or not the script actually has any
> ability to do a downgrade. 

It is sane for the system to expect the end result
to be a valid 3.0 installation if no exception is
thrown by the script itself.

If we ship foo--%--3.0.sql we must have taken care of
protecting against unsupported downgrades/upgrades (and we did:
the script throws an exception if anything is unexpected).

> (I suppose you could add code to look at pg_extension.extversion,

We actually added code looking at our own version-extracting
function (which existed since before PostgreSQL added support
for extensions). Is the function not there? Raise an exception.
Is the function not owned by the extension? Raise an exception.
Otherwise, trust the output of that function to tell us what
version we're coming from, and throw an exception if an upgrade
to the target version is unsupported.
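
For illustration, a guard of that shape at the top of a wildcard upgrade
script could look like this (a sketch only: foo_lib_version() is a
hypothetical stand-in for the extension's own version-extracting function,
and the ownership check is elided):

DO $$
DECLARE
    from_version text;
BEGIN
    -- The version-extracting function must exist ...
    IF to_regproc('foo_lib_version') IS NULL THEN
        RAISE EXCEPTION 'extension foo is in an inconsistent state';
    END IF;
    -- ... and its output must name a version we know how to upgrade.
    EXECUTE 'SELECT foo_lib_version()' INTO from_version;
    IF from_version !~ '^3\.' THEN
        RAISE EXCEPTION 'upgrade from % to 3.0 is not supported', from_version;
    END IF;
END
$$;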

> to add such code is that really better than having a number of
> one-liner scripts implementing the same check declaratively?)

Yes, it is, because no matter how many one-liner scripts we install
(we currently install 87 * 6 such scripts), we always end up missing
some of them upon releasing a new bug-fix release in the stable branches.

> almost certainly you are
> going to end with an extension containing some leftover 4.0
> objects, some 3.0 objects, and maybe some objects with properties
> that don't exactly match either 3.0 or 4.0. 

This is already possible, depending on WHO writes those upgrade
scripts. This proposal just gives more expressive power to
those script authors.

> So I really think this is a case of "be careful what you ask
> for, you might get it".  Even if PostGIS is willing to put in
> the amount of infrastructure legwork needed to make such a
> design bulletproof, I'm quite sure nobody else will manage
> to use such a thing successfully.

This is why I initially made this something to be explicitly enabled
by the .control file, which I can do again if it feels safer for you.

> I'd rather spend our
> development effort on a feature that has more than one use-case.

The use case is any extension willing to support more than a single stable
branch while still allowing upgrades from a newer (more recently released)
bug-fix release to an older feature release (i.e. 3.2.10 -> 3.4.0). That does
not seem so uncommon to me for a big project. Maybe there aren't enough big
extension-based projects out there?

--strk; 

  Libre GIS consultant/developer
  https://strk.kbt.io/services.html




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-11 Thread Will Mortensen
Hi Marco, thanks for the reply! Glad to know you'd find it useful too. :-)

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot  wrote:
> I'm wondering whether it could be an option of the LOCK command.
> (LOCK WAIT ONLY?)

I assume that's doable, but just from looking at the docs, it might be
a little confusing. For example, at least if we use
WaitForLockersMultiple(), waiting for multiple tables would happen in
parallel (which I think is good), while locking them is documented to
happen sequentially. Also, normal LOCK is illegal outside a
transaction, but waiting makes perfect sense. (Actually, normal LOCK
makes sense too, if the goal was just to wait. :-) )

By contrast, while LOCK has NOWAIT, and SELECT's locking clause
has NOWAIT and SKIP LOCKED, they only change the blocking/failure
behavior, while success still means taking the lock and has the same
semantics.

But I'm really no expert on SQL syntax or typical practice for things like
this. Anything that works is fine with me. :-)



As a possibly superfluous sidebar, I wanted to correct this part of my
original message:

> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen  wrote:
> > pg_sequence_last_value() (still undocumented!) can be used to
> > obtain an instantaneous upper bound on the sequence values
> > that have been returned by nextval(), even if the transaction
> > that called nextval() hasn't yet committed.

This is true, but not the most important part of making this scheme
work: as you mentioned in the Citus blog post, to avoid missing rows,
we need (and this gives us) an instantaneous *lower* bound on the
sequence values that could be used by transactions that commit after
we finish waiting (and start processing). This doesn't work with
sequence caching, since without somehow inspecting all sessions'
sequence caches, rows with arbitrarily old/low cached sequence
values could be committed arbitrarily far into the future, and we'd
fail to process them.

As you also implied in the blog post, the upper bound is what
allows us to also process each row *exactly* once (instead of at
least once) and in sequence order, if desired.

So those are the respective justifications for both arms of the
WHERE clause: id > min_id AND id <= max_id .
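
To make the scheme concrete, here is a rough sketch of one polling iteration
in psql (pg_wait_for_lockers() is a hypothetical name for the proposed
function; :min_id is the max_id remembered from the previous iteration, and
the id sequence must use CACHE 1):

-- Instantaneous upper bound on the sequence values handed out so far.
SELECT pg_sequence_last_value('my_table_id_seq') AS max_id \gset
-- Wait (hypothetical function) for every in-flight writer of my_table,
-- i.e. any transaction that may already have called nextval(), to finish.
SELECT pg_wait_for_lockers('my_table'::regclass);
-- Anything committing from now on must use ids > :max_id, so this is safe:
SELECT * FROM my_table WHERE id > :min_id AND id <= :max_id;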

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot  wrote:
>
> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen  wrote:
> > We'd like to be able to call the lock manager's WaitForLockers() and
> > WaitForLockersMultiple() from SQL. Below I describe our use case, but
> > basically I'm wondering if this:
> >
> > 1. Seems like a reasonable thing to do
> >
> > 2. Would be of interest upstream
> >
> > 3. Should be done with a new pg_foo() function (taking an
> >oid?), or a whole new SQL command, or something else
>
> Definitely +1 on adding a function/syntax to wait for lockers without
> actually taking a lock. The get sequence value + lock-and-release
> approach is still the only reliable scheme I've found for reliably and
> efficiently processing new inserts in PostgreSQL. I'm wondering
> whether it could be an option of the LOCK command. (LOCK WAIT ONLY?)
>
> Marco




Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-01-11 Thread 'Sandro Santilli'
On Tue, Jan 10, 2023 at 11:09:23PM -0500, Regina Obe wrote:

> The only way we can fix that in the current setup, is to move to a minor
> version mode which means we can 
> never do micro updates.

Or, just not with standard PostgreSQL syntax: we can of course do
upgrades using the `SELECT postgis_extensions_upgrade()` call at
the moment, which, if you ask me, sounds MORE dangerous than the
wildcard upgrade approach, because the _implementation_ of that
function will always be the OLD implementation, never the NEW one;
so if it is bogus, it cannot be fixed by a new release w/out a way to
upgrade there...

--strk;




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread shveta malik
On Wed, Jan 11, 2023 at 3:27 PM shveta malik  wrote:
>
> On Tue, Jan 10, 2023 at 7:42 PM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> > On Tuesday, January 3, 2023 4:01 PM vignesh C  wrote:
> > Hi, thanks for your review !
> >
> >
> > > 1) This global variable can be removed as it is used only in 
> > > send_feedback which
> > > is called from maybe_delay_apply so we could pass it as a function 
> > > argument:
> > > + * delay, avoid having positions of the flushed and apply LSN
> > > +overwritten by
> > > + * the latest LSN.
> > > + */
> > > +static bool in_delaying_apply = false;
> > > +static XLogRecPtr last_received = InvalidXLogRecPtr;
> > > +
> > I have removed the first variable and made it one of the arguments for
> > send_feedback().
> >
> > > 2) -1 gets converted to -1000
> > >
> > > +int64
> > > +interval2ms(const Interval *interval)
> > > +{
> > > +   int64   days;
> > > +   int64   ms;
> > > +   int64   result;
> > > +
> > > +   days = interval->month * INT64CONST(30);
> > > +   days += interval->day;
> > > +
> > > +   /* Detect whether the value of interval can cause an overflow. */
> > > +   if (pg_mul_s64_overflow(days, MSECS_PER_DAY, &result))
> > > +   ereport(ERROR,
> > > +
> > > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > > +errmsg("bigint out of range")));
> > > +
> > > +   /* Adds portion time (in ms) to the previous result. */
> > > +   ms = interval->time / INT64CONST(1000);
> > > +   if (pg_add_s64_overflow(result, ms, &result))
> > > +   ereport(ERROR,
> > > +
> > > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > > +errmsg("bigint out of range")));
> > >
> > > create subscription sub7 connection 'dbname=regression host=localhost
> > > port=5432' publication pub1 with (min_apply_delay = '-1');
> > > ERROR:  -1000 ms is outside the valid range for parameter 
> > > "min_apply_delay"
> > Good catch! Fixed in order to make input '-1' interpreted as -1 ms.
> >
> > > 3) This can be slightly reworded:
> > > +  
> > > +   The length of time (ms) to delay the application of changes.
> > > +  
> > > to:
> > > Delay applying the changes by a specified amount of time(ms).
> > This has been suggested in [1] by Peter Smith. So, I'd like to keep the 
> > current patch's description.
> > Then, I didn't change this.
> >
> > > 4)  maybe_delay_apply can be moved from apply_handle_stream_prepare to
> > > apply_spooled_messages so that it is consistent with
> > > maybe_start_skipping_changes:
> > > @@ -1120,6 +1240,19 @@ apply_handle_stream_prepare(StringInfo s)
> > >
> > > elog(DEBUG1, "received prepare for streamed transaction %u",
> > > prepare_data.xid);
> > >
> > > +   /*
> > > +* Should we delay the current prepared transaction?
> > > +*
> > > +* Although the delay is applied in BEGIN PREPARE messages,
> > > streamed
> > > +* prepared transactions apply the delay in a STREAM PREPARE
> > > message.
> > > +* That's ok because no changes have been applied yet
> > > +* (apply_spooled_messages() will do it). The STREAM START message
> > > does
> > > +* not contain a prepare time (it will be available when the 
> > > in-progress
> > > +* prepared transaction finishes), hence, it was not possible to 
> > > apply a
> > > +* delay at that time.
> > > +*/
> > > +   maybe_delay_apply(prepare_data.prepare_time);
> > >
> > >
> > > That way the call from apply_handle_stream_commit can also be removed.
> > Sounds good. I moved the call of maybe_delay_apply() to the 
> > apply_spooled_messages().
> > Now it's aligned with maybe_start_skipping_changes().
> >
> > > 5) typo transfering should be transferring
> > > +  publisher and the current time on the subscriber. Time
> > > spent in logical
> > > +  decoding and in transfering the transaction may reduce the
> > > actual wait
> > > +  time. If the system clocks on publisher and subscriber are
> > > + not
> > Fixed.
> >
> > > 6) feedbacks can be changed to feedback messages
> > > + * it's necessary to keep sending feedbacks during the delay from the
> > > + worker
> > > + * process. Meanwhile, the feature delays the apply before starting the
> > Fixed.
> >
> > > 7)
> > > + /*
> > > + * Suppress overwrites of flushed and writtten positions by the lastest
> > > + * LSN in send_feedback().
> > > + */
> > >
> > > 7a) typo writtten should be written
> > >
> > > 7b) lastest should latest
> > I have removed this sentence. So, those typos are removed.
> >
> > Please have a look at the updated patch.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/CAHut%2BPttQdFMNM2c6WqKt2c9G6r3ZKYRGHm04RR-4p4fyA4WRg%40mail.gmail.com
> >
> >
>
> Hi,
>
> 1.
> + errmsg("min_apply_delay must not be set when streaming = parallel")));
> we give the same error msg for  both the cases:
> a. when subscr

RE: Logical replication timeout problem

2023-01-11 Thread wangw.f...@fujitsu.com
On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila  wrote:
>

Thanks for your comments.

> One more thing, I think it would be better to expose a new callback
> API via reorder buffer as suggested previously [2] similar to other
> reorder buffer APIs instead of directly using reorderbuffer API to
> invoke plugin API.

Yes, I agree. I think it would be better to add a new callback API on HEAD.
So, I improved the fix approach:
Introduce a new optional callback to update the progress. This callback
function is invoked at the end of the main loop of the function
ReorderBufferProcessTXN() for each change. In this way, I think similar
timeout problems can be avoided.

BTW, I did a performance test for this patch. When running the SQL that
reproduces the problem (refreshing the materialized view in synchronous
logical replication mode), the running time of the new function
pgoutput_update_progress is less than 0.1% of the total time. I think this
result looks OK.

Attach the new patch.

Regards,
Wang Wei


v2-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v2-0001-Fix-the-logical-replication-timeout-during-proces.patch


Re: Allow logical replication to copy tables in binary format

2023-01-11 Thread Melih Mutlu
Hi,

Thanks for your review.

shiy.f...@fujitsu.com wrote on Wed, 11 Jan 2023 at 11:56:

> On Mon, Nov 14, 2022 8:08 PM Melih Mutlu  wrote:
> 1.
> +# Binary enabled subscription should fail
> +$node_subscriber_binary->wait_for_log("ERROR:  insufficient data left in
> message");
>
> Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like other subscription
> tests.
>

Done.


> 2.
> +# Binary disabled subscription should succeed
> +$node_publisher->wait_for_catchup('tap_sub');
>
> If we want to wait for table synchronization to finish, should we call
> wait_for_subscription_sync()?
>

Done.


> 3.
> I also think it might be better to support copy binary only for publishers
> of
> v16 or later. Do you plan to implement it in the patch?
>

Done.

Thanks,
-- 
Melih Mutlu
Microsoft


v5-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description: Binary data


Adjust the description of OutputPluginCallbacks in pg-doc

2023-01-11 Thread wangw.f...@fujitsu.com
Hi,

When I was reading the "Logical Decoding Output Plugins" chapter in the PG
docs [1], I noticed that in the summary section, only the callback message_cb
is not described as either required or optional, and the description of the
callback stream_prepare_cb seems inaccurate.

And after the summary section, I think only the stream_xxx_cb and truncate_cb
callback sections do not say whether those callbacks are required or optional.

I think we could improve these to be more reader friendly, so I tried to write
a patch for them, attached.

Any thoughts?

Regards,
Wang Wei

[1] - https://www.postgresql.org/docs/devel/logicaldecoding-output-plugin.html


v1-0001-Adjust-the-description-of-OutputPluginCallbacks-i.patch
Description:  v1-0001-Adjust-the-description-of-OutputPluginCallbacks-i.patch


Re: meson oddities

2023-01-11 Thread Peter Eisentraut

On 04.01.23 23:53, Andres Freund wrote:
>>   dir_data = get_option('datadir')
>> -if not (dir_data.contains('pgsql') or dir_data.contains('postgres'))
>> +if not ((dir_prefix/dir_data).contains('pgsql') or (dir_prefix/dir_data).contains('postgres'))
>>    dir_data = dir_data / pkg
>>   endif
>
> Hm. Perhaps we should just test once whether prefix contains pgsql/postgres,
> and then just otherwise leave the test as is? There afaict can't be a
> dir_prefix/dir_* that matches postgres/pgsql that won't also match either of
> the components.


You mean something like

dir_prefix_contains_pg =
  (dir_prefix.contains('pgsql') or dir_prefix.contains('postgres'))

and

if not (dir_prefix_contains_pg or
        (dir_data.contains('pgsql') or dir_data.contains('postgres')))

Seems more complicated to me.

I think there is also an adjacent issue:  The subdir options may be 
absolute or relative.  So if you specify --prefix=/usr/local and 
--sysconfdir=/etc/postgresql, then


config_paths_data.set_quoted('SYSCONFDIR', dir_prefix / dir_sysconf)

would produce something like /usr/local/etc/postgresql.

I think maybe we should make all the dir_* variables absolute right at 
the beginning, like


dir_lib = get_option('libdir')
if not fs.is_absolute(dir_lib)
dir_lib = dir_prefix / dir_lib
endif

And then the appending stuff could be done after that, keeping the 
current code.






Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-11 Thread Michael Paquier
On Wed, Jan 11, 2023 at 09:04:56AM +, Jelte Fennema wrote:
> It's very different. I think easiest is to explain by example:
> 
> If there exist three users on the postgres server: admin, jelte and michael
> 
> Then this rule (your suggested rule):
> mapname /^(.*)$ \1
> 
> Is equivalent to:
> mapname admin admin
> mapname jelte jelte
> mapname michael michael
> 
> While with the "all" keyword you can create a rule like this:
> mapname admin all
> 
> which is equivalent to:
> mapname admin admin
> mapname admin jelte
> mapname admin michael

Thanks for the explanation, I was missing your point.  Hmm.  On top
of my mind, couldn't we also use a regexp for the pg-role rather than
just a hardcoded keyword here then, so as it would be possible to
allow a mapping to pass for a group of role names?  "all" is just a
pattern to allow everything, at the end.
--
Michael


signature.asc
Description: PGP signature


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread vignesh C
On Tue, 10 Jan 2023 at 19:41, Takamichi Osumi (Fujitsu) wrote:
>
> On Tuesday, January 3, 2023 4:01 PM vignesh C  wrote:
> Hi, thanks for your review !
>
> Please have a look at the updated patch.

Thanks for the updated patch, a few comments:
1) Comment inconsistency across create and alter subscription, better
to keep it same:
+   /*
+* Do additional checking for disallowed combination when
min_apply_delay
+* was not zero.
+*/
+   if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
+   opts->min_apply_delay > 0)
+   {
+   if (opts->streaming == LOGICALREP_STREAM_PARALLEL)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR)),
+   errmsg("min_apply_delay must
not be set when streaming = parallel"));
+   }

+   /*
+* Test the combination of
streaming mode and
+* min_apply_delay
+*/
+   if (opts.streaming ==
LOGICALREP_STREAM_PARALLEL &&
+   sub->minapplydelay > 0)
+   ereport(ERROR,
+
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+
errmsg("min_apply_delay must not be set when streaming = parallel")));

2) ereport inconsistency, braces around errcode is present in few
places and not present in few places, it is better to keep it
consistent by removing it:
2.a)
+   if (opts->streaming == LOGICALREP_STREAM_PARALLEL)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR)),
+   errmsg("min_apply_delay must
not be set when streaming = parallel"));

2.b)
+   if (opts.streaming ==
LOGICALREP_STREAM_PARALLEL &&
+   sub->minapplydelay > 0)
+   ereport(ERROR,
+
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+
errmsg("min_apply_delay must not be set when streaming = parallel")));

2.c)
+   if (opts.min_apply_delay > 0 &&
+   sub->stream ==
LOGICALREP_STREAM_PARALLEL)
+   ereport(ERROR,
+
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+
errmsg("min_apply_delay must not be set when streaming = parallel")));

2.d)
+   if (pg_mul_s64_overflow(days, MSECS_PER_DAY, &result))
+   ereport(ERROR,
+   (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+errmsg("bigint out of range")));

2.e)
+   if (pg_add_s64_overflow(result, ms, &result))
+   ereport(ERROR,
+   (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+errmsg("bigint out of range")));

3) this include is not required, I could compile without it
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -48,6 +48,7 @@
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"

4)
4.a)
Should this be changed:
/* Adds portion time (in ms) to the previous result. */
to
/* Adds portion time (in ms) to the previous result */

4.b)
Should this be changed:
/* Detect whether the value of interval can cause an overflow. */
to
/* Detect whether the value of interval can cause an overflow */

5) Can this "ALTER SUBSCRIPTION regress_testsub SET (min_apply_delay =
'1d')" be combined along with "-- success -- 123 ms", that way few
statements could be reduced
+-- success -- 8640 ms
+CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, min_apply_delay = 123);
+ALTER SUBSCRIPTION regress_testsub SET (min_apply_delay = '1d');
+
+\dRs+
+
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;

6) Can we do the interval testing along with ALTER SUBSCRIPTION and
combine it with the "-- success -- 123 ms" test? That way a few
statements could be saved:
+-- success -- interval is converted into ms and stored as integer
+CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, min_apply_delay = '4h 27min 35s');
+
+\dRs+
+
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;

Regards,
Vignesh




Re: Flush SLRU counters in checkpointer process

2023-01-11 Thread Aleksander Alekseev
Hi Anthonin,

> This patch adds a flush of SLRU stats to the end of checkpoints.

The patch looks good to me and passes the tests but let's see if
anyone else has any feedback.

Also I added a CF entry: https://commitfest.postgresql.org/42/4120/

-- 
Best regards,
Aleksander Alekseev




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread Hayato Kuroda (Fujitsu)
Dear Shveta,

Thanks for reviewing! PSA new version.

> 1.
> + errmsg("min_apply_delay must not be set when streaming = parallel")));
> we give the same error msg for  both the cases:
> a. when subscription is created with streaming=parallel  but we are
> trying to alter subscription to set min_apply_delay >0
> b. when subscription is created with some min_apply_delay and we are
> trying to alter subscription to make it streaming=parallel.
> For case a, error msg looks fine but for case b, I think error msg
> should be changed slightly.
> ALTER SUBSCRIPTION regress_testsub SET (streaming = parallel);
> ERROR:  min_apply_delay must not be set when streaming = parallel
> This gives the feeling that we are trying to modify min_apply_delay
> but we are not. Maybe we can change it to:
> "subscription with min_apply_delay must not be allowed to stream
> parallel" (or something better)

Your point that the error messages are strange is right. While checking
other ones, I found they have very similar styles, so I reworded the
ERROR messages in AlterSubscription() and parse_subscription_options() to
follow them. Which version is better?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v14-0001-Time-delayed-logical-replication-subscriber.patch
Description:  v14-0001-Time-delayed-logical-replication-subscriber.patch


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread Hayato Kuroda (Fujitsu)
> 2.
> I think users can set ' wal_receiver_status_interval ' to 0 or more
> than 'wal_sender_timeout'. But is this a frequent use-case scenario or
> do we see DBAs setting these in such a way by mistake? If so, then I
> think, it is better to give Warning message in such a case when a user
> tries to create or alter a subscription with a large 'min_apply_delay'
> (>=  'wal_sender_timeout') , rather than leaving it to the user's
> understanding that WalSender may repeatedly timeout in such a case.
> Parse_subscription_options and AlterSubscription can be modified to
> log a warning. Any thoughts?

Yes, DBAs may set wal_receiver_status_interval to more than wal_sender_timeout
by mistake.

But to handle that scenario we must compare min_apply_delay *on the subscriber*
with wal_sender_timeout *on the publisher*. Neither value is transferred to the
opposite side, so the WARNING cannot be raised. Such a mechanism seems complex
to me. The discussion around [1] may be useful.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1Lq%2Bh8qo%2BrqGU-E%2BhwJKAHYocV54y4pvou4rLysCgYD-g%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for reviewing!

> 1) Comment inconsistency across create and alter subscription, better
> to keep it same:

The comment for CREATE SUBSCRIPTION is now the same as ALTER's.

> 2) ereport inconsistency, braces around errcode is present in few
> places and not present in few places, it is better to keep it
> consistent by removing it:

Removed.

> 3) this include is not required, I could compile without it

Removed. Timestamp datatype is not used in subscriptioncmds.c.

> 4)
> 4.a)
> Should this be changed:
> /* Adds portion time (in ms) to the previous result. */
> to
> /* Adds portion time (in ms) to the previous result */

Changed.

> 4.b)
> Should this be changed:
> /* Detect whether the value of interval can cause an overflow. */
> to
> /* Detect whether the value of interval can cause an overflow */

Changed.

> 5) Can this "ALTER SUBSCRIPTION regress_testsub SET (min_apply_delay =
> '1d')" be combined along with "-- success -- 123 ms", that way few
> statements could be reduced

> 6) Can we do the interval testing along with alter subscription and
> combined with "-- success -- 123 ms" test, that way few statements
> could be reduced

To keep the code coverage, either of them must remain. 5) was cleanly removed,
and 6) was combined as you suggested. In addition, comments were updated to
clarify the test case.

Please have a look at the latest patch v14 in [1].

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866D0527B1B8D589F1C2551F5FC9%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PATCH] support tab-completion for single quote input with equal sign

2023-01-11 Thread torikoshia

On 2023-01-11 11:28, Tom Lane wrote:
> I wrote:
>> I've spent some effort previously on getting tab-completion to deal
>> sanely with single-quoted strings, but everything I've tried has
>> crashed and burned :-(, mainly because it's not clear when to take
>> the whole literal as one "word" and when not.
>
> After a little further thought, a new idea occurred to me: maybe
> we could push some of the problem down into the Matches functions.
> Consider inventing a couple of new match primitives:

Thanks for the idea!
I'm going to try it.

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: fix and document CLUSTER privileges

2023-01-11 Thread Gilles Darold

On 06/01/2023 at 01:26, Nathan Bossart wrote:
> Apparently I forgot to run all the tests before posting v4.  Here is a new
> version of the patch that should pass all tests.


Review status:


The patch applies and compiles without issues; make check and
installcheck tests run without error.


It aims to limit the permission check for running the CLUSTER command on a
partition to ownership and the MAINTAIN privilege, which it actually does.


After commit 3f19e17, which made CLUSTER ignore partitions not owned by the
caller, there was still a useless check of database ownership or shared
relations in get_tables_to_cluster_partitioned().



The documentation has been updated to explain the conditions for a
successful execution of the CLUSTER command.



Additionally, this patch adds a warning when a partition is skipped
due to lack of permission, just like VACUUM does:


    WARNING:  permission denied to vacuum "ptnowner2", skipping it

with CLUSTER now we have the same message:

    WARNING:  permission denied to cluster "ptnowner2", skipping it

Previous behavior was to skip the partition silently.


Tests on the CLUSTER command have been modified to suppress warning messages,
except partially in src/test/regress/sql/cluster.sql, to validate the
presence of the warning.



I'm moving this commitfest entry to Ready for Committers.


Regards,

--
Gilles Darold





Re: Add a new pg_walinspect function to extract FPIs from WAL records

2023-01-11 Thread Bharath Rupireddy
On Wed, Jan 11, 2023 at 10:07 AM Michael Paquier  wrote:
>
> +postgres=# SELECT lsn, tablespace_oid, database_oid, relfile_number,
> block_number, fork_name, length(fpi) > 0 as fpi_ok FROM
> pg_get_wal_fpi_info('0/7418E60', '0/7518218');
>
> This query in the docs is too long IMO.  Could you split that across
> multiple lines for readability?

Done.

> +  pg_get_wal_fpi_info(start_lsn pg_lsn,
> +  end_lsn pg_lsn,
> +  lsn OUT pg_lsn,
> +  tablespace_oid OUT oid,
> +  database_oid OUT oid,
> +  relfile_number OUT oid,
> +  block_number OUT int8,
> +  fork_name OUT text,
> +  fpi OUT bytea)
> I am a bit surprised by this format, used to define the functions part
> of the module in the docs, while we have examples that actually show
> what's printed out.  I understand that this comes from the original
> commit of the module, but the rendered docs are really hard to parse
> as well, no?  FWIW, I think that this had better be fixed as well in
> the docs of v15..  Showing a full set of attributes for the returned
> record is fine by me, still if these are too long we could just use
> \x.

Thanks. I'll work on that separately.

> For this one, I think that there is little point in showing 14
> records, so I would stick with a style similar to pageinspect.

I've done it that way for pg_get_wal_fpi_info. If this format looks
okay, I can propose to do the same for other functions (for
backpatching too) in a separate thread though.

> +CREATE FUNCTION pg_get_wal_fpi_info(IN start_lsn pg_lsn,
> +IN end_lsn pg_lsn,
> +OUT lsn pg_lsn,
> +   OUT tablespace_oid oid,
> Slight indentation issue here.

Done.

> Using "relfile_number" would be a first, for what is defined in the
> code and the docs as a filenode.

Yes, I've changed the column names to be consistent (like in pg_buffercache).

> +SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
> +-- Get FPI from WAL record
> +SELECT fpi AS page_from_wal FROM pg_get_wal_fpi_info(:'wal_lsn3', 
> :'wal_lsn4')
> +  WHERE relfile_number = :'sample_tbl_oid' \gset
> I would be tempted to keep the checks run here minimal with only a
> basic set of checks on the LSN, without the dependencies to
> pageinspect (tuple_data_split and get_raw_page), which would be fine
> enough to check the execution of the function.

I understand the concern here that creating dependency between
extensions just for testing isn't good.

I'm okay with just reading the LSN (lsn1) from the raw FPI (bytea stream) and
the WAL record's LSN (lsn2) and comparing them as lsn2 > lsn1. I'm
looking for a way to convert the first 8 bytes of a bytea stream to the
pg_lsn type; on a quick look I couldn't find direct conversion
functions, but I'll try to figure out a way.
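
In case it helps, here is one way to do that conversion in plain SQL (a
sketch only; it assumes the page header's pd_lsn is stored as two 4-byte
words, xlogid then xrecoff, in little-endian byte order, which holds on
common little-endian platforms):

-- Build an 'X/Y' LSN from the first 8 bytes of the FPI returned by the
-- function proposed in this patch.
SELECT (to_hex((get_byte(fpi, 3)::bigint << 24) | (get_byte(fpi, 2) << 16) |
               (get_byte(fpi, 1) << 8) | get_byte(fpi, 0))
        || '/' ||
        to_hex((get_byte(fpi, 7)::bigint << 24) | (get_byte(fpi, 6) << 16) |
               (get_byte(fpi, 5) << 8) | get_byte(fpi, 4)))::pg_lsn AS page_lsn
FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4');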

> FWIW, I am surprised by the design choice behind ValidateInputLSNs()
> to allow data to be gathered until the end of WAL in some cases, but
> to not allow it in others.  It is likely too late to come back to this
> choice for the existing functions in v15 (though?), but couldn't it

Separate functions, for users passing end_lsn themselves and for users
letting the function decide the end_lsn (current flush LSN or replay LSN),
were chosen for better and easier usability and easier validation of
user-entered input LSNs.

We deliberated to have something like below:
pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false);
pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false);

We wanted to have better validation of start_lsn and end_lsn, that
is, start_lsn < end_lsn, and end_lsn mustn't fall into the future when
users specify it themselves (otherwise, one can easily trick the
server by passing in the extreme end of the LSN space, 0xFFFFFFFFFFFFFFFF).
And we couldn't find a better way to deal with till_end_of_wal
being passed as true (in the above version of the functions).

Another idea was to have something like below:
pg_get_wal_stats(start_lsn, end_lsn default '0/0');
pg_get_wal_records_info(start_lsn, end_lsn default '0/0');

When end_lsn is not entered, or is entered as an invalid LSN, then return the
stats/info till the end of WAL. Again, we wanted to have some
validation of the user-entered end_lsn.

Instead of cooking multiple behaviours into a single function, we opted
for the till_end_of_wal versions. I still feel this is better unless
there's a strong reason against them.
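
For reference, this is how the two shapes look in use with the v15 functions
(assuming the pg_walinspect extension is installed; LSNs borrowed from the
docs example above):

-- Validated range: errors out if end_lsn is in the future or <= start_lsn.
SELECT count(*) FROM pg_get_wal_records_info('0/7418E60', '0/7518218');

-- Open-ended: reads from start_lsn up to the current end of WAL.
SELECT count(*) FROM pg_get_wal_records_info_till_end_of_wal('0/7418E60');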

> be useful to make this new FPI function work at least with an insanely
> high LSN value to make sure that we fetch all the FPIs from a given
> start position, up to the end of WAL?  That looks like a pretty good
> default behavior to me, rather than issuing an error when a LSN is
> defined as in the future..  I am really wondering why we have
> ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
> just allow any LSN value in t

Re: Strengthen pg_waldump's --save-fullpage tests

2023-01-11 Thread Bharath Rupireddy
On Wed, Jan 11, 2023 at 3:28 PM Drouvot, Bertrand wrote:
>
> Hi,
>
> On 1/11/23 5:17 AM, Bharath Rupireddy wrote:
> > On Wed, Jan 11, 2023 at 6:32 AM Michael Paquier  wrote:
> >>
> >> On Tue, Jan 10, 2023 at 05:25:44PM +0100, Drouvot, Bertrand wrote:
> >>> I like the idea of comparing the full page (and not just the LSN) but
> >>> I'm not sure that adding the pageinspect dependency is a good thing.
> >>>
> >>> What about extracting the block directly from the relation file and
> >>> comparing it with the one extracted from the WAL? (We'd need to skip the
> >>> first 8 bytes to skip the LSN though).
> >>
> >> Byte-by-byte counting for the page hole?
>
> I've in mind to use diff on the whole page (minus the LSN).
>
> >> The page checksum would
> >> matter as well,
>
> Right, but the TAP test is done without checksum and we could also
> skip the checksum from the page if we really want to.
>
> > Right. LSN of FPI from the WAL record and page from the table won't be
> > the same, essentially FPI LSN <= table page.
>
> Right, that's why I proposed to exclude it for the comparison.
>
> What about something like the attached?

Note that the raw page on the table might differ not just in page LSN
but also in other fields; see heap_mask for instance. It
masks LSN, checksum, hint bits, unused space etc. before FPI
consistency is verified during recovery in
verifyBackupPageConsistency().

I think the job of verifying FPI from WAL record with the page LSN is
better left to the core - via verifyBackupPageConsistency(). Honestly,
pg_waldump is good with what it has currently - LSN checks.

+# Extract the binary data without the LSN from the relation's block
+sysseek($frel, 8, 0); #bypass the LSN
+sysread($frel, $blk, 8184) or die "sysread failed: $!";
+syswrite($blkfrel, $blk) or die "syswrite failed: $!";

I doubt that these tests are portable with hardcoded values such as the above.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Rework of collation code, extensibility

2023-01-11 Thread Peter Eisentraut

On 06.01.23 08:04, Jeff Davis wrote:
> The existing code is not great, in my opinion: it doesn't have clear
> API boundaries, the comments are insufficient, and lots of special
> cases need to be handled awkwardly by callers. That style is hard to
> beat when it comes to the raw line count; but it's quite difficult to
> understand and work on.
>
> I think my changes are an improvement, but obviously that depends on
> the opinion of others who are working in this part of the code. What do
> you think?

I think the refactoring that you proposed in the thread "Refactor to
introduce pg_strcoll()." was on a sensible track.  Maybe we should try
to get that done.  The multiple-ICU stuff is still experimental and has
its own rather impressive thread, so I don't think it's sensible to try
to sort that out here.






Re: doc: add missing "id" attributes to extension packaging page

2023-01-11 Thread Peter Eisentraut

On 09.01.23 21:18, Tom Lane wrote:

* AFAIK our practice is to use "-" never "_" in XML ID attributes.
You weren't very consistent about that even within this patch, and
the overall effect would have been to have no standard about that
at all, which doesn't seem great.  I changed them all to "-".


In the olden days, "_" was invalid in ID attribute values.  This is no 
longer the case.  But of course it's good to stay consistent with 
existing practice where reasonable.






Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-11 Thread Jelte Fennema
> couldn't we also use a regexp for the pg-role rather than
> just a hardcoded keyword here then, so that it would be possible to
> allow a mapping to pass for a group of role names?  "all" is just a
> pattern to allow everything, at the end.

That's a good point. I hadn't realised that you added support for
regexes in pg_hba.conf in 8fea868. Attached is a patchset
where I reuse the pg_hba.conf code path to add support to
pg_ident.conf for: all, +group and regexes.

The main uncertainty I have is if the case insensitivity check is
actually needed in check_role. It seems like a case insensitive
check against the database user shouldn't actually be necessary.
I only understand the need for the case insensitive check against
the system user. But I have too little experience with LDAP/kerberos
to say for certain. So for now I kept the existing behaviour to
not regress.

The patchset also contains 3 preparatory patches with two refactoring
passes and one small bugfix + test additions.

> - renaming "systemuser" to "system_user_token" to outline that this is
> not a simple string but an AuthToken with potentially a regexp?

I decided against this, since now both system user and database user
are tokens. Furthermore, compiler warnings should avoid any confusion
against using this as a normal string. If you feel strongly about this
though, I'm happy to change this.


On Wed, 11 Jan 2023 at 14:34, Michael Paquier  wrote:
>
> On Wed, Jan 11, 2023 at 09:04:56AM +, Jelte Fennema wrote:
> > It's very different. I think easiest is to explain by example:
> >
> > If there exist three users on the postgres server: admin, jelte and michael
> >
> > Then this rule (your suggested rule):
> > mapname /^(.*)$ \1
> >
> > Is equivalent to:
> > mapname admin admin
> > mapname jelte jelte
> > mapname michael michael
> >
> > While with the "all" keyword you can create a rule like this:
> > mapname admin all
> >
> > which is equivalent to:
> > mapname admin admin
> > mapname admin jelte
> > mapname admin michael
>
> Thanks for the explanation, I was missing your point.  Hmm.  On top
> of my mind, couldn't we also use a regexp for the pg-role rather than
> just a hardcoded keyword here then, so that it would be possible to
> allow a mapping to pass for a group of role names?  "all" is just a
> pattern to allow everything, at the end.
> --
> Michael
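For illustration, a pg_ident.conf combining the three forms added by the
patchset might look like this (map, user and group names are made up):

    # system-user regexes (as in pg_hba.conf since 8fea868), plus all/+group pg-roles
    mymap   /^(.*)@example\.com$   \1           # regex with \1 substitution
    mymap   admin                  all          # admin may log in as any role
    mymap   ldapadmin              +dbadmins    # any member of role dbadmins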


v3-0004-Support-same-user-patterns-in-pg_ident.conf-as-in.patch
Description: Binary data


v3-0002-Store-pg_user-as-AuthToken-in-IdentLine.patch
Description: Binary data


v3-0003-Only-expand-1-in-pg_ident.conf-when-not-quoted.patch
Description: Binary data


v3-0001-Make-naming-in-code-for-username-maps-consistent.patch
Description: Binary data


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-11 Thread Jelte Fennema
LGTM. As far as I can tell this is ready for a committer.

On Wed, 11 Jan 2023 at 00:15, Jacob Champion  wrote:
>
> On Mon, Jan 9, 2023 at 7:07 AM Jelte Fennema  wrote:
> > I also took a closer look at the code, and the only comment I have is:
> >
> > > appendPQExpBuffer(&conn->errorMessage,
> >
> > These calls can all be replaced by the recently added 
> > libpq_append_conn_error
>
> Argh, thanks for the catch. Fixed.
>
> > Finally I tested this against a Postgres server I created on Azure and
> > the new value works as expected. The only thing that I think would be
> > good to change is the error message when sslmode=verify-full, and
> > sslrootcert is not provided, but ~/.postgresql/root.crt is also not 
> > available.
> > I think it would be good for the error to mention sslrootcert=system
>
> Good idea. The wording I chose in v6 is
>
> Either provide the file, use the system's trusted roots with
> sslrootcert=system, or change sslmode to disable server certificate
> verification.
>
> What do you think?
>
> Thanks!
> --Jacob




Re: Printing backtrace of postgres processes

2023-01-11 Thread Bharath Rupireddy
On Wed, Nov 30, 2022 at 11:43 AM Bharath Rupireddy
 wrote:
>
> I'm attaching the v22 patch set for further review.

Needed a rebase due to 216a784829c2c5f03ab0c43e009126cbb819e9b2.
Attaching v23 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From ee5c26f0d4e2e211166250857ea42d30a3666709 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 11 Jan 2023 13:32:13 +
Subject: [PATCH v23] Move sending multiplexed-SIGUSR1 signal code to a
 function

Add a new function hosting the common code for sending
multiplexed-SIGUSR1 signal to a backend process. This function
will also be used as-is by an upcoming commit reducing the code
duplication.
---
 src/backend/storage/ipc/procarray.c | 60 +
 src/backend/utils/adt/mcxtfuncs.c   | 49 ++-
 src/include/storage/procarray.h |  1 +
 3 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4340bf9641..5681f0d3b0 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3151,6 +3151,66 @@ HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids, int type)
 	return result;
 }
 
+/*
+ * SendProcSignalBackendOrAuxproc -- check if the process with given pid is a
+ * backend or an auxiliary process and send it the SIGUSR1 signal for a given
+ * reason.
+ *
+ * Returns true if sending the signal was successful, false otherwise.
+ */
+bool
+SendProcSignalBackendOrAuxproc(int pid, ProcSignalReason reason)
+{
+	PGPROC	   *proc;
+	BackendId	backendId = InvalidBackendId;
+
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * See if the process with given pid is a backend or an auxiliary process.
+	 *
+	 * If the given process is a backend, use its backend id in
+	 * SendProcSignal() later to speed up the operation. Otherwise, don't do
+	 * that because auxiliary processes (except the startup process) don't
+	 * have a valid backend id.
+	 */
+	if (proc != NULL)
+		backendId = proc->backendId;
+	else
+		proc = AuxiliaryPidGetProc(pid);
+
+	/*
+	 * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+	 * isn't valid; but by the time we reach kill(), a process for which we
+	 * get a valid proc here might have terminated on its own.  There's no way
+	 * to acquire a lock on an arbitrary process to prevent that. But since
+	 * this mechanism is usually used to debug a backend or an auxiliary
+	 * process that is running and consuming lots of memory or running for a
+	 * long time, the fact that it might end on its own first and its memory
+	 * contexts or backtrace are not logged is not a problem.
+	 */
+	if (proc == NULL)
+	{
+		/*
+		 * This is just a warning so a loop-through-resultset will not abort
+		 * if one backend terminated on its own during the run.
+		 */
+		ereport(WARNING,
+				(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		return false;
+	}
+
+	if (SendProcSignal(pid, reason, backendId) < 0)
+	{
+		/* Again, just a warning to allow loops */
+		ereport(WARNING,
+				(errmsg("could not send signal to process %d: %m", pid)));
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * BackendPidGetProc -- get a backend's PGPROC given its PID
  *
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 92ca5b2f72..7b17afc2ff 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -145,51 +145,8 @@ Datum
 pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
 {
 	int			pid = PG_GETARG_INT32(0);
-	PGPROC	   *proc;
-	BackendId	backendId = InvalidBackendId;
+	bool		result;
 
-	proc = BackendPidGetProc(pid);
-
-	/*
-	 * See if the process with given pid is a backend or an auxiliary process.
-	 *
-	 * If the given process is a backend, use its backend id in
-	 * SendProcSignal() later to speed up the operation. Otherwise, don't do
-	 * that because auxiliary processes (except the startup process) don't
-	 * have a valid backend id.
-	 */
-	if (proc != NULL)
-		backendId = proc->backendId;
-	else
-		proc = AuxiliaryPidGetProc(pid);
-
-	/*
-	 * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
-	 * isn't valid; but by the time we reach kill(), a process for which we
-	 * get a valid proc here might have terminated on its own.  There's no way
-	 * to acquire a lock on an arbitrary process to prevent that. But since
-	 * this mechanism is usually used to debug a backend or an auxiliary
-	 * process running and consuming lots of memory, that it might end on its
-	 * own first and its memory contexts are not logged is not a problem.
-	 */
-	if (proc == NULL)
-	{
-		/*
-		 * This is just a warning so a loop-through-resultset will not abort
-		 * if one backend terminated on its own during the run.
-		 */
-		ereport(WARNING,
-				(errmsg("PID %d is not a PostgreSQL server process", pid)));
-		PG_RETURN_BO

Re: can while loop in ClockSweepTick function be kind of infinite loop in some cases?

2023-01-11 Thread 斯波隼斗
Hi, Thank you for your quick reply

I misunderstood the logic of pg_atomic_compare_exchange_u32, so the loop
cannot be infinite.

> I wonder if we should make ->nextVictimBuffer a 64bit atomic. At the time
the changes went in we didn't (or rather, couldn't) rely on it, but these
days we could.  I think with a 64bit number we could get rid of
->completePasses and just infer it from ->nextVictimBuffer/NBuffers,
removing the need for the spinlock.  It's very unlikely that 64bit would
ever wrap, and even if, it'd just be a small inaccuracy in the assumed
number of passes. OTOH, it's doubtful the overflow handling / the spinlock
matters performance wise.

I'm not sure why 64 bit was not used at the time, so I'm a little concerned
about that.
But, except for that, you have a point and I completely agree with you. As
you said, we should use a 64-bit value whose higher-order 32 bits indicate
completePasses, and should remove the spinlock.
Maybe we don't have to worry especially about overflow here, mainly
because even now completePasses can overflow, and the possibility of
overflow would not be much different, so the 64-bit atomic operation is
better.

If overflow happened, the passes_delta variable in the function called by
the bgwriter would take a large negative value, and that would lead to an
assertion failure. (The code is here:
https://github.com/postgres/postgres/blob/d9d873bac67047cfacc9f5ef96ee488f2cb0f1c3/src/backend/storage/buffer/bufmgr.c#L2298-L2303

Will you send a patch for the replacement with 64 bit? If you don't mind, I
would like to send a patch. (Or is there some procedure to follow before
sending a patch?)

Thanks
hayato
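For the archives: the point is that pg_atomic_compare_exchange_u32() writes
the currently observed value back into *expected on failure, so the usual
retry loop needs no explicit re-read. A minimal sketch (counter is a
pg_atomic_uint32; variable names invented):

    uint32 expected = pg_atomic_read_u32(&counter);
    for (;;)
    {
        uint32 newval = expected + 1;   /* wraparound handling omitted */
        if (pg_atomic_compare_exchange_u32(&counter, &expected, newval))
            break;                      /* we won the race */
        /* on failure, expected now holds the latest value; just retry */
    }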

On Wed, Jan 11, 2023 at 3:59, Andres Freund wrote:

> Hi,
>
> On 2023-01-10 13:11:35 -0500, Robert Haas wrote:
> > On Tue, Jan 10, 2023 at 12:40 PM Andres Freund 
> wrote:
> > > > I think. `expected = originalVictim + 1;` line should be in while
> loop
> > > > (before acquiring spin lock) so that, even in the case above,
> expected
> > > > variable is incremented for each loop and CAS operation will be
> successful
> > > > at some point.
> > > > Is my understanding correct? If so, I will send PR for fixing this
> issue.
> > >
> > > Yes, I think your understanding might be correct. Interesting that this
> > > apparently has never occurred.
> >
> > Doesn't pg_atomic_compare_exchange_u32 set expected if it fails?
>
> Indeed, so there's no problem.
>
> I wonder if we should make ->nextVictimBuffer a 64bit atomic. At the time
> the
> changes went in we didn't (or rather, couldn't) rely on it, but these days
> we
> could.  I think with a 64bit number we could get rid of ->completePasses
> and
> just infer it from ->nextVictimBuffer/NBuffers, removing th need for the
> spinlock.  It's very unlikely that 64bit would ever wrap, and even if, it'd
> just be a small inaccuracy in the assumed number of passes. OTOH, it's
> doubtful the overflow handling / the spinlock matters performance wise.
>
> Greetings,
>
> Andres Freund
>


Re: allowing for control over SET ROLE

2023-01-11 Thread Noah Misch
On Tue, Jan 10, 2023 at 11:06:52AM -0500, Robert Haas wrote:
> On Sat, Jan 7, 2023 at 12:00 AM Noah Misch  wrote:
> > The docs are silent on the SET / OWNER TO connection.  Hence,
> 
> Reviewing the documentation again today, I realized that the
> documentation describes the rules for changing the ownership of an
> object in a whole bunch of places which this patch failed to update.
> Here's a patch to update all of the places I found.

A "git grep 'direct or indirect mem'" found a few more:

doc/src/sgml/ref/alter_collation.sgml:42:   To alter the owner, you must also 
be a direct or indirect member of the new
doc/src/sgml/ref/create_database.sgml:92:role, you must be a direct or 
indirect member of that role,
doc/src/sgml/ref/create_schema.sgml:92:owned by another role, you must 
be a direct or indirect member of

I wondered if the new recurring phrase "must be able to SET ROLE" should be
more specific, e.g. one of "must have
{permission,authorization,authority,right} to SET ROLE".  But then I stopped
wondering and figured "be able to" is sufficient.

> I suspect that these changes will mean that we don't also need to
> adjust the discussion of the SET option itself, but let me know what
> you think.

I still think docs for the SET option itself should give a sense of the
diversity of things it's intended to control.  It could be simple.  A bunch of
the sites you're modifying are near text like "These restrictions enforce that
altering the owner doesn't do anything you couldn't do by dropping and
recreating the aggregate function."  Perhaps the main SET doc could say
something about how it restricts other things that would yield equivalent
outcomes.  (Incidentally, DROP is another case of something one likely doesn't
want the WITH SET FALSE member using.  I think that reinforces a point I wrote
upthread.  To achieve the original post's security objective, the role must
own no objects whatsoever.)
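(For readers of the archive, the option being discussed is the grant-level
SET flag; role names below are made up:)

    GRANT admin_role TO alice WITH SET FALSE;
    -- alice cannot SET ROLE admin_role, and so cannot reach outcomes,
    -- such as ALTER ... OWNER TO admin_role, that require becoming it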




Re: Minimal logical decoding on standbys

2023-01-11 Thread Drouvot, Bertrand

Hi,

On 1/11/23 8:32 AM, Bharath Rupireddy wrote:

On Tue, Jan 10, 2023 at 2:03 PM Drouvot, Bertrand
 wrote:


Please find attached, V37 taking care of:


Thanks. I started to digest the design specified in the commit message
and these patches. 


Thanks for looking at it!


Here are some quick comments:

1. Does logical decoding on standby work without any issues if the
standby is set for cascading replication?



"Without any issues" is hard to guarantee ;-) But according to my tests:

Primary -> Standby1 with or without logical replication slot -> Standby2 with 
or without logical replication slot

works as expected (and also with cascading promotion).
We can add some TAP tests in 0004 though.


2. Do logical decoding output plugins work without any issues on the
standby with decoding enabled, say, when the slot is invalidated?



Not sure I got the question.
If the slot is invalidated, then you can expect to get errors like:

pg_recvlogical: error: unexpected termination of replication stream: ERROR:  
canceling statement due to conflict with recovery
DETAIL:  User was using the logical slot that must be dropped.

or

pg_recvlogical: error: could not send replication command "START_REPLICATION SLOT "bdt_slot" 
LOGICAL 0/0": ERROR:  cannot read from logical replication slot "bdt_slot"
DETAIL:  This slot has been invalidated because it was conflicting with 
recovery.



3. Is this feature still a 'minimal logical decoding on standby'?
Firstly, why is it 'minimal'?



Good question and I don't have the answer.
That's how it has been named when this thread started back in 2018.


4. What happens in case of failover to the standby that's already
decoding for its clients? Will the decoding work seamlessly? If not,
what are recommended things that users need to take care of
during/after failovers?


Yes, it's expected to work seamlessly. There is a TAP test in
0004 for this scenario.



0002:
1.
-if (InvalidateObsoleteReplicationSlots(_logSegNo))
+InvalidateObsoleteOrConflictingLogicalReplicationSlots(_logSegNo,
&invalidated, InvalidOid, NULL);

Isn't the function name too long and verbose? How about just
InvalidateLogicalReplicationSlots() 


The function also takes care of Invalidation of Physical replication slots
that are Obsolete (aka LSN case).

InvalidateObsoleteOrConflictingReplicationSlots() maybe?



let the function comment talk
about what sorts of replication slots it invalidates?


Agree to make the comment more clear.



2.
+errdetail("Logical decoding on standby requires wal_level to be at least logical on master"));
+ * master wal_level is set back to replica, so existing logical slots need to
invalidate such slots. Also do the same thing if wal_level on master

Can we use 'primary server' instead of 'master' like elsewhere? This
comment also applies for other patches too, if any.



Sure.


3. Can we show a new status in pg_get_replication_slots's wal_status
for invalidated due to the conflict so that the user can monitor for
the new status and take necessary actions?



Not sure if you've seen it, but the patch series adds a new field 
(confl_active_logicalslot) to
pg_stat_database_conflicts.

That said, I like your idea about adding a new status in pg_replication_slots 
too.
Do you think it's mandatory for this patch series? (I mean it could be added 
once this patch series is committed).

I'm asking because this patch series already looks like a "big" one, is more 
than 4 years old,
and I'm afraid of adding more "reporting" stuff to it (unless we feel a strong 
need for it, of course).


4. How will the user be notified when logical replication slots are
invalidated due to conflict with the primary server? 


By emitting messages like the ones mentioned above, introduced in 0002.


How can they
(firstly, is there a way?) repair/restore such replication slots? Or
is recreating/reestablishing logical replication the only way out for
them? 


Drop/recreate is part of the current design, as discussed up-thread 
IIRC.


What can users do to avoid their logical replication slots
getting invalidated and avoid running into these situations? Because
recreating/reestablishing logical replication comes with cost
(sometimes huge) as it involves building another instance, syncing
tables etc. Isn't it a good idea to touch up on all these aspects in
the documentation at least as to why we're doing this and why we can't
do this?



0005 adds a few words about it:

+
+ A logical replication slot can also be created on a hot standby. To prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ invalidated. It's highly recommended to use a physical slot between the primary
+ and the standby. Otherwise, hot_standby_feedback will work, but only while the
+ connection is alive (for example a node restart would 

What object types should be in schemas?

2023-01-11 Thread Peter Eisentraut

The current hierarchy of object types is like this:

database
    access method
    event trigger
    extension
    foreign data wrapper
    foreign server
    language
    publication
    schema
        aggregate
        collation
        conversion
        domain
        function/procedure
        index
        operator
        operator class
        operator family
        sequence
        statistics
        table/view
            policy
            rule
            trigger
        text search configuration
        text search dictionary
        text search parser
        text search template
        type
    subscription
role
tablespace

special:
- cast
- transform
- user mapping


How does one decide whether something should be in a schema or not?  The 
current state feels intuitively correct, but I can't determine any firm 
way to decide.


Over in the column encryption thread, the patch proposes to add various 
key types as new object types.  For simplicity, I just stuck them 
directly under database, but I don't know whether that is correct.


Thoughts?




Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

2023-01-11 Thread Robert Treat
On Mon, Sep 5, 2022 at 2:04 PM Justin Pryzby  wrote:
>
> On Thu, Aug 04, 2022 at 01:45:49AM -0400, Robert Treat wrote:
> > After reading this again, it isn't clear to me that this advice would
> > be more appropriately placed into Section 5.11, aka
> > https://www.postgresql.org/docs/current/ddl-partitioning.html, but in
> > lieu of a specific suggestion for where to place it there (I haven't
> > settled on one yet), IMHO, I think the first sentence of the suggested
> > change should be rewritten as:
> >
> > 
> > Note that creating a partition using PARTITION OF
> > requires taking an ACCESS EXCLUSIVE lock on the parent 
> > table.
> > It may be preferable to first CREATE a separate table...
>
> Thanks for looking.  I used your language.
>
> There is some relevant information in ddl.sgml, but not a lot, and it's
> not easily referred to, so I removed the part of the patch that tried to
> cross-reference.
>

Yes, I see now what you are referring to, and I'm thinking maybe an option
would be to also add a reference there, back to the section that will include
your change above.

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 4b219435d4..c52092a45e 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4088,7 +4088,9 @@ CREATE TABLE measurement_y2008m02 PARTITION OF measurement
  As an alternative, it is sometimes more convenient to create the
  new table outside the partition structure, and make it a proper
  partition later. This allows new data to be loaded, checked, and
- transformed prior to it appearing in the partitioned table.
+ transformed prior to it appearing in the partitioned table; see
+ ALTER
TABLE ... ATTACH PARTITION
+ for additional details.
  The CREATE TABLE ... LIKE option is helpful
  to avoid tediously repeating the parent table's definition:

> @Robert: I wonder why shouldn't CREATE..PARTITION OF *also* be patched
> to first create a table, and then attach the partition, transparently
> doing what everyone would want, without having to re-read the updated
> docs or know to issue two commands?  I wrote a patch for this which
> "doesn't fail tests", but I still wonder if I'm missing something..
>

I was thinking there might be either lock escalation issues or perhaps
issues around index attachment that don't surface using create
partition of, but I don't actually see any, in which case that does
seem like a better change all around. But like you, I feel I must be
overlooking something :-)

> commit 723fa7df82f39aed5d58e5e52ba80caa8cb13515
> Author: Justin Pryzby 
> Date:   Mon Jul 18 09:24:55 2022 -0500
>
> doc: mention CREATE+ATTACH PARTITION as an alternative to CREATE 
> TABLE..PARTITION OF
>
> In v12, 898e5e329 (Allow ATTACH PARTITION with only 
> ShareUpdateExclusiveLock)
> allows attaching a partition with a weaker lock than in CREATE..PARTITION 
> OF,
> but it does that silently.  On the one hand, things that are automatically
> better, without having to enable the option are the best kind of feature.
>
> OTOH, I doubt many people know to do that, because the docs don't say
> so, because it was implemented as an transparent improvement.  This
> patch adds a bit of documentations to make that more visible.
>
> See also: 898e5e3290a72d288923260143930fb32036c00c
> Should backpatch to v12
>
> diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
> index 360284e37d6..66138b9299d 100644
> --- a/doc/src/sgml/ddl.sgml
> +++ b/doc/src/sgml/ddl.sgml
> @@ -4092,7 +4092,9 @@ ALTER TABLE measurement ATTACH PARTITION 
> measurement_y2008m02
>
>  
>   The ATTACH PARTITION command requires taking a
> - SHARE UPDATE EXCLUSIVE lock on the partitioned table.
> + SHARE UPDATE EXCLUSIVE lock on the partitioned table,
> + as opposed to the Access Exclusive lock which is
> + required by CREATE TABLE .. PARTITION OF.
>  
>
>  
> diff --git a/doc/src/sgml/ref/create_table.sgml 
> b/doc/src/sgml/ref/create_table.sgml
> index c14b2010d81..54dbfa72e4c 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -619,6 +619,16 @@ WITH ( MODULUS  class="parameter">numeric_literal, REM
>with DROP TABLE requires taking an ACCESS
>EXCLUSIVE lock on the parent table.
>   
> +
> + 
> +  Note that creating a partition using PARTITION OF
> +  requires taking an ACCESS EXCLUSIVE lock on the 
> parent
> +  table.  It may be preferable to first create a separate table and then
> +  attach it, which does not require as strong a lock.
> +  See ATTACH 
> PARTITION
> +  for more information.
> + 
> +
>  
> 
>


Robert Treat
https://xzilla.net
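For concreteness, the two-command alternative being documented looks roughly
like this (table names borrowed from the ddl.sgml example):

    CREATE TABLE measurement_y2008m02
        (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
    -- load, check and transform data here, outside the partition structure
    ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
        FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');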




Re: What object types should be in schemas?

2023-01-11 Thread Tom Lane
Peter Eisentraut  writes:
> The current hierarchy of object types is like this:
> ...
> How does one decide whether something should be in a schema or not?

Roughly speaking, I think the intuition was "if there are not likely
to be a lot of objects of type X, maybe they don't need to be within
schemas".

Extensions might be raised as a counterexample, but in that case
I recall that there was a specific consideration: extensions can
contain (own) schemas, so it would be very confusing if they could
also be within schemas.

I'm not sure about whether that holds for foreign data wrappers and
foreign servers, but isn't that case mandated by the SQL spec?

Roles and tablespaces aren't within schemas because they aren't
within databases.

> Over in the column encryption thread, the patch proposes to add various 
> key types as new object types.  For simplicity, I just stuck them 
> directly under database, but I don't know whether that is correct.

Is it reasonable for those to be per-database rather than cluster-wide?
I don't immediately see a reason to have encrypted columns in shared
catalogs, but there would never be any chance of supporting that if
the keys live in per-database catalogs.  (OTOH, perhaps there are
security reasons to keep them per-database, so I'm not insisting
that this is the right way.)

If we did make them cluster-wide then of course they'd be outside
schemas too.  If we don't, I'd lean slightly towards putting them
within schemas, because that seems to be the default choice if you're
not sure.  There probably aren't a huge number of text search parsers
either, but they live within schemas.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-01-11 Thread vignesh C
On Tue, 10 Jan 2023 at 02:41, Melanie Plageman
 wrote:
>
> Attached is v45 of the patchset. I've done some additional code cleanup
> and changes. The most significant change, however, is the docs. I've
> separated the docs into its own patch for ease of review.
>
> The docs patch here was edited and co-authored by Samay Sharma.
> I'm not sure if the order of pg_stat_io in the docs is correct.
>
> The significant changes are removal of all "correspondence" or
> "equivalence"-related sections (those explaining how other IO stats were
> the same or different from pg_stat_io columns).
>
> I've tried to remove references to "strategies" and "Buffer Access
> Strategy" as much as possible.
>
> I've moved the advice and interpretation section to the bottom --
> outside of the table of definitions. Since this page is primarily a
> reference page, I agree with Samay that incorporating interpretation
> into the column definitions adds clutter and confusion.
>
> I think the best course would be to have an "Interpreting Statistics"
> section.
>
> I suggest a structure like the following for this section:
> - Statistics Collection Configuration
> - Viewing Statistics
> - Statistics Views Reference
> - Statistics Functions Reference
> - Interpreting Statistics
>
> As an aside, this section of the docs has some other structural issues
> as well.
>
> For example, I'm not sure it makes sense to have the dynamic statistics
> views as sub-sections under 28.2, which is titled "The Cumulative
> Statistics System."
>
> In fact the docs say this under Section 28.2
> https://www.postgresql.org/docs/current/monitoring-stats.html
>
> "PostgreSQL also supports reporting dynamic information about exactly
> what is going on in the system right now, such as the exact command
> currently being executed by other server processes, and which other
> connections exist in the system. This facility is independent of the
> cumulative statistics system."
>
> So, it is a bit weird that they are defined under the section titled
> "The Cumulative Statistics System".
>
> In this version of the patchset, I have not attempted a new structure
> but instead moved the advice/interpretation for pg_stat_io to below the
> table containing the column definitions.

For some reason cfbot is not able to apply this patch as in [1],
please have a look and post an updated patch if required:
=== Applying patches on top of PostgreSQL commit ID
3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
=== applying patch
./v45-0001-pgindent-and-some-manual-cleanup-in-pgstat-relat.patch
patching file src/backend/storage/buffer/bufmgr.c
patching file src/backend/storage/buffer/localbuf.c
patching file src/backend/utils/activity/pgstat.c
patching file src/backend/utils/activity/pgstat_relation.c
patching file src/backend/utils/adt/pgstatfuncs.c
patching file src/include/pgstat.h
patching file src/include/utils/pgstat_internal.h
=== applying patch ./v45-0002-pgstat-Infrastructure-to-track-IO-operations.patch
gpatch:  Only garbage was found in the patch input.

[1] - http://cfbot.cputube.org/patch_41_3272.log

Regards,
Vignesh




Re: Flush SLRU counters in checkpointer process

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 10:29:06 +0100, Anthonin Bonnefoy wrote:
> Currently, the Checkpointer process only reports SLRU statistics at server
> shutdown, leading to delayed statistics for SLRU flushes. This patch adds a
> flush of SLRU stats to the end of checkpoints.

Hm. I wonder if we should do this even earlier, by the
pgstat_report_checkpointer() calls in CheckpointWriteDelay().

I'm inclined to move the pgstat_report_wal() and pgstat_report_slru() calls
into pgstat_report_checkpointer() to avoid needing to care about all the
individual places.


> @@ -505,6 +505,7 @@ CheckpointerMain(void)
>   /* Report pending statistics to the cumulative stats system */
>   pgstat_report_checkpointer();
>   pgstat_report_wal(true);
> + pgstat_report_slru(true);

Why do we need a force parameter if all callers use it?

Greetings,

Andres Freund
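(Something like the following shape, sketched rather than taken from any
patch; pgstat_flush_checkpointer_stats() is a made-up stand-in for the
existing function body:)

    void
    pgstat_report_checkpointer(void)
    {
        pgstat_flush_checkpointer_stats();  /* hypothetical: current body */
        pgstat_report_wal(true);            /* folded in, so individual */
        pgstat_report_slru(true);           /* call sites can't forget them */
    }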




Re: Allow logical replication to copy tables in binary format

2023-01-11 Thread vignesh C
On Wed, 11 Jan 2023 at 16:14, Melih Mutlu  wrote:
>
> Hi,
>
> Thanks for your review.
>
> On Wed, Jan 11, 2023 at 11:56, shiy.f...@fujitsu.com wrote:
>>
>> On Mon, Nov 14, 2022 8:08 PM Melih Mutlu  wrote:
>> 1.
>> +# Binary enabled subscription should fail
>> +$node_subscriber_binary->wait_for_log("ERROR:  insufficient data left in 
>> message");
>>
>> Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like other subscription 
>> tests.
>
>
> Done.
>
>>
>> 2.
>> +# Binary disabled subscription should succeed
>> +$node_publisher->wait_for_catchup('tap_sub');
>>
>> If we want to wait for table synchronization to finish, should we call
>> wait_for_subscription_sync()?
>
>
> Done.
>
>>
>> 3.
>> I also think it might be better to support copy binary only for publishers of
>> v16 or later. Do you plan to implement it in the patch?
>
>
> Done.

For some reason CFBot is not able to apply the patch as in [1], Could
you have a look and post an updated patch if required:
=== Applying patches on top of PostgreSQL commit ID
c96de2ce1782116bd0489b1cd69ba88189a495e8 ===
=== applying patch
./v5-0001-Allow-logical-replication-to-copy-table-in-binary.patch
gpatch:  Only garbage was found in the patch input.

[1] - http://cfbot.cputube.org/patch_41_3840.log

Regards,
Vignesh




Re: Rework of collation code, extensibility

2023-01-11 Thread vignesh C
On Thu, 22 Dec 2022 at 11:11, Jeff Davis  wrote:
>
> On Sat, 2022-12-17 at 19:14 -0800, Jeff Davis wrote:
> > Attached is a new patch series. I think there are enough changes that
> > this has become more of a "rework" of the collation code rather than
> > just a refactoring. This is a continuation of some prior work[1][2]
> > in
> > a new thread given its new scope.
>
> Here's version 5. There are a number of fixes, and better tests, and
> it's passing in CI.
>
> The libc hook support is still experimental, but what's working is
> passing in CI, even on windows. The challenges with libc hook support
> are:
>
>  * It obviously doesn't replace all of libc, so the separation is not
> as clean and there are a number of callers throughout the code that
> don't necessarily care about specific collations.
>
>  * libc relies on setlocale() / uselocale(), which is global state and
> not as easy to track.
>
>  * More platform issues (obviously) and harder to test.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
c971a5b27ac946e7c94f7f655d321279512c7ee7 ===
=== applying patch ./v5-0003-Refactor-pg_locale_t-routines.patch

Hunk #1 FAILED at 88.
...
1 out of 9 hunks FAILED -- saving rejects to file
src/backend/utils/adt/formatting.c.rej
patching file src/backend/utils/adt/like.c
Hunk #1 FAILED at 24.
Hunk #2 succeeded at 97 (offset 1 line).
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/utils/adt/like.c.rej

[1] - http://cfbot.cputube.org/patch_41_4058.log

Regards,
Vignesh




Re: GUC for temporarily disabling event triggers

2023-01-11 Thread vignesh C
On Tue, 29 Nov 2022 at 18:16, Daniel Gustafsson  wrote:
>
> > On 3 Nov 2022, at 21:47, Daniel Gustafsson  wrote:
>
> > The patch adds a new GUC, ignore_event_trigger with two option values, 'all'
> > and 'none' (the login event patch had 'login' as well).
>
> The attached v2 fixes a small bug which caused testfailures the CFBot.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

=== Applying patches on top of PostgreSQL commit ID
5f6401f81cb24bd3930e0dc589fc4aa8b5424cdc ===
=== applying patch
./v2-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch
patching file doc/src/sgml/config.sgml
Hunk #1 succeeded at 9480 (offset 117 lines).
.
patching file src/backend/utils/misc/postgresql.conf.sample
Hunk #1 FAILED at 701.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/utils/misc/postgresql.conf.sample.rej

[1] - http://cfbot.cputube.org/patch_41_4013.log

Regards,
Vignesh




Re: Transparent column encryption

2023-01-11 Thread vignesh C
On Sat, 31 Dec 2022 at 19:47, Peter Eisentraut
 wrote:
>
> On 21.12.22 06:46, Peter Eisentraut wrote:
> > And another update.  The main changes are that I added an 'unspecified'
> > CMK algorithm, which indicates that the external KMS knows what it is
> > but the database system doesn't.  This was discussed a while ago.  I
> > also changed some details about how the "cmklookup" works in libpq. Also
> > added more code comments and documentation and rearranged some code.
> >
> > According to my local todo list, this patch is now complete.
>
> Another update, with some merge conflicts resolved.  I also fixed up the
> remaining TODO markers in the code, which had something to do with Perl
> and Windows.  I did some more work on schema handling, e.g., CREATE
> TABLE / LIKE, views, partitioning etc. on top of encrypted columns,
> mostly tedious and repetitive, nothing interesting.  I also rewrote the
> code that extracts the underlying tables and columns corresponding to
> query parameters.  It's now much simpler and better encapsulated.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
5f6401f81cb24bd3930e0dc589fc4aa8b5424cdc ===
=== applying patch ./v14-0001-Transparent-column-encryption.patch

Hunk #1 FAILED at 1109.

1 out of 5 hunks FAILED -- saving rejects to file doc/src/sgml/protocol.sgml.rej

patching file doc/src/sgml/ref/create_table.sgml
Hunk #3 FAILED at 351.
Hunk #4 FAILED at 704.
2 out of 4 hunks FAILED -- saving rejects to file
doc/src/sgml/ref/create_table.sgml.rej

Hunk #1 FAILED at 1420.
Hunk #2 FAILED at 4022.
2 out of 2 hunks FAILED -- saving rejects to file
doc/src/sgml/ref/psql-ref.sgml.rej

[1] - http://cfbot.cputube.org/patch_41_3718.log

Regards,
Vignesh




Re: [PATCH] pgbench: add multiconnect option

2023-01-11 Thread vignesh C
On Tue, 8 Nov 2022 at 02:16, Fabien COELHO  wrote:
>
>
> Hello Ian,
>
> > cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
> > currently underway, this would be an excellent time to update the patch.
>
> Attached a v5 which is just a rebase.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
=== applying patch ./pgbench-multi-connect-conninfo-5.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file doc/src/sgml/ref/pgbench.sgml
Hunk #3 FAILED at 921.
1 out of 3 hunks FAILED -- saving rejects to file
doc/src/sgml/ref/pgbench.sgml.rej

[1] - http://cfbot.cputube.org/patch_41_3227.log

Regards,
Vignesh




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-01-11 Thread Jacob Champion
On Wed, Jan 11, 2023 at 1:48 AM Aleksander Alekseev
 wrote:
> After reading [1] carefully it looks like we shouldn't worry about
> this. The upgrade procedure explicitly requires to run `pg_ctl stop`
> during step 8 of the upgrade procedure, i.e. not in the immediate mode
> [2].

Yeah, pg_upgrade will briefly start and stop the old server to make
sure all the WAL is replayed, and won't transfer any of the files
over. AFAIK, major-version WAL changes are fine; it was the previous
claim that we could do it in a minor version that I was unsure about.

Thanks!
--Jacob




Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-01-11 Thread vignesh C
On Fri, 9 Dec 2022 at 16:01, Christoph Heiss  wrote:
>
> Thanks for the review!
>
> On 12/8/22 12:19, Melih Mutlu wrote:
> > Hi Christoph,
> >
> > I just took a quick look at your patch.
> > Some suggestions:
> >
> > +   else if (Matches("ALTER", "VIEW", MatchAny, "SET", "("))
> > +   COMPLETE_WITH_LIST(view_optional_parameters);
> > +   /* ALTER VIEW xxx RESET ( yyy , ... ) */
> > +   else if (Matches("ALTER", "VIEW", MatchAny, "RESET", "("))
> > +   COMPLETE_WITH_LIST(view_optional_parameters);
> >
> >
> > What about combining these two cases into one like Matches("ALTER",
> > "VIEW", MatchAny, "SET|RESET", "(") ?
> Good thinking, incorporated that into v2.
> While at it, I also added completion for the values of the SET
> parameters, since that is useful as well.
>
> >
> >  /* ALTER VIEW  */
> >  else if (Matches("ALTER", "VIEW", MatchAny))
> >  COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
> >"SET SCHEMA");
> >
> >
> > Also seems like SET and RESET don't get auto-completed for "ALTER VIEW
> > ".
> > I think it would be nice to include those missing words.
> That is already contained in the patch:
>
> @@ -2139,7 +2146,7 @@ psql_completion(const char *text, int start, int end)
>   /* ALTER VIEW  */
>   else if (Matches("ALTER", "VIEW", MatchAny))
>   COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
> -  "SET SCHEMA");
> +  "SET SCHEMA", "SET (", "RESET (");

For some reason CFBot is not able to apply the patch; please have a
look and post an updated version if required. Also check and handle
Dean Rasheed's and Jim Jones' comments if applicable:
=== Applying patches on top of PostgreSQL commit ID
5f6401f81cb24bd3930e0dc589fc4aa8b5424cdc ===
=== applying patch
./v2-0001-psql-Add-tab-complete-for-optional-view-parameter.patch
gpatch:  Only garbage was found in the patch input.

[1] - http://cfbot.cputube.org/patch_41_4053.log

Regards,
Vignesh




Remove source code display from \df+?

2023-01-11 Thread Isaac Morland
I find \df+ much less useful than it should be because it tends to be
cluttered up with source code. Now that we have \sf, would it be reasonable
to remove the source code from the \df+ display? This would make it easier
to see function permissions and comments. If somebody wants to see the full
definition of a function they can always invoke \sf on it.

If there is consensus on the idea in principle I will write up a patch.
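For example (function name made up):

    \df+ public.my_func    -- today: the whole PL/pgSQL body lands in the "Source code" column
    \sf public.my_func     -- full definition, on demand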


Re: Minimal logical decoding on standbys

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 13:02:13 +0530, Bharath Rupireddy wrote:
> 3. Is this feature still a 'minimal logical decoding on standby'?
> Firstly, why is it 'minimal'?

It's minimal in comparison to other proposals at the time that did explicit /
active coordination between primary and standby to allow logical decoding.



> 0002:
> 1.
> -if (InvalidateObsoleteReplicationSlots(_logSegNo))
> +InvalidateObsoleteOrConflictingLogicalReplicationSlots(_logSegNo,
> &invalidated, InvalidOid, NULL);
> 
> Isn't the function name too long and verbose?

+1


> How about just InvalidateLogicalReplicationSlots() let the function comment
> talk about what sorts of replication slots it invalidates?

I'd just leave the name unmodified at InvalidateObsoleteReplicationSlots().


> 2.
> +errdetail("Logical decoding on standby requires wal_level to be at least logical on master"));
> + * master wal_level is set back to replica, so existing logical slots need to
> invalidate such slots. Also do the same thing if wal_level on master
> 
> Can we use 'primary server' instead of 'master' like elsewhere? This
> comment also applies for other patches too, if any.

+1


> 3. Can we show a new status in pg_get_replication_slots's wal_status
> for invalidated due to the conflict so that the user can monitor for
> the new status and take necessary actions?

Invalidated slots are not a new concept introduced in this patchset, so I'd
say we can introduce such a field separately.

Greetings,

Andres Freund




Re: mprove tab completion for ALTER EXTENSION ADD/DROP

2023-01-11 Thread vignesh C
On Wed, 11 Jan 2023 at 12:19, Michael Paquier  wrote:
>
> On Wed, Jan 11, 2023 at 12:10:33PM +0900, Kyotaro Horiguchi wrote:
> > It suggests the *kinds* of objects that are part of the extension, but
> > lists the objects of that kind regardless of dependency.  I read
> > Michael suggested (and I agree) to restrict the objects (not kinds) to
> > actually be a part of the extension. (And not for object kinds.)
>
> Yeah, that's what I meant.  Now, if Vignesh does not want to extend
> that, that's fine for me as well at the end on second thought, as this
> involves much more code for each DROP path depending on the object
> type involved.
>
> Adding the object names after DROP/ADD is useful on its own, and we
> already have some completion once the object type is specified, so
> simpler is perhaps just better here.

I too felt keeping it simpler is better. How about using the simple
first version of the patch itself?

Regards,
Vignesh




[Proposal] Allow pg_dump to include all child tables with the root table

2023-01-11 Thread Gilles Darold

Hi all,


I would like to propose a new pg_dump option called --with-childs to 
include or exclude from a dump all child and partition tables when a 
parent table is specified using option -t/--table or -T/--exclude-table. 
The whole tree is dumped with the root table.



To include all partitions or child tables with inheritance in a table 
dump we usually use the wildcard, for example:



    pg_dump -d mydb -t "root_tbname*" > out.sql


This supposes that all child/partition tables use the prefix root_tbname 
in their object names. This is often the case but, if you are as lucky as 
me, the partitions could have totally different names. Needless to say, 
for inheritance this is rarely the case. The other problem is that 
with the wildcard you can also dump relations that are not concerned at 
all by what you want to dump. Using the --with-childs option will allow 
you to just specify the root relation, and all child/partition definitions 
and/or data will be part of the dump.



    pg_dump -d mydb --table "root_tbname" --with-childs > out.sql


To exclude a whole inheritance tree from a dump:


    pg_dump -d mydb --exclude-table "root_tbname" --with-childs > out.sql


Here in attachment the patch that adds this feature to pg_dump.


Is there any interest in this feature?


Best regards,

--
Gilles Darold
https://www.migops.com/
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 2c938cd7e1..f9635442f9 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1172,6 +1172,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --with-childs
+  
+   
+Include or exclude from a dump all child and partition tables when a parent
+table is specified using option -t/--table
+or -T/--exclude-table.
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..09284c82be 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -200,6 +200,7 @@ typedef struct _dumpOptions
 
 	int			sequence_data;	/* dump sequence data even in schema-only mode */
 	int			do_nothing;
+	bool			with_childs;
 } DumpOptions;
 
 /*
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5e800dc79a..83c092080e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -421,6 +421,7 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"with-childs", no_argument, NULL, 12},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +632,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:/* dump child table too */
+dopt.with_childs = true;
+break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -810,6 +815,14 @@ main(int argc, char **argv)
 false);
 	/* non-matching exclusion patterns aren't an error */
 
+	/*
+	 * The --with-childs option requires that there is
+	 * at least one table inclusion or exclusion
+	 */
+	if (dopt.with_childs && table_include_patterns.head == NULL
+			&& table_exclude_patterns.head == NULL)
+		pg_fatal("option --with-childs requires option -t/--table or -T/--exclude-table");
+
 	/* Expand table selection patterns into OID lists */
 	if (table_include_patterns.head != NULL)
 	{
@@ -1088,6 +1101,9 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "   use SET SESSION AUTHORIZATION commands instead of\n"
 			 "   ALTER OWNER commands to set ownership\n"));
+	printf(_("  --with-childs  include or exclude from a dump all child and partition\n"
+			 "   tables when a parent table is specified using\n"
+			 "   -t/--table or -T/--exclude-table\n"));
 
 	printf(_("\nConnection options:\n"));
 	printf(_("  -d, --dbname=DBNAME  database to dump\n"));
@@ -1520,6 +1536,15 @@ expand_table_name_patterns(Archive *fout,
 		PQExpBufferData dbbuf;
 		int			dotcnt;
 
+		/*
+		 * With --with-childs we look recursively through the inheritance
+		 * tree to find the child tables of the matching include filter
+		 */
+		if (fout->dopt->with_childs)
+		{
+			appendPQExpBuffer(query, "WITH RECURSIVE child_tree (relid) AS (\n");
+		}
+
 		/*
 		 * Query must remain ABSOLUTELY devoid of unqualified names.  This
 		 * would be unnecessary given a pg_table_is_visible() variant taking a
@@ -1547,6 +1572,20 @@ expand_table_name_patterns(Archive *fout,
 			prohibit_crossdb_refs(GetConnection(fout), dbbuf.data, cell->val);
 		termPQExpBuffer(&dbbuf);
 
+		if (fout->dopt->with_childs)
+		{
+			appendPQExpBuffer(query, "\n  UNION ALL"
+		 "\nSELECT c.oid AS relid"
+		 "\nFROM child_
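The recursive part presumably walks pg_inherits, along these lines (a hedged
sketch, not necessarily the patch's exact query):

    WITH RECURSIVE child_tree (relid) AS (
        SELECT c.oid AS relid
        FROM pg_catalog.pg_class c
        WHERE c.relname = 'root_tbname'          -- the matched parent(s)
      UNION ALL
        SELECT i.inhrelid
        FROM pg_catalog.pg_inherits i
             JOIN child_tree ct ON i.inhparent = ct.relid
    )
    SELECT relid FROM child_tree;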

Re: MultiXact\SLRU buffers configuration

2023-01-11 Thread Andrey Borodin
Hi Dilip! Thank you for the review!

On Tue, Jan 10, 2023 at 9:58 PM Dilip Kumar  wrote:
>
> On Mon, Jan 9, 2023 at 9:49 AM Andrey Borodin  wrote:
> >
> > On Tue, Jan 3, 2023 at 5:02 AM vignesh C  wrote:
> > > does not apply on top of HEAD as in [1], please post a rebased patch:
> > >
> > Thanks! Here's the rebase.
>
> I was looking into this patch, it seems like three different
> optimizations are squeezed in a single patch
> 1) dividing buffer space in banks to reduce the seq search cost 2) guc
> parameter for buffer size scale 3) some of the buffer size values are
> modified compared to what it is on the head.  I think these are 3
> patches which should be independently committable.
There's no point in dividing SLRU buffers into banks unless the buffer
size is configurable.
And it's only possible to enlarge the default buffer size if the SLRU buffers
are divided into banks.
So the features can be viewed as independent commits, but they make no
sense separately.

But, probably, it's a good idea to split the patch back anyway, for
easier review.

>
> While looking into the first idea of dividing the buffer space in
> banks, I see that it will speed up finding the buffers but OTOH while
> searching the victim buffer it can actually hurt performance if
> the slru pages which are frequently accessed are not evenly
> distributed across the banks.  So imagine the cases where we have some
> banks with a lot of empty slots and other banks from which we
> frequently have to evict out the pages in order to get the new pages
> in.
>

Yes. Despite the extremely low probability of such a case, this
pattern, where a user accesses pages assigned to only one bank, may
happen.
This case is equivalent to having just one bank, which means small
buffers, just as we have now.

Thank you!

Best regards, Andrey Borodin.
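To make the bank idea concrete, a rough sketch (num_slots exists in
SlruSharedData; the bank layout and sizes are invented):

    /* map an SLRU page to a fixed-size bank and search only that bank */
    int bank_size = 8;                            /* slots per bank, say */
    int nbanks    = shared->num_slots / bank_size;
    int bank      = pageno % nbanks;
    int first     = bank * bank_size;

    for (int slotno = first; slotno < first + bank_size; slotno++)
    {
        /* page lookup / victim selection confined to this bank */
    }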




Re: Remove source code display from \df+?

2023-01-11 Thread Pavel Stehule
On Wed, Jan 11, 2023 at 17:50, Isaac Morland 
wrote:

> I find \df+ much less useful than it should be because it tends to be
> cluttered up with source code. Now that we have \sf, would it be reasonable
> to remove the source code from the \df+ display? This would make it easier
> to see function permissions and comments. If somebody wants to see the full
> definition of a function they can always invoke \sf on it.
>
> If there is consensus on the idea in principle I will write up a patch.
>

+1

Pavel


Re: Remove source code display from \df+?

2023-01-11 Thread Magnus Hagander
On Wed, Jan 11, 2023 at 6:19 PM Pavel Stehule 
wrote:

>
>
> On Wed, Jan 11, 2023 at 17:50, Isaac Morland 
> wrote:
>
>> I find \df+ much less useful than it should be because it tends to be
>> cluttered up with source code. Now that we have \sf, would it be reasonable
>> to remove the source code from the \df+ display? This would make it easier
>> to see function permissions and comments. If somebody wants to see the full
>> definition of a function they can always invoke \sf on it.
>>
>> If there is consensus on the idea in principle I will write up a patch.
>>
>
> +1
>
>
+1 but maybe with a twist. For any functions in a procedural language like
plpgsql, it makes it pretty useless today. But when viewing an internal or
C language function, it's short enough and still actually useful. Maybe
some combination where it would keep showing those for such languages, but
would show "use \sf to view source" for procedural languages?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-11 Thread Jacob Champion
On Wed, Jan 11, 2023 at 6:37 AM Jelte Fennema  wrote:
>
> LGTM. As far as I can tell this is ready for a committer.

Thanks for the reviews!

--Jacob




Re: Remove source code display from \df+?

2023-01-11 Thread Pavel Stehule
On Wed, Jan 11, 2023 at 18:25, Magnus Hagander 
wrote:

>
>
> On Wed, Jan 11, 2023 at 6:19 PM Pavel Stehule 
> wrote:
>
>>
>>
>> On Wed, Jan 11, 2023 at 17:50, Isaac Morland 
>> wrote:
>>
>>> I find \df+ much less useful than it should be because it tends to be
>>> cluttered up with source code. Now that we have \sf, would it be reasonable
>>> to remove the source code from the \df+ display? This would make it easier
>>> to see function permissions and comments. If somebody wants to see the full
>>> definition of a function they can always invoke \sf on it.
>>>
>>> If there is consensus on the idea in principle I will write up a patch.
>>>
>>
>> +1
>>
>>
> +1 but maybe with a twist. For any functions in a procedural language like
> plpgsql, it makes it pretty useless today. But when viewing an internal or
> C language function, it's short enough and still actually useful. Maybe
> some combination where it would keep showing those for such language, but
> would show "use \sf to view source" for procedural languages?
>

yes, it is almost necessary for C functions or functions in external
languages. Maybe it could be specified in pg_language whether prosrc is really
source code or some reference.






> --
>  Magnus Hagander
>  Me: https://www.hagander.net/ 
>  Work: https://www.redpill-linpro.com/ 
>


Re: fix and document CLUSTER privileges

2023-01-11 Thread Nathan Bossart
On Wed, Jan 11, 2023 at 02:22:26PM +0100, Gilles Darold wrote:
> I'm moving this commitfest entry to Ready for Committers.

Thank you for reviewing.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Remove source code display from \df+?

2023-01-11 Thread Isaac Morland
Right, for internal or C functions it just gives a symbol name or something
similar. I've never been annoyed seeing that, just having pages of PL/PGSQL
(I use a lot of that, possibly biased towards the “too much” direction)
take up all the space.

A bit hacky, but what about only showing the first line of the source code?
Then you would see link names for that type of function but the main
benefit of suppressing the full source code would be obtained. Or, show
source if it is a single line, otherwise “…” (as in, literally an ellipsis).

On Wed, 11 Jan 2023 at 12:31, Pavel Stehule  wrote:

>
>
>> On Wed, Jan 11, 2023 at 18:25, Magnus Hagander 
>> wrote:
>
>>
>>
>> On Wed, Jan 11, 2023 at 6:19 PM Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 11, 2023 at 17:50, Isaac Morland 
>>> wrote:
>>>
 I find \df+ much less useful than it should be because it tends to be
 cluttered up with source code. Now that we have \sf, would it be reasonable
 to remove the source code from the \df+ display? This would make it easier
 to see function permissions and comments. If somebody wants to see the full
 definition of a function they can always invoke \sf on it.

 If there is consensus on the idea in principle I will write up a patch.

>>>
>>> +1
>>>
>>>
>> +1 but maybe with a twist. For any functions in a procedural language
>> like plpgsql, it makes it pretty useless today. But when viewing an
>> internal or C language function, it's short enough and still actually
>> useful. Maybe some combination where it would keep showing those for such
>> languages, but would show "use \sf to view source" for procedural languages?
>>
>
>> yes, it is almost necessary for C functions or functions in external
>> languages. Maybe it could be specified in pg_language whether prosrc is really
>> source code or some reference.
>
>
>
>
>
>
>> --
>>  Magnus Hagander
>>  Me: https://www.hagander.net/ 
>>  Work: https://www.redpill-linpro.com/ 
>>
>


Re: Remove source code display from \df+?

2023-01-11 Thread Pavel Stehule
Hi

On Wed, Jan 11, 2023 at 18:57, Isaac Morland 
wrote:

> Right, for internal or C functions it just gives a symbol name or
> something similar. I've never been annoyed seeing that, just having pages
> of PL/PGSQL (I use a lot of that, possibly biased towards the “too much”
> direction) take up all the space.
>
> A bit hacky, but what about only showing the first line of the source
> code? Then you would see link names for that type of function but the main
> benefit of suppressing the full source code would be obtained. Or, show
> source if it is a single line, otherwise “…” (as in, literally an ellipsis).
>

please, don't send top post replies -
https://en.wikipedia.org/wiki/Posting_style

I don't think printing the first few rows is a good idea - usually there is
nothing interesting there (the same is true for PL/Perl, PL/Python, ...)

If the proposed feature can be generic, then this information should be
stored somewhere in pg_language. Or we can redesign usage of prosrc and
probin columns - but this can be a much more massive change.

Regards

Pavel




>
> On Wed, 11 Jan 2023 at 12:31, Pavel Stehule 
> wrote:
>
>>
>>
>> st 11. 1. 2023 v 18:25 odesílatel Magnus Hagander 
>> napsal:
>>
>>>
>>>
>>> On Wed, Jan 11, 2023 at 6:19 PM Pavel Stehule 
>>> wrote:
>>>


 st 11. 1. 2023 v 17:50 odesílatel Isaac Morland <
 isaac.morl...@gmail.com> napsal:

> I find \df+ much less useful than it should be because it tends to be
> cluttered up with source code. Now that we have \sf, would it be 
> reasonable
> to remove the source code from the \df+ display? This would make it easier
> to see function permissions and comments. If somebody wants to see the 
> full
> definition of a function they can always invoke \sf on it.
>
> If there is consensus on the idea in principle I will write up a patch.
>

 +1


>>> +1 but maybe with a twist. For any functions in a procedural language
>>> like plpgsql, it makes it pretty useless today. But when viewing an
>>> internal or C language function, it's short enough and still actually
>>> useful. Maybe some combination where it would keep showing those for such
>>> language, but would show "use \sf to view source" for procedural languages?
>>>
>>
>> yes, it is almost necessary for C functions or functions in external
>> languages. Maybe it can be specified in pg_language if prosrc is really
>> source code or some reference.
>>
>>
>>
>>
>>
>>
>>> --
>>>  Magnus Hagander
>>>  Me: https://www.hagander.net/ 
>>>  Work: https://www.redpill-linpro.com/ 
>>>
>>


Re: Remove source code display from \df+?

2023-01-11 Thread Justin Pryzby
Or, only show the source in \df++.  But it'd be a bit unfortunate if a
C-language function's source weren't shown in \df+.




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-11 Thread Magnus Hagander
On Wed, Jan 11, 2023 at 6:27 PM Jacob Champion 
wrote:

> On Wed, Jan 11, 2023 at 6:37 AM Jelte Fennema  wrote:
> >
> > LGTM. As far as I can tell this is ready for a committer.
>
> Thanks for the reviews!
>

Sorry to jump in (very) late in this game. So first, I like this general
approach :)

It feels icky to have to add configure tests just to make a test work. But
I guess there isn't really a way around that if we want to test the full
thing.

However, shouldn't we be using X509_get_default_cert_file_env() to get the
name of the env var? Granted, it's unlikely to be anything else, but it's the
API you're supposed to use. (In an ideal world that function would not return
anything in LibreSSL, but I think it does return something and then just
ignores it?)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Remove source code display from \df+?

2023-01-11 Thread Isaac Morland
On Wed, 11 Jan 2023 at 13:11, Pavel Stehule  wrote:

please, don't send top post replies -
> https://en.wikipedia.org/wiki/Posting_style
>

Sorry about that; I do know to do it properly and usually get it right.
GMail doesn’t seem to have an option (that I can find) to leave no space at
the top and put my cursor at the bottom; it nudges pretty firmly in the
direction of top-posting. Thanks for the reminder.


> I don't think printing a few first rows is a good idea - usually there is
> nothing interesting (same is PL/Perl, PL/Python, ...)
>
> If the proposed feature can be generic, then this information should be
> stored somewhere in pg_language. Or we can redesign usage of prosrc and
> probin columns - but this can be a much more massive change.
>
>> 

>>>
I’m looking for a quick win. So I think that means either drop the source
column entirely, or show single-line source values only and nothing or a
placeholder for anything that is more than one line, unless somebody comes
up with another suggestion. Originally I was thinking just to remove
entirely, but I’ve seen a couple of comments suggesting that people would
find it unfortunate if the source weren’t shown for internal/C functions,
so now I’m leaning towards showing single-line values only.

I agree that showing the first line or couple of lines isn't likely to be
very useful. The way I format my functions, the first line is always blank
anyway: I write the bodies like this:

$$
BEGIN
…
END;
$$;

Even if somebody uses a different style, the first line is probably just
"BEGIN" or something equally formulaic.


Re: pgsql: Add new GUC createrole_self_grant.

2023-01-11 Thread Robert Haas
On Tue, Jan 10, 2023 at 10:25 PM Tom Lane  wrote:
> > Of course, if it's possible for a non-CREATEROLE user to set the value
> > that a CREATEROLE user experiences, that'd be more of a problem --
>
> That's exactly the case I'm worried about, and it's completely reachable
> if a CREATEROLE user makes a SECURITY DEFINER function that executes
> an affected GRANT and is callable by an unprivileged user.  Now, that
> probably isn't terribly likely, and it's unclear that there'd be any
> serious security consequence even if the GRANT did do something
> different than the author of the SECURITY DEFINER function was expecting.
> Nonetheless, I'm feeling itchy about this chain of assumptions.

If you want to make safe a SECURITY DEFINER function written using sql
or plpgsql, you either have to schema-qualify every single reference
or, more realistically, attach a SET clause to the function to set the
search_path to a sane value during the time that the function is
executing. The problem here can be handled the same way, except that
it's needed in a vastly more limited set of circumstances: you have to
be calling a SECURITY DEFINER function that will execute CREATE ROLE
as a non-superuser (and that user then needs to be sensitive to the
value of this GUC in some security-relevant way). It might be good to
document this -- I just noticed that the CREATE FUNCTION page has a
section on "Writing SECURITY DEFINER Functions Safely" which talks
about dealing with the search_path issues, and it seems like it would
be worth adding a sentence or two there to talk about this.
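
For illustration, a minimal sketch of that pattern (the function name and
body are hypothetical; the two SET clauses are the point):

CREATE FUNCTION make_app_role(rolename text) RETURNS void
LANGUAGE plpgsql SECURITY DEFINER
SET search_path = pg_catalog, pg_temp    -- the long-documented hardening
SET createrole_self_grant = ''           -- pin this GUC the same way
AS $$
BEGIN
    EXECUTE format('CREATE ROLE %I', rolename);
END;
$$;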

It might also be a good idea to make the section of the page that
explains the meaning of the SECURITY INVOKER and SECURITY DEFINER
clauses cross-link to the section on writing such functions safely,
because that section is way down at the bottom of the page and seems
easy to miss.

But I'm not convinced that we need more than documentation changes here.

> Bottom line is that a GUC doesn't feel like the right mechanism to use.
> What do you think about inventing a role property, or a grantable role
> that controls this behavior?

Well, I originally had a pair of role properties called
INHERITCREATEDROLES and SETCREATEDROLES, but nobody liked that,
including you. One reason was probably that the names are long and
ugly, but this is also unlike existing role properties, which
typically can only be changed by a superuser, or by a CREATEROLE user
with ADMIN OPTION on the target role. Since you don't have admin
option on yourself, that would mean you couldn't change this property
for yourself, which I wasn't too exercised about, but everyone else
who commented disliked it. We could go back to having those properties
and have the rules for changing them be different than the roles for
changing other role properties, I suppose.

But I don't think that's better. Some of the existing role properties
(SUPERUSER, REPLICATION, BYPASSRLS, CREATEROLE, CREATEDB) are
capabilities, and others (LOGIN, CONNECTION LIMIT, VALID UNTIL) are
limits established by the system administrator. This is neither: it's
a default. So if we put it into the role property system, then we're
saying this one particular default is a role property, whereas pretty
much all of the others are GUCs. Now, admittedly, I did have it as a
role property originally, but that was before we decided that it made
sense to give the CREATEROLE user, rather than the superuser, control
over the value of the property. And I think we decided that for good
reason, because the CREATEROLE user can always turn "no" into "yes" by
granting the created role back to themselves with additional options.
They can't necessarily turn "yes" into "no," because we could create
the implicit grant with one or both of those options turned on and the
CREATEROLE user can't undo that. But nobody thought that was a useful
thing to enforce, and neither do I. Given that shift in thinking, I
have a hard time believing that it's a good idea to shove this option
into a system that is meant for capabilities or limits and has no
existing precedent for anything that behaves like a setting or user
preference (except perhaps ALTER USER ... PASSWORD '...' but calling
that a preference might be a stretch).

If we used grantable roles, I suppose the design there would to invent
two new predefined role and grant them to the CREATEROLE user, or not,
to affect whether and how subsequent implicit self-grants by that user
would be performed. But that again takes the decision out of the hands
of the CREATEROLE user, and it again puts a user preference into a
system that we currently use only for capabilities.

Stepping back a second, I think that your underlying concern here is
that the entire GUC system is shaky from a security perspective.
Trying to write SECURITY DEFINER functions or procedures in sql or
plpgsql is not unlike trying to write a safe setuid shell script.
Among the problems with trying to do that is that the caller might
establish surp

Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

The use of the ringbuffer in VACUUM often causes very substantial slowdowns.

The primary reason for that is that often most/all the buffers in the
ringbuffer have been dirtied when they were processed, with an associated WAL
record. When we then reuse the buffer via the (quite small) ringbuffer, we
need to flush the WAL before reclaiming the buffer.  A synchronous flush of
the WAL every few buffers ends up as a very significant bottleneck, unless you
have local low-latency durable storage (i.e. local storage with some sort of
power protection).

The slowdown caused by the frequent WAL flushes is very significant.
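
(A simple way to observe this is the pg_stat_wal view, which is also where
the per-run stats in [2] come from:

SELECT wal_sync, wal_sync_time FROM pg_stat_wal;

after a pg_stat_reset_shared('wal') just before the run.)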

A secondary issue, when we end up doing multiple passes, we'll have to re-read
data into shared_buffers, when we just wrote it out and evicted it.


An example:

On a local SSD with decently fast fdatasync([1]). Table size is 3322MB, with
~10% updated, ~30% deleted tuples, and a single index. m_w_m is large enough
to do this in one pass.  I used pg_prewarm() of another relation to ensure the
vacuumed table isn't in s_b (otherwise ringbuffers aren't doing anything).

s_b     ringbuffer enabled   time      wal_syncs   wal_sync_time
128MB   1                    77797ms   27081       59896ms
128MB   0                    13676ms   241         2538ms
8GB     1                    72834ms   23976       51989ms
8GB     0                    9544ms    150         1634ms


see [2] for logs / stats of the runs. All the data here is in the OS page
cache, so we don't even pay the real price for reading the data multiple
times.


On cloud hardware with higher fsync latency I've seen > 15x time differences
between using the ringbuffers and avoiding them by using pg_prewarm.


Of course there's a good reason we have the ringbuffer - we don't want
maintenance operations to completely disturb the buffer pool and harm the
production workload. But if, e.g., the database isn't available due to
anti-wraparound measures, there's no other workload to protect, and the
ringbuffer substantially reduces availability. Initial data loads could
similarly benefit.


Therefore I'd like to add an option to the VACUUM command to use to disable
the use of the ringbuffer. Not sure about the name yet.


I think we should auto-enable that mode once we're using the failsafe mode,
similar to [auto]vacuum cost delays getting disabled
(c.f. lazy_check_wraparound_failsafe()). If things are bad enough that we're
soon going to shut down, we want to be aggressive.


Greetings,

Andres Freund

[1] according to pg_test_fsync:
fdatasync   769.189 ops/sec1300 usecs/op


[2]

For the s_b 128MB case:

ringbuffers enabled:

2023-01-11 10:24:58.726 PST [355353][client backend][2/19:0][psql] INFO:  aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:26:19.488 PST [355353][client backend][2/19:0][psql] INFO:  finished vacuuming "postgres.public.copytest_0": index scans: 1
pages: 0 removed, 424975 remain, 424975 scanned (100.00% of total)
tuples: 4000000 removed, 7000000 remain, 0 are dead but not yet removable
removable cutoff: 2981, which was 0 XIDs old when operation ended
new relfrozenxid: 2981, which is 102 XIDs ahead of previous value
frozen: 424975 pages from table (100.00% of total) had 7000000 tuples frozen
index scan needed: 424975 pages from table (100.00% of total) had 4325101 dead item identifiers removed
index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
I/O timings: read: 2284.292 ms, write: 4325.009 ms
avg read rate: 83.203 MB/s, avg write rate: 83.199 MB/s
buffer usage: 425044 hits, 860113 misses, 860074 dirtied
WAL usage: 1709902 records, 434990 full page images, 2273501683 bytes
system usage: CPU: user: 11.62 s, system: 11.86 s, elapsed: 80.76 s

pg_stat_wal for this run:

wal_records       8092795
wal_fpi           443356
wal_bytes         2999358740
wal_buffers_full  1569
wal_write         28651
wal_sync          27081
wal_write_time    1874.391
wal_sync_time     59895.674
stats_reset       2023-01-11 10:24:58.664859-08


ringbuffers disabled:

2023-01-11 10:23:05.081 PST [355054][client backend][2/19:0][psql] INFO:  aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:23:18.755 PST [355054][client backend][2/19:0][psql] INFO:  finished vacuuming "postgres.public.copytest_0": index scans: 1
pages: 0 removed, 424979 remain, 424979 scanned (100.00% of total)
tuples

Re: Can we let extensions change their dumped catalog schemas?

2023-01-11 Thread Jacob Champion
On Tue, Jan 10, 2023 at 7:53 PM Tom Lane  wrote:
> Jacob Champion  writes:
> > Unless I'm missing something obvious (please, let it be that) there's no
> > way to do this safely. Once you've marked an internal table as dumpable,
> > its schema is effectively frozen if you want your dumps to work across
> > versions, because otherwise you'll try to restore that "catalog" data
> > into a table that has different columns. And while sometimes you can
> > make that work, it doesn't in the general case.
>
> I agree that's a problem, but it's not that we're arbitrarily prohibiting
> something that would work.  What, concretely, do you think could be
> done to improve the situation?

Concretely, I think extensions should be able to invoke their update
scripts at some point after a dump/restore cycle, whether
automatically or manually.

> > 2) Provide a way to record the exact version of an extension in a dump.
>
> Don't really see how that helps?

If pg_dump recorded our extension using

CREATE EXTENSION timescaledb VERSION <the dumped version>

then we'd be able to migrate the changed catalog post-restore, using a
standard ALTER EXTENSION ... UPDATE.

> I also fear that it will break
> a bunch of use-cases that work fine today, which are exactly the
> ones for which we originally defined pg_dump as *not* committing
> to a particular extension version.

Right, I think it would have to be opt-in. Say, a new control file
option dump_version or some such.

> It feels like what we really need here is some way to mutate the
> old format of an extension config table into the new format.

Agreed. We already provide mutation functions via the update scripts,
so I think both proposal 2 and 3 do that. I'm curious about your
opinion on option 3, since it would naively seem to make pg_dump do
_less_ work for a given extension.

> Simple addition of new columns shouldn't be a problem (in fact,
> I think that works already, or could easily be made to).  If you
> want some ETL processing then it's harder :-(.

One sharp edge for the add-a-new-column case is where you give the new
column a default, and you want all of the old migrated rows to have
some non-default value to handle backwards compatibility. (But that
case is handled trivially if you _know_ that you're performing a
migration.)

> Could an ON INSERT
> trigger on an old config table transpose converted data into a
> newer config table?

You mean something like, introduce table catalog_v2, and have all
INSERTs to catalog_v1 migrate and redirect the rows? That seems like
it could work today, though it would mean maintaining two different
upgrade paths for the same table, migrating all users of the catalog
to the new name, and needing to drop the old table at... some point
after the restore? I don't know if there would be performance concerns
with larger catalogs (in fact I'm not sure how big these catalogs
get).
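
For concreteness, a rough sketch of that trigger-based redirect (all names
hypothetical):

CREATE FUNCTION catalog_v1_redirect() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- transpose the old-format row into the new catalog; migrated rows get
    -- the non-default compatibility value discussed above
    INSERT INTO extension_catalog_v2 (id, setting, compat)
    VALUES (NEW.id, NEW.setting, true);
    RETURN NULL;  -- suppress the insert into the old table
END;
$$;

CREATE TRIGGER redirect_insert BEFORE INSERT ON extension_catalog_v1
    FOR EACH ROW EXECUTE FUNCTION catalog_v1_redirect();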

> Another point that ought to be made here is that pg_dump is not
> the only outside consumer of extension config data.  You're likely
> to break some applications if you change a config table too much.

Such as? We don't really want applications to be coupled against our
internals by accident, but we have to dump the internals to be able to
reproduce the state of the system.

> That's not an argument that we shouldn't try to make pg_dump more
> forgiving, but I'm not sure that we need to move heaven and earth.

Agreed. Hopefully we can find something that just moves a little earth. :D

Thanks!
--Jacob




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Tomas Vondra



On 1/10/23 20:52, Robert Haas wrote:
> On Tue, Jan 10, 2023 at 1:32 PM Tomas Vondra
>  wrote:
>> 0001 is a fix for the pre-existing issue in logicalmsg_decode,
>> attempting to build a snapshot before getting into a consistent state.
>> AFAICS this only affects assert-enabled builds and is otherwise
>> harmless, because we are not actually using the snapshot (apply gets a
>> valid snapshot from the transaction).
>>
>> This is mostly the fix I shared in November, except that I kept the call
>> in decode.c (per comment from Andres). I haven't added any argument to
>> SnapBuildProcessChange because we may need to backpatch this (and it
>> didn't seem much simpler, IMHO).
> 
> I tend to associate transactional behavior with snapshots, so it looks
> odd to see code that builds a snapshot only when the message is
> non-transactional. I think that a more detailed comment spelling out
> the reasoning would be useful here.
> 

I'll try adding a comment explaining this, but the reasoning is fairly
simple AFAICS:

1) We don't actually need to build the snapshot for transactional
changes, because if we end up applying the change, we'll use snapshot
provided/maintained by reorderbuffer.

2) But we don't know if we end up applying the change - it may happen
this is one of the transactions we're waiting to finish / skipped, in
which case the snapshot is kinda bogus anyway. What "saved" us is that
we'll not actually use the snapshot in the end. It's just the assert
that causes issues.

3) For non-transactional changes, we need a snapshot because we're going
to execute the callback right away. But in this case the code actually
protects against building inconsistent snapshots.

>> This however brings me to the original question what's the purpose of
>> this patch - and that's essentially keeping sequences up to date to make
>> them usable after a failover. We can't generate values from the sequence
>> on the subscriber, because it'd just get overwritten. And from this
>> point of view, it's also fine that the sequence is slightly ahead,
>> because that's what happens after crash recovery anyway. And we're not
>> guaranteeing the sequences to be gap-less.
> 
> I agree that it's fine for the sequence to be slightly ahead, but I
> think that it can't be too far ahead without causing problems. Suppose
> for example that transaction #1 creates a sequence. Transaction #2
> does nextval on the sequence a bunch of times and inserts rows into a
> table using the sequence values as the PK. It's fine if the nextval
> operations are replicated ahead of the commit of transaction #2 -- in
> fact I'd say it's necessary for correctness -- but they can't precede
> the commit of transaction #1, since then the sequence won't exist yet.

It's not clear to me how could that even happen. If transaction #1
creates a sequence, it's invisible for transaction #2. So how could it
do nextval() on it? #2 has to wait for #1 to commit before it can do
anything on the sequence, which enforces the correct ordering, no?

> Likewise, if there's an ALTER SEQUENCE that creates a new relfilenode,
> I think that needs to act as a barrier: non-transactional changes that
> happened before that transaction must also be replicated before that
> transaction is replicated, and those that happened after that
> transaction is replicated must be replayed after that transaction is
> replicated. Otherwise, at the very least, there will be states visible
> on the standby that were never visible on the origin server, and maybe
> we'll just straight up get the wrong answer. For instance:
> 
> 1. nextval, setting last_value to 3
> 2. ALTER SEQUENCE, getting a new relfilenode, and also set last_value to 19
> 3. nextval, setting last_value to 20
> 
> If 3 happens before 2, the sequence ends up in the wrong state.
> 
> Maybe you've already got this and similar cases totally correctly
> handled, I'm not sure, just throwing it out there.
> 

I believe this should behave correctly too, thanks to locking.

If a transaction does ALTER SEQUENCE, that locks the sequence, so only
that transaction can do stuff with that sequence (and changes from that
point are treated as transactional). And everyone else is waiting for #1
to commit.
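
For instance, a minimal two-session sketch (sequence name hypothetical; per
the docs, ALTER SEQUENCE blocks concurrent nextval calls):

-- session 1                        -- session 2
BEGIN;
ALTER SEQUENCE s RESTART WITH 100;  SELECT nextval('s');   -- blocks
COMMIT;                             -- ...and now returns 100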


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Remove source code display from \df+?

2023-01-11 Thread Magnus Hagander
On Wed, Jan 11, 2023 at 7:24 PM Isaac Morland 
wrote:

> On Wed, 11 Jan 2023 at 13:11, Pavel Stehule 
> wrote:
>
> please, don't send top post replies -
>> https://en.wikipedia.org/wiki/Posting_style
>>
>
> Sorry about that; I do know to do it properly and usually get it right.
> GMail doesn’t seem to have an option (that I can find) to leave no space at
> the top and put my cursor at the bottom; it nudges pretty firmly in the
> direction of top-posting. Thanks for the reminder.
>
>
>> I don't think printing a few first rows is a good idea - usually there is
>> nothing interesting (same is PL/Perl, PL/Python, ...)
>>
>> If the proposed feature can be generic, then this information should be
>> stored somewhere in pg_language. Or we can redesign usage of prosrc and
>> probin columns - but this can be a much more massive change.
>>
>>> 
>

> I’m looking for a quick win. So I think that means either drop the source
> column entirely, or show single-line source values only and nothing or a
> placeholder for anything that is more than one line, unless somebody comes
> up with another suggestion. Originally I was thinking just to remove
> entirely, but I’ve seen a couple of comments suggesting that people would
> find it unfortunate if the source weren’t shown for internal/C functions,
> so now I’m leaning towards showing single-line values only.
>
> I agree that showing the first line or couple of lines isn't likely to be
> very useful. The way I format my functions, the first line is always blank
> anyway: I write the bodies like this:
>
> $$
> BEGIN
> …
> END;
> $$;
>
> Even if somebody uses a different style, the first line is probably just
> "BEGIN" or something equally formulaic.
>

This is only about Internal and C, isn't it? Isn't the oid of these static,
and identified by INTERNALlanguageId and ClanguageId respectively? So we
could just have the query show the prosrc column if the language oid is one
of those two, and otherwise show "Please use \sf to view the source"?
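
For illustration, a minimal sketch of that query change, assuming the
hardwired OIDs 12 and 13 (INTERNALlanguageId and ClanguageId):

SELECT p.proname,
       CASE WHEN p.prolang IN (12, 13)   -- internal, C
            THEN p.prosrc
            ELSE 'Use \sf to view the source.'
       END AS "Source code"
FROM pg_proc p;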

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Peter Geoghegan
On Wed, Jan 11, 2023 at 10:27 AM Andres Freund  wrote:
> Therefore I'd like to add an option to the VACUUM command to use to disable
> the use of the ringbuffer. Not sure about the name yet.

Sounds like a good idea.

> I think we should auto-enable that mode once we're using the failsafe mode,
> similar to [auto]vacuum cost delays getting disabled
> (c.f. lazy_check_wraparound_failsafe()). If things are bad enough that we're
> soon going to shut down, we want to be aggressive.

+1

-- 
Peter Geoghegan




Re: Remove source code display from \df+?

2023-01-11 Thread Pavel Stehule
st 11. 1. 2023 v 19:31 odesílatel Magnus Hagander 
napsal:

> On Wed, Jan 11, 2023 at 7:24 PM Isaac Morland 
> wrote:
>
>> On Wed, 11 Jan 2023 at 13:11, Pavel Stehule 
>> wrote:
>>
>> please, don't send top post replies -
>>> https://en.wikipedia.org/wiki/Posting_style
>>>
>>
>> Sorry about that; I do know to do it properly and usually get it right.
>> GMail doesn’t seem to have an option (that I can find) to leave no space at
>> the top and put my cursor at the bottom; it nudges pretty firmly in the
>> direction of top-posting. Thanks for the reminder.
>>
>>
>>> I don't think printing a few first rows is a good idea - usually there
>>> is nothing interesting (same is PL/Perl, PL/Python, ...)
>>>
>>> If the proposed feature can be generic, then this information should be
>>> stored somewhere in pg_language. Or we can redesign usage of prosrc and
>>> probin columns - but this can be a much more massive change.
>>>
 
>>
>
>> I’m looking for a quick win. So I think that means either drop the source
>> column entirely, or show single-line source values only and nothing or a
>> placeholder for anything that is more than one line, unless somebody comes
>> up with another suggestion. Originally I was thinking just to remove
>> entirely, but I’ve seen a couple of comments suggesting that people would
>> find it unfortunate if the source weren’t shown for internal/C functions,
>> so now I’m leaning towards showing single-line values only.
>>
>> I agree that showing the first line or couple of lines isn't likely to be
>> very useful. The way I format my functions, the first line is always blank
>> anyway: I write the bodies like this:
>>
>> $$
>> BEGIN
>> …
>> END;
>> $$;
>>
>> Even if somebody uses a different style, the first line is probably just
>> "BEGIN" or something equally formulaic.
>>
>
> This is only about Internal and C, isn't it? Isn't the oid of these
> static, and identified by INTERNALlanguageId and ClanguageId respectively?
> So we could just have the query show the prosrc column if the language oid
> is one of those two, and otherwise show "Please use \sf to view the
> source"?
>

I think it can be acceptable - maybe we can rename the "source code" column
to something like "internal name".

Again, I don't think printing "Please use \sf to view the source" over and
over is user friendly.  \? makes it clear, and \sf is easy to use.



> --
>  Magnus Hagander
>  Me: https://www.hagander.net/ 
>  Work: https://www.redpill-linpro.com/ 
>


Re: Common function for percent placeholder replacement

2023-01-11 Thread Nathan Bossart
On Wed, Jan 11, 2023 at 11:09:01AM +0100, Peter Eisentraut wrote:
> committed with that fixed

While rebasing my recovery modules patch set, I noticed a couple of small
things that might be worth cleaning up.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f911e8c3a6..fcc87ff44f 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -154,9 +154,6 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	xlogRestoreCmd = BuildRestoreCommand(recoveryRestoreCommand,
 		 xlogpath, xlogfname,
 		 lastRestartPointFname);
-	if (xlogRestoreCmd == NULL)
-		elog(ERROR, "could not build restore command \"%s\"",
-			 recoveryRestoreCommand);
 
 	ereport(DEBUG3,
 			(errmsg_internal("executing restore command \"%s\"",
diff --git a/src/common/archive.c b/src/common/archive.c
index de42e914f7..641a58ee88 100644
--- a/src/common/archive.c
+++ b/src/common/archive.c
@@ -33,7 +33,7 @@
  * The result is a palloc'd string for the restore command built.  The
  * caller is responsible for freeing it.  If any of the required arguments
  * is NULL and that the corresponding alias is found in the command given
- * by the caller, then NULL is returned.
+ * by the caller, then an error is thrown.
  */
 char *
 BuildRestoreCommand(const char *restoreCommand,


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 10:27:20 -0800, Andres Freund wrote:
> On cloud hardware with higher fsync latency I've seen > 15x time differences
> between using the ringbuffers and avoiding them by using pg_prewarm.

A slightly edited version of what I've in the past to defeat the ringbuffers
using pg_prewarm, as I think it might be useful for others:

WITH what_rel AS (
  SELECT 'copytest_0'::regclass AS vacuum_me
),
what_to_prefetch AS (
    SELECT vacuum_me, greatest(heap_blks_total - 1, 0) AS last_block,
        CASE WHEN phase = 'scanning heap' THEN heap_blks_scanned
             ELSE heap_blks_vacuumed END AS current_pos
    FROM what_rel, pg_stat_progress_vacuum
    WHERE relid = vacuum_me AND phase IN ('scanning heap', 'vacuuming heap')
)
SELECT
    vacuum_me, current_pos,
    pg_prewarm(vacuum_me, 'buffer', 'main', current_pos,
               least(current_pos + 10000, last_block))
FROM what_to_prefetch
\watch 0.1

Having this running in the background brings the s_b=128MB, ringbuffer enabled
case down from 77797ms to 14838ms. Close to the version with the ringbuffer
disabled.


Unfortunately, afaik, that trick isn't currently possible for the index vacuum
phase, as we don't yet expose the current scan position. And not every index
might be as readily prefetchable as just prefetching the next 10k blocks from
the current position.

That's not too bad if your indexes are small, but unfortunately that's not
always the case...

Greetings,

Andres Freund




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 10:35:19 -0800, Peter Geoghegan wrote:
> On Wed, Jan 11, 2023 at 10:27 AM Andres Freund  wrote:
> > Therefore I'd like to add an option to the VACUUM command to use to disable
> > the use of the ringbuffer. Not sure about the name yet.
> 
> Sounds like a good idea.

Any idea about the name? The obvious thing is to reference ring buffers in the
option name, but that's more of an implementation detail...

Some ideas:

USE_RING_BUFFERS on|off
SCAN_PROTECTION on|off
REUSE_BUFFERS on|off
LIMIT_BUFFER_USAGE on|off

Regards,

Andres




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Peter Geoghegan
On Wed, Jan 11, 2023 at 10:58 AM Andres Freund  wrote:
> Any idea about the name? The obvious thing is to reference ring buffers in the
> option name, but that's more of an implementation detail...

What are the chances that anybody using this feature via a manual
VACUUM command will also use INDEX_CLEANUP off? It's not really
supposed to be used routinely, at all. Right? It's just for
emergencies.

Perhaps it can be tied to INDEX_CLEANUP=off? That makes it hard to get
just the behavior you want when testing VACUUM, but maybe that doesn't
matter.

Realistically, most of the value here comes from changing the failsafe
behavior, which doesn't require the user to know anything about
VACUUM. I know that AWS has reduced the vacuum_failsafe_age default on
RDS to 1.2 billion (a decision made before I joined Amazon), so it is
already something AWS leans on quite a bit where available.

-- 
Peter Geoghegan




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-11 Thread Jacob Champion
On Wed, Jan 11, 2023 at 10:23 AM Magnus Hagander  wrote:
> Sorry to jump in (very) late in this game. So first, I like this general 
> approach :)

Thanks!

> It feels icky to have to add configure tests just to make a test work. But I 
> guess there isn't really a way around that if we want to test the full thing.

I agree...

> However, shouldn't we be using X509_get_default_cert_file_env() to get the 
> name of the env? Granted it's  unlikely to be anything else, but if it's an 
> API you're supposed to use. (In an ideal world that function would not return 
> anything in LibreSSL but I think it does include something, and then just 
> ignores it?)

I think you're right, but before I do that, is the cure better than
the disease? It seems like that would further complicate a part of the
Perl tests that is already unnecessarily complicated. (The Postgres
code doesn't use the envvar at all.) Unless you already know of an
OpenSSL-alike that doesn't use that same envvar name?

--Jacob




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 11:06:26 -0800, Peter Geoghegan wrote:
> On Wed, Jan 11, 2023 at 10:58 AM Andres Freund  wrote:
> > Any idea about the name? The obvious thing is to reference ring buffers in 
> > the
> > option name, but that's more of an implementation detail...
> 
> What are the chances that anybody using this feature via a manual
> VACUUM command will also use INDEX_CLEANUP off? It's not really
> supposed to be used routinely, at all. Right? It's just for
> emergencies.

I think it's also quite useful for e.g. vacuuming after initial data loads or
if you need to do a first vacuum after a lot of bloat accumulated due to a
stuck transaction.


> Perhaps it can be tied to INDEX_CLEANUP=off? That makes it hard to get
> just the behavior you want when testing VACUUM, but maybe that doesn't
> matter.

I don't like that - it's also quite useful to disable use of ringbuffers when
you actually need to clean up indexes. Especially when we have a lot of dead
tuples we'll rescan indexes over and over...

Greetings,

Andres Freund




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Peter Geoghegan
On Wed, Jan 11, 2023 at 11:18 AM Andres Freund  wrote:
> I don't like that - it's also quite useful to disable use of ringbuffers when
> you actually need to clean up indexes. Especially when we have a lot of dead
> tuples we'll rescan indexes over and over...

That's a fair point.

My vote goes to "REUSE_BUFFERS", then.
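
Under that name, usage might look like this (hypothetical syntax, since the
option doesn't exist yet):

VACUUM (REUSE_BUFFERS OFF, VERBOSE) copytest_0;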

-- 
Peter Geoghegan




Re: recovery modules

2023-01-11 Thread Nathan Bossart
Thanks for taking a look.

On Wed, Jan 11, 2023 at 04:53:39PM +0900, Michael Paquier wrote:
> Hmm.  Is passing down the file name used as a cutoff point the best
> interface for the modules?  Perhaps passing down the redo LSN and its
> TLI would be a cleaner approach in terms of flexibility?  I agree with
> letting the startup enforce these numbers as that can be easy to mess
> up for plugin authors, leading to critical problems.

I'm having trouble thinking of any practical advantage of providing the
redo LSN and TLI.  If the main use-case is removing older archives as the
documentation indicates, it seems better to provide the file name so that
you can plug it straight into strcmp() to determine whether the file can be
removed (i.e., what pg_archivecleanup does).  If we provided the LSN and
TLI instead, you'd either need to convert that into a WAL file name for
strcmp(), or you'd need to convert the candidate file name into an LSN and
TLI and compare against those.

> "basic_archive" does not reflect what this module does.  Using one
> library simplifies the whole configuration picture and the tests, so
> perhaps something like basic_wal_module, or something like that, would
> be better long-term?

I initially created a separate basic_restore module, but decided to fold it
into basic_archive to simplify the patch and tests.  I hesitated to rename
it because it already exists in v15, and since it deals with creating and
restoring archive files, the name still seemed somewhat accurate.  That
being said, I don't mind renaming it if that's what folks want.

I've attached a new patch set that is rebased over c96de2c.  There are no
other changes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7cd6e2a870457f350d3c829fb434d9204d3814ae Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 23 Dec 2022 16:35:25 -0800
Subject: [PATCH v5 1/3] Move the code to restore files via the shell to a
 separate file.

This is preparatory work for allowing more extensibility in this
area.
---
 src/backend/access/transam/Makefile|   1 +
 src/backend/access/transam/meson.build |   1 +
 src/backend/access/transam/shell_restore.c | 160 +
 src/backend/access/transam/xlog.c  |  44 --
 src/backend/access/transam/xlogarchive.c   | 123 +---
 src/include/access/xlogarchive.h   |   7 +-
 6 files changed, 206 insertions(+), 130 deletions(-)
 create mode 100644 src/backend/access/transam/shell_restore.c

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db..099c315d03 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -19,6 +19,7 @@ OBJS = \
 	multixact.o \
 	parallel.o \
 	rmgr.o \
+	shell_restore.o \
 	slru.o \
 	subtrans.o \
 	timeline.o \
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index 8920c1bfce..3031c2f6cf 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -7,6 +7,7 @@ backend_sources += files(
   'multixact.c',
   'parallel.c',
   'rmgr.c',
+  'shell_restore.c',
   'slru.c',
   'subtrans.c',
   'timeline.c',
diff --git a/src/backend/access/transam/shell_restore.c b/src/backend/access/transam/shell_restore.c
new file mode 100644
index 00..a0562af95c
--- /dev/null
+++ b/src/backend/access/transam/shell_restore.c
@@ -0,0 +1,160 @@
+/*-
+ *
+ * shell_restore.c
+ *
+ * These recovery functions use a user-specified shell command (e.g., the
+ * restore_command GUC).
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/backend/access/transam/shell_restore.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xlogarchive.h"
+#include "access/xlogrecovery.h"
+#include "common/archive.h"
+#include "common/percentrepl.h"
+#include "storage/ipc.h"
+#include "utils/wait_event.h"
+
+static void ExecuteRecoveryCommand(const char *command,
+   const char *commandName, bool failOnSignal,
+   uint32 wait_event_info,
+   const char *lastRestartPointFileName);
+
+bool
+shell_restore(const char *file, const char *path,
+			  const char *lastRestartPointFileName)
+{
+	char	   *cmd;
+	int			rc;
+
+	/* Build the restore command to execute */
+	cmd = BuildRestoreCommand(recoveryRestoreCommand, path, file,
+			  lastRestartPointFileName);
+	if (cmd == NULL)
+		elog(ERROR, "could not build restore command \"%s\"", recoveryRestoreCommand);
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing restore command \"%s\"", cmd)));
+
+	/*
+	 * Copy xlog from archival storage to XLOGDIR
+	 */
+	fflush(NULL);
+	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
+	rc = system(cmd);
+	pgstat_report_wait_end();
+
+	pfree(cmd);
+
+	/*
+	 * Remember, we rollforward UNTIL the re

Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Andres Freund
Hi,


Heikki, CCed you due to the point about 2c03216d8311 below.


On 2023-01-10 19:32:12 +0100, Tomas Vondra wrote:
> 0001 is a fix for the pre-existing issue in logicalmsg_decode,
> attempting to build a snapshot before getting into a consistent state.
> AFAICS this only affects assert-enabled builds and is otherwise
> harmless, because we are not actually using the snapshot (apply gets a
> valid snapshot from the transaction).

LGTM.


> 0002 is a rebased version of the original approach, committed as
> 0da92dc530 (and then reverted in 2c7ea57e56). This includes the same fix
> as 0001 (for the sequence messages), the primary reason for the revert.
> 
> The rebase was not quite straightforward, due to extensive changes in
> how publications deal with tables/schemas, and so on. So this adopts
> them, but other than that it behaves just like the original patch.

This is a huge diff:
>  72 files changed, 4715 insertions(+), 612 deletions(-)

It'd be nice to split it to make review easier. Perhaps the sequence decoding
support could be split from the whole publication rigamarole?


> This does not include any changes to test_decoding and/or the built-in
> replication - those will be committed in separate patches.

Looks like that's not the case anymore?


> +/*
> + * Update the sequence state by modifying the existing sequence data row.
> + *
> + * This keeps the same relfilenode, so the behavior is non-transactional.
> + */
> +static void
> +SetSequence_non_transactional(Oid seqrelid, int64 last_value, int64 log_cnt, 
> bool is_called)
> +{
> + SeqTableelm;
> + Relationseqrel;
> + Buffer  buf;
> + HeapTupleData seqdatatuple;
> + Form_pg_sequence_data seq;
> +
> + /* open and lock sequence */
> + init_sequence(seqrelid, &elm, &seqrel);
> +
> + /* lock page' buffer and read tuple */
> + seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
> +
> + /* check the comment above nextval_internal()'s equivalent call. */
> + if (RelationNeedsWAL(seqrel))
> + {
> + GetTopTransactionId();
> +
> + if (XLogLogicalInfoActive())
> + GetCurrentTransactionId();
> + }
> +
> + /* ready to change the on-disk (or really, in-buffer) tuple */
> + START_CRIT_SECTION();
> +
> + seq->last_value = last_value;
> + seq->is_called = is_called;
> + seq->log_cnt = log_cnt;
> +
> + MarkBufferDirty(buf);
> +
> + /* XLOG stuff */
> + if (RelationNeedsWAL(seqrel))
> + {
> + xl_seq_rec  xlrec;
> + XLogRecPtr  recptr;
> + Pagepage = BufferGetPage(buf);
> +
> + XLogBeginInsert();
> + XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
> +
> + xlrec.locator = seqrel->rd_locator;
> + xlrec.created = false;
> +
> + XLogRegisterData((char *) &xlrec, sizeof(xl_seq_rec));
> + XLogRegisterData((char *) seqdatatuple.t_data, 
> seqdatatuple.t_len);
> +
> + recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
> +
> + PageSetLSN(page, recptr);
> + }
> +
> + END_CRIT_SECTION();
> +
> + UnlockReleaseBuffer(buf);
> +
> + /* Clear local cache so that we don't think we have cached numbers */
> + /* Note that we do not change the currval() state */
> + elm->cached = elm->last;
> +
> + relation_close(seqrel, NoLock);
> +}
> +
> +/*
> + * Update the sequence state by creating a new relfilenode.
> + *
> + * This creates a new relfilenode, to allow transactional behavior.
> + */
> +static void
> +SetSequence_transactional(Oid seq_relid, int64 last_value, int64 log_cnt, 
> bool is_called)
> +{
> + SeqTableelm;
> + Relationseqrel;
> + Buffer  buf;
> + HeapTupleData seqdatatuple;
> + Form_pg_sequence_data seq;
> + HeapTuple   tuple;
> +
> + /* open and lock sequence */
> + init_sequence(seq_relid, &elm, &seqrel);
> +
> + /* lock page' buffer and read tuple */
> + seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
> +
> + /* Copy the existing sequence tuple. */
> + tuple = heap_copytuple(&seqdatatuple);
> +
> + /* Now we're done with the old page */
> + UnlockReleaseBuffer(buf);
> +
> + /*
> +  * Modify the copied tuple to update the sequence state (similar to what
> +  * ResetSequence does).
> +  */
> + seq = (Form_pg_sequence_data) GETSTRUCT(tuple);
> + seq->last_value = last_value;
> + seq->is_called = is_called;
> + seq->log_cnt = log_cnt;
> +
> + /*
> +  * Create a new storage file for the sequence - this is needed for the
> +  * transactional behavior.
> +  */
> + RelationSetNewRelfilenumber(seqrel, seqrel->rd_rel->relpersistence);
> +
> + /*
> +  * Ensure sequence's relfrozenxid is at 0, since it won't contain any
> +  * unfrozen XIDs.  Same with relminmxid, since a sequence will never
> +  * contai

Re: allowing for control over SET ROLE

2023-01-11 Thread Robert Haas
On Wed, Jan 11, 2023 at 10:16 AM Noah Misch  wrote:
> A "git grep 'direct or indirect mem'" found a few more:
>
> doc/src/sgml/ref/alter_collation.sgml:42:   To alter the owner, you must also be a direct or indirect member of the new
> doc/src/sgml/ref/create_database.sgml:92:role, you must be a direct or indirect member of that role,
> doc/src/sgml/ref/create_schema.sgml:92:owned by another role, you must be a direct or indirect member of

Ah, thanks.

> I wondered if the new recurring phrase "must be able to SET ROLE" should be
> more specific, e.g. one of "must have
> {permission,authorization,authority,right} to SET ROLE".  But then I stopped
> wondering and figured "be able to" is sufficient.

I think so, too. Note the wording of the error message in check_can_set_role().

> I still think docs for the SET option itself should give a sense of the
> diversity of things it's intended to control.  It could be simple.  A bunch of
> the sites you're modifying are near text like "These restrictions enforce that
> altering the owner doesn't do anything you couldn't do by dropping and
> recreating the aggregate function."  Perhaps the main SET doc could say
> something about how it restricts other things that would yield equivalent
> outcomes.  (Incidentally, DROP is another case of something one likely doesn't
> want the WITH SET FALSE member using.  I think that reinforces a point I wrote
> upthread.  To achieve the original post's security objective, the role must
> own no objects whatsoever.)

I spent a while on this. The attached is as well as I was able to figure
out how to do. What do you think?
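
For context, a minimal sketch of the behavior being documented:

GRANT alice TO bob WITH SET FALSE;
-- bob now holds alice's privileges via membership, but:
SET ROLE alice;                          -- fails for bob
ALTER TABLE bobs_table OWNER TO alice;   -- also refused, per these docs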

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-More-documentation-update-for-GRANT-.-WITH-SET-OP.patch
Description: Binary data


Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Robert Haas
On Wed, Jan 11, 2023 at 1:29 PM Tomas Vondra
 wrote:
> > I agree that it's fine for the sequence to be slightly ahead, but I
> > think that it can't be too far ahead without causing problems. Suppose
> > for example that transaction #1 creates a sequence. Transaction #2
> > does nextval on the sequence a bunch of times and inserts rows into a
> > table using the sequence values as the PK. It's fine if the nextval
> > operations are replicated ahead of the commit of transaction #2 -- in
> > fact I'd say it's necessary for correctness -- but they can't precede
> > the commit of transaction #1, since then the sequence won't exist yet.
>
> It's not clear to me how could that even happen. If transaction #1
> creates a sequence, it's invisible for transaction #2. So how could it
> do nextval() on it? #2 has to wait for #1 to commit before it can do
> anything on the sequence, which enforces the correct ordering, no?

Yeah, I meant if #1 had committed and then #2 started to do its thing.
I was worried that decoding might reach the nextval operations in
transaction #2 before it replayed #1.

This worry may be entirely based on me not understanding how this
actually works. Do we always apply a transaction as soon as we see the
commit record for it, before decoding any further?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Minimal logical decoding on standbys

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-06 10:52:06 +0100, Drouvot, Bertrand wrote:
> On 1/6/23 4:40 AM, Andres Freund wrote:
> > ISTM that the ordering of patches isn't quite right later on. ISTM that it
> > doesn't make sense to introduce working logic decoding without first fixing
> > WalSndWaitForWal() (i.e. patch 0006). What made you order the patches that
> > way?
> > 
> 
> Idea was to ease the review: 0001 to 0005 to introduce the feature and 0006 
> to deal
> with this race condition.

> I thought it would be easier to review that way (given the complexity of 
> "just" adding the
> feature itself).

The problem I have with that is that I saw a lot of flakiness in the tests due
to the race condition. So introducing them in that order just doesn't make a
whole lot of sense to me. It's also something that can be committed
independently, I think.


> > Why is indisusercatalog stored as "full" column, whereas we store the fact 
> > of
> > table being used as a catalog table in a reloption? I'm not adverse to 
> > moving
> > to a full column, but then I think we should do the same for tables.
> > 
> > Earlier version of the patches IIRC sourced the "catalogness" from the
> > relation. What lead you to changing that? I'm not saying it's wrong, just 
> > not
> > sure it's right either.
> 
> That's right it's started retrieving this information from the relation.
> 
> Then, Robert made a comment in [1] saying it's not safe to call
> table_open() while holding a buffer lock.

The suggested path in earlier versions to avoid doing so was to make sure that
we pass down the Relation for the table into the necessary functions. Did you
explore that any further?


> Then, I worked on other options and submitted the current one.
> 
While reviewing 0001, Robert also thought of it (see [2]) and finished
with:
> 
> "
> So while I do not really like the approach of storing the same
> property in different ways for tables and for indexes, it's also not
> really obvious to me how to do better.
> "
> 
> That's also my thought.

I still dislike this approach. The code for cascading the change to the index
attributes is complicated. RelationIsAccessibleInLogicalDecoding() is getting
slower. We add unnecessary storage space to all pg_index rows.

Now I even wonder if this doesn't break the pg_index.indcheckxmin logic (which
I really dislike, but that's a separate discussion). I think updating pg_index
to set indisusercatalog might cause the index to be considered unusable, if
indcheckxmin = true. See

			/*
			 * If the index is valid, but cannot yet be used, ignore it; but
			 * mark the plan we are generating as transient. See
			 * src/backend/access/heap/README.HOT for discussion.
			 */
			if (index->indcheckxmin &&
				!TransactionIdPrecedes(HeapTupleHeaderGetXmin(indexRelation->rd_indextuple->t_data),
									   TransactionXmin))
			{
				root->glob->transientPlan = true;
				index_close(indexRelation, NoLock);
				continue;
			}


The reason we went with the indcheckxmin approach, instead of storing the xmin
after which an index is usable directly, was that that way we don't need
special logic around vacuum to reset the stored xid to prevent the index to
become unusable after xid wraparound. But these days we could just store a
64bit xid...

Greetings,

Andres Freund




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 15:23:18 -0500, Robert Haas wrote:
> Yeah, I meant if #1 had committed and then #2 started to do its thing.
> I was worried that decoding might reach the nextval operations in
> transaction #2 before it replayed #1.
>
> This worry may be entirely based on me not understanding how this
> actually works. Do we always apply a transaction as soon as we see the
> commit record for it, before decoding any further?

Yes.

Otherwise we'd have a really hard time figuring out the correct historical
snapshot to use for subsequent transactions - they'd have been able to see the
catalog modifications made by the committing transaction.

Greetings,

Andres Freund




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-10 10:01:25 +0100, Marco Slot wrote:
> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen  wrote:
> > We'd like to be able to call the lock manager's WaitForLockers() and
> > WaitForLockersMultiple() from SQL. Below I describe our use case, but
> > basically I'm wondering if this:
> >
> > 1. Seems like a reasonable thing to do
> >
> > 2. Would be of interest upstream
> >
> > 3. Should be done with a new pg_foo() function (taking an
> >oid?), or a whole new SQL command, or something else
> 
> Definitely +1 on adding a function/syntax to wait for lockers without
> actually taking a lock.

I think such a function would still have to integrate enough with the lock
manager infrastructure to participate in the deadlock detector. Otherwise I
think you'd trivially end up with loads of deadlocks.
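
A sketch of the hazard, assuming a hypothetical pg_wait_for_lockers()
function:

-- session 1                                -- session 2
BEGIN;                                      BEGIN;
LOCK TABLE a IN EXCLUSIVE MODE;             LOCK TABLE b IN EXCLUSIVE MODE;
SELECT pg_wait_for_lockers('b'::regclass);  SELECT pg_wait_for_lockers('a'::regclass);
-- Each session now waits for the other to release its lock; unless these
-- waits participate in the deadlock detector, neither call ever returns.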

Greetings,

Andres Freund




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Justin Pryzby
On Wed, Jan 11, 2023 at 10:58:54AM -0800, Andres Freund wrote:
> Hi,
> 
> On 2023-01-11 10:35:19 -0800, Peter Geoghegan wrote:
> > On Wed, Jan 11, 2023 at 10:27 AM Andres Freund  wrote:
> > > Therefore I'd like to add an option to the VACUUM command to use to 
> > > disable
> > > the use of the ringbuffer. Not sure about the name yet.
> > 
> > Sounds like a good idea.
> 
> Any idea about the name? The obvious thing is to reference ring buffers in the
> option name, but that's more of an implementation detail...
> 
> Some ideas:
> 
> USE_RING_BUFFERS on|off
> REUSE_BUFFERS on|off

+1 for either of these.

I don't think it's an issue to expose implementation details here.
Anyone who wants to change this will know about the implementation
details that they're changing, and it's better to refer to it by the
same/similar name and not by some other name that's hard to find.

BTW I can't see that the ring buffer is currently exposed in any
user-facing docs for COPY/ALTER/VACUUM/CREATE ?

-- 
Justin




Re: daitch_mokotoff module

2023-01-11 Thread Paul Ramsey
On Mon, Jan 2, 2023 at 2:03 PM Dag Lem  wrote:

> I also improved on the documentation example (using Full Text Search).
> AFAIK you can't make general queries like that using arrays, however in
> any case I must admit that text arrays seem like more natural building
> blocks than space delimited text here.

This is a fun addition to fuzzystrmatch.

While it's a little late in the game, I'll just put it out there:
daitch_mokotoff() is way harder to type than soundex_dm(). Not sure
how you feel about that.

On the documentation, I found the leap directly into the tsquery
example a bit too big. Maybe start with a very simple example,

--
dm=# SELECT daitch_mokotoff('Schwartzenegger'),
daitch_mokotoff('Swartzenegger');

 daitch_mokotoff | daitch_mokotoff
-----------------+-----------------
 {479465}        | {479465}
--

Then transition into a more complex example that illustrates the GIN
index technique you mention in the text, but do not show:

--
CREATE TABLE dm_gin (source text, dm text[]);

INSERT INTO dm_gin (source) VALUES
('Swartzenegger'),
('John'),
('James'),
('Steinman'),
('Steinmetz');

UPDATE dm_gin SET dm = daitch_mokotoff(source);

CREATE INDEX dm_gin_x ON dm_gin USING GIN (dm);

SELECT * FROM dm_gin WHERE dm && daitch_mokotoff('Schwartzenegger');
--

And only then go into the tsearch example. Incidentally, what does the
tsearch approach provide that the simple GIN approach does not?
Ideally explain that briefly before launching into the example. With
all the custom functions and so on it's a little involved, so maybe if
there's not a huge win in using that approach drop it entirely?

ATB,
P




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Robert Haas
On Wed, Jan 11, 2023 at 3:28 PM Andres Freund  wrote:
> On 2023-01-11 15:23:18 -0500, Robert Haas wrote:
> > Yeah, I meant if #1 had committed and then #2 started to do its thing.
> > I was worried that decoding might reach the nextval operations in
> > transaction #2 before it replayed #1.
> >
> > This worry may be entirely based on me not understanding how this
> > actually works. Do we always apply a transaction as soon as we see the
> > commit record for it, before decoding any further?
>
> Yes.
>
> Otherwise we'd have a really hard time figuring out the correct historical
> snapshot to use for subsequent transactions - they'd have been able to see the
> catalog modifications made by the committing transaction.

I wonder, then, what happens if somebody wants to do parallel apply.
That would seem to require some relaxation of this rule, but then
doesn't that break what this patch wants to do?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: low wal_retrieve_retry_interval causes missed signals on Windows

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-10 22:11:16 -0800, Nathan Bossart wrote:
> The attached patch fixes this by always calling WaitLatch(), even if
> wal_retrieve_retry_interval milliseconds have already elapsed and the
> timeout is 0.

It doesn't seem right to call WaitLatch() just for that purpose - nor
necessarily a complete fix.

Given that we check for interrupts in other parts of recovery with
HandleStartupProcInterrupt(), which doesn't interact with latches, isn't the
actual bug that HandleStartupProcInterrupt() doesn't contain the same black
magic that CHECK_FOR_INTERRUPTS() contains on windows?  Namely this stuff:


#ifndef WIN32
...
#else
#define INTERRUPTS_PENDING_CONDITION() \
(unlikely(UNBLOCKED_SIGNAL_QUEUE()) ? pgwin32_dispatch_queued_signals() : 0, \
 unlikely(InterruptPending))
#endif

/* Service interrupt, if one is pending and it's safe to service it now */
#define CHECK_FOR_INTERRUPTS() \
do { \
if (INTERRUPTS_PENDING_CONDITION()) \
ProcessInterrupts(); \
} while(0)


Looks like we have that bug in quite a few places... Some are "protected" by
unconditional WaitLatch() calls, but at least pgarch.c, checkpointer.c via
CheckpointWriteDelay() seem borked.


Greetings,

Andres Freund




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 15:41:45 -0500, Robert Haas wrote:
> I wonder, then, what happens if somebody wants to do parallel apply.  That
> would seem to require some relaxation of this rule, but then doesn't that
> break what this patch wants to do?

I don't think it'd pose a direct problem - presumably you'd only parallelize
applying changes, not committing the transactions containing them. You'd get a
lot of inconsistencies otherwise.

If you're thinking of decoding changes in parallel (rather than streaming out
large changes before commit when possible), you'd only be able to do that in
cases where transactions haven't performed catalog changes, I think. In which
case there'd also be no issue wrt transactional sequence changes.

Greetings,

Andres Freund




Re: ATTACH PARTITION seems to ignore column generation status

2023-01-11 Thread Tom Lane
Amit Langote  writes:
> I've updated your disallow-generated-child-columns-2.patch to do this,
> and have also merged the delta post that I had attached with my last
> email, whose contents you sound to agree with.

Pushed with some further work to improve the handling of multiple-
inheritance cases.  We still need to insist that all or none of the
parent columns are generated, but we don't have to require their
generation expressions to be alike: that can be resolved by letting
the child table override the expression, much as we've long done for
plain default expressions.  (This did need some work in pg_dump
after all.)  I'm pretty happy with where this turned out.
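
For anyone following along, here's roughly what is now resolvable by an
explicit child override (a sketch with made-up table names, not taken
verbatim from the committed tests):

CREATE TABLE parent1 (f1 int, f2 int GENERATED ALWAYS AS (f1 + 1) STORED);
CREATE TABLE parent2 (f1 int, f2 int GENERATED ALWAYS AS (f1 * 2) STORED);

-- Both parents agree that f2 is generated, but their expressions
-- differ, so the child supplies its own:
CREATE TABLE child (
    f2 int GENERATED ALWAYS AS (f1 * 3) STORED
) INHERITS (parent1, parent2);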

regards, tom lane




Re: pgsql: Add new GUC createrole_self_grant.

2023-01-11 Thread Tom Lane
Robert Haas  writes:
> If you want to make safe a SECURITY DEFINER function written using sql
> or plpgsql, you either have to schema-qualify every single reference
> or, more realistically, attach a SET clause to the function to set the
> search_path to a sane value during the time that the function is
> executing. The problem here can be handled the same way, except that
> it's needed in a vastly more limited set of circumstances: you have to
> be calling a SECURITY DEFINER function that will execute CREATE ROLE
> as a non-superuser (and that user then needs to be sensitive to the
> value of this GUC in some security-relevant way). It might be good to
> document this -- I just noticed that the CREATE FUNCTION page has a
> section on "Writing SECURITY DEFINER Functions Safely" which talks
> about dealing with the search_path issues, and it seems like it would
> be worth adding a sentence or two there to talk about this.

OK, I'd be satisfied with that.
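
Something along these lines, perhaps (a sketch only -- the function and
role names are invented, and the owner is assumed to be a CREATEROLE
non-superuser):

CREATE FUNCTION provision_role(rolename text) RETURNS void
LANGUAGE plpgsql SECURITY DEFINER
SET search_path = pg_catalog, pg_temp
SET createrole_self_grant = ''
AS $$
BEGIN
    -- runs with the definer's privileges, but with both GUCs pinned
    EXECUTE format('CREATE ROLE %I LOGIN', rolename);
END;
$$;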

regards, tom lane




Re: Can we let extensions change their dumped catalog schemas?

2023-01-11 Thread Tom Lane
Jacob Champion  writes:
> On Tue, Jan 10, 2023 at 7:53 PM Tom Lane  wrote:
>> I also fear that it will break
>> a bunch of use-cases that work fine today, which are exactly the
>> ones for which we originally defined pg_dump as *not* committing
>> to a particular extension version.

> Right, I think it would have to be opt-in. Say, a new control file
> option dump_version or some such.

That would require all the installed extensions to cope with this
the same way, which does not seem like a great assumption.

regards, tom lane




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 14:38:34 -0600, Justin Pryzby wrote:
> On Wed, Jan 11, 2023 at 10:58:54AM -0800, Andres Freund wrote:
> > Some ideas:
> > 
> > USE_RING_BUFFERS on|off
> > REUSE_BUFFERS on|off
> 
> +1 for either of these.

Then I'd go for REUSE_BUFFERS. What made you prefer it over
LIMIT_BUFFER_USAGE?

USE_BUFFER_ACCESS_STRATEGY would be a name tied to the implementation that's
not awful, I think.


> I don't think it's an issue to expose implementation details here.
> Anyone who wants to change this will know about the implementation
> details that they're changing, and it's better to refer to it by the
> same/similar name and not by some other name that's hard to find.

A ringbuffer could refer to a lot of things other than something limiting
buffer usage, that's why I don't like it.


> BTW I can't see that the ring buffer is currently exposed in any
> user-facing docs for COPY/ALTER/VACUUM/CREATE?

Yea, there's surprisingly little in the docs about it, given how much it
influences behaviour. It's mentioned in tablesample-method.sgml, but without
explanation - and it's a page documenting C API...

Greetings,

Andres Freund




Re: Remove source code display from \df+?

2023-01-11 Thread Tom Lane
Pavel Stehule  writes:
> On Wed, 11. 1. 2023 at 19:31, Magnus Hagander 
> wrote:
>> This is only about Internal and C, isn't it? Isn't the oid of these
>> static, and identified by INTERNALlanguageId and ClanguageId respectively?
>> So we could just have the query show the prosrc column if the language oid
>> is one of those two, and otherwise show "Please use \sf to view the
>> source"?

> I think it can be acceptable - maybe we can rename the "source code"
> column to something like "internal name".

Yeah, "source code" has always been kind of a misnomer for these
languages.

> again I don't think printing "Please use \sf to view the source" often
> can be user friendly.  \? is clear and \sf is easy to use.

We could shorten it to "See \sf" or something like that.  But if we change
the column header to "internal name" or the like, then the column just
obviously doesn't apply for non-internal languages, so leaving it null
should be fine.
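
For concreteness, the describe query could do something like this (a
sketch of the idea, not the final \df+ query):

SELECT p.proname AS "Name",
       CASE WHEN l.lanname IN ('internal', 'c') THEN p.prosrc END
           AS "Internal name"
FROM pg_catalog.pg_proc p
     JOIN pg_catalog.pg_language l ON l.oid = p.prolang;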

regards, tom lane




Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

2023-01-11 Thread Robert Haas
On Wed, Jan 11, 2023 at 10:48 AM Robert Treat  wrote:
> > @Robert: I wonder why CREATE..PARTITION OF shouldn't *also* be patched
> > to first create a table, and then attach the partition, transparently
> > doing what everyone would want, without having to re-read the updated
> > docs or know to issue two commands?  I wrote a patch for this which
> > "doesn't fail tests", but I still wonder if I'm missing something..
> >
>
> I was thinking there might be either lock escalation issues or perhaps
> issues around index attachment that don't surface using create
> partition of, but I don't actually see any, in which case that does
> seem like a better change all around. But like you, I feel I must be
> overlooking something :-)

To be honest, I'm not sure whether either of you are missing anything
or not. I think a major reason why I didn't implement this was that
it's a different code path. DefineRelation() has code to do a bunch of
things that are also done by ATExecAttachPartition(), and I haven't
gone through exhaustively and checked whether there are any relevant
differences. I think that part of the reason that I did not research
that at the time is that the patch was incredibly complicated to get
working at all and I didn't want to take any risk of adding things to
it that might create more problems. Now that it's been a few years, we
might feel more confident.

Another thing that probably deserves at least a bit of thought is the
fact that ATTACH PARTITION just attaches a partition, whereas CREATE
TABLE does a lot more things. Are any of those things potential
hazards? Like what if the newly-created table references the parent
via a foreign key, or uses the parent's row type as a column type or
as part of a column default expression or in a CHECK constraint or
something? Basically, try to think of weird scenarios where the new
table would interact with the parent in some weird way where the
weaker lock would be a problem. Maybe there's nothing to see here: not
sure.
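
To spell out one such scenario (a sketch with made-up names; whether
it's actually hazardous under the weaker lock is exactly what would
need checking):

CREATE TABLE parent (id int PRIMARY KEY, val text) PARTITION BY RANGE (id);

-- The would-be partition references its own future parent via an FK:
CREATE TABLE part1 (
    id int PRIMARY KEY REFERENCES parent (id),
    val text
);

ALTER TABLE parent ATTACH PARTITION part1 FOR VALUES FROM (1) TO (1000);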

Also, we need to separately analyze the cases where (1) the new
partition is the default partition, (2) the new partition is not the
default partition but a default partition exists, and (3) the new
partition is not the default partition and no default partition
exists.

Sorry not to have more definite thoughts here. I know that when I
developed the original patch, I thought about this case and decided my
brain was full. However, I do not recall whether I knew about any
specific problems that needed to be fixed, or just feared that there
might be some.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



