Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-07 Thread Richard Guo
On Tue, Aug 2, 2022 at 3:13 PM Richard Guo  wrote:

> On Sun, Jul 31, 2022 at 12:07 AM Tom Lane  wrote:
>
>> [ wanders away wondering if JOIN_RIGHT_SEMI should become a thing ... ]
>
> Maybe this is something we can do. Currently for the query below:
>
> # explain select * from foo where a in (select c from bar);
>                              QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Semi Join  (cost=154156.00..173691.29 rows=10 width=8)
>Hash Cond: (foo.a = bar.c)
>->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=8)
>->  Hash  (cost=72124.00..72124.00 rows=500 width=4)
>  ->  Seq Scan on bar  (cost=0.00..72124.00 rows=500 width=4)
> (5 rows)
>
> I believe we can get a cheaper plan if we are able to swap the outer and
> inner for a SEMI JOIN and use the smaller 'foo' as the inner rel.
>

I'm thinking about the JOIN_RIGHT_SEMI idea, and it seems that it can be
implemented for HashJoin with a very small change.  What we want to do is
to emit just the first match for each inner tuple.  So after scanning
the hash bucket for matches, we just need to check whether the inner
tuple has already been marked as matched and skip it if so, something like

  {
  if (!ExecScanHashBucket(node, econtext))
  {
  /* out of matches; check for possible outer-join fill */
  node->hj_JoinState = HJ_FILL_OUTER_TUPLE;
  continue;
  }
  }

+ /*
+  * In a right-semijoin, we only need the first match for each
+  * inner tuple.
+  */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+

I have a simple implementation locally and tried it with the query below,
seeing a speedup from 2055.617 ms to 1156.772 ms (both best of 3).

# explain (costs off, analyze)
select * from foo where a in (select c from bar);
                                 QUERY PLAN
------------------------------------------------------------------------------
 Hash Semi Join (actual time=1957.748..2055.058 rows=10 loops=1)
   Hash Cond: (foo.a = bar.c)
   ->  Seq Scan on foo (actual time=0.026..0.029 rows=10 loops=1)
   ->  Hash (actual time=1938.818..1938.819 rows=500 loops=1)
 Buckets: 262144  Batches: 64  Memory Usage: 4802kB
         ->  Seq Scan on bar (actual time=0.016..853.010 rows=500 loops=1)
 Planning Time: 0.327 ms
 Execution Time: 2055.617 ms
(8 rows)

# explain (costs off, analyze)
select * from foo where a in (select c from bar);
                                 QUERY PLAN
------------------------------------------------------------------------------
 Hash Right Semi Join (actual time=11.525..1156.713 rows=10 loops=1)
   Hash Cond: (bar.c = foo.a)
   ->  Seq Scan on bar (actual time=0.034..523.036 rows=500 loops=1)
   ->  Hash (actual time=0.027..0.029 rows=10 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on foo (actual time=0.009..0.014 rows=10 loops=1)
 Planning Time: 0.312 ms
 Execution Time: 1156.772 ms
(8 rows)

It may not be easy for MergeJoin and NestLoop though, as we do not have
a way to know whether an inner tuple has already been matched.  But
the benefit of swapping inputs for MergeJoin and NestLoop seems to be
small, so I think it's OK to ignore them.
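For illustration, the intended first-match-only behavior can be simulated
outside the executor — a toy Python sketch (names and data invented; the
per-entry flag stands in for the HeapTupleHeaderHasMatch() bit on real
hash-join tuples):

```python
def hash_right_semi_join(probe_rows, hash_rows, key=lambda r: r):
    """Emit each hashed-side row at most once, on its first match."""
    # Build the hash table on the smaller input (the semi-join's outer
    # rel, e.g. foo), tracking a per-tuple "matched" flag.
    table = {}
    for r in hash_rows:
        table.setdefault(key(r), []).append([r, False])

    result = []
    for p in probe_rows:            # one sequential pass over the big input
        for entry in table.get(key(p), []):
            if entry[1]:
                continue            # already matched: first match suffices
            entry[1] = True
            result.append(entry[0])
    return result

# foo = [2, 3, 4] is hashed; bar = [1, 2, 2, 3, 3, 3] is probed.
assert hash_right_semi_join([1, 2, 2, 3, 3, 3], [2, 3, 4]) == [2, 3]
```

Duplicate probe values hit an already-matched entry and are skipped, which
is exactly what the `continue` in the patch fragment above achieves.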

So, is it worthwhile to make JOIN_RIGHT_SEMI a reality?

Thanks
Richard


Re: CREATE SUBSCRIPTION -- add missing tab-completes

2023-04-07 Thread Masahiko Sawada
On Fri, Apr 7, 2023 at 2:28 PM Amit Kapila  wrote:
>
> On Wed, Apr 5, 2023 at 5:58 AM Peter Smith  wrote:
> >
> > There are some recent commits that added new options for CREATE SUBSCRIPTION.
> >
> ...
> > PSA patches to add those tab completions.
> >
>
> LGTM, so pushed. BTW, while looking at this, I noticed that the newly
> added options "password_required" and "run_as_owner" have their
> datatype incorrectly mentioned as string in the docs. It should be
> boolean.

+1

> I think "password_required" belongs to the first section of the docs,
> which says: "The following parameters control what happens during
> subscription creation".

But the documentation of ALTER SUBSCRIPTION says:

The parameters that can be altered are slot_name, synchronous_commit,
binary, streaming, disable_on_error, password_required, run_as_owner,
and origin. Only a superuser can set password_required = false.

ISTM that both password_required and run_as_owner are parameters to
control the subscription's behavior, like disable_on_error and
streaming. So it looks good to me that password_required belongs to
the second section.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 08:09:50 +0200, Drouvot, Bertrand wrote:
> Hi,
>
> On 4/7/23 7:56 AM, Andres Freund wrote:
> > Hi,
> >
> > On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote:
> > > Done in V63 attached and did change the associated comment a bit.
> >
> > Can you send your changes incrementally, relative to V62? I'm polishing them
> > right now, and that'd make it a lot easier to apply your changes ontop.
> >
>
> Sure, please find them enclosed.

Thanks.


Here's my current working state - I'll go to bed soon.

Changes:

- shared catalog relations weren't handled correctly, because the dboid is
  InvalidOid for them. I wrote a test for that as well.

- ReplicationSlotsComputeRequiredXmin() took invalidated logical slots into
  account (ReplicationSlotsComputeLogicalRestartLSN() too, but it never looks
  at logical slots)

- I don't think the subset of slot xids that were checked when invalidating
  was right. We need to check effective_xmin and effective_catalog_xmin - the
  latter was using catalog_xmin.

- similarly, it wasn't right that specifically those two fields were
  overwritten when invalidated - as that was done, I suspect the changes might
  get lost on a restart...

- As mentioned previously, I did not like all the functions in slot.h, nor
  their naming. Not yet quite finished with that, but a good bit further

- There were a lot of unrelated changes, e.g. removing comments like
 * NB - this runs as part of checkpoint, so avoid raising errors if possible.

- I still don't like the order of the patches, fixing the walsender patches
  after introducing support for logical decoding on standby. Reordered.

- I don't think invalidation of logical slots is checked everywhere it
  should be, e.g. in pg_logical_replication_slot_advance()

- I didn't like much that InvalidatePossiblyObsoleteSlot() switched between
  kill() and SendProcSignal() based on the "conflict". There very well could
  be reasons to use InvalidatePossiblyObsoleteSlot() with an xid from outside
  of the startup process in the future. Instead I made it differentiate based
  on MyBackendType == B_STARTUP.


I also:

Added new patch that replaces invalidated_at with a new enum, 'invalidated',
listing the reason for the invalidation. I added a check for !invalidated to
ReplicationSlotsComputeRequiredLSN() etc.

Added new patch moving checks for invalid logical slots into
CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes
no sense. As far as I can tell the old message in
pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
"never previously reserved WAL"

Split "Handle logical slot conflicts on standby." into two. I'm not sure that
should stay that way, but it made it easier to hack on
InvalidateObsoleteReplicationSlots.


Todo:
- write a test that invalidated logical slots stay invalidated across a restart
- write a test that invalidated logical slots do not lead to retaining WAL
- Further evolve the API of InvalidateObsoleteReplicationSlots()
  - pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
  - rename xid to snapshotConflictHorizon, that'd be more in line with the
ResolveRecoveryConflictWithSnapshot and easier to understand, I think

- The test could stand a bit of cleanup and consolidation
  - No need to start 4 psql processes to do 4 updates, just do it in one
safe_psql()
  - the sequence of drop_logical_slots(), create_logical_slots(),
change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
repeated quite a few times
  - the stats queries checking for specific conflict counts, including
preceding tests, is pretty painful. I suggest to reset the stats at the
end of the test instead (likely also do the drop_logical_slot() there).
  - it's hard to correlate postgres log and the tap test, because the slots
are named the same across all tests. Perhaps they could have a per-test
prefix?
  - numbering tests is a PITA, I had to renumber the later ones, when adding a
test for shared catalog tables


My attached version does include your v62-63 incremental changes.

Greetings,

Andres Freund
>From 1e5461e0019678a92192b0dd5d9bf3f7105f504d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 6 Apr 2023 20:00:07 -0700
Subject: [PATCH va65 1/9] replication slots: replace invalidated_at LSN with
 an enum

---
 src/include/replication/slot.h  | 15 +--
 src/backend/replication/slot.c  | 21 ++---
 src/backend/replication/slotfuncs.c |  8 +++-
 3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdfe..793f0701b88 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency
 	RS_TEMPORARY
 } ReplicationSlotPersistency;
 
+/*
+ * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the
+ * 'invalidated' field is se

Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2023-04-07 Thread Denis Laxalde

The patch set does not apply any more.

I tried to rebase locally; even leaving out patch 1 ("libpq: Run pgindent
after a9e9a9f32b3"), patch 4 ("Start using new libpq cancel APIs") is
harder to resolve following 983ec23007b (I suppose).


Apart from that, the implementation in v19 sounds good to me, and seems
worthwhile. FWIW, as said before, I also implemented it in Psycopg as a
sort of end-to-end validation.





Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-06 18:15:14 -0700, Andres Freund wrote:
> I think it might be worth having a C test for some of the bufmgr.c API. Things
> like testing that retrying a failed relation extension works the second time
> round.

A few hours after this I hit a stupid copy-pasto (21d7c05a5cf) that would
hopefully have been uncovered by such a test...

I guess we could even test this specific instance without a more complicated
framework.  Create table with some data, rename the file, checkpoint - should
fail, rename back, checkpoint - should succeed.

It's much harder to exercise the error paths inside the backend extending the
relation unfortunately, because we require the file to be opened rw before
doing much. And once the FD is open, removing the permissions doesn't help.
The least complicated approach I can see is creating directory quotas, but
that's quite file system specific...

Greetings,

Andres Freund




RE: Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c

2023-04-07 Thread wangw.f...@fujitsu.com
On Tue, Apr 4, 2023 at 11:48 PM Tom Lane  wrote:
> Nathan Bossart  writes:
> > On Wed, Feb 22, 2023 at 12:40:07PM +, wangw.f...@fujitsu.com wrote:
> >> After some rethinking, I think users can easily get the exact value
> >> according to the exact formula, and I think using the accurate formula
> >> can help users adjust max_locks_per_transaction or
> >> max_predicate_locks_per_transaction if needed. So, I used the exact
> >> formulas in the attached v2 patch.
> 
> > IMHO this is too verbose.
> 
> Yeah, it's impossibly verbose.  Even the current wording does not fit
> nicely in pg_settings output.
> 
> > Perhaps it could be simplified to something like
> > The shared lock table is sized on the assumption that at most
> > max_locks_per_transaction objects per eligible process or prepared
> > transaction will need to be locked at any one time.
> 
> I like the "per eligible process" wording, at least for guc_tables.c;
> or maybe it could be "per server process"?  That would be more
> accurate and not much longer than what we have now.
> 
> I've got mixed emotions about trying to put the exact formulas into
> the SGML docs either.  Space isn't such a constraint there, but I
> think the info would soon go out of date (indeed, I think the existing
> wording was once exactly accurate), and I'm not sure it's worth trying
> to maintain it precisely.

Thanks both for sharing your opinions.
I agree that verbose descriptions make maintenance difficult.
For consistency, I unified the formulas in guc_tables.c and pg-doc into the
same suggested short formula. Attached is the new patch.

> One reason that I'm not very excited about this is that in fact the
> formula seen in the source code is not exact either; it's a lower
> bound for how much space will be available.  That's because we throw
> in 100K slop at the bottom of the shmem sizing calculation, and a
> large chunk of that remains available to be eaten by the lock table
> if necessary.

Thanks for sharing this.
Since no one has reported related issues, I'm also fine with closing this
entry if this modification is not necessary.
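As an illustration of the sizing assumption being discussed, here is a
simplified sketch (illustrative only; the real shared-memory sizing also
folds in roughly 100K of slop, as noted above, so this is a lower bound):

```python
def lock_table_entries(max_locks_per_xact, max_backends, max_prepared_xacts):
    # The shared lock table is sized assuming at most
    # max_locks_per_transaction locks per eligible process or prepared
    # transaction; the effective limit is a bit higher because leftover
    # shmem slop can also be consumed by the lock table.
    return max_locks_per_xact * (max_backends + max_prepared_xacts)

# Example: default 64 locks/xact, 100 backends, no prepared transactions.
assert lock_table_entries(64, 100, 0) == 6400
```

This is the "per eligible process or prepared transaction" wording restated
as arithmetic, not the exact formula from the source.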

Regards,
Wang Wei


v3-0001-Fix-the-description-of-shared-lock-table-size-and.patch
Description:  v3-0001-Fix-the-description-of-shared-lock-table-size-and.patch


Re: CREATE SUBSCRIPTION -- add missing tab-completes

2023-04-07 Thread Amit Kapila
On Fri, Apr 7, 2023 at 1:12 PM Masahiko Sawada  wrote:
>
> On Fri, Apr 7, 2023 at 2:28 PM Amit Kapila  wrote:
> >
> > On Wed, Apr 5, 2023 at 5:58 AM Peter Smith  wrote:
> > >
> >
> > LGTM, so pushed. BTW, while looking at this, I noticed that the newly
> > added options "password_required" and "run_as_owner" have their
> > datatype incorrectly mentioned as string in the docs. It should be
> > boolean.
>
> +1
>
> > I think "password_required" belongs to the first section of the docs,
> > which says: "The following parameters control what happens during
> > subscription creation".
>
> But the documentation of ALTER SUBSCRIPTION says:
>
> The parameters that can be altered are slot_name, synchronous_commit,
> binary, streaming, disable_on_error, password_required, run_as_owner,
> and origin. Only a superuser can set password_required = false.
>

By the above, do you intend to say that all the parameters that can be
altered are in the second list? If so, slot_name belongs to the first
category.

> ISTM that both password_required and run_as_owner are parameters to
> control the subscription's behavior, like disable_on_error and
> streaming. So it looks good to me that password_required belongs to
> the second section.
>

Do you mean that because 'password_required' is used each time we make
a connection to a publisher during replication, it should be in the
second category? If so, slot_name is also used each time replication
starts.

BTW, do we need to check one or both of these parameters in
maybe_reread_subscription(), where we "Exit if any parameter that
affects the remote connection was changed"?

-- 
With Regards,
Amit Kapila.




RE: Partial aggregates pushdown

2023-04-07 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Momjian,

> First, my apologies for not addressing this sooner.  I was so focused on my
> own tasks that I didn't realize this very important patch was not getting
> attention.  I will try my best to get it into PG 17.
Thank you very much for your comments.
I will improve this patch for PG 17.
I believe that this patch will help us use PostgreSQL's built-in sharding
for OLAP.

> What amazes me is that you didn't need to create _any_ actual aggregate
> functions.  Rather, you just needed to hook existing functions into the
> aggregate tables for partial FDW execution.
Yes. This patch enables partial aggregate pushdown using only existing
functions that belong to existing aggregate functions and are needed by
parallel query (such as the state transition function and the
serialization function).
It needs neither new kinds of functions attached to aggregate functions
nor new functions added to existing aggregate functions.

> I suggest we remove the version check requirement --- instead just document
> that the FDW Postgres version should be the same or newer than the calling
> Postgres server --- that way, we can assume that whatever is in the system
> catalogs of the caller is in the receiving side.  
Thanks for the comment. I will modify this patch according to your comment.

> We should add a GUC to turn off
> this optimization for cases where the FDW Postgres version is older than the
> caller.  This handles case 1-2.
Thanks for the advice here too.
I thought it would be more appropriate to add a foreign server option to
postgres_fdw rather than adding a GUC.
Would you mind if I ask what you think about that?

> > 2. Automation of creating the definition of partialaggfuncs
> > In development of v17, I manually created the definitions of
> > partialaggfuncs for avg, min, max, sum, and count.
> > I am concerned that this may be undesirable.
> > So I am thinking that v17 should be modified to automate creating
> > definition of partialaggfuncs for all built-in aggregate functions.
> 
> Are there any other builtin functions that need this?  I think we can just
> provide documention for extensions on how to do this.
For practical purposes, it is sufficient
if partial aggregates for the above functions can be pushed down.
I think you are right; it would be sufficient to document how to achieve
partial aggregate pushdown for other built-in functions.

> > 3. Documentation
> > I need to add an explanation of partialaggfunc to the documentation of
> > postgres_fdw and other places.
> 
> I can help with that once we decide on the above.
Thank you. In the next version of this patch, I will add documentation on
postgres_fdw and other places.

> I think 'partialaggfn' should be named 'aggpartialfn' to match other
> columns in pg_aggregate.
Thanks for the comment. I will modify this patch according to your comment.

> For case 3, I don't even know how much pushdown those do of _any_
> aggregates to non-PG servers, let alone parallel FDW ones.  Does anyone
> know the details?
To allow partial aggregate pushdown for non-PG FDWs,
I think we need to add pushdown logic to each such FDW, per function.
For example, for avg we need to add the rewrite avg() -> sum()/count().
To allow parallel partial aggregate by non-PG FDWs,
I think we need to add FDW Routines for Asynchronous Execution to their FDWs[1].

> I am confused by these changes to pg_aggegate:
> 
> +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
> +  aggfinalfn => 'int8_avg_serialize', aggcombinefn =>
> +'int8_avg_combine',
> +  aggserialfn => 'int8_avg_serialize', aggdeserialfn =>
> +'int8_avg_deserialize',
> +  aggtranstype => 'internal', aggtransspace => '48' },
> 
> ...
> 
> +{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
> +  aggfinalfn => 'numeric_avg_serialize', aggcombinefn =>
> +'numeric_avg_combine',
> +  aggserialfn => 'numeric_avg_serialize',
> +  aggdeserialfn => 'numeric_avg_deserialize',
> +  aggtranstype => 'internal', aggtransspace => '128' },
> 
> Why are these marked as 'sum' but use 'avg' functions?
The reason is that sum(int8)/sum(numeric) share some functions with
avg(int8)/avg(numeric), and sum_p_int8 is the aggpartialfn of sum(int8)
while sum_p_numeric is the aggpartialfn of sum(numeric).

--Part of avg(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
--

--Part of sum(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'sum(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_sum', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'i

Re: Initial Schema Sync for Logical Replication

2023-04-07 Thread Amit Kapila
On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada  wrote:
>
> On Thu, Mar 30, 2023 at 10:11 PM Masahiko Sawada  
> wrote:
> >
> > On Thu, Mar 30, 2023 at 12:18 AM Masahiko Sawada  
> > wrote:
> > >
> > >
> > > How can we postpone creating the pg_subscription_rel entries until the
> > > tablesync worker starts and does the schema sync? I think that since
> > > pg_subscription_rel entry needs the table OID, we need either to do
> > > the schema sync before creating the entry (i.e, during CREATE
> > > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The
> > > apply worker needs the information of tables to sync in order to
> > > launch the tablesync workers, but it needs to create the table schema
> > > to get that information.
> >
> > For the above reason, I think that step 6 of the initial proposal won't 
> > work.
> >
> > If we can have the tablesync worker create an entry of
> > pg_subscription_rel after creating the table, it may give us the
> > flexibility to perform the initial sync. One idea is that we add a
> > relname field to pg_subscription_rel so that we can create entries
> > with relname instead of OID if the table is not created yet. Once the
> > table is created, we clear the relname field and set the OID of the
> > table instead. It's not an ideal solution but we might make it simpler
> > later.
>
> While writing a PoC patch, I found some difficulties in this idea.
> First, I tried to add schemaname+relname to pg_subscription_rel but I
> could not define the primary key of pg_subscription_rel. The primary
> key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
> Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
> also doesn't work.
>

Can we think of having a separate catalog table say
pg_subscription_remote_rel for this? You can have srsubid,
remote_schema_name, remote_rel_name, etc. We may need some other state
to be maintained during the initial schema sync where this table can
be used. Basically, this can be used to maintain the state till the
initial schema sync is complete because we can create a relation entry
in pg_subscription_rel only after the initial schema sync is
complete.

> So I tried another idea: that we generate a new OID
> for srrelid and the tablesync worker will replace it with the new
> table's OID once it creates the table. However, since we use srrelid
> in replication slot names, changing srrelid during the initial
> schema+data sync is not straightforward (please note that the slot is
> created by the tablesync worker but is removed by the apply worker).
> Using relname in slot name instead of srrelid is not a good idea since
> it requires all pg_subscription_rel entries have relname, and slot
> names could be duplicated, for example, when the relname is very long
> and we cut it.
>
> I'm trying to consider the idea from another angle: the apply worker
> fetches the table list and passes the relname to the tablesync worker.
> But a problem of this approach is that the table list is not
> persisted. If the apply worker restarts during the initial table sync,
> it could not get the same list as before.
>

Agreed, this has some drawbacks. We can try to explore this if the
above idea of the new catalog table doesn't solve this problem.


-- 
With Regards,
Amit Kapila.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-07 Thread Hayato Kuroda (Fujitsu)
Dear Julien,

Thank you for giving comments!

> As I mentioned in my original thread, I'm not very familiar with that code, 
> but
> I'm a bit worried about "all the changes generated on publisher must be send
> and applied".  Is that a hard requirement for the feature to work reliably?

I think the requirement is needed because the existing WAL on the old node
cannot be transported to the new instance. The WAL hole from confirmed_flush
to the current position could not be filled by the newer instance.

> If
> yes, how does this work if some subscriber node isn't connected when the
> publisher node is stopped?  I guess you could add a check in pg_upgrade to 
> make
> sure that all logical slot are indeed caught up and fail if that's not the 
> case
> rather than assuming that a clean shutdown implies it.  It would be good to
> cover that in the TAP test, and also cover some corner cases, like any new row
> added on the publisher node after the pg_upgrade but before the subscriber is
> reconnected is also replicated as expected.

Hmm, good point. The current patch cannot handle that case because walsenders
for such slots do not exist. I have tested your approach; however, I found
that a CHECKPOINT_SHUTDOWN record was generated twice when the publisher was
shut down and restarted. This meant that the confirmed_lsn of slots always
lagged behind the WAL insert location, and the upgrade failed every time.
I do not have a good idea to solve it yet... Does anyone have one?
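A caught-up check on the pg_upgrade side could look roughly like this
sketch (illustrative Python; LSNs modeled as plain integers, names invented):

```python
def lagging_slots(slots, wal_end_lsn):
    # Old-cluster WAL is not carried over, so any slot whose
    # confirmed_flush is behind the end of WAL would leave an
    # untransportable "hole"; such slots should make the check fail.
    return [name for name, confirmed_flush in slots
            if confirmed_flush < wal_end_lsn]

slots = [("sub1", 0x1000), ("sub2", 0x0F00)]
assert lagging_slots(slots, 0x1000) == ["sub2"]  # sub2 blocks the upgrade
assert lagging_slots(slots, 0x0F00) == []
```

The open question above is precisely what `wal_end_lsn` should be compared
against, given the extra shutdown-checkpoint records observed.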

> Agreed, but then shouldn't the option be named "--logical-slots-only" or
> something like that, same for all internal function names?

Seems right. Will be fixed in the next version. Maybe
"--logical-replication-slots-only" will be used, per Peter's suggestion [1].

[1]: 
https://www.postgresql.org/message-id/CAHut%2BPvpBsyxj9SrB1ZZ9gP7r1AA5QoTYjpzMcVSjQO2xQy7aw%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-04-07 Thread John Naylor
On Thu, Feb 16, 2023 at 11:44 PM Andres Freund  wrote:
>
> We really ought to replace the tid bitmap used for bitmap heap scans. The
> hashtable we use is a pretty awful data structure for it. And that's not
> filled in-order, for example.

I spent some time studying tidbitmap.c, and not only does it make sense to
use a radix tree there, but since it has more complex behavior and stricter
runtime requirements, it should really be the thing driving the design and
tradeoffs, not vacuum:

- With lazy expansion and single-value leaves, the root of a radix tree can
point to a single leaf. That might get rid of the need to track TBMStatus,
since setting a single-leaf tree should be cheap.

- Fixed-size PagetableEntry's are pretty large, but the tid compression
scheme used in this thread (in addition to being complex) is not a great
fit for tidbitmap because it makes it more difficult to track per-block
metadata (see also next point). With the "combined pointer-value slots"
technique, if a page's max tid offset is 63 or less, the offsets can be
stored directly in the pointer for the exact case. The lowest bit can tag
to indicate a pointer to a single-value leaf. That would complicate
operations like union/intersection and tracking "needs recheck", but it
would reduce memory use and node-traversal in common cases.
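A sketch of that combined pointer-value encoding (illustrative Python; this
bit layout is one possible choice, not a settled design): offsets 1..63
occupy bits 1..63 of a 64-bit slot, and bit 0 — always clear in an aligned
pointer — tags whether the slot instead points to a single-value leaf.

```python
LEAF_TAG = 1  # bit 0 set -> slot holds a tagged pointer to a leaf

def embed_offsets(offsets):
    # Store a page's tid offsets directly in one 64-bit slot when the
    # max offset fits; bit 0 stays clear to distinguish this form from
    # a tagged leaf pointer.
    assert all(1 <= off <= 63 for off in offsets)
    slot = 0
    for off in offsets:
        slot |= 1 << off
    return slot

def is_leaf_pointer(slot):
    return (slot & LEAF_TAG) != 0

def extract_offsets(slot):
    assert not is_leaf_pointer(slot)
    return [off for off in range(1, 64) if (slot >> off) & 1]

slot = embed_offsets([1, 7, 63])
assert not is_leaf_pointer(slot)
assert extract_offsets(slot) == [1, 7, 63]
```

As noted, union/intersection and "needs recheck" tracking would have to
handle both slot forms, which is the complication being traded off here.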

- Managing lossy storage. With pure blocknumber keys, replacing exact
storage for a range of 256 pages amounts to replacing a last-level node
with a single leaf containing one lossy PagetableEntry. The leader could
iterate over the nodes, and rank the last-level nodes by how much storage
they (possibly with leaf children) are using, and come up with an optimal
lossy-conversion plan.
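The lossy-conversion idea with pure block-number keys can be sketched like
this (a toy model, not tidbitmap code; the 256-page span mirrors a
last-level node's fanout):

```python
CHUNK_SHIFT = 8  # a last-level node spans 256 consecutive block numbers

def lossify(exact_pages):
    # Replacing a last-level node with one lossy entry amounts to
    # keeping only the 256-page chunk number (block >> 8); every page
    # in a lossy chunk then requires recheck at scan time.
    return sorted({blk >> CHUNK_SHIFT for blk in exact_pages})

def lossy_contains(chunks, blk):
    return (blk >> CHUNK_SHIFT) in chunks

chunks = lossify([5, 6, 300])
assert chunks == [0, 1]
assert lossy_contains(chunks, 200)       # swept into chunk 0's lossy range
assert not lossy_contains(chunks, 600)   # chunk 2 was never stored
```

Ranking nodes by exact-storage footprint before converting them, as
described above, would then pick which chunks to lossify first.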

The above would address the points (not including better iteration and
parallel bitmap index scans) raised in

https://www.postgresql.org/message-id/capsanrn5ywsows8ghqwbwajx1selxlntv54biq0z-j_e86f...@mail.gmail.com

Ironically, by targeting a more difficult use case, it's easier since there
is less freedom. There are many ways to beat a binary search, but fewer
good ways to improve bitmap heap scan. I'd like to put aside vacuum for
some time and try killing two birds with one stone, building upon our work
thus far.

Note: I've moved the CF entry to the next CF, and set to waiting on
author for now. Since no action is currently required from Masahiko, I've
added myself as author as well. If tackling bitmap heap scan shows promise,
we could RWF and resurrect at a later time.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-07 Thread Michael Paquier
On Thu, Apr 06, 2023 at 03:14:20PM +, Imseih (AWS), Sami wrote:
>> Could it be worth thinking about a different design where
>> the value incremented and the parameters of
>> pgstat_progress_update_param() are passed through the 'P' message
>> instead?
> 
> I am not sure how this is different than the approach suggested.
> In the current design, the 'P' message is used to pass the
> ParallelvacuumState to parallel_vacuum_update_progress which then
> calls pgstat_progress_update_param.

The arguments of pgstat_progress_update_param() would be given by the
worker directly as components of the 'P' message.  It seems to me that
this approach would have the simplicity to not require the setup of a
shmem area for the extra counters, and there would be no need for a
callback.  Hence, the only thing the code paths of workers would need
to do is to call this routine, then the leaders would increment their
progress when they see a CFI to process the 'P' message.  Also, I
guess that we would only need an interface in backend_progress.c to
increment counters, like pgstat_progress_incr_param(), but usable by
workers.  Like a pgstat_progress_worker_incr_param()?
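A toy model of that flow, with an ordinary queue standing in for the
worker-to-leader 'P' message (the message shape and names are invented for
illustration):

```python
from queue import SimpleQueue

PROGRESS_INDEX_DONE = 2  # stand-in for a progress-array slot number

def run_demo():
    progress = [0] * 4           # the leader's progress counters
    mq = SimpleQueue()           # stand-in for the worker->leader queue

    # Each worker sends the increment arguments in the message itself,
    # so no shared-memory counters or callback are needed.
    for _ in range(3):
        mq.put(("P", PROGRESS_INDEX_DONE, 1))

    # The leader applies the increments when it processes the messages.
    while not mq.empty():
        kind, index, amount = mq.get()
        if kind == "P":
            progress[index] += amount
    return progress

assert run_demo() == [0, 0, 3, 0]
```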
--
Michael




RE: CREATE SUBSCRIPTION -- add missing tab-completes

2023-04-07 Thread Zhijie Hou (Fujitsu)
On Friday, April 7, 2023 5:11 PM Amit Kapila  wrote:
> 
> On Fri, Apr 7, 2023 at 1:12 PM Masahiko Sawada 
> wrote:
> >
> > On Fri, Apr 7, 2023 at 2:28 PM Amit Kapila 
> wrote:
> > >
> > > On Wed, Apr 5, 2023 at 5:58 AM Peter Smith 
> wrote:
> > > >
> > >
> > > LGTM, so pushed. BTW, while looking at this, I noticed that newly
> > > added options "password_required" and "run_as_owner" has incorrectly
> > > mentioned their datatype as a string in the docs. It should be
> > > boolean.
> >
> > +1
> >
> > > I think "password_required" belongs to first section of docs which
> > > says: "The following parameters control what happens during
> > > subscription creation".
> >
> > But the documentation of ALTER SUBSCRIPTION says:
> >
> > The parameters that can be altered are slot_name, synchronous_commit,
> > binary, streaming, disable_on_error, password_required, run_as_owner,
> > and origin. Only a superuser can set password_required = false.
> >
> 
> By the above, do you intend to say that all the parameters that can be altered
> are in the second list? If so, slot_name belongs to the first category.
> 
> > ISTM that both password_required and run_as_owner are parameters to
> > control the subscription's behavior, like disable_on_error and
> > streaming. So it looks good to me that password_required belongs to
> > the second section.
> >
> 
> Do you mean that because 'password_required' is used each time we make a
> connection to a publisher during replication, it should be in the second
> category? If so, slot_name is also used during the start replication each 
> time.
> 
> BTW, do we need to check one or both of these parameters in
> maybe_reread_subscription() where we "Exit if any parameter that affects the
> remote connection was changed."

I think changing run_as_owner doesn't need to be checked, as it only affects
the role used to perform the apply. But it seems password_required needs to
be checked in maybe_reread_subscription(), because we use this parameter for
the connection.

Best Regards,
Hou zj


Re: Should vacuum process config file reload more often

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 08:52, Masahiko Sawada  wrote:
> On Fri, Apr 7, 2023 at 8:08 AM Daniel Gustafsson  wrote:

>> I had another read-through and test-through of this version, and have applied
>> it with some minor changes to comments and whitespace.  Thanks for the quick
>> turnaround times on reviews in this thread!
> 
> Cool!
> 
> Regarding the commit 7d71d3dd08, I have one comment:
> 
> +   /* Only log updates to cost-related variables */
> +   if (vacuum_cost_delay == original_cost_delay &&
> +   vacuum_cost_limit == original_cost_limit)
> +   return;
> 
> IIUC by default, we log not only before starting the vacuum but also
> when changing cost-related variables. Which is good, I think, because
> logging the initial values would also be helpful for investigation.
> However, I think that we don't log the initial vacuum cost values
> depending on the values. For example, if the
> autovacuum_vacuum_cost_delay storage option is set to 0, we don't log
> the initial values. I think that instead of comparing old and new
> values, we can write the log only if
> message_level_is_interesting(DEBUG2) is true. That way, we don't need
> to acquire the lwlock unnecessarily. And the code looks cleaner to me.
> I've attached the patch (use_message_level_is_interesting.patch)

That's a good idea, unless Melanie has conflicting opinions I think we should
go ahead with this.  Avoiding taking a lock here is a good save.

> Also, while testing the autovacuum delay with relopt
> autovacuum_vacuum_cost_delay = 0, I realized that even if we set
> autovacuum_vacuum_cost_delay = 0 to a table, wi_dobalance is set to
> true. wi_dobalance comes from the following expression:
> 
>/*
> * If any of the cost delay parameters has been set individually for
> * this table, disable the balancing algorithm.
> */
>tab->at_dobalance =
>!(avopts && (avopts->vacuum_cost_limit > 0 ||
> avopts->vacuum_cost_delay > 0));
> 
> The initial values of both avopts->vacuum_cost_limit and
> avopts->vacuum_cost_delay are -1. I think we should use ">= 0" instead
> of "> 0". Otherwise, we include the autovacuum worker working on a
> table whose autovacuum_vacuum_cost_delay is 0 to the balancing
> algorithm. Probably this behavior has existed also on back branches
> but I haven't checked it yet.

Interesting, good find.  Looking quickly at the back branches, I think there is
a variant of this for vacuum_cost_limit even there, but it needs more investigation.

--
Daniel Gustafsson





Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-07 Thread Andrew Dunstan


On 2023-03-29 We 07:55, Tom Lane wrote:

Yurii Rashkovskii  writes:

I would like to suggest a patch against master (although it may be worth
backporting it) that makes it possible to listen on any unused port.

I think this is a bad idea, mainly because this:


Instead, with this patch, one can specify `port` as `0` (the "wildcard"
port) and retrieve the assigned port from postmaster.pid

is a horrid way to find out what was picked, and yet there could
be no other.

Our existing design for this sort of thing is to let the testing
framework choose the port, and I don't really see what's wrong
with that approach.  Yes, I know it's theoretically subject to
race conditions, but that hasn't seemed to be a problem in
practice.  It's especially not a problem given that modern
testing practice tends to not open any TCP port at all, just
a Unix socket in a test-private directory, so that port
conflicts are a non-issue.



For TAP tests we have pretty much resolved the port collisions issue for 
TCP ports too. See commit 9b4eafcaf4


Perhaps the OP could adapt that logic to his use case.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Direct I/O

2023-04-07 Thread Thomas Munro
On Wed, Jan 25, 2023 at 8:57 PM Bharath Rupireddy
 wrote:
> Thanks. I have some comments on
> v3-0002-Add-io_direct-setting-developer-only.patch:
>
> 1. I think we don't need to overwrite the io_direct_string in
> check_io_direct so that show_io_direct can be avoided.

Thanks for looking at this, and sorry for the late response.  Yeah, agreed.

> 2. check_io_direct can leak the flags memory - when io_direct is not
> supported or for an invalid list syntax or an invalid option is
> specified.
>
> I have addressed my review comments as a delta patch on top of v3-0002
> and added it here as v1-0001-Review-comments-io_direct-GUC.txt.

Thanks.  Your way is nicer.  I merged your patch and added you as a co-author.

> Some comments on the tests added:
>
> 1. Is there a way to know if Direct IO for WAL and data has been
> picked up programmatically? IOW, can we know if the OS page cache is
> bypassed? I know an external extension pgfincore which can help here,
> but nothing in the core exists AFAICS.

Right, that extension can tell you how many pages are in the kernel
page cache which is quite interesting for this.  I also once hacked up
something primitive to see *which* pages are in kernel cache, so I
could join that against pg_buffercache to measure double buffering,
when I was studying the "smile" shape where pgbench TPS goes down and
then back up again as you increase shared_buffers if the working set
is nearly as big as physical memory (code available in a link from
[1]).

Yeah, I agree it might be nice for human investigators to put
something like that in contrib/pg_buffercache, but I'm not sure you
could rely on it enough for an automated test, though, 'cause it
probably won't work on some file systems and the tests would probably
fail for random transient reasons (for example: some systems won't
kick pages out of kernel cache if they were already there, just
because we decided to open the file with O_DIRECT).  (I got curious
about why mincore() wasn't standardised along with mmap() and all that
jazz; it seems the BSD and later Sun people who invented all those
interfaces didn't think that one was quite good enough[2], but every
(?) Unixoid OS copied it anyway, with variations...  Apparently the
Windows thing is called VirtualQuery()).

> 2. Can we combine these two append_conf to a single statement?
> +$node->append_conf('io_direct', 'data,wal,wal_init');
> +$node->append_conf('shared_buffers', '64kB'); # tiny to force I/O

OK, sure, done.  And also oops, that was completely wrong and not
working the way I had it in that version...

> 3. A nitpick: Can we split these queries multi-line instead of in a single 
> line?
> +$node->safe_psql('postgres', 'begin; create temporary table t2 as
> select 1 as i from generate_series(1, 1); update t2 set i = i;
> insert into t2count select count(*) from t2; commit;');

OK.

> 4. I don't think we need to stop the node before the test ends, no?
> +$node->stop;
> +
> +done_testing();

Sure, but why not?

Otherwise, I rebased, and made a couple more changes:

I found a line of the manual about wal_sync_method that needed to be removed:

-The open_* options also use
O_DIRECT if available.

In fact that sentence didn't correctly document the behaviour in
released branches (wal_level=minimal is also required for that, so
probably very few people ever used it).  I think we should adjust that
misleading sentence in back-branches, separately from this patch set.

I also updated the commit message to highlight the only expected
user-visible change for this, namely the loss of the above incorrectly
documented obscure special case, which is replaced by the less obscure
new setting io_direct=wal, if someone still wants that behaviour.

Also a few minor comment changes.

[1] https://twitter.com/MengTangmu/status/994770040745615361
[2] http://kos.enix.org/pub/gingell8.pdf
From c6e01d506762fb7c11a3fb31d56902fa53ea822b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 13 Dec 2022 16:25:59 +1300
Subject: [PATCH v4 1/3] Introduce PG_IO_ALIGN_SIZE and align all I/O buffers.

In order to be able to use O_DIRECT/FILE_FLAG_NO_BUFFERING on common
systems in a later commit, we need the address and length of user space
buffers to align with the sector size of the storage.  O_DIRECT would
either fail to work or fail to work efficiently without that on various
platforms.  Even without O_DIRECT, aligning on memory pages is known to
improve traditional buffered I/O performance.

The alignment size is set to 4096, which is enough for currently known
systems: it covers traditional 512 byte sectors and modern 4096 byte
sectors, as well as common 4096 byte memory pages.  There is no standard
governing the requirements for O_DIRECT so it's possible we might have
to reconsider this approach or fail to work on some exotic system, but
for now this simplistic approach works and it can be changed at compile
time.

Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap bu

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-07 Thread Hayato Kuroda (Fujitsu)
Dear Julien,

> > Agreed, but then shouldn't the option be named "--logical-slots-only" or
> > something like that, same for all internal function names?
> 
> Seems right. Will be fixed in next version. Maybe
> "--logical-replication-slots-only"
> will be used, per Peter's suggestion [1].

After considering it more, I decided not to include the word "logical" in the option
at this point. This is because we have not decided yet whether we dump physical
replication slots or not. The current restriction exists just because of a lack of
analysis and consideration. If we decide not to do that, then the options will
be renamed accordingly.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Should vacuum process config file reload more often

2023-04-07 Thread Melanie Plageman
On Fri, Apr 7, 2023 at 2:53 AM Masahiko Sawada  wrote:
>
> On Fri, Apr 7, 2023 at 8:08 AM Daniel Gustafsson  wrote:
> >
> > > On 7 Apr 2023, at 00:12, Melanie Plageman  
> > > wrote:
> > >
> > > On Thu, Apr 6, 2023 at 5:45 PM Daniel Gustafsson  wrote:
> > >>
> > >>> On 6 Apr 2023, at 23:06, Melanie Plageman  
> > >>> wrote:
> > >>
> > >>> Autovacuum workers, at the end of VacuumUpdateCosts(), check if cost
> > >>> limit or cost delay have been changed. If they have, they assert that
> > >>> they don't already hold the AutovacuumLock, take it in shared mode, and
> > >>> do the logging.
> > >>
> > >> Another idea would be to copy the values to local temp variables while 
> > >> holding
> > >> the lock, and release the lock before calling elog() to avoid holding 
> > >> the lock
> > >> over potential IO.
> > >
> > > Good idea. I've done this in attached v19.
> > > Also I looked through the docs and everything still looks correct for
> > > balancing algo.
> >
> > I had another read-through and test-through of this version, and have 
> > applied
> > it with some minor changes to comments and whitespace.  Thanks for the quick
> > turnaround times on reviews in this thread!
>
> Cool!
>
> Regarding the commit 7d71d3dd08, I have one comment:
>
> +   /* Only log updates to cost-related variables */
> +   if (vacuum_cost_delay == original_cost_delay &&
> +   vacuum_cost_limit == original_cost_limit)
> +   return;
>
> IIUC by default, we log not only before starting the vacuum but also
> when changing cost-related variables. Which is good, I think, because
> logging the initial values would also be helpful for investigation.
> However, I think that we don't log the initial vacuum cost values
> depending on the values. For example, if the
> autovacuum_vacuum_cost_delay storage option is set to 0, we don't log
> the initial values. I think that instead of comparing old and new
> values, we can write the log only if
> message_level_is_interesting(DEBUG2) is true. That way, we don't need
> to acquire the lwlock unnecessarily. And the code looks cleaner to me.
> I've attached the patch (use_message_level_is_interesting.patch)

Thanks for coming up with the case you thought of with storage param for
cost delay = 0. In that case we wouldn't print the message initially and
we should fix that.

I disagree, however, that we should condition it only on
message_level_is_interesting().

Actually, outside of printing initial values when the autovacuum worker
first starts (before vacuuming all tables), I don't think we should log
these values except when they are being updated. Autovacuum workers
could vacuum tons of small tables and having this print out at least
once per table (which I know is how it is on master) would be
distracting. Also, you could be reloading the config to update some
other GUCs and be oblivious to an ongoing autovacuum and get these
messages printed out, which I would also find distracting.

You will have to stare very hard at the logs to tell if your changes to
vacuum cost delay and limit took effect when you reload config. I think
with our changes to update the values more often, we should take the
opportunity to make this logging more useful by making it happen only
when the values are changed.

I would be open to elevating the log level to DEBUG1 for logging only
updates and, perhaps, having an option if you set log level to DEBUG2,
for example, to always log these values in VacuumUpdateCosts().

I'd even argue that, potentially, having the cost-delay related
parameters printed at the beginning of vacuuming could be interesting to
regular VACUUM as well (even though it doesn't benefit from config
reload while in progress).

To fix the issue you mentioned and ensure the logging is printed when
autovacuum workers start up before vacuuming tables, we could either
initialize vacuum_cost_delay and vacuum_cost_limit to something invalid
that will always be different than what they are set to in
VacuumUpdateCosts() (not sure if this poses a problem for VACUUM using
these values since they are set to the defaults for VACUUM). Or, we
could duplicate this logging message in do_autovacuum().

Finally, one other point about message_level_is_interesting(). I liked
the idea of using it a lot, since log level DEBUG2 will not be the
common case. I thought of it but hesitated because all other users of
message_level_is_interesting() are avoiding some memory allocation or
string copying -- not avoiding taking a lock. Making this conditioned on
log level made me a bit uncomfortable. I can't think of a situation when
it would be a problem, but it felt a bit off.

> Also, while testing the autovacuum delay with relopt
> autovacuum_vacuum_cost_delay = 0, I realized that even if we set
> autovacuum_vacuum_cost_delay = 0 to a table, wi_dobalance is set to
> true. wi_dobalance comes from the following expression:
>
> /*
>  * If any of the cost delay parameters has been set individually for
>

Re: Should vacuum process config file reload more often

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 15:07, Melanie Plageman  wrote:
> On Fri, Apr 7, 2023 at 2:53 AM Masahiko Sawada  wrote:

>> +   /* Only log updates to cost-related variables */
>> +   if (vacuum_cost_delay == original_cost_delay &&
>> +   vacuum_cost_limit == original_cost_limit)
>> +   return;
>> 
>> IIUC by default, we log not only before starting the vacuum but also
>> when changing cost-related variables. Which is good, I think, because
>> logging the initial values would also be helpful for investigation.
>> However, I think that we don't log the initial vacuum cost values
>> depending on the values. For example, if the
>> autovacuum_vacuum_cost_delay storage option is set to 0, we don't log
>> the initial values. I think that instead of comparing old and new
>> values, we can write the log only if
>> message_level_is_interesting(DEBUG2) is true. That way, we don't need
>> to acquire the lwlock unnecessarily. And the code looks cleaner to me.
>> I've attached the patch (use_message_level_is_interesting.patch)
> 
> Thanks for coming up with the case you thought of with storage param for
> cost delay = 0. In that case we wouldn't print the message initially and
> we should fix that.
> 
> I disagree, however, that we should condition it only on
> message_level_is_interesting().

I think we should keep the logging frequency as committed, but condition taking
the lock on message_level_is_interesting().

> Actually, outside of printing initial values when the autovacuum worker
> first starts (before vacuuming all tables), I don't think we should log
> these values except when they are being updated. Autovacuum workers
> could vacuum tons of small tables and having this print out at least
> once per table (which I know is how it is on master) would be
> distracting. Also, you could be reloading the config to update some
> other GUCs and be oblivious to an ongoing autovacuum and get these
> messages printed out, which I would also find distracting.
> 
> You will have to stare very hard at the logs to tell if your changes to
> vacuum cost delay and limit took effect when you reload config. I think
> with our changes to update the values more often, we should take the
> opportunity to make this logging more useful by making it happen only
> when the values are changed.
> 
> I would be open to elevating the log level to DEBUG1 for logging only
> updates and, perhaps, having an option if you set log level to DEBUG2,
> for example, to always log these values in VacuumUpdateCosts().
> 
> I'd even argue that, potentially, having the cost-delay related
> parameters printed at the beginning of vacuuming could be interesting to
> regular VACUUM as well (even though it doesn't benefit from config
> reload while in progress).
> 
> To fix the issue you mentioned and ensure the logging is printed when
> autovacuum workers start up before vacuuming tables, we could either
> initialize vacuum_cost_delay and vacuum_cost_limit to something invalid
> that will always be different than what they are set to in
> VacuumUpdateCosts() (not sure if this poses a problem for VACUUM using
> these values since they are set to the defaults for VACUUM). Or, we
> could duplicate this logging message in do_autovacuum().

Duplicating logging, maybe with a slightly tailored message, seem the least
bad option.

> Finally, one other point about message_level_is_interesting(). I liked
> the idea of using it a lot, since log level DEBUG2 will not be the
> common case. I thought of it but hesitated because all other users of
> message_level_is_interesting() are avoiding some memory allocation or
> string copying -- not avoiding taking a lock. Making this conditioned on
> log level made me a bit uncomfortable. I can't think of a situation when
> it would be a problem, but it felt a bit off.

Considering how uncommon DEBUG2 will be in production, I think conditioning
taking a lock on it makes sense.

>> Also, while testing the autovacuum delay with relopt
>> autovacuum_vacuum_cost_delay = 0, I realized that even if we set
>> autovacuum_vacuum_cost_delay = 0 to a table, wi_dobalance is set to
>> true. wi_dobalance comes from the following expression:
>> 
>>/*
>> * If any of the cost delay parameters has been set individually for
>> * this table, disable the balancing algorithm.
>> */
>>tab->at_dobalance =
>>!(avopts && (avopts->vacuum_cost_limit > 0 ||
>> avopts->vacuum_cost_delay > 0));
>> 
>> The initial values of both avopts->vacuum_cost_limit and
>> avopts->vacuum_cost_delay are -1. I think we should use ">= 0" instead
>> of "> 0". Otherwise, we include the autovacuum worker working on a
>> table whose autovacuum_vacuum_cost_delay is 0 to the balancing
>> algorithm. Probably this behavior has existed also on back branches
>> but I haven't checked it yet.
> 
> Thank you for catching this. Indeed this exists in master since
> 1021bd6a89b which was backpatched. I c

Re: CREATE SUBSCRIPTION -- add missing tab-completes

2023-04-07 Thread Masahiko Sawada
On Fri, Apr 7, 2023 at 6:10 PM Amit Kapila  wrote:
>
> On Fri, Apr 7, 2023 at 1:12 PM Masahiko Sawada  wrote:
> >
> > On Fri, Apr 7, 2023 at 2:28 PM Amit Kapila  wrote:
> > >
> > > On Wed, Apr 5, 2023 at 5:58 AM Peter Smith  wrote:
> > > >
> > >
> > > LGTM, so pushed. BTW, while looking at this, I noticed that newly
> > > added options "password_required" and "run_as_owner" has incorrectly
> > > mentioned their datatype as a string in the docs. It should be
> > > boolean.
> >
> > +1
> >
> > > I think "password_required" belongs to first section of docs
> > > which says: "The following parameters control what happens during
> > > subscription creation".
> >
> > But the documentation of ALTER SUBSCRIPTION says:
> >
> > The parameters that can be altered are slot_name, synchronous_commit,
> > binary, streaming, disable_on_error, password_required, run_as_owner,
> > and origin. Only a superuser can set password_required = false.
> >
>
> By the above, do you intend to say that all the parameters that can be
> altered are in the second list? If so, slot_name belongs to the first
> category.
>
> > ISTM that both password_required and run_as_owner are parameters to
> > control the subscription's behavior, like disable_on_error and
> > streaming. So it looks good to me that password_required belongs to
> > the second section.
> >
>
> Do you mean that because 'password_required' is used each time we make
> a connection to a publisher during replication, it should be in the
> second category? If so, slot_name is also used during the start
> replication each time.

I think that parameters used by the backend process when performing
CREATE SUBSCRIPTION belong to the first category. And other parameters
used by apply workers and tablesync workers belong to the second
category. Since slot_name is used by both I'm not sure it should be in
the second category, but password_required seems to be used by only
apply workers and tablesync workers, so it should be in the second
category.

>
> BTW, do we need to check one or both of these parameters in
> maybe_reread_subscription() where we "Exit if any parameter that
> affects the remote connection was changed."

As for run_as_owner, since we can dynamically switch the behavior I
think we don't need to reconnect. I'm not really sure about
password_required. From the implementation point of view, we don't
need to reconnect. Even if password_required is changed from false to
true, the apply worker already has the established connection. If it's
changed from true to false, we might not want to reconnect. I think we
need to consider it from the security point of view while checking the
motivation that password_required was introduced. So probably it's
better to discuss it on the original thread.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 5 Apr 2023, at 23:44, Daniel Gustafsson  wrote:
> 
> Unless there are objections I plan to get this in before the freeze, in order
> to have better interactive tests starting with 16.  With a little bit of
> documentation polish I think it's ready.

When looking at the CFBot failure on Linux and Windows (not on macOS) I noticed
that it was down to the instance lacking IO::Pty.

[19:59:12.609](1.606s) ok 1 - scram_iterations in server side ROLE
Can't locate IO/Pty.pm in @INC (you may need to install the IO::Pty module) 
(@INC contains: /tmp/cirrus-ci-build/src/test/perl 
/tmp/cirrus-ci-build/src/test/authentication /etc/perl 
/usr/local/lib/i386-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 
/usr/lib/i386-linux-gnu/perl5/5.32 /usr/share/perl5 
/usr/lib/i386-linux-gnu/perl/5.32 /usr/share/perl/5.32 
/usr/local/lib/site_perl) at /usr/share/perl5/IPC/Run.pm line 1828.

Skimming the VM creation [0] it seems like it should be though?  On macOS the
module is installed inside Cirrus and the test runs fine.

I don't think we should go ahead with a patch that refactors interactive_psql
only to SKIP over it in CI (which is what the tab_completion test does now), so
let's wait until we have that sorted before going ahead.

--
Daniel Gustafsson

[0] 
https://github.com/anarazel/pg-vm-images/blob/main/scripts/linux_debian_install_deps.sh



Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2023-04-07 Thread Thomas Munro
On Tue, Apr 4, 2023 at 1:25 AM Tom Lane  wrote:
> Sorry for not looking at this sooner.  I am okay with the regex
> changes proposed in v5-0001 through 0003, but I think you need to
> take another mopup pass there.  Some specific complaints:
> * header comment for pg_regprefix has been falsified (s/malloc/palloc/)

Thanks.  Fixed.

> * in spell.c, regex_affix_deletion_callback could be got rid of

Done in a separate patch.  I wondered if regex_t should be included
directly as a member of that union inside AFFIX, but decided it should
keep using a pointer (just without the extra wrapper struct).  A
direct member would make the AFFIX slightly larger, and it would
require us to assume that regex_t is movable, which it probably is in
practice, but that isn't written down anywhere and it seemed strange to
rely on it.

> * check other callers of pg_regerror for now-useless CHECK_FOR_INTERRUPTS

I found three of these to remove (jsonpath_gram.y, varlena.c, test_regex.c).

> In general there's a lot of comments referring to regexes being malloc'd.

There is also some remaining direct use of malloc() in
regc_pg_locale.c because "we mustn't lose control on out-of-memory".
At that time (2012) there was no MCXT_NO_OOM (2015), so we could
presumably bring that cache into an observable MemoryContext now too.
I haven't written a patch for that, though, because it's not in the
way of my recovery conflict mission.

> I'm disinclined to change the ones inside the engine, because as far as
> it knows it is still using malloc, but maybe we should work harder on
> our own comments.  In particular, it'd likely be useful to have something
> somewhere pointing out that pg_regfree is only needed when you can't
> get rid of the regex by context cleanup.  Maybe write a short section
> about memory management in backend/regex/README?

I'll try to write something for the README tomorrow.  Here's a new
version of the code changes.

> I've not really looked at 0004.

I'm hoping to get just the regex changes in ASAP, and then take a
little bit longer on the recovery conflict patch itself (v6-0005) on
the basis that it's bugfix work and not subject to the feature freeze.
From a21a43bf5b1ba073abb3238968b9f8d13b1b318a Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 4 Jan 2023 14:15:40 +1300
Subject: [PATCH v6 1/5] Use MemoryContext API for regex memory management.

Previously, regex_t objects' memory was managed with malloc() and free()
directly.  Switch to palloc()-based memory management instead.
Advantages:

 * memory used by cached regexes is now visible with MemoryContext
   observability tools

 * cleanup can be done automatically in certain failure modes
   (something that later commits will take advantage of)

 * cleanup can be done in bulk

On the downside, there may be more fragmentation (wasted memory) due to
per-regex MemoryContext objects.  This is a problem shared with other
cached objects in PostgreSQL and can probably be improved with later
tuning.

Thanks to Noah Misch for suggesting this general approach, which
unblocks later work on interrupts.

Reviewed-by: Tom Lane 
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/regex/regprefix.c  |  2 +-
 src/backend/utils/adt/regexp.c | 57 --
 src/include/regex/regcustom.h  |  6 ++--
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/src/backend/regex/regprefix.c b/src/backend/regex/regprefix.c
index 221f02da63..c09b2a9778 100644
--- a/src/backend/regex/regprefix.c
+++ b/src/backend/regex/regprefix.c
@@ -32,7 +32,7 @@ static int	findprefix(struct cnfa *cnfa, struct colormap *cm,
  *	REG_EXACT: all strings satisfying the regex must match the same string
  *	or a REG_XXX error code
  *
- * In the non-failure cases, *string is set to a malloc'd string containing
+ * In the non-failure cases, *string is set to a palloc'd string containing
  * the common prefix or exact value, of length *slength (measured in chrs
  * not bytes!).
  *
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 810dcb85b6..81400ba150 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -96,9 +96,13 @@ typedef struct regexp_matches_ctx
 #define MAX_CACHED_RES	32
 #endif
 
+/* A parent memory context for regular expressions. */
+static MemoryContext RegexpCacheMemoryContext;
+
 /* this structure describes one cached regular expression */
 typedef struct cached_re_str
 {
+	MemoryContext cre_context;	/* memory context for this regexp */
 	char	   *cre_pat;		/* original RE (not null terminated!) */
 	int			cre_pat_len;	/* length of original RE, in bytes */
 	int			cre_flags;		/* compile flags: extended,icase etc */
@@ -145,6 +149,7 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	int			regcomp_result;
 	cached_re_str re_temp;
 	char		errMsg[100];
+	MemoryContext oldcontext;
 
 	/*
 	 * Look for a ma

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-07 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing briefly. PSA new version.
If you can I want to ask the opinion about the checking by pg_upgrade [1].

> ==
> General
> 
> 1.
> Since these two new options are made to work together, I think the
> names should be more similar. e.g.
> 
> pg_dump: "--slot_only" --> "--replication-slots-only"
> pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"
> 
> help/comments/commit-message all should change accordingly, but I did
> not give separate review comments for each of these.

OK, I renamed. By the way, what do you think about the suggestion raised by Julien?
Currently I did not address it, because the restriction is caused just by a lack of
analysis, and the change may not be agreed on in the community.
Or should we keep the name anyway?

> 2.
> I felt there maybe should be some pg_dump test cases for that new
> option, rather than the current patch where it only seems to be
> testing the new pg_dump option via the pg_upgrade TAP tests.

Hmm, I assumed that the option should be used only for upgrading, so I'm not sure
it needs to be tested by pg_dump alone.

> Commit message
> 
> 3.
> This commit introduces a new option called "--include-replication-slot".
> This allows nodes with logical replication slots to be upgraded. The commit 
> can
> be divided into two parts: one for pg_dump and another for pg_upgrade.
> 
> ~
> 
> "new option" --> "new pg_upgrade" option

Fixed.

> 4.
> For pg_upgrade, when '--include-replication-slot' is specified, it
> executes pg_dump
> with added option and restore from the dump. Apart from restoring
> schema, pg_resetwal
> must not be called after restoring replicaiton slots. This is because
> the command
> discards WAL files and starts from a new segment, even if they are required by
> replication slots. This leads an ERROR: "requested WAL segment XXX has already
> been removed". To avoid this, replication slots are restored at a different 
> time
> than other objects, after running pg_resetwal.
> 
> ~
> 
> 4a.
> "with added option and restore from the dump" --> "with the new
> "--slot-only" option and restores from the dump"

Fixed.

> 4b.
> Typo: /replicaiton/replication/

Fixed.

> 4c
> "leads an ERROR" --> "leads to an ERROR"

Fixed.

> doc/src/sgml/ref/pg_dump.sgml
> 
> 5.
> + 
> +  --slot-only
> +  
> +   
> +Dump only replication slots, neither the schema (data definitions) 
> nor
> +data. Mainly this is used for upgrading nodes.
> +   
> +  
> 
> SUGGESTION
> Dump only replication slots; not the schema (data definitions), nor
> data. This is mainly used when upgrading nodes.

Fixed.

> doc/src/sgml/ref/pgupgrade.sgml
> 
> 6.
> +   
> +Transport replication slots. Currently this can work only for logical
> +slots, and temporary slots are ignored. Note that pg_upgrade does not
> +check the installation of plugins.
> +   
> 
> SUGGESTION
> Upgrade replication slots. Only logical replication slots are
> currently supported, and temporary slots are ignored. Note that...

Fixed.

> src/bin/pg_dump/pg_dump.c
> 
> 7. main
>   {"exclude-table-data-and-children", required_argument, NULL, 14},
> -
> + {"slot-only", no_argument, NULL, 15},
>   {NULL, 0, NULL, 0}
> 
> The blank line is misplaced.

Fixed.

> 8. main
> + case 15: /* dump onlu replication slot(s) */
> + dopt.slot_only = true;
> + dopt.include_everything = false;
> + break;
> 
> typo: /onlu/only/

Fixed.

> 9. main
> + if (dopt.slot_only && dopt.dataOnly)
> + pg_fatal("options --replicatin-slots and -a/--data-only cannot be
> used together");
> + if (dopt.slot_only && dopt.schemaOnly)
> + pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
> used together");
> +
> 
> 9a.
> typo: /replicatin/replication/

Fixed. Additionally, the wrong parameter reference was also corrected.

> 9b.
> I am wondering if these checks are enough. E.g. is "slots-only"
> compatible with "no-publications" ?

I think there are some more things that should be checked, but I'm not sure
about "no-publications". It is possible that non-core logical replication is
being used, and in that case these options do not contradict each other.

> 10. main
> + /*
> + * If dumping replication slots are request, dumping them and skip others.
> + */
> + if (dopt.slot_only)
> + {
> + getRepliactionSlots(fout);
> + goto dump;
> + }
> 
> 10a.
> SUGGESTION
> If dump replication-slots-only was requested, dump only them and skip
> everything else.

Fixed.

> 10b.
> This code seems mutually exclusive to every other option. I'm
> wondering if this code even needs 'collectRoleNames', or should the
> slots option check be moved  above that (and also above the 'Dumping
> LOs' etc...)

I read it again and found that the collected user names are used to check the
ownership of objects. IIUC replication slots are not owned by database users,
so that is not needed. Also, the LOs should not be dumped here. Based on that,
I moved getRepliactionSlots() above them.

> 11. help
> 
> + 

Re: Commitfest 2023-03 starting tomorrow!

2023-04-07 Thread Greg Stark
As announced on this list feature freeze is at 00:00 April 8 AoE.
That's less than 24 hours away. If you need to set your watches to AoE
timezone it's currently:

$ TZ=AOE+12 date
Fri 07 Apr 2023 02:05:50 AM AOE

As we stand we have:

Status summary:
  Needs review:             82
  Waiting on Author:        16
  Ready for Committer:      27
  Committed:               115
  Moved to next CF:         38
  Returned with Feedback:   10
  Rejected:                  9
  Withdrawn:                22
Total: 319.

In less than 24h most of the remaining patches will get rolled forward
to the next CF. The 16 that are Waiting on Author might be RwF
perhaps. The only exceptions would be non-features like Bug Fixes and
cleanup patches that have been intentionally held until the end --
those become Open Issues for the release.

So if we move forward all the remaining patches (so these numbers are
high by about half a dozen) the *next* CF would look like:

Commitfest 2023-07:      Now  April 8
  Needs review:           46      128
  Waiting on Author:      17       33
  Ready for Committer:     3       30
Total:                    66      191

I suppose that's better than the 319 we came into this CF with but
there's 3 months to accumulate more unreviewed patches...

I had hoped to find lots of patches that I could bring the hammer down
on and say there's just no interest in or there's no author still
maintaining. But that wasn't the case. Nearly all the patches still
had actively interested authors and looked like they were legitimately
interesting and worthwhile features that people just haven't had the
time to review or commit.


--
greg




Re: meson documentation build open issues

2023-04-07 Thread Andrew Dunstan


On 2023-04-06 Th 05:11, Peter Eisentraut wrote:

On 05.04.23 16:45, Andres Freund wrote:
I think it's still an issue that "make docs" builds html and man but
"ninja docs" only builds html.  For some reason the wiki page actually
claims that ninja docs builds both, but this does not happen for me.


It used to, but Tom insisted that it should not. I'm afraid that it's not
quite possible to emulate make here. 'make docs' at the toplevel builds
both HTML and manpages. But 'make -C doc/src/sgml' only builds HTML.


Ok, not a topic for this thread then.


5. There doesn't appear to be an equivalent of "make world" and "make
install-world" that includes documentation builds.


This has been addressed with the additional meson auto options.  But it
seems that these options only control building, not installing, so there
is no "install-world" aspect yet.


I'm not following - install-world installs docs if the docs feature is
available, and not if not?


I had expected that if meson setup enables the 'docs' feature, then 
meson compile will build the documentation, which happens, and meson 
install will install it, which does not happen.






"meson compile" doesn't seem to build the docs by default ( see 
), 
and I'd rather it didn't, building the docs is a separate and optional 
step for the buildfarm.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Andrew Dunstan


On 2023-04-07 Fr 09:32, Daniel Gustafsson wrote:

On 5 Apr 2023, at 23:44, Daniel Gustafsson  wrote:

Unless there are objections I plan to get this in before the freeze, in order
to have better interactive tests starting with 16.  With a little bit of
documentation polish I think it's ready.

When looking at the CFBot failure on Linux and Windows (not on macOS) I noticed
that it was down to the instance lacking IO::Pty.

[19:59:12.609](1.606s) ok 1 - scram_iterations in server side ROLE
Can't locate IO/Pty.pm in @INC (you may need to install the IO::Pty module) 
(@INC contains: /tmp/cirrus-ci-build/src/test/perl 
/tmp/cirrus-ci-build/src/test/authentication /etc/perl 
/usr/local/lib/i386-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 
/usr/lib/i386-linux-gnu/perl5/5.32 /usr/share/perl5 
/usr/lib/i386-linux-gnu/perl/5.32 /usr/share/perl/5.32 
/usr/local/lib/site_perl) at /usr/share/perl5/IPC/Run.pm line 1828.

Skimming the VM creation [0] it seems like it should be though?  On macOS the
module is installed inside Cirrus and the test runs fine.

I don't think we should go ahead with a patch that refactors interactive_psql
only to SKIP over it in CI (which is what the tab_completion test does now), so
let's wait until we have that sorted before going ahead.



It should probably be added to config/check_modules.pl if we're going to 
use it, but it seems to be missing for Strawberry Perl and msys/ucrt64 
perl and I'm not sure how easy it will be to add there. It would 
certainly add an installation burden for test instances at the very least.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: cataloguing NOT NULL constraints

2023-04-07 Thread Justin Pryzby
On Fri, Apr 07, 2023 at 04:14:13AM +0200, Alvaro Herrera wrote:
> On 2023-Apr-06, Justin Pryzby wrote:

> > +ERROR:  relation "c" already exists
>
> Do you intend to make an error here ?

These still look like mistakes in the tests.

> Also, I think these table names may be too generic, and conflict with
> other parallel tests, now or in the future.
>
> > +create table d(a int not null, f1 int) inherits(inh_p3, c);
> > +ERROR:  relation "d" already exists

> Sadly, the binary-upgrade mode is a bit of a mess and thus the
> pg_upgrade test is failing.




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 15:32:12 +0200, Daniel Gustafsson wrote:
> > On 5 Apr 2023, at 23:44, Daniel Gustafsson  wrote:
> > 
> > Unless there are objections I plan to get this in before the freeze, in 
> > order
> > to have better interactive tests starting with 16.  With a little bit of
> > documentation polish I think it's ready.
> 
> When looking at the CFBot failure on Linux and Windows (not on macOS) I 
> noticed
> that it was down to the instance lacking IO::Pty.
> 
> [19:59:12.609](1.606s) ok 1 - scram_iterations in server side ROLE
> Can't locate IO/Pty.pm in @INC (you may need to install the IO::Pty module) 
> (@INC contains: /tmp/cirrus-ci-build/src/test/perl 
> /tmp/cirrus-ci-build/src/test/authentication /etc/perl 
> /usr/local/lib/i386-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 
> /usr/lib/i386-linux-gnu/perl5/5.32 /usr/share/perl5 
> /usr/lib/i386-linux-gnu/perl/5.32 /usr/share/perl/5.32 
> /usr/local/lib/site_perl) at /usr/share/perl5/IPC/Run.pm line 1828.
> 
> Skimming the VM creation [0] it seems like it should be though?

Note it just fails on the 32bit build, not the 64bit build. Unfortunately I
don't think debian's multiarch in bullseye supports installing enough of perl
in both 32bit and 64bit.

We can't have a hard dependency on non-default modules like IO::Pty anyway, so
the test needs to skip if it's not available.

On windows IO::Pty can't be installed, IIRC.


> I don't think we should go ahead with a patch that refactors interactive_psql
> only to SKIP over it in CI (which is what the tab_completion test does now), 
> so
> let's wait until we have that sorted before going ahead.

Maybe I am a bit confused, but isn't that just an existing requirement? Why
would we expect this patchset to change what dependencies use of
interactive_psql() has?

Greetings,

Andres Freund




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 10:55:19 -0400, Andrew Dunstan wrote:
> It should probably be added to config/check_modules.pl if we're going to use
> it, but it seems to be missing for Strawberry Perl and msys/ucrt64 perl and
> I'm not sure how easy it will be to add there. It would certainly add an
> installation burden for test instances at the very least.

The last time I tried, it can't be installed on windows with cpan either, the
module simply doesn't have the necessary windows bits - likely because
traditionally windows didn't really have ptys. I think some stuff has been
added, but it probably would still require a bunch of portability work.

Note that we normally don't even build with readline support on windows - so
there's not really much point in using IO::Pty there. While I've gotten that
to work manually not too long ago, it's still manual and not documented etc.


Afaict the failures are purely about patch 2, not 1, right?

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 9:50 AM, Andres Freund wrote:

Hi,
Here's my current working state - I'll go to bed soon.


Thanks a lot for this Andres!



Changes:

- shared catalog relations weren't handled correctly, because the dboid is
   InvalidOid for them. I wrote a test for that as well.

- ReplicationSlotsComputeRequiredXmin() took invalidated logical slots into
   account (ReplicationSlotsComputeLogicalRestartLSN() too, but it never looks
   at logical slots)

- I don't think the subset of slot xids that were checked when invalidating
   was right. We need to check effective_xmin and effective_catalog_xmin - the
   latter was using catalog_xmin.

- similarly, it wasn't right that specifically those two fields were
   overwritten when invalidated - as that was done, I suspect the changes might
   get lost on a restart...

- As mentioned previously, I did not like all the functions in slot.h, nor
   their naming. Not yet quite finished with that, but a good bit further

- There were a lot of unrelated changes, e.g. removing comments like
  * NB - this runs as part of checkpoint, so avoid raising errors if possible.

- I still don't like the order of the patches, fixing the walsender patches
   after introducing support for logical decoding on standby. Reordered.

- I don't think logical slots being invalidated is checked, e.g. in
   pg_logical_replication_slot_advance()

- I didn't like much that InvalidatePossiblyObsoleteSlot() switched between
   kill() and SendProcSignal() based on the "conflict". There very well could
   be reasons to use InvalidatePossiblyObsoleteSlot() with an xid from outside
   of the startup process in the future. Instead I made it differentiate based
   on MyBackendType == B_STARTUP.



Thanks for all of this and the above explanations.



I also:

Added new patch that replaces invalidated_at with a new enum, 'invalidated',
listing the reason for the invalidation.


Yeah, that's a great idea.


I added a check for !invalidated to
ReplicationSlotsComputeRequiredLSN() etc.



looked at 65-0001 and it looks good to me.


Added new patch moving checks for invalid logical slots into
CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes
no sense. As far as I can tell the old message in
pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
"never previously reserved WAL"



looked at 65-0002 and it looks good to me.


Split "Handle logical slot conflicts on standby." into two. I'm not sure that
should stay that way, but it made it easier to hack on
InvalidateObsoleteReplicationSlots.



looked at 65-0003 and the others.

It's easier to understand/read the code now that the
ReplicationSlotInvalidationCause enum has been created and that
data.invalidated also makes use of the enum. It does "simplify" the
review, and that looks good to me.



Todo:
- write a test that invalidated logical slots stay invalidated across a restart


Done in 65-66-0008 attached.


- write a test that invalidated logical slots do not lead to retaining WAL


I'm not sure how to do that since pg_switch_wal() and friends can't be
executed on a standby.


- Further evolve the API of InvalidateObsoleteReplicationSlots()
   - pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
   - rename xid to snapshotConflictHorizon, that'd be more in line with the
 ResolveRecoveryConflictWithSnapshot and easier to understand, I think



Done. The new API can be found in
v65-66-InvalidateObsoleteReplicationSlots_API.patch attached. It propagates
the cause to InvalidatePossiblyObsoleteSlot() where a switch/case can now be
used. The "default" case does not emit an error since this code runs as part
of checkpoint.


- The test could stand a bit of cleanup and consolidation
   - No need to start 4 psql processes to do 4 updates, just do it in one
 safe_psql()


Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch 
attached.


   - the sequence of drop_logical_slots(), create_logical_slots(),
 change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
 repeated quite a few times


grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 
attached.


   - the stats queries checking for specific conflict counts, including
 preceding tests, is pretty painful. I suggest to reset the stats at the
 end of the test instead (likely also do the drop_logical_slot() there).


Good idea, done in 65-66-0008 attached.


   - it's hard to correlate postgres log and the tap test, because the slots
 are named the same across all tests. Perhaps they could have a per-test
 prefix?


Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset,
the check for invalidation is now done in a single function
"check_for_invalidation" that looks for invalidation messages in the logfile
and in pg_stat_database_conflicts.

Thanks for the suggestions: the TAP test is now easier to read/understand.


   -

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-07 Thread Julien Rouhaud
On Fri, Apr 07, 2023 at 09:40:14AM +, Hayato Kuroda (Fujitsu) wrote:
>
> > As I mentioned in my original thread, I'm not very familiar with that code, 
> > but
> > I'm a bit worried about "all the changes generated on publisher must be send
> > and applied".  Is that a hard requirement for the feature to work reliably?
>
> I think the requirement is needed because the existing WALs on old node 
> cannot be
> transported on new instance. The WAL hole from confirmed_flush to current 
> position
> could not be filled by newer instance.

I see, that was also the first blocker I could think of when Amit mentioned
that feature weeks ago, and I also don't see how that hole could be filled
either.

> > If
> > yes, how does this work if some subscriber node isn't connected when the
> > publisher node is stopped?  I guess you could add a check in pg_upgrade to 
> > make
> > sure that all logical slot are indeed caught up and fail if that's not the 
> > case
> > rather than assuming that a clean shutdown implies it.  It would be good to
> > cover that in the TAP test, and also cover some corner cases, like any new 
> > row
> > added on the publisher node after the pg_upgrade but before the subscriber 
> > is
> > reconnected is also replicated as expected.
>
> Hmm, good point. Current patch could not be handled the case because 
> walsenders
> for the such slots do not exist. I have tested your approach, however, I 
> found that
> CHECKPOINT_SHUTDOWN record were generated twice when publisher was
> shutted down and started. It led that the confirmed_lsn of slots always was 
> behind
> from WAL insert location and failed to upgrade every time.
> Now I do not have good idea to solve it... Do anyone have for this?

I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) ago or something like that?  That's hackish, but if
pg_upgrade can run it means it was a clean shutdown so it should be safe to
assume that what's the last record in the WAL was.  For the double
shutdown checkpoint, I'm not sure that I get the problem.  The check should
only be done at the very beginning of pg_upgrade, so there should have been
only one shutdown checkpoint done right?




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-07 Thread Julien Rouhaud
On Fri, Apr 07, 2023 at 12:51:51PM +, Hayato Kuroda (Fujitsu) wrote:
> Dear Julien,
> 
> > > Agreed, but then shouldn't the option be named "--logical-slots-only" or
> > > something like that, same for all internal function names?
> > 
> > Seems right. Will be fixed in next version. Maybe
> > "--logical-replication-slots-only"
> > will be used, per Peter's suggestion [1].
> 
> After considering more, I decided not to include the word "logical" in the 
> option
> at this point. This is because we have not decided yet whether we dumps 
> physical
> replication slots or not. Current restriction has been occurred because of 
> just
> lack of analysis and considerations, If we decide not to do that, then they 
> will
> be renamed accordingly.

Well, even if physical replication slots were eventually preserved during
pg_upgrade, maybe users would like to only keep one kind or the other, so
having both options could make sense.

That being said, I have a hard time believing that we could actually preserve
physical replication slots.  I don't think that pg_upgrade final state is fully
reproducible:  not all object oids are preserved, and the various pg_restore
are run in parallel so you're very likely to end up with small physical
differences that would be incompatible with physical replication.  Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower.  And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.




Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 17:13:13 +0200, Drouvot, Bertrand wrote:
> On 4/7/23 9:50 AM, Andres Freund wrote:
> > I added a check for !invalidated to
> > ReplicationSlotsComputeRequiredLSN() etc.
> > 
> 
> looked at 65-0001 and it looks good to me.
> 
> > Added new patch moving checks for invalid logical slots into
> > CreateDecodingContext(). Otherwise we end up with 5 or so checks, which 
> > makes
> > no sense. As far as I can tell the old message in
> > pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
> > "never previously reserved WAL"
> > 
> 
> looked at 65-0002 and it looks good to me.
> 
> > Split "Handle logical slot conflicts on standby." into two. I'm not sure 
> > that
> > should stay that way, but it made it easier to hack on
> > InvalidateObsoleteReplicationSlots.
> > 
> 
> looked at 65-0003 and the others.

Thanks for checking!


> > Todo:
> > - write a test that invalidated logical slots stay invalidated across a 
> > restart
> 
> Done in 65-66-0008 attached.

Cool.


> > - write a test that invalidated logical slots do not lead to retaining WAL
> 
> I'm not sure how to do that since pg_switch_wal() and friends can't be 
> executed on
> a standby.

You can do it on the primary and wait for the records to have been applied.


> > - Further evolve the API of InvalidateObsoleteReplicationSlots()
> >- pass in the ReplicationSlotInvalidationCause we're trying to conflict 
> > on?
> >- rename xid to snapshotConflictHorizon, that'd be more in line with the
> >  ResolveRecoveryConflictWithSnapshot and easier to understand, I think
> > 
> 
> Done. The new API can be found in 
> v65-66-InvalidateObsoleteReplicationSlots_API.patch
> attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a 
> switch/case
> can now be used.

Integrated. I moved the cause to the first argument, makes more sense to me
that way.


> The "default" case does not emit an error since this code runs as part
> of checkpoint.

I made it an error - it's a programming error, not some data level
inconsistency if that ever happens.


> > - The test could stand a bit of cleanup and consolidation
> >- No need to start 4 psql processes to do 4 updates, just do it in one
> >  safe_psql()
> 
> Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch 
> attached.

> >- the sequence of drop_logical_slots(), create_logical_slots(),
> >  change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
> >  repeated quite a few times
> 
> grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 
> attached.
> 
> >- the stats queries checking for specific conflict counts, including
> >  preceding tests, is pretty painful. I suggest to reset the stats at the
> >  end of the test instead (likely also do the drop_logical_slot() there).
> 
> Good idea, done in 65-66-0008 attached.
> 
> >- it's hard to correlate postgres log and the tap test, because the slots
> >  are named the same across all tests. Perhaps they could have a per-test
> >  prefix?
> 
> Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset 
> the
> check for invalidation is now done in a single function 
> "check_for_invalidation" that looks
> for invalidation messages in the logfile and in pg_stat_database_conflicts.
> 
> Thanks for the suggestions: the TAP test is now easier to read/understand.

Integrated all of these.


I think pg_log_standby_snapshot() should be added in "Allow logical decoding
on standby", not the commit adding the tests.


Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.

Greetings,

Andres Freund




Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-07 Thread Tom Lane
Daniel Gustafsson  writes:
> Ah, ok, now I see what you mean, thanks!  I'll try to fix up the patch like
> this tomorrow.

Since we're running out of time, I took the liberty of fixing and
pushing this.

regards, tom lane




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-07 15:32:12 +0200, Daniel Gustafsson wrote:
>> I don't think we should go ahead with a patch that refactors interactive_psql
>> only to SKIP over it in CI (which is what the tab_completion test does now), 
>> so
>> let's wait until we have that sorted before going ahead.

> Maybe I am a bit confused, but isn't that just an existing requirement? Why
> would we expect this patchset to change what dependencies use of
> interactive_psql() has?

It is an existing requirement, but only for a test that's not too
critical.  If interactive_psql starts getting used for more interesting
things, we might be sad that the coverage is weak.

Having said that, weak coverage is better than no coverage.  I don't
think this point should be a show-stopper for committing.

regards, tom lane




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 16:58, Andres Freund  wrote:

> Note it just fails on the 32build, not the 64bit build. Unfortunately I don't
> think debian's multiarch in bullseye support installing enough of perl in
> 32bit and 64bit.

I should probably avoid parsing logfiles with fever-induced brainfog; I
confused myself into thinking it was both =(

> Maybe I am a bit confused, but isn't that just an existing requirement? Why
> would we expect this patchset to change what dependencies use of
> interactive_psql() has?

Correct, there is no change from the current implementation.

--
Daniel Gustafsson





Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 11:52:37 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-04-07 15:32:12 +0200, Daniel Gustafsson wrote:
> >> I don't think we should go ahead with a patch that refactors 
> >> interactive_psql
> >> only to SKIP over it in CI (which is what the tab_completion test does 
> >> now), so
> >> let's wait until we have that sorted before going ahead.
> 
> > Maybe I am a bit confused, but isn't that just an existing requirement? Why
> > would we expect this patchset to change what dependencies use of
> > interactive_psql() has?
> 
> It is an existing requirement, but only for a test that's not too
> critical.  If interactive_psql starts getting used for more interesting
> things, we might be sad that the coverage is weak.

I don't really expect it to be used for non-critical things - after all,
interactive_psql() also depends on psql being built with readline support,
which we traditionally don't have on windows... For most tasks background_psql
should suffice...


> Having said that, weak coverage is better than no coverage.  I don't
> think this point should be a show-stopper for committing.

Yea.

One thing I wonder is whether we should have a central function for checking
if interactive_psql() is available, instead of copying 010_tab_completion.pl's
logic for it into multiple tests. But that could come later too.

Greetings,

Andres Freund




Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 17:47, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Ah, ok, now I see what you mean, thanks!  I'll try to fix up the patch like
>> this tomorrow.
> 
> Since we're running out of time, I took the liberty of fixing and
> pushing this.

Great, thanks!

--
Daniel Gustafsson





Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 17:04, Andres Freund  wrote:

> Afaict the failures are purely about patch 2, not 1, right?

Correct.  The attached v6 wraps the interactive_psql test in a SKIP block with
a conditional on IO::Pty being available.

--
Daniel Gustafsson



v6-0002-Test-SCRAM-iteration-changes-with-psql-password.patch
Description: Binary data


v6-0001-Refactor-background-psql-TAP-functions.patch
Description: Binary data


Re: Track IO times in pg_stat_io

2023-04-07 Thread Melanie Plageman
Attached v9 addresses review feedback as well as resolving merge
conflicts with recent relation extension patchset.

I've changed pgstat_count_io_op_time() to take a count and call
pgstat_count_io_op_n() so it can be used with smgrzeroextend(). I do
wish that the parameter to pgstat_count_io_op_n() was called "count" and
not "cnt"...

I've also reordered the call site of pgstat_count_io_op_time() in a few
locations, but I have some questions about this.

Before, I didn't think it mattered much that we didn't finish counting
IO time until after setting BM_VALID or BM_DIRTY and unsetting
BM_IO_IN_PROGRESS. With the relation extension code doing this for many
buffers at once, though, I wondered if this will make the IO timing too
inaccurate.

As such, I've moved pgstat_count_io_op_time() to before we set those
flags in all locations. I did wonder if it is bad to prolong having the
buffer pinned and not having those flags set, though.

On Tue, Apr 4, 2023 at 8:59 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-03-31 15:44:58 -0400, Melanie Plageman wrote:
> > From 789d4bf1fb749a26523dbcd2c69795916b711c68 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Tue, 21 Mar 2023 16:00:55 -0400
> > Subject: [PATCH v8 1/4] Count IO time for temp relation writes
> >
> > Both pgstat_database and pgBufferUsage write times failed to count
> > timing for flushes of dirty local buffers when acquiring a new local
> > buffer for a temporary relation block.
>
> I think it'd be worth mentioning here that we do count read time? Otherwise
> it'd not be as clear that adding tracking increases consistency...

Done

> > From f4e0db5c833f33b30d4c0b4bebec1096a1745d81 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Tue, 21 Mar 2023 18:20:44 -0400
> > Subject: [PATCH v8 2/4] FlushRelationBuffers() counts temp relation IO 
> > timing
> >
> > Add pgstat_database and pgBufferUsage IO timing counting to
> > FlushRelationBuffers() for writes of temporary relations.
> > ---
> >  src/backend/storage/buffer/bufmgr.c | 18 ++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/src/backend/storage/buffer/bufmgr.c 
> > b/src/backend/storage/buffer/bufmgr.c
> > index b3adbbe7d2..05e98d5994 100644
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
> > @@ -3571,6 +3571,8 @@ FlushRelationBuffers(Relation rel)
> >  {
> >   int i;
> >   BufferDesc *bufHdr;
> > + instr_time  io_start,
> > + io_time;
> >
> >   if (RelationUsesLocalBuffers(rel))
> >   {
> > @@ -3596,17 +3598,33 @@ FlushRelationBuffers(Relation rel)
> >
> >   PageSetChecksumInplace(localpage, 
> > bufHdr->tag.blockNum);
> >
> > + if (track_io_timing)
> > + INSTR_TIME_SET_CURRENT(io_start);
> > + else
> > + INSTR_TIME_SET_ZERO(io_start);
> > +
> >   smgrwrite(RelationGetSmgr(rel),
> > 
> > BufTagGetForkNum(&bufHdr->tag),
> > bufHdr->tag.blockNum,
> > localpage,
> > false);
> >
> > +
>
> Spurious newline.

Fixed.

> > From 2bdad725133395ded199ecc726096e052d6e654b Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Fri, 31 Mar 2023 15:32:36 -0400
> > Subject: [PATCH v8 3/4] Track IO times in pg_stat_io
> >
> > Add IO timing for reads, writes, extends, and fsyncs to pg_stat_io.
> >
> > Reviewed-by: Bertrand Drouvot 
> > Reviewed-by: Andres Freund 
> > Discussion: 
> > https://www.postgresql.org/message-id/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com
> > ---
>
> > -static PgStat_BktypeIO PendingIOStats;
> > +typedef struct PgStat_PendingIO
> > +{
> > + PgStat_Counter 
> > counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
> > + instr_time  
> > pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
> > +}PgStat_PendingIO;
>
> Probably will look less awful after adding the typedef to typedefs.list.

Done.
One day I will remember to add things to typedefs.list.

> > + /* we do track it */
> > + if (pgstat_tracks_io_op(bktype, io_object, 
> > io_context, io_op))
> > + {
> > + /* ensure that if IO times are 
> > non-zero, counts are > 0 */
> > + if 
> > (backend_io->times[io_object][io_context][io_op] != 0 &&
> > + 
> > backend_io->counts[io_object][io_context][io_op] <= 0)
> > + return false;
> > +
> >  

Re: Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c

2023-04-07 Thread Tom Lane
"wangw.f...@fujitsu.com"  writes:
> On Tues, Apr 4, 2023 at 23:48 PM Tom Lane  wrote:
>> I like the "per eligible process" wording, at least for guc_tables.c;
>> or maybe it could be "per server process"?  That would be more
>> accurate and not much longer than what we have now.

> Thanks both for sharing your opinions.
> I agree that verbose descriptions make maintenance difficult.
> For consistency, I unified the formulas in guc_tables.c and pg-doc into the
> same suggested short formula. Attached is the new patch.

After studying this for a while, I decided "server process" is probably
the better term --- people will have some idea what that means, while
"eligible process" is not a term we use anywhere else.  Pushed with
that change and some other minor wordsmithing.

regards, tom lane




Unnecessary confirm work on logical replication

2023-04-07 Thread Emre Hasegeli
I was reading the logical replication code and found a little
unnecessary work we are doing.

The confirmed_flush_lsn cannot reasonably be ahead of the
current_lsn, so there is no point in calling
LogicalConfirmReceivedLocation() every time we update the candidate
xmin or restart_lsn.

Patch is attached.


v00-eliminate-unnecessary-confirm-work-on-logical-replication.patch
Description: Binary data


Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
> Integrated all of these.

Here's my current version. Changes:
- Integrated Bertrand's changes
- polished commit messages of 0001-0003
- edited code comments for 0003, including
  InvalidateObsoleteReplicationSlots()'s header
- added a bump of SLOT_VERSION to 0001
- moved addition of pg_log_standby_snapshot() to 0007
- added a catversion bump for pg_log_standby_snapshot()
- moved all the bits dealing with procsignals from 0003 to 0004, now the split
  makes sense IMO
- combined a few more successive ->safe_psql() calls

I see occasional failures in the tests, particularly in the new test using
pg_authid, but not solely. cfbot also seems to have seen these:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740

I made a bogus attempt at a workaround for the pg_authid case last night. But
that didn't actually fix anything, it just changed the timing.

I think the issue is that VACUUM does not force WAL to be flushed at the end
(since it does not assign an xid). wait_for_replay_catchup() uses
$node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from
before VACUUM completed.

The problem can be made more likely by adding pg_usleep(100); before
walwriter.c's call to XLogBackgroundFlush().

We probably should introduce some infrastructure in Cluster.pm for this, but
for now I just added a 'flush_wal' table that we insert into after a
VACUUM. That guarantees a WAL flush.
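
A minimal SQL sketch of that workaround (the 'flush_wal' table is the one
mentioned above; the VACUUM target is just an example):

```sql
-- VACUUM does not flush WAL when it finishes (it assigns no xid),
-- so follow it with a trivial xid-assigning write whose commit
-- record does force a flush.
CREATE TABLE flush_wal ();

VACUUM pg_authid;
INSERT INTO flush_wal DEFAULT VALUES;  -- commit forces the WAL flush
```

After this, $node->lsn('flush') is guaranteed to be past the VACUUM's records.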


I think some of the patches might have more reviewers than really applicable,
and might also miss some. I'd appreciate it if you could go over that...

Greetings,

Andres Freund
>From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 6 Apr 2023 20:00:07 -0700
Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN with
 an enum

This is mainly useful because the upcoming logical-decoding-on-standby feature
adds further reasons for invalidating slots, and we don't want to end up with
multiple invalidated_* fields, or check different attributes.

Eventually we should consider not resetting restart_lsn when invalidating a
slot due to max_slot_wal_keep_size. But that's a user visible change, so left
for later.

Increases SLOT_VERSION, due to the changed field (with a different alignment,
no less).

Reviewed-by: "Drouvot, Bertrand" 
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2ht...@awork3.anarazel.de
---
 src/include/replication/slot.h  | 15 +--
 src/backend/replication/slot.c  | 28 
 src/backend/replication/slotfuncs.c |  8 +++-
 src/tools/pgindent/typedefs.list|  1 +
 4 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8872c80cdfe..ebcb637baed 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency
 	RS_TEMPORARY
 } ReplicationSlotPersistency;
 
+/*
+ * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the
+ * 'invalidated' field is set to a value other than _NONE.
+ */
+typedef enum ReplicationSlotInvalidationCause
+{
+	RS_INVAL_NONE,
+	/* required WAL has been removed */
+	RS_INVAL_WAL,
+} ReplicationSlotInvalidationCause;
+
 /*
  * On-Disk data of a replication slot, preserved across restarts.
  */
@@ -72,8 +83,8 @@ typedef struct ReplicationSlotPersistentData
 	/* oldest LSN that might be required by this replication slot */
 	XLogRecPtr	restart_lsn;
 
-	/* restart_lsn is copied here when the slot is invalidated */
-	XLogRecPtr	invalidated_at;
+	/* RS_INVAL_NONE if valid, or the reason for having been invalidated */
+	ReplicationSlotInvalidationCause invalidated;
 
 	/*
 	 * Oldest LSN that the client has acked receipt for.  This is used as the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 2293c0c6fc3..df23b7ed31e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -89,7 +89,7 @@ typedef struct ReplicationSlotOnDisk
 	sizeof(ReplicationSlotOnDisk) - ReplicationSlotOnDiskConstantSize
 
 #define SLOT_MAGIC		0x1051CA1	/* format identifier */
-#define SLOT_VERSION	2		/* version for new files */
+#define SLOT_VERSION	3		/* version for new files */
 
 /* Control array for replication slot management */
 ReplicationSlotCtlData *ReplicationSlotCtl = NULL;
@@ -855,8 +855,7 @@ ReplicationSlotsComputeRequiredXmin(bool already_locked)
 		SpinLockAcquire(&s->mutex);
 		effective_xmin = s->effective_xmin;
 		effective_catalog_xmin = s->effective_catalog_xmin;
-		invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) &&
-	   XLogRecPtrIsInvalid(s->data.restart_lsn));
+		invalidated = s->data.invalidated != RS_INVAL_NONE;
 		SpinLockRelease(&s->mutex);
 
 		/* invalidated slots need not apply */
@@ -901,14 +900,20 @@ ReplicationSlotsComputeRequiredLSN(void)
 	{
 		Repl

Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 5:47 PM, Andres Freund wrote:

Hi,


- write a test that invalidated logical slots do not lead to retaining WAL


I'm not sure how to do that since pg_switch_wal() and friends can't be
executed on a standby.


You can do it on the primary and wait for the records to have been applied.



Thanks, will give it a try in a couple of hours.




- Further evolve the API of InvalidateObsoleteReplicationSlots()
- pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
- rename xid to snapshotConflictHorizon, that'd be more in line with the
  ResolveRecoveryConflictWithSnapshot and easier to understand, I think



Done. The new API can be found in
v65-66-InvalidateObsoleteReplicationSlots_API.patch attached. It propagates
the cause to InvalidatePossiblyObsoleteSlot() where a switch/case can now
be used.


Integrated. I moved the cause to the first argument, makes more sense to me
that way.


thanks!



I made it an error - it's a programming error, not some data level
inconsistency if that ever happens.


okay, makes sense.


Integrated all of these.


Thanks!




I think pg_log_standby_snapshot() should be added in "Allow logical decoding
on standby", not the commit adding the tests.


Yeah, that's a good point, I do agree.



Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.


I gave it a try and it does work.

"
node3 subscribes to node2 (standby).
Insert done in node1 (primary) where the publication is created => node3 sees
the changes.
"

I started to create the TAP test but currently stuck as the "create 
subscription" waits for a checkpoint/pg_log_standby_snapshot() on the primary.

So, trying to make use of things like:

"my %psql_subscriber = ('stdin' => '', 'stdout' => '');
$psql_subscriber{run} =
  $node_subscriber->background_psql('postgres', \$psql_subscriber{stdin},
\$psql_subscriber{stdout},
$psql_timeout);
$psql_subscriber{stdout} = '';
"

But in vain so far...

Will resume working on it in a couple of hours.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 8:12 PM, Andres Freund wrote:

Hi,

On 2023-04-07 08:47:57 -0700, Andres Freund wrote:

Integrated all of these.


Here's my current version. Changes:
- Integrated Bertrand's changes
- polished commit messages of 0001-0003
- edited code comments for 0003, including
   InvalidateObsoleteReplicationSlots()'s header
- added a bump of SLOT_VERSION to 0001
- moved addition of pg_log_standby_snapshot() to 0007
- added a catversion bump for pg_log_standby_snapshot()
- moved all the bits dealing with procsignals from 0003 to 0004, now the split
   makes sense IMO
- combined a few more successive ->safe_psql() calls



Thanks!


I see occasional failures in the tests, particularly in the new test using
pg_authid, but not solely. cfbot also seems to have seen these:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740

I made a bogus attempt at a workaround for the pg_authid case last night. But
that didn't actually fix anything, it just changed the timing.

I think the issue is that VACUUM does not force WAL to be flushed at the end
(since it does not assign an xid). wait_for_replay_catchup() uses
$node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from
before VACUUM completed.

The problem can be made more likely by adding pg_usleep(100); before
walwriter.c's call to XLogBackgroundFlush().

We probably should introduce some infrastructure in Cluster.pm for this, but
for now I just added a 'flush_wal' table that we insert into after a
VACUUM. That guarantees a WAL flush.



Ack for the Cluster.pm "improvement" and thanks for the "workaround"!


I think some of the patches might have more reviewers than really applicable,
and might also miss some. I'd appreciate if you could go over that...



Sure, will do in a couple of hours.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: monitoring usage count distribution

2023-04-07 Thread Tom Lane
Nathan Bossart  writes:
> On Thu, Apr 06, 2023 at 01:32:35PM -0400, Tom Lane wrote:
>> There seems to be enough support for the existing summary function
>> definition to leave it as-is; Andres likes it for one, and I'm not
>> excited about trying to persuade him he's wrong.  But a second
>> slightly-less-aggregated summary function is clearly useful as well.
>> So I'm now thinking that we do want the patch as-submitted.
>> (Caveat: I've not read the patch, just the description.)

> In case we want to do both, here's a 0002 that changes usagecount_avg to an
> array of usage counts.

I'm not sure if there is consensus for 0002, but I reviewed and pushed
0001.  I made one non-cosmetic change: it no longer skips invalid
buffers.  Otherwise, the row for usage count 0 would be pretty useless.
Also it seemed to me that sum(buffers) ought to agree with the
shared_buffers setting.
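
For illustration, a rough sketch of how the distribution can be queried (the
function name is the one from the patch as I understand it; the second query
derives a similar distribution from the raw view, where invalid buffers show
up with NULL columns):

```sql
-- Distribution of shared buffers by usage count; since invalid buffers
-- are included, sum(buffers) should match the shared_buffers setting.
SELECT * FROM pg_buffercache_usage_counts();

-- A similar aggregation over the raw view:
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```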

regards, tom lane




Re: daitch_mokotoff module

2023-04-07 Thread Tom Lane
Tomas Vondra  writes:
> Hi, I think from the technical point of view it's sound and ready for
> commit. The patch stalled on the copyright/credit stuff, which is
> somewhat separate and mostly non-technical aspect of patches. Sorry for
> that, I'm sure it's annoying/frustrating :-(

> I see the current patch has two simple lines:

>  * This module was originally sponsored by Finance Norway /
>  * Trafikkforsikringsforeningen, and implemented by Dag Lem

> Any objections to this level of attribution in commnents?

That seems fine to me.  I'll check this over and see if I can get
it pushed today.

regards, tom lane




Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-06 12:28:04 +0900, Michael Paquier wrote:
> As some say, the introduction of a new message type in pqmq.c would be
> basically a one-way door, because we'd have to maintain it in a stable
> branch.

Why would it mean that? Parallel workers are updated together with the leader,
so there's no compatibility issue?

Greetings,

Andres Freund




Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-07 Thread Imseih (AWS), Sami
> The arguments of pgstat_progress_update_param() would be given by the
> worker directly as components of the 'P' message. It seems to me that
> this approach would have the simplicity to not require the setup of a
> shmem area for the extra counters, and there would be no need for a
> callback. Hence, the only thing the code paths of workers would need
> to do is to call this routine, then the leaders would increment their
> progress when they see a CFI to process the 'P' message. Also, I
> guess that we would only need an interface in backend_progress.c to
> increment counters, like pgstat_progress_incr_param(), but usable by
> workers. Like a pgstat_progress_worker_incr_param()?

So, here is what I think should be workable to give a generic
progress interface.

pgstat_progress_parallel_incr_param will be a new API that
can be called by either worker or leader from any parallel
code path that chooses to increment a progress index.

If called by a worker, it will send a 'P' message to the leader,
passing both the progress index, i.e. PROGRESS_VACUUM_INDEXES_PROCESSED,
and the value to increment by, i.e. 1 for index vacuum progress.

With that, the additional shared memory counters in ParallelVacuumState
are not needed, and the poke of the worker to the leader goes directly
through a generic backend_progress API.

Let me know your thoughts.

Thanks!

--
Sami Imseih
Amazon Web Services (AWS)





v27-0001-Report-index-vacuum-progress.patch
Description: v27-0001-Report-index-vacuum-progress.patch


Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 18:14, Daniel Gustafsson  wrote:
>> On 7 Apr 2023, at 17:04, Andres Freund  wrote:

>> Afaict the failures are purely about patch 2, not 1, right?
> 
> Correct.  The attached v6 wraps the interactive_psql test in a SKIP block with
> a conditional on IO::Pty being available.

This version was green in the CFBot, so I ended up pushing it after some
documentation fixups and polish.

--
Daniel Gustafsson





Re: Show various offset arrays for heap WAL records

2023-04-07 Thread Melanie Plageman
Attached v3 is cleaned up and includes a pg_walinspect docs update as
well as some edited comments in rmgr_utils.c

On Mon, Mar 27, 2023 at 6:27 PM Peter Geoghegan  wrote:
>
> On Mon, Mar 27, 2023 at 2:29 PM Melanie Plageman
>  wrote:
> > I went to add dedup records and noticed that since the actual
> > BTDedupInterval struct is what is put in the xlog, I would need access
> > to that type from nbtdesc.c, however, including nbtree.h doesn't seem to
> > work because it includes files that cannot be included in frontend code.
>
> I suppose that the BTDedupInterval struct could have just as easily
> gone in nbtxlog.h, next to xl_btree_dedup. There might have been a
> moment where I thought about doing it that way, but I guess I found it
> slightly preferable to use that symbol name (BTDedupInterval) rather
> than (say) xl_btree_dedup_interval in places like the nearby
> BTDedupStateData struct.
>
> Actually, I suppose that it's hard to make that alternative work, at
> least without
> including nbtxlog.h in nbtree.h. Which sounds wrong.
>
> > I, of course, could make some local struct in nbtdesc.c which has an
> > OffsetNumber and a uint16, since the BTDedupInterval is pretty
> > straightforward, but that seems a bit annoying.
> > I'm probably missing something obvious, but is there a better way to do
> > this?
>
> It was probably just one of those cases where I settled on the
> arrangement that looked least odd overall. Not a particularly
> principled approach. But the approach that I'm going to take once more
> here.  ;-)
>
> All of the available alternatives are annoying in roughly the same
> way, though perhaps to varying degrees. All except one: I'm okay with
> just not adding coverage for deduplication records, for the time being
> -- just seeing the number of intervals alone is relatively informative
> with deduplication records, unlike (say) nbtree delete records. I'm
> also okay with having coverage for dedup records if you feel it's
> worth having. Your call.
>
> If we're going to have coverage for deduplication records then it
> seems to me that we have to have a struct in nbtxlog.h for your code
> to work off of. It also seems likely that we'll want to use that same
> struct within nbtxlog.c. What's less clear is what that means for the
> BTDedupInterval struct. I don't think that we should include nbtxlog.h
> in nbtree.h, nor should we do the converse.
>
> I guess maybe two identical structs would be okay. BTDedupInterval,
> and xl_btree_dedup_interval, with the former still used in nbtdedup.c,
> and the latter used through a pointer at the point that nbtxlog.c
> reads a dedup record. Then maybe at a sizeof() static assert beside
> the existing btree_xlog_dedup() assertions that check that the dedup
> state interval array matches the array taken from the WAL record.
> That's still a bit weird, but I find it preferable to any alternative
> that I can think of.

I've omitted enhancements for the dedup record type for now.

> > On another note, I've thought about how to include some example output
> > in docs, and, for example we could modify the example output in the
> > pgwalinspect docs which includes a PRUNE record already for
> > pg_get_wal_record_info() docs. We'd probably just want to keep it short.
>
> Yeah. Perhaps a PRUNE record for one of the system catalogs whose
> relfilenode is relatively recognizable. Say pg_class. It probably
> doesn't matter that much, but there is perhaps some small value in
> picking an example that is relatively easy to recreate later on (or to
> approximately recreate). I'm certainly not insisting on that, though.

I've added such an example to pg_walinspect docs.

On Tue, Mar 21, 2023 at 6:37 PM Peter Geoghegan  wrote:
>
> On Mon, Mar 13, 2023 at 6:41 PM Peter Geoghegan  wrote:
> > There are several different things that seem important to me
> > personally. These are in tension with each other, to a degree. These
> > are:
> >
> > 1. Like Andres, I'd really like to have some way of inspecting things
> > like heapam PRUNE, VACUUM, and FREEZE_PAGE records in significant
> > detail. These record types happen to be very important in general, and
> > the ability to see detailed information about the WAL record would
> > definitely help with some debugging scenarios. I've really missed
> > stuff like this while debugging serious issues under time pressure.
>
> One problem that I often run into when performing analysis of VACUUM
> using pg_walinspect is the issue of *who* pruned which heap page, for
> any given PRUNE record. Was it VACUUM/autovacuum, or was it
> opportunistic pruning? There is no way of knowing for sure right now.
> You *cannot* rely on an xid of 0 as an indicator of a given PRUNE
> record coming from VACUUM; it could just have been an opportunistic
> prune operation that happened to take place when a SELECT query ran,
> before any XID was ever allocated.
>
> I think that we should do something like the attached, to completely
> avoid this ambiguity. Th

Re: cataloguing NOT NULL constraints

2023-04-07 Thread Andres Freund
Hi,

I think there's some test instability:

Fail:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=parula&dt=2023-04-07%2018%3A43%3A02
Subsequent success, without relevant changes:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=parula&dt=2023-04-07%2020%3A22%3A01
Followed by a failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=parula&dt=2023-04-07%2020%3A31%3A02

Similar failures on other animals:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2023-04-07%2020%3A27%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=siskin&dt=2023-04-07%2020%3A09%3A25

There's also as second type of failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2023-04-07%2020%3A23%3A35
..

I suspect there's a naming conflict between tests in different groups.

Greetings,

Andres Freund




Re: cataloguing NOT NULL constraints

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 13:38:43 -0700, Andres Freund wrote:
> I suspect there's a naming conflict between tests in different groups.

Yep:

test: create_aggregate create_function_sql create_cast constraints triggers 
select inherit typed_table vacuum drop_if_exists updatable_views roleattributes 
create_am hash_func errors infinite_recurse

src/test/regress/sql/inherit.sql
851:create table child(f1 int not null, f2 text not null) 
inherits(inh_parent_1, inh_parent_2);

src/test/regress/sql/triggers.sql
2127:create table child partition of parent for values in ('AAA');
2266:create table child () inherits (parent);
2759:create table child () inherits (parent);

The inherit.sql part is new.
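
In other words, two scripts in the same parallel group race to create the
same relation name; whichever runs second fails intermittently. A minimal
sketch of the collision:

```sql
-- inherit.sql (parallel group member A)
create table child(f1 int not null, f2 text not null)
  inherits(inh_parent_1, inh_parent_2);

-- triggers.sql (parallel group member B, racing against A)
create table child () inherits (parent);
-- whichever CREATE runs second fails:
-- ERROR:  relation "child" already exists
```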

I'll see how hard it is to fix.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 8:27 PM, Drouvot, Bertrand wrote:

Hi,




I think some of the patches might have more reviewers than really applicable,
and might also miss some. I'd appreciate if you could go over that...



Sure, will do in a couple of hours.



That looks good to me, just few remarks:

0005 is missing author/reviewer, I'd propose:

Author: "Drouvot, Bertrand" 
Author: Andres Freund 
Reviewed-by: Andres Freund 
Reviewed-by: Robert Haas 
Reviewed-by: Amit Kapila 
Discussion: 
https://postgr.es/m/20230407075009.igg7be27ha2ht...@awork3.anarazel.de

0006, I'd propose:

Author: "Drouvot, Bertrand" 
Reviewed-By: Jeff Davis 
Reviewed-By: Robert Haas 
Reviewed-by: Amit Kapila 
Reviewed-by: Masahiko Sawada 

0007, I'd propose:

Author: "Drouvot, Bertrand" 
Author: Andres Freund 
Author: Amit Khandekar  (in an older version)
Reviewed-by: Fabrízio de Royes Mello 
Reviewed-by: Amit Kapila 
Reviewed-By: Robert Haas 

0009, I'd propose:

Author: "Drouvot, Bertrand" 
Author: Andres Freund 
Author: Amit Khandekar  (in an older version)
Reviewed-by: Fabrízio de Royes Mello 
Reviewed-by: Amit Kapila 
Reviewed-By: Robert Haas 

It's hard (given the amount of emails that have been sent during all this time),
but I do hope it's correct and that nobody has been missed.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2023-04-07 Thread Drouvot, Bertrand

Hi,

On 4/7/23 8:24 PM, Drouvot, Bertrand wrote:

Hi,

On 4/7/23 5:47 PM, Andres Freund wrote:

Hi,


- write a test that invalidated logical slots do not lead to retaining WAL


I'm not sure how to do that since pg_switch_wal() and friends can't be
executed on a standby.


You can do it on the primary and wait for the records to have been applied.



Thanks, will give it a try in a couple of hours.


I looked at it but I think we'd also need things like pg_walfile_name() on the
standby, which is not allowed.


Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.


I gave it a try and it does work.

"
node3 subscribes to node2 (standby).
Insert done in node1 (primary) where the publication is created => node3 sees
the changes.
"

I started to create the TAP test but currently stuck as the "create 
subscription" waits for a checkpoint/pg_log_standby_snapshot() on the primary.

So, trying to make use of things like:

"my %psql_subscriber = ('stdin' => '', 'stdout' => '');
$psql_subscriber{run} =
   $node_subscriber->background_psql('postgres', \$psql_subscriber{stdin},
     \$psql_subscriber{stdout},
     $psql_timeout);
$psql_subscriber{stdout} = '';
"

But in vain so far...



please find attached sub_in_progress.patch that "should work" but "does not" 
because
the wait_for_subscription_sync() call produces:

"
error running SQL: 'psql::1: ERROR:  recovery is in progress
HINT:  WAL control functions cannot be executed during recovery.'
while running 'psql -XAtq -d port=61441 host=/tmp/45dt3wqs2p dbname='postgres' 
-f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_current_wal_lsn()'
"

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index 561dcd33c3..c3c0e718c8 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -8,14 +8,18 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, 
$handle, $slot);
+my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, 
$subscriber_stdin, $subscriber_stdout, $subscriber_stderr, $ret, $handle, 
$slot);
 
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 my $node_standby = PostgreSQL::Test::Cluster->new('standby');
 my $node_cascading_standby = 
PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout =
+  IPC::Run::timer(2 * $PostgreSQL::Test::Utils::timeout_default);
 my $res;
 
+
 # Name for the physical slot on primary
 my $primary_slotname = 'primary_physical';
 my $standby_physical_slotname = 'standby_physical';
@@ -263,6 +267,7 @@ $node_standby->init_from_backup(
has_restoring => 1);
 $node_standby->append_conf('postgresql.conf',
qq[primary_slot_name = '$primary_slotname']);
+$node_standby->append_conf('postgresql.conf', 'max_replication_slots = 6');
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
 $node_standby->safe_psql('testdb', qq[SELECT * FROM 
pg_create_physical_replication_slot('$standby_physical_slotname');]);
@@ -280,6 +285,20 @@ $node_cascading_standby->append_conf('postgresql.conf',
 $node_cascading_standby->start;
 $node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
 
+###
+# Initialize subscriber node
+###
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', 'max_replication_slots = 4');
+$node_subscriber->start;
+
+my %psql_subscriber = ('subscriber_stdin' => '', 'subscriber_stdout' => '');
+$psql_subscriber{run} =
+  $node_subscriber->background_psql('postgres', 
\$psql_subscriber{subscriber_stdin},
+\$psql_subscriber{subscriber_stdout},
+$psql_timeout);
+$psql_subscriber{subscriber_stdout} = '';
+
 ##
 # Test that logical decoding on the standby
 # behaves correctly.
@@ -360,6 +379,43 @@ is( $node_primary->psql(
 3,
 'replaying logical slot from another database fails');
 
+##
+# Test that we can subscribe on the standby with the publication
+# created on the primary.
+##
+
+# Create a table on the primary
+$node_primary->safe_psql('postgres', "CREATE TABLE tab_rep (a int primary 
key)");
+
+# Create a table (same structure) on the subscriber node
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_rep (a int pri

Re: cataloguing NOT NULL constraints

2023-04-07 Thread Alvaro Herrera
On 2023-Apr-07, Andres Freund wrote:

> src/test/regress/sql/triggers.sql
> 2127:create table child partition of parent for values in ('AAA');
> 2266:create table child () inherits (parent);
> 2759:create table child () inherits (parent);
> 
> The inherit.sql part is new.

Yeah.

> I'll see how hard it is to fix.

Running the tests for it now -- it's a short fix.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
https://twitter.com/thingskatedid/status/1456027786158776329




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 22:24, Daniel Gustafsson  wrote:
> 
>> On 7 Apr 2023, at 18:14, Daniel Gustafsson  wrote:
>>> On 7 Apr 2023, at 17:04, Andres Freund  wrote:
> 
>>> Afaict the failures are purely about patch 2, not 1, right?
>> 
>> Correct.  The attached v6 wraps the interactive_psql test in a SKIP block 
>> with
>> a conditional on IO::Pty being available.
> 
> This version was green in the CFBot, so I ended up pushing it after some
> documentation fixups and polish.

Looks like morepork wasn't happy with the interactive \password test.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2023-04-07%2020%3A30%3A29

Looking into why the timer timed out in that test.

--
Daniel Gustafsson





Re: Kerberos delegation support in libpq and postgres_fdw

2023-04-07 Thread Stephen Frost
Greetings,

* David Christensen (da...@pgguru.net) wrote:
> On Wed, Apr 5, 2023 at 3:30 PM Stephen Frost  wrote:
> > Per GSS docs, seems like we should be comparing to GSS_C_NO_CREDENTIAL
> > and validating that the gflags has the `deleg_flag` bit set before
> > considering whether there are valid credentials; in practice this might be
> > the same effect (haven't looked at what that symbol actually resolves to,
> > but NULL would be sensible).
> >
> > GSS_C_NO_CREDENTIAL is indeed NULL, but updated to that anyway to be a
> > bit cleaner and also added an explicit check that GSS_C_DELEG_FLAG was
> > set in gflags.
> 
> + proxy = NULL;
> [...]
> + if (proxy != GSS_C_NO_CREDENTIAL && gflags & GSS_C_DELEG_FLAG)
> 
> We should probably also initialize "proxy" to GSS_C_NO_CREDENTIAL as well,
> yes?

Sure, done, and updated for both auth.c and be-secure-gssapi.c

> > > + /*
> > > +  * Set KRB5CCNAME for this backend, so that later calls to
> > gss_acquire_cred
> > > +  * will find the proxied credentials we stored.
> > > +  */
> > >
> > > So I'm not seeing this in other use in the code; I assume this is just
> > used by the krb5 libs?
> >
> > Not sure I'm following.  gss_acquire_cred() is called in
> > src/interfaces/libpq/fe-gssapi-common.c.
> 
> I just meant the KRB5CCNAME envvar itself; looks like my assumption was
> right.

Ah, yes, that's correct.

> So on a re-read of the v7 patch, there seems to be a bit of inconsistent
> usage between delegation and proxying; i.e., the field itself is called
> gss_proxy in the gssstatus struct, authentication messages, etc, but the
> setting and docs refer to GSS delegation.  Are there subtle distinctions
> between these? It seems like this patch is using them interchangeably, so
> it might be good to settle on one terminology here unless there are already
> well-defined categories for where to use one and where to use the other.

That's a fair point and so I've updated the patch to consistently use
'delegated credentials' and similar to match the Kerberos documentation.
In Kerberos there is *also* the concept of proxied credentials, which are
very similar to delegated credentials (they're actually
"constrained delegations"), but they're not exactly the same and that
isn't what we're doing with this particular patch (though I hope that
once we get support for unconstrained delegation, which is what this
patch is doing, we can then go add support for constrained
delegations).

Updated patch attached.

Thanks!

Stephen
From 0be69ca2720b2091d2ae79b7f5c91fa30cc27d13 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 28 Feb 2022 20:17:55 -0500
Subject: [PATCH] Add support for Kerberos credential delegation

Support GSSAPI/Kerberos credentials being delegated to the server by a
client.  With this, a user authenticating to PostgreSQL using Kerberos
(GSSAPI) credentials can choose to delegate their credentials to the
PostgreSQL server (which can choose to accept them, or not), allowing
the server to then use those delegated credentials to connect to
another service, such as with postgres_fdw or dblink or theoretically
any other service which is able to be authenticated using Kerberos.

Both postgres_fdw and dblink are changed to allow non-superuser
password-less connections but only when GSSAPI credentials have been
delegated to the server by the client and GSSAPI is used to
authenticate to the remote system.

Authors: Stephen Frost, Peifeng Qiu
Reviewed-By: David Christensen
Discussion: https://postgr.es/m/co1pr05mb8023cc2cb575e0faad7df4f8a8...@co1pr05mb8023.namprd05.prod.outlook.com
---
 contrib/dblink/dblink.c   | 127 ---
 contrib/dblink/expected/dblink.out|   4 +-
 contrib/postgres_fdw/connection.c |  72 +++-
 .../postgres_fdw/expected/postgres_fdw.out|  19 +-
 contrib/postgres_fdw/option.c |   3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |   3 +-
 doc/src/sgml/config.sgml  |  17 +
 doc/src/sgml/dblink.sgml  |   5 +-
 doc/src/sgml/libpq.sgml   |  41 +++
 doc/src/sgml/monitoring.sgml  |   9 +
 doc/src/sgml/postgres-fdw.sgml|   7 +-
 src/backend/catalog/system_views.sql  |   3 +-
 src/backend/foreign/foreign.c |   1 +
 src/backend/libpq/auth.c  |  13 +-
 src/backend/libpq/be-gssapi-common.c  |  51 +++
 src/backend/libpq/be-secure-gssapi.c  |  26 +-
 src/backend/utils/activity/backend_status.c   |   1 +
 src/backend/utils/adt/pgstatfuncs.c   |  20 +-
 src/backend/utils/init/postinit.c |   8 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/libpq/auth.h  |   1 +
 src/include/libpq/be-gssapi-common.h  |   3 +
 src/include/libpq/libpq-be.h 

Re: cataloguing NOT NULL constraints

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 23:00:01 +0200, Alvaro Herrera wrote:
> On 2023-Apr-07, Andres Freund wrote:
> 
> > src/test/regress/sql/triggers.sql
> > 2127:create table child partition of parent for values in ('AAA');
> > 2266:create table child () inherits (parent);
> > 2759:create table child () inherits (parent);
> > 
> > The inherit.sql part is new.
> 
> Yeah.
> 
> > I'll see how hard it is to fix.
> 
> Running the tests for it now -- it's a short fix.

I just pushed a fix - sorry, I thought you might have stopped working for the
day and CI finished with the modification a few seconds before your email
arrived...

Greetings,

Andres Freund




Re: cataloguing NOT NULL constraints

2023-04-07 Thread Alvaro Herrera
On 2023-Apr-07, Andres Freund wrote:

> I just pushed a fix - sorry, I thought you might have stopped working for the
> day and CI finished with the modification a few seconds before your email
> arrived...

Ah, cool, no worries.  I would have stopped indeed, but I had to stay
around in case of any test failures.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No me acuerdo, pero no es cierto.  No es cierto, y si fuera cierto,
 no me acuerdo." (Augusto Pinochet a una corte de justicia)




Re: cataloguing NOT NULL constraints

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 23:11:55 +0200, Alvaro Herrera wrote:
> On 2023-Apr-07, Andres Freund wrote:
> 
> > I just pushed a fix - sorry, I thought you might have stopped working for
> > the day and CI finished with the modification a few seconds before your
> > email arrived...
> 
> Ah, cool, no worries.  I would have stopped indeed, but I had to stay
> around in case of any test failures.

Looks like there's work for you if you want ;)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2023-04-07%2018%3A52%3A13

But IMO fixing sepgsql can easily wait till tomorrow.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 22:54:01 +0200, Drouvot, Bertrand wrote:
> That looks good to me

Cool.

I think I'll push these in a few hours. While this needed more changes than
I'd like shortly before the freeze, I think they're largely not in very
interesting bits and pieces - and this feature has been in the works for about
three eternities, and it is blocking a bunch of highly requested features.

If anybody still has energy, I would appreciate a look at 0001, 0002, the new
pieces I added, to make what's now 0003 and 0004 cleaner.


> 0005 is missing author/reviewer, I'd propose:
> [...]

Thanks, I'll integrate them...


> It's hard (given the amount of emails that have been sent during all this
> time),

Indeed.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-04-07 Thread Alvaro Herrera
I gave a very quick look at 0001 and 0003.  I find no fault with 0001.
It was clear back when we added that stuff that invalidated_at was not
terribly useful -- I was just too conservative to not have it -- but now
that a lot of time has passed and we haven't done anything with it,
removing it seems perfectly OK.

As for 0003, I have no further concerns about the translatability.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)




Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-07 Thread Yurii Rashkovskii
Hi Andrew,

On Fri, Apr 7, 2023, 7:07 p.m. Andrew Dunstan  wrote:

>
> On 2023-03-29 We 07:55, Tom Lane wrote:
>
> Yurii Rashkovskii   writes:
>
> I would like to suggest a patch against master (although it may be worth
> backporting it) that makes it possible to listen on any unused port.
>
> I think this is a bad idea, mainly because this:
>
>
> Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> port) and retrieve the assigned port from postmaster.pid
>
> is a horrid way to find out what was picked, and yet there could
> be no other.
>
> Our existing design for this sort of thing is to let the testing
> framework choose the port, and I don't really see what's wrong
> with that approach.  Yes, I know it's theoretically subject to
> race conditions, but that hasn't seemed to be a problem in
> practice.  It's especially not a problem given that modern
> testing practice tends to not open any TCP port at all, just
> a Unix socket in a test-private directory, so that port
> conflicts are a non-issue.
>
>
> For TAP tests we have pretty much resolved the port collisions issue for
> TCP ports too. See commit 9b4eafcaf4
>
> Perhaps the OP could adapt that logic to his use case.
>

Thank you for referencing this commit. The reason I am suggesting my
patch is that I believe my solution is a much better way to avoid
collisions in the first place. Implementing an algorithm similar to the
one in the referenced commit is error-prone and can be difficult in
environments like shell scripts.

I'm also trying to understand what's wrong with reading the port from the
pid file: if Postgres writes information there, it's surely so that
somebody can read it; otherwise, why write it in the first place? The
proposed solution uses the operating system's functionality to achieve
collision-free mechanics with none of the complexity introduced in that
commit.
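
To illustrate the operating-system facility involved (a standalone sketch,
not patch code): binding a socket to port 0 asks the kernel to pick an
unused port, which is what makes this approach collision-free by
construction, with no probe-and-retry loop.

```python
import socket

# Bind to port 0: the kernel picks an unused ephemeral port for us.
# There is no race window, unlike probing candidate ports from a script.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
port = sock.getsockname()[1]  # the kernel-assigned port number
sock.close()
print(port)
```

A server started this way then only needs to publish the assigned port
somewhere readable, which is exactly what postmaster.pid already does.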


Re: Show various offset arrays for heap WAL records

2023-04-07 Thread Peter Geoghegan
On Fri, Apr 7, 2023 at 1:33 PM Melanie Plageman
 wrote:
> Attached v3 is cleaned up and includes a pg_walinspect docs update as
> well as some edited comments in rmgr_utils.c

Attached v4 has some small tweaks on your v3, mostly just whitespace.
Two slightly notable tweaks:

* I changed the approach to globbing in the Makefile, rather than use
your original overwide formulation for the new rmgrdesc_utils.c file.

What do you think of this approach?

* Removed use of the restrict keyword.

While "restrict" is C99, I'm not completely sure that it's totally
supported by Postgres. I'm a bit surprised that you opted to use it in
this particular patch.

I meant to ask you about this earlier...why use restrict in this patch?

> I've added such an example to pg_walinspect docs.

There already was a PRUNE example, though -- for the
pg_get_wal_record_info function (singular, not to be confused with
pg_get_wal_records_info).

v4 makes the example a VACUUM record, which replaces the previous
pg_get_wal_record_info PRUNE example -- that needed to be updated
anyway. This approach has the advantage of not being too verbose,
while still showing some of this kind of detail.

This has the advantage of allowing pg_get_wal_records_info's example
to continue to be an example that lacks a block reference (and so has
a NULL block_ref). This is a useful contrast against the new
pg_get_wal_block_info function.

> I really like this idea and would find it useful. I reviewed the patch
> and tried it out and it worked for me and code looked fine as well.
>
> I didn't include it in the attached patchset because I don't feel
> confident enough in my own understanding of any potential implications
> of splitting up these record types to definitively endorse it. But, if
> someone else felt comfortable with it, I would like to see it in the
> tree.

I'm not going to move on it now for 16, given the lack of feedback about it.

-- 
Peter Geoghegan


v4-0001-Add-rmgr_desc-utilities.patch
Description: Binary data


Re: Kerberos delegation support in libpq and postgres_fdw

2023-04-07 Thread David Christensen
Reviewed v8; largely looking good, though I notice this hunk, which may
arguably be a bug fix, but doesn't appear to be relevant to this specific
patch, so could probably be debated independently (and if a bug, should
probably be backpatched):

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 4229d2048c..11d41979c6 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -288,6 +288,9 @@ InitPgFdwOptions(void)
  {"sslcert", UserMappingRelationId, true},
  {"sslkey", UserMappingRelationId, true},

+ /* gssencmode is also libpq option, same to above. */
+ {"gssencmode", UserMappingRelationId, true},
+
  {NULL, InvalidOid, false}
  };

That said, should "gssdeleg" be exposed as a user mapping option?  (This
shows up in postgres_fdw; not sure if there are other places that would be
relevant, like in dblink somewhere as well, just a thought.)

Best,

David


Re: cataloguing NOT NULL constraints

2023-04-07 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-07 23:11:55 +0200, Alvaro Herrera wrote:
>> Ah, cool, no worries.  I would have stopped indeed, but I had to stay
>> around in case of any test failures.

> Looks like there's work for you if you want ;)
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2023-04-07%2018%3A52%3A13

> But IMO fixing sepgsql can easily wait till tomorrow.

I can deal with that one -- it's a bit annoying to work with sepgsql
if you're not on a Red Hat platform.

After quickly eyeing the diffs, I'm just going to take the new output
as good.  I'm not surprised that there are additional output messages
given the additional catalog entries this made.  I *am* a bit surprised
that some messages seem to have disappeared --- are there places where
this resulted in fewer catalog accesses than before?  Nonetheless,
there's no good reason to assume this test is exposing any bugs.

regards, tom lane




Re: Kerberos delegation support in libpq and postgres_fdw

2023-04-07 Thread Stephen Frost
Greetings,

* David Christensen (da...@pgguru.net) wrote:
> Reviewed v8; largely looking good, though I notice this hunk, which may
> arguably be a bug fix, but doesn't appear to be relevant to this specific
> patch, so could probably be debated independently (and if a bug, should
> probably be backpatched):
> 
> diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
> index 4229d2048c..11d41979c6 100644
> --- a/contrib/postgres_fdw/option.c
> +++ b/contrib/postgres_fdw/option.c
> @@ -288,6 +288,9 @@ InitPgFdwOptions(void)
>   {"sslcert", UserMappingRelationId, true},
>   {"sslkey", UserMappingRelationId, true},
> 
> + /* gssencmode is also libpq option, same to above. */
> + {"gssencmode", UserMappingRelationId, true},
> +
>   {NULL, InvalidOid, false}
>   };

Hmm, yeah, hard to say if that makes sense at a user-mapping level or
not.  Agreed that we could have an independent discussion regarding
that and if it should be back-patched, so removed it from this patch.

> That said, should "gssdeleg" be exposed as a user mapping option?  (This
> shows up in postgres_fdw; not sure if there are other places that would be
> relevant, like in dblink somewhere as well, just a thought.)

Ah, yeah, that certainly makes sense to have as optional for a user
mapping.  dblink doesn't have the distinction between server-level
options and user mapping options (as it doesn't have user mappings at
all really) so it doesn't have something similar.

Updated patch attached.

Thanks!

Stephen
From 87642bc75e7d4f3d986d4f100e6ee00711155bc7 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 28 Feb 2022 20:17:55 -0500
Subject: [PATCH] Add support for Kerberos credential delegation

Support GSSAPI/Kerberos credentials being delegated to the server by a
client.  With this, a user authenticating to PostgreSQL using Kerberos
(GSSAPI) credentials can choose to delegate their credentials to the
PostgreSQL server (which can choose to accept them, or not), allowing
the server to then use those delegated credentials to connect to
another service, such as with postgres_fdw or dblink or theoretically
any other service which is able to be authenticated using Kerberos.

Both postgres_fdw and dblink are changed to allow non-superuser
password-less connections but only when GSSAPI credentials have been
delegated to the server by the client and GSSAPI is used to
authenticate to the remote system.

Authors: Stephen Frost, Peifeng Qiu
Reviewed-By: David Christensen
Discussion: https://postgr.es/m/co1pr05mb8023cc2cb575e0faad7df4f8a8...@co1pr05mb8023.namprd05.prod.outlook.com
---
 contrib/dblink/dblink.c   | 127 ---
 contrib/dblink/expected/dblink.out|   4 +-
 contrib/postgres_fdw/connection.c |  72 +++-
 .../postgres_fdw/expected/postgres_fdw.out|  19 +-
 contrib/postgres_fdw/option.c |   6 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |   3 +-
 doc/src/sgml/config.sgml  |  17 +
 doc/src/sgml/dblink.sgml  |   5 +-
 doc/src/sgml/libpq.sgml   |  41 +++
 doc/src/sgml/monitoring.sgml  |   9 +
 doc/src/sgml/postgres-fdw.sgml|   7 +-
 src/backend/catalog/system_views.sql  |   3 +-
 src/backend/foreign/foreign.c |   1 +
 src/backend/libpq/auth.c  |  13 +-
 src/backend/libpq/be-gssapi-common.c  |  51 +++
 src/backend/libpq/be-secure-gssapi.c  |  26 +-
 src/backend/utils/activity/backend_status.c   |   1 +
 src/backend/utils/adt/pgstatfuncs.c   |  20 +-
 src/backend/utils/init/postinit.c |   8 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/libpq/auth.h  |   1 +
 src/include/libpq/be-gssapi-common.h  |   3 +
 src/include/libpq/libpq-be.h  |   2 +
 src/include/utils/backend_status.h|   1 +
 src/interfaces/libpq/exports.txt  |   1 +
 src/interfaces/libpq/fe-auth.c|  15 +-
 src/interfaces/libpq/fe-connect.c |  17 +
 src/interfaces/libpq/fe-secure-gssapi.c   |  23 +-
 src/interfaces/libpq/libpq-fe.h   |   1 +
 src/interfaces/libpq/libpq-int.h  |   2 +
 src/test/kerberos/Makefile|   3 +
 src/test/kerberos/t/001_auth.pl   | 331 --
 src/test/perl/PostgreSQL/Test/Utils.pm|  27 ++
 src/test/regress/expected/rules.out   |  11 +-
 36 files changed, 752 insertions(+), 136 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 78a8bcee6e..533eb90e62 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -48,6 +48,7 @@
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq-fe.h"
+#include "libpq/libpq-be.h"
 #include "libpq/libpq-be-fe-helpers.

Re: Kerberos delegation support in libpq and postgres_fdw

2023-04-07 Thread David Christensen
Ok, based on the interdiff there, I'm happy with that last change.  Marking
as Ready For Committer.

Best,

David


Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 7 Apr 2023, at 23:01, Daniel Gustafsson  wrote:

> Looks like morepork wasn't happy with the interactive \password test.
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2023-04-07%2020%3A30%3A29
> 
> Looking into why the timer timed out in that test.

Staring at this I've been unable to figure out if there is an underlying
problem here or a flaky testrun, since I can't reproduce it.  Maybe the
animal owner (on cc) has some insight?

The test has passed on several different platforms in the buildfarm, including
Linux, Solaris, macOS, NetBSD, FreeBSD and other OpenBSD animals.  It also
passed in an OpenBSD VM running with our Cirrus framework.

Unless there are objections raised I propose leaving it in for now, and I will
return to it tomorrow after some sleep, and install OpenBSD 6.9 to see if it's
reproducible.

--
Daniel Gustafsson





Re: Commitfest 2023-03 starting tomorrow!

2023-04-07 Thread Kirk Wolak
On Fri, Apr 7, 2023 at 10:21 AM Greg Stark  wrote:

> As announced on this list feature freeze is at 00:00 April 8 AoE.
> That's less than 24 hours away. If you need to set your watches to AoE
> timezone it's currently:
>
> $ TZ=AOE+12 date
> Fri 07 Apr 2023 02:05:50 AM AOE
>
> As we stand we have:
>
> Status summary:
>   Needs review: 82
>   Waiting on Author:16
>   Ready for Committer:  27
>   Committed:   115
>   Moved to next CF: 38
>   Returned with Feedback:   10
>   Rejected:  9
>   Withdrawn:22
> Total: 319.
>
> In less than 24h most of the remaining patches will get rolled forward
> to the next CF. The 16 that are Waiting on Author might be RwF
> perhaps. The only exceptions would be non-features like Bug Fixes and
> cleanup patches that have been intentionally held until the end --
> those become Open Issues for the release.
>
> So if we move forward all the remaining patches (so these numbers are
> high by about half a dozen) the *next* CF would look like:
>
> Commitfest 2023-07:Now  April 8
>   Needs review: 46. 128
>   Waiting on Author:17.  33
>   Ready for Committer:   3.  30
> Total:  66  191
>
> I suppose that's better than the 319 we came into this CF with but
> there's 3 months to accumulate more unreviewed patches...
>
> I had hoped to find lots of patches that I could bring the hammer down
> on and say there's just no interest in or there's no author still
> maintaining. But that wasn't the case. Nearly all the patches still
> had actively interested authors and looked like they were legitimately
> interesting and worthwhile features that people just haven't had the
> time to review or commit.
>
>
> --
> greg
>
The %T added to the PSQL Prompt is about 5 lines of code.  Reviewed and
Ready to commit.
That could knock one more off really quickly :-)

Excellent work to everyone.

Thanks, Kirk


Re: cataloguing NOT NULL constraints

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 17:46:33 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-04-07 23:11:55 +0200, Alvaro Herrera wrote:
> >> Ah, cool, no worries.  I would have stopped indeed, but I had to stay
> >> around in case of any test failures.
> 
> > Looks like there's work for you if you want ;)
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2023-04-07%2018%3A52%3A13
> 
> > But IMO fixing sepgsql can easily wait till tomorrow.
> 
> I can deal with that one -- it's a bit annoying to work with sepgsql
> if you're not on a Red Hat platform.

Indeed. I tried to get them running a while back, to enable the tests with
meson, without lot of success. Then I realized that they're also not wired up
in make... ;)


> After quickly eyeing the diffs, I'm just going to take the new output
> as good.  I'm not surprised that there are additional output messages
> given the additional catalog entries this made.  I *am* a bit surprised
> that some messages seem to have disappeared --- are there places where
> this resulted in fewer catalog accesses than before?  Nonetheless,
> there's no good reason to assume this test is exposing any bugs.

I wonder if the issue is that the new paths miss a hook invocation.

@@ -160,11 +160,7 @@
 ALTER TABLE regtest_table ALTER b SET DEFAULT 'XYZ';-- not supported yet
 ALTER TABLE regtest_table ALTER b DROP DEFAULT; -- not supported yet
 ALTER TABLE regtest_table ALTER b SET NOT NULL;
-LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column 
name="regtest_schema_2.regtest_table.b" permissive=0
-LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column 
name="regtest_schema.regtest_table_2.b" permissive=0
 ALTER TABLE regtest_table ALTER b DROP NOT NULL;
-LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column 
name="regtest_schema_2.regtest_table.b" permissive=0
-LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column 
name="regtest_schema.regtest_table_2.b" permissive=0
 ALTER TABLE regtest_table ALTER b SET STATISTICS -1;
 LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column 
name="regtest_schema_2.regtest_table.b" permissive=0
 LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column 
name="regtest_schema.regtest_table_2.b" permissive=0

The 'not supported yet' cases don't emit messages. Previously SET NOT NULL
wasn't among that set, but seemingly it now is.

Greetings,

Andres Freund




Re: cataloguing NOT NULL constraints

2023-04-07 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-07 17:46:33 -0400, Tom Lane wrote:
>> After quickly eyeing the diffs, I'm just going to take the new output
>> as good.  I'm not surprised that there are additional output messages
>> given the additional catalog entries this made.  I *am* a bit surprised
>> that some messages seem to have disappeared --- are there places where
>> this resulted in fewer catalog accesses than before?  Nonetheless,
>> there's no good reason to assume this test is exposing any bugs.

> I wonder if the issue is that the new paths miss a hook invocation.

Perhaps.  I'm content to silence the buildfarm for today; we can
investigate more closely later.

regards, tom lane




Re: Commitfest 2023-03 starting tomorrow!

2023-04-07 Thread Tom Lane
Kirk Wolak  writes:
> The %T added to the PSQL Prompt is about 5 lines of code.  Reviewed and
> Ready to commit.
> That could knock one more off really quickly :-)

I'm still objecting to it, for the same reason as before.

regards, tom lane




Re: Minimal logical decoding on standbys

2023-04-07 Thread Melanie Plageman
Code review only of 0001-0005.

I noticed you had two 0008, btw.

On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
> > Integrated all of these.
> 
> From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001
> From: Andres Freund 
> Date: Thu, 6 Apr 2023 20:00:07 -0700
> Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN with
>  an enum
> 
> diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
> index 8872c80cdfe..ebcb637baed 100644
> --- a/src/include/replication/slot.h
> +++ b/src/include/replication/slot.h
> @@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency
>   RS_TEMPORARY
>  } ReplicationSlotPersistency;
>  
> +/*
> + * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the
> + * 'invalidated' field is set to a value other than _NONE.
> + */
> +typedef enum ReplicationSlotInvalidationCause
> +{
> + RS_INVAL_NONE,
> + /* required WAL has been removed */

I just wonder if RS_INVAL_WAL is too generic. Something like
RS_INVAL_WAL_MISSING or similar may be better, since it seems there are
other invalidation causes that may be related to WAL.

> + RS_INVAL_WAL,
> +} ReplicationSlotInvalidationCause;
> +

0002 LGTM

> From 52c25cc15abc4470d19e305d245b9362e6b8d6a3 Mon Sep 17 00:00:00 2001
> From: Andres Freund 
> Date: Fri, 7 Apr 2023 09:32:48 -0700
> Subject: [PATCH va67 3/9] Support invalidating replication slots due to
>  horizon and wal_level
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Needed for supporting logical decoding on a standby. The new invalidation
> methods will be used in a subsequent commit.
> 

You probably are aware, but applying 0003 and 0004 both gives me two
warnings:

warning: 1 line adds whitespace errors.
Warning: commit message did not conform to UTF-8.
You may want to amend it after fixing the message, or set the config
variable i18n.commitEncoding to the encoding your project uses.

> diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> index df23b7ed31e..c2a9accebf6 100644
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -1241,8 +1241,58 @@ ReplicationSlotReserveWal(void)
>  }
>  
>  /*
> - * Helper for InvalidateObsoleteReplicationSlots -- acquires the given slot
> - * and mark it invalid, if necessary and possible.
> + * Report that replication slot needs to be invalidated
> + */
> +static void
> +ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
> +bool terminating,
> +int pid,
> +NameData slotname,
> +XLogRecPtr restart_lsn,
> +XLogRecPtr oldestLSN,
> +TransactionId 
> snapshotConflictHorizon)
> +{
> + StringInfoData err_detail;
> + boolhint = false;
> +
> + initStringInfo(&err_detail);
> +
> + switch (cause)
> + {
> + case RS_INVAL_WAL:
> + hint = true;
> + appendStringInfo(&err_detail, _("The slot's restart_lsn 
> %X/%X exceeds the limit by %llu bytes."),
> +  
> LSN_FORMAT_ARGS(restart_lsn),

I'm not sure what the below cast is meant to do. If you are trying to
protect against overflow/underflow, I think you'd need to cast before
doing the subtraction.

> +  (unsigned long long) 
> (oldestLSN - restart_lsn));
> + break;
> + case RS_INVAL_HORIZON:
> + appendStringInfo(&err_detail, _("The slot conflicted 
> with xid horizon %u."),
> +  
> snapshotConflictHorizon);
> + break;
> +
> + case RS_INVAL_WAL_LEVEL:
> + appendStringInfo(&err_detail, _("Logical decoding on 
> standby requires wal_level to be at least logical on the primary server"));
> + break;
> + case RS_INVAL_NONE:
> + pg_unreachable();
> + }

This ereport is quite hard to read. Is there any simplification you can
do of the ternaries without undue duplication?

> + ereport(LOG,
> + terminating ?
> + errmsg("terminating process %d to release replication 
> slot \"%s\"",
> +pid, NameStr(slotname)) :
> + errmsg("invalidating obsolete replication slot \"%s\"",
> +NameStr(slotname)),
> + errdetail_internal("%s", err_detail.data),
> + hint ? errhint("You might need to increase 
> max_slot_wal_keep_size.") : 0);
> +
> + pfree

Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 7 Apr 2023, at 23:01, Daniel Gustafsson  wrote:
> Staring at this I've been unable to figure out if there is an underlying
> problem here or a flaky testrun, since I can't reproduce it.  Maybe the
> animal owner (on cc) has some insight?

> The test has passed on several different platforms in the buildfarm, including
> Linux, Solaris, macOS, NetBSD, FreeBSD and other OpenBSD animals.  It also
> passed in an OpenBSD VM running with our Cirrus framework.

prion and mantid have now failed with the same symptom.  I don't
see a pattern, but it's not OpenBSD-only.  It will be interesting
to see if the failure is intermittent or not on those animals.

> Unless there are objections raised I propose leaving it in for now, and I will
> return to it tomorrow after some sleep, and install OpenBSD 6.9 to see if it's
> reproducible.

Agreed, we don't need a hasty revert here.  Better to gather data.

regards, tom lane




Re: cataloguing NOT NULL constraints

2023-04-07 Thread Tom Lane
... BTW, shouldn't
https://commitfest.postgresql.org/42/3869/
now get closed as committed?

regards, tom lane




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 8 Apr 2023, at 00:35, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 7 Apr 2023, at 23:01, Daniel Gustafsson  wrote:
>> Staring at this I've been unable to figure out if there is an underlying
>> problem here or a flaky testrun, since I can't reproduce it.  Maybe the
>> animal owner (on cc) has some insight?
> 
>> The test has passed on several different platforms in the buildfarm,
>> including Linux, Solaris, macOS, NetBSD, FreeBSD and other OpenBSD
>> animals.  It also passed in an OpenBSD VM running with our Cirrus
>> framework.
> 
> prion and mantid have now failed with the same symptom.  I don't
> see a pattern, but it's not OpenBSD-only.  It will be interesting
> to see if the failure is intermittent or not on those animals.

It would be interesting to know how far into the pumped input they get; do
they time out on the first one with nothing going through?  I will
investigate further tomorrow.

--
Daniel Gustafsson





Re: Show various offset arrays for heap WAL records

2023-04-07 Thread Melanie Plageman
On Fri, Apr 7, 2023 at 5:43 PM Peter Geoghegan  wrote:
>
> On Fri, Apr 7, 2023 at 1:33 PM Melanie Plageman
>  wrote:
> > Attached v3 is cleaned up and includes a pg_walinspect docs update as
> > well as some edited comments in rmgr_utils.c
>
> Attached v4 has some small tweaks on your v3. Mostly just whitespace
> tweaks. Two slightly notable tweaks:
>
> * I changed the approach to globbing in the Makefile, rather than use
> your original overwide formulation for the new rmgrdesc_utils.c file.
>
> What do you think of this approach?

Seems fine.

> * Removed use of the restrict keyword.
>
> While "restrict" is C99, I'm not completely sure that it's totally
> supported by Postgres. I'm a bit surprised that you opted to use it in
> this particular patch.
>
> I meant to ask you about this earlier...why use restrict in this patch?


So, I think the signature I meant to have was:

void
array_desc(StringInfo buf, void *array, size_t elem_size, int count,
  void (*elem_desc) (StringInfo buf, const void *elem, void *data),
  void *data)

Basically, I wanted to indicate that elem is not and should not be
modified, that data can be modified, and that the two should not be the
same element or overlap at all.

> > I've added such an example to pg_walinspect docs.
>
> There already was a PRUNE example, though -- for the
> pg_get_wal_record_info function (singular, not to be confused with
> pg_get_wal_records_info).
>
> v4 makes the example a VACUUM record, which replaces the previous
> pg_get_wal_record_info PRUNE example -- that needed to be updated
> anyway. This approach has the advantage of not being too verbose,
> which still showing some of this kind of detail.
>
> This has the advantage of allowing pg_get_wal_records_info's example
> to continue to be an example that lacks a block reference (and so has
> a NULL block_ref). This is a useful contrast against the new
> pg_get_wal_block_info function.

LGTM

- Melanie




Re: Show various offset arrays for heap WAL records

2023-04-07 Thread Peter Geoghegan
On Fri, Apr 7, 2023 at 4:01 PM Melanie Plageman
 wrote:
> LGTM

Pushed, thanks.

-- 
Peter Geoghegan




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Daniel Gustafsson
> On 8 Apr 2023, at 00:59, Daniel Gustafsson  wrote:
> 
>> On 8 Apr 2023, at 00:35, Tom Lane  wrote:
>> 
>> Daniel Gustafsson  writes:
 On 7 Apr 2023, at 23:01, Daniel Gustafsson  wrote:
>>> Staring at this I've been unable to figure out if there is an underlying
>>> problem here or a flaky testrun, since I can't reproduce it.  Maybe the
>>> animal owner (on cc) has some insights?
>> 
>>> The test has passed on several different platforms in the buildfarm, 
>>> including
>>> Linux, Solaris, macOS, NetBSD, FreeBSD and other OpenBSD animals.  It also
>>> passed in an OpenBSD VM running with our Cirrus framework.
>> 
>> prion and mantid have now failed with the same symptom.  I don't
>> see a pattern, but it's not OpenBSD-only.  It will be interesting
>> to see if the failure is intermittent or not on those animals.

morepork has failed again, which is good, since intermittent failures are
harder to track down.

> It would be interesting to know how far in the pumped input they get, if they
> time out on the first one with nothing going through?  Will investigate
> further tomorrow to see.

Actually, one quick datapoint.  prion and mantid report running IPC::Run
version 0.92, and morepork 0.96.  Animals that pass are running 20180523.0,
20200505.0, 20220807.0 or similar versions.  We don't print the IO::Pty version
during configure, but maybe this is related to older versions of the modules,
and these tests (not all of them, apparently) need to SKIP if IO::Pty is missing
or too old?  Somewhere to start looking at the very least.

--
Daniel Gustafsson





Re: Show various offset arrays for heap WAL records

2023-04-07 Thread Melanie Plageman
On Fri, Apr 7, 2023 at 7:09 PM Peter Geoghegan  wrote:
>
> On Fri, Apr 7, 2023 at 4:01 PM Melanie Plageman
>  wrote:
> > LGTM
>
> Pushed, thanks.

It's come to my attention that I forgot to include the btree patch earlier.

PFA
From 4f502b2513ba79d738e7ed87aaf7d18ed2a2e30f Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 13 Mar 2023 18:15:17 -0400
Subject: [PATCH v4 2/2] Add detail to some btree xlog record descs

Suggested by Peter Geoghegan

Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de
---
 src/backend/access/rmgrdesc/nbtdesc.c| 84 +---
 src/backend/access/rmgrdesc/rmgrdesc_utils.c |  6 ++
 src/include/access/rmgrdesc_utils.h  |  2 +
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/rmgrdesc/nbtdesc.c b/src/backend/access/rmgrdesc/nbtdesc.c
index c5dc543a0f..9ffece109d 100644
--- a/src/backend/access/rmgrdesc/nbtdesc.c
+++ b/src/backend/access/rmgrdesc/nbtdesc.c
@@ -15,6 +15,58 @@
 #include "postgres.h"
 
 #include "access/nbtxlog.h"
+#include "access/rmgrdesc_utils.h"
+
+static void btree_del_desc(StringInfo buf, char *block_data, uint16 ndeleted,
+		   uint16 nupdated);
+static void btree_update_elem_desc(StringInfo buf, void *restrict update, void
+   *restrict data);
+
+static void
+btree_del_desc(StringInfo buf, char *block_data, uint16 ndeleted, uint16 nupdated)
+{
+	OffsetNumber *updatedoffsets;
+	xl_btree_update *updates;
+	OffsetNumber *data = (OffsetNumber *) block_data;
+
+	appendStringInfoString(buf, ", deleted:");
+	array_desc(buf, data, sizeof(OffsetNumber), ndeleted, &offset_elem_desc, NULL);
+
+	appendStringInfoString(buf, ", updated:");
+	array_desc(buf, data, sizeof(OffsetNumber), nupdated, &offset_elem_desc, NULL);
+
+	if (nupdated <= 0)
+		return;
+
+	updatedoffsets = (OffsetNumber *)
+		((char *) data + ndeleted * sizeof(OffsetNumber));
+	updates = (xl_btree_update *) ((char *) updatedoffsets +
+   nupdated *
+   sizeof(OffsetNumber));
+
+	appendStringInfoString(buf, ", updates:");
+	array_desc(buf, updates, sizeof(xl_btree_update),
+			   nupdated, &btree_update_elem_desc,
+			   &updatedoffsets);
+}
+
+static void
+btree_update_elem_desc(StringInfo buf, void *restrict update, void *restrict data)
+{
+	xl_btree_update *new_update = (xl_btree_update *) update;
+	OffsetNumber *updated_offset = *((OffsetNumber **) data);
+
+	appendStringInfo(buf, "{ updated offset: %u, ndeleted tids: %u", *updated_offset, new_update->ndeletedtids);
+
+	appendStringInfoString(buf, ", deleted tids:");
+
+	array_desc(buf, (char *) new_update + SizeOfBtreeUpdate,
+			   sizeof(uint16), new_update->ndeletedtids, &uint16_elem_desc, NULL);
+
+	updated_offset++;
+
+	appendStringInfo(buf, " }");
+}
 
 void
 btree_desc(StringInfo buf, XLogReaderState *record)
@@ -31,7 +83,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 xl_btree_insert *xlrec = (xl_btree_insert *) rec;
 
-appendStringInfo(buf, "off %u", xlrec->offnum);
+appendStringInfo(buf, "off: %u", xlrec->offnum);
 break;
 			}
 		case XLOG_BTREE_SPLIT_L:
@@ -39,7 +91,7 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 xl_btree_split *xlrec = (xl_btree_split *) rec;
 
-appendStringInfo(buf, "level %u, firstrightoff %d, newitemoff %d, postingoff %d",
+appendStringInfo(buf, "level: %u, firstrightoff: %d, newitemoff: %d, postingoff: %d",
  xlrec->level, xlrec->firstrightoff,
  xlrec->newitemoff, xlrec->postingoff);
 break;
@@ -48,31 +100,41 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 xl_btree_dedup *xlrec = (xl_btree_dedup *) rec;
 
-appendStringInfo(buf, "nintervals %u", xlrec->nintervals);
+appendStringInfo(buf, "nintervals: %u", xlrec->nintervals);
 break;
 			}
 		case XLOG_BTREE_VACUUM:
 			{
 xl_btree_vacuum *xlrec = (xl_btree_vacuum *) rec;
 
-appendStringInfo(buf, "ndeleted %u; nupdated %u",
+appendStringInfo(buf, "ndeleted: %u, nupdated: %u",
  xlrec->ndeleted, xlrec->nupdated);
+
+if (!XLogRecHasBlockImage(record, 0))
+	btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+   xlrec->ndeleted, xlrec->nupdated);
+
 break;
 			}
 		case XLOG_BTREE_DELETE:
 			{
 xl_btree_delete *xlrec = (xl_btree_delete *) rec;
 
-appendStringInfo(buf, "snapshotConflictHorizon %u; ndeleted %u; nupdated %u",
+appendStringInfo(buf, "snapshotConflictHorizon: %u, ndeleted: %u, nupdated: %u",
  xlrec->snapshotConflictHorizon,
  xlrec->ndeleted, xlrec->nupdated);
+
+if (!XLogRecHasBlockImage(record, 0))
+	btree_del_desc(buf, XLogRecGetBlockData(record, 0, NULL),
+   xlrec->ndeleted, xlrec->nupdated);
+
 break;
 			}
 		case XLOG_BTREE_MARK_PAGE_HALFDEAD:
 			{
 xl_btree_mark_page_halfdead *xlrec = (xl_btree_mark_page_halfdead *) rec;
 
-appendStringInfo(buf, "topparent %u; leaf %u; left %u; right %u",
+appendStringInfo

Re: Disable rdns for Kerberos tests

2023-04-07 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Push, thanks again!
> 
> Why'd you only change HEAD?  Isn't the test equally fragile in the
> back branches?

Back-patched.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Disable rdns for Kerberos tests

2023-04-07 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > Push, thanks again!
> > 
> > Why'd you only change HEAD?  Isn't the test equally fragile in the
> > back branches?
> 
> Following on from this after some additional cross-platform testing,
> turns out there's other options we should be disabling in these tests to
> avoid depending on DNS for the test.
> 
> Attached is another patch which, for me at least, seems to prevent the
> tests from causing any DNS requests to happen.  This also means that the
> tests run in a reasonable time even in cases where DNS is entirely
> broken (the resolver set in /etc/resolv.conf doesn't respond).
> 
> Barring objections, my plan is to commit this change soon and to
> back-patch both patches to supported branches.

Done.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Kerberos delegation support in libpq and postgres_fdw

2023-04-07 Thread Stephen Frost
Greetings,

* David Christensen (da...@pgguru.net) wrote:
> Ok, based on the interdiff there, I'm happy with that last change.  Marking
> as Ready For Committer.

Great, thanks!

I'm going to go through it again myself but I feel reasonably good about
it and if nothing else pops and there aren't objections, I'll push it
before feature freeze.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Show various offset arrays for heap WAL records

2023-04-07 Thread Peter Geoghegan
On Fri, Apr 7, 2023 at 4:21 PM Melanie Plageman
 wrote:
> It's come to my attention that I forgot to include the btree patch earlier.

Pushed that one too.

Also removed the use of the "restrict" keyword here.

Thanks
--
Peter Geoghegan




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Andrew Dunstan


On 2023-04-07 Fr 19:14, Daniel Gustafsson wrote:

On 8 Apr 2023, at 00:59, Daniel Gustafsson  wrote:


On 8 Apr 2023, at 00:35, Tom Lane  wrote:

Daniel Gustafsson  writes:

On 7 Apr 2023, at 23:01, Daniel Gustafsson  wrote:

Staring at this I've been unable to figure out if there is an underlying problem
here or a flaky testrun, since I can't reproduce it.  Maybe the animal owner
(on cc) has some insights?
The test has passed on several different platforms in the buildfarm, including
Linux, Solaris, macOS, NetBSD, FreeBSD and other OpenBSD animals.  It also
passed in an OpenBSD VM running with our Cirrus framework.

prion and mantid have now failed with the same symptom.  I don't
see a pattern, but it's not OpenBSD-only.  It will be interesting
to see if the failure is intermittent or not on those animals.

morepork has failed again, which is good, since intermittent failures are
harder to track down.


It would be interesting to know how far in the pumped input they get, if they
time out on the first one with nothing going through?  Will investigate further
tomorrow to see.

Actually, one quick datapoint.  prion and mantid report running IPC::Run
version 0.92, and morepork 0.96.  Animals that pass are running 20180523.0,
20200505.0, 20220807.0 or similar versions.  We don't print the IO::Pty version
during configure, but maybe this is related to older versions of the modules,
and these tests (not all of them, apparently) need to SKIP if IO::Pty is missing
or too old?  Somewhere to start looking at the very least.



Those aren't CPAN version numbers. See 


prion was running 1.10 (dated to 2010). I have just updated it to 1.17 
(the CPAN latest). We'll see if that makes a difference.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2023-04-07 Thread Michael Paquier
On Sat, Apr 08, 2023 at 01:32:22AM +1200, Thomas Munro wrote:
> I'm hoping to get just the regex changes in ASAP, and then take a
> little bit longer on the recovery conflict patch itself (v6-0005) on
> the basis that it's bugfix work and not subject to the feature freeze.

Agreed.  It would be good to check with the RMT, but as long as that's
not at the middle/end of the beta cycle I guess that's OK for this
one, even if it is only for HEAD.
--
Michael


signature.asc
Description: PGP signature


Re: Track IO times in pg_stat_io

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 12:17:38 -0400, Melanie Plageman wrote:
> Attached v9 addresses review feedback as well as resolving merge
> conflicts with recent relation extension patchset.

I've edited it a bit more:

- removed pgstat_tracks_io_time() and replaced it by returning the new
  IO_COL_INVALID = -1 from pgstat_get_io_time_index() when there's no time

- moved PgStat_Counter count, time into the respective branches. It feels
  somewhat wrong to access the time when we then decide there is no time.

- s/io_object/io_obj/ in pgstat_count_io_op_time(), combined with added
  linebreaks, got the code to under 80 chars

- renamed pg_stat_microseconds_to_milliseconds to pg_stat_us_to_ms

- removed a spurious newline

- the times reported by pg_stat_io had their fractional part removed, due to
  pg_stat_us_to_ms returning an integer


Verifying this, I saw that the write time visible in pg_stat_io didn't quite
match what I saw in log_checkpoints. But not always. Eventually I figured out
that that's not pg_stat_io's fault - log_checkpoint's write includes a lot of
things, including several other CheckPoint* routines, flushing WAL, asking the
kernel to flush things to disk...  The biggest portion in my case were the
smgrwriteback() calls - which pg_stat_io doesn't track - oops.

Pushed up to and including 0003.


> I've changed pgstat_count_io_op_time() to take a count and call
> pgstat_count_io_op_n() so it can be used with smgrzeroextend(). I do
> wish that the parameter to pgstat_count_io_op_n() was called "count" and
> not "cnt"...

Heh.


> I've also reordered the call site of pgstat_count_io_op_time() in a few
> locations, but I have some questions about this.
> 
> Before, I didn't think it mattered much that we didn't finish counting
> IO time until after setting BM_VALID or BM_DIRTY and unsetting
> BM_IO_IN_PROGRESS. With the relation extension code doing this for many
> buffers at once, though, I wondered if this will make the IO timing too
> inaccurate.

> As such, I've moved pgstat_count_io_op_time() to before we set those
> flags in all locations. I did wonder if it is bad to prolong having the
> buffer pinned and not having those flags set, though.

I went back and forth about this before. I think it's ok the way you did it.


I think 0004 needs a bit more work. At the very least we would have to swap
the order of pgstat_flush_pending_entries() and pgstat_flush_io() - entirely
doable. Unlike 0003, this doesn't make pg_stat_io more complete, or such, so
I'm inclined to leave it for 17.  I think there might be some more
opportunities for having counts "flow down", like the patch does.

Greetings,

Andres Freund




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2023-04-07 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Apr 08, 2023 at 01:32:22AM +1200, Thomas Munro wrote:
>> I'm hoping to get just the regex changes in ASAP, and then take a
>> little bit longer on the recovery conflict patch itself (v6-0005) on
>> the basis that it's bugfix work and not subject to the feature freeze.

> Agreed.  It would be good to check with the RMT, but as long as that's
> not at the middle/end of the beta cycle I guess that's OK for this
> one, even if it is only for HEAD.

Right.  regex changes pass an eyeball check here.

regards, tom lane




Re: monitoring usage count distribution

2023-04-07 Thread Nathan Bossart
On Fri, Apr 07, 2023 at 02:29:31PM -0400, Tom Lane wrote:
> I'm not sure if there is consensus for 0002, but I reviewed and pushed
> 0001.  I made one non-cosmetic change: it no longer skips invalid
> buffers.  Otherwise, the row for usage count 0 would be pretty useless.
> Also it seemed to me that sum(buffers) ought to agree with the
> shared_buffers setting.

Makes sense.  Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: cataloguing NOT NULL constraints

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 18:26:28 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-04-07 17:46:33 -0400, Tom Lane wrote:
> >> After quickly eyeing the diffs, I'm just going to take the new output
> >> as good.  I'm not surprised that there are additional output messages
> >> given the additional catalog entries this made.  I *am* a bit surprised
> >> that some messages seem to have disappeared --- are there places where
> >> this resulted in fewer catalog accesses than before?  Nonetheless,
> >> there's no good reason to assume this test is exposing any bugs.
> 
> > I wonder if the issue is that the new paths miss a hook invocation.
> 
> Perhaps.  I'm content to silence the buildfarm for today; we can
> investigate more closely later.

Makes sense.

I think
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-04-07%2021%3A16%3A04
might point out a problem with the pg_dump or pg_upgrade backward compat
paths:

--- C:\\prog\\bf/root/upgrade.drongo/HEAD/origin-REL9_5_STABLE.sql.fixed	2023-04-07 23:51:27.641328600 +
+++ C:\\prog\\bf/root/upgrade.drongo/HEAD/converted-REL9_5_STABLE-to-HEAD.sql.fixed	2023-04-07 23:51:27.672571900 +
@@ -416,9 +416,9 @@
 -- Name: entry; Type: TABLE; Schema: public; Owner: buildfarm
 --
 CREATE TABLE public.entry (
-accession text,
-eid integer,
-txid smallint
+accession text NOT NULL,
+eid integer NOT NULL,
+txid smallint NOT NULL
 );
 ALTER TABLE public.entry OWNER TO buildfarm;
 --

Looks like we're making up NOT NULL constraints when migrating from 9.5, for
some reason?

Greetings,

Andres Freund




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Tom Lane
Andrew Dunstan  writes:
>> Actually, one quick datapoint.  prion and mantid report running IPC::Run
>> version 0.92, and morepork 0.96.  Animals that pass are running 20180523.0,
>> 20200505.0, 20220807.0 or similar versions.  We don't print the IO::Pty
>> version during configure, but maybe this is related to older versions of the
>> modules, and these tests (not all of them, apparently) need to SKIP if
>> IO::Pty is missing or too old?  Somewhere to start looking at the very least.

> prion was running 1.10 (dated to 2010). I have just updated it to 1.17 
> (the CPAN latest). We'll see if that makes a difference.

I've been doing some checking with perlbrew locally.  It appears to not
be about IO::Pty so much as IPC::Run: it works with IPC::Run 0.99 but
not 0.79.  Still bisecting to identify exactly what's the minimum
okay version.

regards, tom lane




Re: cataloguing NOT NULL constraints

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 17:19:42 -0700, Andres Freund wrote:
> I think
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-04-07%2021%3A16%3A04
> might point out a problem with the pg_dump or pg_upgrade backward compat
> paths:
> 
> --- C:\\prog\\bf/root/upgrade.drongo/HEAD/origin-REL9_5_STABLE.sql.fixed	2023-04-07 23:51:27.641328600 +
> +++ C:\\prog\\bf/root/upgrade.drongo/HEAD/converted-REL9_5_STABLE-to-HEAD.sql.fixed	2023-04-07 23:51:27.672571900 +
> @@ -416,9 +416,9 @@
>  -- Name: entry; Type: TABLE; Schema: public; Owner: buildfarm
>  --
>  CREATE TABLE public.entry (
> -accession text,
> -eid integer,
> -txid smallint
> +accession text NOT NULL,
> +eid integer NOT NULL,
> +txid smallint NOT NULL
>  );
>  ALTER TABLE public.entry OWNER TO buildfarm;
>  --
> 
> Looks like we're making up NOT NULL constraints when migrating from 9.5, for
> some reason?

My compiler complains:

../../../../home/andres/src/postgresql/src/backend/catalog/heap.c: In function ‘AddRelationNotNullConstraints’:
../../../../home/andres/src/postgresql/src/backend/catalog/heap.c:2829:37: warning: ‘conname’ may be used uninitialized [-Wmaybe-uninitialized]
 2829 | if (strcmp(lfirst(lc2), conname) == 0)
      | ^~~~
../../../../home/andres/src/postgresql/src/backend/catalog/heap.c:2802:29: note: ‘conname’ was declared here
 2802 | char   *conname;
      | ^~~

I think the compiler may be right - I think the first use of conname might
have been intended as constr->conname?

Greetings,

Andres Freund




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Tom Lane
I wrote:
> I've been doing some checking with perlbrew locally.  It appears to not
> be about IO::Pty so much as IPC::Run: it works with IPC::Run 0.99 but
> not 0.79.  Still bisecting to identify exactly what's the minimum
> okay version.

The answer is: it works with IPC::Run >= 0.98.  The version of IO::Pty
doesn't appear significant; it works at least back to 1.00 from early
2002.

IPC::Run 0.98 is relatively new (2018), so I don't think it'd fly
to make that our new minimum version across-the-board.  I recommend
just setting up this one test to SKIP if IPC::Run is too old.

regards, tom lane




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 20:49:39 -0400, Tom Lane wrote:
> I wrote:
> > I've been doing some checking with perlbrew locally.  It appears to not
> > be about IO::Pty so much as IPC::Run: it works with IPC::Run 0.99 but
> > not 0.79.  Still bisecting to identify exactly what's the minimum
> > okay version.
> 
> The answer is: it works with IPC::Run >= 0.98.  The version of IO::Pty
> doesn't appear significant; it works at least back to 1.00 from early
> 2002.
> 
> IPC::Run 0.98 is relatively new (2018), so I don't think it'd fly
> to make that our new minimum version across-the-board.  I recommend
> just setting up this one test to SKIP if IPC::Run is too old.

Does the test actually take a while before it fails, or is it quick? It's
possible the failure is caused by 001_password.pl's use of
set_query_timer_restart(). I don't think other tests do something quite
comparable.

Greetings,

Andres Freund




Re: Making background psql nicer to use in tap tests

2023-04-07 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-07 20:49:39 -0400, Tom Lane wrote:
>> IPC::Run 0.98 is relatively new (2018), so I don't think it'd fly
>> to make that our new minimum version across-the-board.  I recommend
>> just setting up this one test to SKIP if IPC::Run is too old.

> Does the test actually take a while before it fails, or is it quick?

It times out at whatever your PG_TEST_TIMEOUT_DEFAULT is.  I waited
3 minutes the first time, and then reduced that to 20sec for the
rest of the tries ...

regards, tom lane




Re: daitch_mokotoff module

2023-04-07 Thread Tom Lane
I wrote:
> That seems fine to me.  I'll check this over and see if I can get
> it pushed today.

I pushed this after some mostly-cosmetic fiddling.  Most of the
buildfarm seems okay with it, but crake's perlcritic run is not:

./contrib/fuzzystrmatch/daitch_mokotoff_header.pl: I/O layer ":utf8" used at line 15, column 5.  Use ":encoding(UTF-8)" to get strict validation.  ([InputOutput::RequireEncodingWithUTF8Layer] Severity: 5)

Any suggestions on exactly how to pacify that?

regards, tom lane




check_GUC_init(wal_writer_flush_after) fails with non-default block size

2023-04-07 Thread Thomas Munro
Hi,

If you build with --with-wal-blocksize=/-Dwal_blocksize= anything but
8, this breaks:

running bootstrap script ... LOG:  GUC (PGC_INT) wal_writer_flush_after, boot_val=256, C-var=128
TRAP: failed Assert("check_GUC_init(hentry->gucvar)"), File: "guc.c", Line: 1519, PID: 84605
From 48d971e0b19f770991e334b8dc38422462b4485e Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 8 Apr 2023 13:12:48 +1200
Subject: [PATCH] Fix default wal_writer_flush_after value.

Commit a73952b7956 requires default values in guc_table.c and C variable
initializers to match.  This one only matched for XLOG_BLCKSZ == 8kB.
Fix by using the same expression in both places with a new DEFAULT_XXX
macro, as done for other GUCs.

diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 65e84be39b..266fbc2339 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -68,7 +68,7 @@
  * GUC parameters
  */
 int			WalWriterDelay = 200;
-int			WalWriterFlushAfter = 128;
+int			WalWriterFlushAfter = DEFAULT_WAL_WRITER_FLUSH_AFTER;
 
 /*
  * Number of do-nothing loops before lengthening the delay time, and the
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index cf7f465ddb..916f6e2cfa 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2778,7 +2778,7 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_XBLOCKS
 		},
 		&WalWriterFlushAfter,
-		(1024 * 1024) / XLOG_BLCKSZ, 0, INT_MAX,
+		DEFAULT_WAL_WRITER_FLUSH_AFTER, 0, INT_MAX,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/include/postmaster/walwriter.h b/src/include/postmaster/walwriter.h
index 22281a97ba..af25cf0025 100644
--- a/src/include/postmaster/walwriter.h
+++ b/src/include/postmaster/walwriter.h
@@ -16,6 +16,8 @@
 extern PGDLLIMPORT int WalWriterDelay;
 extern PGDLLIMPORT int WalWriterFlushAfter;
 
+#define DEFAULT_WAL_WRITER_FLUSH_AFTER ((1024 * 1024) / XLOG_BLCKSZ)
+
 extern void WalWriterMain(void) pg_attribute_noreturn();
 
 #endif			/* _WALWRITER_H */
-- 
2.40.0



Re: daitch_mokotoff module

2023-04-07 Thread Andres Freund
Hi,

On 2023-04-07 21:13:43 -0400, Tom Lane wrote:
> I wrote:
> > That seems fine to me.  I'll check this over and see if I can get
> > it pushed today.
> 
> I pushed this after some mostly-cosmetic fiddling.  Most of the
> buildfarm seems okay with it, but crake's perlcritic run is not:
> 
> ./contrib/fuzzystrmatch/daitch_mokotoff_header.pl: I/O layer ":utf8" used at line 15, column 5.  Use ":encoding(UTF-8)" to get strict validation.  ([InputOutput::RequireEncodingWithUTF8Layer] Severity: 5)
> 
> Any suggestions on exactly how to pacify that?

You could follow its advice and replace the :utf8 with :encoding(UTF-8); that
works here.  Or disable it in that piece of code with ## no critic
(RequireEncodingWithUTF8Layer).  Or we could disable the warning in
perlcriticrc for all files?

Unless it's not available with old versions, using :encoding(UTF-8) seems
sensible?

Greetings,

Andres Freund



