Re: ExecRTCheckPerms() and many prunable partitions (sqlsmith)

2023-02-13 Thread Amit Langote
On Mon, Feb 13, 2023 at 5:07 Justin Pryzby  wrote:

> On Tue, Nov 29, 2022 at 10:37:56PM +0900, Amit Langote wrote:
> > 0002 contains changes that have to do with changing how we access
> > checkAsUser in some foreign table planning/execution code sites.
> > Thought it might be better to describe it separately too.
>
> This was committed as 599b33b94:
> Stop accessing checkAsUser via RTE in some cases
>
> That seems to add various elog()s which are hit frequently by sqlsmith:
>
> postgres=# select from
> (select transaction
> from pg_prepared_xacts
> right join pg_available_extensions
> on false limit 0) where false;
> ERROR:  permission info at index 2 (with relid=1262) does not match
> provided RTE (with relid=12081)
>
> postgres=# select from (select confl_tablespace
> from pg_stat_database_conflicts
> where datname <> (select 'af')
> limit 1) where false;
> ERROR:  invalid perminfoindex 1 in RTE with relid 12271


Thanks for the report.  I’ll take a look once I’m back at a computer in a
few days.

> --
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: Rework LogicalOutputPluginWriterUpdateProgress

2023-02-13 Thread Amit Kapila
On Sat, Feb 11, 2023 at 2:34 AM Andres Freund  wrote:
>
> On 2023-02-09 11:21:41 +0530, Amit Kapila wrote:
> > On Thu, Feb 9, 2023 at 1:33 AM Andres Freund  wrote:
> > >
> > > Hacking on a rough prototype how I think this should rather look, I had a 
> > > few
> > > questions / remarks:
> > >
> > > - We probably need to call UpdateProgress from a bunch of places in 
> > > decode.c
> > >   as well? Indicating that we're lagging by a lot, just because all
> > >   transactions were in another database seems decidedly suboptimal.
> > >
> >
> > We can do that, but I think in all those cases we will get back to the
> > walsender logic (WalSndLoop - which will send a keepalive if required)
> > quickly enough that we don't need to worry. After processing each
> > record, the logic returns to the main loop, which will send a keepalive
> > if required.
>
> For keepalive processing yes, for syncrep and accurate lag tracking, I don't
> think that suffices?  We could do that in WalSndLoop() instead I guess, but
> we'd have more information about when that's useful in decode.c.
>

Yeah, I think one possibility to address that is to call
update_progress() in DecodeCommit() and friends when we need to skip
the xact. We decide that in DecodeTXNNeedSkip. In the checks in that
function, I am not sure whether we need to call it for the case where
we skip the xact because we decide that it was previously decoded.

>
> > The patch calls update_progress in change_cb_wrapper and other
> > wrappers which will miss the case of DDLs that generates a lot of data
> > that is not processed by the plugin. I think for that we either need
> > to call update_progress from reorderbuffer.c similar to what the patch
> > has removed or we need some other way to address it. Do you have any
> > better idea?
>
> I don't mind calling something like update_progress() in the specific cases
> that's needed, but I think those are just the
>   if (!RelationIsLogicallyLogged(relation))
>   if (relation->rd_rel->relrewrite && !rb->output_rewrites))
>
> To me it makes a lot more sense to call update_progress() for those, rather
> than generally.
>

Won't it be better to call it wherever we don't invoke any wrapper
function like for cases REORDER_BUFFER_CHANGE_INVALIDATION, sequence
changes, etc.? I was thinking that wherever we don't call the wrapper
function which means we don't have a chance to invoke
update_progress(), the timeout can happen if there are a lot of such
messages.

>
> I think, independent of the update_progress calls, it'd be worth investing a
> bit of time into optimizing those cases, so that we don't put the changes into
> the reorderbuffer in the first place.  I think we can find space for two flag
> bits to identify the cases in the WAL, rather than needing to access the
> catalog to figure it out.  If we don't find space, we could add an annotation
> to the WAL record (making it bigger) for the two cases, because they're not
> the path most important to optimize.
>
>
>
> > > - Why should lag tracking only be updated at commit like points? That 
> > > seems
> > >   like it adds odd discontinuities?
> > >
> >
> > We have previously experimented to call it from non-commit locations
> > but that turned out to give inaccurate information about Lag. See
> > email [1].
>
> That seems like an issue with WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS, not with
> reporting something more frequently.  ISTM that
> WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS just isn't a good proxy for when to
> update lag reporting for records that don't strictly need it. I think that
> decision should be made based on the LSN, and be deterministic.
>
>
> > > - Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(),
> > >   WalSndWriteData() missing wal_sender_timeout <= 0 checks?
> > >
> >
> > It seems we are checking that via
> > ProcessPendingWrites()->WalSndKeepaliveIfNecessary(). Do you think we
> > need to check it before as well?
>
> Either we don't need the precheck at all, or we should do it reliably. Right
> now we'll have a higher overhead / some behavioural changes, if
> wal_sender_timeout is disabled. That doesn't make sense.
>

Fair enough, we can probably do it earlier.

>
> > How about renaming ProcessPendingWrites to WaitToSendPendingWrites or
> > WalSndWaitToSendPendingWrites?
>
> I don't like those much:
>
> We're not really waiting for the data to be sent or such, we just want to give
> it to the kernel to be sent out. Contrast that to WalSndWaitForWal, where we
> actually are waiting for something to complete.
>
> I don't think 'write' is a great description either, although our existing
> terminology is somewhat muddled. We're repeatedly calling pq_flush() until
> !pq_is_send_pending().
>
> WalSndSendPending() or WalSndFlushPending()?
>

Either of those sounds fine.

-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-13 Thread Peter Smith
Here are my review comments for the v34 patch.

==
src/backend/replication/logical/worker.c

+/* The last time we send a feedback message */
+static TimestampTz send_time = 0;
+

IMO this is a bad variable name. When this variable was changed to be
global it ought to have been renamed.

The name "send_time" is almost meaningless without any contextual information.

But it's also bad because this global name is "shadowed" by several
parameters and other local variables using that same name (e.g.
see UpdateWorkerStats, LogicalRepApplyLoop, etc). It is too confusing.

How about using a unique/meaningful name with a comment to match to
improve readability and remove unwanted shadowing?

SUGGESTION
/* Timestamp of when the last feedback message was sent. */
static TimestampTz last_sent_feedback_ts = 0;

~~~

2. maybe_apply_delay

+ /* Apply the delay by the latch mechanism */
+ do
+ {
+ TimestampTz delayUntil;
+ long diffms;
+
+ ResetLatch(MyLatch);
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* This might change wal_receiver_status_interval */
+ if (ConfigReloadPending)
+ {
+ ConfigReloadPending = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ }
+
+ /*
+ * Before calculating the time duration, reload the catalog if needed.
+ */
+ if (!in_remote_transaction && !in_streamed_transaction)
+ {
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ delayUntil = TimestampTzPlusMilliseconds(finish_ts,
MySubscription->minapplydelay);
+ diffms = TimestampDifferenceMilliseconds(GetCurrentTimestamp(), delayUntil);
+
+ /*
+ * Exit without arming the latch if it's already past time to apply
+ * this transaction.
+ */
+ if (diffms <= 0)
+ break;
+
+ elog(DEBUG2, "time-delayed replication for txid %u, min_apply_delay
= %d ms, remaining wait time: %ld ms",
+ xid, MySubscription->minapplydelay, diffms);
+
+ /*
+ * Call send_feedback() to prevent the publisher from exiting by
+ * timeout during the delay, when the status interval is greater than
+ * zero.
+ */
+ if (!status_interval_ms)
+ {
+ TimestampTz nextFeedback;
+
+ /*
+ * Based on the last time when we send a feedback message, adjust
+ * the first delay time for this transaction. This ensures that
+ * the first feedback message follows wal_receiver_status_interval
+ * interval.
+ */
+ nextFeedback = TimestampTzPlusMilliseconds(send_time,
+wal_receiver_status_interval * 1000L);
+ status_interval_ms =
TimestampDifferenceMilliseconds(GetCurrentTimestamp(), nextFeedback);
+ }
+ else
+ status_interval_ms = wal_receiver_status_interval * 1000L;
+
+ if (status_interval_ms > 0 && diffms > status_interval_ms)
+ {
+ WaitLatch(MyLatch,
+   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+   status_interval_ms,
+   WAIT_EVENT_LOGICAL_APPLY_DELAY);
+ send_feedback(last_received, true, false, true);
+ }
+ else
+ WaitLatch(MyLatch,
+   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+   diffms,
+   WAIT_EVENT_LOGICAL_APPLY_DELAY);
+
+ } while (true);

~

IMO this logic has been tweaked too many times without revisiting the
variable names and logic from scratch, so it has become over-complex
- some variable names are assuming multiple meanings
- multiple * 1000L have crept back in again
- the 'diffms' is too generic now with so many vars so it has lost its meaning
- GetCurrentTimestamp call in multiple places

SUGGESTIONS
- rename some variables and simplify the logic.
- reduce all the if/else
- don't be sneaky with the meaning of status_interval_ms
- 'diffms' --> 'remaining_delay_ms'
- 'DelayUntil' --> 'delay_until_ts'
- introduce 'now' variable
- simplify the check of (next_feedback_due_ms < remaining_delay_ms)

SUGGESTION (WFM)

/* Apply the delay by the latch mechanism */
while (true)
{
TimestampTz now;
TimestampTz delay_until_ts;
long remaining_delay_ms;
long status_interval_ms;

ResetLatch(MyLatch);

CHECK_FOR_INTERRUPTS();

/* This might change wal_receiver_status_interval */
if (ConfigReloadPending)
{
ConfigReloadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}

/*
* Before calculating the time duration, reload the catalog if needed.
*/
if (!in_remote_transaction && !in_streamed_transaction)
{
AcceptInvalidationMessages();
maybe_reread_subscription();
}

now = GetCurrentTimestamp();
delay_until_ts = TimestampTzPlusMilliseconds(finish_ts,
MySubscription->minapplydelay);
remaining_delay_ms = TimestampDifferenceMilliseconds(now, delay_until_ts);

/*
* Exit without arming the latch if it's already past time to apply
* this transaction.
*/
if (remaining_delay_ms <= 0)
break;

elog(DEBUG2, "time-delayed replication for txid %u, min_apply_delay =
%d ms, remaining wait time: %ld ms",
xid, MySubscription->minapplydelay, remaining_delay_ms);
/*
* If a status interval is defined then we may need to call send_feedback()
* early to prevent the publisher from exiting during a long apply delay.
*/
status_interval_ms = wal_receiver_status_interval * 1000L;
if (status_interval_ms > 0)
{
TimestampTz next_feedback_due_ts;
long next_feedback_due_ms;

/*
* Find if the next feedback i

Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-02-13 Thread Drouvot, Bertrand

Hi,

On 2/13/23 8:40 AM, Kyotaro Horiguchi wrote:

At Mon, 13 Feb 2023 08:09:50 +0100, "Drouvot, Bertrand" 
 wrote in

I think this is useful beyond being able to generate those functions
with
macros. The fact that we had to deal with transactional code in
pgstatfuncs.c
meant that a lot of the relevant internals had to be exposed "outside"
pgstat,
which seems architecturally not great.


Right, good point.


Agreed.


Removing the pfrees in V2 attached.


Ah, that sounds good.

if (!entry_ref)
+   {
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, 
InvalidOid, rel_id);
+   return tablestatus;
+   }

Shouldn't we return something if the call returns a non-null value?


What we do is: if entry_ref is NULL then we return NULL (so that the caller 
returns 0).

If entry_ref is not NULL then we return a copy of entry_ref->pending (with or 
without subtrans).



So, since we want to hide the internals from pgstatfuncs, the
additional flag should be gone. 


I think there are pros and cons to both, but I don't have a strong opinion
about it.

So I'm also proposing V3, attached, without the flag in case that is the
preferred approach.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index f793ac1516..2e22d190bf 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -477,14 +477,35 @@ PgStat_TableStatus *
 find_tabstat_entry(Oid rel_id)
 {
PgStat_EntryRef *entry_ref;
+   PgStat_TableXactStatus *trans;
+   PgStat_TableStatus *tabentry = NULL;
+   PgStat_TableStatus *tablestatus = NULL;
 
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, 
MyDatabaseId, rel_id);
if (!entry_ref)
+   {
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, 
InvalidOid, rel_id);
+   return tablestatus;
+   }
+
+   tabentry = (PgStat_TableStatus *) entry_ref->pending;
+   tablestatus = palloc(sizeof(PgStat_TableStatus));
+   *tablestatus = *tabentry;
+
+   /*
+* Live subtransactions' counts aren't in t_counts yet. This is not a hot
+* code path so it sounds ok to reconcile for tuples_inserted,
+* tuples_updated and tuples_deleted even if this is not what the caller
+* is interested in.
+*/
+   for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
+   {
+   tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
+   tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
+   tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
+   }
 
-   if (entry_ref)
-   return entry_ref->pending;
-   return NULL;
+   return tablestatus;
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c 
b/src/backend/utils/adt/pgstatfuncs.c
index 924698e6ae..c37d521f08 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1407,17 +1407,11 @@ pg_stat_get_xact_tuples_inserted(PG_FUNCTION_ARGS)
Oid relid = PG_GETARG_OID(0);
int64   result;
PgStat_TableStatus *tabentry;
-   PgStat_TableXactStatus *trans;
 
if ((tabentry = find_tabstat_entry(relid)) == NULL)
result = 0;
else
-   {
-   result = tabentry->t_counts.t_tuples_inserted;
-   /* live subtransactions' counts aren't in t_tuples_inserted yet */
-   for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
-   result += trans->tuples_inserted;
-   }
+   result = (int64) (tabentry->t_counts.t_tuples_inserted);
 
PG_RETURN_INT64(result);
 }
@@ -1428,17 +1422,11 @@ pg_stat_get_xact_tuples_updated(PG_FUNCTION_ARGS)
Oid relid = PG_GETARG_OID(0);
int64   result;
PgStat_TableStatus *tabentry;
-   PgStat_TableXactStatus *trans;
 
if ((tabentry = find_tabstat_entry(relid)) == NULL)
result = 0;
else
-   {
-   result = tabentry->t_counts.t_tuples_updated;
-   /* live subtransactions' counts aren't in t_tuples_updated yet */
-   for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
-   result += trans->tuples_updated;
-   }
+   result = (int64) (tabentry->t_counts.t_tuples_updated);
 
PG_RETURN_INT64(result);
 }
@@ -1449,17 +1437,11 @@ pg_stat_get_xact_tuples_deleted(PG_FUNCTION_ARGS)
Oid relid = PG_GETARG_OID(0);
int64   result;
PgStat_TableStatus *tabentry;
-   PgStat_TableXactStatus *trans;
 
if ((tabentry = f

Get rid of PgStat_BackendFunctionEntry

2023-02-13 Thread Drouvot, Bertrand

Hi hackers,

Please find attached a patch proposal to $SUBJECT: it would allow to simplify
even more the work done to generate pg_stat_get_xact*() functions with Macros.

This is a follow up for [1] where it has been suggested
to get rid of PgStat_BackendFunctionEntry in a separate patch.

Looking forward to your feedback,

Regards

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

[1]: 
https://www.postgresql.org/message-id/20230210214619.bdpbd5wvxcpx27rw%40awork3.anarazel.de

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 60fc4e761f..b125802b21 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -292,7 +292,7 @@ static const PgStat_KindInfo 
pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
-   .pending_size = sizeof(PgStat_BackendFunctionEntry),
+   .pending_size = sizeof(PgStat_FunctionCounts),
 
.flush_pending_cb = pgstat_function_flush_cb,
},
diff --git a/src/backend/utils/activity/pgstat_function.c 
b/src/backend/utils/activity/pgstat_function.c
index 6139a27fee..0fdcefb783 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -73,7 +73,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
   PgStat_FunctionCallUsage 
*fcu)
 {
PgStat_EntryRef *entry_ref;
-   PgStat_BackendFunctionEntry *pending;
+   PgStat_FunctionCounts *pending;
boolcreated_entry;
 
if (pgstat_track_functions <= fcinfo->flinfo->fn_stats)
@@ -121,10 +121,10 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 
pending = entry_ref->pending;
 
-   fcu->fs = &pending->f_counts;
+   fcu->fs = pending;
 
/* save stats for this function, later used to compensate for recursion 
*/
-   fcu->save_f_total_time = pending->f_counts.f_total_time;
+   fcu->save_f_total_time = pending->f_total_time;
 
/* save current backend-wide total time */
fcu->save_total = total_func_time;
@@ -192,10 +192,10 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, 
bool finalize)
 bool
 pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 {
-   PgStat_BackendFunctionEntry *localent;
+   PgStat_FunctionCounts *localent;
PgStatShared_Function *shfuncent;
 
-   localent = (PgStat_BackendFunctionEntry *) entry_ref->pending;
+   localent = (PgStat_FunctionCounts *) entry_ref->pending;
shfuncent = (PgStatShared_Function *) entry_ref->shared_stats;
 
/* localent always has non-zero content */
@@ -203,11 +203,11 @@ pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool 
nowait)
if (!pgstat_lock_entry(entry_ref, nowait))
return false;
 
-   shfuncent->stats.f_numcalls += localent->f_counts.f_numcalls;
+   shfuncent->stats.f_numcalls += localent->f_numcalls;
shfuncent->stats.f_total_time +=
-   INSTR_TIME_GET_MICROSEC(localent->f_counts.f_total_time);
+   INSTR_TIME_GET_MICROSEC(localent->f_total_time);
shfuncent->stats.f_self_time +=
-   INSTR_TIME_GET_MICROSEC(localent->f_counts.f_self_time);
+   INSTR_TIME_GET_MICROSEC(localent->f_self_time);
 
pgstat_unlock_entry(entry_ref);
 
@@ -215,11 +215,11 @@ pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool 
nowait)
 }
 
 /*
- * find any existing PgStat_BackendFunctionEntry entry for specified function
+ * find any existing PgStat_FunctionCounts entry for specified function
  *
  * If no entry, return NULL, don't create a new one
  */
-PgStat_BackendFunctionEntry *
+PgStat_FunctionCounts *
 find_funcstat_entry(Oid func_id)
 {
PgStat_EntryRef *entry_ref;
diff --git a/src/backend/utils/adt/pgstatfuncs.c 
b/src/backend/utils/adt/pgstatfuncs.c
index 9d707c3521..9819afa462 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1654,33 +1654,33 @@ Datum
 pg_stat_get_xact_function_calls(PG_FUNCTION_ARGS)
 {
Oid funcid = PG_GETARG_OID(0);
-   PgStat_BackendFunctionEntry *funcentry;
+   PgStat_FunctionCounts *funcentry;
 
if ((funcentry = find_funcstat_entry(funcid)) == NULL)
PG_RETURN_NULL();
-   PG_RETURN_INT64(funcentry->f_counts.f_numcalls);
+   PG_RETURN_INT64(funcentry->f_numcalls);
 }
 
 Datum
 pg_stat_get_xact_function_total_time(PG_FUNCTION_ARGS)
 {
Oid funcid = PG_GETARG_OID(0);
-   PgStat_BackendFunctionEntry *funcentry;
+   PgStat_FunctionCounts *funcentry;
 
if ((funcentry = find_funcstat

Re: Support logical replication of DDLs

2023-02-13 Thread Alvaro Herrera
On 2023-Feb-06, Peter Smith wrote:

> I thought this comment was analogous to another one from this same
> patch 0001 (see seclabel.c), so the suggested change above was simply
> to make the wording consistent.
> 
> @@ -134,6 +134,9 @@ ExecSecLabelStmt(SecLabelStmt *stmt)
>   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>   errmsg("must specify provider when multiple security label providers
> have been loaded")));
>   provider = (LabelProvider *) linitial(label_provider_list);
> +
> + /* Copy the provider name to the parsetree, needed for DDL deparsing
> of SecLabelStmt */
> + stmt->provider = pstrdup(provider->provider_name);
> 
> So if the suggestion for the ExecuteGrantStmt comment was a mistake
> then perhaps the ExecSecLabelStmt comment is wrong also?

Well, here the patch would have us modifying a parse tree node, which is
probably not a good thing to do.  I don't remember whether I coded the
deparsing of any other object type this way, but nowadays modifying
parse trees is generally frowned upon.  Most likely, this would have to
be done some other way.  Maybe set the provider as secondary object
address for the command.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La vida es para el que se aventura"




Re: Consolidate ItemPointer to Datum conversion functions

2023-02-13 Thread Peter Eisentraut

On 09.02.23 09:33, Peter Eisentraut wrote:

On 06.02.23 11:11, Heikki Linnakangas wrote:

On 06/02/2023 11:54, Peter Eisentraut wrote:

Instead of defining the same set of macros several times, define it once
in an appropriate header file.  In passing, convert to inline functions.


Looks good to me. Did you consider moving PG_GETARG_ITEMPOINTER and 
PG_RETURN_ITEMPOINTER, too? They're only used in tid.c, but for most 
datatypes, we define the PG_GETARG and PG_RETURN macros in the same 
header file as the Datum conversion functions.


Yeah that makes sense.  Here is an updated patch for that.


committed




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2023-02-13 Thread Kyotaro Horiguchi
At Mon, 13 Feb 2023 10:15:03 +0530, Bharath Rupireddy 
 wrote in 
> On Sun, Feb 12, 2023 at 11:01 PM Andres Freund  wrote:
> >
> > On 2023-02-12 19:59:00 +0530, Bharath Rupireddy wrote:
> > >   /* Prepare to write out a lot of copies of our zero buffer at once. 
> > > */
> > >   for (i = 0; i < lengthof(iov); ++i)
> > >   {
> > > - iov[i].iov_base = zbuffer.data;
> > > + iov[i].iov_base = (void *) (unconstify(PGAlignedBlock *, 
> > > &zbuffer)->data);
> > >   iov[i].iov_len = zbuffer_sz;
> > >   }
> >
> > Another thing: I think we should either avoid iterating over all the IOVs if
> > we don't need them, or, even better, initialize the array as a constant, 
> > once.

FWIW, I tried to use the "{[start .. end] = {}}" trick (GNU extension?
[*1]) for constant array initialization, but individual members don't
accept assignment of a const value, so I deconstified as follows.

>   static const struct iovec   iov[PG_IOV_MAX] =
>   {[0 ... PG_IOV_MAX - 1] =
>{
>.iov_base = (void *)&zbuffer.data,
>.iov_len = BLCKSZ
>}
>   };

I didn't check the actual mapping, but when I tried the assignment
"iov[0].iov_base = NULL", it failed with "assignment of member
‘iov_base’ in read-only object", so it seems to be successfully placed in a
read-only segment?

Later code assigns iov[0].iov_len, so we need to provide a separate
non-const iov variable, or can we use pwrite there instead?  (I didn't
find a pg_pwrite_with_retry(), though.)

> How about like the attached patch? It makes the iovec static variable
> and points the zero buffer only once/for the first time to iovec. This
> avoids for-loop on every call.

As for the patch itself, it seems to forget to reset iov[0].iov_len after
writing a partial block.
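
To make the concern concrete, here is a rough sketch (not the attached
patch; the init flag and names are only illustrative) of a once-initialized
static iovec plus the reset that seems to be missing:

    /* zero-filled source block; PGAlignedBlock guarantees alignment */
    static PGAlignedBlock zbuffer;
    static struct iovec iov[PG_IOV_MAX];
    static bool iov_initialized = false;

    if (!iov_initialized)
    {
        /* point every iovec at the zero buffer exactly once */
        for (int i = 0; i < PG_IOV_MAX; i++)
        {
            iov[i].iov_base = zbuffer.data;
            iov[i].iov_len = BLCKSZ;
        }
        iov_initialized = true;
    }
    else
    {
        /* iov[0].iov_len may have been shrunk for a partial block last time */
        iov[0].iov_len = BLCKSZ;
    }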


regards.


*1: https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Killing off removed rels properly

2023-02-13 Thread Richard Guo
On Sat, Feb 11, 2023 at 4:50 AM Tom Lane  wrote:

> I think it's time to clean this up by removing the rel from the
> planner data structures altogether.  The attached passes check-world,
> and if it does trigger any problems I would say that's a clear
> sign of bugs elsewhere.


+1. The patch looks good to me.  One minor comment is that we should
also remove the comments about RELOPT_DEADREL in pathnodes.h.

 * Lastly, there is a RelOptKind for "dead" relations, which are base rels
 * that we have proven we don't need to join after all.

Thanks
Richard


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-13 Thread Amit Kapila
On Fri, Feb 10, 2023 at 4:56 PM Takamichi Osumi (Fujitsu)
 wrote:
>
> > On Friday, February 10, 2023 2:05 PM wrote:
> > On Fri, Feb 10, 2023 at 10:11 AM Amit Kapila 
> > wrote:
>
> In the previous patch, we couldn't solve the publisher timeout when running
> a scenario suggested by Horiguchi-san and reproduced in the attached test
> file 'test.sh'. But now we handle it by adjusting the timing of the first
> wait.
>
> FYI, we first thought to implement the new variable 'send_time' in the
> LogicalRepWorker structure. But this structure is used when the launcher
> controls workers or reports statistics, and it stores the TimestampTz
> recorded in the received WAL, so we are not sure the struct is the right
> place for the variable. Moreover, there are other similar variables such as
> last_recv_time or reply_time, so adding the new variable alongside them
> would be confusing. Hence, it's declared separately.
>

I think we can introduce a new variable last_feedback_time in the
LogicalRepWorker structure, and for last_received we can probably use
last_lsn in MyLogicalRepWorker, as that seems to be updated correctly.
I think it would be good to avoid global variables.

-- 
With Regards,
Amit Kapila.




Re: Rework of collation code, extensibility

2023-02-13 Thread Peter Eisentraut

On 01.02.23 00:33, Jeff Davis wrote:

On Tue, 2023-01-31 at 11:40 +0100, Peter Eisentraut wrote:

I don't know to what extent this depends on the abbreviated key GUC
discussion.  Does the rest of this patch set depend on this?


The overall refactoring is not dependent logically on the GUC patch. It
may require some trivial fixup if you eliminate the GUC patch.

I left it there because it makes exploring/testing easier (at least for
me), but the GUC patch doesn't need to be committed if there's no
consensus.


I took another closer look at the 0002 and 0003 patches.

The commit message for 0002 says "Also remove the TRUST_STRXFRM define", 
but I think this is incorrect, as that is done in the 0001 patch.


I don't like that the pg_strnxfrm() function requires these kinds of 
repetitive error checks:


+   if (rsize != bsize)
+   elog(ERROR, "pg_strnxfrm() returned unexpected result");

This could be checked inside the function itself, so that the callers 
don't have to do this themselves every time.


I don't really understand the 0003 patch.  It's a lot of churn but I'm 
not sure that it achieves more clarity or something.


The new function pg_locale_deterministic() seems sensible.  Maybe this 
could be proposed as a separate patch.


I don't understand the new header pg_locale_internal.h.  What is 
"internal" and what is not internal?  What are we hiding from whom? 
There are no code comments about this AFAICT.


pg_locale_struct has new fields

+   char   *collate;
+   char   *ctype;

that are not explained anywhere.

I think this patch would need a bit more explanation and commenting.





Re: psql - factor out echo code

2023-02-13 Thread Peter Eisentraut

On 01.12.22 08:27, Pavel Stehule wrote:
On Wed, 30 Nov 2022 at 10:43, Fabien COELHO wrote:



 >> Now some of the output generated by test_extdepend gets a bit
 >> confusing:
 >> +-- QUERY:
 >> +
 >> +
 >> +-- QUERY:
 >>
 >> That's not entirely this patch's fault.  Still that's not really
 >> intuitive to see the output of a query that's just a blank spot..
 >
 > Hmmm.
 >
 > What about adding an explicit \echo before these empty outputs to
mitigate
 > the possible induced confusion?

\echo is not possible.

Attached an attempt to improve the situation by replacing empty
lines with
comments in this test.


I can confirm that all regression tests passed.


I think this patch requires an up-to-date summary and explanation.  The 
thread is over a year old and the patch has evolved quite a bit.  There 
are some test changes that are not explained.  Please provide more 
detail so that the patch can be considered.






Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-02-13 Thread Alvaro Herrera
On 2023-Jan-26, Drouvot, Bertrand wrote:

> Hi,
> 
> On 1/26/23 10:42 AM, Alvaro Herrera wrote:
> > On 2023-Jan-26, Drouvot, Bertrand wrote:
> > 
> > > On 1/24/23 7:27 PM, Alvaro Herrera wrote:
> > 
> > > > 1. I don't think wait_for_write_catchup is necessary, because
> > > > calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
> > > > would already do the same thing.
> 
> Having a closer look, it does not seem to be the case. The default mode
> in wait_for_catchup() is 'replay' and the default mode for the lsn is 'write'.
> 
> But in wait_for_write_catchup() we are making use of 'write' for both.

But that turns
 $node->wait_for_catchup('foobar', 'write')
into
 $node->wait_for_write_catchup('foobar');
so I don't see much value in it.  Also, the patch series from which this
patch spawned in the first place doesn't wait for write AFAICS.

After adding some more POD docs for it, I pushed the one for replay.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
¡Ay, ay, ay!  Con lo mucho que yo lo quería (bis)
se fue de mi vera ... se fue para siempre, pa toíta ... pa toíta la vida
¡Ay Camarón! ¡Ay Camarón!(Paco de Lucía)




RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-13 Thread shiy.f...@fujitsu.com
On Thu, Feb 2, 2023 4:34 PM Önder Kalacı  wrote:
>
>>
>> and if there's more
>> than one candidate index pick any one. Additionally, we can allow
>> disabling the use of an index scan for this particular case. If we are
>> too worried about API change for allowing users to specify the index
>> then we can do that later or as a separate patch.
>>
>
> On v23, I dropped the planner support for picking the index. Instead, it 
> simply
> iterates over the indexes and picks the first one that is suitable.
>
> I'm currently thinking on how to enable users to override this decision.
> One option I'm leaning towards is to add a syntax like the following:
>
> ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
>
> Though, that should probably be a separate patch. I'm going to work
> on that, but still wanted to share v23 given picking the index sounds
> complementary, not strictly required at this point.
>

Thanks for your patch. Here are some comments.

1.
I noticed that get_usable_indexoid() is called in apply_handle_update_internal()
and apply_handle_delete_internal() to get the usable index. Could usableIndexOid
be a parameter of these two functions? We have already got the
LogicalRepRelMapEntry when calling them, and if we do so, we can get
usableIndexOid without get_usable_indexoid(). Otherwise, for partitioned tables,
logicalrep_partition_open() is called in get_usable_indexoid(), and searching
the entry via hash_search() increases the cost.

2.
+* This attribute is an expression, and
+* SuitableIndexPathsForRepIdentFull() was called earlier when the
+* index for subscriber was selected. There, the indexes
+* comprising *only* expressions have already been eliminated.

The comment needs to be updated:
SuitableIndexPathsForRepIdentFull
->
FindUsableIndexForReplicaIdentityFull

3.

/* Build scankey for every attribute in the index. */
-   for (attoff = 0; attoff < IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+   for (index_attoff = 0; index_attoff < IndexRelationGetNumberOfKeyAttributes(idxrel);
+        index_attoff++)
{

Should the comment be changed? Because we skip the attributes that are 
expressions.

4.
+   Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+          RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));

Maybe we can call the new function idxIsRelationIdentityOrPK()?

Regards,
Shi Yu


Re: Can we do something to help stop users mistakenly using force_parallel_mode?

2023-02-13 Thread David Rowley
On Sat, 11 Feb 2023 at 04:34, Andrew Dunstan  wrote:
>
> On 2023-02-09 Th 15:25, David Rowley wrote:
> Likely the hardest part to get right here is the new name.  Can anyone
> think of anything better than debug_parallel_query?
>
>
> WFM

Thanks for chipping in.

I've attached a patch which fixes the problem John mentioned.

I feel like nobody is against doing this rename, so I'd quite like to
get this done pretty soon.  If anyone else wants to voice their
opinion in regards to this, please feel free to do so. +1s are more
reassuring than silence.  I just want to get this right so we never
have to think about it again.

If nobody is against this then I'd like to push the attached in about
24 hours from now.

David
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8c56b134a8..ecd9aa73ef 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11091,17 +11091,17 @@ dynamic_library_path = 
'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
- 
-  force_parallel_mode (enum)
+ 
+  debug_parallel_query (enum)
   
-   force_parallel_mode configuration 
parameter
+   debug_parallel_query configuration 
parameter
   
   
   

 Allows the use of parallel queries for testing purposes even in cases
 where no performance benefit is expected.
-The allowed values of force_parallel_mode are
+The allowed values of debug_parallel_query are
 off (use parallel mode only when it is expected to 
improve
 performance), on (force parallel query for all 
queries
 for which it is thought to be safe), and regress 
(like
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 117d097390..a08c7a78af 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -370,7 +370,7 @@ make check LANG=C ENCODING=EUC_JP
 set in the PGOPTIONS environment variable (for settings
 that allow this):
 
-make check PGOPTIONS="-c force_parallel_mode=regress -c work_mem=50MB"
+make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"
 
 When running against a temporary installation, custom settings can also be
 set by supplying a pre-written postgresql.conf:
diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index 9e3ec0d5d8..b26f2a64fb 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1152,11 +1152,11 @@ HandleParallelMessage(ParallelContext *pcxt, int i, 
StringInfo msg)
 * If desired, add a context line to show that 
this is a
 * message propagated from a parallel worker.  
Otherwise, it
 * can sometimes be confusing to understand 
what actually
-* happened.  (We don't do this in 
FORCE_PARALLEL_REGRESS mode
+* happened.  (We don't do this in 
DEBUG_PARALLEL_REGRESS mode
 * because it causes test-result instability 
depending on
 * whether a parallel worker is actually used 
or not.)
 */
-   if (force_parallel_mode != 
FORCE_PARALLEL_REGRESS)
+   if (debug_parallel_query != 
DEBUG_PARALLEL_REGRESS)
{
if (edata.context)
edata.context = 
psprintf("%s\n%s", edata.context,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbbf28cf06..e57bda7b62 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -756,8 +756,8 @@ ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc)
/*
 * Sometimes we mark a Gather node as "invisible", which means that it's
 * not to be displayed in EXPLAIN output.  The purpose of this is to 
allow
-* running regression tests with force_parallel_mode=regress to get the
-* same results as running the same tests with force_parallel_mode=off.
+* running regression tests with debug_parallel_query=regress to get the
+* same results as running the same tests with debug_parallel_query=off.
 * Such marking is currently only supported on a Gather at the top of 
the
 * plan.  We skip that node, and we must also hide per-worker detail 
data
 * further down in the plan tree.
diff --git a/src/backend/optimizer/plan/planmain.c 
b/src/backend/optimizer/plan/planmain.c
index 4c17407e5d..7afd434c60 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -114,12 +114,12 @@ query_planner(PlannerInfo *root,
 * Anything parallel-restricted in the query 
tlist will be
 * dealt with later.) 

Making CopyFromStateData not internal anymore

2023-02-13 Thread Jelte Fennema
For Citus we'd like to hook into the way that GENERATED AS IDENTITY
generates the next value. The way we had in mind was to replace
T_NextValueExpr with a function call node. But doing the same for
COPY seems only possible by manually changing the defexprs array of
the CopyFromState. Sadly CopyFromStateData is in an internal header so
that seems dangerous to do, since the struct definition might change
across minor versions.
However, it seems that the only time this was actually done in the
last 5 years was in 8dc49a8934de023c08890035d96916994bd9b297

What do you think about making the definition of CopyFromStateData public?




Re: [PATCH] Add pretty-printed XML output option

2023-02-13 Thread Jim Jones

On 13.02.23 02:15, Peter Smith wrote:

Something is misbehaving.

Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.

But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.


Yes, I noticed it yesterday ... and I'm not sure how to solve it. It
seems that the system is returning a different error message on the
FreeBSD patch tester, which is causing a regression test on this
particular OS to fail.



diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out  2023-02-12 
09:02:57.077569000 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out  
2023-02-12 09:05:45.14810 +
@@ -1695,10 +1695,7 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR:  invalid XML document
-DETAIL:  line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL:  line 1: Document is empty
 
 ^

 -- XML format: invalid string (whitespaces)


Does anyone know if there is anything I can do to make the error
messages consistent across different OSes?


Re: run pgindent on a regular basis / scripted manner

2023-02-13 Thread Andrew Dunstan


On 2023-02-12 Su 15:59, Justin Pryzby wrote:

It seems like if pgindent knows about git, it ought to process only
tracked files.  Then, it wouldn't need to manually exclude generated
files, and it wouldn't process vpath builds and who-knows-what else it
finds in CWD.



For vpath builds, use an exclude file that excludes the vpath you use.

I don't really want to restrict this to tracked files because it would mean
you can't pgindent files before you `git add` them. And we would still 
need to do manual exclusion for some files that are tracked, e.g. the 
snowball files.





At least --commit doesn't seem to work when run outside of the root
source dir.



Yeah, I'll fix that, thanks for mentioning.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: run pgindent on a regular basis / scripted manner

2023-02-13 Thread Andrew Dunstan


On 2023-02-12 Su 16:13, Tom Lane wrote:

Andrew Dunstan  writes:

On 2023-02-12 Su 11:24, Tom Lane wrote:

It seems like "indent the whole tree" is about to become a minority
use-case.  Maybe instead of continuing to privilege that case, we
should say that it's invoked by some new switch like --all-files,
and without that only the stuff identified by command-line arguments
gets processed.

I don't think we need --all-files. The attached gets rid of the build
and code-base cruft, which is now in any case obsolete given we've put
pg_bsd_indent in our code base. So the way to spell this instead of
"pgindent --all-files" would be "pgindent ."

Ah, of course.


I added a warning if there are no files at all specified.

LGTM.



Thanks, pushed.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: amvalidate(): cache lookup failed for operator class 123

2023-02-13 Thread Justin Pryzby
On Thu, May 13, 2021 at 12:01:22PM -0500, Justin Pryzby wrote:
> postgres=# select amvalidate(123);
> ERROR:  cache lookup failed for operator class 123
> The usual expectation is that SQL-callable functions should return null rather
> than hitting elog().

On Thu, May 13, 2021 at 2:22 PM Tom Lane  wrote:
> Meh.  I'm not convinced that that position ought to apply to amvalidate.

On Thu, May 13, 2021 at 04:12:10PM -0400, Robert Haas wrote:
> I am still of the opinion that we ought to apply it across the board,
> for consistency. It makes it easier for humans to know which problems
> are known to be reachable and which are thought to be can't-happen and
> thus bugs. If we fix cases like this to return a real error code, then
> anything that comes up as XX000 is likely to be a real bug, whereas if
> we don't, the things that we're not concerned about have to be
> filtered out by some other method, probably involving a human being.
> If the filter that human being has to apply further involves reading
> Tom Lane's mind and knowing what he will think about a particular
> report, or alternatively asking him, it just makes complicated
> something that we could have made simple.

FWIW, here are some other cases from sqlsmith which hit elog()/XX000:

postgres=# select unknownin('');
ERROR:  failed to find conversion function from unknown to text
postgres=# \errverbose
ERROR:  XX000: failed to find conversion function from unknown to text
LOCATION:  coerce_type, parse_coerce.c:542

postgres=# SELECT pg_catalog.interval( '12 seconds'::interval ,3);
ERROR:  unrecognized interval typmod: 3

postgres=# SELECT pg_describe_object(1,0,1);
ERROR:  invalid non-zero objectSubId for object class 1

postgres=# SELECT pg_read_file( repeat('a',333));
ERROR:  could not open file 
"aa
 
aaa"
 for reading: File name too long

-- 
Justin




Re: run pgindent on a regular basis / scripted manner

2023-02-13 Thread Jelte Fennema
On Sun, 12 Feb 2023 at 15:16, Andrew Dunstan  wrote:
> I'm not sure how much more I really want to do here. Given the way pgindent 
> now processes command line arguments, maybe the best thing is for people to 
> use that. Use of git aliases can help. Something like these for example
>
>
> [alias]
>
> dirty = diff --name-only --diff-filter=ACMU -- .
> staged = diff --name-only --cached --diff-filter=ACMU -- .
> dstaged = diff --name-only --diff-filter=ACMU HEAD -- .
>
>
> and then you could do
>
> pgindent `git dirty`
>
>
> The only danger would be if there were no dirty files. Maybe we need a switch 
> to inhibit using the current directory if there are no command line files.
>
>
> Thoughts?

I think indenting staged or dirty files is probably the most common
operation that people want to do with pgindent. So I think that having
dedicated flags makes sense. I agree that it's not strictly necessary
and git aliases help a lot. But the git aliases require you to set
them up. To me making the most common operation as easy as possible to
do, seems worth the few extra lines to pgindent.

Sidenote: You mentioned untracked files in another email. I think that
the --dirty flag should probably also include untracked files. A
command to do so is: git ls-files --others --exclude-standard




Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-02-13 Thread Pavel Luzanov

Hello,

Playing with this patch, I did not see descriptive comments in 
pg_ident.conf.


Does it make sense to reflect changes to the PG-USERNAME field in the 
pg_ident.conf.sample file?


The same applies to the regexp support in the DATABASE and USER fields of
the pg_hba.conf.sample file (8fea8683).


-
Pavel Luzanov





Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-02-13 Thread Jelte Fennema
On Mon, 13 Feb 2023 at 15:06, Pavel Luzanov  wrote:
> Does it make sense to reflect changes to the PG-USERNAME field in the
> pg_ident.conf.sample file?
>
> The same relates to the regexp supportin the DATABASE and USER fieldsof
> the pg_hba.conf.sample file(8fea8683).

That definitely makes sense to me. When writing the patch I didn't
realise that there was also documentation in those files.

I think it also makes sense to include usage of (some of) the features
in the example files here:
https://www.postgresql.org/docs/devel/auth-username-maps.html




[PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans

2023-02-13 Thread Aleksander Alekseev
Hi hackers,

A colleague of mine wanted to use a ScanKey with SK_SEARCHNULL flag
for a heap-only scan (besides other ScanKeys) and discovered that the
result differed from what he expected. It turned out that this is
currently not supported, as explicitly stated in skey.h.

Although several workarounds come to mind, this limitation may be a real
inconvenience for extension authors, and implementing the corresponding
support seems pretty straightforward.

The attached patch does this.
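
For illustration, here is a minimal sketch of how an extension might use
this once the patch is applied ("rel" and "snapshot" are placeholders; the
single key asks the heap scan to return only tuples whose first attribute
is NULL):

    ScanKeyData skey;
    TableScanDesc scan;
    HeapTuple   tuple;

    /* match rows where attribute 1 IS NULL */
    ScanKeyEntryInitialize(&skey,
                           SK_ISNULL | SK_SEARCHNULL,
                           1,                   /* attribute number */
                           InvalidStrategy,
                           InvalidOid,          /* subtype */
                           InvalidOid,          /* collation */
                           InvalidOid,          /* procedure */
                           (Datum) 0);

    scan = table_beginscan(rel, snapshot, 1, &skey);
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* process the tuple */
    }
    table_endscan(scan);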

-- 
Best regards,
Aleksander Alekseev


v1-0001-Support-SK_SEARCHNULL-SK_SEARCHNOTNULL-for-heap-o.patch
Description: Binary data


Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-02-13 Thread Drouvot, Bertrand

Hi,

On 2/13/23 11:58 AM, Alvaro Herrera wrote:

On 2023-Jan-26, Drouvot, Bertrand wrote:


Hi,

On 1/26/23 10:42 AM, Alvaro Herrera wrote:

On 2023-Jan-26, Drouvot, Bertrand wrote:


On 1/24/23 7:27 PM, Alvaro Herrera wrote:



1. I don't think wait_for_write_catchup is necessary, because
calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
would already do the same thing.


Having a closer look, it does not seem to be the case. The default mode
in wait_for_catchup() is 'replay' and the default mode for the lsn is 'write'.

But in wait_for_write_catchup() we are making use of 'write' for both.


But that turns
  $node->wait_for_catchup('foobar', 'write')
into
  $node->wait_for_write_catchup('foobar');
so I don't see much value in it.


Agree.


 Also, the patch series from which this
patch spawned in the first place doesn't wait for write AFAICS.



Right, it does wait for replay only.


After adding some more POD docs for it, I pushed the one for replay.



Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




ERROR: permission info at index 1 ....

2023-02-13 Thread tender wang
Hi hackers,
   After commit a61b1f74823c, the query below reports an error:

 create table perm_test1(a int);
 create table perm_test2(b int);
 select subq.c0
from (select (select a from perm_test1 order by a limit 1) as c0, b as c1
from perm_test2 where false order by c0, c1) as subq where false;
ERROR:  permission info at index 1 (with relid=16457) does not match
provided RTE (with relid=16460)

The code below can fix this:

--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -512,11 +512,16 @@ flatten_rtes_walker(Node *node, flatten_rtes_walker_context *cxt)
         * Recurse into subselects.  Must update cxt->query to this query so
         * that the rtable and rteperminfos correspond with each other.
         */
+       Query *current_query = cxt->query;
+       bool result;
+
        cxt->query = (Query *) node;
-       return query_tree_walker((Query *) node,
+       result = query_tree_walker((Query *) node,
                                 flatten_rtes_walker,
                                 (void *) cxt,
                                 QTW_EXAMINE_RTES_BEFORE);
+       cxt->query = current_query;
+       return result;
        }


 regards, tender wang


Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 17:59:13 +0300, Aleksander Alekseev wrote:
> @@ -36,20 +36,36 @@ HeapKeyTest(HeapTuple tuple, TupleDesc tupdesc, int 
> nkeys, ScanKey keys)
>   boolisnull;
>   Datum   test;
>  
> - if (cur_key->sk_flags & SK_ISNULL)
> - return false;
> + if (cur_key->sk_flags & (SK_SEARCHNULL | SK_SEARCHNOTNULL))
> + {
> + /* special case: looking for NULL / NOT NULL values */
> + Assert(cur_key->sk_flags & SK_ISNULL);
>  
> - atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, &isnull);
> + atp = heap_getattr(tuple, cur_key->sk_attno, tupdesc, 
> &isnull);
>  
> - if (isnull)
> - return false;
> + if (isnull && (cur_key->sk_flags & SK_SEARCHNOTNULL))
> + return false;
>  
> - test = FunctionCall2Coll(&cur_key->sk_func,
> -  
> cur_key->sk_collation,
> -  atp, 
> cur_key->sk_argument);
> + if (!isnull && (cur_key->sk_flags & SK_SEARCHNULL))
> + return false;

Shouldn't need to extract the column if we just want to know if it's NULL (see
heap_attisnull()). Afaics the value isn't accessed after this.
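
Something along these lines, perhaps (a rough sketch only, assuming the
per-key loop from the patch; heap_attisnull() consults just the null
bitmap, so the value itself is never extracted):

    if (cur_key->sk_flags & (SK_SEARCHNULL | SK_SEARCHNOTNULL))
    {
        bool    att_isnull = heap_attisnull(tuple, cur_key->sk_attno, tupdesc);

        if (att_isnull && (cur_key->sk_flags & SK_SEARCHNOTNULL))
            return false;
        if (!att_isnull && (cur_key->sk_flags & SK_SEARCHNULL))
            return false;

        continue;           /* this key is satisfied; move to the next one */
    }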

Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-02-13 Thread Andrew Dunstan


On 2023-02-13 Mo 09:02, Jelte Fennema wrote:

On Sun, 12 Feb 2023 at 15:16, Andrew Dunstan  wrote:

I'm not sure how much more I really want to do here. Given the way pgindent now 
processes command line arguments, maybe the best thing is for people to use 
that. Use of git aliases can help. Something like these for example


[alias]

 dirty = diff --name-only --diff-filter=ACMU -- .
 staged = diff --name-only --cached --diff-filter=ACMU -- .
 dstaged = diff --name-only --diff-filter=ACMU HEAD -- .


and then you could do

 pgindent `git dirty`


The only danger would be if there were no dirty files. Maybe we need a switch 
to inhibit using the current directory if there are no command line files.


Thoughts?

I think indenting staged or dirty files is probably the most common
operation that people want to do with pgindent. So I think that having
dedicated flags makes sense. I agree that it's not strictly necessary
and git aliases help a lot. But the git aliases require you to set
them up. To me making the most common operation as easy as possible to
do, seems worth the few extra lines to pgindent.



OK, but I'd like to hear from more people about what they want. 
Experience tells me that making assumptions about how people work is not 
a good idea. I doubt anyone's work pattern is like mine. I don't want to 
implement an option that three people are going to use.





Sidenote: You mentioned untracked files in another email. I think that
the --dirty flag should probably also include untracked files. A
command to do so is: git ls-files --others --exclude-standard



Thanks for the info.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-11 05:44:47 +, Takamichi Osumi (Fujitsu) wrote:
> On Saturday, February 11, 2023 11:10 AM Andres Freund  
> wrote:
> > Has there been any discussion about whether this is actually best
> > implemented on the client side? You could alternatively implement it on the
> > sender.
> > 
> > That'd have quite a few advantages, I think - you e.g. wouldn't remove the
> > ability to *receive* and send feedback messages.  We'd not end up filling up
> > the network buffer with data that we'll not process anytime soon.
> Thanks for your comments !
> 
> We have discussed about the publisher side idea around here [1]
> but, we chose the current direction. Kindly have a look at the discussion.
> 
> If we apply the delay on the publisher, then
> it can lead to extra delay where we don't need to apply it.
> The currently proposed approach can take other loads or factors
> (network, busyness of the publisher, etc) into account
> because it calculates the required delay on the subscriber.

I don't think it's OK to just lose the ability to read / reply to keepalive
messages.

I think, as-is, we should seriously consider just rejecting the feature; it
adds too much complexity without a corresponding gain.

Greetings,

Andres Freund




Re: Making Vars outer-join aware

2023-02-13 Thread Tom Lane
Richard Guo  writes:
> Thanks for the report!  I've looked at it a little bit and traced down
> to function have_unsafe_outer_join_ref().  The comment there says
>  * In practice, this test never finds a problem ...
> This seems not correct as showed by the counterexample.

Right.  I'd noticed that the inner loop of that function was not
reached in our regression tests, and incorrectly concluded that it
was not reachable --- but I failed to consider cases where the
inner rel's parameterization depends on Vars from multiple places.

> The NOT_USED part of code is doing this check.  But I think we need a
> little tweak.  We should check the nullable side of related outer joins
> against the satisfied part, rather than inner_paramrels.  Maybe
> something like attached.

Agreed.

> However, this test seems to cost some cycles after the change.  So I
> wonder if it's worthwhile to perform it, considering that join order
> restrictions should be able to guarantee there is no problem here.

Yeah, I think we should reduce it to an Assert check.  It shouldn't be
worth the cycles to run in production, and that will also make it easier
to notice in sqlsmith testing if anyone happens across another
counterexample.

Pushed that way.  Thanks for the report and the patch!

regards, tom lane




BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

2023-02-13 Thread Tomas Vondra
Hi,

while experimenting with BRIN indexes after a couple FOSDEM discussions,
I ran into the existing limitation that BRIN indexes don't handle array
scan keys. So BRIN indexes can be used for conditions like

WHERE a IN (1,2,3,4,5)

but we essentially treat the values as individual scan keys, and for
each one we scan the BRIN index and build/update the bitmap. Which for
large indexes may be fairly expensive - the cost is proportional to the
number of values, so if building the bitmap for 1 value takes 10ms, for
100 values it'll take ~1000ms.

It's not hard to construct cases like this (e.g. when using indexes with
small pages_per_range values) etc. Of course, if the query does a lot of
other expensive stuff, this cost may be insignificant.

I'm not sure how often people do queries with such conditions. But I've
been experimenting with features that'd build such paths, so I took a
stab at a PoC, which can significantly reduce the time needed to build
the bitmaps. And there's a couple more interesting opportunities.


0001 - Support SK_SEARCHARRAY in BRIN minmax

The 0001 part does a "naive" SK_SEARCHARRAY implementation for minmax.
It simply sets amsearcharray=true and then tweaks the consistent
function to handle both the scalar and array scan keys.

This is obviously rather inefficient, because the array is searched
linearly. So yes, we don't walk the index repeatedly, but we have to
compare each range to (almost-)all values.


0002 - Sort the array in brinrescan() and do binsearch
--
There's a simple way to optimize the naive approach by sorting the array
and then searching in this array. If the array is sorted, we can search
for the first value >= minvalue, and see if that is consistent (i.e. if
it's <= maxval).
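
To make the idea concrete, here is a minimal sketch of the binary-search
test, assuming the array has already been deconstructed into a sorted
Datum array and that "cmpfn" is a btree-style comparison function (names
are illustrative, not taken from the patch):

    static bool
    range_matches_sorted_array(Datum minval, Datum maxval,
                               Datum *elems, int nelems,
                               FmgrInfo *cmpfn, Oid collation)
    {
        int     lo = 0,
                hi = nelems;

        /* binary-search the first element >= minval */
        while (lo < hi)
        {
            int     mid = (lo + hi) / 2;

            if (DatumGetInt32(FunctionCall2Coll(cmpfn, collation,
                                                elems[mid], minval)) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }

        if (lo >= nelems)
            return false;       /* all elements are below the range */

        /* consistent iff that element does not exceed maxval */
        return DatumGetInt32(FunctionCall2Coll(cmpfn, collation,
                                               elems[lo], maxval)) <= 0;
    }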

In my experiments this cuts the time needed to build the bitmap for
array to pretty much the same as for a single value.

I think this is similar to the preprocessing of scan keys in b-tree, so
brinrescan() is a natural way to do the sort. The problem however is
where to store the result.

Ideally, we'd store it in BrinOpaque (just like BTScanOpaque in btree),
but the problem is we don't pass that to the consistent functions. Those
only get the ScanKeys and stuff extracted from BrinOpaque.

We might add a parameter to the "consistent" function, but that
conflicts with not wanting to break existing extensions implementing
their own BRIN indexes. We allow the opclasses to define "consistent"
with either 4 or 5 arguments. Adding an argument would mean 5 or 6
arguments, but because of the backwards compatibility we'd need to
support existing opclasses, and 5 is ambiguous :-/

In hindsight, I would probably not choose supporting both 4 and 5
arguments again. It makes it harder for us to maintain the code to make
life easier for extensions, but I'm not aware of any out-of-core BRIN
opclasses anyway. So I'd probably just change the API, it's pretty easy
to update existing extensions.

This patch however does a much simpler thing - it just replaces the
array in the SK_SEARCHARRAY scan key with a sorted one. That works
for minmax, but not for bloom/inclusion, because those are not based on
sorting. And the ArrayType is not great for minmax either, because it
means we need to deconstruct it again and again, for each range. It'd be
much better to deconstruct the array once.

I'll get back to this ...


0003 - Support SK_SEARCHARRAY in BRIN inclusion
---
Trivial modification to support array scan keys, can't benefit from
sorting the array.


0004 - Support SK_SEARCHARRAY in BRIN bloom
---
Trivial modification to support array scan keys, can't benefit from
sorted array either.

But we might "preprocess" the keys in a different way - bloom needs to
calculate two hashes per key, and at the moment it happens again and
again for each range. So if you have 1M ranges and an SK_SEARCHARRAY query
with 100 values, we'll do 100M calls to PROCNUM_HASH and 200M calls to
hash_uint32_extended(). And our hash functions are pretty expensive,
certainly compared to the fast functions often used for bloom filters.

So the preprocessing might actually calculate the hashes once,
and then just reuse those in the "consistent" function.
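
Something like this, as a rough standalone sketch - hash_value() and
bloom_test() are made-up stand-ins for PROCNUM_HASH / hash_uint32_extended()
and the per-range filter probe, so don't read too much into the signatures
(the usual postgres.h types are assumed):

    /* called once per scan, e.g. from brinrescan() or a "preprocess" proc */
    static void
    preprocess_hashes(const Datum *values, int nvalues, uint64 *hashes)
    {
        for (int i = 0; i < nvalues; i++)
            hashes[i] = hash_value(values[i]);      /* hypothetical helper */
    }

    /* per-range work in the consistent function is then just filter probes */
    static bool
    bloom_consistent_array(const BloomFilter *filter,
                           const uint64 *hashes, int nvalues)
    {
        for (int i = 0; i < nvalues; i++)
        {
            if (bloom_test(filter, hashes[i]))      /* hypothetical probe */
                return true;    /* this range may contain the value */
        }
        return false;
    }

With that, the expensive hashing happens 100 times per scan instead of 100M
times in the example above; the per-range work shrinks to cheap filter probes.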

0005 is a dirty PoC illustrating the benefit of caching the hashes.

Unfortunately, this complicates things, because it means:

* The scan key preprocessing is not universal for all BRIN opclasses -
  each BRIN opclass might have an optional BRIN_PROCNUM_PREPROCESS
  procedure which would preprocess the keys the way the opclass needs.

* We can't simply replace the array in the scan key the way minmax does
  with the sorted array, because the data type is not the same
  (hashes are uint64).


When I started to write this e-mail I thought there's pretty much just
one way to move this

Re: ExecRTCheckPerms() and many prunable partitions (sqlsmith)

2023-02-13 Thread Tom Lane
Amit Langote  writes:
> On Mon, Feb 13, 2023 at 5:07 Justin Pryzby  wrote:
>> That seems to add various elog()s which are hit frequently by sqlsmith:

> Thanks for the report.  I’ll take a look once I’m back at a computer in a
> few days.

Looks like we already have a diagnosis and fix [1].  I'll get that
pushed.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CAHewXNnnNySD_YcKNuFpQDV2gxWA7_YLWqHmYVcyoOYxn8kY2A%40mail.gmail.com




Re: proposal: psql: psql variable BACKEND_PID

2023-02-13 Thread Peter Eisentraut

On 09.02.23 10:11, Pavel Stehule wrote:
> first and main (for me) - I can use psql variables tab complete - just
> :B - it is significantly faster
>
> second - I can see all connection related information by \set
> third - there is not hook on reconnect in psql - so if you implement
> BACKEND_PID by self, you ensure to run query with pg_backend_pid() after
> any reconnect or connection change.
>
> It is clean so you can run "select pg_backend_pid() AS "BACKEND_PID"
> \gset" and you can store it to .psqlrc. But most of the time I am in
> customer's environment, and I don't have the time or possibility to do a
> complete setup of .psqlrc. It looks (for me) like a generally useful
> feature to be everywhere.


But what do you need the backend PID for in the first place?

Of course, you might want to use it to find your own session in 
pg_stat_activity or something like that, but then you're already in a 
query and can use pg_backend_pid().  What do you need the backend PID 
for outside of such a query?






Re: ERROR: permission info at index 1 ....

2023-02-13 Thread Tom Lane
tender wang  writes:
>After a61b1f74823c commit, below query reports error:
>  create table perm_test1(a int);
>  create table perm_test2(b int);
>  select subq.c0
> from (select (select a from perm_test1 order by a limit 1) as c0, b as c1
> from perm_test2 where false order by c0, c1) as subq where false;
> ERROR:  permission info at index 1 (with relid=16457) does not match
> provided RTE (with relid=16460)

Yeah, this was also reported by Justin Pryzby [1].

> Below codes can fix this:

Right you are.  Pushed, thanks!

regards, tom lane

[1] https://www.postgresql.org/message-id/20230212233711.GA1316%40telsasoft.com




Re: proposal: psql: psql variable BACKEND_PID

2023-02-13 Thread Pavel Stehule
On Mon, Feb 13, 2023 at 18:06 Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 09.02.23 10:11, Pavel Stehule wrote:
> > first and main (for me) - I can use psql variables tab complete - just
> > :B - it is significantly faster
> > second - I can see all connection related information by \set
> > third - there is not hook on reconnect in psql - so if you implement
> > BACKEND_PID by self, you ensure to run query with pg_backend_pid() after
> > any reconnect or connection change.
> >
> > It is clean so you can run "select pg_backend_pid() AS "BACKEND_PID"
> > \gset" and you can store it to .psqlrc. But most of the time I am in
> > customer's environment, and I don't have the time or possibility to do a
> > complete setup of .psqlrc. It looks (for me) like a generally useful
> > feature to be everywhere.
>
> But what do you need the backend PID for in the first place?
>
> Of course, you might want to use it to find your own session in
> pg_stat_activity or something like that, but then you're already in a
> query and can use pg_backend_pid().  What do you need the backend PID
> for outside of such a query?
>

In every real use case you can use pg_backend_pid(), but you need to write
the complete name without tab complete, and you need to know that this
function is available.

BACKEND_PID is supported by tab complete, and it is displayed in \set list
and \? variables. Nothing less, nothing more. A custom psql variable can have
an obsolete value.

I can imagine using :BACKEND_PID in \echo command - and it just saves you
one step with its own custom variable.

It is just some more comfort with almost zero cost.

Regards

Pavel


Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2023-02-13 Thread Andres Freund
On 2023-02-13 18:33:34 +0900, Kyotaro Horiguchi wrote:
> At Mon, 13 Feb 2023 10:15:03 +0530, Bharath Rupireddy 
>  wrote in 
> > On Sun, Feb 12, 2023 at 11:01 PM Andres Freund  wrote:
> > >
> > > On 2023-02-12 19:59:00 +0530, Bharath Rupireddy wrote:
> > > >   /* Prepare to write out a lot of copies of our zero buffer at 
> > > > once. */
> > > >   for (i = 0; i < lengthof(iov); ++i)
> > > >   {
> > > > - iov[i].iov_base = zbuffer.data;
> > > > + iov[i].iov_base = (void *) (unconstify(PGAlignedBlock *, 
> > > > &zbuffer)->data);
> > > >   iov[i].iov_len = zbuffer_sz;
> > > >   }
> > >
> > > Another thing: I think we should either avoid iterating over all the IOVs 
> > > if
> > > we don't need them, or, even better, initialize the array as a constant, 
> > > once.
> 
> FWIW, I tried to use the "{[start .. end] = {}}" trick (GNU extension?
> [*1]) for constant array initialization, but individual members don't
> accept assigning a const value, thus I did deconstify as follows.
> 
> > static const struct iovec   iov[PG_IOV_MAX] =
> > {[0 ... PG_IOV_MAX - 1] =
> >  {
> >  .iov_base = (void *)&zbuffer.data,
> >  .iov_len = BLCKSZ
> >  }
> > };
> 
> I didn't check the actual mapping, but when I tried an assignment
> "iov[0].iov_base = NULL", it failed as "assignment of member
> ‘iov_base’ in read-only object", so is it successfully placed in a
> read-only segment?
> 
> Later code assigns iov[0].iov_len thus we need to provide a separate
> iov non-const variable, or can we use pwrite instead there?  (I didn't
> find pg_pwrite_with_retry(), though)

Given that we need to do that, and given that we already need to loop to
handle writes that are longer than PG_IOV_MAX * BLCKSZ, it's probably not
worth avoiding iov initialization.

But I think it's worth limiting the initialization to blocks.

I'd also try to combine the first pg_writev_* with the second one.
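
Roughly what I have in mind is a single loop that initializes only the iovecs
each call will actually use - rough, untested sketch only: zero_fill(),
zerobuf and the 8kB block size are placeholders, pg_pwritev_with_retry() and
PG_IOV_MAX are the real interfaces, and a tail that isn't a multiple of the
block size is simply ignored here:

    #define ZERO_BLOCK_SZ 8192          /* placeholder block size */

    static char zerobuf[ZERO_BLOCK_SZ]; /* static, so zero-initialized */

    static bool
    zero_fill(int fd, size_t total)
    {
        struct iovec iov[PG_IOV_MAX];
        off_t       offset = 0;

        while (total > 0)
        {
            /* initialize only the iovecs this call will actually use */
            int         niov = Min(total / ZERO_BLOCK_SZ, PG_IOV_MAX);

            if (niov == 0)
                break;          /* sub-block tail not handled in this sketch */

            for (int i = 0; i < niov; i++)
            {
                iov[i].iov_base = zerobuf;
                iov[i].iov_len = ZERO_BLOCK_SZ;
            }

            if (pg_pwritev_with_retry(fd, iov, niov, offset) < 0)
                return false;

            offset += (off_t) niov * ZERO_BLOCK_SZ;
            total -= (size_t) niov * ZERO_BLOCK_SZ;
        }
        return true;
    }

A single loop like this would also fold the separate call sites into one.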

Greetings,

Andres Freund




Re: proposal: psql: psql variable BACKEND_PID

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 18:06:23 +0100, Peter Eisentraut wrote:
> On 09.02.23 10:11, Pavel Stehule wrote:
> > first and main (for me) - I can use psql variables tab complete - just
> > :B - it is significantly faster
> > second - I can see all connection related information by \set
> > third - there is not hook on reconnect in psql - so if you implement
> > BACKEND_PID by self, you ensure to run query with pg_backend_pid() after
> > any reconnect or connection change.
> > 
> > It is clean so you can run "select pg_backend_pid() AS "BACKEND_PID"
> > \gset" and you can store it to .psqlrc. But most of the time I am in
> > customer's environment, and I don't have the time or possibility to do a
> > complete setup of .psqlrc. It looks (for me) like a generally useful
> > feature to be everywhere.
> 
> But what do you need the backend PID for in the first place?

For me it's using gdb, pidstat, strace, perf, ...

But for those %p in the PROMPTs is more useful.


> Of course, you might want to use it to find your own session in
> pg_stat_activity or something like that, but then you're already in a query
> and can use pg_backend_pid().  What do you need the backend PID for outside
> of such a query?

E.g. I fire off a query, it's slower than I'd like, I want to attach perf. Of
course I can establish a separate connection, query pg_stat_activity there,
and then perf. But that requires manually filtering pg_stat_activity to find
the query.

Greetings,

Andres Freund




Re: Force testing of query jumbling code in TAP tests

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 14:00:36 +0900, Michael Paquier wrote:
> With this addition, the query jumbling gets covered at 95%~, while
> https://coverage.postgresql.org/src/backend/nodes/index.html reports
> currently 35%.
> 
> Thoughts or comments?

Shouldn't there at least be some basic verification of pg_stat_statements
output being sane after running the test? Even if that's perhaps just actually
printing the statements.

Greetings,

Andres Freund




Re: proposal: psql: psql variable BACKEND_PID

2023-02-13 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-13 18:06:23 +0100, Peter Eisentraut wrote:
>> But what do you need the backend PID for in the first place?

> For me it's using gdb, pidstat, strace, perf, ...
> But for those %p in the PROMPTs is more useful.

Indeed, because ...

> E.g. I fire off a query, it's slower than I'd like, I want to attach perf. Of
> course I can establish a separate connection, query pg_stat_activity there,
> and then perf. But that requires manually filtering pg_stat_activity to find
> the query.

... in this case, the problem is that the session is tied up doing the
slow query.  You can't run "select pg_backend_pid()", but you can't
extract a psql variable value either.  If you had the foresight to
set up a PROMPT, or to collect the PID earlier, you're good.  But I'm
still not seeing where a psql variable makes that easier.

I don't buy Pavel's argument that adding Yet Another built-in variable
adds ease of use.  I think what it mostly adds is clutter.  I realize
that "psql --help=variables | wc" is already 160+ lines, but that
doesn't mean that making it longer and longer is a net improvement.

regards, tom lane




Re: proposal: psql: psql variable BACKEND_PID

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 12:52:23 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > E.g. I fire off a query, it's slower than I'd like, I want to attach perf. Of
> > course I can establish a separate connection, query pg_stat_activity there,
> > and then perf. But that requires manually filtering pg_stat_activity to find
> > the query.
> 
> ... in this case, the problem is that the session is tied up doing the
> slow query.  You can't run "select pg_backend_pid()", but you can't
> extract a psql variable value either.  If you had the foresight to
> set up a PROMPT, or to collect the PID earlier, you're good.  But I'm
> still not seeing where a psql variable makes that easier.

I guess you could argue that referencing BACKEND_PID in PROMPT would be more
readable. But that's about it.

Greetings,

Andres Freund




Re: Rework LogicalOutputPluginWriterUpdateProgress

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 08:22:34 +0530, Amit Kapila wrote:
> On Sat, Feb 11, 2023 at 3:04 AM Andres Freund  wrote:
> >
> > > One difference I see with the patch is that I think we will end up
> > > sending keepalive for empty prepared transactions even though we don't
> > > skip sending begin/prepare messages for those.
> >
> > With the proposed approach we reliably know whether a callback wrote
> > something, so we can tune the behaviour here fairly easily.
> >
> 
> I would like to clarify a few things about the proposed approach. In
> commit_cb_wrapper()/prepare_cb_wrapper(), the patch first did
> ctx->did_write = false;, then call the commit/prepare callback (which
> will call pgoutput_commit_txn()/pgoutput_prepare_txn()) and then call
> update_progress() which will make decisions based on ctx->did_write
> flag. Now, for this to work pgoutput_commit_txn/pgoutput_prepare_txn
> should know that the transaction has performed some writes before that
> call which is currently working because pgoutput is tracking the same
> via sent_begin_txn.

I don't really see these as being related. What pgoutput does internally to
optimize for some usecases shouldn't matter to the larger infrastructure.


> Is the intention here that we still track whether BEGIN () has been sent via
> pgoutput?

Yes. If somebody later wants to propose tracking this alongside a txn and
passing that to the output plugin callbacks, we can do that. But that's
independent of fixing the broken architecture of the progress infrastructure.

Greetings,

Andres Freund




Re: Rework LogicalOutputPluginWriterUpdateProgress

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 14:06:57 +0530, Amit Kapila wrote:
> > > The patch calls update_progress in change_cb_wrapper and other
> > > wrappers which will miss the case of DDLs that generates a lot of data
> > > that is not processed by the plugin. I think for that we either need
> > > to call update_progress from reorderbuffer.c similar to what the patch
> > > has removed or we need some other way to address it. Do you have any
> > > better idea?
> >
> > I don't mind calling something like update_progress() in the specific cases
> > that's needed, but I think those are just the
> >   if (!RelationIsLogicallyLogged(relation))
> >   if (relation->rd_rel->relrewrite && !rb->output_rewrites))
> >
> > To me it makes a lot more sense to call update_progress() for those, rather
> > than generally.
> >
> 
> Won't it be better to call it wherever we don't invoke any wrapper
> function like for cases REORDER_BUFFER_CHANGE_INVALIDATION, sequence
> changes, etc.? I was thinking that wherever we don't call the wrapper
> function which means we don't have a chance to invoke
> update_progress(), the timeout can happen if there are a lot of such
> messages.

ISTM that the likelihood of causing harm due to increased overhead is higher
than the gain.

Greetings,

Andres Freund




Re: proposal: psql: psql variable BACKEND_PID

2023-02-13 Thread Pavel Stehule
On Mon, Feb 13, 2023 at 18:52 Tom Lane  wrote:

> Andres Freund  writes:
> > On 2023-02-13 18:06:23 +0100, Peter Eisentraut wrote:
> >> But what do you need the backend PID for in the first place?
>
> > For me it's using gdb, pidstat, strace, perf, ...
> > But for those %p in the PROMPTs is more useful.
>
> Indeed, because ...
>
> > E.g. I fire off a query, it's slower than I'd like, I want to attach
> perf. Of
> > course I can establish a separate connection, query pg_stat_activity
> there,
> > and then perf. But that requires manually filtering pg_stat_activity to
> find
> > the query.
>
> ... in this case, the problem is that the session is tied up doing the
> slow query.  You can't run "select pg_backend_pid()", but you can't
> extract a psql variable value either.  If you had the foresight to
> set up a PROMPT, or to collect the PID earlier, you're good.  But I'm
> still not seeing where a psql variable makes that easier.
>
> I don't buy Pavel's argument that adding Yet Another built-in variable
> adds ease of use.  I think what it mostly adds is clutter.  I realize
> that "psql --help=variables | wc" is already 160+ lines, but that
> doesn't mean that making it longer and longer is a net improvement.
>

There are three kinds of variables - there are about 40 psql variables.

I can be mistaken - I thought that if somebody needs to filter
pg_stat_activity, they can just run "\set"

and they can see


(2023-02-13 19:09:10) postgres=# \set
AUTOCOMMIT = 'on'
BACKEND_PID = 10102
COMP_KEYWORD_CASE = 'preserve-upper'
DBNAME = 'postgres'
ECHO = 'none'
ECHO_HIDDEN = 'off'
ENCODING = 'UTF8'
ERROR = 'false'
FETCH_COUNT = '0'
HIDE_TABLEAM = 'off'
HIDE_TOAST_COMPRESSION = 'off'
HISTCONTROL = 'none'
HISTSIZE = '500'
HOST = '/tmp'
IGNOREEOF = '0'
LAST_ERROR_MESSAGE = ''
...

they don't need to search any further

Finding and using pg_backend_pid() is not rocket science. But using :BACKEND_PID
is simpler.

It is true that this information is redundant - I see some benefit in the
possibility to see, by using \set, a little bit more complete view of the
session, but surely this is in the "nice to have" category (from my
perspective), and if others have a different opinion, then we don't need to
spend more time on this patch. This is not an important feature.

Regards

Pavel


> regards, tom lane
>


Re: run pgindent on a regular basis / scripted manner

2023-02-13 Thread Jelte Fennema
On Mon, 13 Feb 2023 at 17:47, Andrew Dunstan  wrote:
> OK, but I'd like to hear from more people about what they want. Experience 
> tells me that making assumptions about how people work is not a good idea. I 
> doubt anyone's work pattern is like mine. I don't want to implement an option 
> that three people are going to use.


In the general case I agree with you. But in this specific case I
don't. To me the whole point of this email thread is to nudge people
towards indenting the changes that they are committing. Thus indenting
those changes (either before or after adding) is the workflow that we
want to make as easy as possible. Because even if it's not people's
current workflow, by adding the flag it hopefully becomes their
workflow, because it's so easy to use. So my point is we want to
remove as few hurdles as possible for people to indent their changes
(and setting up git aliases or pre-commit hooks are all hurdles).




Re: ERROR: permission info at index 1 ....

2023-02-13 Thread Alvaro Herrera
On 2023-Feb-13, Tom Lane wrote:

> tender wang  writes:
> >After a61b1f74823c commit, below query reports error:
> >  create table perm_test1(a int);
> >  create table perm_test2(b int);
> >  select subq.c0
> > from (select (select a from perm_test1 order by a limit 1) as c0, b as c1
> > from perm_test2 where false order by c0, c1) as subq where false;
> > ERROR:  permission info at index 1 (with relid=16457) does not match
> > provided RTE (with relid=16460)
> 
> Yeah, this was also reported by Justin Pryzby [1].
> 
> > Below codes can fix this:
> 
> Right you are.  Pushed, thanks!

Thank you both!

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Killing off removed rels properly

2023-02-13 Thread Tom Lane
Richard Guo  writes:
> On Sat, Feb 11, 2023 at 4:50 AM Tom Lane  wrote:
>> I think it's time to clean this up by removing the rel from the
>> planner data structures altogether.  The attached passes check-world,
>> and if it does trigger any problems I would say that's a clear
>> sign of bugs elsewhere.

> +1. The patch looks good to me.

Pushed, thanks for looking at it!

> One minor comment is that we should
> also remove the comments about RELOPT_DEADREL in pathnodes.h.

Yeah, I noticed that shortly after posting the patch :-(.  Searching
the optimizer code for other references to "dead" rels turned up another
place where the comments need to be fixed, namely match_foreign_keys_to_quals
... which is someplace I should have thought to check before, given the
reference to it in remove_rel_from_query.  Its code is fine as-is though.

regards, tom lane




Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andres Freund
Hi,

I'm working on rebasing [1], my patch to make relation extension scale
better.

As part of that I'd like to add tests for relation extension. To be able to
test the bulk write strategy path, we need to have a few backends concurrently
load > 16MB files.

It seems pretty clear that doing that on all buildfarm machines wouldn't be
nice / welcome. And it also seems likely that this won't be the last case
where that'd be useful.

So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
that we only want to execute on machines with sufficient resources.

Makes sense?

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20221029025420.eplyow6k7tgu6he3%40awork3.anarazel.de




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> I'm working on rebasing [1], my patch to make relation extension scale
> better.
> 
> As part of that I'd like to add tests for relation extension. To be able to
> test the bulk write strategy path, we need to have a few backends concurrently
> load > 16MB files.
> 
> It seems pretty clear that doing that on all buildfarm machines wouldn't be
> nice / welcome. And it also seems likely that this won't be the last case
> where that'd be useful.
> 
> So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
> that we only want to execute on machines with sufficient resources.
> 
> Makes sense?

+1 in general.  Are there existing tests that we should add into that
set that you're thinking of..?  I've been working with the Kerberos
tests and that's definitely one that seems to fit this description...

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Making Vars outer-join aware

2023-02-13 Thread Justin Pryzby
On Mon, Feb 13, 2023 at 03:33:15PM +0800, Richard Guo wrote:
> On Mon, Feb 13, 2023 at 7:58 AM Justin Pryzby  wrote:
> > The patch broke this query:
> >
> > select from pg_inherits inner join information_schema.element_types
> > right join (select from pg_constraint as sample_2) on true
> > on false, lateral (select scope_catalog, inhdetachpending from
> > pg_publication_namespace limit 3);
> > ERROR:  could not devise a query plan for the given query

> BTW, here is a simplified query that can trigger this issue on HEAD.
> 
> select * from t1 inner join t2 left join (select null as c from t3 left
> join t4 on true) as sub on true on true, lateral (select c, t1.a from t5
> offset 0 ) ss;

It probably doesn't need to be said that the original query was reduced
from sqlsmith...  But I mention that now to make it searchable.

Thanks,
-- 
Justin




[PATCH] Align GSS and TLS error handling in PQconnectPoll()

2023-02-13 Thread Jacob Champion
Hi all,

During the gssencmode CVE discussion, we noticed that PQconnectPoll()
handles the error cases for TLS and GSS transport encryption slightly
differently. After TLS fails, the connection handle is dead and future
calls to PQconnectPoll() return immediately. But after GSS encryption
fails, the connection handle can still be used to reenter the GSS
handling code.

This doesn't appear to have any security implications today -- and a
client has to actively try to reuse a handle that's already failed --
but it seems undesirable. Michael (cc'd) came up with a patch, which I
have attached here and will register in the CF.

Thanks,
--JacobFrom 75b1be5f484a28e543290e208474d3603a3270bf Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 13 Feb 2023 10:30:43 -0800
Subject: [PATCH] PQconnectPoll: poison connection on gssenc error

Currently, conn->status isn't updated when gssenc establishment fails,
which allows any (misbehaved) future calls to PQconnectPoll() to reenter
CONNECTION_GSS_STARTUP. Align this code with the CONNECTION_SSL_STARTUP
case, which jumps to error_return and poisons the connection handle.

Patch is by Michael Paquier; I just rebased over HEAD.
---
 src/interfaces/libpq/fe-connect.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 50b5df3490..1070b6db24 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3148,17 +3148,22 @@ keep_going:		/* We will come back to here until there is
 	conn->status = CONNECTION_MADE;
 	return PGRES_POLLING_WRITING;
 }
-else if (pollres == PGRES_POLLING_FAILED &&
-		 conn->gssencmode[0] == 'p')
+else if (pollres == PGRES_POLLING_FAILED)
 {
-	/*
-	 * We failed, but we can retry on "prefer".  Have to drop
-	 * the current connection to do so, though.
-	 */
-	conn->try_gss = false;
-	need_new_connection = true;
-	goto keep_going;
+	if (conn->gssencmode[0] == 'p')
+	{
+		/*
+		 * We failed, but we can retry on "prefer".  Have to drop
+		 * the current connection to do so, though.
+		 */
+		conn->try_gss = false;
+		need_new_connection = true;
+		goto keep_going;
+	}
+	/* Else it's a hard failure */
+	goto error_return;
 }
+/* Else, return POLLING_READING or POLLING_WRITING status */
 return pollres;
 #else			/* !ENABLE_GSS */
 /* unreachable */
-- 
2.25.1



Re: ICU locale validation / canonicalization

2023-02-13 Thread Jeff Davis
On Fri, 2023-02-10 at 22:50 -0500, Robert Haas wrote:
> The fact that you're figuring out how it all works from reading the
> source code does not give me a warm feeling.

Right. On the other hand, the behavior is quite well documented, it was
just the keyword that was undocumented (or I didn't find it).

> > But it seems like a better place for us than libc for the reasons I
> > mentioned in the other thread.
> 
> It may be. But sometimes I feel that's not setting our sights very
> high. :-(

How much higher could we set our sights? What would the ideal collation
provider look like?

Those are good questions, but please let's take those questions to the
thread about ICU as a default.

The topic of this thread is:

Given that we are already offering ICU support, should we canonicalize
the locale string stored in the catalog? If so, should we use the ICU
format locale IDs, or BCP 47 language tags?

Do you have an opinion on that topic? If not, do you need additional
information?

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Tom Lane
Andres Freund  writes:
> As part of that I'd like to add tests for relation extension. To be able to
> test the bulk write strategy path, we need to have a few backends concurrently
> load > 16MB files.
> It seems pretty clear that doing that on all buildfarm machines wouldn't be
> nice / welcome. And it also seems likely that this won't be the last case
> where that'd be useful.
> So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
> that we only want to execute on machines with sufficient resources.

Makes sense.  I see that this approach would result in manual check-world
runs not running such tests by default either, which sounds right.

Bikeshedding a bit ... is "large" the right name?  It's not awful but
I wonder if there is a better one; it seems like this class could
eventually include tests that run a long time but don't necessarily
eat disk space.  "resource-intensive" is too long.

regards, tom lane




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
> Are there existing tests that we should add into that set that you're
> thinking of..?  I've been working with the Kerberos tests and that's
> definitely one that seems to fit this description...

I think the kerberos tests are already opt-in, so I don't think we need to
gate it further.

Maybe the pgbench tests?

I guess there's an argument to be made that we should use this for e.g.
002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test
pretty fundamental behaviour like WAL replay, which is unfortunately is pretty
easy to break, so I'd be hesitant.

I guess we could stop running the full regression tests in 002_pg_upgrade.pl
if !large?

Greetings,

Andres Freund




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 13:54:59 -0500, Tom Lane wrote:
> Bikeshedding a bit ... is "large" the right name?  It's not awful but
> I wonder if there is a better one

I did wonder about that too. But didn't come up with something more poignant.


> it seems like this class could eventually include tests that run a long time
> but don't necessarily eat disk space.  "resource-intensive" is too long.

I'm not sure we'd want to combine time-intensive and disk-space-intensive test
in the same category. Availability of disk space and cpu cycles don't have to
correlate that well.

lotsadisk, lotsacpu? :)

Greetings,

Andres Freund




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
> > Are there existing tests that we should add into that set that you're
> > thinking of..?  I've been working with the Kerberos tests and that's
> > definitely one that seems to fit this description...
> 
> I think the kerberos tests are already opt-in, so I don't think we need to
> gate it further.

I'd like to lump them in with a bunch of other tests though, to give it
more chance to run..  My issue currently is that they're *too* gated.

> Maybe the pgbench tests?

Sure.

> I guess there's an argument to be made that we should use this for e.g.
> 002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test
> pretty fundamental behaviour like WAL replay, which unfortunately is pretty
> easy to break, so I'd be hesitant.

Hm.  If you aren't playing with that part of the code though, maybe it'd
be nice to not run them.  The pg_dump tests might also make sense to
segregate out as they can add up to be a lot, and there's more that we
could and probably should be doing there.

> I guess we could stop running the full regression tests in 002_pg_upgrade.pl
> if !large?

Perhaps... but then what are we testing?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 14:15:24 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
> > > Are there existing tests that we should add into that set that you're
> > > thinking of..?  I've been working with the Kerberos tests and that's
> > > definitely one that seems to fit this description...
> > 
> > I think the kerberos tests are already opt-in, so I don't think we need to
> > gate it further.
> 
> I'd like to lump them in with a bunch of other tests though, to give it
> more chance to run..  My issue currently is that they're *too* gated.

Isn't the reason that we gate them that much that the test poses a security
hazard on a multi-user system?

I don't think we should combine opting into security hazards with opting into
using disk space.


FWIW, the kerberos tests run on all CI OSs other than windows. I have
additional CI coverage for openbsd and netbsd in a separate branch, providing
further coverage - but I'm not sure we want those additional covered OSs in
core PG.

I think the tests for kerberos run frequently enough in practice. I don't know
how good the coverage they provide is, though, but that's a separate aspect to
improve anyway.


> > I guess there's an argument to be made that we should use this for e.g.
> > 002_pg_upgrade.pl or 027_stream_regress.pl - but I think both of these test
> > pretty fundamental behaviour like WAL replay, which unfortunately is pretty
> > easy to break, so I'd be hesitant.
> 
> Hm.  If you aren't playing with that part of the code though, maybe it'd
> be nice to not run them.

It's surprisingly easy to break it accidentally...


> The pg_dump tests might also make sense to segregate out as they can add up
> to be a lot, and there's more that we could and probably should be doing
> there.

IMO the main issue with the pg_dump tests is their verbosity, rather than the
runtime... ~8.8k subtests is a lot.

find . -name 'regress_log*'|xargs -n 1 wc -l|sort -nr|head -n 5|less
12712 ./testrun/pg_dump/002_pg_dump/log/regress_log_002_pg_dump
5124 ./testrun/pg_rewind/002_databases/log/regress_log_002_databases
1928 ./testrun/pg_rewind/001_basic/log/regress_log_001_basic
1589 ./testrun/recovery/017_shm/log/regress_log_017_shm
1077 ./testrun/pg_rewind/004_pg_xlog_symlink/log/regress_log_004_pg_xlog_symlink


> > I guess we could stop running the full regression tests in 002_pg_upgrade.pl
> > if !large?
> 
> Perhaps... but then what are we testing?

There's plenty to pg_upgrade other than the pg_dump comparison aspect.


I'm not sure it's worth spending too much energy finding tests that we can run
less commonly than now. We've pushed back on tests using lots of resources so
far, so we don't really have them...

Greetings,

Andres Freund




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-13 13:54:59 -0500, Tom Lane wrote:
>> it seems like this class could eventually include tests that run a long time
>> but don't necessarily eat disk space.  "resource-intensive" is too long.

> I'm not sure we'd want to combine time-intensive and disk-space-intensive test
> in the same category. Availability of disk space and cpu cycles don't have to
> correlate that well.

Yeah, I was thinking along the same lines.

> lotsadisk, lotsacpu? :)

bigdisk, bigcpu?

regards, tom lane




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andres Freund
On 2023-02-13 14:55:45 -0500, Tom Lane wrote:
> bigdisk, bigcpu?

Works for me.

I'll probably just add bigdisk as part of adding a test for bulk relation
extensions, mentioning in a comment that we might want bigcpu if we have a
test for it?




Re: old_snapshot_threshold bottleneck on replica

2023-02-13 Thread Andres Freund
Hi,

On 2023-01-24 10:46:28 -0500, Robert Haas wrote:
> On Mon, Jan 23, 2023 at 9:40 AM Maxim Orlov  wrote:
> > One of our customers stumbled onto a significant performance degradation
> > while running multiple OLAP-like queries on a replica.
> > After some investigation, it became clear that the problem is in accessing 
> > old_snapshot_threshold parameter.
>
> It has been suggested that we remove that feature entirely.

Indeed. There's a lot of things wrong with it. We have reproducers for
creating wrong query results. Nobody has shown interest in fixing the
problems, for several years by now. It costs users that *do not* use the
feature performance (*).

I think we're doing our users a disservice by claiming to have this feature.

I don't think a lot of the existing code would survive if we were to create a
newer, more maintainable / reliable version of the feature.

Greetings,

Andres Freund

(*) E.g. TestForOldSnapshot() is called in a good number of places, and emits
quite a bit of code. It's not executed, but the emitted code is large
enough to lead to worse code being generated.




Re: proposal: psql: psql variable BACKEND_PID

2023-02-13 Thread Peter Eisentraut

On 13.02.23 18:33, Pavel Stehule wrote:
> In every real use case you can use pg_backend_pid(), but you need to
> write the complete name without tab complete, and you need to know that
> this function is available.
>
> BACKEND_PID is supported by tab complete, and it is displayed in \set
> list and \? variables. Nothing less, nothing more. A custom psql variable
> can have an obsolete value.
>
> I can imagine using :BACKEND_PID in \echo command - and it just saves
> you one step with its own custom variable.
>
> It is just some more comfort with almost zero cost.


This line of argument would open us up to copying just about every bit 
of session state into psql just to make it slightly easier to use.





Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-13 14:55:45 -0500, Tom Lane wrote:
>> bigdisk, bigcpu?

> Works for me.

> I'll probably just add bigdisk as part of adding a test for bulk relation
> extensions, mentioning in a comment that we might want bigcpu if we have a
> test for it?

No objection here.

regards, tom lane




Re: REASSIGN OWNED vs ALTER TABLE OWNER TO permission inconsistencies

2023-02-13 Thread Robert Haas
On Wed, Feb 8, 2023 at 5:49 AM Nazir Bilal Yavuz  wrote:
> My colleague Adam realized that when transferring ownership, 'REASSIGN
> OWNED' command doesn't check 'CREATE privilege on the table's schema' on
> new owner but 'ALTER TABLE OWNER TO' docs state that:

Well, that sucks.

> As you can see, 'ALTER TABLE OWNER TO' checked 'CREATE privilege on the
> table's schema' on target_role but 'REASSIGN OWNED' didn't check it and
> transferred ownership of the table. Is this a potential security gap or
> intentional behaviour?

I was looking at this recently and I noticed that for some object
types, ALTER WHATEVER ... OWNER TO requires that the user transferring
ownership possess CREATE on the containing object, which might be a
schema or database depending on the object type. For other object
types, ALTER WHATEVER ... OWNER TO requires that the user *receiving*
ownership possess CREATE on the containing object, either schema or
database. That's not very consistent, and I couldn't find anything to
explain why it's like that. Now you've discovered that REASSIGN OWNED
ignores this issue altogether. Ugh.

We probably ought to make this consistent. Either the donor needs
CREATE permission on the parent object, or the recipient does, or
both, or neither, and whatever the rule may be, it should be
consistent across all types of objects (except for shared objects
which have no parent object) and across all commands.

I think that requiring the recipient to have CREATE permission on the
parent object doesn't really make sense. It could make sense if we did
it consistently, so that there was a hard-and-fast rule that the
current owner always has CREATE on the parent object, but I think that
will never happen. You can be a superuser and thus create objects with
no explicit privileges on the containing object at all, and if your
role is later made NOSUPERUSER, you'll still own those objects. You
could have the privilege initially and then later it could be revoked,
and we would not demand those objects to be dropped or given to a
different owner or whatever. Changing those behaviors doesn't seem
desirable. It would lead to lots of pedantic failures trying to
execute REASSIGN OWNED or REVOKE or ALTER USER ... NOSUPERUSER and I
can't see what we'd really be gaining.

I think that requiring the donor to have CREATE permission on the
parent object makes a little bit more sense. I wouldn't mind if we
tried to standardize on that rule. It would be unlikely to
inconvenience users trying to execute REASSIGN OWNED because most
users running REASSIGN OWNED are going to be superusers already, or
at the very least highly privileged. However, I'm not sure how much
logical sense it makes. Changing the owner of an object is pretty
different from creating it. It makes sense to require CREATE
permission on the parent object if an object is being *renamed*,
because that's a lot like creating a new object: there's now something
in this schema or database under a name that previously wasn't in use.
But ALTER .. OWNER TO does not have that effect, so I think it's kind
of unclear why we even care about CREATE on the parent object.

I think the important permission checks around ALTER ... OWNER TO are
on the roles involved and their relationship to the object itself. You
need to own the object (or inherit those privileges) and, in master,
you need to be able to SET ROLE to the new owner. If you have those
permissions, is that, perhaps, good enough? Maybe checking CREATE on
the parent object just isn't really needed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




OID ordering dependency in pg_dump

2023-02-13 Thread Tom Lane
While starting to poke at the hashed-enum-partition-key problem
recently discussed [1], I realized that pg_dump's flagInhAttrs()
function has a logic issue: its loop changes state that will be
inspected in other iterations of the loop, and there's no guarantee
about the order in which related tables will be visited.  Typically
we'll see parent tables before children because parents tend to have
smaller OIDs, but there are plenty of ways in which that might not
be true.

As far as I can tell, the implications of this are just cosmetic:
we might dump DEFAULT or GENERATED expressions that we don't really
need to because they match properties of the parent.  Still, it's
buggy, and somebody might carelessly extend the logic in a way that
introduces more-serious bugs.  PFA a proposed patch.

regards, tom lane

[1] https://www.postgresql.org/message-id/1376149.1675268279%40sss.pgh.pa.us

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index a43f2e5553..cbf00adbae 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -471,6 +471,13 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables)
 j,
 k;
 
+	/*
+	 * We scan the tables in OID order, since that's how tblinfo[] is sorted.
+	 * Hence we will typically visit parents before their children --- but
+	 * that is *not* guaranteed.  Thus this loop must be careful that it does
+	 * not alter table properties in a way that would change decisions made at
+	 * their child tables during other iterations.
+	 */
 	for (i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &(tblinfo[i]);
@@ -519,15 +526,18 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables)
 		parent->numatts);
 if (inhAttrInd >= 0)
 {
+	AttrDefInfo *parentDef = parent->attrdefs[inhAttrInd];
+
 	foundNotNull |= parent->notnull[inhAttrInd];
-	foundDefault |= (parent->attrdefs[inhAttrInd] != NULL &&
+	foundDefault |= (parentDef != NULL &&
+	 strcmp(parentDef->adef_expr, "NULL") != 0 &&
 	 !parent->attgenerated[inhAttrInd]);
 	if (parent->attgenerated[inhAttrInd])
 	{
 		/* these pointer nullness checks are just paranoia */
-		if (parent->attrdefs[inhAttrInd] != NULL &&
+		if (parentDef != NULL &&
 			tbinfo->attrdefs[j] != NULL &&
-			strcmp(parent->attrdefs[inhAttrInd]->adef_expr,
+			strcmp(parentDef->adef_expr,
    tbinfo->attrdefs[j]->adef_expr) == 0)
 			foundSameGenerated = true;
 		else
@@ -539,7 +549,14 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables)
 			/* Remember if we found inherited NOT NULL */
 			tbinfo->inhNotNull[j] = foundNotNull;
 
-			/* Manufacture a DEFAULT NULL clause if necessary */
+			/*
+			 * Manufacture a DEFAULT NULL clause if necessary.  This breaks
+			 * the advice given above to avoid changing state that might get
+			 * inspected in other loop iterations.  We prevent trouble by
+			 * having the foundDefault test above check whether adef_expr is
+			 * "NULL", so that it will reach the same conclusion before or
+			 * after this is done.
+			 */
 			if (foundDefault && tbinfo->attrdefs[j] == NULL)
 			{
 AttrDefInfo *attrDef;
@@ -575,10 +592,10 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables)
 tbinfo->attrdefs[j] = attrDef;
 			}
 
-			/* Remove generation expression from child if possible */
+			/* No need to dump generation expression if it's inheritable */
 			if (foundSameGenerated && !foundDiffGenerated &&
 !tbinfo->ispartition && !dopt->binary_upgrade)
-tbinfo->attrdefs[j] = NULL;
+tbinfo->attrdefs[j]->dobj.dump = DUMP_COMPONENT_NONE;
 		}
 	}
 }
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 527c7651ab..8cd8f10d5e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -8531,9 +8531,9 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
  * Column generation expressions cannot be dumped separately,
  * because there is no syntax for it.  By setting separate to
  * false here we prevent the "default" from being processed as
- * its own dumpable object, and flagInhAttrs() will remove it
- * from the table if possible (that is, if it can be inherited
- * from a parent).
+ * its own dumpable object.  Later, flagInhAttrs() will mark
+ * it as not to be dumped at all, if possible (that is, if it
+ * can be inherited from a parent).
  */
 attrdefs[j].separate = false;
 			}
@@ -15351,9 +15351,11 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 	bool		print_notnull;
 
 	/*
-	 * Default value --- suppress if to be printed separately.
+	 * Default value --- suppress if to be printed separately
+	 * or not at all.
 	 */
 	print_default = (tbinfo->attrdefs[j] != NULL &&
+	 tbinfo->attrdefs[j]->dobj.dump &&
 	 !tbinfo->attrdefs[j]->separate);
 
 	/*


Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andrew Dunstan


On 2023-02-13 Mo 14:34, Andres Freund wrote:

> Hi,
>
> On 2023-02-13 14:15:24 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2023-02-13 13:45:41 -0500, Stephen Frost wrote:
> > > > Are there existing tests that we should add into that set that you're
> > > > thinking of..?  I've been working with the Kerberos tests and that's
> > > > definitely one that seems to fit this description...
> > >
> > > I think the kerberos tests are already opt-in, so I don't think we need to
> > > gate it further.
> >
> > I'd like to lump them in with a bunch of other tests though, to give it
> > more chance to run..  My issue currently is that they're *too* gated.
>
> Isn't the reason that we gate them that much that the test poses a security
> hazard on a multi-user system?


That's my understanding.


> I don't think we should combine opting into security hazards with opting into
> using disk space.


I agree


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Peter Smith
On Tue, Feb 14, 2023 at 5:42 AM Andres Freund  wrote:
>
> Hi,
>
> I'm working on rebasing [1], my patch to make relation extension scale
> better.
>
> As part of that I'd like to add tests for relation extension. To be able to
> test the bulk write strategy path, we need to have a few backends concurrently
> load > 16MB files.
>
> It seems pretty clear that doing that on all buildfarm machines wouldn't be
> nice / welcome. And it also seems likely that this won't be the last case
> where that'd be useful.
>
> So I'd like to add a 'large' class to PG_TEST_EXTRA, that we can use in tests
> that we only want to execute on machines with sufficient resources.
>

Oh, I have been thinking about a similar topic recently, but I was
unaware of PG_TEST_EXTRA [1]

I've observed suggested test cases get rejected as being overkill, or
because they would add precious seconds to the test execution. OTOH, I
felt such tests would still help gain some additional percentages from
the "code coverage" stats. The kind of tests I am thinking of don't
necessarily need a huge disk/CPU - but they just take longer to run
than anyone has wanted to burden the build-farm with.

~

Sorry for the thread interruption -- but I thought this might be the
right place to ask: What is the recommended way to deal with such
tests intended primarily for better code coverage?

I didn't see anything that looked pre-built for 'coverage'. Did I miss
something, or is it a case of just invent-your-own extra tests/values
for your own ad-hoc requirements?

e.g.
make check EXTRA_TESTS=extra_regress_for_coverage
make check-world PG_TEST_EXTRA='extra_tap_tests_for_coverage'

Thanks!

--
[1] https://www.postgresql.org/docs/devel/regress-run.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: recovery modules

2023-02-13 Thread Nathan Bossart
On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote:
> +   basic_archive_context = data->context;
> +   if (CurrentMemoryContext == basic_archive_context)
> +   MemoryContextSwitchTo(TopMemoryContext);
> +
> +   if (MemoryContextIsValid(basic_archive_context))
> +   MemoryContextDelete(basic_archive_context);
> 
> This is a bit confusing, because it means that we enter in the
> shutdown callback with one context, but exit it under
> TopMemoryContext.  Are you sure that this will be OK when there could
> be multiple callbacks piled up with before_shmem_exit()?  shmem_exit()
> has nothing specific to memory contexts.

Well, we can't free the memory context while we are in it, so we have to
switch to another one.  I agree that this is a bit confusing, though.

On second thought, I'm not sure it's important to make sure the state is
freed in the shutdown callback.  It's only called just before the archiver
process exits, so we're not really at risk of leaking anything.  I suppose
we might not always restart the archiver in this case, but I also don't
anticipate that behavior changing in the near future.  I think this
callback is more useful for things like shutting down background workers.

I went ahead and removed the shutdown callback from basic_archive and the
note about leaking from the documentation.

> Is putting the new headers in src/include/postmaster/ the best move in
> the long term?  Perhaps that's fine, but I was wondering whether a new
> location like archive/ would make more sense.  pg_arch.h being in the
> postmaster section is fine.

I moved them to archive/.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 2d4231d40b47a01f46dc2d151661eba46b8ba3b6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 9 Feb 2023 13:49:46 -0800
Subject: [PATCH v11 1/1] restructure archive modules API

---
 contrib/basic_archive/basic_archive.c | 53 -
 doc/src/sgml/archive-modules.sgml | 32 +++---
 src/backend/Makefile  |  2 +-
 src/backend/archive/Makefile  | 18 ++
 src/backend/archive/meson.build   |  5 ++
 .../{postmaster => archive}/shell_archive.c   | 30 ++
 src/backend/meson.build   |  1 +
 src/backend/postmaster/Makefile   |  1 -
 src/backend/postmaster/meson.build|  1 -
 src/backend/postmaster/pgarch.c   | 27 +
 src/backend/utils/misc/guc_tables.c   |  1 +
 src/include/archive/archive_module.h  | 58 +++
 src/include/archive/shell_archive.h   | 22 +++
 src/include/postmaster/pgarch.h   | 39 -
 14 files changed, 205 insertions(+), 85 deletions(-)
 create mode 100644 src/backend/archive/Makefile
 create mode 100644 src/backend/archive/meson.build
 rename src/backend/{postmaster => archive}/shell_archive.c (78%)
 create mode 100644 src/include/archive/archive_module.h
 create mode 100644 src/include/archive/shell_archive.h

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 36b7a4814a..8a6bc25db5 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -30,9 +30,9 @@
 #include 
 #include 
 
+#include "archive/archive_module.h"
 #include "common/int.h"
 #include "miscadmin.h"
-#include "postmaster/pgarch.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
@@ -40,15 +40,27 @@
 
 PG_MODULE_MAGIC;
 
+typedef struct BasicArchiveData
+{
+	MemoryContext context;
+} BasicArchiveData;
+
 static char *archive_directory = NULL;
-static MemoryContext basic_archive_context;
 
-static bool basic_archive_configured(void);
-static bool basic_archive_file(const char *file, const char *path);
+static void basic_archive_startup(ArchiveModuleState *state);
+static bool basic_archive_configured(ArchiveModuleState *state);
+static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path);
 static void basic_archive_file_internal(const char *file, const char *path);
 static bool check_archive_directory(char **newval, void **extra, GucSource source);
 static bool compare_files(const char *file1, const char *file2);
 
+static const ArchiveModuleCallbacks basic_archive_callbacks = {
+	.startup_cb = basic_archive_startup,
+	.check_configured_cb = basic_archive_configured,
+	.archive_file_cb = basic_archive_file,
+	.shutdown_cb = NULL
+};
+
 /*
  * _PG_init
  *
@@ -67,10 +79,6 @@ _PG_init(void)
 			   check_archive_directory, NULL, NULL);
 
 	MarkGUCPrefixReserved("basic_archive");
-
-	basic_archive_context = AllocSetContextCreate(TopMemoryContext,
-  "basic_archive",
-  ALLOCSET_DEFAULT_SIZES);
 }
 
 /*
@@ -78,11 +86,28 @@ _PG_init(void)
  *
  * Returns the module's archiving callbacks.
  */
+const ArchiveModuleCallbacks *
+_PG_archive_module_init(void)
+{
+	return &basic_archive_callbacks

[PATCH] Fix unbounded authentication exchanges during PQconnectPoll()

2023-02-13 Thread Jacob Champion
Hello,

This is closely related to the prior conversation at [1]. There are a
couple places in CONNECTION_AWAITING_RESPONSE where libpq will read a
huge number of bytes from a server that we really should have hung up on.

The attached patch adds a length check for the v2 error compatibility
case, and updates the v3 error handling to jump to error_return rather
than asking for more data. The existing error_return paths have been
updated for consistency.

Thanks,
--Jacob

[1]
https://www.postgresql.org/message-id/a5c5783d-73f3-acbc-997f-1649a7406029%40timescale.comFrom 949a5e994235a60731ff827dcfc5157007d937ca Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Fri, 18 Nov 2022 13:45:34 -0800
Subject: [PATCH] PQconnectPoll: fix unbounded authentication exchanges

A couple of code paths in CONNECTION_AWAITING_RESPONSE will eagerly read
bytes off a connection that should be closed. Don't let a misbehaving
server chew up client resources here; a v2 error can't be infinitely
long, and a v3 error should be bounded by its original message length.

For the existing error_return cases, I added some additional error
messages for symmetry with the new ones.
---
 src/interfaces/libpq/fe-connect.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 50b5df3490..5212ae1a52 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3230,12 +3230,24 @@ keep_going:		/* We will come back to here until there is
 	goto error_return;
 }
 
-if (beresp == 'E' && (msgLength < 8 || msgLength > 30000))
+#define MAX_ERRLEN 30000
+if (beresp == 'E' && (msgLength < 8 || msgLength > MAX_ERRLEN))
 {
 	/* Handle error from a pre-3.0 server */
 	conn->inCursor = conn->inStart + 1; /* reread data */
 	if (pqGets_append(&conn->errorMessage, conn))
 	{
+		/*
+		 * We may not have authenticated the server yet, so
+		 * don't let the buffer grow forever.
+		 */
+		avail = conn->inEnd - conn->inCursor;
+		if (avail > MAX_ERRLEN)
+		{
+			libpq_append_conn_error(conn, "server sent overlong v2 error message");
+			goto error_return;
+		}
+
 		/* We'll come back when there is more data */
 		return PGRES_POLLING_READING;
 	}
@@ -3255,9 +3267,15 @@ keep_going:		/* We will come back to here until there is
 
 	goto error_return;
 }
+#undef MAX_ERRLEN
 
 /*
  * Can't process if message body isn't all here yet.
+ *
+ * After this check passes, any further EOF during parsing
+ * implies that the server sent a bad/truncated message. Reading
+ * more bytes won't help in that case, so don't return
+ * PGRES_POLLING_READING after this point.
  */
 msgLength -= 4;
 avail = conn->inEnd - conn->inCursor;
@@ -3280,8 +3298,8 @@ keep_going:		/* We will come back to here until there is
 {
 	if (pqGetErrorNotice3(conn, true))
 	{
-		/* We'll come back when there is more data */
-		return PGRES_POLLING_READING;
+		libpq_append_conn_error(conn, "server sent truncated error message");
+		goto error_return;
 	}
 	/* OK, we read the message; mark data consumed */
 	conn->inStart = conn->inCursor;
@@ -3357,6 +3375,7 @@ keep_going:		/* We will come back to here until there is
 {
 	if (pqGetNegotiateProtocolVersion3(conn))
 	{
+		libpq_append_conn_error(conn, "server sent truncated protocol negotiation message");
 		goto error_return;
 	}
 	/* OK, we read the message; mark data consumed */
@@ -3370,6 +3389,7 @@ keep_going:		/* We will come back to here until there is
 /* Get the type of request. */
 if (pqGetInt((int *) &areq, 4, conn))
 {
+	libpq_append_conn_error(conn, "server sent truncated authentication request");
 	goto error_return;
 }
 msgLength -= 4;
-- 
2.25.1



pg_walinspect memory leaks

2023-02-13 Thread Peter Geoghegan
It looks like pg_walinspect's GetWALRecordsInfo() routine doesn't take
sufficient care with memory management. It should avoid memory leaks
of the kind that lead to OOMs whenever
pg_get_wal_records_info_till_end_of_wal() has to return very many
tuples. Right now it isn't that hard to make that happen, even on a
system where memory is plentiful. I wasn't expecting that, because all
of these functions use a tuplestore.

More concretely, it looks like GetWALRecordInfo() calls
CStringGetTextDatum/cstring_to_text in a way that accumulates way too
much memory in ExprContext. This could be avoided by using a separate
memory context that is reset periodically, or something else along the
same lines.
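
For illustration, a minimal sketch of that approach, assuming a
short-lived per-record context around the loop that fills the tuplestore
(names and the record-reading call are assumptions, not the actual
pg_walinspect code):

    MemoryContext per_record_cxt;
    MemoryContext oldcxt;

    /* short-lived context, reset once per returned row */
    per_record_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                           "pg_walinspect per-record context",
                                           ALLOCSET_DEFAULT_SIZES);

    while (ReadNextXLogRecord(xlogreader))   /* assumed record-reading helper */
    {
        oldcxt = MemoryContextSwitchTo(per_record_cxt);

        /* builds the row's text datums via CStringGetTextDatum() etc. */
        GetWALRecordInfo(xlogreader, values, nulls, ncols);
        tuplestore_putvalues(tupstore, tupdesc, values, nulls);

        MemoryContextSwitchTo(oldcxt);
        MemoryContextReset(per_record_cxt);  /* drop this row's allocations */
    }

tuplestore_putvalues() copies the datums into the tuplestore's own
memory, so resetting the context after each row should be safe.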

-- 
Peter Geoghegan




Re: Improve logging when using Huge Pages

2023-02-13 Thread Justin Pryzby
On Thu, Feb 09, 2023 at 11:29:09AM -0800, Nathan Bossart wrote:
> On Thu, Feb 09, 2023 at 10:40:13AM +0100, Alvaro Herrera wrote:
> > On 2023-Feb-08, Justin Pryzby wrote:
> >> I don't think it makes sense to run postgres -C huge_pages_active,
> >> however, so I see no issue with it always returning "false".
> > 
> > Hmm, I would initialize it to return "unknown" rather than "off" — and
> > make sure it turns "off" at the appropriate time.  Otherwise you're just
> > moving the confusion elsewhere.
> 
> I think this approach would address my concerns about using a GUC.

Done that way.  This also fixes the logic in win32_shmem.c.

-- 
Justin
>From 5ead7ba3711dd7a0bb9ec083ebd60049f50f9907 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 23 Jan 2023 18:33:51 -0600
Subject: [PATCH v3] add GUC: huge_pages_active

This is useful to show the current state of huge pages when
huge_pages=try.  The effective status is not otherwise visible without
OS level tools like gdb or /proc/N/smaps.

https://www.postgresql.org/message-id/flat/tu4pr8401mb1152ebb0d271f827e2e37a01ee...@tu4pr8401mb1152.namprd84.prod.outlook.com
---
 doc/src/sgml/config.sgml| 17 -
 src/backend/port/sysv_shmem.c   |  3 +++
 src/backend/port/win32_shmem.c  |  4 
 src/backend/utils/misc/guc_tables.c | 12 
 4 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8c56b134a84..3770c7dc254 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1700,23 +1700,24 @@ include_dir 'conf.d'
   
   
   

 Controls whether huge pages are requested for the main shared memory
 area. Valid values are try (the default),
 on, and off.  With
 huge_pages set to try, the
 server will try to request huge pages, but fall back to the default if
 that fails. With on, failure to request huge pages
 will prevent the server from starting up. With off,
-huge pages will not be requested.
+huge pages will not be requested.  The actual state of huge pages is
+indicated by the server variable .

 

 At present, this setting is supported only on Linux and Windows. The
 setting is ignored on other systems when set to
 try.  On Linux, it is only supported when
 shared_memory_type is set to mmap
 (the default).

 

@@ -10679,22 +10680,36 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 with assertions enabled. That is the case if the
 macro USE_ASSERT_CHECKING is defined
 when PostgreSQL is built (accomplished
 e.g., by the configure option
 --enable-cassert). By
 default PostgreSQL is built without
 assertions.

   
  
 
+ 
+  huge_pages_active (boolean)
+  
+   huge_pages_active configuration parameter
+  
+  
+  
+   
+Reports whether huge pages are in use by the current process.
+See  for more information.
+   
+  
+ 
+
  
   integer_datetimes (boolean)
   
integer_datetimes configuration parameter
   
   
   

 Reports whether PostgreSQL was built with support for
 64-bit-integer dates and times.  As of PostgreSQL 10,
 this is always on.
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index eaba244bc9c..48ead4d27bf 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -619,22 +619,25 @@ CreateAnonymousSegment(Size *size)
 			allocsize += hugepagesize - (allocsize % hugepagesize);
 
 		ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE,
    PG_MMAP_FLAGS | mmap_flags, -1, 0);
 		mmap_errno = errno;
 		if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
 			elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
  allocsize);
 	}
 #endif
 
+	SetConfigOption("huge_pages_active", ptr == MAP_FAILED ? "off" : "on",
+	PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+
 	if (ptr == MAP_FAILED && huge_pages != HUGE_PAGES_ON)
 	{
 		/*
 		 * Use the original size, not the rounded-up value, when falling back
 		 * to non-huge pages.
 		 */
 		allocsize = *size;
 		ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE,
    PG_MMAP_FLAGS, -1, 0);
 		mmap_errno = errno;
 	}
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 62e08074770..0bf594c8bf9 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -319,22 +319,26 @@ retry:
 		 * If the segment already existed, CreateFileMapping() will return a
 		 * handle to the existing one and set ERROR_ALREADY_EXISTS.
 		 */
 		if (GetLastError() == ERROR_ALREADY_EXISTS)
 		{
 			CloseHandle(hmap);	/* Close the handle, since we got a valid one
  * to the previous segment. */
 			

Re: recovery modules

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 14:56:47 -0800, Nathan Bossart wrote:
> On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote:
> > +   basic_archive_context = data->context;
> > +   if (CurrentMemoryContext == basic_archive_context)
> > +   MemoryContextSwitchTo(TopMemoryContext);
> > +
> > +   if (MemoryContextIsValid(basic_archive_context))
> > +   MemoryContextDelete(basic_archive_context);
> > 
> > This is a bit confusing, because it means that we enter in the
> > shutdown callback with one context, but exit it under
> > TopMemoryContext.  Are you sure that this will be OK when there could
> > be multiple callbacks piled up with before_shmem_exit()?  shmem_exit()
> > has nothing specific to memory contexts.
> 
> Well, we can't free the memory context while we are in it, so we have to
> switch to another one.  I agree that this is a bit confusing, though.

Why would we be in that memory context? I'd just add an assert documenting
we're not.
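
As a rough sketch of what that could look like in basic_archive's
shutdown callback (assuming the patch's BasicArchiveData/private_data
layout; illustrative only, not the patch itself):

    static void
    basic_archive_shutdown(ArchiveModuleState *state)
    {
        BasicArchiveData *data = (BasicArchiveData *) state->private_data;

        /* document that we are never called from within our own context */
        Assert(CurrentMemoryContext != data->context);

        if (MemoryContextIsValid(data->context))
            MemoryContextDelete(data->context);
    }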


> On second thought, I'm not sure it's important to make sure the state is
> freed in the shutdown callback.  It's only called just before the archiver
> process exits, so we're not really at risk of leaking anything.  I suppose
> we might not always restart the archiver in this case, but I also don't
> anticipate that behavior changing in the near future.  I think this
> callback is more useful for things like shutting down background workers.

I think it's crucial. Otherwise we're just ossifying the design that there's
just one archive module active at a time.


> I went ahead and removed the shutdown callback from basic_archive and the
> note about leaking from the documentation.

-1


Greetings,

Andres Freund




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-14 09:26:47 +1100, Peter Smith wrote:
> I've observed suggested test cases get rejected as being overkill, or
> because they would add precious seconds to the test execution. OTOH, I
> felt such tests would still help gain some additional percentages from
> the "code coverage" stats. The kind of tests I am thinking of don't
> necessarily need a huge disk/CPU - but they just take longer to run
> than anyone has wanted to burden the build-farm with.

I'd say it depends on the test whether it's worth adding. Code coverage for its
own sake isn't that useful; tests have to actually test something useful. And
tests have costs beyond runtime, e.g. new tests tend to fail in some edge
cases.

E.g. just having tests hit more lines, without verifying that the behaviour is
actually correct, only provides limited additional assurance.  It's also not
very useful to add a very expensive test that provides only a very small
additional amount of coverage.

IOW, even if we add more test categories, it'll still be a tradeoff.


> Sorry for the thread interruption -- but I thought this might be the
> right place to ask: What is the recommended way to deal with such
> tests intended primarily for better code coverage?

I don't think that exists today.

Do you have an example of the kind of test you're thinking of?


Greetings,

Andres Freund




Re: Rework of collation code, extensibility

2023-02-13 Thread Jeff Davis
New version attached. Changes:

* I moved the GUC patch to the end (so you can ignore it if it's not
useful for review)
* I cut out the pg_locale_internal.h rearrangement (at least for now,
it might seem useful after the dust settles on the other changes).
* I added a separate patch for pg_locale_deterministic().
* I added a separate patch for a simple cleanup of a USE_ICU special
case.

Now the patches are:

0001: pg_strcoll/pg_strxfrm
0002: pg_locale_deterministic()
0003: cleanup a USE_ICU special case
0004: GUCs (only for testing, not for commit)

Responses to your review comments inline below:

On Mon, 2023-02-13 at 11:35 +0100, Peter Eisentraut wrote:
> The commit message for 0002 says "Also remove the TRUST_STRXFRM
> define", 
> but I think this is incorrect, as that is done in the 0001 patch.

Fixed.

> I don't like that the pg_strnxfrm() function requires these kinds of 
> repetitive error checks:
> 
> +   if (rsize != bsize)
> +   elog(ERROR, "pg_strnxfrm() returned unexpected
> result");
> 
> This could be checked inside the function itself, so that the callers
> don't have to do this themselves every time.

The current API allows for a pattern like:

   /* avoids extra work if existing buffer is big enough */
   len = pg_strxfrm(buf, src, bufSize, loc);
   if (len >= bufSize)
   {
       buf = repalloc(buf, len + 1);
       bufSize = len + 1;
       len2 = pg_strxfrm(buf, src, bufSize, loc);
   }

The tests for rsize != bsize are just there to check that the underlying
library calls (strxfrm or getSortKey) behave as documented, and we
expect that they'd never be hit. It's hard to move that kind of check
into pg_strxfrm() without making it also manage the buffers.

Do you have a more specific suggestion? I'd like to keep the API
flexible enough that the caller can manage the buffers, like with
abbreviated keys. Perhaps the check can just be removed if we trust
that the library functions at least get the size calculation right? Or
turned into an Assert?
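
For concreteness, the Assert version at a call site such as hashtext()
would look something like this (just a sketch; whether we should trust
the providers that far is exactly the question):

    bsize = pg_strnxfrm(NULL, 0, keydata, keylen, mylocale);
    buf = palloc(bsize);

    rsize = pg_strnxfrm(buf, bsize, keydata, keylen, mylocale);
    /* trust strxfrm()/ucol_getSortKey() to honor their own size estimate */
    Assert(rsize == bsize);

    result = hash_any((uint8_t *) buf, bsize);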


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 5fedf2efffa4a75d8de974c41af084c5a028551d Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 1 Dec 2022 14:45:15 -0800
Subject: [PATCH v10 1/4] Add pg_strcoll(), pg_strxfrm(), and variants.

Towards multi-lib ICU, which should be based on a clean separation of
the routines required for collation providers. Also offers a generally
better separation of responsibilities.

Callers with NUL-terminated strings should call pg_strcoll() or
pg_strxfrm(); callers with strings and their length should call the
variants pg_strncoll() or pg_strnxfrm().
---
 src/backend/access/hash/hashfunc.c |  45 +-
 src/backend/utils/adt/pg_locale.c  | 784 -
 src/backend/utils/adt/varchar.c|  41 +-
 src/backend/utils/adt/varlena.c| 368 ++
 src/include/utils/pg_locale.h  |  13 +
 5 files changed, 863 insertions(+), 388 deletions(-)

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index e3e40d6c21..c0ed995919 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -292,21 +292,19 @@ hashtext(PG_FUNCTION_ARGS)
 #ifdef USE_ICU
 		if (mylocale->provider == COLLPROVIDER_ICU)
 		{
-			int32_t		ulen = -1;
-			UChar	   *uchar = NULL;
-			Size		bsize;
-			uint8_t*buf;
+			Size		bsize, rsize;
+			char	   *buf;
+			const char *keydata = VARDATA_ANY(key);
+			size_t		keylen = VARSIZE_ANY_EXHDR(key);
 
-			ulen = icu_to_uchar(&uchar, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key));
-
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-	uchar, ulen, NULL, 0);
+			bsize = pg_strnxfrm(NULL, 0, keydata, keylen, mylocale);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-			uchar, ulen, buf, bsize);
-			pfree(uchar);
 
-			result = hash_any(buf, bsize);
+			rsize = pg_strnxfrm(buf, bsize, keydata, keylen, mylocale);
+			if (rsize != bsize)
+elog(ERROR, "pg_strnxfrm() returned unexpected result");
+
+			result = hash_any((uint8_t *) buf, bsize);
 
 			pfree(buf);
 		}
@@ -350,21 +348,20 @@ hashtextextended(PG_FUNCTION_ARGS)
 #ifdef USE_ICU
 		if (mylocale->provider == COLLPROVIDER_ICU)
 		{
-			int32_t		ulen = -1;
-			UChar	   *uchar = NULL;
-			Size		bsize;
-			uint8_t*buf;
+			Size		bsize, rsize;
+			char	   *buf;
+			const char *keydata = VARDATA_ANY(key);
+			size_t		keylen = VARSIZE_ANY_EXHDR(key);
 
-			ulen = icu_to_uchar(&uchar, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key));
-
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-	uchar, ulen, NULL, 0);
+			bsize = pg_strnxfrm(NULL, 0, keydata, keylen, mylocale);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-			uchar, ulen, buf, bsize);
-			pfree(uchar);
 
-			result = hash_any_extended(buf, bsize, PG_GETARG_INT64(1));
+			rsize = pg_strnxfrm(buf, bsize, keydata, keylen, mylocale);
+			if (rsize != bsize)
+elog(ERROR, "pg_strnxfrm() returned unexpected r

Re: pglz compression performance, take two

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-08 11:16:47 +0100, Tomas Vondra wrote:
> On 2/7/23 21:18, Andres Freund wrote:
> > 
> > Independent of this failure, I'm worried about the cost/benefit analysis of 
> > a
> > pglz change that changes this much at once. It's quite hard to review.
> > 
> 
> I agree.
> 
> I think I managed to understand what the patch does during the review,
> but it's so much harder - it'd definitely be better to have this split
> into smaller parts, somehow. Interestingly enough the commit message
> actually says this:
> 
>   This patch accumulates several changes to pglz compression:
>   1. Convert macro-functions to regular functions for readability
>   2. Use more compact hash table with uint16 indexes instead of pointers
>   3. Avoid prev pointer in hash table
>   4. Use 4-byte comparisons during a search instead of 1-byte
>  comparisons
> 
> Which I think is a pretty good recipe how to split the patch. (And we
> also need a better commit message, or at least a proposal.)
> 
> This'd probably also help when investigating the extra byte issue,
> discussed yesterday. (Assuming it's not related to the invalid access
> reported by valgrind / asan).

Due to the sanitizer changes, and this feedback, I'm marking the entry as
waiting on author.

Greetings,

Andres Freund




Re: pglz compression performance, take two

2023-02-13 Thread Andrey Borodin
On Mon, Feb 13, 2023 at 4:03 PM Andres Freund  wrote:
>
> Due to the sanitizer changes, and this feedback, I'm marking the entry as
> waiting on author.
>
Thanks Andres! Yes, I plan to make another attempt to refactor this
patch on the weekend. If this attempt fails, I think we should just
reject it and I'll get back to this during summer.

Best regards, Andrey Borodin.




Re: Support for dumping extended statistics

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-01 04:38:17 +, Hari krishna Maddileti wrote:
> Thanks for the update. I have attached the updated patch, which is
> meson compatible and also addresses the warnings from the makefile.

This patch consistently crashes in CI:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F4114

Example crash:
https://api.cirrus-ci.com/v1/task/4910781754507264/logs/cores.log

Greetings,

Andres Freund




Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-08 13:10:40 +0100, Alvaro Herrera wrote:
> On 2023-Feb-08, Alvaro Herrera wrote:
> 
> > I propose instead the following: each command is prepared just before
> > it's executed, as previously, and if we see a \startpipeline, then we
> > prepare all commands starting with the one just after, and until the
> > \endpipeline.
> 
> Here's the patch.

There's something wrong with the patch, it reliably fails with core dumps:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3260

Example crash:
https://api.cirrus-ci.com/v1/task/4922406553255936/logs/cores.log
https://api.cirrus-ci.com/v1/artifact/task/6611256413519872/crashlog/crashlog-pgbench.EXE_0750_2023-02-13_14-07-06-189.txt

Andres




Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

2023-02-13 Thread Andrey Borodin
Hi hackers!

I was asked to prototype a feature that helps to distinguish shared
buffer usage between index reads and heap reads. Practically it looks
like this:

# explain (analyze,verbose,buffers) select nextval('s');
   QUERY PLAN

 Result  (cost=0.00..0.01 rows=1 width=8) (actual time=1.159..1.160 rows=1 loops=1)
   Output: nextval('s'::regclass)
   Buffers: shared hit=7(relation 3, index 4) read=6(relation 1, index 4, sequence 1) dirtied=1
 Planning Time: 0.214 ms
 Execution Time: 1.238 ms
(5 rows)


The change is in these parts "(relation 3, index 4)" and "(relation 1,
index 4, sequence 1)". Probably, it should help DBAs to better
understand complex queries.
I think cluttering output with more text is OK as long as "verbose" is
requested.

But there are some caveats:
1. Some more increments on hot paths. We have to add this tiny toll to
every single buffer hit, but it will be seldom of any use.
2. It's difficult to measure writes caused by a query, and even dirtied buffers.
The patch adds "evictions" caused by the query, but those have little
practical use either.

All in all, I do not have an opinion on whether this feature is a good tradeoff.
What do you think? Does the feature look useful? Do we want a more
polished implementation?
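
To make caveat 1 above concrete, the per-hit accounting would be roughly
of this shape (a hypothetical sketch; the per-relkind fields are made up
and not necessarily what the attached patch does):

    /* on every shared-buffer hit, also bump a per-relkind counter */
    /* relkind of the buffer's relation is assumed to be available here */
    pgBufferUsage.shared_blks_hit++;
    switch (relkind)
    {
        case RELKIND_RELATION:
            pgBufferUsage.shared_blks_hit_rel++;    /* made-up field */
            break;
        case RELKIND_INDEX:
            pgBufferUsage.shared_blks_hit_idx++;    /* made-up field */
            break;
        case RELKIND_SEQUENCE:
            pgBufferUsage.shared_blks_hit_seq++;    /* made-up field */
            break;
        default:
            break;
    }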

Thanks!


Best regards, Andrey Borodin.


v1-0001-Split-EXPLAIN-ANALYZE-BUFFERS-by-RelKind.patch
Description: Binary data


Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-02-13 Thread Andres Freund
Hi,

On 2023-01-26 15:27:20 -0500, Reid Thompson wrote:
> Yes, just a rebase. There is still work to be done per earlier in the
> thread.

The tests recently started to fail:

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3867


> I do want to follow up and note re palloc/pfree vs malloc/free that the
> tracking code (0001-Add-tracking-...) is not tracking palloc/pfree but is
> explicitly tracking malloc/free. Not every palloc/pfree call executes the
> tracking code, only those where the path followed includes malloc() or
> free().  Routine palloc() calls fulfilled from the context's
> freelist/emptyblocks/freeblock/etc and pfree() calls not invoking free()
> avoid the tracking code.

Sure, but we create a lot of memory contexts, so that's not a whole lot of
comfort.


I marked this as waiting on author.

Greetings,

Andres Freund




Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 16:23:30 -0800, Andrey Borodin wrote:
> But there are some caveats:
> 1. Some more increments on hot paths. We have to add this tiny toll to
> every single buffer hit, but it will be seldom of any use.

Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's
already quite expensive...


> All in all I do not have an opinion if this feature is a good tradeoff.
> What do you think? Does the feature look useful? Do we want a more
> polished implementation?

Unless the above issues could be avoided, I don't think so.

Greetings,

Andres Freund




Re: Make EXPLAIN generate a generic plan for a parameterized query

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-05 18:24:03 +0100, Laurenz Albe wrote:
> Anyway, attached is v7 that tries to do it that way.

This consistently fails on CI:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3962

https://api.cirrus-ci.com/v1/artifact/task/5156723929907200/testrun/build/testrun/regress/regress/regression.diffs

Greetings,

Andres Freund




Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

2023-02-13 Thread Andrey Borodin
On Mon, Feb 13, 2023 at 4:29 PM Andres Freund  wrote:
> > 1. Some more increments on hot paths. We have to add this tiny toll to
> > every single buffer hit, but it will be seldom of any use.
>
> Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's
> already quite expensive...
>

I think the collection of instrumentation is done unconditionally.
We always do
pgBufferUsage.shared_blks_hit++;
when the buffer is found in shared_buffers.


Best regards, Andrey Borodin.




Re: Inconsistency in reporting checkpointer stats

2023-02-13 Thread Andres Freund


On 2022-12-21 17:14:12 +0530, Nitin Jadhav wrote:
> Kindly review and share your comments.

This doesn't pass the tests, because the regression tests weren't adjusted:

https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Peter Smith
On Tue, Feb 14, 2023 at 10:44 AM Andres Freund  wrote:
>
> Hi,
>
> On 2023-02-14 09:26:47 +1100, Peter Smith wrote:
> > I've observed suggested test cases get rejected as being overkill, or
> > because they would add precious seconds to the test execution. OTOH, I
> > felt such tests would still help gain some additional percentages from
> > the "code coverage" stats. The kind of tests I am thinking of don't
> > necessarily need a huge disk/CPU - but they just take longer to run
> > than anyone has wanted to burden the build-farm with.
>
> I'd say it depends on the test whether it's worth adding. Code coverage for its
> own sake isn't that useful; tests have to actually test something useful. And
> tests have costs beyond runtime, e.g. new tests tend to fail in some edge
> cases.
>
> E.g. just having tests hit more lines, without verifying that the behaviour is
> actually correct, only provides limited additional assurance.  It's also not
> very useful to add a very expensive test that provides only a very small
> additional amount of coverage.
>
> IOW, even if we add more test categories, it'll still be a tradeoff.
>
>
> > Sorry for the thread interruption -- but I thought this might be the
> > right place to ask: What is the recommended way to deal with such
> > tests intended primarily for better code coverage?
>
> I don't think that exists today.
>
> Do you have an example of the kind of test you're thinking of?

No, nothing specific in mind. But maybe like these:
- tests for causing obscure errors that would never otherwise be
reached without something deliberately designed to fail a certain way
- tests for trivial user errors apparently deemed not worth bloating
the regression tests with -- e.g. many errorConflictingDefElem not
being called [1].
- timing-related or error tests where some long (multi-second) delay
is a necessary part of the setup.

--
[1] 
https://coverage.postgresql.org/src/backend/commands/subscriptioncmds.c.gcov.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 16:36:25 -0800, Andrey Borodin wrote:
> On Mon, Feb 13, 2023 at 4:29 PM Andres Freund  wrote:
> > > 1. Some more increments on hot paths. We have to add this tiny toll to
> > > every single buffer hit, but it will be seldom of any use.
> >
> > Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. 
> > It's
> > already quite expensive...
> >
> 
> I think collection of instrumentation is done unconditionally.
> We always do that
> pgBufferUsage.shared_blks_hit++;
> when the buffer is in shared_buffers.

The problem I'm talking about is the increased overhead in InstrStopNode(),
due to BufferUsageAccumDiff() getting more expensive.

Andres




Re: Adding "large" to PG_TEST_EXTRA

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-14 11:38:06 +1100, Peter Smith wrote:
> No, nothing specific in mind. But maybe like these:
> - tests for causing obscure errors that would never otherwise be
> reached without something deliberately designed to fail a certain way

I think there are some cases around this that could be useful, but also a lot
that wouldn't be.


> - tests for trivial user errors apparently deemed not worth bloating
> the regression tests with -- e.g. many errorConflictingDefElem not
> being called [1].

I don't think it's worth adding tests for all of these. The likelihood of
catching a problem seems quite small.


> - timing-related or error tests where some long (multi-second) delay
> is a necessary part of the setup.

IME that's almost always a sign that the test wouldn't be stable anyway.

Greetings,

Andres Freund




Re: pg_walinspect memory leaks

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote:
> More concretely, it looks like GetWALRecordInfo() calls
> CStringGetTextDatum/cstring_to_text in a way that accumulates way too
> much memory in ExprContext.

Additionally, we leak two stringinfos for each record.


> This could be avoided by using a separate memory context that is reset
> periodically, or something else along the same lines.

Everything other than a per-row memory context that's reset each time seems
hard to manage in this case.

Somewhat funnily, GetWALRecordsInfo() then ends up being unnecessarily
diligent about cleaning up O(1) memory, after not caring about O(N) memory...

Greetings,

Andres Freund




Re: recovery modules

2023-02-13 Thread Nathan Bossart
On Mon, Feb 13, 2023 at 03:37:33PM -0800, Andres Freund wrote:
> On 2023-02-13 14:56:47 -0800, Nathan Bossart wrote:
>> On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote:
>> > +   basic_archive_context = data->context;
>> > +   if (CurrentMemoryContext == basic_archive_context)
>> > +   MemoryContextSwitchTo(TopMemoryContext);
>> > +
>> > +   if (MemoryContextIsValid(basic_archive_context))
>> > +   MemoryContextDelete(basic_archive_context);
>> > 
>> > This is a bit confusing, because it means that we enter in the
>> > shutdown callback with one context, but exit it under
>> > TopMemoryContext.  Are you sure that this will be OK when there could
>> > be multiple callbacks piled up with before_shmem_exit()?  shmem_exit()
>> > has nothing specific to memory contexts.
>> 
>> Well, we can't free the memory context while we are in it, so we have to
>> switch to another one.  I agree that this is a bit confusing, though.
> 
> Why would we be in that memory context? I'd just add an assert documenting
> we're not.
> 
> 
>> On second thought, I'm not sure it's important to make sure the state is
>> freed in the shutdown callback.  It's only called just before the archiver
>> process exits, so we're not really at risk of leaking anything.  I suppose
>> we might not always restart the archiver in this case, but I also don't
>> anticipate that behavior changing in the near future.  I think this
>> callback is more useful for things like shutting down background workers.
> 
> I think it's crucial. Otherwise we're just ossifying the design that there's
> just one archive module active at a time.
> 
> 
>> I went ahead and removed the shutdown callback from basic_archive and the
>> note about leaking from the documentation.
> 
> -1

Okay.  I've added it back in v12 with the suggested adjustment for the
memory context stuff.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 22e583fa5b41d820e3233d87ea1306ff81349574 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 9 Feb 2023 13:49:46 -0800
Subject: [PATCH v12 1/1] restructure archive modules API

---
 contrib/basic_archive/basic_archive.c | 85 ---
 doc/src/sgml/archive-modules.sgml | 35 ++--
 src/backend/Makefile  |  2 +-
 src/backend/archive/Makefile  | 18 
 src/backend/archive/meson.build   |  5 ++
 .../{postmaster => archive}/shell_archive.c   | 30 ---
 src/backend/meson.build   |  1 +
 src/backend/postmaster/Makefile   |  1 -
 src/backend/postmaster/meson.build|  1 -
 src/backend/postmaster/pgarch.c   | 27 +++---
 src/backend/utils/misc/guc_tables.c   |  1 +
 src/include/archive/archive_module.h  | 58 +
 src/include/archive/shell_archive.h   | 22 +
 src/include/postmaster/pgarch.h   | 39 -
 14 files changed, 239 insertions(+), 86 deletions(-)
 create mode 100644 src/backend/archive/Makefile
 create mode 100644 src/backend/archive/meson.build
 rename src/backend/{postmaster => archive}/shell_archive.c (78%)
 create mode 100644 src/include/archive/archive_module.h
 create mode 100644 src/include/archive/shell_archive.h

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 36b7a4814a..d7c227a10b 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -30,9 +30,9 @@
 #include 
 #include 
 
+#include "archive/archive_module.h"
 #include "common/int.h"
 #include "miscadmin.h"
-#include "postmaster/pgarch.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
@@ -40,14 +40,27 @@
 
 PG_MODULE_MAGIC;
 
+typedef struct BasicArchiveData
+{
+	MemoryContext context;
+} BasicArchiveData;
+
 static char *archive_directory = NULL;
-static MemoryContext basic_archive_context;
 
-static bool basic_archive_configured(void);
-static bool basic_archive_file(const char *file, const char *path);
+static void basic_archive_startup(ArchiveModuleState *state);
+static bool basic_archive_configured(ArchiveModuleState *state);
+static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path);
 static void basic_archive_file_internal(const char *file, const char *path);
 static bool check_archive_directory(char **newval, void **extra, GucSource source);
 static bool compare_files(const char *file1, const char *file2);
+static void basic_archive_shutdown(ArchiveModuleState *state);
+
+static const ArchiveModuleCallbacks basic_archive_callbacks = {
+	.startup_cb = basic_archive_startup,
+	.check_configured_cb = basic_archive_configured,
+	.archive_file_cb = basic_archive_file,
+	.shutdown_cb = basic_archive_shutdown
+};
 
 /*
  * _PG_init
@@ -67,10 +80,6 @@ _PG_init(void)
 			   check_archive_directory, NULL, NULL);
 
 	MarkGUCPrefixReserved("basic_archive");
-
-	basic_archive_context = AllocSet

Re: recovery modules

2023-02-13 Thread Nathan Bossart
On Mon, Feb 13, 2023 at 04:55:58PM -0800, Nathan Bossart wrote:
> Okay.  I've added it back in v12 with the suggested adjustment for the
> memory context stuff.

Sorry for the noise; cfbot alerted me to a missing #include, which I've
added in v13.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 41619a415a5a544a8a5b1e16ccbadac69a997027 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 9 Feb 2023 13:49:46 -0800
Subject: [PATCH v13 1/1] restructure archive modules API

---
 contrib/basic_archive/basic_archive.c | 85 ---
 doc/src/sgml/archive-modules.sgml | 35 ++--
 src/backend/Makefile  |  2 +-
 src/backend/archive/Makefile  | 18 
 src/backend/archive/meson.build   |  5 ++
 .../{postmaster => archive}/shell_archive.c   | 30 ---
 src/backend/meson.build   |  1 +
 src/backend/postmaster/Makefile   |  1 -
 src/backend/postmaster/meson.build|  1 -
 src/backend/postmaster/pgarch.c   | 27 +++---
 src/backend/utils/misc/guc_tables.c   |  1 +
 src/include/archive/archive_module.h  | 58 +
 src/include/archive/shell_archive.h   | 24 ++
 src/include/postmaster/pgarch.h   | 39 -
 14 files changed, 241 insertions(+), 86 deletions(-)
 create mode 100644 src/backend/archive/Makefile
 create mode 100644 src/backend/archive/meson.build
 rename src/backend/{postmaster => archive}/shell_archive.c (78%)
 create mode 100644 src/include/archive/archive_module.h
 create mode 100644 src/include/archive/shell_archive.h

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 36b7a4814a..d7c227a10b 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -30,9 +30,9 @@
 #include 
 #include 
 
+#include "archive/archive_module.h"
 #include "common/int.h"
 #include "miscadmin.h"
-#include "postmaster/pgarch.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
@@ -40,14 +40,27 @@
 
 PG_MODULE_MAGIC;
 
+typedef struct BasicArchiveData
+{
+	MemoryContext context;
+} BasicArchiveData;
+
 static char *archive_directory = NULL;
-static MemoryContext basic_archive_context;
 
-static bool basic_archive_configured(void);
-static bool basic_archive_file(const char *file, const char *path);
+static void basic_archive_startup(ArchiveModuleState *state);
+static bool basic_archive_configured(ArchiveModuleState *state);
+static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path);
 static void basic_archive_file_internal(const char *file, const char *path);
 static bool check_archive_directory(char **newval, void **extra, GucSource source);
 static bool compare_files(const char *file1, const char *file2);
+static void basic_archive_shutdown(ArchiveModuleState *state);
+
+static const ArchiveModuleCallbacks basic_archive_callbacks = {
+	.startup_cb = basic_archive_startup,
+	.check_configured_cb = basic_archive_configured,
+	.archive_file_cb = basic_archive_file,
+	.shutdown_cb = basic_archive_shutdown
+};
 
 /*
  * _PG_init
@@ -67,10 +80,6 @@ _PG_init(void)
 			   check_archive_directory, NULL, NULL);
 
 	MarkGUCPrefixReserved("basic_archive");
-
-	basic_archive_context = AllocSetContextCreate(TopMemoryContext,
-  "basic_archive",
-  ALLOCSET_DEFAULT_SIZES);
 }
 
 /*
@@ -78,11 +87,28 @@ _PG_init(void)
  *
  * Returns the module's archiving callbacks.
  */
+const ArchiveModuleCallbacks *
+_PG_archive_module_init(void)
+{
+	return &basic_archive_callbacks;
+}
+
+/*
+ * basic_archive_startup
+ *
+ * Creates the module's memory context.
+ */
 void
-_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+basic_archive_startup(ArchiveModuleState *state)
 {
-	cb->check_configured_cb = basic_archive_configured;
-	cb->archive_file_cb = basic_archive_file;
+	BasicArchiveData *data;
+
+	data = (BasicArchiveData *) MemoryContextAllocZero(TopMemoryContext,
+	   sizeof(BasicArchiveData));
+	data->context = AllocSetContextCreate(TopMemoryContext,
+		  "basic_archive",
+		  ALLOCSET_DEFAULT_SIZES);
+	state->private_data = (void *) data;
 }
 
 /*
@@ -133,7 +159,7 @@ check_archive_directory(char **newval, void **extra, GucSource source)
  * Checks that archive_directory is not blank.
  */
 static bool
-basic_archive_configured(void)
+basic_archive_configured(ArchiveModuleState *state)
 {
 	return archive_directory != NULL && archive_directory[0] != '\0';
 }
@@ -144,10 +170,12 @@ basic_archive_configured(void)
  * Archives one file.
  */
 static bool
-basic_archive_file(const char *file, const char *path)
+basic_archive_file(ArchiveModuleState *state, const char *file, const char *path)
 {
 	sigjmp_buf	local_sigjmp_buf;
 	MemoryContext oldcontext;
+	BasicArchiveData *data = (BasicArchiveData *) (state->private_data);
+	MemoryContext

Re: Exit walsender before confirming remote flush in logical replication

2023-02-13 Thread Kyotaro Horiguchi
At Mon, 13 Feb 2023 08:27:01 +0530, Amit Kapila  wrote 
in 
> On Mon, Feb 13, 2023 at 7:26 AM Kyotaro Horiguchi
>  wrote:
> >
> > IMHO I vaguely don't like that we lose a means to specify the default
> > behavior here. And I'm not sure we definitely don't need other than
> > flush and immediate for both physical and logical replication.
> >
> 
> If we can think of any use case that requires its extension then it
> makes sense to make it a non-boolean option but otherwise, let's keep
> things simple by having a boolean option.

What do you think about the need for explicitly specifying the
default?  I'm fine with specifying the default using a single word,
such as WAIT_FOR_REMOTE_FLUSH.

> > If it's
> > not the case, I don't object to make it a Boolean.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-12 09:31:36 -0800, Andres Freund wrote:
> Another thing: I think we should either avoid iterating over all the IOVs if
> we don't need them, or, even better, initialize the array as a constant, once.

I just tried to use pg_pwrite_zeros - and couldn't because it doesn't have an
offset parameter.  Huh, what led to the function being so constrained?

- Andres




Re: Move defaults toward ICU in 16?

2023-02-13 Thread Jeff Davis
On Thu, 2023-02-02 at 05:13 -0800, Jeff Davis wrote:
> As a project, do we want to nudge users toward ICU as the collation
> provider as the best practice going forward?

One consideration here is security. Any vulnerability in ICU collation
routines could easily become a vulnerability in Postgres.

I looked at these lists:

https://www.cvedetails.com/vulnerability-list/vendor_id-17477/Icu-project.html
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=icu
https://unicode-org.atlassian.net/issues/?jql=labels%20%3D%20%22security%22
https://unicode-org.atlassian.net/issues/?jql=labels%20%3D%20%22was_sensitive%22

Here are the recent CVEs:

CVE-2021-30535 https://unicode-org.atlassian.net/browse/ICU-21587
CVE-2020-21913 https://unicode-org.atlassian.net/browse/ICU-20850
CVE-2020-10531 https://unicode-org.atlassian.net/browse/ICU-20958

But there are quite a few JIRAs that look concerning that don't have a
CVE assigned:

2021 https://unicode-org.atlassian.net/browse/ICU-21537
2021 https://unicode-org.atlassian.net/browse/ICU-21597
2021 https://unicode-org.atlassian.net/browse/ICU-21676
2021 https://unicode-org.atlassian.net/browse/ICU-21749

Not sure which of these are exploitable, and if they are, why they
don't have a CVE. If someone else finds more issues, please let me
know.

The good news is that the Chrome/Chromium projects are actively finding
and reporting issues.

I didn't look for comparable information about glibc, but I would guess
that exploitable memory errors in setlocale/strcoll are very rare,
otherwise it would be a security disaster for many projects.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Exit walsender before confirming remote flush in logical replication

2023-02-13 Thread Andres Freund
On 2023-02-14 10:05:40 +0900, Kyotaro Horiguchi wrote:
> What do you think about the need for explicitly specifying the
> default?  I'm fine with specifying the default using a single word,
> such as WAIT_FOR_REMOTE_FLUSH.

We obviously shouldn't force the option to be present. Why would we want to
break existing clients unnecessarily?  Without it the behaviour should be
unchanged from today's.




Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl

2023-02-13 Thread Kyotaro Horiguchi
At Mon, 13 Feb 2023 12:18:07 +0900, Michael Paquier  wrote 
in 
> On Mon, Feb 13, 2023 at 11:27:58AM +0900, Kyotaro Horiguchi wrote:
> > I think currently the output of --describe-config can be used only for
> > consulting while editing a (possibly broken) config file.  Thus I
> > think it's no use showing GUC_DISALLOW_IN_FILE items there unless we
> > use help_config() for on-session use.
> > 
> > On the other hand, don't we need to remove the condition
> > GUC_NOT_IN_SAMPLE from displayStruct? I think that help_config()
> > should show a value if it is marked as !GUC_DISALLOW_IN_FILE even if
> > it is GUC_NOT_IN_SAMPLE.  I'm not sure whether there's any variable
> > that are marked that way, though.
> 
> As in marked with GUC_NOT_IN_SAMPLE but not GUC_DISALLOW_IN_FILE?
> There are quite a lot, developer GUCs being one (say
> ignore_invalid_pages).  We don't want to list them in the sample file
> so as common users don't play with them, still they make sense if
> listed in a file.

Ah, right.  I think I faintly had them in my mind.

> If you add a check meaning that GUC_DISALLOW_IN_FILE implies
> GUC_NOT_IN_SAMPLE, where one change would need to be applied to
> config_file as all the other GUC_DISALLOW_IN_FILE GUCs already do
> that, you could remove GUC_DISALLOW_IN_FILE.  However,
> GUC_NOT_IN_SAMPLE should remain so as not to expose options we don't
> want common users to know too much about.

Okay, I thought that "postgres --help-config" was a sort of developer
option, but your explanation above makes sense.

> The question about how much people rely on --describe-config these
> days is a good one, so perhaps there could be an argument in removing

Yeah, that's the reason I thought it was a developer option...

> GUC_NOT_IN_SAMPLE from the set.  TBH, I would be really surprised that
> anybody able to use a developer option writes an configuration file in
> an incorrect format and needs to use this option, though :)

Hmm.  I didn't directly link GUC_NOT_IN_SAMPLE to being a developer
option. But on second thought, it seems that it is. So, the current
code looks good to me now. Thanks for the explanation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Support logical replication of DDLs

2023-02-13 Thread Peter Smith
FYI - the latest patch cannot be applied.

See cfbot [1]

--
[1] http://cfbot.cputube.org/patch_42_3595.log

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: A new strategy for pull-up correlated ANY_SUBLINK

2023-02-13 Thread Andy Fan
Hi All:

   Sorry for the delay.  Once I saw Tom's reply on Nov 15, I quickly tried
his suggestion about "whether we need to restrict the optimization so
that it doesn't occur if the subquery contains outer references
falling outside available_rels."  I'm sure such a restriction
can fix the bad case Richard provided. But since even Tom said "I'm not at
all clear about ..", I'd like to prepare myself better for this
discussion, and that took some time. Then an internal urgent
project occupied my attention; the project is still in progress
now :(


> There has been no updates on this thread for some time, so this has
> been switched as Returned with Feedback. Feel free to open it in the
> next commitfest if you plan to continue on this.
>
>
Thank you, Vignesh C, for this.  I haven't given up yet; I can probably
come back to this in the following month.

-- 
Best Regards
Andy Fan


Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

2023-02-13 Thread Andrey Borodin
On Mon, Feb 13, 2023 at 4:39 PM Andres Freund  wrote:
>
> The problem I'm talking about is the increased overhead in InstrStopNode(),
> due to BufferUsageAccumDiff() getting more expensive.
>

Thanks, now I understand the problem better. According to godbolt.com,
my patch doubles the number of instructions in this function, unless
we compute only tables/indexes/matviews.
Anyway, regardless of the functionality of this particular patch,
BufferUsageAccumDiff() does not seem slow to me. It's just a
branchless bunch of SIMD instructions. Performance of this function
might matter only when it is called a gazillion times per second.


Best regards, Andrey Borodin.


for godbolt.cpp
Description: Binary data


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-13 Thread Kyotaro Horiguchi
At Mon, 13 Feb 2023 15:51:25 +0530, Amit Kapila  wrote 
in 
> I think we can introduce a new variable such as last_feedback_time in the
> LogicalRepWorker structure, and for the last_received, we can probably
> use last_lsn in MyLogicalRepWorker as that seems to be updated correctly.
> I think it would be good to avoid global variables.

MyLogicalRepWorker is a global variable :p, but it is far better than a
bare one.

By the way, we are trying to send the status messages regularly, but
as Andres pointed out, worker does not read nor reply to keepalive
messages from publisher while delaying. It is not possible as far as
we choke the stream at the subscriber end. It doesn't seem to be a
practical problem, but IMHO I think he's right in terms of adherence
to the wire protocol, which was also one of my own initial concern.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

2023-02-13 Thread Andres Freund
Hi,

On 2023-02-13 18:14:58 -0800, Andrey Borodin wrote:
> On Mon, Feb 13, 2023 at 4:39 PM Andres Freund  wrote:
> >
> > The problem I'm talking about is the increased overhead in InstrStopNode(),
> > due to BufferUsageAccumDiff() getting more expensive.
> >
> 
> Thanks, now I understand the problem better. According to godbolt.com
> my patch doubles the number of instructions in this function. Unless
> we compute only tables\indexes\matviews.
> Anyway, without regarding functionality of this particular patch,
> BufferUsageAccumDiff() does not seem slow to me. It's just a
> branchless bunch of simd instructions. Performance of this function
> might matter only when called gazillion times per second.

It is called gazillions of times per second when you do an EXPLAIN (ANALYZE,
BUFFERS). Every invocation of an executor node calls it.

Here's a quick pgbench, showing todays code, with -O3, without assertions:
   298.396   0  SELECT generate_series(1, 1000) OFFSET 1000;
   397.400   0  EXPLAIN (ANALYZE, TIMING OFF) SELECT generate_series(1, 1000) OFFSET 1000;
   717.238   0  EXPLAIN (ANALYZE, TIMING ON) SELECT generate_series(1, 1000) OFFSET 1000;
   419.736   0  EXPLAIN (ANALYZE, BUFFERS, TIMING OFF) SELECT generate_series(1, 1000) OFFSET 1000;
   761.575   0  EXPLAIN (ANALYZE, BUFFERS, TIMING ON) SELECT generate_series(1, 1000) OFFSET 1000;

The effect ends up a lot larger once you add in joins etc., because it adds
additional executor nodes that all have their instrumentation started/stopped.

Greetings,

Andres Freund




Re: Some revises in adding sorting path

2023-02-13 Thread David Rowley
I looked at the three patches and have some thoughts:

0001:

Does the newly added test have to be this complex?  I think it might
be better to just come up with the most simple test you can that uses
an incremental sort. I really can't think why the test requires a FOR
UPDATE, to test incremental sort, for example. The danger with making
a test more complex than it needs to be is that it frequently gets
broken by unrelated changes.  The complexity makes it harder for
people to understand the test's intentions and that increases the risk
that the test eventually does not test what it was originally meant to
test as the underlying code changes and the expected output is
updated.

0002:

I think the following existing comment does not seem to be true any longer:

> However, there's one more
> * possibility: it may make sense to sort the cheapest partial path
> * according to the required output order and then use Gather Merge.

You've removed the comment that talks about trying Incremental Sort ->
Gather Merge paths yet the code is still doing that, the two are just
more consolidated now, so perhaps you need to come up with a new
comment to explain what we're trying to achieve.

> * already (no need to deal with paths which have presorted
> * keys when incremental sort is disabled unless it's the
> * cheapest input path).

I think "input path" should be "partial path". (I maybe didn't get
that right in all places in 4a29eabd1).

0003:

Looking at gather_grouping_paths(), I see it calls
generate_useful_gather_paths() which generates a bunch of Gather Merge
paths after sorting the cheapest path and incrementally sorting any
partially sorted paths.  Then back in gather_grouping_paths(), we go
and create Gather Merge paths again, but this time according to the
group_pathkeys instead of the query_pathkeys. I know you're not really
changing anything here, but as I'm struggling to understand why
exactly we're creating two sets of Gather Merge paths, it makes it a
bit scary to consider changing anything in this area. I've not really
found any comments that can explain to me sufficiently well enough so
that I understand why this needs to be done.  Do you know?

David




RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-13 Thread wangw.f...@fujitsu.com
On Thur, Feb 7, 2023 15:29 PM I wrote:
> On Wed, Feb 1, 2023 20:07 PM Melih Mutlu  wrote:
> > Thanks for pointing out this review. I somehow skipped that, sorry.
> >
> > Please see attached patches.
> 
> Thanks for updating the patch set.
> Here are some comments.

Hi, here are some more comments on patch v7-0001*:

1. The new comments atop the function CreateDecodingContext
+ * need_full_snapshot
+ * if true, create a snapshot able to read all tables,
+ * otherwise do not create any snapshot.

I think if 'need_full_snapshot' is false, it means we will create a snapshot
that can read only catalogs. (see SnapBuild->building_full_snapshot)

===

2. These are two questions I'm not sure about.
2a.
Because pg-doc has the following description in [1]: (option "SNAPSHOT 'use'")
```
'use' will use the snapshot for the current transaction executing the command.
This option must be used in a transaction, and CREATE_REPLICATION_SLOT must be
the first command run in that transaction.
```
So I think in the function CreateDecodingContext, when "need_full_snapshot" is
true, we seem to need the following check, just like in the function
CreateInitDecodingContext:
```
if (IsTransactionState() &&
GetTopTransactionIdIfAny() != InvalidTransactionId)
ereport(ERROR,
(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
 errmsg("cannot create logical replication slot 
in transaction that has performed writes")));
```

2b.
It seems that we also need to invoke the function
CheckLogicalDecodingRequirements in the new function CreateReplicationSnapshot,
just like the function CreateReplicationSlot and the function
StartLogicalReplication.

Is there any reason not to do these two checks? Please let me know if I missed
something.
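
Putting both points together, a rough sketch of the checks I have in mind
for the new function would be (illustration only; the function and
parameter names are taken loosely from the patch):

```
static void
CreateReplicationSnapshot(CreateReplicationSnapshotCmd *cmd)	/* assumed node type */
{
	CheckLogicalDecodingRequirements();

	if (IsTransactionState() &&
		GetTopTransactionIdIfAny() != InvalidTransactionId)
		ereport(ERROR,
				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
				 errmsg("cannot create logical replication snapshot in a transaction that has performed writes")));

	/* ... create the decoding context and snapshot as the patch does ... */
}
```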

===

3. The invocation of startup_cb_wrapper in the function CreateDecodingContext.
I think we should change the third input parameter to true when invoking the function
startup_cb_wrapper for CREATE_REPLICATION_SNAPSHOT. BTW, after applying patch
v10-0002*, these settings will be inconsistent when sync workers use
"CREATE_REPLICATION_SLOT" and "CREATE_REPLICATION_SNAPSHOT" to take snapshots.
This input parameter (true) will let us disable streaming and two-phase
transactions in function pgoutput_startup. See the last paragraph of the commit
message for 4648243 for more details.

[1] - https://www.postgresql.org/docs/devel/protocol-replication.html

Regards,
Wang Wei

