Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-25 Thread Katsuragi Yuta

On 2023-01-23 14:40, Hayato Kuroda (Fujitsu) wrote:

Dear Ted,

Thanks for reviewing! PSA new version.

For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch,
`pqConnCheck_internal` only has one caller, which is quite short.

Can pqConnCheck_internal and PQConnCheck be merged into one function?


I divided the function for future extensibility. Currently it works
only on the Linux platform, but that limitation should be removed in the
future, at which point the internal function will become longer.
Therefore I want to keep this structure.


+int
+PQCanConnCheck(void)

It seems the return value should be of bool type.


I slightly changed the return value to behave like true/false. But IIUC libpq
functions cannot be declared with the "bool" datatype. E.g.,
PQconnectionNeedsPassword() returns true/false, but the function is
declared as int.
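For reference, existing libpq exports already follow this int-as-boolean
convention; from libpq-fe.h:

    extern int  PQconnectionNeedsPassword(const PGconn *conn);   /* returns 1 or 0 */
    extern int  PQconnectionUsedPassword(const PGconn *conn);    /* returns 1 or 0 */

so declaring the new functions as int keeps them consistent with that style.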

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Thank you for updating the patch!

+/* Check whether the postgres server is still alive or not */
+extern int PQConnCheck(PGconn *conn);
+extern int PQCanConnCheck(void);

Aren't these PQconnCheck and PQcanConnCheck? I think the first letter
following 'PQ' should be lower case.

regards.

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Operation log for major operations

2023-01-25 Thread Dmitry Koval

Hi!

Maybe another discussion thread can be created for the consolidation of 
XX_file_exists functions.


Usually the XX_file_exists functions are simple. They contain a single call to
stat() or open() and error processing specific to this call.


A unified function would likely be too complex to cover all the uses of the
XX_file_exists functions.


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-25 Thread Kyotaro Horiguchi
At Wed, 25 Jan 2023 12:30:19 +0530, Amit Kapila  wrote 
in 
> On Wed, Jan 25, 2023 at 11:57 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Tue, 24 Jan 2023 12:19:04 +, "Takamichi Osumi (Fujitsu)" 
> >  wrote in
> > > Attached the patch v20 that has incorporated all comments so far.
> >
> ...
> >
> >
> > +  in which case no additional wait is necessary. If the system 
> > clocks
> > +  on publisher and subscriber are not synchronized, this may lead 
> > to
> > +  apply changes earlier than expected, but this is not a major 
> > issue
> > +  because this parameter is typically much larger than the time
> > +  deviations between servers. Note that if this parameter is set 
> > to a
> >
> > This doesn't seem to fit our documentation. It is not our business
> > whether a certain amount of deviation is critical or not. How about
> > something like the following?
> >
> 
> But we have a similar description for 'recovery_min_apply_delay' [1].
> See "...If the system clocks on primary and standby are not
> synchronized, this may lead to recovery applying records earlier than
> expected; but that is not a major issue because useful settings of
> this parameter are much larger than typical time deviations between
> servers."

Mmmm. I thought that we might be able to gather the description
(including other common descriptions, if any) in one place, but I didn't find an
appropriate place...

Okay. I agree to the current description. Thanks for the kind
explanation.
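
For readers following the thread: the parameter being documented is a
subscription option from the patch under discussion. A minimal usage sketch,
with illustrative connection details and delay value:

    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION mypub
        WITH (min_apply_delay = '5min');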

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: document the need to analyze partitioned tables

2023-01-25 Thread David Rowley
On Wed, 25 Jan 2023 at 19:46, Laurenz Albe  wrote:
> Did you see Justin's wording suggestion in
> https://postgr.es/m/20230118174919.GA9837%40telsasoft.com ?
> He didn't attach it as a patch, so you may have missed it.
> I was pretty happy with that.

I didn't pay too much attention as I tend to apply patches to obtain
the full context of the change.  Manually trying to apply a patch from
an email is not something I like to do.

> I think your first sentence is a bit clumsy and might be streamlined to
>
>   Partitioned tables do not directly store tuples and consequently do not
>   require autovacuum to perform any VACUUM operations.

That seems better than what I had.

> Also, I am a little bit unhappy about
>
> 1. Your paragraph states that partitioned table need no autovacuum,
>but doesn't state unmistakably that they will never be treated
>by autovacuum.

hmm. I assume the reader realises from the text that lack of any
tuples means VACUUM is not required.  The remaining part of what
autovacuum does not do is explained when the text goes on to say that
ANALYZE operations are also not performed on partitioned tables. I'm
not sure what is left that's mistakable there.
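
To make the practical consequence concrete (the table name is purely
illustrative), the partitioned parent's statistics are only gathered when it
is analyzed explicitly:

    ANALYZE measurement;   -- collects statistics for the partitioned table itself;
                           -- autovacuum still analyzes the leaf partitions on its own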

> 2. You make a distinction between table partitions and "normal tables",
>    but really there is no distinction.

We may have different mental models here. This relates to the part
that I wasn't keen on in your patch, i.e:

+The partitions of a partitioned table are normal tables and get processed
+by autovacuum

While I agree that the majority of partitions are likely to be
relkind='r', which you might ordinarily consider a "normal table", you
just might change your mind when you try to INSERT or UPDATE records
that would violate the partition constraint. Some partitions might
also be themselves partitioned tables and others might be foreign
tables. That does not really matter much when it comes to what
autovacuum does or does not do, but I'm not really keen to imply in
our documents that partitions are "normal tables".

David




Re: old_snapshot_threshold bottleneck on replica

2023-01-25 Thread Maxim Orlov
On Tue, 24 Jan 2023 at 18:46, Robert Haas  wrote:

>
> (1) that mutex also protects something else and the existing comment
> is wrong, or
>
> (2) the mutex should have been removed but the patch neglected to do so, or
>
> (3) the mutex is still needed for some reason, in which case either
> (3a) the patch isn't actually safe or (3b) the patch needs comments to
> explain what the new synchronization model is.
>
Yes, you're absolutely right, and my first intention was to remove this
mutex completely. But in TransactionIdLimitedForOldSnapshots() these
variables are used together, so I'm not sure it is completely safe to
remove the mutex. Actually, removing the mutex and switching to atomics
was my first choice. I ran all the tests and no problems were found, but
at that time I chose to be more conservative. Anyway, here is the new variant.
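
To illustrate the kind of change I mean, a rough sketch against the existing
oldSnapshotControl fields (not the exact patch):

    /* current code: spinlock-protected read */
    SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
    threshold_ts = oldSnapshotControl->threshold_timestamp;
    SpinLockRelease(&oldSnapshotControl->mutex_threshold);

    /* sketch: assuming threshold_timestamp is converted to pg_atomic_uint64 */
    threshold_ts = (TimestampTz)
        pg_atomic_read_u64(&oldSnapshotControl->threshold_timestamp);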

-- 
Best regards,
Maxim Orlov.


v2-0001-Use-atomic-old_snapshot_threshold.patch
Description: Binary data


Re: plpython vs _POSIX_C_SOURCE

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-24 23:37:44 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Patches attached.
> 
> +1 for 0001.

Cool, will push tomorrow.


> I'm still nervous about 0002.  However, maybe the cases that we had trouble
> with are legacy issues that nobody cares about anymore in 2023.  We can
> always look for another answer if we get complaints, I guess.

Yea, it's a patch that should be easily revertable, if it comes to that. I'll
add a note to the commit message about potentially needing to do that if
there's fallout that isn't easily addressed.

Greetings,

Andres Freund




Re: to_hex() for negative inputs

2023-01-25 Thread Aleksander Alekseev
Hi Dean,

> I don't see how a couple of extra arguments will expand to hundreds.

Maybe I was exaggerating, but the point is that adding extra flags for
every possible scenario is a disadvantageous approach in general.
There is no need to increase the code base, the amount of test cases
and the amount of documentation every time someone has an idea "in
rare cases I also may want to do A or B, let's add a flag for this".

> Which is broken for INT_MIN:
>
> select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
> from ( values (-2147483648) ) as s(x);
>
> ERROR:  integer out of range

I'm afraid the behavior of something like to_hex(X, with_sign => true)
is going to be exactly the same. There is no safe and consistent way
to calculate abs(INT_MIN).

-- 
Best regards,
Aleksander Alekseev




Re: heapgettup() with NoMovementScanDirection unused in core?

2023-01-25 Thread David Rowley
On Wed, 25 Jan 2023 at 13:55, Melanie Plageman
 wrote:
> David Rowley and I were discussing how to test the
> NoMovementScanDirection case for heapgettup() and heapgettup_pagemode()
> in [1] (since there is not currently coverage). We are actually
> wondering if it is dead code (in core).

Yeah, so I see nothing in core that can cause heapgettup() or
heapgettup_pagemode() to be called with NoMovementScanDirection.  I
imagine one possible way to hit it might be in an extension where
someone's written their own ExecutorRun_hook that does not have the
same NoMovementScanDirection check that standard_ExecutorRun() has.

So far my thoughts are that we should just rip it out and see if
anyone complains. If they complain loudly enough, then it's easy
enough to put it back without any compatibility issues. However, if
it's for the ExecutorRun_hook reason, then they'd likely be better off
adding the same NoMovementScanDirection check as we have in
standard_ExecutorRun().

I'm just not keen on refactoring the code without the means to test
that the new code actually works.

Does anyone know of any reason why we shouldn't ditch the nomovement
code in heapgettup/heapgettup_pagemode?

Maybe we'd also want to Assert that the direction is either forwards
or backwards in table_scan_getnextslot and
table_scan_getnextslot_tidrange. (I see heap_getnextslot_tidrange()
does not have any handling for NoMovementScanDirection, so if this is
not dead code, that probably needs to be fixed)
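
Something as simple as the following sketch would probably do in those
tableam wrappers:

    Assert(ScanDirectionIsForward(direction) ||
           ScanDirectionIsBackward(direction));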

David




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-24 20:59:04 -0800, Andres Freund wrote:
> I find this to be awkward code. The booleans are kinda pointless, and the
> tableagevac case is hard to follow because trigger is set elsewhere.
>
> I can give reformulating it a go. Need to make some food first.

Here's a draft of what I am thinking of. Not perfect yet, but I think it looks
better.

The pg_stat_activity output looks like this right now:

autovacuum due to table XID age: VACUUM public.pgbench_accounts (to prevent 
wraparound)

Why don't we drop the "(to prevent wraparound)" now?

And I still think removing the "table " bit would be an improvement.

Greetings,

Andres Freund
diff --git i/src/include/commands/vacuum.h w/src/include/commands/vacuum.h
index 4450003b4a8..13f70a1f654 100644
--- i/src/include/commands/vacuum.h
+++ w/src/include/commands/vacuum.h
@@ -237,7 +237,8 @@ typedef struct VacuumParams
 			 * use default */
 	int			multixact_freeze_table_age; /* multixact age at which to scan
 			 * whole table */
-	bool		is_wraparound;	/* force a for-wraparound vacuum */
+	bool		is_wraparound;	/* antiwraparound autovacuum? */
+	AutoVacType trigger;		/* autovacuum trigger condition, if any */
 	int			log_min_duration;	/* minimum execution threshold in ms at
 	 * which autovacuum is logged, -1 to use
 	 * default */
@@ -328,6 +329,7 @@ extern void vacuum(List *relations, VacuumParams *params,
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
 			 int *nindexes, Relation **Irel);
 extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
+extern const char *vac_autovacuum_trigger_msg(AutoVacType trigger);
 extern double vac_estimate_reltuples(Relation relation,
 	 BlockNumber total_pages,
 	 BlockNumber scanned_pages,
diff --git i/src/backend/access/heap/vacuumlazy.c w/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f38..8a64dce6180 100644
--- i/src/backend/access/heap/vacuumlazy.c
+++ w/src/backend/access/heap/vacuumlazy.c
@@ -639,6 +639,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
  * implies aggressive.  Produce distinct output for the corner
  * case all the same, just in case.
  */
+Assert(params->trigger == AUTOVACUUM_TABLE_XID_AGE ||
+	   params->trigger == AUTOVACUUM_TABLE_MXID_AGE);
 if (vacrel->aggressive)
 	msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
 else
@@ -656,6 +658,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			 vacrel->relnamespace,
 			 vacrel->relname,
 			 vacrel->num_index_scans);
+			if (!verbose)
+appendStringInfo(&buf, _("triggered by: %s\n"),
+ vac_autovacuum_trigger_msg(params->trigger));
 			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
 			 vacrel->removed_pages,
 			 new_rel_pages,
diff --git i/src/backend/commands/vacuum.c w/src/backend/commands/vacuum.c
index 7b1a4b127eb..18278acb557 100644
--- i/src/backend/commands/vacuum.c
+++ w/src/backend/commands/vacuum.c
@@ -273,8 +273,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		params.multixact_freeze_table_age = -1;
 	}
 
-	/* user-invoked vacuum is never "for wraparound" */
+	/* user-invoked vacuum never uses these autovacuum-only flags */
 	params.is_wraparound = false;
+	params.trigger = AUTOVACUUM_NONE;
 
 	/* user-invoked vacuum uses VACOPT_VERBOSE instead of log_min_duration */
 	params.log_min_duration = -1;
@@ -1874,7 +1875,11 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, bool skip_privs)
 		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 		MyProc->statusFlags |= PROC_IN_VACUUM;
 		if (params->is_wraparound)
+		{
+			Assert(params->trigger == AUTOVACUUM_TABLE_XID_AGE ||
+   params->trigger == AUTOVACUUM_TABLE_MXID_AGE);
 			MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
+		}
 		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
 		LWLockRelease(ProcArrayLock);
 	}
@@ -2176,6 +2181,30 @@ vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode)
 	pfree(Irel);
 }
 
+/*
+ * Return translatable string describing autovacuum trigger threshold type
+ */
+const char *
+vac_autovacuum_trigger_msg(AutoVacType trigger)
+{
+	switch (trigger)
+	{
+		case AUTOVACUUM_NONE:
+			return _("none");
+		case AUTOVACUUM_TABLE_XID_AGE:
+			return _("table XID age");
+		case AUTOVACUUM_TABLE_MXID_AGE:
+			return _("table MXID age");
+		case AUTOVACUUM_DEAD_TUPLES:
+			return _("dead tuples");
+		case AUTOVACUUM_INSERTED_TUPLES:
+			return _("inserted tuples");
+	}
+
+	Assert(false);/* cannot be reached */
+	return NULL;
+}
+
 /*
  * vacuum_delay_point --- check for interrupts and cost-based delay.
  *
diff --git i/src/backend/postmaster/autovacuum.c w/src/backend/postmaster/autovacuum.c
index f5ea381c53e..18d150c975b 100644
--- i/src/backend/postmaster/autovacuum.c
+++ w/src/backend/postmaster/autovacuum.c

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-25 Thread Dean Rasheed
On Wed, 18 Jan 2023 at 15:25, Justin Pryzby  wrote:
>
> TBH, I think the best approach is what I did in:
> 0001-report-top-parent-progress-for-CREATE-INDEX.txt
>
> That's a minimal patch, ideal for backpatching.
>
> ..which defines/clarifies that the progress reporting is only for
> *direct* children.  That avoids the need to change any data structures,
> and it's what was probably intended by the original patch, which doesn't
> seem to have considered intermediate partitioned tables.
>
> I think it'd be fine to re-define that in some future release, to allow
> showing indirect children (probably only "leaves", and not intermediate
> partitioned tables).  Or "total_bytes" or other global progress.
>

Hmm. My expectation as a user is that partitions_total includes both
direct and indirect (leaf) child partitions, that it is set just once
at the start of the process, and that partitions_done increases from
zero to partitions_total as the index-build proceeds. I think that
should be achievable with a minimally invasive patch that doesn't
change any data structures.

I agree with all the review comments Tomas posted. In particular, this
shouldn't need any changes to IndexStmt. I think the best approach
would be to just add a new function to backend_progress.c that offsets
a specified progress parameter by a specified amount, so that you can
just increment partitions_done by one or more, at the appropriate
points.
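
Something along these lines, as a sketch of the kind of helper I mean (the
name and details are only illustrative, mirroring pgstat_progress_update_param):

    /* Increment (rather than overwrite) one progress counter of this backend. */
    void
    pgstat_progress_incr_param(int index, int64 incr)
    {
        volatile PgBackendStatus *beentry = MyBEEntry;

        if (!beentry || !pgstat_track_activities)
            return;

        PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
        beentry->st_progress_param[index] += incr;
        PGSTAT_END_WRITE_ACTIVITY(beentry);
    }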

Regards,
Dean




Re: to_hex() for negative inputs

2023-01-25 Thread Dean Rasheed
On Wed, 25 Jan 2023 at 09:02, Aleksander Alekseev
 wrote:
>
> > I don't see how a couple of extra arguments will expand to hundreds.
>
> Maybe I was exaggerating, but the point is that adding extra flags for
> every possible scenario is a disadvantageous approach in general.
> There is no need to increase the code base, the amount of test cases
> and the amount of documentation every time someone has an idea "in
> rare cases I also may want to do A or B, let's add a flag for this".
>

OK, but the point was that we've just added support for accepting hex
inputs in a particular format, so I think it would be useful if
to_hex() could produce outputs compatible with that.

> > Which is broken for INT_MIN:
> >
> > select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x))
> > from ( values (-2147483648) ) as s(x);
> >
> > ERROR:  integer out of range
>
> I'm afraid the behavior of something like to_hex(X, with_sign => true)
> is going to be exactly the same. There is no safe and consistent way
> to calculate abs(INT_MIN).
>

Of course there is. This is easy to code in C using unsigned ints,
without resorting to abs() (yes, I'm aware that abs() is undefined for
INT_MIN).
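
For example, a minimal standalone C sketch of the idea (not the patch itself):

    #include <stdio.h>
    #include <stdint.h>

    /*
     * Format a signed 32-bit value in hex with a leading '-', avoiding abs():
     * the negation is done in unsigned arithmetic, which is well-defined
     * even for INT32_MIN.
     */
    static void
    to_hex_signed(int32_t x, char *buf, size_t len)
    {
        uint32_t    ux = (uint32_t) x;
        const char *sign = "";

        if (x < 0)
        {
            ux = -ux;           /* modular negation; INT32_MIN stays 0x80000000 */
            sign = "-";
        }
        snprintf(buf, len, "%s%x", sign, (unsigned int) ux);  /* INT32_MIN -> "-80000000" */
    }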

Regards,
Dean




Re: to_hex() for negative inputs

2023-01-25 Thread Aleksander Alekseev
Hi Dean,

> Of course there is. This is easy to code in C using unsigned ints,
> without resorting to abs() (yes, I'm aware that abs() is undefined for
> INT_MIN).

So in your opinion what is the expected result of to_hex(INT_MIN,
with_sign => true)?

-- 
Best regards,
Aleksander Alekseev




RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-25 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reading the patch! PSA new version.

> Thank you for updating the patch!
> 
> +/* Check whether the postgres server is still alive or not */
> +extern int PQConnCheck(PGconn *conn);
> +extern int PQCanConnCheck(void);
> 
> Aren't these PQconnCheck and PQcanConnCheck? I think the first letter
> following 'PQ' should be lower case.

I cannot find any rules about it, but seems right. Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v27-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch
Description:  v27-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch


v27-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v27-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v27-0003-add-test.patch
Description: v27-0003-add-test.patch


Re: to_hex() for negative inputs

2023-01-25 Thread Dean Rasheed
On Wed, 25 Jan 2023 at 10:57, Aleksander Alekseev
 wrote:
>
> > Of course there is. This is easy to code in C using unsigned ints,
> > without resorting to abs() (yes, I'm aware that abs() is undefined for
> > INT_MIN).
>
> So in your opinion what is the expected result of to_hex(INT_MIN,
> with_sign => true)?
>

"-8000" or "-0x8000", depending on whether the prefix is
requested. The latter is legal input for an integer, which is the
point (to allow round-tripping):

SELECT int '-0x8000';
int4
-
 -2147483648
(1 row)

SELECT pg_typeof(-0x8000);
 pg_typeof
---
 integer
(1 row)

Regards,
Dean




Re: Logical replication timeout problem

2023-01-25 Thread Amit Kapila
On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com
 wrote:
>
> Attach the new patch.
>

I think the patch missed handling the case of non-transactional
messages, which was previously handled. I have tried to address
that in the attached. Is there a reason it shouldn't be handled?
Apart from that, I changed a few comments. If my understanding is
correct, then we need to change the callback name update_progress_txn
as well, because now it needs to handle both transactional and
non-transactional changes. How about update_progress_write? We
accordingly need to change the comments for the callback.

Additionally, I think we should have a test case to show we don't time
out because of not processing non-transactional messages. See
pgoutput_message for cases where it doesn't process the message.

-- 
With Regards,
Amit Kapila.


v7-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description: Binary data


Re: Improve GetConfigOptionValues function

2023-01-25 Thread Nitin Jadhav
> After looking at this, it seemed to me that the factorization
> wasn't quite right after all: specifically, the new function
> could be used in several more places if it confines itself to
> being a privilege check and doesn't consider GUC_NO_SHOW_ALL.
> So more like the attached.
>
> You could argue that the factorization is illusory since each
> of these additional call sites has an error message that knows
> exactly what the conditions are to succeed.  But if we want to
> go that direction then I'd be inclined to forget about the
> permissions-check function altogether and just test the
> conditions in-line everywhere.

I am OK with the above changes. I had thought of modifying the
ConfigOptionIsVisible function to take an extra argument, say
validate_superuser_only. If this argument is true then it would only
consider the GUC_SUPERUSER_ONLY check and return based on that; otherwise
it would consider both GUC_SUPERUSER_ONLY and GUC_NO_SHOW_ALL and return
based on that. I understand that this just complicates the function
and has other disadvantages. Instead of testing the conditions
in-line, I prefer the use of the function as done in the v4 patch, as it
reduces the code size.

Thanks & Regards,
Nitin Jadhav

On Mon, Jan 23, 2023 at 9:51 PM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > LGTM. I've marked it RfC.
>
> After looking at this, it seemed to me that the factorization
> wasn't quite right after all: specifically, the new function
> could be used in several more places if it confines itself to
> being a privilege check and doesn't consider GUC_NO_SHOW_ALL.
> So more like the attached.
>
> You could argue that the factorization is illusory since each
> of these additional call sites has an error message that knows
> exactly what the conditions are to succeed.  But if we want to
> go that direction then I'd be inclined to forget about the
> permissions-check function altogether and just test the
> conditions in-line everywhere.
>
> Also, I intentionally dropped the GUC_NO_SHOW_ALL check in
> get_explain_guc_options, because it seems redundant given
> the preceding GUC_EXPLAIN check.  It's unlikely we'd ever have
> a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ...
> but if we did, shouldn't the former take precedence here anyway?
>
> regards, tom lane
>




Re: Improve GetConfigOptionValues function

2023-01-25 Thread Nitin Jadhav
>>> Also, I intentionally dropped the GUC_NO_SHOW_ALL check in
>>> get_explain_guc_options, because it seems redundant given
>>> the preceding GUC_EXPLAIN check.  It's unlikely we'd ever have
>>> a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ...
>>> but if we did, shouldn't the former take precedence here anyway?
>
>> You're right, but there's nothing that prevents users writing GUCs
>> with GUC_EXPLAIN and GUC_NO_SHOW_ALL.
>
> "Users"?  You do realize those flags are only settable by C code,
> right?  Moreover, you haven't explained why it would be good that
> you can't get at the behavior that a GUC is both shown in EXPLAIN
> and not shown in SHOW ALL.  If you want "not shown by either",
> that's already accessible by setting only the GUC_NO_SHOW_ALL
> flag.  So I'd almost argue this is a bug fix, though I concede
> it's a bit hard to imagine why somebody would want that choice.
> Still, if we have two independent flags they should produce four
> behaviors, not just three.

I agree that the developer can use both GUC_NO_SHOW_ALL and
GUC_EXPLAIN, knowingly or unknowingly, for a single GUC. If used by
mistake, then according to the existing code (without the patch),
GUC_NO_SHOW_ALL takes precedence regardless of whether it is marked first or
last in the code. I am more comfortable with this behaviour, as I feel it
is safer than exposing information which the developer might not
have intended to expose.

Thanks & Regards,
Nitin Jadhav

On Tue, Jan 24, 2023 at 8:43 PM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > On Mon, Jan 23, 2023 at 9:51 PM Tom Lane  wrote:
> >> Also, I intentionally dropped the GUC_NO_SHOW_ALL check in
> >> get_explain_guc_options, because it seems redundant given
> >> the preceding GUC_EXPLAIN check.  It's unlikely we'd ever have
> >> a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ...
> >> but if we did, shouldn't the former take precedence here anyway?
>
> > You're right, but there's nothing that prevents users writing GUCs
> > with GUC_EXPLAIN and GUC_NO_SHOW_ALL.
>
> "Users"?  You do realize those flags are only settable by C code,
> right?  Moreover, you haven't explained why it would be good that
> you can't get at the behavior that a GUC is both shown in EXPLAIN
> and not shown in SHOW ALL.  If you want "not shown by either",
> that's already accessible by setting only the GUC_NO_SHOW_ALL
> flag.  So I'd almost argue this is a bug fix, though I concede
> it's a bit hard to imagine why somebody would want that choice.
> Still, if we have two independent flags they should produce four
> behaviors, not just three.
>
> regards, tom lane




Re: drop postmaster symlink

2023-01-25 Thread Devrim Gündüz
Hi,

On Wed, 2023-01-25 at 08:54 +0100, Peter Eisentraut wrote:
> 
> Apart from your concerns, it appears there is consensus for making
> this  change.  The RPM packaging scripts can obviously be fixed
> easily for  this.  Do you have an objection to making this change?

I'm inclined to create the symlink in the RPMs to make users' lives
(and my life) easier. So, no objection from here.

Regards,

-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-25 Thread shveta malik
On Mon, Jan 23, 2023 at 6:30 PM Melih Mutlu  wrote:
>
> Hi,
>
> Thanks for your review.
> Attached updated versions of the patches.
>

Hello,
I am still in the process of reviewing the patch; before that, I tried
to run the test below:

--publisher
create table tab1(id int , name varchar);
create table tab2(id int primary key , name varchar);
create table tab3(id int primary key , name varchar);
Insert into tab1 values(10, 'a');
Insert into tab1 values(20, 'b');
Insert into tab1 values(30, 'c');

Insert into tab2 values(10, 'a');
Insert into tab2 values(20, 'b');
Insert into tab2 values(30, 'c');

Insert into tab3 values(10, 'a');
Insert into tab3 values(20, 'b');
Insert into tab3 values(30, 'c');

create publication mypub for table tab1, tab2, tab3;

--subscriber
create table tab1(id int , name varchar);
create table tab2(id int primary key , name varchar);
create table tab3(id int primary key , name varchar);
create subscription mysub connection 'dbname=postgres host=localhost
user=shveta port=5432' publication mypub;

--I see initial data copied, but new catalog columns srrelslotname
and srreloriginname are not updated:
postgres=# select sublastusedid from pg_subscription;
 sublastusedid
---------------
             2

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn  | srrelslotname | srreloriginname
---------+---------+------------+-----------+---------------+-----------------
   16409 |   16384 | r          | 0/15219E0 |               |
   16409 |   16389 | r          | 0/15219E0 |               |
   16409 |   16396 | r          | 0/15219E0 |               |

When are these supposed to be updated? I thought the created slot name
would be stored here. Am I missing something?

Also the v8 patch does not apply on HEAD, giving merge conflicts.

thanks
Shveta




Re: Fix to enum hashing for dump and restore

2023-01-25 Thread Andrew
Those are excellent points.  
We will investigate adjusting pg_dump behavior,
as this is primarily a dump+restore issue.

Thank you!

-Andrew J Repp (VMware)

On Tue, Jan 24, 2023, at 9:56 PM, Tom Lane wrote:
> Andrew  writes:
> > I have discovered a bug in one usage of enums. If a table with hash
> > partitions uses an enum as a partitioning key, it can no longer be
> > backed up and restored correctly. This is because enums are represented
> > simply as oids, and the hash function for enums hashes that oid to
> > determine partition distribution. Given the way oids are assigned, any
> > dump+restore of a database with such a table may fail with the error
> > "ERROR: new row for relation "TABLENAME" violates partition constraint".
> 
> Ugh, that was not well thought out :-(.  I suppose this isn't a problem
> for pg_upgrade, which should preserve the enum value OIDs, but an
> ordinary dump/restore will indeed hit this.
> 
> > I have written a patch to fix this bug (attached), by instead having the
> > hashenum functions look up the enumsortorder ID of the value being
> > hashed. These are deterministic across databases, and so allow for
> > stable dump and restore.
> 
> Unfortunately, I'm not sure those are as deterministic as all that.
> They are floats, so there's a question of roundoff error, not to
> mention cross-platform variations in what a float looks like.  (At the
> absolute minimum, I think we'd have to change your patch to force
> consistent byte ordering of the floats.)  Actually though, roundoff
> error wouldn't be a problem for the normal exact-integer values of
> enumsortorder.  Where it could come into play is with the fractional
> values used after you insert a value into the existing sort order.
> And then the whole idea fails, because a dump/restore won't duplicate
> those fractional values.
> 
> Another problem with this approach is that we can't get from here to there
> without a guaranteed dump/reload failure, since it's quite unlikely that
> the partition assignment will be the same when based on enumsortorder
> as it was when based on OIDs.  Worse, it also breaks the pg_upgrade case.
> 
> I wonder if it'd work to make pg_dump force --load-via-partition-root
> mode when a hashed partition key includes an enum.
> 
> regards, tom lane
> 
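
(For readers following along, a minimal sketch of the kind of schema affected,
with hypothetical names:

    CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
    CREATE TABLE t (m mood) PARTITION BY HASH (m);
    CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE t_p1 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 1);

Because hashenum() hashes the enum value's OID, a plain dump/restore can assign
the labels new OIDs, so restored rows may no longer satisfy their original
partition's constraint.)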

CREATE ROLE bug?

2023-01-25 Thread Bruce Momjian
This works in PG 15:

CREATE ROLE service CREATEROLE;
CREATE ROLE service1 WITH LOGIN IN ROLE service;
SET SESSION AUTHORIZATION service;
CREATE ROLE service2 WITH LOGIN IN ROLE service;

but generates an error in git master:

CREATE ROLE service CREATEROLE;
CREATE ROLE service1 WITH LOGIN IN ROLE service;
SET SESSION AUTHORIZATION service;
CREATE ROLE service2 WITH LOGIN IN ROLE service;
--> ERROR:  must have admin option on role "service"

If I make 'service' a superuser, it works:

CREATE ROLE service SUPERUSER;
CREATE ROLE service1 WITH LOGIN IN ROLE service;
SET SESSION AUTHORIZATION service;
CREATE ROLE service2 WITH LOGIN IN ROLE service;

It is probably related to this discussion and change:

   
https://www.postgresql.org/message-id/flat/CA+TgmobGds7oefDjZUY+k_J7p1sS=pTq3sZ060qdb=okei1...@mail.gmail.com

I am not sure if the behavior is wrong, the error message is wrong, or
it is working as expected.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: plpython vs _POSIX_C_SOURCE

2023-01-25 Thread Robert Haas
On Tue, Jan 24, 2023 at 11:37 PM Tom Lane  wrote:
> Andres Freund  writes:
> > Patches attached.
>
> +1 for 0001.  I'm still nervous about 0002.  However, maybe the
> cases that we had trouble with are legacy issues that nobody cares
> about anymore in 2023.  We can always look for another answer if
> we get complaints, I guess.

It feels like things are changing so fast these days that whatever was
happening 12 years ago is not likely to be relevant. Compilers change
enough to cause warnings and even errors in just a few years. A decade
is long enough for an entire platform to become irrelevant.

Plus, the cost of experimentation here seems very low. Sure, something
might break, but if it does, we can just change it back, or change it
again. That's not really a big deal. The thing that would be a big
deal, maybe, is if we released and only found out afterward that this
caused some subtle and horrible problem for which we had no
back-patchable fix, but that seems pretty unlikely.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Syncrep and improving latency due to WAL throttling

2023-01-25 Thread Jakub Wartak
Hi,

Attached is a proposal idea by Tomas (in CC) for protecting and
prioritizing OLTP latency on syncrep over other heavy WAL-writing
sessions. This is the result of internal testing and research related
to syncrep behavior by Tomas, Alvaro and me. The main objective
of this work-in-progress/crude patch idea (with the GUC disabled by default)
is to prevent build-up of lag against the synchronous standby. It
allows a DBA to maintain stable latency for many backends when some
WAL-heavy activity is happening in parallel (full table
rewrite, VACUUM, MV build, archiving, etc.). In other words, it allows
slowing down any backend activity. Any feedback on such a feature is
welcome, including better GUC name proposals ;) and conditions under
which such a feature should be disabled even if it is enabled
globally (right now only anti-wraparound VACUUM comes to mind; it's
not in the patch).

Demo; Given:
- two DBs in syncrep configuration and artificial RTT 10ms latency
between them (introduced via tc qdisc netem)
- insertBIG.sql = "insert into bandwidthhog select repeat('c', 1000)
from generate_series(1, 50);" (50MB of WAL data)
- pgbench (8c) and 1x INSERT session

There are clearly visible drops of pgbench (OLTP) latency when the WAL
socket is saturated:

with 16devel/master and synchronous_commit_flush_wal_after=0
(disabled, default/baseline):
postgres@host1:~$ pgbench -n -R 50 -c 8 -T 15 -P 1
pgbench (16devel)
progress: 1.0 s, 59.0 tps, lat 18.840 ms stddev 11.251, 0 failed, lag 0.059 ms
progress: 2.0 s, 48.0 tps, lat 14.332 ms stddev 4.272, 0 failed, lag 0.063 ms
progress: 3.0 s, 56.0 tps, lat 15.383 ms stddev 6.270, 0 failed, lag 0.061 ms
progress: 4.0 s, 51.0 tps, lat 15.104 ms stddev 5.850, 0 failed, lag 0.061 ms
progress: 5.0 s, 47.0 tps, lat 15.184 ms stddev 5.472, 0 failed, lag 0.063 ms
progress: 6.0 s, 23.0 tps, lat 88.495 ms stddev 141.845, 0 failed, lag 0.064 ms
progress: 7.0 s, 1.0 tps, lat 999.053 ms stddev 0.000, 0 failed, lag 0.077 ms
progress: 8.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed, lag 0.000 ms
progress: 9.0 s, 1.0 tps, lat 2748.142 ms stddev NaN, 0 failed, lag 0.072 ms
progress: 10.0 s, 68.1 tps, lat 3368.267 ms stddev 282.842, 0 failed,
lag 2911.857 ms
progress: 11.0 s, 97.0 tps, lat 2560.750 ms stddev 216.844, 0 failed,
lag 2478.261 ms
progress: 12.0 s, 96.0 tps, lat 1463.754 ms stddev 376.276, 0 failed,
lag 1383.873 ms
progress: 13.0 s, 94.0 tps, lat 616.243 ms stddev 230.673, 0 failed,
lag 527.241 ms
progress: 14.0 s, 59.0 tps, lat 48.265 ms stddev 72.533, 0 failed, lag 15.181 ms
progress: 15.0 s, 39.0 tps, lat 14.237 ms stddev 6.073, 0 failed, lag 0.063 ms
transaction type: 
[..]
latency average = 931.383 ms
latency stddev = 1188.530 ms
rate limit schedule lag: avg 840.170 (max 3605.569) ms

session2 output:
postgres=# \i insertBIG.sql
Timing is on.
INSERT 0 50
Time: 4119.485 ms (00:04.119)

This new GUC makes it possible for the OLTP traffic to be less
affected (latency-wise) when the heavy bulk traffic hits. With
synchronous_commit_flush_wal_after=1024 (kB) it's way better, but
latency rises up to 45ms:
postgres@host1:~$ pgbench -n -R 50 -c 8 -T 15 -P 1
pgbench (16devel)
progress: 1.0 s, 52.0 tps, lat 17.300 ms stddev 10.178, 0 failed, lag 0.061 ms
progress: 2.0 s, 51.0 tps, lat 19.490 ms stddev 12.626, 0 failed, lag 0.061 ms
progress: 3.0 s, 48.0 tps, lat 14.839 ms stddev 5.429, 0 failed, lag 0.061 ms
progress: 4.0 s, 53.0 tps, lat 24.635 ms stddev 13.449, 0 failed, lag 0.062 ms
progress: 5.0 s, 48.0 tps, lat 17.999 ms stddev 9.291, 0 failed, lag 0.062 ms
progress: 6.0 s, 57.0 tps, lat 21.513 ms stddev 17.011, 0 failed, lag 0.058 ms
progress: 7.0 s, 50.0 tps, lat 28.071 ms stddev 21.622, 0 failed, lag 0.061 ms
progress: 8.0 s, 45.0 tps, lat 27.244 ms stddev 11.975, 0 failed, lag 0.064 ms
progress: 9.0 s, 57.0 tps, lat 35.988 ms stddev 25.752, 0 failed, lag 0.057 ms
progress: 10.0 s, 56.0 tps, lat 45.478 ms stddev 39.831, 0 failed, lag 0.534 ms
progress: 11.0 s, 62.0 tps, lat 45.146 ms stddev 32.881, 0 failed, lag 0.058 ms
progress: 12.0 s, 51.0 tps, lat 24.250 ms stddev 12.405, 0 failed, lag 0.063 ms
progress: 13.0 s, 57.0 tps, lat 18.554 ms stddev 8.677, 0 failed, lag 0.060 ms
progress: 14.0 s, 44.0 tps, lat 15.923 ms stddev 6.958, 0 failed, lag 0.065 ms
progress: 15.0 s, 54.0 tps, lat 19.773 ms stddev 10.024, 0 failed, lag 0.063 ms
transaction type: 
[..]
latency average = 25.575 ms
latency stddev = 21.540 ms

session2 output:
postgres=# set synchronous_commit_flush_wal_after = 1024;
SET
postgres=# \i insertBIG.sql
INSERT 0 50
Time: 8889.318 ms (00:08.889)


With 16devel/master and synchronous_commit_flush_wal_after=256 (kB)
all is smooth:
postgres@host1:~$ pgbench -n -R 50 -c 8 -T 15 -P 1
pgbench (16devel)
progress: 1.0 s, 49.0 tps, lat 14.345 ms stddev 4.700, 0 failed, lag 0.062 ms
progress: 2.0 s, 45.0 tps, lat 14.812 ms stddev 5.816, 0 failed, lag 0.064 ms
progress: 3.0 s, 49.0 tps, lat 13.145 ms stddev 4.320, 0 failed

Re: CREATE ROLE bug?

2023-01-25 Thread Robert Haas
On Wed, Jan 25, 2023 at 8:29 AM Bruce Momjian  wrote:
> This works in PG 15:
>
> CREATE ROLE service CREATEROLE;
> CREATE ROLE service1 WITH LOGIN IN ROLE service;
> SET SESSION AUTHORIZATION service;
> CREATE ROLE service2 WITH LOGIN IN ROLE service;
>
> but generates an error in git master:
>
> CREATE ROLE service CREATEROLE;
> CREATE ROLE service1 WITH LOGIN IN ROLE service;
> SET SESSION AUTHORIZATION service;
> CREATE ROLE service2 WITH LOGIN IN ROLE service;
> --> ERROR:  must have admin option on role "service"
>
> If I make 'service' a superuser, it works:
>
> CREATE ROLE service SUPERUSER;
> CREATE ROLE service1 WITH LOGIN IN ROLE service;
> SET SESSION AUTHORIZATION service;
> CREATE ROLE service2 WITH LOGIN IN ROLE service;
>
> It is probably related to this discussion and change:
>
>
> https://www.postgresql.org/message-id/flat/CA+TgmobGds7oefDjZUY+k_J7p1sS=pTq3sZ060qdb=okei1...@mail.gmail.com
>
> I am not sure if the behavior is wrong, the error message is wrong, or
> it is working as expected.

It is indeed related to that discussion and change. In existing
released branches, a CREATEROLE user can make any role a member of any
other role even if they have no rights at all with respect to that
role. This means that a CREATEROLE user can create a new user in the
pg_execute_server_programs group even though they have no access to
it. That allows any CREATEROLE user to take over the OS account, and
thus also superuser. In master, the rules have been tightened up.
CREATEROLE no longer exempts you from the usual permission checks
about adding a user to a group. This means that a CREATEROLE user now
needs the same permissions to add a user to a group as any other user
would need, i.e. ADMIN OPTION on the group.

In your example, the "service" user has CREATEROLE and is therefore
entitled to create new roles. However, "service" can only add those
new roles to groups for which "service" possesses ADMIN OPTION. And
"service" does not have ADMIN OPTION on itself, because no role ever
possesses ADMIN OPTION on itself.
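
One way to get the original example working under the new rules, as a sketch
(the extra group role is illustrative), is for a superuser to grant the
CREATEROLE user ADMIN OPTION on a separate group that it is meant to manage:

    CREATE ROLE service CREATEROLE;
    CREATE ROLE service_group;
    GRANT service_group TO service WITH ADMIN OPTION;
    SET SESSION AUTHORIZATION service;
    CREATE ROLE service2 WITH LOGIN IN ROLE service_group;  -- now allowed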

I wrote a blog about this yesterday, which may or may not be of help:

http://rhaas.blogspot.com/2023/01/surviving-without-superuser-coming-to.html

I think that the new behavior will surprise some people, as it has
surprised you, and it will take some time to get used to. However, I
also think that the changes are absolutely essential. We've been
shipping major releases for years and just pretending that it's OK
that having CREATEROLE lets you take over the OS account and the
superuser account. That's a security hole -- and not a subtle one. I
could have easily exploited it as a teenager. My goals in doing this
project were to (1) fix the security holes, (2) otherwise change as
little about the behavior of CREATEROLE as possible, and (3) make
CREATEROLE do something useful. We could have accomplished the first
goal by just removing CREATEROLE or making it not do anything at all,
but meeting the second and third goals at the same time require
letting CREATEROLE continue to work but putting just enough
restrictions on its power to keep it from being used as a
privilege-escalation attack. I hope that what's been committed
accomplishes that goal.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: old_snapshot_threshold bottleneck on replica

2023-01-25 Thread Robert Haas
On Wed, Jan 25, 2023 at 3:52 AM Maxim Orlov  wrote:
> But in TransactionIdLimitedForOldSnapshots these variable is using 
> conjointly. So, I'm not
> sure, is it completely safe to remove mutex.

Well, that's something we - and ideally you, as the patch author -
need to analyze and figure out. We can't just take a shot and hope for
the best.

> Actually, removing mutex and switch to atomics
> was my first choice. I've run all the tests and no problems were found

Unfortunately, that kind of testing is not very likely to find a
subtle synchronization problem. That's why a theoretical analysis is
so important.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




More pgindent tweaks

2023-01-25 Thread Andrew Dunstan
After I committed 1249371632 I thought that I should really go ahead and
do what I suggested and allow multiple exclude-pattern files for
pgindent. One obvious case is to exclude an in-tree meson build
directory. I also sometimes have other in-tree objects I'd like to be
able to exclude.

The attached adds this ability. It also unifies the logic for finding
the regular exclude pattern file and the typedefs file.

I took the opportunity to remove a badly thought out and dangerous
feature whereby the first non-option argument, if it's not a .c or .h
file, is taken as the typedefs file. That's particularly dangerous in a
situation where git is producing a list of files that have changed and
passing it on the command line to pgindent. I also removed a number of
extraneous blank lines.
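
For example, with this change an invocation like the following becomes
possible (the second exclude-file name is just illustrative):

    src/tools/pgindent/pgindent \
        --excludes=src/tools/pgindent/exclude_file_patterns \
        --excludes=my_local_exclude_patterns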


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 1f95a1a34e..2ddfe07982 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -22,7 +22,7 @@ my $indent_opts =
 my $devnull = File::Spec->devnull;
 
 my ($typedefs_file, $typedef_str, $code_base,
-	$excludes,  $indent,  $build,
+	@excludes,  $indent,  $build,
 	$show_diff, $silent_diff, $help);
 
 $help = 0;
@@ -32,7 +32,7 @@ my %options = (
 	"typedefs=s" => \$typedefs_file,
 	"list-of-typedefs=s" => \$typedef_str,
 	"code-base=s"=> \$code_base,
-	"excludes=s" => \$excludes,
+	"excludes=s" => \@excludes,
 	"indent=s"   => \$indent,
 	"build"  => \$build,
 	"show-diff"  => \$show_diff,
@@ -46,10 +46,8 @@ usage("Cannot have both --silent-diff and --show-diff")
 
 run_build($code_base) if ($build);
 
-# command line option wins, then first non-option arg,
-# then environment (which is how --build sets it) ,
+# command line option wins, then environment (which is how --build sets it) ,
 # then locations. based on current dir, then default location
-$typedefs_file ||= shift if @ARGV && $ARGV[0] !~ /\.[ch]$/;
 $typedefs_file ||= $ENV{PGTYPEDEFS};
 
 # build mode sets PGINDENT
@@ -58,14 +56,15 @@ $indent ||= $ENV{PGINDENT} || $ENV{INDENT} || "pg_bsd_indent";
 # no non-option arguments given. so do everything in the current directory
 $code_base ||= '.' unless @ARGV;
 
+my $sourcedir = locate_sourcedir();
+
 # if it's the base of a postgres tree, we will exclude the files
 # postgres wants excluded
-$excludes ||= "$code_base/src/tools/pgindent/exclude_file_patterns"
-  if $code_base && -f "$code_base/src/tools/pgindent/exclude_file_patterns";
-
-# also look under the current directory for the exclude patterns file
-$excludes ||= "src/tools/pgindent/exclude_file_patterns"
-  if -f "src/tools/pgindent/exclude_file_patterns";
+if ($sourcedir)
+{
+	my $exclude_candidate = "$sourcedir/exclude_file_patterns";
+	push (@excludes, $exclude_candidate) if -f $exclude_candidate;
+}
 
 # The typedef list that's mechanically extracted by the buildfarm may omit
 # some names we want to treat like typedefs, e.g. "bool" (which is a macro
@@ -85,7 +84,6 @@ my %excluded = map { +"$_\n" => 1 } qw(
 my @files;
 my $filtered_typedefs_fh;
 
-
 sub check_indent
 {
 	system("$indent -? < $devnull > $devnull 2>&1");
@@ -114,26 +112,34 @@ sub check_indent
 	return;
 }
 
+sub locate_sourcedir
+{
+	# try fairly hard to locate the sourcedir
+	my $where = $code_base || '.';
+	my $sub = "$where/src/tools/pgindent";
+	return $sub if -d $sub;
+	# try to find it from an ancestor directory
+	$sub = "../src/tools/pgindent";
+	foreach (1..4)
+	{
+		return $sub if -d $sub;
+		$sub = "../$sub";
+	}
+	return; # undef if nothing found
+}
 
 sub load_typedefs
 {
-
 	# try fairly hard to find the typedefs file if it's not set
 
-	foreach my $try ('.', 'src/tools/pgindent', '/usr/local/etc')
+	foreach my $try ('.', $sourcedir, '/usr/local/etc')
 	{
-		$typedefs_file ||= "$try/typedefs.list"
+		last if $typedefs_file;
+		next unless defined $try;
+		$typedefs_file = "$try/typedefs.list"
 		  if (-f "$try/typedefs.list");
 	}
 
-	# try to find typedefs by moving up directory levels
-	my $tdtry = "..";
-	foreach (1 .. 5)
-	{
-		$typedefs_file ||= "$tdtry/src/tools/pgindent/typedefs.list"
-		  if (-f "$tdtry/src/tools/pgindent/typedefs.list");
-		$tdtry = "$tdtry/..";
-	}
 	die "cannot locate typedefs file \"$typedefs_file\"\n"
 	  unless $typedefs_file && -f $typedefs_file;
 
@@ -166,13 +172,13 @@ sub load_typedefs
 	return $filter_typedefs_fh;
 }
 
-
 sub process_exclude
 {
-	if ($excludes && @files)
+	foreach my $excl (@excludes)
 	{
-		open(my $eh, '<', $excludes)
-		  || die "cannot open exclude file \"$excludes\"\n";
+		last unless @files;
+		open(my $eh, '<', $excl)
+		  || die "cannot open exclude file \"$excl\"\n";
 		while (my $line = <$eh>)
 		{
 			chomp $line;
@@ -185,7 +191,6 @@ sub process_exclude
 	return;
 }
 
-
 sub read_source
 {
 	my $source_filename = shift;
@@ -200,7 +205,6 @@ sub read_source
 	return $source;
 }
 
-
 

Re: Re: Support plpgsql multi-range in conditional control

2023-01-25 Thread songjinzhou
Hello, this usage scenario comes from Oracle's PL/SQL language (I have been
doing PL/SQL language feature development for some time). I think this patch
is very practical and will extend the scenarios our FOR loop can cover. In
short, I look forward to your reply.

Happy Chinese New Year!



songjinzhou(2903807...@qq.com)

Maybe you didn't understand my reply. Without some significant real use case, I
am strongly against the proposed feature and against merging your patch upstream. I
don't see any reason to enhance the language with this feature.

Regards

Pavel




Re: Re: Support plpgsql multi-range in conditional control

2023-01-25 Thread Pavel Stehule
Hi


st 25. 1. 2023 v 15:18 odesílatel songjinzhou <2903807...@qq.com> napsal:

> Hello, this usage scenario is from Oracle's PL/SQL language (I have been
> doing the function development of PL/SQL language for some time). I think
> this patch is very practical and will expand our for loop scenario. In
> short, I look forward to your
>

I don't see any real usage. PL/SQL doesn't support the proposed syntax.

Regards

Pavel



> reply.
>
> Happy Chinese New Year!
>
> --
> songjinzhou(2903807...@qq.com)
>
>
> Maybe you didn't understand my reply. Without some significant real use
> case, I am strongly against the proposed feature and merging your patch to
> upstream. I don't see any reason to enhance language with this feature.
>
> Regards
>
> Pavel
>
>
>


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-25 Thread Takamichi Osumi (Fujitsu)
On Wednesday, January 25, 2023 3:27 PM Kyotaro Horiguchi 
 wrote:
> At Tue, 24 Jan 2023 12:19:04 +, "Takamichi Osumi (Fujitsu)"
>  wrote in
> > Attached the patch v20 that has incorporated all comments so far.
> 
> Thanks! I looked through the documentation part.
Thank you for your review!


> +  
> +   subminapplydelay int8
> +  
> +  
> +   Total time spent delaying the application of changes, in milliseconds.
> +  
> 
> I was confused because it reads as if this column shows the summarized actual
> waiting time caused by min_apply_delay.  IIUC it actually shows the
> min_apply_delay setting for the subscription. Thus shouldn't it be something
> like this?
> 
> "The minimum amount of time to delay applying changes, in milliseconds"
> And it might be better to mention the corresponding subscription parameter.
This description looks much better to me than the previous one. Fixed.
OTOH, the other columns don't mention their corresponding subscription
parameters, so I didn't add such a mention.


> +   error. If wal_receiver_status_interval is set to
> +   zero, the apply worker doesn't send any feedback messages during
> the
> +   min_apply_delay period.
> 
> I took a bit longer to understand what this sentence means.  I'd like to
> suggest something like the following.
> 
> "Since no status-update messages are sent while delaying, note that
> wal_receiver_status_interval is the only source of keepalive messages during
> that period."
The current patch's description is precise and I prefer it.
I would say "the only source" could be confusing to readers.
However, I slightly adjusted the description a bit. Could you please check?


> +  
> +   A logical replication subscription can delay the application of changes by
> +   specifying the min_apply_delay subscription
> parameter.
> +   See  for details.
> +  
> 
> I'm not sure "logical replication subscription" is a common term.
> Doesn't just "subscription" mean the same, especially in that context?
> (Note that 31.2 starts with "A subscription is the downstream..").
I think you are right. Fixed.


> +  Any delay occurs only on WAL records for transaction begins after
> all
> +  initial table synchronization has finished. The delay is
> + calculated
> 
> There is no "transaction begin" WAL records.  Maybe it is "logical replication
> transaction begin message". The timestamp is of "commit time".  (I took
> "transaction begins" as a noun, but that might be
> wrong..)
Yeah, we can improve this. But we need to cover not only
"commit" but also "prepare" in this part.

In short, I think we should change this part to mention
(1) that the delay happens only after all initial table synchronization has finished, and
(2) how the delay is applied for non-streaming and streaming transactions in
general.

By the way, "WAL timestamp" is a term used for recovery_min_apply_delay.
So, I'd like to keep it to make the description more aligned with that one,
until there is a better wording.

Updated the doc. I adjusted the commit message according to this fix.
> 
> +may reduce the actual wait time. It is also possible that the 
> overhead
> +already exceeds the requested min_apply_delay
> value,
> +in which case no additional wait is necessary. If the system
> + clocks
> 
> I'm not sure it is right to say "necessary" here.  IMHO it might be better as "in
> "in
> which case no delay is applied".
Agreed. Fixed.


> +  in which case no additional wait is necessary. If the system clocks
> +  on publisher and subscriber are not synchronized, this may lead to
> +  apply changes earlier than expected, but this is not a major issue
> +  because this parameter is typically much larger than the time
> +  deviations between servers. Note that if this parameter is
> + set to a
> 
> This doesn't seem to fit our documentation. It is not our business whether a
> certain amount of deviation is critical or not. How about something like the
> following?
> 
> "Note that the delay is measured between the timestamp assigned by
> publisher and the system clock on subscriber.  You need to manage the
> system clocks to be in sync so that the delay works properly."
As discussed, this is aligned with recovery_min_apply_delay, so I kept it.


> +Delaying the replication can mean there is a much longer time
> +between making a change on the publisher, and that change
> being
> +committed on the subscriber. This can impact the performance
> of
> +synchronous replication. See  linkend="guc-synchronous-commit"/>
> +parameter.
> 
> Do we need the "can" in "Delaying the replication can mean"?  If we want to
> say, it might be "Delaying the replication means there can be a much 
> longer..."?
The "can" indicates the possibility as the nuance,
while adopting "means" in this case indicates "time delayed LR causes
the long time wait always".

I'm okay with either expressio

RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-25 Thread houzj.f...@fujitsu.com
On Wednesday, January 25, 2023 7:30 AM Peter Smith  
wrote:
> 
> Here are my review comments for patch v87-0002.

Thanks for your comments.

> ==
> doc/src/sgml/config.sgml
> 
> 1.
> 
> -Allows streaming or serializing changes immediately in
> logical decoding.
>  The allowed values of
> logical_replication_mode are
> -buffered and immediate. When
> set
> -to immediate, stream each change if
> +buffered and immediate.
> The default
> +is buffered.
> +   
> 
> I didn't think it was necessary to say “of logical_replication_mode”.
> IMO that much is already obvious because this is the first sentence of the
> description for logical_replication_mode.
> 

Changed.

> ~~~
> 
> 2.
> +   
> +On the publisher side, it allows streaming or serializing changes
> +immediately in logical decoding.  When set to
> +immediate, stream each change if
>  streaming option (see optional parameters set by
>  CREATE
> SUBSCRIPTION)
>  is enabled, otherwise, serialize each change.  When set to
> -buffered, which is the default, decoding will 
> stream
> -or serialize changes when
> logical_decoding_work_mem
> -is reached.
> +buffered, decoding will stream or serialize 
> changes
> +when logical_decoding_work_mem is
> reached.
> 
> 
> 2a.
> "it allows" --> "logical_replication_mode allows"
> 
> 2b.
> "decoding" --> "the decoding"

Changed.

> ~~~
> 
> 3.
> +   
> +On the subscriber side, if streaming option is set
> +to parallel, this parameter also allows the leader
> +apply worker to send changes to the shared memory queue or to
> serialize
> +changes.  When set to buffered, the leader sends
> +changes to parallel apply workers via shared memory queue.  When
> set to
> +immediate, the leader serializes all changes to
> +files and notifies the parallel apply workers to read and apply them 
> at
> +the end of the transaction.
> +   
> 
> "this parameter also allows" --> "logical_replication_mode also allows"

Changed.

> ~~~
> 
> 4.
> 
>  This parameter is intended to be used to test logical decoding and
>  replication of large transactions for which otherwise we need to
>  generate the changes till
> logical_decoding_work_mem
> -is reached.
> +is reached. Moreover, this can also be used to test the transmission 
> of
> +changes between the leader and parallel apply workers.
> 
> 
> "Moreover, this can also" --> "It can also"
> 
> I am wondering would this sentence be better put at the top of the GUC
> description. So then the first paragraph becomes like this:
> 
> 
> SUGGESTION (I've also added another sentence "The effect of...")
> 
> The allowed values are buffered and immediate. The default is buffered. This
> parameter is intended to be used to test logical decoding and replication of 
> large
> transactions for which otherwise we need to generate the changes till
> logical_decoding_work_mem is reached. It can also be used to test the
> transmission of changes between the leader and parallel apply workers. The
> effect of logical_replication_mode is different for the publisher and
> subscriber:
> 
> On the publisher side...
> 
> On the subscriber side...

I think your suggestion makes sense, so changed as suggested.
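
(As a usage note for anyone testing this: being a PGC_USERSET developer option,
it can simply be set per session, e.g.

    SET logical_replication_mode = immediate;

to force the serialization paths described above.)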

> ==
> .../replication/logical/applyparallelworker.c
> 
> 5.
> + /*
> + * In immeidate mode, directly return false so that we can switch to
> + * PARTIAL_SERIALIZE mode and serialize remaining changes to files.
> + */
> + if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE) return
> + false;
> 
> Typo "immediate"
> 
> Also, I felt "directly" is not needed. "return false" and "directly return 
> false" is the
> same.
> 
> SUGGESTION
> Using ‘immediate’ mode returns false to cause a switch to PARTIAL_SERIALIZE
> mode so that the remaining changes will be serialized.

Changed.

> ==
> src/backend/utils/misc/guc_tables.c
> 
> 6.
>   {
>   {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
> - gettext_noop("Allows streaming or serializing each change in logical
> decoding."),
> - NULL,
> + gettext_noop("Controls the behavior of logical replication publisher
> and subscriber"),
> + gettext_noop("If set to immediate, on the publisher side, it "
> + "allows streaming or serializing each change in "
> + "logical decoding. On the subscriber side, in "
> + "parallel streaming mode, it allows the leader apply "
> + "worker to serialize changes to files and notifies "
> + "the parallel apply workers to read and apply them at "
> + "the end of the transaction."),
>   GUC_NOT_IN_SAMPLE
>   },
> 
> 6a. short description
> 
> User PoV behaviour should be the same. Instead, maybe say "controls the
> internal behavior" or something like that?

Changed to "internal behavior xxx"

> ~
> 
> 6b. long description
> 
>

RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-25 Thread Takamichi Osumi (Fujitsu)
On Wednesday, January 25, 2023 3:55 PM Amit Kapila  
wrote:
> On Wed, Jan 25, 2023 at 11:23 AM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> >
> > Thank you for checking the patch !
> > On Wednesday, January 25, 2023 10:17 AM Kyotaro Horiguchi
>  wrote:
> > > In short, I'd like to propose renaming the parameter
> > > in_delayed_apply of send_feedback to "has_unprocessed_change".
> > >
> > > At Tue, 24 Jan 2023 12:27:58 +0530, Amit Kapila
> > >  wrote in
> > > > > send_feedback():
> > > > > +* If the subscriber side apply is delayed (because of
> > > time-delayed
> > > > > +* replication) then do not tell the publisher that the
> > > > > + received
> > > latest
> > > > > +* LSN is already applied and flushed, otherwise, it leads to
> the
> > > > > +* publisher side making a wrong assumption of logical
> > > replication
> > > > > +* progress. Instead, we just send a feedback message to
> > > > > + avoid a
> > > publisher
> > > > > +* timeout during the delay.
> > > > >  */
> > > > > -   if (!have_pending_txes)
> > > > > +   if (!have_pending_txes && !in_delayed_apply)
> > > > > flushpos = writepos = recvpos;
> > > > >
> > > > > Honestly I don't like this wart. The reason for this is the
> > > > > function assumes recvpos = applypos but we actually call it
> > > > > while holding unapplied changes, that is, applypos < recvpos.
> > > > >
> > > > > Couldn't we maintain an additional static variable "last_applied"
> > > > > along with last_received?
> > > > >
> > > >
> > > > It won't be easy to maintain the meaning of last_applied because
> > > > there are cases where we don't apply the change directly. For
> > > > example, in case of streaming xacts, we will just keep writing it
> > > > to the file, now, say, due to some reason, we have to send the
> > > > feedback, then it will not allow you to update the latest write
> > > > locations. This would then become different then what we are doing
> without the patch.
> > > > Another point to think about is that we also need to keep the
> > > > variable updated for keep-alive ('k') messages even though we
> > > > don't apply anything in that case. Still, other cases to consider
> > > > are where we have mix of streaming and non-streaming transactions.
> > >
> > > Yeah.  Even though I named it as "last_applied", its objective is to
> > > have get_flush_position returning the correct have_pending_txes
> > > without a hint from callers, that is, "let g_f_position know if
> > > store_flush_position has been called with the last received data".
> > >
> > > Anyway I tried that but didn't find a clean and simple way. However,
> > > while on it, I realized what the code made me confused.
> > >
> > > +static void send_feedback(XLogRecPtr recvpos, bool force, bool
> > > requestReply,
> > > +   bool
> > > + in_delayed_apply);
> > >
> > > The name "in_delayed_apply" doesn't give me an idea of what
> > > the function should do for it. If it is named
> > > "has_unprocessed_change", I think it makes sense that send_feedback
> > > should think there may be an outstanding transaction that is not known to
> the function.
> > >
> > >
> > > So, my conclusion here is I'd like to propose changing the parameter
> > > name to "has_unapplied_change".
> > Renamed the variable name to "has_unprocessed_change".
> > Also, removed the first argument of the send_feedback() which isn't
> necessary now.
> >
> 
> Why did you remove the first argument of the send_feedback() when that is not
> added by this patch? If you really think that is an improvement, feel free to
> propose that as a separate patch.
> Personally, I don't see a value in it.
Oh, sorry for that. I have made the change back.
Kindly have a look at the v22 shared in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB837305BD31FA317256BC7B1FEDCE9%40TYCPR01MB8373.jpnprd01.prod.outlook.com



Best Regards,
Takamichi Osumi





Re: CREATE ROLE bug?

2023-01-25 Thread Bruce Momjian
On Wed, Jan 25, 2023 at 08:47:14AM -0500, Robert Haas wrote:
> > I am not sure if the behavior is wrong, the error message is wrong, or
> > it is working as expected.
> 
> It is indeed related to that discussion and change. In existing
> released branches, a CREATEROLE user can make any role a member of any
> other role even if they have no rights at all with respect to that
> role. This means that a CREATEROLE user can create a new user in the
> pg_execute_server_programs group even though they have no access to
> it. That allows any CREATEROLE user to take over the OS account, and
> thus also superuser. In master, the rules have been tightened up.
> CREATEROLE no longer exempts you from the usual permission checks
> about adding a user to a group. This means that a CREATEROLE user now
> needs the same permissions to add a user to a group as any other user
> would need, i.e. ADMIN OPTION on the group.
> 
> In your example, the "service" user has CREATEROLE and is therefore
> entitled to create new roles. However, "service" can only add those
> new roles to groups for which "service" possesses ADMIN OPTION. And
> "service" does not have ADMIN OPTION on itself, because no role ever
> possesses ADMIN OPTION on itself.

So, how would someone with CREATEROLE permission add people to their own
role, without superuser permission?  Are we adding any security by
preventing this?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: CREATE ROLE bug?

2023-01-25 Thread David G. Johnston
On Wed, Jan 25, 2023 at 7:35 AM Bruce Momjian  wrote:

>
> So, how would someone with CREATEROLE permission add people to their own
> role, without superuser permission?  Are we adding any security by
> preventing this?
>
>
As an encouraged design choice you wouldn't.  You'd create a new group and
add both yourself and the new role to it - then grant it the desired
permissions.
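
For illustration, a minimal sketch of that pattern (role names are made up;
it assumes the CREATEROLE role "service" created the group and can therefore
grant membership in it, including to itself):

```sql
-- As the CREATEROLE role "service" (hypothetical names throughout):
CREATE ROLE app_group NOLOGIN;   -- the group that carries the privileges
CREATE ROLE alice LOGIN;         -- a new user created by "service"

-- The object owner (or a superuser) grants the desired privileges to the group:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_group;

-- "service" created app_group, so it can hand out membership in it,
-- both to the new user and (if desired) to itself:
GRANT app_group TO alice;
GRANT app_group TO service;
```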

A CREATEROLE role should probably be a user (LOGIN) role and user roles
should not have members.

David J.


Re: More pgindent tweaks

2023-01-25 Thread Bruce Momjian
On Wed, Jan 25, 2023 at 08:59:44AM -0500, Andrew Dunstan wrote:
> After I committed 1249371632 I thought that I should really go ahead and
> do what I suggested and allow multiple exclude pattern files for
> pgindent. One obvious case is to exclude an in tree meson build
> directory. I also sometimes have other in tree objects I'd like to be
> able to exclude.
> 
> The attached adds this ability. It also unifies the logic for finding
> the regular exclude pattern file and the typedefs file.
> 
> I took the opportunity to remove a badly thought out and dangerous
> feature whereby the first non-option argument, if it's not a .c or .h
> file, is taken as the typedefs file. That's particularly dangerous in a
> situation where git is producing a list of files that have changed and
> passing it on the command line to pgindent. I also removed a number of
> extraneous blank lines.

Can we make the pgindent options more visible, perhaps by adding them to
pgindent.man or at least saying type pgindent --help?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: heapgettup() with NoMovementScanDirection unused in core?

2023-01-25 Thread Tom Lane
David Rowley  writes:
> Does anyone know of any reason why we shouldn't ditch the nomovement
> code in heapgettup/heapgettup_pagemode?

AFAICS, the remaining actual use-case for NoMovementScanDirection
is that defined by ExecutorRun:

 *If direction is NoMovementScanDirection then nothing is done
 *except to start up/shut down the destination.  Otherwise,
 *we retrieve up to 'count' tuples in the specified direction.
 *
 *Note: count = 0 is interpreted as no portal limit, i.e., run to
 *completion.

We must have the NoMovementScanDirection option because count = 0
does not mean "do nothing", and I noted at least two call sites
that require it.

The heapgettup definition is thus not only unreachable, but confusingly
inconsistent with this meaning.

I wonder if we couldn't also get rid of this confusingly-inconsistent
alternative usage in the planner:

 * 'indexscandir' is one of:
 *ForwardScanDirection: forward scan of an ordered index
 *BackwardScanDirection: backward scan of an ordered index
 *NoMovementScanDirection: scan of an unordered index, or don't care
 * (The executor doesn't care whether it gets ForwardScanDirection or
 * NoMovementScanDirection for an indexscan, but the planner wants to
 * distinguish ordered from unordered indexes for building pathkeys.)

While that comment's claim is plausible, I think it's been wrong for
years.  AFAICS indxpath.c makes pathkeys before it ever does this:

  index_is_ordered ?
  ForwardScanDirection :
  NoMovementScanDirection,

and nothing depends on that later either.  So I think we could
simplify this to something like "indexscandir is either
ForwardScanDirection or BackwardScanDirection.  (Unordered
index types need not support BackwardScanDirection.)"

regards, tom lane




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-25 Thread Robert Haas
On Tue, Jan 24, 2023 at 3:33 PM Peter Geoghegan  wrote:
> Sure, it's possible that such a cancellable aggressive autovacuum was
> indeed cancelled, and that that factor made the crucial difference.
> But I find it far easier to believe that there simply was no such
> aggressive autovacuum in the first place (not this time), since it
> could have only happened when autovacuum thinks that there are
> sufficiently many dead tuples to justify launching an autovacuum in
> the first place. Which, as we now all accept, is based on highly
> dubious sampling by ANALYZE. So I think it's much more likely to be
> that factor (dead tuple accounting is bad), as well as the absurd
> false dichotomy between aggressive and non-aggressive -- plus the
> issue at hand, the auto-cancellation behavior.

In my opinion, this is too speculative to justify making changes to
the behavior. I'm not here to defend the way we do dead tuple
accounting. I think it's a hot mess. But whether or not it played any
role in this catastrophe is hard to say. The problems with dead tuple
accounting are, as I understand it, all about statistical
independence. That is, we make assumptions that what's true of the
sample is likely to be true of the whole table when in reality it may
not be true at all. Perhaps it's even unlikely to be true. But the
kinds of problems you get from assuming statistical independence tend
to hit users very unevenly. We make similar assumptions about
selectivity estimation: unless there are extended statistics, we take
P(a=1 and b=1) = P(a=1)*P(b = 1), which can be vastly and dramatically
wrong. People can and do get extremely bad query plans as a result of
that assumption. However, other people run PostgreSQL for years and
years and never really have big problems with it. I'd say that it's
completely fair to describe this as a big problem, but we can't
therefore conclude that some particular user has this problem, not
even if we know that they have a slow query and we know that it's due
to a bad plan. And similarly here, I don't see a particular reason to
think that your theory about what happened is more likely than mine. I
freely admit that yours could be right, just as you admitted that mine
could be right. But I think we really just don't know.

It feels unlikely to me that there was ONE cancellable aggressive
autovacuum and that it got cancelled. I think that it's probably
either ZERO or LOTS, depending on whether the dead tuple threshold was
ever reached. If it wasn't, then it must have been zero. But if it
was, and the first autovacuum worker to visit that table got
cancelled, then the next one would try again. And
autovacuum_naptime=1m, so if we're not out of autovacuum workers,
we're going to retry that table every minute. If we do run out of
autovacuum workers, which is pretty likely, we'll still launch new
workers in that database as often as we can given when other workers
exit. If the system is very tight on autovacuum capacity, probably
because the cost limit is too low, then you could have a situation
where only one try gets made before we hit autovacuum_freeze_max_age.
Otherwise, though, a single failed try would probably lead to trying a
whole lot more times after that, and you only hit
autovacuum_freeze_max_age if all those attempts fail.

At the risk of repeating myself, here's what bugs me. If we suppose
that your intuition is right and no aggressive autovacuum happened
before autovacuum_freeze_max_age was reached, then what you are
proposing will make things better. But if we suppose that my intuition
is right and many aggressive autovacuums happened before
autovacuum_freeze_max_age was reached, then what you are proposing
will make things worse, because if we've been auto-cancelling
repeatedly we're probably going to keep doing so until we shut that
behavior off, and we want a vacuum to succeed sooner rather than
later. So it doesn't feel obvious to me that we should change
anything. Even if we knew what had happened for certain in this
particular case, I don't know how we can possibly know what is typical
in similar cases.

My personal support experience has been that cases where autovacuum
runs a lot but doesn't solve the problem for some reason are a lot
more common than cases where it doesn't run when it should have done.
That probably accounts for my intuition about what is likely to have
happened here. But as my colleagues are often at pains to point out to
me, my experiences aren't representative of what happens to PostgreSQL
users generally for all kinds of reasons, and therefore sometimes my
intuition is wrong. But since I have nobody else's experiences to use
in lieu of my own, I don't know what else I can use to judge anything.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CREATE ROLE bug?

2023-01-25 Thread Bruce Momjian
On Wed, Jan 25, 2023 at 07:38:51AM -0700, David G. Johnston wrote:
> On Wed, Jan 25, 2023 at 7:35 AM Bruce Momjian  wrote:
> 
> 
> So, how would someone with CREATEROLE permission add people to their own
> role, without superuser permission?  Are we adding any security by
> preventing this?
> 
> 
> 
> As an encouraged design choice you wouldn't.  You'd create a new group and add
> both yourself and the new role to it - then grant it the desired permissions.
> 
> A CREATEROLE role should probably be a user (LOGIN) role and user roles should
> not have members.

Makes sense.  I was actually using that pattern, but in running some
test scripts that didn't revert back to the superuser, I saw the errors
and was confused.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




[PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-01-25 Thread Aleksander Alekseev
Hi hackers,

Currently we allow self-conflicting inserts for ON CONFLICT DO NOTHING:

```
CREATE TABLE t (a INT UNIQUE, b INT);
INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT DO NOTHING;
-- succeeds, inserting the first row and ignoring the second
```
... but not for ON CONFLICT .. DO UPDATE:

```
INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same
command have duplicate constrained values.
```

Tom pointed out in 2016 that this is actually a bug [1] and I agree.

The proposed patch fixes this.

[1]: https://www.postgresql.org/message-id/22438.1477265185%40sss.pgh.pa.us

-- 
Best regards,
Aleksander Alekseev


v1-0001-Make-ON-CONFLICT-DO-NOTHING-and-ON-CONFLICT-DO-UP.patch
Description: Binary data


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2023-01-25 Thread Andrei Zubkov
Hi,

I've updated this patch for the current master. Also I have some
additional explanations..

On Wed, 2023-01-18 at 17:29 +0100, Tomas Vondra wrote:
> 1) I'm not sure why the patch is adding tests of permissions on the
> pg_stat_statements_reset function?

I've fixed that

> 
> 2) If we want the second timestamp, shouldn't it also cover resets of
> the mean values, not just min/max?

I think that mean values are not targets for auxiliary resets because
any sampling solution can easily calculate the mean values between
samples without a reset.
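
For example (a rough sketch; snap1 and snap2 are hypothetical tables holding
two saved copies of pg_stat_statements), the per-interval mean falls out of
the totals without any reset:

```sql
SELECT s2.queryid,
       (s2.total_exec_time - s1.total_exec_time)
         / NULLIF(s2.calls - s1.calls, 0) AS interval_mean_exec_time
FROM snap2 s2
JOIN snap1 s1 USING (userid, dbid, toplevel, queryid);
```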

> 
> 3) I don't understand why the patch is adding "IS NOT NULL AS t" to
> various places in the regression tests.

This change is necessary in the current version because the
pg_stat_statements_reset() function will return the timestamp of a reset,
which sampling solutions need in order to detect resets performed by
someone else.


Regards
-- 
Andrei Zubkov

From 8214db3676e686993bcf73963f78c96baeb04c4e Mon Sep 17 00:00:00 2001
From: Andrei Zubkov 
Date: Wed, 25 Jan 2023 18:13:14 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

This patch adds stats_since and minmax_stats_since columns to the
pg_stat_statements view and pg_stat_statements() function. The new min/max reset
mode for the pg_stat_statements_reset() function is controlled by the
parameter minmax_only.
stats_since column is populated with the current timestamp when a new statement
is added to the pg_stat_statements hashtable. It provides clean information
about the statistics collection time interval for each statement. Besides, it can
be used by sampling solutions to detect situations when a statement was evicted
and stored again between samples.
Such a sampling solution could derive any pg_stat_statements statistic values for
an interval between two samples with the exception of all min/max statistics. To
address this issue, this patch adds the ability to reset min/max statistics
independently of the statement reset using the new minmax_only parameter
of the pg_stat_statements_reset(userid oid, dbid oid, queryid bigint,
minmax_only boolean) function. The timestamp of such a reset is stored in the
minmax_stats_since field for each statement.
pg_stat_statements_reset() function now returns the timestamp of a reset as a
result.

Discussion:
https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../expected/oldextversions.out   |  70 
 .../expected/pg_stat_statements.out   | 361 +-
 .../pg_stat_statements--1.10--1.11.sql|  81 
 .../pg_stat_statements/pg_stat_statements.c   | 139 +--
 .../pg_stat_statements.control|   2 +-
 .../pg_stat_statements/sql/oldextversions.sql |   7 +
 .../sql/pg_stat_statements.sql| 149 +++-
 doc/src/sgml/pgstatstatements.sgml|  66 +++-
 9 files changed, 721 insertions(+), 156 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.10--1.11.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index edc40c8bbfb..0afb9060fa1 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,7 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql \
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.10--1.11.sql \
 	pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index efb2049ecff..1e1cc11a8e2 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -250,4 +250,74 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements;
  t
 (1 row)
 
+-- New functions and views for pg_stat_statements in 1.11
+AlTER EXTENSION pg_stat_statements UPDATE TO '1.11';
+\d pg_stat_statements
+  View "public.pg_stat_statements"
+ Column |   Type   | Collation | Nullable | Default 
++--+---+--+-
+ userid | oid  |   |  | 
+ dbid   | oid  |   |  | 
+ toplevel   | boolean  |   |  | 
+ queryid| bigint   |   |  | 
+ query  | text |   |  | 
+ plans  | bigint   |   |  | 
+ total_plan_time| double precision |   |  | 
+ min_plan_time  | 

Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-25 Thread Israel Barth Rubio
Hello Jim/Jacob,

> > I do not think it is worth it to change the current behavior of
> PostgreSQL
> > in that sense.
>
> Well, I am not suggesting to change the current behavior of PostgreSQL in
> that matter. Quite the contrary, I find this feature very convenient,
> specially when you need to deal with many different clusters. What I am
> proposing is rather the possibility to disable it on demand :) I mean,
> in case I do not want libpq to try to authenticate using the certificates
> in `~/.postgresql`.
>
> > PostgreSQL looks for the cert and key under `~/.postgresql` as a
> facility.
> > These files do not exist by default, so if PostgreSQL finds something in
> > there it assumes you want to use it.
>
> Yes. I'm just trying to find an elegant way to disable this assumption
> on demand.

Right, I do understand your proposal. I was just thinking out loud and
wondering about the broad audience of such a mode in the sslmode
argument.

Something else that came to my mind is that sslmode itself seems more like
an argument covering the client's expectations regarding the connection to
the server, I mean, whether it expects channel encryption and/or validation
of the server identity.

If we are willing to add some functionality around the expectations
regarding the client certificate, I wonder whether it wouldn't make more
sense for it to be controlled through something like the clientcert option
of pg_hba? If so, the downside of that is that the client would still send
the certificate even if it would not be used at all by the server. Again,
just thinking out loud about what your goal is and possible ways of
accomplishing it :)
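
(For reference, a sketch of what that server-side knob looks like today --
the address and authentication method are placeholders:)

```
# pg_hba.conf: require a valid, CA-verified client certificate
# on top of the chosen authentication method
hostssl  all  all  10.0.0.0/8  scram-sha-256  clientcert=verify-ca
```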

> > I do realize that this patch is a big ask, since probably nobody except
> > me "needs it" :D
>
> I'd imagine other people have run into it too; it's just a matter of
> how palatable the workarounds were to them. :)

I imagine more people might have already hit a similar situation too. While
the workaround can seem a bit weird, in my very humble opinion the
user/client is somehow still the one to blame in this case, as it is
providing the "wrong" file in a path that is checked by libpq. With that in
mind I would be inclined to say it is an acceptable workaround.

> > I think the sslcertmode=disable option that I introduced in [1]
> > solves this issue too;
>
> Well, I see there is indeed a significant overlap between our patches -
> but yours has a much more comprehensive approach! If I got it right,
> the new slcertmode=disable would indeed cancel the existing certs in
> '~/.postgresql/ in case they exist. Right?
>
> +if (conn->sslcertmode[0] == 'd') /* disable */
> +{
> +/* don't send a client cert even if we have one */
> +have_cert = false;
> +}
> +else if (fnbuf[0] == '\0')
>
> My idea was rather to use the existing sslmode with a new option
> "no-clientcert" that does actually the same:
>
>  /* sslmode no-clientcert */
>  if (conn->sslmode[0] == 'n')
>  {
>  fnbuf[0] = '\0';
>  }
>
>  ...
>
>  if (fnbuf[0] == '\0')
>  {
>  /* no home directory, proceed without a client cert */
>  have_cert = false;
>  }
>
> I wish I had found your patchset some months ago. Now I hate myself
> for the duplication of efforts :D

Although both patches achieve a similar goal regarding not sending the
client certificate, there is still a slight but in my opinion important
difference between them: sslmode=disable will also disable channel
encryption. It may or may not be acceptable depending on the connection
between your client and the server.

Kind regards,
Israel.


Re: New strategies for freezing, advancing relfrozenxid early

2023-01-25 Thread Matthias van de Meent
On Tue, 24 Jan 2023 at 23:50, Peter Geoghegan  wrote:
>
> On Mon, Jan 16, 2023 at 5:55 PM Peter Geoghegan  wrote:
> > 0001 (the freezing strategies patch) is now committable IMV. Or at
> > least will be once I polish the docs a bit more. I plan on committing
> > 0001 some time next week, barring any objections.
>
> I plan on committing 0001 (the freezing strategies commit) tomorrow
> morning, US Pacific time.
>
> Attached is v17. There are no significant differences compared to v16.
> I decided to post a new version now, ahead of commit, to show how I've
> cleaned up the docs in 0001 -- docs describing the new GUC, freeze
> strategies, and so on.

LGTM, +1 on 0001

Some more comments on 0002:

> +lazy_scan_strategy(LVRelState *vacrel, bool force_scan_all)
> scanned_pages_lazy & scanned_pages_eager

We have not yet scanned the pages, so I suggest plan/scan_pages_eager
and *_lazy as variable names instead, to minimize confusion about the
naming.

I'll await the next iteration of 0002 in which you've completed more
TODOs before I'll dig deeper into that patch.


Kind regards,

Matthias van de Meent




Re: Improve GetConfigOptionValues function

2023-01-25 Thread Tom Lane
Nitin Jadhav  writes:
> I agree that the developer can use both GUC_NO_SHOW_ALL and
> GUC_EXPLAIN knowingly or unknowingly for a single GUC. If used by
> mistake then according to the existing code (without patch),
> GUC_NO_SHOW_ALL takes higher precedence whether it is marked first or
> last in the code. I am more convinced with this behaviour as I feel it
> is safer than exposing the information which the developer might not
> have intended.

Both of you are arguing as though GUC_NO_SHOW_ALL is a security
property.  It is not, or at least it's so trivially bypassable
that it's useless to consider it one.  All it is is a de-clutter
mechanism.

regards, tom lane




Re: More pgindent tweaks

2023-01-25 Thread Andrew Dunstan


On 2023-01-25 We 09:41, Bruce Momjian wrote:
> On Wed, Jan 25, 2023 at 08:59:44AM -0500, Andrew Dunstan wrote:
>> After I committed 1249371632 I thought that I should really go ahead and
>> do what I suggested and allow multiple exclude pattern files for
>> pgindent. One obvious case is to exclude an in tree meson build
>> directory. I also sometimes have other in tree objects I'd like to be
>> able to exclude.
>>
>> The attached adds this ability. It also unifies the logic for finding
>> the regular exclude pattern file and the typedefs file.
>>
>> I took the opportunity to remove a badly thought out and dangerous
>> feature whereby the first non-option argument, if it's not a .c or .h
>> file, is taken as the typedefs file. That's particularly dangerous in a
>> situation where git is producing a list of files that have changed and
>> passing it on the command line to pgindent. I also removed a number of
>> extraneous blank lines.
> Can we make the pgindent options more visible, perhaps by adding them to
> pgindent.man or at least saying type pgindent --help?


Sure, will do.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Re: Support plpgsql multi-range in conditional control

2023-01-25 Thread songjinzhou

>Hi

>On Wed, Jan 25, 2023 at 16:39, songjinzhou <2903807...@qq.com> wrote: 
>Hello, my personal understanding is that you can use multiple iterative 
>controls (as a merge) in a for loop, otherwise we can only separate these 
>iterative controls, but in fact, they may do the same thing.

>1. please, don't use top posting in this mailing list 
>https://en.wikipedia.org/wiki/Posting_styl

>2. I understand the functionality, but I don't think there is a real necessity 
>to support this functionality.  Not in this static form, and just for integer 
>type.

>Postgres has a nice generic type "multirange". I can imagine some iterator 
>over the value of multirange, but I cannot imagine the necessity of a richer 
>iterator over just integer range. So the question is, what is the real 
>possible use case of this proposed functionality? 

1. I'm very sorry that my personal negligence has caused obstacles to your 
reading. Thank you for the reminder.
2. With regard to the use of this function, my understanding is relatively 
simple: there are many for loops that may do the same things. We can reduce our 
SQL redundancy by merging iterative controls; it is also more convenient to 
understand and read logically.

As follows, we can only repeat the for statement before we use such SQL:

begin
for i in 10..20 loop
raise notice '%', i; -- Things to do
end loop;

for i in 100 .. 200 by 10 loop
raise notice '%', i; -- Things to do
end loop;
end;

But now we can simplify it as follows:

begin
for i in 10..20, 100 .. 200 by 10 loop
raise notice '%', i; -- Things to do
end loop;
end;

Although we can only use integer iterative control here, this is just a 
horizontal expansion of the previous logic. Thank you very much for your reply. 
I am very grateful!

---

songjinzhou(2903807...@qq.com)


Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-25 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart  
> wrote in 
>> Here is a first attempt at a patch.  I scanned through all the existing
>> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
>> else that needed adjusting.

> There seems to be two cases for DSA_HANDLE_INVALID in dsa_get_handle
> and dsa_attach_in_place, one of which is Assert(), though.

Right.  I fixed some other infelicities and pushed it.

regards, tom lane




Re: Re: Support plpgsql multi-range in conditional control

2023-01-25 Thread Pavel Stehule
On Wed, Jan 25, 2023 at 17:22, songjinzhou <2903807...@qq.com> wrote:

>
> >Hi
>
> >On Wed, Jan 25, 2023 at 16:39, songjinzhou <2903807...@qq.com>
> wrote: Hello, my personal understanding is that you can use multiple
> iterative controls (as a merge) in a for loop, otherwise we can only
> separate these iterative controls, but in fact, they may do the same thing.
>
> >1. please, don't use top posting in this mailing list
> https://en.wikipedia.org/wiki/Posting_styl
>
> >2. I understand the functionality, but I don't think there is a real
> necessity to support this functionality.  Not in this static form, and just
> for integer type.
>
> >Postgres has a nice generic type "multirange". I can imagine some
> iterator over the value of multirange, but I cannot imagine the necessity
> of a richer iterator over just integer range. So the question is, what is
> the real possible use case of this proposed functionality?
>
> 1. I'm very sorry that my personal negligence has caused obstacles to your
> reading. Thank you for your reminding.
> 2. With regard to the use of this function, my understanding is relatively
> simple: there are many for loops that may do the same things. We can reduce
> our sql redundancy by merging iterative control; It is also more convenient
> to understand and read logically.
>
> As follows, we can only repeat the for statement before we use such SQL:
>
> begin
> for i in 10..20 loop
> raise notice '%', i; -- Things to do
> end loop;
>
> for i in 100 .. 200 by 10 loop
> raise notice '%', i; -- Things to do
> end loop;
> end;
>
> But now we can simplify it as follows:
>
> begin
> for i in 10..20, 100 .. 200 by 10 loop
> raise notice '%', i; -- Things to do
> end loop;
> end;
>
> Although we can only use integer iterative control here, this is just a
> horizontal expansion of the previous logic. Thank you very much for your
> reply. I am very grateful!
>
>
Unfortunately, this is not a real use case - this is not an example from
the real world.

Regards

Pavel




> ---
>
> songjinzhou(2903807...@qq.com)
>
>


Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-25 Thread Jacob Champion
On Wed, Jan 25, 2023 at 7:47 AM Israel Barth Rubio
 wrote:
> I imagine more people might have already hit a similar situation too. While 
> the
> workaround can seem a bit weird, in my very humble opinion the user/client is
> somehow still the one to blame in this case as it is providing the "wrong" 
> file in
> a path that is checked by libpq. With that in mind I would be inclined to say 
> it is
> an acceptable workaround.

I'm not sure how helpful it is to assign "blame" here. I think the
requested improvement is reasonable -- it should be possible to
override the default for a particular connection, without having to
pick a junk value that you hope doesn't match up with an actual file
on the disk.

> Although both patches achieve a similar goal regarding not sending the
> client certificate there is still a slight but in my opinion important 
> difference
> between them: sslmode=disable will also disable channel encryption. It
> may or may not be acceptable depending on how the connection is between
> your client and the server.

sslmode=disable isn't used in either of our proposals, though. Unless
I'm missing what you mean?

--Jacob




Re: CREATE ROLE bug?

2023-01-25 Thread Robert Haas
On Wed, Jan 25, 2023 at 9:35 AM Bruce Momjian  wrote:
> So, how would someone with CREATEROLE permission add people to their own
> role, without superuser permission?  Are we adding any security by
> preventing this?

They can't, because a role can't ever have ADMIN OPTION on itself, and
you need ADMIN OPTION on a role to confer membership in that role.

The security argument here is complicated, but I think it basically
boils down to wanting to distinguish between accessing the permissions
of a role and administering the role, or in other words, being a
member of a role is supposed to be different than having ADMIN OPTION
on it. Probably for that reason, we've never allowed making a role a
member of itself. In any release, this fails:

rhaas=# grant bruce to bruce with admin option;
ERROR:  role "bruce" is a member of role "bruce"

If that worked, then role "bruce" would be able to grant membership in
role "bruce" to anyone -- but all those people wouldn't just get
membership, they'd get ADMIN OPTION, too, because *bruce* has ADMIN
OPTION on himself and anyone to whom he grants access to his role will
therefore get admin option too. Someone might argue that this is OK,
but our precedent is otherwise. It used to be the case that the users
implicitly enjoyed ADMIN OPTION on their own roles and thus could do
the sort of thing that you were proposing. That led to CVE-2014-0060
and commit fea164a72a7bfd50d77ba5fb418d357f8f2bb7d0. That CVE is, as I
understand it, all about maintaining the distinction between
membership and ADMIN OPTION. In other words, we've made an intentional
decision to not let ordinary users do the sort of thing you tried to
do here.

So the only reason your example ever worked is because the "service"
role had CREATEROLE, and thus, in earlier releases, got to bypass all
the permissions checks. But it turns out that letting CREATEROLE
bypass all the permissions checks is *also* a big security problem, so
that is now restricted as well.

I don't want to take the position that we couldn't find some way to
allow ordinary users to do this. I think that the desire to maintain
the distinction between membership and ADMIN OPTION makes sense as a
general rule, but if somebody wants to abolish it in a particular case
by making strange grants, would that really be that bad? I'm not
totally convinced that it would be. It probably depends somewhat on
how much you want to try to keep people from accidentally giving away
more privileges than they intended, and also on whether you think that
this is a useful thing for someone to be able to do in the first
place. However, it's the way we've designed the system and we've even
requested CVEs when we accidentally did something inconsistent with
that general principle. Similarly, I don't want to take the position
that the restrictions I put on CREATEROLE are the *only* way that we
could have plugged the security holes that it has had for years now. I
think they are pretty sensible and pretty consistent with the overall
system design, but somebody else might have been able to come up with
some other set of restrictions that allowed this case to work.

I think David is right to raise the question of how useful it would be
to allow this case. In general, I think that if role A creates role B,
it is more sensible to grant B's permissions to A than to grant A's
permissions to B. The former is useful because it permits the more
powerful user to act on behalf of the less powerful user, just as the
superuser is able to administer the whole system by being able to act
on behalf of any other user. But the latter makes you wonder why you
are even bothering to have two users, because you end up with A and B
having exactly identical privileges. That's a bit of a strange thing
to want, but if you do happen to want it, you can get it with this new
system: again, as David says, you should just create one role to use
as a group, and then grant membership in that group to multiple roles
that are used as users.

But it does seem pretty important to keep talking about these things,
because there's definitely no guarantee whatsoever that all of the
commits I've made to master in this area are without problems. If we
find important cases that can't be supported given the new
restrictions on CREATEROLE, or even important cases that never worked
but we wish they did, well then we should think about what to change.
I'm not too concerned about this particular case not working, but the
next one might be different.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread Tomas Vondra


On 1/25/23 16:37, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby 
>  wrote:
> 
> 
>>
>>
>> On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote:
>>
>>> On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby 
>>> pry...@telsasoft.com wrote:
>>>
 On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote:

> Please find attached v23 which reintroduces the split.
>
> 0001 is reworked to have a reduced footprint than before. Also in an 
> attempt
> to facilitate the readability, 0002 splits the API's and the uncompressed
> implementation in separate files.

 Thanks for updating the patch. Could you address the review comments I
 sent here ?
 https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
>>>
>>> Please find v24 attached.
>>
>>
>> Thanks for updating the patch.
>>
>> In 001, RestoreArchive() does:
>>
>>> -#ifndef HAVE_LIBZ
>>> - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
>>> - AH->PrintTocDataPtr != NULL)
>>> + supports_compression = false;
>>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
>>> + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
>>> + supports_compression = true;
>>> +
>>> + if (AH->PrintTocDataPtr != NULL)
>>> {
>>> for (te = AH->toc->next; te != AH->toc; te = te->next)
>>> {
>>> if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
>>> - pg_fatal("cannot restore from compressed archive (compression not 
>>> supported in this installation)");
>>> + {
>>> +#ifndef HAVE_LIBZ
>>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
>>> + supports_compression = false;
>>> +#endif
>>> + if (supports_compression == false)
>>> + pg_fatal("cannot restore from compressed archive (compression not 
>>> supported in this installation)");
>>> + }
>>> }
>>> }
>>> -#endif
>>
>>
>> This first checks if the algorithm is implemented, and then checks if
>> the algorithm is supported by the current build - that confused me for a
>> bit. It seems unnecessary to check for unimplemented algorithms before
>> looping. That also requires referencing both GZIP and LZ4 in two
>> places.
> 
> I am not certain that it is unnecessary, at least not in the way that is
> described. The idea is that new compression methods can be added, without
> changing the archive's version number. It is very possible that it is
> requested to restore an archive compressed with a method not implemented
> in the current binary. The first check takes care of that and sets
> supports_compression only for the supported versions. It is possible to
> enter the loop with supports_compression already set to false, for example
> because the archive was compressed with ZSTD, triggering the fatal error.
> 
> Of course, one can throw the error before entering the loop, yet I think
> that it does not help the readability of the code. IMHO it is easier to
> follow if the error is thrown once during that check.
> 

Actually, I don't understand why 0001 moves the check into the loop. I
mean, why not check HAVE_LIBZ before the loop?

>>
>> I think it could be written to avoid the need to change for added
>> compression algorithms:
>>
>> + if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
>>
>> + {
>> + /* Check if the compression algorithm is supported */
>> + pg_compress_specification spec;
>> + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec);
>>
>> + if (spec->parse_error != NULL)
>>
>> + pg_fatal(spec->parse_error);
>>
>> + }
> 
> I am not certain how that would work in the example with ZSTD above.
> If I am not wrong, parse_compress_specification() will not throw an error
> if the codebase supports ZSTD, yet this specific pg_dump binary will not
> support it because ZSTD is not implemented. parse_compress_specification()
> is not aware of that and should not be aware of it, should it?
> 

Not sure. What happens in a similar situation now? That is, when trying
to deal with an archive gzip-compressed in a build without libz?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: CREATE ROLE bug?

2023-01-25 Thread Bruce Momjian
On Wed, Jan 25, 2023 at 12:21:14PM -0500, Robert Haas wrote:
> But it does seem pretty important to keep talking about these things,
> because there's definitely no guarantee whatsoever that all of the
> commits I've made to master in this area are without problems. If we
> find important cases that can't be supported given the new
> restrictions on CREATEROLE, or even important cases that never worked
> but we wish they did, well then we should think about what to change.
> I'm not too concerned about this particular case not working, but the
> next one might be different.

Agreed, thanks for the details.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Robert Haas
On Wed, Jan 25, 2023 at 12:37 AM Michael Paquier  wrote:
> Rename contrib module basic_archive to basic_wal_module

FWIW, I find this new name much less clear than the old one.

If we want to provide a basic_archive module and a basic_recovery
module, that seems fine. Why merge them?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-25 Thread Nathan Bossart
On Wed, Jan 25, 2023 at 04:12:00PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart  
> wrote in 
>> Here is a first attempt at a patch.  I scanned through all the existing
>> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
>> else that needed adjusting.
> 
> There seems to be two cases for DSA_HANDLE_INVALID in dsa_get_handle
> and dsa_attach_in_place, one of which is Assert(), though.

Ah, sorry, I'm not sure how I missed this.  Thanks for looking.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-25 Thread Nathan Bossart
On Wed, Jan 25, 2023 at 11:49:27AM -0500, Tom Lane wrote:
> Right.  I fixed some other infelicities and pushed it.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread Justin Pryzby
On Wed, Jan 25, 2023 at 03:37:12PM +, gkokola...@pm.me wrote:
> Of course, one can throw the error before entering the loop, yet I think
> that it does not help the readability of the code. IMHO it is easier to
> follow if the error is thrown once during that check.

> If anything, I can suggest to throw an error much earlier, i.e. in ReadHead(),
> and remove altogether this check. On the other hand, I like the belts
> and suspenders approach because there are no more checks after this point.

While looking at this, I realized that commit 5e73a6048 introduced a
regression:

@@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)

-   if (AH->compression != 0)
-   pg_log_warning("archive is compressed, but this installation 
does not support compression -- no data will be available");
+   if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
+   pg_fatal("archive is compressed, but this installation does not 
support compression");

Before, it was possible to restore non-data chunks of a dump file, even
if the current build didn't support its compression.  But that's now
impossible - and it makes the code we're discussing in RestoreArchive()
unreachable.

I don't think we can currently test for that, since it requires creating a dump
using a build --with compression and then trying to restore using a build
--without compression.  The coverage report disagrees with me, though...
https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901

> > I think it could be written to avoid the need to change for added
> > compression algorithms:
...
> 
> I am not certain how that would work in the example with ZSTD above.
> If I am not wrong, parse_compress_specification() will not throw an error
> if the codebase supports ZSTD, yet this specific pg_dump binary will not
> support it because ZSTD is not implemented. parse_compress_specification()
> is not aware of that and should not be aware of it, should it?

You're right.

I think the 001 patch should try to remove hardcoded references to
LIBZ/GZIP, such that the later patches don't need to update those same
places for LZ4.  For example in ReadHead() and RestoreArchive(), and
maybe other places dealing with file extensions.  Maybe that could be
done by adding a function specific to pg_dump indicating whether or not
an algorithm is implemented and supported.
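
Something along these lines, perhaps (an untested sketch; the function name
and placement are only illustrative, and it assumes common/compression.h is
included so the pg_compress_algorithm enum is in scope):

```c
/*
 * Does this pg_dump/pg_restore build implement and support the given
 * archive compression algorithm?
 */
static bool
supports_compression(pg_compress_algorithm algorithm)
{
	switch (algorithm)
	{
		case PG_COMPRESSION_NONE:
			return true;
		case PG_COMPRESSION_GZIP:
#ifdef HAVE_LIBZ
			return true;
#else
			return false;
#endif
		case PG_COMPRESSION_LZ4:
#ifdef USE_LZ4
			return true;
#else
			return false;
#endif
		default:
			/* known to the spec (e.g. ZSTD) but not implemented here */
			return false;
	}
}
```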

-- 
Justin




Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Nathan Bossart
On Wed, Jan 25, 2023 at 12:49:45PM -0500, Robert Haas wrote:
> On Wed, Jan 25, 2023 at 12:37 AM Michael Paquier  wrote:
>> Rename contrib module basic_archive to basic_wal_module
> 
> FWIW, I find this new name much less clear than the old one.
> 
> If we want to provide a basic_archive module and a basic_recovery
> module, that seems fine. Why merge them?

I'll admit I've been stewing on whether "WAL Modules" is the right name.
My first instinct was to simply call it "Archive and Recovery Modules,"
which is longer but (IMHO) clearer.

I wanted to merge basic_archive and basic_recovery because there's a decent
chunk of duplicated code.  Perhaps that is okay, but I would rather just
have one test module.  AFAICT the biggest reason to split it is because we
can't determine a good name.  Maybe we could leave the name as
"basic_archive" since it deals with creating and recovering archive files.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-25 Thread Israel Barth Rubio
Hello Jacob,

> I'm not sure how helpful it is to assign "blame" here. I think the
> requested improvement is reasonable -- it should be possible to
> override the default for a particular connection, without having to
> pick a junk value that you hope doesn't match up with an actual file
> on the disk.

Right, I agree we can look for improvements. "blame" was likely
not the best word to express myself in that message.

> sslmode=disable isn't used in either of our proposals, though. Unless
> I'm missing what you mean?

Sorry about the noise, I misread the code snippet shared earlier
(sslmode x sslcertmode). I just took a closer read at the previously
mentioned patch about sslcertmode and it seems a bit
more elegant way of achieving something similar to what has
been proposed here.

Best regards,
Israel.

On Wed, Jan 25, 2023 at 14:09, Jacob Champion <jchamp...@timescale.com> wrote:

> On Wed, Jan 25, 2023 at 7:47 AM Israel Barth Rubio
>  wrote:
> > I imagine more people might have already hit a similar situation too.
> While the
> > workaround can seem a bit weird, in my very humble opinion the
> user/client is
> > somehow still the one to blame in this case as it is providing the
> "wrong" file in
> > a path that is checked by libpq. With that in mind I would be inclined
> to say it is
> > an acceptable workaround.
>
> I'm not sure how helpful it is to assign "blame" here. I think the
> requested improvement is reasonable -- it should be possible to
> override the default for a particular connection, without having to
> pick a junk value that you hope doesn't match up with an actual file
> on the disk.
>
> > Although both patches achieve a similar goal regarding not sending the
> > client certificate there is still a slight but in my opinion important
> difference
> > between them: sslmode=disable will also disable channel encryption. It
> > may or may not be acceptable depending on how the connection is between
> > your client and the server.
>
> sslmode=disable isn't used in either of our proposals, though. Unless
> I'm missing what you mean?
>
> --Jacob
>


Re: heapgettup() with NoMovementScanDirection unused in core?

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-25 10:02:28 -0500, Tom Lane wrote:
> David Rowley  writes:
> > Does anyone know of any reason why we shouldn't ditch the nomovement
> > code in heapgettup/heapgettup_pagemode?

+1

Because I dug it up yesterday. There used to be callers of heap* with
NoMovement. But they were unused themselves:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=adbfab119b308a7e0e6b1305de9be222cfd5c85b


>  *If direction is NoMovementScanDirection then nothing is done
>  *except to start up/shut down the destination.  Otherwise,
>  *we retrieve up to 'count' tuples in the specified direction.
>  *
>  *Note: count = 0 is interpreted as no portal limit, i.e., run to
>  *completion.
> 
> We must have the NoMovementScanDirection option because count = 0
> does not mean "do nothing", and I noted at least two call sites
> that require it.

I wonder if we'd be better off removing NoMovementScanDirection, and using
count == (uint64)-1 for what NoMovementScanDirection is currently used for as
an ExecutorRun parameter.  Seems less confusing to me - right now we have two
parameters with non-obvious meanings and interactions.


> I wonder if we couldn't also get rid of this confusingly-inconsistent
> alternative usage in the planner:
> 
>  * 'indexscandir' is one of:
>  *ForwardScanDirection: forward scan of an ordered index
>  *BackwardScanDirection: backward scan of an ordered index
>  *NoMovementScanDirection: scan of an unordered index, or don't care
>  * (The executor doesn't care whether it gets ForwardScanDirection or
>  * NoMovementScanDirection for an indexscan, but the planner wants to
>  * distinguish ordered from unordered indexes for building pathkeys.)

+1

Certainly seems confusing to me.

Greetings,

Andres Freund




Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-25 18:45:12 +0300, Aleksander Alekseev wrote:
> Currently we allow self-conflicting inserts for ON CONFLICT DO NOTHING:
> 
> ```
> CREATE TABLE t (a INT UNIQUE, b INT);
> INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT DO NOTHING;
> -- succeeds, inserting the first row and ignoring the second
> ```
> ... but not for ON CONFLICT .. DO UPDATE:
> 
> ```
> INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
> ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
> HINT: Ensure that no rows proposed for insertion within the same
> command have duplicate constrained values.
> ```
> 
> Tom pointed out in 2016 that this is actually a bug [1] and I agree.

I don't think I agree with this being a bug.

We can't sensibly implement updating a row twice within a statement - hence
erroring out for ON CONFLICT DO UPDATE affecting a row twice. But what's the
justification for erroring out in the DO NOTHING case? ISTM that it's useful
to be able to handle such duplicates, and I don't immediately see what
semantic confusion or implementation difficulty we avoid by erroring out.

It seems somewhat likely that a behavioural change will cause trouble for some
of the uses of DO NOTHING out there.

Greetings,

Andres Freund




pg_upgrade from PG-14.5 to PG-15.1 failing due to non-existing function

2023-01-25 Thread Dimos Stamatakis
Hi hackers,

I attempted to perform an upgrade from PG-14.5 to PG-15.1 with pg_upgrade and 
unfortunately it errors out because of a function that does not exist anymore 
in PG-15.1.
The function is ‘pg_catalog.close_lb’ and it exists in 14.5 but not in 15.1.
In our scenario we changed the permissions of this function in PG14.5 (via an 
automated tool) and then pg_upgrade tries to change the permissions in PG15.1 
as well.


Steps to reproduce:


  1.  Run initdb for 14.5
  2.  Run initdb for 15.1
  3.  Run psql client on 14.5
 *   postgres=# REVOKE ALL ON FUNCTION close_lb(line, box) FROM $USER;
  4.  Run pg_upgrade from 14.5 to 15.1

This will error out because pg_upgrade will attempt to REVOKE the permissions 
on close_lb on 15.1.
Is there a way to specify which functions/objects to exclude in pg_upgrade?
Thanks in advance!

Dimos
(ServiceNow)


Re: Transparent column encryption

2023-01-25 Thread Peter Eisentraut

On 07.01.23 01:34, Justin Pryzby wrote:

"ON (CASE WHEN a.attrealtypid <> 0 THEN a.attrealtypid ELSE a.atttypid END = 
t.oid)\n"

This breaks interoperability with older servers:
ERROR:  column a.attrealtypid does not exist

Same in describe.c

Find attached some typos and bad indentation.  I'm sending this off now
as I've already sat on it for 2 weeks since starting to look at the
patch.


Thanks, I have integrated all that into the v15 patch I just posted.






Re: Transparent column encryption

2023-01-25 Thread Peter Eisentraut

On 12.01.23 17:32, Peter Eisentraut wrote:
Can we do anything about the attack vector wherein a malicious DBA 
simply copies the encrypted datum from one row to another?


We discussed this earlier [0].  This patch is not that feature.  We 
could get there eventually, but it would appear to be an immense amount 
of additional work.  We have to start somewhere.


I've been thinking, this could be done as a "version 2" of the currently 
proposed feature, within the same framework.  We'd extend the 
RowDescription and ParameterDescription messages to provide primary key 
information, some flags, then the client would have enough to know what 
to do.  As you wrote in your follow-up message, a challenge would be to 
handle statements that do not touch all the columns.  We'd need to work 
through this and consider all the details.






Re: Transparent column encryption

2023-01-25 Thread Peter Eisentraut

On 19.01.23 21:48, Jacob Champion wrote:

I like the existing "caveats" documentation, and I've attached a sample
patch with some more caveats documented, based on some of the upthread
conversation:

- text format makes fixed-length columns leak length information too
- you only get partial protection against the Evil DBA
- RSA-OAEP public key safety

(Feel free to use/remix/discard as desired.)


I have added those in the v15 patch I just posted.


When writing the paragraph on RSA-OAEP I was reminded that we didn't
really dig into the asymmetric/symmetric discussion. Assuming that most
first-time users will pick the builtin CMK encryption method, do we
still want to have an asymmetric scheme implemented first instead of a
symmetric keywrap? I'm still concerned about that public key, since it
can't really be made public.


I had started coding that, but one problem was that the openssl CLI 
doesn't really provide any means to work with those kinds of keys.  The 
"openssl enc" command always wants to mix in a password.  Without that, 
there is no way to write a test case, and more crucially no way for 
users to set up these kinds of keys.  Unless we write our own tooling 
for this, which, you know, the patch just passed 400k in size.



For the padding caveat:


+  There is no concern if all values are of the same length (e.g., credit
+  card numbers).


I nodded along to this statement last year, and then this year I learned
that CCNs aren't fixed-length. So with a 16-byte block, you're probably
going to be able to figure out who has an American Express card.


Heh.  I have removed that parenthetical remark.


The column encryption algorithm is set per-column -- but isn't it
tightly coupled to the CEK, since the key length has to match? From a
layperson perspective, using the same key to encrypt the same plaintext
under two different algorithms (if they happen to have the same key
length) seems like it might be cryptographically risky. Is there a
reason I should be encouraged to do that?


Not really.  I was also initially confused by this setup, but that's how 
other similar systems are set up, so I thought it would be confusing to 
do it differently.



With the loss of \gencr it looks like we also lost a potential way to
force encryption from within psql. Any plans to add that for v1?


\gencr didn't do that either.  We could do it.  The libpq API supports 
it.  We just need to come up with some syntax for psql.



While testing, I forgot how the new option worked and connected with
`column_encryption=on` -- and then I accidentally sent unencrypted data
to the server, since `on` means "not enabled". :( The server errors out
after the damage is done, of course, but would it be okay to strictly
validate that option's values?


fixed in v15


Are there plans to document client-side implementation requirements, to
ensure cross-client compatibility? Things like the "PG\x00\x01"
associated data are buried at the moment (or else I've missed them in
the docs). If you're holding off until the feature is more finalized,
that's fine too.


This is documented in the protocol chapter, which I thought was the 
right place.  Did you want more documentation, or in a different place?



Speaking of cross-client compatibility, I'm still disconcerted by the
ability to write the value "hello world" into an encrypted integer
column. Should clients be required to validate the text format, using
the attrealtypid?


Well, we can ask them to, but we can't really require them, in a 
cryptographic sense.  I'm not sure what more we can do.



It occurred to me when looking at the "unspecified" CMK scheme that the
CEK doesn't really have to be an encryption key at all. In that case it
can function more like a (possibly signed?) cookie for lookup, or even
be ignored altogether if you don't want to use a wrapping scheme
(similar to JWE's "direct" mode, maybe?). So now you have three ways to
look up or determine a column encryption key (CMK realm, CMK name, CEK
cookie)... is that a concept worth exploring in v1 and/or the documentation?


I don't completely follow this.





Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-01-25 Thread Aleksander Alekseev
Hi Andres,

> I don't think I agree with this being a bug.

Perhaps that's not a bug especially considering the fact that the
documentation describes this behavior, but in any case the fact that:

```
INSERT INTO t VALUES (1,1) ON CONFLICT (a) DO UPDATE SET b = 0;
INSERT INTO t VALUES (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
```

and:

```
INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO NOTHING;
```

.. both work, and:

```
INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
```

... doesn't, is rather confusing. There is no reason why the last
query shouldn't work except for a slight complication of the code,
which seems to be a reasonable tradeoff, at least to me.

> But what's the justification for erroring out in the DO NOTHING case?
>
> [...]
>
> It seems somewhat likely that a behavioural change will cause trouble for some
> of the uses of DO NOTHING out there.

Just to make sure we are on the same page. The patch doesn't break the
current DO NOTHING behavior but rather makes DO UPDATE work the same
way DO NOTHING does.

-- 
Best regards,
Aleksander Alekseev




Re: Syncrep and improving latency due to WAL throttling

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-25 14:32:51 +0100, Jakub Wartak wrote:
> In other words it allows slow down of any backend activity. Any feedback on
> such a feature is welcome, including better GUC name proposals ;) and
> conditions in which such feature should be disabled even if it would be
> enabled globally (right now only anti- wraparound VACUUM comes to mind, it's
> not in the patch).

Such a feature could be useful - but I don't think the current place of
throttling has any hope of working reliably:

> @@ -1021,6 +1025,21 @@ XLogInsertRecord(XLogRecData *rdata,
>   pgWalUsage.wal_bytes += rechdr->xl_tot_len;
>   pgWalUsage.wal_records++;
>   pgWalUsage.wal_fpi += num_fpi;
> +
> + backendWalInserted += rechdr->xl_tot_len;
> +
> + if ((synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY || 
> synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_WRITE) &&
> + synchronous_commit_flush_wal_after > 0 &&
> + backendWalInserted > synchronous_commit_flush_wal_after 
> * 1024L)
> + {
> + elog(DEBUG3, "throttling WAL down on this session 
> (backendWalInserted=%d)", backendWalInserted);
> + XLogFlush(EndPos);
> + /* XXX: refactor SyncRepWaitForLSN() to have different 
> waitevent than default WAIT_EVENT_SYNC_REP  */
> + /* maybe new WAIT_EVENT_SYNC_REP_BIG or something like 
> that */
> + SyncRepWaitForLSN(EndPos, false);
> + elog(DEBUG3, "throttling WAL down on this session - 
> end");
> + backendWalInserted = 0;
> + }
>   }

You're blocking in the middle of an XLOG insertion. We will commonly hold
important buffer lwlocks, it'll often be in a critical section (no cancelling
/ terminating the session!). This'd entail loads of undetectable deadlocks
(i.e. hard hangs).  And even leaving that aside, doing an unnecessary
XLogFlush() with important locks held will seriously increase contention.


My best idea for how to implement this in a somewhat safe way would be for
XLogInsertRecord() to set a flag indicating that we should throttle, and set
InterruptPending = true. Then the next CHECK_FOR_INTERRUPTS that's allowed to
proceed (i.e. we'll not be in a critical / interrupts off section) can
actually perform the delay.  That should fix the hard deadlock danger and
remove most of the increase in lock contention.
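
Roughly, the shape could be something like the following. This is an illustrative sketch only - backendWalThrottlePending, backendWalSinceLastFlush and throttle_target_lsn are made-up names, and the real patch would need to pick the exact placement and locking details:

```c
/* In XLogInsertRecord(), after accounting for the inserted bytes: */
backendWalSinceLastFlush += rechdr->xl_tot_len;
if (synchronous_commit_flush_wal_after > 0 &&
	backendWalSinceLastFlush > synchronous_commit_flush_wal_after * 1024L)
{
	backendWalThrottlePending = true;
	InterruptPending = true;	/* ask CHECK_FOR_INTERRUPTS() to look at it */
}

/* In ProcessInterrupts(), i.e. never inside a critical section: */
if (backendWalThrottlePending)
{
	backendWalThrottlePending = false;
	XLogFlush(throttle_target_lsn);		/* some LSN behind the latest insert */
	SyncRepWaitForLSN(throttle_target_lsn, false);
	backendWalSinceLastFlush = 0;
}
```

Exactly which LSN to flush and wait for, and when to reset the counter, would follow the refinements discussed below.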


I don't think doing an XLogFlush() of a record that we just wrote is a good
idea - that'll sometimes significantly increase the number of fdatasyncs/sec
we're doing. To make matters worse, this will often cause partially filled WAL
pages to be flushed out - rewriting a WAL page multiple times can
significantly increase overhead on the storage level. At the very least this'd
have to flush only up to the last fully filled page.


Just counting the number of bytes inserted by a backend will make the overhead
even worse, as the flush will be triggered even for OLTP sessions doing tiny
transactions, even though they don't contribute to the problem you're trying
to address.  How about counting how many bytes of WAL a backend has inserted
since the last time that backend did an XLogFlush()?

A bulk writer won't do a lot of XLogFlush()es, so the time/bytes since the
last XLogFlush() will increase quickly, triggering a flush at the next
opportunity. But short OLTP transactions will do XLogFlush()es at least at
every commit.


I also suspect the overhead will be more manageable if you were to force a
flush only up to a point further back than the last fully filled page. We
don't want to end up flushing WAL for every page, but if you just have a
backend-local accounting mechanism, I think that's inevitably what you'd end
up with when you have a large number of sessions. But if you'd limit the
flushing to be done to synchronous_commit_flush_wal_after / 2 boundaries, and
only ever to a prior boundary, the amount of unnecessary WAL flushes would be
proportional to synchronous_commit_flush_wal_after.


Greetings,

Andres Freund




Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Robert Haas
On Wed, Jan 25, 2023 at 1:17 PM Nathan Bossart  wrote:
> On Wed, Jan 25, 2023 at 12:49:45PM -0500, Robert Haas wrote:
> > On Wed, Jan 25, 2023 at 12:37 AM Michael Paquier  
> > wrote:
> >> Rename contrib module basic_archive to basic_wal_module
> >
> > FWIW, I find this new name much less clear than the old one.
> >
> > If we want to provide a basic_archive module and a basic_recovery
> > module, that seems fine. Why merge them?
>
> I'll admit I've been stewing on whether "WAL Modules" is the right name.
> My first instinct was to simply call it "Archive and Recovery Modules,"
> which is longer but (IMHO) clearer.
>
> I wanted to merge basic_archive and basic_recovery because there's a decent
> chunk of duplicated code.  Perhaps that is okay, but I would rather just
> have one test module.  AFAICT the biggest reason to split it is because we
> can't determine a good name.  Maybe we could leave the name as
> "basic_archive" since it deals with creating and recovering archive files.

Yeah, maybe. I'm not sure what the best thing to do is, but if I see a
module called basic_archive or basic_restore, I know what it's about,
whereas basic_wal_module seems a lot less specific. It sounds like it
could be generating or streaming it just as easily as it could be
archiving it. It would be nice to have a name that is less prone to
that kind of unclarity.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: What object types should be in schemas?

2023-01-25 Thread Peter Eisentraut

On 12.01.23 18:41, Alvaro Herrera wrote:

I think one important criterion to think about is how does encryption work
when you have per-customer (or per-whatever) schemas.  Is the concept of
a column encryption [objtype] a thing that you would like to set up per
customer?  In that case, you will probably want that object to live in
that customer's schema.  Otherwise, you'll force the DBA to come up with
a naming scheme that includes the customer name in the column encryption
object.


Makes sense.  In my latest patch I have moved these key objects into 
schemas.






Re: pg_upgrade from PG-14.5 to PG-15.1 failing due to non-existing function

2023-01-25 Thread Christoph Moench-Tegeder
## Dimos Stamatakis (dimos.stamata...@servicenow.com):

> In our scenario we changed the permissions of this function in PG14.5
> (via an automated tool) and then pg_upgrade tries to change the
> permissions in PG15.1 as well.

Given that this function wasn't even documented and did nothing but
throw an error "function close_lb not implemented" - couldn't you
revert that permissions change for the upgrade? (if it comes to the
worst, a superuser could UPDATE pg_catalog.pg_proc and set proacl
to NULL for that function, but that's not how you manage ACLs in
production, it's for emergency fixing only).
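
For illustration, on the 14.5 cluster that could look roughly like this (the exact GRANT needed depends on what your tool originally revoked; "your_user" is a placeholder):

```sql
-- Preferred: put the privileges back before running pg_upgrade.
GRANT ALL ON FUNCTION pg_catalog.close_lb(line, box) TO your_user;

-- Emergency-only alternative, as superuser: clear the stored ACL so that
-- pg_dump/pg_upgrade no longer emits privilege commands for this function.
UPDATE pg_catalog.pg_proc
SET proacl = NULL
WHERE proname = 'close_lb'
  AND pronamespace = 'pg_catalog'::regnamespace;
```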

Regards,
Christoph

-- 
Spare Space




Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-01-25 Thread Peter Geoghegan
On Wed, Jan 25, 2023 at 11:01 AM Aleksander Alekseev
 wrote:
> Just to make sure we are on the same page. The patch doesn't break the
> current DO NOTHING behavior but rather makes DO UPDATE work the same
> way DO NOTHING does.

It also makes DO UPDATE not work the same way as either UPDATE itself
(which will silently skip a second or subsequent update of the same
row by the same UPDATE statement in RC mode), or MERGE (which has
similar cardinality violations).

DO NOTHING doesn't lock any conflicting row, and so won't have to
dirty pages that have matching rows. It was always understood to be
more susceptible to certain issues (when in READ COMMITTED mode) as a
result. There are some halfway reasonable arguments against this sort
of behavior, but I believe that we made the right trade-off.

-- 
Peter Geoghegan




Re: Proposal: Support custom authentication methods using hooks

2023-01-25 Thread Andrey Chudnovsky
Greetings,

Want to resurface the OAUTH support topic in the context of the
concerns raised here.

> How about- if we just added OAUTH support directly into libpq and the
> backend, would that work with Azure's OIDC provider? If not, why not?
> If it does, then what's the justification for trying to allow custom
> backend-only authentication methods?

We've explored this option, and prepared a patch which has bare-bone
OAUTHBEARER mechanism implementation on both server- and libpq- sides.
Based on existing SASL exchange support used for SCRAM.

Most of the work has been done by @Jacob Champion and was referenced
in this thread before.
We validated the approach would work with Azure AD and other OIDC
providers with token validation logic delegated to provider-specific
extension.

The thread link is here:
https://www.postgresql.org/message-id/flat/CABkiuWo4fJQ7dhqgYLtJh41kpCkT6iXOO8Eym3Rdh5tx2RJCJw%40mail.gmail.com#f94c36969a68a07c087fa9af0f5401e1

With the client- and server- side support, the proposed patch would allow:
- Identity providers to publish PG extensions to validate their
specific token format.
- Client ecosystem to build generic OAUTH flows using OIDC provider
metadata defined in RFCs
- Cloud providers to support a combination of Identity providers they work with.

Looking forward to bring the discussion to that thread, and decide if
we can proceed with the OAUTH support.

Thanks,
Andrey.

On Wed, Jan 25, 2023 at 10:55 AM Stephen Frost  wrote:
>
> Greetings,
>
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2022-08-03 17:21:58 -0400, Stephen Frost wrote:
> > > * Andres Freund (and...@anarazel.de) wrote:
> > > > On 2022-08-03 16:28:08 -0400, Stephen Frost wrote:
> > > > > Again, server-side only is not interesting and not a direction that
> > > > > makes sense to go in because it doesn't provide any way to have
> > > > > trust established in both directions, which is what all modern
> > > > > authentication methods do (certificates, kerberos, scram) and is what 
> > > > > we
> > > > > should expect from anything new in this space.
> > > >
> > > > As explained repeatedly before, that's plainly not true. The patch 
> > > > allows
> > > > e.g. to use the exact scram flow we already have, with the code we have 
> > > > for
> > > > that (e.g. using a different source of secret).  In fact the example 
> > > > extension
> > > > does so (starting in v3, from 2022-03-15):
> > >
> > > Sure, thanks to the bespoke code in libpq for supporting SCRAM.  I don't
> > > think it makes sense to move the server-side code for that into an
> > > extension as it just makes work for a bunch of people.  Having a way to
> > > have the authenticator token for scram exist elsewhere doesn't strike me
> > > as unreasonable but that's not the same thing and is certainly more
> > > constrained in terms of what it's doing.
> >
> > The question is: Why is providing that ability not a good step, even if
> > somewhat constrainted? One argument would be that a later "protocol level
> > extensibility" effort would require significant changes to the API - but I
> > don't think that'd be the case. And even if, this isn't a large 
> > extensibility
> > surface, so API evolution wouldn't be a problem.
>
> I'm quite confused by this as I feel like I've said multiple times that
> having a way to have the SCRAM authenticator token come from somewhere
> else seems reasonable.  If that's not what you're referring to above by
> 'that ability' then it'd really help if you could clarify what you mean
> there.
>
> > > Further, I outlined exactly how extensability in this area could be
> > > achieved: use some good third party library that provides multiple SASL
> > > methods then just add support for that library to *PG*, on both the
> > > libpq and the server sides.
> >
> > > What I don't think makes sense is adding extensibility to the server
> > > side, especially if it's through a 3rd party library, but then not
> > > adding support for that to libpq.  I don't follow what the rationale is
> > > for explicitly excluding libpq from this discussion.
> >
> > The rationale is trivial: Breaking down projects into manageable, separately
> > useful, steps is a very sensible way of doing development. Even if we get
> > protocol extensibility, we're still going to need an extension API like it's
> > provided by the patchset. After protocol extensibility the authentication
> > extensions would then have more functions it could call to control the auth
> > flow with the client.
> >
> > > To be clear- I'm not explicitly saying that we can only add
> > > extensibility with SCRAM, I'm just saying that whatever we're doing here
> > > to add other actual authentication methods we should be ensuring is done
> > > on both sides with a way for each side to authenticate the other side,
> > > as all of the modern authentication methods we have already do.
> >
> > But *why* do these need to be tied together?
>
> I'm also confused by this.  Surely you'd need a

Re: Implement missing join selectivity estimation for range types

2023-01-25 Thread Mahmoud Sakr
Hi Tomas,

> I finally had time to properly read the paper today - the general
> approach mostly matches how I imagined the estimation would work for
> inequalities, but it's definitely nice to see the algorithm properly
> formalized and analyzed.

Awesome, thanks for this interest!

> What seems a bit strange to me is that the patch only deals with range
> types, leaving the scalar cases unchanged. I understand why (not having
> a MCV simplifies it a lot), but I'd bet joins on range types are way
> less common than inequality joins on scalar types. I don't even remember
> seeing inequality join on a range column, TBH.
>
> That doesn't mean the patch is wrong, of course. But I'd expect users to
> be surprised we handle range types better than "old" scalar types (which
> range types build on, in some sense).
>
> Did you have any plans to work on improving estimates for the scalar
> case too? Or did you do the patch needed for the paper, and have no
> plans to continue working on this?

I fully agree. Scalars are way more important.
I join you in the call for Diogo and Zhicheng to continue this implementation,
especially after the interest you have shown in this patch. The current patch
was a course project (taught by me and Maxime), which was specific to range
types. But indeed the solution generalizes well to scalars. Hopefully after the
current exam session (Feb), there will be time to continue the implementation.
Nevertheless, it makes sense to do two separate patches: this one, and
the scalars. The code of the two patches is located in different files, and
the estimation algorithms have slight differences.

> I'm also wondering about not having MCV for ranges. I was a bit
> surprised we don't build MCV in compute_range_stats(), and perhaps we
> should start building those - if there are common ranges, this might
> significantly improve some of the estimates (just like for scalar
> columns). Which would mean the estimates for range types are just as
> complex as for scalars. Of course, we don't do that now.

Good question. Our intuition is that MCV will not be useful for ranges.
Maxime has done an experiment and confirmed this intuition. Here is his
experiment and explanation:
Create a table with 126000 int4ranges.
All ranges have their middle between 0 and 1000 and a length between 90
and 110.
The ranges are created in the following way:
- 10 different ranges, each duplicated 1000 times
- 20 different ranges, each duplicated 500 times
- 40 different ranges, each duplicated 100 times
- 200 different ranges, each duplicated 10 times
- 10 different ranges, not duplicated
Two such tables (t1 and t2) were created in the same way but with different
data. Then he ran the following query:

EXPLAIN ANALYZE SELECT count(*)
FROM t, t2 WHERE t && t2

The results (using our patch) were the following:
Plan rows: 2991415662
Actual rows: 2981335423

So the estimation accuracy is still fairly good for such data with a lot
of repeated values, even without having MCV statistics.
The only error that can happen in our algorithms comes from the last bin, in
which we assume a uniform distribution of values. Duplicate values
(of end bounds, not whole ranges) might make this assumption wrong, which
would produce inaccurate estimates. However, this affects only the last bin;
all the others remain correct.
MCVs are mainly useful for equality, which is not an operation we cover, and
probably not an important predicate for ranges. What do you think?
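
For reference, a table with this kind of duplication pattern can be generated along these lines. This is a hypothetical sketch, not the script Maxime actually used, and the group sizes are only illustrative:

```sql
-- Middles in [0, 1000), lengths in [90, 110], each group of n distinct
-- random ranges duplicated dup times.
CREATE TABLE t1 (r int4range);

INSERT INTO t1
SELECT d.r
FROM (SELECT int4range(mid - len / 2, mid - len / 2 + len) AS r, dup
      FROM (SELECT (random() * 1000)::int AS mid,
                   90 + (random() * 20)::int AS len,
                   g.dup
            FROM (VALUES (10, 1000), (20, 500), (40, 100), (200, 10), (10, 1))
                 AS g(n, dup),
                 generate_series(1, g.n)) AS s) AS d,
     generate_series(1, d.dup);
```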

Best regards,
Mahmoud




Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-01-25 Thread Aleksander Alekseev
Hi Peter,

> It also makes DO UPDATE not work the same way as either UPDATE itself
> (which will silently skip a second or subsequent update of the same
> row by the same UPDATE statement in RC mode), or MERGE (which has
> similar cardinality violations).

That's true. On the flip side, UPDATE and MERGE are different
statements and arguably shouldn't behave the same way INSERT does.

To clarify, I'm merely proposing the change and playing the role of
Devil's advocate here. I'm not arguing that the patch should be
necessarily accepted. At the end of the day it's up to the community
to decide. Personally I think it would make the users a bit happier.

The actual reason why I made the patch is that a colleague of mine,
Sven Klemm, encountered this limitation recently and was puzzled by it
and so was I. The only workaround the user currently has is to execute
several INSERTs one by one which is expensive when you have a lot of
INSERTs.

-- 
Best regards,
Aleksander Alekseev




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread gkokolatos






--- Original Message ---
On Wednesday, January 25th, 2023 at 6:28 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> On 1/25/23 16:37, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby 
> > pry...@telsasoft.com wrote:
> > 
> > > On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote:
> > > 
> > > > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby 
> > > > pry...@telsasoft.com wrote:
> > > > 
> > > > > On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote:
> > > > > 
> > > > > > Please find attached v23 which reintroduces the split.
> > > > > > 
> > > > > > 0001 is reworked to have a reduced footprint than before. Also in 
> > > > > > an attempt
> > > > > > to facilitate the readability, 0002 splits the API's and the 
> > > > > > uncompressed
> > > > > > implementation in separate files.
> > > > > 
> > > > > Thanks for updating the patch. Could you address the review comments I
> > > > > sent here ?
> > > > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
> > > > 
> > > > Please find v24 attached.
> > > 
> > > Thanks for updating the patch.
> > > 
> > > In 001, RestoreArchive() does:
> > > 
> > > > -#ifndef HAVE_LIBZ
> > > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
> > > > - AH->PrintTocDataPtr != NULL)
> > > > + supports_compression = false;
> > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
> > > > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > > > + supports_compression = true;
> > > > +
> > > > + if (AH->PrintTocDataPtr != NULL)
> > > > {
> > > > for (te = AH->toc->next; te != AH->toc; te = te->next)
> > > > {
> > > > if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> > > > - pg_fatal("cannot restore from compressed archive (compression not 
> > > > supported in this installation)");
> > > > + {
> > > > +#ifndef HAVE_LIBZ
> > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > > > + supports_compression = false;
> > > > +#endif
> > > > + if (supports_compression == false)
> > > > + pg_fatal("cannot restore from compressed archive (compression not 
> > > > supported in this installation)");
> > > > + }
> > > > }
> > > > }
> > > > -#endif
> > > 
> > > This first checks if the algorithm is implemented, and then checks if
> > > the algorithm is supported by the current build - that confused me for a
> > > bit. It seems unnecessary to check for unimplemented algorithms before
> > > looping. That also requires referencing both GZIP and LZ4 in two
> > > places.
> > 
> > I am not certain that it is unnecessary, at least not in the way that is
> > described. The idea is that new compression methods can be added, without
> > changing the archive's version number. It is very possible that it is
> > requested to restore an archive compressed with a method not implemented
> > in the current binary. The first check takes care of that and sets
> > supports_compression only for the supported versions. It is possible to
> > enter the loop with supports_compression already set to false, for example
> > because the archive was compressed with ZSTD, triggering the fatal error.
> > 
> > Of course, one can throw the error before entering the loop, yet I think
> > that it does not help the readability of the code. IMHO it is easier to
> > follow if the error is thrown once during that check.
> 
> 
> Actually, I don't understand why 0001 moves the check into the loop. I
> mean, why not check HAVE_LIBZ before the loop?

The intention is to be able to restore archives that don't contain
data. In that case compression becomes irrelevant as only the data in
an archive is compressed.

> 
> > > I think it could be written to avoid the need to change for added
> > > compression algorithms:
> > > 
> > > + if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> > > 
> > > + {
> > > + /* Check if the compression algorithm is supported */
> > > + pg_compress_specification spec;
> > > + parse_compress_specification(AH->compression_spec.algorithm, NULL, 
> > > &spec);
> > > 
> > > + if (spec->parse_error != NULL)
> > > 
> > > + pg_fatal(spec->parse_error);
> > > 
> > > + }
> > 
> > I am not certain how that would work in the example with ZSTD above.
> > If I am not wrong, parse_compress_specification() will not throw an error
> > if the codebase supports ZSTD, yet this specific pg_dump binary will not
> > support it because ZSTD is not implemented. parse_compress_specification()
> > is not aware of that and should not be aware of it, should it?
> 
> 
> Not sure. What happens in a similar situation now? That is, when trying
> to deal with an archive gzip-compressed in a build without libz?


In case that there are no data chunks, the archive will be restored.

Cheers,
//Georgios


> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: to_hex() for negative inputs

2023-01-25 Thread Aleksander Alekseev
Hi Dean,

> > So in your opinion what is the expected result of to_hex(INT_MIN,
> > with_sign => true)?
> >
>
> "-8000" or "-0x8000", depending on whether the prefix is
> requested.

Whether this is the right result is very debatable. 0x80000000 is the
binary representation of -2147483648:

```
(gdb) ptype cur_timeout
type = int
(gdb) p cur_timeout = 0x80000000
$1 = -2147483648
(gdb) p/x cur_timeout
$2 = 0x80000000
```

So what you propose to return is -(-2147483648). For some users this
may be a wanted result, for some it may be not. Personally I would
prefer to get an ERROR in this case. And this is exactly how you end
up with even more flags.

I believe it would be better to let the user write the exact query
depending on what he/she wants.
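
For illustration, the current unsigned two's-complement behaviour looks like this; these are the results I'd expect on existing releases:

```sql
SELECT to_hex(255);                 -- ff
SELECT to_hex(-1);                  -- ffffffff  (32-bit two's complement)
SELECT to_hex((-2147483648)::int);  -- 80000000  (INT_MIN)
```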

-- 
Best regards,
Aleksander Alekseev




Re: Add LZ4 compression in pg_dump

2023-01-25 Thread gkokolatos






--- Original Message ---
On Wednesday, January 25th, 2023 at 7:00 PM, Justin Pryzby 
 wrote:


> 
> 
> On Wed, Jan 25, 2023 at 03:37:12PM +, gkokola...@pm.me wrote:
> 

> While looking at this, I realized that commit 5e73a6048 introduced a
> regression:
> 
> @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH)
> 
> - if (AH->compression != 0)
> 
> - pg_log_warning("archive is compressed, but this installation does not 
> support compression -- no data will be available");
> + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> 
> + pg_fatal("archive is compressed, but this installation does not support 
> compression");
> 
> Before, it was possible to restore non-data chunks of a dump file, even
> if the current build didn't support its compression. But that's now
> impossible - and it makes the code we're discussing in RestoreArchive()
> unreachable.

Nice catch!

Cheers,
//Georgios

> --
> Justin




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-14 12:34:03 -0800, Andres Freund wrote:
> On 2023-01-14 00:48:52 -0800, Jeff Davis wrote:
> > On Mon, 2022-12-26 at 14:20 +0530, Bharath Rupireddy wrote:
> > > Please review the attached v2 patch further.
> > 
> > I'm still unclear on the performance goals of this patch. I see that it
> > will reduce syscalls, which sounds good, but to what end?
> > 
> > Does it allow a greater number of walsenders? Lower replication
> > latency? Less IO bandwidth? All of the above?
> 
> One benefit would be that it'd make it more realistic to use direct IO for WAL
> - for which I have seen significant performance benefits. But when we
> afterwards have to re-read it from disk to replicate, it's less clearly a win.

Satya's email just now reminded me of another important reason:

Eventually we should add the ability to stream out WAL *before* it has locally
been written out and flushed. Obviously the relevant positions would have to
be noted in the relevant message in the streaming protocol, and we couldn't
generally allow standbys to apply that data yet.

That'd allow us to significantly reduce the overhead of synchronous
replication, because instead of commonly needing to send out all the pending
WAL at commit, we'd just need to send out the updated flush position. The
reason this would lower the overhead is that:

a) The reduced amount of data to be transferred reduces latency - it's easy to
   accumulate a few TCP packets worth of data even in a single small OLTP
   transaction
b) The remote side can start to write out data earlier


Of course this would require additional infrastructure on the receiver
side. E.g. some persistent state indicating up to where WAL is allowed to be
applied, to avoid the standby getting ahead of the primary, in case the
primary crash-restarts (or has more severe issues).


With a bit of work we could perform WAL replay on standby without waiting for
the fdatasync of the received WAL - that only needs to happen when a) we need
to confirm a flush position to the primary b) when we need to write back pages
from the buffer pool (and some other things).


Greetings,

Andres Freund




Re: GUCs to control abbreviated sort keys

2023-01-25 Thread Jeff Davis
On Tue, 2023-01-24 at 21:42 -0500, Robert Haas wrote:
> I find it a bit premature to include this comment in the very first
> email what if other people don't like the idea?

The trust_strxfrm GUC was pulled from the larger collation refactoring
patch, which has been out for a while. The sort_abbreviated_keys GUC is
new, and I posted these both in a new thread because they started to
look independently useful.

If someone doesn't like the idea, they are free to comment, like in
every other case (though this patch doesn't seem very controversial to
me?). I suppose the wording was off-putting, so I'll choose different
words next time.

> I would like to hear about the cases where abbreviated keys resulted
> in a regression.

I want to be clear that this is not a general criticism of the
abbreviated keys optimization, nor a comprehensive analysis of its
performance.

I am highlighting this case because the existence of a single non-
contrived case or regression suggests that we may want to explore
further and tweak heuristics. That's quite natural when the heuristics
are based on a complex dependency like a collation provider. The
sort_abbreviated_keys GUC makes that kind of exploration and tweaking a
lot easier.

Built with meson on linux, gcc 11.3.0, opt -O3. Times are the middle of
three runs, taken from the sort operator's "first returned tuple" time
in EXPLAIN ANALYZE. Total runtime (as reported in EXPLAIN ANALYZE) is
pretty much the same story, but I think there was slightly more noise
in that number.

$ perl text_generator.pl 1000 10 > /tmp/strings.txt

CREATE TABLE s (t TEXT);
COPY s FROM '/tmp/strings.txt';
VACUUM FREEZE s;
CHECKPOINT;
SET work_mem='10GB';
SET max_parallel_workers = 0;
SET max_parallel_workers_per_gather = 0;

SET sort_abbreviated_keys = false;
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";
-- 20875ms

SET sort_abbreviated_keys = true;
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";
-- 22931ms

Regression for abbreviated keys optimization in this case: 9.8%

> I'd also like to know whether there's a realistic possibility that
> making this a run-time test could itself result in a regression.

The sort_abbreviated_keys branch is happening after
tuplesort_begin_common (which creates memory contexts, etc.) and before
preparing the sort keys (which involves catalog lookups). The
trust_strxfrm branch is happening in the type-specific sort support
function, which needs to be looked up in the catalog before being
called (using V1 calling convention).

It doesn't look likely that a single branch in that path will have a
perf impact. Do you have a more specific concern?

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS




text_generator.pl
Description: Perl program


Set arbitrary GUC options during initdb

2023-01-25 Thread Tom Lane
The attached patch responds to the discussion at [1] about how
we ought to offer a way to set any server GUC from the initdb
command line.  Currently, if for some reason the server won't
start with default parameters, the only way to get through initdb
is to change the installed version of postgresql.conf.sample.
And even that is just a kluge, because the initial probes to
choose max_connections and shared_buffers will all fail, causing
initdb to choose rock-bottom-minimum values of those settings.
You can fix that up after the fact if you notice it, but you
might not.

So this invents an initdb switch "-c NAME=VALUE" just like the
one that the server itself has long had.  The specified settings
are applied on the command line of the initial probe calls
(which happen before we've made any config files), and then they
are added to postgresql.auto.conf, which causes them to take
effect for the bootstrap backend runs as well as subsequent
postmaster starts.

I also invented "--set NAME=VALUE", mainly because just about
every other initdb switch has a long form.  The server itself
doesn't have that spelling, so I'm not wedded to that part.
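
So, for example, an invocation could look like (hypothetical paths and values):

```
initdb -D /path/to/data -c shared_buffers=16MB -c max_connections=50
initdb -D /path/to/data --set wal_level=logical
```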

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/17757-dbdfc1f1c954a6db%40postgresql.org

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 5b2bdac101..904cee7858 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -433,6 +433,23 @@ PostgreSQL documentation
 Other, less commonly used, options are also available:
 
 
+ 
+  -c name=value
+  --set name=value
+  
+   
+Forcibly set the server parameter name
+to value during initdb,
+and also set up that assignment in the
+generated postgresql.auto.conf file, as though
+it had been issued via ALTER SYSTEM SET.
+This option can be used more than once to set several parameters.
+It is primarily useful when the environment is such that the server
+will not start at all using the default parameters.
+   
+  
+ 
+
  
   -d
   --debug
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7a58c33ace..8daad6a1f6 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -82,6 +82,13 @@
 /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
 extern const char *select_default_timezone(const char *share_path);
 
+/* simple list of strings */
+typedef struct _stringlist
+{
+	char	   *str;
+	struct _stringlist *next;
+} _stringlist;
+
 static const char *const auth_methods_host[] = {
 	"trust", "reject", "scram-sha-256", "md5", "password", "ident", "radius",
 #ifdef ENABLE_GSS
@@ -142,6 +149,8 @@ static char *pwfilename = NULL;
 static char *superuser_password = NULL;
 static const char *authmethodhost = NULL;
 static const char *authmethodlocal = NULL;
+static _stringlist *extra_guc_names = NULL;
+static _stringlist *extra_guc_values = NULL;
 static bool debug = false;
 static bool noclean = false;
 static bool noinstructions = false;
@@ -253,6 +262,7 @@ static void check_input(char *path);
 static void write_version_file(const char *extrapath);
 static void set_null_conf(void);
 static void test_config_settings(void);
+static bool test_specific_config_settings(int test_conns, int test_buffs);
 static void setup_config(void);
 static void bootstrap_template1(void);
 static void setup_auth(FILE *cmdfd);
@@ -359,6 +369,27 @@ escape_quotes_bki(const char *src)
 	return result;
 }
 
+/*
+ * Add an item at the end of a stringlist.
+ */
+static void
+add_stringlist_item(_stringlist **listhead, const char *str)
+{
+	_stringlist *newentry = pg_malloc(sizeof(_stringlist));
+	_stringlist *oldentry;
+
+	newentry->str = pg_strdup(str);
+	newentry->next = NULL;
+	if (*listhead == NULL)
+		*listhead = newentry;
+	else
+	{
+		for (oldentry = *listhead; oldentry->next; oldentry = oldentry->next)
+			 /* skip */ ;
+		oldentry->next = newentry;
+	}
+}
+
 /*
  * make a copy of the array of lines, with token replaced by replacement
  * the first time it occurs on each line.
@@ -873,11 +904,9 @@ test_config_settings(void)
 		400, 300, 200, 100, 50
 	};
 
-	char		cmd[MAXPGPATH];
 	const int	connslen = sizeof(trial_conns) / sizeof(int);
 	const int	bufslen = sizeof(trial_bufs) / sizeof(int);
 	int			i,
-				status,
 				test_conns,
 				test_buffs,
 				ok_buffers = 0;
@@ -903,19 +932,7 @@ test_config_settings(void)
 		test_conns = trial_conns[i];
 		test_buffs = MIN_BUFS_FOR_CONNS(test_conns);
 
-		snprintf(cmd, sizeof(cmd),
- "\"%s\" --check %s %s "
- "-c max_connections=%d "
- "-c shared_buffers=%d "
- "-c dynamic_shared_memory_type=%s "
- "< \"%s\" > \"%s\" 2>&1",
- backend_exec, boot_options, extra_options,
- test_conns, test_buffs,
- dynamic_shared_memory_type,
- DEVNULL, DEVNULL);
-		fflush(NULL);
-		status = system(cmd);
-		if (status == 0)
+		if 

Re: plpython vs _POSIX_C_SOURCE

2023-01-25 Thread Andres Freund
Hi,

Pushed the patches. So far no fallout, and hoverfly recovered.

I just checked a few of the more odd animals (Illumos, Solaris, old OpenBSD,
AIX) that already ran without finding new warnings.

There's a few more animals to run before I'll fully relax though.


On 2023-01-25 08:31:23 -0500, Robert Haas wrote:
> Plus, the cost of experimentation here seems very low. Sure, something
> might break, but if it does, we can just change it back, or change it
> again. That's not really a big deal. The thing that would be a big
> deal, maybe, is if we released and only found out afterward that this
> caused some subtle and horrible problem for which we had no
> back-patchable fix, but that seems pretty unlikely.

Agreed.

Greetings,

Andres Freund




Re: heapgettup() with NoMovementScanDirection unused in core?

2023-01-25 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-25 10:02:28 -0500, Tom Lane wrote:
>> We must have the NoMovementScanDirection option because count = 0
>> does not mean "do nothing", and I noted at least two call sites
>> that require it.

> I wonder if we'd be better off removing NoMovementScanDirection, and using
> count == (uint64)-1 for what NoMovementScanDirection is currently used for as
> an ExecutorRun parameter.  Seems less confusing to me - right now we have two
> parameters with non-obvious meanings and interactions.

I'm down on that because it seems just about certain to break extensions
that call the executor, and it isn't adding enough clarity (IMHO) to
justify that.

regards, tom lane




Re: GUCs to control abbreviated sort keys

2023-01-25 Thread Jeff Davis
On Tue, 2023-01-24 at 19:43 -0600, Justin Pryzby wrote:
> I think "an optimization, if applicable" is either too terse, or
> somehow
> wrong.  Maybe:
> 
> > Enables or disables the use of abbreviated keys, a sort
> > optimization...

Done.

> > +    optimization could return wrong results. Set to
> > +    true if certain that
> > strxfrm()
> > +    can be trusted.
> 
> "if you are certain"; or "if it is ..."

Done.

Thank you, rebased patch attached.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 39ed011cc51ba3a4af5e3b559a7b8de25fb895a5 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 21 Jan 2023 12:44:07 -0800
Subject: [PATCH v2] Introduce GUCs to control abbreviated keys sort
 optimization.

The setting sort_abbreviated_keys turns the optimization on or off
overall. The optimization relies on collation providers, which are
complex dependencies, and the performance of the optimization may rely
on many factors. Introducing a GUC allows easier diagnosis when this
optimization results in worse performance.

The setting trust_strxfrm replaces the define TRUST_STRXFRM, allowing
users to experiment with the abbreviated keys optimization when using
the libc provider. Previously, the optimization only applied to
collations using the ICU provider unless specially compiled. By
default, allowed only for superusers (because an incorrect setting
could lead to wrong results), but can be granted to others.
---
 doc/src/sgml/config.sgml   | 40 ++
 src/backend/utils/adt/varlena.c| 10 +++---
 src/backend/utils/misc/guc_tables.c| 24 +
 src/backend/utils/sort/tuplesortvariants.c | 17 ++---
 4 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f985afc009..8f55b89f35 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11252,6 +11252,46 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  sort_abbreviated_keys (boolean)
+  
+   sort_abbreviated_keys configuration parameter
+  
+  
+  
+   
+Enables or disables the use of abbreviated sort keys, a sort optimization,
+if applicable. The default is true. Disabling may
+be useful to diagnose problems or measure performance.
+   
+  
+ 
+
+ 
+  trust_strxfrm (boolean)
+  
+   trust_strxfrm configuration parameter
+  
+  
+  
+   
+Abbreviated keys, a sort optimization, depends on correct behavior of
+the operating system function strxfrm() when
+using a collation with the libc provider. On some
+platforms strxfrm() does not return results
+consistent with strcoll(), which means the
+optimization could return wrong results. Set to
+true if it is certain that
+strxfrm() can be trusted.
+   
+   
+The default value is false. This setting has no
+effect if  is set to
+false.
+   
+  
+ 
+
  
   trace_locks (boolean)
   
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index fd81c47474..c270022483 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -45,6 +45,9 @@
 /* GUC variable */
 int			bytea_output = BYTEA_OUTPUT_HEX;
 
+/* GUC to enable use of strxfrm() for abbreviated keys */
+bool		trust_strxfrm = false;
+
 typedef struct varlena unknown;
 typedef struct varlena VarString;
 
@@ -2115,7 +2118,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	 * other libc other than Cygwin has so far been shown to have a problem,
 	 * we take the conservative course of action for right now and disable
 	 * this categorically.  (Users who are certain this isn't a problem on
-	 * their system can define TRUST_STRXFRM.)
+	 * their system can set the trust_strxfrm GUC to true.)
 	 *
 	 * Even apart from the risk of broken locales, it's possible that there
 	 * are platforms where the use of abbreviated keys should be disabled at
@@ -2128,10 +2131,9 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	 * categorically, we may still want or need to disable it for particular
 	 * platforms.
 	 */
-#ifndef TRUST_STRXFRM
-	if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU))
+	if (!trust_strxfrm && !collate_c &&
+		!(locale && locale->provider == COLLPROVIDER_ICU))
 		abbreviate = false;
-#endif
 
 	/*
 	 * If we're using abbreviated keys, or if we're using a locale-aware
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4ac808ed22..fd4a02fbf5 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -102,6 +102,8 @@ extern bool trace_syncscan;
 #ifdef DEBUG_BOUNDED_SORT
 extern bool optimize_bounded_sort;
 #endif
+extern bool sort_abbreviated_keys;
+extern bool trust_strxfrm;
 

Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Nathan Bossart
On Wed, Jan 25, 2023 at 02:05:39PM -0500, Robert Haas wrote:
> On Wed, Jan 25, 2023 at 1:17 PM Nathan Bossart  
> wrote:
>> I wanted to merge basic_archive and basic_recovery because there's a decent
>> chunk of duplicated code.  Perhaps that is okay, but I would rather just
>> have one test module.  AFAICT the biggest reason to split it is because we
>> can't determine a good name.  Maybe we could leave the name as
>> "basic_archive" since it deals with creating and recovering archive files.
> 
> Yeah, maybe. I'm not sure what the best thing to do is, but if I see a
> module called basic_archive or basic_restore, I know what it's about,
> whereas basic_wal_module seems a lot less specific. It sounds like it
> could be generating or streaming it just as easily as it could be
> archiving it. It would be nice to have a name that is less prone to
> that kind of unclarity.

Good point.  It seems like the most straightforward approach is just to
have separate modules.  Unless Michael objects, I'll go that route.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: to_hex() for negative inputs

2023-01-25 Thread Peter Eisentraut

On 24.01.23 14:10, Dean Rasheed wrote:

I also think it might be useful for it to gain a couple of boolean options:

1). An option to output a signed value (defaulting to false, to
preserve the current two's complement output).


I find the existing behavior so strange, I would rather give up and 
invent a whole new function family with correct behavior, which could 
then also support octal and binary, and perhaps any base.  This could be 
something generally named, like "convert" or "format".



2). An option to output the base prefix "0x" (which comes after the
sign, making it inconvenient for the user to add themselves).


This could also be handled with a "format"-like function.





Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Tom Lane
Nathan Bossart  writes:
> I wanted to merge basic_archive and basic_recovery because there's a decent
> chunk of duplicated code.

Would said code likely be duplicated into non-test uses of this feature?
If so, maybe you ought to factor it out into a common location.  I agree
with Robert's point that basic_wal_module is a pretty content-free name.

regards, tom lane




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-01-25 Thread SATYANARAYANA NARLAPURAM
On Sat, Jan 14, 2023 at 12:34 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-14 00:48:52 -0800, Jeff Davis wrote:
> > On Mon, 2022-12-26 at 14:20 +0530, Bharath Rupireddy wrote:
> > > Please review the attached v2 patch further.
> >
> > I'm still unclear on the performance goals of this patch. I see that it
> > will reduce syscalls, which sounds good, but to what end?
> >
> > Does it allow a greater number of walsenders? Lower replication
> > latency? Less IO bandwidth? All of the above?
>
> One benefit would be that it'd make it more realistic to use direct IO for
> WAL
> - for which I have seen significant performance benefits. But when we
> afterwards have to re-read it from disk to replicate, it's less clearly a
> win.
>

 +1. Archive modules rely on reading the WAL files for PITR. Direct IO for
WAL requires reading these files from disk anyway for archival. However,
archiving using utilities like pg_receivewal can take advantage of this
patch together with direct IO for WAL.

Thanks,
Satya


Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-25 14:05:39 -0500, Robert Haas wrote:
> > I wanted to merge basic_archive and basic_recovery because there's a decent
> > chunk of duplicated code.  Perhaps that is okay, but I would rather just
> > have one test module.  AFAICT the biggest reason to split it is because we
> > can't determine a good name.  Maybe we could leave the name as
> > "basic_archive" since it deals with creating and recovering archive files.
> 
> Yeah, maybe. I'm not sure what the best thing to do is, but if I see a
> module called basic_archive or basic_restore, I know what it's about,
> whereas basic_wal_module seems a lot less specific. It sounds like it
> could be generating or streaming it just as easily as it could be
> archiving it. It would be nice to have a name that is less prone to
> that kind of unclarity.

I think it'd be just fine to keep the name as basic_archive and use it for
both archiving and restoring. Restoring from an archive still deals with
archiving.

I agree that basic_wal_module isn't a good name.

Greetings,

Andres Freund




Re: Set arbitrary GUC options during initdb

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-25 16:25:19 -0500, Tom Lane wrote:
> The attached patch responds to the discussion at [1] about how
> we ought to offer a way to set any server GUC from the initdb
> command line.

Are you thinking of backpatching this, to offer the people affected by the
issue in [1] a way out?


> So this invents an initdb switch "-c NAME=VALUE" just like the
> one that the server itself has long had.

I still am +1 on the idea. I've actually wanted this for development purposes
a couple times...


> The specified settings are applied on the command line of the initial probe
> calls (which happen before we've made any config files), and then they are
> added to postgresql.auto.conf, which causes them to take effect for the
> bootstrap backend runs as well as subsequent postmaster starts.

I think this means that if you set e.g. max_connections as an initdb
parameter, the probes won't do much. Probably fine?


Perhaps worth memorializing the priority of the -c options in a test?
E.g. setting shared_buffers = 20MB or so and then testing that that's the
value when starting the server?


> I also invented "--set NAME=VALUE", mainly because just about
> every other initdb switch has a long form.  The server itself
> doesn't have that spelling, so I'm not wedded to that part.

Fine with me, but also fine to leave out.

Greetings,

Andres Freund




Re: Re: Support plpgsql multi-range in conditional control

2023-01-25 Thread Isaac Morland
On Wed, 25 Jan 2023 at 12:02, Pavel Stehule  wrote:

>
>
> On Wed, 25 Jan 2023 at 17:22, songjinzhou <2903807...@qq.com> wrote:
>
>>
>> As follows, we can only repeat the for statement before we use such SQL:
>>
>> begin
>> for i in 10..20 loop
>> raise notice '%', i; -- Things to do
>> end loop;
>>
>> for i in 100 .. 200 by 10 loop
>> raise notice '%', i; -- Things to do
>> end loop;
>> end;
>>
>> But now we can simplify it as follows:
>>
>> begin
>> for i in 10..20, 100 .. 200 by 10 loop
>> raise notice '%', i; -- Things to do
>> end loop;
>> end;
>>
>> Although we can only use integer iterative control here, this is just a
>> horizontal expansion of the previous logic. Thank you very much for your
>> reply. I am very grateful!
>>
>>
> Unfortunately, this is not a real use case - this is not an example from
> the real world.
>

And anyway, this is already supported using generate_series() and UNION:

odyssey=> do $$ declare i int; begin for i in select generate_series (10,
20) union all select generate_series (100, 200, 10) loop raise notice
'i=%', i; end loop; end;$$;
NOTICE:  i=10
NOTICE:  i=11
NOTICE:  i=12
NOTICE:  i=13
NOTICE:  i=14
NOTICE:  i=15
NOTICE:  i=16
NOTICE:  i=17
NOTICE:  i=18
NOTICE:  i=19
NOTICE:  i=20
NOTICE:  i=100
NOTICE:  i=110
NOTICE:  i=120
NOTICE:  i=130
NOTICE:  i=140
NOTICE:  i=150
NOTICE:  i=160
NOTICE:  i=170
NOTICE:  i=180
NOTICE:  i=190
NOTICE:  i=200
DO
odyssey=>

The existing x..y notation is just syntactic sugar for a presumably common
case (although I’m dubious how often one really loops through a range of
numbers — surely in a database looping through a query result is
overwhelmingly dominant?); I don’t think you’ll find much support around
here for adding more syntax possibilities to the loop construct.


Re: Set arbitrary GUC options during initdb

2023-01-25 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-25 16:25:19 -0500, Tom Lane wrote:
>> The attached patch responds to the discussion at [1] about how
>> we ought to offer a way to set any server GUC from the initdb
>> command line.

> Are you thinking of backpatching this, to offer the people affected by the
> issue in [1] a way out?

We could ... it's a new feature for sure, but it seems quite unlikely
to break things for anybody not using it.

>> The specified settings are applied on the command line of the initial probe
>> calls (which happen before we've made any config files), and then they are
>> added to postgresql.auto.conf, which causes them to take effect for the
>> bootstrap backend runs as well as subsequent postmaster starts.

> I think this means that if you set e.g. max_connections as an initdb
> parameter, the probes won't do much. Probably fine?

Right, the probed value will be overridden.

> Perhaps worth memorializing the priority of the -c options in a test?
> E.g. setting shared_buffers = 20MB or so and then testing that that's the
> value when starting the server?

Given that it's written into postgresql.auto.conf, I imagine that
we have test coverage of that point already.

There is a more subtle issue, which is that -c max_connections or
-c shared_buffers should override the probe values *during the
probe steps*.  My first thought about implementation had been to
create postgresql.auto.conf right off the bat, but that would
fail to have this property because server command line overrides
config file.  I can't think of any very portable way to check
that though.

regards, tom lane




Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-01-25 Thread Andres Freund
Hi,

On 2023-01-25 22:00:50 +0300, Aleksander Alekseev wrote:
> Perhaps that's not a bug especially considering the fact that the
> documentation describes this behavior, but in any case the fact that:
> 
> ```
> INSERT INTO t VALUES (1,1) ON CONFLICT (a) DO UPDATE SET b = 0;
> INSERT INTO t VALUES (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
> ```
> 
> and:
> 
> ```
> INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO NOTHING;
> ```
> 
> .. both work, and:
> 
> ```
> INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
> ```
> 
> ... doesn't is rather confusing. There is no reason why the latest
> query shouldn't work except for a slight complication of the code.
> Which seems to be a reasonable tradeoff, for me at least.

I don't agree that this is just about a "slight complication" of the code. I
think semantically the proposed new behaviour is pretty bogus.


It *certainly* can't be right to just continue with the update in heap_update,
as you've done. You'd have to skip the update, not execute it. What am I
missing here?

I think this'd completely break triggers, for example, because they won't be
able to get the prior row version, since it won't actually be a row ever
visible (due to cmin=cmax).

I suspect it might break unique constraints as well, because we'd end up with
an invisible row in part of the ctid chain.



> > But what's the justification for erroring out in the DO NOTHING case?
> >
> > [...]
> >
> > It seems somewhat likely that a behavioural change will cause trouble for 
> > some
> > of the uses of DO NOTHING out there.
> 
> Just to make sure we are on the same page. The patch doesn't break the
> current DO NOTHING behavior but rather makes DO UPDATE work the same
> way DO NOTHING does.

I see that now - I somehow thought you were recommending to error out in both
cases, rather than the other way round.

Greetings,

Andres Freund




Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Nathan Bossart
On Wed, Jan 25, 2023 at 04:50:22PM -0500, Tom Lane wrote:
> Nathan Bossart  writes:
>> I wanted to merge basic_archive and basic_recovery because there's a decent
>> chunk of duplicated code.
> 
> Would said code likely be duplicated into non-test uses of this feature?
> If so, maybe you ought to factor it out into a common location.  I agree
> with Robert's point that basic_wal_module is a pretty content-free name.

I doubt it.  The duplicated parts are things like _PG_init(), the check
hook for the GUC, and all the rest of the usual boilerplate stuff for
extensions (e.g., Makefile, meson.build).  This module is small enough that
this probably makes up the majority of the code.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Nathan Bossart
On Wed, Jan 25, 2023 at 01:58:01PM -0800, Andres Freund wrote:
> I think it'd be just fine to keep the name as basic_archive and use it for
> both archiving and restoring. Restoring from an archive still deals with
> archiving.

This is my preference.  If Michael and Robert are okay with it, I think
this is what we should do.  Else, I'll create separate basic_archive and
basic_restore modules.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pgsql: Rename contrib module basic_archive to basic_wal_module

2023-01-25 Thread Michael Paquier
On Wed, Jan 25, 2023 at 02:41:18PM -0800, Nathan Bossart wrote:
> This is my preference.  If Michael and Robert are okay with it, I think
> this is what we should do.  Else, I'll create separate basic_archive and
> basic_restore modules.

Grouping both things into the same module has the advantage of easing
the configuration, at least in the example, where the archive
directory GUC can be used for both paths.

Regarding the renaming, I pushed for that.  There's little love for it
as far as I can see, so I will revert accordingly.  Keeping
basic_archive as the name for the module while it includes recovery is
fine by me.
--
Michael




Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-01-25 Thread Jacob Champion
On 1/24/23 12:04, Robert Haas wrote:
> I find the concept of "ambient authentication" problematic. I don't
> know exactly what you mean by it. I hope you'll tell me,

Sure: Ambient authority [1] means that something is granted access based
on some aspect of its existence that it can't remove (or even
necessarily enumerate). Up above, when you said "I cannot choose not to
be myself," that's a clear marker that ambient authority is involved.
Examples of ambient authn/z factors might include an originating IP
address, the user ID of a connected peer process, the use of a loopback
interface, a GPS location, and so on. So 'peer' and 'ident' are ambient
authentication methods.

And, because I think it's useful, I'll extend the definition to include
privileges that _could_ be dropped by a proxy, but in practice are
included because there's no easy way not to. Examples for libpq include
the automatic use of the client certificate in ~/.postgresql, or any
Kerberos credentials available in the local user cache. (Or even a
PGPASSWORD set up and forgotten by a DBA.)

Ambient authority is closely related to the confused deputy problem [2],
and the proxy discussed here is a classic confused deputy. The proxy
doesn't know that a core piece of its identity has been used to
authenticate the request it's forwarding. It can't choose its IP
address, or its user ID.

I'm most familiar with this in the context of HTTP, cookie-/IP-based
authn, and cross-site request forgeries. Whenever someone runs a local
web server with no authentication and says "it's okay! we only respond
to requests from the local host!" they're probably about to be broken
open by the first person to successfully reflect a request through the
victim's (very local) web browser.

Ways to mitigate or solve this problem (that I know of) include

1) Forwarding the original ambient context along with the request, so
the server can check it too. HTTP has the Origin header, so a browser
can say, "This request is not coming from my end user; it's coming from
a page controlled by example.org. You can't necessarily treat attached
cookies like they're authoritative." The PROXY protocol lets a proxy
forward several ambient factors, including the originating IP address
(or even the use of a UNIX socket) and information about the original
TLS context.

2) Explicitly combining the request with the proof of authority needed
to make it, as in capability-based security [3]. Some web frameworks
push secret "CSRF tokens" into URLs for this purpose, to tangle the
authorization into the request itself [4]. I'd argue that the "password
requirement" implemented by postgres_fdw and discussed upthread was an
attempt at doing this, to try to ensure that the authentication comes
from the user explicitly and not from the proxy. It's just not very strong.

(require_auth would strengthen it quite a bit; a major feature of that
patchset is to explicitly name the in-band authentication factors that a
server is allowed to pull out of a client. It's still not strong enough
to make a true capability, for one because it's client-side only. But as
long as servers don't perform actions on behalf of users upon
connection, that's pretty good in practice.)

3) Dropping as many implicitly-held privileges as possible before making
a request. This doesn't solve the problem but may considerably reduce
the practical attack surface. For example, if browsers didn't attach
their user's cookies to cross-origin requests, cross-site request
forgeries would probably be considerably less dangerous (and, in the
years since I left the space, it looks like browsers have finally
stopped doing this by default). Upthread, Andres suggested disabling the
default inclusion of client certs and GSS creds, and I would extend that
to include really *anything* pulled in from the environment. Make the
DBA explicitly allow those things.

> but I think
> that I won't like it even after I know, because as I said before, it's
> difficult to know why anyone else makes a decision, and asking an
> untrusted third-party why they're deciding something is sketchy at
> best.

I think that's a red herring. Setting aside that you can, in fact, prove
that the server has authenticated you (e.g. require_auth=scram-sha-256
in my proposed patchset), I don't think "untrusted servers, that we
don't control, doing something stupid" is a very useful thing to focus
on. We're trying to secure the case where a server *is* authenticating
us, using known useful factors, but those factors have been co-opted by
an attacker via a proxy.

> I think that the problems we have in this area can be solved by
> either (a) restricting the open proxy to be less open or (b)
> encouraging people to authenticate users in some way that won't admit
> connections from an open proxy.

(a) is an excellent mitigation, and we should do it. (b) starts getting
shaky because I think peer auth is actually a very reasonable choice for
many people. So I hope we can also start solv

[BUG] pg_stat_statements and extended query protocol

2023-01-25 Thread Imseih (AWS), Sami
Doing some work with the extended query protocol, I encountered the same
issue that was discussed in [1]. It appears that when a client uses the
extended query protocol and sends Execute messages with max_rows for a
portal, so that the portal is executed multiple times, pg_stat_statements
does not correctly track rows and calls.

The attached JDBC script, Test.java, reproduces the issue with
setFetchSize(100) and autocommit mode set to OFF. Although pg_class has
414 rows, pg_stat_statements reports 1 call and 14 rows: the first
4 * 100 fetches are not accounted for.

postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |          query
-------+------+-------------------------
     1 |   14 | select * from pg_class
(1 row)

The execution work flow goes something like this:
ExecutorStart
ExecutorRun – which will be called multiple times to fetch from the portal
              until the caller closes the portal or the portal runs out of
              rows.
ExecutorFinish
ExecutorEnd – portal is closed & pg_stat_statements stores the final rows
              processed

This breaks for pg_stat_statements during ExecutorRun, where es_processed
is reset to 0 on every iteration. So by the time the portal is closed,
es_processed only shows the count from the last Execute message.

This appears to be an issue only for portals fetched through the extended
query protocol, not for explicit cursors that go through the simple query
protocol (i.e., FETCH).

I attached a JDBC script to repro the issue.

One potential fix I see is to introduce two new counters in the executor
state (EState) to track the total rows processed and the number of calls.
These counters can then be used by pg_stat_statements. Attached is an
experimental patch which reports the correct number of rows and calls.

postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |          query
-------+------+-------------------------
     5 |  414 | select * from pg_class
(1 row)
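
To make the accounting problem easier to see in isolation, here is a toy
standalone C program (not executor code; the accumulating counters are
made-up stand-ins for what the experimental patch adds) that models a
414-row portal fetched 100 rows at a time:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t	es_processed = 0;	/* reset on every ExecutorRun, as today */
	uint64_t	total_rows = 0;		/* hypothetical accumulating counter */
	uint64_t	calls = 0;			/* hypothetical ExecutorRun call counter */
	uint64_t	remaining = 414;	/* rows in the portal, as with pg_class above */

	/* one loop iteration per Execute message, fetching up to 100 rows */
	while (remaining > 0)
	{
		uint64_t	fetched = remaining < 100 ? remaining : 100;

		es_processed = fetched;		/* per-call count only, overwritten each time */
		total_rows += fetched;		/* accumulated across calls */
		calls++;
		remaining -= fetched;
	}

	printf("last es_processed = %llu (what gets reported today)\n",
		   (unsigned long long) es_processed);
	printf("total rows = %llu, calls = %llu (what should be reported)\n",
		   (unsigned long long) total_rows, (unsigned long long) calls);
	return 0;
}

With the per-call counter alone, only the final batch of 14 rows is left by
the time the portal is closed; accumulating across calls yields 414 rows
and 5 calls, matching the numbers above.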

[1] 
https://www.postgresql.org/message-id/flat/c90890e7-9c89-c34f-d3c5-d5c763a34bd8%40dunslane.net

Thanks

–
Sami Imseih
Amazon Web Services (AWS)



0001-correct-pg_stat_statements-tracking-of-portals.patch
Description: 0001-correct-pg_stat_statements-tracking-of-portals.patch


Test.java
Description: Test.java


Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound (stop telling users to "vacuum that database in single-user mode")

2023-01-25 Thread Justin Pryzby
On Mon, Jan 16, 2023 at 03:50:57PM +0300, Aleksander Alekseev wrote:
> Hi hackers,
> 
> > The proposed patchset changes the documentation and the error messages
> > accordingly, making them less misleading. 0001 corrects the
> > documentation but doesn't touch the code. 0002 and 0003 correct the
> > messages shown when approaching xidWrapLimit and xidWarnLimit
> > accordingly.
> 
> A colleague of mine, Oleksii Kliukin, pointed out that the
> recommendation about running VACUUM in a single-user mode is also
> outdated, as it was previously reported in [1]. I didn't believe it at
> first and decided to double-check:

and again at:
https://www.postgresql.org/message-id/flat/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com#e7dd25fdcd171c5775f3f9e3f86b2082

> Unfortunately the [1] discussion went nowhere. 

likewise...

> So I figured it would be appropriate to add corresponding changes to
> the proposed patchset since it's relevant and is registered in the CF
> app already. PFA patchset v2 which now also includes 0004.
> 
> [1]:
> https://www.postgresql.org/message-id/flat/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com

I suggest resending this with a title like that of the 2021 thread [1] (I
was unable to find it just now when I looked):
| doc: stop telling users to "vacuum that database in single-user mode"

And copy the participants of the previous two iterations of this thread.

-- 
Justin




Re: suppressing useless wakeups in logical/worker.c

2023-01-25 Thread Nathan Bossart
On Tue, Jan 24, 2023 at 06:45:08PM -0500, Tom Lane wrote:
> I took a look through this, and have a number of mostly-cosmetic
> issues:

Thanks for the detailed review.

> * It seems wrong that next_sync_start isn't handled as one of the
> wakeup[NUM_LRW_WAKEUPS] entries.  I see that it needs to be accessed from
> another module; but you could handle that without exposing either enum
> LogRepWorkerWakeupReason or the array, by providing getter/setter
> functions for process_syncing_tables_for_apply() to call.
> 
> * This code is far too intimately familiar with the fact that TimestampTz
> is an int64 count of microseconds.  (I'm picky about that because I
> remember that they were once something else, so I wonder if someday
> they will be different again.)  You could get rid of the PG_INT64_MAX
> usages by replacing those with the timestamp infinity macro DT_NOEND;
> and I'd even be on board with adding a less-opaque alternate name for
> that to datatype/timestamp.h.  The various magic-constant multipliers
> could perhaps be made less magic by using TimestampTzPlusMilliseconds(). 
> 
> * I think it might be better to construct the enum like this:
> 
> +typedef enum LogRepWorkerWakeupReason
> +{
> + LRW_WAKEUP_TERMINATE,
> + LRW_WAKEUP_PING,
> + LRW_WAKEUP_STATUS
> +#define NUM_LRW_WAKEUPS (LRW_WAKEUP_STATUS + 1)
> +} LogRepWorkerWakeupReason;
> 
> so that you don't have to have a default: case in switches on the
> enum value.  I'm more worried about somebody adding an enum value
> and missing updating a switch statement elsewhere than I am about 
> somebody adding an enum value and neglecting to update the
> immediately-adjacent macro.

I did all of this in v3.
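
(For reference, the getter/setter arrangement for the sync-start wakeup
time ends up looking roughly like the sketch below; the names here are
invented and may not match what v3 actually uses.)

#include <stdint.h>

typedef int64_t TimestampTz;			/* stand-in for the real typedef */

static TimestampTz next_sync_start;		/* kept private to worker.c */

void
LogRepWorkerSetSyncStartWakeup(TimestampTz wakeup_time)
{
	next_sync_start = wakeup_time;
}

TimestampTz
LogRepWorkerGetSyncStartWakeup(void)
{
	return next_sync_start;
}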

> * The updates of "now" in LogicalRepApplyLoop seem rather
> randomly placed, and I'm not entirely convinced that we'll
> always be using a reasonably up-to-date value.  Can't we
> just update it right before each usage?

This came up for walreceiver.c, too.  The concern is that
GetCurrentTimestamp() might be rather expensive on systems without
something like the vDSO.  I don't know how common that is.  If we can rule
that out, then I agree that we should just update it right before each use.

> * This special handling of next_sync_start seems mighty ugly:
> 
> +        /* Also consider special wakeup time for starting sync workers. */
> +        if (next_sync_start < now)
> +        {
> +            /*
> +             * Instead of spinning while we wait for the sync worker to
> +             * start, wait a bit before retrying (unless there's an earlier
> +             * wakeup time).
> +             */
> +            nextWakeup = Min(now + INT64CONST(10), nextWakeup);
> +        }
> +        else
> +            nextWakeup = Min(next_sync_start, nextWakeup);
> 
> Do we really need the slop?  If so, is there a reason why it
> shouldn't apply to all possible sources of nextWakeup?  (It's
> going to be hard to fold next_sync_start into the wakeup[]
> array unless you can make this not a special case.)

I'm not positive it is absolutely necessary.  AFAICT the function that
updates this particular wakeup time is conditionally called, so it at least
seems theoretically possible that we could end up spinning in a tight loop
until we attempt to start a new tablesync worker.  But perhaps this is
unlikely enough that we needn't worry about it.

I noticed that this wakeup time wasn't being updated when the number of
active tablesync workers is >= max_sync_workers_per_subscription.  In v3, I
tried to handle this by setting the wakeup time to a second later for this
case.  I think you could ordinarily depend on the tablesync worker's
notify_pid to wake up the apply worker, but that wouldn't work if the apply
worker has restarted.

Ultimately, this particular wakeup time seems to be a special case, and I
probably need to think about it some more.  If you have ideas, I'm all
ears.

> * It'd probably be worth enlarging the comment for
> LogRepWorkerComputeNextWakeup to explain why its API is like that,
> perhaps "We ask the caller to pass in the value of "now" because
> this frequently avoids multiple calls of GetCurrentTimestamp().
> It had better be a reasonably-up-to-date value, though."

I did this in v3.  I noticed that many of your comments also applied to the
similar patch that was recently applied to walreceiver.c, so I created
another patch to fix that up.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 3b464bf0ccb22e36ab627a5e19981eaf3734d4dd Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 24 Jan 2023 20:52:21 -0800
Subject: [PATCH v3 1/2] code review for 05a7be9

---
 src/backend/replication/walreceiver.c | 31 ++-
 src/include/utils/timestamp.h |  1 +
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 3876c0188d..0563bad0f6

Re: drop postmaster symlink

2023-01-25 Thread Karl O. Pinc
Hello,

Somehow I missed the email changing the status of this back
to "needs review".

Buried in
https://www.postgresql.org/message-id/20230107165942.748ccf4e%40slate.karlpinc.com
is the one change I see that should be made.

> In doc/src/sgml/ref/allfiles.sgml at line 222 there is an ENTITY
> defined which references the deleted postmaster.sgml file.

This line needs to be removed and the
0002-Don-t-install-postmaster-symlink-anymore.patch 
updated.  (Unless there's some magic going on
with the various allfiles.sgml files of which I am
not aware.)

If this is fixed I see no other problems.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




improving user.c error messages

2023-01-25 Thread Nathan Bossart
moving this discussion to a new thread...

On Thu, Jan 19, 2023 at 10:20:33AM -0500, Robert Haas wrote:
> On Wed, Jan 18, 2023 at 6:17 PM Nathan Bossart  
> wrote:
>> However, as the attribute
>> system becomes more sophisticated, I think we ought to improve the error
>> messages in user.c.  IMHO messages like "permission denied" could be
>> greatly improved with some added context.
>>
>> This probably shouldn't block your patch, but I think it's worth doing in
>> v16 since there are other changes in this area.  I'm happy to help.
> 
> That would be great. I agree that it's good to try to improve the
> error messages. It hasn't been entirely clear to me how to do that.
> For instance, I don't think we want to say something like:
> 
> ERROR: must have CREATEROLE privilege and ADMIN OPTION on the target
> role, or in lieu of both of those to be superuser, to set the
> CONNECTION LIMIT for another role
> ERROR: must have CREATEROLE privilege and ADMIN OPTION on the target
> role, plus also CREATEDB, or in lieu of all that to be superuser, to
> remove the CREATEDB property from another role
> 
> Such messages are long and we'd end up with a lot of variants. It's
> possible that the messages could be multi-tier. For instance, if we
> determine that you're trying to manage users and you don't have
> permission to manage ANY user, we could say:
> 
> ERROR: permission to manage roles denied
> DETAIL: You must have the CREATEROLE privilege or be a superuser to
> manage roles.
> 
> If you could potentially manage some user, but not the one you're
> trying to manage, we could say:
> 
> ERROR: permission to manage role "%s" denied
> DETAIL: You need ADMIN OPTION on the target role to manage it.
> 
> If you have permission to manage the target role but not in the
> requested manner, we could then say something like:
> 
> ERROR: permission to manage CREATEDB for role "%s" denied
> DETAIL: You need CREATEDB to manage it.
> 
> This is just one idea, and maybe not the best one. I'm just trying to
> say that I think this is basically an organizational problem. We need
> a plan for how we're going to report errors that is not too
> complicated to implement with reasonable effort, and that will produce
> messages that users will understand. I'd be delighted if you wanted to
> provide either ideas or patches...

Here is an early draft of some modest improvements to the user.c error
messages.  I basically just tried to standardize the style of and add
context to the existing error messages.  I used errhint() for this extra
context, but errdetail() would work, too.  This isn't perfect.  You might
still have to go through a couple rounds of errors before your role has all
the privileges it needs for a command, but this seems to improve matters a
little.

I think there is still a lot of room for improvement, but I wanted to at
least get the discussion started before I went too far.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 3a92e930c0..1982831e2a 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -316,23 +316,28 @@ CreateRole(ParseState *pstate, CreateRoleStmt *stmt)
 		if (!has_createrole_privilege(currentUserId))
 			ereport(ERROR,
 	(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-	 errmsg("permission denied to create role")));
+	 errmsg("permission denied to create role"),
+	 errhint("You must have CREATEROLE privilege to create roles.")));
 		if (issuper)
 			ereport(ERROR,
 	(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-	 errmsg("must be superuser to create superusers")));
+	 errmsg("permission denied to create role"),
+	 errhint("You must have SUPERUSER privilege to create roles with SUPERUSER.")));
 		if (createdb && !have_createdb_privilege())
 			ereport(ERROR,
 	(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-	 errmsg("must have createdb permission to create createdb users")));
+	 errmsg("permission denied to create role"),
+	 errhint("You must have CREATEDB privilege to create roles with CREATEDB.")));
 		if (isreplication && !has_rolreplication(currentUserId))
 			ereport(ERROR,
 	(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-	 errmsg("must have replication permission to create replication users")));
+	 errmsg("permission denied to create role"),
+	 errhint("You must have REPLICATION privilege to create roles with REPLICATION.")));
 		if (bypassrls && !has_bypassrls_privilege(currentUserId))
 			ereport(ERROR,
 	(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-	 errmsg("must have bypassrls to create bypassrls users")));
+	 errmsg("permission denied to create role"),
+	 errhint("You must have BYPASSRLS privilege to create roles with BYPASSRLS.")));
 	}
 
 	/*
@@ -747,7 +752,8 @@ AlterRole(ParseState *pstate, AlterRoleStmt *stmt)
 	if (!superuser() && (authform->rolsuper || dissuper))
 		ereport(ERROR,
 (errcode(ERRCODE_INSUFFICIENT_PRIV

Re: suppressing useless wakeups in logical/worker.c

2023-01-25 Thread Thomas Munro
On Thu, Jan 26, 2023 at 12:50 PM Nathan Bossart
 wrote:
> I did this in v3.  I noticed that many of your comments also applied to the
> similar patch that was recently applied to walreceiver.c, so I created
> another patch to fix that up.

Can we also use TimestampDifferenceMilliseconds()?  It knows about
rounding up for WaitLatch().




Re: suppressing useless wakeups in logical/worker.c

2023-01-25 Thread Nathan Bossart
On Thu, Jan 26, 2023 at 01:23:41PM +1300, Thomas Munro wrote:
> Can we also use TimestampDifferenceMilliseconds()?  It knows about
> rounding up for WaitLatch().

I think we might risk overflowing "long" when all the wakeup times are
DT_NOEND:

 * This is typically used to calculate a wait timeout for WaitLatch()
 * or a related function.  The choice of "long" as the result type
 * is to harmonize with that.  It is caller's responsibility that the
 * input timestamps not be so far apart as to risk overflow of "long"
 * (which'd happen at about 25 days on machines with 32-bit "long").

Maybe we can adjust that function or create a new one to deal with this.
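
For what it's worth, a capped computation might be all that's needed.
Here is a rough standalone sketch (plain C99 types rather than the real
TimestampTz/timestamp.c code; the helper name and the particular cap are
made up):

#include <stdint.h>
#include <stdio.h>

#define MAX_WAIT_MS	INT64_C(2147483647)	/* INT32_MAX, roughly 24.8 days */

/*
 * Difference in milliseconds, rounded up for WaitLatch(), capped so that a
 * far-future stop time (e.g. DT_NOEND) cannot overflow "long" even where
 * "long" is 32 bits.  Assumes non-negative microsecond counts.
 */
static long
clamped_diff_ms(int64_t start, int64_t stop)
{
	int64_t		diff;

	if (stop <= start)
		return 0;
	diff = stop - start;
	if (diff / 1000 >= MAX_WAIT_MS)
		return (long) MAX_WAIT_MS;
	return (long) ((diff + 999) / 1000);
}

int
main(void)
{
	int64_t		now = INT64_C(700000000000000);

	printf("%ld\n", clamped_diff_ms(now, now + 5500));	/* prints 6 */
	printf("%ld\n", clamped_diff_ms(now, INT64_MAX));	/* prints 2147483647 */
	return 0;
}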

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



