Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-05-15 Thread Bharath Rupireddy
On Mon, Apr 29, 2024 at 11:36 AM Bharath Rupireddy
 wrote:
>
> Please see the attached v20 patch set.

It looks like with the use of the new multi insert table access method
(TAM) for COPY (v20-0005), pgbench regressed about 35% [1]. The reason
is that the memory-based flushing decision the new TAM takes [2]
differs from what COPY does today with table_multi_insert. COPY with
table_multi_insert maintains the exact size of the tuples in
CopyFromState after it does the line parsing. For instance, the tuple
size of a table with two integer columns is 8 (4+4) bytes here. The
new TAM instead relies on the memory occupied by the slot's memory
context, which holds the actual tuple, as an approximation of the
tuple size. But this memory context size also includes the tuple
header, so the size here is not just 8 (4+4) bytes but more. Because
of this, the buffers get flushed sooner than with the existing COPY
using table_multi_insert, causing a regression in pgbench, which uses
COPY extensively. The new TAM isn't designed to receive tuple sizes
from the callers, and even if we did that, the API wouldn't look
generic.

Here are a couple of ideas to deal with this:

1. Try to get the actual tuple sizes, excluding header sizes, for each
column in the new TAM.
2. Try not to use the new TAM for COPY, in which case
table_multi_insert stays forever.
3. Try passing a flag to tell the new TAM that the caller does the
flushing and that no internal flushing is needed.

I haven't explored idea (1) in depth yet. If we find a way to do so,
it looks to me that we are going backwards, since we would need to
strip off the headers for each column of every row. I suspect this
would cost a bit more and may not solve the regression.

With an eventual goal of getting rid of table_multi_insert, (2) may
not be a good choice.

(3) seems reasonable to implement and reduces the regression. I did so
in the attached v21 patches. A new flag TM_SKIP_INTERNAL_BUFFER_FLUSH
is introduced in the v21 patch; when it is specified, no internal
flushing is done and the caller has to flush the buffered tuples using
table_modify_buffer_flush(). See the test results [3]: HEAD 2.948 s,
PATCHED 2.946 s.
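
To illustrate, here is a rough sketch of the caller-side pattern this
flag enables. Only TM_SKIP_INTERNAL_BUFFER_FLUSH and
table_modify_buffer_flush() are taken from the patch; the other names
below (table_modify_begin(), table_modify_buffer_insert(),
table_modify_end(), next_tuple_to_insert(), MAX_BUFFERED_BYTES) and
all signatures are assumptions made for this sketch only:

    /*
     * Hypothetical caller-side usage: the caller opts out of internal
     * flushing and flushes itself, based on the exact tuple sizes it
     * already tracks (as COPY does today after line parsing).
     */
    TableModifyState *mstate;
    TupleTableSlot *slot;
    Size        bytes_buffered = 0;
    Size        tuple_size;

    mstate = table_modify_begin(rel, TM_SKIP_INTERNAL_BUFFER_FLUSH);

    while ((slot = next_tuple_to_insert(&tuple_size)) != NULL)
    {
        table_modify_buffer_insert(mstate, slot);
        bytes_buffered += tuple_size;

        /* flush based on the caller's own (exact) size accounting */
        if (bytes_buffered >= MAX_BUFFERED_BYTES)
        {
            table_modify_buffer_flush(mstate);
            bytes_buffered = 0;
        }
    }

    table_modify_buffer_flush(mstate);
    table_modify_end(mstate);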

v21 also adds code to maintain tuple size for virtual tuple slots.
This helps make better memory-based flushing decisions in the new TAM.

Thoughts?

[1]
HEAD:
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.99 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.78 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.88 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.97 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.07 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.86 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.96 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.90 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.05 s, vacuum 0.21 s, primary keys 0.64 s).
done in 2.83 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.96 s, vacuum 0.21 s, primary keys 0.66 s).
done in 2.80 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.95 s, vacuum 0.20 s, primary keys 0.63 s).
done in 2.79 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.89 s, vacuum 0.21 s, primary keys 0.69 s).
done in 3.75 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.17 s, vacuum 0.32 s, primary keys 1.25 s).
done in 3.86 s (drop tables 0.00 s, create tables 0.08 s, client-side
generate 2.97 s, vacuum 0.21 s, primary keys 0.59 s).

AVG done in 2.948 s

v20 PATCHED:
done in 3.94 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.12 s, vacuum 0.19 s, primary keys 0.62 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.22 s, vacuum 0.20 s, primary keys 0.61 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.67 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.00 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.17 s, vacuum 0.20 s, primary keys 0.63 s).
done in 4.43 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.24 s, vacuum 0.21 s, primary keys 0.98 s).
done in 4.16 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.36 s, vacuum 0.20 s, primary keys 0.59 s).
done in 3.62 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.83 s, vacuum 0.20 s, primary keys 0.58 s).
done in 3.67 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.84 s, vacuum 0.21 s, primary keys 0.61 s).

AVG done in 3.986 s

[2]
+/*
+ * Memory allocated for the whole tuple is in slot's memory context, so
+ * use it k

Re: cataloguing NOT NULL constraints

2024-05-15 Thread Alvaro Herrera
On 2024-May-14, Bruce Momjian wrote:

> Turns out these commits generated a single release note item, which I
> have now removed with the attached committed patch.

Hmm, but the commits about not-null constraints for domains were not
reverted, only the ones for constraints on relations.  I think the
release notes don't properly address the ones on domains.  I think it's
at least these two commits:

> -Author: Peter Eisentraut 
> -2024-03-20 [e5da0fe3c] Catalog domain not-null constraints
> -Author: Peter Eisentraut 
> -2024-04-15 [9895b35cb] Fix ALTER DOMAIN NOT NULL syntax

It may still be a good idea to make a note about those, at least to
point out that information_schema now lists them.  For example, pg11
release notes had this item:

   Add information_schema columns related to table
   constraints and triggers (Peter Eisentraut)

   Specifically, triggers.action_order,
   triggers.action_reference_old_table, and
   triggers.action_reference_new_table are now populated, where before
   they were always null.  Also, table_constraints.enforced now exists
   but is not yet usefully populated.


-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Log details for stats dropped more than once

2024-05-15 Thread Bertrand Drouvot
Hi,

On Wed, May 15, 2024 at 02:47:29PM +0900, Michael Paquier wrote:
> On Tue, May 14, 2024 at 10:07:14AM +, Bertrand Drouvot wrote:
> > While resuming the work on refilenode stats (mentioned in [1] but did not 
> > share
> > the patch yet), I realized that my current POC patch is buggy enough to 
> > produce
> > things like:
> > 
> > 2024-05-14 09:51:14.783 UTC [1788714] FATAL:  can only drop stats once
> > 
> > While the CONTEXT provides the list of dropped stats:
> > 
> > 2024-05-14 09:51:14.783 UTC [1788714] CONTEXT:  WAL redo at 0/D75F478 for 
> > Transaction/ABORT: 2024-05-14 09:51:14.782223+00; dropped stats: 
> > 2/16384/27512/0 2/16384/27515/0 2/16384/27516/0
> 
> Can refcount be useful to know in this errcontext?

Thanks for looking at it!

Do you mean as part of the added errdetail_internal()? If so, yeah I think it's
a good idea (done that way in v2 attached).

> > Attached a tiny patch to report the stat that generates the error. The 
> > patch uses
> > errdetail_internal() as the extra details don't seem to be useful to average
> > users.
> 
> I think that's fine.  Overall that looks like useful information for
> debugging, so no objections from here.

Thanks! BTW, I just realized that adding more details for this error has already
been mentioned in [1] (and Andres did propose a slightly different version).

[1]: 
https://www.postgresql.org/message-id/20240505160915.6boysum4f34siqct%40awork3.anarazel.de

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 0866e8252f2038f3fdbd9cfabb214461689210df Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Tue, 14 May 2024 09:10:50 +
Subject: [PATCH v2] Log details for stats dropped more than once

Adding errdetail_internal() in pgstat_drop_entry_internal() to display the
PgStat_HashKey and the entry's refcount when the "can only drop stats once"
error is reported.
---
 src/backend/utils/activity/pgstat_shmem.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 91591da395..0595c08d6e 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -785,7 +785,12 @@ pgstat_drop_entry_internal(PgStatShared_HashEntry *shent,
 	 * backends to release their references.
 	 */
 	if (shent->dropped)
-		elog(ERROR, "can only drop stats once");
+		ereport(ERROR,
+				errmsg("can only drop stats once"),
+				errdetail_internal("Stats kind=%s dboid=%u objoid=%u refcount=%u.",
+								   pgstat_get_kind_info(shent->key.kind)->name,
+								   shent->key.dboid, shent->key.objoid,
+								   pg_atomic_read_u32(&shent->refcount)));
 	shent->dropped = true;
 
 	/* release refcount marking entry as not dropped */
-- 
2.34.1



Re: Avoid orphaned objects dependencies, take 3

2024-05-15 Thread Bertrand Drouvot
Hi,

On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> On Thu, May 09, 2024 at 12:20:51PM +, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care of
> > brand new dependency creation (recordMultipleDependencies()) but forgot
> > to take care of changing an existing dependency (which is done in
> > another code path: changeDependencyFor()).
> > 
> > Please find attached v5 that adds:
> > 
> > - a call to the new depLockAndCheckObject() function in 
> > changeDependencyFor().
> > - a test when altering an existing dependency.
> > 
> > With v5 applied, I don't see the issue anymore.
> 
> +if (object_description)
> +ereport(ERROR, errmsg("%s does not exist", 
> object_description));
> +else
> +ereport(ERROR, errmsg("a dependent object does not ex
> 
> This generates an internal error code.  Is that intended?

Thanks for looking at it!

Yes, it's like when, say, you want to create an object in a schema that
does not exist (see get_namespace_oid()).
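
For context, the error that get_namespace_oid() raises for a missing
schema looks roughly like this (a from-memory sketch of the existing
code in namespace.c, shown here only as the reference point mentioned
above, not as part of the patch under discussion):

    oid = GetSysCacheOid1(NAMESPACENAME, Anum_pg_namespace_oid,
                          CStringGetDatum(nspname));
    if (!OidIsValid(oid) && !missing_ok)
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_SCHEMA),
                 errmsg("schema \"%s\" does not exist", nspname)));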

> --- /dev/null
> +++ 
> b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec 
> 
> This introduces a module with only one single spec.  I could get
> behind an extra module if splitting the tests into more specs makes
> sense or if there is a restriction on installcheck.  However, for 
> one spec file filled with a bunch of objects, and note that I'm OK to
> let this single spec grow more for this range of tests, it seems to me
> that this would be better under src/test/isolation/.

Yeah, I was not sure about this one (the location is from take 2 mentioned
up-thread). I'll look at moving the tests to src/test/isolation/.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Avoid orphaned objects dependencies, take 3

2024-05-15 Thread Bertrand Drouvot
Hi,

On Tue, May 14, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
> 
> 09.05.2024 15:20, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care of
> > brand new dependency creation (recordMultipleDependencies()) but forgot
> > to take care of changing an existing dependency (which is done in
> > another code path: changeDependencyFor()).
> > 
> > Please find attached v5 that adds:
> > 
> > - a call to the new depLockAndCheckObject() function in 
> > changeDependencyFor().
> > - a test when altering an existing dependency.
> > 
> > With v5 applied, I don't see the issue anymore.
> 
> Me too. Thank you for the improved version!
> I will test the patch in the background, but for now I see no other
> issues with it.

Thanks for confirming!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: First draft of PG 17 release notes

2024-05-15 Thread Alvaro Herrera
On 2024-May-14, Bruce Momjian wrote:

> On Tue, May 14, 2024 at 03:39:26PM -0400, Melanie Plageman wrote:

> > I think that we need to more clearly point out the implications of the
> > feature added by Sawada-san (and reviewed by John) in 667e65aac35497.
> > Vacuum no longer uses a fixed amount of memory for dead tuple TID
> > storage and it is not preallocated. This affects users as they may
> > want to change their configuration (and expectations).
> > 
> > If you make that item more specific to their work, you should also
> > remove my name, as the work I did on vacuum this release was unrelated
> > to their work on dead tuple TID storage.
> > 
> > The work Heikki and I did which culminated in 6dbb490261 mainly has
> > the impact of improving vacuum's performance (vacuum emits less WAL
> > and is more efficient). So you could argue for removing it from the
> > release notes if you are using the requirement that performance
> > improvements don't go in the release notes.
> 
> I don't think users really care about these details, just that it is
> faster so they will not be surprised if there is a change.  I was
> purposely vague to group multiple commits into the single item.  By
> grouping them together, I got enough impact to warrant listing it.  If
> you split it apart, it is harder to justify mentioning them.

I disagree with this.  IMO the impact of the Sawada/Naylor change is
likely to be enormous for people with large tables and large numbers of
tuples to clean up (I know we've had a number of customers in this
situation, I can't imagine any Postgres service provider that doesn't).
The fact that maintenance_work_mem is no longer capped at 1GB is very
important and I think we should mention that explicitly in the release
notes, as setting it higher could make a big difference in vacuum run
times.

I don't know what the impact of the Plageman/Linnakangas work is, but
since there are no user-visible consequences other than it being faster,
I agree it could be put more succinctly, perhaps together as a
sub-paragraph of the same item.

What about something like this?


 Lift the 1 GB allocation limit for vacuum memory usage for dead
 tuples, and make storage more compact and performant.


 This can reduce the number of index passes that vacuum has to perform
 for tables with many dead tuples, shortening vacuum times.


 Also, the WAL traffic caused by vacuum has been made more compact.

   

> > However, one of the preliminary commits for this f83d70976 does
> > change WAL format. There are three WAL records which no longer exist
> > as separate records. Do users care about this?

I don't think so.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"You don't solve a bad join with SELECT DISTINCT" #CupsOfFail
https://twitter.com/connor_mc_d/status/1431240081726115845




Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-05-15 Thread Alvaro Herrera
Sorry to interject, but --

On 2024-May-15, Bharath Rupireddy wrote:

> It looks like with the use of the new multi insert table access method
> (TAM) for COPY (v20-0005), pgbench regressed about 35% [1].

Where does this acronym "TAM" come from for "table access method"?  I
find it thoroughly horrible and wish we didn't use it.  What's wrong
with using "table AM"?  It's not that much longer, is much clearer, and
reuses our well-established acronym AM.

We don't use IAM anywhere, for example (it's always "index AM"), and I
don't think we'd turn "sequence AM" into SAM either, would we?

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Postgres and --config-file option

2024-05-15 Thread Peter Eisentraut

On 15.05.24 04:07, Michael Paquier wrote:

Not sure that these additions in --help or the docs are necessary.
The rest looks OK.

-"You must specify the --config-file or -D invocation "
+"You must specify the --config-file (or equivalent -c) or -D invocation "

How about "You must specify the --config-file, -c
\"config_file=VALUE\" or -D invocation"?  There is some practice for
--opt=VALUE in .po files.


Yeah, some of this is becoming quite unwieldy, if we document and 
mention each spelling variant of each option everywhere.


Maybe if the original problem is that the option --config-file is not 
explicitly in the --help output, let's add it to the --help output?






Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-05-15 Thread Bharath Rupireddy
On Wed, May 15, 2024 at 2:44 PM Alvaro Herrera  wrote:
>
> > It looks like with the use of the new multi insert table access method
> > (TAM) for COPY (v20-0005), pgbench regressed about 35% [1].
>
> Where does this acronym "TAM" come from for "table access method"?

Thanks for pointing it out. I used it just for the sake of discussion
in this response. Although a few of the previous responses from others
in this thread mentioned that word, none of the patches add it to the
code. I'll make sure not to use it further in this thread if there is a
concern that yet another acronym is being added.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Avoid orphaned objects dependencies, take 3

2024-05-15 Thread Bertrand Drouvot
Hi,

On Wed, May 15, 2024 at 08:31:43AM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> > +++ 
> > b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec
> >  
> > 
> > This introduces a module with only one single spec.  I could get
> > behind an extra module if splitting the tests into more specs makes
> > sense or if there is a restriction on installcheck.  However, for 
> > one spec file filled with a bunch of objects, and note that I'm OK to
> > let this single spec grow more for this range of tests, it seems to me
> > that this would be better under src/test/isolation/.
> 
> Yeah, I was not sure about this one (the location is from take 2 mentioned
> up-thread). I'll look at moving the tests to src/test/isolation/.

Please find attached v6 (only diff with v5 is moving the tests as suggested
above).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 59f7befd34b8aa4ba0483a100e85bacbc1a76707 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v6] Avoid orphaned objects dependencies

It's currently possible to create orphaned object dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a
non-existent schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a
non-existent schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still
exists: with this in place, session 2 in scenario 1 would be locked and
would report an error once session 1 commits (that would not be the case
should session 1 abort the transaction).

If the object is dropped before the new lock attempt is triggered, then
the patch also reports an error (but with fewer details).

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/dependency.c  |  54 
 src/backend/catalog/objectaddress.c   |  70 ++
 src/backend/catalog/pg_depend.c   |  12 ++
 src/include/catalog/dependency.h  |   1 +
 src/include/catalog/objectaddress.h   |   1 +
 .../expected/test_dependencies_locks.out  | 129 ++
 src/test/isolation/isolation_schedule |   1 +
 .../specs/test_dependencies_locks.spec|  89 
 8 files changed, 357 insertions(+)
  24.1% src/backend/catalog/
  45.9% src/test/isolation/expected/
  28.6% src/test/isolation/specs/

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ade1..a49357bbe2 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1517,6 +1517,60 @@ AcquireDeletionLock(const ObjectAddress *object, int flags)
 	}
 }
 
+/*
+ * depLockAndCheckObject
+ *
+ * Lock the object that we are about to record a dependency on.
+ * After it's locked, verify that it hasn't been dropped while we
+ * weren't looking.  If the object has been dropped, this function
+ * does not return!
+ */
+void
+depLockAndCheckObject(const ObjectAddress *object)
+{
+	char	   *object_description;
+
+	/*
+	 * Those don't rely on LockDatabaseObject() when being dropped (see
+	 * AcquireDeletionLock()). Also it looks like they can not produce
+	 * orphaned dependent objects when being dropped.
+	 */
+	if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
+		return;
+
+	object_description = getObjectDescription(object, true);
+
+	/* assume we should lock the whole object not a sub-object */
+	LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);
+
+	/* check if object still exists */
+	if (!ObjectByIdExist(object, false))
+	{
+		/*
+		 * It might be possible that we are creating it (for example creating
+		 * a composite type while creating a relation), so bypass the syscache
+		 * lookup and use a dirty snapshot instead to cover this scenario.
+		 */
+		if (!ObjectByIdExist(object, true))
+		{
+			/*
+			 * If the object has been dropped before we get a chance to get
+			 * its description, then emit a generic error message. That looks
+			 * like a good compromise over extra complexity.
+			 */

Re: First draft of PG 17 release notes

2024-05-15 Thread David Rowley
On Wed, 15 May 2024 at 20:38, Alvaro Herrera  wrote:
>
> On 2024-May-14, Bruce Momjian wrote:
> > I don't think users really care about these details, just that it is
> > faster so they will not be surprised if there is a change.  I was
> > purposely vague to group multiple commits into the single item.  By
> > grouping them together, I got enough impact to warrant listing it.  If
> > you split it apart, it is harder to justify mentioning them.
>
> I disagree with this.  IMO the impact of the Sawada/Naylor change is
> likely to be enormous for people with large tables and large numbers of
> tuples to clean up (I know we've had a number of customers in this
> situation, I can't imagine any Postgres service provider that doesn't).
> The fact that maintenance_work_mem is no longer capped at 1GB is very
> important and I think we should mention that explicitly in the release
> notes, as setting it higher could make a big difference in vacuum run
> times.

I very much agree with Alvaro here. IMO, this should be on the
highlight feature list for v17. Prior to this, having to do multiple
index scans because of filling maintenance_work_mem was a performance
tragedy. If there were enough dead tuples to have filled
maintenance_work_mem, then the indexes are large. Having to scan
multiple large indexes multiple times isn't good use of I/O and CPU.
As far as I understand it, this work means it'll be unlikely that a
well-configured server will ever have to do multiple index passes. I
don't think "enormous impact" is an exaggeration here.

David




Re: doc: some fixes for environment sections in ref pages

2024-05-15 Thread Peter Eisentraut

On 13.05.24 13:02, Daniel Gustafsson wrote:

On 13 May 2024, at 10:48, Peter Eisentraut  wrote:



Patches attached.


All patches look good.


I think the first one is a bug fix.


Agreed.


Committed, thanks.





Re: Fixup a few 2023 copyright years

2024-05-15 Thread David Rowley
On Wed, 15 May 2024 at 17:32, Michael Paquier  wrote:
> While running src/tools/copyright.pl, I have noticed that a
> newline was missing at the end of index_including.sql, as an effect of
> the test added by you in a63224be49b8.  I've cleaned up that while on
> it, as it was getting added automatically, and we tend to clean these
> like in 3f1197191685 or more recently c2df2ed90a82.

Thanks for fixing that.  I'm a little surprised that pgindent does not
fix that sort of thing.

David




Re: Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs

2024-05-15 Thread Nazir Bilal Yavuz
Hi,

On Sun, 12 May 2024 at 14:53, Peter Eisentraut  wrote:
>
> On 14.12.23 14:40, Nazir Bilal Yavuz wrote:
> > On Fri, 6 Oct 2023 at 17:07, Tom Lane  wrote:
> >>
> >> As a quick cross-check, I searched our commit log to see how many
> >> README-only commits there were so far this year.  I found 11 since
> >> January.  (Several were triggered by the latest round of pgindent
> >> code and process changes, so maybe this is more than typical.)
> >>
> >> Not sure what that tells us about the value of changing the CI
> >> logic, but it does seem like it could be worth the one-liner
> >> change needed to teach buildfarm animals to ignore READMEs.
> >
> > I agree that it could be worth implementing this logic on buildfarm animals.
> >
> > In case we want to implement the same logic on the CI, I added a new
> > version of the patch; it skips CI completely if the changes are only
> > in the README files.
>
> I don't see how this could be applicable widely enough to be useful:
>
> - While there are some patches that touch on README files, very few of
> those end up in a commit fest.
>
> - If someone manually pushes a change to their own CI environment, I
> don't see why we need to second-guess that.
>
> - Buildfarm runs generally batch several commits together, so it is very
> unlikely that this would be hit.
>
> I think unless some concrete reason for this change can be shown, we
> should drop it.

These points make sense. I thought it would be useful given Tom's
'11 README-only commits since January' analysis (as of 6 Oct 2023), but
that may not be enough on its own. If there are no objections, I will
withdraw this soon.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Underscore in positional parameters?

2024-05-15 Thread Peter Eisentraut

On 14.05.24 18:07, Erik Wienhold wrote:

Patch 0001 changes rules param and param_junk to only accept digits 0-9.


I have committed this patch to PG16 and master.

I was a little bit on the fence about what the behavior should be, but I 
checked Perl for comparison:


print 1000;   # ok
print 1_000;  # ok
print $1000;  # ok
print $1_000; # error

So this seems alright.


Patch 0002 replaces atol with pg_strtoint32_safe in the backend parser
and strtoint in ECPG.  This fixes overflows like:


Seems like a good idea, but as was said, this is an older issue, so 
let's look at that separately.






in parentesis is not usual on DOCs

2024-05-15 Thread Marcos Pegoraro
This page has 3 items that are between parentheses. Is there an
explanation for why they are used this way?

https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-TABLE


Each syntax element is described below in more detail.
*context_item*, *path_expression* [ AS *json_path_name* ] [ PASSING {
*value* AS *varname* } [, ...]]

The input data to query (*context_item*), the JSON path expression defining
the query (*path_expression*) with an optional name (*json_path_name*), and
an optional PASSING clause, which can provide data

Why are (*context_item*), (*path_expression*) and (*json_path_name*)
inside parentheses? This is not usual when explaining any other feature.

regards
Marcos


Re: Converting README documentation to Markdown

2024-05-15 Thread Daniel Gustafsson
> On 13 May 2024, at 09:20, Peter Eisentraut  wrote:

> I started looking through this and immediately found a bunch of tiny 
> problems.  (This is probably in part because the READMEs under 
> src/backend/access/ are some of the more complicated ones, but then they are 
> also the ones that might benefit most from better rendering.)

Thanks for looking!

> One general problem is that original Markdown and GitHub-flavored Markdown 
> (GFM) are incompatible in some interesting aspects.

That's true, but virtually every implementation of Markdown in practical use
today is incompatible with Original Markdown.

Reading my email I realize I failed to mention the markdown platforms I was
targeting (and thus flavours), and citing Gruber made it even more confusing.
For online reading I verified with Github and VS Code since they have a huge
market presence.  For offline work I targeted rendering with pandoc since we
already have a dependency on it in the tree.  I don't think targeting the
original Markdown implementation is useful, or even realistic.

Another aspect of platform/flavour was to make the markdown version easy to
maintain for hackers writing content.  Requiring the minimum amount of markup
seems like the developer-friendly way here to keep productivity as well as
document quality high.

Most importantly though, I targeted reading the files as plain text without any
rendering.  We keep these files in text format close to the code for a reason,
and maintaining readability as text was a north star.

>  For example, the line
> 
>A split initially marks the left page with the F_FOLLOW_RIGHT flag.
> 
> is rendered by GFM as you'd expect.  But original Markdown converts it to
> 
>A split initially marks the left page with the FFOLLOWRIGHT
>flag.
> 
> This kind of problem is pervasive, as you'd expect.

Correct, but I can't imagine that we'd like to wrap every instance of a name
with underscores in backticks like `F_FOLLOW_RIGHT`.  There are very few
Markdown implementations which don't support underscores like this (testing
just now on the top online editors and sites providing markdown editing I
failed to find a single one).

> Also, the READMEs often do not indent lists in a non-ambiguous way.  For 
> example, if you look into src/backend/optimizer/README, section "Join Tree 
> Construction", there are two list items, but it's not immediately clear which 
> paragraphs belong to the list and which ones follow the list.  This also 
> interacts with the previous point.  The resulting formatting in GFM is quite 
> misleading.

I agree that the rendered version exacerbates this problem.  Writing a bullet
point list where each item spans multiple paragraphs indented the same way as
the paragraphs following the list is not helpful to the reader.  In these cases
both the markdown and the text version will be improved by indentation.

> There are also various places where whitespace is used for ad-hoc formatting. 
>  Consider for example in src/backend/access/gin/README
> 
>  the "category" of the null entry.  These are the possible categories:
> 
>1 = ordinary null key value extracted from an indexable item
>2 = placeholder for zero-key indexable item
>3 = placeholder for null indexable item
> 
>  Placeholder null entries are inserted into the index because otherwise
> 
> But this does not preserve the list-like formatting, it just flows it 
> together.

That's the kind of sublists which need to be found as part of this work, and
the items prefixed with a list identifier.  In this case, prefixing each row in
the sublist with '-' yields the correct result.

> src/test/README.md wasn't touched by your patch, but it also needs 
> adjustments for list formatting.

I didn't re-indent that one in order to keep the changes to the absolute
minimum, since I considered the rendered version passable even if not
particularly good.  Re-indenting files like this will for sure make the end
result better, as long as the changes keep the text version readable.

> In summary, I think before we could accept this, we'd need to go through this 
> with a fine-toothed comb line by line and page by page to make sure the 
> formatting is still sound.  

Absolutely.  I've been over every file to ensure they aren't blatantly wrong,
but I didn't want to spend the time if this was immediately shot down as
something the community doesn't want to maintain.

> And we'd need to figure out which Markdown flavor to target.

Absolutely, and as I mentioned above, we need to pick based on both the
final result (text and rendered) and the developer experience of
maintaining this.

--
Daniel Gustafsson





Re: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-15 Thread Amit Kapila
On Wed, May 15, 2024 at 9:26 AM Michael Paquier  wrote:
>
> On Tue, May 14, 2024 at 10:22:29AM +, Ilyasov Ian wrote:
> > Hello, hackers!
> >
> > Recently I've been building postgres with different cflags and cppflags.
> > And suddenly on REL_15_STABLE, REL_16_STABLE and master
> > I faced a failure of a src/test/subscription/t/029_on_error.pl test when
> > CPPFLAGS="-DWAL_DEBUG"
> > and
> > printf "wal_debug = on\n" >> "${TEMP_CONFIG}"
> > (or when both publisher and subscriber or only subscriber are run with 
> > wal_debug=on)
> >
> > So I propose a little fix to the test.
>
> Rather than assuming that the last line is the one to check, wouldn't
> it be better to grab the data from the CONTEXT line located just after
> the ERROR reporting the primary key violation?
>

I guess it could be more work if we want to enhance the test for
ERRORs other than the primary key violation. One simple fix is to
update the log_offset to the location of the LOG after successful
replication of un-conflicted data. For example, the Log location after
we execute the below line in the test:

# Check replicated data
my $res =
  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($res, $expected, $msg);

-- 
With Regards,
Amit Kapila.




Re: in parentesis is not usual on DOCs

2024-05-15 Thread Daniel Gustafsson
> On 15 May 2024, at 14:04, Marcos Pegoraro  wrote:

> Why are (context_item), (path_expression) and (json_path_name) inside
> parentheses? This is not usual when explaining any other feature.

Agreed, that's inconsistent with how, for example, json_table_column is
documented in the next list item under COLUMNS.  Unless objected to, I
will remove these parentheses.

--
Daniel Gustafsson





Re: cataloguing NOT NULL constraints

2024-05-15 Thread Peter Eisentraut

On 15.05.24 09:50, Alvaro Herrera wrote:

On 2024-May-14, Bruce Momjian wrote:


Turns out these commits generated a single release note item, which I
have now removed with the attached committed patch.


Hmm, but the commits about not-null constraints for domains were not
reverted, only the ones for constraints on relations.  I think the
release notes don't properly address the ones on domains.  I think it's
at least these two commits:


-Author: Peter Eisentraut 
-2024-03-20 [e5da0fe3c] Catalog domain not-null constraints
-Author: Peter Eisentraut 
-2024-04-15 [9895b35cb] Fix ALTER DOMAIN NOT NULL syntax


I'm confused that these were kept.  The first one was specifically to 
make the catalog representation of domain not-null constraints 
consistent with table not-null constraints.  But the table part was 
reverted, so now the domain constraints are inconsistent again.


The second one refers to the first one, but it might also fix some 
additional older issue, so it would need more investigation.






Re: First draft of PG 17 release notes

2024-05-15 Thread Melanie Plageman
On Wed, May 15, 2024 at 4:38 AM Alvaro Herrera  wrote:
>
> On 2024-May-14, Bruce Momjian wrote:
>
> > On Tue, May 14, 2024 at 03:39:26PM -0400, Melanie Plageman wrote:
>
> > > I think that we need to more clearly point out the implications of the
> > > feature added by Sawada-san (and reviewed by John) in 667e65aac35497.
> > > Vacuum no longer uses a fixed amount of memory for dead tuple TID
> > > storage and it is not preallocated. This affects users as they may
> > > want to change their configuration (and expectations).
> > >
> > > If you make that item more specific to their work, you should also
> > > remove my name, as the work I did on vacuum this release was unrelated
> > > to their work on dead tuple TID storage.
> > >
> > > The work Heikki and I did which culminated in 6dbb490261 mainly has
> > > the impact of improving vacuum's performance (vacuum emits less WAL
> > > and is more efficient). So you could argue for removing it from the
> > > release notes if you are using the requirement that performance
> > > improvements don't go in the release notes.
> >
> > I don't think users really care about these details, just that it is
> > faster so they will not be surprised if there is a change.  I was
> > purposely vague to group multiple commits into the single item.  By
> > grouping them together, I got enough impact to warrant listing it.  If
> > you split it apart, it is harder to justify mentioning them.
>
> I disagree with this.  IMO the impact of the Sawada/Naylor change is
> likely to be enormous for people with large tables and large numbers of
> tuples to clean up (I know we've had a number of customers in this
> situation, I can't imagine any Postgres service provider that doesn't).
> The fact that maintenance_work_mem is no longer capped at 1GB is very
> important and I think we should mention that explicitly in the release
> notes, as setting it higher could make a big difference in vacuum run
> times.
>
> I don't know what's the impact of the Plageman/Linnakangas work, but
> since there are no user-visible consequences other than it being faster,
> I agree it could be put more succintly, perhaps together as a sub-para
> of the same item.
>
> What about something like this?
>
> 
>  Lift the 1 GB allocation limit for vacuum memory usage for dead
>  tuples, and make storage more compact and performant.
> 
> 
>  This can reduce the number of index passes that vacuum has to perform
>  for tables with many dead tuples, shortening vacuum times.
> 
> 
>  Also, the WAL traffic caused by vacuum has been made more compact.
> 

I think this wording and organization makes sense. I hadn't thought of
using "traffic" to describe this, but I like it.

Also +1 on the Sawada/Naylor change being on the highlight section of
the release (as David suggested upthread).

- Melanie




Re: First draft of PG 17 release notes

2024-05-15 Thread Amit Kapila
On Wed, May 15, 2024 at 7:36 AM Bruce Momjian  wrote:
>
> On Wed, May 15, 2024 at 02:03:32PM +1200, David Rowley wrote:
> > On Wed, 15 May 2024 at 13:00, Bruce Momjian  wrote:
> > >
> > > On Tue, May 14, 2024 at 03:39:26PM -0400, Melanie Plageman wrote:
> > > > "Reduce system calls by automatically merging reads up to 
> > > > io_combine_limit"
> > >
> > > Uh, as I understand it, the reduced number of system calls is not the
> > > value of the feature, but rather the ability to request a larger block
> > > from the I/O subsystem.  Without it, you have to make a request and wait
> > > for each request to finish.  I am open to new wording, but I am not sure
> > > your new wording is accurate.
> >
> > I think you have the cause and effect backwards. There's no advantage
> > to reading 128KB if you only need 8KB.  It's the fact that doing
> > *larger* reads allows *fewer* reads that allows it to be more
> > efficient.  There are also the efficiency gains from fadvise
> > POSIX_FADV_WILLNEED. I'm unsure how to jam that into a short sentence.
> > Maybe; "Optimize reading of tables by allowing pages to be prefetched
> > and read in chunks up to io_combine_limit", or a bit more buzzy;
> > "Optimize reading of tables by allowing pages to be prefetched and
> > performing vectored reads in chunks up to io_combine_limit".
>
> Yes, my point is that it is not the number of system calls or system
> call overhead that is the advantage of this patch, but the ability to
> request more of the I/O system in one call, which is not the same as
> system calls.
>
> I can use your wording, but how much prefetching do we enable now?
>

Shouldn't we include commit
b5a9b18cd0bc6f0124664999b31a00a264d16913 with this item?

-- 
With Regards,
Amit Kapila.




Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Andrei Lepikhov

On 15/5/2024 12:09, Michael Paquier wrote:

On Wed, May 15, 2024 at 03:24:05AM +, Imseih (AWS), Sami wrote:

I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.


When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?


Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly.  Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.
Okay, that's precisely what I wanted to understand: queryId doesn't have
the semantics of showing the job that consumes resources right now; it
is mostly a convenience to know that the backend is processing nothing
except (probably) this query.


--
regards, Andrei Lepikhov





Re: Direct SSL connection with ALPN and HBA rules

2024-05-15 Thread Heikki Linnakangas

On 14/05/2024 01:29, Jacob Champion wrote:

Definitely not a major problem, but I think
select_next_encryption_method() has gone stale, since it originally
provided generality and lines of fallback that no longer exist. In
other words, I think the following code is now misleading:


if (conn->sslmode[0] == 'a')
 SELECT_NEXT_METHOD(ENC_PLAINTEXT);

SELECT_NEXT_METHOD(ENC_NEGOTIATED_SSL);
SELECT_NEXT_METHOD(ENC_DIRECT_SSL);

if (conn->sslmode[0] != 'a')
 SELECT_NEXT_METHOD(ENC_PLAINTEXT);


To me, that implies that negotiated mode takes precedence over direct,
but the point of the patch is that it's not possible to have both. And
if direct SSL is in use, then sslmode can't be "allow" anyway, and we
definitely don't want ENC_PLAINTEXT.

So if someone proposes a change to select_next_encryption_method(),
you'll have to remember to stare at init_allowed_encryption_methods()
as well, and think really hard about what's going on. And vice-versa.
That worries me.


Ok, yeah, I can see that now. Here's a new version to address that. I
merged ENC_SSL_NEGOTIATED_SSL and ENC_SSL_DIRECT_SSL into a single
method, ENC_SSL. The places that need to distinguish between them now
check conn->sslnegotiation. That seems clearer now that there is no
fallback.
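
(For illustration only: with the two constants merged, the fallback
logic quoted above would presumably collapse to something like the
following sketch; the actual code is in the attached v2 patch and may
differ.)

    if (conn->sslmode[0] == 'a')
        SELECT_NEXT_METHOD(ENC_PLAINTEXT);

    SELECT_NEXT_METHOD(ENC_SSL);

    if (conn->sslmode[0] != 'a')
        SELECT_NEXT_METHOD(ENC_PLAINTEXT);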


--
Heikki Linnakangas
Neon (https://neon.tech)
From 7a2bc2ede5ba7bef147e509ce3c4d5472c8e0247 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 15 May 2024 16:27:51 +0300
Subject: [PATCH v2 1/1] Remove option to fall back from direct to postgres SSL
 negotiation

There were three problems with the sslnegotiation options:

1. The sslmode=prefer and sslnegotiation=requiredirect combination was
somewhat dangerous, as you might unintentionally fall back to
plaintext authentication when connecting to a pre-v17 server.

2. There was an asymmetry between 'postgres' and 'direct'
options. 'postgres' meant "try only traditional negotiation", while
'direct' meant "try direct first, and fall back to traditional
negotiation if it fails". That was apparent only if you knew that the
'requiredirect' mode also exists.

3. The "require" word in 'requiredirect' suggests that it's somehow
more strict or more secure, similar to sslmode. However, I don't
consider direct SSL connections to be a security feature.

To address these problems:

- Only allow sslnegotiation='direct' if sslmode='require' or
stronger. And for the record, Jacob and Robert felt that we should do
that (or have sslnegotiation='direct' imply sslmode='require') anyway,
regardless of the first issue.

- Remove the 'direct' mode that falls back to traditional negotiation,
and rename what was called 'requiredirect' to 'direct' instead. In
other words, there is no "try both methods" option anymore, 'postgres'
now means the traditional negotiation and 'direct' means a direct SSL
connection.

Reviewed-by: Jelte Fennema-Nio, Robert Haas, Jacob Champion
Discussion: https://www.postgresql.org/message-id/d3b1608a-a1b6-4eda-9ec5-ddb3e4375808%40iki.fi
---
 doc/src/sgml/libpq.sgml   |  49 ++--
 src/interfaces/libpq/fe-connect.c | 144 +-
 src/interfaces/libpq/fe-secure-openssl.c  |   2 +-
 src/interfaces/libpq/libpq-int.h  |   6 +-
 .../libpq/t/005_negotiate_encryption.pl   | 254 --
 5 files changed, 202 insertions(+), 253 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1d32c226d8..b32e497b1b 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1772,15 +1772,18 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   sslnegotiation
   

-This option controls whether PostgreSQL
-will perform its protocol negotiation to request encryption from the
-server or will just directly make a standard SSL
-connection.  Traditional PostgreSQL
-protocol negotiation is the default and the most flexible with
-different server configurations. If the server is known to support
-direct SSL connections then the latter requires one
-fewer round trip reducing connection latency and also allows the use
-of protocol agnostic SSL network tools.
+This option controls how SSL encryption is negotiated with the server,
+if SSL is used. In the default postgres mode, the
+client first asks the server if SSL is supported. In
+direct mode, the client starts the standard SSL
+handshake directly after establishing the TCP/IP connection. Traditional
+PostgreSQL protocol negotiation is the most
+flexible with different server configurations. If the server is known
+to support direct SSL connections then the latter
+requires one fewer round trip reducing connection latency and also
+allows the use of protocol agnostic SSL network tools. The direct SSL
+option was introduced in PostgreSQL version
+17.

 
 
@@ -1798,32 +1801,14 @@ postgr

RE: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-15 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Ian,

> I guess it could be more work if we want to enhance the test for
> ERRORs other than the primary key violation. One simple fix is to
> update the log_offset to the location of the LOG after successful
> replication of un-conflicted data. For example, the Log location after
> we execute the below line in the test:
> 
> # Check replicated data
> my $res =
>   $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
> is($res, $expected, $msg);

I made a patch for confirmation purposes. This worked well in my environment.
Ian, how about you?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



fix_029.diff
Description: fix_029.diff


Re: Postgres and --config-file option

2024-05-15 Thread David G. Johnston
On Wed, May 15, 2024 at 2:49 AM Peter Eisentraut 
wrote:

> On 15.05.24 04:07, Michael Paquier wrote:
> > Not sure that these additions in --help or the docs are necessary.
> > The rest looks OK.
> >
> > -"You must specify the --config-file or -D invocation "
> > +"You must specify the --config-file (or equivalent -c) or -D
> invocation "
> >
> > How about "You must specify the --config-file, -c
> > \"config_file=VALUE\" or -D invocation"?  There is some practice for
> > --opt=VALUE in .po files.
>
> Yeah, some of this is becoming quite unwieldy, if we document and
> mention each spelling variant of each option everywhere.
>

Where else would this need to be added that was missed?  Largely we don't
discuss how to bring a setting into effect - rather there is a single
reference area that discusses how, and everywhere else just assumes you
have read it and goes on to name the setting.  On these grounds, the
proper fix here is probably to not put the how into the message:

"You must specify the config_file option, the -D argument, or the PGDATA
environment variable."

And this is only unwieldy because while -D and --config-file both can get
to the same result they are not substitutes for each other.  Namely if the
configuration file is not in the data directory, as is the case on Debian,
the choice to use -D is not going to work.

This isn't an error message, so I'm not all that worried if we output a
wall of text in lieu of pointing the user to the reference page.


> Maybe if the original problem is that the option --config-file is not
> explicitly in the --help output, let's add it to the --help output?
>
>
I'm not opposed to this.  Though maybe it is sufficient to do:

--NAME=VALUE (e.g., --config-file='...')

I would do this in addition to removing the explicit how of setting
config_file above.

We also don't mention environment variables in the help but that message
refers to PGDATA...so the complaint and fix if done on that basis seems a
bit selective.

David J.


Re: libpq compression (part 3)

2024-05-15 Thread Robert Haas
On Tue, May 14, 2024 at 5:21 PM Jacob Burroughs
 wrote:
> The reason for both the semicolons and for not doing this is related
> to using the same specification structure as here:
> https://www.postgresql.org/docs/current/app-pgbasebackup.html
> (specifically the --compress argument).

I agree with that goal, but I'm somewhat confused by how your proposal
achieves it. You had
libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off,
so how do we parse that? Is that two completely separate
specifications, one for lzma and one for gzip, and each of those has
one option which is set to off? And then they are separated from each
other by a semicolon? That actually does make sense, and I think it
may do a better job allowing for compression options than my proposal,
but it also seems a bit weird, because client_to_server and
server_to_client are not really compression options at all. They're
framing when this compression specification applies, rather than what
it does when it applies. In a way it's a bit like the fact that you
can prefix a pg_basebackup's --compress option with client- or server-
to specify where the compression should happen. But we can't quite
reuse that idea here, because in that case there's no question of
doing it in both places, whereas here, you might want one thing for
upstream and another thing for downstream.

> Alternatively, we could have `connection_compression`,
> `connection_compression_server_to_client`, and
> `connection_compression_client_to_server` as three separate GUCs (and
> on the client side `compression`, `compression_server_to_client`, and
> `compression_client_to_server` as three separate connection
> parameters), where we would treat `connection_compression` as a
> default that could be overridden by an explicit
> client_to_server/server_to_client.  That creates the slightly funky
> case where if you specify all three then the base one ends up unused
> because the two more specific ones are being used instead, but that
> isn't necessarily terrible.  On the server side we *could* go with
> just the server_to_client and client_to_server ones, but I think we
> want it to be easy to use this feature in the simple case with a
> single libpq parameter.

I'm not a fan of three settings; I could go with two settings, one for
each direction, and if you want both you have to set both. Or, another
idea, what if we just separated the two directions with a slash,
SEND/RECEIVE, and if there's no slash, then it applies to both
directions. So you could say
connection_compression='gzip:level=9/lzma' or whatever.
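
To make the proposed syntax concrete, here is a minimal sketch of how a
spec could be split on the slash; none of this exists in libpq today,
and split_compression_spec() is a made-up name used purely to
illustrate the idea:

    #include <stdlib.h>
    #include <string.h>

    /*
     * "gzip:level=9/lzma" would mean gzip (level 9) for one direction
     * and lzma for the other; a spec without a slash would apply to
     * both directions.
     */
    static void
    split_compression_spec(const char *spec,
                           char **send_spec, char **recv_spec)
    {
        const char *slash = strchr(spec, '/');

        if (slash == NULL)
        {
            /* no slash: the same specification for both directions */
            *send_spec = strdup(spec);
            *recv_spec = strdup(spec);
        }
        else
        {
            *send_spec = strndup(spec, (size_t) (slash - spec));
            *recv_spec = strdup(slash + 1);
        }
    }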

But now I'm wondering whether these options should really be symmetric
on the client and server sides? Isn't it for the server just to
specify a list of acceptable algorithms, and the client to set the
compression options? If both sides are trying to set the compression
level, for example, who wins?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-15 Thread Ilyasov Ian
Dear Hayato,

> I made a patch for confirmation purposes. This worked well in my environment.
> Ian, how about you?

I checked this patch on my environment. It also works well.
I like this change, but as I see it makes a different approach from Michael's 
advice.
Honesly, I do not know what would be better for this test.


Kind regards,
Ian Ilyasov.


Junior Software Developer at Postgres Professional




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-05-15 Thread Robert Haas
Procedural comment:

It's better to get this patch committed with an imperfect test case
than to have it miss beta1.

...Robert




Re: Fixup a few 2023 copyright years

2024-05-15 Thread Tom Lane
David Rowley  writes:
> On Wed, 15 May 2024 at 17:32, Michael Paquier  wrote:
>> While running src/tools/copyright.pl, I have noticed that a
>> newline was missing at the end of index_including.sql, as an effect of
>> the test added by you in a63224be49b8.  I've cleaned up that while on
>> it, as it was getting added automatically, and we tend to clean these
>> like in 3f1197191685 or more recently c2df2ed90a82.

> Thanks for fixing that.  I'm a little surprised that pgindent does not
> fix that sort of thing.

pgindent does not touch anything but .c and .h files.

I do recommend running "git diff --check" (with --staged if you
already git-added your changes) before you're ready to commit
something.  That does find generic whitespace issues, and I
believe it would've found this one.

regards, tom lane




Re: Postgres and --config-file option

2024-05-15 Thread Jelte Fennema-Nio
On Wed, 15 May 2024 at 11:49, Peter Eisentraut  wrote:
> Yeah, some of this is becoming quite unwieldy, if we document and
> mention each spelling variant of each option everywhere.
>
> Maybe if the original problem is that the option --config-file is not
> explicitly in the --help output, let's add it to the --help output?

I definitely think it would be useful to list this --config variant in
more places, imho it's nicer than the -c variant. Especially in the
PGOPTIONS docs it would be useful. People are already using it in the
wild and I regressed on support for that in PgBouncer by accident:
https://github.com/pgbouncer/pgbouncer/pull/1064




Re: Things I don't like about \du's "Attributes" column

2024-05-15 Thread Pavel Luzanov

On 14.05.2024 19:03, Robert Haas wrote:

I think we should go back to the v4 version of this patch, minus the
(ignored) stuff.


Thank you for looking into this.
I take it that you support the idea of changing pg_roles. That's great.

By the way, I have attached a separate thread[1] about pg_roles to this 
commitfest entry[2].

I will return to work on the patch after my vacation.

1. https://www.postgresql.org/message-id/flat/db1d94ba-1e6e-4e86-baff-91e6e79071c1%40postgrespro.ru
2. https://commitfest.postgresql.org/48/4738/

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com


Re: Restrict EXPLAIN (ANALYZE) for RLS and security_barrier views

2024-05-15 Thread Laurenz Albe
On Mon, 2024-05-06 at 16:46 +0200, Laurenz Albe wrote:
> Attached is a POC patch that implements that (documentation and
> regression tests are still missing) to form a basis for a discussion.

... and here is a complete patch with regression tests and documentation.

Yours,
Laurenz Albe
From 201c23d0716af7451375608644a4cf2a002744df Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Wed, 15 May 2024 17:09:48 +0200
Subject: [PATCH v2] Restrict EXPLAIN (ANALYZE) on security relevant statements

Using EXPLAIN (ANALYZE), it is easy to work around restrictions
imposed by row-level security or security barrier views.
This is not considered a security bug, but we ought to do better.
Restricting the use of EXPLAIN (ANALYZE) to superusers in such
cases would be too much, and superusers bypass row-level security,
so that would leave no way to debug such statements.

Consequently, restrict EXPLAIN (ANALYZE) on statements that
involve security_barrier views or row-level security to members
of the predefined role pg_read_all_stats.

Discussion: https://postgr.es/m/3a60be45e7a89f50d166dba49553950d6b8a97f5.camel%40cybertec.at
---
 doc/src/sgml/ddl.sgml  |  4 +++
 doc/src/sgml/ref/explain.sgml  |  7 -
 doc/src/sgml/rules.sgml| 13 +++--
 doc/src/sgml/user-manag.sgml   |  4 ++-
 src/backend/commands/explain.c | 33 ++
 src/test/regress/expected/rowsecurity.out  | 17 +++
 src/test/regress/expected/select_views.out | 18 
 src/test/regress/sql/rowsecurity.sql   | 19 +
 src/test/regress/sql/select_views.sql  | 20 +
 9 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 026bfff70f..74d05300b7 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2560,6 +2560,10 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
no rows are visible or can be modified.  Operations that apply to the
whole table, such as TRUNCATE and REFERENCES,
are not subject to row security.
+   Note that only members of the predefined role pg_read_all_stats
+   may run EXPLAIN (ANALYZE) for a statement that is subject
+   to row security, because the execution statistics can be used to reveal the
+   existence of rows the user is not allowed to see.
   
 
   
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index db9d3a8549..6586983467 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -84,7 +84,12 @@ EXPLAIN [ ( option [, ...] ) ] security_barrier view,
+   only members of the predefined role pg_read_all_stats
+   can use EXPLAIN ANALYZE with statements that use these
+   features.
   
 
   
diff --git a/doc/src/sgml/rules.sgml b/doc/src/sgml/rules.sgml
index 7a928bd7b9..f2c0911643 100644
--- a/doc/src/sgml/rules.sgml
+++ b/doc/src/sgml/rules.sgml
@@ -2167,13 +2167,22 @@ CREATE VIEW phone_number WITH (security_barrier) AS
 view's row filters.
 
 
+
+To prevent attackers from using EXPLAIN (ANALYZE) to
+get information about data excluded by the definition of a view with the
+security_barrier option, only superusers and members
+of the predefined role pg_read_all_stats are allowed
+to use the ANALYZE option of EXPLAIN
+on a statement that involves such a view.
+
+
 
 It is important to understand that even a view created with the
 security_barrier option is intended to be secure only
 in the limited sense that the contents of the invisible tuples will not be
 passed to possibly-insecure functions.  The user may well have other means
-of making inferences about the unseen data; for example, they can see the
-query plan using EXPLAIN, or measure the run time of
+of making inferences about the unseen data; for example, they can
+measure the run time of
 queries against the view.  A malicious attacker might be able to infer
 something about the amount of unseen data, or even gain some information
 about the data distribution or most common values (since these things may
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 07a16247d7..a2109304cd 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -639,7 +639,9 @@ DROP ROLE doomed_role;
   
pg_read_all_stats
Read all pg_stat_* views and use various statistics related extensions,
-   even those normally visible only to superusers.
+   even those normally visible only to superusers.
+   Use EXPLAIN (ANALYZE) on statements that involve row-level
+   security or security_barrier views.
   
   
pg_stat_scan_tables
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c0c73aa3c9..85782e614e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #

Re: BitmapHeapScan streaming read user and prelim refactoring

2024-05-15 Thread Alvaro Herrera
On 2024-May-14, Melanie Plageman wrote:

> On Tue, May 14, 2024 at 4:05 PM Alvaro Herrera  
> wrote:

> > I do wonder how do we _know_ that the test is testing what it wants to
> > test:

> We include the explain output (the plan) to ensure it is still
> using a bitmap heap scan. The test exercises the skip fetch
> optimization in bitmap heap scan when not all of the inner tuples are
> emitted.

I meant -- the query returns an empty resultset, so how do we know it's
the empty resultset that we want and not a different empty resultset
that happens to be identical?  (This is probably not a critical point
anyhow.)

> Without the patch, the test fails, so it is protection against someone
> adding back that assert in the future. It is not protection against
> someone deleting the line
> scan->rs_empty_tuples_pending = 0
> That is, it doesn't verify that the empty unused tuples count is
> discarded. Do you think that is necessary?

I don't think that's absolutely necessary.  I suspect there are
thousands of lines that you could delete that would break things, with
no test directly raising red flags.

At this point I would go with the RMT's recommendation of pushing this
now to make sure the bug is fixed for beta1, even if the test is
imperfect.  You can always polish the test afterwards if you feel like
it ...

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"  (Jorge González)




Re: [DOC] Add detail regarding resource consumption wrt max_connections

2024-05-15 Thread Robert Haas
On Fri, Mar 22, 2024 at 1:57 PM Robert Haas  wrote:
> Rather, I think that it's entirely appropriate to do what Roberto
> suggested, which is to say, let users know that they're going to use
> some extra resources if they increase the setting, and then let them
> figure out what if anything they want to do about that.

Considering that, and the lack of further comment, I propose to commit
the original patch.

Objections?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Simplify documentation related to Windows builds

2024-05-15 Thread Robert Haas
On Thu, Apr 11, 2024 at 9:27 PM Michael Paquier  wrote:
> So, yes, you're right that removing completely this list may be too
> aggressive for the end-user.  As far as I can see, there are a few
> things that stand out:
> - Diff is not mentioned in the list of dependencies on the meson page,
> and it may not exist by default on Windows.  I think that we should
> add it.
> - We don't use activeperl anymore in the buildfarm, and recommending
> it is not a good idea based on the state of the project.  If we don't
> remove the entry, I would switch it to point to msys perl or even
> strawberry perl.  Andres has expressed concerns about the strawberry
> part, so perhaps mentioning only msys perl would be enough?
> - The portion of the docs about command line editing with psql, cygwin
> being mentioned as an exception, does not apply AFAIK.
> - Mentioning more the packaging options that exist to not have to
> install individual MSIs would be a good addition.

I think that we need to get a more definitive answer to the question
of whether command-line editing works or not. I have the impression
that it never has. If it's started working, we should establish that
for certain and probably also establish what made it start working; if
it works provided you do X, Y, or Z, we should establish what those
things are.

I'm cool with adding diff to the list of dependencies.

I'd prefer to see us update the other links rather than delete them.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: AIX support

2024-05-15 Thread Sriram RK
Hi Team, we have an update from the XLC team: the issue specific to the
alignment is fixed, and XLC has released the fix as part of 16.1.0.18. The PTF
is available at the below location.

You can also find a link here:
https://www.ibm.com/support/pages/fix-list-xl-cc-aix.

>>/opt/IBM/xlC/16.1.0/bin/xlC align.c -o align.xl

>>./align.xl
al4096   4096 @ 0x20008000 (mod 0)
al4096_initialized   4096 @ 0x20004000 (mod 0)
al4096_const 4096 @ 0x2000b000 (mod 0)
al4096_const_initialized 4096 @ 0x10008000 (mod 0)
al4096_static4096 @ 0x2000e000 (mod 0)
al4096_static_initialized4096 @ 0x20001000 (mod 0)
al4096_static_const  4096 @ 0x20011000 (mod 0)
al4096_static_const_initialized  4096 @ 0x10001000 (mod 0)


Also would like to know some info related to the request raised for buildfarm 
access, to register the node in OSU lab. Where can I get the status of the 
request? Whom can I contact to get the request approved? So that we can add the 
node to the buildfarm.

Regards,
Sriram.


Re: Adding the extension name to EData / log_line_prefix

2024-05-15 Thread Peter Eisentraut

On 14.05.24 01:11, Tom Lane wrote:

The mechanism that Andres describes for sourcing the name seems a bit
overcomplex though.  Why not just allow/require each extension to
specify its name as a constant string?  We could force the matter by
redefining PG_MODULE_MAGIC as taking an argument:

PG_MODULE_MAGIC("hstore");


We kind of already have something like this, for NLS.  If you look for 
pg_bindtextdomain(TEXTDOMAIN) and ereport_domain(), this information 
already trickles into the vicinity of the error data.  Maybe the same 
thing could just be used for this, by wiring up the macros a bit 
differently.






Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there

2024-05-15 Thread Robert Haas
On Thu, Apr 4, 2024 at 12:41 AM Erik Wienhold  wrote:
> Thanks, fixed in v4.  Looks like American English prefers that comma and
> it's also more common in our docs.

Reviewing this patch:

-  Creates a typed table, which takes its
-  structure from the specified composite type (name optionally
-  schema-qualified).  A typed table is tied to its type; for
-  example the table will be dropped if the type is dropped
-  (with DROP TYPE ... CASCADE).
+  Creates a typed table, which takes its structure
+  from an existing (name optionally schema-qualified) stand-alone composite
+  type (i.e., created using ) though it
+  still produces a new composite type as well.  The table will have
+  a dependency on the referenced type such that cascaded alter and drop
+  actions on the type will propagate to the table.

It would be better if this diff didn't reflow the unchanged portions
of the paragraph.

I agree that it's a good idea to mention that the table must have been
created using CREATE TYPE .. AS here, but I disagree with the rest of
the rewording in this hunk. I think we could just add "created using
CREATE TYPE" to the end of the first sentence, with an xref, and leave
the rest as it is. I don't see a reason to mention that the typed
table also spawns a rowtype; that's just standard CREATE TABLE
behavior and not really relevant here. And I don't understand what the
rest of the rewording does for us.

  
-  When a typed table is created, then the data types of the
-  columns are determined by the underlying composite type and are
-  not specified by the CREATE TABLE command.
+  A typed table always has the same column names and data types as the
+  type it is derived from, and you cannot specify additional columns.
   But the CREATE TABLE command can add defaults
-  and constraints to the table and can specify storage parameters.
+  and constraints to the table, as well as specify storage parameters.
  

I don't see how this is better.

- errmsg("type %s is not a composite type",
+ errmsg("type %s is not a stand-alone composite type",

I agree with Peter's complaint that people aren't going to understand
what a stand-alone composite type means when they see the revised
error message; to really help people, we're going to need to do better
than this, I think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Adding the extension name to EData / log_line_prefix

2024-05-15 Thread Tom Lane
Peter Eisentraut  writes:
> On 14.05.24 01:11, Tom Lane wrote:
>> The mechanism that Andres describes for sourcing the name seems a bit
>> overcomplex though.  Why not just allow/require each extension to
>> specify its name as a constant string?  We could force the matter by
>> redefining PG_MODULE_MAGIC as taking an argument:
>> PG_MODULE_MAGIC("hstore");

> We kind of already have something like this, for NLS.  If you look for 
> pg_bindtextdomain(TEXTDOMAIN) and ereport_domain(), this information 
> already trickles into the vicinity of the error data.  Maybe the same 
> thing could just be used for this, by wiring up the macros a bit 
> differently.

Hmm, cute idea, but it'd only help for extensions that are
NLS-enabled.  Which I bet is a tiny fraction of the population.
So far as I can find, we don't even document how to set up
TEXTDOMAIN for an extension --- you have to cargo-cult the
macro definition from some in-core extension.

regards, tom lane




Re: AIX support

2024-05-15 Thread Noah Misch
On Wed, May 15, 2024 at 03:33:25PM +, Sriram RK wrote:
> Hi Team, we have an update from the XLC team: the issue specific to the
> alignment is fixed, and XLC has released the fix as part of 16.1.0.18. The
> PTF is available at the below location.
> 
> You can also find a link here:
> https://www.ibm.com/support/pages/fix-list-xl-cc-aix.
> 
> >>/opt/IBM/xlC/16.1.0/bin/xlC align.c -o align.xl
> 
> >>./align.xl
> al4096   4096 @ 0x20008000 (mod 0)
> al4096_initialized   4096 @ 0x20004000 (mod 0)
> al4096_const 4096 @ 0x2000b000 (mod 0)
> al4096_const_initialized 4096 @ 0x10008000 (mod 0)
> al4096_static4096 @ 0x2000e000 (mod 0)
> al4096_static_initialized4096 @ 0x20001000 (mod 0)
> al4096_static_const  4096 @ 0x20011000 (mod 0)
> al4096_static_const_initialized  4096 @ 0x10001000 (mod 0)

That is good news.  PGIOAlignedBlock is now in the IBM publication,
https://www.ibm.com/support/pages/apar/IJ51032

> Also would like to know some info related to the request raised for buildfarm 
> access, to register the node in OSU lab. Where can I get the status of the 
> request? Whom can I contact to get the request approved? So that we can add 
> the node to the buildfarm.

I assume you filled out the form at
https://buildfarm.postgresql.org/cgi-bin/register-form.pl?  It can take a few
weeks, so I wouldn't worry yet.




Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-05-15 Thread Robert Haas
On Tue, Apr 23, 2024 at 4:05 PM Nathan Bossart  wrote:
> Here is a first attempt.  I'm not tremendously happy with it, but it at
> least gets something on the page to build on.  I was originally going to
> copy/paste the relevant steps into the description of the incremental
> process, but that seemed kind-of silly, so I instead just pointed to the
> relevant steps of the "full" process, along with the deviations from those
> steps.  That's a little more work for the reader, but maybe it isn't too
> bad...  I plan to iterate on this patch some more.

What jumps out at me when I read this patch is that it says that an
incremental run should do steps 1-3 of a complete run, and then
immediately backtracks and says not to do step 2, which seems a little
strange.

I played around with this a bit and came up with the attached, which
takes a slightly different approach. Feel free to use, ignore, etc.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


pgindent_readme_v2.patch
Description: Binary data


Re: AIX support

2024-05-15 Thread Sriram RK
> > Also would like to know some info related to the request raised for 
> > buildfarm access, to register the
> > node in OSU lab. Where can I get the status of the request? Whom can I 
> > contact to get the request
> > approved? So that we can add the node to the buildfarm.

> I assume you filled out the form at
> https://buildfarm.postgresql.org/cgi-bin/register-form.pl?  It can take a few
> weeks, so I wouldn't worry yet.

Thanks Noah, I had already submitted the form a week back; I understand it
might take another couple of weeks to get it approved.



Re: libpq compression (part 3)

2024-05-15 Thread Jacob Burroughs
On Wed, May 15, 2024 at 8:38 AM Robert Haas  wrote:
>
> I agree with that goal, but I'm somewhat confused by how your proposal
> achieves it. You had
> libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off,
> so how do we parse that? Is that two completely separate
> specifications, one for lzma and one for gzip, and each of those has
> one option which is set to off? And then they are separated from each
> other by a semicolon? That actually does make sense, and I think it
> may do a better job allowing for compression options than my proposal,
> but it also seems a bit weird, because client_to_server and
> server_to_client are not really compression options at all. They're
> framing when this compression specification applies, rather than what
> it does when it applies. In a way it's a bit like the fact that you
> can prefix a pg_basebackup's --compress option with client- or server-
> to specify where the compression should happen. But we can't quite
> reuse that idea here, because in that case there's no question of
> doing it in both places, whereas here, you might want one thing for
> upstream and another thing for downstream.

Your interpretation is correct, but I don't disagree that it ends up
feeling confusing.

> I'm not a fan of three settings; I could go with two settings, one for
> each direction, and if you want both you have to set both. Or, another
> idea, what if we just separated the two directions with a slash,
> SEND/RECEIVE, and if there's no slash, then it applies to both
> directions. So you could say
> connection_compression='gzip:level=9/lzma' or whatever.
>
> But now I'm wondering whether these options should really be symmetric
> on the client and server sides? Isn't it for the server just to
> specify a list of acceptable algorithms, and the client to set the
> compression options? If both sides are trying to set the compression
> level, for example, who wins?

Compression options really only ever apply to the side doing the
compressing, and at least as I had imagined things each party
(client/server) only used its own level/other compression params.
That leaves me thinking, maybe we really want two independent GUCs,
one for "what algorithms are enabled/negotiable" and one for "how
should I configure my compressors" and then we reduce the dimensions
we are trying to shove into one GUC and each one ends up with a very
clear purpose:
connection_compression=(yes|no|alg1,alg2:server_to_client=alg1,alg2:client_to_server=alg3)
connection_compression_opts=gzip:level=2


-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 12:24 PM Jacob Burroughs
 wrote:
> > But now I'm wondering whether these options should really be symmetric
> > on the client and server sides? Isn't it for the server just to
> > specify a list of acceptable algorithms, and the client to set the
> > compression options? If both sides are trying to set the compression
> > level, for example, who wins?
>
> Compression options really only ever apply to the side doing the
> compressing, and at least as I had imagined things each party
> (client/server) only used its own level/other compression params.
> That leaves me thinking, maybe we really want two independent GUCs,
> one for "what algorithms are enabled/negotiable" and one for "how
> should I configure my compressors" and then we reduce the dimensions
> we are trying to shove into one GUC and each one ends up with a very
> clear purpose:
> connection_compression=(yes|no|alg1,alg2:server_to_client=alg1,alg2:client_to_server=alg3)
> connection_compression_opts=gzip:level=2

From my point of view, it's the client who knows what it wants to do
with the connection. If the client plans to read a lot of data, it
might want the server to compress that data, especially if it knows
that it's on a slow link. If the client plans to send a lot of data --
basically COPY, I'm not thinking this is going to matter much
otherwise -- then it might want to compress that data before sending
it, again, especially if it knows that it's on a slow link.

But what does the server know, really? If some client connects and
sends a SELECT query, the server can't guess whether that query is
going to return 1 row or 100 million rows, so it has no idea of
whether compression is likely to make sense or not. It is entitled to
decide, as a matter of policy, that it's not willing to perform
compression, either because of CPU consumption or security concerns or
whatever, but it has no knowledge of what the purpose of this
particular connection is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: GenBKI emits useless open;close for catalogs without rows

2024-05-15 Thread Robert Haas
On Tue, Apr 9, 2024 at 12:03 AM Michael Paquier  wrote:
> Hmm, is that productive?  This patch has been waiting on author since
> the 1st of February, and it was already moved from the CF 2024-01 to
> 2024-03.  It would make more sense to me to mark it as RwF, then
> resubmit if there is still interest in working on this topic rather
> than move it again.

Done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-15 Thread Jacob Burroughs
On Wed, May 15, 2024 at 11:31 AM Robert Haas  wrote:
> From my point of view, it's the client who knows what it wants to do
> with the connection. If the client plans to read a lot of data, it
> might want the server to compress that data, especially if it knows
> that it's on a slow link. If the client plans to send a lot of data --
> basically COPY, I'm not thinking this is going to matter much
> otherwise -- then it might want to compress that data before sending
> it, again, especially if it knows that it's on a slow link.
>
> But what does the server know, really? If some client connects and
> sends a SELECT query, the server can't guess whether that query is
> going to return 1 row or 100 million rows, so it has no idea of
> whether compression is likely to make sense or not. It is entitled to
> decide, as a matter of policy, that it's not willing to perform
> compression, either because of CPU consumption or security concerns or
> whatever, but it has no knowledge of what the purpose of this
> particular connection is.

I think I would agree with that.  That said, I don't think the client
should be in the business of specifying what configuration of the
compression algorithm the server should use.  The server administrator
(or really most of the time, the compression library developer's
defaults) gets to pick the compression/compute tradeoff for
compression that runs on the server (which I would imagine would be
the vast majority of it), and the client gets to pick those same
parameters for any compression that runs on the client machine
(probably indeed in practice only for large COPYs).  The *algorithm*
needs to actually be communicated/negotiated since different
client/server pairs may be built with support for different
compression libraries, but I think it is reasonable to say that the
side that actually has to do the computationally expensive part owns
the configuration of that part too.  Maybe I'm missing a good reason
that we want to allow clients to choose compression levels for the
server though?


-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: Adding the extension name to EData / log_line_prefix

2024-05-15 Thread Chapman Flack
On 05/15/24 11:50, Tom Lane wrote:
> Hmm, cute idea, but it'd only help for extensions that are
> NLS-enabled.  Which I bet is a tiny fraction of the population.
> So far as I can find, we don't even document how to set up
> TEXTDOMAIN for an extension --- you have to cargo-cult the

But I'd bet, within the fraction of the population that does use it,
it is already a short string that looks a whole lot like the name
of the extension. So maybe enhancing the documentation and making it
easy to set up would achieve much of the objective here.

Could PGXS be made to supply the extension name as TEXTDOMAIN when
building code that does not otherwise define it, and would that have
any ill effect on the otherwise not-NLS-enabled code? Would the worst-
case effect be a failed search for a nonexistent .mo file, followed by
output of the untranslated message as before?

At first glance, it appears elog will apply PG_TEXTDOMAIN("postgres")
in an extension that does not otherwise define TEXTDOMAIN. But I assume
the usual effect of that is already a failed lookup followed by output of
the untranslated message, except in the case of the out-of-core extension
using a message matching a PG_TEXTDOMAIN("postgres") translation.

If that case is considered unexpected, or actively discouraged, perhaps
defining TEXTDOMAIN in an otherwise not-NLS-enabled extension could be
relatively painless.

Regards,
-Chap




Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block

2024-05-15 Thread Robert Haas
On Tue, Apr 16, 2024 at 3:16 AM Quan Zongliang  wrote:
> According to the discussion with Jian He. Use the guc hook to check if
> the xid needs to be output. If needed, the statement log can be delayed
> to be output.

I appreciate the work that both of you have put into this, but I think
we should reject this patch and remove the TODO item. We currently
have some facilities (like log_statement) that log the statement
before parsing it or executing it, and others (like
log_min_duration_statement) that log it afterward. That is probably
not documented as clearly as it should be, but it's long-established
behavior.

What this patch does is change the behavior of log_statement so that
log_statement sometimes logs the statement before it's executed, and
sometimes only after it has finished. I think that's going to be confusing
and unhelpful. In particular, right now you can assume that if you set
log_statement=all and there's a statement running, it's already been
logged. With this change, that would sometimes be true and sometimes
false.

For example, suppose that at 9am sharp, I run an UPDATE command that
takes ten seconds to complete. Right now, the log_statement message
will appear at 9am. With this change, it will appear at 9am if I do it
inside a transaction block that has an XID already, and at 9:00:10am
if I do it in a transaction block that does not yet have an XID, or if
I do it outside of a transaction. I don't think the benefit of getting
the XID in the log message is nearly enough to justify such a strange
behavior.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Adding the extension name to EData / log_line_prefix

2024-05-15 Thread Andres Freund
Hi,

On 2024-05-15 12:54:45 -0400, Chapman Flack wrote:
> On 05/15/24 11:50, Tom Lane wrote:
> > Hmm, cute idea, but it'd only help for extensions that are
> > NLS-enabled.  Which I bet is a tiny fraction of the population.
> > So far as I can find, we don't even document how to set up
> > TEXTDOMAIN for an extension --- you have to cargo-cult the
> 
> But I'd bet, within the fraction of the population that does use it,
> it is already a short string that looks a whole lot like the name
> of the extension. So maybe enhancing the documentation and making it
> easy to set up would achieve much of the objective here.

The likely outcome would IMO be that some extensions will have the data,
others not. Whereas inferring the information from our side will give you
something reliable.

But I also just don't think it's something that architecturally fits together
that well. If we either had TEXTDOMAIN reliably set across extensions or it'd
architecturally be pretty, I'd go for it, but imo it's neither.

Greetings,

Andres Freund




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Robert Haas
On Sun, Feb 25, 2024 at 5:24 PM Tomas Vondra
 wrote:
> I think there's about 0% chance we'd add this to "postgres" binary.

Several people have taken this position, so I'm going to mark
https://commitfest.postgresql.org/48/4704/ as Rejected.

My own opinion is that syntax checking is a useful thing to expose,
but I don't believe that this is a useful way to expose it. I think
this comment that Tom made upthread is right on target:

# Another thing I don't like is that this exposes to the user what ought
# to be purely an implementation detail, namely the division of labor
# between gram.y (raw_parser()) and the rest of the parser.  There are
# checks that a user would probably see as "syntax checks" that don't
# happen in gram.y, and conversely there are some things we reject there
# that seem more like semantic than syntax issues.

I think that what that means in practice is that, while this patch may
seem to give reasonable results in simple tests, as soon as you try to
do slightly more complicated things with it, it's going to give weird
results, either failing to flag things that you'd expect it to flag,
or flagging things you'd expect it not to flag. Fixing that would be
either impossible or a huge amount of work depending on your point of
view. If you take the point of view that we need to adjust things so
that the raw parser reports all the things that ought to be reported
by a tool like this and none of the things that it shouldn't, then
it's probably just a huge amount of work. If you take the point of
view that what goes into the raw parser and what goes into parse
analysis ought to be an implementation decision untethered to what a
tool like this ought to report, then fixing the problems would be
impossible, or at least, it would amount to throwing away this patch
and starting over. I think the latter point of view, which Tom has
already taken, would be the more prevalent view among hackers by far,
but even if the former view prevailed, who is going to do all that
work?

I strongly suspect that doing something useful in this area requires
about two orders of magnitude more code than are included in this
patch, and a completely different design. If it actually worked well
to do something this simple, somebody probably would have done it
already. In fact, there are already tools out there for validation,
like https://github.com/okbob/plpgsql_check for example. That tool
doesn't do exactly the same thing as this patch is trying to do, but
it does do other kinds of validation, and it's 19k lines of code, vs.
the 45 lines of code in this patch, which I think reinforces the point
that you need to do something much more complicated than this to
create real value.

Also, the patch as proposed, besides being 45 lines, also has zero
lines of comments, no test cases, no documentation, and doesn't follow
the PostgreSQL coding standards. I'm not saying that to be mean, nor
am I suggesting that Josef should go fix that stuff. It's perfectly
reasonable to propose a small patch without a lot of research to see
what people think -- but now we have the answer to that question:
people think it won't work. So Josef should now decide to either give
up, or try a new approach, or if he's really sure that all the
feedback that has been given so far is completely wrong, he can try to
demonstrate that the patch does all kinds of wonderful things with
very few disadvantages. But I think if he does that last, he's going
to find that it's not really possible, because I'm pretty sure that
Tom is right.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Adding the extension name to EData / log_line_prefix

2024-05-15 Thread Tom Lane
Andres Freund  writes:
> On 2024-05-15 12:54:45 -0400, Chapman Flack wrote:
>> But I'd bet, within the fraction of the population that does use it,
>> it is already a short string that looks a whole lot like the name
>> of the extension. So maybe enhancing the documentation and making it
>> easy to set up would achieve much of the objective here.

> The likely outcome would IMO be that some extensions will have the data,
> others not. Whereas inferring the information from our side will give you
> something reliable.
> But I also just don't think it's something that architecturally fits together
> that well. If we either had TEXTDOMAIN reliably set across extensions or it'd
> architecturally be pretty, I'd go for it, but imo it's neither.

There is one advantage over my suggestion of changing PG_MODULE_MAGIC:
if we tell people to write

   PG_MODULE_MAGIC;
   #undef TEXTDOMAIN
   #define TEXTDOMAIN PG_TEXTDOMAIN("hstore")

then that's 100% backwards compatible and they don't need any
version-testing ifdef's.

I still think that the kind of infrastructure Andres proposes
is way overkill compared to the value, plus it's almost certainly
going to have a bunch of platform-specific problems to solve.
So I think Peter's thought is worth pursuing.

regards, tom lane




Re: Adding the extension name to EData / log_line_prefix

2024-05-15 Thread Chapman Flack
On 05/15/24 13:45, Tom Lane wrote:
> if we tell people to write
> 
>PG_MODULE_MAGIC;
>#undef TEXTDOMAIN
>#define TEXTDOMAIN PG_TEXTDOMAIN("hstore")
> 
> then that's 100% backwards compatible and they don't need any
> version-testing ifdef's.

OT for this thread, but related: supposing out-of-core extensions
participate increasingly in NLS, would they really want to use
the PG_TEXTDOMAIN macro?

That munges the supplied domain name with PG's major version and
.so version numbers.

Were such versioning wanted for an out-of-core extension's message
catalogs, wouldn't the extension's own versioning be better suited?

Regards,
-Chap





Re: psql JSON output format

2024-05-15 Thread Robert Haas
On Tue, Jan 23, 2024 at 11:35 AM Christoph Berg  wrote:
> Ack.

The last version of this patch was posted on January 22nd and got a
bunch of replies, so I'm marking
https://commitfest.postgresql.org/48/4707/ as Returned with Feedback
for now. Please feel free to update the status of the patch when the
situation changes.

IMHO, the big problem here is that different people want different
corner-case behaviors and it's not clear what to do about that. I
don't think there's a single vote for "don't do this at all". So if
there is a desire to take this work forward, the goal probably ought
to be to try to either (a) figure out one behavior that everyone can
live with or (b) figure out a short list of options that can be used
to customize the behavior to a degree that lets everyone get something
reasonably close to what they want. For instance, "what to do if you
find a SQL null" and "whether to include json values as strings or
json objects" seem like they could potentially be customizable. That's
probably not a silver bullet because (1) that's more work and (2)
there might be more behaviors than we want to code, or maintain the
code for, and (3) if it gets too complicated that can itself become a
source of objections. But it's an idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
st 15. 5. 2024 v 19:43 odesílatel Robert Haas  napsal:
>
> On Sun, Feb 25, 2024 at 5:24 PM Tomas Vondra
>  wrote:
> > I think there's about 0% chance we'd add this to "postgres" binary.
>
> Several people have taken this position, so I'm going to mark
> https://commitfest.postgresql.org/48/4704/ as Rejected.
>
> My own opinion is that syntax checking is a useful thing to expose,
> but I don't believe that this is a useful way to expose it. I think
> this comment that Tom made upthread is right on target:
>
> # Another thing I don't like is that this exposes to the user what ought
> # to be purely an implementation detail, namely the division of labor
> # between gram.y (raw_parser()) and the rest of the parser.  There are
> # checks that a user would probably see as "syntax checks" that don't
> # happen in gram.y, and conversely there are some things we reject there
> # that seem more like semantic than syntax issues.
>
> I think that what that means in practice is that, while this patch may
> seem to give reasonable results in simple tests, as soon as you try to
> do slightly more complicated things with it, it's going to give weird
> results, either failing to flag things that you'd expect it to flag,
> or flagging things you'd expect it not to flag. Fixing that would be
> either impossible or a huge amount of work depending on your point of
> view. If you take the point of view that we need to adjust things so
> that the raw parser reports all the things that ought to be reported
> by a tool like this and none of the things that it shouldn't, then
> it's probably just a huge amount of work. If you take the point of
> view that what goes into the raw parser and what goes into parse
> analysis ought to be an implementation decision untethered to what a
> tool like this ought to report, then fixing the problems would be
> impossible, or at least, it would amount to throwing away this patch
> and starting over. I think the latter point of view, which Tom has
> already taken, would be the more prevalent view among hackers by far,
> but even if the former view prevailed, who is going to do all that
> work?
>
> I strongly suspect that doing something useful in this area requires
> about two orders of magnitude more code than are included in this
> patch, and a completely different design. If it actually worked well
> to do something this simple, somebody probably would have done it
> already. In fact, there are already tools out there for validation,
> like https://github.com/okbob/plpgsql_check for example. That tool
> doesn't do exactly the same thing as this patch is trying to do, but
> it does do other kinds of validation, and it's 19k lines of code, vs.
> the 45 lines of code in this patch, which I think reinforces the point
> that you need to do something much more complicated than this to
> create real value.
>
> Also, the patch as proposed, besides being 45 lines, also has zero
> lines of comments, no test cases, no documentation, and doesn't follow
> the PostgreSQL coding standards. I'm not saying that to be mean, nor
> am I suggesting that Josef should go fix that stuff. It's perfectly
> reasonable to propose a small patch without a lot of research to see
> what people think -- but now we have the answer to that question:
> people think it won't work. So Josef should now decide to either give
> up, or try a new approach, or if he's really sure that all the
> feedback that has been given so far is completely wrong, he can try to
> demonstrate that the patch does all kinds of wonderful things with
> very few disadvantages. But I think if he does that last, he's going
> to find that it's not really possible, because I'm pretty sure that
> Tom is right.

I'm totally OK with marking this as rejected; indeed, the 45 lines were just
a minimal patch to create a PoC (I coded it during the last PGConf.eu
lunch break), mainly to start a discussion.

I'm not sure everyone in this thread understands the reason for this
patch, which is clearly my fault, since I have failed to explain. Main
idea is to make a tool to validate query can be parsed, that's all.
Similar to running "EXPLAIN query", but not caring about the result
and not caring about the DB structure (ignoring missing tables, ...),
just checking it was successfully executed. This definitely belongs to
the server side and not to the client side, it is just a tool to
validate that for this running PostgreSQL backend it is a "parseable"
query.

I'm not giving up on this, but I hear there are various problems to
explore. If I understand it well, just running the parser on a query
doesn't guarantee the query is valid, since it can fail later for
various reasons (I mean other than a missing table, ...). I wasn't aware
of that. Also, exposing this inside the postgres binary seems
controversial. I had an idea to expose the parser result at the SQL level
with a new command (similar to EXPLAIN), but you'd need a running
PostgreSQL backend to be able to use this capability, which is agains

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
pá 15. 12. 2023 v 15:50 odesílatel Tom Lane  napsal:
>
> Laurenz Albe  writes:
> > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> >> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
> >> of ideas for PostgreSQL
> >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> >> quick patch to do SQL syntax validation.
> >>
> >> What do you think?
>
> > I like the idea.  But what will happen if the SQL statement references
> > tables or other objects, since we have no database?
>
> This seems like a fairly useless wart to me.

this hurts :'(

>
> In the big picture a command line switch in the postgres executable
> doesn't feel like the right place for this.  There's no good reason
> to assume that the server executable will be installed where you want
> this capability; not to mention the possibility of version skew
> between that executable and whatever installation you're actually
> running on.
>
> Another thing I don't like is that this exposes to the user what ought
> to be purely an implementation detail, namely the division of labor
> between gram.y (raw_parser()) and the rest of the parser.  There are
> checks that a user would probably see as "syntax checks" that don't
> happen in gram.y, and conversely there are some things we reject there
> that seem more like semantic than syntax issues.
>
> regards, tom lane




Re: Use streaming read API in ANALYZE

2024-05-15 Thread Nazir Bilal Yavuz
Hi,

On Mon, 29 Apr 2024 at 18:41, Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> On Mon, 8 Apr 2024 at 04:21, Thomas Munro  wrote:
> >
> > Pushed.  Thanks Bilal and reviewers!
>
> I wanted to discuss what will happen to this patch now that
> 27bc1772fc8 is reverted. I am continuing this thread but I can create
> another thread if you prefer so.

041b96802ef is discussed in the 'Table AM Interface Enhancements'
thread [1]. The main problem discussed about this commit is that the
read stream API is not pushed into the heap-specific code and, because
of that, the other AM implementations need to use read streams. To
push the read stream API into the heap-specific code, it is pretty much
required to pass BufferAccessStrategy and BlockSamplerData to
initscan().

I am sharing the alternative version of this patch. The first patch
just reverts 041b96802ef and the second patch is the alternative
version.

In this alternative version, the read stream API is not pushed into the
heap-specific code, but it is controlled by the heap-specific code.
A SO_USE_READ_STREAMS_IN_ANALYZE flag is introduced and set in the
heap-specific code if the scan type is 'ANALYZE'. This flag is used to
decide whether the streaming API will be used in ANALYZE or not. If the
flag is set, it means the heap AM is in use and the read stream API will
be used. If it is not set, the table AM is not heap and the code falls
back to the version before read streams.
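
To sketch the mechanism (simplified fragments, not the actual hunks; only
the SO_USE_READ_STREAMS_IN_ANALYZE name is taken from this proposal): the
heap-specific code marks the scan, and acquire_sample_rows() branches on
the flag.

    /* heap-specific code, when the scan is started for ANALYZE */
    if (flags & SO_TYPE_ANALYZE)
        scan->rs_base.rs_flags |= SO_USE_READ_STREAMS_IN_ANALYZE;

    /* acquire_sample_rows(), deciding which code path to take */
    bool        use_read_stream =
        (scan->rs_flags & SO_USE_READ_STREAMS_IN_ANALYZE) != 0;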

Pros of the alternative version:

* The existing AM implementations other than heap AM can continue to
use their AMs without any change.
* AM implementations other than heap do not need to use read streams.
* Upstream code uses the read stream API and benefits from that.

Cons of the alternative version:

* 6 if cases are added to the acquire_sample_rows() function and 3 of
them are in the while loop.
* Because of these changes, the code looks messy.

Any kind of feedback would be appreciated.

[1] 
https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 323f28ff979cde8e4dbde8b4654bded74abf1fbc Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Wed, 15 May 2024 00:03:56 +0300
Subject: [PATCH v13 1/2] Revert "Use streaming I/O in ANALYZE."

This commit reverts 041b96802ef.

041b96802ef revised the changes on 27bc1772fc8 but 27bc1772fc8 and
dd1f6b0c172 are reverted together in 6377e12a5a5. So, this commit
reverts all 27bc1772fc, 041b96802ef and dd1f6b0c172 together.

Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
---
 src/include/access/tableam.h | 26 +++
 src/backend/access/heap/heapam_handler.c | 38 +-
 src/backend/commands/analyze.c   | 96 ++--
 3 files changed, 98 insertions(+), 62 deletions(-)

diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd5..e08b9627f30 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,7 +21,6 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "executor/tuptable.h"
-#include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -655,16 +654,6 @@ typedef struct TableAmRoutine
 	struct VacuumParams *params,
 	BufferAccessStrategy bstrategy);
 
-	/*
-	 * Prepare to analyze the next block in the read stream.  Returns false if
-	 * the stream is exhausted and true otherwise. The scan must have been
-	 * started with SO_TYPE_ANALYZE option.
-	 *
-	 * This routine holds a buffer pin and lock on the heap page.  They are
-	 * held until heapam_scan_analyze_next_tuple() returns false.  That is
-	 * until all the items of the heap page are analyzed.
-	 */
-
 	/*
 	 * Prepare to analyze block `blockno` of `scan`. The scan has been started
 	 * with table_beginscan_analyze().  See also
@@ -683,7 +672,8 @@ typedef struct TableAmRoutine
 	 * isn't one yet.
 	 */
 	bool		(*scan_analyze_next_block) (TableScanDesc scan,
-			ReadStream *stream);
+			BlockNumber blockno,
+			BufferAccessStrategy bstrategy);
 
 	/*
 	 * See table_scan_analyze_next_tuple().
@@ -1721,17 +1711,19 @@ table_relation_vacuum(Relation rel, struct VacuumParams *params,
 }
 
 /*
- * Prepare to analyze the next block in the read stream. The scan needs to
- * have been  started with table_beginscan_analyze().  Note that this routine
- * might acquire resources like locks that are held until
+ * Prepare to analyze block `blockno` of `scan`. The scan needs to have been
+ * started with table_beginscan_analyze().  Note that this routine might
+ * acquire resources like locks that are held until
  * table_scan_analyze_next_tuple() returns false.
  *
  * Returns false if block is unsuitable for sampling, true otherwise.
  */
 static inline bool
-table_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
+table_scan_analyze_next_block(TableScanDesc scan, 

Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

2024-05-15 Thread Jacob Champion
On Tue, May 14, 2024 at 11:15 PM Peter Eisentraut  wrote:
>
> On 15.05.24 02:00, Michael Paquier wrote:
> > Is that a recent regression?  I have some blurry memories from
> > working on these areas that changing src/common/ reflected on the
> > compiled pieces effectively at some point.
>
> One instance of this problem that I can reproduce at least back to PG12 is
>
> 1. touch src/common/exec.c
> 2. make -C src/bin/pg_dump
>
> This will rebuild libpgcommon, but it will not relink pg_dump.

I remember src/common/unicode changes having similar trouble, as well [1].

--Jacob

[1] 
https://www.postgresql.org/message-id/CAFBsxsGZTwzDnTs=TVM38CCTPP3Y0D3=h+uiwt8m83d5thh...@mail.gmail.com




Re: Direct SSL connection with ALPN and HBA rules

2024-05-15 Thread Jacob Champion
On Wed, May 15, 2024 at 6:33 AM Heikki Linnakangas  wrote:
> Ok, yeah, I can see that now. Here's a new version to address that. I
> merged ENC_SSL_NEGOTIATED_SSL and ENC_SSL_DIRECT_SSL to a single method,
> ENC_SSL. The places that need to distinguish between them now check
> conn->sslnegotiation. That seems more clear now that there is no fallback.

That change and the new comment that were added seem a lot clearer to
me, too; +1. And I like that this potentially preps for
encryption=gss/ssl/none or similar.

This assertion seems a little strange to me:

>   if (conn->sslnegotiation[0] == 'p')
>   {
>   ProtocolVersion pv;
>
>   Assert(conn->sslnegotiation[0] == 'p');

But other than that nitpick, nothing else jumps out at me at the moment.

Thanks,
--Jacob




Re: Support a wildcard in backtrace_functions

2024-05-15 Thread Robert Haas
Hi,

This patch is currently parked in the July CommitFest:

https://commitfest.postgresql.org/48/4735/

That's fine, except that I think that what the previous discussion
revealed is that we don't have consensus on how backtrace behavior
ought to be controlled: backtrace_on_internal_error was one proposal,
and this was a competing proposal, and neither one of them seemed to
be completely satisfactory. If we don't forge a consensus on what a
hypothetical patch ought to do, any particular actual patch is
probably doomed. So I would suggest that the effort ought to be
deciding what kind of design would be generally acceptable -- and that
need not wait for July, nor should it, if the goal is to get something
committed in July.

...Robert




Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Imseih (AWS), Sami
> Okay, that's what I precisely wanted to understand: queryId doesn't have
> semantics to show the job that consumes resources right now—it is mostly
> about convenience to know that the backend processes nothing except
> (probably) this query.

It may be a good idea to expose, in pg_stat_activity or a
supplemental activity view, information about the current state of
query processing, i.e. whether it is parsing, planning, or executing a
query, or processing a nested query.

I can see this being useful and perhaps could be taken up in a 
separate thread.

Regards,

Sami






Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Tom Lane
=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
> I'm not sure everyone in this thread understands the reason for this
> patch, which is clearly my fault, since I have failed to explain. Main
> idea is to make a tool to validate query can be parsed, that's all.
> Similar to running "EXPLAIN query", but not caring about the result
> and not caring about the DB structure (ignoring missing tables, ...),
> just checking it was successfully executed. This definitely belongs to
> the server side and not to the client side, it is just a tool to
> validate that for this running PostgreSQL backend it is a "parseable"
> query.

The thing that was bothering me most about this is that I don't
understand why that's a useful check.  If I meant to type

UPDATE mytab SET mycol = 42;

and instead I type

UPDATEE mytab SET mycol = 42;

your proposed feature would catch that; great.  But if I type

UPDATE mytabb SET mycol = 42;

it won't.  How does that make sense?  I'm not entirely sure where
to draw the line about what a "syntax check" should catch, but this
seems a bit south of what I'd want in a syntax-checking editor.

BTW, if you do feel that a pure grammar check is worthwhile, you
should look at the ecpg preprocessor, which does more or less that
with the SQL portions of its input.  ecpg could be a better model
to follow because it doesn't bring all the dependencies of the server
and so is much more likely to appear in a client-side installation.
It's kind of an ugly, unmaintained mess at the moment, sadly.

The big knock on doing this client-side is that there might be
version skew compared to the server you're using --- but if you
are not doing anything beyond a grammar-level check then your
results are pretty approximate anyway, ISTM.  We've not heard
anything suggesting that version skew is a huge problem for
ecpg users.

regards, tom lane




Fix PGresult leak in pg_dump during binary upgrade

2024-05-15 Thread Daniel Gustafsson
While looking at pg_dump performance today I noticed that pg_dump fails to
clear query results in binary_upgrade_set_pg_class_oids during binary upgrade
mode.  9a974cbcba00 moved the query to the outer block, but left the PQclear
and query buffer destruction in the is_index conditional, making it not always
be executed.  353708e1fb2d fixed the leak of the query buffer but left the
PGresult leak.  The attached fixes the PGresult leak which when upgrading large
schemas can be non-trivial.
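
To make the pattern concrete, here is a minimal sketch of the problem shape
(placeholder function name and query, not the actual
binary_upgrade_set_pg_class_oids() code):

    #include "libpq-fe.h"

    static void
    set_pg_class_oids(PGconn *conn, bool is_index)
    {
        PGresult   *res = PQexec(conn, "SELECT oid FROM pg_class LIMIT 1");

        /* ... emit the OID-preserving SQL based on res ... */

        if (is_index)
        {
            /* index-specific handling ... */
        }

        /*
         * Before the fix, the PQclear() lived only inside the is_index
         * branch, so res leaked for every non-index relation; the fix is
         * to clear the result unconditionally once both branches are done.
         */
        PQclear(res);
    }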

This needs to be backpatched down to v15.

--
Daniel Gustafsson



0001-Fix-query-result-leak-during-binary-upgrade.patch
Description: Binary data


Re: add function argument names to regex* functions.

2024-05-15 Thread Robert Haas
On Thu, Apr 4, 2024 at 9:55 AM jian he  wrote:
> in the regexp_replace explanation section.
> changing "N" to lower-case would be misleading for regexp_replace?
> so I choose "count".

I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:

+ then the count'th match of the pattern

An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.

> By the way, I think the above is so hard to comprehend.
> I can only find related test in src/test/regress/sql/strings.sql are:
> SELECT regexp_replace('111222', E'(\\d{3})(\\d{3})(\\d{4})',
> E'(\\1) \\2-\\3');
> SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g');
> SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'XY', 'g');
>
> but these tests seem not friendly.
> maybe we should have some simple examples to demonstrate the above paragraph.

Examples in the regression tests aren't meant as tests, not examples
for users to copy. If we want examples, those belong in the
documentation. However, I see that regexp_replace already has some
examples at 
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
so I'm not sure exactly what you think should be added.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix PGresult leak in pg_dump during binary upgrade

2024-05-15 Thread Tom Lane
Daniel Gustafsson  writes:
> While looking at pg_dump performance today I noticed that pg_dump fails to
> clear query results in binary_upgrade_set_pg_class_oids during binary upgrade
> mode.  9a974cbcba00 moved the query to the outer block, but left the PQclear
> and query buffer destruction in the is_index conditional, making it not always
> be executed.  353708e1fb2d fixed the leak of the query buffer but left the
> PGresult leak.  The attached fixes the PGresult leak which when upgrading 
> large
> schemas can be non-trivial.

+1 --- in 353708e1f I was just fixing what Coverity complained about.
I wonder why it missed this; it does seem to understand that PGresult
leaks are a thing.  But anyway, I missed it too.

regards, tom lane




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread walther

Tom Lane:

The thing that was bothering me most about this is that I don't
understand why that's a useful check.  If I meant to type

UPDATE mytab SET mycol = 42;

and instead I type

UPDATEE mytab SET mycol = 42;

your proposed feature would catch that; great.  But if I type

UPDATE mytabb SET mycol = 42;

it won't.  How does that make sense?  I'm not entirely sure where
to draw the line about what a "syntax check" should catch, but this
seems a bit south of what I'd want in a syntax-checking editor.

BTW, if you do feel that a pure grammar check is worthwhile, you
should look at the ecpg preprocessor, which does more or less that
with the SQL portions of its input.  ecpg could be a better model
to follow because it doesn't bring all the dependencies of the server
and so is much more likely to appear in a client-side installation.
It's kind of an ugly, unmaintained mess at the moment, sadly.


Would working with ecpg allow getting back a parse tree of the query to
do stuff with it?


This is really what is missing for the ecosystem. A libpqparser for 
tools to use: Formatters, linters, query rewriters, simple syntax 
checkers... they are all missing access to postgres' own parser.


Best,

Wolfgang





Re: add function argument names to regex* functions.

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 2:46 PM Robert Haas  wrote:
> Examples in the regression tests aren't meant as tests, not examples
> for users to copy. If we want examples, those belong in the
> documentation. However, I see that regexp_replace already has some
> examples at 
> https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
> so I'm not sure exactly what you think should be added.

Woops. I should have said: Examples in the regression tests *are*
meant as tests...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
st 15. 5. 2024 v 20:39 odesílatel Tom Lane  napsal:
>
> =?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
> > I'm not sure everyone in this thread understands the reason for this
> > patch, which is clearly my fault, since I have failed to explain. Main
> > idea is to make a tool to validate query can be parsed, that's all.
> > Similar to running "EXPLAIN query", but not caring about the result
> > and not caring about the DB structure (ignoring missing tables, ...),
> > just checking it was successfully executed. This definitely belongs to
> > the server side and not to the client side, it is just a tool to
> > validate that for this running PostgreSQL backend it is a "parseable"
> > query.
>
> The thing that was bothering me most about this is that I don't
> understand why that's a useful check.  If I meant to type
>
> UPDATE mytab SET mycol = 42;
>
> and instead I type
>
> UPDATEE mytab SET mycol = 42;
>
> your proposed feature would catch that; great.  But if I type
>
> UPDATE mytabb SET mycol = 42;
>
> it won't.  How does that make sense?  I'm not entirely sure where
> to draw the line about what a "syntax check" should catch, but this
> seems a bit south of what I'd want in a syntax-checking editor.

This is exactly where the line is drawn. My motivation is not to use
this feature for syntax checking in an editor (even though it could be
used to find those banalities). I'm looking to improve automation to be
able to detect those banalities as early as possible. Let's say there is
complex CI automation configuring and starting a PostgreSQL backend,
loading some data, ... and in the end all of this is useless, since
there is a simple mistake like UPDATEE somewhere. I would like to detect
this problem as early as possible and stop the complex CI pipeline, to
save time and also to save resources (= money) by failing super-early
and reporting back. This kind of mistake can easily be introduced by,
for example, a wrongly resolved git conflict or a human typing error.

This kind of mistake (typo, ...) can usually be spotted super early:
in compiled languages during compilation, in interpreted languages
(like Ruby) at program start (since code is parsed as one of the first
steps). There is currently no such early detection possible for
PostgreSQL, IMHO.

> BTW, if you do feel that a pure grammar check is worthwhile, you
> should look at the ecpg preprocessor, which does more or less that
> with the SQL portions of its input.  ecpg could be a better model
> to follow because it doesn't bring all the dependencies of the server
> and so is much more likely to appear in a client-side installation.
> It's kind of an ugly, unmaintained mess at the moment, sadly.
>
> The big knock on doing this client-side is that there might be
> version skew compared to the server you're using --- but if you
> are not doing anything beyond a grammar-level check then your
> results are pretty approximate anyway, ISTM.  We've not heard
> anything suggesting that version skew is a huge problem for
> ecpg users.

Thanks for the info, I'll check.

> regards, tom lane




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Tom Lane
walt...@technowledgy.de writes:
> Tom Lane:
>> BTW, if you do feel that a pure grammar check is worthwhile, you
>> should look at the ecpg preprocessor, which does more or less that
>> with the SQL portions of its input.

> Would working with ecpg allow to get back a parse tree of the query to 
> do stuff with that?

No, ecpg isn't interested in building a syntax tree.

> This is really what is missing for the ecosystem. A libpqparser for 
> tools to use: Formatters, linters, query rewriters, simple syntax 
> checkers... they are all missing access to postgres' own parser.

To get to that, you'd need some kind of agreement on what the syntax
tree is.  I doubt our existing implementation would be directly useful
to very many tools, and even if it is, do they want to track constant
version-to-version changes?

regards, tom lane




Re: add function argument names to regex* functions.

2024-05-15 Thread David G. Johnston
On Wed, May 15, 2024 at 11:46 AM Robert Haas  wrote:

> On Thu, Apr 4, 2024 at 9:55 AM jian he 
> wrote:
> > in the regexp_replace explanation section.
> > changing "N" to lower-case would be misleading for regexp_replace?
> > so I choose "count".
>
> I don't see why that would be confusing for regexp_replace
> specifically, but I think N => count is a reasonable change to make.
> However, I don't think this quite works:
>
> + then the count'th match of the pattern
>
> An English speaker is more likely to understand what is meant by
> "N'th" than what is meant by "count'th". Even if they can guess, it's
> kinda strange-looking. I think it needs to be rephrased somehow, but
> I'm not sure exactly how.
>
>
I think this confusion goes to show that replacing N with count doesn't
work.

"replace_at" comes to mind as a better name.

By default, only the first match of the pattern is replaced.  If replace_at
is specified and greater than zero, then the first "replace_at - 1" matches
are skipped before making a single replacement (i.e., the g flag is ignored
when replace_at is specified.)

David J.


Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 2:39 PM Tom Lane  wrote:
> The thing that was bothering me most about this is that I don't
> understand why that's a useful check.  If I meant to type
>
> UPDATE mytab SET mycol = 42;
>
> and instead I type
>
> UPDATEE mytab SET mycol = 42;
>
> your proposed feature would catch that; great.  But if I type
>
> UPDATE mytabb SET mycol = 42;
>
> it won't.  How does that make sense?  I'm not entirely sure where
> to draw the line about what a "syntax check" should catch, but this
> seems a bit south of what I'd want in a syntax-checking editor.

I don't agree with this, actually. The first wrong example can never
be valid, while the second one can be valid given the right table
definitions. That lines up quite nicely with the distinction between
parsing and parse analysis. There is a problem with actually getting
all the way there, I'm fairly sure, because we've got thousands of
lines of gram.y and thousands of lines of parse analysis code that
weren't all written with the idea of making a crisp distinction here.
For example, I'd like both EXPLAIN (WAFFLES) SELECT 1 and EXPLAIN
WAFFLES SELECT 1 to be flagged as syntactically invalid, and with
things as they are that would not happen. Even for plannable
statements I would not be at all surprised to hear that there are a
bunch of corner cases that we'd get wrong.

But I don't understand the idea that the concept doesn't make sense. I
think it is perfectly reasonable to imagine a world in which the
initial parsing takes care of reporting everything that can be
determined by static analysis without knowing anything about catalog
contents, and parse analysis checks all the things that require
catalog access, and you can run the first set of checks and then
decide whether to proceed further. I think if I were designing a new
system from scratch, I'd definitely want to set it up that way, and I
think moving our existing system in that direction would probably let
us clean up a variety of warts along the way. Really, the only
argument I see against it is that getting from where we are to there
would be a gigantic amount of work for the value we'd derive.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 2:12 PM Josef Šimánek  wrote:
> I'm totally OK to mark this as rejected and indeed 45 lines were just
> minimal patch to create PoC (I have coded this during last PGConf.eu
> lunch break) and mainly to start discussion.

Thanks for understanding.

> I'm not sure everyone in this thread understands the reason for this
> patch, which is clearly my fault, since I have failed to explain. Main
> idea is to make a tool to validate query can be parsed, that's all.
> Similar to running "EXPLAIN query", but not caring about the result
> and not caring about the DB structure (ignoring missing tables, ...),
> just checking it was successfully executed. This definitely belongs to
> the server side and not to the client side, it is just a tool to
> validate that for this running PostgreSQL backend it is a "parseable"
> query.

I don't think it's at all obvious that this belongs on the server side
rather than the client side. I think it could be done in either place,
or both. I just think we don't have the infrastructure to do it
cleanly, at present.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: add function argument names to regex* functions.

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
 wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.

I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: add function argument names to regex* functions.

2024-05-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 4, 2024 at 9:55 AM jian he  wrote:
>> changing "N" to lower-case would be misleading for regexp_replace?
>> so I choose "count".

> I don't see why that would be confusing for regexp_replace
> specifically, but I think N => count is a reasonable change to make.
> However, I don't think this quite works:
> + then the count'th match of the pattern

I think the origin of the problem here is not wanting to use "N"
as the actual name of the parameter, because then users would have
to double-quote it to write "regexp_replace(..., "N" => 42, ...)".

However ... is that really so awful?  It's still fewer keystrokes
than "count".  It's certainly a potential gotcha for users who've
not internalized when they need double quotes, but I think we
could largely address that problem just by making sure to provide
a documentation example that shows use of "N".
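For example, something along these lines (the positional call matches the
existing regexp_replace() documentation; the named-argument call is purely
hypothetical, assuming the patch ends up giving the parameters names like
start, N, and flags):

-- positional form: replace only the 3rd match, case-insensitively
SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
-- result: A PostgrXSQL function

-- hypothetical named-argument form; the upper-case name needs double quotes
SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X',
                      start => 1, "N" => 3, flags => 'i');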

> An English speaker is more likely to understand what is meant by
> "N'th" than what is meant by "count'th".

+1 ... none of the proposals make that bit read more clearly
than it does now.

regards, tom lane




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread walther

Tom Lane:

>> This is really what is missing for the ecosystem. A libpqparser for
>> tools to use: Formatters, linters, query rewriters, simple syntax
>> checkers... they are all missing access to postgres' own parser.
>
> To get to that, you'd need some kind of agreement on what the syntax
> tree is.  I doubt our existing implementation would be directly useful
> to very many tools, and even if it is, do they want to track constant
> version-to-version changes?


Correct, on top of what the syntax tree currently has, one would 
probably need:

- comments
- locations (line number / character) for everything, including those of 
comments


Otherwise it's impossible to print proper SQL again without losing 
information.


And then on top of that, to be really really useful, you'd need to be 
able to parse partial statements, too, to support all kinds of "language 
server" applications.


Tracking version-to-version changes is exactly the reason why it would
be good to have that from upstream, imho. New syntax is added in
(almost?) every release, and everyone outside core trying to write their
own parser and stay up to date with **all** the new syntax will
eventually fail.


Yes, there could be changes to the produced parse tree as well, and you'd
also need to adjust, for example, your SQL-printers. But it should be
easier to stay up to date than it is right now.


Best,

Wolfgang




Re: Incorrect Assert in BufFileSize()?

2024-05-15 Thread Peter Geoghegan
On Tue, May 14, 2024 at 6:58 AM David Rowley  wrote:
> On Fri, 3 May 2024 at 16:03, David Rowley  wrote:
> > I'm trying to figure out why BufFileSize() Asserts that file->fileset
> > isn't NULL, per 1a990b207.
>
> I was hoping to get some feedback here.

Notice that comments above BufFileSize() say "Return the current
fileset based BufFile size". There are numerous identical assertions
at the start of several other functions within the same file.

I'm not sure what it would take for this function to support
OpenTemporaryFile()-based BufFiles -- possibly not very much at all. I
assume that that's what you're actually interested in doing here (you
didn't say). If it is, you'll need to update the function's contract
-- just removing the assertion isn't enough.

-- 
Peter Geoghegan




Re: add function argument names to regex* functions.

2024-05-15 Thread Chapman Flack
On 05/15/24 15:07, Robert Haas wrote:
> is. I believe that if I were reading the documentation, count would be
> clearer to me than N, N would probably still be clear enough, and
> replace_at wouldn't be clear at all. I'd expect replace_at to be a
> character position or something, not an occurrence count.

You've said the magic word. In the analogous (but XQuery-based)
ISO standard regex functions, the argument that does that is identified
with the keyword OCCURRENCE.

What would be wrong with that, for consistency's sake?

Regards,
-Chap




Re: add function argument names to regex* functions.

2024-05-15 Thread David G. Johnston
On Wed, May 15, 2024 at 12:07 PM Robert Haas  wrote:

> On Wed, May 15, 2024 at 3:01 PM David G. Johnston
>  wrote:
> > I think this confusion goes to show that replacing N with count doesn't
> work.
> >
> > "replace_at" comes to mind as a better name.
>
> I do not agree with that at all. It shows that a literal
> search-and-replace changing N to count does not work, but it does not
> show that count is a bad name for the concept, and I don't think it
> is. I believe that if I were reading the documentation, count would be
> clearer to me than N, N would probably still be clear enough, and
> replace_at wouldn't be clear at all. I'd expect replace_at to be a
> character position or something, not an occurrence count.
>
>
The function replaces matches, not random characters.  And if you are
reading the documentation I find it implausible that the wording I
suggested would cause one to think in terms of characters instead of
matches.

If I choose not to read the documentation, "count" seems like it behaves as
a qualified "g": I don't want all matches replaced, I want only the first
"count" matches replaced.

"occurrence" probably is the best choice but I agree the spelling issues
are a big negative.

count - how many things there are.  This isn't a count.  I'd rather stick
with N, at least it actually has the desired meaning as a pointer to an
item in a list.

N - The label provides zero context as to what the number you place there
is going to be used for.  Labels ideally do more work than this especially
if someone takes the time to spell them out.  Otherwise why use "pattern"
instead of "p".

David J.


Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 15, 2024 at 2:39 PM Tom Lane  wrote:
>> The thing that was bothering me most about this is that I don't
>> understand why that's a useful check. ...

> But I don't understand the idea that the concept doesn't make sense.

Sorry: "make sense" was a poorly chosen phrase there.  What I was
doubting, and continue to doubt, is that 100% checking of what
you can check without catalog access and 0% checking of the rest
is a behavior that end users will find especially useful.

> I think it is perfectly reasonable to imagine a world in which the
> initial parsing takes care of reporting everything that can be
> determined by static analysis without knowing anything about catalog
> contents, and parse analysis checks all the things that require
> catalog access, and you can run the first set of checks and then
> decide whether to proceed further. I think if I were designing a new
> system from scratch, I'd definitely want to set it up that way, and I
> think moving our existing system in that direction would probably let
> us clean up a variety of warts along the way. Really, the only
> argument I see against it is that getting from where we are to there
> would be a gigantic amount of work for the value we'd derive.

I'm less enthusiastic about this than you are.  I think it would likely
produce a slower and less maintainable system.  Slower because we'd
need more passes over the query: what parse analysis does today would
have to be done in at least two separate steps.  Less maintainable
because knowledge about certain things would end up being more spread
around the system.  Taking your example of getting syntax checking to
recognize invalid EXPLAIN keywords: right now there's just one piece
of code that knows what those options are, but this'd require two.
Also, "run the first set of checks and then decide whether to proceed
further" seems like optimizing the system for broken queries over
valid ones, which I don't think is an appropriate goal.

Now, I don't say that there'd be *no* benefit to reorganizing the
system that way.  But it wouldn't be an unalloyed win, and so I
share your bottom line that the costs would be out of proportion
to the benefits.

regards, tom lane




Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-05-15 Thread Nathan Bossart
On Wed, May 15, 2024 at 12:06:03PM -0400, Robert Haas wrote:
> What jumps out at me when I read this patch is that it says that an
> incremental run should do steps 1-3 of a complete run, and then
> immediately backtracks and says not to do step 2, which seems a little
> strange.
> 
> I played around with this a bit and came up with the attached, which
> takes a slightly different approach. Feel free to use, ignore, etc.

This is much cleaner, thanks.  The only thing that stands out to me is that
the "once per release cycle" section should probably say to do an indent
run after downloading the typedef file.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: date_trunc function in interval version

2024-05-15 Thread Robert Haas
On Mon, Mar 4, 2024 at 5:03 AM Przemysław Sztoch  wrote:
> Apparently the functionality is identical to date_bin.
> When I saw date_bin in the documentation, I thought it solved all my problems.
> Unfortunately, DST problems have many corner cases.
> I tried to change date_bin several times, but unfortunately in some cases it 
> would start working differently than before.

So, first of all, thanks for taking an interest and sending a patch.

In order for the patch to have a chance of being accepted, we would
need to have a clear understanding of exactly how this patch is
different from the existing date_bin(). If we knew that, we could
decide either that (a) date_bin does the right thing and your patch
does the wrong thing and therefore we should reject your patch, or we
could decide that (b) date_bin does the wrong thing and therefore we
should fix it, or we could decide that (c) both date_bin and what this
patch does are correct, in the sense of being sensible things to do,
and there is a reason to have both. But if we don't really understand
how they are different, which seems to be the case right now, then we
can't make any decisions. And what that means in practice is that
nobody is going to be willing to commit anything, and we're just going
to go around in circles.

Typically, this kind of research is the responsibility of the patch
author: you're the one who wants something changed, so that means you
need to provide convincing evidence that it should be. If someone else
volunteers to do it, that's also cool, but it absolutely has to be
done in order for there to be a chance of progress here. No committer
is going to say "well, we already have date_bin, but Przemysław says
his date_trunc is different somehow, so let's have both without
understanding how exactly they're different." That's just not a
realistic scenario. Just to name one problem, how would we document
each of them? Users would expect the documentation to explain how two
closely-related functions differ, but we will be unable to explain
that if we don't know the answer ourselves.

If you can't figure out exactly what the differences are by code
inspection, then maybe one thing you could do to help unblock things
here is provide some very clear examples of when they deliver the same
results and when they deliver different results. Although there are no
guarantees, that might lead somebody else to jump in and suggest an
explanation, or further avenues of analysis, or some other helpful
comment.

Personally, what I suspect is that there's already a way to do what
you want using date_bin(), maybe in conjunction with some casting or
some calls to other functions that we already have. But it's hard to
be sure because we just don't have the details. "DST problems have
many corner cases" and "in some cases [date_bin] would start working
differently than before" may be true statements as far as they go, but
they're not very specific complaints. If you can describe *exactly*
how date_bin fails to meet your expectations, there is a much better
chance that something useful will happen here.
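
For reference, a minimal baseline using the functions that exist today
looks like this:

SELECT date_trunc('hour', TIMESTAMP '2020-02-11 15:44:17');
-- 2020-02-11 15:00:00

SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17',
                TIMESTAMP '2001-01-01');
-- 2020-02-11 15:30:00

Showing exactly where the proposed function agrees with and diverges from
these (presumably in the timestamptz cases around DST transitions) is the
kind of example that would help.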

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: add function argument names to regex* functions.

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 3:23 PM Chapman Flack  wrote:
> What would be wrong with that, for consistency's sake?

It was proposed and rejected upthread, but that's not to say that I
necessarily endorse the reasons given.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread David G. Johnston
On Wed, May 15, 2024 at 12:18 PM  wrote:

> Tom Lane:
> >> This is really what is missing for the ecosystem. A libpqparser for
> >> tools to use: Formatters, linters, query rewriters, simple syntax
> >> checkers... they are all missing access to postgres' own parser.
> >
> > To get to that, you'd need some kind of agreement on what the syntax
> > tree is.  I doubt our existing implementation would be directly useful
> > to very many tools, and even if it is, do they want to track constant
> > version-to-version changes?
>
> Correct, on top of what the syntax tree currently has, one would
> probably need:
> - comments
> - locations (line number / character) for everything, including those of
> comments
>
>
I'm with the original patch idea at this point.  A utility that simply runs
text through the parser, not parse analysis, and answers the question:
"Were you able to parse this?" has both value and seems like something that
can be patched into core in a couple of hundred lines, not thousands, as
has already been demonstrated.

Sure, other questions are valid and other goals exist in the ecosystem, but
that doesn't diminish this one which is sufficiently justified for my +1 on
the idea.

Now, in my ideal world something like this could be made as an extension so
that it can work on older versions and not have to be maintained by core.
And likely grow more features over time.  Is there anything fundamental
about this that prevents it being implemented in an extension and, if so,
what can we add to core in v18 to alleviate that limitation?

David J.


Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
On Wed, May 15, 2024 at 9:28 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Wed, May 15, 2024 at 2:39 PM Tom Lane  wrote:
> >> The thing that was bothering me most about this is that I don't
> >> understand why that's a useful check. ...
>
> > But I don't understand the idea that the concept doesn't make sense.
>
> Sorry: "make sense" was a poorly chosen phrase there.  What I was
> doubting, and continue to doubt, is that 100% checking of what
> you can check without catalog access and 0% checking of the rest
> is a behavior that end users will find especially useful.

But that's a completely different feature which is not exclusive with this
one, and it shouldn't block this feature from doing exactly what was specified.

> > I think it is perfectly reasonable to imagine a world in which the
> > initial parsing takes care of reporting everything that can be
> > determined by static analysis without knowing anything about catalog
> > contents, and parse analysis checks all the things that require
> > catalog access, and you can run the first set of checks and then
> > decide whether to proceed further. I think if I were designing a new
> > system from scratch, I'd definitely want to set it up that way, and I
> > think moving our existing system in that direction would probably let
> > us clean up a variety of warts along the way. Really, the only
> > argument I see against it is that getting from where we are to there
> > would be a gigantic amount of work for the value we'd derive.
>
> I'm less enthusiastic about this than you are.  I think it would likely
> produce a slower and less maintainable system.  Slower because we'd
> need more passes over the query: what parse analysis does today would
> have to be done in at least two separate steps.  Less maintainable
> because knowledge about certain things would end up being more spread
> around the system.  Taking your example of getting syntax checking to
> recognize invalid EXPLAIN keywords: right now there's just one piece
> of code that knows what those options are, but this'd require two.
> Also, "run the first set of checks and then decide whether to proceed
> further" seems like optimizing the system for broken queries over
> valid ones, which I don't think is an appropriate goal.
>
> Now, I don't say that there'd be *no* benefit to reorganizing the
> system that way.  But it wouldn't be an unalloyed win, and so I
> share your bottom line that the costs would be out of proportion
> to the benefits.
>
> regards, tom lane




Re: An improved README experience for PostgreSQL

2024-05-15 Thread Nathan Bossart
On Wed, May 15, 2024 at 07:23:19AM +0200, Peter Eisentraut wrote:
> I think for CONTRIBUTING.md, a better link would be
> .

WFM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 74de0e89bea2802bf699397837ebf77252a0e06b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 16 Apr 2024 21:23:52 -0500
Subject: [PATCH v3 1/1] Add code of conduct, contributing, and security files.

---
 .github/CODE_OF_CONDUCT.md | 2 ++
 .github/CONTRIBUTING.md| 2 ++
 .github/SECURITY.md| 2 ++
 3 files changed, 6 insertions(+)
 create mode 100644 .github/CODE_OF_CONDUCT.md
 create mode 100644 .github/CONTRIBUTING.md
 create mode 100644 .github/SECURITY.md

diff --git a/.github/CODE_OF_CONDUCT.md b/.github/CODE_OF_CONDUCT.md
new file mode 100644
index 00..99bb1905d6
--- /dev/null
+++ b/.github/CODE_OF_CONDUCT.md
@@ -0,0 +1,2 @@
+The PostgreSQL code of conduct can be found at
+.
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
new file mode 100644
index 00..0108e72956
--- /dev/null
+++ b/.github/CONTRIBUTING.md
@@ -0,0 +1,2 @@
+For information about contributing to PostgreSQL, see
+.
diff --git a/.github/SECURITY.md b/.github/SECURITY.md
new file mode 100644
index 00..ebdbe609db
--- /dev/null
+++ b/.github/SECURITY.md
@@ -0,0 +1,2 @@
+For information about reporting security issues, see
+.
-- 
2.25.1



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
On Wed, May 15, 2024 at 9:33 PM David G. Johnston
 wrote:
>
> On Wed, May 15, 2024 at 12:18 PM  wrote:
>>
>> Tom Lane:
>> >> This is really what is missing for the ecosystem. A libpqparser for
>> >> tools to use: Formatters, linters, query rewriters, simple syntax
>> >> checkers... they are all missing access to postgres' own parser.
>> >
>> > To get to that, you'd need some kind of agreement on what the syntax
>> > tree is.  I doubt our existing implementation would be directly useful
>> > to very many tools, and even if it is, do they want to track constant
>> > version-to-version changes?
>>
>> Correct, on top of what the syntax tree currently has, one would
>> probably need:
>> - comments
>> - locations (line number / character) for everything, including those of
>> comments
>>
>
> I'm with the original patch idea at this point.  A utility that simply runs 
> text through the parser, not parse analysis, and answers the question: "Were 
> you able to parse this?" has both value and seems like something that can be 
> patched into core in a couple of hundred lines, not thousands, as has already 
> been demonstrated.
>
> Sure, other questions are valid and other goals exist in the ecosystem, but 
> that doesn't diminish this one which is sufficiently justified for my +1 on 
> the idea.
>
> Now, in my ideal world something like this could be made as an extension so 
> that it can work on older versions and not have to be maintained by core.  
> And likely grow more features over time.  Is there anything fundamental about 
> this that prevents it being implemented in an extension and, if so, what can 
> we add to core in v18 to alleviate that limitation?

Like an extension providing an additional binary? Or what kind of extension
do you mean? One of the original ideas was to be able to do this (parse a
query) without running postgres itself. Could an extension be useful
without running a postgres backend?

> David J.
>




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread David G. Johnston
On Wed, May 15, 2024 at 12:35 PM Josef Šimánek 
wrote:

> st 15. 5. 2024 v 21:33 odesílatel David G. Johnston
>  napsal:
>
> > Now, in my ideal world something like this could be made as an extension
> so that it can work on older versions and not have to be maintained by
> core.  And likely grow more features over time.  Is there anything
> fundamental about this that prevents it being implemented in an extension
> and, if so, what can we add to core in v18 to alleviate that limitation?
>
> Like extension providing additional binary? Or what kind of extension
> do you mean? One of the original ideas was to be able to do so (parse
> query) without running postgres itself. Could extension be useful
> without running postgres backend?
>
>
Pushing beyond my experience level here...but yes a separately installed
binary (extension is being used conceptually here, this doesn't involve
"create extension") that can inspect pg_config to find out where
backend/postmaster library objects are installed and link to them.

David J.


Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Tom Lane
"David G. Johnston"  writes:
> Now, in my ideal world something like this could be made as an extension so
> that it can work on older versions and not have to be maintained by core.
> And likely grow more features over time.  Is there anything fundamental
> about this that prevents it being implemented in an extension and, if so,
> what can we add to core in v18 to alleviate that limitation?

It'd be pretty trivial to create a function that takes a string
and runs it through raw_parser --- I've got such things laying
about for microbenchmarking purposes, in fact.  But the API that'd
present for tools such as editors is enormously different from
the proposed patch: there would need to be a running server and
they'd need to be able to log into it, plus there are more minor
concerns like having to wrap the string in valid quoting.
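
Usage would presumably end up looking something like this, where
pg_check_parse is an invented name purely for illustration (no such
function exists today); dollar quoting sidesteps most of the string
quoting concerns:

SELECT pg_check_parse($$UPDATEE mytab SET mycol = 42$$);
-- reports the grammar error without needing any catalog access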

Now on the plus side, once you'd bought into that environment,
it'd be equally trivial to offer alternatives like "run raw
parsing and parse analysis, but don't run the query".  I continue
to maintain that that's the set of checks you'd really want in a
lot of use-cases.

regards, tom lane




More links on release-17.html

2024-05-15 Thread Marcos Pegoraro
While looking through the changes and new features at
https://www.postgresql.org/docs/devel/release-17.html
I see that there are too few links to other DOC pages, which would be
useful.

There are links to
"logical-replication", "sql-merge", "plpgsql", "libpq" and
"pgstatstatements"

But no link is available for
COPY "ON_ERROR ignore", pg_dump, JSON_TABLE(), xmltext(), pg_basetype(),
and a lot of other important features. So wouldn't it be good for those to
have their own links, so the reader doesn't need to manually search for
each feature?

regards
Marcos


Re: add function argument names to regex* functions.

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 3:25 PM David G. Johnston
 wrote:
> The function replaces matches, not random characters.  And if you are reading 
> the documentation I find it implausible that the wording I suggested would 
> cause one to think in terms of characters instead of matches.

I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?

> N - The label provides zero context as to what the number you place there is 
> going to be used for.  Labels ideally do more work than this especially if 
> someone takes the time to spell them out.  Otherwise why use "pattern" 
> instead of "p".

I feel like you're attacking a straw man here. I never said that N was
my first choice; in fact, I said the opposite. But I do think that if
the documentation says, as it does, that the function is
regexp_replace(source, pattern, replacement, start, N, flags), a
reader who has some idea what a function called regexp_replace might
do will probably be able to guess what N is. It's probably also true
that if we changed "pattern" to "p" they would still be able to guess
that too, because there's nothing other than a pattern that you'd
expect to pass to a regexp-replacement function that starts with p,
but it would still be worse than what we have now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [DOC] Add detail regarding resource consumption wrt max_connections

2024-05-15 Thread Robert Treat
On Wed, May 15, 2024 at 11:14 AM Robert Haas  wrote:
>
> On Fri, Mar 22, 2024 at 1:57 PM Robert Haas  wrote:
> > Rather, I think that it's entirely appropriate to do what Roberto
> > suggested, which is to say, let users know that they're going to use
> > some extra resources if they increase the setting, and then let them
> > figure out what if anything they want to do about that.
>
> Considering that, and the lack of further comment, I propose to commit
> the original patch.
>
> Objections?
>

I think the only unresolved question in my mind was if we should add a
similar note to the original patch to max_prepared_xacts as well; do
you intend to do that?

Robert Treat
https://xzilla.net




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 3:28 PM Tom Lane  wrote:
> Sorry: "make sense" was a poorly chosen phrase there.  What I was
> doubting, and continue to doubt, is that 100% checking of what
> you can check without catalog access and 0% checking of the rest
> is a behavior that end users will find especially useful.

You might be right, but my guess is that you're wrong. Syntax
highlighting is very popular, and this seems like a more sophisticated
version of that same concept. I don't personally like it or use it
myself, but I bet I'm hugely in the minority these days. And EDB
certainly gets customer requests for syntax checking of various kinds;
whether this particular kind would get more or less traction than
other things is somewhat moot in view of the low likelihood of it
actually happening.

> I'm less enthusiastic about this than you are.  I think it would likely
> produce a slower and less maintainable system.  Slower because we'd
> need more passes over the query: what parse analysis does today would
> have to be done in at least two separate steps.  Less maintainable
> because knowledge about certain things would end up being more spread
> around the system.  Taking your example of getting syntax checking to
> recognize invalid EXPLAIN keywords: right now there's just one piece
> of code that knows what those options are, but this'd require two.
> Also, "run the first set of checks and then decide whether to proceed
> further" seems like optimizing the system for broken queries over
> valid ones, which I don't think is an appropriate goal.

Well, we've talked before about other problems that stem from the fact
that DDL doesn't have a clear separation between parse analysis and
execution. I vaguely imagine that it would be valuable to clean that
up for various reasons. But I haven't really thought it through, so
I'm prepared to concede that it might have various downsides that
aren't presently obvious to me.

> Now, I don't say that there'd be *no* benefit to reorganizing the
> system that way.  But it wouldn't be an unalloyed win, and so I
> share your bottom line that the costs would be out of proportion
> to the benefits.

I'm glad we agree on that much, and don't feel a compelling need to
litigate the remaining differences between our positions, unless you
really want to. I'm just telling you what I think, and I'm pleased
that we think as similarly as we do, despite remaining differences.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [DOC] Add detail regarding resource consumption wrt max_connections

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 4:00 PM Robert Treat  wrote:
> I think the only unresolved question in my mind was if we should add a
> similar note to the original patch to max_prepared_xacts as well; do
> you intend to do that?

I didn't intend to do that. I don't think it would be incorrect to do
so, but then we're kind of getting into a slippery slope of trying to
label every parameter that increases shared memory usage or any
other kind of resource consumption, and there are probably (pulls
number out of the air) twenty of those. It seems more worthwhile to
mention it for max_connections than the other (deducts one from
previous random guess) nineteen because it affects a whole lot more
things, like the size of the fsync queue and the size of the lock
table, and also because it tends to get set to relatively large
values, unlike, for example, autovacuum_max_workers. If you think we
should go further than just doing max_connections, then I think we
either need to (a) add a note to every single bloody parameter that
affects the size of shared memory or (b) prove that the subset where
we add such a note have a significantly larger impact than the others
where we don't. Do you think we should get into all that?
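
For what it's worth, a rough way to eyeball where shared memory on a
running server actually goes is the pg_shmem_allocations view (available
since v13):

-- largest shared memory allocations first
SELECT name, size
FROM pg_shmem_allocations
ORDER BY size DESC
LIMIT 10;

That doesn't answer the documentation question, of course, but it does
show how many different consumers there are.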

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 3:30 PM Nathan Bossart  wrote:
> On Wed, May 15, 2024 at 12:06:03PM -0400, Robert Haas wrote:
> > What jumps out at me when I read this patch is that it says that an
> > incremental run should do steps 1-3 of a complete run, and then
> > immediately backtracks and says not to do step 2, which seems a little
> > strange.
> >
> > I played around with this a bit and came up with the attached, which
> > takes a slightly different approach. Feel free to use, ignore, etc.
>
> This is much cleaner, thanks.  The only thing that stands out to me is that
> the "once per release cycle" section should probably say to do an indent
> run after downloading the typedef file.

How's this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


pgindent_readme_v3.patch
Description: Binary data


Re: add function argument names to regex* functions.

2024-05-15 Thread David G. Johnston
On Wed, May 15, 2024 at 12:52 PM Robert Haas  wrote:

> On Wed, May 15, 2024 at 3:25 PM David G. Johnston
>  wrote:
> > The function replaces matches, not random characters.  And if you are
> reading the documentation I find it implausible that the wording I
> suggested would cause one to think in terms of characters instead of
> matches.
>
> I mean I just told you what my reaction to it was. If you find that
> reaction "implausible" then I guess you think I was lying when I said
> that?
>
>
You just broke my brain when you say that you read:

By default, only the first match of the pattern is replaced.  If replace_at
is specified and greater than zero, then the first "replace_at - 1" matches
are skipped before making a single replacement (i.e., the g flag is ignored
when replace_at is specified.)

And then say:

I'd expect replace_at to be a character position or something, not an
occurrence count.

I guess it isn't a claim that you are lying; rather, I simply don't follow
your mental model of all this, and in the mental model behind the proposal I
don't believe the typical reader will become confused on that point.  I guess
that means I don't find you to be the typical reader, at least so far as
this specific topic goes.  But hey, maybe I'm the one in the minority.  In
either case we disagree, and that was my main point.

David J.


Re: add function argument names to regex* functions.

2024-05-15 Thread David G. Johnston
On Wed, May 15, 2024 at 12:07 PM Robert Haas  wrote:

> On Wed, May 15, 2024 at 3:01 PM David G. Johnston
>  wrote:
> > I think this confusion goes to show that replacing N with count doesn't
> work.
> >
> > "replace_at" comes to mind as a better name.
> I'd expect replace_at to be a
> character position or something, not an occurrence count.
>
>
I'll amend the name to: "replace_match"

I do now see that, since the immediately preceding parameter, "start", deals
with characters instead of matches, making it clear in the name that this
parameter deals in matches is worth doing.  The singular 'match' has all the
same benefits as 'at', plus this point of clarity.


David J.


Re: add function argument names to regex* functions.

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 4:13 PM David G. Johnston
 wrote:
>
> You just broke my brain when you say that you read:
>
> By default, only the first match of the pattern is replaced.  If replace_at 
> is specified and greater than zero, then the first "replace_at - 1" matches 
> are skipped before making a single replacement (i.e., the g flag is ignored 
> when replace_at is specified.)
>
> And then say:
>
> I'd expect replace_at to be a character position or something, not an 
> occurrence count.

Ah. What I meant was: if I just saw the parameter name, and not the
documentation, I believe that I would not correctly understand what it
did. I would have had to read the docs. Whereas I'm pretty sure at
some point years ago, I looked up these functions and I saw "N", and I
did understand what that did without needing it explained. If I had
seen "count" or "occurrence" I think I would have understood that
without further explanation, too.

So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.

If your mileage varies on that point, so be it!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: More performance improvements for pg_dump in binary upgrade mode

2024-05-15 Thread Nathan Bossart
On Wed, May 15, 2024 at 10:15:13PM +0200, Daniel Gustafsson wrote:
> With the typarray caching from the patch attached here added *and* Nathan's
> patch from [0] added:
> 
> $ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
>   --format=custom --file a postgres > /dev/null
> 
> real  0m1.566s
> user  0m0.309s
> sys   0m0.080s
> 
> The combination of these patches thus puts binary upgrade mode almost on par
> with a plain dump, which has the potential to make upgrades of large schemas
> faster.  Parallel-parking this patch with Nathan's in the July CF, just wanted
> to type it up while it was fresh in my mind.

Nice!  I'll plan on taking a closer look at this one.  I have a couple
other ideas in-flight (e.g., parallelizing the once-in-each-database
operations with libpq's asynchronous APIs) that I'm hoping to post soon,
too.  v18 should have a lot of good stuff for pg_upgrade...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [DOC] Add detail regarding resource consumption wrt max_connections

2024-05-15 Thread Robert Treat
On Wed, May 15, 2024 at 4:05 PM Robert Haas  wrote:
>
> On Wed, May 15, 2024 at 4:00 PM Robert Treat  wrote:
> > I think the only unresolved question in my mind was if we should add a
> > similar note to the original patch to max_prepared_xacts as well; do
> > you intend to do that?
>
> I didn't intend to do that. I don't think it would be incorrect to do
> so, but then we're kind of getting into a slippery slope of trying to
> label every parameter that increases shared memory usage or any
> other kind of resource consumption, and there are probably (pulls
> number out of the air) twenty of those. It seems more worthwhile to
> mention it for max_connections than the other (deducts one from
> previous random guess) nineteen because it affects a whole lot more
> things, like the size of the fsync queue and the size of the lock
> table, and also because it tends to get set to relatively large
> values, unlike, for example, autovacuum_max_workers. If you think we
> should go further than just doing max_connections, then I think we
> either need to (a) add a note to every single bloody parameter that
> affects the size of shared memory or (b) prove that the subset where
> we add such a note have a significantly larger impact than the others
> where we don't. Do you think we should get into all that?
>

Nope. Let's do the best bang for the buck improvement and we can see
if we get any feedback that indicates more needs to be done.

Robert Treat
https://xzilla.net




Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-05-15 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 15, 2024 at 3:30 PM Nathan Bossart  
> wrote:
>> This is much cleaner, thanks.  The only thing that stands out to me is that
>> the "once per release cycle" section should probably say to do an indent
>> run after downloading the typedef file.

> How's this?

This works for me.  One point that could stand discussion while we're
here is whether the once-a-cycle run should use the verbatim buildfarm
results or whether it's okay to editorialize on that typedefs list.  I did a
little of the latter in da256a4a7, and I feel like we should either
bless that practice in this document or decide that it was a bad idea.

For reference, what I added to the buildfarm's list was

+InjectionPointCacheEntry
+InjectionPointCondition
+InjectionPointConditionType
+InjectionPointEntry
+InjectionPointSharedState
+NotificationHash
+ReadBuffersFlags
+ResourceOwnerData
+WaitEventExtension
+WalSyncMethod

I believe all of these must have been added manually during v17.
If I took any one of them out there was some visible disimprovement
in pgindent's results, so I kept them.  Was that the right decision?
If so we should explain it here.

regards, tom lane



