Re: SQL:2023 JSON simplified accessor support

2024-09-27 Thread Andrew Dunstan



On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote:

On Sep 26, 2024, at 16:45, Alexandra Wang  wrote:


I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!

Oh, nice! I don’t suppose the standard also has defined an operator equivalent to 
->>, though, has it? I tend to want the text output far more often than a JSON 
scalar.



That would defeat being able to chain these.
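To illustrate the chaining point, here is a minimal sketch using the existing
operators (not taken from the thread): -> keeps returning jsonb and so can be
chained, while ->> returns text and therefore has to come last.

SELECT ('{"a": {"b": {"c": 1}}}'::jsonb) -> 'a' -> 'b' ->> 'c';  -- ok, returns text '1'
SELECT ('{"a": {"b": {"c": 1}}}'::jsonb) ->> 'a' -> 'b';         -- error: no -> operator for text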


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-09-27 Thread Robert Haas
On Thu, Sep 26, 2024 at 2:57 PM Jelte Fennema-Nio  wrote:
> On Thu, 26 Sept 2024 at 08:06, Robert Haas  wrote:
> > Focusing on the first patch seems odd to me, though
>
> Indeed the first few patches will often be small, and the big patch
> will appear later. When I split patches up, those small patches should
> usually be reviewable without looking at the big patch in detail, and
> hopefully they shouldn't be too contentious: e.g. a comment
> improvement or some small refactor. But often those patches don't seem
> to be reviewed significantly quicker or merged significantly earlier
> than the big patch. That makes it seem to me that even though they
> should be relatively low-risk to commit and low-effort to review,
> reviewers are scared away by the sheer number of patches in the
> patchset, or by the size of the final patch. That's why I thought it
> could be useful to specifically show the size of the first patch in
> addition to the total patchset size, so that reviewers can easily spot
> some small hopefully easy to review patch at the start of a patchset.

Fair enough! Personally what I'd want to know is how large the biggest
patch is, but I see your point, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-09-27 Thread Christoph Berg
Re: Tom Lane
> So the first part of that is great, but if your script file is
> large you probably won't be happy about having the whole thing
> repeated in the "QUERY" field.  So this needs some work on
> user-friendliness.

Does this really have to be addressed? It would be way better than it
is now, and errors during extension creation are rare and mostly for
developers only, so it doesn't have to be pretty.

> I'm inclined to think that maybe we'd be best off keeping the server
> end of it straightforward, and trying to teach psql to abbreviate the
> QUERY field in a useful way.  IIRC you get this same type of problem
> with very large SQL-language functions and suchlike.

I'd treat this as a separate patch, if it's considered to be a good
idea.

Christoph




Re: Conflict Detection and Resolution

2024-09-27 Thread shveta malik
On Fri, Sep 27, 2024 at 10:44 AM shveta malik  wrote:
>
> > >
> > > Thanks for the review.
> > > Here is the v14 patch-set fixing review comments in [1] and [2].
> > >
> >
> > Thanks for the patches. I am reviewing patch001, it is WIP, but please
> > find initial set of comments:
> >
>

Please find the next set of comments.

16)
In pg_dump.h, there is a lot of duplication of structures from
conflict.h, we can avoid that by making below changes:
--In SubscriptionInfo(), we can have a list of ConflictTypeResolver
structures and fill the elements of the list in getSubscriptions()
simply from the output of pg_subscription_conflict.
--Then in dumpSubscription() we can traverse the list to check whether
each resolver is the default one and, if so, skip the dump. We can create
a new function to return whether the resolver is default or not.
--We can get rid of enum ConflictType, enum ConflictResolver,
ConflictResolverNames, ConflictTypeDefaultResolvers from pg_dump.h

17)
In describe.c, we can have an 'order by' in the query so that the order is
not changed every time we update a resolver. Please see this:

For sub1, \dRs was showing below as output for Conflict Resolvers:
insert_exists = error, update_origin_differs = apply_remote,
update_exists = error, update_missing = skip, delete_origin_differs =
apply_remote, delete_missing = skip

Once I update resolver, the order gets changed:
postgres=# ALTER SUBSCRIPTION sub1 CONFLICT RESOLVER
(insert_exists='apply_remote');
ALTER SUBSCRIPTION

\dRs:
update_origin_differs = apply_remote, update_exists = error,
update_missing = skip, delete_origin_differs = apply_remote,
delete_missing = skip, insert_exists = apply_remote

18)
Similarly, after making change 16, it will be good if we maintain the
order for pg_dump too, and thus can have an order-by in pg_dump's query
as well.

thanks
Shveta




Re: Using per-transaction memory contexts for storing decoded tuples

2024-09-27 Thread Shlok Kyal
On Mon, 23 Sept 2024 at 09:59, Amit Kapila  wrote:
>
> On Sun, Sep 22, 2024 at 11:27 AM David Rowley  wrote:
> >
> > On Fri, 20 Sept 2024 at 17:46, Amit Kapila  wrote:
> > >
> > > On Fri, Sep 20, 2024 at 5:13 AM David Rowley  wrote:
> > > > In general, it's a bit annoying to have to code around this
> > > > GenerationContext fragmentation issue.
> > >
> > > Right, and I am also slightly afraid that this may not cause some
> > > regression in other cases where defrag wouldn't help.
> >
> > Yeah, that's certainly a possibility. I was hoping that
> > MemoryContextMemAllocated() being much larger than logical_work_mem
> > could only happen when there is fragmentation, but certainly, you
> > could be wasting effort trying to defrag transactions where the
> > changes all arrive in WAL consecutively and there is no
> > defragmentation. It might be some other large transaction that's
> > causing the context's allocations to be fragmented. I don't have any
> > good ideas on how to avoid wasting effort on non-problematic
> > transactions. Maybe there's something that could be done if we knew
> > the LSN of the first and last change and the gap between the LSNs was
> > much larger than the WAL space used for this transaction. That would
> > likely require tracking way more stuff than we do now, however.
> >
>
> With more information tracking, we could avoid some non-problematic
> transactions but still, it would be difficult to predict that we
> didn't harm many cases because to make the memory non-contiguous, we
> only need a few interleaving small transactions. We can try to think
> of ideas for implementing defragmentation in our code if we first can
> prove that smaller block sizes cause problems.
>
> > With the smaller blocks idea, I'm a bit concerned that using smaller
> > blocks could cause regressions on systems that are better at releasing
> > memory back to the OS after free() as no doubt malloc() would often be
> > slower on those systems. There have been some complaints recently
> > about glibc being a bit too happy to keep hold of memory after free()
> > and I wondered if that was the reason why the small block test does
> > not cause much of a performance regression. I wonder how the small
> > block test would look on Mac, FreeBSD or Windows. I think it would be
> > risky to assume that all is well with reducing the block size after
> > testing on a single platform.
> >
>
> Good point. We need extensive testing on different platforms, as you
> suggest, to verify if smaller block sizes caused any regressions.

I did similar tests on Windows. rb_mem_block_size was changed from 8kB
to 8MB. Below table shows the result (average of 5 runs) and Standard
Deviation (of 5 runs) for each block-size.

=========================================================
block-size  | Average time (ms)  | Standard Deviation (ms)
---------------------------------------------------------
8kB         | 12580.879          | 144.6923467
16kB        | 12442.7256         | 94.02799006
32kB        | 12370.7292         | 97.7958552
64kB        | 11877.4888         | 222.2419142
128kB       | 11828.8568         | 129.732941
256kB       | 11801.086          | 20.60030913
512kB       | 12361.4172         | 65.27390105
1MB         | 12343.3732         | 80.84427202
2MB         | 12357.675          | 79.40017604
4MB         | 12395.8364         | 76.78273689
8MB         | 11712.8862         | 50.74323039
=========================================================

From the results, I think there is a small regression for small block size.

I ran the tests in git bash. I have also attached the test script.

Thanks and Regards,
Shlok Kyal
#!/bin/bash
# Benchmark helper: sets up a fresh cluster with the given rb_mem_block_size
# (first argument, default 8kB), generates a transaction to decode, and runs
# pg_logical_slot_peek_changes() five times with \timing.

if [ -z "$1" ]
then
	size="8kB"
else
	size=$1
fi

echo 'Clean up'
echo $size
./pg_ctl stop -D data

rm -rf data* logfile

echo 'Set up'

./initdb -D data -U postgres

cat << EOF >> data/postgresql.conf
wal_level = logical
autovacuum = false
checkpoint_timeout = 1h
shared_buffers = '10GB'
work_mem = '1GB'
logical_decoding_work_mem = '2097151 kB'
max_wal_size = 20GB
min_wal_size = 10GB
rb_mem_block_size = $size
EOF

./pg_ctl -D data start -w -l logfile

(
echo -E "SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding');"
echo -E "CREATE TABLE foo (id int);"
echo -E "INSERT INTO foo VALUES (generate_series(1, 1000));"
) | ./psql -U postgres

for i in `seq 1 5`
do
(
echo -E "\timing"
echo -E "SELECT count(*) FROM pg_logical_slot_peek_changes('test', NULL, NULL);"
) | ./psql -U postgres
done


Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-09-27 Thread Tom Lane
Christoph Berg  writes:
> Re: Tom Lane
>> (It might be worth some effort to trim away comments appearing
>> just before a command, but I didn't tackle that here.)

> The "error when psql" comments do look confusing, but I guess in other
> places the comment just before the query adds valuable context, so I'd
> say leaving the comments in is ok.

It looks like if we did want to suppress that, the right fix is to
make gram.y track statement start locations more honestly, as in
0002 attached (0001 is the same as before).  This'd add a few cycles
per grammar nonterminal reduction, which is kind of annoying but
probably is negligible in the grand scheme of things.  Still, I'd
not propose it just for this.  But if memory serves, we've had
previous complaints about pg_stat_statements failing to strip
leading comments from queries, and this'd fix that.  I think it
likely also improves error cursor positioning for cases involving
empty productions --- I'm a tad surprised that no other regression
cases changed.

regards, tom lane

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index fab59ad5f6..0fae1332d2 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -54,6 +54,7 @@
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/queryjumble.h"
 #include "storage/fd.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
@@ -107,6 +108,17 @@ typedef struct ExtensionVersionInfo
 	struct ExtensionVersionInfo *previous;	/* current best predecessor */
 } ExtensionVersionInfo;
 
+/*
+ * Information for script_error_callback()
+ */
+typedef struct
+{
+	const char *sql;			/* entire script file contents */
+	const char *filename;		/* script file pathname */
+	ParseLoc	stmt_location;	/* current stmt start loc, or -1 if unknown */
+	ParseLoc	stmt_len;		/* length in bytes; 0 means "rest of string" */
+} script_error_callback_arg;
+
 /* Local functions */
 static List *find_update_path(List *evi_list,
 			  ExtensionVersionInfo *evi_start,
@@ -670,9 +682,60 @@ read_extension_script_file(const ExtensionControlFile *control,
 	return dest_str;
 }
 
+/*
+ * error context callback for failures in script-file execution
+ */
+static void
+script_error_callback(void *arg)
+{
+	script_error_callback_arg *callback_arg = (script_error_callback_arg *) arg;
+	int			syntaxerrposition;
+	const char *lastslash;
+
+	/* If it's a syntax error, convert to internal syntax error report */
+	syntaxerrposition = geterrposition();
+	if (syntaxerrposition > 0)
+	{
+		/*
+		 * We must report the whole string because otherwise details such as
+		 * psql's line number report would be wrong.
+		 */
+		errposition(0);
+		internalerrposition(syntaxerrposition);
+		internalerrquery(callback_arg->sql);
+	}
+	else if (callback_arg->stmt_location >= 0)
+	{
+		/*
+		 * Since no syntax cursor will be shown, it's okay and helpful to trim
+		 * the reported query string to just the current statement.
+		 */
+		const char *query = callback_arg->sql;
+		int			location = callback_arg->stmt_location;
+		int			len = callback_arg->stmt_len;
+
+		query = CleanQuerytext(query, &location, &len);
+		internalerrquery(pnstrdup(query, len));
+	}
+
+	/*
+	 * Trim the reported file name to remove the path.  We know that
+	 * get_extension_script_filename() inserted a '/', regardless of whether
+	 * we're on Windows.
+	 */
+	lastslash = strrchr(callback_arg->filename, '/');
+	if (lastslash)
+		lastslash++;
+	else
+		lastslash = callback_arg->filename; /* shouldn't happen, but cope */
+	errcontext("extension script file \"%s\"", lastslash);
+}
+
 /*
  * Execute given SQL string.
  *
+ * The filename the string came from is also provided, for error reporting.
+ *
  * Note: it's tempting to just use SPI to execute the string, but that does
  * not work very well.  The really serious problem is that SPI will parse,
  * analyze, and plan the whole string before executing any of it; of course
@@ -682,12 +745,27 @@ read_extension_script_file(const ExtensionControlFile *control,
  * could be very long.
  */
 static void
-execute_sql_string(const char *sql)
+execute_sql_string(const char *sql, const char *filename)
 {
+	script_error_callback_arg callback_arg;
+	ErrorContextCallback scripterrcontext;
 	List	   *raw_parsetree_list;
 	DestReceiver *dest;
 	ListCell   *lc1;
 
+	/*
+	 * Setup error traceback support for ereport().
+	 */
+	callback_arg.sql = sql;
+	callback_arg.filename = filename;
+	callback_arg.stmt_location = -1;
+	callback_arg.stmt_len = -1;
+
+	scripterrcontext.callback = script_error_callback;
+	scripterrcontext.arg = (void *) &callback_arg;
+	scripterrcontext.previous = error_context_stack;
+	error_context_stack = &scripterrcontext;
+
 	/*
 	 * Parse the SQL string into a list of raw parse trees.
 	 */
@@ -709,6 +787,10 @@ execute_sql_string(const char *sql)
 		List	   *stmt_list;
 		ListCell   *lc2;
 
+		/* Report location of this query for error c

Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-09-27 Thread Antonin Houska
Jacob Champion  wrote:

> Peter asked me if there were plans to provide a "standard" validator
> module, say as part of contrib. The tricky thing is that Bearer
> validation is issuer-specific, and many providers give you an opaque
> token that you're not supposed to introspect at all.
> 
> We could use token introspection (RFC 7662) for online verification,
> but last I looked at it, no one had actually implemented those
> endpoints. For offline verification, I think the best we could do
> would be to provide a generic JWT Profile (RFC 9068) validator, but
> again I don't know if anyone is actually providing those token formats
> in practice. I'm inclined to push that out into the future.

Have you considered sending the token for validation to the server, like this

curl -X GET "https://www.googleapis.com/oauth2/v3/userinfo" \
     -H "Authorization: Bearer $TOKEN"

and getting the userid (e.g. email address) from the response, as described in
[1]? ISTM that this is what pgadmin4 does - in particular, see the
get_user_profile() function in web/pgadmin/authenticate/oauth2.py.

[1] 
https://www.oauth.com/oauth2-servers/signing-in-with-google/verifying-the-user-info/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: MAINTAIN privilege -- what do we need to un-revert it?

2024-09-27 Thread Nathan Bossart
On Fri, Sep 27, 2024 at 01:27:38PM -0700, Jeff Davis wrote:
> Looks good to me.

Thanks, committed.

-- 
nathan




Re: Vacuum statistics

2024-09-27 Thread Melanie Plageman
On Fri, Sep 27, 2024 at 2:16 PM Masahiko Sawada  wrote:
>
> Hi,
>
> On Thu, Sep 5, 2024 at 2:01 PM Alena Rybakina  
> wrote:
> >
> > Hi! Thank you for your review!
> >
> > On 05.09.2024 15:47, jian he wrote:
> >
> > On Thu, Sep 5, 2024 at 1:23 AM Alena Rybakina  
> > wrote:
> >
> > Hi, all!
> >
> > I have attached the new version of the code and the diff files
> > (minor-vacuum.no-cbot).
>
> Thank you for updating the patches. I've reviewed the 0001 patch and
> have two comments.

I took a very brief look at this and was wondering if it was worth
having a way to make the per-table vacuum statistics opt-in (like a
table storage parameter) in order to decrease the shared memory
footprint of storing the stats.
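
(For concreteness, a purely hypothetical sketch of what such an opt-in could
look like; the reloption name below is invented and is not part of the posted
patch:)

ALTER TABLE big_table SET (vacuum_statistics = on);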

- Melanie




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-09-27 Thread Melanie Plageman
On Mon, Aug 26, 2024 at 10:49 AM Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Wed, Jun 19, 2024 at 2:21 PM Melanie Plageman
> >  wrote:
> >> If we want to make it possible to use no tools and only manually grep
> >> for struct members, that means we can never reuse struct member names.
> >> Across a project of our size, that seems like a very serious
> >> restriction. Adding prefixes in struct members makes it harder to read
> >> code -- both because it makes the names longer and because people are
> >> more prone to abbreviate the meaningful parts of the struct member
> >> name to make the whole name shorter.
>
> > I don't think we should go so far as to never reuse a structure member
> > name. But I also do use 'git grep' a lot to find stuff, and I don't
> > appreciate it when somebody names a key piece of machinery 'x' or 'n'
> > or something, especially when references to that thing could
> > reasonably occur almost anywhere in the source code. So if somebody is
> > creating a struct whose names are fairly generic and reasonably short,
> > I like the idea of using a prefix for those names. If the structure
> > members are things like that_thing_i_stored_behind_the_fridge (which
> > is long) or cytokine (which is non-generic) then they're greppable
> > anyway and it doesn't really matter. But surely changing something
> > like rs_flags to just flags is just making everyone's life harder:
>
> I'm with Robert here: I care quite a lot about the greppability of
> field names.  I'm not arguing for prefixes everywhere, but I don't
> think we should strip out prefixes we've already created, especially
> if the result will be to have extremely generic field names.

Okay, got it -- folks like the prefixes.
I'm picking this patch set back up again after a long pause and I will
restore all prefixes.

What does the rs_* in the HeapScanDescData stand for, though?

- Melanie




Re: Vacuum statistics

2024-09-27 Thread Andrei Zubkov
Hi,

On Fri, 2024-09-27 at 11:15 -0700, Masahiko Sawada wrote:
> I'm concerned that a pg_stat_vacuum_tables view has some duplicated
> statistics that we already collect in different ways. For instance,
> total_blks_{read,hit,dirtied,written} are already tracked at
> system-level by pg_stat_io,

pg_stat_vacuum_tables.total_blks_{read,hit,dirtied,written} tracks the
different kinds of block usage by vacuum while it is vacuuming this
particular table, whereas pg_stat_io tracks blocks used by vacuum at the
cluster level.
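
(A rough illustration of that granularity difference, assuming PostgreSQL 16
or later where pg_stat_io exists; the vacuum context there is only available
cluster-wide, not per relation:)

SELECT backend_type, object, reads, hits, writes
FROM pg_stat_io
WHERE context = 'vacuum';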

> and per-relation block I/O statistics can
> be collected using pg_stat_statements.

This is impossible. pg_stat_statements tracks block statistics at the
statement level. One statement can touch many tables and many indexes,
and all of the database blocks it uses are counted together in that
statement's counters. Autovacuum activity is not accounted for by
pg_stat_statements at all. And pg_stat_statements does not keep statement
statistics forever: under pressure from new statements, entries can be
evicted and their statistics lost.
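
(Similarly, a sketch of the statement-level granularity being described,
assuming the pg_stat_statements extension is installed; blocks from every
relation a statement touches end up in a single row:)

SELECT queryid, calls, shared_blks_read, shared_blks_hit,
       shared_blks_dirtied, shared_blks_written
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 5;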

All of the above is addressed by relation-level vacuum statistics held
in the Cumulative Statistics System proposed by this patch.
-- 
regards, Andrei Zubkov
Postgres Professional





Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.

2024-09-27 Thread Nathan Bossart
On Fri, Sep 27, 2024 at 02:48:01PM +, Max Johnson wrote:
> I think that it would be a good idea to include these fixes in the next
> minor release. After working for a couple months on getting our embedded
> systems 2038 compliant, it has become very apparent that 2038 will be a
> substantial ordeal. Maximizing the number of systems that include this
> fix would make things a little easier when that time comes around.

Alright.  I was able to back-patch it to v12 without too much trouble,
fortunately.  I'll commit that soon unless anyone else has feedback.

-- 
nathan




Re: SQL:2023 JSON simplified accessor support

2024-09-27 Thread David E. Wheeler
On Sep 27, 2024, at 12:07, Andrew Dunstan  wrote:

> That would defeat being able to chain these.

Not if it’s a different operator. But I’m fine to just keep using ->> at the 
end of a chain.

D





Re: BitmapHeapScan streaming read user and prelim refactoring

2024-09-27 Thread Melanie Plageman
On Wed, Jun 19, 2024 at 2:13 PM Melanie Plageman
 wrote:
>
> On Wed, Jun 19, 2024 at 12:38 PM Tomas Vondra
>  wrote:
>
> > > + * XXX I don't understand why we should have this special node if we
> > > + * don't have special nodes for other scan types.
> > >
> > > In this case, up until the final commit (using the read stream
> > > interface), there are six fields required by bitmap heap scan that are
> > > not needed by any other user of HeapScanDescData. There are also
> > > several members of HeapScanDescData that are not needed by bitmap heap
> > > scans and all of the setup in initscan() for those fields is not
> > > required for bitmap heap scans.
> > >
> > > Also, because the BitmapHeapScanDesc needs information like the
> > > ParallelBitmapHeapState and prefetch_maximum (for use in prefetching),
> > > the scan_begin() callback would have to take those as parameters. I
> > > thought adding so much bitmap table scan-specific information to the
> > > generic table scan callbacks was a bad idea.
> > >
> > > Once we add the read stream API code, the number of fields required
> > > for bitmap heap scan that are in the scan descriptor goes down to
> > > three. So, perhaps we could justify having that many bitmap heap
> > > scan-specific fields in the HeapScanDescData.
> > >
> > > Though, I actually think we could start moving toward having
> > > specialized scan descriptors if the requirements for that scan type
> > > are appreciably different. I can't think of new code that would be
> > > added to the HeapScanDescData that would have to be duplicated over to
> > > specialized scan descriptors.
> > >
> > > With the final read stream state, I can see the argument for bloating
> > > the HeapScanDescData with three extra members and avoiding making new
> > > scan descriptors. But, for the intermediate patches which have all of
> > > the bitmap prefetch members pushed down into the HeapScanDescData, I
> > > think it is really not okay. Six members only for bitmap heap scans
> > > and two bitmap-specific members to begin_scan() seems bad.
> > >
> > > What I thought we plan to do is commit the refactoring patches
> > > sometime after the branch for 18 is cut and leave the final read
> > > stream patch uncommitted so we can do performance testing on it. If
> > > you think it is okay to have the six member bloated HeapScanDescData
> > > in master until we push the read stream code, I am okay with removing
> > > the BitmapTableScanDesc and BitmapHeapScanDesc.
> > >
> >
> > I admit I don't have a very good idea what the ideal / desired state
> > look like. My comment is motivated solely by the feeling that it seems
> > strange to have one struct serving most scan types, and then a special
> > struct for one particular scan type ...
>
> I see what you are saying. We could make BitmapTableScanDesc inherit
> from TableScanDescData which would be similar to what we do with other
> things like the executor scan nodes themselves. We would waste space
> and LOC with initializing the unneeded members, but it might seem less
> weird.
>
> Whether we want the specialized scan descriptors at all is probably
> the bigger question, though.
>
> The top level BitmapTableScanDesc is motivated by wanting fewer bitmap
> table scan specific members passed to scan_begin(). And the
> BitmapHeapScanDesc is motivated by this plus wanting to avoid bloating
> the HeapScanDescData.
>
> If you look at at HEAD~1 (with my patches applied) and think you would
> be okay with
> 1) the contents of the BitmapHeapScanDesc being in the HeapScanDescData and
> 2) the extra bitmap table scan-specific parameters in scan_begin_bm()
> being passed to scan_begin()
>
> then I will remove the specialized scan descriptors.
>
> The final state (with the read stream) will still have three bitmap
> heap scan-specific members in the HeapScanDescData.
>
> Would it help if I do a version like this so you can see what it is like?

I revisited this issue (how to keep from bloating the Heap and Table
scan descriptors and adding many parameters to the scan_begin() table
AM callback) and am trying to find a less noisy way to address it than
my previous proposal.

I've attached a prototype of what I think might work applied on top of
master instead of on top of my patchset.

For the top-level TableScanDescData, I suggest we use a union with the
members of each scan type in it in anonymous structs (see 0001). This
will avoid too much bloat because there are other scan types (like TID
Range scans) whose members we can move into the union. It isn't great,
but it avoids a new top-level scan descriptor and changes to the
generic scanstate node.

We will still have to pass the parameters needed to set up the
parallel bitmap iterators to scan_begin() in the intermediate patches,
but if we think that we can actually get the streaming read version of
bitmapheapscan in in the same release, then I think it should be okay
because the final version of these table AM callbacks do not nee

Re: Conflict Detection and Resolution

2024-09-27 Thread Peter Smith
Here are some review comments for v14-0001.

This is a WIP, but here are my comments for all the SGML parts.

(There will be some overlap here with comments already posted by Shveta)

==
1. file modes after applying the patch

 mode change 100644 => 100755 doc/src/sgml/ref/alter_subscription.sgml
 mode change 100644 => 100755 doc/src/sgml/ref/create_subscription.sgml

What's going on here? Why are those SGMLs changed to executable?

==
Commit message

2.
nit - a missing period in the first sentence
nit - typo /reseting/resetting/

==
doc/src/sgml/logical-replication.sgml

3.
-  Conflicts
+  Conflicts and conflict resolution

nit - change the capitalisation to "and Conflict Resolution" to match
other titles.

~~~

4.
+   Additional logging is triggered in various conflict scenarios,
each identified as a
+   conflict type, and the conflict statistics are collected (displayed in the
+   pg_stat_subscription_stats
view).
+   Users have the option to configure a
conflict_resolver for each
+   conflict_type when creating a subscription.
+   For more information on the supported
conflict_types detected and
+   conflict_resolvers, refer to section
+   CONFLICT
RESOLVERS.
+

nit - "Additional logging is triggered" sounds strange. I reworded
this in the nits attachment. Please see if you approve.
nit - The "conflict_type" and "conflict_resolver" you are referring to
here are syntax elements of the CREATE SUBSCRIPTION, so here I think
they should just be called (without the underscores) "conflict type"
and "conflict resolver".
nit - IMO this would be better split into multiple paragraphs.
nit - There is no such section called "CONFLICT RESOLVERS". I reworded
this link text.

==
doc/src/sgml/monitoring.sgml

5.
The changes here all render with the link including the type "(enum)"
displayed, which I thought was unnecessary/strange.

For example:
See insert_exists (enum) for details about this conflict.

IIUC there is no problem here, but maybe the other end of the link
needed to define xreflabels. I have made the necessary modifications
in the create_subscription.sgml.

==
doc/src/sgml/ref/alter_subscription.sgml

6.
+ALTER SUBSCRIPTION name
CONFLICT RESOLVER ( conflict_type [= conflict_resolver] [, ...] )

This syntax seems wrong to me.

Currently, it says:
ALTER SUBSCRIPTION name CONFLICT RESOLVER ( conflict_type [=
conflict_resolver] [, ...] )

But, shouldn't that say:
ALTER SUBSCRIPTION name CONFLICT RESOLVER ( conflict_type =
conflict_resolver [, ...] )

~~~
7.
+ALTER SUBSCRIPTION name
RESET CONFLICT RESOLVER FOR (conflict_type)

I can see that this matches the implementation, but I was wondering
why you don't permit resetting multiple conflict_types at the same
time, e.g. what if I want to reset some but not ALL?

~~~

nit - there are some minor whitespace indent problems in the SGML

~~~

8.
+   
+CONFLICT RESOLVER ( conflict_type [= conflict_resolver] [, ... ]
)
+
+ 
+  This clause alters either the default conflict resolvers or
those set by .
+  Refer to section CONFLICT
RESOLVERS
+  for the details on supported conflict_types
and conflict_resolvers.
+ 
+
+   
+
+ 
+  conflict_type
+   
+
+The conflict type being reset to its default resolver setting.
+For details on conflict types and their default resolvers, refer
to section CONFLICT
RESOLVERS
+
+   
+  
+ 

This section seems problematic:
e.g the syntax seems wrong same as before.

~
There are other nits.
(I've given a rough fix in the nits attachment. Please see it and make
it better).

nit - why do you care if it is "either the default conflict resolvers
or those set...". Why not just say "current resolver"
nit - it does not mention 'conflict_resolver' type in the normal way
nit - there is no actual section called "CONFLICT RESOLVERS"
nit - the part that says "The conflict type being reset to its default
resolver setting." is bogus for this form of the ALTER statement.

~~~

9.
There is no description for the "RESET CONFLICT RESOLVER ALL"

~~~

10.
There is no description for the "RESET CONFLICT RESOLVER FOR (conflict_type)"

==
doc/src/sgml/ref/create_subscription.sgml

11. General - Order

+   
+CONFLICT RESOLVER ( conflict_type = 
+  This optional clause specifies options for conflict resolvers
for different conflict_types.
+ 

nit - IMO we don't need the words "options for" here.

~~~

16.
+ 
+  The conflict_type
and their default behaviour are listed below.

nit - sounded strange to me. reworded it slightly.

~~~

17.
+   

nit - Here, and for all other conflict types, add "xreflabel". See my
review comment #5 for the reason why.

~~~

18.
+ 
+  The conflict_resolver and their behaviour
+  are listed below. Users can use any of the following resolvers
for automatic conflict
+  resolution.
+  

nit - reworded this too, to be like the previous review comment.

~~~

19. General - readability.

19a.
IMO the information about what are the def

Changing the default random_page_cost value

2024-09-27 Thread Greg Sabino Mullane
tl;dr let's assume SSDs are popular and HDDs are the exception and flip our
default

As I write this email, it's the year 2024. I think it is time we lower our
"default" setting of random_page_cost (as set in postgresql.conf.sample and
the docs). Even a decade ago, the current default of 4 was considered
fairly conservative and often lowered. The git log shows that this value
was last touched in 2006, during the age of spinning metal. We are now in a
new era, the age of SSDs, and thus we should lower this default value to
reflect the fact that the vast majority of people using Postgres these days
are doing so on solid state drives. We tend to stay ultra-conservative in
all of our settings, but we also need to recognize when there has been a
major shift in the underlying hardware - and calculations that our defaults
are based on.

Granted, there are other factors involved, and yes, perhaps we should tweak
some of the similar settings as well, but random_page_cost is the one
setting most out of sync with today's hardware realities. So I'll be brave
and throw a number out there: 1.2. And change our docs to say wordage like
"if you are using an older hard disk drive technology, you may want to try
raising rpc" to replace our fairly-hidden note about SSDs buried in the
last sentence - of the fourth paragraph - of the rpc docs.
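
(Not part of the proposal itself, just a reminder of how the setting can be
experimented with today; the tablespace name below is made up:)

ALTER SYSTEM SET random_page_cost = 1.2;
SELECT pg_reload_conf();
-- or only for storage that is actually solid state:
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.2);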

Real data about performance on today's SSDs are welcome, and/or some way to
generate a more accurate default.

Cheers,
Greg


Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-09-27 Thread Jacob Champion
On Fri, Sep 27, 2024 at 10:58 AM Antonin Houska  wrote:
> Have you considered sending the token for validation to the server, like this
>
> curl -X GET "https://www.googleapis.com/oauth2/v3/userinfo" \
>      -H "Authorization: Bearer $TOKEN"

In short, no, but I'm glad you asked. I think it's going to be a
common request, and I need to get better at explaining why it's not
safe, so we can document it clearly. Or else someone can point out
that I'm misunderstanding, which honestly would make all this much
easier and less complicated. I would love to be able to do it that
way.

We cannot, for the same reason libpq must send the server an access
token instead of an ID token. The /userinfo endpoint tells you who the
end user is, but it doesn't tell you whether the Bearer is actually
allowed to access the database. That difference is critical: it's
entirely possible for an end user to be authorized to access the
database, *and yet* the Bearer token may not actually carry that
authorization on their behalf. (In fact, the user may have actively
refused to give the Bearer that permission.) That's why people are so
pedantic about saying that OAuth is an authorization framework and not
an authentication framework.

To illustrate, think about all the third-party web services out there
that ask you to Sign In with Google. They ask Google for permission to
access your personal ID, and Google asks you if you're okay with that,
and you either allow or deny it. Now imagine that I ran one of those
services, and I decided to become evil. I could take my legitimately
acquired Bearer token -- which should only give me permission to query
your Google ID -- and send it to a Postgres database you're authorized
to access.

The server is supposed to introspect it, say, "hey, this token doesn't
give the bearer access to the database at all," and shut everything
down. For extra credit, the server could notice that the client ID
tied to the access token isn't even one that it recognizes! But if all
the server does is ask Google, "what's the email address associated
with this token's end user?", then it's about to make some very bad
decisions. The email address it gets back doesn't belong to Jacob the
Evil Bearer; it belongs to you.

Now, the token introspection endpoint I mentioned upthread should give
us the required information (scopes, etc.). But Google doesn't
implement that one. In fact they don't seem to have implemented custom
scopes at all in the years since I started work on this feature, which
makes me think that people are probably not going to be able to safely
log into Postgres using Google tokens. Hopefully there's some feature
buried somewhere that I haven't seen.

Let me know if that makes sense. (And again: I'd love to be proven
wrong. It would improve the reach of the feature considerably if I
am.)

Thanks,
--Jacob




Re: Using per-transaction memory contexts for storing decoded tuples

2024-09-27 Thread Masahiko Sawada
On Fri, Sep 27, 2024 at 12:39 AM Shlok Kyal  wrote:
>
> On Mon, 23 Sept 2024 at 09:59, Amit Kapila  wrote:
> >
> > On Sun, Sep 22, 2024 at 11:27 AM David Rowley  wrote:
> > >
> > > On Fri, 20 Sept 2024 at 17:46, Amit Kapila  
> > > wrote:
> > > >
> > > > On Fri, Sep 20, 2024 at 5:13 AM David Rowley  
> > > > wrote:
> > > > > In general, it's a bit annoying to have to code around this
> > > > > GenerationContext fragmentation issue.
> > > >
> > > > Right, and I am also slightly afraid that this may not cause some
> > > > regression in other cases where defrag wouldn't help.
> > >
> > > Yeah, that's certainly a possibility. I was hoping that
> > > MemoryContextMemAllocated() being much larger than logical_work_mem
> > > could only happen when there is fragmentation, but certainly, you
> > > could be wasting effort trying to defrag transactions where the
> > > changes all arrive in WAL consecutively and there is no
> > > defragmentation. It might be some other large transaction that's
> > > causing the context's allocations to be fragmented. I don't have any
> > > good ideas on how to avoid wasting effort on non-problematic
> > > transactions. Maybe there's something that could be done if we knew
> > > the LSN of the first and last change and the gap between the LSNs was
> > > much larger than the WAL space used for this transaction. That would
> > > likely require tracking way more stuff than we do now, however.
> > >
> >
> > With more information tracking, we could avoid some non-problematic
> > transactions but still, it would be difficult to predict that we
> > didn't harm many cases because to make the memory non-contiguous, we
> > only need a few interleaving small transactions. We can try to think
> > of ideas for implementing defragmentation in our code if we first can
> > prove that smaller block sizes cause problems.
> >
> > > With the smaller blocks idea, I'm a bit concerned that using smaller
> > > blocks could cause regressions on systems that are better at releasing
> > > memory back to the OS after free() as no doubt malloc() would often be
> > > slower on those systems. There have been some complaints recently
> > > about glibc being a bit too happy to keep hold of memory after free()
> > > and I wondered if that was the reason why the small block test does
> > > not cause much of a performance regression. I wonder how the small
> > > block test would look on Mac, FreeBSD or Windows. I think it would be
> > > risky to assume that all is well with reducing the block size after
> > > testing on a single platform.
> > >
> >
> > Good point. We need extensive testing on different platforms, as you
> > suggest, to verify if smaller block sizes caused any regressions.
>
> I did similar tests on Windows. rb_mem_block_size was changed from 8kB
> to 8MB. Below table shows the result (average of 5 runs) and Standard
> Deviation (of 5 runs) for each block-size.
>
> =========================================================
> block-size  | Average time (ms)  | Standard Deviation (ms)
> ---------------------------------------------------------
> 8kB         | 12580.879          | 144.6923467
> 16kB        | 12442.7256         | 94.02799006
> 32kB        | 12370.7292         | 97.7958552
> 64kB        | 11877.4888         | 222.2419142
> 128kB       | 11828.8568         | 129.732941
> 256kB       | 11801.086          | 20.60030913
> 512kB       | 12361.4172         | 65.27390105
> 1MB         | 12343.3732         | 80.84427202
> 2MB         | 12357.675          | 79.40017604
> 4MB         | 12395.8364         | 76.78273689
> 8MB         | 11712.8862         | 50.74323039
> =========================================================
>
> From the results, I think there is a small regression for small block size.
>
> I ran the tests in git bash. I have also attached the test script.

Thank you for testing on Windows! I've run the same benchmark on Mac
(Sonoma 14.7, M1 Pro):

8kB: 4852.198 ms
16kB: 4822.733 ms
32kB: 4776.776 ms
64kB: 4851.433 ms
128kB: 4804.821 ms
256kB: 4781.778 ms
512kB: 4776.486 ms
1MB: 4783.456 ms
2MB: 4770.671 ms
4MB: 4785.800 ms
8MB: 4747.447 ms

I can see there is a small regression for small block sizes.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Vacuum statistics

2024-09-27 Thread Masahiko Sawada
Hi,

On Thu, Sep 5, 2024 at 2:01 PM Alena Rybakina  wrote:
>
> Hi! Thank you for your review!
>
> On 05.09.2024 15:47, jian he wrote:
>
> On Thu, Sep 5, 2024 at 1:23 AM Alena Rybakina  
> wrote:
>
> Hi, all!
>
> I have attached the new version of the code and the diff files
> (minor-vacuum.no-cbot).

Thank you for updating the patches. I've reviewed the 0001 patch and
have two comments.

I think we can split the 0001 patch into two parts: adding
pg_stat_vacuum_tables system views that shows the vacuum statistics
that we are currently collecting such as scanned_pages and
removed_pages, and another one is to add new statistics to collect
such as vacrel->set_all_visible_pages and visibility map updates.

I'm concerned that a pg_stat_vacuum_tables view has some duplicated
statistics that we already collect in different ways. For instance,
total_blks_{read,hit,dirtied,written} are already tracked at
system-level by pg_stat_io, and per-relation block I/O statistics can
be collected using pg_stat_statements. Having duplicated statistics
consumes more memory for pgstat and could confuse users if these
statistics are not consistent. I think it would be better to avoid
collecting duplicated statistics in different places.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-09-27 Thread Christoph Berg
Re: Tom Lane
> Perhaps.  I spent a little more effort on this and added code to
> report errors that don't come with an error location.  On those,
> we don't have any constraints about what to report in the QUERY
> field, so I made it trim the string to just the current query
> within the script, which makes things quite a bit better.  You
> can see the results in the test_extensions regression test changes.

That looks very good to me, thanks!

> (It might be worth some effort to trim away comments appearing
> just before a command, but I didn't tackle that here.)

The "error when psql" comments do look confusing, but I guess in other
places the comment just before the query adds valuable context, so I'd
say leaving the comments in is ok.

Christoph




Re: Thoughts about NUM_BUFFER_PARTITIONS

2024-09-27 Thread Nathan Bossart
On Mon, Aug 05, 2024 at 12:32:20AM +0500, Andrey M. Borodin wrote:
> I´ve found some dead code: BufMappingPartitionLockByIndex() is unused,
> and unused for a long time. See patch 1.

I don't see a reason to also get rid of BufTableHashPartition(), but
otherwise this looks reasonable to me.  It would probably be a good idea to
first check whether there are any external callers we can find.

> I´ve prototyped similar GUC for anyone willing to do such experiments.
> See patch 2, 4. Probably, I´ll do some experiments too, on customer's
> clusters and workloads :)

Like Tomas, I'm not too wild about making this a GUC.  And as Heikki
pointed out upthread, a good first step would be to benchmark different
NUM_BUFFER_PARTITIONS settings on modern hardware.  I suspect the current
setting is much lower than optimal (e.g., I doubled it and saw a TPS boost
for read-only pgbench on an i5-13500T), but I don't think we fully
understand the performance characteristics with different settings yet.  If
we find that the ideal setting depends heavily on workload/hardware, then
perhaps we can consider adding a GUC, but I don't think we are there yet.

>> Anyway, this value is inherently a trade off. If it wasn't, we'd set it
>> to something super high from the start. But having more partitions of
>> the lock table has a cost too, because some parts need to acquire all
>> the partition locks (and that's O(N) where N = number of partitions).
> 
> I´ve found none such cases, actually. Or, perhaps, I was not looking good
> enough. pg_buffercache iterates over buffers and releases locks. See
> patch 3 to fix comments.

Yeah, I think 0003 is reasonable, too.  pg_buffercache stopped acquiring
all the partition locks in v10 (see commit 6e65454), which is also the
commit that removed all remaining uses of BufMappingPartitionLockByIndex().
In fact, I think BufMappingPartitionLockByIndex() was introduced just for
this pg_buffercache use-case (see commit ea9df81). 

-- 
nathan




Re: MAINTAIN privilege -- what do we need to un-revert it?

2024-09-27 Thread Nathan Bossart
On Fri, Sep 27, 2024 at 09:22:48AM -0700, Jeff Davis wrote:
> I suggest that we add the wording to the
> query portion of the doc, near "security-
> restricted operation".

How does this look?

diff --git a/doc/src/sgml/ref/create_materialized_view.sgml 
b/doc/src/sgml/ref/create_materialized_view.sgml
index 0d2fea2b97..62d897931c 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -143,7 +143,9 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] 
table_name
   A SELECT, TABLE,
   or VALUES command.  
This query will run within a
   security-restricted operation; in particular, calls to functions that
-  themselves create temporary tables will fail.
+  themselves create temporary tables will fail.  Also, while the query is
+  running, the search_path is temporarily changed to
+  pg_catalog, pg_temp.
  
 

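(A short sketch of what the added sentence means in practice; the object
names are invented. A function called from the materialized view's query
cannot rely on the caller's search_path while the query runs, so it should
schema-qualify its references or pin search_path itself:)

CREATE SCHEMA app;
CREATE TABLE app.t (x int);

CREATE FUNCTION app.rowcount() RETURNS bigint
    LANGUAGE sql
    SET search_path = app   -- pinned, since the query runs with pg_catalog, pg_temp
    AS 'SELECT count(*) FROM app.t';

CREATE MATERIALIZED VIEW app.mv AS SELECT app.rowcount() AS n;
REFRESH MATERIALIZED VIEW app.mv;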

-- 
nathan




Re: MAINTAIN privilege -- what do we need to un-revert it?

2024-09-27 Thread Jeff Davis
On Fri, 2024-09-27 at 15:04 -0500, Nathan Bossart wrote:
> On Fri, Sep 27, 2024 at 09:22:48AM -0700, Jeff Davis wrote:
> > I suggest that we add the wording to the
> > query portion of the doc, near
> > "security-
> > restricted operation".
> 
> How does this look?

Looks good to me.

Regards,
Jeff Davis





Re: Vacuum statistics

2024-09-27 Thread Masahiko Sawada
On Fri, Sep 27, 2024 at 12:19 PM Melanie Plageman
 wrote:
>
> On Fri, Sep 27, 2024 at 2:16 PM Masahiko Sawada  wrote:
> >
> > Hi,
> >
> > On Thu, Sep 5, 2024 at 2:01 PM Alena Rybakina  
> > wrote:
> > >
> > > Hi! Thank you for your review!
> > >
> > > On 05.09.2024 15:47, jian he wrote:
> > >
> > > On Thu, Sep 5, 2024 at 1:23 AM Alena Rybakina  
> > > wrote:
> > >
> > > Hi, all!
> > >
> > > I have attached the new version of the code and the diff files
> > > (minor-vacuum.no-cbot).
> >
> > Thank you for updating the patches. I've reviewed the 0001 patch and
> > have two comments.
>
> I took a very brief look at this and was wondering if it was worth
> having a way to make the per-table vacuum statistics opt-in (like a
> table storage parameter) in order to decrease the shared memory
> footprint of storing the stats.

I'm not sure how users can select tables that enable vacuum statistics
as I think they basically want to have statistics for all tables, but
I see your point. Since the size of PgStat_TableCounts approximately
tripled by this patch (112 bytes to 320 bytes), it might be worth
considering ways to reduce the number of entries or reducing the size
of vacuum statistics.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Row pattern recognition

2024-09-27 Thread Jacob Champion
On Wed, Sep 18, 2024 at 10:00 PM Tatsuo Ishii  wrote:
>
> Attached are the v22 patches. Just rebased.

Thanks!

With some bigger partitions, I hit an `ERROR:  wrong pos: 1024`. A
test that reproduces it is attached.

While playing with the feature, I've been trying to identify runs of
matched rows by eye. But it's pretty difficult -- the best I can do is
manually count rows using a `COUNT(*) OVER ...`. So I'd like to
suggest that MEASURES be part of the eventual v1 feature, if there's
no other way to determine whether a row was skipped by a previous
match. (That was less obvious to me before the fix in v17.)

--

I've been working on an implementation [1] of SQL/RPR's "parenthesized
language" and preferment order. (These are defined in SQL/Foundation
2023, section 9.41.) The tool gives you a way to figure out, for a
given pattern, what matches are supposed to be attempted and in what
order:

$ ./src/test/modules/rpr/rpr_prefer "a b? a"
( ( a ( b ) ) a )
( ( a ( ) ) a )

Many simple patterns result in an infinite set of possible matches. So
if you use unbounded quantifiers, you have to also use --max-rows
to limit the size of the hypothetical window frame:

$ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
( ( ^ ( ) ) $ )

I've found this useful to check my personal understanding of the spec
and the match behavior, but it could also potentially be used to
generate test cases, or to help users debug their own patterns. For
example, a pattern that has a bunch of duplicate sequences in its PL
is probably not very well optimized:

$ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
( ( a a a ) ( a ) )
( ( a a ) ( a a ) )
( ( a a ) ( a ) )
( ( a ) ( a a a ) )
( ( a ) ( a a ) )
( ( a ) ( a ) )

And patterns with catastrophic backtracking behavior tend to show a
"sawtooth" pattern in the output, with a huge number of potential
matches being generated relative to the number of rows in the frame.

My implementation is really messy -- it leaks memory like a sieve, and
I cannibalized the parser from ECPG, which just ended up as an
exercise in teaching myself flex/bison. But if there's interest in
having this kind of tool in the tree, I can work on making it
reviewable. Either way, I should be able to use it to double-check
more complicated test cases.

A while back [2], you were wondering whether our Bison implementation
would be able to parse the PATTERN grammar directly. I think this tool
proves that the answer is "yes", but PERMUTE in particular causes a
shift/reduce conflict. To fix it, I applied the same precedence
workaround that we use for CUBE and ROLLUP.

Thanks again!
--Jacob

[1] https://github.com/jchampio/postgres/tree/dev/rpr
[2] 
https://www.postgresql.org/message-id/20230721.151648.412762379013769790.t-ishii%40sranhm.sra.co.jp
commit 819c1abba21c9b59cceedd55962c3fdcb982aad5
Author: Jacob Champion 
Date:   Thu Sep 26 14:12:38 2024 -0700

RPR: test larger window partitions

The second test case fails with

ERROR:  wrong pos: 1024

diff --git a/src/test/regress/expected/rpr.out 
b/src/test/regress/expected/rpr.out
index 0789e09435..dcfd67f143 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -834,3 +834,40 @@ ERROR:  SEEK is not supported
 LINE 8:  SEEK
  ^
 HINT:  Use INITIAL.
+-- Smoke test for larger partitions.
+WITH s AS (
+ SELECT v, count(*) OVER w AS c
+ FROM (SELECT generate_series(1, 5000) v)
+ WINDOW w AS (
+  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+  AFTER MATCH SKIP PAST LAST ROW
+  INITIAL
+  PATTERN ( r+ )
+  DEFINE r AS TRUE
+ )
+)
+-- Should be exactly one long match across all rows.
+SELECT * FROM s WHERE c > 0;
+ v |  c   
+---+--
+ 1 | 5000
+(1 row)
+
+WITH s AS (
+ SELECT v, count(*) OVER w AS c
+ FROM (SELECT generate_series(1, 5000) v)
+ WINDOW w AS (
+  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+  AFTER MATCH SKIP PAST LAST ROW
+  INITIAL
+  PATTERN ( r )
+  DEFINE r AS TRUE
+ )
+)
+-- Every row should be its own match.
+SELECT count(*) FROM s WHERE c > 0;
+ count 
+---
+  5000
+(1 row)
+
diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
index 302e2b86a5..6ff61e6d60 100644
--- a/src/test/regress/sql/rpr.sql
+++ b/src/test/regress/sql/rpr.sql
@@ -403,3 +403,32 @@ SELECT company, tdate, price, first_value(price) OVER w, 
last_value(price) OVER
   UP AS price > PREV(price),
   DOWN AS price < PREV(price)
 );
+
+-- Smoke test for larger partitions.
+WITH s AS (
+ SELECT v, count(*) OVER w AS c
+ FROM (SELECT generate_series(1, 5000) v)
+ WINDOW w AS (
+  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWIN

Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2024-09-27 Thread David E. Wheeler
On Sep 26, 2024, at 13:59, Florents Tselai  wrote:

> Speaking of extensible: the jsonpath standard does mention function 
> extensions [1] ,
> so it looks like we're covered by the standard, and the mutability aspect is 
> an implementation detail. No?

That’s not the standard used for Postgres jsonpath. Postgres follows the 
SQL/JSON standard in the SQL standard, which is not publicly available, but a 
few people on the list have copies they’ve purchased and so could provide some 
context.

In a previous post I wondered if the SQL standard had some facility for 
function extensions, but I suspect not. Maybe in the next iteration?

> And having said that, the whole jsonb/jsonpath parser/executor infrastructure 
> is extremely powerful
> and kinda under-utilized if we use it "only" for jsonpath.
> Tbh, I can see it supporting more specific DSLs and even offering hooks for 
> extensions.
> And I know for certain I'm not the only one thinking about this.
> See [2] for example where they've lifted, shifted and renamed the 
> jsonb/jsonpath infra to build a separate language for graphs

I’m all for extensibility, though jsonpath does need to continue to comply with 
the SQL standard. Do you have some idea of the sorts of hooks that would allow 
extension authors to use some of that underlying capability?

Best,

David





general purpose array_sort

2024-09-27 Thread Junwang Zhao
Hi hackers,

per David's suggestion, this patch implements general
purpose array sort.

We can do the following with this patch:

SELECT array_sort('{1.1,3.3,5.5,2.2,4.4,6.6}'::float8[], 'asc');
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[]);
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[], 'asc', 'pg_c_utf8');

-- 
Regards
Junwang Zhao


v1-0001-general-purpose-array_sort.patch
Description: Binary data


Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2024-09-27 Thread Florents Tselai



> On 27 Sep 2024, at 12:45 PM, David E. Wheeler  wrote:
> 
> On Sep 26, 2024, at 13:59, Florents Tselai  wrote:
> 
>> Speaking of extensible: the jsonpath standard does mention function 
>> extensions [1] ,
>> so it looks like we're covered by the standard, and the mutability aspect is 
>> an implementation detail. No?
> 
> That’s not the standard used for Postgres jsonpath. Postgres follows the 
> SQL/JSON standard in the SQL standard, which is not publicly available, but a 
> few people on the list have copies they’ve purchased and so could provide 
> some context.
> 
> In a previous post I wondered if the SQL standard had some facility for 
> function extensions, but I suspect not. Maybe in the next iteration?
> 
>> And having said that, the whole jsonb/jsonpath parser/executor 
>> infrastructure is extremely powerful
>> and kinda under-utilized if we use it "only" for jsonpath.
>> Tbh, I can see it supporting more specific DSLs and even offering hooks for 
>> extensions.
>> And I know for certain I'm not the only one thinking about this.
>> See [2] for example where they've lifted, shifted and renamed the 
>> jsonb/jsonpath infra to build a separate language for graphs
> 
> I’m all for extensibility, though jsonpath does need to continue to comply 
> with the SQL standard. Do you have some idea of the sorts of hooks that would 
> allow extension authors to use some of that underlying capability?

Re-tracing what I had to do

1. Define a new JsonPathItemType jpiMyExtType and map it to a JsonPathKeyword
2. Add a new JsonPathKeyword and make the lexer and parser aware of that,
3. Tell the main executor executeItemOptUnwrapTarget what to do when the new 
type is matched.

I think 1 and 2 are the trickiest because they require hooks into the lexer
jsonpath_scan.l and the parser jsonpath_gram.y.

3. is the meat of a potential hook, which would be something like 
extern JsonPathExecResult executeOnMyJsonpathItem(JsonPathExecContext *cxt, 
JsonbValue *jb, JsonValueList *found);
This should be called by the main executor executeItemOptUnwrapTarget when it 
encounters case jpiMyExtType

It looks like quite an endeavor, to be honest.



Re: Truncate logs by max_log_size

2024-09-27 Thread Andrey M. Borodin



> On 27 Sep 2024, at 03:30, Euler Taveira  wrote:
> 
> Let's say you arbitrarily provide max_log_size = 100

Consider max_log_size = 10Mb. The perspective might look very different. It’s 
not about WHERE anymore. It's a guard against heavy abuse.

The feature looks very important for me.


Best regards, Andrey Borodin.



Re: Extension security improvement: Add support for extensions with an owned schema

2024-09-27 Thread Tomas Vondra
Hi,

I've spent a bit of time looking at this patch. It seems there's a clear
consensus that having "owned schemas" for extensions would be good for
security. To me it also seems like a convenient way to organize stuff. It
was possible to create extensions in a separate schema before, ofc, but
that was up to the DBA. With this, the extension author can enforce it.

One thing that's not quite clear to me is what's the correct way for
existing extensions to switch to an "owned schema". Let's say you have
an extension. How do you transition to this? Can you just add it to the
control file and then some magic happens?

A couple minor comments:


1) doc/src/sgml/extend.sgml

  An extension is owned_schema if it requires a
  new dedicated schema for its objects. Such a requirement can make
  security concerns related to search_path injection
  much easier to reason about. The default is false,
  i.e., the extension can be installed into an existing schema.

Doesn't "extension is owned_schema" sound a bit weird? I'd probably say
"extension may own a schema" or something like that.

Also, "requires a new dedicated schema" is a bit ambiguous. It's not
clear if it means the schema is expected to exist, or if it creates the
schema itself.

And perhaps it should clarify what "much easier to reason about" means.
That's pretty vague, and as a random extension author I wouldn't know
about the risks to consider. Maybe there's a section about this stuff
that we could reference?


2) doc/src/sgml/ref/create_extension.sgml

  relocated.  The named schema must already exist if the extension's
  control file does not specify owned_schema.

Seems a bit unclear, I'd say having "owned_schema = false" in the
control file still qualifies as "specifies owned_schema". So might be
better to say it needs to be set to true?

Also, perhaps "dedicated_schema" would be better than "owned_schema"? I
mean, the point is not that it's "owned" by the extension, but that
there's nothing else in it. But that's nitpicking.


3) src/backend/commands/extension.c

I'm not sure why AlterExtensionNamespace moves the privilege check. Why
should it not check the privilege for owned schema too?


4) src/bin/pg_dump/pg_dump.c

checkExtensionMembership has typo "owned_schem".

Shouldn't binary_upgrade_extension_member still set ext = NULL in the for
loop, the way the original code does?

The long if conditions might need some indentation. pgindent leaves them
like this, but 100 columns seems a bit too much; I'd do a line break after
each condition.




regards

-- 
Tomas Vondra





Re: Add has_large_object_privilege function

2024-09-27 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> - New functions introduced during a development cycle should use OIDs in
> the range 8000-.  See 98eab30b93d5, consisting of running
> ./unused_oids to get a random range.

There have been several fixups of this kind recently.  Should we add a
note about this to the comment at the top of all of the pg_*.dat files
that have explicit OID assignments?  People might be more likely to
notice that than the section over in bki.sgml.

- ilmari




Re: pg_verifybackup: TAR format backup verification

2024-09-27 Thread Robert Haas
On Fri, Sep 27, 2024 at 2:07 AM Amul Sul  wrote:
> Thank you, Robert. The code changes look much better now.

Cool.

> A few minor comments:
>
> +each tablespace, named after the tablespace's OID. If the backup
> +is compressed, the relevant compression extension is added to the
> +end of each file name.
>
> I am a bit unsure about the last line, especially the use of the word
> "added."  I feel like it's implying that we're adding something, which
> isn't true.

If you add .gz to the end of 16904.tar, you get 16904.tar.gz. This
seems like correct English to me.

> Typo: futher

OK, thanks.

> The addition of simple_ptr_list_destroy will be part of a separate
> commit, correct?

To me, it doesn't seem worth splitting that out into a separate commit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: On disable_cost

2024-09-27 Thread David Rowley
On Fri, 27 Sept 2024 at 20:42, Laurenz Albe  wrote:
> 2. The "disabled nodes" are not only shown at the nodes where nodes
>were actually disabled, but also at every nodes above these nodes.

I'm also not a fan either and I'd like to see this output improved.

It seems easy enough to implement some logic to detect whether a given
node is disabled just by checking if its disabled_nodes count is higher
than the sum of the disabled_nodes fields of the node's children.  If
there are no children (a scan node) and disabled_nodes > 0, then it must
be disabled.  There's even a nice fast path where we don't need to check
the children at all if disabled_nodes == 0.
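
In rough terms, the test is just this (a sketch of the idea only, not the
attached POC; it assumes the disabled_nodes counter the committed patches put
in struct Plan, and it ignores list-based children such as Append, which would
need to walk their subplan lists):

static bool
plan_is_disabled(Plan *plan)
{
    int         child_disabled = 0;

    /* fast path: nothing at or below this node is disabled */
    if (plan->disabled_nodes == 0)
        return false;

    if (plan->lefttree)
        child_disabled += plan->lefttree->disabled_nodes;
    if (plan->righttree)
        child_disabled += plan->righttree->disabled_nodes;

    /* disabled here iff this node added to the count beyond its children */
    return plan->disabled_nodes > child_disabled;
}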

Here's a POC grade patch of how I'd rather see it looking.

I opted to have a boolean field as I didn't see any need for an
integer count. I also changed things around so we always display the
boolean property in non-text EXPLAIN. Normally, we don't mind being
more verbose there.

I also fixed a bug in make_sort() where disabled_nodes isn't being set
properly. I'll do an independent patch for that if this goes nowhere.

David


poc_improve_disabled_nodes_explain_output.patch
Description: Binary data


Re: Changing the default random_page_cost value

2024-09-27 Thread Dagfinn Ilmari Mannsåker
Greg Sabino Mullane  writes:

> So I'll be brave and throw a number out there: 1.2. And change our
> docs to say wordage like "if you are using an older hard disk drive
> technology, you may want to try raising rpc" to replace our
> fairly-hidden note about SSDs buried in the last sentence - of the
> fourth paragraph - of the rpc docs.

It might also be worth mentioning cloudy block storage (e.g. AWS' EBS),
which is typically backed by SSDs, but has extra network latency.

- ilmari




Re: SQL:2023 JSON simplified accessor support

2024-09-27 Thread David E. Wheeler
On Sep 26, 2024, at 16:45, Alexandra Wang  wrote:

> I didn’t run pgindent earlier, so here’s the updated version with the
> correct indentation. Hope this helps!

Oh,  nice! I don’t suppose the standard also has defined an operator equivalent 
to ->>, though, has it? I tend to want the text output far more often than a 
JSON scalar.

Best,

David





IndexAccessMethod API & Index Only Scans

2024-09-27 Thread Eric Ridge
Hi all!  Congrats on releasing v17!

I'm adding support for Index Only Scans to a custom IAM impl and I've got a 
little dilemma.  

My IAM implementation is essentially a composite index that might have up to
32 columns, and while it can return any column in the index definition, it's
quite expensive to do so.  It doesn't have an already-formed index tuple
sitting right there like the built-in btree index does.  Returning 1 or 2
columns is usually a win over the regular Index Scan version, but more than
that and it's at least a wash, if not a total loss.

Since not all Index Only Scans need *all* the columns, was there ever any 
thought around providing the required attnos as a field on IndexScanDescData?  
That information seems to be there in the nodeindexonlyscan.c machinery.

As a compromise, I've settled on my `amcanreturn` function saying it only
knows how to return attno == 1, which is sufficient for some query patterns,
but I'd like to be able to have my index directly feed tuples into aggregate
plans and such too.  It's just too expensive for me to always return all the
columns when generally only 1 or 2 are needed (there doesn't seem to be a way
to weave that into the cost estimations, but that wouldn't matter if I knew
which specific columns I needed to fetch out of my index).
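
For reference, the compromise amounts to nothing more than this (a minimal
sketch; my_amcanreturn is a placeholder name, the signature is the one the
amcanreturn callback in amapi.h expects):

#include "postgres.h"

#include "access/amapi.h"
#include "utils/rel.h"

/* only the first key column is cheap enough to hand back for an IOS */
bool
my_amcanreturn(Relation indexRelation, int attno)
{
    return attno == 1;
}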

I'm pretty familiar with the IAM and the internals around it, but maybe I've 
missed something -- can I get at this information some other way?

Thanks for your time and consideration!

eric



Re: Pgoutput not capturing the generated columns

2024-09-27 Thread Shubham Khanna
On Tue, Sep 24, 2024 at 5:16 AM Peter Smith  wrote:
>
> Hi. Here are my review comments for v32-0001
>
> You wrote: "I have addressed all the comments in the v32-0001 Patch.",
> however, I found multiple old review comments not addressed. Please
> give a reason if a comment is deliberately left out, otherwise, I will
> assume they are omitted by accident and so keep repeating them.
>
> There were also still some unanswered questions from previous reviews,
> so I have reminded you about those again here.
>
> ==
> Commit message
>
> 1.
> This commit enables support for the 'publish_generated_columns' option
> in logical replication, allowing the transmission of generated column
> information and data alongside regular table changes. The option
> 'publish_generated_columns' is a PUBLICATION parameter.
>
> ~
>
> That PUBLICATION info in the 2nd sentence would be easier to say in
> the 1st sentence.
> SUGGESTION:
> This commit supports the transmission of generated column information
> and data alongside regular table changes. This behaviour is controlled
> by a new PUBLICATION parameter ('publish_generated_columns').
>
> ~~~
>
> 2.
> When 'publish_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> Hm. This contradicts the behaviour that Amit wanted, (e.g.
> "column-list takes precedence"). So I am not sure if this patch is
> already catering for the behaviour suggested by Amit or if that is yet
> to come in v33. For now, I am assuming that 32* has not caught up with
> the latest behaviour requirements, but that might be a wrong
> assumption; perhaps it is only this commit message that is bogus.
>
> ~~~
>
> 3. General.
>
> On the same subject, there is lots of code, like:
>
> if (att->attgenerated && !pub->pubgencols)
> continue;
>
> I suspect that might not be quite what you want for the "column-list
> takes precedence" behaviour, but I am not going to identify all those
> during this review. It needs lots of combinations of column list tests
> to verify it.
>
> ==
> doc/src/sgml/ddl.sgml
>
> 4ab.
> nit - Huh?? Not changed the linkend as told in a previous review [1-#3a]
> nit - Huh?? Not changed to call this a "parameter" instead of an
> "option" as told in a previous review [1-#3b]
>
> ==
> doc/src/sgml/protocol.sgml
>
> 5.
> - 
> -  Next, the following message part appears for each column included in
> -  the publication (except generated columns):
> - 
> -
>
> nit -- Huh?? I don't think you can just remove this whole paragraph.
> But, probably you can just remove the "except generated columns" part.
> I posted this same comment [4 #11] 20 patch versions back.
>
> ==
> doc/src/sgml/ref/create_publication.sgml
>
> 6abc.
> nit - Huh?? Not changed the parameter ID as told in a previous review [1-#6]
> nit - Huh?? Not removed paragraph "This option is only available..."
> as told in a previous review. See [1-#7]
> nit - Huh?? Not removed paragraph "This parameter can only be set" as
> told in a previous review. See [1-#8]
>
> ==
> src/backend/catalog/pg_publication.c
>
> 7.
>   if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> - ereport(ERROR,
> + ereport(WARNING,
>   errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> - errmsg("cannot use generated column \"%s\" in publication column list",
> + errmsg("specified generated column \"%s\" in publication column list
> for publication with publish_generated_columns as false",
>  colname));
>
> I did not understand how this WARNING can know
> "publish_generated_columns as false"? Should the code be checking the
> function parameter 'pubgencols'?
>
> The errmsg also seemed a bit verbose. How about:
> "specified generated column \"%s\" in publication column list when
> publish_generated_columns = false"
>
> ==
> src/backend/replication/logical/proto.c
>
> 8.
> logicalrep_write_tuple:
> logicalrep_write_attrs:
>
> Reminder. I think I have multiple questions about this code from
> previous reviews that may be still unanswered. See [2 #4]. Maybe when
> you implement Amit's "column list takes precedence" behaviour then
> this code is fine as-is (because the replication message might include
> gencols or not-gecols regardless of the 'publish_generated_columns'
> value). But I don't think that is the current implementation, so
> something did not quite seem right. I am not sure. If you say it is
> fine then I will believe it, but the question [2 #4] remains
> unanswered.
>
> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 9.
> send_relation_and_attrs:
>
> Reminder: Here is another question that was answered from [2 #5]. I
> did not really trust it for the current implementation, but for the
> "column list takes precedence" behaviour probably it will be ok.
>
> ~~~
>
> 10.
> +/*
> + * Prepare new column list bitmap. This includes all the columns of the 
> table.
> + */
> +static Bitmapset *
> +prepare_all_columns_bms(PGOutputData *data, RelationSyncEntry *en

Re: Pgoutput not capturing the generated columns

2024-09-27 Thread Shubham Khanna
On Tue, Sep 24, 2024 at 7:08 AM Peter Smith  wrote:
>
> Hi. Here are my v32-0002 review comments:
>
> ==
> src/backend/replication/logical/tablesync.c
>
> 1. fetch_remote_table_info
>
>   /*
> - * Get column lists for each relation.
> + * Get column lists for each relation, and check if any of the
> + * publications have the 'publish_generated_columns' parameter enabled.
>
> I am not 100% sure about this logic anymore. Maybe it is OK, but it
> requires careful testing because with Amit's "column lists take
> precedence" it is now possible for the publication to say
> 'publish_generated_columns=false', but the publication can still
> publish gencols *anyway* if they were specified in a column list.
>
> ~~~
>

This comment is still open. Will fix this in the next version of patches.

> 2.
>   /*
>   * Fetch info about column lists for the relation (from all the
>   * publications).
>   */
> + StringInfo pub_names = makeStringInfo();
> +
> + get_publications_str(MySubscription->publications, pub_names, true);
>   resetStringInfo(&cmd);
>   appendStringInfo(&cmd,
> ~
>
> nit - The comment here seems misplaced.
>
> ~~~
>
> 3.
> + if (server_version >= 12)
> + {
> + has_pub_with_pubgencols = server_version >= 18 && 
> has_pub_with_pubgencols;
> +
> + if (!has_pub_with_pubgencols)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> + }
>
> My previous review comment about this [1 #10] was:
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> nit - So the current v32 code is not what I was expecting. What I
> meant was 'has_pub_with_pubgencols' can only be true if server_version
> >= 18, so I thought there was no reason to check it again. For
> reference, I've changed it to like I meant in the nitpicks attachment.
> Please see if that works the same.
>
> ==
> [1] my review of v31-0002.
> https://www.postgresql.org/message-id/CAHut%2BPusbhvPrL1uN1TKY%3DFd4zu3h63eDebZvsF%3Duy%2BLBKTwgA%40mail.gmail.com
>

I have addressed all the comments in the v34-0002 Patch. Please refer
to the updated v34-0002 Patch here in [1]. See [1] for the changes
added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.

2024-09-27 Thread Max Johnson
I think it would be a good idea to include these fixes in the next minor
release. After working for a couple of months on getting our embedded systems
2038-compliant, it has become very apparent to me that 2038 will be a
substantial ordeal. Maximizing the number of systems that include this fix
would make things a little easier when that time comes around.

Thanks,

Max

From: Nathan Bossart 
Sent: Thursday, September 26, 2024 9:38 PM
To: Max Johnson 
Cc: pgsql-hack...@postgresql.org ; 
t...@sss.pgh.pa.us 
Subject: Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of 
long to avoid 2038 problem.

On Wed, Sep 25, 2024 at 08:04:59PM +, Max Johnson wrote:
> I think your patch looks good, no objections. I am happy to have contributed.

Great.  I've attached what I have staged for commit.

My first instinct was to not bother back-patching this since all
currently-supported versions will have been out of support for over 8 years
by the time this becomes a practical issue.  However, I wonder if it makes
sense to back-patch for the kinds of 32-bit embedded systems you cited
upthread.  I can imagine that such systems might need to work for a very
long time without any software updates, in which case it'd probably be a
good idea to make this fix available in the next minor release.  What do
you think?

--
nathan


Re: Pgoutput not capturing the generated columns

2024-09-27 Thread Shubham Khanna
On Mon, Sep 23, 2024 at 6:19 PM vignesh C  wrote:
>
> On Fri, 20 Sept 2024 at 17:15, Shubham Khanna
>  wrote:
> >
> > On Wed, Sep 11, 2024 at 8:55 AM Peter Smith  wrote:
> >
> > I have fixed all the comments. The attached patches contain the desired 
> > changes.
> > Also the merging of 0001 and 0002 can be done once there are no
> > comments on the patch to help in reviewing.
>
> Few comments:
> 1) This commit  message seems wrong, currently irrespective of
> publish_generated_columns, the column specified in column list take
> preceedene:
> When 'publish_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> 2) Since we have added pubgencols to pg_pubication.h we can specify
> "Bump catversion" in the commit message.
>
> 3) In create publication column list/publish_generated_columns
> documentation we should mention that if generated column is mentioned
> in column list, generated columns mentioned in column list will be
> replication irrespective of publish_generated_columns option.
>
> 4) This warning should be mentioned only if publish_generated_columns is 
> false:
> if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> -   ereport(ERROR,
> +   ereport(WARNING,
>
> errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> -   errmsg("cannot use generated
> column \"%s\" in publication column list",
> +   errmsg("specified generated
> column \"%s\" in publication column list for publication with
> publish_generated_columns as false",
>colname));
>
> 5) These tests are not required for this feature:
> +   'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
> +   create_order => 51,
> +   create_sql =>
> + 'ALTER PUBLICATION pub5 ADD TABLE
> dump_test.test_table WHERE (col1 > 0);',
> +   regexp => qr/^
> +   \QALTER PUBLICATION pub5 ADD TABLE ONLY
> dump_test.test_table WHERE ((col1 > 0));\E
> +   /xm,
> +   like => { %full_runs, section_post_data => 1, },
> +   unlike => {
> +   exclude_dump_test_schema => 1,
> +   exclude_test_table => 1,
> +   },
> +   },
> +
> +   'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE
> (col2 = \'test\');'
> + => {
> +   create_order => 52,
> +   create_sql =>
> + 'ALTER PUBLICATION pub5 ADD TABLE
> dump_test.test_second_table WHERE (col2 = \'test\');',
> +   regexp => qr/^
> +   \QALTER PUBLICATION pub5 ADD TABLE ONLY
> dump_test.test_second_table WHERE ((col2 = 'test'::text));\E
> +   /xm,
> +   like => { %full_runs, section_post_data => 1, },
> +   unlike => { exclude_dump_test_schema => 1, },
> + },
>

I have addressed all the comments in the v34-0001 Patch. Please refer
to the updated v34-0001 Patch here in [1]. See [1] for the changes
added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Changing the default random_page_cost value

2024-09-27 Thread Roberto Mello
On Fri, Sep 27, 2024 at 8:07 AM Greg Sabino Mullane 
wrote:

> tl;dr let's assume SSDs are popular and HDDs are the exception and flip
> our default
>




> Granted, there are other factors involved, and yes, perhaps we should
> tweak some of the similar settings as well, but ranom_page_cost is the one
> setting most out of sync with today's hardware realities. So I'll be brave
> and throw a number out there: 1.2. And change our docs to say wordage like
> "if you are using an older hard disk drive technology, you may want to try
> raising rpc" to replace our fairly-hidden note about SSDs buried in the
> last sentence - of the fourth paragraph - of the rpc docs.
>

+1

I suggest a slightly nicer comment in the default conf file, like "For
spinning hard drives, raise this to at least 3 and test"

Roberto


Re: Changing the default random_page_cost value

2024-09-27 Thread Laurenz Albe
On Fri, 2024-09-27 at 10:07 -0400, Greg Sabino Mullane wrote:
> So I'll be brave and throw a number out there: 1.2.

+1

Laurenz Albe




Re: MAINTAIN privilege -- what do we need to un-revert it?

2024-09-27 Thread Nathan Bossart
On Fri, Sep 27, 2024 at 12:42:34PM +0900, Yugo NAGATA wrote:
> I agree with you. I overlooked WITH NO DATA.
> I attached a updated patch.

Thanks.  Unless someone objects, I plan to commit this shortly.

-- 
nathan




Re: micro-optimize nbtcompare.c routines

2024-09-27 Thread Nathan Bossart
I've marked this one as Withdrawn.  Apologies for the noise.

-- 
nathan




Re: long-standing data loss bug in initial sync of logical replication

2024-09-27 Thread Shlok Kyal
Hi Kuroda-san,

Thanks for reviewing the patch.

> > Solution:
> > 1. When we alter a publication using commands like ‘ALTER PUBLICATION
> > pub_name DROP TABLE table_name’, first all tables in the publications
> > are invalidated using the function ‘rel_sync_cache_relation_cb’. Then
> > again ‘rel_sync_cache_publication_cb’ function is called which
> > invalidates all the tables.
>
> On my environment, rel_sync_cache_publication_cb() was called first and 
> invalidate
> all the entries, then rel_sync_cache_relation_cb() was called and the 
> specified
> entry is invalidated - hence second is NO-OP.
>

You are correct; I made a silly mistake while writing the write-up.
rel_sync_cache_publication_cb() is called first and invalidates all the
entries, then rel_sync_cache_relation_cb() is called and the specified
entry is invalidated.

> > This happens because of the following
> > callback registered:
> > CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
> > rel_sync_cache_publication_cb, (Datum) 0);
>
> But even in this case, I could understand that you want to remove the
> rel_sync_cache_publication_cb() callback.

Yes, I think rel_sync_cache_publication_cb() callback can be removed,
as it is invalidating all the other tables as well (which are not in
this publication).

> > 2. When we add/drop a schema to/from a publication using command like
> > ‘ALTER PUBLICATION pub_name ADD TABLES in SCHEMA schema_name’, first
> > all tables in that schema are invalidated using
> > ‘rel_sync_cache_relation_cb’ and then again
> > ‘rel_sync_cache_publication_cb’ function is called which invalidates
> > all the tables.
>
> Even in this case, rel_sync_cache_publication_cb() was called first and then
> rel_sync_cache_relation_cb().
>

Yes, your observation is correct. rel_sync_cache_publication_cb() is
called first and then rel_sync_cache_relation_cb().

> >
> > 3. When we alter a namespace using command like ‘ALTER SCHEMA
> > schema_name RENAME to new_schema_name’ all the table in cache are
> > invalidated as ‘rel_sync_cache_publication_cb’ is called due to the
> > following registered callback:
> > CacheRegisterSyscacheCallback(NAMESPACEOID,
> > rel_sync_cache_publication_cb, (Datum) 0);
> >
> > So, we added a new callback function ‘rel_sync_cache_namespacerel_cb’
> > will be called instead of function ‘rel_sync_cache_publication_cb’ ,
> > which invalidates only the cache of the tables which are part of that
> > particular namespace. For the new function the ‘namespace id’ is added
> > in the Invalidation message.
>
> Hmm, I feel this fix is too much. Unlike ALTER PUBLICATION statements, I think
> ALTER SCHEMA is rarely executed at the production stage. However, this 
> approach
> requires adding a new cache callback system, which affects the entire postgres
> system; this is not very beneficial compared to the outcome. It should be 
> discussed
> on another thread to involve more people, and then we can add the improvement
> after being accepted.
>
Yes, I also agree with you. I have removed the changes in the updated patch.

> > Performance Comparison:
> > I have run the same tests as shared in [1] and observed a significant
> > decrease in the degradation with the new changes.  With selective
> > invalidation degradation is around ~5%. This results are an average of
> > 3 runs.
>
> IIUC, the executed workload did not contain ALTER SCHEMA command, so
> third improvement did not contribute this improvement.
I have removed the changes corresponding to the third improvement.

I have addressed the comment for 0002 patch and attached the patches.
Also, I have moved the tests in the 0002 to 0001 patch.

Thanks and Regards,
Shlok Kyal


v10-0002-Selective-Invalidation-of-Cache.patch
Description: Binary data


v10-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch
Description: Binary data


Re: On disable_cost

2024-09-27 Thread Laurenz Albe
On Wed, 2024-08-21 at 10:29 -0400, Robert Haas wrote:
> I went ahead and committed these patches. I know there's some debate
> over whether we want to show the # of disabled nodes and if so whether
> it should be controlled by COSTS, and I suspect I haven't completely
> allayed David's concerns about the initial_cost_XXX functions although
> I think that I did the right thing. But, I don't have the impression
> that anyone is desperately opposed to the basic concept, so I think it
> makes sense to put these into the tree and see what happens. We have
> quite a bit of time left in this release cycle to uncover bugs, hear
> from users or other developers, etc. about what problems there may be
> with this. If we end up deciding to reverse course or need to fix a
> bunch of stuff, so be it, but let's see what the feedback is.

I am somewhat unhappy about the "Disabled Nodes" in EXPLAIN.

First, the commit message confused me: it claims that the information
is displayed with EXPLAIN ANALYZE, but it's shown with every EXPLAIN.

But that's not important.  My complaints are:

1. The "disabled nodes" are always displayed.
   I'd be happier if they were only shown with COSTS ON, but I think it
   would be best if they were only shown with VERBOSE ON.

   After all, the messages are pretty verbose...

2. The "disabled nodes" are not only shown at the nodes where nodes
   were actually disabled, but also at every nodes above these nodes.

   This would be fine:

 Sort
   -> Nested Loop Join
-> Hash Join
 -> Index Scan
Disabled Nodes: 1
 -> Hash
  -> Index Scan
 Disabled Nodes: 1
-> Index Scan
   Disabled Nodes: 1

   This is annoying:

 Sort
 Disabled Nodes: 3
   -> Nested Loop Join
  Disabled Nodes: 3
-> Hash Join
   Disabled Nodes: 2
 -> Index Scan
Disabled Nodes: 1
 -> Hash
  -> Index Scan
 Disabled Nodes: 1
-> Index Scan
   Disabled Nodes: 1

I have no idea how #2 could be implemented, but it would be nice to have.
Please, please, can we show the "disabled nodes" only with VERBOSE?

Yours,
Laurenz Albe




Re: MAINTAIN privilege -- what do we need to un-revert it?

2024-09-27 Thread Jeff Davis
On Fri, 2024-09-27 at 10:34 -0500, Nathan Bossart wrote:
> On Fri, Sep 27, 2024 at 12:42:34PM +0900, Yugo NAGATA wrote:
> > I agree with you. I overlooked WITH NO DATA.
> > I attached a updated patch.
> 
> Thanks.  Unless someone objects, I plan to commit this shortly.

The command is run effectively in two parts: the CREATE part and the
REFRESH part. The former just uses the session search path, while the
latter uses the safe search path.

I suggest that we add the wording to the query portion of the doc,
near "security-restricted operation".

Regards,
Jeff Davis





Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-09-27 Thread Tom Lane
Christoph Berg  writes:
> Re: Tom Lane
>> So the first part of that is great, but if your script file is
>> large you probably won't be happy about having the whole thing
>> repeated in the "QUERY" field.  So this needs some work on
>> user-friendliness.

> Does this really have to be addressed? It would be way better than it
> is now, and errors during extension creation are rare and mostly for
> developers only, so it doesn't have to be pretty.

Perhaps.  I spent a little more effort on this and added code to
report errors that don't come with an error location.  On those,
we don't have any constraints about what to report in the QUERY
field, so I made it trim the string to just the current query
within the script, which makes things quite a bit better.  You
can see the results in the test_extensions regression test changes.

(It might be worth some effort to trim away comments appearing
just before a command, but I didn't tackle that here.)

regards, tom lane

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index fab59ad5f6..0fae1332d2 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -54,6 +54,7 @@
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/queryjumble.h"
 #include "storage/fd.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
@@ -107,6 +108,17 @@ typedef struct ExtensionVersionInfo
 	struct ExtensionVersionInfo *previous;	/* current best predecessor */
 } ExtensionVersionInfo;
 
+/*
+ * Information for script_error_callback()
+ */
+typedef struct
+{
+	const char *sql;			/* entire script file contents */
+	const char *filename;		/* script file pathname */
+	ParseLoc	stmt_location;	/* current stmt start loc, or -1 if unknown */
+	ParseLoc	stmt_len;		/* length in bytes; 0 means "rest of string" */
+} script_error_callback_arg;
+
 /* Local functions */
 static List *find_update_path(List *evi_list,
 			  ExtensionVersionInfo *evi_start,
@@ -670,9 +682,60 @@ read_extension_script_file(const ExtensionControlFile *control,
 	return dest_str;
 }
 
+/*
+ * error context callback for failures in script-file execution
+ */
+static void
+script_error_callback(void *arg)
+{
+	script_error_callback_arg *callback_arg = (script_error_callback_arg *) arg;
+	int			syntaxerrposition;
+	const char *lastslash;
+
+	/* If it's a syntax error, convert to internal syntax error report */
+	syntaxerrposition = geterrposition();
+	if (syntaxerrposition > 0)
+	{
+		/*
+		 * We must report the whole string because otherwise details such as
+		 * psql's line number report would be wrong.
+		 */
+		errposition(0);
+		internalerrposition(syntaxerrposition);
+		internalerrquery(callback_arg->sql);
+	}
+	else if (callback_arg->stmt_location >= 0)
+	{
+		/*
+		 * Since no syntax cursor will be shown, it's okay and helpful to trim
+		 * the reported query string to just the current statement.
+		 */
+		const char *query = callback_arg->sql;
+		int			location = callback_arg->stmt_location;
+		int			len = callback_arg->stmt_len;
+
+		query = CleanQuerytext(query, &location, &len);
+		internalerrquery(pnstrdup(query, len));
+	}
+
+	/*
+	 * Trim the reported file name to remove the path.  We know that
+	 * get_extension_script_filename() inserted a '/', regardless of whether
+	 * we're on Windows.
+	 */
+	lastslash = strrchr(callback_arg->filename, '/');
+	if (lastslash)
+		lastslash++;
+	else
+		lastslash = callback_arg->filename; /* shouldn't happen, but cope */
+	errcontext("extension script file \"%s\"", lastslash);
+}
+
 /*
  * Execute given SQL string.
  *
+ * The filename the string came from is also provided, for error reporting.
+ *
  * Note: it's tempting to just use SPI to execute the string, but that does
  * not work very well.  The really serious problem is that SPI will parse,
  * analyze, and plan the whole string before executing any of it; of course
@@ -682,12 +745,27 @@ read_extension_script_file(const ExtensionControlFile *control,
  * could be very long.
  */
 static void
-execute_sql_string(const char *sql)
+execute_sql_string(const char *sql, const char *filename)
 {
+	script_error_callback_arg callback_arg;
+	ErrorContextCallback scripterrcontext;
 	List	   *raw_parsetree_list;
 	DestReceiver *dest;
 	ListCell   *lc1;
 
+	/*
+	 * Setup error traceback support for ereport().
+	 */
+	callback_arg.sql = sql;
+	callback_arg.filename = filename;
+	callback_arg.stmt_location = -1;
+	callback_arg.stmt_len = -1;
+
+	scripterrcontext.callback = script_error_callback;
+	scripterrcontext.arg = (void *) &callback_arg;
+	scripterrcontext.previous = error_context_stack;
+	error_context_stack = &scripterrcontext;
+
 	/*
 	 * Parse the SQL string into a list of raw parse trees.
 	 */
@@ -709,6 +787,10 @@ execute_sql_string(const char *sql)
 		List	   *stmt_list;
 		ListCell   *lc2;
 
+		/* Report location of this query for error context callback */
+		callback_arg.stmt_location = parsetre

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-09-27 Thread Masahiko Sawada
Hi,

On Sun, Aug 4, 2024 at 3:20 PM Sutou Kouhei  wrote:
>
> Hi,
>
> I re-ran the benchmark(*) with the v19 patch set and the
> following CPUs:
>
> 1. AMD Ryzen 9 3900X 12-Core Processor
> 2. Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>
> (*)
> * Use tables that have {5,10,15,20,25,30} integer columns
> * Use tables that have {1,2,3,4,5,6,7,8,9,10}M rows
> * Use '/dev/null' for COPY TO
> * Use blackhole_am for COPY FROM
>
> See the attached graphs for details.
>
> Notes:
> * X-axis is the number of columns
> * Y-axis is the number of M rows
> * Z-axis is the elapsed time percent (smaller is faster,
>   e.g. 99% is a bit faster than the HEAD and 101% is a bit
>   slower than the HEAD)
> * Z-ranges aren't same (The Ryzen case uses about 79%-121%
>   but the Intel case uses about 91%-111%)
> * Red means the patch is slower than HEAD
> * Blue means the patch is faster than HEAD
> * The upper row shows FROM results
> * The lower row shows TO results
>
> Here are summaries based on the results:
>
> For FROM:
> * With Ryzen: It shows that negative performance impact
> * With Intel: It shows that negative performance impact with
>   1-5M rows and positive performance impact with 6M-10M rows
> For TO:
> * With Ryzen: It shows that positive performance impact
> * With Intel: It shows that positive performance impact
>
> Here are insights based on the results:
>
> * 0001 (that introduces Copy{From,To}Routine} and adds some
>   "if () {...}" for them but the existing formats still
>   doesn't use them) has a bit negative performance impact
> * 0002 (that migrates the existing codes to
>   Copy{From,To}Routine} based implementations) has positive
>   performance impact
>   * For FROM: Negative impact by 0001 and positive impact by
> 0002 almost balanced
> * We should use both of 0001 and 0002 than only 0001
> * With Ryzon: It's a bit slower than HEAD. So we may not
>   want to reject this propose for FROM
> * With Intel:
>   * With 1-5M rows: It's a bit slower than HEAD
>   * With 6-10M rows: It's a bit faster than HEAD
>   * For TO: Positive impact by 0002 is larger than negative
> impact by 0002
> * We should use both of 0001 and 0002 than only 0001
> * 0003 (that makes Copy{From,To}Routine Node) has a bit
>   negative performance impact
>   * But I don't know why. This doesn't change per row
> related codes. Increasing Copy{From,To}Routine size
> (NodeTag is added) may be related.
> * 0004 (that moves Copy{From,To}StateData to copyapi.h)
>   doesn't have impact
>   * It makes sense because this doesn't change any
> implementations.
> * 0005 (that add "void *opaque" to Copy{From,To}StateData)
>   has a bit negative impact for FROM and a bit positive
>   impact for TO
>   * But I don't know why. This doesn't change per row
> related codes. Increasing Copy{From,To}StateData size
> ("void *opaque" is added) may be related.

I was surprised that the 0005 patch made COPY FROM slower (with fewer
rows) and COPY TO faster overall in spite of just adding one struct
field and some functions.

I'm interested in why the performance trends of COPY FROM are
different between fewer than 6M rows and more than 6M rows.

>
> How to proceed this proposal?
>
> * Do we need more numbers to judge this proposal?
>   * If so, could someone help us?
> * There is no negative performance impact for TO with both
>   of Ryzen and Intel based on my results. Can we merge only
>   the TO part?
>   * Can we defer the FROM part? Should we proceed this
> proposal with both of the FROM and TO part?
> * Could someone provide a hint why the FROM part is more
>   slower with Ryzen?
>

Separating the patches into two parts (one for COPY TO and another for
COPY FROM) could be a good idea. It would help with reviews and with
investigating the performance regression in the COPY FROM cases. And I
think we can commit them separately.

Also, could you please rebase the patches as they conflict with the
current HEAD? I'll run some benchmarks on my environment as well.

Regards,


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Add new COPY option REJECT_LIMIT

2024-09-27 Thread jian he
+/*
+ * Extract REJECT_LIMIT value from a DefElem.
+ */
+static int64
+defGetCopyRejectLimitOptions(DefElem *def)
+{
+ int64 reject_limit;
+
+ if (def->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("REJECT_LIMIT requires a positive integer")));
+
+ if (nodeTag(def->arg) == T_Integer)
+ {
+ reject_limit = defGetInt64(def);
+ if (reject_limit <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REJECT_LIMIT (%lld) must be greater than zero",
+ (long long) reject_limit)));
+ }
+ else
+ {
+ char   *sval = defGetString(def);
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REJECT_LIMIT (%s) must be a positive integer",
+ sval)));
+ }
+
+ return reject_limit;
+}
Won't there be an integer overflow issue here?

Can you try the following?

static int64
defGetCopyRejectLimitOptions(DefElem *def)
{
	int64		reject_limit;

	reject_limit = defGetInt64(def);
	if (reject_limit <= 0)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("REJECT_LIMIT (%lld) must be greater than zero",
						(long long) reject_limit)));

	return reject_limit;
}




REJECT_LIMIT integer
I think you want REJECT_LIMIT to be bigint, so shouldn't it be
REJECT_LIMIT bigint
instead?