Re: Remove repeated calls to PQserverVersion

2021-07-14 Thread Daniel Gustafsson
> On 14 Jul 2021, at 02:19, Alvaro Herrera  wrote:

> Personally, I like the simplicity of the function call in those places,

+1

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] document pg_encoding_to_char() and pg_char_to_encoding()

2021-07-14 Thread Ian Lawrence Barwick
On Wed, Jul 14, 2021 at 14:43 Ian Lawrence Barwick :
>
> Hi
>
> The description for "pg_database" [1] mentions the function
> "pg_encoding_to_char()", but this is not described anywhere in the docs. Given
> that it (and the corresponding "pg_char_to_encoding()") have been around
> since 7.0 [2], it's probably not a burning issue, but it seems not entirely
> unreasonable to add short descriptions for both (and link from "pg_conversion"
> while we're at it); see attached patch. "System Catalog Information Functions"
> seems the most logical place to put these.
>
> [1] https://www.postgresql.org/docs/current/catalog-pg-database.html
> [2] 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5eb1d0deb15f2b7cd0051bef12f3e091516c723b
>
> Will add to the next commitfest.

Added; apologies, the subject line of the original mail was
unintentionally truncated.

Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com




Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-14 Thread Masahiko Sawada
On Tue, Jul 13, 2021 at 1:14 PM r.takahash...@fujitsu.com
 wrote:
>
> Hi Sawada-san,
>
>
> Thank you for your reply.
>
> > Not sure but it might be possible to keep holding an xlogreader for
> > reading PREPARE WAL records even after the transaction commit. But I
> > wonder how much open() for wal segment file accounts for the total
> > execution time of 2PC. 2PC requires 2 network round trips for each
> > participant. For example, if it took 500ms in total, we would not get
> > much benefit from the point of view of 2PC performance even if we
> > improved it from 14ms to 1ms.
>
> I made the patch based on your advice and re-run the test on the new machine.
> (The attached patch is just for test purpose.)

Thank you for testing!

>
>
> * foreign_twophase_commit = disabled
> 2686tps
>
> * foreign_twophase_commit = required (It is necessary to set -R ${RATE} as 
> Ikeda-san said)
> 311tps
>
> * foreign_twophase_commit = required with attached patch (It is not necessary 
> to set -R ${RATE})
> 2057tps

Nice improvement!

BTW, did you test locally? That is, are the foreign servers
located on the same machine?

>
>
> This indicates that if we can reduce the number of open() calls on the WAL
> segment file during "COMMIT PREPARED", the performance can be improved.
>
> This patch can skip closing the WAL segment file, but I don't know when we
> should close it.
> One idea is to close it when the WAL segment file is recycled, but it seems
> difficult for the backend process to do so.

I guess it would be better to start a new thread for this improvement.
This idea helps not only the 2PC case but also improves COMMIT/ROLLBACK
PREPARED performance in general. Rather than tying it to this patch, I
think it's better to discuss it separately so that it can be committed
on its own.

> BTW, in a previous discussion, "Send COMMIT PREPARED to remote servers in bulk"
> was proposed.
> I imagined a new SQL interface like "COMMIT PREPARED 'prep_1', 'prep_2',
> ... 'prep_n'".
> If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we can
> not only reduce the number of round trips, but also the number of open() calls
> on the WAL segment file.

What if we successfully committed 'prep_1' but an error happened
while committing another one for some reason (e.g., corrupted 2PC
state file, OOM, etc.)? We might return an error to the client but have
already committed 'prep_1'.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: psql - factor out echo code

2021-07-14 Thread Fabien COELHO



Attached v4 simplifies the format and fixes this one.


I think this goes way way overboard in terms of invasiveness. There's no 
need to identify individual call sites of PSQLexec. [...]


ISTM that having the information was useful for the user who actually 
asked for psql to show hidden queries, and pretty simple to get, although 
somewhat invasive.



It also looks like a mess from the translatability standpoint.
You can't expect "%s QUERY" to be a useful thing for translators.


Sure. Maybe I should have used an enum and an explicit switch in 
echoQuery, but I do not like writing this kind of code.
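
For reference, the enum-plus-switch variant being referred to would look 
roughly like this (just a sketch with hypothetical names, not what the 
attached v5 does):

typedef enum EchoQueryKind
{
	ECHO_QUERY_INTERNAL,		/* query generated by psql itself */
	ECHO_QUERY_USER				/* query coming from the user */
} EchoQueryKind;

static void
echoQuery(FILE *out, EchoQueryKind kind, const char *query)
{
	switch (kind)
	{
		case ECHO_QUERY_INTERNAL:
			fprintf(out, _("-- INTERNAL QUERY:\n%s\n\n"), query);
			break;
		case ECHO_QUERY_USER:
			fprintf(out, _("-- QUERY:\n%s\n\n"), query);
			break;
	}
	fflush(out);
}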


Attached is a v5 that does not hint at the origin of the query beyond 
whether it is internal or not.


--
Fabien.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index d704c4220c..7e91e588cc 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5112,18 +5112,9 @@ echo_hidden_command(const char *query)
 {
 	if (pset.echo_hidden != PSQL_ECHO_HIDDEN_OFF)
 	{
-		printf(_("* QUERY **\n"
- "%s\n"
- "**\n\n"), query);
-		fflush(stdout);
+		echoQuery(stdout, true, query);
 		if (pset.logfile)
-		{
-			fprintf(pset.logfile,
-	_("* QUERY **\n"
-	  "%s\n"
-	  "**\n\n"), query);
-			fflush(pset.logfile);
-		}
+			echoQuery(pset.logfile, true, query);
 
 		if (pset.echo_hidden == PSQL_ECHO_HIDDEN_NOEXEC)
 			return false;
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 5640786678..15dab3c543 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -523,6 +523,18 @@ PrintTiming(double elapsed_msec)
 		   elapsed_msec, days, (int) hours, (int) minutes, seconds);
 }
 
+/*
+ * Echo user query
+ */
+void
+echoQuery(FILE *out, bool is_internal, const char *query)
+{
+	if (is_internal)
+		fprintf(out, _("-- INTERNAL QUERY:\n%s\n\n"), query);
+	else
+		fprintf(out, _("-- QUERY:\n%s\n\n"), query);
+	fflush(out);
+}
 
 /*
  * PSQLexec
@@ -549,18 +561,9 @@ PSQLexec(const char *query)
 
 	if (pset.echo_hidden != PSQL_ECHO_HIDDEN_OFF)
 	{
-		printf(_("* QUERY **\n"
- "%s\n"
- "**\n\n"), query);
-		fflush(stdout);
+		echoQuery(stdout, true, query);
 		if (pset.logfile)
-		{
-			fprintf(pset.logfile,
-	_("* QUERY **\n"
-	  "%s\n"
-	  "**\n\n"), query);
-			fflush(pset.logfile);
-		}
+			echoQuery(pset.logfile, true, query);
 
 		if (pset.echo_hidden == PSQL_ECHO_HIDDEN_NOEXEC)
 			return NULL;
@@ -859,10 +862,7 @@ ExecQueryTuples(const PGresult *result)
  * assumes that MainLoop did that, so we have to do it here.
  */
 if (pset.echo == PSQL_ECHO_ALL && !pset.singlestep)
-{
-	puts(query);
-	fflush(stdout);
-}
+	echoQuery(stdout, false, query);
 
 if (!SendQuery(query))
 {
@@ -1229,13 +1229,7 @@ SendQuery(const char *query)
 	}
 
 	if (pset.logfile)
-	{
-		fprintf(pset.logfile,
-_("* QUERY **\n"
-  "%s\n"
-  "**\n\n"), query);
-		fflush(pset.logfile);
-	}
+		echoQuery(pset.logfile, false, query);
 
 	SetCancelConn(pset.db);
 
diff --git a/src/bin/psql/common.h b/src/bin/psql/common.h
index d8538a4e06..04ece4e9b1 100644
--- a/src/bin/psql/common.h
+++ b/src/bin/psql/common.h
@@ -31,6 +31,7 @@ extern void psql_setup_cancel_handler(void);
 extern PGresult *PSQLexec(const char *query);
 extern int	PSQLexecWatch(const char *query, const printQueryOpt *opt, FILE *printQueryFout);
 
+extern void echoQuery(FILE *out, bool is_internal, const char *query);
 extern bool SendQuery(const char *query);
 
 extern bool is_superuser(void);
diff --git a/src/test/modules/test_extensions/expected/test_extdepend.out b/src/test/modules/test_extensions/expected/test_extdepend.out
index 0b62015d18..5fd826ac56 100644
--- a/src/test/modules/test_extensions/expected/test_extdepend.out
+++ b/src/test/modules/test_extensions/expected/test_extdepend.out
@@ -29,24 +29,58 @@ SELECT * FROM test_extdep_commands;
 
 -- First, test that dependent objects go away when the extension is dropped.
 SELECT * FROM test_extdep_commands \gexec
+-- QUERY:
  CREATE SCHEMA test_ext
+
+-- QUERY:
  CREATE EXTENSION test_ext5 SCHEMA test_ext
+
+-- QUERY:
  SET search_path TO test_ext
+
+-- QUERY:
  CREATE TABLE a (a1 int)
 
+-- QUERY:
+
+
+-- QUERY:
  CREATE FUNCTION b() RETURNS TRIGGER LANGUAGE plpgsql AS
$$ BEGIN NEW.a1 := NEW.a1 + 42; RETURN NEW; END; $$
+
+-- QUERY:
  ALTER FUNCTION b() DEPENDS ON EXTENSION test_ext5
 
+-- QUERY:
+
+
+-- QUERY:
  CREATE TRIGGER c BEFORE INSERT ON a FOR EACH ROW EXECUTE PROCEDURE b()
+
+-- QUERY:
  ALTER TRIGGER c ON a DEPENDS ON EXTENSION test_ext5
 
+-- QUERY:
+
+
+-- QUERY:
  CREATE MATERIALIZED VIEW d AS SELECT * FROM a
+
+-- QUERY:
  ALTER MATERIALIZED VIEW d DEPENDS ON EXTENSION test_ext5
 
+-- QUERY:
+
+
+-- QUERY:
  CREATE INDEX e ON a (a1)
+
+-- QUERY:
  ALTER INDEX e DEPENDS ON EXTENSION test_ext5
+
+-- QUERY:
 

Re: Skipping logical replication transactions on subscriber side

2021-07-14 Thread Masahiko Sawada
On Mon, Jul 12, 2021 at 8:52 PM Amit Kapila  wrote:
>
> On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada  
> wrote:
> >
> > On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila  wrote:
> > >
> > > On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky  
> > > wrote:
> > > >
> > > > On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila  
> > > > wrote:
> > > >>
> > > >> >
> > > >> > Ok, looks nice. But I am curious how this will work in the case when 
> > > >> > there are two (or more) errors in the same subscription, but 
> > > >> > different relations?
> > > >> >
> > > >>
> > > >> We can't proceed unless the first error is resolved, so there
> > > >> shouldn't be multiple unresolved errors.
> > > >
> > > >
> > > > Ok. I thought multiple errors are possible when many tables are 
> > > > initialized using parallel workers (with 
> > > > max_sync_workers_per_subscription > 1).
> > > >
> > >
> > > Yeah, that is possible, but that is covered under the second condition
> > > I mentioned, and in such cases I think we should have separate rows
> > > for each tablesync. Is that right, Sawada-san, or do you have something
> > > else in mind?
> >
> > Yeah, I agree with having separate rows for each table sync. The table
> > should not be processed by both the table sync worker and the apply
> > worker at a time, so the pair of subscription OID and relation OID will
> > be unique. I think that we can have a boolean column in the view,
> > indicating whether the error entry is reported by the table sync
> > worker or the apply worker, or maybe we also can have the action
> > column show "TABLE SYNC" if the error is reported by the table sync
> > worker.
> >
>
> Or, similar to backend_type (text) in pg_stat_activity, we can have
> something like error_source (text) which will display apply worker or
> tablesync worker? I think if we have this column then, even if there is
> a chance that both the apply and sync workers operate on the same
> relation, we can identify it via this column.

Sounds good. I'll incorporate this in the next version of the patch that I'm
planning to submit this week.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [HACKERS] logical decoding of two-phase transactions

2021-07-14 Thread Peter Smith
On Wed, Jul 14, 2021 at 4:23 PM Amit Kapila  wrote:
>
> On Mon, Jul 12, 2021 at 9:14 AM Peter Smith  wrote:
> >
> > On Sun, Jul 11, 2021 at 8:20 PM Amit Kapila  wrote:
> > >
> > > On Fri, Jul 9, 2021 at 4:43 AM Peter Smith  wrote:
> > > >
> > > > > The patch looks good to me, I don't have any comments.
> > > >
> > > > I tried the v95-0001 patch.
> > > >
> > > > - The patch applied cleanly and all build / testing was OK.
> > > > - The documentation also builds OK.
> > > > - I checked all v95-0001 / v93-0001 differences and found no problems.
> > > > - Furthermore, I noted that v95-0001 patch is passing the cfbot [1].
> > > >
> > > > So this patch LGTM.
> > > >
> > >
> > > Thanks, I took another pass over it and made a few changes in docs and
> > > comments. I am planning to push this next week sometime (by 14th July)
> > > unless there are more comments from you or someone else. Just to
> > > summarize, this patch will add support for prepared transactions to
> > > built-in logical replication. To add support for streaming
> > > transactions at prepare time into the
> > > built-in logical replication, we need to do the following things: (a)
> > > Modify the output plugin (pgoutput) to implement the new two-phase API
> > > callbacks, by leveraging the extended replication protocol. (b) Modify
> > > the replication apply worker, to properly handle two-phase
> > > transactions by replaying them on prepare. (c) Add a new SUBSCRIPTION
> > > option "two_phase" to allow users to enable
> > > two-phase transactions. We enable the two_phase once the initial data
> > > sync is over. Refer to comments atop worker.c in the patch and commit
> > > message to see further details about this patch. After this patch,
> > > there is a follow-up patch to allow streaming and two-phase options
> > > together which I feel needs some more review and can be committed
> > > separately.
> > >
> >
> > FYI - I repeated the same verification of the v96-0001 patch as I did
> > previously for v95-0001
> >
> > - The v96 patch applied cleanly and all build / testing was OK.
> > - The documentation also builds OK.
> > - I checked the v95-0001 / v96-0001 differences and found no problems.
> > - Furthermore, I noted that v96-0001 patch is passing the cfbot.
> >
> > LGTM.
> >
>
> Pushed.
>
> Feel free to submit the remaining patches after rebase. Is it possible
> to post patches related to skipping empty transactions in the other
> thread [1] where that topic is being discussed?
>
> [1] - 
> https://www.postgresql.org/message-id/CAMkU%3D1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q%40mail.gmail.com
>


Please find attached the latest patch set v97*

* Rebased v94* to HEAD @ today.

This rebase was made necessary by the recent push of the first patch
from this set.

v94-0001 ==> already pushed [1]
v94-0002 ==> v97-0001
v94-0003 ==> will be relocated to other thread [2]
v94-0004 ==> this is omitted for now


[1] 
https://github.com/postgres/postgres/commit/a8fd13cab0ba815e9925dc9676e6309f699b5f72
[2] 
https://www.postgresql.org/message-id/CAMkU%3D1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v97-0001-Add-prepare-API-support-for-streaming-transactio.patch
Description: Binary data


Avoid repeated PQfnumber() in pg_dump

2021-07-14 Thread houzj.f...@fujitsu.com
Hi,

When reviewing some pg_dump related code, I found some existing code
(getTableAttrs() and dumpEnumType()) that invokes PQfnumber() repeatedly,
which seems unnecessary.

Example
-
for (int j = 0; j < ntups; j++)
{
    if (j + 1 != atoi(PQgetvalue(res, j, PQfnumber(res, "attnum"))))
-

Since PQfnumber() is not a cheap function, I think we'd better move the
PQfnumber() call out of the loop, as in the attached patch.
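
For illustration, a minimal sketch of the hoisted form (not the exact attached
patch; "res" and "ntups" are assumed to come from the surrounding pg_dump code,
and the loop body here is just a placeholder):

	int		i_attnum = PQfnumber(res, "attnum");	/* look up the column index once */

	for (int j = 0; j < ntups; j++)
	{
		/* reuse the cached index instead of calling PQfnumber() for every row */
		if (j + 1 != atoi(PQgetvalue(res, j, i_attnum)))
			break;		/* placeholder; the real code reacts to the mismatch here */
	}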

After applying this change, I can see about an 8% performance gain in my test
environment when dumping table definitions that have many columns.

Best regards,
Hou zhijie


0001-Avoid-repeated-PQfnumber.patch
Description: 0001-Avoid-repeated-PQfnumber.patch


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-14 Thread Magnus Hagander
On Wed, Jul 14, 2021 at 6:36 AM Peter Eisentraut
 wrote:
>
> On 13.07.21 10:58, Magnus Hagander wrote:
> > Maybe "each distinct database id, each distinct user id, each distinct
> > query id, and whether it is a top level statement or not"?
> >
> > Or maybe "each distinct combination of database id, user id, query id
> > and whether it's a top level statement or not"?
>
> Okay, now I understand what is meant here.  The second one sounds good
> to me.

Thanks, will push a fix like that.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: row filtering for logical replication

2021-07-14 Thread Amit Kapila
On Wed, Jul 14, 2021 at 12:37 AM Tomas Vondra
 wrote:
>
> On 7/13/21 5:44 PM, Jeff Davis wrote:
> > On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:
> > Also:
> >
> > * Andres also mentioned that the function should not leak memory.
> > * One use case for this feature is when sharding a table, so the
> > expression should allow things like "hashint8(x) between ...". I'd
> > really like to see this problem solved, as well.
> >
>
I think built-in functions should be fine, because they generally don't get
dropped etc. (And if you drop a built-in function, well - sorry.)
>

I am not sure if all built-in functions are safe either. I think we
can't allow volatile functions (e.g., setval) that can update the
database, which doesn't seem to be allowed under a historic snapshot.
Similarly, it might not be okay to invoke stable functions that access
the database, as those might expect the current snapshot. I think immutable
functions should be okay, but that brings us to Jeff's question of whether
we can tie the marking of functions that can be used here to the
IMMUTABLE/STABLE/VOLATILE designation. The UDFs might have a higher
risk that something used in those functions can be dropped, but I guess
we can address that by using the current snapshot to access the
publication catalog.


> Not sure about the memory leaks - I suppose we'd free memory for each
> row, so this shouldn't be an issue I guess ...
>
> >> I think in the long run one idea to allow UDFs is probably by
> >> explicitly allowing users to specify whether the function is
> >> publication predicate safe and if so, then we can allow such
> >> functions
> >> in the filter clause.
> >
> > This sounds like a better direction. We probably need some kind of
> > catalog information here to say what functions/operators are "safe" for
> > this kind of purpose. There are a couple questions:
> >
>
> Not sure. It's true it's a bit like volatile/stable/immutable categories
> where we can't guarantee those labels are correct, and it's up to the
> user to keep the pieces if they pick the wrong category.
>
> But we can achieve the same goal by introducing a simple GUC called
> dangerous_allow_udf_in_decoding, I think.
>

One GUC for all UDFs sounds dangerous.

-- 
With Regards,
Amit Kapila.




Re: Support kerberos authentication for postgres_fdw

2021-07-14 Thread Peifeng Qiu
Hi all.

I've come up with a proof-of-concept patch using the delegation/proxy approach.

Let's say we have two DBs, one for the FDW and one for the real server. When a
client connects to the FDW server using kerberos authentication, we can obtain
a "proxy" credential and store it in the global variable
"MyProcPort->gss->proxy". This can then be passed to gssapi calls during the
libpq kerberos setup when the foreign table is queried.

This will mitigate the need for a keytab file on the FDW server. We will also
have to relax the password requirement for the user mapping.

The big problem here is how to pass the proxy credential from the backend to
libpq-fe safely. Because the libpq called in postgres_fdw is compiled as a
frontend binary, we'd better not include any backend-related stuff in libpq-fe.
In this patch I use a very ugly hack to work around this: first take the
pointer address of the variable MyProcPort->gss->proxy, convert it to a hex
string, and then pass it as the libpq option "gss_proxy_cred". Any ideas on how
to do this in a more elegant way?

Best regards,
Peifeng

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index c1b0cad453..2113fa8028 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -7,7 +7,8 @@ OBJS = \
 	deparse.o \
 	option.o \
 	postgres_fdw.o \
-	shippable.o
+	shippable.o \
+	gss_proxy.o
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 82aa14a65d..e5e5f534ae 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -333,6 +333,9 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+extern char* get_gss_proxy_cred();
+extern bool has_gss_proxy_cred();
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -349,14 +352,16 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 		const char **keywords;
 		const char **values;
 		int			n;
+		char *gss_proxy_cred_addr;
 
 		/*
 		 * Construct connection params from generic options of ForeignServer
 		 * and UserMapping.  (Some of them might not be libpq options, in
-		 * which case we'll just waste a few array slots.)  Add 3 extra slots
-		 * for fallback_application_name, client_encoding, end marker.
+		 * which case we'll just waste a few array slots.)  Add 4 extra slots
+		 * for fallback_application_name, client_encoding, end marker,
+		 * gss_proxy_cred.
 		 */
-		n = list_length(server->options) + list_length(user->options) + 3;
+		n = list_length(server->options) + list_length(user->options) + 4;
 		keywords = (const char **) palloc(n * sizeof(char *));
 		values = (const char **) palloc(n * sizeof(char *));
 
@@ -376,6 +381,14 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 		values[n] = GetDatabaseEncodingName();
 		n++;
 
+		gss_proxy_cred_addr = get_gss_proxy_cred();
+		if (gss_proxy_cred_addr != NULL)
+		{
+			keywords[n] = "gss_proxy_cred";
+			values[n] = gss_proxy_cred_addr;
+			n++;
+		}
+
 		keywords[n] = values[n] = NULL;
 
 		/* verify the set of connection parameters */
@@ -476,6 +489,9 @@ UserMappingPasswordRequired(UserMapping *user)
 {
 	ListCell   *cell;
 
+	if (has_gss_proxy_cred())
+		return false;
+
 	foreach(cell, user->options)
 	{
 		DefElem*def = (DefElem *) lfirst(cell);
diff --git a/contrib/postgres_fdw/gss_proxy.c b/contrib/postgres_fdw/gss_proxy.c
new file mode 100644
index 00..1f016c54fc
--- /dev/null
+++ b/contrib/postgres_fdw/gss_proxy.c
@@ -0,0 +1,25 @@
+#include 
+
+#include "postgres.h"
+#include "miscadmin.h"
+#include "libpq/libpq-be.h"
+
+bool has_gss_proxy_cred()
+{
+	return MyProcPort->gss != NULL && MyProcPort->gss->proxy != NULL;
+}
+
+/* ugly hack to pass the gss proxy credential from backend Port to frontend libpq. */
+char* get_gss_proxy_cred()
+{
+	char* addr = NULL;
+	if (MyProcPort->gss)
+	{
+		if (MyProcPort->gss->proxy)
+		{
+			addr = palloc(sizeof(void*) + 3);
+			sprintf(addr, "%p", MyProcPort->gss->proxy);
+		}
+	}
+	return addr;
+}
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 8cc23ef7fb..2cb3e3b435 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -935,12 +935,17 @@ pg_GSS_recvauth(Port *port)
 	}
 
 	/*
-	 * We accept any service principal that's present in our keytab. This
-	 * increases interoperability between kerberos implementations that see
-	 * for example case sensitivity differently, while not really opening up
-	 * any vector of attack.
+	 * Acquire a default credential with GSS_C_BOTH. Compared to using
+	 * GSS_C_NO_CREDENTIAL, this allows us to also acquire a proxy
+	 * credential, which can be used in postgres_fdw for delegation.
 	 */
-	port->gss->cred = GSS_C_NO_CREDENTIAL;
+	maj_stat = gss_acquire_cred(&min_stat, GSS_C_NO_NAME, 0,
+			NULL, GSS_C_BOTH, &port->gss->cre

Re: row filtering for logical replication

2021-07-14 Thread Amit Kapila
On Wed, Jul 14, 2021 at 12:45 AM Tomas Vondra
 wrote:
>
> On 7/13/21 12:57 PM, Amit Kapila wrote:
> > On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila  
> > wrote:
>
> > I think the problem described by Petr [1] is also possible today if the
> > user drops the publication and there is a corresponding subscription;
> > basically, the system will be stuck with the error: ERROR:  publication
> > "mypub" does not exist. I think allowing the use of non-historic snapshots
> > just for publications will resolve that problem as well.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com
> >
>
> That seems like a completely different problem, TBH. For example the
> slot is dropped too, which means the WAL is likely gone etc.
>

I think if we can use the WAL archive (if available) and re-create the
slot, the system should be able to move forward, but recreating the
publication won't allow the system to do so.

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] Preserving param location

2021-07-14 Thread Julien Rouhaud
On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote:
>
> I thought for a bit about also having evaluate_expr() inject the
> exprLocation(expr) into its result Const, but in this example
> that would likely result in having the 43 labeled with the offset
> of the $1 (given exprLocation's penchant for preferring the leftmost
> token location).  That could be more confusing not less, especially
> when you consider that the variant case "1 + $1" would not be
> traceable to the Param at all.

Thanks for looking at this patch.

I agree that it probably doesn't solve all problems related to token location,
but arguably there can't be a perfect solution for all cases with our
infrastructure anyway.  Going a bit further down the road:

=# PREPARE foo2(int, int) AS select $1 + $2;
=# EXPLAIN (VERBOSE) EXECUTE foo2(42, 42);
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 84
(2 rows)

As long as we store a single location for each token, there's no way we can
give a correct answer there, and I'm definitely not pushing for having an
array of locations.

> I also fear that this is still just the tip of the iceberg, in terms
> of when rewriting and planning preserve useful info about the source
> location of modified Nodes.  There's probably other places to touch
> if we want to have consistent results.
> 
> So I'm not really convinced that there's a fully-baked use case
> here, and would like more detail about how you hope to use the
> location value.

As I said I have no doubt that there are other cases which are currently not
correctly handled, but that's not what I'm trying to fix.  I just want to treat
parameterized queries the same way as regular queries, thus fixing one obvious
and frequent case (for which some user did complain), hoping that it will on
its own already help many users.

As mentioned in my first email, the use case is trying to deparse a
normalized query.  I'm working on pg_qualstats extension, which tracks quals
and a lot of associated information.  For example, the constant value,
the constant location but also whether the qual was part of an index scan or
not, the effective selectivity, the selectivity estimation error and so on.

So here is one over-simplified example on how to use that, relying on the
discussed patch:

=# CREATE TABLE IF NOT EXISTS pgqs(id integer, val text);
CREATE TABLE

=# PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2;
PREPARE

=# EXECUTE foo1(42, 'test');
 id | val 
----+-----
(0 rows)

=# SELECT query,
CASE WHEN constant_position IS NOT NULL THEN
substr(query, constant_position, 3)
ELSE NULL
END AS param,
constvalue, constant_position FROM pg_qualstats() pgqs
LEFT JOIN pg_stat_statements pgss USING (userid, dbid, queryid)
ORDER BY 4;
-[ RECORD 1 ]-----+---------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             |  $1
constvalue        | 42::integer
constant_position | 58
-[ RECORD 2 ]-----+---------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             |  $2
constvalue        | 'test'::text
constant_position | 71

which is enough to reliably regenerate the original statement.  With the
current code this is simply impossible.

The final goal is to try to help DBAs find problematic queries, and to give
as much information as possible automatically.

One naive usage is to suggest indexes using the quals that are not part of an
index scan, if their selectivity is high enough.  In order to help validate
the resulting index suggestion, I can try to create hypothetical indexes and
test them using the various sets of quals that are stored (the most filtering,
the least filtering, the most often used and such) to see if the indexes would
be used, and in which cases.

For other use cases, just having the execution plans for the various sets of
quals can help to track down other problems.  For instance, detecting whether
some bad selectivity estimation for a qual leads to an inferior plan compared
to the same qual when it has a better selectivity estimate.

There are of course a lot of gotchas, but if it can automatically provide
helpful information for 50% of the queries, that's as much time saved that can
instead be spent actually investigating the performance issues rather than
manually copy/pasting queries and qual constants around.

In practice it doesn't work so badly, except when you're using the extended
protocol or prepared statements.  In such cases, you drop from let's say 50%
(wild pessimistic guess) of the queries that can be preprocessed to a straight 0%.




Re: proposal: possibility to read dumped table's name from file

2021-07-14 Thread Tomas Vondra

On 7/14/21 2:18 AM, Stephen Frost wrote:

Greetings,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:

On 2021-Jul-13, Stephen Frost wrote:

The simplest possible format isn't going to work with all the different
pg_dump options and it still isn't going to be 'simple' since it needs
to work with the flexibility that we have in what we support for object
names,


That's fine.  If people want a mechanism that allows changing the other
pg_dump options that are not related to object filtering, they can
implement a configuration file for that.


It's been said multiple times that people *do* want that and that they
want it to all be part of this one file, and specifically that they
don't want to end up with a file structure that actively works against
allowing other options to be added to it.



I have no problem believing some people want to be able to specify 
pg_dump parameters in a file, similarly to IMPDP/EXPDP parameter files 
etc. That seems useful, but I doubt they considered the case with many 
filter rules ... which is what "my people" want.


Not sure how keeping the filter rules in a separate file (which I assume 
is what you mean by "file structure"), with a format tailored for filter 
rules, works *actively* against adding options to the "main" config.


I'm not buying the argument that keeping some of the stuff in a separate 
file is an issue - plenty of established tools do that, the concept of 
"including" a config is not a radical new thing, and I don't expect we'd 
have many options supported by a file.


In any case, I think user input is important, but ultimately it's up to 
us to reconcile the conflicting requirements coming from various users 
and come up with a reasonable compromise design.



I don't know whether the options that I suggested previously would
definitely work, but they at least would allow other projects like
pgAdmin to leverage existing code for parsing and generating these
config files.


Keep in mind that this patch is not intended to help pgAdmin
specifically.  It would be great if pgAdmin uses the functionality
implemented here, but if they decide not to, that's not terrible.  They
have survived decades without a pg_dump configuration file; they still
can.


The adding of a config file for pg_dump should specifically be looking
at pgAdmin as the exact use-case for having such a capability.


There are several votes in this thread for pg_dump to gain functionality
to filter objects based on a simple specification -- particularly one
that can be written using shell pipelines.  This patch gives it.


And several votes for having a config file that supports, or at least
can support in the future, the various options which pg_dump supports-
and active voices against having a new file format that doesn't allow
for that.



IMHO the whole "problem" here stems from the question whether there 
should be a single universal pg_dump config file, containing everything 
including the filter rules. I'm of the opinion it's better to keep the 
filter rules separate, mainly because:


1) simplicity - Options (key/value) and filter rules (with more internal 
structure) seem quite different, and mixing them in the same file will 
just make the format more complex.


2) flexibility - Keeping the filter rules in a separate file makes it 
easier to reuse the same set of rules with different pg_dump configs, 
specified in (much smaller) config files.


So in principle, the "main" config could use e.g. TOML or whatever we 
find most suitable for this type of key/value config file (or we could 
just use the same format as for postgresql.conf et al). And the filter 
rules could use something as simple as CSV (yes, I know it's not great, 
but there's plenty of parsers, it handles multi-line strings etc.).




I'm not completely against inventing something new, but I'd really
prefer that we at least try to make something existing work first
before inventing something new that everyone is going to have to deal
with.


That was discussed upthread and led nowhere.


You're right- no one followed up on that.  Instead, one group continues
to push for 'simple' and to just accept what's been proposed, while
another group counters that we should be looking at the broader design
question and work towards a solution which will work for us down the
road, and not just right now.



I have quite thick skin, but I have to admit I rather dislike how this 
paints the people arguing for simplicity.


IMO simplicity is a perfectly legitimate (and desirable) design feature, 
and simpler solutions often fare better in the long run. Yes, we need to 
look at the broader design, no doubt about that.



One thing remains clear- there's no consensus here.



True.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: psql \copy from sends a lot of packets

2021-07-14 Thread Heikki Linnakangas

On 13/07/2021 14:52, Aleksander Alekseev wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

The patch was marked as the one that needs review and doesn't currently have
a reviewer, so I decided to take a look. The patch was tested on MacOS against
master `e0271d5f`. It works fine and doesn't seem to contradict the current
documentation.


Thanks for the review! I read through it myself one more time and 
spotted one bug: in interactive mode, the prompt was printed twice at 
the beginning of the operation. Fixed that, and pushed.


- Heikki




Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra




On 7/14/21 7:39 AM, Amit Kapila wrote:

On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira  wrote:


On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:

1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)



+1.


I thought about it but came to the conclusion that it isn't worth it.  Even
with REPLICA IDENTITY FULL, the expression evaluates to false if the column allows
NULL values. Besides that, REPLICA IDENTITY is changed via another DDL (ALTER
TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
because some row filter uses the column you want to remove from it.



Yeah, that is required but is it not feasible to do so?


2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)

New tuple. The main reason is that new tuple is always there for UPDATEs.



I am not sure if that is a very good reason to use a new tuple.



True. Perhaps we should look at other places with a similar concept of 
WHERE conditions and old/new rows, and try to be consistent with those?


I can think of:

1) updatable views with CHECK option

2) row-level security

3) triggers

Is there some reasonable rule for which of the old/new tuples (or both) to 
use for the WHERE condition? Or maybe it'd be handy to allow referencing 
OLD/NEW as in triggers?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Extending amcheck to check toast size and compression

2021-07-14 Thread Heikki Linnakangas

@@ -30,6 +30,9 @@ PG_FUNCTION_INFO_V1(verify_heapam);
 /* The number of columns in tuples returned by verify_heapam */
 #define HEAPCHECK_RELATION_COLS 4
 
+/* The largest valid toast va_rawsize */

+#define VARLENA_SIZE_LIMIT 0x3FFF
+


Hmm, a toasted datum cannot be larger than MaxAllocSize, because it's 
reconstituted in a palloc'd datum, right?


- Heikki




Re: Added schema level support for publication.

2021-07-14 Thread vignesh C
On Tue, Jul 13, 2021 at 2:22 PM Greg Nancarrow  wrote:
>
> On Mon, Jul 12, 2021 at 7:24 PM vignesh C  wrote:
> >
> >
> > Thanks for reporting this issue. I felt this issue is the same as the
issue which Hou San had reported. This issue is fixed in the v10 patch
attached at [1].
> > [1] -
https://www.postgresql.org/message-id/CALDaNm2%2BtR%2B8R-sD1CSyMbZcZbkintZE-avefjsp7LCkm6HMmw%40mail.gmail.com
> >
>
> I did some testing and the issue that I reported does seem to be fixed
> by the v10 patch.
>
> I have some patch review comments for the v10 patch:
>
> (1)
>
> The following:
>
> + if (!OidIsValid(address.objectId))
> + {
> +if (!missing_ok)
> +   ereport(ERROR,
> +  (errcode(ERRCODE_UNDEFINED_OBJECT),
> +  errmsg("publication schema \"%s\" in publication \"%s\"
> does not exist",
> + schemaname, pubname)));
> +return address;
> + }
> +
> + return address;
>
> could just be simplified to:
>
> + if (!OidIsValid(address.objectId) && !missing_ok)
> + {
> +ereport(ERROR,
> +   (errcode(ERRCODE_UNDEFINED_OBJECT),
> +   errmsg("publication schema \"%s\" in publication \"%s\" does not
exist",
> +  schemaname, pubname)));
> + }
> +
> + return address;

Modified

> (2) src/backend/catalog/objectaddress.c
>
> I think there is a potential illegal memory access (psform->psnspcid)
> in the case of "!missing_ok", as the tuple is released from the cache
> on the previous line.
>
> + psform = (Form_pg_publication_schema) GETSTRUCT(tup);
> + pubname = get_publication_name(psform->pspubid, false);
> + nspname = get_namespace_name(psform->psnspcid);
> + if (!nspname)
> + {
> +pfree(pubname);
> +ReleaseSysCache(tup);
> +if (!missing_ok)
> +   elog(ERROR, "cache lookup failed for schema %u",
> +  psform->psnspcid);
> +break;
> + }
>
>
> I think this should be:
>
> + psform = (Form_pg_publication_schema) GETSTRUCT(tup);
> + pubname = get_publication_name(psform->pspubid, false);
> + nspname = get_namespace_name(psform->psnspcid);
> + if (!nspname)
> + {
> +Oid psnspcid = psform->psnspcid;
> +
> +pfree(pubname);
> +ReleaseSysCache(tup);
> +if (!missing_ok)
> +   elog(ERROR, "cache lookup failed for schema %u",
> +  psnspcid);
> +break;
> + }
>
> There are two cases of this that need correction (see: case
> OCLASS_PUBLICATION_SCHEMA).

Modified

> (3) Incomplete function header comment
>
> + * makeSchemaSpec - Create a SchemaSpec with the given type
>
> Should be:
>
> + * makeSchemaSpec - Create a SchemaSpec with the given type and location

Modified

> (4) src/bin/psql/describe.c
>
> Shouldn't the following comment say "version 15"?
>
> + /* Prior to version 14 check was based on all tables */
> + if ((has_pubtype && pubtype == PUBTYPE_TABLE) ||
> + (!has_pubtype && !puballtables))

Modified

> (5) typedefs.list
>
> I think you also need to add "Form_pg_publication_schema" to
typedefs.list.

Modified.

Thanks for the comments; they are fixed in the v11 patch posted
at [1].
[1] -
https://www.postgresql.org/message-id/CALDaNm1oZzaEsZC1W8MRNGZ6LWOayC54_UzyRV%2BnCh8w0yW74g%40mail.gmail.com

Regards,
Vignesh


Re: ERROR: "ft1" is of the wrong type.

2021-07-14 Thread Michael Paquier
On Fri, Jul 09, 2021 at 09:00:31PM +0900, Kyotaro Horiguchi wrote:
> Mmm. Ok, I distributed the mother regression test into each version.

Thanks, and my apologies for the late reply.  It took me some time to
analyze the whole thing.

> PG11, 12:
>
>  - ATT_TABLE | ATT_PARTITIONED_INDEX
> 
>This test doesn't detect the "is of the wrong type" issue.
> 
>The item is practically a dead one since the combination is caught
>by transformPartitionCmd before visiting ATPrepCmd, which emits a
>bit different error message for the test.

Yes, I was surprised to see this test choke in the utility parsing.
There is a good argument for keeping (ATT_TABLE |
ATT_PARTITIONED_INDEX) though.  I analyzed the code and I agree that
it cannot be directly reached, but a future code change on those
branches may expose that.  And it does not really cost anything to keep
it either.

> PG13:
>Of course this works fine but doesn't seem clean, but it is
>apparently a matter of the master branch.
> 
>  - ATT_TABLE | ATT_MATVIEW | ATT_INDEX | ATT_PARTITIONED_INDEX | 
> ATT_FOREIGN_TABLE
>Added and works as expected.

HEAD had its own improvements, and what you have here closes some
holes of their own, so applied.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: row filtering for logical replication

2021-07-14 Thread Dilip Kumar
On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
 wrote:
>
>
>
> On 7/14/21 7:39 AM, Amit Kapila wrote:
> > On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira  wrote:
> >>
> >> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
> >>
> >> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> >> even if they use any other column with DELETE.  Maybe it would be
> >> reasonable to test for this in the code and raise an error if the
> >> expression requires a column that's not part of the replica identity.
> >> (But that could be relaxed if the publication does not publish
> >> updates/deletes.)
> >>
> >
> > +1.
> >
> >> I thought about it but came to the conclusion that it doesn't worth it.  
> >> Even
> >> with REPLICA IDENTITY FULL expression evaluates to false if the column 
> >> allows
> >> NULL values. Besides that REPLICA IDENTITY is changed via another DDL 
> >> (ALTER
> >> TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
> >> because some row filter uses the column you want to remove from it.
> >>
> >
> > Yeah, that is required but is it not feasible to do so?
> >
> >> 2. For UPDATE, does the expression apply to the old tuple or to the new
> >> tuple?  You say it's the new tuple, but from the user point of view I
> >> think it would make more sense that it would apply to the old tuple.
> >> (Of course, if you're thinking that the R.I. is the PK and the PK is
> >> never changed, then you don't really care which one it is, but I bet
> >> that some people would not like that assumption.)
> >>
> >> New tuple. The main reason is that new tuple is always there for UPDATEs.
> >>
> >
> > I am not sure if that is a very good reason to use a new tuple.
> >
>
> True. Perhaps we should look at other places with similar concept of
> WHERE conditions and old/new rows, and try to be consistent with those?
>
> I can think of:
>
> 1) updatable views with CHECK option
>
> 2) row-level security
>
> 3) triggers
>
> Is there some reasonable rule which of the old/new tuples (or both) to
> use for the WHERE condition? Or maybe it'd be handy to allow referencing
> OLD/NEW as in triggers?

I think for INSERT we only allow rows that match the filter conditions
to be replicated, so if we update any row we should maintain that sanity
as well, right? That means we should apply the filter at least to the NEW
row, IMHO.  That said, if a row was inserted that satisfied the filter and
was replicated, and we then update it with a new value that does not
satisfy the filter, it will not be replicated.  I think that makes sense,
because if an INSERT does not send a non-matching row to the replica,
why should an UPDATE have to, right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: alter table set TABLE ACCESS METHOD

2021-07-14 Thread vignesh C
On Thu, Jun 10, 2021 at 1:01 AM Jeff Davis  wrote:
>
> On Wed, 2021-06-09 at 13:47 +0900, Michael Paquier wrote:
> > There is a mix of upper and lower-case characters here.  It could be
> > more consistent.  It seems to me that this test should actually check
> > that pg_class.relam reflects the new value.
>
> Done. I also added a (negative) test for changing the AM of a
> partitioned table, which wasn't caught before.
>
> > +errmsg("cannot have multiple SET ACCESS METHOD
> > subcommands")));
> > Worth adding a test?
>
> Done.
>
> > Nit: the name of the variable looks inconsistent with this comment.
> > The original is weird as well.
>
> Tried to improve wording.
>
> > I am wondering if it would be a good idea to set the new tablespace
> > and new access method fields to InvalidOid within
> > ATGetQueueEntry.  We
> > do that for the persistence.  Not critical at all, still..
>
> Done.
>
> > +   pass = AT_PASS_MISC;
> > Maybe add a comment here?
>
> Done. In that case, it doesn't matter because there's no work to be
> done in Phase 2.
>

There are a few compilation issues:
tablecmds.c:4629:52: error: too few arguments to function call,
expected 3, have 2
ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW);
~~~ ^
tablecmds.c:402:1: note: 'ATSimplePermissions' declared here
static void ATSimplePermissions(AlterTableType cmdtype, Relation rel,
int allowed_targets);
^
tablecmds.c:5983:10: warning: enumeration value 'AT_SetAccessMethod'
not handled in switch [-Wswitch]
switch (cmdtype)
^
1 warning and 1 error generated.

Also, a few comments need to be addressed; based on that, I'm changing the
status to "Waiting for Author".

Regards,
Vignesh




Re: Columns correlation and adaptive query optimization

2021-07-14 Thread vignesh C
On Fri, Mar 19, 2021 at 10:28 PM Konstantin Knizhnik
 wrote:
>
>
>
> On 19.03.2021 12:17, Yugo NAGATA wrote:
> > On Wed, 10 Mar 2021 03:00:25 +0100
> > Tomas Vondra  wrote:
> >
> >> What is being proposed here - an extension suggesting which statistics
> >> to create (and possibly creating them automatically) is certainly
> >> useful, but I'm not sure I'd call it "adaptive query optimization". I
> >> think "adaptive" means the extension directly modifies the estimates
> >> based on past executions. So I propose calling it maybe "statistics
> >> advisor" or something like that.
> > I also agree with the idea of implementing this feature as a new
> > "statistics advisor" extension.
> >
> >> BTW Why is "qual" in
> >>
> >>static void
> >>AddMultiColumnStatisticsForQual(void* qual, ExplainState *es)
> >>
> >> declared as "void *"? Shouldn't that be "List *"?
> > When I tested this extension using TPC-H queries, it raised a segmentation
> > fault in this function. I think the cause would be around this argument.
> >
> > Regards,
> > Yugo Nagata
> >
> Attached please find a new version of the patch with the
> AddMultiColumnStatisticsForQual parameter type fix and one more fix
> related to handling synthetic attributes.
> I cannot reproduce the crash on TPC-H queries, so if the problem
> persists, can you please send me a stack trace and maybe some other
> information to help understand the reason for the SIGSEGV?
>


"C:\projects\postgresql\pgsql.sln" (default target) (1) ->
"C:\projects\postgresql\auto_explain.vcxproj" (default target) (45) ->
(ClCompile target) ->
contrib/auto_explain/auto_explain.c(658): error C2039: 'mt_plans' : is
not a member of 'ModifyTableState'
[C:\projects\postgresql\auto_explain.vcxproj]
contrib/auto_explain/auto_explain.c(659): error C2039: 'mt_nplans' :
is not a member of 'ModifyTableState'
[C:\projects\postgresql\auto_explain.vcxproj]
contrib/auto_explain/auto_explain.c(660): error C2198:
'AddMultiColumnStatisticsForMemberNodes' : too few arguments for call
[C:\projects\postgresql\auto_explain.vcxproj]
2 Warning(s)
3 Error(s)

Also, Yugo Nagata's comments need to be addressed; I'm changing the
status to "Waiting for Author".

Regards,
Vignesh




Re: [PATCH] Add extra statistics to explain for Nested Loop

2021-07-14 Thread vignesh C
On Wed, Apr 14, 2021 at 4:57 PM  wrote:
>
> Thank you for working on this issue. Your comments helped me make this
> patch more correct.
>
> > Lines with "colon" format shouldn't use equal signs, and should use two
> > spaces
> > between fields.
> Done. Now extra line looks like "Loop min_rows: %.0f  max_rows: %.0f
> total_rows: %.0f" or "Loop min_time: %.3f  max_time: %.3f  min_rows:
> %.0f  max_rows: %.0f  total_rows: %.0f".
>
> > Since this is now on a separate line, the "if (nloops > 1 &&
> > es->verbose)"
> > can be after the existing "if (es->timing)", and shouldn't need its own
> > "if (es->timing)".  It should conditionally add a separate line, rather
> > than
> > duplicating the "(actual.*" line.
> >
> >> -if (es->timing)
> >> +if (nloops > 1 && es->verbose)
> The new version of the patch contains this correction. It helped make the
> patch shorter.
>
> > In non-text mode, I think you should not check "nloops > 1".  Rather,
> > print the
> > field as 0.
> The fields will not be zeros. The new line will almost repeat the line with
> the main statistics.
>
> > I think the labels in non-text format should say "Loop Min Time" or
> > similar.
> >
> > And these variables should have a loop_ prefix like loop_min_t ?
> Those are good ideas. I changed it.
>
> I attach a new version of this patch. I hope it got better.
> Please don't hesitate to share any thoughts on this topic.

The patch does not apply on HEAD; I'm changing the status to "Waiting
for Author":
1 out of 2 hunks FAILED -- saving rejects to file
src/test/regress/expected/partition_prune.out.rej
patching file src/test/regress/sql/partition_prune.sql
Hunk #1 FAILED at 467.
Hunk #2 succeeded at 654 (offset -3 lines).
1 out of 2 hunks FAILED -- saving rejects to file
src/test/regress/sql/partition_prune.sql.rej

Please post a new patch rebased on head.

Regards,
Vignesh




Re: row filtering for logical replication

2021-07-14 Thread Greg Nancarrow
On Wed, Jul 14, 2021 at 6:38 AM Euler Taveira  wrote:
>
> Peter, thanks for quickly checking the new patch. I'm attaching a new patch (v19)
> that addresses (a) this new review, (b) Tomas' review and (c) Greg's review. I
> also included the copy/equal node support for the new node (PublicationTable)
> mentioned by Tomas in another email.
>

Some minor v19 patch review points you might consider for your next
patch version:
(I'm still considering the other issues raised about WHERE clauses and
filtering)


(1) src/backend/commands/publicationcmds.c
OpenTableList

Some suggested abbreviations:

BEFORE:
if (IsA(lfirst(lc), PublicationTable))
   whereclause = true;
else
   whereclause = false;

AFTER:
whereclause = IsA(lfirst(lc), PublicationTable);


BEFORE:
if (whereclause)
   pri->whereClause = t->whereClause;
else
   pri->whereClause = NULL;

AFTER:
pri->whereClause = whereclause? t->whereClause : NULL;


(2) src/backend/parser/parse_expr.c

I think that the check below:

/* Functions are not allowed in publication WHERE clauses */
if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
    nodeTag(expr) == T_FuncCall)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("functions are not allowed in publication WHERE expressions"),
             parser_errposition(pstate, exprLocation(expr))));

should be moved down into the "T_FuncCall" case of the switch
statement below it, so that the "pstate->p_expr_kind ==
EXPR_KIND_PUBLICATION_WHERE" test doesn't get checked on every call to
transformExprRecurse() regardless of the expression Node type.
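
For clarity, a rough sketch of the suggested placement inside the switch
(hypothetical surrounding code, not the actual patch):

		case T_FuncCall:
			/* Functions are not allowed in publication WHERE clauses */
			if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE)
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("functions are not allowed in publication WHERE expressions"),
						 parser_errposition(pstate, exprLocation(expr))));

			/* ... existing T_FuncCall handling continues here ... */
			break;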


(3) Save a nanosecond when entry->exprstate is already NIL:

BEFORE:
if (entry->exprstate != NIL)
   list_free_deep(entry->exprstate);
entry->exprstate = NIL;

AFTER:
if (entry->exprstate != NIL)
{
   list_free_deep(entry->exprstate);
   entry->exprstate = NIL;
}


Regards,
Greg Nancarrow
Fujitsu Australia




Re: [PATCH] Automatic HASH and LIST partition creation

2021-07-14 Thread Pavel Borisov
>
> - I don't think it's a very good idea to support LIST and HASH but not
> RANGE. We need a design that can work for all three partitioning
> strategies, even if we don't have support for all of them in the
> initial patch. If they CAN all be in the same patch, so much the
> better.
>
> - I am not very impressed with the syntax. CONFIGURATION is an odd
> word that seems too generic for what we're talking about here. It
> would be tempting to use a connecting word like WITH or USING except
> that both would be ambiguous here, so we can't. MySQL and Oracle use
> the keyword PARTITIONS -- which I realize isn't a keyword at all in
> PostgreSQL right now -- to introduce the partition specification. DB2
> uses no keyword at all; it seems you just say PARTITION BY
> (mypartitioncol) (...partition specifications go here...). I think
> either approach could work for us. Avoiding the extra keyword is a
> plus, especially since I doubt we're likely to support the exact
> syntax that Oracle and MySQL offer anyway - though if we do, then I'd
> be in favor of inserting the PARTITIONS keyword so that people's SQL
> can work without modification.
>
> - We need to think a little bit about exactly what we're trying to do.
> The simplest imaginable thing here would be to just give people a
> place to put a bunch of partition specifications. So you can imagine
> letting someone say PARTITION BY HASH (FOR VALUES WITH (MODULUS 2,
> REMAINDER 0), FOR VALUES WITH (MODULUS 2, REMAINDER 1)). However, the
> patch quite rightly rejects that approach in favor of the theory that,
> at CREATE TABLE time, you're just going to want to give a modulus and
> have the system create one partition for every possible remainder. But
> that could be expressed even more compactly than what the patch does.
> Instead of saying PARTITION BY HASH CONFIGURATION (MODULUS 4) we could
> just let people say PARTITION BY HASH (4) or probably even PARTITION
> BY HASH 4.
>
> - For list partitioning, the patch falls back to just letting you put
> a bunch of VALUES IN clauses in the CREATE TABLE statement. I don't
> find something like PARTITION BY LIST CONFIGURATION (VALUES IN (1, 2),
> (1, 3)) to be particularly readable. What are all the extra keywords
> adding? We could just say PARTITION BY LIST ((1, 2), (1, 3)). I think
> I would find that easier to remember; not sure what other people
> think. As an alternative, PARTITION BY LIST VALUES IN (1, 2), (1, 3)
> looks workable, too.
>
> - What about range partitioning? This is an interesting case because
> while in theory you could leave gaps between range partitions, in
> practice people probably don't want to do that very often, and it
> might be better to have a simpler syntax that caters to the common
> case, since people can always create partitions individually if they
> happen to want gaps. So you can imagine making something like
> PARTITION BY RANGE ((MINVALUE), (42), (163)) mean create two
> partitions, one from (MINVALUE) to (42) and the other from (42) to
> (163). I think that would be pretty useful.
>
> - Another possible separating keyword here would be INITIALLY, which
> is already a parser keyword. So then you could have stuff like
> PARTITION BY HASH INITIALLY 4, PARTITION BY LIST INITIALLY ((1, 2),
> (1, 3)), PARTITION BY RANGE INITIALLY ((MINVALUE), (42), (163)).
>

Robert, I've read your considerations and I have a proposal to change the
syntax to make it like:

CREATE TABLE foo (bar text) PARTITION BY LIST (bar) PARTITIONS (('US'),
('UK', 'RU'));
CREATE TABLE foo (bar text) PARTITION BY LIST (bar) PARTITIONS
(foo_us('US'), foo_uk_ru('UK', 'RU'), { DEFAULT foo_dflt | AUTOMATIC });

CREATE TABLE foo (bar int) PARTITION BY HASH (bar) PARTITIONS (5);

CREATE TABLE foo (bar int) PARTITION BY RANGE (bar) PARTITIONS (FROM 1 TO
10 INTERVAL 2, { DEFAULT foo_dflt | AUTOMATIC });

- I think using the partitions syntax without any keyword at all is quite
different from the existing pseudo-English PostgreSQL syntax. Also, it will
need two consecutive pairs of parentheses with nothing between them, i.e. ()().
So I think it's better to use the keyword PARTITIONS.

- from the current patch it seems like 'syntactic sugar' only, but I don't
think that is so. From the new syntax proposal it can be seen that it can
enable three options:
(1) create a fixed set of partitions, with everything else going to the
default partition
(2) create a fixed set of partitions, with everything else raising an error on
insert
(3) create a set of partitions where everything else triggers creation of a new
partition based on the partition key (the AUTOMATIC keyword). For example,
someone will be able to do:
CREATE TABLE foo (a varchar) PARTITION BY LIST (SUBSTRING (a, 1, 1))
PARTITIONS (('a'),('b'),('c'));
INSERT INTO foo VALUES ('doctor'); -- will automatically create the partition
for 'd'
INSERT INTO foo VALUES ('dam'); -- will go into partition 'd'

Option (3) is not implemented yet, and it certainly needs care from the DBA to
avoid ending up with a separate partition for every row.

- Also with 

Re: partial heap only tuples

2021-07-14 Thread vignesh C
On Tue, Mar 9, 2021 at 12:09 AM Bossart, Nathan  wrote:
>
> On 3/8/21, 10:16 AM, "Ibrar Ahmed"  wrote:
> > On Wed, Feb 24, 2021 at 3:22 AM Bossart, Nathan  wrote:
> >> On 2/10/21, 2:43 PM, "Bruce Momjian"  wrote:
> >>> I wonder if you should create a Postgres wiki page to document all of
> >>> this.  I agree PG 15 makes sense.  I would like to help with this if I
> >>> can.  I will need to study this email more later.
> >>
> >> I've started the wiki page for this:
> >>
> >>https://wiki.postgresql.org/wiki/Partial_Heap_Only_Tuples
> >>
> >> Nathan
> >
> > The regression test case  (partial-index) is failing
> >
> > https://cirrus-ci.com/task/5310522716323840
>
> This patch is intended as a proof-of-concept of some basic pieces of
> the project.  I'm working on a new patch set that should be more
> suitable for community review.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Write visibility map during CLUSTER/VACUUM FULL

2021-07-14 Thread vignesh C
On Mon, Jun 28, 2021 at 1:52 PM Anna Akenteva  wrote:
>
> On 2019-11-29 05:32, Michael Paquier wrote:
> > These comments are unanswered for more than 2 months, so I am marking
> > this entry as returned with feedback.
>
> I'd like to revive this patch. To make further work easier, attaching a
> rebased version of the latest patch by Alexander.
>
> As for having yet another copy of logic determining visibility: the
> visibility check was mainly taken from heap_page_is_all_visible(), so I
> refactored the code to make sure that logic is not duplicated. The
> updated patch is attached too.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: WIP: System Versioned Temporal Table

2021-07-14 Thread vignesh C
On Mon, Mar 8, 2021 at 11:04 PM Ibrar Ahmed  wrote:
>
>
>
> On Thu, Feb 25, 2021 at 3:28 PM Li Japin  wrote:
>>
>>
>> On Jan 27, 2021, at 12:39 AM, Surafel Temesgen  wrote:
>>
>>
>>
>> On Tue, Jan 26, 2021 at 2:33 PM Vik Fearing  wrote:
>>>
>>> I'm still in the weeds of reviewing this patch, but why should this
>>> fail?  It should not fail.
>>
>>
>> Attached is rebased patch that include isolation test
>>
>>
>> Thanks for updating the patch.  However it cannot apply to master 
>> (e5d8a9990).
>>
>> Here are some comments on system-versioning-temporal-table_2021_v13.patch.
>>
>> +
>> +When system versioning is specified two columns are added which
>> +record the start timestamp and end timestamp of each row verson.
>>
>> verson -> version
>>
>> +By default, the column names will be StartTime and EndTime, though
>> +you can specify different names if you choose.
>>
>> In fact, it is start_time and end_time, not StartTime and EndTime.
>> I think it's better to use  label around start_time and end_time.
>>
>> +column will be automatically added to the Primary Key of the
>> +table.
>>
>> Should we mention the unique constraints?
>>
>> +The system versioning period end column will be added to the
>> +Primary Key of the table as a way of ensuring that concurrent
>> +INSERTs conflict correctly.
>>
>> Same as above.
>>
>> Since the get_row_start_time_col_name() and get_row_end_time_col_name()
>> are similar, IMO we can pass a flag to get StartTime/EndTime column name,
>> thought?
>>
>> --
>> Regrads,
>> Japin Li.
>> ChengDu WenWu Information Technology Co.,Ltd.
>>
> The patch (system-versioning-temporal-table_2021_v13.patch) does not apply 
> successfully.
>
> http://cfbot.cputube.org/patch_32_2316.log
>
> Hunk #1 FAILED at 80.
> 1 out of 1 hunk FAILED -- saving rejects to file 
> src/test/regress/parallel_schedule.rej
> patching file src/test/regress/serial_schedule
> Hunk #1 succeeded at 126 (offset -1 lines).
>
>
> Therefore it is a minor change so I rebased the patch, please take a look at 
> that.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Enhance traceability of wal_level changes for backup management

2021-07-14 Thread vignesh C
On Thu, Jan 28, 2021 at 6:14 AM osumi.takami...@fujitsu.com
 wrote:
>
> Hello
>
>
> On Thursday, January 21, 2021 11:19 PM I wrote:
> > If no one disagree with it, I'll proceed with (1) below.
> >
> > (1) writing the time or LSN in the control file to indicate when/where 
> > wal_level
> > is changed to 'minimal'
> > from upper level to invalidate the old backups or make alerts to users.
> I attached the first patch which implementes this idea.
> It was aligned by pgindent and shows no regression.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: pg_rewind race condition just after promotion

2021-07-14 Thread Aleksander Alekseev
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

The v3 patch LGTM. I wonder if we should explicitly state in the pg_rewind tests that
they _don't_ have to call `checkpoint`; otherwise, we will lose the test
coverage for this scenario. But I don't have a strong opinion on this one.

The new status of this patch is: Ready for Committer


Re: Toast compression method options

2021-07-14 Thread vignesh C
On Thu, May 6, 2021 at 7:24 PM Dilip Kumar  wrote:
>
> On Mon, May 3, 2021 at 6:27 PM Dilip Kumar  wrote:
> >
> > We have already pushed the configurable lz4 toast compression code[1].
> > In the custom compression thread, we were already having the patch to
> > support the compression method options[2].  But the design for the
> > base patches was heavily modified before commit but I never rebased
> > this patch based on the new design.  Now, I have rebased this patch so
> > that we don't lose track and we can work on this for v15.  This is
> > still a WIP patch.
> >
> > For v15 I will work on improving the code and I will also work on
> > analyzing the usage of compression method options (compression
> > speed/ratio).
> >
> > [1] 
> > https://www.postgresql.org/message-id/E1lNKw9-0008DT-1L%40gemulon.postgresql.org
> > [2] 
> > https://www.postgresql.org/message-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF%2B%3DWHu7jNvy7g%40mail.gmail.com
> >
>
> I have fixed some comments offlist reported by Justin.  Apart from
> that, I have also improved documentation and test case.  Stil it has a
> lot of cleanup to be done but I am not planning to do that
> immediately.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2021-07-14 Thread vignesh C
On Sun, Sep 13, 2020 at 9:34 PM Nikolay Shaplov  wrote:
>
> В письме от понедельник, 20 июля 2020 г. 18:36:44 MSK пользователь Georgios
> Kokolatos написал:
>
> Hi! Sorry for the really long delay; I was on my summer vacation, and then had
> urgent things to finish first. :-( Now I hope we can continue...
>
>
> > thank you for the patch. It applies cleanly, compiles and passes check,
> > check-world.
>
> Thank you for reviewing efforts.
>
> > I feel as per the discussion, this is a step to the right direction yet it
> > does not get far enough. From experience, I can confirm that dealing with
> > reloptions in a new table AM is somewhat of a pain. Ultimately, reloptions
> > should be handled by the table AM specific code. The current patch does not
> > address the issue. Yet it does make the issue easier to address by clearing
> > up the current state.
>
> Moving reloptions to AM code is the goal I am slowly moving toward. I started
> some time ago with the big patch https://commitfest.postgresql.org/14/992/ and
> was told to split it into smaller parts. So I did, and this patch is the last
> step that cleans up options-related things; after that, the actual moving can be
> done.
>
> > If you allow me, I have a couple of comments.
> >
> > - saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
> > -   
> >  HEAP_DEFAULT_FILLFACTOR);
> > + if (IsToastRelation(relation))
> > + saveFreeSpace = ToastGetTargetPageFreeSpace();
> > + else
> > + saveFreeSpace = HeapGetTargetPageFreeSpace(relation);
> >
> > For balance, it does make some sense for ToastGetTargetPageFreeSpace() to
> > get relation as an argument, similarly to HeapGetTargetPageFreeSpace().
>
> ToastGetTargetPageFreeSpace returns a const value. I've changed the code so
> that it gets a relation argument that is not used, the way you suggested. But I am not
> sure whether that is a good or bad idea. Maybe we will get some "unused variable"
> warnings on some compilers. I like consistency... but I'm not sure we need it here.
>
> > -   /* Retrieve the parallel_workers reloption, or -1 if not set. */
> > -   rel->rel_parallel_workers = RelationGetParallelWorkers(relation,
> > -1);
>  +   /*
> > +* Retrieve the parallel_workers for heap and mat.view relations.
> > +* Use -1 if not set, or if we are dealing with other relation
> > kinds
>  +*/
> > +   if (relation->rd_rel->relkind == RELKIND_RELATION ||
> > +   relation->rd_rel->relkind == RELKIND_MATVIEW)
> > +   rel->rel_parallel_workers =
> > RelationGetParallelWorkers(relation, -1);
>  +   else
> > +   rel->rel_parallel_workers = -1;
> > Also, this pattern is repeated in four places, maybe the branch can be
> > moved inside a macro or static inline instead?
>
> > If the comment above is agreed upon, then it makes a bit of sense to apply
> > the same here. The expression in the branch is already asserted for in
> > macro, why not switch there and remove the responsibility from the caller?
>
> I guess you are right here, because the logic is as follows: for a heap
> relation take the option from the options, for _all_ others use -1. This can be moved
> to a macro.
>
> So I changed it to
>
> /*
>  * HeapGetParallelWorkers
>  *  Returns the heap's parallel_workers reloption setting.
>  *  Note multiple eval of argument!
>  */
> #define HeapGetParallelWorkers(relation, defaultpw) \
> (AssertMacro(relation->rd_rel->relkind == RELKIND_RELATION ||   \
>  relation->rd_rel->relkind == RELKIND_MATVIEW), \
>  (relation)->rd_options ?   \
>   ((HeapOptions *) (relation)->rd_options)->parallel_workers :  \
> (defaultpw))
>
> /*
>  * RelationGetParallelWorkers
>  *  Returns the relation's parallel_workers reloption setting.
>  *  Note multiple eval of argument!
>  */
>
> #define RelationGetParallelWorkers(relation, defaultpw) \
> (((relation)->rd_rel->relkind == RELKIND_RELATION ||\
>   (relation)->rd_rel->relkind == RELKIND_MATVIEW) ? \
>   HeapGetParallelWorkers(relation, defaultpw) : defaultpw)
>
>
> But I would not like to move
>
> if (IsToastRelation(relation))
> saveFreeSpace = ToastGetTargetPageFreeSpace(relation);
> else
> saveFreeSpace = HeapGetTargetPageFreeSpace(relation);
>
> into macros, as there is a choice only between heap and toast. All other
> relation types are not mentioned.
>
> So we can not call it RelationGetTargetPageFreeSpace. It would be
> ToastOrHeapGetTargetPageFreeSpace actually. Better not to have such macro.
>
> Please find new version of the patch in the attachment.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh

Re: Polyphase merge is obsolete

2021-07-14 Thread vignesh C
On Sat, Jan 23, 2021 at 3:49 AM Heikki Linnakangas  wrote:
>
> On 22/10/2020 14:48, Heikki Linnakangas wrote:
> > On 11/09/2017 13:37, Tomas Vondra wrote:
> >> I planned to do some benchmarking on this patch, but apparently the
> >> patch no longer applies. Rebase please?
> >
> > Here's a rebase of this. Sorry to keep you waiting :-).
>
> Here's an updated version that fixes one bug:
>
> The CFBot was reporting a failure on the FreeBSD system [1]. It turned
> out to be an out-of-memory issue caused by an underflow bug in the
> calculation of the size of the tape read buffer size. With a small
> work_mem size, the memory left for tape buffers was negative, and that
> wrapped around to a very large number. I believe that was not caught by
> the other systems, because the other ones had enough memory for the
> incorrectly-sized buffers anyway. That was the case on my laptop at
> least. It did cause a big slowdown in the 'tuplesort' regression test
> though, which I hadn't noticed.
>
> The fix for that bug is here as a separate patch for easier review, but
> I'll squash it before committing.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: ResourceOwner refactoring

2021-07-14 Thread vignesh C
On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas  wrote:
>
> On 08/03/2021 18:47, Ibrar Ahmed wrote:
> > The patchset does not apply successfully, there are some hunk failures.
> >
> > http://cfbot.cputube.org/patch_32_2834.log
> > 
> >
> > v6-0002-Make-resowners-more-easily-extensible.patch
> >
> > 1 out of 6 hunks FAILED -- saving rejects to file
> > src/backend/utils/cache/plancache.c.rej
> > 2 out of 15 hunks FAILED -- saving rejects to file
> > src/backend/utils/resowner/resowner.c.rej
> >
> >
> > Can we get a rebase?
>
> Here you go.

The patch does not apply on HEAD anymore; could you rebase and post a new
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




SI messages sent when executing ROLLBACK PREPARED command

2021-07-14 Thread liuhuail...@fujitsu.com
Hi

While reading the code of FinishPreparedTransaction, I found that SI messages
are sent when executing the ROLLBACK PREPARED command.

But according to the AtEOXact_Inval function, we send SI messages only when
committing a transaction.

So I think we needn't send SI messages when rolling back a two-phase
transaction. Or is there something special about two-phase transactions that
requires it?


Regards



Re: logical replication empty transactions

2021-07-14 Thread Ajin Cherian
On Thu, May 27, 2021 at 8:58 PM vignesh C  wrote:

> Thanks for the updated patch, few comments:
> 1) I'm not sure if we could add some tests for skip empty
> transactions, if possible add a few tests.
>
Added a few tests for prepared transactions; the existing test in
020_messages.pl already covers regular transactions.

> 2) We could add some debug level log messages for the transaction that
> will be skipped.

Added.

>
> 3) You could keep this variable below the other bool variables in the 
> structure:
> +   boolsent_begin_txn; /* flag indicating whether begin
> +
>   * has already been sent */
> +

I've moved this variable around, so this comment is no longer valid.

>
> 4) You can split the comments to multi-line as it exceeds 80 chars
> +   /* output BEGIN if we haven't yet, avoid for streaming and
> non-transactional messages */
> +   if (!data->sent_begin_txn && !in_streaming && transactional)
> +   pgoutput_begin(ctx, txn);

Done.

I've had to rebase the patch after a recent commit by Amit Kapila that adds
support for two-phase commits in pub-sub [1].
I've also modified the patch to skip replicating empty prepared
transactions. Do let me know if you have any comments.
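
For reference, the deferred-BEGIN idea boils down to something like the
sketch below (inside pgoutput.c); the names PGOutputData, sent_begin_txn and
pgoutput_begin follow the review comments above and may not match the final
patch exactly.

/* called right before the first published change of a transaction is sent */
static void
maybe_send_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
	PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;

	/*
	 * BEGIN is emitted lazily, only when a change actually gets published;
	 * a transaction whose changes are all filtered out therefore produces
	 * no BEGIN/COMMIT traffic at all.
	 */
	if (!data->sent_begin_txn)
		pgoutput_begin(ctx, txn);	/* sends BEGIN and sets sent_begin_txn */
}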

regards,
Ajin Cherian
Fujitsu Australia
[1]- 
https://www.postgresql.org/message-id/CAHut+PueG6u3vwG8DU=JhJiWa2TwmZ=bdqpchzkbky7ykza...@mail.gmail.com


v7-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


Re: 回复:Re: Cache relation sizes?

2021-07-14 Thread Anastasia Lubennikova
On Wed, Jun 16, 2021 at 9:24 AM Thomas Munro  wrote:

> No change yet, just posting a rebase to keep cfbot happy.
>
>
Hi, Thomas

I think that the proposed feature is pretty cool not only because it fixes
some old issues with lseek() performance and reliability, but also because
it opens the door to some new features, such as disk size quota management
[1]. Cheap up-to-date size tracking is one of the major problems for it.

I've only read the 1st patch so far. Here are some thoughts about it:

- SMgrSharedRelation - what about calling it SMgrShmemRelation. It looks
weird too, but "...SharedRelation" makes me think of shared catalog tables.

- We can exclude temp relations from consideration, as they are never
shared, and use RelFileNode instead of RelFileNodeBackend in
SMgrSharedRelationMapping.
Not only will it save us some space, but it will also help to avoid a
situation where temp relations occupy the whole cache (which may easily happen
in some workloads).
I assume you were trying to make a generic solution, but in practice temp
rels differ from regular ones and may require different optimizations.
Local caching will be enough for them if ever needed; for example, we can
leave smgr_cached_nblocks[] for temp relations.
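
A sketch of what that exclusion could look like at the point where entries
are inserted into the shared mapping (the surrounding function is purely
hypothetical; RelFileNodeBackendIsTemp() and smgr_cached_nblocks[] are
existing names):

/* hypothetical insertion path into the shared size mapping */
static void
smgr_shared_cache_insert(const RelFileNodeBackend *rnode, BlockNumber nblocks)
{
	/*
	 * Temp relations are never shared across backends: keep them out of the
	 * shared mapping and rely on the backend-local smgr_cached_nblocks[]
	 * cache instead, keyed by plain RelFileNode.
	 */
	if (RelFileNodeBackendIsTemp(*rnode))
		return;

	/* ... insert/update the SMgrSharedRelation entry for rnode->node ... */
}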

>  int smgr_shared_relations = 1000;
- How bad would it be to keep the cache for all relations?  Let's test the
memory overhead on some realistic datasets. I suppose this default of 1000 was
chosen just for the WIP patch.
I wonder if we can use DSM instead of regular shared memory?

- I'd expect that CREATE DATABASE and ALTER DATABASE SET TABLESPACE require
special treatment, because they bypass smgr, just like dropdb. Don't see it
in the patch.

[1] https://github.com/greenplum-db/diskquota

-- 
Best regards,
Lubennikova Anastasia


Re: row filtering for logical replication

2021-07-14 Thread Amit Kapila
On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
 wrote:
>
> On 7/14/21 7:39 AM, Amit Kapila wrote:
> > On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira  wrote:
> >>
> >> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
> >>
> >> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> >> even if they use any other column with DELETE.  Maybe it would be
> >> reasonable to test for this in the code and raise an error if the
> >> expression requires a column that's not part of the replica identity.
> >> (But that could be relaxed if the publication does not publish
> >> updates/deletes.)
> >>
> >
> > +1.
> >
> >> I thought about it but came to the conclusion that it doesn't worth it.  
> >> Even
> >> with REPLICA IDENTITY FULL expression evaluates to false if the column 
> >> allows
> >> NULL values. Besides that REPLICA IDENTITY is changed via another DDL 
> >> (ALTER
> >> TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
> >> because some row filter uses the column you want to remove from it.
> >>
> >
> > Yeah, that is required but is it not feasible to do so?
> >
> >> 2. For UPDATE, does the expression apply to the old tuple or to the new
> >> tuple?  You say it's the new tuple, but from the user point of view I
> >> think it would make more sense that it would apply to the old tuple.
> >> (Of course, if you're thinking that the R.I. is the PK and the PK is
> >> never changed, then you don't really care which one it is, but I bet
> >> that some people would not like that assumption.)
> >>
> >> New tuple. The main reason is that new tuple is always there for UPDATEs.
> >>
> >
> > I am not sure if that is a very good reason to use a new tuple.
> >
>
> True. Perhaps we should look at other places with similar concept of
> WHERE conditions and old/new rows, and try to be consistent with those?
>
> I can think of:
>
> 1) updatable views with CHECK option
>
> 2) row-level security
>
> 3) triggers
>
> Is there some reasonable rule which of the old/new tuples (or both) to
> use for the WHERE condition? Or maybe it'd be handy to allow referencing
> OLD/NEW as in triggers?
>

I think apart from the above, it might be good if we can find what
some other databases does in this regard?

-- 
With Regards,
Amit Kapila.




RE: Added schema level support for publication.

2021-07-14 Thread houzj.f...@fujitsu.com
Wednesday, July 14, 2021 6:17 PM vignesh C  wrote:
> On Tue, Jul 13, 2021 at 12:06 PM tanghy.f...@fujitsu.com
>  wrote:
> >
> > On Monday, July 12, 2021 5:36 PM vignesh C  wrote:
> > >
> > > Thanks for reporting this issue, this issue is fixed in the v10
> > > patch attached at [1].
> > > [1] - https://www.postgresql.org/message-id/CALDaNm2%2BtR%2B8R-
> > > sD1CSyMbZcZbkintZE-avefjsp7LCkm6HMmw%40mail.gmail.com
> >
> > Thanks for fixing it.
> >
> > By applying your V10 patch, I saw three problems, please have a look.
> >
> > 1. An issue about pg_dump.
> > When public schema was published, the publication was created in the
> > output file, but public schema was not added to it. (Other schemas
> > could be added as expected.)
> >
> > I looked into it and found that selectDumpableNamespace function marks
> DUMP_COMPONENT_DEFINITION as needless when the schema is public,
> leading to schema public is ignored in getPublicationSchemas. So we'd better
> check whether schemas should be dumped in another way.
> >
> > I tried to fix it with the following change, please have a look.
> > (Maybe we also need to add some comments for it.)
> >
> > diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
> > index f6b4f12648..a327d2568b 100644
> > --- a/src/bin/pg_dump/pg_dump.c
> > +++ b/src/bin/pg_dump/pg_dump.c
> > @@ -4206,7 +4206,8 @@ getPublicationSchemas(Archive *fout,
> NamespaceInfo nspinfo[], int numSchemas)
> >  * Ignore publication membership of schemas whose
> definitions are not
> >  * to be dumped.
> >  */
> > -   if (!(nsinfo->dobj.dump &
> DUMP_COMPONENT_DEFINITION))
> > +   if (!((nsinfo->dobj.dump &
> DUMP_COMPONENT_DEFINITION)
> > +   || (strcmp(nsinfo->dobj.name, "public") == 0
> > + && nsinfo->dobj.dump != DUMP_COMPONENT_NONE)))
> > continue;
> >
> > pg_log_info("reading publication membership for schema
> > \"%s\"",
> 
> I felt it is intentionally done like that: as the public schema is created by
> default, it is not required to dump it, or else we will get errors while
> restoring. Thoughts?

Thanks for the new patches. I also looked at this issue.

For user defined schema and publication:
--
create schema s1;
create publication pub2 for SCHEMA s1;
--

pg_dump will only generate the following SQLs:

--pg_dump result--
CREATE PUBLICATION pub2 WITH (publish = 'insert, update, delete, truncate');
ALTER PUBLICATION pub2 ADD SCHEMA s1;
--

But for the public schema:
--
create publication pub for SCHEMA public;
--

pg_dump will only generate the following SQL:

--pg_dump result--
CREATE PUBLICATION pub WITH (publish = 'insert, update, delete, truncate');
--

It didn't generate SQL like "ALTER PUBLICATION pub ADD SCHEMA public;", which
means the public schema won't be published after restoring. So I think we'd
better let pg_dump generate the ADD SCHEMA public SQL. Thoughts?


Best regards,
Hou zhijie


Re: [PATCH] Hooks at XactCommand level

2021-07-14 Thread Gilles Darold

Hi,


I have renamed the patch and the title of this proposal registered in
the commitfest to "Xact/SubXact event callback at command start", to reflect
the latest changes, which no longer include new hooks.



Here is the new description corresponding to the current patch.


This patch allows executing user-defined code at the start of any
command through a registered xact callback. It introduces two new events
in the XactEvent and SubXactEvent enums, called respectively
XACT_EVENT_COMMAND_START and SUBXACT_EVENT_COMMAND_START. The callback
is not called if a transaction is not started.



The objective of this new callback is to be able to call user-defined
code before any new statement is executed. For example, it can issue a
rollback to a savepoint if there was an error in the previous statement of
the transaction, which allows implementing rollback at statement level on
the server side using a PostgreSQL extension, see [1].
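
As an illustration, a minimal extension hooking the new event could look
roughly like the sketch below. It assumes only the enum value added by the
attached patch (XACT_EVENT_COMMAND_START); RegisterXactCallback() is the
existing callback registration API.

#include "postgres.h"
#include "fmgr.h"
#include "access/xact.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static void
my_xact_callback(XactEvent event, void *arg)
{
	/* XACT_EVENT_COMMAND_START is the new event introduced by the patch */
	if (event != XACT_EVENT_COMMAND_START)
		return;

	/*
	 * User-defined code runs here, before the next statement of an already
	 * started transaction.  It must cope with the transaction possibly
	 * being in an aborted state (e.g. issue a ROLLBACK TO SAVEPOINT).
	 */
}

void
_PG_init(void)
{
	RegisterXactCallback(my_xact_callback, NULL);
}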



The patch compiles, and the regression tests pass successfully with asserts
enabled.


There is no regression test for this feature, but the extension at [1] has
been used to validate the new callback.



The patch adds insignificant overhead by itself (it only looks for an
existing callback definition), but clearly it is the responsibility of the
developer to evaluate the performance impact of the user-defined code for
this callback, as it will be called before each statement. Here is a very
simple test using pgbench -c 20 -j 8 -T 30:


    tps = 669.930274 (without user-defined code)
    tps = 640.718361 (with user-defined code from extension [1])

The overhead for this extension is around 4.5%, which I think is not too
bad for such a feature (internally it adds calls to RELEASE + SAVEPOINT
before each write statement execution and, in case of error, a ROLLBACK TO
savepoint).



[1] https://github.com/darold/pg_statement_rollbackv2

--
Gilles Darold
http://www.darold.net/

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..3b5f6bfc2d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -6118,3 +6118,32 @@ MarkSubTransactionAssigned(void)
 
 	CurrentTransactionState->assigned = true;
 }
+
+/*
+ * CallXactStartCommand
+ *
+ * Wrapper over CallXactCallbacks or CallSubXactCallbacks called in postgres.c
+ * at end of start_xact_command(). It allows to user-defined code to be executed
+ * for the start of any command through a xact registered callback. This function
+ * do nothing if a transaction is not started.
+ *
+ * The related events XactEvent/SubXactEvent are XACT_EVENT_COMMAND_START and
+ * SUBXACT_EVENT_COMMAND_START.
+ */
+void
+CallXactStartCommand(void)
+{
+	TransactionState s = CurrentTransactionState;
+
+	if (s->blockState == TBLOCK_DEFAULT || s->blockState == TBLOCK_STARTED)
+		return;
+
+	/*
+	 * Call start-of-xact callbacks with start command event
+	 */
+	if (s->parent && s->subTransactionId)
+		CallSubXactCallbacks(SUBXACT_EVENT_COMMAND_START, s->subTransactionId,
+			 s->parent->subTransactionId);
+	else
+		CallXactCallbacks(XACT_EVENT_COMMAND_START);
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..a0f4a17c51 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2708,6 +2708,16 @@ start_xact_command(void)
 		!get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
 		enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
 			 client_connection_check_interval);
+
+	/*
+	 * Instruct registered callbacks on xact that a new command is to be
+	 * executed. It allows to execute user-defined code before any new
+	 * statement is executed. This is the responsability of the user code
+	 * to take care of the state of the transaction, it can be in an ABORT
+	 * state. The related xact events are XACT_EVENT_COMMAND_START and
+	 * SUBXACT_EVENT_COMMAND_START.
+	 */
+	CallXactStartCommand();
 }
 
 static void
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 134f6862da..190fc7151b 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -119,7 +119,8 @@ typedef enum
 	XACT_EVENT_PREPARE,
 	XACT_EVENT_PRE_COMMIT,
 	XACT_EVENT_PARALLEL_PRE_COMMIT,
-	XACT_EVENT_PRE_PREPARE
+	XACT_EVENT_PRE_PREPARE,
+	XACT_EVENT_COMMAND_START
 } XactEvent;
 
 typedef void (*XactCallback) (XactEvent event, void *arg);
@@ -129,7 +130,8 @@ typedef enum
 	SUBXACT_EVENT_START_SUB,
 	SUBXACT_EVENT_COMMIT_SUB,
 	SUBXACT_EVENT_ABORT_SUB,
-	SUBXACT_EVENT_PRE_COMMIT_SUB
+	SUBXACT_EVENT_PRE_COMMIT_SUB,
+	SUBXACT_EVENT_COMMAND_START
 } SubXactEvent;
 
 typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
@@ -467,4 +469,6 @@ extern void EnterParallelMode(void);
 extern void ExitParallelMode(void);
 extern bool IsInParallelMode(void);
 
+extern void CallXactStartCommand(void);
+
 #endif			/* XACT_H */


Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-07-14 Thread Peter Eisentraut



On 05.01.21 22:40, Paul Martinez wrote:

I've created a patch to better support referential integrity constraints when
using composite primary and foreign keys. This patch allows creating a foreign
key using the syntax:

   FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)

which means that only the fk_id column will be set to NULL when the referenced
row is deleted, rather than both the tenant_id and fk_id columns.


I think this is an interesting feature with a legitimate use case.

I'm wondering a bit about what the ON UPDATE side of this is supposed to 
mean.  Consider your example:



CREATE TABLE tenants (id serial PRIMARY KEY);
CREATE TABLE users (
   tenant_id int REFERENCES tenants ON DELETE CASCADE,
   id serial,
   PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
 tenant_id int REFERENCES tenants ON DELETE CASCADE,
 id serial,
 author_id int,
 PRIMARY KEY (tenant_id, id),
 FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
);

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, null).


Consider what should happen when you update users.id.  Per SQL standard, 
for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the 
referencing column that corresponds to the referenced column actually 
updated, not all of them.  PostgreSQL doesn't do this, but if it did, 
then this would work just fine.


Your feature requires specifying a fixed column or columns to update, so 
it cannot react differently to what column actually updated.  In fact, 
you might want different referential actions depending on what columns 
are updated, like what you can do with general triggers.


So, unless I'm missing an angle here, I would suggest leaving out the ON 
UPDATE variant of this feature.





Re: Toast compression method options

2021-07-14 Thread Dilip Kumar
On Tue, Jun 22, 2021 at 1:37 PM Michael Paquier  wrote:
>
> On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:
> > IMHO there is certainly a use case, basically, if we compress the data
> > so that we can avoid storing it externally.  Now suppose for some
> > data, with default LZ4 compression, the compression ratio is so high
> > that you are able to compress to the size which is way under the
> > limit.  So for such data, the acceleration can be increased such that
> > compression is fast and compression ratio is good enough that it is
> > not going to the external storage.  I agree it will be difficult for
> > the user to make such a decision and select the acceleration value but
> > based on the data pattern and their compressed length the admin can
> > make such a decision.  So in short select the acceleration value such
> > that you can compress it fast and the compression ratio is good enough
> > to keep it from storing externally.
>
> Theoretically, I agree that there could be a use case, and that was
> the point I was trying to outline above.  My point is more from a
> practical point of view.  LZ4 is designed to be fast and cheap in CPU
> with a rather low compression ratio compared to other modern algos.
>
> Could it be possible to think about some worst cases where one may
> want to reduce its compression to save some CPU?  The point, as you
> say, to allow a tuning of the acceleration would be that one may want
> to save a bit of CPU and does not care about the extra disk space it
> takes.  Still, I am wondering why one would not just store the values
> externally in such cases and just save as much compression effort as
> possible.

Well, that actually depends upon the data. Basically, LZ4 acceleration
searches in wider increments, which may reduce the number of potential
matches but increases the speed.  So based on the actual data pattern,
it is quite possible that you get the speed benefit without losing much,
or anything, on the compression ratio.  In short, since this is a
user-exposed option, why is it not wise to let the user set the
acceleration based on their data pattern, when they are sure that
selecting a better speed will not hurt the compression ratio for that
data?
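
To make that trade-off concrete, here is a small standalone demo using the
liblz4 API directly (independent of the patch; build with -llz4). It simply
compresses the same buffer at increasing acceleration values, so the
speed/ratio effect can be observed for a given data pattern.

#include <stdio.h>
#include <string.h>
#include <lz4.h>

int
main(void)
{
	static char src[1024 * 1024];
	static char dst[LZ4_COMPRESSBOUND(sizeof(src))];
	int			accel;
	size_t		i;

	/* a mildly repetitive input; real results depend on the data pattern */
	for (i = 0; i < sizeof(src); i++)
		src[i] = "abcdabcdefgh"[i % 12];

	for (accel = 1; accel <= 64; accel *= 4)
	{
		int			len = LZ4_compress_fast(src, dst, (int) sizeof(src),
											(int) sizeof(dst), accel);

		/*
		 * Higher acceleration makes LZ4 scan in larger steps: faster, and
		 * usually (but not always) a somewhat worse compression ratio.
		 */
		printf("acceleration=%2d -> %d bytes\n", accel, len);
	}
	return 0;
}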

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: row filtering for logical replication

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Dilip Kumar wrote:

> I think for insert we are only allowing those rows to replicate which
> are matching filter conditions, so if we updating any row then also we
> should maintain that sanity right? That means at least on the NEW rows
> we should apply the filter, IMHO.  Said that, now if there is any row
> inserted which were satisfying the filter and replicated, if we update
> it with the new value which is not satisfying the filter then it will
> not be replicated,  I think that makes sense because if an insert is
> not sending any row to a replica which is not satisfying the filter
> then why update has to do that, right?

Right, that's a good aspect to think about.

I think the guiding principle for which tuple to use for the filter is
what is most useful to the potential user of the feature, rather than
what is the easiest to implement.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Álvarez)




Re: Unused function parameter in get_qual_from_partbound()

2021-07-14 Thread John Naylor
On Mon, Jul 12, 2021 at 8:46 AM John Naylor 
wrote:
>
> On Tue, Jun 8, 2021 at 10:50 PM Michael Paquier 
wrote:
> >
> > At first glance, this looked to me like breaking something just for
> > sake of breaking it, but removing the rel argument could be helpful
> > to simplify any external code calling it as there would be no need for
> > this extra Relation.  So that looks like a good idea, no need to rush
> > that into 14 though.
>
> I found no external references in codesearch.debian.net. I plan to commit
this in the next couple of days unless there are objections.

This has been committed.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: ResourceOwner refactoring

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, vignesh C wrote:

> On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas  wrote:

> > Here you go.
> 
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

Support for hmac was added by e6bdfd9700eb so the rebase is not trivial.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)




Re: Introduce pg_receivewal gzip compression tests

2021-07-14 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Wednesday, July 14th, 2021 at 04:17, Michael Paquier  
wrote:

> On Tue, Jul 13, 2021 at 11:16:06AM +, gkokola...@pm.me wrote:
> > Agreed. For the record that is why I said v6 :)
> Okay, thanks.

Please find v6 attached.

Cheers,
//Georgios

> ---
>
> Michael

v6-0001-Introduce-pg_receivewal-gzip-compression-tests.patch
Description: Binary data


Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra




On 7/14/21 4:01 PM, Alvaro Herrera wrote:

On 2021-Jul-14, Dilip Kumar wrote:


I think for insert we are only allowing those rows to replicate which
are matching filter conditions, so if we updating any row then also we
should maintain that sanity right? That means at least on the NEW rows
we should apply the filter, IMHO.  Said that, now if there is any row
inserted which were satisfying the filter and replicated, if we update
it with the new value which is not satisfying the filter then it will
not be replicated,  I think that makes sense because if an insert is
not sending any row to a replica which is not satisfying the filter
then why update has to do that, right?


Right, that's a good aspect to think about.



I agree, that seems like a reasonable approach.

The way I'm thinking about this is that for INSERT and DELETE it's clear 
which row version should be used (because there's just one). And for 
UPDATE we could see that as DELETE + INSERT, and apply the same rule to 
each action.


On the other hand, I can imagine cases where it'd be useful to send the 
UPDATE when the old row matches the condition and new row does not.



I think the guiding principle for which tuple to use for the filter is
what is most useful to the potential user of the feature, rather than
what is the easiest to implement.



+1

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra

On 7/14/21 2:50 PM, Amit Kapila wrote:

On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
 wrote:


On 7/14/21 7:39 AM, Amit Kapila wrote:

On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira  wrote:


On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:

1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)



+1.


I thought about it but came to the conclusion that it doesn't worth it.  Even
with REPLICA IDENTITY FULL expression evaluates to false if the column allows
NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
because some row filter uses the column you want to remove from it.



Yeah, that is required but is it not feasible to do so?


2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)

New tuple. The main reason is that new tuple is always there for UPDATEs.



I am not sure if that is a very good reason to use a new tuple.



True. Perhaps we should look at other places with similar concept of
WHERE conditions and old/new rows, and try to be consistent with those?

I can think of:

1) updatable views with CHECK option

2) row-level security

3) triggers

Is there some reasonable rule which of the old/new tuples (or both) to
use for the WHERE condition? Or maybe it'd be handy to allow referencing
OLD/NEW as in triggers?



I think apart from the above, it might be good if we can find what
some other databases does in this regard?



Yeah, that might tell us what the users would like to do with it. I did 
some quick searching, but haven't found much :-( The one thing I found is 
that Debezium [1] allows accessing both the "old" and "new" rows through 
value.before and value.after, and using both for filtering.


I haven't found much about how this works in other databases, sadly.

Perhaps the best way forward is to stick to the approach that INSERT 
uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably), 
and leave anything fancier (like being able to reference both versions 
of the row) for a future patch.



[1] 
https://wanna-joke.com/wp-content/uploads/2015/01/german-translation-comics-science.jpg


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ResourceOwner refactoring

2021-07-14 Thread Heikki Linnakangas

On 14/07/2021 17:07, Alvaro Herrera wrote:

On 2021-Jul-14, vignesh C wrote:


On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas  wrote:



Here you go.


The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".


Support for hmac was added by e6bdfd9700eb so the rebase is not trivial.


Yeah, needed some manual fixing, but here you go.

- Heikki
>From 19fb4369e3b7bfb29afc6cfe597af5a89698dd3f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 13 Jan 2021 12:21:28 +0200
Subject: [PATCH v8 1/3] Move a few ResourceOwnerEnlarge() calls for safety and
 clarity.

These are functions where quite a lot of things happen between the
ResourceOwnerEnlarge and ResourceOwnerRemember calls. It's important that
there are no unrelated ResourceOwnerRemember() calls in the code
in between; otherwise the entry reserved by the ResourceOwnerEnlarge() call
might be used up by the intervening ResourceOwnerRemember() and not be
available at the intended ResourceOwnerRemember() call anymore. The longer
the code path between them is, the harder it is to verify that.

In bufmgr.c, there is a function similar to ResourceOwnerEnlarge(),
to ensure that the private refcount array has enough space. The
ReservePrivateRefCountEntry() calls, analogous to ResourceOwnerEnlarge(),
were made at different places than the ResourceOwnerEnlarge() calls.
Move the ResourceOwnerEnlarge() calls together with the
ReservePrivateRefCountEntry() calls for consistency.
---
 src/backend/storage/buffer/bufmgr.c   | 39 +++
 src/backend/storage/buffer/localbuf.c |  3 +++
 src/backend/utils/cache/catcache.c| 13 ++---
 3 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 86ef607ff38..2383e4cac08 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -810,9 +810,6 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	*hit = false;
 
-	/* Make sure we will have room to remember the buffer pin */
-	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
-
 	isExtend = (blockNum == P_NEW);
 
 	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
@@ -1165,9 +1162,11 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	{
 		/*
 		 * Ensure, while the spinlock's not yet held, that there's a free
-		 * refcount entry.
+		 * refcount entry and that the resoure owner has room to remember the
+		 * pin.
 		 */
 		ReservePrivateRefCountEntry();
+		ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
 
 		/*
 		 * Select a victim buffer.  The buffer is returned with its header
@@ -1668,8 +1667,6 @@ ReleaseAndReadBuffer(Buffer buffer,
  * taking the buffer header lock; instead update the state variable in loop of
  * CAS operations. Hopefully it's just a single CAS.
  *
- * Note that ResourceOwnerEnlargeBuffers must have been done already.
- *
  * Returns true if buffer is BM_VALID, else false.  This provision allows
  * some callers to avoid an extra spinlock cycle.
  */
@@ -1680,6 +1677,8 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	bool		result;
 	PrivateRefCountEntry *ref;
 
+	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
+
 	ref = GetPrivateRefCountEntry(b, true);
 
 	if (ref == NULL)
@@ -1760,7 +1759,8 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
  * The spinlock is released before return.
  *
  * As this function is called with the spinlock held, the caller has to
- * previously call ReservePrivateRefCountEntry().
+ * previously call ReservePrivateRefCountEntry() and
+ * ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  *
  * Currently, no callers of this function want to modify the buffer's
  * usage_count at all, so there's no need for a strategy parameter.
@@ -1936,9 +1936,6 @@ BufferSync(int flags)
 	int			mask = BM_DIRTY;
 	WritebackContext wb_context;
 
-	/* Make sure we can handle the pin inside SyncOneBuffer */
-	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
-
 	/*
 	 * Unless this is a shutdown checkpoint or we have been explicitly told,
 	 * we write only permanent, dirty buffers.  But at shutdown or end of
@@ -2412,9 +2409,6 @@ BgBufferSync(WritebackContext *wb_context)
 	 * requirements, or hit the bgwriter_lru_maxpages limit.
 	 */
 
-	/* Make sure we can handle the pin inside SyncOneBuffer */
-	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
-
 	num_to_scan = bufs_to_lap;
 	num_written = 0;
 	reusable_buffers = reusable_buffers_est;
@@ -2496,8 +2490,6 @@ BgBufferSync(WritebackContext *wb_context)
  *
  * (BUF_WRITTEN could be set in error if FlushBuffer finds the buffer clean
  * after locking it, but we don't care all that much.)
- *
- * Note: caller must have done ResourceOwnerEnlargeBuffers.
  */
 static int
 SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
@@ -2507,7 +2499,9 @@ SyncOneB

Re: row filtering for logical replication

2021-07-14 Thread Dilip Kumar
On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
 wrote:
>

> Perhaps the best way forward is to stick to the approach that INSERT
> uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
> and leave anything fancier (like being able to reference both versions
> of the row) for a future patch.

If UPDATE works as DELETE+ INSERT, does that mean both the OLD row and
the NEW row should satisfy the filter, then only it will be sent?
That means if we insert a row that is not satisfying the condition
(which is not sent to the subscriber) and later if we update that row
and change the values such that the modified value matches the filter
then we will not send it because only the NEW row is satisfying the
condition but OLD row doesn't.  I am just trying to understand your
idea.  Or you are saying that in this case, we will not send anything
for the OLD row as it was not satisfying the condition but the
modified row will be sent as an INSERT operation because this is
satisfying the condition?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: row filtering for logical replication

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Tomas Vondra wrote:

> The way I'm thinking about this is that for INSERT and DELETE it's clear
> which row version should be used (because there's just one). And for UPDATE
> we could see that as DELETE + INSERT, and apply the same rule to each
> action.
> 
> On the other hand, I can imagine cases where it'd be useful to send the
> UPDATE when the old row matches the condition and new row does not.

In any case, it seems to me that the condition expression should be
scanned to see which columns are used in Vars (pull_varattnos?), and
verify if those columns are in the REPLICA IDENTITY; and if they are
not, raise an error.  Most of the time the REPLICA IDENTITY is going to
be the primary key; but if the user wants to use other columns in the
expression, we can HINT that they can set REPLICA IDENTITY FULL.
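
A rough sketch of such a check (not from the patch) could look like this; it
assumes the filter expression has been transformed with the published table
as the only range table entry (varno 1), and it special-cases REPLICA
IDENTITY FULL, where every column is available:

static void
check_rowfilter_replident(Relation rel, Node *rowfilter)
{
	Bitmapset  *filter_attrs = NULL;
	Bitmapset  *identity_attrs;

	/* with REPLICA IDENTITY FULL the whole old tuple is logged */
	if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
		return;

	/* columns referenced by the WHERE expression (offset attnos) */
	pull_varattnos(rowfilter, 1, &filter_attrs);

	/* columns of the replica identity (same offset convention) */
	identity_attrs = RelationGetIndexAttrBitmap(rel,
												INDEX_ATTR_BITMAP_IDENTITY_KEY);

	if (!bms_is_subset(filter_attrs, identity_attrs))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
				 errmsg("row filter can only use columns that are part of the replica identity"),
				 errhint("Consider using REPLICA IDENTITY FULL on the published table.")));
}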

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.




Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra




On 7/14/21 4:48 PM, Dilip Kumar wrote:

On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
 wrote:





Perhaps the best way forward is to stick to the approach that INSERT
uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
and leave anything fancier (like being able to reference both versions
of the row) for a future patch.


If UPDATE works as DELETE+ INSERT, does that mean both the OLD row and
the NEW row should satisfy the filter, then only it will be sent?
That means if we insert a row that is not satisfying the condition
(which is not sent to the subscriber) and later if we update that row
and change the values such that the modified value matches the filter
then we will not send it because only the NEW row is satisfying the
condition but OLD row doesn't.  I am just trying to understand your
idea.  Or you are saying that in this case, we will not send anything
for the OLD row as it was not satisfying the condition but the
modified row will be sent as an INSERT operation because this is
satisfying the condition?



Good questions. I'm not sure, I probably have not thought it through.

So yeah, I think we should probably stick to the principle that what we 
send needs to match the filter condition, which applied to this case 
would mean we should be looking at the new row version.


The more elaborate scenarios can be added later by a patch allowing to 
explicitly reference the old/new row versions.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Extending amcheck to check toast size and compression

2021-07-14 Thread Mark Dilger



> On Jul 14, 2021, at 3:33 AM, Heikki Linnakangas  wrote:
> 
>> +/* The largest valid toast va_rawsize */
>> +#define VARLENA_SIZE_LIMIT 0x3FFFFFFF
>> +
> 
> Hmm, a toasted datum cannot be larger than MaxAllocSize, because it's 
> reconstituted in a palloc'd datum, right?

No datum size exceeds MaxAllocSize, and no datum expands when compressed 
(because for those that do expand under any particular compression algorithm, 
we opt to instead store the datum uncompressed), so no valid toast pointer 
should contain a va_rawsize field greater than MaxAllocSize.  Any toast 
pointers that have larger va_rawsize fields are therefore corrupt.

VARLENA_SIZE_LIMIT is defined here equal to MaxAllocSize:

src/include/utils/memutils.h:#define MaxAllocSize   ((Size) 0x3fffffff) /* 1 gigabyte - 1 */

Earlier versions of the patch used MaxAllocSize rather than defining 
VARLENA_SIZE_LIMIT, but review comments suggested that was less clear.
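
In other words, the check amounts to something like the sketch below, where
report_corruption() stands in for amcheck's actual reporting machinery and
toast_pointer is the already-deconstructed varatt_external:

#define VARLENA_SIZE_LIMIT 0x3FFFFFFF	/* == MaxAllocSize, 1 gigabyte - 1 */

	if (toast_pointer.va_rawsize > VARLENA_SIZE_LIMIT)
		report_corruption(ctx,
						  psprintf("toast value raw size %d exceeds limit %d",
								   toast_pointer.va_rawsize,
								   VARLENA_SIZE_LIMIT));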

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Kyotaro Horiguchi wrote:

> > >  pg_log_error("extra_float_digits must be in range 
> > > -15..3");
> > >  exit_nicely(1);
> > 
> > Should we take this occasion to reduce the burden of translators and
> > reword that as "%s must be in range %d..%d"?  That could be a separate
> > patch.

Yes, please, let's do it here.

> The first %s is not always an invariable symbol name so it could
> result in concatenating several phrases into one, like this.
> 
>  pg_log..("%s must be in range %s..%s", _("compression level"), "0", "9"))
> 
> It is translatable at least into Japanese but I'm not sure about other
> languages.

No, this doesn't work.  When the first word is something that is
not to be translated (a literal parameter name), let's use a %s (but of
course the values must be taken out of the phrase too).  But when it is
a translatable phrase, it must be present as a full phrase, with no
substitution:

pg_log_error("%s must be in range %s..%s", "extra_float_digits", "-15", 
"3");
pg_log..("compression level must be in range %s..%s", "0", "9"))

I think the purity test is whether you would want to use _() around the
argument; if so, you have to include it in the original message.

Thanks

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php




Re: Polyphase merge is obsolete

2021-07-14 Thread Heikki Linnakangas

On 14/07/2021 15:12, vignesh C wrote:

On Sat, Jan 23, 2021 at 3:49 AM Heikki Linnakangas  wrote:

Here's an updated version that fixes one bug:

The CFBot was reporting a failure on the FreeBSD system [1]. It turned
out to be an out-of-memory issue caused by an underflow bug in the
calculation of the size of the tape read buffer size. With a small
work_mem size, the memory left for tape buffers was negative, and that
wrapped around to a very large number. I believe that was not caught by
the other systems, because the other ones had enough memory for the
incorrectly-sized buffers anyway. That was the case on my laptop at
least. It did cause a big slowdown in the 'tuplesort' regression test
though, which I hadn't noticed.

The fix for that bug is here as a separate patch for easier review, but
I'll squash it before committing.


The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".


Here's a rebased version. I also squashed that little bug fix from 
previous patch set.


- Heikki
>From 027b20e74b3d8b09144cffe6d071706231e496d7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 21 Oct 2020 21:57:42 +0300
Subject: [PATCH 1/2] Refactor LogicalTapeSet/LogicalTape interface.

All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now
take a LogicalTape as argument, instead of LogicalTapeSet+tape number.
You can create any number of LogicalTapes in a single LogicalTapeSet, and
you don't need to decide the number upfront, when you create the tape set.

This makes the tape management in hash agg spilling in nodeAgg.c simpler.
---
 src/backend/executor/nodeAgg.c | 187 
 src/backend/utils/sort/logtape.c   | 457 -
 src/backend/utils/sort/tuplesort.c | 229 +++
 src/include/nodes/execnodes.h  |   3 +-
 src/include/utils/logtape.h|  37 ++-
 5 files changed, 360 insertions(+), 553 deletions(-)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 914b02ceee4..80025e5f1f3 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -208,7 +208,16 @@
  *
  *	  Spilled data is written to logical tapes. These provide better control
  *	  over memory usage, disk space, and the number of files than if we were
- *	  to use a BufFile for each spill.
+ *	  to use a BufFile for each spill.  We don't know the number of tapes needed
+ *	  at the start of the algorithm (because it can recurse), so a tape set is
+ *	  allocated at the beginning, and individual tapes are created as needed.
+ *	  As a particular tape is read, logtape.c recycles its disk space. When a
+ *	  tape is read to completion, it is destroyed entirely.
+ *
+ *	  Tapes' buffers can take up substantial memory when many tapes are open at
+ *	  once. We only need one tape open at a time in read mode (using a buffer
+ *	  that's a multiple of BLCKSZ); but we need one tape open in write mode (each
+ *	  requiring a buffer of size BLCKSZ) for each partition.
  *
  *	  Note that it's possible for transition states to start small but then
  *	  grow very large; for instance in the case of ARRAY_AGG. In such cases,
@@ -311,27 +320,6 @@
  */
 #define CHUNKHDRSZ 16
 
-/*
- * Track all tapes needed for a HashAgg that spills. We don't know the maximum
- * number of tapes needed at the start of the algorithm (because it can
- * recurse), so one tape set is allocated and extended as needed for new
- * tapes. When a particular tape is already read, rewind it for write mode and
- * put it in the free list.
- *
- * Tapes' buffers can take up substantial memory when many tapes are open at
- * once. We only need one tape open at a time in read mode (using a buffer
- * that's a multiple of BLCKSZ); but we need one tape open in write mode (each
- * requiring a buffer of size BLCKSZ) for each partition.
- */
-typedef struct HashTapeInfo
-{
-	LogicalTapeSet *tapeset;
-	int			ntapes;
-	int		   *freetapes;
-	int			nfreetapes;
-	int			freetapes_alloc;
-} HashTapeInfo;
-
 /*
  * Represents partitioned spill data for a single hashtable. Contains the
  * necessary information to route tuples to the correct partition, and to
@@ -343,9 +331,8 @@ typedef struct HashTapeInfo
  */
 typedef struct HashAggSpill
 {
-	LogicalTapeSet *tapeset;	/* borrowed reference to tape set */
 	int			npartitions;	/* number of partitions */
-	int		   *partitions;		/* spill partition tape numbers */
+	LogicalTape **partitions;	/* spill partition tapes */
 	int64	   *ntuples;		/* number of tuples in each partition */
 	uint32		mask;			/* mask to find partition from hash value */
 	int			shift;			/* after masking, shift by this amount */
@@ -365,8 +352,7 @@ typedef struct HashAggBatch
 {
 	int			setno;			/* grouping set */
 	int			used_bits;		/* number of bits of hash already used */
-	LogicalTapeSet *tapeset;	/* borrowed reference to tape set */
-	int			input_tapenum;	/* input partition tape */
+	LogicalTa

Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra




On 7/14/21 4:52 PM, Alvaro Herrera wrote:

> On 2021-Jul-14, Tomas Vondra wrote:
>
>> The way I'm thinking about this is that for INSERT and DELETE it's clear
>> which row version should be used (because there's just one). And for UPDATE
>> we could see that as DELETE + INSERT, and apply the same rule to each
>> action.
>>
>> On the other hand, I can imagine cases where it'd be useful to send the
>> UPDATE when the old row matches the condition and new row does not.
>
> In any case, it seems to me that the condition expression should be
> scanned to see which columns are used in Vars (pull_varattnos?), and
> verify if those columns are in the REPLICA IDENTITY; and if they are
> not, raise an error.  Most of the time the REPLICA IDENTITY is going to
> be the primary key; but if the user wants to use other columns in the
> expression, we can HINT that they can set REPLICA IDENTITY FULL.


Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps 
we could ignore this for subscriptions without DELETE.


The other question is when to check/enforce this. I guess we'll have to 
do that during decoding, not just when the publication is being created, 
because the user can do ALTER TABLE later.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: row filtering for logical replication

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Tomas Vondra wrote:

> On 7/14/21 4:52 PM, Alvaro Herrera wrote:

> > In any case, it seems to me that the condition expression should be
> > scanned to see which columns are used in Vars (pull_varattnos?), and
> > verify if those columns are in the REPLICA IDENTITY; and if they are
> > not, raise an error.  Most of the time the REPLICA IDENTITY is going to
> > be the primary key; but if the user wants to use other columns in the
> > expression, we can HINT that they can set REPLICA IDENTITY FULL.
> 
> Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps we
> could ignore this for subscriptions without DELETE.

Yeah, I said that too in my older reply :-)

> The other question is when to check/enforce this. I guess we'll have to do
> that during decoding, not just when the publication is being created,
> because the user can do ALTER TABLE later.

... if you're saying the user can change the replica identity after we
have some publications with filters defined, then I think we should
verify during ALTER TABLE and not allow the change if there's a
publication that requires it.  I mean, during decoding we should be able
to simply assume that the tuple is correct for what we need at that
point.
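
To make that concrete, a rough sketch of such a check (not from any posted
patch; the function name, the varno handling and the error wording are all
invented here) could look like:

#include "postgres.h"
#include "nodes/bitmapset.h"
#include "optimizer/optimizer.h"	/* pull_varattnos() */
#include "utils/rel.h"
#include "utils/relcache.h"

/*
 * Hypothetical check: error out if the row filter references columns that
 * are not part of the relation's replica identity.  Assumes the filter
 * expression's Vars use varno 1, i.e. a single-relation expression.
 */
static void
check_rowfilter_replident(Relation rel, Node *rowfilter)
{
    Bitmapset  *filter_attnos = NULL;
    Bitmapset  *ident_attnos;

    /* REPLICA IDENTITY FULL covers every column, nothing to check */
    if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
        return;

    /* both bitmapsets use the same FirstLowInvalidHeapAttributeNumber offset */
    pull_varattnos(rowfilter, 1, &filter_attnos);
    ident_attnos = RelationGetIndexAttrBitmap(rel,
                                              INDEX_ATTR_BITMAP_IDENTITY_KEY);

    if (!bms_is_subset(filter_attnos, ident_attnos))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
                 errmsg("row filter for table \"%s\" references columns not covered by the replica identity",
                        RelationGetRelationName(rel)),
                 errhint("Use replica identity columns only, or set REPLICA IDENTITY FULL.")));
}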

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: ResourceOwner refactoring

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 7:40 AM Heikki Linnakangas  wrote:

> On 14/07/2021 17:07, Alvaro Herrera wrote:
> > On 2021-Jul-14, vignesh C wrote:
> >
> >> On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas 
> wrote:
> >
> >>> Here you go.
> >>
> >> The patch does not apply on Head anymore, could you rebase and post a
> >> patch. I'm changing the status to "Waiting for Author".
> >
> > Support for hmac was added by e6bdfd9700eb so the rebase is not trivial.
>
> Yeah, needed some manual fixing, but here you go.
>
> - Heikki
>
Hi,
For the loop over the hash:

+   for (int idx = 0; idx < capacity; idx++)
{
-   if (olditemsarr[i] != resarr->invalidval)
-   ResourceArrayAdd(resarr, olditemsarr[i]);
+   while (owner->hash[idx].kind != NULL &&
+  owner->hash[idx].kind->phase == phase)
...
+   } while (capacity != owner->capacity);

Since the phase variable doesn't seem to change for the while loop, I
wonder what benefit the while loop has (since the release is governed by
phase).

Cheers


Re: Avoid repeated PQfnumber() in pg_dump

2021-07-14 Thread Justin Pryzby
On Wed, Jul 14, 2021 at 08:54:32AM +, houzj.f...@fujitsu.com wrote:
> Since PQfnumber() is not a cheap function, I think we'd better invoke
> PQfnumber() out of the loop like the attached patch.

+1

Please add to the next CF
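
For anyone skimming the thread, the pattern being proposed is simply to
resolve the column numbers once, outside the per-row loop.  A standalone
libpq sketch (the query and column names are only illustrative):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* uses PG* environment variables */
    PGresult   *res;
    int         i_oid;
    int         i_relname;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    res = PQexec(conn, "SELECT oid, relname FROM pg_class");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }

    /* Resolve column numbers once, instead of once per tuple. */
    i_oid = PQfnumber(res, "oid");
    i_relname = PQfnumber(res, "relname");

    for (int i = 0; i < PQntuples(res); i++)
        printf("%s\t%s\n",
               PQgetvalue(res, i, i_oid),
               PQgetvalue(res, i, i_relname));

    PQclear(res);
    PQfinish(conn);
    return 0;
}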

-- 
Justin




Re: ResourceOwner refactoring

2021-07-14 Thread Heikki Linnakangas

Thanks for having a look!

On 14/07/2021 18:18, Zhihong Yu wrote:

> For the loop over the hash:
>
> +       for (int idx = 0; idx < capacity; idx++)
>          {
> -           if (olditemsarr[i] != resarr->invalidval)
> -               ResourceArrayAdd(resarr, olditemsarr[i]);
> +           while (owner->hash[idx].kind != NULL &&
> +                  owner->hash[idx].kind->phase == phase)
> ...
> +   } while (capacity != owner->capacity);
>
> Since the phase variable doesn't seem to change for the while loop, I
> wonder what benefit the while loop has (since the release is governed by
> phase).


Hmm, the phase variable doesn't change, but could the element at 
'owner->hash[idx]' change? I'm not sure about that. The loop is supposed 
to handle the case that the hash table grows; could that replace the 
element at 'owner->hash[idx]' with something else, with different phase? 
The check is very cheap, so I'm inclined to keep it to be sure.
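
A toy model (not the patch's code) of why the outer retry loop exists at
all: releasing an entry may register new entries and grow the array, so the
scan restarts until the capacity stops changing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct table
{
    int         capacity;
    int        *slots;          /* 0 = empty, otherwise a live resource */
};

static void release_one(struct table *t, int idx);

static void
grow(struct table *t)
{
    int         newcap = t->capacity * 2;
    int        *newslots = realloc(t->slots, newcap * sizeof(int));

    if (newslots == NULL)
        exit(1);
    memset(newslots + t->capacity, 0, (newcap - t->capacity) * sizeof(int));
    t->slots = newslots;
    t->capacity = newcap;
}

static void
release_all(struct table *t)
{
    int         capacity;

    do
    {
        capacity = t->capacity;
        for (int idx = 0; idx < capacity; idx++)
        {
            /* the slot may have been refilled by an earlier callback */
            while (t->slots[idx] != 0)
                release_one(t, idx);
        }
        /* if a callback grew the table, go back and scan the new slots too */
    } while (capacity != t->capacity);
}

static void
release_one(struct table *t, int idx)
{
    int         val = t->slots[idx];

    t->slots[idx] = 0;
    printf("released %d\n", val);

    /* pretend that releasing resource 1 registers a late extra resource */
    if (val == 1)
    {
        grow(t);
        t->slots[t->capacity - 1] = 99;
    }
}

int
main(void)
{
    struct table t;

    t.capacity = 4;
    t.slots = calloc(t.capacity, sizeof(int));
    t.slots[0] = 1;
    t.slots[2] = 7;
    release_all(&t);            /* prints 1, 7, then the late 99 */
    free(t.slots);
    return 0;
}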


- Heikki




Re: ResourceOwner refactoring

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 8:26 AM Heikki Linnakangas  wrote:

> Thanks for having a look!
>
> On 14/07/2021 18:18, Zhihong Yu wrote:
> > For the loop over the hash:
> >
> > +   for (int idx = 0; idx < capacity; idx++)
> >  {
> > -   if (olditemsarr[i] != resarr->invalidval)
> > -   ResourceArrayAdd(resarr, olditemsarr[i]);
> > +   while (owner->hash[idx].kind != NULL &&
> > +  owner->hash[idx].kind->phase == phase)
> > ...
> > +   } while (capacity != owner->capacity);
> >
> > Since the phase variable doesn't seem to change for the while loop, I
> > wonder what benefit the while loop has (since the release is governed by
> > phase).
>
> Hmm, the phase variable doesn't change, but could the element at
> 'owner->hash[idx]' change? I'm not sure about that. The loop is supposed
> to handle the case that the hash table grows; could that replace the
> element at 'owner->hash[idx]' with something else, with different phase?
> The check is very cheap, so I'm inclined to keep it to be sure.
>
> - Heikki
>
Hi,
Agreed that ```owner->hash[idx].kind->phase == phase``` can be kept.

I just wonder if we should put a limit on the number of iterations for the
while loop.
Maybe add a bool variable indicating that kind->ReleaseResource(value) is
called in the inner loop.
If there is no releasing when the inner loop finishes, we can come out of
the while loop.

Cheers


Re: Extending amcheck to check toast size and compression

2021-07-14 Thread Mark Dilger



> On Jul 14, 2021, at 7:57 AM, Mark Dilger  wrote:
> 
> so no valid toast pointer should contain a va_rawsize field greater than 
> MaxAllocSize

... nor should any valid toast pointer contain a va_extinfo field encoding a 
va_extsize greater than va_rawsize - VARHDRSZ.

Violations of either of these properties suggest either a bug in the code which 
wrote the toast pointer, or that the toast pointer has been corrupted since 
being written, or that the page of data being read is being interpreted 
incorrectly, perhaps due to catalog corruption, or because the page is just 
random noise and not part of a valid table, etc.  The amcheck code is not 
focused specifically on whether the toasted value can be detoasted so much as 
deducing that the data cannot be correct.
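
Roughly, the two invariants amount to checks of this shape (illustrative
only, not the actual amcheck code; VARATT_EXTERNAL_GET_EXTSIZE is the
PG 14-and-later spelling of the extsize lookup):

#include "postgres.h"
#include "utils/memutils.h"		/* MaxAllocSize */

/* Returns NULL if the toast pointer looks sane, otherwise a complaint. */
static const char *
check_toast_pointer(struct varlena *attr)
{
    struct varatt_external toast_pointer;

    if (!VARATT_IS_EXTERNAL_ONDISK(attr))
        return NULL;            /* not an on-disk toast pointer */

    VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

    if (toast_pointer.va_rawsize > MaxAllocSize)
        return "toast pointer va_rawsize exceeds MaxAllocSize";

    if (VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) >
        toast_pointer.va_rawsize - VARHDRSZ)
        return "toast pointer extsize exceeds rawsize - VARHDRSZ";

    return NULL;
}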

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: libpq compression

2021-07-14 Thread vignesh C
On Wed, Jul 14, 2021 at 6:31 PM Daniil Zakhlystov
 wrote:
>
> **sorry for the noise, but I need to re-send the message because one of the 
> recipients is blocked on the pgsql-hackers for some reason**
>
> Hi!
>
> Done, the patch should apply to the current master now.
>
> Actually, I have an almost-finished version of the patch with the previously 
> requested asymmetric compression negotiation. I plan to attach it soon.

Thanks for providing the patch quickly, I have changed the status to
"Need Review".

Regards,
Vignesh




Re: [PATCH] Allow multiple recursive self-references

2021-07-14 Thread vignesh C
On Wed, Mar 31, 2021 at 7:28 PM Denis Hirn  wrote:
>
> Sorry, I didn't append the patch properly.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: DROP INDEX CONCURRENTLY on partitioned index

2021-07-14 Thread vignesh C
On Wed, Oct 28, 2020 at 6:14 AM Justin Pryzby  wrote:
>
> Forking this thread, since the existing CFs have been closed.
> https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd
>
> On Mon, Sep 14, 2020 at 09:31:03AM -0500, Justin Pryzby wrote:
> > On Sat, Sep 12, 2020 at 10:35:34AM +0900, Michael Paquier wrote:
> > > On Fri, Sep 11, 2020 at 07:13:01PM -0500, Justin Pryzby wrote:
> > > > On Tue, Sep 08, 2020 at 01:31:05PM +0900, Michael Paquier wrote:
> > > >> - CIC on partitioned relations.  (Should we also care about DROP INDEX
> > > >> CONCURRENTLY as well?)
> > > >
> > > > Do you have any idea what you think that should look like for DROP INDEX
> > > > CONCURRENTLY ?
> > >
> > > Making the maintenance of the partition tree consistent to the user is
> > > the critical part here, so my guess on this matter is:
> > > 1) Remove each index from the partition tree and mark the indexes as
> > > invalid in the same transaction.  This makes sure that after commit no
> > > indexes would get used for scans, and the partition dependency tree
> > > is completely removed with the parent table.  That's close to what we
> > > do in index_concurrently_swap() except that we just want to remove the
> > > dependencies with the partitions, and not just swap them of course.
> > > 2) Switch each index to INDEX_DROP_SET_DEAD, one per transaction
> > > should be fine as that prevents inserts.
> > > 3) Finish the index drop.
> > >
> > > Step 2) and 3) could be completely done for each index as part of
> > > index_drop().  The tricky part is to integrate 1) cleanly within the
> > > existing dependency machinery while still knowing about the list of
> > > partitions that can be removed.  I think that this part is not that
> > > straight-forward, but perhaps we could just make this logic part of
> > > RemoveRelations() when listing all the objects to remove.
> >
> > Thanks.
> >
> > I see three implementation ideas..
> >
> > 1. I think your way has the issue that the dependencies are lost.  If
> > there's an
> > interruption, the user may be left with hundreds or thousands of detached
> > indexes to clean up.  This is strange since there's actually no detach 
> > command
> > for indexes (but they're implicitly "attached" when a matching parent index 
> > is
> > created).  A 2nd issue is that DETACH currently requires an exclusive lock 
> > (but
> > see Alvaro's WIP patch).
>
> I think this is a critical problem with this approach.  It's not ok if a
> failure leaves behind N partition indexes not attached to any parent.
> They may have pretty different names, which is a mess to clean up.
>
> > 2. Maybe the easiest way is to mark all indexes invalid and then drop all
> > partitions (concurrently) and then the partitioned parent.  If interrupted,
> > this would leave a parent index marked "invalid", and some child tables 
> > with no
> > indexes.  I think this may be "ok".  The same thing is possible if a 
> > concurrent
> > build is interrupted, right ?
>
> I think adding the "invalid" mark in the simple/naive way isn't enough - it 
> has
> to do everything DROP INDEX CONCURRENTLY does (of course).
>
> > 3. I have a patch which changes index_drop() to "expand" a partitioned 
> > index into
> > its list of children.  Each of these becomes a List:
> > | indexId, heapId, userIndexRelation, userHeapRelation, heaplocktag, 
> > heaprelid, indexrelid
> > The same process is followed as for a single index, but handling all 
> > partitions
> > at once in two transactions total.  Arguably, this is bad since that 
> > function
> > currently takes a single Oid but would now ends up operating on a list of 
> > indexes.
>
> This is what's implemented in the attached.  It's very possible I've missed
> opportunities for better simplification/integration.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-07-14 Thread vignesh C
On Tue, Mar 30, 2021 at 2:14 AM Mark Rofail  wrote:
>
> Hey Alvaro
>
>> Yes, we should do that.
>
> I have attached v12 with more tests in “ src/test/regress/sql/gin.sql”
>
> Changelog:
> - v12 (compatible with current master 2021/03/29, commit 
> 6d7a6feac48b1970c4cd127ee65d4c487acbb5e9)
> * add more tests to “ src/test/regress/sql/gin.sql”
> * merge Andreas' edits to the GIN patch
>
> Also, @Andreas Karlsson, waiting for your edits to "pg_constraint"

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Implementing Incremental View Maintenance

2021-07-14 Thread vignesh C
On Mon, May 17, 2021 at 10:08 AM Yugo NAGATA  wrote:
>
> On Fri, 7 May 2021 14:14:16 +0900
> Yugo NAGATA  wrote:
>
> > On Mon, 26 Apr 2021 16:03:48 +0900
> > Yugo NAGATA  wrote:
> >
> > > On Mon, 26 Apr 2021 15:46:21 +0900
> > > Yugo NAGATA  wrote:
> > >
> > > > On Tue, 20 Apr 2021 09:51:34 +0900
> > > > Yugo NAGATA  wrote:
> > > >
> > > > > On Mon, 19 Apr 2021 17:40:31 -0400
> > > > > Tom Lane  wrote:
> > > > >
> > > > > > Andrew Dunstan  writes:
> > > > > > > This patch (v22c) just crashed for me with an assertion failure on
> > > > > > > Fedora 31. Here's the stack trace:
> > > > > >
> > > > > > > #2  0x0094a54a in ExceptionalCondition
> > > > > > > (conditionName=conditionName@entry=0xa91dae 
> > > > > > > "queryDesc->sourceText !=
> > > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > > > > > fileName=fileName@entry=0xa91468
> > > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > > > > > lineNumber=lineNumber@entry=199) at
> > > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > > > > >
> > > > > > That assert just got added a few days ago, so that's why the patch
> > > > > > seemed OK before.
> > > > >
> > > > > Thank you for letting me know. I'll fix it.
> > > >
> > > > Attached is the fixed patch.
> > > >
> > > > queryDesc->sourceText cannot be NULL after commit b2668d8,
> > > > so now we pass an empty string "" for refresh_matview_datafill() 
> > > > instead NULL
> > > > when maintaining views incrementally.
> > >
> > > I am sorry, I forgot to include a fix for 8aba9322511.
> > > Attached is the fixed version.
> >
> > Attached is the rebased patch (for 6b8d29419d).
>
> I attached a rebased patch.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Make Append Cost aware of some run time partition prune case

2021-07-14 Thread vignesh C
On Thu, Mar 4, 2021 at 9:51 AM Andy Fan  wrote:
>
>
>>
>> I have implemented a new one, which only handles 1 level of partitioned 
>> table, and
>> only 1 partition key, and only handles the eq operators like partkey = $1 /
>> partkey in ($1, $2)
>> / partkey = $1 or partkey = $2;  The patch works well in my use case.  I can
>> send
>> one on the latest master shortly, and hope it is helpful for you as well.
>>
>
> Hi:
>
> Here is the new patch for this topic,  which has proved works in my limited 
> user
> case (apply the similar logic on pg11), and it is incomplete since more cases
> should be covered but not.  Uploading this patch now is just for discussing 
> and
> testing.
>
> Design principle:
>
> 1). the cost of AppendPath should be reduced for either init partition prune 
> or run
> time partition prune. All of the startup_cost, total_cost, rows should be
> adjusted. As for the startup_cost, we should adjust it carefully, or else
> we can
> get a run_time_cost less than 0.
>
> 2). When we merge the subpath from sub-partitioned rel via
> accumulate_append_subpath, currently we just merge the subpaths and discard 
> the
> cost/rows in AppendPath.  Once this feature is in place, we may consider
> using
> the AppendPath's cost as well during this stage.
>
> 3). When a join is involved, AppendPath is not enough, since the estimated rows
> for
> a join relation only consider rel1->rows and rel2->rows; the Path.rows is not
> considered.  So we need to adjust rel->rows as well, and only for the init
> partition prune case.  (appendrel->rows for planning-time partition prune is
> handled already.)
>
> The biggest problem of the above is I don't know how to adjust the cost for
> Parallel Append Path. Currently I just ignore it, which would cause some query
> should use Parallel Append Path but not.
>
> Something I don't really want to handle unless we can establish its value:
> 1. Operators like >, <, BETWEEN ... AND.
> 2. An incomplete partition key appearing in the prunequals, e.g. we have
> partition key (a,
> b) but only a = 1 is used as a restrictinfo.
>
> The main reasons I don't want to handle them are: a) it would be uncommon;
> b) it introduces extra complexity; c) we can't estimate such cases well
> anyway -- for partkey > $1,
> what would the prune ratio be?
>
> Something I don't handle so far are: 1). accumulate_append_subpath
> stuff. 2). MergeAppend.  3). Multi Partition key.
>

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Have I found an interval arithmetic bug?

2021-07-14 Thread Zhihong Yu
On Thu, Jul 8, 2021 at 10:22 AM Zhihong Yu  wrote:

>
>
> On Wed, Jun 30, 2021 at 9:35 AM Bruce Momjian  wrote:
>
>> On Tue, Jun 29, 2021 at 06:49:45PM +0200, Daniel Gustafsson wrote:
>> > > On 29 Jun 2021, at 18:50, Zhihong Yu  wrote:
>> >
>> > > Now that PG 15 is open for commit, do you think the patch can land ?
>> >
>> > Adding it to the commitfest patch tracker is a good way to ensure it's
>> not
>> > forgotten about:
>> >
>> >   https://commitfest.postgresql.org/33/
>>
>> OK, I have been keeping it in my git tree since I wrote it and will
>> apply it in the next few days.
>>
>> --
>>   Bruce Momjian  https://momjian.us
>>   EDB  https://enterprisedb.com
>>
>>   If only the physical world exists, free will is an illusion.
>>
>> Thanks, Bruce.
>
> Hopefully you can get to this soon.
>

Bruce:
Please see if the patch can be integrated now.

Cheers


Re: [PATCH] New default role allowing to change per-role/database settings

2021-07-14 Thread vignesh C
On Wed, Apr 7, 2021 at 5:23 PM Michael Banck  wrote:
>
> Hi,
>
> Am Dienstag, den 06.04.2021, 15:37 +0200 schrieb Michael Banck:
> > Am Montag, den 05.04.2021, 14:33 -0400 schrieb Stephen Frost:
> > > Should drop the 'DEFAULT_' to match the others since the rename to
> > > 'predefined' roles went in.
> >
> > Right, will send a rebased patch ASAP.
>
> Here's a rebased version that also removes the ALTER DATABASE SET
> capability from this new predefined role and adds some documentation.
>

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-14 Thread vignesh C
On Mon, Apr 19, 2021 at 5:18 PM David Rowley  wrote:
>
> On Wed, 3 Mar 2021 at 22:37, David Rowley  wrote:
> > I've attached a rebased patch.
>
> I've rebased this again.
>
> I also moved away from using hash tables for storing references and
> libraries.  I was having some problems getting psql to compile due to
> the order of the dependencies being reversed due to the order being at
> the mercy of Perl's hash function. There's mention of this in
> Makefile.global.in:
>
> # libpq_pgport is for use by client executables (not libraries) that use 
> libpq.
> # We force clients to pull symbols from the non-shared libraries libpgport
> # and libpgcommon rather than pulling some libpgport symbols from libpq just
> # because libpq uses those functions too.  This makes applications less
> # dependent on changes in libpq's usage of pgport (on platforms where we
> # don't have symbol export control for libpq).  To do this we link to
> # pgport before libpq.  This does cause duplicate -lpgport's to appear
> # on client link lines, since that also appears in $(LIBS).
> # libpq_pgport_shlib is the same idea, but for use in client shared libraries.
>
> I switched these back to arrays but added an additional check to only
> add new items to the array if we don't already have an element with
> the same value.
>
> I've attached the diffs in the *.vcxproj files between patched and unpatched.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Release SPI plans for referential integrity with DISCARD ALL

2021-07-14 Thread vignesh C
On Wed, Mar 10, 2021 at 1:49 PM yuzuko  wrote:
>
> Hello,
>
> I thought about this suggestion again.
>
> Amit's patch suggested in the thread [1] can eliminate SPI plans from
> INSERT/UPDATE triggers, so our memory pressure issue would be solved.
> But as far as I can see in that thread, Amit's patch doesn't cover the DELETE case.
> It is not a common case, but there is a risk of extreme memory
> pressure.
> Considering that, this suggestion might still have good value as Corey
> said before.
>
> I improved a patch according to Peter's following comment :
> > but I think the
> > solution of dropping all cached plans as part of DISCARD ALL seems a bit
> > too extreme of a solution.  In the context of connection pooling,
> > getting a new session with pre-cached plans seems like a good thing, and
> > this change could potentially have a performance impact without a
> > practical way to opt out.
>
> In a new patch, I separated discarding SPI Plan caches for RI from DISCARD ALL
> and added a new option "RIPLANS" to the DISCARD command to do that.  Also, a new
> function, ResetRIPlanCache(), which clears the SPI plan caches, is called by
> DISCARD ALL
> or DISCARD RIPLANS.  The amount of modification is small and this option can 
> be
> removed instantly when it is no longer needed, so I think we can use
> this patch as
> a provisional solution.
>
> Any thoughts?

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, vignesh C wrote:

> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

BTW I gave a look at this patch in the March commitfest and concluded it
still requires some major surgery that I didn't have time for.  I did so
by re-reading early in the thread to understand what the actual
requirements were for this feature to work, and it seemed to me that the
patch started to derail at some point.  I suggest that somebody needs to
write up exactly what we need, lest the patches end up implementing
something else.

I don't have time for this patch during the current commitfest, and I'm
not sure that I will during the next one.  If somebody else wants to
spend time with it, ... be my guest.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)




Re: storing an explicit nonce

2021-07-14 Thread vignesh C
On Sat, Jun 26, 2021 at 2:52 AM Bruce Momjian  wrote:
>
> On Wed, May 26, 2021 at 05:02:01PM -0400, Bruce Momjian wrote:
> > For these reasons, if we decide to go in the direction of using a
> > non-LSN nonce, I no longer plan to continue working on this feature. I
> > would rather work on things that have a more positive impact.  Maybe a
> > non-LSN nonce is a better long-term plan, but there are too many
> > unknowns and complexity for me to feel comfortable with it.
>
> As stated above, I have no plans to continue working on this feature.  I
> am attaching my final patches here in case anyone wants to make use of
> them;  it passes check-world and all my private tests.  I have removed
> my patches from the feature wiki page:
>
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
>
> and replaced it with a link to this email.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: VACUUM (DISABLE_PAGE_SKIPPING on)

2021-07-14 Thread vignesh C
On Thu, Apr 8, 2021 at 11:40 PM Simon Riggs  wrote:
>
> On Thu, 8 Apr 2021 at 18:15, Alvaro Herrera  wrote:
> >
> > On 2021-Apr-08, Simon Riggs wrote:
> >
> > > On Thu, 8 Apr 2021 at 16:58, David Steele  wrote:
> >
> > > > It's not clear to me which patch is which, so perhaps move one CF entry
> > > > to next CF and clarify which patch is current?
> > >
> > > Entry: Maximize page freezing
> > > has this patch, perfectly fine, awaiting review from Alvaro since 29 Nov.
> > > one_freeze_then_max_freeze.v7.patch
> >
> > Oh dear, was this waiting on me?  It was not in my to-do, so I didn't
> > pay close attention.
>
> No problem, if I felt the idea deserved priority attention I would
> have pinged you.
>
> I think I'll open a new thread later to allow it to be discussed
> without confusion.

The patch no longer applies on the head, please post a rebased patch.

Regards,
Vignesh




Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-14 Thread Tom Lane
Peter Eisentraut  writes:
> In this particular case, I would for example be quite curious how those 
> alternative minimal C libraries such as musl-libc handle this.

Interesting question, so I took a look:

https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593

case 's':
a = arg.p ? arg.p : "(null)";
...

Any others you'd like to consider?

regards, tom lane




Re: row filtering for logical replication

2021-07-14 Thread Euler Taveira
On Wed, Jul 14, 2021, at 11:48 AM, Dilip Kumar wrote:
> On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
>  wrote:
> >
> 
> > Perhaps the best way forward is to stick to the approach that INSERT
> > uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
> > and leave anything fancier (like being able to reference both versions
> > of the row) for a future patch.
> 
> If UPDATE works as DELETE + INSERT, does that mean both the OLD row and
> the NEW row must satisfy the filter for the change to be sent?
> That means if we insert a row that does not satisfy the condition
> (which is not sent to the subscriber) and later update that row
> and change the values such that the modified row matches the filter,
> then we will not send it, because only the NEW row satisfies the
> condition but the OLD row doesn't.  I am just trying to understand your
> idea.  Or are you saying that in this case we will not send anything
> for the OLD row, as it did not satisfy the condition, but the
> modified row will be sent as an INSERT operation because it
> satisfies the condition?
That's a fair argument for the default UPDATE behavior. It seems we have a
consensus that the UPDATE operation will use the old row. If there are no
objections, I will change it in the next version.

We can certainly discuss the possibilities for UPDATE operations. It can choose
which row to use: old, new or both (an additional publication argument, or OLD
and NEW placeholders to reference the old and new rows, are feasible ideas).
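
Under that default, the choice of which tuple the filter is evaluated
against boils down to something like the sketch below (not the patch's
code; the fallback to the new tuple when no old tuple was logged is just
one possible choice):

#include "postgres.h"
#include "replication/reorderbuffer.h"

/*
 * Which tuple version the row filter would be evaluated against:
 * INSERT -> new row, DELETE -> old row, UPDATE -> old row (falling back
 * to the new row when the old tuple was not logged).
 */
static ReorderBufferTupleBuf *
tuple_for_row_filter(ReorderBufferChange *change)
{
    switch (change->action)
    {
        case REORDER_BUFFER_CHANGE_INSERT:
            return change->data.tp.newtuple;
        case REORDER_BUFFER_CHANGE_DELETE:
            return change->data.tp.oldtuple;
        case REORDER_BUFFER_CHANGE_UPDATE:
            return change->data.tp.oldtuple != NULL ?
                change->data.tp.oldtuple : change->data.tp.newtuple;
        default:
            return NULL;        /* truncate, message, etc.: no filtering */
    }
}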


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-14 Thread Tom Lane
I wrote:
> Interesting question, so I took a look:
> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
> case 's':
>   a = arg.p ? arg.p : "(null)";

BTW, the adjacent code shows that musl is also supporting glibc's
"%m" extension, so I imagine that they are endeavoring to be
compatible with glibc, and this goes along with that.  But that
just supports my larger point: printing "(null)" is clearly the
de facto standard now, whether or not POSIX has caught up with it.

regards, tom lane




free C string

2021-07-14 Thread Zhihong Yu
Hi,
I was looking at fmgr_internal_validator().

It seems prosrc is only used internally.

The patch frees the C string prosrc points to, prior to returning.

Please take a look.

Thanks


free-c-str.patch
Description: Binary data


Re: free C string

2021-07-14 Thread Tom Lane
Zhihong Yu  writes:
> I was looking at fmgr_internal_validator().
> It seems prosrc is only used internally.
> The patch frees the C string prosrc points to, prior to returning.

There's really very little point in adding such code.  Our memory
context mechanisms take care of minor leaks like this, with less
code and fewer cycles expended than explicit pfree calls require.
It's worth trying to clean up explicitly in code that might get
executed many times in a row, or might be allocating very big
temporary chunks; but fmgr_internal_validator hardly falls in
that category.
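
For illustration only (not code from the tree), the mechanism referred to
here is that anything palloc'd in a context goes away when that context is
reset or deleted, so a one-off allocation needs no explicit pfree():

#include "postgres.h"
#include "utils/memutils.h"

static void
demo(void)
{
    MemoryContext tmpcxt;
    MemoryContext oldcxt;
    char       *prosrc;

    tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
                                   "demo context",
                                   ALLOCSET_DEFAULT_SIZES);
    oldcxt = MemoryContextSwitchTo(tmpcxt);

    prosrc = pstrdup("some function body");    /* palloc'd in tmpcxt */
    (void) prosrc;              /* ... use it; no pfree() needed ... */

    MemoryContextSwitchTo(oldcxt);
    MemoryContextDelete(tmpcxt);    /* frees prosrc and everything else in it */
}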

regards, tom lane




Re: row filtering for logical replication

2021-07-14 Thread Euler Taveira
On Wed, Jul 14, 2021, at 12:08 PM, Tomas Vondra wrote:
> Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps 
> we could ignore this for subscriptions without DELETE.
... and UPDATE. It seems we have a consensus to use old row in the row filter
for UPDATEs. I think you meant publication.

> The other question is when to check/enforce this. I guess we'll have to 
> do that during decoding, not just when the publication is being created, 
> because the user can do ALTER TABLE later.
I'm afraid this check during decoding has a considerable cost. If we want to
enforce this condition, I suggest that we add it to CREATE PUBLICATION, ALTER
PUBLICATION ... ADD|SET TABLE and ALTER TABLE ... REPLICA IDENTITY. Data are
being constantly modified; schema is not.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: SI messages sent when excuting ROLLBACK PREPARED command

2021-07-14 Thread Tom Lane
"liuhuail...@fujitsu.com"  writes:
> So, I think we needn't send SI messages when rolling back the two-phase
> transaction.
> Or Does it has something special because of two-phase transaction?

Hmmm, yeah, I think you're right.  It probably doesn't make a big
difference in the real world --- anyone who's dependent on the
performance of 2PC rollbacks is Doing It Wrong.  But we'd have
already done LocalExecuteInvalidationMessage when getting out of
the prepared transaction, so no other SI invals should be needed.

regards, tom lane




Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread Ranier Vilela
Em qua., 14 de jul. de 2021 às 07:14, David Rowley 
escreveu:

> On Tue, 13 Jul 2021 at 15:15, David Rowley  wrote:
> > In theory, we likely could get rid of the small regression by having
> > two versions of ExecSort() and setting the correct one during
> > ExecInitSort() by setting the function pointer to the version we want
> > to use in sortstate->ss.ps.ExecProcNode.
>
> Just to see how it would perform, I tried what I mentioned above. I've
> included what I ended up with in the attached POC patch.
>
> I got the following results on my AMD hardware.
>
> Test master v8 patch comparison
> Test1   448.0   671.7   149.9%
> Test2   316.4   317.5   100.3%
> Test3   299.5   381.6   127.4%
> Test4   219.7   229.5   104.5%
> Test5   226.3   254.6   112.5%
> Test6   197.9   217.9   110.1%
> Test7   179.2   185.3   103.4%
> Test8   389.2   544.8   140.0%
>
I'm a little surprised by your results.
Test1 and Test8 look pretty good to me.
What are the compiler and environment?

I repeated (3 times) the benchmark with v8 here,
and the results were not good.


        HEAD        v6          v7b         v8          v6 vs head  v8 vs v6  v8 vs v7b
Test1   288,149636  449,018541  550,48505   468,168165  155,83%     104,26%    85,05%
Test2    94,766955   95,451406   94,718982   94,800275  100,72%      99,32%   100,09%
Test3   190,521319  260,279802  278,115296  262,538383  136,61%     100,87%    94,40%
Test4    78,779344   78,253455   77,941482   78,471546   99,33%     100,28%   100,68%
Test5   131,362614  142,662223  149,639041  144,849303  108,60%     101,53%    96,80%
Test6   112,884298  124,181671  127,58497   124,29376   110,01%     100,09%    97,42%
Test7    69,308587   68,643067   69,087544   69,437312   99,04%     101,16%   100,51%
Test8   243,674171  364,681142  419,259703  369,239176  149,66%     101,25%    88,07%



> This time I saw no regression on tests 2, 4 and 7.
>
> I looked to see if there was anywhere else in the executor that
> conditionally uses a different exec function in this way and found
> nothing, so I'm not too sure if it's a good idea to start doing this.
>
Specialized functions can be a way to optimize. The compilers themselves do
it.
But ExecSortTuple and ExecSortDatum would be very similar to each other,
which can cause maintenance problems.
I don't think in this case it would be a good idea.
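
For readers following along, the technique being debated is init-time
dispatch through a function pointer rather than a per-call branch; a
PostgreSQL-free toy model of the idea (all names invented):

#include <stdio.h>

/* Toy model of choosing the per-call function once at init time. */
struct sort_state
{
    int         (*exec) (struct sort_state *state);    /* like ExecProcNode */
    int         single_byval_datum;
};

static int
exec_sort_tuple(struct sort_state *state)
{
    (void) state;
    return printf("tuple sort path\n");
}

static int
exec_sort_datum(struct sort_state *state)
{
    (void) state;
    return printf("datum sort fast path\n");
}

static void
init_sort(struct sort_state *state, int single_byval_datum)
{
    state->single_byval_datum = single_byval_datum;
    /* decide once here, instead of branching inside every exec call */
    state->exec = single_byval_datum ? exec_sort_datum : exec_sort_tuple;
}

int
main(void)
{
    struct sort_state s;

    init_sort(&s, 1);
    s.exec(&s);                 /* caller never re-tests the flag */

    init_sort(&s, 0);
    s.exec(&s);
    return 0;
}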


>
> It would be good to get a 2nd opinion about this idea.  Also, more
> benchmark results with v6 and v8 would be good too.
>
Yes, results from another machine would be useful.
I would also like to see other results with v7b.

Attached the file with all results from v8.

regards,
Ranier Vilela
Benchmarks datumSort:

6) v8 David

a)
Test1
tps = 426.606950 (without initial connection time)
tps = 420.964492 (without initial connection time)
tps = 429.016435 (without initial connection time)
Test2
tps = 93.388625 (without initial connection time)
tps = 94.571572 (without initial connection time)
tps = 94.581301 (without initial connection time)
Test3
tps = 251.625641 (without initial connection time)
tps = 251.769007 (without initial connection time)
tps = 251.576880 (without initial connection time)
Test4
tps = 77.892592 (without initial connection time)
tps = 77.664981 (without initial connection time)
tps = 77.618023 (without initial connection time)
Test5
tps = 141.801858 (without initial connection time)
tps = 141.957810 (without initial connection time)
tps = 141.849105 (without initial connection time)
Test6
tps = 122.650449 (without initial connection time)
tps = 122.603506 (without initial connection time)
tps = 122.786432 (without initial connection time)
Test7
tps = 68.602538 (without initial connection time)
tps = 68.940470 (without initial connection time)
tps = 68.770827 (without initial connection time)
Test8
tps = 350.593188 (without initial connection time)
tps = 349.741689 (without initial connection time)
tps = 349.544567 (without initial connection time)

b)
Test1
tps = 430.025697 (without initial connection time)
tps = 427.884165 (without initial connection time)
tps = 428.708592 (without initial connection time)
Test2
tps = 94.207150 (without initial connection time)
tps = 93.821936 (without initial connection time)
tps = 93.647174 (without initial connection time)
Test3
tps = 251.784817 (without initial connection time)
tps = 251.336243 (without initial connection time)
tps = 251.431278 (without initial connection time)
Test4
tps = 77.884797 (without initial connection time)
tps = 77.413191 (without initial connection time)
tps = 77.569484 (without initial connection time)
Test5
tps = 141.787480 (without initial connection time)
tps = 142.344187 (without initial connection time)
tps = 141.819273 (without initial connection time)
Test6
tps = 122.848858 (without initial connection time)
tps = 122.935840 (without initial connection time)
tps = 123.559398 (without initial connection time)
Test7
tps = 68.854804 (without initial connection time)
tps = 68.929120 (without initial connection time)
tps = 68.779992 (without initial connection time)
Test8
tps = 349.630138 (without initial connection time)
tps = 

Re: Replacing pg_depend PIN entries with a fixed range check

2021-07-14 Thread John Naylor
On Thu, May 27, 2021 at 6:53 PM Tom Lane  wrote:

> Attached is a rebase over a4390abec.

Looks good to me overall, I just had a couple questions/comments:

isObjectPinned and isSharedObjectPinned are now thin wrappers around
IsPinnedObject. Is keeping those functions a matter of future-proofing in
case something needs to be handled differently someday, or reducing
unnecessary code churn?

setup_depend now doesn't really need to execute any SQL (unless third-party
forks have extra steps here?), and could be replaced with a direct call
to StopGeneratingPinnedObjectIds. That's a bit more self-documenting, and
that would allow shortening this comment:

 /*
* Note that no objects created after setup_depend() will be "pinned".
* They are all droppable at the whim of the DBA.
*/

--
John Naylor
EDB: http://www.enterprisedb.com


Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2021-07-14 Thread Alvaro Herrera
On 2020-Oct-27, Justin Pryzby wrote:

> I think either way could be ok - if you assume that the trigger was disabled
> with ONLY, then it'd make sense to restore it with ONLY, but I think it's at
> least as common to ALTER TABLE [*].  It might look weird to the user if they
> used ALTER TABLE ONLY and the pg_dump output uses ALTER TABLE for that table,
> and then again for all its children (or vice versa).  But it's fine as long as
> the state is correctly restored.
> 
> There are several issues:
>  - fail to preserve childs' tgenabled in CREATE TABLE PARTITION OF;
>  - fail to preserve childs' tgenabled in pg_dump;
>  - fail to preserve childs' comments in pg_dump;
> 
> I'm going step away from this patch at least for awhile, so I'm attaching my
> latest in case it's useful.

Here's a new cut of this series.  I used your pg_dump patch, but I blank
out the CREATE TRIGGER query before stashing the ALTER TRIGGER;
otherwise the dump has an error at restore time (because the trigger is
created again on the partition, but it already exists because it's been
created for the parent).  Also, the new query has the "OR tgenabled <>"
test only if the table is a partition; and apply this new query only in
11 and 12; keep 9.x-10 unchanged, because it cannot possibly match
anything.

I tested this by creating 10k tables with one trigger each (no
partitioned tables).  Total time to dump is the same as before.  I was
concerned that because the query now has two LEFT JOINs it would be
slower; but it seems to be only marginally so.

I'm thinking to apply my patch that changes the server behavior only to
14 and up.  I could be persuaded to backpatch all the way to 11, if
anybody supports the idea.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
>From 29b595d02af124900d9f1e9655f6fe3d2725bfd1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 16 Oct 2020 10:58:54 -0300
Subject: [PATCH v3 1/2] Preserve firing-on state when cloning row triggers to
 partitions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When triggers are cloned from partitioned tables to their partitions,
the 'tgenabled' flag (origin/replica/always/disable) was not propagated.
Make it so that the flag on the trigger on partition is initially set to
the same value as on the partitioned table.

Add a test case to verify the behavior.

Backpatch to 14.  The original behavior, which appeared in pg11 with
commit 86f575948c77 doesn't really make much sense, but it seems risky
to change it on established branches.

Author: Álvaro Herrera 
Reported-by: Justin Pryzby 
Discussion: https://postgr.es/m/20200930223450.ga14...@telsasoft.com
---
 src/backend/commands/tablecmds.c   |  4 +-
 src/backend/commands/trigger.c | 30 +++---
 src/include/commands/trigger.h |  5 +++
 src/test/regress/expected/triggers.out | 56 ++
 src/test/regress/sql/triggers.sql  | 32 +++
 5 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96375814a8..dd2aefe1f3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17736,10 +17736,10 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 		trigStmt->initdeferred = trigForm->tginitdeferred;
 		trigStmt->constrrel = NULL; /* passed separately */
 
-		CreateTrigger(trigStmt, NULL, RelationGetRelid(partition),
+		CreateTriggerFiringOn(trigStmt, NULL, RelationGetRelid(partition),
 	  trigForm->tgconstrrelid, InvalidOid, InvalidOid,
 	  trigForm->tgfoid, trigForm->oid, qual,
-	  false, true);
+	  false, true, trigForm->tgenabled);
 
 		MemoryContextSwitchTo(oldcxt);
 		MemoryContextReset(perTupCxt);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 952c8d582a..6d4b7ee92a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -151,6 +151,24 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			  Oid relOid, Oid refRelOid, Oid constraintOid, Oid indexOid,
 			  Oid funcoid, Oid parentTriggerOid, Node *whenClause,
 			  bool isInternal, bool in_partition)
+{
+	return
+		CreateTriggerFiringOn(stmt, queryString, relOid, refRelOid,
+			  constraintOid, indexOid, funcoid,
+			  parentTriggerOid, whenClause, isInternal,
+			  in_partition, TRIGGER_FIRES_ON_ORIGIN);
+}
+
+/*
+ * Like the above; additionally the firing condition
+ * (always/origin/replica/disabled) can be specified.
+ */
+ObjectAddress
+CreateTriggerFiringOn(CreateTrigStmt *stmt, const char *queryString,
+	  Oid relOid, Oid refRelOid, Oid constraintOid,
+	  Oid indexOid, Oid funcoid, Oid parentTriggerOid,
+	  Node *whenClause, bool isInternal, bool in_partition,
+	  char trigger_fires_when)
 {
 	int16		tgtype;
 	int			ncol

Re: [HACKERS] Preserving param location

2021-07-14 Thread Tom Lane
Julien Rouhaud  writes:
> On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote:
>> So I'm not really convinced that there's a fully-baked use case
>> here, and would like more detail about how you hope to use the
>> location value.

> As I said I have no doubt that there are other cases which are currently not
> correctly handled, but that's not what I'm trying to fix.  I just want to 
> treat
> parameterized queries the same way as regular queries, thus fixing one obvious
> and frequent case (for which some user did complain), hoping that it will on
> its own already help many users.

Hm.  I guess this point (i.e. that the Param substitution should result
in the identical plan tree as writing a literal constant would have)
has some force.  I still feel like your application is pretty shaky,
but I don't really have any ideas about feasible ways to make it better.

Will push the patch.

regards, tom lane




Remove redundant Assert(PgArchPID == 0); in PostmasterStateMachine

2021-07-14 Thread Bharath Rupireddy
Hi,

It looks like the commit d75288fb [1] added an unnecessary
Assert(PgArchPID == 0); in PostmasterStateMachine as the if block code
gets hit only when PgArchPID == 0. PSA small patch.

[1]
commit d75288fb27b8fe0a926aaab7d75816f091ecdc27
Author: Fujii Masao 
Date:   Mon Mar 15 13:13:14 2021 +0900

Make archiver process an auxiliary process.

Regards,
Bharath Rupireddy.


v1-0001-Remove-redundant-Assert-PgArchPID-0-inPostmasterS.patch
Description: Binary data


Re: PROXY protocol support

2021-07-14 Thread Jacob Champion
On Mon, 2021-07-12 at 18:28 +0200, Magnus Hagander wrote:
> Yeah, I have no problem being stricter than necessary, unless that
> actually causes any interop problems. It's a lot worse to not be
> strict enough..

Agreed. Haven't heard back from the HAProxy mailing list yet, so
staying strict seems reasonable in the meantime. That could always be
rolled back later.

> > I've been thinking more about your earlier comment:
> > 
> > > An interesting thing is what to do about
> > > inet_server_addr/inet_server_port. That sort of loops back up to the
> > > original question of where/how to expose the information about the
> > > proxy in general (since right now it just logs). Right now you can
> > > actually use inet_server_port() to see if the connection was proxied
> > > (as long as it was over tcp).
> > 
> > IMO these should return the "virtual" dst_addr/port, instead of
> > exposing the physical connection information to the client. That way,
> > if you intend for your proxy to be transparent, you're not advertising
> > your network internals to connected clients. It also means that clients
> > can reasonably expect to be able to reconnect to the addr:port that we
> > give them, and prevents confusion if the proxy is using an address
> > family that the client doesn't even support (e.g. the client is IPv4-
> > only but the proxy connects via IPv6).
> 
> That reasoning I think makes a lot of sense, especially with the
> comment about being able to connect back to it.
> 
> The question at that point extends to, would we also add extra
> functions to get the data on the proxy connection itself? Maybe add a
> inet_proxy_addr()/inet_proxy_port()? Better names?

What's the intended use case? I have trouble viewing those as anything
but information disclosure vectors, but I'm jaded. :)

If the goal is to give a last-ditch debugging tool to someone whose
proxy isn't behaving properly -- though I'd hope the proxy in question
has its own ways to debug its behavior -- maybe they could be locked
behind one of the pg_monitor roles, so that they're only available to
someone who could get that information anyway?

> PFA a patch that fixes the above errors, and changes
> inet_server_addr()/inet_server_port(). Does not yet add anything to
> receive the actual local port in this case.

Looking good in local testing. I'm going to reread the spec with fresh
eyes and do a full review pass, but this is shaping up nicely IMO.

Something that I haven't thought about very hard yet is proxy
authentication, but I think the simple IP authentication will be enough
for a first version. For the Unix socket case, it looks like anyone
currently relying on peer auth will need to switch to a
unix_socket_group/_permissions model. For now, that sounds like a
reasonable v1 restriction, though I think not being able to set the
proxy socket's permissions separately from the "normal" one might lead
to some complications in more advanced setups.

--Jacob


Re: free C string

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 10:17 AM Tom Lane  wrote:

> Zhihong Yu  writes:
> > I was looking at fmgr_internal_validator().
> > It seems prosrc is only used internally.
> > The patch frees the C string prosrc points to, prior to returning.
>
> There's really very little point in adding such code.  Our memory
> context mechanisms take care of minor leaks like this, with less
> code and fewer cycles expended than explicit pfree calls require.
> It's worth trying to clean up explicitly in code that might get
> executed many times in a row, or might be allocating very big
> temporary chunks; but fmgr_internal_validator hardly falls in
> that category.
>
> regards, tom lane
>
Hi,
How about this occurrence, which is in a loop?

Thanks


c-str-free.patch
Description: Binary data


Re: row filtering for logical replication

2021-07-14 Thread Euler Taveira
On Wed, Jul 14, 2021, at 8:21 AM, Greg Nancarrow wrote:
> Some minor v19 patch review points you might consider for your next
> patch version:
Greg, thanks for another review. I agree with all of these changes. It will be
in the next patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 11:02 AM Alvaro Herrera 
wrote:

> On 2020-Oct-27, Justin Pryzby wrote:
>
> > I think either way could be ok - if you assume that the trigger was
> disabled
> > with ONLY, then it'd make sense to restore it with ONLY, but I think
> it's at
> > least as common to ALTER TABLE [*].  It might look weird to the user if
> they
> > used ALTER TABLE ONLY and the pg_dump output uses ALTER TABLE for that
> table,
> > and then again for all its children (or vice versa).  But it's fine as
> long as
> > the state is correctly restored.
> >
> > There are several issues:
> >  - fail to preserve childs' tgenabled in CREATE TABLE PARTITION OF;
> >  - fail to preserve childs' tgenabled in pg_dump;
> >  - fail to preserve childs' comments in pg_dump;
> >
> > I'm going step away from this patch at least for awhile, so I'm
> attaching my
> > latest in case it's useful.
>
> Here's a new cut of this series.  I used your pg_dump patch, but I blank
> out the CREATE TRIGGER query before stashing the ALTER TRIGGER;
> otherwise the dump has an error at restore time (because the trigger is
> created again on the partition, but it already exists because it's been
> created for the parent).  Also, the new query has the "OR tgenabled <>"
> test only if the table is a partition; and apply this new query only in
> 11 and 12; keep 9.x-10 unchanged, because it cannot possibly match
> anything.
>
> I tested this by creating 10k tables with one trigger each (no
> partitioned tables).  Total time to dump is the same as before.  I was
> concerned that because the query now has two LEFT JOINs it would be
> slower; but it seems to be only marginally so.
>
> I'm thinking to apply my patch that changes the server behavior only to
> 14 and up.  I could be persuaded to backpatch all the way to 11, if
> anybody supports the idea.
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W  —
> https://www.EnterpriseDB.com/
> "Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
>

Hi, Alvaro:
It would be nice if this is backported to PG 11+

Thanks


pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread Dave Cramer
Upgrading from
10.5 to 13.3 using pg_upgrade -k

The following is the result of an upgrade

select * from pg_extension ;
  oid  |      extname       | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+--------------------+----------+--------------+----------------+------------+-----------+--------------
 12910 | plpgsql            |       10 |           11 | f              | 1.0        |           |
 16403 | pg_stat_statements |       10 |         2200 | t              | 1.5        |           |
(2 rows)

test=# \df+ pg_stat_statements_reset

List of functions
-[ RECORD 1 ]-------+---------------------------
Schema              | public
Name                | pg_stat_statements_reset
Result data type    | void
Argument data types |
Type                | func
Volatility          | volatile
Parallel            | safe
Owner               | davec
Security            | invoker
Access privileges   | davec=X/davec             +
                    | pg_read_all_stats=X/davec
Language            | c
Source code         | pg_stat_statements_reset
Description         |

And this is from creating the extension in a new db on the same instance

foo=# select * from pg_extension ;
  oid  |      extname       | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+--------------------+----------+--------------+----------------+------------+-----------+--------------
 12910 | plpgsql            |       10 |           11 | f              | 1.0        |           |
 16393 | pg_stat_statements |       10 |         2200 | t              | 1.8        |           |
(2 rows)

foo=# \df+ pg_stat_statements_reset

List of functions
-[ RECORD 1 ]-------+--------------------------------------------------------------------
Schema              | public
Name                | pg_stat_statements_reset
Result data type    | void
Argument data types | userid oid DEFAULT 0, dbid oid DEFAULT 0, queryid bigint DEFAULT 0
Type                | func
Volatility          | volatile
Parallel            | safe
Owner               | davec
Security            | invoker
Access privileges   | davec=X/davec
Language            | c
Source code         | pg_stat_statements_reset_1_7
Description         |

Notice the upgraded version is 1.5 and the new version is 1.8

I would think somewhere in the upgrade of the schema there should have been
a create extension pg_stat_statements ?

Dave
Dave Cramer


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wednesday, July 14, 2021, Dave Cramer  wrote:

>
>
> Notice the upgraded version is 1.5 and the new version is 1.8
>
> I would think somewhere in the upgrade of the schema there should have
> been a create extension pg_stat_statements ?
>

That would be a faulty assumption.  Modules do not get upgraded during a
server version upgrade.  This is a good thing, IMO.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread Dave Cramer
On Wed, 14 Jul 2021 at 14:47, David G. Johnston 
wrote:

> On Wednesday, July 14, 2021, Dave Cramer  wrote:
>
>>
>>
>> Notice the upgraded version is 1.5 and the new version is 1.8
>>
>> I would think somewhere in the upgrade of the schema there should have
>> been a create extension pg_stat_statements ?
>>
>
> That would be a faulty assumption.  Modules do not get upgraded during a
> server version upgrade.  This is a good thing, IMO.
>

This is from the documentation of pg_upgrade

Install any custom shared object files (or DLLs) used by the old cluster
into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
some other source. Do not install the schema definitions, e.g., CREATE
EXTENSION pgcrypto, because these will be upgraded from the old cluster.
Also, any custom full text search files (dictionary, synonym, thesaurus,
stop words) must also be copied to the new cluster.

If indeed modules do not get upgraded then the above is confusing at best,
and misleading at worst.

Dave


> David J.
>
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wed, Jul 14, 2021 at 11:59 AM Dave Cramer  wrote:

>
>
> On Wed, 14 Jul 2021 at 14:47, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Wednesday, July 14, 2021, Dave Cramer  wrote:
>>
>>>
>>>
>>> Notice the upgraded version is 1.5 and the new version is 1.8
>>>
>>> I would think somewhere in the upgrade of the schema there should have
>>> been a create extension pg_stat_statements ?
>>>
>>
>> That would be a faulty assumption.  Modules do not get upgraded during a
>> server version upgrade.  This is a good thing, IMO.
>>
>
> This is from the documentation of pg_upgrade
>
> Install any custom shared object files (or DLLs) used by the old cluster
> into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
> some other source. Do not install the schema definitions, e.g., CREATE
> EXTENSION pgcrypto, because these will be upgraded from the old cluster.
> Also, any custom full text search files (dictionary, synonym, thesaurus,
> stop words) must also be copied to the new cluster.
>
> If indeed modules do not get upgraded then the above is confusing at best,
> and misleading at worst.
>
>
"Install ... files used by the old cluster" (which must be binary
compatible with the new cluster as noted elsewhere on that page) supports
the claim that it is the old cluster's version that is going to result.
But I agree that saying "because these will be upgraded from the old
cluster" is poorly worded and should be fixed to be more precise here.

Something like, "... because the installed extensions will be copied from
the old cluster during the upgrade."

David J.


Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-07-14 Thread Ibrar Ahmed
On Wed, Jul 14, 2021 at 1:41 AM Tom Lane  wrote:

> Ibrar Ahmed  writes:
> > The patch is failing the regression, @Tom Lane  can
> > you
> > please take a look at that.
>
> Seems to just need an update of the expected-file to account for test
> cases added recently.  (I take no position on whether the new results
> are desirable; some of these might be breaking the intent of the case.
> But this should quiet the cfbot anyway.)
>
> regards, tom lane
>
>
Thanks for the update.

The test case was added by commit "Add support for asynchronous execution"
"27e1f14563cf982f1f4d71e21ef24782a052" by Etsuro Fujita. He can comment
whether the new results are desirable or not.



-- 
Ibrar Ahmed


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread Dave Cramer
On Wed, 14 Jul 2021 at 15:09, David G. Johnston 
wrote:

> On Wed, Jul 14, 2021 at 11:59 AM Dave Cramer  wrote:
>
>>
>>
>> On Wed, 14 Jul 2021 at 14:47, David G. Johnston <
>> david.g.johns...@gmail.com> wrote:
>>
>>> On Wednesday, July 14, 2021, Dave Cramer  wrote:
>>>


 Notice the upgraded version is 1.5 and the new version is 1.8

 I would think somewhere in the upgrade of the schema there should have
 been a create extension pg_stat_statements ?

>>>
>>> That would be a faulty assumption.  Modules do not get upgraded during a
>>> server version upgrade.  This is a good thing, IMO.
>>>
>>
>> This is from the documentation of pg_upgrade
>>
>> Install any custom shared object files (or DLLs) used by the old cluster
>> into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
>> some other source. Do not install the schema definitions, e.g., CREATE
>> EXTENSION pgcrypto, because these will be upgraded from the old cluster.
>> Also, any custom full text search files (dictionary, synonym, thesaurus,
>> stop words) must also be copied to the new cluster.
>>
>> If indeed modules do not get upgraded then the above is confusing at
>> best, and misleading at worst.
>>
>>
> "Install ... files used by the old cluster" (which must be binary
> compatible with the new cluster as noted elsewhere on that page) supports
> the claim that it is the old cluster's version that is going to result.
> But I agree that saying "because these will be upgraded from the old
> cluster" is poorly worded and should be fixed to be more precise here.
>
> Something like, "... because the installed extensions will be copied from
> the old cluster during the upgrade."
>

This is still rather opaque. Without intimate knowledge of what changes
have occurred in each extension I have installed, how would I know what I
have to fix after the upgrade?

Seems to me extensions should either store some information in pg_extension
to indicate compatibility, or they should have some sort of upgrade script
which pg_upgrade would call to fix any problems (yes, I realize this is
hand-waving at the moment).

In this example the older version of pg_stat_statements works fine; it only
fails when I do a dump/restore of the new database, and then the error is
rather obtuse. IIRC, pg_dump wanted to revoke all from public on the
function pg_stat_statements_reset(), and that function could not be found,
yet the function is there. I don't believe we should be surprising our
users like this.
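
For what it's worth, the mismatch is at least easy to spot after the fact.
A diagnostic sketch (run it in each upgraded database; pg_available_extensions
is the standard view, nothing here is pg_upgrade-specific):

  -- list extensions whose installed version differs from what the new
  -- server's share directory provides as the default
  SELECT name, installed_version, default_version
  FROM pg_available_extensions
  WHERE installed_version IS NOT NULL
    AND installed_version IS DISTINCT FROM default_version;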

Dave

>
> David J.
>
>


Re: [PATCH] Hooks at XactCommand level

2021-07-14 Thread Tom Lane
Gilles Darold  writes:
> I have renamed the patch and the title of this proposal registered in 
> the commitfest "Xact/SubXact event callback at command start" to reflect 
> the last changes that do not include new hooks anymore.

Hmm, it doesn't seem like this has addressed my concern at all.
The callbacks are still effectively defined as "run during
start_xact_command", so they're not any less squishy semantically
than they were before.  The point of my criticism was that you
should move the call site to someplace that's more organically
connected to execution of commands.

Another thing I'm not too pleased with in this formulation is that it's
very unclear what the distinction is between XACT_EVENT_COMMAND_START
and SUBXACT_EVENT_COMMAND_START.  AFAICS, *every* potential use-case
for this would have to hook into both callback chains, and most likely
would treat the two events alike.  Plus, as you note, the goalposts
have suddenly been moved for the amount of overhead it's okay to have
in an XactCallback or SubXactCallback function.  So that might cause
problems for unrelated code.  It's probably better to not try to
re-use that infrastructure.
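
To make that concrete, here is a minimal sketch of what an extension would
have to do under the patch as posted.  (XACT_EVENT_COMMAND_START and
SUBXACT_EVENT_COMMAND_START are the patch's proposed event names, not
anything in core today, and do_something_at_command_start() is just a
placeholder for whatever the extension actually wants to run.)

  #include "postgres.h"
  #include "fmgr.h"
  #include "access/xact.h"

  PG_MODULE_MAGIC;

  void _PG_init(void);

  static void
  do_something_at_command_start(void)
  {
      /* placeholder: e.g. set up a savepoint for statement-level rollback */
  }

  static void
  my_xact_cb(XactEvent event, void *arg)
  {
      if (event == XACT_EVENT_COMMAND_START)      /* proposed event */
          do_something_at_command_start();
  }

  static void
  my_subxact_cb(SubXactEvent event, SubTransactionId mySubid,
                SubTransactionId parentSubid, void *arg)
  {
      if (event == SUBXACT_EVENT_COMMAND_START)   /* proposed event */
          do_something_at_command_start();        /* same logic again */
  }

  void
  _PG_init(void)
  {
      RegisterXactCallback(my_xact_cb, NULL);
      RegisterSubXactCallback(my_subxact_cb, NULL);
  }

Note that the two callbacks come out identical, which is the point: the
XACT/SUBXACT split buys the hook author nothing here.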



> The objective of this new callback is to be able to call user-defined 
> code before any new statement is executed. For example it can call a 
> rollback to savepoint if there was an error in the previous transaction 
> statement, which allow to implements Rollback at Statement Level at 
> server side using a PostgreSQL extension, see [1] .

Urgh.  Isn't this re-making the same mistake we made years ago, namely
trying to implement autocommit on the server side?  I fear this will
be a disaster even larger than that was, because if it's an extension
then pretty much no applications will be prepared for the new semantics.
I strongly urge you to read the discussions that led up to f85f43dfb,
and to search the commit history before that for mentions of
"autocommit", to see just how extensive the mess was.

I spent a little time trying to locate said discussions; it's harder
than it should be because we didn't have the practice of citing email
threads in the commit log at the time.  I did find

https://www.postgresql.org/message-id/flat/Pine.LNX.4.44.0303172059170.1975-10%40peter.localdomain#7ae26ed5c1bfbf9b22a420dfd8b8e69f

which seems to have been the proximate decision, and here are
a few threads talking about all the messes that were created
for JDBC etc:

https://www.postgresql.org/message-id/flat/3D793A93.703%40xythos.com#4a2e2d9bdf2967906a6e0a75815d6636
https://www.postgresql.org/message-id/flat/3383060E-272E-11D7-BA14-000502E740BA%40wellsgaming.com
https://www.postgresql.org/message-id/flat/Law14-F37PIje6n0ssr0bc1%40hotmail.com

Basically, changing transactional semantics underneath clients is
a horrid idea.  Having such behavior in an extension rather than
the core doesn't make it less horrid.  If we'd designed it to act
that way from day one, maybe it'd have been fine.  But as things
stand, we are quite locked into the position that this has to be
managed on the client side.
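
For the record, the client-side machinery for this already exists and needs
no server cooperation: psql's ON_ERROR_ROLLBACK and, if I'm not mistaken,
pgJDBC's autosave=always both wrap each statement in a savepoint.  Roughly,
the client does something like this around every statement (the table name
is only for illustration):

  BEGIN;
  SAVEPOINT s;
  INSERT INTO t VALUES (1);     -- on error: ROLLBACK TO SAVEPOINT s;
  RELEASE SAVEPOINT s;          -- on success: discard it and continue
  SAVEPOINT s;
  INSERT INTO t VALUES (2);
  RELEASE SAVEPOINT s;
  COMMIT;

In psql that behavior is enabled with \set ON_ERROR_ROLLBACK on.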



That point doesn't necessarily invalidate the value of having
some sort of hook in this general area.  But I would kind of like
to see another use-case, because I don't believe in this one.

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wed, Jul 14, 2021 at 12:21 PM Dave Cramer  wrote:

>
> On Wed, 14 Jul 2021 at 15:09, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> Something like, "... because the installed extensions will be copied from
>> the old cluster during the upgrade."
>>
>
> This is still rather opaque. Without intimate knowledge of what changes
> have occurred in each extension I have installed; how would I know what I
> have to fix after the upgrade.
>

The point of this behavior is that you don't have to fix anything after an
upgrade - so long as your current extension version works on the new
cluster.  If you are upgrading in such a way that the current extension and
the new cluster are not compatible, you need to not do that.  Upgrade
instead to a lesser server version where they are compatible.  Then upgrade
your extension to its newer version, changing any user code that such an
upgrade requires, and then upgrade the server again.
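
Concretely, and assuming pg_stat_statements as in this thread (a sketch,
not a full recipe): once the server upgrade is done and you want the newer
extension version, it is a single step per database:

  -- still at the old version right after pg_upgrade
  SELECT extname, extversion FROM pg_extension
  WHERE extname = 'pg_stat_statements';

  -- move to the version shipped with the new server
  ALTER EXTENSION pg_stat_statements UPDATE;   -- or UPDATE TO '1.8'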

David J.


Re: Replacing pg_depend PIN entries with a fixed range check

2021-07-14 Thread Tom Lane
John Naylor  writes:
> On Thu, May 27, 2021 at 6:53 PM Tom Lane  wrote:
>> Attached is a rebase over a4390abec.

> Looks good to me overall, I just had a couple questions/comments:

Thanks for looking!

> isObjectPinned and isSharedObjectPinned are now thin wrappers around
> IsPinnedObject. Is keeping those functions a matter of future-proofing in
> case something needs to be handled differently someday, or reducing
> unnecessary code churn?

Yeah, it was mostly a matter of reducing code churn.  We could probably
drop isSharedObjectPinned altogether, but isObjectPinned seems to have
some notational value in providing an API that takes an ObjectAddress.
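
For concreteness, the surviving wrapper is roughly this (a sketch based on
the discussion, not a quote of the committed code):

  /* convenience wrapper: same check, but taking an ObjectAddress */
  static bool
  isObjectPinned(const ObjectAddress *object)
  {
      return IsPinnedObject(object->classId, object->objectId);
  }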

> setup_depend now doesn't really need to execute any SQL (unless third-party
> forks have extra steps here?), and could be replaced with a direct call
> to StopGeneratingPinnedObjectIds. That's a bit more self-documenting, and
> that would allow shortening this comment:

Hm, I'm not following?  setup_depend runs in initdb, that is, on the
client side.  It can't invoke backend-internal functions any other
way than via SQL, AFAICS.

regards, tom lane




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-07-14 Thread Ibrar Ahmed
On Wed, Jul 14, 2021 at 6:51 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

>
> On 05.01.21 22:40, Paul Martinez wrote:
> > I've created a patch to better support referential integrity constraints
> when
> > using composite primary and foreign keys. This patch allows creating a
> foreign
> > key using the syntax:
> >
> >FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL
> (fk_id)
> >
> > which means that only the fk_id column will be set to NULL when the
> referenced
> > row is deleted, rather than both the tenant_id and fk_id columns.
>
> I think this is an interesting feature with a legitimate use case.
>
> I'm wondering a bit about what the ON UPDATE side of this is supposed to
> mean.  Consider your example:
>
> > CREATE TABLE tenants (id serial PRIMARY KEY);
> > CREATE TABLE users (
> >tenant_id int REFERENCES tenants ON DELETE CASCADE,
> >id serial,
> >PRIMARY KEY (tenant_id, id),
> > );
> > CREATE TABLE posts (
> >  tenant_id int REFERENCES tenants ON DELETE CASCADE,
> >  id serial,
> >  author_id int,
> >  PRIMARY KEY (tenant_id, id),
> >  FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET
> NULL
> > );
> >
> > INSERT INTO tenants VALUES (1);
> > INSERT INTO users VALUES (1, 101);
> > INSERT INTO posts VALUES (1, 201, 101);
> > DELETE FROM users WHERE id = 101;
> > ERROR:  null value in column "tenant_id" violates not-null constraint
> > DETAIL:  Failing row contains (null, 201, null).
>
> Consider what should happen when you update users.id.  Per SQL standard,
> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
> referencing column that corresponds to the referenced column actually
> updated, not all of them.  PostgreSQL doesn't do this, but if it did,
> then this would work just fine.
>
> Your feature requires specifying a fixed column or columns to update, so
> it cannot react differently to what column actually updated.  In fact,
> you might want different referential actions depending on what columns
> are updated, like what you can do with general triggers.
>
> So, unless I'm missing an angle here, I would suggest leaving out the ON
> UPDATE variant of this feature.
>
>
>
Patch does not apply on head; I am marking the status "Waiting on author":
http://cfbot.cputube.org/patch_33_2932.log

-- 
Ibrar Ahmed

