Re: Surely this code in setrefs.c is wrong?

2023-09-10 Thread David Rowley
On Sun, 10 Sept 2023 at 11:22, Tom Lane  wrote:
> if (!OidIsValid(saop->hashfuncid))
> record_plan_function_dependency(root, saop->hashfuncid);
>
> if (!OidIsValid(saop->negfuncid))
> record_plan_function_dependency(root, saop->negfuncid);
>
> Surely those if-conditions are exactly backward, and we should be
> recording nonzero hashfuncid and negfuncid entries, not zero ones.

That's certainly not coded as I intended. Perhaps I got my wires
crossed between OidIsValid and InvalidOid and, without reading
carefully, somehow thought OidIsValid tested for the inverse case.
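
For the record, the intended conditions would presumably be:

if (OidIsValid(saop->hashfuncid))
    record_plan_function_dependency(root, saop->hashfuncid);

if (OidIsValid(saop->negfuncid))
    record_plan_function_dependency(root, saop->negfuncid);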

I'll push fixes once the 16.0 release is out of the way.

David




Re: Cleaning up array_in()

2023-09-10 Thread Alexander Lakhin

Hello Jian,

05.09.2023 02:53, jian he wrote:

On Mon, Sep 4, 2023 at 8:00 AM jian he  wrote:

hi.
attached v4.
v4, 0001 to 0005 are the same as v3 in
https://www.postgresql.org/message-id/5859ce4e-2be4-92b0-c85c-e1e24eab57c6%40iki.fi

v4-0006 makes some modifications to address the corner cases mentioned
in the previous thread (like select '{{1,},{1},}'::text[]).
It also fixes all the FIXMEs Heikki mentioned in the code.

v4-0007 refactors ReadDimensionInt to keep the array dimension bound
variables within INT_MIN and INT_MAX, so that select
'[21474836488:21474836489]={1,2}'::int[]; fails.


attached: same as v4, but with the unused variable {nitems_according_to_dims} deleted.


Please look at the differences I've observed with the latest patches
applied (old vs. new behavior):

Case 1:
SELECT '{1,'::integer[];
ERROR:  malformed array literal: "{1,"
LINE 1: SELECT '{1,'::integer[];
   ^
DETAIL:  Unexpected end of input.

vs

ERROR:  malformed array literal: "{1,"
LINE 1: SELECT '{1,'::integer[];
   ^

(no DETAIL)

Case 2:
SELECT '{{},}'::text[];
ERROR:  malformed array literal: "{{},}"
LINE 1: SELECT '{{},}'::text[];
   ^
DETAIL:  Unexpected "}" character

vs
 text
--
 {}
(1 row)

Case 3:
select '{\{}'::text[];
 text
---
 {"{"}
(1 row)

vs
 text
--
 {""}

Best regards,
Alexander




Re: Inefficiency in parallel pg_restore with many tables

2023-09-10 Thread Tom Lane
Nathan Bossart  writes:
> I spent some more time on this patch and made the relevant adjustments to
> the rest of the set.

Hmm ... I do not like v7 very much at all.  It requires rather ugly
changes to all of the existing callers, and what are we actually
buying?  If anything, it makes things slower for pass-by-value items
like integers.  I'd stick with the Datum convention in the backend.

Instead, I took a closer look through the v6 patch set.
I think that's in pretty good shape and nearly committable,
but I have a few thoughts:

* I'm not sure about defining bh_node_type as a macro:

+#ifdef FRONTEND
+#define bh_node_type void *
+#else
+#define bh_node_type Datum
+#endif

rather than an actual typedef:

+#ifdef FRONTEND
+typedef void *bh_node_type;
+#else
+typedef Datum bh_node_type;
+#endif

My concern here is that bh_node_type is effectively acting as a
typedef, so that pgindent might misbehave if it's not declared as a
typedef.  On the other hand, there doesn't seem to be any indentation
problem in the patchset as it stands, and we don't expect any code
outside binaryheap.h/.c to refer to bh_node_type, so maybe it's fine.
(If you do choose to make it a typedef, remember to add it to
typedefs.list.)

* As a matter of style, I'd recommend adding braces in places
like this:

if (heap->bh_size >= heap->bh_space)
+   {
+#ifdef FRONTEND
+   pg_fatal("out of binary heap slots");
+#else
elog(ERROR, "out of binary heap slots");
+#endif
+   }
heap->bh_nodes[heap->bh_size] = d;

It's not wrong as you have it, but I think it's more readable
and less easy to accidentally break with the extra braces.

* In 0002, isn't the comment for binaryheap_remove_node wrong?

+ * Removes the nth node from the heap.  The caller must ensure that there are
+ * at least (n - 1) nodes in the heap.  O(log n) worst case.

Shouldn't that be "(n + 1)"?  Also, I'd specify "n'th (zero based) node"
for clarity.
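
Putting those two together, the comment would presumably become:

+ * Removes the n'th (zero based) node from the heap.  The caller must
+ * ensure that there are at least (n + 1) nodes in the heap.  O(log n)
+ * worst case.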

* I would say that this bit in 0004:

-   j = removeHeapElement(pendingHeap, heapLength--);
+   j = (intptr_t) binaryheap_remove_first(pendingHeap);

needs an explicit cast to int:

+   j = (int) (intptr_t) binaryheap_remove_first(pendingHeap);

otherwise some compilers might complain about the result possibly
not fitting in "j".

Other than those nitpicks, I like v6.  I'll mark this RfC.

regards, tom lane




Re: proposal: psql: show current user in prompt

2023-09-10 Thread Pavel Stehule
pá 8. 9. 2023 v 21:07 odesílatel Pavel Stehule 
napsal:

> Hi
>
> Another thing that should be described there is that this falls
>> outside of the transaction flow, i.e. its changes are not reverted on
>> ROLLBACK. But that leaves an important consideration: What happens
>> when an error occurs on the server during handling of this message
>> (e.g. the GUC does not exist or an OOM is triggered). Is any open
>> transaction aborted in that case? If not, we should have a test for
>> that.
>>
>
> I tested this scenario. I had to modify message handling to fix warning
> "message type 0x5a arrived from server while idle"
>

I fixed this issue. The problem was in the missing setting
`doing_extended_query_message`.


> But if this is inside a transaction, the transaction is aborted.
>
>>
>> +   if (PQresultStatus(res) != PGRES_COMMAND_OK)
>> +   pg_fatal("failed to link custom variable: %s",
>> PQerrorMessage(conn));
>> +   PQclear(res);
>>
>
> done
>
>
>>
>> The tests should also change the config after running
>> PQlinkParameterStatus/PQunlinkParameterStatus to show that the guc is
>> reported then or not reported then.
>>
>
> done
>
>
>>
>> +   if (!PQsendTypedCommand(conn, PqMsg_ReportGUC, 't', paramName))
>> +   return NULL;
>>
>>
>> I think we'll need some bikeshedding on what the protocol message
>> should look like exactly. I'm not entirely sure what is the most
>> sensible here, so please treat everything I write next as
>> suggestions/discussion:
>> I see that you're piggy-backing on PQsendTypedCommand, which seems
>> nice to avoid code duplication. It has one downside though: not every
>> type is valid for each command anymore.
>> One way to avoid that would be to not introduce a new command, but
>> only add a new type that is understood by Describe and Close, e.g. a
>> 'G' (for GUC). Then PqMsg_Describe, G would become the equivalent of
>> what the current patch calls PqMsg_ReportGUC, 't', and PqMsg_Close, G
>> would be the same as PqMsg_ReportGUC, 'f'.
>>
>
> I am sorry, I don't understand this idea?
>
>
>>
>> The rest of this email assumes that we're keeping your current
>> proposal for the protocol message, so it might not make sense to
>> address all of this feedback, in case we're still going to change the
>> protocol:
>>
>> +   if (is_set == 't')
>> +   {
>> +   SetGUCOptionFlag(name, GUC_REPORT);
>> +   status = "SET REPORT_GUC";
>> +   }
>> +   else
>> +   {
>> +   UnsetGUCOptionFlag(name, GUC_REPORT);
>> +   status = "UNSET REPORT_GUC";
>> +   }
>>
>> I think we should be strict about what we accept here. Any other value
>> than 'f'/'t' for is_set should result in an error imho.
>>
>
> done
>

Regards

Pavel


>
>
From 4a7c8b30f297189f6801076556a984baa51daa4f Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sun, 3 Sep 2023 19:14:24 +0200
Subject: [PATCH 3/4] - allow to connect to server with major protocol version
 3, minor version is ignored

- allow to read minor protocol version
---
 doc/src/sgml/libpq.sgml | 17 +
 src/include/libpq/pqcomm.h  |  2 +-
 src/interfaces/libpq/exports.txt|  1 +
 src/interfaces/libpq/fe-connect.c   | 12 +++-
 src/interfaces/libpq/fe-protocol3.c | 21 +
 src/interfaces/libpq/libpq-fe.h |  1 +
 6 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index a52baa27d5..d9e5502d09 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2576,6 +2576,23 @@ int PQprotocolVersion(const PGconn *conn);
  
 
 
+
+ PQprotocolVersionFull
+
+ 
+  
+   Returns the complete frontend/backend protocol version number.
+
+int PQprotocolVersionFull(const PGconn *conn);
+
+   The frontend/backend protocol version number is a value that combines
+   the major and minor protocol numbers. These numbers can be
+   extracted using the macros PG_PROTOCOL_MAJOR and
+   PG_PROTOCOL_MINOR.
+  
+ 
+
+
 
 PQserverVersion
 
diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
index 46a0946b8b..4ea4538157 100644
--- a/src/include/libpq/pqcomm.h
+++ b/src/include/libpq/pqcomm.h
@@ -94,7 +94,7 @@ is_unixsock_path(const char *path)
  */
 
 #define PG_PROTOCOL_EARLIEST	PG_PROTOCOL(3,0)
-#define PG_PROTOCOL_LATEST		PG_PROTOCOL(3,0)
+#define PG_PROTOCOL_LATEST		PG_PROTOCOL(3,1)
 
 typedef uint32 ProtocolVersion; /* FE/BE protocol version number */
 
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 7e101368d5..595a4808f2 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -193,3 +193,4 @@ PQsendClosePrepared   190
 PQsendClosePortal 191
 PQlinkParameterStatus 192
 PQunl
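
Client usage of the proposed function would presumably look like this
(a sketch, assuming the patch's API and that the PG_PROTOCOL_MAJOR /
PG_PROTOCOL_MINOR macros are visible to the client):

int full = PQprotocolVersionFull(conn);
printf("protocol %d.%d\n",
       PG_PROTOCOL_MAJOR(full), PG_PROTOCOL_MINOR(full));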

Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-10 Thread Ranier Vilela
Em sex., 8 de set. de 2023 às 03:24, Michael Paquier 
escreveu:

> On Wed, Sep 06, 2023 at 07:57:03AM -0300, Ranier Vilela wrote:
> > I think no one objected.
>
> Looking closer, there is much more inconsistency in this file
> depending on the routine called.  How about something like the v2
> attached instead to provide more context in the error message about
> the function called?

+1
But as Jeff mentioned, when the locale is NULL,
the provider is known to be COLLPROVIDER_LIBC.

I think we can use this to provide a more specific error message
when the locale is NULL.

What do you think about v3 attached?

best regards,
Ranier Vilela


pglocale-elogs-v3.patch
Description: Binary data


Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-10 Thread Michael Paquier
On Sun, Sep 10, 2023 at 06:28:08PM -0300, Ranier Vilela wrote:
> +1
> But as Jeff mentioned, when the locale is NULL,
> the provider is known to be COLLPROVIDER_LIBC.
> 
> I think we can use this to provide an error message,
> when the locale is NULL.
> 
> What do you think about v3 attached?

This does not apply for me on HEAD, and it seems to me that the patch
has some parts that apply on top of v2 (or v1?) while others would
apply to HEAD.

Anyway, what you are suggesting to change compared to v2 is that:

+   /*
+* if locale is NULL, then
+* the provider is known to be COLLPROVIDER_LIBC
+*/
if (!locale)
-   elog(ERROR, "unsupported collprovider");
+   elog(ERROR, "collprovider '%c' does not support (%s)", 
+   COLLPROVIDER_LIBC, "pg_strxfrm_prefix");

I'm OK with enforcing COLLPROVIDER_LIBC in this path, but I also value
consistency across all the error messages of this file.  After
sleeping on it, and as that's a set of elogs, "unsupported
collprovider" is fine for me across the board as these should not be
user-visible.

This should be made consistent down to 16, I guess, but only after
16.0 is tagged as we are in release week.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Add inline comments to the pg_hba_file_rules view

2023-09-10 Thread Michael Paquier
On Mon, Sep 04, 2023 at 12:54:15PM +0200, Jim Jones wrote:
> The patch slightly changes the test 004_file_inclusion.pl to accommodate the
> new column and the hba comments.
> 
> Discussion: 
> https://www.postgresql.org/message-id/flat/3fec6550-93b0-b542-b203-b0054aaee83b%40uni-muenster.de

Well, it looks like what I wrote a couple of days ago was perhaps
confusing:
https://www.postgresql.org/message-id/ZPHAiNp%2ByKMsa/vc%40paquier.xyz
https://www.postgresql.org/message-id/zpe8a7enuh+ax...@paquier.xyz

This patch touches hbafuncs.c and the system view pg_hba_file_rules,
but I don't think this stuff should touch any of these code paths.
That's what I meant in my second message: the SQL portion should be
usable for all types of configuration files, even pg_ident.conf and
postgresql.conf, and not only pg_hba.conf.  A new SQL function
returning a SRF made of the comments extracted and the line numbers 
can be joined with all the system views of the configuration files,
like sourcefile and sourceline in pg_settings, etc.
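
For example (the function name and signature here are hypothetical, just
to sketch the idea):

SELECT r.line_number, r.type, r.database, c.comment
  FROM pg_hba_file_rules r
  LEFT JOIN pg_conf_comments('pg_hba.conf')
         AS c(line_number int, comment text)
    ON r.line_number = c.line_number;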
--
Michael


signature.asc
Description: PGP signature


Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-09-10 Thread Richard Guo
On Fri, Sep 8, 2023 at 3:04 PM Ashutosh Bapat 
wrote:

> When the clause s.t1b = s.a is presented to distribute_qual_to_rels()
> it has form PHV(t1.b) = t2.b. The PHV's ph_eval_at is 4, which is what
> is returned as varno to pull_varnos(). The other Var in the clause has
> varno = 4 already, so pull_varnos() returns a SINGLETON relids (b 4).
> The clause is an equality clause, so it is used to create an
> Equivalence class.
> generate_base_implied_equalities_no_const() then constructs the same
> RestrictInfo again and adds to baserestrictinfo of Rel with relid = 4
> i.e. t2's baserestrictinfo. I don't know whether that's the right
> thing to do.


Well, I think that's what PHVs are supposed to do.  Quoting the README:

... Note that even with this restriction, pullup of a LATERAL
subquery can result in creating PlaceHolderVars that contain lateral
references to relations outside their syntactic scope.  We still evaluate
such PHVs at their syntactic location or lower, but the presence of such a
PHV in the quals or targetlist of a plan node requires that node to appear
on the inside of a nestloop join relative to the rel(s) supplying the
lateral reference.


> I am not sure where we are taking the original bug fix with this
> investigation. Is it required to fix this problem in order to fix the
> original problem OR we should commit the fix for the original problem
> and then investigate this further?


Fair point.  This seems to be a separate problem from the original, so
I'm okay with fixing them separately.

Thanks
Richard


Re: Impact of checkpointer during pg_upgrade

2023-09-10 Thread Dilip Kumar
On Fri, Sep 8, 2023 at 11:59 AM Amit Kapila  wrote:
>
> On Fri, Sep 8, 2023 at 10:10 AM Michael Paquier  wrote:
> >
> > On Fri, Sep 08, 2023 at 09:14:59AM +0530, Amit Kapila wrote:
> > > On Fri, Sep 8, 2023 at 9:00 AM Zhijie Hou (Fujitsu)
> > >  wrote:
> > >>>  I
> > >>> mean that doing the latter is beneficial for the sake of any patch 
> > >>> committed and
> > >>> as a long-term method to rely on.
> > >
> > > What is your worry here? Are you worried that unknowingly in the
> > > future we could add some other way to invalidate slots during upgrades
> > > that we won't be able to detect?
> >
> > Exactly.  A safety belt would not hurt, especially if the belt added
> > is simple.  The idea of a backend side elog(ERROR) with
> > isBinaryUpgrade is tempting in the invalidation slot path.
> >
>
> I agree with doing something simple. So, to conclude, we agree on two
> things in this thread: (a) set max_slot_wal_keep_size to -1 when starting
> the postmaster for the old cluster during the upgrade; (b) have an
> elog(ERROR) to avoid invalidating slots during the upgrade.

+1
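
For (b), a minimal sketch of the check, assuming it sits in the
slot-invalidation path (the actual patch may differ):

if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
    elog(ERROR, "replication slots must not be invalidated during the upgrade");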

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pg_logical_emit_message() misses a XLogFlush()

2023-09-10 Thread Fujii Masao




On 2023/08/16 16:51, Michael Paquier wrote:

Anyway, attached is a patch to add a 4th argument "flush" that
defaults to false.  Thoughts about this version are welcome.


When the "transactional" option is set to true, WAL including
the record generated by the pg_logical_emit_message() function is flushed
at the end of the transaction based on the synchronous_commit setting.
However, in the current patch, if "transactional" is set to false and
"flush" is true, the function flushes the WAL immediately without
considering synchronous_commit. Is this the intended behavior?
I'm not sure how the function should work in this case, though.

Though I don't understand the purpose of this option fully yet,
is flushing the WAL sufficient? Are there scenarios where the function
should ensure that the WAL is not only flushed but also replicated
to the standby?
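
For reference, the call under discussion would presumably look like this
with the patch applied (a sketch, not taken from the patch itself):

SELECT pg_logical_emit_message(false, 'prefix', 'payload'::bytea, true);

i.e. transactional = false and flush = true.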

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Synchronizing slots from primary to standby

2023-09-10 Thread shveta malik
On Fri, Sep 8, 2023 at 4:40 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shveta,
>
> I resumed to check the thread. Here are my high-level comments.
> Sorry if you have been already discussed.

Thanks Kuroda-san for the feedback.
>
> 01. General
>
> I think more documentation can be added, not only for the GUCs. How about
> adding examples for combinations of physical and logical replication?  You
> can say that the physical primary can also be a publisher and that slots
> on the primary/standby are synchronized.
>

I did not fully understand this. Can you please give a concrete example?
We are only synchronizing logical replication slots in this draft, and
only on a physical standby from the primary. So the last statement is
not completely true.

> 02. General
>
> standby_slot_names ensures that the physical standby is always ahead of
> the subscriber, but I think it may not be sufficient. There is a
> possibility that the primary server does not have any physical slots,
> yet the feature expects a slot to be present. In this case the physical
> standby may be behind the subscriber, and the system may be confused
> when a failover occurs.

Currently there is a check in the slot-sync worker which mandates that
a physical slot be present between the primary and standby for this
feature to proceed, so that confusing state will not arise.
+ /* WalRcvData is not set or primary_slot_name is not set yet */
+ if (!WalRcv || WalRcv->slotname[0] == '\0')
+ return naptime;

>Can't we specify the name of standby via application_name or something?

So do you mean that in the absence of a physical slot (if we plan to
support that), we let the primary know about the standby (the
slots-synchronization client) through application_name? I am not sure
about this and will think more on it. I would like to know others'
opinion on this as well.

>
> 03. General
>
> In this architecture, the syncslot worker is launched per db and they
> independently connects to primary, right?

Not completely true. Each slot-sync worker is responsible for managing
N dbs, where N = (number of distinct dbs for slots in
synchronize_slot_names) / (number of max_slotsync_workers configured)
for the case where the db count exceeds the number of configured
workers. And if dbcount < max_slotsync_workers, then we launch only as
many workers as there are dbs, and each worker manages a single db.
Each worker independently connects to the primary. Currently it makes
a connection multiple times; I am optimizing it to connect only once,
and then to reconnect after each SIGHUP since 'primary_conninfo' may
change. This change will be in the next version.
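
To illustrate the distribution with hypothetical numbers (a sketch, not
code from the patch):

/* dbcount = 10 distinct dbs, max_slotsync_workers = 4 */
nworkers = Min(dbcount, max_slotsync_workers);  /* => 4 workers launched */
dbs_per_worker = dbcount / nworkers;            /* => 2 dbs each ... */
remainder = dbcount % nworkers;                 /* => 2 workers get 1 extra */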


> I'm not sure it is efficient, but I came up with another architecture -
> only one worker (the syncslot receiver) connects to the primary, and the
> other workers (syncslot workers) receive info from it and do the updates.
> This can reduce the number of connections, which may slightly improve
> network latency. What do you think?
>

I feel it may help in reducing network latency, but I am not sure it
would be more efficient at keeping the LSNs in sync. It may introduce
lag because only one worker gets all the info from the primary while
the actual synchronizing workers wait on it. This lag may be larger
when the number of slots is huge. We have run some performance tests
on the currently implemented design; please have a look at the emails
around [1] and [2].

> 04. General
>
> test file recovery/t/051_slot_sync.pl is missing.
>

Yes, it was removed. Please see point 3 at [3].


> 04. ReplSlotSyncMain
>
> Does the worker have to connect to the specific database?
>
>
> ```
> /* Connect to our database. */
> BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid,
>   
> MySlotSyncWorker->userid,
>   
> 0);
> ```

Since we are using the libpq public interface 'walrcv_exec=libpqrcv_exec'
to connect to the primary, this needs a database connection. It errors
out in the absence of 'MyDatabaseId'. Do you think a db connection could
have some downsides?

>
> 05. SlotSyncInitSlotNamesLst()
>
> "Lst" should be "List".
>

Okay, I will change this in the next version.

==

[1]: 
https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com

thanks
Shveta




Re: proposal: possibility to read dumped table's name from file

2023-09-10 Thread Pavel Stehule
Hi

only rebase

Regards

Pavel
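
For readers skimming the patch below, a filter file in the format it
implements might look like this (hypothetical object names):

# dump everything in schema public, but skip one table's data
include schema public
exclude table_data public.audit_log
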
From b62d99f51da03b1fb8dae577fc49420bf36a3bad Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 126 
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1767 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c1e2220b3c..ae1cc522a8 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -835,6 +835,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: extensions, works like
+   -e/--extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can also be placed after a filter line. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1165,6 +1265,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1608,6 +1709,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb > mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

2023-09-10 Thread Andy Fan
Hi,

On Thu, Jun 15, 2023 at 4:30 PM Andrey Lepikhov 
wrote:

> Hi, all.
>
> Some of my clients use JOINs with three or four clauses. Quite
> frequently, I see complaints about an unreasonable switch of the JOIN
> algorithm to Merge Join instead of Hash Join. Quick research has shown
> one weak place - the estimation of the average bucket size in
> final_cost_hashjoin (see q2.sql in the attachment), which uses a very
> conservative strategy.
> Unlike the estimation of groups, here we use the smallest ndistinct
> value across all buckets instead of multiplying them (or attempting
> multivariate analysis).
> It works fine for the case of one clause. But if we have many clauses,
> each with a high ndistinct value, we will overestimate the average size
> of a bucket and, as a result, prefer Merge Join. As the example in the
> attachment shows, this leads to a worse plan than possible, sometimes
> drastically worse.
> I assume this is done out of fear of functional dependencies between
> hash clause components. But in my view, we should go the same way here
> as in the estimation of groups.
>

I can reproduce the situation you want to improve, and I have verified
that the patch fixes it as expected.  I think this is the right thing to do.
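
For anyone who wants to reproduce this without the attachment, the query
shape involved looks roughly like the following (hypothetical tables,
not the attached q2.sql):

CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (a int, b int, c int);
INSERT INTO t1 SELECT i % 100, (i / 100) % 100, (i / 10) % 100
  FROM generate_series(0, 99999) i;
INSERT INTO t2 SELECT i % 100, (i / 100) % 100, (i / 10) % 100
  FROM generate_series(0, 99999) i;
ANALYZE t1;
ANALYZE t2;
EXPLAIN SELECT * FROM t1 JOIN t2 USING (a, b, c);

Each column individually has only ~100 distinct values while the
three-column combination is much more selective, which is where the
conservative bucket-size estimate hurts.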


> The attached patch shows a sketch of the solution.
>

I understand that this is a sketch of the solution, but the changes
below still confuse me.

+ if (innerbucketsize > virtualbuckets)
+ innerbucketsize = 1.0 / virtualbuckets;

innerbucketsize is a fraction of all the rows, so it is between 0.0 and
1.0, and virtualbuckets is the total number of buckets (taking multiple
batches into account).  How is it possible for 'innerbucketsize >
virtualbuckets'?  Am I missing something?

-- 
Best Regards
Andy Fan


Re: proposal: possibility to read dumped table's name from file

2023-09-10 Thread Pavel Stehule
Hi

po 11. 9. 2023 v 6:34 odesílatel Pavel Stehule 
napsal:

> Hi
>
> only rebase
>

Unfortunately this rebase was not correct. I am sorry.

fixed version

Regards

Pavel


>
> Regards
>
> Pavel
>
>
From 32ea3d180ccd976b266bb48c5445c96a1aaf7a54 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 127 +++-
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1767 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c1e2220b3c..ae1cc522a8 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -835,6 +835,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: extensions, works like
+   -e/--extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can also be placed after a filter line. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1165,6 +1265,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1608,6 +1709,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;

Re: pg_logical_emit_message() misses a XLogFlush()

2023-09-10 Thread Michael Paquier
On Mon, Sep 11, 2023 at 12:54:11PM +0900, Fujii Masao wrote:
> However, in the current patch, if "transactional" is set to false and
> "flush" is true, the function flushes the WAL immediately without
> considering synchronous_commit. Is this the intended behavior?
> I'm not sure how the function should work in this case, though.

Yes, that's the intended behavior.  This just offers more options to
the toolkit of this function to give more control to applications when
emitting a message.  In this case, like the current non-transactional
case, we make the record immediately available to logical decoders but
also make sure that it is flushed to disk.  If one wants to force the
record's presence to a remote instance, then using the transactional
mode would be sufficient.

Perhaps you have a point here, though, that we had better make
entirely independent the flush and transactional parts, and still
call XLogFlush() even in transactional mode.  One would make sure that
the record is on disk before waiting for the commit to do so, but
that's also awkward for applications because they would not know the
end LSN of the emitted message until the internal transaction commits
the allocated XID, which would be a few records after the result
coming out of pg_logical_emit_message().

The documentation does not worry about any of that even now in the
case of the non-transactional case, and it does not mention that one
may need to monitor pg_stat_replication or similar to make sure that
the LSN of the message exists on the remote with an application-level
check, either.  How about adding an extra paragraph to the
documentation, then?  I could think of something like that, but the
current docs also outline this a bit by telling that the message is
*not* part of a transaction, which kind of implies, at least to me,
that synchronous_commit is moot in this case:
"When transactional is false, note that the backend ignores
synchronous_commit as the record is not part of a transaction so there
is no commit to wait for. Ensuring that the record of a message
emitted exists on standbys requires additional monitoring."

> Though I don't understand the purpose of this option fully yet,
> is flushing the WAL sufficient? Are there scenarios where the function
> should ensure that the WAL is not only flushed but also replicated
> to the standby?

The flush makes sure that the record is durable, but we only care
about transaction commits in a synchronous setup, so that's
independent, in my opinion.  If you look closely, we do some fancy
stuff in finish_sync_worker(), for example, where a transaction commit
is enforced to make sure that the internal flush is sensitive to the
synchronous commit requirements, but that's just something we expect
to happen in a sync worker.
--
Michael


signature.asc
Description: PGP signature


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-10 Thread Dilip Kumar
On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu)
 wrote:

Comments on the latest patch.

1.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to this restriction, the timing of restoring replication slots is
different from other objects.

This comment in the commit message is confusing.  I understand the
reason, but from this it is not very clear why, if pg_resetwal removes
the WAL we need, it is good to create the slots after pg_resetwal.  I
think we should make it clear that creating a new slot sets its
restart LSN to the current WAL location, and a pg_resetwal run after
that could remove the WAL that the slot's restart LSN points to.

2.

+
+ 
+  
+   All slots on the old cluster must be usable, i.e., there are no slots
+   whose
+   pg_replication_slots.wal_status
+   is lost.
+  
+ 
+ 
+  
+   pg_replication_slots.confirmed_flush_lsn
+   of all slots on the old cluster must be the same as the latest
+   checkpoint location.
+  
+ 
+ 
+  
+   The output plugins referenced by the slots on the old cluster must be
+   installed in the new PostgreSQL executable directory.
+  
+ 
+ 
+  
+   The new cluster must have
+   max_replication_slots
+   configured to a value greater than or equal to the number of slots
+   present in the old cluster.
+  
+ 
+ 
+  
+   The new cluster must have
+   wal_level as
+   logical.
+  
+ 
+

I think we should also add that the new cluster must not have any
pre-existing permanent logical replication slots.

3.
-   with the primary.)  Replication slots are not copied and must
-   be recreated.
+   with the primary.)  Replication slots on the old standby are not copied.
+   Only logical slots on the primary are migrated to the new standby,
+   and other slots must be recreated.

This paragraph should be rephrased.  I mean first stating that
"Replication slots on the old standby are not copied" and then saying
Only logical slots are migrated doesn't seem like the best way.  Maybe
we can just say "Only logical slots on the primary are migrated to the
new standby, and other slots must be recreated."

4.
+ /*
+ * Raise an ERROR if the logical replication slot is invalidating. It
+ * would not happen because max_slot_wal_keep_size is set to -1 during
+ * the upgrade, but it stays safe.
+ */
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ elog(ERROR, "Replication slots must not be invalidated during the upgrade.");

Rephrase the first line as ->  Raise an ERROR if the logical
replication slot is invalidating during an upgrade.

5.
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;


For readability change this to if
(GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
of the checks related to this, we are using 1700 so better be
consistent in this.

6.
+ if (nslots_on_new)
+ pg_fatal(ngettext("New cluster must not have logical replication
slots but found %d slot.",
+   "New cluster must not have logical replication slots but found %d slots.",
+   nslots_on_new),
+ nslots_on_new);
...
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level.");
+
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
+ wal_level);


I have noticed that the case of the first letter in the pg_fatal
message is not consistent.

7.
+
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {

Why does the comment say "Is the slot still usable?"  I think it should
be "Is the slot usable?"; otherwise it appears that we first fetched the
slots and are now re-fetching them to check whether they are still
usable.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Cleaning up array_in()

2023-09-10 Thread jian he
On Sun, Sep 10, 2023 at 6:00 PM Alexander Lakhin  wrote:
> Case 1:
> SELECT '{1,'::integer[];
> ERROR:  malformed array literal: "{1,"
> LINE 1: SELECT '{1,'::integer[];
> ^
> DETAIL:  Unexpected end of input.
>
> vs
>
> ERROR:  malformed array literal: "{1,"
> LINE 1: SELECT '{1,'::integer[];
> ^
>
> (no DETAIL)
>
> Case 2:
> SELECT '{{},}'::text[];
> ERROR:  malformed array literal: "{{},}"
> LINE 1: SELECT '{{},}'::text[];
> ^
> DETAIL:  Unexpected "}" character
>
> vs
>   text
> --
>   {}
> (1 row)
>
> Case 3:
> select '{\{}'::text[];
>   text
> ---
>   {"{"}
> (1 row)
>
> vs
>   text
> --
>   {""}
>
> Best regards,
> Alexander

hi.
Thanks for reviewing it.

> DETAIL:  Unexpected end of input.
In many cases an unexpected-end-of-input error can occur, so I consolidated its handling.

SELECT '{{},}'::text[];
solved by tracking the current and previous token types.

select '{\{}'::text[];
solved by updating dstendptr.

attached.
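
That is, presumably restoring the old outputs shown upthread:

SELECT '{{},}'::text[];  -- ERROR: malformed array literal, as before
SELECT '{\{}'::text[];   -- returns {"{"}, as before
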
From d57060acd39a8460eb8ffa2cb448181e2ab24c53 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Tue, 4 Jul 2023 12:39:41 -0400
Subject: [PATCH v6 1/7] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing.  We disallowed non-rectangular input
ages ago (commit 0e13d627b), but never simplified this function
as a follow-up.  In particular, the existing code recomputes each
element's linear location from scratch, which is quite unnecessary
for rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic.  Add some more commentary while at it.

ArrayGetOffset0() is no longer used anywhere, so remove it.
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++
 src/backend/utils/adt/arrayutils.c | 15 ---
 src/include/utils/array.h  |  1 -
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 7828a626..df1ebb47 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ typedef struct ArrayIteratorData
 static int	ArrayCount(const char *str, int *dim, char typdelim,
 	   Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-		 int nitems, int ndim, int *dim,
+		 int nitems,
 		 FmgrInfo *inputproc, Oid typioparam, int32 typmod,
 		 char typdelim,
 		 int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
 	dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
 	nullsPtr = (bool *) palloc(nitems * sizeof(bool));
 	if (!ReadArrayStr(p, string,
-	  nitems, ndim, dim,
+	  nitems,
 	  &my_extra->proc, typioparam, typmod,
 	  typdelim,
 	  typlen, typbyval, typalign,
@@ -436,7 +436,8 @@ array_in(PG_FUNCTION_ARGS)
 
 /*
  * ArrayCount
- *	 Determines the dimensions for an array string.
+ *	 Determines the dimensions for an array string.  This includes
+ *	 syntax-checking the array structure decoration (braces and delimiters).
  *
  * Returns number of dimensions as function result.  The axis lengths are
  * returned in dim[], which must be of size MAXDIM.
@@ -683,16 +684,14 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 /*
  * ReadArrayStr :
  *	 parses the array string pointed to by "arrayStr" and converts the values
- *	 to internal format.  Unspecified elements are initialized to nulls.
- *	 The array dimensions must already have been determined.
+ *	 to internal format.  The array dimensions must have been determined,
+ *	 and the case of an empty array must have been handled earlier.
  *
  * Inputs:
  *	arrayStr: the string to parse.
  *			  CAUTION: the contents of "arrayStr" will be modified!
  *	origStr: the unmodified input string, used only in error messages.
  *	nitems: total number of array elements, as already determined.
- *	ndim: number of array dimensions
- *	dim[]: array axis lengths
  *	inputproc: type-specific input procedure for element datatype.
  *	typioparam, typmod: auxiliary values to pass to inputproc.
  *	typdelim: the value delimiter (type-specific).
@@ -717,8 +716,6 @@ static bool
 ReadArrayStr(char *arrayStr,
 			 const char *origStr,
 			 int nitems,
-			 int ndim,
-			 int *dim,
 			 FmgrInfo *inputproc,
 			 Oid typioparam,
 			 int32 typmod,
@@ -732,20 +729,13 @@ ReadArrayStr(char *arrayStr,
 			 int32 *nbytes,
 			 Node *escontext)
 {
-	int			i,
+	int			i = 0,
 nest_level = 0;
 	char	   *srcptr;
 	bool		in_quotes = false;
 	bool		eoArray = false;
 	bool		hasnull;
 	int32		totbytes;
-	int			indx[MAXDIM] = {0},
-prod[MAXDIM];
-
-	mda_get_prod(ndim, dim, prod);
-
-	/* Initialize is-null markers to true */
-	memset(nulls, true, nitems * sizeof(bool));
 
 	/*
 	 * We have to remove " and \ characters to create a clean item value to
@@ -768,11 +758,20 @@ ReadA

Re: pg_logical_emit_message() misses a XLogFlush()

2023-09-10 Thread bt23nguyent

On 2023-09-11 14:02, Michael Paquier wrote:

On Mon, Sep 11, 2023 at 12:54:11PM +0900, Fujii Masao wrote:

However, in the current patch, if "transactional" is set to false and
"flush" is true, the function flushes the WAL immediately without
considering synchronous_commit. Is this the intended behavior?
I'm not sure how the function should work in this case, though.


Yes, that's the intended behavior.  This just offers more options to
the toolkit of this function to give more control to applications when
emitting a message.  In this case, like the current non-transactional
case, we make the record immediately available to logical decoders but
also make sure that it is flushed to disk.  If one wants to force the
record's presence to a remote instance, then using the transactional
mode would be sufficient.

Perhaps you have a point here, though, that we had better make
entirely independent the flush and transactional parts, and still
call XLogFlush() even in transactional mode.  One would make sure that
the record is on disk before waiting for the commit to do so, but
that's also awkward for applications because they would not know the
end LSN of the emitted message until the internal transaction commits
the allocated XID, which would be a few records after the result
coming out of pg_logical_emit_message().

The documentation does not worry about any of that even now in the
case of the non-transactional case, and it does not mention that one
may need to monitor pg_stat_replication or similar to make sure that
the LSN of the message exists on the remote with an application-level
check, either.  How about adding an extra paragraph to the
documentation, then?  I could think of something like that, but the
current docs also outline this a bit by telling that the message is
*not* part of a transaction, which kind of implies, at least to me,
that synchronous_commit is moot in this case:
"When transactional is false, note that the backend ignores
synchronous_commit as the record is not part of a transaction so there
is no commit to wait for. Ensuring that the record of a message
emitted exists on standbys requires additional monitoring."


Though I don't understand the purpose of this option fully yet,
is flushing the WAL sufficient? Are there scenarios where the function
should ensure that the WAL is not only flushed but also replicated
to the standby?


The flush makes sure that the record is durable, but we only care
about transaction commits in a synchronous setup, so that's
independent, in my opinion.  If you look closely, we do some fancy
stuff in finish_sync_worker(), for example, where a transaction commit
is enforced to make sure that the internal flush is sensitive to the
synchronous commit requirements, but that's just something we expect
to happen in a sync worker.
--
Michael

Hi,

With regard to the patch, the documentation outlines the 
pg_logical_emit_message function and its corresponding syntax in the 
following manner.


pg_logical_emit_message ( transactional boolean, prefix text, content text ) → pg_lsn
pg_logical_emit_message ( transactional boolean, prefix text, content bytea [, flush boolean DEFAULT false] ) → pg_lsn


A minor issue with the description here: while the new flush argument of
the bytea variant of pg_logical_emit_message() is clearly documented,
there is no description of a flush argument for the text variant at all.


Additionally, there is a lack of consistency in the third argument names 
between the function definition and the description (i.e., "message 
bytea" versus "content bytea") as 
follows.


+CREATE OR REPLACE FUNCTION pg_catalog.pg_logical_emit_message(
+   transactional boolean,
+   prefix text,
+   message bytea,
+   flush boolean DEFAULT false)
+RETURNS pg_lsn
+LANGUAGE INTERNAL
+VOLATILE STRICT
+AS 'pg_logical_emit_message_bytea';

...

+   pg_logical_emit_message ( transactional boolean, prefix text, content bytea [, flush boolean DEFAULT false] )


Could you please provide clarification on the reason for this 
differentiation?


On a side note, could you also mention in the document that flush is
set to false by default? It could be helpful for users.


Regards,
Tung Nguyen




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-10 Thread Amit Kapila
On Mon, Sep 11, 2023 at 10:39 AM Dilip Kumar  wrote:
>
> 3.
> -   with the primary.)  Replication slots are not copied and must
> -   be recreated.
> +   with the primary.)  Replication slots on the old standby are not 
> copied.
> +   Only logical slots on the primary are migrated to the new standby,
> +   and other slots must be recreated.
>
> This paragraph should be rephrased.  I mean first stating that
> "Replication slots on the old standby are not copied" and then saying
> Only logical slots are migrated doesn't seem like the best way.  Maybe
> we can just say "Only logical slots on the primary are migrated to the
> new standby, and other slots must be recreated."
>

It is fine to combine these sentences but let's be a bit more
explicit: "Only logical slots on the primary are migrated to the new
standby, and other slots on the old standby must be recreated as they
are not copied."

> 4.
> + /*
> + * Raise an ERROR if the logical replication slot is invalidating. It
> + * would not happen because max_slot_wal_keep_size is set to -1 during
> + * the upgrade, but it stays safe.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the 
> upgrade.");
>
> Rephrase the first line as ->  Raise an ERROR if the logical
> replication slot is invalidating during an upgrade.
>

I think it would be better to write something like: "The logical
replication slots shouldn't be invalidated as max_slot_wal_keep_size
is set to -1 during the upgrade."

> 5.
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
>
>
> For readability change this to if
> (GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
> of the checks related to this, we are using 1700 so better be
> consistent in this.
>

But the current check is consistent with what we do at other places
during the upgrade. I think the patch is trying to be consistent with
existing code as much as possible.

-- 
With Regards,
Amit Kapila.




BufferUsage counters' values have changed

2023-09-10 Thread Karina Litskevich
Hi hackers,

I noticed that BufferUsage counters' values are strangely different for the
same queries on REL_15_STABLE and REL_16_STABLE. For example, I run

CREATE EXTENSION pg_stat_statements;
CREATE TEMP TABLE test(b int);
INSERT INTO test(b) SELECT generate_series(1,1000);
SELECT query, local_blks_hit, local_blks_read, local_blks_written,
   local_blks_dirtied, temp_blks_written FROM pg_stat_statements;

and output on REL_15_STABLE contains

query  | INSERT INTO test(b) SELECT generate_series($1,$2)
local_blks_hit | 1005
local_blks_read| 2
local_blks_written | 5
local_blks_dirtied | 5
temp_blks_written  | 0

while output on REL_16_STABLE contains

query  | INSERT INTO test(b) SELECT generate_series($1,$2)
local_blks_hit | 1006
local_blks_read| 0
local_blks_written | 0
local_blks_dirtied | 5
temp_blks_written  | 8


I found a bug that causes one of the differences: the wrong counter is
incremented in ExtendBufferedRelLocal(). The attached patch fixes it
and should be applied to REL_16_STABLE and master. With the patch
applied, the output contains

query  | INSERT INTO test(b) SELECT generate_series($1,$2)
local_blks_hit | 1006
local_blks_read| 0
local_blks_written | 8
local_blks_dirtied | 5
temp_blks_written  | 0


I still wonder why local_blks_written is greater than it was on
REL_15_STABLE, and why local_blks_read became zero. These changes are
caused by fcdda1e4b5. This code is new to me, and I'm still trying to
understand whether it's a bug in computing the counters or just changes
in how many blocks are read/written during the query execution. If
anyone can help me, I would be grateful.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-10 Thread Dilip Kumar
On Mon, Sep 11, 2023 at 11:16 AM Amit Kapila  wrote:
>
> On Mon, Sep 11, 2023 at 10:39 AM Dilip Kumar  wrote:
> >
> > 3.
> > -   with the primary.)  Replication slots are not copied and must
> > -   be recreated.
> > +   with the primary.)  Replication slots on the old standby are not 
> > copied.
> > +   Only logical slots on the primary are migrated to the new standby,
> > +   and other slots must be recreated.
> >
> > This paragraph should be rephrased.  I mean first stating that
> > "Replication slots on the old standby are not copied" and then saying
> > Only logical slots are migrated doesn't seem like the best way.  Maybe
> > we can just say "Only logical slots on the primary are migrated to the
> > new standby, and other slots must be recreated."
> >
>
> It is fine to combine these sentences but let's be a bit more
> explicit: "Only logical slots on the primary are migrated to the new
> standby, and other slots on the old standby must be recreated as they
> are not copied."

Fine with this.

> > 4.
> > + /*
> > + * Raise an ERROR if the logical replication slot is invalidating. It
> > + * would not happen because max_slot_wal_keep_size is set to -1 during
> > + * the upgrade, but it stays safe.
> > + */
> > + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> > + elog(ERROR, "Replication slots must not be invalidated during the 
> > upgrade.");
> >
> > Rephrase the first line as ->  Raise an ERROR if the logical
> > replication slot is invalidating during an upgrade.
> >
>
> I think it would be better to write something like: "The logical
> replication slots shouldn't be invalidated as max_slot_wal_keep_size
> is set to -1 during the upgrade."

This makes it much clear.

> > 5.
> > + /* Logical slots can be migrated since PG17. */
> > + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> > + return;
> >
> >
> > For readability change this to if
> > (GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
> > of the checks related to this, we are using 1700 so better be
> > consistent in this.
> >
>
> But the current check is consistent with what we do at other places
> during the upgrade. I think the patch is trying to be consistent with
> existing code as much as possible.

Okay, I see.  Thanks for pointing that out.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: BufferUsage counters' values have changed

2023-09-10 Thread Karina Litskevich
On Mon, Sep 11, 2023 at 9:08 AM Karina Litskevich <
litskevichkar...@gmail.com> wrote:

> I found a bug that causes one of the differences. Wrong counter is
> incremented
> in ExtendBufferedRelLocal(). The attached patch fixes it and should be
> applied
> to REL_16_STABLE and master.
>

 I forgot to attach the patch. Here it is.
From 999a3d533a9b74c8568cc8a3d715c287de45dd2c Mon Sep 17 00:00:00 2001
From: Karina Litskevich 
Date: Thu, 7 Sep 2023 17:44:40 +0300
Subject: [PATCH v1] Fix local_blks_written counter incrementation

---
 src/backend/storage/buffer/localbuf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 1735ec7141..567b8d15ef 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -431,7 +431,7 @@ ExtendBufferedRelLocal(BufferManagerRelation bmr,
 
 	*extended_by = extend_by;
 
-	pgBufferUsage.temp_blks_written += extend_by;
+	pgBufferUsage.local_blks_written += extend_by;
 
 	return first_block;
 }
-- 
2.34.1



Re: proposal: possibility to read dumped table's name from file

2023-09-10 Thread Pavel Stehule
Hi

po 11. 9. 2023 v 6:57 odesílatel Pavel Stehule 
napsal:

> Hi
>
> po 11. 9. 2023 v 6:34 odesílatel Pavel Stehule 
> napsal:
>
>> Hi
>>
>> only rebase
>>
>
> Unfortunately this rebase was not correct. I am sorry.
>
> fixed version
>

and fixed a forgotten "break" in a switch statement

Regards

Pavel



>
> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
From 1f0738992d69d0d748bd4494a9244c353c224ace Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 128 +++-
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1768 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c1e2220b3c..ae1cc522a8 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -835,6 +835,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: extensions, works like
+   -e/--extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after an object pattern as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1165,6 +1265,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
  

Re: persist logical slots to disk during shutdown checkpoint

2023-09-10 Thread Michael Paquier
On Fri, Sep 08, 2023 at 11:50:37AM +0530, Amit Kapila wrote:
> On Fri, Sep 8, 2023 at 10:08 AM Michael Paquier  wrote:
>>
>>
>> +/*
>> + * This is used to track the last persisted confirmed_flush LSN value to
>> + * detect any divergence in the in-memory and on-disk values for the 
>> same.
>> + */
>>
>> "This value tracks is the last LSN where this slot's data has been
>> flushed to disk.
>>
> 
> This makes the comment vague as this sounds like we are saving a slot
> corresponding to some LSN which is not the case. If you prefer this
> tone then we can instead say: "This value tracks the last
> confirmed_flush LSN flushed which is used during a checkpoint shutdown
> to decide if a logical slot's data should be forcibly flushed or not."

Okay, that looks like an improvement over the term "persisted".
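
To spell that out, here is a rough standalone sketch of the rule the
comment describes (toy types; the field name last_saved_confirmed_flush
follows the patch, everything else is paraphrased, not actual sources):

#include <stdbool.h>
#include <stdio.h>

typedef unsigned long long ToyXLogRecPtr;

/* Simplified stand-in for the relevant replication slot state. */
struct ToySlot
{
    bool            is_logical;
    bool            invalidated;
    ToyXLogRecPtr   confirmed_flush;
    ToyXLogRecPtr   last_saved_confirmed_flush;
};

/*
 * At the shutdown checkpoint, flush a logical slot only when its
 * in-memory confirmed_flush differs from what was last written out.
 */
static bool
toy_needs_shutdown_flush(const struct ToySlot *s)
{
    return s->is_logical && !s->invalidated &&
        s->confirmed_flush != s->last_saved_confirmed_flush;
}

int
main(void)
{
    struct ToySlot s = {true, false, 0x300, 0x200};

    printf("flush at shutdown? %s\n",
           toy_needs_shutdown_flush(&s) ? "yes" : "no");
    return 0;
}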

>> This is used during a checkpoint shutdown to decide
>> if a logical slot's data should be forcibly flushed or not."
>>
>> Hmm.  WAL senders are shut down *after* the checkpointer and *after*
>> the shutdown checkpoint is finished (see PM_SHUTDOWN and
>> PM_SHUTDOWN_2) because we want the WAL senders to acknowledge the
>> checkpoint record before shutting down the primary.
>>
> 
> As per my understanding, this is not true for logical walsenders. In
> the code, HandleCheckpointerInterrupts() calls ShutdownXLOG(), which
> sends a signal to the walsender to stop and waits for it to stop. Only
> after that does it write the shutdown checkpoint WAL record. After
> getting the InitStopping signal, the walsender sets the got_STOPPING
> flag. Then a *logical* walsender ensures that it sends all the pending
> WAL and exits. What you have quoted is probably true for physical
> walsenders.

Hm, that reminds me about this area..  This comes down to the handling
of WalSndCaughtUp in the send_data callback for logical or physical.
This is switched to true for logical WAL senders much earlier than
physical WAL senders, aka before the shutdown checkpoint begins in the
latter.  What was itching me a bit is that the postmaster logic could
be made more solid.  Logical and physical WAL senders are both marked
as BACKEND_TYPE_WALSND, but we don't actually check that the WAL
senders remaining at the end of PM_SHUTDOWN_2 are *not* connected to a
database.  This would require a new BACKEND_TYPE_* perhaps, or perhaps
we're fine with the current state because we'll catch problems in
the checkpointer if any WAL is generated while the shutdown checkpoint
is running anyway.  Just something I had in mind, unrelated to this
patch.

>>  In order to limit
>> the number of records to work on after a restart, what this patch is
>> proposing is an improvement.  Perhaps it would be better to document
>> that we don't care about the potential concurrent activity of logical
>> WAL senders in this case and that the LSN we are saving at is a best
>> effort, meaning that last_saved_confirmed_flush is just here to reduce
>> the damage on a follow-up restart?
> 
> Unless I am wrong, there shouldn't be any concurrent activity for
> logical walsenders. IIRC, it is a mandatory requirement for logical
> walsenders to stop before the shutdown checkpoint to avoid a panic error.
> We do handle logical walsenders differently because they can generate
> WAL during decoding.

Yeah.  See above.

>>  The comment added in
>> CheckPointReplicationSlots() goes in this direction, but perhaps this
>> potential concurrent activity should be mentioned?
> 
> Sure, we can change it if required.

+ * We can flush dirty replication slots at regular intervals by any
+ * background process like bgwriter but checkpoint is a convenient location.

I still don't quite see a need to mention the bgwriter at all here..
That's just unrelated.

The comment block in CheckPointReplicationSlots() from v10 uses
"persist", but you mean "flush", I guess..
--
Michael


signature.asc
Description: PGP signature


Re: PSQL error: total cell count of XXX exceeded

2023-09-10 Thread Hongxu Ma
I created a patch to fix it.
I'd really appreciate it if anyone could help review it.
Thanks.


From: Hongxu Ma 
Sent: Saturday, August 26, 2023 19:19
To: David G. Johnston 
Cc: PostgreSQL Hackers 
Subject: Re: PSQL error: total cell count of XXX exceeded

Thank you David.

From the code logic, I don't think this check is meant to enforce a limit:
if it enters the double loop (cont.nrows * cont.ncolumns) in printQuery(), the
check should always be false (unless an overflow happened). So, if we wanted
to check a limit, we could have done it before the double loop: just check the
PGresult and report the error earlier.

> I wouldn’t be averse to an improved error message, and possibly documenting
> said limit.

Agreed: the current error message may even report a negative value, which is
very confusing for the user. It's better to introduce an explicit limit here,
or to use a bigger integer type (e.g. long) for these counters, though that
also has a theoretical upper bound.
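
To make the wrap-around concrete, a standalone demonstration (not psql
source; assumes an LP64 platform where long is 64 bits):

#include <stdio.h>

int
main(void)
{
    int     ncolumns = 9;
    long    nrows = 300000000L; /* 9 * 3e8 = 2.7e9 > INT_MAX */

    int     narrow = (int) (ncolumns * nrows);  /* wraps negative */
    long    wide = (long) ncolumns * nrows;     /* stays exact */

    printf("int-sized total:  %d\n", narrow);
    printf("long-sized total: %ld\n", wide);
    return 0;
}

This mirrors the proposal: compute the product in a wider type before
comparing, so the check in printTableAddCell() can never see a wrapped,
negative total.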

Thanks.


From: David G. Johnston 
Sent: Saturday, August 26, 2023 12:09
To: Hongxu Ma 
Cc: PostgreSQL Hackers 
Subject: Re: PSQL error: total cell count of XXX exceeded

On Friday, August 25, 2023, Hongxu Ma 
mailto:inte...@outlook.com>> wrote:

When I tried to select a large number of rows, psql complained with the error
"Cannot add cell to table content: total cell count of 905032704 exceeded."

 We should use long for ncolumns and nrows and give a more obvious error 
message here.

Any thoughts? or some other hidden reasons?

900 million cells seems a more than realistic limit for psql query result
output.  In any case it isn’t a bug; the code demonstrates that by
producing an explicit error.

I wouldn’t be averse to an improved error message, and possibly documenting
said limit.

David J.



0001-Using-long-type-in-printTableAddCell.patch
Description: 0001-Using-long-type-in-printTableAddCell.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-10 Thread Amit Kapila
On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thank you for reviewing! PSA new version!
>

Few comments:
==
1.
+   pg_replication_slots.confirmed_flush_lsn
+   of all slots on the old cluster must be the same as the latest
+   checkpoint location.

We can add something like: "This ensures that all the data has been
replicated before the upgrade." to make it clear why this test is
important.

2. Move the wal_level related restriction before max_replication_slots.

3.
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {
+ if (script == NULL &&
+ (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);
+ }
+
+ /*
+ * Do additional checks to ensure that confirmed_flush LSN of all
+ * the slots is the same as the latest checkpoint location.
+ *
+ * Note: This can be satisfied only when the old cluster has been
+ * shut down, so we skip this for live checks.
+ */
+ if (!live_check && !slot->caught_up)

Isn't it better to continue to the next slot once we find that the slot
is invalid, instead of checking the other conditions?
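
Concretely, the suggestion amounts to something like this (paraphrased
from the quoted patch; report_problem() is a hypothetical helper
standing in for the fopen_priv()/fprintf() boilerplate):

for (int i = 0; i < nslots; i++)
{
    LogicalSlotInfo *slot = &slots[i];

    if (slot->invalid)
    {
        report_problem(slot, "The slot is unusable");
        continue;       /* no need to check confirmed_flush */
    }

    if (!live_check && !slot->caught_up)
        report_problem(slot, "The slot has not consumed WALs yet");
}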

4.
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);

Let's keep it as one string and change the message to: "The slot
\"%s\" is invalid"

+ fprintf(script,
+ "slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+ slot->slotname);
+ }

On a similar line, we can change this to: "The slot \"%s\" has not
consumed the WAL yet"

5.
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "problematic_logical_relication_slots.txt");

I think we can name this file as "invalid_logical_replication_slots"
or simply "logical_replication_slots"

6.
+ pg_fatal("The source cluster contains one or more problematic
logical replication slots.\n"
+ "The needed workaround depends on the problem.\n"
+ "1) If the problem is \"The slot is unusable,\" You can drop such
replication slots.\n"
+ "2) If the problem is \"The slot has not consumed WALs yet,\" you
can consume all remaining WALs.\n"
+ "Then, you can restart the upgrade.\n"
+ "A list of problematic logical replication slots is in the file:\n"
+ "%s", output_path);

This doesn't match similar existing messages. So, let's change it
to something like:

"Your installation contains invalid logical replication slots.  These
slots can't be copied so this cluster cannot currently be upgraded.
Consider either removing such slots or consuming the pending WAL if
any and then restart the upgrade.  A list of invalid logical
replication slots is in the file:"

Apart from the above, I have edited a few other comments in the patch.
See attached.

-- 
With Regards,
Amit Kapila.


cosmetic_improvements_amit.1.patch
Description: Binary data