Re: Parallel CREATE INDEX for BRIN indexes

2023-11-12 Thread Matthias van de Meent
On Wed, 8 Nov 2023 at 12:03, Tomas Vondra  wrote:
>
> Hi,
>
> here's an updated patch, addressing the review comments, and reworking
> how the work is divided between the workers & leader etc.

Thanks!

> In general I'm quite happy with the current state, and I believe it's
> fairly close to being committable.

Are you planning on committing the patches separately, or squashed? I
won't have much time this week for reviewing the patch, and it seems
like these patches are incremental, so some guidance on what you want
to be reviewed would be appreciated.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Parallel CREATE INDEX for BRIN indexes

2023-11-12 Thread Tomas Vondra



On 11/12/23 10:38, Matthias van de Meent wrote:
> On Wed, 8 Nov 2023 at 12:03, Tomas Vondra  
> wrote:
>>
>> Hi,
>>
>> here's an updated patch, addressing the review comments, and reworking
>> how the work is divided between the workers & leader etc.
> 
> Thanks!
> 
>> In general I'm quite happy with the current state, and I believe it's
>> fairly close to being committable.
> 
> Are you planning on committing the patches separately, or squashed? I
> won't have much time this week for reviewing the patch, and it seems
> like these patches are incremental, so some guidance on what you want
> to be reviewed would be appreciated.
> 

Definitely squashed. I only kept them separate to make it more obvious
what the changes are.

If you need more time for a review, I can certainly wait. No rush.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




building with meson on windows with ssl

2023-11-12 Thread Dave Cramer
Greetings,

I am getting the following error
building on HEAD

Library crypto found: YES
Checking for function "CRYPTO_new_ex_data" with dependencies -lssl,
-lcrypto: NO

I have openssl 1.1.1 installed

Dave Cramer


Re: proposal: possibility to read dumped table's name from file

2023-11-12 Thread Pavel Stehule
Hi


> What are your thoughts on this version?  It's not in a committable state
> as it needs a bit more comments here and there and a triple-check that
> nothing was missed in changing this, but I prefer to get your thoughts
> before spending the extra time.
>

I think using a pointer to an exit function is an elegant solution. I
checked the code and found only one issue. I fixed this warning

[13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
[13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
[13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might
be a candidate for ‘gnu_printf’ format attribute
[-Werror=suggest-attribute=format]
[13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp);
[13:58:20.858] | ^
[13:58:20.858] cc1: all warnings being treated as errors
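
One common way to silence this (a sketch, not taken from the patch; it
assumes the prototype lives in the new filter.h and that the state struct
is named FilterStateData) is to mark the declaration with PostgreSQL's
pg_attribute_printf, so the compiler can also verify format strings at
call sites:

	/* 2nd argument is the format string, varargs start at the 3rd */
	extern void pg_log_filter_error(FilterStateData *fstate,
									const char *fmt, ...)
				pg_attribute_printf(2, 3);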

I also found a probable copy/paste bug:

diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f647bde28d..ab2abedf5f 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
 			case FILTER_OBJECT_TYPE_EXTENSION:
 			case FILTER_OBJECT_TYPE_FOREIGN_DATA:
 				pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."),
-									"exclude",
+									"include",
 									filter_object_type_name(objtype));
 				exit_nicely(1);

Regards

Pavel


>
> --
> Daniel Gustafsson
>
>
From 9be8fb14a3fe75aa4203a059e8372986bf5e6615 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sun, 12 Nov 2023 13:54:26 +0100
Subject: [PATCH 2/2] fix err message

---
 src/bin/pg_dump/pg_restore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f647bde28d..ab2abedf5f 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
 case FILTER_OBJECT_TYPE_EXTENSION:
 case FILTER_OBJECT_TYPE_FOREIGN_DATA:
 	pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."),
-		"exclude",
+		"include",
 		filter_object_type_name(objtype));
 	exit_nicely(1);
 
-- 
2.41.0

From cbaae854eca0cc88bb0886abfd45416ad13cffc7 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sat, 11 Nov 2023 20:34:34 +0100
Subject: [PATCH 1/2] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 471 +
 src/bin/pg_dump/filter.h|  60 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 114 +++-
 src/bin/pg_dump/pg_dumpall.c|  68 +-
 src/bin/pg_dump/pg_restore.c| 103 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1698 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8695571045..e2f100d552 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -836,6 +836,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword s

Re: Add PQsendSyncMessage() to libpq

2023-11-12 Thread Anton Kirilov

Hello,

Thanks for the feedback!

On 07/11/2023 09:23, Jelte Fennema-Nio wrote:
> But I think it's looking at the situation from the wrong direction.
> [...] we should look at it as an addition to our current list of PQsend
> functions for a new packet type. And none of those PQsend functions ever
> needed a flag. Which makes sense, because they are the lowest level
> building blocks that make sense from a user perspective: They send a
> message type over the socket and don't do anything else.


Yes, I think that this is quite close to my thinking when I created the 
original version of the patch. Also, the protocol specification states 
that the Sync message lacks parameters.


Since there haven't been any comments from the other people who have 
chimed in on this e-mail thread, I will assume that there is consensus 
(we are doing a U-turn with the implementation approach after all), so 
here is the updated version of the patch.
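
For illustration, a rough sketch of the intended call pattern, based on
the description in the patch below (error handling and result consumption
elided):

	#include <libpq-fe.h>

	void
	pipeline_example(PGconn *conn)
	{
		if (PQenterPipelineMode(conn) != 1)
			return;

		PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
		PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);

		/* Mark the sync point; unlike PQpipelineSync(), this does not
		 * flush anything to the server. */
		PQsendPipelineSync(conn);

		/* One explicit flush for the whole batch. */
		PQflush(conn);

		/* ... then consume results with PQgetResult() as usual ... */
	}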


Best wishes,
Anton Kirilov

From b752269b2763f8d66bcfc79faf751e52226c344b Mon Sep 17 00:00:00 2001
From: Anton Kirilov 
Date: Wed, 22 Mar 2023 20:39:57 +
Subject: [PATCH v5] Add PQsendPipelineSync() to libpq

This new function is equivalent to PQpipelineSync(), except
that it does not flush anything to the server; the user must
subsequently call PQflush() instead. Its purpose is to reduce
the system call overhead of pipeline mode.
---
 doc/src/sgml/libpq.sgml   | 45 ---
 src/interfaces/libpq/exports.txt  |  1 +
 src/interfaces/libpq/fe-exec.c| 17 +--
 src/interfaces/libpq/libpq-fe.h   |  1 +
 .../modules/libpq_pipeline/libpq_pipeline.c   | 37 +++
 .../traces/multi_pipelines.trace  | 11 +
 6 files changed, 102 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 64b2910fee..61bee82a54 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3547,8 +3547,9 @@ ExecStatusType PQresultStatus(const PGresult *res);
   

 The PGresult represents a
-synchronization point in pipeline mode, requested by
-.
+synchronization point in pipeline mode, requested by either
+ or
+.
 This status occurs only when pipeline mode has been selected.

   
@@ -5122,7 +5123,8 @@ int PQsendClosePortal(PGconn *conn, const char *portalName);
,
,
,
-   , or
+   ,
+   , or

call, and returns it.
A null pointer is returned when the command is complete and there
@@ -5506,8 +5508,9 @@ int PQflush(PGconn *conn);
  client sends them.  The server will begin executing the commands in the
  pipeline immediately, not waiting for the end of the pipeline.
  Note that results are buffered on the server side; the server flushes
- that buffer when a synchronization point is established with
- PQpipelineSync, or when
+ that buffer when a synchronization point is established with either
+ PQpipelineSync or
+ PQsendPipelineSync, or when
  PQsendFlushRequest is called.
  If any statement encounters an error, the server aborts the current
  transaction and does not execute any subsequent command in the queue
@@ -5564,7 +5567,8 @@ int PQflush(PGconn *conn);
  PGresult types PGRES_PIPELINE_SYNC
  and PGRES_PIPELINE_ABORTED.
  PGRES_PIPELINE_SYNC is reported exactly once for each
- PQpipelineSync at the corresponding point
+ PQpipelineSync or
+ PQsendPipelineSync at the corresponding point
  in the pipeline.
  PGRES_PIPELINE_ABORTED is emitted in place of a normal
  query result for the first error and all subsequent results
@@ -5602,7 +5606,8 @@ int PQflush(PGconn *conn);
  PQresultStatus will report a
  PGRES_PIPELINE_ABORTED result for each remaining queued
  operation in an aborted pipeline. The result for
- PQpipelineSync is reported as
+ PQpipelineSync or
+ PQsendPipelineSync is reported as
  PGRES_PIPELINE_SYNC to signal the end of the aborted pipeline
  and resumption of normal result processing.
 
@@ -5834,6 +5839,32 @@ int PQsendFlushRequest(PGconn *conn);

   
  
+
+
+ PQsendPipelineSyncPQsendPipelineSync
+
+ 
+  
+   Marks a synchronization point in a pipeline by sending a
+   sync message
+   without flushing the send buffer. This serves as
+   the delimiter of an implicit transaction and an error recovery
+   point; see .
+
+
+int PQsendPipelineSync(PGconn *conn);
+
+  
+  
+   Returns 1 for success. Returns 0 if the connection is not in
+   pipeline mode or sending a
+   sync message
+   failed.
+   Note that the message is not itself flushed to the server automatically;
+   use PQflush if necessary.
+  
+ 
+

   
 
diff --git a/src/interfaces/libpq/exports.txt

Re: pg_basebackup check vs Windows file path limits

2023-11-12 Thread Andrew Dunstan


On 2023-11-11 Sa 12:00, Alexander Lakhin wrote:

11.11.2023 18:18, Andrew Dunstan wrote:


Hmm, maybe we should be using File::Copy::move() instead of rename(). 
The docco for that says:


     If possible, move() will simply rename the file. Otherwise, it
     copies the file to the new location and deletes the original. If an
     error occurs during this copy-and-delete process, you may be left
     with a (possibly partial) copy of the file under the destination
     name.


Unfortunately, I've stumbled upon the inability of File::Copy::move()
to move directories across filesystems, exactly as described here:
https://stackoverflow.com/questions/17628039/filecopy-move-directories-accross-drives-in-windows-not-working

(I'm sorry for not looking above rename(), where this is stated explicitly:
# On Windows use the short location to avoid path length issues.
# Elsewhere use $tempdir to avoid file system boundary issues with moving.
So this issue affects Windows only.)




*sigh*

A probable workaround is to use a temp directory on the same device the 
test is building on. Just set it up and point your TEMPDIR environment 
variable at it, and I think it will be OK (though I haven't tested it).


But that doesn't mean I'm not searching for a better solution. Maybe 
Alvaro's suggestion nearby will help.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: building with meson on windows with ssl

2023-11-12 Thread Dave Cramer
On Sun, 12 Nov 2023 at 07:57, Dave Cramer  wrote:

> Greetings,
>
> I am getting the following error
> building on HEAD
>
> Library crypto found: YES
> Checking for function "CRYPTO_new_ex_data" with dependencies -lssl,
> -lcrypto: NO
>

So this is the error you get if you mix a 64-bit version of OpenSSL with
a build using x86 tools. Clearly my problem, but the error message is
less than helpful.

Dave

>


Re: [PATCH] Add support function for containment operators

2023-11-12 Thread Laurenz Albe
On Fri, 2023-10-20 at 16:24 +0800, jian he wrote:
> [new patch]

Thanks, that patch works as expected and passes regression tests.

Some comments about the code:

> --- a/src/backend/utils/adt/rangetypes.c
> +++ b/src/backend/utils/adt/rangetypes.c
> @@ -558,7 +570,6 @@ elem_contained_by_range(PG_FUNCTION_ARGS)
>   PG_RETURN_BOOL(range_contains_elem_internal(typcache, r, val));
>  }
>  
> -
>  /* range, range -> bool functions */
>  
>  /* equality (internal version) */

Please don't change unrelated whitespace.

> +static Node *
> +find_simplified_clause(Const *rangeConst, Expr *otherExpr)
> +{
> + Form_pg_range pg_range;
> + HeapTuple   tup;
> + Oid opclassOid;
> + RangeBound  lower;
> + RangeBound  upper;
> + boolempty;
> + Oid rng_collation;
> + TypeCacheEntry *elemTypcache;
> + Oid opfamily =  InvalidOid;
> +
> + RangeType  *range = DatumGetRangeTypeP(rangeConst->constvalue);
> + TypeCacheEntry *rangetypcache = 
> lookup_type_cache(RangeTypeGetOid(range), TYPECACHE_RANGE_INFO);
> + {

This brace is unnecessary.  Perhaps a leftover from a removed conditional 
statement.

> + /* this part is get the range's SUBTYPE_OPCLASS from pg_range 
> catalog.
> +  * Refer load_rangetype_info function last line.
> +  * TypeCacheEntry->rngelemtype typcaheenetry either don't have 
> opclass entry or with default opclass.
> +  * Range's subtype opclass only in catalog table.
> + */

The comments in the patch need some more love.
Apart from the language, you should have a look at the style guide:

- single-line comments should start with lower case and have no period:

  /* example of a single-line comment */

- Multi-line comments should start with /* on its own line and end with */ on 
its
  own line.  They should use whole sentences:

  /*
   * In case a comment spans several lines, it should look like
   * this.  Try not to exceed 80 characters.
   */

> + tup = SearchSysCache1(RANGETYPE, 
> ObjectIdGetDatum(RangeTypeGetOid(range)));
> +
> + /* should not fail, since we already checked typtype ... */
> + if (!HeapTupleIsValid(tup))
> + elog(ERROR, "cache lookup failed for range type %u", 
> RangeTypeGetOid(range));

If this is a "can't happen" case, it should be an Assert.
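
That is, roughly:

	Assert(HeapTupleIsValid(tup));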

> +
> + pg_range = (Form_pg_range) GETSTRUCT(tup);
> +
> + opclassOid = pg_range->rngsubopc;
> +
> + ReleaseSysCache(tup);
> +
> + /* get opclass properties and look up the comparison function */
> + opfamily = get_opclass_family(opclassOid);
> + }
> +
> + range_deserialize(rangetypcache, range, &lower, &upper, &empty);
> + rng_collation = rangetypcache->rng_collation;
> +
> + if (empty)
> + {
> + /* If the range is empty, then there can be no matches. */
> + return makeBoolConst(false, false);
> + }
> + else if (lower.infinite && upper.infinite)
> + {
> + /* The range has no bounds, so matches everything. */
> + return makeBoolConst(true, false);
> + }
> + else
> + {

Many of the variables declared at the beginning of the function are only used in
this branch.  You should declare them here.

> +static Node *
> +match_support_request(Node *rawreq)
> +{
> + if (IsA(rawreq, SupportRequestSimplify))
> + {

To keep the indentation shallow, the preferred style is:

  if (/* case we don't handle */)
      return NULL;
  /* proceed without indentation */
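
Applied to the function at hand, that would look roughly like this
(a sketch):

	static Node *
	match_support_request(Node *rawreq)
	{
		SupportRequestSimplify *req;

		if (!IsA(rawreq, SupportRequestSimplify))
			return NULL;

		req = (SupportRequestSimplify *) rawreq;
		/* ... rest of the function, one indentation level shallower ... */
	}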

> + SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq;
> + FuncExpr   *fexpr = req->fcall;
> + Node   *leftop;
> + Node   *rightop;
> + Const  *rangeConst;
> + Expr   *otherExpr;
> +
> + Assert(list_length(fexpr->args) == 2);
> +
> + leftop = linitial(fexpr->args);
> + rightop = lsecond(fexpr->args);
> +
> + switch (fexpr->funcid)
> + {
> + case F_ELEM_CONTAINED_BY_RANGE:
> + if (!IsA(rightop, Const) || ((Const *) 
> rightop)->constisnull)
> + return NULL;
> +
> + rangeConst = (Const *) rightop;
> + otherExpr = (Expr *) leftop;
> + break;
> +
> + case F_RANGE_CONTAINS_ELEM:
> + if (!IsA(leftop, Const) || ((Const *) 
> leftop)->constisnull)
> + return NULL;
> +
> + rangeConst = (Const *) leftop;
> + otherExpr = (Expr *) rightop;
> + break;
> +
> + default:
> +   

Re: [PATCH] Add support function for containment operators

2023-11-12 Thread Laurenz Albe
On Sun, 2023-11-12 at 18:15 +0100, Laurenz Albe wrote:
> I adjusted your patch according to my comments; what do you think?

I have added the patch to the January commitfest, with Jian and Kim as authors.
I hope that is OK with you.

Yours,
Laurenz Albe




Re: [PATCH] Add support function for containment operators

2023-11-12 Thread Kim Johan Andersson

On 12-11-2023 20:20, Laurenz Albe wrote:

On Sun, 2023-11-12 at 18:15 +0100, Laurenz Albe wrote:

I adjusted your patch according to my comments; what do you think?


I have added the patch to the January commitfest, with Jian and Kim as authors.
I hope that is OK with you.


Sounds great to me. Thanks to Jian for picking this up.

Regards,
Kim Johan Andersson






Re: ResourceOwner refactoring

2023-11-12 Thread Heikki Linnakangas

On 11/11/2023 14:00, Alexander Lakhin wrote:

10.11.2023 17:26, Heikki Linnakangas wrote:


I think that is surprising behavior from the DSA facility. When you make 
allocations with dsa_allocate() or just call
dsa_get_address() on an existing dsa_pointer, you wouldn't expect the current 
resource owner to matter. I think
dsa_create/attach() should store the current resource owner in the dsa_area, 
for use in subsequent operations on the
DSA, per attached patch (0002-Fix-dsa.c-with-different-resource-owners.patch).



With the patch 0002 applied, I'm observing another anomaly:


Ok, so the ownership of a dsa was still muddled :-(. Attached is a new 
version that goes a little further. It replaces the whole 
'mapping_pinned' flag in dsa_area with the 'resowner'. When a mapping is 
pinned, it means that 'resowner == NULL'. This is now similar to how the 
resowner field and pinning works in dsm.c.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 90a0f1c8204f01c423b60be032ea521e4afd7473 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 10 Nov 2023 13:54:11 +0200
Subject: [PATCH 1/3] Add test_dsa module.

This covers basic calls within a single backend process, and a test
that uses a different resource owner. The resource owner test
demonstrates that calling dsa_allocate() or dsa_get_address() while in
a different resource owner than in the dsa_create/attach call causes a
segfault. I think that's a bug.

This resource owner issue was originally found by Alexander Lakhin,
exposed by my large ResourceOwner rewrite commit.

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com
---
 src/test/modules/Makefile |   1 +
 src/test/modules/meson.build  |   1 +
 src/test/modules/test_dsa/Makefile|  23 
 .../modules/test_dsa/expected/test_dsa.out|  13 ++
 src/test/modules/test_dsa/meson.build |  33 +
 src/test/modules/test_dsa/sql/test_dsa.sql|   4 +
 src/test/modules/test_dsa/test_dsa--1.0.sql   |  12 ++
 src/test/modules/test_dsa/test_dsa.c  | 115 ++
 src/test/modules/test_dsa/test_dsa.control|   4 +
 9 files changed, 206 insertions(+)
 create mode 100644 src/test/modules/test_dsa/Makefile
 create mode 100644 src/test/modules/test_dsa/expected/test_dsa.out
 create mode 100644 src/test/modules/test_dsa/meson.build
 create mode 100644 src/test/modules/test_dsa/sql/test_dsa.sql
 create mode 100644 src/test/modules/test_dsa/test_dsa--1.0.sql
 create mode 100644 src/test/modules/test_dsa/test_dsa.c
 create mode 100644 src/test/modules/test_dsa/test_dsa.control

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 738b715e792..a18e4d28a04 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
 		  test_copy_callbacks \
 		  test_custom_rmgrs \
 		  test_ddl_deparse \
+		  test_dsa \
 		  test_extensions \
 		  test_ginpostinglist \
 		  test_integerset \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index d4828dc44d5..4e83c0f8d74 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
+subdir('test_dsa')
 subdir('test_extensions')
 subdir('test_ginpostinglist')
 subdir('test_integerset')
diff --git a/src/test/modules/test_dsa/Makefile b/src/test/modules/test_dsa/Makefile
new file mode 100644
index 000..77583464dca
--- /dev/null
+++ b/src/test/modules/test_dsa/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_dsa/Makefile
+
+MODULE_big = test_dsa
+OBJS = \
+	$(WIN32RES) \
+	test_dsa.o
+PGFILEDESC = "test_dsa - test code for dynamic shared memory areas"
+
+EXTENSION = test_dsa
+DATA = test_dsa--1.0.sql
+
+REGRESS = test_dsa
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_dsa
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_dsa/expected/test_dsa.out b/src/test/modules/test_dsa/expected/test_dsa.out
new file mode 100644
index 000..266010e77fe
--- /dev/null
+++ b/src/test/modules/test_dsa/expected/test_dsa.out
@@ -0,0 +1,13 @@
+CREATE EXTENSION test_dsa;
+SELECT test_dsa_basic();
+ test_dsa_basic 
+
+ 
+(1 row)
+
+SELECT test_dsa_resowners();
+ test_dsa_resowners 
+
+ 
+(1 row)
+
diff --git a/src/test/modules/test_dsa/meson.build b/src/test/modules/test_dsa/meson.build
new file mode 100644
index 000..21738290ad5
--- /dev/null
+++ b/src/test/modules/test_dsa/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2022-2023, PostgreSQL Global Development Group
+
+test_dsa_sources = files(
+  'test_dsa.c',
+)
+
+if host_system == 'windows'
+  test_dsa_sour

Re: ResourceOwner refactoring

2023-11-12 Thread Thomas Munro
On Mon, Nov 13, 2023 at 11:16 AM Heikki Linnakangas  wrote:
> On 11/11/2023 14:00, Alexander Lakhin wrote:
> > 10.11.2023 17:26, Heikki Linnakangas wrote:
> >> I think that is surprising behavior from the DSA facility. When you make 
> >> allocations with dsa_allocate() or just call
> >> dsa_get_address() on an existing dsa_pointer, you wouldn't expect the 
> >> current resource owner to matter. I think
> >> dsa_create/attach() should store the current resource owner in the 
> >> dsa_area, for use in subsequent operations on the
> >> DSA, per attached patch 
> >> (0002-Fix-dsa.c-with-different-resource-owners.patch).

Yeah, I agree that it is surprising and dangerous that the DSM
segments can have different owners if you're not careful, and it's not
clear that you have to be.  Interesting that no one ever reported
breakage in parallel query due to this thinko, presumably because the
current resource owner always turns out to be either the same one or
something with longer life.

> > With the patch 0002 applied, I'm observing another anomaly:
>
> Ok, so the ownership of a dsa was still muddled :-(. Attached is a new
> version that goes a little further. It replaces the whole
> 'mapping_pinned' flag in dsa_area with the 'resowner'. When a mapping is
> pinned, it means that 'resowner == NULL'. This is now similar to how the
> resowner field and pinning works in dsm.c.

This patch makes sense to me.

It might be tempting to delete the dsa_pin_mapping() interface
completely and say that if CurrentResourceOwner == NULL, then it's
effectively (what we used to call) pinned, but I think it's useful for
exception-safe construction of multiple objects to be able to start
out with a resource owner and then later 'commit' by clearing it.  As
seen in GetSessionDsmHandle().
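
For illustration, the pattern looks roughly like this (a sketch;
my_tranche_id stands in for whatever tranche the caller registered):

	/* Construction runs under CurrentResourceOwner, so an error thrown
	 * anywhere in here releases the mapping automatically. */
	dsa_area   *area = dsa_create(my_tranche_id);

	/* ... allocate and initialize objects in the area ... */

	/* 'Commit': detach the mapping from the resource owner and keep it
	 * for the rest of the session, as GetSessionDsmHandle() does. */
	dsa_pin_mapping(area);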

I don't love the way this stuff is not very explicit, and if we're
going to keep doing this 'dynamic scope' stuff then I wish we could
find a scoping notation that looks more like the stuff one sees in
other languages that say something like
"with-resource-owner(area->resowner) { block of code }".




Re: pg_walfile_name_offset can return inconsistent values

2023-11-12 Thread Michael Paquier
On Fri, Nov 10, 2023 at 07:59:43PM -0800, Andres Freund wrote:
> I think this needs to add tests "documenting" the changed behaviour and
> perhaps also for a few other edge cases.  You could e.g. test
>   SELECT * FROM pg_walfile_name_offset('0/0')
> which today returns bogus values, and which is independent of the wal segment
> size.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: Cleaning up array_in()

2023-11-12 Thread Heikki Linnakangas

On 09/11/2023 18:57, Tom Lane wrote:

After further thought I concluded that this area is worth spending
a little more code for.  If we have input like '{foo"bar"}' or
'{"foo"bar}' or '{"foo""bar"}', what it most likely means is that
the user misunderstood the quoting rules.  A message like "Unexpected
array element" is pretty completely unhelpful for figuring that out.
The alternative I was considering, "Unexpected """ character", would
not be much better.  What we want to say is something like "Incorrectly
quoted array element", and the attached v12 makes ReadArrayToken do
that for both quoted and unquoted cases.


+1


I also fixed the obsolete documentation that Alexander noted, and
cleaned up a couple other infelicities (notably, I'd blindly written
ereport(ERROR) in one place where ereturn is now the way).

Barring objections, I think v12 is committable.


Looks good to me. Just two little things caught my eye:

1.


/* Initialize dim, lBound for ReadArrayDimensions, ReadArrayStr */
for (int i = 0; i < MAXDIM; i++)
{
dim[i] = -1;/* indicates "not yet known" */
lBound[i] = 1;  /* default lower bound */
}


The function comments in ReadArrayDimensions and ReadArrayStr don't make 
it clear that these arrays need to be initialized like this. 
ReadArrayDimensions() says that they are output variables, and 
ReadArrayStr() doesn't mention anything about having to initialize them.



2. This was the same before this patch, but:

postgres=# select '{{{{{{{{{{1}}}}}}}}}}'::int[];
ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)
LINE 1: select '{{{{{{{{{{1}}}}}}}}}}'::int[];
               ^

The error message isn't great, as the literal contains 10 dimensions, 
not 7 as the error message claims.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Cleaning up array_in()

2023-11-12 Thread Tom Lane
Heikki Linnakangas  writes:
> On 09/11/2023 18:57, Tom Lane wrote:
>> Barring objections, I think v12 is committable.

> Looks good to me. Just two little things caught my eye:

> 1.
> The function comments in ReadArrayDimensions and ReadArrayStr don't make 
> it clear that these arrays need to be initialized like this. 
> ReadArrayDimensions() says that they are output variables, and 
> ReadArrayStr() doesn't mention anything about having to initialize them.

Roger, will fix that.

> 2. This was the same before this patch, but:

> postgres=# select '{{{{{{{{{{1}}}}}}}}}}'::int[];
> ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)
> LINE 1: select '{{{{{{{{{{1}}}}}}}}}}'::int[];
> ^
> The error message isn't great, as the literal contains 10 dimensions, 
> not 7 as the error message claims.

Yeah.  To make that report accurate, we'd have to somehow postpone
issuing the error until we've seen all the left braces (or at least
all the initial ones).  There's a related problem in reading an
explicitly-dimensioned array:

postgres=# select '[1][2][3][4][5][6][7][8][9]={}'::text[];
ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)

I kind of think it's not worth the trouble.  What was discussed
upthread was revising the message to not claim it knows how many
dimensions there are.  The related cases in plperl and plpython just
say "number of array dimensions exceeds the maximum allowed (6)",
and there's a case to be made for adjusting the core messages
similarly.  I figured that could be a separate patch though,
since it'd touch more than array_in (there's about a dozen
occurrences of the former wording).

regards, tom lane




RE: Synchronizing slots from primary to standby

2023-11-12 Thread Zhijie Hou (Fujitsu)
On Thursday, November 9, 2023 6:54 PM shveta malik  
wrote:
> 
> 
> PFA v32 patches which has below changes:

Thanks for updating the patch.

Here are a few comments:

1.
Do we need to update the slot upgrade code in pg_upgrade to upgrade the
slot's failover property as well?

2.
The check for wal_level < WAL_LEVEL_LOGICAL.

It seems the existing code disallows slot creation if wal_level is not
sufficient; I am thinking we might need a similar check in the slot sync
worker. Otherwise, the synced slot could not be used after standby
promotion.
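
For example, something like this in the worker (a sketch, modeled on the
existing check done at slot creation time):

	if (wal_level < WAL_LEVEL_LOGICAL)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("logical decoding requires wal_level >= logical")));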

Best Regards,
Hou zj


How to solve the problem of one backend process crashing and causing other processes to restart?

2023-11-12 Thread yuansong
In PostgreSQL, when a backend process crashes, it can cause other backend 
processes to also require a restart, primarily to ensure data consistency. I 
understand that the correct approach is to analyze and identify the cause of 
the crash and resolve it. However, it is also important to be able to handle a 
backend process crash without affecting the operation of other processes, thus 
minimizing the scope of negative impact and improving availability. To achieve 
this goal, could we mimic the Oracle process by introducing a "pmon" process 
dedicated to rolling back crashed process transactions and performing resource 
cleanup? I wonder if anyone has attempted such a strategy or if there have been 
previous discussions on this topic.

Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2023-11-12 Thread Thomas Munro
Trivial observation: these patches obviously introduce many instances
of words derived from "cancel", but they don't all conform to
established project decisions (cf 21f1e15a) about how to spell them.
We follow the common en-US usage: "canceled", "canceling" but
"cancellation".  Blame Webstah et al.

https://english.stackexchange.com/questions/176957/cancellation-canceled-canceling-us-usage




Re: How to solve the problem of one backend process crashing and causing other processes to restart?

2023-11-12 Thread Tom Lane
yuansong  writes:
> In PostgreSQL, when a backend process crashes, it can cause other backend 
> processes to also require a restart, primarily to ensure data consistency. I 
> understand that the correct approach is to analyze and identify the cause of 
> the crash and resolve it. However, it is also important to be able to handle 
> a backend process crash without affecting the operation of other processes, 
> thus minimizing the scope of negative impact and improving availability. To 
> achieve this goal, could we mimic the Oracle process by introducing a "pmon" 
> process dedicated to rolling back crashed process transactions and performing 
> resource cleanup? I wonder if anyone has attempted such a strategy or if 
> there have been previous discussions on this topic.

The reason we force a database-wide restart is that there's no way to
be certain that the crashed process didn't corrupt anything in shared
memory.  (Even with the forced restart, there's a window where bad
data could reach disk before we kill off the other processes that
might write it.  But at least it's a short window.)  "Corruption"
here doesn't just involve bad data placed into disk buffers; more
often it's things like unreleased locks, which would block other
processes indefinitely.

I seriously doubt that anything like what you're describing
could be made reliable enough to be acceptable.  "Oracle does
it like this" isn't a counter-argument: they have a much different
(and non-extensible) architecture, and they also have an army of
programmers to deal with minutiae like undoing resource acquisition.
Even with that, you'd have to wonder about the number of bugs
existing in such necessarily-poorly-tested code paths.

regards, tom lane




Re: A recent message added to pg_upgade

2023-11-12 Thread Amit Kapila
On Sat, Nov 11, 2023 at 5:46 AM Michael Paquier  wrote:
>
> On Fri, Nov 10, 2023 at 03:27:25PM +0530, Amit Kapila wrote:
> > I don't think this comment is correct because there won't be any apply
> > activity on the new cluster as after restoration subscriptions should
> > be disabled. On the old cluster, I think one problem is that the
> > origins may move forward after we copy them which can cause data
> > inconsistency issues. The other is that we may not prefer to generate
> > additional data and WAL during the upgrade. Also, I am not completely
> > sure about using the word 'corruption' in this context.
>
> What is your suggestion here?  Would it be better to just aim for
> simplicity and just say that we don't want it to run because "it can
> lead to some inconsistent behaviors"?
>

I think we can be specific about logical replication stuff. I have not
done any study on autovacuum behavior related to this, so we can
update about it separately if required. I could think of something
like the following:

-   /* Use -b to disable autovacuum. */
+   /*
+    * Use -b to disable autovacuum and logical replication launcher
+    * (effective in PG17 or later for the latter).
+    *
+    * Logical replication workers can stream data during the upgrade,
+    * which can cause replication origins to move forward after we have
+    * copied them. It can cause the system to request the data which is
+    * already present in the new cluster.
+    */

Now, ideally, such a comment change makes more sense along with the
main patch, so either we can go without this comment change or
probably wait till the main patch is ready and merge just before it or
along with it. I am fine either way.

BTW, another part of the comment, "... for the latter", is not clear
to me. Is there a typo there, or am I missing something?

-- 
With Regards,
Amit Kapila.




Re: Is this a problem in GenericXLogFinish()?

2023-11-12 Thread Amit Kapila
On Fri, Nov 10, 2023 at 9:17 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> > Next we should add some test codes. I will continue considering, but
> > please post anything if you have an idea.
>
> And I did, PSA the patch. This patch adds two parts in hash_index.sql.
>
> In the first part, the primary bucket page is filled with live tuples and
> some overflow pages with dead ones. This leads to the removal of overflow
> pages without moving tuples.
> HEAD will crash if this were executed, which is already reported on hackers.
>

+-- And do CHECKPOINT and vacuum. Some overflow pages will be removed.
+CHECKPOINT;
+VACUUM hash_cleanup_heap;

Why do we need CHECKPOINT in the above test?

> The second one tests a case (ntups == 0 && is_prim_bucket_same_wrt == false &&
> is_prev_bucket_same_wrt == true), which is never done before.
>

+-- Insert few tuples, the primary bucket page will not satisfy
+INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 50) as i;

What do you mean by the second part of the sentence (the primary
bucket page will not satisfy)?

Few other minor comments:
===
1. Can we use ROLLBACK instead of ABORT in the tests?

2.
- for (i = 0; i < nitups; i++)
+ for (int i = 0; i < nitups; i++)

I don't think there is a need to make this unrelated change.

3.
+ /*
+ * A write buffer needs to be registered even if no tuples are added to
+ * it to ensure that we can acquire a cleanup lock on it if it is the
+ * same as primary bucket buffer or update the nextblkno if it is same
+ * as the previous bucket buffer.
+ */
+ else if (xlrec.is_prim_bucket_same_wrt || xlrec.is_prev_bucket_same_wrt)
+ {
+ uint8 wbuf_flags;
+
+ Assert(xlrec.ntups == 0);

Can we move this comment inside else if, just before Assert?

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2023-11-12 Thread shveta malik
On Mon, Nov 13, 2023 at 6:19 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Thursday, November 9, 2023 6:54 PM shveta malik  
> wrote:
> >
> >
> > PFA v32 patches which has below changes:
>
> Thanks for updating the patch.
>
> Here are a few comments:
>
>
> 2.
> The check for wal_level < WAL_LEVEL_LOGICAL.
>
> It seems the existing code disallows slot creation if wal_level is not
> sufficient; I am thinking we might need a similar check in the slot sync
> worker. Otherwise, the synced slot could not be used after standby
> promotion.
>

Yes, I agree. We should add this check.

> Best Regards,
> Hou zj




Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-12 Thread torikoshia

On 2023-11-12 16:46, Michael Paquier wrote:

On Fri, Nov 10, 2023 at 01:15:50PM +0900, Michael Paquier wrote:

The comments added could be better grammatically, but basically LGTM.
I'll take care of that if there are no objections.


The documentation also needed a few tweaks (for DEFAULT and the
argument name), so I have fixed the whole and adapted the new part of
the docs to that, with few little tweaks.


Thanks!

I assume you have already taken this into account, but I think we should 
add the same documentation to the below patch for pg_stat_reset_slru():


  
https://www.postgresql.org/message-id/CALj2ACW4Fqc_m%2BOaavrOMEivZ5aBa24pVKvoXRTmuFECsNBfAg%40mail.gmail.com


On 2023-11-12 16:54, Michael Paquier wrote:

On Fri, Nov 10, 2023 at 08:32:34PM +0900, torikoshia wrote:

On 2023-11-10 13:18, Andres Freund wrote:

I see no reason to not include slrus. We should never have added the
ability to reset them individually, particularly not without a use
case - I couldn't find one skimming some discussion. And what's the
point in not allowing to reset them via pg_stat_reset_shared()?


When including SLRUs, do you think it's better to add a 'slrus' argument
which enables pg_stat_reset_shared() to reset all SLRUs?


I understand that Andres says that he'd be OK with the addition of a
'slru' option in pg_stat_reset_shared(), as well as including SLRUs in
the resets if everything should be wiped.


Thanks, I'll make the patch.


28cac71bd368 is around since 13~, so changing pg_stat_reset_slru() or
removing it could impact existing applications, so there's little
benefit in changing it at this stage.  Let it be itself.


+1.

As described above, since SLRUs cannot be reset by pg_stat_reset_shared(),
I feel a bit uncomfortable deleting it altogether.


That would be only effective if NULL is given to the function to reset
everything, which is OK IMO, because these are shared stats.
--
Michael


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Synchronizing slots from primary to standby

2023-11-12 Thread shveta malik
On Thu, Nov 9, 2023 at 8:56 AM Amit Kapila  wrote:
>
> On Thu, Nov 9, 2023 at 8:11 AM Amit Kapila  wrote:
> >
> > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand
> >  wrote:
> > >
> > > > Unrelated to above, if there is a user slot on standby with the same
> > > > name which the slot-sync worker is trying to create, then shall it
> > > > emit a warning and skip the sync of that slot or shall it throw an
> > > > error?
> > > >
> > >
> > > I'd vote for emit a warning and move on to the next slot if any.
> > >
> >
> > But then it could take time for users to know the actual problem and
> > they probably notice it after failover. OTOH, if we throw an error
> > then probably they will come to know earlier because the slot sync
> > mechanism would be stopped. Do you have reasons to prefer giving a
> > WARNING and skipping creating such slots? I expect this WARNING to
> > keep getting repeated in LOGs because the consecutive sync tries will
> > again generate a WARNING.
> >
>
> Apart from the above, I would like to discuss the slot sync work
> distribution strategy of this patch. The current implementation as
> explained in the commit message [1] works well if the slots belong to
> multiple databases. It is clear from the data in emails [2][3][4] that
> having more workers really helps if the slots belong to multiple
> databases. But I think if all the slots belong to one or very few
> databases then such a strategy won't be as good. Now, on one hand, we
> get very good numbers for a particular workload with the strategy used
> in the patch but OTOH it may not be adaptable to various different
> kinds of workloads. So, I have a question whether we should try to
> optimize this strategy for various kinds of workloads or for the first
> version let's use a single-slot sync-worker and then we can enhance
> the functionality in later patches either in PG17 itself or in PG18 or
> later versions.

I can work on separating the patch. We can first focus on single
worker design and then we can work on multi-worker design either
immediately (if needed) or we can target it in the second draft of the
patch. I would like to know the thoughts of others on this.

> One thing to note is that a lot of the complexity of
> the patch is attributed to the multi-worker strategy which may still
> not be efficient, so there is an argument to go with a simpler
> single-slot sync-worker strategy and then enhance it in future
> versions as we learn more about various workloads. It will also help
> to develop this feature incrementally instead of doing all the things
> in one go and taking a much longer time than it should.
>

Agreed. With multi-workers, a lot of complexity (dsa, locks, etc.) has
come into play. We can decide better on our workload-distribution
strategy among workers once we have more clarity on different types of
workloads.

>
> [1] - "The replication launcher on the physical standby queries
> primary to get the list of dbids for failover logical slots. Once it
> gets the dbids, if dbids < max_slotsync_workers, it starts only that
> many workers, and if dbids > max_slotsync_workers, it starts
> max_slotsync_workers and divides the work equally among them. Each
> worker is then responsible to keep on syncing the logical slots
> belonging to the DBs assigned to it.
>
> Each slot-sync worker will have its own dbids list. Since the upper
> limit of this dbid-count is not known, it needs to be handled using
> dsa. We initially allocated memory to hold 100 dbids for each worker.
> If this limit is exhausted, we reallocate this memory with size
> incremented again by 100."
>
> [2] - 
> https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com
> [3] - 
> https://www.postgresql.org/message-id/CAFPTHDZw2G3Pax0smymMjfPqdPcZhMWo36f9F%2BTwNTs0HFxK%2Bw%40mail.gmail.com
> [4] - 
> https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.




Re: Add recovery to pg_control and remove backup_label

2023-11-12 Thread Michael Paquier
On Fri, Nov 10, 2023 at 02:55:19PM -0400, David Steele wrote:
> On 11/10/23 00:37, Michael Paquier wrote:
>> I've done a few more dozen runs, and still nothing.  I am wondering
>> what this disturbance was.
> 
> OK, hopefully it was just a blip.

Still nothing on this side.  So that seems really like a random blip
in the matrix.

> This has been split out.

Thanks, applied 0001.

>> The term "backup recovery", that we've never used in the tree until
>> now as far as I know.  Perhaps this recovery method should just be
>> referred as "recovery from backup"?
> 
> Well, "backup recovery" is less awkward, I think. For instance "backup
> recovery field" vs "recovery from backup field".

Not sure.  I've never used this term when referring to recovery from a
backup.  Perhaps I'm just not used to it; still, that sounds a bit
confusing here.

>> Something in this area is that backupRecoveryRequired is the switch
>> controlling if the fields set by the recovery initialization.  Could
>> it be actually useful to leave the other fields as they are and only
>> reset backupRecoveryRequired before the first control file update?
>> This would leave a trace of the backup history directly in the control
>> file.
> 
> Since the other recovery fields are cleared in ReachedEndOfBackup() this
> would be a change from what we do now.
> 
> None of these fields are ever visible (with the exception of
> minRecoveryPoint/TLI) since they are reset when the database becomes
> consistent and before logons are allowed. Viewing them with pg_controldata
> makes sense, but I'm not sure pg_control_recovery() does.
> 
> In fact, are backup_start_lsn, backup_end_lsn, and
> end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe
> I'm missing something?

Yeah, but custom backup/restore tools may want to manipulate the contents
of the control file for their own work, so at least for the sake of
visibility it sounds important to me to show all the information at
hand, and that there is no need to.

-The backup label
-file includes the label string you gave to 
pg_backup_start,
+The backup history file (which is archived like WAL) includes the label
+string you gave to pg_backup_start,
 as well as the time at which pg_backup_start was run, 
and
 the name of the starting WAL file.  In case of confusion it is therefore
-possible to look inside a backup file and determine exactly which
+possible to look inside a backup history file and determine exactly which

As a side note, it is a bit disappointing that we lose the backup
label from the backup itself, even if the patch changes correctly the
documentation to reflect the new behavior.  It is in the backup
history file on the node from where the base backup has been taken or
in the archives, hopefully.  However there is nothing that remains in
the base backup itself, and backups can be self-contained (easy with
pg_basebackup --wal-method=stream).  I think that we should retain a
minimum amount of information as a replacement for the backup_label,
at least.  With this state, the current patch slightly reduces the
debuggability of deployments.  That could be annoying for some users.

> New patches attached based on eb81e8e790.

Diving into the code for references about the backup label file, I
have spotted this log in pg_rewind that is now incorrect:
if (showprogress)
pg_log_info("creating backup label and updating control file");

+printf(_("Backup start location's timeline: %u\n"),
+   ControlFile->backupStartPointTLI);
 printf(_("Backup end location:  %X/%X\n"),
LSN_FORMAT_ARGS(ControlFile->backupEndPoint));
Perhaps these two should be reversed to match with the header file.


+/*
+ * After pg_backup_stop() returns this field will contain a copy of
+ * pg_control that should be stored with the backup. Fields have been
+ * updated for recovery and the CRC has been recalculated. The buffer
+ * is padded to PG_CONTROL_MAX_SAFE_SIZE so that pg_control is always
+ * a consistent size but smaller (and hopefully easier to handle) than
+ * PG_CONTROL_FILE_SIZE. Bytes after sizeof(ControlFileData) are zeroed.
+ */
+uint8_t controlFile[PG_CONTROL_MAX_SAFE_SIZE];

I don't mind the addition of a control file with the max safe size,
because it will never be higher than that.  However:

+/* End the backup before sending pg_control */
+basebackup_progress_wait_wal_archive(&state);
+do_pg_backup_stop(backup_state, !opt->nowait);
+
+/* Send copy of pg_control containing recovery info */
+sendFileWithContent(sink, XLOG_CONTROL_FILE,
+(char *)backup_state->controlFile,
+PG_CONTROL_MAX_SAFE_SIZE, &manifest);

It seems to me that the base backup protocol should always send an 8k
file for the control file so as we maintain consi

Re: A recent message added to pg_upgade

2023-11-12 Thread Michael Paquier
On Mon, Nov 13, 2023 at 08:45:12AM +0530, Amit Kapila wrote:
> I think we can be specific about logical replication stuff. I have not
> done any study on autovacuum behavior related to this, so we can
> update about it separately if required.

Autovacuum, as far as I recall, could decide to do some work before
files are physically copied from the old to the new cluster,
corrupting the new cluster.  Per 76dd09bbec89:

+*  If we have lost the autovacuum launcher, try to start a new one.
+*  We don't want autovacuum to run in binary upgrade mode because
+*  autovacuum might update relfrozenxid for empty tables before
+*  the physical files are put in place.

> -   /* Use -b to disable autovacuum. */
> +   /*
> +    * Use -b to disable autovacuum and logical replication launcher
> +    * (effective in PG17 or later for the latter).
> +    *
> +    * Logical replication workers can stream data during the upgrade,
> +    * which can cause replication origins to move forward after we have
> +    * copied them. It can cause the system to request the data which is
> +    * already present in the new cluster.
> +    */
> 
> Now, ideally, such a comment change makes more sense along with the
> main patch, so either we can go without this comment change or
> probably wait till the main patch is ready and merge just before it or
> along with it. I am fine either way.

Another location would be to document that stuff directly in
launcher.c where the check for IsBinaryUpgrade would be added.  You
are right that it makes little sense to document that now, so how
about:
1) keeping pg_upgrade.c minimal, say:
-   /* Use -b to disable autovacuum. */
+   /*
+    * Use -b to disable autovacuum and logical replication
+    * launcher (in 17~).
+    */
With a removal of the comment block related to
max_logical_replication_workers=0?
2) Document that in ApplyLauncherRegister() as part of the main patch
for the subscribers?

> BTW, another part of the comment, "... for the latter", is not clear
> to me. Is there a typo there, or am I missing something?

The "latter" refers to the logirep launcher here, as -b would affect
it only in 17~ with the patch I sent previously.
--
Michael


signature.asc
Description: PGP signature


Re: MERGE ... RETURNING

2023-11-12 Thread jian he
Hi.
v13 works fine; all tests passed. The code is very intuitive. I played
with multiple WHEN clauses, even with before/after row triggers, and they
work as expected.

I don't know when replace_outer_merging will be invoked; I even set a
breakpoint on it. Coverage shows replace_outer_merging is only called
once.

Doesn't sql-merge.html miss mentioning that RETURNING requires SELECT
privilege on the columns? In sql-insert.html, we have:
"Use of the RETURNING clause requires SELECT privilege on all columns
mentioned in RETURNING. If you use the query clause to insert rows
from a query, you of course need to have SELECT privilege on any table
or column used in the query."

I saw the change in src/sgml/glossary.sgml, so I looked around. In the
"Materialized view (relation)" part, it says "It cannot be modified via
INSERT, UPDATE, or DELETE operations." Do we need to put MERGE into
that sentence?
Also, since there are entries for SELECT, INSERT, UPDATE, and DELETE, do
we need to add a MERGE entry in glossary.sgml?




Re: Remove MSVC scripts from the tree

2023-11-12 Thread Michael Paquier
On Fri, Nov 10, 2023 at 08:38:21AM +0100, Peter Eisentraut wrote:
> How about this patch as a comprehensive solution?
>  8 files changed, 26 insertions(+), 339 deletions(-)

Thanks for the patch.  The numbers are here, and the patch looks
sensible.

The contents of probes.h without --enable-dtrace are exactly the same
before and after the patch.

In short, +1 to switch to what you are proposing here.
--
Michael


signature.asc
Description: PGP signature


Re: Synchronizing slots from primary to standby

2023-11-12 Thread Ajin Cherian
On Thu, Nov 9, 2023 at 9:54 PM shveta malik  wrote:
>
> PFA v32 patches which has below changes:
Testing with this patch, I see that if a failover-enabled slot is
invalidated on the primary, then the corresponding synced slot is not
invalidated on the standby. Instead, I see that it continuously gets
the below error:
" WARNING:  not synchronizing slot sub; synchronization would move it backwards"

In the code, I see that:

    if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn)
    {
        ereport(WARNING,
                errmsg("not synchronizing slot %s; synchronization would move"
                       " it backwards", remote_slot->name));

        ReplicationSlotRelease();
        CommitTransactionCommand();
        return;
    }

If the restart_lsn of the remote slot is behind, then the
local_slot_update() function is never called to set the invalidation
status on the local slot. And for invalidated slots, restart_lsn is
always NULL.

regards,
Ajin Cherian
Fujitsu Australia




RE: Is this a problem in GenericXLogFinish()?

2023-11-12 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for reviewing! PSA new version patch.

> > > Next we should add some test codes. I will continue considering, but
> > > please post anything if you have an idea.
> >
> > And I did, PSA the patch. This patch adds two parts in hash_index.sql.
> >
> > In the first part, the primary bucket page is filled with live tuples and
> > some overflow pages with dead ones. This leads to the removal of overflow
> > pages without moving tuples.
> > HEAD will crash if this were executed, which is already reported on hackers.
> >
> 
> +-- And do CHECKPOINT and vacuum. Some overflow pages will be removed.
> +CHECKPOINT;
> +VACUUM hash_cleanup_heap;
> 
> Why do we need CHECKPOINT in the above test?

The CHECKPOINT command is needed to write hash pages to disk. IIUC, if
the command is omitted, none of the tuples are written to the hash index
*file*, so the squeeze would not occur.

You can test this by 1) reverting the changes to hashovfl.c and then
2) removing the CHECKPOINT command. The server crash would not occur.

> > The second one tests a case (ntups == 0 && is_prim_bucket_same_wrt == false &&
> > is_prev_bucket_same_wrt == true), which is never done before.
> >
> 
> +-- Insert few tuples, the primary bucket page will not satisfy
> +INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 50) as i;
> 
> What do you mean by the second part of the sentence (the primary
> bucket page will not satisfy)?

I meant to say that the primary bucket page can still hold more tuples.
Maybe I should say "will not be full". Reworded.

> Few other minor comments:
> ===
> 1. Can we use ROLLBACK instead of ABORT in the tests?

Changed. IIRC they have the same meaning, but it seems that most of the
sql files use "ROLLBACK".

> 2.
> - for (i = 0; i < nitups; i++)
> + for (int i = 0; i < nitups; i++)
> 
> I don't think there is a need to make this unrelated change.

Reverted. I thought that the variable i would be used only when
nitups > 0 so that it could be removed, but that is unrelated to this
patch.

> 3.
> + /*
> + * A write buffer needs to be registered even if no tuples are added to
> + * it to ensure that we can acquire a cleanup lock on it if it is the
> + * same as primary bucket buffer or update the nextblkno if it is same
> + * as the previous bucket buffer.
> + */
> + else if (xlrec.is_prim_bucket_same_wrt || xlrec.is_prev_bucket_same_wrt)
> + {
> + uint8 wbuf_flags;
> +
> + Assert(xlrec.ntups == 0);
> 
> Can we move this comment inside else if, just before Assert?

Moved.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



fix_hash_squeeze_wal_5.patch
Description: fix_hash_squeeze_wal_5.patch


Re: Adding facility for injection points (or probe points?) for more advanced tests

2023-11-12 Thread Michael Paquier
On Fri, Nov 10, 2023 at 02:44:25PM -0600, Nathan Bossart wrote:
> On Tue, Nov 07, 2023 at 05:01:16PM +0900, Michael Paquier wrote:
>> +#ifdef USE_INJECTION_POINTS
>> +#define INJECTION_POINT_RUN(name) InjectionPointRun(name)
>> +#else
>> +#define INJECTION_POINT_RUN(name) ((void) name)
>> +#endif
> 
> nitpick: Why is the non-injection-point version "(void) name"?  I see
> "(void) true" used elsewhere for this purpose.

Or (void) 0.
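
i.e. the fallback definition would become:

	#ifdef USE_INJECTION_POINTS
	#define INJECTION_POINT_RUN(name) InjectionPointRun(name)
	#else
	#define INJECTION_POINT_RUN(name) ((void) 0)
	#endif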

>> +
>> + Here is an example of callback for
>> + InjectionPointCallback:
>> +
>> +static void
>> +custom_injection_callback(const char *name)
>> +{
>> +elog(NOTICE, "%s: executed custom callback", name);
>> +}
>> +
> 
> Why do we provide the name to the callback functions?

This is to allow the same callback to be used across multiple points,
and tracking the name of the event makes sense to me so as to know
which code path is being taken when a callback is called.  One thing
that I have in mind as well here is to be able to register custom wait
events based on the name of the callback taken, for example on a
condition variable, a latch or a named LWLock.

> Overall, the design looks pretty good to me.  I think it's a good idea to
> keep it simple to start with.  Since this is really only intended for
> special tests that run in special builds, it seems like we ought to be able
> to change it easily in the future as needed.

Yes, my first idea is to keep the initial design minimal and take the
temperature.  As far as I can see, there seems to be no strong
objection to this basic design; still, I agree that I need to show a
bit more code demonstrating its usability.  I have some SQL and recovery
cases where this is handy, and these have piled up over time; including
at least two or three of them, with more basic APIs in the test module,
may make sense in the initial batch of what I am proposing here.
--
Michael


signature.asc
Description: PGP signature


Re: How to solve the problem of one backend process crashing and causing other processes to restart?

2023-11-12 Thread Laurenz Albe
On Sun, 2023-11-12 at 21:55 -0500, Tom Lane wrote:
> yuansong  writes:
> > In PostgreSQL, when a backend process crashes, it can cause other backend
> > processes to also require a restart, primarily to ensure data consistency.
> > I understand that the correct approach is to analyze and identify the
> > cause of the crash and resolve it. However, it is also important to be
> > able to handle a backend process crash without affecting the operation of
> > other processes, thus minimizing the scope of negative impact and
> > improving availability. To achieve this goal, could we mimic the Oracle
> > process by introducing a "pmon" process dedicated to rolling back crashed
> > process transactions and performing resource cleanup? I wonder if anyone
> > has attempted such a strategy or if there have been previous discussions
> > on this topic.
> 
> The reason we force a database-wide restart is that there's no way to
> be certain that the crashed process didn't corrupt anything in shared
> memory.  (Even with the forced restart, there's a window where bad
> data could reach disk before we kill off the other processes that
> might write it.  But at least it's a short window.)  "Corruption"
> here doesn't just involve bad data placed into disk buffers; more
> often it's things like unreleased locks, which would block other
> processes indefinitely.
> 
> I seriously doubt that anything like what you're describing
> could be made reliable enough to be acceptable.  "Oracle does
> it like this" isn't a counter-argument: they have a much different
> (and non-extensible) architecture, and they also have an army of
> programmers to deal with minutiae like undoing resource acquisition.
> Even with that, you'd have to wonder about the number of bugs
> existing in such necessarily-poorly-tested code paths.

Yes.
I think that PostgreSQL's approach is superior: rather than investing in
code to mitigate the impact of data corruption caused by a crash, invest
in quality code that doesn't crash in the first place.

Euphemistically naming a crash "ORA-600 error" seems to be part of
their strategy.

Yours,
Laurenz Albe




Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-12 Thread Michael Paquier
On Mon, Nov 13, 2023 at 01:15:14PM +0900, torikoshia wrote:
> I assume you have already taken this into account, but I think we should add
> the same documentation to the below patch for pg_stat_reset_slru():
> 
> https://www.postgresql.org/message-id/CALj2ACW4Fqc_m%2BOaavrOMEivZ5aBa24pVKvoXRTmuFECsNBfAg%40mail.gmail.com

Yep, the DEFAULT value and the argument name should be documented in
monitoring.sgml.
--
Michael


signature.asc
Description: PGP signature


Re: Synchronizing slots from primary to standby

2023-11-12 Thread shveta malik
On Mon, Nov 13, 2023 at 11:02 AM Ajin Cherian  wrote:
>
> On Thu, Nov 9, 2023 at 9:54 PM shveta malik  wrote:
> >
> > PFA v32 patches which has below changes:
> Testing with this patch, I see that if a failover-enabled slot is
> invalidated on the primary, then the corresponding synced slot is not
> invalidated on the standby. Instead, I see that it continuously gets
> the below error:
> " WARNING:  not synchronizing slot sub; synchronization would move it
> backwards"
>
> In the code, I see that:
>
>     if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn)
>     {
>         ereport(WARNING,
>                 errmsg("not synchronizing slot %s; synchronization would move"
>                        " it backwards", remote_slot->name));
>
>         ReplicationSlotRelease();
>         CommitTransactionCommand();
>         return;
>     }
>
> If the restart_lsn of the remote slot is behind, then the
> local_slot_update() function is never called to set the invalidation
> status on the local slot. And for invalidated slots, restart_lsn is
> always NULL.
>

Okay. Thanks for testing, Ajin. I think it needs a fix wherein we set
the local slot's invalidation status (provided it is not invalidated
already) from the remote slot before this check itself. And if the slot
is invalidated locally, either by itself or by the primary_slot being
invalidated, then we should skip the sync. I will fix this in the next
version.

thanks
Shveta




Re: pg_upgrade --copy-file-range

2023-11-12 Thread Peter Eisentraut

On 08.10.23 07:15, Thomas Munro wrote:

About your patch:

I think you should have a "check" function called from
check_new_cluster().  That check function can then also handle the "not
supported" case, and you don't need to handle that in
parseCommandLine().  I suggest following the clone example for these,
since the issues there are very similar.


Done.


This version looks good to me.

Tiny nit:  You copy-and-pasted "%s/PG_VERSION.clonetest"; maybe choose a 
different suffix.