Re: freeing LDAPMessage in CheckLDAPAuth

2022-09-04 Thread Michael Paquier
On Sun, Sep 04, 2022 at 01:52:10AM -0400, Tom Lane wrote:
> I can't get too excited about this.  All of the error exit paths in
> backend authentication code will lead immediately to process exit, so
> the possibility of some memory being leaked really has no consequences
> worth worrying about.  If we *were* worried about it, sprinkling a few
> more ldap_msgfree() calls into the existing code would hardly make it
> more bulletproof.

Even if this is not critical in the backend for this authentication
path, I'd like to think that it is still a good practice for future
code, so that anything copy-pasted around would get the call.  So I see
no reason not to put something on HEAD at least.

> There's lots of psprintf() and other
> Postgres-universe calls in that code that could potentially fail and
> force an elog exit without reaching ldap_msgfree.  So if you wanted to
> make this completely clean you'd need to resort to doing the freeing
> in PG_CATCH blocks ... and I don't see any value in hacking it to that
> extent.

Agreed.  I cannot get excited about going down to that in this case.

> What might be worth inspecting is the code paths in frontend libpq
> that call ldap_msgfree(), because on the client side we don't get to
> assume that an error will lead to immediate process exit.  If we've
> missed any cleanups over there, that *would* be worth fixing.

FWIW, I have looked at the frontend while writing my previous message
and did not notice anything.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: pg_collation_actual_version() -> pg_collation_current_version().

2022-09-04 Thread Alvaro Herrera
On 2021-Feb-26, Thomas Munro wrote:

> On Thu, Feb 25, 2021 at 10:49 PM Peter Eisentraut  
> wrote:
> > Seeing that explanation, I think that's even more of a reason to avoid
> > the name "current" and use something strikingly different.
> >
> > In any case, this function name has been around for some years now and
> > renaming it just for taste reasons seems unnecessary.
> 
> I guess my unspoken assumption was that anyone using this in a query
> is probably comparing it with collversion and thus already has a query
> that needs to be rewritten for v14, and therefore it's not a bad time
> to clean up some naming.  But that argument is moot if you don't even
> agree that the new name's an improvement, so... reverted.

18 months later I find myself translating this message:

#: utils/init/postinit.c:457
msgid "template database \"%s\" has a collation version, but no actual 
collation version could be determined"

and I have absolutely no idea what to translate it to.  What does it
*mean*?  I think if it does mean something important, there should be a
"translator:" comment next to it.

However, looking at get_collation_actual_version, I think returning NULL
is unexpected anyway; it appears that in all cases where something
failed, it has already reported an error internally.  Also, several
callers seem to ignore the possibility of it returning NULL.  So I
wonder if we can just turn this ereport() into an elog() and call it a
day.


The word "actual" is very seldom used in catalogued server messages.  We
only have these few:

msgid "template database \"%s\" has a collation version, but no actual 
collation version could be determined"
msgid "could not determine actual type of argument declared %s"
msgid "RADIUS response from %s has corrupt length: %d (actual length %d)"
msgid "ShmemIndex entry size is wrong for data structure \"%s\": expected %zu, 
actual %zu"
msgid "could not determine actual enum type"
msgid "collation \"%s\" has no actual version, but a version was recorded"
msgid "could not determine actual result type for function \"%s\" declared to 
return type %s"
msgid "database \"%s\" has no actual collation version, but a version was 
recorded"

and my strategy so far is to translate it to "real" (meaning "true") or
just ignore the word altogether, which gives good results IMO.  But in
this particular case it looks like it's a very critical part of the message.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I don't remember, but it's not true.  It's not true, and if it were
 true, I don't remember." (Augusto Pinochet to a court of justice)




Re: pgsql: Prefetch data referenced by the WAL, take II.

2022-09-04 Thread Alvaro Herrera
On 2022-Apr-07, Thomas Munro wrote:

> Prefetch data referenced by the WAL, take II.

I propose a small wording change in guc.c,

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9fbbfb1be5..9803741708 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2840,7 +2840,7 @@ static struct config_int ConfigureNamesInt[] =
 	{
 		{"wal_decode_buffer_size", PGC_POSTMASTER, WAL_RECOVERY,
 			gettext_noop("Maximum buffer size for reading ahead in the WAL during recovery."),
-			gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch referenced blocks."),
+			gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch data blocks referenced therein."),
 			GUC_UNIT_BYTE
 		},
 		&wal_decode_buffer_size,

"referenced blocks" seems otherwise a bit unclear to me.  Other wording
suggestions welcome.  I first thought of "...to prefetch referenced data
blocks", which is probably OK too.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"You always have to feed the gods, even when the land is dry" (Orual)




Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error

2022-09-04 Thread Ibrar Ahmed
On Sat, Sep 3, 2022 at 12:09 PM Julien Rouhaud  wrote:

> Hi,
>
> On Sat, Sep 03, 2022 at 10:36:37AM +0500, Ibrar Ahmed wrote:
> >
> > Hi Kyotaro Horiguchi, Fabien Coelho, Daniel Gustafsson,
> >
> > Since you haven't had time to write a review in the last many days, and
> > the author replied with a rebased patch long ago and never heard back,
> > we've taken your name off the reviewer list for this patch.  Of course,
> > you are still welcome to review it if you can find the time.  We're
> > removing your name so that other reviewers know the patch still needs
> > attention.  We understand that day jobs and other things get in the way
> > of doing patch reviews when you want to, so please come back and review
> > a patch or two later when you have more time.
>
> I thought that we decided not to remove assigned reviewers from a CF entry,
> even if they didn't reply recently?  See the discussion around
>
> https://www.postgresql.org/message-id/CA%2BTgmoZSBNhX0zCkG5T5KiQize9Aq4%2Bec%2BuqLcfBhm_%2B12MbQA%40mail.gmail.com
>

Ah, ok, thanks for the clarification. I will add them back.

@Jacob Champion, we need to update the CommitFest Checklist [1] document
accordingly.





"Reviewer Clear [reviewer name]:

   Since you haven't had time to write a review of [patch] in the last 5
   days, we've taken your name off the reviewer list for this patch."


[1] https://wiki.postgresql.org/wiki/CommitFest_Checklist

-- 
Ibrar Ahmed


Re: pgsql: Add ALTER SUBSCRIPTION ... SKIP.

2022-09-04 Thread Alvaro Herrera
On 2022-Mar-22, Amit Kapila wrote:

> Add ALTER SUBSCRIPTION ... SKIP.

There are two messages here that seem oddly worded.

msgid "start skipping logical replication transaction finished at %X/%X"
msgid "done skipping logical replication transaction finished at %X/%X"

Two complaints here.  First, the phrases "start / finished" and "done /
finished" look very strange.  It took me a while to realize that
"finished" refers to the LSN, not to the skipping operation.  Do we ever
talk about a transaction "finished at XYZ" as opposed to a transaction
whose LSN is XYZ?  (This became particularly strange when I realized
that the LSN might come from a PREPARE.)

Second, "logical replication transaction".  Is it not a regular
transaction that we happen to be processing via logical replication?

I think they should say something like

"logical replication starts skipping transaction with LSN %X/%X"
"logical replication completed skipping transaction with LSN %X/%X"

Other ideas?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")




Re: Add 64-bit XIDs into PostgreSQL 15

2022-09-04 Thread Dilip Kumar
On Sun, Sep 4, 2022 at 9:53 AM Dilip Kumar  wrote:
>
> On Mon, Jul 18, 2022 at 2:54 PM Pavel Borisov  wrote:
> >>
> >> > I can agree with you that sending rebased patches too often can be a 
> >> > little annoying. On the other hand, otherwise, it's just red in Cfbot. I 
> >> > suppose it's much easier and more comfortable to review the patches that 
> >> > at least apply cleanly and pass all tests. So if Cfbot is red for a long 
> >> > time I feel we need to send a rebased patchset anyway.
> >> >
> >> > I'll try to not doing this too often but frankly, I don't see a better 
> >> > alternative at the moment.
> >>
> >> Considering the overall activity on the mailing list personally I
> >> don't see a problem here. Several extra emails don't bother me at all,
> >> but I would like to see a green cfbot report for an open item in the
> >> CF application. Otherwise someone will complain that the patch doesn't
> >> apply anymore and the result will be the same as for sending an
> >> updated patch, except that we will receive at least two emails instead
> >> of one.
> >
> > Hi, Alexander!
> > Agree with you. I also consider green cfbot entry important. So PFA rebased 
> > v43.
>
> Since we have converted TransactionId to 64-bit, do we still need
> the concept of FullTransactionId?  I mean, it is really confusing to
> have three forms of transaction ids, i.e. TransactionId,
> FullTransactionId and ShortTransactionId.

I have done some more review to understand the idea.  I think this
patch needs far more comments to make it completely readable.

1.
 typedef struct HeapTupleData
 {
+ TransactionId t_xmin; /* base value for normal transaction ids */
+ TransactionId t_xmax; /* base value for mutlixact */

I think the field names and comments are not in sync: the fields say
xmin and xmax, whereas the comments say base value for normal
transaction ids and base value for multixact.

2.
extern bool heap_page_prepare_for_xid(Relation relation, Buffer buffer,
  TransactionId xid, bool multi);

I noticed that this function is returning bool but all the callers are
ignoring the return type.

3.
+static int
+heap_page_try_prepare_for_xid(Relation relation, Buffer buffer, Page page,
+   TransactionId xid, bool multi, bool is_toast)
+{
+ TransactionId base;
+ ShortTransactionId min = InvalidTransactionId,

add function header comments.

4.

+ if (!multi)
+ {
+ Assert(!is_toast || !(htup->t_infomask & HEAP_XMAX_IS_MULTI));
+
+ if (TransactionIdIsNormal(htup->t_choice.t_heap.t_xmin) &&
+ !HeapTupleHeaderXminFrozen(htup))
+ {
+ xid_min_max(min, max, htup->t_choice.t_heap.t_xmin, &found);
+ }
+
+ if (htup->t_infomask & HEAP_XMAX_INVALID)
+ continue;
+
+ if ((htup->t_infomask & HEAP_XMAX_IS_MULTI) &&
+ (!(htup->t_infomask & HEAP_XMAX_LOCK_ONLY)))
+ {
+ TransactionId update_xid;
+ ShortTransactionId xid;
+
+ Assert(!is_toast);
+ update_xid = MultiXactIdGetUpdateXid(HeapTupleHeaderGetRawXmax(page, htup),
+ htup->t_infomask);
+ xid = NormalTransactionIdToShort(HeapPageGetSpecial(page)->pd_xid_base,
+ update_xid);
+
+ xid_min_max(min, max, xid, &found);
+ }
+ }

Why is there no handling for multi?  And this function has absolutely no
comments explaining the reason for this.

5.
+ if (IsToastRelation(relation))
+ {
+ PageInit(page, BufferGetPageSize(buffer), sizeof(ToastPageSpecialData));
+ ToastPageGetSpecial(page)->pd_xid_base = RecentXmin -
FirstNormalTransactionId;
+ }
+ else
+ {
+ PageInit(page, BufferGetPageSize(buffer), sizeof(HeapPageSpecialData));
+ HeapPageGetSpecial(page)->pd_xid_base = RecentXmin - FirstNormalTransactionId;
+ }

Why can pd_xid_base not be just RecentXmin?  Please explain in the
comments above.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[BUG] Storage declaration in ECPG

2022-09-04 Thread Andrey Sokolov
Hi,

The ECPG preprocessor converts the code

"static VARCHAR str1[10], str2[20], str3[30];"

into

"static  struct varchar_1  { int len; char arr[ 10 ]; }  str1 ;
 struct varchar_2  { int len; char arr[ 20 ]; }  str2 ;
 struct varchar_3  { int len; char arr[ 30 ]; }  str3 ;".

The storage declaration applies only to the first structure.  The patch
in the attachment fixes the bug: the storage declaration will be
repeated before each structure.  The patch is on GitHub too:
https://github.com/andr-sokolov/postgresql/commit/c8f8fc7a211938569e7d46c91a428d8cb25b6f9c

--
Andrey Sokolov
Arenadata
https://arenadata.tech/

From c8f8fc7a211938569e7d46c91a428d8cb25b6f9c Mon Sep 17 00:00:00 2001
From: Andrey Sokolov 
Date: Sun, 4 Sep 2022 12:48:22 +0300
Subject: [PATCH] Fix storage declaration in ECPG

The ECPG preprocessor converted the code
"static VARCHAR str1[10], str2[20], str3[30];"
into
"static  struct varchar_1  { int len; char arr[ 10 ]; }  str1 ;
 struct varchar_2  { int len; char arr[ 20 ]; }  str2 ;
 struct varchar_3  { int len; char arr[ 30 ]; }  str3 ;".
Storage declaration applied only to the first structure.
Now storage declaration is repeated before each structure.
---
 src/interfaces/ecpg/preproc/ecpg.trailer  |   4 +-
 src/interfaces/ecpg/preproc/type.h|   1 +
 src/interfaces/ecpg/test/ecpg_schedule|   1 +
 .../test/expected/preproc-static_variables.c  | 145 ++
 .../expected/preproc-static_variables.stderr  |  96 
 .../expected/preproc-static_variables.stdout  |   3 +
 src/interfaces/ecpg/test/preproc/.gitignore   |   2 +
 src/interfaces/ecpg/test/preproc/Makefile |   1 +
 .../ecpg/test/preproc/static_variables.pgc|  55 +++
 9 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 src/interfaces/ecpg/test/expected/preproc-static_variables.c
 create mode 100644 src/interfaces/ecpg/test/expected/preproc-static_variables.stderr
 create mode 100644 src/interfaces/ecpg/test/expected/preproc-static_variables.stdout
 create mode 100644 src/interfaces/ecpg/test/preproc/static_variables.pgc

diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer
index 0b100b9b04..54254e2f97 100644
--- a/src/interfaces/ecpg/preproc/ecpg.trailer
+++ b/src/interfaces/ecpg/preproc/ecpg.trailer
@@ -479,6 +479,7 @@ type_declaration: S_TYPEDEF
 var_declaration: storage_declaration
 		var_type
 		{
+			actual_type[struct_level].type_storage = $1;
 			actual_type[struct_level].type_enum = $2.type_enum;
 			actual_type[struct_level].type_str = $2.type_str;
 			actual_type[struct_level].type_dimension = $2.type_dimension;
@@ -493,6 +494,7 @@ var_declaration: storage_declaration
 		}
 		| var_type
 		{
+			actual_type[struct_level].type_storage = EMPTY;
 			actual_type[struct_level].type_enum = $1.type_enum;
 			actual_type[struct_level].type_str = $1.type_str;
 			actual_type[struct_level].type_dimension = $1.type_dimension;
@@ -949,7 +951,7 @@ variable_list: variable
 		| variable_list ',' variable
 		{
 			if (actual_type[struct_level].type_enum == ECPGt_varchar || actual_type[struct_level].type_enum == ECPGt_bytea)
-$$ = cat_str(3, $1, mm_strdup(";"), $3);
+$$ = cat_str(4, $1, mm_strdup(";"), mm_strdup(actual_type[struct_level].type_storage), $3);
 			else
 $$ = cat_str(3, $1, mm_strdup(","), $3);
 		}
diff --git a/src/interfaces/ecpg/preproc/type.h b/src/interfaces/ecpg/preproc/type.h
index fb20be53e0..08b739e5f3 100644
--- a/src/interfaces/ecpg/preproc/type.h
+++ b/src/interfaces/ecpg/preproc/type.h
@@ -114,6 +114,7 @@ struct exec
 
 struct this_type
 {
+	char	   *type_storage;
 	enum ECPGttype type_enum;
 	char	   *type_str;
 	char	   *type_dimension;
diff --git a/src/interfaces/ecpg/test/ecpg_schedule b/src/interfaces/ecpg/test/ecpg_schedule
index e034c5a420..d594c4d9b0 100644
--- a/src/interfaces/ecpg/test/ecpg_schedule
+++ b/src/interfaces/ecpg/test/ecpg_schedule
@@ -24,6 +24,7 @@ test: preproc/comment
 test: preproc/cursor
 test: preproc/define
 test: preproc/init
+test: preproc/static_variables
 test: preproc/strings
 test: preproc/type
 test: preproc/variable
diff --git a/src/interfaces/ecpg/test/expected/preproc-static_variables.c b/src/interfaces/ecpg/test/expected/preproc-static_variables.c
new file mode 100644
index 00..b0f62a8b14
--- /dev/null
+++ b/src/interfaces/ecpg/test/expected/preproc-static_variables.c
@@ -0,0 +1,145 @@
+/* Processed by ecpg (regression mode) */
+/* These include files are added by the preprocessor */
+#include 
+#include 
+#include 
+/* End of automatic include section */
+#define ECPGdebug(X,Y) ECPGdebug((X)+100,(Y))
+
+#line 1 "static_variables.pgc"
+#include 
+
+
+#line 1 "regression.h"
+
+
+
+
+
+
+#line 3 "static_variables.pgc"
+
+
+/* exec sql whenever sqlerror  stop ; */
+#line 5 "static_variables.pgc"
+
+
+/* declare cur cursor for select firstname , lastname , address from persons */
+#line 8 "static_variables.pgc"
+
+
+/* exec sql b

Re: freeing LDAPMessage in CheckLDAPAuth

2022-09-04 Thread Zhihong Yu
On Sun, Sep 4, 2022 at 12:25 AM Michael Paquier  wrote:

> On Sun, Sep 04, 2022 at 01:52:10AM -0400, Tom Lane wrote:
> > I can't get too excited about this.  All of the error exit paths in
> > backend authentication code will lead immediately to process exit, so
> > the possibility of some memory being leaked really has no consequences
> > worth worrying about.  If we *were* worried about it, sprinkling a few
> > more ldap_msgfree() calls into the existing code would hardly make it
> > more bulletproof.
>
> Even if this is not critical in the backend for this authentication
> path, I'd like to think that it is still a good practice for future
> code, so that anything copy-pasted around would get the call.  So I see
> no reason not to put something on HEAD at least.
>
Hi,
Here is the updated patch, as you suggested in your previous email.

Thanks


v2-ldap-msg-free.patch
Description: Binary data


TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Tomas Vondra
Hi,

I've been running some valgrind tests on rpi4/aarch64, and I get a crash
in test_decoding ddl test in ~50% runs. I don't see the same failure
without valgrind or on 32-bit system (hundreds of runs, no crashes), so
I suspect this is a race condition, and with valgrind the timing changes
in a way to make it more likely.

The crash always happens in the "ddl" test. The backtrace always looks
like this:

(ExceptionalCondition+0x98)[0x8f6f7c]
(+0x57a7ec)[0x6827ec]
(+0x579edc)[0x681edc]
(ReorderBufferAddNewTupleCids+0x60)[0x686758]
(SnapBuildProcessNewCid+0x94)[0x68b920]
(heap2_decode+0x17c)[0x671584]
(LogicalDecodingProcessRecord+0xbc)[0x670cd0]
(+0x570f88)[0x678f88]
(pg_logical_slot_get_changes+0x1c)[0x6790fc]
(ExecMakeTableFunctionResult+0x29c)[0x4a92c0]
(+0x3be638)[0x4c6638]
(+0x3a2c14)[0x4aac14]
(ExecScan+0x8c)[0x4aaca8]
(+0x3bea14)[0x4c6a14]
(+0x39ea60)[0x4a6a60]
(+0x392378)[0x49a378]
(+0x39520c)[0x49d20c]
(standard_ExecutorRun+0x214)[0x49aad8]
(ExecutorRun+0x64)[0x49a8b8]
(+0x62f53c)[0x73753c]
(PortalRun+0x27c)[0x737198]
(+0x627e78)[0x72fe78]
(PostgresMain+0x9a0)[0x73512c]
(+0x547be8)[0x64fbe8]
(+0x547540)[0x64f540]
(+0x542d30)[0x64ad30]
(PostmasterMain+0x1460)[0x64a574]
(+0x41)[0x520888]

I'm unable to get a better backtrace from the valgrind-produced core
using gdb, for some reason.

However, I've modified AssertTXNLsnOrder() - which is where the assert
is checked - to also dump toplevel_by_lsn instead of just triggering the
assert, and the result is always like this:

WARNING:  ==
WARNING:  txn xid 849 top 0 first 30264752 0/1CDCDB0 final 0 0/0
WARNING:  txn xid 848 top 0 first 30264752 0/1CDCDB0 final 0 0/0
WARNING:  ==

The LSNs change a bit between the runs, but the failing transactions are
always 848 and 849. Also, both transactions have exactly the same info.

But the very first WAL record for 849 is

ASSIGNMENT xtop 848: subxacts: 849

so it's strange 849 is in the toplevel_by_lsn list at all, because it
clearly is a subxact of 848.

Furthermore, the WAL is almost exactly the same in both cases. Attached
are two dumps from a failed and successful run (only the part related to
these two xids is included). There are very few differences - there is a
PRUNE in the failed case, and a LOCK / RUNNING_XACTS moved a bit.


Any ideas?

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

wal-crash.log.gz
Description: application/gzip


wal-ok.log.gz
Description: application/gzip


Re: pgsql: Add ALTER SUBSCRIPTION ... SKIP.

2022-09-04 Thread Amit Kapila
On Sun, Sep 4, 2022 at 1:48 PM Alvaro Herrera  wrote:
>
> On 2022-Mar-22, Amit Kapila wrote:
>
> > Add ALTER SUBSCRIPTION ... SKIP.
>
> There are two messages here that seem oddly worded.
>
> msgid "start skipping logical replication transaction finished at %X/%X"
> msgid "done skipping logical replication transaction finished at %X/%X"
>
> Two complaints here.  First, the phrases "start / finished" and "done /
> finished" look very strange.  It took me a while to realize that
> "finished" refers to the LSN, not to the skipping operation.  Do we ever
> talk about a transaction "finished at XYZ" as opposed to a transaction
> whose LSN is XYZ?  (This became particularly strange when I realized
> that the LSN might come from a PREPARE.)
>

The reason to add "finished at ..." was to be explicit about whether
it is a starting LSN or an end LSN of a transaction. We do have such
differentiation in ReorderBufferTXN (first_lsn ... end_lsn).

> Second, "logical replication transaction".  Is it not a regular
> transaction that we happen to be processing via logical replication?
>
> I think they should say something like
>
> "logical replication starts skipping transaction with LSN %X/%X"
> "logical replication completed skipping transaction with LSN %X/%X"
>

This looks better to me.  If you find the above argument for
differentiating between the start and end LSN convincing, then we can
think of replacing "with" in the above messages with "finished at".  I
see your point that using "finished at" for PREPARE may not be a good
idea, but I don't have better ideas for the same.

-- 
With Regards,
Amit Kapila.




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Amit Kapila
On Sun, Sep 4, 2022 at 4:34 PM Tomas Vondra
 wrote:
>
> I've been running some valgrind tests on rpi4/aarch64, and I get a crash
> in test_decoding ddl test in ~50% runs. I don't see the same failure
> without valgrind or on 32-bit system (hundreds of runs, no crashes), so
> I suspect this is a race condition, and with valgrind the timing changes
> in a way to make it more likely.
>
> The crash always happens in the "ddl" test. The backtrace always looks
> like this:
>
> (ExceptionalCondition+0x98)[0x8f6f7c]
> (+0x57a7ec)[0x6827ec]
> (+0x579edc)[0x681edc]
> (ReorderBufferAddNewTupleCids+0x60)[0x686758]
> (SnapBuildProcessNewCid+0x94)[0x68b920]
> (heap2_decode+0x17c)[0x671584]
> (LogicalDecodingProcessRecord+0xbc)[0x670cd0]
> (+0x570f88)[0x678f88]
> (pg_logical_slot_get_changes+0x1c)[0x6790fc]
> (ExecMakeTableFunctionResult+0x29c)[0x4a92c0]
> (+0x3be638)[0x4c6638]
> (+0x3a2c14)[0x4aac14]
> (ExecScan+0x8c)[0x4aaca8]
> (+0x3bea14)[0x4c6a14]
> (+0x39ea60)[0x4a6a60]
> (+0x392378)[0x49a378]
> (+0x39520c)[0x49d20c]
> (standard_ExecutorRun+0x214)[0x49aad8]
> (ExecutorRun+0x64)[0x49a8b8]
> (+0x62f53c)[0x73753c]
> (PortalRun+0x27c)[0x737198]
> (+0x627e78)[0x72fe78]
> (PostgresMain+0x9a0)[0x73512c]
> (+0x547be8)[0x64fbe8]
> (+0x547540)[0x64f540]
> (+0x542d30)[0x64ad30]
> (PostmasterMain+0x1460)[0x64a574]
> (+0x41)[0x520888]
>
> I'm unable to get a better backtrace from the valgrind-produced core
> using gdb, for some reason.
>
> However, I've modified AssertTXNLsnOrder() - which is where the assert
> is checked - to also dump toplevel_by_lsn instead of just triggering the
> assert, and the result is always like this:
>
> WARNING:  ==
> WARNING:  txn xid 849 top 0 first 30264752 0/1CDCDB0 final 0 0/0
> WARNING:  txn xid 848 top 0 first 30264752 0/1CDCDB0 final 0 0/0
> WARNING:  ==
>
> The LSNs change a bit between the runs, but the failing transactions are
> always 848 and 849. Also, both transactions have exactly the same info.
>
> But the very first WAL record for 849 is
>
> ASSIGNMENT xtop 848: subxacts: 849
>
> so it's strange 849 is in the toplevel_by_lsn list at all, because it
> clearly is a subxact of 848.
>

There is no guarantee that toplevel_by_lsn won't have a subxact.  As per
my understanding, the problem I reported in the email [1] is the same
and we have seen this in BF failures as well. I posted a way to
reproduce it in that email. It seems this is possible if the decoding
gets XLOG_HEAP2_NEW_CID as the first record (belonging to a
subtransaction) after XLOG_RUNNING_XACTS.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LK1nxOTL32OP%3DejhPoBsUP4Bvwb3Ly%3DfethyJ-KbaXyw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Tomas Vondra



On 9/4/22 13:49, Amit Kapila wrote:
> On Sun, Sep 4, 2022 at 4:34 PM Tomas Vondra
>  wrote:
>>
>> I've been running some valgrind tests on rpi4/aarch64, and I get a crash
>> in test_decoding ddl test in ~50% runs. I don't see the same failure
>> without valgrind or on 32-bit system (hundreds of runs, no crashes), so
>> I suspect this is a race condition, and with valgrind the timing changes
>> in a way to make it more likely.
>>
>> The crash always happens in the "ddl" test. The backtrace always looks
>> like this:
>>
>> (ExceptionalCondition+0x98)[0x8f6f7c]
>> (+0x57a7ec)[0x6827ec]
>> (+0x579edc)[0x681edc]
>> (ReorderBufferAddNewTupleCids+0x60)[0x686758]
>> (SnapBuildProcessNewCid+0x94)[0x68b920]
>> (heap2_decode+0x17c)[0x671584]
>> (LogicalDecodingProcessRecord+0xbc)[0x670cd0]
>> (+0x570f88)[0x678f88]
>> (pg_logical_slot_get_changes+0x1c)[0x6790fc]
>> (ExecMakeTableFunctionResult+0x29c)[0x4a92c0]
>> (+0x3be638)[0x4c6638]
>> (+0x3a2c14)[0x4aac14]
>> (ExecScan+0x8c)[0x4aaca8]
>> (+0x3bea14)[0x4c6a14]
>> (+0x39ea60)[0x4a6a60]
>> (+0x392378)[0x49a378]
>> (+0x39520c)[0x49d20c]
>> (standard_ExecutorRun+0x214)[0x49aad8]
>> (ExecutorRun+0x64)[0x49a8b8]
>> (+0x62f53c)[0x73753c]
>> (PortalRun+0x27c)[0x737198]
>> (+0x627e78)[0x72fe78]
>> (PostgresMain+0x9a0)[0x73512c]
>> (+0x547be8)[0x64fbe8]
>> (+0x547540)[0x64f540]
>> (+0x542d30)[0x64ad30]
>> (PostmasterMain+0x1460)[0x64a574]
>> (+0x41)[0x520888]
>>
>> I'm unable to get a better backtrace from the valgrind-produced core
>> using gdb, for some reason.
>>
>> However, I've modified AssertTXNLsnOrder() - which is where the assert
>> is checked - to also dump toplevel_by_lsn instead of just triggering the
>> assert, and the result is always like this:
>>
>> WARNING:  ==
>> WARNING:  txn xid 849 top 0 first 30264752 0/1CDCDB0 final 0 0/0
>> WARNING:  txn xid 848 top 0 first 30264752 0/1CDCDB0 final 0 0/0
>> WARNING:  ==
>>
>> The LSNs change a bit between the runs, but the failing transactions are
>> always 848 and 849. Also, both transactions have exactly the same info.
>>
>> But the very first WAL record for 849 is
>>
>> ASSIGNMENT xtop 848: subxacts: 849
>>
>> so it's strange 849 is in the toplevel_by_lsn list at all, because it
>> clearly is a subxact of 848.
>>
> 
> There is no guarantee that toplevel_by_lsn won't have a subxact.

I don't think that's quite true - toplevel_by_lsn should not contain any
*known* subxacts.  Yes, we may initially add a subxact to the list, but
as soon as we get the assignment record, it should be removed.  See what
ReorderBufferAssignChild does.

And in this case the ASSIGNMENT is the first WAL record we get for 849
(in fact, isn't that guaranteed since 7259736a6e?), so we know from the
very beginning 849 is a subxact.


> As per
> my understanding, the problem I reported in the email [1] is the same
> and we have seen this in BF failures as well. I posted a way to
> reproduce it in that email. It seems this is possible if the decoding
> gets XLOG_HEAP2_NEW_CID as the first record (belonging to a
> subtransaction) after XLOG_RUNNING_XACTS.
> 

Interesting. That's certainly true for WAL in the crashing case:

rmgr: Standby len (rec/tot): 58/58, tx:  0, lsn:
0/01CDCD70, prev 0/01CDCD10, desc: RUNNING_XACTS nextXid 850
latestCompletedXid 847 oldestRunningXid 848; 1 xacts: 848
rmgr: Heap2   len (rec/tot): 60/60, tx:849, lsn:
0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel 1663/16384/1249; tid
58/38; cmin: 1, cmax: 14, combo: 6


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fix typo function circle_same (src/backend/utils/adt/geo_ops.c)

2022-09-04 Thread Ranier Vilela
On Sat, Sep 3, 2022 at 7:39 PM Daniel Gustafsson wrote:

> > On 3 Sep 2022, at 09:36, Julien Rouhaud  wrote:
>
> > Yeah, there are unfortunately a lot of problems around those and NaN,
> > with multiple reports in the past (I recall [1] and [2] but there
> > were others).
>
> NaNs are indeed incredibly complicated, but I think we are sort of in a
> good place here given it's testing for equality in floats.  The commit
> message of c4c34008854654279ec30067d72fc5d174d2f42f carries an
> explanation:
>
> The float datatypes consider NaNs values to be equal and greater than
> all non-NaN values.  This change considers NaNs equal only for
> equality operators.  The placement operators, contains, overlaps,
> left/right of etc. continue to return false when NaNs are involved.
>
> From testing and reading I believe the fix in this thread is correct,
> but since NaNs are involved I will take another look at this with fresh
> eyes before going ahead.
>
Yeah, the fix is correct.

But with a Windows 10 build, I got this diff result:

diff -w -U3
C:/dll/postgres_dev/postgres_master/src/test/regress/expected/geometry.out
C:/dll/postgres_dev/postgres_master/src/test/regress/results/geometry.out
---
C:/dll/postgres_dev/postgres_master/src/test/regress/expected/geometry.out
2022-09-01 08:05:03.685931000 -0300
+++
C:/dll/postgres_dev/postgres_master/src/test/regress/results/geometry.out
2022-09-04 09:27:47.133617800 -0300
@@ -4380,9 +4380,8 @@
  <(100,200),10> | <(100,200),10>
  <(100,1),115>  | <(100,1),115>
  <(3,5),0>  | <(3,5),0>
- <(3,5),NaN>| <(3,5),0>
  <(3,5),NaN>| <(3,5),NaN>
-(9 rows)
+(8 rows)

 -- Overlap with circle
 SELECT c1.f1, c2.f1 FROM CIRCLE_TBL c1, CIRCLE_TBL c2 WHERE c1.f1 && c2.f1;

Not sure why.

regards,
Ranier Vilela


Latest build fails

2022-09-04 Thread Ranier Vilela
Hi,

The latest build fails.
https://github.com/postgres/postgres/runs/8176044869

In file included from /tmp/cpluspluscheck.ggpN3I/test.cpp:3:
/tmp/cirrus-ci-build/contrib/cube/cubeparse.h:77:19: error: ‘NDBOX’ was not declared in this scope
   77 | int cube_yyparse (NDBOX **result, Size scanbuflen);
      |                   ^
/tmp/cirrus-ci-build/contrib/cube/cubeparse.h:77:27: error: ‘result’ was not declared in this scope
   77 | int cube_yyparse (NDBOX **result, Size scanbuflen);
      |                           ^~
/tmp/cirrus-ci-build/contrib/cube/cubeparse.h:77:40: error: expected primary-expression before ‘scanbuflen’
   77 | int cube_yyparse (NDBOX **result, Size scanbuflen);
      |                                        ^~
/tmp/cirrus-ci-build/contrib/cube/cubeparse.h:77:50: error: expression list treated as compound expression in initializer [-fpermissive]
   77 | int cube_yyparse (NDBOX **result, Size scanbuflen);
      |                                                  ^
In file included from /tmp/cpluspluscheck.ggpN3I/test.cpp:3:
segparse.h:90:18: error: ‘SEG’ was not declared in this scope
segparse.h:90:23: error: ‘result’ was not declared in this scope
make: *** [GNUmakefile:141: cpluspluscheck] Error 1


Now I have some trouble in c.h with one of my tools:
Windows 10 64 bits
msvc 2019 64 bits
#error must have a working 64-bit integer datatype

Strange.

regards,
Ranier Vilela


Re: Latest build fails

2022-09-04 Thread Ranier Vilela
On Sun, Sep 4, 2022 at 09:58, Ranier Vilela
wrote:

>
> Now I have some trouble in c.h with one of my tools:
> Windows 10 64 bits
> msvc 2019 64 bits
> #error must have a working 64-bit integer datatype
>
Never mind, found the cause.

regards,
Ranier Vilela


Re: freeing LDAPMessage in CheckLDAPAuth

2022-09-04 Thread Zhihong Yu
On Sun, Sep 4, 2022 at 3:58 AM Zhihong Yu  wrote:

>
>
> On Sun, Sep 4, 2022 at 12:25 AM Michael Paquier 
> wrote:
>
>> On Sun, Sep 04, 2022 at 01:52:10AM -0400, Tom Lane wrote:
>> > I can't get too excited about this.  All of the error exit paths in
>> > backend authentication code will lead immediately to process exit, so
>> > the possibility of some memory being leaked really has no consequences
>> > worth worrying about.  If we *were* worried about it, sprinkling a few
>> > more ldap_msgfree() calls into the existing code would hardly make it
>> > more bulletproof.
>>
>> Even if this is not critical in the backend for this authentication
>> path, I'd like to think that it is still a good practice for future
>> code so as anything code-pasted around would get the call.  So I see
>> no reason to not put smth on HEAD at least.
>>
> Hi,
> Here is updated patch as you suggested in your previous email.
>
> Thanks
>
Hi,
Please take a look at patch v3.

Thanks


v3-ldap-msg-free.patch
Description: Binary data


Re: failing to build preproc.c on solaris with sun studio

2022-09-04 Thread Andrew Dunstan


On 2022-09-02 Fr 13:56, Tom Lane wrote:
> Andres Freund  writes:
>> I also wonder if we shouldn't just make ecpg optional at some point. Or even
>> move it out of the tree.
> The reason it's in the tree is to ensure its grammar stays in sync
> with the core grammar, and perhaps more to the point, that it's
> possible to build its grammar at all.  If it were at arm's length,
> we'd probably not have noticed the conflict over STRING in the JSON
> patches until unpleasantly far down the road (to mention just the
> most recent example).  However, those aren't arguments against
> making it optional-to-build like the PLs are.
>
>   


That seems reasonable. Note that the buildfarm client would then need an
extra build step.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: failing to build preproc.c on solaris with sun studio

2022-09-04 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-09-02 Fr 13:56, Tom Lane wrote:
>> ... However, those aren't arguments against
>> making it optional-to-build like the PLs are.

> That seems reasonable. Note that the buildfarm client would then need an
> extra build step.

Not sure why there'd be an extra build step; I'd envision it more
like "configure ... --with-ecpg" and the main build step either
does it or not.  You would need to make the ecpg-check step
conditional, though, so it's moot: we'd have to fix the buildfarm
first in any case, unless it's default-enabled which would seem
a bit odd.

regards, tom lane




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Tomas Vondra
On 9/4/22 14:24, Tomas Vondra wrote:
> 
> 
> On 9/4/22 13:49, Amit Kapila wrote:
>> On Sun, Sep 4, 2022 at 4:34 PM Tomas Vondra
>>  wrote:
>>>
>>> I've been running some valgrind tests on rpi4/aarch64, and I get a crash
>>> in test_decoding ddl test in ~50% runs. I don't see the same failure
>>> without valgrind or on 32-bit system (hundreds of runs, no crashes), so
>>> I suspect this is a race condition, and with valgrind the timing changes
>>> in a way to make it more likely.
>>>
>>> The crash always happens in the "ddl" test. The backtrace always looks
>>> like this:
>>>
>>> (ExceptionalCondition+0x98)[0x8f6f7c]
>>> (+0x57a7ec)[0x6827ec]
>>> (+0x579edc)[0x681edc]
>>> (ReorderBufferAddNewTupleCids+0x60)[0x686758]
>>> (SnapBuildProcessNewCid+0x94)[0x68b920]
>>> (heap2_decode+0x17c)[0x671584]
>>> (LogicalDecodingProcessRecord+0xbc)[0x670cd0]
>>> (+0x570f88)[0x678f88]
>>> (pg_logical_slot_get_changes+0x1c)[0x6790fc]
>>> (ExecMakeTableFunctionResult+0x29c)[0x4a92c0]
>>> (+0x3be638)[0x4c6638]
>>> (+0x3a2c14)[0x4aac14]
>>> (ExecScan+0x8c)[0x4aaca8]
>>> (+0x3bea14)[0x4c6a14]
>>> (+0x39ea60)[0x4a6a60]
>>> (+0x392378)[0x49a378]
>>> (+0x39520c)[0x49d20c]
>>> (standard_ExecutorRun+0x214)[0x49aad8]
>>> (ExecutorRun+0x64)[0x49a8b8]
>>> (+0x62f53c)[0x73753c]
>>> (PortalRun+0x27c)[0x737198]
>>> (+0x627e78)[0x72fe78]
>>> (PostgresMain+0x9a0)[0x73512c]
>>> (+0x547be8)[0x64fbe8]
>>> (+0x547540)[0x64f540]
>>> (+0x542d30)[0x64ad30]
>>> (PostmasterMain+0x1460)[0x64a574]
>>> (+0x41)[0x520888]
>>>
>>> I'm unable to get a better backtrace from the valgrind-produces core
>>> usign gdb, for some reason.
>>>
>>> However, I've modified AssertTXNLsnOrder() - which is where the assert
>>> is checked - to also dump toplevel_by_lsn instead of just triggering the
>>> assert, and the result is always like this:
>>>
>>> WARNING:  ==
>>> WARNING:  txn xid 849 top 0 first 30264752 0/1CDCDB0 final 0 0/0
>>> WARNING:  txn xid 848 top 0 first 30264752 0/1CDCDB0 final 0 0/0
>>> WARNING:  ==
>>>
>>> The LSNs change a bit between the runs, but the failing transactions are
>>> always 848 and 849. Also, both transactions have exactly the same info.
>>>
>>> But the very first WAL record for 849 is
>>>
>>> ASSIGNMENT xtop 848: subxacts: 849
>>>
>>> so it's strange 849 is in the toplevel_by_lsn list at all, because it
>>> clearly is a subxact of 848.
>>>
>>
>> There is no guarantee that toplevel_by_lsn won't have subxact.
> 
> I don't think that's quite true - toplevel_by_lsn should not contain any
> *known* subxacts. Yes, we may initially add a subxact to the list, but
> as soon as we get assignment record, it should be removed. See what
> ReorderBufferAssignChild does.
> 
> And in this case the ASSIGNMENT is the first WAL record we get for 849
> (in fact, isn't that guaranteed since 7259736a6e?), so we know from the
> very beginning 849 is a subxact.
> 
> 
>> As per
>> my understanding, the problem I reported in the email [1] is the same
>> and we have seen this in BF failures as well. I posted a way to
>> reproduce it in that email. It seems this is possible if the decoding
>> gets XLOG_HEAP2_NEW_CID as the first record (belonging to a
>> subtransaction) after XLOG_RUNNING_XACTS.
>>
> 
> Interesting. That's certainly true for WAL in the crashing case:
> 
> rmgr: Standby len (rec/tot): 58/58, tx:  0, lsn:
> 0/01CDCD70, prev 0/01CDCD10, desc: RUNNING_XACTS nextXid 850
> latestCompletedXid 847 oldestRunningXid 848; 1 xacts: 848
> rmgr: Heap2   len (rec/tot): 60/60, tx:849, lsn:
> 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel 1663/16384/1249; tid
> 58/38; cmin: 1, cmax: 14, combo: 6
> 

I investigated using the pgdata from the crashed run (can provide, if
you have rpi4 or some other aarch64 machine), and the reason is pretty
simple - the restart_lsn for the slot is 0/1CDCD70, which is looong
after the subxact assignment, so we add both xids as toplevel.

That seems broken - if we skip the assignment like this, doesn't that
break spill-to-disk and/or streaming? IIRC that's exactly why we had to
start logging assignments immediately with wal_level=logical.

Or maybe we're not dealing with the restart_lsn properly, and we should
have ignored those records. Both xacts started long before the restart
LSN, so we're not seeing the whole xact anyway.

However, when processing the NEW_CID record:

tx:849, lsn: 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel
1663/16384/1249; tid 58/38; cmin: 1, cmax: 14, combo: 6

we ultimately do this in SnapBuildProcessNewCid:

#1  0x005566cccdb4 in ReorderBufferAddNewTupleCids (rb=0x559dd64218,
xid=848, lsn=30264752, locator=..., tid=..., cmin=1, cmax=14,
combocid=6) at reorderbuffer.c:3218
#2  0x005566cd1f7c in SnapBuildProcessNewCid (builder=0x559dd

Re: failing to build preproc.c on solaris with sun studio

2022-09-04 Thread Andrew Dunstan


On 2022-09-04 Su 09:56, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-09-02 Fr 13:56, Tom Lane wrote:
>>> ... However, those aren't arguments against
>>> making it optional-to-build like the PLs are.
>> That seems reasonable. Note that the buildfarm client would then need an
>> extra build step.
> Not sure why there'd be an extra build step; I'd envision it more
> like "configure ... --with-ecpg" and the main build step either
> does it or not. 


Ah, ok, makes sense.


>  You would need to make the ecpg-check step
> conditional, though, so it's moot: we'd have to fix the buildfarm
> first in any case, unless it's default-enabled which would seem
> a bit odd.
>
>   


*nod*


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: failing to build preproc.c on solaris with sun studio

2022-09-04 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-09-04 Su 09:56, Tom Lane wrote:
>> You would need to make the ecpg-check step
>> conditional, though, so it's moot: we'd have to fix the buildfarm
>> first in any case, unless it's default-enabled which would seem
>> a bit odd.

> *nod*

I guess we could proceed like this:

1. Invent the --with option.  Temporarily make "make check" in ecpg
print a message but not fail if the option wasn't selected.

2. Update buildfarm client to recognize the option and skip ecpg-check
if not selected.

3. Sometime down the road, after everyone's updated their buildfarm
animals, flip ecpg "make check" to throw an error reporting that
ecpg wasn't built.


There'd need to be a separate discussion around how much to
encourage buildfarm owners to add --with-ecpg to their
configurations.  One thing that would make that easier is
adding --with-ecpg as a no-op option to the back branches,
so that if you do want it on it doesn't have to be done
with a branch-specific test.  (I guess packagers might
appreciate that too.)

regards, tom lane




Re: failing to build preproc.c on solaris with sun studio

2022-09-04 Thread Noah Misch
On Sun, Sep 04, 2022 at 10:55:43AM -0400, Tom Lane wrote:
> There'd need to be a separate discussion around how much to
> encourage buildfarm owners to add --with-ecpg to their
> configurations.  One thing that would make that easier is
> adding --with-ecpg as a no-op option to the back branches,
> so that if you do want it on it doesn't have to be done
> with a branch-specific test.

That would not make it easier.  "configure" doesn't fail when given unknown
options, so there's already no need for a branch-specific test.  For example,
topminnow has no problem passing --with-llvm on branches lacking that option:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=topminnow&dt=2022-08-27%2005%3A57%3A45&stg=configure




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Tomas Vondra
On 9/4/22 16:08, Tomas Vondra wrote:
> ...
> 
> so in fact we *know* 849 is a subxact of 848, but we don't call
> ReorderBufferAssignChild in this case. In fact we can't even do the
> assignment easily in this case, because we create the subxact first, so
> that the crash happens right when we attempt to create the toplevel one,
> and we never even get a chance to do the assignment:
> 
> 1) process the NEW_CID record, logged for 849 (subxact)
> 2) process CIDs in the WAL record, which has toplevel_xid 848
> 
> 
> So IMHO we need to figure out what to do for WAL records that create
> both the toplevel and subxact - either we need to skip them, or rethink
> how we create the ReorderBufferTXN structs.
> 

This fixes the crash for me, by adding a ReorderBufferAssignChild call
to SnapBuildProcessNewCid, and tweaking ReorderBufferAssignChild to
ensure we don't try to create the top xact before updating the subxact
and removing it from the toplevel_by_lsn list.

Essentially, what's happening is this:

1) We read the NEW_CID record, which is logged with XID 849, i.e. the
subxact. But we don't know it's a subxact, so we create it as a
top-level xact with the LSN.

2) We start processing contents of the NEW_CID, which however has info
that 849 is subxact of 848, calls ReorderBufferAddNewTupleCids which
promptly does ReorderBufferTXNByXid() with the top-level XID, which
creates it with the same LSN, and crashes because of the assert.

I'm not sure what's the right/proper way to fix this ...

The problem is ReorderBufferAssignChild was coded in a way that did not
expect the subxact to be created first (as a top-level xact). And
indeed, if I add Assert(false) to the (!new_sub) branch that converts
top-level xact to subxact, check-world still passes. So we never test
this case, but the NEW_CID breaks this assumption and creates them in
the opposite order (i.e. subxact first).

So the patch "fixes" this by

(a) tweaking ReorderBufferAssignChild to first remove the subxact from
the list of top-level transactions

(b) call ReorderBufferAssignChild when processing NEW_CID

However, I wonder whether we even have to process these records. If the
restart_lsn is half-way through the xact, can we even decode it?
Maybe we can just skip all of this, somehow? We'd still need to remember
849 is a subxact of 848, at least, so that we know to skip it too.


Thread [1] suggested to relax the assert to allow the same LSN, provided
it's xact and it's subxact. That goes directly against the expectation
the toplevel_by_lsn list contains no known subxacts, and I don't think
we should be relaxing that. After all, just tweaking the LSN does not
really fix the issue, because not remembering the xact+subxact relationship
is part of the issue. In principle, I think the issue is exactly the opposite,
i.e. that we don't realize 849 is a subxact, and leave it in the list.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 89cf9f9389c..34ca52d6cc0 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1029,7 +1029,6 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 	bool		new_top;
 	bool		new_sub;
 
-	txn = ReorderBufferTXNByXid(rb, xid, true, &new_top, lsn, true);
 	subtxn = ReorderBufferTXNByXid(rb, subxid, true, &new_sub, lsn, false);
 
 	if (!new_sub)
@@ -1054,6 +1053,8 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 	subtxn->toplevel_xid = xid;
 	Assert(subtxn->nsubtxns == 0);
 
+	txn = ReorderBufferTXNByXid(rb, xid, true, &new_top, lsn, true);
+
 	/* set the reference to top-level transaction */
 	subtxn->toptxn = txn;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index bf72ad45ec7..00428e46d09 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -808,6 +808,10 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
+	/* we have a top XID, but it's not known to be subxact */
+	if (xid != xlrec->top_xid && xlrec->top_xid != InvalidTransactionId)
+		ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+
 	/*
 	 * we only log new_cid's if a catalog tuple was modified, so mark the
 	 * transaction as containing catalog modifications


Re: First draft of the PG 15 release notes

2022-09-04 Thread Jonathan S. Katz

On 5/10/22 11:44 AM, Bruce Momjian wrote:
> I have completed the first draft of the PG 15 release notes
>
> I assume there will be major adjustments in the next few weeks based on
> feedback.


I wanted to propose the "major enhancements" section to see if we can 
get an iteration in prior to Beta 4. Please see the attached patch.


Do we want to include anything else, or substitute any of the items?

Thanks,

Jonathan
diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index d432c2db44..7387d493d4 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -3,7 +3,7 @@
 
   
Release date:
-   AS OF 2022-06-11
+   AS OF 2022-09-04
   
 
   
@@ -15,7 +15,38 @@

 

-
+ <listitem>
+  <para>
+   Performance improvements for in-memory and on-disk sorting.
+  </para>
+ </listitem>
+
+ <listitem>
+  <para>
+   Support for the SQL <command>MERGE</command>
+   command.
+  </para>
+ </listitem>
+
+ <listitem>
+  <para>
+   Column-level and row-level filtering on
+   logical replication
+   publications.
+  </para>
+ </listitem>
+
+ <listitem>
+  <para>
+   More options for compression, including support for Zstandard (zstd)
+   compression. This includes support for server-side compression for
+   <application>pg_basebackup</application>.
+  </para>
+ </listitem>
+
+ <listitem>
+  <para>
+   Structured log output using
+   the JSON format.
+  </para>
+ </listitem>
+

 



OpenPGP_signature
Description: OpenPGP digital signature


Re: build remaining Flex files standalone

2022-09-04 Thread Andres Freund
Hi,

On 2022-09-04 12:16:10 +0700, John Naylor wrote:
> Pushed 01 and 02 separately, then squashed and pushed the rest.

Thanks a lot! It does look a good bit cleaner to me now.

I think, as a followup improvement, we should move gramparse.h to
src/backend/parser, and stop installing gram.h, gramparse.h. gramparse.h
already had this note:

 * NOTE: this file is only meant to be included in the core parsing files,
 * i.e., parser.c, gram.y, and scan.l.
 * Definitions that are needed outside the core parser should be in parser.h.

What do you think?


I looked for projects including gramparse.h ([1]) and found libpg-query, pgpool,
slony1 and orafce:
- libpg-query, pgpool are partial copies of our code so will catch up when
  they sync up,
- slony1's [2] is a configure check, one that has long been outdated, because it's
  grepping for standard_conforming_strings, which was moved out in 6566e37e027
  in 2009.
- As far as I can tell orafce's include in sqlscan.l is vestigial; it compiles
  without it. And the include in parse_keywords.c is just required because it
  needs to include parser/scanner.h.

Greetings,

Andres Freund

[1] https://codesearch.debian.net/search?q=gramparse.h&literal=1&perpkg=1
[2] 
https://git.postgresql.org/gitweb/?p=slony1-engine.git;a=blob;f=config/acx_libpq.m4;h=7653357c0a731e36ec637df5ab378832d9279c19;hb=HEAD#l530




Re: First draft of the PG 15 release notes

2022-09-04 Thread Nathan Bossart
I noticed that the v15 release notes still refer to pg_checkpointer, which
was renamed to pg_checkpoint in b9eb0ff.

diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index d432c2db44..362728753a 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -1255,7 +1255,7 @@ Author: Jeff Davis 
   

-     Add predefined role <link linkend="predefined-roles-table">pg_checkpointer</link>
+     Add predefined role <link linkend="predefined-roles-table">pg_checkpoint</link>
      that allows members to run CHECKPOINT
 (Jeff Davis)


-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [RFC] building postgres with meson - v12

2022-09-04 Thread Andres Freund
Hi,

On 2022-09-04 13:12:52 +0700, John Naylor wrote:
> On Fri, Sep 2, 2022 at 11:35 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2022-09-02 14:17:26 +0700, John Naylor wrote:
> > > On Thu, Sep 1, 2022 at 1:12 AM Andres Freund  wrote:
> > > > [v12]
> > >
> > > +# Build a small utility static lib for the parser. This makes it easier 
> > > to not
> > > +# depend on gram.h already having been generated for most of the other 
> > > code
> > > +# (which depends on generated headers having been generated). The 
> > > generation
> > > +# of the parser is slow...
> > >
> > > It's not obvious whether this is intended to be a Meson-only
> > > optimization or a workaround for something awkward to specify.
> >
> > It is an optimization. The parser generation is by far the slowest part of a
> > build. If other files can only be compiled once gram.h is generated, 
> > there's a
> > long initial period where little can happen. So instead of having all .c 
> > files
> > have a dependency on gram.h having been generated, the above makes only
> > scan.c, gram.c compilation depend on gram.h.  It only matters for the first
> > compilation, because such dependencies are added as order-only dependencies,
> > supplanted by more precise compiler generated dependencies after.
> 
> Okay, I think the comment could include some of this info for clarity.

Working on that.


> > It's still pretty annoying that so much of the build is initially idle,
> > waiting for genbki.pl to finish.
> >
> > Part of that is due to some ugly dependencies of src/common on backend 
> > headers
> > that IMO probably shouldn't exist (e.g. src/common/relpath.c includes
> > catalog/pg_tablespace_d.h).
> 
> Technically, *_d.h headers are not backend, that's why it's safe to
> include them anywhere. relpath.c in its current form has to know the
> tablespace OIDs, which I guess is what you think is ugly. (I agree
> it's not great)

Yea, I'm not saying it's unsafe in a produces-wrong-results way, just that it
seems architecturally dubious / circular.


> > Looks like it'd not be hard to get at least the
> > _shlib version of src/common and libpq build without waiting for that. But 
> > for
> > all the backend code I don't really see a way, so it'd be nice to make 
> > genbki
> > faster at some point.
> 
> The attached gets me a ~15% reduction in clock time by having
> Catalog.pm parse the .dat files in one sweep, when we don't care about
> formatting, i.e. most of the time:

Cool. Seems worthwhile.

Greetings,

Andres Freund




Re: POC: GROUP BY optimization

2022-09-04 Thread Tomas Vondra
On 9/1/22 16:05, Jonathan S. Katz wrote:
> On 9/1/22 9:06 AM, Tomas Vondra wrote:
>>
>>
>> On 8/30/22 15:15, Jonathan S. Katz wrote:
>>> On 8/18/22 3:29 AM, Tomas Vondra wrote:


 On 8/18/22 03:32, David Rowley wrote:
>>>
> Here are a couple of patches to demo the idea.
>

 Yeah, that's an option too. I should have mentioned it along with the
 cpu_operator_cost.

 BTW would you mind taking a look at the costing? I think it's fine, but
 it would be good if someone not involved in the patch takes a look.
>>>
>>> With RMT hat on, just a friendly reminder that this is still on the open
>>> items list[1] -- we have the Beta 4 release next week and it would be
>>> great if we can get this resolved prior to it.
>>>
>>
>> I know. I'll fix this by tweaking one of the cost gucs, mentioned by
>> David.
> 
> Great -- thanks for the update!
> 

I've pushed the fix to 15+master. In the end I just used David's patches
that set parallel_setup_cost to 0.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Different compression methods for FPI

2022-09-04 Thread Justin Pryzby
On Fri, Sep 02, 2022 at 06:55:11AM -0500, Justin Pryzby wrote:
> On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> > In this patch series, I added compression information to the errcontext from
> > xlog_block_info(), and allow specifying compression levels like zlib-2.  
> > I'll
> > rearrange that commit earlier if we decide that's desirable to include.
> 
> 4035cd5d4 added wal_compress=lz4 and
> e9537321a added wal_compress=zstd
> 
> Since 4035cd5d4, pg_waldump has shown compression info (and walinspect
> shows the same since 2258e76f9).
> 
> But if you try to replay WAL on a server which wasn't compiled with
> support for the requisite compression methods, it's a bit crummy that it
> doesn't include in the error message the reason *why* it failed to
> restore the image, if that's caused by the missing compression method.

That's also hitting an elog().

2022-09-04 15:56:29.916 CDT startup[2625] FATAL:  XX000: failed to restore 
block image
2022-09-04 15:56:29.916 CDT startup[2625] DETAIL:  image at 0/1D11CB8 
compressed with zstd not supported by build, block 0
2022-09-04 15:56:29.916 CDT startup[2625] CONTEXT:  WAL redo at 0/1D11CB8 for 
Heap/DELETE: off 50 flags 0x00 KEYS_UPDATED ; blkref #0: rel 1663/16384/2610, 
blk 4 FPW
2022-09-04 15:56:29.916 CDT startup[2625] LOCATION:  
XLogReadBufferForRedoExtended, xlogutils.c:396

(gdb) bt
#0  report_invalid_record (state=0x55e33ff0, fmt=0x55b1c1a8 "image at 
%X/%X compressed with %s not supported by build, block %d") at xlogreader.c:74
#1  0x556beeec in RestoreBlockImage 
(record=record@entry=0x55e33ff0, block_id=block_id@entry=0 '\000', 
page=page@entry=0x7fffee9bdc00 "") at xlogreader.c:2078
#2  0x556c5d39 in XLogReadBufferForRedoExtended 
(record=record@entry=0x55e33ff0, block_id=block_id@entry=0 '\000', 
mode=mode@entry=RBM_NORMAL, get_cleanup_lock=get_cleanup_lock@entry=false, 
buf=buf@entry=0x7fffd760) at xlogutils.c:395
#3  0x556c5e4a in XLogReadBufferForRedo 
(record=record@entry=0x55e33ff0, block_id=block_id@entry=0 '\000', 
buf=buf@entry=0x7fffd760) at xlogutils.c:320
#4  0x5565bd5b in heap_xlog_delete (record=0x55e33ff0) at 
heapam.c:9032
#5  0x556622b7 in heap_redo (record=) at heapam.c:9836
#6  0x556c15ed in ApplyWalRecord (xlogreader=0x55e33ff0, 
record=record@entry=0x7fffee2b6820, replayTLI=replayTLI@entry=0x7fffd870) 
at ../../../../src/include/access/xlog_internal.h:379
#7  0x556c4c30 in PerformWalRecovery () at xlogrecovery.c:1725
#8  0x556b7f23 in StartupXLOG () at xlog.c:5291
#9  0x55ac4491 in InitPostgres 
(in_dbname=in_dbname@entry=0x55e09390 "postgres", dboid=dboid@entry=0, 
username=username@entry=0x55dedda0 "pryzbyj", useroid=useroid@entry=0, 
load_session_libraries=, 
override_allow_connections=override_allow_connections@entry=false, 
out_dbname=0x0) at postinit.c:731
#10 0x5598471f in PostgresMain (dbname=0x55e09390 "postgres", 
username=username@entry=0x55dedda0 "pryzbyj") at postgres.c:4085
#11 0x559851b0 in PostgresSingleUserMain (argc=5, argv=0x55de7530, 
username=0x55dedda0 "pryzbyj") at postgres.c:3986
#12 0x55840533 in main (argc=5, argv=0x55de7530) at main.c:194

I guess it should be promoted to an ereport(), since it's now a
user-facing error rather than an internal one.

$ ./tmp_install.without-zstd/usr/local/pgsql/bin/postgres -D 
./src/test/regress/tmp_check/data
2022-09-04 15:28:37.446 CDT postmaster[29964] LOG:  starting PostgreSQL 16devel 
on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 
64-bit
2022-09-04 15:28:37.446 CDT postmaster[29964] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2022-09-04 15:28:37.528 CDT postmaster[29964] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
2022-09-04 15:28:37.587 CDT startup[29972] LOG:  database system was 
interrupted while in recovery at 2022-09-04 15:27:44 CDT
2022-09-04 15:28:37.587 CDT startup[29972] HINT:  This probably means that some 
data is corrupted and you will have to use the last backup for recovery.
2022-09-04 15:28:38.010 CDT startup[29972] LOG:  database system was not 
properly shut down; automatic recovery in progress
2022-09-04 15:28:38.038 CDT startup[29972] LOG:  redo starts at 0/1D118C0
2022-09-04 15:28:38.039 CDT startup[29972] LOG:  could not stat file 
"pg_tblspc/16502": No such file or directory
2022-09-04 15:28:38.039 CDT startup[29972] CONTEXT:  WAL redo at 0/1D11970 for 
Tablespace/DROP: 16502
2022-09-04 15:28:38.039 CDT startup[29972] FATAL:  failed to restore block image
2022-09-04 15:28:38.039 CDT startup[29972] DETAIL:  image at 0/1D11CB8 
compressed with zstd not supported by build, block 0
2022-09-04 15:28:38.039 CDT startup[29972] CONTEXT:  WAL redo at 0/1D11CB8 for 
Heap/DELETE: off 50 flags 0x00 KEYS_UPDATED ; blkref #0: rel 1663/16384/2610, 
blk 4 FPW

diff --git a/src/backend/access/transam/xlogutils.c 
b/sr

Re: Column Filtering in Logical Replication

2022-09-04 Thread Peter Smith
On Fri, Sep 2, 2022 at 11:40 PM Amit Kapila  wrote:
>
> On Fri, Sep 2, 2022 at 8:45 AM Peter Smith  wrote:
> >
> > On Thu, Sep 1, 2022 at 7:53 PM Amit Kapila  wrote:
> > >
> > > On Fri, Aug 26, 2022 at 7:33 AM Peter Smith  wrote:
> > > >
> > >
> > > Few comments on both the patches:
> > > v4-0001*
> > > =
> > > 1.
> > > Furthermore, if the table uses
> > > +   REPLICA IDENTITY FULL, specifying a column list is not
> > > +   allowed (it will cause publication errors for UPDATE or
> > > +   DELETE operations).
> > >
> > > This line sounds a bit unclear to me. From this like it appears that
> > > the following operation is not allowed:
> > >
> > > postgres=# create table t1(c1 int, c2 int, c3 int);
> > > CREATE TABLE
> > > postgres=# Alter Table t1 replica identity full;
> > > ALTER TABLE
> > > postgres=# create publication pub1 for table t1(c1);
> > > CREATE PUBLICATION
> > >
> > > However, what is not allowed is the following:
> > > postgres=# delete from t1;
> > > ERROR:  cannot delete from table "t1"
> > > DETAIL:  Column list used by the publication does not cover the
> > > replica identity.
> > >
> > > I am not sure if we really need this line but if so then please try to
> > > make it more clear. I think the similar text is present in 0002 patch
> > > which should be modified accordingly.
> > >
> >
> > The "Furthermore…" sentence came from the commit message [1]. But I
> > agree it seems redundant/ambiguous, so I have removed it (and removed
> > the same in patch 0002).
> >
>
> Thanks, pushed your first patch.
>

Thanks for the push.

I have rebased the remaining patch (v6-0001 is the same as v5-0002)

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v6-0001-Column-List-new-pgdocs-section.patch
Description: Binary data


Re: POC: GROUP BY optimization

2022-09-04 Thread Jonathan S. Katz

On 9/4/22 6:14 PM, Tomas Vondra wrote:
> I've pushed the fix to 15+master. In the end I just used David's patches
> that set parallel_setup_cost to 0.


Thanks! I have closed the open item.

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Thomas Munro
On Fri, Sep 2, 2022 at 6:20 PM Thomas Munro  wrote:
> ... The active ingredient here is a setting of
> maintenance_io_concurrency=0, which runs into a dumb accounting problem
> of the fencepost variety and incorrectly concludes it's reached the
> end early.  Setting it to 3 or higher allows his system to complete
> recovery.  I'm working on a fix ASAP.

The short version is that when tracking the number of IOs in progress,
I had two steps in the wrong order in the algorithm for figuring out
whether IO is saturated.  Internally, the effect of
maintenance_io_concurrency is clamped to 2 or more, and that mostly
hides the bug until you try to replay a particular sequence like
Justin's with such a low setting.  Without that clamp, and if you set
it to 1, then several of our recovery tests fail.

That clamp was a bad idea.  What I think we really want is for
maintenance_io_concurrency=0 to disable recovery prefetching exactly
as if you'd set recovery_prefetch=off, and any other setting including
1 to work without clamping.

Here's the patch I'm currently testing.  It also fixes a related
dangling reference problem with very small maintenance_io_concurrency.

I had this more or less figured out on Friday when I wrote last, but I
got stuck on a weird problem with 026_overwrite_contrecord.pl.  I
think that failure case should report an error, no?  I find it strange
that we end recovery in silence.  That was a problem for the new
coding in this patch, because it is confused by XLREAD_FAIL without
queuing an error, and then retries, which clobbers the aborted recptr
state.  I'm still looking into that.
From fabc519d39adcb65996031059dbf42c8e3623764 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 5 Sep 2022 12:07:24 +1200
Subject: [PATCH] Fix recovery_prefetch with low maintenance_io_concurrency.

The LSN-based logic for knowing when IOs complete ran operations in the
wrong order.  We should process completed IOs first before trying to
start more, so that it is always possible to decode one more record when
the decoded record queue is empty, even if maintenance_io_concurrency is
set so low that a single earlier WAL record might have saturated the IO
queue.

That thinko was hidden because the effect of maintenance_io_concurrency
was arbitrarily clamped to be at least 2.  Fix the ordering, and also
remove that clamp.  We need a special case for 0, which is now treated
the same as recovery_prefetch=off, but otherwise the number is used
directly.  This allows for testing with 1, which would have made the
problem obvious in simple test scenarios.

Back-patch to 15.

Reported-by: Justin Pryzby 
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
---
 src/backend/access/transam/xlogprefetcher.c | 58 ++---
 src/backend/access/transam/xlogreader.c |  4 ++
 2 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlogprefetcher.c b/src/backend/access/transam/xlogprefetcher.c
index 9aa56411d5..96038448cc 100644
--- a/src/backend/access/transam/xlogprefetcher.c
+++ b/src/backend/access/transam/xlogprefetcher.c
@@ -72,7 +72,9 @@
 int			recovery_prefetch = RECOVERY_PREFETCH_TRY;
 
 #ifdef USE_PREFETCH
-#define RecoveryPrefetchEnabled() (recovery_prefetch != RECOVERY_PREFETCH_OFF)
+#define RecoveryPrefetchEnabled() \
+		(recovery_prefetch != RECOVERY_PREFETCH_OFF && \
+		 maintenance_io_concurrency > 0)
 #else
 #define RecoveryPrefetchEnabled() false
 #endif
@@ -1000,7 +1002,8 @@ XLogPrefetcherReadRecord(XLogPrefetcher *prefetcher, char **errmsg)
 
 		if (RecoveryPrefetchEnabled())
 		{
-			max_inflight = Max(maintenance_io_concurrency, 2);
+			Assert(maintenance_io_concurrency > 0);
+			max_inflight = maintenance_io_concurrency;
 			max_distance = max_inflight * XLOGPREFETCHER_DISTANCE_MULTIPLIER;
 		}
 		else
@@ -1018,14 +1021,39 @@ XLogPrefetcherReadRecord(XLogPrefetcher *prefetcher, char **errmsg)
 	}
 
 	/*
-	 * Release last returned record, if there is one.  We need to do this so
-	 * that we can check for empty decode queue accurately.
+	 * Release last returned record, if there is one, and perform tasks that
+	 * we can do now that the caller has replayed it.
 	 */
-	XLogReleasePreviousRecord(prefetcher->reader);
+	if (likely(record = prefetcher->reader->record))
+	{
+		XLogRecPtr	replayed_up_to = record->next_lsn;
+
+		XLogReleasePreviousRecord(prefetcher->reader);
+
+		/*
+		 * Can we drop any filters yet?  If we were waiting for a relation to
+		 * be created or extended, it is now OK to access blocks in the covered
+		 * range.
+		 */
+		XLogPrefetcherCompleteFilters(prefetcher, replayed_up_to);
+
+		/*
+		 * All IO initiated by earlier WAL is now completed.  This might
+		 * trigger further prefetching.
+		 */
+		lrq_complete_lsn(prefetcher->streaming_read, replayed_up_to);
+	}
 
-	/* If there's nothing queued yet, then start prefetching. */
+	/*
+	 * If there's nothing queued yet, then start prefetching to cause at least
+	 * one record

Re: Postmaster self-deadlock due to PLT linkage resolution

2022-09-04 Thread Thomas Munro
On Wed, Aug 31, 2022 at 1:34 AM Thomas Munro  wrote:
> On Wed, Aug 31, 2022 at 12:26 AM Robert Haas  wrote:
> > On Tue, Aug 30, 2022 at 8:17 AM Thomas Munro  wrote:
> > > FWIW I suspect FreeBSD can't break like this in a program linked with
> > > libthr, because it has a scheme for deferring signals while the
> > > runtime linker holds locks.  _rtld_bind calls _thr_rtld_rlock_acquire,
> > > which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to
> > > defer until release.  For a non-thread program, I'm not entirely sure,
> > > but I don't think the fork() problem exists there.  (Could be wrong,
> > > based on a quick look.)
> >
> > Well that seems a bit ironic, considering that Tom has worried in the
> > past that linking with threading libraries would break stuff.
>
> Hah.  To clarify, non-thread builds don't have that exact fork()
> problem, but it turns out they do have a related state clobbering
> problem elsewhere, which I've reported.

For the record, reporting that resulted in a change for non-libthr rtld:

https://cgit.freebsd.org/src/commit/?id=a687683b997c5805ecd6d8278798b7ef00d9908f




RE: Column Filtering in Logical Replication

2022-09-04 Thread shiy.f...@fujitsu.com
On Mon, Sep 5, 2022 8:28 AM Peter Smith  wrote:
> 
> I have rebased the remaining patch (v6-0001 is the same as v5-0002)
> 

Thanks for updating the patch. Here are some comments.

1.
+ the  will be successful but later
+ the WalSender on the publisher, or the subscriber may throw an error. In
+ this scenario, the user needs to recreate the subscription after adjusting

Should "WalSender" be changed to "walsender"? I saw "walsender" is used in other
places in the documentation.

2.
+test_pub=# CREATE TABLE t1(id int, a text, b text, c text, d text, e text, 
PRIMARY KEY(id));
+CREATE TABLE
+test_pub=#

+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 (id, b, a, d);
+CREATE PUBLICATION
+test_pub=#

I think the redundant "test_pub=#" can be removed.


Besides, I tested the examples in the patch, there's no problem.

Regards,
Shi yu


Re: Handle infinite recursion in logical replication setup

2022-09-04 Thread Peter Smith
Here are my review comments for v45-0001:

==

1. doc/src/sgml/logical-replication.sgml

  
   To find which tables might potentially include non-local origins (due to
   other subscriptions created on the publisher) try this SQL query:

SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
FROM pg_publication P,
 LATERAL pg_get_publication_tables(P.pubname) GPT
 LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
 pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
  P.pubname IN (pub-names);


1a.
To use "" with the <> then simply put meta characters in the SGML.
e.g.


~

1b.
The patch forgot to add the SQL "#" instruction as suggested by my
previous comment (see [1] #3b)

~~~

2.

 
  Preventing Data Inconsistencies for copy_data, origin=NONE

The title is OK, but I think this should all be a  sub-section
of "31.2 Subscription"

==

3. src/backend/commands/subscriptioncmds.c - check_publications_origin

+ initStringInfo(&cmd);
+ appendStringInfoString(&cmd,
+"SELECT DISTINCT P.pubname AS pubname\n"
+"FROM pg_publication P,\n"
+" LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+" LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+" pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+"WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN (");
+ get_publications_str(publications, &cmd, true);
+ appendStringInfoChar(&cmd, ')');
+ get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);

(see from get_skip_tables_str)

+ appendStringInfoString(dest, " AND C.oid NOT IN (SELECT C.oid FROM
pg_class C JOIN pg_namespace N on N.oid = C.relnamespace where ");


IMO the way you are using get_skip_tables_str should be modified. I
will show by way of example below.
- "where" -> "WHERE"
- put the SELECT at the caller instead of inside the function
- handle the ")" at the caller

All this will also make the body of the 'get_skip_tables_str' function
much simpler (see the next review comments)

SUGGESTION
if (subrel_count > 0)
{
/* TODO - put some explanatory comment here about skipping the tables */
appendStringInfo(&cmd,
" AND C.oid NOT IN (\n"
"SELECT C.oid FROM pg_class C\n"
"JOIN pg_namespace N on N.oid = C.relnamespace WHERE ");
get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
appendStringInfoString(&cmd, ")");
}

~~~

4. src/backend/commands/subscriptioncmds.c - get_skip_tables_str

+/*
+ * Add the table names that should be skipped.
+ */

This comment does not have enough detail to know really what the
function is for. Perhaps you only need to say that this is a helper
function for 'check_publications_origin' and then where it is called
you can give a comment (e.g. see my review comment #3)

~~

5. get_skip_tables_str (body)

5a. Variable 'count' is not a very good name; IMO just say 'i' for
index, and it can be declared C99 style.

~

5b. Variable 'first' is not necessary - it's the same as (i == 0)

~

5c. The whole "SELECT" part and the ")" parts are more simply done at
the caller (see the review comment #3)

~

5d.

+ appendStringInfo(dest, "(C.relname = '%s' and N.nspname = '%s')",
+ tablename, schemaname);

It makes no difference but I thought it would feel more natural if the
SQL would compare the schema name *before* the table name.

~

5e.
"and" -> "AND"

~

Doing all 5a,b,c,d, and e means overall having a much simpler function body:

SUGGESTION
+ for (int i = 0; i < subrel_count; i++)
+ {
+ Oid relid = subrel_local_oids[i];
+ char *schemaname = get_namespace_name(get_rel_namespace(relid));
+ char *tablename = get_rel_name(relid);
+
+ if (i > 0)
+ appendStringInfoString(dest, " OR ");
+
+ appendStringInfo(dest, "(N.nspname = '%s' AND C.relname = '%s')",
+ schemaname, tablename);
+ }
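Putting comments #3 and #5 together, the generated SQL fragment would come out as below. This is a Python sketch of the string building only (the real code uses StringInfo in C; the function name is invented and quoting/escaping is ignored):

```python
def skip_tables_sql(tables):
    """tables: list of (schemaname, tablename) pairs to exclude.
    Returns the WHERE-clause fragment that filters them out."""
    if not tables:
        return ""
    preds = " OR ".join(
        "(N.nspname = '%s' AND C.relname = '%s')" % (schema, rel)
        for schema, rel in tables
    )
    return (" AND C.oid NOT IN (\n"
            "SELECT C.oid FROM pg_class C\n"
            "JOIN pg_namespace N on N.oid = C.relnamespace WHERE "
            + preds + ")")
```

For example, skipping public.t1 and s1.t2 yields a single NOT IN subquery whose predicate list is OR-joined in table order.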

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsku25%2BVjz7HiohWxc2WU07O_ZV4voFG%2BU7WzwKhUzpGQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Amit Kapila
On Sun, Sep 4, 2022 at 7:38 PM Tomas Vondra
 wrote:
>
> On 9/4/22 14:24, Tomas Vondra wrote:
> >
> >> As per
> >> my understanding, the problem I reported in the email [1] is the same
> >> and we have seen this in BF failures as well. I posted a way to
> >> reproduce it in that email. It seems this is possible if the decoding
> >> gets XLOG_HEAP2_NEW_CID as the first record (belonging to a
> >> subtransaction) after XLOG_RUNNING_XACTS.
> >>
> >
> > Interesting. That's certainly true for WAL in the crashing case:
> >
> > rmgr: Standby len (rec/tot): 58/58, tx:  0, lsn:
> > 0/01CDCD70, prev 0/01CDCD10, desc: RUNNING_XACTS nextXid 850
> > latestCompletedXid 847 oldestRunningXid 848; 1 xacts: 848
> > rmgr: Heap2   len (rec/tot): 60/60, tx:849, lsn:
> > 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel 1663/16384/1249; tid
> > 58/38; cmin: 1, cmax: 14, combo: 6
> >
>
> I investigated using the pgdata from the crashed run (can provide, if
> you have rpi4 or some other aarch64 machine), and the reason is pretty
> simple - the restart_lsn for the slot is 0/1CDCD70, which is looong
> after the subxact assignment, so we add both xids as toplevel.
>
> That seems broken - if we skip the assignment like this, doesn't that
> break spill-to-disk and/or streaming? IIRC that's exactly why we had to
> start logging assignments immediately with wal_level=logical.
>

We had started logging assignments immediately in commit 0bead9af48
for streaming transactions in PG14. This issue exists prior to that. I
have tried and reproduced it in PG13 but I think it will be there even
before that. So, I am not sure if the spilling behavior is broken due
to this. I think if we don't get assignment recording before
processing changes during decoding commit then we could miss sending
the changes which won't be the case here. Do you see any other
problem?

> Or maybe we're not dealing with the restart_lsn properly, and we should
> have ignored those records. Both xacts started long before the restart
> LSN, so we're not seeing the whole xact anyway.
>

Right, but is that problematic? The restart LSN will be used as a
point to start reading the WAL and that helps in building a consistent
snapshot. However, for decoding to send the commit, we use
start_decoding_at point which will ensure that we send complete
transactions.
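That division of labor can be sketched in a toy model (not the real snapbuild/logical decoding code; it glosses over snapshot building and assumes each transaction's changes are fully visible from restart_lsn onwards):

```python
def decode(wal, restart_lsn, start_decoding_at):
    """wal: list of (lsn, xid, kind) records, kind in {'change', 'commit'}.
    Reading starts at restart_lsn (to rebuild decoding state), but only
    transactions committing at/after start_decoding_at are sent."""
    changes = {}   # xid -> accumulated change LSNs
    sent = []
    for lsn, xid, kind in wal:
        if lsn < restart_lsn:
            continue                      # before the slot's restart point
        if kind == 'change':
            changes.setdefault(xid, []).append(lsn)
        elif kind == 'commit':
            if lsn >= start_decoding_at:
                sent.append((xid, changes.get(xid, [])))
            changes.pop(xid, None)        # either way, drop the state
    return sent

wal = [(10, 1, 'change'), (20, 2, 'change'), (30, 1, 'commit'),
       (40, 2, 'change'), (50, 2, 'commit')]
# xact 1 commits before start_decoding_at and is skipped; xact 2 began
# after restart_lsn, so it is sent complete.
```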

> However, when processing the NEW_CID record:
>
> tx:849, lsn: 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel
> 1663/16384/1249; tid 58/38; cmin: 1, cmax: 14, combo: 6
>
> we ultimately do this in SnapBuildProcessNewCid:
>
> #1  0x005566cccdb4 in ReorderBufferAddNewTupleCids (rb=0x559dd64218,
> xid=848, lsn=30264752, locator=..., tid=..., cmin=1, cmax=14,
> combocid=6) at reorderbuffer.c:3218
> #2  0x005566cd1f7c in SnapBuildProcessNewCid (builder=0x559dd6a248,
> xid=849, lsn=30264752, xlrec=0x559dd6e1e0) at snapbuild.c:818
>
> so in fact we *know* 849 is a subxact of 848, but we don't call
> ReorderBufferAssignChild in this case. In fact we can't even do the
> assignment easily in this case, because we create the subxact first, so
> that the crash happens right when we attempt to create the toplevel one,
> and we never even get a chance to do the assignment:
>
> 1) process the NEW_CID record, logged for 849 (subxact)
> 2) process CIDs in the WAL record, which has toplevel_xid 848
>
>
> So IMHO we need to figure out what to do for WAL records that create
> both the toplevel and subxact - either we need to skip them, or rethink
> how we create the ReorderBufferTXN structs.
>

As per my understanding, we can't skip them as they are used to build
the snapshot.
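For readers following along, the crash mechanism quoted above can be imitated in a few lines. This is a toy model with invented structure: the list stands in for toplevel_by_lsn, and the LSN comparison stands in for the failing prev_first_lsn < cur_txn->first_lsn assertion:

```python
class ReorderBuffer:
    def __init__(self):
        self.by_xid = {}
        self.toplevel_by_lsn = []   # toplevel txns ordered by first_lsn

    def txn_by_xid(self, xid, lsn):
        """Create-or-find a txn; new txns are assumed toplevel at first."""
        if xid not in self.by_xid:
            txn = {"xid": xid, "first_lsn": lsn}
            if self.toplevel_by_lsn:
                # models the assert in AssertTXNLsnOrder
                assert self.toplevel_by_lsn[-1]["first_lsn"] < lsn
            self.by_xid[xid] = txn
            self.toplevel_by_lsn.append(txn)
        return self.by_xid[xid]

rb = ReorderBuffer()
rb.txn_by_xid(849, lsn=30264752)      # subxact seen first, created as toplevel
try:
    rb.txn_by_xid(848, lsn=30264752)  # toplevel then created at the *same* LSN
    crashed = False
except AssertionError:
    crashed = True                    # same failure mode as the report
```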

-- 
With Regards,
Amit Kapila.




Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Thomas Munro
On Mon, Sep 5, 2022 at 1:28 PM Thomas Munro  wrote:
> I had this more or less figured out on Friday when I wrote last, but I
> got stuck on a weird problem with 026_overwrite_contrecord.pl.  I
> think that failure case should report an error, no?  I find it strange
> that we end recovery in silence.  That was a problem for the new
> coding in this patch, because it is confused by XLREAD_FAIL without
> queuing an error, and then retries, which clobbers the aborted recptr
> state.  I'm still looking into that.

On reflection, it'd be better not to clobber any pre-existing error
there, but report one only if there isn't one already queued.  I've
done that in this version, which I'm planning to do a bit more testing
on and commit soonish if there are no comments/objections, especially
for that part.
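The "report only if nothing is queued" behavior is simple but worth spelling out (a sketch, not the actual xlogreader code; here the check is folded into the helper, whereas the real code checks errormsg_buf at the call site):

```python
def report_invalid_record(state, message):
    """Queue an error message unless one is already pending, so an earlier,
    more specific error is never clobbered by a generic fallback."""
    if not state.get("errormsg"):
        state["errormsg"] = message

state = {"errormsg": ""}
report_invalid_record(state, "invalid record length at 0/1CDCDB0")  # specific
report_invalid_record(state, "missing contrecord at 0/1CDCDB0")     # skipped
```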

I'll have to check whether a doc change is necessary somewhere to
advertise that maintenance_io_concurrency=0 turns off prefetching, but
IIRC that's kinda already implied.

I've tested quite a lot of scenarios including make check-world with
maintenance_io_concurrency = 0, 1, 10, 1000, and ALTER SYSTEM for all
relevant GUCs on a standby running large pgbench to check expected
effect on pg_stat_recovery_prefetch view and generate system calls.
From 956d0613ecad68ba694b8a492895aa02002dbed5 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 5 Sep 2022 12:07:24 +1200
Subject: [PATCH v2] Fix recovery_prefetch with low maintenance_io_concurrency.

The LSN-based logic for knowing when IOs complete ran operations in the
wrong order.  We should process completed IOs first before trying to
start more, so that it is always possible to decode one more record when
the decoded record queue is empty, even if maintenance_io_concurrency is
set so low that a single earlier WAL record might have saturated the IO
queue.

That thinko was hidden because the effect of maintenance_io_concurrency
was arbitrarily clamped to be at least 2.  Fix the ordering, and also
remove that clamp.  We need a special case for 0, which is now treated
the same as recovery_prefetch=off, but otherwise the number is used
directly.  This allows for testing with 1, which would have made the
problem obvious in simple test scenarios.

Back-patch to 15.

Reported-by: Justin Pryzby 
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
---
 src/backend/access/transam/xlogprefetcher.c | 58 ++---
 src/backend/access/transam/xlogreader.c | 12 +
 2 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlogprefetcher.c b/src/backend/access/transam/xlogprefetcher.c
index 9aa56411d5..33491d7325 100644
--- a/src/backend/access/transam/xlogprefetcher.c
+++ b/src/backend/access/transam/xlogprefetcher.c
@@ -72,7 +72,9 @@
 int			recovery_prefetch = RECOVERY_PREFETCH_TRY;
 
 #ifdef USE_PREFETCH
-#define RecoveryPrefetchEnabled() (recovery_prefetch != RECOVERY_PREFETCH_OFF)
+#define RecoveryPrefetchEnabled() \
+		(recovery_prefetch != RECOVERY_PREFETCH_OFF && \
+		 maintenance_io_concurrency > 0)
 #else
 #define RecoveryPrefetchEnabled() false
 #endif
@@ -1000,7 +1002,8 @@ XLogPrefetcherReadRecord(XLogPrefetcher *prefetcher, char **errmsg)
 
 		if (RecoveryPrefetchEnabled())
 		{
-			max_inflight = Max(maintenance_io_concurrency, 2);
+			Assert(maintenance_io_concurrency > 0);
+			max_inflight = maintenance_io_concurrency;
 			max_distance = max_inflight * XLOGPREFETCHER_DISTANCE_MULTIPLIER;
 		}
 		else
@@ -1018,14 +1021,39 @@ XLogPrefetcherReadRecord(XLogPrefetcher *prefetcher, char **errmsg)
 	}
 
 	/*
-	 * Release last returned record, if there is one.  We need to do this so
-	 * that we can check for empty decode queue accurately.
+	 * Release last returned record, if there is one, and perform tasks that
+	 * we can do now that the caller has replayed it.
 	 */
-	XLogReleasePreviousRecord(prefetcher->reader);
+	if (likely(record = prefetcher->reader->record))
+	{
+		XLogRecPtr	replayed_up_to = record->next_lsn;
+
+		XLogReleasePreviousRecord(prefetcher->reader);
+
+		/*
+		 * Can we drop any filters yet?  If we were waiting for a relation to
+		 * be created or extended, it is now OK to access blocks in the
+		 * covered range.
+		 */
+		XLogPrefetcherCompleteFilters(prefetcher, replayed_up_to);
+
+		/*
+		 * All IO initiated by earlier WAL is now completed.  This might
+		 * trigger further prefetching.
+		 */
+		lrq_complete_lsn(prefetcher->streaming_read, replayed_up_to);
+	}
 
-	/* If there's nothing queued yet, then start prefetching. */
+	/*
+	 * If there's nothing queued yet, then start prefetching to cause at least
+	 * one record to be queued.
+	 */
 	if (!XLogReaderHasQueuedRecordOrError(prefetcher->reader))
+	{
+		Assert(lrq_inflight(prefetcher->streaming_read) == 0);
+		Assert(lrq_completed(prefetcher->streaming_read) == 0);
 		lrq_prefetch(prefetcher->streaming_read);
+	}
 
 	/* Read the next record. */
 	record = XLogNextRecord(prefetcher->reader, errmsg);
@@ -1039,12 +1067,

Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Kyotaro Horiguchi
At Mon, 5 Sep 2022 13:28:12 +1200, Thomas Munro  wrote 
in 
> I had this more or less figured out on Friday when I wrote last, but I
> got stuck on a weird problem with 026_overwrite_contrecord.pl.  I
> think that failure case should report an error, no?  I find it strange
> that we end recovery in silence.  That was a problem for the new
> coding in this patch, because it is confused by XLREAD_FAIL without
> queuing an error, and then retries, which clobbers the aborted recptr
> state.  I'm still looking into that.

+1 for showing any message for the failure, but I think we shouldn't
hide an existing message if any. And the error messages around are
just telling that " at RecPtr". So I think
"missing contrecord at RecPtr" is sufficient here.

diff --git a/src/backend/access/transam/xlogreader.c 
b/src/backend/access/transam/xlogreader.c
index cdcacc7803..bfe332c014 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -907,6 +907,11 @@ err:
 */
state->abortedRecPtr = RecPtr;
state->missingContrecPtr = targetPagePtr;
+
+		/* Put a generic error message if no particular cause is recorded. */
+		if (!state->errormsg_buf[0])
+			report_invalid_record(state, "missing contrecord at %X/%X",
+								  LSN_FORMAT_ARGS(RecPtr));
}
 
if (decoded && decoded->oversized)

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-04 Thread Kyotaro Horiguchi
(the previous mail was crossing with yours..)

At Mon, 05 Sep 2022 14:15:27 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> +1 for showing any message for the failure, but I think we shouldn't
me> hide an existing message if any.

At Mon, 5 Sep 2022 16:54:07 +1200, Thomas Munro  wrote 
in 
> On reflection, it'd be better not to clobber any pre-existing error
> there, but report one only if there isn't one already queued.  I've
> done that in this version, which I'm planning to do a bit more testing
> on and commit soonish if there are no comments/objections, especially
> for that part.

It looks fine in this regard.  I still think that the message looks
somewhat internal.

me> And the error messages around are
me> just telling that " at RecPtr". So I think
me> "missing contrecord at RecPtr" is sufficient here.


> I'll have to check whether a doc change is necessary somewhere to
> advertise that maintenance_io_concurrency=0 turns off prefetching, but
> IIRC that's kinda already implied.
> 
> I've tested quite a lot of scenarios including make check-world with
> maintenance_io_concurrency = 0, 1, 10, 1000, and ALTER SYSTEM for all
> relevant GUCs on a standby running large pgbench to check expected
> effect on pg_stat_recovery_prefetch view and generate system calls.

+   if (likely(record = prefetcher->reader->record))

Isn't this a bit confusing?


+   if (likely(record = prefetcher->reader->record))
+   {
+   XLogRecPtr  replayed_up_to = record->next_lsn;
+
+   XLogReleasePreviousRecord(prefetcher->reader);
+

The likely condition is the prerequisite for
XLogReleasePreviousRecord.  But it is a little hard to read the
condition as "in case no previous record exists".  Since there is one
in most cases, can't we call XLogReleasePreviousRecord()
unconditionally, and have the function return true when a previous
record exists and false if not?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: freeing LDAPMessage in CheckLDAPAuth

2022-09-04 Thread Michael Paquier
On Sun, Sep 04, 2022 at 06:52:37AM -0700, Zhihong Yu wrote:
> Please take a look at patch v3.

Fine as far as it goes.  I would have put the initialization of
search_message closer to ldap_search_s() for consistency with libpq.
That's what we do with ldap_search_st().
--
Michael




Re: SQL/JSON features for v15

2022-09-04 Thread Amit Langote
On Fri, Sep 2, 2022 at 8:56 PM Justin Pryzby  wrote:
> On Wed, Aug 31, 2022 at 03:51:18PM +0900, Amit Langote wrote:
> > Finally, I get this warning:
> >
> > execExprInterp.c: In function ‘ExecJsonCoerceCStringToText’:
> > execExprInterp.c:4765:3: warning: missing braces around initializer
> > [-Wmissing-braces]
> >NameData encoding = {0};
> >^
> > execExprInterp.c:4765:3: warning: (near initialization for
> > ‘encoding.data’) [-Wmissing-braces]
>
> With what compiler ?
>
> This has came up before:
> 20211202033145.gk17...@telsasoft.com
> 20220716115932.gv18...@telsasoft.com

Didn't realize it when I was reviewing the patch but somehow my build
script had started using gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44),
which I know is old.

- Amit




Re: Refactoring postgres_fdw/connection.c

2022-09-04 Thread Etsuro Fujita
On Tue, Jul 26, 2022 at 4:25 PM Kyotaro Horiguchi
 wrote:
> At Tue, 26 Jul 2022 00:54:47 +0900, Fujii Masao  
> wrote in
> > There are two functions, pgfdw_get_result() and
> > pgfdw_get_cleanup_result(),
> > to get a query result. They have almost the same code, call
> > PQisBusy(),
> > WaitLatchOrSocket(), PQconsumeInput() and PQgetResult() in the loop,
> > but only pgfdw_get_cleanup_result() allows its callers to specify the
> > timeout.
> > 0001 patch transforms pgfdw_get_cleanup_result() to the common
> > function
> > to get a query result and makes pgfdw_get_result() use it instead of
> > its own (duplicate) code. The patch also renames
> > pgfdw_get_cleanup_result()
> > to pgfdw_get_result_timed().
>
> Agree to that refactoring.

+1 for that refactoring.  Here are a few comments about the 0001 patch:

I'm not sure it's a good idea to change the function's name, because
that would make backpatching hard.  To avoid that, how about
introducing a workhorse function for pgfdw_get_result and
pgfdw_get_cleanup_result, based on the latter function as you
proposed, and modifying the two functions so that they call the
workhorse function?
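The shape being suggested, one shared workhorse with the two existing names kept as thin wrappers so back-branches stay patchable, would look roughly like this (a Python sketch with invented names; the real code loops over PQisBusy/WaitLatchOrSocket/PQconsumeInput in C):

```python
import time

def get_result_workhorse(poll, timeout=None):
    """Poll until poll() yields a result; give up after 'timeout' seconds
    if one is given (None means wait indefinitely; bounded for the demo)."""
    deadline = None if timeout is None else time.monotonic() + timeout
    for _ in range(100000):
        result = poll()
        if result is not None:
            return result
        if deadline is not None and time.monotonic() >= deadline:
            return None              # timed out
    return None

def get_result(poll):
    """Old no-timeout entry point, now a thin wrapper."""
    return get_result_workhorse(poll)

def get_cleanup_result(poll, timeout):
    """Old timeout-capable entry point, also a wrapper."""
    return get_result_workhorse(poll, timeout)

calls = {'n': 0}
def flaky_poll():
    """Demo helper: result becomes available on the third poll."""
    calls['n'] += 1
    return 'res' if calls['n'] >= 3 else None
```

Callers keep using the old names, so back-patched fixes to the shared loop apply to both paths.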

@@ -1599,13 +1572,9 @@ pgfdw_finish_pre_commit_cleanup(List *pending_entries)
entry = (ConnCacheEntry *) lfirst(lc);

/* Ignore errors (see notes in pgfdw_xact_callback) */
-   while ((res = PQgetResult(entry->conn)) != NULL)
-   {
-   PQclear(res);
-   /* Stop if the connection is lost (else we'll loop infinitely) */
-   if (PQstatus(entry->conn) == CONNECTION_BAD)
-   break;
-   }
+   pgfdw_get_result_timed(entry->conn, 0, &res, NULL);
+   PQclear(res);

The existing code prevents interruption, but this would change to
allow it.  Do we need this change?

> > pgfdw_xact_callback() and pgfdw_subxact_callback() have similar codes
> > to
> > issue COMMIT or RELEASE SAVEPOINT commands. 0002 patch adds the common
> > function,
> > pgfdw_exec_pre_commit(), for that purpose, and changes those functions
> > so that they use the common one.
>
> I'm not sure the two are similar with each other.  The new function
> pgfdw_exec_pre_commit() looks like a merger of two isolated code paths
> intended to share a seven-line codelet.  I feel the code gets a bit
> harder to understand after the change.  I mildly oppose to this part.

I have to agree with Horiguchi-san, because as mentioned by him, 1)
there isn't enough duplicate code in the two bits to justify merging
them into a single function, and 2) the 0002 patch rather just makes
code complicated.  The current implementation is easy to understand,
so I'd vote for leaving them alone for now.

(I think the introduction of pgfdw_abort_cleanup is good, because that
de-duplicated much code that existed both in pgfdw_xact_callback and
in pgfdw_subxact_callback, which would outweigh the downside of
pgfdw_abort_cleanup that it made code somewhat complicated due to the
logic to handle both the transaction and subtransaction cases within
that single function.  But 0002 is not the case, I think.)

> > pgfdw_finish_pre_commit_cleanup() and
> > pgfdw_finish_pre_subcommit_cleanup()
> > have similar codes to wait for the results of COMMIT or RELEASE
> > SAVEPOINT commands.
> > 0003 patch adds the common function, pgfdw_finish_pre_commit(), for
> > that purpose,
> > and replaces those functions with the common one.
> > That is, pgfdw_finish_pre_commit_cleanup() and
> > pgfdw_finish_pre_subcommit_cleanup()
> > are no longer necessary and 0003 patch removes them.
>
> It gives the same feeling with 0002.

I have to agree with Horiguchi-san on this as well; the existing
single-purpose functions are easy to understand, so I'd vote for
leaving them alone.

Sorry for being late to the party.

Best regards,
Etsuro Fujita




Re: Asynchronous execution support for Custom Scan

2022-09-04 Thread Etsuro Fujita
On Fri, Sep 2, 2022 at 10:43 PM Kazutaka Onishi  wrote:
> The asynchronous version "ctidscan" plugin is ready.

Thanks for that!

I looked at the extended version quickly.  IIUC, it uses the proposed
APIs, but actually executes ctidscans *synchronously*, so it does not
improve performance.  Right?

Anyway, that version seems to be useful for testing that the proposed
APIs work well.  So I'll review the proposed patches with it.  I'm
not Fujii-san, though.  :-)

Best regards,
Etsuro Fujita




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Amit Kapila
On Sun, Sep 4, 2022 at 11:10 PM Tomas Vondra
 wrote:
>
> On 9/4/22 16:08, Tomas Vondra wrote:
> > ...
> >
> > so in fact we *know* 849 is a subxact of 848, but we don't call
> > ReorderBufferAssignChild in this case. In fact we can't even do the
> > assignment easily in this case, because we create the subxact first, so
> > that the crash happens right when we attempt to create the toplevel one,
> > and we never even get a chance to do the assignment:
> >
> > 1) process the NEW_CID record, logged for 849 (subxact)
> > 2) process CIDs in the WAL record, which has toplevel_xid 848
> >
> >
> > So IMHO we need to figure out what to do for WAL records that create
> > both the toplevel and subxact - either we need to skip them, or rethink
> > how we create the ReorderBufferTXN structs.
> >
>
> This fixes the crash for me, by adding a ReorderBufferAssignChild call
> to SnapBuildProcessNewCid, and tweaking ReorderBufferAssignChild to
> ensure we don't try to create the top xact before updating the subxact
> and removing it from the toplevel_by_lsn list.
>
> Essentially, what's happening is this:
>
> 1) We read the NEW_CID record, which is logged with XID 849, i.e. the
> subxact. But we don't know it's a subxact, so we create it as a
> top-level xact with the LSN.
>
> 2) We start processing contents of the NEW_CID, which however has info
> that 849 is subxact of 848, calls ReorderBufferAddNewTupleCids which
> promptly does ReorderBufferTXNByXid() with the top-level XID, which
> creates it with the same LSN, and crashes because of the assert.
>
> I'm not sure what's the right/proper way to fix this ...
>
> The problem is ReorderBufferAssignChild was coded in a way that did not
> expect the subxact to be created first (as a top-level xact).
>

I think there was a previously hard-coded way to detect that and we
have removed it in commit
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e3ff789acfb2754cd7b5e87f6f4463fd08e35996).
I think it is possible that subtransaction gets logged without
previous top-level txn record as shown in the commit shared.

-- 
With Regards,
Amit Kapila.




Re: POC: GROUP BY optimization

2022-09-04 Thread Michael Paquier
On Mon, Sep 05, 2022 at 12:14:48AM +0200, Tomas Vondra wrote:
> I've pushed the fix to 15+master. In the end I just used David's patches
> that set parallel_setup_cost to 0.

Thanks, Tomas!
--
Michael




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-09-04 Thread Tomas Vondra



On 9/5/22 06:32, Amit Kapila wrote:
> On Sun, Sep 4, 2022 at 7:38 PM Tomas Vondra
>  wrote:
>>
>> On 9/4/22 14:24, Tomas Vondra wrote:
>>>
 As per
 my understanding, the problem I reported in the email [1] is the same
 and we have seen this in BF failures as well. I posted a way to
 reproduce it in that email. It seems this is possible if the decoding
 gets XLOG_HEAP2_NEW_CID as the first record (belonging to a
 subtransaction) after XLOG_RUNNING_XACTS.

>>>
>>> Interesting. That's certainly true for WAL in the crashing case:
>>>
>>> rmgr: Standby len (rec/tot): 58/58, tx:  0, lsn:
>>> 0/01CDCD70, prev 0/01CDCD10, desc: RUNNING_XACTS nextXid 850
>>> latestCompletedXid 847 oldestRunningXid 848; 1 xacts: 848
>>> rmgr: Heap2   len (rec/tot): 60/60, tx:849, lsn:
>>> 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel 1663/16384/1249; tid
>>> 58/38; cmin: 1, cmax: 14, combo: 6
>>>
>>
>> I investigated using the pgdata from the crashed run (can provide, if
>> you have rpi4 or some other aarch64 machine), and the reason is pretty
>> simple - the restart_lsn for the slot is 0/1CDCD70, which is looong
>> after the subxact assignment, so we add both xids as toplevel.
>>
>> That seems broken - if we skip the assignment like this, doesn't that
>> break spill-to-disk and/or streaming? IIRC that's exactly why we had to
>> start logging assignments immediately with wal_level=logical.
>>
> 
> We had started logging assignments immediately in commit 0bead9af48
> for streaming transactions in PG14. This issue exists prior to that. I
> have tried and reproduced it in PG13 but I think it will be there even
> before that. So, I am not sure if the spilling behavior is broken due
> to this. I think if we don't get the assignment record before
> processing changes during decoding of the commit, then we could miss
> sending the changes, but that won't be the case here. Do you see any other
> problem?
> 

I can't, but that's hardly a proof of anything. You're right spilling to
disk may not be broken by this, though. I forgot it precedes assignments
being logged immediately, so it does not rely on that.

>> Or maybe we're not dealing with the restart_lsn properly, and we should
>> have ignored those records. Both xacts started long before the restart
>> LSN, so we're not seeing the whole xact anyway.
>>
> 
> Right, but is that problematic? The restart LSN will be used as a
> point to start reading the WAL and that helps in building a consistent
> snapshot. However, for decoding to send the commit, we use
> start_decoding_at point which will ensure that we send complete
> transactions.
> 

Which part would not be problematic? There's some sort of a bug, that's
for sure.

I think it's mostly clear we won't output this transaction, because the
restart LSN is halfway through it. We can either ignore it at commit
time, and then we have to make everything work even when we miss
assignments (or any other part of the transaction).

Or we can ignore stuff early, and not even process some of the changes.
For example, in this case, do we need to process the NEW_CID contents
for transaction 848 at all? If we can skip that bit, the problem
disappears.

But maybe this is futile and there are other similar issues ...

>> However, when processing the NEW_CID record:
>>
>> tx:849, lsn: 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel
>> 1663/16384/1249; tid 58/38; cmin: 1, cmax: 14, combo: 6
>>
>> we ultimately do this in SnapBuildProcessNewCid:
>>
>> #1  0x005566cccdb4 in ReorderBufferAddNewTupleCids (rb=0x559dd64218,
>> xid=848, lsn=30264752, locator=..., tid=..., cmin=1, cmax=14,
>> combocid=6) at reorderbuffer.c:3218
>> #2  0x005566cd1f7c in SnapBuildProcessNewCid (builder=0x559dd6a248,
>> xid=849, lsn=30264752, xlrec=0x559dd6e1e0) at snapbuild.c:818
>>
>> so in fact we *know* 849 is a subxact of 848, but we don't call
>> ReorderBufferAssignChild in this case. In fact we can't even do the
>> assignment easily in this case, because we create the subxact first, so
>> that the crash happens right when we attempt to create the toplevel one,
>> and we never even get a chance to do the assignment:
>>
>> 1) process the NEW_CID record, logged for 849 (subxact)
>> 2) process CIDs in the WAL record, which has topleve_xid 848
>>
>>
>> So IMHO we need to figure out what to do for WAL records that create
>> both the toplevel and subxact - either we need to skip them, or rethink
>> how we create the ReorderBufferTXN structs.
>>
> 
> As per my understanding, we can't skip them as they are used to build
> the snapshot.
> 

Don't we know that 848 (the top-level xact) won't be decoded? In that
case we won't need the snapshot, so why build it? Of course, the NEW_CID
is logged with xid 849 and at that point we don't know it's a subxact of
848, but when processing the NEW_CID contents we can realize that
(xl_heap_new_cid does include top_xid).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Clarify restriction on partitioned tables primary key / unique indexes

2022-09-04 Thread David Rowley
On Fri, 2 Sept 2022 at 22:06, David Rowley  wrote:
> Thanks.  I ended up adjusting it to:
>
> "To create a unique or primary key constraint on a partitioned table,"

and pushed.

Thanks for having a look at this, Erik.

David