EphemeralNamedRelation and materialized view

2024-07-26 Thread Yugo Nagata
Hi,

While looking into the commit b4da732fd64e936970f38c792f8b32c4bdf2bcd5,
I noticed that we can create a materialized view using Ephemeral Named
Relation in PostgreSQL 16 or earlier.


postgres=# create table tbl (i int);
CREATE TABLE
postgres=# create or replace function f() returns trigger as $$ begin 
 create materialized view mv as select * from enr; return new; end; $$ language 
plpgsql;
CREATE FUNCTION

postgres=# create trigger trig after insert on tbl referencing new table as enr 
execute function f();
CREATE TRIGGER

postgres=# insert into tbl values (10);
INSERT 0 1
postgres=# \d
        List of relations
 Schema | Name |       Type        | Owner
--------+------+-------------------+--------
 public | mv   | materialized view | yugo-n
 public | tbl  | table             | yugo-n
(2 rows)


We cannot refresh it or get its definition, though.

postgres=# refresh materialized view mv;
ERROR:  executor could not find named tuplestore "enr"

postgres=# \d+ mv
ERROR:  unrecognized RTE kind: 7

In PostgreSQL 17, a materialized view using an ENR cannot be created
because queryEnv is not passed to RefreshMatViewByOid, which was introduced
by b4da732fd64. When we try to create one, the following error is raised:

 ERROR: executor could not find named tuplestore "enr"

Although it is hard to imagine users actually trying to create a materialized
view using an ENR, how about prohibiting it in PG16 and earlier as well, by
passing NULL as the queryEnv argument to CreateQueryDesc, to avoid accidentally
creating useless matviews, as in the attached patch?


Regards,
Yugo Nagata

-- 
Yugo Nagata 
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..fda1e4d92b 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -327,7 +327,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
 		/* Create a QueryDesc, redirecting output to our tuple receiver */
 		queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
 	GetActiveSnapshot(), InvalidSnapshot,
-	dest, params, queryEnv, 0);
+	dest, params, is_matview ? NULL : queryEnv, 0);
 
 		/* call ExecutorStart to prepare the plan for execution */
 		ExecutorStart(queryDesc, GetIntoRelEFlags(into));


Re: warning: dereferencing type-punned pointer

2024-07-26 Thread Tatsuo Ishii
> On 24.07.24 20:09, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> On 24.07.24 16:05, Tom Lane wrote:
 I'm not very thrilled with these changes.  It's not apparent why
 your compiler is warning about these usages of IsA and not any other
 ones,
>>> I think one difference is that normally IsA is called on a Node *
>>> (since
>>> you call IsA to decide what to cast it to), but in this case it's
>>> called
>>> on a pointer that is already of type ErrorSaveContext *.
>> Hmm.  But there are boatloads of places where we call IsA on a
>> pointer of type Expr *, or sometimes other things.  Why aren't
>> those triggering the same warning?
> 
> It must have to do with the fact that the escontext field in
> JsonExprState has the object inline, not as a pointer.  AIUI, with
> dynamically allocated objects you have more liberties about what type
> to interpret them as than with actually declared objects.

I don't agree. I think the compiler just dislikes that the nodeTag macro's
argument is a pointer created by the '&' operator in this case:

#define nodeTag(nodeptr)  (((const Node*)(nodeptr))->type)

If we just give the nodeTag macro a pointer variable whose type is either
Node * or ErrorSaveContext *, the compiler becomes happy.

Moreover, I think whether the object is inline or not is
irrelevant. Attached is a self-contained test case. In the program:

if (IsA(&f, List))

produces the strict aliasing rule violation but

if (IsA(fp, List))

does not. Here "f" is an object defined as:

typedef struct Foo
{
NodeTag type;
int d;
} Foo;

Foo f;

and fp is defined as:

Foo *fp = &f;

$ gcc -Wall -O2 -c strict2.c
strict2.c: In function ‘sub’:
strict2.c:1:29: warning: dereferencing type-punned pointer will break 
strict-aliasing rules [-Wstrict-aliasing]
1 | #define nodeTag(nodeptr)  (((const Node*)(nodeptr))->type)
  |~^~~
strict2.c:2:31: note: in expansion of macro ‘nodeTag’
2 | #define IsA(nodeptr,_type_)  (nodeTag(nodeptr) == T_##_type_)
  |   ^~~
strict2.c:26:6: note: in expansion of macro ‘IsA’
   26 |  if (IsA(&f, List))
  |  ^~~
At top level:
strict2.c:21:12: warning: ‘sub’ defined but not used [-Wunused-function]
   21 | static int sub(void)
  |^~~

> If you change the member to a pointer
> 
> -   ErrorSaveContext escontext;
> +   ErrorSaveContext *escontext;
>  } JsonExprState;
> 
> and make the required adjustments elsewhere in the code, the warning
> goes away.

I think this is not necessary. Just my patch upthread is enough.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


Re: warning: dereferencing type-punned pointer

2024-07-26 Thread Tatsuo Ishii
Sorry, I forgot to attach the file...

> I don't agree. I think the compiler just dislike that nodeTag macro's
> argument is a pointer created by '&' operator in this case:
> 
> #define nodeTag(nodeptr)  (((const Node*)(nodeptr))->type)
> 
> If we just give a pointer variable either it's type is Node * or
> ErrorSaveContext * to nodeTag macro, the compiler becomes happy.
> 
> Moreover I think whether the object is inline or not is
> irrelevant. Attached is a self contained test case. In the program:
> 
>   if (IsA(&f, List))
> 
> produces the strict aliasing rule violation but
> 
>   if (IsA(fp, List))
> 
> does not. Here "f" is an object defined as:
> 
> typedef struct Foo
> {
>   NodeTag type;
>   int d;
> } Foo;
> 
> Foo f;
> 
> and fp is defined as:
> 
>   Foo *fp = &f;
> 
> $ gcc -Wall -O2 -c strict2.c
> strict2.c: In function ‘sub’:
> strict2.c:1:29: warning: dereferencing type-punned pointer will break 
> strict-aliasing rules [-Wstrict-aliasing]
> 1 | #define nodeTag(nodeptr)  (((const Node*)(nodeptr))->type)
>   |~^~~
> strict2.c:2:31: note: in expansion of macro ‘nodeTag’
> 2 | #define IsA(nodeptr,_type_)  (nodeTag(nodeptr) == T_##_type_)
>   |   ^~~
> strict2.c:26:6: note: in expansion of macro ‘IsA’
>26 |  if (IsA(&f, List))
>   |  ^~~
> At top level:
> strict2.c:21:12: warning: ‘sub’ defined but not used [-Wunused-function]
>21 | static int sub(void)
>   |^~~
> 
>> If you change the member to a pointer
>> 
>> -   ErrorSaveContext escontext;
>> +   ErrorSaveContext *escontext;
>>  } JsonExprState;
>> 
>> and make the required adjustments elsewhere in the code, the warning
>> goes away.
> 
> I think this is not necessary. Just my patch in the upthread is enough.
> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
/*
 * Minimum definitions copied from PostgreSQL to make the
 * test self-contained.
 */
#define nodeTag(nodeptr)  (((const Node*)(nodeptr))->type)
#define IsA(nodeptr,_type_)  (nodeTag(nodeptr) == T_##_type_)

typedef enum NodeTag
{
T_Invalid = 0,
T_List = 1
} NodeTag;

typedef struct Node
{
NodeTag type;
} Node;

/* Home brew node */
typedef struct Foo
{
NodeTag type;
int d;
} Foo;

static int  sub(void)
{
Foo f;
Foo *fp = &f;
f.type = T_List;

/* strict aliasing rule error */
if (IsA(&f, List))
return 1;

/* This is ok */
if (IsA(fp, List))
return 1;
return 0;
}




Re: Separate HEAP WAL replay logic into its own file

2024-07-26 Thread Li, Yong


> On Jul 23, 2024, at 09:54, Sutou Kouhei  wrote:
>
>
> Here are my comments for your patch:
>
> 1. Could you create your patch by "git format-patch -vN master"
>   or something? If you create your patch by "git format-patch",
>   we can apply your patch by "git am XXX.patch".
>

Thanks for your review. I’ve updated the patch to follow your
suggested format.
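For anyone following along, the requested workflow can be reproduced in a scratch repository; the repository, branch, and commit names below are hypothetical, only the `git format-patch -vN master` invocation is the point:

```shell
# set up a throwaway repo with a master branch and a feature branch
git init -q demo && cd demo
git config user.email dev@example.com
git config user.name dev
git checkout -q -b master
git commit -q --allow-empty -m "base"
git checkout -q -b heapam-refactor
git commit -q --allow-empty -m "Separate HEAP WAL replay logic into heapam_xlog.c"

# produce a versioned, "git am"-able patch series against master
git format-patch -v2 master
```

The resulting v2-0001-*.patch file can then be applied on another checkout with `git am v2-0001-*.patch`.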

>
> 3. Could you remove '#include "access/heapam_xlog.h"' from
>   heapam.c because it's needless now.
>
>   BTW, it seems that we can remove more includes from
>   heapam.c:
>
> 4. Could you remove needless includes from heapam_xlog.c? It
>   seems that we can remove the following includes:

I have removed the redundant includes in the latest patch.

>
> 5. There are still WAL related codes in heapam.c:
>
>   4.1. log_heap_update()
>   4.2. log_heap_new_cid()
>   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
>   4.4. if (needwal) {...} in heap_multi_insert()
>   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
>   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
>   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
>   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
>   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
>   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
>   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
>   4.12. log_heap_visible()
>
>   Should we move them to head_xlog.c too?
>
>   If we should do it, separated commits will be easy to
>   review. For example, the 0001 patch moves existing codes
>   to head_xlog.c as-is. The 0002 patch extracts WAL related
>   codes in heap_insert() to heap_xlog.c and so on.
>
>
> Thanks,
> --
> kou

I followed the convention of most access methods. The “xlog”
file includes the WAL replay logic only. The logic that generates
the log records themselves stays with the code that performs
the changes. Take nbtree as an example: you can also find
WAL-generating code in several _bt_insertxxx() functions inside
the nbtinsert.c file.

Please help review the updated file again. Thanks in advance!


Yong






v2-0001-heapam_refactor.patch
Description: v2-0001-heapam_refactor.patch


Re: Inconsistency between try_mergejoin_path and create_mergejoin_plan

2024-07-26 Thread Richard Guo
On Thu, Jul 25, 2024 at 12:07 AM Tom Lane  wrote:
> I took a brief look at this.  I think the basic idea is sound,
> but I have a couple of nits:

Thank you for reviewing this patch!

> * It's not entirely obvious that the checks preceding these additions
> are sufficient to ensure that the clauses are OpExprs.  They probably
> are, since the clauses are marked hashable or mergeable, but that test
> is mighty far away.  More to the point, if they ever weren't OpExprs
> the result would likely be to pass a bogus OID to get_commutator and
> thus silently fail, allowing the problem to go undetected for a long
> time.  I'd suggest using castNode() rather than a hard-wired
> assumption that the clause is an OpExpr.

Good point.  I've modified the code to use castNode(), and added a
comment accordingly.

> * Do we really need to construct a whole new set of bogus operators
> and opclasses to test this?  As you noted, the regression tests
> already set up an incomplete opclass for other purposes.  Why can't
> we extend that test, to reduce the amount of cycles wasted forevermore
> on this rather trivial point?
>
> (I'm actually wondering whether we really need to memorialize this
> with a regression test case at all.  But I'd settle for minimizing
> the amount of test cycles added.)

At first I planned to use the alias type 'int8alias1' created in
equivclass.sql for this test.  However, I found that this type is not
created yet when running join.sql.  Perhaps it's more convenient to
place this test in equivclass.sql to leverage the bogus operators
created there, but the test does not seem to be related to 'broken'
ECs.  I'm not sure if equivclass.sql is the appropriate place.

Do you think it works if we place this test in equivclass.sql and
write a comment explaining why it's there, like attached?  Now I’m
also starting to wonder if this change actually warrants such a test.

BTW, do you think this patch is worth backpatching?

Thanks
Richard


v4-0001-Check-the-validity-of-commutators-for-merge-hash-clauses.patch
Description: Binary data


RE: [Proposal] Add foreign-server health checks infrastructure

2024-07-26 Thread Hayato Kuroda (Fujitsu)
Dear Fujii-san,

Just in case: based on the agreement in [1], I updated the patches to keep them
consistent. We can use the same pictures for further discussions...

IIUC, the patch which adds the user_name attribute to get_connection() can be
discussed at a later stage, is that right?

[1]: 
https://www.postgresql.org/message-id/768032ee-fb57-494b-b674-1ccb65b6f969%40oss.nttdata.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v44-0001-doc-Enhance-documentation-for-postgres_fdw_get_c.patch
Description:  v44-0001-doc-Enhance-documentation-for-postgres_fdw_get_c.patch


v44-0002-postgres_fdw-Add-used_in_xact-column-to-postgres.patch
Description:  v44-0002-postgres_fdw-Add-used_in_xact-column-to-postgres.patch


v44-0003-postgres_fdw-Add-connection-status-check-to-post.patch
Description:  v44-0003-postgres_fdw-Add-connection-status-check-to-post.patch


Re: DRAFT: Pass sk_attno to consistent function

2024-07-26 Thread Michał Kłeczek


> On 26 Jul 2024, at 01:28, Matthias van de Meent 
>  wrote:
> 
> All in all, this still seems like a very (very) specific optimization,
> of which I'm not sure that it is generalizable. However, array
> introspection and filtering for SAOP equality checks feel like a
> relatively easy (?) push-down optimization in (e.g.) runtime partition
> pruning (or planning); isn't that even better patch potential here?

The issue is not partition pruning for SAOP; that works fine. The issue is the
lack of SAOP support in GiST.

Because I cannot use SAOP, I have two options:

1) LATERAL JOIN (i.e. iterate through the input array elements, SELECT rows
for each, and merge the results)
2) Implement a custom operator that emulates SAOP and provide a consistent
function for it. Additionally, provide the SAOP clause (redundantly) to enable
partition pruning.

In case of 1):
- the query becomes convoluted, with multiple redundant ORDER BY and LIMIT
clauses
- an unnecessary sort is performed (because we have to merge the results of the
subqueries)
- some partitions are scanned multiple times (once per element of the input
array that happens to land in the same partition)

In case of 2):
- the whole input array is passed to the consistent function for each partition,
so we unnecessarily search for non-existent rows

To fix the issue in 2) I need to somehow filter the input array per partition,
hence this patch.
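As an illustration, option 1) might look something like this sketch (the table, columns, and array values are hypothetical); note the per-element ORDER BY/LIMIT plus the redundant top-level sort that merges the results:

```sql
-- Hypothetical: "events" is partitioned on "account",
-- with a GiST index supporting ORDER BY on "ts"
SELECT e.*
FROM unnest(ARRAY['acc-1', 'acc-7']) AS a(account)
CROSS JOIN LATERAL (
    SELECT *
    FROM events
    WHERE account = a.account
    ORDER BY ts DESC
    LIMIT 10                 -- repeated per array element
) e
ORDER BY e.ts DESC           -- extra sort just to merge the subquery results
LIMIT 10;
```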

Regards,

—
Michal

Re: Restart pg_usleep when interrupted

2024-07-26 Thread Bertrand Drouvot
Hi,

On Thu, Jul 25, 2024 at 05:27:15PM -0500, Sami Imseih wrote:
> I am attaching v3 of the patch which addresses the comments made
> earlier by Bertrand about the comment in the patch [1].

Thanks!

Looking at it:

1 ===

+   struct instr_time start_time;

I think we can get rid of the "struct" keyword here.

2 ===

+   struct instr_time current_time;
+   struct instr_time elapsed_time;

Same as above.

3 ===

I gave it more thought, and I think it can be simplified a bit to reduce the
number of operations in the while loop.

What about relying on an "absolute" time, like this:

instr_time absolute;
absolute.ticks = start_time.ticks + msec * 1000000;

and then in the while loop:

while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
{
instr_time current_time;
INSTR_TIME_SET_CURRENT(current_time);

if (current_time.ticks > absolute.ticks)
{
break;

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-07-26 Thread Alvaro Herrera
On 2024-Jul-26, Junwang Zhao wrote:

> There is a bug report[0] Tender comments might be the same
> issue as this one, but I tried Alvaro's and mine patch, neither
> could solve that problem, I did not tried Tender's earlier patch
> thought. I post the test script below in case you are interested.

Yeah, I've been looking at this whole debacle this week and after
looking at it more closely, I realized that the overall problem requires
a much more invasive solution -- namely, that on DETACH, if the
referenced table is partitioned, we need to create additional
pg_constraint entries from the now-standalone table (was partition) to
each of the partitions of the referenced table; and also add action
triggers to each of those.  Without that, the constraint is incomplete
and doesn't work (as reported multiple times already).

One thing I have not yet tried is what happens if the partition being detached
is also partitioned.  I mean, do we need to handle each sub-partition
explicitly in some way?  I think the answer is no, but it needs tests.

I have written the patch to do this on detach, and AFAICS it works well,
though it changes the behavior of some existing tests (IIRC related to
self-referencing FKs).  Also, the next problem is making sure that
ATTACH deals with it correctly.  I'm on this bit today.

Self-referencing FKs seem to have additional problems :-(

The queries I was talking about are these

\set tables prim.*''',''forign.*''',lone

select oid, conparentid, contype, conname, conrelid::regclass, 
confrelid::regclass, conkey, confkey, conindid::regclass from pg_constraint 
where contype = 'f' and (conrelid::regclass::text ~ any (array[:tables]) or 
confrelid::regclass::text ~ any (array[:tables])) order by contype, conrelid, 
confrelid; select tgconstraint, oid, tgrelid::regclass, 
tgconstrrelid::regclass, tgname, tgparentid, tgconstrindid::regclass, 
tgfoid::regproc from pg_trigger where tgconstraint in (select oid from 
pg_constraint where conrelid::regclass::text ~ any (array[:tables]) or 
confrelid::regclass::text ~ any (array[:tables])) order by tgconstraint, 
tgrelid::regclass::text, tgfoid;

Written as a single line in psql they let you quickly see all the
constraints and their associated triggers, so for instance you can see
whether this sequence

create table prim (a int primary key) partition by list (a);
create table prim1 partition of prim for values in (1);
create table prim2 partition of prim for values in (2);
create table forign (a int references prim) partition by list (a);
create table forign1 partition of forign for values in (1);
create table forign2 partition of forign for values in (2);
alter table forign detach partition forign1;

produces the same set of constraints and triggers as this other sequence

create table prim (a int primary key) partition by list (a);
create table prim1 partition of prim for values in (1);
create table prim2 partition of prim for values in (2);
create table forign (a int references prim) partition by list (a);
create table forign2 partition of forign for values in (2);
create table forign1 (a int references prim);


The patch is more or less like the attached, far from ready.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 721d24783b..1ea4003aec 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10634,7 +10634,7 @@ CloneFkReferenced(Relation parentRel, Relation 
partitionRel)
 * Because we're only expanding the key space at the referenced 
side,
 * we don't need to prevent any operation in the referencing 
table, so
 * AccessShareLock suffices (assumes that dropping the 
constraint
-* acquires AEL).
+* acquires AccessExclusiveLock).
 */
fkRel = table_open(constrForm->conrelid, AccessShareLock);
 
@@ -19175,8 +19175,10 @@ DetachPartitionFinalize(Relation rel, Relation 
partRel, bool concurrent,
DropClonedTriggersFromPartition(RelationGetRelid(partRel));
 
/*
-* Detach any foreign keys that are inherited.  This includes creating
-* additional action triggers.
+* Detach any foreign keys that are inherited.  This requires marking 
some
+* constraints and triggers as no longer having parents, as well as
+* creating additional constraint entries and triggers, depending on the
+* shape of each FK.
 */
fks = copyObject(RelationGetFKeyList(partRel));
if (fks != NIL)
@@ -19185,10 +19187,15 @@ DetachPartitionFinalize(Relation rel, Relation 
partRel, bool concurrent,
{
ForeignKeyCacheInfo *fk = lfirst(cell);
HeapTuple   contup;
+   HeapTuple   parentConTup

Re: pgsql: Add more SQL/JSON constructor functions

2024-07-26 Thread Amit Langote
On Fri, Jul 26, 2024 at 11:12 AM Amit Langote  wrote:
> On Thu, Jul 25, 2024 at 11:16 PM Amit Langote  wrote:
> > On Wed, Jul 24, 2024 at 3:25 PM jian he  wrote:
> > > 2. domain over jsonb should fail just like domain over other types?
> > > RETURNING djs keep quotes DEFAULT '"11"' ON empty
> > > should fail as
> > > ERROR:  could not coerce ON EMPTY expression (DEFAULT) to the RETURNING 
> > > type
> > > DETAIL:  value for domain djs violates check constraint "djs_check""
> >
> > I think this should be fixed with the attached patch 0004.
>
> It is fixed but with the patch 0003, not 0004.
>
> Also, the test cases in 0004, which is a patch to fix a problem with
> OMIT QUOTES being disregarded when RETURNING domain-over-jsonb, didn't
> test that problem.  So I have updated the test case to use a domain
> over jsonb.

Pushed 0003-0005 ahead of 0001-0002.  Will try to push them over the
weekend.  Rebased for now.

-- 
Thanks, Amit Langote


0001-SQL-JSON-Some-fixes-to-JsonBehavior-expression-casti.patch
Description: Binary data


0002-SQL-JSON-Fix-casting-for-integer-EXISTS-columns-in-J.patch
Description: Binary data


Re: Remove duplicate table scan in logical apply worker and code refactoring

2024-07-26 Thread Ashutosh Bapat
On Thu, Jul 25, 2024 at 5:56 PM Kirill Reshke  wrote:

> > +/*
> > + * If the tuple to be modified could not be found, a log message is 
> > emitted.
> > + */
> > +static void
> > +report_tuple_not_found(CmdType cmd, Relation targetrel, bool is_partition)
> > +{
> > + Assert(cmd == CMD_UPDATE || cmd == CMD_DELETE);
> > +
> > + /* XXX should this be promoted to ereport(LOG) perhaps? */
> > + elog(DEBUG1,
> > + "logical replication did not find row to be %s in replication target 
> > relation%s \"%s\"",
> > + cmd == CMD_UPDATE ? "updated" : "deleted",
> > + is_partition ? "'s partition" : "",
> > + RelationGetRelationName(targetrel));
> > +}
>
> Encapsulating report logic inside function is ok, but double ternary
> expression is a bit out of line. I do not see similar places within
> PostgreSQL,
> so it probably violates code style.
>

The way it's written, it would make it hard for the translators. See [1],
which redirects to [2].

[1] https://www.postgresql.org/docs/current/error-style-guide.html
[2] https://www.postgresql.org/docs/current/nls-programmer.html#NLS-GUIDELINES
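To illustrate the guideline: translators see only complete format strings, so assembling a message out of ternary fragments leaves them with untranslatable pieces. A sketch of the alternative, selecting one complete message per case (the function and message wording here are illustrative, not the committed code; in PostgreSQL each string would be wrapped in errmsg()/gettext):

```c
#include <string.h>

/* Pick a complete message per case so each can be translated as a whole. */
static const char *
tuple_not_found_msg(int is_update, int is_partition)
{
	if (is_update)
		return is_partition
			? "logical replication did not find row to be updated in replication target relation's partition \"%s\""
			: "logical replication did not find row to be updated in replication target relation \"%s\"";

	return is_partition
		? "logical replication did not find row to be deleted in replication target relation's partition \"%s\""
		: "logical replication did not find row to be deleted in replication target relation \"%s\"";
}
```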

-- 
Best Wishes,
Ashutosh Bapat




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-07-26 Thread Li, Yong


> On Jul 25, 2024, at 12:51, Sutou Kouhei  wrote:
> 
> Hi,
> 
> THREAD SUMMARY:

Very nice summary.

> 
> Implementation:
> 
> The v18 patch set is the latest patch set. [6]
> It includes the following patches:
> 
> 0001: This adds a basic feature (Copy{From,To}Routine)
>  (This isn't enough for extending COPY format.
>  This just extracts minimal procedure sets to be
>  extendable as callback sets.)
> 0002: This uses Copy{From,To}Rountine for the existing
>  formats (text, csv and binary)
>  (This may not be committed because there is a
>  profiling related concern. See the following section
>  for details)
> 0003: This adds support for specifying custom format by
>  "COPY ... WITH (format 'my-format')"
>  (This also adds a test for this feature.)
> 0004: This exports Copy{From,To}StateData
>  (But this isn't enough to implement custom COPY
>  FROM/TO handlers as an extension.)
> 0005: This adds opaque member to Copy{From,To}StateData and
>  export some functions to read the next data and flush
>  the buffer
>  (We can implement a PoC Apache Arrow COPY FROM/TO
>  handler as an extension with this. [7])
> 
> Thanks,
> --
> kou
> 

This review is for 0001 only because the other patches are not ready
for commit.

The v18-0001 patch applies cleanly to HEAD. “make check-world” also
runs cleanly. The patch looks good to me.


Regards,
Yong

Re: [PATCH] Add crc32(text) & crc32(bytea)

2024-07-26 Thread Aleksander Alekseev
Hi,

> This sounds generally reasonable to me, especially given the apparent
> demand.  Should we also introduce crc32c() while we're at it?

Might be a good idea. However, I didn't see a demand for a crc32c() SQL
function yet. Also, I'm not sure whether the best interface for it
would be crc32c() or crc32(x, version='c') or perhaps crc32(x,
polynomial=...). I propose keeping the scope small this time.

-- 
Best regards,
Aleksander Alekseev




Re: consider -Wmissing-variable-declarations

2024-07-26 Thread Peter Eisentraut
I have committed all of the fixes that I had previously posted, but 
before actually activating the warning option, I found another small 
hiccup with the Bison files.


Before Bison 3.4, the generated parser implementation files run afoul of 
-Wmissing-variable-declarations (in spite of commit ab61c40bfa2) because 
declarations for yylval and possibly yylloc are missing.  The generated 
header files contain an extern declaration, but the implementation files 
don't include the header files.  Since Bison 3.4, the generated 
implementation files automatically include the generated header files, 
so then it works.


To make this work with older Bison versions as well, I made a patch to 
include the generated header file from the .y file.


(With older Bison versions, the generated implementation file contains 
effectively a copy of the header file pasted in, so including the header 
file is redundant.  But we know this works anyway because the core 
grammar uses this arrangement already.)
From a4eadd43c9e0300bd6027c0d4d59f5eec11cc577 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 26 Jul 2024 10:17:53 +0200
Subject: [PATCH v3 1/2] Include bison header files into implementation files

Before Bison 3.4, the generated parser implementation files run afoul
of -Wmissing-variable-declarations (in spite of commit ab61c40bfa2)
because declarations for yylval and possibly yylloc are missing.  The
generated header files contain an extern declaration, but the
implementation files don't include the header files.  Since Bison 3.4,
the generated implementation files automatically include the generated
header files, so then it works.

To make this work with older Bison versions as well, include the
generated header file from the .y file.

(With older Bison versions, the generated implementation file contains
effectively a copy of the header file pasted in, so including the
header file is redundant.  But we know this works anyway because the
core grammar uses this arrangement already.)

Discussion: 
https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c9...@eisentraut.org
---
 contrib/cube/cubeparse.y| 8 +---
 contrib/seg/segparse.y  | 1 +
 src/backend/bootstrap/bootparse.y   | 1 +
 src/backend/replication/repl_gram.y | 1 +
 src/backend/replication/syncrep_gram.y  | 2 ++
 src/interfaces/ecpg/preproc/ecpg.header | 1 +
 src/pl/plpgsql/src/pl_gram.y| 1 +
 src/test/isolation/specparse.y  | 1 +
 8 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/contrib/cube/cubeparse.y b/contrib/cube/cubeparse.y
index fd56d0e1628..52622875cbb 100644
--- a/contrib/cube/cubeparse.y
+++ b/contrib/cube/cubeparse.y
@@ -11,13 +11,15 @@
 #include "utils/float.h"
 #include "varatt.h"
 
+/* All grammar constructs return strings */
+#define YYSTYPE char *
+
+#include "cubeparse.h"
+
 /* silence -Wmissing-variable-declarations */
 extern int cube_yychar;
 extern int cube_yynerrs;
 
-/* All grammar constructs return strings */
-#define YYSTYPE char *
-
 /*
  * Bison doesn't allocate anything that needs to live across parser calls,
  * so we can easily have it use palloc instead of malloc.  This prevents
diff --git a/contrib/seg/segparse.y b/contrib/seg/segparse.y
index 729d4b6390b..efa5fb749cf 100644
--- a/contrib/seg/segparse.y
+++ b/contrib/seg/segparse.y
@@ -12,6 +12,7 @@
 #include "utils/float.h"
 
 #include "segdata.h"
+#include "segparse.h"
 
 /* silence -Wmissing-variable-declarations */
 extern int seg_yychar;
diff --git a/src/backend/bootstrap/bootparse.y 
b/src/backend/bootstrap/bootparse.y
index 58e0878dc8d..73a7592fb71 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -32,6 +32,7 @@
 #include "nodes/makefuncs.h"
 #include "utils/memutils.h"
 
+#include "bootparse.h"
 
 /* silence -Wmissing-variable-declarations */
 extern int boot_yychar;
diff --git a/src/backend/replication/repl_gram.y 
b/src/backend/replication/repl_gram.y
index c46ca395263..06daa954813 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -22,6 +22,7 @@
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 
+#include "repl_gram.h"
 
 /* silence -Wmissing-variable-declarations */
 extern int replication_yychar;
diff --git a/src/backend/replication/syncrep_gram.y 
b/src/backend/replication/syncrep_gram.y
index 5ce4f1bfe73..e4d9962226c 100644
--- a/src/backend/replication/syncrep_gram.y
+++ b/src/backend/replication/syncrep_gram.y
@@ -17,6 +17,8 @@
 #include "nodes/pg_list.h"
 #include "replication/syncrep.h"
 
+#include "syncrep_gram.h"
+
 /* Result of parsing is returned in one of these two variables */
 SyncRepConfigData *syncrep_parse_result;
 char  *syncrep_parse_error_msg;
diff --git a/src/interfaces/ecpg/preproc/ecpg.header 
b/src/interfaces/ecpg/preproc/ecpg.header
index 571b92f6434..3790a601d1a 100644
--- a/src/interfaces/ecpg/preproc/ecpg.header
+++ b

Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-07-26 Thread Tender Wang
Junwang Zhao  于2024年7月26日周五 14:57写道:

>
> There is a bug report[0] Tender comments might be the same
> issue as this one, but I tried Alvaro's and mine patch, neither
> could solve that problem, I did not tried Tender's earlier patch
> thought. I post the test script below in case you are interested.
>

My earlier patch should handle the case Alexander reported, but I did not do
more testing. I'm not sure whether or not there is a mismatch between
pg_constraint and pg_trigger.

I agree with Alvaro that this "requires a much more invasive solution".

-- 
Tender Wang


Re: WIP: parallel GiST index builds

2024-07-26 Thread Andreas Karlsson

On 7/22/24 2:08 PM, Andrey M. Borodin wrote:

During inserting tuples we need NSN on page. For NSN we can use just a counter, 
generated by gistGetFakeLSN() which in turn will call 
GetFakeLSNForUnloggedRel(). Or any other shared counter.
After inserting tuples we call log_newpage_range() to actually WAL-log pages.
All NSNs used during build must be less than LSNs used to insert new tuples 
after index is built.


I feel the tricky part about doing that is that we need to make sure the
fake LSNs are all less than the current real LSN when the index build
completes. While that should normally be the case, we would have an
almost-never-exercised code path for when a fake LSN becomes bigger
than the real LSN, which may contain bugs. Is that really worth
optimizing?


But if we are going to use fake LSNs: since the index being built is not
visible to any scans, we do not have to use GetFakeLSNForUnloggedRel()
but could use our own counter in shared memory, in the GISTShared struct
for this specific index, starting at FirstNormalUnloggedLSN. This
would give us slightly less contention, plus decrease the risk (for good
and bad) of the fake LSN being larger than the real LSN.
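A per-index counter of that kind could be as simple as an atomic fetch-and-add. Here is a minimal sketch using C11 atomics; PostgreSQL itself would use pg_atomic_uint64 in the shared build state, and the struct name and starting constant below are made up for illustration:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for FirstNormalUnloggedLSN */
#define FIRST_NORMAL_UNLOGGED_LSN 1000

/* Hypothetical per-build shared state for one index */
typedef struct GistBuildShared
{
	_Atomic uint64_t fake_lsn;
} GistBuildShared;

static void
gist_shared_init(GistBuildShared *s)
{
	atomic_init(&s->fake_lsn, FIRST_NORMAL_UNLOGGED_LSN);
}

static uint64_t
gist_get_fake_lsn(GistBuildShared *s)
{
	/* each caller gets a distinct, monotonically increasing value */
	return atomic_fetch_add(&s->fake_lsn, 1);
}
```

Because the counter lives in per-index shared state rather than a global, concurrent builds of different indexes never contend on it.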


Andreas




Re: Possible null pointer dereference in afterTriggerAddEvent()

2024-07-26 Thread Alexander Kuznetsov

25.07.2024 20:07, Alvaro Herrera wrote:

Maybe for sanity (and perhaps for Svace compliance) we could do it the
other way around, i.e. by testing events->tail for nullness instead of
events->head, then add the assertion:

if (events->tail == NULL)
{
Assert(events->head == NULL);
events->head = chunk;
}
else
events->tail->next = chunk;

This way, it's not wholly redundant.

Thanks for your response!
I agree with the proposed changes and have updated the patch accordingly. 
Version 2 is attached.

That said, I'm not sure we actually *need* to change this.

I understand and partly agree. But it appears that with these changes, a 
null-pointer dereference is impossible even in builds where assertions are 
disabled. Previously, this issue could theoretically occur. Consequently, 
these changes slightly improve overall robustness.

--
Best regards,
Alexander Kuznetsov
From 63a3a19fe67bfd1f427b21d53ff0ef642aed89c4 Mon Sep 17 00:00:00 2001
From: Alexander Kuznetsov 
Date: Fri, 26 Jul 2024 11:55:53 +0300
Subject: [PATCH v2] Add assertion of an empty list in afterTriggerAddEvent()

The unwritten assumption is that events->tail and events->head are either
both NULL or neither is NULL.

Change the order of checks to ensure that we never try
to dereference events->tail if it is NULL.

This was found by ALT Linux Team with Svace.
---
 src/backend/commands/trigger.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 170360edda..e039d62b0b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -4096,8 +4096,11 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 		chunk->endptr = chunk->endfree = (char *) chunk + chunksize;
 		Assert(chunk->endfree - chunk->freeptr >= needed);
 
-		if (events->head == NULL)
+		if (events->tail == NULL)
+		{
+			Assert(events->head == NULL);
 			events->head = chunk;
+		}
 		else
 			events->tail->next = chunk;
 		events->tail = chunk;
-- 
2.42.2



Re: Conflict detection and logging in logical replication

2024-07-26 Thread Nisha Moond
On Thu, Jul 25, 2024 at 12:04 PM Zhijie Hou (Fujitsu)
 wrote:
> Here is the V6 patch set which addressed Shveta and Nisha's comments
> in [1][2][3][4].

Thanks for the patch.
I tested the v6-0001 patch with partition table scenarios. Please
review the following scenario where Pub updates a tuple, causing it to
move from one partition to another on Sub.

Setup:
Pub:
 create table tab (a int not null, b int not null);
 alter table tab add constraint tab_pk primary key (a,b);
Sub:
 create table tab (a int not null, b int not null) partition by range (b);
 alter table tab add constraint tab_pk primary key (a,b);
 create table tab_1 partition of tab FOR values from (MINVALUE) TO (100);
 create table tab_2 partition of tab FOR values from (101) TO (MAXVALUE);

Test:
 Pub: insert into tab values (1,1);
 Sub: update tab set a=1 where a=1;  --> just to make it Sub's origin
 Sub: insert into tab values (1,101);
 Pub: update tab set b=101 where b=1; --> Both 'update_differ' and
'insert_exists' are detected.

For non-partitioned tables, a similar update results in
'update_differ' and 'update_exists' conflicts. After detecting
'update_differ', the apply worker proceeds to apply the remote update
and if a tuple with the updated key already exists, it raises
'update_exists'.
This same behavior is expected for partitioned tables too.

Thanks,
Nisha




Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2024-07-26 Thread Richard Guo
On Wed, Jul 24, 2024 at 3:30 PM Ashutosh Bapat
 wrote:
> Done. Are you suggesting it for aesthetic purposes or there's
> something else (read, less defragmentation, performance gain etc.)? I
> am curious.

I think it's just for readability.

> PFA patches:
> 0001 - same as previous one
> 0002 - addresses your review comments

I've worked a bit more on the comments and commit message, and I plan
to push the attached soon, barring any objections or comments.

Thanks
Richard


v1-0001-Reduce-memory-used-by-partitionwise-joins.patch
Description: Binary data


Re: Allow logical failover slots to wait on synchronous replication

2024-07-26 Thread shveta malik
On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila  wrote:
>
> On Tue, Jul 9, 2024 at 12:39 AM John H  wrote:
> >
> > > Out of curiosity, did you compare with standby_slot_names_from_syncrep 
> > > set to off
> > > and standby_slot_names not empty?
> >
> > I didn't think 'standby_slot_names' would impact TPS as much since
> > it's not grabbing the SyncRepLock but here's a quick test.
> > Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> > pgbench all running from the same server.
> >
> > Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
> >
> > Result with: standby_slot_names =
> > 'replica_1,replica_2,replica_3,replica_4,replica_5'
> >
> > latency average = 5.600 ms
> > latency stddev = 2.854 ms
> > initial connection time = 5.503 ms
> > tps = 714.148263 (without initial connection time)
> >
> > Result with: standby_slot_names_from_syncrep = 'true',
> > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> >
> > latency average = 5.740 ms
> > latency stddev = 2.543 ms
> > initial connection time = 4.093 ms
> > tps = 696.776249 (without initial connection time)
> >
> > Result with nothing set:
> >
> > latency average = 5.090 ms
> > latency stddev = 3.467 ms
> > initial connection time = 4.989 ms
> > tps = 785.665963 (without initial connection time)
> >
> > Again I think it's possible to improve the synchronous numbers if we
> > cache but I'll try that out in a bit.
> >
>
> Okay, so the tests done till now conclude that we won't get the
> benefit by using 'standby_slot_names_from_syncrep'. Now, if we
> increase the number of standby's in both lists and still keep ANY 3 in
> synchronous_standby_names then the results may vary. We should try to
> find out if there is a performance benefit with the use of
> synchronous_standby_names in the normal configurations like the one
> you used in the above tests to prove the value of this patch.
>

I didn't fully understand the parameters mentioned above, specifically
what 'latency stddev' and 'latency average' represent. But shouldn't
we see the benefit/value of this patch by having a setup where a
particular standby is slow in sending the response back to primary
(could be due to network lag or other reasons) and then measuring the
latency in receiving changes on failover-enabled logical subscribers?
We can perform this test with both of the below settings and say make
D and E slow in sending responses:
1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.

thanks
Shveta




Re: Conflict detection and logging in logical replication

2024-07-26 Thread shveta malik
On Fri, Jul 26, 2024 at 3:03 PM Nisha Moond  wrote:
>
> On Thu, Jul 25, 2024 at 12:04 PM Zhijie Hou (Fujitsu)
>  wrote:
> > Here is the V6 patch set which addressed Shveta and Nisha's comments
> > in [1][2][3][4].
>
> Thanks for the patch.
> I tested the v6-0001 patch with partition table scenarios. Please
> review the following scenario where Pub updates a tuple, causing it to
> move from one partition to another on Sub.
>
> Setup:
> Pub:
>  create table tab (a int not null, b int not null);
>  alter table tab add constraint tab_pk primary key (a,b);
> Sub:
>  create table tab (a int not null, b int not null) partition by range (b);
>  alter table tab add constraint tab_pk primary key (a,b);
>  create table tab_1 partition of tab FOR values from (MINVALUE) TO (100);
>  create table tab_2 partition of tab FOR values from (101) TO (MAXVALUE);
>
> Test:
>  Pub: insert into tab values (1,1);
>  Sub: update tab set a=1 where a=1;  > just to make it Sub's origin
>  Sub: insert into tab values (1,101);
>  Pub: update tab set b=101 where b=1; --> Both 'update_differ' and
> 'insert_exists' are detected.
>
> For non-partitioned tables, a similar update results in
> 'update_differ' and 'update_exists' conflicts. After detecting
> 'update_differ', the apply worker proceeds to apply the remote update
> and if a tuple with the updated key already exists, it raises
> 'update_exists'.
> This same behavior is expected for partitioned tables too.

Good catch. Yes, from the user's perspective, an update_* conflict
should be raised when performing an update operation. But internally,
since we are deleting from one partition and inserting into another, we
are hitting insert_exists. To convert this insert_exists into an
update_exists conflict, perhaps we need to change the insert operation
to an update operation, as the default resolver is 'always apply
update' in case of update_differ. But I am not sure how much complexity
that would add to the code. If it makes the code too complex, I think
we can retain the existing behaviour but document this multi-partition
case. And in the resolver patch, we can handle the resolution of
insert_exists by converting it to an update. Thoughts?

thanks
Shveta




Re: Conflict detection and logging in logical replication

2024-07-26 Thread Amit Kapila
On Fri, Jul 26, 2024 at 3:37 PM shveta malik  wrote:
>
> On Fri, Jul 26, 2024 at 3:03 PM Nisha Moond  wrote:
> >
> > On Thu, Jul 25, 2024 at 12:04 PM Zhijie Hou (Fujitsu)
> >  wrote:
> > > Here is the V6 patch set which addressed Shveta and Nisha's comments
> > > in [1][2][3][4].
> >
> > Thanks for the patch.
> > I tested the v6-0001 patch with partition table scenarios. Please
> > review the following scenario where Pub updates a tuple, causing it to
> > move from one partition to another on Sub.
> >
> > Setup:
> > Pub:
> >  create table tab (a int not null, b int not null);
> >  alter table tab add constraint tab_pk primary key (a,b);
> > Sub:
> >  create table tab (a int not null, b int not null) partition by range (b);
> >  alter table tab add constraint tab_pk primary key (a,b);
> >  create table tab_1 partition of tab FOR values from (MINVALUE) TO (100);
> >  create table tab_2 partition of tab FOR values from (101) TO (MAXVALUE);
> >
> > Test:
> >  Pub: insert into tab values (1,1);
> >  Sub: update tab set a=1 where a=1;  > just to make it Sub's origin
> >  Sub: insert into tab values (1,101);
> >  Pub: update tab set b=101 where b=1; --> Both 'update_differ' and
> > 'insert_exists' are detected.
> >
> > For non-partitioned tables, a similar update results in
> > 'update_differ' and 'update_exists' conflicts. After detecting
> > 'update_differ', the apply worker proceeds to apply the remote update
> > and if a tuple with the updated key already exists, it raises
> > 'update_exists'.
> > This same behavior is expected for partitioned tables too.
>
> Good catch. Yes, from the user's perspective, an update_* conflict
> should be raised when performing an update operation. But internally
> since we are deleting from one partition and inserting to another, we
> are hitting insert_exists. To convert this insert_exists to update_exists
> conflict, perhaps we need to change insert-operation to
> update-operation as the default resolver is 'always apply update' in
> case of update_differ.
>

But we already document that behind the scenes such an update is a
DELETE+INSERT operation [1]. Also, all the privilege checks or before
row triggers are of type insert, so, I think it is okay to consider
this as insert_exists conflict and document it. Later, resolver should
also fire for insert_exists conflict.

One more thing we need to consider is whether we should LOG or ERROR
for update/delete_differ conflicts. If we LOG as the patch is doing
then we are intentionally overwriting the row when the user may not
expect it. OTOH, without a patch anyway we are overwriting, so there
is an argument that logging by default is what the user will expect.
What do you think?

[1] - https://www.postgresql.org/docs/devel/sql-update.html (See ...
Behind the scenes, the row movement is actually a DELETE and INSERT
operation.)

-- 
With Regards,
Amit Kapila.




Re: Conflict detection and logging in logical replication

2024-07-26 Thread shveta malik
On Fri, Jul 26, 2024 at 3:56 PM Amit Kapila  wrote:
>
> On Fri, Jul 26, 2024 at 3:37 PM shveta malik  wrote:
> >
> > On Fri, Jul 26, 2024 at 3:03 PM Nisha Moond  
> > wrote:
> > >
> > > On Thu, Jul 25, 2024 at 12:04 PM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > > Here is the V6 patch set which addressed Shveta and Nisha's comments
> > > > in [1][2][3][4].
> > >
> > > Thanks for the patch.
> > > I tested the v6-0001 patch with partition table scenarios. Please
> > > review the following scenario where Pub updates a tuple, causing it to
> > > move from one partition to another on Sub.
> > >
> > > Setup:
> > > Pub:
> > >  create table tab (a int not null, b int not null);
> > >  alter table tab add constraint tab_pk primary key (a,b);
> > > Sub:
> > >  create table tab (a int not null, b int not null) partition by range (b);
> > >  alter table tab add constraint tab_pk primary key (a,b);
> > >  create table tab_1 partition of tab FOR values from (MINVALUE) TO (100);
> > >  create table tab_2 partition of tab FOR values from (101) TO (MAXVALUE);
> > >
> > > Test:
> > >  Pub: insert into tab values (1,1);
> > >  Sub: update tab set a=1 where a=1;  > just to make it Sub's origin
> > >  Sub: insert into tab values (1,101);
> > >  Pub: update tab set b=101 where b=1; --> Both 'update_differ' and
> > > 'insert_exists' are detected.
> > >
> > > For non-partitioned tables, a similar update results in
> > > 'update_differ' and 'update_exists' conflicts. After detecting
> > > 'update_differ', the apply worker proceeds to apply the remote update
> > > and if a tuple with the updated key already exists, it raises
> > > 'update_exists'.
> > > This same behavior is expected for partitioned tables too.
> >
> > Good catch. Yes, from the user's perspective, an update_* conflict
> > should be raised when performing an update operation. But internally
> > since we are deleting from one partition and inserting to another, we
> > are hitting insert_exists. To convert this insert_exists to update_exists
> > conflict, perhaps we need to change insert-operation to
> > update-operation as the default resolver is 'always apply update' in
> > case of update_differ.
> >
>
> But we already document that behind the scenes such an update is a
> DELETE+INSERT operation [1]. Also, all the privilege checks or before
> row triggers are of type insert, so, I think it is okay to consider
> this as insert_exists conflict and document it. Later, resolver should
> also fire for insert_exists conflict.

Thanks for the link. +1 on the existing behaviour of insert_exists conflict.

> One more thing we need to consider is whether we should LOG or ERROR
> for update/delete_differ conflicts. If we LOG as the patch is doing
> then we are intentionally overwriting the row when the user may not
> expect it. OTOH, without a patch anyway we are overwriting, so there
> is an argument that logging by default is what the user will expect.
> What do you think?

I was under the impression that in this patch we do not intend to
change the behaviour of HEAD and thus only LOG the conflict wherever
possible. And in the next patch, the resolver, based on the user's
input of error, skip, or resolve, we take the action. I still think it
is better to stick to the said behaviour. Only if we commit the
resolver patch in the same version where we commit the detection patch
can we take the risk of changing this default behaviour to 'always
error'. Otherwise users will be left with conflicts arising but no
automatic way to resolve them. But for users who really want their
application to error out, we can provide an additional GUC in this
patch itself which changes the behaviour to 'always ERROR on
conflict'. Thoughts?

thanks
Shveta




Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test

2024-07-26 Thread Alexander Lakhin

25.07.2024 15:00, Alexander Lakhin wrote:



The other question is: why is 005_opclass_damage taking so much time there?
...
$ make -s check -C src/bin/pg_amcheck/ PROVE_TESTS="t/005*" 
PROVE_FLAGS="--timer"
[11:11:53] t/005_opclass_damage.pl .. ok 1370 ms ( 0.00 usr 0.00 sys +  
0.10 cusr  0.07 csys =  0.17 CPU)

$ echo "debug_parallel_query = regress" >/tmp/extra.config
$ TEMP_CONFIG=/tmp/extra.config make -s check -C src/bin/pg_amcheck/ PROVE_TESTS="t/005*" 
PROVE_FLAGS="--timer"
[11:12:46] t/005_opclass_damage.pl .. ok    40854 ms ( 0.00 usr 0.00 sys +  
0.10 cusr  0.10 csys =  0.20 CPU)

...
So maybe at least this test should be improved for testing with
debug_parallel_query enabled, if such active use of parallel workers by
pg_amcheck can't be an issue to end users?



When running this test with "log_min_messages = DEBUG2" in my extra.config,
I see thousands of the following messages in the test log:
2024-07-26 09:32:54.544 UTC [2572189:46] DEBUG:  postmaster received pmsignal 
signal
2024-07-26 09:32:54.544 UTC [2572189:47] DEBUG:  registering background worker 
"parallel worker for PID 2572197"
2024-07-26 09:32:54.544 UTC [2572189:48] DEBUG:  starting background worker process 
"parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:49] DEBUG:  unregistering background worker 
"parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:50] DEBUG:  background worker "parallel 
worker" (PID 2572205) exited with exit code 0
2024-07-26 09:32:54.547 UTC [2572189:51] DEBUG:  postmaster received pmsignal 
signal
2024-07-26 09:32:54.547 UTC [2572189:52] DEBUG:  registering background worker 
"parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:53] DEBUG:  starting background worker process 
"parallel worker for PID 2572197"
2024-07-26 09:32:54.549 UTC [2572189:54] DEBUG:  unregistering background worker 
"parallel worker for PID 2572197"
2024-07-26 09:32:54.549 UTC [2572189:55] DEBUG:  background worker "parallel 
worker" (PID 2572206) exited with exit code 0
...

grep ' registering background worker' \
 src/bin/pg_amcheck/tmp_check/log/005_opclass_damage_test.log | wc -l
15669

So this test launches more than 15000 processes when debug_parallel_query
is enabled.

As far as I can see, this is happening because the "PARALLEL UNSAFE"
marking is ignored when the function is called by CREATE INDEX/amcheck.

Namely, with a function defined as:
    CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
    SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;

SELECT int4_asc_cmp(1, 2);
is executed without parallel workers, whilst when it's used by an index:
CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
...
OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);

INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);

CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
launches 1000 parallel workers.

(This is reminiscent of bug #18314.)

One way to work around this is to disable debug_parallel_query in the test,
and another possibility is to set max_parallel_workers = 0.
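Either workaround fits the TEMP_CONFIG mechanism shown above; for example, an extra.config along one of these lines (a sketch of the idea, not a committed fix):

```
# option 1: keep debug_parallel_query off for this test
debug_parallel_query = off

# option 2: let the planner choose parallel plans but never launch workers
max_parallel_workers = 0
```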

Best regards,
Alexander




Re: Built-in CTYPE provider

2024-07-26 Thread Noah Misch
On Wed, Jul 24, 2024 at 08:19:13AM -0700, Noah Misch wrote:
> On Wed, Jul 17, 2024 at 03:03:26PM -0700, Noah Misch wrote:
> > vote one way or the other on the question in
> > https://postgr.es/m/20240706195129...@rfd.leadboat.com?
> 
> I'll keep the voting open for another 24 hours from now
> or 36 hours after the last vote, whichever comes last.

I count 4.5 or 5 votes for "okay" and 2 votes for "not okay".  I've moved the
open item to "Non-bugs".

On Wed, Jul 17, 2024 at 11:06:43PM -0700, Jeff Davis wrote:
> You haven't established that any problem actually exists in version 17,
> and your arguments have been a moving target throughout this subthread.

I can understand that experience of yours.  It wasn't my intent to make a
moving target.  To be candid, I entered the thread with no doubts that you'd
agree with the problem.  When you and Tom instead shared a different view, I
switched to pursuing the votes to recognize the problem.  (Voting then held
that pg_c_utf8 is okay as-is.)

On Thu, Jul 18, 2024 at 09:52:44AM -0700, Jeff Davis wrote:
> On Thu, 2024-07-18 at 07:00 -0700, Noah Misch wrote:
> > Given all the messages on this thread, if the feature remains in
> > PostgreSQL, I
> > advise you to be ready to tolerate PostgreSQL "routinely updating the
> > built-in
> > provider to adopt any changes that Unicode makes".
> 
> You mean messages from me, like:
> 
>   * "I have no intention force a Unicode update" [1]
>   * "While nothing needs to be changed for 17, I agree that we may need
> to be careful in future releases not to break things." [2]
>   * "...you are right that we may need to freeze Unicode updates or be
> more precise about versioning..." [2]
>   * "If you are proposing that Unicode updates should not be performed
> if they affect the results of any IMMUTABLE function...I am neither
> endorsing nor opposing..." [3]
> 
> ?

Those, plus all the other messages.

On Fri, Jul 19, 2024 at 08:50:41AM -0700, Jeff Davis wrote:
> Consider:
> 
> a. Some user creates an expression index on NORMALIZE(); vs.
> b. Some user chooses the builtin "C.UTF-8" locale and creates a partial
> index with a predicate like "string ~ '[[:alpha:]]'" (or an expression
> index on LOWER())
> 
> Both cases create a risk if we update Unicode in some future version.
> Why are you unconcerned about case (a), but highly concerned about case
> (b)?

I am not unconcerned about (a), but the v17 beta process gave an opportunity
to do something about (b) that it didn't give for (a).  Also, I have never
handled a user report involving NORMALIZE().  I have handled user reports
around regexp index inconsistency, e.g. the one at
https://www.youtube.com/watch?v=kNH94tmpUus&t=1490s




Re: Conflict detection and logging in logical replication

2024-07-26 Thread Amit Kapila
On Thu, Jul 25, 2024 at 4:12 PM Amit Kapila  wrote:
>
> On Thu, Jul 25, 2024 at 12:04 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > Here is the V6 patch set which addressed Shveta and Nisha's comments
> > in [1][2][3][4].
> >
>
> Do we need an option detect_conflict for logging conflicts? The
> possible reason to include such an option is to avoid any overhead
> during apply due to conflict detection. IIUC, to detect some of the
> conflicts like update_differ and delete_differ, we would need to fetch
> commit_ts information which could be costly but we do that only when
> GUC track_commit_timestamp is enabled which would anyway have overhead
> on its own. Can we do performance testing to see how much additional
> overhead we have due to fetching commit_ts information during conflict
> detection?
>
> The other time we need to enquire commit_ts is to log the conflict
> detection information which is an ERROR path, so performance shouldn't
> matter in this case.
>
> In general, it would be good to enable conflict detection/logging by
> default but if it has overhead then we can consider adding this new
> option. Anyway, adding an option could be a separate patch (at least
> for review), let the first patch be the core code of conflict
> detection and logging.
>
> minor cosmetic comments:
> 1.
> +static void
> +check_conflict_detection(void)
> +{
> + if (!track_commit_timestamp)
> + ereport(WARNING,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("conflict detection could be incomplete due to disabled
> track_commit_timestamp"),
> + errdetail("Conflicts update_differ and delete_differ cannot be
> detected, and the origin and commit timestamp for the local row will
> not be logged."));
> +}
>
> The errdetail string is too long. It would be better to split it into
> multiple rows.
>
> 2.
> -
> +static void check_conflict_detection(void);
>
> Spurious line removal.
>

A few more comments:
1.
For duplicate key, the patch reports conflict as following:
ERROR:  conflict insert_exists detected on relation "public.t1"
2024-07-26 11:06:34.570 IST [27800] DETAIL:  Key (c1)=(1) already
exists in unique index "t1_pkey", which was modified by origin 1 in
transaction 770 at 2024-07-26 09:16:47.79805+05:30.
2024-07-26 11:06:34.570 IST [27800] CONTEXT:  processing remote data
for replication origin "pg_16387" during message type "INSERT" for
replication target relation "public.t1" in transaction 742, finished
at 0/151A108

In detail, it is better to display the origin name instead of the
origin id. This will be similar to what we do in CONTEXT information.

2.
if (resultRelInfo->ri_NumIndices > 0)
  recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
-slot, estate, false, false,
-NULL, NIL, false);
+slot, estate, false,
+conflictindexes, &conflict,

It is better to use true/false for the bool parameter (something like
conflictindexes ? true : false). That will make the code easier to
follow.

3. The need for ReCheckConflictIndexes() is not clear from comments.
Can you please add a few comments to explain this?

4.
-   will simply be skipped.
+   will simply be skipped. Please refer to detect_conflict
+   for all the conflicts that will be logged when enabling
detect_conflict.
   

It would be easier to read the patch if you move 

Re: Allow logical failover slots to wait on synchronous replication

2024-07-26 Thread Amit Kapila
On Fri, Jul 26, 2024 at 3:28 PM shveta malik  wrote:
>
> On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila  wrote:
> >
> > On Tue, Jul 9, 2024 at 12:39 AM John H  wrote:
> > >
> > > > Out of curiosity, did you compare with standby_slot_names_from_syncrep 
> > > > set to off
> > > > and standby_slot_names not empty?
> > >
> > > I didn't think 'standby_slot_names' would impact TPS as much since
> > > it's not grabbing the SyncRepLock but here's a quick test.
> > > Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> > > pgbench all running from the same server.
> > >
> > > Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
> > >
> > > Result with: standby_slot_names =
> > > 'replica_1,replica_2,replica_3,replica_4,replica_5'
> > >
> > > latency average = 5.600 ms
> > > latency stddev = 2.854 ms
> > > initial connection time = 5.503 ms
> > > tps = 714.148263 (without initial connection time)
> > >
> > > Result with: standby_slot_names_from_syncrep = 'true',
> > > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > >
> > > latency average = 5.740 ms
> > > latency stddev = 2.543 ms
> > > initial connection time = 4.093 ms
> > > tps = 696.776249 (without initial connection time)
> > >
> > > Result with nothing set:
> > >
> > > latency average = 5.090 ms
> > > latency stddev = 3.467 ms
> > > initial connection time = 4.989 ms
> > > tps = 785.665963 (without initial connection time)
> > >
> > > Again I think it's possible to improve the synchronous numbers if we
> > > cache but I'll try that out in a bit.
> > >
> >
> > Okay, so the tests done till now conclude that we won't get the
> > benefit by using 'standby_slot_names_from_syncrep'. Now, if we
> > increase the number of standby's in both lists and still keep ANY 3 in
> > synchronous_standby_names then the results may vary. We should try to
> > find out if there is a performance benefit with the use of
> > synchronous_standby_names in the normal configurations like the one
> > you used in the above tests to prove the value of this patch.
> >
>
> I didn't fully understand the parameters mentioned above, specifically
> what 'latency stddev' and 'latency average' represent.. But shouldn't
> we see the benefit/value of this patch by having a setup where a
> particular standby is slow in sending the response back to primary
> (could be due to network lag or other reasons) and then measuring the
> latency in receiving changes on failover-enabled logical subscribers?
> We can perform this test with both of the below settings and say make
> D and E slow in sending responses:
> 1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> 2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.
>

Yes, I also expect the patch should perform better in such a scenario
but it is better to test it. Also, irrespective of that, we should
investigate why the reported case is slower for
synchronous_standby_names and see if we can improve it.

BTW, for 2), I think you wanted to say synchronized_standby_slots,
not standby_slot_names. We have recently changed the GUC name.


-- 
With Regards,
Amit Kapila.




Re: tls 1.3: sending multiple tickets

2024-07-26 Thread Daniel Gustafsson
> On 24 Jul 2024, at 07:44, Heikki Linnakangas  wrote:
> 
> On 18/06/2024 16:11, Daniel Gustafsson wrote:
>>> On 17 Jun 2024, at 19:38, Andres Freund  wrote:
>>> Seems we ought to use SSL_CTX_set_num_tickets() to prevent issuing the 
>>> useless
>>> tickets?
>> Agreed, in 1.1.1 and above as the API was only introduced then.  LibreSSL 
>> added
>> the API in 3.5.4 but only for compatibility since it doesn't support TLS
>> tickets at all.
> 
> Wow, that's a bizarre API. The OpenSSL docs are not clear on what the 
> possible values for SSL_CTX_set_num_tickets() are. It talks about 0, and 
> mentions that 2 is the default, but what does it mean to set it to 1, or 5, 
> for example?

It means that 1 or 5 tickets will be sent to the peer; OpenSSL accepts an
arbitrary number of tickets (tickets can be issued at points during the
connection other than the handshake, AFAICT).

> Anyway, it's pretty clear that SSL_CTX_set_num_tickets(0) can be used to 
> disable tickets, so that's fine.
> 
>>> It seems like a buglet in openssl that it forces each session tickets to be
>>> sent in its own packet (it does an explicit BIO_flush(), so even if we
>>> buffered between openssl and OS, as I think we should, we'd still send it
>>> separately), but I don't really understand most of this stuff.
>> I don't see anything in the RFCs so not sure.
>> The attached applies this, and I think this is backpatching material since we
>> arguably fail to do what we say in the code.  AFAIK we don't have a hard rule
>> against backpatching changes to autoconf/meson?
> 
> Looks good to me. Backpatching autoconf/meson changes is fine, we've done it 
> before.

Thanks for review, I've applied this backpatched all the way.

--
Daniel Gustafsson





Re: tls 1.3: sending multiple tickets

2024-07-26 Thread Marina Polyakova

Hello!

On 2024-07-26 14:55, Daniel Gustafsson wrote:

Thanks for review, I've applied this backpatched all the way.


It looks like the recommended way of using autoheader [1] is now broken. 
The attached patch fixes the master branch for me.


[1] 
https://www.postgresql.org/message-id/30511.1546097762%40sss.pgh.pa.us


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index db3fcbecc3ab33c8593eeb4a6d2927f674e2f684..3dea3856aaf06a3c59a77cc46e76b6d6efd5f88c 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -388,6 +388,9 @@
 /* Define to 1 if you have the `SSL_CTX_set_cert_cb' function. */
 #undef HAVE_SSL_CTX_SET_CERT_CB
 
+/* Define to 1 if you have the `SSL_CTX_set_num_tickets' function. */
+#undef HAVE_SSL_CTX_SET_NUM_TICKETS
+
 /* Define to 1 if stdbool.h conforms to C99. */
 #undef HAVE_STDBOOL_H
 
@@ -517,9 +520,6 @@
 /* Define to 1 if you have the `X509_get_signature_info' function. */
 #undef HAVE_X509_GET_SIGNATURE_INFO
 
-/* Define to 1 if you have the `SSL_CTX_set_num_tickets' function. */
-#undef HAVE_SSL_CTX_SET_NUM_TICKETS
-
 /* Define to 1 if the assembler supports X86_64's POPCNTQ instruction. */
 #undef HAVE_X86_64_POPCNTQ
 


Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-07-26 Thread Alexander Korotkov
On Fri, May 17, 2024 at 1:11 PM Alexander Korotkov  wrote:
> I think the thread contains enough motivation on why 0002, 0003 and
> 0004 are material for post-FF.  They are fixes and refactoring for
> new-in-v17 feature.  I'm going to push them if no objections.
>
> Regarding 0001, I'd like to ask Tom and Mark if they find convincing
> that given that optimization is small, simple and giving huge effect,
> it could be pushed post-FF?  Otherwise, this could wait for v18.

The revised version of 0001 unique checking optimization is attached.
I'm going to push this to v18 if no objections.

--
Regards,
Alexander Korotkov
Supabase


v4-0001-amcheck-Optimize-speed-of-checking-for-unique-con.patch
Description: Binary data


Re: WIP: parallel GiST index builds

2024-07-26 Thread Andrey M. Borodin



> On 26 Jul 2024, at 14:30, Andreas Karlsson  wrote:
> 
> I feel the tricky part about doing that is that we need to make sure the fake 
> LSNs are all less than the current real LSN when the index build completes 
> and while that normally should be the case we will have an almost never 
> exercised code path for when the fake LSN becomes bigger than the real LSN 
> which may contain bugs. Is that really worth optimizing?
> 
> But if we are going to use fake LSN: since the index being built is not 
> visible to any scans we do not have to use GetFakeLSNForUnloggedRel() but 
> could use an own counter in shared memory in the GISTShared struct for this 
> specific index which starts at FirstNormalUnloggedLSN. This would give us 
> slightly less contention plus decrease the risk (for good and bad) of the 
> fake LSN being larger than the real LSN.

+1 for atomic counter in GISTShared.
BTW we can just reset LSNs to GistBuildLSN just before doing 
log_newpage_range().


Best regards, Andrey Borodin.



Re: Protocol question regarding Portal vs Cursor

2024-07-26 Thread Dave Cramer
On Thu, 25 Jul 2024 at 17:52, Dave Cramer  wrote:

>
>
> On Thu, 25 Jul 2024 at 16:19, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Thursday, July 25, 2024, Dave Cramer  wrote:
>>
>> May not make a difference but…
>>
>>
>>> 2024-07-25 15:55:39 FINEST org.postgresql.core.v3.QueryExecutorImpl
>>> sendSimpleQuery  FE=> SimpleQuery(query="declare C_3 CURSOR WITHOUT HOLD
>>> FOR SELECT * FROM testsps WHERE id = 2")
>>>
>>
>> You named the cursor c_3 (lowercase due to SQL case folding)
>>
>>
>>> 2024-07-25 15:55:39 FINEST org.postgresql.core.v3.QueryExecutorImpl
>>> sendDescribePortal  FE=> Describe(portal=C_3)
>>>
>>
>> The protocol doesn’t do case folding
>>
>>
>>>
>>> 2024-07-25 15:55:39 FINEST org.postgresql.core.v3.QueryExecutorImpl
>>> receiveErrorResponse  <=BE ErrorMessage(ERROR: portal "C_3" does not exist
>>>
>>
>> As evidenced by this error message.
>>
>>   Location: File: postgres.c, Routine: exec_describe_portal_message,
>>> Line: 2708
>>>
>>>
>>
>> You would be absolutely correct! Thanks for the quick response
>
>
So while the APIs are "virtually" identical, AFAICT there is no way to
create a "WITH HOLD" portal?

Dave

>


Re: tls 1.3: sending multiple tickets

2024-07-26 Thread Daniel Gustafsson
> On 26 Jul 2024, at 14:03, Marina Polyakova  wrote:
> On 2024-07-26 14:55, Daniel Gustafsson wrote:
>> Thanks for review, I've applied this backpatched all the way.
> 
> It looks like the recommended way of using autoheader [1] is now broken. The 
> attached patch fixes the master branch for me.

Thanks for the report, I'll fix it.  Buildfarm animal hamerkop also reminded me
that I had managed to stash the old MSVC buildsystem changes (ENOTENOUGHCOFFEE)
so fixing that at the same time.

--
Daniel Gustafsson





Detailed release notes

2024-07-26 Thread Marcos Pegoraro
I'm now using version 14 and planning to update to 17 as soon as it becomes
available. Looking carefully at the release notes to see exactly what I'll
get when I upgrade, I see lots of unexplained features, simply because the
release notes do not explain exactly what each change does. And I don't have
a way to find the code or messages that produced a given feature.


   - Allow query nodes to be run in parallel in more cases (Tom Lane)
     Cool feature, but when and what kind of query will use this?

   - Improve EXPLAIN's display of SubPlan nodes and output parameters (Tom
     Lane, Dean Rasheed)
     Hmm, interesting, but what exactly?

Everything that is done in Postgres is public; all messages and code are
available to anyone. But when I want to know exactly what a feature is,
starting from the release notes, I don't know how to find it.

I think it would be very interesting if the release notes linked to what was
discussed for each change.

   - Allow query nodes to be run in parallel in more cases (Tom Lane) CF1
     and CF2

   - Improve EXPLAIN's display of SubPlan nodes and output parameters (Tom
     Lane, Dean Rasheed) CF1

And these CF links could point to the commitfest entry, email messages, or
even a detailed tutorial for that feature.

regards
Marcos


Re: query_id, pg_stat_activity, extended query protocol

2024-07-26 Thread Anthonin Bonnefoy
On Thu, Jul 18, 2024 at 10:56 AM Michael Paquier  wrote:
> Please attach things to your emails, if your repository disappears for
> a reason or another we would lose knowledge in the archives of the
> community lists.

Noted and thanks for the reminder, I'm still learning about mailing
list etiquette.

> I have not looked at that entirely in details, and I'd need to check
> if it is possible to use what's here for more predictible tests:
> https://www.postgresql.org/message-id/ZqCMCS4HUshUYjGc%40paquier.xyz

For the tests, there are limited possibilities to check whether a
query_id has been set correctly.
- Checking pg_stat_activity is not possible in the regress tests as
you need a second session to check the reported query_id.
- pg_stat_statements can be used indirectly but you're limited to how
pgss uses query_id. For example, it doesn't rely on queryId in
ExecutorRun.

A possible solution I've been thinking of is to use a test module. The
module will assert on whether the queryId is set or not in parse, plan
and executor hooks. It will also check if the queryId reported in
pgstat matches the queryId at the root level.

This allows us to check that the queryId is correctly set with the
extended protocol. I've also found some queries which will trigger a
failure (ctas and cursor usage) though this is probably a different
issue from the extended protocol issue.
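To sketch the shape of that idea (in Python pseudo-code, not the real test module — in C, the actual attachment points would be post_parse_analyze_hook, planner_hook, and ExecutorRun_hook, and the class and method names below are purely illustrative):

```python
# Illustrative sketch of the proposed test module's logic: each phase
# "hook" asserts that a query_id has been set before it runs, and the
# id reported at the end must match the one computed at parse time.
class QueryIdChecker:
    def __init__(self):
        self.query_id = None
        self.reported_id = None

    def parse_hook(self, query_text):
        # parse analysis is where the query_id is first computed
        self.query_id = hash(query_text) & 0xFFFFFFFF

    def plan_hook(self):
        assert self.query_id is not None, "queryId missing at plan time"

    def executor_hook(self):
        assert self.query_id is not None, "queryId missing at execution"
        # pg_stat_activity-style reporting would happen here
        self.reported_id = self.query_id

checker = QueryIdChecker()
checker.parse_hook("SELECT 1")
checker.plan_hook()
checker.executor_hook()
```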

Regards,
Anthonin


0001-Create-module-for-query_id-test.patch
Description: Binary data


Re: Direct SSL connection and ALPN loose ends

2024-07-26 Thread Heikki Linnakangas

On 24/07/2024 02:37, Michael Paquier wrote:

On Tue, Jul 23, 2024 at 08:32:29PM +0300, Heikki Linnakangas wrote:

All these new tests are a great asset when refactoring this again.


Thanks for doing that.  The coverage, especially with v2, is going to
be really useful.


Yeah, I'm also not too excited about the additional code in the backend, but
I'm also not excited about writing another test C module just for this. I'm
inclined to commit this as it is, but we can certainly revisit this later,
since it's just test code.


The point would be to rely on the existing injection_points module,
with a new callback in it.  The callbacks could be on a file of their
own in the module, for clarity.


Hmm, do we want injection_points module to be a dumping ground for 
callbacks that are only useful for very specific injection points, in 
specific tests? I view it as a more general purpose module, containing 
callbacks that are useful for many different tests. Don't get me wrong, 
I'm not necessarily against it, and it would be expedient, that's just 
not how I see the purpose of injection_points.



What you have is OK for me anyway, it
is good to add more options to developers in this area and this gets
used in core.  That's also enough to manipulate the stack in or even
out of core.


Ok, I kept it that way.


Here's a new rebased version with some minor cleanup. Notably, I added docs
for the new IS_INJECTION_POINT_ATTACHED() macro.


0001 looks OK.

+   push @events, "backenderror" if $line =~ /error triggered for
injection point backend-/;
+   push @events, "v2error" if $line =~ /protocol version 2 error
triggered/;

Perhaps append an "injection_" for these two keywords?

+#include "storage/proc.h"

This inclusion in injection_point.c should not be needed.


sets the FrontendProtocol global variable, but I think it's more
straightforward to have the test code


The last sentence in the commit message of 0002 seems to be
unfinished.


Fixed.


Could you run a perltidy on 005_negotiate_encryption.pl?  There are a
bunch of new diffs in it.


Fixed.

Committed, thanks for the review, and thanks Jacob for the testing!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test

2024-07-26 Thread Andrew Dunstan



On 2024-07-26 Fr 7:00 AM, Alexander Lakhin wrote:

25.07.2024 15:00, Alexander Lakhin wrote:



The other question is: why is 005_opclass_damage taking so much time 
there?

...
$ make -s check -C src/bin/pg_amcheck/ PROVE_TESTS="t/005*" 
PROVE_FLAGS="--timer"
[11:11:53] t/005_opclass_damage.pl .. ok 1370 ms ( 0.00 usr 0.00 
sys +  0.10 cusr  0.07 csys =  0.17 CPU)


$ echo "debug_parallel_query = regress" >/tmp/extra.config
$ TEMP_CONFIG=/tmp/extra.config make -s check -C src/bin/pg_amcheck/ 
PROVE_TESTS="t/005*" PROVE_FLAGS="--timer"
[11:12:46] t/005_opclass_damage.pl .. ok    40854 ms ( 0.00 usr 0.00 
sys +  0.10 cusr  0.10 csys =  0.20 CPU)


...
So maybe at least this test should be improved for testing with
debug_parallel_query enabled, if such active use of parallel workers by
pg_amcheck can't be an issue to end users?



When running this test with "log_min_messages = DEBUG2" in my 
extra.config,

I see thousands of the following messages in the test log:
2024-07-26 09:32:54.544 UTC [2572189:46] DEBUG:  postmaster received 
pmsignal signal
2024-07-26 09:32:54.544 UTC [2572189:47] DEBUG:  registering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.544 UTC [2572189:48] DEBUG:  starting background 
worker process "parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:49] DEBUG:  unregistering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:50] DEBUG:  background worker 
"parallel worker" (PID 2572205) exited with exit code 0
2024-07-26 09:32:54.547 UTC [2572189:51] DEBUG:  postmaster received 
pmsignal signal
2024-07-26 09:32:54.547 UTC [2572189:52] DEBUG:  registering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.547 UTC [2572189:53] DEBUG:  starting background 
worker process "parallel worker for PID 2572197"
2024-07-26 09:32:54.549 UTC [2572189:54] DEBUG:  unregistering 
background worker "parallel worker for PID 2572197"
2024-07-26 09:32:54.549 UTC [2572189:55] DEBUG:  background worker 
"parallel worker" (PID 2572206) exited with exit code 0

...

grep ' registering background worker' \
 src/bin/pg_amcheck/tmp_check/log/005_opclass_damage_test.log | wc -l
15669

So this test launches more than 15000 processes when debug_parallel_query
is enabled.

As far as I can see, this is happening because the "PARALLEL UNSAFE"
marking is ignored when the function is called by CREATE INDEX/amcheck.

Namely, with a function defined as:
    CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE 
sql AS $$
    SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 
END; $$;


SELECT int4_asc_cmp(1, 2);
is executed without parallel workers. Whilst when it's used by an index:
CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
...
OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);

INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);

CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
launches 1000 parallel workers.

(This is reminiscent of bug #18314.)

One way to workaround this is to disable debug_parallel_query in the test
and another I find possible is to set max_parallel_workers = 0.




But wouldn't either of those just be masking the problem?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Detailed release notes

2024-07-26 Thread Daniel Gustafsson
> On 26 Jul 2024, at 14:30, Marcos Pegoraro  wrote:
> 
> I'm now using version 14 and planning to update to 17 as soon as it comes 
> available. Then looking carefully to release notes to see exactly what I'll 
> get when updated I see lots of unexplained features. Just because release 
> notes does not explain exactly what that change does. And I don't have a way 
> to get what code or messages generated that feature.

There is a way, but it's not exactly visible from reading the release notes.

> • Allow query nodes to be run in parallel in more cases (Tom Lane)
> Cool this feature, but when and what kind of query will use this ?

Reading the source of the release notes will show a comment which links to the
commit.  The source can be seen here:


https://github.com/postgres/postgres/blob/REL_17_STABLE/doc/src/sgml/release-17.sgml

..and the comment for this item is:

   Allow query nodes to be run in parallel in more cases (Tom Lane)

This comment tells us the relevant commit is e08d74ca1, which can be found here:

https://github.com/postgres/postgres/commit/e08d74ca1

This in turn leads to the mailinglist discussion for this specific feature:

https://www.postgresql.org/message-id/1129530.1681317...@sss.pgh.pa.us
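For readers who want to automate this, a small script can pull the commit hashes out of those hidden comments. This is only a sketch: the sample text below approximates the comment layout in the SGML source, which may differ in detail.

```python
import re

# A few lines in the style of the hidden comments in
# doc/src/sgml/release-17.sgml; each item's comment names its
# commit hash in square brackets.
sgml = """
<!--
Author: Tom Lane <tgl@sss.pgh.pa.us>
2023-04-12 [e08d74ca1] Allow query nodes to be run in parallel in more cases
-->
"""

# Commit hashes are the bracketed hex strings inside the comments.
commits = re.findall(r"\[([0-9a-f]{7,})\]", sgml)
urls = ["https://github.com/postgres/postgres/commit/" + c for c in commits]
```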

--
Daniel Gustafsson





Re: tls 1.3: sending multiple tickets

2024-07-26 Thread Marina Polyakova

On 2024-07-26 15:27, Daniel Gustafsson wrote:
On 26 Jul 2024, at 14:03, Marina Polyakova 
 wrote:
It looks like the recommended way of using autoheader [1] is now 
broken. The attached patch fixes the master branch for me.


Thanks for the report, I'll fix it.  Buildfarm animal hamerkop also 
reminded me
that I had managed to stash the old MSVC buildsystem changes 
(ENOTENOUGHCOFFEE)

so fixing that at the same time.


Thank you!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test

2024-07-26 Thread Alexander Lakhin

26.07.2024 15:41, Andrew Dunstan wrote:




One way to workaround this is to disable debug_parallel_query in the test
and another I find possible is to set max_parallel_workers = 0.




But wouldn't either of those just be masking the problem?



Yes, I'm inclined to consider this behavior a problem (what if the table
contained 1M rows?), that's why I called those solutions workarounds.

Of course, there are parallel_setup_cost and parallel_tuple_cost
parameters, which can prevent this from happening in the wild, but still...

Best regards.
Alexander




Re: Detailed release notes

2024-07-26 Thread Marcos Pegoraro
On Fri, Jul 26, 2024 at 09:45, Daniel Gustafsson wrote:

>
> There is a way, but it's not exactly visible from reading the release
> notes.
>

Cool, didn't know that.
But why is that just a hidden comment and not a visible link for us ?

regards
Marcos


Re: Incremental backup from a streaming replication standby fails

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 1:09 AM Laurenz Albe  wrote:
> > Committed this version to master and v17.
>
> Thanks for taking care of this.

Sure thing!

I knew it was going to confuse someone ... I just wasn't sure what to
do about it. Now we've at least done something, which is hopefully
superior to nothing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Detailed release notes

2024-07-26 Thread Daniel Gustafsson
> On 26 Jul 2024, at 15:00, Marcos Pegoraro  wrote:

> But why is that just a hidden comment and not a visible link for us ?

That's likely the wrong level of detail for the overwhelming majority of
release notes readers.  I have a feeling this was discussed not too long ago
but (if so) I fail to find that discussion now.

--
Daniel Gustafsson





Re: Detailed release notes

2024-07-26 Thread Marcos Pegoraro
On Fri, Jul 26, 2024 at 10:11, Daniel Gustafsson wrote:

>
> That's likely the wrong level of detail for the overwhelming majority of
> release notes readers.  I have a feeling this was discussed not too long
> ago
> but (if so) I fail to find that discussion now.


Wrong level? Where is the appropriate place in the docs to see exactly what
I'll get when I upgrade?
A separate page "Detailed Release Notes"? I don't think so.
I think release notes are sometimes the only place we read to decide if an
upgrade is doable or not.

Well, that opened my eyes; now I can see detailed info about every feature
when it's committed.
And I'm really convinced that a small link to each commit wouldn't
dirty the release notes.


Re: Detailed release notes

2024-07-26 Thread Jonathan S. Katz
> On Jul 26, 2024, at 9:26 AM, Marcos Pegoraro wrote:
>
> On Fri, Jul 26, 2024 at 10:11, Daniel Gustafsson wrote:
>
>> That's likely the wrong level of detail for the overwhelming majority of
>> release notes readers.  I have a feeling this was discussed not too long ago
>> but (if so) I fail to find that discussion now.
>
> Wrong level? Where is the appropriate place in the docs to see exactly what
> I'll get when I upgrade? A separate page "Detailed Release Notes"? I don't
> think so. I think release notes are sometimes the only place we read to
> decide if an upgrade is doable or not.
>
> Well, that opened my eyes, now I can see detailed info about every feature
> when it's committed. And I'm really convinced that a small link to that
> commit wouldn't get dirty release notes.

FWIW, pgPedia has a version of the release notes that does get that granular:

https://pgpedia.info/postgresql-versions/postgresql-17.html

Jonathan

Re: [Proposal] Add foreign-server health checks infrastructure

2024-07-26 Thread Fujii Masao




On 2024/07/26 17:07, Hayato Kuroda (Fujitsu) wrote:

Dear Fujii-san,

Just in case - based on the agreement in [1], I updated patches to keep them
consistent. We can use same pictures for further discussions...


Thanks for updating the patches! I pushed them.


IIUC, the patch which adds user_name attribute to get_connection() can be 
discussed
in later stage, is it right?


No, let's work on the patch at this stage :)

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: add function argument names to regex* functions.

2024-07-26 Thread jian he
On Fri, Jul 19, 2024 at 5:48 AM Tom Lane  wrote:
>
> The small issue is that table 9.10 offers this syntax diagram
> for regexp_replace:
>
> regexp_replace ( string text, pattern text, replacement text [, start integer 
> ] [, flags text ] ) → text
>
> This implies that it's valid to write
>
> regexp_replace (string, pattern, replacement, start, flags)
>
> but it is not: we have no function matching that signature.  I'm not
> in a hurry to add one, either, for fear of ambiguity against the other
> regexp_replace signature.  I think this needs to be broken into two
> syntax diagrams:
>
> regexp_replace ( string text, pattern text, replacement text [, start integer 
> ] ) → text
> regexp_replace ( string text, pattern text, replacement text [, flags text ] 
> ) → text
>

this problem is still there, after commit
580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e.

<<
It has the syntax regexp_replace(string, pattern, replacement [, start
[, N ]] [, flags ]). (Notice that N cannot be specified unless start
is, but flags can be given in any case.)
<<
Does the above part of the docs still need to change?

see my posts:
https://postgr.es/m/CACJufxE5p4KhGyBUwCZCxhxdU%2BzJBXy2deX4u85SL%2Bkew4F7Cw%40mail.gmail.com




Re: Detailed release notes

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 9:26 AM Marcos Pegoraro  wrote:
> Well, that opened my eyes, now I can see detailed info about every feature 
> when it's committed.
> And I'm really convinced that a small link to that commit wouldn't get dirty 
> release notes.

+1. I think those links would be useful to a lot of people.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: recoveryCheck/008_fsm_truncation is failing on dodo in v14- (due to slow fsync?)

2024-07-26 Thread Alexander Lakhin

Hello Robins,

28.06.2024 13:20, Robins Tharakan wrote:

The past ~1 week, I tried to space out all other tasks on the machine, so as to 
ensure
that 1-min CPU is mostly <2 (and thus not many things hammering the disk) and 
with
that I see 0 failures these past few days. This isn't conclusive by any means, 
but it
does seem that reducing IO contention has helped remove the errors, like what
Alexander suspects / repros here.

Just a note, that I've reverted some of those recent changes now, and so if the 
theory
holds true, I wouldn't be surprised if some of these errors restarted on dodo.


Looking back at the test failures, I can see errors really reappeared
just after your revert (at 2024-06-28), so that theory proved true,
but I see none of those since 2024-07-02. Does it mean that you changed
something on dodo/fixed that performance issue?

Could you please describe how you resolved this issue, just for the record?

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-06-28%2017%3A00%3A28
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-06-28%2017%3A10%3A12
[3] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-07-01%2012%3A10%3A12
[4] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-07-01%2013%3A01%3A00
[5] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-07-02%2005%3A00%3A36
[6] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-07-02%2018%3A00%3A15

Best regards,
Alexander

Re: tls 1.3: sending multiple tickets

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 8:27 AM Daniel Gustafsson  wrote:
> Thanks for the report, I'll fix it.  Buildfarm animal hamerkop also reminded 
> me
> that I had managed to stash the old MSVC buildsystem changes 
> (ENOTENOUGHCOFFEE)
> so fixing that at the same time.

I was just looking at this commit and noticing that nothing in the
commit message explains why we want to turn off tickets. So then I
looked at the comments in the patch and that didn't explain it either.
So then I read through this thread and that also didn't explain it.

I don't doubt that you're doing the right thing here but it'd be nice
to document why it's the right thing someplace.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Assertion failure with summarize_wal enabled during pg_createsubscriber

2024-07-26 Thread Robert Haas
On Mon, Jul 22, 2024 at 11:02 AM Robert Haas  wrote:
> I'm still hoping for some review/feedback/testing of these before I
> commit them, but I also don't want to wait too long.

Hearing nothing, pushed 0001.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Detailed release notes

2024-07-26 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 26 Jul 2024, at 15:00, Marcos Pegoraro  wrote:
>> But why is that just a hidden comment and not a visible link for us ?

> That's likely the wrong level of detail for the overwhelming majority of
> release notes readers.  I have a feeling this was discussed not too long ago
> but (if so) I fail to find that discussion now.

Yeah, I too recall some discussion of surfacing the commit links
somehow, perhaps as a popup tooltip.  Nobody's got round to it yet.
It's not real clear how to handle multiple links per , which
happens from time to time in major release notes and just about
everyplace in minor release notes.

regards, tom lane




Re: add function argument names to regex* functions.

2024-07-26 Thread Tom Lane
jian he  writes:
> On Fri, Jul 19, 2024 at 5:48 AM Tom Lane  wrote:
>> but it is not: we have no function matching that signature.  I'm not
>> in a hurry to add one, either, for fear of ambiguity against the other
>> regexp_replace signature.  I think this needs to be broken into two
>> syntax diagrams:

> this problem is still there, after commit
> 580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e.

No, I believe I fixed it: the table now offers

regexp_replace ( string text, pattern text, replacement text [, flags text ] ) 
→ text

regexp_replace ( string text, pattern text, replacement text, start integer [, 
N integer [, flags text ] ] ) → text

That's different from either of the solutions discussed in this
thread, but simpler.
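For anyone unfamiliar with the start parameter, its behavior can be approximated like this (a rough Python analogue for illustration, not how PostgreSQL implements it):

```python
import re

def regexp_replace(string, pattern, replacement, start=1):
    # PostgreSQL's start is 1-based: characters before position `start`
    # are copied through untouched and matching begins at that position.
    # Without N or the 'g' flag, PostgreSQL replaces only the first
    # match, hence count=1 here.
    prefix = string[:start - 1]
    return prefix + re.sub(pattern, replacement, string[start - 1:], count=1)

# replace the first digit at or after character position 4
result = regexp_replace("a1b2c3", r"\d", "X", start=4)  # "a1bXc3"
```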

> <<
> It has the syntax regexp_replace(string, pattern, replacement [, start
> [, N ]] [, flags ]). (Notice that N cannot be specified unless start
> is, but flags can be given in any case.)
> <<
> doc, the above part still needs change?

AFAICS, that one is correct, so I left it alone.  (I didn't try to
merge the table's two entries into one like that, though.)

regards, tom lane




Re: pgsql: Add more SQL/JSON constructor functions

2024-07-26 Thread jian he
On Fri, Jul 26, 2024 at 4:53 PM Amit Langote  wrote:
>
>
> Pushed 0003-0005 ahead of 0001-0002.  Will try to push them over the
> weekend.  Rebased for now.

{
...
/*
 * For expression nodes that support soft errors.  Should be set to NULL
 * before calling ExecInitExprRec() if the caller wants errors thrown.
 */
ErrorSaveContext *escontext;
} ExprState;

I believe makeNode will set escontext to NULL by default.
So the comment should say: if you want to catch soft errors, make
sure escontext points to an allocated ErrorSaveContext.
Or maybe just say the default is NULL.

Otherwise, the original comment reads as if we need to explicitly
set it to NULL for certain operations, which I believe is false?


struct
{
Oidtargettype;
int32targettypmod;
boolomit_quotes;
boolexists_coerce;
boolexists_cast_to_int;
boolcheck_domain;
void   *json_coercion_cache;
ErrorSaveContext *escontext;
}jsonexpr_coercion;
Do we need to comment that "check_domain" is only used for JSON_EXISTS_OP?



While reviewing it, I found another minor issue.


json_behavior_type:
ERROR_P{ $$ = JSON_BEHAVIOR_ERROR; }
| NULL_P{ $$ = JSON_BEHAVIOR_NULL; }
| TRUE_P{ $$ = JSON_BEHAVIOR_TRUE; }
| FALSE_P{ $$ = JSON_BEHAVIOR_FALSE; }
| UNKNOWN{ $$ = JSON_BEHAVIOR_UNKNOWN; }
| EMPTY_P ARRAY{ $$ = JSON_BEHAVIOR_EMPTY_ARRAY; }
| EMPTY_P OBJECT_P{ $$ = JSON_BEHAVIOR_EMPTY_OBJECT; }
/* non-standard, for Oracle compatibility only */
| EMPTY_P{ $$ = JSON_BEHAVIOR_EMPTY_ARRAY; }
;

EMPTY_P behaves the same as EMPTY_P ARRAY,
so in function GetJsonBehaviorConst, is the following "case
JSON_BEHAVIOR_EMPTY:" wrong?

case JSON_BEHAVIOR_NULL:
case JSON_BEHAVIOR_UNKNOWN:
case JSON_BEHAVIOR_EMPTY:
val = (Datum) 0;
isnull = true;
typid = INT4OID;
len = sizeof(int32);
isbyval = true;
break;


also src/backend/utils/adt/ruleutils.c
if (jexpr->on_error->btype != JSON_BEHAVIOR_EMPTY)
get_json_behavior(jexpr->on_error, context, "ERROR");

For json_value and json_query, I believe we can save some cycles in
ExecInitJsonExpr
if ON ERROR / ON EMPTY are not specified.

Can you please check the attached, based on your latest attachment?
From 618b48991d5239eb924070b0357c5208a9d1ca5c Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 26 Jul 2024 21:15:41 +0800
Subject: [PATCH v2 1/1] save some circle in ExecInitJsonExpr

if returning type is not domain and on_error->btype is JSON_BEHAVIOR_NULL
or
if returning type is not domain and on_empty->btype is JSON_BEHAVIOR_NULL
---
 src/backend/executor/execExpr.c | 139 +---
 1 file changed, 75 insertions(+), 64 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 3c4e503b..48324e98 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -4237,6 +4237,8 @@ ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 		jsexpr->on_error->btype != JSON_BEHAVIOR_ERROR ?
 		&jsestate->escontext : NULL;
 
+	bool 		returning_is_domain =
+get_typtype(jsexpr->returning->typid) == TYPTYPE_DOMAIN;
 	jsestate->jsexpr = jsexpr;
 
 	/*
@@ -4388,8 +4390,6 @@ ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 	if (jsexpr->on_error &&
 		jsexpr->on_error->btype != JSON_BEHAVIOR_ERROR)
 	{
-		ErrorSaveContext *saved_escontext;
-
 		jsestate->jump_error = state->steps_len;
 
 		/* JUMP to end if false, that is, skip the ON ERROR expression. */
@@ -4400,42 +4400,46 @@ ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 		scratch->d.jump.jumpdone = -1;	/* set below */
 		ExprEvalPushStep(state, scratch);
 
-		/*
-		 * Steps to evaluate the ON ERROR expression; handle errors softly to
-		 * rethrow them in COERCION_FINISH step that will be added later.
-		 */
-		saved_escontext = state->escontext;
-		state->escontext = escontext;
-		ExecInitExprRec((Expr *) jsexpr->on_error->expr,
-		state, resv, resnull);
-		state->escontext = saved_escontext;
-
-		/* Step to coerce the ON ERROR expression if needed */
-		if (jsexpr->on_error->coerce)
-			ExecInitJsonCoercion(state, jsexpr->returning, escontext,
- jsexpr->omit_quotes, false,
- resv, resnull);
-
-		/*
-		 * Add a COERCION_FINISH step to check for errors that may occur when
-		 * coercing and rethrow them.
-		 */
-		if (jsexpr->on_error->coerce ||
-			IsA(jsexpr->on_error->expr, CoerceViaIO) ||
-			IsA(jsexpr->on_error->expr, CoerceToDomain))
+		if (returning_is_domain || jsexpr->on_error->btype != JSON_BEHAVIOR_NULL)
 		{
-			scratch->opcode = EEOP_JSONEXPR_COERCION_FINISH;
-			scratch->resvalue = resv;
-			scratch-

Re: tls 1.3: sending multiple tickets

2024-07-26 Thread Daniel Gustafsson
> On 26 Jul 2024, at 16:08, Robert Haas  wrote:
> 
> On Fri, Jul 26, 2024 at 8:27 AM Daniel Gustafsson  wrote:
>> Thanks for the report, I'll fix it.  Buildfarm animal hamerkop also reminded 
>> me
>> that I had managed to stash the old MSVC buildsystem changes 
>> (ENOTENOUGHCOFFEE)
>> so fixing that at the same time.
> 
> I was just looking at this commit and noticing that nothing in the
> commit message explains why we want to turn off tickets. So then I
> looked at the comments in the patch and that didn't explain it either.
> So then I read through this thread and that also didn't explain it.

Sorry for the lack of detail, I probably Stockholm syndromed myself after
having spent some time in this code.

We turn off TLS session tickets for two reasons: a) we don't support TLS
session resumption, and some resumption capable client libraries can experience
connection failures if they try to use tickets received in the setup (Npgsql at
least had this problem in the past); b) it's network overhead in the connection
setup phase which doesn't give any value due to us not supporting their use.

TLS tickets were disallowed in 2017 in 97d3a0b09 but as Andres found out,
TLSv1.3 session tickets had a new API which we didn't call and thus issued
tickets.
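For illustration, the same knob is visible from Python's ssl module, which wraps OpenSSL's SSL_CTX_set_num_tickets(); this is only a sketch of the OpenSSL behavior, not PostgreSQL code:

```python
import ssl

# Server-side context; num_tickets wraps OpenSSL's
# SSL_CTX_set_num_tickets() for TLSv1.3 session tickets.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# By default OpenSSL issues session tickets during the handshake
# (typically two); setting the count to zero disables TLSv1.3 ticket
# issuance, which is what the fix does on PostgreSQL's SSL_CTX.
ctx.num_tickets = 0
```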

> I don't doubt that you're doing the right thing here but it'd be nice
> to document why it's the right thing someplace.

I can add a summary of the above in the comment for future readers if you think
that would be useful.

--
Daniel Gustafsson





Re: add function argument names to regex* functions.

2024-07-26 Thread jian he
On Fri, Jul 26, 2024 at 10:17 PM Tom Lane  wrote:
>
> > <<
> > It has the syntax regexp_replace(string, pattern, replacement [, start
> > [, N ]] [, flags ]). (Notice that N cannot be specified unless start
> > is, but flags can be given in any case.)
> > <<
> > doc, the above part still needs change?
>
> AFAICS, that one is correct, so I left it alone.  (I didn't try to
> merge the table's two entries into one like that, though.)
>

functions-string.html output is correct.

But in functions-matching.html,

regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ])

can represent

regexp_replace(string, pattern, replacement, start, flags)

but we don't have "regexp_replace(string, pattern, replacement,
start, flags)".




Re: DRAFT: Pass sk_attno to consistent function

2024-07-26 Thread Michał Kłeczek


> On 26 Jul 2024, at 10:10, Michał Kłeczek  wrote:
> 
> 
> 
>> On 26 Jul 2024, at 01:28, Matthias van de Meent 
>>  wrote:
>> 
>> All in all, this still seems like a very (very) specific optimization,
>> of which I'm not sure that it is generalizable. However, array
>> introspection and filtering for SAOP equality checks feel like a
>> relatively easy (?) push-down optimization in (e.g.) runtime partition
>> pruning (or planning); isn't that even better patch potential here?
> 
> The issue is not partition pruning for SAOP - it works fine. The issue is 
> lack of SAOP support in GIST.
> 
> Because I cannot use SAOP I have two options:
> 
> 1) LATERAL JOIN (ie. iterate through input array elements, SELECT rows for 
> each, merge results)
> 2) Implement a custom operator that emulates SAOP and provide consistent 
> function for it. Additionally provide SAOP clause (redundantly) to enable 
> partition pruning.
> 
> In case of 1):
> - the query becomes convoluted with multiple redundant ORDER BY and LIMIT 
> clauses
> - unnecessary sort is performed (because we have to merge results of 
> subqueries)
> - some partitions are scanned multiple times (per each element in input array 
> that happens to land in the same partition)
> 
> In case of 2):
> - the whole input array is passed to consistent function for each partition 
> so we unnecessarily search for non-existent rows

Which is especially painful in the case of a sharding implementation based on 
partitioning and postgres_fdw, as it requires multiple
SELECTs from each remote partition.

> 
> To fix the issue in 2) I need to somehow filter input array per partition - 
> hence this patch.
> 

Regards,

—
Michal



Re: Direct SSL connection and ALPN loose ends

2024-07-26 Thread Heikki Linnakangas

On 17/06/2024 21:33, Andres Freund wrote:

If provided with the necessary key information, wireshark can decode TLS
exchanges when using sslnegotiation=postgres but not with direct. Presumably
it needs to be taught postgres' ALPN id or something.


I opened https://gitlab.com/wireshark/wireshark/-/merge_requests/16612 
to fix that in the wireshark pgsql protocol dissector.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 8:10 AM Alexander Korotkov  wrote:
> The revised version of 0001 unique checking optimization is attached.
> I'm going to push this to v18 if no objections.

I have no reason to specifically object to pushing this into 18, but I
would like to point out that you're posting here about this but failed
to reply to the "64-bit pg_notify page numbers truncated to 32-bit",
an open item that was assigned to you but which, since you didn't
respond, was eventually fixed by commits from Michael Paquier.

I know it's easy to lose track of the open items list and I sometimes
forget to check it myself, but it's rather important to stay on top of
any open items that get assigned to you.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: add function argument names to regex* functions.

2024-07-26 Thread Tom Lane
jian he  writes:
> On Fri, Jul 26, 2024 at 10:17 PM Tom Lane  wrote:
>> AFAICS, that one is correct, so I left it alone.  (I didn't try to
>> merge the table's two entries into one like that, though.)

> regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]).

> can represent

> regexp_replace(string, pattern, replacement, start, flags) ?

Hmm, yeah, you're right.  I didn't want to write two separate
synopses there, but maybe there's no choice.

regards, tom lane




Re: Extension using Meson as build system

2024-07-26 Thread Peter Eisentraut

On 30.06.24 15:17, Junwang Zhao wrote:

Is there any extension that uses Meson as its build system?
I'm starting an extension project written in C++; CMake is my
initial choice as the build system, but since PostgreSQL has adopted
Meson, I'm wondering if there is any extension that also uses
Meson that I can reference.


I wrote a little demo:

https://github.com/petere/plsh/blob/meson/meson.build





Re: Extension using Meson as build system

2024-07-26 Thread Junwang Zhao
Hi, Peter

On Fri, Jul 26, 2024 at 11:06 PM Peter Eisentraut  wrote:
>
> On 30.06.24 15:17, Junwang Zhao wrote:
> > Is there any extension that uses Meson as its build system?
> > I'm starting an extension project written in C++; CMake is my
> > initial choice as the build system, but since PostgreSQL has adopted
> > Meson, I'm wondering if there is any extension that also uses
> > Meson that I can reference.
>
> I wrote a little demo:
>
> https://github.com/petere/plsh/blob/meson/meson.build
>

Thanks, I will check it out ;)

-- 
Regards
Junwang Zhao




Re: [PATCH] Add crc32(text) & crc32(bytea)

2024-07-26 Thread Nathan Bossart
On Fri, Jul 26, 2024 at 12:01:40PM +0300, Aleksander Alekseev wrote:
>> This sounds generally reasonable to me, especially given the apparent
>> demand.  Should we also introduce crc32c() while we're at it?
> 
> Might be a good idea. However I didn't see a demand for crc32c() SQL
> function yet. Also I'm not sure whether the best interface for it
> would be crc32c() or crc32(x, version='c') or perhaps crc32(x,
> polynomial=...). I propose keeping the scope small this time.

I don't think adding crc32c() would sufficiently increase the scope.  We'd
use the existing implementations for both crc32() and crc32c().  And
besides, this could be useful for adding tests for that code.

+crc32 ( text )

Do we need a version of the function that takes a text input?  It's easy
enough to cast to a bytea.

+text

My first reaction is that we should just have this return bytea like the
SHA ones do, if for no other reason than commit 10cfce3 seems intended to
move us away from returning text for these kinds of functions.  Upthread,
you mentioned the possibility of returning a bigint, too.  I think I'd
still prefer bytea in case we want to add, say, crc64() or crc16() in the
future.  That would allow us to keep all of these functions consistent
instead of returning different types for each.  However, I understand that
returning the numeric types might be more convenient.  I'm curious what
others think about this.

+Computes the CRC32 hash of
+the binary string, with the result written in hexadecimal.

I'm not sure we should call the check values "hashes."  Wikipedia does
include them in the "List of hash functions" page [0], but it seems to
deliberately avoid calling them hashes in the CRC page [1].  I'd suggest
calling them "CRC32 values" instead.

[0] https://en.wikipedia.org/wiki/List_of_hash_functions
[1] https://en.wikipedia.org/wiki/Cyclic_redundancy_check

-- 
nathan




Re: Assertion failure with summarize_wal enabled during pg_createsubscriber

2024-07-26 Thread Nathan Bossart
On Fri, Jul 26, 2024 at 10:09:22AM -0400, Robert Haas wrote:
> On Mon, Jul 22, 2024 at 11:02 AM Robert Haas  wrote:
>> I'm still hoping for some review/feedback/testing of these before I
>> commit them, but I also don't want to wait too long.
> 
> Hearing nothing, pushed 0001.

nitpick:  I think this one needs a pgindent.

-- 
nathan




Re: Detailed release notes

2024-07-26 Thread Masahiko Sawada
On Fri, Jul 26, 2024 at 6:56 AM Robert Haas  wrote:
>
> On Fri, Jul 26, 2024 at 9:26 AM Marcos Pegoraro  wrote:
> > Well, that opened my eyes, now I can see detailed info about every feature 
> > when it's committed.
> > And I'm really convinced that a small link to that commit wouldn't 
> > clutter the release notes.
>
> +1. I think those links would be useful to a lot of people.

+1. I've been asked a lot of times how to find the associated commit
IDs from release note items. These links would help users know the
details of the changes, and I believe many users would like to do
that.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Assertion failure with summarize_wal enabled during pg_createsubscriber

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 11:51 AM Nathan Bossart
 wrote:
> nitpick:  I think this one needs a pgindent.

Ugh, sorry. I thought of that while I was working on the commit but
then I messed up some other aspect of it and this went out of my head.

Fixed now, I hope.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Detailed release notes

2024-07-26 Thread Marcos Pegoraro
On Fri, Jul 26, 2024 at 13:01, Masahiko Sawada 
wrote:

>
> +1. I've been asked a lot of times how to find the associated commit
> IDs from release note items. These links would help users know the
> details of the changes, and I believe many users would like to do
> that.


Yes, this way the release notes would explain themselves.

For now my release notes will be this
https://github.com/postgres/postgres/blob/REL_17_STABLE/doc/src/sgml/release-17.sgml
and not this
https://www.postgresql.org/docs/17/release-17.html

regards
Marcos


Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there

2024-07-26 Thread Tom Lane
Erik Wienhold  writes:
> On 2024-07-25 22:29 +0200, Tom Lane wrote:
>> This still isn't following our usual message style IMO.  Here's a
>> proposed v7 that outputs ...

> Works for me.  Thanks!

Pushed, then.

regards, tom lane




Re: Extension using Meson as build system

2024-07-26 Thread Andrew Dunstan



On 2024-07-26 Fr 11:06 AM, Peter Eisentraut wrote:

On 30.06.24 15:17, Junwang Zhao wrote:

Is there any extension that uses Meson as its build system?
I'm starting an extension project written in C++; CMake is my
initial choice as the build system, but since PostgreSQL has adopted
Meson, I'm wondering if there is any extension that also uses
Meson that I can reference.


I wrote a little demo:

https://github.com/petere/plsh/blob/meson/meson.build




nifty!


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Add ALL_CANDIDATES option to EXPLAIN

2024-07-26 Thread Anthonin Bonnefoy
Hi,

I have a prototype for an ALL_CANDIDATES option for EXPLAIN. The goal
of this option is to print all plan candidates instead of only the
cheapest plan. It will output the plans from the most expensive at the
top to the cheapest. Here's an example:

explain (all_candidates) select * from pgbench_accounts where aid=1;
 QUERY PLAN
-----------------------------------------------------------------
 Plan 1
   ->  Gather  (cost=1000.00..3375.39 rows=1 width=97)
 Workers Planned: 1
 ->  Parallel Seq Scan on pgbench_accounts
(cost=0.00..2375.29 rows=1 width=97)
   Filter: (aid = 1)
 Plan 2
   ->  Seq Scan on pgbench_accounts  (cost=0.00..2890.00 rows=1 width=97)
 Filter: (aid = 1)
 Plan 3
   ->  Bitmap Heap Scan on pgbench_accounts  (cost=4.30..8.31 rows=1 width=97)
 Recheck Cond: (aid = 1)
 ->  Bitmap Index Scan on pgbench_accounts_pkey
(cost=0.00..4.30 rows=1 width=0)
   Index Cond: (aid = 1)
 Plan 4
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.29..8.31 rows=1 width=97)
 Index Cond: (aid = 1)

This can provide very useful insight on the planner's decisions like
whether it tried to use a specific index and how much cost difference
there is with the top plan. Additionally, it makes it possible to spot
discrepancies in generated plans like incorrect row estimation [1].

The plan list is generated from the upper_rel's pathlist. However, due
to how planning mutates the PlannerGlobal state, we can't directly
iterate the path list generated by the subquery_planner and create a
planned statement for them. Instead, we need to plan from scratch for
every path in the pathlist to generate the list of PlannedStmt.

The patch is split in two, mostly to ease the review:
001: Propagate the PlannerInfo root to add_path. This is needed to prevent
add_path from discarding paths when all_candidates is enabled, a flag
which will be stored in PlannerGlobal.
002: Add the planner_all_candidates function and print the candidate
list in EXPLAIN.

[1] 
https://www.postgresql.org/message-id/flat/CAO6_Xqr9+51NxgO=XospEkUeAg-p=ejawmtpdczwjrggkj5...@mail.gmail.com

Regards,
Anthonin


v1-0002-Add-ALL_CANDIDATES-explain-option.patch
Description: Binary data


v1-0001-Propagate-root-PlannerInfo-to-add_path.patch
Description: Binary data


Re: Add ALL_CANDIDATES option to EXPLAIN

2024-07-26 Thread Tom Lane
Anthonin Bonnefoy  writes:
> I have a prototype for an ALL_CANDIDATES option for EXPLAIN. The goal
> of this option is to print all plan candidates instead of only the
> cheapest plan. It will output the plans from the most expensive at the
> top to the cheapest.

This doesn't seem feasible at all to me.  If we don't prune plans
fairly aggressively at lower plan levels, we'll have a combinatorial
explosion in the amount of work the planner has to do.  Have you
tried this on even slightly complex plans --- say, a dozen-way join?
I do not think you'll like the runtime, the amount of memory consumed,
or the verboseness of the output.  (I wonder how it interacts with
GEQO, too.)

regards, tom lane




Re: Add ALL_CANDIDATES option to EXPLAIN

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 12:59 PM Anthonin Bonnefoy
 wrote:
> I have a prototype for an ALL_CANDIDATES option for EXPLAIN. The goal
> of this option is to print all plan candidates instead of only the
> cheapest plan. It will output the plans from the most expensive at the
> top to the cheapest. Here's an example:

It's difficult for me to understand how this can work. Either it's not
really going to print out all candidates, or it's going to print out
gigabytes of output for moderately complex queries.

I've thought about trying to figure out some way of identifying and
printing out plans that are "interestingly different" from the chosen
plan, with the costs they would have had, but I haven't been able to
come up with a good algorithm. Printing out absolutely everything
doesn't seem viable, because planning would be slow and use amazing
amounts of memory and the output would be so large as to be useless.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: tiny step toward threading: reduce dependence on setlocale()

2024-07-26 Thread Andreas Karlsson

Nice refactoring!

Two small comments about CheckMyDatabase().

- Shouldn't we look at default_locale.ctype_is_c when setting 
database_ctype_is_c instead of doing a strcmp()? Or maybe we should even 
remove the global variable and always look at default_locale?


- I think that the lookup of Anum_pg_database_datlocale could be done 
later in the code since it is not needed when we use a libc locale. E.g. 
as below.


if (dbform->datlocprovider == COLLPROVIDER_LIBC)
    locale = collate;
else
{
    datum = SysCacheGetAttr(DATABASEOID, tup,
                            Anum_pg_database_datlocale, &isnull);

    if (!isnull)
        locale = TextDatumGetCString(datum);
}

Also, is there any reason you do not squash the 4th and the 6th patches?

Andreas




Re: Add ALL_CANDIDATES option to EXPLAIN

2024-07-26 Thread Tom Lane
Robert Haas  writes:
> I've thought about trying to figure out some way of identifying and
> printing out plans that are "interestingly different" from the chosen
> plan, with the costs they would have had, but I haven't been able to
> come up with a good algorithm.

I wonder how far you'd get by just printing the surviving paths
(that is, something like Anthonin's patch, but without lobotomizing
add_path).  The survivors would have to dominate the cheapest-total
path along one of the other metrics add_path considers, which
seems like a rough approximation of "interestingly different".

regards, tom lane




Re: xid_wraparound tests intermittent failure.

2024-07-26 Thread Masahiko Sawada
On Thu, Jul 25, 2024 at 6:52 PM Andrew Dunstan  wrote:
>
>
> On 2024-07-25 Th 3:40 PM, Masahiko Sawada wrote:
>
> On Thu, Jul 25, 2024 at 11:06 AM Masahiko Sawada  
> wrote:
>
> On Thu, Jul 25, 2024 at 10:56 AM Andrew Dunstan  wrote:
>
> On 2024-07-23 Tu 6:59 PM, Masahiko Sawada wrote:
>
> See 
> 
>
>
> The failure logs are from a run where both tests 1 and 2 failed.
>
> Thank you for sharing the logs.
>
> I think that the problem seems to match what Alexander Lakhin
> mentioned[1]. Probably we can fix such a race condition somehow but
> I'm not sure it's worth it as setting autovacuum = off and
> autovacuum_max_workers = 1 (or a low number) is an extremely rare
> case. I think it would be better to stabilize these tests. One idea is
> to turn the autovacuum GUC parameter on while setting
> autovacuum_enabled = off for each table. That way, we can ensure that
> autovacuum workers are launched. And I think it seems to align real
> use cases.
>
> Regards,
>
> [1] 
> https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com
>
>
> OK, do you want to propose a patch?
>
> Yes, I'll prepare and share it soon.
>
> I've attached the patch. Could you please test if the patch fixes the
> instability you observed?
>
> Since we turn off autovacuum on all three tests and we wait for
> autovacuum to complete processing databases, these tests potentially
> have a similar (but lower) risk. So I modified these tests to turn it
> on so we can ensure the autovacuum runs periodically.
>
>
> I assume you actually meant to remove the "autovacuum = off" in 
> 003_wraparound.pl. With that change in your patch I retried my test, but on 
> iteration 100 out of 100 it failed on test 002_limits.pl.
>

I think we need to remove the "autovacuum = off' also in 002_limits.pl
as it waits for autovacuum to process both template0 and template1
databases. Just to be clear, the failure happened even without
"autovacuum = off"?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Add ALL_CANDIDATES option to EXPLAIN

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 1:40 PM Tom Lane  wrote:
> I wonder how far you'd get by just printing the surviving paths
> (that is, something like Anthonin's patch, but without lobotomizing
> add_path).  The survivors would have to dominate the cheapest-total
> path along one of the other metrics add_path considers, which
> seems like a rough approximation of "interestingly different".

My guess is it wouldn't be that great. It seems easy to imagine that
the key decision for a particular plan might be whether to use table A
or B as the driving table, or whether to sequential scan or index scan
some table involved in the query. It could well be that you end up
with the same output ordering either way (or no output ordering) for
one reason or another. I'm actually not sure "interestingly different"
can be defined in a useful, general way, because how is the computer
to know what the human being cares about in a particular case? In
practice, it feels like what you'd often like to do is say "show me
the plan if you { started with table | used scan method Y on table X |
did not use index X | whatever }". I haven't given up on the idea that
there could be some way of defining interesting-ness that avoids the
need for the user to say what they think is interesting, but it
certainly feels like one needs to be a lot more clever to make it work
without user input.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel heap vacuum

2024-07-26 Thread Masahiko Sawada
On Thu, Jul 25, 2024 at 2:58 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Sawada-san,
>
> > Thank you for the test!
> >
> > I could reproduce this issue and it's a bug; it skipped even
> > non-all-visible pages. I've attached the new version patch.
> >
> > BTW since we compute the number of parallel workers for the heap scan
> > based on the table size, it's possible that we launch multiple workers
> > even if most blocks are all-visible. It seems to be better if we
> > calculate it based on (relpages - relallvisible).
>
> Thanks for updating the patch. I applied and confirmed all pages are scanned.
> I used almost the same script (just changed max_parallel_maintenance_workers)
> and got below result. I think the tendency was the same as yours.
>
> ```
> parallel 0: 61114.369 ms
> parallel 1: 34870.316 ms
> parallel 2: 23598.115 ms
> parallel 3: 17732.363 ms
> parallel 4: 15203.271 ms
> parallel 5: 13836.025 ms
> ```

Thank you for testing!

>
> I started to read your code, but it is taking time because I've never seen 
> it before...
> Below part contains initial comments.
>
> 1.
> This patch cannot be built when debug mode is enabled. See [1].
> IIUC, this was because NewRelminMxid was moved from struct LVRelState to 
> PHVShared.
> So you should update like " vacrel->counters->NewRelminMxid".

Right, will fix.

> 2.
> I compared parallel heap scan and found that it does not have compute_worker 
> API.
> Can you clarify the reason why there is an inconsistency?
> (I feel it is intentional because the calculation logic seems to depend on 
> the heap structure,
> so should we add the API for table scan as well?)

There is room to consider a better API design, but yes, the reason is
that the calculation logic depends on table AM implementation. For
example, I thought it might make sense to consider taking the number
of all-visible pages into account for the calculation of the number of
parallel workers as we don't want to launch many workers on the table
where most pages are all-visible. Which might not work for other table
AMs.

I'm updating the patch to implement parallel heap vacuum and will
share the updated patch. It might take time as it requires
implementing shared iteration support in the radix tree.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [Proposal] Add foreign-server health checks infrastructure

2024-07-26 Thread Fujii Masao




On 2024/07/26 22:44, Fujii Masao wrote:



On 2024/07/26 17:07, Hayato Kuroda (Fujitsu) wrote:

Dear Fujii-san,

Just in case - based on the agreement in [1], I updated the patches to keep them
consistent. We can use the same pictures for further discussions...


Thanks for updating the patches! I pushed them.


The buildfarm member "hake" reported a failure in the postgres_fdw regression 
test.

diff -U3 /export/home/elmer/c15x/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/expected/postgres_fdw.out /export/home/elmer/c15x/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/results/postgres_fdw.out
--- /export/home/elmer/c15x/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/expected/postgres_fdw.out	Fri Jul 26 19:16:29 2024
+++ /export/home/elmer/c15x/buildroot/HEAD/pgsql.build/contrib/postgres_fdw/results/postgres_fdw.out	Fri Jul 26 19:31:12 2024
@@ -12326,7 +12326,7 @@
   FROM postgres_fdw_get_connections(true);
 case
------
-1
+0
 (1 row)
 
 -- Clean up



The regression.diffs shows that pgfdw_conn_check returned 0 even though 
pgfdw_conn_checkable()
returned true. This can happen if the "revents" from poll() indicates something 
other than
POLLRDHUP. I think that "revents" could indicate POLLHUP, POLLERR, or POLLNVAL. 
Therefore,
IMO pgfdw_conn_check() should be updated as follows. I will test this change.

-   return (input_fd.revents & POLLRDHUP) ? 1 : 0;
+   return (input_fd.revents &
+           (POLLRDHUP | POLLHUP | POLLERR | POLLNVAL)) ? 1 : 0;

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: tls 1.3: sending multiple tickets

2024-07-26 Thread Robert Haas
On Fri, Jul 26, 2024 at 10:23 AM Daniel Gustafsson  wrote:
> We turn off TLS session tickets for two reasons: a) we don't support TLS
> session resumption, and some resumption capable client libraries can 
> experience
> connection failures if they try to use tickets received in the setup (Npgsql 
> at
> least had this problem in the past); b) it's network overhead in the 
> connection
> setup phase which doesn't give any value due to us not supporting their use.
>
> TLS tickets were disallowed in 2017 in 97d3a0b09 but as Andres found out,
> TLSv1.3 session tickets had a new API which we didn't call and thus issued
> tickets.

Thanks much for this explanation.

> > I don't doubt that you're doing the right thing here but it'd be nice
> > to document why it's the right thing someplace.
>
> I can add a summary of the above in the comment for future readers if you 
> think
> that would be useful.

I think having (a) and (b) from above at the end of the "Disallow SSL
session tickets" comment would be helpful.

While I'm complaining, the bit about SSL renegotiation could use a
better comment, too. One of my chronic complaints about comments is
that they should say why we're doing things, not what we're doing. To
me, having a comment that says "Disallow SSL renegotiation" followed
immediately by SSL_CTX_set_options(context, SSL_OP_NO_RENEGOTIATION)
is a good example of what not to do, because, I mean, I can guess
without the comment what that does. Actually, that comment is quite
well-written in terms of explaining why we do it in different ways
depending on the OpenSSL version; it just fails to explain why it's
important in the first place. I'm pretty sure I know in that case that
there are CVEs about that topic, but that's just because I happen to
remember the mailing list discussion on it, and I don't think every
hacker is contractually required to remember that.

I don't want to sound like I'm giving orders and I think it's really
up to you how much effort you want to put in here, but I feel like any
place where we are doing X because of some property of a non-PG code
base with which a particular reader might not be familiar, we should
have a comment explaining why we're doing it. And especially if it's
security-relevant.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-26 Thread Nathan Bossart
On Thu, Jul 25, 2024 at 11:42:55AM +0200, Daniel Gustafsson wrote:
>> On 25 Jul 2024, at 04:58, Nathan Bossart  wrote:
>> Here is just the live_check patch.  I rebased it, gave it a commit message,
>> and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
>> to commit this within the next couple of days.
> 
> LGTM

Thanks, committed.

-- 
nathan




Re: Inline non-SQL SRFs using SupportRequestSimplify

2024-07-26 Thread Tom Lane
Heikki Linnakangas  writes:
> On 28/06/2024 01:01, Paul Jungwirth wrote:
>> Another approach I considered is using a separate support request, e.g. 
>> SupportRequestInlineSRF, and
>> just calling it from inline_set_returning_function. I didn't like having two 
>> support requests that
>> did almost exactly the same thing. OTOH my current approach means you'll get 
>> an error if you do this:
>> 
>> ```
>> postgres=# select temporal_semijoin('employees', 'id', 'valid_at', 
>> 'positions', 'employee_id',
>> 'valid_at');
>> ERROR:  unrecognized node type: 66
>> ```
>> 
>> I'll look into ways to fix that.

I like this idea, but I like exactly nothing about this implementation.
The right thing is to have a separate SupportRequestInlineSRF request
that is called directly by inline_set_returning_function.  It might be
"almost the same thing" as SupportRequestSimplify, but "almost" only
counts in horseshoes and hand grenades.  In particular, returning a
Query node is simply broken for SupportRequestSimplify (as your
example demonstrates), whereas it's the only correct result for
SupportRequestInlineSRF.

You could imagine keeping it to one support request by adding a
boolean field to the request struct to show which behavior is wanted,
but I think the principal result of that would be to break extensions
that weren't expecting such calls.  The defined mechanism for
extending the SupportRequest protocol is to add new support request
codes, not to whack around the APIs of existing ones.

> I think we should actually add an assertion after the call to the 
> SupportRequestSimplify support function, to check that it returned an 
> Expr node.

Um ... IsA(node, Expr) isn't going to work, and I'm not sure that
it'd be useful to try to enumerate the set of Expr subtypes that
should be allowed there.  But possibly it'd be worth asserting that
it's not a Query, just in case anyone gets confused about the
difference between SupportRequestSimplify and SupportRequestInlineSRF.

It would be good to have an in-core test case for this request type,
but I don't really see any built-in SRFs for which expansion as a
sub-SELECT would be an improvement.

regards, tom lane




Re: pg_upgrade failing for 200+ million Large Objects

2024-07-26 Thread Alexander Korotkov
Hi, Justin!

Thank you for sharing this.

On Wed, Jul 24, 2024 at 5:18 PM Justin Pryzby  wrote:
> On Mon, Apr 01, 2024 at 03:28:26PM -0400, Tom Lane wrote:
> > Nathan Bossart  writes:
> > > The one design point that worries me a little is the non-configurability 
> > > of
> > > --transaction-size in pg_upgrade.  I think it's fine to default it to 
> > > 1,000
> > > or something, but given how often I've had to fiddle with
> > > max_locks_per_transaction, I'm wondering if we might regret hard-coding 
> > > it.
> >
> > Well, we could add a command-line switch to pg_upgrade, but I'm
> > unconvinced that it'd be worth the trouble.  I think a very large
> > fraction of users invoke pg_upgrade by means of packager-supplied
> > scripts that are unlikely to provide a way to pass through such
> > a switch.  I'm inclined to say let's leave it as-is until we get
> > some actual field requests for a switch.
>
> I've been importing our schemas and doing upgrade testing, and was
> surprised when a postgres backend was killed for OOM during pg_upgrade:
>
> Killed process 989302 (postgres) total-vm:5495648kB, anon-rss:5153292kB, ...
>
> Upgrading from v16 => v16 doesn't use nearly as much RAM.
>
> While tracking down the responsible commit, I reproduced the problem
> using a subset of tables; at 959b38d770, the backend process used
> ~650 MB RAM, and at its parent commit used at most ~120 MB.
>
> 959b38d770b Invent --transaction-size option for pg_restore.
>
> By changing RESTORE_TRANSACTION_SIZE to 100, backend RAM use goes to
> 180 MB during pg_upgrade, which is reasonable.
>
> With partitioning, we have a lot of tables, some of them wide (126
> partitioned tables, 8942 children, total 1019315 columns).  I didn't track
> whether certain parts of our schema contribute most to the high backend mem
> use, just that it's now 5x (while testing a subset) to 50x higher.

Do you think there is a way to anonymize the schema and share it?

> We'd surely prefer that the transaction size be configurable.

I think we can add an option to pg_upgrade.  But I wonder if there is
something else we can do.  It seems that restoring some objects is
much more expensive than restoring others.  It would be nice to
identify such cases and check which memory contexts are growing and
why.  It would be helpful if you could share your data schema, so we
could dig into it.

I can imagine we need to count some DDL commands differently from
others with respect to the maximum restore transaction size.  Also, we
probably need to change the default restore transaction size.

--
Regards,
Alexander Korotkov
Supabase




Re: Building with meson on NixOS/nixpkgs

2024-07-26 Thread Tristan Partin
Heikki asked me to take a look at this patchset for the commitfest. 
Looks good to me.


Heikki, just be careful rebasing the first patch. You need to make sure 
the newly set `required: false` gets carried forward.


--
Tristan Partin
Neon (https://neon.tech)




Re: Converting tab-complete.c's else-if chain to a switch

2024-07-26 Thread Tom Lane
I wrote:
> I modified the preprocessor to work like that, and I like the results
> better than what I had.

I thought this version of the patch would be less subject to merge
conflicts than v1, but it didn't take long for somebody to break it.
Rebased v3 attached -- no nontrivial changes from v2.

I'd like to get this merged soon ...

regards, tom lane

From 273ec2558fad646a2835bfe795fb12b8e821a12f Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Fri, 26 Jul 2024 15:41:20 -0400
Subject: [PATCH v3 1/3] Invent "MatchAnyN" option for tab-complete.c's
 Matches/MatchesCS.

This argument matches any number (including zero) of previous words.
Use it to replace the common coding pattern

	if (HeadMatches("A", "B") && TailMatches("X", "Y"))

with

	if (Matches("A", "B", MatchAnyN, "X", "Y"))

In itself this feature doesn't do much except (arguably) make the
code slightly shorter and more readable.  However, it reduces the
number of complex if-condition patterns that have to be dealt with
in the next commits in this series.

While here, restructure the *Matches implementation functions so
that the actual work is done in functions that take a char **
array of pattern strings, and the versions taking variadic arguments
are thin wrappers around the array ones.  This simplifies the
new Matches logic considerably.  At the end of this patch series,
the array functions will be the only ones that are material to
performance, so having the variadic ones be wrappers makes sense.
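
The MatchAnyN semantics described above can be sketched outside of C
(a simplified Python illustration, not the tab-complete.c code; the
sentinel name and helper are invented):

```python
# Sketch of MatchAnyN matching: one sentinel in the pattern may consume any
# number (including zero) of words; None plays the role of MatchAny.

MATCH_ANY_N = object()  # stands in for the "" marker used in the patch

def matches(words, pattern):
    """True if words match pattern exactly, with at most one MATCH_ANY_N."""
    if MATCH_ANY_N not in pattern:
        return (len(words) == len(pattern) and
                all(p is None or p == w for p, w in zip(pattern, words)))
    i = pattern.index(MATCH_ANY_N)
    head, tail = pattern[:i], pattern[i + 1:]
    if len(words) < len(head) + len(tail):
        return False
    # Head must match the first words, tail the last words; MATCH_ANY_N
    # swallows whatever lies between (possibly nothing).
    return (matches(words[:len(head)], head) and
            matches(words[len(words) - len(tail):], tail))

# Equivalent of HeadMatches("A", "B") && TailMatches("X", "Y"):
print(matches(["A", "B", "foo", "bar", "X", "Y"],
              ["A", "B", MATCH_ANY_N, "X", "Y"]))  # → True
```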

Discussion: https://postgr.es/m/2208466.1720729...@sss.pgh.pa.us
---
 src/bin/psql/tab-complete.c | 400 +++-
 1 file changed, 258 insertions(+), 142 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 891face1b6..8f68bfc54d 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1446,10 +1446,12 @@ initialize_readline(void)
  *
  * For readability, callers should use the macros MatchAny and MatchAnyExcept
  * to invoke those two special cases for 'pattern'.  (But '|' and '*' must
- * just be written directly in patterns.)
+ * just be written directly in patterns.)  There is also MatchAnyN, but that
+ * is supported only in Matches/MatchesCS and is not handled here.
  */
 #define MatchAny  NULL
 #define MatchAnyExcept(pattern)  ("!" pattern)
+#define MatchAnyN ""
 
 static bool
 word_matches(const char *pattern,
@@ -1514,107 +1516,196 @@ word_matches(const char *pattern,
 }
 
 /*
- * Implementation of TailMatches and TailMatchesCS macros: do the last N words
- * in previous_words match the variadic arguments?
+ * Implementation of TailMatches and TailMatchesCS tests: do the last N words
+ * in previous_words match the pattern arguments?
  *
  * The array indexing might look backwards, but remember that
  * previous_words[0] contains the *last* word on the line, not the first.
  */
 static bool
-TailMatchesImpl(bool case_sensitive,
-int previous_words_count, char **previous_words,
-int narg,...)
+TailMatchesArray(bool case_sensitive,
+ int previous_words_count, char **previous_words,
+ int narg, const char *const *args)
 {
-	va_list		args;
-
 	if (previous_words_count < narg)
 		return false;
 
-	va_start(args, narg);
-
 	for (int argno = 0; argno < narg; argno++)
 	{
-		const char *arg = va_arg(args, const char *);
+		const char *arg = args[argno];
 
 		if (!word_matches(arg, previous_words[narg - argno - 1],
 		  case_sensitive))
-		{
-			va_end(args);
 			return false;
-		}
 	}
 
-	va_end(args);
-
 	return true;
 }
 
 /*
- * Implementation of Matches and MatchesCS macros: do all of the words
- * in previous_words match the variadic arguments?
+ * As above, but the pattern is passed as a variadic argument list.
  */
 static bool
-MatchesImpl(bool case_sensitive,
-			int previous_words_count, char **previous_words,
-			int narg,...)
+TailMatchesImpl(bool case_sensitive,
+int previous_words_count, char **previous_words,
+int narg,...)
 {
+	const char *argarray[64];
 	va_list		args;
 
-	if (previous_words_count != narg)
+	Assert(narg <= lengthof(argarray));
+
+	if (previous_words_count < narg)
 		return false;
 
 	va_start(args, narg);
+	for (int argno = 0; argno < narg; argno++)
+		argarray[argno] = va_arg(args, const char *);
+	va_end(args);
+
+	return TailMatchesArray(case_sensitive,
+			previous_words_count, previous_words,
+			narg, argarray);
+}
+
+/*
+ * Implementation of HeadMatches and HeadMatchesCS tests: do the first N
+ * words in previous_words match the pattern arguments?
+ */
+static bool
+HeadMatchesArray(bool case_sensitive,
+ int previous_words_count, char **previous_words,
+ int narg, const char *const *args)
+{
+	if (previous_words_count < narg)
+		return false;
 
 	for (int argno = 0; argno < narg; argno++)
 	{
-		const char *arg = va_arg(args, const char *);
+		const char *arg = args[argno];
 
-		if (!word_matches(arg, previous_words[narg - argno - 1],
+		if (!word_matches(a

Re: pg_upgrade failing for 200+ million Large Objects

2024-07-26 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, Jul 24, 2024 at 5:18 PM Justin Pryzby  wrote:
>> We'd surely prefer that the transaction size be configurable.

> I think we can add an option to pg_upgrade.  But I wonder if there is
> something else we can do.

Yeah, I'm not enamored of adding a command-line option, if only
because I think a lot of people invoke pg_upgrade through
vendor-provided scripts that aren't going to cooperate with that.
If we can find some way to make it adapt without help, that
would be much better.

regards, tom lane




Re: Assertion failure with summarize_wal enabled during pg_createsubscriber

2024-07-26 Thread Alexander Korotkov
On Fri, Jul 26, 2024 at 7:02 PM Robert Haas  wrote:
> On Fri, Jul 26, 2024 at 11:51 AM Nathan Bossart
>  wrote:
> > nitpick:  I think this one needs a pgindent.
>
> Ugh, sorry. I thought of that while I was working on the commit but
> then I messed up some other aspect of it and this went out of my head.
>
> Fixed now, I hope.

0002 could also use pg_indent and pgperltidy.  And I have couple other
notes regarding 0002.

>In the process of fixing these bugs, I realized that the logic to wait
>for WAL summarization to catch up was spread out in a way that made
>it difficult to reuse properly, so this code refactors things to make
>it easier.

It would be nice to split the refactoring out of the material logic
changes.  That way it would be easier to review and would look cleaner
in the git history.

>To make this fix work, also teach the WAL summarizer that after a
>promotion has occurred, no more WAL can appear on the previous
>timeline: previously, the WAL summarizer wouldn't switch to the new
>timeline until we actually started writing WAL there, but that meant
>that when the startup process was waiting for the WAL summarizer, it
>was waiting for an action that the summarizer wasn't yet prepared to
>take.

I think this is a pretty long sentence, and I'm not sure I understand
it correctly.  Does the startup process wait for the WAL summarizer
without this patch?  If not, it's not clear to me whether the word
"previously" applies to that part of the sentence.  Breaking this into
multiple sentences would improve the clarity for me.

--
Regards,
Alexander Korotkov
Supabase




Re: Incremental backup from a streaming replication standby fails

2024-07-26 Thread Alexander Korotkov
On Fri, Jul 26, 2024 at 4:11 PM Robert Haas  wrote:
> On Fri, Jul 26, 2024 at 1:09 AM Laurenz Albe  wrote:
> > > Committed this version to master and v17.
> >
> > Thanks for taking care of this.
>
> Sure thing!
>
> I knew it was going to confuse someone ... I just wasn't sure what to
> do about it. Now we've at least done something, which is hopefully
> superior to nothing.

Great!  Should we mark the corresponding v17 open item as closed?

--
Regards,
Alexander Korotkov
Supabase




Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

2024-07-26 Thread Melanie Plageman
On Mon, Jul 22, 2024 at 9:26 PM Masahiko Sawada  wrote:
>
> +   CREATE TABLE ${table1}(col1 int)
> +   WITH (autovacuum_enabled=false, fillfactor=10);
> +   INSERT INTO $table1 VALUES(7);
> +   INSERT INTO $table1 SELECT generate_series(1, $nrows) % 3;
> +   CREATE INDEX on ${table1}(col1);
> +   UPDATE $table1 SET col1 = 3 WHERE col1 = 0;
> +   INSERT INTO $table1 VALUES(7);
>
> These queries make sense to me; they make the radix tree wide and use
> more nodes, instead of fattening leaf nodes (i.e. the offset bitmap).
> The $table1 has 18182 blocks, and the radix tree statistics show:
>
> max_val = 65535
> num_keys = 18182
> height = 1, n4 = 0, n16 = 1, n32 = 0, n64 = 0, n256 = 72, leaves = 18182
>
> Which means that the height of the tree is 2 and we use the maximum
> size node for all nodes except for 1 node.

Do you have some kind of tool that prints this out for you? That would
be really handy.

> I don't have any great idea to substantially reduce the total number
> of tuples in the $table1. Probably we can use DELETE instead of UPDATE
> to make garbage tuples (although I'm not sure it's okay for this
> test). Which reduces the amount of WAL records from 11MB to 4MB and
> would reduce the time to catch up. But I'm not sure how much it would
> help. There might be ideas to trigger a two-round index vacuum with
> fewer tuples but if the tests are too optimized for the current
> TidStore, we will have to re-adjust them if the TidStore changes in
> the future. So I think it's better and reliable to allow
> maintenance_work_mem to be a lower value or use injection points
> somehow.

I think we can make improvements in overall time on master and 17 with
the examples John provided later in the thread. However, I realized
you are right about using a DELETE instead of an UPDATE. At some point
in my development, I needed the UPDATE to satisfy some other aspect of
the test. But that is no longer true. A DELETE works just as well as
an UPDATE WRT the dead items and, as you point out, much less WAL is
created and replay is much faster.

I also realized I forgot to add 043_vacuum_horizon_floor.pl to
src/test/recovery/meson.build in 16. I will post a patch here this
weekend which changes the UPDATE to a DELETE in 14-16 (sped up the
test by about 20% for me locally) and adds 043_vacuum_horizon_floor.pl
to src/test/recovery/meson.build in 16. I'll plan to push it on Monday
to save myself any weekend buildfarm embarrassment.

As for 17 and master, I'm going to try out John's examples and see if
it seems like it will be fast enough to commit to 17/master without
lowering the maintenance_work_mem lower bound.

If we want to lower it, I wonder if we just halve it -- since it seems
like the tests with half the number of tuples were fast enough to
avoid timing out on slow animals on the buildfarm? Or do we need some
more meaningful value to decrease it to?

- Melanie




Re: problems with "Shared Memory and Semaphores" section of docs

2024-07-26 Thread Nathan Bossart
Committed.

-- 
nathan




Re: tiny step toward threading: reduce dependence on setlocale()

2024-07-26 Thread Jeff Davis
On Fri, 2024-07-26 at 19:38 +0200, Andreas Karlsson wrote:
> Nice refactoring!
> 
> Two small comments about CheckMyDatabase().
> 
> - Shouldn't we look at the default_locale.ctype_is_c when setting 
> database_ctype_is_c instead of doing a strcmp()? or maybe we should
> even 
> remove the global variable and always look at the default_locale?

database_ctype_is_c refers to the LC_CTYPE environment of the database
-- pg_database.datctype. default_locale.ctype_is_c is the ctype of the
database's default collation.

Confusing, I know, but it matters for a few things that still depend on
the LC_CTYPE, such as tsearch and maybe a few extensions. See
f413941f41.

> - I think that the lookup of Anum_pg_database_datlocale could be done
> later in the code since it is not needed when we use a libc locale.
> E.g. 
> as below.

Done, thank you.

> Also, is there any reason you do not squash the 4th and the 6th patches?

Done. I had to rearrange the patch ordering a bit because prior to the
cache refactoring patch, it's unsafe to call
pg_newlocale_from_collation() without checking lc_collate_is_c() or
lc_ctype_is_c() first.

Regards,
Jeff Davis

From 8a98af04912afedcb481d0e3851a485a63baf3d9 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 5 Jun 2024 11:45:55 -0700
Subject: [PATCH v4 1/5] Make database default collation internal to
 pg_locale.c.

---
 src/backend/utils/adt/pg_locale.c | 69 +--
 src/backend/utils/init/postinit.c | 44 
 src/include/utils/pg_locale.h |  3 +-
 3 files changed, 74 insertions(+), 42 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 38c40a40489..1653e997d9b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -56,6 +56,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_database.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -116,6 +117,8 @@ char	   *localized_full_months[12 + 1];
 /* is the databases's LC_CTYPE the C locale? */
 bool		database_ctype_is_c = false;
 
+static struct pg_locale_struct default_locale;
+
 /* indicates whether locale information cache is valid */
 static bool CurrentLocaleConvValid = false;
 static bool CurrentLCTimeValid = false;
@@ -1443,8 +1446,6 @@ lc_ctype_is_c(Oid collation)
 	return (lookup_collation_cache(collation, true))->ctype_is_c;
 }
 
-struct pg_locale_struct default_locale;
-
 void
 make_icu_collator(const char *iculocstr,
   const char *icurules,
@@ -1539,7 +1540,69 @@ pg_locale_deterministic(pg_locale_t locale)
 }
 
 /*
- * Create a locale_t from a collation OID.  Results are cached for the
+ * Initialize default_locale with database locale settings.
+ */
+void
+init_database_collation(void)
+{
+	HeapTuple	tup;
+	Form_pg_database dbform;
+	Datum		datum;
+	bool		isnull;
+
+	/* Fetch our pg_database row normally, via syscache */
+	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	dbform = (Form_pg_database) GETSTRUCT(tup);
+
+	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+	{
+		char	   *datlocale;
+
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
+		datlocale = TextDatumGetCString(datum);
+
+		builtin_validate_locale(dbform->encoding, datlocale);
+
+		default_locale.info.builtin.locale = MemoryContextStrdup(
+ TopMemoryContext, datlocale);
+	}
+	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+	{
+		char	   *datlocale;
+		char	   *icurules;
+
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
+		datlocale = TextDatumGetCString(datum);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(datlocale, icurules, &default_locale);
+	}
+	else
+	{
+		Assert(dbform->datlocprovider == COLLPROVIDER_LIBC);
+	}
+
+	default_locale.provider = dbform->datlocprovider;
+
+	/*
+	 * Default locale is currently always deterministic.  Nondeterministic
+	 * locales currently don't support pattern matching, which would break a
+	 * lot of things if applied globally.
+	 */
+	default_locale.deterministic = true;
+
+	ReleaseSysCache(tup);
+}
+
+/*
+ * Create a pg_locale_t from a collation OID.  Results are cached for the
  * lifetime of the backend.  Thus, do not free the result with freelocale().
  *
  * As a special optimization, the default/database collation returns 0.
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 25867c8bd5b..3537df37056 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -318,7 +318,6 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	bool		isnull;
 	char	  

Re: pg_upgrade failing for 200+ million Large Objects

2024-07-26 Thread Justin Pryzby
On Wed, Jul 24, 2024 at 09:17:51AM -0500, Justin Pryzby wrote:
> With partitioning, we have a lot of tables, some of them wide (126
> partitioned tables, 8942 childs, total 1019315 columns).

On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote:
> It would be nice to identify such cases and check which memory contexts are
> growing and why.

I reproduced the problem with this schema:

SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', 
array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM 
generate_series(1,999)i);
SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', 
i,i,i+1) FROM generate_series(1,999)i;

This used over 4 GB of RAM.
3114201 pryzbyj   20   0 5924520   4.2g  32476 T   0.0  53.8   0:27.35 
postgres: pryzbyj postgres [local] UPDATE

The large context is:
2024-07-26 15:22:19.280 CDT [3114201] LOG:  level: 1; CacheMemoryContext: 
5211209088 total in 50067 blocks; 420688 free (14 chunks); 5210788400 used

Note that there seemed to be no issue when I created 999 tables without
partitioning:

SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM generate_series(1,999)i;

-- 
Justin




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2024-07-26 Thread Alexander Korotkov
On Fri, Jul 26, 2024 at 3:42 AM Noah Misch  wrote:
> On Thu, Jul 25, 2024 at 10:52:13AM +0900, Michael Paquier wrote:
> > On Wed, Jul 24, 2024 at 06:00:59AM -0700, Noah Misch wrote:
> > > I'm still seeing need for s/int/int64/ at:
>
> > I am attaching a patch for all these you have spotted, switching the
> > logs to use %llx.  Does that look fine for you?
>
> Yes.  I think that completes the project.

Thanks to everybody for working on this.  It's a pity I didn't notice
this was a v17 open item assigned to me.  Sorry for this.

I took a look at the commits and at other SLRU code for similar
issues.  Everything looks good to me.

--
Regards,
Alexander Korotkov
Supabase




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-07-26 Thread Alexander Korotkov
On Fri, Jul 26, 2024 at 5:38 PM Robert Haas  wrote:
> On Fri, Jul 26, 2024 at 8:10 AM Alexander Korotkov  
> wrote:
> > The revised version of 0001 unique checking optimization is attached.
> > I'm going to push this to v18 if no objections.
>
> I have no reason to specifically object to pushing this into 18, but I
> would like to point out that you're posting here about this but failed
> to reply to the "64-bit pg_notify page numbers truncated to 32-bit",
> an open item that was assigned to you but which, since you didn't
> respond, was eventually fixed by commits from Michael Paquier.
>
> I know it's easy to lose track of the open items list and I sometimes
> forget to check it myself, but it's rather important to stay on top of
> any open items that get assigned to you.

Yes, it's a pity I missed this open item assigned to me.  Besides
putting ashes on my head, I will pay more attention to other open items.

--
Regards,
Alexander Korotkov
Supabase




Re: Speed up collation cache

2024-07-26 Thread Jeff Davis
On Thu, 2024-06-20 at 17:07 +0700, John Naylor wrote:
> On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis  wrote:
> > Attached is a patch to use simplehash.h instead, which speeds
> > things up
> > enough to make them fairly close (from around 15% slower to around
> > 8%).
> 
> +#define SH_HASH_KEY(tb, key)   hash_uint32((uint32) key)
> 
> For a static inline hash for speed reasons, we can use murmurhash32
> here, which is also inline.

Thank you, that brings it down a few more percentage points.

New patches attached, still based on the setlocale-removal patch
series.

Setup:

  create collation libc_c (provider=libc, locale='C');
  create table collation_cache_test(t text);
  insert into collation_cache_test
select g::text||' '||g::text
  from generate_series(1,2) g;

Queries:

  select * from collation_cache_test where t < '0' collate "C";
  select * from collation_cache_test where t < '0' collate libc_c;

The two collations are identical except that the former benefits from
the optimization for C_COLLATION_OID, and the latter does not, so these
queries measure the overhead of the collation cache lookup.

Results (in ms):

             "C"    "libc_c"   overhead
   master:   6350   7855       24%
   v4-0001:  6091   6324        4%

(Note: I don't have an explanation for the difference in performance of
the "C" locale -- probably just some noise in the test.)

Considering that simplehash brings the worst case overhead under 5%, I
don't see a big reason to use the single-element cache also.
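
In spirit, the cache being benchmarked maps a collation OID to a
previously-built locale object, creating it on first use.  A minimal
sketch (all names invented; this is not the simplehash-based C code):

```python
# Rough sketch of an OID-keyed collation cache: build the locale object on
# the first lookup, then reuse it for the lifetime of the process.

class CollationCache:
    def __init__(self, make_locale):
        self._cache = {}              # plays the role of the simplehash table
        self._make_locale = make_locale

    def lookup(self, collid):
        locale = self._cache.get(collid)
        if locale is None:            # cache miss: build once, remember forever
            locale = self._make_locale(collid)
            self._cache[collid] = locale
        return locale

builds = []
cache = CollationCache(lambda oid: builds.append(oid) or f"locale-{oid}")
print(cache.lookup(950))   # → locale-950 (built)
print(cache.lookup(950))   # → locale-950 (cached; no second build)
print(builds)              # → [950]
```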

Regards,
Jeff Davis

From 64a017f169858cf646002a28f97ae05cb7ab9fcd Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Fri, 14 Jun 2024 15:38:42 -0700
Subject: [PATCH v4] Change collation cache to use simplehash.h.

---
 src/backend/utils/adt/pg_locale.c | 39 +--
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2e6f624798f..5afb69c6632 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -57,12 +57,12 @@
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
+#include "common/hashfn.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "utils/guc_hooks.h"
-#include "utils/hsearch.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
@@ -129,10 +129,27 @@ typedef struct
 {
 	Oid			collid;			/* hash key: pg_collation OID */
 	pg_locale_t locale;			/* locale_t struct, or 0 if not valid */
-} collation_cache_entry;
 
-static HTAB *collation_cache = NULL;
+	/* needed for simplehash */
+	uint32		hash;
+	char		status;
+} collation_cache_entry;
 
+#define SH_PREFIX		collation_cache
+#define SH_ELEMENT_TYPE	collation_cache_entry
+#define SH_KEY_TYPE		Oid
+#define SH_KEY			collid
+#define SH_HASH_KEY(tb, key)   	murmurhash32((uint32) key)
+#define SH_EQUAL(tb, a, b)		(a == b)
+#define SH_GET_HASH(tb, a)		a->hash
+#define SH_SCOPE		static inline
+#define SH_STORE_HASH
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+static MemoryContext CollationCacheContext = NULL;
+static collation_cache_hash *CollationCache = NULL;
 
 #if defined(WIN32) && defined(LC_MESSAGES)
 static char *IsoLocaleName(const char *);
@@ -1219,18 +1236,16 @@ lookup_collation_cache(Oid collation)
 	Assert(OidIsValid(collation));
 	Assert(collation != DEFAULT_COLLATION_OID);
 
-	if (collation_cache == NULL)
+	if (CollationCache == NULL)
 	{
-		/* First time through, initialize the hash table */
-		HASHCTL		ctl;
-
-		ctl.keysize = sizeof(Oid);
-		ctl.entrysize = sizeof(collation_cache_entry);
-		collation_cache = hash_create("Collation cache", 100, &ctl,
-	  HASH_ELEM | HASH_BLOBS);
+		CollationCacheContext = AllocSetContextCreate(TopMemoryContext,
+	  "collation cache",
+	  ALLOCSET_DEFAULT_SIZES);
+		CollationCache = collation_cache_create(
+			CollationCacheContext, 16, NULL);
 	}
 
-	cache_entry = hash_search(collation_cache, &collation, HASH_ENTER, &found);
+	cache_entry = collation_cache_insert(CollationCache, collation, &found);
 	if (!found)
 	{
 		/*
-- 
2.34.1



Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

2024-07-26 Thread Melanie Plageman
On Wed, Jul 24, 2024 at 8:19 AM John Naylor  wrote:
>
> On Wed, Jul 24, 2024 at 2:42 PM John Naylor  wrote:
> > As for lowering the limit, we've experimented with 256kB here:
> >
> > https://www.postgresql.org/message-id/canwcazzutvz3lsypauyqvzcezxz7qe+9ntnhgyzdtwxpul+...@mail.gmail.com
> >
> > As I mention there, going lower than that would need a small amount of
> > reorganization in the radix tree. Not difficult -- the thing I'm
> > concerned about is that we'd likely need to document a separate
> > minimum for DSA, since that behaves strangely with 256kB and might not
> > work at all lower than that.
>
> For experimentation, here's a rough patch (really two, squashed
> together for now) that allows m_w_m to go down to 64kB.

Oh, great, thanks! I didn't read this closely enough before I posed my
upthread question about how small we should make the minimum. It
sounds like you've thought a lot about this.

I ran my test with your patch (on my 64-bit system, non-assert build)
and the result is great:

master with my test (slightly modified to now use DELETE instead of
UPDATE as mentioned upthread)
3.09s

master with your patch applied, MWM set to 64kB and 9000 rows instead of 80
1.06s

> drop table if exists test;
> create table test (a int) with (autovacuum_enabled=false, fillfactor=10);
> insert into test (a) select i from generate_series(1,2000) i;
> create index on test (a);
> update test set a = a + 1;
>
> set maintenance_work_mem = '64kB';
> vacuum (verbose) test;
>
> INFO:  vacuuming "john.public.test"
> INFO:  finished vacuuming "john.public.test": index scans: 3
> pages: 0 removed, 91 remain, 91 scanned (100.00% of total)
>
> The advantage with this is that we don't need to care about
> MEMORY_CONTEXT_CHECKING or 32/64 bit-ness, since allocating a single
> large node will immediately blow the limit, and that will happen
> fairly quickly regardless. I suspect going this low will not work with
> dynamic shared memory and if so would need a warning comment.

I took a look at the patch, but I can't say I know enough about the
memory allocation subsystems and how TIDStore works to meaningfully
review it -- nor enough about DSM to comment about the interactions.

I suspect 256kB would also be fast enough to avoid my test timing out
on the buildfarm, but it is appealing to have a minimum for
maintenance_work_mem that is the same as work_mem.

- Melanie




Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

2024-07-26 Thread Melanie Plageman
On Wed, Jul 24, 2024 at 3:43 AM John Naylor  wrote:
>
> On Wed, Jul 24, 2024 at 5:40 AM Masahiko Sawada  wrote:
>
> > Without MEMORY_CONTEXT_CHECK, if size is 16 bytes, required_size is
> > also 16 bytes as it's already 8-byte aligned and Bump_CHUNKHDRSZ is 0.
> > On the other hand with MEMORY_CONTEXT_CHECK, the requied_size is
> > bumped to 40 bytes as chunk_size is 24 bytes and Bump_CHUNKHDRSZ is 16
> > bytes. Therefore, with MEMORY_CONTEXT_CHECK, we allocate more memory
> > and use more Bump memory blocks, resulting in filling up TidStore in
> > the test cases. We can easily reproduce this test failure with
> > PostgreSQL server built without --enable-cassert. It seems that
> > copperhead is the sole BF animal that doesn't use --enable-cassert but
> > runs recovery-check.
>
> It seems we could force the bitmaps to be larger, and also reduce the
> number of updated tuples by updating only the last few tuples (say
> 5-10) by looking at the ctid's offset. This requires some trickery,
> but I believe I've done it in the past by casting to text and
> extracting with a regex. (I'm assuming the number of tuples updated is
> more important than the number of tuples inserted on a newly created
> table.)

Yes, the only thing that is important is having two rounds of index
vacuuming and having one tuple with a value matching my cursor
condition before the first index vacuum and one after. What do you
mean update only the last few tuples though?

- Melanie




Re: pg_upgrade failing for 200+ million Large Objects

2024-07-26 Thread Alexander Korotkov
On Fri, Jul 26, 2024 at 11:36 PM Justin Pryzby  wrote:
> On Wed, Jul 24, 2024 at 09:17:51AM -0500, Justin Pryzby wrote:
> > With partitioning, we have a lot of tables, some of them wide (126
> > partitioned tables, 8942 childs, total 1019315 columns).
>
> On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote:
> > It would be nice to identify such cases and check which memory contexts are
> > growing and why.
>
> I reproduced the problem with this schema:
>
> SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', 
> array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM 
> generate_series(1,999)i);
> SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', 
> i,i,i+1) FROM generate_series(1,999)i;
>
> This used over 4 GB of RAM.
> 3114201 pryzbyj   20   0 5924520   4.2g  32476 T   0.0  53.8   0:27.35 
> postgres: pryzbyj postgres [local] UPDATE
>
> The large context is:
> 2024-07-26 15:22:19.280 CDT [3114201] LOG:  level: 1; CacheMemoryContext: 
> 5211209088 total in 50067 blocks; 420688 free (14 chunks); 5210788400 used
>
> Note that there seemed to be no issue when I created 999 tables without
> partitioning:
>
> SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM 
> generate_series(1,999)i;

Thank you!  That was quick.
I'm looking into this.

--
Regards,
Alexander Korotkov
Supabase




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2024-07-26 Thread Michael Paquier
On Fri, Jul 26, 2024 at 11:50:48PM +0300, Alexander Korotkov wrote:
> Thanks to everybody for working on this.  It's a pity I didn't notice
> this was a v17 open item assigned to me.  Sorry for this.

No problem.  I've just applied now the remaining pieces down to 17.
--
Michael


signature.asc
Description: PGP signature


Re: Support LIKE with nondeterministic collations

2024-07-26 Thread Paul A Jungwirth
On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut  wrote:
> Here is an updated patch for this.

I took a look at this. I added some tests and found a few that give
the wrong result (I believe). The new tests are included in the
attached patch, along with the results I expect. Here are the
failures:

 -- inner %% matches b then zero:
 SELECT U&'cb\0061\0308' LIKE U&'c%%\00E4' COLLATE ignore_accents;
  ?column?
 --
- t
+ f
 (1 row)

 -- trailing _ matches two codepoints that form one char:
 SELECT U&'cb\0061\0308' LIKE U&'cb_' COLLATE ignore_accents;
  ?column?
 --
- t
+ f
 (1 row)

 -- leading % matches zero:
 SELECT U&'\0061\0308bc' LIKE U&'%\00E4bc' COLLATE ignore_accents;
  ?column?
 --
- t
+ f
 (1 row)

 -- leading % matches zero (with later %):
 SELECT U&'\0061\0308bc' LIKE U&'%\00E4%c' COLLATE ignore_accents;
  ?column?
 --
- t
+ f
 (1 row)

I think the 1st, 3rd, and 4th failures are all from % not backtracking
to match zero chars.

The 2nd failure I'm not sure about. Maybe my expectation is wrong, but
then why does the same test pass with __ leading not trailing? Surely
they should be consistent.
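
The backtracking that a correct '%' needs (including matching zero
characters) can be sketched without any collation machinery.  This is a
deliberately simplified matcher, not the MatchText() code under
discussion:

```python
# Minimal LIKE matcher: '%' matches zero or more characters, '_' exactly one.
# No collation awareness -- this only illustrates why '%' must backtrack,
# starting from the zero-character match.

def like(text, pattern):
    if not pattern:
        return not text
    if pattern[0] == '%':
        # Try matching zero characters first, then progressively longer spans.
        return any(like(text[i:], pattern[1:]) for i in range(len(text) + 1))
    if text and (pattern[0] == '_' or pattern[0] == text[0]):
        return like(text[1:], pattern[1:])
    return False

print(like("cba", "c%%a"))   # → True (inner % matches zero characters)
print(like("abc", "%abc"))   # → True (leading % matches zero)
print(like("abc", "abd"))    # → False
```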

> I have added some more documentation based on the discussions, including
> some examples taken directly from the emails here.

This looks good to me.

> One thing I have been struggling with a bit is the correct use of
> LIKE_FALSE versus LIKE_ABORT in the MatchText() code.  I have made some
> small tweaks about this in this version that I think are more correct,
> but it could use another look.  Maybe also some more tests to verify
> this one way or the other.

I haven't looked at this yet.

Yours,

-- 
Paul  ~{:-)
p...@illuminatedcomputing.com


v3-0001-Support-LIKE-with-nondeterministic-collations.patch
Description: Binary data


Re: Use pgBufferUsage for block reporting in analyze

2024-07-26 Thread Masahiko Sawada
On Wed, Jul 24, 2024 at 1:58 AM Anthonin Bonnefoy
 wrote:
>
> On Mon, Jul 22, 2024 at 10:59 PM Masahiko Sawada  
> wrote:
> > The first line would vary depending on whether an autovacuum worker or
> > not. And the above suggestion includes a change of term "row" to
> > "tuple" for better consistency with VACUUM VERBOSE outputs. I think it
> > would be great if autoanalyze also writes logs in the same format.
> > IIUC with the patch, autoanalyze logs don't include the page and tuple
> > statistics.
>
> One issue is that the number of scanned pages, live tuples and dead
> tuples is only available in acquire_sample_rows which is where the log
> containing those stats is emitted. I've tried to implement the
> following in 0003:
> - Sampling functions now accept an AcquireSampleStats struct to store
> pages and tuples stats
> - Log is removed from sampling functions
> - do_analyze_rel now outputs scanned and tuples statistics when
> relevant. sampling from fdw doesn't provide those statistics so they
> are not displayed in those cases.

Studying how we write verbose log messages, it seems that currently
ANALYZE (autoanalyze) lets tables and FDWs write logs in their own
formats.  That makes sense to me, as some instrumentation for heap,
such as dead tuple counts, might not be necessary for FDWs, and an FDW
might want to write other information, such as the executed queries.  An
alternative idea would be to pass a StringInfo to
AcquireSampleRowsFunc() so that the callback can write its messages
there.  This is somewhat similar to what we do in the EXPLAIN command
(cf. ExplainPropertyText() etc.).  It might be too much, but I think it
could be better than forcing the logs into a single format.
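
The StringInfo idea might look roughly like this (a hypothetical sketch
with invented names, not PostgreSQL's actual API): the caller hands each
sampling callback a buffer, and the callback appends whatever log lines
make sense for it.

```python
# Hypothetical sketch: each sampler appends its own messages to a
# caller-supplied buffer, and the caller decides how to emit them.
import io

def heap_acquire_sample_rows(relation, logbuf):
    # A table AM can report page/tuple statistics (numbers are made up)...
    logbuf.write(f"scanned 91 of 91 pages in {relation}\n")
    logbuf.write("100 live tuples, 7 dead tuples\n")

def fdw_acquire_sample_rows(relation, logbuf):
    # ...while an FDW might report the remote query it executed instead.
    logbuf.write(f"remote query: SELECT * FROM {relation} ...\n")

def do_analyze(relation, sampler):
    buf = io.StringIO()
    sampler(relation, buf)
    return f"analyze of {relation}:\n{buf.getvalue()}"

print(do_analyze("t1", heap_acquire_sample_rows))
print(do_analyze("remote_t", fdw_acquire_sample_rows))
```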

>
> I've also slightly modified 0002 to display "automatic analyze" when
> we're inside an autovacuum worker, similar to what's done with vacuum
> output.

+1

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



