Re: Surround CheckRelation[Oid]LockedByMe() with USE_ASSERT_CHECKING

2024-07-01 Thread Bharath Rupireddy
Hi,

On Mon, Jul 1, 2024 at 12:12 PM Bertrand Drouvot
 wrote:
>
> Hi hackers,
>
> While working on a rebase for [1] due to 0cecc908e97, I noticed that
> CheckRelationLockedByMe() and CheckRelationOidLockedByMe() are used only in
> assertions.
>
> I think it would make sense to declare / define those functions only for
> assert enabled build: please find attached a tiny patch doing so.
>
> Thoughts?

If we are disabling compilation of CheckRelationXXXLockedByMe() for
non-assert builds, why not do the same for LWLockHeldByMe,
LWLockAnyHeldByMe and LWLockHeldByMeInMode, which are also debug-only
and used only in asserts? While it might reduce the compiled binary
size a bit for release builds, we may have to be cautious about
external or out-of-core modules using them.
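
As a minimal sketch (not the attached patch; the signature shown is an
assumption based on its counterpart CheckRelationLockedByMe), surrounding
such a function with USE_ASSERT_CHECKING would look roughly like this:

/* header: declaration only visible in assert-enabled builds */
#ifdef USE_ASSERT_CHECKING
extern bool CheckRelationOidLockedByMe(Oid relid, LOCKMODE lockmode,
                                       bool orstronger);
#endif

/* implementation file: same guard around the definition */
#ifdef USE_ASSERT_CHECKING
bool
CheckRelationOidLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
{
    /* ... existing body unchanged ... */
}
#endif

/* call sites already live inside Assert(), so they compile away too */
Assert(CheckRelationOidLockedByMe(relid, AccessShareLock, true));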

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Virtual generated columns

2024-07-01 Thread Peter Eisentraut

On 28.06.24 02:00, jian he wrote:

In https://www.postgresql.org/docs/current/catalog-pg-attrdef.html:
"The catalog pg_attrdef stores column default values. The main
information about columns is stored in pg_attribute. Only columns for
which a default value has been explicitly set will have an entry here."
This doesn't mention the expressions associated with generated columns.
Do we need to add something here? Maybe a separate issue?


Yes and yes.  I have committed a separate change to update the 
documentation here.





Re: pgsql: Add more SQL/JSON constructor functions

2024-07-01 Thread jian he
On Sun, Jun 30, 2024 at 2:24 AM Alvaro Herrera  wrote:
>
> TBH I'm not super clear on why we decide on explicit or implicit cast
> based on presence of a typmod.  Why isn't it better to always use an
> implicit one?
>

I am using an example to explain it.
SELECT JSON_SERIALIZE(JSON('{ "a" : 1 } '));
We cannot directly use an implicit cast from json to text in
{coerceJsonFuncExpr, coerce_to_target_type}, because the call chain
coerceJsonFuncExpr -> coerce_to_target_type -> can_coerce_type
-> find_coercion_pathway
looks up pg_cast entries. Since there is no implicit cast entry between
text and json, we would fail at:


if (!res && report_error)
ereport(ERROR,
errcode(ERRCODE_CANNOT_COERCE),
errmsg("cannot cast type %s to %s",
   format_type_be(exprtype),
   format_type_be(returning->typid)),
parser_coercion_errposition(pstate, location, expr));


Most casts here use an explicit cast, which is what we previously did.
Then, in this thread, we found out that for a RETURNING type with a typmod
(varchar, or a domain over varchar) we need to first cast the expression to
text and then cast text to varchar via an implicit cast, so that errors are
properly raised; for example:
SELECT JSON_SERIALIZE('{ "a" : 1 }' RETURNING varchar(2));
also see the comment:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=c2d93c3802b205d135d1ae1d7ac167d74e08a274
+ /*
+ * Convert the source expression to text, because coerceJsonFuncExpr()
+ * will create an implicit cast to the RETURNING types with typmod and
+ * there are no implicit casts from json(b) to such types. For domains,
+ * the base type's typmod will be considered, so do so here too.
+ */
In general, I think the implicit cast here is an exception.

Overall, I came up with the following logic:
-
int32 baseTypmod = -1;
if (returning->typmod < 0)
(void) getBaseTypeAndTypmod(returning->typid, &baseTypmod);
else
baseTypmod = returning->typmod;

res = coerce_to_target_type(pstate, expr, exprtype,
returning->typid, baseTypmod,
baseTypmod > 0 ? COERCION_IMPLICIT :
COERCION_EXPLICIT,
baseTypmod > 0 ? COERCE_IMPLICIT_CAST :
COERCE_EXPLICIT_CAST,
location);
-
In the same way that we are dealing with varchar,
I came up with a verbose patch for transformJsonBehavior,
which can cope with all the corner cases of the bit and varbit data types.
I also attached a test SQL file (scratch169.sql) for it.
Some examples:
--fail
SELECT JSON_VALUE(jsonb '"111a"', '$'  RETURNING bit(3) default ''
on error);
--ok
SELECT JSON_VALUE(jsonb '"111a"', '$'  RETURNING bit(3) default '111' on error);
--ok
SELECT JSON_VALUE(jsonb '"111a"', '$'  RETURNING bit(3) default 32 on error);


In makeJsonConstructorExpr we call (void)
getBaseTypeAndTypmod(returning->typid, &baseTypmod);
later, in coerceJsonFuncExpr, we may call
(void) getBaseTypeAndTypmod(returning->typid, &baseTypmod) again,
so maybe we can do some refactoring.
From 5f2d070d3cc47f7461d1474a5fe18e905243b31b Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 1 Jul 2024 14:26:09 +0800
Subject: [PATCH v1 1/1] handle types that have type modifier for json on
 error, on empty

---
 src/backend/parser/parse_expr.c | 83 ++---
 1 file changed, 65 insertions(+), 18 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 560b3606..15cdea1c 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -3582,7 +3582,7 @@ coerceJsonFuncExpr(ParseState *pstate, Node *expr,
 	Node	   *res;
 	int			location;
 	Oid			exprtype = exprType(expr);
-	int32		baseTypmod = returning->typmod;
+	int32		baseTypmod = -1;
 
 	/* if output type is not specified or equals to function type, return */
 	if (!OidIsValid(returning->typid) || returning->typid == exprtype)
@@ -3615,8 +3615,10 @@ coerceJsonFuncExpr(ParseState *pstate, Node *expr,
 	 * For domains, consider the base type's typmod to decide whether to setup
 	 * an implicit or explicit cast.
 	 */
-	if (get_typtype(returning->typid) == TYPTYPE_DOMAIN)
+	if (returning->typmod < 0)
 		(void) getBaseTypeAndTypmod(returning->typid, &baseTypmod);
+	else
+		baseTypmod = returning->typmod;
 
 	/* try to coerce expression to the output type */
 	res = coerce_to_target_type(pstate, expr, exprtype,
@@ -3649,7 +3651,7 @@ makeJsonConstructorExpr(ParseState *pstate, JsonConstructorType type,
 	JsonConstructorExpr *jsctor = makeNode(JsonConstructorExpr);
 	Node	   *placeholder;
 	Node	   *coercion;
-	int32		baseTypmod = returning->typmod;
+	int32		baseTypmod = -1;
 
 	jsctor->args = args;
 	jsctor->func = fexpr;
@@ -3693,8 +3695,10 @@ makeJsonConstructorExpr(ParseState *pstate, JsonConstructorType type,
 	 * there are no implicit casts from json(b) to such types.  For domains,
 	 * the base type's typmod will be considered, so do so here too.
 	 */
-	if (get_typtype(returning->typid) == TYPTYPE_DOMAIN)
+	if (returning->typmod < 0)
 		(void) getBaseTypeAndTypmod(returning->typid, &baseTypmod);
+	else
+		baseTypmod = returning->typmod;
 	if (baseTypmod > 0)
 		placeholder = coe

Re: Logical Replication of sequences

2024-07-01 Thread Peter Smith
Here are some review comments for the patch v20240625-0002

==
Commit Message

1.
This commit enhances logical replication by enabling the inclusion of all
sequences in publications. This improvement facilitates seamless
synchronization of sequence data during operations such as
CREATE SUBSCRIPTION, REFRESH PUBLICATION, and REFRESH PUBLICATION SEQUENCES.

~

Isn't this description getting ahead of the functionality a bit? For
example, it talks about operations like REFRESH PUBLICATION SEQUENCES
but AFAIK that syntax does not exist just yet.

~~~

2.
The commit message should mention that you are only introducing new
syntax for "FOR ALL SEQUENCES" here, but syntax for "FOR SEQUENCE" is
being deferred to some later patch. Without such a note it is not
clear why the gram.y syntax and docs seemed only half done.

==
doc/src/sgml/ref/create_publication.sgml

3.

 FOR ALL TABLES
+FOR ALL SEQUENCES
 
  
-  Marks the publication as one that replicates changes for all tables in
-  the database, including tables created in the future.
+  Marks the publication as one that replicates changes for all tables or
+  sequences in the database, including tables created in the future.

It might be better here to keep descriptions for "ALL TABLES" and "ALL
SEQUENCES" separated, otherwise the wording does not quite seem
appropriate for sequences (e.g. where it says "including tables
created in the future").

~~~

NITPICK - missing spaces
NITPICK - removed Oxford commas since previously there were none

~~~

4.
+   If FOR TABLE, FOR ALL TABLES,
+   FOR ALL SEQUENCES, or FOR TABLES IN SCHEMA
+   are not specified, then the publication starts out with an empty set of
+   tables.  That is useful if tables or schemas are to be added later.

It seems like "FOR ALL SEQUENCES" is out of place since it is jammed
between other clauses referring to TABLES. Would it be better to
mention SEQUENCES last in the list?

~~~

5.
+   rights on the table.  The FOR ALL TABLES,
+   FOR ALL SEQUENCES, and FOR TABLES IN SCHEMA
+   clauses require the invoking

ditto of #4 above.

==
src/backend/catalog/pg_publication.c

GetAllSequencesPublicationRelations:

NITPICK - typo /relation/relations/

==
src/backend/commands/publicationcmds.c

6.
+ foreach(lc, stmt->for_all_objects)
+ {
+ char*val = strVal(lfirst(lc));
+
+ if (strcmp(val, "tables") == 0)
+ for_all_tables = true;
+ else if (strcmp(val, "sequences") == 0)
+ for_all_sequences = true;
+ }

Consider the foreach_ptr macro to slightly simplify this code.
Actually, this whole logic seems cumbersome -- can't the parser assign
the flags automatically? Please see my more detailed comment #10 below
about this in gram.y.

~~~

7.
  /* FOR ALL TABLES requires superuser */
- if (stmt->for_all_tables && !superuser())
+ if (for_all_tables && !superuser())
  ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  errmsg("must be superuser to create FOR ALL TABLES publication")));

+ /* FOR ALL SEQUENCES requires superuser */
+ if (for_all_sequences && !superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create FOR ALL SEQUENCES publication")));
+

The current code is easy to read, but I wonder if it should try harder
to share common code, or at least a common translatable message like
"must be superuser to create %s publication".

~~~

8.
- else
+
+ /*
+ * If the publication might have either tables or sequences (directly or
+ * through a schema), process that.
+ */
+ if (!for_all_tables || !for_all_sequences)

I did not understand why this code cannot just say "else" like before,
because the direct or through-schema syntax cannot be specified at the
same time as "FOR ALL ...", so why is the more complicated condition
necessary? Also, the similar code in AlterPublicationOptions() was not
changed to be like this.

==
src/backend/parser/gram.y

9. comment

  *
  * CREATE PUBLICATION FOR ALL TABLES [WITH options]
  *
+ * CREATE PUBLICATION FOR ALL SEQUENCES [WITH options]
+ *
  * CREATE PUBLICATION FOR pub_obj [, ...] [WITH options]

The comment is not quite correct because actually you are allowing
simultaneous FOR ALL TABLES, SEQUENCES. It should be more like:

CREATE PUBLICATION FOR ALL pub_obj_type [,...] [WITH options]

pub_obj_type is one of:
TABLES
SEQUENCES

~~~

10.
+pub_obj_type: TABLES
+ { $$ = (Node *) makeString("tables"); }
+ | SEQUENCES
+ { $$ = (Node *) makeString("sequences"); }
+ ;
+
+pub_obj_type_list: pub_obj_type
+ { $$ = list_make1($1); }
+ | pub_obj_type_list ',' pub_obj_type
+ { $$ = lappend($1, $3); }
+ ;

IIUC the only thing you need is a flag to say if FOR ALL TABLES is in
effect and another flag to say if FOR ALL SEQUENCES is in effect. So,
it seemed clunky to build up a temporary list of "tables" or
"sequences" strings here, which is subsequently scanned by
CreatePublication to be turned back into booleans.

Can't we just change the CreatePublicationStmt field to have:

A) a

Re: [PATCH] Fix docs to use canonical links

2024-07-01 Thread Daniel Gustafsson
> On 1 Jul 2024, at 08:06, Michael Paquier  wrote:
> 
> On Thu, Jun 27, 2024 at 11:27:45AM +0200, Joel Jacobson wrote:
>> During work in the separate thread [1], I discovered more cases
>> where the link in docs wasn't the canonical link [2].
>> 
>> [1] 
>> https://postgr.es/m/cakfquwyex9pj9g0zhjewsmsbnquygh+fycw-66ezjfvg4ko...@mail.gmail.com
>> [2] https://en.wikipedia.org/wiki/Canonical_link_element
>> 
>> The below script e.g. doesn't parse SGML, and is broken in some other ways
>> also, but probably good enough to suggest changes that can then be manually
>> carefully verified.
> 
> The 19 links you are updating here avoid redirections in Wikipedia and
> the Postgres wiki.  It's always a bit of a chicken-and-egg game in
> this area, because links always change, still I don't mind the change.

Avoiding redirects is generally a good thing, not everyone is on lightning fast
internet.  Wikipedia is however not doing any 30X redirects so it's not really
an issue for those links, it's all 200 requests.

--
Daniel Gustafsson





Re: Backporting BackgroundPsql

2024-07-01 Thread Daniel Gustafsson
> On 29 Jun 2024, at 06:38, Pavan Deolasee  wrote:

> Don't we need to add install and uninstall rules for the new module, like we 
> did in 
> https://git.postgresql.org/pg/commitdiff/a4c17c86176cfa712f541b81b2a026ae054b275e
>  and  
> https://git.postgresql.org/pg/commitdiff/7039c7cff6736780c3bbb41a90a6dfea0f581ad2
>  ?

That's correct, we should backport those as well.

--
Daniel Gustafsson





Re: Conflict Detection and Resolution

2024-07-01 Thread Ajin Cherian
On Thu, Jun 27, 2024 at 1:14 PM Nisha Moond 
wrote:

> Please find the attached  'patch0003', which implements conflict
> resolutions according to the global resolver settings.
>
> Summary of Conflict Resolutions Implemented in 'patch0003':
>
> INSERT Conflicts:
> 
> 1) Conflict Type: 'insert_exists'
>
> Supported Resolutions:
> a) 'remote_apply': Convert the INSERT to an UPDATE and apply.
> b) 'keep_local': Ignore the incoming (conflicting) INSERT and retain
> the local tuple.
> c) 'error': The apply worker will error out and restart.
>
>
Hi Nisha,

While testing the patch, when conflict resolution is configured and
insert_exists is set to "remote_apply", I see this warning in the logs due
to a resource not being closed:

2024-07-01 02:52:59.427 EDT [20304] LOG:  conflict insert_exists detected
on relation "public.test1"
2024-07-01 02:52:59.427 EDT [20304] DETAIL:  Key already exists. Applying
resolution method "remote_apply"
2024-07-01 02:52:59.427 EDT [20304] CONTEXT:  processing remote data for
replication origin "pg_16417" during message type "INSERT" for replication
target relation "public.test1" in transaction 763, finished at 0/15E7F68
2024-07-01 02:52:59.427 EDT [20304] WARNING:  resource was not closed:
[138] (rel=base/5/16413, blockNum=0, flags=0x9380, refcount=1 1)
2024-07-01 02:52:59.427 EDT [20304] CONTEXT:  processing remote data for
replication origin "pg_16417" during message type "COMMIT" in transaction
763, finished at 0/15E7F68
2024-07-01 02:52:59.427 EDT [20304] WARNING:  resource was not closed:
TupleDesc 0x7f8c0439e448 (16402,-1)
2024-07-01 02:52:59.427 EDT [20304] CONTEXT:  processing remote data for
replication origin "pg_16417" during message type "COMMIT" in transaction
763, finished at 0/15E7F68

regards,
Ajin Cherian
Fujitsu Australia


Re: Surround CheckRelation[Oid]LockedByMe() with USE_ASSERT_CHECKING

2024-07-01 Thread Bertrand Drouvot
Hi,

On Mon, Jul 01, 2024 at 12:35:34PM +0530, Bharath Rupireddy wrote:
> Hi,
> 
> On Mon, Jul 1, 2024 at 12:12 PM Bertrand Drouvot
>  wrote:
> >
> > Hi hackers,
> >
> > While working on a rebase for [1] due to 0cecc908e97, I noticed that
> > CheckRelationLockedByMe() and CheckRelationOidLockedByMe() are used only in
> > assertions.
> >
> > I think it would make sense to declare / define those functions only for
> > assert enabled build: please find attached a tiny patch doing so.
> >
> > Thoughts?
> 
> If we are disabling compilation of CheckRelationXXXLockedByMe() for
> non-assert builds, why not do the same for LWLockHeldByMe,
> LWLockAnyHeldByMe and LWLockHeldByMeInMode, which are also debug-only
> and used only in asserts? While it might reduce the compiled binary
> size a bit for release builds, we may have to be cautious about
> external or out-of-core modules using them.

Thanks for the feedback.

CheckRelationOidLockedByMe() is new (it was added in 0cecc908e97), while its
counterpart CheckRelationLockedByMe() has been around for a few years (added
in 2018 by commit b04aeb0a053); I thought it would make sense to surround
both of them.

While it's true that we could also surround LWLockHeldByMe() (added in
e6cba71503f, 2004, with a signature change in ea9df812d85, 2014),
LWLockAnyHeldByMe() (added in eed959a457e, 2022) and LWLockHeldByMeInMode()
(added in 016abf1fb83, 2016), I'm not sure we should, due to their "age"
and, as you said, the need to be cautious about out-of-core modules /
extensions that may use them.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Surround CheckRelation[Oid]LockedByMe() with USE_ASSERT_CHECKING

2024-07-01 Thread Michael Paquier
On Mon, Jul 01, 2024 at 06:42:46AM +, Bertrand Drouvot wrote:
> While working on a rebase for [1] due to 0cecc908e97, I noticed that
> CheckRelationLockedByMe() and CheckRelationOidLockedByMe() are used only in
> assertions.
> 
> I think it would make sense to declare / define those functions only for
> assert enabled build: please find attached a tiny patch doing so.
> 
> Thoughts?

Not convinced that's a good idea.  What about out-of-core code that
may use these routines for runtime checks in non-assert builds?
--
Michael




Re: Conflict Detection and Resolution

2024-07-01 Thread Masahiko Sawada
On Thu, Jun 27, 2024 at 1:50 PM shveta malik  wrote:
>
> On Wed, Jun 26, 2024 at 2:33 PM Amit Kapila  wrote:
> >
> > On Tue, Jun 25, 2024 at 3:39 PM shveta malik  wrote:
> > >
> > > On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, Jun 24, 2024 at 1:47 PM shveta malik  
> > > > wrote:
> > > > >
> > > > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > >> In the second patch, we can implement simple built-in resolution
> > > > > > >> strategies like apply and skip (which can be named as 
> > > > > > >> remote_apply and
> > > > > > >> keep_local, see [3][4] for details on these strategies) with 
> > > > > > >> ERROR or
> > > > > > >> LOG being the default strategy. We can allow these strategies to 
> > > > > > >> be
> > > > > > >> configured at the global and table level.
> > > > >
> > > > > Before we implement resolvers, we need a way to configure them. Please
> > > > > find the patch002 which attempts to implement Global Level Conflict
> > > > > Resolvers Configuration.  Note that patch002 is dependent upon
> > > > > Conflict-Detection patch001 which is reviewed in another thread [1].
> > > > > I have attached patch001 here for convenience and to avoid CFBot
> > > > > failures. But please use [1] if you have any comments on patch001.
> > > > >
> > > > > New DDL commands in patch002 are:
> > > > >
> > > > > To set global resolver for given conflict_type:
> > > > > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'
> > > > >
> > > > > To reset to default resolver:
> > > > > RESET CONFLICT RESOLVER FOR 'conflict_type'
> > > > >
> > > >
> > > > Does setting up resolvers have any meaning without subscriptions? I am
> > > > wondering whether we should allow to set up the resolvers at the
> > > > subscription level. One benefit is that users don't need to use a
> > > > different DDL to set up resolvers. The first patch gives a conflict
> > > > detection option at the subscription level, so it would be symmetrical
> > > > to provide a resolver at the subscription level. Yet another benefit
> > > > could be that it provides users facility to configure different
> > > > resolvers for a set of tables belonging to a particular
> > > > publication/node.
> > >
> > > There can be multiple tables included in a publication with varying
> > > business use-cases and thus may need different resolvers set, even
> > > though they all are part of the same publication.
> > >
> >
> > Agreed but this is the reason we are planning to keep resolvers at the
> > table level. Here, I am asking to set resolvers at the subscription
> > level rather than at the global level.
>
> Okay, got it. I misunderstood earlier that we want to replace table
> level resolvers with subscription ones.
> Having global configuration has one benefit that if the user has no
> requirement to set different resolvers for different subscriptions or
> tables, he may always set one global configuration and be done with
> it. OTOH, I also agree with benefits coming with subscription level
> configuration.

Setting resolvers at the table level and the subscription level sounds
good to me. Would DDLs for setting resolvers at the subscription level
need the subscription name to be specified? And another question: does
a table-level resolver setting take precedence over all
subscription-level resolver settings in the database?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Adminpack removal

2024-07-01 Thread Daniel Gustafsson
> On 28 Jun 2024, at 09:06, Philippe BEAUDOIN  wrote:

> So just looking in public repos probably covers less than 1% of the code.
> However, this may give a first idea, especially if a feature use is already
> detected.

Searching for anything on Github is essentially a dead end since it reports so
many duplicates in forks etc.  That being said, I did a lot of searching and
browsing to find users [0], but came up empty (apart from forks which already
maintain their own copy).  A more targeted search is the Debian Code search
which at the time of removal (and well before then) showed zero occurrences of
adminpack functions in any packaged software, and no extensions which had
adminpack as a dependency.  While not an exhaustive search by any means, it
does provide a good hint.

Since you list no other extensions using adminpack to support keeping it, I
assume you also didn't find any when searching?

--
Daniel Gustafsson

[0] 
https://www.postgresql.org/message-id/B07CC211-DE35-4AC5-BD4E-0C6466700B06%40yesql.se



Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

2024-07-01 Thread Joel Jacobson
On Mon, Jul 1, 2024, at 08:04, Joel Jacobson wrote:
> * 0001-optimize-numeric-mul_var-small-factors.patch

New version to silence maybe-uninitialized error reported by cfbot.

/Joel

v2-0001-optimize-numeric-mul_var-small-factors.patch
Description: Binary data


Re: Wrong results with grouping sets

2024-07-01 Thread Richard Guo
On Mon, Jun 10, 2024 at 5:05 PM Richard Guo  wrote:
> This patchset does not apply any more.  Here is a new rebase.

Here is an updated version of this patchset.  I've run pgindent for it,
and also tweaked the commit messages a bit.

In principle, 0001 can be backpatched to all supported versions to fix
the cases where there are subqueries in the grouping expressions; 0002
can be backpatched to 16 where we have the nullingrels stuff.  But both
patches seem to be quite invasive.  I'm not sure if we want to backpatch
them to stable branches.  Any thoughts about backpatching?

Thanks
Richard


v9-0001-Introduce-an-RTE-for-the-grouping-step.patch
Description: Binary data


v9-0002-Mark-expressions-nullable-by-grouping-sets.patch
Description: Binary data


Re: Is missing LOGIN Event on Trigger Firing Matrix ?

2024-07-01 Thread Daniel Gustafsson
> On 27 Jun 2024, at 22:00, David G. Johnston  
> wrote:

> I suggest adding a sentence and link from the main description [1] to that 
> page.

How about the attached.

--
Daniel Gustafsson



login_evt_link.diff
Description: Binary data


Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Daniel Gustafsson
> On 27 Jun 2024, at 06:01, Yugo NAGATA  wrote:
>> On Sun, Jun 23, 2024 at 11:56 PM, Richard Guo
>> wrote:
>>> On Mon, Jun 24, 2024 at 7:51 AM Ranier Vilela  wrote:

 memcpy with strlen does not copy the whole string.
 strlen returns the exact length of the string, without
 the null-termination.
>>> 
>>> I noticed that the two callers of do_pg_backup_start both allocate
>>> BackupState with palloc0.  Can we rely on this to ensure that the
>>> BackupState.name is initialized with null-termination?
>>> 
>> I do not think so.

In this case we can, we do that today..

> However, I wonder if it is better to use strlcpy without assuming such good
> manners from the callers.

..that being said I agree that it seems safer to use strlcpy() here.
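
To make the hazard concrete, here is a small standalone sketch (not
PostgreSQL code; it assumes a libc that provides strlcpy, e.g. the BSDs,
macOS or glibc 2.38+, while PostgreSQL itself carries a fallback in
src/port):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char src[] = "backup-label";
	char	dst1[32];		/* not pre-zeroed */
	char	dst2[32];

	/* copies the characters but not the terminating NUL */
	memcpy(dst1, src, strlen(src));

	/* always NUL-terminates; reports truncation via its return value */
	if (strlcpy(dst2, src, sizeof(dst2)) >= sizeof(dst2))
		fprintf(stderr, "name too long\n");

	printf("%s\n", dst2);	/* safe; printing dst1 here would not be */
	return 0;
}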

--
Daniel Gustafsson





Re: LogwrtResult contended spinlock

2024-07-01 Thread Alvaro Herrera
Hello,

On 2024-Jun-29, Alexander Lakhin wrote:

> It doesn't, but the following works for me:
>  static inline uint64
>  pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 
> target_)
>  {
> -   uint64  currval;
> +   pg_attribute_aligned(8) uint64  currval;
> 
> because the failed assertion is:
> #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
>     AssertPointerAlignment(&currval, 8);
> #endif

Hah, thank you.

In the meantime I noticed that pg_attribute_aligned() is not supported
on every platform/compiler, so for safety's sake I think it's better to go
with what we do for PGAlignedBlock: use a union with a double member.
That should be 8-byte aligned on x86 as well, unless I misunderstand.

BTW if things work after this fix, I suggest you get a buildfarm member
running with this configuration.  Otherwise it's quite likely that we'll
break it again.  Or we could just decide we don't care about this
particular platform ...  but AFAICS the buildfarm does have other 32-bit
animals running.

[ looks at buildfarm ]

Oh!  while looking at Adder's config, I noticed this line:
Checking for alignment of "double" : 4 
So, I do misunderstand doubles.

So my patch is not going to work.  There seems to be nothing there that has
alignment 8, so I guess we're back to using pg_attribute_aligned() and
aborting the compile if that doesn't exist.
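
A minimal sketch of that route (pg_attribute_aligned() plus a hard failure
where it is unavailable; whether that trade-off is acceptable is exactly the
open question):

#ifdef pg_attribute_aligned
	pg_attribute_aligned(8) uint64	currval;
#else
#error "pg_atomic_monotonic_advance_u64 needs 8-byte aligned local variables"
#endif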

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La gente vulgar sólo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"
>From 9d240e90f87bf8b53bd5d92b623e33419db64717 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 1 Jul 2024 10:41:06 +0200
Subject: [PATCH v2] Fix alignment of variable in
 pg_atomic_monotonic_advance_u64

Reported-by: Alexander Lakhin 
Discussion: https://postgr.es/m/36796438-a718-cf9b-2071-b2c1b947c...@gmail.com
---
 src/include/port/atomics.h | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index 78987f3154..964732e660 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -580,30 +580,38 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
 static inline uint64
 pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
 {
-	uint64		currval;
+	/*
+	 * On 32-bit machines, declaring a bare uint64 variable doesn't promise
+	 * the alignment we need, so coerce the compiler this way.
+	 */
+	union
+	{
+		uint64		u64;
+		double		force_align_d;
+	}			currval;
 
 #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
 	AssertPointerAlignment(ptr, 8);
 #endif
 
-	currval = pg_atomic_read_u64_impl(ptr);
-	if (currval >= target_)
+	currval.u64 = pg_atomic_read_u64_impl(ptr);
+	if (currval.u64 >= target_)
 	{
 		pg_memory_barrier();
-		return currval;
+		return currval.u64;
 	}
 
 #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
-	AssertPointerAlignment(&currval, 8);
+	AssertPointerAlignment(&currval.u64, 8);
 #endif
 
-	while (currval < target_)
+	while (currval.u64 < target_)
 	{
-		if (pg_atomic_compare_exchange_u64_impl(ptr, &currval, target_))
+		if (pg_atomic_compare_exchange_u64_impl(ptr, &currval.u64, target_))
 			break;
 	}
 
-	return Max(target_, currval);
+	return Max(target_, currval.u64);
 }
 
 #undef INSIDE_ATOMICS_H
-- 
2.39.2



Re: doc: modify the comment in function libpqrcv_check_conninfo()

2024-07-01 Thread Jelte Fennema-Nio
On Thu, 27 Jun 2024 at 12:27, ikedarintarof
 wrote:
> Thanks for your suggestion. I used ChatGPT to choose the wording, but
> it's still difficult for me.

Looks good to me now (but obviously biased since you took my wording).
Adding Robert, since he authored the commit that introduced this
comment.




Re: Make tuple deformation faster

2024-07-01 Thread Andy Fan
David Rowley  writes:

> Currently, TupleDescData contains the descriptor's attributes in a
> variable length array of FormData_pg_attribute allocated within the
> same allocation as the TupleDescData. According to my IDE,
> sizeof(FormData_pg_attribute) == 104 bytes. It's that large mainly due
> to attname being 64 bytes. The TupleDescData.attrs[] array could end
> up quite large on tables with many columns and that could result in
> poor CPU cache hit ratios when deforming tuples.
...
>
> To test the performance of this, I tried using the attached script
> which creates a table where the first column is a variable length
> column and the final column is an int.  The query I ran to test the
> performance inserted 1 million rows into this table and performed a
> sum() on the final column.  The attached graph shows that the query is
> 30% faster than master with 15 columns between the first and last
> column.

Yet another wonderful optimization! I just want to know how you found
this optimization (CPU cache hits) case and decided it was worth spending
time on, because before we invest our time in optimizing something, it is
better to know that we can get some measurable improvement for the time
spent. As for this case, 30% is really a huge number, even if it is an
artificial case.

Another case: Andrew introduced NullableDatum 5 years ago and said using
it in TupleTableSlot could be CPU-cache friendly. I can follow that, but
how much can it improve things in an ideal case; is it possible to
forecast that somehow? I ask here because both cases are optimizing for
the CPU cache.


-- 
Best Regards
Andy Fan





Re: A new strategy for pull-up correlated ANY_SUBLINK

2024-07-01 Thread Andrei Lepikhov

On 10/12/23 14:52, Andy Fan wrote:

Here the sublink can't be pulled up because of its reference to
the LHS of the left join. The original logic is that no matter whether
'b.t in ..' returns true or false, the rows in the LHS will be returned.
If we pull it up to the LHS, some rows in the LHS will be filtered out,
which breaks the original semantics.

Hi,
I spent some time trying to understand your sentence.
I mean the following case:

SELECT * FROM t1 LEFT JOIN t2
  ON t2.x IN (SELECT y FROM t3 WHERE t1.x=t3.x);

I read [1,2,3], but I am still unsure why it is impossible in the case
of an OUTER JOIN. By setting the LATERAL clause, we forbid any clauses from
the RTE subquery from bubbling up as top-level clauses and filtering tuples
from the LHS; am I wrong? Does it need more research, or can you show some
case to support your opinion on why this type of transformation must be
disallowed?


[1] https://www.postgresql.org/message-id/6531.1218473967%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/BANLkTikGFtGnAaXVh5%3DntRdN%2B4w%2Br%3DNPuw%40mail.gmail.com

[3] https://www.vldb.org/conf/1992/P091.PDF

--
regards, Andrei Lepikhov





Re: Add memory context type to pg_backend_memory_contexts view

2024-07-01 Thread David Rowley
On Sat, 1 Jun 2024 at 12:55, Michael Paquier  wrote:
> This patch looks like a good idea, so +1 from here.

Thank you to both of you for reviewing this.  I've now pushed the patch.

David




Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Daniel Gustafsson
> On 27 Jun 2024, at 13:50, Ranier Vilela  wrote:

> Now with file patch really attached.

-   if (strlen(backupidstr) > MAXPGPATH)
+   if (strlcpy(state->name, backupidstr, sizeof(state->name)) >= sizeof(state->name))
ereport(ERROR,

Stylistic nit perhaps, I would keep the strlen check here and just replace the
memcpy with strlcpy.  Using strlen in the error message check makes the code
more readable.


-   char       name[MAXPGPATH + 1];
+   char       name[MAXPGPATH];    /* backup label name */

With the introduced use of strlcpy, why do we need to change this field?

--
Daniel Gustafsson





Re: Avoid orphaned objects dependencies, take 3

2024-07-01 Thread Bertrand Drouvot
Hi,

On Wed, Jun 26, 2024 at 10:24:41AM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Fri, Jun 21, 2024 at 01:22:43PM +, Bertrand Drouvot wrote:
> > Another thought for the RelationRelationId class case: we could check if 
> > there
> > is a lock first and if there is no lock then acquire one. That way that 
> > would
> > ensure the relation is always locked (so no "risk" anymore), but OTOH it may
> > add "unecessary" locking (see 2. mentioned previously).
> 
> Please find attached v12 implementing this idea for the RelationRelationId 
> class
> case. As mentioned, it may add unnecessary locking for 2. but I think that's
> worth it to ensure that we are always on the safe side of things. This idea is
> implemented in LockNotPinnedObjectById().

Please find attached v13, a mandatory rebase due to 0cecc908e97. In passing,
it makes use of CheckRelationOidLockedByMe(), added in 0cecc908e97.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 461ae5d2fa037b2b9fd285c2a328c8bc785eaec8 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v13] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a non
existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a non
existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place before the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still exists:
with this in place session 2 in scenario 1 would be locked and would report an
error once session 1 commits (that would not be the case should session 1 abort
the transaction).

If the object is dropped before the new lock attempt is triggered then the patch
would also report an error (but with less details).

The patch takes into account any type of objects except the ones that are pinned
(they are not droppable because the system requires it).

A special case is done for objects that belong to the RelationRelationId class.
For those, we should be in one of the two following cases that would already
prevent the relation from being dropped:

1. The relation is already locked (could be an existing relation or a relation
that we are creating).

2. The relation is protected indirectly (i.e an index protected by a lock on
its table, a table protected by a lock on a function that depends on the table...)

To avoid any risks for the RelationRelationId class case, we acquire a lock if
there is none. That may add unnecessary lock for 2. but that's worth it.

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
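
Spelled out as SQL, scenario 1 looks like this (the function name and body
are illustrative):

-- session 1
BEGIN;
DROP SCHEMA schem;
-- session 2, while session 1's transaction is still open
CREATE FUNCTION schem.f() RETURNS int LANGUAGE sql AS 'SELECT 1';
-- session 1
COMMIT;
-- schem is now gone, yet the function created in session 2 still has a
-- pg_depend entry pointing at it.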
---
 src/backend/catalog/aclchk.c  |   1 +
 src/backend/catalog/dependency.c  | 121 +++-
 src/backend/catalog/heap.c|   7 +
 src/backend/catalog/index.c   |  26 
 src/backend/catalog/objectaddress.c   |  57 
 src/backend/catalog/pg_aggregate.c|   9 ++
 src/backend/catalog/pg_attrdef.c  |   1 +
 src/backend/catalog/pg_cast.c |   5 +
 src/backend/catalog/pg_collation.c|   1 +
 src/backend/catalog/pg_constraint.c   |  26 
 src/backend/catalog/pg_conversion.c   |   2 +
 src/backend/catalog/pg_depend.c   |  40 +-
 src/backend/catalog/pg_operator.c |  19 +++
 src/backend/catalog/pg_proc.c |  17 ++-
 src/backend/catalog/pg_publication.c  |   7 +
 src/backend/catalog/pg_range.c|   6 +
 src/backend/catalog/pg_type.c |  39 ++
 src/backend/catalog/toasting.c|   1 +
 src/backend/commands/alter.c  |   4 +
 src/backend/commands/amcmds.c |   1 +
 src/backend/commands/cluster.c|   7 +
 src/backend/commands/event_trigger.c  |   1 +
 src/backend/commands/extension.c  |   5 +
 src/backend/commands/foreigncmds.c|   7 +
 src/backend/commands/functioncmds.c   |   6 +
 src/backend/commands/indexcmds.c  |   2 +
 src/backend/commands/opclasscmds.c|  18 +++
 src/backend/commands/operatorcmds.c   |  

Re: Surround CheckRelation[Oid]LockedByMe() with USE_ASSERT_CHECKING

2024-07-01 Thread Bertrand Drouvot
Hi,

On Mon, Jul 01, 2024 at 05:01:46PM +0900, Michael Paquier wrote:
> On Mon, Jul 01, 2024 at 06:42:46AM +, Bertrand Drouvot wrote:
> > While working on a rebase for [1] due to 0cecc908e97, I noticed that
> > CheckRelationLockedByMe() and CheckRelationOidLockedByMe() are used only in
> > assertions.
> > 
> > I think it would make sense to declare / define those functions only for
> > assert enabled build: please find attached a tiny patch doing so.
> > 
> > Thoughts?
> 
> Not convinced that's a good idea.  What about out-of-core code that
> may use these routines for runtime checks in non-assert builds?

Thanks for the feedback.

Yeah that could be an issue for CheckRelationLockedByMe() 
(CheckRelationOidLockedByMe()
is too recent to be a concern).

Having said that, 1. out-of-core code could want to use
CheckRelationOidLockedByMe() (probably if it was already using
CheckRelationLockedByMe()), and 2. I just submitted a rebase for [1] in
which I thought that using CheckRelationOidLockedByMe() would be a good
idea.

So I think that we can get rid of this proposal.

[1]: 
https://www.postgresql.org/message-id/ZoJ5RVtMziIa3TQp%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Converting README documentation to Markdown

2024-07-01 Thread Daniel Gustafsson
> On 28 Jun 2024, at 20:40, Peter Eisentraut  wrote:

> If we're going to do it, then I expect that the files are marked up correctly 
> at all times.

I agree with that. I don't think it will be a terribly high bar though since we
were pretty much already writing markdown.  We already have pandoc in the meson
toolchain, adding a target to check syntax should be doable.

> Conversely, what's the best that could happen?

One of the main goals of this work was to make sure the documentation renders
nicely on platforms which potential new contributors consider part of the
fabric of writing code.  We might not be on Github (and I'm not advocating that
we should) but any new contributor we want to attract is pretty likely to be
using it.  The best that can happen is that new contributors find the postgres
code more approachable and get excited about contributing to postgres.

--
Daniel Gustafsson





Re: Conflict Detection and Resolution

2024-07-01 Thread Amit Kapila
On Mon, Jul 1, 2024 at 11:47 AM Masahiko Sawada  wrote:
>
> On Thu, May 23, 2024 at 3:37 PM shveta malik  wrote:
> >
> > DELETE
> > 
> > Conflict Type:
> > 
> > delete_missing: An incoming delete is trying to delete a row on a
> > target node which does not exist.
>
> IIUC the 'delete_missing' conflict doesn't cover the case where an
> incoming delete message is trying to delete a row that has already
> been updated locally or by another node. I think in update/delete
> conflict situations, we need to resolve the conflicts based on commit
> timestamps like we do for update/update and insert/update conflicts.
>
> For example, suppose there are two node-A and node-B and setup
> bi-directional replication, and suppose further that both have the row
> with id = 1, consider the following sequences:
>
> 09:00:00  DELETE ... WHERE id = 1 on node-A.
> 09:00:05  UPDATE ... WHERE id = 1 on node-B.
> 09:00:10  node-A received the update message from node-B.
> 09:00:15  node-B received the delete message from node-A.
>
> At 09:00:10 on node-A, an update_deleted conflict is generated since
> the row on node-A is already deleted locally. Suppose that we use
> 'apply_or_skip' resolution for this conflict, we convert the update
> message into an insertion, so node-A now has the row with id = 1. At
> 09:00:15 on node-B, the incoming delete message is applied and deletes
> the row with id = 1, even though the row has already been modified
> locally. The node-A and node-B are now inconsistent. This
> inconsistency can be avoided by using 'skip' resolution for the
> 'update_deleted' conflict on node-A, and 'skip' resolution is the
> default method for that actually. However, if we handle it as
> 'update_missing', the 'apply_or_skip' resolution is used by default.
>
> IIUC with the proposed architecture, DELETE always takes precedence
> over UPDATE since both 'update_deleted' and 'update_missing' don't use
> commit timestamps to resolve the conflicts. As long as that is true, I
> think there is no use case for 'apply_or_skip' and 'apply_or_error'
> resolutions in update/delete conflict cases. In short, I think we need
> something like 'delete_differ' conflict type as well. FYI PGD and
> Oracle GoldenGate seem to have this conflict type[1][2].
>

Your explanation makes sense to me and I agree that we should
implement 'delete_differ' conflict type.

> The 'delete'_differ' conflict type would have at least
> 'latest_timestamp_wins' resolution. With the timestamp based
> resolution method, we would deal with update/delete conflicts as
> follows:
>
> 09:00:00: DELETE ... WHERE id = 1 on node-A.
> 09:00:05: UPDATE ... WHERE id = 1 on node-B.
> - the updated row doesn't have the origin since it's a local change.
> 09:00:10: node-A received the update message from node-B.
> - the incoming update message has the origin of node-B whereas the
> local row is already removed locally.
> - 'update_deleted' conflict is generated.
>

FYI, as of now, we don't have a reliable way to detect
'update_deleted' type of conflicts but we had some discussion about
the same [1].

> - do the insert of the new row instead, because the commit
> timestamp of UPDATE is newer than DELETE's one.
> 09:00:15: node-B received the delete message from node-A.
> - the incoming delete message has the origin of node-B whereas the
> (updated) row doesn't have the origin.
> - 'update_differ' conflict is generated.
> - discard DELETE, because the commit timestamp of UPDATE is newer
> than DELETE's one.
>
> As a result, both nodes have the new version row.
>

Right, it seems to me that we should implement 'latest_time_wins' if
we want consistency in such cases.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1Lj-PWrP789KnKxZydisHajd38rSihWXO8MVBLDwxG1Kg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Conflict Detection and Resolution

2024-07-01 Thread Amit Kapila
On Mon, Jul 1, 2024 at 1:35 PM Masahiko Sawada  wrote:
>
> Setting resolvers at the table level and the subscription level sounds
> good to me. Would DDLs for setting resolvers at the subscription level
> need the subscription name to be specified?
>

Yes, it should be part of the ALTER/CREATE SUBSCRIPTION command. One
idea could be to have syntax as follows:

ALTER SUBSCRIPTION name SET CONFLICT RESOLVER 'conflict_resolver' FOR
'conflict_type';
ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR 'conflict_type';

CREATE SUBSCRIPTION subscription_name CONNECTION 'conninfo'
PUBLICATION publication_name [, ...] CONFLICT RESOLVER
'conflict_resolver' FOR 'conflict_type';
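
For concreteness, a hypothetical instantiation of that syntax, reusing the
resolver and conflict-type names mentioned earlier in the thread (subscription
names and the connection string are illustrative only):

ALTER SUBSCRIPTION sub1 SET CONFLICT RESOLVER 'keep_local' FOR 'insert_exists';
ALTER SUBSCRIPTION sub1 RESET CONFLICT RESOLVER FOR 'insert_exists';

CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=postgres host=node_a'
    PUBLICATION pub1
    CONFLICT RESOLVER 'remote_apply' FOR 'insert_exists';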

> And another question: does a table-level resolver setting take
> precedence over all subscription-level resolver settings in the
> database?
>

Yes.

-- 
With Regards,
Amit Kapila.




Re: cfbot update: Using GitHub for patch review

2024-07-01 Thread Ashutosh Bapat
On Sat, Jun 29, 2024 at 2:12 PM Jelte Fennema-Nio 
wrote:

> On Sat, 29 Jun 2024 at 01:13, Thomas Munro  wrote:
> >
> > On Sat, Jun 29, 2024 at 1:10 AM Ashutosh Bapat
> >  wrote:
> > > I need to sign in to github to add my review comments. So those who do
> not have a github account can not use it for review. But I don't think that
> can be fixed. We need a way to know who left review comments.
> >
> > I don't think Jelte was talking about moving review discussion to
> > Github, just providing a link to *view* the patches there.
>
> Totally correct. And I realize now I should have called that out
> explicitly in the initial email.
>

While I personally would like to see that one day, getting a consensus and
changing the process for the whole community is a lot of effort. I didn't think
(or mean) that we would move our review process to Github with this change.
Sorry if it sounded like that.


>
> While I personally would love to be able to read & write comments on a
> Github PR, integrating that with the mailing list in a way that the
> community is happy with as a whole is no small task (both technically
> and politically).
>

It is not a small amount of work, I agree, but it may be a way forward.
Those who want to use PRs for review can do so as long as the reviews
are visible on the mailing list. Many of us already draft our review emails
similar to how they would look in a PR. If the PR system can send that
email on the reviewer's behalf (as if it's sent by the reviewer), it will
integrate well with the current process. People will learn, get used to it
and eventually move to PR-based reviews.


>
> So (for now) I took the easy way out and sidestepped all those
> difficulties, by making the github branches of the cfbot (which we
> already had) a bit more user friendly as a way to access patches in a
> read-only way.
>

+1.


-- 
Best Wishes,
Ashutosh Bapat


Re: Make tuple deformation faster

2024-07-01 Thread Matthias van de Meent
On Mon, 1 Jul 2024 at 10:56, David Rowley  wrote:
>
> Currently, TupleDescData contains the descriptor's attributes in a
> variable length array of FormData_pg_attribute allocated within the
> same allocation as the TupleDescData. According to my IDE,
> sizeof(FormData_pg_attribute) == 104 bytes. It's that large mainly due
> to attname being 64 bytes. The TupleDescData.attrs[] array could end
> up quite large on tables with many columns and that could result in
> poor CPU cache hit ratios when deforming tuples.
>
> Instead, we could make TupleDescData contain an out-of-line pointer to
> the array of FormData_pg_attribute and have a much more compact
> inlined array of some other struct that much more densely contains the
> fields required for tuple deformation. attname and many of the other
> fields are not required to deform a tuple.

+1

> I've attached a patch series which does this.
>
> 0001: Just fixes up some missing usages of TupleDescAttr(). (mostly
> missed by me, apparently :-( )
> 0002: Adjusts the TupleDescData.attrs array to make it out of line. I
> wanted to make sure nothing weird happened by doing this before doing
> the bulk of the other changes to add the new struct.
> 0003: Adds a very compact 8-byte struct named TupleDescDeformAttr,
> which can be used for tuple deformation. 8 columns fits on a 64-byte
> cacheline rather than 13 cachelines.

Cool, that's similar to, but even better than, my patch from 2021 over at [0].

One thing I'm slightly concerned about is that this allocates another
8 bytes for each attribute in the tuple descriptor. While that's not a
lot when compared with the ->attrs array, it's still quite a lot when
we might not care at all about this data; e.g. in temporary tuple
descriptors during execution, in intermediate planner nodes.

Did you test for performance gains (or losses) with an out-of-line
TupleDescDeformAttr array? One benefit from this would be that we
could reuse the deform array for suffix truncated TupleDescs, reuse of
which currently would require temporarily updating TupleDesc->natts
with a smaller value; but with out-of-line ->attrs and ->deform_attrs,
we could reuse these arrays between TupleDescs if one is shorter than
the other, but has otherwise fully matching attributes. I know that
btree split code would benefit from this, as it wouldn't have to
construct a full new TupleDesc when it creates a suffix-truncated
tuple during page splits.

> 0004: Adjusts the attalign to change it from char to uint8.  See below.
>
> The 0004 patch changes the TupleDescDeformAttr.attalign to a uint8
> rather than a char containing 'c', 's', 'i' or 'd'. This allows much
> more simple code in the att_align_nominal() macro. What's in master is
> quite a complex expression to evaluate every time we deform a column
> as it must translate: 'c' -> 1, 's' -> 2, 'i' -> 4, 'd' -> 8. If we
> just store that numeric value in the struct that macro can become a
> simple TYPEALIGN() so the operation becomes simple bit masking rather
> than a poorly branch predictable series of compare and jump.

+1, that's something I'd missed in my patches, and is probably the
largest contributor to the speedup.
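
To illustrate that point, a rough sketch (simplified; it assumes the new
field stores the alignment in bytes and reuses the existing TYPEALIGN macro)
of how the per-column alignment step could shrink:

/* roughly what master has to evaluate per column (simplified) */
#define att_align_nominal_old(cur_offset, attalignchar) \
	((attalignchar) == TYPALIGN_INT ? INTALIGN(cur_offset) : \
	 (attalignchar) == TYPALIGN_CHAR ? (uintptr_t) (cur_offset) : \
	 (attalignchar) == TYPALIGN_DOUBLE ? DOUBLEALIGN(cur_offset) : \
	 SHORTALIGN(cur_offset))

/* with a numeric alignment (1, 2, 4 or 8) stored in the deform struct */
#define att_align_nominal_new(cur_offset, attalign) \
	TYPEALIGN(attalign, cur_offset)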

> I'll stick this in the July CF. It would be good to get some feedback
> on the idea and feedback on whether more work on this is worthwhile.

Do you plan to remove the ->attcacheoff catalog field from the
FormData_pg_attribute, now that (with your patch) it isn't used
anymore as a placeholder field for fast (de)forming of tuples?

Kind regards,

Matthias van de Meent

[0] 
https://www.postgresql.org/message-id/CAEze2Wh8-metSryZX_Ubj-uv6kb%2B2YnzHAejmEdubjhmGusBAg%40mail.gmail.com




Re: Make tuple deformation faster

2024-07-01 Thread David Rowley
On Mon, 1 Jul 2024 at 21:17, Andy Fan  wrote:
> Yet another wonderful optimization! I just want to know how you found
> this optimization (CPU cache hits) case and decided it was worth spending
> time on, because before we invest our time in optimizing something, it is
> better to know that we can get some measurable improvement for the time
> spent. As for this case, 30% is really a huge number, even if it is an
> artificial case.
>
> Another case: Andrew introduced NullableDatum 5 years ago and said using
> it in TupleTableSlot could be CPU-cache friendly. I can follow that, but
> how much can it improve things in an ideal case; is it possible to
> forecast that somehow? I ask here because both cases are optimizing for
> the CPU cache.

Have a look at:

perf stat --pid=

On my AMD Zen4 machine running the 16 extra column test from the
script in my last email, I see:

$ echo master && perf stat --pid=389510 sleep 10
master

 Performance counter stats for process id '389510':

   9990.65 msec task-clock:u  #0.999 CPUs utilized
 0  context-switches:u#0.000 /sec
 0  cpu-migrations:u  #0.000 /sec
 0  page-faults:u #0.000 /sec
   49407204156  cycles:u  #4.945 GHz
  18529494  stalled-cycles-frontend:u #0.04% frontend
cycles idle
   8505168  stalled-cycles-backend:u  #0.02% backend cycles idle
  165442142326  instructions:u#3.35  insn per cycle
  #0.00  stalled
cycles per insn
   39409877343  branches:u#3.945 G/sec
 146350275  branch-misses:u   #0.37% of all branches

  10.001012132 seconds time elapsed

$ echo patched && perf stat --pid=380216 sleep 10
patched

 Performance counter stats for process id '380216':

   9989.14 msec task-clock:u  #0.998 CPUs utilized
 0  context-switches:u#0.000 /sec
 0  cpu-migrations:u  #0.000 /sec
 0  page-faults:u #0.000 /sec
   49781280456  cycles:u  #4.984 GHz
  22922276  stalled-cycles-frontend:u #0.05% frontend
cycles idle
  24259785  stalled-cycles-backend:u  #0.05% backend cycles idle
  213688149862  instructions:u#4.29  insn per cycle
  #0.00  stalled
cycles per insn
   44147675129  branches:u#4.420 G/sec
  14282567  branch-misses:u   #0.03% of all branches

  10.005034271 seconds time elapsed

You can see the branch predictor has done a *much* better job in the
patched code vs master with about 10x fewer misses.  This should have
helped contribute to the "insn per cycle" increase.  4.29 is quite
good for postgres. I often see that around 0.5. According to [1]
(relating to Zen4), "We get a ridiculous 12 NOPs per cycle out of the
micro-op cache". I'm unsure how micro-ops translate to "insn per
cycle" that's shown in perf stat. I thought 4-5 was about the maximum
pipeline size from today's era of CPUs. Maybe someone else can explain
better than I can. In more simple terms, generally, the higher the
"insn per cycle", the better. Also, the lower all of the idle and
branch miss percentages are that's generally also better. However,
you'll notice that the patched version has more front and backend
stalls. I assume this is due to performing more instructions per cycle
from improved branch prediction causing memory and instruction stalls
to occur more frequently, effectively (I think) it's just hitting the
next bottleneck(s) - memory and instruction decoding. At least, modern
CPUs should be able to out-pace RAM in many workloads, so perhaps it's
not that surprising that "backend cycles idle" has gone up due to such
a large increase in instructions per cycle due to improved branch
prediction.

It would be nice to see this tested on some modern Intel CPU. A 13th
series or 14th series, for example, or even any intel from the past 5
years would be better than nothing.

David

[1] 
https://chipsandcheese.com/2022/11/05/amds-zen-4-part-1-frontend-and-execution-engine/




Alias of VALUES RTE in explain plan

2024-07-01 Thread Ashutosh Bapat
Hi All,
While reviewing Richard's patch for grouping sets, I stumbled upon
following explain output

explain (costs off)
select distinct on (a, b) a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a))
order by a, b;
   QUERY PLAN

 Unique
   ->  Sort
 Sort Key: "*VALUES*".column1, "*VALUES*".column2
 ->  HashAggregate
   Hash Key: "*VALUES*".column1, "*VALUES*".column2
   Hash Key: "*VALUES*".column1
   ->  Values Scan on "*VALUES*"
 Filter: (column1 = column2)
(8 rows)

There is no VALUES.column1 or VALUES.column2 in the query. The aliases t.a
and t.b do not appear anywhere in the explain output. I think the explain
output should look like:
explain (costs off)
select distinct on (a, b) a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a))
order by a, b;
   QUERY PLAN

 Unique
   ->  Sort
 Sort Key: t.a, t.b
 ->  HashAggregate
   Hash Key: t.a, t.b
   Hash Key: t.a
   ->  Values Scan on "*VALUES*" t
 Filter: (a = b)
(8 rows)

I didn't get time to figure out the reason behind this, nor the history.
But I thought I would report it nonetheless.

-- 
Best Wishes,
Ashutosh Bapat


gamma() and lgamma() functions

2024-07-01 Thread Dean Rasheed
Attached is a patch adding support for the gamma and log-gamma
functions. See, for example:

https://en.wikipedia.org/wiki/Gamma_function

I think these are very useful general-purpose mathematical functions.
They're part of C99, and they're commonly included in other
mathematical libraries, such as the python math module, so I think
it's useful to make them available from SQL.

The error-handling for these functions seems to be a little trickier
than most, so that might need further discussion.
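
For context, the behaviour expected per the documentation examples in the
attached patch (recall that gamma(n) = (n-1)! for positive integer n):

SELECT gamma(0.5);    -- 1.772453850905516, i.e. sqrt(pi)
SELECT gamma(6);      -- 120, i.e. 5!
SELECT lgamma(1000);  -- 5905.220423209181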

Regards,
Dean
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 5a16910..6464a8b
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1399,6 +1399,27 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FRO
   

 
+ gamma
+
+gamma ( double precision )
+double precision
+   
+   
+Gamma function
+   
+   
+gamma(0.5)
+1.772453850905516
+   
+   
+gamma(6)
+120
+   
+  
+
+  
+   
+
  gcd
 
 gcd ( numeric_type, numeric_type )
@@ -1436,6 +1457,23 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FRO

   
 
+  
+   
+
+ lgamma
+
+lgamma ( double precision )
+double precision
+   
+   
+Natural logarithm of the absolute value of the gamma function
+   
+   
+lgamma(1000)
+5905.220423209181
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
new file mode 100644
index cbbb8ae..e43aa42
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2779,6 +2779,91 @@ derfc(PG_FUNCTION_ARGS)
 }
 
 
+/* == GAMMA FUNCTIONS == */
+
+
+/*
+ *		dgamma			- returns the gamma function of arg1
+ */
+Datum
+dgamma(PG_FUNCTION_ARGS)
+{
+	float8		arg1 = PG_GETARG_FLOAT8(0);
+	float8		result;
+
+	/*
+	 * Per the POSIX spec, return NaN if the input is NaN, +/-infinity if the
+	 * input is +/-0, NaN if the input is -infinity, and infinity if the input
+	 * is infinity.
+	 */
+	if (isnan(arg1))
+		PG_RETURN_FLOAT8(get_float8_nan());
+	if (arg1 == 0)
+	{
+		if (signbit(arg1))
+			PG_RETURN_FLOAT8(-get_float8_infinity());
+		else
+			PG_RETURN_FLOAT8(get_float8_infinity());
+	}
+	if (isinf(arg1))
+	{
+		if (arg1 < 0)
+			PG_RETURN_FLOAT8(get_float8_nan());
+		else
+			PG_RETURN_FLOAT8(get_float8_infinity());
+	}
+
+	/*
+	 * Note that the POSIX/C99 gamma function is called "tgamma", not "gamma".
+	 *
+	 * For negative integer inputs, it may raise a domain error or a pole
+	 * error, or return NaN or +/-infinity (the POSIX spec indicates that this
+	 * may change in future implementations).  Since we can't reliably detect
+	 * integer inputs above a certain size, and we can't distinguish this from
+	 * any other overflow error, just treat them all the same.
+	 */
+	errno = 0;
+	result = tgamma(arg1);
+	if (errno != 0 || isinf(result) || isnan(result))
+		float_overflow_error();
+
+	PG_RETURN_FLOAT8(result);
+}
+
+
+/*
+ *		dlgamma			- natural logarithm of absolute value of gamma of arg1
+ */
+Datum
+dlgamma(PG_FUNCTION_ARGS)
+{
+	float8		arg1 = PG_GETARG_FLOAT8(0);
+	float8		result;
+
+	/* Per the POSIX spec, return NaN if the input is NaN */
+	if (isnan(arg1))
+		PG_RETURN_FLOAT8(get_float8_nan());
+
+	errno = 0;
+	result = lgamma(arg1);
+
+	/*
+	 * If an ERANGE error occurs, it means there is an overflow.  This can
+	 * happen if the input is 0, a negative integer, -infinity, or infinity.
+	 * In each of those cases, the correct result is infinity.  However, it
+	 * can also happen in other cases where the correct result is finite, but
+	 * too large to be represented (e.g., if the input is larger than around
+	 * 2.5e305).  There seems to be no way to distinguish those cases, so we
+	 * just return infinity for them too.  That's not ideal, but in practice,
+	 * lgamma() is much less likely to overflow than tgamma(), so it seems OK.
+	 */
+	if (unlikely(errno == ERANGE))
+		result = get_float8_infinity();
+
+	PG_RETURN_FLOAT8(result);
+}
+
+
 
 /*
  *		=
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
new file mode 100644
index 6a5476d..acaf4fb
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3484,6 +3484,13 @@
   proname => 'erfc', prorettype => 'float8', proargtypes => 'float8',
   prosrc => 'derfc' },
 
+{ oid => '8702', descr => 'gamma function',
+  proname => 'gamma', prorettype => 'float8', proargtypes => 'float8',
+  prosrc => 'dgamma' },
+{ oid => '8703', descr => 'natural logarithm of absolute value of gamma function',
+  proname => 'lgamma', prorettype => 'float8', proargtypes => 'float8',
+  prosrc => 'dlgamma' },
+
 { oid => '1618',
   proname => 'interval_mul', prorettype => 'interval',
   proargtypes => 'interval float8', prosrc => 'interval_mul' },
diff --git a/s

Re: New standby_slot_names GUC in PG 17

2024-07-01 Thread Amit Kapila
On Thu, Jun 27, 2024 at 7:14 AM Masahiko Sawada  wrote:
>
> On Wed, Jun 26, 2024 at 6:15 PM Zhijie Hou (Fujitsu)
>  wrote:
>
> Thank you for updating the patch. The v2 patch looks good to me.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: pgsql: Add more SQL/JSON constructor functions

2024-07-01 Thread Amit Langote
On Sun, Jun 30, 2024 at 3:56 AM Tom Lane  wrote:
> Alvaro Herrera  writes:
> >> +/*
> >> + * For domains, consider the base type's typmod to decide whether to 
> >> setup
> >> + * an implicit or explicit cast.
> >> + */
> >> +if (get_typtype(returning->typid) == TYPTYPE_DOMAIN)
> >> +(void) getBaseTypeAndTypmod(returning->typid, &baseTypmod);
>
> > TBH I'm not super clear on why we decide on explicit or implicit cast
> > based on presence of a typmod.  Why isn't it better to always use an
> > implicit one?
>
> Hmm ... there are a bunch of existing places that seem to have similar
> logic, but they are all in new-ish SQL/JSON functionality, and I would
> not be surprised if they are all wrong.  parse_coerce.c is quite
> opinionated about what a domain's typtypmod means (see comments in
> coerce_type() for instance); see also the logic in coerce_to_domain:
>
>  * If the domain applies a typmod to its base type, build the appropriate
>  * coercion step.  Mark it implicit for display purposes, because we don't
>  * want it shown separately by ruleutils.c; but the isExplicit flag passed
>  * to the conversion function depends on the manner in which the domain
>  * coercion is invoked, so that the semantics of implicit and explicit
>  * coercion differ.  (Is that really the behavior we want?)
>
> I don't think that this SQL/JSON behavior quite matches that.

The reason I decided to go for the implicit cast only when there is a
typmod is that the behavior with COERCION_EXPLICIT is only problematic
when there's a typmod because of this code in
build_coercion_expression:

if (nargs == 3)
{
/* Pass it a boolean isExplicit parameter, too */
cons = makeConst(BOOLOID,
 -1,
 InvalidOid,
 sizeof(bool),
 BoolGetDatum(ccontext == COERCION_EXPLICIT),
 false,
 true);

args = lappend(args, cons);
}

Yeah, we could have fixed that by always using COERCION_IMPLICIT for
SQL/JSON but, as Jian said, we don't have many of the implicit casts that these
SQL/JSON functions would need, which is why I guess we ended up with
COERCION_EXPLICIT here in the first place.

One option I hadn't tried was using COERCION_ASSIGNMENT instead, which
seems to give coerceJsonFuncExpr() the casts it needs with the
behavior it wants, so how about applying the attached?

-- 
Thanks, Amit Langote


v6-0001-SQL-JSON-Rethink-c2d93c3802b.patch
Description: Binary data


Re: Make tuple deformation faster

2024-07-01 Thread David Rowley
On Mon, 1 Jul 2024 at 22:07, Matthias van de Meent
 wrote:
> Cool, that's similar to, but even better than, my patch from 2021 over at [0].

I'll have a read of that. Thanks for pointing it out.

> One thing I'm slightly concerned about is that this allocates another
> 8 bytes for each attribute in the tuple descriptor. While that's not a
> lot when compared with the ->attrs array, it's still quite a lot when
> we might not care at all about this data; e.g. in temporary tuple
> descriptors during execution, in intermediate planner nodes.

I've not done it in the patch, but one way to get some of that back is
to ditch pg_attribute.attcacheoff. There's no need for it after this
patch.  That's only 4 out of 8 bytes, however. I think in most cases
due to FormData_pg_attribute being so huge, the aset.c power-of-2
roundup behaviour will be unlikely to cross a power-of-2 boundary.

The following demonstrates which column counts that actually makes a
difference with:

select c as n_cols,old_bytes, new_bytes from (select c,24+104*c as
old_bytes, 32+100*c+8*c as new_bytes from generate_series(1,1024) c)
where position('1' in old_bytes::bit(32)::text) != position('1' in
new_bytes::bit(32)::text);

That returns just 46 column counts out of 1024 where we cross a power
of 2 boundaries with the patched code that we didn't cross in master.
Of course, larger pallocs will result in a malloc() directly, so
perhaps that's not a good measure.  At least for smaller column counts
it should be mainly the same amount of memory used. There are only 6
rows in there for column counts below 100. I think if we were worried
about memory there are likely 100 other things we could do to reclaim
some. It would only take some shuffling of fields in RelationData. I
count 50 bytes of holes in that struct out of the 488 bytes. There are
probably a few that could be moved without upsetting the
struct-field-order-lords too much.

> Did you test for performance gains (or losses) with an out-of-line
> TupleDescDeformAttr array? One benefit from this would be that we
> could reuse the deform array for suffix truncated TupleDescs, reuse of
> which currently would require temporarily updating TupleDesc->natts
> with a smaller value; but with out-of-line ->attrs and ->deform_attrs,
> we could reuse these arrays between TupleDescs if one is shorter than
> the other, but has otherwise fully matching attributes. I know that
> btree split code would benefit from this, as it wouldn't have to
> construct a full new TupleDesc when it creates a suffix-truncated
> tuple during page splits.

No, but it sounds easy to test as patch 0002 moves that out of line
and does nothing else.

> > 0004: Adjusts the attalign to change it from char to uint8.  See below.
> >
> > The 0004 patch changes the TupleDescDeformAttr.attalign to a uint8
> > rather than a char containing 'c', 's', 'i' or 'd'. This allows much
> > more simple code in the att_align_nominal() macro. What's in master is
> > quite a complex expression to evaluate every time we deform a column
> > as it must translate: 'c' -> 1, 's' -> 2, 'i' -> 4, 'd' -> 8. If we
> > just store that numeric value in the struct that macro can become a
> > simple TYPEALIGN() so the operation becomes simple bit masking rather
> > than a poorly branch predictable series of compare and jump.
>
> +1, that's something I'd missed in my patches, and is probably the
> largest contributor to the speedup.

I think so too and I did consider if we should try and do that to
pg_attribute, renaming the column to attalignby. I started but didn't
finish a patch for that.

> > I'll stick this in the July CF. It would be good to get some feedback
> > on the idea and feedback on whether more work on this is worthwhile.
>
> Do you plan to remove the ->attcacheoff catalog field from the
> FormData_pg_attribute, now that (with your patch) it isn't used
> anymore as a placeholder field for fast (de)forming of tuples?

Yes, I plan to do that once I get more confidence I'm on to a winner here.

Thanks for having a look at this.

David




Re: Virtual generated columns

2024-07-01 Thread Peter Eisentraut

On 17.06.24 21:31, Tomasz Rybak wrote:

v1-0001-Rename-regress-test-generated-to-generated_stored.patch:
no objections here, makes sense as preparation for future changes

v1-0002-Put-generated_stored-test-objects-in-a-schema.patch:
also no objections.
OTOH other tests (like publication.out, rowsecurity.out, stats_ext.out,
tablespace.out) are creating schemas and later dropping them - so here
it might also make sense to drop schema at the end of testing.


The existing tests for generated columns don't drop what they create at 
the end, which can be useful for pg_upgrade testing for example.  So 
unless there are specific reasons to change it, I would leave that as is.


Other tests might have other reasons.  For example, publications or row 
security might interfere with many other tests.



v1-0003-Remove-useless-initializations.patch:
All other cases (I checked directory src/backend/utils/cache)
calling MemoryContextAllocZero only initialize fields when they
are non-zero, so removing partial initialization with false brings
consistency to the code.

v1-0004-Remove-useless-code.patch:
Patch removes filling in of constraints from function
BuildDescForRelation. This function is only called from file
view.c and tablecmds.c (twice). In none of those cases
result->constr is used, so proposed change makes sense.
While I do not know code well, so might be wrong here,
I would apply this patch.


I have committed these two now.




Re: Virtual generated columns

2024-07-01 Thread Peter Eisentraut

On 28.06.24 02:00, jian he wrote:

The test structure you made (generated_stored.sql,
generated_virtual.sql) looks OK to me.
But do we need to reset the search_path at the end of
generated_stored.sql and generated_virtual.sql?


No, the session ends at the end of the test file, so we don't need to 
reset session state.



+ /*
+ * TODO: This could be done, but it would need a different implementation:
+ * no rewriting, but still need to recheck any constraints.
+ */
+ if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual
generated columns"),
+ errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.",
+   colName, RelationGetRelationName(rel;

minor typo, should be
+ errmsg("ALTER TABLE SET EXPRESSION is not supported for virtual
generated columns"),


This style "ALTER TABLE / something else" is also used for other error 
messages related to ALTER TABLE subcommands, so I am using the same here.



insert/update/delete/merge returning have problems:
CREATE TABLE t2 (
a int ,
b int GENERATED ALWAYS AS (a * 2),
d int default 22);
insert into t2(a) select g from generate_series(1,10) g;

insert into t2 select 100 returning *, (select t2.b), t2.b = t2.a * 2;
update t2 set a = 12 returning *, (select t2.b), t2.b = t2.a * 2;
update t2 set a = 12 returning *, (select (select t2.b)), t2.b = t2.a * 2;
delete from t2 where t2.b = t2.a * 2 returning *, 1,((select t2.b));

Currently, for all of these queries, the error message is "unexpected virtual
generated column reference".
Do we expect the above queries to work?


Yes, this is a bug.  I'm looking into it.


issue with merge:
CREATE TABLE t0 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
insert into t0(a) select g from generate_series(1,10) g;
MERGE INTO t0 t USING t0 AS s ON 2 * t.a = s.b WHEN MATCHED THEN
DELETE returning *;

the above query returns zero rows, but for stored generated columns it
will return 10 rows.

In transformMergeStmt(ParseState *pstate, MergeStmt *stmt), adding
`qry->hasGeneratedVirtual = pstate->p_hasGeneratedVirtual;`
before
`assign_query_collations(pstate, qry);`
solves the problem.


Good catch.  Will fix.

Thanks for this review.  I will work on fixing the issues above and come 
back with a new patch set.






Re: [PATCH] Fix docs to use canonical links

2024-07-01 Thread Joel Jacobson
On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
> Avoding redirects is generally a good thing, not everyone is on lightning fast
> internet.  Wikipedia is however not doing any 30X redirects so it's not really
> an issue for those links, it's all 200 requests.

Yes, I noticed that too when observing the HTTPS traffic, so no issue there,
except that it's a bit annoying that the address bar suddenly changes.

However, I think David J had another good argument:

"If we are making wikipedia our authority we might as well use their standard 
for naming."

/Joel




Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2024-07-01 Thread David Christensen
On Jul 1, 2024, at 12:56 AM, Ashutosh Bapat  wrote:
On Wed, Jun 12, 2024 at 4:22 AM David Christensen  wrote:
On Mon, Jun 10, 2024 at 8:15 AM Robert Haas  wrote:
>
> On Mon, Jun 10, 2024 at 3:09 AM Ashutosh Bapat
>  wrote:
> > This is just one instance of measurements. If I run the experiment multiple times the results and the patterns will vary. Usually I have found planning time to vary within 5% for regular tables and within 9% for partitioned tables with a large number of partitions. Below are measurements with the experiment repeated multiple times. For a given number of partitioned tables (each with 1000 partitions) being joined, planning time is measured 10 consecutive times. For this set of 10 runs we calculate average and standard deviation of planning time. Such 10 sets are sampled. This means planning time is sampled 100 times in total with and without patch respectively. Measurements with master and patched are reported in the attached excel sheet.
>
> Well, this is fine then I guess, but if the original results weren't
> stable enough for people to draw conclusions from, then it's better
> not to post them, and instead do this work to get results that are
> stable before posting.

Just doing a quick code review of the structure and the caller, I'd
agree that this is properly hoisting the invariant, so don't see that
it should contribute to any performance regressions.  To the extent
that it's called multiple times I can see that it's an improvement,
with minimal code shuffling it seems like a sensible change (even in
the single-caller case).

In short +1 from me.

Hi David,

Do you think it needs more review or we can change it to "ready for committer"?

I think it’s ready from my standpoint.

David

Re: Make tuple deformation faster

2024-07-01 Thread Matthias van de Meent
On Mon, 1 Jul 2024 at 12:49, David Rowley  wrote:
>
> On Mon, 1 Jul 2024 at 22:07, Matthias van de Meent
>  wrote:
> > One thing I'm slightly concerned about is that this allocates another
> > 8 bytes for each attribute in the tuple descriptor. While that's not a
> > lot when compared with the ->attrs array, it's still quite a lot when
> > we might not care at all about this data; e.g. in temporary tuple
> > descriptors during execution, in intermediate planner nodes.
>
> I've not done it in the patch, but one way to get some of that back is
> to ditch pg_attribute.attcacheoff. There's no need for it after this
> patch.  That's only 4 out of 8 bytes, however.

FormData_pg_attribute as a C struct has 4-byte alignment; AFAIK it
doesn't have any fields that require 8-byte alignment? Only on 64-bit
systems we align the tuples on pages with 8-byte alignment, but
in-memory arrays of the struct wouldn't have to deal with that, AFAIK.

> I think in most cases
> due to FormData_pg_attribute being so huge, the aset.c power-of-2
> roundup behaviour will be unlikely to cross a power-of-2 boundary.

ASet isn't the only allocator, but default enough for this to make sense, yes.

> The following demonstrates which column counts that actually makes a
> difference with:
>
> select c as n_cols,old_bytes, new_bytes from (select c,24+104*c as
> old_bytes, 32+100*c+8*c as new_bytes from generate_series(1,1024) c)
> where position('1' in old_bytes::bit(32)::text) != position('1' in
> new_bytes::bit(32)::text);
>
> That returns just 46 column counts out of 1024 where we cross a power
> of 2 boundaries with the patched code that we didn't cross in master.
> Of course, larger pallocs will result in a malloc() directly, so
> perhaps that's not a good measure.  At least for smaller column counts
> it should be mainly the same amount of memory used. There are only 6
> rows in there for column counts below 100. I think if we were worried
> about memory there are likely 100 other things we could do to reclaim
> some. It would only take some shuffling of fields in RelationData. I
> count 50 bytes of holes in that struct out of the 488 bytes. There are
> probably a few that could be moved without upsetting the
> struct-field-order-lords too much.

I'd love for RelationData to be split into IndexRelation,
TableRelation, ForeignTableRelation, etc., as there's a lot of wastage
caused by exclusive fields, too.

> > > 0004: Adjusts the attalign to change it from char to uint8.  See below.
> > >
> > > The 0004 patch changes the TupleDescDeformAttr.attalign to a uint8
> > > rather than a char containing 'c', 's', 'i' or 'd'. This allows much
> > > more simple code in the att_align_nominal() macro. What's in master is
> > > quite a complex expression to evaluate every time we deform a column
> > > as it must translate: 'c' -> 1, 's' -> 2, 'i' -> 4, 'd' -> 8. If we
> > > just store that numeric value in the struct that macro can become a
> > > simple TYPEALIGN() so the operation becomes simple bit masking rather
> > > than a poorly branch predictable series of compare and jump.
> >
> > +1, that's something I'd missed in my patches, and is probably the
> > largest contributor to the speedup.
>
> I think so too and I did consider if we should try and do that to
> pg_attribute, renaming the column to attalignby. I started but didn't
> finish a patch for that.

I'm not sure we have a pg_type entry that supports numeric
attalign values without increasing the size of the field, as the one
single-byte integer-like type (char) is always used as a printable
character, and implied to always be printable through its usage in
e.g. nodeToString infrastructure.

I'd love to have a better option, but we don't seem to have one yet.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: pg_createsubscriber: drop pre-existing subscriptions from the converted node

2024-07-01 Thread Amit Kapila
On Mon, Jul 1, 2024 at 11:44 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> > You can create a dummy subscription on node_p
> > and do a test similar to what we are doing in "# Create failover slot
> > to test its removal".
>
> Your approach looks better than mine. I followed the approach.
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

2024-07-01 Thread Dagfinn Ilmari Mannsåker
"Joel Jacobson"  writes:

> Hello hackers,
>
> Attached patch introduces an optimization of mul_var() in numeric.c,
> targeting cases where the multiplicands consist of only one or two
> base-NBASE digits. Such small multiplicands can fit into an int64 and
> thus be computed directly, resulting in a significant performance
> improvement, between 26% - 34% benchmarked on Intel Core i9-14900K.
>
> This optimization is similar to commit d1b307eef2, that also targeted
> one and two base-NBASE digit operands, but optimized div_var().

div_var() also has an optimisation for 3- and 4-digit operands under
HAVE_INT128 (added in commit 0aa38db56bf), would that make sense in
mul_var() too?

> Regards,
> Joel

- ilmari




RE: New standby_slot_names GUC in PG 17

2024-07-01 Thread Zhijie Hou (Fujitsu)
On Monday, July 1, 2024 6:45 PM Amit Kapila  wrote:
> 
> On Thu, Jun 27, 2024 at 7:14 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, Jun 26, 2024 at 6:15 PM Zhijie Hou (Fujitsu)
> >  wrote:
> >
> > Thank you for updating the patch. The v2 patch looks good to me.
> >
> 
> Pushed.

Thanks! I am attaching another patch to modify the release note as discussed.

Best Regards,
Hou zj


0001-add-recent-renaming-commit-to-the-release-note.patch
Description: 0001-add-recent-renaming-commit-to-the-release-note.patch


Re: [PATCH] Fix docs to use canonical links

2024-07-01 Thread Daniel Gustafsson
> On 1 Jul 2024, at 13:09, Joel Jacobson  wrote:
> 
> On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
>> Avoding redirects is generally a good thing, not everyone is on lightning 
>> fast
>> internet.  Wikipedia is however not doing any 30X redirects so it's not 
>> really
>> an issue for those links, it's all 200 requests.
> 
> Yes, I noticed that too when observing the HTTPS traffic, so no issue there,
> except that it's a bit annoying that the address bar suddenly changes.

Right, I was unclear, I'm not advocating against changing.  It won't move the
needle compared to 30X redirects but it also won't hurt.

> However, I think David J had another good argument:
> 
> "If we are making wikipedia our authority we might as well use their standard 
> for naming."

It's a moving target, but so is most if not all links.

--
Daniel Gustafsson





psql: Add leakproof field to \dAo+ meta-command results

2024-07-01 Thread Yugo NAGATA
Hi,

I would like to propose to add a new field to psql's \dAo+ meta-command
to show whether the underlying function of an operator is leak-proof.

This idea is inspired by [1], which points out that some indexes use non-LEAKPROOF
functions under their associated operators and, as a result, cannot be selected
for queries with security_barrier views or row-level security policies.
The original proposal was to add a query over the system catalogs for looking up
non-leakproof operators to the documentation, but I thought it would be more useful
to improve the \dAo output rather than putting such a query in the documentation.
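
For reference, the kind of catalog query the original proposal had in mind
would look roughly like this (a sketch only, not part of the patch):

SELECT am.amname, opf.opfname, o.oprname, p.proname
FROM pg_amop ao
  JOIN pg_opfamily opf ON opf.oid = ao.amopfamily
  JOIN pg_am am ON am.oid = opf.opfmethod
  JOIN pg_operator o ON o.oid = ao.amopopr
  JOIN pg_proc p ON p.oid = o.oprcode
WHERE NOT p.proleakproof   -- operators whose underlying function is not LEAKPROOF
ORDER BY am.amname, opf.opfname, o.oprname;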

The attached patch adds the field to \dAo+ and also a documentation paragraph that
explains the relation between indexes and security quals, referring readers to the
\dAo+ meta-command.

[1] 
https://www.postgresql.org/message-id/raw/5af3bf0c-5e0c-4128-81dc-084c5258b1af%40code406.com

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From 3417c4cce46ec068464b7069428e7f4a9a2cd07d Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Mon, 1 Jul 2024 16:16:39 +0900
Subject: [PATCH] psql: Add leakproof field to \dAo+ meta-command results

This adds a field that shows whether the underlying function of an
operator associated with operator families is leak-proof.
It is useful for checking whether an index can be used with security_barrier
views or row-level security policies when the query's WHERE
clause contains an operator which is associated with the index.
---
 doc/src/sgml/ref/psql-ref.sgml |  3 ++-
 doc/src/sgml/rules.sgml| 10 ++
 src/bin/psql/describe.c| 17 +
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 830306ea1e..d59afa7524 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1362,7 +1362,8 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 is specified, only members of operator families whose names match that
 pattern are listed.
 If + is appended to the command name, each operator
-is listed with its sort operator family (if it is an ordering operator).
+is listed with its sort operator family (if it is an ordering operator),
+and whether it is leak-proof.
 
 
   
diff --git a/doc/src/sgml/rules.sgml b/doc/src/sgml/rules.sgml
index 7a928bd7b9..5e17031ee9 100644
--- a/doc/src/sgml/rules.sgml
+++ b/doc/src/sgml/rules.sgml
@@ -2167,6 +2167,16 @@ CREATE VIEW phone_number WITH (security_barrier) AS
 view's row filters.
 
 
+
+For example, an index scan can not be selected for queries with
+security_barrier views or row-level security policies if an
+operator used in the WHERE clause is associated with the
+operator family of the index, but its underlying function is not marked
+LEAKPROOF. The  program's
+\dAo+ meta-command is useful for listing the operators
+with associated operator families and whether it is leak-proof.
+
+
 
 It is important to understand that even a view created with the
 security_barrier option is intended to be secure only
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index f67bf0b892..243f099017 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6872,7 +6872,7 @@ listOpFamilyOperators(const char *access_method_pattern,
 	printQueryOpt myopt = pset.popt;
 	bool		have_where = false;
 
-	static const bool translate_columns[] = {false, false, false, false, false, false};
+	static const bool translate_columns[] = {false, false, false, false, false, false, false};
 
 	initPQExpBuffer(&buf);
 
@@ -6900,8 +6900,15 @@ listOpFamilyOperators(const char *access_method_pattern,
 
 	if (verbose)
 		appendPQExpBuffer(&buf,
-		  ", ofs.opfname AS \"%s\"\n",
-		  gettext_noop("Sort opfamily"));
+		  ", ofs.opfname AS \"%s\"\n,"
+		  "  CASE\n"
+		  "   WHEN p.proleakproof THEN '%s'\n"
+		  "   ELSE '%s'\n"
+		  " END AS \"%s\"\n",
+		  gettext_noop("Sort opfamily"),
+		  gettext_noop("yes"),
+		  gettext_noop("no"),
+		  gettext_noop("Leak-proof"));
 	appendPQExpBufferStr(&buf,
 		 "FROM pg_catalog.pg_amop o\n"
 		 "  LEFT JOIN pg_catalog.pg_opfamily of ON of.oid = o.amopfamily\n"
@@ -6909,7 +6916,9 @@ listOpFamilyOperators(const char *access_method_pattern,
 		 "  LEFT JOIN pg_catalog.pg_namespace nsf ON of.opfnamespace = nsf.oid\n");
 	if (verbose)
 		appendPQExpBufferStr(&buf,
-			 "  LEFT JOIN pg_catalog.pg_opfamily ofs ON ofs.oid = o.amopsortfamily\n");
+			 "  LEFT JOIN pg_catalog.pg_opfamily ofs ON ofs.oid = o.amopsortfamily\n"
+			 "  LEFT JOIN pg_catalog.pg_operator op ON op.oid = o.amopopr\n"
+			 "  LEFT JOIN pg_catalog.pg_proc p ON p.oid = op.oprcode\n");
 
 	if (access_method_pattern)
 	{
-- 
2.25.1



Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

2024-07-01 Thread Joel Jacobson
On Mon, Jul 1, 2024, at 14:25, Dagfinn Ilmari Mannsåker wrote:
> div_var() also has an optimisation for 3- and 4-digit operands under
> HAVE_INT128 (added in commit 0aa38db56bf), would that make sense in
> mul_var() too?

I considered it, but it only gives a marginal speed-up on Intel Core i9-14900K,
and is actually slower on Apple M3 Max.
Not really sure why. Maybe the code I tried can be optimized further:

```
#ifdef HAVE_INT128
    /*
     * If var1 and var2 are up to four digits, their product will fit in
     * an int128 and can be computed directly, which is significantly faster.
     */
    if (var2ndigits <= 4)
    {
        int128      product = 0;

        switch (var1ndigits)
        {
            case 1:
                product = var1digits[0];
                break;
            case 2:
                product = var1digits[0] * NBASE + var1digits[1];
                break;
            case 3:
                product = ((int128) var1digits[0] * NBASE + var1digits[1])
                    * NBASE + var1digits[2];
                break;
            case 4:
                product = (((int128) var1digits[0] * NBASE + var1digits[1])
                    * NBASE + var1digits[2]) * NBASE + var1digits[3];
                break;
        }

        /* multiply in var2, so the var1 factor is preserved in all cases */
        switch (var2ndigits)
        {
            case 1:
                product *= var2digits[0];
                break;
            case 2:
                product *= var2digits[0] * NBASE + var2digits[1];
                break;
            case 3:
                product *= ((int128) var2digits[0] * NBASE + var2digits[1])
                    * NBASE + var2digits[2];
                break;
            case 4:
                product *= (((int128) var2digits[0] * NBASE + var2digits[1])
                    * NBASE + var2digits[2]) * NBASE + var2digits[3];
                break;
        }

        alloc_var(result, res_ndigits);
        res_digits = result->digits;
        for (i = res_ndigits - 1; i >= 0; i--)
        {
            res_digits[i] = product % NBASE;
            product /= NBASE;
        }
        Assert(product == 0);

        /*
         * Finally, round the result to the requested precision.
         */
        result->weight = res_weight;
        result->sign = res_sign;

        /* Round to target rscale (and set result->dscale) */
        round_var(result, rscale);

        /* Strip leading and trailing zeroes */
        strip_var(result);

        return;
    }
#endif
```

Benchmark 1, testing 2 ndigits * 2 ndigits:

SELECT
   timeit.pretty_time(total_time_a / 1e6 / executions,3) AS execution_time_a,
   timeit.pretty_time(total_time_b / 1e6 / executions,3) AS execution_time_b,
   total_time_a::numeric/total_time_b AS performance_ratio
FROM timeit.cmp(
   'numeric_mul',
   'numeric_mul_patched',
   input_values := ARRAY[
  '',
  ''
   ],
   min_time := 100,
   timeout := '10 s'
);

Apple M3 Max:

 execution_time_a | execution_time_b | performance_ratio
--+--+
 32.2 ns  | 20.5 ns  | 1.5700112246809388
(1 row)

Intel Core i9-14900K:

 execution_time_a | execution_time_b | performance_ratio
--+--+
 30.2 ns  | 21.4 ns  | 1.4113042510107371
(1 row)

So 57% and 41% faster.

Benchmark 2, testing 4 ndigits * 4 ndigits:

SELECT
   timeit.pretty_time(total_time_a / 1e6 / executions,3) AS execution_time_a,
   timeit.pretty_time(total_time_b / 1e6 / executions,3) AS execution_time_b,
   total_time_a::numeric/total_time_b AS performance_ratio
FROM timeit.cmp(
   'numeric_mul',
   'numeric_mul_patched',
   input_values := ARRAY[
  '',
  ''
   ],
   min_time := 100,
   timeout := '10 s'
);

Apple M3 Max:

 execution_time_a | execution_time_b |   performance_ratio
--+--+
 41.9 ns  | 51.3 ns  | 0.81733655797170943614
(1 row)

Intel Core i9-14900K:

 execution_time_a | execution_time_b | performance_ratio
--+--+
 40 ns| 38 ns| 1.0515610914706320
(1 row)

So 18% slower on Apple M3 Max and just 5% faster on Intel Core i9-14900

Re: PATCH: Add query for operators unusable with RLS to documentation

2024-07-01 Thread Yugo NAGATA
On Sun, 23 Jun 2024 19:14:09 +0900
Yugo NAGATA  wrote:

> and Operator Families"? Additionally, is it useful to add LEAKPROOF 
> information
> to the result of psql \dAo(+) meta-comand, or a function that can check given 
> index
> or operator is leakproof or not?

I wrote a patch to implement the proposal above and submitted it in a new
thread [1].

[1] 
https://www.postgresql.org/message-id/20240701220817.483f9b645b95611f8b1f65da%40sranhm.sraoss.co.jp

Regards,
Yugo Nagata


> Regards,
> Yugo Nagata
> 
> > Thanks,
> > Josh
> > 
> > [1] 
> > https://www.postgresql.org/message-id/CAGrP7a2t%2BJbeuxpQY%2BRSvNe4fr3%2B%3D%3DUmyimwV0GCD%2BwcrSSb%3Dw%40mail.gmail.com
> 
> 
> -- 
> Yugo NAGATA 
> 
> 


-- 
Yugo NAGATA 




Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

2024-07-01 Thread Joel Jacobson
On Mon, Jul 1, 2024, at 15:11, Joel Jacobson wrote:
> Not really sure why. Maybe the code I tried can be optimized further:

If anyone wants to experiment with the int128 version,
here is a patch that adds a separate numeric_mul_patched() function,
so it's easier to benchmark against the unmodified numeric_mul().

/Joel

0001-Optimize-mul_var-for-var2ndigits-4.patch
Description: Binary data


Re: Reg: Alternate way of hashing database role passwords

2024-07-01 Thread Daniel Gustafsson
> On 26 Jun 2024, at 18:59, Robert Haas  wrote:

> However, it seems like SCRAM is designed so
> that different hash functions can be substituted into it, so what I'm
> hoping is that we can keep SCRAM and just replace SCRAM-SHA-256 with
> SCRAM-WHATEVER when SHA-256 starts to look too weak.

Correct, SCRAM is an authentication method which can use different hashing
algorithms.  There are current drafts for SCRAM-SHA-512 and SCRAM-SHA3-512, but they
are some way away from being standardized.  If they become standards at some
point it would be reasonable to extend our support, but until then there is no evidence
that what we have is insecure AFAIK.

https://datatracker.ietf.org/doc/html/draft-melnikov-scram-sha-512
https://datatracker.ietf.org/doc/html/draft-melnikov-scram-sha3-512

> What I find a bit surprising about Anbazhagan's question is that he
> asks about PBKDF2, which seems to be part of SCRAM already.

In scram_SaltedPassword() we perform PBKDF2 with HMAC as the pseudorandom
function.
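
(For reference, the SCRAM-SHA-256 verifiers built this way are what ends up in
pg_authid; a quick way to inspect them, as superuser, is the sketch below. The
exact salt and keys will of course differ per role.)

SELECT rolname, rolpassword FROM pg_authid WHERE rolpassword IS NOT NULL;
-- rolpassword looks like: SCRAM-SHA-256$4096:<salt>$<StoredKey>:<ServerKey>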

--
Daniel Gustafsson





Re: Should we document how column DEFAULT expressions work?

2024-07-01 Thread James Coleman
On Sun, Jun 30, 2024 at 8:16 PM David G. Johnston
 wrote:
>
> On Sun, Jun 30, 2024 at 4:55 PM David Rowley  wrote:
>>
>>
>> I'd like to know what led someone down the path of doing something
>> like DEFAULT 'now()'::timestamp in a CREATE TABLE. Could it be a
>> faulty migration tool that created these and people copy them thinking
>> it's a legitimate syntax?
>>
>
> My thought process on this used to be:  Provide a text string of the 
> expression that is then stored within the catalog and eval'd during runtime.  
> If the only thing you are providing is a single literal and not some compound 
> expression it isn't that obvious that you are supposed to provide an unquoted 
> expression - which feels like it should be immediately evaluated - versus 
> something that is a constant.  Kinda like dynamic SQL.

I have a similar story to tell: I've honestly never thought about it
deeply until I started this thread, but just through experimentation a
few things were obvious:

- now() as a function call gives you the current timestamp in a query
- now() as a function call in a DDL DEFAULT clause sets that as a
default function call
- Quoting that function call (the natural thing to try, I think, if
you've already done the first two, since some examples online show it
quoted) gives you DDL-time evaluation (sketched below).
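
A minimal sketch of those observations (made-up table names, using the
plain special value 'now' for clarity):

-- now() unquoted: stored as a function call, evaluated at each INSERT
CREATE TABLE t_func (ts timestamptz DEFAULT now());

-- quoted 'now': folded to a constant when the DDL runs, so later inserts
-- keep getting the table-creation timestamp
CREATE TABLE t_literal (ts timestamptz DEFAULT 'now'::timestamptz);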

So I suspect -- though I've been doing this for so long I couldn't
tell you for certain -- that I largely intuited the behavior by
observation.

And similarly to David J. I'd then assumed -- but never had a need to
test it -- that this was generalized.

I think DDL is also different conceptually from SQL/DML here in a kind
of insidious way: the "bare" function call in DEFAULT is *not*
executed as part of the query for DDL like it is with other queries.

Hope this helps explain things.

James Coleman




Re: Wrong security context for deferred triggers?

2024-07-01 Thread Bennie Swart

Hi,

Allow me to provide some background on how we came across this.

(This is my first time posting to a pgsql list so hopefully I've got 
everything set up correctly.)


We have a db with a big legacy section that we're in the process of 
modernizing. To compensate for some of the shortcomings we have designed 
a layer of writable views to better represent the underlying data and 
make working with it more convenient. The following should illustrate 
what we're doing:


    -- this is the schema containing the view layer.
    create schema api;
    -- and this user is granted access to the api, but not the rest of the legacy db.
    create role apiuser;
    grant usage on schema api to apiuser;

    -- some dummy objects in the legacy db - poorly laid out and poorly named.
    create schema legacy;
    create table legacy.stock_base (
        id serial primary key
      , lbl text not null unique
      , num int not null
      -- etc
    );
    create table legacy.stock_extra (
        id int not null unique references legacy.stock_base (id)
      , man text
      -- etc
    );

    -- create the stock view which better names and logically groups the data.
    create view api.stock as
      select sb.id
           , sb.lbl as description
           , sb.num as quantity
           , se.man as manufacturer
      from legacy.stock_base sb
      left join legacy.stock_extra se using (id);
    -- make it writable so it is easier to work with.
    -- use security definer to allow access to legacy sections.
    create function api.stock_cud() returns trigger language plpgsql security definer as $$
    begin
        -- insert/update legacy.stock_base and legacy.stock_extra depending on
        -- trigger action, modified fields, etc.
        assert tg_op = 'INSERT'; -- assume insert for example's sake.
        insert into legacy.stock_base (lbl, num) values (new.description, new.quantity) returning id into new.id;
        insert into legacy.stock_extra (id, man) values (new.id, new.manufacturer);
        return new;
    end;
    $$;
    create trigger stock_cud
        instead of insert or update or delete on api.stock
        for each row execute function api.stock_cud();

    -- grant the apiuser permission to work with the view.
    grant insert, update, delete on api.stock to apiuser;

    -- insert as superuser - works as expected.
    insert into api.stock (description, quantity, manufacturer) values ('item1', 10, 'manufacturer1');

    -- insert as apiuser - works as expected.
    set role apiuser;
    insert into api.stock (description, quantity, manufacturer) values ('item2', 10, 'manufacturer2');


In some cases there are constraint triggers on the underlying tables to 
validate certain states. It is, however, possible for a state to be 
temporarily invalid between statements, so long as it is valid at tx 
commit. For this reason the triggers are deferred by default. Consider 
the following example:


    reset role;
    create function legacy.stock_check_state() returns trigger language plpgsql as $$
    begin
        -- do some queries to check the state of stock based on modified rows
        -- and error if invalid.
        raise notice 'current_user %', current_user;
        -- dummy validation code.
        perform * from legacy.stock_base sb left join legacy.stock_extra se using (id) where sb.id = new.id;
        return new;
    end;
    $$;
    create constraint trigger stock_check_state
        after insert or update or delete on legacy.stock_base
        deferrable initially deferred
        for each row execute function legacy.stock_check_state();

Then repeat the inserts:

    -- insert as superuser - works as expected.
    reset role;
    insert into api.stock (description, quantity, manufacturer) values ('item3', 10, 'manufacturer3'); -- NOTICE:  current_user postgres

    -- insert as apiuser - fails.
    set role apiuser;
    insert into api.stock (description, quantity, manufacturer) values ('item4', 10, 'manufacturer4'); -- NOTICE:  current_user apiuser

    -- insert as apiuser, not deferred - works as expected.
    begin;
    set constraints all immediate;
    insert into api.stock (description, quantity, manufacturer) values ('item4', 10, 'manufacturer4'); -- NOTICE:  current_user postgres
    commit;

The insert as apiuser fails when the constraint trigger is deferred, but 
works as expected when it is immediate.


Hopefully this helps to better paint the picture. Our workaround 
currently is to just make `legacy.stock_check_state()` security definer 
as well. As I told Laurenz, we're not looking to advocate for any 
specific outcome here. We noticed this strange behaviour and thought it 
to be a bug that should be fixed - whatever "fixed" ends up meaning.


Regards,

Bennie Swart


On 2024/06/26 17:53, Laurenz Albe wrote:

On Wed, 2024-06-26 at 07:38 -0700, David G. Johnston wrote:

We have a few choices then:
1. Status quo + documentation backpatch
2. Change v18 narrowly + 

Re: [PATCH] Handle SK_SEARCHNULL and SK_SEARCHNOTNULL in HeapKeyTest

2024-07-01 Thread Aleksander Alekseev
Hi,

> This patch changes the HeapKeyTest macro to add handling for SK_SEARCHNULL
> and SK_SEARCHNOTNULL. While currently no core code uses these ScanKey flags,
> it would be useful for extensions if they were supported, so extensions
> don't have to implement
> handling for those themselves.

As I recall, previously it was argued that changes like this should
have some use within the core [1].

Can you think of any such use?

[1]: https://commitfest.postgresql.org/42/4180/

-- 
Best regards,
Aleksander Alekseev




Re: LogwrtResult contended spinlock

2024-07-01 Thread Tom Lane
Alvaro Herrera  writes:
>> because the failed assertion is:
>> #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
>>     AssertPointerAlignment(&currval, 8);
>> #endif

Perhaps this assertion is what is wrong?  If the platform has no
native 8-byte alignment requirement, why do we think that atomics
need it?

regards, tom lane




Re: Backporting BackgroundPsql

2024-07-01 Thread Pavan Deolasee
Hi Daniel,

On Mon, Jul 1, 2024 at 1:09 PM Daniel Gustafsson  wrote:

> > On 29 Jun 2024, at 06:38, Pavan Deolasee 
> wrote:
>
> > Don't we need to add install and uninstall rules for the new module,
> like we did in
> https://git.postgresql.org/pg/commitdiff/a4c17c86176cfa712f541b81b2a026ae054b275e
> and
> https://git.postgresql.org/pg/commitdiff/7039c7cff6736780c3bbb41a90a6dfea0f581ad2
> ?
>
> Thats correct, we should backport those as well.
>

Thanks for confirming. Attaching patches for PG15 and PG14, but this will
need backporting all the way up to PG12.

Thanks,
Pavan


0001-Fix-missing-installation-uninstallation-rules-for-Ba-PG14.patch
Description: Binary data


0001-Fix-missing-installation-uninstallation-rules-for-Ba-PG15.patch
Description: Binary data


Re: Should we move the resowner field from JitContext to LLVMJitContext?

2024-07-01 Thread Daniel Gustafsson
> On 5 Jun 2024, at 10:19, Andreas Karlsson  wrote:

> When Heikki made the resource owners extensible in commit 
> b8bff07daa85c837a2747b4d35cd5a27e73fb7b2 the API for JIT plugins changed when 
> ResourceOwnerForgetJIT() was moved from the generic JIT code to the LLVM 
> specific JIT code so now the resowner field of the context is only used by 
> the code of the LLVM plugin.
> 
> Maybe a bit late in the release cycle but should we make the resowner field 
> specific to the LLVM code too now that we already are breaking the API? I 
> personally do not like having a LLVM JIT specific field in the common struct. 
> Code is easier to understand if things are local. Granted most JIT engines 
> will likely need similar infrastructure but just providing the struct field 
> and nothing else does not seem very helpful.

I'm inclined to agree: given that the code for handling the resowner is private
to the LLVM implementation, it makes sense for the resowner field to be as well.  A
future JIT implementation will likely need a ResourceOwner, but it might just
as well need two or none.

--
Daniel Gustafsson





Re: gamma() and lgamma() functions

2024-07-01 Thread Stepan Neretin
On Mon, Jul 1, 2024 at 5:33 PM Dean Rasheed 
wrote:

> Attached is a patch adding support for the gamma and log-gamma
> functions. See, for example:
>
> https://en.wikipedia.org/wiki/Gamma_function
>
> I think these are very useful general-purpose mathematical functions.
> They're part of C99, and they're commonly included in other
> mathematical libraries, such as the python math module, so I think
> it's useful to make them available from SQL.
>
> The error-handling for these functions seems to be a little trickier
> than most, so that might need further discussion.
>
> Regards,
> Dean
>

Hi! The patch file seems broken.
git apply gamma-and-lgamma.patch
error: git apply: bad git-diff - expected /dev/null on line 2
Best regards, Stepan Neretin.


Re: Surround CheckRelation[Oid]LockedByMe() with USE_ASSERT_CHECKING

2024-07-01 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jul 01, 2024 at 06:42:46AM +, Bertrand Drouvot wrote:
>> I think it would make sense to declare / define those functions only for
>> assert enabled build: please find attached a tiny patch doing so.

> Not convinced that's a good idea.  What about out-of-core code that
> may use these routines for runtime checks in non-assert builds?

Yeah.  Also, I believe it's possible for an extension that's been
built with assertions enabled to be used with a core server that
wasn't.  This is why, for example, ExceptionalCondition() is not
ifdef'd away in a non-assert build.  Even if you think there's
no use for CheckRelation[Oid]LockedByMe except in assertions,
it'd still be plenty reasonable for an extension to call them
in assertions.

regards, tom lane




Re: gamma() and lgamma() functions

2024-07-01 Thread Daniel Gustafsson
> On 1 Jul 2024, at 16:20, Stepan Neretin  wrote:

> The patch file seems broken.
> git apply gamma-and-lgamma.patch error: git apply: bad git-diff  — exptec 
> /dev/null in line 2

It's a plain patch file; if you apply it with patch instead of git it will work
fine:

$ patch -p1 < gamma-and-lgamma.patch
patching file 'doc/src/sgml/func.sgml'
patching file 'src/backend/utils/adt/float.c'
patching file 'src/include/catalog/pg_proc.dat'
patching file 'src/test/regress/expected/float8.out'
patching file 'src/test/regress/sql/float8.sql'

--
Daniel Gustafsson





Re: Should we document how column DEFAULT expressions work?

2024-07-01 Thread Peter Eisentraut

On 01.07.24 01:54, David Rowley wrote:

On Thu, 27 Jun 2024 at 23:57, Peter Eisentraut  wrote:

Maybe we should really be thinking about deprecating these special
values and steering users more urgently toward more robust alternatives.

Imagine if 'random' were a valid input value for numeric types.


I think there are valid reasons to use the special timestamp input
values.  One that I can think of is for use with partition pruning. If
you have a time-range partitioned table and want the planner to prune
the partitions rather than the executor, you could use
'now'::timestamp in your queries to allow the planner to prune.


Yeah, but is that a good user interface?  Or is that just something that 
happens to work now with the pieces that happened to be there, rather 
than a really designed interface?


Hypothetically, what would need to be done to make this work with now() 
or current_timestamp or something similar?  Do we need a new stability 
level that somehow encompasses this behavior, so that the function call 
can be evaluated at planning time?
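
For reference, the pruning scenario being described is roughly the following
(hypothetical table and column names, just to make the distinction concrete):

-- 'now' is folded to a constant at parse time, so the planner can prune:
SELECT count(*) FROM measurements WHERE ts >= 'now'::timestamptz;

-- now() is merely stable, so pruning is deferred until executor startup:
SELECT count(*) FROM measurements WHERE ts >= now();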



That
works providing that you never use that in combination with PREPARE
and never put the query with the WHERE clause inside a VIEW.


And this kind of thing obviously makes this interface even worse.





Re: gamma() and lgamma() functions

2024-07-01 Thread Stepan Neretin
On Mon, Jul 1, 2024 at 5:33 PM Dean Rasheed 
wrote:

> Attached is a patch adding support for the gamma and log-gamma
> functions. See, for example:
>
> https://en.wikipedia.org/wiki/Gamma_function
>
> I think these are very useful general-purpose mathematical functions.
> They're part of C99, and they're commonly included in other
> mathematical libraries, such as the python math module, so I think
> it's useful to make them available from SQL.
>
> The error-handling for these functions seems to be a little trickier
> than most, so that might need further discussion.
>
> Regards,
> Dean
>

I tried to review the patch without applying it. It looks good to me, but I
have one remark: on overflow these functions just report
"ERROR:  value out of range: overflow". I think we need to add information
about the valid argument ranges to the error message.
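
For example, the case I mean looks like this (a sketch, assuming the
functions from the patch; double-precision gamma overflows for arguments
above roughly 171.6):

SELECT gamma(200);
ERROR:  value out of range: overflow
-- no hint that the argument needs to stay below about 171.6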


Re: LogwrtResult contended spinlock

2024-07-01 Thread Alvaro Herrera
On 2024-Jul-01, Tom Lane wrote:

> Alvaro Herrera  writes:
> >> because the failed assertion is:
> >> #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> >>     AssertPointerAlignment(&currval, 8);
> >> #endif
> 
> Perhaps this assertion is what is wrong?  If the platform has no
> native 8-byte alignment requirement, why do we think that atomics
> need it?

Oh, that's a good question.  TBH I just copied the assertion from the
other routines for 64-bit variables in the same file.  But I think
that's correct.  We're gating the assertion on _not_ having emulation,
which must mean we have native atomics; on MSVC, if I read the #ifdef
maze correctly, that's implemented using _InterlockedCompareExchange,
whose docs state:

: The variables for this function must be aligned on a 64-bit boundary;
: otherwise, this function will behave unpredictably on multiprocessor x86
: systems and any non-x86 systems. See _aligned_malloc.
https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-interlockedcompareexchange64

So I think the assertion is correct.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"[PostgreSQL] is a great group; in my opinion it is THE best open source
development communities in existence anywhere."(Lamar Owen)




Re: Surround CheckRelation[Oid]LockedByMe() with USE_ASSERT_CHECKING

2024-07-01 Thread Bertrand Drouvot
Hi,

On Mon, Jul 01, 2024 at 10:21:35AM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Jul 01, 2024 at 06:42:46AM +, Bertrand Drouvot wrote:
> >> I think it would make sense to declare / define those functions only for
> >> assert enabled build: please find attached a tiny patch doing so.
> 
> > Not convinced that's a good idea.  What about out-of-core code that
> > may use these routines for runtime checks in non-assert builds?
> 
> Yeah.  Also, I believe it's possible for an extension that's been
> built with assertions enabled to be used with a core server that
> wasn't.  This is why, for example, ExceptionalCondition() is not
> ifdef'd away in a non-assert build.  Even if you think there's
> no use for CheckRelation[Oid]LockedByMe except in assertions,
> it'd still be plenty reasonable for an extension to call them
> in assertions.

Yeah good point, thanks for the feedback! I've withdrawn the CF entry.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Should we document how column DEFAULT expressions work?

2024-07-01 Thread Tom Lane
Peter Eisentraut  writes:
> On 01.07.24 01:54, David Rowley wrote:
>> I think there are valid reasons to use the special timestamp input
>> values.  One that I can think of is for use with partition pruning. If
>> you have a time-range partitioned table and want the planner to prune
>> the partitions rather than the executor, you could use
>> 'now'::timestamp in your queries to allow the planner to prune.

> Yeah, but is that a good user interface?  Or is that just something that 
> happens to work now with the pieces that happened to be there, rather 
> than a really designed interface?

That's not a very useful argument to make.  What percentage of the
SQL language as a whole is legacy cruft that we'd do differently if
we could?  I think the answer is depressingly high.  Adding more
special-purpose features to the ones already there doesn't move
that needle in a desirable direction.

I'd be more excited about this discussion if I didn't think that
the chances of removing 'now'::timestamp are exactly zero.  You
can't just delete useful decades-old features, whether there's
a better way or not.

regards, tom lane




RE: speed up a logical replica setup

2024-07-01 Thread Hayato Kuroda (Fujitsu)
Dear Tom,

> I have a different but possibly-related complaint: why is
> 040_pg_createsubscriber.pl so miserably slow?  On my machine it
> runs for a bit over 19 seconds, which seems completely out of line
> (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> other test scripts in this directory take much less).  It looks
> like most of the blame falls on this step:
> 
> [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> 
> AFAICS the amount of data being replicated is completely trivial,
> so that it doesn't make any sense for this to take so long --- and
> if it does, that suggests that this tool will be impossibly slow
> for production use.  But I suspect there is a logic flaw causing
> this.

I analyzed the issue. My elog() debugging showed that wait_for_end_recovery() was
wasting some time. This was caused by the recovery target not being satisfied promptly.

We are setting recovery_target_lsn to the return value of pg_create_logical_replication_slot(),
which returns the end of the RUNNING_XACTS record. If we use the returned value as
recovery_target_lsn as-is, however, we must wait for additional WAL generation,
because the parameter requires that the replicated WAL overtake a certain point.
On my environment, the function waited until the bgwriter emitted the next
XLOG_RUNNING_XACTS record.

One simple solution is to add an additional WAL record at the end of the publisher
setup. IIUC, an arbitrary WAL insertion can reduce the waiting time. The attached
patch inserts a small XLOG_LOGICAL_MESSAGE record, which greatly reduced the
execution time on my environment.
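
(At the SQL level, the effect is comparable to emitting a small logical
message right after the slots are created, roughly as below. The attached
diff does this from C inside pg_createsubscriber, so this call is only an
illustration, not what the patch literally executes.)

-- emit a tiny non-transactional WAL record so that the standby's recovery
-- target LSN is reached without waiting for the next XLOG_RUNNING_XACTS
SELECT pg_logical_emit_message(false, 'pg_createsubscriber', 'dummy');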

```
BEFORE
(13.751s) ok 30 - run pg_createsubscriber on node S
AFTER
(0.749s) ok 30 - run pg_createsubscriber on node S
```

However, even after the modification, the reported failure [1] could not be
resolved in my environment.

What do you think?

[1]: 
https://www.postgresql.org/message-id/0dffca12-bf17-4a7a-334d-225569de5e6e%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


emit_dummy_message.diff
Description: emit_dummy_message.diff


Re: LogwrtResult contended spinlock

2024-07-01 Thread Alvaro Herrera
Maybe we can do something like this,

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index 78987f3154..f6fa90bad8 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -580,7 +580,20 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, 
int64 sub_)
 static inline uint64
 pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
 {
+   /*
+* When using actual (not simulated) atomics, the target variable for
+* pg_atomic_compare_exchange_u64 must have suitable alignment, which
+* is acquired naturally on most platforms, but not on 32-bit ones;
+* persuade the compiler in that case, but fail if we
+* cannot.
+*/
+#if MAXIMUM_ALIGNOF >= 8
uint64  currval;
+#elif defined(pg_attribute_aligned) && !defined(PG_HAVE_ATOMIC_U64_SIMULATION)
+   pg_attribute_aligned(8) uint64  currval;
+#else
+#error "Must have pg_attribute aligned or simulated atomics"
+#endif
 
 #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
AssertPointerAlignment(ptr, 8);


-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."(Michael Brusser)




Re: LogwrtResult contended spinlock

2024-07-01 Thread Tom Lane
Alvaro Herrera  writes:
> Maybe we can do something like this,

> +#if MAXIMUM_ALIGNOF >= 8
>   uint64  currval;

This should probably be testing the alignment of int64 specifically,
rather than assuming that MAXIMUM_ALIGNOF applies to it.  At least
historically, there have been platforms where MAXIMUM_ALIGNOF is
determined by float quantities and integer quantities are different.

regards, tom lane




jsonpath: Inconsistency of timestamp_tz() Output

2024-07-01 Thread David E. Wheeler
Hackers,

There’s an odd difference in the behavior of timestamp_tz() outputs. Running 
with America/New_York as my TZ, it looks fine for a full timestamptz, identical 
to how casting the types directly works:

david=# set time zone 'America/New_York';
SET

david=# select '2024-08-15 12:34:56-04'::timestamptz;
  timestamptz   

 2024-08-15 12:34:56-04
(1 row)

david=# select jsonb_path_query_tz('"2024-08-15 12:34:56-04"', 
'$.timestamp_tz()');
 jsonb_path_query_tz 
-
 "2024-08-15T12:34:56-04:00"

Both show the time in America/New_York, which is great. But when casting from a 
date, the behavior differs. Casting directly:

david=# select '2024-08-15'::date::timestamptz;
  timestamptz   

 2024-08-15 00:00:00-04

It stringifies to the current zone setting again, as expected. But look at the 
output from a path query:

david=# select jsonb_path_query_tz('"2023-08-15"', '$.timestamp_tz()');
 jsonb_path_query_tz 
-
 "2023-08-15T04:00:00+00:00"

It’s using UTC for the display output! Shouldn’t it be using America/New_York?

Note that I’m comparing a cast from date to timestamptz because that’s how the 
jsonpath parsing works[1]: it ultimately uses 
date2timestamptz_opt_overflow()[2] to make the conversion, which appears to set 
the offset from the time zone GUC, so I’m not sure where it’s converted to UTC 
before stringifying.

Maybe an argument is missing from the stringification path?

FWIW, explicitly calling the string() jsonpath method does produce a result in 
the current TZ:

david=# select jsonb_path_query_tz('"2023-08-15"', '$.timestamp_tz().string()');
   jsonb_path_query_tz
--
 "2023-08-15 00:00:00-04"

That bit uses timestamptz_out to format the output, but JSONB has its own 
stringification[4] (called here[5]), and I can’t tell what might be different 
between a timestamptz cast from a date and one not cast from a date.

Note that the same divergence in behavior occurs when the source value is a 
timestamp, too. Compare:

david=# select '2024-08-15 12:34:56'::timestamp::timestamptz;
  timestamptz   

 2024-08-15 12:34:56-04
(1 row)

david=# select jsonb_path_query_tz('"2023-08-15 12:34:56"', '$.timestamp_tz()');
 jsonb_path_query_tz 
-
 "2023-08-15T16:34:56+00:00"
(1 row)

Anyway, should the output of timestamptz JSONB values be made more consistent? 
I’m happy to make a patch to do so, but could use a hand figuring out where the 
behavior varies.

Best,

David

[1]: 
https://github.com/postgres/postgres/blob/3497c87/src/backend/utils/adt/jsonpath_exec.c#L2708-L2718
[2]: 
https://github.com/postgres/postgres/blob/3497c87/src/backend/utils/adt/date.c#L613-L698
[3]: 
https://github.com/postgres/postgres/blob/3fb59e789dd9f21610101d1ec106ad58095e24f3/src/backend/utils/adt/jsonpath_exec.c#L1650-L1653
[4]: 
https://github.com/postgres/postgres/blob/3fb59e789dd9f21610101d1ec106ad58095e24f3/src/backend/utils/adt/json.c#L369-L407
[5]: 
https://github.com/postgres/postgres/blob/3fb59e7/src/backend/utils/adt/jsonb.c#L743-L748





Re: gamma() and lgamma() functions

2024-07-01 Thread Alvaro Herrera
On 2024-Jul-01, Stepan Neretin wrote:

> I have one notice:
> ERROR:  value out of range: overflow. I think we need to add information
> about the available ranges in the error message

I think this is a project of its own.  The error comes from calling
float_overflow_error(), which is a generic routine used in several
functions which have different overflow conditions.  It's not clear to
me that throwing errors listing the specific range for each case is
worth the effort.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)




Re: small pg_dump code cleanup

2024-07-01 Thread Daniel Gustafsson
> On 11 Jun 2024, at 22:30, Nathan Bossart  wrote:

> At the moment, I'm inclined to commit v1 once v18 development opens up.  We
> can consider any additional adjustments separately.

Patch LGTM and the tests pass, +1 on pushing this version.

--
Daniel Gustafsson





Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

2024-07-01 Thread Nathan Bossart
Committed.

-- 
nathan




Windows perl/tcl requirement documentation

2024-07-01 Thread Andrew Dunstan


Our docs currently state this regarding the perl requirement for 
building on Windows:



ActiveState Perl

   ActiveState Perl is required to run the build generation scripts.
   MinGW or Cygwin Perl will not work. It must also be present in the
   PATH. Binaries can be downloaded from https://www.activestate.com
    (Note: version 5.14 or later is
   required, the free Standard Distribution is sufficient).


This really hasn't been a true requirement for quite some time. I 
stopped using AS perl quite a few years ago due to possible licensing 
issues, and have been building with MSVC using Strawberry Perl ever 
since. Andres recently complained that Strawberry was somewhat out of 
date, but that's no longer really the case - it's on 5.38.2, which is 
the latest in that series, and perl 5.40.0 was only released a few weeks 
ago.


We recommend AS Tcl/Tk, which I have not used, but which I am wary of 
for the same reasons. There is a BSD-licensed, up-to-date Windows binary 
installation called Magicsplat which I haven't tried but which might 
well be suitable for our purposes.


I suggest we remove these references to AS and replace them with 
references to Windows distros that are more appropriate.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Wrong security context for deferred triggers?

2024-07-01 Thread Laurenz Albe
On Sat, 2024-06-22 at 17:50 -0400, Joseph Koshakow wrote:
> On Mon, Jun 10, 2024 at 1:00 PM Laurenz Albe  wrote:
> > Like you, I was surprised by the current behavior.  There is a design
> > principle that PostgreSQL tries to follow, called the "Principle of
> > least astonishment".  Things should behave like a moderately skilled
> > user would expect them to.  In my opinion, the current behavior violates
> > that principle.  Tomas seems to agree with that point of view.
> 
> I worry that both approaches violate this principle in different ways.
> For example consider the following sequence of events:
> 
>     SET ROLE r1;
>     BEGIN;
>     SET CONSTRAINTS ALL DEFERRED;
>     INSERT INTO ...;
>     SET ROLE r2;
>     SET search_path = '...';
>     COMMIT;
> 
> I think that it would be reasonable to expect that the triggers execute
> with r2 and not r1, since the triggers were explicitly deferred and the
> role was explicitly set. It would likely be surprising that the search
> path was updated for the trigger but not the role. With your proposed
> approach it would be impossible for someone to trigger a trigger with
> one role and execute it with another, if that's a desirable feature.

I definitely see your point, although GUC settings and the current
security context are something different.

It would definitely not be viable to put all GUC values in the trigger
state.

So if you say "all or nothing", it would be nothing, and the patch should
be rejected.

> > I didn't find this strange behavior myself: it was one of our customers
> > who uses security definer functions for data modifications and has
> > problems with the current behavior, and I am trying to improve the
> > situation on their behalf.
> 
> Would it be possible to share more details about this use case? For
> example, What are their current problems? Are they not able to set
> constraints to immediate? Or can they update the trigger function
> itself be a security definer function? That might help illuminate why
> the current behavior is wrong.

I asked them for a statement, and they were nice enough to write up
https://postgr.es/m/e89e8dd9-7143-4db8-ac19-b2951cb0c0da%40gmail.com

They have a workaround, so the patch is not absolutely necessary for them.

> 
> I also took a look at the code. It doesn't apply cleanly to master, so
> I took the liberty of rebasing and attaching it.
> 
> > + /*
> > + * The role could have been dropped since the trigger was queued.
> > + * In that case, give up and error out.
> > + */
> > + pfree(GetUserNameFromId(evtshared->ats_rolid, false));
> 
> It feels a bit wasteful to allocate and copy the role name when we
> never actually use it. Is it possible to check that the role exists
> without copying the name?

If that is a concern (and I can envision it to be), I can improve that.
One option is to copy the guts of GetUserNameFromId(), and another
is to factor out the common parts into a new function.
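For what it's worth, a syscache existence probe would avoid building the name
string entirely. The sketch below is only an illustration of that idea (the
helper name and error wording are invented, not taken from the patch), assuming
the usual syscache headers are available in the trigger code:

```c
#include "utils/syscache.h"

/* Hypothetical helper: error out if the role no longer exists. */
static void
check_role_exists(Oid roleid)
{
	if (!SearchSysCacheExists1(AUTHOID, ObjectIdGetDatum(roleid)))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("role with OID %u does not exist", roleid)));
}
```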

I'd wait until we have a decision whether we want the patch or not
before I make the effort, if that's ok.

> Everything else looked good, and the code does what it says it will.

Thanks for the review!

Yours,
Laurenz Albe




Re: Restart pg_usleep when interrupted

2024-07-01 Thread Sami Imseih

> 
>> Therefore, rather than "improving" pg_usleep (and uglifying its API),
>> the right answer is to fix parallel vacuum leaders to not depend on
>> pg_usleep in the first place.  A better idea might be to use
>> pg_sleep() or equivalent code.
> 
> Yes, that is a good idea to explore and it will not require introducing
> an awkward new API. I will look into using something similar to
> pg_sleep.


Looking through the history of the sleep in vacuum_delay_point, commit
720de00af49 replaced WaitLatch with pg_usleep to allow for microsecond
sleep precision [1]. 

Thomas has proposed a WaitLatchUs implementation in [2], but I have not
yet tried it. 

So I see there are 2 possible options here to deal with the interrupt of a 
parallel vacuum leader when a message is sent by a parallel vacuum worker. 

Option 1/ something like my initial proposal which is
to create a function similar to pg_usleep that is able to deal with
interrupts in a sleep. This could be a function scoped only to vacuum.c,
so it can only be used for vacuum delay purposes.
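To make option 1 concrete, below is a minimal sketch of what such a
vacuum-local helper could look like on POSIX systems, where nanosleep()
reports the unslept remainder after an interrupt. This is purely illustrative
and not from any posted patch; a real version would also need a Windows path
and an upper bound on the total sleep.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical vacuum.c-local sleep that restarts after signal interrupts. */
static void
vacuum_sleep(long microsec)
{
	struct timespec delay;

	delay.tv_sec = microsec / 1000000L;
	delay.tv_nsec = (microsec % 1000000L) * 1000L;

	/* nanosleep() updates "delay" with the time remaining when interrupted */
	while (nanosleep(&delay, &delay) == -1 && errno == EINTR)
		;
}
```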

—— 
Option 2/ to explore the WaitLatchUs implementation by
Thomas which will give both a latch implementation for a sleep with
the microsecond precision.

It is worth mentioning that if we do end up using WaitLatch(Us) inside
vacuum_delay_point, it will need to set only WL_TIMEOUT and 
WL_EXIT_ON_PM_DEATH.

i.e.
(void) WaitLatch(MyLatch, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 msec,
                 WAIT_EVENT_VACUUM_DELAY);

This way it is not interrupted by a WL_LATCH_SET when a message
is set by a parallel worker.

——

Ultimately, I think option 2 may be worth a closer look, as it is a cleaner
and safer approach to detecting postmaster death.


Thoughts?


[1] 
https://postgr.es/m/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CA%2BhUKGKVbJE59JkwnUj5XMY%2B-rzcTFciV9vVC7i%3DLUfWPds8Xw%40mail.gmail.com
 

Re: Backporting BackgroundPsql

2024-07-01 Thread Heikki Linnakangas

On 01/07/2024 17:11, Pavan Deolasee wrote:

Hi Daniel,

On Mon, Jul 1, 2024 at 1:09 PM Daniel Gustafsson wrote:


 > On 29 Jun 2024, at 06:38, Pavan Deolasee <pavan.deola...@gmail.com> wrote:

 > Don't we need to add install and uninstall rules for the new
module, like we did in
https://git.postgresql.org/pg/commitdiff/a4c17c86176cfa712f541b81b2a026ae054b275e
and https://git.postgresql.org/pg/commitdiff/7039c7cff6736780c3bbb41a90a6dfea0f581ad2 ?

Thats correct, we should backport those as well.

Thanks for confirming. Attaching patches for PG15 and PG14, but this 
will need backporting all the way up to PG12.


Thanks! Pushed to v12-v15.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: pg_sequence_last_value() for unlogged sequences on standbys

2024-07-01 Thread Nathan Bossart
On Thu, May 16, 2024 at 08:33:35PM -0500, Nathan Bossart wrote:
> Here is a rebased version of 0002, which I intend to commit once v18
> development begins.

Committed.

-- 
nathan




Re: LogwrtResult contended spinlock

2024-07-01 Thread Alvaro Herrera
On 2024-Jul-01, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Maybe we can do something like this,
> 
> > +#if MAXIMUM_ALIGNOF >= 8
> > uint64  currval;
> 
> This should probably be testing the alignment of int64 specifically,
> rather than assuming that MAXIMUM_ALIGNOF applies to it.  At least
> historically, there have been platforms where MAXIMUM_ALIGNOF is
> determined by float quantities and integer quantities are different.

OK, so it's
#if ALIGNOF_LONG_LONG_INT >= 8

Alexander, can you please confirm whether this works for you?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La virtud es el justo medio entre dos defectos" (Aristóteles)
>From 65e9e813859fec436805b1ada7be9339f8ecd63e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 1 Jul 2024 16:44:22 +0200
Subject: [PATCH v4] Fix alignment in pg_atomic_monotonic_advance_u64

Unsurprisingly, it's possible for Windows x86 to have 8-byte atomics but
4-byte-aligned long long ints.  This breaks an assertion in recently
introduced pg_atomic_monotonic_advance_u64 with commit f3ff7bf83bce.
Try to fix that blindly by adding pg_attribute_aligned(8).  Unfortunately
not all compilers support that, so it needs to be protected by #ifdef;
fortunately not all compilers _need_ it.  Hopefully the combinations we
now support cover all interesting cases.

Reported-by: Alexander Lakhin 
Reviewed-by: Tom Lane 
Discussion: https://postgr.es/m/36796438-a718-cf9b-2071-b2c1b947c...@gmail.com
---
 src/include/port/atomics.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index 78987f3154..eeb47dee30 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -580,7 +580,21 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
 static inline uint64
 pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
 {
+	/*
+	 * When using actual (not simulated) atomics, the target variable for
+	 * pg_atomic_compare_exchange_u64 must have suitable alignment.  This
+	 * occurs naturally on platforms where long long int is 8-byte aligned. On
+	 * others, we must explicitly coerce the compiler into applying suitable
+	 * alignment, or abort the compile if we cannot.  When using simulated
+	 * atomics, the alignment doesn't matter.
+	 */
+#if ALIGNOF_LONG_LONG_INT >= 8 || defined(PG_HAVE_ATOMIC_U64_SIMULATION)
 	uint64		currval;
+#elif defined(pg_attribute_aligned)
+	pg_attribute_aligned(8) uint64 currval;
+#else
+#error "Must have pg_attribute_aligned or simulated atomics on this platform"
+#endif
 
 #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
 	AssertPointerAlignment(ptr, 8);
-- 
2.39.2



Re: call for applications: mentoring program for code contributors

2024-07-01 Thread Robert Haas
On Thu, Jun 20, 2024 at 1:12 PM Robert Haas  wrote:
> I'm working to start a mentoring program where code contributors can
> be mentored by current committers. Applications are now open:
> https://forms.gle/dgjmdxtHYXCSg6aB7

Applications are now closed. Initially, I had imagined just keeping
this form more or less indefinitely, but that looks clearly
impractical at this point, so what I'm going to do instead is create a
new form at some future point TBD and repeat this process, taking into
account what needs we have at that time. Part of the reason it seems
impractical to keep the form open is because a significant percentage
of applications are from people who have posted a total of zero (0)
emails to pgsql-hackers, and I don't want to waste my time or that of
other committers by replying to such inquiries one by one. Hence, the
form is closed for now, but with the intention of having a new one at
some point when the time seems opportune. That will also give people
who did not find a match this time an opportunity to resubmit if
they're still interested.

Matching is largely complete at this point. I expect to send emails to
all applicants letting them know what happened with their application
soon, hopefully tomorrow (my time). In preparation for that, allow me
to say that I'm very pleased with the number of acceptances that I
anticipate being able to extend. Some committers ended up deciding to
take two mentees, which is really great. More details on that soon.
Nonetheless, I am sure that those who did not find a mentor for one
reason or another will be disappointed. I hope that no one will be so
disappointed that they give up on hacking on PostgreSQL. Remember, if
you didn't get matched to a mentor, you're no worse off than you were
before, and your work on PostgreSQL is no less valuable than it was
before! I am also hoping to start up something to provide some more
limited support to people who didn't match to a mentor, and I'll tell
you more about that when and if I have more to say.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: LogwrtResult contended spinlock

2024-07-01 Thread Andres Freund
Hi,

On 2024-07-01 11:10:24 +0200, Alvaro Herrera wrote:
> In the meantime I noticed that pg_attribute_aligned() is not supported
> in every platform/compiler, so for safety sake I think it's better to go
> with what we do for PGAlignedBlock: use a union with a double member.
> That should be 8-byte aligned on x86 as well, unless I misunderstand.

If a platform wants to support 8 byte atomics, it better provides a way to
make variables 8 bytes aligned. We already rely on that, actually. See use of
pg_attribute_aligned in e.g. src/include/port/atomics/generic-msvc.h



> From 9d240e90f87bf8b53bd5d92b623e33419db64717 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera 
> Date: Mon, 1 Jul 2024 10:41:06 +0200
> Subject: [PATCH v2] Fix alignment of variable in
>  pg_atomic_monotonic_advance_u64
> 
> Reported-by: Alexander Lakhin 
> Discussion: https://postgr.es/m/36796438-a718-cf9b-2071-b2c1b947c...@gmail.com
> ---
>  src/include/port/atomics.h | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
> index 78987f3154..964732e660 100644
> --- a/src/include/port/atomics.h
> +++ b/src/include/port/atomics.h
> @@ -580,30 +580,38 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, 
> int64 sub_)
>  static inline uint64
>  pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 
> target_)
>  {
> - uint64  currval;
> + /*
> +  * On 32-bit machines, declaring a bare uint64 variable doesn't promise
> +  * the alignment we need, so coerce the compiler this way.
> +  */
> + union
> + {
> + uint64  u64;
> + double  force_align_d;
> + }   currval;

I wonder if we should just relax the alignment requirement for currval. It's
crucial that the pointer is atomically aligned (atomic ops across pages are
either forbidden or extremely slow), but it's far from obvious that it's
crucial for comparator value to be aligned.


>  #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
>   AssertPointerAlignment(ptr, 8);
>  #endif

What's the point of this assert, btw? This stuff is already asserted in lower
level routines, so it just seems redundant to have it here?


> - currval = pg_atomic_read_u64_impl(ptr);
> - if (currval >= target_)
> + currval.u64 = pg_atomic_read_u64_impl(ptr);
> + if (currval.u64 >= target_)
>   {
>   pg_memory_barrier();
> - return currval;
> + return currval.u64;
>   }
>  
>  #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> - AssertPointerAlignment(&currval, 8);
> + AssertPointerAlignment(&currval.u64, 8);
>  #endif
>  
> - while (currval < target_)
> + while (currval.u64 < target_)
>   {
> - if (pg_atomic_compare_exchange_u64_impl(ptr, &currval, target_))
> + if (pg_atomic_compare_exchange_u64_impl(ptr, &currval.u64, 
> target_))
>   break;
>   }
>  
> - return Max(target_, currval);
> + return Max(target_, currval.u64);

What does the Max() actually achieve here? Shouldn't it be impossible to reach
this with  currval < target_?

And why does target_ end in an underscore?

Greetings,

Andres Freund




Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela
Em seg., 1 de jul. de 2024 às 06:20, Daniel Gustafsson 
escreveu:

> > On 27 Jun 2024, at 13:50, Ranier Vilela  wrote:
>
> > Now with file patch really attached.
>
> -   if (strlen(backupidstr) > MAXPGPATH)
> +   if (strlcpy(state->name, backupidstr, sizeof(state->name)) >=
> sizeof(state->name))
> ereport(ERROR,
>
> Stylistic nit perhaps, I would keep the strlen check here and just replace
> the
> memcpy with strlcpy.  Using strlen in the error message check makes the
> code
> more readable.
>
This is not performance-critical code, so I see no problem using strlen,
for the sake of readability.


>
> -   charname[MAXPGPATH + 1];
> +   charname[MAXPGPATH];/* backup label name */
>
> With the introduced use of strlcpy, why do we need to change this field?
>
The part about being the only reference in the entire code that uses
MAXPGPATH + 1.
MAXPGPATH is defined as 1024, so MAXPGPATH +1 is 1025.
I think this hurts the calculation of the array index,
preventing power two optimization.

Another argument is that all other paths have a 1023 size limit,
I don't see why the backup label would have to be different.

New version patch attached.

best regards,
Ranier Vilela
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d36272ab4f..bfb500aa54 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8742,9 +8742,9 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("backup label too long (max %d bytes)",
-		MAXPGPATH)));
+		MAXPGPATH - 1)));
 
-	memcpy(state->name, backupidstr, strlen(backupidstr));
+	(void) strlcpy(state->name, backupidstr, MAXPGPATH))
 
 	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index c30d4a5991..a52345850e 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -21,8 +21,7 @@
 typedef struct BackupState
 {
 	/* Fields saved at backup start */
-	/* Backup label name one extra byte for null-termination */
-	char		name[MAXPGPATH + 1];
+	char		name[MAXPGPATH];/* backup label name */
 	XLogRecPtr	startpoint;		/* backup start WAL location */
 	TimeLineID	starttli;		/* backup start TLI */
 	XLogRecPtr	checkpointloc;	/* last checkpoint location */


Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela
Em seg., 1 de jul. de 2024 às 14:35, Ranier Vilela 
escreveu:

> Em seg., 1 de jul. de 2024 às 06:20, Daniel Gustafsson 
> escreveu:
>
>> > On 27 Jun 2024, at 13:50, Ranier Vilela  wrote:
>>
>> > Now with file patch really attached.
>>
>> -   if (strlen(backupidstr) > MAXPGPATH)
>> +   if (strlcpy(state->name, backupidstr, sizeof(state->name)) >=
>> sizeof(state->name))
>> ereport(ERROR,
>>
>> Stylistic nit perhaps, I would keep the strlen check here and just
>> replace the
>> memcpy with strlcpy.  Using strlen in the error message check makes the
>> code
>> more readable.
>>
> This is not performance-critical code, so I see no problem using strlen,
> for the sake of readability.
>
>
>>
>> -   charname[MAXPGPATH + 1];
>> +   charname[MAXPGPATH];/* backup label name */
>>
>> With the introduced use of strlcpy, why do we need to change this field?
>>
> The part about being the only reference in the entire code that uses
> MAXPGPATH + 1.
> MAXPGPATH is defined as 1024, so MAXPGPATH +1 is 1025.
> I think this hurts the calculation of the array index,
> preventing power two optimization.
>
> Another argument is that all other paths have a 1023 size limit,
> I don't see why the backup label would have to be different.
>
> New version patch attached.
>
Sorry for v5, I forgot to update the patch and it was an error.

best regards,
Ranier Vilela


v6-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data


Commitfest manager for July 2024

2024-07-01 Thread Andrey M. Borodin
Hello everyone!

As Michael noted in e26810d01d4 [0] hacking for Postgres 17 has begun.
I’ve skimmed through hackers@ list. And it seems that so far role of the 
commitfest manager is vacant.

Is anyone up for doing this community work? Make things moving :)


Best regards, Andrey Borodin.

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e26810d01d4
[1] https://commitfest.postgresql.org/48/



Re: Remove last traces of HPPA support

2024-07-01 Thread Tom Lane
Noah Misch  writes:
> On Sat, Oct 21, 2023 at 02:18:19AM -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> IMO a single person looking at HPPA code for a few minutes is a cost that 
>>> more
>>> than outweighs the potential benefits of continuing "supporting" this dead
>>> arch. Even code that doesn't need to change has costs, particularly if it's
>>> intermingled with actually important code (which spinlocks certainly are).

>> Yup, that.  It's not zero cost to carry this stuff.

> +1 for dropping it.

Done at commit edadeb0710.

regards, tom lane




Re: LogwrtResult contended spinlock

2024-07-01 Thread Alvaro Herrera
Hello,

Thanks for your attention here.

On 2024-Jul-01, Andres Freund wrote:

> On 2024-07-01 11:10:24 +0200, Alvaro Herrera wrote:
> > In the meantime I noticed that pg_attribute_aligned() is not supported
> > in every platform/compiler, so for safety sake I think it's better to go
> > with what we do for PGAlignedBlock: use a union with a double member.
> > That should be 8-byte aligned on x86 as well, unless I misunderstand.
> 
> If a platform wants to support 8 byte atomics, it better provides a way to
> make variables 8 bytes aligned. We already rely on that, actually. See use of
> pg_attribute_aligned in e.g. src/include/port/atomics/generic-msvc.h

Well, pg_atomic_uint64 is a struct. Passing pointers to it is fine,
which is what non-platform-specific code does; but because the
declaration of the type is in each platform-specific file, it might not
work to use it directly in generic code.  I didn't actually try, but it
seems a bit of a layering violation.  (I didn't find any place where
the struct is used that way.)

If that works, then I think we could simply declare currval as a
pg_atomic_uint64 and it'd be prettier.

> > +   /*
> > +* On 32-bit machines, declaring a bare uint64 variable doesn't promise
> > +* the alignment we need, so coerce the compiler this way.
> > +*/
> > +   union
> > +   {
> > +   uint64  u64;
> > +   double  force_align_d;
> > +   }   currval;
> 
> I wonder if we should just relax the alignment requirement for currval. It's
> crucial that the pointer is atomically aligned (atomic ops across pages are
> either forbidden or extremely slow), but it's far from obvious that it's
> crucial for comparator value to be aligned.

I'm pretty sure the Microsoft docs I linked to are saying it must be
aligned.

> >  #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> > AssertPointerAlignment(ptr, 8);
> >  #endif
> 
> What's the point of this assert, btw? This stuff is already asserted in lower
> level routines, so it just seems redundant to have it here?

There are in some of them, but not in pg_atomic_compare_exchange_u64_impl.


> > -   return Max(target_, currval);
> > +   return Max(target_, currval.u64);
> 
> What does the Max() actually achieve here? Shouldn't it be impossible to reach
> this with  currval < target_?

When two processes are hitting the cmpxchg concurrently, we need to
return the highest value that was written, even if it was the other
process that did it.  The assertions in the calling site are quickly
broken if we don't do this.  I admit this aspect took me by surprise.
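
To spell that out, here is an illustrative re-statement of the loop (not the
committed code, and ignoring the memory-barrier details of the real function):
pg_atomic_compare_exchange_u64() overwrites "expected" with the value it found
whenever the exchange fails, so when another backend has already advanced the
counter past "target", the loop exits with expected > target and we must report
that larger value.

```c
#include "port/atomics.h"

/* Sketch only: why the final Max() is needed. */
static uint64
monotonic_advance_sketch(pg_atomic_uint64 *ptr, uint64 target)
{
	uint64		expected = pg_atomic_read_u64(ptr);

	while (expected < target)
	{
		if (pg_atomic_compare_exchange_u64(ptr, &expected, target))
			break;				/* we installed target ourselves */
		/* otherwise expected now holds the competing backend's newer value */
	}
	return Max(target, expected);
}
```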

> And why does target_ end in an underscore?

Heh, you tell me -- I just copied the style of the functions just above.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No nos atrevemos a muchas cosas porque son difíciles,
pero son difíciles porque no nos atrevemos a hacerlas" (Séneca)




Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Alvaro Herrera
On 2024-Jul-01, Ranier Vilela wrote:

> > -   charname[MAXPGPATH + 1];
> > +   charname[MAXPGPATH];/* backup label name */
> >
> > With the introduced use of strlcpy, why do we need to change this field?
> >
> The part about being the only reference in the entire code that uses
> MAXPGPATH + 1.

The bit I don't understand about this discussion is what will happen
with users that currently have exactly 1024 chars in backup names today.
With this change, we'll be truncating their names to 1023 chars instead.
Why would they feel that such change is welcome?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Daniel Gustafsson
> On 1 Jul 2024, at 21:15, Alvaro Herrera  wrote:
> On 2024-Jul-01, Ranier Vilela wrote:

>>> -   charname[MAXPGPATH + 1];
>>> +   charname[MAXPGPATH];/* backup label name */
>>> 
>>> With the introduced use of strlcpy, why do we need to change this field?
>>> 
>> The part about being the only reference in the entire code that uses
>> MAXPGPATH + 1.
> 
> The bit I don't understand about this discussion is what will happen
> with users that currently have exactly 1024 chars in backup names today.
> With this change, we'll be truncating their names to 1023 chars instead.
> Why would they feel that such change is welcome?

That's precisely what I was getting at.  Maybe it makes sense to change, maybe
not, but that's not for this patch to decide as that's a different discussion
from using safe string copying API's.

--
Daniel Gustafsson





Re: Built-in CTYPE provider

2024-07-01 Thread Jeff Davis
On Sat, 2024-06-29 at 15:08 -0700, Noah Misch wrote:
> lower(), initcap(), upper(), and regexp_matches() are
> PROVOLATILE_IMMUTABLE.
> Until now, we've delegated that responsibility to the user.  The user
> is
> supposed to somehow never update libc or ICU in a way that changes
> outcomes
> from these functions.

To me, "delegated" connotes a clear and organized transfer of
responsibility to the right person to solve it. In that sense, I
disagree that we've delegated it.

What's happened here is an evolution of various choices that seemed
reasonable at the time. Unfortunately, the consequences are hard
for us to manage and even harder for users to manage themselves.

>   Now that postgresql.org is taking that responsibility
> for builtin C.UTF-8, how should we govern it?  I think the above text
> and [1]
> convey that we'll update the Unicode data between major versions,
> making
> functions like lower() effectively STABLE.  Is that right?

Marking them STABLE is not a viable option, that would break a lot of
valid use cases, e.g. an index on LOWER().

Unicode already has its own governance, including a stability policy
that includes case mapping:

https://www.unicode.org/policies/stability_policy.html#Case_Pair

Granted, that policy does not guarantee that the results will never
change. In particular, the results can change if using unassigned code
points that are later assigned to Cased characters.

That's not terribly common though; for instance, there are zero changes
in uppercase/lowercase behavior between Unicode 14.0 (2021) and 15.1
(current) -- even for code points that were unassigned in 14.0 and
later assigned. I checked this by modifying case_test.c to look at
unassigned code points as well.

There's a greater chance that character properties can change (e.g.
whether a character is "alphabetic" or not) in new releases of Unicode.
Such properties can affect regex character classifications, and in some
cases the results of initcap (because it uses the "alphanumeric"
classification to determine word boundaries).

I don't think we need code changes for 17. Some documentation changes
might be helpful, though. Should we have a note around LOWER()/UPPER()
that users should REINDEX any dependent indexes when the provider is
updated?

> (This thread had some discussion[2] that datcollversion/collversion
> won't
> necessarily change when a major versions changes lower() behavior.)

datcollversion/collversion track the version of the collation
specifically (text ordering only), not the ctype (character semantics).
When using the libc provider, get_collation_actual_version() completely
ignores the ctype.

It would be interesting to consider tracking the versions separately,
though.

Regards,
Jeff Davis





Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-01 Thread Nathan Bossart
I figured I'd post what I have so far since this thread hasn't been updated
in a while.  The attached patches are still "proof-of-concept grade," but
they are at least moving in the right direction (IMHO).  The variable
naming is still not great, and they are woefully undercommented, among
other things.

0001 introduces a new API for registering callbacks and running them in
parallel on all databases in the cluster.  This new system manages a set of
"slots" that follow a simple state machine to asynchronously establish a
connection and run the queries.  It uses system() to wait for these
asynchronous tasks to complete.  Users of this API only need to provide two
callbacks: one to return the query that should be run on each database and
another to process the results of that query.  If multiple queries are
required for each database, users can provide multiple sets of callbacks.
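
To illustrate the shape of the API, a caller might register something like the
following query callback plus result-processing callback for one step. The
names and signatures below are invented for illustration and are not the ones
used in the attached patches; the sketch assumes libpq's PGresult and
pg_upgrade's DbInfo and pg_log() are in scope.

```c
/* Hypothetical callback pair for one per-database step. */
static char *
extension_updates_query(DbInfo *dbinfo, void *arg)
{
	/* same query text for every database in this sketch */
	return pg_strdup("SELECT name FROM pg_available_extensions "
					 "WHERE installed_version <> default_version");
}

static void
extension_updates_result(DbInfo *dbinfo, PGresult *res, void *arg)
{
	for (int i = 0; i < PQntuples(res); i++)
		pg_log(PG_REPORT, "extension \"%s\" in database \"%s\" needs an update",
			   PQgetvalue(res, i, 0), dbinfo->db_name);
}
```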

The other patches change several of the existing tasks to use this new API.
With these patches applied, I see the following differences in the output
of 'pg_upgrade | ts -i' for a cluster with 1k empty databases:

WITHOUT PATCH

00:00:19 Checking database user is the install user
ok
00:00:02 Checking for subscription state   
ok
00:00:06 Adding ".old" suffix to old global/pg_control 
ok
00:00:04 Checking for extension updates
ok

WITH PATCHES (--jobs 1)

00:00:10 Checking database user is the install user
ok
00:00:02 Checking for subscription state   
ok
00:00:07 Adding ".old" suffix to old global/pg_control 
ok
00:00:05 Checking for extension updates
ok

WITH PATCHES (--jobs 4)

00:00:06 Checking database user is the install user
ok
00:00:00 Checking for subscription state   
ok
00:00:02 Adding ".old" suffix to old global/pg_control 
ok
00:00:01 Checking for extension updates
ok

Note that the "Checking database user is the install user" time also
includes the call to get_db_rel_and_slot_infos() on the old cluster as well
as the call to get_loadable_libraries() on the old cluster.  I believe the
improvement with the patches with just one job is due to the consolidation
of the queries into one database connection (presently,
get_db_rel_and_slot_infos() creates 3 connections per database for some
upgrades).  Similarly, the "Adding \".old\" suffix to old
global/pg_control" time includes the call to get_db_rel_and_slot_infos() on
the new cluster.

There are several remaining places where we could use this new API to speed
up upgrades.  For example, I haven't attempted to use it for the data type
checks yet, and that tends to eat up a sizable chunk of time when there are
many databases.

On Thu, May 16, 2024 at 08:24:08PM -0500, Nathan Bossart wrote:
> On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote:
>> Also, did you consider connecting once to each database and running
>> many queries? Most of those seem like just checks.
> 
> This was the idea behind 347758b.  It may be possible to do more along
> these lines.  IMO parallelizing will still be useful even if we do combine
> more of the steps.

My current thinking is that any possible further consolidation should
happen as part of a follow-up effort to parallelization.  I'm cautiously
optimistic that the parallelization work will make the consolidation easier
since it moves things to rigidly-defined callback functions.

A separate piece of off-list feedback from Michael Paquier is that this new
parallel system might be something we can teach the ParallelSlot code used
by bin/scripts/ to do.  I've yet to look too deeply into this, but I
suspect that it will be difficult to combine the two.  For example, the
ParallelSlot system doesn't seem well-suited for the kind of
run-once-in-each-database tasks required by pg_upgrade, and the error
handling is probably a little different, too.  However, it's still worth a
closer look, and I'm interested in folks' opinions on the subject.

-- 
nathan
>From d7683a095d4d2c1574005eb41504a5be256d6480 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 28 Jun 2024 11:02:44 -0500
Subject: [PATCH v2 1/6] introduce framework for parallelizing pg_upgrade tasks

---
 src/bin/pg_upgrade/Makefile  |   1 +
 src/bin/pg_upgrade/async.c   | 323 +++
 src/bin/pg_upgrade/meson.build   |   1 +
 src/bin/pg_upgrade/pg_upgrade.h  |  16 ++
 src/tools/pgindent/typedefs.list |   4 +
 5 files changed, 345 insertions(+)
 create mode 100644 src/bin/pg_upgrade/async.c

diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index bde91e2beb..3bc4f5d740 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgra

Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela
Em seg., 1 de jul. de 2024 às 16:15, Alvaro Herrera 
escreveu:

> On 2024-Jul-01, Ranier Vilela wrote:
>
> > > -   charname[MAXPGPATH + 1];
> > > +   charname[MAXPGPATH];/* backup label name */
> > >
> > > With the introduced use of strlcpy, why do we need to change this
> field?
> > >
> > The part about being the only reference in the entire code that uses
> > MAXPGPATH + 1.
>
> The bit I don't understand about this discussion is what will happen
> with users that currently have exactly 1024 chars in backup names today.
> With this change, we'll be truncating their names to 1023 chars instead.
> Why would they feel that such change is welcome?
>
Yes of course, I understand.
This can be a problem for users.

best regards,
Ranier Vilela


Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

2024-07-01 Thread Joel Jacobson
On Mon, Jul 1, 2024, at 15:14, Joel Jacobson wrote:
> * 0001-Optimize-mul_var-for-var2ndigits-4.patch

Found a typo, fixed in new version.

The int128 version is still slower, though;
I wonder if there is something that can be done to speed it up further.

Below is a more realistic benchmark than just microbenchmarking mul_var(),
not testing the int128 version, but the code for up to 2 NBASE-digits:

```
CREATE TABLE bench_mul_var (num1 numeric, num2 numeric);

INSERT INTO bench_mul_var (num1, num2)
SELECT random(0::numeric,1e8::numeric), random(0::numeric,1e8::numeric) FROM 
generate_series(1,1e8);

SELECT SUM(num1*num2) FROM bench_mul_var;
Time: 8331.953 ms (00:08.332)
Time: 7415.241 ms (00:07.415)
Time: 7298.296 ms (00:07.298)
Time: 7314.754 ms (00:07.315)
Time: 7289.560 ms (00:07.290)

SELECT SUM(numeric_mul_patched(num1,num2)) FROM bench_mul_var;
Time: 6403.426 ms (00:06.403)
Time: 6401.797 ms (00:06.402)
Time: 6366.136 ms (00:06.366)
Time: 6376.049 ms (00:06.376)
Time: 6317.282 ms (00:06.317)
```

Benchmarked on a Intel Core i9-14900K machine.

/Joel

v2-0001-Optimize-mul_var-for-var2ndigits-4.patch
Description: Binary data


Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Ranier Vilela
Em seg., 1 de jul. de 2024 às 16:20, Daniel Gustafsson 
escreveu:

> > On 1 Jul 2024, at 21:15, Alvaro Herrera  wrote:
> > On 2024-Jul-01, Ranier Vilela wrote:
>
> >>> -   charname[MAXPGPATH + 1];
> >>> +   charname[MAXPGPATH];/* backup label name */
> >>>
> >>> With the introduced use of strlcpy, why do we need to change this
> field?
> >>>
> >> The part about being the only reference in the entire code that uses
> >> MAXPGPATH + 1.
> >
> > The bit I don't understand about this discussion is what will happen
> > with users that currently have exactly 1024 chars in backup names today.
> > With this change, we'll be truncating their names to 1023 chars instead.
> > Why would they feel that such change is welcome?
>
> That's precisely what I was getting at.  Maybe it makes sense to change,
> maybe
> not, but that's not for this patch to decide as that's a different
> discussion
> from using safe string copying API's.
>
Ok, this is not material for the proposal and can be considered unrelated.

We only have to replace it with strlcpy.

v7 attached.

best regards,
Ranier Vilela


v7-avoid-incomplete-copy-string-do_pg_backup_start.patch
Description: Binary data


Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-01 Thread Robert Haas
On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart  wrote:
> 0001 introduces a new API for registering callbacks and running them in
> parallel on all databases in the cluster.  This new system manages a set of
> "slots" that follow a simple state machine to asynchronously establish a
> connection and run the queries.  It uses system() to wait for these
> asynchronous tasks to complete.  Users of this API only need to provide two
> callbacks: one to return the query that should be run on each database and
> another to process the results of that query.  If multiple queries are
> required for each database, users can provide multiple sets of callbacks.

I do really like the idea of using asynchronous communication here. It
should be significantly cheaper than using multiple processes or
threads, and maybe less code, too. But I'm confused about why you
would need or want to use system() to wait for asynchronous tasks to
complete. Wouldn't it be something like select()?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-01 Thread Nathan Bossart
On Mon, Jul 01, 2024 at 03:58:16PM -0400, Robert Haas wrote:
> On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart  
> wrote:
>> 0001 introduces a new API for registering callbacks and running them in
>> parallel on all databases in the cluster.  This new system manages a set of
>> "slots" that follow a simple state machine to asynchronously establish a
>> connection and run the queries.  It uses system() to wait for these
>> asynchronous tasks to complete.  Users of this API only need to provide two
>> callbacks: one to return the query that should be run on each database and
>> another to process the results of that query.  If multiple queries are
>> required for each database, users can provide multiple sets of callbacks.
> 
> I do really like the idea of using asynchronous communication here. It
> should be significantly cheaper than using multiple processes or
> threads, and maybe less code, too. But I'm confused about why you
> would need or want to use system() to wait for asynchronous tasks to
> complete. Wouldn't it be something like select()?

Whoops, I meant to type "select()" there.  Sorry for the confusion.

-- 
nathan




Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

2024-07-01 Thread Daniel Gustafsson
> On 1 Jul 2024, at 21:58, Ranier Vilela  wrote:

> We only have to replace it with strlcpy.

Thanks, I'll have a look at applying this tomorrow morning.

--
Daniel Gustafsson





Re: Commitfest manager for July 2024

2024-07-01 Thread Kirill Reshke
> Postgres 17
18




Re: WIP: parallel GiST index builds

2024-07-01 Thread Tomas Vondra
Hi,

I've done a number of experiments with the GiST parallel builds, both
with the sorted and unsorted cases, so let me share some of the results
and conclusions from that.

In the first post I did some benchmarks using btree_gist, but that
seemed not very realistic - there certainly are much more widely used
GiST indexes in the GIS world. So this time I used OpenStreetMap, loaded
using osm2pgsql, with two dataset sizes:

- small - "north america" (121GB without indexes)
- large - "planet" (688GB without indexes)

And then I created indexes using either gist_geometry_ops_2d (with sort)
or gist_geometry_ops_nd (no sorting).


On 6/7/24 19:41, Tomas Vondra wrote:
> Hi,
> 
> After looking into parallel builds for BRIN and GIN indexes, I was
> wondering if there's a way to do parallel builds for GiST too. I knew
> next to nothing about how GiST works, but I gave it a shot and here's
> what I have - the attached patch allows parallel GiST builds for the
> "unsorted" case (i.e. when the opclass does not include sortsupport),
> and does not support buffered builds.
> 
> 
> unsorted builds only
> 
> 
> Addressing only the unsorted case may seem a bit weird, but I did it
> this way for two reasons - parallel sort is a solved problem, and adding
> this to the patch seems quite straightforward. It's what btree does, for
> example. But I also was not very sure how common this is - we do have
> sort for points, but I have no idea if the PostGIS indexes define
> sorting etc. My guess was "no" but I've been told that's no longer true,
> so I guess sorted builds are more widely applicable than I thought.


For sorted builds, I made the claim that parallelizing sorted builds is
"solved problem" because we can use a parallel tuplesort. I was thinking
that maybe it'd be better to do that in the initial patch, and only then
introduce the more complex stuff in the unsorted case, so I gave this a
try, and it turned out to be rather pointless.

Yes, parallel tuplesort does improve the duration, but it's not a very
significant improvement - maybe 10% or so. Most of the build time is
spent in gist_indexsortbuild(), so this is the part that would need to
be parallelized for any substantial improvement. Only then is it useful
to improve the tuplesort, I think.

And parallelizing gist_indexsortbuild() is not trivial - most of the
time is spent in gist_indexsortbuild_levelstate_flush() / gistSplit(),
so ISTM a successful parallel implementation would need to divide this
work between multiple workers. I don't have a clear idea how, though.

I do have a PoC/WIP patch doing the paralle tuplesort in my github
branch at [1] (and then also some ugly experiments on top of that), but
I'm not going to attach it here because of the reasons I just explained.
It'd be just a pointless distraction.

> In any case, I'm not in a rush to parallelize sorted builds. It can be
> added later, as an improvement, IMHO. In fact, it's a well isolated part
> of the patch, which might make it a good choice for someone looking for
> an idea for their first patch ...
> 

I still think this assessment is correct - it's fine to not parallelize
sorted builds. It can be improved in the future, or even not at all.

> 
> buffered builds
> ---
> 
> The lack of support for buffered builds is a very different thing. The
> basic idea is that we don't push the index entries all the way to the
> leaf pages right away, but accumulate them in buffers half-way through.
> This combines writes and reduces random I/O, which is nice.
> 
> Unfortunately, the way it's implemented does not work with parallel
> builds at all - all the state is in private memory, and it assumes the
> worker is the only possible backend that can split the page (at which
> point the buffers need to be split too, etc.). But for parallel builds
> this is obviously not true.
> 
> I'm not saying parallel builds can't do similar buffering, but it
> requires moving the buffers into shared memory, and introducing locking
> to coordinate accesses to the buffers. (Or perhaps it might be enough to
> only "notify" the workers about page splits, with buffers still in
> private memory?). Anyway, it seems far too complicated for v1.
> 
> In fact, I'm not sure the buffering is entirely necessary - maybe the
> increase in amount of RAM makes this less of an issue? If the index can
> fit into shared buffers (or at least page cache), maybe the amount of
> extra I/O is not that bad? I'm sure there may be cases really affected
> by this, but maybe it's OK to tell people to disable parallel builds in
> those cases?
> 

For unsorted builds, here's the results from one of the machines for
duration of CREATE INDEX with the requested number of workers (0 means
serial build) for different tables in the OSM databases:

  db type   size (MB)  | 0  1  2  3  4
  -|--
  small  line4889  |   811429294 

Re: Relation bulk write facility

2024-07-01 Thread Noah Misch
On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote:
> Committed this. Thanks everyone!

Commit 8af2565 wrote:
> --- /dev/null
> +++ b/src/backend/storage/smgr/bulk_write.c

> +/*
> + * Finish bulk write operation.
> + *
> + * This WAL-logs and flushes any remaining pending writes to disk, and fsyncs
> + * the relation if needed.
> + */
> +void
> +smgr_bulk_finish(BulkWriteState *bulkstate)
> +{
> + /* WAL-log and flush any remaining pages */
> + smgr_bulk_flush(bulkstate);
> +
> + /*
> +  * When we wrote out the pages, we passed skipFsync=true to avoid the
> +  * overhead of registering all the writes with the checkpointer.  
> Register
> +  * the whole relation now.
> +  *
> +  * There is one hole in that idea: If a checkpoint occurred while we 
> were
> +  * writing the pages, it already missed fsyncing the pages we had 
> written
> +  * before the checkpoint started.  A crash later on would replay the WAL
> +  * starting from the checkpoint, therefore it wouldn't replay our 
> earlier
> +  * WAL records.  So if a checkpoint started after the bulk write, fsync
> +  * the files now.
> +  */
> + if (!SmgrIsTemp(bulkstate->smgr))
> + {

Shouldn't this be "if (bulkstate->use_wal)"?  The GetRedoRecPtr()-based
decision is irrelevant to the !wal case.  Either we don't need fsync at all
(TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal).  I
don't see any functional problem, but this likely arranges for an unnecessary
sync when a checkpoint starts between mdcreate() and here.  (The mdcreate()
sync may also be unnecessary, but that's longstanding.)

> + /*
> +  * Prevent a checkpoint from starting between the 
> GetRedoRecPtr() and
> +  * smgrregistersync() calls.
> +  */
> + Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
> + MyProc->delayChkptFlags |= DELAY_CHKPT_START;
> +
> + if (bulkstate->start_RedoRecPtr != GetRedoRecPtr())
> + {
> + /*
> +  * A checkpoint occurred and it didn't know about our 
> writes, so
> +  * fsync() the relation ourselves.
> +  */
> + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> + smgrimmedsync(bulkstate->smgr, bulkstate->forknum);
> + elog(DEBUG1, "flushed relation because a checkpoint 
> occurred concurrently");
> + }
> + else
> + {
> + smgrregistersync(bulkstate->smgr, bulkstate->forknum);
> + MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
> + }
> + }
> +}

This is an elegant optimization.




Re: Relation bulk write facility

2024-07-01 Thread Heikki Linnakangas

Thanks for poking at this!

On 01/07/2024 23:52, Noah Misch wrote:

Commit 8af2565 wrote:

--- /dev/null
+++ b/src/backend/storage/smgr/bulk_write.c



+/*
+ * Finish bulk write operation.
+ *
+ * This WAL-logs and flushes any remaining pending writes to disk, and fsyncs
+ * the relation if needed.
+ */
+void
+smgr_bulk_finish(BulkWriteState *bulkstate)
+{
+   /* WAL-log and flush any remaining pages */
+   smgr_bulk_flush(bulkstate);
+
+   /*
+* When we wrote out the pages, we passed skipFsync=true to avoid the
+* overhead of registering all the writes with the checkpointer.  
Register
+* the whole relation now.
+*
+* There is one hole in that idea: If a checkpoint occurred while we 
were
+* writing the pages, it already missed fsyncing the pages we had 
written
+* before the checkpoint started.  A crash later on would replay the WAL
+* starting from the checkpoint, therefore it wouldn't replay our 
earlier
+* WAL records.  So if a checkpoint started after the bulk write, fsync
+* the files now.
+*/
+   if (!SmgrIsTemp(bulkstate->smgr))
+   {


Shouldn't this be "if (bulkstate->use_wal)"?  The GetRedoRecPtr()-based
decision is irrelevant to the !wal case.  Either we don't need fsync at all
(TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal).


The point of GetRedoRecPtr() is to detect if a checkpoint has started 
concurrently. It works for that purpose whether or not the bulk load is 
WAL-logged. It is not compared with the LSNs of WAL records written by 
the bulk load.


Unlogged tables do need to be fsync'd. The scenario is:

1. Bulk load an unlogged table.
2. Shut down Postgres cleanly
3. Pull power plug from server, and restart.

We talked about this earlier in the "Unlogged relation copy is not 
fsync'd" thread [1]. I had already forgotten about that; that bug 
actually still exists in back branches, and we should fix it.


[1] 
https://www.postgresql.org/message-id/flat/65e94fc8-ce1d-dd02-3be3-fda0fe8f2965%40iki.fi



I don't see any functional problem, but this likely arranges for an
unnecessary sync when a checkpoint starts between mdcreate() and
here.  (The mdcreate() sync may also be unnecessary, but that's
longstanding.)
Hmm, yes we might do two fsyncs() with wal_level=minimal, unnecessarily. 
It seems hard to eliminate the redundancy. smgr_bulk_finish() could skip 
the fsync, if it knew that smgrDoPendingSyncs() will do it later. 
However, smgrDoPendingSyncs() might also decide to WAL-log the relation 
instead of fsyncing it, and in that case we do still need the fsync.


Fortunately, fsync() on a file that's already flushed to disk is pretty 
cheap.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

2024-07-01 Thread Dean Rasheed
On Mon, 1 Jul 2024 at 20:56, Joel Jacobson  wrote:
>
> Below is a more realistic benchmark
>
> CREATE TABLE bench_mul_var (num1 numeric, num2 numeric);
>
> INSERT INTO bench_mul_var (num1, num2)
> SELECT random(0::numeric,1e8::numeric), random(0::numeric,1e8::numeric) FROM 
> generate_series(1,1e8);
>
> SELECT SUM(num1*num2) FROM bench_mul_var;

I had a play with this, and came up with a slightly different way of
doing it that works for var2 of any size, as long as var1 is just 1 or
2 digits.

Repeating your benchmark where both numbers have up to 2 NBASE-digits,
this new approach was slightly faster:

SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 4762.990 ms (00:04.763)
Time: 4332.166 ms (00:04.332)
Time: 4276.211 ms (00:04.276)
Time: 4247.321 ms (00:04.247)
Time: 4166.738 ms (00:04.167)

SELECT SUM(num1*num2) FROM bench_mul_var; -- v2 patch
Time: 4398.812 ms (00:04.399)
Time: 3672.668 ms (00:03.673)
Time: 3650.227 ms (00:03.650)
Time: 3611.420 ms (00:03.611)
Time: 3534.218 ms (00:03.534)

SELECT SUM(num1*num2) FROM bench_mul_var; -- this patch
Time: 3350.596 ms (00:03.351)
Time: 3336.224 ms (00:03.336)
Time: 3335.599 ms (00:03.336)
Time: 3336.990 ms (00:03.337)
Time: 3351.453 ms (00:03.351)

(This was on an older Intel Core i9-9900K, so I'm not sure why all the
timings are faster. What compiler settings are you using?)

The approach taken in this patch only uses 32-bit integers, so in
theory it could be extended to work for var1ndigits = 3, 4, or even
more, but the code would get increasingly complex, and I ran out of
steam at 2 digits. It might be worth trying though.

Regards,
Dean
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 5510a20..adbfd5c 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8748,6 +8748,74 @@ mul_var(const NumericVar *var1, const Nu
 	}
 
 	/*
+	 * Simplified fast-path computation, if var1 has just one or two digits.
+	 * This is significantly faster, since it avoids allocating a separate
+	 * digit array, making multiple passes over var2, and having separate
+	 * carry-propagation passes.
+	 */
+	if (var1ndigits <= 2)
+	{
+		NumericDigit *res_buf;
+
+		/* Allocate result digit array */
+		res_buf = digitbuf_alloc(res_ndigits);
+		res_buf[0] = 0;			/* spare digit for later rounding */
+		res_digits = res_buf + 1;
+
+		/*
+		 * Compute the result digits directly, in one pass, propagating the
+		 * carry up as we go.
+		 */
+		switch (var1ndigits)
+		{
+			case 1:
+				carry = 0;
+				for (i = res_ndigits - 3; i >= 0; i--)
+				{
+					newdig = (int) var1digits[0] * var2digits[i] + carry;
+					res_digits[i + 1] = (NumericDigit) (newdig % NBASE);
+					carry = newdig / NBASE;
+				}
+				res_digits[0] = (NumericDigit) carry;
+				break;
+
+			case 2:
+				newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];
+				res_digits[res_ndigits - 2] = (NumericDigit) (newdig % NBASE);
+				carry = newdig / NBASE;
+
+				for (i = res_ndigits - 4; i > 0; i--)
+				{
+					newdig = (int) var1digits[0] * var2digits[i] +
+						(int) var1digits[1] * var2digits[i - 1] + carry;
+					res_digits[i + 1] = (NumericDigit) (newdig % NBASE);
+					carry = newdig / NBASE;
+				}
+
+				newdig = (int) var1digits[0] * var2digits[0] + carry;
+				res_digits[1] = (NumericDigit) (newdig % NBASE);
+				res_digits[0] = (NumericDigit) (newdig / NBASE);
+				break;
+		}
+
+		/* Store the product in result (minus extra rounding digit) */
+		digitbuf_free(result->buf);
+		result->ndigits = res_ndigits - 1;
+		result->buf = res_buf;
+		result->digits = res_digits;
+		result->weight = res_weight - 1;
+		result->sign = res_sign;
+
+		/* Round to target rscale (and set result->dscale) */
+		round_var(result, rscale);
+
+		/* Strip leading and trailing zeroes */
+		strip_var(result);
+
+		return;
+	}
+
+	/*

Re: Built-in CTYPE provider

2024-07-01 Thread Noah Misch
On Mon, Jul 01, 2024 at 12:24:15PM -0700, Jeff Davis wrote:
> On Sat, 2024-06-29 at 15:08 -0700, Noah Misch wrote:
> > lower(), initcap(), upper(), and regexp_matches() are
> > PROVOLATILE_IMMUTABLE.
> > Until now, we've delegated that responsibility to the user.  The user
> > is
> > supposed to somehow never update libc or ICU in a way that changes
> > outcomes
> > from these functions.
> 
> To me, "delegated" connotes a clear and organized transfer of
> responsibility to the right person to solve it. In that sense, I
> disagree that we've delegated it.

Good point.

> >   Now that postgresql.org is taking that responsibility
> > for builtin C.UTF-8, how should we govern it?  I think the above text
> > and [1]
> > convey that we'll update the Unicode data between major versions,
> > making
> > functions like lower() effectively STABLE.  Is that right?
> 
> Marking them STABLE is not a viable option, that would break a lot of
> valid use cases, e.g. an index on LOWER().

I agree.
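
A minimal sketch of that use case (hypothetical table and index names):

    CREATE TABLE users (email text);
    CREATE INDEX users_email_lower_idx ON users (lower(email));
    -- expression indexes require the indexed expression to be IMMUTABLE;
    -- after a provider/Unicode update that changes lower()'s results, the
    -- index only becomes consistent with the new behavior via:
    REINDEX INDEX users_email_lower_idx;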

> I don't think we need code changes for 17. Some documentation changes
> might be helpful, though. Should we have a note around LOWER()/UPPER()
> that users should REINDEX any dependent indexes when the provider is
> updated?

I agree the v17 code is fine.  Today, a user can (with difficulty) choose
dependency libraries so regexp_matches() is IMMUTABLE, as marked.  I don't
want $SUBJECT to be the ctype that, at some post-v17 version, can't achieve
that with unpatched PostgreSQL.  Let's change the documentation to say this
provider uses a particular snapshot of Unicode data, taken around PostgreSQL
17.  We plan never to change that data, so IMMUTABLE functions can rely on the
data.  If we provide a newer Unicode data set in the future, we'll provide it
in such a way that DDL must elect the new data.  How well would that suit your
vision for this feature?  An alternative would be to make pg_upgrade reject
operating on a cluster that contains use of $SUBJECT.
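
Rough wording sketch, just to make that concrete (not proposed final text):

    The builtin provider uses a snapshot of the Unicode character
    database taken during PostgreSQL 17 development.  This data will
    not change in later releases, so the results of IMMUTABLE functions
    such as lower() and upper() under this provider remain stable
    across upgrades.  If a newer Unicode version is offered in the
    future, adopting it will require explicit DDL.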




Re: Relation bulk write facility

2024-07-01 Thread Noah Misch
On Tue, Jul 02, 2024 at 12:53:05AM +0300, Heikki Linnakangas wrote:
> On 01/07/2024 23:52, Noah Misch wrote:
> > Commit 8af2565 wrote:
> > > --- /dev/null
> > > +++ b/src/backend/storage/smgr/bulk_write.c
> > 
> > > +/*
> > > + * Finish bulk write operation.
> > > + *
> > > + * This WAL-logs and flushes any remaining pending writes to disk, and 
> > > fsyncs
> > > + * the relation if needed.
> > > + */
> > > +void
> > > +smgr_bulk_finish(BulkWriteState *bulkstate)
> > > +{
> > > + /* WAL-log and flush any remaining pages */
> > > + smgr_bulk_flush(bulkstate);
> > > +
> > > + /*
> > > +  * When we wrote out the pages, we passed skipFsync=true to avoid the
> > > +  * overhead of registering all the writes with the checkpointer.  
> > > Register
> > > +  * the whole relation now.
> > > +  *
> > > +  * There is one hole in that idea: If a checkpoint occurred while we 
> > > were
> > > +  * writing the pages, it already missed fsyncing the pages we had 
> > > written
> > > +  * before the checkpoint started.  A crash later on would replay the WAL
> > > +  * starting from the checkpoint, therefore it wouldn't replay our 
> > > earlier
> > > +  * WAL records.  So if a checkpoint started after the bulk write, fsync
> > > +  * the files now.
> > > +  */
> > > + if (!SmgrIsTemp(bulkstate->smgr))
> > > + {
> > 
> > Shouldn't this be "if (bulkstate->use_wal)"?  The GetRedoRecPtr()-based
> > decision is irrelevant to the !wal case.  Either we don't need fsync at all
> > (TEMP or UNLOGGED) or smgrDoPendingSyncs() will do it (wal_level=minimal).
> 
> The point of GetRedoRecPtr() is to detect if a checkpoint has started
> concurrently. It works for that purpose whether or not the bulk load is
> WAL-logged. It is not compared with the LSNs of WAL records written by the
> bulk load.

I think the significance of start_RedoRecPtr is it preceding all records
needed to recreate the bulk write.  If start_RedoRecPtr==GetRedoRecPtr() and
we crash after commit, we're indifferent to whether the rel gets synced at a
checkpoint before that crash or rebuilt from WAL after that crash.  If
start_RedoRecPtr!=GetRedoRecPtr(), some WAL of the bulk write is already
deleted, so only smgrimmedsync() suffices.  Overall, while it is not compared
with LSNs in WAL records, it's significant only to the extent that such a WAL
record exists.  What am I missing?
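
Spelling out the WAL-logged scenario I have in mind:

  start bulk write         start_RedoRecPtr = R
  write pages (skipFsync=true), emit WAL
  checkpoint completes     REDO moves to R' > R; our earlier pages were
                           neither fsynced by it nor registered with it
  smgr_bulk_finish()       GetRedoRecPtr() = R' != R, so only
                           smgrimmedsync() is safe; merely registering
                           the sync would leave a crash window in which
                           recovery starts at R' and never replays the
                           earlier WAL.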

> Unlogged tables do need to be fsync'd. The scenario is:
> 
> 1. Bulk load an unlogged table.
> 2. Shut down Postgres cleanly
> 3. Pull power plug from server, and restart.
> 
> We talked about this earlier in the "Unlogged relation copy is not fsync'd"
> thread [1]. I had already forgotten about that; that bug actually still
> exists in back branches, and we should fix it..
> 
> [1] 
> https://www.postgresql.org/message-id/flat/65e94fc8-ce1d-dd02-3be3-fda0fe8f2965%40iki.fi

Ah, that's right.  I agree this code suffices for unlogged.  As a further
optimization, it would be valid to ignore GetRedoRecPtr() for unlogged and
always call smgrregistersync().  (For any rel, smgrimmedsync() improves on
smgrregistersync() only if we fail to reach the shutdown checkpoint.  Without
a shutdown checkpoint, unlogged rels get reset anyway.)

> > I don't see any functional problem, but this likely arranges for an
> > unnecessary sync when a checkpoint starts between mdcreate() and
> > here.  (The mdcreate() sync may also be unnecessary, but that's
> > longstanding.)
> Hmm, yes we might do two fsyncs() with wal_level=minimal, unnecessarily. It
> seems hard to eliminate the redundancy. smgr_bulk_finish() could skip the
> fsync, if it knew that smgrDoPendingSyncs() will do it later. However,
> smgrDoPendingSyncs() might also decide to WAL-log the relation instead of
> fsyncing it, and in that case we do still need the fsync.

We do not need the fsync in the "WAL-log the relation instead" case; see
https://postgr.es/m/20230921062210.ga110...@rfd.leadboat.com

So maybe like this:

  if (use_wal) /* includes init forks */
      current logic;
  else if (unlogged)
      smgrregistersync;
  /* else temp || (permanent && wal_level=minimal): nothing to do */
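
Rendered as C against the current structure (a sketch only; field names
follow bulk_write.c, and the unlogged branch assumes the relation's
persistence is made available to smgr_bulk_finish(), e.g. stored in
BulkWriteState):

	if (bulkstate->use_wal)		/* includes init forks */
	{
		/* current logic: fsync now if a checkpoint intervened, else register */
		if (bulkstate->start_RedoRecPtr != GetRedoRecPtr())
			smgrimmedsync(bulkstate->smgr, bulkstate->forknum);
		else
			smgrregistersync(bulkstate->smgr, bulkstate->forknum);
	}
	else if (relpersistence == RELPERSISTENCE_UNLOGGED)	/* hypothetical field/parameter */
		smgrregistersync(bulkstate->smgr, bulkstate->forknum);
	/* else temp, or permanent under wal_level=minimal: nothing to do */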

> Fortunately, fsync() on a file that's already flushed to disk is pretty
> cheap.

Yep.  I'm more concerned about future readers wondering why the function is
using LSNs to decide what to do about data that doesn't appear in WAL.  A
comment could be another way to fix that, though.




Re: Parallel CREATE INDEX for GIN indexes

2024-07-01 Thread Andy Fan
Tomas Vondra  writes:

Hi Tomas,

I am in the middle of an incomplete review, but I probably won't have much
time for this because of my internal tasks, so I am just sharing what I
did along with the patch, which did not produce good results.

What I tried to do is:
1) Remove all the "sort" effort for the state->bs_sort_state tuples, since
its input comes from state->bs_worker_state, which is sorted already.

2) Remove the *partial* "sort" operations between accum.rbtree and
state->bs_worker_state, because the tuples in accum.rbtree are sorted
already.

Both of them depend on the same API changes.

1. 
struct Tuplesortstate
{
..
+ bool input_presorted;  /* the input tuples are presorted */
+ bool new_tapes;        /* when set, dump the in-memory tuples into a new 'run' */
}

and the user can set it via tuplesort_begin_xx(..., presorted=true);


2. In tuplesort_puttuple, if memory is full but presorted is
true, we can (a) avoid the sort, and (b) reuse the existing 'runs'
to reduce the effort of 'mergeruns', unless new_tapes is set to
true. Once it switches to a new tape, set state->new_tapes to false
and wait for 3) to change it back to true.

3. tuplesort_dumptuples(..);  // dump the in-memory tuples and set
new_tapes=true to signal that *this batch of presorted input is complete;
the next batch is only presorted within its own batch*.

In the gin-parallel-build case, for case 1) we can just use

for tuple in bs_worker_sort:
    tuplesort_putgintuple(state->bs_sortstate, ..);
tuplesort_dumptuples(..);

In the end we get: a) only one run in the worker, so the leader has
fewer runs to merge in mergeruns; b) less sorting in both
perform_sort_tuplesort and sortstate_puttuple_common.

For the case 2) we can have:

for tuple in RBTree.tuples:
    tuplesort_puttuples(tuple);
    // this may cause a dumptuples internally when memory is full,
    // but that is OK.
tuplesort_dumptuples(..);

This way we can skip the "sort" in sortstate_puttuple_common, but it
probably increases the number of 'runs' in the sortstate, which in turn
increases the effort of mergeruns at the end.

But the test results are not good; maybe the 'sort' is not a key factor
here. I did miss the perf profiling step before doing this, or maybe my
test data is too small.

Here is the patch I used for the above, and the SQL I used to test it.

CREATE TABLE t(a int[], b numeric[]);

-- generate 1000 * 1000 rows.
insert into t select i, n
from normal_rand_array(1000, 90, 1::int4, 1::int4) i,
normal_rand_array(1000, 90, '1.00233234'::numeric, '8.239241989134'::numeric) n;

alter table t set (parallel_workers=4);
set debug_parallel_query to on;
set max_parallel_maintenance_workers to 4;

create index on t using gin(a);
create index on t using gin(b);

I found normal_rand_array handy to use in this case, and I have
registered it at https://commitfest.postgresql.org/48/5061/.

Besides the above, I didn't find anything wrong in the current
patch, and the above can be categorized as "future improvement",
even if it is worth doing.

-- 
Best Regards
Andy Fan

>From 48c2e03fd854c8f88f781adc944f37b004db0721 Mon Sep 17 00:00:00 2001
From: Andy Fan 
Date: Sat, 8 Jun 2024 13:21:08 +0800
Subject: [PATCH v20240702 1/3] Add function normal_rand_array function to
 contrib/tablefunc.

It can produce arrays of numbers with a controllable array length and
duplicated elements in those arrays.
---
 contrib/tablefunc/Makefile|   2 +-
 contrib/tablefunc/expected/tablefunc.out  |  26 
 contrib/tablefunc/sql/tablefunc.sql   |  10 ++
 contrib/tablefunc/tablefunc--1.0--1.1.sql |   7 ++
 contrib/tablefunc/tablefunc.c | 140 ++
 contrib/tablefunc/tablefunc.control   |   2 +-
 doc/src/sgml/tablefunc.sgml   |  10 ++
 src/backend/utils/adt/arrayfuncs.c|   7 ++
 8 files changed, 202 insertions(+), 2 deletions(-)
 create mode 100644 contrib/tablefunc/tablefunc--1.0--1.1.sql

diff --git a/contrib/tablefunc/Makefile b/contrib/tablefunc/Makefile
index 191a3a1d38..f0c67308fd 100644
--- a/contrib/tablefunc/Makefile
+++ b/contrib/tablefunc/Makefile
@@ -3,7 +3,7 @@
 MODULES = tablefunc
 
 EXTENSION = tablefunc
-DATA = tablefunc--1.0.sql
+DATA = tablefunc--1.0.sql tablefunc--1.0--1.1.sql
 PGFILEDESC = "tablefunc - various functions that return tables"
 
 REGRESS = tablefunc
diff --git a/contrib/tablefunc/expected/tablefunc.out b/contrib/tablefunc/expected/tablefunc.out
index ddece79029..9f0cbbfbbe 100644
--- a/contrib/tablefunc/expected/tablefunc.out
+++ b/contrib/tablefunc/expected/tablefunc.out
@@ -12,6 +12,32 @@ SELECT avg(normal_rand)::int, count(*) FROM normal_rand(100, 250, 0.2);
 -- negative number of tuples
 SELECT avg(normal_rand)::int, count(*) FROM normal_rand(-1, 250, 0.2);
 ERROR:  number of rows cannot be negative
+SELECT count(*), avg(COALESCE(array_length(i, 1), 0)) FROM normal_rand_array(10, 3, 1.23::numeric, 8::numeric) as i;
+ count |avg 
+---+
+10 | 3.
+(1 row)
+
+
