RE: Synchronizing slots from primary to standby

2024-04-12 Thread Zhijie Hou (Fujitsu)
On Friday, April 12, 2024 11:31 AM Amit Kapila  wrote:
> 
> On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > On Thursday, April 11, 2024 12:11 PM Amit Kapila 
> wrote:
> >
> > >
> > > 2.
> > > - if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
> > >   elog(ERROR,
> > >   "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> > >
> > > Can we be more specific in this message? How about splitting it into
> > > error_message as "cannot synchronize local slot \"%s\"" and then
> > > errdetail as "Local slot's start streaming location LSN(%X/%X) is
> > > ahead of remote slot's LSN(%X/%X)"?
> >
> > Your version looks better. Since both of the above messages now have
> > errdetail, I used the ereport(ERROR, errmsg_internal(), errdetail_internal(), ...)
> > style in the patch, which is equivalent to elog(ERROR) but carries an
> > additional detail message.
> >
> 
> makes sense.
> 
> > Here is V5 patch set.
> >
> 
> I think we should move the check that prevents advancing the slot when either
> the remote_slot's restart_lsn or catalog_xmin is less than the local slot's
> into update_local_synced_slot(), as we want to prevent updating the slot in
> those cases even during slot synchronization.

Agreed. Here is the V6 patch which addressed this. I have merged the
two patches into one.

Best Regards,
Hou zj


v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch
Description:  v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch


Re: Can't find not null constraint, but \d+ shows that

2024-04-12 Thread Alvaro Herrera
On 2024-Apr-12, jian he wrote:

> Now I am more confused...

> +CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));
> +ALTER TABLE  notnull_tbl1 DROP c1;

> the same query in MySQL leaves "c0" as not null

Yes, that was Postgres' old model.  But the way we think of it now is
that a column is marked attnotnull when a pg_constraint entry exists to
support that flag, which can be a not-null constraint or a primary key
constraint.  In the old Postgres model, you're right that we would
continue to have c0 as not-null, just like MySQL.  In the new model,
that flag no longer has any reason to be there, because the backing
primary key constraint has been removed, which is why we reset it.
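
To illustrate the new model with a minimal sketch (assuming the behavior
described above, where nothing backs the flag once the PK is gone):

CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));
-- dropping c1 also drops the two-column primary key, so no pg_constraint
-- entry supports c0's attnotnull flag any more and the flag is reset
ALTER TABLE notnull_tbl1 DROP c1;
\d notnull_tbl1   -- c0 now shows as nullable; no not-null constraint is listed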

So what I was saying about the cases with replica identity and generated
columns is that there's an attnotnull flag we cannot remove because of
either of those features, yet we don't have any backing constraint for it,
which is an inconsistency with the view of the world that I described
above.  I would like to manufacture a not-null constraint at that point,
or just abort the drop of the PK ... but I don't see how to do either of
those things.


If you want the c0 column to be still not-null after dropping the
primary key, you need to SET NOT NULL:

CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));   

ALTER TABLE notnull_tbl1 ALTER c0 SET NOT NULL;
ALTER TABLE  notnull_tbl1 DROP c1;
\d+ notnull_tbl1
  Table "public.notnull_tbl1"
 Column │  Type   │ Collation │ Nullable │ Default │ Storage │ Compression │ 
Stats target │ Description 
┼─┼───┼──┼─┼─┼─┼──┼─
 c0 │ integer │   │ not null │ │ plain   │ │
  │ 
Not-null constraints:
"notnull_tbl1_c0_not_null" NOT NULL "c0"
Access method: heap


One thing that's not quite ideal is that the "Nullable" column doesn't
make it obvious that the flag is going to be removed if you drop the PK;
you have to infer that that's going to happen by noticing that there's
no explicit not-null constraint listed for that column -- maybe too
subtle, especially if you have a lot of columns (luckily, PKs normally
don't have too many columns).  This is why I suggested changing the
contents of that column when the flag is sustained by the PK.  Something
like this, perhaps:

=# CREATE TABLE notnull_tbl1 (c0 int not null, c1 int, PRIMARY KEY (c0, c1));
=# \d+ notnull_tbl1
                                              Table "public.notnull_tbl1"
 Column │  Type   │ Collation │  Nullable   │ Default │ Storage │ Compression │ Stats target │ Description
────────┼─────────┼───────────┼─────────────┼─────────┼─────────┼─────────────┼──────────────┼─────────────
 c0     │ integer │           │ not null    │         │ plain   │             │              │
 c1     │ integer │           │ primary key │         │ plain   │             │              │
Indexes:
    "notnull_tbl1_pkey" PRIMARY KEY, btree (c0, c1)
Not-null constraints:
    "notnull_tbl1_c0_not_null" NOT NULL "c0"
Access method: heap

which should make it obvious.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwise.
It's worth the effort. Recommended."  (Gerry Pourwelle)




Re: Incorrect handling of IS [NOT] NULL quals on inheritance parents

2024-04-12 Thread David Rowley
On Thu, 11 Apr 2024 at 18:49, Richard Guo  wrote:
> I agree with both of your points.  But I also think we do not need to
> skip applying the NOT NULL qual optimization for partitioned tables.
> For partitioned tables, if the parent is marked NOT NULL, then all its
> children must also be marked NOT NULL.  And we've already populated
> notnullattnums for partitioned tables in get_relation_info.  Applying
> this optimization for partitioned tables can help save some cycles in
> apply_child_basequals if we've reduced or skipped some restriction
> clauses for a partitioned table.  This means in add_base_clause_to_rel
> we need to also check rte->relkind:
>
> -   if (!rte->inh)
> +   if (!rte->inh || rte->relkind == RELKIND_PARTITIONED_TABLE)

I was skipping this on purpose, as I wasn't sure that we'd never expand
restriction_is_always_false() and restriction_is_always_true() to also
handle partition constraints.  However, after thinking about that some
more, the partition constraint can only become more strict the deeper
down the partition levels you go.  If it were possible to insert a row
directly into a leaf partition that wouldn't be allowed when inserting
via the parent, then there would be a bug in the partition constraint.
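
To make that concrete, a rough sketch of the plan shape this optimization
aims for on partitioned tables (hypothetical table names; the exact output
depends on the committed behavior):

CREATE TABLE p (a int NOT NULL) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (100);

-- with the optimization also applied to partitioned parents, the redundant
-- qual can be removed before apply_child_basequals() has to look at it
EXPLAIN (COSTS OFF) SELECT * FROM p WHERE a IS NOT NULL;
-- expected: a Seq Scan on p1 with no "Filter: (a IS NOT NULL)" line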

I also considered if we'd ever add support to remove redundant quals
in CHECK constraints and if there was room for problematic mismatches
between partitioned table and leaf partitions, but I see we've thought
about that:

postgres=# alter table only p add constraint p_check check (a = 0);
ERROR:  constraint must be added to child tables too

> I also think we should update the related comments for
> apply_child_basequals and its caller, as my v1 patch does, since now we
> might reduce or skip some of the resulting clauses.

I felt the comments you wanted to add at the call site of
apply_child_basequals() knew too much about what lies within that
function.  The caller needn't know anything about what optimisations
are applied in apply_child_basequals(). In my book, that's just as bad
as documenting things about the calling function from within a
function. Since functions are designed to be reused, you're just
asking for such comments to become outdated as soon as we teach
apply_child_basequals() some new tricks. In this case, all the caller
needs to care about is properly handling a false return value.

After further work on the comments, I pushed the result.

Thanks for working on this.

David




Re: Allow non-superuser to cancel superuser tasks.

2024-04-12 Thread Kirill Reshke
On Thu, 11 Apr 2024 at 16:55, Kirill Reshke  wrote:

> 7)
>
> > +# Copyright (c) 2022-2024, PostgreSQL Global Development Group
> > [...]
> > +# Copyright (c) 2024-2024, PostgreSQL Global Development Group
>
> > These need to be cleaned up.
>
> > +# Makefile for src/test/recovery
> > +#
> > +# src/test/recovery/Makefile
>
> > This is incorrect, twice.
>
> Cleaned up, thanks!

Oh, wait, I did this wrong.

Should I use

+# Copyright (c) 2024-2024, PostgreSQL Global Development Group

(Like in src/test/signals/meson.build &
src/test/signals/t/001_signal_autovacuum.pl)
or

+#
+# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
(Like in src/test/signals/Makefile)

at the beginning of each added file?




Re: altering a column's collation leaves an invalid foreign key

2024-04-12 Thread jian he
On Tue, Mar 26, 2024 at 1:00 PM jian he  wrote:
>
> On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth
>  wrote:
> >
> > On 3/23/24 10:04, Paul Jungwirth wrote:
> > > Perhaps if the previous collation was nondeterministic we should force a 
> > > re-check.
> >
> > Here is a patch implementing this. It was a bit more fuss than I expected, 
> > so maybe someone has a
> > better way.
> >
>
> + /* test follows the one in ri_FetchConstraintInfo() */
> + if (ARR_NDIM(arr) != 1 ||
> + ARR_HASNULL(arr) ||
> + ARR_ELEMTYPE(arr) != INT2OID)
> + elog(ERROR, "conkey is not a 1-D smallint array");
> + attarr = (AttrNumber *) ARR_DATA_PTR(arr);
> +
> + /* stash a List of the collation Oids in our Constraint node */
> + for (i = 0; i < numkeys; i++)
> + con->old_collations = lappend_oid(con->old_collations,
> +  list_nth_oid(changedCollationOids, attarr[i] - 1));
>
> I don't understand the "ri_FetchConstraintInfo" comment.

I still don't understand this comment.

>
> +static void
> +RememberCollationForRebuilding(AttrNumber attnum, AlteredTableInfo *tab)
> +{
> + Oid typid;
> + int32 typmod;
> + Oid collid;
> + ListCell   *lc;
> +
> + /* Fill in the list with InvalidOid if this is our first visit */
> + if (tab->changedCollationOids == NIL)
> + {
> + int len = RelationGetNumberOfAttributes(tab->rel);
> + int i;
> +
> + for (i = 0; i < len; i++)
> + tab->changedCollationOids = lappend_oid(tab->changedCollationOids,
> + InvalidOid);
> + }
> +
> + get_atttypetypmodcoll(RelationGetRelid(tab->rel), attnum,
> +  &typid, &typmod, &collid);
> +
> + lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
> + lfirst_oid(lc) = collid;
> +}
>
> do we need to check if `collid` is a valid collation?
> like:
>
> if (!OidIsValid(collid))
> {
> lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
> lfirst_oid(lc) = collid;
> }
Now I think RememberCollationForRebuilding is fine; no further change is needed.


In ATAddForeignKeyConstraint:
if (old_check_ok)
{
/*
* When a pfeqop changes, revalidate the constraint.  We could
* permit intra-opfamily changes, but that adds subtle complexity
* without any concrete benefit for core types.  We need not
* assess ppeqop or ffeqop, which RI_Initial_Check() does not use.
*/
old_check_ok = (pfeqop == lfirst_oid(old_pfeqop_item));
old_pfeqop_item = lnext(fkconstraint->old_conpfeqop,
old_pfeqop_item);
}
Maybe we can add the logic right below it, like:

if (old_check_ok)
{
	/*
	 * All deterministic collations use bitwise equality to resolve
	 * ties, but if the previous collation was nondeterministic, we
	 * must re-check the foreign key, because some references that
	 * used to be "equal" may not be anymore.  If we have InvalidOid
	 * here, either there was no collation or the attribute didn't
	 * change.
	 */
	old_check_ok = (old_collation == InvalidOid ||
					get_collation_isdeterministic(old_collation));
}

Then we don't need to cram it into the other `if (old_check_ok) {}` block.
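
For context, a small SQL sketch of the situation that motivates the re-check
(hypothetical names; assumes an ICU-enabled build):

CREATE COLLATION case_insensitive (provider = icu,
                                   locale = 'und-u-ks-level2',
                                   deterministic = false);
CREATE TABLE pk (t text COLLATE case_insensitive PRIMARY KEY);
CREATE TABLE fk (t text COLLATE case_insensitive REFERENCES pk);
INSERT INTO pk VALUES ('foo');
INSERT INTO fk VALUES ('FOO');  -- accepted: 'FOO' and 'foo' compare equal here

-- changing to a deterministic collation means rows that were "equal" under
-- the old collation may no longer match, which is why the FK should be
-- re-checked rather than assumed valid
ALTER TABLE pk ALTER COLUMN t TYPE text COLLATE "C";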


Do we need to add an entry in
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items#Open_Issues
?




Re: Add notes to pg_combinebackup docs

2024-04-12 Thread Magnus Hagander
On Fri, Apr 12, 2024 at 12:14 AM David Steele  wrote:

>
>
> On 4/11/24 20:51, Tomas Vondra wrote:
> > On 4/11/24 02:01, David Steele wrote:
> >>
> >> I have a hard time seeing this feature as being very useful, especially
> >> for large databases, until pg_combinebackup works on tar (and compressed
> >> tar). Right now restoring an incremental requires at least twice the
> >> space of the original cluster, which is going to take a lot of users by
> >> surprise.
> >
> > I do agree it'd be nice if pg_combinebackup worked with .tar directly,
> > without having to extract the directories first. No argument there, but
> > as I said in the other thread, I believe that's something we can add
> > later. That's simply how incremental development works.
>
> OK, sure, but if the plan is to make it practical later doesn't that
> make the feature something to be avoided now?
>

That could be said for any feature. When we shipped streaming replication,
the plan was to support synchronous in the future. Should we not have
shipped it, or told people to avoid it?

Sure, the current state limits its uses in some cases. But it still leaves
a bunch of other cases where it works just fine.




> >> I know you have made some improvements here for COW filesystems, but my
> >> experience is that Postgres is generally not run on such filesystems,
> >> though that is changing a bit.
> >
> > I'd say XFS is a pretty common choice, for example. And it's one of the
> > filesystems that work great with pg_combinebackup.
>
> XFS has certainly advanced more than I was aware.
>

And it happens to be the default on at least one of our most common
platforms.


> > However, who says this has to be the filesystem the Postgres instance
> > runs on? Who in their right mind put backups on the same volume as the
> > instance anyway? At which point it can be a different filesystem, even
> > if it's not ideal for running the database.
>
> My experience is these days backups are generally placed in object
> stores. Sure, people are still using NFS but admins rarely have much
> control over those volumes. They may or not be COW filesystems.
>

If it's mounted through NFS I assume pg_combinebackup won't actually be
able to use the COW features? Or does that actually work through NFS?

Mounted LUNs on a SAN I find more common today though, and there it would
do a fine job.


>
> > FWIW I think it's fine to tell users that to minimize the disk space
> > requirements, they should use a CoW filesystem and --copy-file-range.
> > The docs don't say that currently, that's true.
>
> That would probably be a good addition to the docs.
>

+1, that would be a good improvement.


> > All of this also depends on how people do the restore. With the CoW
> > stuff they can do a quick (and small) copy on the backup server, and
> > then copy the result to the actual instance. Or they can do restore on
> > the target directly (e.g. by mounting a r/o volume with backups), in
> > which case the CoW won't really help.
>
> And again, this all requires a significant amount of setup and tooling.
> Obviously I believe good backup requires effort but doing this right
> gets very complicated due to the limitations of the tool.
>

It clearly needs to be documented that there are space needs. But
temporarily getting space for something like that is not very complicated
in most environments. But you do have to be aware of it.

Generally speaking it's already the case that the "restore experience" with
pg_basebackup is far from great. We don't have a "pg_baserestore". You
still have to deal with archive_command and restore_command, which we all
know can be easy to get wrong. I don't see how this is fundamentally worse
than that.

Personally, I tend to recommend that "if you want PITR and thus need to
mess with archive_command etc, you should use a backup tool like
pg_backrest. If you're fine with just daily backups or whatnot, use
pg_basebackup". The incremental backup story fits somewhere in between, but
I'd still say this is (today) primarily a tool directed at those that don't
need full PITR.


> > But yeah, having to keep the backups as expanded directories is not
> > great, I'd love to have .tar. Not necessarily because of the disk space
> > (in my experience the compression in filesystems works quite well for
> > this purpose), but mostly because it's more compact and allows working
> > with backups as a single piece of data (e.g. it's much clearer what the
> > checksum of a single .tar is, compared to a directory).
>
> But again, object stores are commonly used for backup these days and
> billing is based on data stored rather than any compression that can be
> done on the data. Of course, you'd want to store the compressed tars in
> the object store, but that does mean storing an expanded copy somewhere
> to do pg_combinebackup.
>

Object stores are definitely getting more common. I wish they were getting
a lot more common than they actually are, because they simplify a lot.  But
they're in my experience still very far from being a majority.

Re: Incorrect handling of IS [NOT] NULL quals on inheritance parents

2024-04-12 Thread Richard Guo
On Fri, Apr 12, 2024 at 4:19 PM David Rowley  wrote:

> After further work on the comments, I pushed the result.
>
> Thanks for working on this.


Thanks for pushing!

BTW, I noticed a typo in the comment of add_base_clause_to_rel.

--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2644,7 +2644,7 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
 	 * apply_child_basequals() sees, whereas the inh==false one is what's used
 	 * for the scan node in the final plan.
 	 *
-	 * We make an exception to this is for partitioned tables.  For these, we
+	 * We make an exception to this for partitioned tables.  For these, we

Thanks
Richard


Re: Add notes to pg_combinebackup docs

2024-04-12 Thread Magnus Hagander
On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra 
wrote:

>
>
> On 4/9/24 09:59, Martín Marqués wrote:
> > Hello,
> >
> > While doing some work/research on the new incremental backup feature
> > some limitations were not listed in the docs. Mainly the fact that
> > pg_combinebackup works with plain format and not tar.
> >
>
> Right. The docs mostly imply this by talking about output directory and
> backup directories, but making it more explicit would not hurt.
>
> FWIW it'd be great if we could make incremental backups work with tar
> format in the future too. People probably don't want to keep around the
> expanded data directory, and having to extract everything before combining
> the backups is not very convenient. Reading and writing the tar would make
> this simpler.
>
> > Around the same time, Tomas Vondra tested incremental backups with a
> > cluster where he enabled checksums after taking the previous full
> > backup. After combining the backups the synthetic backup had pages
> > with checksums and other pages without checksums which ended in
> > checksum errors.
> >
>
> I'm not sure just documenting this limitation is sufficient. We can't
> make the incremental backups work in this case (it's as if someone
> messes with cluster without writing stuff into WAL), but I think we
> should do better than silently producing (seemingly) corrupted backups


+1. I think that should be an open item that needs to get sorted.


> I say seemingly, because the backup is actually fine, the only problem
> is it has checksums enabled in the controlfile, but the pages from the
> full backup (and the early incremental backups) have no checksums.
>
> What we could do is detect this in pg_combinebackup, and either just
> disable checksums with a warning and hint to maybe enable them again. Or
> maybe just print that the user needs to disable them.
>

I don't think either of these should be done automatically. Something like
pg_combinebackup simply failing and requiring you to say
"--disable-checksums" to have it do that would be the way to go, IMO.
(once we can reliably detect it of course)


> I was thinking maybe we could detect this while taking the backups, and
> force taking a full backup if checksums got enabled since the last
> backup. But we can't do that because we only have the manifest from the
> last backup, and the manifest does not include info about checksums.
>

Can we forcibly read and parse it out of pg_control?
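
For reference, on a running instance the checksum state is visible at the SQL
level; whether the same can be read back out of a backup's pg_control is the
open question here. A hedged illustration:

SHOW data_checksums;                        -- 'on' or 'off' for the instance
SELECT data_page_checksum_version
  FROM pg_control_init();                   -- 0 means checksums are disabled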


> It's a bit unfortunate we don't have a way to enable checksums online.
> That'd not have this problem IIRC, because it writes proper WAL. Maybe
> it's time to revive that idea ... I recall there were some concerns
> about tracking progress to allow resuming stuff, but maybe not having
> anything because in some (rare?) cases it'd need to do more work does
> not seem like a great trade off.
>
>
For that one I still think it would be perfectly acceptable to have no
resume at all, but that's a whole different topic :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: sql/json remaining issue

2024-04-12 Thread Amit Langote
On Thu, Apr 11, 2024 at 12:02 PM jian he  wrote:
> On Wed, Apr 10, 2024 at 4:39 PM Amit Langote  wrote:
> > Attached is a bit more polished version of that, which also addresses
> > the error messages in JsonPathQuery() and JsonPathValue().  I noticed
> > that there was a comment I had written at one point during JSON_TABLE()
> > hacking that said that we should be doing this.
> >
> > I've also added an open item for this.
>
> `
>  | NESTED [ PATH ] json_path_specification [ AS json_path_name ]
> COLUMNS ( json_table_column [, ...] )
> NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS
> ( json_table_column [, ...] )
> `
>  "json_path_specification" should be "path_expression"?

Fixed in 0002.

> your explanation about memory usage is clear to me!
>
>
> The following are minor cosmetic issues while applying v2.
> +errmsg("JSON path expression in JSON_VALUE should return singleton
> scalar item")));
> "singleton" is not intuitive to me.
> Then I looked around.
> https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F&q=singleton
> There is only one appearance of "singleton" in the manual.

Yes, singleton is a term used a lot in the source code but let's keep
it out of error messages and docs.  So fixed.

> errhint("Use WITH WRAPPER clause to wrap SQL/JSON item sequence into 
> array.")));
> maybe
> errhint("Use WITH WRAPPER clause to wrap SQL/JSON items into array.")));
> or
> errhint("Use WITH WRAPPER clause to wrap SQL/JSON item sequences into
> array.")));

Changed to use "SQL/JSON items into array.".

> then I wonder what's the difference between
> 22038 ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED
> 2203F ERRCODE_SQL_JSON_SCALAR_REQUIRED
>
> i assume '{"hello":"world"}'  is a singleton, but not a scalar item?
> if so, then I think the error message within the "if (count > 1)"
> branch in JsonPathValue
> should use ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED
> within the "if (!IsAJsonbScalar(res))" branch should use
> ERRCODE_SQL_JSON_SCALAR_REQUIRED
> ?
> + if (column_name)
> + ereport(ERROR,
> + (errcode(ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM),
> + errmsg("JSON path expression for column \"%s\" should return
> singleton scalar item",
> + column_name)));
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_SQL_JSON_SCALAR_REQUIRED),
> + errmsg("JSON path expression in JSON_VALUE should return singleton
> scalar item")));
> the error message seems similar, but the error code is different?
> both within "if (count > 1)" and "if (!IsAJsonbScalar(res))" branch.

Using different error codes for the same error is a copy-paste mistake
on my part.  Fixed.
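
To illustrate the distinction with a hedged example (exact error wording
depends on the final patch): '{"hello":"world"}' is a single item, so the
"more than one item" case does not apply, but it is not a scalar, so
JSON_VALUE still rejects it.

SELECT JSON_VALUE(jsonb '{"hello": "world"}', '$' ERROR ON ERROR);
-- error: result is not a scalar
SELECT JSON_VALUE(jsonb '[1, 2]', '$[*]' ERROR ON ERROR);
-- error: more than one item
SELECT JSON_VALUE(jsonb '{"hello": "world"}', '$.hello');
-- world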

> in src/include/utils/jsonpath.h, comments
> /* SQL/JSON item */
> should be
> /* SQL/JSON query functions */
>
>
> elog(ERROR, "unrecognized json wrapper %d", wrapper);
> should be
> elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);

Fixed in 0003.

-- 
Thanks, Amit Langote


v3-0001-JSON_TABLE-mention-column-name-in-some-error-mess.patch
Description: Binary data


v3-0002-JSON_TABLE-Use-term-path_expression-more-consiste.patch
Description: Binary data


v3-0003-SQL-JSON-Fix-some-comments-in-jsonpath.h.patch
Description: Binary data


Re: Add notes to pg_combinebackup docs

2024-04-12 Thread David Steele

On 4/12/24 19:09, Magnus Hagander wrote:
On Fri, Apr 12, 2024 at 12:14 AM David Steele 

OK, sure, but if the plan is to make it practical later doesn't that
make the feature something to be avoided now?


That could be said for any feature. When we shipped streaming 
replication, the plan was to support synchronous in the future. Should 
we not have shipped it, or told people to avoid it?


This doesn't seem like a great example. Synchronous rep is by far the 
more used mode in my experience. I actively dissuade people from using 
sync rep because of the downsides. More people think they need it than 
actually need it.



 > However, who says this has to be the filesystem the Postgres instance
 > runs on? Who in their right mind put backups on the same volume
as the
 > instance anyway? At which point it can be a different filesystem,
even
 > if it's not ideal for running the database.

My experience is these days backups are generally placed in object
stores. Sure, people are still using NFS but admins rarely have much
control over those volumes. They may or not be COW filesystems.


If it's mounted through NFS I assume pg_combinebackup won't actually be 
able to use the COW features? Or does that actually work through NFS?


Pretty sure it won't work via NFS, but I was wrong about XFS, so...

Mounted LUNs on a SAN I find more common today though, and there it 
would do a fine job.


Huh, interesting. This is a case I almost never see anymore.


 > All of this also depends on how people do the restore. With the CoW
 > stuff they can do a quick (and small) copy on the backup server, and
 > then copy the result to the actual instance. Or they can do
restore on
 > the target directly (e.g. by mounting a r/o volume with backups), in
 > which case the CoW won't really help.

And again, this all requires a significant amount of setup and tooling.
Obviously I believe good backup requires effort but doing this right
gets very complicated due to the limitations of the tool.

It clearly needs to be documented that there are space needs. But 
temporarily getting space for something like that is not very 
complicated in most environments. But you do have to be aware of it.


We find many environments ridiculously tight on space. There is a 
constant fight with customers/users to even get the data/WAL volumes 
sized correctly.


For small databases it is probably not an issue, but this feature really 
shines with very large databases.


Generally speaking it's already the case that the "restore experience" 
with pg_basebackup is far from great. We don't have a "pg_baserestore". 
You still have to deal with archive_command and restore_command, which 
we all know can be easy to get wrong. I don't see how this is 
fundamentally worse than that.


I pretty much agree with this statement. pg_basebackup is already hard 
to use effectively. Now it is just optionally harder.


Personally, I tend to recommend that "if you want PITR and thus need to 
mess with archive_command etc, you should use a backup tool like 
pg_backrest. If you're fine with just daily backups or whatnot, use 
pg_basebackup". The incremental backup story fits somewhere in between, 
but I'd still say this is (today) primarily a tool directed at those 
that don't need full PITR.


Yeah, there are certainly cases where PITR is not required, but they 
still seem to be in the minority. PITR cannot be disabled for the most 
recent backup in pgBackRest and we've had few complaints about that overall.



 > But yeah, having to keep the backups as expanded directories is not
 > great, I'd love to have .tar. Not necessarily because of the disk
space
 > (in my experience the compression in filesystems works quite well for
 > this purpose), but mostly because it's more compact and allows
working
 > with backups as a single piece of data (e.g. it's much cleared
what the
 > checksum of a single .tar is, compared to a directory).

But again, object stores are commonly used for backup these days and
billing is based on data stored rather than any compression that can be
done on the data. Of course, you'd want to store the compressed tars in
the object store, but that does mean storing an expanded copy somewhere
to do pg_combinebackup.

Object stores are definitely getting more common. I wish they were 
getting a lot more common than they actually are, because they simplify 
a lot.  But they're in my experience still very far from being a majority.


I see it the other way, especially the last few years. The majority seem 
to be object stores followed up closely by NFS. Directly mounted storage 
on the backup host appears to be rarer.



But if the argument is that all this can/will be fixed in the future, I
guess the smart thing for users to do is wait a few releases for
incremental backups to become a practical feature.


Re: pg_upgrde failed : logical replication : alter_subscription_add_log

2024-04-12 Thread Amit Kapila
Note - Please keep pgsql-hackers in CC while responding.

On Fri, Apr 12, 2024 at 10:44 AM Perumal Raj  wrote:

> Thanks Amit for the update,
>
> Documentation says : https://www.postgresql.org/docs/15/upgrading.html
>
> 19.6.3. Upgrading Data via Replication
>
> It is also possible to use logical replication methods to create a standby
> server with the updated version of PostgreSQL. This is possible because
> logical replication supports replication between different major versions
> of PostgreSQL. The standby can be on the same computer or a different
> computer. Once it has synced up with the primary server (running the older
> version of PostgreSQL), you can switch primaries and make the standby the
> primary and shut down the older database instance. Such a switch-over
> results in only several seconds of downtime for an upgrade.
>
> This method of upgrading can be performed using the built-in logical
> replication facilities as well as using external logical replication
> systems such as pglogical, Slony, Londiste, and Bucardo.
>
> What is "built-in logical replication" ?
>

See docs at [1].

[1] - https://www.postgresql.org/docs/devel/logical-replication.html
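
In short, it is the CREATE PUBLICATION / CREATE SUBSCRIPTION machinery built
into the server. A minimal sketch with hypothetical names:

-- on the old (publisher) cluster
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- on the new (subscriber) cluster, after creating the schema there
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-server dbname=mydb'
    PUBLICATION upgrade_pub;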
--
With Regards,
Amit Kapila


Re: promotion related handling in pg_sync_replication_slots()

2024-04-12 Thread Amit Kapila
On Fri, Apr 12, 2024 at 7:47 AM shveta malik  wrote:
>
> On Sat, Apr 6, 2024 at 11:49 AM Amit Kapila  wrote:
> >
> >
> > Few comments:
> > ==
> > 1.
> >  void
> >  SyncReplicationSlots(WalReceiverConn *wrconn)
> >  {
> > + /*
> > + * Startup process signaled the slot sync to stop, so if meanwhile user
> > + * has invoked slot sync SQL function, simply return.
> > + */
> > + SpinLockAcquire(&SlotSyncCtx->mutex);
> > + if (SlotSyncCtx->stopSignaled)
> > + {
> > + ereport(LOG,
> > + errmsg("skipping slot synchronization as slot sync shutdown is
> > signaled during promotion"));
> > +
> > + SpinLockRelease(&SlotSyncCtx->mutex);
> > + return;
> > + }
> > + SpinLockRelease(&SlotSyncCtx->mutex);
> >
> > There is a race condition with this code. Say during promotion
> > ShutDownSlotSync() is just before setting this flag and the user has
> > invoked pg_sync_replication_slots() and passed this check but still
> > didn't set the SlotSyncCtx->syncing flag. So, now, the promotion would
> > recognize that there is slot sync going on in parallel, and slot sync
> > wouldn't know that the promotion is in progress.
>
> Did you mean that now, the promotion *would not* recognize...
>

Right.

> I see, I will fix this.
>

Thanks.

-- 
With Regards,
Amit Kapila.




Re: promotion related handling in pg_sync_replication_slots()

2024-04-12 Thread Amit Kapila
On Fri, Apr 12, 2024 at 7:57 AM shveta malik  wrote:
>
> On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy
>  wrote:
> >
> > On Fri, Apr 5, 2024 at 10:31 AM shveta malik  wrote:
> > >
> > > Please find v2. Changes are:
> >
> > Thanks for the patch. Here are some comments.
>
> Thanks for reviewing.
> >
> > 1. Can we have a clear saying in the shmem variable who's syncing at
> > the moment? Is it a slot sync worker or a backend via SQL function?
> > Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;"
> >
> > typedef enum SlotSyncSource
> > {
> > SLOT_SYNC_NONE,
> > SLOT_SYNC_WORKER,
> > SLOT_SYNC_BACKEND,
> > } SlotSyncSource;
> >
> > Then, the check in ShutDownSlotSync can be:
> >
> > +   /*
> > +* Return if neither the slot sync worker is running nor the 
> > function
> > +* pg_sync_replication_slots() is executing.
> > +*/
> > +   if ((SlotSyncCtx->pid == InvalidPid) &&
> > SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND)
> > {
> >

I don't know if this will help, especially after fixing the race
condition I mentioned. But otherwise too, at this stage it doesn't
seem helpful to add the source of the sync explicitly.

> > 2.
> > SyncReplicationSlots(WalReceiverConn *wrconn)
> >  {
> > +/*
> > + * Startup process signaled the slot sync to stop, so if meanwhile user
> > + * has invoked slot sync SQL function, simply return.
> > + */
> > +SpinLockAcquire(&SlotSyncCtx->mutex);
> > +if (SlotSyncCtx->stopSignaled)
> > +{
> > +ereport(LOG,
> > +errmsg("skipping slot synchronization as slot sync
> > shutdown is signaled during promotion"));
> > +
> >
> > Unless I'm missing something, I think this can't detect if the backend
> > via SQL function is already half-way through syncing in
> > synchronize_one_slot. So, better move this check to (or also have it
> > there) slot sync loop that calls synchronize_one_slot. To avoid
> > spinlock acquisitions, we can perhaps do this check in when we acquire
> > the spinlock for synced flag.
>
> If the sync via SQL function is already half-way, then promotion
> should wait for it to finish. I don't think it is a good idea to move
> the check to synchronize_one_slot().  The sync-call should either not
> start (if it noticed the promotion) or finish the sync and then let
> promotion proceed. But I would like to know others' opinion on this.
>

Agreed.

-- 
With Regards,
Amit Kapila.




Re: Another WaitEventSet resource leakage in back branches

2024-04-12 Thread Etsuro Fujita
Hi Andres,

On Fri, Apr 12, 2024 at 1:29 AM Andres Freund  wrote:
> On 2024-03-22 21:15:45 +0900, Etsuro Fujita wrote:
> > While working on [1], I noticed $SUBJECT: WaitLatchOrSocket in back
> > branches is ignoring the possibility of failing partway through, too.
> > I added a PG_FINALLY block to that function, like commit 555276f85.
> > Patch attached.
>
> Could you expand a bit on the concrete scenario you're worried about here?
> PG_TRY/CATCH aren't free, so adding something like this to a quite common
> path, in the back branches, without a concrete analysis as to why it's needed,
> seems a bit scary.

What I am worried about is that system calls used in
WaitLatchOrSocket, like epoll_ctl, might fail, throwing an error
(epoll_ctl might fail due to eg, ENOMEM or ENOSPC).  The probability
of such failures would be pretty low, but not zero.

Do you think this causes more problems than it solves?

Thanks for the comment!

Best regards,
Etsuro Fujita




Re: promotion related handling in pg_sync_replication_slots()

2024-04-12 Thread shveta malik
On Fri, Apr 12, 2024 at 4:18 PM Amit Kapila  wrote:
>
> On Fri, Apr 12, 2024 at 7:57 AM shveta malik  wrote:
> >
> > On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy
> >  wrote:
> > >
> > > On Fri, Apr 5, 2024 at 10:31 AM shveta malik  
> > > wrote:
> > > >
> > > > Please find v2. Changes are:
> > >
> > > Thanks for the patch. Here are some comments.
> >
> > Thanks for reviewing.
> > >
> > > 1. Can we have a clear saying in the shmem variable who's syncing at
> > > the moment? Is it a slot sync worker or a backend via SQL function?
> > > Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;"
> > >
> > > typedef enum SlotSyncSource
> > > {
> > > SLOT_SYNC_NONE,
> > > SLOT_SYNC_WORKER,
> > > SLOT_SYNC_BACKEND,
> > > } SlotSyncSource;
> > >
> > > Then, the check in ShutDownSlotSync can be:
> > >
> > > +   /*
> > > +* Return if neither the slot sync worker is running nor the 
> > > function
> > > +* pg_sync_replication_slots() is executing.
> > > +*/
> > > +   if ((SlotSyncCtx->pid == InvalidPid) &&
> > > SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND)
> > > {
> > >
>
> I don't know if this will help, especially after fixing the race
> condition I mentioned. But otherwise too, at this stage it doesn't
> seem helpful to add the source of the sync explicitly.
>

Agreed.

Please find v3 addressing race-condition and one other comment.

Up-thread it was suggested that checking SlotSyncCtx->syncing should
probably be sufficient in ShutDownSlotSync(). On re-thinking, it might
not be. The slot sync worker sets and resets 'syncing' with each
sync-cycle, and thus we need to rely on the worker's pid in
ShutDownSlotSync(), as there could be a window where promotion is
triggered and 'syncing' is not set for the worker while the worker is
still running. Setting and resetting 'syncing' with each sync-cycle
looks better than keeping it set for the entire life-cycle of the
worker, so I did not change that.

To fix the race condition, I moved the setting of the 'syncing' flag
together with the 'stopSignaled' check under the same spinlock for the
SQL function. OTOH, for the worker, I feel it is good to check
'stopSignaled' at the beginning itself, while retaining the
setting/resetting of 'syncing' at a later stage during the actual sync
cycle. This makes the handling for the SQL function and the worker
slightly different. To achieve this, I had to take the 'syncing' flag
handling out of synchronize_slots() and move it into both the worker
and the SQL function, by introducing two new functions
check_and_set_syncing_flag() and reset_syncing_flag().
I am analyzing whether there are better ways to achieve this; any
suggestions are welcome.

thanks
Shveta


v3-0001-Handle-stopSignaled-during-sync-function-call.patch
Description: Binary data


Re: post-freeze damage control

2024-04-12 Thread Tomas Vondra



On 4/11/24 23:48, David Steele wrote:
> On 4/11/24 20:26, Tomas Vondra wrote:
>>
>> On 4/11/24 03:52, David Steele wrote:
>>> On 4/11/24 10:23, Tom Kincaid wrote:

 The extensive Beta process we have can be used to build confidence we
 need in a feature that has extensive review and currently has no known
 issues or outstanding objections.
>>>
>>> I did have objections, here [1] and here [2]. I think the complexity,
>>> space requirements, and likely performance issues involved in restores
>>> are going to be a real problem for users. Some of these can be addressed
>>> in future releases, but I can't escape the feeling that what we are
>>> releasing here is half-baked.
>>
>> I haven't been part of those discussions, and that part of the thread is
>> a couple months old already, so I'll share my view here instead.
>>
>> I do not think it's half-baked. I certainly agree there are limitations,
>> and there are all kinds of bells and whistles we could add, but I think
>> the fundamental infrastructure is correct and a meaningful step forward.
>> Would I wish it to handle .tar for example? Sure I would. But I think
>> it's something we can add in the future - if we require all of this to
>> happen in a single release, it'll never happen.
> 
> Fair enough, but the current release is extremely limited and it would
> be best if that was well understood by users.
> 
>> FWIW that discussion also mentions stuff that I think the feature should
>> not do. In particular, I don't think the ambition was (or should be) to
>> make pg_basebackup into a stand-alone tool. I always saw pg_basebackup
>> more as an interface to "backup steps" correctly rather than a complete
>> backup solution that'd manage backup registry, retention, etc.
> 
> Right -- this is exactly my issue. pg_basebackup was never easy to use
> as a backup solution and this feature makes it significantly more
> complicated. Complicated enough that it would be extremely difficult for
> most users to utilize in a meaningful way.
> 

Perhaps. I agree we could/should try to do a better job with backups, no
argument there. But I still don't quite see why introducing such
infrastructure to "manage backups" should be up to the patch adding
incremental backups. I see it as something to build on top of
pg_basebackup/pg_combinebackup, not into those tools.

> But they'll try because it is a new pg_basebackup feature and they'll
> assume it is there to be used. Maybe it would be a good idea to make it
> clear in the documentation that significant tooling will be required to
> make it work.
> 

Sure, I'm not against making it clearer pg_combinebackup is not a
complete backup solution, and documenting the existing restrictions.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: post-freeze damage control

2024-04-12 Thread Tomas Vondra



On 4/12/24 08:42, David Steele wrote:
> On 4/11/24 20:26, Tomas Vondra wrote:
>> On 4/11/24 03:52, David Steele wrote:
>>> On 4/11/24 10:23, Tom Kincaid wrote:

 The extensive Beta process we have can be used to build confidence we
 need in a feature that has extensive review and currently has no known
 issues or outstanding objections.
>>>
>>> I did have objections, here [1] and here [2]. I think the complexity,
>>> space requirements, and likely performance issues involved in restores
>>> are going to be a real problem for users. Some of these can be addressed
>>> in future releases, but I can't escape the feeling that what we are
>>> releasing here is half-baked.
>>>
>> I do not think it's half-baked. I certainly agree there are limitations,
>> and there are all kinds of bells and whistles we could add, but I think
>> the fundamental infrastructure is correct and a meaningful step forward.
>> Would I wish it to handle .tar for example? Sure I would. But I think
>> it's something we can add in the future - if we require all of this to
>> happen in a single release, it'll never happen.
> 
> I'm not sure that I really buy this argument, anyway. It is not uncommon
> for significant features to spend years in development before they are
> committed. This feature went from first introduction to commit in just
> over six months. Obviously Robert had been working on it for a while,
> but for a feature this large six months is a sprint.
> 

Sure, but it's also not uncommon for significant features to be
developed incrementally, over multiple releases, introducing the basic
infrastructure first, and then expanding the capabilities later. I'd
cite logical decoding/replication and parallel query as examples of this
approach.

Is it possible there's some fundamental flaw in the WAL summarization?
Sure, I can't rule that out, although I find it unlikely. Could there be
bugs? Sure, that's possible, but that applies to all code.

But it seems to me all the comments are about the client side, not about
the infrastructure. Which is fair, I certainly agree it'd be nice to
handle more use cases with less effort, but I still think the patch is a
meaningful step forward.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Should we add a compiler warning for large stack frames?

2024-04-12 Thread Andrew Dunstan


On 2024-04-11 Th 16:17, Andres Freund wrote:

128k is probably not going to be an issue in practice. However, it also seems
not great from a performance POV to use this much stack in a function that's
called fairly often.  I'd allocate the buffer in verify_backup_checksums() and
reuse it across all to-be-checked files.




Yes, I agree. I'll make that happen in the next day or two.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Add notes to pg_combinebackup docs

2024-04-12 Thread Tomas Vondra
On 4/12/24 11:50, David Steele wrote:
> On 4/12/24 19:09, Magnus Hagander wrote:
>> On Fri, Apr 12, 2024 at 12:14 AM David Steele >
>> ...>>
>>  > But yeah, having to keep the backups as expanded directories is
>> not
>>  > great, I'd love to have .tar. Not necessarily because of the disk
>>     space
>>  > (in my experience the compression in filesystems works quite
>> well for
>>  > this purpose), but mostly because it's more compact and allows
>>     working
>>  > with backups as a single piece of data (e.g. it's much cleared
>>     what the
>>  > checksum of a single .tar is, compared to a directory).
>>
>>     But again, object stores are commonly used for backup these days and
>>     billing is based on data stored rather than any compression that
>> can be
>>     done on the data. Of course, you'd want to store the compressed
>> tars in
>>     the object store, but that does mean storing an expanded copy
>> somewhere
>>     to do pg_combinebackup.
>>
>> Object stores are definitely getting more common. I wish they were
>> getting a lot more common than they actually are, because they
>> simplify a lot.  But they're in my experience still very far from
>> being a majority.
> 
> I see it the other way, especially the last few years. The majority seem
> to be object stores followed up closely by NFS. Directly mounted storage
> on the backup host appears to be rarer.
> 

One thing I'd mention is that not having built-in support for .tar and
.tgz backups does not mean it's impossible to use pg_combinebackup with
archives. You can mount them using e.g. "ratarmount" and then use that
as source directories for pg_combinebackup.

It's not entirely frictionless because AFAICS it's necessary to take the
backup in plain format and then create the .tar yourself, to get the
expected "flat" directory structure (and not manifest + 2x tar). But other
than that it seems to work fine (based on my limited testing).


FWIW the "archivemount" performs terribly, so adding this capability
into pg_combinebackup is clearly far from trivial.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-12 Thread Daniel Gustafsson
> On 10 Apr 2024, at 09:31, Peter Eisentraut  wrote:

> I think it might be better to separate this into two steps:

Fair enough.

> 1. Move to 1.1.0.  This is an API update.  Change OPENSSL_API_COMPAT, and 
> remove a bunch of code that no longer needs to be conditional.  We could 
> check for a representative function like OPENSSL_init_ssl() in 
> configure/meson, or we could just let the compilation fail with older 
> versions.

The attached 0002 bumps the minimum required version to 1.1.0 with a hard error
in autoconf/meson, and removes all the 1.0.2 support code.

I think the documentation for PQinitOpenSSL should be reworded from "you have
to do this, unless you run 1.1.0 or later" to "if you run 1.1.0, you need to do
this".  As it is now, the important bit of the paragraph is at the end rather
than at the beginning.  Trying this, I didn't find a wording which seemed like
an improvement though; suggestions are welcome.

> 2. Move to 1.1.1.  I understand this has to do with the fork-safety of 
> pg_strong_random(), and it's not an API change but a behavior change. Let's 
> make this association clearer in the code.  For example, add a version check 
> or assertion about this into pg_strong_random() itself.

0005 moves the fork safety init inline with calling pg_strong_random, and
removes it for everyone else.  This allows 1.1.0 to be supported as we
effectively are at the 1.1.0 API level, at the cost of calls for strong random
being slower on 1.1.0.  An unscientific guess based on packaged OpenSSL
versions and the EOL and ELS/LTS status of 1.1.0, is that the number of
production installs of postgres 17 using OpenSSL 1.1.0 is close to zero.

> I don't know how LibreSSL interacts with either of these two points. That's 
> something that could be clearer.

The oldest LibreSSL we have in the buildfarm is 3.2 (from OpenBSD 6.8), which
the attached version supports and passes tests with.  LibreSSL has been
providing fork safety since 2.0.2 which is well into the past.

> * doc/src/sgml/libpq.sgml
> 
> This small documentation patch could be committed forthwith.

Agreed, separated into 0001 in the attached and can be committed regardless of
the remaining ones.

> * src/backend/libpq/be-secure-openssl.c
> 
> +#include 
> 
> This patch doesn't appear to add anything, so why does it need a new include?

As mentioned downthread, an indirect inclusion was removed so we need to
explicitly include it.

> Could the additions of SSL_OP_NO_CLIENT_RENEGOTIATION and 
> SSL_R_VERSION_TOO_LOW be separate patches?

They can, done in the attached.

SSL_R_VERSION_TOO_LOW was introduced quite recently in LibreSSL 3.6.3 (OpenBSD
7.2), but splitting the check for TOO_LOW and TOO_HIGH into two seems pretty
uncontroversial and a good way to get ever so slightly better error handling on
recent LibreSSL.

SSL renegotiation has been supported for much longer in LibreSSL, so adding that
to make OpenSSL and LibreSSL support slightly more on par seems like a good
idea regardless of the version bump.

> * src/common/hmac_openssl.c
> 
> There appears to be some unrelated refactoring happening here?

I assume you mean changing back to FRONTEND from USE_RESOWNER_FOR_HMAC; the
latter was added recently in 38698dd38 and has never shipped, so it seemed
more backpatch-friendly to move back.  Given the amount of changes it probably
won't move the needle though, so reverted.

> * src/include/common/openssl.h
> 
> Is the comment no longer applicable to OpenSSL, only to LibreSSL?

OpenSSL has since 0.9.8 defined TLS_MAX_VERSION, which points to the highest
TLS protocol version supported, but AFAIK there is no such construction in
LibreSSL.  Assuming I didn't totally misunderstand the comment, of course.

> * src/port/pg_strong_random.c
> 
> I would prefer to remove pg_strong_random_init() if it's no longer useful.  I 
> mean, if we leave it as is, and we are not removing any callers, then we are 
> effectively continuing to support OpenSSL <1.1.1, right?

The attached removes pg_strong_random_init and instead calls it explicitly for
1.1.0 users by checking the OpenSSL version.

Is the attached split in line with how you were thinking about it?

--
Daniel Gustafsson



v7-0005-Remove-pg_strong_random-initialization.patch
Description: Binary data


v7-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
Description: Binary data


v7-0003-Support-disallowing-SSL-renegotiation-in-LibreSSL.patch
Description: Binary data


v7-0002-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


v7-0001-Doc-Use-past-tense-for-things-which-happened-in-t.patch
Description: Binary data


Re: Add notes to pg_combinebackup docs

2024-04-12 Thread Tomas Vondra



On 4/12/24 11:12, Magnus Hagander wrote:
> On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra 
> wrote:
> 
>>
>>
>> On 4/9/24 09:59, Martín Marqués wrote:
>>> Hello,
>>>
>>> While doing some work/research on the new incremental backup feature
>>> some limitations were not listed in the docs. Mainly the fact that
>>> pg_combinebackup works with plain format and not tar.
>>>
>>
>> Right. The docs mostly imply this by talking about output directory and
>> backup directories, but making it more explicit would not hurt.
>>
>> FWIW it'd be great if we could make incremental backups work with tar
>> format in the future too. People probably don't want to keep around the
>> expanded data directory or extract everything before combining the
>> backups is not very convenient. Reading and writing the tar would make
>> this simpler.
>>
>>> Around the same time, Tomas Vondra tested incremental backups with a
>>> cluster where he enabled checksums after taking the previous full
>>> backup. After combining the backups the synthetic backup had pages
>>> with checksums and other pages without checksums which ended in
>>> checksum errors.
>>>
>>
>> I'm not sure just documenting this limitation is sufficient. We can't
>> make the incremental backups work in this case (it's as if someone
>> messes with cluster without writing stuff into WAL), but I think we
>> should do better than silently producing (seemingly) corrupted backups
> 
> 
> +1. I think that should be an open item that needs to get sorted.
> 
> 
> I say seemingly, because the backup is actually fine, the only problem
>> is it has checksums enabled in the controlfile, but the pages from the
>> full backup (and the early incremental backups) have no checksums.
>>
>> What we could do is detect this in pg_combinebackup, and either just
>> disable checksums with a warning and hint to maybe enable them again. Or
>> maybe just print that the user needs to disable them.
>>
> 
> I don't think either of these should be done automatically. Something like
> pg_combinebackup simply failing and requiring you to say
> "--disable-checksums" to have it do that would be the way to go, IMO.
> (once we can reliably detect it of course)
> 

You mean pg_combinebackup would have "--disable-checksums" switch? Yeah,
that'd work, I think. It's probably better than producing a backup that
would seem broken when the user tries to start the instance.

Also, I realized the user probably can't disable the checksums without
starting the instance to finish recovery. And if there are invalid
checksums, I'm not sure that would actually work.

> 
> I was thinking maybe we could detect this while taking the backups, and
>> force taking a full backup if checksums got enabled since the last
>> backup. But we can't do that because we only have the manifest from the
>> last backup, and the manifest does not include info about checksums.
>>
> 
> Can we forcibly read and parse it out of pg_control?
> 

You mean when taking the backup, or during pg_combinebackup?

During backup, it depends. For the instance we should be able to just
get that from the instance, no need to get it from pg_control. But for
the backup (used as a baseline for the increment) we can't read the
pg_control - the only thing we have is the manifest.

During pg_combinebackup we obviously can read pg_control for all the
backups to combine, but at that point it feels a bit too late - it does
not seem great to do the backups, and then at recovery time tell the user
the backups are actually not valid.

I think it'd be better to detect this while taking the basebackup.

> 
> It's a bit unfortunate we don't have a way to enable checksums online.
>> That'd not have this problem IIRC, because it writes proper WAL. Maybe
>> it's time to revive that idea ... I recall there were some concerns
>> about tracking progress to allow resuming stuff, but maybe not having
>> anything because in some (rare?) cases it'd need to do more work does
>> not seem like a great trade off.
>>
>>
> For that one I still think it would be perfectly acceptable to have no
> resume at all, but that's a whole different topic :)
> 

I very much agree.

It seems a bit strange that given three options to enable checksums

1) offline
2) online without resume
3) online with resume

the initial argument was that we need to allow resuming the process
because on large systems it might take a lot of time, and we'd lose all
the work if the system restarts. But then we concluded that it's too
complex and it's better if the large systems have to do an extended
outage to enable checksums ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-12 Thread Dmitry Koval

Hi!

Attached is a patch with corrections based on comments in the previous
emails (I think these corrections are not final).

I'll be very grateful for feedback and bug reports.

11.04.2024 20:00, Alexander Lakhin wrote:
> may be an attempt to merge into implicit
> pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does?

Corrected. Result is:

\d+ s1.*
Table "s1.tp0"
...
Table "s1.tp1"
...
\d+ tp*
Did not find any relation named "tp*".
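
For reference, a rough sketch of the scenario being discussed (hypothetical
names; assumes the MERGE PARTITIONS syntax from this patch series). The
question is which schema and persistence the new partition gets when the
unqualified target name could resolve into the session's pg_temp schema:

CREATE SCHEMA s1;
CREATE TABLE s1.t (i int) PARTITION BY RANGE (i);
CREATE TABLE s1.tp0 PARTITION OF s1.t FOR VALUES FROM (0) TO (1);
CREATE TABLE s1.tp1 PARTITION OF s1.t FOR VALUES FROM (1) TO (2);

ALTER TABLE s1.t MERGE PARTITIONS (tp0, tp1) INTO tp01;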


12.04.2024 4:53, Alexander Korotkov wrote:
> I think we shouldn't unconditionally copy schema name and
> relpersistence from the parent table. Instead we should throw the
> error on a mismatch like CREATE TABLE ... PARTITION OF ... does.
12.04.2024 5:20, Robert Haas wrote:
> We definitely shouldn't copy the schema name from the parent table.

Fixed.

12.04.2024 5:20, Robert Haas wrote:
> One of the things I dislike about this type of feature -- not this
> implementation specifically, but just this kind of idea in general --
> is that the syntax mentions a whole bunch of tables but in a way where
> you can't set their properties. Persistence, reloptions, whatever.

In future releases I want to allow specifying options (probably, first of
all, the tablespace of the new partitions).
But before that, I would like to get users' reactions - what options do
they really need?


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From 707d15cf9ee4673a1deed2825f48ffbc09a34e9d Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Wed, 10 Apr 2024 18:54:05 +0300
Subject: [PATCH v4 1/2] Fix for SPLIT/MERGE partitions of temporary table

---
 src/backend/commands/tablecmds.c  |  32 +++--
 src/backend/parser/parse_utilcmd.c|  19 ++-
 src/test/regress/expected/partition_merge.out | 118 +-
 src/test/regress/expected/partition_split.out |  29 +
 src/test/regress/sql/partition_merge.sql  |  95 +-
 src/test/regress/sql/partition_split.sql  |  23 
 6 files changed, 297 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8a98a0af48..b59e1dda03 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -21145,17 +21145,19 @@ moveSplitTableRows(Relation rel, Relation splitRel, 
List *partlist, List *newPar
  * createPartitionTable: create table for a new partition with given name
  * (newPartName) like table (modelRelName)
  *
- * Emulates command: CREATE TABLE  (LIKE 
+ * Emulates command: CREATE [TEMP] TABLE  (LIKE 
  * INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY)
  */
 static void
-createPartitionTable(RangeVar *newPartName, RangeVar *modelRelName,
+createPartitionTable(Relation rel, RangeVar *newPartName, RangeVar 
*modelRelName,
 AlterTableUtilityContext *context)
 {
CreateStmt *createStmt;
TableLikeClause *tlc;
PlannedStmt *wrapper;
 
+   newPartName->relpersistence = rel->rd_rel->relpersistence;
+
createStmt = makeNode(CreateStmt);
createStmt->relation = newPartName;
createStmt->tableElts = NIL;
@@ -21291,7 +21293,7 @@ ATExecSplitPartition(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
SinglePartitionSpec *sps = (SinglePartitionSpec *) 
lfirst(listptr);
RelationnewPartRel;
 
-   createPartitionTable(sps->name, parentName, context);
+   createPartitionTable(rel, sps->name, parentName, context);
 
/* Open the new partition and acquire exclusive lock on it. */
newPartRel = table_openrv(sps->name, AccessExclusiveLock);
@@ -21488,10 +21490,10 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
{
/* Create partition table with generated temparary name. */
sprintf(tmpRelName, "merge-%u-%X-tmp", RelationGetRelid(rel), 
MyProcPid);
-   mergePartName = 
makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
-
tmpRelName, -1);
+   mergePartName = makeRangeVar(cmd->name->schemaname, tmpRelName, 
-1);
}
-   createPartitionTable(mergePartName,
+   createPartitionTable(rel,
+mergePartName,
 
makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
  
RelationGetRelationName(rel), -1),
 context);
@@ -21507,12 +21509,6 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
/* Copy data from merged partitions to new partition. */
moveMergedTablesRows(rel, mergingPartitionsList, newPartRel);
 
-   /*
-* Attach a new partition to the partitioned table. wqueue = NULL:

Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Tom Lane
Andres Freund  writes:
> Oh my. There's a workload that completely trivially hits this, without even
> trying hard. LISTEN/NOTIFY.

Hm.  Bug in the NOTIFY logic perhaps?  Sending that many signals
can't be good from a performance POV, whether or not it triggers
spinlock issues.

regards, tom lane




Re: Add notes to pg_combinebackup docs

2024-04-12 Thread Magnus Hagander
On Fri, Apr 12, 2024 at 3:01 PM Tomas Vondra 
wrote:

>
>
> On 4/12/24 11:12, Magnus Hagander wrote:
> > On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra <
> tomas.von...@enterprisedb.com>
> > wrote:
> >
> >>
> >>
> >> On 4/9/24 09:59, Martín Marqués wrote:
> >>> Hello,
> >>>
> >>> While doing some work/research on the new incremental backup feature
> >>> some limitations were not listed in the docs. Mainly the fact that
> >>> pg_combienbackup works with plain format and not tar.
> >>>
> >>
> >> Right. The docs mostly imply this by talking about output directory and
> >> backup directories, but making it more explicit would not hurt.
> >>
> >> FWIW it'd be great if we could make incremental backups work with tar
> >> format in the future too. People probably don't want to keep around the
> >> expanded data directory, and extracting everything before combining the
> >> backups is not very convenient. Reading and writing the tar would make
> >> this simpler.
> >>
> >>> Around the same time, Tomas Vondra tested incremental backups with a
> >>> cluster where he enabled checksums after taking the previous full
> >>> backup. After combining the backups the synthetic backup had pages
> >>> with checksums and other pages without checksums which ended in
> >>> checksum errors.
> >>>
> >>
> >> I'm not sure just documenting this limitation is sufficient. We can't
> >> make the incremental backups work in this case (it's as if someone
> >> messes with cluster without writing stuff into WAL), but I think we
> >> should do better than silently producing (seemingly) corrupted backups
> >
> >
> > +1. I think that should be an open item that needs to get sorted.
> >
> >
> > I say seemingly, because the backup is actually fine, the only problem
> >> is it has checksums enabled in the controlfile, but the pages from the
> >> full backup (and the early incremental backups) have no checksums.
> >>
> >> What we could do is detect this in pg_combinebackup, and either just
> >> disable checksums with a warning and hint to maybe enable them again. Or
> >> maybe just print that the user needs to disable them.
> >>
> >
> > I don't think either of these should be done automatically. Something
> like
> > pg_combinebackup simply failing and requiring you to say
> > "--disable-checksums" to have it do that would be the way to go, IMO.
> > (once we can reliably detect it of course)
> >
>
> You mean pg_combinebackup would have "--disable-checksums" switch? Yeah,
> that'd work, I think. It's probably better than producing a backup that
> would seem broken when the user tries to start the instance.
>
> Also, I realized the user probably can't disable the checksums without
> starting the instance to finish recovery. And if there are invalid
> checksums, I'm not sure that would actually work.
>
> >
> > I was thinking maybe we could detect this while taking the backups, and
> >> force taking a full backup if checksums got enabled since the last
> >> backup. But we can't do that because we only have the manifest from the
> >> last backup, and the manifest does not include info about checksums.
> >>
> >
> > Can we forcibly read and parse it out of pg_control?
> >
>
> You mean when taking the backup, or during pg_combinebackup?
>

Yes. That way combining the backups into something that doesn't have proper
checksums (either by actually turning them off or as today just breaking
them and forcing you to turn it off yourself) can only happen
intentionally. And if you weren't aware of the problem, it turns into a
hard error, so you will notice before it's too late.


During backup, it depends. For the instance we should be able to just
> get that from the instance, no need to get it from pg_control. But for
> the backup (used as a baseline for the increment) we can't read the
> pg_control - the only thing we have is the manifest.


> During pg_combinebackup we obviously can read pg_control for all the
> backups to combine, but at that point it feels a bit too late - it does
> not seem great to do backups, and then at recovery to tell the user the
> backups are actually not valid.


> I think it'd be better to detect this while taking the basebackup.
>

Agreed. In the end, we might want to do *both*, but the earlier the better.

But to do that, what we'd need is to add a flag to the initial manifest that
says "this cluster is supposed to have checksum = " and then refuse
to take an incremental backup if it changes? It doesn't seem like the end of
the world to add that to it?



> > It's a bit unfortunate we don't have a way to enable checksums online.
> >> That'd not have this problem IIRC, because it writes proper WAL. Maybe
> >> it's time to revive that idea ... I recall there were some concerns
> >> about tracking progress to allow resuming stuff, but maybe not having
> >> anything because in some (rare?) cases it'd need to do more work does
> >> not seem like a great trade off.
> >>
> >>
> > For that one I still think it would be perfectly acceptable to have no
> > resume 

Re: Security lessons from liblzma - libsystemd

2024-04-12 Thread Magnus Hagander
On Thu, Apr 4, 2024 at 1:10 AM Peter Eisentraut 
wrote:

> On 03.04.24 23:19, Magnus Hagander wrote:
> > When the code is this simple, we should definitely consider carrying it
> > ourselves. At least if we don't expect to need *other* functionality
> > from the same library in the future, which I doubt we will from
> libsystemd.
>
> Well, I've long had it on my list to do some integration to log directly
> to the journal, so you can preserve metadata better.  I'm not sure right
> now whether this would use libsystemd, but it's not like there is
> absolutely no other systemd-related functionality that could be added.
>

Ah interesting. I hadn't thought of that use-case.




> Personally, I think this proposed change is trying to close a barndoor
> after a horse has bolted.  There are many more interesting and scary
> libraries in the dependency tree of "postgres", so just picking off one
> right now doesn't really accomplish anything.  The next release of
> libsystemd will drop all the compression libraries as hard dependencies,
> so the issue in that sense is gone anyway.  Also, fun fact: liblzma is
> also a dependency via libxml2.
>


To be clear, I didn't mean to single out this one, just saying that it's
something we should keep in consideration in general when adding library
dependencies. Every new dependency, no matter how small, increases the
management overhead and risk. And we should just be aware of that and weigh
them against each other.

As in, we should *consider* it; that doesn't mean we should necessarily
*do* it.

(And yes, there are many scary dependencies down the tree)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: post-freeze damage control

2024-04-12 Thread Alexander Korotkov
On Wed, Apr 10, 2024 at 5:27 PM Robert Haas  wrote:
>
> On Mon, Apr 8, 2024 at 10:12 PM Tom Lane  wrote:
> > I have another one that I'm not terribly happy about:
> >
> > Author: Alexander Korotkov 
> > Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300
> >
> > Transform OR clauses to ANY expression
>
> I realize that this has been reverted now, but what's really
> frustrating about this case is that I reviewed this patch before and
> gave feedback similar to some of the feedback you gave, and it just
> didn't matter, and the patch was committed anyway.
>
> > I don't know that I'd call it scary exactly, but I do think it
> > was premature.  A week ago there was no consensus that it was
> > ready to commit, but Alexander pushed it (or half of it, anyway)
> > despite that.  A few concrete concerns:
> >
> > * Yet another planner GUC.  Do we really need or want that?
>
> IMHO, no, and I said so in
> https://www.postgresql.org/message-id/CA%2BTgmob%3DebuCHFSw327b55DJzE3JtOuZ5owxob%2BMgErb4me_Ag%40mail.gmail.com
>
> > * What the medical community would call off-label usage of
> > query jumbling.  I'm not sure this is even correct as-used,
> > and for sure it's using that code for something never intended.
> > Nor is the added code adequately (as in, at all) documented.
>
> And I raised this point here:
> https://www.postgresql.org/message-id/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com
>
> > * Patch refuses to group anything but Consts into the SAOP
> > transformation.  I realize that if you want to produce an
> > array Const you need Const inputs, but I wonder why it
> > wasn't considered to produce an ARRAY[] construct if there
> > are available clauses with pseudo-constant (eg Param)
> > comparison values.
> >
> > * I really, really dislike jamming this logic into prepqual.c,
> > where it has no business being.  I note that it was shoved
> > into process_duplicate_ors without even the courtesy of
> > expanding the header comment:
> >
> >  * process_duplicate_ors
> >  *Given a list of exprs which are ORed together, try to apply
> >  *the inverse OR distributive law.
> >
> > Another reason to think this wasn't a very well chosen place is
> > that the file's list of #include's went from 4 entries to 11.
> > Somebody should have twigged to the idea that this was off-topic
> > for prepqual.c.
>
> All of this seems like it might be related to my comments in the above
> email about the transformation being done too early.
>
> > * OrClauseGroupKey is not a Node type, so why does it have
> > a NodeTag?  I wonder what value will appear in that field,
> > and what will happen if the struct is passed to any code
> > that expects real Nodes.
>
> I don't think I raised this issue.
>
> > I could probably find some other nits if I spent more time
> > on it, but I think these are sufficient to show that this
> > was not commit-ready.
>
> Just imagine if someone had taken time to give similar feedback before
> the commit.

FWIW, my conclusion is that it isn't worth committing stuff like
this without explicit consent from Tom, just as it isn't worth
committing table AM changes without explicit consent from Andres.  And it
isn't worth postponing large features to the last CF (it's better
to postpone them to the next release instead).

--
Regards,
Alexander Korotkov




Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Andres Freund
Hi,

On 2024-04-12 09:43:46 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Oh my. There's a workload that completely trivially hits this, without even
> > trying hard. LISTEN/NOTIFY.
> 
> Hm.  Bug in the NOTIFY logic perhaps?

I don't think it's a bug: async.c:SignalBackends() will signal once for every
NOTIFY, regardless of whether the target has already been signaled. You're
certainly right that:

> Sending that many signals can't be good from a performance POV, whether or
> not it triggers spinlock issues.

I wonder if we could switch this to latches, because with latches we'd only
re-send a signal if the receiving end has already processed the signal. Or
alternatively, make procsignal.c do something similar, although that might
take a bit of work to be done race-free.
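
The underlying pattern is just wakeup coalescing. Here is a toy, self-contained
sketch (plain C11, not PostgreSQL code; the printf stands in for the real
kill()/SetLatch() call):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* one flag per receiving process */
static atomic_bool wakeup_pending;

static void
notify_receiver(void)
{
	bool		expected = false;

	/* Only the false -> true transition pays for the expensive wakeup. */
	if (atomic_compare_exchange_strong(&wakeup_pending, &expected, true))
		printf("sending wakeup\n");	/* stand-in for kill()/SetLatch() */
}

static void
receiver_wakes_up(void)
{
	/* Clear the flag first, then drain everything queued so far. */
	atomic_store(&wakeup_pending, false);
	printf("draining notification queue\n");
}

int
main(void)
{
	notify_receiver();			/* sends a wakeup */
	notify_receiver();			/* coalesced: no second wakeup */
	receiver_wakes_up();
	notify_receiver();			/* sends again after the queue was drained */
	return 0;
}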

Greetings,

Andres Freund




Re: Security lessons from liblzma - libsystemd

2024-04-12 Thread Andres Freund
Hi,

On 2024-04-04 01:10:20 +0200, Peter Eisentraut wrote:
> On 03.04.24 23:19, Magnus Hagander wrote:
> > When the code is this simple, we should definitely consider carrying it
> > ourselves. At least if we don't expect to need *other* functionality
> > from the same library in the future, which I doubt we will from
> > libsystemd.
> 
> Well, I've long had it on my list to do some integration to log directly to
> the journal, so you can preserve metadata better.  I'm not sure right now
> whether this would use libsystemd, but it's not like there is absolutely no
> other systemd-related functionality that could be added.

Interesting. I think that'd in all likelihood end up using libsystemd.


> Personally, I think this proposed change is trying to close a barndoor after
> a horse has bolted.  There are many more interesting and scary libraries in
> the dependency tree of "postgres", so just picking off one right now doesn't
> really accomplish anything.  The next release of libsystemd will drop all
> the compression libraries as hard dependencies, so the issue in that sense
> is gone anyway.  Also, fun fact: liblzma is also a dependency via libxml2.

I agree that doing this just because of future risk in liblzma is probably not
worth it. Despite soon not being an unconditional dependency of libsystemd
anymore, I'd guess that in a few months it's going to be one of the better
vetted libraries out there.  But I don't think that means it's not worth
reducing the dependencies that we have unconditionally loaded.

Having a dependency to a fairly large library (~900k), which could be avoided
with ~30 LOC, is just not worth it. Independent of liblzma. Not from a
performance POV, nor from a risk POV.

I'm actually fairly bothered by us linking to libxml2. It was effectively
unmaintained for most of the last decade, with just very occasional drive-by
commits. And it's not that there weren't significant bugs or such. Maintenance
has picked up some, but it's still not well maintained, I'd say.  If I wanted
to attack postgres, it's where I'd start.

My worry level, in decreasing order, about postmaster-level dependencies:
- libxml2 - effectively unmaintained, just about maintained today
- gssapi - heavily undermaintained from what I can see, lots of complicated code
- libldap - undermaintained, lots of creaky old code
- libpam - undermaintained, lots of creaky old code
- the rest

Greetings,

Andres Freund




Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Alexander Lakhin

12.04.2024 08:05, Alexander Lakhin wrote:

2024-04-12 05:00:17.981 UTC [762336] PANIC:  stuck spinlock detected at 
WaitBufHdrUnlocked, bufmgr.c:5726



It looks like that spinlock issue is caused by a race condition/deadlock.
What I see when the test fails is:
A client backend executing "DROP DATABASE conflict_db" performs
dropdb() -> DropDatabaseBuffers() -> InvalidateBuffer()
At the same time, bgwriter performs (for the same buffer):
BgBufferSync() -> SyncOneBuffer()

When InvalidateBuffer() is called, the buffer refcount is zero;
then bgwriter pins the buffer, thus increasing the refcount;
InvalidateBuffer() gets into its retry loop;
bgwriter calls UnpinBuffer() -> UnpinBufferNoOwner() ->
  WaitBufHdrUnlocked(), which waits for the !BM_LOCKED state,
while InvalidateBuffer() waits for the buffer refcount to decrease.

As it turns out, it's not related to spinlocks' specifics or PRNG, just a
serendipitous find.

Best regards,
Alexander




cpluspluscheck/headerscheck require build in REL_16_STABLE

2024-04-12 Thread Marina Polyakova

Hello, hackers!

When running cpluspluscheck/headerscheck on REL_16_STABLE [1] I found 
that everything was ok only if it was preceded by a build and make 
maintainer-clean was not used:


$ ./configure --without-icu --with-perl --with-python > configure.txt &&
make -j16 > make.txt &&
make -j16 clean > clean.txt &&
make -j16 cpluspluscheck > cpluspluscheck.txt

$ ./configure --without-icu --with-perl --with-python > configure_1.txt 
&&

make -j16 > make.txt &&
make -j16 distclean > clean.txt &&
./configure --without-icu --with-perl --with-python > configure_2.txt &&
make -j16 cpluspluscheck > cpluspluscheck.txt

Otherwise cpluspluscheck/headerscheck will fail:

$ ./configure --without-icu --with-perl --with-python > configure_1.txt 
&&

make -j16 > make.txt &&
make -j16 maintainer-clean > clean.txt &&
./configure --without-icu --with-perl --with-python > configure_2.txt &&
make -j16 cpluspluscheck > cpluspluscheck.txt

$ ./configure --without-icu --with-perl --with-python > configure.txt &&
make -j16 cpluspluscheck > cpluspluscheck.txt

In file included from /tmp/cpluspluscheck.Zy4645/test.cpp:3:
/home/marina/postgrespro/postgrespro/src/backend/parser/gramparse.h:29:10: 
fatal error: gram.h: No such file or directory

   29 | #include "gram.h"
  |  ^~~~
compilation terminated.
In file included from /tmp/cpluspluscheck.Zy4645/test.cpp:3:
/home/marina/postgrespro/postgrespro/src/backend/utils/adt/jsonpath_internal.h:26:10: 
fatal error: jsonpath_gram.h: No such file or directory

   26 | #include "jsonpath_gram.h"
  |  ^
compilation terminated.
make: *** [GNUmakefile:141: cpluspluscheck] Error 1

In the other branches everything is fine: these problems begin with 
commits [2] (jsonpath_gram.h) and [3] (gram.h) and in the master branch 
there're no such problems after commit [4]. The attached diff.patch 
fixes this issue for me. (IIUC internal headers generated by Bison are 
usually excluded from such checks so I also excluded gramparse.h and 
jsonpath_internal.h...)


[1] 
https://github.com/postgres/postgres/commit/e177da5c87a10abac97c028bfb427bafb7353aa2
[2] 
https://github.com/postgres/postgres/commit/dac048f71ebbcf2f980d280711f8ff8001331c5d
[3] 
https://github.com/postgres/postgres/commit/ecaf7c5df54f7fa9df2fdc7225d2bb4e283f0081
[4] 
https://github.com/postgres/postgres/commit/721856ff24b3722ce8e894e5a32c9c063cd48455


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/tools/pginclude/cpluspluscheck b/src/tools/pginclude/cpluspluscheck
index 4e09c4686b30b973fd4ca2443d04b5228e904f4b..e77979c97ebd09dff18cb98090254bd3724de0ff 100755
--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -121,14 +121,15 @@ do
 
 	# We can't make these Bison output files compilable standalone
 	# without using "%code require", which old Bison versions lack.
-	# parser/gram.h will be included by parser/gramparse.h anyway.
 	test "$f" = contrib/cube/cubeparse.h && continue
 	test "$f" = contrib/seg/segparse.h && continue
 	test "$f" = src/backend/bootstrap/bootparse.h && continue
 	test "$f" = src/backend/parser/gram.h && continue
+	test "$f" = src/backend/parser/gramparse.h && continue
 	test "$f" = src/backend/replication/repl_gram.h && continue
 	test "$f" = src/backend/replication/syncrep_gram.h && continue
 	test "$f" = src/backend/utils/adt/jsonpath_gram.h && continue
+	test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
 	test "$f" = src/bin/pgbench/exprparse.h && continue
 	test "$f" = src/pl/plpgsql/src/pl_gram.h && continue
 	test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue
diff --git a/src/tools/pginclude/headerscheck b/src/tools/pginclude/headerscheck
index 8dee1b56709d06a9ef1f5e80b956a079fa8a1e6a..30e0e47684e571e8c86459654a0ad37326d798e0 100755
--- a/src/tools/pginclude/headerscheck
+++ b/src/tools/pginclude/headerscheck
@@ -116,14 +116,15 @@ do
 
 	# We can't make these Bison output files compilable standalone
 	# without using "%code require", which old Bison versions lack.
-	# parser/gram.h will be included by parser/gramparse.h anyway.
 	test "$f" = contrib/cube/cubeparse.h && continue
 	test "$f" = contrib/seg/segparse.h && continue
 	test "$f" = src/backend/bootstrap/bootparse.h && continue
 	test "$f" = src/backend/parser/gram.h && continue
+	test "$f" = src/backend/parser/gramparse.h && continue
 	test "$f" = src/backend/replication/repl_gram.h && continue
 	test "$f" = src/backend/replication/syncrep_gram.h && continue
 	test "$f" = src/backend/utils/adt/jsonpath_gram.h && continue
+	test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
 	test "$f" = src/bin/pgbench/exprparse.h && continue
 	test "$f" = src/pl/plpgsql/src/pl_gram.h && continue
 	test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue


Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-12 Thread Alexander Lakhin

Hi Dmitry,

12.04.2024 16:04, Dmitry Koval wrote:

Hi!

Attached is a patch with corrections based on comments in previous messages (I 
think these corrections are not final).
I'll be very grateful for feedback and bug reports.

11.04.2024 20:00, Alexander Lakhin wrote:
> may be an attempt to merge into implicit
> pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does?

Corrected. Result is:


Thank you!
Still now we're able to create a partition in the pg_temp schema
explicitly. Please try:
ALTER TABLE t
MERGE PARTITIONS (tp_0_1, tp_1_2) INTO pg_temp.tp_0_2;

in the scenario [1] and you'll get the same empty table.

[1] 
https://www.postgresql.org/message-id/fdaa003e-919c-cbc9-4f0c-e4546e96bd65%40gmail.com

Best regards,
Alexander




PG_TEST_EXTRAs by theme rather than test name (Re: pgsql: Add tests for libpq gssencmode and sslmode options)

2024-04-12 Thread Heikki Linnakangas

(moved to pgsql-hackers, change subject)

On 10/04/2024 18:54, Heikki Linnakangas wrote:

On 10/04/2024 17:48, Peter Eisentraut wrote:

On 08.04.24 01:50, Heikki Linnakangas wrote:

Add tests for libpq gssencmode and sslmode options


Why aren't these tests at
src/interfaces/libpq/t/nnn_negotiate_encryption.pl ?


To be honest, it never occurred to me. It started out as extra tests
under src/test/ssl/, and when I decided to move them out to its own
module, I didn't think of moving them to src/interfaces/libpq/t/.

I will move it, barring any objections or better ideas.


Moved.

I also added an extra check for PG_TEST_EXTRA=kerberos, so that the 
tests that require a MIT Kerberos installation are only run if 
PG_TEST_EXTRA=kerberos is specified. That seems prudent; it seems 
unlikely that you would want to run libpq_encryption tests with Kerberos 
tests included, but not the main kerberos tests. If you specify 
PG_TEST_EXTRA=libpq_encryption, but not 'kerberos', it's probably 
because you don't have an MIT Kerberos installation on your system.


I added documentation for the new PG_TEST_EXTRA=libpq_encryption option 
(I missed that earlier), with a note on the above interaction with 'kerberos'.



As we accumulate more PG_TEST_EXTRA options, I think we should 
categorize the tests by the capabilities they need or the risk 
associated, rather than by test names. Currently we have:


- kerberos: Requires MIT Kerberos installation and opens TCP/IP listen 
sockets

- ldap: Requires OpenLDAP installation and opens TCP/IP listen sockets
- ssl: Opens TCP/IP listen sockets.
- load_balance: Requires editing the system 'hosts' file and opens 
TCP/IP listen sockets.
- libpq_encryption: Opens TCP/IP listen sockets. For the GSSAPI tests, 
requires MIT Kerberos installation

- wal_consistency_checking: is resource intensive
- xid_wraparound: is resource intensive

There are a few clear themes here:

- tests that open TCP/IP listen sockets
- tests that require OpenLDAP installation
- tests that require MIT Kerberos installation
- tests that require editing 'hosts' file
- tests that are resource intensive

We could have PG_TEST_EXTRA options that match those themes, and 
enable/disable the individual tests based on those requirements. For 
example, if you're on a single-user system and have no issue with 
opening TCP/IP listen sockets, you would specify 
"PG_TEST_EXTRA=tcp-listen", and all the tests that need to open TCP/IP 
listen sockets would run. Also it would be nice to have autoconf/meson 
tests for the presence of OpenLDAP / MIT Kerberos installations, instead 
of having to enable/disable them with PG_TEST_EXTRA.
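
For example, a run on a single-user development machine could then be kicked
off roughly like this (the theme names are hypothetical, purely to illustrate
the idea):

PG_TEST_EXTRA="tcp-listen resource-intensive" make check-world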


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: allow changing autovacuum_max_workers without restarting

2024-04-12 Thread Imseih (AWS), Sami
I spent some time reviewing/testing the POC. It is relatively simple with a lot
of obvious value. 

I tested with 16 tables that constantly reach the autovac threshold and the
patch did the right thing. I observed concurrent autovacuum workers matching
the setting as I was adjusting it dynamically.

As you mention above, if there are more autovacs in progress and a new lower 
setting 
is applied, we should not take any special action on those autovacuums, and 
eventually 
the max number of autovacuum workers will match the setting.

I also tested by allowing user connections to reach max_connections, and 
observed the 
expected number of autovacuums spinning up and correctly adjusted.

Having autovacuum tests (unless I missed something) like the above is a good 
general improvement, but it should not be tied to this. 

A few comments on  the POC patch:

1/ We should emit a log when autovacuum_workers is set higher than the max.

2/ should the name of the restart limit be "reserved_autovacuum_workers"?

Regards,

Sami Imseih
AWS (Amazon Web Services)





Importing Extended Statistics

2024-04-12 Thread Corey Huinker
I'm creating this new thread separate from the existing Statistics
Export/Import thread to keep the original thread focused on that patch.

Assuming that the function signature for pg_set_attribute_stats() remains
the same (regclass, attname, inherited, version, ...stats...), how would we
design the function signature for pg_set_extended_stats()?

Currently, the variant arguments in pg_set_attribute_stats are mapping
attribute names from pg_stats onto parameter names in the function, so a
call looks like this:

SELECT pg_set_attribute_stats('public.foo'::regclass, 'foo_id', false,
17000,
'null_frac', 0.4::real,
'avg_width', 20::integer,
'n_distinct', 100::real,
...);


And that works, because there's a 1:1 mapping of attribute names to param
names in the pairs of variants.

However, that won't work for extended stats, as it has 2 mappings:

* 1 set of stats from pg_stats_ext
* N sets of stats from pg_stats_ext_exprs, one per expression contained in
the statistics object definition.

My first attempt at making this possible is to have section markers. The
variant arguments would be matched against column names in pg_stats_ext
until the parameter 'expression' is encountered, at which point it would
begin matching parameters from pg_stats_ext_exprs to parameters for the
first expression. Any subsequent use of 'expression' would move the
matching to the next expression.
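
To illustrate, a call following that convention might look roughly like this
(the function name, the leading arguments, the value paired with each
'expression' marker, and the statistics values are all hypothetical
placeholders, mirroring the pg_set_attribute_stats() example above):

SELECT pg_set_extended_stats('public.foo_stx'::regclass, 17000,
    -- pg_stats_ext columns first
    'n_distinct', '{"1, 2": 100}'::text,
    'dependencies', '{"1 => 2": 1.000000}'::text,
    -- switch to pg_stats_ext_exprs columns for the first expression
    'expression', NULL,
    'null_frac', 0.1::real,
    'avg_width', 8::integer,
    -- and again for the second expression
    'expression', NULL,
    'null_frac', 0.0::real,
    'avg_width', 4::integer);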

This method places a burden on the caller to get all of the pg_stats_ext
values specified before any expression values. It also places a burden on
the reader, who has to count the number of times they see
'expression' used as a parameter, but it otherwise allows one variant list to
serve two purposes.

Thoughts?


Re: Table AM Interface Enhancements

2024-04-12 Thread Melanie Plageman
On Thu, Apr 11, 2024 at 6:04 PM Alexander Korotkov  wrote:
>
> On Fri, Apr 12, 2024 at 12:04 AM Andres Freund  wrote:
> > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote:
> > > I hope this work is targeting pg18.
> >
> > I think anything of the scope discussed by Melanie would be very clearly
> > targeting 18. For 17, I don't know yet whether we should revert the
> > ANALYZE streaming read user (041b96802ef), just do a bit of comment 
> > polishing,
> > or some other small change.
> >
> > One oddity is that before 041b96802ef, the opportunities for making the
> > interface cleaner were less apparent, because c6fc50cb4028 increased the
> > coupling between analyze.c and the way the table storage works.
>
> Thank you for pointing this out about c6fc50cb4028, I've missed this.
>
> > > Otherwise, do I get this right that this post feature-freeze works on
> > > designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> > > committed on Mar 30.
> >
> > Note that there were versions of the patch that were targeting the
> > pre-27bc1772fc interface.
>
> Sure, I've checked this before writing.  It looks quite similar to the
> result of applying my revert patch [1] to the head.
>
> Let me describe my view over the current situation.
>
> 1) If we just apply my revert patch and leave c6fc50cb4028 and
> 041b96802ef in the tree, then we get our table AM API narrowed.  As
> you expressed the current API requires block numbers to be 1:1 with
> the actual physical on-disk location [2].  Not a secret I think the
> current API is quite restrictive.  And we're getting the ANALYZE
> interface narrower than it was since 737a292b5de.  Frankly speaking, I
> don't think this is acceptable.
>
> 2) Pushing down the read stream and prefetch to heap am is related to
> difficulties [3], [4].  That's quite a significant piece of work to be
> done post FF.

I had operated under the assumption that we needed to push the
streaming read code into heap AM because that is what we did for
sequential scan, but now that I think about it, I don't see why we
would have to. Bilal's patch pre-27bc1772fc did not do this. But I
think the code in acquire_sample_rows() isn't more tied to heap AM
after 041b96802ef than it was before it. Are you of the opinion that
the code with 041b96802ef ties acquire_sample_rows() more closely to
heap format?

- Melanie




Re: using extended statistics to improve join estimates

2024-04-12 Thread Justin Pryzby
On Tue, Apr 02, 2024 at 04:23:45PM +0800, Andy Fan wrote:
> 
> 0001 is your patch, I just rebase them against the current master. 0006
> is not much relevant with current patch, and I think it can be committed
> individually if you are OK with that.

Your 002 should also remove listidx to avoid warning
../src/backend/statistics/extended_stats.c:2879:8: error: variable 'listidx' 
set but not used [-Werror,-Wunused-but-set-variable]

> Subject: [PATCH v1 2/8] Remove estimiatedcluases and varRelid arguments

> @@ -2939,15 +2939,11 @@ statext_try_join_estimates(PlannerInfo *root, List 
> *clauses, int varRelid,
>   /* needs to happen before skipping any clauses */
>   listidx++;
>  
> - /* Skip clauses that were already estimated. */
> - if (bms_is_member(listidx, estimatedclauses))
> - continue;
> -

Your 007 could instead test if relids == NULL:

> Subject: [PATCH v1 7/8] bms_is_empty is more effective than bms_num_members(b)
>-   if (bms_num_members(relids) == 0)
>+   if (bms_is_empty(relids))

typos:
001: s/heuristict/heuristics/
002: s/grantee/guarantee/
002: s/estimiatedcluases/estimatedclauses/

It'd be nice to fix/silence these warnings from 001:

|../src/backend/statistics/extended_stats.c:3151:36: warning: ‘relid’ may be 
used uninitialized [-Wmaybe-uninitialized]
| 3151 | if (var->varno != relid)
|  |^
|../src/backend/statistics/extended_stats.c:3104:33: note: ‘relid’ was declared 
here
| 3104 | int relid;
|  | ^
|[1016/1893] Compiling C object src/backend/postgres_lib.a.p/statistics_mcv.c.o
|../src/backend/statistics/mcv.c: In function ‘mcv_combine_extended’:
|../src/backend/statistics/mcv.c:2431:49: warning: declaration of ‘idx’ shadows 
a previous local [-Wshadow=compatible-local]

FYI, I also ran the patch with a $large number of reports without
observing any errors or crashes.

I'll try to look harder at the next patch revision.

-- 
Justin




Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Andres Freund
Hi,

Given that I found several ways to spuriously trigger the stuck spinlock
logic, I think we need to backpatch a fix.


On 2024-04-11 21:41:39 -0700, Andres Freund wrote:
> Tracking the total amount of time spent sleeping doesn't require any
> additional syscalls, due to the nanosleep()...

I did quickly implement this, and it works nicely on unix. Unfortunately I
couldn't find something similar for windows. Which would mean we'd need to
record the time before every sleep.

Another issue with that approach is that we'd need to add a field to
SpinDelayStatus, which'd be an ABI break. Not sure there's any external code
using it, but if there were, it'd be a fairly nasty, rarely hit, path.

Afaict there's no padding that could be reused on 32 bit platforms [1].


I wonder if, for easy backpatching, the easiest solution is to just reset
errno before calling pg_usleep(), and only increment status->delays if
errno != EINTR. Given our pg_usleep() implementations, that'd prevent the stuck
spinlock logic from triggering too fast.
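
Roughly like this standalone illustration (plain C using nanosleep(), which is
what pg_usleep() is built on for Unix; this is a sketch of the idea, not the
actual perform_spin_delay() change):

#include <errno.h>
#include <stdio.h>
#include <time.h>

static int	delays = 0;

/*
 * Count a delay toward the "stuck spinlock" threshold only when the sleep
 * was not cut short by a signal, i.e. reset errno before sleeping and skip
 * the increment on EINTR.
 */
static void
interruptible_delay(long microsec)
{
	struct timespec ts;

	ts.tv_sec = microsec / 1000000L;
	ts.tv_nsec = (microsec % 1000000L) * 1000;

	errno = 0;
	(void) nanosleep(&ts, NULL);	/* sets errno = EINTR if interrupted */

	if (errno != EINTR)
		delays++;				/* only uninterrupted sleeps count */
}

int
main(void)
{
	for (int i = 0; i < 5; i++)
		interruptible_delay(1000);
	printf("uninterrupted delays: %d\n", delays);
	return 0;
}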


If the ABI issue weren't there, we could record the current time the first
time we sleep during a spinlock acquisition (there's an existing branch for
that) and then only trigger the stuck spinlock detection when enough time has
passed.  That'd be fine from an efficiency POV.


Even if we decide to do so, I don't think it'd be a good idea to remove the
stuck logic altogether in the backbranches. That might take out services that
have been running ok-ish for a long time.

Greetings,

Andres Freund


[1] But we are wasting a bunch of space on 64bit platforms due to switching
between pointers and 32bit integers, should probably fix that.




Re: allow changing autovacuum_max_workers without restarting

2024-04-12 Thread Nathan Bossart
On Fri, Apr 12, 2024 at 05:27:40PM +, Imseih (AWS), Sami wrote:
> A few comments on  the POC patch:

Thanks for reviewing.

> 1/ We should emit a log when autovacuum_workers is set higher than the max.

Hm.  Maybe the autovacuum launcher could do that.

> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?

That's kind-of what I had in mind, although I think we might want to avoid
the word "reserved" because it sounds a bit like reserved_connections and
superuser_reserved_connections.  "autovacuum_max_slots" or
"autovacuum_max_worker_slots" might be worth considering, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: further improving roles_is_member_of()

2024-04-12 Thread Nathan Bossart
On Thu, Apr 11, 2024 at 11:16:33PM -0500, Nathan Bossart wrote:
> As shown in the attached work-in-progress patch, this actually ends up
> removing more code than it adds, and it seems to provide similar results to
> HEAD (using the benchmark from the previous thread [0]).  I intend to test
> it with many more roles to see if it's better in more extreme cases.

Even with 100K roles, the Bloom filter added in commit d365ae7 seems to do
a pretty good job at keeping up with the hash table approach.  The callers
of roles_is_member_of() that do list_member_oid() on the returned list
might benefit a little from a hash table, but I'm skeptical it makes much
difference in practice.  This was an interesting test, but I'll likely
withdraw this patch shortly.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Issue with the PRNG used by Postgres

2024-04-12 Thread Andres Freund
Hi,

On 2024-04-12 11:33:17 -0700, Andres Freund wrote:
> I wonder if, for easy backpatching, the easiest solution is to just reset
> errno before calling pg_usleep(), and only increment status->delays if
> errno != EINTR. Given our pg_usleep() implementations, that'd prevent the stuck
> spinlock logic from triggering too fast.

Here's a patch implementing this approach. I confirmed that before the patch we
trigger the stuck spinlock logic very quickly, and after it we don't. However,
if most sleeps are interrupted, it can delay the stuck spinlock detection a good
bit. But that seems much better than triggering it too quickly.

Greetings,

Andres Freund
>From 7a2e2aff29157e6e1b64dee60456f14a1e294237 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 12 Apr 2024 10:39:46 -0700
Subject: [PATCH v1 1/2] meson: Add target for s_lock_test

s_lock_test can't currently be used for automated tests as
a) it takes a while
b) it doesn't report failures in a usable way
but it is still useful to be able to build and run it manually.

The addition of PGDLLIMPORT to my_wait_event_info is necessary to link the
program correctly when targeting windows. Apparently that hadn't been tried before.

TODO: Should we build this by default, so it doesn't accidentally break, like
it has in the past?
---
 src/backend/storage/lmgr/meson.build | 13 +
 src/backend/storage/lmgr/s_lock.c|  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/lmgr/meson.build b/src/backend/storage/lmgr/meson.build
index 05ac41e809a..26463d4e8a2 100644
--- a/src/backend/storage/lmgr/meson.build
+++ b/src/backend/storage/lmgr/meson.build
@@ -11,3 +11,16 @@ backend_sources += files(
   's_lock.c',
   'spin.c',
 )
+
+# Test binary for spinlocks. This isn't run as a test as
+# a) it takes a while
+# b) it doesn't report failures in a usable way
+# but it is still useful to be able to build and run it manually.
+s_lock_test = executable('s_lock_test',
+  files('s_lock.c'),
+  dependencies: [backend_code],
+  link_with: [common_static, pgport_static],
+  kwargs: default_bin_args + {'install': false},
+  c_args: ['-DS_LOCK_TEST=1'],
+  build_by_default: false,
+)
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 9b389d99512..f725d026317 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -67,7 +67,7 @@
  * s_lock_test.
  */
 static uint32 local_my_wait_event_info;
-uint32	   *my_wait_event_info = &local_my_wait_event_info;
+PGDLLIMPORT uint32 *my_wait_event_info = &local_my_wait_event_info;
 #endif
 
 static int	spins_per_delay = DEFAULT_SPINS_PER_DELAY;
-- 
2.44.0.279.g3bd955d269

>From 67dbf92aceaa506746c66276354ab430a85f06c7 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 12 Apr 2024 12:10:20 -0700
Subject: [PATCH v1 2/2] WIP: Avoid spuriously triggering the stuck spinlock
 detection

When signals are delivered, the sleep in perform_spin_delay() is
interrupted. Until now, perform_spin_delay() did not take that into account,
and behaved as if the sleep was uninterrupted. In some workloads signals can
be frequent enough to trigger the old logic almost immediately.

As this needs to be fixed in the backbranches, we are somewhat constrained by
ABI compatibility. Otherwise we e.g. could track the actual elapsed time in
SpinDelayStatus.

Instead, check if errno == EINTR after pg_usleep(), and only increase
SpinDelayStatus->delays if the sleep was not interrupted. Both the normal and
the windows implementation of pg_usleep() set errno == EINTR when interrupted.

This can substantially increase the amount of time until a stuck spinlock is
detected. While not great, it's much better than spuriously crash-restarting.
---
 src/port/pgsleep.c|  3 +++
 src/backend/storage/lmgr/s_lock.c | 12 +++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c
index 1284458bfce..1f87ef03490 100644
--- a/src/port/pgsleep.c
+++ b/src/port/pgsleep.c
@@ -36,6 +36,8 @@
  * might happen to run before the sleep begins, allowing the full delay.
  * Better practice is to use WaitLatch() with a timeout, so that backends
  * respond to latches and signals promptly.
+ *
+ * Some callers rely on errno being set to EINTR when interrupted.
  */
 void
 pg_usleep(long microsec)
@@ -49,6 +51,7 @@ pg_usleep(long microsec)
 		delay.tv_nsec = (microsec % 100L) * 1000;
 		(void) nanosleep(&delay, NULL);
 #else
+		/* will not be interrupted due to alertable = FALSE */
 		SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);
 #endif
 	}
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index f725d026317..e173131e5b4 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -137,7 +137,7 @@ perform_spin_delay(SpinDelayStatus *status)
 	/* Block the process every spins_per_delay tries */
 	if (++(status->spins) >= spins_per_delay)

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-12 Thread Dmitry Koval

Thanks, Alexander!


Still now we're able to create a partition in the pg_temp schema
explicitly.


Attached patches with fix.

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From 2b68bbdb068e881e8ca6e34dec735f7ce656374f Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Wed, 10 Apr 2024 18:54:05 +0300
Subject: [PATCH v5 1/2] Fix for SPLIT/MERGE partitions of temporary table

---
 src/backend/commands/tablecmds.c  |  32 +++--
 src/backend/parser/parse_utilcmd.c|  36 -
 src/test/regress/expected/partition_merge.out | 124 +-
 src/test/regress/expected/partition_split.out |  29 
 src/test/regress/sql/partition_merge.sql  | 101 +-
 src/test/regress/sql/partition_split.sql  |  23 
 6 files changed, 324 insertions(+), 21 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8a98a0af48..b59e1dda03 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -21145,17 +21145,19 @@ moveSplitTableRows(Relation rel, Relation splitRel, 
List *partlist, List *newPar
  * createPartitionTable: create table for a new partition with given name
  * (newPartName) like table (modelRelName)
  *
- * Emulates command: CREATE TABLE  (LIKE 
+ * Emulates command: CREATE [TEMP] TABLE  (LIKE 
  * INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY)
  */
 static void
-createPartitionTable(RangeVar *newPartName, RangeVar *modelRelName,
+createPartitionTable(Relation rel, RangeVar *newPartName, RangeVar 
*modelRelName,
 AlterTableUtilityContext *context)
 {
CreateStmt *createStmt;
TableLikeClause *tlc;
PlannedStmt *wrapper;
 
+   newPartName->relpersistence = rel->rd_rel->relpersistence;
+
createStmt = makeNode(CreateStmt);
createStmt->relation = newPartName;
createStmt->tableElts = NIL;
@@ -21291,7 +21293,7 @@ ATExecSplitPartition(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
SinglePartitionSpec *sps = (SinglePartitionSpec *) 
lfirst(listptr);
RelationnewPartRel;
 
-   createPartitionTable(sps->name, parentName, context);
+   createPartitionTable(rel, sps->name, parentName, context);
 
/* Open the new partition and acquire exclusive lock on it. */
newPartRel = table_openrv(sps->name, AccessExclusiveLock);
@@ -21488,10 +21490,10 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
{
/* Create partition table with generated temparary name. */
sprintf(tmpRelName, "merge-%u-%X-tmp", RelationGetRelid(rel), 
MyProcPid);
-   mergePartName = 
makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
-
tmpRelName, -1);
+   mergePartName = makeRangeVar(cmd->name->schemaname, tmpRelName, 
-1);
}
-   createPartitionTable(mergePartName,
+   createPartitionTable(rel,
+mergePartName,
 
makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
  
RelationGetRelationName(rel), -1),
 context);
@@ -21507,12 +21509,6 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
/* Copy data from merged partitions to new partition. */
moveMergedTablesRows(rel, mergingPartitionsList, newPartRel);
 
-   /*
-* Attach a new partition to the partitioned table. wqueue = NULL:
-* verification for each cloned constraint is not need.
-*/
-   attachPartitionTable(NULL, rel, newPartRel, cmd->bound);
-
/* Unlock and drop merged partitions. */
foreach(listptr, mergingPartitionsList)
{
@@ -21542,7 +21538,19 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
/* Rename partition. */
RenameRelationInternal(RelationGetRelid(newPartRel),
   cmd->name->relname, 
false, false);
+   /*
+* Bump the command counter to make the tuple of renamed 
partition
+* visible for attach partition operation.
+*/
+   CommandCounterIncrement();
}
+
+   /*
+* Attach a new partition to the partitioned table. wqueue = NULL:
+* verification for each cloned constraint is not needed.
+*/
+   attachPartitionTable(NULL, rel, newPartRel, cmd->bound);
+
/* Keep the lock until commit. */
table_close(newPartRel, NoLock);
 }
diff --git a/src/backend/parser/parse_utilcmd.c 
b/src/backend/parser/parse_utilcmd.c
index ceba06990

Re: CASE control block broken by a single line comment

2024-04-12 Thread Erik Wienhold
On 2024-04-09 00:54 +0200, Tom Lane wrote:
> I poked at this and found that the failures occur when the patched
> code decides to trim an expression like "_r.v" to just "_r", naturally
> breaking the semantics completely.  That happens because when
> plpgsql_yylex recognizes a compound token, it doesn't bother to
> adjust the token length to include the additional word(s).

Thanks Tom!  I haven't had the time to look at your patch.

I'm surprised that the lexer handles compound tokens.  I'd expect to
find that in the parser, especially because of using the context-aware
plpgsql_ns_lookup to determine if we have a T_DATUM or T_{WORD,CWORD}.

Is this done by the lexer to allow push-back of those compound tokens
and maybe even to also simplify some parser rules?

-- 
Erik




Re: Typos in the code and README

2024-04-12 Thread Bruce Momjian
On Thu, Apr 11, 2024 at 03:37:00PM +0200, Daniel Gustafsson wrote:
> On 11 Apr 2024, at 15:29, David Rowley  wrote:
> 
> On Fri, 12 Apr 2024, 1:05 am Daniel Gustafsson,  wrote:
> 
> Now that the tree has settled down a bit post-freeze I ran some 
> tooling
> to
> check spelling.  I was primarily interested in docs and README* which
> were
> mostly free from simple typos, while the code had some in various
> comments and
> one in code.  The attached fixes all that I came across (not
> cross-referenced
> against ongoing reverts or any other fixup threads but will be before
> pushing
> of course).
> 
> 
> I see you've corrected "iif" to "if". It should be "iff".
> 
> 
> Gotcha, will fix.  I opted for "if" due to recent threads where the use of
> "iff" was discouraged due to not being commonly known, but that was in doc/ 
> and
> not code comments.

I actually agree "iff" is just not clear enough.  "Iff" stands for "if
and only if" and maybe should be spelled out that way.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Typos in the code and README

2024-04-12 Thread Bruce Momjian
On Fri, Apr 12, 2024 at 04:55:16PM -0400, Bruce Momjian wrote:
> On Thu, Apr 11, 2024 at 03:37:00PM +0200, Daniel Gustafsson wrote:
> > On 11 Apr 2024, at 15:29, David Rowley  wrote:
> > 
> > On Fri, 12 Apr 2024, 1:05 am Daniel Gustafsson,  wrote:
> > 
> > Now that the tree has settled down a bit post-freeze I ran some 
> > tooling
> > to
> > check spelling.  I was primarily interested in docs and README* 
> > which
> > were
> > mostly free from simple typos, while the code had some in various
> > comments and
> > one in code.  The attached fixes all that I came across (not
> > cross-referenced
> > against ongoing reverts or any other fixup threads but will be 
> > before
> > pushing
> > of course).
> > 
> > 
> > I see you've corrected "iif" to "if". It should be "iff".
> > 
> > 
> > Gotcha, will fix.  I opted for "if" due to recent threads where the use of
> > "iff" was discouraged due to not being commonly known, but that was in doc/ 
> > and
> > not code comments.
> 
> I actually agree "iff" is just not clear enough.  "Iff" stands for "if
> and only if" and maybe should be spelled out that way.

Just to clarify, I think "if and only if" means "if A then B" and B can
only happen if A happens, meaning there are no other cases where B can
happen.  This latter part is what distinguishes "iff" from "if".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Reports on obsolete Postgres versions

2024-04-12 Thread Bruce Momjian
On Thu, Apr  4, 2024 at 04:51:50PM -0400, Bruce Momjian wrote:
> On Thu, Apr  4, 2024 at 04:38:10PM -0400, Greg Sabino Mullane wrote:
> > On Thu, Apr 4, 2024 at 2:23 PM Bruce Momjian  wrote:
> > +Such upgrades might require manual changes to complete so always read
> > +the release notes first.
> > 
> > Proposal:
> > "Such upgrades might require additional steps, so always read the release 
> > notes
> > first."
> 
> Yes, I modified that sentence.
> 
> > I went with frequently-encountered and low risk bugs".
> > 
> > But neither of those classifications are really true. Especially the "low 
> > risk"
> > part - I could see various ways a reader could wrongly interpret that.
> 
> I see your point.  Updated patch attached.

I am ready to apply this patch to the website.  How do I do this?  Do I
just commit this to the pgweb git tree?  Does that push to the website?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Typos in the code and README

2024-04-12 Thread Heikki Linnakangas

On 11/04/2024 16:05, Daniel Gustafsson wrote:

Now that the tree has settled down a bit post-freeze I ran some tooling to
check spelling.  I was primarily interested in docs and README* which were
mostly free from simple typos, while the code had some in various comments and
one in code.  The attached fixes all that I came across (not cross-referenced
against ongoing reverts or any other fixup threads but will be before pushing
of course).


Here are a few more. I've accumulated these over the past couple of months, 
keeping them stashed in a branch, adding to it whenever I've spotted a 
minor typo while reading the code.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 5f531b719c176b2f316b6341fa062af508ed2e10 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 7 Apr 2024 22:34:23 +0300
Subject: [PATCH 1/2] fix typos

---
 doc/src/sgml/maintenance.sgml   | 2 +-
 doc/src/sgml/meson.build| 2 +-
 src/backend/access/rmgrdesc/xactdesc.c  | 2 +-
 src/backend/catalog/system_functions.sql| 2 +-
 src/backend/commands/amcmds.c   | 2 +-
 src/backend/commands/tablecmds.c| 4 ++--
 src/backend/postmaster/walsummarizer.c  | 2 +-
 src/backend/replication/logical/slotsync.c  | 2 +-
 src/backend/statistics/dependencies.c   | 4 ++--
 src/backend/utils/adt/multirangetypes.c | 2 +-
 src/backend/utils/mmgr/aset.c   | 4 ++--
 src/backend/utils/mmgr/generation.c | 4 ++--
 src/bin/pg_basebackup/bbstreamer_tar.c  | 2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c | 2 +-
 src/interfaces/libpq/fe-secure-openssl.c| 2 +-
 src/test/modules/test_resowner/test_resowner_many.c | 2 +-
 src/test/regress/expected/aggregates.out| 2 +-
 src/test/regress/sql/aggregates.sql | 2 +-
 18 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2bfa05b8bc..0be90bdc7e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -802,7 +802,7 @@ HINT:  Execute a database-wide VACUUM in that database.
 
  Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
  system will begin to emit warning messages when the database's oldest MXIDs reach forty
- million transactions from the wraparound point.  And, just as an the XID case, if these
+ million transactions from the wraparound point.  And, just as in the XID case, if these
  warnings are ignored, the system will refuse to generate new MXIDs once there are fewer
  than three million left until wraparound.
 
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 3a4b47d387..e418de83a7 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -207,7 +207,7 @@ if docs_dep.found()
   alias_target('man', man)
   alias_target('install-man', install_doc_man)
 
-  # built and installed as part of the the docs target
+  # built and installed as part of the docs target
   installdocs += install_doc_man
   docs += man
 endif
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 41b842d80e..dccca201e0 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -25,7 +25,7 @@
  * Parse the WAL format of an xact commit and abort records into an easier to
  * understand format.
  *
- * This routines are in xactdesc.c because they're accessed in backend (when
+ * These routines are in xactdesc.c because they're accessed in backend (when
  * replaying WAL) and frontend (pg_waldump) code. This file is the only xact
  * specific one shared between both. They're complicated enough that
  * duplication would be bothersome.
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index fe2bb50f46..ae099e328c 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -5,7 +5,7 @@
  *
  * src/backend/catalog/system_functions.sql
  *
- * This file redefines certain built-in functions that it's impractical
+ * This file redefines certain built-in functions that are impractical
  * to fully define in pg_proc.dat.  In most cases that's because they use
  * SQL-standard function bodies and/or default expressions.  The node
  * tree representations of those are too unreadable, platform-dependent,
diff --git a/src/backend/commands/amcmds.c b/src/backend/commands/amcmds.c
index 10e386288a..aaa0f9a1dc 100644
--- a/src/backend/commands/amcmds.c
+++ b/src/backend/commands/amcmds.c
@@ -167,7 +167,7 @@ get_index_am_oid(const char *amname, bool missing_ok)
 
 /*
  * get_table_am_oid - given an access method name, look up its OID
- *		and verify it corresponds to an table AM.
+ *		and verify it corresponds to a table AM.
  */
 Oid
 g

Re: Typos in the code and README

2024-04-12 Thread Daniel Gustafsson
> On 12 Apr 2024, at 23:15, Heikki Linnakangas  wrote:
> 
> On 11/04/2024 16:05, Daniel Gustafsson wrote:
>> Now that the tree has settled down a bit post-freeze I ran some tooling to
>> check spelling.  I was primarily interested in docs and README* which were
>> mostly free from simple typos, while the code had some in various comments 
>> and
>> one in code.  The attached fixes all that I came across (not cross-referenced
>> against ongoing reverts or any other fixup threads but will be before pushing
>> of course).
> 
> Here are a few more. I've accumulated these over the past couple of months, 
> keeping them stashed in a branch, adding to it whenever I've spotted a minor 
> typo while reading the code.

Nice, let's lump all these together.

--
Daniel Gustafsson





Re: Simplify documentation related to Windows builds

2024-04-12 Thread Andres Freund
Hi,

On 2024-04-12 10:27:01 +0900, Michael Paquier wrote:
> Yeah.  These days I personally just go through stuff like Chocolatey
> or msys2 to get all my dependencies, or even a minimal set of them.  I
> suspect that most folks hanging around on pgsql-hackers do that as
> well.

Did that work with openssl for you? Because it didn't for me, when I tried
that for CI.

I didn't find it easy to find a working openssl for msvc, and when I did, it
was on a page that could easily just have been some phishing attempt. Because
of that I don't think we should remove the link to
https://slproweb.com/products/Win32OpenSSL.html


> So, yes, you're right that removing completely this list may be too
> aggressive for the end-user.  As far as I can see, there are a few
> things that stand out:

> - Diff is not mentioned in the list of dependencies on the meson page,
> and it may not exist by default on Windows.  I think that we should
> add it.

That seems quite basic compared to everything else. But also not opposed.

I guess it might be worth checking if diff is present during meson configure,
so it's not just some weird error. I didn't really think about that when
writing the meson stuff, because it's just a hardcoded command in
pg_regress.c, not something that's visible to src/tools/msvc, configure or such.


> - We don't use activeperl anymore in the buildfarm, and recommending
> it is not a good idea based on the state of the project.  If we don't
> remove the entry, I would switch it to point to msys perl or even
> strawberry perl.  Andres has expressed concerns about the strawberry
> part, so perhaps mentioning only msys perl would be enough?

I think it's nonobvious enough to install that it's worth keeping
something. I tried at some point, and unfortunately the perl from
git-for-windows install doesn't quite work. It needs to be a perl targeting
ucrt (or perhaps some other native target).




> > So I think that it's pretty darn helpful to have some installation
> > instructions in the documentation for stuff like this, just like I
> > think it's useful that in the documentation index we tell people how
> > to get the doc toolchain working on various platforms.


FWIW, here are the mingw commands to install a suitable environment for
building postgres on windows with mingw, from the automated image generation
for CI:

https://github.com/anarazel/pg-vm-images/blob/main/scripts/windows_install_mingw64.ps1#L21-L22

I wonder if we should maintain something like that list somewhere in the
postgres repo instead...

Greetings,

Andres Freund




Re: allow changing autovacuum_max_workers without restarting

2024-04-12 Thread Imseih (AWS), Sami
>> 1/ We should emit a log when autovacuum_workers is set higher than the max.


>> Hm. Maybe the autovacuum launcher could do that.

Would it be better to use a GUC check_hook that compares the
new value with the maximum allowed value and emits a WARNING?

autovacuum_max_workers already has a check_autovacuum_max_workers
check_hook, which can be repurposed for this.

In the POC patch, this check_hook is kept as-is, which will no longer make 
sense.
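
For illustration, a minimal sketch of what such a check_hook could look
like (an assumption for discussion only, not the actual patch; the names
check_autovacuum_workers and autovacuum_max_worker_slots are made up here):

    static bool
    check_autovacuum_workers(int *newval, void **extra, GucSource source)
    {
        /* Warn, but still accept the value, when it exceeds the reserved slots. */
        if (*newval > autovacuum_max_worker_slots)
            ereport(WARNING,
                    (errmsg("requested autovacuum workers (%d) exceed autovacuum_max_worker_slots (%d); only %d workers will be started",
                            *newval, autovacuum_max_worker_slots,
                            autovacuum_max_worker_slots)));
        return true;
    }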

>> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?


>> That's kind-of what I had in mind, although I think we might want to avoid
>> the word "reserved" because it sounds a bit like reserved_connections 
>> and superuser_reserved_connections

Yes, I agree. This can be confusing.

>>  "autovacuum_max_slots" or
>> "autovacuum_max_worker_slots" might be worth considering, too.

"autovacuum_max_worker_slots" is probably the best option because
we should have "worker" in the name of the GUC.


Regards,

Sami



Re: CASE control block broken by a single line comment

2024-04-12 Thread Tom Lane
Erik Wienhold  writes:
> I'm surprised that the lexer handles compound tokens.  I'd expect to
> find that in the parser, especially because of using the context-aware
> plpgsql_ns_lookup to determine if we have a T_DATUM or T_{WORD,CWORD}.

I'm not here to defend plpgsql's factorization ;-).  However, it
doesn't really have a parser of its own, at least not for expressions,
so I'm not sure how your suggestion could be made to work.

regards, tom lane




Re: Add notes to pg_combinebackup docs

2024-04-12 Thread David Steele




On 4/12/24 22:40, Tomas Vondra wrote:

On 4/12/24 11:50, David Steele wrote:

On 4/12/24 19:09, Magnus Hagander wrote:

On Fri, Apr 12, 2024 at 12:14 AM David Steele wrote:

  > But yeah, having to keep the backups as expanded directories is not
  > great, I'd love to have .tar. Not necessarily because of the disk space
  > (in my experience the compression in filesystems works quite well for
  > this purpose), but mostly because it's more compact and allows working
  > with backups as a single piece of data (e.g. it's much clearer what the
  > checksum of a single .tar is, compared to a directory).

  But again, object stores are commonly used for backup these days and
  billing is based on data stored rather than any compression that can be
  done on the data. Of course, you'd want to store the compressed tars in
  the object store, but that does mean storing an expanded copy somewhere
  to do pg_combinebackup.

Object stores are definitely getting more common. I wish they were
getting a lot more common than they actually are, because they
simplify a lot.  But they're in my experience still very far from
being a majority.


I see it the other way, especially the last few years. The majority seem
to be object stores, followed closely by NFS. Directly mounted storage
on the backup host appears to be rarer.



One thing I'd mention is that not having built-in support for .tar and
.tgz backups does not mean it's impossible to use pg_combinebackup with
archives. You can mount them using e.g. "ratarmount" and then use that
as source directories for pg_combinebackup.

It's not entirely friction-less because AFAICS it's necessary to do the
backup in plain format and then do the .tar to have the expected "flat"
directory structure (and not manifest + 2x tar). But other than that it
seems to work fine (based on my limited testing).


Well, that's certainly convoluted and doesn't really help a lot in terms 
of space consumption; it just shifts the additional space required to 
the backup side. I doubt this is something we'd be willing to add to our 
documentation, so it would be up to the user to figure out and script.



FWIW the "archivemount" performs terribly, so adding this capability
into pg_combinebackup is clearly far from trivial.


I imagine this would perform pretty badly. And yes, doing it efficiently 
is not trivial but certainly doable. Scanning the tar file and matching 
to entries in the manifest is one way, but I would prefer to store the 
offsets into the tar file in the manifest then assemble an ordered list 
of work to do on each tar file. But of course the latter requires a 
manifest-centric approach, which is not what we have right now.


Regards,
-David




Re: post-freeze damage control

2024-04-12 Thread David Steele

On 4/12/24 22:12, Tomas Vondra wrote:

On 4/11/24 23:48, David Steele wrote:

On 4/11/24 20:26, Tomas Vondra wrote:

>>

FWIW that discussion also mentions stuff that I think the feature should
not do. In particular, I don't think the ambition was (or should be) to
make pg_basebackup into a stand-alone tool. I always saw pg_basebackup
more as an interface to "backup steps" correctly rather than a complete
backup solution that'd manage backup registry, retention, etc.


Right -- this is exactly my issue. pg_basebackup was never easy to use
as a backup solution and this feature makes it significantly more
complicated. Complicated enough that it would be extremely difficult for
most users to utilize in a meaningful way.


Perhaps, I agree we could/should try to do better job to do backups, no
argument there. But I still don't quite see why introducing such
infrastructure to "manage backups" should be up to the patch adding
incremental backups. I see it as something to build on top of
pg_basebackup/pg_combinebackup, not into those tools.


I'm not saying that managing backups needs to be part of pg_basebackup, 
but I am saying without that it is not a complete backup solution. 
Therefore introducing advanced features that the user then has to figure 
out how to manage puts a large burden on them. Implementing 
pg_combinebackup inefficiently out of the gate just makes their life harder.



But they'll try because it is a new pg_basebackup feature and they'll
assume it is there to be used. Maybe it would be a good idea to make it
clear in the documentation that significant tooling will be required to
make it work.


Sure, I'm not against making it clearer pg_combinebackup is not a
complete backup solution, and documenting the existing restrictions.


Let's do that then. I think it would make sense to add caveats to the 
pg_combinebackup docs including space requirements, being explicit about 
the format required (e.g. plain), and also possible mitigation with COW 
filesystems.


Regards,
-David




Re: Reports on obsolete Postgres versions

2024-04-12 Thread Jonathan S. Katz

On 4/12/24 2:12 PM, Bruce Momjian wrote:


I am ready to apply this patch to the website.  How do I do this?  Do I
just commit this to the pgweb git tree?  Does that push to the website?


I pushed this to the website[1].

Thanks,

Jonathan

[1] https://www.postgresql.org/support/versioning/



OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: post-freeze damage control

2024-04-12 Thread David Steele

On 4/12/24 22:27, Tomas Vondra wrote:



On 4/12/24 08:42, David Steele wrote:

On 4/11/24 20:26, Tomas Vondra wrote:

On 4/11/24 03:52, David Steele wrote:

On 4/11/24 10:23, Tom Kincaid wrote:


The extensive Beta process we have can be used to build confidence we
need in a feature that has extensive review and currently has no known
issues or outstanding objections.


I did have objections, here [1] and here [2]. I think the complexity,
space requirements, and likely performance issues involved in restores
are going to be a real problem for users. Some of these can be addressed
in future releases, but I can't escape the feeling that what we are
releasing here is half-baked.


I do not think it's half-baked. I certainly agree there are limitations,
and there's all kinds of bells and whistles we could add, but I think
the fundamental infrastructure is correct and a meaningful step forward.
Would I wish it to handle .tar for example? Sure I would. But I think
it's something we can add in the future - if we require all of this to
happen in a single release, it'll never happen.


I'm not sure that I really buy this argument, anyway. It is not uncommon
for significant features to spend years in development before they are
committed. This feature went from first introduction to commit in just
over six months. Obviously Robert had been working on it for a while,
but for a feature this large six months is a sprint.



Sure, but it's also not uncommon for significant features to be
developed incrementally, over multiple releases, introducing the basic
infrastructure first, and then expanding the capabilities later. I'd
cite logical decoding/replication and parallel query as examples of this
approach.

Is it possible there's some fundamental flaw in the WAL summarization?
Sure, I can't rule that out, although I find it unlikely. Could there be
bugs? Sure, that's possible, but that applies to all code.

But it seems to me all the comments are about the client side, not about
the infrastructure. Which is fair, I certainly agree it'd be nice to
handle more use cases with less effort, but I still think the patch is a
meaningful step forward.


Yes, my comments are all about the client code. I like the 
implementation of the WAL summarizer a lot. I don't think there is a 
fundamental flaw in the design, either, but I wouldn't be surprised if 
there are bugs. That's life in software development biz.


Even for the summarizer, though, I do worry about the complexity of 
maintaining it over time. It seems like it would be very easy to 
introduce a bug and have it go unnoticed until it causes problems in the 
field. A lot of testing was done outside of the test suite for this 
feature and I'm not sure if we can rely on that focus with every release.


For me an incremental approach would be to introduce the WAL summarizer 
first. There are already plenty of projects that do page-level 
incremental (WAL-G, pg_probackup, pgBackRest) and could help shake out 
the bugs. Then introduce the client tools later when they are more 
robust. Or, release the client tools now but mark them as experimental 
or something so people know that changes are coming and they don't get 
blindsided by that in the next release. Or, at the very least, make the 
caveats very clear so users can make an informed choice.


Regards,
-David




Re: Simplify documentation related to Windows builds

2024-04-12 Thread Michael Paquier
On Fri, Apr 12, 2024 at 02:53:58PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-04-12 10:27:01 +0900, Michael Paquier wrote:
> > Yeah.  These days I personally just go through stuff like Chocolatey
> > or msys2 to get all my dependencies, or even a minimal set of them.  I
> > suspect that most folks hanging around on pgsql-hackers do that as
> > well.
> 
> Did that work with openssl for you? Because it didn't for me, when I tried
> that for CI.

Yes, I recall pulling in OpenSSL from Chocolatey the last time I
tested it.  Perhaps my memory is fuzzy though; it was a couple of
months ago and I don't have the host at hand anymore.

> I didn't find it easy to find a working openssl for msvc, and when I did, it
> was on a page that could easily just have been some phishing attempt. Because
> of that I don't think we should remove the link to
> https://slproweb.com/products/Win32OpenSSL.html

Okay, noted.

>> So, yes, you're right that removing completely this list may be too
>> aggressive for the end-user.  As far as I can see, there are a few
>> things that stand out:
> 
>> - Diff is not mentioned in the list of dependencies on the meson page,
>> and it may not exist by default on Windows.  I think that we should
>> add it.
> 
> That seems quite basic compared to everything else. But also not opposed.
> 
> I guess it might be worth checking if diff is present during meson configure,
> so it's not just some weird error. I didn't really think about that when
> writing the meson stuff, because it's just a hardcoded command in
> pg_regress.c, not something that's visible to src/tools/msvc, configure or such.

A meson check would make sense here to catch that earlier.  We do that
for IPC::Run.

>> - We don't use activeperl anymore in the buildfarm, and recommending
>> it is not a good idea based on the state of the project.  If we don't
>> remove the entry, I would switch it to point to msys perl or even
>> strawberry perl.  Andres has expressed concerns about the strawberry
>> part, so perhaps mentioning only msys perl would be enough?
> 
> I think it's nonobvious enough to install that it's worth keeping
> something. I tried at some point, and unfortunately the perl from
> git-for-windows install doesn't quite work. It needs to be a perl targeting
> ucrt (or perhaps some other native target).

The question would be which one.  Msys perl is used across a few
buildfarm members, and Andrew has some success with it.

>>> So I think that it's pretty darn helpful to have some installation
>>> instructions in the documentation for stuff like this, just like I
>>> think it's useful that in the documentation index we tell people how
>>> to get the doc toolchain working on various platforms.
> 
> FWIW, here are the mingw commands to install a suitable environment for
> building postgres on windows with mingw, from the automated image generation
> for CI:
> 
> https://github.com/anarazel/pg-vm-images/blob/main/scripts/windows_install_mingw64.ps1#L21-L22
> 
> I wonder if we should maintain something like that list somewhere in the
> postgres repo instead...

+1.  That sounds to me like the doc material we could add in the
Windows build section for meson.
--
Michael


signature.asc
Description: PGP signature


Re: Can't find not null constraint, but \d+ shows that

2024-04-12 Thread jian he
On Fri, Apr 12, 2024 at 3:52 PM Alvaro Herrera  wrote:
>
> On 2024-Apr-12, jian he wrote:
>
> > Now I am more confused...
>
> > +CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));
> > +ALTER TABLE  notnull_tbl1 DROP c1;
>
> > same query, mysql make let "c0" be not null
>
> Yes, that was Postgres' old model.  But the way we think of it now, is
> that a column is marked attnotnull when a pg_constraint entry exists to
> support that flag, which can be a not-null constraint, or a primary key
> constraint.  In the old Postgres model, you're right that we would
> continue to have c0 as not-null, just like mysql.  In the new model,
> that flag no longer has no reason to be there, because the backing
> primary key constraint has been removed, which is why we reset it.
>
> So what I was saying in the cases with replica identity and generated
> columns, is that there's an attnotnull flag we cannot remove, because of
> either of those things, but we don't have any backing constraint for it,
> which is an inconsistency with the view of the world that I described
> above.  I would like to manufacture one not-null constraint at that
> point, or just abort the drop of the PK ... but I don't see how to do
> either of those things.
>

thanks for your explanation.
now I understand it.
I wonder whether there is any incompatibility issue, or whether we need to say
something about the new behavior when dropping a key column.

the comments look good to me.

Only one minor cosmetic issue:
+ if (unconstrained_cols)
I would like to change it to
+ if (unconstrained_cols != NIL)

+ foreach(lc, unconstrained_cols)
we can change it to
+ foreach_int(attnum, unconstrained_cols)
per commit
https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
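
For reference, a tiny sketch of the difference (variable names here are
illustrative only, not taken from the patch):

    ListCell   *lc;

    /* classic form: each cell must be unwrapped with lfirst_int() */
    foreach(lc, unconstrained_cols)
    {
        int         attnum = lfirst_int(lc);

        /* ... process attnum ... */
    }

    /* foreach_int form: the macro declares and assigns the int element itself */
    foreach_int(attnum, unconstrained_cols)
    {
        /* ... process attnum ... */
    }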




ci: Allow running mingw tests by default via environment variable

2024-04-12 Thread Andres Freund
Hi,

We have CI support for mingw, but don't run the task by default, as it eats up
precious CI credits.  However, for cfbot we are using custom compute resources
(currently donated by google), so we can afford to run the mingw tests. Right
now that'd require cfbot to patch .cirrus.tasks.yml.

While one can manually trigger the task in one's own repo, most won't
have the permissions to do so for cfbot.


I propose that we instead run the task automatically if
$REPO_MINGW_TRIGGER_BY_DEFAULT is set, typically in cirrus' per-repository
configuration.

Unfortunately that's somewhat awkward to do in the cirrus-ci yaml
configuration; the set of conditional expressions supported is very
simplistic.

To deal with that, I extended .cirrus.star to compute the required environment
variable. If $REPO_MINGW_TRIGGER_BY_DEFAULT is set, CI_MINGW_TRIGGER_TYPE is
set to 'automatic'; if not, it's 'manual'.


We've also talked in other threads about adding CI support for
1) windows, building with visual studio
2) linux, with musl libc
3) free/netbsd

That becomes more enticing if we can enable them by default on cfbot but not
elsewhere. With this change, it'd be easy to add further variables to control
such future tasks.



I also attached a second commit that makes the "code" dealing with ci-os-only
in .cirrus.tasks.yml simpler. While I think it does nicely simplify
.cirrus.tasks.yml, overall it adds lines, possibly making this not worth it.
I'm somewhat on the fence.


Thoughts?


On the code level, I wondered if it'd be good to have a common prefix for all
the automatically set variables. Right now that's CI_, but I'm not at all
wedded to that.


Greetings,

Andres Freund
>From 5d26ecfedbcbc83b4cb6e41a34c1af9a3ab24cdb Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 12 Apr 2024 15:04:32 -0700
Subject: [PATCH v2 1/2] ci: Allow running mingw tests by default via
 environment variable

---
 .cirrus.yml | 12 ++--
 .cirrus.star| 39 +++
 .cirrus.tasks.yml   | 12 ++--
 src/tools/ci/README | 12 
 4 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index a83129ae46d..f270f61241f 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -10,12 +10,20 @@
 #
 # 1) the contents of this file
 #
-# 2) if defined, the contents of the file referenced by the, repository
+# 2) computed environment variables
+#
+#Used to enable/disable tasks based on the execution environment. See
+#.cirrus.star: compute_environment_vars()
+#
+# 3) if defined, the contents of the file referenced by the, repository
 #level, REPO_CI_CONFIG_GIT_URL variable (see
 #https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
 #format)
 #
-# 3) .cirrus.tasks.yml
+#This allows running tasks in a different execution environment than the
+#default, e.g. to have sufficient resources for cfbot.
+#
+# 4) .cirrus.tasks.yml
 #
 # This composition is done by .cirrus.star
 
diff --git a/.cirrus.star b/.cirrus.star
index 36233872d1e..7a164bb00de 100644
--- a/.cirrus.star
+++ b/.cirrus.star
@@ -7,7 +7,7 @@ https://github.com/bazelbuild/starlark/blob/master/spec.md
 See also .cirrus.yml and src/tools/ci/README
 """
 
-load("cirrus", "env", "fs")
+load("cirrus", "env", "fs", "yaml")
 
 
 def main():
@@ -18,19 +18,36 @@ def main():
 
 1) the contents of .cirrus.yml
 
-2) if defined, the contents of the file referenced by the, repository
+2) computed environment variables
+
+3) if defined, the contents of the file referenced by the, repository
level, REPO_CI_CONFIG_GIT_URL variable (see
https://cirrus-ci.org/guide/programming-tasks/#fs for the accepted
format)
 
-3) .cirrus.tasks.yml
+4) .cirrus.tasks.yml
 """
 
 output = ""
 
 # 1) is evaluated implicitly
 
+
 # Add 2)
+additional_env = compute_environment_vars()
+env_fmt = """
+###
+# Computed environment variables start here
+###
+{0}
+###
+# Computed environment variables end here
+###
+"""
+output += env_fmt.format(yaml.dumps({'env': additional_env}))
+
+
+# Add 3)
 repo_config_url = env.get("REPO_CI_CONFIG_GIT_URL")
 if repo_config_url != None:
 print("loading additional configuration from \"{}\"".format(repo_config_url))
@@ -38,12 +55,26 @@ def main():
 else:
 output += "\n# REPO_CI_CONFIG_URL was not set\n"
 
-# Add 3)
+
+# Add 4)
 output += config_from(".cirrus.tasks.yml")
 
+
 return output
 
 
+def compute_environment_vars():
+cenv = {}
+
+# See definition of mingw task in .cirrus.tasks.yml
+if env.get("REPO_MINGW_TRIGGER_BY_DEFAULT") in ['true', '1', 'yes']:
+cenv['CI_MINGW_TRIGGER_TYPE'] = 'automatic'
+else:
+cenv['CI_MINGW_TRIGGER_TYPE'] = 'manual'
+
+return cenv
+
+
 def config_from(config_src):
 """return contents of config file `config_src`, surrounded by markers
 indicating start / end of the i

Re: gcc 12.1.0 warning

2024-04-12 Thread Andres Freund
Hi,

On 2023-10-27 13:09:01 +0300, Nazir Bilal Yavuz wrote:
> I was testing 'upgrading CI Debian images to bookworm'. I tested the
> bookworm on REL_15, REL_16 and upstream. REL_16 and upstream finished
> successfully but the CompilerWarnings task failed on REL_15 with the
> same error.

Is that still the case?  I briefly tried to repro this outside of CI and
couldn't reproduce the warning.

I'd really like to upgrade the CI images; it doesn't seem great to continue
using oldstable.


> gcc version: 12.2.0
> 
> CI Run: https://cirrus-ci.com/task/6151742664998912

Unfortunately the logs aren't accessible anymore, so I can't check the precise
patch level of the compiler and/or the precise invocation used.

Greetings,

Andres Freund




Re: CASE control block broken by a single line comment

2024-04-12 Thread Erik Wienhold
On 2024-04-13 00:20 +0200, Tom Lane wrote:
> Erik Wienhold  writes:
> > I'm surprised that the lexer handles compound tokens.  I'd expect to
> > find that in the parser, especially because of using the context-aware
> > plpgsql_ns_lookup to determine if we have a T_DATUM or T_{WORD,CWORD}.
> 
> I'm not here to defend plpgsql's factorization ;-).  However, it
> doesn't really have a parser of its own, at least not for expressions,
> so I'm not sure how your suggestion could be made to work.

Not a suggestion.  Just a question about the general design, unrelated
to this fix, in case you know the answer off the cuff.  I see that
863a62064c already had the lexer handle those compound tokens, but
unfortunately without an explanation of why.  Never mind if that's too
much to ask about a design decision made over 25 years ago.

-- 
Erik




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-12 Thread Bharath Rupireddy
On Sat, Apr 6, 2024 at 5:10 PM Bharath Rupireddy
 wrote:
>
> Please see the attached v38 patch.

Hi, thanks everyone for reviewing the design and patches so far. Here
are the v39 patches implementing inactive timeout based (0001)
and XID age based (0002) invalidation mechanisms.

I'm quoting the hackers who are okay with inactive timeout based
invalidation mechanism:
Bertrand Drouvot -
https://www.postgresql.org/message-id/ZgL0N%2BxVJNkyqsKL%40ip-10-97-1-34.eu-west-3.compute.internal
and 
https://www.postgresql.org/message-id/ZgPHDAlM79iLtGIH%40ip-10-97-1-34.eu-west-3.compute.internal
Amit Kapila - 
https://www.postgresql.org/message-id/CAA4eK1L3awyzWMuymLJUm8SoFEQe%3DDa9KUwCcAfC31RNJ1xdJA%40mail.gmail.com
Nathan Bossart -
https://www.postgresql.org/message-id/20240325195443.GA2923888%40nathanxps13
Robert Haas - 
https://www.postgresql.org/message-id/CA%2BTgmoZTbaaEjSZUG1FL0mzxAdN3qmXksO3O9_PZhEuXTkVnRQ%40mail.gmail.com

I'm quoting the hackers who are okay with XID age based invalidation mechanism:
Nathan Bossart -
https://www.postgresql.org/message-id/20240326150918.GB3181099%40nathanxps13
and https://www.postgresql.org/message-id/20240327150557.GA3994937%40nathanxps13
Alvaro Herrera -
https://www.postgresql.org/message-id/202403261539.xcjfle7sksz7%40alvherre.pgsql
Bertrand Drouvot -
https://www.postgresql.org/message-id/ZgPHDAlM79iLtGIH%40ip-10-97-1-34.eu-west-3.compute.internal
Amit Kapila - 
https://www.postgresql.org/message-id/CAA4eK1L3awyzWMuymLJUm8SoFEQe%3DDa9KUwCcAfC31RNJ1xdJA%40mail.gmail.com

There was a point raised by Robert
https://www.postgresql.org/message-id/CA%2BTgmoaRECcnyqxAxUhP5dk2S4HX%3DpGh-p-PkA3uc%2BjG_9hiMw%40mail.gmail.com
for XID age based invalidation. An issue related to
vacuum_defer_cleanup_age
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=be504a3e974d75be6f95c8f9b7367126034f2d12
led to the removal of the GUC
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1118cd37eb61e6a2428f457a8b2026a7bb3f801a.
The same issue may not happen for the XID age based invalidation. This
is because the XID age is not calculated using FullTransactionId but
using TransactionId as the slot's xmin and catalog_xmin are tracked as
TransactionId.

There was a point raised by Amit
https://www.postgresql.org/message-id/CAA4eK1K8wqLsMw6j0hE_SFoWAeo3Kw8UNnMfhsWaYDF1GWYQ%2Bg%40mail.gmail.com
on when to do the XID age based invalidation - whether in the checkpointer,
or when vacuum is being run, or whenever ComputeXIDHorizons gets called, or
in the autovacuum process. For now, I've chosen the design to do these new
invalidation checks in two places - 1) whenever the slot is acquired, with
the slot acquisition erroring out if the slot is invalidated, 2) during
checkpoint. However, I'm open to suggestions on this.
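
To make the intended check concrete, here is a minimal sketch of an
inactive-timeout test as it could be called from slot acquisition and from
the checkpointer. The function name, the last_inactive_time field, and the
GUC name replication_slot_inactive_timeout are assumptions for
illustration, not necessarily the exact names used in the patches:

    static bool
    SlotInactiveTimeoutExpired(ReplicationSlot *slot, TimestampTz now)
    {
        /* Feature disabled? */
        if (replication_slot_inactive_timeout <= 0)
            return false;

        /* Slot is active or has never been released; nothing to check. */
        if (slot->last_inactive_time == 0)
            return false;

        /* GUC assumed to be in seconds; TimestampDifferenceExceeds() expects ms. */
        return TimestampDifferenceExceeds(slot->last_inactive_time, now,
                                          replication_slot_inactive_timeout * 1000);
    }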

I've also verified whether the replication_slot_xid_age
setting can help in case the server is inching towards XID wraparound.
I've created a primary and streaming standby setup with
hot_standby_feedback set to on (so that the slot gets an xmin). Then,
I've set replication_slot_xid_age to 2 billion on the primary, and
used xid_wraparound extension to reach XID wraparound on the primary.
Once I started receiving the WARNINGs about VACUUM, I did a checkpoint,
after which the slot got invalidated, enabling VACUUM to freeze XIDs and
saving my database from the XID wraparound problem.

Thanks a lot Masahiko Sawada for an offlist chat about the XID age
calculation logic.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From f3ba2562ba7d9c4f13e283740260025b8d1c9b0f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 12 Apr 2024 14:52:35 +
Subject: [PATCH v39 1/2] Add inactive_timeout based replication slot
 invalidation.

Till now, postgres has the ability to invalidate inactive
replication slots based on the amount of WAL (set via
max_slot_wal_keep_size GUC) that will be needed for the slots in
case they become active. However, choosing a default value for
max_slot_wal_keep_size is tricky. Because the amount of WAL a
customer generates, and their allocated storage will vary greatly
in production, making it difficult to pin down a one-size-fits-all
value. It is often easy for developers to set a timeout of say 1
or 2 or 3 days, after which the inactive slots get invalidated.

To achieve the above, postgres introduces a GUC allowing users
set inactive timeout. The replication slots that are inactive
for longer than specified amount of time get invalidated.

The invalidation check happens at various locations to help being
as latest as possible, these locations include the following:
- Whenever the slot is acquired and the slot acquisition errors
out if invalidated.
- During checkpoint

Note that this new invalidation mechanism won't kick-in for the
slots that are currently being synced from the primary to the
standby, because such synced slots are typically considered not
active (for them to be later conside

Re: cpluspluscheck/headerscheck require build in REL_16_STABLE

2024-04-12 Thread John Naylor
On Fri, Apr 12, 2024 at 11:51 PM Marina Polyakova
 wrote:
>
> Hello, hackers!
>
> When running cpluspluscheck/headerscheck on REL_16_STABLE [1] I found
> that everything was ok only if it was preceded by a build and make
> maintainer-clean was not used:

I can reproduce this.

> In the other branches everything is fine: these problems begin with
> commits [2] (jsonpath_gram.h) and [3] (gram.h) and in the master branch
> there're no such problems after commit [4]. The attached diff.patch
> fixes this issue for me. (IIUC internal headers generated by Bison are
> usually excluded from such checks so I also excluded gramparse.h and
> jsonpath_internal.h...)

I'm not in favor of this patch because these files build fine on
master, so there is no good reason to exclude them. We should arrange
so that they build fine on PG16 as well. The problem is, not all the
required headers are generated when invoking `make headerscheck`. The
attached patch brings in some Makefile rules from master to make this
work. Does this fix it for you?
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 3c42003175..82cae98a44 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -160,7 +160,7 @@ submake-utils-headers:
 
 .PHONY: generated-headers
 
-generated-headers: $(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers submake-nodes-headers submake-utils-headers
+generated-headers: $(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers submake-nodes-headers submake-utils-headers parser/gram.h
 
 $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index deb901609f..4299735cb6 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -38,9 +38,12 @@ all: distprep probes.h generated-header-symlinks
 
 distprep: fmgr-stamp errcodes.h
 
-.PHONY: generated-header-symlinks
+.PHONY: generated-header-symlinks submake-adt-headers
 
-generated-header-symlinks: $(top_builddir)/src/include/utils/header-stamp $(top_builddir)/src/include/utils/probes.h
+generated-header-symlinks: $(top_builddir)/src/include/utils/header-stamp $(top_builddir)/src/include/utils/probes.h submake-adt-headers
+
+submake-adt-headers:
+	$(MAKE) -C adt jsonpath_gram.h
 
 $(SUBDIRS:%=%-recursive): fmgr-stamp errcodes.h
 


Re: CASE control block broken by a single line comment

2024-04-12 Thread Tom Lane
Erik Wienhold  writes:
> On 2024-04-13 00:20 +0200, Tom Lane wrote:
>> I'm not here to defend plpgsql's factorization ;-).  However, it
>> doesn't really have a parser of its own, at least not for expressions,
>> so I'm not sure how your suggestion could be made to work.

> Not a suggestion.  Just a question about the general design, unrelated
> to this fix, in case you know the answer off the cuff.  I see that
> 863a62064c already had the lexer handle those compound tokens, but
> unfortunately without an explanation on why.  Never mind if that's too
> much to ask about a design descision made over 25 years ago.

Well, it was a design decision I wasn't involved in, so I dunno.
You could reach out to Jan on the slim chance he remembers.

regards, tom lane




Re: UUID v7

2024-04-12 Thread Andrey M. Borodin



> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio  wrote:
> 
> if e.g.
> the RFC got approved 2 weeks after Postgres' feature freeze

Jelte, you seem to be the visionary! I would consider participating in 
lotteries or betting.
The new UUID spec has been assigned RFC number 9562; it was approved by the RFC 
editors and is now in the AUTH48 state. This means that after final approval by 
the authors the RFC will be published imminently. Most probably, this will 
happen circa two weeks after feature freeze :)


Best regards, Andrey Borodin.

[0] https://www.rfc-editor.org/auth48/rfc9562