Re: [RFC] Lock-free XLog Reservation from WAL

2025-02-05 Thread Japin Li
On Mon, 27 Jan 2025 at 17:30, "Zhou, Zhiguo"  wrote:
> On 1/26/2025 10:59 PM, Yura Sokolov wrote:
>> On 24.01.2025 at 12:07, Japin Li wrote:
>>> On Thu, 23 Jan 2025 at 21:44, Japin Li  wrote:
 On Thu, 23 Jan 2025 at 15:03, Yura Sokolov wrote:
> On 23.01.2025 at 11:46, Japin Li wrote:
>> On Wed, 22 Jan 2025 at 22:44, Japin Li  wrote:
>>> On Wed, 22 Jan 2025 at 17:02, Yura Sokolov wrote:
 I believe I know why it happens: I was in a hurry making v2 by
 cherry-picking the internal version. I reverted some changes in
 CalcCuckooPositions manually and forgot to add the modulo by
 PREV_LINKS_HASH_CAPA.

 Here's the fix:

   pos->pos[0] = hash % PREV_LINKS_HASH_CAPA;
 -   pos->pos[1] = pos->pos[0] + 1;
 +   pos->pos[1] = (pos->pos[0] + 1) % PREV_LINKS_HASH_CAPA;
   pos->pos[2] = (hash >> 16) % PREV_LINKS_HASH_CAPA;
 -   pos->pos[3] = pos->pos[2] + 2;
 +   pos->pos[3] = (pos->pos[2] + 2) % PREV_LINKS_HASH_CAPA;

 Anyway, here's v3:
 - excess file "v0-0001-Increase..." removed. I believe it was the source
   of the white-space apply warnings.
 - this mistake fixed
 - clearer slot strategies + an "8 positions in two cache-lines" strategy.

 You may play with switching PREV_LINKS_HASH_STRATEGY to 2 or 3 and see
 if it affects anything measurably.
>>>
>>> Thanks for your quick fixing.  I will retest it tomorrow.
>> Hi, Yura Sokolov
>> Here is my test result of the v3 patch:
>> | case                          | min        | avg        | max        |
>> |-------------------------------+------------+------------+------------|
>> | master (44b61efb79)           | 865,743.55 | 871,237.40 | 874,492.59 |
>> | v3                            | 857,020.58 | 860,180.11 | 864,355.00 |
>> | v3 PREV_LINKS_HASH_STRATEGY=2 | 853,187.41 | 855,796.36 | 858,436.44 |
>> | v3 PREV_LINKS_HASH_STRATEGY=3 | 863,131.97 | 864,272.91 | 865,396.42 |
>> It seems there are some performance decreases :( Or is there something I
>> missed?
>>
>
> Hi, Japin.
> (Excuse me for the duplicated message; I found I had sent it only to you
> the first time.)
>
 Never mind!

> v3 (as well as v2) doesn't increase NUM_XLOGINSERT_LOCKS itself. With
> only 8 in-progress inserters, a spin-lock is certainly better than any
> more complex solution.
>
> You need to compare "master" vs "master + NUM_XLOGINSERT_LOCKS=64" vs
> "master + NUM_XLOGINSERT_LOCKS=64 + v3".
>
> And even this way I don't claim "Lock-free reservation" gives any
> profit.
>
> That is why your benchmarking is very valuable! It could answer whether we
> need such a not-small patch, or whether there is no real problem at all.
>
>>>
>>> Hi, Yura Sokolov
>>>
>>> Here is the test result compared with NUM_XLOGINSERT_LOCKS and the
>>> v3 patch.
>>>
>>> | case                  | min          | avg          | max          | rate% |
>>> |-----------------------+--------------+--------------+--------------+-------|
>>> | master (4108440)      | 891,225.77   | 904,868.75   | 913,708.17   |       |
>>> | lock 64               | 1,007,716.95 | 1,012,013.22 | 1,018,674.00 | 11.84 |
>>> | lock 64 attempt 1     | 1,016,716.07 | 1,017,735.55 | 1,019,328.36 | 12.47 |
>>> | lock 64 attempt 2     | 1,015,328.31 | 1,018,147.74 | 1,021,513.14 | 12.52 |
>>> | lock 128              | 1,010,147.38 | 1,014,128.11 | 1,018,672.01 | 12.07 |
>>> | lock 128 attempt 1    | 1,018,154.79 | 1,023,348.35 | 1,031,365.42 | 13.09 |
>>> | lock 128 attempt 2    | 1,013,245.56 | 1,018,984.78 | 1,023,696.00 | 12.61 |
>>> | lock 64 v3            | 1,010,893.30 | 1,022,787.25 | 1,029,200.26 | 13.03 |
>>> | lock 64 attempt 1 v3  | 1,014,961.21 | 1,019,745.09 | 1,025,511.62 | 12.70 |
>>> | lock 64 attempt 2 v3  | 1,015,690.73 | 1,018,365.46 | 1,020,200.57 | 12.54 |
>>> | lock 128 v3           | 1,012,653.14 | 1,013,637.09 | 1,014,358.69 | 12.02 |
>>> | lock 128 attempt 1 v3 | 1,008,027.57 | 1,016,849.87 | 1,024,597.15 | 12.38 |
>>> | lock 128 attempt 2 v3 | 1,020,552.04 | 1,024,658.92 | 1,027,855.90 | 13.24 |
>
> The data looks really interesting and I recognize the need for further
> investigation. I'm not very familiar with BenchmarkSQL but we've done
> similar tests with HammerDB/TPCC by solely increasing
> NUM_XLOGINSERT_LOCKS from 8 to 128, and we observed a significant
> performance drop of ~50% and the cycle ratio of spinlock acquisition
> (s_lock) rose to over 60% of the total, which is basically consistent
> with the previous findings in [1].
>
> Could you please share the details of your test environment, including
> the device, configuration, and test approach, so we can collaborate on
> understanding the differences?

Sorr

Re: using index to speedup add not null constraints to a table

2025-02-05 Thread jian he
hi

rebased new patch attached.
I also did some cosmetic changes. comments refined.
Make sure the index-scan mechanism for fast-checking that a column is
not-null can only be used via a btree index.
isolation tests are simplified.
From be8fea79986339cadff21dbe9c4415b330a296a3 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Wed, 5 Feb 2025 16:05:10 +0800
Subject: [PATCH v3 1/1] using index to speedup add not null constraint to
 existing table

This patch tries to use index_beginscan() / index_getnext() / index_endscan()
mentioned in [1] to speed up adding not-null constraints to an existing table.

There are two ways to add not-null constraints to an existing table.
1. ATExecAddConstraint->ATAddCheckNNConstraint->AddRelationNewConstraints
2. ATExecSetNotNull->AddRelationNewConstraints

The logic is:
1. In AddRelationNewConstraints, if the conditions are met (see get_index_for_validate_notnull),
   then attempt to use an indexscan to pre-check whether the column is NOT NULL.
2. AddRelationNewConstraints returns a list of CookedConstraint nodes.
   Pass the not-null cooked constraints to tab->constraints (a list of NewConstraint).
   code snippet:
if (!ccon->skip_validation)
{
newcon = (NewConstraint *) palloc0(sizeof(NewConstraint));
if (ccon->pre_validated)
newcon->nn_attnum = ccon->attnum;
tab->constraints = lappend(tab->constraints, newcon);
}
3. In ATRewriteTable, if all attnums associated with not-null constraints have
already been pre-validated, then there is no need to set needscan.
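
For illustration, the core of the pre-check could look roughly like the
sketch below (hedged: get_index_for_validate_notnull() is this patch's
helper, while heapRel, indexRel, snapshot and slot are assumed to be set up
by the caller; the scan-key construction mirrors how the executor builds
IS NULL keys):

    ScanKeyData key;
    IndexScanDesc scan;
    bool        found_null;

    /* search the btree index for any IS NULL entry in index column 1 */
    ScanKeyEntryInitialize(&key, SK_ISNULL | SK_SEARCHNULL,
                           1, InvalidStrategy, InvalidOid,
                           InvalidOid, InvalidOid, (Datum) 0);

    scan = index_beginscan(heapRel, indexRel, snapshot, 1, 0);
    index_rescan(scan, &key, 1, NULL, 0);
    found_null = index_getnext_slot(scan, ForwardScanDirection, slot);
    index_endscan(scan);

    /* if no NULL entry was found, the NOT NULL constraint is pre-validated */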

CAUTION:
ALTER TABLE SET DATA TYPE and ALTER TABLE SET EXPRESSION trigger an index
rebuild (refer to RememberAllDependentForRebuilding).  While the former cannot
be applied to partitioned tables, the latter can.  If an index rebuild occurs,
then indexscan cannot be used to prevalidate whether a column contains null
values.  Consequently, pre-validating a NOT NULL constraint is currently not
supported for partitioned tables.

concurrency concern:
ALTER TABLE SET NOT NULL takes an ACCESS EXCLUSIVE lock, so fewer kinds of
race conditions can occur.  To verify this, I wrote some isolation tests.
see [2]

performance concern:
it will be slower than normal adding not-null constraint ONLY IF your index is
bloat and a large percent of bloat is related to null value data.
demo:

100% bloat and zero percent null value:
drop table if exists t \; create unlogged table t(a int, b int, c int);
insert into t select g, g+1 from generate_series(1,1_000_000) g;
create index t_idx_a on t(a);
delete from t;
alter table t add constraint t1 not null a;

patch Time:  1.873 ms
master Time: 648.312 ms

20 percent of the column values are null and deleted from the heap, but still
present in the index.
drop table if exists t \; create unlogged table t(a int, b int);
insert into t select case when g % 5 = 0 then null else g end, g+1
from generate_series(1,1_000_000) g;

create index t_idx_a on t(a);
delete from t where a is null;

alter table t add constraint t1 not null a;
patch Time:  1080.407 ms (00:01.080)
master Time: 970.671 ms

references:
[1] https://postgr.es/m/CA%2BTgmoa5NKz8iGW_9v7wz%3D-%2BzQFu%3DE4SZoaTaU1znLaEXRYp-Q%40mail.gmail.com
[2] https://postgr.es/m/900056D1-32DF-4927-8251-3E0C0DC407FD%40anarazel.de
---
 src/backend/catalog/heap.c| 90 +
 src/backend/catalog/pg_constraint.c   |  1 +
 src/backend/commands/tablecmds.c  | 87 ++---
 src/backend/executor/execIndexing.c   | 97 +++
 src/include/catalog/heap.h|  3 +
 src/include/executor/executor.h   |  1 +
 .../isolation/expected/fast-set-notnull.out   | 97 +++
 src/test/isolation/isolation_schedule |  1 +
 .../isolation/specs/fast-set-notnull.spec | 43 
 src/test/regress/expected/alter_table.out | 41 
 src/test/regress/sql/alter_table.sql  | 47 +
 11 files changed, 497 insertions(+), 11 deletions(-)
 create mode 100644 src/test/isolation/expected/fast-set-notnull.out
 create mode 100644 src/test/isolation/specs/fast-set-notnull.spec

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 57ef466acc..35cd64e87c 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -57,6 +57,7 @@
 #include "commands/tablecmds.h"
 #include "commands/typecmds.h"
 #include "common/int.h"
+#include "executor/executor.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
@@ -117,6 +118,7 @@ static Node *cookConstraint(ParseState *pstate,
 			char *relname);
 
 
+static Oid get_index_for_validate_notnull(Relation rel, int colnum);
 /* 
  *XXX UGLY HARD CODED BADNESS FOLLOWS XXX
  *
@@ -2286,6 +2288,8 @@ StoreConstraints(Relation rel, List *cooked_constraints, bool is_internal)
  * newConstraints: list of Constraint nodes
  * allow_me

Re: should we have a fast-path planning for OLTP starjoins?

2025-02-05 Thread Richard Guo
On Wed, Feb 5, 2025 at 5:42 AM Tomas Vondra  wrote:
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.
>
> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
> to make this work for most cases, and then the rest would simply use the
> regular join order algorithm.
>
> I was thinking that if we allow the dimensions to eliminate rows in the
> fact table, we'd simply join them starting from the most selective ones.
> But that doesn't work if the joins might have different per-row costs
> (e.g. because some dimensions are much larger etc). Doing something
> smarter would likely end up fairly close to the regular join order
> algorithm ...

Yeah, we need to ensure that the joins to the fact table don't affect
its rows; otherwise, the join order matters for the final query plan,
and we'd better run the regular join search algorithm in this case.

For inner joins, using the foreign key seems ideal for this.  For left
joins, we might be able to leverage rel_is_distinct_for() to handle
that.

Thanks
Richard




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2025-02-05 Thread Michail Nikolaev
Ooops, missed one commit - fixed (logic related to LP_DEAD in GIST
extracted to separate commit).

Also, commitfest entry is here - https://commitfest.postgresql.org/52/5542/



v7-0004-This-should-fix-issues-with-incorrect-results-whe.patch
Description: Binary data


v7-0002-This-should-fix-issues-with-incorrect-results-whe.patch
Description: Binary data


v7-0003-Also-add-ability-to-set-LP_DEAD-bits-in-more-case.patch
Description: Binary data


v7-0001-isolation-tester-showing-broken-index-only-scans-.patch
Description: Binary data


Re: per backend WAL statistics

2025-02-05 Thread Bertrand Drouvot
Hi,

On Wed, Feb 05, 2025 at 11:16:15AM +0900, Michael Paquier wrote:
> On Tue, Feb 04, 2025 at 08:49:41AM +, Bertrand Drouvot wrote:
> > can be extracted from pg_stat_get_backend_io(), so there is no need to
> > duplicate this information. The same comment could be made for
> > pg_stat_wal and pg_stat_io, though, but pg_stat_wal already exists so
> > removing fields does not have the same effect.
> > 
> > What are your thoughts about keeping in pg_stat_get_backend_wal() only the
> > 4 stats mentioned above?
> 
> wal_buffers_full is incremented in AdvanceXLInsertBuffer(), part of
> PendingWalStats.  wal_records, wal_fpi and wal_bytes are part of the
> instrumentation field.  It looks to me that if you discard the
> wal_buffers_full part, the implementation of the data in the backend
> could just be tied to the fields coming from WalUsage.

Yup.

> Actually, could it actually be useful to have wal_buffers_full be
> available in WalUsage, so as it would show up in EXPLAIN in a
> per-query basis with show_wal_usage()?

Yeah, that might help. One could not be 100% sure that the statement being
explained is fully responsible for the WAL buffers being full (as it could
just be a "victim" of already almost-full WAL buffers). But OTOH that might
help to understand why one EXPLAIN ANALYZE is slower than another (i.e. one
generating wal-buffers-full events and the other not). Also I think it could
be added to pg_stat_statements and could provide valuable information there
as well.

> Consolidating that would make
> what you are trying it a bit easier, because we would have the
> WalUsage and the pg_stat_io parts without any of the PendingWalStats
> part.  And it is just a counter not that expensive to handle, like the
> data for records, fpis and bytes.  This extra information could be
> useful to have in the context of an EXPLAIN.

Yeah, I did a bit of archeology to try to understand why it's not already the
case. From what I can see, in commit time order:

1. df3b181499 introduced the WalUsage structure
2. 6b466bf5f2 added the wal usage in pg_stat_statements
3. 33e05f89c5 added the wal usage in EXPLAIN
4. 8d9a935965f added pg_stat_wal (and wal_buffers_full)
5. 01469241b2f added the wal usage in pg_stat_wal 

So, wal_buffers_full was introduced after the WalUsage structure was already
there, but I don't see any reason in the emails as to why it's not in the
WalUsage structure (I might have missed it though).

I think that this proposal makes sense but would need a dedicated thread,
thoughts?
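
To make the idea concrete, the change would be roughly the following (a
sketch based on WalUsage as it stands in src/include/executor/instrument.h;
the new field's name and placement are assumptions):

    typedef struct WalUsage
    {
        int64       wal_records;        /* # of WAL records produced */
        int64       wal_fpi;            /* # of WAL full page images produced */
        uint64      wal_bytes;          /* size of WAL records produced */
        int64       wal_buffers_full;   /* proposed: # of times the WAL
                                         * buffers were found full */
    } WalUsage;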

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: per backend WAL statistics

2025-02-05 Thread Michael Paquier
On Wed, Feb 05, 2025 at 10:22:55AM +, Bertrand Drouvot wrote:
> So, wal_buffers_full has been introduced after the WalUsage structure was
> there but I don't see any reason in the emails as to why it's not in the 
> WalUsage
> structure (I might have missed it though).
> 
> I think that this proposal makes sense but would need a dedicated thread,
> thoughts?

Using a separate thread for a change like that makes sense to me.  I
have to admit that the simplification in terms of design for what
we're discussing here makes such a change more valuable.  Adding this
information to WalUsage is one thing.  Showing it in EXPLAIN is a
second thing.  Doing the former simplifies the patch you are proposing
here.  We don't necessarily have to do the latter, but I don't see a
reason to not do it, either.
--
Michael


signature.asc
Description: PGP signature


Re: should we have a fast-path planning for OLTP starjoins?

2025-02-05 Thread Richard Guo
On Wed, Feb 5, 2025 at 5:55 AM Tom Lane  wrote:
> Right now, if we have four tables to join, we have a joinlist
> (A B C D).  (Really they're integer relids, but let's use names here.)
> If we decide to force C to be joined last, it should be sufficient to
> convert this to ((A B D) C).  If C and D both look like candidates for
> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
> This is pretty much the same thing that happens if you set
> join_collapse_limit to 1 and use JOIN syntax to force a join order.
> In fact, IIRC we start out with nested joinlists and there is some
> code that normally flattens them until it decides it'd be creating
> too large a sub-problem.  I'm suggesting selectively reversing the
> flattening.

This should not be too difficult to implement.  Outer joins seem to
add some complexity, though.  We need to ensure that the resulting
joins in each sub-list are legal given the query's join order
constraints.  For example, if we make the joinlist be (((A B) C) D),
we need to ensure that the A/B join and the (A/B)/C join are legal.

Thanks
Richard




Re: Logging parallel worker draught

2025-02-05 Thread Benoit Lobréau

On 2/3/25 6:36 AM, Sami Imseih wrote:

As for the current set of patches, I had some other changes that
I missed earlier: indentation of the calls to LogParallelWorkersIfNeeded
and a comment for the LogParallelWorkersIfNeeded function. I also reworked
the setup of the GUC, as it was not set up the same way as other
GUCs with an options list. I ended up just making those changes
in v8.



Besides that, I think this is ready for committer.


Thank you for your time and advice on this.

--
Benoit Lobréau
Consultant
http://dalibo.com





Re: Restrict publishing of partitioned table with a foreign table as partition

2025-02-05 Thread Álvaro Herrera
On 2025-Feb-05, vignesh C wrote:

> We can maintain the behavior you suggested when the
> PUBLISH_VIA_PARTITION_ROOT option is set to false. However, when
> PUBLISH_VIA_PARTITION_ROOT is true, the table data is copied from the
> root table (as intended by the user), which will also include the
> foreign table data. In this case, wouldn’t it be better to throw an
> error?

It sounds to me like a reasonable restriction that you can only add a
partitioned table to a publication if publish_via_partition_root=false.
Then the case of hybrid partitioned tables is supported, but it requires
that changes are published directly by the partitions, which is sensible
anyway.

In this case, during CREATE FOREIGN TABLE of a partition with this
condition, we must check whether any publications include the schema
that the table is being created on (or attached, for ALTER TABLE ATTACH
PARTITION), and whether there are any publications that are FOR ALL
TABLES that would be affected.


(If we later figure out a way to allow publish_via_partition_root and
skip the tuples in foreign partitions, it's easy to remove the
restriction.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)




Re: Commitfest app release on Feb 17 with many improvements

2025-02-05 Thread Álvaro Herrera
On 2025-Feb-04, Jelte Fennema-Nio wrote:

> To be clear: the only stats column currently in the staging
> environment shows exactly this. Changes 4 and 5 are *not* visible in
> columns on the commitfest page, that info is currently only visible on
> the patch details page. (For context for others: during the FOSDEM
> meeting 4 was a column on the commitfest page, but I reverted that
> after the feedback there)
> 
> So, I guess clutter wise the current staging is acceptable to you.

Check.

> On Tue, 4 Feb 2025 at 17:27, Álvaro Herrera  wrote:

> > 2. a per-patch breakdown of the number of addition and deletions for the
> >last patchset version posted, one line per patch.  This would appear
> >in a tooltip on the column for the totals, to avoid interface
> >clutter.
> 
> Statistics for each patch would definitely be nice. Not trivial to add
> though, because it would need cfbot to get those statistics for each
> patch. I created a github issue for this idea to track it[1]

Understood, thanks.

> > > 10. Allow sorting patches by total additions/deletions (added together
> > > for total lines changed)
> >
> > Seems reasonable, though I don't see much of an use-case for this.
> 
> My main idea behind this is to allow people to easily find some small
> patchsets, for when they have ~30 minutes to review something. Or for
> new contributors to find something to review without getting
> overwhelmed.

Ah, okay, sounds good.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Better title output for psql \dt \di etc. commands

2025-02-05 Thread Álvaro Herrera
On 2025-Feb-04, Tom Lane wrote:

> The implementation I had in mind was to just invent a
> pg_log_error_internal() macro alias for pg_log_error().
> That'd take about two lines counting the explanatory comment.
> This approach would fail to suppress the cost of gettext's
> trying to look up the string, but surely we aren't concerned
> about that here --- we just want to not burden translators
> with the string.

Yeah, okay, that seems good enough for me.  I agree that the cost of an
unnecessary gettext() call is negligible.
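
For reference, a minimal sketch of the macro Tom describes (assuming it
simply aliases pg_log_error() without marking the string for translation):

    /*
     * Like pg_log_error(), but the message is not marked for translation;
     * for "can't happen" cases that translators need not bother with.
     */
    #define pg_log_error_internal(...) pg_log_error(__VA_ARGS__)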

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Add isolation test template in injection_points for wait/wakeup/detach

2025-02-05 Thread Bertrand Drouvot
Hi,

On Wed, Feb 05, 2025 at 03:12:53PM +0900, Michael Paquier wrote:
> On Thu, Oct 31, 2024 at 08:47:04AM +0900, Michael Paquier wrote:
> > Thanks for the review.  Applied that, then.
> 
> I was looking at src/test/isolation/README, and based on what is
> described for parenthesized markers (which I didn't know about so we
> learn a new thing every day), it is possible to force a strict
> ordering of the test.  Quote that matters:
> "A marker consisting solely of a step name indicates that this step
> may not be reported as completing until that other step has
> completed."

Oh, I did not know about it either (thanks for sharing).

> In our case, the problem that caused 9f00edc22888 is that the "wait"
> step could be so slow that the "detach" step could report before the
> "wait" step.

Yup.

> So I'd like to propose that we revisit 9f00edc22888, and re-enable the
> permutation with a marker so as the "detach" step waits for the "wait"
> step to complete before printing its own report, as per the attached.
> That should make the permutation safe to use.
> 
> The output of the attached patch is the same as what has been removed
> in 9f00edc22888 except that the permutation is changed from that:
> permutation wait1 wakeup2 detach2
> To that, making sure that detach2 report happens only after wait1
> has returned its own result:
> permutation wait1 wakeup2 detach2(wait1)
> 
> Thoughts?

I think that makes sense and the patch LGTM.
A few tests are already using this technique (including injection_points in
inplace.spec).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-05 Thread Álvaro Herrera
On 2025-Feb-03, Vladlen Popolitov wrote:

> You use the FORMAT option to add new formats, filling it with a routine name
> in a shared library. As a result, any caller can call any routine in the
> PostgreSQL kernel.
> I think it will start a competition over who can find the most dangerous
> routine to call right from a COPY FROM command.

Hah.

Maybe it would be a better UI to require that COPY format handlers are
registered explicitly before they can be used:

 CREATE ACCESS METHOD copy_yaml TYPE copy HANDLER copy_yaml_handler;

... and then when the FORMAT is not recognized as one of the hardcoded
methods, we go look in pg_am for one with amtype='c' and the given name.
That gives you the function that initializes the Copy state.

This is convenient enough because system administrators can add COPY
formats that anyone can use, and it doesn't allow calling arbitrary
functions via COPY.
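
For illustration, a handler registered this way might look roughly like the
sketch below. Everything in it is invented for illustration (no
CopyFormatRoutine node exists today); only the general "handler returns a
struct of callbacks" pattern mirrors existing index/table AM handlers:

    #include "postgres.h"
    #include "fmgr.h"

    /* hypothetical callback table for a COPY TO format */
    typedef struct CopyFormatRoutine
    {
        void        (*copyto_start) (void *cstate);
        void        (*copyto_onerow) (void *cstate);
        void        (*copyto_end) (void *cstate);
    } CopyFormatRoutine;

    static void yaml_copyto_start(void *cstate) { /* emit YAML preamble */ }
    static void yaml_copyto_onerow(void *cstate) { /* emit one row as YAML */ }
    static void yaml_copyto_end(void *cstate) { /* emit YAML trailer */ }

    PG_FUNCTION_INFO_V1(copy_yaml_handler);

    Datum
    copy_yaml_handler(PG_FUNCTION_ARGS)
    {
        CopyFormatRoutine *routine = palloc0(sizeof(CopyFormatRoutine));

        routine->copyto_start = yaml_copyto_start;
        routine->copyto_onerow = yaml_copyto_onerow;
        routine->copyto_end = yaml_copyto_end;

        PG_RETURN_POINTER(routine);
    }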

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)




Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-05 Thread Shlok Kyal
On Wed, 5 Feb 2025 at 01:26, Shubham Khanna  wrote:
>
> On Tue, Feb 4, 2025 at 3:49 PM Ashutosh Bapat
>  wrote:
> >
> > Hi Shubham,
> >
> > On Tue, Feb 4, 2025 at 2:10 PM Shubham Khanna
> >  wrote:
> > > > >
> > > >
> > > > It could be a bit tricky to find that for users but they can devise a
> > > > query to get the names and numbers of databases matching the given
> > > > pattern.  OTOH, I am not sure there is a clear need at this stage for
> > > > pattern matching for this tool. So, we can go with a simple switch as
> > > > you are proposing at this stage.
> > > >
> > >
> > > After reconsidering, supporting an '--all-databases' switch seems to be
> > > the better approach at this stage, so I have added the new switch in the
> > > latest patch.
> > > The attached patch contains the suggested changes.
> >
> > + If neither -d nor -a is
> > +   specified, pg_createsubscriber will use
> > +   --all-databases by default.
> >
> > As pointed upthread by me and Peter, using --all-databases by default
> > is not a good behaviour.
> >
> > But the code doesn't behave like --all-databases by default. Looks
> > like we need to fix the documentation.
>
> Updated the documentation accordingly and added the current behaviour
> of -a/--all-databases option.
>
> > + /* Generate publication and slot names if not specified */
> > + SimpleStringListCell *cell;
> > +
> > + fetch_all_databases(opt);
> > +
> > + cell = opt->database_names.head;
> >
> > We don't seem to check existence of publication and slot name
> > specification as the comment indicates. Do we need to check that those
> > names are not specified at all? and also mention in the documentation
> > that those specifications are required when using -a/--all-databases
> > option?
> >
>
> Added a check to verify that publication and slot names are not
> manually specified when using -a/--all-databases option and updated
> the documentation accordingly.
>
> The attached patch contains the suggested changes.
>
Hi Shubham,

Here are some of my comments:

1. We should start the comment text from the next line
+
+/* Function to validate all the databases and generate publication/slot names
+ * when using '--all-databases'.
+ */

2. Here in error message it should be '--database'
+ /* Check for conflicting options */
+ if (opt->all_databases && opt->database_names.head != NULL)
+ {
+ pg_log_error("cannot specify --dbname when using --all-databases");
+ exit(1);
+ }
+

3. I think checking 'opt->all_databases' in if conditions in function
'validate_databases' is not required
+static void
+validate_databases(struct CreateSubscriberOptions *opt)
+{
+ /* Check for conflicting options */
+ if (opt->all_databases && opt->database_names.head != NULL)
+ {
+ pg_log_error("cannot specify --dbname when using --all-databases");
+ exit(1);
+ }

as we are already checking it while calling the function
+ /* Validate and process database options */
+ if (opt.all_databases)
+ validate_databases(&opt);
+

4. Here we should update the count of 'num_replslots' and 'num_pubs'

+ while (cell != NULL)
+ {
+ char slot_name[NAMEDATALEN];
+
+ snprintf(slot_name, sizeof(slot_name), "%s_slot", cell->val);
+ simple_string_list_append(&opt->replslot_names, slot_name);
+
+ snprintf(slot_name, sizeof(slot_name), "%s_pub", cell->val);
+ simple_string_list_append(&opt->pub_names, slot_name);
+
+ cell = cell->next;
+ }
Since we are not updating these counts, the names are not reflected as expected.

Should we also store subscription names similarly? Or maybe store
subscription names and assign slot names the same as the subscription names?
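
A sketch of what point 4 might look like (assuming num_replslots and
num_pubs are the existing counters in pg_createsubscriber.c):

+ while (cell != NULL)
+ {
+ char slot_name[NAMEDATALEN];
+
+ snprintf(slot_name, sizeof(slot_name), "%s_slot", cell->val);
+ simple_string_list_append(&opt->replslot_names, slot_name);
+ num_replslots++;
+
+ snprintf(slot_name, sizeof(slot_name), "%s_pub", cell->val);
+ simple_string_list_append(&opt->pub_names, slot_name);
+ num_pubs++;
+
+ cell = cell->next;
+ }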

5. Since the --all-databases option is added, we should update this comment:
/*
 * If --database option is not provided, try to obtain the dbname from
 * the publisher conninfo. If dbname parameter is not available, error
 * out.
 */

6. Extra blank lines should be removed
  }
  }

+
+
  /* Number of object names must match number of databases */


Thanks and Regards,
Shlok Kyal




Re: CREATE ROLE bug?

2025-02-05 Thread David G. Johnston
On Wed, Jan 25, 2023 at 10:21 AM Robert Haas  wrote:

> I think that the desire to maintain the distinction between membership
> and ADMIN OPTION makes sense as a general rule


I haven't worked through the details but I suspect part of the issue is
that we are not maintaining this distinction when it comes to prohibiting
circular dependencies.  Might it be that we need to check for and prohibit
the admin and the set/inherit loops separately, but allow for an overall
loop so long as one direction is strictly admin and one direction is
strictly set/inherit?

David J.


Re: Remove unnecessary static specifier

2025-02-05 Thread Tom Lane
Daniel Gustafsson  writes:
> From a quick first inspection (and running the tests with the patch applied) I
> agree with this, these variables do not need to be static.  I'll stare a bit
> more at this to make sure but seems like the right patch.

+1.  All three of those variables are visibly assigned to before any
other reference, so they cannot carry data across calls of the
function.

While we're at it, could we make the adjacent "magic" string be
"static const char *magic"?  (Probably needs a couple more
"const" modifiers at use sites, too.)

regards, tom lane




Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Ranier Vilela
On Wed, Feb 5, 2025 at 14:09, Ilia Evdokimov <ilya.evdoki...@tantorlabs.com> wrote:

> But what should we do if cheapest == NULL further? Should we return NULL
> of get_cheapest_parameterized_child_path() function?
>
> If it is, we should write it like this:
>
> if (cheapest == NULL || bms_equal(PATH_REQ_OUTER(cheapest), required_outer))
>     return cheapest;
>

I think not, in this case.
If cheapest is NULL, the logic is to continue the search.
What cannot happen is passing a null pointer to bms_equal(PATH_REQ_OUTER(...)).

best regards,
Ranier Vilela


Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Ranier Vilela
Hi.

On Wed, Feb 5, 2025 at 14:08, Daniel Gustafsson wrote:

> > On 5 Jan 2025, at 00:29, Ranier Vilela  wrote:
> >
> > Hi.
> >
> > Per Coverity.
> >
> > All call sites of function *get_cheapest_path_for_pathkeys* checks
> > for NULL returns.
> >
> > So, it is highly likely that the function will return NULL.
> >
> > IMO, the Assert in this particular call, is not fully effective.
> >
> > Fix removing the Assert and always check if the return is NULL.
>
> Yet the author wrote an Assert here (over a decade ago), so rather than
> blindly
> changing that it seems reasonable to motivate a patch like this with an
> investigation on what the Assert means here.  The fact that Coverity
> complains
> is far from conclusive evidence that something is wrong.
>
This is evidence that we do not have reports about this bug.
In any case, I think it's very unsafe for the future to trust that a
function that can return NULL will never do so in this particular case,
don't you think?

best regards,
Ranier Vilela


Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Daniel Gustafsson
> On 5 Feb 2025, at 18:34, Ranier Vilela  wrote:
> On Wed, Feb 5, 2025 at 14:08, Daniel Gustafsson wrote:

>> Yet the author wrote an Assert here (over a decade ago), so rather than 
>> blindly
>> changing that it seems reasonable to motivate a patch like this with an
>> investigation on what the Assert means here.  The fact that Coverity 
>> complains
>> is far from conclusive evidence that something is wrong.
> This is evidence that we do not have reports about this bug.

Before that can be stated it needs to be determined if this is a bug, this
thread has not done that yet.

--
Daniel Gustafsson



Inconsistency between Compression and Storage for Foreign Tables

2025-02-05 Thread Zhang Mingli

Hi,

While developing the CREATE FOREIGN TABLE LIKE functionality in [0], I had to 
consider the like_options, such as STORAGE, COMPRESSION, and others.

Although foreign tables do not have actual storage in PostgreSQL, we allow the 
STORAGE option as it may be useful for foreign data wrappers (FDWs) that 
support this concept.

As stated in the ALTER FOREIGN TABLE documentation[1]:

> This form sets the storage mode for a column. See the similar form of ALTER 
> TABLE for more details. Note that the storage mode has no effect unless the 
> table's foreign-data wrapper chooses to pay attention to it.

However, when aligning COMPRESSION with STORAGE, I find it confusing. IMO, 
COMPRESSION should behave similarly to STORAGE for foreign tables,
even though they lack real storage. This could be particularly useful for FDWs 
like postgres_fdw.

I noticed several inconsistencies between COMPRESSION and STORAGE for foreign 
tables:

1. We actually allow both SET COMPRESSION and STORAGE for foreign table 
columns, but the CREATE FOREIGN TABLE documentation[2] does not mention this.
gpadmin=# CREATE FOREIGN DATA WRAPPER extstats_dummy_fdw;
CREATE FOREIGN DATA WRAPPER
gpadmin=# CREATE SERVER extstats_dummy_srv FOREIGN DATA WRAPPER 
extstats_dummy_fdw;
CREATE SERVER
gpadmin=# create foreign table ft1(a int, b text compression lz4) server 
extstats_dummy_srv;
CREATE FOREIGN TABLE
gpadmin=# \set HIDE_TOAST_COMPRESSION false
gpadmin=# \d+ ft1
                                       Foreign table "public.ft1"
 Column |  Type   | Collation | Nullable | Default | FDW options | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+-------------+----------+--------------+-------------
 a      | integer |           |          |         |             | plain    |              |
 b      | text    |           |          |         |             | extended |              |
Server: extstats_dummy_srv

gpadmin=# select attname, attcompression from pg_attribute where attname = 'b' and attrelid = 'ft1'::regclass::oid;
 attname | attcompression
---------+----------------
 b       | l
(1 row)

The COMPRESSION info is not listed even when HIDE_TOAST_COMPRESSION is set to
false, because describe.c ignores that column if the table is a foreign table.
But selecting from pg_attribute shows the compression info.

And the COMPRESSION info is copied when creating a table LIKE that foreign
table, including options.

gpadmin=# create table t1(like ft1 including all);
CREATE TABLE
gpadmin=# \d+ t1
                                     Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | integer |           |          |         | plain    |             |              |
 b      | text    |           |          |         | extended | lz4         |              |
Access method: heap

The same goes for STORAGE:
gpadmin=# create foreign table ft2(a int, b text storage external) server extstats_dummy_srv;
CREATE FOREIGN TABLE
gpadmin=# \d+ ft2
                                       Foreign table "public.ft2"
 Column |  Type   | Collation | Nullable | Default | FDW options | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+-------------+----------+--------------+-------------
 a      | integer |           |          |         |             | plain    |              |
 b      | text    |           |          |         |             | external |              |
Server: extstats_dummy_srv
gpadmin=# create table t2(like ft2 including all);
CREATE TABLE
gpadmin=# \d+ t2
                                     Table "public.t2"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | integer |           |          |         | plain    |             |              |
 b      | text    |           |          |         | external |             |              |
Access method: heap

2. We allow ALTER COLUMN SET STORAGE for foreign table columns, but we disallow 
SET COMPRESSION.

gpadmin=# alter foreign table ft1 alter column b set compression pglz;
ERROR:  ALTER action ALTER COLUMN ... SET COMPRESSION cannot be performed on 
relation "ft1"
DETAIL:  This operation is not supported for foreign tables.

gpadmin=# alter foreign table ft1 alter column b set storage external;
ALTER FOREIGN TABLE
gpadmin=# \d+ ft1
                                       Foreign table "public.ft1"
 Column |  Type   | Collation | Nullable | Default | FDW options | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+-------------+----------+--------------+-------------
 a      | integer |

Re: Feature Request: Add AES-128-CFB Mode Support to pgcrypto

2025-02-05 Thread Álvaro Herrera
On 2025-Jan-29, Umar Hayat wrote:

> Hi Daniel Gustafsson and Vladyslav Nebozhyn,
> I created a patch for CFB mode for AES encryption. Please have a look
> and let me know what you think.
> Patch covers implementation, tests and documentation changes.
> 
> OpenSSL provides aes-cfb1, aes-cfb8 and aes-cfb128 modes, where aes-cfb
> defaults to aes-cfb128. For simplicity I only added aes-cfb, which is
> the most commonly used mode; a lower number of bits would introduce
> performance degradation, but if it's desirable I can add them as well.

I kicked the tires on this by encrypting a file with
  openssl aes-128-cfb -K afe908123efcba901230afe908eb5a04 -iv 912387caedade123912387c7ec0b9d0f -pbkdf2
then importing that into a bytea column.  Then I can sort-of obtain the
file back with

select decrypt_iv from data, decrypt_iv(a, '\xafe908123efcba901230', '\x912387caedade123', 'aes-cfb');

... Except that it appears that openssl will encode UTF8 characters in my file
as \ooo octal escapes in the encrypted output, which is really weird, so
the file doesn't roundtrip exactly.  Maybe this is a pgcrypto shortcoming, not
sure.  The final output looks like this (shortened):

=# select decrypt_iv from data, decrypt_iv(a, '\xafe908123efcba901230afe908eb5a04', '\x912387caedade123912387c7ec0b9d0f', 'aes-cfb');

 \302\241Hoy ha sido un d\303\255a genial! Era el cumple de la abuela, que es la mam\303\241 de mam\303\241,

Anyway, at least the bytes appear to be interpreted the same by both
openssl and this new function, so that's good news.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)




Re: [PATCH] SVE popcount support

2025-02-05 Thread Nathan Bossart
On Tue, Feb 04, 2025 at 09:01:33AM +, chiranmoy.bhattacha...@fujitsu.com 
wrote:
>> +/*
>> + * For smaller inputs, aligning the buffer degrades the performance.
>> + * Therefore, align the buffer only when the input size is
>> + * sufficiently large.
>> + */
> 
>> Is the inverse true, i.e., does aligning the buffer improve performance for
>> larger inputs?  I'm also curious what level of performance degradation you
>> were seeing.
> 
> Here is a comparison of all three cases. Alignment is marginally better
> for inputs above 1024B, but the difference is small. Unaligned performs
> better for smaller inputs.
> Aligned After 128B => the current implementation "if (aligned != buf && bytes > 4 * vec_len)"
> Always Aligned => the condition "bytes > 4 * vec_len" is removed.
> Unaligned => the whole if block was removed.
> 
>    buf | Always Aligned | Aligned After 128B | Unaligned
> -------+----------------+--------------------+-----------
>     16 |       37.851   |        38.203      |    34.971
>     32 |       37.859   |        38.187      |    34.972
>     64 |       37.611   |        37.405      |    34.121
>    128 |       45.357   |        45.897      |    41.890
>    256 |       62.440   |        63.454      |    58.666
>    512 |      100.120   |       102.767      |    99.861
>   1024 |      159.574   |       158.594      |   164.975
>   2048 |      282.354   |       281.198      |   283.937
>   4096 |      532.038   |       531.068      |   533.699
>   8192 |     1038.973   |      1038.083      |  1039.206
>  16384 |     2028.604   |      2025.843      |  2033.940

Hm.  These results are so similar that I'm tempted to suggest we just
remove the section of code dedicated to alignment.  Is there any reason not
to do that?

+   /* Process 2 complete vectors */
+   for (; i < loop_bytes; i += vec_len * 2)
+   {
+       vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i)), mask64);
+       accum1 = svadd_x(pred, accum1, svcnt_x(pred, vec64));
+       vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i + vec_len)), mask64);
+       accum2 = svadd_x(pred, accum2, svcnt_x(pred, vec64));
+   }

Does this hand-rolled loop unrolling offer any particular advantage?  What
do the numbers look like if we don't do this or if we process, say, 4
vectors at a time?
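
For comparison, a 4-vector variant might look like this (a hypothetical
sketch reusing the patch's intrinsics and variables; loop_bytes would also
need to be rounded down to a multiple of 4 * vec_len, and accum3/accum4
would be two additional accumulators folded in at the end):

+   /* Process 4 complete vectors per iteration (hypothetical variant) */
+   for (; i < loop_bytes; i += vec_len * 4)
+   {
+       vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i)), mask64);
+       accum1 = svadd_x(pred, accum1, svcnt_x(pred, vec64));
+       vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i + vec_len)), mask64);
+       accum2 = svadd_x(pred, accum2, svcnt_x(pred, vec64));
+       vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i + vec_len * 2)), mask64);
+       accum3 = svadd_x(pred, accum3, svcnt_x(pred, vec64));
+       vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i + vec_len * 3)), mask64);
+       accum4 = svadd_x(pred, accum4, svcnt_x(pred, vec64));
+   }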

-- 
nathan




Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Ranier Vilela
Hi.

On Wed, Feb 5, 2025 at 13:51, Ilia Evdokimov <ilya.evdoki...@tantorlabs.com> wrote:

>
> On 05.01.2025 02:29, Ranier Vilela wrote:
> > Hi.
> >
> > Per Coverity.
> >
> > All call sites of function *get_cheapest_path_for_pathkeys* checks
> > for NULL returns.
> >
> > So, it is highly likely that the function will return NULL.
> >
> > IMO, the Assert in this particular call, is not fully effective.
> >
> > Fix removing the Assert and always check if the return is NULL.
> >
> > best regards,
> > Ranier Vilela
>
>
> Hi!
>
> Thanks for noticing this. If this happens in the planner, it poses a
> serious risk of a segmentation fault that could crash the instance if a
> NULL pointer is dereferenced. Since checking for NULL is very cheap, I
> support this patch.
>
Thanks for taking a look.

best regards,
Ranier Vilela


Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-05 Thread Vladlen Popolitov

Álvaro Herrera wrote on 2025-02-05 18:49:

> On 2025-Feb-03, Vladlen Popolitov wrote:
>
>> You use the FORMAT option to add new formats, filling it with a routine
>> name in a shared library. As a result, any caller can call any routine
>> in the PostgreSQL kernel.
>> I think it will start a competition over who can find the most dangerous
>> routine to call right from a COPY FROM command.
>
> Hah.
>
> Maybe it would be a better UI to require that COPY format handlers are
> registered explicitly before they can be used:
>
>  CREATE ACCESS METHOD copy_yaml TYPE copy HANDLER copy_yaml_handler;
>
> ... and then when the FORMAT is not recognized as one of the hardcoded
> methods, we go look in pg_am for one with amtype='c' and the given name.
> That gives you the function that initializes the Copy state.
>
> This is convenient enough because system administrators can add COPY
> formats that anyone can use, and doesn't allow to call arbitrary
> functions via COPY.
formats that anyone can use, and doesn't allow to call arbitrary
functions via COPY.


Yes! It is what I propose. This looks much safer and already used in 
access methods creation.


--
Best regards,

Vladlen Popolitov.




Re: Failed assertion with jit enabled

2025-02-05 Thread Tom Lane
Bertrand Drouvot  writes:
> I was doing some tests and managed to trigger a failed assertion with jit
> enabled.

> The test can be simplified to:

> postgres=# select count(*) from generate_series(1,1000);
> server closed the connection unexpectedly

Hmm, works for me on today's HEAD.

> The failed assertion is:
> TRAP: failed Assert("false"), File: "llvmjit_expr.c", Line: 2833, PID: 3060333

There are two Assert(false) in that file, neither of them particularly
close to line 2833 as of today, so I'm not sure what version you are
using or which one you're hitting.  Nonetheless, I'll go out on a
limb and guess that this will go away if you do "git clean -dfxq"
and a full rebuild.  It smells like different files getting out of
sync about the representation of ExprEvalOp or ExprEvalStep.

regards, tom lane




Re: Feature Request: Add AES-128-CFB Mode Support to pgcrypto

2025-02-05 Thread Daniel Gustafsson
> On 5 Feb 2025, at 18:24, Álvaro Herrera  wrote:

> Anyway, at least the bytes appear to be interpreted the same by both
> openssl and this new function, so that's good news.

Thanks for confirming.  Short of a very small doc change (which I don't have
handy on this laptop) I think this patch is ready.  I wish we had a better way
of adding support for ciphers in OpenSSL but it's not the responsibility of
this patch to solve that.

--
Daniel Gustafsson





Re: Failed assertion with jit enabled

2025-02-05 Thread Bertrand Drouvot
Hi,

On Wed, Feb 05, 2025 at 10:51:17AM -0500, Tom Lane wrote:
> Bertrand Drouvot  writes:
> > I was doing some tests and managed to trigger a failed assertion with jit
> > enabled.
> 
> > The test can be simplified to:
> 
> > postgres=# select count(*) from generate_series(1,1000);
> > server closed the connection unexpectedly
> 
> Hmm, works for me on today's HEAD.

Thanks for looking at it!

> > The failed assertion is:
> > TRAP: failed Assert("false"), File: "llvmjit_expr.c", Line: 2833, PID: 
> > 3060333
> 
> There are two Assert(false) in that file, neither of them particularly
> close to line 2833 as of today, so I'm not sure what version you are
> using or which one you're hitting.  Nonetheless, I'll go out on a
> limb and guess that this will go away if you do "git clean -dfxq"
> and a full rebuild.

Yeah that's my default way to do and that was done that way.

> It smells like different files getting out of
> sync about the representation of ExprEvalOp or ExprEvalStep.
> 

I did look more closely (knowing that it works for you) and the issue is linked
to not using --with-llvm. Indeed, I used to use --with-llvm but removed it some
time ago for testing.

So the failed build was not using --with-llvm and was relying on an old version
of llvmjit.so (built from the last time I used --with-llvm)...

By default I also always use "maintainer-clean" but it looks like it does not
remove llvmjit.so from the installation directory: is that a miss?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: problems with extensions compiling after upgrade to fc42 (and gcc15)

2025-02-05 Thread Pavel Stehule
On Wed, Feb 5, 2025 at 21:55, Peter Eisentraut wrote:

> On 05.02.25 21:31, Pavel Stehule wrote:
> > I found a problem with compilation of plpgsql_check when I upgraded my
> > fedora.
> >
> > plpgsql_check cannot be compiled (against PostgreSQL 15-). The
> > compilation fails
> >
> > src/expr_walk.c:84:66: warning: passing argument 2 of ‘query_tree_walker’
> > from incompatible pointer type [-Wincompatible-pointer-types]
> >    84 |                          detect_dependency_walker,
> >       |                          ^~~~~~~~~~~~~~~~~~~~~~~~
> >       |                          |
> >       |                          _Bool (*)(Node *, void *)
> > In file included from src/expr_walk.c:14:
> > /usr/local/pgsql/13/include/server/nodes/nodeFuncs.h:137:52: note:
> > expected ‘_Bool (*)(void)’ but argument is of type ‘_Bool (*)(Node *, void *)’
> >   137 | extern bool query_tree_walker(Query *query, bool (*walker) (),
> >       |                                             ~~~~~~~~~~~~^~~~~~
> > src/expr_walk.c:43:1: note: ‘detect_dependency_walker’ declared here
> >    43 | detect_dependency_walker(Node *node, void *context)
> >       | ^~~~~~~~~~~~~~~~~~~~~~~~
> >
> > It is strange, since the Postgres source code itself compiles without
> > problems.
> >
> > The workaround is adding this to the Makefile:
> >
> > override CFLAGS += -Wno-error=incompatible-pointer-types
>
> This will be fixed in the next minor releases.  See commit f00c401c65a
> for example.
>
>
ok, thank you for info

Regards

Pavel


Buffer overflow in zic

2025-02-05 Thread Evgeniy Gorbanyov

Hello.

If you compile zic with ASAN, you can get the following (note this will
delete /etc/localtime):

$ sudo ./zic -l fff
=================================================================
==5528==ERROR: AddressSanitizer: global-buffer-overflow on address 
0x0053103f at pc 0x00501ceb bp 0x7ffe9fbe6510 sp 0x7ffe9fbe6508

READ of size 1 at 0x0053103f thread T0
    #0 0x501cea in relname /artifacts/postgres/src/timezone/zic.c:978:36
    #1 0x50081b in dolink /artifacts/postgres/src/timezone/zic.c:1045:42
    #2 0x4fab14 in main /artifacts/postgres/src/timezone/zic.c:846:3
    #3 0x7ff25975fefc in __libc_start_main (/lib64/libc.so.6+0x27efc)
    #4 0x41c459 in _start 
/usr/src/RPM/BUILD/glibc-2.32-alt5.p10.3/csu/../sysdeps/x86_64/start.S:120

0x0053103f is located 33 bytes to the left of global variable 
'' defined in 'zic.c:841:14' (0x531060) of size 13

  '' is ascii string 'link to link'
0x0053103f is located 1 bytes to the left of global variable 
'' defined in 'zic.c:806:15' (0x531040) of size 15

  '' is ascii string '/etc/localtime'
0x0053103f is located 26 bytes to the right of global variable 
'' defined in 'zic.c:804:15' (0x531020) of size 5

  '' is ascii string 'data'
SUMMARY: AddressSanitizer: global-buffer-overflow 
/artifacts/postgres/src/timezone/zic.c:978:36 in relname

Shadow bytes around the buggy address:
  0x8009e1b0: 00 00 00 00 07 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x8009e1c0: 07 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 07 f9 f9 f9
  0x8009e1d0: f9 f9 f9 f9 00 03 f9 f9 00 00 00 00 07 f9 f9 f9
  0x8009e1e0: f9 f9 f9 f9 00 00 00 00 07 f9 f9 f9 f9 f9 f9 f9
  0x8009e1f0: 00 00 00 04 f9 f9 f9 f9 00 03 f9 f9 02 f9 f9 f9
=>0x8009e200: 05 f9 f9 f9 05 f9 f9[f9]00 07 f9 f9 00 05 f9 f9
  0x8009e210: 00 05 f9 f9 00 03 f9 f9 00 02 f9 f9 00 00 f9 f9
  0x8009e220: 01 f9 f9 f9 02 f9 f9 f9 03 f9 f9 f9 00 00 00 00
  0x8009e230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x8009e240: 00 00 00 00 00 00 00 00 00 02 f9 f9 f9 f9 f9 f9
  0x8009e250: 00 00 00 00 f9 f9 f9 f9 00 00 f9 f9 00 00 00 03
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:    f7
  Container overflow:  fc
  Array cookie:    ac
  Intra object redzone:    bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:    cb
==5528==ABORTING

Patch is included in the attachment.  The problem is in relname(): when i
starts at 0 (i.e. the two paths share no common prefix), the loop reads
linkname[i - 1], one byte before the start of the buffer; the patch guards
that read with "i > 0".


Best regards,
Evgeniy Gorbanyov
diff --combined src/timezone/zic.c
index d605c721ec,d605c721ec..00
--- a/src/timezone/zic.c
+++ b/src/timezone/zic.c
@@@ -975,7 -975,7 +975,7 @@@ relname(char const *target, char const 
  		if (f[i] == '/')
  			dir_len = i + 1;
  	for (; linkname[i]; i++)
--		dotdots += linkname[i] == '/' && linkname[i - 1] != '/';
++		dotdots += i > 0 && linkname[i] == '/' && linkname[i - 1] != '/';
  	taillen = strlen(f + dir_len);
  	dotdotetcsize = 3 * dotdots + taillen + 1;
  	if (dotdotetcsize <= linksize)


Re: Special-case executor expression steps for common combinations

2025-02-05 Thread Daniel Gustafsson
> On 13 Sep 2024, at 15:01, Andreas Karlsson  wrote:
> 
> On 9/10/24 10:54 AM, Daniel Gustafsson wrote:
>>> On 22 Jul 2024, at 23:25, Andreas Karlsson  wrote:
>>> 
>>> I have bench marked the two patches now and failed to measure any speedup 
>>> or slowdown from the first patch (removing return) but I think it is a good 
>>> idea anyway.
>>> 
>>> For the second patch (optimize strict) I managed to measure a ~1% speed up 
>>> for the following query "SELECT sum(x + y + 1) FROM t;" over one million 
>>> rows.
>> That's expected, this is mostly about refactoring the code to simplify the
>> JITed code (and making tiny strides towards JIT expression caching).
> 
> Yup! Expected and nice tiny speedup.
> 
>>> I would say both patches are ready for committer modulo my proposed style 
>>> fixes.
>> I am a bit wary about removing the out_error label and goto since it may open
>> up for reports from static analyzers about control reaching the end of a
>> non-void function without a return. The other change has been incorporated.
>> The attached v3 is a rebase to handle executor changes done since v2, with 
>> the
>> above mentioned fix as well.  If there are no objections I think we should
>> apply this version.
> 
> Sounds good to me and in my opinion this is ready to be committed.

This fell off the ever-growing TODO again.  Re-reading it I still think it's a
good idea, it applied almost cleanly still and still gives a slight performance
improvement along with the more interesting refactoring which will make caching
of expressions easier down the line.

Attached is a rebased v4, unless there are objections I'll go ahead with this
version.

--
Daniel Gustafsson



v4-0002-Add-special-case-fast-paths-for-strict-functions.patch
Description: Binary data


v4-0001-Replace-EEOP_DONE-with-special-steps-for-return-n.patch
Description: Binary data


Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2025-02-05 Thread Michail Nikolaev
Hello everyone, and Mathias!

I have fixed the SP-GiST-related crash and a few issues in the implementation.
Now it passes the tests and (in my opinion) feels simpler.

I'll register that thread in commitfest to honor the bureaucracy.

Best regards,
Mikhail.


v6-0003-This-should-fix-issues-with-incorrect-results-whe.patch
Description: Binary data


v6-0002-Also-add-ability-to-set-LP_DEAD-bits-in-more-case.patch
Description: Binary data


v6-0001-This-should-fix-issues-with-incorrect-results-whe.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-02-05 Thread Amit Kapila
On Wed, Feb 5, 2025 at 10:30 AM vignesh C  wrote:
>
> On Tue, 4 Feb 2025 at 19:56, Nisha Moond  wrote:
> >
> > Here is v69 patch set addressing above and Kuroda-san's comments in [1].
>
> Few minor suggestions:
> 1) In the slot invalidation reporting below:
> +   case RS_INVAL_IDLE_TIMEOUT:
> +       Assert(inactive_since > 0);
> +
> +       /* translator: second %s is a GUC variable name */
> +       appendStringInfo(&err_detail,
> +                        _("The slot's idle time %s exceeds the configured \"%s\" duration."),
> +                        timestamptz_to_str(inactive_since),
> +                        "idle_replication_slot_timeout");
> +       /* translator: %s is a GUC variable name */
> +       appendStringInfo(&err_hint,
> +                        _("You might need to increase \"%s\"."),
> +                        "idle_replication_slot_timeout");
>
> It is logged like:
> 2025-02-05 10:04:11.616 IST [330567] DETAIL:  The slot's idle time
> 2025-02-05 10:02:49.131631+05:30 exceeds the configured
> "idle_replication_slot_timeout" duration.
>
> Here, even though we say "idle time", we are logging the inactive_since
> timestamp, which kind of gives the wrong meaning.
>
> How about we change it to:
> The slot has been inactive since 2025-02-05 10:02:49.131631+05:30,
> which exceeds the configured "idle_replication_slot_timeout" duration.
>

Would it address your concern if we write the actual idle duration
(now - inactive_since) instead of directly using inactive_since in the
above message?
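
Something along these lines, perhaps (a hypothetical sketch: the wording is
illustrative, but TimestampDifferenceMilliseconds() and
GetCurrentTimestamp() are existing backend helpers):

    long    idle_ms = TimestampDifferenceMilliseconds(inactive_since,
                                                      GetCurrentTimestamp());

    /* translator: second %s is a GUC variable name */
    appendStringInfo(&err_detail,
                     _("The slot has been idle for %ld ms, which exceeds the configured \"%s\" duration."),
                     idle_ms, "idle_replication_slot_timeout");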

A few other comments:
1.
+ * 4. The slot is not being synced from the primary while the server
+ *is in recovery
+ *
+ * Note that the idle timeout invalidation mechanism is not
+ * applicable for slots on the standby server that are being synced
+ * from the primary server (i.e., standby slots having 'synced' field 'true').
+ * Synced slots are always considered to be inactive because they don't
+ * perform logical decoding to produce changes.

The 4th point in the above comment and the rest of the comment are
mostly saying the same thing.

2.
+ * Flush all replication slots to disk. Also, invalidate obsolete slots during
+ * non-shutdown checkpoint.
  *
  * It is convenient to flush dirty replication slots at the time of checkpoint.
  * Additionally, in case of a shutdown checkpoint, we also identify the slots
@@ -1924,6 +2007,45 @@ CheckPointReplicationSlots(bool is_shutdown)

Can we try and see how the patch looks if we try to invalidate the
slot due to idle time at the same time when we are trying to
invalidate due to WAL?

-- 
With Regards,
Amit Kapila.




Re: Sample rate added to pg_stat_statements

2025-02-05 Thread Ilia Evdokimov



On 04.02.2025 20:59, Sami Imseih wrote:

To summarize the results of all benchmarks, I compiled them into a table:

Thanks for compiling the benchmark data above.

The main benefit of this patch will be to give the user
a toggle if they are observing high spinlock contention
due to pg_stat_statements which will likely occur
on larger machines.

I also did not see this thread [1] mentioned in the thread above,
but one of the reasons pg_stat_statements.track_planning
was turned off by default was due to the additional spinlock
acquire to track the planning stats. Bringing this up as sample_rate
might also be beneficial as well if a user decides to track planning.

Regards,

Sami

[1] 
https://www.postgresql.org/message-id/flat/2895b53b033c47ccb22972b589050...@ex13d05uwc001.ant.amazon.com



Thanks for the thread. As we can see, simply enabling or disabling not
only track_planning but also other pg_stat_statements parameters is too
radical a measure for managing performance. With the introduction of
sample_rate, users now have more flexibility in controlling spinlock
contention. This allows them to balance overhead against statistics
collection rather than completely disabling the feature.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: Better title output for psql \dt \di etc. commands

2025-02-05 Thread Tom Lane
Álvaro Herrera  writes:
> On 2025-Feb-04, Tom Lane wrote:
>> The implementation I had in mind was to just invent a
>> pg_log_error_internal() macro alias for pg_log_error().
>> That'd take about two lines counting the explanatory comment.
>> This approach would fail to suppress the cost of gettext's
>> trying to look up the string, but surely we aren't concerned
>> about that here --- we just want to not burden translators
>> with the string.

> Yeah, okay, that seems good enough for me.  I agree that the cost of an
> unnecessary gettext() call is negligible.

Sounds good, I'll proceed along those lines.

regards, tom lane




Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Daniel Gustafsson
> On 5 Jan 2025, at 00:29, Ranier Vilela  wrote:
> 
> Hi.
> 
> Per Coverity.
> 
> All call sites of function *get_cheapest_path_for_pathkeys* checks
> for NULL returns.
> 
> So, it is highly likely that the function will return NULL.
> 
> IMO, the Assert in this particular call, is not fully effective.
> 
> Fix removing the Assert and always check if the return is NULL.

Yet the author wrote an Assert here (over a decade ago), so rather than blindly
changing that it seems reasonable to motivate a patch like this with an
investigation on what the Assert means here.  The fact that Coverity complains
is far from conclusive evidence that something is wrong.

--
Daniel Gustafsson





Re: Avoid possible dereference of NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Ilia Evdokimov
But what should we do if cheapest == NULL later on? Should
get_cheapest_parameterized_child_path() then return NULL?


If so, we should write it like this:

if (cheapest == NULL || bms_equal(PATH_REQ_OUTER(cheapest), required_outer))
    return cheapest;

I'll look into this issue further.
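
For reference, the code under discussion looks roughly like this
(paraphrased from get_cheapest_parameterized_child_path() in allpaths.c;
check the actual source for the exact form):

cheapest = get_cheapest_path_for_pathkeys(rel->pathlist, NIL,
                                          required_outer,
                                          TOTAL_COST, false);
Assert(cheapest != NULL);
if (bms_equal(PATH_REQ_OUTER(cheapest), required_outer))
    return cheapest;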

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: Remove unnecessary static specifier

2025-02-05 Thread Daniel Gustafsson
> On 5 Feb 2025, at 17:05, Tom Lane  wrote:

> While we're at it, could we make the adjacent "magic" string be
> "static const char *magic"?  (Probably needs a couple more
> "const" modifiers at use sites, too.)

Good point, from the link referenced it's clear that FreeBSD has made that
change as well.  I'll fix that at the same time.

--
Daniel Gustafsson





Re: log_min_messages per backend type

2025-02-05 Thread Álvaro Herrera
Hello Euler,

On 2024-Dec-17, Euler Taveira wrote:

> Sometimes you need to inspect some debug messages from autovacuum worker but
> you cannot apply the same setting for backends (that could rapidly fill the 
> log
> file). This proposal aims to change log_min_messages to have different log
> levels depending on which backend type the message comes from.
> 
> The syntax was changed from enum to string and it accepts a list of elements.
> 
> Instead of enum, it now accepts a comma-separated list of elements (string).
> Each element is LOGLEVEL:BACKENDTYPE.

This format seems unintuitive.  I would have thought you do it the other
way around, "backendtype:loglevel" ... that seems more natural because
it's like assigning the 'loglevel' value to the 'backendtype' element.
  SET log_min_messages TO 'checkpointer:debug2, autovacuum:debug1';


I dislike the array of names in variable.c.  We already have an array in
launch_backend.c (child_process_kinds), plus GetBackendTypeDesc in
miscinit.c.  Maybe not for this patch to clean up though.


I think it should be acceptable to configure one global setting with
exceptions for particular backend types:

log_min_messages = WARNING, autovacuum:DEBUG1

Right now I think the code only accepts the unadorned log level if there
are no other items in the list.  I think the proposal downthread is to
use the keyword ALL for this,

  log_min_messages = all:WARNING, autovacuum:DEBUG1   # I don't like this

but I think it's inferior, because then "all" is not really "all", and I
think it would be different if I had said

  log_min_messages = autovacuum:DEBUG1, all:WARNING   # I don't like this

because it looks like the "all" entry should override the one I set for
autovacuum before, which frankly would not make sense to me.

So I think these two lines,

log_min_messages = WARNING, autovacuum:DEBUG1
log_min_messages = autovacuum:DEBUG1, WARNING

should behave identically and mean "set the level for autovacuum to
DEBUG1, and for any other backend type to WARNING".


Also, I think it'd be better to reject duplicates in the list.  Right
now it looks like the last entry for a backend type overrides prior
ones.  I mean

log_min_messages = autovacuum:DEBUG1, autovacuum:ERROR

would set autovacuum to error, but that might be mistake-prone if your
string is long.  So implementation-wise I suggest initializing the
whole newlogminmsgs array to -1, then scanning the list of entries
(saving an entry without a backend type as the one to use later) and
setting every listed backend type to the level specified; if we try to
set a value that's already different from -1, we throw an error.  After
scanning the whole list, we scan newlogminmsgs and set any entries
that are still -1 to the value that we saved before.
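
A minimal sketch of that assignment pass, assuming the value has already
been parsed into a list of (backend type, level) entries; LogMinEntry,
parse_error() and the parameter names are illustrative placeholders
here, not the patch's actual code:

#include <stdio.h>
#include <stdlib.h>

typedef struct LogMinEntry
{
    int     btype;          /* backend type index, or -1 if unadorned */
    int     level;          /* requested log level */
} LogMinEntry;

static void
parse_error(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

/*
 * Fill newlogminmsgs[0..ntypes-1], rejecting duplicate entries and
 * backfilling unlisted backend types with the unadorned level (or with
 * a fallback such as WARNING if no unadorned level was given).
 */
static void
assign_levels(const LogMinEntry *entries, int nentries,
              int *newlogminmsgs, int ntypes, int fallback)
{
    int     generic = -1;

    for (int i = 0; i < ntypes; i++)
        newlogminmsgs[i] = -1;

    for (int i = 0; i < nentries; i++)
    {
        const LogMinEntry *e = &entries[i];

        if (e->btype < 0)
        {
            if (generic != -1)
                parse_error("duplicate unadorned log level");
            generic = e->level;
        }
        else
        {
            if (newlogminmsgs[e->btype] != -1)
                parse_error("duplicate entry for backend type");
            newlogminmsgs[e->btype] = e->level;
        }
    }

    for (int i = 0; i < ntypes; i++)
    {
        if (newlogminmsgs[i] == -1)
            newlogminmsgs[i] = (generic != -1) ? generic : fallback;
    }
}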


The new code in variable.c should be before the /* DATESTYLE */ comment
rather than at the end of the file.


You still have many XXX comments.  Also, postgresql.conf should list the
valid values for backendtype, as well as show an example of a valid
setting.  Please don't use ALLCAPS backend types in the docs, this looks
ugly:

> +Valid BACKENDTYPE values are 
> ARCHIVER,
> +AUTOVACUUM, BACKEND,
> +BGWORKER, BGWRITER,
> +CHECKPOINTER, LOGGER,
> +SLOTSYNCWORKER, WALRECEIVER,
> +WALSENDER, WALSUMMARIZER, and
> +WALWRITER.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"  (Jorge González)




Re: Virtual generated columns

2025-02-05 Thread Dean Rasheed
On Tue, 4 Feb 2025 at 22:36, Peter Eisentraut  wrote:
>
> Yeah, this is quite contorted.  I have renamed it like you suggested.

I looked over this again and I think the patch is in good shape to be committed.

One thought that occurred to me was whether it would be better for the
psql describe output (and maybe also pg_dump) to explicitly output
"virtual" for columns of this kind. I know that that's the default for
generated columns, but someone reading the output might not know or
remember that, so perhaps it would be helpful to be explicit.

Regards,
Dean




Re: [PATCH] Add regression tests of ecpg command notice (error / warning)

2025-02-05 Thread Ryo Kanbayashi
On Wed, Feb 5, 2025 at 9:31 PM Ryo Kanbayashi  wrote:
>
> Hi hackers,
>
> While writing a patch for an ecpg command notice bug, I recognized the
> need for regression tests of the ecpg command notices, so I said I
> would write the tests.
>
> https://commitfest.postgresql.org/52/5497/
> https://www.postgresql.org/message-id/0efab1f6-5d8d-451f-a7dc-ef9c73ba9e02%40oss.nttdata.com
>
> This mail is about the patch with those tests.
>
> The patch is attached to this mail: ecpg_cmd_notice_regress_test.patch
> This patch passed CI.
> https://cirrus-ci.com/build/4861827500212224
>
> And the corresponding diff is below.
> https://github.com/ryogrid/postgres/compare/0ec3c295e7594ed3af86bca1a4b4be269c2f069d...72329311a75630594bcaa38248255360b7e8e525
>
> Let me explain the implementation of this patch.
>
> What this patch does
> - adds regression tests which check ecpg command notices such as
> warnings and errors
> - tests the notices implemented in the ecpg.addons file
>
> Basic policy of the implementation
> - match the method used by the existing pg_regress command as much as
> possible
> - avoid approaches that increase the scope of influence
>
> Next, I list answers to points that are likely to be raised :)
> - shell scripts and bat files are used in order to...
> avoid a non-zero exit code from the ecpg command making the tests fail
> avoid adding C code for executing binaries in a cross-platform way
> - python code is used because without it I couldn't write meson.build
> so that it appropriately describes the dependencies on the materials
> used by the tests. please help me...
> - as you said, kicking off this kind of test with pg_regress is
> accompanied by a needless PG server process; but pg_regress doesn't
> execute tests without one, and making it do so would require a
> modification with a not-small scope of influence

Sorry, I'm re-sending the patch because the one already sent includes a
needless stderr output file.
NEW PATCH FILENAME: ecpg_cmd_notice_regress_test2.patch

In addition, the diff between the patches does not affect test behavior.

---
Sincerely,
Ryo Kanbayashi
https://github.com/ryogrid





Re: Better visualization of default values

2025-02-05 Thread Peter Smith
On Thu, Feb 6, 2025 at 8:08 AM Marcos Pegoraro  wrote:
>
> Reading DOCs sometimes is boring because you want to know only the default 
> value of something. I know what that GUC is, I know how to change it, I only 
> don't remember what its default value is. Then you have to read that entire 
> paragraph just to read that the default value is ...
>
> So, what you think about this.
> It's not complete but you can see if it's useful or not
>

+0.5

IIUC the problem here (if there is one) is just that the default is
sometimes buried within the GUC description.

But just putting the default value adjacent to the GUC name might not
be the best solution because often defaults have special meanings.
e.g. there is no point knowing the default value is -1 unless you know
what that value means. So with your patch, not only is the default
value duplicated, but you are back where you started, having to read
all the description text again to learn what it means.

IMO it would be better simply to ensure that defaults are always
described in a consistent place within all GUC descriptions. Then they
are easy to find because you always know where to look for them. In
other words, add nothing new -- just slightly rearrange and/or add
blank lines to the existing text.

Like this:

guc_name (type)

<description>

<sentence stating the default>

<further details>

For example:

CURRENTLY
vacuum_cost_delay (floating point)

The amount of time that the process will sleep when the cost limit has
been exceeded. If this value is specified without units, it is taken
as milliseconds. The default value is 0, which disables the cost-based
vacuum delay feature. Positive values enable cost-based vacuuming.

When using cost-based vacuuming, appropriate values for
vacuum_cost_delay are usually quite small, perhaps less than 1
millisecond. While vacuum_cost_delay can be set to
fractional-millisecond values, such delays may not be measured
accurately on older platforms. On such platforms, increasing VACUUM's
throttled resource consumption above what you get at 1ms will require
changing the other vacuum cost parameters. You should, nonetheless,
keep vacuum_cost_delay as small as your platform will consistently
measure; large delays are not helpful.

SUGGESTION
vacuum_cost_delay (floating point)

The amount of time that the process will sleep when the cost limit has
been exceeded. If this value is specified without units, it is taken
as milliseconds. Positive values enable cost-based vacuuming.

The default value is 0, which disables the cost-based vacuum delay feature.

When using cost-based vacuuming, appropriate values for
vacuum_cost_delay are usually quite small, ... blah blah

~~~

Thoughts?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Show WAL write and fsync stats in pg_stat_io

2025-02-05 Thread Bertrand Drouvot
Hi,

On Thu, Feb 06, 2025 at 12:35:28PM +0900, Michael Paquier wrote:
> On Wed, Feb 05, 2025 at 09:52:14PM -0500, Tom Lane wrote:
> > Michael Paquier  writes:
> > Yeah, if we want to assume we can see stats counts left over from
> > initdb, we have to put this in a TAP test, though I dunno if that is
> > the most appropriate one.
> 
> A second option I can think of for the reads is a SQL query in
> pg_walinspect.  We are sure that we have a xlogreader context there,
> forcing reads.
> 
> Anyway, I would just stick all that to TAP, like the attached in 027,
> where we would rely on the startup process to read data, and the
> checkpointer to initialize a segment for the primary.  Perhaps not the
> best position, but we already have similar queries in this test, and
> these two are cheap.  Thoughts about the attached?

That sounds ok, but I wonder if that's the most appropriate place. I could
think of putting the checkpointer test in 029_stats_restart.pl and the
startup/standby one in a test related to standbys
(030_stats_cleanup_replica.pl?). Though that's probably just a matter of taste.

s/and the primary WAL some writes/and the primary some WAL writes/

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Failed assertion with jit enabled

2025-02-05 Thread Bertrand Drouvot
Hi,

On Wed, Feb 05, 2025 at 06:58:54PM +0100, Álvaro Herrera wrote:
> On 2025-Feb-05, Bertrand Drouvot wrote:
> 
> > As a default I also always use "maintainer-clean" but it looks like it does 
> > not
> > remove llvmjit.so from the installation directory: is that a miss?
> 
> Hmm, "make uninstall" is supposed to remove things from the install
> directory, but maintainer-clean is not. 

Yeah, I can see that maintainer-clean does remove some stuff in the install
directory but is not exhaustive (so not a valid option).

> But if you first configure with
> different options and then run uninstall, then it won't remove the
> library you had installed previously.  Maybe what you really wanted was
> to zap the entire install directory.

Yep, will do that now (as also suggested by Tom).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Implement waiting for wal lsn replay: reloaded

2025-02-05 Thread Andrei Lepikhov

On 12/4/24 18:12, Kirill Reshke wrote:

On Wed, 27 Nov 2024 at 09:09, Alexander Korotkov  wrote:

Any comments?

What's the current status of
https://commitfest.postgresql.org/50/5167/ ? Should we close it or
reattach to this thread?

To push this feature further I rebased the patch onto current master.
Also, let's add a commitfest entry:
https://commitfest.postgresql.org/52/5550/

--
regards, Andrei Lepikhov

From ea224b84d343ea726f47af30a7a974e0736d79cc Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Thu, 6 Feb 2025 14:13:09 +0700
Subject: [PATCH v2] Implement WAIT FOR command

WAIT FOR is to be used on a standby and specifies waiting for a
specific WAL location to be replayed.  This command is useful when the
user makes some data changes on the primary and needs a guarantee of
seeing these changes on the standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top.  During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

WAIT FOR needs to wait without any snapshot held.  Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why a separate utility command appears to be the most robust way
to implement this functionality.  It's not possible to implement this as
a function.  Previous experience shows that stored procedures also have
limitations in this respect.

Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
---
 doc/src/sgml/ref/allfiles.sgml|   1 +
 doc/src/sgml/reference.sgml   |   1 +
 src/backend/access/transam/Makefile   |   3 +-
 src/backend/access/transam/meson.build|   1 +
 src/backend/access/transam/xact.c |   6 +
 src/backend/access/transam/xlog.c |   7 +
 src/backend/access/transam/xlogrecovery.c |  11 +
 src/backend/access/transam/xlogwait.c | 336 ++
 src/backend/commands/Makefile |   3 +-
 src/backend/commands/meson.build  |   1 +
 src/backend/commands/wait.c   | 185 ++
 src/backend/lib/pairingheap.c |  18 +-
 src/backend/parser/gram.y |  15 +-
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/storage/lmgr/proc.c   |   6 +
 src/backend/tcop/pquery.c |  12 +-
 src/backend/tcop/utility.c|  22 ++
 .../utils/activity/wait_event_names.txt   |   2 +
 src/include/access/xlogwait.h |  89 +
 src/include/commands/wait.h   |  21 ++
 src/include/lib/pairingheap.h |   3 +
 src/include/nodes/parsenodes.h|   7 +
 src/include/parser/kwlist.h   |   1 +
 src/include/storage/lwlocklist.h  |   1 +
 src/include/tcop/cmdtaglist.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/044_wait_for_lsn.pl   | 217 +++
 src/tools/pgindent/typedefs.list  |   4 +
 28 files changed, 967 insertions(+), 11 deletions(-)
 create mode 100644 src/backend/access/transam/xlogwait.c
 create mode 100644 src/backend/commands/wait.c
 create mode 100644 src/include/access/xlogwait.h
 create mode 100644 src/include/commands/wait.h
 create mode 100644 src/test/recovery/t/044_wait_for_lsn.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867..8b585cba75 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83f..bd14ec00d2 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -216,6 +216,7 @@
&update;
&vacuum;
&values;
+   &wait;
 
  
 
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db..a32f473e0a 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -36,7 +36,8 @@ OBJS = \
 	xlogreader.o \
 	xlogrecovery.o \
 	xlogstats.o \
-	xlogutils.o
+	xlogutils.o \
+	xlogwait.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index e8ae9b13c8..74a62ab3ea 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -24,6 +24,7 @@ backend_sources += files(
   'xlogrecovery.c',
   'xlogstats.c',
   'xlogutils.c',
+  'xlogwait

Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Jelte Fennema-Nio
On Wed, 5 Feb 2025 at 21:05, Daniel Gustafsson  wrote:
> For reference, you meant 53 right?

Yes, I meant 53

> If the
> CFBot always need one in "Future" state we should document that to make sure 
> we
> don't miss that going forward (and perhaps automate it to make sure we dont
> make manual work for ourselves).

Afaict it doesn't need at least one in the "Future" state; instead it
needs one after the current[1] commitfest. I don't think it should
rely on that though, so I created an issue to fix that[2].

It does seem silly that we require people to manually create new
commitfests though, so I created an issue to track that[3].

[1]: https://commitfest.postgresql.org/current/
[2]: https://github.com/macdice/cfbot/issues/22
[3]: https://github.com/postgres/pgcommitfest/issues/25




Re: Update Unicode data to Unicode 16.0.0

2025-02-05 Thread Jeff Davis
On Mon, 2024-11-11 at 07:27 +0100, Peter Eisentraut wrote:
> Here is the patch to update the Unicode data to version 16.0.0.
> 
> Normally, this would have been routine, but a few months ago there
> was 
> some debate about how this should be handled. [0]  AFAICT, the
> consensus 
> was to go ahead with it, but I just wanted to notify it here to be
> clear.

We discussed $SUBJECT at the Developer Meeting before FOSDEM.

Those people who were most concerned about the Unicode updates on the
list were not present, so I don't consider the discussion to be
binding. But the attendees present agreed that:

  (a) we should not block the update to Unicode indefinitely; and
  (b) we should make reasonable attempts to mitigate potential
problems.

One idea for (b) resurfaced, which was to make a best-effort check at
pg_upgrade time for affected indexes. The check would not be
bulletproof, because we can't catch dependencies that are hidden inside
SPI (e.g. a plpgsql function that calls LOWER()), but it would catch
most potential problems.

Patch attached. A few notes:

  * The dependency entries don't exist because LOWER(), etc., are
system objects (pinned); so it queries the indexprs, indpreds,
partexprs, and conbin.
  * The query is large and perhaps too clever, but it seems to work. I
tried to add inline comments to the SQL, and pgindent had its own ideas
about how to format them -- suggestions welcome.
  * We haven't actually done the Unicode update yet, so it will notice
that the PG17 and PG18 Unicode versions are the same, and return early.
Either apply on top of the Unicode update patch, or comment out the
early return for testing.
  * It emits a warning rather than an error, so you need to specify
pg_upgrade with "-r" to see the output file.
  * I didn't adapt the query to run on pre-17 versions, even though it
could find some potential problem cases (like an index on NORMALIZE()).
I can add that if someone thinks it's worthwhile.

Regards,
Jeff Davis

From 9d7097edff9230bc9ced9757eb7f23b3ac267b0f Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 3 Feb 2025 13:02:37 -0800
Subject: [PATCH v1] Add pg_upgrade check for Unicode-dependent relations.

---
 src/bin/pg_upgrade/check.c | 179 +
 1 file changed, 179 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 7ca1d8fffc9..17ca1066906 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -13,6 +13,7 @@
 #include "catalog/pg_class_d.h"
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
+#include "common/unicode_version.h"
 
 static void check_new_cluster_is_empty(void);
 static void check_is_install_user(ClusterInfo *cluster);
@@ -25,6 +26,7 @@ static void check_for_tables_with_oids(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_unicode_update(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
 static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(void);
@@ -633,6 +635,12 @@ check_and_dump_old_cluster(void)
 
 	check_for_data_types_usage(&old_cluster);
 
+	/*
+	 * Unicode updates can affect some objects that use expressions with
+	 * functions dependent on Unicode.
+	 */
+	check_for_unicode_update(&old_cluster);
+
 	/*
 	 * PG 14 changed the function signature of encoding conversion functions.
 	 * Conversions from older versions cannot be upgraded automatically
@@ -1740,6 +1748,177 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 		check_ok();
 }
 
+/*
+ * Callback function for processing results of query for
+ * check_for_unicode_update()'s UpgradeTask.  If the query returned any rows
+ * (i.e., the check failed), write the details to the report file.
+ */
+static void
+process_unicode_update(DbInfo *dbinfo, PGresult *res, void *arg)
+{
+	UpgradeTaskReport *report = (UpgradeTaskReport *) arg;
+	int			ntups = PQntuples(res);
+	int			i_reloid = PQfnumber(res, "reloid");
+	int			i_nspname = PQfnumber(res, "nspname");
+	int			i_relname = PQfnumber(res, "relname");
+
+	if (ntups == 0)
+		return;
+
+	if (report->file == NULL &&
+		(report->file = fopen_priv(report->path, "w")) == NULL)
+		pg_fatal("could not open file \"%s\": %m", report->path);
+
+	fprintf(report->file, "In database: %s\n", dbinfo->db_name);
+
+	for (int rowno = 0; rowno < ntups; rowno++)
+		fprintf(report->file, "  (oid=%s) %s.%s\n",
+PQgetvalue(res, rowno, i_reloid),
+PQgetvalue(res, rowno, i_nspname),
+PQgetvalue(res, rowno, i_relname));
+}
+
+/*
+ * Check if the Unicode version built into Postgres changed between the old
+ * cluster and the new cluster.
+ */
+static bool
+unicode_version_changed(ClusterInfo *cluster)
+{
+	PGconn	   *conn_template1 = connectToServer(cluste

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-05 Thread Jeff Davis
On Tue, 2025-02-04 at 23:19 +0300, Alexander Borisov wrote:
> I've done many different experiments and everywhere the result is
> within
> the margin of the v2 patch result.

Great, thank you for working on this!

There doesn't appear to be a downside. Even though it's more complex,
we have exhaustive tests to compare with ICU, so that should catch any
correctness issues.

Heikki mentioned the radix tree, so I'd be interested to know what the
trade-offs there are. I don't have a strong opinion, but I'd like to be
able to explain why we use a radix tree for encoding conversions and
the generated branches approach in this patch for case mapping.

Also, I have a question: when there are deeply-nested "if" statements,
like in this patch, can that create a burden on the branch predictor
that affects code elsewhere?

Regards,
Jeff Davis





Re: Avoid possible dereference of NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Ilia Evdokimov


On 05.02.2025 21:56, Tom Lane wrote:

It's not a bug.  Since the call specifies NIL pathkeys (meaning it
doesn't care about sort order) and does not insist on a parallel-safe
path, there should always be a path that satisfies it.  The only
way it could fail to find a path is if the rel's pathlist is entirely
empty, a case already rejected by the sole caller.

Moreover, the argument that we might not have gotten a report is not
credible.  In a production build without an Assert, the next line
would still dump core just as effectively if the result were NULL.

While the proposed patch doesn't break anything, it's removing a
logic cross-check that was put there intentionally.  So I don't
find it to be an improvement.

regards, tom lane



Ah, if this Assert was intentionally added to ensure that a path must
always be found under the given conditions, so that any issue with this
is detected in the right place, then I agree. The patch would likely
make things worse.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.


Re: New GUC autovacuum_max_threshold ?

2025-02-05 Thread Nathan Bossart
Committed.

-- 
nathan




Re: Trigger more frequent autovacuums of heavy insert tables

2025-02-05 Thread Melanie Plageman
On Thu, Jan 16, 2025 at 5:50 PM Melanie Plageman
 wrote:
>
> On Thu, Jan 16, 2025 at 4:43 PM Melanie Plageman
>  wrote:
> >
> > On Fri, Oct 25, 2024 at 11:14 AM Melanie Plageman
> >  wrote:
> > >
> > > I've done something similar to this in attached v2.
> >
> > This needed a rebase. See attached v4.
>
> Whoops -- docs didn't build. Attached v5.

Outside of the positive performance impact of vacuuming pages before
they go cold (detailed in my first email [1]), there is also a
substantial positive effect with this patch for large tables with
sizable cold regions: fewer anti-wraparound vacuums and more
frequent normal/aggressive vacuums.

With the default vacuum settings, you often see an append-only table
devolve to _only_ anti-wraparound vacuums after the first aggressive
vacuum. I ran an insert-only workload for an hour (with 32 clients and
synchronous commit off to maximize the amount of data inserted) with
the default vacuum settings. On master, after the first aggressive
vacuum, we do only anti-wraparound vacuums (and only two of these are
triggered). With the patch, after the first aggressive vacuum, 10 more
vacuums are triggered -- none of which are anti-wraparound vacuums.

I attached a chart comparing the autovacuums triggered on master vs
with the patch.

Besides the performance benefit of spreading the freezing work over
more normal vacuums (thereby disrupting foreground workloads less),
anti-wraparound vacuums are not auto canceled by DDL -- making them
more of a nuisance to users.

[1] 
https://www.postgresql.org/message-id/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com


Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-05 Thread Melanie Plageman
On Sat, Jan 18, 2025 at 11:51 AM Tomas Vondra  wrote:
>
> Sure. I repeated the benchmark with v13, and it seems the behavior did
> change. I no longer see the "big" regression when most of the pages get
> updated (and need vacuuming).
>
> I can't be 100% sure this is due to changes in the patch, because I did
> some significant upgrades to the machine since that time - it has Ryzen
> 9900x instead of the ancient i5-2500k, new mobo/RAM/...  It's pretty
> much a new machine, I only kept the "old" SATA SSD RAID storage so that
> I can do some tests with non-NVMe.
>
> So there's a (small) chance the previous runs were hitting a bottleneck
> that does not exist on the new hardware.
>
> Anyway, just to make this information more complete, the machine now has
> this configuration:
>
> * Ryzen 9 9900x (12/24C), 64GB RAM
> * storage:
>   - data: Samsung SSD 990 PRO 4TB (NVMe)
>   - raid-nvme: RAID0 4x Samsung SSD 990 PRO 1TB (NVMe)
>   - raid-sata: RAID0 6x Intel DC3700 100GB (SATA)
>
> Attached is the script, raw results (CSV) and two PDFs summarizing the
> results as a pivot table for different test parameters. Compared to the
> earlier run I tweaked the script to also vary io_combine_limit (ioc), as
> I wanted to see how it interacts with effective_io_concurrency (eic).
>
> Looking at the new results, I don't see any regressions, except for two
> cases - data (single NVMe) and raid-nvme (4x NVMe). There's a small area
> of regression for eic=32 and perc=0.0005, but only with WAL-logging.
>
> I'm not sure this is worth worrying about too much. It's a heuristics
> and for every heuristics there's some combination parameters where it
> doesn't quite do the optimal thing. The area where the patch brings
> massive improvements (or does not regress) are much more significant.
>
> I personally am happy with this behavior, seems to be performing fine.

Yes, looking at these results, I also feel good about it. I've updated
the commit metadata in attached v14, but I could use a round of review
before pushing it.

- Melanie
From 2397c7fb4b91f907ebcec60d35067c5072a2ae8b Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 5 Feb 2025 17:23:05 -0500
Subject: [PATCH v14 2/2] Use streaming I/O in VACUUM's third phase

Now vacuum's third phase (its second pass over the heap), which removes
dead items referring to dead tuples collected in the first phase, uses a
read stream that looks ahead in the TidStore.

Author: Melanie Plageman 
Co-authored-by: Thomas Munro 
Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 42 
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 70351de403d..222ee01e1ad 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2250,6 +2250,27 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 	return allindexes;
 }
 
+/*
+ * Read stream callback for vacuum's third phase (second pass over the heap).
+ */
+static BlockNumber
+vacuum_reap_lp_read_stream_next(ReadStream *stream,
+void *callback_private_data,
+void *per_buffer_data)
+{
+	TidStoreIter *iter = callback_private_data;
+	TidStoreIterResult *iter_result;
+
+	iter_result = TidStoreIterateNext(iter);
+	if (iter_result == NULL)
+		return InvalidBlockNumber;
+
+	/* Save the TidStoreIterResult for later, so we can extract the offsets. */
+	memcpy(per_buffer_data, iter_result, sizeof(*iter_result));
+
+	return iter_result->blkno;
+}
+
 /*
  *	lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
  *
@@ -2270,6 +2291,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 static void
 lazy_vacuum_heap_rel(LVRelState *vacrel)
 {
+	Buffer		buf;
+	ReadStream *stream;
 	BlockNumber vacuumed_pages = 0;
 	Buffer		vmbuffer = InvalidBuffer;
 	LVSavedErrInfo saved_err_info;
@@ -2290,10 +2313,18 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 			 InvalidBlockNumber, InvalidOffsetNumber);
 
 	iter = TidStoreBeginIterate(vacrel->dead_items);
-	while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+	stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
+		vacrel->bstrategy,
+		vacrel->rel,
+		MAIN_FORKNUM,
+		vacuum_reap_lp_read_stream_next,
+		iter,
+		sizeof(TidStoreIterResult));
+
+	while (BufferIsValid(buf = read_stream_next_buffer(stream,
+	   (void **) &iter_result)))
 	{
 		BlockNumber blkno;
-		Buffer		buf;
 		Page		page;
 		Size		freespace;
 		OffsetNumber offsets[MaxOffsetNumber];
@@ -2301,8 +2332,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 
 		vacuum_delay_point();
 
-		blkno = iter_result->blkno;
-		vacrel->blkno = blkno;
+		vacrel->blkno = blkno = BufferGetBlockNumber(buf);
 
 		num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
 		Assert(num_offsets <= lengthof(

Re: Confusing variable naming in LWLockRelease

2025-02-05 Thread Jeff Davis
On Wed, 2025-01-29 at 23:16 -0800, Jacob Brazeal wrote:
> Here the variable name "oldstate" leads one to believe that the value
> is fetched before the sub operation, similar to some other usages in
> lwlock.c.

I believe it refers to the state of the lock prior to lock acquisition;
not prior to subtraction.

I agree that it's a bit confusing, because I don't think it necessarily
changes it back to the state prior to acquisition if it's a shared
lock. But the name "newstate" wouldn't be great, either.

I don't have a great name in mind, so perhaps a comment instead?

Regards,
Jeff Davis





Inquiry About Google Summer of Code Projects

2025-02-05 Thread Mohamed Badawy
I hope this message finds you well.

My name is Mohamed, and I am currently a third-year Computer Engineering
student. I possess skills in C++, C#, Python, Object-Oriented Programming
(OOP), Data Structures and Algorithms (DSA), SQL, and SQL Server. Recently,
I have also gained knowledge about Git and GitHub, as well as the
architecture of Git.

I noticed that there are some topics you require for the projects that I am
not yet familiar with. Would it be possible for me to learn these as I
progress through the projects, or do you require prior knowledge of these
topics before applying?

Thank you for your time, and I look forward to your response.

Best regards,
Mohamed


Re: Showing applied extended statistics in explain Part 2

2025-02-05 Thread Ilia Evdokimov



On 05.02.2025 09:28, Tatsuro Yamada wrote:

Hi All,

Thank you everyone for your cooperation with comments on the patch and 
solution ideas.
I am sorting through your review comments now. And after rebasing the 
patch, I plan to
send a patch that addresses the comments as much as possible to 
-hackers by Feb 21 at the latest.


Therefore, the status of the patch for this feature from last month's 
commit fest will be

"moving to the next commit fest".

P.S.
Thanks for letting me know, Ilia.

Thanks,
Tatsuro Yamada



Thanks for your work!

Regarding dependency statistics, I noticed that we can't apply them 
individually, which makes it difficult to map specific clauses to 
individual stats. However, why not display the dependency statistics 
above the execution plan as a list? We could format it as a list, but I 
believe it's crucial to highlight when dependency statistics are 
applied. Analyzing dependency stats often takes the longest compared 
to other statistics, and if users could see the extended statistics for 
dependencies upfront, they might be able to remove them if they're not 
useful for the plan.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: Show WAL write and fsync stats in pg_stat_io

2025-02-05 Thread Nazir Bilal Yavuz
Hi,

On Wed, 5 Feb 2025 at 21:32, Tom Lane  wrote:
>
> Michael Paquier  writes:
> > At the end, we want this patch and this data, and my benchmarcking is
> > not showing much differences even if going through a workload with
> > many pages, so I've used the version relying entirely on
> > track_io_timing and applied it.
>
> Locally, the test added by this commit fails like so:
>
> diff -U3 /home/postgres/pgsql/src/test/regress/expected/stats.out 
> /home/postgres
> /pgsql/src/test/regress/results/stats.out
> --- /home/postgres/pgsql/src/test/regress/expected/stats.out2025-02-04 
> 12:33
> :07.456393545 -0500
> +++ /home/postgres/pgsql/src/test/regress/results/stats.out 2025-02-05 
> 13:08
> :30.605638432 -0500
> @@ -886,7 +886,7 @@
>WHERE context = 'normal' AND object = 'wal';
>   ?column?
>  --
> - t
> + f
>  (1 row)
>
>  -
>
> This is pretty repeatable (not perfectly so) in a build with
> --enable-debug --enable-cassert --enable-tap-tests --with-llvm
> but it usually passes without --with-llvm.  System is fairly
> up-to-date RHEL8 on x86_64.  No idea why the buildfarm isn't
> unhappy.  Any pointers where to look?

Thanks for the report!

My thought when adding this test was that the startup process must do
WAL read I/O while the server is starting, i.e.:

'''
startup process ->
InitWalRecovery ->
ReadCheckpointRecord ->
ReadRecord ->
XLogPrefetcherReadRecord ->
lrq_complete_lsn ->
lrq_prefetch ->
lrq->next = XLogPrefetcherNextBlock ->
XLogReadAhead ->
XLogDecodeNextRecord ->
ReadPageInternal ->
state->routine.page_read = XLogPageRead()
'''

Is there a chance that the function chain above does not get triggered
while running the stats.sql test?

--
Regards,
Nazir Bilal Yavuz
Microsoft




Re: should we have a fast-path planning for OLTP starjoins?

2025-02-05 Thread Corey Huinker
>
>
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.


Indeed, many installations specifically _remove_ foreign keys because of
the dreaded RI check on delete. Basically, if you delete one or more rows
in dim1 then the referencing fact1 must be scanned to ensure that it does
not contain a reference to the deleted row. Often the referencing field on
fact1 is not indexed, because the index is almost never useful in an actual
select query, so even if you did index it several unused index metrics will
identify it as a candidate for deletion. What you get is one sequential
scan of fact1 for every row deleted from dim1. Now, we could get around
this by changing how we do delete RI checks, either by moving to statement
level triggers or bypassing triggers entirely, but until we do so, it is
likely that many customers avoid otherwise useful FK references.


Re: RFC: Packing the buffer lookup table

2025-02-05 Thread Matthias van de Meent
On Wed, 5 Feb 2025 at 02:22, Andres Freund  wrote:
>
> Hi,
>
> On 2025-02-04 19:58:36 +0100, Matthias van de Meent wrote:
> > On Thu, 30 Jan 2025 at 08:48, Matthias van de Meent
> >  wrote:
> > >
> > > Together that results in the following prototype patchset.
> >
> > Here's an alternative patch, which replaces dynahash in the buffer
> > lookup table with an open-coded replacement that has fewer
> > indirections during lookups, and allows for a much increased locality.
> >
> > Overall, it has less memory usage than our current dynahash in all
> > cases; but compared to my previous patch it improves memory only in
> > some cases, in others it uses more memory.
> >
> > The main design is a separate chaining hash table (similar to and
> > derived from code of Dynahash) to provide efficient access to free
> > elements.
>
> Why do you want to use a separate chaining hash table? For something as
> performance sensitive as this the lower cache hit ratio seems like a strong
> reason to not go for such a hash table "architecture"?

I think it's hard to defend this patchset against a perfect solution,
but lacking that perfect solution (and given the current, definitely
not great, solution), I'll share my thoughts on why I chose a
different hashmap. While wordy, I'm not trying to defend or attack
anything or anyone, just explaining my reasoning.

My main reasons are:

1.) this is a mostly drop-in replacement that, in the general case, is
better in every metric vs dynahash, which is probably preferred over
continuing the status quo given that dynahash is now really starting to
show its limits (see e.g. the hash_search_with_hash_value thread).

2.) I'm not experienced enough with writing non-constant-sized
allocations in shared memory to get something like a radixtree.h
implemented in shared memory. Given that threading is still some
releases away, I thought a chaining hash table (which only needs
constant size allocations and thus requires only a simple slab
allocator) would be OK if it still improved on the status quo.
Note that radixtree's implementation has nodes sizing from 16 bytes up
to 1000+ bytes. Building an efficient allocator for that is difficult
(see: bin-packing problem), assuming it needs to interface with shmem
and thus be statically pre-allocated.

3.) simplehash.h can't currently be used due to requiring resizing on
certain intervals, and we can't just dynamically re-allocate the
table. This makes the implementation unusable.

4.) simplehash also invalidates cache lines containing otherwise
unmodified elements due to its relocation strategy, which isn't great
either for cache sharing.

5.) A big reason why dynahash is slow, apart from random unpredictable
memory accesses, is said to be because it needs to follow 3 pointers
before it has its first element: The hash table has a pointer to a
segment, which is a pointer to a bucket, which is a pointer to an
ELEMENT - and the hash table itself is an allocated struct and thus
needs to be loaded through a pointer as well. It isn't the only reason
(e.g. bucket chains are expensive, too) but probably one large reason
why it's so slow.
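
To make that concrete, a lookup in such a layout has roughly the shape
of the following sketch (illustrative only, not dynahash's actual field
names or segment arithmetic):

#include <stddef.h>
#include <stdint.h>

#define SEGMENT_BITS    8
#define SEGMENT_MASK    ((1u << SEGMENT_BITS) - 1)

typedef struct Element
{
    struct Element *link;       /* bucket chain */
    uint32_t    hashvalue;      /* cached hash; key and value follow */
} Element;

typedef Element **Segment;      /* a segment is an array of bucket heads */

typedef struct HashTable
{
    Segment    *dir;            /* directory of segments */
} HashTable;

/*
 * Three dependent loads (table struct, directory slot, bucket head)
 * happen before the first element can even be examined, plus one more
 * load per hop along the chain.
 */
static Element *
lookup(const HashTable *htab, uint32_t hash)
{
    Segment     seg = htab->dir[hash >> SEGMENT_BITS];
    Element    *elem = seg[hash & SEGMENT_MASK];

    while (elem != NULL && elem->hashvalue != hash)
        elem = elem->link;
    return elem;
}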

As for what the buffer lookup table cares about, I considered the following:

cache locality (how quickly you can find an element, starting with no info):
  dynahash (--), newhash (-), simplehash (+), radixtree (~?)
cache line dirtying (how many cache lines are dirtied by insert/delete
operations):
  dynahash (+++), newhash (++), simplehash (--), radixtree (++ ?)
spatial locality (how close "near" elements are to each other in the
structure):
  dynahash (---), newhash (--?), simplehash (-?), radixtree (++)
partitioning (can the structure be written to by multiple backends
concurrently?):
  dynahash (+++), newhash (+++), simplehash (---), radixtree (--?)
memory usage (memory overhead allocated for random buffers):
  dynahash (~), newhash (+), simplehash (++), radixtree (-?)


> I think it'd be better to work towards not using a hash table for the buffer
> mapping table than to write a dedicated hash table implementation for it
> though.  A hash table
>
> a) doesn't have ordered access support, which causes a bunch of O(NBuffers)
>operations and makes things like readahead etc way more expensive
> b) doesn't benefit from spatial locality, even though buffer lookups have a
>very high degree of spatial locality

While I hear those arguments, I'm not sure it's as easy as you make it
seem. You yourself have said that "dynahash is [a] really poor hash
table implementation." This new hash table was 'only' a few days'
work, and seems to have fixed at least two of the major issues
we see in dynahash: it reduces the pointer chasing before the first
result to 2 loads in the optimal case, and (with some further changes
which I have yet to share) it may achieve a much higher average
locality vs the current, unmodified, dynahash.
I did look at other solutions (specifically, adapting radixtree.h
and/or simplehash.h) but as I've mentioned above, the memory tooling
require

Re: RFC: Packing the buffer lookup table

2025-02-05 Thread Matthias van de Meent
On Wed, 5 Feb 2025 at 02:14, Andres Freund  wrote:
>
> Hi,
>
> On 2025-01-30 08:48:56 +0100, Matthias van de Meent wrote:
> > Some time ago I noticed that every buffer table entry is quite large at 40
> > bytes (+8): 16 bytes of HASHELEMENT header (of which the last 4 bytes are
> > padding), 20 bytes of BufferTag, and 4 bytes for the offset into the shared
> > buffers array, with generally 8 more bytes used for the bucket pointers.
> > (32-bit systems: 32 (+4) bytes)
> >
> > Does anyone know why we must have the buffer tag in the buffer table?
>
> It turns out to actually substantially improve CPU cache hit ratio on
> concurrent workloads. The BufferDesc is obviously frequently modified. Each
> modification (pin, lock) will pull the cacheline into modified state, within a
> single core. Causing higher latency when accessing that data on other
> cores. That's obviously not great for a crucial hashtable...  I think it
> mainly matters for things like inner index pages etc.
>
> It turns out that these misses due to dirtying already costs us rather dearly
> when running read-heavy workloads on bigger machines, mainly due to nbtree
> code doing things like BufferGetBlockNumber().

That is something I hadn't thought about, but indeed, you're right
that this wouldn't be great.

> It'd be interesting to benchmark how a separate, more densely packed,
> BufferTag array, shared by the hashtable and the BufferDesc itself.  Obviously
> that'd be a somewhat painful change.

Such a patch is actually not that bad, as surprisingly few files
actually touch on BufferDesc (let alone BufferDesc->tag - though the
number of places with changes is still 100+). I've prototyped with it,
and it removes another 12 bytes from the overhead of each buffer
(assuming we want to pack BufferDesc at 32 bytes, as that's possible).

> > Does anyone have an idea on how to best benchmark this kind of patch, apart
> > from "running pgbench"? Other ideas on how to improve this? Specific
> > concerns?
>
> I'd recommend benchmarking at least the following workloads, all fully shared
> buffer cache resident:
[...]

Thanks for the suggestions!

> It's unfortunately fairly important to test these both with a single client
> *and* a large number of clients on a multi-socket server. The latter makes
> cache miss latency much more visible.
>
> It might be worth looking at perf c2c profiles for before/after, I'd expect it
> to change noticeably. Might also be worth at looking at perf stat for cache
> misses, hitm, etc.

Hmm. I'll see if I can get some hardware to test this.

FYI, I've pushed a newer version of the 'newhash' approach to Github,
at [0]. It extracts the buffer tags from the BufferDesc into its own
array to reduce the problems with false sharing due to pins, and has
some more code to try and forcefully increase the locality of hash
elements.

There are still a few more potential changes that can increase cache
locality of hash elements with the buckets that store their data, but
I have no immediate plans for that.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://github.com/MMeent/postgres/tree/feat/new-buftable-hash




Re: Failed assertion with jit enabled

2025-02-05 Thread Tom Lane
Bertrand Drouvot  writes:
> I did look more closely (knowing that it works for you) and the issue is 
> linked
> to not using --with-llvm. Inded, I used to use --with-llvm but removed it some
> time ago for testing.
> So the failed build was not using --with-llvm and was relying on an old 
> version
> of llvmjit.so (build from the last time I used --with-llvm)...

Hmm ... I don't understand why a non-JIT build of the core would
attempt to load llvmjit.so.  Seems like trouble waiting to happen,
given how closely coupled the core and JIT are.  (The .bc files are
pretty much guaranteed to be out of sync in such a case.)


> As a default I also always use "maintainer-clean" but it looks like it does 
> not
> remove llvmjit.so from the installation directory: is that a miss?

Our "make install" doesn't attempt to remove any files, and I don't
believe that's customary for anyone else either.  It'd be pretty
dangerous when installing into a shared system directory.

My personal development cycle usually includes "rm -rf $INSTALLDIR"
before "make install", if I've done any not-localized changes.

regards, tom lane




Re: Inquiry About Google Summer of Code Projects

2025-02-05 Thread Jesper Pedersen

Hi Mohamed,

On 2/5/25 10:29 AM, Mohamed Badawy wrote:

I hope this message finds you well.

My name is Mohamed, and I am currently a third-year Computer Engineering 
student. I possess skills in C++, C#, Python, Object-Oriented 
Programming (OOP), Data Structures and Algorithms (DSA), SQL, and SQL 
Server. Recently, I have also gained knowledge about Git and GitHub, as 
well as the architecture of Git.


I noticed that there are some topics you require for the projects that I 
am not yet familiar with. Would it be possible for me to learn these as 
I progress through the projects, or do you require prior knowledge of 
these topics before applying?


Thank you for your time, and I look forward to your response.



Please, reach out to the mentor directly for the project you are 
interested in.


Note that we haven't received formal approval yet; hopefully that will 
happen next Tuesday.


Feel free to contact me in private too.

Best regards,
 Jesper





Failed assertion with jit enabled

2025-02-05 Thread Bertrand Drouvot
Hi hackers,

I was doing some tests and managed to trigger a failed assertion with jit
enabled.

The test can be simplified to:

postgres=# select count(*) from generate_series(1,1000);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The failed assertion is:

TRAP: failed Assert("false"), File: "llvmjit_expr.c", Line: 2833, PID: 3060333

"git bisect" tells me it has been introduced with 80feb727c86.

I did not look at the exact details, just reporting the issue.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: per backend WAL statistics

2025-02-05 Thread Bertrand Drouvot
Hi,

On Wed, Feb 05, 2025 at 07:31:13PM +0900, Michael Paquier wrote:
> On Wed, Feb 05, 2025 at 10:22:55AM +, Bertrand Drouvot wrote:
> > So, wal_buffers_full has been introduced after the WalUsage structure was
> > there but I don't see any reason in the emails as to why it's not in the 
> > WalUsage
> > structure (I might have missed it though).
> > 
> > I think that this proposal makes sense but would need a dedicated thread,
> > thoughts?
> 
> Using a separate thread for a change like that makes sense to me.  I
> have to admit that the simplifications in terms of designs for what
> we're discussing here makes such a change more valuable.  Adding this
> information to WalUsage is one thing.  Showing it in EXPLAIN is a
> second thing.  Doing the former simplifies the patch you are proposing
> here.  We don't necessarily have to do the latter, but I don't see a
> reason to not do it, either.

Agree, I'll start a dedicated thread for that.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Eagerly scan all-visible pages to amortize aggressive vacuum

2025-02-05 Thread Robert Haas
On Tue, Feb 4, 2025 at 5:34 PM Melanie Plageman
 wrote:
> I think I misspoke when I said we are unlikely to have contended
> all-visible pages. I suppose it is trivial to concoct a scenario where
> there are many pinned all-visible pages.

It's hard to keep heap pages pinned for a really long time, so there
aren't likely to be a lot of them at once. Maybe async I/O will change
that somewhat, but the word in this sentence that I would emphasize is
"concoct". I don't think it's normal for there to be lots of pinned
pages just hanging around.

(It's a bit easier to have index pages that stay pinned for a long
time because an index scan can, or at least could last I checked, hold
onto a pin while the cursor was open even if we're not currently
executing the query. But for a heap page that doesn't happen, AFAIK.)

> However, if we don't count an eagerly scanned page as a failure when
> we don't get the cleanup lock because we won't have incurred a read,
> then why would we count any eagerly scanned page in shared buffers as
> a failure? In the case that we actually tried freezing the page and
> failed because it was too new, that is giving us information about
> whether or not we should keep trying to freeze. So, it is not just
> about doing the read but also about what the failure indicates about
> the data.

That's a good point, but failure to get a cleanup lock isn't a judgement
either way on whether the XIDs in the page are old or new.

> Interestingly, we call heap_tuple_should_freeze() in
> lazy_scan_noprune(), so we actually could tell whether or not there
> are some tuples on the page old enough to trigger freezing if we had
> gotten the cleanup lock. One option would be to add a new output
> parameter to lazy_scan_noprune() that indicates whether or not there
> were tuples with xids older than the FreezeLimit, and only if there
> were not, count it as a failed eager scan.

Yeah, that could be done, and it doesn't sound like a bad idea.
However, I'm also unsure whether we need to add additional logic for
this case. My intuition is that it just won't happen very often. I'm
not quite confident that I'm correct about that, but if I had to
guess, my guess would be "rare scenario".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid possible dereference of NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Ilia Evdokimov



On 05.01.2025 02:29, Ranier Vilela wrote:

Hi.

Per Coverity.

All call sites of the function *get_cheapest_path_for_pathkeys* check
for NULL returns.

So, it is highly likely that the function can return NULL.

IMO, the Assert in this particular call is not fully effective.

Fix by removing the Assert and always checking whether the return is NULL.

best regards,
Ranier Vilela



Hi!

Thanks for noticing this. If this happens in the planner, it poses a 
serious risk of a segmentation fault that could crash the instance if a 
NULL pointer is dereferenced. Since checking for NULL is very cheap, I 
support this patch.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: Failed assertion with jit enabled

2025-02-05 Thread Álvaro Herrera
On 2025-Feb-05, Bertrand Drouvot wrote:

> As a default I also always use "maintainer-clean" but it looks like it does 
> not
> remove llvmjit.so from the installation directory: is that a miss?

Hmm, "make uninstall" is supposed to remove things from the install
directory, but maintainer-clean is not.  But if you first configure with
different options and then run uninstall, then it won't remove the
library you had installed previously.  Maybe what you really wanted was
to zap the entire install directory.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-05 Thread Masahiko Sawada
On Tue, Feb 4, 2025 at 11:37 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 4 Feb 2025 17:32:07 -0800,
>   Masahiko Sawada  wrote:
>
> > BTW we need to check if the return value type of the handler function
> > is copy_handler.
>
> Oh, can we do it without calling a function?

Yes.

> It seems that
> FmgrInfo doesn't have return value type information. Should
> we read pg_catalog.pg_proc or something for it?

Yes, we can do it the way we do it for TABLESAMPLE, for example:

/* check that handler has correct return type */
if (get_func_rettype(handlerOid) != TSM_HANDLEROID)
    ereport(ERROR,
            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
             errmsg("function %s must return type %s",
                    NameListToString(rts->method), "tsm_handler"),
             parser_errposition(pstate, rts->location)));
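
Adapted to COPY, that check might look like this (COPY_HANDLEROID is an
assumed type OID macro, and reporting the name via format_procedure() is
just one way to do it):

/* Hypothetical adaptation for COPY; COPY_HANDLEROID is assumed here. */
if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
    ereport(ERROR,
            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
             errmsg("function %s must return type %s",
                    format_procedure(handlerOid), "copy_handler")));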

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




problems with extensions compiling after upgrade to fc42 (and gcc15)

2025-02-05 Thread Pavel Stehule
Hi

I found a problem with compilation of plpgsql_check when I upgraded my
Fedora.

plpgsql_check cannot be compiled (against PostgreSQL 15 and earlier). The
compilation fails:

src/expr_walk.c:84:66: warning: passing argument 2 of 'query_tree_walker'
from incompatible pointer type [-Wincompatible-pointer-types]
   84 |                         detect_dependency_walker,
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~
      |                         |
      |                         _Bool (*)(Node *, void *)
In file included from src/expr_walk.c:14:
/usr/local/pgsql/13/include/server/nodes/nodeFuncs.h:137:52: note: expected
'_Bool (*)(void)' but argument is of type '_Bool (*)(Node *, void *)'
  137 | extern bool query_tree_walker(Query *query, bool (*walker) (),
      |                                             ~~~~~^~~~~~~~~~~~
src/expr_walk.c:43:1: note: 'detect_dependency_walker' declared here
   43 | detect_dependency_walker(Node *node, void *context)
      | ^~~~~~~~~~~~~~~~~~~~~~~~

It is strange, since the source code of Postgres itself compiles without
problems.

The workaround is adding this to the Makefile:

override CFLAGS += -Wno-error=incompatible-pointer-types

Regards

Pavel


Improve CRC32C performance on SSE4.2

2025-02-05 Thread Devulapalli, Raghuveer
This patch improves the performance of the SSE4.2 CRC32C algorithm. The current 
SSE4.2 implementation of CRC32C relies on the native crc32 instruction and 
processes 8 bytes at a time in a loop. The technique in this paper uses the 
pclmulqdq instruction and processes 64 bytes at a time. The algorithm is based 
on the SSE4.2 version of CRC32 computation from Chromium's copy of zlib, with 
modified constants for CRC32C computation. See:

https://chromium.googlesource.com/chromium/src/+/refs/heads/main/third_party/zlib/crc32_simd.c
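
For context, the baseline being replaced processes one 8-byte word per crc32
instruction, roughly like this standalone sketch (illustrative only, not the
PostgreSQL source; compile with -msse4.2):

/*
 * Minimal sketch of the 8-bytes-at-a-time SSE4.2 CRC32C loop.  The serial
 * dependency on "crc" across iterations is what the pclmulqdq-based
 * 64-byte folding avoids.
 */
#include <nmmintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t
crc32c_sse42_baseline(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len >= 8)
	{
		uint64_t	word;

		memcpy(&word, p, 8);	/* avoid unaligned access */
		crc = (uint32_t) _mm_crc32_u64(crc, word);
		p += 8;
		len -= 8;
	}
	while (len-- > 0)
		crc = _mm_crc32_u8(crc, *p++);

	return crc;
}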

Microbenchmarks (generated with google benchmark using a standalone version of 
the same algorithms):

Comparing scalar_crc32c to sse42_crc32c (for various buffer sizes: 64, 128, 
256, 512, 1024, 2048 bytes)
Benchmark                               Time     CPU     Time Old  Time New  CPU Old  CPU New
---------------------------------------------------------------------------------------------
[scalar_crc32c vs. sse42_crc32c]/64    -0.8147  -0.8148        33         6       33        6
[scalar_crc32c vs. sse42_crc32c]/128   -0.8962  -0.8962        88         9       88        9
[scalar_crc32c vs. sse42_crc32c]/256   -0.9200  -0.9200       211        17      211       17
[scalar_crc32c vs. sse42_crc32c]/512   -0.9389  -0.9389       486        30      486       30
[scalar_crc32c vs. sse42_crc32c]/1024  -0.9452  -0.9452      1037        57     1037       57
[scalar_crc32c vs. sse42_crc32c]/2048  -0.9456  -0.9456      2140       116     2140      116

Raghuveer


v1-0001-Add-more-test-coverage-for-crc32c.patch
Description: v1-0001-Add-more-test-coverage-for-crc32c.patch


v1-0002-Improve-CRC32C-performance-on-SSE4.2.patch
Description: v1-0002-Improve-CRC32C-performance-on-SSE4.2.patch


Re: problems with extensions compiling after upgrade to fc42 (and gcc15)

2025-02-05 Thread Nathan Bossart
On Wed, Feb 05, 2025 at 09:31:27PM +0100, Pavel Stehule wrote:
> I found a problem with compilation of plpgsql_check when I upgraded my
> Fedora.
> 
> plpgsql_check cannot be compiled (against PostgreSQL 15 and earlier). The
> compilation fails:
> 
> src/expr_walk.c:84:66: warning: passing argument 2 of 'query_tree_walker'
> from incompatible pointer type [-Wincompatible-pointer-types]
>    84 |                         detect_dependency_walker,
>       |                         ^~~~~~~~~~~~~~~~~~~~~~~~
>       |                         |
>       |                         _Bool (*)(Node *, void *)
> In file included from src/expr_walk.c:14:
> /usr/local/pgsql/13/include/server/nodes/nodeFuncs.h:137:52: note: expected
> '_Bool (*)(void)' but argument is of type '_Bool (*)(Node *, void *)'
>   137 | extern bool query_tree_walker(Query *query, bool (*walker) (),
>       |                                             ~~~~~^~~~~~~~~~~~
> src/expr_walk.c:43:1: note: 'detect_dependency_walker' declared here
>    43 | detect_dependency_walker(Node *node, void *context)
>       | ^~~~~~~~~~~~~~~~~~~~~~~~

IIRC gcc 15 uses C23 by default, and this looks similar to the problem
fixed in commit d2b4b4c.  I think commit 1c27d16 unintentionally fixed this
for v16 and above.
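
A standalone reproduction of what changed in C23 (an assumed minimal example,
not taken from plpgsql_check):

/*
 * In C17 an empty parameter list leaves the arguments unspecified, so a
 * walker-style function pointer converts to it without complaint.  In C23,
 * () means (void), which turns the assignment below into the same
 * incompatible-pointer-types error shown above.
 */
#include <stdbool.h>
#include <stddef.h>

typedef bool (*tree_walker) ();

static bool
my_walker(int *node, void *context)
{
	return node != NULL && context != NULL;
}

int
main(void)
{
	tree_walker walker = my_walker; /* fine with -std=c17, error with -std=c23 */

	return walker != NULL ? 0 : 1;
}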

-- 
nathan




Re: problems with extensions compiling after upgrade to fc42 (and gcc15)

2025-02-05 Thread Peter Eisentraut

On 05.02.25 21:31, Pavel Stehule wrote:
I found a problem with compilation of plpgsql_check when I upgraded my
Fedora.

plpgsql_check cannot be compiled (against PostgreSQL 15 and earlier). The
compilation fails:

src/expr_walk.c:84:66: warning: passing argument 2 of 'query_tree_walker'
from incompatible pointer type [-Wincompatible-pointer-types]
   84 |                         detect_dependency_walker,
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~
      |                         |
      |                         _Bool (*)(Node *, void *)
In file included from src/expr_walk.c:14:
/usr/local/pgsql/13/include/server/nodes/nodeFuncs.h:137:52: note: expected
'_Bool (*)(void)' but argument is of type '_Bool (*)(Node *, void *)'
  137 | extern bool query_tree_walker(Query *query, bool (*walker) (),
      |                                             ~~~~~^~~~~~~~~~~~
src/expr_walk.c:43:1: note: 'detect_dependency_walker' declared here
   43 | detect_dependency_walker(Node *node, void *context)
      | ^~~~~~~~~~~~~~~~~~~~~~~~

It is strange, since the source code of Postgres itself compiles without
problems.

The workaround is adding this to the Makefile:

override CFLAGS += -Wno-error=incompatible-pointer-types


This will be fixed in the next minor releases.  See commit f00c401c65a 
for example.






RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-02-05 Thread Devulapalli, Raghuveer
Hi John, 

> Further, we verified upthread that Intel's current and near-future product 
> line
> includes server chips (some with over 100 cores, so not exactly low-end) that
> don't support AVX-512 at all. I have no idea how common they will be, but they
> will certainly be found in cloud datacenters somewhere. Shouldn't we have an
> answer for them as well?

Just submitted a patch to improve the SSE4.2 version using the source you 
referenced. See 
 
https://www.postgresql.org/message-id/PH8PR11MB82869FF741DFA4E9A029FF13FBF72%40PH8PR11MB8286.namprd11.prod.outlook.com
 
> I know you had extended time off work, but I've already shared my findings and
> explained my reasoning [2]. The title of the paper is "Fast CRC Computation 
> for
> iSCSI Polynomial Using CRC32 Instruction", so unsurprisingly it does improve 
> the
> SSE42 version. With a few dozen lines of code, I can get ~3x speedup on page-
> sized inputs. At the very least we want to use this technique on Arm [3], and 
> the
> only blocker now is the question regarding the patents. I'm interested to 
> hear the
> response on this.

Still figuring this out. Will respond as soon as I can. 

Thanks,
Raghuveer


Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Daniel Gustafsson
> On 5 Feb 2025, at 20:36, Jelte Fennema-Nio  wrote:
> 
> On Wed, 5 Feb 2025 at 20:29, Jelte Fennema-Nio  wrote:
>> I'll look into fixing that soonish. I took a quick look and it seems
>> related to some unexpected response from the Cirrus API.
> 
> Okay I think I got it running again. It didn't like that there was no
> commitfest with number 54 yet. So I created one, and it's doing more
> than before now. I'll check after dinner if it's still running
> correctly then.

For reference, you meant 53, right?  (There is no 54 in the system.)  If the
CFBot always needs one in "Future" state, we should document that to make sure
we don't miss it going forward (and perhaps automate it so we don't make
manual work for ourselves).

--
Daniel Gustafsson





Re: Avoid possible dereference NULL pointer (src/backend/optimizer/path/allpaths.c)

2025-02-05 Thread Tom Lane
Daniel Gustafsson  writes:
> On 5 Feb 2025, at 18:34, Ranier Vilela  wrote:
>> This is evidence that we do not have reports about this bug.

> Before that can be stated it needs to be determined if this is a bug, this
> thread has not done that yet.

It's not a bug.  Since the call specifies NIL pathkeys (meaning it
doesn't care about sort order) and does not insist on a parallel-safe
path, there should always be a path that satisfies it.  The only
way it could fail to find a path is if the rel's pathlist is entirely
empty, a case already rejected by the sole caller.

Moreover, the argument that we might not have gotten a report is not
credible.  In a production build without an Assert, the next line
would still dump core just as effectively if the result were NULL.

While the proposed patch doesn't break anything, it's removing a
logic cross-check that was put there intentionally.  So I don't
find it to be an improvement.

regards, tom lane




Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Andres Freund
Hi,

On 2025-02-05 19:42:05 +0100, Jelte Fennema-Nio wrote:
> On Wed, 5 Feb 2025 at 00:22, Andres Freund  wrote:
> > Pushed like that.
> >
> > I'll watch CI and BF over the next hours.
> 
> I guess you probably noticed, but in case you didn't: CI on windows is
> still broken.

Huh. CI did pass on all platforms after my push:
https://cirrus-ci.com/github/postgres/postgres/

While there is a failure on master, it isn't due to this:
https://cirrus-ci.com/task/6185223693533184
[17:55:32.636] - 8< 
-
[17:55:32.636] stderr:
[17:55:32.636] #   Failed test 'can't connect to invalid database - error 
message'
[17:55:32.636] #   at C:/cirrus/src/test/recovery/t/037_invalid_database.pl 
line 40.
[17:55:32.636] #   'psql: error: connection to server on socket 
"C:/Windows/TEMP/kqIhcyR2yC/.s.PGSQL.31868" failed: server closed the 
connection unexpectedly
[17:55:32.636] #This probably means the server terminated abnormally
[17:55:32.636] #before or while processing the request.'
[17:55:32.636] # doesn't match '(?^:FATAL:\s+cannot connect to invalid 
database "regression_invalid")'
[17:55:32.636] # Looks like you failed 1 test of 10.
[17:55:32.636] 
[17:55:32.636] (test program exited with status code 1)
[17:55:32.636] 
--

I think that may be due to

commit a14707da564e8c94bd123f0e3a75e194fd7ef56a (upstream/master)
Author: Tom Lane 
Date:   2025-02-05 12:45:58 -0500
 
Show more-intuitive titles for psql commands \dt, \di, etc.




I do see a lot of failures on cfbot - but afaict that's because for some
reason there haven't been recent runs. Thomas?

E.g. the currently newest run is https://cirrus-ci.com/build/6378368658046976
which is based on

commit 43493cceda2
Author: Peter Eisentraut 
Date:   2025-01-24 22:58:13 +0100

Add get_opfamily_name() function

Greetings,

Andres Freund




Re: Prevent COPY FREEZE on Foreign tables

2025-02-05 Thread Sami Imseih
Thanks for the feedback!

> Instead of throwing an error, how about we turn that into a warning?
> This way, if someone is batch-importing data for multiple tables, it won’t 
> interrupt their script that generates a COPY for each table.
> Is it better to give them a heads-up without making them modify their 
> commands right away?

Hmm, I prefer that we error out and abort the transaction, as it can be
difficult to detect a warning and the user will assume that the transaction
completed. Also, we currently error out when COPY FREEZE is used on a parent
partition, so I'd rather we keep this behavior consistent.

Regards,

Sami

On Mon, Feb 3, 2025 at 7:31 PM Zhang Mingli  wrote:
>
> Hi,
>
>
> Zhang Mingli
> www.hashdata.xyz
> On Feb 4, 2025 at 04:18 +0800, Sami Imseih , wrote:
>
>
> This really does not make sense as this
> optimization cannot be applied to a remote table and it
> can give a user a false impression that it was.
>
> +1,
>
> ```
> + /*
> + * Raise an error on foreign tables as it's not possible to apply
> + * the COPY FREEZE optimization to a remote relation.
> + */
> + if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot perform COPY FREEZE on a foreign table")));
> + }
> +
> ```
> Instead of throwing an error, how about we turn that into a warning?
> This way, if someone is batch-importing data for multiple tables, it won’t 
> interrupt their script that generates a COPY for each table.
> Is it better to give them a heads-up without making them modify their 
> commands right away?




Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Tom Lane
Jelte Fennema-Nio  writes:
> I guess you probably noticed, but in case you didn't: CI on windows is
> still broken.

Hard to tell, considering the cfbot has been completely wedged
since Sunday.

regards, tom lane




Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Andres Freund
Hi,

On 2025-02-05 14:09:02 -0500, Tom Lane wrote:
> Jelte Fennema-Nio  writes:
> > I guess you probably noticed, but in case you didn't: CI on windows is
> > still broken.
> 
> Hard to tell, considering the cfbot has been completely wedged
> since Sunday.

It passed on the postgres repo just before this commit:
  https://cirrus-ci.com/build/4733656549294080
and then failed with it:
  https://cirrus-ci.com/build/5944955807465472

Greetings,

Andres Freund




Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary

2025-02-05 Thread Masahiko Sawada
On Tue, Feb 4, 2025 at 11:48 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Wed, Feb 05, 2025 at 12:08:26PM +0530, Amit Kapila wrote:
> > On Wed, Feb 5, 2025 at 2:06 AM Masahiko Sawada  
> > wrote:
> > >
> > > I've attached the updated patch. The fix needs to be back-patched to
> > > v16 where logical decoding on standby was introduced.
>
> Nice catch and thanks for the patch!
> I also do prefer the current idea (disallow starting if hot_standby is off
> and there is an existing logical slot).
>
> > Isn't it better to give the new ERROR near the below code?
> > RestoreSlotFromDisk()
> > {
> > ...
> > if (cp.slotdata.database != InvalidOid && wal_level < WAL_LEVEL_LOGICAL)
> > ereport(FATAL,
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > errmsg("logical replication slot \"%s\" exists, but \"wal_level\" <
> > \"logical\"",
> > NameStr(cp.slotdata.name)),
> > ...
> > }
> >
> > If feasible, this will avoid an additional loop over the slots and a
> > new ERROR will be added at the same place as an existing similar
> > ERROR.
>
> Agree.

+1

>
> +  if (SlotIsLogical(s) && !EnableHotStandby)
> +  ereport(FATAL,
> +  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +   errmsg("hot standby must be enabled for pre-existing logical 
> replication slots")));
>
> On a primary lowering wal_level < logical, we'd get something like:
>
> "
> FATAL:  logical replication slot "logical_slot" exists, but "wal_level" < 
> "logical"
> "
>
> What about being close to it, so something like?
>
> "
> FATAL:  logical replication slot "logical_slot" exists, but "hot_standby" = 
> "off"
> "

Looks good, but I'd like to mention that this is required only on
standbys. So how about the following?

FATAL:  logical replication slot "s" exists on the standby, but
"hot_standby" = "off"
HINT:  Change "hot_standby" to be "on".

I've updated the patch accordingly.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v2-0001-Fix-assertion-when-decoding-XLOG_PARAMETER_CHANGE.patch
Description: Binary data


Re: Eagerly scan all-visible pages to amortize aggressive vacuum

2025-02-05 Thread Melanie Plageman
Attached v16 implements the logic to not count pages we failed to
freeze because of cleanup lock contention as eager freeze failures.

Re the code: I didn't put it in the same if statement block as
lazy_scan_prune() because I wanted to avoid another level of
indentation, but I am open to changing it if people hate it.

On Wed, Feb 5, 2025 at 11:47 AM Robert Haas  wrote:
>
> On Tue, Feb 4, 2025 at 5:34 PM Melanie Plageman
>  wrote:
> > I think I misspoke when I said we are unlikely to have contended
> > all-visible pages. I suppose it is trivial to concoct a scenario where
> > there are many pinned all-visible pages.
>
> It's hard to keep heap pages pinned for a really long time, so there
> aren't likely to be a lot of them at once. Maybe async I/O will change
> that somewhat, but the word in this sentence that I would emphasize is
> "concoct". I don't think it's normal for there to be lots of pinned
> pages just hanging around.
>
> (It's a bit easier to have index pages that stay pinned for a long
> time because an index scan can, or at least could last I checked, hold
> onto a pin while the cursor was open even if we're not currently
> executing the query. But for a heap page that doesn't happen, AFAIK.)

I started getting worried thinking about this. If you have a cursor
for select * from a table and fetch forward far enough, couldn't
vacuum fail to get cleanup locks on a whole range of blocks? If those
are all-visible and we don't count them as failures, we could end up
doing the overhead of lazy_scan_noprune() on all of those blocks. Even
though we wouldn't end up doing extra read I/O, I wonder if the extra
CPU overhead is going to be noticeable.

> > However, if we don't count an eagerly scanned page as a failure when
> > we don't get the cleanup lock because we won't have incurred a read,
> > then why would we count any eagerly scanned page in shared buffers as
> > a failure? In the case that we actually tried freezing the page and
> > failed because it was too new, that is giving us information about
> > whether or not we should keep trying to freeze. So, it is not just
> > about doing the read but also about what the failure indicates about
> > the data.
>
> That's a good point, but failure to get a cleanup lock isn't a judgement
> either way on whether the XIDs in the page are old or new.

Yea, I guess that means the freeze limit caps wasted read I/O -- but
you could end up doing no read I/O and still hitting the freeze fail
limit because it is trying to detect when data is too new to be worth
eager scanning. But that's probably the behavior we want.

> > Interestingly, we call heap_tuple_should_freeze() in
> > lazy_scan_noprune(), so we actually could tell whether or not there
> > are some tuples on the page old enough to trigger freezing if we had
> > gotten the cleanup lock. One option would be to add a new output
> > parameter to lazy_scan_noprune() that indicates whether or not there
> > were tuples with xids older than the FreezeLimit, and only if there
> > were not, count it as a failed eager scan.
>
> Yeah, that could be done, and it doesn't sound like a bad idea.
> However, I'm also unsure whether we need to add additional logic for
> this case. My intuition is that it just won't happen very often. I'm
> not quite confident that I'm correct about that, but if I had to
> guess, my guess would be "rare scenario".

I've not done this in the attached v16. I have added a comment about it.
I think not doing it is a judgment call and not a bug, right?

- Melanie
From 8bfe510f260066de91f77d704b4eb8c01d67de40 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 27 Jan 2025 12:23:00 -0500
Subject: [PATCH v16] Eagerly scan all-visible pages to amortize aggressive
 vacuum

Aggressive vacuums must scan every unfrozen tuple in order to advance
the relfrozenxid/relminmxid. Because data is often vacuumed before it is
old enough to require freezing, relations may build up a large backlog
of pages that are set all-visible but not all-frozen in the visibility
map. When an aggressive vacuum is triggered, all of these pages must be
scanned. These pages have often been evicted from shared buffers and
even from the kernel buffer cache. Thus, aggressive vacuums often incur
large amounts of extra I/O at the expense of foreground workloads.

To amortize the cost of aggressive vacuums, eagerly scan some
all-visible but not all-frozen pages during normal vacuums.

All-visible pages that are eagerly scanned and set all-frozen in the
visibility map are counted as successful eager freezes and those not
frozen are counted as failed eager freezes.

If too many eager scans fail in a row, eager scanning is temporarily
suspended until a later portion of the relation. The number of failures
tolerated is configurable globally and per table. To effectively
amortize aggressive vacuums, we cap the number of successes as well.
Once we reach the maximum number of blocks successfully eager frozen,
eager scanning is disabled fo

Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Tom Lane
Andres Freund  writes:
> On 2025-02-05 14:09:02 -0500, Tom Lane wrote:
>> Hard to tell, considering the cfbot has been completely wedged
>> since Sunday.

> It passed on the postgres repo just before this commit:
>   https://cirrus-ci.com/build/4733656549294080
> and then failed with it:
>   https://cirrus-ci.com/build/5944955807465472

Hmm, maybe it's only the cfbot's web server that's broken,
but none of the pages at http://cfbot.cputube.org
appear to be updating.  What other mechanism are you using
to find the cirrus-ci.com logs?

regards, tom lane




Re: Prevent COPY FREEZE on Foreign tables

2025-02-05 Thread Sami Imseih
my apologies for the top post in the last reply. I hit send too fast :)

I also created a CF entry for this https://commitfest.postgresql.org/52/5544/

I don't think we will need to backpatch, unless someone has a different
opinion about this.

Regards,

Sami




Re: SQL:2011 application time

2025-02-05 Thread Paul Jungwirth

On 2/5/25 05:37, Peter Eisentraut wrote:

On 29.01.25 07:34, Paul Jungwirth wrote:
Is it possible to commit an RI_PLAN_NO_ACTION addition and see if that makes the buildfarm 
failures go away? Here is a proposed patch for that (v48.1). I would understand if this is too 
questionable a practice---but it would be nice to get sufficient test exposure to see if it makes 
a difference. Since I still haven't reproduced this locally (despite running continuously for 
almost a week), it's not an experiment I can do myself. If it *does* make the failures go away, 
then it suggests there is still some latent problem somewhere.


I'm tempted to give this a try.  But the cfbot is currently in a bit of a mess, so I'll wait until 
that is clean again so that we can have a usable baseline to work against.


Okay, thanks! I've been spending some more time on this, but I haven't made 
much progress.

It's surely not as simple as just OID wraparound. Here is a bpftrace script to show when we change 
TransamVariables->nextOid:


BEGIN {
  // running count of SetNextObjectId() calls
  @setnext = 0
}

// count calls to GetNewObjectId(), per thread
u:/home/paul/local/bin/postgres:GetNewObjectId {
  @newoids[tid] += 1
}

// count calls to SetNextObjectId()
u:/home/paul/local/bin/postgres:SetNextObjectId {
  @setnext += 1
}

When I run this during `make installcheck` I get only 29608 total calls to GetNewObjectId, and none 
for SetNextObjectId.


I've also been looking at the dynahash code a bit. With gdb I can give two constraint oids a hash 
collision, but of course that isn't sufficient, since we memcmp the whole key as well.


Last night I started looking at ri_constraint_cache, which is maybe a little more interesting due to 
the syscache invalidation code. A parallel test could cause an invalidation between lines of the 
without_overlaps test. Getting the wrong riinfo could make us treat a RESTRICT constraint as NO 
ACTION. But I don't see any way for that to happen yet.


I have too much confidence in the Postgres codebase to really expect to find bugs in any of these 
places. And yet I don't see how 1772d554b0 could make a RESTRICT test fail, since all its changes 
are wrapped in `if (is_no_action)`---except if the RESTRICT constraint is somehow executing the NO 
ACTION query by mistake.


Anyway I'll keep at it!

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com





Re: RFC: Additional Directory for Extensions

2025-02-05 Thread David E. Wheeler
Hi Andrew,

On Feb 4, 2025, at 15:34, Andrew Dunstan  wrote:

>> I see. I confirm that works. Still, don’t all the other parameters work when 
>> passed to any/all targets? Should this one have an extension-specific name?
> 
> IDK, I don't understand what you think you're saying when you specify 
> --prefix to an extension build (as opposed to an install).

I am unfamiliar with that option. Although `prefix` is mentioned in the PGXS 
docs[1] in the context of other variables, it is not itself documented, neither 
as `prefix=` as Peter suggests, nor as `--prefix`.

At any rate, all the other PGXS variables I’ve used have worked with all the 
make targets, though they obviously don’t necessarily change the behavior of 
all of the targets.

>> ISTM it does more harm than good. The location of extension files should be 
>> highly predictable. I think the search path functionality mitigates the need 
>> for this parameter, and it should be dropped.
> 
> I agree that we should either drop the "directory" directive or fix this 
> patch so it doesn't break it. I have never used the directive, not sure I was 
> even aware of its existence, so I have no objection to dropping it.

I only just started using it, thinking it keeps things better organized. But 
it’s honestly a bit confusing, in that one must set both the `MODULEDIR` 
variable in the `Makefile` and the `directory` variable in the control file.

Best,

David

[1]: https://www.postgresql.org/docs/current/extend-pgxs.html



signature.asc
Description: Message signed with OpenPGP


Re: Show WAL write and fsync stats in pg_stat_io

2025-02-05 Thread Tom Lane
Michael Paquier  writes:
> At the end, we want this patch and this data, and my benchmarcking is
> not showing much differences even if going through a workload with
> many pages, so I've used the version relying entirely on
> track_io_timing and applied it.

Locally, the test added by this commit fails like so:

diff -U3 /home/postgres/pgsql/src/test/regress/expected/stats.out /home/postgres/pgsql/src/test/regress/results/stats.out
--- /home/postgres/pgsql/src/test/regress/expected/stats.out	2025-02-04 12:33:07.456393545 -0500
+++ /home/postgres/pgsql/src/test/regress/results/stats.out	2025-02-05 13:08:30.605638432 -0500
@@ -886,7 +886,7 @@
   WHERE context = 'normal' AND object = 'wal';
  ?column? 
 --
- t
+ f
 (1 row)
 
 -

This is pretty repeatable (not perfectly so) in a build with
--enable-debug --enable-cassert --enable-tap-tests --with-llvm
but it usually passes without --with-llvm.  System is fairly
up-to-date RHEL8 on x86_64.  No idea why the buildfarm isn't
unhappy.  Any pointers where to look?

regards, tom lane




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-05 Thread Masahiko Sawada
On Wed, Feb 5, 2025 at 3:49 AM Álvaro Herrera  wrote:
>
> On 2025-Feb-03, Vladlen Popolitov wrote:
>
> > You use FORMAT option to add new formats, filling it with routine name
> > in shared library. As result any caller can call any routine in PostgreSQL
> > kernel.
> > I think, it will start competition, who can find most dangerous routine
> > to call just from COPY FROM command.
>
> Hah.
>
> Maybe it would be a better UI to require that COPY format handlers are
> registered explicitly before they can be used:
>
>  CREATE ACCESS METHOD copy_yaml TYPE copy HANDLER copy_yaml_handler;
>
> ... and then when the FORMAT is not recognized as one of the hardcoded
> methods, we go look in pg_am for one with amtype='c' and the given name.
> That gives you the function that initializes the Copy state.
>
> This is convenient enough because system administrators can add COPY
> formats that anyone can use, and doesn't allow to call arbitrary
> functions via COPY.

I think that the patch needs to check if the function's result type is
COPY_HANDLEROID by using get_func_rettype(), before calling it. But
with this check, we can prevent arbitrary functions from being called
via COPY. Why do we need to extend CREATE ACCESS METHOD too for that
purpose?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Jelte Fennema-Nio
On Wed, 5 Feb 2025 at 00:22, Andres Freund  wrote:
> Pushed like that.
>
> I'll watch CI and BF over the next hours.

I guess you probably noticed, but in case you didn't: CI on windows is
still broken.




Re: Prevent COPY FREEZE on Foreign tables

2025-02-05 Thread Sami Imseih
> Yeah, I'd rather error out than expect users to respond to warnings to the
> effect of "hey, you told us to do something, but we did something else that
> isn't what you asked us to do."  That both retains the broken feature and
> adds more noise, neither of which seems desirable.

I agree.

Sami




Re: Prevent COPY FREEZE on Foreign tables

2025-02-05 Thread Nathan Bossart
On Wed, Feb 05, 2025 at 01:11:44PM -0600, Sami Imseih wrote:
> I don't think we will need to backpatch, unless someone has a different
> opinion about this.

I think this falls into the category of a bug, so the only reason I can see
for not back-patching it is that it could lead to somewhat widespread
breakage for existing scripts, etc. which are arguably kind-of working
today.  That seems unlikely, but it'd be good to hear some other opinions
on the matter.

-- 
nathan




Re: Prevent COPY FREEZE on Foreign tables

2025-02-05 Thread Nathan Bossart
On Wed, Feb 05, 2025 at 01:05:32PM -0600, Sami Imseih wrote:
>> Instead of throwing an error, how about we turn that into a warning?
>> This way, if someone is batch-importing data for multiple tables, it won't 
>> interrupt their script that generates a COPY for each table.
>> Is it better to give them a heads-up without making them modify their 
>> commands right away?
> 
> Hmm, I prefer we error out the transaction as it can be difficult
> to detect a warning and the user will assume that the
> transaction completed. Also, we currently error out when
> copy freeze is on a parent partition, so I rather we keep
> this behavior consistent.

Yeah, I'd rather error out than expect users to respond to warnings to the
effect of "hey, you told us to do something, but we did something else that
isn't what you asked us to do."  That both retains the broken feature and
adds more noise, neither of which seems desirable.

-- 
nathan




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-05 Thread Álvaro Herrera
On 2025-Feb-05, Masahiko Sawada wrote:

> I think that the patch needs to check if the function's result type is
> COPY_HANDLEROID by using get_func_rettype(), before calling it. But
> with this check, we can prevent arbitrary functions from being called
> via COPY. Why do we need to extend CREATE ACCESS METHOD too for that
> purpose?

It's a nicer UI than a bare CREATE FUNCTION, but perhaps it is overkill.
IIRC the reason we require CREATE ACCESS METHOD for table AMs is so that
we acquire a pg_am entry with an OID that can be referenced from
elsewhere, for instance you can't drop an AM if tables are using it; but
you can't use COPY in rules or anything like that that's going to be
stored permanently.  Perhaps you're right that we don't need this for
extensible COPY FORMAT.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Jelte Fennema-Nio
On Wed, 5 Feb 2025 at 20:21, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2025-02-05 14:09:02 -0500, Tom Lane wrote:
> >> Hard to tell, considering the cfbot has been completely wedged
> >> since Sunday.
>
> > It passed on the postgres repo just before this commit:
> >   https://cirrus-ci.com/build/4733656549294080
> > and then failed with it:
> >   https://cirrus-ci.com/build/5944955807465472
>
> Hmm, maybe it's only the cfbot's web server that's broken,
> but none of the pages at http://cfbot.cputube.org
> appear to be updating.  What other mechanism are you using
> to find the cirrus-ci.com logs?

Ugh yes, cfbot isn't updating at all anymore. So Andres' commits might
very well have fixed the issue, but the prod cfbot is not doing any
builds at the moment...

I'll look into fixing that soonish. I took a quick look and it seems
related to some unexpected response from the Cirrus API.




Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Andres Freund
Hi,

On 2025-02-05 14:20:59 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2025-02-05 14:09:02 -0500, Tom Lane wrote:
> >> Hard to tell, considering the cfbot has been completely wedged
> >> since Sunday.
> 
> > It passed on the postgres repo just before this commit:
> >   https://cirrus-ci.com/build/4733656549294080
> > and then failed with it:
> >   https://cirrus-ci.com/build/5944955807465472
> 
> Hmm, maybe it's only the cfbot's web server that's broken,
> but none of the pages at http://cfbot.cputube.org
> appear to be updating.

It does look to me like cfbot isn't updating the relevant branches, i.e. it's
not just the website that's not updating, or CI somehow not triggering after
cfbot updates the relevant branches.


> What other mechanism are you using to find the cirrus-ci.com logs?

This isn't run via cfbot, but via postgres' github mirror. Whenever the repo
sync pushes a change it also triggers CI.

You can see all the runs of that on
https://cirrus-ci.com/github/postgres/postgres/


CI on Windows failed in ecpg for a few days; there were just two master runs
that didn't fail in between that being fixed and the failure I linked to
above. But recovery/037_invalid_database didn't fail at that time.

Greetings,

Andres Freund




Re: Windows CFBot is broken because ecpg dec_test.c error

2025-02-05 Thread Jelte Fennema-Nio
On Wed, 5 Feb 2025 at 20:29, Jelte Fennema-Nio  wrote:
> I'll look into fixing that soonish. I took a quick look and it seems
> related to some unexpected response from the Cirrus API.

Okay I think I got it running again. It didn't like that there was no
commitfest with number 54 yet. So I created one, and it's doing more
than before now. I'll check after dinner if it's still running
correctly then.




Re: Prevent COPY FREEZE on Foreign tables

2025-02-05 Thread Sami Imseih
> so the only reason I can see
> for not back-patching it is that it could lead to somewhat widespread
> breakage for existing scripts, etc. which are arguably kind-of working
> today.

That is my thought for not backpatching. It's not breaking the COPY
command, it's just not applying an optimization.

Let's see what others think.

Sami




Re: Failed assertion with jit enabled

2025-02-05 Thread Andres Freund
Hi,

On 2025-02-05 13:07:58 -0500, Tom Lane wrote:
> Bertrand Drouvot  writes:
> > I did look more closely (knowing that it works for you) and the issue is 
> > linked
> > to not using --with-llvm. Inded, I used to use --with-llvm but removed it 
> > some
> > time ago for testing.
> > So the failed build was not using --with-llvm and was relying on an old 
> > version
> > of llvmjit.so (build from the last time I used --with-llvm)...
> 
> Hmm ... I don't understand why a non-JIT build of the core would
> attempt to load llvmjit.so.  Seems like trouble waiting to happen,
> given how closely coupled the core and JIT are.  (The .bc files are
> pretty much guaranteed to be out of sync in such a case.)

To support a) packaging postgres with split-out JIT support b) pluggability of
JIT backends, the only way we detect if JIT is supported is by trying to load
the configured JIT backend (jit_provider GUC).

Greetings,

Andres Freund




Re: new commitfest transition guidance

2025-02-05 Thread Peter Geoghegan
On Tue, Feb 4, 2025 at 8:10 PM Tom Lane  wrote:
> As of right now, I see that 79 CF entries have been manually pushed to
> 2025-03 (but it's hard to tell how many of those were moved before
> 2025-01 closed).  180 live entries are still in 2025-01, including
> 20 RfC ones.  I think this experiment is already proving to be a
> failure, and if you increase the cost of compliance substantially,
> people just won't do it at all.

Evidently this new policy is why my skip scan patch series wasn't
being tested by CI.

I just don't think that this new policy makes sense. At least not as
implemented. Do we all now need to set ourselves reminders to re-enter
patches to each CF, lest we be accused of abandoning patches due to a
lack of interest?

-- 
Peter Geoghegan




Re: new commitfest transition guidance

2025-02-05 Thread Tom Lane
Peter Geoghegan  writes:
> Evidently this new policy is why my skip scan patch series wasn't
> being tested by CI.

Well no, the reason CI wasn't testing anything was the cfbot was
broken.  See nearby "CFBot is broken" thread.

> I just don't think that this new policy makes sense. At least not as
> implemented.

I'm a little dubious about it too, but let's not conflate this
question with plain ol' infrastructure bugs.

regards, tom lane




Re: Statistics Import and Export

2025-02-05 Thread Corey Huinker
On Sat, Jan 25, 2025 at 10:02 AM Corey Huinker 
wrote:

>> Fixed. Holding off on posting updated patch pending decision on what's the
>> best thing to do with partitioned indexes.
>
> Though I was able to get it to work multiple ways, the one that seems to
> make the most sense given Michael and Jeff's feedback is to handle
> partitioned indexes, do the ACL checks against the underlying table just
> like indexes, but take a ShareUpdateExclusiveLock on the partitioned index
> as well, because it doesn't have the same special case
> in check_inplace_rel_lock() that regular indexes do.
>
> In the future, if check_inplace_rel_lock() changes its special case to
> include partitioned indexes, then this code can get marginal simpler.
>
> New patchset, no changes to 0002 as work continues there.
>

Thought I sent this to the list, but apparently I only sent it to Michael. The
changes referenced are in v45, already posted to the list.


Re: Show WAL write and fsync stats in pg_stat_io

2025-02-05 Thread Michael Paquier
On Wed, Feb 05, 2025 at 09:52:14PM -0500, Tom Lane wrote:
> Michael Paquier  writes:
> Yeah, if we want to assume we can see stats counts left over from
> initdb, we have to put this in a TAP test, though I dunno if that is
> the most appropriate one.

A second option I can think of for the reads is a SQL query in
pg_walinspect.  We are sure that we have a xlogreader context there,
forcing reads.

Anyway, I would just stick all that in TAP, like the attached in 027,
where we would rely on the startup process to read data, and the
checkpointer to initialize a segment for the primary.  Perhaps not the
best position, but we already have similar queries in this test, and
these two are cheap.  Thoughts about the attached?

> Now that I've looked at the tests a bit, I'm also distressed
> by this test pattern:
> 
> SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 
> 'commit_timestamp' \gset
> SELECT pg_stat_reset_slru();
> SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM 
> pg_stat_slru WHERE name = 'commit_timestamp';
> 
> This assumes that the execution time of pg_stat_reset_slru() is more
> than the system clock resolution.  I won't be surprised to see that
> fail in the future.  We did discover recently that gettimeofday is
> good to the microsecond on most modern platforms [1], but it won't
> get any better than that, while our machines keep getting faster.
> Just for reference, on my hardly-bleeding-edge-anymore workstation:

Hmm.  Interesting.
--
Michael
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index bab7b28084b..a04ecc4d2db 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -63,6 +63,26 @@ $node_standby_1->append_conf('postgresql.conf',
 	'max_standby_streaming_delay = 600s');
 $node_standby_1->start;
 
+# Check some WAL statistics.  The standby should have done WAL reads in
+# the startup process when starting, and the primary WAL some writes with
+# its checkpointer.
+my $result = $node_primary->safe_psql(
+	'postgres',
+	qq{SELECT object, context, writes > 0 AS writes_done
+  FROM pg_stat_io
+  WHERE context = 'init' AND
+object = 'wal' AND
+backend_type = 'checkpointer'});
+is($result, qq(wal|init|t), 'check contents of WAL stats on primary');
+$result = $node_standby_1->safe_psql(
+	'postgres',
+	qq{SELECT object, context, reads > 0 AS reads_done
+  FROM pg_stat_io
+  WHERE context = 'normal' AND
+object = 'wal' AND
+backend_type = 'startup'});
+is($result, qq(wal|normal|t), 'check contents of WAL stats on standby');
+
 my $dlpath = dirname($ENV{REGRESS_SHLIB});
 my $outputdir = $PostgreSQL::Test::Utils::tmp_check;
 
@@ -163,7 +183,7 @@ $node_primary->safe_psql('postgres', 'CREATE EXTENSION pg_stat_statements');
 # This gathers data based on the first characters for some common query types,
 # checking that reports are generated for SELECT, DMLs, and DDL queries with
 # CREATE.
-my $result = $node_primary->safe_psql(
+$result = $node_primary->safe_psql(
 	'postgres',
 	qq{WITH select_stats AS
   (SELECT upper(substr(query, 1, 6)) AS select_query
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 9a02481ee7e..7d91f047bb3 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -862,33 +862,6 @@ WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid();
  t
 (1 row)
 
--- Test pg_stat_io for WAL in an init context, that should do writes
--- and syncs.
-SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
-  FROM pg_stat_io
-  WHERE context = 'init' AND object = 'wal' \gset io_sum_wal_init_
-SELECT :io_sum_wal_init_writes > 0;
- ?column? 
---
- t
-(1 row)
-
-SELECT current_setting('fsync') = 'off'
-  OR :io_sum_wal_init_fsyncs > 0;
- ?column? 
---
- t
-(1 row)
-
--- Test pg_stat_io for WAL in a normal context, that should do reads as well.
-SELECT SUM(reads) > 0
-  FROM pg_stat_io
-  WHERE context = 'normal' AND object = 'wal';
- ?column? 
---
- t
-(1 row)
-
 -
 -- Test that resetting stats works for reset timestamp
 -
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 901e7bd56e3..11628ebc8a1 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -442,20 +442,6 @@ SELECT (current_schemas(true))[1] = ('pg_temp_' || beid::text) AS match
 FROM pg_stat_get_backend_idset() beid
 WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid();
 
--- Test pg_stat_io for WAL in an init context, that should do writes
--- and syncs.
-SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
-  FROM pg_stat_io
-  WHERE context = 'init' AND object = 'wal' \gset io_sum_wal_init_
-SELECT :io_sum_wal_init_writes > 0;
-SELECT current_setting('fsync') = 'off'
-  OR :io_sum_wal_init_fsyncs > 0;
-
--- Test pg_stat_io for WAL in a normal context, that should do reads as well.
-SELECT SUM(reads) > 0
- 
