Re: [PATCH] Add array_reverse() function

2024-10-21 Thread Joel Jacobson
On Mon, Oct 21, 2024, at 11:06, Aleksander Alekseev wrote:
> Hi,
>
> Recently I wanted to call array_reverse() and discovered that we still
> don't have it. I'm not the first one who encountered this limitation.
> array_reverse() was requested at least since 2009 [1] and the
> workaround on PostgreSQL Wiki is dated 2010 [2].
>
> The proposed patch adds this function. Only the first dimension of the
> array is reversed, for consistency with the existing functions such as
> array_shuffle() [3].

+1

I've needed this many times.
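
For anyone stuck on released versions in the meantime, the wiki workaround [2]
boils down to something like this minimal SQL sketch (one-dimensional arrays
only, function name just for illustration; the patch implements it properly in C):

CREATE FUNCTION array_reverse_1d(anyarray) RETURNS anyarray
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT ARRAY(
        SELECT $1[i]
        FROM generate_subscripts($1, 1) AS s(i)
        ORDER BY i DESC
    )
$$;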

/Joel






Re: New "raw" COPY format

2024-10-21 Thread jian he
On Sat, Oct 19, 2024 at 11:33 PM Joel Jacobson  wrote:
>
> > ProcessCopyOptions
> > /* Extract options from the statement node tree */
> > foreach(option, options)
> > {
> > }
> > /* --- DELIMITER option --- */
> > /* --- NULL option --- */
> > /* --- QUOTE option --- */
> > Currently the regress test passed, i think that means your refactor is fine.
>
> I believe that a passing test indicates it might be okay,
> but a failing test definitely means it's not. :D
>
> I've meticulously refactored one option at a time, checking which code in
> ProcessCopyOptions depends on each option field to ensure the semantics
> are preserved.
>
> I think the changes are easy to follow, and it's clear that each change is
> correct when looking at them individually, though it might be more challenging
> when viewing the total change.
>
> I've tried to minimize code movement, preserving as much of the original
> code placement as possible.
>
> > in ProcessCopyOptions, maybe we can rearrange the code after the
> > foreach loop (foreach(option, options)
> > based on the parameters order in
> > https://www.postgresql.org/docs/devel/sql-copy.html Parameters section.
> > so we can review it by comparing the refactoring with the
> > sql-copy.html Parameters section's description.
>
> That would be nice, but unfortunately, it's not possible because the order of
> the option code blocks matters due to the setting of defaults in else/else
> if branches when an option is not specified.
>
> For example, in the documentation, DEFAULT precedes QUOTE,
> but in ProcessCopyOptions, the QUOTE code block must come before
> the DEFAULT code block due to the check:
>
>/* Don't allow the CSV quote char to appear in the default string. */
>

> I also believe there's value in minimizing code movement.
>
but v12-0001 was already hugely refactored.

Make ProcessCopyOptions work in the following order:
1. Extract options from the statement node tree.
2. Check each option; if it is not specified, set its default value.
3. Check for interdependent options.

I still think that aligning step 2 with the order of the doc's Parameters
section will make it more readable.

Based on your patch
(v12-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch)
I moved some checks to step 3 and made the step-2 checking order match the doc.
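
(To make the interdependency mentioned upthread concrete: because of the
"Don't allow the CSV quote char to appear in the default string" check,
roughly the first of these is accepted and the second rejected; the table
name is just for illustration.)

COPY tab FROM STDIN WITH (FORMAT csv, QUOTE '"', DEFAULT 'N/A');  -- ok
COPY tab FROM STDIN WITH (FORMAT csv, QUOTE '"', DEFAULT 'a"b');  -- rejected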


v12-0001-make-the-ProcessCopyOptions-option-aligned-wi.no-cfbot
Description: Binary data


Re: New "raw" COPY format

2024-10-21 Thread Joel Jacobson
On Mon, Oct 21, 2024, at 16:35, jian he wrote:
> Make ProcessCopyOptions work in the following order:
> 1. Extract options from the statement node tree.
> 2. Check each option; if it is not specified, set its default value.
> 3. Check for interdependent options.
>
> I still think that aligning step 2 with the order of the doc's Parameters
> section will make it more readable.
>
> Based on your patch
> (v12-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch)
> I moved some checks to step 3 and made the step-2 checking order match the doc.

Smart to move the interdependent check to the section designated for such
checks; that's exactly the right place for it.

It's really nice that the order in the code is now aligned with the doc order.

/Joel




Re: [PATCH] Add array_reverse() function

2024-10-21 Thread Tom Lane
Ashutosh Bapat  writes:
> On Mon, Oct 21, 2024 at 2:36 PM Aleksander Alekseev
>  wrote:
> + /*
> + * There is no point in reversing empty arrays or arrays with less than
> + * two items.
> + */
> + if (ARR_NDIM(array) < 1 || ARR_DIMS(array)[0] < 2)
> + PG_RETURN_ARRAYTYPE_P(array);

> But it returns the input array as is. I think it should at least make
> a new copy of input array.

I don't think that's really necessary.  We have other functions that
will return an input value unchanged without copying it.  A
longstanding example is array_larger.  Also, this code looks to be
copied from array_shuffle.

regards, tom lane




Re: Using Expanded Objects other than Arrays from plpgsql

2024-10-21 Thread Michel Pelletier
On Sun, Oct 20, 2024 at 8:46 PM Tom Lane  wrote:

> Michel Pelletier  writes:
> > On Sun, Oct 20, 2024 at 10:13 AM Tom Lane  wrote:
>

(from thread
https://www.postgresql.org/message-id/CACxu%3DvJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg%40mail.gmail.com
)


> >> But it seems like we could get an easy win by adjusting
> >> plpgsql_exec_function along the lines of
> >> ...
>
> > I tried this change and couldn't get it to work, on the next line:
> > if (VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(var->value)))
> > var->value might not be a pointer, as it seems at least from my gdb
> > scratching, but say an integer.  This segfaults on non-array but
> > non-expandable datum.
>
>   We need the same test that exec_assign_value makes,
> !var->datatype->typbyval, before it's safe to apply DatumGetPointer.
> So line 549 needs to be more like
>
> -if (!var->isnull && var->datatype->typisarray)
> +if (!var->isnull && !var->datatype->typbyval)
>
> > Another comment that caught my eye was this one:
> >
> https://github.com/postgres/postgres/blob/master/src/pl/plpgsql/src/pl_exec.c#L8304
> > Not sure what the implication is there.
>
> Yeah, that's some more unfinished business.  I'm not sure if it
> matters to your use-case or not.
>
> BTW, we probably should move this thread to pgsql-hackers.


And here we are, thanks for your help on this Tom.  For some thread
switching context for others, I'm writing a postgres extension that wraps
the SuiteSparse:GraphBLAS API and provides new types for sparse and dense
matrices and vectors.  It's like a combination of numpy and scipy.sparse
but for Postgres with an emphasis on graph analytics as sparse adjacency
matrices using linear algebra.

I use the expandeddatum API to flatten and expand on disk compressed
representations of these objects into "live" in-memory objects managed by
SuiteSparse.  All GraphBLAS objects are opaque handles, and my expanded
objects are essentially a box around this handle.  I use memory context
callbacks to free the handles when the memory context of the box is freed.
This works very well and I've made a lot of progress on creating a very
clean algebraic API, here are the doctests for matrices, this is all
generated from live code!

https://onesparse.github.io/OneSparse/test_matrix_header/

Doing some benchmarking I noticed that when writing some simple graph
algorithms in plpgsql, my objects were being constantly flattened and
expanded.  Posting my question above to pgsql-general Tom gave me some tips
and suggested I move the thread here.

My non-expert summary: plpgsql only optimizes for expanded objects if they
are arrays.  Non array expanded objects get flattened/expanded on every
use.  For large matrices and vectors this is very expensive.  Ideally I'd
like to expand my object, use it throughout the function, return it to
another function that may use it, and only flatten it when it goes to disk
or it's completely unavoidable.  The comment in expandeddatum.h really kind
of sells this as one of the main features:

 * An expanded object is meant to survive across multiple operations, but
 * not to be enormously long-lived; for example it might be a local variable
 * in a PL/pgSQL procedure.  So its extra bulk compared to the on-disk
format
 * is a worthwhile trade-off.

In my case it's not a question of saving bulk, the on disk representation
of a matrix is not useful at compute time, it needs to be expanded (using
GraphBLAS's serialize/deserialize API) for it to be usable for matrix
operations like matmul.  In most cases algorithms using these objects
iterate in a loop, doing various algebraic operations almost always
involving a matmul until they converge on a stable solution or they exhaust
the input elements.  Here for example is a "minimum parent BFS" that takes
a graph and a starting node, and traverses the graph breadth first,
computing a vector of every node and its minimum parent id.

CREATE OR REPLACE FUNCTION bfs(graph matrix, start_node bigint)
> RETURNS vector LANGUAGE plpgsql AS
> $$
> DECLARE
> bfs_vector vector = vector('int32');
> next_vector vector = vector('int32');
> BEGIN
> bfs_vector = set_element(bfs_vector, start_node, 1);
> WHILE bfs_vector != next_vector LOOP
> next_vector = dup(bfs_vector);
> bfs_vector = vxm(bfs_vector, graph, 'any_secondi_int32',
>  w=>bfs_vector, accum=>'min_int32');
> END LOOP;
> RETURN bfs_vector;
> end;
> $$;
>

(If you're wondering "Why would anyone do it this way" it's because
SuiteSparse is optimized for parallel sparse matrix multiplication and has
a JIT compiler that can target multiple architectures, at the moment CPUs
and CUDA GPUs.  Reusing the same Linear Algebra already prevalent in graph
theory means not having to think about any low level implementation issues
and having code that is completely portable from CPU to GPU or other

Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-21 Thread Tender Wang
Alvaro Herrera  于2024年10月18日周五 22:52写道:

> On 2024-Sep-26, Jehan-Guillaume de Rorthais wrote:
>
> > REL_14_STABLE backport doesn't seem trivial, so I'll wait for some
> > feedback, review & decision before going further down in backpatching.
>
> Hi, thanks for these patches.  I have made some edits of my own.  In the
> end I decided that I quite liked the new structure of the
> addFkConstraint() function and friends.  I added a bunch of comments and
> changed names somewhat.  Also, I think the benefit of the refactoring is
> more obvious when all patches are squashed together, so I did that.
>
> For branch 14, I opted to make it delete the constraints on detach.
> This isn't ideal but unless somebody wants to spend a lot more time on
> this, it seems the best we can do.  Leaving broken constraints around
> seems worse.  The patch for 14 doesn't apply to 13 sadly.  I didn't have
> time to verify why, but it seems there's some rather larger conflict in
> one spot.
>
> I hope to be able to get these pushed over the weekend.  That would give
> us a couple of weeks before the November releases (but my availability
> in those two weeks might be spotty.)
>
> I spent some time going through your test additions and ended up with
> your functions in this form:
>
> -- displays constraints in schema fkpart12
> CREATE OR REPLACE FUNCTION
> fkpart12_constraints(OUT conrel regclass, OUT conname name,
>  OUT confrelid regclass, OUT conparent text)
> RETURNS SETOF record AS $$
>   WITH RECURSIVE arrh AS (
>SELECT oid, conrelid, conname, confrelid, NULL::name AS conparent
>  FROM pg_constraint
> WHERE connamespace = 'fkpart12'::regnamespace AND
>   contype = 'f' AND conparentid = 0
> UNION ALL
>SELECT c.oid, c.conrelid, c.conname, c.confrelid,
>   (pg_identify_object('pg_constraint'::regclass, arrh.oid,
> 0)).identity
>  FROM pg_constraint c
>   JOIN arrh ON c.conparentid = arrh.oid
>   ) SELECT conrelid::regclass, conname, confrelid::regclass, conparent
>   FROM arrh
>   ORDER BY conrelid::regclass::text, conname;
> $$
> LANGUAGE SQL;
>
> -- displays triggers in tables in schema fkpart12
> CREATE OR REPLACE FUNCTION
> fkpart12_triggers(OUT tablename regclass, OUT constr text,
>   OUT samefunc boolean, OUT parent text)
> RETURNS SETOF record AS $$
>   WITH RECURSIVE arrh AS (
>SELECT t.oid, t.tgrelid::regclass as tablename, tgname,
>   t.tgfoid::regproc as trigfn,
>   (pg_identify_object('pg_constraint'::regclass, c.oid,
> 0)).identity as constr,
>   NULL::bool as samefunc,
>   NULL::name AS parent
>  FROM pg_trigger t
> LEFT JOIN pg_constraint c ON c.oid = t.tgconstraint
> WHERE (SELECT relnamespace FROM pg_class WHERE oid = t.tgrelid) =
> 'fkpart12'::regnamespace
>   AND c.contype = 'f' AND t.tgparentid = 0
> UNION ALL
>SELECT t2.oid, t2.tgrelid::regclass as tablename, t2.tgname,
>   t2.tgfoid::regproc as trigfn,
>   (pg_identify_object('pg_constraint'::regclass, c2.oid,
> 0)).identity,
>   arrh.trigfn = t2.tgfoid as samefunc,
>   replace((pg_identify_object('pg_trigger'::regclass,
> t2.tgparentid, 0)).identity,
>   t2.tgparentid::text, 'TGOID')
>  FROM pg_trigger t2
> LEFT JOIN pg_constraint c2 ON c2.oid = t2.tgconstraint
>  JOIN arrh ON t2.tgparentid = arrh.oid
>   ) SELECT tablename, constr, samefunc, parent
>   FROM arrh
>   ORDER BY tablename::text, constr;
> $$
> LANGUAGE SQL;
>
> However, in the end I think this is a very good technique to verify that
> the fix works correctly, but it's excessive to include these results in
> the tests forevermore.  So I've left them out for now.  Maybe we should
> reconsider on the older branches?
>
Hi,

I looked at the patch on master.  I gave a small comment in [1].
Having reconsidered the code, I suspect that we don't need the below if
statement anymore.
/*
* If the referenced side is partitioned (which we know because our
* parent's constraint points to a different relation than ours) then
* we must, in addition to the above, create pg_constraint rows that
* point to each partition, each with its own action triggers.
*/
if (parentConForm->conrelid != conform->conrelid)
{
...
}

The above condition is always true according to my tests.
I haven't been able to come up with a counter-example.
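
For reference, a minimal sketch of the kind of schema this code deals with
(an FK whose referenced side is partitioned, followed by a detach on the
referencing side); the table names are illustrative, not the regression test:

CREATE TABLE ref_parted (a int PRIMARY KEY) PARTITION BY LIST (a);
CREATE TABLE ref_p1 PARTITION OF ref_parted FOR VALUES IN (1);
CREATE TABLE fk_parted (a int REFERENCES ref_parted) PARTITION BY LIST (a);
CREATE TABLE fk_p1 PARTITION OF fk_parted FOR VALUES IN (1);
ALTER TABLE fk_parted DETACH PARTITION fk_p1;
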
Any thoughts?

[1]
https://www.postgresql.org/message-id/CAHewXNkuU2V7GfgFkwd265RJ99%2BBfJueNEZhrHSk39o3thqxNA%40mail.gmail.com

-- 
Thanks,
Tender Wang


Re: type cache cleanup improvements

2024-10-21 Thread jian he
Thanks for the
INJECTION_POINT("typecache-before-rel-type-cache-insert");
now I have a better understanding of the whole change.


+/*
+ * Add possibly missing RelIdToTypeId entries related to TypeCacheHas
+ * entries, marked as in-progress by lookup_type_cache().  It may happen
+ * in case of an error or interruption during the lookup_type_cache() call.
+ */
+static void
+finalize_in_progress_typentries(void)
comment typo. "TypeCacheHas" should be "TypeCacheHash"




Re: type cache cleanup improvements

2024-10-21 Thread Alexander Korotkov
On Mon, Oct 21, 2024 at 10:51 AM jian he  wrote:
>
> thanks for the
> INJECTION_POINT("typecache-before-rel-type-cache-insert");
> Now I have better understanding of the whole changes.
>
>
> +/*
> + * Add possibly missing RelIdToTypeId entries related to TypeCacheHas
> + * entries, marked as in-progress by lookup_type_cache().  It may happen
> + * in case of an error or interruption during the lookup_type_cache() call.
> + */
> +static void
> +finalize_in_progress_typentries(void)
> comment typo. "TypeCacheHas" should be "TypeCacheHash"

Thank you.  This also has been spotted by Alexander Lakhin (off-list).
Fixed in the attached revision of the patchset.

--
Regards,
Alexander Korotkov
Supabase


v16-0001-Update-header-comment-for-lookup_type_cache.patch
Description: Binary data


v16-0002-Avoid-looping-over-all-type-cache-entries-in-Typ.patch
Description: Binary data


Re: type cache cleanup improvements

2024-10-21 Thread Alexander Korotkov
On Mon, Oct 21, 2024 at 8:40 AM Andrei Lepikhov  wrote:
>
> On 21/10/2024 06:32, Dagfinn Ilmari Mannsåker wrote:
> > Alexander Korotkov  writes:
> >
> >> +static Oid *in_progress_list;
> >> +static int  in_progress_list_len;
> >> +static int  in_progress_list_maxlen;
> >
> > Is there any particular reason not to use pg_list.h for this?
> Sure. The type cache lookup has to be as much optimal as possible.
> Using an array and relating sequential access to it, we avoid memory
> allocations and deallocations 99.9% of the time. Also, quick access to
> the single element (which we will have in real life almost all of the
> time) is much faster than employing list machinery.

+1,
List with zero elements has to be NIL.  That means continuous
allocations/deallocations.

--
Regards,
Alexander Korotkov
Supabase




RE: msvc directory missing in PostgreSQL 17.0

2024-10-21 Thread Mark Hill
Thanks Bill!   Do you have a sample meson command for building that you could 
share?

Thanks, Mark

From: Bill Smith 
Sent: Friday, October 18, 2024 4:11 PM
To: Mark Hill 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: msvc directory missing in PostgreSQL 17.0



On Oct 18, 2024, at 4:05 PM, Mark Hill wrote:

I just downloaded the Postgresql 17.0 and the extracted directory 
postgresql-17.0/src/tools does not contain
the “msvc” directory as it has since I’ve been building Postgres 12, 14, and 
16.   How am I supposed to build
PostgreSQL for Windows now?   I’m not sure what msvc_gendef.pl does.   How do 
we build Postgres for Windows
using Visual Studio?

You’ll want to use meson to build PG.  See the docs here:
17.7. Platform-Specific Notes (postgresql.org)


Thanks, Mark



Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-21 Thread Alvaro Herrera
On 2024-Oct-21, Tender Wang wrote:

> I suspect that we don't need the below if
> statement anymore.
> /*
> * If the referenced side is partitioned (which we know because our
> * parent's constraint points to a different relation than ours) then
> * we must, in addition to the above, create pg_constraint rows that
> * point to each partition, each with its own action triggers.
> */
> if (parentConForm->conrelid != conform->conrelid)
> {
> ...
> }
> 
> The above condition is always true according to my tests.
> I haven't been able to come up with a counter-example.

You're right, this is useless, we can remove the 'if' line.  I kept the
block though, to have a place for all those local variable declarations
(otherwise the code looks messier than it needs to).

I also noticed that addFkRecurseReferenced() is uselessly taking a List
**wqueue argument but doesn't use it, so I removed it (as fallout, other
routines don't need it either, especially DetachPartitionFinalize).  I
added some commentary on the creation of dependencies in
addFkConstraint().

I also include a few other cosmetic changes; just comment layout
changes.

This time I attach the patch for master only; the others have changed
identically.  14 is unchanged from before.  I figured that the conflict
from 14 to 13 was trivial to resolve -- it was just because of DETACH
CONCURRENTLY, so some code moves around, but that's all that happens.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
>From 205f6867eb93a47ba1db3977cfce9cb37389ddc5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Mon, 21 Oct 2024 23:14:12 +0200
Subject: [PATCH v5] Restructure foreign key handling code for ATTACH/DETACH

... to fix bugs when the referenced table is partitioned.

The catalog representation we chose for foreign keys connecting
partitioned tables (in commit f56f8f8da6af) is inconvenient, in the
sense that a standalone table has a different way to represent the
constraint when referencing a partitioned table, than when the same
table becomes a partition (and vice versa).  Because of this, we need to
create additional catalog rows on detach (pg_constraint and pg_trigger),
and remove them on attach.  We were doing some of those things, but not
all of them, leading to missing catalog rows in certain cases.

The worst problem seems to be that we are missing action triggers after
detaching a partition, which means that you could update/delete rows
from the referenced partitioned table that still had referencing rows on
that table, the server failing to throw the required errors.

!!!
Note that this means existing databases with FKs that reference
partitioned tables might have rows that break relational integrity, on
tables that were once partitions on the referencing side of the FK.

Another possible problem is that trying to reattach a table
that had been detached would fail indicating that internal triggers
cannot be found, which from the user's point of view is nonsensical.

In branches 15 and above, we fix this by creating a new helper function
addFkConstraint() which is in charge of creating a standalone
pg_constraint row, and repurposing addFkRecurseReferencing() and
addFkRecurseReferenced() so that they're only the recursive routine for
each side of the FK, and they call addFkConstraint() to create
pg_constraint at each partitioning level and add the necessary triggers.
These new routines can be used during partition creation, partition
attach and detach, and foreign key creation.  This reduces redundant
code and simplifies the flow.

In branches 14 and 13, we have a much simpler fix that consists on
simply removing the constraint on detach.  The reason is that those
branches are missing commit f4566345cf40, which reworked the way this
works in a way that we didn't consider back-patchable at the time.

We opted to leave branch 12 alone, because it's different from branch 13
enough that the fix doesn't apply; and because it is going in EOL mode
very soon, patching it now might be worse since there's no way to undo
the damage if it goes wrong.

Existing databases might need to be repaired.

In the future we might want to rethink the catalog representation to
avoid this problem, but for now the code seems to do what's required to
make the constraints operate correctly.

Co-authored-by: Jehan-Guillaume de Rorthais 
Co-authored-by: Tender Wang 
Co-authored-by: Alvaro Herrera 
Reported-by: Guillaume Lelarge 
Reported-by: Jehan-Guillaume de Rorthais 
Reported-by: Thomas Baehler (SBB CFF FFS) 
Discussion: https://postgr.es/m/20230420144344.40744130@karst
Discussion: https://postgr.es/m/20230705233028.2f554f73@karst
Discussion: https://postgr.es/m/gvap278mb02787e7134fd691861635a8bc9...@gvap278mb0278.chep278.prod.outlook.com
Discussion: https://postgr.es/m/18541-628a61bc267cd...@postgresql.org
---
 src/backend/commands/tablecmds.c  | 671 ++
 src/test/regress/expected/foreign_key.out |  96 
 src/test/regress/s

Re: Refactor GetLockStatusData() by skipping unused backends and groups

2024-10-21 Thread Bertrand Drouvot
Hi,

On Mon, Oct 21, 2024 at 09:19:49AM +0900, Fujii Masao wrote:
> Hi,
> 
> While reading the fast-path lock code, I noticed that GetLockStatusData()
> checks all slots for every backend to gather fast-path lock data. However,
> backends with PID=0 don't hold fast-path locks, right?

I think so too, since those are not "regular" backends.

> If so, we can
> improve efficiency by having GetLockStatusData() skip those backends early.

Agree.

> Additionally, when GetLockStatusData() checks a backend, it currently
> goes through all the slots accross its groups. Each group has 16 slots,
> so if a backend has 4 groups (this can change depending on 
> max_locks_per_transaction),
> that means checking 64 slots. Instead, we could refactor the function
> to skip groups without registered fast-path locks, improving performance.
> Since each set of 16 slots is packed into a uint32 variable 
> (PGPROC->fpLockBits[i]),
> it’s easy to check if a group has any fast-path locks.
> 
> I've attached a patch that implements these changes. This refactoring is
> especially useful when max_connections and max_locks_per_transaction are
> set high, as it reduces unnecessary checks across numerous slots.

I think that your refactoring proposal makes sense.
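
(As a side note for readers following along: this function is what feeds the
pg_locks view, so the fast-path entries being iterated here are the ones
reported with fastpath = true, e.g.:)

SELECT pid, locktype, relation::regclass, mode
FROM pg_locks
WHERE fastpath;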

A few random comments on it:

1 ===

+   /* Skip backends with pid=0, as they don't hold fast-path locks 
*/
+   if (proc->pid == 0)
+   continue;

What about adding a few words in the comment that it represents prepared
transactions? Or maybe add a new macro (say IS_PREPARED_TRANSACTION(proc)) and
use it in the few places where we check for "PGPROC"->pid == 0 or "PGPROC"->pid 
!= 0?

2 ===

-   for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
+   for (uint32 g = 0; g < FastPathLockGroupsPerBackend; g++)
{

As FP_LOCK_SLOTS_PER_BACKEND = (FP_LOCK_SLOTS_PER_GROUP * 
FastPathLockGroupsPerBackend)
then the proposed approach starts with a "smaller" loop which makes sense.

and then the patch:

+  for (int j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)

So that we are covering "FP_LOCK_SLOTS_PER_BACKEND" but discarded groups that
do not contain registered fast-path locks:

+   /* Skip unallocated groups */
+   if (proc->fpLockBits[g] == 0)

That does make sense to me.

One remark about the comment, what about?

s/Skip unallocated groups/Skip groups without registered fast-path locks./?

or at least add a "." at the end to be consistent with:

"/* Skip unallocated slots. */"

3 ===

One thing that worries me a bit is that we "lose" the FP_LOCK_SLOTS_PER_BACKEND
usage, so if it changes (for whatever reason) we need to be careful that the
change is reflected here too.

So, what about adding an Assert to check that we overall iterated over
FP_LOCK_SLOTS_PER_BACKEND slots?

4 ===

Then the patch moves existing code around and just adds a call to
FAST_PATH_SLOT() to get the fast-path lock slot index based on the group and
slot indexes we are iterating over.

That does make sense to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-10-21 Thread Kirill Reshke
Hi!

On Thu, 12 Sept 2024 at 15:13, jian he  wrote:
>
> On Mon, Sep 9, 2024 at 10:34 PM Jim Jones  wrote:
> >
> >
> > Hi there
> >
> > On 26.08.24 02:00, jian he wrote:
> > > hi all.
> > > patch updated.
> > > simplified the code a lot.
> > >
> > > idea is same:
> > > COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
> > >
> > > If the STDIN number of columns is the same as the target table, then
> > > InputFunctionCallSafe
> > > call failure will make that column values be null.
> > >
> > >
> > > If the STDIN number of columns is not the same as the target table, then 
> > > error
> > > ERROR:  missing data for column \"%s\"
> > > ERROR:  extra data after last expected column
> > > which is status quo.
> >
> > I wanted to give it another try, but the patch does not apply ...
> >
>
> here we are.
> please check the attached file.

Hi!
v4 no longer applies. It now conflicts with
e7834a1a251d4a28245377f383ff20a657ba8262.
Also, there were review comments.

So, I decided to rebase.

Review comments from [1] are applied partially. I didn't do the "and continue
with" -> "and will continue with" substitution as suggested, because the
first form is used for the `ignore` doc one line above. So, I just don't
know how to change this correctly. We definitely don't want two different
ways of saying the same thing in two consecutive lines.

I made some small changes:
1) Added a
`-- tests for set_to_null`
comment in the test script, akin to 4ac2a9beceb10d44806d2cf157d5a931bdade39e.

2) Rephrased
Allow "stop", or "ignore", "set_to_null" values
to
Allow "stop", "ignore", "set_to_null" values

PFA.
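
For completeness, intended usage with the proposed option looks like this
(table and input are illustrative; the NOTICE and reject_limit behaviour are
still under discussion):

CREATE TABLE t_on_error_null (a int, b text);
COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
-- an input line whose first field is not a valid int, e.g. "oops<tab>two",
-- is stored as (NULL, 'two') with the patch applied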

[1] 
https://www.postgresql.org/message-id/b26e9c6c-75bf-45ea-8aea-346dda3bd445%40uni-muenster.de
-- 
Best regards,
Kirill Reshke


v5-0001-on_error-set_to_null.patch
Description: Binary data


Re: Add pg_ownerships and pg_privileges system views

2024-10-21 Thread Alvaro Herrera
On 2024-Oct-20, Alvaro Herrera wrote:

> SELECT
> pg_shdepend.classid,
> pg_shdepend.objid,
> pg_shdepend.objsubid,
> identify.*,
> aclexplode.*
> FROM pg_catalog.pg_shdepend
> JOIN pg_catalog.pg_database ON pg_database.datname = 
> current_database() AND pg_database.oid = pg_shdepend.dbid
> JOIN pg_catalog.pg_authid ON pg_authid.oid = pg_shdepend.refobjid AND 
> pg_shdepend.refclassid = 'pg_authid'::regclass,
> LATERAL 
> pg_catalog.pg_identify_object(pg_shdepend.classid,pg_shdepend.objid,pg_shdepend.objsubid)
>  AS identify,
> LATERAL 
> pg_catalog.aclexplode(pg_catalog.pg_get_acl(pg_shdepend.classid,pg_shdepend.objid,pg_shdepend.objsubid))
>  AS aclexplode
> WHERE pg_shdepend.deptype = 'a' AND pg_shdepend.dbid = (( SELECT 
> pg_database_1.oid
>FROM pg_database pg_database_1
>   WHERE pg_database_1.datname = current_database()))
> ) AS a ;

... actually, the "AND pg_shdepend.dbid = ( SELECT pg_database_1.oid
...)" part of this is useless, because you already had that in the ON
condition of the original join to pg_database.  So, apologies for the
noise there.  TBH I don't see why you put that in the JOIN ON condition
instead of WHERE, but anyway you don't need to add a new condition for
it.  I guess I'd do it like this for clarity:

 FROM pg_catalog.pg_shdepend
 JOIN pg_catalog.pg_database ON pg_database.oid = pg_shdepend.dbid
 JOIN pg_catalog.pg_authid ON pg_authid.oid = pg_shdepend.refobjid,
 LATERAL pg_catalog.pg_identify_object(pg_shdepend.classid, 
pg_shdepend.objid, pg_shdepend.objsubid) AS identify,
 LATERAL 
pg_catalog.aclexplode(pg_catalog.pg_get_acl(pg_shdepend.classid, 
pg_shdepend.objid, pg_shdepend.objsubid)) AS aclexplode
 WHERE pg_shdepend.deptype = 'a' AND
 pg_shdepend.refclassid = 'pg_catalog.pg_authid'::pg_catalog.regclass 
AND
 pg_database.datname = pg_catalog.current_database()

... but since these are inner joins, this might be a matter of style.
(I did add a couple of schema-qualifications there.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Debido a que la velocidad de la luz es mucho mayor que la del sonido,
 algunas personas nos parecen brillantes un minuto antes
 de escuchar las pelotudeces que dicen." (Roberto Fontanarrosa)




Re: Make default subscription streaming option as Parallel

2024-10-21 Thread Amit Kapila
On Mon, Oct 7, 2024 at 11:05 AM vignesh C  wrote:
>
> The tests demonstrate a significant performance improvement when using
> the parallel streaming option, insert shows 40-48 %improvement, delete
> shows 34-39 %improvement, update shows 26-30 %improvement. In the case
> of rollback the improvement is between 12-44%, the improvement
> slightly reduces with larger amounts of data being rolled back in this
> case. If there's a significant amount of data to roll back, the
> performance of streaming in parallel may be comparable to or slightly
> lower in some instances. However, this is acceptable since commit
> operations are generally more frequent than rollback operations.
>
> One key point to consider is that the lock on transaction objects will
> be held for a longer duration when using streaming in parallel. This
> occurs because the parallel apply worker initiates the transaction as
> soon as streaming begins, maintaining the lock until the transaction
> is fully completed. As a result, for long-running transactions, this
> extended lock can hinder concurrent access that requires a lock.
>

The longer-running transactions will anyway have a risk of deadlocks
or longer waits if there are concurrent operations on the subscribers.
However, with parallel apply, there is a risk of deadlock among the
leader and parallel workers when the schema in publisher and
subscriber is different. Say the subscriber has a unique constraint
that the publisher doesn't have. See the comments in this regard atop
applyparallelworker.c in the "Locking Considerations" section. Having
said that, the apply workers will detect deadlock in such cases and
will retry to apply the errored-out transaction. So, there is a
self-healing in-built mechanism and in such cases, we anyway have a
risk of UNIQUE_KEY conflict ERRORs which in most cases would require
manual intervention.

> Since there is a significant percentage improvement, we should make
> the default subscription streaming option parallel.
>

This makes sense to me. I think it would be better to add a Note or
Warning in the docs about the risk of deadlock when the schema of the
publisher and subscriber is not the same, even though such cases should
be rare.
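
(For context, today parallel streaming has to be requested explicitly, e.g. as
below; connection string and names are illustrative. The proposal just makes
it the default.)

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (streaming = parallel);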

-- 
With Regards,
Amit Kapila.




Re: Pgoutput not capturing the generated columns

2024-10-21 Thread Amit Kapila
On Fri, Oct 18, 2024 at 5:42 PM Shubham Khanna
 wrote:
> > >
> > > I have fixed all the given comments. The attached patches contain the
> > > required changes.

Review comments:
===
1.
>
B. when generated columns are not published

* Publisher not-generated column => subscriber not-generated column:
  This is just normal logical replication (not changed by this patch).

* Publisher not-generated column => subscriber generated column:
  This will give ERROR.
>

Is the second behavior introduced by the patch? If so, why?

2.
@@ -1213,7 +1207,10 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
{
...
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
+ continue;
+
+ if (att->attgenerated && !pub->pubgencols)
  continue;

It is better to combine the above conditions and write a comment on it.

3.
@@ -368,18 +379,50 @@ pub_collist_contains_invalid_column(Oid pubid,
Relation relation, List *ancestor
  Anum_pg_publication_rel_prattrs,
  &isnull);

- if (!isnull)
+ if (!isnull || !pubgencols)
  {
  int x;
  Bitmapset  *idattrs;
  Bitmapset  *columns = NULL;

- /* With REPLICA IDENTITY FULL, no column list is allowed. */
- if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
- result = true;
+ if (!isnull)
+ {
+ /* With REPLICA IDENTITY FULL, no column list is allowed. */
+ if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
+ result = true;
+
+ /* Transform the column list datum to a bitmapset. */
+ columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+ }
+ else
+ {
+ TupleDesc desc = RelationGetDescr(relation);
+ int nliveatts = 0;
+
+ for (int i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ /* Skip if the attribute is dropped or generated */
+ if (att->attisdropped)
+ continue;
+
+ nliveatts++;
+
+ if (att->attgenerated)
+ continue;
+
+ columns = bms_add_member(columns, i + 1);
+ }

- /* Transform the column list datum to a bitmapset. */
- columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+ /* Return if all columns of the table will be replicated */
+ if (bms_num_members(columns) == nliveatts)
+ {
+ bms_free(columns);
+ ReleaseSysCache(tuple);
+ return false;
+ }

Won't this lead to traversing the entire column list for default cases,
where publish_generated_columns would be false, which could hurt
update/delete performance? Irrespective of that, it is better to
write some comments to explain this logic.

4. Some minimum parts of 0002 like the changes in
/doc/src/sgml/ref/create_publication.sgml should be part of 0001
patch. We can always add examples or more details in the docs as a
later patch.

-- 
With Regards,
Amit Kapila.




Re: type cache cleanup improvements

2024-10-21 Thread Alexander Korotkov
On Mon, Oct 21, 2024 at 1:16 PM Dagfinn Ilmari Mannsåker
 wrote:
> Alexander Korotkov  writes:
>
> > On Mon, Oct 21, 2024 at 8:40 AM Andrei Lepikhov  wrote:
> >>
> >> On 21/10/2024 06:32, Dagfinn Ilmari Mannsåker wrote:
> >> > Alexander Korotkov  writes:
> >> >
> >> >> +static Oid *in_progress_list;
> >> >> +static int  in_progress_list_len;
> >> >> +static int  in_progress_list_maxlen;
> >> >
> >> > Is there any particular reason not to use pg_list.h for this?
> >> Sure. The type cache lookup has to be as much optimal as possible.
> >> Using an array and relating sequential access to it, we avoid memory
> >> allocations and deallocations 99.9% of the time. Also, quick access to
> >> the single element (which we will have in real life almost all of the
> >> time) is much faster than employing list machinery.
>
> Lists are actually dynamically resized arrays these days (see commit
> 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164), not linked lists, so
> accessing arbitrary elements is O(1), not O(n). Just like this patch,
> the size is doubled (starting at 16) whenever array is full.
>
> > +1,
> > List with zero elements has to be NIL.  That means continuous
> > allocations/deallocations.
>
> This however is a valid point (unless we keep a dummy zeroth element to
> avoid it, which is even uglier than open-coding the array extension
> logic), so objection withdrawn.

OK, thank you!

The attached revision fixes EXTRA_INSTALL in
src/test/modules/typcache/Makefile.  Spotted off-list by Arthur
Zakirov.

--
Regards,
Alexander Korotkov
Supabase


v17-0001-Update-header-comment-for-lookup_type_cache.patch
Description: Binary data


v17-0002-Avoid-looping-over-all-type-cache-entries-in-Typ.patch
Description: Binary data


Re: Add pg_ownerships and pg_privileges system views

2024-10-21 Thread Joel Jacobson
On Mon, Oct 21, 2024, at 11:42, Alvaro Herrera wrote:
> ... actually, the "AND pg_shdepend.dbid = ( SELECT pg_database_1.oid
> ...)" part of this is useless, because you already had that in the ON
> condition of the original join to pg_database.  So, apologies for the
> noise there.  TBH I don't see why you put that in the JOIN ON condition
> instead of WHERE, but anyway you don't need to add a new condition for
> it.  I guess I'd do it like this for clarity:
>
>  FROM pg_catalog.pg_shdepend
>  JOIN pg_catalog.pg_database ON pg_database.oid = 
> pg_shdepend.dbid
>  JOIN pg_catalog.pg_authid ON pg_authid.oid = 
> pg_shdepend.refobjid
>  LATERAL pg_catalog.pg_identify_object(pg_shdepend.classid, 
> pg_shdepend.objid, pg_shdepend.objsubid) AS identify,
>  LATERAL 
> pg_catalog.aclexplode(pg_catalog.pg_get_acl(pg_shdepend.classid, 
> pg_shdepend.objid, pg_shdepend.objsubid)) AS aclexplode
>  WHERE pg_shdepend.deptype = 'a' AND
>pg_shdepend.refclassid = 'pg_catalog.pg_authid'::pg_catalog.regclass 
> AND
>pg_database.datname = pg_catalog.current_database()
>
> ... but since these are inner joins, this might be a matter of style.
> (I did add a couple of schema-qualifications there.)

Ahh, right, that's nicer, thanks for fixing.

New patch attached.

I also fixed pg_ownerships in the same way, moving the 
`pg_catalog.pg_database.datname = pg_catalog.current_database()` to the WHERE 
clause instead.

/Joel

v5-0001-Add-pg_ownerships-and-pg_privileges-system-views.patch
Description: Binary data


Re: Misleading error "permission denied for table"

2024-10-21 Thread Ashutosh Bapat
On Wed, Oct 16, 2024 at 10:11 PM Tom Lane  wrote:
>
> Nathan Bossart  writes:
> > On Wed, Oct 16, 2024 at 07:36:29PM +0530, Ashutosh Bapat wrote:
> >> Shouldn't we report "permission defined for column atest5.three?
>
> > We do have "permission denied for column" messages in aclchk.c (e.g.,
> > aclcheck_error_col()), but I don't see them actually used anywhere (or at
> > least they don't show up in any expected regression test output).

Attached 0001 adds a test that triggers the column permission error
message to privileges.sql. That test file has tests for GRANTing a role
to a role, ALTER USER etc., but no test for GRANT on an object, and I
didn't find any other test file which does that either. So I think
privileges.sql is the right place for the test. I think this patch can
be committed independently.
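
(To make the scenario concrete -- object names are illustrative, not the exact
test added by 0001:)

CREATE TABLE atest (a int, b int);
CREATE ROLE regress_colpriv_user;
GRANT SELECT (a) ON atest TO regress_colpriv_user;
SET ROLE regress_colpriv_user;
SELECT a FROM atest;  -- allowed via the column privilege
SELECT b FROM atest;  -- fails; currently reported as "permission denied for table atest"
RESET ROLE;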

> I'm
> > inclined to agree that a more specific error would be nice, but I worry
> > there's some hidden complexity that's prevented it thus far...
>
> See ExecCheckOneRelPerms, which seems to regard this as a TODO.
> I think the hard part is how to report the cases that involve
> pg_attribute_aclcheck_all(..., ACLMASK_ANY), which means
>
>  * If 'how' is ACLMASK_ANY, then returns ACLCHECK_OK if user has any of the
>  * privileges identified by 'mode' on any non-dropped column in the relation;
>
> so that you can't really finger a specific column as being in
> violation.  Maybe we could leave those cases as just mentioning the
> rel; not sure if that leads to failing to move the goalposts much.

I think it might just suffice to say "permission denied for relation
\"%s\" and all of its columns" in the error, with a hint "require
permission on the relation or any of its columns", as done in patch
0002. What do you think?

> Otherwise, we could extend ExecCheckOneRelPerms and
> pg_attribute_aclcheck_all to pass back a column number, and
> then modify the error reporting in ExecCheckPermissions.
>

The attached patch does this. To select or modify a column, a user
needs to have the relevant permission on the table and/or the column, so
the error message should mention both. That's what the patch does: it
reports the first column without the required privileges. With that,
all the errors resulting from SELECT/INSERT/UPDATE report both the
name of the table and the first column without permission, e.g. the
following change in the attached patch:
INSERT INTO base_tbl VALUES (3, 'Row 3', 3.0); -- not allowed
- ERROR:  permission denied for table base_tbl
+ ERROR:  permission denied for relation "base_tbl" as well as its column "a"

That's technically correct but may not be as useful, since the user may
not intend to add column privileges at all, or, if they do add a column
privilege on "a", the statement will still error out mentioning column
"b" next. Let's call it approach 1.

To fix that, we could take the advice from the TODO in ExecCheckOneRelPerms()
and see if there are other columns with column privileges. The existence
of such columns indicates that the user intends to use column
privileges and is missing privileges on one or more of them. The
result would be much better; however,
1. we can't stop checking permissions at the first failure, as we do today, and
2. the columns for which column privileges are set may not be
referenced in the query, and thus we have to look at all the column
privileges irrespective of whether they are referenced or not. That
looks like a lot of unnecessary work.
Let's call this approach 2.

Approach 3 is a heuristic. We stop at the first failed column
permission check but while doing so note if we found a column with
required permissions. If such a column exists, we mention both the
table and the first column for which permission check failed in the
error message. Otherwise we just mention the table in the error
message. In case the user is missing column permissions, it's quite
likely that they are missing it on a later column instead of very
first one. So this heuristic will mostly work. Downside is the error
message depends upon the columns mentioned in the query - thus may
change from query to query and has potential to cause confusion.

Approach 4 is to leave the error message as is but provide a hint
"require permissions on the relation or all the columns referenced in
the query". That's simplest and enough to set the user on the right
path, though not strictly accurate.

I am leaning towards 3 or 4 but open to suggestions (changes to above
approaches or new options).

--
Best Wishes,
Ashutosh Bapat
From b0d4d6a5bfa8b139612db946db786cd3a63de9bc Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 21 Oct 2024 16:47:59 +0530
Subject: [PATCH 2/2] Improve table permission error

In order to select or modify a column, a user requires either appropriate
permissions on the table or appropriate permissions on the column or both. When
neither permissions exist, the user gets an error message "permission denied on
the table ..." which may lead the user into thinking that they need table level
permissions. Tha

Re: commitfest.postgresql.org Specify thread msgid does not work for pgsql-bugs(at)lists(dot)postgresql(dot)org

2024-10-21 Thread Alvaro Herrera
On 2024-Oct-21, jian he wrote:

> hi.
> https://commitfest.postgresql.org/50/new/#
> won't work for messages from pgsql-bugs
> .
> 
> recently, I try to register an entry for
> https://www.postgresql.org/message-id/tencent_BAB00BBF45A08D941DF8290D9533941BB807%40qq.com
> but cannot.
> 
> see picture attached.

You need to use the Message-Id only, without the URL part of it, so it
has to start with "tencent".  Also, the %40 needs to be an @ sign.  The
@ is converted to %40 because of URL-encoding, but that's not the real
message-id.


I guess it would be helpful if the commitfest app can read a URL to a
mailing list post and reverse-engineer the message-id from it ... maybe
you'd have to propose a patch to this:
https://git.postgresql.org/gitweb/?p=pgcommitfest2.git;a=blob;f=pgcommitfest/commitfest/views.py;h=2c21cd3d623d86a6fc8a0f639ab127ece520eacf;hb=refs/heads/master#l387

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Make default subscription streaming option as Parallel

2024-10-21 Thread Peter Smith
On Mon, Oct 21, 2024 at 5:09 PM Amit Kapila  wrote:
>
> On Mon, Oct 7, 2024 at 4:03 PM vignesh C  wrote:
> >
> > On Mon, 7 Oct 2024 at 12:26, Hayato Kuroda (Fujitsu)
> >  wrote:
> >
> > > One comment for your patch;
> > > Shouldn't we add a streaming=off case for pg_dump test? You added lines 
> > > but no one
> > > passes it.
> > >
> >
> > Update existing pg_dump tests to cover the streaming options, the
> > attached patch has the changes for the same.
> >
>
> @@ -5235,6 +5235,8 @@ dumpSubscription(Archive *fout, const
> SubscriptionInfo *subinfo)
>   appendPQExpBufferStr(query, ", streaming = on");
>   else if (strcmp(subinfo->substream, "p") == 0)
>   appendPQExpBufferStr(query, ", streaming = parallel");
> + else
> + appendPQExpBufferStr(query, ", streaming = off");
>
> For newer versions (>=18), we shouldn't need to specify "streaming =
> parallel" while creating a subscription as that will be the default.
> However, with the above code pg_dump statements will still have that.
> There is nothing wrong with that but ideally, it won't be required.
> Now, OTOH, doing that would require some version-handling code, which
> is nothing new for pg_dump but not sure it makes sense for this
> particular case. Another related point is that normally we don't
> recommend people to use dump generated with pg_dump to use with lower
> server versions than pg_dump itself, but the current proposed patch
> will allow that. However, if we change it as I am saying that won't be
> possible.

Leaving the patch as-is seems better to me.

PROS
- The simple code explicitly setting all parameter values is easy to
understand as-is.
- AFAICT it works for all that the pg_dump docs [1] guarantees.
- No version handling code will be needed...
- Therefore, no risk of accidentally introducing any versioning bugs.
- Yields a more portable dump file (even though not guaranteed by pg_dump docs)

CONS
- A few more chars in the dump file?
- What else?

==
[1] https://www.postgresql.org/docs/devel/app-pgdump.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Question about VACUUM behavior with sub-transactions in stored procedures

2024-10-21 Thread Кириллов Вячеслав
Hi hackers,

I have a question regarding the behavior of the auto VACUUM in PostgreSQL in 
the context of using stored procedures with sub-transactions.


As I understand it, the parameters that control the initiation of VACUUM are 
set in the configuration file, such as autovacuum_vacuum_threshold, 
autovacuum_vacuum_scale_factor, and are stored in the system tables 
pg_stat_user_tables or pg_class (please correct me if I'm wrong). These system 
tables are updated after each completed transaction, and VACUUM analyzes them 
to determine whether to clean up dead rows, depending on the configured 
thresholds.
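
(For example, the per-table counters involved can be inspected with:)

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'bloat';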


Here is the scenario: we have several stored procedures that modify or update 
table data. These procedures use sub-transactions, which are committed via 
COMMIT. However, from my understanding, the system table, which VACUUM checks, 
is not updated until the main (outermost) transaction completes. This means 
that during the execution of the procedures, a significant number of dead rows 
may accumulate, and only after the final COMMIT of the main transaction do 
these dead rows become visible for VACUUM.


As a result, there could be a sharp spike in CPU usage when VACUUM runs after 
the completion of the procedures, as it begins to clean up a large number of 
accumulated dead rows.

I would like to know whether this behavior is expected and correct, or whether
there could be a potential issue or bug in this scenario.


To illustrate the issue, here's an example:

CREATE TABLE bloat
(
id integer generated always as identity,
d timestamptz
);

CREATE OR REPLACE PROCEDURE update_multiple_bloat()
LANGUAGE plpgsql
AS $$
DECLARE
row_to_update RECORD;
BEGIN
FOR row_to_update IN SELECT * FROM bloat
LOOP
UPDATE bloat SET d = CURRENT_TIMESTAMP WHERE d = row_to_update.d;
COMMIT;
END LOOP;
END;
$$;

CREATE OR REPLACE PROCEDURE insert_multiple_into_bloat(num_records integer)
LANGUAGE plpgsql
AS $$
DECLARE
i integer := 1;
BEGIN
LOOP
EXIT WHEN i > num_records;
INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);
i := i + 1;
COMMIT;
END LOOP;
END;
$$;

DO $$
DECLARE
row_data RECORD;
counter INT := 0;
BEGIN

BEGIN
INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);
COMMIT;
END;

BEGIN
call insert_multiple_into_bloat(100);
END;

BEGIN
call update_multiple_bloat();
END;

END $$;

Thank you in advance for your help!

With Regards,
Vyacheslav Kirillov!



[PATCH] Add array_reverse() function

2024-10-21 Thread Aleksander Alekseev
Hi,

Recently I wanted to call array_reverse() and discovered that we still
don't have it. I'm not the first one who encountered this limitation.
array_reverse() was requested at least since 2009 [1] and the
workaround on PostgreSQL Wiki is dated 2010 [2].

The proposed patch adds this function. Only the first dimension of the
array is reversed, for consistency with the existing functions such as
array_shuffle() [3].

Examples:

=# SELECT array_reverse(ARRAY[1,2,3,NULL]);
 array_reverse
---
 {NULL,3,2,1}

=# SELECT array_reverse(ARRAY[[1,2],[3,4],[5,6]]);
array_reverse
-
 {{5,6},{3,4},{1,2}}

Thoughts?

[1]: http://postgr.es/m/4AEE80FC.7010608%40postnewspapers.com.au
[2]: https://wiki.postgresql.org/wiki/Array_reverse
[3]: https://www.postgresql.org/docs/current/functions-array.html

-- 
Best regards,
Aleksander Alekseev


v1-0001-Add-array_reverse-function.patch
Description: Binary data


Re: type cache cleanup improvements

2024-10-21 Thread Dagfinn Ilmari Mannsåker
Alexander Korotkov  writes:

> On Mon, Oct 21, 2024 at 8:40 AM Andrei Lepikhov  wrote:
>>
>> On 21/10/2024 06:32, Dagfinn Ilmari Mannsåker wrote:
>> > Alexander Korotkov  writes:
>> >
>> >> +static Oid *in_progress_list;
>> >> +static int  in_progress_list_len;
>> >> +static int  in_progress_list_maxlen;
>> >
>> > Is there any particular reason not to use pg_list.h for this?
>> Sure. The type cache lookup has to be as much optimal as possible.
>> Using an array and relating sequential access to it, we avoid memory
>> allocations and deallocations 99.9% of the time. Also, quick access to
>> the single element (which we will have in real life almost all of the
>> time) is much faster than employing list machinery.

Lists are actually dynamically resized arrays these days (see commit
1cff1b95ab6ddae32faa3efe0d95a820dbfdc164), not linked lists, so
accessing arbitrary elements is O(1), not O(n). Just like this patch,
the size is doubled (starting at 16) whenever array is full.

> +1,
> List with zero elements has to be NIL.  That means continuous
> allocations/deallocations.

This however is a valid point (unless we keep a dummy zeroth element to
avoid it, which is even uglier than open-coding the array extension
logic), so objection withdrawn.

- ilmari




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-10-21 Thread Fujii Masao




On 2024/10/21 18:30, Kirill Reshke wrote:

v4 no longer applies. It now conflicts with
e7834a1a251d4a28245377f383ff20a657ba8262.
Also, there were review comments.

So, I decided to rebase.


Thanks for the patch! Here are my review comments:

I noticed that on_error=set_to_null does not trigger NOTICE messages for rows
and columns with errors. It's an "unexpected" thing for columns to be silently
replaced with NULL due to on_error=set_to_null. So, similar to on_error=ignore,
there should be NOTICE messages indicating which input records had columns
set to NULL because of data type incompatibility. Without these messages,
users might not realize that some columns were set to NULL.


How should on_error=set_to_null behave when reject_limit is set? It seems
intuitive to trigger an error if the number of rows with columns' data type
issues exceeds reject_limit, similar to on_error=ignore. This is open to 
discussion.


psql's tab-completion should be updated to include SET_TO_NULL.


   An error_action value of
   stop means fail the command, while
   ignore means discard the input row and continue with 
the next one.
+  set_to_null means the input value will be set to 
null and continue with the next one.

How about merging these two descriptions into one and updating it to the
following?

---
An error_action value of stop means fail the command, ignore means discard the 
input
row and continue with the next one, and set_to_null means replace columns with 
invalid
input values with NULL and move to the next row.
---


  The ignore option is applicable only for COPY 
FROM

This should be "... ignore and set_to_null options are ..."?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Question about VACUUM behavior with sub-transactions in stored procedures

2024-10-21 Thread David G. Johnston
On Monday, October 21, 2024, Кириллов Вячеслав  wrote:

> I have a question regarding the behavior of the auto VACUUM in PostgreSQL
> in the context of using stored procedures with sub-transactions.
>
>
This is a general usage inquiry not suited to discussion on -hackers.  We
have a -general mailing list to discuss how to use the product.  This list
is for discussing patches.

> Here is the scenario: we have several stored procedures that modify or
> update table data. These procedures use sub-transactions, which are
> committed via COMMIT.
>
>
This isn’t how sub-transactions work.  They are created mainly by savepoints
and are not independently committed (by the user in SQL).  What you are using
are full transactions.
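
A quick sketch of the distinction (in plain SQL; in PL/pgSQL it is a
BEGIN ... EXCEPTION block that actually creates a subtransaction):

-- an actual subtransaction: only the work since the savepoint is undone
BEGIN;
SAVEPOINT s1;
UPDATE bloat SET d = now() WHERE id = 1;
ROLLBACK TO SAVEPOINT s1;
COMMIT;

-- whereas COMMIT inside a procedure, as in the examples above, ends the
-- whole top-level transaction and starts a new one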

https://www.postgresql.org/docs/17/plpgsql-transactions.html

David J.


Enhancing Memory Context Statistics Reporting

2024-10-21 Thread Rahila Syed
Hi,

PostgreSQL provides following capabilities for reporting memory contexts
statistics.
1. pg_get_backend_memory_contexts(); [1]
2. pg_log_backend_memory_contexts(pid); [2]

[1]  provides a view of memory context statistics for a local backend,
while [2] prints the memory context statistics of any backend or auxiliary
process to the PostgreSQL logs. Although [1] offers detailed statistics,
it is limited to the local backend, restricting its use to PostgreSQL
client backends only.
On the other hand, [2] provides the statistics for all backends but logs
them in a file,
which may not be convenient for quick access.
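
(Concretely, the two existing interfaces look like this; the PID is illustrative:)

-- [1], via its pg_backend_memory_contexts view: detailed, but current backend only
SELECT name, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 5;

-- [2]: works for any backend, but the output goes to the server log
SELECT pg_log_backend_memory_contexts(116292);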

I propose enhancing memory context statistics reporting by combining these
capabilities and offering a view of memory statistics for all PostgreSQL
backends
and auxiliary processes.

Attached is a patch that implements this functionality. It introduces a SQL
function
that takes the PID of a backend as an argument, returning a set of records,
each containing statistics for a single memory context. The underlying C
function
sends a signal to the backend and waits for it to publish its memory
context statistics
 before returning them to the user. The publishing backend copies these
statistics
during the next CHECK_FOR_INTERRUPTS call.

This approach facilitates on-demand publication of memory statistics
for a specific backend, rather than collecting them at regular intervals.
Since past memory context statistics may no longer be relevant,
there is little value in retaining historical data. Any collected
statistics
can be discarded once read by the client backend.

A fixed-size shared memory block, currently accommodating 30 records,
 is used to store the statistics. This number was chosen arbitrarily,
 as it covers all parent contexts at level 1 (i.e., direct children of the
top memory context)
based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.

Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend
 first reads from the shared memory, and if necessary, retrieves the
remaining data from the file,
combining everything into a unified view. The files are cleaned up
automatically
if a backend crashes or during server restarts.

The statistics are reported in a breadth-first search order of the memory
context tree, with parent contexts reported before their children. This
provides a cumulative summary before diving into the details of each child
context's consumption.

The rationale behind the shared memory chunk is to ensure that the majority
of contexts, which are the direct children of TopMemoryContext, fit into
memory. This allows a client to request a summary of memory statistics, which
can be served from memory without the overhead of file access, unless
necessary.

A publishing backend signals waiting client backends using a condition
variable when it has finished writing its statistics to memory. The client
backend checks whether the statistics belong to the requested backend. If
not, it continues waiting on the condition variable, timing out after 2
minutes. This timeout is an arbitrary choice, and further work is required to
determine a more practical value.

All backends use the same memory space to publish their statistics. Before
publishing, a backend checks whether the previous statistics have been
successfully read by a client, using a shared flag, "in_use". This flag is
set by the publishing backend and cleared by the client backend once the data
is read. If a backend cannot publish due to shared memory being occupied, it
exits the interrupt processing code, and the client backend times out with a
warning.
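
For illustration, here is a minimal sketch of the client-side wait described
above. All names (memCtxState, memctx_cv, the wait event, and the timeout
constant) are hypothetical and not the identifiers used in the patch:

/*
 * Sketch only: wait for the target backend to publish its statistics,
 * re-checking ownership of the shared area each time the condition
 * variable is signaled, and giving up after two minutes.
 */
#define MEMCTX_DUMP_TIMEOUT_MS (2 * 60 * 1000)

ConditionVariablePrepareToSleep(&memCtxState->memctx_cv);
for (;;)
{
    if (memCtxState->in_use && memCtxState->target_pid == target_pid)
        break;              /* our statistics have been published */

    if (ConditionVariableTimedSleep(&memCtxState->memctx_cv,
                                    MEMCTX_DUMP_TIMEOUT_MS,
                                    WAIT_EVENT_MQ_INTERNAL))
    {
        ereport(WARNING,
                (errmsg("timed out waiting for memory statistics from PID %d",
                        target_pid)));
        break;
    }
}
ConditionVariableCancelSleep();

(The wait event above is just a placeholder; a dedicated wait event would be
more appropriate.)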

Please find below an example query to fetch memory contexts from the backend
with PID '116292'. The second argument, 'get_summary', is 'false', indicating
a request for statistics of all the contexts.

postgres=#
select * FROM pg_get_remote_backend_memory_contexts('116292', false) LIMIT
2;
-[ RECORD 1 ]-+--
name  | TopMemoryContext
ident |
type  | AllocSet
path  | {0}
total_bytes   | 97696
total_nblocks | 5
free_bytes| 15376
free_chunks   | 11
used_bytes| 82320
pid   | 116292
-[ RECORD 2 ]-+--
name  | RowDescriptionContext
ident |
type  | AllocSet
path  | {0,1}
total_bytes   | 8192
total_nblocks | 1
free_bytes| 6912
free_chunks   | 0
used_bytes| 1280
pid   | 116292
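
A usage sketch of the summary mode, assuming the same function signature as
above (column list shortened for brevity):

-- Request only a summary: top-level contexts (direct children of
-- TopMemoryContext), served from the fixed-size shared memory block.
SELECT name, type, total_bytes, used_bytes
FROM pg_get_remote_backend_memory_contexts('116292', true)
ORDER BY total_bytes DESC
LIMIT 5;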

TODO:
1. Determine the behaviour when the statistics don't fit in one file.

[1] PostgreSQL: Re: Creating a function for exposing memory usage of backend
process


Re: ECPG Refactor: move sqlca variable in ecpg_log()

2024-10-21 Thread Fujii Masao



On 2024/10/19 2:43, Tom Lane wrote:

Fujii Masao  writes:

I've attached the latest version of the patch, now including the commit log.
Unless there are any objections, I'll proceed with committing it.


LGTM.  Maybe move down the sqlca variable declaration, so that the
declarations still match the order in which the variables are
initialized?  That's just cosmetic of course.


Thanks for the review! I've updated the patch accordingly, i.e.,
moved the declaration of the sqlca variable right after fmt.
The v3 patch is attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From 5a0ea1d78cb14ad90d64deb8a521d5627854d850 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Fri, 18 Oct 2024 14:22:17 +0900
Subject: [PATCH v3] ecpg: Refactor ecpg_log() to skip unnecessary calls to
 ECPGget_sqlca().

Previously, ecpg_log() always called ECPGget_sqlca() to retrieve sqlca,
even though it was only needed for debug logging. This commit updates
ecpg_log() to call ECPGget_sqlca() only when debug logging is enabled.

Author: Yuto Sasaki
Reviewed-by: Tom Lane, Fujii Masao
Discussion: 
https://postgr.es/m/ty2pr01mb3628a85689649babc9a1c6c3c1...@ty2pr01mb3628.jpnprd01.prod.outlook.com
---
 src/interfaces/ecpg/ecpglib/misc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/interfaces/ecpg/ecpglib/misc.c 
b/src/interfaces/ecpg/ecpglib/misc.c
index 2ae989e3e5..8b38c3eccf 100644
--- a/src/interfaces/ecpg/ecpglib/misc.c
+++ b/src/interfaces/ecpg/ecpglib/misc.c
@@ -232,10 +232,10 @@ void
 ecpg_log(const char *format,...)
 {
va_list ap;
-   struct sqlca_t *sqlca = ECPGget_sqlca();
const char *intl_format;
int bufsize;
char   *fmt;
+   struct sqlca_t *sqlca;
 
/*
 * For performance reasons, inspect simple_debug without taking the 
mutex.
@@ -262,6 +262,8 @@ ecpg_log(const char *format,...)
else
snprintf(fmt, bufsize, "[%d]: %s", (int) getpid(), intl_format);
 
+   sqlca = ECPGget_sqlca();
+
pthread_mutex_lock(&debug_mutex);
 
/* Now that we hold the mutex, recheck simple_debug */
-- 
2.46.2



Re: Make default subscription streaming option as Parallel

2024-10-21 Thread vignesh C
On Mon, 21 Oct 2024 at 14:36, Amit Kapila  wrote:
>
> On Mon, Oct 7, 2024 at 11:05 AM vignesh C  wrote:
> >
> > The tests demonstrate a significant performance improvement when using
> > the parallel streaming option, insert shows 40-48 %improvement, delete
> > shows 34-39 %improvement, update shows 26-30 %improvement. In the case
> > of rollback the improvement is between 12-44%, the improvement
> > slightly reduces with larger amounts of data being rolled back in this
> > case. If there's a significant amount of data to roll back, the
> > performance of streaming in parallel may be comparable to or slightly
> > lower in some instances. However, this is acceptable since commit
> > operations are generally more frequent than rollback operations.
> >
> > One key point to consider is that the lock on transaction objects will
> > be held for a longer duration when using streaming in parallel. This
> > occurs because the parallel apply worker initiates the transaction as
> > soon as streaming begins, maintaining the lock until the transaction
> > is fully completed. As a result, for long-running transactions, this
> > extended lock can hinder concurrent access that requires a lock.
> >
>
> The longer-running transactions will anyway have a risk of deadlocks
> or longer waits if there are concurrent operations on the subscribers.
> However, with parallel apply, there is a risk of deadlock among the
> leader and parallel workers when the schema in publisher and
> subscriber is different. Say the subscriber has a unique constraint
> that the publisher doesn't have. See the comments in this regard atop
> applyparallelworker.c in the "Locking Considerations" section. Having
> said that, the apply workers will detect deadlock in such cases and
> will retry to apply the errored-out transaction. So, there is a
> self-healing in-built mechanism and in such cases, we anyway have a
> risk of UNIQUE_KEY conflict ERRORs which in most cases would require
> manual intervention.
>
> > Since there is a significant percentage improvement, we should make
> > the default subscription streaming option parallel.
> >
>
> This makes sense to me. I think it would be better to add a Note or
> Warning in the docs for the risk of deadlock when the schema of
> publisher and subscriber is not the same even though such cases should
> be less.

Yes, this can happen in scenarios like the one below (with deadlock_timeout = 10ms):
Publisher:
CREATE TABLE t1(c1 int);
create publication pub1 for table t1;

Subscriber has an addition index on the table:
CREATE TABLE t1(c1 int);
CREATE UNIQUE INDEX idx1 on t1(c1);
Create subscription ...;

Publisher:
Session1:
Begin;
INSERT INTO t1 SELECT i FROM generate_series(1, 5000) s(i);

Session2:
-- Insert the record that is already inserted in session1
INSERT INTO t1 VALUES(1);

Session1:
Commit;

In this case a deadlock will occur.

Attached v3 version patch has a caution added for the same.

Regards,
Vignesh
From a56f565689b8b18573acd873a097a213af6c6722 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Wed, 25 Sep 2024 14:42:20 +0530
Subject: [PATCH v3] Make the default value for the subscription streaming
 option parallel.

Currently, the default value of the streaming option is false: all
transactions are fully decoded on the publisher before being sent
to the subscriber. This approach can lead to reduced performance,
particularly under heavy load.

Change the default streaming option to parallel, so that incoming
changes are directly applied by one of the available parallel apply
workers. This significantly improves the performance of commit
operations.
---
 doc/src/sgml/ref/create_subscription.sgml  | 27 ++
 src/backend/commands/subscriptioncmds.c|  2 +-
 src/bin/pg_dump/pg_dump.c  |  2 ++
 src/bin/pg_dump/t/002_pg_dump.pl   | 10 
 src/test/regress/expected/subscription.out | 24 +--
 5 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 8a3096e62b..3eab06bd2d 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -271,11 +271,23 @@ CREATE SUBSCRIPTION subscription_name
  
   Specifies whether to enable streaming of in-progress transactions
-  for this subscription.  The default value is off,
-  meaning all transactions are fully decoded on the publisher and only
-  then sent to the subscriber as a whole.
+  for this subscription.  The default value is parallel,
+  meaning incoming changes are directly applied via one of the parallel
+  apply workers, if available. If no parallel apply worker is free to
+  handle streaming transactions then the changes are written to
+  temporary files and applied after the transaction is committed. Note
+  that if an error happens in a parallel apply worker, the finish

Re: [BUG?] check_exclusion_or_unique_constraint false negative

2024-10-21 Thread Michail Nikolaev
Hello, Hou!

I have sent [0] reproducer within the context of conflict detection and
resolution to the original thread.

[0]:
https://www.postgresql.org/message-id/flat/CANtu0ojMjAwMRJK%3DH8y0YBB0ZEcN%2BJbdZeoXQn8dWO5F67jgsA%40mail.gmail.com#f5d1baf4702685aedf23daa9addc012e

>


Re: [PATCH] Add array_reverse() function

2024-10-21 Thread Ashutosh Bapat
On Mon, Oct 21, 2024 at 2:36 PM Aleksander Alekseev
 wrote:
>
> Hi,
>
> Recently I wanted to call array_reverse() and discovered that we still
> don't have it. I'm not the first one who encountered this limitation.
> array_reverse() was requested at least since 2009 [1] and the
> workaround on PostgreSQL Wiki is dated 2010 [2].
>
> The proposed patch adds this function. Only the first dimension of the
> array is reversed, for consistency with the existing functions such as
> array_shuffle() [3].
>
> Examples:
>
> =# SELECT array_reverse(ARRAY[1,2,3,NULL]);
>  array_reverse
> ---
>  {NULL,3,2,1}
>
> =# SELECT array_reverse(ARRAY[[1,2],[3,4],[5,6]]);
> array_reverse
> -
>  {{5,6},{3,4},{1,2}}
>
> Thoughts?

Looks useful. Glancing quickly at the code I have a comment

+
+ /* If the target array is empty, exit fast */
+ if (ndim < 1 || dims[0] < 1)
+ return construct_empty_array(elmtyp);

This is taken care of by the caller, at least for the ndim < 1 case.


+ /*
+ * There is no point in reversing empty arrays or arrays with less than
+ * two items.
+ */
+ if (ARR_NDIM(array) < 1 || ARR_DIMS(array)[0] < 2)
+ PG_RETURN_ARRAYTYPE_P(array);

But this returns the input array as is. I think it should at least make
a new copy of the input array.
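
A possible way to do that, as a sketch only (using datumCopy() to hand back a
fresh copy of the varlena value instead of the input pointer):

	/* Sketch: return a copy instead of the original input array */
	if (ARR_NDIM(array) < 1 || ARR_DIMS(array)[0] < 2)
	{
		ArrayType  *copy;

		copy = (ArrayType *) DatumGetPointer(datumCopy(PointerGetDatum(array),
													   false, -1));
		PG_RETURN_ARRAYTYPE_P(copy);
	}

Alternatively, the argument could simply be fetched with
PG_GETARG_ARRAYTYPE_P_COPY(0) so a copy is made up front.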

-- 
Best Wishes,
Ashutosh Bapat




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-21 Thread David Rowley
On Tue, 22 Oct 2024 at 14:46, David G. Johnston
 wrote:
> We should probably at least improve the documentation in 19.17.1; this 
> interaction is apparently not self-evident.
>
> enable_indexscan
>
> Enable or disable the planner’s use of both index-scan and index-only-scans 
> plan types.
>
> enabled_indexonlyscan
>
> Set to off to disable index-only-scan plan type while leaving index-scan plan 
> types enabled.  This setting has no effect if enable_indexscan is set to off.

Yeah, I agree. The documentation could better reflect the current behaviour.

Do you want to submit that in patch form?

David




More CppAsString2() in psql's describe.c

2024-10-21 Thread Michael Paquier
Hi all,

This was on my stack of things for some time, but please find attached
a patch to clean up some code with $subject because HEAD's describe.c
is a mixed bag of relying on CppAsString2() and hardcoded values.
Switching to CppAsString2() has the advantage of making the code more
self-documenting, so it is not necessary to remember what a single
byte means in a catalog.

I should have caught most of them, with exceptions related to
policies, subscriptions and dependencies being intentional.

I have noticed that describe.c includes pg_am.h and not pg_am_d.h.
That's not a good idea even if it is OK now because this has the risk
of pulling backend-side definitions into psql.  psql -E reported
consistent formats in the queries generated, so things look in rather
good shape here.

Note that there were a couple of value checks not part of the queries
that relied on values from the catalogs for some relpersistences and
replidents.  I've fixed them while on it.

Thoughts or comments are welcome.
--
Michael
From 9fa620a80e3af543c134f1bfe5c547828680d204 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 22 Oct 2024 09:48:24 +0900
Subject: [PATCH] psql: Sprinkle more CppAsString2() in describe.c

---
 src/bin/psql/describe.c | 153 +---
 1 file changed, 97 insertions(+), 56 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 363a66e718..a465a34afd 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -15,11 +15,17 @@
 
 #include 
 
-#include "catalog/pg_am.h"
+#include "catalog/pg_am_d.h"
+#include "catalog/pg_amop_d.h"
 #include "catalog/pg_attribute_d.h"
 #include "catalog/pg_cast_d.h"
 #include "catalog/pg_class_d.h"
+#include "catalog/pg_collation_d.h"
+#include "catalog/pg_constraint_d.h"
 #include "catalog/pg_default_acl_d.h"
+#include "catalog/pg_statistic_ext_d.h"
+#include "catalog/pg_proc_d.h"
+#include "catalog/pg_type_d.h"
 #include "common.h"
 #include "common/logging.h"
 #include "describe.h"
@@ -94,7 +100,7 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem)
 		  "  pg_catalog.obj_description(p.oid, 'pg_proc') as \"%s\"\n"
 		  "FROM pg_catalog.pg_proc p\n"
 		  " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace\n"
-		  "WHERE p.prokind = 'a'\n",
+		  "WHERE p.prokind = " CppAsString2(PROKIND_AGGREGATE) "\n",
 		  gettext_noop("Description"));
 	else
 		appendPQExpBuffer(&buf,
@@ -160,8 +166,8 @@ describeAccessMethods(const char *pattern, bool verbose)
 	printfPQExpBuffer(&buf,
 	  "SELECT amname AS \"%s\",\n"
 	  "  CASE amtype"
-	  " WHEN 'i' THEN '%s'"
-	  " WHEN 't' THEN '%s'"
+	  " WHEN " CppAsString2(AMTYPE_INDEX) " THEN '%s'"
+	  " WHEN " CppAsString2(AMTYPE_TABLE) " THEN '%s'"
 	  " END AS \"%s\"",
 	  gettext_noop("Name"),
 	  gettext_noop("Index"),
@@ -340,9 +346,9 @@ describeFunctions(const char *functypes, const char *func_pattern,
 		  "  pg_catalog.pg_get_function_result(p.oid) as \"%s\",\n"
 		  "  pg_catalog.pg_get_function_arguments(p.oid) as \"%s\",\n"
 		  " CASE p.prokind\n"
-		  "  WHEN 'a' THEN '%s'\n"
-		  "  WHEN 'w' THEN '%s'\n"
-		  "  WHEN 'p' THEN '%s'\n"
+		  "  WHEN " CppAsString2(PROKIND_AGGREGATE) " THEN '%s'\n"
+		  "  WHEN " CppAsString2(PROKIND_WINDOW) " THEN '%s'\n"
+		  "  WHEN " CppAsString2(PROKIND_PROCEDURE) " THEN '%s'\n"
 		  "  ELSE '%s'\n"
 		  " END as \"%s\"",
 		  gettext_noop("Result data type"),
@@ -376,9 +382,12 @@ describeFunctions(const char *functypes, const char *func_pattern,
 	{
 		appendPQExpBuffer(&buf,
 		  ",\n CASE\n"
-		  "  WHEN p.provolatile = 'i' THEN '%s'\n"
-		  "  WHEN p.provolatile = 's' THEN '%s'\n"
-		  "  WHEN p.provolatile = 'v' THEN '%s'\n"
+		  "  WHEN p.provolatile = "
+		  CppAsString2(PROVOLATILE_IMMUTABLE) " THEN '%s'\n"
+		  "  WHEN p.provolatile = "
+		  CppAsString2(PROVOLATILE_STABLE) " THEN '%s'\n"
+		  "  WHEN p.provolatile = "
+		  CppAsString2(PROVOLATILE_VOLATILE) " THEN '%s'\n"
 		  " END as \"%s\"",
 		  gettext_noop("immutable"),
 		  gettext_noop("stable"),
@@ -387,9 +396,12 @@ describeFunctions(const char *functypes, const char *func_pattern,
 		if (pset.sversion >= 90600)
 			appendPQExpBuffer(&buf,
 			  ",\n CASE\n"
-			  "  WHEN p.proparallel = 'r' THEN '%s'\n"
-			  "  WHEN p.proparallel = 's' THEN '%s'\n"
-			  "  WHEN p.proparallel = 'u' THEN '%s'\n"
+			  "  WHEN p.proparallel = "
+			  CppAsString2(PROPARALLEL_RESTRICTED) " THEN '%s'\n"
+			  "  WHEN p.proparallel = "
+			  CppAsString2(PROPARALLEL_SAFE) " THEN '%s'\n"
+			  "  WHEN p.proparallel = "
+			  CppAsString2(PROPARALLEL_UNSAFE) " THEN '%s'\n"
 			  " END as \"%s\"",
 			  gettext_noop("restricted"),
 			  gettext_noop("safe"),
@@ -448,7 +460,8 @@ describeFunctions(const char *functypes, const char *fun

Re: Pgoutput not capturing the generated columns

2024-10-21 Thread Peter Smith
Hi SHubham, Here are my review comments for v40-0001 (code)

Please don't post a blanket "I have fixed all the comments" response
to this. Sometimes things get missed. Instead, please reply
done/not-done/whatever individually, so I can track the changes
properly.

==
src/backend/catalog/pg_publication.c

1.
- if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
- ereport(ERROR,
- errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("cannot use generated column \"%s\" in publication column list",
-colname));
-

This function no longer rejects generated columns from a publication
column list. So, now the function comment is wrong because it still
says "Checks for and raises an ERROR for any; unknown columns, system
columns, duplicate columns or generated columns."

NOTE: This was fixed already in v39-0001, but then broken again in
v40-0001 (???)

nit - also remove that semicolon (;) from the function comment.

==
src/backend/commands/publicationcmds.c

2.
- if (!isnull)
+ if (!isnull || !pubgencols)
  {
  int x;
  Bitmapset  *idattrs;
  Bitmapset  *columns = NULL;

- /* With REPLICA IDENTITY FULL, no column list is allowed. */
- if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
- result = true;
+ if (!isnull)
+ {
+ /* With REPLICA IDENTITY FULL, no column list is allowed. */
+ if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
+ result = true;
+
+ /* Transform the column list datum to a bitmapset. */
+ columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+ }
+ else
+ {
+ TupleDesc desc = RelationGetDescr(relation);
+ int nliveatts = 0;
+
+ for (int i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ /* Skip if the attribute is dropped or generated */
+ if (att->attisdropped)
+ continue;
+
+ nliveatts++;
+
+ if (att->attgenerated)
+ continue;
+
+ columns = bms_add_member(columns, i + 1);
+ }

- /* Transform the column list datum to a bitmapset. */
- columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+ /* Return if all columns of the table will be replicated */
+ if (bms_num_members(columns) == nliveatts)
+ {
+ bms_free(columns);
+ ReleaseSysCache(tuple);
+ return false;
+ }
+ }

2a.
AFAIK this code was written to deal with Sawada-San's comment about
UPDATE/DELETE [1], but the logic is not very clear. It needs more
comments on the "else" part, the generated columns and how the new
logic solves Sawada-San's problem.

~

2b.
This function is now dealing with both 'publish_via_root' and
'publish_generated_columns'. I am suspicious that you may be handling
each of these parameters independently (i.e., there is some early
return "Return if all columns of the table will be replicated") but
there might be some scenarios where a *combination* of pubviaroot and
gencols is not working as expected.

For example, there is a whole comment later about this: "If pubviaroot
is true, we are validating the column list of the parent table, but
the bitmap contains the replica identity information of the child
table." which I am not sure is being dealt with correctly. IMO there
need to be more test cases added for these tricky combination
scenarios to prove the code is good.

~

2c.
I expected to see some more REPLICA IDENTITY tests to reproduce the
problem scenario that Sawada-San reported. Where are those?
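
Something along these lines could exercise the combination (sketch only;
'publish_generated_columns' is the option added by this patch series):

-- Publisher: partitioned table with a stored generated column.
CREATE TABLE parent (a int PRIMARY KEY,
                     b int GENERATED ALWAYS AS (a + 1) STORED)
    PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (100);

-- Publish via the root with generated columns excluded, so the
-- column-list/replica-identity validation runs against the parent
-- while the bitmap reflects the child.
CREATE PUBLICATION pub_combo FOR TABLE parent
    WITH (publish_via_partition_root = true, publish_generated_columns = false);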

==
src/backend/replication/logical/tablesync.c

3. make_copy_attnamelist

Patch v38-0001 used to have a COPY error:

+ /*
+ * Check if the subscription table generated column has same name
+ * as a non-generated column in the corresponding publication
+ * table.
+ */
+ if (!remotegenlist[remote_attnum])
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication target relation \"%s.%s\" has a
generated column \"%s\" "
+ "but corresponding column on source relation is not a generated column",
+ rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
+

My most recent comment about this code was something along the lines
of "Make this detailed useful error message common if possible".

But, in v39 all this ERROR was removed (and it is still missing in
v40). Why? I did not see any other review comments asking for this to
be removed. AFAIK this was a correct error message for
not_generated->generated. Won't the current code now just silently
ignore/skip this, which is contrary behaviour to what normal
replication would do?

The recently deleted combo TAP tests could have detected this.

~~~

fetch_remote_table_info

4.
if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 15)
There was a new 'server_version' variable added, so why not use it
here? In previous versions, we used to do so, but now since v39, the
code has reverted yet again. (???).

~~~

5.
  appendStringInfo(&cmd,
  "SELECT DISTINCT"
- "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
- "   THEN NULL ELSE gpt.attrs END)"
+ "  (gpt.attrs)"
  "  FROM pg_publication p,"
  "  LATERAL pg_get_publication_ta

Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-21 Thread David Rowley
On Tue, 22 Oct 2024 at 13:45, Melanie Plageman
 wrote:
> I was surprised today when I saw that with
> enable_indexscan=off
> enable_indexonlyscan=on
> EXPLAIN prints that the index only scan is disabled:
>
>   QUERY PLAN
> ---
>  Index Only Scan using history_pkey on history
>Disabled: true

There's nothing new about Index Only Scans being disabled by
enable_indexscan. Index Only Scan is chosen with your test case as all
possible Paths are disabled and IOS is the cheapest of all Paths.

The PG17 results for your test case are:

   QUERY PLAN
-
 Index Only Scan using history_pkey on history
(cost=100.29..1000527.78 rows=19966 width=4)
(1 row)

(note the 1e10 disable_cost has been applied to the Index Only Scan costs)

Robert did propose to change this behaviour while he was working on
the disabled_nodes changes. I did push back on the proposed change
[1]. If you feel strongly that what we have is wrong, then maybe it's
worth opening the discussion about that again.

David

[1] 
https://www.postgresql.org/message-id/CAApHDvoUUKi0JNv8jtZPfc_JkLs7FqymC5-DDUFNKnm4MMmPuQ%40mail.gmail.com




EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-21 Thread David G. Johnston
On Monday, October 21, 2024, David Rowley  wrote:

> On Tue, 22 Oct 2024 at 13:45, Melanie Plageman
>  wrote:
> > I was surprised today when I saw that with
> > enable_indexscan=off
> > enable_indexonlyscan=on
>
> Robert did propose to change this behaviour while he was working on
> the disabled_nodes changes. I did push back on the proposed change
> [1]. If you feel strongly that what we have is wrong, then maybe it's
> worth opening the discussion about that again.
>

We should probably at least improve the documentation in 19.17.1; this
interaction is apparently not self-evident.

enable_indexscan

Enable or disable the planner’s use of both index-scan and index-only-scans
plan types.

enabled_indexonlyscan

Set to off to disable index-only-scan plan type while leaving index-scan
plan types enabled.  This setting has no effect if enable_indexscan is set
to off.

David J.


RE: Statistics Import and Export

2024-10-21 Thread Shinoda, Noriyoshi (SXD Japan FSIP)
> 1. Allow relpages to be set to -1 (partitioned tables with partitions have 
> this value after ANALYZE).
> 2. Turn off autovacuum on tables (where possible) if they are going to be the 
> target of pg_set_relation_stats().
> 3. Allow pg_set_relation_stats to continue past an out-of-range detection on 
> one attribute, rather than immediately returning false

Thank you for developing this feature.
If relpages can be -1 only for partitioned tables, shouldn't
pg_set_relation_stats restrict the values that can be specified by table
type? The attached patch limits the value to -1 or more if the target is a
partitioned table, and 0 or more otherwise.
Changing relpages to -1 on a non-partitioned table seems to significantly
change the execution plan.

---
postgres=> EXPLAIN (COSTS OFF) SELECT * FROM data1 WHERE c1=1000;
  QUERY PLAN
--
Index Scan using data1_pkey on data1
   Index Cond: (c1 = 1000)
(2 rows)

postgres=> SELECT pg_set_relation_stats('data1', relpages=>-1);
pg_set_relation_stats
---
t
(1 row)

postgres=> EXPLAIN (COSTS OFF) SELECT * FROM data1 WHERE c1=1000;
  QUERY PLAN
---
Bitmap Heap Scan on data1
   Recheck Cond: (c1 = 1000)
   ->  Bitmap Index Scan on data1_pkey
 Index Cond: (c1 = 1000)
(4 rows)
---
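
A rough sketch of the kind of check being proposed (names and placement are
illustrative, not the actual patch; "relation" and "relpages" are assumed to
be in scope):

	int32		relpages_min;

	/* Allow relpages = -1 only for partitioned tables */
	if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
		relpages_min = -1;		/* ANALYZE leaves -1 here */
	else
		relpages_min = 0;

	if (relpages < relpages_min)
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("relpages cannot be less than %d for this relation kind",
						relpages_min)));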

Regards,
Noriyoshi Shinoda

From: Corey Huinker 
Sent: Saturday, October 19, 2024 10:00 AM
To: Jeff Davis 
Cc: Shinoda, Noriyoshi (SXD Japan FSIP) ; jian he 
; Matthias van de Meent 
; Bruce Momjian ; Tom Lane 
; Nathan Bossart ; Magnus 
Hagander ; Stephen Frost ; Ashutosh 
Bapat ; Peter Smith ; 
PostgreSQL Hackers ; alvhe...@alvh.no-ip.org
Subject: Re: Statistics Import and Export



Patch that allows relation_statistics_update to continue after one failed stat 
(0001) attached, along with bool->void change (0002).

Once more, with feeling:


pg_set_relation_stats_patch_v1.diff
Description: pg_set_relation_stats_patch_v1.diff


EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-21 Thread Melanie Plageman
I was surprised today when I saw that with
enable_indexscan=off
enable_indexonlyscan=on
EXPLAIN prints that the index only scan is disabled:

  QUERY PLAN
---
 Index Only Scan using history_pkey on history
   Disabled: true

I wasn't sure if this was expected -- maybe index-only scan is
considered a type of index scan for this purpose. I dug around this
disable_cost thread [1] a bit to see if I could figure out what the
expected behavior is on my own, but I'm still not sure.

Anyway, maybe I'm misunderstanding something.

Here's my repro:

CREATE TABLE history(
id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
data TEXT);
INSERT INTO history(data)
select repeat('a', 100) from generate_series(1,1)i;
VACUUM history;
set enable_seqscan = off;
set enable_indexscan = off;
set enable_bitmapscan = off;
set enable_indexonlyscan = on;
EXPLAIN (costs off) SELECT id from history;

- Melanie

[1] 
https://www.postgresql.org/message-id/flat/CA%2BTgmoZEg1tyW31t3jxhvDwff29K%3D2C9r6722SuFb%3D3XVKWkow%40mail.gmail.com#402856db473920b9e0193b9f2cc2739b




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-21 Thread David G. Johnston
On Monday, October 21, 2024, David Rowley  wrote:

> On Tue, 22 Oct 2024 at 14:46, David G. Johnston
>  wrote:
> > We should probably at least improve the documentation in 19.17.1; this
> interaction is apparently not self-evident.
> >
> > enable_indexscan
> >
> > Enable or disable the planner’s use of both index-scan and
> index-only-scans plan types.
> >
> > enabled_indexonlyscan
> >
> > Set to off to disable index-only-scan plan type while leaving index-scan
> plan types enabled.  This setting has no effect if enable_indexscan is set
> to off.
>
> Yeah, I agree. The documentation could better reflect the current
> behaviour.
>
> Do you want to submit that in patch form?
>

Will do tomorrow.

David J.


Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-21 Thread Pavel Stehule
Hi

On Fri, Oct 11, 2024 at 19:39, Pavel Stehule
wrote:

>
>
> On Fri, Oct 11, 2024 at 18:08, Tom Lane wrote:
>
>> Pavel Stehule  writes:
>> > I tested it and it is working nicely.  I tested it against Orafce and I
>> > found an interesting point. The body of plpgsql functions is not
>> checked.
>> > Do you know the reason?
>>
>> In execute_extension_script():
>>
>> /*
>>  * Similarly disable check_function_bodies, to ensure that SQL
>> functions
>>  * won't be parsed during creation.
>>  */
>> if (check_function_bodies)
>> (void) set_config_option("check_function_bodies", "off",
>>  PGC_USERSET, PGC_S_SESSION,
>>  GUC_ACTION_SAVE, true, 0, false);
>>
>> I wondered if we should reconsider that, but I'm afraid we'd be more
>> likely to break working extensions than to do anything helpful.
>> An extension author who wants that can do what I did in the patch's
>> test cases: manually turn check_function_bodies on in the extension
>> script.
>>
>
> ok,
>
>
I tested this patch and I didn't find any issue. The possibility to show
errors inside extensions more precisely is very useful.

compilation without problems, all tests passed

I'll mark this patch as ready for committer.
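
For reference, the workaround Tom describes upthread would look roughly like
this inside an extension script (a sketch only, not part of the patch):

-- my_extension--1.0.sql
-- execute_extension_script() turns check_function_bodies off by default;
-- opt back in so plpgsql bodies are validated at CREATE time.
SET check_function_bodies = on;

CREATE FUNCTION my_extension_func() RETURNS integer
LANGUAGE plpgsql AS $$
BEGIN
    RETURN 1;
END;
$$;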

Regards

Pavel




> Pavel
>
>>
>> regards, tom lane
>>
>


Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-21 Thread Tender Wang
Alvaro Herrera wrote on Tue, Oct 22, 2024 at 05:52:

> On 2024-Oct-21, Tender Wang wrote:
>
> > I suspect that we don't need the below if
> > statement anymore.
> > /*
> > * If the referenced side is partitioned (which we know because our
> > * parent's constraint points to a different relation than ours) then
> > * we must, in addition to the above, create pg_constraint rows that
> > * point to each partition, each with its own action triggers.
> > */
> > if (parentConForm->conrelid != conform->conrelid)
> > {
> > ...
> > }
> >
> > The above condition is always true according to my test.
> > I haven't figured out a counterexample.
>
> You're right, this is useless, we can remove the 'if' line.  I kept the
> block though, to have a place for all those local variable declarations
> (otherwise the code looks messier than it needs to).
>

Agree.


>
> I also noticed that addFkRecurseReferenced() is uselessly taking a List
> **wqueue argument but doesn't use it, so I removed it (as fallout, other
> routines don't need it either, especially DetachPartitionFinalize).  I
> added some commentary on the creation of dependencies in
> addFkConstraint().
>

Yeah, I also noticed this before.


>
> I also include a few other cosmetic changes; just comment layout
> changes.
>
> This time I attach the patch for master only; the others have changed
> identically.  14 is unchanged from before.  I figured that the conflict
> from 14 to 13 was trivial to resolve -- it was just because of DETACH
> CONCURRENTLY, so some code moves around, but that's all that happens.
>
>
I don't find any other problems with the newest patch.

-- 
Thanks,
Tender Wang


Re: race condition in pg_class

2024-10-21 Thread Noah Misch
On Mon, Oct 21, 2024 at 10:00:00PM +0300, Alexander Lakhin wrote:
> Please look at an anomaly introduced with a07e03fd8.
> With the attached modification for intra-grant-inplace.spec, running this
> test triggers a Valgrind-detected error for me:
> ==00:00:00:09.624 319033== Conditional jump or move depends on uninitialised 
> value(s)
> ==00:00:00:09.624 319033==    at 0x25D120: DoesMultiXactIdConflict 
> (heapam.c:7373)
> ==00:00:00:09.624 319033==    by 0x25B45A: heap_inplace_lock (heapam.c:6265)
> ==00:00:00:09.624 319033==    by 0x27D8CB: systable_inplace_update_begin 
> (genam.c:867)
> ==00:00:00:09.624 319033==    by 0x33F717: index_update_stats (index.c:2856)
> ==00:00:00:09.624 319033==    by 0x33FEE2: index_build (index.c:3106)
> ==00:00:00:09.625 319033==    by 0x33C7D3: index_create (index.c:1276)
> ==00:00:00:09.625 319033==    by 0x451000: DefineIndex (indexcmds.c:1216)
> ==00:00:00:09.625 319033==    by 0x48D091: ATExecAddIndex (tablecmds.c:9156)
> ==00:00:00:09.625 319033==    by 0x483F8E: ATExecCmd (tablecmds.c:5302)
> ==00:00:00:09.625 319033==    by 0x483877: ATRewriteCatalogs 
> (tablecmds.c:5186)
> ==00:00:00:09.625 319033==    by 0x482B9A: ATController (tablecmds.c:4741)
> ==00:00:00:09.625 319033==    by 0x4827A1: AlterTable (tablecmds.c:4387)
> ==00:00:00:09.625 319033==

Thanks.

> Perhaps current_is_member in heap_inplace_lock() should be initialized
> before the DoesMultiXactIdConflict() call as in other places...

heap_inplace_lock() ignores current_is_member after computing it, so let's
just pass NULL, as attached.
Author: Noah Misch 
Commit: Noah Misch 

Stop reading uninitialized memory in heap_inplace_lock().

Stop computing a never-used value.  This removes the read; the read had
no functional implications.  Back-patch to v12, like commit
a07e03fd8fa7daf4d1356f7cb501ffe784ea6257.

Reported by Alexander Lakhin.  Reviewed by FIXME.

Discussion: 
https://postgr.es/m/6c92f59b-f5bc-e58c-9bdd-d1f21c17c...@gmail.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index da5e656..82a0492 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6260,10 +6260,9 @@ heap_inplace_lock(Relation relation,
 		LockTupleMode lockmode = LockTupleNoKeyExclusive;
 		MultiXactStatus mxact_status = MultiXactStatusNoKeyUpdate;
 		int			remain;
-		bool		current_is_member;
 
 		if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
-									lockmode, &current_is_member))
+									lockmode, NULL))
 		{
 			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 			ret = false;
diff --git a/src/test/isolation/expected/intra-grant-inplace.out 
b/src/test/isolation/expected/intra-grant-inplace.out
index b5fe8b0..4e9695a 100644
--- a/src/test/isolation/expected/intra-grant-inplace.out
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -63,6 +63,30 @@ step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY 
(c); 
 step r3: ROLLBACK;
 step addk2: <... completed>
 
+starting permutation: b3 sfnku3 keyshr5 addk2 r3
+step b3: BEGIN ISOLATION LEVEL READ COMMITTED;
+step sfnku3: 
+   SELECT relhasindex FROM pg_class
+   WHERE oid = 'intra_grant_inplace'::regclass FOR NO KEY UPDATE;
+
+relhasindex
+---
+f  
+(1 row)
+
+step keyshr5: 
+   SELECT relhasindex FROM pg_class
+   WHERE oid = 'intra_grant_inplace'::regclass FOR KEY SHARE;
+
+relhasindex
+---
+f  
+(1 row)
+
+step addk2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); 
+step r3: ROLLBACK;
+step addk2: <... completed>
+
 starting permutation: b2 sfnku2 addk2 c2
 step b2: BEGIN;
 step sfnku2: 
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec 
b/src/test/isolation/specs/intra-grant-inplace.spec
index 2992c85..9936d38 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -96,6 +96,14 @@ permutation
addk2(r3)
r3
 
+# reproduce bug in DoesMultiXactIdConflict() call
+permutation
+   b3
+   sfnku3
+   keyshr5
+   addk2(r3)
+   r3
+
 # same-xact rowmark
 permutation
b2


Re: Set query_id for query contained in utility statement

2024-10-21 Thread Michael Paquier
On Tue, Oct 22, 2024 at 02:06:16PM +0900, Michael Paquier wrote:
> I've looked at 0001, and finished by splitting the case of all-level
> tracking with the multi-statements as the resulting table was feeling
> heavily bloated, particularly because of MERGE that spawned in
> multiple lines even if there were less entries.  The rest, except for
> some styling inconsistencies, was feeling OK.

And of course I have forgotten to attach a rebase of the remaining
patches..
--
Michael
From f630b9445100e1ef96e989ef064bd1e3fcb9f171 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 22 Oct 2024 13:28:28 +0900
Subject: [PATCH v10 1/3] Track location to extract relevant part in nested
 statement

Previously, a Query generated through transform would have an unset
stmt_location. Extensions relying on the statement location to extract
the relevant part of the statement would fall back to using the whole
statement instead, thus showing the same string at the top and
nested levels, which was a source of confusion.

This patch fixes the issue by keeping track of the statement locations
and propagating them to the Query during transform, allowing pgss to
show only the relevant part of the query for nested queries.
---
 src/include/nodes/parsenodes.h|  10 ++
 src/include/parser/analyze.h  |   3 +-
 src/include/parser/parse_node.h   |   2 +
 src/backend/optimizer/util/clauses.c  |   2 +-
 src/backend/parser/analyze.c  |  96 +++---
 src/backend/parser/gram.y |  71 +++-
 src/backend/parser/parse_merge.c  |   2 +
 .../expected/level_tracking.out   | 165 +-
 .../pg_stat_statements/expected/planning.out  |  10 +-
 .../pg_stat_statements/expected/select.out|   2 +-
 .../pg_stat_statements/expected/utility.out   |   2 +-
 contrib/pg_stat_statements/sql/planning.sql   |   4 +-
 12 files changed, 240 insertions(+), 129 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index c92cef3d16..b40b661ec8 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2046,6 +2046,8 @@ typedef struct InsertStmt
 	List	   *returningList;	/* list of expressions to return */
 	WithClause *withClause;		/* WITH clause */
 	OverridingKind override;	/* OVERRIDING clause */
+	ParseLoc	stmt_location;	/* start location, or -1 if unknown */
+	ParseLoc	stmt_len;		/* length in bytes; 0 means "rest of string" */
 } InsertStmt;
 
 /* --
@@ -2060,6 +2062,8 @@ typedef struct DeleteStmt
 	Node	   *whereClause;	/* qualifications */
 	List	   *returningList;	/* list of expressions to return */
 	WithClause *withClause;		/* WITH clause */
+	ParseLoc	stmt_location;	/* start location, or -1 if unknown */
+	ParseLoc	stmt_len;		/* length in bytes; 0 means "rest of string" */
 } DeleteStmt;
 
 /* --
@@ -2075,6 +2079,8 @@ typedef struct UpdateStmt
 	List	   *fromClause;		/* optional from clause for more tables */
 	List	   *returningList;	/* list of expressions to return */
 	WithClause *withClause;		/* WITH clause */
+	ParseLoc	stmt_location;	/* start location, or -1 if unknown */
+	ParseLoc	stmt_len;		/* length in bytes; 0 means "rest of string" */
 } UpdateStmt;
 
 /* --
@@ -2090,6 +2096,8 @@ typedef struct MergeStmt
 	List	   *mergeWhenClauses;	/* list of MergeWhenClause(es) */
 	List	   *returningList;	/* list of expressions to return */
 	WithClause *withClause;		/* WITH clause */
+	ParseLoc	stmt_location;	/* start location, or -1 if unknown */
+	ParseLoc	stmt_len;		/* length in bytes; 0 means "rest of string" */
 } MergeStmt;
 
 /* --
@@ -2159,6 +2167,8 @@ typedef struct SelectStmt
 	bool		all;			/* ALL specified? */
 	struct SelectStmt *larg;	/* left child */
 	struct SelectStmt *rarg;	/* right child */
+	ParseLoc	stmt_location;	/* start location, or -1 if unknown */
+	ParseLoc	stmt_len;		/* length in bytes; 0 means "rest of string" */
 	/* Eventually add fields for CORRESPONDING spec here */
 } SelectStmt;
 
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 28b66fccb4..8ba4e050af 100644
--- a/src/include/parser/analyze.h
+++ b/src/include/parser/analyze.h
@@ -46,8 +46,9 @@ extern List *transformUpdateTargetList(ParseState *pstate,
 	   List *origTlist);
 extern List *transformReturningList(ParseState *pstate, List *returningList,
 	ParseExprKind exprKind);
-extern Query *transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree);
+extern Query *transformOptionalSelectInto(ParseState *pstate, Node *parseTree);
 extern Query *transformStmt(ParseState *pstate, Node *parseTree);
+extern void setQueryStmtLen(ParseState *pstate, Query *qry, int stmt_len);
 
 extern bool stmt_requires_parse_analysis(RawStmt *parseTree);
 extern bool analyze_requires_snapshot(RawStmt *parseTree);
diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h

Re: Row pattern recognition

2024-10-21 Thread Tatsuo Ishii
Hi,

I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
Currently my RPR implementation does not allow PREV(col + 1). If
"PREV(col + 1)" is different from "PREV(col) + 1", it maybe worthwhile
to implement "PREV(col + 1)".

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Set query_id for query contained in utility statement

2024-10-21 Thread Michael Paquier
On Mon, Oct 21, 2024 at 10:35:17AM +0200, Anthonin Bonnefoy wrote:
> I've updated 0001 to only use ORDER BY query. The query strings are
> not exact duplicates, as the nested statement has the additional ending
> ';' due to using the whole string instead of just the RawStmt. Thus,
> the other sort expressions will never be used since there's no
> equality. There's also the possibility of breaking down each statement
> into individual blocks, with pgss reset and fetch for each one. However,
> I feel it's gonna add a lot of noise in the test file.

I've looked at 0001, and finished by splitting the case of all-level
tracking with the multi-statements as the resulting table was feeling
heavily bloated, particularly because of MERGE that spawned in
multiple lines even if there were less entries.  The rest, except for
some styling inconsistencies, was feeling OK.

One of the multi-statement tests includes this output for HEAD, and
that's on two PGSS entries:
EXPLAIN (COSTS OFF) SELECT $1, $2 UNION SELECT $3, $4;
EXPLAIN (COSTS OFF) (SELECT 1, 2, 3) UNION SELECT 3, 4, 5;

EXPLAIN (COSTS OFF) SELECT 1, 2 UNION SELECT 3, 4;
EXPLAIN (COSTS OFF) (SELECT $1, $2, $3) UNION SELECT $4, $5, $6;  

I did not notice that at first, but that's really something!
Normalization is only applied partially to a portion of the string, so
we have a bunch of bloat for non-top queries that has existed for
years.

+   ParseLocstmt_location;  /* start location, or -1 if unknown */
+   ParseLocstmt_len;   /* length in bytes; 0 means "rest of string" */

I'm OK with this approach after considering a few things, mostly in
terms of consistency with the existing node structures.  The existing
business with YYLLOC_DEFAULT() counts here.

-Query *
-transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)

What's the advantage of removing this routine?  Is that because you're
pushing the initialization of stmt_location and stmt_len into
transformOptionalSelectInto(), making it mostly moot?  Something that
worries me a bit is that this change makes the code less clean, by
having a SELECT-INTO specific routine called in the parse-analyze
paths, while adding three individual paths in charge of setting
pstate->p_stmt_len and p_stmt_location.

+   n->stmt_len = @3 - @2;

This specific case deserves a comment.  I don't have the best
understanding of this area yet (need more analysis), but with the
existing updateRawStmtEnd() and RawStmt also tracking this
information, I am wondering if we could be smarter with fewer paths
manipulating the start locations and lengths.  And your patch adds a
new setQueryStmtLen() that does the same kind of job.  Hmm.

FWIW, I don't feel strongly about 0004 that tries to make the REFRESH
handling smarter.  I am not sure that it even makes sense as-is by
hacking into a wrapper of pg_get_viewdef_worker() to get the query
string, leading it to not be normalized.  This has also a small
footprint in 0003.  I think that the bits in ExecRefreshMatView()
should be discarded from 0003, as a result.
--
Michael


signature.asc
Description: PGP signature


Row pattern recognition

2024-10-21 Thread David G. Johnston
On Monday, October 21, 2024, Tatsuo Ishii  wrote:
>
> I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
> Currently my RPR implementation does not allow PREV(col + 1). If
> "PREV(col + 1)" is different from "PREV(col) + 1", it maybe worthwhile
> to implement "PREV(col + 1)".
>

Interesting feature that I’m now just seeing.

The expression PREV(column_name) produces a value output taken from the
given named column in the preceding frame row.  It doesn’t make any sense
to me to attempt to add the integer 1 to an identifier that is being used
as a value input to a “function”.  It would also seem quite odd if “+ 1”
had something to do with row selection as opposed to simply being an
operator “+(column_name%type, integer)” expression.

Maybe RPR is defining something special here I haven't yet picked up on, in
which case just ignore this.  But if I read: “UP as price > prev(price +
1)” in the opening example it would be quite non-intuitive to reason out
the meaning.  “Price > prev(price) + 1” would mean my current row is at
least one (e.g. dollar per share) more than the value of the previous
period.
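
To make the two spellings under discussion concrete, an illustrative DEFINE
fragment (as noted above, the second form is not accepted by the current
patch):

DEFINE
    UP  AS price > PREV(price) + 1,  -- previous row's price, then add one
    UP2 AS price > PREV(price + 1)   -- the whole expression evaluated over
                                     -- the previous row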

David J.


Re: Using read_stream in index vacuum

2024-10-21 Thread Melanie Plageman
On Mon, Oct 21, 2024 at 4:49 PM Andrei Borodin  wrote:
>
> 21.10.2024, 22:34, "Melanie Plageman" :
>
> The whole point of the read stream callback provided by the caller is
> that the logic to get the next block should be there
>
> We must get number of blocks after examining last block. But callback 
> returning EOF might be called before. With current API we have to restart.
>
> Removing extension lock will not change this.

I was suggesting you call RelationGetNumberOfBlocks() once
current_block == last_exclusive in the callback itself.

- Melanie




Re: race condition in pg_class

2024-10-21 Thread Alexander Lakhin

Hello Noah,

25.09.2024 01:43, Noah Misch wrote:

Pushed, but the pushes contained at least one defect:



Please look at an anomaly introduced with a07e03fd8.
With the attached modification for intra-grant-inplace.spec, running this
test triggers a Valgrind-detected error for me:
==00:00:00:09.624 319033== Conditional jump or move depends on uninitialised 
value(s)
==00:00:00:09.624 319033==    at 0x25D120: DoesMultiXactIdConflict 
(heapam.c:7373)
==00:00:00:09.624 319033==    by 0x25B45A: heap_inplace_lock (heapam.c:6265)
==00:00:00:09.624 319033==    by 0x27D8CB: systable_inplace_update_begin 
(genam.c:867)
==00:00:00:09.624 319033==    by 0x33F717: index_update_stats (index.c:2856)
==00:00:00:09.624 319033==    by 0x33FEE2: index_build (index.c:3106)
==00:00:00:09.625 319033==    by 0x33C7D3: index_create (index.c:1276)
==00:00:00:09.625 319033==    by 0x451000: DefineIndex (indexcmds.c:1216)
==00:00:00:09.625 319033==    by 0x48D091: ATExecAddIndex (tablecmds.c:9156)
==00:00:00:09.625 319033==    by 0x483F8E: ATExecCmd (tablecmds.c:5302)
==00:00:00:09.625 319033==    by 0x483877: ATRewriteCatalogs (tablecmds.c:5186)
==00:00:00:09.625 319033==    by 0x482B9A: ATController (tablecmds.c:4741)
==00:00:00:09.625 319033==    by 0x4827A1: AlterTable (tablecmds.c:4387)
==00:00:00:09.625 319033==

Perhaps current_is_member in heap_inplace_lock() should be initialized
before the DoesMultiXactIdConflict() call as in other places...

Best regards,
Alexander

diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
index 2992c85b44..3339c9f400 100644
--- a/src/test/isolation/specs/intra-grant-inplace.spec
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -90,9 +90,9 @@ permutation
 
 # inplace wait NO KEY UPDATE w/ KEY SHARE
 permutation
-	keyshr5
 	b3
 	sfnku3
+	keyshr5
 	addk2(r3)
 	r3
 


Re: Using read_stream in index vacuum

2024-10-21 Thread Melanie Plageman
On Sun, Oct 20, 2024 at 10:19 AM Andrey M. Borodin  wrote:
>
>
>
> > On 20 Oct 2024, at 15:16, Junwang Zhao  wrote:
> >
> > I'm not sure if I did not express myself correctly, I didn't mean to
> > restart the stream,
> > I mean we can create a new stream for each outer loop, I attached a
> > refactor 0002
> > based on your 0001, correct me if I'm wrong.
>
> I really like how the code looks with this refactoring. But I think we need 
> some input from Thomas.
> Is it OK if we start a handful of streams for 1 page at the end of the vacuum scan?
> How costly is it to start a new scan?

You shouldn't need either loop in btvacuumscan().

For the inner loop:

for (; scanblkno < num_pages; scanblkno++)
{
Buffer buf = read_stream_next_buffer(stream, NULL);
btvacuumpage(&vstate, buf);
if (info->report_progress)
pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
scanblkno);
}

you should just be able to do something like

Buffer buf = read_stream_next_buffer(stream, NULL);
if (BufferIsValid(buf))
  btvacuumpage(&vstate, buf);

Obviously I am eliding some details and clean-up and such. But your
read stream callback should be responsible for advancing the block
number and thus you shouldn't need to loop like this in
btvacuumscan().

The whole point of the read stream callback provided by the caller is
that the logic to get the next block should be contained there
(read_stream_get_block() is an exception to this).

For the outer loop,  I feel like we have options. For example, maybe
the read stream callback can call RelationGetNumberOfBlocks(). I mean
maybe we don't want to have to take a relation extension lock in a
callback.

So, alternatively, we could add some kind of restartable flag to the
read stream API. So, after the callback returns InvalidBlockNumber and
the read_stream_next_buffer() returns NULL, we could call something
like read_stream_update() or read_stream_restart() or something. We
would have updated the BlockRangeReadStreamPrivate->last_exclusive
value. In your case it might not be substantially different
operationally than making new read streams (since you are not
allocating memory for a per-buffer data structure). But, I think the
code would read much better than making new read stream objects in a
loop.
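
As a rough illustration of the first option (the callback itself re-checking
the relation size once it runs out of blocks), with hypothetical names and
leaving the extension-lock question aside:

typedef struct BtVacStreamPrivate
{
    Relation    rel;
    BlockNumber current_block;
    BlockNumber last_exclusive;
} BtVacStreamPrivate;

static BlockNumber
btvacuum_stream_next(ReadStream *stream,
                     void *callback_private_data,
                     void *per_buffer_data)
{
    BtVacStreamPrivate *p = callback_private_data;

    if (p->current_block >= p->last_exclusive)
    {
        /* Re-check the relation size; new pages may have been added. */
        p->last_exclusive = RelationGetNumberOfBlocks(p->rel);
        if (p->current_block >= p->last_exclusive)
            return InvalidBlockNumber;  /* end of stream */
    }

    return p->current_block++;
}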

- Melanie




Re: Using read_stream in index vacuum

2024-10-21 Thread Melanie Plageman
On Mon, Oct 21, 2024 at 3:34 PM Melanie Plageman
 wrote:
>
> For the outer loop,  I feel like we have options. For example, maybe
> the read stream callback can call RelationGetNumberOfBlocks(). I mean
> maybe we don't want to have to take a relation extension lock in a
> callback.

Also, given this note in btvacuumscan:

 * XXX: Now that new pages are locked with RBM_ZERO_AND_LOCK, I don't
 * think the use of the extension lock is still required.

Maybe we can stop requiring the extension lock and then I think it
might be okay to call RelationGetNumberOfBlocks() in the callback.
Probably needs more thought though.

- Melanie




Re: Using read_stream in index vacuum

2024-10-21 Thread Andrei Borodin
Melanie, thanks for your comments.

21.10.2024, 22:34, "Melanie Plageman" :
> The whole point of the read stream callback provided by the caller is
> that the logic to get the next block should be there

We must get number of blocks after examining last block. But callback returning EOF might be called before. With current API we have to restart.

Removing extension lock will not change this.

Best regards, Andrey Borodin.




Re: cost delay brainstorming

2024-10-21 Thread Jay
I had suggested something more than just cost limit throttling, which would
be re-startable vacuum -
https://www.postgresql.org/message-id/CAPdcCKpvZiRCoDxQoo9mXxXAK8w=bx5nqdttgzvhv2suxp0...@mail.gmail.com
.

It may not be difficult to predict patterns of idle periods with cloud
infrastructure and monitoring nowadays. If we keep manual vacuum going in
those idle periods, then there would be much less chance of auto-vacuum
becoming disruptive. This can be built with extensions or inside the engine.

However, this change is a bit bigger than just a config parameter. It
didn't get much traction.

- Jay Sudrik

On Tue, Jun 18, 2024 at 1:09 AM Robert Haas  wrote:

> Hi,
>
> As I mentioned in my talk at 2024.pgconf.dev, I think that the biggest
> problem with autovacuum as it exists today is that the cost delay is
> sometimes too low to keep up with the amount of vacuuming that needs
> to be done. I sketched a solution during the talk, but it was very
> complicated, so I started to try to think of simpler ideas that might
> still solve the problem, or at least be better than what we have
> today.
>
> I think we might able to get fairly far by observing that if the
> number of running autovacuum workers is equal to the maximum allowable
> number of running autovacuum workers, that may be a sign of trouble,
> and the longer that situation persists, the more likely it is that
> we're in trouble. So, a very simple algorithm would be: If the maximum
> number of workers have been running continuously for more than, say,
> 10 minutes, assume we're falling behind and exempt all workers from
> the cost limit for as long as the situation persists. One could
> criticize this approach on the grounds that it causes a very sudden
> behavior change instead of, say, allowing the rate of vacuuming to
> gradually increase. I'm curious to know whether other people think
> that would be a problem.
>
> I think it might be OK, for a couple of reasons:
>
> 1. I'm unconvinced that the vacuum_cost_delay system actually prevents
> very many problems. I've fixed a lot of problems by telling users to
> raise the cost limit, but virtually never by lowering it. When we
> lowered the delay by an order of magnitude a few releases ago -
> equivalent to increasing the cost limit by an order of magnitude - I
> didn't personally hear any complaints about that causing problems. So
> disabling the delay completely some of the time might just be fine.
>
> 1a. Incidentally, when I have seen problems because of vacuum running
> "too fast", it's not been because it was using up too much I/O
> bandwidth, but because it's pushed too much data out of cache too
> quickly. A long overnight vacuum can evict a lot of pages from the
> system page cache by morning - the ring buffer only protects our
> shared_buffers, not the OS cache. I don't think this can be fixed by
> rate-limiting vacuum, though: to keep the cache eviction at a level
> low enough that you could be certain of not causing trouble, you'd
> have to limit it to an extremely low rate which would just cause
> vacuuming not to keep up. The cure would be worse than the disease at
> that point.
>
> 2. If we decided to gradually increase the rate of vacuuming instead
> of just removing the throttling all at once, what formula would we use
> and why would that be the right idea? We'd need a lot of state to
> really do a calculation of how fast we would need to go in order to
> keep up, and that starts to rapidly turn into a very complicated
> project along the lines of what I mooted in Vancouver. Absent that,
> the only other idea I have is to gradually ramp up the cost limit
> higher and higher, which we could do, but we would have no idea how
> fast to ramp it up, so anything we do here feels like it's just
> picking random numbers and calling them an algorithm.
>
> If you like this idea, I'd like to know that, and hear any further
> thoughts you have about how to improve or refine it. If you don't, I'd
> like to know that, too, and any alternatives you can propose,
> especially alternatives that don't require crazy amounts of new
> infrastructure to implement.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
>
>


Re: Using read_stream in index vacuum

2024-10-21 Thread Andrey M. Borodin



> On 22 Oct 2024, at 00:05, Melanie Plageman  wrote:
> 
> I was suggesting you call RelationGetNumberOfBlocks() once
> current_block == last_exclusive in the callback itself.

Consider the following sequence of events:

0. We schedule some buffers for IO
1. We call RelationGetNumberOfBlocks() in the callback when current_block ==
last_exclusive and return InvalidBlockNumber to signal EOF
After this:
2. Some page gets split, creating a new page with number last_exclusive
3. Buffers from IO are returned and vacuumed, but not the page with number
last_exclusive, because it was not scheduled

Maybe I'm missing something...


Best regards, Andrey Borodin.



Re: Fix C23 compiler warning

2024-10-21 Thread Tom Lane
Peter Eisentraut  writes:
> Committed with that change.  Thanks.

Should we back-patch this?  (And also a67a49648d9?)  It's
not hard to imagine people wanting to compile our stable
branches with C23 compilers.  I might leave out v12, which
is just days away from EOL, but this seems like a reasonable
change for all the later branches.

regards, tom lane




Re: Fix C23 compiler warning

2024-10-21 Thread Peter Eisentraut

On 20.10.24 17:56, Tom Lane wrote:

Peter Eisentraut  writes:

This no longer works in C23, because an empty argument list is
now equivalent to (void), rather than an indeterminate one as before.
And so this results in an incompatible function pointer type and
compiler warnings.  (gcc and clang agree on this.)



I think we can fix this easily with a few struct forward declarations,
preserving the goal of not including extra header files, like this:


Do the struct declarations themselves need comments?  Other
places do this like

struct PlannerInfo;    /* avoid including pathnodes.h here */

LGTM other than that nit.


Committed with that change.  Thanks.





Re: Enhancing Memory Context Statistics Reporting

2024-10-21 Thread Michael Paquier
On Mon, Oct 21, 2024 at 11:54:21PM +0530, Rahila Syed wrote:
> On the other hand, [2] provides the statistics for all backends but logs
> them in a file, which may not be convenient for quick access.

To be precise, pg_log_backend_memory_contexts() pushes the memory
context stats to LOG_SERVER_ONLY or stderr, hence this is appended to
the server logs.

> A fixed-size shared memory block, currently accommodating 30 records,
>  is used to store the statistics. This number was chosen arbitrarily,
>  as it covers all parent contexts at level 1 (i.e., direct children of the
> top memory context)
> based on my tests.
> Further experiments are needed to determine the optimal number
> for summarizing memory statistics.

+ * Statistics are shared via fixed shared memory which
+ * can hold statistics for 29 contexts. The rest of the
[...]
+   MemoryContextInfo memctx_infos[30]; 
[...]
+   memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo)); 
[...]
+   size = add_size(size, mul_size(30, sizeof(MemoryContextInfo)));
[...]
+   memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo)); 
[...]
+   memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo)); 

This number is tied to MemoryContextState added by the patch.  Sounds
like this would be better as a constant properly defined rather than
hardcoded in all these places.  This would make the upper-bound more
easily switchable in the patch.
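
For example (sketch; MEMCTX_MAX_SHMEM_STATS is a made-up name):

/* Upper bound on context stats kept in fixed shared memory */
#define MEMCTX_MAX_SHMEM_STATS  30

typedef struct MemoryContextState
{
    /* ... other fields from the patch ... */
    MemoryContextInfo memctx_infos[MEMCTX_MAX_SHMEM_STATS];
} MemoryContextState;

The memset() and add_size()/mul_size() calls would then reference
MEMCTX_MAX_SHMEM_STATS instead of a literal 30.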

+   Datum   path[128];
+   chartype[128];
[...]
+   charname[1024];
+   charident[1024];
+   chartype[128];
+   Datum   path[128];

Again, constants.  Why these values?  You may want to use more
#defines here.

> Any additional statistics that exceed the shared memory capacity
> are written to a file per backend in the PG_TEMP_FILES_DIR. The client
> backend
>  first reads from the shared memory, and if necessary, retrieves the
> remaining data from the file,
> combining everything into a unified view. The files are cleaned up
> automatically
> if a backend crashes or during server restarts.

Is the addition of the file to write any remaining stats really that
useful?  This comes with a heavy cost in the patch with the "in_use"
flag, the various tweaks around the LWLock release/acquire protecting
the shmem area and the extra cleanup steps required after even a clean
restart.  That's a lot of facility for this kind of information.
Another thing that may be worth considering is to put this information
in a DSM per the variable-size nature of the information, perhaps cap
it to a max to make the memory footprint cheaper, and avoid all
on-disk footprint because we don't need it to begin with as this is
information that makes sense only while the server is running.

Also, why the single-backend limitation?  One could imagine a shared
memory area indexed similarly to pgproc entries, that includes
auxiliary processes as well as backends, so that it is possible to
get memory footprints through SQL for more than a single
process at one moment in time.  If each backend has its own area of
shmem to deal with, they could use a shared LWLock on the shmem area
with an extra spinlock while the context data is dumped into memory as
the copy is short-lived.  Each one of them could save the information
in a DSM created only when a dump of the shmem is requested for a
given PID, for example.
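
Roughly the kind of layout that suggests (every name here is hypothetical):

/* One slot per backend/auxiliary process, indexed like pgproc entries. */
typedef struct MemCtxReportSlot
{
    pid_t       target_pid;         /* process whose stats were requested */
    slock_t     mutex;              /* protects the fields below */
    bool        request_pending;    /* a dump has been asked for */
    dsm_handle  stats_handle;       /* DSM with the dumped statistics */
} MemCtxReportSlot;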
--
Michael


signature.asc
Description: PGP signature