Re: Little cleanup of ShmemInit function names

2024-08-29 Thread Heikki Linnakangas

On 29/08/2024 04:06, Alvaro Herrera wrote:

I have had many a chance to visit english.stackexchange.net on
account of something I read in pgsql lists or code comments.

I see what you did there :-).

Committed, with "which see". Thanks everyone!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Disallow USING clause when altering type of generated column

2024-08-29 Thread Peter Eisentraut

On 22.08.24 10:49, Peter Eisentraut wrote:

On 22.08.24 09:59, Yugo NAGATA wrote:
Although ERRCODE_INVALID_TABLE_DEFINITION is used for an error on changing
the type of an inherited column, I guess that is because it prevents
breaking consistency between inherited and inheriting tables as a result
of the command. In this sense, maybe ERRCODE_INVALID_COLUMN_DEFINITION is
proper here, because this check is to prevent inconsistency between
columns in a tuple.


Yes, that was my thinking.  I think of ERRCODE_FEATURE_NOT_SUPPORTED as
"we could add it in the future", but that does not seem to apply here.


+    (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+     errmsg("cannot specify USING when altering type of generated column"),
+     errdetail("Column \"%s\" is a generated column.", colName)));


Do you think ERRCODE_INVALID_TABLE_DEFINITION is more proper than
ERRCODE_INVALID_COLUMN_DEFINITION in this case?


COLUMN seems better here.


Committed and backpatched, with that adjustment.





Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-08-29 Thread Bharath Rupireddy
On Wed, Aug 28, 2024 at 3:14 AM Jeff Davis  wrote:
>
> On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> > 0001 implementation issues:
> >
> > * We need default implementations for AMs that don't implement the
> > new
> > APIs, so that the AM will still function even if it only defines the
> > single-tuple APIs. If we need to make use of the AM's multi_insert
> > method (I'm not sure we do), then the default methods would need to
> > handle that as well. (I thought a previous version had these default
> > implementations -- is there a reason they were removed?)
>
> On second thought, it would be easier to just have the caller check
> whether the AM supports the multi-insert path; and if not, fall back to
> the single-tuple path. The single-tuple path is needed anyway for cases
> like before-row triggers.

Up until v21, the default implementation existed, see
https://www.postgresql.org/message-id/CALj2ACX90L5Mb5Vv%3DjsvhOdZ8BVsfpZf-CdCGhtm2N%2BbGUCSjg%40mail.gmail.com.
I then removed it in v22 to keep the code simple.

IMO, having every caller branch in the code like "if (rel->rd_tableam->
tuple_modify_buffer_insert != NULL) then multi insert; else single
insert;" doesn't look good. IMO, the default implementation approach
keeps things simple, and it can eventually be removed in the *near* future.
Thoughts?

One change I would make to the default implementation from that of v21
is to assign the defaults in GetTableAmRoutine() itself, to avoid if
.. else if .. else chains in the table_modify_XXX() functions.
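To illustrate the trade-off, here is a minimal sketch of the default-implementation approach, using hypothetical, pared-down types (not the actual table AM structs): the default is wired up once in GetTableAmRoutine(), so callers can invoke the multi-insert callback unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the table AM machinery. */
typedef struct Tuple { int value; } Tuple;

typedef struct TableAmRoutine TableAmRoutine;
struct TableAmRoutine
{
    /* single-tuple API: always provided by the AM */
    void (*tuple_insert)(TableAmRoutine *am, Tuple *tup);
    /* optional multi-insert API: may be left NULL by the AM */
    void (*tuple_modify_buffer_insert)(TableAmRoutine *am, Tuple *tups, int ntups);
};

/* Default multi-insert: fall back to the AM's own single-tuple path. */
static void
default_modify_buffer_insert(TableAmRoutine *am, Tuple *tups, int ntups)
{
    for (int i = 0; i < ntups; i++)
        am->tuple_insert(am, &tups[i]);
}

/*
 * Fill in defaults once, here, so callers never need
 * "if (am->tuple_modify_buffer_insert != NULL) ... else ..." branches.
 */
static TableAmRoutine *
GetTableAmRoutine(TableAmRoutine *am)
{
    if (am->tuple_modify_buffer_insert == NULL)
        am->tuple_modify_buffer_insert = default_modify_buffer_insert;
    return am;
}
```

With defaults assigned centrally, every call site reduces to a single indirect call, which is the simplicity argument above.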

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Redux: Throttle WAL inserts before commit

2024-08-29 Thread Jakub Wartak
On Tue, Aug 27, 2024 at 12:51 PM Shirisha Shirisha
 wrote:
>
> Hello hackers,
>
> This is an attempt to resurrect the thread [1] to throttle WAL inserts
> before the point of commit.
>
> Background:
>
> On commit, transactions wait for replication and make sure WAL is
> flushed up to the commit LSN on the standby, when synchronous_commit is on.

Hi Shirisha,

Just to let you know, there was a more recent attempt at this in [1] in
Jan 2023, with a resurrection attempt by Tomas in Nov 2023. The patches
there received plenty of attention back then and were also based on
SyncRepWaitForLSN(), but maybe we ran out of steam and there was not
that much interest at the time.

Maybe you could post a review there (for Tomas's more recent patch) if
it helps your use case even today. That way it could get some traction
again?

-Jakub Wartak.




Re: type cache cleanup improvements

2024-08-29 Thread Alexander Korotkov
On Mon, Aug 26, 2024 at 11:26 AM Alexander Korotkov
 wrote:
>
> On Mon, Aug 26, 2024 at 9:37 AM Andrei Lepikhov  wrote:
> > On 25/8/2024 23:22, Alexander Korotkov wrote:
> > > On Sun, Aug 25, 2024 at 10:21 PM Alexander Korotkov
> > >>> (This Assert is introduced by c14d4acb8.)
> > >>
> > >> Thank you for noticing.  I'm checking this.
> > >
> > > I didn't take into account that TypeCacheEntry could be invalidated
> > > while lookup_type_cache() does syscache lookups.  When I realized that,
> > > I was curious about how it currently works.  It appears that type
> > > cache invalidation mostly just clears the flags while the values
> > > remain in place and are still available to the lookup_type_cache()
> > > caller.  TypeCacheEntry.tupDesc is invalidated directly, and it is
> > > guaranteed to survive only because we don't do any syscache lookups for
> > > composite data types later in lookup_type_cache().  I'm becoming less
> > > of a fan of how this works...  I think these aspects need to be at
> > > least documented in detail.
> > >
> > > Regarding c14d4acb8, it appears to require a redesign.  I'm going to
> > > revert it.
> > Sorry, but I don't understand your point.
> > Let's refocus on the problem at hand. The issue arose when the
> > TypeCacheTypCallback and the TypeCacheRelCallback were executed in
> > sequence within InvalidateSystemCachesExtended.
> > The first callback cleaned the flags TCFLAGS_HAVE_PG_TYPE_DATA and
> > TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS. But the call of the second callback
> > checks the typentry->tupDesc and, because it wasn't NULL, attempted to
> > remove this record a second time.
> > I think there is no case for redesign, but we have a mess in
> > insertion/deletion conditions.
>
> Yes, it's possible to repair the current approach.  But we need to do
> this correctly, not just "not failing with current usages".  Then we
> need to call insert_rel_type_cache_if_needed() not just when we set the
> TCFLAGS_HAVE_PG_TYPE_DATA flag, but every time we set any of the
> TCFLAGS_OPERATOR_FLAGS or tupDesc.  That's a lot of places, not as
> simple and elegant as planned.  This is why I wonder if there
> is a better approach.
>
> Secondly, I'm not terribly happy with the current state of the type
> cache.  The caller of lookup_type_cache() might get already-invalidated
> data.  This is probably OK, because the caller probably holds locks on
> dependent objects to guarantee that the relevant properties of the type
> actually persist.  At the very least this should be documented, but it
> doesn't seem to be.  The setting of tupDesc is sensitive to its order of
> execution.  That feels quite fragile to me, and is not documented either.
> I think this area needs improvements before we push additional
> functionality there.

I see fdd965d074 added proper handling of concurrent invalidation
for the relation cache: if a concurrent invalidation occurs, we retry
building the relation descriptor.  Thus, we end up returning a
valid relation descriptor to the caller.  I wonder if we can take the
same approach for the type cache.  That would make the whole type cache
more consistent and less fragile.  Also, this patch would be simpler.
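As a rough illustration of the retry idea (a simplified sketch with hypothetical types, not the real relcache or typcache code): note the invalidation counter before the build, and rebuild whenever an invalidation arrived in the meantime.

```c
#include <assert.h>

/* Counter bumped whenever a (simulated) invalidation message is processed. */
static unsigned inval_counter = 0;

typedef struct CacheEntry
{
    int      data;
    unsigned built_at;          /* inval_counter value when the build started */
} CacheEntry;

/* Simulate a concurrent invalidation arriving mid-build. */
static void concurrent_invalidation(void) { inval_counter++; }

static CacheEntry
build_entry(int simulate_conflict)
{
    CacheEntry e;
    e.built_at = inval_counter;
    e.data = 42;                        /* stands in for the syscache lookups */
    if (simulate_conflict)
        concurrent_invalidation();      /* entry is stale before we return */
    return e;
}

/*
 * Retry loop in the spirit of fdd965d074: if any invalidation arrived
 * while building, throw the result away and build again, so the caller
 * always receives an entry that was valid at some point after the call.
 */
static CacheEntry
lookup_entry(int conflicts_to_inject)
{
    for (;;)
    {
        CacheEntry e = build_entry(conflicts_to_inject > 0);
        if (conflicts_to_inject > 0)
            conflicts_to_inject--;
        if (e.built_at == inval_counter)
            return e;                   /* no invalidation raced us: valid */
    }
}
```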

--
Regards,
Alexander Korotkov
Supabase




Re: Add contrib/pg_logicalsnapinspect

2024-08-29 Thread Amit Kapila
On Wed, Aug 28, 2024 at 1:25 AM Bertrand Drouvot
 wrote:
>
> On Mon, Aug 26, 2024 at 07:05:27PM +0530, Amit Kapila wrote:
> > On Thu, Aug 22, 2024 at 5:56 PM Bertrand Drouvot
> >  wrote:
> > >
> > > 2. The SnapBuildOnDisk and SnapBuild structs are now exposed to the
> > > public. That means we should now pay much more attention when changing
> > > their contents, but I think it's worth it.
> > >
> >
> > Is it possible to avoid exposing these structures? Can we expose some
> > function from snapbuild.c that provides the required information?
>
> Yeah, that's an option if we don't want to expose those structs to public.
>
> I think we could 1/ create a function that would return a formed
> HeapTuple, or 2/ create multiple functions (about 15) that would return
> the values we are interested in.
>
> I think 2/ is fine as it would give more flexibility (no need to
> retrieve a whole tuple if one is interested in only one value).
>

True, but OTOH, each time we add a new field to these structures, a
new function has to be exposed. I don't have a strong opinion on this
but seeing current use cases, it seems okay to expose a single
function.

> What do you think? Did you have something else in mind?
>

On similar lines, we can also provide a function to get the slot's
on-disk data. IIRC, Bharath had previously proposed a tool to achieve
the same. It is fine if we don't want to add that as part of this
patch but I mentioned it because by having that we can have a set of
functions to view logical decoding data.

-- 
With Regards,
Amit Kapila.




Re: Allow logical failover slots to wait on synchronous replication

2024-08-29 Thread shveta malik
On Thu, Aug 29, 2024 at 2:31 AM John H  wrote:
>
> Hi Amit,
>
> On Mon, Aug 26, 2024 at 11:00 PM Amit Kapila  wrote:
> > I wanted a simple test where in the first case you use
> > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case
> > use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. You
> > can try some variations of it as well. The idea is that even if the
> > performance is less for synchronous_standby_names configuration, we
> > should be able to document it.  This will help users to decide what is
> > ...
> > What is the difference between "Test failover_slots with
> > synchronized_standby_slots = 'rr_1, rr_2,
> > > rr_3, rr_4, rr_5'" and "Test failover_slots waiting on sync_rep no new 
> > > shared cache"? I want to know what configuration did you used for 
> > > synchronous_standby_names in the latter case.
>
> Sorry for the confusion due to the bad naming of the test cases; let
> me rephrase. All three tests had synchronous_standby_names = 'ANY 3
> (A,B,C,D,E)' set, with synchronous_commit = 'on' and failover_slots = 'on'
> for the 10 logical slots.
>
> # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2,
> rr_3, rr_4, rr_5'
> This is the test you wanted where the logical clients are waiting on
> all 5 slots to acknowledge the change since
> 'synchronized_standby_slots' takes priority when set.
>
> # Test failover_slots sync rep no cache
> This test has 'synchronized_standby_slots' commented out, and without
> relying on the new cache introduced in 0003.
> Logical clients will wait on synchronous_standby_names in this case.
>
> # Test failover slots with additional shared cache
> This test also has  'synchronized_standby_slots' commented out, and
> logical clients will wait on the LSNs
> reported from synchronous_standby_names but it relies on a new cache
> to reduce contention on SyncRepLock.
>
> > The idea is that even if the
> > performance is less for synchronous_standby_names configuration, we
> > should be able to document it.  This will help users to decide what is
> > best for them.
>
> Makes sense.
>
> > I am also not sure, especially as the test results didn't show much
> > improvement and the code also becomes a bit complicated. BTW, in the
> > 0003 version, in the below code:
>
> That's fair, I've updated to be more in line with 0002.
>
> > + /* Cache values to reduce contention */
> > + LWLockAcquire(SyncRepLock, LW_SHARED);
> > + memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *)
> > walsndctl->lsn, sizeof(lsn));
> > + LWLockRelease(SyncRepLock);
> >
> > Which mode lsn is being copied? I am not sure if I understood this
> > part of the code.
>
> All of the mode LSNs are being copied in case SyncRepWaitMode changes in
> the next iteration. I've removed that part but kept:
>
> > + memcpy(lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn));
>
> as suggested by Bertrand to avoid the for loop updating values one-by-one.
>
> Here's what's logged after the memcpy:
>
> 2024-08-28 19:41:13.798 UTC [1160413] LOG:  lsn[0] after memcpy is: 
> 279/752C7FF0
> 2024-08-28 19:41:13.798 UTC [1160413] LOG:  lsn[1] after memcpy is: 
> 279/752C7F20
> 2024-08-28 19:41:13.798 UTC [1160413] LOG:  lsn[2] after memcpy is: 
> 279/752C7F20
>
> > In the 0002 version, in the following code [1], you are referring to
> > LSN mode which is enabled for logical walsender irrespective of the
> > mode used by the physical walsender. It is possible that they are
> > always the same but that is not evident from the code or comments in
> > the patch.
>
> They are almost always the same; I tried to indicate that with the
> following comment in the patch, but I could make it more explicit?
> > /* Initialize value in case SIGHUP changing to SYNC_REP_NO_WAIT */
>
> At the beginning we set
>
> > int mode = SyncRepWaitMode;
>
> At this time, the logical walsender mode it's checking against is the
> same as what the physical walsenders are using.
> It's possible that this mode is no longer the same when we execute the
> following check:
>
> > if (lsn[mode] >= wait_for_lsn)
>
> because of a SIGHUP to synchronous_commit that changes SyncRepWaitMode
> to some other value
>
> We cache the value instead of
> > if (lsn[SyncRepWaitMode] >= wait_for_lsn)
>
> because SYNC_REP_NO_WAIT is -1. If SyncRepWaitMode is set to this it
> leads to out of bounds access.
>
> I've attached a new patch that removes the shared cache introduced in 0003.
>

Thanks for the patch. A few comments and queries:

1)
+ static XLogRecPtr   lsn[NUM_SYNC_REP_WAIT_MODE];

We should name it 'lsns' as there are multiple.

2)

+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }

Can we do it like below similar to what you have done at another place:
memset(lsn, InvalidXLogRecPtr, sizeof(lsn));

3)
+ if (!initialized)
+ {
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }
+ }

I do not see 'initialized' set to TRUE anywhere. Can you please
elaborate?
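For reference, the mode-snapshot pattern discussed in the quoted mail (why indexing with a live SyncRepWaitMode is unsafe while a local copy is fine) can be sketched as follows. This is a simplified illustration, not the actual walsender code; the constants and values are illustrative.

```c
#include <assert.h>

#define SYNC_REP_NO_WAIT          (-1)
#define NUM_SYNC_REP_WAIT_MODE    3

typedef unsigned long XLogRecPtr;

/* May be changed at any time by SIGHUP processing. */
static volatile int SyncRepWaitMode = 1;

/*
 * Snapshot the mode once at the top; even if a SIGHUP flips
 * SyncRepWaitMode to SYNC_REP_NO_WAIT (-1) afterwards, we never
 * index lsn[] out of bounds with the cached copy.
 */
static int
wait_done(XLogRecPtr wait_for_lsn, const XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE])
{
    int mode = SyncRepWaitMode;         /* local snapshot */

    if (mode == SYNC_REP_NO_WAIT)
        return 1;                       /* nothing to wait for */
    return lsn[mode] >= wait_for_lsn;
}
```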

Make printtup a bit faster

2024-08-29 Thread Andy Fan


Usually I see printtup in the perf report with a noticeable ratio. Take
"SELECT * FROM pg_class" as an example; we can see:

85.65% 3.25%  postgres  postgres  [.] printtup

The high-level design of printtup is:

1. Use a pre-allocated StringInfo, DR_printtup.buf, to store the data
for each tuple.
2. For each datum in the tuple, call the type-specific out function to
get a cstring.
3. After getting the cstring, figure out the "len" and add both the len
and the data into DR_printtup.buf.
4. After all the datums are handled, socket_putmessage copies them into
PqSendBuffer.
5. When the usage of PqSendBuffer reaches PqSendBufferSize, use the send
syscall to send the data to the client (copying it from user space to
kernel space again).

Part of the slowness is caused by "memcpy", "strlen" and palloc in the
out functions:

8.35% 8.35%  postgres  libc.so.6  [.] __strlen_avx2
4.27% 0.00%  postgres  libc.so.6  [.] __memcpy_avx_unaligned_erms
3.93% 3.93%  postgres  postgres  [.] palloc (partly caused by the out
functions)
5.70% 5.70%  postgres  postgres  [.] AllocSetAlloc (partly caused by
printtup)

My high-level proposal is to define type-specific print functions like:

oidprint(Datum datum, StringInfo buf)
textprint(Datum datum, StringInfo buf)

These functions should append both the data and its len into buf directly.

For the oidprint case, we can avoid:
- the dedicated palloc in the oid out function;
- the memcpy from that memory into DR_printtup.buf.

For the textprint case, we can additionally avoid:
- the strlen, since we can figure out the length from varlena.vl_len.

int2/4/8/timestamp/date/time are similar to oid, and numeric and varchar
are similar to text. This covers almost all the commonly used types.

Hard-coding the relationship between commonly used types and their
{type}print function OIDs doesn't look cool, but adding a new attribute
to pg_type looks too aggressive. Anyway, this is the next topic to
discuss.

If a type's print function is not defined, we can still use the out
function (and PrinttupAttrInfo caches FmgrInfo rather than
FunctionCallInfo, so there is some optimization in this step as well).

This proposal covers steps 2 & 3.  If we want to do something more
aggressive, we could let the xxxprint functions print to PqSendBuffer
directly, but that is more complex and needs some infrastructure
changes. The memcpy in step 4 accounts for "1.27%
__memcpy_avx_unaligned_erms" in my case above.

What do you think?
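A minimal sketch of the per-type print idea, with hypothetical names and a plain char buffer standing in for StringInfo/DR_printtup.buf: the print function formats the value on the stack, then appends the length word and the bytes straight into the output buffer, with no palloc'd intermediate cstring and no strlen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimal stand-in for StringInfo. */
typedef struct { char data[256]; int len; } Buf;

static void
append_bytes(Buf *b, const void *p, int n)
{
    memcpy(b->data + b->len, p, n);
    b->len += n;
}

/*
 * Hypothetical oidprint(): format the value directly into a small
 * on-stack buffer, then append the length word and the digits into
 * the output buffer: no intermediate allocation, no strlen (snprintf
 * already tells us the length).
 */
static void
oidprint(unsigned int oid, Buf *buf)
{
    char tmp[16];
    int  n = snprintf(tmp, sizeof(tmp), "%u", oid);

    append_bytes(buf, &n, sizeof(n));   /* length word (host order here) */
    append_bytes(buf, tmp, n);          /* the digits themselves */
}
```

(The real DataRow format uses a network-order int32 length; host order is used here only to keep the sketch short.)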

-- 
Best Regards
Andy Fan





Re: Add contrib/pg_logicalsnapinspect

2024-08-29 Thread Bertrand Drouvot
Hi,

On Thu, Aug 29, 2024 at 02:51:36PM +0530, Amit Kapila wrote:
> On Wed, Aug 28, 2024 at 1:25 AM Bertrand Drouvot
>  wrote:
> >
> > On Mon, Aug 26, 2024 at 07:05:27PM +0530, Amit Kapila wrote:
> > > On Thu, Aug 22, 2024 at 5:56 PM Bertrand Drouvot
> > >  wrote:
> > > >
> > > > 2. The SnapBuildOnDisk and SnapBuild structs are now exposed to the
> > > > public. That means we should now pay much more attention when
> > > > changing their contents, but I think it's worth it.
> > > >
> > >
> > > Is it possible to avoid exposing these structures? Can we expose some
> > > function from snapbuild.c that provides the required information?
> >
> > Yeah, that's an option if we don't want to expose those structs to public.
> >
> > I think we could 1/ create a function that would return a formed
> > HeapTuple, or 2/ create multiple functions (about 15) that would
> > return the values we are interested in.
> >
> > I think 2/ is fine as it would give more flexibility (no need to
> > retrieve a whole tuple if one is interested in only one value).
> >
> 
> True, but OTOH, each time we add a new field to these structures, a
> new function has to be exposed. I don't have a strong opinion on this
> but seeing current use cases, it seems okay to expose a single
> function.

Yeah, that's fair. And now I'm wondering if we need an extra module at
all. I think we could "simply" expose 2 new functions in core. Thoughts?

> > What do you think? Did you have something else in mind?
> >
> 
> On similar lines, we can also provide a function to get the slot's
> on-disk data.

Yeah, having a way to expose the data from the disk fully makes sense to me.
 
> IIRC, Bharath had previously proposed a tool to achieve
> the same. It is fine if we don't want to add that as part of this
> patch but I mentioned it because by having that we can have a set of
> functions to view logical decoding data.
>

That's right. I think this one would be simple enough to expose as one or
two functions in core too (and probably would not need an extra module).

Regards,
 
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: PG_TEST_EXTRA and meson

2024-08-29 Thread Ashutosh Bapat
On Wed, Aug 28, 2024 at 8:46 PM Nazir Bilal Yavuz  wrote:
>
> Also, I think TEST 110 and 170 do not look correct to me. In the
> current way, we do not pass PG_TEST_EXTRA to the make command.
>
> 110 should be:
> 'cd $XID_MODULE_DIR && PG_TEST_EXTRA=xid_wraparound make check'
> instead of 'PG_TEST_EXTRA=xid_wraparound cd $XID_MODULE_DIR && make
> check'
>
> 170 should be:
> 'cd $XID_MODULE_DIR && PG_TEST_EXTRA="" make check && cd $PGDir'
> instead of 'PG_TEST_EXTRA="" cd $XID_MODULE_DIR && make check && cd
> $PGDir'
>

You are right. Jacob did point that out, but I didn't fix all the
places back then. Here's the updated script.

-- 
Best Wishes,
Ashutosh Bapat


test_pg_test_extra.sh
Description: application/shellscript


Re: Conflict Detection and Resolution

2024-08-29 Thread Amit Kapila
On Fri, Aug 23, 2024 at 10:39 AM shveta malik  wrote:
>
> Please find issues which need some thoughts and approval for
> time-based resolution and clock-skew.
>
> 1)
> Time based conflict resolution and two phase transactions:
>
> Time-based conflict resolution (last_update_wins) is the one
> resolution which will not result in data divergence, assuming
> clock skew is taken care of. But when it comes to two-phase
> transactions, that might not be the case. For a two-phase transaction,
> we do not have the commit timestamp when the changes are being applied.
> Thus, for time-based comparison, it was initially decided to use the
> prepare timestamp, but that may result in data divergence.  Please see
> the example at [1].
>
> Example at [1] is a tricky situation, and thus in the initial draft,
> we decided to restrict usage of 2pc and CDR together. The plan is:
>
> a) During Create subscription, if the user has given last_update_wins
> resolver for any conflict_type and 'two_phase' is also enabled, we
> ERROR out.
> b) During Alter subscription, if the user tries to update resolver to
> 'last_update_wins' but 'two_phase' is enabled, we error out.
>
> Another solution could be to save both prepare_ts and commit_ts. Then,
> when any txn comes up for conflict resolution, we first check if
> prepare_ts is available and use that, else use commit_ts.  Availability
> of prepare_ts would indicate it was a prepared txn, and thus even if it
> is committed, we should use prepare_ts for comparison, for consistency.
> This has some overhead of storing prepare_ts along with
> commit_ts. But if the number of prepared txns is reasonably small,
> this overhead should be small.
>

Yet another idea is that if a conflict is detected and the
resolution strategy is last_update_wins, then from that point we start
writing all the changes to a file, similar to what we do for
streaming mode, and only once commit_prepared arrives do we read and
apply the changes. That would solve this problem.
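The prepare_ts/commit_ts preference described in the quoted mail could look roughly like this. Hypothetical types and names throughout, and the tie-breaking direction is an assumption of this sketch, not something decided in the thread.

```c
#include <assert.h>

typedef struct TxnTimes
{
    long prepare_ts;        /* valid only if has_prepare_ts */
    long commit_ts;
    int  has_prepare_ts;    /* was this a prepared transaction? */
} TxnTimes;

/*
 * Use prepare_ts when available, so a prepared transaction is compared
 * consistently whether or not it has committed yet; else use commit_ts.
 */
static long
resolution_ts(const TxnTimes *t)
{
    return t->has_prepare_ts ? t->prepare_ts : t->commit_ts;
}

/* last_update_wins: does the remote change win over the local one? */
static int
remote_wins(const TxnTimes *local, const TxnTimes *remote)
{
    return resolution_ts(remote) > resolution_ts(local);
}
```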

> We currently plan to go with restricting 2pc and last_update_wins
> together, unless others have different opinions.
>

Sounds reasonable but we should add comments on the possible solution
like the one I have mentioned so that we can extend it afterwards.

> ~~
>
> 2)
> parallel apply worker and conflict-resolution:
> As discussed in [2] (see last paragraph in [2]), for streaming of
> in-progress transactions by parallel worker, we do not have
> commit-timestamp with each change and thus it makes sense to disable
> parallel apply worker with CDR. The plan is to not start parallel
> apply worker if 'last_update_wins' is configured for any
> conflict_type.
>

The other idea is that we can let the changes be written to a file if any
conflict is detected and then, at commit time, let the remaining changes
be applied by the apply worker. This could introduce some complexity, so,
as with two-phase, we can extend this functionality later.

>  ~~
>
> 3)
> parallel apply worker and clock skew management:
> Regarding clock-skew management as discussed in [3], we will wait for
> the local clock to come within tolerable range during 'begin' rather
> than before 'commit'. And this wait needs commit-timestamp in the
> beginning, thus we plan to restrict starting pa-worker even when
> clock-skew related GUCs are configured.
>
> Earlier we had restricted both 2pc and parallel worker startup
> when detect_conflict was enabled, but now, since the detect_conflict
> parameter is removed, we will change the implementation to restrict
> all 3 above cases when last_update_wins is configured. When the
> changes are done, we will post the patch.
>

At this stage, we are not sure how we want to deal with clock skew.
There is an argument that clock-skew should be handled outside the
database, so we can probably have the clock-skew-related stuff in a
separate patch.

> ~~
>
> 4)
> 
> Earlier, when 'detect_conflict' was enabled, we were giving a WARNING if
> 'track_commit_timestamp' was not enabled. This was during CREATE and
> ALTER subscription. Now, with this parameter removed, the WARNING has
> also been removed. But I think we need to bring back this WARNING.
> Currently the default resolver set may work without
> 'track_commit_timestamp', but when the user gives CONFLICT RESOLVER in
> create-sub or alter-sub explicitly, configuring resolvers to
> non-default values (or, say, any values; it does not matter if a few are
> defaults), we may still emit this warning to alert the user:
>
> 2024-07-26 09:14:03.152 IST [195415] WARNING:  conflict detection
> could be incomplete due to disabled track_commit_timestamp
> 2024-07-26 09:14:03.152 IST [195415] DETAIL:  Conflicts update_differ
> and delete_differ cannot be detected, and the origin and commit
> timestamp for the local row will not be logged.
>
> Thoughts?
>
> If we emit this WARNING during each resolution, then it may flood our
> log files, thus it seems better to emit it during create or alter
> subscription instead of during resolution.
>

Sounds reasonable.

-- 
With Regards,
Amit Kapila.

Re: ANALYZE ONLY

2024-08-29 Thread Melih Mutlu
Hi Michael,

Michael Harris wrote on Fri, 23 Aug 2024 at 13:01:

> V2 of the patch is attached.
>

Thanks for updating the patch. I have a few more minor feedbacks.

-ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
+ANALYZE [ ( option [, ...] ) ] [ [ ONLY ] table_and_columns [, ...] ]


I believe moving "[ ONLY ]" into the definition of table_and_columns
below, as you did with "[ * ]", might be better for consistency with
other places (see [1]).

+ if ((options & VACOPT_VACUUM) && is_partitioned_table && ! include_children)


There are also some issues with coding conventions in some places (e.g.
the space between "!" and "include_children" above). I think running
pgindent would resolve such issues in most places.


[1] https://www.postgresql.org/docs/16/sql-createpublication.html

Regards,
-- 
Melih Mutlu
Microsoft


Re: Make printtup a bit faster

2024-08-29 Thread David Rowley
On Thu, 29 Aug 2024 at 21:40, Andy Fan  wrote:
>
>
> Usually I see printtup in the perf-report with a noticeable ratio.

> Part of the slowness is caused by "memcpy", "strlen" and palloc in
> outfunction.

Yeah, it's a pretty inefficient API from a performance point of view.

> My high level proposal is define a type specific print function like:
>
> oidprint(Datum datum, StringInfo buf)
> textprint(Datum datum, StringInfo buf)

I think what we should do instead is make the output functions take a
StringInfo and just pass it the StringInfo where we'd like the bytes
written.

That of course would require rewriting all the output functions for
all the built-in types, so not a small task.  Extensions make that job
harder. I don't think it would be good to force extensions to rewrite
their output functions, so perhaps some wrapper function could help us
align the APIs for extensions that have not been converted yet.
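One way such a wrapper could look, hedged heavily: hypothetical names, and malloc/free standing in for palloc/pfree. The adapter calls the legacy cstring-returning out function and copies its result into a StringInfo-style buffer, so unconverted extensions keep working behind the new API (while still paying the strlen/copy cost the conversion would eliminate).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char data[256]; int len; } StrBuf;   /* StringInfo stand-in */

/* Legacy out-function shape: returns a freshly allocated cstring. */
typedef char *(*LegacyOutFn)(long datum);

/* Adapter: call the legacy function, copy its bytes, free the temporary. */
static void
out_via_legacy(LegacyOutFn fn, long datum, StrBuf *buf)
{
    char   *s = fn(datum);
    size_t  n = strlen(s);              /* the cost the new API would avoid */

    memcpy(buf->data + buf->len, s, n);
    buf->len += (int) n;
    free(s);
}

/* Example "extension" out function that has not been converted. */
static char *
legacy_int_out(long datum)
{
    char *s = malloc(32);
    snprintf(s, 32, "%ld", datum);
    return s;
}
```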

There's a similar problem with input functions not having knowledge of
the input length. You only have to look at textin() to see how useful
that could be. Fixing that would probably make COPY FROM horrendously
faster. Team that up with SIMD for the delimiter char search and COPY
would get a bit better still. Neil Conway did propose the SIMD part in
[1], but it's just not nearly as good as it could be when still having
to perform the strlen() calls.

I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

David

[1] 
https://postgr.es/m/caow5syb1hprqkrzjcsrcp1eauqzzy+njz-awbboumogjhjs...@mail.gmail.com




Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-29 Thread Anthonin Bonnefoy
On Wed, Aug 28, 2024 at 12:24 AM Thomas Munro  wrote:
> 2.  I tested against LLVM 10-18, and found that 10 and 11 lack some
> needed symbols.  So I just hid this code from them.  Even though our
> stable branches support those and even older versions, I am not sure
> if it's worth trying to do something about that for EOL'd distros that
> no one has ever complained about.  I am willing to try harder if
> someone thinks that's important...

I would also assume that people using arm64 are more likely to use
recent versions than not.

I've done some additional tests on different LLVM versions with both
the unpatched version (to make sure the crash was triggered) and the
patched version. I'm attaching the test scripts I've used for reference.
They target a Kubernetes pod, since it was the easiest way for me to
get a test Ubuntu Jammy environment:
- setup_pod.sh: Install necessary packages, get multiple llvm
versions, fetch and compile master and patched version of postgres on
different LLVM version
- run_test.sh: go through all LLVM versions for both unpatched and
patched postgres to run the test_script.sh
- test_script.sh: ran inside the pod to setup the db with the
necessary tables and check if the crash happens

This generated the following output:
Test unpatched version on LLVM 19, : Crash triggered
Test unpatched version on LLVM 18, libLLVM-18.so.18.1: Crash triggered
Test unpatched version on LLVM 17, libLLVM-17.so.1: Crash triggered
Test unpatched version on LLVM 16, libLLVM-16.so.1: Crash triggered
Test unpatched version on LLVM 15, libLLVM-15.so.1: Crash triggered
Test unpatched version on LLVM 14, libLLVM-14.so.1: Crash triggered
Test unpatched version on LLVM 13, libLLVM-13.so.1: Crash triggered

Test patched version on LLVM 19, : Query ran successfully
Test patched version on LLVM 18, libLLVM-18.so.18.1: Query ran successfully
Test patched version on LLVM 17, libLLVM-17.so.1: Query ran successfully
Test patched version on LLVM 16, libLLVM-16.so.1: Query ran successfully
Test patched version on LLVM 15, libLLVM-15.so.1: Query ran successfully
Test patched version on LLVM 14, libLLVM-14.so.1: Query ran successfully
Test patched version on LLVM 13, libLLVM-13.so.1: Query ran successfully

I print the libLLVM library linked to llvmjit.so in the output to double
check whether I'm testing the correct version. The LLVM 19 package only
provides static libraries (probably because it's still a release
candidate?), so it shows as empty in the output. There was no LLVM 12
available when using the llvm.sh script, so I couldn't test it. As for
the results, pre-patch PG crashed in all cases as expected, while the
patched version was able to run the query successfully.

> Next, I think we should wait to see if the LLVM project commits that
> PR, this so that we can sync with their 19.x stable branch, instead of
> using code from a PR.  Our next minor release is in November, so we
> have some time.  If they don't commit it, we can consider it anyway: I
> mean, it's crashing all over the place in production, and we see that
> other projects are shipping this code already.

The PR [1] just received an approval and it sounds like they are OK with
eventually merging it.

[1] https://github.com/llvm/llvm-project/pull/71968
#!/bin/bash

# Don't want coredumps
ulimit -c 0

# Stop running PG
/var/lib/postgresql/.local/bin/pg_ctl -D /var/lib/postgresql/db_data stop &>> /dev/null

# Clean db_data
rm -rf /var/lib/postgresql/db_data

# Create new dbdata
/var/lib/postgresql/.local/bin/initdb -D /var/lib/postgresql/db_data -Apeer > /dev/null
echo "port = 5432
shared_buffers='4GB'
" > /var/lib/postgresql/db_data/postgresql.conf

# Start and setup test partitioned table
/var/lib/postgresql/.local/bin/pg_ctl -D /var/lib/postgresql/db_data start -l /tmp/pg.log > /dev/null
/var/lib/postgresql/.local/bin/pgbench -i --partitions=128 2> /dev/null

# Run the query to trigger the issue
/var/lib/postgresql/.local/bin/psql options=-cjit_above_cost=0 -c 'SELECT count(bid) from pgbench_accounts;' &>> /dev/null
exit_code=$?

if (( $exit_code == 2 )); then
echo "Crash triggered"
elif (( $exit_code == 0 )); then
echo "Query ran successfully"
else
echo "Other issue while running the query"
fi
#!/bin/bash
set -eu

TEST_POD=$(kubectl get pod -l app="test_pg" -nsre -o=jsonpath='{.items[*].metadata.name}')
LLVM_VERSIONS=(19 18 17 16 15 14 13)

get_llvm_version()
{
kubectl exec "$pod" -- bash -c "ldd /var/lib/postgresql/.local/lib/llvmjit.so | grep libLLVM | awk '{print \$1}'"
}

run_test_on_pod()
{
local pod="$1"

kubectl cp test_script.sh "$pod":/var/lib/postgresql/

for version in ${LLVM_VERSIONS[@]}; do
build_dir="~/postgres_build/build_${version}"
kubectl exec "$pod" -- bash -c "su postgres -c \" cd $build_dir; make -j19 install &>> /dev/null \" "

echo -n "Test unpatched version on LLVM $version, $(get_llvm_version): "
kubectl exec "$pod" -- bash -c "su postgres -c \" cd; bash test_script.sh \" "
done

for version in 

Re: Eager aggregation, take 3

2024-08-29 Thread Robert Haas
On Wed, Aug 28, 2024 at 10:26 PM Richard Guo  wrote:
> Yeah, I'm concerned about this too.  In addition to the inaccuracies
> in aggregation estimates, our estimates for joins are sometimes not
> very accurate either.  All this are likely to result in regressions
> with eager aggregation in some cases.  Currently I don't have a good
> answer to this problem.  Maybe we can run some benchmarks first and
> investigate the regressions discovered on a case-by-case basis to better
> understand the specific issues.

While it's true that we can make mistakes during join estimation, I
believe aggregate estimation tends to be far worse.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




[BUG] Security bugs affected version detected.

2024-08-29 Thread James Watt
Our tool has detected that PostgreSQL versions REL9_6_18 through
REL9_6_24 may also be affected by the vulnerability CVE-2022-2625. The
vulnerability database does not include these versions, and you may not
have fixed it in the REL9_6 branch. Is there a need to backport the
patch for CVE-2022-2625?


Re: Streaming read-ready sequential scan code

2024-08-29 Thread Alexander Lakhin

Hello Thomas,

29.08.2024 01:16, Thomas Munro wrote:

Yeah.  That's quite interesting, and must destabilise that
simple-minded demo.  I'm curious to know exactly what contention is
causing that (about 3/4 of a millisecond that I don't see and now I
want to know what it's waiting for), but it's a very crude test
lacking timer resolution in the earlier messages, and it's an
unrelated topic and a distraction.  Perhaps it explains why you saw
two different behaviours in Q15 with the patch and I didn't, though.
Really it shouldn't be so sensitive to such variations, it's obviously
a terrible plan, and TPC-DS needs a planner hacker mega-brain applied
to it; I'm going to try to nerd-snipe one...


I looked at two perf profiles of such out-of-sync processes and found no
extra calls or whatsoever in the slow one, it just has the number of perf
samples increased proportionally. It made me suspect CPU frequency
scaling... Indeed, with the "performance" governor set and the boost mode
disabled, I'm now seeing much more stable numbers (I do this tuning before
running performance tests, but I had forgotten about it when I ran your
test, my bad).

I'm sorry for the noise and the distraction.

Still, now I can confirm your results. Without the patch, two parallel
workers gave "Buffers: shared hit=217 / Buffers: shared hit=226" 10 times
out of 10. And with the patch, I've got
"shared hit=219 / shared hit=224" 3 times,
"shared hit=220 / shared hit=223" 4 times,
"shared hit=221 / shared hit=222" 3 times of 10.

On b7b0f3f27~1, my results are:
"shared hit=219 / shared hit=224": 2
"shared hit=220 / shared hit=223": 3
"shared hit=221 / shared hit=222": 4
"shared hit=218 / shared hit=225": 1

Best regards,
Alexander




Re: [BUG] Security bugs affected version detected.

2024-08-29 Thread Daniel Gustafsson
> On 29 Aug 2024, at 14:54, James Watt  wrote:
> 
> Our tool has detected that PostgreSQL versions REL9_6_18 through
> REL9_6_24 may also be affected by the vulnerability CVE-2022-2625. The
> vulnerability database does not include these versions, and you may not
> have fixed it in the REL9_6 branch. Is there a need to backport the
> patch for CVE-2022-2625?

9.6 was EOL at the time CVE-2022-2625 was announced and thus wasn't considered
for a backport of the fix; the project only applies fixes to supported
versions.  Anyone still running 9.6 in production is highly recommended to
upgrade to a supported version.

--
Daniel Gustafsson





Re: Eager aggregation, take 3

2024-08-29 Thread Robert Haas
On Wed, Aug 28, 2024 at 11:38 PM Tender Wang  wrote:
> I upload EXPLAIN(COSTS ON, ANALYZE) test to [1].
> I ran the same query three times, and I chose the third time result.
> You can check 19_off_explain.out and 19_on_explain.out.

So, in 19_off_explain.out, we got this:

 ->  Finalize GroupAggregate  (cost=666986.48..667015.35
rows=187 width=142) (actual time=272.649..334.318 rows=900 loops=1)
   ->  Gather Merge  (cost=666986.48..667010.21 rows=187
width=142) (actual time=272.644..333.847 rows=901 loops=1)
 ->  Partial GroupAggregate
(cost=665986.46..665988.60 rows=78 width=142) (actual
time=266.379..267.476 rows=300 loops=3)
   ->  Sort  (cost=665986.46..665986.65
rows=78 width=116) (actual time=266.367..266.583 rows=5081 loops=3)

And in 19_on_explan.out, we got this:

 ->  Finalize GroupAggregate  (cost=666987.03..666989.77
rows=19 width=142) (actual time=285.018..357.374 rows=900 loops=1)
   ->  Gather Merge  (cost=666987.03..666989.25 rows=19
width=142) (actual time=285.000..352.793 rows=15242 loops=1)
 ->  Sort  (cost=665987.01..665987.03 rows=8
width=142) (actual time=273.391..273.580 rows=5081 loops=3)
   ->  Nested Loop  (cost=665918.00..665986.89
rows=8 width=142) (actual time=252.667..269.719 rows=5081 loops=3)
 ->  Nested Loop
(cost=665917.85..665985.43 rows=8 width=157) (actual
time=252.656..264.755 rows=5413 loops=3)
   ->  Partial GroupAggregate
(cost=665917.43..665920.10 rows=82 width=150) (actual
time=252.643..255.627 rows=5413 loops=3)
 ->  Sort
(cost=665917.43..665917.64 rows=82 width=124) (actual
time=252.636..252.927 rows=5413 loops=3)

So, the patch was expected to cause the number of rows passing through
the Gather Merge to decrease from 187 to 19, but actually caused the
number of rows passing through the Gather Merge to increase from 901
to 15242. When the PartialAggregate was positioned at the top of the
join tree, it reduced the number of rows from 5081 to 300; but when it
was pushed down below two joins, it didn't reduce the row count at
all, and the subsequent two joins reduced it by less than 10%.

Now, you could complain about the fact that the Parallel Hash Join
isn't well-estimated here, but my question is: why does the planner
think that the PartialAggregate should go specifically here? In both
plans, the PartialAggregate isn't expected to change the row count.
And if that is true, then it's going to be cheapest to do it at the
point where the joins have reduced the row count to the minimum value.
Here, that would be at the top of the plan tree, where we have only
5081 estimated rows, but instead, the patch chooses to do it as soon
as we have all of the grouping columns, when we still have 5413 rows.
I don't understand why that path wins on cost, unless it's just that
the paths compare fuzzily the same, in which case it kind of goes to
my earlier point about not really having the statistics to know which
way is actually going to be better.
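The row counts above can be turned into a crude back-of-the-envelope comparison. This is only an illustration of the argument, not PostgreSQL's cost model; the plan encodings and the "work" metric are made up, with the row counts taken from the two EXPLAIN outputs:

```python
# Toy illustration -- NOT PostgreSQL's cost model. Each plan is a
# list of (node, output rows) pairs; summing the rows that flow
# through each node is a crude proxy for the work done.

def pipeline_rows(plan):
    return sum(rows for _, rows in plan)

# 19_off: PartialAggregate at the top of the join tree reduces
# 5081 rows to 300 per worker; ~901 rows cross the Gather Merge.
agg_late = [("joins", 5081), ("partial_agg", 300), ("gather_merge", 901)]

# 19_on: PartialAggregate pushed below two joins sees 5413 rows and
# emits 5413 (no reduction); ~15242 rows cross the Gather Merge.
agg_early = [("partial_agg", 5413), ("joins", 5081), ("gather_merge", 15242)]

# When the aggregate does not reduce rows, doing it late -- after the
# joins have minimized the row count -- processes far fewer rows.
assert pipeline_rows(agg_late) < pipeline_rows(agg_early)
```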

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add contrib/pg_logicalsnapinspect

2024-08-29 Thread Bharath Rupireddy
On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot
 wrote:
>
> Yeah that's fair. And now I'm wondering if we need an extra module. I think
> we could "simply" expose 2 new functions in core, thoughts?
>
> > > What do you think? Did you have something else in mind?
> > >
> >
> > On similar lines, we can also provide a function to get the slot's
> > on-disk data.
>
> Yeah, having a way to expose the data from the disk makes fully sense to me.
>
> > IIRC, Bharath had previously proposed a tool to achieve
> > the same. It is fine if we don't want to add that as part of this
> > patch but I mentioned it because by having that we can have a set of
> > functions to view logical decoding data.
>
> That's right. I think this one would be simply enough to expose one or two
> functions in core too (and probably would not need an extra module).

+1 for functions in core, unless this extra module
pg_logicalsnapinspect is meant to work as a tool that is helpful even
when the server is down.

FWIW, I wrote pg_replslotdata as a tool, not as an extension for
reading on-disk replication slot data to help when the server is down
- 
https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com.
When the server is running, pg_get_replication_slots() pretty much
gives the on-disk contents.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Use streaming read API in ANALYZE

2024-08-29 Thread Mats Kindahl
On Sat, Aug 24, 2024 at 5:31 AM Thomas Munro  wrote:

> On Thu, Aug 22, 2024 at 7:31 PM Mats Kindahl  wrote:
> > The alternate version proposed by Nazir allows you to decide which
> interface to use.
> > Reverting the patch entirely would also solve the problem.
>

After digging through the code a little more I discovered that
there actually is another one: move the ReadStream struct into
read_stream.h.


> > Passing down the block sampler and the strategy to scan_begin() and move
> the ReadStream setup in analyze.c into initscan() in heapam.c, but this
> requires adding new parameters to this function.
> > Having accessors that allow you to get the block sampler and strategy
> from the ReadStream object.
>
> I'm a bit confused about how it can make sense to use the same
> BlockSampler with a side relation/fork.  Could you point me at the
> code?
>

Sorry, that was a bit unclear. The intention was not to re-use the block
sampler but to set up a new one with parameters from the original block
sampler, which would require access to it. (The strategy is less of a
problem since only one is used.)

To elaborate on the situation:

For the TAM in question we have two different storage areas, both are
heaps. Both relations use the same attributes "publicly" (they are
internally different, but we transform them to look the same). One of the
relations is the "default" one and is stored in rd_rel. In order to run
ANALYZE, we need to sample blocks from both relations, in slightly
different ways.

With the old interface, we faked the number of blocks in relation_size()
callback and claimed that there were N + M blocks. When then being asked
about a block by block number, we could easily pick the correct relation
and just forward the call.
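As a rough illustration of the trick described above (not actual table AM code; the relation sizes and names here are made up), the old interface let the TAM advertise N + M blocks and demultiplex block numbers itself:

```python
# Sketch of the "fake number of blocks" trick, assuming two storage
# relations of N and M blocks. The TAM reports N + M blocks and maps
# each global block number back to (relation, local block number)
# when asked to read it.

N_DEFAULT = 100   # blocks in the "default" relation (made-up size)
N_OTHER = 40      # blocks in the second relation (made-up size)

def relation_size():
    # What the old relation_size() callback pretended.
    return N_DEFAULT + N_OTHER

def resolve_block(global_blockno):
    # Forward the request to the relation that owns the block.
    if global_blockno < N_DEFAULT:
        return ("default", global_blockno)
    return ("other", global_blockno - N_DEFAULT)

assert relation_size() == 140
assert resolve_block(99) == ("default", 99)
assert resolve_block(100) == ("other", 0)
```

The ReadStream API breaks this pattern because the stream actually reads buffers from the single relation it was created for, so a global block number past N is simply out of range there.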

With the new ReadStream API, a read-stream is (automatically) set up on the
"default" relation, but we can set up a separate read-stream inside the TAM
for the other relation. However, the difficulty is in setting it up
correctly:

We cannot use the "fake number of block"-trick since the read stream does
not only compute the block number, but actually tries to read the buffer in
the relation provided when setting up the read stream, so a block number
outside the range of this relation will not be found since it is in a
different relation.

If we could create our own read stream with both relations, that could be
solved and we could just implement the same logic, but direct it to the
correct relations depending on where we want to read the block. Unless I am
mistaken, there is already support for this since there is an array of
in-progress I/O and it would be trivial to extend this with more
relations+forks, if you have access to the structure definition. The
ReadStream struct is, however, an opaque struct so it's hard to hack around
with it. Just making the struct declaration public would potentially solve
a lot of problems here. (See attached patch, which is close to the minimum
of what is needed to allow extension writers to tweak the contents.)

Since both relations are using the same attributes with the same
"analyzability", having that information would be useful to compute the
targrows for setting up the additional stream, but it is computed in
do_analyze_rel() and not further propagated, so it needs to be re-computed
if we want to set up a separate read-stream.


> > It would be great if this could be fixed before the PG17 release now
> that 27bc1772fc8 was reverted.
>
> Ack.  Thinking...
>

Right now I think that just making the ReadStream struct available in the
header file is the best approach. It is a safe and low-risk fix (so
something that can be added to a beta) and will allow extension writers to
hack to their hearts' contents. In addition to that, being able to select
what interface to use would also help.



> Random thought: is there a wiki page or something where we can find
> out about all the table AM projects?  For the successor to
> 27bc1772fc8, I hope they'll be following along.
>

At this point, unfortunately not, we are quite early in this. Once I have
something, I'll share.
-- 
Best wishes,
Mats Kindahl, Timescale
From ea4bb194e0dcccac8465b3aa13950f721bde3860 Mon Sep 17 00:00:00 2001
From: Mats Kindahl 
Date: Thu, 29 Aug 2024 10:39:34 +0200
Subject: Make ReadStream struct non-opaque

Move the ReadStream struct and two utility functions from read_stream.c
to read_stream.h to allow extensions to modify the structure if they
need to.
---
 src/backend/storage/aio/read_stream.c |  59 ---
 src/include/storage/read_stream.h | 105 ++
 2 files changed, 105 insertions(+), 59 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index a83c18c2a4..bf2a679037 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -97,65 +97,6 @@
 #include "utils/rel.h"
 #include "utils/spccache.h"
 
-typedef struct InProgressIO
-{
-	int16		buffer_index;
-	Read

Re: Switching XLog source from archive to streaming when primary available

2024-08-29 Thread Bharath Rupireddy
Hi,

Thanks for looking into this.

On Fri, Aug 23, 2024 at 5:03 AM John H  wrote:
>
> For a motivation aspect I can see this being useful
> synchronous_replicas if you have commit set to flush mode.

In synchronous replication setup, until standby finishes fetching WAL
from the archive, the commits on the primary have to wait which can
increase the query latency. If the standby can connect to the primary
as soon as the broken connection is restored, it can fetch the WAL
soon and transaction commits can continue on the primary. Is my
understanding correct? Is there anything more to this?

I talked to Michael Paquier at PGConf.Dev 2024 and got some concerns
about this feature for dealing with changing timelines. I can't think
of them right now.

And, there were some cautions raised upthread -
https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13
and https://www.postgresql.org/message-id/ZffaQt7UbM2Q9kYh%40paquier.xyz.

> So +1 on feature, easier configurability, although thinking about it
> more you could probably have the restore script be smarter and provide
> non-zero exit codes periodically.

Interesting. Yes, the restore script has to be smarter to detect the
broken connections and distinguish whether the server is performing
just the archive recovery/PITR or streaming from standby. Not doing it
right, perhaps, can cause data loss (?).

> The patch needs to be rebased but I tested this against an older 17 build.

Will rebase soon.

> > + ereport(DEBUG1,
> > + errmsg_internal("switched WAL source from %s to %s after %s",
> > + xlogSourceNames[oldSource],
>
> Not sure if you're intentionally changing to DEBUG1 from DEBUG2.

Will change.

> > * standby and increase the replication lag on primary.
>
> Do you mean "increase replication lag on standby"?
> nit: reading from archive *could* be faster since you could in theory
> it's not single-processed/threaded.

Yes. I think we can just say "All of these can impact the recovery
performance on standby and increase the replication lag."

> > However,
> > + * exhaust all the WAL present in pg_wal before switching. If successful,
> > + * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls
> > + * back to XLOG_FROM_ARCHIVE state.
>
> I think I'm missing how this happens. Or what "successful" means. If I'm 
> reading
> it right, no matter what happens we will always move to
> XLOG_FROM_STREAM based on how
> the state machine works?

Please have a look at some discussion upthread on exhausting pg_wal
before switching -
https://www.postgresql.org/message-id/20230119005014.GA3838170%40nathanxps13.
Even today, the standby exhausts pg_wal before switching to streaming
from the archive.
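A minimal sketch of the switching rule under discussion (the state names follow the XLOG_FROM_* sources, but the logic here is a simplification for illustration, not the actual recovery code):

```python
# Simplified model of the WAL-source switch: WAL already present in
# pg_wal is always exhausted first; with the proposed
# streaming_replication_retry_interval, a standby reading from the
# archive retries streaming once the interval elapses; and a failed
# streaming attempt falls back to the archive.

def next_source(current, pg_wal_exhausted, retry_interval_elapsed):
    if not pg_wal_exhausted:
        return "XLOG_FROM_PG_WAL"       # local WAL first, always
    if current == "XLOG_FROM_ARCHIVE" and retry_interval_elapsed:
        return "XLOG_FROM_STREAM"       # proposed early retry
    if current == "XLOG_FROM_STREAM":
        return "XLOG_FROM_ARCHIVE"      # streaming failed: fall back
    return current

assert next_source("XLOG_FROM_ARCHIVE", False, True) == "XLOG_FROM_PG_WAL"
assert next_source("XLOG_FROM_ARCHIVE", True, True) == "XLOG_FROM_STREAM"
assert next_source("XLOG_FROM_STREAM", True, False) == "XLOG_FROM_ARCHIVE"
```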

> I tested this in a basic RR setup without replication slots (e.g. log
> shipping) where the
> WAL is available in the archive but the primary always has the WAL
> rotated out and
> 'streaming_replication_retry_interval = 1'. This leads the RR to
> become stuck where it stops fetching from
> archive and loops between XLOG_FROM_PG_WAL and XLOG_FROM_STREAM.

Nice catch. This is a problem. One idea is to disable
streaming_replication_retry_interval feature for slot-less streaming
replication - either when primary_slot_name isn't specified disallow
the GUC to be set in assign_hook or when deciding to switch the wal
source. Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Virtual generated columns

2024-08-29 Thread jian he
On Thu, Aug 29, 2024 at 8:15 PM Peter Eisentraut  wrote:
>
>
> > drop table if exists comment_test cascade;
> > CREATE TABLE comment_test (
> >id int,
> >positive_col int  GENERATED ALWAYS AS (22) CHECK (positive_col > 0),
> >positive_col1 int  GENERATED ALWAYS AS (22) stored CHECK (positive_col > 
> > 0) ,
> >indexed_col int,
> >CONSTRAINT comment_test_pk PRIMARY KEY (id));
> > CREATE INDEX comment_test_index ON comment_test(indexed_col);
> > ALTER TABLE comment_test ALTER COLUMN positive_col1 SET DATA TYPE text;
> > ALTER TABLE comment_test ALTER COLUMN positive_col SET DATA TYPE text;
> > the last query should work just fine?
>
> I played with this and I don't see anything wrong with the current
> behavior.  I noticed that in your test case
>
>  >positive_col1 int  GENERATED ALWAYS AS (22) stored CHECK
> (positive_col > 0) ,
>
> you have the wrong column name in the check constraint.  I'm not sure if
> that was intentional.
>

That's my mistake. sorry for the noise.


On Wed, Aug 21, 2024 at 6:52 PM Dean Rasheed  wrote:
>
>
> Another argument for doing it that way round is to not add too many
> extra cycles to the processing of existing queries that don't
> reference generated expressions. ISTM that this patch is potentially
> adding quite a lot of additional overhead -- it looks like, for every
> Var in the tree, it's calling get_attgenerated(), which involves a
> syscache lookup to see if that column is a generated expression (which
> most won't be). Ideally, we should be trying to do the minimum amount
> of extra work in the common case where there are no generated
> expressions.
>
> Regards,
> Dean



>
> The new patch does some rebasing and contains various fixes to the
> issues you presented.  As I mentioned, I'll look into improving the
> rewriting.


based on your latest patch (v4-0001-Virtual-generated-columns.patch),
I did some minor cosmetic code change
and tried to address get_attgenerated overhead.

Basically, in expand_generated_columns_in_query and
expand_generated_columns_in_expr, we first collect the (reloid, attnum)
pairs that have the generated_virtual flag into expand_generated_context;
later, expand_generated_columns_mutator uses the collected information.
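The collect-then-mutate idea can be sketched in miniature like this (plain Python over a toy expression tree, not the actual parse-tree code; the catalog contents are made up):

```python
# Toy two-pass rewrite. Nodes are tuples: ("var", reloid, attnum),
# ("const", value), or ("op", name, [args]). The made-up catalog says
# column 2 of relation 16384 is generated as col1 * 2.

CATALOG = {(16384, 2): ("op", "*", [("var", 16384, 1), ("const", 2)])}

def collect_generated(node, context):
    """Pass 1: collect (reloid, attnum) of referenced generated columns."""
    if node[0] == "var":
        if (node[1], node[2]) in CATALOG:
            context.add((node[1], node[2]))
    elif node[0] == "op":
        for arg in node[2]:
            collect_generated(arg, context)

def mutate(node, context):
    """Pass 2: expand columns found in pass 1, so non-generated Vars
    need only a set-membership test, not a catalog probe per Var."""
    if node[0] == "var":
        key = (node[1], node[2])
        return CATALOG[key] if key in context else node
    if node[0] == "op":
        return ("op", node[1], [mutate(a, context) for a in node[2]])
    return node

tree = ("op", "+", [("var", 16384, 2), ("var", 16384, 3)])
ctx = set()
collect_generated(tree, ctx)
expanded = mutate(tree, ctx)
assert expanded == ("op", "+",
                    [("op", "*", [("var", 16384, 1), ("const", 2)]),
                     ("var", 16384, 3)])
```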

Dealing with whole-row references within expand_generated_columns_mutator
seems tricky; I will try that later.


v4-0001-Virtual-generated-columns_minorchange.no-cfbot
Description: Binary data


Re: pg_verifybackup: TAR format backup verification

2024-08-29 Thread Amul Sul
On Sat, Aug 24, 2024 at 2:02 AM Robert Haas  wrote:
>
> On Wed, Aug 21, 2024 at 7:08 AM Amul Sul  wrote:
> []
> Then the result verifies. But I feel like we should have some test
> cases that do this kind of stuff so that there is automated
> verification. In fact, the current patch seems to have no negative
> test cases at all. I think we should test all the cases in
> 003_corruption.pl with tar format backups as well as with plain format
> backups, which we could do by untarring one of the archives, messing
> something up, and then retarring it. I also think we should have some
> negative test case specific to tar-format backup. I suggest running a
> coverage analysis and trying to craft test cases that hit as much of
> the code as possible. There will probably be some errors you can't
> hit, but you should try to hit the ones you can.
>

Done. I’ve added a few tests that extract, modify, and repack the tar
files, mainly base.tar and skipping tablespace.tar since it mostly
duplicate tests. I’ve also updated 002_algorithm.pl to cover
tests for tar backups.
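For what it's worth, the untar/corrupt/retar cycle suggested earlier can be sketched like this (Python's tarfile is used here only for brevity; the real tests are written in Perl, and the archive and member names are made up):

```python
# Sketch of a negative test: build a stand-in tar backup, extract it,
# flip one byte of a data file (simulating checksum corruption), and
# repack it for the verifier to check.

import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "base.tar")

# Build a stand-in "backup" archive with one data file.
with tarfile.open(archive, "w") as tar:
    data = b"A" * 16
    info = tarfile.TarInfo(name="base/1234")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Extract, corrupt the first byte, and retar.
with tarfile.open(archive) as tar:
    tar.extractall(workdir)
victim = os.path.join(workdir, "base", "1234")
with open(victim, "r+b") as f:
    f.write(b"B")
with tarfile.open(archive, "w") as tar:
    tar.add(victim, arcname="base/1234")

# A verifier recomputing checksums would now flag "base/1234".
with tarfile.open(archive) as tar:
    corrupted = tar.extractfile("base/1234").read()
assert corrupted == b"B" + b"A" * 15
```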

>
> > 0011 patch: Regarding function names:
> >
> > 1. named the function verify_tar_backup_file() to align with
> > verify_plain_backup_file(), but it does not perform the complete
> > verification as verify_plain_backup_file does. Not sure if it is the
> > right name.
>
> I was thinking of something like precheck_tar_backup_file().

Done.

> > 2. verify_tar_file_contents() is the second and final part of tar
> > backup verification. Should its name be aligned with
> > verify_tar_backup_file()? I’m unsure what the best name would be.
> > Perhaps verify_tar_backup_file_final(), but then
> > verify_tar_backup_file() would need to be renamed to something like
> > verify_tar_backup_file_initial(), which might be too lengthy.
>
> verify_tar_file_contents() actually verifies the contents of all the
> tar files, not just one, so the name is a bit confusing. Maybe
> verify_all_tar_files().
>

Done.

>
> > 3. verify_tar_contents() is the core of verify_tar_file_contents()
> > that handles the actual verification. I’m unsure about the current
> > naming. Should we rename it to something like
> > verify_tar_contents_core()? It wouldn’t be an issue if we renamed
> > verify_tar_file_contents() as pointed in point #2.
>
> verify_one_tar_file()?
>

Done.

>
> But with those renames, I think you really start to see why I'm not
> very comfortable with verify_backup_directory(). The tar and plain
> format cases aren't really doing the same thing. We're just gluing
> them into a single function anyway.

Agreed. I can see the uncomfortness -- added a new function.

>
> I am also still uncomfortable with the way you've refactored some of
> this so that we end up with very small amounts of code far away from
> other code that they influence. Like you end up with this:
>
> /* Check the backup manifest entry for this file. */
> m = verify_manifest_entry(context, relpath, sb.st_size);
>
> /* Validate the pg_control information */
> if (should_verify_control_data(context->manifest, m))
> ...
> if (show_progress && !context->skip_checksums &&
> should_verify_checksum(m))
>
> But verify_manifest_entry can return NULL or it can set m->bad and
> either of those change the result of should_verify_control_data() and
> should_verify_checksum(), but none of that is obvious when you just
> look at this. Admittedly, the code in master isn't brilliant in terms
> of making it obvious what's happening either, but I think this is
> worse. I'm not really sure what I think we should do about that yet,
> but I'm uncomfortable with it.

I am not sure if I fully understand the concern, but I see it
differently. The verify_manifest_entry function returns an entry, m,
that the caller doesn't need to worry about, as it simply passes it to
subsequent routines or macros that are aware of the possible inputs --
whether it's NULL, m->bad, or something else.

Regards,
Amul
From 4f98ffa42916fe179e9a87b9043393b6449f1705 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v13 06/12] Refactor: split verify_backup_file() function and
 rename it.

The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, in a future
patch, we would have a verify_tar_backup_file() function for
verifying TAR backup files.

In addition to that, moved the manifest entry verification code into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find
an entry, it reports an error as before and returns NULL to the
caller. This is why a NULL check is added to should_verify_checksum().
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++
 src/bin/pg_verifybackup/pg_verifybackup.h |  6 ++-
 2 files changed, 

Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes

2024-08-29 Thread Guillaume Lelarge
Hello,

This patch was a bit discussed on [1], and with more details on [2]. It
introduces four new columns in pg_stat_all_tables:

* parallel_seq_scan
* last_parallel_seq_scan
* parallel_idx_scan
* last_parallel_idx_scan

and two new columns in pg_stat_all_indexes:

* parallel_idx_scan
* last_parallel_idx_scan

As Benoit said yesterday, the intent is to help administrators evaluate
the usage of parallel workers in their databases and help them configure
parallelization.

A test script (test.sql) is attached. You can execute it with "psql -Xef
test.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).

Here is its result, a bit commented:

DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 10000000
VACUUM ANALYZE t1;
VACUUM
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]--+---
relname| t1
seq_scan   | 0
last_seq_scan  |
parallel_seq_scan  | 0
last_parallel_seq_scan |

==> no scan at all, the table has just been created

SELECT * FROM t1 LIMIT 1;
 id
----
  1
(1 row)

SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]--+--
relname| t1
seq_scan   | 1
last_seq_scan  | 2024-08-29 15:43:17.377182+02
parallel_seq_scan  | 0
last_parallel_seq_scan |

==> one sequential scan, no parallelization

SELECT count(*) FROM t1;
  count
----------
 10000000
(1 row)

SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]--+--
relname| t1
seq_scan   | 4
last_seq_scan  | 2024-08-29 15:43:18.504533+02
parallel_seq_scan  | 3
last_parallel_seq_scan | 2024-08-29 15:43:18.504533+02

==> one parallel sequential scan
==> I use the default configuration, so parallel_leader_participation = on,
max_parallel_workers_per_gather = 2
==> meaning 3 parallel sequential scans (1 leader, two workers)
==> take note that seq_scan was also incremented... we didn't change the
previous behaviour for this column
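The counter arithmetic shown by this transcript can be sketched as follows (a toy model for illustration, not the pgstat code; the dictionary structure is made up):

```python
# Model of how the proposed counters move for one parallel sequential
# scan with parallel_leader_participation = on and
# max_parallel_workers_per_gather = 2: each participating process
# counts one scan, and seq_scan keeps its old behaviour of counting
# parallel scans too.

stats = {"seq_scan": 1, "parallel_seq_scan": 0}  # state after the LIMIT 1

def record_parallel_seq_scan(stats, workers, leader_participates=True):
    participants = workers + (1 if leader_participates else 0)
    stats["seq_scan"] += participants           # previous behaviour, kept
    stats["parallel_seq_scan"] += participants  # new column

record_parallel_seq_scan(stats, workers=2)
assert stats == {"seq_scan": 4, "parallel_seq_scan": 3}
```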

CREATE INDEX ON t1(id);
CREATE INDEX
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]--+--
indexrelname   | t1_id_idx
idx_scan   | 0
last_idx_scan  |
parallel_idx_scan  | 0
last_parallel_idx_scan |
idx_tup_read   | 0
idx_tup_fetch  | 0

==> no scan at all, the index has just been created

SELECT * FROM t1 WHERE id=15;
 id
----
 15
(1 row)

SELECT pg_sleep(1);
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]--+--
indexrelname   | t1_id_idx
idx_scan   | 1
last_idx_scan  | 2024-08-29 15:43:22.020853+02
parallel_idx_scan  | 0
last_parallel_idx_scan |
idx_tup_read   | 1
idx_tup_fetch  | 0

==> one index scan, no parallelization

SELECT * FROM t1 WHERE id BETWEEN 10 AND 40;
SELECT pg_sleep(1);
 pg_sleep
----------
 
(1 row)

SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]--+--
indexrelname   | t1_id_idx
idx_scan   | 2
last_idx_scan  | 2024-08-29 15:43:23.136665+02
parallel_idx_scan  | 0
last_parallel_idx_scan |
idx_tup_read   | 32
idx_tup_fetch  | 0

==> another index scan, no parallelization

SELECT count(*) FROM t1 WHERE id BETWEEN 10 AND 40;
 count
-------
    31
(1 row)

SELECT pg_sleep(1);
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]--+-
indexrelname   | t1_id_idx
idx_scan   | 5
last_idx_scan  | 2024-08-29 15:43:24.16057+02
parallel_idx_scan  | 3
last_parallel_idx_scan | 2024-08-29 15:43:24.16057+02
idx_tup_read   | 63
idx_tup_fetch  | 0

==> one parallel index scan
==> same thing, 3 parallel index scans (1 leader, two workers)
==> also, take note that idx_scan was also incremented... we didn't change
the previous behaviour for this column

First time I had to add new columns to a statistics catalog. I'm actually
not sure that we were right to change pg_proc.dat manually. We'l

Re: Add contrib/pg_logicalsnapinspect

2024-08-29 Thread Bertrand Drouvot
Hi,

On Thu, Aug 29, 2024 at 06:33:19PM +0530, Bharath Rupireddy wrote:
> On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot
>  wrote:
> >
> > That's right. I think this one would be simply enough to expose one or two
> > functions in core too (and probably would not need an extra module).
> 
> +1 for functions in core unless this extra module
> pg_logicalsnapinspect works as a tool to be helpful even when the
> server is down.

Thanks for the feedback!

I don't see any use case where it could be useful when the server is down. So,
I think I'll move forward with in core functions (unless someone has a different
opinion).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Removing log_cnt from pg_sequence_read_tuple()

2024-08-29 Thread Nathan Bossart
On Thu, Aug 29, 2024 at 02:11:22PM +0900, Michael Paquier wrote:
> Okay, here is a v2 of the patch using this name for the function.

LGTM

-- 
nathan




Re: 039_end_of_wal: error in "xl_tot_len zero" test

2024-08-29 Thread Nathan Bossart
On Thu, Aug 29, 2024 at 01:55:27AM -0400, Tom Lane wrote:
> Thomas Munro  writes:
>> The fix I propose to commit shortly is just the first of those new
>> lines, to homogenise the initial state.  See attached.  The previous
>> idea works too, I think, but this bigger hammer is more obviously
>> removing variation.
> 
> +1, but a comment explaining the need for the pg_switch_wal call
> seems in order.

+1

-- 
nathan




Re: Partitioned tables and [un]loggedness

2024-08-29 Thread Nathan Bossart
On Thu, Aug 29, 2024 at 03:44:45PM +0900, Michael Paquier wrote:
> On Tue, Aug 27, 2024 at 04:01:58PM -0500, Nathan Bossart wrote:
>> My current thinking is that it would be better to disallow marking
>> partitioned tables as LOGGED/UNLOGGED and continue to have users explicitly
>> specify what they want for each partition.  It'd still probably be good to
>> expand the documentation, but a clear ERROR when trying to set a
>> partitioned table as UNLOGGED would hopefully clue folks in.
> 
> The addition of the new LOGGED keyword is not required if we limit
> ourselves to an error when defining UNLOGGED, so if we drop this
> proposal, let's also drop this part entirely and keep DefineRelation()
> simpler.

+1

> Actually, is really issuing an error the best thing we can
> do after so many years allowing this grammar flavor to go through,
> even if it is perhaps accidental?  relpersistence is marked correctly
> for partitioned tables, it's just useless.  Expanding the
> documentation sounds fine to me, one way or the other, to tell what
> happens with partitioned tables.

IMHO continuing to allow partitioned tables to be marked UNLOGGED just
preserves the illusion that it does something.  An ERROR could help dispel
that misconception.

-- 
nathan




Re: pgstattuple: fix free space calculation

2024-08-29 Thread Frédéric Yhuel



On 8/23/24 12:51, Frédéric Yhuel wrote:



On 8/23/24 12:02, Rafia Sabih wrote:
On the other hand, this got me thinking about the purpose of this 
space information.
If we want to understand that there's still some space for the tuples 
in a page, then using PageGetExactFreeSpace is not doing justice in 
case of heap page, because we will not be able to add any more tuples 
there if there are already MaxHeapTuplesPerPage tuples there.


We won't be able to add, but we will be able to update a tuple in this 
page.


Sorry, that's not true.

So in this marginal case we have free space that's unusable in practice. 
No INSERT or UPDATE (HOT or not) is possible inside the page.


I don't know what pgstattuple should do in this case.

However, we should never encounter this case in practice (maybe on some 
exotic architectures with strange alignment behavior?). As I said, I 
can't fit more than 226 tuples per page on my machine, while 
MaxHeapTuplesPerPage is 291. Am I missing something?
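(For what it's worth, the 226 figure is consistent with simple alignment
arithmetic. Below is a rough sketch in Python; the constants are assumptions
for a stock 64-bit build with 8 kB pages, not values read from a live server.)

```python
# Rough arithmetic behind MaxHeapTuplesPerPage vs. tuples that fit in practice.
# Assumed constants (typical 64-bit build, BLCKSZ = 8192); adjust for your build.
BLCKSZ = 8192
PAGE_HEADER = 24      # SizeOfPageHeaderData
LINE_POINTER = 4      # sizeof(ItemIdData)
TUPLE_HEADER = 23     # SizeofHeapTupleHeader
ALIGNOF = 8           # MAXIMUM_ALIGNOF

def maxalign(n: int) -> int:
    # Round n up to the next multiple of ALIGNOF.
    return -(-n // ALIGNOF) * ALIGNOF

# MaxHeapTuplesPerPage assumes a tuple with no data at all: just an
# aligned header plus its line pointer.
max_heap_tuples = (BLCKSZ - PAGE_HEADER) // (maxalign(TUPLE_HEADER) + LINE_POINTER)

# A minimal real tuple (say, one int4 column) pads the header plus 4 bytes
# of data up to 32, so fewer tuples actually fit on the page.
practical_tuples = (BLCKSZ - PAGE_HEADER) // (maxalign(TUPLE_HEADER + 4) + LINE_POINTER)

print(max_heap_tuples, practical_tuples)  # 291 226
```

which matches the 291 and 226 numbers quoted above.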


Besides, pgstattuple isn't mission critical, is it?

So I think we should just use PageGetExactFreeSpace().

Here is a v3 patch. It's the same as v2, I only removed the last 
paragraph in the commit message.


Thank you Rafia and Andreas for your review and test.

Best regards,
Frédéric
From f749d4ca2d6881a916c6ca2a3b882bb2740188d4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20Yhuel?= 
Date: Wed, 21 Aug 2024 15:05:11 +0200
Subject: [PATCH v3] pgstattuple: use PageGetExactFreeSpace()

instead of PageGetHeapFreeSpace() or PageGetFreeSpace()

We want the exact free space, we don't care if there's enough room for
another line pointer.
---
 contrib/pgstattuple/pgstatapprox.c | 4 ++--
 contrib/pgstattuple/pgstatindex.c  | 2 +-
 contrib/pgstattuple/pgstattuple.c  | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index c84c642355..b63a9932d7 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -107,11 +107,11 @@ statapprox_heap(Relation rel, output_type *stat)
 		page = BufferGetPage(buf);
 
 		/*
-		 * It's not safe to call PageGetHeapFreeSpace() on new pages, so we
+		 * It's not safe to call PageGetExactFreeSpace() on new pages, so we
 		 * treat them as being free space for our purposes.
 		 */
 		if (!PageIsNew(page))
-			stat->free_space += PageGetHeapFreeSpace(page);
+			stat->free_space += PageGetExactFreeSpace(page);
 		else
 			stat->free_space += BLCKSZ - SizeOfPageHeaderData;
 
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 5c06ba6db4..1b6b768cf8 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -311,7 +311,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 
 			max_avail = BLCKSZ - (BLCKSZ - ((PageHeader) page)->pd_special + SizeOfPageHeaderData);
 			indexStat.max_avail += max_avail;
-			indexStat.free_space += PageGetFreeSpace(page);
+			indexStat.free_space += PageGetExactFreeSpace(page);
 
 			indexStat.leaf_pages++;
 
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 3bd8b96197..7e2a7262a3 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -372,7 +372,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 			buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
 		RBM_NORMAL, hscan->rs_strategy);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
-			stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
+			stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
 			UnlockReleaseBuffer(buffer);
 			block++;
 		}
@@ -385,7 +385,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
 	RBM_NORMAL, hscan->rs_strategy);
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
-		stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
+		stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
 		UnlockReleaseBuffer(buffer);
 		block++;
 	}
@@ -565,7 +565,7 @@ pgstat_index_page(pgstattuple_type *stat, Page page,
 {
 	OffsetNumber i;
 
-	stat->free_space += PageGetFreeSpace(page);
+	stat->free_space += PageGetExactFreeSpace(page);
 
 	for (i = minoff; i <= maxoff; i = OffsetNumberNext(i))
 	{
-- 
2.39.2



Re: Parallel workers stats in pg_stat_database

2024-08-29 Thread Benoit Lobréau

Hi,

This is a new patch which:

* fixes some typos
* changes the execgather / execgathermerge code so that the stats are 
accumulated in EState and inserted in pg_stat_database only once, during 
ExecutorEnd
* adds tests (very ugly, but I could get the parallel plan to be stable 
across make check executions).



On 8/28/24 17:10, Benoit Lobréau wrote:

Hi hackers,

This patch introduces four new columns in pg_stat_database:

* parallel_worker_planned
* parallel_worker_launched
* parallel_maint_worker_planned
* parallel_maint_worker_launched

The intent is to help administrators evaluate the usage of parallel 
workers in their databases and help sizing max_worker_processes, 
max_parallel_workers or max_parallel_maintenance_workers).


Here is a test script:

psql << _EOF_

-- Index creation
DROP TABLE IF EXISTS test_pql;
CREATE TABLE test_pql(i int, j int);
INSERT INTO test_pql SELECT x,x FROM generate_series(1,100) as F(x);

-- 0 planned / 0 launched
EXPLAIN (ANALYZE)
 SELECT 1;

-- 2 planned / 2 launched
EXPLAIN (ANALYZE)
 SELECT i, avg(j) FROM test_pql GROUP BY i;

SET max_parallel_workers TO 1;
-- 4 planned / 1 launched
EXPLAIN (ANALYZE)
 SELECT i, avg(j) FROM test_pql GROUP BY i
 UNION
 SELECT i, avg(j) FROM test_pql GROUP BY i;

RESET max_parallel_workers;
-- 1 planned / 1 launched
CREATE INDEX ON test_pql(i);

SET max_parallel_workers TO 0;
-- 1 planned / 0 launched
CREATE INDEX ON test_pql(j);
-- 1 planned / 0 launched
CREATE INDEX ON test_pql(i, j);

SET maintenance_work_mem TO '96MB';
RESET max_parallel_workers;
-- 2 planned / 2 launched
VACUUM (VERBOSE) test_pql;

SET max_parallel_workers TO 1;
-- 2 planned / 1 launched
VACUUM (VERBOSE) test_pql;

-- TOTAL: parallel workers: 6 planned / 3 launched
-- TOTAL: parallel maint workers: 7 planned / 4 launched
_EOF_


And the output in pg_stat_database on a fresh server without any 
configuration change except those in the script:


[local]:5445 postgres@postgres=# SELECT datname, 
parallel_workers_planned, parallel_workers_launched, 
parallel_maint_workers_planned, parallel_maint_workers_launched 
FROM pg_stat_database WHERE datname = 'postgres' \gx

-[ RECORD 1 ]-------------------+---------
datname                         | postgres
parallel_workers_planned        | 6
parallel_workers_launched       | 3
parallel_maint_workers_planned  | 7
parallel_maint_workers_launched | 4

Thanks to: Jehan-Guillaume de Rorthais, Guillaume Lelarge and Franck
Boudehen for the help and motivation boost.

---
Benoit Lobréau
Consultant
http://dalibo.com


--
Benoit Lobréau
Consultant
http://dalibo.com
From 0338cfb11ab98594b2f16d143b505e269566bb6e Mon Sep 17 00:00:00 2001
From: benoit 
Date: Wed, 28 Aug 2024 02:27:13 +0200
Subject: [PATCH] Adds four parallel workers stat columns to pg_stat_database

* parallel_workers_planned
* parallel_workers_launched
* parallel_maint_workers_planned
* parallel_maint_workers_launched
---
 doc/src/sgml/monitoring.sgml | 36 
 src/backend/access/brin/brin.c   |  4 +++
 src/backend/access/nbtree/nbtsort.c  |  4 +++
 src/backend/catalog/system_views.sql |  4 +++
 src/backend/commands/vacuumparallel.c|  5 +++
 src/backend/executor/execMain.c  |  5 +++
 src/backend/executor/execUtils.c |  3 ++
 src/backend/executor/nodeGather.c|  3 ++
 src/backend/executor/nodeGatherMerge.c   |  3 ++
 src/backend/utils/activity/pgstat_database.c | 36 
 src/backend/utils/adt/pgstatfuncs.c  | 12 +++
 src/include/catalog/pg_proc.dat  | 20 +++
 src/include/nodes/execnodes.h|  3 ++
 src/include/pgstat.h |  7 
 src/test/regress/expected/rules.out  |  4 +++
 src/test/regress/expected/stats.out  | 17 +
 src/test/regress/sql/stats.sql   | 14 
 17 files changed, 180 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 55417a6fa9..8c4b11c11d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3508,6 +3508,42 @@ description | Waiting for a newly initialized WAL file 
to reach durable storage
   
  
 
+ 
+  
+   parallel_workers_planned bigint
+  
+  
+   Number of parallel workers planned by queries on this database
+  
+ 
+
+ 
+  
+   parallel_workers_launched bigint
+  
+  
+   Number of parallel workers obtained by queries on this database
+  
+ 
+
+ 
+  
+   parallel_maint_workers_planned 
bigint
+  
+  
+   Number of parallel workers planned by utilities on this database
+  
+ 
+
+ 
+  
+   parallel_maint_workers_launched 
bigint
+  
+  
+   Number of parallel workers obtained by utilities on this database
+  
+ 
+
  
   
stats_reset timestamp with time 
zone
diff --git a

Re: Make printtup a bit faster

2024-08-29 Thread Tom Lane
David Rowley  writes:
> [ redesign I/O function APIs ]
> I had planned to work on this for PG18, but I'd be happy for some
> assistance if you're willing.

I'm skeptical that such a thing will ever be practical.  To avoid
breaking un-converted data types, all the call sites would have to
support both old and new APIs.  To avoid breaking non-core callers,
all the I/O functions would have to support both old and new APIs.
That probably adds enough overhead to negate whatever benefit you'd
get.

regards, tom lane




Re: race condition in pg_class

2024-08-29 Thread Nitin Motiani
On Thu, Aug 29, 2024 at 8:11 PM Noah Misch  wrote:
>
> On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
> > My order of preference is: 2, 1, 3.
>
> I kept tuple locking responsibility in heapam.c.  That's simpler and better
> for modularity, but it does mean we release+acquire after any xmax wait.
> Before, we avoided that if the next genam.c scan found the same TID.  (If the
> next scan finds the same TID, the xmax probably aborted.)  I think DDL aborts
> are rare enough to justify simplifying as this version does.  I don't expect
> anyone to notice the starvation outside of tests built to show it.  (With
> previous versions, one can show it with a purpose-built test that commits
> instead of aborting, like the "001_pgbench_grant@9" test.)
>
> This move also loses the optimization of unpinning before XactLockTableWait().
> heap_update() doesn't optimize that way, so that's fine.
>
> The move ended up more like (1), though I did do
> s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2).  I
> felt that worked better than (2) to achieve lock release before
> CacheInvalidateHeapTuple().  Alternatives that could be fine:
>
From a consistency point of view, I find it cleaner if we can have all
the heap_inplace_lock and heap_inplace_unlock in the same set of
functions. So here those would be the systable_inplace_* functions.

> - In the cancel case, call both systable_inplace_update_cancel and
>   systable_inplace_update_end.  _finish or _cancel would own unlock, while
>   _end would own systable_endscan().
>
What happens to CacheInvalidateHeapTuple() in this approach?  I think
it will still need to be brought to the genam.c layer if we are
releasing the lock in systable_inplace_update_finish.

> - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer.  While
>   tolerable now, this gets less attractive after the inplace160 patch from
>   https://postgr.es/m/flat/20240523000548.58.nmi...@google.com
>
I skimmed through the inplace160 patch. It wasn't clear to me why this
becomes less attractive with the patch. I see there is a new
CacheInvalidateHeapTupleInPlace but that looks like it would be called
while holding the lock. And then there is an
AcceptInvalidationMessages which can perhaps be moved to the genam.c
layer too. Is the concern that one invalidation call will be in the
heapam layer and the other will be in the genam layer?

Also I have a small question from inplace120.

I see that all the places we check resultRelInfo->ri_needLockTagTuple,
we can just call
IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
big advantage of storing a separate bool field? Also there is another
write to ri_RelationDesc in CatalogOpenIndexes in
src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
be set there also to keep it consistent with ri_RelationDesc. Please
let me know if I am missing something about the usage of the new
field.

Thanks & Regards,
Nitin Motiani
Google




Re: RFC: Additional Directory for Extensions

2024-08-29 Thread David E. Wheeler
On Aug 27, 2024, at 22:24, Craig Ringer  wrote:

> `pg_config` only cares about compile-time settings, so I would not
> expect its output to change.

Right, of course that’s its original purpose, but extensions depend on it to 
determine where to install extensions. Not just PGXS, but also pgrx and various 
Makefile customizations I’ve seen in the wild.

> I suspect we'd have to add PGXS extension-path awareness if going for
> per-extension subdir support. I'm not sure it makes sense to teach
> `pg_config` about this, since it'd need to have a different mode like
> 
>pg_config --extension myextname --extension-sharedir
> 
> since the extension's "sharedir" is
> $basedir/extensions/myextension/share or whatever.

Right. PGXS would just need to know where to put the directory for an 
extension. There should be a default for the project, and then it can be 
overridden with something like DESTDIR (but without full paths under that 
prefix).

> Supporting this looks to be a bit intrusive in the makefiles,
> requiring them to differentiate between "share dir for extensions" and
> "share dir for !extensions", etc. I'm not immediately sure if it can
> be done in a way that transparently converts unmodified extension PGXS
> makefiles to target the new paths; it might require an additional
> define, or use of new variables and an ifdef block to add
> backwards-compat to the extension makefile for building on older
> postgres.

Yeah, might just have to be an entirely new thing, though it sure would be nice 
for existing PGXS-using Makefiles to do the right thing. Maybe for the new 
version of the server with the proposed new pattern it would dispatch to the 
new thing somehow without modifying all the rest of its logic.

> Right. The proposed structure is rather a bigger change than I was
> thinking when I suggested supporting an extension search path not just
> a single extra path. But it's also a cleaner proposal; the
> per-extension directory would make it easier to ensure that the
> extension control file, sql scripts, and dynamic library all match the
> same extension and version if multiple ones are on the path. Which is
> desirable when doing things like testing a new version of an in-core
> extension.

💯

> Right. And I've been on the receiving end of having a small, focused
> patch derailed by others jumping in and scope-exploding it into
> something completely different to solve a much wider but related
> problem.

I’m not complaining, I would definitely prefer to see consensus on a cleaner 
proposal along the lines we’ve discussed and a commitment to time from parties 
able to get it done in time for v18. I’m willing to help where I can with my 
baby C! Failing that, we can fall back on the destdir patch.

> I'm definitely not trying to stand in the way of progress with this; I
> just want to make sure that it doesn't paint us into a corner that
> prevents a more general solution from being adopted later. That's why
> I'm suggesting making the config a multi-value string (list of paths)
> and raising a runtime "ERROR: searching multiple paths for extensions
> not yet supported" or something if >1 path configured.
> 
> If that doesn't work, no problem.

I think the logic would have to be different, so they’d be different GUCs with 
their own semantics. But if the core team and committers are on board with the 
general idea of search paths and per-extension directory organization, it would 
be best to avoid getting stuck with maintaining the current patch’s GUC.

OTOH, we could get it committed now and revert it later if we get the better 
thing done and committed.

>> I think we should get some clarity on the proposal, and then consensus, as 
>> you say. I say “get some clarity” because my proposal doesn’t require 
>> recursing, and I’m not sure why it’d be needed.
> 
> From what you and Gabriele are discussing (which I agree with), it wouldn’t.

Ah, great.

I’ll try to put some thought into a more formal proposal in a new thread next 
week. Unless your Gabriele beats me to it 😂.

Best,

David





Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-29 Thread Andrew Dunstan



On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able to 
tell which function you're in.
This is a hashref so it will be possible to populate new and exciting 
other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings.  Because as it stands right now, there's no 
information that can assist with locating the source of a warning like 
this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. This 
could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, it's 
greatly helped track down issues.  And there shouldn't be anything 
platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres patch, 
so any guidance on adding regression testing would be appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in the 
past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name already, 
just like when triggers pass you $_TD with all the pertinent information


A wishlist item would be for postgres plperl to automatically prepend 
the function name and schema when throwing perl warnings so you don't 
have to do your own __WARN__ handler, but this is the next best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, whereas 
you can know the function's name at compile time, using just the 
mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for other PLs?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





SQL:2023 JSON simplified accessor support

2024-08-29 Thread Alexandra Wang
Hello Hackers,

I’ve attached a patch to start adding SQL:2023 JSON simplified
accessor support. This allows accessing JSON or JSONB fields using dot
notation (e.g., colname.field.field...), similar to composite types.

Currently, PostgreSQL uses nonstandard syntax like colname->x->y for
JSON and JSONB, and colname['blah'] for JSONB. These existing syntaxes
predate the standard. Oracle already supports the standard dot
notation syntax [1].

The full specification for the JSON simplified accessor format is as
follows:

<JSON simplified accessor> ::=
    <value expression primary> <JSON simplified accessor op chain>
<JSON simplified accessor op chain> ::=
    <JSON simplified accessor op>
  | <JSON simplified accessor op chain> <JSON simplified accessor op>
<JSON simplified accessor op> ::=
    <JSON member accessor>
  | <JSON wildcard member accessor>
  | <JSON array accessor>
  | <JSON wildcard array accessor>
  | <JSON item method>

I’ve implemented the member and array accessors and attached two
alternative patches:

1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch
enables dot access to JSON object fields and subscript access to
indexed JSON array elements by converting "." and "[]" indirection
into a JSON_QUERY JsonFuncExpr node.

2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This
alternative patch implements dot access to JSON object fields by
transforming the "." indirection into a "->" operator.

The upside of the v1 patch is that it strictly aligns with the SQL
standard, which specifies that the simplified access is equivalent to:

JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)

However, the performance of JSON_QUERY might be suboptimal due to
function call overhead. Therefore, I implemented the v2 alternative
using the "->" operator.

There is some uncertainty about the semantics of conditional array
wrappers. Currently, there is at least one subtle difference between
the "->" operator and JSON_QUERY, as shown:

postgres=# select '{"a": 42}'::json->'a';
 ?column?
--
 42
(1 row)

postgres=# select json_query('{"a": 42}'::json, 'lax $.a' with
conditional array wrapper null on empty null on error);
 json_query

 [42]
(1 row)

JSON_QUERY encloses the JSON value 42 in brackets, which may be a bug,
as Peter noted [2]. If there are no other semantic differences, we
could implement simple access without using JSON_QUERY to avoid
function call overhead.
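To make that difference concrete outside of SQL, here is a toy Python model
of the two behaviours (the function names and the wrapping rule are
illustrative assumptions, not the server's implementation):

```python
import json

def arrow(doc: str, key: str):
    # Models the '->' operator: return the member value as-is.
    return json.loads(doc).get(key)

def json_query_conditional(doc: str, key: str):
    # Toy model of JSON_QUERY ... WITH CONDITIONAL ARRAY WRAPPER:
    # wrap the result in an array unless it already is one.  This mirrors
    # the (possibly buggy) bracketing shown above, where 42 becomes [42].
    val = json.loads(doc).get(key)
    return val if isinstance(val, list) else [val]

print(arrow('{"a": 42}', 'a'))                   # 42
print(json_query_conditional('{"a": 42}', 'a'))  # [42]
```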

I aim to first enable standard dot notation access to JSON object
fields. Both patches implement this, and I’m also open to alternative
approaches.

For subscripting access to jsonb array elements, jsonb already
supports this via the subscripting handler interface. In the v1 patch,
I added json support using JSON_QUERY, but I can easily adapt this for
the v2 patch using the -> operator. I did not leverage the
subscripting handler interface for json because implementing the
fetch/assign functions for json seems challenging for plain text. Let
me know if you have a different approach in mind.

Finally, I have not implemented wildcard or item method accessors yet
and would appreciate input on their necessity.

[1] 
https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/simple-dot-notation-access-to-json-data.html#GUID-7249417B-A337-4854-8040-192D5CEFD576
[2] 
https://www.postgresql.org/message-id/8022e067-818b-45d3-8fab-6e0d94d03...@eisentraut.org
From 99ee1cbc45ebd76ce21881bda95933d8a5a104c6 Mon Sep 17 00:00:00 2001
From: Alexandra Wang 
Date: Thu, 15 Aug 2024 02:11:33 -0700
Subject: [PATCH v2] Transform JSON dot access to arrow operator

Enabled dot-notation access to JSON/JSONB object by making a syntactic
sugar for the "->" operator in ParseFuncOrColumn() for arg of
JSON/JSONB type.

JSON array access via subscripting is not yet supported in this patch,
but can be implemented similarly by creating an OpExpr for the
json_array_element "->" operator.

Note that the output of the "->" operators are not wrapped by
brackets, which differs from the SQL standard specification for the
JSON simplified accessor equivalence shown below:

JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL 
ON ERROR)
---
 src/backend/parser/parse_func.c | 57 +---
 src/include/catalog/pg_operator.dat |  4 +-
 src/include/parser/parse_type.h |  1 +
 src/test/regress/expected/json.out  | 67 +
 src/test/regress/expected/jsonb.out | 55 +++
 src/test/regress/sql/json.sql   | 20 +
 src/test/regress/sql/jsonb.sql  | 17 
 7 files changed, 214 insertions(+), 7 deletions(-)

diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9b23344a3b..431c9883f2 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -33,6 +33,8 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "parser/parse_oper.h"
+#include "catalog/pg_operator_d.h"
 
 
 /* Possible error codes from LookupFuncNameInternal */
@@ -48,6 +50,8 @@ static void unify_hypothetical_args(ParseState *pstate,
 static Oid FuncNameAsType(List *funcname);
 static Node *ParseComplexProjection(ParseState *pstate, const char *funcname,

Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

2024-08-29 Thread Ayush Vatsa
Hi all,
Please find attached the patch that re-enables
support for sequences within the pgstattuple extension.
I have also included the necessary test cases for
sequences, implemented in the form of regress tests.

Here is the commitfest link for the same -
https://commitfest.postgresql.org/49/5215/

Regards
Ayush Vatsa
AWS
From 95ab0aa019e1cfec73bc94448faeafeac67b434e Mon Sep 17 00:00:00 2001
From: Ayush Vatsa 
Date: Thu, 29 Aug 2024 21:40:29 +0530
Subject: [PATCH v1] Introduced support for sequences back in pgstattuple
 extension

---
 contrib/pgstattuple/expected/pgstattuple.out | 24 
 contrib/pgstattuple/pgstattuple.c|  6 -
 contrib/pgstattuple/sql/pgstattuple.sql  | 10 
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/contrib/pgstattuple/expected/pgstattuple.out b/contrib/pgstattuple/expected/pgstattuple.out
index 283856e109..1e79fd9036 100644
--- a/contrib/pgstattuple/expected/pgstattuple.out
+++ b/contrib/pgstattuple/expected/pgstattuple.out
@@ -273,6 +273,30 @@ select pgstathashindex('test_partition_hash_idx');
  (4,8,0,1,0,0,0,100)
 (1 row)
 
+-- test on sequence
+-- only pgstattuple and pg_relpages should work
+create sequence serial;
+select * from pgstattuple('serial');
+ table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent 
+---+-+---+---+--++++--
+  8192 |   1 |41 |   0.5 |0 |  0 |  0 |   8104 |98.93
+(1 row)
+
+select pgstatindex('serial');
+ERROR:  relation "serial" is not a btree index
+select pgstatginindex('serial');
+ERROR:  relation "serial" is not a GIN index
+select pgstathashindex('serial');
+ERROR:  relation "serial" is not a hash index
+select pg_relpages('serial');
+ pg_relpages 
+-
+   1
+(1 row)
+
+select * from pgstattuple_approx('serial');
+ERROR:  relation "serial" is of wrong relation kind
+DETAIL:  This operation is not supported for sequences.
 drop table test_partitioned;
 drop view test_view;
 drop foreign table test_foreign_table;
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 3bd8b96197..13c70c4152 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -323,7 +323,11 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 	pgstattuple_type stat = {0};
 	SnapshotData SnapshotDirty;
 
-	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+	/**
+	 * Sequences don't fall under heap AM but are still
+	 * allowed for obtaining tuple-level statistics.
+	 */
+	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID && rel->rd_rel->relkind != RELKIND_SEQUENCE)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("only heap AM is supported")));
diff --git a/contrib/pgstattuple/sql/pgstattuple.sql b/contrib/pgstattuple/sql/pgstattuple.sql
index b08c31c21b..ea121df0c7 100644
--- a/contrib/pgstattuple/sql/pgstattuple.sql
+++ b/contrib/pgstattuple/sql/pgstattuple.sql
@@ -119,6 +119,16 @@ create index test_partition_hash_idx on test_partition using hash (a);
 select pgstatindex('test_partition_idx');
 select pgstathashindex('test_partition_hash_idx');
 
+-- test on sequence
+-- only pgstattuple and pg_relpages should work
+create sequence serial;
+select * from pgstattuple('serial');
+select pgstatindex('serial');
+select pgstatginindex('serial');
+select pgstathashindex('serial');
+select pg_relpages('serial');
+select * from pgstattuple_approx('serial');
+
 drop table test_partitioned;
 drop view test_view;
 drop foreign table test_foreign_table;
-- 
2.41.0



Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-29 Thread Mark Murawski




On 8/29/24 11:56, Andrew Dunstan wrote:


On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able to 
tell which function you're in.
This is a hashref so it will be possible to populate new and exciting 
other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings.  Because as it stands right now, there's no 
information that can assist with locating the source of a warning 
like this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. This 
could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, it's 
greatly helped track down issues.  And there shouldn't be anything 
platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres patch, 
so any guidance on adding regression testing would be appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in 
the past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name already, 
just like when triggers pass you $_TD with all the pertinent information


A wishlist item would be for postgres plperl to automatically prepend 
the function name and schema when throwing perl warnings so you don't 
have to do your own __WARN__ handler, but this is the next best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, whereas 
you can know the function's name at compile time, using just the 
mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for other PLs?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Hi Andrew,


Thanks for the feedback.


1) Why is this not similar to $_TD?  It literally operates identically. 
At run-time it passes you $_TD for triggers.  Same here for 
functions.  This is all run-time.  What exactly is the issue you're 
trying to point out?



2) I would agree that other PLs should get the same detail.  I don't 
know the other ones as I've been only working in pl/perl.







Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2024-08-29 Thread Peter Geoghegan
On Wed, Aug 28, 2024 at 9:25 AM Robert Haas  wrote:
> On Tue, Aug 27, 2024 at 3:04 PM Tom Lane  wrote:
> I kind of had that reaction too initially, but I think that was mostly
> because "Primitive Index Scans" seemed extremely unclear. I think
> "Index Searches" is pretty comprehensible, honestly. Why shouldn't
> someone be able to figure out what that means?

Worth noting that Lukas Fittl made a point of prominently highlighting
the issue with how this works when he explained the Postgres 17 nbtree
work:

https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans

And no, I wasn't asked to give any input to the blog post. Lukas has a
general interest in making the system easier to understand for
ordinary users. Presumably that's why he zeroed in on this one aspect
of the work. It's far from an esoteric implementation detail.

-- 
Peter Geoghegan




Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

2024-08-29 Thread Nathan Bossart
On Thu, Aug 29, 2024 at 10:17:57PM +0530, Ayush Vatsa wrote:
> Please find attached the patch that re-enables
> support for sequences within the pgstattuple extension.
> I have also included the necessary test cases for
> sequences, implemented in the form of regress tests.

Thanks.  Robert, do you have any concerns with this?

+select * from pgstattuple('serial');
+ table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | 
dead_tuple_len | dead_tuple_percent | free_space | free_percent 
+---+-+---+---+--++++--
+  8192 |   1 |41 |   0.5 |0 |  
0 |  0 |   8104 |98.93
+(1 row)

I'm concerned that some of this might be platform-dependent and make the
test unstable.  Perhaps we should just select count(*) here.

+   /**
+* Sequences don't fall under heap AM but are still
+* allowed for obtaining tuple-level statistics.
+*/

I think we should be a bit more descriptive here, like the comment in
verify_heapam.c:

/*
 * Sequences always use heap AM, but they don't show that in the 
catalogs.
 * Other relkinds might be using a different AM, so check.
 */

-- 
nathan




Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

2024-08-29 Thread Nathan Bossart
On Thu, Aug 29, 2024 at 12:36:35PM -0500, Nathan Bossart wrote:
> +select * from pgstattuple('serial');
> + table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | 
> dead_tuple_len | dead_tuple_percent | free_space | free_percent 
> +---+-+---+---+--++++--
> +  8192 |   1 |41 |   0.5 |0 |
>   0 |  0 |   8104 |98.93
> +(1 row)
> 
> I'm concerned that some of this might be platform-dependent and make the
> test unstable.  Perhaps we should just select count(*) here.

Sure enough, the CI testing for 32-bit is failing here [0].

[0] 
https://api.cirrus-ci.com/v1/artifact/task/4798423386292224/testrun/build-32/testrun/pgstattuple/regress/regression.diffs

-- 
nathan




Re: proposal: schema variables

2024-08-29 Thread Pavel Stehule
Hi

út 27. 8. 2024 v 16:52 odesílatel Laurenz Albe 
napsal:

> time""On Tue, 2024-08-27 at 08:52 +0200, Pavel Stehule wrote:
> > I can throw 200KB from another 300KB patch set which can be better for
> review, but it
> > can be harder to maintain this patch set. I'll try it, and I'll send a
> reduced version.
>
> That was not a criticism, and I think the way you split up the patch set
> right now
> is as good as it probably gets.  Ideally, one could say something like "we
> need at least
> patch 1 to 4, 5 and 6 are optional, but desirable, all others can easily
> be deferred
> to a later time".
>

It was mentioned here several times already (I thought).

1..4 are fundamental
5..6 optional (6 is just tests)

others can be deferred (5-6 can be deferred too). Without support for
temporary session variables, it is unlikely that one would repeatedly CREATE
and DROP the same variable in one session, so memory usage should be similar
to today's workarounds; but unlike the workarounds, session variables use
types and access rights. Note - plpgsql doesn't try to delete compiled code
from its cache immediately either, so the behaviour implemented in 1..4 is
"similar" to the plpgsql function cache.

14 .. allow you to use session variables as arguments of CALL or EXECUTE
statement, and variables can be used in plpgsql simple expr.
15 .. variables don't block parallelism
16 .. the result of plpgsql simple expr can be assigned to session variables
17 .. expr with session variable can be inlined in SQL

14-17 are performance related

7 - was requested by some people - some functionality can be possibly
replaced by plpgsql_check.
It has only 40 rows - it just raise warning or error when some conditions
are true

Regards

Pavel



>
> Yours,
> Laurenz Albe
>


Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

2024-08-29 Thread Ayush Vatsa
> I'm concerned that some of this might be platform-dependent and make the
> test unstable.  Perhaps we should just select count(*) here.
> Sure enough, the CI testing for 32-bit is failing here [0].
Thanks for catching that! I wasn't aware of this earlier.

> I think we should be a bit more descriptive here
Regarding the comment, I've tried to make it more
descriptive and simpler than the existing one in
verify_heapam.c. Here’s the comment I propose:

/*
 * Sequences implicitly use the heap AM, even though it's not explicitly
 * recorded in the catalogs. For other relation kinds, verify that the AM
 * is heap; otherwise, raise an error.
 */

Please let me know if this still isn't clear enough,
and I can make further revisions in line with verify_heapam.c.

The patch with all the changes is attached.

Regards
Ayush Vatsa
AWS


v1-0001-Introduced-support-for-sequences-back-in-pgstattu.patch
Description: Binary data


Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Pavel Stehule
út 27. 8. 2024 v 13:57 odesílatel Jim Jones 
napsal:

>
>
> On 26.08.24 16:59, Pavel Stehule wrote:
> >
> > 1. what about behaviour of NO INDENT - the implementation is not too
> > old, so it can be changed if we want (I think), and it is better to do
> > early than too late
>
> While checking the feasibility of removing indentation with NO INDENT I
> may have found a bug in XMLSERIALIZE ... INDENT.
> xmlSaveToBuffer seems to ignore elements if there are whitespaces
> between them:
>
> SELECT xmlserialize(DOCUMENT '42' AS text INDENT);
>   xmlserialize
> -
>+
>42+
>   +
>
> (1 row)
>
> SELECT xmlserialize(DOCUMENT ' 42 '::xml AS text
> INDENT);
> xmlserialize
> 
>   42 +
>
> (1 row)
>
> I'll take a look at it.
>

+1


> Regarding removing indentation: yes, it would be possible with libxml2.
> The question is if it would be right to do so.
> > 2. Are we able to implement SQL/XML syntax with libxml2?
> >
> > 3. Are we able to implement Oracle syntax with libxml2? And there are
> > benefits other than higher possible compatibility?
> I guess it would be beneficial if you're migrating from oracle to
> postgres - or the other way around. It certainly wouldn't hurt, but so
> far I personally had little use for the oracle's extra xmlserialize
> features.
> >
> > 4. Can there be some possible collision (functionality, syntax) with
> > CANONICAL?
> I couldn't find anything in the SQL/XML spec that might refer to
> canonocal xml.
> >
> > 5. SQL/XML XMLSERIALIZE supports other target types than varchar. I
> > can imagine XMLSERIALIZE with CANONICAL to bytea (then we don't need
> > to force database encoding). Does it make sense? Are the results
> > comparable?
> As of pg16 bytea is not supported. Currently type can be character,
> character varying, or text - also their other flavours like 'name'.
>

I know, but theoretically there can be some benefit for CANONICAL if pg
supports bytea there. A lot of databases still use non-UTF8 encodings.

It is a more theoretical question - if pg supports different target types
there in the future (because of SQL/XML or Oracle), can CANONICAL be used
without limits, or just for text? And are you sure you can then compare
text vs. text instead of xml vs. xml?

+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT
doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize;
+ ?column?
+--
+ t
+ t
+(2 rows)

Maybe I am a little bit confused by these regress tests, because in the end
this one is not very useful - you compare two identical XML documents, and
WITH COMMENTS vs. WITHOUT COMMENTS is tested elsewhere. I tried to find the
point of this test.  Better to use genuinely different documents (columns)
instead.
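For context, the point of canonicalization is that two logically equal
documents serialize to byte-identical text, so plain text comparison becomes
meaningful. A minimal sketch using Python's standard-library C14N support -
purely illustrative, not the patch's libxml2 code path:

```python
from xml.etree.ElementTree import canonicalize

# Two logically equal documents that differ textually: attribute order,
# empty-element form, and an extra comment.
a = '<doc b="2" a="1"><e/></doc>'
b = '<doc a="1" b="2"><!-- noise --><e></e></doc>'

# C14N sorts attributes, expands empty elements, and (optionally)
# strips comments, so both forms converge to the same bytes.
ca = canonicalize(a, with_comments=False)
cb = canonicalize(b, with_comments=False)

assert a != b
assert ca == cb
print(ca)
```

With byte-identical canonical forms, comparing the serialized text directly
is equivalent to comparing the documents.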

Regards

Pavel


>
>
> --
> Jim
>
>


Primary and standby setting cross-checks

2024-08-29 Thread Heikki Linnakangas
Currently, if you configure a hot standby server with a smaller 
max_connections setting than the primary, the server refuses to start up:


LOG:  entering standby mode
FATAL:  recovery aborted because of insufficient parameter settings
DETAIL:  max_connections = 10 is a lower setting than on the primary 
server, where its value was 100.
HINT:  You can restart the server after making the necessary 
configuration changes.


Or if you change the setting in the primary while the standby is 
running, replay pauses:


WARNING:  hot standby is not possible because of insufficient parameter 
settings
DETAIL:  max_connections = 100 is a lower setting than on the primary 
server, where its value was 200.
CONTEXT:  WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: 
max_connections=200 max_worker_processes=8 max_wal_senders=10 
max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical 
wal_log_hints=off track_commit_timestamp=off

LOG:  recovery has paused
DETAIL:  If recovery is unpaused, the server will shut down.
HINT:  You can then restart the server after making the necessary 
configuration changes.
CONTEXT:  WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: 
max_connections=200 max_worker_processes=8 max_wal_senders=10 
max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical 
wal_log_hints=off track_commit_timestamp=off


Both of these are rather unpleasant behavior.

I thought I could get rid of that limitation with my CSN snapshot patch 
[1], because it gets rid of the fixed-size known-assigned XIDs array, 
but there's a second reason for these limitations. It's also used to 
ensure that the standby has enough space in the lock manager to hold 
possible AccessExclusiveLocks taken by transactions in the primary.
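For reference, the capacity being protected here follows the shared
lock-table sizing formula from the PostgreSQL documentation; a toy
calculation (the function name is made up for illustration):

```python
def lock_table_slots(max_locks_per_transaction, max_connections,
                     max_prepared_transactions=0):
    """Approximate number of object locks the shared lock table can
    track, per the sizing formula in the PostgreSQL documentation."""
    return max_locks_per_transaction * (max_connections
                                        + max_prepared_transactions)

# A standby configured smaller than the primary has fewer slots for
# replayed AccessExclusiveLocks *and* the locks of its own hot-standby
# queries combined.
primary = lock_table_slots(64, 200)   # -> 12800
standby = lock_table_slots(64, 100)   # -> 6400
print(primary, standby)
```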


So firstly, I think that's a bad tradeoff. In the vast majority of cases, 
you would not run out of lock space anyway if you just started up the 
system. Secondly, that cross-check of settings doesn't fully prevent the 
problem. It ensures that the lock tables are large enough to accommodate 
all the locks you could possibly hold in the primary, but that doesn't 
take into account any additional locks held by read-only queries in the 
hot standby. So if you have queries running in the standby that take a 
lot of locks, this can happen anyway:


2024-08-29 21:44:32.634 EEST [668327] FATAL:  out of shared memory
2024-08-29 21:44:32.634 EEST [668327] HINT:  You might need to increase 
"max_locks_per_transaction".
2024-08-29 21:44:32.634 EEST [668327] CONTEXT:  WAL redo at 2/FD40FCC8 
for Standby/LOCK: xid 996 db 5 rel 154045
2024-08-29 21:44:32.634 EEST [668327] WARNING:  you don't own a lock of 
type AccessExclusiveLock
2024-08-29 21:44:32.634 EEST [668327] LOG:  RecoveryLockHash contains 
entry for lock no longer recorded by lock manager: xid 996 database 5 
relation 154045
TRAP: failed Assert("false"), File: 
"../src/backend/storage/ipc/standby.c", Line: 1053, PID: 668327
postgres: startup recovering 
0001000200FD(ExceptionalCondition+0x6e)[0x556a4588396e]
postgres: startup recovering 
0001000200FD(+0x44156e)[0x556a4571356e]
postgres: startup recovering 
0001000200FD(StandbyReleaseAllLocks+0x78)[0x556a45712738]
postgres: startup recovering 
0001000200FD(ShutdownRecoveryTransactionEnvironment+0x15)[0x556a45712685]
postgres: startup recovering 
0001000200FD(shmem_exit+0x111)[0x556a457062e1]
postgres: startup recovering 
0001000200FD(+0x434132)[0x556a45706132]
postgres: startup recovering 
0001000200FD(proc_exit+0x59)[0x556a45706079]
postgres: startup recovering 
0001000200FD(errfinish+0x278)[0x556a45884708]
postgres: startup recovering 
0001000200FD(LockAcquireExtended+0xa46)[0x556a45719386]
postgres: startup recovering 
0001000200FD(StandbyAcquireAccessExclusiveLock+0x11d)[0x556a4571330d]
postgres: startup recovering 
0001000200FD(standby_redo+0x70)[0x556a45713690]
postgres: startup recovering 
0001000200FD(PerformWalRecovery+0x7b3)[0x556a4547d313]
postgres: startup recovering 
0001000200FD(StartupXLOG+0xac3)[0x556a4546dae3]
postgres: startup recovering 
0001000200FD(StartupProcessMain+0xe8)[0x556a45693558]
postgres: startup recovering 
0001000200FD(+0x3ba95d)[0x556a4568c95d]
postgres: startup recovering 
0001000200FD(+0x3bce41)[0x556a4568ee41]
postgres: startup recovering 
0001000200FD(PostmasterMain+0x116e)[0x556a4568eaae]
postgres: startup recovering 
0001000200FD(+0x2f960e)[0x556a455cb60e]

/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f10ef042c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f10ef042d45]
postgres: startup recovering 
0001000200FD(_start+0x21)[0x556a453af011]
2024-08-29 21:44:32.641 EEST [668324] LOG:  startup process (PID 668327) 
was terminated by signal 6: Aborted
2024-08-29 21:44:32.641 EEST [668324] LOG:  terminating any other active 

Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

2024-08-29 Thread Nathan Bossart
On Fri, Aug 30, 2024 at 12:17:47AM +0530, Ayush Vatsa wrote:
> The patch with all the changes is attached.

Looks generally reasonable to me.

-- 
nathan




Re: Allow logical failover slots to wait on synchronous replication

2024-08-29 Thread John H
Hi Shveta,

Thanks for reviewing it so quickly.

On Thu, Aug 29, 2024 at 2:35 AM shveta malik  wrote:
>
> Thanks for the patch. Few comments and queries:
>
> 1)
> + static XLogRecPtr   lsn[NUM_SYNC_REP_WAIT_MODE];
>
> We shall name it as 'lsns' as there are multiple.
>

This follows the same naming convention in walsender_private.h

> 2)
>
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> + lsn[i] = InvalidXLogRecPtr;
> + }
>
> Can we do it like below similar to what you have done at another place:
> memset(lsn, InvalidXLogRecPtr, sizeof(lsn));
>

I don't think memset quite works in this case? Well, it *technically*
works, but I'm not sure that's something worth optimizing.
If I understand correctly, memset takes an int used as a byte (unsigned
char) fill value, not an XLogRecPtr (uint64).

So something like: memset(lsn, 0, sizeof(lsn))

InvalidXLogRecPtr is defined as 0, so it works, but there's an implicit
dependency here for correctness.
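The byte-fill pitfall described above can be demonstrated outside of C as
well; a small sketch via Python's ctypes (the array name is just a
stand-in, not the patch's variable):

```python
import ctypes

NUM = 4
lsn = (ctypes.c_uint64 * NUM)()  # stand-in for the C array of XLogRecPtr

# memset(ptr, 0, n) coincides with "assign InvalidXLogRecPtr (0) to
# every element" only because the fill value is zero...
ctypes.memset(ctypes.addressof(lsn), 0, ctypes.sizeof(lsn))
assert list(lsn) == [0] * NUM

# ...since memset fills *bytes*, not uint64 values: filling with 1
# produces 0x0101010101010101 per element, not 1.
ctypes.memset(ctypes.addressof(lsn), 1, ctypes.sizeof(lsn))
assert list(lsn) == [0x0101010101010101] * NUM
print([hex(v) for v in lsn])
```

So memset is only safe here as long as InvalidXLogRecPtr stays 0, which is
exactly the implicit dependency mentioned above.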

> 3)
> + if (!initialized)
> + {
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> + lsn[i] = InvalidXLogRecPtr;
> + }
> + }
>
> I do not see 'initialized' set to TRUE anywhere. Can you please
> elaborate the intent here?

You're right I thought I fixed this. WIll update.

>
> 4)
> + int mode = SyncRepWaitMode;
> + int i;
> +
> + if (!initialized)
> + {
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> + lsn[i] = InvalidXLogRecPtr;
> + }
> + }
> + if (mode == SYNC_REP_NO_WAIT)
> + return true;
>
> I do not understand this code well. As Amit also pointed out,  'mode'
> may change. When we initialize 'mode' lets say SyncRepWaitMode is
> SYNC_REP_NO_WAIT but by the time we check 'if (mode ==
> SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to say
> SYNC_REP_WAIT_FLUSH, if so, then we will wrongly return true from
> here. Is that a possibility? ProcessConfigFile() is in the caller, and
> thus we may end up using the wrong mode.
>

Yes, it's possible for mode to change. As in my comment to Amit in the other
thread, I think we have to store the mode, base the execution of this logic
on it, and ignore SyncRepWaitMode changing due to ProcessConfigFile/SIGHUP
for one iteration.

We can store the value of mode later, so something like:

> if (SyncRepWaitMode == SYNC_REP_NO_WAIT)
> return true;
> mode = SyncRepWaitMode
> if (lsn[mode] >= wait_for_lsn)
>   return true;

But it's the same issue: by the time you check lsn[mode], SyncRepWaitMode
could have changed to something else, so you always have to capture the
value, and will always have this discrepancy.

I'm skeptical that end users change SyncRepWaitMode in their database
clusters, as it has implications for durability, and I would assume they
run with the same durability guarantees.

Thanks,
-- 
John Hsu - Amazon Web Services




Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-08-29 Thread Jeff Davis
On Thu, 2024-08-29 at 12:55 +0530, Bharath Rupireddy wrote:
> IMO, every caller branching out in the code like if (rel->rd_tableam-
> >
> tuple_modify_buffer_insert != NULL) then multi insert; else single
> insert; doesn't look good. IMO, the default implementation approach
> keeps things simple which eventually can be removed in *near* future.
> Thoughts?

I believe we need the branching in the caller anyway:

1. If there is a BEFORE row trigger with a volatile function, the
visibility rules[1] mean that the function should see changes from all
the rows inserted so far this command, which won't work if they are
still in the buffer.

2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
that the function should see all previous rows inserted.

3. If there are volatile functions in the target list or WHERE clause,
the same visibility semantics apply.

4. If there's a "RETURNING ctid" clause, we need to either come up with
a way to return the tuples after flushing, or we need to use the
single-tuple path. (Similarly in the future when we support UPDATE ...
RETURNING, as Matthias pointed out.)

If we need two paths in each caller anyway, it seems cleaner to just
wrap the check for tuple_modify_buffer_insert in
table_modify_buffer_enabled().

We could perhaps use a one path and then force a batch size of one or
something, which is an alternative, but we have to be careful not to
introduce a regression (and it still requires a solution for #4).
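The visibility hazard behind points 1-3 can be modeled simply: rows parked
in a multi-insert buffer are invisible to anything reading the table before
a flush. A toy sketch - none of these names are the proposed table AM API:

```python
class BufferedTable:
    """Toy model: inserts are staged in a buffer and only become
    visible in 'rows' after flush(), mimicking a multi-insert path."""
    def __init__(self, threshold=3):
        self.rows = []      # "visible" tuples
        self.buffer = []    # staged, not-yet-visible tuples
        self.threshold = threshold

    def buffer_insert(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        self.rows.extend(self.buffer)
        self.buffer.clear()

t = BufferedTable(threshold=3)
t.buffer_insert('r1')
t.buffer_insert('r2')

# A BEFORE-row trigger with a volatile function firing now is supposed
# to see r1 and r2 already inserted -- but the buffered path hides them:
assert t.rows == []          # buffered rows are not yet visible

t.flush()                    # the single-tuple path (or a forced flush)
assert t.rows == ['r1', 'r2']  # restores the visibility the rules require
```

This is why the caller has to branch to the single-tuple path whenever the
visibility rules can observe partially inserted state.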

Regards,
Jeff Davis

[1] 
https://www.postgresql.org/docs/devel/trigger-datachanges.html





Add parallel columns for pg_stat_statements

2024-08-29 Thread Guillaume Lelarge
Hello,

This patch was a bit discussed on [1], and with more details on [2]. It's
based on another patch sent in 2022 (see [3]). It introduces seven new
columns in pg_stat_statements:

 * parallelized_queries_planned, number of times the query has been planned
to be parallelized,
 * parallelized_queries_launched, number of times the query has been
executed with parallelization,
 * parallelized_workers_planned, number of parallel workers planned for
this query,
 * parallelized_workers_launched, number of parallel workers launched for
this query,
 * parallelized_nodes, number of parallelized nodes,
 * parallelized_nodes_all_workers, number of parallelized nodes which had
all requested workers,
 * parallelized_nodes_no_worker, number of parallelized nodes which had no
requested workers.

As Benoit said yesterday, the intent is to help administrators evaluate the
usage of parallel workers in their databases and help configure
parallelization.

A test script (test2.sql) is attached. You can execute it with "psql -Xef
test2.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).

Here is its result, a bit commented:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION
SELECT pg_stat_statements_reset();
   pg_stat_statements_reset
---
 2024-08-29 18:00:35.314557+02
(1 row)

DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 1000
VACUUM ANALYZE t1;
VACUUM
SELECT query,
  parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
(0 rows)

SELECT * FROM t1 LIMIT 1;
 id

  1
(1 row)

SELECT pg_sleep(1);
SELECT query,
  parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]--+--
query  | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned   | 0
parallelized_queries_launched  | 0
parallelized_workers_planned   | 0
parallelized_workers_launched  | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker   | 0

==> no parallelization

SELECT count(*) FROM t1;
  count
--
 1000
(1 row)

SELECT pg_sleep(1);
SELECT query,
  parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]--+--
query  | SELECT count(*) FROM t1
parallelized_queries_planned   | 1
parallelized_queries_launched  | 1
parallelized_workers_planned   | 2
parallelized_workers_launched  | 2
parallelized_nodes | 1
parallelized_nodes_all_workers | 1
parallelized_nodes_no_worker   | 0
-[ RECORD 2 ]--+--
query  | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned   | 0
parallelized_queries_launched  | 0
parallelized_workers_planned   | 0
parallelized_workers_launched  | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker   | 0

==> one parallelized query
==> I have the default configuration, so 2 for
max_parallel_worker_per_gather
==> hence two workers, with one node with all workers

SET max_parallel_workers_per_gather TO 5;
SET
SELECT count(*) FROM t1;
  count
--
 1000
(1 row)

SELECT pg_sleep(1);
SELECT query,
  parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]--+--
query  | SELECT count(*) FROM t1
parallelized_queries_planned   | 2
parallelized_queries_launched  | 2
parallelized_workers_planned   | 6
parallelized_workers_launched  | 6
parallelized_nodes | 2
parallelized_nodes_all_workers | 2
parallelized_nodes_no_worker   | 0
-[ RECORD 2 ]--+--
query  | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned   | 0
parallelized_queries_launched  | 0
parallelized_workers_planned   | 0
parallelized_workers_launched  | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker   | 0

==> another parallelized query
==> with 

Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2024-08-29 Thread Jacob Champion
On Sun, Jun 30, 2024 at 10:48 AM Noah Misch  wrote:
> > Would anyone like me to be more aggressive, and create a pgstat entry
> > as soon as we have the opening transaction? Or... as soon as a
> > connection is made?
>
> All else being equal, I'd like backends to have one before taking any lmgr
> lock or snapshot.

v3-0003 pushes the pgstat creation as far back as I felt comfortable,
right after the PGPROC registration by InitProcessPhase2(). That
function does lock the ProcArray, but if it gets held forever due to
some bug, you won't be able to use pg_stat_activity to debug it
anyway. And with this ordering, pg_stat_get_activity() will be able to
retrieve the proc entry by PID without a race.

This approach ends up registering an early entry for more cases than
the original patchset. For example, autovacuum and other background
workers will now briefly get their own "authenticating" state, which
seems like it could potentially confuse people. Should I rename the
state, or am I overthinking it?

> You could release the xmin before calling PAM or LDAP.  If you've copied all
> relevant catalog content to local memory, that's fine to do.

I played with the xmin problem a little bit, but I've shelved it for
now. There's probably a way to do that safely; I just don't understand
enough about the invariants to do it. For example, there's a comment
later on that says

 * We established a catalog snapshot while reading pg_authid and/or
 * pg_database;

and I'm a little nervous about invalidating the snapshot halfway
through that process. Even if PAM and LDAP don't rely on pg_authid or
other shared catalogs today, shouldn't they be allowed to in the
future, without being coupled to InitPostgres implementation order?
And I don't think we can move the pg_database checks before
authentication.

As for the other patches, I'll ping Andrew about 0001, and 0004
remains in its original WIP state. Anyone excited about that wait
event idea?

Thanks!
--Jacob


v3-0002-Test-Cluster-let-background_psql-work-asynchronou.patch
Description: Binary data


v3-0001-BackgroundPsql-handle-empty-query-results.patch
Description: Binary data


v3-0003-pgstat-report-in-earlier-with-STATE_AUTHENTICATIN.patch
Description: Binary data


v3-0004-WIP-report-external-auth-calls-as-wait-events.patch
Description: Binary data


Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-08-29 Thread Masahiko Sawada
On Sun, May 19, 2024 at 6:46 AM Noah Misch  wrote:
>
> On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> >
> > How about extending amcheck to support GIN and GIst indexes so that it
> > can detect potential data incompatibility due to changing 'char' to
> > 'unsigned char'? I think these new tests would be useful also for
> > users to check if they really need to reindex indexes due to such
> > changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> > Users who upgraded to PG18 can run the new amcheck tests on the
> > primary as well as the standby.
>
> If I were standardizing pg_trgm on one or the other notion of "char", I would
> choose signed char, since I think it's still the majority.  More broadly, I
> see these options to fix pg_trgm:
>
> 1. Change to signed char.  Every arm64 system needs to scan pg_trgm indexes.
> 2. Change to unsigned char.  Every x86 system needs to scan pg_trgm indexes.

Even though it's true that signed char systems are the majority, it
would not be acceptable to force the need to scan pg_trgm indexes on
unsigned char systems.
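For readers unfamiliar with the underlying portability issue: plain char is
signed on most x86 ABIs and unsigned on most arm64 ABIs, so the same
on-disk byte can compare differently. A small illustration (not pg_trgm
code):

```python
import struct

byte = b'\xe9'  # a high-bit byte, e.g. part of a multibyte character

signed_val, = struct.unpack('b', byte)    # plain char on signed-char ABIs
unsigned_val, = struct.unpack('B', byte)  # plain char on unsigned-char ABIs

assert signed_val == -23 and unsigned_val == 233

# The same two "chars" sort in opposite orders on the two ABIs, which
# is exactly what makes index order diverge across architectures.
other = ord('a')  # 0x61 == 97, low-bit byte, same on both ABIs
assert signed_val < other       # signed-char platform: 0xE9 sorts low
assert unsigned_val > other     # unsigned-char platform: 0xE9 sorts high
```

Since GIN/GiST entries are kept in comparator order on disk, an index built
under one interpretation looks corrupt under the other.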

> 3. Offer both, as an upgrade path.  For example, pg_trgm could have separate
>operator classes gin_trgm_ops and gin_trgm_ops_unsigned.  Running
>pg_upgrade on an unsigned-char system would automatically map v17
>gin_trgm_ops to v18 gin_trgm_ops_unsigned.  This avoids penalizing any
>architecture with upgrade-time scans.

Very interesting idea. How can new v18 users use the correct operator
class? I don't want to require users to specify the correct signed or
unsigned operator classes when creating a GIN index. Maybe we need to
dynamically use the correct compare function for the same operator
class depending on the char signedness. But is it possible to do it on
the extension (e.g. pg_trgm) side?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

2024-08-29 Thread Tom Lane
Richard Guo  writes:
> On Thu, Aug 29, 2024 at 4:47 AM Tom Lane  wrote:
>> In the meantime, I think this test case is mighty artificial,
>> and it wouldn't bother me any to just take it out again for the
>> time being.

> Yeah, I think we can remove the 't1.two+t2.two' test case if we go
> with your proposed patch to address this performance regression.

Here's a polished-up patchset for that.  I made the memoize test
removal a separate patch because (a) it only applies to master
and (b) it seems worth calling out as something we might be able
to revert later.

I found one bug in the draft patch: add_nulling_relids only processes
Vars of level zero, so we have to apply it before not after adjusting
the Vars' levelsup.  An alternative could be to add a levelsup
parameter to add_nulling_relids, but that seemed like unnecessary
complication.

regards, tom lane

From 8fafdc4852b8f2164286e6863219eb6b4d267639 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Thu, 29 Aug 2024 16:25:23 -0400
Subject: [PATCH v1 1/2] Remove one memoize test case added by commit
 069d0ff02.

This test case turns out to depend on the assumption that a non-Var
subquery output that's underneath an outer join will always get
wrapped in a PlaceHolderVar.  But that behavior causes performance
regressions in some cases compared to what happened before v16.
The next commit will avoid inserting a PHV in the same cases where
pre-v16 did, and that causes get_memoized_path to not detect that
a memoize plan could be used.

Commit this separately, in hopes that we can restore the test after
making get_memoized_path smarter.  (It's failing to find memoize
plans in adjacent cases where no PHV was ever inserted, so there
is definitely room for improvement there.)

Discussion: https://postgr.es/m/cag1ps1xvntzcekk24oufmklpvdp2vjt-d+f2aocwbw_v3ke...@mail.gmail.com
---
 src/test/regress/expected/memoize.out | 30 ---
 src/test/regress/sql/memoize.sql  | 11 --
 2 files changed, 41 deletions(-)

diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index df2ca5ba4e..9ee09fe2f5 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -160,36 +160,6 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
   1000 | 9.5000
 (1 row)
 
--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-   explain_memoize
---
- Aggregate (actual rows=1 loops=N)
-   ->  Nested Loop (actual rows=1000 loops=N)
- ->  Seq Scan on tenk1 t1 (actual rows=1000 loops=N)
-   Filter: (unique1 < 1000)
-   Rows Removed by Filter: 9000
- ->  Memoize (actual rows=1 loops=N)
-   Cache Key: t1.two
-   Cache Mode: binary
-   Hits: 998  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
-   ->  Seq Scan on tenk1 t2 (actual rows=1 loops=N)
- Filter: ((t1.two + two) = unique1)
- Rows Removed by Filter: 
-(12 rows)
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
- count |avg 
+
-  1000 | 9.
-(1 row)
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
diff --git a/src/test/regress/sql/memoize.sql b/src/test/regress/sql/memoize.sql
index 059bec5f4f..2eaeb1477a 100644
--- a/src/test/regress/sql/memoize.sql
+++ b/src/test/regress/sql/memoize.sql
@@ -85,17 +85,6 @@ SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
 LATERAL (SELECT t1.two+1 AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
 WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
 
--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-- 
2.43.5

From 7b48394838797489e7ab869f97ca06449fdbcee3 Mon Sep 17 00:00:00 2001
F

Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-29 Thread Andrew Dunstan



On 2024-08-29 Th 1:01 PM, Mark Murawski wrote:



On 8/29/24 11:56, Andrew Dunstan wrote:


On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able 
to tell which function you're in.
This is a hashref so it will be possible to populate new and 
exciting other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings,  Because as it stands right now, there's no 
information that can assist with locating the source of a warning 
like this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. This 
could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, it's 
greatly helped track down issues.  And there shouldn't be anything 
platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres patch, 
so any guidance on adding regression testing would be appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in 
the past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name 
already, just like when triggers pass you $_TD with all the 
pertinent information


A wishlist item would be for postgres plperl to automatically 
prepend the function name and schema when throwing perl warnings so 
you don't have to do your own __WARN__ handler, but this is the next 
best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, whereas 
you can know the function's name at compile time, using just the 
mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for other 
PLs?






Hi Andrew,


Thanks for the feedback.


1) Why is this not similar to _TD?  It literally operates identically. 
At run-time it passes you $_TD for triggers.  Same here for 
functions.  This is all run-time.  What exactly is the issue you're 
trying to point out?



It's not the same as the trigger data case because the function name is 
knowable at compile time, as in fact you have demonstrated. You just 
find it a bit inconvenient to have to code for that knowledge. By 
contrast, trigger data is ONLY knowable at run time.


But I don't see that it is such a heavy burden to have to write

  my $funcname = "foo";

or similar in your function.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Switching XLog source from archive to streaming when primary available

2024-08-29 Thread John H
Hi,

On Thu, Aug 29, 2024 at 6:32 AM Bharath Rupireddy
 wrote:
> In synchronous replication setup, until standby finishes fetching WAL
> from the archive, the commits on the primary have to wait which can
> increase the query latency. If the standby can connect to the primary
> as soon as the broken connection is restored, it can fetch the WAL
> soon and transaction commits can continue on the primary. Is my
> understanding correct? Is there anything more to this?
>

Yup, if you're running with synchronous_commit = 'on' with
synchronous replicas, then the replica can
continue streaming changes into pg_wal faster than
WAL replay, so commits
may be unblocked faster.

> I talked to Michael Paquier at PGConf.Dev 2024 and got some concerns
> about this feature for dealing with changing timelines. I can't think
> of them right now.

I'm not sure what the risk would be if the WAL/history files we sync
from streaming are the same as
the ones we replay from archive.


> And, there were some cautions raised upthread -
> https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13
> and https://www.postgresql.org/message-id/ZffaQt7UbM2Q9kYh%40paquier.xyz.

Yup agreed. I need to understand this area a lot better before I can
do a more in-depth review.

> Interesting. Yes, the restore script has to be smarter to detect the
> broken connections and distinguish whether the server is performing
> just the archive recovery/PITR or streaming from standby. Not doing it
> right, perhaps, can cause data loss (?).

I don't think there would be data loss, only that replay is stuck/slowed down.
It wouldn't be any different today if the restore script returned a
non-zero exit status.
The end user could configure their restore script to return a non-zero
status, based on some
condition, to move to streaming.

> > > However,
> > > + * exhaust all the WAL present in pg_wal before switching. If successful,
> > > + * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls
> > > + * back to XLOG_FROM_ARCHIVE state.
> >
> > I think I'm missing how this happens. Or what "successful" means. If I'm 
> > reading
> > it right, no matter what happens we will always move to
> > XLOG_FROM_STREAM based on how
> > the state machine works?
>
> Please have a look at some discussion upthread on exhausting pg_wal
> before switching -
> https://www.postgresql.org/message-id/20230119005014.GA3838170%40nathanxps13.
> Even today, the standby exhausts pg_wal before switching to streaming
> from the archive.
>

I'm getting caught on the word "successful". My rough understanding of
WaitForWALToBecomeAvailable is that once you're in XLOG_FROM_PG_WAL, if it was
unsuccessful for whatever reason, it will still transition to
XLOG_FROM_STREAMING.
It does not loop back to XLOG_FROM_ARCHIVE if XLOG_FROM_PG_WAL fails.

> Nice catch. This is a problem. One idea is to disable
> streaming_replication_retry_interval feature for slot-less streaming
> replication - either when primary_slot_name isn't specified disallow
> the GUC to be set in assign_hook or when deciding to switch the wal
> source. Thoughts?

I don't think it's dependent on slot-less streaming. You would also run into the
issue if the WAL is no longer there on the primary, which can occur
with 'max_slot_wal_keep_size'
as well.
IMO the guarantee we need to make is that when we transition from
XLOG_FROM_STREAMING to
XLOG_FROM_ARCHIVE for a "fresh start", we should attempt to restore
from archive at least once.
I think this means that wal_source_switch_state should be reset back
to SWITCH_TO_STREAMING_NONE
whenever we transition to XLOG_FROM_ARCHIVE.
We've attempted the switch to streaming once, so let's not continually
re-try if it failed.

Thanks,

-- 
John Hsu - Amazon Web Services




Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-29 Thread Mark Murawski




On 8/29/24 16:54, Andrew Dunstan wrote:


On 2024-08-29 Th 1:01 PM, Mark Murawski wrote:



On 8/29/24 11:56, Andrew Dunstan wrote:


On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able 
to tell which function you're in.
This is a hashref so it will be possible to populate new and 
exciting other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings,  Because as it stands right now, there's no 
information that can assist with locating the source of a warning 
like this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. This 
could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, it's 
greatly helped track down issues.  And there shouldn't be anything 
platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres 
patch, so any guidance on adding regression testing would be 
appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in 
the past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name 
already, just like when triggers pass you $_TD with all the 
pertinent information


A wishlist item would be for postgres plperl to automatically 
prepend the function name and schema when throwing perl warnings so 
you don't have to do your own __WARN__ handler, but this is the 
next best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, 
whereas you can know the function's name at compile time, using just 
the mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for other 
PLs?






Hi Andrew,


Thanks for the feedback.


1) Why is this not similar to _TD?  It literally operates 
identically. At run-time it passes you $_TD for triggers. Same here 
for functions.  This is all run-time.   What exactly is the issue 
you're trying to point out?



It's not the same as the trigger data case because the function name 
is knowable at compile time, as in fact you have demonstrated. You 
just find it a bit inconvenient to have to code for that knowledge. By 
contrast, trigger data is ONLY knowable at run time.


But I don't see that it is such a heavy burden to have to write

  my $funcname = "foo";

or similar in your function.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Understood, regarding knowability.  Trigger data is definitely going to 
be very dynamic in that regard.


No, It's not a heavy burden to hard code the function name.  But what my 
ideal goal would be is this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu 
AS $function$
use PostgresWarnHandler; # <--- imagine this registers a WARN handler 
and outputs $_FN->{name} for you as part of the final warning


my $a;
print $a;

 etc


and then there's nothing 'hard coded' regarding the name of the 
function, anywhere.  It just seems nonsensical that postgres plperl 
can't send you the name of your registered function and there's 
absolutely no way to get it other than hard coding the name 
(redundantly, considering you already know the name when you're 
defining the function in the first place)


But even better would be catching the warn at the plperl level, 
prepending the function name to the warn message, and then you only need:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu 
AS $function$


my $a;
print $a;

 etc

And then in this hypothetical situation -- magic ensues, and you're left 
with this:

# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $a in concatenation 
(.) or string in function public.throw_warning() line 1










Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Jim Jones



On 29.08.24 20:50, Pavel Stehule wrote:
>
> I know, but theoretically, there can be some benefit for CANONICAL if
> pg supports bytea there. Lot of databases still use non utf8 encoding.
>
> It is a more theoretical question - if pg supports different types
> there in future  (because SQL/XML or Oracle), then CANONICAL can be
> used without limit,
I like the idea of extending the feature to support bytea. I can
definitely take a look at it, but perhaps in another patch? This change
would most likely involve transformXmlSerialize in parse_expr.c, and I'm
not sure of the impact in other usages of XMLSERIALIZE.
> or CANONICAL can be used just for text? And you are sure, so you can
> compare text X text, instead xml X xml?
Yes, currently it only supports varchar or text - and their cousins. The
idea is to format the xml and serialize it as text in a way that documents
can be compared based on their content, independently of how they were
written, e.g '' is equal to ''.

>
> +SELECT xmlserialize(CONTENT doc AS text CANONICAL) =
> xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM
> xmltest_serialize;
> + ?column?
> +--
> + t
> + t
> +(2 rows)
>
> Maybe I am a little bit confused by these regress tests, because at
> the end it is not too useful - you compare two identical XML, and WITH
> COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to search
> for a sense of this test.  Better to use really different documents
> (columns) instead.

Yeah, I can see that it's confusing. In this example I actually just
wanted to test that the default option of CANONICAL is CANONICAL WITH
COMMENTS, even if you don't mention it. In the docs I mentioned it like
this:

"The optional parameters WITH COMMENTS (which is the default) or WITH NO
COMMENTS, respectively, keep or remove XML comments from the given
document."

Perhaps I should rephrase it? Or maybe a comment in the regression tests
would suffice?

Thanks a lot for the input!

-- 
Jim





Re: [BUG?] XMLSERIALIZE( ... INDENT) won't work with blank nodes

2024-08-29 Thread Jim Jones


On 28.08.24 10:19, Jim Jones wrote:
> Hi,
>
> While testing a feature reported by Pavel in this thread[1] I realized
> that elements containing whitespaces between them won't be indented with
> XMLSERIALIZE( ... INDENT)
>
> SELECT xmlserialize(DOCUMENT '42' AS text INDENT);
>
>   xmlserialize   
> -
>    +
>    42+
>   +
>  
> (1 row)
>
> SELECT xmlserialize(DOCUMENT ' 42 '::xml AS text
> INDENT);
>     xmlserialize    
> 
>   42 +
>  
> (1 row)
>
>
> Other products have a different approach[2]
>
> Perhaps simply setting xmltotext_with_options' parameter 
> "preserve_whitespace" to false when XMLSERIALIZE(.. INDENT) would do the 
> trick.
>
> doc = xml_parse(data, xmloption_arg, !indent ? true : false,
>   GetDatabaseEncoding(),
>   &parsed_xmloptiontype, &content_nodes,
>   (Node *) &escontext);
>
>
> (diff attached)
>
> SELECT xmlserialize(DOCUMENT ' 42 '::xml AS text
> INDENT);
>   xmlserialize   
> -
>+
>42+
>   +
>  
> (1 row)
>
> If this is indeed the way to go I can update the regression tests accordingly.
>
> Best,
>

Just created a CF entry for this: https://commitfest.postgresql.org/49/5217/
v1 attached includes regression tests.

-- 
Jim
From e474cdc457a91e96432f5a14c69b4ecbc0345579 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 29 Aug 2024 19:41:02 +0200
Subject: [PATCH v1] Bug fix for XMLSERIALIZE(...INDENT) for xml containing
 blank nodes

This fixes a bug that caused XMLSERIALIZE(... INDENT) to ignore
blank nodes. It basically sets xml_parse's parameter
'preserve_whitespace' to false if the INDENT flag was used
in XMLSERIALIZE. New regression tests are also included.
---
 src/backend/utils/adt/xml.c | 10 --
 src/test/regress/expected/xml.out   | 19 +++
 src/test/regress/expected/xml_1.out | 11 +++
 src/test/regress/expected/xml_2.out | 19 +++
 src/test/regress/sql/xml.sql|  3 +++
 5 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 447e72b21e..1cd4929870 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -677,8 +677,14 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
 	}
 
 #ifdef USE_LIBXML
-	/* Parse the input according to the xmloption */
-	doc = xml_parse(data, xmloption_arg, true, GetDatabaseEncoding(),
+	/*
+	 * Parse the input according to the xmloption.
+	 * preserve_whitespace is set to false in case the function should
+	 * return an indented xml, otherwise libxml2 will ignore the elements
+	 * that contain whitespace between them.
+	 */
+	doc = xml_parse(data, xmloption_arg, !indent ? true : false,
+	GetDatabaseEncoding(),
 	&parsed_xmloptiontype, &content_nodes,
 	(Node *) &escontext);
 	if (doc == NULL || escontext.error_occurred)
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 93a79cda8f..6f073101a1 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -663,6 +663,25 @@ SELECT xmlserialize(CONTENT  '42' AS text
  t
 (1 row)
 
+-- indent xml strings containing blank nodes
+SELECT xmlserialize(DOCUMENT '   ' AS text INDENT);
+ xmlserialize 
+--
++
+   +
+   +
+ 
+(1 row)
+
+SELECT xmlserialize(CONTENT  'text node   ' AS text INDENT);
+ xmlserialize 
+--
+ text node   +
++
+   +
+ 
+(1 row)
+
 SELECT xml 'bar' IS DOCUMENT;
  ?column? 
 --
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 9323b84ae2..d26e10441e 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -443,6 +443,17 @@ ERROR:  unsupported XML feature
 LINE 1: SELECT xmlserialize(CONTENT  '42<...
  ^
 DETAIL:  This functionality requires the server to be built with libxml support.
+-- indent xml strings containing blank nodes
+SELECT xmlserialize(DOCUMENT '   ' AS text INDENT);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '   '...
+ ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT  'text node   ' AS text INDENT);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT  'text node ...
+ ^
+DETAIL:  This functionality requires the server to be built with libxml support.
 SELECT xml 'bar' IS DOCUMENT;
 ERROR:  unsupported XML feature
 LINE 1: SELECT xml 'bar' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index f956322c69..7b154da4ba 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -649,6 +649,25 @@ S

Re: allowing extensions to control planner behavior

2024-08-29 Thread Jeff Davis
On Wed, 2024-08-28 at 19:25 -0400, Tom Lane wrote:
> This does not seem remarkably problematic to me, given Robert's
> proposal of a bitmask of allowed plan types per RelOptInfo.
> You just do something like
> 
> rel->allowed_plan_types = DESIRED_PLAN_TYPE;
> 
> The names of the bits you aren't setting are irrelevant to you.

I don't see that in the code yet, so I assume you are referring to the
comment at [1]?

I still like my idea to generalize the pathkey infrastructure, and
Robert asked for other approaches to consider. It would allow us to
hold onto multiple paths for longer, similar to pathkeys, which might
offer some benefits or simplifications.

Regards,
Jeff Davis

[1]
https://www.postgresql.org/message-id/CA+TgmoZQyVxnRU--4g2bJonJ8RyJqNi2CHpy-=nwwbtnpaj...@mail.gmail.com





Re: Removing log_cnt from pg_sequence_read_tuple()

2024-08-29 Thread Michael Paquier
On Thu, Aug 29, 2024 at 09:28:49AM -0500, Nathan Bossart wrote:
> On Thu, Aug 29, 2024 at 02:11:22PM +0900, Michael Paquier wrote:
> > Okay, here is a v2 of the patch using this name for the function.
> 
> LGTM

Thanks, applied that, after one tweak for the #define name.
--
Michael




Re: Make printtup a bit faster

2024-08-29 Thread Andy Fan
David Rowley  writes:

Hello David,

>> My high level proposal is define a type specific print function like:
>>
>> oidprint(Datum datum, StringInfo buf)
>> textprint(Datum datum, StringInfo buf)
>
> I think what we should do instead is make the output functions take a
> StringInfo and just pass it the StringInfo where we'd like the bytes
> written.
>
> That of course would require rewriting all the output functions for
> all the built-in types, so not a small task.  Extensions make that job
> harder. I don't think it would be good to force extensions to rewrite
> their output functions, so perhaps some wrapper function could help us
> align the APIs for extensions that have not been converted yet.

I have a similar concern as Tom that this method looks too
aggressive. That's why I said:

"If a type's print function is not defined, we can still use the out
function."

AND

"Hard coding the relationship between [commonly] used types and {type}print
function OIDs looks not cool; adding a new attribute in pg_type looks too
aggressive, however. Anyway this is the next topic to talk about."

What would be the extra benefit we redesign all the out functions?

> There's a similar problem with input functions not having knowledge of
> the input length. You only have to look at textin() to see how useful
> that could be. Fixing that would probably make COPY FROM horrendously
> faster. Team that up with SIMD for the delimiter char search and COPY
> go a bit better still. Neil Conway did propose the SIMD part in [1],
> but it's just not nearly as good as it could be when having to still
> perform the strlen() calls.

OK, I think I can understand the need to make the in-function know the
input length, and good to know about the SIMD part for delimiter char
search. strlen is essentially a delimiter char search (for '\0') as well.
I'm not sure if "strlen" has been implemented with SIMD, but if not, why not?

> I had planned to work on this for PG18, but I'd be happy for some
> assistance if you're willing.

I see you have done much amazing work with cache-line-friendly data
structure design, branch prediction optimization and SIMD optimization.
I'd like to try one myself. I'm not sure if I can meet the target; what if
we handle the out/in functions separately (possibly by different people)?

-- 
Best Regards
Andy Fan





Re: Make printtup a bit faster

2024-08-29 Thread David Rowley
On Fri, 30 Aug 2024 at 03:33, Tom Lane  wrote:
>
> David Rowley  writes:
> > [ redesign I/O function APIs ]
> > I had planned to work on this for PG18, but I'd be happy for some
> > assistance if you're willing.
>
> I'm skeptical that such a thing will ever be practical.  To avoid
> breaking un-converted data types, all the call sites would have to
> support both old and new APIs.  To avoid breaking non-core callers,
> all the I/O functions would have to support both old and new APIs.
> That probably adds enough overhead to negate whatever benefit you'd
> get.

Scepticism is certainly good when it comes to such a large API change.
I don't want to argue with you, but I'd like to state a few things
about why I think you're wrong on this...

So, we currently return cstrings in our output functions. Let's take
jsonb_out() as an example, to build that cstring, we make a *new*
StringInfoData on *every call* inside JsonbToCStringWorker(). That
gives you 1024 bytes before you need to enlarge it. However, it's
maybe not all bad as we have some size estimations there to call
enlargeStringInfo(), only that's a bit wasteful as it does a
repalloc() which memcpys the freshly allocated 1024 bytes allocated in
initStringInfo() when it doesn't yet contain any data. After
jsonb_out() has returned and we have the cstring, only we forgot the
length of the string, so most places will immediately call strlen() or
do that indirectly via appendStringInfoString(). For larger JSON
documents, that'll likely require pulling cachelines back into L1
again. I don't know how modern CPU cacheline eviction works, but if it
was as simple as FIFO then the strlen() would flush all those
cachelines only for memcpy() to have to read them back again for
output strings larger than L1.

If we rewrote all of core's output functions to use the new API, then
the branch to test the function signature would be perfectly
predictable and amount to an extra cmp and jne/je opcode. So, I just
don't agree with the overheads negating the benefits comment. You're
probably off by 1 order of magnitude at the minimum and for
medium/large varlena types likely 2-3+ orders. Even a simple int4out
requires a palloc()/memcpy. If we were outputting lots of data, e.g.
in a COPY operation, the output buffer would seldom need to be
enlarged as it would quickly adjust to the correct size.

For the input functions, the possible gains are extensive too.
textin() is a good example, it uses cstring_to_text(), but could be
changed to use cstring_to_text_with_len(). Knowing the input string
length also opens the door to SIMD. Take int4in() as an example, if
pg_strtoint32_safe() knew its input length there are a bunch of
prechecks that could be done with either 64-bit SWAR or with SIMD.
For example, if you knew you had an 8-char string of decimal digits
then converting that to an int32 is quite cheap. It's impossible to
overflow an int32 with 8 decimal digits, so no overflow checks need to
be done until there are at least 10 decimal digits. ca6fde922 seems
like good enough example of the possible gains of SIMD vs
byte-at-a-time processing. I saw some queries go 4x faster there and
that was me trying to keep the JSON document sizes realistic.
byte-at-a-time is just not enough to saturate RAM speed. Take DDR5,
for example, Wikipedia says it has a bandwidth of 32–64 GB/s, so
unless we discover room temperature superconductors, we're not going
to see any massive jump in clock speeds any time soon, and with 5 or
6Ghz CPUs, there's just no way to get anywhere near that bandwidth by
processing byte-at-a-time. For some sort of naive strcpy() type
function, you're going to need at least a cmp and mov, even if those
were latency=1 (which they're not, see [1]), you can only do 2.5
billion of those two per second on a 5Ghz processor. I've done tested,
but hypothetically (assuming latency=1) that amounts to processing
2.5GB/s, i.e. a long way from DDR5 RAM speed and that's not taking
into account having to increment pointers to the next byte on each
loop.

David

[1] https://www.agner.org/optimize/instruction_tables.pdf




Re: Make printtup a bit faster

2024-08-29 Thread David Rowley
On Fri, 30 Aug 2024 at 12:10, Andy Fan  wrote:
> What would be the extra benefit we redesign all the out functions?

If I've understood your proposal correctly, it sounds like you want to
invent a new "print" output function for each type to output the Datum
onto a StringInfo, if that's the case, what would be the point of
having both versions? If there's anywhere we call output functions
where the resulting value isn't directly appended to a StringInfo,
then we could just use a temporary StringInfo to obtain the cstring
and its length.

David




Re: Make printtup a bit faster

2024-08-29 Thread Andy Fan
David Rowley  writes:

> On Fri, 30 Aug 2024 at 12:10, Andy Fan  wrote:
>> What would be the extra benefit we redesign all the out functions?
>
> If I've understood your proposal correctly, it sounds like you want to
> invent a new "print" output function for each type to output the Datum
> onto a StringInfo,

Mostly yes, but not for [each type at once], just for the [commonly used
types], like int2/4/8, float4/8, date/time/timestamp, text/.. and so on.

> if that's the case, what would be the point of having both versions?

The biggest benefit would be compatibility.

In my opinion, a print function (which does not need to be in pg_type at
all) is an optional optimization: in some performance-critical paths we
can replace the out-function with the print function, as in printtup. If
such a path finds a type without a print function defined, it just keeps
the old way.

Kind of like the support functions for procs, but for data types. This
way, the changes would be much smaller and could be made step by step.

> If there's anywhere we call output functions
> where the resulting value isn't directly appended to a StringInfo,
> then we could just use a temporary StringInfo to obtain the cstring
> and its length.

I think this is true, but it requires some changes to the caller's code.

-- 
Best Regards
Andy Fan





Re: Make printtup a bit faster

2024-08-29 Thread David Rowley
On Fri, 30 Aug 2024 at 13:04, Andy Fan  wrote:
>
> David Rowley  writes:
> > If there's anywhere we call output functions
> > where the resulting value isn't directly appended to a StringInfo,
> > then we could just use a temporary StringInfo to obtain the cstring
> > and its length.
>
> I think this is true, but it requires some changes to the caller's code.

Yeah, calling code would need to be changed to take advantage of the
new API, however, the differences in which types support which API
could be hidden inside OutputFunctionCall(). That function could just
fake up a StringInfo for any types that only support the old cstring
API. That means we don't need to add handling for both cases
everywhere we need to call the output function. It's possible that
could make some operations slightly slower when only the old API is
available, but then maybe not as we do now have read-only StringInfos.
Maybe the StringInfoData.data field could just be set to point to the
given cstring using initReadOnlyStringInfo() rather than doing
appendBinaryStringInfo() onto yet another buffer for the old API.

David




Re: Make printtup a bit faster

2024-08-29 Thread Andy Fan
David Rowley  writes:

> On Fri, 30 Aug 2024 at 13:04, Andy Fan  wrote:
>>
>> David Rowley  writes:
>> > If there's anywhere we call output functions
>> > where the resulting value isn't directly appended to a StringInfo,
>> > then we could just use a temporary StringInfo to obtain the cstring
>> > and its length.
>>
>> I think this is true, but it requires some changes to the caller's code.
>
> Yeah, calling code would need to be changed to take advantage of the
> new API, however, the differences in which types support which API
> could be hidden inside OutputFunctionCall(). That function could just
> fake up a StringInfo for any types that only support the old cstring
> API. That means we don't need to add handling for both cases
> everywhere we need to call the output function.

We can do this, but then the printtup case (which stands for a
performance-critical path) would still need to be changed to discard
OutputFunctionCall(), since it uses the fake StringInfo and then a
memcpy is needed again, IIUC.

Besides the above, my major concern about your proposal is the need to
change all the types' out-functions at once, which is too aggressive for
me. In a fresh setup without any extensions created, "SELECT count(*)
FROM pg_type" already returns 627, so the other piece of my previous
reply is more important to me.

It is great that both of us feel the current strategy is not good for
performance :)

-- 
Best Regards
Andy Fan





Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-08-29 Thread Peter Smith
Hi, here are some review comments for patch v43-0001.

==
Commit message

1.
... introduces a GUC allowing users set inactive timeout.

~

1a. You should give the name of the new GUC in the commit message.

1b. /set/to set/

==
doc/src/sgml/config.sgml

GUC "replication_slot_inactive_timeout"

2.
Invalidates replication slots that are inactive for longer than
specified amount of time

nit - suggest use similar wording as the prior GUC (wal_sender_timeout):
Invalidate replication slots that are inactive for longer than this
amount of time.

~

3.
This invalidation check happens either when the slot is acquired for
use or during a checkpoint. The time since the slot has become
inactive is known from its inactive_since value using which the
timeout is measured.

nit - the wording is too complicated. suggest:
The timeout check occurs when the slot is next acquired for use, or
during a checkpoint. The slot's 'inactive_since' field value is when
the slot became inactive.
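Incidentally, the timeout rule being documented reduces to a simple
predicate, which may help the wording discussion. The sketch below uses
made-up names, not the patch's actual identifiers:

```python
from datetime import datetime, timedelta

def slot_inactive_too_long(inactive_since, timeout_s, now):
    # Hypothetical illustration only: a slot is invalidated once it has
    # been inactive for longer than the configured timeout; a timeout of
    # 0 disables the check, and an active slot has no inactive_since.
    if timeout_s <= 0 or inactive_since is None:
        return False
    return now - inactive_since > timedelta(seconds=timeout_s)

now = datetime(2024, 8, 29, 12, 0, 0)
assert slot_inactive_too_long(now - timedelta(seconds=90), 60, now) is True
assert slot_inactive_too_long(now - timedelta(seconds=30), 60, now) is False
assert slot_inactive_too_long(None, 60, now) is False            # active slot
assert slot_inactive_too_long(now - timedelta(days=1), 0, now) is False  # disabled
```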

~

4.
Note that the inactive timeout invalidation mechanism is not
applicable for slots on the standby that are being synced from a
primary server (whose synced field is true).

nit - that word "whose" seems ambiguous. suggest:
(e.g. the standby slot has 'synced' field true).

==
doc/src/sgml/system-views.sgml

5.
inactive_timeout means that the slot has been inactive for the
duration specified by replication_slot_inactive_timeout parameter.

nit - suggestion ("longer than"):
... the slot has been inactive for longer than the duration specified
by the replication_slot_inactive_timeout parameter.

==
src/backend/replication/slot.c

6.
 /* Maximum number of invalidation causes */
-#define RS_INVAL_MAX_CAUSES RS_INVAL_WAL_LEVEL
+#define RS_INVAL_MAX_CAUSES RS_INVAL_INACTIVE_TIMEOUT

IMO this #define belongs in the slot.h, immediately below where the
enum is defined.

~~~

7. ReplicationSlotAcquire:

I had a fundamental question about this logic.

IIUC the purpose of the patch was to invalidate replication slots that
have been inactive for too long.

So, it makes sense to me that some periodic processing (e.g.
CheckPointReplicationSlots) might do a sweep over all the slots, and
invalidate the too-long-inactive ones that it finds.

OTOH, it seemed quite strange to me that the patch logic is also
detecting and invalidating inactive slots during the
ReplicationSlotAcquire function. This is kind of saying "ERROR -
sorry, because this was inactive for too long you can't have it" at
the very moment that you wanted to use it again! IIUC such a slot
would be invalidated by the function InvalidatePossiblyObsoleteSlot(),
but therein lies my doubt -- how can the slot be considered as
"obsolete" when we are in the very act of trying to acquire/use it?

I guess it might be argued this is not so different to the scenario of
attempting to acquire a slot that had been invalidated momentarily
before during checkpoint processing. But, somehow that scenario seems
more like bad luck to me, versus ReplicationSlotAcquire() deliberately
invalidating something we *know* is wanted.

~

8.
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+ NameStr(s->data.name)),
+ errdetail("This slot has been invalidated because it was inactive
since %s for more than %d seconds specified by
\"replication_slot_inactive_timeout\".",
+timestamptz_to_str(s->inactive_since),
+replication_slot_inactive_timeout)));

nit - IMO the info should be split into errdetail + errhint. Like this:
errdetail("The slot became invalid because it was inactive since %s,
which is more than %d seconds ago."...)
errhint("You might need to increase \"%s\".",
"replication_slot_inactive_timeout")

~~~

9. ReportSlotInvalidation

+ appendStringInfo(&err_detail,
+ _("The slot has been inactive since %s for more than %d seconds
specified by \"replication_slot_inactive_timeout\"."),
+ timestamptz_to_str(inactive_since),
+ replication_slot_inactive_timeout);
+ break;

IMO this error in ReportSlotInvalidation() should be the same as the
other one from ReplicationSlotAcquire(), which I suggested above
(comment #8) should include a hint. Also, including a hint here will
make this new message consistent with the other errhint (for
"max_slot_wal_keep_size") that is already in this function.

~~~

10. InvalidatePossiblyObsoleteSlot

+ if (cause == RS_INVAL_INACTIVE_TIMEOUT &&
+ (replication_slot_inactive_timeout > 0 &&
+ s->inactive_since > 0 &&
+ !(RecoveryInProgress() && s->data.synced)))

10a. Everything here is && so this has some redundant parentheses.

10b. Actually, IMO this complicated condition is overkill. Won't it be
better to just unconditionally assign
now = GetCurrentTimestamp(); here?

~

11.
+ * Note that we don't invalidate synced slots because,
+ * they are typically considered not active as they don't
+ * perform logical decoding to produce the changes.

nit - tweaked punctuation

~

12.
+ * If the

JIT: Remove some unnecessary instructions.

2024-08-29 Thread Xing Guo
Hi hackers,

I find there are some unnecessary load/store instructions being
emitted by the JIT compiler.

E.g.,

```
v_boolnull = l_load(b, TypeStorageBool, v_resnullp, "");
v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, "");

/* set resnull to boolnull */
LLVMBuildStore(b, v_boolnull, v_resnullp);
/* set revalue to boolvalue */
LLVMBuildStore(b, v_boolvalue, v_resvaluep);
```

v_boolnull is loaded from v_resnullp and stored to v_resnullp immediately.
v_boolvalue is loaded from v_resvaluep and stored to v_resvaluep immediately.

The attached patch is trying to fix them.

Best Regards,
Xing
From 6c7503b9e92c10b3dd9e6d33391273b854d53437 Mon Sep 17 00:00:00 2001
From: Xing Guo 
Date: Fri, 30 Aug 2024 11:43:45 +0800
Subject: [PATCH] JIT: Remove some unnecessary instructions.

There're some unnecessary instructions being emitted by the JIT
compiler. This patch removes them.
---
 src/backend/jit/llvm/llvmjit_expr.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 48ccdb942a..1edc476c59 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -720,11 +720,6 @@ llvm_compile_expr(ExprState *state)
 	v_boolnull = l_load(b, TypeStorageBool, v_resnullp, "");
 	v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, "");
 
-	/* set resnull to boolnull */
-	LLVMBuildStore(b, v_boolnull, v_resnullp);
-	/* set revalue to boolvalue */
-	LLVMBuildStore(b, v_boolvalue, v_resvaluep);
-
 	/* check if current input is NULL */
 	LLVMBuildCondBr(b,
 	LLVMBuildICmp(b, LLVMIntEQ, v_boolnull,
@@ -816,11 +811,6 @@ llvm_compile_expr(ExprState *state)
 	v_boolnull = l_load(b, TypeStorageBool, v_resnullp, "");
 	v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, "");
 
-	/* set resnull to boolnull */
-	LLVMBuildStore(b, v_boolnull, v_resnullp);
-	/* set revalue to boolvalue */
-	LLVMBuildStore(b, v_boolvalue, v_resvaluep);
-
 	LLVMBuildCondBr(b,
 	LLVMBuildICmp(b, LLVMIntEQ, v_boolnull,
   l_sbool_const(1), ""),
@@ -875,10 +865,8 @@ llvm_compile_expr(ExprState *state)
 			case EEOP_BOOL_NOT_STEP:
 {
 	LLVMValueRef v_boolvalue;
-	LLVMValueRef v_boolnull;
 	LLVMValueRef v_negbool;
 
-	v_boolnull = l_load(b, TypeStorageBool, v_resnullp, "");
 	v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, "");
 
 	v_negbool = LLVMBuildZExt(b,
@@ -887,8 +875,7 @@ llvm_compile_expr(ExprState *state)
 			l_sizet_const(0),
 			""),
 			  TypeSizeT, "");
-	/* set resnull to boolnull */
-	LLVMBuildStore(b, v_boolnull, v_resnullp);
+
 	/* set revalue to !boolvalue */
 	LLVMBuildStore(b, v_negbool, v_resvaluep);
 
-- 
2.46.0



Re: configure failures on chipmunk

2024-08-29 Thread Alexander Lakhin

Hello Heikki,

26.08.2024 07:00, Alexander Lakhin wrote:

Could you also take a look at the kerberos setup on chipmunk? Now that
chipmunk goes beyond configure (on HEAD and REL_17_STABLE), it fails on the
kerberosCheck stage, ...


I see that chipmunk turned green.

Thank you for taking care of that!

Best regards,
Alexander




Re: Collect statistics about conflicts in logical replication

2024-08-29 Thread shveta malik
On Thu, Aug 29, 2024 at 11:06 AM Zhijie Hou (Fujitsu)
 wrote:
>
>
> Agreed. Here is new version patch which change the names as suggested. I also
> rebased the patch based on another renaming commit 640178c9.
>

Thanks for the patch. Few minor things:

1)
conflict.h:
 * This enum is used in statistics collection (see
 * PgStat_StatSubEntry::conflict_count) as well, therefore, when adding new
 * values or reordering existing ones, ensure to review and potentially adjust
 * the corresponding statistics collection codes.

--We shall mention PgStat_BackendSubEntry as well.

026_stats.pl:
2)
# Now that the table is empty, the
# update conflict (update_existing) ERRORs will stop happening.

--Shall it be update_exists instead of update_existing here:

3)
This is an existing comment above insert_exists conflict capture:
# Wait for the apply error to be reported.

--Shall we change to:
# Wait for the subscriber to report apply error and insert_exists conflict.

thanks
Shveta




Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Pavel Stehule
čt 29. 8. 2024 v 23:54 odesílatel Jim Jones 
napsal:

>
>
> On 29.08.24 20:50, Pavel Stehule wrote:
> >
> > I know, but theoretically, there can be some benefit for CANONICAL if
> > pg supports bytea there. Lot of databases still use non utf8 encoding.
> >
> > It is a more theoretical question - if pg supports different types
> > there in future  (because SQL/XML or Oracle), then CANONICAL can be
> > used without limit,
> I like the idea of extending the feature to support bytea. I can
> definitely take a look at it, but perhaps in another patch? This change
> would most likely involve transformXmlSerialize in parse_expr.c, and I'm
> not sure of the impact in other usages of XMLSERIALIZE.
> > or CANONICAL can be used just for text? And you are sure, so you can
> > compare text X text, instead xml X xml?
> Yes, currently it only supports varchar or text - and their cousins. The
> idea is to format the xml and serialize it as text in a way that they
> can be compared based on their content, independently of how they were
> written, e.g '' is equal to ''.
>
> >
> > +SELECT xmlserialize(CONTENT doc AS text CANONICAL) =
> > xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM
> > xmltest_serialize;
> > + ?column?
> > +--
> > + t
> > + t
> > +(2 rows)
> >
> > Maybe I am a little bit confused by these regress tests, because at
> > the end it is not too useful - you compare two identical XML, and WITH
> > COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to search
> > for a sense of this test.  Better to use really different documents
> > (columns) instead.
>
> Yeah, I can see that it's confusing. In this example I actually just
> wanted to test that the default option of CANONICAL is CANONICAL WITH
> COMMENTS, even if you don't mention it. In the docs I mentioned it like
> this:
>
> "The optional parameters WITH COMMENTS (which is the default) or WITH NO
> COMMENTS, respectively, keep or remove XML comments from the given
> document."
>
> Perhaps I should rephrase it? Or maybe a comment in the regression tests
> would suffice?
>

comment will be enough
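As an aside, for anyone wanting to see the equivalence the feature is meant
to provide, the concept can be demonstrated outside PostgreSQL with Python's
standard-library C14N implementation (used here only as an illustration of
W3C canonicalization, not of the patch itself; note that this API's
with_comments flag defaults to False, unlike the patch's WITH COMMENTS
default):

```python
from xml.etree.ElementTree import canonicalize

# Two lexically different but logically identical documents: attribute
# order differs and one uses a self-closing tag.
doc_a = '<doc b="2" a="1"><child/><!-- note --></doc>'
doc_b = '<doc a="1" b="2"><child></child><!-- note --></doc>'

# C14N normalizes both to the same byte sequence ...
assert canonicalize(doc_a, with_comments=True) == \
       canonicalize(doc_b, with_comments=True)

# ... and canonicalization without comments drops the comment node.
assert "note" not in canonicalize(doc_a)
assert "note" in canonicalize(doc_a, with_comments=True)
```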



>
> Thanks a lot for the input!
>
> --
> Jim
>
>


Re: Make printtup a bit faster

2024-08-29 Thread Andy Fan
David Rowley  writes:

> On Fri, 30 Aug 2024 at 03:33, Tom Lane  wrote:
>>
>> David Rowley  writes:
>> > [ redesign I/O function APIs ]
>> > I had planned to work on this for PG18, but I'd be happy for some
>> > assistance if you're willing.
>>
>> I'm skeptical that such a thing will ever be practical.  To avoid
>> breaking un-converted data types, all the call sites would have to
>> support both old and new APIs.  To avoid breaking non-core callers,
>> all the I/O functions would have to support both old and new APIs.
>> That probably adds enough overhead to negate whatever benefit you'd
>> get.
>
> So, we currently return cstrings in our output functions. Let's take
> jsonb_out() as an example, to build that cstring, we make a *new*
> StringInfoData on *every call* inside JsonbToCStringWorker(). That
> gives you 1024 bytes before you need to enlarge it. However, it's
> maybe not all bad as we have some size estimations there to call
> enlargeStringInfo(), only that's a bit wasteful as it does a
> repalloc() which memcpys the freshly allocated 1024 bytes allocated in
> initStringInfo() when it doesn't yet contain any data. After
> jsonb_out() has returned and we have the cstring, only we forgot the
> length of the string, so most places will immediately call strlen() or
> do that indirectly via appendStringInfoString(). For larger JSON
> documents, that'll likely require pulling cachelines back into L1
> again. I don't know how modern CPU cacheline eviction works, but if it
> was as simple as FIFO then the strlen() would flush all those
> cachelines only for memcpy() to have to read them back again for
> output strings larger than L1.

The attached is a PoC of this idea. No matter which method is adopted
(rewriting all the output functions, or an optional print function), I think
the benefit will be similar. In the test case below, it shows a 10%+
improvement (0.134ms vs 0.110ms).

create table demo as select oid as oid1, relname::text as text1, relam,
relname::text as text2 from pg_class; 

pgbench:  
select * from demo;

-- 
Best Regards
Andy Fan

>From 5ace763a5126478da1c3cb68d5221d83e45d2f34 Mon Sep 17 00:00:00 2001
From: Andy Fan 
Date: Fri, 30 Aug 2024 12:50:54 +0800
Subject: [PATCH v20240830 1/1] Avoiding some memcpy, strlen, palloc in
 printtup.

https://www.postgresql.org/message-id/87wmjzfz0h.fsf%40163.com
---
 src/backend/access/common/printtup.c | 40 ++--
 src/backend/utils/adt/oid.c  | 20 ++
 src/backend/utils/adt/varlena.c  | 17 
 src/include/catalog/pg_proc.dat  |  9 ++-
 4 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index c78cc39308..ecba4a7113 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -19,6 +19,7 @@
 #include "libpq/pqformat.h"
 #include "libpq/protocol.h"
 #include "tcop/pquery.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
@@ -49,6 +50,7 @@ typedef struct
 	bool		typisvarlena;	/* is it varlena (ie possibly toastable)? */
 	int16		format;			/* format code for this column */
 	FmgrInfo	finfo;			/* Precomputed call info for output fn */
+	FmgrInfo	p_finfo;/* Precomputed call info for print fn if any */
 } PrinttupAttrInfo;
 
 typedef struct
@@ -274,10 +276,25 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 		thisState->format = format;
 		if (format == 0)
 		{
-			getTypeOutputInfo(attr->atttypid,
-			  &thisState->typoutput,
-			  &thisState->typisvarlena);
-			fmgr_info(thisState->typoutput, &thisState->finfo);
+			/*
+			 * If the type defines a print function, then use it
+			 * rather than outfunction.
+			 *
+			 * XXX: need a generic function to improve the if-elseif.
+			 */
+			if (attr->atttypid == OIDOID)
+fmgr_info(F_OIDPRINT, &thisState->p_finfo);
+			else if (attr->atttypid == TEXTOID)
+fmgr_info(F_TEXTPRINT, &thisState->p_finfo);
+			else
+			{
+getTypeOutputInfo(attr->atttypid,
+  &thisState->typoutput,
+  &thisState->typisvarlena);
+fmgr_info(thisState->typoutput, &thisState->finfo);
+/* mark print function is invalid */
+thisState->p_finfo.fn_oid = InvalidOid;
+			}
 		}
 		else if (format == 1)
 		{
@@ -355,10 +372,17 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
 		if (thisState->format == 0)
 		{
 			/* Text output */
-			char	   *outputstr;
-
-			outputstr = OutputFunctionCall(&thisState->finfo, attr);
-			pq_sendcountedtext(buf, outputstr, strlen(outputstr));
+			if (thisState->p_finfo.fn_oid)
+			{
+FunctionCall2(&thisState->p_finfo, attr, PointerGetDatum(buf));
+			}
+			else
+			{
+char	   *outputstr;
+
+outputstr = OutputFunctionCall(&thisState->finfo, attr);
+pq_sendcountedtext(buf, outputstr, strlen(outputstr));
+			}
 		}
 		else
 		{
diff --git a/src/backend/utils/adt/oid

Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.

2024-08-29 Thread Thomas Munro
Hi Alex,

JITLink came back onto the radar screen: we see that LLVM 20 will
deprecate the RuntimeDyld link layer that we're using.  JITLink would
in fact fix the open bug report we have about LLVM crashing all over
the place on ARM[1], and I can see that would be quite easy to do what
you showed, but we can't use that solution for that problem because it
only works on LLVM 15+ (and maybe doesn't even support all the
architectures until later releases, I haven't followed the details).
So we'll probably finish up backporting that fix for RuntimeDyld,
which seems like it's going to work OK (thanks Anthonin, CCd, for
diagnosing that).  That's all fine and good, but if my crystal ball is
operating correctly, fairly soon we'll have a situation where RHEL is
shipping versions that *only* support JITLink, while other distros are
still shipping versions that need RuntimeDyld because they don't yet
have JITLink or it is not mature enough yet for all architectures.  So
we'll need to support both for a while.  That's all fine, and I can
see that it's going to be pretty easy to do, it's mostly just
LLVMOrcCreateThis() or LLVMOrcCreateThat() with some #ifdef around it,
job done.

The question I have is: is someone looking into getting the C API we
need for that into the LLVM main branch (LLVM 20-to-be)?  I guess I
would prefer to be able to use that, rather than adding more of our
own C++ wrapper code into our tree, if we can, and it seems like now
would be a good time to get ahead of that.

[1] 
https://www.postgresql.org/message-id/flat/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com




Re: Collect statistics about conflicts in logical replication

2024-08-29 Thread Peter Smith
Hi Hou-San. Here are my review comments for v4-0001.

==

1. Add links in the docs

IMO it would be good for all these confl_* descriptions (in
doc/src/sgml/monitoring.sgml) to include links back to where each of
those conflict types was defined [1].

Indeed, when links are included to the original conflict type
information then I think you should remove mentioning
"track_commit_timestamp":
+   counted only when the
+   track_commit_timestamp
+   option is enabled on the subscriber.

It should be obvious that you cannot count a conflict if the conflict
does not happen, but I don't think we should scatter/duplicate those
rules in different places saying when certain conflicts can/can't
happen -- we should just link everywhere back to the original
description for those rules.

~~~

2. Arrange all the counts into an intuitive/natural order

There is an intuitive/natural ordering for these counts. For example,
the 'confl_*' count fields are in the order insert -> update ->
delete, which LGTM.

Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not
in a good order.

IMO it makes more sense if everything is ordered as:
'sync_error_count', then 'apply_error_count', then all the 'confl_*'
counts.

This comment applies to lots of places, e.g.:
- docs (doc/src/sgml/monitoring.sgml)
- function pg_stat_get_subscription_stats (pg_proc.dat)
- view pg_stat_subscription_stats (src/backend/catalog/system_views.sql)
- TAP test SELECTs (test/subscription/t/026_stats.pl)

As all those places are already impacted by this patch, I think it
would be good if (in passing) we (if possible) also swapped the
sync/apply counts so everything  is ordered intuitively top-to-bottom
or left-to-right.

==
[1] 
https://www.postgresql.org/docs/devel/logical-replication-conflicts.html#LOGICAL-REPLICATION-CONFLICTS

Kind Regards,
Peter Smith.
Fujitsu Australia




Supporting pg freeze in pg_dump, restore.

2024-08-29 Thread Stepan Neretin
Hi, hackers! Please tell me: do I understand correctly that
pg_dump/pg_restore has not been taught to do COPY FREEZE when filling data?
The last mention of such plans was by Bruce in 2013:
https://postgrespro.com/list/thread-id/1815657
https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-copy-freeze/
Best regards, Stepan Neretin.


Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Jim Jones


On 30.08.24 06:46, Pavel Stehule wrote:
>
>
> čt 29. 8. 2024 v 23:54 odesílatel Jim Jones
>  napsal:
>
>
> > +SELECT xmlserialize(CONTENT doc AS text CANONICAL) =
> > xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM
> > xmltest_serialize;
> > + ?column?
> > +--
> > + t
> > + t
> > +(2 rows)
> >
> > Maybe I am a little bit confused by these regress tests, because at
> > the end it is not too useful - you compare two identical XML,
> and WITH
> > COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to search
> > for a sense of this test.  Better to use really different documents
> > (columns) instead.
>
> Yeah, I can see that it's confusing. In this example I actually just
> wanted to test that the default option of CANONICAL is CANONICAL WITH
> COMMENTS, even if you don't mention it. In the docs I mentioned it
> like
> this:
>
> "The optional parameters WITH COMMENTS (which is the default) or
> WITH NO
> COMMENTS, respectively, keep or remove XML comments from the given
> document."
>
> Perhaps I should rephrase it? Or maybe a comment in the regression
> tests
> would suffice?
>
>
> comment will be enough
>

v12 attached adds a comment to this test.

Thanks

-- 
Jim
From 8b17a05a3386aaffcc6c5dafa9c57f49278dc4b6 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 30 Aug 2024 07:55:08 +0200
Subject: [PATCH v12] Add CANONICAL output format to xmlserialize

This patch introduces the CANONICAL option to xmlserialize, which
serializes xml documents in their canonical form - as described in
the W3C Canonical XML Version 1.1 specification. This option can
be used with the additional parameter WITH [NO] COMMENTS to keep
or remove xml comments from the canonical xml output. In case no
parameter is provided, WITH COMMENTS will be used as default. This
feature is based on the function xmlC14NDocDumpMemory from the C14N
module of libxml2.

This patch also includes regression tests and documentation.
---
 doc/src/sgml/datatype.sgml|  41 +++-
 src/backend/executor/execExprInterp.c |   2 +-
 src/backend/parser/gram.y |  21 +-
 src/backend/parser/parse_expr.c   |   2 +-
 src/backend/utils/adt/xml.c   | 271 --
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |  10 +
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   2 +-
 src/test/regress/expected/xml.out | 115 +++
 src/test/regress/expected/xml_1.out   | 109 +++
 src/test/regress/expected/xml_2.out   | 115 +++
 src/test/regress/sql/xml.sql  |  66 +++
 src/tools/pgindent/typedefs.list  |   1 +
 14 files changed, 647 insertions(+), 110 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..19ba5bbf7f 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4496,7 +4496,7 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ { [ NO ] INDENT ] | CANONICAL [ WITH [NO] COMMENTS ]})
 
 type can be
 character, character varying, or
@@ -4513,6 +4513,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS 
 
+   
+The option CANONICAL converts a given
+XML document to its https://www.w3.org/TR/xml-c14n11/#Terminology";>canonical form
+based on the https://www.w3.org/TR/xml-c14n11/";>W3C Canonical XML 1.1 Specification.
+It is basically designed to provide applications the ability to compare xml documents or test if they
+have been changed. The optional parameters WITH COMMENTS (which is the default) or
+WITH NO COMMENTS, respectively, keep or remove XML comments from the given document.
+
+
+ 
+ Example:
+
+
+   

 When a character string value is cast to or from type
 xml without going through XMLPARSE or
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 77394e76c3..9222eed1e2 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -4189,7 +4189,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 *op->resvalue =
 	PointerGetDatum(xmltotext_with_options(DatumGetXmlP(value),
 		   xexpr->xmloption,
-		   xexpr->indent));
+		   xexpr->format));
 *op->resnull = false;
 			}
 			break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 84cef57a70..bb05d2cb3b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -618,12 +618,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	xml_root_version opt_xml_root_standalone
 %type 	xmlexists_argument
 %type 	document_or_content
-%type 	xml_indent_o

Re: Collect statistics about conflicts in logical replication

2024-08-29 Thread shveta malik
On Fri, Aug 30, 2024 at 9:40 AM shveta malik  wrote:
>
> On Thu, Aug 29, 2024 at 11:06 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> >
> > Agreed. Here is new version patch which change the names as suggested. I 
> > also
> > rebased the patch based on another renaming commit 640178c9.
> >
>
> Thanks for the patch. Few minor things:
>
> 1)
> conflict.h:
>  * This enum is used in statistics collection (see
>  * PgStat_StatSubEntry::conflict_count) as well, therefore, when adding new
>  * values or reordering existing ones, ensure to review and potentially adjust
>  * the corresponding statistics collection codes.
>
> --We shall mention PgStat_BackendSubEntry as well.
>
> 026_stats.pl:
> 2)
> # Now that the table is empty, the
> # update conflict (update_existing) ERRORs will stop happening.
>
> --Shall it be update_exists instead of update_existing here:
>
> 3)
> This is an existing comment above insert_exists conflict capture:
> # Wait for the apply error to be reported.
>
> --Shall we change to:
> # Wait for the subscriber to report apply error and insert_exists conflict.
>

1) I have tested the patch, works well.
2) Verified headers inclusions, all good
3) All my comments (very old ones when the patch was initially posted)
are now addressed.

So apart from the comments I posted in [1],  I have no more comments.

[1]: 
https://www.postgresql.org/message-id/CAJpy0uAZpzustNOMBhxBctHHWbBA%3DsnTAYsLpoWZg%2BcqegmD-A%40mail.gmail.com

thanks
Shveta




Re: Collect statistics about conflicts in logical replication

2024-08-29 Thread shveta malik
On Fri, Aug 30, 2024 at 10:53 AM Peter Smith  wrote:
>
> Hi Hou-San. Here are my review comments for v4-0001.
>
> ==
>
> 1. Add links in the docs
>
> IMO it would be good for all these confl_* descriptions (in
> doc/src/sgml/monitoring.sgml) to include links back to where each of
> those conflict types was defined [1].
>
> Indeed, when links are included to the original conflict type
> information then I think you should remove mentioning
> "track_commit_timestamp":
> +   counted only when the
> +linkend="guc-track-commit-timestamp">track_commit_timestamp
> +   option is enabled on the subscriber.
>
> It should be obvious that you cannot count a conflict if the conflict
> does not happen, but I don't think we should scatter/duplicate those
> rules in different places saying when certain conflicts can/can't
> happen -- we should just link everywhere back to the original
> description for those rules.

+1, will make the doc better.

> ~~~
>
> 2. Arrange all the counts into an intuitive/natural order
>
> There is an intuitive/natural ordering for these counts. For example,
> the 'confl_*' count fields are in the order insert -> update ->
> delete, which LGTM.
>
> Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not
> in a good order.
>
> IMO it makes more sense if everything is ordered as:
> 'sync_error_count', then 'apply_error_count', then all the 'confl_*'
> counts.
>
> This comment applies to lots of places, e.g.:
> - docs (doc/src/sgml/monitoring.sgml)
> - function pg_stat_get_subscription_stats (pg_proc.dat)
> - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql)
> - TAP test SELECTs (test/subscription/t/026_stats.pl)
>
> As all those places are already impacted by this patch, I think it
> would be good if (in passing) we (if possible) also swapped the
> sync/apply counts so everything  is ordered intuitively top-to-bottom
> or left-to-right.

Not sure about this though. It does not seem to belong to the current patch.

thanks
Shveta




Re: Conflict Detection and Resolution

2024-08-29 Thread Nisha Moond
On Fri, Aug 23, 2024 at 10:39 AM shveta malik  wrote:
>
> On Thu, Aug 22, 2024 at 3:44 PM shveta malik  wrote:
> >
> >
> > For clock-skew and timestamp based resolution, if needed, I will post
> > another email for the design items where suggestions are needed.
> >
>
> Please find issues which need some thoughts and approval for
> time-based resolution and clock-skew.
>
> 1)
> Time based conflict resolution and two phase transactions:
>
> Time based conflict resolution (last_update_wins) is the one
> resolution which will not result in data-divergence considering
> clock-skew is taken care of. But when it comes to two-phase
> transactions, it might not be the case. For two-phase transaction, we
> do not have commit timestamp when the changes are being applied. Thus
> for time-based comparison, initially it was decided to user prepare
> timestamp but it may result in data-divergence.  Please see the
> example at [1].
>
> Example at [1] is a tricky situation, and thus in the initial draft,
> we decided to restrict usage of 2pc and CDR together. The plan is:
>
> a) During Create subscription, if the user has given last_update_wins
> resolver for any conflict_type and 'two_phase' is also enabled, we
> ERROR out.
> b) During Alter subscription, if the user tries to update resolver to
> 'last_update_wins' but 'two_phase' is enabled, we error out.
>
> Another solution could be to save both prepare_ts and commit_ts. And
> when any txn comes for conflict resolution, we first check if
> prepare_ts is available, use that else use commit_ts.  Availability of
> prepare_ts would indicate it was a prepared txn and thus even if it is
> committed, we should use prepare_ts for comparison for consistency.
> This will have some overhead of storing prepare_ts along with
> commit_ts. But if the number of prepared txns are reasonably small,
> this overhead should be less.
>
> We currently plan to go with restricting 2pc and last_update_wins
> together, unless others have different opinions.
>

Done. v11-004 implements the idea of restricting 2pc and
last_update_wins together.

> ~~
>
> 2)
> parallel apply worker and conflict-resolution:
> As discussed in [2] (see last paragraph in [2]), for streaming of
> in-progress transactions by parallel worker, we do not have
> commit-timestamp with each change and thus it makes sense to disable
> parallel apply worker with CDR. The plan is to not start parallel
> apply worker if 'last_update_wins' is configured for any
> conflict_type.
>

Done.

>  ~~
>
> 3)
> parallel apply worker and clock skew management:
> Regarding clock-skew management as discussed in [3], we will wait for
> the local clock to come within tolerable range during 'begin' rather
> than before 'commit'. And this wait needs commit-timestamp in the
> beginning, thus we plan to restrict starting pa-worker even when
> clock-skew related GUCs are configured.
>

Done. v11 implements it.

> Earlier we had restricted both 2pc and parallel worker worker start
> when detect_conflict was enabled, but now since detect_conflict
> parameter is removed, we will change the implementation to restrict
> all 3 above cases when last_update_wins is configured. When the
> changes are done, we will post the patch.
>
> ~~
>
--
Thanks,
Nisha




Re: Conflict Detection and Resolution

2024-08-29 Thread Nisha Moond
On Mon, Aug 26, 2024 at 2:23 PM Amit Kapila  wrote:
>
> On Thu, Aug 22, 2024 at 3:45 PM shveta malik  wrote:
> >
> > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond  
> > wrote:
> > >
> > > The patches have been rebased on the latest pgHead following the merge
> > > of the conflict detection patch [1].
> >
> > Thanks for working on patches.
> >
> > Summarizing the issues which need some suggestions/thoughts.
> >
> > 1)
> > For subscription based resolvers, currently the syntax implemented is:
> >
> > 1a)
> > CREATE SUBSCRIPTION 
> > CONNECTION  PUBLICATION 
> > CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3,...);
> >
> > 1b)
> > ALTER SUBSCRIPTION  CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3,...);
> >
> > Earlier the syntax suggested in [1] was:
> > CREATE SUBSCRIPTION  CONNECTION  PUBLICATION 
> > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
> >
> > I think the currently implemented syntax is good as it has less
> > repetition, unless others think otherwise.
> >
> > ~~
> >
> > 2)
> > For subscription based resolvers, do we need a RESET command to reset
> > resolvers to default? Any one of below or both?
> >
> > 2a) reset all at once:
> >  ALTER SUBSCRIPTION  RESET CONFLICT RESOLVERS
> >
> > 2b) reset one at a time:
> >  ALTER SUBSCRIPTION  RESET CONFLICT RESOLVER for 'conflict_type';
> >
> > The issue I see here is, to implement 1a and 1b, we have introduced
> > the  'RESOLVER' keyword. If we want to implement 2a, we will have to
> > introduce the 'RESOLVERS' keyword as well. But we can come up with
> > some alternative syntax if we plan to implement these. Thoughts?
> >
>
> It makes sense to have a RESET on the lines of (a) and (b). At this
> stage, we should do minimal in extending the syntax. How about RESET
> CONFLICT RESOLVER ALL for (a)?
>

Done, v11 implements the suggested RESET command.

> > ~~
> >

--
Thanks,
Nisha




Re: define PG_REPLSLOT_DIR

2024-08-29 Thread Michael Paquier
On Tue, Aug 20, 2024 at 04:23:06PM +, Bertrand Drouvot wrote:
> Please find attached v3 that:
> 
> - takes care of your comments (and also removed the use of PG_TBLSPC_DIR in
> RELATIVE_PG_TBLSPC_DIR).
> - removes the new macros from the comments (see Michael's and Yugo-San's
> comments in [0] resp. [1]).
> - adds a missing sizeof() (see [1]).
> - implements Ashutosh's idea of adding a new SLOT_DIRNAME_ARGS (see [2]). It's
> done in 0002 (I used REPLSLOT_DIR_ARGS though).
> - fixed a macro usage in ReorderBufferCleanupSerializedTXNs() (was not at the
> right location, discovered while implementing 0002).

I was looking at that, and applied 0001 for pg_replslot and merged
0003~0005 together for pg_logical.  For the first one, at the end I
have updated the comments in genfile.c.  For the second one, it looked
a bit strange to ignore "pg_logical/", which is the base for all the
others, so I have added a variable for it.  Locating them in
reorderbuffer.h with the GUCs was OK, but perhaps there was an
argument for logical.h.  The paths in origin.c refer to files, not
directories.

Not sure that 0002 is an improvement overall, so I have left this part
out.

In 0006, I am not sure that RELATIVE_PG_TBLSPC_DIR is really something
we should have.  Sure, that's a special case for base backups when
sending a directory, but we also have pg_wal/ and its XLOGDIR, so
that's inconsistent.  Perhaps this part should be left as-is.
--
Michael




Re: list of acknowledgments for PG17

2024-08-29 Thread Peter Eisentraut

On 27.08.24 11:26, Etsuro Fujita wrote:

On Sat, Aug 24, 2024 at 11:27 PM Peter Eisentraut  wrote:

As usual, please check for problems such as wrong sorting, duplicate
names in different variants, or names in the wrong order etc.


I think Japanese names are in the right order except “Sutou Kouhei”.
I am 100% sure his given name is Kouhei.


Fixed.  Thank you for checking.





Re: Conflict Detection and Resolution

2024-08-29 Thread Amit Kapila
On Wed, Aug 28, 2024 at 10:58 AM shveta malik  wrote:
>
> On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian  wrote:
> >
> >> 2)
> >> Currently pg_dump is dumping even the default resolvers configuration.
> >> As an example if I have not changed default configuration for say
> >> sub1, it still dumps all:
> >>
> >> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH ()
> >> CONFLICT RESOLVER (insert_exists = 'error', update_differ =
> >> 'apply_remote', update_exists = 'error', update_missing = 'skip',
> >> delete_differ = 'apply_remote', delete_missing = 'skip');
> >>
> >> I am not sure if we need to dump default resolvers. Would like to know
> >> what others think on this.
> >>

Normally, we don't add defaults in the dumped command. For example,
dumpSubscription won't dump the options where the default is
unchanged. We shouldn't do it unless we have a reason for dumping
defaults.

> >> 3)
> >> Why in 002_pg_dump.pl we have default resolvers set explicitly?
> >>
> > In 003_pg_dump.pl, default resolvers are not set explicitly, that is the 
> > regexp to check the pg_dump generated command for creating subscriptions. 
> > This is again connected to your 2nd question.
>
> Okay so we may not need this change if we plan to *not *dump defaults
> in pg_dump.
>
> Another point about 'defaults' is regarding insertion into the
> pg_subscription_conflict table. We currently do insert default
> resolvers into 'pg_subscription_conflict' even if the user has not
> explicitly configured them.
>

I don't see any problem with it. BTW, if we don't do it, I think
wherever we are referring to the resolvers for a conflict, we need some
special handling for default and non-default. Am I missing something?
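To illustrate the special-handling point: if only non-default resolvers were stored in pg_subscription_conflict, every resolver lookup would need a fallback path, whereas always inserting defaults keeps the lookup to a single catalog read. A Python sketch, using resolver names quoted earlier in the thread (the helper function names are invented for illustration):

```python
# Default resolvers as quoted earlier in this thread.
DEFAULT_RESOLVERS = {
    "insert_exists": "error",
    "update_differ": "apply_remote",
    "update_missing": "skip",
}

def resolver_with_fallback(configured, conflict_type):
    """Needed if defaults were NOT stored: fall back on every lookup."""
    return configured.get(conflict_type, DEFAULT_RESOLVERS[conflict_type])

def resolver_direct(configured, conflict_type):
    """Sufficient when defaults are always inserted into the catalog."""
    return configured[conflict_type]
```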

-- 
With Regards,
Amit Kapila.




RE: Collect statistics about conflicts in logical replication

2024-08-29 Thread Zhijie Hou (Fujitsu)
On Friday, August 30, 2024 2:24 PM shveta malik  wrote:
> 
> On Fri, Aug 30, 2024 at 10:53 AM Peter Smith 
> wrote:
> >
> > Hi Hou-San. Here are my review comments for v4-0001.

Thanks Shveta and Peter for giving comments !

> >
> > ==
> >
> > 1. Add links in the docs
> >
> > IMO it would be good for all these confl_* descriptions (in
> > doc/src/sgml/monitoring.sgml) to include links back to where each of
> > those conflict types was defined [1].
> >
> > Indeed, when links are included to the original conflict type
> > information then I think you should remove mentioning
> > "track_commit_timestamp":
> > +   counted only when the
> > +linkend="guc-track-commit-timestamp">track_commit_timestamp
> > +   option is enabled on the subscriber.
> >
> > It should be obvious that you cannot count a conflict if the conflict
> > does not happen, but I don't think we should scatter/duplicate those
> > rules in different places saying when certain conflicts can/can't
> > happen -- we should just link everywhere back to the original
> > description for those rules.
> 
> +1, will make the doc better.

Changed. To add link to each conflict type, I added " 
> > ~~~
> >
> > 2. Arrange all the counts into an intuitive/natural order
> >
> > There is an intuitive/natural ordering for these counts. For example,
> > the 'confl_*' count fields are in the order insert -> update ->
> > delete, which LGTM.
> >
> > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not
> > in a good order.
> >
> > IMO it makes more sense if everything is ordered as:
> > 'sync_error_count', then 'apply_error_count', then all the 'confl_*'
> > counts.
> >
> > This comment applies to lots of places, e.g.:
> > - docs (doc/src/sgml/monitoring.sgml)
> > - function pg_stat_get_subscription_stats (pg_proc.dat)
> > - view pg_stat_subscription_stats
> > (src/backend/catalog/system_views.sql)
> > - TAP test SELECTs (test/subscription/t/026_stats.pl)
> >
> > As all those places are already impacted by this patch, I think it
> > would be good if (in passing) we (if possible) also swapped the
> > sync/apply counts so everything  is ordered intuitively top-to-bottom
> > or left-to-right.
> 
> Not sure about this though. It does not seem to belong to the current patch.

I also don't think we should handle that in this patch.

Here is the V5 patch, which addresses the above and Shveta's comments [1].

[1] 
https://www.postgresql.org/message-id/CAJpy0uAZpzustNOMBhxBctHHWbBA%3DsnTAYsLpoWZg%2BcqegmD-A%40mail.gmail.com

Best Regards,
Hou zj


v5-0001-Collect-statistics-about-conflicts-in-logical-rep.patch
Description:  v5-0001-Collect-statistics-about-conflicts-in-logical-rep.patch


Re: ANALYZE ONLY

2024-08-29 Thread torikoshia

Hi Michael,

On 2024-08-23 19:01, Michael Harris wrote:


V2 of the patch is attached.


Thanks for the proposal and the patch.

You said this patch was a first draft version, so these comments may be 
too minor, but I will share them:



-- https://www.postgresql.org/docs/devel/progress-reporting.html
Note that when ANALYZE is run on a partitioned table, all of its 
partitions are also recursively analyzed.


Should we also note that this is the default behavior, i.e. what happens 
when the ONLY option is not used?



-- https://www.postgresql.org/docs/devel/ddl-partitioning.html
If you are using manual VACUUM or ANALYZE commands, don't forget that 
you need to run them on each child table individually. A command like:


ANALYZE measurement;
will only process the root table.


This part also should be modified, shouldn't it?


When running ANALYZE VERBOSE ONLY on a partitioned table, the INFO 
message is like this:


  =# ANALYZE VERBOSE ONLY only_parted;
  INFO:  analyzing "public.only_parted" inheritance tree

I may be wrong, but 'inheritance tree' makes me feel it includes child 
tables.
Removing 'inheritance tree' and just describing the table name as below 
might be better:


  INFO:  analyzing "public.only_parted"


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Conflict Detection and Resolution

2024-08-29 Thread shveta malik
On Fri, Aug 30, 2024 at 12:13 PM Amit Kapila  wrote:
>
> On Wed, Aug 28, 2024 at 10:58 AM shveta malik  wrote:
> >
> > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian  wrote:
> > >
> > >> 2)
> > >> Currently pg_dump is dumping even the default resolvers configuration.
> > >> As an example if I have not changed default configuration for say
> > >> sub1, it still dumps all:
> > >>
> > >> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH ()
> > >> CONFLICT RESOLVER (insert_exists = 'error', update_differ =
> > >> 'apply_remote', update_exists = 'error', update_missing = 'skip',
> > >> delete_differ = 'apply_remote', delete_missing = 'skip');
> > >>
> > >> I am not sure if we need to dump default resolvers. Would like to know
> > >> what others think on this.
> > >>
>
> Normally, we don't add defaults in the dumped command. For example,
> dumpSubscription won't dump the options where the default is
> unchanged. We shouldn't do it unless we have a reason for dumping
> defaults.

Agreed, we should not dump defaults. I had the same opinion.

>
> > >> 3)
> > >> Why in 002_pg_dump.pl we have default resolvers set explicitly?
> > >>
> > > In 003_pg_dump.pl, default resolvers are not set explicitly, that is the 
> > > regexp to check the pg_dump generated command for creating subscriptions. 
> > > This is again connected to your 2nd question.
> >
> > Okay so we may not need this change if we plan to *not *dump defaults
> > in pg_dump.
> >
> > Another point about 'defaults' is regarding insertion into the
> > pg_subscription_conflict table. We currently do insert default
> > resolvers into 'pg_subscription_conflict' even if the user has not
> > explicitly configured them.
> >
>
> I don't see any problem with it.

Yes, no problem

> BTW, if we don't do it, I think
> wherever we are referring to the resolvers for a conflict, we need some
> special handling for default and non-default.

Yes, we will need special handling in such a case. Thus we shall go
with inserting defaults.

> Am I missing something?

No, I just wanted to know others' opinions, so I asked.


thanks
Shveta




Re: Conflict Detection and Resolution

2024-08-29 Thread Amit Kapila
On Wed, Aug 28, 2024 at 4:07 PM shveta malik  wrote:
>
> > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian  wrote:
> > >
>
> The review is WIP. Please find a few comments on patch001.
>
> 1)
> logical-replication.sgml
>
> + Additional logging is triggered for specific conflict_resolvers.
> Users can also configure conflict_types while creating the
> subscription. Refer to section CONFLICT RESOLVERS for details on
> conflict_types and conflict_resolvers.
>
> Can we please change it to:
>
> Additional logging is triggered in various conflict scenarios, each
> identified as a conflict type. Users have the option to configure a
> conflict resolver for each conflict type when creating a subscription.
> For more information on the conflict types detected and the supported
> conflict resolvers, refer to the section 
>
> 2)
> SetSubConflictResolver
>
> + for (type = 0; type < resolvers_cnt; type++)
>
> 'type' does not look like the correct name here. The variable does not
> state conflict_type, it is instead a resolver-array-index, so please
> rename accordingly. Maybe idx or res_idx?
>
>  3)
> CreateSubscription():
>
> +if (stmt->resolvers)
> +check_conflict_detection();
>
> 3a) We can have a comment saying we warn users if prerequisites are not met.
>
> 3b) Also, I do not find the name 'check_conflict_detection'
> appropriate. One suggestion could be
> 'conf_detection_check_prerequisites' (similar to
> replorigin_check_prerequisites)
>
> 3c) We can move the below comment after check_conflict_detection() as
> it makes more sense there.
> /*
>  * Parse and check conflict resolvers. Initialize with default values
>  */
>
> 4)
> Should we allow repetition/duplicates of 'conflict_type=..' in CREATE
> and ALTER SUB? As an example:
> ALTER SUBSCRIPTION sub1 CONFLICT RESOLVER (insert_exists =
> 'apply_remote', insert_exists = 'error');
>
> Such a repetition works for Create-Sub but gives some internal error
> for alter-sub. (ERROR:  tuple already updated by self). Behaviour
> should be the same for both. And if we give an error, it should be
> some user understandable one. But I would like to know the opinions of
> others. Shall it give an error or the last one should be accepted as
> valid configuration in case of repetition?
>

I have tried the below statement to check existing behavior:
create subscription sub1 connection 'dbname=postgres' publication pub1
with (streaming = on, streaming=off);
ERROR:  conflicting or redundant options
LINE 1: ...=postgres' publication pub1 with (streaming = on, streaming=...

So duplicate options are not allowed. If we see any challenges to
follow same for resolvers then we can discuss but it seems better to
follow the existing behavior of other subscription options.
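The existing check can be mimicked for the resolver list. A small Python sketch of the intended behavior — illustrative only, since the server-side check lives in the subscription option parsing code in C:

```python
def parse_resolver_list(pairs):
    """Reject duplicated conflict_type entries, mirroring the
    'conflicting or redundant options' error shown above."""
    resolvers = {}
    for conflict_type, resolver in pairs:
        if conflict_type in resolvers:
            raise ValueError(
                f'conflicting or redundant options: "{conflict_type}"')
        resolvers[conflict_type] = resolver
    return resolvers
```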

Also, the behavior for CREATE/ALTER should be the same.

-- 
With Regards,
Amit Kapila.