Re: remove volatile qualifiers from pg_stat_statements
On Wed, Jul 31, 2024 at 07:01:38AM +, Bertrand Drouvot wrote: > I share the same understanding and I think those can be removed. > > The patch LGTM. That sounds about right. All the volatile references we have here have been kept under the assumption that a memory barrier is required. As we hold spin locks in these areas, that should not be necessary anyway. So LGTM as well. A quick lookup at the rest of contrib/ is showing me that the remaining volatile references point at uses with TRY/CATCH blocks, where we require them. -- Michael
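For readers following along, the pattern in question looks roughly like the sketch below. This is only an illustration of the idea (the entry and field names are simplified stand-ins for the pg_stat_statements structures), not the exact hunks of the committed patch:

```c
/*
 * Illustrative sketch only: the volatile-qualified pointer was taken on the
 * assumption that it was needed for ordering, but SpinLockAcquire() and
 * SpinLockRelease() already act as compiler/memory barriers for the fields
 * protected by the lock.
 */

/* Old style: cast through a volatile pointer "for the barrier" */
volatile pgssEntry *e = (volatile pgssEntry *) entry;

SpinLockAcquire(&e->mutex);
e->counters.calls += 1;
SpinLockRelease(&e->mutex);

/* Equivalent without volatile: the spinlock itself provides the barrier */
SpinLockAcquire(&entry->mutex);
entry->counters.calls += 1;
SpinLockRelease(&entry->mutex);
```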
Re: Logical Replication of sequences
On Tue, 6 Aug 2024 at 10:24, shveta malik wrote: > > On Tue, Aug 6, 2024 at 9:54 AM Amit Kapila wrote: > > > > On Tue, Aug 6, 2024 at 8:49 AM shveta malik wrote: > > > > > > > > > I was reviewing the design of patch003, and I have a query. Do we > > > > > > need > > > > > > to even start an apply worker and create replication slot when > > > > > > subscription created is for 'sequences only'? IIUC, currently > > > > > > logical > > > > > > replication apply worker is the one launching sequence-sync worker > > > > > > whenever needed. I think it should be the launcher doing this job > > > > > > and > > > > > > thus apply worker may even not be needed for current functionality > > > > > > of > > > > > > sequence sync? > > > > > > > > > > > > > But that would lead to maintaining all sequence-sync of each > > > > subscription by launcher. Say there are 100 sequences per subscription > > > > and some of them from each subscription are failing due to some > > > > reasons then the launcher will be responsible for ensuring all the > > > > sequences are synced. I think it would be better to handle > > > > per-subscription work by the apply worker. > > > > > > I thought we can give that task to sequence-sync worker. Once sequence > > > sync worker is started by launcher, it keeps on syncing until all the > > > sequences are synced (even failed ones) and then exits only after all > > > are synced; instead of apply worker starting it multiple times for > > > failed sequences. Launcher to start sequence sync worker when signaled > > > by 'alter-sub refresh seq'. > > > But after going through details given by Vignesh in [1], I also see > > > the benefits of using apply worker for this task. Since apply worker > > > is already looping and doing that for table-sync, we can reuse the > > > same code for sequence sync and maintenance will be easy. So looks > > > okay if we go with existing apply worker design. > > > > > > > Fair enough. However, I was wondering whether apply_worker should exit > > after syncing all sequences for a sequence-only subscription > > If apply worker exits, then on next sequence-refresh, we need a way to > wake-up launcher to start apply worker which then will start > table-sync worker. Instead, won't it be better if the launcher starts > table-sync worker directly without the need of apply worker being > present (which I stated earlier). I favour the current design because it ensures the system remains extendable for future incremental sequence synchronization. If the launcher were responsible for starting the sequence sync worker, it would add extra load that could hinder its ability to service other subscriptions and complicate the design for supporting incremental sync of sequences. Additionally, this approach offers the other benefits mentioned in [1]. > > or should > > it be there for future commands that can refresh the subscription and > > add additional tables or sequences? > > If we stick with apply worker starting table sync worker when needed > by continuously checking seq-sync states ('i'/'r'), then IMO, it is > better that apply-worker stays. But if we want apply-worker to exit > and start only when needed, then why not to start sequence-sync worker > directly for seq-only subscriptions? There is a risk that sequence synchronization might fail if the sequence value from the publisher falls outside the defined minvalue or maxvalue range. 
The apply worker must be active to determine whether to initiate the sequence sync worker after the wal_retrieve_retry_interval period. Publications consisting solely of sequences are uncommon. However, if a user wishes to use such publications, they can disable the subscription if necessary and re-enable it when a sequence refresh is needed. [1] - https://www.postgresql.org/message-id/CALDaNm1KO8f3Fj%2BRHHXM%3DUSGwOcW242M1jHee%3DX_chn2ToiCpw%40mail.gmail.com Regards, Vignesh
Re: Remove support for old realpath() API
> On 6 Aug 2024, at 07:43, Michael Paquier wrote: > I am dubious that it is a good idea to remove > this code knowing that we have a thread from a few months ago about > the fact that we have folks complaining about AIX support and that we > should bring it back: According to upthread, it has been supported since AIX 7.1, which shipped in 2010, so even if AIX support materializes, this code still wouldn't be needed. -- Daniel Gustafsson
Re: Remove unnecessary forward declaration for heapam_methods variable
On Wed, Jul 31, 2024 at 10:36:11AM +0800, Japin Li wrote: > I think the forward declaration for heapam_methods variable in > heapam_handler.c > is unnecessary, right? True. This can be removed because all the code paths using heapam_methods are after its declaration, so duplicating it makes little sense. Thanks, applied. -- Michael
Re: Injection points: preloading and runtime arguments
On Sun, Aug 04, 2024 at 01:02:22AM +0900, Michael Paquier wrote: > On Fri, Aug 02, 2024 at 07:03:58PM +0300, Andrey M. Borodin wrote: > > The test works fine with SQL interface “select > > injection_points_load('get-new-multixact-id');”. > > Yes, just use a load() here to make sure that the DSM required by the > waits are properly initialized before entering in the critical section > where the wait of the point get-new-multixact-id happens. Hmm. How about loading injection_points with shared_preload_libraries now that it has a _PG_init() thanks to 75534436a477 to take care of the initialization you need here? We could add two hooks to request some shmem based on a size and to do the shmem initialization. -- Michael
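For context, the generic way a module loaded through shared_preload_libraries reserves and initializes shared memory is via the shmem request/startup hooks registered from its _PG_init(). The sketch below shows that boilerplate under assumed names (injection_shmem_request, injection_shmem_startup and InjectionSharedState are illustrative, not the actual injection_points code; the module-level hooks Michael proposes would sit on top of this):

```c
#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

/* Illustrative shared state; the real contents are up to the module. */
typedef struct InjectionSharedState
{
	int			dummy;
} InjectionSharedState;

static InjectionSharedState *shared_state = NULL;

static shmem_request_hook_type prev_shmem_request_hook = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

/* Ask for the shared memory the module needs. */
static void
injection_shmem_request(void)
{
	if (prev_shmem_request_hook)
		prev_shmem_request_hook();
	RequestAddinShmemSpace(sizeof(InjectionSharedState));
}

/* Initialize (or attach to) that shared memory once shmem is up. */
static void
injection_shmem_startup(void)
{
	bool		found;

	if (prev_shmem_startup_hook)
		prev_shmem_startup_hook();

	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
	shared_state = ShmemInitStruct("injection_points",
								   sizeof(InjectionSharedState), &found);
	if (!found)
		memset(shared_state, 0, sizeof(InjectionSharedState));
	LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
	/* These hooks only make sense under shared_preload_libraries. */
	if (!process_shared_preload_libraries_in_progress)
		return;

	prev_shmem_request_hook = shmem_request_hook;
	shmem_request_hook = injection_shmem_request;
	prev_shmem_startup_hook = shmem_startup_hook;
	shmem_startup_hook = injection_shmem_startup;
}
```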
Re: psql client does not handle WSAEWOULDBLOCK on Windows
Hi Umar, In the gss_read() function, if I print the values of errno and SOCK_ERRNO separately, I find that they differ: *ret = pqsecure_raw_read(conn, recv_buffer, length); if (*ret < 0) { printf("errno: %d\n", errno); printf("result_errno: %d\n", SOCK_ERRNO); ... errno: 0 result_errno: 10035 Also, per https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-wsagetlasterror , Windows Sockets functions do not set errno but use WSAGetLastError to report errors. And per https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-recv , if the function fails, the caller should use WSAGetLastError to obtain the extended error information. This further shows that socket operation errors are not reported through errno. So changing the error-code check to use SOCK_ERRNO instead of errno lets EAGAIN, EWOULDBLOCK and EINTR be handled properly on both Windows and other platforms. To reproduce the issue, I used the following versions of Postgres and MIT Kerberos: PostgreSQL version: 16.3 MIT Kerberos Version 4.1 Operating System: Windows 11 Visual Studio 2022 On Tue, Jul 30, 2024 at 4:47 PM Ning wrote: > > *Description:* The connection fails with a non-blocking socket error when > using psql on > Windows to connect to a PostgreSQL server with GSSAPI enabled. The error is > because the socket error code is obtained by WSAGetLastError() instead of > errno. This causes the value of errno to be incorrect when handling a > non-blocking socket error. > > > *Steps to Reproduce:* 1. Compile PostgreSQL client (psql) on Windows. > a. Make sure MIT Kerberos is installed. I use the latest version MIT > Kerberos > Version 4.1. > b. Make sure GSSAPI is enabled > 2. Attempt to connect to a PostgreSQL server using psql. > a. Set up the Kerberos server and configure the PostgreSQL server by > referring > to https://github.com/50wu/kerberos-docker/blob/main/POSTGRES.README.md > b. change the entry to hostgssenc on the PostgreSQL server's pg_hba.conf and > restart > hostgssenc all all 0.0.0.0/0 gss include_realm=0 > krb_realm=GPDB.KRB > c. Use the following command to connect to the database server > psql -h -U "postgres/krb5-service-example-com.example.com" -d > postgres > 3. The connection fails with a non-blocking socket error. The error is > something like: > psql: error: connection to server at "xxx", port 5432 failed: > > *Environment*: > PostgreSQL version: 16.3 > Operating System: Windows 11 > > > *Fix Steps:* In the gss_read function of > src/interfaces/libpq/fe-secure-gssapi.c, change the > check of the error code to use SOCK_ERRNO to make sure that EAGAIN, > EWOULDBLOCK and EINTR can be properly handled on Windows and other > platforms. > > The patch file is attached to this email, please review and consider > merging it into > the main code library. > > Thanks, > Ning Wu >
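To make the proposal concrete, the retry check inside gss_read() would look roughly like this (an approximate sketch of the idea rather than the exact patch text). SOCK_ERRNO reads WSAGetLastError() on Windows and plain errno elsewhere, so the WSAEWOULDBLOCK case (the 10035 shown above) is recognized as a retryable condition:

```c
	/* Approximate sketch of the proposed change in gss_read(), not the
	 * exact patch: check SOCK_ERRNO instead of errno so that Windows
	 * socket errors (reported via WSAGetLastError) are seen as well. */
	*ret = pqsecure_raw_read(conn, recv_buffer, length);
	if (*ret < 0)
	{
		if (SOCK_ERRNO == EAGAIN || SOCK_ERRNO == EWOULDBLOCK ||
			SOCK_ERRNO == EINTR)
			return PGRES_POLLING_READING;	/* no data yet, try again later */
		else
			return PGRES_POLLING_FAILED;	/* a real error */
	}
```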
Rename C23 keyword
constexpr is a keyword in C23. Rename a conflicting identifier for future-proofing. Obviously, C23 is way in the future, but this is a hard error that prevents any further exploration. (To be clear: This only happens if you explicitly select C23 mode. I'm not aware of a compiler where this is the default yet.) From fa0f5ce119bff1bafd9fd278334c5caad242b6d2 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 6 Aug 2024 09:51:20 +0200 Subject: [PATCH] Rename C23 keyword constexpr is a keyword in C23. Rename a conflicting identifier for future-proofing. --- src/backend/optimizer/util/predtest.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c index 6e3b376f3d3..50ea8077367 100644 --- a/src/backend/optimizer/util/predtest.c +++ b/src/backend/optimizer/util/predtest.c @@ -948,7 +948,7 @@ boolexpr_startup_fn(Node *clause, PredIterInfo info) typedef struct { OpExpr opexpr; - Const constexpr; + Const const_expr; int next_elem; int num_elems; Datum *elem_values; @@ -992,13 +992,13 @@ arrayconst_startup_fn(Node *clause, PredIterInfo info) state->opexpr.args = list_copy(saop->args); /* Set up a dummy Const node to hold the per-element values */ - state->constexpr.xpr.type = T_Const; - state->constexpr.consttype = ARR_ELEMTYPE(arrayval); - state->constexpr.consttypmod = -1; - state->constexpr.constcollid = arrayconst->constcollid; - state->constexpr.constlen = elmlen; - state->constexpr.constbyval = elmbyval; - lsecond(state->opexpr.args) = &state->constexpr; + state->const_expr.xpr.type = T_Const; + state->const_expr.consttype = ARR_ELEMTYPE(arrayval); + state->const_expr.consttypmod = -1; + state->const_expr.constcollid = arrayconst->constcollid; + state->const_expr.constlen = elmlen; + state->const_expr.constbyval = elmbyval; + lsecond(state->opexpr.args) = &state->const_expr; /* Initialize iteration state */ state->next_elem = 0; @@ -1011,8 +1011,8 @@ arrayconst_next_fn(PredIterInfo info) if (state->next_elem >= state->num_elems) return NULL; - state->constexpr.constvalue = state->elem_values[state->next_elem]; - state->constexpr.constisnull = state->elem_nulls[state->next_elem]; + state->const_expr.constvalue = state->elem_values[state->next_elem]; + state->const_expr.constisnull = state->elem_nulls[state->next_elem]; state->next_elem++; return (Node *) &(state->opexpr); } -- 2.46.0
Re: [PATCH] Add crc32(text) & crc32(bytea)
Hi, > This looks pretty good to me. The only point that I think deserves more > discussion is the return type. Does bytea make the most sense here? Or > should we consider int/bigint? Personally I would choose BYTEA in order to be consistent with sha*() functions. It can be cast to TEXT if the user wants a result similar to the one md5() returns: ``` SELECT encode(crc32('PostgreSQL'), 'hex'); ``` ... and, somewhat less conveniently, to BIGINT: ``` SELECT ((get_byte(crc, 0) :: bigint << 24) | (get_byte(crc, 1) << 16) | (get_byte(crc, 2) << 8) | get_byte(crc, 3)) FROM (SELECT crc32('PostgreSQL') AS crc); ``` I don't like the `integer` option because a crc32 value is typically considered unsigned and `integer` is not large enough to represent uint32. Perhaps we need get_int4() / get_int8() / get_numeric() as there seems to be demand [1][2], and they would allow us to easily cast a `bytea` value to `integer` or `bigint`. This is probably another topic though. [1]: https://stackoverflow.com/questions/32944267/postgresql-converting-bytea-to-bigint [2]: https://postgr.es/m/AANLkTikip9xs8iXc8e%2BMgz1T1701i8Xk6QtbVB3KJQzX%40mail.gmail.com -- Best regards, Aleksander Alekseev
RE: Conflict detection and logging in logical replication
On Monday, August 5, 2024 6:52 PM Amit Kapila wrote: > > On Sun, Aug 4, 2024 at 1:22 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, August 2, 2024 7:03 PM Amit Kapila > wrote: > > > > > > > Here is the V11 patch set which addressed above and Kuroda-san[1] > comments. > > > > A few design-level points: > > * > @@ -525,10 +602,33 @@ ExecSimpleRelationInsert(ResultRelInfo > *resultRelInfo, > /* OK, store the tuple and create index entries for it */ > simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot); > > + conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes; > + > if (resultRelInfo->ri_NumIndices > 0) > recheckIndexes = ExecInsertIndexTuples(resultRelInfo, > -slot, estate, false, false, > -NULL, NIL, false); > +slot, estate, false, > +conflictindexes ? true : false, > +&conflict, > +conflictindexes, false); > + > + /* > + * Checks the conflict indexes to fetch the conflicting local tuple > + * and reports the conflict. We perform this check here, instead of > + * performing an additional index scan before the actual insertion and > + * reporting the conflict if any conflicting tuples are found. This is > + * to avoid the overhead of executing the extra scan for each INSERT > + * operation, even when no conflict arises, which could introduce > + * significant overhead to replication, particularly in cases where > + * conflicts are rare. > + * > + * XXX OTOH, this could lead to clean-up effort for dead tuples added > + * in heap and index in case of conflicts. But as conflicts shouldn't > + * be a frequent thing so we preferred to save the performance overhead > + * of extra scan before each insertion. > + */ > + if (conflict) > + CheckAndReportConflict(resultRelInfo, estate, CT_INSERT_EXISTS, > +recheckIndexes, slot); > > I was thinking about this case where we have some pros and cons of doing > additional scans only after we found the conflict. I was wondering how we will > handle the resolution strategy for this when we have to remote_apply the tuple > for insert_exists/update_exists cases. > We would have already inserted the remote tuple in the heap and index before > we found the conflict which means we have to roll back that change and then > start a fresh transaction to perform remote_apply which probably has to > update the existing tuple. We may have to perform something like speculative > insertion and then abort it. That doesn't sound cheap either. Do you have any > better ideas? Since most of the conflict detection code can be reused in the later resolution patch, I am thinking we can go for the re-scan-after-insertion approach for the detection patch. Then in the resolution patch we can probably have a check such that if the resolver is remote_apply/last_update_win we detect the conflict before insertion, and otherwise detect it after. This way we can avoid adding a subscription option in the detection patch because we are not introducing overhead for the detection. And we can also save some overhead in the resolution patch if there is no need to do a prior check. There could be some duplicated code in the resolution patch, as it would have code for both the prior check and the after-check, but that seems acceptable. > > * > -ERROR: duplicate key value violates unique constraint "test_pkey" > -DETAIL: Key (c)=(1) already exists. > +ERROR: conflict insert_exists detected on relation "public.test" > +DETAIL: Key (c)=(1) already exists in unique index "t_pkey", which > was modified locally in transaction 740 at 2024-06-26 10:47:04.727375+08. 
> > I think the format to display conflicts is not very clear. The conflict > should be > apparent just by seeing the LOG/ERROR message. I am thinking of something > like below: > > LOG: CONFLICT: ; > DESCRIPTION: If any .. ; RESOLUTION: (This one can be added later) > DETAIL: remote_tuple (tuple values); local_tuple (tuple values); > > With the above, one can easily identify the conflict's reason and action > taken by > apply worker. Thanks for the idea! I thought about a few styles based on the suggested format; what do you think about the following? --- Version 1 --- LOG: CONFLICT: insert_exists; DESCRIPTION: remote INSERT violates unique constraint "uniqueindex" on relation "public.test". DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) xid=123,origin="pub",timestamp=xxx; remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_differ; DESCRIPTION: updating a row with key (a, b) = (2, 4) on relation "public.test" was modified by a different source. DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) xid=123,origin="pub",timestamp=xxx; remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_missing; DESCRIPTION: did not find the row with key (a, b) = (2, 4) on "public.test" to update. DETAIL: remote tuple (a, b, c) = (2, 4, 5). --- Version 2 It moves most of the details to the DETAIL line compared to Version 1. --- LOG: CONFLICT: insert_exists on relation "public.test". DETAIL: Key (a)=(1) already exists in u
Re: Wrong results with grouping sets
On Wed, Jun 5, 2024 at 5:42 PM Richard Guo wrote: > I found a bug in the v6 patch. The following query would trigger the > Assert in make_restrictinfo that the given subexpression should not be > an AND clause. > > select max(a) from t group by a > b and a = b having a > b and a = b; > > This is because the expression 'a > b and a = b' in the HAVING clause is > replaced by a Var that references the GROUP RTE. When we preprocess the > columns of the GROUP RTE, we do not know whether the grouped expression > is a havingQual or not, so we do not perform make_ands_implicit for it. > As a result, after we replace the group Var in the HAVING clause with > the underlying grouping expression, we will have a havingQual that is an > AND clause. > > As we know, in the planner we need to first preprocess all the columns > of the GROUP RTE. We also need to replace any Vars in the targetlist > and HAVING clause that reference the GROUP RTE with the underlying > grouping expressions. To fix the mentioned issue, I chose to perform > this replacement before we preprocess the targetlist and havingQual, so > that the make_ands_implicit would be performed when we preprocess the > havingQual. I've realized that there is something wrong with this conclusion. If we perform the replacement of GROUP Vars with the underlying grouping expressions before we've done with expression preprocessing on targetlist and havingQual, we may end up failing to match the expressions that are part of grouping items to lower target items. Consider: create table t (a int, b int); insert into t values (1, 2); select a < b and b < 3 from t group by rollup(a < b and b < 3) having a < b and b < 3; The expression preprocessing step would convert the HAVING clause to implicit-AND format and thus it would fail to be matched to lower target items. Another example is: create table t1 (a boolean); insert into t1 values (true); select not a from t1 group by rollup(not a) having not not a; This HAVING clause 'not not a' would be reduced to 'a' and thus fail to be matched to lower tlist. I fixed this issue in v13 by performing the replacement of GROUP Vars after we've done with expression preprocessing on targetlist and havingQual. An ensuing effect of this approach is that a HAVING clause may contain expressions that are not fully preprocessed if they are part of grouping items. This is not an issue as long as the clause remains in HAVING. But if the clause is moved or copied into WHERE, we need to re-preprocess these expressions. Please see the attached for the changes. Thanks Richard v13-0001-Introduce-an-RTE-for-the-grouping-step.patch v13-0002-Mark-expressions-nullable-by-grouping-sets.patch v13-0003-Unwrap-a-PlaceHolderVar-when-safe.patch
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Tue, Aug 6, 2024 at 8:36 AM Michael Paquier wrote: > On Tue, Aug 06, 2024 at 05:17:10AM +0300, Alexander Korotkov wrote: > > The 0001 patch is intended to improve this situation. Actually, it's > > not right to just put RecoveryInProgress() after > > GetXLogReplayRecPtr(), because more wal could be replayed between > > these calls. Instead we need to recheck GetXLogReplayRecPtr() after > > getting negative result of RecoveryInProgress() because WAL replay > > position couldn't get updated after. > > 0002 patch comprises fix for the header comment of WaitLSNSetLatches() > > function > > 0003 patch comprises tests for pg_wal_replay_wait() errors. > > Before adding more tests, could it be possible to stabilize what's in > the tree? drongo has reported one failure with the recovery test > 043_wal_replay_wait.pl introduced recently by 3c5db1d6b016: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-05%2004%3A24%3A54 Thank you for pointing this out! Surely, I'll fix this first. -- Regards, Alexander Korotkov Supabase
Re: Memory growth observed with C++ application consuming libpq.dll on Windows
I ran a trial version of a memory leak detector called Deleaker on Windows and found some modules that are listed as having leaks. I ran the program on Linux under valgrind and I do not see any leaks reported there. I have attached the reported leaks on Windows as windows_leaks.txt and the valgrind summary report as valgrind.txt. I am working on generating a trimmed down version of the sample program to share with you. Let me know if you have any questions. Thanks, Rajesh On Fri, Aug 2, 2024 at 10:19 PM Rajesh Kokkonda wrote: > We did run our application under valgrind on Linux. We did not see any > leaks. There is no platform dependent code in our application. We are > seeing gradual memory growth only on windows. > > That is what led me to believe the leak may be present in postgresql. I > will run under available memory tools on windows and get back to you. > > I will also try to create a sample and see if I can reproduce the problem. > > Thanks, > Rajesh > > On Fri, 2 Aug 2024, 21:45 Ranier Vilela, wrote: > >> On Fri, Aug 2, 2024 at 11:54 AM Rajesh Kokkonda < >> rajeshk.kokko...@gmail.com> wrote: >> >>> Okay. I will try to create one sample program and send it to you >>> sometime next week. In the meantime, I am listing down all the methods we >>> are consuming from libpq. >>> >>> PQconnectdbParams >>> PQstatus >>> PQerrorMessage >>> PQpingParams >>> PQfinish >>> PQresultStatus >>> PQclear >>> PQsetSingleRowMode >>> PQntuples >>> PQnfields >>> PQftype >>> PQgetvalue >>> PQgetlength >>> PQgetisnull >>> PQgetCancel >>> PQfreeCancel >>> PQcancel >>> PQsetErrorVerbosity >>> PQsendPrepare >>> PQsendQueryPrepared >>> PQgetResult >>> PQconsumeInput >>> PQisBusy >>> PQsetnonblocking >>> PQflush >>> PQsocket >>> PQtransactionStatus >>> PQresultErrorField >>> >>> It is highly likely that the memory consumption is caused by your >> application. >> Perhaps due to the lack of freeing up the resources used by the library. >> You can try using this tool, to find out the root cause. >> >> https://drmemory.org/ >> >> best regards, >> Ranier Vilela >> > ==1659429== LEAK SUMMARY: ==1659429== definitely lost: 0 bytes in 0 blocks ==1659429== indirectly lost: 0 bytes in 0 blocks ==1659429== possibly lost: 0 bytes in 0 blocks ==1659429== still reachable: 36,482 bytes in 121 blocks ==1659429== of which reachable via heuristic: ==1659429== stdstring : 537 bytes in 1 blocks ==1659429== suppressed: 0 bytes in 0 blocks ==1659429== ==1659429== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Re: Injection points: preloading and runtime arguments
> On 6 Aug 2024, at 12:47, Michael Paquier wrote: > > Hmm. How about loading injection_points with shared_preload_libraries > now that it has a _PG_init() thanks to 75534436a477 to take care of > the initialization you need here? We could add two hooks to request > some shmem based on a size and to do the shmem initialization. SQL initialisation is fine for test purposes. I just considered that I'd better share that doing the same from C code is non-trivial. Thanks! Best regards, Andrey Borodin.
Fix comments in instr_time.h and remove an unneeded cast to int64
Hi hackers, While working on [1], I came across what seems to be incorrect comments in instr_time.h and an unneeded cast to int64. Indeed, 03023a2664 represented time as an int64 on all platforms but forgot to update the comment related to INSTR_TIME_GET_MICROSEC() and provided an incorrect comment for INSTR_TIME_GET_NANOSEC(). Please find attached a tiny patch to correct those and, in passing, remove what I think is an unneeded cast to int64. [1]: https://www.postgresql.org/message-id/19E276C9-2C2B-435A-B275-8FA2AEB8%40gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 1d1f8089a9eba4e87af66d7c0f23f30d52dc042c Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Tue, 6 Aug 2024 08:04:43 + Subject: [PATCH v1] Fix comments in instr_time.h and remove an unneeded cast to int64 03023a2664 represented time as an int64 on all platforms but forgot to update the comment related to INSTR_TIME_GET_MICROSEC() and provided an incorrect comment for INSTR_TIME_GET_NANOSEC(). In passing removing an unneeded cast to int64. --- src/include/portability/instr_time.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) 100.0% src/include/portability/ diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index a6fc1922f2..8f9ba2f151 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -32,9 +32,9 @@ * * INSTR_TIME_GET_MILLISEC(t) convert t to double (in milliseconds) * - * INSTR_TIME_GET_MICROSEC(t) convert t to uint64 (in microseconds) + * INSTR_TIME_GET_MICROSEC(t) get t in microseconds * - * INSTR_TIME_GET_NANOSEC(t) convert t to uint64 (in nanoseconds) + * INSTR_TIME_GET_NANOSEC(t) get t in nanoseconds * * Note that INSTR_TIME_SUBTRACT and INSTR_TIME_ACCUM_DIFF convert * absolute times to intervals. The INSTR_TIME_GET_xxx operations are @@ -123,7 +123,7 @@ pg_clock_gettime_ns(void) ((t) = pg_clock_gettime_ns()) #define INSTR_TIME_GET_NANOSEC(t) \ - ((int64) (t).ticks) + ((t).ticks) #else /* WIN32 */ -- 2.34.1
Re: Restart pg_usleep when interrupted
Hi, On Mon, Aug 05, 2024 at 03:07:34PM -0500, Sami Imseih wrote: > > > yeah, we already have a few macros that access the .ticks, so maybe we > > could add > > 2 new ones, say: > > > > 1. INSTR_TIME_ADD_MS(t1, msec) > > 2. INSTR_TIME_IS_GREATER(t1, t2) > > > > I think the less operations is done in the while loop the better. > > > > See v4. it includes 2 new instr_time.h macros to simplify the > code inside the while loop. Thanks! 1 === +#define INSTR_TIME_IS_GREATER(x,y) \ + ((bool) (x).ticks > (y).ticks) The surrounding parentheses are missing; it should be ((bool) ((x).ticks > (y).ticks)). I did not pay attention to it initially but found it was the culprit of breaks not occurring (while my test case produces some). That said, I don't think the cast is necessary here and that we could get rid of it. 2 === What about providing a quick comment about the 2 new macros in the header of instr_time.h (like is done for the other macros)? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
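For reference, with the parenthesization fix from point 1 applied, the two helpers could be spelled roughly as below. This is only a sketch of what the v4 macros are aiming at, assuming the Unix representation where .ticks holds nanoseconds (and assuming the NS_PER_MS constant from instr_time.h); a real patch would also need to cover the Windows tick representation:

```c
/* Sketch only: advance t1 by msec milliseconds (ticks are nanoseconds). */
#define INSTR_TIME_ADD_MS(t1, msec) \
	((t1).ticks += (int64) (msec) * NS_PER_MS)

/* Sketch only: true if t1 is strictly later than t2 (note the parentheses
 * around the whole comparison, per the review comment above). */
#define INSTR_TIME_IS_GREATER(t1, t2) \
	((t1).ticks > (t2).ticks)
```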
Re: Logical Replication of sequences
Here are some review comments for the patch v20240805_2-0003. == doc/src/sgml/catalogs.sgml nitpick - removed the word "either" == doc/src/sgml/ref/alter_subscription.sgml I felt the discussions about "how to handle warnings" are a bit scattered: e.g.1 - ALTER SUBSCRIPTION REFRESH PUBLICATION copy data referred to CREATE SUBSCRIPTION copy data. e.g.2 - ALTER SUBSCRIPTION REFRESH explains what to do, but now the explanation is in 2 places. e.g.3 - CREATE SUBSCRIPTION copy data explains what to do (again), but IMO it belongs better in the common "Notes" part FYI, I've moved all the information to one place (in the CREATE SUBSCRIPTION "Notes") and others refer to this central place. See the attached nitpicks diff. REFRESH PUBLICATION copy_data nitpick - now refers to CREATE SUBSCRIPTION "Notes". I also moved it to be nearer to the other sequence stuff. REFRESH PUBLICATION SEQUENCES: nitpick - now refers to CREATE SUBSCRIPTION "Notes". == doc/src/sgml/ref/create_subscription.sgml REFRESH PUBLICATION copy_data nitpick - now refers to CREATE SUBSCRIPTION "Notes" Notes: nitpick - the explanation of, and what to do about sequence WARNINGS, is moved to here == src/backend/commands/sequence.c pg_sequence_state: nitpick - I just moved the comment in pg_sequence_state() to below the NOTE, which talks about "page LSN". == src/backend/catalog/pg_subscription.c 1. HasSubscriptionRelations Should function 'HasSubscriptionRelations' be renamed to 'HasSubscriptionTables'? ~~~ GetSubscriptionRelations: nitpick - tweak some "skip" comments. == src/backend/commands/subscriptioncmds.c 2. CreateSubscription tables = fetch_table_list(wrconn, publications); - foreach(lc, tables) + foreach_ptr(RangeVar, rv, tables) + { + Oid relid; + + relid = RangeVarGetRelid(rv, AccessShareLock, false); + + /* Check for supported relkind. */ + CheckSubscriptionRelkind(get_rel_relkind(relid), + rv->schemaname, rv->relname); + + AddSubscriptionRelState(subid, relid, table_state, + InvalidXLogRecPtr, true); + } + + /* Add the sequences in init state */ + sequences = fetch_sequence_list(wrconn, publications); + foreach_ptr(RangeVar, rv, sequences) These 2 loops (first for tables and then for sequences) seem to be executing the same code. If you wanted you could combine the lists up-front, and then have one code loop instead of 2. It would mean less code. OTOH, maybe the current code is more readable? I am not sure what is best, so just bringing this to your attention. ~~~ AlterSubscription_refresh: nitpick = typo /indicating tha/indicating that/ ~~~ 3. fetch_sequence_list + appendStringInfoString(&cmd, "SELECT DISTINCT n.nspname, c.relname, s.seqtypid, s.seqmin, s.seqmax, s.seqstart, s.seqincrement, s.seqcycle" +" FROM pg_publication p, LATERAL pg_get_publication_sequences(p.pubname::text) gps(relid)," +" pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace JOIN pg_sequence s ON c.oid = s.seqrelid" +" WHERE c.oid = gps.relid AND p.pubname IN ("); + get_publications_str(publications, &cmd, true); + appendStringInfoChar(&cmd, ')'); Please wrap this better to make the SQL more readable. ~~ 4. 
+ if (seqform->seqtypid != seqtypid || seqform->seqmin != seqmin || + seqform->seqmax != seqmax || seqform->seqstart != seqstart || + seqform->seqincrement != seqincrement || + seqform->seqcycle != seqcycle) + ereport(WARNING, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("Sequence option in remote and local is not same for \"%s.%s\"", +get_namespace_name(get_rel_namespace(relid)), get_rel_name(relid)), + errhint("Alter/Re-create the sequence using the same options as in remote.")); 4a. Are these really known as "options"? Or should they be called "sequence parameters", or something else, like "sequence attributes"? 4b. Is there a way to give more helpful information by identifying what was different in the log? OTOH, maybe it would become too messy if there were multiple differences... == src/backend/replication/logical/launcher.c 5. logicalrep_sync_worker_count - if (isTablesyncWorker(w) && w->subid == subid) + if ((isTableSyncWorker(w) || isSequenceSyncWorker(w)) && + w->subid == subid) You could micro-optimize this -- it may be more efficient to write the condition the other way around. SUGGESTION if (w->subid == subid && (isTableSyncWorker(w) || isSequenceSyncWorker(w))) == .../replication/logical/sequencesync.c File header comment: nitpick - there seems to be a large cut/paste mistake (the first 2 paragraphs are almost the same). nitpick - reworded with the help of Chat-GPT for slightly better clarity. Also fixed a couple of typos. nitpick - it mentioned MAX_SEQUENCES_SYNC_PER_BATCH several times so I changed the wording of one of them ~~~ fetch_remote_sequence_data: nitpick - all other params have the same name as sequence members, so change the parameter name /lsn/page_lsn/ ~ copy_sequence: nitpick - rename var /seq_lsn/seq_page_lsn/ == src/backen
Re: Support specify tablespace for each merged/split partition
On Mon, Aug 5, 2024 at 9:05 PM Junwang Zhao wrote: > > Hi Amul, > > Thanks for your review. > > On Mon, Aug 5, 2024 at 8:38 PM Amul Sul wrote: > > > > On Mon, Jul 15, 2024 at 11:19 AM Junwang Zhao wrote: > > > > >[...] > > static Relation > > -createPartitionTable(RangeVar *newPartName, Relation modelRel, > > -AlterTableUtilityContext *context) > > +createPartitionTable(RangeVar *newPartName, char *tablespacename, > > + > > > > The comment should mention the tablespace setting in the same way it > > mentions the access method. > > I'm not good at wording, can you give some advice? My suggestion is to rewrite the third paragraph as follows, but someone else might have a better version: --- The new partitions will also be created in the same tablespace as the parent if not specified. Also, this function sets the new partition access method same as parent table access methods (similarly to CREATE TABLE ... PARTITION OF). It checks that parent and child tables have compatible persistence. --- > > > > +SELECT tablename, tablespace FROM pg_tables > > + WHERE tablename IN ('t', 'tp_0_2') AND schemaname = > > 'partitions_merge_schema' > > + ORDER BY tablename, tablespace; > > + tablename |tablespace > > +---+-- > > + t | > > + tp_0_2| regress_tblspace > > +(2 rows) > > + > > +SELECT tablename, indexname, tablespace FROM pg_indexes > > + WHERE tablename IN ('t', 'tp_0_2') AND schemaname = > > 'partitions_merge_schema' > > + ORDER BY tablename, indexname, tablespace; > > + tablename | indexname | tablespace > > +---+-+ > > + t | t_pkey | > > + tp_0_2| tp_0_2_pkey | > > +(2 rows) > > + > > > > This seems problematic to me. The index should be in the same > > tablespace as the table. > > I'm not sure about this, it seems to me that partition index will alway > inherit the tablespaceId of its parent(see generateClonedIndexStmt), > do you think we should change the signature of this function? > > One thing worth mentioning is that for UNIQUE/PRIMARY KEY, > it allows setting *USING INDEX TABLESPACE tablespace_name*, > I don't think we should change the index tablespace in this case, > what do you think? > I think you are correct; my understanding is a bit hazy. > > I have added you as a reviewer, hope you don't mind. Thank you. Regards, Amul
Vectored IO in XLogWrite()
Hi hackers, I was looking into XLogWrite() and saw the below comment. It cannot really circle back in wal buffers without needing to call pg_pwrite() since the next pages wouldn't be contiguous in memory. So it needs to write whenever it reaches the end of wal buffers. /* > * Dump the set if this will be the last loop iteration, or if we are > * at the last page of the cache area (since the next page won't be > * contiguous in memory), or if we are at the end of the logfile > * segment. > */ I think that we don't have the "contiguous pages" constraint when writing anymore as we can do vectored IO. It seems unnecessary to write just because XLogWrite() is at the end of wal buffers. The attached patch uses pg_pwritev() instead of pg_pwrite() and tries to write pages in one call even if they're not contiguous in memory, until it reaches the page at startidx. After quickly experimenting with the patch and comparing the number of write calls, the patch's effect is more visible when wal_buffers is quite low as it's more likely to circle back to the beginning. When wal_buffers is set to a decent amount, the patch only saves a few write calls. But I wouldn't expect any regression introduced by the patch (I may be wrong here), so I thought it may be worth considering. I appreciate any feedback on the proposed change. I'd also be glad to benchmark the patch if you want to see how it performs in some specific cases since I've been struggling to come up with a good test case. Regards, -- Melih Mutlu Microsoft v1-0001-Use-pg_pwritev-in-XlogWrite.patch
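To illustrate the idea (a rough sketch of the approach, not the attached patch itself, and ignoring the log-segment-boundary handling that XLogWrite() also has to respect): when the run of pages to be written wraps around the end of the WAL buffer array, both memory ranges can still be submitted in a single system call by building an iovec for pg_pwritev():

```c
/* Rough sketch, not the attached patch: gather the pages into at most two
 * iovecs -- the tail of the buffer array and, if the run wrapped around,
 * its beginning -- and issue a single pg_pwritev() call. */
struct iovec iov[2];
int			iovcnt = 0;
char	   *from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
Size		tail_len = (Size) (XLogCtl->XLogCacheBlck + 1 - startidx) * XLOG_BLCKSZ;
Size		total_len = (Size) npages * XLOG_BLCKSZ;
ssize_t		written;

if (total_len <= tail_len)
{
	/* no wraparound: one contiguous range */
	iov[iovcnt].iov_base = from;
	iov[iovcnt].iov_len = total_len;
	iovcnt++;
}
else
{
	/* wrapped: tail of the cache first, then its beginning */
	iov[iovcnt].iov_base = from;
	iov[iovcnt].iov_len = tail_len;
	iovcnt++;
	iov[iovcnt].iov_base = XLogCtl->pages;
	iov[iovcnt].iov_len = total_len - tail_len;
	iovcnt++;
}

written = pg_pwritev(openLogFile, iov, iovcnt, startoffset);
```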
Remove hardcoded hash opclass function signature exceptions
hashvalidate(), which validates the signatures of support functions for the hash AM, contains several hardcoded exceptions. For example, hash/date_ops support function 1 is hashint4(), which would ordinarily fail validation because the function argument is int4, not date. But this works internally because int4 and date are of the same size. There are several more exceptions like this that happen to work and were allowed historically but would now fail the function signature validation. AFAICT, these exceptions were just carried over from before the current index AM API and validation functions were added. The code contains comments like "For the moment, fix it by having a list of allowed cases.", so it probably wasn't meant as the ideal state. This patch removes those exceptions by providing new support functions that have the proper declared signatures. They internally share most of the C code with the "wrong" functions they replace, so the behavior is still the same. With the exceptions gone, hashvalidate() is now simplified and relies fully on check_amproc_signature(), similar to other index AMs. I'm also fixing one case where a brin opclass used hashvarlena() for bytea, even though in that case, there is no function signature validation done, so it doesn't matter that much. Not done here, but maybe hashvarlena() and hashvarlenaextended() should be removed from pg_proc.dat, since their use as opclass support functions is now dubious. They could continue to exist in the C code as internal support functions.From c154f024a58f96e8e65db5c621ae34f6d630b84b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 6 Aug 2024 12:08:35 +0200 Subject: [PATCH] Remove hardcoded hash opclass function signature exceptions hashvalidate(), which validates the signatures of support functions for the hash AM, contained several hardcoded exceptions. For example, hash/date_ops support function 1 was hashint4(), which would ordinarily fail validation because the function argument is int4, not date. But this works internally because int4 and date are of the same size. There are several more exceptions like this that happen to work and were allowed historically but would now fail the function signature validation. This patch removes those exceptions by providing new support functions that have the proper declared signatures. They internally share most of the code with the "wrong" functions they replace, so the behavior is still the same. With the exceptions gone, hashvalidate() is now simplified and relies fully on check_amproc_signature(). TODO: Maybe hashvarlena() and hashvarlenaextended() should be removed from pg_proc.dat, since their use as opclass support functions is dubious. They can continue to exist in the C code as internal support functions. 
TODO: catversion --- src/backend/access/hash/hashfunc.c | 12 +++ src/backend/access/hash/hashvalidate.c | 129 + src/backend/utils/adt/bool.c | 13 +++ src/backend/utils/adt/date.c | 12 +++ src/backend/utils/adt/timestamp.c | 12 +++ src/backend/utils/adt/xid.c| 37 +++ src/include/catalog/pg_amproc.dat | 30 +++--- src/include/catalog/pg_proc.dat| 42 8 files changed, 169 insertions(+), 118 deletions(-) diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c index 86f1adecb75..577ca470145 100644 --- a/src/backend/access/hash/hashfunc.c +++ b/src/backend/access/hash/hashfunc.c @@ -406,3 +406,15 @@ hashvarlenaextended(PG_FUNCTION_ARGS) return result; } + +Datum +hashbytea(PG_FUNCTION_ARGS) +{ + return hashvarlena(fcinfo); +} + +Datum +hashbyteaextended(PG_FUNCTION_ARGS) +{ + return hashvarlenaextended(fcinfo); +} diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c index 40164e2ea2b..330355d4c5b 100644 --- a/src/backend/access/hash/hashvalidate.c +++ b/src/backend/access/hash/hashvalidate.c @@ -22,19 +22,13 @@ #include "catalog/pg_amproc.h" #include "catalog/pg_opclass.h" #include "catalog/pg_opfamily.h" -#include "catalog/pg_proc.h" #include "catalog/pg_type.h" -#include "parser/parse_coerce.h" #include "utils/builtins.h" -#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/regproc.h" #include "utils/syscache.h" -static bool check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype); - - /* * Validator for a hash opclass. * @@ -90,6 +84,7 @@ hashvalidate(Oid opclassoid) { HeapTuple proctup = &proclist->members[i]->tuple; Form_pg_amproc procform = (Form_pg_amproc) GETSTRUCT(proctup); + boolok; /* * All hash functions should be registered with matching left/right @@ -109,30 +104,17 @@ hashvalidate(Oid opclassoid) switch (procform->amprocnum) {
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Tue, Aug 6, 2024 at 11:18 AM Alexander Korotkov wrote: > On Tue, Aug 6, 2024 at 8:36 AM Michael Paquier wrote: > > On Tue, Aug 06, 2024 at 05:17:10AM +0300, Alexander Korotkov wrote: > > > The 0001 patch is intended to improve this situation. Actually, it's > > > not right to just put RecoveryInProgress() after > > > GetXLogReplayRecPtr(), because more wal could be replayed between > > > these calls. Instead we need to recheck GetXLogReplayRecPtr() after > > > getting negative result of RecoveryInProgress() because WAL replay > > > position couldn't get updated after. > > > 0002 patch comprises fix for the header comment of WaitLSNSetLatches() function > > > 0003 patch comprises tests for pg_wal_replay_wait() errors. > > > > Before adding more tests, could it be possible to stabilize what's in > > the tree? drongo has reported one failure with the recovery test > > 043_wal_replay_wait.pl introduced recently by 3c5db1d6b016: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-05%2004%3A24%3A54 > > Thank you for pointing! > Surely, I'll fix this before. Something breaks in these lines during the second iteration of the loop. "SELECT pg_current_wal_insert_lsn()" has been queried from the primary, but the standby didn't receive "CALL pg_wal_replay_wait('...');" for (my $i = 0; $i < 5; $i++) { print($i); $node_primary->safe_psql('postgres', "INSERT INTO wait_test VALUES (${i});"); my $lsn = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()"); $psql_sessions[$i] = $node_standby1->background_psql('postgres'); $psql_sessions[$i]->query_until( qr/start/, qq[ \\echo start CALL pg_wal_replay_wait('${lsn}'); SELECT log_count(${i}); ]); } I wonder what it could be. Probably something hangs while launching the background psql... I'll investigate this more. -- Regards, Alexander Korotkov Supabase
Re: Support specify tablespace for each merged/split partition
On Tue, Aug 6, 2024 at 5:34 PM Amul Sul wrote: > > On Mon, Aug 5, 2024 at 9:05 PM Junwang Zhao wrote: > > > > Hi Amul, > > > > Thanks for your review. > > > > On Mon, Aug 5, 2024 at 8:38 PM Amul Sul wrote: > > > > > > On Mon, Jul 15, 2024 at 11:19 AM Junwang Zhao wrote: > > > > > > >[...] > > > static Relation > > > -createPartitionTable(RangeVar *newPartName, Relation modelRel, > > > -AlterTableUtilityContext *context) > > > +createPartitionTable(RangeVar *newPartName, char *tablespacename, > > > + > > > > > > The comment should mention the tablespace setting in the same way it > > > mentions the access method. > > > > I'm not good at wording, can you give some advice? > > My suggestion is to rewrite the third paragraph as follows, but > someone else might have a better version: > --- > The new partitions will also be created in the same tablespace as the parent > if not specified. Also, this function sets the new partition access method > same as parent table access methods (similarly to CREATE TABLE ... PARTITION > OF). It checks that parent and child tables have compatible persistence. > --- I adopted this wording with minor changes. > > > > > > +SELECT tablename, tablespace FROM pg_tables > > > + WHERE tablename IN ('t', 'tp_0_2') AND schemaname = > > > 'partitions_merge_schema' > > > + ORDER BY tablename, tablespace; > > > + tablename |tablespace > > > +---+-- > > > + t | > > > + tp_0_2| regress_tblspace > > > +(2 rows) > > > + > > > +SELECT tablename, indexname, tablespace FROM pg_indexes > > > + WHERE tablename IN ('t', 'tp_0_2') AND schemaname = > > > 'partitions_merge_schema' > > > + ORDER BY tablename, indexname, tablespace; > > > + tablename | indexname | tablespace > > > +---+-+ > > > + t | t_pkey | > > > + tp_0_2| tp_0_2_pkey | > > > +(2 rows) > > > + > > > > > > This seems problematic to me. The index should be in the same > > > tablespace as the table. > > > > I'm not sure about this, it seems to me that partition index will alway > > inherit the tablespaceId of its parent(see generateClonedIndexStmt), > > do you think we should change the signature of this function? > > > > One thing worth mentioning is that for UNIQUE/PRIMARY KEY, > > it allows setting *USING INDEX TABLESPACE tablespace_name*, > > I don't think we should change the index tablespace in this case, > > what do you think? > > > > I think you are correct; my understanding is a bit hazy. Thanks for your confirmation. The attached v2 addresses all the problems you mentioned, thanks. > > > > > I have added you as a reviewer, hope you don't mind. > > Thank you. > > Regards, > Amul -- Regards Junwang Zhao v2-0001-support-specify-tablespace-for-each-merged-split.patch v2-0002-fix-stale-comments.patch
Re: Memory growth observed with C++ application consuming libpq.dll on Windows
On Tue, Aug 6, 2024 at 5:33 AM Rajesh Kokkonda < rajeshk.kokko...@gmail.com> wrote: > I ran a trial version of a memory leak detector called Deleaker on windows > and found some modules that are listed as having leaks. I ran the program > on Linux under valgrind and I do not see any leaks reported there. I have > attached the reported leaks on windows as windows_leaks.txt and valgrind > summary report as valgrind.txt. > None of these sources are Postgres. https://www.gnu.org/software/libiconv/ https://gnuwin32.sourceforge.net/packages/libintl.htm best regards, Ranier Vilela
Re: Fix comments in instr_time.h and remove an unneeded cast to int64
On 06/08/2024 11:54, Bertrand Drouvot wrote: Hi hackers, While working on [1], I came across what seems to be incorrect comments in instr_time.h and an unneeded cast to int64. Indeed, 03023a2664 represented time as an int64 on all platforms but forgot to update the comment related to INSTR_TIME_GET_MICROSEC() and provided an incorrect comment for INSTR_TIME_GET_NANOSEC(). Please find attached a tiny patch to correct those and, in passing, remove what I think is an unneeded cast to int64. Applied, thanks! -- Heikki Linnakangas Neon (https://neon.tech)
Re: Logical Replication of sequences
On Mon, 5 Aug 2024 at 18:05, shveta malik wrote: > > On Mon, Aug 5, 2024 at 11:04 AM vignesh C wrote: > > > > On Wed, 31 Jul 2024 at 14:39, shveta malik wrote: > > > > > > On Mon, Jun 10, 2024 at 5:00 PM vignesh C wrote: > > > > > > > > On Mon, 10 Jun 2024 at 12:24, Amul Sul wrote: > > > > > > > > > > > > > > > > > > > > On Sat, Jun 8, 2024 at 6:43 PM vignesh C wrote: > > > > >> > > > > >> On Wed, 5 Jun 2024 at 14:11, Amit Kapila > > > > >> wrote: > > > > >> [...] > > > > >> A new catalog table, pg_subscription_seq, has been introduced for > > > > >> mapping subscriptions to sequences. Additionally, the sequence LSN > > > > >> (Log Sequence Number) is stored, facilitating determination of > > > > >> sequence changes occurring before or after the returned sequence > > > > >> state. > > > > > > > > > > > > > > > Can't it be done using pg_depend? It seems a bit excessive unless I'm > > > > > missing > > > > > something. > > > > > > > > We'll require the lsn because the sequence LSN informs the user that > > > > it has been synchronized up to the LSN in pg_subscription_seq. Since > > > > we are not supporting incremental sync, the user will be able to > > > > identify if he should run refresh sequences or not by checking the lsn > > > > of the pg_subscription_seq and the lsn of the sequence(using > > > > pg_sequence_state added) in the publisher. > > > > > > How the user will know from seq's lsn that he needs to run refresh. > > > lsn indicates page_lsn and thus the sequence might advance on pub > > > without changing lsn and thus lsn may look the same on subscriber even > > > though a sequence-refresh is needed. Am I missing something here? > > > > When a sequence is synchronized to the subscriber, the page LSN of the > > sequence from the publisher is also retrieved and stored in > > pg_subscriber_rel as shown below: > > --- Publisher page lsn > > publisher=# select pg_sequence_state('seq1'); > > pg_sequence_state > > > > (0/1510E38,65,1,t) > > (1 row) > > > > --- Subscriber stores the publisher's page lsn for the sequence > > subscriber=# select * from pg_subscription_rel where srrelid = 16384; > > srsubid | srrelid | srsubstate | srsublsn > > -+-++--- > >16389 | 16384 | r | 0/1510E38 > > (1 row) > > > > If changes are made to the sequence, such as performing many nextvals, > > the page LSN will be updated. Currently the sequence values are > > prefetched for SEQ_LOG_VALS 32, so the lsn will not get updated for > > the prefetched values, once the prefetched values are consumed the lsn > > will get updated. > > For example: > > --- Updated LSN on the publisher (old lsn - 0/1510E38, new lsn - 0/1558CA8) > > publisher=# select pg_sequence_state('seq1'); > > pg_sequence_state > > -- > > (0/1558CA8,143,22,t) > > (1 row) > > > > The user can then compare this updated value with the sequence's LSN > > in pg_subscription_rel to determine when to re-synchronize the > > sequence. > > Thanks for the details. But I was referring to the case where we are > in between pre-fetched values on publisher (say at 25th value), while > on subscriber we are slightly behind (say at 15th value), but page-lsn > will be the same on both. Since the subscriber is behind, a > sequence-refresh is needed on sub, but by looking at lsn (which is > same), one can not say that for sure. Let me know if I have > misunderstood it. Yes, at present, if the value is within the pre-fetched range, we cannot distinguish it solely using the page_lsn. 
However, the pg_sequence_state function also provides last_value and log_cnt, which can be used to handle these specific cases. Regards, Vignesh
Re: Optimize mul_var() for var1ndigits >= 8
On Mon, 5 Aug 2024 at 13:34, Joel Jacobson wrote: > > Noted from 5e1f3b9eb: > "While it adds some space on 32-bit machines, we aren't optimizing for that > case anymore." > > In this case, the extra 32-bit numeric_mul code seems to be 89 lines of code, > excluding comments. > To me, this seems like quite a lot, so I lean on thinking we should omit that > code for now. > We can always add it later if we get pushback. > OK, I guess that's reasonable. There is no clear-cut right answer here, but I don't really want to have a lot of 32-bit-specific code that significantly complicates this function, making it harder to maintain. Without that code, the patch becomes much simpler, which seems like a decent justification for any performance tradeoffs on 32-bit machines that are unlikely to affect many people anyway. Regards, Dean From 6c1820257997facfe8e74fac8b574c8f683bbebc Mon Sep 17 00:00:00 2001 From: Dean Rasheed Date: Thu, 18 Jul 2024 17:38:59 +0100 Subject: [PATCH v4 1/2] Extend mul_var_short() to 5 and 6-digit inputs. Commit ca481d3c9a introduced mul_var_short(), which is used by mul_var() whenever the shorter input has 1-4 NBASE digits and the exact product is requested. As speculated on in that commit, it can be extended to work for more digits in the shorter input. This commit extends it up to 6 NBASE digits (21-24 decimal digits), for which it also gives a significant speedup. To avoid excessive code bloat and duplication, refactor it a bit using macros and exploiting the fact that some portions of the code are shared between the different cases. --- src/backend/utils/adt/numeric.c | 175 ++-- 1 file changed, 123 insertions(+), 52 deletions(-) diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c index d0f0923710..ca28d0e3b3 100644 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -8720,10 +8720,10 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result, } /* - * If var1 has 1-4 digits and the exact result was requested, delegate to + * If var1 has 1-6 digits and the exact result was requested, delegate to * mul_var_short() which uses a faster direct multiplication algorithm. */ - if (var1ndigits <= 4 && rscale == var1->dscale + var2->dscale) + if (var1ndigits <= 6 && rscale == var1->dscale + var2->dscale) { mul_var_short(var1, var2, result); return; @@ -8882,7 +8882,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result, /* * mul_var_short() - * - * Special-case multiplication function used when var1 has 1-4 digits, var2 + * Special-case multiplication function used when var1 has 1-6 digits, var2 * has at least as many digits as var1, and the exact product var1 * var2 is * requested. */ @@ -8904,7 +8904,7 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2, /* Check preconditions */ Assert(var1ndigits >= 1); - Assert(var1ndigits <= 4); + Assert(var1ndigits <= 6); Assert(var2ndigits >= var1ndigits); /* @@ -8931,6 +8931,13 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2, * carry up as we go. The i'th result digit consists of the sum of the * products var1digits[i1] * var2digits[i2] for which i = i1 + i2 + 1. 
*/ +#define PRODSUM1(v1,i1,v2,i2) ((v1)[i1] * (v2)[i2]) +#define PRODSUM2(v1,i1,v2,i2) (PRODSUM1(v1,i1,v2,i2) + (v1)[i1+1] * (v2)[i2-1]) +#define PRODSUM3(v1,i1,v2,i2) (PRODSUM2(v1,i1,v2,i2) + (v1)[i1+2] * (v2)[i2-2]) +#define PRODSUM4(v1,i1,v2,i2) (PRODSUM3(v1,i1,v2,i2) + (v1)[i1+3] * (v2)[i2-3]) +#define PRODSUM5(v1,i1,v2,i2) (PRODSUM4(v1,i1,v2,i2) + (v1)[i1+4] * (v2)[i2-4]) +#define PRODSUM6(v1,i1,v2,i2) (PRODSUM5(v1,i1,v2,i2) + (v1)[i1+5] * (v2)[i2-5]) + switch (var1ndigits) { case 1: @@ -8942,9 +8949,9 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2, * -- */ carry = 0; - for (int i = res_ndigits - 2; i >= 0; i--) + for (int i = var2ndigits - 1; i >= 0; i--) { -term = (uint32) var1digits[0] * var2digits[i] + carry; +term = PRODSUM1(var1digits, 0, var2digits, i) + carry; res_digits[i + 1] = (NumericDigit) (term % NBASE); carry = term / NBASE; } @@ -8960,23 +8967,17 @@ mul_var_short(const NumericVar *var1, const NumericVar *var2, * -- */ /* last result digit and carry */ - term = (uint32) var1digits[1] * var2digits[res_ndigits - 3]; + term = PRODSUM1(var1digits, 1, var2digits, var2ndigits - 1); res_digits[res_ndigits - 1] = (NumericDigit) (term % NBASE); carry = term / NBASE; /* remaining digits, except for the first two */ - for (int i = res_ndigits - 3; i >= 1; i--) + for (int i = var2ndigits - 1; i >= 1; i--) { -term = (uint32) var1digits[0] * var2digits[i] + - (uint32) var1digits[1] * var2digits[i - 1] + carry; +term = PRODSUM2(var1digits, 0, var2digits, i) + carry; res_digits[i + 1] = (NumericDigit) (term % NBASE); carry = term / N
Thread-unsafe MD5 on big-endian systems with no OpenSSL
While browsing through all our global variables for the multithreading effort, I noticed that our MD5 implementation in src/common/md5.c uses a static buffer on big-endian systems, which makes it not thread-safe. That's a bug because that function is also used in libpq. This was introduced in commit b67b57a966, which replaced the old MD5 fallback implementation with the one from pgcrypto. The thread-safety didn't matter for pgcrypto, but for libpq it does. This only affects big-endian systems that are compiled without OpenSSL. -- Heikki Linnakangas Neon (https://neon.tech)From 5ec1ec90ba4b1828b4ecc3d6907489f879e1af12 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 6 Aug 2024 15:20:06 +0300 Subject: [PATCH 1/1] Make fallback MD5 implementation thread-safe on big-endian systems Replace a static scratch buffer with a local variable, because a static buffer makes the function not thread-safe. This function is used in client-code in libpq, so it needs to be thread-safe. It was until commit b67b57a966, which replaced the implementation with the one from pgcrypto. Backpatch to v14, where we switched to the new implementation. --- src/common/md5.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/src/common/md5.c b/src/common/md5.c index f1d47722f8..32b696a66d 100644 --- a/src/common/md5.c +++ b/src/common/md5.c @@ -150,10 +150,6 @@ static const uint8 md5_paddat[MD5_BUFLEN] = { 0, 0, 0, 0, 0, 0, 0, 0, }; -#ifdef WORDS_BIGENDIAN -static uint32 X[16]; -#endif - static void md5_calc(const uint8 *b64, pg_md5_ctx *ctx) { @@ -167,6 +163,7 @@ md5_calc(const uint8 *b64, pg_md5_ctx *ctx) #else /* 4 byte words */ /* what a brute force but fast! */ + uint32 X[16]; uint8 *y = (uint8 *) X; y[0] = b64[3]; -- 2.39.2
Re: pg_combinebackup does not detect missing files
On Fri, Aug 2, 2024 at 7:07 PM Robert Haas wrote: > > On Fri, Apr 19, 2024 at 11:47 AM Robert Haas wrote: > > > [...] > Here is a rebased version of the patch. No other changes since v1. > Here are two minor comments on this: $ pg_combinebackup /tmp/backup_full/ /tmp/backup_incr2/ /tmp/backup_incr3/ -o /tmp/backup_comb pg_combinebackup: warning: "pg_wal/00010020" is present on disk but not in the manifest This warning shouldn’t be reported, since we don’t include WAL in the backup manifest ? Also, I found that the final resultant manifest includes this WAL entry: $ head /tmp/backup_comb/backup_manifest | grep pg_wal { "Path": "pg_wal/00010020", "Size": 16777216, "Last-Modified": "2024-08-06 11:54:16 GMT" }, --- +# Set up another new database instance. force_initdb is used because +# we want it to be a separate cluster with a different system ID. +my $node2 = PostgreSQL::Test::Cluster->new('node2'); +$node2->init(force_initdb => 1, has_archiving => 1, allows_streaming => 1); +$node2->append_conf('postgresql.conf', 'summarize_wal = on'); +$node2->start; + Unused cluster-node in the test. Regards, Amul
Re: Instability with incremental backup tests (pg_combinebackup, 003_timeline.pl)
On 8/6/24 07:48, Michael Paquier wrote: > Hi all, > > dikkop has reported a failure with the regression tests of pg_combinebackup: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-08-04%2010%3A04%3A51 > > That's in the test 003_timeline.pl, from dc212340058b: > # Failed test 'incremental backup from node1' > # at t/003_timeline.pl line 43. > > The node is extremely slow, so perhaps bumping up the timeout would be > fine enough in this case (did not spend time analyzing it). I don't > think that this has been discussed, but perhaps I just missed a > reference to it and the incremental backup thread is quite large. > Yeah, it's FreeBSD running on an rpi4, from a USB flash disk, and in my experience it's much slower than rpi4 running Linux. I'm not sure why that is; I never found a way to make it faster. The machine already has: export PGCTLTIMEOUT=600 export PG_TEST_TIMEOUT_DEFAULT=600 I doubt increasing it further will do the trick. Maybe there's some other timeout that I should increase? FWIW I just moved the buildfarm stuff to a proper SSD disk (still USB, but hopefully better than the crappy flash disk). regards -- Tomas Vondra
Re: Detailed release notes
On Tue, Aug 6, 2024 at 04:30, jian he wrote: > > you can also preview the attached screenshot to see the rendered effect. > Loved, except that the commit id does not help too much, so I don't think we need it. I think a numbered link would be better. - Change functions to use a safe search_path during maintenance operations (Jeff Davis) [1, 2] Also, your patch has an extra space before the comma for the second and third links, [1 , 2] instead of [1, 2]. regards Marcos
Support multi-column renaming?
I have noticed that ALTER TABLE supports multiple column actions (ADD/DROP column etc), but does not support multiple column renaming. See [1]. Here is a small example of what I'm talking about: ``` db2=# create table tt(); CREATE TABLE -- multiple column altering ok db2=# alter table tt add column i int, add column j int; ALTER TABLE -- single column renaming ok db2=# alter table tt rename column i to i2; ALTER TABLE -- multiple column renaming not allowed db2=# alter table tt rename column i2 to i3, rename column j to j2; ERROR: syntax error at or near "," LINE 1: alter table tt rename column i2 to i3, rename column j to j2... ^ db2=# ``` Looking closely on gram.y, the only reason for this is that RenameStmt is defined less flexible than alter_table_cmds (which is a list). All other core infrastructure is ready to allow $subj. So is it worth a patch? [1] https://www.postgresql.org/docs/current/sql-altertable.html -- Best regards, Kirill Reshke
Re: Rename C23 keyword
On Tue, Aug 6, 2024 at 4:04 AM Peter Eisentraut wrote: > constexpr is a keyword in C23. Rename a conflicting identifier for > future-proofing. > > Obviously, C23 is way in the future, but this is a hard error that > prevents any further exploration. (To be clear: This only happens if > you explicitly select C23 mode. I'm not aware of a compiler where this > is the default yet.) This seems fine. -- Robert Haas EDB: http://www.enterprisedb.com
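For context, the reason this is a hard error rather than a warning: constexpr is a keyword in C23, so any pre-existing identifier with that name simply stops compiling. A minimal standalone illustration (a made-up example, not the identifier that was actually renamed):
```
/*
 * Made-up example, not the code that was renamed: this compiles as C17,
 * but fails under -std=c23 because "constexpr" is a keyword there and
 * can no longer be used as an identifier.
 */
#include <stdio.h>

int
main(void)
{
	int		constexpr = 42;		/* C23: syntax error, reserved word */

	printf("%d\n", constexpr);
	return 0;
}
```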
Re: Detailed release notes
On Tue, Aug 6, 2024 at 9:57 AM Marcos Pegoraro wrote: > Loved, except that the commit id does not help too much, so I don't think we > need it. > I think a numbered link would be better. I think the commit ID is quite useful. If you're using git, you can do "git show $COMMITID". If you're using the web, you can go to https://git.postgresql.org/pg/commitdiff/$COMMITID Big -1 for removing the commit ID. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Thread-unsafe MD5 on big-endian systems with no OpenSSL
On Tue, Aug 6, 2024 at 8:23 AM Heikki Linnakangas wrote: > While browsing through all our global variables for the multithreading > effort, I noticed that our MD5 implementation in src/common/md5.c uses a > static buffer on big-endian systems, which makes it not thread-safe. > That's a bug because that function is also used in libpq. > > This was introduced in commit b67b57a966, which replaced the old MD5 > fallback implementation with the one from pgcrypto. The thread-safety > didn't matter for pgcrypto, but for libpq it does. > > This only affects big-endian systems that are compiled without OpenSSL. LGTM. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Fix comments in instr_time.h and remove an unneeded cast to int64
Heikki Linnakangas writes: > On 06/08/2024 11:54, Bertrand Drouvot wrote: >> Please find attached a tiny patch to correct those and, in passing, remove >> what >> I think is an unneeded cast to int64. > Applied, thanks! I think this comment change is a dis-improvement. It's removed the documentation of the important fact that INSTR_TIME_GET_MICROSEC and INSTR_TIME_GET_NANOSEC return a different data type from INSTR_TIME_GET_MILLISEC (ie, integer versus float). Also, the expectation is that users of these APIs do not know the actual data type of instr_time, and instead we tell them what the output of those macros is. This patch just blew a hole in that abstraction. regards, tom lane
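To make the abstraction concrete, the intended usage pattern looks roughly like this (a sketch with made-up variable names; callers rely only on the macros and their documented result types, never on the representation of instr_time):
```
/*
 * Sketch of the intended instr_time.h usage (made-up variable names).
 * Callers never peek inside instr_time; they depend only on the documented
 * result types of the GET macros: MILLISEC is a double, while
 * MICROSEC/NANOSEC are 64-bit integers.
 */
#include "postgres.h"
#include "portability/instr_time.h"

static void
time_some_work(void)
{
	instr_time	start;
	instr_time	elapsed;

	INSTR_TIME_SET_CURRENT(start);
	/* ... the work being measured ... */
	INSTR_TIME_SET_CURRENT(elapsed);
	INSTR_TIME_SUBTRACT(elapsed, start);

	elog(DEBUG1, "took %.3f ms", INSTR_TIME_GET_MILLISEC(elapsed));
	elog(DEBUG1, "took %lld us",
		 (long long) INSTR_TIME_GET_MICROSEC(elapsed));
}
```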
Re: Incremental View Maintenance, take 2
On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA wrote: > > On Tue, 2 Jul 2024 17:03:11 +0900 > Yugo NAGATA wrote: > > > On Sun, 31 Mar 2024 22:59:31 +0900 > > Yugo NAGATA wrote: > > > > > > > > Also, I added a comment on RelationIsIVM() macro persuggestion from > > > > jian he. > > > > In addition, I fixed a failure reported from cfbot on FreeBSD build > > > > caused by; > > > > > > > > WARNING: outfuncs/readfuncs failed to produce an equal rewritten > > > > parse tree > > > > > > > > This warning was raised since I missed to modify outfuncs.c for a new > > > > field. > > > > > > I found cfbot on FreeBSD still reported a failure due to > > > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used > > > wrong role names. Attached is a fixed version, v32. > > > > Attached is a rebased version, v33. > > I updated the patch to bump up the version numbers in psql and pg_dump codes > from 17 to 18. > > Regards, > Yugo Nagata > > > > > Regards, > > Yugo Nagata > > > > > > -- > > Yugo NAGATA > > > -- > Yugo NAGATA Small update with some things I found in recent days: ``` db2=# create incremental materialized view v2 as select * from v1; ERROR: VIEW or MATERIALIZED VIEW is not supported on incrementally maintainable materialized view ``` The error message is not accurate; create view v2 as select * from v1; works fine. ``` db2=# create incremental materialized view vv2 as select i,j2, i / j2 from t1 join t2 on true; db2=# insert into t2 values(1,0); ERROR: division by zero ``` It is very strange to receive `division by zero` while inserting into a relation, isn't it? Can we add some hints/CONTEXT here? Regular triggers do it: ``` db2=# insert into ttt values(10,0); ERROR: division by zero CONTEXT: PL/pgSQL function f1() line 3 at IF ``` -- Best regards, Kirill Reshke
Re: Support multi-column renaming?
On 2024-Aug-06, Kirill Reshke wrote: > I have noticed that ALTER TABLE supports multiple column actions > (ADD/DROP column etc), but does not support multiple column renaming. > Looking closely on gram.y, the only reason for this is that RenameStmt > is defined less flexible than alter_table_cmds (which is a list). All > other core infrastructure is ready to allow $subj. > > So is it worth a patch? Hmm, yeah, maybe it'd be better if ALTER TABLE RENAME is not part of RenameStmt but instead part of AlterTableStmt. Probably not a super trivial code change, but it should be doable. The interactions with different subcommands types in the same command should be considered carefully (as well as with ALTER {VIEW,SEQUENCE,etc} RENAME, which I bet we don't want changed due to the implications). -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Java is clearly an example of money oriented programming" (A. Stepanov)
Re: Fix comments in instr_time.h and remove an unneeded cast to int64
On 06/08/2024 17:20, Tom Lane wrote: Heikki Linnakangas writes: On 06/08/2024 11:54, Bertrand Drouvot wrote: Please find attached a tiny patch to correct those and, in passing, remove what I think is an unneeded cast to int64. Applied, thanks! I think this comment change is a dis-improvement. It's removed the documentation of the important fact that INSTR_TIME_GET_MICROSEC and INSTR_TIME_GET_NANOSEC return a different data type from INSTR_TIME_GET_MILLISEC (ie, integer versus float). Also, the expectation is that users of these APIs do not know the actual data type of instr_time, and instead we tell them what the output of those macros is. This patch just blew a hole in that abstraction. Hmm, ok I see. Then I propose: 1. Revert 2. Just fix the comment to say int64 instead of uint64. -- Heikki Linnakangas Neon (https://neon.tech) From 47a209a1840c9f66b584fb03a99baf56c5abe69f Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 6 Aug 2024 17:46:58 +0300 Subject: [PATCH 1/2] Revert "Fix comments in instr_time.h and remove an unneeded cast to int64" This reverts commit 3dcb09de7b. Tom Lane pointed out that it broke the abstraction provided by the macros. The callers should not need to know what the internal type is. This commit is an exact revert, the next commit will fix the comments on the macros that incorrectly claim that they return uint64. Discussion: https://www.postgresql.org/message-id/zrhkv3maqfwns...@ip-10-97-1-34.eu-west-3.compute.internal --- src/include/portability/instr_time.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index 8f9ba2f151..a6fc1922f2 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -32,9 +32,9 @@ * * INSTR_TIME_GET_MILLISEC(t) convert t to double (in milliseconds) * - * INSTR_TIME_GET_MICROSEC(t) get t in microseconds + * INSTR_TIME_GET_MICROSEC(t) convert t to uint64 (in microseconds) * - * INSTR_TIME_GET_NANOSEC(t) get t in nanoseconds + * INSTR_TIME_GET_NANOSEC(t) convert t to uint64 (in nanoseconds) * * Note that INSTR_TIME_SUBTRACT and INSTR_TIME_ACCUM_DIFF convert * absolute times to intervals. The INSTR_TIME_GET_xxx operations are @@ -123,7 +123,7 @@ pg_clock_gettime_ns(void) ((t) = pg_clock_gettime_ns()) #define INSTR_TIME_GET_NANOSEC(t) \ - ((t).ticks) + ((int64) (t).ticks) #else /* WIN32 */ -- 2.39.2 From a78416110d0d0746f668c7d133d40d7c7f982d49 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 6 Aug 2024 17:47:21 +0300 Subject: [PATCH 2/2] Fix datatypes in comments in instr_time.h The INSTR_TIME_GET_NANOSEC(t) and INSTR_TIME_GET_MICROSEC(t) macros return a signed int64. 
Discussion: https://www.postgresql.org/message-id/zrhkv3maqfwns...@ip-10-97-1-34.eu-west-3.compute.internal --- src/include/portability/instr_time.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index a6fc1922f2..e66ecf34cd 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -32,9 +32,9 @@ * * INSTR_TIME_GET_MILLISEC(t) convert t to double (in milliseconds) * - * INSTR_TIME_GET_MICROSEC(t) convert t to uint64 (in microseconds) + * INSTR_TIME_GET_MICROSEC(t) convert t to int64 (in microseconds) * - * INSTR_TIME_GET_NANOSEC(t) convert t to uint64 (in nanoseconds) + * INSTR_TIME_GET_NANOSEC(t) convert t to int64 (in nanoseconds) * * Note that INSTR_TIME_SUBTRACT and INSTR_TIME_ACCUM_DIFF convert * absolute times to intervals. The INSTR_TIME_GET_xxx operations are -- 2.39.2
Re: SQL:2011 application time
On Tue, Aug 6, 2024 at 10:02 AM jian he wrote: > > On Fri, Aug 2, 2024 at 1:09 AM Paul Jungwirth > wrote: > > > > On 7/25/24 08:52, Paul Jungwirth wrote: > > > Here is a patch moving the not-empty check into > > > check_exclusion_or_unique_constraint. That is a more > > > logical place for it than ExecConstraints, since WITHOUT OVERLAPS is part > > > of the index constraint > > > (not a CHECK constraint). At that point we've already looked up all the > > > information we need. So > > > there is no extra cost for non-temporal tables, and no need to change > > > pg_class or add to the > > > relcache. Also putting it there means we don't need any extra code to > > > enforce non-empties when we > > > build the index or do anything else with it. > > > > > > I think this is the nicest solution we can expect. It is even cleaner > > > than the &&& ideas. So > > > hopefully this gets us back to where we were when we decided to commit > > > PKs & FKs to v17. > > > > > > As before, I've left the nonempty check as a separate patch to make > > > reviewing easier, but when > > > committing I would squash it with the PK patch. > > > > Hello, > > > > Here is an updated set of patches, rebased because the old patches no > > longer applied. > > hi. some minor issues. in generateClonedIndexStmt index->iswithoutoverlaps = (idxrec->indisprimary || idxrec->indisunique) && idxrec->indisexclusion; this case, the index accessMethod will be "gist" only? do you think it's necessary to: index->iswithoutoverlaps = (idxrec->indisprimary || idxrec->indisunique) && idxrec->indisexclusion && strcmp(index->accessMethod, "gist") == 0); src/bin/pg_dump/pg_dump.c and src/bin/psql/describe.c should be "if (pset.sversion >= 18)"? + (This is sometimes called a + temporal key, if the column is a range of dates or timestamps, but + PostgreSQL allows ranges over any base type.) PostgreSQL should be decorated as PostgreSQL ? in DefineIndex we have: if (stmt->unique && !stmt->iswithoutoverlaps && !amRoutine->amcanunique) if (stmt->indexIncludingParams != NIL && !amRoutine->amcaninclude) if (numberOfKeyAttributes > 1 && !amRoutine->amcanmulticol) if (exclusion && amRoutine->amgettuple == NULL) maybe we can add: if (stmt->iswithoutoverlaps && strcmp(accessMethodName, "gist") != 0) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("access method \"%s\" does not support WITHOUT OVERLAPS constraints", accessMethodName))); + /* exclusionOpNames can be non-NIL if we are creating a partition */ + if (iswithoutoverlaps && exclusionOpNames == NIL) + { + indexInfo->ii_ExclusionOps = palloc_array(Oid, nkeycols); + indexInfo->ii_ExclusionProcs = palloc_array(Oid, nkeycols); + indexInfo->ii_ExclusionStrats = palloc_array(uint16, nkeycols); + } the comment is not 100% correct, i think. creating a partition, "create table like INCLUDING ALL", both will go through generateClonedIndexStmt. generateClonedIndexStmt will produce exclusionOpNames if this index supports exclusion constraint.
Re: Fix comments in instr_time.h and remove an unneeded cast to int64
Hi, On Tue, Aug 06, 2024 at 05:49:32PM +0300, Heikki Linnakangas wrote: > On 06/08/2024 17:20, Tom Lane wrote: > > Heikki Linnakangas writes: > > > On 06/08/2024 11:54, Bertrand Drouvot wrote: > > > > Please find attached a tiny patch to correct those and, in passing, > > > > remove what > > > > I think is an unneeded cast to int64. > > > > > Applied, thanks! > > > > I think this comment change is a dis-improvement. It's removed the > > documentation of the important fact that INSTR_TIME_GET_MICROSEC and > > INSTR_TIME_GET_NANOSEC return a different data type from > > INSTR_TIME_GET_MILLISEC (ie, integer versus float). Also, the > > expectation is that users of these APIs do not know the actual data > > type of instr_time, and instead we tell them what the output of those > > macros is. This patch just blew a hole in that abstraction. Oh ok, did not think about it that way, thanks for the feedback! > > Hmm, ok I see. Then I propose: > > 1. Revert > 2. Just fix the comment to say int64 instead of uint64. LGTM, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Sat, Jul 27, 2024 at 4:04 AM Melanie Plageman wrote: > > On Wed, Jul 24, 2024 at 8:19 AM John Naylor wrote: > I ran my test with your patch (on my 64-bit system, non-assert build) > and the result is great: > > master with my test (slightly modified to now use DELETE instead of > UPDATE as mentioned upthread) > 3.09s > > master with your patch applied, MWM set to 64kB and 9000 rows instead of > 80 > 1.06s Glad to hear it! > I took a look at the patch, but I can't say I know enough about the > memory allocation subsystems and how TIDStore works to meaningfully > review it -- nor enough about DSM to comment about the interactions. I tried using parallel vacuum with 64kB and it succeeded, but needed to perform an index scan for every heap page pruned. It's not hard to imagine some code moving around so that it doesn't work anymore, but since this is for testing only, it seems a warning comment is enough. > I suspect 256kB would also be fast enough to avoid my test timing out > on the buildfarm, but it is appealing to have a minimum for > maintenance_work_mem that is the same as work_mem. Agreed on both counts: I came up with a simple ctid expression to make the bitmap arrays larger: delete from test where ctid::text like '%,2__)'; With that, it still takes between 250k and 300k tuples to force a second index scan with 256kB m_w_m, default fillfactor, and without asserts. (It may need a few more pages for 32-bit but not many more) The table is around 1300 pages, where on v16 it's about 900. But with fewer tuples deleted, the WAL for deletes should be lower. So it might be comparable to v16's test. It also turns out that to support 64kB memory settings, we actually wouldn't need to change radix tree to lazily create memory contexts -- at least currently, SlabCreate doesn't allocate a keeper block, so a newly created slab context reports 0 for "mem_allocated". So I'm inclined to go ahead change the minimum m_w_m on v17 and master to 64kB. It's the quickest and (I think) most future-proof way to make this test work. Any objections?
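To illustrate the slab point above, a small self-contained sketch (context name and sizes are made up) of checking that a freshly created slab context reports zero allocated memory until the first chunk is requested:
```
/*
 * Sketch (names and sizes made up) of the behavior described above: a
 * freshly created slab context does not allocate a keeper block, so it
 * reports 0 allocated bytes until the first allocation.
 */
#include "postgres.h"
#include "utils/memutils.h"

static void
show_slab_accounting(void)
{
	MemoryContext slab = SlabContextCreate(CurrentMemoryContext,
										   "sketch slab",
										   SLAB_DEFAULT_BLOCK_SIZE,
										   64);

	/* Reports 0 here: no keeper block was allocated at creation time. */
	elog(DEBUG1, "before: %zu", MemoryContextMemAllocated(slab, false));

	(void) MemoryContextAlloc(slab, 64);

	/* Now the first block has been allocated, so this is non-zero. */
	elog(DEBUG1, "after: %zu", MemoryContextMemAllocated(slab, false));

	MemoryContextDelete(slab);
}
```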
Re: Detailed release notes
On Tue, Aug 6, 2024, at 11:02 AM, Robert Haas wrote: > On Tue, Aug 6, 2024 at 9:57 AM Marcos Pegoraro wrote: > > Loved, except that the commit id does not help too much, so I don't think > > we need it. > > I think a numbered link would be better. > > I think the commit ID is quite useful. If you're using git, you can do > "git show $COMMITID". If you're using the web, you can go to > https://git.postgresql.org/pg/commitdiff/$COMMITID > > Big -1 for removing the commit ID. Agreed. Numbers mean nothing in this context. You are searching for detailed information about the referred feature, and the commit hash is a visual cue that you will find the source code modifications for that feature. Talking about the patch, do we want to rely on an external resource? I suggest that we use a postgresql.org subdomain. It can point to https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=$COMMIT or, even better, use a rewrite rule to define a user-friendly URL (and probably a mechanism to hide the gitweb URL) like https://www.postgresql.org/commit/da4017a694d as well as the short version that we usually use on the mailing list: https://postgr.es/c/da4017a694d -- Euler Taveira EDB https://www.enterprisedb.com/
Re: Thread-unsafe MD5 on big-endian systems with no OpenSSL
> On Aug 6, 2024, at 23:05, Robert Haas wrote: > On Tue, Aug 6, 2024 at 8:23 AM Heikki Linnakangas wrote: >> >> This only affects big-endian systems that are compiled without OpenSSL. > > LGTM. Nice catch, looks fine to me as well. -- Michael
Re: Detailed release notes
Hi, On 2024-08-06 12:02:59 -0300, Euler Taveira wrote: > Talking about the patch, do we want to rely on an external resource? I suggest > that we use a postgresql.org subdomain. It can point to > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=$COMMIT I wonder if we should make that a configurable base domain? We have a few other variables in the sgml that can optionally be set. Greetings, Andres Freund
Re: Minor refactorings to eliminate some static buffers
Here's another batch of little changes in the same vein. Mostly converting static bufs that are never modified to consts. -- Heikki Linnakangas Neon (https://neon.tech) From 6dd5a4a413212a61d9a4f5b9db73e812c8b5dcbd Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 6 Aug 2024 17:58:29 +0300 Subject: [PATCH 1/5] Turn a few 'validnsps' static variables into locals There was no need for these to be static buffers, a local variable works just as well. I think they were marked as 'static' to imply that they are read-only, but 'const' is more appropriate for that, so change them to const. To make it possible to mark the variables as 'const', also add 'const' decorations to the transformRelOptions() signature. Discussion: https://www.postgresql.org/message-id/7f86e06a-98c5-4ce3-8ec9-3885c8de0...@iki.fi --- src/backend/access/common/reloptions.c | 2 +- src/backend/commands/createas.c| 2 +- src/backend/commands/tablecmds.c | 4 ++-- src/backend/tcop/utility.c | 2 +- src/include/access/reloptions.h| 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index d6eb5d8559..49fd35bfc5 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -1154,7 +1154,7 @@ add_local_string_reloption(local_relopts *relopts, const char *name, */ Datum transformRelOptions(Datum oldOptions, List *defList, const char *namspace, - char *validnsps[], bool acceptOidsOff, bool isReset) + const char *const validnsps[], bool acceptOidsOff, bool isReset) { Datum result; ArrayBuildState *astate; diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index c71ff80188..0b629b1f79 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -83,7 +83,7 @@ create_ctas_internal(List *attrList, IntoClause *into) bool is_matview; char relkind; Datum toast_options; - static char *validnsps[] = HEAP_RELOPT_NAMESPACES; + const char *const validnsps[] = HEAP_RELOPT_NAMESPACES; ObjectAddress intoRelationAddr; /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */ diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 0b2a52463f..1f94f4fdbb 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -700,7 +700,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, ListCell *listptr; AttrNumber attnum; bool partitioned; - static char *validnsps[] = HEAP_RELOPT_NAMESPACES; + const char *const validnsps[] = HEAP_RELOPT_NAMESPACES; Oid ofTypeId; ObjectAddress address; LOCKMODE parentLockmode; @@ -14897,7 +14897,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation, Datum repl_val[Natts_pg_class]; bool repl_null[Natts_pg_class]; bool repl_repl[Natts_pg_class]; - static char *validnsps[] = HEAP_RELOPT_NAMESPACES; + const char *const validnsps[] = HEAP_RELOPT_NAMESPACES; if (defList == NIL && operation != AT_ReplaceRelOptions) return; /* nothing to do */ diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 702a6c3a0b..2732d6bfc9 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -1155,7 +1155,7 @@ ProcessUtilitySlow(ParseState *pstate, { CreateStmt *cstmt = (CreateStmt *) stmt; Datum toast_options; - static char *validnsps[] = HEAP_RELOPT_NAMESPACES; + const char *validnsps[] = HEAP_RELOPT_NAMESPACES; /* Remember transformed RangeVar for LIKE */ table_rv = cstmt->relation; diff --git 
a/src/include/access/reloptions.h b/src/include/access/reloptions.h index 81829b8270..df6923c9d5 100644 --- a/src/include/access/reloptions.h +++ b/src/include/access/reloptions.h @@ -220,7 +220,7 @@ extern void add_local_string_reloption(local_relopts *relopts, const char *name, fill_string_relopt filler, int offset); extern Datum transformRelOptions(Datum oldOptions, List *defList, - const char *namspace, char *validnsps[], + const char *namspace, const char *const validnsps[], bool acceptOidsOff, bool isReset); extern List *untransformRelOptions(Datum options); extern bytea *extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, -- 2.39.2 From f108ae4c2ddfa6ca77e9736dc3fb20e6bda7b67c Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 6 Aug 2024 17:59:33 +0300 Subject: [PATCH 2/5] Make nullSemAction const, add 'const' decorators to related functions To make it more clear that these should never be modified. Discussion: https://www.postgresql.org/message-id/7f86e06a-98c5-4ce3-8ec9-3885c8de0...@iki.fi --- src/backend/utils/adt/jsonfuncs.c | 2 +- src/common/jsonapi.c | 26 +-- src/include/common/jsonapi.h | 6 ++--- src/include/utils/json
Re: Fix comments in instr_time.h and remove an unneeded cast to int64
Heikki Linnakangas writes: > Hmm, ok I see. Then I propose: > 1. Revert > 2. Just fix the comment to say int64 instead of uint64. Yeah, it's probably reasonable to specify the output as int64 not uint64 (especially since it looks like that's what the macros actually produce). regards, tom lane
Re: Minor refactorings to eliminate some static buffers
Hi, On 2024-08-06 18:13:56 +0300, Heikki Linnakangas wrote: > From 6dd5a4a413212a61d9a4f5b9db73e812c8b5dcbd Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas > Date: Tue, 6 Aug 2024 17:58:29 +0300 > Subject: [PATCH 1/5] Turn a few 'validnsps' static variables into locals > > There was no need for these to be static buffers, a local variable > works just as well. I think they were marked as 'static' to imply that > they are read-only, but 'const' is more appropriate for that, so > change them to const. I looked at these at some point in the past. Unfortunately compilers don't always generate better code with const than static :( (the static initialization can be done once in a global var, the const one not necessarily). Arguably what you'd want here is both. I doubt that matters here though. > diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c > index 702a6c3a0b..2732d6bfc9 100644 > --- a/src/backend/tcop/utility.c > +++ b/src/backend/tcop/utility.c > @@ -1155,7 +1155,7 @@ ProcessUtilitySlow(ParseState *pstate, > { > CreateStmt *cstmt = > (CreateStmt *) stmt; > Datum > toast_options; > - static char > *validnsps[] = HEAP_RELOPT_NAMESPACES; > + const char *validnsps[] > = HEAP_RELOPT_NAMESPACES; > > /* Remember transformed > RangeVar for LIKE */ > table_rv = > cstmt->relation; In the other places you used "const char * const", here just "const char *" - it doesn't look like that's a required difference? > From f108ae4c2ddfa6ca77e9736dc3fb20e6bda7b67c Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas > Date: Tue, 6 Aug 2024 17:59:33 +0300 > Subject: [PATCH 2/5] Make nullSemAction const, add 'const' decorators to > related functions > To make it more clear that these should never be modified. Yep - and it reduces the size of writable mappings to boot. LGTM. > From da6f101b0ecc2d4e4b33bbcae316dbaf72e67d14 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas > Date: Tue, 6 Aug 2024 17:59:45 +0300 > Subject: [PATCH 3/5] Mark misc static global variables as const LGTM > From 5d562f15aaba0bb082e714e844995705f0ca1368 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas > Date: Tue, 6 Aug 2024 17:59:52 +0300 > Subject: [PATCH 4/5] Constify fields and parameters in spell.c > > I started by marking VoidString as const, and fixing the fallout by > marking more fields and function arguments as const. It proliferated > quite a lot, but all within spell.c and spell.h. > > A more narrow patch to get rid of the static VoidString buffer would > be to replace it with '#define VoidString ""', as C99 allows assigning > "" to a non-const pointer, even though you're not allowed to modify > it. But it seems like good hygiene to mark all these as const. In the > structs, the pointers can point to the constant VoidString, or a > buffer allocated with palloc(), or with compact_palloc(), so you > should not modify them. Looks reasonable to me. > From bb66efccf4f97d0001b730a1376845c0a19c7f27 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas > Date: Tue, 6 Aug 2024 18:00:01 +0300 > Subject: [PATCH 5/5] Use psprintf to simplify gtsvectorout() > > The buffer allocation was correct, but looked archaic and scary: > > - It was weird to calculate the buffer size before determining which > format string was used. With the same effort, we could've used the > right-sized buffer for each branch. > > - Commit aa0d3504560 added one more possible return string ("all true > bits"), but didn't adjust the code at the top of the function to > calculate the returned string's max size. 
It was not a live bug, > because the new string was smaller than the existing ones, but > seemed wrong in principle. > > - Use of sprintf() is generally eyebrow-raising these days > > Switch to psprintf(). psprintf() allocates a larger buffer than what > was allocated before, 128 bytes vs 80 bytes, which is acceptable as > this code is not performance or space critical. I find the undocumented EXTRALEN the most confusing :) LGTM. Greetings, Andres Freund
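To spell out the "both" variant mentioned above (static storage duration so the array only has to be materialized once, plus const qualifiers so it is read-only), here is a tiny standalone sketch, using a stand-in initializer rather than the real HEAP_RELOPT_NAMESPACES:
```
#include <stdio.h>

/*
 * Sketch of the "both" variant: static storage duration combined with
 * const qualifiers.  A stand-in initializer is used here instead of
 * HEAP_RELOPT_NAMESPACES.
 */
static void
print_valid_namespaces(void)
{
	static const char *const validnsps[] = {"toast", NULL};

	for (int i = 0; validnsps[i] != NULL; i++)
		printf("namespace: %s\n", validnsps[i]);
}

int
main(void)
{
	print_valid_namespaces();
	return 0;
}
```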
Re: Support multi-column renaming?
Hi, On 2024-08-06 10:45:48 -0400, Alvaro Herrera wrote: > On 2024-Aug-06, Kirill Reshke wrote: > > > I have noticed that ALTER TABLE supports multiple column actions > > (ADD/DROP column etc), but does not support multiple column renaming. > > > Looking closely on gram.y, the only reason for this is that RenameStmt > > is defined less flexible than alter_table_cmds (which is a list). All > > other core infrastructure is ready to allow $subj. > > > > So is it worth a patch? > > Hmm, yeah, maybe it'd be better if ALTER TABLE RENAME is not part of > RenameStmt but instead part of AlterTableStmt. Probably not a super > trivial code change, but it should be doable. The interactions with > different subcommands types in the same command should be considered > carefully (as well as with ALTER {VIEW,SEQUENCE,etc} RENAME, which I bet > we don't want changed due to the implications). I think you'd likely run into grammar ambiguity issues if you did it naively. I think I looked at something related at some point in the past and concluded that to avoid bison getting confused (afaict the grammar is still LALR(1), it's just that bison doesn't merge states well enough), we'd have to invent a RENAME_TO_P and inject that "manually" in base_yylex(). IIRC introducing RENAME_TO_P (as well as SET_SCHEMA_P, OWNER TO) did actually result in a decent size reduction of the grammar. Greetings, Andres Freund
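For anyone not familiar with that trick, the idea is a lookahead filter between the raw lexer and bison: when the filter sees RENAME immediately followed by TO, it hands the grammar a single merged token. A toy, self-contained sketch of the mechanism (token names and the canned token stream are made up; this is not the actual parser code):
```
/*
 * Toy illustration of the lookahead-merge trick: a filter between the raw
 * lexer and the grammar that replaces the two-token sequence RENAME TO
 * with a single RENAME_TO token.  Everything here is made up; it is not
 * the PostgreSQL implementation.
 */
#include <stdio.h>

enum token { END = 0, RENAME, TO, IDENT, RENAME_TO };

/* A canned token stream standing in for the real lexer. */
static const enum token stream[] = {RENAME, TO, IDENT, END};

static enum token
raw_lex(void)
{
	static int	pos = 0;

	return stream[pos++];
}

/* The filter the grammar would actually call. */
static enum token
filtered_lex(void)
{
	static enum token lookahead = END;
	static int	have_lookahead = 0;
	enum token	cur;

	if (have_lookahead)
	{
		have_lookahead = 0;
		return lookahead;
	}

	cur = raw_lex();
	if (cur == RENAME)
	{
		lookahead = raw_lex();
		if (lookahead == TO)
			return RENAME_TO;	/* merge the pair into one token */
		have_lookahead = 1;		/* otherwise hand the peeked token back later */
	}
	return cur;
}

int
main(void)
{
	enum token	t;

	while ((t = filtered_lex()) != END)
		printf("token %d\n", (int) t);
	return 0;
}
```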
Re: remove volatile qualifiers from pg_stat_statements
On Tue, Aug 06, 2024 at 04:04:01PM +0900, Michael Paquier wrote: > On Wed, Jul 31, 2024 at 07:01:38AM +, Bertrand Drouvot wrote: >> I share the same understanding and I think those can be removed. >> >> The patch LGTM. > > That sounds about right. All the volatile references we have here > have been kept under the assumption that a memory barrier is required. > As we hold spin locks in these areas, that should not be necessary > anyway. So LGTM as well. Committed, thanks. -- nathan
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Thu, 1 Aug 2024 23:41:18 +0500 Kirill Reshke wrote: > On Thu, 1 Aug 2024 at 23:27, Jeff Davis wrote: > > Relatedly, if we can EXPLAIN a CREATE MATERIALIZED VIEW, perhaps we > > should be able to EXPLAIN a REFRESH MATERIALIZED VIEW, too? > Sure REFRESH MATERIALIZED VIEW consists not only of the view query execution in refresh_matview_datafill but also of refresh_by_heap_swap or refresh_by_match_merge. The former doesn't execute any query, so it would not be a problem, but the latter executes additional queries including SELECT, some DDL, DELETE, and INSERT. If we made EXPLAIN support the REFRESH MATERIALIZED VIEW CONCURRENTLY command, how should we handle these additional queries? Regards, Yugo Nagata -- Yugo Nagata
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Tue, 6 Aug 2024 at 21:06, Yugo Nagata wrote: > > On Thu, 1 Aug 2024 23:41:18 +0500 > Kirill Reshke wrote: > > > On Thu, 1 Aug 2024 at 23:27, Jeff Davis wrote: > > > Relatedly, if we can EXPLAIN a CREATE MATERIALIZED VIEW, perhaps we > > > should be able to EXPLAIN a REFRESH MATERIALIZED VIEW, too? > > Sure > > REFRESH MATERIALIZED VIEW consists not only of the view query > execution in refresh_matview_datafill but also of refresh_by_heap_swap > or refresh_by_match_merge. The former doesn't execute any query, so > it would not be a problem, but the latter executes additional queries > including SELECT, some DDL, DELETE, and INSERT. > > If we made EXPLAIN support the REFRESH MATERIALIZED VIEW CONCURRENTLY > command, how should we handle these additional queries? > > Regards, > Yugo Nagata > > -- > Yugo Nagata Hmm, is it a big issue? Maybe we can just add them in the proper places of the output, the same way we do it with triggers? -- Best regards, Kirill Reshke
Re: [PATCH] Add crc32(text) & crc32(bytea)
On Tue, Aug 06, 2024 at 11:04:41AM +0300, Aleksander Alekseev wrote: > Perhaps we need get_int4() / get_int8() / get_numeric() as there seems > to be a demand [1][2] and it will allow us to easily cast a `bytea` > value to `integer` or `bigint`. This is probably another topic though. Yeah, I was surprised to learn there wasn't yet an easy way to do this. I'm not sure how much of a factor this should play in deciding the return value for the CRC functions, but IMHO it's a reason to reconsider returning text as you originally proposed. -- nathan
Re: pg_verifybackup: TAR format backup verification
On Thu, Aug 1, 2024 at 9:19 AM Amul Sul wrote: > > I think I would have made this pass context->show_progress to > > progress_report() instead of the whole verifier_context, but that's an > > arguable stylistic choice, so I'll defer to you if you prefer it the > > way you have it. Other than that, this LGTM. > > Additionally, I moved total_size and done_size to verifier_context > because done_size needs to be accessed in astreamer_verify.c. > With this change, verifier_context is now more suitable. But it seems like 0006 now changes the logic for computing total_size. Prepatch, the condition is: - if (context->show_progress && !context->skip_checksums && - should_verify_checksum(m)) - context->total_size += m->size; where should_verify_checksum(m) checks (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE)). But post-patch the condition is: + if (!context.skip_checksums) ... + if (!should_ignore_relpath(context, m->pathname)) + total_size += m->size; The old logic was reached from verify_backup_directory() which does check should_ignore_relpath(), so the new condition hasn't added anything. But it seems to have lost the show_progress condition, and the m->checksum_type != CHECKSUM_TYPE_NONE condition. I think this means that we'll sum the sizes even when not displaying progress, and that if some of the files in the manifest had no checksums, our progress reporting would compute wrong percentages after the patch. > Understood. At the start of working on the v3 review, I thought of > completely discarding the 0007 patch and copying most of > verify_file_checksum() to a new function in astreamer_verify.c. > However, I later realized we could deduplicate some parts, so I split > verify_file_checksum() and moved the reusable part to a separate > function. Please have a look at v4-0007. Yeah, that seems OK. The fact that these patches don't have commit messages is making life more difficult for me than it needs to be. In particular, I'm looking at 0009 and there's no hint about why you want to do this. In fact that's the case for all of these refactoring patches. Instead of saying something like "tar format verification will want to verify the control file, but will not be able to read the file directly from disk, so separate the logic that reads the control file from the logic that verifies it" you just say what code you moved. Then I have to guess why you moved it, or flip back and forth between the refactoring patch and 0011 to try to figure it out. It would be nice if each of these refactoring patches contained a clear indication about the purpose of the refactoring in the commit message. > I had the same thought about checking for NULL inside > should_verify_control_data(), but I wanted to maintain the structure > similar to should_verify_checksum(). Making this change would have > also required altering should_verify_checksum(), I wasn’t sure if I > should make that change before. Now, I did that in the attached > version -- 0008 patch. I believe there is no reason for this change to be part of 0008 at all, and that this should be part of whatever later patch needs it. > > Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also. > > Done. OK, the formatting of 0011 looks much better now. It seems to me that 0011 is arranging to palloc the checksum context for every file and then pfree it at the end. 
It seems like it would be considerably more efficient if astreamer_verify contained a pg_checksum_context instead of a pointer to a pg_checksum_context. If you need a flag to indicate whether we've reinitialized the checksum for the current file, it's better to add that than to have all of these unnecessary allocate/free cycles. Existing astreamer code uses struct member names_like_this. For the new one, you mostly used namesLikeThis except when you used names_like_this or namesLkThs. It seems to me that instead of adding a global variable verify_backup_file_cb, it would be better to move the 'format' variable into verifier_context. Then you can do something like if (context->format == 'p') verify_plain_backup_file() else verify_tar_backup_file(). It's pretty common for .tar.gz to be abbreviated to .tgz. I think we should support that. Let's suppose that I have a backup which, for some reason, does not use the same compression for all files (base.tar, 16384.tgz, 16385.tar.gz, 16366.tar.lz4). With this patch, that will fail. Now, that's not really a problem, because having a backup with mixed compression algorithms like that is strange and you probably wouldn't try to do it. But on the other hand, it looks to me like making the code support that would be more elegant than what you have now. Because, right now, you have code to detect what type of backup you've got by looking at base.WHATEVER_EXTENSION ... but then you have to also have code that complains if some later file d
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Thu, 01 Aug 2024 13:34:51 -0700 Jeff Davis wrote: > On Fri, 2024-08-02 at 00:13 +0500, Kirill Reshke wrote: > > On Thu, 1 Aug 2024 at 23:27, Jeff Davis wrote: > > > > > > EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through > > > ExecCreateTableAs(), but does use CreateIntoRelDestReceiver(). > > > > EXPLAIN ANALYZE and regular query goes through create_ctas_internal > > (WITH NO DATA case too). Maybe we can simply move > > SetUserIdAndSecContext call in this function? > > We need to set up the SECURITY_RESTRICTED_OPERATION before planning, in > case the planner executes some functions. > > I believe we need to do some more refactoring to make this work. In > version 17, I already refactored so that CREATE MATERIALIZED VIEW and > REFRESH MATERIALIZED VIEW share the code. We can do something similar > to extend that to EXPLAIN ... CREATE MATERIALIZED VIEW. I think the most simple way to fix this is to set up the userid and the the SECURITY_RESTRICTED_OPERATION, and call RestrictSearchPath() before pg_plan_query in ExplainOneQuery(), as in the attached patch. However, this is similar to the old code of ExecCreateTableAs() before refactored. If we want to reuse REFRESH even in EXPLAIN as similar as the current CREATE MATERIALIZED code, I think we would have to refactor RefereshMatViewByOid to receive instrument_option and eflags as arguments and call it in ExplainOnePlan, for example. Do you think that is more preferred than setting up SECURITY_RESTRICTED_OPERATION in ExplainOneQuery? > As for the August release, the code freeze is on Saturday. Perhaps it > can be done by then, but is there a reason we should rush it? This > affects all supported versions, so we've lived with it for a while, and > I don't see a security problem here. I wouldn't expect it to be a > common use case, either. I agree that we don't have to rush it since it is not a security bug but just it could make a materialized view that cannot be refreshed. Regards, Yugo Nagata -- Yugo Nagata diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 5771aabf40..f8beb6f6a6 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -22,6 +22,7 @@ #include "jit/jit.h" #include "libpq/pqformat.h" #include "libpq/protocol.h" +#include "miscadmin.h" #include "nodes/extensible.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -467,6 +468,10 @@ standard_ExplainOneQuery(Query *query, int cursorOptions, MemoryContext planner_ctx = NULL; MemoryContext saved_ctx = NULL; + Oid save_userid = InvalidOid; + int save_sec_context = 0; + int save_nestlevel = 0; + if (es->memory) { /* @@ -487,6 +492,20 @@ standard_ExplainOneQuery(Query *query, int cursorOptions, bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + /* + * For CREATE MATERIALIZED VIEW command, switch to the owner's userid, so + * that any functions are run as that user. Also lock down security-restricted + * operations and arrange to make GUC variable changes local to this command. + */ + if (into && into->viewQuery) + { + GetUserIdAndSecContext(&save_userid, &save_sec_context); + SetUserIdAndSecContext(save_userid, + save_sec_context | SECURITY_RESTRICTED_OPERATION); + save_nestlevel = NewGUCNestLevel(); + RestrictSearchPath(); + } + /* plan the query */ plan = pg_plan_query(query, queryString, cursorOptions, params); @@ -510,6 +529,16 @@ standard_ExplainOneQuery(Query *query, int cursorOptions, ExplainOnePlan(plan, into, es, queryString, params, queryEnv, &planduration, (es->buffers ? 
&bufusage : NULL), es->memory ? &mem_counters : NULL); + + /* CREATE MATERIALIZED VIEW command */ + if (into && into->viewQuery) + { + /* Roll back any GUC changes */ + AtEOXact_GUC(false, save_nestlevel); + + /* Restore userid and security context */ + SetUserIdAndSecContext(save_userid, save_sec_context); + } } /*
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Tue, 6 Aug 2024 at 22:13, Yugo Nagata wrote: > > On Thu, 01 Aug 2024 13:34:51 -0700 > Jeff Davis wrote: > > > On Fri, 2024-08-02 at 00:13 +0500, Kirill Reshke wrote: > > > On Thu, 1 Aug 2024 at 23:27, Jeff Davis wrote: > > > > > > > > EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through > > > > ExecCreateTableAs(), but does use CreateIntoRelDestReceiver(). > > > > > > EXPLAIN ANALYZE and regular query goes through create_ctas_internal > > > (WITH NO DATA case too). Maybe we can simply move > > > SetUserIdAndSecContext call in this function? > > > > We need to set up the SECURITY_RESTRICTED_OPERATION before planning, in > > case the planner executes some functions. > > > > I believe we need to do some more refactoring to make this work. In > > version 17, I already refactored so that CREATE MATERIALIZED VIEW and > > REFRESH MATERIALIZED VIEW share the code. We can do something similar > > to extend that to EXPLAIN ... CREATE MATERIALIZED VIEW. > > I think the most simple way to fix this is to set up the userid and the > the SECURITY_RESTRICTED_OPERATION, and call RestrictSearchPath() > before pg_plan_query in ExplainOneQuery(), as in the attached patch. > > However, this is similar to the old code of ExecCreateTableAs() before > refactored. If we want to reuse REFRESH even in EXPLAIN as similar as the > current CREATE MATERIALIZED code, I think we would have to refactor > RefereshMatViewByOid to receive instrument_option and eflags as arguments > and call it in ExplainOnePlan, for example. > > Do you think that is more preferred than setting up > SECURITY_RESTRICTED_OPERATION in ExplainOneQuery? > > > As for the August release, the code freeze is on Saturday. Perhaps it > > can be done by then, but is there a reason we should rush it? This > > affects all supported versions, so we've lived with it for a while, and > > I don't see a security problem here. I wouldn't expect it to be a > > common use case, either. > > I agree that we don't have to rush it since it is not a security bug > but just it could make a materialized view that cannot be refreshed. > > Regards, > Yugo Nagata > > -- > Yugo Nagata > + /* > + * For CREATE MATERIALIZED VIEW command, switch to the owner's userid, so > + * that any functions are run as that user. Also lock down > security-restricted > + * operations and arrange to make GUC variable changes local to this command. > + */ > + if (into && into->viewQuery) > + { > + GetUserIdAndSecContext(&save_userid, &save_sec_context); > + SetUserIdAndSecContext(save_userid, > + save_sec_context | SECURITY_RESTRICTED_OPERATION); > + save_nestlevel = NewGUCNestLevel(); > + RestrictSearchPath(); > + } > + In general, doing this ad-hoc coding for MV inside standard_ExplainOneQuery function looks just out of place for me. standard_ExplainOneQuery is on another level of abstraction. -- Best regards, Kirill Reshke
Re: Restart pg_usleep when interrupted
v5-0001-vaccum_delay-with-absolute-time-nanosleep.patch Description: Binary data v5 takes care of the latest comments by Bertrand. Regards, Sami
Re: Vectored IO in XLogWrite()
On Tue, Aug 6, 2024 at 5:36 AM Melih Mutlu wrote: > I think that we don't have the "contiguous pages" constraint when writing > anymore as we can do vectored IO. It seems unnecessary to write just because > XLogWrite() is at the end of wal buffers. > Attached patch uses pg_pwritev() instead of pg_pwrite() and tries to write > pages in one call even if they're not contiguous in memory, until it reaches > the page at startidx. Here are a few notes on this patch: - It's not pgindent-clean. In fact, it doesn't even pass git diff --check. - You added a new comment (/* Reaching the buffer... */) in the middle of a chunk of lines that were already covered by an existing comment (/* Dump the set ... */). This makes it look like the /* Dump the set... */ comment only covers the 3 lines of code that immediately follow it rather than everything in the "if" statement. You could fix this in a variety of ways, but in this case the easiest solution, to me, looks like just skipping the new comment. It seems like the point is pretty self-explanatory. - The patch removes the initialization of "from" but not the variable itself. You still increment the variable you haven't initialized. - I don't think the logic is correct after a partial write. Pre-patch, "from" advances while "nleft" goes down, but post-patch, what gets written is dependent on the contents of "iov" which is initialized outside the loop and never updated. Perhaps compute_remaining_iovec would be useful here? - I assume this is probably a good idea in principle, because fewer system calls are presumably better than more. The impact will probably be very small, though. -- Robert Haas EDB: http://www.enterprisedb.com
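For reference, the usual shape of a partial-write loop built on compute_remaining_iovec() looks roughly like this (an untested sketch modeled on pg_pwritev_with_retry() in src/common/file_utils.c; the fd/offset handling is simplified):
```
/*
 * Untested sketch of resuming after a partial write with
 * compute_remaining_iovec(), along the lines of pg_pwritev_with_retry().
 * The fd/offset variables are placeholders.
 */
#include "postgres.h"
#include "port/pg_iovec.h"
#include "common/file_utils.h"

static void
write_all(int fd, const struct iovec *source, int iovcnt, off_t offset)
{
	struct iovec iov_copy[PG_IOV_MAX];
	int			remaining;

	/* Copy the caller's iovec array so we can trim it as we go. */
	remaining = compute_remaining_iovec(iov_copy, source, iovcnt, 0);

	while (remaining > 0)
	{
		ssize_t		written = pg_pwritev(fd, iov_copy, remaining, offset);

		if (written < 0)
			elog(ERROR, "could not write: %m");

		offset += written;
		/* Drop fully written iovecs and trim the partially written one. */
		remaining = compute_remaining_iovec(iov_copy, iov_copy,
											remaining, written);
	}
}
```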
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Wed, 7 Aug 2024 02:13:04 +0900 Yugo Nagata wrote: > On Thu, 01 Aug 2024 13:34:51 -0700 > Jeff Davis wrote: > > > On Fri, 2024-08-02 at 00:13 +0500, Kirill Reshke wrote: > > > On Thu, 1 Aug 2024 at 23:27, Jeff Davis wrote: > > > > > > > > EXPLAIN ANALYZE CREATE MATERIALIZED VIEW doesn't go through > > > > ExecCreateTableAs(), but does use CreateIntoRelDestReceiver(). > > > > > > EXPLAIN ANALYZE and regular query goes through create_ctas_internal > > > (WITH NO DATA case too). Maybe we can simply move > > > SetUserIdAndSecContext call in this function? > > > > We need to set up the SECURITY_RESTRICTED_OPERATION before planning, in > > case the planner executes some functions. > > > > I believe we need to do some more refactoring to make this work. In > > version 17, I already refactored so that CREATE MATERIALIZED VIEW and > > REFRESH MATERIALIZED VIEW share the code. We can do something similar > > to extend that to EXPLAIN ... CREATE MATERIALIZED VIEW. > > I think the most simple way to fix this is to set up the userid and the > the SECURITY_RESTRICTED_OPERATION, and call RestrictSearchPath() > before pg_plan_query in ExplainOneQuery(), as in the attached patch. I'm sorry. After sending the mail, I found the patch didn't work. If we call RestrictSearchPath() before creating a relation, it tries to create the relation in pg_catalog and it causes an error. Perhaps, we would need to create the rel before RestrictSearchPath() by calling create_ctas_nodata or something like this... > > However, this is similar to the old code of ExecCreateTableAs() before > refactored. If we want to reuse REFRESH even in EXPLAIN as similar as the > current CREATE MATERIALIZED code, I think we would have to refactor > RefereshMatViewByOid to receive instrument_option and eflags as arguments > and call it in ExplainOnePlan, for example. > > Do you think that is more preferred than setting up > SECURITY_RESTRICTED_OPERATION in ExplainOneQuery? > > > As for the August release, the code freeze is on Saturday. Perhaps it > > can be done by then, but is there a reason we should rush it? This > > affects all supported versions, so we've lived with it for a while, and > > I don't see a security problem here. I wouldn't expect it to be a > > common use case, either. > > I agree that we don't have to rush it since it is not a security bug > but just it could make a materialized view that cannot be refreshed. > > Regards, > Yugo Nagata > > -- > Yugo Nagata -- Yugo Nagata
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Wed, 2024-08-07 at 03:06 +0900, Yugo Nagata wrote: > I'm sorry. After sending the mail, I found the patch didn't work. > If we call RestrictSearchPath() before creating a relation, it tries > to create the relation in pg_catalog and it causes an error. Yeah, that's exactly the problem I ran into in ExecCreateTableAs(), which was the reason I refactored it to use RefreshMatViewByOid(). > Perhaps, we would need to create the rel before RestrictSearchPath() > by calling > create_ctas_nodata or something like this... I haven't looked at the details, but that makes sense: break it into two parts so that the create (without data) happens first without doing the work of EXPLAIN, and the second part refreshes using the explain logic. That means RefreshMatViewByOid() would need to work with EXPLAIN. As you point out in the other email, it's not easy to make that all work with REFRESH ... CONCURRENTLY, but perhaps it could work with CREATE MATERIALIZED VIEW and REFRESH (without CONCURRENTLY). Regards, Jeff Davis
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
Jeff Davis writes: > As you point out in the other email, it's not easy to make that all > work with REFRESH ... CONCURRENTLY, but perhaps it could work with > CREATE MATERIALIZED VIEW and REFRESH (without CONCURRENTLY). I'm not really sure I see the point of this, if it doesn't "just work" with all variants of C.M.V. It's not like you can't easily EXPLAIN the view's SELECT. If REFRESH M. V. does something different than CREATE, there would certainly be value in being able to EXPLAIN what that does --- but that still isn't an argument for allowing EXPLAIN CREATE MATERIALIZED VIEW. regards, tom lane
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Tue, 06 Aug 2024 11:24:03 -0700 Jeff Davis wrote: > On Wed, 2024-08-07 at 03:06 +0900, Yugo Nagata wrote: > > I'm sorry. After sending the mail, I found the patch didn't work. > > If we call RestrictSearchPath() before creating a relation, it tries > > to create the relation in pg_catalog and it causes an error. > > Yeah, that's exactly the problem I ran into in ExecCreateTableAs(), > which was the reason I refactored it to use RefreshMatViewByOid(). I came up with an idea that we can rewrite into->rel to add the current schemaname before calling RestrictSearchPath() as in the attached patch. It seems a simpler fix at least, but I am not sure whether this design is better than using RefreshMatViewByOid from EXPLAIN considering to support EXPLAIN REFRESH ... Regards, Yugo Nagata -- Yugo NAGATA diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 5771aabf40..1d06a7e96e 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -15,6 +15,7 @@ #include "access/xact.h" #include "catalog/pg_type.h" +#include "catalog/namespace.h" #include "commands/createas.h" #include "commands/defrem.h" #include "commands/prepare.h" @@ -22,6 +23,7 @@ #include "jit/jit.h" #include "libpq/pqformat.h" #include "libpq/protocol.h" +#include "miscadmin.h" #include "nodes/extensible.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -467,6 +469,10 @@ standard_ExplainOneQuery(Query *query, int cursorOptions, MemoryContext planner_ctx = NULL; MemoryContext saved_ctx = NULL; + Oid save_userid = InvalidOid; + int save_sec_context = 0; + int save_nestlevel = 0; + if (es->memory) { /* @@ -487,6 +493,24 @@ standard_ExplainOneQuery(Query *query, int cursorOptions, bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + /* + * For CREATE MATERIALIZED VIEW command, switch to the owner's userid, so + * that any functions are run as that user. Also lock down security-restricted + * operations and arrange to make GUC variable changes local to this command. + */ + if (into && into->viewQuery) + { + Oid nspid = RangeVarGetCreationNamespace(into->rel); + + into->rel = makeRangeVar(get_namespace_name(nspid), into->rel->relname, -1); + + GetUserIdAndSecContext(&save_userid, &save_sec_context); + SetUserIdAndSecContext(save_userid, + save_sec_context | SECURITY_RESTRICTED_OPERATION); + save_nestlevel = NewGUCNestLevel(); + RestrictSearchPath(); + } + /* plan the query */ plan = pg_plan_query(query, queryString, cursorOptions, params); @@ -510,6 +534,16 @@ standard_ExplainOneQuery(Query *query, int cursorOptions, ExplainOnePlan(plan, into, es, queryString, params, queryEnv, &planduration, (es->buffers ? &bufusage : NULL), es->memory ? &mem_counters : NULL); + + /* CREATE MATERIALIZED VIEW command */ + if (into && into->viewQuery) + { + /* Roll back any GUC changes */ + AtEOXact_GUC(false, save_nestlevel); + + /* Restore userid and security context */ + SetUserIdAndSecContext(save_userid, save_sec_context); + } } /*
Re: CI and test improvements
On Tue, Jun 18, 2024 at 02:25:46PM -0700, Andres Freund wrote: > > > If not, we can just install the binaries distributed by the project, it's > > > just more work to keep up2date that way. > > > > I guess you mean a separate line to copy choco's ccache.exe somewhere. > > I was thinking of alternatively just using the binaries upstream > distributes. But the mingw way seems easier. > > Perhaps we should add an environment variable pointing to ccache to the image, > or such? I guess you mean changing the OS image so there's a $CCACHE. That sounds fine... > > +CCACHE_MAXSIZE: "500M" > > Does it have to be this big to work? It's using 150MB for an initial compilation, and maxsize will need to be 20% larger for it to not evict its cache before it can be used. The other ccaches (except for mingw) seem to be several times larger than what's needed for a single compilation, which makes sense to cache across multiple branches (or even commits in a single branch), and for cfbot. > A comment explaining why /Z7 is necessary would be good. Sure > >build_script: | > > vcvarsall x64 > > -ninja -C build > > +ninja -C build |tee build.txt > > +REM Since pipes lose the exit status of the preceding command, rerun > > the compilation > > +REM without the pipe, exiting now if it fails, to avoid trying to run > > checks > > +ninja -C build > nul > > Perhaps we could do something like > (ninja -C build && touch success) | tee build.txt > cat success > ? I don't know -- a pipe alone seems more direct than a subshell+conditional+pipe written in cmd.exe, whose syntax is not well known here. Maybe you're suggesting to write sh -c "(ninja -C build && touch success) | tee build.txt ; cat ./success" But that's another layer of complexity .. for what ? > > Subject: [PATCH 4/7] WIP: cirrus: upload changed html docs as artifacts > > I still think that this runs the risk of increasing space utilization and thus > increase frequency of caches/artifacts being purged. Maybe it should run on the local macs where I think you can control that. > > + # The commit that this branch is rebased on. There's no easy way to > > find this. > > I don't think that's true, IIRC I've complained about it before. We can do > something like: cfbot now exposes it, so it'd be trivial for that case (although there was no response here to my inquiries about that). I'll revisit this in the future, once other patches have progressed. > major_num=$(grep PG_MAJORVERSION_NUM build/src/include/pg_config.h|awk > '{print $3}'); > echo major: $major_num; > if git rev-parse --quiet --verify origin/REL_${major_num}_STABLE > /dev/null > ; then > basebranch=origin/REL_${major_num}_STABLE; > else > basebranch=origin/master; > fi; > echo base branch: $basebranch > base_commit=$(git merge-base HEAD $basebranch) >From 0cf4fa490b19aea8e536506e975f1e9b61be6574 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 25 May 2022 21:53:22 -0500 Subject: [PATCH 1/6] cirrus/windows: add compiler_warnings_script The goal is to fail due to warnings only after running tests. (At least historically, it's too slow to run a separate windows VM to compile with -Werror.) https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de I'm not sure how to write this test in windows shell; it's also easy to write something that doesn't work in posix sh, since windows shell is interpretting && and ||... 
See also: 8a1ce5e54f6d144e4f8e19af7c767b026ee0c956 https://cirrus-ci.com/task/6241060062494720 https://cirrus-ci.com/task/6496366607204352 20221104161934.gb16...@telsasoft.com 20221031223744.gt16...@telsasoft.com 20221104161934.gb16...@telsasoft.com 20221123054329.gg11...@telsasoft.com 20230224002029.gq1...@telsasoft.com ci-os-only: windows --- .cirrus.tasks.yml | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index 1ce6c443a8c..5dd37e07aae 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -576,12 +576,21 @@ task: build_script: | vcvarsall x64 -ninja -C build +ninja -C build |tee build.txt +REM Since pipes lose the exit status of the preceding command, rerun the compilation +REM without the pipe, exiting now if it fails, to avoid trying to run checks +ninja -C build > nul check_world_script: | vcvarsall x64 meson test %MTEST_ARGS% --num-processes %TEST_JOBS% + # This should be last, so check_world is run even if there are warnings + always: +compiler_warnings_script: + # this avoids using metachars which would be interpretted by the windows shell + - sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0' + on_failure: <<: *on_failure_meson crashlog_artifacts: -- 2.42.0 >From 764987ff4093783e967c13f55dee60fb6d89b4a0 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 25 May 2022 22:05:13 -0500 Subject: [PATCH 2/6
Re: Fix comments in instr_time.h and remove an unneeded cast to int64
On 06/08/2024 18:16, Tom Lane wrote: Heikki Linnakangas writes: Hmm, ok I see. Then I propose: 1. Revert 2. Just fix the comment to say int64 instead of uint64. Yeah, it's probably reasonable to specify the output as int64 not uint64 (especially since it looks like that's what the macros actually produce). Committed -- Heikki Linnakangas Neon (https://neon.tech)
Re: optimizing pg_upgrade's once-in-each-database steps
On Sun, Aug 04, 2024 at 07:19:57PM +0100, Ilya Gladyshev wrote: > -- End of review -- Thanks for the review. I've attempted to address all your feedback in v8 of the patch set. I think the names could still use some work, but I wanted to get the main structure in place before trying to fix them. > Regarding keeping the connections, the way I envisioned it entailed passing > a list of connections from one check to the next one (or keeping a global > state with connections?). I didn't concretely look at the code to verify > this, so it's just an abstract idea. My main concern with doing something like that is it could require maintaining a huge number of connections when there are many databases. GUCs like max_connections would need to be set accordingly. I'm a little dubious that the gains would be enough to justify it. -- nathan >From 2536a3aac4cabe70c59feb8134d88ee54214e3a2 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jun 2024 11:02:44 -0500 Subject: [PATCH v8 01/11] introduce framework for parallelizing pg_upgrade tasks --- src/bin/pg_upgrade/Makefile | 1 + src/bin/pg_upgrade/async.c | 466 +++ src/bin/pg_upgrade/meson.build | 1 + src/bin/pg_upgrade/pg_upgrade.h | 14 + src/tools/pgindent/typedefs.list | 4 + 5 files changed, 486 insertions(+) create mode 100644 src/bin/pg_upgrade/async.c diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile index bde91e2beb..3bc4f5d740 100644 --- a/src/bin/pg_upgrade/Makefile +++ b/src/bin/pg_upgrade/Makefile @@ -12,6 +12,7 @@ include $(top_builddir)/src/Makefile.global OBJS = \ $(WIN32RES) \ + async.o \ check.o \ controldata.o \ dump.o \ diff --git a/src/bin/pg_upgrade/async.c b/src/bin/pg_upgrade/async.c new file mode 100644 index 00..e279fdd676 --- /dev/null +++ b/src/bin/pg_upgrade/async.c @@ -0,0 +1,466 @@ +/* + * async.c + * parallelization via libpq's async APIs + * + * This framework provides an efficient way of running the various + * once-in-each-database tasks required by pg_upgrade. Specifically, it + * parallelizes these tasks by managing a simple state machine of + * user_opts.jobs slots and using libpq's asynchronous APIs to establish the + * connections and run the queries. Callers simply need to create a callback + * function and build/execute an AsyncTask. A simple example follows: + * + * static void + * my_process_cb(DbInfo *dbinfo, PGresult *res, void *arg) + * { + * for (int i = 0; i < PQntuples(res); i++) + * { + * ... process results ... + * } + * } + * + * void + * my_task(ClusterInfo *cluster) + * { + * AsyncTask *task = async_task_create(); + * + * async_task_add_step(task, + * "... query text ...", + * my_process_cb, + * true, // let the task free the PGresult + * NULL); // "arg" pointer for the callbacks + * async_task_run(task, cluster); + * async_task_free(task); + * } + * + * Note that multiple steps can be added to a given task. When there are + * multiple steps, the task will run all of the steps consecutively in the same + * database connection before freeing the connection and moving on. In other + * words, it only ever initiates one connection to each database in the + * cluster for a given run. + * + * Copyright (c) 2024, PostgreSQL Global Development Group + * src/bin/pg_upgrade/async.c + */ + +#include "postgres_fe.h" + +#include "common/connect.h" +#include "fe_utils/string_utils.h" +#include "pg_upgrade.h" + +/* + * dbs_complete stores the number of databases that we have completed + * processing. 
When this value equals the number of databases in the cluster, + * the task is finished. + */ +static int dbs_complete; + +/* + * dbs_processing stores the index of the next database in the cluster's array + * of databases that will be picked up for processing. It will always be + * greater than or equal to dbs_complete. + */ +static int dbs_processing; + +/* + * This struct stores all the information for a single step of a task. All + * steps in a task are run in a single connection before moving on to the next + * database (which requires a new connection). + */ +typedef struct AsyncTaskCallbacks +{ + AsyncTaskProcessCB process_cb; /* processes the results of the query */ + const char *query; /* query text */ + boolfree_result;/* should we free the result? */ + void *arg;
Re: Inval reliability, especially for inplace updates
On Tue, Jun 18, 2024 at 08:23:49AM -0700, Noah Misch wrote: > On Mon, Jun 17, 2024 at 06:57:30PM -0700, Andres Freund wrote: > > On 2024-06-17 16:58:54 -0700, Noah Misch wrote: > > > That inplace150 patch turned out to be unnecessary. Contrary to the > > > "noncritical resource releasing" comment some lines above > > > AtEOXact_Inval(true), the actual behavior is already to promote ERROR to > > > PANIC. An ERROR just before or after sending invals becomes PANIC, > > > "cannot > > > abort transaction %u, it was already committed". > > > > Relying on that, instead of explicit critical sections, seems fragile to me. > > IIRC some of the behaviour around errors around transaction commit/abort has > > changed a bunch of times. Tying correctness into something that could be > > changed for unrelated reasons doesn't seem great. > > Fair enough. It could still be a good idea for master, but given I missed a > bug in inplace150-inval-durability-atcommit-v1.patch far worse than the ones > $SUBJECT fixes, let's not risk it in back branches. What are your thoughts on whether a change to explicit critical sections should be master-only vs. back-patched? I have a feeling your comment pointed to something I'm still missing, but I don't know where to look next.
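For anyone skimming the thread, the "explicit critical section" alternative being weighed here is just the ordinary pattern sketched below. This is an illustration rather than code from the inplace patches, and the inval call is a stand-in for the real call site: inside a critical section the elog machinery itself promotes any ERROR to PANIC, so correctness would no longer hinge on the "cannot abort transaction %u, it was already committed" behavior.

START_CRIT_SECTION();
/* deliver the invalidation messages; an ERROR anywhere in here now PANICs */
SendSharedInvalidMessages(msgs, nmsgs);
END_CRIT_SECTION();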
Unused expression indexes
In a blog post [1], Bruce Momjian notes that expression indexes can help with planning even if they're not used directly. But the examples cited in that post are vague (i.e., they improve stats, but it's not clear how they could change plans), and Bruce's answer to a comment [2] suggests that this is not documented. Is there any more info on this mechanism? Specifically, if one has unused expression indexes (according to pg_stat_user_indexes), is it safe to drop them? Or could they be providing statistics that materially affect query planning even though the indexes themselves are unused? It looks like a very similar question was asked earlier this year on pgsql-general [3], but got only a vague answer. Any background or tips for where to look in the source regarding this behavior would be greatly appreciated. Thanks, Maciek [1]: https://momjian.us/main/blogs/pgblog/2017.html#February_20_2017 [2]: https://momjian.us/main/comment_item.html?/main/blogs/pgblog.html/February_20_2017#comment-3174376969 [3]: https://www.postgresql.org/message-id/flat/CAHg%3DPn%3DOZu7A3p%2B0Z-CDG4s2CHYe3UFQCTZp4RWGCEn2gmD35A%40mail.gmail.com
Re: Minor refactorings to eliminate some static buffers
On 06/08/2024 18:52, Andres Freund wrote: On 2024-08-06 18:13:56 +0300, Heikki Linnakangas wrote: diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 702a6c3a0b..2732d6bfc9 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -1155,7 +1155,7 @@ ProcessUtilitySlow(ParseState *pstate, { CreateStmt *cstmt = (CreateStmt *) stmt; Datum toast_options; - static char *validnsps[] = HEAP_RELOPT_NAMESPACES; + const char *validnsps[] = HEAP_RELOPT_NAMESPACES; /* Remember transformed RangeVar for LIKE */ table_rv = cstmt->relation; In the other places you used "const char * const", here just "const char *" - it doesn't look like that's a required difference? Just an oversight. I went back and forth on whether to use plain "char *validnps[]", "const char *validnsps[]" or "const char *const validnsps[]". The first form doesn't convey the fact it's read-only, like the "static" used to. The second form hints that, but only for the strings, not for the pointers in the array. The last form is what we want, but it's a bit verbose and ugly. I chose the last form in the end, but missed this one. Fixed that and pushed. Thanks! -- Heikki Linnakangas Neon (https://neon.tech)
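To spell out the difference between the three forms with a throwaway example (not code from the patch, though HEAP_RELOPT_NAMESPACES does expand to a similar string-pointer array); only the last assignment is rejected by the compiler:

char	   *a[] = {"toast", NULL};			/* nothing is read-only */
const char *b[] = {"toast", NULL};			/* pointed-to strings are read-only */
const char *const c[] = {"toast", NULL};	/* strings and array slots are read-only */

a[0][0] = 'x';		/* compiles, but writing a string literal is undefined behavior */
b[0] = "heap";		/* compiles: the array elements themselves are not const */
c[0] = "heap";		/* compile-time error: assignment of read-only location */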
Re: Unused expression indexes
Maciek Sakrejda writes: > In a blog post [1], Bruce Momjian notes that expression indexes can > help with planning even if they're not used directly. But the examples > cited in that post are vague (i.e., they improve stats, but it's not > clear how they could change plans), and Bruce's answer to a comment > [2] suggests that this is not documented. > Is there any more info on this mechanism? Specifically, if one has > unused expression indexes (according to pg_stat_user_indexes), is it > safe to drop them? Or could they be providing statistics that > materially affect query planning even though the indexes themselves > are unused? Expression indexes definitely can affect planning, because ANALYZE collects stats on the values of those expressions. As a trivial example, regression=# create table foo (x1 float8); CREATE TABLE regression=# insert into foo select 10 * random() from generate_series(1,1); INSERT 0 1 regression=# analyze foo; ANALYZE regression=# explain analyze select * from foo where sqrt(x1) < 1; QUERY PLAN - Seq Scan on foo (cost=0.00..195.00 rows= width=8) (actual time=0.009..0.546 rows=1028 loops=1) Filter: (sqrt(x1) < '1'::double precision) Rows Removed by Filter: 8972 Planning Time: 0.065 ms Execution Time: 0.572 ms (5 rows) The planner has no info about the values of sqrt(x1), so you get a default estimate (one-third) of the selectivity of the WHERE clause. But watch this: regression=# create index on foo (sqrt(x1)); CREATE INDEX regression=# analyze foo; ANALYZE regression=# explain analyze select * from foo where sqrt(x1) < 1; QUERY PLAN Bitmap Heap Scan on foo (cost=24.24..84.63 rows=1026 width=8) (actual time=0.078..0.229 rows=1028 loops=1) Recheck Cond: (sqrt(x1) < '1'::double precision) Heap Blocks: exact=45 -> Bitmap Index Scan on foo_sqrt_idx (cost=0.00..23.98 rows=1026 width=0) (actual time=0.068..0.068 rows=1028 loops=1) Index Cond: (sqrt(x1) < '1'::double precision) Planning Time: 0.113 ms Execution Time: 0.259 ms (7 rows) Now there are stats about the values of sqrt(x1), allowing a far more accurate selectivity estimate to be made. In this particular example there's no change of plan (it would have used the index anyway), but certainly a different rowcount estimate can make a big difference. This mechanism is quite ancient, and in principle it's now superseded by extended statistics. For example, I can drop this index and instead do regression=# drop index foo_sqrt_idx; DROP INDEX regression=# create statistics foostats on sqrt(x1) from foo; CREATE STATISTICS regression=# analyze foo; ANALYZE regression=# explain analyze select * from foo where sqrt(x1) < 1; QUERY PLAN - Seq Scan on foo (cost=0.00..195.00 rows=1026 width=8) (actual time=0.006..0.479 rows=1028 loops=1) Filter: (sqrt(x1) < '1'::double precision) Rows Removed by Filter: 8972 Planning Time: 0.079 ms Execution Time: 0.503 ms (5 rows) So the accurate rowcount estimate is still obtained in this example; and we're not incurring any index maintenance costs, only ANALYZE costs that are going to be roughly the same either way. However, I am not certain that extended statistics are plugged into all the places where the older mechanism applies. Tomas Vondra might have a better idea than I of where gaps remain in that. regards, tom lane
Re: Minor refactorings to eliminate some static buffers
On 06.08.24 17:13, Heikki Linnakangas wrote: > --- a/src/backend/access/transam/xlogprefetcher.c > +++ b/src/backend/access/transam/xlogprefetcher.c > @@ -362,7 +362,7 @@ XLogPrefetcher * > XLogPrefetcherAllocate(XLogReaderState *reader) > { > XLogPrefetcher *prefetcher; > - static HASHCTL hash_table_ctl = { > + const HASHCTL hash_table_ctl = { Is there a reason this is not changed to static const HASHCTL ... ? Most other places were changed in that way.
Re: Unused expression indexes
On Tue, Aug 6, 2024 at 1:25 PM Tom Lane wrote: > The planner has no info about the values of sqrt(x1), so you get a > default estimate (one-third) of the selectivity of the WHERE clause. > But watch this: > > regression=# create index on foo (sqrt(x1)); > CREATE INDEX > regression=# analyze foo; > ANALYZE > regression=# explain analyze select * from foo where sqrt(x1) < 1; > QUERY PLAN > > Bitmap Heap Scan on foo (cost=24.24..84.63 rows=1026 width=8) (actual > time=0.078..0.229 rows=1028 loops=1) >Recheck Cond: (sqrt(x1) < '1'::double precision) >Heap Blocks: exact=45 >-> Bitmap Index Scan on foo_sqrt_idx (cost=0.00..23.98 rows=1026 > width=0) (actual time=0.068..0.068 rows=1028 loops=1) > Index Cond: (sqrt(x1) < '1'::double precision) > Planning Time: 0.113 ms > Execution Time: 0.259 ms > (7 rows) > > Now there are stats about the values of sqrt(x1), allowing a far more > accurate selectivity estimate to be made. In this particular example > there's no change of plan (it would have used the index anyway), but > certainly a different rowcount estimate can make a big difference. Thanks, but I was asking specifically about _unused_ indexes (according to pg_stat_user_indexes). Bruce's blog post showed how they can still influence rowcount estimates, but can they do that (1) even if they don't end up being used by the query plan and (2) in a way that leads to a different plan? Basically, if I have some unused expression indexes, is it safe to drop them, or could they be used for planning optimizations even if they are not used directly. Thanks, Maciek
Re: Unused expression indexes
Maciek Sakrejda writes: > Thanks, but I was asking specifically about _unused_ indexes > (according to pg_stat_user_indexes). Bruce's blog post showed how they > can still influence rowcount estimates, but can they do that (1) even > if they don't end up being used by the query plan and (2) in a way > that leads to a different plan? Certainly. This example was too simple to illustrate that point perhaps, but we would have arrived at the same rowcount estimate whether it then chose to use the index or not. (You could prove that with the same example by modifying the comparison constant to make the rowcount estimate high enough to discourage use of the index.) In turn, a different rowcount estimate might change the plan for subsequent joins or other processing. I didn't spend enough time on the example to show that, but it's surely not hard to show. regards, tom lane
Re: Unused expression indexes
On 8/6/24 22:25, Tom Lane wrote: > Maciek Sakrejda writes: >> In a blog post [1], Bruce Momjian notes that expression indexes can >> help with planning even if they're not used directly. But the examples >> cited in that post are vague (i.e., they improve stats, but it's not >> clear how they could change plans), and Bruce's answer to a comment >> [2] suggests that this is not documented. > >> Is there any more info on this mechanism? Specifically, if one has >> unused expression indexes (according to pg_stat_user_indexes), is it >> safe to drop them? Or could they be providing statistics that >> materially affect query planning even though the indexes themselves >> are unused? > > Expression indexes definitely can affect planning, because ANALYZE > collects stats on the values of those expressions. As a trivial > example, > > regression=# create table foo (x1 float8); > CREATE TABLE > regression=# insert into foo select 10 * random() from > generate_series(1,1); > INSERT 0 1 > regression=# analyze foo; > ANALYZE > regression=# explain analyze select * from foo where sqrt(x1) < 1; > QUERY PLAN > - > Seq Scan on foo (cost=0.00..195.00 rows= width=8) (actual > time=0.009..0.546 rows=1028 loops=1) >Filter: (sqrt(x1) < '1'::double precision) >Rows Removed by Filter: 8972 > Planning Time: 0.065 ms > Execution Time: 0.572 ms > (5 rows) > > The planner has no info about the values of sqrt(x1), so you get a > default estimate (one-third) of the selectivity of the WHERE clause. > But watch this: > > regression=# create index on foo (sqrt(x1)); > CREATE INDEX > regression=# analyze foo; > ANALYZE > regression=# explain analyze select * from foo where sqrt(x1) < 1; > QUERY PLAN > > > Bitmap Heap Scan on foo (cost=24.24..84.63 rows=1026 width=8) (actual > time=0.078..0.229 rows=1028 loops=1) >Recheck Cond: (sqrt(x1) < '1'::double precision) >Heap Blocks: exact=45 >-> Bitmap Index Scan on foo_sqrt_idx (cost=0.00..23.98 rows=1026 > width=0) (actual time=0.068..0.068 rows=1028 loops=1) > Index Cond: (sqrt(x1) < '1'::double precision) > Planning Time: 0.113 ms > Execution Time: 0.259 ms > (7 rows) > > Now there are stats about the values of sqrt(x1), allowing a far more > accurate selectivity estimate to be made. In this particular example > there's no change of plan (it would have used the index anyway), but > certainly a different rowcount estimate can make a big difference. > > This mechanism is quite ancient, and in principle it's now superseded > by extended statistics. For example, I can drop this index and > instead do > > regression=# drop index foo_sqrt_idx; > DROP INDEX > regression=# create statistics foostats on sqrt(x1) from foo; > CREATE STATISTICS > regression=# analyze foo; > ANALYZE > regression=# explain analyze select * from foo where sqrt(x1) < 1; > QUERY PLAN > > - > Seq Scan on foo (cost=0.00..195.00 rows=1026 width=8) (actual > time=0.006..0.479 rows=1028 loops=1) >Filter: (sqrt(x1) < '1'::double precision) >Rows Removed by Filter: 8972 > Planning Time: 0.079 ms > Execution Time: 0.503 ms > (5 rows) > > So the accurate rowcount estimate is still obtained in this example; > and we're not incurring any index maintenance costs, only ANALYZE > costs that are going to be roughly the same either way. > > However, I am not certain that extended statistics are plugged into > all the places where the older mechanism applies. Tomas Vondra might > have a better idea than I of where gaps remain in that. > AFAIK it handles all / exactly the same cases. 
The magic happens in examine_variable() in selfuncs.c, where we look for stats for simple Vars, and then for stats for expressions. First we look at all indexes, and then (if there's no suitable index) at all extended stats. There might be a place doing something ad hoc, but I can't think of any. regards -- Tomas Vondra
Re: Inconsistency with EXPLAIN ANALYZE CREATE MATERIALIZED VIEW
On Tue, 2024-08-06 at 14:36 -0400, Tom Lane wrote: > I'm not really sure I see the point of this, if it doesn't "just > work" > with all variants of C.M.V. It's not like you can't easily EXPLAIN > the view's SELECT. > > If REFRESH M. V. does something different than CREATE, there would > certainly be value in being able to EXPLAIN what that does --- but > that still isn't an argument for allowing EXPLAIN CREATE MATERIALIZED > VIEW. We already allow EXPLAIN ANALYZE CREATE MATERIALIZED VIEW in all supported versions. That seems strange and it surprised me, but the parser structure is shared between SELECT ... INTO and CREATE MATERIALIZED VIEW, so I suppose it was supported out of convenience. The problem is that the implementation is split between the EXPLAIN ANALYZE path and the non-EXPLAIN path. The long-ago commit f3ab5d4696 missed the EXPLAIN path. NB: I do not believe this is a security concern, but it does create the inconsistency described in the email starting this thread. Options: 1. Do nothing on the grounds that EXPLAIN ANALYZE CREATE MATERIALIZED VIEW is not common enough to worry about, and the consequences of the inconsistency are not bad enough. 2. Refactor some more to make EXPLAIN ANALYZE CREATE MATERIALIZED VIEW share the query part of the code path with REFRESH so that it benefits from the SECURITY_RESTRICTED_OPERATION and RestrictSearchPath(). 3. Do #2 but also make it work for REFRESH, but not CONCURRENTLY. 4. Do #3 but also make it work for REFRESH ... CONCURRENTLY and provide new information that's not available by only explaining the query. And also figure out if any of this should be back-patched. Regards, Jeff Davis
Re: tiny step toward threading: reduce dependence on setlocale()
On Tue, 2024-07-30 at 12:13 -0700, Jeff Davis wrote: > I found a couple issues with the later patches: > > * There was still some confusion about the default collation vs. > datcollate/datctype for callers of wchar2char() and char2wchar() > (those > functions only work for libc). I introduced a new pg_locale_t > structure > to represent datcollate/datctype regardless of datlocprovider to > solve > this. I didn't quite like the API I introduced in 0001, so I skipped 0001. For 0002 I left char2wchar() and wchar2char() as-is, where they expect libc and accept a NULL pg_locale_t. I committed the rest of 0002. > * Another loose end relying on setlocale(): in selfuncs.c, there's > still a call directly to strxfrm(), which depends on setlocale(). I > changed this to lookup the collation and then use pg_strxfrm(). That > should improve histogram selectivity estimates because it uses the > correct provider, rather than relying on setlocale(), right? Committed 0003. With these changes, collations are no longer dependent on the environment locale (setlocale()) at all for either collation behavior (ORDER BY) or ctype behavior (LOWER(), etc.). Additionally, unless I missed something, nothing in the server is dependent on LC_COLLATE at all. There are still some things that depend on setlocale() in one way or another: - char2wchar() & wchar2char() - ts_locale.c - various places that depend on LC_CTYPE unrelated to the collation infrastructure - things that depend on other locale settings, like LC_NUMERIC We can address those as part of a separate thread. I'll count this as committed. Regards, Jeff Davis
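As a rough sketch of the selfuncs.c change described above (not the committed hunk; the variable names are placeholders and error handling is omitted), the idea is to look the collation up once and go through the provider-aware wrapper instead of raw strxfrm():

pg_locale_t	mylocale = pg_newlocale_from_collation(collid);
char		sortkey[128];

/* pg_strxfrm() honors the collation's own provider, not setlocale() */
(void) pg_strxfrm(sortkey, valstr, sizeof(sortkey), mylocale);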
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
On Fri, 1 Mar 2024 at 14:48, Matthias van de Meent wrote: > Attached is version 15 of this patch, with the above issues fixed. > It's also rebased on top of 655dc310 of this morning, so that should > keep good for some time again. Attached is version 16 now. Relevant changes from previous patch versions: - Move from 1-indexed AttributeNumber to 0-indexed ints for prefixes, and use "prefix" as naming scheme, rather than cmpcol. A lack of prefix, previously indicated with a cmpcol value of 1, is now a prefix value of 0. - Adjusted README - Improved the efficiency of the insertion path in some cases where we've previously compared the page's highkey. As always, why we need this: Currently, btrees are quite inefficient when they have many key attributes but low attribute cardinality in the prefix, e.g. an index on ("", "", "", uuid). This is not just inefficient use of disk space with the high repetition of duplicate prefix values in pages, but it is also a computational overhead when we're descending the tree in e.g. _bt_first() or btinsert(): The code we use to search a page currently compares the full key to the full searchkey, for a complexity of O(n_equal_attributes + 1) for every tuple on the page, for O(log(page_ntups) * (n_equal_attributes + 1)) attribute compares every page during descent. This patch fixes that part of the computational complexity by applying dynamic prefix compression, thus reducing the average computational complexity in random index lookups to O(3 * (n_equal_attributes) + log(page_ntups)) per page (assuming at least 4 non-hikey tuples on each page). In practice, this makes indexes with 3+ attributes and prefixes with low selectivity (such as the example above) much more viable computationally, as we have to spend much less effort on comparing known index attributes during descent. Note that this does _not_ reuse prefix bounds across pages - it re-establishes the left- and right prefixes every page during the binary search. See the README modified in the patch for specific implementation details and considerations. This patch synergizes with the highkey optimization used in [0]: When combined, the number of attribute compare function calls could be further reduced to O(2 * (n_equal_atts) + log(page_ntups)), a reduction by n_equal_atts every page, which in certain wide index types could be over 25% of all attribute compare function calls on the page after dynamic prefix truncation. However, both are separately useful and reduce the amount of work done on most pages. Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://www.postgresql.org/message-id/flat/CAEze2WijWhCQy_nZVP4Ye5Hwj=YW=3rqv+hbmjgcohtryqm...@mail.gmail.com v16-0001-btree-Implement-dynamic-prefix-truncation.patch Description: Binary data
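To put rough numbers on those bounds: with three low-cardinality leading attributes (n_equal_attributes = 3) and roughly 256 tuples per page (log(page_ntups) ~ 8), the old bound works out to about 8 * (3 + 1) = 32 attribute compares per page during the descent, while the new bound is about 3 * 3 + 8 = 17 -- roughly half, and the gap widens as pages hold more tuples or as keys gain more attributes. (This is just illustrative arithmetic on the formulas above, not a measurement.)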
Re: Remove dependence on integer wrapping
On Wed, Jul 24, 2024 at 05:49:41PM -0400, Matthew Kim wrote: > Hi, I've attached a patch that fixes the make_date overflow. I've upgraded > the patches accordingly to reflect Joseph's committed fix. cfbot is unhappy with this one on Windows [0], and I think I see why. While the patch uses pg_mul_s32_overflow(), it doesn't check its return value, so we'll just proceed with a bogus value in the event of an overflow. Windows apparently doesn't have HAVE__BUILTIN_OP_OVERFLOW defined, so pg_mul_s32_overflow() sets the result to 0x5eed (24,301 in decimal), which is where I'm guessing the year -24300 in the ERROR is from. [0] https://api.cirrus-ci.com/v1/artifact/task/5872294914949120/testrun/build/testrun/regress/regress/regression.diffs -- nathan
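For reference, the pattern the patch needs is roughly the following -- a hedged sketch rather than the actual make_date fix, with "a", "b" and the error text as placeholders. pg_mul_s32_overflow() (common/int.h) returns true on overflow, and in that case the output variable must not be used:

int32		product;

if (pg_mul_s32_overflow(a, b, &product))
	ereport(ERROR,
			(errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
			 errmsg("date field value out of range")));
/* "product" is only meaningful when no overflow was reported */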
Remaining dependency on setlocale()
After some previous work here: https://postgr.es/m/89475ee5487d795124f4e25118ea8f1853edb8cb.ca...@j-davis.com we are less dependent on setlocale(), but it's still not completely gone. setlocale() counts as thread-unsafe, so it would be nice to eliminate it completely. The obvious answer is uselocale(), which sets the locale only for the calling thread, and takes precedence over whatever is set with setlocale(). But there are a couple problems: 1. I don't think it's supported on Windows. 2. I don't see a good way to canonicalize a locale name, like in check_locale(), which uses the result of setlocale(). Thoughts? Regards, Jeff Davis
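For what it's worth, the per-thread behavior looks like this in a minimal standalone program (assuming a POSIX-2008 libc and that the named locale is installed; this is only to illustrate the API, not proposed code):

#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

int
main(void)
{
	locale_t	loc = newlocale(LC_ALL_MASK, "en_US.UTF-8", (locale_t) 0);
	locale_t	old;

	if (loc == (locale_t) 0)
		return 1;				/* locale not installed */

	old = uselocale(loc);		/* overrides setlocale() for this thread only */
	printf("%.2f\n", 1234.5);	/* formatted under the thread-local locale */
	uselocale(old);				/* restore the previous thread locale */
	freelocale(loc);
	return 0;
}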
Re: Minor refactorings to eliminate some static buffers
On 06/08/2024 23:41, Peter Eisentraut wrote: On 06.08.24 17:13, Heikki Linnakangas wrote: > --- a/src/backend/access/transam/xlogprefetcher.c > +++ b/src/backend/access/transam/xlogprefetcher.c > @@ -362,7 +362,7 @@ XLogPrefetcher * > XLogPrefetcherAllocate(XLogReaderState *reader) > { > XLogPrefetcher *prefetcher; > - static HASHCTL hash_table_ctl = { > + const HASHCTL hash_table_ctl = { Is there a reason this is not changed to static const HASHCTL ... ? Most other places where changed in that way. No particular reason. Grepping for HASHCTL's, this is actually different from all other uses of HASHCTL and hash_create. All others use a plain local variable, and fill the fields like this: HASHCTL hash_ctl; hash_ctl.keysize = sizeof(missing_cache_key); hash_ctl.entrysize = sizeof(missing_cache_key); hash_ctl.hcxt = TopMemoryContext; hash_ctl.hash = missing_hash; hash_ctl.match = missing_match; I think that's just because we haven't allowed C99 designated initializers for very long. -- Heikki Linnakangas Neon (https://neon.tech)
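For comparison, the designated-initializer form being alluded to looks roughly like this -- the field values here are made up, not the ones in xlogprefetcher.c:

static const HASHCTL hash_table_ctl = {
	.keysize = sizeof(MyKey),			/* placeholder key type */
	.entrysize = sizeof(MyEntry)		/* placeholder entry type */
};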
Re: Remaining dependency on setlocale()
Jeff Davis writes: > But there are a couple problems: > 1. I don't think it's supported on Windows. Can't help with that, but surely Windows has some thread-safe way. > 2. I don't see a good way to canonicalize a locale name, like in > check_locale(), which uses the result of setlocale(). What I can tell you about that is that check_locale's expectation that setlocale does any useful canonicalization is mostly wishful thinking [1]. On a lot of platforms you just get the input string back again. If that's the only thing keeping us on setlocale, I think we could drop it. (Perhaps we should do some canonicalization of our own instead?) regards, tom lane [1] https://www.postgresql.org/message-id/14856.1348497...@sss.pgh.pa.us
Re: Vectored IO in XLogWrite()
Hi Robert, Thanks for reviewing. Robert Haas , 6 Ağu 2024 Sal, 20:43 tarihinde şunu yazdı: > On Tue, Aug 6, 2024 at 5:36 AM Melih Mutlu wrote: > > I think that we don't have the "contiguous pages" constraint when > writing anymore as we can do vectored IO. It seems unnecessary to write > just because XLogWrite() is at the end of wal buffers. > > Attached patch uses pg_pwritev() instead of pg_pwrite() and tries to > write pages in one call even if they're not contiguous in memory, until it > reaches the page at startidx. > > Here are a few notes on this patch: > > - It's not pgindent-clean. In fact, it doesn't even pass git diff --check. > Fixed. > - You added a new comment (/* Reaching the buffer... */) in the middle > of a chunk of lines that were already covered by an existing comment > (/* Dump the set ... */). This makes it look like the /* Dump the > set... */ comment only covers the 3 lines of code that immediately > follow it rather than everything in the "if" statement. You could fix > this in a variety of ways, but in this case the easiest solution, to > me, looks like just skipping the new comment. It seems like the point > is pretty self-explanatory. > Removed the new comment. Only keeping the updated version of the /* Dump the set... */ comment. > - The patch removes the initialization of "from" but not the variable > itself. You still increment the variable you haven't initialized. > > - I don't think the logic is correct after a partial write. Pre-patch, > "from" advances while "nleft" goes down, but post-patch, what gets > written is dependent on the contents of "iov" which is initialized > outside the loop and never updated. Perhaps compute_remaining_iovec > would be useful here? > You're right. I should have thought about the partial write case. I now fixed it by looping and trying to write until compute_remaining_iovec() returns 0. Thanks, -- Melih Mutlu Microsoft v2-0001-Use-pg_pwritev-in-XlogWrite.patch Description: Binary data
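To sketch what that retry loop looks like (this is not the patch itself; fd, start_offset and the iovec contents are placeholders, and it assumes compute_remaining_iovec() from src/common, which as I understand it shifts the iovec array past the bytes already written and returns how many entries remain):

struct iovec iov[2];			/* filled in elsewhere with WAL-buffer ranges */
int			iovcnt = 2;
off_t		offset = start_offset;

do
{
	ssize_t		written = pg_pwritev(fd, iov, iovcnt, offset);

	if (written < 0)
	{
		if (errno == EINTR)
			continue;
		ereport(PANIC,
				(errcode_for_file_access(),
				 errmsg("could not write to WAL file: %m")));
	}
	offset += written;
	iovcnt = compute_remaining_iovec(iov, iov, iovcnt, written);
} while (iovcnt > 0);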
Re: Avoiding superfluous buffer locking during nbtree backwards scans
On Fri, 5 Jul 2024 at 18:48, Peter Geoghegan wrote: > > Attached patch teaches nbtree backwards scans to avoid needlessly > relocking a previously read page/buffer at the point where we need to > consider reading the next page (the page to the left). +1, LGTM. This changes the backward scan code in _bt_readpage to handle end-of-scan cases in approximately the same way as the forward scan case, which is an improvement IMO. Kind regards, Matthias van de Meent Neon (https://neon.tech/)
Enable data checksums by default
Please find attached a patch to enable data checksums by default. Currently, initdb only enables data checksums if passed the --data-checksums or -k argument. There was some hesitation years ago when this feature was first added, leading to the current situation where the default is off. However, many years later, there is wide consensus that this is an extraordinarily safe, desirable setting. Indeed, most (if not all) of the major commercial and open source Postgres systems currently turn this on by default. I posit you would be hard-pressed to find many systems these days in which it has NOT been turned on. So basically we have a de-facto standard, and I think it's time we flipped the switch to make it on by default. The patch is simple enough: literally flipping the boolean inside of initdb.c, and adding a new argument '--no-data-checksums' for those instances that truly want the old behavior. One place that still needs the old behavior is our internal tests for pg_checksums and pg_amcheck, so I added a new argument to init() in PostgreSQL/Test/Cluster.pm to allow those to still pass their tests. This is just the default - people are still more than welcome to turn it off with the new flag. The pg_checksums program is another option that actually argues for having the default "on", as switching to "off" once initdb has been run is trivial. Yes, I am aware of the previous discussions on this, but the world moves fast - wal compression is better than in the past, vacuum is better now, and data-checksums being on is such a complete default in the wild, it feels weird and a disservice that we are not running all our tests like that. Cheers, Greg From 12ce067f5ba64414d1d14c5f2e763d04cdfacd13 Mon Sep 17 00:00:00 2001 From: Greg Sabino Mullane Date: Tue, 6 Aug 2024 18:18:56 -0400 Subject: [PATCH] Make initdb enable data checksums by default --- doc/src/sgml/ref/initdb.sgml | 4 ++- src/bin/initdb/initdb.c | 6 - src/bin/initdb/t/001_initdb.pl| 31 +-- src/bin/pg_amcheck/t/003_check.pl | 2 +- src/bin/pg_amcheck/t/004_verify_heapam.pl | 2 +- src/bin/pg_checksums/t/002_actions.pl | 2 +- src/test/perl/PostgreSQL/Test/Cluster.pm | 7 + 7 files changed, 41 insertions(+), 13 deletions(-) diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index bdd613e77f..511f489d34 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -267,12 +267,14 @@ PostgreSQL documentation Use checksums on data pages to help detect corruption by the I/O system that would otherwise be silent. Enabling checksums -may incur a noticeable performance penalty. If set, checksums +may incur a small performance penalty. If set, checksums are calculated for all objects, in all databases. All checksum failures will be reported in the pg_stat_database view. See for details. +As of version 18, checksums are enabled by default. They can be +disabled by use of --no-data-checksums. 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index f00718a015..ce7d3e99e5 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -164,7 +164,7 @@ static bool noinstructions = false; static bool do_sync = true; static bool sync_only = false; static bool show_setting = false; -static bool data_checksums = false; +static bool data_checksums = true; static char *xlog_dir = NULL; static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024); static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC; @@ -3121,6 +3121,7 @@ main(int argc, char *argv[]) {"waldir", required_argument, NULL, 'X'}, {"wal-segsize", required_argument, NULL, 12}, {"data-checksums", no_argument, NULL, 'k'}, + {"no-data-checksums", no_argument, NULL, 20}, {"allow-group-access", no_argument, NULL, 'g'}, {"discard-caches", no_argument, NULL, 14}, {"locale-provider", required_argument, NULL, 15}, @@ -3319,6 +3320,9 @@ main(int argc, char *argv[]) if (!parse_sync_method(optarg, &sync_method)) exit(1); break; + case 20: +data_checksums = false; +break; default: /* getopt_long already emitted a complaint */ pg_log_error_hint("Try \"%s --help\" for more information.", progname); diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl index 06a35ac0b7..64395ec531 100644 --- a/src/bin/initdb/t/001_initdb.pl +++ b/src/bin/initdb/t/001_initdb.pl @@ -69,16 +69,11 @@ mkdir $datadir; } } -# Control file should tell that data checksums are disabled by default. +# Control file should tell that data checksums are enabled by default. command_like( [ 'pg_controldata', $datadir ], - qr/Data page checksum version:.*0/, - 'checksums are disabled in control file'); -# pg_checksums fails with checksums disabled by default.
Re: Instability with incremental backup tests (pg_combinebackup, 003_timeline.pl)
On 8/6/24 14:53, Tomas Vondra wrote: > On 8/6/24 07:48, Michael Paquier wrote: >> Hi all, >> >> dikkop has reported a failure with the regression tests of pg_combinebackup: >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-08-04%2010%3A04%3A51 >> >> That's in the test 003_timeline.pl, from dc212340058b: >> # Failed test 'incremental backup from node1' >> # at t/003_timeline.pl line 43. >> >> The node is extremely slow, so perhaps bumping up the timeout would be >> fine enough in this case (did not spend time analyzing it). I don't >> think that this has been discussed, but perhaps I just missed a >> reference to it and the incremental backup thread is quite large. >> > > Yeah, it's a freebsd running on rpi4, from a USB flash disk, and in my > experience it's much slower than rpi4 running Linux. I'm not sure why is > that, never found a way to make it faster > > The machine already has: > > export PGCTLTIMEOUT=600 > export PG_TEST_TIMEOUT_DEFAULT=600 > > I doubt increasing it further will do the trick. Maybe there's some > other timeout that I should increase? > > FWIW I just moved the buildfarm stuff to a proper SSD disk (still USB, > but hopefully better than the crappy flash disk). > Seems the move to SSD helped a lot - the runs went from ~4h to ~40m. So chances are the instability won't be such a problem. regards -- Tomas Vondra
Re: Proposal for implementing OCSP Stapling in PostgreSQL
Thanks a lot Jacob for helping update the tests and sorry for the late reply. Based on previous discussion, I remove the document patch, and start to focus on the v1 simple OCSP logic by checking the leaf/Postgres server certificate's status only (0001-v1-WIP-OCSP-support-certificate-status-check.patch). I also merge your changes and simplify the test by testing the Postgres server certificate's status only (0002-v1-WIP-OCSP-simplify-.res-generation-and-regress-tes.patch). On 2024-07-18 10:18 a.m., Jacob Champion wrote: A question from ignorance: how does the client decide that the signature on the OCSP response is actually valid for the specific chain in use? If I understand correctly, the certificate used by the OCSP responder to sign the OCSP response must be valid for the specific chain in use, or the admins allow to load a new chain to validate the certificate used to sign the OCSP response. I think it would be better to make this part to be more open. Okay. That part needs more design work IMO, and solid testing. Based on the RFC here, https://datatracker.ietf.org/doc/html/rfc6960#section-4.2.2.2. My understanding is that the OCSP responder's certificate should be directly signed by the same CA which signed the Postgres Server's certificate. It looks the openssl 3.0.14 implements in this way. In other words, it the OCSP responder's certificate is signed by a different branch of the chain, then openssl will report some errors. Unless the end user explicitly provides the OCSP responder's certificate with trust_other option. In this case it will skip the some certificate verification. I think it should be simple enough for v1 by set `OCSP_basic_verify(basic_resp, NULL, trusted_store, 0)`. The key function OCSP_basic_verify is documented at here, https://docs.openssl.org/3.0/man3/OCSP_resp_find_status/ Thank you, DavidFrom 368c77885d7925334e8dabcce655b6a82f0a9a8f Mon Sep 17 00:00:00 2001 From: David Zhang Date: Tue, 6 Aug 2024 15:38:14 -0700 Subject: [PATCH 2/2] v1 WIP OCSP simplify .res generation and regress test --- src/test/ssl/conf/ocsp_ca.config | 39 ++ src/test/ssl/sslfiles.mk | 59 +-- src/test/ssl/t/001_ssltests.pl| 100 ++ src/test/ssl/t/SSL/Backend/OpenSSL.pm | 3 + src/test/ssl/t/SSL/Server.pm | 4 ++ 5 files changed, 201 insertions(+), 4 deletions(-) create mode 100644 src/test/ssl/conf/ocsp_ca.config diff --git a/src/test/ssl/conf/ocsp_ca.config b/src/test/ssl/conf/ocsp_ca.config new file mode 100644 index 00..04f78bacb8 --- /dev/null +++ b/src/test/ssl/conf/ocsp_ca.config @@ -0,0 +1,39 @@ +# An OpenSSL format CSR config file for creating the ocsp responder certificate. +# This configuration file is also used when operating the CA. +# +# This certificate is used to sign OCSP resonpose. + +[ req ] +distinguished_name = req_distinguished_name +prompt = no +req_extensions = v3_ocsp + +[ req_distinguished_name ] +CN = Test CA for PostgreSQL SSL regression test ocsp response + +# Extensions for OCSP responder certs +[ v3_ocsp ] +basicConstraints = CA:false +keyUsage = nonRepudiation, digitalSignature, keyEncipherment +extendedKeyUsage = OCSPSigning + +[ ca ] +default_ca = ocsp + +# A shell of a CA, mostly duplicating the server CA, which is used only during +# the OCSP index generation recipes. +[ ocsp ] +dir = ./ssl/ + +# The database (or "index") is the main thing we want. +database = ./ssl/ocsp-certindex + +# Everything else should all be unused, so we specify whatever's most +# convenient. In particular there's no need to have a unique cert/key pair for +# this. 
+certificate = ./ssl/server_ca.crt +private_key = ./ssl/server_ca.key +serial = ./ssl/ocsp_ca.srl +default_md = sha256 +policy = policy_match + diff --git a/src/test/ssl/sslfiles.mk b/src/test/ssl/sslfiles.mk index 88c93ec18d..6f314b7153 100644 --- a/src/test/ssl/sslfiles.mk +++ b/src/test/ssl/sslfiles.mk @@ -35,6 +35,10 @@ SERVERS := server-cn-and-alt-names \ server-revoked CLIENTS := client client-dn client-revoked client_ext client-long \ client-revoked-utf8 +OCSPS := server-ocsp-good \ + server-ocsp-revoked \ + server-ocsp-expired \ + server-ocsp-unknown # # To add a new non-standard certificate, add it to SPECIAL_CERTS and then add @@ -64,12 +68,13 @@ COMBINATIONS := \ ssl/client+client_ca.crt \ ssl/server-cn-only+server_ca.crt -CERTIFICATES := root_ca server_ca client_ca $(SERVERS) $(CLIENTS) +CERTIFICATES := root_ca server_ca client_ca ocsp_ca $(SERVERS) $(CLIENTS) STANDARD_CERTS := $(CERTIFICATES:%=ssl/%.crt) STANDARD_KEYS := $(CERTIFICATES:%=ssl/%.key) CRLS := ssl/root.crl \ ssl/client.crl \ ssl/server.crl +OCSPRES := $(OCSPS:%=ssl/%.res) SSLFILES := \ $(STANDARD_CERTS) \ @@ -77,7 +82,8 @@ SSLFILES := \ $(SPECIAL_CERTS) \ $(SPECIAL_KEYS) \
Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Tue, 11 Jun 2024 at 10:58, Michail Nikolaev wrote: > > Hello. > > I did the POC (1) of the method described in the previous email, and it looks > promising. > > It doesn't block the VACUUM, indexes are built about 30% faster (22 mins vs > 15 mins). That's a nice improvement. > Additional index is lightweight and does not produce any WAL. That doesn't seem to be what I see in the current patchset: https://github.com/postgres/postgres/compare/master...michail-nikolaev:postgres:new_index_concurrently_approach#diff-cc3cb8968cf833c4b8498ad2c561c786099c910515c4bf397ba853ae60aa2bf7R311 > I'll continue the more stress testing for a while. Also, I need to > restructure the commits (my path was no direct) into some meaningful and > reviewable patches. While waiting for this, here are some initial comments on the github diffs: - I notice you've added a new argument to heapam_index_build_range_scan. I think this could just as well be implemented by reading the indexInfo->ii_Concurrent field, as the values should be equivalent, right? - In heapam_index_build_range_scan, it seems like you're popping the snapshot and registering a new one while holding a tuple from heap_getnext(), thus while holding a page lock. I'm not so sure that's OK, especially when catalogs are also involved (specifically for expression indexes, where functions could potentially be updated or dropped if we re-create the visibility snapshot) - In heapam_index_build_range_scan, you pop the snapshot before the returned heaptuple is processed and passed to the index-provided callback. I think that's incorrect, as it'll change the visibility of the returned tuple before it's passed to the index's callback. I think the snapshot manipulation is best added at the end of the loop, if we add it at all in that function. - The snapshot reset interval is quite high, at 500ms. Why did you configure it that low, and didn't you make this configurable? - You seem to be using WAL in the STIR index, while it doesn't seem that relevant for the use case of auxiliary indexes that won't return any data and are only used on the primary. It would imply that the data is being sent to replicas and more data being written than strictly necessary, which to me seems wasteful. - The locking in stirinsert can probably be improved significantly if we use things like atomic operations on STIR pages. We'd need an exclusive lock only for page initialization, while share locks are enough if the page's data is modified without WAL. That should improve concurrent insert performance significantly, as it would further reduce the length of the exclusively locked hot path. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: POC, WIP: OR-clause support for indexes
On Mon, Aug 5, 2024 at 11:24 PM Alena Rybakina wrote: > Ok, thank you for your work) > > I think we can leave only the two added libraries in the first patch, > others are superfluous. Thank you. I also have fixed some grammar issues. -- Regards, Alexander Korotkov Supabase v32-0001-Transform-OR-clauses-to-SAOP-s-during-index-matc.patch Description: Binary data v32-0002-Teach-bitmap-path-generation-about-transforming-.patch Description: Binary data
Re: Fix memory counter update in reorderbuffer
On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila wrote: > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada wrote: > > > > I found a bug in the memory counter update in reorderbuffer. It was > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > > > In ReorderBufferCleanupTXN() we zero the transaction size and then > > free the transaction entry as follows: > > > > /* Update the memory counter */ > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > > > /* deallocate */ > > ReorderBufferReturnTXN(rb, txn); > > > > Why do we need to zero the transaction size explicitly? Shouldn't it > automatically become zero after freeing all the changes? It will become zero after freeing all the changes. However, since updating the max-heap when freeing each change could bring some overhead, we freed the changes without updating the memory counter, and then zeroed it. > > > However, if the transaction entry has toast changes we free them in > > ReorderBufferToastReset() called from ReorderBufferToastReset(), > > > > Here, you mean ReorderBufferToastReset() called from > ReorderBufferReturnTXN(), right? Right. Thank you for pointing it out. > BTW, commit 5bec1d6bc5e also introduced > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > is also worth considering while fixing the reported problem. It may > not have the same problem as you have reported but we can consider > whether setting txn size as zero explicitly is required or not. The reason why it introduced ReorderBufferChangeMemoryUpdate() is the same as I mentioned above. And yes, as you mentioned, it doesn't have the same problem that I reported here. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Logical Replication of sequences
On Tue, Aug 6, 2024 at 5:13 PM vignesh C wrote: > > On Mon, 5 Aug 2024 at 18:05, shveta malik wrote: > > > > On Mon, Aug 5, 2024 at 11:04 AM vignesh C wrote: > > > > > > On Wed, 31 Jul 2024 at 14:39, shveta malik wrote: > > > > > > > > On Mon, Jun 10, 2024 at 5:00 PM vignesh C wrote: > > > > > > > > > > On Mon, 10 Jun 2024 at 12:24, Amul Sul wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Jun 8, 2024 at 6:43 PM vignesh C > > > > > > wrote: > > > > > >> > > > > > >> On Wed, 5 Jun 2024 at 14:11, Amit Kapila > > > > > >> wrote: > > > > > >> [...] > > > > > >> A new catalog table, pg_subscription_seq, has been introduced for > > > > > >> mapping subscriptions to sequences. Additionally, the sequence LSN > > > > > >> (Log Sequence Number) is stored, facilitating determination of > > > > > >> sequence changes occurring before or after the returned sequence > > > > > >> state. > > > > > > > > > > > > > > > > > > Can't it be done using pg_depend? It seems a bit excessive unless > > > > > > I'm missing > > > > > > something. > > > > > > > > > > We'll require the lsn because the sequence LSN informs the user that > > > > > it has been synchronized up to the LSN in pg_subscription_seq. Since > > > > > we are not supporting incremental sync, the user will be able to > > > > > identify if he should run refresh sequences or not by checking the lsn > > > > > of the pg_subscription_seq and the lsn of the sequence(using > > > > > pg_sequence_state added) in the publisher. > > > > > > > > How the user will know from seq's lsn that he needs to run refresh. > > > > lsn indicates page_lsn and thus the sequence might advance on pub > > > > without changing lsn and thus lsn may look the same on subscriber even > > > > though a sequence-refresh is needed. Am I missing something here? > > > > > > When a sequence is synchronized to the subscriber, the page LSN of the > > > sequence from the publisher is also retrieved and stored in > > > pg_subscriber_rel as shown below: > > > --- Publisher page lsn > > > publisher=# select pg_sequence_state('seq1'); > > > pg_sequence_state > > > > > > (0/1510E38,65,1,t) > > > (1 row) > > > > > > --- Subscriber stores the publisher's page lsn for the sequence > > > subscriber=# select * from pg_subscription_rel where srrelid = 16384; > > > srsubid | srrelid | srsubstate | srsublsn > > > -+-++--- > > >16389 | 16384 | r | 0/1510E38 > > > (1 row) > > > > > > If changes are made to the sequence, such as performing many nextvals, > > > the page LSN will be updated. Currently the sequence values are > > > prefetched for SEQ_LOG_VALS 32, so the lsn will not get updated for > > > the prefetched values, once the prefetched values are consumed the lsn > > > will get updated. > > > For example: > > > --- Updated LSN on the publisher (old lsn - 0/1510E38, new lsn - > > > 0/1558CA8) > > > publisher=# select pg_sequence_state('seq1'); > > > pg_sequence_state > > > -- > > > (0/1558CA8,143,22,t) > > > (1 row) > > > > > > The user can then compare this updated value with the sequence's LSN > > > in pg_subscription_rel to determine when to re-synchronize the > > > sequence. > > > > Thanks for the details. But I was referring to the case where we are > > in between pre-fetched values on publisher (say at 25th value), while > > on subscriber we are slightly behind (say at 15th value), but page-lsn > > will be the same on both. Since the subscriber is behind, a > > sequence-refresh is needed on sub, but by looking at lsn (which is > > same), one can not say that for sure. 
Let me know if I have > > misunderstood it. > > Yes, at present, if the value is within the pre-fetched range, we > cannot distinguish it solely using the page_lsn. > This makes sense to me. > > However, the > pg_sequence_state function also provides last_value and log_cnt, which > can be used to handle these specific cases. > BTW, can we document all these steps for users to know when to refresh the sequences, if not already documented? -- With Regards, Amit Kapila.
RE: Conflict detection and logging in logical replication
Dear Hou, > > Here is the V11 patch set which addressed above and Kuroda-san[1] comments. > Thanks for updating the patch. I read 0001 again and I don't have critical comments for now. I found some cosmetic issues (e.g., lines should be shorter than 80 columns) and attached the fix patch. Feel free to include in the next version. Best regards, Hayato Kuroda FUJITSU LIMITED fixes_for_v11.patch Description: fixes_for_v11.patch
Re: Windows default locale vs initdb
On Tue, Jul 23, 2024 at 11:19 AM Thomas Munro wrote: > On Tue, Jul 23, 2024 at 1:44 AM Andrew Dunstan wrote: > > I have an environment I can use for testing. But what exactly am I > > testing? :-) Install a few "problem" language/region settings, switch > > the system and ensure initdb runs ok? I thought a bit more about what to do with the messy .UTF-8 situation on Windows, and I think I might see a way forward that harmonises the code and behaviour with Unix, and deletes a lot of special case code. But it's only theories + CI so far. 0001, 0002: As before, teach initdb.exe to choose eg "en-US" by default. 0003: Force people to choose locales that match the database encoding, as we do on Unix. That is, forbid contradictory combinations like --locale="English_United States.1252" --encoding=UTF8, which are currently allowed (and the world is full of such database clusters because that is how the EDB installer GUI makes them). The only allowed combinations for American English should now be: --locale="en-US" --encoding="WIN1252", and --locale="en-US.UTF-8" --encoding="UTF8". You can still use the old names if you like, by explicitly writing --locale="English_United States.1252", but the encoding then has to be WIN1252. It's crazy to mix them up, let's ban that. Obviously there is a pg_upgrade case to worry about there. We'd have to "fix" the now illegal combinations, and I don't know exactly how yet. 0004: Rip out the code that does extra wchar_t conversations for collations. If I've understood correctly, we don't need them: if you have a .UTF-8 locale then your encoding is UTF-8 and should be able to use strcoll_l() directly. Right? 0005: Something similar was being done for strftime(). And we might as well use strftime_l() instead while we're here (part of general movement to use _l functions and stop splattering setlocale() all over the place, for the multithreaded future). These patches pass on CI. Do they give the expected results when used on a real Windows system? There are a few more places where we do wchar_t conversions that could probably be stripped out too, if my assumptions are correct, and we could dig further if the basic idea can be validated and people think this is going in a good direction. From 886815244ab43092562ae3118cd5588a2fad5bb2 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 20 Nov 2023 14:24:35 +1300 Subject: [PATCH v6 1/5] MinGW has GetLocaleInfoEx(). To use BCP 47 locale names like "en-US" without a suffix ".encoding", we need to be able to call GetLocaleInfoEx() to look up the encoding. That was previously gated for MSVC only, but MinGW has had the function for many years. Remove that gating, because otherwise our MinGW build farm animals would fail when a later commit switches to using the new names by default. There are probably other places where _MSC_VER is being used as a proxy for detecting MinGW with an out-of-date idea about missing functions. Discussion: https://postgr.es/m/CA%2BhUKGLsV3vTjPp7bOZBr3JTKp3Brkr9V0Qfmc7UvpWcmAQL4A%40mail.gmail.com --- src/port/chklocale.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/port/chklocale.c b/src/port/chklocale.c index 8cb81c8640..a15b0d5349 100644 --- a/src/port/chklocale.c +++ b/src/port/chklocale.c @@ -204,7 +204,6 @@ win32_langinfo(const char *ctype) char *r = NULL; char *codepage; -#if defined(_MSC_VER) uint32 cp; WCHAR wctype[LOCALE_NAME_MAX_LENGTH]; @@ -229,7 +228,6 @@ win32_langinfo(const char *ctype) } } else -#endif { /* * Locale format on Win32 is _.. 
For -- 2.39.2 From 357751c04cdd3dc7dea1ee9409356d818af70d5d Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 19 Jul 2022 06:31:17 +1200 Subject: [PATCH v6 2/5] Default to IETF BCP 47 locale names in initdb on Windows. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Avoid selecting traditional Windows locale names written with English words, because (1) they are unstable and explicitly not recommended for use in databases and (2) they may contain non-ASCII characters, which we can't put in our shared catalogs. Since setlocale() returns such names, on Windows use GetUserDefaultLocaleName() if the user didn't provide an explicit locale. It returns BCP 47 strings like "en-US". Also update the documentation to recommend BCP 47 over the traditional names when providing explicit values to initdb. Reviewed-by: Juan José Santamaría Flecha Reviewed-by: Discussion: https://postgr.es/m/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com --- doc/src/sgml/charset.sgml | 13 +++-- src/bin/initdb/initdb.c | 31 +-- 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 834cb30c85..adb21eb079 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -83,8 +83,17 @@ initdb --locale=sv_SE system un
Re: Logical Replication of sequences
Hi Vignesh, This is mostly a repeat of my previous mail from a while ago [1] but includes some corrections, answers, and more examples. I'm going to try to persuade one last time because the current patch is becoming stable, so I wanted to revisit this syntax proposal before it gets too late to change anything. If there is some problem with the proposed idea please let me know because I can see only the advantages and no disadvantages of doing it this way. ~~~ The current patchset offers two forms of subscription refresh: 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option [= value] [, ... ] ) ] 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES Since 'copy_data' is the only supported refresh_option, really it is more like: 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( copy_data [= true|false] ) ] 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES ~~~ I proposed previously that instead of having 2 commands for refreshing subscriptions we should have a single refresh command: ALTER SUBSCRIPTION name REFRESH PUBLICATION [TABLES|SEQUENCES] [ WITH ( copy_data [= true|false] ) ] Why? - IMO it is less confusing than having 2 commands that both refresh sequences in slightly different ways - It is more flexible because apart from refreshing everything, a user can choose to refresh only tables or only sequences if desired; IMO more flexibility is always good. - There is no loss of functionality from the current implementation AFAICT. You can still say "ALTER SUBSCRIPTION sub REFRESH PUBLICATION SEQUENCES" exactly the same as the patchset allows. - The implementation code will become simpler. For example, the current implementation of AlterSubscription_refresh(...) includes the (hacky?) 'resync_all_sequences' parameter and has an overcomplicated relationship with other parameters as demonstrated by the assertions below. IMO using the prosed syntax means this coding will become not only simpler, but shorter too. + /* resync_all_sequences cannot be specified with refresh_tables */ + Assert(!(resync_all_sequences && refresh_tables)); + + /* resync_all_sequences cannot be specified with copy_data as false */ + Assert(!(resync_all_sequences && !copy_data)); ~~~ So, to continue this proposal, let the meaning of 'copy_data' for SEQUENCES be as follows: - when copy_data == false: it means don't copy data (i.e. don't synchronize anything). Add/remove sequences from pg_subscriber_rel as needed. - when copy_data == true: it means to copy data (i.e. synchronize) for all sequences. Add/remove sequences from pg_subscriber_rel as needed) ~~~ EXAMPLES using the proposed syntax: Refreshing TABLES only... ex1. ALTER SUBSCRIPTION sub REFRESH PUBLICATION TABLES WITH (copy_data = false) - same as PG17 functionality for "ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = false)" ex2. ALTER SUBSCRIPTION sub REFRESH PUBLICATION TABLES WITH (copy_data = true) - same as PG17 functionality for "ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = true)" ex3. (using default copy_data) ALTER SUBSCRIPTION sub REFRESH PUBLICATION TABLES - same as ex2. ~ Refreshing SEQUENCES only... ex4. ALTER SUBSCRIPTION sub REFRESH PUBLICATION SEQUENCES WITH (copy data = false) - this adds/removes only sequences to pg_subscription_rel but doesn't update the sequence values ex5. ALTER SUBSCRIPTION sub REFRESH PUBLICATION SEQUENCES WITH (copy data = true) - this adds/removes only sequences to pg_subscription_rel and also updates (synchronizes) all sequence values. 
- same functionality as "ALTER SUBSCRIPTION sub REFRESH PUBLICATION SEQUENCES" in your current patchset ex6. (using default copy_data) ALTER SUBSCRIPTION sub REFRESH PUBLICATION SEQUENCES - same as ex5. - note, that this command has the same syntax and functionality as the current patchset ~~~ When no object_type is specified it has intuitive meaning to refresh both TABLES and SEQUENCES... ex7. ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = false) - For tables, it is the same as the PG17 functionality - For sequences it includes the same behaviour of ex4. ex8. ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = true) - For tables, it is the same as the PG17 functionality - For sequences it includes the same behaviour of ex5. - There is one subtle difference from the current patchset because this proposal will synchronize *all* sequences instead of only new ones. But, this is a good thing. The current documentation is complicated by having to explain the differences between REFRESH PUBLICATION and REFRESH PUBLICATION SEQUENCES. The current patchset also raises questions like how the user chooses whether to use "REFRESH PUBLICATION SEQUENCES" versus "REFRESH PUBLICATION WITH (copy_data=true)". OTHO, the proposed syntax eliminates ambiguity. ex9. (using default copy_data) ALTER SUBSCRIPTION sub REFRESH PUBLICATION - same as ex8 == [1] https://www.postgresql.org/message-id/CAHut%2BPuFH1O
Re: Logical Replication of sequences
On Mon, Aug 5, 2024 at 10:26 AM vignesh C wrote: > > On Thu, 1 Aug 2024 at 04:25, Peter Smith wrote: > > > > Hi Vignesh, > > > > I noticed that when replicating sequences (using the latest patches > > 0730_2*) the subscriber-side checks the *existence* of the sequence, > > but apparently it is not checking other sequence attributes. > > > > For example, consider: > > > > Publisher: "CREATE SEQUENCE s1 START 1 INCREMENT 2;" should be a > > sequence of only odd numbers. > > Subscriber: "CREATE SEQUENCE s1 START 2 INCREMENT 2;" should be a > > sequence of only even numbers. > > > > Because the names match, currently the patch allows replication of the > > s1 sequence. I think that might lead to unexpected results on the > > subscriber. IMO it might be safer to report ERROR unless the sequences > > match properly (i.e. not just a name check). > > > > Below is a demonstration the problem: > > > > == > > Publisher: > > == > > > > (publisher sequence is odd numbers) > > > > test_pub=# create sequence s1 start 1 increment 2; > > CREATE SEQUENCE > > test_pub=# select * from nextval('s1'); > > nextval > > - > >1 > > (1 row) > > > > test_pub=# select * from nextval('s1'); > > nextval > > - > >3 > > (1 row) > > > > test_pub=# select * from nextval('s1'); > > nextval > > - > >5 > > (1 row) > > > > test_pub=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES; > > CREATE PUBLICATION > > test_pub=# > > > > == > > Subscriber: > > == > > > > (subscriber sequence is even numbers) > > > > test_sub=# create sequence s1 start 2 increment 2; > > CREATE SEQUENCE > > test_sub=# SELECT * FROM nextval('s1'); > > nextval > > - > >2 > > (1 row) > > > > test_sub=# SELECT * FROM nextval('s1'); > > nextval > > - > >4 > > (1 row) > > > > test_sub=# SELECT * FROM nextval('s1'); > > nextval > > - > >6 > > (1 row) > > > > test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' > > PUBLICATION pub1; > > 2024-08-01 08:43:04.198 AEST [24325] WARNING: subscriptions created > > by regression test cases should have names starting with "regress_" > > WARNING: subscriptions created by regression test cases should have > > names starting with "regress_" > > NOTICE: created replication slot "sub1" on publisher > > CREATE SUBSCRIPTION > > test_sub=# 2024-08-01 08:43:04.294 AEST [26240] LOG: logical > > replication apply worker for subscription "sub1" has started > > 2024-08-01 08:43:04.309 AEST [26244] LOG: logical replication > > sequence synchronization worker for subscription "sub1" has started > > 2024-08-01 08:43:04.323 AEST [26244] LOG: logical replication > > synchronization for subscription "sub1", sequence "s1" has finished > > 2024-08-01 08:43:04.323 AEST [26244] LOG: logical replication > > sequence synchronization worker for subscription "sub1" has finished > > > > (after the CREATE SUBSCRIPTION we are getting replicated odd values > > from the publisher, even though the subscriber side sequence was > > supposed to be even numbers) > > > > test_sub=# SELECT * FROM nextval('s1'); > > nextval > > - > >7 > > (1 row) > > > > test_sub=# SELECT * FROM nextval('s1'); > > nextval > > - > >9 > > (1 row) > > > > test_sub=# SELECT * FROM nextval('s1'); > > nextval > > - > > 11 > > (1 row) > > > > (Looking at the description you would expect odd values for this > > sequence to be impossible) I see that for such even sequences, user can still do 'setval' to a odd number and then nextval will keep on returning odd value. 
postgres=# SELECT nextval('s1'); 6 postgres=SELECT setval('s1', 43); 43 postgres=# SELECT nextval('s1'); 45 > > test_sub=# \dS+ s1 > > Sequence "public.s1" > > Type | Start | Minimum | Maximum | Increment | Cycles? | > > Cache > > +---+-+-+---+-+--- > > bigint | 2 | 1 | 9223372036854775807 | 2 | no | > > 1 > > Even if we check the sequence definition during the CREATE > SUBSCRIPTION/ALTER SUBSCRIPTION ... REFRESH PUBLICATION or ALTER > SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES commands, there's still > a chance that the sequence definition might change after the command > has been executed. Currently, there's no mechanism to lock a sequence, > and we also permit replication of table data even if the table > structures differ, such as mismatched data types like int and > smallint. I have modified it to log a warning to inform users that the > sequence options on the publisher and subscriber are not the same and > advise them to ensure that the sequence definitions are consistent > between both. > The v20240805 version patch attached at [1] has the changes for the same. > [1] - > https://www.postgresql.org/message-id/CALDaNm1Y_ot-jFRfmtwDuwmFrgSSYHjVuy28RspSopTtwzXy8w%40mail.gmail.com The
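To make the intent of that warning concrete, here is a rough backend-style sketch of the comparison being discussed. The helper name, the message wording, and the way the remote definition would be obtained are assumptions for illustration, not the patch's actual code; only the Form_pg_sequence field names come from pg_sequence.h.

#include "postgres.h"
#include "catalog/pg_sequence.h"

/* Hypothetical helper, not the patch's code: warn when the local and remote
 * sequence definitions disagree, instead of erroring out. */
static void
warn_on_sequence_mismatch(const char *nspname, const char *seqname,
						  Form_pg_sequence local_seq,
						  Form_pg_sequence remote_seq)
{
	if (local_seq->seqtypid != remote_seq->seqtypid ||
		local_seq->seqstart != remote_seq->seqstart ||
		local_seq->seqincrement != remote_seq->seqincrement ||
		local_seq->seqmin != remote_seq->seqmin ||
		local_seq->seqmax != remote_seq->seqmax ||
		local_seq->seqcache != remote_seq->seqcache ||
		local_seq->seqcycle != remote_seq->seqcycle)
		ereport(WARNING,
				errmsg("definition of sequence \"%s.%s\" differs between publisher and subscriber",
					   nspname, seqname),
				errhint("Ensure that the sequence definitions are consistent on both nodes."));
}

The point is simply that a difference in any of these parameters is reported but not treated as an error, matching how mismatched table column types are already tolerated by logical replication.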
Re: Conflict detection and logging in logical replication
On Tue, Aug 6, 2024 at 1:45 PM Zhijie Hou (Fujitsu) wrote: > > On Monday, August 5, 2024 6:52 PM Amit Kapila wrote: > > > > On Sun, Aug 4, 2024 at 1:22 PM Zhijie Hou (Fujitsu) > > wrote: > > > > > > On Friday, August 2, 2024 7:03 PM Amit Kapila > > wrote: > > > > > > > > > > Here is the V11 patch set which addressed above and Kuroda-san[1] > > comments. > > > > > > > A few design-level points: > > > > * > > @@ -525,10 +602,33 @@ ExecSimpleRelationInsert(ResultRelInfo > > *resultRelInfo, > > /* OK, store the tuple and create index entries for it */ > > simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot); > > > > + conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes; > > + > > if (resultRelInfo->ri_NumIndices > 0) > > recheckIndexes = ExecInsertIndexTuples(resultRelInfo, > > -slot, estate, false, false, > > -NULL, NIL, false); > > +slot, estate, false, > > +conflictindexes ? true : false, > > +&conflict, > > +conflictindexes, false); > > + > > + /* > > + * Checks the conflict indexes to fetch the conflicting local tuple > > + * and reports the conflict. We perform this check here, instead of > > + * performing an additional index scan before the actual insertion and > > + * reporting the conflict if any conflicting tuples are found. This is > > + * to avoid the overhead of executing the extra scan for each INSERT > > + * operation, even when no conflict arises, which could introduce > > + * significant overhead to replication, particularly in cases where > > + * conflicts are rare. > > + * > > + * XXX OTOH, this could lead to clean-up effort for dead tuples added > > + * in heap and index in case of conflicts. But as conflicts shouldn't > > + * be a frequent thing so we preferred to save the performance overhead > > + * of extra scan before each insertion. > > + */ > > + if (conflict) > > + CheckAndReportConflict(resultRelInfo, estate, CT_INSERT_EXISTS, > > +recheckIndexes, slot); > > > > I was thinking about this case where we have some pros and cons of doing > > additional scans only after we found the conflict. I was wondering how we > > will > > handle the resolution strategy for this when we have to remote_apply the > > tuple > > for insert_exists/update_exists cases. > > We would have already inserted the remote tuple in the heap and index before > > we found the conflict which means we have to roll back that change and then > > start a forest transaction to perform remote_apply which probably has to > > update the existing tuple. We may have to perform something like speculative > > insertion and then abort it. That doesn't sound cheap either. Do you have > > any > > better ideas? > > Since most of the codes of conflict detection can be reused in the later > resolution patch. I am thinking we can go for re-scan after insertion approach > for detection patch. Then in resolution patch we can probably have a check in > the patch that if the resolver is remote_apply/last_update_win we detect > conflict before, otherwise detect it after. This way we can save an > subscription option in the detection patch because we are not introducing > overhead > for the detection. > Sounds reasonable to me. > > > > > > * > > -ERROR: duplicate key value violates unique constraint "test_pkey" > > -DETAIL: Key (c)=(1) already exists. > > +ERROR: conflict insert_exists detected on relation "public.test" > > +DETAIL: Key (c)=(1) already exists in unique index "t_pkey", which > > was modified locally in transaction 740 at 2024-06-26 10:47:04.727375+08. 
> > > > I think the format to display conflicts is not very clear. The conflict > > should be > > apparent just by seeing the LOG/ERROR message. I am thinking of something > > like below: > > > > LOG: CONFLICT: ; > > DESCRIPTION: If any .. ; RESOLUTION: (This one can be added later) > > DETAIL: remote_tuple (tuple values); local_tuple (tuple values); > > > > With the above, one can easily identify the conflict's reason and action > > taken by > > apply worker. > > Thanks for the idea! I thought about a few styles based on the suggested format, > what do you think about the following? > > --- > Version 1 > --- > LOG: CONFLICT: insert_exists; DESCRIPTION: remote INSERT violates unique > constraint "uniqueindex" on relation "public.test". > DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) > xid=123,origin="pub",timestamp=xxx; remote tuple (a, b, c) = (2, 4, 5). > Won't this case be ERROR? If so, the error message format like the above appears odd to me because in some cases, the user may want to add some filter based on the error message though that is not ideal. Also, the primary error message starts with a lowercase letter and should be short. > LOG: CONFLICT: update_differ; DESCRIPTION: updating a row with key (a, b) = > (2, 4) on relation "public.test" was modified by a different source. > DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) > xid=123,origin="pub",timestamp=xxx; remote tuple (a, b,
Re: Pgoutput not capturing the generated columns
On Mon, Aug 5, 2024 at 9:15 AM Peter Smith wrote: > > Hi, > > Writing many new test case combinations has exposed a possible bug in > patch 0001. > > In my previous post [1] there was questionable behaviour when > replicating from a normal (not generated) column on the publisher side > to a generated column on the subscriber side. Initially, I thought the > test might have exposed a possible PG17 bug, but now I think it has > really found a bug in patch 0001. > > ~~~ > > Previously (PG17) this would fail consistently both during COPY and > during normal replication.Now, patch 0001 has changed this behaviour > -- it is not always failing anymore. > > The patch should not be impacting this existing behaviour. It only > introduces a new 'include_generated_columns', but since the publisher > side is not a generated column I do not expect there should be any > difference in behaviour for this test case. IMO the TAP test expected > results should be corrected for this scenario. And fix the bug. > > Below is an example demonstrating PG17 behaviour. > > == > > > Publisher: > -- > > (notice column "b" is not generated) > > test_pub=# CREATE TABLE tab_nogen_to_gen (a int, b int); > CREATE TABLE > test_pub=# INSERT INTO tab_nogen_to_gen VALUES (1,101),(2,102); > INSERT 0 2 > test_pub=# CREATE PUBLICATION pub1 for TABLE tab_nogen_to_gen; > CREATE PUBLICATION > test_pub=# > > Subscriber: > --- > > (notice corresponding column "b" is generated) > > test_sub=# CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED > ALWAYS AS (a * 22) STORED); > CREATE TABLE > test_sub=# > > Try to create a subscription. Notice we get the error: ERROR: logical > replication target relation "public.tab_nogen_to_gen" is missing > replicated column: "b" > > test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' > PUBLICATION pub1; > 2024-08-05 13:16:40.043 AEST [20957] WARNING: subscriptions created > by regression test cases should have names starting with "regress_" > WARNING: subscriptions created by regression test cases should have > names starting with "regress_" > NOTICE: created replication slot "sub1" on publisher > CREATE SUBSCRIPTION > test_sub=# 2024-08-05 13:16:40.105 AEST [29258] LOG: logical > replication apply worker for subscription "sub1" has started > 2024-08-05 13:16:40.117 AEST [29260] LOG: logical replication table > synchronization worker for subscription "sub1", table > "tab_nogen_to_gen" has started > 2024-08-05 13:16:40.172 AEST [29260] ERROR: logical replication > target relation "public.tab_nogen_to_gen" is missing replicated > column: "b" > 2024-08-05 13:16:40.173 AEST [20039] LOG: background worker "logical > replication tablesync worker" (PID 29260) exited with exit code 1 > 2024-08-05 13:16:45.187 AEST [29400] LOG: logical replication table > synchronization worker for subscription "sub1", table > "tab_nogen_to_gen" has started > 2024-08-05 13:16:45.285 AEST [29400] ERROR: logical replication > target relation "public.tab_nogen_to_gen" is missing replicated > column: "b" > 2024-08-05 13:16:45.286 AEST [20039] LOG: background worker "logical > replication tablesync worker" (PID 29400) exited with exit code 1 > ... 
> > Create the subscription again, but this time with copy_data = false > > test_sub=# CREATE SUBSCRIPTION sub1_nocopy CONNECTION > 'dbname=test_pub' PUBLICATION pub1 WITH (copy_data = false); > 2024-08-05 13:22:57.719 AEST [20957] WARNING: subscriptions created > by regression test cases should have names starting with "regress_" > WARNING: subscriptions created by regression test cases should have > names starting with "regress_" > NOTICE: created replication slot "sub1_nocopy" on publisher > CREATE SUBSCRIPTION > test_sub=# 2024-08-05 13:22:57.765 AEST [7012] LOG: logical > replication apply worker for subscription "sub1_nocopy" has started > > test_sub=# > > ~~~ > > Then insert data from the publisher to see what happens for normal > replication. > > test_pub=# > test_pub=# INSERT INTO tab_nogen_to_gen VALUES (3,103),(4,104); > INSERT 0 2 > > ~~~ > > Notice the subscriber gets the same error as before: ERROR: logical > replication target relation "public.tab_nogen_to_gen" is missing > replicated column: "b" > > 2024-08-05 13:25:14.897 AEST [20039] LOG: background worker "logical > replication apply worker" (PID 10957) exited with exit code 1 > 2024-08-05 13:25:19.933 AEST [11095] LOG: logical replication apply > worker for subscription "sub1_nocopy" has started > 2024-08-05 13:25:19.966 AEST [11095] ERROR: logical replication > target relation "public.tab_nogen_to_gen" is missing replicated > column: "b" > 2024-08-05 13:25:19.966 AEST [11095] CONTEXT: processing remote data > for replication origin "pg_16390" during message type "INSERT" in > transaction 742, finished at 0/1967BB0 > 2024-08-05 13:25:19.968 AEST [20039] LOG: background worker "logical > replication apply worker" (PID 11095) exited with exit code 1 > 2024-08-05 13:25:24.917 AEST [
Re: Restart pg_usleep when interrupted
Hi, On Tue, Aug 06, 2024 at 12:36:44PM -0500, Sami Imseih wrote: > > > v5 takes care of the latest comments by Bertrand. > Thanks! Please find attached a rebase version (due to 39a138fbef) and in passing I changed one comment: "add t in microseconds to a instr_time" -> "add t (in microseconds) to x" Does that make sense to you? That said, it looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 619d4fea1eada6fb65fec673bc9600f8502f9b00 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Wed, 7 Aug 2024 05:04:10 +0000 Subject: [PATCH v6] vaccum_delay with absolute time nanosleep --- src/backend/commands/vacuum.c| 47 +++- src/include/portability/instr_time.h | 10 ++ 2 files changed, 56 insertions(+), 1 deletion(-) 81.6% src/backend/commands/ 18.3% src/include/portability/ diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 48f8eab202..0ed188a083 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -116,6 +116,51 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, static double compute_parallel_delay(void); static VacOptValue get_vacoptval_from_boolean(DefElem *def); static bool vac_tid_reaped(ItemPointer itemptr, void *state); +static void vacuum_sleep(double msec); + +static +void +vacuum_sleep(double msec) +{ + long microsec = msec * 1000; + + if (microsec > 0) + { +#ifndef WIN32 + /* + * We allow nanosleep to handle interrupts and retry with the + * remaining time. However, frequent interruptions and restarts of the + * nanosleep calls can substantially lead to drift in the time when + * the sleep finally completes. To deal with this, we break out of the + * loop whenever the current time is past the expected end time of the + * sleep. + */ + struct timespec delay; + struct timespec remain; + instr_time end_time; + + INSTR_TIME_SET_CURRENT(end_time); + INSTR_TIME_ADD_MICROSEC(end_time, microsec); + + delay.tv_sec = microsec / 1000000L; + delay.tv_nsec = (microsec % 1000000L) * 1000; + + while (nanosleep(&delay, &remain) == -1 && errno == EINTR) + { + instr_time current_time; + + INSTR_TIME_SET_CURRENT(current_time); + + if (INSTR_TIME_IS_GREATER(current_time, end_time)) +break; + + delay = remain; + } +#else + SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE); +#endif + } +} /* * GUC check function to ensure GUC value specified is within the allowable @@ -2384,7 +2429,7 @@ vacuum_delay_point(void) msec = vacuum_cost_delay * 4; pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY); - pg_usleep(msec * 1000); + vacuum_sleep(msec); pgstat_report_wait_end(); /* diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index e66ecf34cd..754ea77c43 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -36,6 +36,10 @@ * * INSTR_TIME_GET_NANOSEC(t) convert t to int64 (in nanoseconds) * + * INSTR_TIME_ADD_MICROSEC(x,t) add t (in microseconds) to x + * + * INSTR_TIME_IS_GREATER(x,y) is x greater than y? + * * Note that INSTR_TIME_SUBTRACT and INSTR_TIME_ACCUM_DIFF convert * absolute times to intervals. The INSTR_TIME_GET_xxx operations are * only useful on intervals.
@@ -194,4 +198,10 @@ GetTimerFrequency(void) #define INSTR_TIME_GET_MICROSEC(t) \ (INSTR_TIME_GET_NANOSEC(t) / NS_PER_US) +#define INSTR_TIME_ADD_MICROSEC(x,t) \ + ((x).ticks += t * NS_PER_US) + +#define INSTR_TIME_IS_GREATER(x,y) \ + ((x).ticks > (y).ticks) + #endif /* INSTR_TIME_H */ -- 2.34.1
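For anyone who wants to see the sleep pattern in isolation, here is a self-contained sketch of the same idea without instr_time.h: retry nanosleep() after EINTR with the remaining time, but stop once an absolute deadline computed up front has passed, so repeated interruptions cannot stretch the total sleep. The function name and the direct use of CLOCK_MONOTONIC are assumptions for the example; the patch itself uses the INSTR_TIME_* macros shown above.

#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Sleep for roughly 'microsec' microseconds, retrying after signals but
 * never past the deadline established before the first nanosleep() call. */
void
sleep_until_deadline(long microsec)
{
	struct timespec delay;
	struct timespec remain;
	struct timespec now;
	int64_t		deadline_ns;

	clock_gettime(CLOCK_MONOTONIC, &now);
	deadline_ns = (int64_t) now.tv_sec * 1000000000 + now.tv_nsec +
		(int64_t) microsec * 1000;

	delay.tv_sec = microsec / 1000000L;
	delay.tv_nsec = (microsec % 1000000L) * 1000;

	while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
	{
		clock_gettime(CLOCK_MONOTONIC, &now);
		if ((int64_t) now.tv_sec * 1000000000 + now.tv_nsec >= deadline_ns)
			break;				/* already past the intended end time */
		delay = remain;			/* otherwise sleep only for the leftover time */
	}
}

int
main(void)
{
	sleep_until_deadline(100 * 1000);	/* sleep roughly 100 ms */
	return 0;
}

Computing the deadline once, before the first nanosleep() call, is what bounds the drift: each retry only sleeps for the leftover time, and the loop gives up as soon as the deadline is reached.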
Re: Fix memory counter update in reorderbuffer
On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada wrote: > > On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila wrote: > > > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada > > wrote: > > > > > > I found a bug in the memory counter update in reorderbuffer. It was > > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > > > > > In ReorderBufferCleanupTXN() we zero the transaction size and then > > > free the transaction entry as follows: > > > > > > /* Update the memory counter */ > > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > > > > > /* deallocate */ > > > ReorderBufferReturnTXN(rb, txn); > > > > > > > Why do we need to zero the transaction size explicitly? Shouldn't it > > automatically become zero after freeing all the changes? > > It will become zero after freeing all the changes. However, since > updating the max-heap when freeing each change could bring some > overhead, we freed the changes without updating the memory counter, > and then zerod it. > I think this should be covered in comments as it is not apparent. > > > BTW, commit 5bec1d6bc5e also introduced > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > is also worth considering while fixing the reported problem. It may > > not have the same problem as you have reported but we can consider > > whether setting txn size as zero explicitly is required or not. > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > same as I mentioned above. And yes, as you mentioned, it doesn't have > the same problem that I reported here. > I checked again and found that ReorderBufferResetTXN() first calls ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After that, it also tries to free spec_insert change which should have the same problem. So, what saves this path from the same problem? * + /* + * Update the memory counter of the transaction, removing it from + * the max-heap. + */ + ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); + Assert(txn->size == 0); + pfree(txn); Just before freeing the TXN, updating the size looks odd though I understand the idea is to remove TXN from max_heap. Anyway, let's first discuss whether the same problem exists in ReorderBufferResetTXN() code path, and if so, how we want to fix it. -- With Regards, Amit Kapila.
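As a side note for readers following the accounting argument, the pattern can be illustrated with a deliberately trivial sketch (hypothetical names, not the reorderbuffer code). The per-change counter updates are skipped during the bulk free because each one would also have to rebalance the max-heap, so the total is reset once at the end; that is why an explicit memory-counter update right before freeing the TXN exists at all.

#include <stdlib.h>

/* Hypothetical, simplified stand-in for a transaction's memory accounting. */
typedef struct TxnAccounting
{
	size_t		size;			/* total bytes currently tracked */
} TxnAccounting;

/*
 * Free all changes of a transaction.  Each free() deliberately skips the
 * per-change counter update (which in the real code would also have to
 * adjust the max-heap); the counter is reset once, in a single step.
 */
static void
bulk_free_changes(TxnAccounting *txn, void **changes, int nchanges)
{
	for (int i = 0; i < nchanges; i++)
		free(changes[i]);

	txn->size = 0;				/* one accounting update for the whole batch */
}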
Remove TRACE_SORT macro?
I think we could remove the TRACE_SORT macro. The TRACE_SORT macro has guarded the availability of the trace_sort GUC setting. But it has been enabled by default ever since it was introduced in PostgreSQL 8.1, and there have been no reports that someone wanted to disable it. So I think we could just remove the macro to simplify things.From 72cdf2044056b138e7f7a4a513d34e48d7f22961 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 7 Aug 2024 08:44:08 +0200 Subject: [PATCH] Remove TRACE_SORT macro The TRACE_SORT macro guarded the availability of the trace_sort GUC setting. But it has been enabled by default ever since it was introduced in PostgreSQL 8.1, and there have been no reports that someone wanted to disable it. So just remove the macro to simplify things. --- doc/src/sgml/config.sgml | 3 -- src/backend/utils/adt/mac.c| 6 src/backend/utils/adt/network.c| 6 src/backend/utils/adt/numeric.c| 6 src/backend/utils/adt/uuid.c | 6 src/backend/utils/adt/varlena.c| 4 --- src/backend/utils/misc/guc_tables.c| 2 -- src/backend/utils/sort/tuplesort.c | 39 -- src/backend/utils/sort/tuplesortvariants.c | 14 src/include/pg_config_manual.h | 6 src/include/utils/guc.h| 3 -- 11 files changed, 95 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index a1a1d58a436..2937384b001 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11711,9 +11711,6 @@ Developer Options If on, emit information about resource usage during sort operations. -This parameter is only available if the TRACE_SORT macro -was defined when PostgreSQL was compiled. -(However, TRACE_SORT is currently defined by default.) diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c index ae4caedef50..ce0fbe7a49b 100644 --- a/src/backend/utils/adt/mac.c +++ b/src/backend/utils/adt/mac.c @@ -430,13 +430,11 @@ macaddr_abbrev_abort(int memtupcount, SortSupport ssup) */ if (abbr_card > 10.0) { -#ifdef TRACE_SORT if (trace_sort) elog(LOG, "macaddr_abbrev: estimation ends at cardinality %f" " after " INT64_FORMAT " values (%d rows)", abbr_card, uss->input_count, memtupcount); -#endif uss->estimating = false; return false; } @@ -449,23 +447,19 @@ macaddr_abbrev_abort(int memtupcount, SortSupport ssup) */ if (abbr_card < uss->input_count / 2000.0 + 0.5) { -#ifdef TRACE_SORT if (trace_sort) elog(LOG, "macaddr_abbrev: aborting abbreviation at cardinality %f" " below threshold %f after " INT64_FORMAT " values (%d rows)", abbr_card, uss->input_count / 2000.0 + 0.5, uss->input_count, memtupcount); -#endif return true; } -#ifdef TRACE_SORT if (trace_sort) elog(LOG, "macaddr_abbrev: cardinality %f after " INT64_FORMAT " values (%d rows)", abbr_card, uss->input_count, memtupcount); -#endif return false; } diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c index 640fc37dc83..450dacd031c 100644 --- a/src/backend/utils/adt/network.c +++ b/src/backend/utils/adt/network.c @@ -503,13 +503,11 @@ network_abbrev_abort(int memtupcount, SortSupport ssup) */ if (abbr_card > 10.0) { -#ifdef TRACE_SORT if (trace_sort) elog(LOG, "network_abbrev: estimation ends at cardinality %f" " after " INT64_FORMAT " values (%d rows)", abbr_card, uss->input_count, memtupcount); -#endif uss->estimating = false; return false; } @@ -522,23 +520,19 @@ network_abbrev_abort(int memtupcount, SortSupport ssup) */ if (abbr_card < uss->input_count / 2000.0 + 0.5) { -#ifdef TRACE_SORT if (trace_sort) elog(LOG, "network_abbrev: aborting abbreviation at cardinality %f" " below threshold %f after " 
INT64_FORMAT " values (%d rows)", abbr_card, uss->input_count / 2000.0 + 0.5, uss->input_count, memtupcount); -#endif return true; } -#ifdef TRACE_SORT if (trace_sort) elog(LOG,