Re: Clock-skew management in logical replication

2024-09-20 Thread shihao zhong
Nisha Moond  writes:
> Thoughts? Looking forward to hearing others' opinions!

Had a productive conversation with Amit Kapila today about time skew
in distributed systems, and wanted to share some thoughts.
Essentially, we're grappling with the classic distributed snapshot
problem. In a multi-active environment, where multiple nodes can
independently process transactions, it becomes crucial to determine
the visibility of these transactions across the system. Time skew,
where different machines report different timestamps, makes this a
hard problem. How can we ensure consistent transaction ordering and
visibility when time itself is unreliable?

As you mentioned, there are several ways to tackle the time skew
problem in distributed systems. These approaches generally fall into
four main categories:

1. Centralized Timestamps (Timestamp Oracle)

Mechanism: A dedicated server acts as a single source of truth for
time, eliminating skew by providing timestamps to all nodes. Google
Percolator and TiDB use this approach.
Consistency level: Serializable
Pros: Simple to implement.
Cons: High latency for cross-geo transactions due to reliance on a
central server. Can become a bottleneck.

2. Atomic Clocks (True Time)

Mechanism: Utilizes highly accurate atomic clocks to provide a
globally consistent view of time, as seen in Google Spanner.
Consistency level: External Serializable
Pros: Very high consistency level (externally consistent).
Cons: Requires specialized and expensive hardware. Adds some latency
to transactions, though less than centralized timestamps.

3. Hybrid Logical Clocks

Mechanism: Combines NTP for rough time synchronization with logical
clocks for finer-grained ordering. Yugabyte and CockroachDB employ
this strategy (a minimal sketch of the update rules follows the list
below).
Consistency level: Serializable
Pros: Avoids the need for specialized hardware.
Cons: Can introduce significant latency to transactions.

4. Local Clocks

Mechanism: Uses only each node's local (logical) clock, with no
coordination between nodes.
Consistency level: Eventual Consistency
Pros: Simple to implement.
Cons: Provides only a very weak consistency level.
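
To make the HLC mechanism concrete, here is a minimal, self-contained
sketch of the usual hybrid-logical-clock update rules. This is only an
illustration of the general technique, not code from Yugabyte, CockroachDB,
or the attached patch:

#include <stdint.h>

/*
 * Hybrid logical clock: a physical component (e.g. microseconds from an
 * NTP-synchronized clock) plus a logical counter that breaks ties when
 * the physical component does not advance.
 */
typedef struct HybridClock
{
    int64_t physical;           /* highest physical time observed so far */
    int64_t logical;            /* tie-breaking counter */
} HybridClock;

static int64_t
max3(int64_t a, int64_t b, int64_t c)
{
    int64_t m = (a > b) ? a : b;
    return (m > c) ? m : c;
}

/* Advance the clock for a local event or before sending a message. */
static void
hlc_tick(HybridClock *clk, int64_t now)
{
    int64_t prev = clk->physical;

    clk->physical = (prev > now) ? prev : now;
    clk->logical = (clk->physical == prev) ? clk->logical + 1 : 0;
}

/* Merge a timestamp carried by a message from a remote node. */
static void
hlc_receive(HybridClock *clk, int64_t now,
            int64_t msg_physical, int64_t msg_logical)
{
    int64_t prev = clk->physical;

    clk->physical = max3(prev, msg_physical, now);
    if (clk->physical == prev && clk->physical == msg_physical)
        clk->logical = ((clk->logical > msg_logical) ? clk->logical : msg_logical) + 1;
    else if (clk->physical == prev)
        clk->logical++;
    else if (clk->physical == msg_physical)
        clk->logical = msg_logical + 1;
    else
        clk->logical = 0;
}

Timestamps (physical, logical), compared lexicographically, then give a
total order that respects causality while staying close to wall-clock time.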

Of the four implementations considered, only local clocks and the HLC
approach offer a 'pure database' solution. Given PostgreSQL's
practical use cases, I recommend starting with a local clock
implementation. However, recognizing the increasing prevalence of
distributed clock services, we should also implement a pluggable time
access method. This allows users to integrate with different time
services as needed.

In the mid-term, implementing the HLC approach would provide highly
consistent snapshot reads. This offers a significant advantage for
many use cases.

Long-term, we should consider integrating with a distributed time
service like AWS Time Sync Service. This ensures high accuracy and
scalability for demanding applications.

Thanks,
Shihao




Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Fujii Masao




On 2024/09/21 12:09, Tom Lane wrote:

Fujii Masao  writes:

I don’t have any objections to this commit, but I’d like to confirm
whether we really want to proactively reorder #include directives,
even for standard C library headers.


I'm hesitant to do that.  We can afford to insist that our own header
files be inclusion-order-independent, because we have the ability to
fix any problems that might arise.  We have no ability to do something
about it if the system headers on $random_platform have inclusion
order dependencies.  (In fact, I'm fairly sure there are already
places in plperl and plpython where we know we have to be careful
about inclusion order around those languages' headers.)

So I would tread pretty carefully around making changes of this
type, especially in long-established code.  I have no reason to
think that the committed patch will cause any problems, but
I do think it's mostly asking for trouble.


Sounds reasonable to me.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Tom Lane
Fujii Masao  writes:
> I don’t have any objections to this commit, but I’d like to confirm
> whether we really want to proactively reorder #include directives,
> even for standard C library headers.

I'm hesitant to do that.  We can afford to insist that our own header
files be inclusion-order-independent, because we have the ability to
fix any problems that might arise.  We have no ability to do something
about it if the system headers on $random_platform have inclusion
order dependencies.  (In fact, I'm fairly sure there are already
places in plperl and plpython where we know we have to be careful
about inclusion order around those languages' headers.)

So I would tread pretty carefully around making changes of this
type, especially in long-established code.  I have no reason to
think that the committed patch will cause any problems, but
I do think it's mostly asking for trouble.

regards, tom lane




Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Fujii Masao




On 2024/09/21 5:20, Nathan Bossart wrote:

On Fri, Sep 20, 2024 at 01:56:16PM -0500, Nathan Bossart wrote:

On Fri, Sep 20, 2024 at 07:20:15PM +0200, Michael Banck wrote:

I noticed two headers are not in alphabetical order in pg_checkums.c,
patch attached.


This appears to be commit 280e5f1's fault.  Will fix.


Committed, thanks!


I don’t have any objections to this commit, but I’d like to confirm
whether we really want to proactively reorder #include directives,
even for standard C library headers. I’m asking because I know there are
several source files, like xlog.c and syslogger.c, where such #include
directives aren't in alphabetical order. I understand we usually reorder
#include directives for PostgreSQL header files, though.
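
For illustration only (this is a made-up frontend example, not the actual
pg_checksums.c include list), alphabetical ordering within each group would
look like:

#include "postgres_fe.h"

#include <dirent.h>
#include <limits.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#include "common/controldata_utils.h"
#include "common/logging.h"

i.e. postgres_fe.h first, then system headers, then our own headers, with
each group sorted alphabetically.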

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: meson and check-tests

2024-09-20 Thread jian he
On Fri, Sep 20, 2024 at 6:25 PM Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> I’ve been working on a patch and wanted to share my approach, which
> might be helpful for others. The patch removes the '--schedule' and
> '${schedule_file}' options from the regress/regress test command when
> the TESTS environment variable is set. Instead, it appends the TESTS
> variable to the end of the command.
>
> Please note that setup suite tests (at least tmp_install and
> initdb_cache) should be executed before running these tests. One
> drawback is that while the Meson logs will still show the test command
> with the '--schedule' and '${schedule_file}' options, the actual
> command used will be changed.
>
> Some examples after the patched build:
>
> $ meson test --suite regress -> fails
> $ TESTS="create_table copy jsonb" meson test --suite regress -> fails
> ### run required setup suite tests
> $ meson test tmp_install
> $ meson test initdb_cache
> ###
> $ meson test --suite regress -> passes (12s)
> $ TESTS="copy" meson test --suite regress -> passes (0.35s)
> $ TESTS="copy jsonb" meson test --suite regress -> passes (0.52s)
> $ TESTS='select_into' meson test --suite regress -> fails
> $ TESTS='test_setup select_into' meson test --suite regress -> passes (0.52s)
> $ TESTS='rangetypes multirangetypes' meson test --suite regress -> fails
> $ TESTS='test_setup multirangetypes rangetypes' meson test --suite
> regres -> fails
> $ TESTS='test_setup rangetypes multirangetypes' meson test --suite
> regress -> passes (0.91s)
>
> Any feedback would be appreciated.
>

Hi. Thanks for your work!
I did find some issues.


TESTS="copy jsonb jsonb" meson test --suite regress
will fail; I'm not sure whether that is expected.

In [1] you mentioned "setup", but that "setup" is more or less like
"meson test --suite setup --suite regress",
whereas originally I thought it was about "src/test/regress/sql/test_setup.sql".
For example, right now you cannot run src/test/regress/sql/stats_ext.sql
without first running test_setup.sql, because some functions (like fipshash)
live in test_setup.sql.

So
TESTS="copy jsonb stats_ext" meson test --suite regress
will fail.

To make it work, we need to change it to
TESTS="test_setup copy jsonb stats_ext" meson test --suite regress

Many tests depend on test_setup.sql; maybe we can implicitly prepend it.
Another dependency issue: alter_table depends on create_index.

TESTS="test_setup alter_table" meson test --suite regress
will fail.
TESTS="test_setup create_index alter_table" meson test --suite regress
will work.


[1] 
https://www.postgresql.org/message-id/CAN55FZ3t%2BeDgKtsDoyi0UYwzbMkKDfqJgvsbamar9CvY_6qWPw%40mail.gmail.com




Re: meson and check-tests

2024-09-20 Thread Nazir Bilal Yavuz
Hi,

I’ve been working on a patch and wanted to share my approach, which
might be helpful for others. The patch removes the '--schedule' and
'${schedule_file}' options from the regress/regress test command when
the TESTS environment variable is set. Instead, it appends the TESTS
variable to the end of the command.

Please note that setup suite tests (at least tmp_install and
initdb_cache) should be executed before running these tests. One
drawback is that while the Meson logs will still show the test command
with the '--schedule' and '${schedule_file}' options, the actual
command used will be changed.

Some examples after the patched build:

$ meson test --suite regress -> fails
$ TESTS="create_table copy jsonb" meson test --suite regress -> fails
### run required setup suite tests
$ meson test tmp_install
$ meson test initdb_cache
###
$ meson test --suite regress -> passes (12s)
$ TESTS="copy" meson test --suite regress -> passes (0.35s)
$ TESTS="copy jsonb" meson test --suite regress -> passes (0.52s)
$ TESTS='select_into' meson test --suite regress -> fails
$ TESTS='test_setup select_into' meson test --suite regress -> passes (0.52s)
$ TESTS='rangetypes multirangetypes' meson test --suite regress -> fails
$ TESTS='test_setup multirangetypes rangetypes' meson test --suite
regres -> fails
$ TESTS='test_setup rangetypes multirangetypes' meson test --suite
regress -> passes (0.91s)

Any feedback would be appreciated.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 7c94889b553ffc294ddf9eba7c595ea629d24e91 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 20 Sep 2024 11:39:20 +0300
Subject: [PATCH v1] Add 'make check-tests' approach to the meson based builds

---
 src/tools/testwrap | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/tools/testwrap b/src/tools/testwrap
index 9a270beb72d..9180727b6ff 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -41,6 +41,17 @@ env_dict = {**os.environ,
 'TESTDATADIR': os.path.join(testdir, 'data'),
 'TESTLOGDIR': os.path.join(testdir, 'log')}
 
+# Symmetric behaviour with make check-tests. If TESTS environment variable is
+# set, only run these regression tests in regress/regress test. Note that setup
+# suite tests (at least tmp_install and initdb_cache tests) need to be run
+# before running these tests.
+if "TESTS" in env_dict and args.testgroup == 'regress' and args.testname == 'regress':
+    elem = '--schedule'
+    schedule_index = args.test_command.index(elem) if elem in args.test_command else -1
+    if schedule_index >= 0:
+        del args.test_command[schedule_index : schedule_index + 2]
+    args.test_command.extend(env_dict["TESTS"].split(' '))
+
 sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
 # Meson categorizes a passing TODO test point as bad
 # (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
-- 
2.45.2



Clock-skew management in logical replication

2024-09-20 Thread Nisha Moond
Hello Hackers,
(CC people involved in the earlier discussion)

While considering the implementation of timestamp-based conflict
resolution (last_update_wins) in logical replication (see [1]), there
was feedback at [2] and a discussion on whether or not to manage
clock-skew at the database level. We tried to research the history of
clock-skew related discussions in Postgres itself and summarized it
at [3].

We also analyzed how other databases deal with it. Based on our
research, classic RDBMSs like Oracle and IBM Informix, which use similar
timestamp-based resolution methods, do not address clock-skew at the
database level. Instead, they recommend using external time
synchronization solutions, such as NTP.

- Oracle, while handling conflicts[4], assumes clocks are synchronized
and relies on external tools like NTP for time synchronization between
nodes[5].
- IBM Informix, similarly, recommends using their network commands to
ensure clock synchronization across nodes[6].

Other Postgres-based databases like EDB-BDR and YugabyteDB provide
GUC parameters to manage clock-skew within the database:

- EDB-BDR allows configuration of parameters like
bdr.maximum_clock_skew and bdr.maximum_clock_skew_action to define
the acceptable skew and the action to take when it is exceeded[7].
- YugabyteDB offers a max_clock_skew_usec setting, which causes
the node to crash if the clock-skew exceeds the specified value[8].

There are, of course, other approaches to managing clock-skew used by
distributed systems, such as NTP daemons, centralized logical clocks,
atomic clocks (as in Google Spanner), and time sync services like
AWS's[9].

Implementing any of these time-sync services for CDR seems like quite
a deviation and a big project in itself, and we are not sure it is
really needed. At best, as an aid to users, we should provide a
GUC-based implementation to handle clock-skew in logical replication.
The idea is that users should handle clock-skew outside of the
database, but in worst-case scenarios they can rely on these GUCs.

We have attempted to implement a patch which manages clock-skew in
logical replication. It works based on these new GUCs (see [10] for a
detailed discussion):

- max_logical_rep_clock_skew: Defines the tolerable limit for clock-skew.
- max_logical_rep_clock_skew_action: Configures the action when
clock-skew exceeds the limit.
- max_logical_rep_clock_skew_wait: Limits the maximum wait time if the
action is configured as "wait."

The proposed idea is implemented in the attached patch v1. Thank you,
Shveta, for implementing it.
Thanks, Kuroda-san, for assisting with the research.

Thoughts? Looking forward to hearing others' opinions!

[1]: 
https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CAFiTN-uTycjZWdp1kEpN9w7b7SQpoGL5zyg_qZzjpY_vr2%2BKsg%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/CAA4eK1Jn4r-y%2BbkW%3DJaKCbxEz%3DjawzQAS1Z4wAd8jT%2B1B0RL2w%40mail.gmail.com
[4]: 
https://www.oracle.com/cn/a/tech/docs/technical-resources/wp-oracle-goldengate-activeactive-final2-1.pdf
[5]: 
https://docs.oracle.com/en/operating-systems/oracle-linux/8/network/network-ConfiguringNetworkTime.html
[6]: 
https://www.ibm.com/docs/en/informix-servers/14.10?topic=environment-time-synchronization
[7]:  
https://www.enterprisedb.com/docs/pgd/latest/reference/pgd-settings/#bdrmaximum_clock_skew
[8]:  
https://support.yugabyte.com/hc/en-us/articles/4403707404173-Too-big-clock-skew-leading-to-error-messages-or-tserver-crashes
[9]: 
https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-time-sync-service-microsecond-accurate-time/
[10]: 
https://www.postgresql.org/message-id/CAJpy0uDCW%2BvrBoUZWrBWPjsM%3D9wwpwbpZuZa8Raj3VqeVYs3PQ%40mail.gmail.com

--
Thanks,
Nisha


v1-0001-Implements-Clock-skew-management-between-nodes.patch
Description: Binary data


Re: Documentation to upgrade logical replication cluster

2024-09-20 Thread Amit Kapila
On Mon, May 6, 2024 at 10:40 AM vignesh C  wrote:
>
> The v9 version patch was not applying on top of HEAD because of few
> commits, the updated v10 version patch is rebased on top of HEAD.
>

Let's say publisher is in node1 and subscriber is
+  in node2. The subscriber node2 has
+  two subscriptions sub1_node1_node2 and
+  sub2_node1_node2 which are subscribing the changes
+  from node1.

Do we need to show multiple subscriptions? You are following the same
steps for both subscriptions, so it may not add much value to show
steps for two subscriptions. You can write steps for one and add a
note to say it has to be done for other subscriptions present.

-- 
With Regards,
Amit Kapila.




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-20 Thread Tomas Vondra
On 9/19/24 21:22, Peter Geoghegan wrote:
> On Mon, Sep 16, 2024 at 6:05 PM Tomas Vondra  wrote:
>> For example, one of the slowed down queries is query 702 (top of page 8
>> in the PDF). The query is pretty simple:
>>
>>   explain (analyze, timing off, buffers off)
>>   select id1,id2 from t_100_1000_1_2
>>where NOT (id1 in (:list)) AND (id2 = :value);
>>
>> and it was executed on a table with random data in two columns, each
>> with 1000 distinct values.
> 
> I cannot recreate this problem using the q702.sql repro you provided.
> Feels like I'm missing a step, because I find that skip scan wins
> nicely here.
> 

I don't know, I can reproduce it just fine. I just tried with v7.

What I do is this:

1) build master and patched versions:

  ./configure --enable-depend --prefix=/mnt/data/builds/${build}/
  make -s clean
  make -s -j4 install

2) create a new cluster (default config), create DB, generate the data

3) restart cluster, drop caches

4) run the query from the SQL script

I suspect you don't do (3). I didn't mention this explicitly, my message
only said "with uncached data", so maybe that's the problem?


>> This is perfectly random data, so a great
>> match for the assumptions in costing etc.
> 
> FWIW, I wouldn't say that this is a particularly sympathetic case for
> skip scan. It's definitely still a win, but less than other cases I
> can imagine. This is due to the relatively large number of rows
> returned by the scan. Plus 1000 distinct leading values for a skip
> array isn't all that low, so we end up scanning over 1/3 of all of the
> leaf pages in the index.
> 

I wasn't suggesting it's a sympathetic case for skipscan. My point is
that it perfectly matches the costing assumptions, i.e. columns are
independent etc. But if it's not sympathetic, maybe the cost shouldn't
be 1/5 of the cost from master?

> BTW, be careful to distinguish between leaf pages and internal pages
> when interpreting "Buffers:" output with the patch. Generally
> speaking, the patch repeats many internal page accesses, which needs
> to be taken into account when compare "Buffers:" counts against
> master. It's not uncommon for 3/4 or even 4/5 of all index page hits
> to be for internal pages with the patch. Whereas on master the number
> of internal page hits is usually tiny. This is one reason why the
> additional context provided by "Index Searches:" can be helpful.
> 

Yeah, I recall there's an issue with that.

>> But with uncached data, this runs in ~50 ms on master, but takes almost
>> 200 ms with skipscan (these timings are from my laptop, but similar to
>> the results).
> 
> Even 50ms seems really slow for your test case -- with or without my
> patch applied.
> 
> Are you sure that this wasn't an assert-enabled build? There's lots of
> extra assertions for the code paths used by skip scan for this, which
> could explain the apparent regression.
> 
> I find that this same query takes only ~2.056 ms with the patch. When
> I disabled skip scan locally via "set skipscan_prefix_cols = 0" (which
> should give me behavior that's pretty well representative of master),
> it takes ~12.039 ms. That's exactly what I'd expect for this query: a
> solid improvement, though not the really enormous ones that you'll see
> when skip scan is able to avoid reading many of the index pages that
> master reads.
> 

I'm pretty sure you're doing this on cached data, because 2ms is exactly
the timing I see in that case.


regards

-- 
Tomas Vondra




RE: Using per-transaction memory contexts for storing decoded tuples

2024-09-20 Thread Hayato Kuroda (Fujitsu)
Dear Sawada-san,

> Thank you for your interest in this patch. I've just shared some
> benchmark results (with a patch) that could be different depending on
> the environment[1]. I would be appreciated if you also do similar
> tests and share the results.

Okay, I did similar tests; the attached script is the test runner.
rb_mem_block_size was varied from 8kB to 8MB. The table below shows the
results (in milliseconds). Each cell is the average of 5 runs.

==========================
block size    time (ms)
==========================
8kB           12877.4
16kB          12829.1
32kB          11793.3
64kB          13134.4
128kB         13353.1
256kB         11664.0
512kB         12603.4
1MB           13443.8
2MB           12469.0
4MB           12651.4
8MB           12381.4
==========================

The standard deviation of the measurements was 100-500 ms; there were no
noticeable differences in my environment either.

Also, I've checked the statistics of the generation context and confirmed
that the number of allocated blocks is roughly 1000x higher with 8kB blocks
than with 8MB blocks. [1] shows the output from MemoryContextStats(), just
in case. IIUC, the difference in actual used space comes from the header of
each block: each block carries some management attributes, so the total
overhead grows with the number of blocks.

[1]
8kB
Tuples: 724959232 total in 88496 blocks (1000 chunks); 3328 free (0 chunks); 724955904 used
Grand total: 724959232 bytes in 88496 blocks; 3328 free (0 chunks); 724955904 used

8MB
Tuples: 721420288 total in 86 blocks (1000 chunks); 1415344 free (0 chunks); 720004944 used
Grand total: 721420288 bytes in 86 blocks; 1415344 free (0 chunks); 720004944 used
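
As a sanity check that the delta really is just block-header overhead
(assuming I am reading the Generation context structs correctly): the "used"
difference is 724955904 - 720004944 = 4950960 bytes spread over
88496 - 86 = 88410 extra blocks, i.e. exactly 56 bytes per block, which
matches sizeof(GenerationBlockData) on a typical 64-bit build.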

Best regards,
Hayato Kuroda
FUJITSU LIMITED



test.sh
Description: test.sh


FullTransactionIdAdvance question

2024-09-20 Thread Andy Fan


Hi,

static inline void
FullTransactionIdAdvance(FullTransactionId *dest)
{
dest->value++;

/* see FullTransactionIdAdvance() */
if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
return;

while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
dest->value++;
}

I understand this function as follows: 'dest->value++' increases the epoch
when necessary, and we don't want to use a TransactionId that is smaller
than FirstNormalTransactionId. But what is the point of the code below?

/* see FullTransactionIdAdvance() */
if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
return;

It looks to me like it will never be true (I added an 'Assert(false);' above
the return, and make check-world passes). And if it were somehow true,
returning an XID smaller than FirstNormalTransactionId looks strange as
well. IIUC, should we remove it to save a branch on each
GetNewTransactionId() call?

-- 
Best Regards
Andy Fan





Re: Allow logical failover slots to wait on synchronous replication

2024-09-20 Thread shveta malik
On Thu, Sep 19, 2024 at 12:02 PM Amit Kapila  wrote:
>
> On Tue, Sep 17, 2024 at 9:08 AM shveta malik  wrote:
> >
> > On Mon, Sep 16, 2024 at 4:04 PM Amit Kapila  wrote:
> > >
> > > On Mon, Sep 16, 2024 at 2:55 PM shveta malik  
> > > wrote:
> > > >
> > > > On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila  
> > > > wrote:
> > > > >
> > > >
> > > > > Another question aside from the above point, what if someone has
> > > > > specified logical subscribers in synchronous_standby_names? In the
> > > > > case of synchronized_standby_slots, we won't proceed with such slots.
> > > > >
> > > >
> > > > Yes, it is a possibility. I have missed this point earlier. Now I
> > > > tried a case where I give a mix of logical subscriber and physical
> > > > standby in 'synchronous_standby_names' on pgHead, it even takes that
> > > > 'mix' configuration and starts waiting accordingly.
> > > >
> > > > synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
> > > > phy_standby_2)';
> > > >
> > >
> > > This should not happen as we don't support syncing failover slots on
> > > logical subscribers.
> >
> > +1
> >
> > > The other point to consider here is that the user
> > > may not have set 'sync_replication_slots' on all the physical standbys
> > > mentioned in 'synchronous_standby_names' and in that case, it doesn't
> > > make sense to wait for WAL to get flushed on those standbys. What do
> > > you think?
> > >
> >
> > Yes, it is a possibility. But then it is a possibility in case of
> > 'synchronized_standby_slots' as well. User may always configure one of
> > the standbys in  'synchronized_standby_slots' while may not configure
> > slot-sync GUCs on that standby (hot_standby_feedback,
> > sync_replication_slots etc). In such a case, logical replication is
> > dependent upon the concerned physical standby even though latter is
> > not syncing failover slots.
> >
>
> The difference is that the purpose of 'synchronized_standby_slots' is
> to mention slot names for which the user expects logical walsenders to
> wait before sending the logical changes to subscribers. OTOH,
> 'synchronous_standby_names' has a different purpose as well. It is not
> clear to me if the users would be interested in syncing failover slots
> to all the standbys mentioned in 'synchronous_standby_names'.
>

Okay, I see your point. I'm not sure what the solution could be here
other than documenting it, but let me think more.

thanks
Shveta




Re: Conflict detection for update_deleted in logical replication

2024-09-20 Thread Amit Kapila
On Fri, Sep 20, 2024 at 8:25 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Apart from the vacuum_defer_cleanup_age idea.
>

I think you meant to say vacuum_committs_age idea.

> we’ve given more thought to our
> approach for retaining dead tuples and have come up with another idea that can
> reliably detect conflicts without requiring users to choose a wise value for
> the vacuum_committs_age. This new idea could also reduce the performance
> impact. Thanks a lot to Amit for off-list discussion.
>
> The concept of the new idea is that, the dead tuples are only useful to detect
> conflicts when applying *concurrent* transactions from remotes. Any subsequent
> UPDATE from a remote node after removing the dead tuples should have a later
> timestamp, meaning it's reasonable to detect an update_missing scenario and
> convert the UPDATE to an INSERT when applying it.
>
> To achieve above, we can create an additional replication slot on the
> subscriber side, maintained by the apply worker. This slot is used to retain
> the dead tuples. The apply worker will advance the slot.xmin after confirming
> that all the concurrent transaction on publisher has been applied locally.
>
> The process of advancing the slot.xmin could be:
>
> 1) the apply worker call GetRunningTransactionData() to get the
> 'oldestRunningXid' and consider this as 'candidate_xmin'.
> 2) the apply worker send a new message to walsender to request the latest wal
> flush position(GetFlushRecPtr) on publisher, and save it to
> 'candidate_remote_wal_lsn'. Here we could introduce a new feedback message or
> extend the existing keepalive message(e,g extends the requestReply bit in
> keepalive message to add a 'request_wal_position' value)
> 3) The apply worker can continue to apply changes. After applying all the WALs
> upto 'candidate_remote_wal_lsn', the apply worker can then advance the
> slot.xmin to 'candidate_xmin'.
>
> This approach ensures that dead tuples are not removed until all concurrent
> transactions have been applied. It can be effective for both bidirectional and
> non-bidirectional replication cases.
>
> We could introduce a boolean subscription option (retain_dead_tuples) to
> control whether this feature is enabled. Each subscription intending to detect
> update-delete conflicts should set retain_dead_tuples to true.
>

As each apply worker needs a separate slot to retain deleted rows, the
requirement for slots will increase. The other possibility is to
maintain one slot via the launcher or some other central process that
traverses all subscriptions and remembers the ones marked with
retain_dead_rows (let's call this list retain_sub_list). Then, using
running_transactions, get the oldest running xact, get the
remote flush location from the other node (the publisher node), and store
those as candidate values (candidate_xmin and
candidate_remote_wal_lsn) in the slot. We can probably reuse the existing
candidate variables of the slot. Next, we can check the remote_flush
locations from all the origins corresponding to retain_sub_list and, if
all are ahead of candidate_remote_wal_lsn, update the slot's
xmin to candidate_xmin.

I think in the above idea we can add an optimization to combine the
requests for the remote WAL LSN from different subscriptions pointing to
the same node, to avoid sending multiple requests to that node. I
am not sure if using pg_subscription.subconninfo is sufficient for
this; if not, we can probably leave out this optimization.

If this idea is feasible then it would reduce the number of slots
required to retain the deleted rows, but the launcher needs to get the
remote WAL location corresponding to each publisher node. There are
two ways to achieve that: (a) the launcher requests one of the apply
workers corresponding to subscriptions pointing to the same publisher
node to get this information; (b) the launcher launches another worker to
get the remote WAL flush location.
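
To restate that cycle in code form, here is a rough sketch. Every helper
below is a placeholder for a step described in this thread, not an existing
function:

/*
 * Illustrative sketch only -- the helpers are placeholders for the steps
 * described above, not real APIs.
 */
static void
advance_shared_retention_slot(void)
{
	TransactionId	candidate_xmin;
	XLogRecPtr		candidate_remote_wal_lsn;

	/* Oldest transaction still running locally, per GetRunningTransactionData(). */
	candidate_xmin = get_oldest_running_xid();

	/* Latest WAL flush position reported by the publisher node. */
	candidate_remote_wal_lsn = get_publisher_flush_lsn();

	/*
	 * Only once every origin in retain_sub_list has applied WAL beyond the
	 * candidate LSN can we be sure that all transactions concurrent with
	 * candidate_xmin have been applied; only then is it safe to advance
	 * the slot's xmin.
	 */
	if (all_retain_sub_origins_past(candidate_remote_wal_lsn))
		update_slot_xmin(candidate_xmin);
}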

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Support Int64 GUCs

2024-09-20 Thread Alexander Korotkov
Hi, Aleksander!

Thank you for your work on this subject.

On Thu, Sep 12, 2024 at 2:08 PM Aleksander Alekseev
 wrote:
> Attached is a self-sufficient patch extracted from a larger patchset
> [1]. The entire patchset probably will not proceed further in the
> nearest future. Since there was interest in this particular patch it
> deserves being discussed in a separate thread.
>
> Currently we support 32-bit integer values in GUCs, but don't support
> 64-bit ones. The proposed patch adds this support.
>
> Firstly, it adds DefineCustomInt64Variable() which can be used by the
> extension authors.
>
> Secondly, the following core GUCs are made 64-bit:
>
> ```
> autovacuum_freeze_min_age
> autovacuum_freeze_max_age
> autovacuum_freeze_table_age
> autovacuum_multixact_freeze_min_age
> autovacuum_multixact_freeze_max_age
> autovacuum_multixact_freeze_table_age
> ```
>
> I see several open questions with the patch in its current state.
>
> Firstly, I'm not sure if it is beneficial to affect the named GUCs out
> of the context of the larger patchset. Perhaps we have better GUCs
> that could benefit from being 64-bit? Or should we just leave alone
> the core GUCs and focus on providing DefineCustomInt64Variable() ?

It doesn't look like these *_age GUCs could benefit from being 64-bit,
before 64-bit transaction ids get in.  However, I think there are some
better candidates.

autovacuum_vacuum_threshold
autovacuum_vacuum_insert_threshold
autovacuum_analyze_threshold

These GUCs specify the number of tuples before vacuum/analyze.  That could
be more than 2^31.  With large tables of small tuples, I can't even
say it is always impractical to have values greater than 2^31.

> Secondly, DefineCustomInt64Variable() is not test-covered. Turned out
> it was not even defined (although declared) in the original patch.
> This was fixed in the attached version. Maybe one of the test modules
> could use it even if it makes little sense for this particular module?
> For instance, test/modules/worker_spi/ could use it for
> worker_spi.naptime.

I don't think there are good candidates among existing extension GUCs.
I think we could add something for pure testing purposes somewhere in
src/test/modules.
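
For example (a sketch only, assuming the proposed DefineCustomInt64Variable()
keeps the same argument order as the existing DefineCustomIntVariable(); the
module and GUC names here are made up), a test module could do:

#include "postgres.h"

#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static int64 test_int64_guc = 0;

void
_PG_init(void)
{
	DefineCustomInt64Variable("test_int64_guc.value",
							  "Test knob for 64-bit GUC support.",
							  NULL,
							  &test_int64_guc,
							  0,				/* boot value */
							  0,				/* min */
							  INT64CONST(12345678912345),	/* max */
							  PGC_USERSET,
							  0,				/* flags */
							  NULL, NULL, NULL);
}

That would give SHOW/SET coverage and a place to exercise values above
INT_MAX in regression tests.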

> Last but not least, large values like 12345678912345 could be
> difficult to read. Perhaps we should also support 12_345_678_912_345
> syntax? This is not implemented in the attached patch and arguably
> could be discussed separately when and if we merge it.

I also think we're good with 12345678912345 so far.

--
Regards,
Alexander Korotkov
Supabase




Why mention to Oracle ?

2024-09-20 Thread Marcos Pegoraro
Why do the PostgreSQL docs need to show or compare the Oracle way of doing
things?

I understand that on the "Porting from Oracle PL/SQL" page it is OK to
mention Oracle, but there are other places where it's not needed. Or, if it
is OK to mention it, why not also mention SQL Server, MySQL, or any other
system?

Bug Reporting Guidelines
Especially refrain from merely saying that “This is not what SQL
says/Oracle does.”

LOCK
the PostgreSQL lock modes and the LOCK TABLE syntax are compatible with
those present in Oracle.

SELECT
Applications written for Oracle frequently use a workaround involving the
automatically generated rownum column, which is not available in
PostgreSQL, to implement the effects of these clauses.

ROLLBACK TO SAVEPOINT
The SQL standard specifies that the key word SAVEPOINT is mandatory, but
PostgreSQL and Oracle allow it to be omitted

Data Type Formatting Functions
FM modifies only the next specification, while in Oracle FM affects all
subsequent specifications, and repeated FM modifiers toggle fill mode on
and off.

Data Type Formatting Functions
A sign formatted using SG, PL, or MI is not anchored to the number; for
example, to_char(-12, 'MI') produces '-  12' but to_char(-12, 'S')
produces '  -12'. (The Oracle implementation does not allow the use of MI
before 9, but rather requires that 9 precede MI.)

regards
Marcos


Re: not null constraints, again

2024-09-20 Thread Alvaro Herrera
On 2024-Sep-20, Tender Wang wrote:

> By the way, the v3  failed applying on Head(d35e293878)
> git am v3-0001-Catalog-not-null-constraints.patch
> Applying: Catalog not-null constraints
> error: patch failed: doc/src/sgml/ref/create_table.sgml:77

Yeah, there's a bunch of conflicts in current master.  I rebased
yesterday but I'm still composing the email for v4.  Coming soon.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: not null constraints, again

2024-09-20 Thread jian he
> > We only have this Synopsis
> > ALTER [ COLUMN ] column_name { SET | DROP } NOT NULL
>
> Yeah, this syntax is intended to add a "normal" not-null constraint,
> i.e. one that inherits.
>
> > --tests from src/test/regress/sql/inherit.sql
> > CREATE TABLE inh_nn_parent (a int, NOT NULL a NO INHERIT);
> > ALTER TABLE inh_nn_parent ALTER a SET NOT NULL;
> > current fail at ATExecSetNotNull
> > ERROR:  cannot change NO INHERIT status of NOT NULL constraint
> > "inh_nn_parent_a_not_null" on relation "inh_nn_parent"
>
> This is correct, because here you want a normal not-null constraint but
> the table already has the weird ones that don't inherit.
>

I found a case that, in a sense, kind of supports making it a no-op.
By no-op I mean that if the attribute is already not-null, ALTER COLUMN
... SET NOT NULL won't have any effect.
Or maybe there is a bug somewhere.

drop table if exists pp1;
create table pp1 (f1 int not null no inherit);
ALTER TABLE pp1 ALTER f1 SET NOT NULL;
ALTER TABLE ONLY pp1 ALTER f1 SET NOT NULL;

There is no child table, no partition, just a single regular table.
So, in this case, shouldn't it behave the same with or without ONLY?
Right now "ALTER TABLE ONLY" works, but "ALTER TABLE" errors out.

per sql-altertable.html:
name
The name (optionally schema-qualified) of an existing table to alter.
If ONLY is specified before the table name, only that table is
altered. If ONLY is not specified, the table and all its descendant
tables (if any) are altered.




diff --git a/doc/src/sgml/ref/create_table.sgml
b/doc/src/sgml/ref/create_table.sgml
index 93b3f664f2..57c4ecd93a 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -77,6 +77,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } |
UNLOGGED ] TABLE [ IF NOT EXI

 [ CONSTRAINT constraint_name ]
 { CHECK ( expression ) [
NO INHERIT ] |
+  NOT NULL column_name [
NO INHERIT ] |
   UNIQUE [ NULLS [ NOT ] DISTINCT ] ( column_name [, ... ] ) index_parameters |
   PRIMARY KEY ( column_name [, ... ] ) index_parameters |
   EXCLUDE [ USING index_method ] ( exclude_element WITH operator [, ... ] ) index_parameters [ WHERE (
predicate ) ] |

We can write either of:
create table pp1 (f1 int not null no inherit);
create table pp1 (f1 int, constraint nn not null f1 no inherit);

so shouldn't "NO INHERIT" be documented for both column_constraint and
table_constraint?




Re: SQLFunctionCache and generic plans

2024-09-20 Thread Alexander Korotkov
Hi, Alexander!

On Tue, Sep 3, 2024 at 10:33 AM Alexander Pyhalov
 wrote:
> Tom Lane писал(а) 2023-02-07 18:29:
> > Ronan Dunklau  writes:
> >> The following comment can be found in functions.c, about the
> >> SQLFunctionCache:
> >
> >>  * Note that currently this has only the lifespan of the calling
> >> query.
> >>  * Someday we should rewrite this code to use plancache.c to save
> >> parse/plan
> >>  * results for longer than that.
> >
> >> I would be interested in working on this, primarily to avoid this
> >> problem of
> >> having generic query plans for SQL functions but maybe having a longer
> >> lived
> >> cache as well would be nice to have.
> >> Is there any reason not too, or pitfalls we would like to avoid ?
> >
> > AFAIR it's just lack of round tuits.  There would probably be some
> > semantic side-effects, though if you pay attention you could likely
> > make things better while you are at it.  The existing behavior of
> > parsing and planning all the statements at once is not very desirable
> > --- for instance, it doesn't work to do
> >   CREATE TABLE foo AS ...;
> >   SELECT * FROM foo;
> > I think if we're going to nuke this code and start over, we should
> > try to make that sort of case work.
> >
> >   regards, tom lane
>
> Hi.
>
> I've tried to make SQL functions use CachedPlan machinery. The main goal
> was to allow SQL functions to use custom plans
> (the work was started from question - why sql function is so slow
> compared to plpgsql one). It turned out that
> plpgsql function used custom plan and eliminated scan of all irrelevant
> sections, but
> exec-time pruning didn't cope with pruning when ScalarArrayOpExpr,
> filtering data using int[] parameter.
>
> In current prototype there are two restrictions. The first one is that
> CachecPlan has lifetime of a query - it's not
> saved for future use, as we don't have something like plpgsql hashtable
> for long live function storage. Second -
> SQL language functions in sql_body form (with stored queryTree_list) are
> handled in the old way, as we currently lack
> tools to make cached plans from query trees.
>
> Currently this change solves the issue of inefficient plans for queries
> over partitioned tables. For example, function like
>
> CREATE OR REPLACE FUNCTION public.test_get_records(ids integer[])
>   RETURNS SETOF test
>   LANGUAGE sql
> AS $function$
>  select *
>   from test
>   where id = any (ids)
> $function$;
>
> for hash-distributed table test can perform pruning in plan time  and
> can have plan like
>
> Append  (cost=0.00..51.88 rows=26 width=36)
>->  Seq Scan on test_0 test_1  (cost=0.00..25.88 rows=13
> width=36)
>  Filter: (id = ANY ('{1,2}'::integer[]))
>->  Seq Scan on test_2  (cost=0.00..25.88 rows=13 width=36)
>  Filter: (id = ANY ('{1,2}'::integer[]))
>
> instead of
>
> Append  (cost=0.00..155.54 rows=248 width=36)
>->  Seq Scan on test_0 test_1  (cost=0.00..38.58 rows=62
> width=36)
>  Filter: (id = ANY ($1))
>->  Seq Scan on test_1 test_2  (cost=0.00..38.58 rows=62
> width=36)
>  Filter: (id = ANY ($1))
>->  Seq Scan on test_2 test_3  (cost=0.00..38.58 rows=62
> width=36)
>  Filter: (id = ANY ($1))
>->  Seq Scan on test_3 test_4  (cost=0.00..38.58 rows=62
> width=36)
>  Filter: (id = ANY ($1))
>
> This patch definitely requires more work, and I share it to get some
> early feedback.
>
> What should we do with "pre-parsed" SQL functions (when prosrc is
> empty)? How should we create cached plans when we don't have raw
> parsetrees?
> Currently we can create cached plans without raw parsetrees, but this
> means that plan revalidation doesn't work, choose_custom_plan()
> always returns false and we get generic plan. Perhaps, we need some form
> of GetCachedPlan(), which ignores raw_parse_tree?

I don't think you need a new form of GetCachedPlan().  Instead, it
seems that StmtPlanRequiresRevalidation() should be revised.  As I got
from comments and the d8b2fcc9d4 commit message, the primary goal was
to skip revalidation of utility statements.  Skipping revalidation was
a positive side effect, as long as we didn't support custom plans for
them anyway.  But as you're going to change this,
StmtPlanRequiresRevalidation() needs to be revised.

I also think it's not necessary to implement long-lived plan cache in
the initial patch.  The work could be split into two patches.  The
first could implement query lifetime plan cache.  This is beneficial
already by itself as you've shown by example.  The second could
implement long-lived plan cache.

I appreciate your work in this direction.  I hope you got the feedback
to go ahead and work on remaining issues.

--
Regards,
Alexander Korotkov
Supabase




Re: not null constraints, again

2024-09-20 Thread Alvaro Herrera
On 2024-Sep-20, jian he wrote:

> about set_attnotnull.
> 
> we can make set_attnotnull  look less recursive.
> instead of calling find_inheritance_children,
> let's just one pass, directly call  find_all_inheritors
> overall, I think it would be more intuitive.
> 
> please check the attached refactored set_attnotnull.
> regress test passed, i only test regress.

Hmm, what do we gain from doing this change?  It's longer in number of
lines of code, and it's not clear to me that it is simpler.

> I am also beginning to wonder if ATExecSetNotNull inside can also call
> find_all_inheritors.

The point of descending levels one by one in ATExecSetNotNull is that we
can stop for any child on which a constraint already exists.  We don't
need to scan any children thereof, which saves work.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"




Re: [PATCH] Support Int64 GUCs

2024-09-20 Thread Alexander Korotkov
On Thu, Sep 12, 2024 at 2:29 PM Pavel Borisov  wrote:
> Hi, Alexander!
> Thank you for working on this!
>
> On Thu, 12 Sept 2024 at 15:08, Aleksander Alekseev  
> wrote:
>>
>> Hi,
>>
>> Attached is a self-sufficient patch extracted from a larger patchset
>> [1]. The entire patchset probably will not proceed further in the
>> nearest future. Since there was interest in this particular patch it
>> deserves being discussed in a separate thread.
>>
>> Currently we support 32-bit integer values in GUCs, but don't support
>> 64-bit ones. The proposed patch adds this support.
>>
>> Firstly, it adds DefineCustomInt64Variable() which can be used by the
>> extension authors.
>>
>> Secondly, the following core GUCs are made 64-bit:
>>
>> ```
>> autovacuum_freeze_min_age
>> autovacuum_freeze_max_age
>> autovacuum_freeze_table_age
>> autovacuum_multixact_freeze_min_age
>> autovacuum_multixact_freeze_max_age
>> autovacuum_multixact_freeze_table_age
>> ```
>>
>> I see several open questions with the patch in its current state.
>>
>> Firstly, I'm not sure if it is beneficial to affect the named GUCs out
>> of the context of the larger patchset. Perhaps we have better GUCs
>> that could benefit from being 64-bit? Or should we just leave alone
>> the core GUCs and focus on providing DefineCustomInt64Variable() ?
>
> I think the direction is good and delivering 64-bit GUCs is very much worth 
> committing.
> The patch itself looks good, but we could need to add locks against 
> concurrently modifying 64-bit values, which could be non-atomic on older 
> architectures.

GUCs are located in local memory, so no concurrent reads/writes of
them are possible.  It might happen that SIGHUP arrives during a
read/write of a GUC variable.  But that's protected in another way:
SignalHandlerForConfigReload() just sets the ConfigReloadPending flag,
which is processed during CHECK_FOR_INTERRUPTS().  So, I don't see a
need for locks here.
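
For readers less familiar with that pattern, here is a generic,
self-contained sketch of the "signal handler only sets a flag" approach; it
illustrates the idea rather than the actual PostgreSQL code:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t reload_pending = 0;

/* The handler does the bare minimum: record that a reload was requested. */
static void
handle_sighup(int signo)
{
	(void) signo;
	reload_pending = 1;
}

int
main(void)
{
	signal(SIGHUP, handle_sighup);

	for (;;)
	{
		/* ... normal work happens here ... */
		sleep(1);

		/*
		 * Equivalent in spirit to CHECK_FOR_INTERRUPTS(): the flag is acted
		 * on at a well-defined point, so a value is never reloaded in the
		 * middle of being read.
		 */
		if (reload_pending)
		{
			reload_pending = 0;
			printf("reloading configuration\n");
		}
	}
	return 0;
}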

--
Regards,
Alexander Korotkov
Supabase




Re: Documentation to upgrade logical replication cluster

2024-09-20 Thread vignesh C
On Fri, 20 Sept 2024 at 16:24, Amit Kapila  wrote:
>
> On Mon, May 6, 2024 at 10:40 AM vignesh C  wrote:
> >
> > The v9 version patch was not applying on top of HEAD because of few
> > commits, the updated v10 version patch is rebased on top of HEAD.
> >
>
> Let's say publisher is in node1 and subscriber is
> +  in node2. The subscriber node2 
> has
> +  two subscriptions sub1_node1_node2 and
> +  sub2_node1_node2 which are subscribing the changes
> +  from node1.
>
> Do we need to show multiple subscriptions? You are following the same
> steps for both subscriptions, so it may not add much value to show
> steps for two subscriptions. You can write steps for one and add a
> note to say it has to be done for other subscriptions present.

I didn't include a note because each disable/enable statement already
specifies: a) disable all subscriptions on the node, b) enable all
subscriptions on the node. The attached v11 version patch shows the
examples with just one subscription.

Regards,
Vignesh
From 3198456c7b9dfafe05d5937733ec10da45e787ed Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Mon, 6 May 2024 10:29:01 +0530
Subject: [PATCH v11] Documentation for upgrading logical replication cluster.

Documentation for upgrading logical replication cluster.
---
 doc/src/sgml/glossary.sgml|  10 +
 doc/src/sgml/logical-replication.sgml | 772 ++
 doc/src/sgml/ref/pgupgrade.sgml   | 131 +
 3 files changed, 790 insertions(+), 123 deletions(-)

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 405fe6dc8b..f54f25c1c6 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -1069,6 +1069,16 @@

   
 
+  
+   Logical replication cluster
+   
+
+ A set of publisher and subscriber instances with the publisher instance
+ replicating changes to the subscriber instance.
+
+   
+  
+
   
Log record
 
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index df62eb45ff..ae7271e92e 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -2231,6 +2231,778 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
 
  
 
+ 
+  Upgrade
+
+  
+   Migration of logical replication clusters
+   is possible only when all the members of the old logical replication
+   clusters are version 17.0 or later.
+  
+
+  
+   Prepare for publisher upgrades
+
+   
+pg_upgrade attempts to migrate logical
+slots. This helps avoid the need for manually defining the same
+logical slots on the new publisher. Migration of logical slots is
+only supported when the old cluster is version 17.0 or later.
+Logical slots on clusters before version 17.0 will silently be
+ignored.
+   
+
+   
+Before you start upgrading the publisher cluster, ensure that the
+subscription is temporarily disabled, by executing
+ALTER SUBSCRIPTION ... DISABLE.
+Re-enable the subscription after the upgrade.
+   
+
+   
+There are some prerequisites for pg_upgrade to
+be able to upgrade the logical slots. If these are not met an error
+will be reported.
+   
+
+   
+
+ 
+  The new cluster must have
+  wal_level as
+  logical.
+ 
+
+
+ 
+  The new cluster must have
+  max_replication_slots
+  configured to a value greater than or equal to the number of slots
+  present in the old cluster.
+ 
+
+
+ 
+  The output plugins referenced by the slots on the old cluster must be
+  installed in the new PostgreSQL executable directory.
+ 
+
+
+ 
+  The old cluster has replicated all the transactions and logical decoding
+  messages to subscribers.
+ 
+
+
+ 
+  All slots on the old cluster must be usable, i.e., there are no slots
+  whose
+  pg_replication_slots.conflicting
+  is not true.
+ 
+
+
+ 
+  The new cluster must not have permanent logical slots, i.e.,
+  there must be no slots where
+  pg_replication_slots.temporary
+  is false.
+ 
+
+   
+  
+
+  
+   Prepare for subscriber upgrades
+
+   
+Setup the 
+subscriber configurations in the new subscriber.
+pg_upgrade attempts to migrate subscription
+dependencies which includes the subscription's table information present in
+pg_subscription_rel
+system catalog and also the subscription's replication origin. This allows
+logical replication on the new subscriber to continue from where the
+old subscriber was up to. Migration of subscription dependencies is only
+supported when the old cluster is version 17.0 or later. Subscription
+dependencies on clusters before version 17.0 will silently be ignored.
+   
+
+   
+There are some prerequisites for pg_upgrade to
+be able to upgrade the subscriptions. If these are not met an error
+will be repor

Re: attndims, typndims still not enforced, but make the value within a sane threshold

2024-09-20 Thread Junwang Zhao
Hi Tom and Michael,

On Fri, Sep 20, 2024 at 12:38 PM Tom Lane  wrote:
>
> Michael Paquier  writes:
> > On Fri, Sep 20, 2024 at 11:51:49AM +0800, Junwang Zhao wrote:
> >> Should you also bump the catalog version?
>
> > No need to worry about that when sending a patch because committers
> > take care of that when merging a patch into the tree.  Doing that in
> > each patch submitted just creates more conflicts and work for patch
authors because they'd need to resolve conflicts each time a
> > catversion bump happens.  And that can happen on a daily basis
> > sometimes depending on what is committed.
>
> Right.  Sometimes the committer forgets to do that :-(, which is
> not great but it's not normally a big problem either.  We've concluded
> it's better to err in that direction than impose additional work
> on patch submitters.
>
> If you feel concerned about the point, best practice is to include a
> mention that catversion bump is needed in your draft commit message.
>
> regards, tom lane

Got it, thanks for both of your explanations.

-- 
Regards
Junwang Zhao




Re: not null constraints, again

2024-09-20 Thread jian he
About set_attnotnull:

We can make set_attnotnull look less recursive.
Instead of calling find_inheritance_children,
let's just do one pass and directly call find_all_inheritors;
overall, I think it would be more intuitive.

Please check the attached refactored set_attnotnull.
The regression tests passed (I only ran the regress suite).

I am also beginning to wonder whether ATExecSetNotNull could internally
call find_all_inheritors as well.
static void
set_attnotnull(List **wqueue, Relation rel, AttrNumber attnum, bool recurse,
			   LOCKMODE lockmode)
{
	HeapTuple	tuple;
	Form_pg_attribute attForm;
	bool		changed = false;

	tuple = SearchSysCacheCopyAttNum(RelationGetRelid(rel), attnum);
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for attribute %d of relation %u",
			 attnum, RelationGetRelid(rel));
	attForm = (Form_pg_attribute) GETSTRUCT(tuple);
	if (!attForm->attnotnull)
	{
		Relation	attr_rel;

		attr_rel = table_open(AttributeRelationId, RowExclusiveLock);

		attForm->attnotnull = true;
		CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);

		table_close(attr_rel, RowExclusiveLock);

		/*
		 * And set up for existing values to be checked, unless another
		 * constraint already proves this.
		 */
		if (wqueue && !NotNullImpliedByRelConstraints(rel, attForm))
		{
			AlteredTableInfo *tab;

			tab = ATGetQueueEntry(wqueue, rel);
			tab->verify_new_notnull = true;
		}

		changed = true;
	}

	if (recurse)
	{
		List	   *child_oids;
		Relation	childrel;
		AttrNumber	childattno;
		HeapTuple	rel_tuple;
		Form_pg_attribute child_attForm;
		const char *attrname;

		attrname = get_attname(RelationGetRelid(rel), attnum, false);
		child_oids = find_all_inheritors(RelationGetRelid(rel), lockmode,
		 NULL);

		if (changed)
			CommandCounterIncrement();

		foreach_oid(childrelid, child_oids)
		{
			/* we already dealt with the parent rel above */
			if (childrelid == RelationGetRelid(rel))
				continue;

			childrel = table_open(childrelid, NoLock);
			CheckAlterTableIsSafe(childrel);

			childattno = get_attnum(childrelid, attrname);
			rel_tuple = SearchSysCacheCopyAttNum(childrelid, childattno);

			if (!HeapTupleIsValid(rel_tuple))
				elog(ERROR, "cache lookup failed for attribute %d of relation %s",
					 attnum, RelationGetRelationName(childrel));

			child_attForm = (Form_pg_attribute) GETSTRUCT(rel_tuple);
			if (!child_attForm->attnotnull)
			{
				Relation	attr_rel;

				attr_rel = table_open(AttributeRelationId, RowExclusiveLock);

				child_attForm->attnotnull = true;
				CatalogTupleUpdate(attr_rel, &rel_tuple->t_self, rel_tuple);

				table_close(attr_rel, RowExclusiveLock);

				if (wqueue && !NotNullImpliedByRelConstraints(childrel, child_attForm))
				{
					AlteredTableInfo *tab;

					tab = ATGetQueueEntry(wqueue, childrel);
					tab->verify_new_notnull = true;
				}
				changed = true;
			}
			changed = false;
			table_close(childrel, NoLock);
		}
	}
}

Re: Pgoutput not capturing the generated columns

2024-09-20 Thread Shubham Khanna
On Tue, Sep 17, 2024 at 1:14 PM Peter Smith  wrote:
>
> Here are some review comments for v31-0001 (for the docs only)
>
> There may be some overlap here with some comments already made for
> v30-0001 which are not yet addressed in v31-0001.
>
> ==
> Commit message
>
> 1.
> When introducing the 'publish_generated_columns' parameter, you must
> also say this is a PUBLICATION parameter.
>
> ~~~
>
> 2.
> With this enhancement, users can now include the 'include_generated_columns'
> option when querying logical replication slots using either the pgoutput
> plugin or the test_decoding plugin. This option, when set to 'true' or '1',
> instructs the replication system to include generated column information
> and data in the replication stream.
>
> ~
>
> The above is stale information because it still refers to the old name
> 'include_generated_columns', and to test_decoding which was already
> removed in this patch.
>
> ==
> doc/src/sgml/ddl.sgml
>
> 3.
> +  Generated columns may be skipped during logical replication
> according to the
> +  CREATE PUBLICATION option
> +   linkend="sql-createpublication-params-with-include-generated-columns">
> +  publish_generated_columns.
>
> 3a.
> nit - The linkend is based on the old name instead of the new name.
>
> 3b.
> nit - Better to call this a parameter instead of an option because
> that is what the CREATE PUBLICATION docs call it.
>
> ==
> doc/src/sgml/protocol.sgml
>
> 4.
> +
> + publish_generated_columns
> +  
> +   
> +Boolean option to enable generated columns. This option controls
> +whether generated columns should be included in the string
> +representation of tuples during logical decoding in PostgreSQL.
> +   
> +  
> +
> +
>
> Is this even needed anymore? Now that the implementation is using a
> PUBLICATION parameter, isn't everything determined just by that
> parameter? I don't see the reason why a protocol change is needed
> anymore. And, if there is no protocol change needed, then this
> documentation change is also not needed.
>
> 
>
> 5.
>   
> -  Next, the following message part appears for each column included in
> -  the publication (except generated columns):
> +  Next, the following message parts appear for each column included in
> +  the publication (generated columns are excluded unless the parameter
> +  
> +  publish_generated_columns specifies 
> otherwise):
>   
>
> Like the previous comment above, I think everything is now determined
> by the PUBLICATION parameter. So maybe this should just be referring
> to that instead.
>
> ==
> doc/src/sgml/ref/create_publication.sgml
>
> 6.
> +id="sql-createpublication-params-with-include-generated-columns">
> +publish_generated_columns
> (boolean)
> +
>
> nit - the ID is based on the old parameter name.
>
> ~
>
> 7.
> + 
> +  This option is only available for replicating generated
> column data from the publisher
> +  to a regular, non-generated column in the subscriber.
> + 
>
> IMO remove this paragraph. I really don't think you should be
> mentioning the subscriber here at all. AFAIK this parameter is only
> for determining if the generated column will be published or not. What
> happens at the other end (e.g. logic whether it gets ignored or not by
> the subscriber) is more like a matrix of behaviours that could be
> documented in the "Logical Replication" section. But not here.
>
> (I removed this in my nitpicks attachment)
>
> ~~~
>
> 8.
> + 
> + This parameter can only be set true if
> copy_data is
> + set to false.
> + 
>
> IMO remove this paragraph too. The user can create a PUBLICATION
> before a SUBSCRIPTION even exists so to say it "can only be set..." is
> not correct. Sure, your patch 0001 does not support the COPY of
> generated columns but if you want to document that then it should be
> documented in the CREATE SUBSCRIBER docs. But not here.
>
> (I removed this in my nitpicks attachment)
>
> TBH, it would be better if patches 0001 and 0002 were merged then you
> can avoid all this. IIUC they were only separate in the first place
> because 2 different people wrote them. It is not making reviews easier
> with them split.
>
> ==
>
> Please see the attachment which implements some of the nits above.
>

I have addressed all the comments in the v32-0001 patch. Please refer
to the updated v32-0001 patch in [1] for the changes made.

[1] 
https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-20 Thread Shubham Khanna
On Tue, Sep 17, 2024 at 3:12 PM Peter Smith  wrote:
>
> Review comments for v31-0001.
>
> (I tried to give only new comments, but there might be some overlap
> with comments I previously made for v30-0001)
>
> ==
> src/backend/catalog/pg_publication.c
>
> 1.
> +
> + if (publish_generated_columns_given)
> + {
> + values[Anum_pg_publication_pubgencolumns - 1] =
> BoolGetDatum(publish_generated_columns);
> + replaces[Anum_pg_publication_pubgencolumns - 1] = true;
> + }
>
> nit - unnecessary whitespace above here.
>
> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 2. prepare_all_columns_bms
>
> + /* Iterate the cols until generated columns are found. */
> + cols = bms_add_member(cols, i + 1);
>
> How does the comment relate to the statement that follows it?
>
> ~~~
>
> 3.
> + * Skip generated column if pubgencolumns option was not
> + * specified.
>
> nit - /pubgencolumns option/publish_generated_columns parameter/
>
> ==
> src/bin/pg_dump/pg_dump.c
>
> 4.
> getPublications:
>
> nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)
>
> ==
> src/bin/pg_dump/pg_dump.h
>
> 5.
> + bool pubgencolumns;
>  } PublicationInfo;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ==
> vsrc/bin/psql/describe.c
>
> 6.
>   bool has_pubviaroot;
> + bool has_pubgencol;
>
> nit - /has_pubgencol/has_pubgencols/ (plural consistency)
>
> ==
> src/include/catalog/pg_publication.h
>
> 7.
> + /* true if generated columns data should be published */
> + bool pubgencolumns;
>  } FormData_pg_publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ~~~
>
> 8.
> + bool pubgencolumns;
>   PublicationActions pubactions;
>  } Publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ==
> src/test/regress/sql/publication.sql
>
> 9.
> +-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
> +\dRp+ pub1
> +
> +CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
> +\dRp+ pub2
>
> 9a.
> nit - Use lowercase for the parameters.
>
> ~
>
> 9b.
> nit - Fix the comment to say what the test is actually doing:
> "Test the publication 'publish_generated_columns' parameter enabled or 
> disabled"
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 10.
> Later I think you should add another test here to cover the scenario
> that I was discussing with Sawada-San -- e.g. when there are 2
> publications for the same table subscribed by just 1 subscription but
> having different values of the 'publish_generated_columns' for the
> publications.
>

I have addressed all the comments in the v32-0001 patch. Please refer
to the updated v32-0001 patch in [1] for the changes made.

[1] 
https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-20 Thread Shubham Khanna
On Wed, Sep 18, 2024 at 8:58 AM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v31-0002.
>
> ==
>
> 1. General.
>
> IMO patches 0001 and 0002 should be merged when next posted. IIUC the
> reason for the split was only because there were 2 different authors
> but that seems to be not relevant anymore.
>
> ==
> Commit message
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true, we need to copy using the syntax:
> 'COPY (SELECT column_name FROM table_name) TO STDOUT'.
>
> ~
>
> 2a.
> Should clarify that 'copy_data' is a SUBSCRIPTION parameter.
>
> 2b.
> Should clarify that 'publish_generated_columns' is a PUBLICATION parameter.
>
> ==
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> 3.
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> + localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
>
> Each time I review this code I am tricked into thinking it is wrong to
> use rel->remoterel.natts here for the localgenlist. AFAICT the code is
> actually fine because you do not store *all* the subscriber gencols in
> 'localgenlist' -- you only store those with matching names on the
> publisher table. It might be good if you could add an explanatory
> comment about that to prevent any future doubts.
>
> ~~~
>
> 4.
> + if (!remotegenlist[remote_attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" has a
> generated column \"%s\" "
> + "but corresponding column on source relation is not a generated column",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname;
>
> This error message has lots of good information. OTOH, I think when
> copy_data=false the error would report the subscriber column just as
> "missing", which is maybe less helpful. Perhaps that other
> copy_data=false "missing" case can be improved to share the same error
> message that you have here.
>

This comment is still open. Will fix this in the next set of patches.

> ~~~
>
> fetch_remote_table_info:
>
> 5.
> IIUC, this logic needs to be more sophisticated to handle the case
> that was being discussed earlier with Sawada-san [1]. e.g. when the
> same table has gencols but there are multiple subscribed publications
> where the 'publish_generated_columns' parameter differs.
>
> Also, you'll need test cases for this scenario, because it is too
> difficult to judge correctness just by visual inspection of the code.
>
> 
>
> 6.
> nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
> readability, and initialize it to 'false' to make it easy to use
> later.
>
> ~~~
>
> 7.
> - * Get column lists for each relation.
> + * Get column lists for each relation and check if any of the publication
> + * has generated column option.
>
> and
>
> + /* Check if any of the publication has generated column option */
> + if (server_version >= 18)
>
> nit - tweak the comments to name the publication parameter properly.
>
> ~~~
>
> 8.
> foreach(lc, MySubscription->publications)
> {
> if (foreach_current_index(lc) > 0)
> appendStringInfoString(&pub_names, ", ");
> appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc;
> }
>
> I know this is existing code, but shouldn't all this be done by using
> the purpose-built function 'get_publications_str'
>
> ~~~
>
> 9.
> + ereport(ERROR,
> + errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not fetch gencolumns information from publication list: %s",
> +pub_names.data));
>
> and
>
> + errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("failed to fetch tuple for gencols from publication list: %s",
> +pub_names.data));
>
> nit - /gencolumns information/generated column publication
> information/ to make the errmsg more human-readable
>
> ~~~
>
> 10.
> + bool gencols_allowed = server_version >= 18 && hasgencolpub;
> +
> + if (!gencols_allowed)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
>
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> ==
>
> Please refer to the attachment which implements some of the nits
> mentioned above.
>
> ==
> [1] 
> https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com
>

I have addressed the comments in the v32-0002 patch. Please refer to
the updated v32-0002 patch in [1] for the changes made.

[1] 
https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Should rolpassword be toastable?

2024-09-20 Thread Jonathan S. Katz

On 9/20/24 1:23 AM, Michael Paquier wrote:

On Thu, Sep 19, 2024 at 09:46:00PM -0500, Nathan Bossart wrote:

On Thu, Sep 19, 2024 at 07:37:55PM -0400, Jonathan S. Katz wrote:

Shouldn't we enforce the limit in every case in encrypt_password,
not just this one?  (I do agree that encrypt_password is an okay
place to enforce it.)


Yeah, that seems like a good idea.  I've attached a more fleshed-out patch
set that applies the limit in all cases.


Not sure.  Is this really something we absolutely need?  Sure, this
generates a better error when inserting a record too long to
pg_authid, but removing the toast relation is enough to avoid the
problems one would see when authenticating.  Not sure if this argument
is enough to count as an objection, just sharing some doubts :)


The errors from lack of TOAST are confusing to users. Why can't we have 
a user friendly error here?


Jonathan




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-20 Thread Tomas Vondra
On 9/18/24 20:52, Peter Geoghegan wrote:
> On Wed, Sep 18, 2024 at 7:36 AM Tomas Vondra  wrote:
>> Makes sense. I started with the testing before before even looking at
>> the code, so it's mostly a "black box" approach. I did read the 1995
>> paper before that, and the script generates queries with clauses
>> inspired by that paper, in particular:
> 
> I think that this approach with black box testing is helpful, but also
> something to refine over time. Gray box testing might work best.
> 
>> OK, understood. If it's essentially an independent issue (perhaps even
>> counts as a bug?) what about correcting it on master first? Doesn't
>> sound like something we'd backpatch, I guess.
> 
> What about backpatching it to 17?
> 
> As things stand, you can get quite contradictory counts of the number
> of index scans due to irrelevant implementation details from parallel
> index scan. It just looks wrong, particularly on 17, where it is
> reasonable to expect near exact consistency between parallel and
> serial scans of the same index.
> 

Yes, I think backpatching to 17 would be fine. I'd be worried about
maybe disrupting some monitoring in production systems, but for 17 that
shouldn't be a problem yet. So fine with me.

FWIW I wonder how likely it is that someone has some sort of alerting
tied to this counter. I'd bet few people do. It's probably more about a
couple of people looking at explain plans, but they'll be confused even if
we change that only starting with 17.

>> Seems like a bit of a mess. IMHO we should either divide everything by
>> nloops (so that everything is "per loop", or not divide anything. My
>> vote would be to divide, but that's mostly my "learned assumption" from
>> the other fields. But having a 50:50 split is confusing for everyone.
> 
> My idea was that it made most sense to follow the example of
> "Buffers:", since both describe physical costs.
> 
> Honestly, I'm more than ready to take whatever the path of least
> resistance is. If dividing by nloops is what people want, I have no
> objections.
> 

I don't have a strong opinion on this. I just know I'd be confused by
half the counters being total and half /loop, but chances are other
people would disagree.

>>> It's worth having skip support (the idea comes from the MDAM paper),
>>> but it's not essential. Whether or not an opclass has skip support
>>> isn't accounted for by the cost model, but I doubt that it's worth
>>> addressing (the cost model is already pessimistic).
>>>
>>
>> I admit I'm a bit confused. I probably need to reread the paper, but my
>> impression was that the increment/decrement is required for skipscan to
>> work. If we can't do that, how would it generate the intermediate values
>> to search for? I imagine it would be possible to "step through" the
>> index, but I thought the point of skip scan is to not do that.
> 
> I think that you're probably still a bit confused because the
> terminology in this area is a little confusing. There are two ways of
> explaining the situation with types like text and numeric (types that
> lack skip support). The two explanations might seem to be
> contradictory, but they're really not, if you think about it.
> 
> The first way of explaining it, which focuses on how the scan moves
> through the index:
> 
> For a text index column "a", and an int index column "b", skip scan
> will work like this for a query with a qual "WHERE b = 55":
> 
> 1. Find the first/lowest sorting "a" value in the index. Let's say
> that it's "Aardvark".
> 
> 2. Look for matches "WHERE a = 'Aardvark' and b = 55", possibly
> returning some matches.
> 
> 3. Find the next value after "Aardvark" in the index using a probe
> like the one we'd use for a qual "WHERE a > 'Aardvark'". Let's say
> that it turns out to be "Abacus".
> 
> 4. Look for matches "WHERE a = 'Abacus' and b = 55"...
> 
> ... (repeat these steps until we've exhaustively processed every
> existing "a" value in the index)...

Ah, OK. So we do probe the index like this. I was under the impression
we don't do that. But yeah, this makes sense.
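
To be sure I'm reading it right, the shape of the example would be roughly
the following -- the table and index definitions here are my own sketch, not
something you posted:

  -- assumed schema: text column "a", int column "b", composite index on (a, b)
  CREATE TABLE t (a text, b int);
  CREATE INDEX t_a_b_idx ON t (a, b);

  -- only "b" is constrained; the skip scan probes the index once per
  -- distinct value of "a" instead of walking the whole leaf level
  SELECT * FROM t WHERE b = 55;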

> 
> The second way of explaining it, which focuses on how the skip arrays
> advance. Same query (and really the same behavior) as in the first
> explanation:
> 
> 1. Skip array's initial value is the sentinel -inf, which cannot
> possibly match any real index tuple, but can still guide the search.
> So we search for tuples "WHERE a = -inf AND b = 55" (actually we don't
> include the "b = 55" part, since it is unnecessary, but conceptually
> it's a part of what we search for within _bt_first).
> 
> 2. Find that the index has no "a" values matching -inf (it inevitably
> cannot have any matches for -inf), but we do locate the next highest
> match. The closest matching value is "Aardvark". The skip array on "a"
> therefore advances from -inf to "Aardvark".
> 
> 3. Look for matches "WHERE a = 'Aardvark' and b = 55", possibly
> returning some matches.
> 
> 4. Reach tuples after the last match for "WHERE a = 'Aard

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-20 Thread Peter Geoghegan
On Fri, Sep 20, 2024 at 9:45 AM Tomas Vondra  wrote:
> 3) restart cluster, drop caches
>
> 4) run the query from the SQL script
>
> I suspect you don't do (3). I didn't mention this explicitly, my message
> only said "with uncached data", so maybe that's the problem?

You're right that I didn't do step 3 here. I'm generally in the habit
of using fully cached data when testing this kind of work.

The only explanation I can think of is that (at least on your
hardware) OS readahead helps the master branch more than skipping
helps the patch. That's surprising, but I guess it's possible here
because skip scan only needs to access about every third page. And
because this particular index was generated by CREATE INDEX, and so
happens to have a strong correlation between key space order and
physical block order. And probably because this is an index-only scan.

> I wasn't suggesting it's a sympathetic case for skipscan. My point is
> that it perfectly matches the costing assumptions, i.e. columns are
> independent etc. But if it's not sympathetic, maybe the cost shouldn't
> be 1/5 of cost from master?

The costing is pretty accurate if we assume cached data, though --
which is what the planner will actually assume. In any case, is that
really the only problem you see here? That the costing might be
inaccurate because it fails to account for some underlying effect,
such as the influence of OS readahead?

Let's assume for a moment that the regression is indeed due to
readahead effects, and that we deem it to be unacceptable. What can be
done about it? I have a really hard time thinking of a fix, since by
most conventional measures skip scan is indeed much faster here.

-- 
Peter Geoghegan




Re: Clock-skew management in logical replication

2024-09-20 Thread Tom Lane
Nisha Moond  writes:
> While considering the implementation of timestamp-based conflict
> resolution (last_update_wins) in logical replication (see [1]), there
> was a feedback at [2] and the discussion on whether or not to manage
> clock-skew at database level.

FWIW, I cannot see why we would do anything beyond suggesting that
people run NTP.  That's standard anyway on the vast majority of
machines these days.  Why would we add complexity that we have
to maintain (and document) in order to cater to somebody not doing
that?

regards, tom lane




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-20 Thread Tomas Vondra
On 9/20/24 16:21, Peter Geoghegan wrote:
> On Fri, Sep 20, 2024 at 9:45 AM Tomas Vondra  wrote:
>> 3) restart cluster, drop caches
>>
>> 4) run the query from the SQL script
>>
>> I suspect you don't do (3). I didn't mention this explicitly, my message
>> only said "with uncached data", so maybe that's the problem?
> 
> You're right that I didn't do step 3 here. I'm generally in the habit
> of using fully cached data when testing this kind of work.
> 
> The only explanation I can think of is that (at least on your
> hardware) OS readahead helps the master branch more than skipping
> helps the patch. That's surprising, but I guess it's possible here
> because skip scan only needs to access about every third page. And
> because this particular index was generated by CREATE INDEX, and so
> happens to have a strong correlation between key space order and
> physical block order. And probably because this is an index-only scan.
> 

Good idea. Yes, it does seem to be due to readahead - if I disable that,
the query takes ~320ms on master and ~280ms with the patch.

>> I wasn't suggesting it's a sympathetic case for skipscan. My point is
>> that it perfectly matches the costing assumptions, i.e. columns are
>> independent etc. But if it's not sympathetic, maybe the cost shouldn't
>> be 1/5 of cost from master?
> 
> The costing is pretty accurate if we assume cached data, though --
> which is what the planner will actually assume. In any case, is that
> really the only problem you see here? That the costing might be
> inaccurate because it fails to account for some underlying effect,
> such as the influence of OS readahead?
> 
> Let's assume for a moment that the regression is indeed due to
> readahead effects, and that we deem it to be unacceptable. What can be
> done about it? I have a really hard time thinking of a fix, since by
> most conventional measures skip scan is indeed much faster here.
> 

It does seem to be due to readahead, and the costing not accounting for
these effects. And I don't think it's unacceptable - I don't think we
consider readahead elsewhere, and it certainly is not something I'd
expect this patch to fix. So I think it's fine.

Ultimately, I think this should be "fixed" by explicitly prefetching
pages. My index prefetching patch won't really help, because AFAIK this
is about index pages. And I don't know how feasible it is.


regards

-- 
Tomas Vondra




Re: Should rolpassword be toastable?

2024-09-20 Thread Nathan Bossart
On Fri, Sep 20, 2024 at 10:06:28AM -0400, Jonathan S. Katz wrote:
> On 9/20/24 1:23 AM, Michael Paquier wrote:
>> Not sure.  Is this really something we absolutely need?  Sure, this
>> generates a better error when inserting a record too long to
>> pg_authid, but removing the toast relation is enough to avoid the
>> problems one would see when authenticating.  Not sure if this argument
>> is enough to count as an objection, just sharing some doubts :)
> 
> The errors from lack of TOAST are confusing to users. Why can't we have a
> user friendly error here?

If I wanted to argue against adding a user-friendly error, I'd point out
that it's highly unlikely anyone is actually trying to use super long
hashes unless they are trying to break things, and it's just another
arbitrary limit that we'll need to maintain/enforce.  But on the off-chance
that someone is building a custom driver that generates long hashes for
whatever reason, I'd imagine that a clear error would be more helpful than
"row is too big."

Here is a v3 patch set that fixes the test comment and a compiler warning
in cfbot.

-- 
nathan
>From bb3aad9105b1997bc088403437ac6294e22809d9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 19 Sep 2024 20:59:10 -0500
Subject: [PATCH v3 1/2] place limit on password hash length

---
 src/backend/libpq/crypt.c  | 60 ++
 src/include/libpq/crypt.h  | 10 +
 src/test/regress/expected/password.out |  7 +++
 src/test/regress/sql/password.sql  |  4 ++
 4 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/src/backend/libpq/crypt.c b/src/backend/libpq/crypt.c
index 629e51e00b..753e5c11da 100644
--- a/src/backend/libpq/crypt.c
+++ b/src/backend/libpq/crypt.c
@@ -116,7 +116,7 @@ encrypt_password(PasswordType target_type, const char *role,
 const char *password)
 {
PasswordType guessed_type = get_password_type(password);
-   char   *encrypted_password;
+   char   *encrypted_password = NULL;
const char *errstr = NULL;
 
if (guessed_type != PASSWORD_TYPE_PLAINTEXT)
@@ -125,32 +125,56 @@ encrypt_password(PasswordType target_type, const char 
*role,
 * Cannot convert an already-encrypted password from one format 
to
 * another, so return it as it is.
 */
-   return pstrdup(password);
+   encrypted_password = pstrdup(password);
}
-
-   switch (target_type)
+   else
{
-   case PASSWORD_TYPE_MD5:
-   encrypted_password = palloc(MD5_PASSWD_LEN + 1);
+   switch (target_type)
+   {
+   case PASSWORD_TYPE_MD5:
+   encrypted_password = palloc(MD5_PASSWD_LEN + 1);
 
-   if (!pg_md5_encrypt(password, role, strlen(role),
-   
encrypted_password, &errstr))
-   elog(ERROR, "password encryption failed: %s", 
errstr);
-   return encrypted_password;
+   if (!pg_md5_encrypt(password, role, 
strlen(role),
+   
encrypted_password, &errstr))
+   elog(ERROR, "password encryption 
failed: %s", errstr);
+   break;
 
-   case PASSWORD_TYPE_SCRAM_SHA_256:
-   return pg_be_scram_build_secret(password);
+   case PASSWORD_TYPE_SCRAM_SHA_256:
+   encrypted_password = 
pg_be_scram_build_secret(password);
+   break;
 
-   case PASSWORD_TYPE_PLAINTEXT:
-   elog(ERROR, "cannot encrypt password with 'plaintext'");
+   case PASSWORD_TYPE_PLAINTEXT:
+   elog(ERROR, "cannot encrypt password with 
'plaintext'");
+   break;
+   }
}
 
+   Assert(encrypted_password);
+
/*
-* This shouldn't happen, because the above switch statements should
-* handle every combination of source and target password types.
+* Valid password hashes may be very long, but we don't want to store
+* anything that might need out-of-line storage, since de-TOASTing won't
+* work during authentication because we haven't selected a database yet
+* and cannot read pg_class. 256 bytes should be more than enough for 
all
+* practical use, so fail for anything longer.
 */
-   elog(ERROR, "cannot encrypt password to requested type");
-   return NULL;/* keep compiler quiet */
+   if (encrypted_password &&   /* keep compiler quiet */
+   strlen(encrypted_password) > MAX_ENCRYPTED_PASSWORD_LEN)
+   {
+   

Re: [Proposal] Add foreign-server health checks infrastructure

2024-09-20 Thread Fujii Masao




On 2024/09/20 12:00, Hayato Kuroda (Fujitsu) wrote:

Dear members,

(This mail is just a wrap-up)

I found that the final patch was pushed 2 days ago [1] and BF animals say OK for
now. Therefore, I've closed the CF entry as "committed".


Thanks!


We can extend the
feature to other platforms, but I think it could be at another thread later.


Yes.


Thanks everyone for many efforts!


You, too! Many thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Large expressions in indexes can't be stored (non-TOASTable)

2024-09-20 Thread Nathan Bossart
On Fri, Sep 20, 2024 at 08:16:24AM +0900, Michael Paquier wrote:
> Perhaps the reason why these snapshots are pushed should be documented
> with a comment?

Definitely.  I'll add those once we are more confident that we've
identified all the bugs.

-- 
nathan




Re: Large expressions in indexes can't be stored (non-TOASTable)

2024-09-20 Thread Nathan Bossart
On Fri, Sep 20, 2024 at 07:00:00AM +0300, Alexander Lakhin wrote:
> I've found another two paths to reach that condition:
> CREATE INDEX CONCURRENTLY ON def (vec_quantizer(id, :'b'));
> ERROR:  cannot fetch toast data without an active snapshot
> 
> REINDEX INDEX CONCURRENTLY def_vec_quantizer_idx;
> (or REINDEX TABLE CONCURRENTLY def;)
> TRAP: failed Assert("HaveRegisteredOrActiveSnapshot()"), File: 
> "toast_internals.c", Line: 668, PID: 2934502

Here's a (probably naive) attempt at fixing these, too.  I'll give each
path a closer look once it feels like we've identified all the bugs.

> Perhaps it would make sense to check all CatalogTupleUpdate(pg_index, ...)
> calls (I've found 10 such instances, but haven't checked them yet).

Indeed.

-- 
nathan
>From b7432c42c0cea9c1aadba7c72f9ce2ba6e6407d2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 20 Sep 2024 11:48:52 -0500
Subject: [PATCH v2 1/1] fix failed assertions due to pg_index's TOAST table

---
 src/backend/catalog/index.c  | 5 +
 src/backend/commands/indexcmds.c | 9 +
 2 files changed, 14 insertions(+)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b2b3ecb524..2e378ef4ef 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2255,6 +2255,7 @@ index_drop(Oid indexId, bool concurrent, bool 
concurrent_lock_mode)
PopActiveSnapshot();
CommitTransactionCommand();
StartTransactionCommand();
+   PushActiveSnapshot(GetTransactionSnapshot());
 
/*
 * Now we must wait until no running transaction could be using 
the
@@ -2283,8 +2284,10 @@ index_drop(Oid indexId, bool concurrent, bool 
concurrent_lock_mode)
 * Again, commit the transaction to make the pg_index update 
visible
 * to other sessions.
 */
+   PopActiveSnapshot();
CommitTransactionCommand();
StartTransactionCommand();
+   PushActiveSnapshot(GetTransactionSnapshot());
 
/*
 * Wait till every transaction that saw the old index state has
@@ -2387,6 +2390,8 @@ index_drop(Oid indexId, bool concurrent, bool 
concurrent_lock_mode)
{
UnlockRelationIdForSession(&heaprelid, 
ShareUpdateExclusiveLock);
UnlockRelationIdForSession(&indexrelid, 
ShareUpdateExclusiveLock);
+
+   PopActiveSnapshot();
}
 }
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f99c2d2dee..36318c81ea 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1798,11 +1798,15 @@ DefineIndex(Oid tableId,
 
PROGRESS_CREATEIDX_PHASE_WAIT_3);
WaitForOlderSnapshots(limitXmin, true);
 
+   PushActiveSnapshot(GetTransactionSnapshot());
+
/*
 * Index can now be marked valid -- update its pg_index entry
 */
index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
 
+   PopActiveSnapshot();
+
/*
 * The pg_index update will cause backends (including this one) to 
update
 * relcache entries for the index itself, but we should also send a
@@ -4236,6 +4240,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid 
relationOid, const Rein
 */
set_indexsafe_procflags();
 
+   PushActiveSnapshot(GetTransactionSnapshot());
+
forboth(lc, indexIds, lc2, newIndexIds)
{
ReindexIndexInfo *oldidx = lfirst(lc);
@@ -4280,8 +4286,10 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid 
relationOid, const Rein
}
 
/* Commit this transaction and make index swaps visible */
+   PopActiveSnapshot();
CommitTransactionCommand();
StartTransactionCommand();
+   PushActiveSnapshot(GetTransactionSnapshot());
 
/*
 * While we could set PROC_IN_SAFE_IC if all indexes qualified, there's 
no
@@ -4316,6 +4324,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid 
relationOid, const Rein
}
 
/* Commit this transaction to make the updates visible. */
+   PopActiveSnapshot();
CommitTransactionCommand();
StartTransactionCommand();
 
-- 
2.39.5 (Apple Git-154)



Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Nathan Bossart
On Fri, Sep 20, 2024 at 07:20:15PM +0200, Michael Banck wrote:
> I noticed two headers are not in alphabetical order in pg_checksums.c,
> patch attached.

This appears to be commit 280e5f1's fault.  Will fix.

-- 
nathan




Re: Cleaning up ERRCODE usage in our XML code

2024-09-20 Thread Tom Lane
I wrote:
> [ v1-clean-up-errcodes-for-xml.patch ]

Per cfbot, rebased over d5622acb3.  No functional changes.

regards, tom lane

diff --git a/contrib/xml2/xpath.c b/contrib/xml2/xpath.c
index 0fdf735faf..ef78aa00c8 100644
--- a/contrib/xml2/xpath.c
+++ b/contrib/xml2/xpath.c
@@ -388,7 +388,7 @@ pgxml_xpath(text *document, xmlChar *xpath, xpath_workspace *workspace)
 			/* compile the path */
 			comppath = xmlXPathCtxtCompile(workspace->ctxt, xpath);
 			if (comppath == NULL)
-xml_ereport(xmlerrcxt, ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
+xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
 			"XPath Syntax Error");
 
 			/* Now evaluate the path expression. */
@@ -652,7 +652,7 @@ xpath_table(PG_FUNCTION_ARGS)
 		comppath = xmlXPathCtxtCompile(ctxt, xpaths[j]);
 		if (comppath == NULL)
 			xml_ereport(xmlerrcxt, ERROR,
-		ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
+		ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
 		"XPath Syntax Error");
 
 		/* Now evaluate the path expression. */
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index f30a3a42c0..e761ca5cb5 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -90,7 +90,7 @@ xslt_process(PG_FUNCTION_ARGS)
 XML_PARSE_NOENT);
 
 		if (doctree == NULL)
-			xml_ereport(xmlerrcxt, ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
+			xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
 		"error parsing XML document");
 
 		/* Same for stylesheet */
@@ -99,14 +99,14 @@ xslt_process(PG_FUNCTION_ARGS)
 			  XML_PARSE_NOENT);
 
 		if (ssdoc == NULL)
-			xml_ereport(xmlerrcxt, ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
+			xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
 		"error parsing stylesheet as XML document");
 
 		/* After this call we need not free ssdoc separately */
 		stylesheet = xsltParseStylesheetDoc(ssdoc);
 
 		if (stylesheet == NULL)
-			xml_ereport(xmlerrcxt, ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
+			xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
 		"failed to parse stylesheet");
 
 		xslt_ctxt = xsltNewTransformContext(stylesheet, doctree);
@@ -141,7 +141,7 @@ xslt_process(PG_FUNCTION_ARGS)
 		  NULL, NULL, xslt_ctxt);
 
 		if (restree == NULL)
-			xml_ereport(xmlerrcxt, ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
+			xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
 		"failed to apply stylesheet");
 
 		resstat = xsltSaveResultToString(&resstr, &reslen, restree, stylesheet);
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 41b1a5c6b0..2654c2d7ff 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -742,7 +742,7 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
 		{
 			/* If it's a document, saving is easy. */
 			if (xmlSaveDoc(ctxt, doc) == -1 || xmlerrcxt->err_occurred)
-xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 			"could not save document to xmlBuffer");
 		}
 		else if (content_nodes != NULL)
@@ -785,7 +785,7 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
 	if (xmlSaveTree(ctxt, newline) == -1 || xmlerrcxt->err_occurred)
 	{
 		xmlFreeNode(newline);
-		xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+		xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 	"could not save newline to xmlBuffer");
 	}
 }
@@ -793,7 +793,7 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent)
 if (xmlSaveTree(ctxt, node) == -1 || xmlerrcxt->err_occurred)
 {
 	xmlFreeNode(newline);
-	xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+	xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 "could not save content to xmlBuffer");
 }
 			}
@@ -1004,7 +1004,7 @@ xmlpi(const char *target, text *arg, bool arg_is_null, bool *result_is_null)
 
 	if (pg_strcasecmp(target, "xml") == 0)
 		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR), /* really */
+(errcode(ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION),
  errmsg("invalid XML processing instruction"),
  errdetail("XML processing instruction target name cannot be \"%s\".", target)));
 
@@ -4383,7 +4383,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 	xpath_len = VARSIZE_ANY_EXHDR(xpath_expr_text);
 	if (xpath_len == 0)
 		ereport(ERROR,
-(errcode(ERRCODE_DATA_EXCEPTION),
+(errcode(ERRCODE_INVALID_ARGUMENT_FOR_XQUERY),
  errmsg("empty XPath expression")));
 
 	string = pg_xmlCharStrndup(datastr, len);
@@ -4456,7 +4456,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 		 */
 		xpathcomp = xmlXPathCtxtCompile(xpathctx, xpath_expr);
 		if (xpathcomp == NULL || xmlerrcxt->err_occurred)
-			xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNA

pgbench: Improve result outputs related to failed transactions

2024-09-20 Thread Yugo Nagata
Hi,

I would like to improve the following two points in the result outputs
of pgbench related to failed transactions. The patch is attached.

(1) Output per-script statistics even when there are no successful
transactions, if there are any failed transactions due to serialization
or deadlock errors.

Previously, per-script statistics were never output when all transactions
failed. However, it is reasonable to report per-script failed transactions
if they are due to serialization or deadlock errors, since these kinds of
failures are now counted and reported.
 
This is fixed by modifying the following condition to use "total_cnt <= 0".

   /* Remaining stats are nonsensical if we failed to execute any xacts */
   if (total->cnt + total->skipped <= 0)
   return;

(2) Avoid printing "NaN%" in the failed-transaction lines of the report.

If the total number of successful, skipped, and failed transactions is zero,
we don't have to report the number of failed transactions, just as the number
of skipped transactions is not reported in this case. (Otherwise the
percentage is computed as failures divided by a zero total, which prints as
"NaN%".)

So, I moved the check of total_cnt mentioned above before reporting the number
of failed transactions. Also, I added a check of script_total_cnt before
reporting the per-script number of failed transactions.

Regards,
Yugo Nagata

-- 
Yugo Nagata 
>From 000340660f2c567164d7f33cf622f8225a046262 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Thu, 19 Sep 2024 01:37:54 +0900
Subject: [PATCH] pgbench: Improve result outputs related to failed
 transactions

Previously, per-script statistics were never output when all
transactions failed due to serialization or deadlock errors.
However, it is reasonable to report such information even when
there are no successful transactions, since these failed
transactions are now counted and reported.

Meanwhile, if the total number of successful, skipped, and
failed transactions is zero, we no longer report the number of
failed transactions, just as the number of skipped transactions
is not reported in that case; this avoids printing "NaN%" in
the failed-transaction lines of the report.
---
 src/bin/pgbench/pgbench.c | 68 +--
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 61618f2e18..b841bd28b2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -6391,6 +6391,13 @@ printResults(StatsData *total,
 			   total->cnt);
 	}
 
+	/*
+	 * Remaining stats are nonsensical if we failed to execute any xacts
+	 * due to others than serialization or deadlock errors
+	 */
+	if (total_cnt <= 0)
+		return;
+
 	printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
 		   failures, 100.0 * failures / total_cnt);
 
@@ -6412,10 +6419,6 @@ printResults(StatsData *total,
 		printf("total number of retries: " INT64_FORMAT "\n", total->retries);
 	}
 
-	/* Remaining stats are nonsensical if we failed to execute any xacts */
-	if (total->cnt + total->skipped <= 0)
-		return;
-
 	if (throttle_delay && latency_limit)
 		printf("number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
 			   total->skipped, 100.0 * total->skipped / total_cnt);
@@ -6491,37 +6494,40 @@ printResults(StatsData *total,
 	   100.0 * sstats->cnt / total->cnt,
 	   sstats->cnt / bench_duration);
 
-printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
-	   script_failures,
-	   100.0 * script_failures / script_total_cnt);
-
-if (failures_detailed)
+if (script_total_cnt > 0)
 {
-	printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
-		   sstats->serialization_failures,
-		   (100.0 * sstats->serialization_failures /
-			script_total_cnt));
-	printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
-		   sstats->deadlock_failures,
-		   (100.0 * sstats->deadlock_failures /
-			script_total_cnt));
-}
+	printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+		   script_failures,
+		   100.0 * script_failures / script_total_cnt);
 
-/* it can be non-zero only if max_tries is not equal to one */
-if (max_tries != 1)
-{
-	printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
-		   sstats->retried,
-		   100.0 * sstats->retried / script_total_cnt);
-	printf(" - total number of retries: " INT64_FORMAT "\n",
-		   sstats->retries);
-}
+	if (failures_detailed)
+	{
+		printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+			   sstats->serialization_failures,
+			   (100.0 * sstats->serialization_failures /
+script_total_cnt));
+		printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+			   sstats->deadlock_failures,
+			   (100.0 * sstats->deadlock_failures /
+script_total_cnt));
+	}
 
-if (throttle_delay && latency_limit && script_total_

Re: Should rolpassword be toastable?

2024-09-20 Thread Tom Lane
Nathan Bossart  writes:
> Here is a v3 patch set that fixes the test comment and a compiler warning
> in cfbot.

Nitpick: the message should say "%d bytes" not "%d characters",
because we're counting bytes.  Passes an eyeball check otherwise.

regards, tom lane




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-20 Thread Peter Geoghegan
On Fri, Sep 20, 2024 at 10:59 AM Peter Geoghegan  wrote:
> On Fri, Sep 20, 2024 at 10:07 AM Tomas Vondra  wrote:
> > Yes, I think backpatching to 17 would be fine. I'd be worried about
> > maybe disrupting some monitoring in production systems, but for 17 that
> > shouldn't be a problem yet. So fine with me.
>
> I'll commit minimal changes to _bt_first that at least make the
> counters consistent, then. I'll do so soon.

Pushed, thanks

-- 
Peter Geoghegan




Re: Why mention to Oracle ?

2024-09-20 Thread Tomas Vondra
On 9/20/24 19:48, Marcos Pegoraro wrote:
> Em sex., 20 de set. de 2024 às 13:18, David G. Johnston
> mailto:david.g.johns...@gmail.com>> escreveu:
> 
> It would be a boon to the community if someone were to put together
> a web/wiki page or mini-app that details this kind of information
> and, if considered accurate and relevant enough by the community,
> link to that more globally while also remove the random and
> incomplete references of this nature from the main documentation. 
> As it stands the info is at least relevant, and its incompleteness
> doesn't cause enough grief, IMO, to warrant its outright removal
> absent there existing an alternative.
> 
> 
> Do Oracle docs or MySQL docs or any others have these comparisons? I don't
> think so, so why does Postgres have to mention it?
> 

I fail to see why "entity X does not do A" would be a good reason to not
do A ourselves. Commercial companies may have their own reasons not to
mention competing products, and few of those will likely apply to our
project. And maybe they're wrong to not do that, not us.

> In all these places, and others I didn't find, I think it's correct to
> describe Postgres' way of doing things, not what is different from Oracle.
> 

IMHO it's quite reasonable to say "we do X, but this other product
(which is what we try to mimic) does Y".


regards

-- 
Tomas Vondra




Re: FullTransactionIdAdvance question

2024-09-20 Thread Andres Freund
Hi,

On 2024-09-20 17:38:40 +0800, Andy Fan wrote:
> static inline void
> FullTransactionIdAdvance(FullTransactionId *dest)
> {
>   dest->value++;
> 
>   /* see FullTransactionIdAdvance() */
>   if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
>   return;
> 
>   while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
>   dest->value++;
> }
> 
> I understand this function as: 'dest->value++' increases the epoch when
> necessary, and we don't want to use a TransactionId which is smaller than
> FirstNormalTransactionId. But what is the point of the code below:
> 
> /* see FullTransactionIdAdvance() */
> if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
>   return;
>
> It looks to me like it will never be true (I added an 'Assert(false);' above
> the return, and make check-world passes).

Hm. I think in the past we did have some code that could end up calling
FullTransactionIdAdvance() on special xids for some reason, IIRC it was
related to BootstrapTransactionId. Turning those into a normal xid doesn't
seem quite right, I guess and could hide bugs.

But I'm not sure it wouldn't be better to simply assert out in those cases.


> and if it is true somehow, returning an XID which is smaller than
> FirstNormalTransactionId looks strange as well.

Well, it'd be true if you passed it a special xid.


> IIUC, should we remove it to save a prediction on each GetNewTransactionId
> call?

I could see adding an unlikely() to make sure the compiler orders the code to
make it statically predictable.
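
Roughly like this (untested sketch, only adding the hint to the check that is
already there):

    static inline void
    FullTransactionIdAdvance(FullTransactionId *dest)
    {
        dest->value++;

        /* hint that the special-XID path is cold; see FullTransactionIdAdvance() */
        if (unlikely(FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId)))
            return;

        while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
            dest->value++;
    }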

Greetings,

Andres Freund




Re: First draft of PG 17 release notes

2024-09-20 Thread Laurenz Albe
On Fri, 2024-09-20 at 10:02 -0400, Jonathan S. Katz wrote:
> Attached is a proposal for the major features section. This borrows from 
> the release announcement draft[1] and lists out features and themes that 
> have broad user impact. This was a bit challenging for this release, 
> because there are a lot of great features in PG17 that add up to a very 
> special release.
> 
> Feedback welcome.

I would have added the platform-independent binary collation provider.
And perhaps "pg_createsubscriber": that can be a game-changer for setting
up logical replication.

Yours,
Laurenz Albe




Re: pgsql: Don't enter parallel mode when holding interrupts.

2024-09-20 Thread Noah Misch
On Thu, Sep 19, 2024 at 09:25:05AM -0400, Robert Haas wrote:
> On Wed, Sep 18, 2024 at 3:27 AM Laurenz Albe  wrote:
> > On Wed, 2024-09-18 at 02:58 +, Noah Misch wrote:
> > > Don't enter parallel mode when holding interrupts.
> > >
> > > Doing so caused the leader to hang in wait_event=ParallelFinish, which
> > > required an immediate shutdown to resolve.  Back-patch to v12 (all
> > > supported versions).
> > >
> > > Francesco Degrassi
> > >
> > > Discussion: 
> > > https://postgr.es/m/CAC-SaSzHUKT=vzj8mpxydc_urpfax+yoa1hktcf4roz_q6z...@mail.gmail.com
> >
> > Does that warrant mention on this page?
> > https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html
> 
> IMHO, no. This seems too low-level and too odd to mention.

Agreed.  If I were documenting it, I would document it with the material for
writing opclasses.  It's probably too esoteric to document even there.

> TBH, I'm kind of surprised to learn that it's possible to start
> executing a query while holding an LWLock. I see Tom is expressing
> some doubts on the original thread, too. I wonder if we should instead
> be erroring out if an LWLock is held at the start of query execution
> -- or even earlier, like when we try to call a plpgsql function while
> holding one. Leaving parallel query aside, what would prevent us from
> attempting to reacquire the exact same LWLock that we already hold and
> self-deadlocking? Or attempting to acquire some other LWLock and
> deadlocking that way? I don't really feel like this is a parallel
> query problem. I don't think we should be trying to run any
> user-defined code while holding an LWLock, unless that code is written
> in C (or C++, Rust, etc.). Trying to run procedural code at that point
> doesn't seem reasonable.

Nothing prevents those lwlock deadlocks.  If you think it's worth breaking the
things folks use today (see original thread) in order to prevent that, please
do share that on the original thread.  I'm fine either way.  I think given
infinite resources across both postgresql.org and all extension maintainers, I
would do what you're thinking in v18 while in back branches, I would change
"erroring out" to "warn when assertions are enabled".  I also think it's a
low-priority bug, given the only known ways to reach it are C code or a custom
opclass.  Since resources aren't infinite, I'm inclined toward one of (a) stop
here or (b) all branches "warn when assertions are enabled" and maybe block
the plancache route discussed on the original thread.




Re: First draft of PG 17 release notes

2024-09-20 Thread Tom Lane
Bruce Momjian  writes:
> Patch applied to PG 17.

I don't see a push?

regards, tom lane




Re: First draft of PG 17 release notes

2024-09-20 Thread Laurenz Albe
On Fri, 2024-09-20 at 13:47 -0400, Jonathan S. Katz wrote:
> Please see attached.

> +
> + 
> +  Various query performance improvements, including to sequential reads
> +  using streaming I/O, write throughput under high concurrency, and
> +  searches over multiple values in a btree
> +  index.
> + 
> +

Perhaps that last part could be "and searches over IN-lists in a b-tree index".
It might be technically less correct, but I'd expect that it gives more people
the right idea.
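
That's the ScalarArrayOp case, i.e. queries of roughly this shape (the example
is mine, not from the notes), which v17 can often satisfy with a single index
scan instead of one primitive scan per array element:

    SELECT * FROM tab WHERE indexed_col IN (1, 2, 3);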

> +
> +  +  
> linkend="app-pgcreatesubscriber">pg_createsubscriber,
> +  a utility that logical replicas from physical standbys
> +

There's a verb missing: "a utility that *creates* logical replicas..."

> +
> +  +  linkend="pgupgrade">pg_upgrade 
> now
> +  preserves replication slots on both publishers and subscribers
> +

I wonder if we should omit "on both publishers and subscribers".
It preserves replication slots anywhere, right?

> +
> + 
> +  New client-side connection option,  +  
> linkend="libpq-connect-sslnegotiation">sslnegotiation=direct,
> +  that allows direct TLS handshakes that avoids a round-trip negotation.
> + 
> +

It should be "that avoid".  Plural.

Yours,
Laurenz Albe




Re: First draft of PG 17 release notes

2024-09-20 Thread Bruce Momjian
On Fri, Sep 20, 2024 at 04:05:11PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Patch applied to PG 17.
> 
> I don't see a push?

Push was delayed because my test script found some uncommitted files due
to earlier testing.  Should be fine now.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Nathan Bossart
On Fri, Sep 20, 2024 at 01:56:16PM -0500, Nathan Bossart wrote:
> On Fri, Sep 20, 2024 at 07:20:15PM +0200, Michael Banck wrote:
>> I noticed two headers are not in alphabetical order in pg_checksums.c,
>> patch attached.
> 
> This appears to be commit 280e5f1's fault.  Will fix.

Committed, thanks!

-- 
nathan




Re: First draft of PG 17 release notes

2024-09-20 Thread Tom Lane
Laurenz Albe  writes:
> [ assorted corrections ]

I fixed a couple of these before seeing your message.

regards, tom lane




Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Michael Banck
On Fri, Sep 20, 2024 at 03:20:16PM -0500, Nathan Bossart wrote:
> On Fri, Sep 20, 2024 at 01:56:16PM -0500, Nathan Bossart wrote:
> > On Fri, Sep 20, 2024 at 07:20:15PM +0200, Michael Banck wrote:
> >> I noticed two headers are not in alphabetical order in pg_checksums.c,
> >> patch attached.
> > 
> > This appears to be commit 280e5f1's fault.  Will fix.

Oops, that was my fault then :)

> Committed, thanks!

Thanks!


Michael




Re: Pgoutput not capturing the generated columns

2024-09-20 Thread Masahiko Sawada
On Thu, Sep 19, 2024 at 9:26 PM Amit Kapila  wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith  wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila  
> > > wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_column option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> >
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH 
> > (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority the results should be as
> mentioned by you but let us see if anyone thinks otherwise.

I agree with Amit. We would also publish t2's future generated columns in the
first example and t1's future generated columns in the second example.
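
To make the first example concrete (the table definitions are mine, purely for
illustration, and this assumes the behaviour proposed in this thread, i.e. the
new publish_generated_columns parameter and generated columns being allowed in
a column list):

    CREATE TABLE t1 (a int, b int,
                     gen1 int GENERATED ALWAYS AS (a + 1) STORED,
                     gen2 int GENERATED ALWAYS AS (b + 1) STORED);
    CREATE TABLE t2 (c int, d int,
                     gen1 int GENERATED ALWAYS AS (c + 1) STORED,
                     gen2 int GENERATED ALWAYS AS (d + 1) STORED);

    CREATE PUBLICATION pub1 FOR TABLE t1 (a, b, gen2), t2
        WITH (publish_generated_columns = true);

Here t1 would publish only a, b and gen2 (the column list takes priority),
while t2 would publish c, d, gen1, gen2 and any generated columns added to it
in the future.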

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Why mention to Oracle ?

2024-09-20 Thread Marcos Pegoraro
Em sex., 20 de set. de 2024 às 15:11, Tomas Vondra 
escreveu:

> IMHO it's quite reasonable to say "we do X, but this other product
> (which is what we try to mimic) does Y".
>

OK, for Data Type Formatting Functions it is fine, if they were really copied
from Oracle, but the others ...
Bug Reporting Guidelines, LOCK, SELECT, ROLLBACK TO SAVEPOINT and CURSORS


Re: Why mention to Oracle ?

2024-09-20 Thread Jonah H. Harris
On Fri, Sep 20, 2024 at 4:27 PM Marcos Pegoraro  wrote:

> Em sex., 20 de set. de 2024 às 15:11, Tomas Vondra 
> escreveu:
>
>> IMHO it's quite reasonable to say "we do X, but this other product
>> (which is what we try to mimic) does Y".
>>
>
> OK, for Data Type Formatting Functions it is fine, if they were really copied
> from Oracle, but the others ...
> Bug Reporting Guidelines, LOCK, SELECT, ROLLBACK TO SAVEPOINT and CURSORS
>

Seems to me this has already been answered well multiple times by multiple
people; I’m not sure why this is such an issue, or one that warrants
continued discussion.

By your own admission, you wouldn’t see the value, where others who came
from Oracle would. Additionally, your assumption is incorrect: many Oracle
databases are migrated to Postgres, more-so today than when much of that
was written.

You’re arguing against it being in the docs and talking about how much
better it would be in other more focused content, which does have some
merit. But, at the same time, you’re neither qualified nor volunteering to
write it. As such, getting rid of it here serves what purpose other than
omitting useful information to those it would benefit directly in the
documentation?


Re: Why mention to Oracle ?

2024-09-20 Thread Roberto Mello
On Fri, Sep 20, 2024 at 11:43 AM Marcos Pegoraro  wrote:

>
> My suggestion is: Postgres DOCs are written and have to be read by
> Postgres users, just that. If you are Oracle user, search for a tutorial on
> how to migrate to Postgres or find tools for it, but not in DOCs
>

As Tomas, Tom and others pointed out, it's simply because it is a common
database people
migrate from and ask for help, and people contributed patches to the
documentation out of their
own need, or to help others.

(Several) years ago I wrote a since-deprecated section of the docs to port
from PL/SQL to
PL/pgSQL because it was needed back then.

Because if you write something for Oracle users, SQL Server users can claim
> why there is no "Porting from T-SQL to PL/pgSQL" ?
> And MySQL users can do the same, and so on.
>

And those users are welcome to contribute patches to the docs explaining
why they think
those additions to our docs would be helpful.


> Maybe Oracle was the most common DB which migrated to Postgres, but I'm
> not sure this is true for today.
>

I don't know about you, but in my experience that is absolutely not true. I
deal with lots of people
and companies migrating from Oracle, or whose staff have experience with
Oracle and need
help adapting that knowledge to Postgres.

Roberto


Re: Why mention to Oracle ?

2024-09-20 Thread Tomas Vondra



On 9/20/24 14:36, Marcos Pegoraro wrote:
> Why PostgreSQL DOCs needs to show or compare the Oracle way of doing
> things ?
> 
> I understand that on page Porting from Oracle PL/SQL is ok to mention
> Oracle, but there are other places where it's not needed. Or, if it's ok
> to mention, why not mention SQL Server or MySQL or any other ?
> 

It's not quite clear to me whether your suggestion is to not mention any
other databases ever, or to always mention every existing one. ;-)

I didn't dig into all the places you mention, but I'd bet those places
reference Oracle simply because it was the most common DB people either
migrated from or needed to support in their application next to PG, and
thus were running into problems. The similarity of the interfaces and
SQL dialects also likely played a role. It's less likely to run into
subtle behavior differences with e.g. SQL Server when you have to rewrite
T-SQL stuff from scratch anyway.


regards

-- 
Tomas Vondra




pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Michael Banck
Hi,

I noticed two headers are not in alphabetical order in pg_checksums.c,
patch attached.


Michael
>From e4d6d6503b4c14685424c6a04c5eb2ae29024bf6 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Fri, 20 Sep 2024 19:17:43 +0200
Subject: [PATCH] Reorder headers in pg_checksums.c in alphabetical order

---
 src/bin/pg_checksums/pg_checksums.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 68a68eb0fa..f5f7ff1045 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -16,8 +16,8 @@
 
 #include 
 #include 
-#include 
 #include 
+#include 
 #include 
 
 #include "common/controldata_utils.h"
-- 
2.39.2



Re: Using per-transaction memory contexts for storing decoded tuples

2024-09-20 Thread Masahiko Sawada
On Thu, Sep 19, 2024 at 10:46 PM Amit Kapila  wrote:
>
> On Fri, Sep 20, 2024 at 5:13 AM David Rowley  wrote:
> >
> > On Fri, 20 Sept 2024 at 05:03, Masahiko Sawada  
> > wrote:
> > > I've done other benchmarking tests while changing the memory block
> > > sizes from 8kB to 8MB. I measured the execution time of logical
> > > decoding of one transaction that inserted 10M rows. I set
> > > logical_decoding_work_mem large enough to avoid spilling behavior. In
> > > this scenario, we allocate many memory chunks while decoding the
> > > transaction and resulting in calling more malloc() in smaller memory
> > > block sizes. Here are results (an average of 3 executions):
> >
> > I was interested in seeing the memory consumption with the test that
> > was causing an OOM due to the GenerationBlock fragmentation.
> >
>
> +1. That test will be helpful.

Sure. Here are results of peak memory usage in bytes reported by
MemoryContextMemAllocated() (when rb->size shows 43MB):

8kB:      52,371,328
16kB:     52,887,424
32kB:     53,428,096
64kB:     55,099,264
128kB:    86,163,328
256kB:   149,340,032
512kB:   273,334,144
1MB:     523,419,520
2MB:   1,021,493,120
4MB:   1,984,085,888
8MB:   2,130,886,528

Probably we can increase the size to 64kB?
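
For reference, a minimal sketch of the benchmark shape described above --
the table and slot names are illustrative, and it assumes wal_level=logical
plus the test_decoding contrib module:

SELECT pg_create_logical_replication_slot('bench_slot', 'test_decoding');
CREATE TABLE bench (id int, payload text);
-- one large transaction of 10M rows
INSERT INTO bench SELECT i, md5(i::text) FROM generate_series(1, 10000000) i;
-- decode it in one pass; set logical_decoding_work_mem high enough that
-- nothing spills to disk
SET logical_decoding_work_mem = '10GB';
SELECT count(*) FROM pg_logical_slot_get_changes('bench_slot', NULL, NULL);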

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [18] Fix a few issues with the collation cache

2024-09-20 Thread Jeff Davis
On Wed, 2024-08-14 at 16:30 -0700, Jeff Davis wrote:
> On Thu, 2024-08-08 at 12:24 -0700, Jeff Davis wrote:
> > The collation cache, which maps collation oids to pg_locale_t
> > objects,
> > has a few longstanding issues:
> 
> Here's a patch set v2.

Updated and rebased.

Regards,
Jeff Davis

From 224470bc4d0660dc11940f5595031eecb0319d62 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 7 Aug 2024 11:05:46 -0700
Subject: [PATCH v4 1/7] Tighten up make_libc_collator() and
 make_icu_collator().

Return the result rather than using an out parameter, and make it the
caller's responsibility to copy it into the right context. Ensure that
no paths leak a collator.

The function make_icu_collator() doesn't have any external callers, so
change it to be static. Also, when re-opening with rules, use a
try/finally block to avoid leaking the collator.

In make_libc_collator(), if the first newlocale() succeeds and the
second one fails, close the first locale_t object.

Discussion: https://postgr.es/m/54d20e812bd6c3e44c10eddcd757ec494ebf1803.ca...@j-davis.com
---
 src/backend/utils/adt/pg_locale.c | 126 +++---
 src/include/utils/pg_locale.h |   4 -
 2 files changed, 80 insertions(+), 50 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5bef1b113a8..12ba5726f77 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1297,14 +1297,15 @@ report_newlocale_failure(const char *localename)
 }
 
 /*
- * Initialize the locale_t field.
+ * Create a locale_t with the given collation and ctype.
  *
- * The "C" and "POSIX" locales are not actually handled by libc, so set the
- * locale_t to zero in that case.
+ * The "C" and "POSIX" locales are not actually handled by libc, so return
+ * NULL.
+ *
+ * Ensure that no path leaks a locale_t.
  */
-static void
-make_libc_collator(const char *collate, const char *ctype,
-   pg_locale_t result)
+static locale_t
+make_libc_collator(const char *collate, const char *ctype)
 {
 	locale_t	loc = 0;
 
@@ -1343,7 +1344,11 @@ make_libc_collator(const char *collate, const char *ctype,
 			errno = 0;
 			loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
 			if (!loc)
+			{
+if (loc1)
+	freelocale(loc1);
 report_newlocale_failure(ctype);
+			}
 		}
 		else
 			loc = loc1;
@@ -1360,60 +1365,78 @@ make_libc_collator(const char *collate, const char *ctype,
 #endif
 	}
 
-	result->info.lt = loc;
+	return loc;
 }
 
-void
-make_icu_collator(const char *iculocstr,
-  const char *icurules,
-  struct pg_locale_struct *resultp)
-{
+/*
+ * Create a UCollator with the given locale string and rules.
+ *
+ * Ensure that no path leaks a UCollator.
+ */
 #ifdef USE_ICU
-	UCollator  *collator;
-
-	collator = pg_ucol_open(iculocstr);
-
-	/*
-	 * If rules are specified, we extract the rules of the standard collation,
-	 * add our own rules, and make a new collator with the combined rules.
-	 */
-	if (icurules)
+static UCollator *
+make_icu_collator(const char *iculocstr, const char *icurules)
+{
+	if (!icurules)
 	{
-		const UChar *default_rules;
-		UChar	   *agg_rules;
+		/* simple case without rules */
+		return pg_ucol_open(iculocstr);
+	}
+	else
+	{
+		UCollator  *collator_std_rules;
+		UCollator  *collator_all_rules;
+		const UChar *std_rules;
 		UChar	   *my_rules;
-		UErrorCode	status;
+		UChar	   *all_rules;
 		int32_t		length;
+		int32_t		total;
+		UErrorCode	status;
 
-		default_rules = ucol_getRules(collator, &length);
+		/*
+		 * If rules are specified, we extract the rules of the standard
+		 * collation, add our own rules, and make a new collator with the
+		 * combined rules.
+		 */
 		icu_to_uchar(&my_rules, icurules, strlen(icurules));
 
-		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
-		u_strcpy(agg_rules, default_rules);
-		u_strcat(agg_rules, my_rules);
+		collator_std_rules = pg_ucol_open(iculocstr);
 
-		ucol_close(collator);
+		std_rules = ucol_getRules(collator_std_rules, &length);
+
+		total = u_strlen(std_rules) + u_strlen(my_rules) + 1;
+
+		/* avoid leaking collator on OOM */
+		all_rules = palloc_extended(sizeof(UChar) * total, MCXT_ALLOC_NO_OOM);
+		if (!all_rules)
+		{
+			ucol_close(collator_std_rules);
+			ereport(ERROR,
+	(errcode(ERRCODE_OUT_OF_MEMORY),
+	 errmsg("out of memory")));
+		}
+
+		u_strcpy(all_rules, std_rules);
+		u_strcat(all_rules, my_rules);
+
+		ucol_close(collator_std_rules);
 
 		status = U_ZERO_ERROR;
-		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
-  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		collator_all_rules = ucol_openRules(all_rules, u_strlen(all_rules),
+			UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
+			NULL, &status);
 		if (U_FAILURE(status))
+		{
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 	 errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
 			iculocstr, icurules, u

Re: Allow logical failover slots to wait on synchronous replication

2024-09-20 Thread John H
Hi,

On Mon, Sep 16, 2024 at 2:25 AM shveta malik  wrote:

> > >
> > > If we don't do something similar, then aren't there chances that we
> > > keep on waiting on the wrong lsn[mode] for the case when mode =
> > > SYNC_REP_WAIT_APPLY while sync-rep-wait infrastructure is updating
> > > different mode's lsn. Is my understanding correct?
> > >

Let me take a deeper look at this, I think you're right though.

>
> I agree. So even if the mode is SYNC_REP_WAIT_WRITE (lower one) or
> SYNC_REP_WAIT_APPLY (higher one), we need to wait for
> lsn[SYNC_REP_WAIT_FLUSH].
>

I'm not sure I agree with that. I think the synchronous_commit mode should
be a good enough proxy for what the user wants from a durability perspective
for their application.

For an application writing to the database, if the user has chosen
SYNC_REP_WAIT_WRITE as the point at which a commit is treated as durable,
why do we need to be concerned with overriding that to SYNC_REP_WAIT_FLUSH?

Similarly, if a user has the mode set to SYNC_REP_WAIT_APPLY, to me it's
even more confusing that there can be scenarios where neither the
application nor subsequent reads would see the data as committed, but a
logical consumer would. The database should be treated as the source of
truth, and I don't think logical consumers should ever be ahead of what the
database is treating as committed.
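
For reference (a sketch, not something from this thread), the
synchronous_commit settings map onto these wait modes roughly as follows:

-- remote_write  -> SYNC_REP_WAIT_WRITE  (standby has written the WAL)
-- on            -> SYNC_REP_WAIT_FLUSH  (standby has flushed the WAL)
-- remote_apply  -> SYNC_REP_WAIT_APPLY  (standby has applied the WAL)
SET synchronous_commit = remote_write;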


Thanks,

-- 
John Hsu - Amazon Web Services




Re: Why don't we consider explicit Incremental Sort?

2024-09-20 Thread Richard Guo
On Sun, Sep 15, 2024 at 6:01 AM Tomas Vondra  wrote:
> Hmmm ... I wasn't involved in that discussion and I haven't studied the
> thread now, but this seems a bit weird to me. If the presorted keys have
> low cardinality, can't the full sort be faster?
>
> Maybe it's not possible with the costing changes (removing the
> penalization), in which case it would be fine to not consider the full
> sort, because we'd just throw it away immediately. But if the full sort
> could be costed as cheaper, I'd say that might be wrong.

IIUC, we initially applied a 1.5x pessimism factor to the cost
estimates of incremental sort in an effort to avoid performance
regressions in cases with a large skew in the number of rows within
the presorted groups.  OTOH, this also restricted our ability to
leverage incremental sort when it would be preferable.

I agree with you that sometimes the definition of 'regression' can
depend on when the alternative plans are introduced.  Imagine if we
initially did not have the 1.5x pessimism factor and introduced it
later, it would be treated as a 'regression' because some queries that
could benefit from incremental sort would then have to resort to full
sort.

In commit 4a29eabd1, we removed the 1.5x pessimism factor to allow
incremental sort to have a fairer chance at being chosen against a
full sort.  With this change, the cost model now tends to favor
incremental sort as being cheaper than full sort in the presence of
presorted keys, making it unnecessary to consider full sort in such
cases, because, as you mentioned, we'd just throw the full sort path
away immediately.  So 4a29eabd1 also modified the logic so that if
there are presorted keys, we only create an incremental sort path.
As for the potential performance regressions caused by these changes,
4a29eabd1 opted to use enable_incremental_sort as an escape hatch.

I think the same theory applies to mergejoins too.  We can leverage
incremental sort if it is enabled and there are presorted keys,
assuming that it is always faster than full sort, and use the GUC as
an escape hatch in the worst case.

So here is the v2 patch, which is almost the same as v1, but includes
changes in test cases and comments, and also includes a commit
message.  I'll put it in the commitfest.
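
For illustration, here is a sketch of the kind of plan this targets; the
table and index names are made up, and whether the planner actually picks a
mergejoin with incremental sort of course depends on statistics and costs:

CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (a int, b int, c int);
CREATE INDEX ON t1 (a);
CREATE INDEX ON t2 (a);
SET enable_hashjoin = off;
SET enable_nestloop = off;
-- the mergejoin needs input sorted on (a, b); with presorted keys from the
-- indexes, each side can use an incremental sort on b
EXPLAIN (COSTS OFF)
SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b;
-- escape hatch if the incremental sort turns out slower than a full sort
SET enable_incremental_sort = off;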

Thanks
Richard


v2-0001-Consider-explicit-incremental-sort-for-mergejoins.patch
Description: Binary data


Re: not null constraints, again

2024-09-20 Thread Alvaro Herrera
On 2024-Sep-20, Tender Wang wrote:

> jian he  wrote on Friday, September 20, 2024 at 11:34:
> 
> > another bug.
> > I will dig later, just want to share it first.
> >
> > minimum producer:
> > drop table if exists pp1,cc1, cc2,cc3;
> > create table pp1 (f1 int );
> > create table cc1 () inherits (pp1);
> > create table cc2() inherits(pp1,cc1);
> > create table cc3() inherits(pp1,cc1,cc2);
> >
> > alter table pp1 alter f1 set not null;
> > ERROR:  tuple already updated by self
> 
> I guess some place needs call CommandCounterIncrement().

Yeah ... this fixes it:

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 579b8075b5..3f66e43b9a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7877,12 +7877,6 @@ ATExecSetNotNull(List **wqueue, Relation rel, char 
*conName, char *colName,
{
List   *children;
 
-   /*
-* Make previous addition visible, in case we process the same
-* relation again while chasing down multiple inheritance trees.
-*/
-   CommandCounterIncrement();
-
children = find_inheritance_children(RelationGetRelid(rel),

 lockmode);
 
@@ -7890,6 +7884,8 @@ ATExecSetNotNull(List **wqueue, Relation rel, char 
*conName, char *colName,
{
Relationchildrel = table_open(childoid, NoLock);
 
+   CommandCounterIncrement();
+
ATExecSetNotNull(wqueue, childrel, conName, colName,
 recurse, true, 
lockmode);
table_close(childrel, NoLock);


I was trying to save on the number of CCIs that we perform, but it's
likely not a wise expenditure of time given that this isn't a very
common scenario anyway.  (Nobody with thousands or millions of child
tables will try to run thousands of commands in a single transaction
anyway ... so saving a few increments doesn't make any actual
difference.  If such people exist, they can show us their use case and
we can investigate and fix it then.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"This is what I like so much about PostgreSQL.  Most of the surprises
are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php




Re: attndims, typndims still not enforced, but make the value within a sane threshold

2024-09-20 Thread Bruce Momjian
On Fri, Sep 20, 2024 at 10:11:00AM +0800, jian he wrote:
> On Wed, Sep 18, 2024 at 10:35 PM jian he  wrote:
> >
> > > The last time this was discussed, I think the conclusion was
> > > we should remove attndims and typndims entirely on the grounds
> > > that they're useless.  I certainly don't see a point in adding
> > > more logic that could give the misleading impression that they
> > > mean something.
> > >
> 
> 
> attached patch removes attndims and typndims entirely.
> some tests skipped on my local machine; those not skipped are all OK.

I have been hoping for a patch like this because I feel the existence
of these system columns is deceptive since we don't honor them properly.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: First draft of PG 17 release notes

2024-09-20 Thread Jonathan S. Katz

On 5/9/24 12:03 AM, Bruce Momjian wrote:

I have committed the first draft of the PG 17 release notes;  you can
see the results here:

release-17:  188

I welcome feedback.  For some reason it was an easier job than usual.


Attached is a proposal for the major features section. This borrows from 
the release announcement draft[1] and lists out features and themes that 
have broad user impact. This was a bit challenging for this release, 
because there are a lot of great features in PG17 that add up to a very 
special release.


Feedback welcome.

Thanks,

Jonathan

[1] 
https://git.postgresql.org/gitweb/?p=press.git;a=blob;f=releases/17/release.en.md;hb=HEAD
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 9d69016cd6..7691ee3672 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -18,7 +18,62 @@

 

-TO BE COMPLETED LATER
+
+ 
+  New memory management system for VACUUM, which reduces
+  memory consumption and can improve overall vacuuming performance.
+ 
+
+
+
+ 
+  New SQL/JSON capabilities, including constructors,
+  identity functions, and the JSON_TABLE()
+  function, which converts JSON data into a table representation.
+ 
+
+
+
+ 
+  Various query performance improvements, including to sequential reads
+  using streaming I/O, write throughput under high concurrency, and
+  searches over multiple values in a btree
+  index.
+ 
+
+
+
+ 
+  Logical replication now supports failover control, and pg_upgrade now
+  preserves replications slots on both publishers and subscribers.
+ 
+
+
+
+ 
+  New client-side connection option, sslnegotiation=direct,
+  that allows direct TLS handshakes that avoid a round-trip negotiation.
+ 
+
+
+
+ 
+  pg_basebackup
+  now supports incremental backup.
+ 
+
+
+
+ 
+  COPY adds a new 
option,
+  ON_ERROR ignore, that allows a copy operation to
+  continue in the event of an error.
+ 
+

 



OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-20 Thread Peter Geoghegan
On Fri, Sep 20, 2024 at 10:07 AM Tomas Vondra  wrote:
> Yes, I think backpatching to 17 would be fine. I'd be worried about
> maybe disrupting some monitoring in production systems, but for 17 that
> shouldn't be a problem yet. So fine with me.

I'll commit minimal changes to _bt_first that at least make the
counters consistent, then. I'll do so soon.

> FWIW I wonder how likely is it that someone has some sort of alerting
> tied to this counter. I'd bet few people do. It's probably more about a
> couple people looking at explain plans, but they'll be confused even if
> we change that only starting with 17.

On 17 the behavior in this area is totally different, either way.

> Ah, OK. So we do probe the index like this. I was under the impression
> we don't do that. But yeah, this makes sense.

Well, we don't have *explicit* next-key probes. If you think of values
like "Aardvark" + infinitesimal as just another array value (albeit
one that requires a little special handling in _bt_first), then there
are no explicit probes. There are no true special cases required.

Maybe this sounds like a very academic point. I don't think that it
is, though. Bear in mind that even when _bt_first searches for index
tuples matching a value like "Aardvark" + infinitesimal, there's some
chance that _bt_search will return a leaf page with tuples that the
index scan ultimately returns. And so there really is no "separate
explicit probe" of the kind the MDAM paper contemplates.

When this happens, we won't get any exact matches for the sentinel
search value, but there could still be matches for (say) "WHERE a =
'Abacus' AND b = 55" on that same leaf page. In general, repositioning
the scan to later "within" the 'Abacus' index tuples might not be
required -- our initial position (based on the sentinel search key)
could be "close enough". This outcome is more likely to happen if the
query happened to be written "WHERE b = 1", rather than "WHERE b =
55".
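
To make the shape of that example concrete -- a sketch only, with made-up
names, relying on the proposed skip scan:

CREATE TABLE t (a text, b int);
CREATE INDEX t_a_b_idx ON t (a, b);
-- With skip scan this can use t_a_b_idx even though the leading column "a"
-- is not constrained; the scan steps through the distinct values of "a"
-- ('Aardvark', 'Abacus', ...) internally.
SELECT * FROM t WHERE b = 55;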

> Yes, it does. Most of my confusion was caused by my belief that we can't
> probe the index for the next value without "incrementing" the current
> value, but that was a silly idea.

It's not a silly idea, I think. Technically that understanding is
fairly accurate -- we often *do* have to "increment" to get to the
next value (though reading the next value from an index tuple and then
repositioning using it with later/less significant scan keys is the
other possibility).

Incrementing is always possible, even without skip support, because we
can always fall back on +infinitesimal style sentinel values (AKA
SK_BT_NEXTPRIOR values). That's the definitional sleight-of-hand that
allows _bt_advance_array_keys to not have to think about skip arrays
as a special case, regardless of whether or not they happen to have
skip support.

> > It might be possible to add skip support for text, but there wouldn't
> > be much point.
> >
>
> Stupid question - so why does it make sense for types like int? There
> can also be a lot of values between the current and the next value, so
> why would that be very different from "incrementing" a text value?

Not a stupid question at all. You're right; it'd be the same.

Obviously, there are at least some workloads (probably most) where any
int columns will contain values that are more or less fully
contiguous. I also expect there to be some workloads where int columns
appear in B-Tree indexes that contain values with large gaps between
neighboring values (e.g., because the integers are hash values). We'll
always use skip support for any omitted prefix int column (same with
any opclass that offers skip support), but we can only expect to see a
benefit in the former "dense" cases -- never in the latter "sparse"
cases.

The MDAM paper talks about an adaptive strategy for dense columns and
sparse columns. I don't see any point in that, and assume that it's
down to some kind of implementation deficiencies in NonStop SQL back
in the 1990s. I can just always use skip support in the hope that
integer column data will turn out to be "sparse" because there's no
downside to being optimistic about it. The access patterns are exactly
the same as they'd be with skip support disabled.

My "academic point" about not having *explicit* next-key probes might
make more sense now. This is the thing that makes it okay to always be
optimistic about types with skip support containing "dense" data.

FWIW I actually have skip support for the UUID opclass. I implemented
it to have test coverage for pass-by-reference types in certain code
paths, but otherwise I don't expect it to be useful -- in
practice all UUID columns contain "sparse" data. There's still no real
downside to it, though. (I wouldn't try to do it with text because
it'd be much harder to implement skip support correctly, especially
with collated text.)

--
Peter Geoghegan




Re: Why mention to Oracle ?

2024-09-20 Thread Tom Lane
Tomas Vondra  writes:
> On 9/20/24 14:36, Marcos Pegoraro wrote:
>> Why PostgreSQL DOCs needs to show or compare the Oracle way of doing
>> things ?

> I didn't dig into all the places you mention, but I'd bet those places
> reference Oracle simply because it was the most common DB people either
> migrated from or needed to support in their application next to PG, and
> thus were running into problems. The similarity of the interfaces and
> SQL dialects also likely played a role. It's less likely to run into
> subtle behavior differences with e.g. SQL Server when you have to rewrite
> T-SQL stuff from scratch anyway.

As far as the mentions in "Data Type Formatting Functions" go, those
are there because those functions are not in the SQL standard; we
stole the API definitions for them from Oracle, lock stock and barrel.
(Except for the discrepancies that are called out by referencing what
Oracle does differently.)  A number of the other references probably
have similar origins.
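
For instance, the to_char()/to_date() family on that page follows the
Oracle-style format templates:

SELECT to_char(now(), 'YYYY-MM-DD HH24:MI:SS');
SELECT to_date('20-09-2024', 'DD-MM-YYYY');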

regards, tom lane




Re: Why mention to Oracle ?

2024-09-20 Thread David G. Johnston
On Fri, Sep 20, 2024 at 5:37 AM Marcos Pegoraro  wrote:

> Why PostgreSQL DOCs needs to show or compare the Oracle way of doing
> things ?
>
> I understand that on page Porting from Oracle PL/SQL is ok to mention
> Oracle, but there are other places where it's not needed. Or, if it's ok to
> mention, why not mention SQL Server or MySQL or any other ?
>

It would be a boon to the community if someone were to put together a
web/wiki page or mini-app that details this kind of information and, if
considered accurate and relevant enough by the community, link to that more
globally while also removing the random and incomplete references of this
nature from the main documentation.  As it stands the info is at least
relevant, and its incompleteness doesn't cause enough grief, IMO, to
warrant its outright removal absent an existing alternative.


> Bug Reporting Guidelines
> Especially refrain from merely saying that “This is not what SQL
> says/Oracle does.”
>

I would agree that this admonishment be re-worded.  I suggest:

If referencing some external authority, like the SQL Standard or another
relational database product, mention it, but also include the literal
output values.

David J.


Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.

2024-09-20 Thread Fujii Masao



On 2024/09/19 19:16, Anton A. Melnikov wrote:


On 18.09.2024 21:04, Fujii Masao wrote:


-    CreateCheckPoint(flags);
-    ckpt_performed = true;
+    ckpt_performed = CreateCheckPoint(flags);

This change could result in the next scheduled checkpoint being
triggered in 15 seconds if a checkpoint is skipped, which isn’t
the intended behavior.


Thanks for pointing this out! This is really a bug.
Rearranged the logic a bit to save the previous behavior
in the v3 attached.


Thanks for updating the patch!

I've attached the updated version (0001.patch). I made some cosmetic changes,
including reverting the switch in the entries for 
pg_stat_get_checkpointer_write_time
and pg_stat_get_checkpointer_sync_time in pg_proc.dat, as I didn’t think
that change was necessary. Could you please review the latest version?

After we commit 0001.patch, how about applying 0002.patch, which updates
the documentation for the pg_stat_checkpointer view to clarify what types
of checkpoints and restartpoints each counter tracks?

In 0002.patch, I also modified the description of num_requested from
"Number of backend requested checkpoints" to remove "backend," as it can
be confusing since num_requested includes requests from sources other than
the backend. Thoughts?
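
With 0001.patch applied, the new counter would show up alongside the
existing ones, e.g. (a sketch):

SELECT num_timed, num_requested, num_done FROM pg_stat_checkpointer;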

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From 859f3fecb4fb4900b6ce12f6346c5d9565fbc072 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Fri, 20 Sep 2024 11:33:07 +0900
Subject: [PATCH v4 1/2] Add num_done counter to the pg_stat_checkpointer view.

Checkpoints can be skipped when the server is idle. The existing num_timed and
num_requested counters in pg_stat_checkpointer track both completed and
skipped checkpoints, but there was no way to count only the completed ones.

This commit introduces the num_done counter, which tracks only completed
checkpoints, making it easier to see how many were actually performed.

Bump catalog version.

Author: Anton A. Melnikov
Reviewed-by: Fujii Masao
Discussion: 
https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e...@oss.nttdata.com
---
 doc/src/sgml/monitoring.sgml  | 11 +-
 src/backend/access/transam/xlog.c |  9 -
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/checkpointer.c | 39 ---
 .../utils/activity/pgstat_checkpointer.c  |  2 +
 src/backend/utils/adt/pgstatfuncs.c   |  6 +++
 src/include/access/xlog.h |  2 +-
 src/include/catalog/pg_proc.dat   |  6 +++
 src/include/pgstat.h  |  1 +
 src/test/regress/expected/rules.out   |  1 +
 10 files changed, 60 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a2fda4677d..19bf0164f1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3063,7 +3063,16 @@ description | Waiting for a newly initialized WAL file 
to reach durable storage
num_requested bigint
   
   
-   Number of requested checkpoints that have been performed
+   Number of backend requested checkpoints
+  
+ 
+
+  
+  
+   num_done bigint
+  
+  
+   Number of checkpoints that have been performed
   
  
 
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 853ab06812..64304d77d3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6878,8 +6878,11 @@ update_checkpoint_display(int flags, bool restartpoint, 
bool reset)
  * In this case, we only insert an XLOG_CHECKPOINT_SHUTDOWN record, and it's
  * both the record marking the completion of the checkpoint and the location
  * from which WAL replay would begin if needed.
+ *
+ * Returns true if a new checkpoint was performed, or false if it was skipped
+ * because the system was idle.
  */
-void
+bool
 CreateCheckPoint(int flags)
 {
boolshutdown;
@@ -6971,7 +6974,7 @@ CreateCheckPoint(int flags)
END_CRIT_SECTION();
ereport(DEBUG1,
(errmsg_internal("checkpoint skipped 
because system is idle")));
-   return;
+   return false;
}
}
 
@@ -7353,6 +7356,8 @@ CreateCheckPoint(int flags)
 
CheckpointStats.ckpt_segs_added,
 
CheckpointStats.ckpt_segs_removed,
 
CheckpointStats.ckpt_segs_recycled);
+
+   return true;
 }
 
 /*
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 7fd5d256a1..49109dbdc8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/

Re: Why mention to Oracle ?

2024-09-20 Thread Marcos Pegoraro
Em sex., 20 de set. de 2024 às 12:53, Tom Lane  escreveu:

> As far as the mentions in "Data Type Formatting Functions" go, those
> are there because those functions are not in the SQL standard; we
> stole the API definitions for them from Oracle, lock stock and barrel.
> (Except for the discrepancies that are called out by referencing what
> Oracle does differently.)  A number of the other references probably
> have similar origins.
>

All the time we see somebody adding a new function to Postgres core that
exists in Python or Go or MySQL, but none of them are mentioned in the docs.

I have never used Oracle, but I'm almost sure the Oracle docs contain no
mentions of Postgres, right? Why do we need to mention it?

Regards
Marcos


Re: Why mention to Oracle ?

2024-09-20 Thread Marcos Pegoraro
Em sex., 20 de set. de 2024 às 11:56, Tomas Vondra 
escreveu:

> It's not quite clear to me whether your suggestion is to not mention any
> other databases ever, or to always mention every existing one. ;-)
>

My suggestion is: the Postgres docs are written for, and read by, Postgres
users, just that. If you are an Oracle user, search for a tutorial on how to
migrate to Postgres or find tools for it, but not in the docs.
Because if you write something for Oracle users, SQL Server users can claim
why there is no "Porting from T-SQL to PL/pgSQL" ?
And MySQL users can do the same, and so on.

Oracle simply because it was the most common DB people either
> migrated from or needed to support in their application next to PG, and
> thus were running into problems.
>

Maybe Oracle was the most common DB people migrated to Postgres from, but I'm
not sure this is still true today.

regards
Marcos


Re: First draft of PG 17 release notes

2024-09-20 Thread Jonathan S. Katz

On 9/20/24 12:55 PM, Laurenz Albe wrote:

On Fri, 2024-09-20 at 10:02 -0400, Jonathan S. Katz wrote:

Attached is a proposal for the major features section. This borrows from
the release announcement draft[1] and lists out features and themes that
have broad user impact. This was a bit challenging for this release,
because there are a lot of great features in PG17 that add up to a very
special release.

Feedback welcome.


I would have added the platform-independent binary collation provider.
And perhaps "pg_createsubscriber": that can be a game-changer for setting
up logical replication.


I was on the fence about that, mostly because it'd make that sentence 
too much of a mouthful, but I do agree.


IIRC (didn't get to check) we did have a precedent for sublists in the 
major features, so I broke this one up. Please see attached.


Jonathan


diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 9d69016cd6..fdbbbed07a 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -18,7 +18,81 @@

 

-TO BE COMPLETED LATER
+
+ 
+  New memory management system for VACUUM, which reduces
+  memory consumption and can improve overall vacuuming performance.
+ 
+
+
+
+ 
+  New SQL/JSON capabilities, including constructors,
+  identity functions, and the JSON_TABLE()
+  function, which converts JSON data into a table representation.
+ 
+
+
+
+ 
+  Various query performance improvements, including to sequential reads
+  using streaming I/O, write throughput under high concurrency, and
+  searches over multiple values in a btree
+  index.
+ 
+
+
+
+ 
+  Logical replication enhancements, including:
+  
+   
+
+ Failover control
+
+   
+   
+
+ pg_createsubscriber,
+  a utility that creates logical replicas from physical standbys
+
+   
+   
+
+ pg_upgrade now
+  preserves replication slots on both publishers and subscribers
+
+   
+  
+ 
+
+
+
+ 
+  New client-side connection option, sslnegotiation=direct,
+  that allows direct TLS handshakes that avoid a round-trip negotiation.
+ 
+
+
+
+ 
+  pg_basebackup
+  now supports incremental backup.
+ 
+
+
+
+ 
+  COPY adds a new 
option,
+  ON_ERROR ignore, that allows a copy operation to
+  continue in the event of an error.
+ 
+

 



OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: Why mention to Oracle ?

2024-09-20 Thread Marcos Pegoraro
Em sex., 20 de set. de 2024 às 13:18, David G. Johnston <
david.g.johns...@gmail.com> escreveu:

> It would be a boon to the community if someone were to put together a
> web/wiki page or mini-app that details this kind of information and, if
> considered accurate and relevant enough by the community, link to that more
> globally while also removing the random and incomplete references of this
> nature from the main documentation.  As it stands the info is at least
> relevant, and its incompleteness doesn't cause enough grief, IMO, to
> warrant its outright removal absent an existing alternative.
>

Do the Oracle docs or MySQL docs or any others have these comparisons? I
don't think so, so why does Postgres have to mention them?

In all these places, and others I didn't find, I think it's correct to just
describe Postgres' way of doing things, not what is different from Oracle.

regards
Marcos


Re: Why mention to Oracle ?

2024-09-20 Thread Tomas Vondra



On 9/20/24 19:31, Marcos Pegoraro wrote:
> Em sex., 20 de set. de 2024 às 12:53, Tom Lane  > escreveu:
> 
> As far as the mentions in "Data Type Formatting Functions" go, those
> are there because those functions are not in the SQL standard; we
> stole the API definitions for them from Oracle, lock stock and barrel.
> (Except for the discrepancies that are called out by referencing what
> Oracle does differently.)  A number of the other references probably
> have similar origins.
> 
> 
> All the time we see somebody adding a new function to Postgres core that
> exists in Python or Go or MySQL, but none of them are mentioned in the docs.
> 

Which Python/Go/MySQL functions did we add to Postgres, for example?

AFAIK we're now adding stuff that is either described by SQL standard,
or stuff that's our own invention. Neither cases would benefit from
explaining how other products behave. That's very different from the
interfaces we copied from Oracle.

> I have never used Oracle, but I'm almost sure the Oracle docs contain no
> mentions of Postgres, right? Why do we need to mention it?
> 

I think Tom already explained that we copied a lot of this stuff from
Oracle, so it makes sense to explain in which cases the behavior
differs. I don't see how removing this would help users, it'd very
clearly make life harder for them.

I'm no fan of Oracle corp myself, but I admit I don't quite understand
why you're upset with the handful of places mentioning the product.


regards

-- 
Tomas Vondra




Re: First draft of PG 17 release notes

2024-09-20 Thread Bruce Momjian
On Fri, Sep 20, 2024 at 01:47:43PM -0400, Jonathan Katz wrote:
> On 9/20/24 12:55 PM, Laurenz Albe wrote:
> > On Fri, 2024-09-20 at 10:02 -0400, Jonathan S. Katz wrote:
> > > Attached is a proposal for the major features section. This borrows from
> > > the release announcement draft[1] and lists out features and themes that
> > > have broad user impact. This was a bit challenging for this release,
> > > because there are a lot of great features in PG17 that add up to a very
> > > special release.
> > > 
> > > Feedback welcome.
> > 
> > I would have added the platform-independent binary collation provider.
> > And perhaps "pg_createsubscriber": that can be a game-changer for setting
> > up logical replication.
> 
> I was on the fence about that, mostly because it'd make that sentence too
> much of a mouthful, but I do agree.
> 
> IIRC (didn't get to check) we did have a precedent for sublists in the major
> features, so I broke this one up. Please see attached.

Patch applied to PG 17.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: Allow logical failover slots to wait on synchronous replication

2024-09-20 Thread John H
Hi,

On Fri, Sep 20, 2024 at 2:44 AM shveta malik  wrote:
> > >
> >
> > The difference is that the purpose of 'synchronized_standby_slots' is
> > to mention slot names for which the user expects logical walsenders to
> > wait before sending the logical changes to subscribers. OTOH,
> > 'synchronous_standby_names' has a different purpose as well. It is not
> > clear to me if the users would be interested in syncing failover slots
> > to all the standbys mentioned in 'synchronous_standby_names'.
> >
>
> Okay, I see your point. But not sure what could be the solution here
> except documenting. But let me think more.
>

That's a great find. I didn't consider mixed physical and logical replicas
in synchronous_standby_names. I wonder if there are users running
synchronous_standby_names with a mix of logical and physical replicas, and
what the use case would be.

I'm not sure there's anything straightforward we could do in general for
slot syncing: if synchronous_standby_names refers to the application_names
of logical replicas, the feature can't be supported.
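
To make the scenario concrete, a sketch (with invented names) of the mixed
configuration in question on the primary:

-- a physical standby and a logical subscriber both listed as synchronous
ALTER SYSTEM SET synchronous_standby_names =
    'ANY 1 (physical_standby, logical_subscriber)';
-- only the physical standby has a slot that can be named here
ALTER SYSTEM SET synchronized_standby_slots = 'physical_standby_slot';
SELECT pg_reload_conf();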

-- 
John Hsu - Amazon Web Services