Re: buffer refcount leak in foreign batch insert code

2023-05-02 Thread Alexander Pyhalov

Michael Paquier wrote on 2023-04-25 04:30:

On Mon, Apr 24, 2023 at 09:57:10AM +0900, Michael Paquier wrote:

The attached is what I am finishing with, with a minimal regression
test added to postgres_fdw.  Two partitions are enough.


Well, I have gone through that again this morning, and applied the fix
down to 14.  The buildfarm is digesting it fine.
--
Michael


Thank you. Sorry for the late response, was on vacation.
--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: [PATCH] Add native windows on arm64 support

2023-05-02 Thread Niyas Sait

On 19/01/2023 10:09, Niyas Sait wrote:



On 17/01/2023 22:51, Andres Freund wrote:



  int main(void)
@@ -1960,18 +1966,19 @@ int main(void)
  }
  '''
-  if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
-  args: test_c_args)
-    # Use ARM CRC Extension unconditionally
-    cdata.set('USE_ARMV8_CRC32C', 1)
-    have_optimized_crc = true
-  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc',
-  args: test_c_args + ['-march=armv8-a+crc'])
-    # Use ARM CRC Extension, with runtime check
-    cflags_crc += '-march=armv8-a+crc'
-    cdata.set('USE_ARMV8_CRC32C', false)
-    cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
-    have_optimized_crc = true
+    if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
+    args: test_c_args)


Seems like it'd be easier to read if you don't re-indent this, but just have
the cc.get_id() == 'msvc' part of this if/else-if.





I've attached a new version (v8) to fix the above indentation issue.

Could someone please help with the review?

--
Niyas

From 108ad061caa15c7346d53e906794c4e971afbcc0 Mon Sep 17 00:00:00 2001
From: Niyas Sait 
Date: Fri, 16 Dec 2022 10:45:56 +
Subject: [PATCH v8] Enable postgres native build for windows-arm64 platform

- Add support for meson build
- Add arm64 definition of spin_delay function
- Exclude arm_acle.h import for MSVC
---
 doc/src/sgml/install-windows.sgml |  3 ++-
 meson.build   | 11 +--
 src/include/storage/s_lock.h  | 20 ++--
 src/port/pg_crc32c_armv8.c|  2 ++
 src/tools/msvc/gendef.pl  |  8 
 5 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index cbc70a039c..2ecd5fcf38 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -352,7 +352,8 @@ $ENV{MSBFLAGS}="/m";
   Special Considerations for 64-Bit Windows
 
   
-   PostgreSQL will only build for the x64 architecture on 64-bit Windows.
+   PostgreSQL will only build for the x64 and ARM64 architectures on 64-bit
+   Windows.
   
 
   
diff --git a/meson.build b/meson.build
index 096044628c..6cca212fae 100644
--- a/meson.build
+++ b/meson.build
@@ -2045,8 +2045,11 @@ int main(void)
 elif host_cpu == 'arm' or host_cpu == 'aarch64'
 
   prog = '''
+#ifdef _MSC_VER
+#include 
+#else
 #include 
-
+#endif
 int main(void)
 {
 unsigned int crc = 0;
@@ -2060,7 +2063,11 @@ int main(void)
 }
 '''
 
-  if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
+  if cc.get_id() == 'msvc'
+cdata.set('USE_ARMV8_CRC32C', false)
+cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+have_optimized_crc = true
+  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
   args: test_c_args)
 # Use ARM CRC Extension unconditionally
 cdata.set('USE_ARMV8_CRC32C', 1)
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index c9fa84cc43..a7973eae49 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -707,15 +707,31 @@ typedef LONG slock_t;
 
 #define SPIN_DELAY() spin_delay()
 
-/* If using Visual C++ on Win64, inline assembly is unavailable.
- * Use a _mm_pause intrinsic instead of rep nop.
+/*
+ * If using Visual C++ on Win64, inline assembly is unavailable.
+ * Use architecture specific intrinsics.
  */
 #if defined(_WIN64)
+/*
+ * For Arm64, use __isb intrinsic. See aarch64 inline assembly definition for details.
+ */
+#ifdef _M_ARM64
+static __forceinline void
+spin_delay(void)
+{
+/* Reference: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions */
+   __isb(_ARM64_BARRIER_SY);
+}
+#else
+/*
+ * For x64, use _mm_pause intrinsic instead of rep nop.
+ */
 static __forceinline void
 spin_delay(void)
 {
_mm_pause();
 }
+#endif
 #else
 static __forceinline void
 spin_delay(void)
diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c
index d8fae510cf..3d7eb748ff 100644
--- a/src/port/pg_crc32c_armv8.c
+++ b/src/port/pg_crc32c_armv8.c
@@ -14,7 +14,9 @@
  */
 #include "c.h"
 
+#ifndef _MSC_VER
 #include 
+#endif
 
 #include "port/pg_crc32c.h"
 
diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl
index e7cbefcbc3..934dc17b96 100644
--- a/src/tools/msvc/gendef.pl
+++ b/src/tools/msvc/gendef.pl
@@ -120,9 +120,9 @@ sub writedef
{
my $isdata = $def->{$f} eq 'data';
 
-   # Strip the leading underscore for win32, but not x64
+   # Strip the leading underscore for win32, but not x64 and aarch64
$f =~ s/^_//
- unless ($arch eq "x86_64");
+ unless ($arch eq "x86_64" || $arch eq "aarch64");
 
# Emit

Re: min/max aggregation for jsonb

2023-05-02 Thread Peter Eisentraut

On 03.03.23 11:41, David Rowley wrote:

On Fri, 3 Mar 2023 at 23:17, Daneel Yaitskov  wrote:

I wanted to use min/max aggregation functions for the jsonb type and noticed
there are no functions for this type, while string/array types are supported.


It's not really clear to me how you'd want these to sort. If you just
want to sort by the output that you see from the type's output
function, then you might get what you need by casting to text.


Is there a concern about implementing support for jsonb in min/max?


I imagine a lack of any meaningful way of comparing two jsonb values
to find out which is greater than the other is of some concern.


We already have ordering operators and operator classes for jsonb, so 
sticking min/max aggregates around that should be pretty straightforward.
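
For illustration only (not the patch under discussion), a minimal user-level
sketch built on the existing jsonb ordering operator could look like this:

    -- hedged sketch: max(jsonb) on top of the existing jsonb > operator
    CREATE FUNCTION jsonb_greater(a jsonb, b jsonb) RETURNS jsonb
        LANGUAGE sql IMMUTABLE STRICT
        AS $$ SELECT CASE WHEN a > b THEN a ELSE b END $$;

    CREATE AGGREGATE max (jsonb) (
        SFUNC = jsonb_greater,
        STYPE = jsonb
    );

An in-core version would presumably add proper support functions and a
matching min(), but the sketch shows why the existing ordering operator
class makes this straightforward.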






Re: Logging parallel worker draught

2023-05-02 Thread Benoit Lobréau

On 5/1/23 18:33, Robert Haas wrote:
> Why not? It seems like something some people might want to log and
> others not. Running the whole server at DEBUG1 to get this information
> doesn't seem like a suitable answer.

Since the statement is also logged, it could spam the log with huge
queries, which might also be a reason to stop logging this kind of
problem until the issue is fixed.


I attached the patch without the guc anyway just in case.


What I was wondering was whether we would be better off putting this
into the statistics collector, vs. doing it via logging. Both
approaches seem to have pros and cons.


We tried to explore different options with my colleagues in another
thread [1]. I feel like both solutions are worthwhile and would be
helpful. I plan to take a look at the pg_stat_statement patch [2] next.


Since it's my first patch, I elected to choose the easiest solution to
implement first. I also proposed this because I think logging can help
pinpoint a lot of problems at a minimal cost, since it is easy to set up
and use for everyone, everywhere.


[1] 
https://www.postgresql.org/message-id/d657df20-c4bf-63f6-e74c-cb85a81d0...@dalibo.com


[2] 
https://www.postgresql.org/message-id/flat/6acbe570-068e-bd8e-95d5-00c737b865e8%40gmail.com


--
Benoit Lobréau
Consultant
http://dalibo.com

From fc71ef40f33f94b0506a092fb5b3dcde6de6d60a Mon Sep 17 00:00:00 2001
From: benoit 
Date: Tue, 2 May 2023 10:08:00 +0200
Subject: [PATCH] Add logging for exceeded parallel worker slot limits

Produce a log message when a backend attempts to spawn a parallel worker
but fails due to insufficient worker slots. The shortage can stem from
max_worker_processes, max_parallel_workers, or
max_parallel_maintenance_workers. The log message can help database
administrators and developers diagnose performance issues related to
parallelism and optimize the configuration of the system accordingly.
---
 src/backend/access/transam/parallel.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b26f2a64fb..c60d1bd739 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -630,6 +630,11 @@ LaunchParallelWorkers(ParallelContext *pcxt)
pcxt->nknown_attached_workers = 0;
}
 
+   if (pcxt->nworkers_launched < pcxt->nworkers_to_launch)
+   ereport(LOG,
+   (errmsg("Parallel Worker draught during statement execution: workers spawned %d, requested %d",
+   pcxt->nworkers_launched, pcxt->nworkers_to_launch)));
+
/* Restore previous memory context. */
MemoryContextSwitchTo(oldcontext);
 }
-- 
2.39.2



Re: Tab completion for CREATE SCHEMA AUTHORIZATION

2023-05-02 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Sat, Apr 15, 2023 at 11:06:25AM +0900, Michael Paquier wrote:
>> Thanks, I'll look at it.
>
> +   else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
> +Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
> +   COMPLETE_WITH("CREATE", "GRANT");
> +   else if (Matches("CREATE", "SCHEMA", MatchAny))
> +   COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
>
> I had this grammar under my eyes a few days ago for a different patch,
> and there are much more objects types that can be appended to a CREATE
> SCHEMA, like triggers, sequences, tables or views, so this is
> incomplete, isn't it?

This is for completing the word CREATE itself after CREATE SCHEMA
[[<name>] AUTHORIZATION] <owner>.  The things that can come after that
are already handled generically earlier in the function:

/* CREATE */
/* complete with something you can create */
else if (TailMatches("CREATE"))
matches = rl_completion_matches(text, create_command_generator);

create_command_generator uses the words_after_create array, which lists
all the things that can be created.

- ilmari




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-02 Thread Alvaro Herrera
On 2023-Apr-07, Julien Rouhaud wrote:

> That being said, I have a hard time believing that we could actually preserve
> physical replication slots.  I don't think that pg_upgrade final state is 
> fully
> reproducible:  not all object oids are preserved, and the various pg_restore
> are run in parallel so you're very likely to end up with small physical
> differences that would be incompatible with physical replication.  Even if we
> could make it totally reproducible, it would probably be at the cost of making
> pg_upgrade orders of magnitude slower.  And since many people are already
> complaining that it's too slow, that doesn't seem like something we would 
> want.

A point on preserving physical replication slots: because we change WAL
format from one major version to the next (adding new messages or
changing format for other messages), we can't currently rely on physical
slots working across different major versions.

So IMO, for now don't bother with physical replication slot
preservation, but do keep the option name as specific to logical slots.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Logging parallel worker draught

2023-05-02 Thread Amit Kapila
On Mon, May 1, 2023 at 10:03 PM Robert Haas  wrote:
>
> On Sat, Apr 22, 2023 at 7:06 AM Amit Kapila  wrote:
> > I don't think introducing a GUC for this is a good idea. We can
> > directly output this message in the server log either at LOG or DEBUG1
> > level.
>
> Why not? It seems like something some people might want to log and
> others not. Running the whole server at DEBUG1 to get this information
> doesn't seem like a suitable answer.
>

We can output this at the LOG level to avoid running the server at
DEBUG1 level. There are a few other cases where we are not able to
spawn the worker or process and those are logged at the LOG level. For
example, "could not fork autovacuum launcher process .." or "too many
background workers". So I'm not sure if this should get separate
treatment. If we fear this can happen frequently enough that it can
spam the LOG, then a GUC may be worthwhile.
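
For reference, the message format proposed in the patch earlier in this
digest would show up at that level roughly like this (values illustrative):

    LOG:  Parallel Worker draught during statement execution: workers spawned 2, requested 4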

> What I was wondering was whether we would be better off putting this
> into the statistics collector, vs. doing it via logging. Both
> approaches seem to have pros and cons.
>

I think it could be easier for users to process the information if it
is available via some view, so there is a benefit in putting this into
the stats subsystem.

-- 
With Regards,
Amit Kapila.




Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-05-02 Thread Drouvot, Bertrand

Hi,

On 5/2/23 8:28 AM, Amit Kapila wrote:

On Fri, Apr 28, 2023 at 2:24 PM Drouvot, Bertrand
 wrote:


I can see V7 failing on "Cirrus CI / macOS - Ventura - Meson" only (other 
machines are not complaining).

It does fail on "invalidated logical slots do not lead to retaining WAL", see 
https://cirrus-ci.com/task/4518083541336064

I'm not sure why it is failing, any idea?



I think the reason for the failure is that on standby, the test is not
able to remove the file corresponding to the invalid slot. You are
using pg_switch_wal() to generate a switch record and I think you need
one more WAL-generating statement after that to achieve your purpose
which is that during checkpoint, the test removes the WAL file
corresponding to an invalid slot. Just doing checkpoint on primary may
not serve the need as that doesn't lead to any new insertion of WAL on
standby. Is your v6 failing in the same environment?


Thanks for the feedback!

No V6 was working fine.


If not, then it
is probably because the test does the insert after
pg_switch_wal() in that version. Why did you change the order of
insert in v7?



I thought doing the insert before the switch was OK and, as my local test
was running fine, I did not reconsider the ordering.


BTW, you can confirm the failure by changing the DEBUG2 message in
RemoveOldXlogFiles() to LOG. In the case where the test fails, it may
not remove the WAL file corresponding to an invalid slot, whereas it
will remove the WAL file when the test succeeds.


Yeah, I added more debug information and what I can see is that the WAL file
we want to see removed is "00010003" while the standby emits:

"
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG:  attempting to remove WAL segments older than log file 0002
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG:  recycled write-ahead log file "00010002"
"

As per your suggestion, changing the insert ordering (like in V8 attached) 
makes it now work on the failing environment too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From defaeb000b09eab9ba7b5d08c032f81b5bd72a53 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 28 Apr 2023 07:27:20 +
Subject: [PATCH v8] Add a test to verify that invalidated logical slots do not
 lead to retaining WAL.

---
 .../t/035_standby_logical_decoding.pl | 39 ++-
 1 file changed, 37 insertions(+), 2 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 66d264f230..a6b640d7fd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -500,9 +500,44 @@ $node_standby->restart;
 check_slots_conflicting_status(1);
 
 ##
-# Verify that invalidated logical slots do not lead to retaining WAL
+# Verify that invalidated logical slots do not lead to retaining WAL.
 ##
-# X TODO
+
+# Before removing WAL file(s), ensure the cascading standby catch up
+$node_standby->wait_for_replay_catchup($node_cascading_standby,
+   $node_primary);
+
+# Get the restart_lsn from an invalidated slot
+my $restart_lsn = $node_standby->safe_psql('postgres',
+   "SELECT restart_lsn from pg_replication_slots WHERE slot_name = 'vacuum_full_activeslot' and conflicting is true;"
+);
+
+chomp($restart_lsn);
+
+# As pg_walfile_name() can not be executed on the standby,
+# get the WAL file name associated to this lsn from the primary
+my $walfile_name = $node_primary->safe_psql('postgres',
+   "SELECT pg_walfile_name('$restart_lsn')");
+
+chomp($walfile_name);
+
+# Generate some activity and switch WAL file on the primary
+$node_primary->safe_psql(
+   'postgres', "create table retain_test(a int);
+select pg_switch_wal();
+insert into retain_test values(1);
+checkpoint;");
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Request a checkpoint on the standby to trigger the WAL file(s) removal
+$node_standby->safe_psql('postgres', 'checkpoint;');
+
+# Verify that the wal file has not been retained on the standby
+my $standby_walfile = $node_standby->data_dir . '/pg_wal/' . $walfile_name;
+ok(!-f "$standby_walfile",
+   "invalidated logical slots do not lead to retaining WAL");
 
 ##
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
-- 
2.34.1



Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-02 Thread Julien Rouhaud
Hi,

On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:
> On 2023-Apr-07, Julien Rouhaud wrote:
>
> > That being said, I have a hard time believing that we could actually 
> > preserve
> > physical replication slots.  I don't think that pg_upgrade final state is 
> > fully
> > reproducible:  not all object oids are preserved, and the various pg_restore
> > are run in parallel so you're very likely to end up with small physical
> > differences that would be incompatible with physical replication.  Even if 
> > we
> > could make it totally reproducible, it would probably be at the cost of 
> > making
> > pg_upgrade orders of magnitude slower.  And since many people are already
> > complaining that it's too slow, that doesn't seem like something we would 
> > want.
>
> A point on preserving physical replication slots: because we change WAL
> format from one major version to the next (adding new messages or
> changing format for other messages), we can't currently rely on physical
> slots working across different major versions.

I don't think anyone suggested doing physical replication over different major
versions.  My understanding was that it would be used to pg_upgrade a
"physical cluster" (e.g. a primary and physical standby server) at the same
time, and then simply starting them up again would lead to a working physical
replication on the new version.

I guess one could try to keep using the slots for other needs (PITR backup with
pg_receivewal or something similar), and then you would indeed have to be aware
that you won't be able to do anything with the new WAL records until you do a
fresh base backup, but that's a problem that you can already face after a
normal pg_upgrade (although in most cases it's probably quite obvious for now
as the timeline isn't preserved).




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-02 Thread Julien Rouhaud
On Tue, 2 May 2023, 19:43 Julien Rouhaud,  wrote:

> Hi,
>
> On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:
> > On 2023-Apr-07, Julien Rouhaud wrote:
> >
> > > That being said, I have a hard time believing that we could actually
> preserve
> > > physical replication slots.  I don't think that pg_upgrade final state
> is fully
> > > reproducible:  not all object oids are preserved, and the various
> pg_restore
> > > are run in parallel so you're very likely to end up with small physical
> > > differences that would be incompatible with physical replication.
> Even if we
> > > could make it totally reproducible, it would probably be at the cost
> of making
> > > pg_upgrade orders of magnitude slower.  And since many people are
> already
> > > complaining that it's too slow, that doesn't seem like something we
> would want.
> >
> > A point on preserving physical replication slots: because we change WAL
> > format from one major version to the next (adding new messages or
> > changing format for other messages), we can't currently rely on physical
> > slots working across different major versions.
>
> I don't think anyone suggested to do physical replication over different
> major
> versions.  My understanding was that it would be used to pg_upgrade a
> "physical cluster" (e.g. a primary and physical standby server) at the same
> time, and then simply starting them up again would lead to a working
> physical
> replication on the new version.
>
> I guess one could try to keep using the slots for other needs (PITR backup
> with
> pg_receivewal or something similar), and then you would indeed have to be
> aware
> that you won't be able to do anything with the new WAL records until you
> do a
> fresh base backup, but that's a problem that you can already face after a
> normal pg_upgrade (although in most cases it's probably quite obvious for
> now
> as the timeline isn't preserved).
>

If what you meant is that the slot may have to send a record generated by
an older major version, then unless I'm missing something the same
restriction could be added to such a feature as what's being discussed in
this thread for the logical replication slots: only a final shutdown
checkpoint record would be present after the flushed WAL position. It may
be possible to work around that, if it weren't for all the other problems I
mentioned.

>


Re: Tab completion for CREATE SCHEMA AUTHORIZATION

2023-05-02 Thread Dagfinn Ilmari Mannsåker
Dagfinn Ilmari Mannsåker  writes:

> Michael Paquier  writes:
>
>> On Sat, Apr 15, 2023 at 11:06:25AM +0900, Michael Paquier wrote:
>>> Thanks, I'll look at it.
>>
>> +   else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
>> +Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", 
>> MatchAny))
>> +   COMPLETE_WITH("CREATE", "GRANT");
>> +   else if (Matches("CREATE", "SCHEMA", MatchAny))
>> +   COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
>>
>> I had this grammar under my eyes a few days ago for a different patch,
>> and there are much more objects types that can be appended to a CREATE
>> SCHEMA, like triggers, sequences, tables or views, so this is
>> incomplete, isn't it?
>
> This is for completing the word CREATE itself after CREATE SCHEMA
> [[<name>] AUTHORIZATION] <owner>.  The things that can come after that
> are already handled generically earlier in the function:
>
> /* CREATE */
> /* complete with something you can create */
> else if (TailMatches("CREATE"))
> matches = rl_completion_matches(text, create_command_generator);
>
> create_command_generator uses the words_after_create array, which lists
> all the things that can be created.

But, looking closer at the docs, only tables, views, indexes, sequences
and triggers can be created as part of a CREATE SCHEMA statement. Maybe
we should add a HeadMatches("CREATE", "SCHEMA") exception in the above?
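
For concreteness, a minimal sketch of such an exception (illustrative only,
not a posted patch; it would have to appear before the generic
TailMatches("CREATE") branch) could look like:

    /* hypothetical: restrict completions to what CREATE SCHEMA allows */
    else if (HeadMatches("CREATE", "SCHEMA") && TailMatches("CREATE"))
        COMPLETE_WITH("TABLE", "VIEW", "INDEX", "SEQUENCE", "TRIGGER");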

- ilmari




Re: Add PQsendSyncMessage() to libpq

2023-05-02 Thread Robert Haas
On Mon, May 1, 2023 at 8:42 PM Michael Paquier  wrote:
> Another thing that may matter in terms of extensibility?  Would a
> boolean argument really be the best design?  Could it be better to
> have instead one API with a bits32 and some flags controlling its
> internals?

I wondered that, too. If we never add any more Boolean parameters to
this function then that would end up a waste, but maybe we will and
then it will be genius. Not sure what's best.
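
To make the flags-style alternative concrete, it might be declared along
these lines (the name and flag are hypothetical, not an existing libpq API):

    /* hypothetical sketch only; would live alongside PGconn in libpq-fe.h */
    #define PQ_SEND_SYNC_FLUSH  0x01    /* also flush libpq's output buffer */

    extern int PQsendSyncMessageExt(PGconn *conn, int flags);

New behaviors could then be added as further bits without another signature
change, at the cost of a slightly clunkier call for the common case.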

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ICU locale validation / canonicalization

2023-05-02 Thread Noah Misch
On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> On 30.03.23 04:33, Jeff Davis wrote:
> >Attached is a new version of the final patch, which performs
> >canonicalization. I'm not 100% sure that it's wanted, but it still
> >seems like a good idea to get the locales into a standard format in the
> >catalogs, and if a lot more people start using ICU in v16 (because it's
> >the default), then it would be a good time to do it. But perhaps there
> >are risks?
> 
> I say, let's do it.

The following is not cause for postgresql.git changes at this time, but I'm
sharing it in case it saves someone else the study effort.  Commit ea1db8a
("Canonicalize ICU locale names to language tags.") slowed buildfarm member
hoverfly, but that disappears if I drop debug_parallel_query from its config.
Typical end-to-end duration rose from 2h5m to 2h55m.  Most-affected were
installcheck runs, which rose from 11m to 19m.  (The "check" stage uses
NO_LOCALE=1, so it changed less.)  From profiles, my theory is that each of
the many parallel workers burns notable CPU and I/O opening its ICU collator
for the first time.  debug_parallel_query, by design, pursues parallelism
independent of cost, so this is working as intended.  If it ever matters in
non-debug configurations, we might raise the default parallel_setup_cost or
pre-load ICU collators in the postmaster.
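
If it ever did, one hedged mitigation an administrator could try without a
source change is simply raising that cost (value illustrative):

    ALTER SYSTEM SET parallel_setup_cost = 10000;   -- default is 1000
    SELECT pg_reload_conf();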




Cleaning up array_in()

2023-05-02 Thread Tom Lane
This is in response to Alexander's observation at [1], but I'm
starting a fresh thread to keep this patch separate from the plperl
fixes in the cfbot's eyes.

Alexander Lakhin  writes:
> I continue watching the array handling bugs dancing Sirtaki too. Now it's
> another asymmetry:
> select '{{1},{{2}}}'::int[];
>   {{{1}},{{2}}}
> but:
> select '{{{1}},{2}}'::int[];
>   {}

Bleah.  Both of those should be rejected, for sure, but it's the same
situation as in the PLs: we weren't doing anything to enforce that all
the scalar elements appear at the same nesting depth.
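
For contrast, well-formed input keeps every scalar at the same nesting depth,
for example:
select '{{1,2},{3,4}}'::int[];
  {{1,2},{3,4}}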

I spent some time examining array_in(), and was pretty disheartened
by what a mess it is.  It looks like back in the dim mists of the
Berkeley era, there was an intentional attempt to allow
non-rectangular array input, with the missing elements automatically
filled out as NULLs.  Since that was undocumented, we concluded it was
a bug and plastered on some code to check for rectangularity of the
input.  I don't quibble with enforcing rectangularity, but the
underlying logic should have been simplified while we were at it.
The element-counting logic was basically magic (why is it okay to
increment temp[ndim - 1] when the current nest_level might be
different from that?) and the extra layers of checks didn't make it
any more intelligible.  Plus, ReadArrayStr was expending far more
cycles than it needs to given the assumption of rectangularity.

So, here's a rewrite.

Although I view this as a bug fix, AFAICT the only effects are to
accept input that should be rejected.  So again I don't advocate
back-patching.  But should we sneak it into v16, or wait for v17?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/9cd163da-d096-7e9e-28f6-f3620962a660%40gmail.com

From 6a9fe8117e1b91958111c679d02a2bd7944fae22 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Mon, 1 May 2023 18:31:40 -0400
Subject: [PATCH v1 1/2] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing.  We disallowed non-rectangular input
ages ago (commit 0e13d627b), but never simplified this function
as a follow-up.  In particular, the existing code recomputes each
element's linear location from scratch, which is quite unnecessary
for rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic.  Add some more commentary while at it.

(This leaves ArrayGetOffset0() unused, but I'm unsure whether to
remove that.)
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 87c987fb27..39b5efc661 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ static bool array_isspace(char ch);
 static int	ArrayCount(const char *str, int *dim, char typdelim,
 	   Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-		 int nitems, int ndim, int *dim,
+		 int nitems,
 		 FmgrInfo *inputproc, Oid typioparam, int32 typmod,
 		 char typdelim,
 		 int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
 	dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
 	nullsPtr = (bool *) palloc(nitems * sizeof(bool));
 	if (!ReadArrayStr(p, string,
-	  nitems, ndim, dim,
+	  nitems,
 	  &my_extra->proc, typioparam, typmod,
 	  typdelim,
 	  typlen, typbyval, typalign,
@@ -457,7 +457,8 @@ array_isspace(char ch)
 
 /*
  * ArrayCount
- *	 Determines the dimensions for an array string.
+ *	 Determines the dimensions for an array string.  This includes
+ *	 syntax-checking the array structure decoration (braces and delimiters).
  *
  * Returns number of dimensions as function result.  The axis lengths are
  * returned in dim[], which must be of size MAXDIM.
@@ -704,16 +705,14 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 /*
  * ReadArrayStr :
  *	 parses the array string pointed to by "arrayStr" and converts the values
- *	 to internal format.  Unspecified elements are initialized to nulls.
- *	 The array dimensions must already have been determined.
+ *	 to internal format.  The array dimensions must have been determined,
+ *	 and the case of an empty array must have been handled earlier.
  *
  * Inputs:
  *	arrayStr: the string to parse.
  *			  CAUTION: the contents of "arrayStr" will be modified!
  *	origStr: the unmodified input string, used only in error messages.
  *	nitems: total number of array elements, as already determined.
- *	ndim: number of array dimensions
- *	dim[]: array axis lengths
  *	inputproc: type-specific input procedure for element datatype.
  *	typioparam, typmod: auxiliary values to pass to inputproc.
  *	typdelim: the value delimiter (type-specific).

code cleanup for CREATE STATISTICS

2023-05-02 Thread Robert Haas
Hi,

There is at present a comment at the top of transformStatsStmt which
says "To avoid race conditions, it's important that this function
relies only on the passed-in relid (and not on stmt->relation) to
determine the target relation." However, doing what the comment says
we need to do doesn't actually avoid the problem we need to avoid. The
issue here, as I understand it, is that if we look up the same
less-than-fully-qualified name multiple times, we might get different
answers due to concurrent activity, and that might create a security
vulnerability, along the lines of CVE-2014-0062. So what the code
should be doing is looking up the user-provided name just once and
then using that value throughout all subsequent processing stages, but
it doesn't actually. The caller of transformStatsStmt() looks up the
RangeVar and gets an OID, but that value is then discarded and
CreateStatistics does another lookup on the same name, which means
that we're not really avoiding the hazard about which the comment
seems to be concerned.
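
As a hedged illustration of the general class of hazard (not a claim about
this particular code path), two lookups of a less-than-fully-qualified name
can resolve to different tables if the namespace changes in between:

    -- session 1
    CREATE SCHEMA a;  CREATE SCHEMA b;
    CREATE TABLE a.t (x int);  CREATE TABLE b.t (x int);
    SET search_path = a, b;
    -- a first lookup of "t" here resolves to a.t

    -- session 2, concurrently
    ALTER TABLE a.t RENAME TO t_old;

    -- back in session 1, a later lookup of "t" resolves to b.t

Doing the lookup once and passing the resulting OID through all later stages
closes that window, which is what the attached patch does for CREATE
STATISTICS.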

So that leads to the question of whether there's a security
vulnerability here. I and a few other members of the pgsql-security
haven't been able to find one in brief review, but we may have missed
something. Fortunately, the permissions checks happen after the second
name lookup inside CreateStatistics(), so it doesn't seem that, for
example, you can leverage this to create extended statistics on a
table that you don't own. You can possibly get the first part of the
operation, where we transform the CREATE STATISTICS command's WHERE
clause, to operate on one table that you do own and then the second
part on another table that you don't own, but even if that's so, the
second part is just going to fail before doing anything interesting,
so it doesn't seem like there's a problem. If anyone reading this can
spot an exploit, please speak up!

So the attached patch is proposed as code cleanup, rather than
security patches. It changes the code to avoid the duplicate name
lookup altogether. There is no reason that I know of why this needs to
be back-patched for correctness, but I think it's worth putting into
master to make the code nicer and avoid doing things that in some
circumstances can be risky.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Refactor-transformStatsStmt-to-avoid-repeated-nam.patch
Description: Binary data


Re: "variable not found in subplan target list"

2023-05-02 Thread Alvaro Herrera
Hi Amit,

On 2023-Mar-30, Alvaro Herrera wrote:

> On 2023-Mar-29, Amit Langote wrote:

> > Though, I wonder if we need to keep ec386948948 that introduced the
> > notion of part_prune_index around if the project that needed it [1]
> > has moved on to an entirely different approach altogether, one that
> > doesn't require hacking up the pruning code.
> 
> Hmm, that's indeed tempting.

We have an open item about this, and I see no reason not to do it.  I
checked, and putting things back is just a matter of reverting
589bb816499e and ec386948948, cleaning up some trivial pgindent-induced
conflicts, and bumping catversion once more.  Would you like to do that
yourself, or do you prefer that I do it?  Ideally, we'd do it before
beta1.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-02 Thread Robert Haas
On Mon, May 1, 2023 at 11:21 PM Peter Geoghegan  wrote:
> I can't imagine why you feel it necessary to communicate with me like
> this. This is just vitriol, lacking any substance.

John's email is pretty harsh, but I can understand why he's frustrated.

I told you that I did not agree with your dislike for the term
wraparound and I explained why. You sent a couple more emails telling
me that I was wrong and, frankly, saying a lot of things that seem
only tangentially related to the point that I was actually making. You
seem to expect other people to spend a LOT OF TIME trying to
understand what you're trying to say, but you don't seem to invest
similar effort in trying to understand what they're trying to say. I
couldn't even begin to grasp what your point was until Maciek stepped
in to explain, and I still don't really agree with it, and I expect
that no matter how many emails I write about that, your position won't
budge an iota.

It's really demoralizing. If I just vote -1 on the patch set, then I'm
a useless obstruction. If I actually try to review it, we'll exchange
100 emails and I won't get anything else done for the next two weeks
and I probably won't feel much better about the patch at the end of
that process than at the beginning. I don't see that I have any
winning options here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-02 Thread Peter Geoghegan
On Tue, May 2, 2023 at 1:29 PM Robert Haas  wrote:
> I told you that I did not agree with your dislike for the term
> wraparound and I explained why. You sent a couple more emails telling
> me that I was wrong and, frankly, saying a lot of things that seem
> only tangentially related to the point that I was actually making.

I agree that that's what I did. You're perfectly entitled to find that
annoying (though I maintain that my point about the 64-bit XID space
was a good one, assuming the general subject matter was of interest).
However, you're talking about this as if I dug my feet in on a
substantive issue affecting the basic shape of the patch -- I don't
believe that that conclusion is justified by anything I've said or
done. I'm not even sure that we disagree on some less important point
that will directly affect the patch (it's quite possible, but I'm not
even sure of it).

I've already said that I don't think that the term wraparound is going
anywhere anytime soon (granted, that was on the other thread). So it's
not like I'm attempting to banish all existing use of that terminology
within the scope of this patch series -- far from it. At most I tried
to avoid inventing new terms that contain the word "wraparound" (also
on the other thread).

The topic originally came up in the context of moving talk about
physical wraparound to an entirely different chapter. Which is, I
believe (based in part on previous discussions), something that all
three of us already agree on! So again, I must ask: is there actually
a substantive disagreement at all?

> It's really demoralizing. If I just vote -1 on the patch set, then I'm
> a useless obstruction. If I actually try to review it, we'll exchange
> 100 emails and I won't get anything else done for the next two weeks
> and I probably won't feel much better about the patch at the end of
> that process than at the beginning. I don't see that I have any
> winning options here.

I've already put a huge amount of work into this. It is inherently a
very difficult thing to get right -- it's not hard to understand why
it was put off for so long. Why shouldn't I have opinions, given all
that? I'm frustrated too.

Despite all this, John basically agreed with my high level direction
-- all of the important points seemed to have been settled without any
arguments whatsoever (also based in part on previous discussions).
John's volley of abuse seemed to come from nowhere at all.

-- 
Peter Geoghegan




Rename 'lpp' to 'lp' in heapam.c

2023-05-02 Thread Yaphters W
Hi,

I just found the naming of the ItemId variables is not consistent in
heapam.c. There are 13 'lpp's and 112 'lp's. Technically 'lpp' is correct
as ItemId is a line pointer's pointer and there used to be code like
"++lpp" for line pointer array iteration. Now that all the "++lpp" code has
been removed and there are 100+ more occurrences of 'lp' than 'lpp', I
suggest we change 'lpp' to 'lp' to make things consistent and avoid
confusion.

Best Regards,
Zian Wang


v-1-0001-Rename-lpp-to-lp-in-heapam.c.patch
Description: Binary data


Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-02 Thread John Naylor
On Tue, May 2, 2023 at 9:55 AM Peter Geoghegan  wrote:
>
> On Mon, May 1, 2023 at 5:34 AM John Naylor  wrote:
> > 0003 is now a quick-and-dirty attempt at that, only in the docs. The
> > MXID part is mostly copy-pasted from the XID part, just to get something
> > together. I'd like to abbreviate that somehow.
>
> Yeah, the need to abbreviate statements about MultiXact IDs by saying
> that they work analogously to XIDs in some particular respect
> is...another thing that makes this tricky.

Then it sounds like they should stay separate. A direct copy-paste is not
good for style, so I will add things like:

- If for some reason autovacuum fails to clear old MXIDs from a table, the
+ As in the case with XIDs, it is possible for autovacuum to fail to [...]

It might at least be good for readability to gloss over the warning and only
quote the MXID limit error message, but we'll have to see how it looks.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Rename 'lpp' to 'lp' in heapam.c

2023-05-02 Thread David Rowley
On Wed, 3 May 2023 at 12:16, Yaphters W  wrote:
> I just found the naming of the ItemId variables is not consistent in 
> heapam.c. There are 13 'lpp's and 112 'lp's. Technically 'lpp' is correct as 
> ItemId is a line pointer's pointer and there used to be code like "++lpp" for 
> line pointer array iteration. Now that all the "++lpp" code has been removed 
> and there are 100+ more occurrences of 'Ip' than 'lpp', I suggest we change 
> 'lpp' to 'lp' to make things consistent and avoid confusion.

I don't really agree that one is any more correct than the other. I
also don't think we should be making changes like this as doing this
may give some false impression that we have some standard to follow
here that a local variable of a given type must be given a certain
name. To comply with such a standard seems like it would take close to
an endless number of patches which would just result in wasted
reviewer and committer time and give us nothing but pain while back
patching.

-1 from me.

David




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-05-02 Thread Peter Geoghegan
On Tue, May 2, 2023 at 6:46 PM John Naylor  wrote:
> It might at least be good for readability to gloss over the warning and only
> quote the MXID limit error message, but we'll have to see how it looks.

Apparently you expect me to join you in pretending that you didn't
lambast my review on this thread less than 24 hours ago [1]. I happen
to believe that this particular patch is of great strategic
importance, so I'll admit that I thought about it for a second. But
just a second -- I have more self-respect than that.

That's not the only reason, though. I also genuinely don't have the
foggiest notion what was behind what you said. In particular, this
part still makes zero sense to me:

"Claim that others are holding you back, and then try to move the
goalposts in their work"

Let me get this straight: "Moving the goalposts of their work" refers
to something *I* did to *you*, on *this* thread...right?

If anything, I'm biased in *favor* of this patch. The fact that you
seem to think that I was being obstructionist just doesn't make any
sense to me, at all. I really don't know where to go from there. I'm
not so much upset as baffled.

[1] 
https://postgr.es/m/CAFBsxsGJMp43QO2cLAh0==ueYVL35pbbEHeXZ0cnZkU=q8s...@mail.gmail.com
--
Peter Geoghegan




Re: COPY TO STDOUT Apache Arrow support

2023-05-02 Thread Adam Lippai
Hi,

There is also a new Arrow C library (one .h and one .c file) which makes it
easier to use from the postgresql codebase.

https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/
https://github.com/apache/arrow-nanoarrow/tree/main/dist

Best regards,
Adam Lippai

On Thu, Apr 13, 2023 at 2:35 PM Adam Lippai  wrote:

> Hi,
>
> There are two bigger developments in this topic:
>
>1. Pandas 2.0 is released and it can use Apache Arrow as a backend
>2. Apache Arrow ADBC is released which standardizes the client API.
>Currently it uses the postgresql wire protocol underneath
>
> Best regards,
> Adam Lippai
>
> On Thu, Apr 21, 2022 at 10:41 AM Adam Lippai  wrote:
>
>> Hi,
>>
>> would it be possible to add Apache Arrow streaming format to the copy
>> backend + frontend?
>> The use case is fetching (or storing) tens or hundreds of millions of
>> rows for client side data science purposes (Pandas, Apache Arrow compute
>> kernels, Parquet conversion etc). It looks like the serialization overhead
>> when using the postgresql wire format can be significant.
>>
>> Best regards,
>> Adam Lippai
>>
>


Re: COPY TO STDOUT Apache Arrow support

2023-05-02 Thread Pavel Stehule
Hi

st 3. 5. 2023 v 5:15 odesílatel Adam Lippai  napsal:

> Hi,
>
> There is also a new Arrow C library (one .h and one .c file) which makes
> it easier to use it from the postgresql codebase.
>
> https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/
> https://github.com/apache/arrow-nanoarrow/tree/main/dist
>
> Best regards,
> Adam Lippai
>

With 9fcdf2c787ac6da330165ea3cd50ec5155943a2b it can be implemented in an
extension.

Regards

Pavel


> On Thu, Apr 13, 2023 at 2:35 PM Adam Lippai  wrote:
>
>> Hi,
>>
>> There are two bigger developments in this topic:
>>
>>1. Pandas 2.0 is released and it can use Apache Arrow as a backend
>>2. Apache Arrow ADBC is released which standardizes the client API.
>>Currently it uses the postgresql wire protocol underneath
>>
>> Best regards,
>> Adam Lippai
>>
>> On Thu, Apr 21, 2022 at 10:41 AM Adam Lippai  wrote:
>>
>>> Hi,
>>>
>>> would it be possible to add Apache Arrow streaming format to the copy
>>> backend + frontend?
>>> The use case is fetching (or storing) tens or hundreds of millions of
>>> rows for client side data science purposes (Pandas, Apache Arrow compute
>>> kernels, Parquet conversion etc). It looks like the serialization overhead
>>> when using the postgresql wire format can be significant.
>>>
>>> Best regards,
>>> Adam Lippai
>>>
>>


Re: Large files for relations

2023-05-02 Thread Thomas Munro
On Tue, May 2, 2023 at 3:28 PM Pavel Stehule  wrote:
> I like this patch - it can save some system resources - I am not sure how much,
> because bigger tables usually use partitioning.

Yeah, if you only use partitions of < 1GB it won't make a difference.
Larger partitions are not uncommon, though.

> Important note - this feature breaks sharing files on the backup side - so 
> before disabling 1GB sized files, this issue should be solved.

Hmm, right, so there is a backup granularity continuum with "whole
database cluster" at one end, "only files whose size, mtime [or
optionally also checksum] changed since last backup" in the middle,
and "only blocks that changed since LSN of last backup" at the other
end.  Getting closer to the right end of that continuum can make
backups require less reading, less network transfer, less writing
and/or less storage space depending on details.  But this proposal
moves the middle thing further to the left by changing the granularity
from 1GB to whole relation, which can be gargantuan with this patch.
Ultimately we need to be all the way at the right on that continuum,
and there are clearly several people working on that goal.

I'm not involved in any of those projects, but it's fun to think about
an alien technology that produces complete standalone backups like
rsync --link-dest (as opposed to "full" backups followed by a chain of
"incremental" backups that depend on it so you need to retain them
carefully) while still sharing disk blocks with older backups, and
doing so with block granularity.  TL;DW something something WAL
something something copy_file_range().




Re: Large files for relations

2023-05-02 Thread Thomas Munro
On Wed, May 3, 2023 at 5:21 PM Thomas Munro  wrote:
> rsync --link-dest

I wonder if rsync will grow a mode that can use copy_file_range() to
share blocks with a reference file (= previous backup).  Something
like --copy-range-dest.  That'd work for large-file relations
(assuming a file system that has block sharing, like XFS and ZFS).
You wouldn't get the "mtime is enough, I don't even need to read the
bytes" optimisation, which I assume makes all database hackers feel a
bit queasy anyway, but you'd get the space savings via the usual
rolling checksum or a cheaper version that only looks for strong
checksum matches at the same offset, or whatever other tricks rsync
might have up its sleeve.
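
A minimal sketch of the kind of call such a backup mode could make, assuming
Linux's copy_file_range(2); this is not code from rsync or PostgreSQL:

    #define _GNU_SOURCE
    #include <unistd.h>

    /* Clone len bytes at the same offset from the previous backup file into
     * the new backup file.  On a file system with block sharing (the message
     * above mentions XFS and ZFS) the kernel can share blocks rather than
     * copy them. */
    static int
    clone_range(int prev_fd, int new_fd, off_t offset, size_t len)
    {
        while (len > 0)
        {
            off_t   in_off = offset;
            off_t   out_off = offset;
            ssize_t n = copy_file_range(prev_fd, &in_off, new_fd, &out_off, len, 0);

            if (n <= 0)
                return -1;      /* caller falls back to a plain read/write copy */
            offset += n;
            len -= (size_t) n;
        }
        return 0;
    }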




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-02 Thread samay sharma
Hi,

On Mon, Apr 24, 2023 at 2:58 PM Peter Geoghegan  wrote:

> My work on page-level freezing for PostgreSQL 16 has some remaining
> loose ends to tie up with the documentation. The "Routine Vacuuming"
> section of the docs has no mention of page-level freezing. It also
> doesn't mention the FPI optimization added by commit 1de58df4. This
> isn't a small thing to leave out; I fully expect that the FPI
> optimization will very significantly alter when and how VACUUM
> freezes. The cadence will look quite a lot different.
>
> It seemed almost impossible to fit in discussion of page-level
> freezing to the existing structure. In part this is because the
> existing documentation emphasizes the worst case scenario, rather than
> talking about freezing as a maintenance task that affects physical
> heap pages in roughly the same way as pruning does. There isn't a
> clean separation of things that would allow me to just add a paragraph
> about the FPI thing.
>
> Obviously it's important that the system never enters xidStopLimit
> mode -- not being able to allocate new XIDs is a huge problem. But it
> seems unhelpful to define that as the only goal of freezing, or even
> the main goal. To me this seems similar to defining the goal of
> cleaning up bloat as avoiding completely running out of disk space;
> while it may be "the single most important thing" in some general
> sense, it isn't all that important in most individual cases. There are
> many very bad things that will happen before that extreme worst case
> is hit, which are far more likely to be the real source of pain.
>
> There are also very big structural problems with "Routine Vacuuming",
> that I also propose to do something about. Honestly, it's a huge mess
> at this point. It's nobody's fault in particular; there has been
> accretion after accretion added, over many years. It is time to
> finally bite the bullet and do some serious restructuring. I'm hoping
> that I don't get too much push back on this, because it's already very
> difficult work.
>

Thanks for taking the time to do this. It is indeed difficult work. I'll
give my perspective as someone who has not read the vacuum code but has
learnt most of what I know about autovacuum / vacuuming by reading the
"Routine Vacuuming" page 10s of times.


>
> Attached patch series shows what I consider to be a much better
> overall structure. To make this convenient to take a quick look at, I
> also attach a prebuilt version of routine-vacuuming.html (not the only
> page that I've changed, but the most important set of changes by far).
>
> This initial version is still quite lacking in overall polish, but I
> believe that it gets the general structure right. That's what I'd like
> to get feedback on right now: can I get agreement with me about the
> general nature of the problem? Does this high level direction seem
> like the right one?
>

There are things I like about the changes you've proposed and some where I
feel that the previous section was easier to understand. I'll comment
inline on the summary below and will put in a few points about things I
think can be improved at the end.


>
> The following list is a summary of the major changes that I propose:
>
> 1. Restructures the order of items to match the actual processing
> order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
> ANALYZE and then back to VACUUM.
>
> This flows a lot better, which helps with later items that deal with
> freezing/wraparound.
>

+1


>
> 2. Renamed "Preventing Transaction ID Wraparound Failures" to
> "Freezing to manage the transaction ID space". Now we talk about
> wraparound as a subtopic of freezing, not vice-versa. (This is a
> complete rewrite, as described by later items in this list).
>

+1 on this too. Freezing is a normal part of vacuuming and while the
aggressive vacuums are different, I think just talking about the worst case
scenario while referring to it is alarmist.


>
> 3. All of the stuff about modulo-2^32 arithmetic is moved to the
> storage chapter, where we describe the heap tuple header format.
>
> It seems crazy to me that the second sentence in our discussion of
> wraparound/freezing is still:
>
> "But since transaction IDs have limited size (32 bits) a cluster that
> runs for a long time (more than 4 billion transactions) would suffer
> transaction ID wraparound: the XID counter wraps around to zero, and
> all of a sudden transactions that were in the past appear to be in the
> future"
>
> Here we start the whole discussion of wraparound (a particularly
> delicate topic) by describing how VACUUM used to work 20 years ago,
> before the invention of freezing. That was the last time that a
> PostgreSQL cluster could run for 4 billion XIDs without freezing. The
> invariant is that we activate xidStopLimit mode protections to avoid a
> "distance" between any two unfrozen XIDs that exceeds about 2 billion
> XIDs. So why on earth are we talking about 4 billion XIDs? This is the
> most c