Re: per backend I/O statistics
Hi,

On Tue, Nov 19, 2024 at 10:47:53AM +0900, Michael Paquier wrote:
> On Thu, Nov 14, 2024 at 01:30:11PM +, Bertrand Drouvot wrote:
> > - change the arguments to false in the pgstat_drop_entry_internal() call in
> >   pgstat_drop_all_entries()
> > - start the engine
> > - kill -9 postgres
> > - restart the engine
> >
> > You'll see the assert failing due to the startup process. I don't think it
> > is doing something wrong though, just populating its backend stats. Do you
> > have another view on it?
>
> It feels sad that we have to plug in the internals at this level for
> this particular case. Perhaps there is something to do with more
> callbacks. Or perhaps there is just no point in tracking the stats of
> auxiliary processes because we already have this data in the existing
> pg_stat_io already?

We cannot discard the "per backend" stats collection for auxiliary processes
because those are the source for pg_stat_io too (once the 2 flush callbacks
are merged).

I have another idea: after a bit more investigation, it turns out that only
the startup process is generating the failed assertion. So, for the startup
process only, what about:

- not calling pgstat_create_backend_stat() in pgstat_beinit()...
- but calling it in StartupXLOG() instead (after the stats are discarded or
  restored).

That way we could get rid of the changes linked to the assertion and still
handle the startup process's particular case.

Thoughts?

> > How would that work for someone doing "select * from
> > pg_stat_get_backend_io()"?
> >
> > Do you mean copy/paste part of the code from pg_stat_get_activity() into
> > pg_stat_get_backend_io() to get the backend type? That sounds like an
> > idea, I'll have a look at it.
>
> I was under the impression that we should keep PgStat_Backend for the
> reset_timestamp part, but drop BackendType as it can be guessed from
> pg_stat_activity itself.

Removing BackendType sounds doable, I'll look at it.
> >> Structurally, it may be cleaner to keep all the callbacks and the
> >> backend-I/O specific logic into a separate file, perhaps
> >> pgstat_io_backend.c or pgstat_backend_io?
> >
> > Yeah, I was wondering the same. What about pgstat_backend.c (that would
> > contain only one "section" dedicated to IO currently)?
>
> pgstat_backend.c looks good to me. Could there be other stats than
> just IO, actually? Perhaps not, just asking..

Yeah, that's the reason why I suggested "pgstat_backend.c". I would not be
surprised if we added more "per backend" stats (that are not I/O related) in
the future, as the main machinery would be there.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
On 10/31/24 11:18, Vitaly Davydov wrote:
> Dear Hackers,
>
> I'd like to discuss a problem with replication slots' restart LSN.
> Physical slots are saved to disk at the beginning of a checkpoint. At the
> end of the checkpoint, old WAL segments are recycled or removed from disk
> if they are not kept by slots' restart_lsn values.

I agree that if we can lose WAL still needed for a replication slot,
that is a bug. Retaining the WAL is the primary purpose of slots, and
we just fixed a similar issue for logical replication.

> If an existing physical slot is advanced in the middle of checkpoint
> execution, WAL segments related to the saved-on-disk restart LSN may be
> removed. This is because the calculation of the minimal replication slot
> LSN occurs at the end of the checkpoint, prior to old WAL segment
> removal. If you hard-stop (pg_ctl -m immediate) the postgres instance
> right after the checkpoint and restart it, the slot's restart_lsn may
> point to the removed WAL segment. I believe such behaviour is not good.

Not sure I 100% follow, but let me rephrase, just so that we're on the
same page. CreateCheckPoint() does this:

    ... something ...
    CheckPointGuts(checkPoint.redo, flags);
    ... something ...
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
                       checkPoint.ThisTimeLineID);

The slots get synced in CheckPointGuts(), so IIUC you're saying if the
slot gets advanced shortly after the sync, the RemoveOldXlogFiles() may
remove still-needed WAL, because we happen to consider a fresh
restart_lsn when calculating the logSegNo. Is that right?

> The doc [0] describes that restart_lsn may be set to some past value
> after reload. There is a discussion [1] on pghackers where such
> behaviour is discussed. The main reason for not flushing physical slots
> on advancing is performance. I'm OK with such behaviour, except that
> the corresponding WAL segments should not be removed.
I don't know which part of [0] you refer to, but I guess you're
referring to this:

    The current position of each slot is persisted only at checkpoint, so
    in the case of a crash the slot may return to an earlier LSN, which
    will then cause recent changes to be sent again when the server
    restarts. Logical decoding clients are responsible for avoiding ill
    effects from handling the same message more than once.

Yes, it's fine if we discard the new in-memory restart_lsn value, and we
do this for performance reasons - flushing the slot on every advance
would be very expensive. I haven't read [1] as it's quite long, but I
guess that's what it says.

But we must not take any "permanent" actions based on the unflushed
value, I think. Like, we should not remove WAL segments, for example.

> I propose to keep WAL segments by the saved-on-disk (flushed) restart_lsn
> of slots. Add a new field restart_lsn_flushed to the ReplicationSlot
> structure. Copy restart_lsn to restart_lsn_flushed in SaveSlotToPath. It
> doesn't change the format of storing the slot contents on disk. I
> attached a patch. It is not yet complete, but it demonstrates a way to
> solve the problem.

That seems like a possible fix, yes. And maybe it's the right one.

> I reproduced the problem in the following way:
>
> * Add some delay in CheckPointBuffers (pg_usleep) to emulate long
>   checkpoint execution.
> * Execute checkpoint and pg_replication_slot_advance right after the
>   start of the checkpoint from another connection.
> * Hard-restart the server right after checkpoint completion.
> * After restart, the slot's restart_lsn may point to a removed WAL
>   segment.
>
> The proposed patch fixes it.

I tried to reproduce the issue using a stress test (checkpoint+restart
in a loop), but so far without success :-( Can you clarify where exactly
you added the pg_usleep(), and how long are the waits you added? I
wonder if the sleep is really needed, considering the checkpoints are
spread anyway.
Also, what do you mean by "hard reset"?

What confuses me a bit is that we update the restart_lsn (and call
ReplicationSlotsComputeRequiredLSN() to recalculate the global value)
all the time. Walsender does that in PhysicalConfirmReceivedLocation,
for example. So we actually see the required LSN move during checkpoint
very often. So how come we don't see the issue much more often? Surely
I'm missing something important.

Another option might be that pg_replication_slot_advance() doesn't do
something it should be doing. For example, shouldn't it be marking the
slot as dirty?

regards

-- 
Tomas Vondra
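To make the suspected race concrete, here is a toy model of the sequence discussed in this thread (checkpoint flushes the slot state, the slot then advances in memory, old WAL is removed, and a hard crash restores the flushed value). The segment numbers and function names are illustrative only, not PostgreSQL's actual API:

```python
def crash_recovery_ok(flushed_lsn: int, in_memory_lsn: int,
                      trim_with_flushed: bool) -> bool:
    """Simulate end-of-checkpoint WAL removal followed by a hard restart."""
    segments = set(range(1, 11))              # WAL segments on disk
    cutoff = flushed_lsn if trim_with_flushed else in_memory_lsn
    segments -= {s for s in segments if s < cutoff}   # RemoveOldXlogFiles()
    # After a crash, the slot is restored from disk, i.e. to flushed_lsn,
    # and it still needs that segment to be present.
    return flushed_lsn in segments

# Slot flushed at segment 3, then advanced in memory to 8 mid-checkpoint:
assert not crash_recovery_ok(3, 8, trim_with_flushed=False)  # reported bug
assert crash_recovery_ok(3, 8, trim_with_flushed=True)       # proposed fix
```

The `trim_with_flushed=True` branch corresponds to the proposed restart_lsn_flushed approach: WAL removal keys off the on-disk value, so a crash can never restore the slot to a position whose WAL is gone.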
Re: wrong comment in libpq.h
On Thu, Oct 17, 2024 at 02:24:33PM +0200, Daniel Gustafsson wrote:
> > On 17 Oct 2024, at 04:45, Bruce Momjian wrote:
> >
> > I looked at this and decided the GUC section was illogical, so I just
> > moved the variables up into the be-secure.c section. Patch attached.
>
> No objections.

Patch applied to master.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Re: On non-Windows, hard depend on uselocale(3)
On Wed, Nov 20, 2024 at 10:00 PM Thomas Munro wrote:
> OK, do you think these three patches tell the _configthreadlocale()
> story properly? (Then after that we can get back to getting rid of
> it...)

Just by the way, here's another interesting thing I have learned about
the msvcrt->ucrt evolution: ucrt introduced UTF-8 support (ie locales
can use UTF-8 encoding, and all the standard library char functions work
with it just fine like on Unix), but PostgreSQL still believes that
Windows can't do that and has a lot of special code paths to work with
wchar_t and perform extra conversions. I started nibbling at that[1],
but at the time I was still a bit fuzzy on whether we could really rip
all that old stuff out yet and harmonise with Unix, as I didn't
understand the MinGW/runtime/history situation. It seems the coast is
now clear, and that would be a satisfying cleanup. (There's still
non-trivial work to do for that though: we allowed weird mismatched
encoding scenarios just on that OS, and would need to stop that, which
might create some upgrade-path problems; needs some thought, see that
thread.)

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ%3Dca39Cg%3Dy%3DS89EaCYvvCF8NrZRO%3Duog-cnz0VzC6Kfg%40mail.gmail.com
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge wrote:
> OK, I'm fine with this. v4 patch attached with one plan showing read,
> written, and dirtied buffers.

I think this might be a good time for anyone out there who is against
turning on BUFFERS when ANALYZE is on to speak up.

Votes for changing this so far seem to be: me, Michael Christofides,
Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from
2020) [1]. Votes against: none.

David

[1] https://www.postgresql.org/message-id/20200601010025.GS13741%40fetter.org
Re: Make COPY format extendable: Extract COPY TO format implementations
On Mon, Nov 18, 2024 at 8:44 PM Masahiko Sawada wrote:
>
> On Mon, Nov 18, 2024 at 5:31 PM Sutou Kouhei wrote:
> >
> > Hi,
> >
> > In
> > "Re: Make COPY format extendable: Extract COPY TO format implementations"
> > on Mon, 18 Nov 2024 17:02:41 -0800, Masahiko Sawada wrote:
> >
> > > I have a question about v22. We use pg_attribute_always_inline for
> > > some functions to avoid function call overheads. Applying it to
> > > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() is legitimate, as
> > > we've discussed. But there are more functions where the patch applied
> > > it:
> > >
> > > -bool
> > > -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> > > +static pg_attribute_always_inline bool
> > > +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
> > >
> > > -static bool
> > > -CopyReadLineText(CopyFromState cstate)
> > > +static pg_attribute_always_inline bool
> > > +CopyReadLineText(CopyFromState cstate, bool is_csv)
> > >
> > > +static pg_attribute_always_inline void
> > > +CopyToTextLikeSendEndOfRow(CopyToState cstate)
> > >
> > > I think it's out of scope of this patch even if these changes are
> > > legitimate. Is there any reason for these changes?
> >
> > Yes for NextCopyFromRawFields() and CopyReadLineText().
> > No for CopyToTextLikeSendEndOfRow().
> >
> > NextCopyFromRawFields() and CopyReadLineText() have "bool is_csv". So
> > I think that we should use pg_attribute_always_inline (or inline) like
> > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow(). I think that it's
> > not out of scope of this patch because it's a part of the
> > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() optimization.
> >
> > Note: The optimization is based on the "bool is_csv" parameter and a
> > constant "true"/"false" argument in the function call. If we can
> > inline this function call, all "if (is_csv)" checks in the function
> > are removed.
>
> Understood, thank you for pointing this out.
> > pg_attribute_always_inline (or inline) for
> > CopyToTextLikeSendEndOfRow() is out of scope of this patch. You're
> > right.
> >
> > I think that inlining CopyToTextLikeSendEndOfRow() is better because
> > it's called per row. But it's not related to the optimization.
> >
> > Should I create a new patch set without
> > pg_attribute_always_inline/inline for CopyToTextLikeSendEndOfRow()?
> > Or could you remove it when you push?
>
> Since I'm reviewing the patch and the patch organization, I'll include it.

I've extracted the changes to refactor COPY TO/FROM to use the format
callback routines from the v23 patch set, which seems to be a better
patch split to me. Also, I've reviewed these changes and made some
changes on top of them. The attached patches are:

0001: make COPY TO use CopyToRoutine.
0002: minor changes to 0001 patch. will be fixed up.
0003: make COPY FROM use CopyFromRoutine.
0004: minor changes to 0003 patch. will be fixed up.

I've confirmed that v24 has a similar performance improvement to v23.
Please check these extractions and minor change suggestions.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From 257a284447e64753277f7bc08b387e901bcab8bb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada
Date: Tue, 19 Nov 2024 11:52:33 -0800
Subject: [PATCH v24 2/4] fixup: fixup: minor updates for COPY TO refactoring.

includes:
- reorder function definitions.
- cleanup comments.
---
 src/backend/commands/copyto.c  | 242 +++--
 src/include/commands/copyapi.h |  23 ++--
 2 files changed, 121 insertions(+), 144 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 46f3507a8b..73b9ca4457 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -65,7 +65,7 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
-	/* format routine */
+	/* format-specific routines */
 	const CopyToRoutine *routine;
 
 	/* low-level state data */
@@ -118,6 +118,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+								 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-leve
Re: UUID v7
On Tue, Nov 19, 2024 at 7:54 PM Andrey M. Borodin wrote:
>
> > On 20 Nov 2024, at 00:06, Masahiko Sawada wrote:
> >
> > On Tue, Nov 19, 2024 at 9:45 AM Andrey M. Borodin wrote:
> >>
> >>> On 19 Nov 2024, at 14:31, Andrey M. Borodin wrote:
> >>>
> >>> Done.
> >>
> >> Here's v33 intact + one more patch to add 2 bits of entropy on MacOS
> >> (to compensate for the lack of nanoseconds).
> >> What do you think?
> >
> > Thank you for updating the patch!
> >
> > I've reviewed the v33 patch and made some changes, mostly for cosmetic
> > things. Please review it to see if we accept these changes.
>
> Your changes look good to me. I particularly like the sortability test.
> I see that you removed the implementation of clock_gettime() for
> Windows. Well, this makes sense.
>
> > I have one question about the additional patch:
> >
> > +#if defined(__darwin__)
> > +	/*
> > +	 * On MacOS real time is truncated to microseconds. Thus, 2 least
> > +	 * significant bits of increased_clock_precision are neither random
> > +	 * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
> > +	 * are dependent on other time-specific bits, thus they do not contribute
> > +	 * to uniqueness. To make these bits random we mix in two bits from CSPRNG.
> > +	 */
> > +	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
> > +#endif
> >
> > I thought that the whole 12 bits in "rand_a" are actually
> > time-dependent since we store a 1/4096 fraction of sub-milliseconds.
> > Am I missing something?
>
> We have 12 bits in increased_clock_precision but only 1000 possible
> values of these bits. The 2 least significant bits are defined by the
> other 10 bits. These bits are not equal to 0, they are changing.
> True, these bits are time-dependent in the sense that they can be
> computed from a full timestamp. I wanted to express the fact that the
> timestamp cannot be altered in a way such that only these 2 bits are
> changed.

Understood the idea.
But does replacing the least significant 2 bits with 2 random bits
really not affect monotonicity? The ensured minimal timestamp step is
245 ns, ((NS_PER_MS / (1 << 12)) + 1), meaning that if two UUIDs are
generated within a microsecond on macOS, the two timestamps differ by
245 ns. After calculating the increased clock precision with these two
timestamps, they differ only by 1, which seems to be problematic to me.

Suppose the two timestamps are:

ns1: 1732142033754429000 (Nov 20, 2024 10:33:53.754429000)
ns2: 1732142033754429245 (Nov 20, 2024 10:33:53.754429245)

Their sub-milliseconds are calculated (by multiplying by 4096) to:

subms1: 1757 (0b011011011101)
subms2: 1758 (0b011011011110)

If we replace the least significant bits '01' of subms1 with random bits
'11' and replace '10' of subms2 with '00', we cannot guarantee the
monotonicity.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
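For what it's worth, the sub-millisecond arithmetic in the example above can be checked with a few lines of Python (NS_PER_MS and the 12-bit scaling are taken from the discussion; everything else is illustrative):

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond

def subms_12bit(ns: int) -> int:
    """Scale the sub-millisecond part of a nanosecond timestamp to 12 bits."""
    return (ns % NS_PER_MS) * 4096 // NS_PER_MS

# Minimal ensured timestamp step from the discussion: 245 ns.
step = NS_PER_MS // (1 << 12) + 1
assert step == 245

ns1 = 1732142033754429000
ns2 = ns1 + step

# Adjacent timestamps map to 12-bit values that differ by exactly 1,
# so randomizing the low 2 bits can reorder them.
assert subms_12bit(ns1) == 1757
assert subms_12bit(ns2) == 1758
```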
Re: per backend I/O statistics
On Wed, Nov 20, 2024 at 02:10:23PM +, Bertrand Drouvot wrote:
> Yeah, also this could be useful for custom statistics. So I created a
> dedicated thread and a patch proposal (see [1]).
>
> [1]: https://www.postgresql.org/message-id/Zz3skBqzBncSFIuY%40ip-10-97-1-34.eu-west-3.compute.internal

Thanks for putting that on its own thread. Will look at it.

-- 
Michael
Re: ci: Macos failures due to MacPorts behaviour change
Oh, and yeah, we should include the branch name in the cache key.
Something like the attached. For some reason CI is not allowing me to
see the output from macOS right now (?!) so I couldn't see what
"Populate macports cache" printed out[1], but I think this should be
right... will try again tomorrow.

I guess the alternative would be to set the package list the same across
all branches, even though they need different stuff, so they could share
the same cache without fighting over it?

[1] https://cirrus-ci.com/task/6707217489985536

From 191236b828e737de2b1a9b62140d0f9b2c9dbb70 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 18 Nov 2024 21:46:38 +1300
Subject: [PATCH v2] ci: Fix cached MacPorts installation management.

1. The error reporting of "port setrequested list-of-packages..."
changed, hiding errors we were relying on to know if all packages in
our list were already installed.  Use a loop instead.

2. The cached MacPorts installation was shared between PostgreSQL major
branches, though each branch wanted different packages.  Add the branch
name to the cache key, so that different branches, when tested in one
github account/repo such as postgres/postgres, stop fighting with each
other, adding and removing packages.

Back-patch to 15 where CI began.

Diagnosed-by: Andres Freund
Discussion: https://postgr.es/m/au2uqfuy2nf43nwy2txmc5t2emhwij7kzupygto3d2ffgtrdgr%40ckvrlwyflnh2
---
 .cirrus.tasks.yml                    |  2 ++
 src/tools/ci/ci_macports_packages.sh | 10 --
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index fc413eb11ef..cd8e1b5eb0b 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -466,6 +466,8 @@ task:
       # Include the OS major version in the cache key.  If the OS image changes
       # to a different major version, we need to reinstall.
       sw_vers -productVersion | sed 's/\..*//'
+      # Include the branch name, because branches want different packages
+      echo ${CIRRUS_BRANCH}
       # Also start afresh if we change our MacPorts install script.
       md5 src/tools/ci/ci_macports_packages.sh
     reupload_on_changes: true
diff --git a/src/tools/ci/ci_macports_packages.sh b/src/tools/ci/ci_macports_packages.sh
index b3df6d36a4e..41cb83593f7 100755
--- a/src/tools/ci/ci_macports_packages.sh
+++ b/src/tools/ci/ci_macports_packages.sh
@@ -61,9 +61,15 @@ fi
 
 # if setting all the required packages as requested fails, we need
 # to install at least one of them
-if ! sudo port setrequested $packages > /dev/null 2>&1 ; then
-    echo not all required packages installed, doing so now
+echo "checking if all required packages are installed"
+for package in $packages ; do
+    if ! sudo port setrequested $package > /dev/null 2>&1 ; then
         update_cached_image=1
+    fi
+done
+echo "done"
+if [ "$update_cached_image" -eq 1 ]; then
+    echo not all required packages installed, doing so now
     # to keep the image small, we deleted the ports tree from the image...
     sudo port selfupdate
     # XXX likely we'll need some other way to force an upgrade at some
-- 
2.47.0
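The per-package loop in the patch follows a common shell pattern: check items one at a time and set a flag if any check fails. A standalone sketch of that pattern, with a toy `check_pkg` standing in for the real `sudo port setrequested` call (note the flag is explicitly initialized here; the real script is assumed to handle that elsewhere):

```shell
# Toy stand-in for "sudo port setrequested": succeeds only for known packages.
check_pkg() {
    case "$1" in
        meson|ccache) return 0 ;;
        *) return 1 ;;
    esac
}

packages="meson ccache libxml2"   # hypothetical package list
update_cached_image=0             # initialize so the test below never sees an empty value

for package in $packages ; do
    if ! check_pkg "$package" ; then
        update_cached_image=1
    fi
done

if [ "$update_cached_image" -eq 1 ]; then
    echo "not all required packages installed"
fi
```

Checking each package individually means one failure no longer masks the others, which is the point of the patch: the combined `setrequested list-of-packages` form stopped reporting errors usefully.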
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Alexander Lakhin writes:
> 17.11.2024 05:33, Tom Lane wrote:
>> Yeah. This has been happening off-and-on in the buildfarm ever
>> since we added that test. I'm not sure if it's just "the test
>> is unstable" or if it's telling us there's a problem with the
>> cancel logic. Scraping the last 3 months worth of buildfarm
>> logs finds these instances:

> Yes, I counted those bf failures at [1] too and posted my explanation
> upthread [2].

Sorry, I'd forgotten about that. I added some more debug logging to the
modifications you made, and confirmed your theory that the remote
session is ignoring the cancel request because it receives it while
DoingCommandRead is true; which must mean that it hasn't started the
slow query yet. This implies that the 100ms delay in query_cancel.sql
does not reliably give the remote enough time to receive the command,
which surprises me, especially since the failing animals aren't
particularly slow ones. Maybe there is something else happening? But I
do reproduce the failure after adding your delays, and the patch I'm
about to propose does fix it.

Anyway, given that info, Jelte's unapplied 0002 patch earlier in the
thread is not the answer, because this is about dropping a query cancel,
not about losing a timeout interrupt. The equivalent thing to what he
suggested would be to not clear the cancel request flag during
DoingCommandRead, instead letting it kill the next query. But I didn't
like that idea for timeouts, and I like it even less for query cancel.

What I think we should do instead is to re-issue the cancel request if
we've waited a little and nothing came of it. This corresponds more or
less to what a human user would likely do (or at least this human
would). The attached patch is set up to re-cancel after 1 second, then
2 more seconds, then 4 more, etc, until we reach the 30-second "it's
dead, Jim" threshold. This seems to fix the problem here.

Thoughts?
BTW, while I didn't do it in the attached, I'm tempted to greatly
reduce the 100ms delay in query_cancel.sql. If this does make it more
robust, we shouldn't need that much time anymore.

			regards, tom lane

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 2326f391d3..7a8cac83cb 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -95,6 +95,13 @@ static uint32 pgfdw_we_get_result = 0;
  */
 #define CONNECTION_CLEANUP_TIMEOUT	3
 
+/*
+ * Milliseconds to wait before issuing another cancel request.  This covers
+ * the race condition where the remote session ignored our cancel request
+ * because it arrived while idle.
+ */
+#define RE_CANCEL_TIMEOUT	1000
+
 /* Macro for constructing abort command to be sent */
 #define CONSTRUCT_ABORT_COMMAND(sql, entry, toplevel) \
 	do { \
@@ -145,6 +152,7 @@ static void pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel);
 static bool pgfdw_cancel_query(PGconn *conn);
 static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime);
 static bool pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime,
+								   TimestampTz recanceltime,
 								   bool consume_input);
 static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 									 bool ignore_errors);
@@ -154,6 +162,7 @@ static bool pgfdw_exec_cleanup_query_end(PGconn *conn, const char *query,
 										 bool consume_input,
 										 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
+									 TimestampTz recanceltime,
 									 PGresult **result, bool *timed_out);
 static void pgfdw_abort_cleanup(ConnCacheEntry *entry, bool toplevel);
 static bool pgfdw_abort_cleanup_begin(ConnCacheEntry *entry, bool toplevel,
@@ -1322,18 +1331,25 @@ pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel)
 static bool
 pgfdw_cancel_query(PGconn *conn)
 {
+	TimestampTz now = GetCurrentTimestamp();
 	TimestampTz endtime;
+	TimestampTz recanceltime;
 
 	/*
	 * If it takes too long to cancel the query and discard the result, assume
	 * the
	 * connection is dead.
	 */
-	endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
-										  CONNECTION_CLEANUP_TIMEOUT);
+	endtime = TimestampTzPlusMilliseconds(now, CONNECTION_CLEANUP_TIMEOUT);
+
+	/*
+	 * Also, lose patience and re-issue the cancel request after a little bit.
+	 * (This serves to close some race conditions.)
+	 */
+	recanceltime = TimestampTzPlusMilliseconds(now, RE_CANCEL_TIMEOUT);
 
 	if (!pgfdw_cancel_query_begin(conn, endtime))
 		return false;
-	return pgfdw_cancel_query_end(conn, endtime, false);
+	return pgfdw_cancel_query_end(conn, endtime, recanceltime, false);
 }
 
 /*
@@ -1359,9 +1375,10 @@ pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime)
 }
 
 static bool
-pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime, bool consume_input)
+pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime,
+					   TimestampTz recanceltime, bool consume_input)
 {
-	PGresult   *result = NUL
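One way to read the re-cancel schedule described in this thread (1 second, then 2 more, then 4 more, doubling until the 30-second dead-connection threshold) as code — purely illustrative, not the patch's actual control flow:

```python
def recancel_delays(cap_s: int = 30) -> list[int]:
    """Delays (in seconds) between successive cancel requests: doubling each
    time, with the last delay clamped so the total never exceeds the cap."""
    delays, d, total = [], 1, 0
    while total < cap_s:
        step = min(d, cap_s - total)
        delays.append(step)
        total += step
        d *= 2
    return delays

print(recancel_delays())  # [1, 2, 4, 8, 15] -> cumulative 1, 3, 7, 15, 30
```

Exponential backoff here keeps the retry cheap when the first cancel was simply lost, while avoiding a flood of cancel requests against a remote that is genuinely stuck.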
Re: minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions
On Tue, Nov 5, 2024 at 05:24:07PM +0800, jian he wrote:
> On Thu, Oct 31, 2024 at 11:51 PM Bruce Momjian wrote:
> >
> > On Fri, Oct 18, 2024 at 10:00:54AM +0800, jian he wrote:
> > > On Fri, Oct 18, 2024 at 2:05 AM Bruce Momjian wrote:
> > > > Yes, updated patch attached.
> > >
> > > looks good.
> > >
> > > in the meantime, do you think it's necessary to slightly rephrase
> > > the jsonb_path_match doc entry?
> > >
> > > currently the doc entry reads:
> > >
> > > jsonb_path_match ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → boolean
> > > Returns the result of a JSON path predicate check for the specified
> > > JSON value.
> > >
> > > "the result of a JSON path predicate check for the specified JSON
> > > value" is a jsonb boolean, but jsonb_path_match returns an SQL
> > > boolean. Maybe add something to describe a case like: "if the JSON
> > > path predicate check returns jsonb null, jsonb_path_match will
> > > return SQL null".
> >
> > Yes, I think that is a good point, updated patch attached.
>
> played with
> https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-FILTER-EX-TABLE
>
> The patch looks good to me.

Patch applied back to PG 17.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Re: Accessing other session's temp table
Mikhail Gribkov writes:
> What do you think?

I think this will break cases we don't want to break. Accessing the
metadata of other temp tables is fine, and indeed necessary for
operations like dropping them. It's access to the table contents that
needs to be blocked. I'm surprised that we don't have sufficient tests
at that level.

[ experiments... ]

It looks like this did work as expected up through v15. So somebody
broke it fairly recently, perhaps as a side effect of the table-AM work.
Might be worth bisecting to see where it broke.

(Realistically though, this is all only a problem for superusers, who
are supposed to Know Better. For ordinary users the permissions set on
temp schemas ought to be enough to prevent such things.)

			regards, tom lane
Re: Return pg_control from pg_backup_stop().
On 10/3/24 05:11, David Steele wrote:
> On 10/3/24 07:45, Michael Paquier wrote:
>> 1) is something that has more value than 2), IMO, because there is no
>> need for a manual step when a backup is taken by the replication
>> protocol.
>
> Well, custom backup solutions that rely on the replication protocol to
> copy the data would need to make sure that they have a backup_label,
> but that's something they should do anyway and what this patch wants
> to protect users from.
>
>> The SQL part is optional IMO. It can be done, but it has less impact
>> overall and makes backups more complicated by requiring the manual
>> copy of the control file.
>
> I don't think having incremental backup in pg_basebackup means
> alternate backup solutions are going away or that we should deprecate
> the SQL interface. If nothing else, third-party solutions need a way
> to get an untorn copy of pg_control, and in general I think the new
> flag will be universally useful.

I updated this patch to fix an issue with -fsanitize=alignment. I'm not
entirely happy with copying twice but not sure of another way to do it.
As far as I can see VARDATA() will not return aligned data on 64-bit
architectures.

Regards,
-David

From 659a1d9b6b1528e139741dc69146848ca15a07b9 Mon Sep 17 00:00:00 2001
From: David Steele
Date: Wed, 20 Nov 2024 22:33:36 +
Subject: Return pg_control from pg_backup_stop().

Harden recovery by returning a copy of pg_control from pg_backup_stop()
that has a flag set to prevent recovery if the backup_label file is
missing. Instead of backup software copying pg_control from PGDATA, it
stores an updated version that is returned from pg_backup_stop(). This
is better for the following reasons:

* The user can no longer remove backup_label and get what looks like a
successful recovery (while almost certainly causing corruption). If
backup_label is removed the cluster will not start. The user may try
pg_resetwal, but that tool makes it pretty clear that corruption will
result from its use.
* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves torn reads without
needing to write pg_control via a temp file, which may affect
performance on a standby.

* For backup from standby, we no longer need to instruct the backup
software to copy pg_control last. In fact the backup software should
not copy pg_control from PGDATA at all.

These changes have no impact on current backup software and they are
free to use the pg_control available from pg_stop_backup() or continue
to use pg_control from PGDATA. Of course they will miss the benefits of
getting a consistent copy of pg_control and the backup_label checking,
but will be no worse off than before.

Catalog version bump is required.
---
 doc/src/sgml/backup.sgml                    | 18 +-
 doc/src/sgml/func.sgml                      |  5 +-
 src/backend/access/transam/xlogfuncs.c      | 20 --
 src/backend/catalog/system_functions.sql    |  2 +-
 src/include/catalog/pg_proc.dat             |  4 +-
 src/test/recovery/t/042_low_level_backup.pl | 67 -
 6 files changed, 101 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index e4e4c56cf14..2fcf1811213 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
     values. The second of these fields should be written to a file named
     backup_label in the root directory of the backup. The
     third field should be written to a file named
-    tablespace_map unless the field is empty. These files are
+    tablespace_map unless the field is empty. The fourth
+    field should be written into a file named
+    global/pg_control (overwriting the existing file when
+    present). These files are
     vital to the backup working and must be written byte for byte without
-    modification, which may require opening the file in binary mode.
+    modification, which will require opening the file in binary mode.
@@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); -You should, however, omit from the backup the files within the +You should exclude global/pg_control from your backup +and put the contents of the controlfile column +returned from pg_backup_stop in your backup at +global/pg_control. This version of pg_control contains +safeguards against recovery without backup_label present and is guaranteed +not to be torn. + + + +You should also omit from the backup the files within the cluster's pg_wal/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index fa37a0469ae..b3d8c73295b 100644 --- a/doc/src/sgml/func.
Re: cannot freeze committed xmax
> On Nov 20, 2024, at 6:39 AM, Andrey M. Borodin wrote:
>
>> On 20 Nov 2024, at 15:58, Andrey M. Borodin wrote:
>>
>> PFA the patch doing so.
>
> Ugh. The patch is simply dysfunctional, sorry. xmax_status is being checked uninitialized.
> But, well, it highlights the idea: make verify_heapam() aware of such corruptions.
> What do you think?

I like the idea of increasing the corruption checking coverage. The worry with these patches is that we'll overlook some legitimate use case of the status bits and call it corruption when it isn't. Indeed, that appears to be the case with this patch, assuming I initialized the xmax_status field in the way you had in mind, and that applying it to REL_17_STABLE is ok. (Maybe this would work differently on HEAD?)

+ get_xid_status(xmax, ctx, &xmax_status);
+ if (xmax_status == XID_COMMITTED && (tuphdr->t_infomask & HEAP_UPDATED))
+ {
+     report_corruption(ctx,
+                       psprintf("committed xmax %u while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED flags",
+                                xmax));
+ }

That results in TAP test failures on an uncorrupted but frozen table:

# +++ tap check in contrib/amcheck +++
t/001_verify_heapam.pl . 74/?
# Failed test 'all-frozen not corrupted table'
# at t/001_verify_heapam.pl line 53.
# got: '30|117||committed xmax 2 while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED flags'
# expected: ''
t/001_verify_heapam.pl . 257/?
# Looks like you failed 1 test of 272.
t/001_verify_heapam.pl . Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/272 subtests

The first part of your patch, which checks the xmin_status, seems ok at first glance.

— Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
On 11/20/24 18:24, Tomas Vondra wrote:
>
> ...
>
> What confuses me a bit is that we update the restart_lsn (and call
> ReplicationSlotsComputeRequiredLSN() to recalculate the global value)
> all the time. Walsender does that in PhysicalConfirmReceivedLocation for
> example. So we actually see the required LSN to move during checkpoint
> very often. So how come we don't see the issues much more often? Surely
> I miss something important.
>

This question "How come we don't see this more often?" kept bugging me, and the answer is actually pretty simple. The restart_lsn can move backwards after a hard restart (for the reasons explained), but physical replication does not actually rely on that. The replica keeps track of the LSN it received (well, it uses the same LSN), and on reconnect it sends the startpoint to the primary. And the primary just proceeds to use that instead of the (stale) restart LSN for the slot. And the startpoint is guaranteed (I think) to be at least restart_lsn.

AFAICS this would work for pg_replication_slot_advance() too, that is, if you remember the last LSN the slot advanced to, it should be possible to advance to it just fine. Of course, it requires a way to remember that LSN, which for a replica is not an issue. But this just highlights that we can't rely on restart_lsn for this purpose.

(Apologies if this repeats something obvious, or something you already said, Vitaly.)

regards

--
Tomas Vondra
Re: per backend I/O statistics
On Wed, Nov 20, 2024 at 02:20:18PM +, Bertrand Drouvot wrote:
> Right. I did not have in mind to go that far here (for the per backend stats
> needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
> is related to per backend stats only) call to the right place in StartupXLOG()
> for the startup process only. As that's the startup process that will reset
> or restore the stats, I think that makes sense.
>
> It looks like what you have in mind is much more generic, so what about:
>
> - Focus on this thread first and then move the call as proposed above
> - Think about a more generic idea later on (once the per-backend I/O stats are in)

Moving pgstat_create_backend_stat() may be OK in the long-term; at least that would document why we need to care about the startup process. Still, moving only the call feels incomplete, so how about adding a boolean field in PgStat_ShmemControl that defaults to false when the shmem area is initialized in StatsShmemInit(), then is switched to true once we know that the stats have been restored or reset by the startup process? Something like that should work to control the dshash inserts, I guess?
--
Michael

signature.asc
Description: PGP signature
Re: sunsetting md5 password support
On Wed, Nov 20, 2024 at 11:33 AM Nathan Bossart wrote: > After thinking about this some more, I'm actually finding myself leaning > towards leaving the hint and potentially adding more detail to the > documentation as a follow-up patch. Sounds good to me. I think my hesitation was more that the hint was overpromising help, so big +1 to more detail and keeping it. Cheers, Greg
Re: per backend I/O statistics
Hi,

On Wed, Nov 20, 2024 at 09:01:26AM +0900, Michael Paquier wrote:
> On Tue, Nov 19, 2024 at 04:28:55PM +, Bertrand Drouvot wrote:
> > So, for the startup process only, what about?
> >
> > - don't call pgstat_create_backend_stat() in pgstat_beinit()...
> > - but call it in StartupXLOG() instead (after the stats are discarded or restored).
> >
> > That way we could get rid of the changes linked to the assertion and still handle
> > the startup process particular case. Thoughts?
>
> Hmm. That may prove to be a good idea in the long-term. The startup
> process is a specific path kicked in at a very early stage, so it is
> also a bit weird that we'd try to insert statistics while we are going
> to reset them anyway a bit later.

Exactly.

> That may also be relevant for
> custom statistics, actually, especially if some calls happen in some
> hook paths taken before the reset is done. This could happen for
> injection point stats when loading it with shared_preload_libraries,
> actually, if you load, attach or run a callback at this early stage.

Right. I did not have in mind to go that far here (for the per backend stats needs). My idea was "just" to move the new pgstat_create_backend_stat() (which is related to per backend stats only) call to the right place in StartupXLOG() for the startup process only. As that's the startup process that will reset or restore the stats, I think that makes sense.

It looks like what you have in mind is much more generic, so what about:

- Focus on this thread first and then move the call as proposed above
- Think about a more generic idea later on (once the per-backend I/O stats are in)

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: proposal: schema variables
> On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> Hi
>
> I wrote a POC of VARIABLE(varname) syntax support

Thanks, the results look good. I'm still opposed to the idea of having a warning, and think it has to be an error -- but that's my subjective opinion. Let's see if there are more votes on that topic.
Re: ALTER TABLE uses a bistate but not for toast tables
> On Wed, Nov 20, 2024 at 06:43:58AM -0600, Justin Pryzby wrote: > > > Thanks for rebasing. To help with review, could you also describe > > current status of the patch? I have to admit, currently the commit > > message doesn't tell much, and looks more like notes for the future you. > > The patch does what it aims to do and AFAIK in a reasonable way. I'm > not aware of any issue with it. It's, uh, waiting for review. > > I'm happy to expand on the message to describe something like design > choices, but the goal here is really simple: why should wide column > values escape the intention of the ring buffer? AFAICT it's fixing an > omission. If you have a question, please ask; that would help to > indicate what needs to be explained. Here is what I see in the commit message: DONE: ALTER, CLUSTER TODO: copyto, copyfrom? slot_getsomeattrs slot_deform_heap_tuple fetchatt heap_getnextslot => heapgettup => heapgetpage => ReadBufferExtended initscan table_beginscan table_scan_getnextslot RelationCopyStorageUsingBuffer ReadBufferWithoutRelcache (gdb) bt #0 table_open (relationId=relationId@entry=16390, lockmode=lockmode@entry=1) at table.c:40 #1 0x56444cb23d3c in toast_fetch_datum (attr=attr@entry=0x7f67933cc6cc) at detoast.c:372 #2 0x56444cb24217 in detoast_attr (attr=attr@entry=0x7f67933cc6cc) at detoast.c:123 #3 0x56444d07a4c8 in pg_detoast_datum_packed (datum=datum@entry=0x7f67933cc6cc) at fmgr.c:1743 #4 0x56444d042c8d in text_to_cstring (t=0x7f67933cc6cc) at varlena.c:224 #5 0x56444d0434f9 in textout (fcinfo=) at varlena.c:573 #6 0x56444d078f10 in FunctionCall1Coll (flinfo=flinfo@entry=0x56444e4706b0, collation=collation@entry=0, arg1=arg1@entry=140082828592844) at fmgr.c:1124 #7 0x56444d079d7f in OutputFunctionCall (flinfo=flinfo@entry=0x56444e4706b0, val=val@entry=140082828592844) at fmgr.c:1561 #8 0x56444ccb1665 in CopyOneRowTo (cstate=cstate@entry=0x56444e470898, slot=slot@entry=0x56444e396d20) at copyto.c:975 #9 0x56444ccb2c7d in DoCopyTo 
(cstate=cstate@entry=0x56444e470898) at copyto.c:891
#10 0x56444ccab4c2 in DoCopy (pstate=pstate@entry=0x56444e396bb0, stmt=stmt@entry=0x56444e3759b0, stmt_location=0, stmt_len=48, processed=processed@entry=0x7ffc212a6310) at copy.c:308

cluster:
heapam_relation_copy_for_cluster
reform_and_rewrite_tuple
rewrite_heap_tuple
raw_heap_insert

This gave me the impression that the patch is deeply WIP, and that it doesn't make any sense to review it. I can imagine chances are good that I'm not alone in getting that impression, and that you lose potential reviewers. Thus, shaping up a meaningful message might be helpful.

> > Since it's in the performance category, I'm also curious how much
> > overhead does this shave off? I mean, I get it that bulk insert strategy
> > helps with buffers usage, as you've implied in the thread -- but how
> > does it look like in benchmark numbers?
>
> The intent of using a bistate isn't to help the performance of the
> process using the bistate. Rather, the intent is to avoid harming the
> performance of other processes. If anything, I expect it could slow
> down the process using bistate -- the same as for non-toast data.
>
> https://www.postgresql.org/message-id/CA%2BTgmobC6RD2N8kbPPTvATpUY1kisY2wJLh2jsg%3DHGoCp2RiXw%40mail.gmail.com

Right, but the question is still there: how much does it bring? My point is that if you demonstrate "under this and that load, we get so and so many percents boost", this will hopefully attract more attention to the patch.
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Thu, Nov 21, 2024 at 10:22:54AM +1300, David Rowley wrote: > On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge > wrote: > > OK, I'm fine with this. v4 patch attached with one plan showing read, > > written, and dirtied buffers. > > I think this might be a good time for anyone out there who is against > turning on BUFFERS when ANALYZE is on to speak up. > > Votes for changing this so far seem to be: Me, Michael Christofides, > Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from > 2020) [1]. Please add me to the above list. What we have now is too confusing. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Fix an error while building test_radixtree.c with TEST_SHARED_RT
On 19/11/2024 01:20, Masahiko Sawada wrote: > I realized that building test_radixtree.c with TEST_SHARED_RT fails > because it eventually sets RT_SHMEM when #include'ing radixtree.h but > it's missing some header files to include. I've attached a patch to > include necessary header files in radixtree.h to make it > self-contained. If those includes are only needed when RT_SHMEM is defined, I suggest they should be guarded by #ifdef RT_SHMEM, per Peter E's IWYU efforts lately. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Accessing other session's temp table
Hi hackers,

I was experimenting with temporary tables and noticed many odd things about them. Short story: having appropriate privileges, a user can access other sessions' temp tables, and it is a complete mess. Let's fix it.

Longer story. Let's open two sessions for one postgres user. In the first one we will create a temporary table:

postgres=# create temp table t(val int);
CREATE TABLE
postgres=# \dt
        List of relations
  Schema   | Name | Type  |  Owner
-----------+------+-------+----------
 pg_temp_0 | t    | table | postgres
(1 row)

Now, in the second session we will try to insert a couple of rows into this table:

postgres=# insert into pg_temp_0.t values (1);
INSERT 0 1
postgres=# insert into pg_temp_0.t values (2);
ERROR:  cannot access temporary tables of other sessions

Wow! The first row insertion succeeded, the second one failed. But this is just the beginning. Now we will switch back to the first session, insert a couple of rows from there, and check what we have in the table:

postgres=# insert into pg_temp_0.t values (3);
INSERT 0 1
postgres=# insert into pg_temp_0.t values (4);
INSERT 0 1
postgres=# select * from pg_temp_0.t;
 val
-----
   3
   4
(2 rows)

We only see the values added in this session. The row successfully inserted from the second session is absent. Now let's switch to the second session and check the table there:

postgres=# select * from pg_temp_0.t;
 val
-----
   1
(1 row)

Wow again! In both sessions we only see the rows that very session inserted. This looks like very bad behavior to me. The same situation occurs with DELETE and in fact with all commands handled in ExecModifyTable. By the way, the second session does not have to be opened by the same user: it can be some other user who was given privileges on the temp schema and table.

The problem is that temp tables are not intended to be accessed by other sessions at all, and most of the work with them is done in local buffers with no disk syncing. There are already some checks and restrictions, but these are definitely not enough.
Attached is a proposed patch fixing the above problems. Maybe relation_open is not the prettiest place for such checks, but it's quite obvious, it works and it covers all the remaining problems. And the check is very cheap so we should not be afraid of doing it often. What do you think? -- best regards, Mikhail A. Gribkov 001-patch-v01-Fix-other-sessions-temp-tables-access.patch Description: Binary data
Re: allow changing autovacuum_max_workers without restarting
The following review has been posted through the commitfest application:

make installcheck-world: tested, failed
Implements feature:      tested, failed
Spec compliant:          tested, failed
Documentation:           not tested

Hi,

- Tested patch with check-world.
- Verified CheckAutovacuumWorkerGUCs functionality and the correct WARNING was reported.
- For feature specific testing, I created multiple tables and generated bloat. Expected behavior was witnessed.

Lower autovacuum_worker_slots = 16 setting is better suited to start with.

Thanks
Yogesh
Re: Document NULL
Thank you for taking the time to look this over. On Wed, Nov 20, 2024 at 3:19 AM jian he wrote: > On Sat, Jun 29, 2024 at 4:40 AM David G. Johnston > wrote: > > > > The attached are complete and ready for review. I did some file > structure reformatting at the end and left that as the second patch. The > first contains all of the content. > > > > I'm adding this to the commitfest. > > > > Thanks! > > > > David J. > > in doc/src/sgml/nullvalues.sgml > can we mention > \pset null NULL > command, then NULL means this value is NULL. > > you can also see doc/src/sgml/func.sgml >(The above example can be copied-and-pasted >into psql to set things up for the following >examples. > Good idea. I'll see how it plays out. > - > in doc/src/sgml/nullvalues.sgml > see the attached for one example output > > in doc/src/sgml/nullvalues.sgml we have > one_whitespace > two_whitespace > three_whitespace > four_whitespace > > i think you need zero whitespace for tag . like > > > > https://tdg.docbook.org/tdg/4.5/programlisting > says whitespaces are significant. > Did you not apply patch 0002? The indentation in 0001 exists because it was much easier to deal with collapse-all related viewing in my editor. I removed it in 0002 since the final commit would indeed not be so indented. The tag itself doesn't actually care but its content does indeed get undesirably indented if the markup is nested in the typical manner. > <<>> >As noted in , JSON has a null > value >that does not get exposed at the SQL level. > <<>> > i feel like this sentence is weird. since these two are different things. > Yeah, the linked page and this summary/pointer need a bit of work. I don't like the unexplained "different concept" but probably "not exposed" isn't much better. As this gets closer to being committable I'll see about getting this topic cleaned up. Suggestions welcomed. > > I think some of the query and query output can be combined into one > . > no need one for the query, one for the output. 
> Trivial to change but having both seems more semantically correct and easier to read IMO. We don't have a policy document covering this that I'm aware of, and IIRC both variations presently exist in the documentation. David J.
Re: Sample rate added to pg_stat_statements
On Tue, Nov 19, 2024 at 5:07 PM Michael Paquier wrote: > One piece of it would be to see how much of such "bottlenecks" we > would be able to get rid of by integrating pg_stat_statements into > the central pgstats with the custom APIs, without pushing the module > into core. Any particular reason these days we cannot push this into core and allow disabling on startup? To say this extension is widely used would be an understatement. Cheers, Greg
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
On 11/20/24 14:40, Vitaly Davydov wrote:
> Dear Hackers,
>
> To ping the topic, I'd like to clarify what may be wrong with the idea
> described here, because I do not see any interest from the community.
> The topic is related to physical replication. The primary idea is to
> define the horizon of WAL segment (file) removal based on the restart
> LSN values saved on disk. Now, the WAL segment removal horizon is
> calculated based on the current restart LSN values of slots, which may
> not have been saved on disk at the time of the horizon calculation. This
> case takes place when a slot is advancing during a checkpoint, as described
> earlier in the topic.
>

Yeah, a simple way to fix this might be to make sure we don't use the required LSN value set after CheckPointReplicationSlots() to remove WAL. AFAICS the problem is that KeepLogSeg() gets the new LSN value by:

keep = XLogGetReplicationSlotMinimumLSN();

Let's say we get the LSN before calling CheckPointGuts(), and then pass it to KeepLogSeg, so that it doesn't need to get the fresh value. Wouldn't that fix the issue?

> Such behaviour is not a problem when slots are used only for physical
> replication in a conventional way. But it may be a problem when a physical
> slot is used for some other goals. For example, I have an extension
> which keeps the WAL using physical replication slots. It creates a new
> physical slot and advances it as needed. After restart, it can use the
> restart lsn of the slot to read WAL from this LSN. In this case, there
> is no guarantee that the restart lsn will point to an existing WAL segment.
>

Yeah.

> The advantage of the current behaviour is that it requires a little bit
> less WAL to keep. The disadvantage is that physical slots do not
> guarantee WAL keeping starting from their restart lsns in general.
>

If it's wrong, it doesn't really matter that it has some advantages.

regards

--
Tomas Vondra
RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
Any more feedback on this patch? Hoping this is a straightforward one.

Raghuveer

> -Original Message-
> From: Devulapalli, Raghuveer
> Sent: Friday, November 8, 2024 11:05 AM
> To: Devulapalli, Raghuveer ; Nathan Bossart
> Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash
> Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
>
> V3: With the suggested changes.
>
> Raghuveer
>
> > -Original Message-
> > From: Devulapalli, Raghuveer
> > Sent: Friday, November 8, 2024 10:43 AM
> > To: Nathan Bossart
> > Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash
> > Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
> >
> > > I believe we expect MSVC builds to use meson at this point, which is
> > > probably why there's this extra check:
> > >
> > > if cc.get_id() == 'msvc'
> > >   cdata.set('USE_SSE42_CRC32C', false)
> > >   cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1)
> > >   have_optimized_crc = true
> > >
> > Ah, I missed this. This makes sense.
Re: EphemeralNamedRelation and materialized view
Yugo NAGATA writes:
>> You could even argue that case 2 isn't good enough either,
>> and we should be delivering a specific error message saying
>> that an ENR can't be used in a view/matview. To do that,
>> we'd likely need to pass down the QueryEnvironment in more
>> places not fewer.

> We can raise a similar error for (not materialized) views by passing
> QueryEnv to DefineView() (or in an earlier stage), but there are other
> objects that can contain an ENR in their definition, for example, functions,
> cursors, or RLS policies. Is it worth introducing this version of the error
> message for all these objects?

If it's worth checking for here, why not in other cases? I'm not sure I like using isQueryUsingTempRelation as a model, because its existing use in transformCreateTableAsStmt seems like mostly a hack. (And I definitely don't love introducing yet another scan of the query.)

It seems to me that we should think about this, for MVs as well as those other object types, as fundamentally a dependency problem. That is, the reason we can't allow a reference to an ENR in a long-lived object is that we have no catalog representation for the reference. So that leads to thinking that the issue ought to be handled in recordDependencyOnExpr and friends. If we see an ENR while scanning a rangetable to extract dependencies, then complain. This might be a bit messy to produce good error messages for, though.

Speaking of error messages, I'm not sure that it's okay to use the phrase "ephemeral named relation" in a user-facing error message. We don't use that term in our documentation AFAICS, except in some SPI documentation that most users will never have read. In the context of triggers, "transition relation" seems to be what the docs use.

regards, tom lane
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
Le mer. 20 nov. 2024 à 16:51, Greg Sabino Mullane a écrit : > On Wed, Nov 20, 2024 at 1:26 AM Guillaume Lelarge > wrote: > >> OK, but I'm not sure which example I should pick to add dirtied and >> written shared buffers. It looks kinda artificial. Should I choose one >> randomly? >> > > It will be artificial, but I think that's ok: anyone running it on their > own will be getting different numbers anyway. I was looking at the "14.1.2 > EXPLAIN ANALYZE" section in perform.sgml. Here's some actual numbers I got > with some playing around with concurrent updates: > > Recheck Cond: (unique1 < 10) >> Heap Blocks: exact=5 >> Buffers: shared hit=2 read=5 written=4 > > ... > >> Planning: >>Buffers: shared hit=289 dirtied=9 > > > OK, I'm fine with this. v4 patch attached with one plan showing read, written, and dirtied buffers. Thanks for all your comments/reviews. -- Guillaume. From 484779613694f777e2ffb815a16aa11269404c20 Mon Sep 17 00:00:00 2001 From: Guillaume Lelarge Date: Tue, 12 Nov 2024 21:59:14 +0100 Subject: [PATCH v4] Enable BUFFERS by default with EXPLAIN ANALYZE This automatically enables the BUFFER option when ANALYZE is used with EXPLAIN. The BUFFER option stays disabled if ANALYZE isn't used. The auto_explain extension gets the same treatment. 
--- contrib/auto_explain/auto_explain.c | 4 +- doc/src/sgml/auto-explain.sgml| 2 +- doc/src/sgml/perform.sgml | 42 ++--- doc/src/sgml/ref/explain.sgml | 6 +- src/backend/commands/explain.c| 7 ++ src/test/regress/expected/brin_multi.out | 18 ++-- src/test/regress/expected/explain.out | 36 ++- .../regress/expected/incremental_sort.out | 4 +- src/test/regress/expected/memoize.out | 2 +- src/test/regress/expected/merge.out | 2 +- src/test/regress/expected/partition_prune.out | 94 +-- src/test/regress/expected/select.out | 2 +- src/test/regress/expected/select_into.out | 4 +- src/test/regress/expected/select_parallel.out | 6 +- src/test/regress/expected/subselect.out | 2 +- src/test/regress/expected/tidscan.out | 6 +- src/test/regress/sql/brin_multi.sql | 18 ++-- src/test/regress/sql/explain.sql | 7 +- src/test/regress/sql/incremental_sort.sql | 4 +- src/test/regress/sql/memoize.sql | 2 +- src/test/regress/sql/merge.sql| 2 +- src/test/regress/sql/partition_prune.sql | 94 +-- src/test/regress/sql/select.sql | 2 +- src/test/regress/sql/select_into.sql | 4 +- src/test/regress/sql/select_parallel.sql | 6 +- src/test/regress/sql/subselect.sql| 2 +- src/test/regress/sql/tidscan.sql | 6 +- 27 files changed, 220 insertions(+), 164 deletions(-) diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index 623a674f99..484e009577 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -27,7 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */ static int auto_explain_log_parameter_max_length = -1; /* bytes or -1 */ static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; -static bool auto_explain_log_buffers = false; +static bool auto_explain_log_buffers = true; static bool auto_explain_log_wal = false; static bool auto_explain_log_triggers = false; static bool auto_explain_log_timing = true; @@ -152,7 +152,7 @@ _PG_init(void) "Log buffers usage.", NULL, 
&auto_explain_log_buffers, - false, + true, PGC_SUSET, 0, NULL, diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml index 0c4656ee30..6623d36c99 100644 --- a/doc/src/sgml/auto-explain.sgml +++ b/doc/src/sgml/auto-explain.sgml @@ -122,7 +122,7 @@ LOAD 'auto_explain'; equivalent to the BUFFERS option of EXPLAIN. This parameter has no effect unless auto_explain.log_analyze is enabled. - This parameter is off by default. + This parameter is on by default. Only superusers can change this setting. diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index cd12b9ce48..24916b82b5 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -722,13 +722,19 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ---&zwsp;-- Nested Loop (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1) + Buffers: shared hit=36 read=6 -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) (actual time=0.009..0.017 rows=10 loops=1)
Re: pg_dump --no-comments confusion
On Mon, Nov 18, 2024 at 05:14:53PM -0500, Tom Lane wrote: > Marcos Pegoraro writes: > > But it would be good to have this patch applied to all supported versions, > > as soon as nothing was changed on that pg_dump option, no ? > > Even more to the point, should we change pg_dump's help output? > > ... > --load-via-partition-rootload partitions via the root table > --no-commentsdo not dump comments > --no-publicationsdo not dump publications > ... > > Also, the identical text appears in pg_dumpall's man page and help > output, while pg_restore has a differently worded version: > > printf(_(" --no-commentsdo not restore comments\n")); > > pg_restore's man page seems OK though: > > Do not output commands to restore comments, even if the archive > contains them. Fixed in the attached applied patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml index 4d7c0464687..014f2792589 100644 --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -417,7 +417,7 @@ exclude database PATTERN --no-comments -Do not dump comments. +Do not dump COMMENT commands. 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index a8c141b689d..c30aafbe70d 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -1212,7 +1212,7 @@ help(const char *progname) " servers matching PATTERN\n")); printf(_(" --insertsdump data as INSERT commands, rather than COPY\n")); printf(_(" --load-via-partition-rootload partitions via the root table\n")); - printf(_(" --no-commentsdo not dump comments\n")); + printf(_(" --no-commentsdo not dump comment commands\n")); printf(_(" --no-publicationsdo not dump publications\n")); printf(_(" --no-security-labels do not dump security label assignments\n")); printf(_(" --no-subscriptions do not dump subscriptions\n")); diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c index e3ad8fb2956..9a04e51c81a 100644 --- a/src/bin/pg_dump/pg_dumpall.c +++ b/src/bin/pg_dump/pg_dumpall.c @@ -662,7 +662,7 @@ help(void) printf(_(" --if-exists use IF EXISTS when dropping objects\n")); printf(_(" --insertsdump data as INSERT commands, rather than COPY\n")); printf(_(" --load-via-partition-rootload partitions via the root table\n")); - printf(_(" --no-commentsdo not dump comments\n")); + printf(_(" --no-commentsdo not dump comment commands\n")); printf(_(" --no-publicationsdo not dump publications\n")); printf(_(" --no-role-passwords do not dump passwords for roles\n")); printf(_(" --no-security-labels do not dump security label assignments\n")); diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c index f2c1020d053..10da7d27da8 100644 --- a/src/bin/pg_dump/pg_restore.c +++ b/src/bin/pg_dump/pg_restore.c @@ -484,7 +484,7 @@ usage(const char *progname) printf(_(" --filter=FILENAMErestore or skip objects based on expressions\n" " in FILENAME\n")); printf(_(" --if-exists use IF EXISTS when dropping objects\n")); - printf(_(" --no-commentsdo not restore comments\n")); + printf(_(" --no-commentsdo not restore comment commands\n")); printf(_(" 
--no-data-for-failed-tables do not restore data of tables that could not be\n" " created\n")); printf(_(" --no-publicationsdo not restore publications\n"));
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On 20/11/2024 22:22, David Rowley wrote: On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge wrote: OK, I'm fine with this. v4 patch attached with one plan showing read, written, and dirtied buffers. I think this might be a good time for anyone out there who is against turning on BUFFERS when ANALYZE is on to speak up. Votes for changing this so far seem to be: Me, Michael Christofides, Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from 2020) [1]. Add me to the pro list, please. https://www.postgresql.org/message-id/b3197ba8-225f-f53c-326d-5b1756c77...@postgresfriends.org -- Vik Fearing
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Hi Peter, On 2024-11-20 04:06, Peter Geoghegan wrote: Hi Masahiro, On Tue, Nov 19, 2024 at 3:30 AM Masahiro Ikeda wrote: Apologies for the delayed response. I've confirmed that the costing is significantly improved for multicolumn indexes in the case I provided. Thanks! https://www.postgresql.org/message-id/TYWPR01MB10982A413E0EC4088E78C0E11B1A62%40TYWPR01MB10982.jpnprd01.prod.outlook.com Great! I made it one of my private/internal test cases for the costing. Your test case was quite helpful. Attached is v15. It works through your feedback. Importantly, v15 has a new patch which has a fix for your test.sql case -- which is the most important outstanding problem with the patch (and has been for a long time now). I've broken those changes out into a separate patch because they're still experimental, and have some known minor bugs. But it works well enough for you to assess how close I am to satisfactorily fixing the known regressions, so it seems worth posting quickly. Thanks for your quick response! IIUC, why not add it to the documentation? It would clearly help users understand how to tune their queries using the counter, and it would also show that the counter is not just for developers. The documentation definitely needs more work. I have a personal TODO item about that. Changes to the documentation can be surprisingly contentious, so I often work on it last, when we have the clearest picture of how to talk about the feature. For example, Matthias said something that's approximately the opposite of what you said about it (though I agree with you about it). OK, I understood. From the perspective of consistency, wouldn't it be better to align the naming between the EXPLAIN output and pg_stat_all_indexes.idx_scan, even though the documentation states they refer to the same concept? I personally prefer something like "search" instead of "scan", as "scan" is commonly associated with node names like Index Scan and similar terms. 
To maintain consistency, how about renaming pg_stat_all_indexes.idx_scan to pg_stat_all_indexes.idx_search? I suspect that other hackers will reject that proposal on compatibility grounds, even though it would make sense in a "green field" situation. Honestly, discussions about UI/UX details such as EXPLAIN ANALYZE always tend to result in unproductive bikeshedding. What I really want is something that will be acceptable to all parties. I don't have any strong opinions of my own about it -- I just think that it's important to show *something* like "Index Searches: N" to make skip scan user friendly. OK, I agree. Although it's not an optimal solution and would only reduce the degree of performance degradation, how about introducing a threshold per page to switch from skip scan to full index scan? The approach to fixing these regressions from the new experimental patch doesn't need to use any such threshold. It is effective both with simple "WHERE id2 = 100" cases (like the queries from your test.sql test case), as well as more complicated "WHERE id2 BETWEEN 99 AND 101" inequality cases. What do you think? The regressions are easily under 5% with the new patch applied, which is in the noise. I didn't come up with the idea. At first glance, your idea seems good for all cases. Actually, test.sql shows a performance improvement, and the performance is almost the same as the master's seqscan. To be precise, the master's performance is 10-20% better than the v15 patch because the seqscan is executed in parallel. However, the v15 patch is twice as fast as when seqscan is not executed in parallel. However, I found that there is still a problematic case when I read your patch. IIUC, beyondskip becomes true only if the tuple's id2 is greater than the scan key value. Therefore, the following query (see test_for_v15.sql) still degrades. 
-- build without '--enable-debug' '--enable-cassert' 'CFLAGS=-O0 '

-- SET skipscan_prefix_cols=32;
 Index Only Scan using t_idx on public.t  (cost=0.42..3535.75 rows=1 width=8) (actual time=65.767..65.770 rows=1 loops=1)
   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.058 ms
 Execution Time: 65.782 ms
(9 rows)

-- SET skipscan_prefix_cols=0;
 Index Only Scan using t_idx on public.t  (cost=0.42..3535.75 rows=1 width=8) (actual time=17.276..17.278 rows=1 loops=1)
   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.044 ms
 Execution Time: 17.290 ms
(9 rows)

I’m reporting the above result, though you might already be aware of the issue.

At the same time, we're just as capable of skipping whenever the scan encounters a large group of skipped-prefix-column duplicates. For example, if I take your test.sql test case and ad
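For context, the regressions under discussion come from query shapes like the one below, where no qualifier is given for the leading index column and the patched nbtree code must choose between skipping and a plain full index scan. This is only a sketch of the thread's t(id1, id2) setup and assumes the skip scan patch is applied; the data volume is arbitrary:

```sql
-- Illustrative schema mirroring the thread's test table:
CREATE TABLE t (id1 int, id2 int);
INSERT INTO t SELECT i, i % 1000 FROM generate_series(1, 1000000) i;
CREATE INDEX t_idx ON t (id1, id2);
VACUUM ANALYZE t;

-- No predicate on the leading column id1: with the patch, nbtree can
-- consider a skip scan here rather than falling back to a full scan.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id2 = 100;
```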
Re: allow changing autovacuum_max_workers without restarting
rebased -- nathan >From e877271830e076338f999ee72b9d8148e469d5d2 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Sat, 22 Jun 2024 15:05:44 -0500 Subject: [PATCH v10 1/1] allow changing autovacuum_max_workers without restarting --- doc/src/sgml/config.sgml | 28 ++- doc/src/sgml/runtime.sgml | 4 +- src/backend/access/transam/xlog.c | 2 +- src/backend/postmaster/autovacuum.c | 76 +++ src/backend/postmaster/pmchild.c | 4 +- src/backend/storage/lmgr/proc.c | 6 +- src/backend/utils/init/postinit.c | 6 +- src/backend/utils/misc/guc_tables.c | 11 ++- src/backend/utils/misc/postgresql.conf.sample | 3 +- src/include/postmaster/autovacuum.h | 1 + src/test/perl/PostgreSQL/Test/Cluster.pm | 1 + 11 files changed, 112 insertions(+), 30 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index a84e60c09b..7db171198a 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -8590,6 +8590,25 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + autovacuum_worker_slots (integer) + + autovacuum_worker_slots configuration parameter + + + + +Specifies the number of backend slots to reserve for autovacuum worker +processes. The default is 16. This parameter can only be set at server +start. + + +When changing this value, consider also adjusting +. + + + + autovacuum_max_workers (integer) @@ -8600,7 +8619,14 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default -is three. This parameter can only be set at server start. +is three. This parameter can only be set in the +postgresql.conf file or on the server command line. + + +Note that a setting for this value which is higher than + will have no effect, +since autovacuum workers are taken from the pool of slots established +by that setting. 
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index bcd81e2415..8b7ae27908 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -839,7 +839,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (), allowed autovacuum worker process -(), allowed WAL sender process +(), allowed WAL sender process (), allowed background process (), etc., in sets of 16. The runtime-computed parameter @@ -892,7 +892,7 @@ $ postgres -D $PGDATA -C num_os_semaphores When using POSIX semaphores, the number of semaphores needed is the same as for System V, that is one semaphore per allowed connection (), allowed autovacuum worker process -(), allowed WAL sender process +(), allowed WAL sender process (), allowed background process (), etc. On the platforms where this option is preferred, there is no specific diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6f58412bca..706f2127de 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5403,7 +5403,7 @@ CheckRequiredParameterValues(void) */ if (ArchiveRecoveryRequested && EnableHotStandby) { - /* We ignore autovacuum_max_workers when we make this test. */ + /* We ignore autovacuum_worker_slots when we make this test. 
*/ RecoveryRequiresIntParameter("max_connections", MaxConnections, ControlFile->MaxConnections); diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index dc3cf87aba..963924cbc7 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -115,6 +115,7 @@ * GUC parameters */ bool autovacuum_start_daemon = false; +intautovacuum_worker_slots; intautovacuum_max_workers; intautovacuum_work_mem = -1; intautovacuum_naptime; @@ -210,7 +211,7 @@ typedef struct autovac_table /*- * This struct holds information about a single worker's whereabouts. We keep * an array of these in shared memory, sized according to - * autovacuum_max_workers. + * autovacuum_worker_slots. * * wi_linksentry into free list or running list * wi_dboidOID of
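Assuming the semantics described in the patch above (autovacuum_worker_slots fixed at server start, autovacuum_max_workers reloadable but capped by the slot count), tuning would look roughly like this; a sketch of the proposed behavior, not of anything committed:

```sql
-- autovacuum_worker_slots (default 16) still requires a restart.
-- autovacuum_max_workers can be changed with a reload, but any value
-- above autovacuum_worker_slots has no effect:
ALTER SYSTEM SET autovacuum_max_workers = 8;
SELECT pg_reload_conf();
```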
Re: meson and check-tests
Hi,

On Tue, 12 Nov 2024 at 18:13, jian he wrote:
>
> also played around with v5-0002, works fine.
> overall, v5-0001 and v5-0002 works as i expected.

Thanks for checking it!

--
Regards,
Nazir Bilal Yavuz
Microsoft
Re: Sample rate added to pg_stat_statements
On Tue, Nov 19, 2024 at 7:12 AM Andrey M. Borodin wrote:
> +1 for the idea. I heard a lot of complaints about that pgss is costly.
> Most of them were using it wrong though.

I'm curious what "using it wrong" means exactly?

Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues.

Cheers,
Greg
Re: Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
> On 20 Nov 2024, at 13:36, Tomas Vondra wrote:
> On 11/19/24 17:44, Sanjay Khatri wrote:
>> And we dont have any data backup too.

This will be your first priority once the system is up and running again, maybe even to the point of planning what immediate action to take before the system boots, so you can act quickly in case it falls over again.

> It's very unlikely a Postgres issue would make the whole system crash,
> particularly in a way that prevents it from booting again.

Agreed, especially since this crash reportedly happened after uninstalling postgres. If the uninstaller could corrupt the OS like that, I have a feeling we'd have heard about it by now.

> I don't think
> anyone will be able to help you without more info - you need to make it
> boot again, inspect the Postgres logs, etc.

I'm not familiar with Dell blade enclosures, but if there is a system similar to the iLO Lights Out console that HP ships, I would start there (a quick Google search hints at ePSA).

--
Daniel Gustafsson
Re: cannot freeze committed xmax
> On 28 Oct 2020, at 21:21, Mark Dilger wrote:
>
> The other possibility is that this tuple is erroneously marked as
> HEAP_UPDATED. heap_update() sets that, which makes sense.
> rewrite_heap_tuple() copies the old tuple's bits to the new tuple and then
> does some work to resolve update chains. I guess you could look at whether
> that logic might leave things in an invalid state. I don't have any theory
> about that.

Hi Mark and Konstantin!

Recently (Oct 15, 2024) I observed this kind of problem on one of our production clusters: an old tuple version had come back to life.

# select ctid,* from skipped where ctid = '(16488,13)' or ctid = '(21597,16)';
-[ RECORD 1 ]+------------------------------
ctid         | (16488,13)
id           | 1121skipped
date_created | 2023-08-16 03:31:36.306466+03
date_updated | 2023-08-16 03:31:36.306481+03
-[ RECORD 2 ]+------------------------------
ctid         | (21597,16)
id           | 1121skipped
date_created | 2023-08-16 03:31:36.306466+03
date_updated | 2024-09-06 14:10:47.926007+03

Freezing was failing with "cannot freeze committed xmax"; the xmax of the old version == the xmin of the new one. I have no idea what the semantics of date_created and date_updated are. The server was running vanilla 14, regularly updated.

I suggested
    delete from skipped where ctid = '(16488,13)';
and it worked; the monitoring issue was resolved. I found no other problems in the logs, heapcheck, amcheck, etc., and found no violations of id uniqueness.

What was bothering me was that amcheck verify_heap() was quiet about this table. Let's add some checks to detect such conditions? PFA the patch doing so.

Thanks!

Best regards, Andrey Borodin.

0001-Detect-hintbit-contradictions-to-commit-log.patch
Description: Binary data
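For anyone wanting to run the same kind of check on a suspect table: the heap checker shipped in core amcheck since PostgreSQL 14 is verify_heapam(), presumably the function the report refers to; the point above is that it stayed silent on this corrupted table, which is the gap the attached patch tries to close. A sketch:

```sql
-- Check the reported table with amcheck's heap checker
-- (verify_heapam is available in core amcheck since PostgreSQL 14):
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT * FROM verify_heapam('skipped');
```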
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, 19 Nov 2024 at 12:43, Nisha Moond wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

This Assert can fail:

+		/*
+		 * Check if the slot needs to be invalidated due to
+		 * replication_slot_inactive_timeout GUC.
+		 */
+		if (now &&
+			TimestampDifferenceExceeds(s->inactive_since, now,
+									   replication_slot_inactive_timeout_sec * 1000))
+		{
+			invalidation_cause = cause;
+			inactive_since = s->inactive_since;
+
+			/*
+			 * Invalidation due to inactive timeout implies that
+			 * no one is using the slot.
+			 */
+			Assert(s->active_pid == 0);

With the following scenario:
Set replication_slot_inactive_timeout to 10 seconds

-- Create a slot
postgres=# select pg_create_logical_replication_slot ('test', 'pgoutput', true, true);
 pg_create_logical_replication_slot
------------------------------------
 (test,0/1748068)
(1 row)

-- Wait for 10 seconds and execute checkpoint
postgres=# checkpoint;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly

The assert fails:
#5  0x5b074f0c922f in ExceptionalCondition (conditionName=0x5b074f2f0b4c "s->active_pid == 0", fileName=0x5b074f2f0010 "slot.c", lineNumber=1762) at assert.c:66
#6  0x5b074ee26ead in InvalidatePossiblyObsoleteSlot (cause=RS_INVAL_INACTIVE_TIMEOUT, s=0x740925361780, oldestLSN=0, dboid=0, snapshotConflictHorizon=0, invalidated=0x7fffaee87e63) at slot.c:1762
#7  0x5b074ee273b2 in InvalidateObsoleteReplicationSlots (cause=RS_INVAL_INACTIVE_TIMEOUT, oldestSegno=0, dboid=0, snapshotConflictHorizon=0) at slot.c:1952
#8  0x5b074ee27678 in CheckPointReplicationSlots (is_shutdown=false) at slot.c:2061
#9  0x5b074e9dfda7 in CheckPointGuts (checkPointRedo=24412528, flags=108) at xlog.c:7513
#10 0x5b074e9df4ad in CreateCheckPoint (flags=108) at xlog.c:7179
#11 0x5b074edc6bfc in CheckpointerMain (startup_data=0x0, startup_data_len=0) at checkpointer.c:463

Regards,
Vignesh
Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
Dear,

Last week I installed Postgres 15.8 on my Windows Server 2016, and I installed the pgAdmin that comes packed with the PostgreSQL installer. I already had Postgres 12.1 with pgAdmin installed on the server.

After installing Postgres 15.8, the corresponding pgAdmin was not working; it showed the error 'Postgres server could not be contacted'. The old pgAdmin (of Postgres 12) was working fine, but was not displaying the Postgres 15 server in its server list.

But I wanted to work with the new pgAdmin (the one installed with Postgres 15), so I searched for a solution on the internet and found one: delete the pgAdmin.bak file from 'C\Users\Username\AppData\Roaming\pgAdmin\'. I did that and it worked: the new pgAdmin started to work, and it also displayed the Postgres 15 connection in the server list.

But after 2 hours, my Windows Server 2016 crashed, and I was not able to boot the system. After a hard restart it came back up, and I quickly uninstalled Postgres 15 and its pgAdmin from the machine. But the system crashed again after uninstalling Postgres 15, and now it has not been able to boot for 2 days.

The system is on a Dell iDRAC PowerEdge M6300. Is there any solution for this? I can try whatever you suggest. We cannot format the storage, as our data is on the hard disks (RAID), and we don't have any data backup either.

Please help if possible
Re: RFC: Additional Directory for Extensions
On 18.11.24 20:19, David E. Wheeler wrote: - The biggest problem is that many extensions set in their control file module_pathname = '$libdir/foo' This disables the use of dynamic_library_path, so this whole idea of installing an extension elsewhere won't work that way. The obvious solution is that extensions change this to just 'foo'. But this will require a lot updating work for many extensions, or a lot of patching by packagers. Yeah, '$libdir/foo' has been the documented way to do it for quite some time, as I recall. Perhaps the behavior of the MODULE_PATHNAME replacement function could be changed to omit $libdir when writing the SQL files? Elsewhere you write: Nothing changes about shared library files. They are looked up in dynamic_library_path or any hardcoded file name. And also point out that the way to install them is: ``` make install datadir=/else/where/share pkglibdir=/else/where/lib ``` So as long as dynamic_library_path includes /else/where/lib it should work, just as before, no? The path is only consulted if the specified name does not contain a slash. So if you do LOAD 'foo', the path is consulted, but if you do LOAD '$libdir/foo', it is not. The problem I'm describing is that most extensions use the latter style, per current recommendation in the documentation.
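To make the LOAD behavior described above concrete, a small sketch using a hypothetical extension name 'foo':

```sql
LOAD 'foo';          -- no slash in the name: dynamic_library_path is consulted
LOAD '$libdir/foo';  -- expands to a path containing a slash: the path is ignored
-- Consequently, a control file with module_pathname = '$libdir/foo' pins
-- the library to the installation's libdir, while module_pathname = 'foo'
-- lets dynamic_library_path apply.
```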
Re: pg_prepared_xacts returns transactions that are foreign to the caller
> On 20 Nov 2024, at 12:48, Vladimir Sitnikov wrote:
>
> "select * from pg_prepared_xacts" might produce transactions created by a
> different user, so the caller won't be able to issue "commit prepared".
>
> I think there should be a view that returns only the transactions that the
> caller can commit or rollback.
> Is it something that can be implemented at the backend?

Like "select * from pg_prepared_xacts where pg_has_role(current_user, owner, 'member');"?

Best regards, Andrey Borodin.
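A minimal sketch of such a filtered view along the lines Andrey suggests; the view name is hypothetical, and whether MEMBER is the right privilege to test is part of the open question, since finishing a prepared transaction normally requires the preparing role or a superuser:

```sql
-- Hypothetical view: only the prepared transactions whose owner the
-- caller is a member of, i.e. ones it could plausibly COMMIT PREPARED
-- or ROLLBACK PREPARED.
CREATE VIEW pg_my_prepared_xacts AS
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
WHERE pg_has_role(current_user, owner, 'MEMBER');
```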
Re: memory leak in pgoutput
> You should be more careful with the amount of tests you are doing
> here. This fails while waiting for some changes to be streamed when
> creating a subscription:
> cd src/test/subscription/ && PROVE_TESTS=t/017_stream_ddl.pl make check

I apologize for the obvious error in the previous patch. I have corrected it in the new patch (v3), which passes regression testing.

v3-0001-Fix-memory-leak-in-pgoutput.patch
Description: v3-0001-Fix-memory-leak-in-pgoutput.patch
Re: Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
This seems like a combination of (at least) two independent issues - the pgadmin issue and the crash.

On 11/19/24 17:44, Sanjay Khatri wrote:
> Dear,
> Last week I installed Postgres 15.8 on my Windows server 2016, I
> installed the pgadmin thats comes packed with postgresql installer.
> I already have postgres 12.1 with pgAdmin installed on my server.
> After installing Postgres 15.8, the respective pgAdmin was not working,
> it showed error 'Postgres server could not be contacted'
> But the old pgAdmin(of Postgres12) was working fine, but was not
> displaying the postgres15 server on server list on pgAdmin.

I don't use pgadmin (or Windows) so I don't have any ideas why it might be having these connection problems, and I'm not sure I quite understand what you did. But it seems the new 15.8 server simply was not running, that would explain all the symptoms.

> But, I wanted to work on the new pgAdmin (which was installed with
> postgres15). So I tried to find the solution on internet, and I got the
> solution. It was just to delete the pgAdmin.bak file from
> 'C\Users\Username\AppData\Roaming\pgAdmin\'
> And I did that and It worked, the bew PgAdmin started to work, it also
> displayed the Postgres15 connection se server list.

That is weird, TBH. No idea why deleting a .bak file would matter.

> But after 2 hours, my Windows Server 2016 crashed, and I am not able to
> boot my system. So on hard restart the system and it restarted well. And
> I quickly uninstalled the Postgres15 with its pgadmin from the machine.
> But the system crashed again after uninstalling the postgres 15. Now I
> am not able to boot the system. Its been 2 days the system is not able
> to boot.
> The system is on dell iDrac PowerEdge M6300.
> Is there any solution for this, I can try the solution if you suggest.
> We cannot format the Storage as our data is within the hardisk (Raid)
> type. And we dont have any data backup too.
> Please help if possible

It's very unlikely a Postgres issue would make the whole system crash, particularly in a way that prevents it from booting again. I don't think anyone will be able to help you without more info - you need to make it boot again, inspect the Postgres logs, etc.

regards

--
Tomas Vondra
Re: ALTER TABLE uses a bistate but not for toast tables
On Tue, Nov 19, 2024 at 03:45:19PM +0100, Dmitry Dolgov wrote:
> > On Mon, Jul 15, 2024 at 03:43:24PM GMT, Justin Pryzby wrote:
> > @cfbot: rebased
>
> Thanks for rebasing. To help with review, could you also describe
> current status of the patch? I have to admit, currently the commit
> message doesn't tell much, and looks more like notes for the future you.

The patch does what it aims to do and AFAIK in a reasonable way. I'm not aware of any issue with it. It's, uh, waiting for review.

I'm happy to expand on the message to describe something like design choices, but the goal here is really simple: why should wide column values escape the intention of the ring buffer? AFAICT it's fixing an omission. If you have a question, please ask; that would help to indicate what needs to be explained.

> The patch numbering is somewhat confusing as well, should it be v5 now?

The filename was 0001-WIP-use-BulkInsertState-for-toast-tuples-too.patch. I guess you're referring to the previous filename: v4-*. That shouldn't be so confusing -- I just didn't specify a version, either by choice or by omission.

> From what I understand, the new patch does address the review feedback,
> but you want to do more, something with copy to / copy from?

If I were to do more, it'd be for a future patch, if the current patch were to ever progress.

> Since it's in the performance category, I'm also curious how much
> overhead does this shave off? I mean, I get it that bulk insert strategy
> helps with buffers usage, as you've implied in the thread -- but how
> does it look like in benchmark numbers?

The intent of using a bistate isn't to help the performance of the process using the bistate. Rather, the intent is to avoid harming the performance of other processes. If anything, I expect it could slow down the process using bistate -- the same as for non-toast data.

https://www.postgresql.org/message-id/CA%2BTgmobC6RD2N8kbPPTvATpUY1kisY2wJLh2jsg%3DHGoCp2RiXw%40mail.gmail.com

--
Justin
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
> On 15 Nov 2024, at 21:33, Peter Geoghegan wrote:
>
> Attached patch teaches btree_xlog_vacuum, nbtree VACUUM's REDO
> routine, to reset the target page's opaque->btpo_cycleid to 0. This
> makes the REDO routine match original execution, which seems like a
> good idea on consistency grounds.
>
> I propose this for the master branch only.

The change seems correct to me: in any case, the cycle ID must be less than the cycle ID of any future VACUUM after promotion. I cannot say anything about the aesthetics of resetting or not resetting the field.

I'd suggest renaming the field to something like "btpo_split_vaccycleid".

I was aware of index vacuum backtracking, but it took me a while to build context again.

Best regards, Andrey Borodin.
RE: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
On Tuesday, November 19, 2024 9:42 PM Shlok Kyal wrote: > I agree that we can remove the test. I debugged and found the test modified in > above patch does not hit the condition added in commit adedf54. > Also, according to me we cannot trigger the bug after the fix in this thread. > So, I > think we can remove the testcase. > > I have attached the latest patch with an updated commit message and also > removed the testcase. Thanks for updating the patch. Out of curiosity, I tried to merge the pub_collist_contains_invalid_column() and replident_has_unpublished_gen_col() functions into a single function to evaluate if this approach might be better. As shown in the attached diff, it reduces some code redundancy *but* also introduces additional IF conditions and parameters in the new combined function, which might complicate the logic a bit. Personally, I think the existing V10 patch looks good and clear. I am just sharing the diff for feedback in case others want to have a look. Best Regards, Hou zj From 8640a4605f32f8072d093902da0af23dfa6d119b Mon Sep 17 00:00:00 2001 From: Hou Zhijie Date: Wed, 20 Nov 2024 16:34:04 +0800 Subject: [PATCH v10] combine functions --- src/backend/commands/publicationcmds.c | 219 ++--- src/backend/utils/cache/relcache.c | 38 ++--- src/include/commands/publicationcmds.h | 7 +- 3 files changed, 105 insertions(+), 159 deletions(-) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 053877c524..0d5daf7626 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -336,21 +336,36 @@ pub_rf_contains_invalid_column(Oid pubid, Relation relation, List *ancestors, } /* - * Check if all columns referenced in the REPLICA IDENTITY are covered by - * the column list. + * Check for invalid columns in the publication table definition. * - * Returns true if any replica identity column is not covered by column list. + * This function evaluates two conditions: + * + * 1. 
Ensures that all columns referenced in the REPLICA IDENTITY are covered + *by the column list. If any column is missing, *invalid_column_list is set + *to true. + * + * 2. Ensures that the REPLICA IDENTITY does not contain unpublished generated + *columns. If an unpublished generated column is found, + **unpublished_gen_col is set to true. + * + * Returns true if any of the above conditions are not met. */ bool -pub_collist_contains_invalid_column(Oid pubid, Relation relation, List *ancestors, - bool pubviaroot) +pub_contains_invalid_column(Oid pubid, Relation relation, List *ancestors, + bool pubviaroot, bool pubgencols, + bool *invalid_column_list, + bool *unpublished_gen_col) { - HeapTuple tuple; Oid relid = RelationGetRelid(relation); Oid publish_as_relid = RelationGetRelid(relation); - boolresult = false; - Datum datum; - boolisnull; + Bitmapset *idattrs; + Bitmapset *columns = NULL; + TupleDesc desc = RelationGetDescr(relation); + Publication *pub; + int x; + + *invalid_column_list = false; + *unpublished_gen_col = false; /* * For a partition, if pubviaroot is true, find the topmost ancestor that @@ -368,158 +383,90 @@ pub_collist_contains_invalid_column(Oid pubid, Relation relation, List *ancestor publish_as_relid = relid; } - tuple = SearchSysCache2(PUBLICATIONRELMAP, - ObjectIdGetDatum(publish_as_relid), - ObjectIdGetDatum(pubid)); - - if (!HeapTupleIsValid(tuple)) - return false; - - datum = SysCacheGetAttr(PUBLICATIONRELMAP, tuple, - Anum_pg_publication_rel_prattrs, - &isnull); + /* Fetch the column list */ + pub = GetPublication(pubid); + check_and_fetch_column_list(pub, publish_as_relid, NULL, &columns); - if (!isnull) + if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL) { - int x; - Bitmapset *idattrs; - Bitmapset *columns = NULL; - /* With REPLICA IDENTITY FULL, no column list is allowed. */ - if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL) - result = true; - - /* Transform the column list datum to a bitmapset. 
*/ - columns = pub_collist_t
Re: Conflict detection for update_deleted in logical replication
On Thu, Nov 14, 2024 at 8:24 AM Zhijie Hou (Fujitsu) wrote:
>
> Attach the V9 patch set which addressed above comments.
>

Reviewed the v9 patch set; here are my comments on the changes below:

@@ -1175,10 +1189,29 @@ ApplyLauncherMain(Datum main_arg)
 			long		elapsed;

 			if (!sub->enabled)
+			{
+				can_advance_xmin = false;
+				xmin = InvalidFullTransactionId;
 				continue;
+			}

 			LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
 			w = logicalrep_worker_find(sub->oid, InvalidOid, false);
+
+			if (can_advance_xmin && w != NULL)
+			{
+				FullTransactionId nonremovable_xid;
+
+				SpinLockAcquire(&w->relmutex);
+				nonremovable_xid = w->oldest_nonremovable_xid;
+				SpinLockRelease(&w->relmutex);
+
+				if (!FullTransactionIdIsValid(xmin) ||
+					!FullTransactionIdIsValid(nonremovable_xid) ||
+					FullTransactionIdPrecedes(nonremovable_xid, xmin))
+					xmin = nonremovable_xid;
+			}
+

1) In Patch-0002, could you please add a comment above "+ if (can_advance_xmin && w != NULL)" to briefly explain the purpose of finding the minimum XID at this point?

2) In Patch-0004, with the addition of the 'detect_update_deleted' option, I see the following two issues in the above code:

2a) Currently, all enabled subscriptions are considered when comparing and finding the minimum XID, even if detect_update_deleted is disabled for some subscriptions. I suggest excluding the oldest_nonremovable_xid of subscriptions where detect_update_deleted=false by updating the check as follows:

    if (sub->detectupdatedeleted && can_advance_xmin && w != NULL)

2b) I understand why advancing xmin is not allowed when a subscription is disabled. However, the current check allows a disabled subscription with detect_update_deleted=false to block xmin advancement, which seems incorrect.
Should the check also account for detect_update_deleted? Like:

 			if (sub->detectupdatedeleted && !sub->enabled)
+			{
+				can_advance_xmin = false;
+				xmin = InvalidFullTransactionId;
 				continue;
+			}

However, I'm not sure if this is the right fix, as it could lead to inconsistencies if detect_update_deleted is set to false after disabling the sub. Thoughts?

--
Thanks,
Nisha
Re: On non-Windows, hard depend on uselocale(3)
On Fri, Nov 15, 2024 at 1:53 AM Peter Eisentraut wrote:
> On 14.11.24 08:48, Thomas Munro wrote:
> > The three MinGW environments we test today are using ucrt, and
> > configure detects the symbol on all. Namely: fairwren
> > (msys2/mingw64), the CI mingw64 task and the mingw cross-build that
> > runs on Linux in the CI CompilerWarnings task. As far as I know these
> > are the reasons for, and mechanism by which, we keep MinGW support
> > working. We have no policy requiring arbitrary old MinGW systems
> > work, and we wouldn't know anyway.
>
> Right. So I think we could unwind this in steps. First, remove the
> configure test for _configthreadlocale() and all the associated #ifdefs
> in the existing ecpg code. This seems totally safe, it would just leave
> behind MinGW older than 2016 and MSVC older than 2015, the latter of
> which is already the current threshold.
>
> Then the question whether we want to re-enable the error checking on
> _configthreadlocale() that was reverted by 2cf91ccb, or at least
> something similar. This should also be okay based on your description
> of the different Windows runtimes. I think it would also be good to do
> this to make sure this works before we employ _configthreadlocale() in
> higher-stakes situations.
>
> I suggest doing these two steps as separate patches, so this doesn't get
> confused between the various thread-related threads that want to
> variously add or remove uses of this function.

OK, do you think these three patches tell the _configthreadlocale() story properly? (Then after that we can get back to getting rid of it...)

0001-Remove-configure-check-for-_configthreadlocale.patch
Description: Binary data
0002-Formally-require-ucrt-on-Windows.patch
Description: Binary data
0003-Revert-Blind-attempt-to-fix-_configthreadlocale-fail.patch
Description: Binary data
Re: Enhancing Memory Context Statistics Reporting
Hi,

> To achieve both completeness and avoid writing to a file, I can consider
> displaying the numbers for the remaining contexts as a cumulative total
> at the end of the output.
>
> Something like follows:
> ```
> postgres=# select * from pg_get_process_memory_contexts('237244', false);
>              name             | ident |   type   | path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
> ------------------------------+-------+----------+-------+-------------+---------------+------------+-------------+------------+--------
>  TopMemoryContext             |       | AllocSet | {0}   |       97696 |             5 |      14288 |          11 |      83408 | 237244
>  search_path processing cache |       | AllocSet | {0,1} |        8192 |             1 |       5328 |           7 |       2864 | 237244
>
>  Remaining contexts total: 23456 bytes (total_bytes), 12345 (used_bytes), 11111 (free_bytes)
> ```

Please find attached an updated patch with this change. The file previously used to store spilled statistics has been removed. Instead, a cumulative total of the remaining/spilled context statistics is now stored in the DSM segment, which is displayed as follows.

postgres=# select * from pg_get_process_memory_contexts('352966', false);
       name       | ident |   type   |  path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
------------------+-------+----------+--------+-------------+---------------+------------+-------------+------------+--------
 TopMemoryContext |       | AllocSet | {0}    |       97696 |             5 |      14288 |          11 |      83408 | 352966
 .
 .
 .
 MdSmgr           |       | AllocSet | {0,18} |        8192 |             1 |       7424 |           0 |        768 | 352966
 Remaining Totals |       |          |        |     1756016 |           188 |     658584 |         132 |    1097432 | 352966
(7129 rows)

I believe this serves as a good compromise between completeness and avoiding the overhead of file handling. However, I am open to reintroducing file handling if displaying the complete statistics of the remaining contexts proves to be more important.

All the known bugs in the patch have been fixed.

In summary, one DSA per PostgreSQL process is used to share the statistics of that process. A DSA is created by the first client backend that requests memory context statistics, and it is pinned for all future requests to that process.
A handle to this DSA is shared between the client and the publishing process using fixed shared memory. The fixed shared memory consists of an array of size MaxBackends + auxiliary processes, indexed by procno. Each element in this array is less than 100 bytes in size.

A PostgreSQL process uses a condition variable to signal a waiting client backend once it has finished publishing the statistics. If, for some reason, the signal is not sent, the waiting client backend will time out.

When statistics of a local backend are requested, this function returns the following WARNING and exits, since that case can be handled by an existing function which doesn't require a DSA.

WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead

Looking forward to your review.

Thank you,
Rahila Syed

v4-Function-to-report-memory-context-stats-of-a-process.patch
Description: Binary data
Re: A way to build PSQL 17.1 source on AIX platform
> On 18 Nov 2024, at 08:03, Raghu Dev Ramaiah wrote: > Is there an alternate way I can build PSQL 17.1 source code on AIX platform? > Please let me know..thanks. As Tom has already answered in your other mail on this, PostgreSQL 17 does not support building on AIX. If PostgreSQL 18 is going to support AIX again there needs to be a significant effort from interested parties to make that happen. -- Daniel Gustafsson
Re: doc fail about ALTER TABLE ATTACH re. NO INHERIT
On Tue, Nov 19, 2024 at 6:41 PM Alvaro Herrera wrote: > On 2024-Nov-14, Amit Langote wrote: > > > Sorry, here's the full example. Note I'd changed both > > AddRelationNotNullConstraints() and AdjustNotNullInheritance() to not > > throw an error *if* the table is a leaf partition when the NO INHERIT > > of an existing constraint doesn't match that of the new constraint. > > > > create table p (a int not null) partition by list (a); > > create table p1 partition of p for values in (1); > > alter table p1 add constraint a_nn_ni not null a no inherit; > > Yeah, I think this behavior is bogus, because the user wants to have > something (a constraint that will not inherit) but they cannot have it, > because there is already a constraint that will inherit. The current > behavior of throwing an error seems correct to me. With your patch, > what this does is mark the constraint as "local" in addition to > inherited, but that'd be wrong, because the constraint the user wanted > is not of the same state. Yeah, just not throwing the error, as my patch did, is not enough. The patch didn't do anything to ensure that a separate constraint with the properties that the user entered will exist alongside the inherited one, but that's not possible given that the design only allows one NOT NULL constraint for a column as you've written below. 
> > On Wed, Nov 13, 2024 at 10:49 PM Alvaro Herrera wrote:
> > > I think:
> > > * if a leaf partition already has an inherited not-null constraint
> > >   from its parent and we want to add another one, we should:
> > >   - if the one being added is NO INHERIT, then throw an error, because
> > >     we cannot merge them
> >
> > Hmm, we'll be doing something else for CHECK constraints if we apply my patch:
> >
> > create table p (a int not null, constraint check_a check (a > 0)) partition by list (a);
> > create table p1 partition of p (constraint check_a check (a > 0) no inherit) for values in (1);
> > NOTICE:  merging constraint "check_a" with inherited definition
> >
> > \d p1
> >                 Table "public.p1"
> >  Column |  Type   | Collation | Nullable | Default
> > --------+---------+-----------+----------+---------
> >  a      | integer |           | not null |
> > Partition of: p FOR VALUES IN (1)
> > Check constraints:
> >     "check_a" CHECK (a > 0) NO INHERIT
> >
> > I thought we were fine with allowing merging of such child
> > constraints, because leaf partitions can't have children to pass them
> > onto, so the NO INHERIT clause is essentially pointless.
>
> I'm beginning to have second thoughts about this whole thing TBH, as it
> feels inconsistent -- and unnecessary, if we get the patch to flip the
> INHERIT/NO INHERIT flag of constraints.

Ah, ok, I haven't looked at that patch, but I am happy to leave this alone.

> > > - if the one being added is not NO INHERIT, then both constraints
> > >   would have the same state and so we silently do nothing.
> >
> > Maybe we should emit some kind of NOTICE message in such cases?
>
> Hmm, I'm not sure. It's not terribly useful, is it? I mean, if the
> user wants to have a constraint, then whether the constraint is there or
> not, the end result is the same and we needn't say anything about it.
>
> It's only if the constraint is not what they want (because of
> mismatching INHERIT flag) that we throw some message.
> (I wonder if we'd throw an error if the proposed constraint has a
> different _name_ from the existing constraint. If a DDL script is
> expecting that the constraint will be named a certain way, then by
> failing to throw an error we might be giving confusing expectations.)

Here's an example that I think matches the above description, which, ISTM, leads to a state similar to what one would encounter with my patch as described earlier:

create table parent (a int not null);
create table child (a int, constraint a_not_null_child not null a) inherits (parent);
NOTICE:  merging column "a" with inherited definition

\d+ child
                                           Table "public.child"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |             |              |
Not-null constraints:
    "a_not_null_child" NOT NULL "a" (local, inherited)
Inherits: parent
Access method: heap

I think the inherited constraint should be named parent_a_not_null, but the constraint that gets stored is the user-specified constraint in `create table child`. Actually, even the automatically generated constraint name using the child table's name won't match the name of the inherited constraint:

create table child (a int not null) inherits (parent);
NOTICE:  merging column "a" with inherited definition

\d+ child
 Table "public.child"
 Column | Type | C
Re: CSN snapshots in hot standby
On Tue, Oct 29, 2024 at 11:34 PM Heikki Linnakangas wrote:
>                         master   patched
> few-xacts:              0.0041   0.0041 s / iteration
> many-xacts:             0.0042   0.0042 s / iteration
> many-xacts-wide-apart:  0.0043   0.0045 s / iteration

Hi Heikki,

I have some thoughts about behavior of the cache that might not be apparent in this test: The tree is only as tall as need be to store the highest non-zero byte. On a newly initialized cluster, the current txid is small. The first two test cases here will result in a tree with height of 2. The last one will have a height of 3, and its runtime looks a bit higher, although that could be just noise or touching more cache lines. It might be worth it to try a test run while forcing the upper byte of the keys to be non-zero (something like "key | (1<<30)"), so that the tree always has a height of 4. That would match real-world conditions more closely. If need be, there are a couple things we can do to optimize node dispatch and touch fewer cache lines.

> I added two tests to the test suite:
>                                 master    patched
> insert-all-different-xids:      0.00027   0.00019 s / iteration
> insert-all-different-subxids:   0.00023   0.00020 s / iteration
>
> The point of these new tests is to test the scenario where the cache
> doesn't help and just adds overhead, because each XID is looked up only
> once. Seems to be fine. Surprisingly good actually; I'll do some more
> profiling on that to understand why it's even faster than 'master'.

These tests use a sequential scan. For things like primary key lookups, I wonder if the overhead of creating and destroying the tree's memory contexts for the (not used again) cache would be noticeable. If so, it wouldn't be too difficult to teach radix tree to create the larger contexts lazily.

> Now the downside of this new cache: Since it has no size limit, if you
> keep looking up different XIDs, it will keep growing until it holds all
> the XIDs between the snapshot's xmin and xmax. That can take a lot of
> memory in the worst case.
> Radix tree is pretty memory efficient, but holding, say, 1 billion XIDs
> would probably take something like 500 MB of RAM (the radix tree stores
> 64-bit words with 2 bits per XID, plus the radix tree nodes). That's per
> snapshot, so if you have a lot of connections, maybe even with multiple
> snapshots each, that can add up.
>
> I'm inclined to accept that memory usage. If we wanted to limit the size
> of the cache, would need to choose a policy on how to truncate it
> (delete random nodes?), what the limit should be etc. But I think it'd
> be rare to hit those cases in practice. If you have a one billion XID
> old transaction running in the primary, you probably have bigger
> problems already.

I don't have a good sense of whether it needs a limit or not, but if we decide to add one as a precaution, maybe it's enough to just blow the cache away when reaching some limit? Being smarter than that would need some work.

--
John Naylor
Amazon Web Services
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Dear Hackers,

To ping the topic, I'd like to clarify what may be wrong with the idea described here, because I do not see any interest from the community. The topic is related to physical replication. The primary idea is to define the horizon for WAL segment (file) removal based on the restart LSN values saved on disk. Currently, the WAL segment removal horizon is calculated from the slots' current restart LSN values, which may not have been saved on disk at the time of the horizon calculation. This case arises when a slot is advanced during a checkpoint, as described earlier in the topic. Such behaviour is not a problem when slots are used only for physical replication in the conventional way. But it may be a problem when a physical slot is used for some other goal. For example, I have an extension which keeps WAL using physical replication slots. It creates a new physical slot and advances it as needed. After a restart, it can use the slot's restart_lsn to read WAL from that LSN. In this case, there is no guarantee that restart_lsn will point to an existing WAL segment. The advantage of the current behaviour is that it requires a little less WAL to be kept. The disadvantage is that physical slots do not, in general, guarantee that WAL is kept starting from their restart_lsn. I would be happy to get some advice on whether I am on the right or wrong way. Thank you in advance.

With best regards,
Vitaly

On Thursday, November 07, 2024 16:30 MSK, "Vitaly Davydov" wrote:

Dear Hackers, I'd like to introduce an improved version of my patch (see the attached file). My original idea was to take into account the saved-on-disk restart LSN (slot->restart_lsn_flushed) for persistent slots when removing WAL segment files. It helps tackle errors like:

ERROR: requested WAL segment 000...0AA has already been removed.

Improvements:
* flushed_restart_lsn is used only for RS_PERSISTENT slots.
* Save the physical slot on disk when advancing only once - if restart_lsn_flushed is invalid.
It is needed because slots with an invalid restart LSN are not used when calculating the oldest LSN for WAL truncation. Once restart_lsn becomes valid, it should be saved to disk immediately to update restart_lsn_flushed.

Regression tests seem to be ok, except:
* recovery/t/001_stream_rep.pl (a checkpoint is needed)
* recovery/t/019_replslot_limit.pl (it seems the slot was invalidated; some adjustments are needed)
* pg_basebackup/t/020_pg_receivewal.pl (not sure about it)

There are some problems:
* More WAL segments may be kept. It may lead to invalidation of slots in some tests (recovery/t/019_replslot_limit.pl). A couple of tests should be adjusted.

With best regards,
Vitaly Davydov

On Thursday, October 31, 2024 13:32 MSK, "Vitaly Davydov" wrote: Sorry, attached the missed patch.

On Thursday, October 31, 2024 13:18 MSK, "Vitaly Davydov" wrote:

Dear Hackers, I'd like to discuss a problem with replication slots' restart LSN. Physical slots are saved to disk at the beginning of a checkpoint. At the end of the checkpoint, old WAL segments are recycled or removed from disk if they are not kept by slots' restart_lsn values. If an existing physical slot is advanced in the middle of checkpoint execution, WAL segments related to the saved-on-disk restart LSN may be removed. This is because the calculation of the minimal replication slot LSN occurs at the end of the checkpoint, prior to old WAL segment removal. If you hard-stop (pg_ctl stop -m immediate) the postgres instance right after the checkpoint and restart it, the slot's restart_lsn may point to a removed WAL segment. I believe such behaviour is not good. The doc [0] describes that restart_lsn may be set to some past value after a reload. There is a discussion [1] on pgsql-hackers where such behaviour is discussed. The main reason for not flushing physical slots on advancing is performance. I'm ok with such behaviour, except that the corresponding WAL segments should not be removed.
I propose to keep WAL segments based on the saved-on-disk (flushed) restart_lsn of slots: add a new field restart_lsn_flushed to the ReplicationSlot structure, and copy restart_lsn to restart_lsn_flushed in SaveSlotToPath. This doesn't change the format in which the slot contents are stored on disk. I attached a patch. It is not yet complete, but it demonstrates a way to solve the problem. I reproduced the problem in the following way:

* Add some delay in CheckPointBuffers (pg_usleep) to emulate long checkpoint execution.
* Execute a checkpoint, and run pg_replication_slot_advance right after the checkpoint starts, from another connection.
* Hard-restart the server right after checkpoint completion.
* After restart, the slot's restart_lsn may point to a removed WAL segment.

The proposed patch fixes it.

[0] https://www.postgresql.org/docs/current/logicaldecoding-explanation.html
[1] https://www.postgresql.org/message-id/flat/059cc53a-8b14-653a-a24d-5f867503b0ee%40postgrespro.ru
Re: Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
> On 20 Nov 2024, at 14:34, Sanjay Khatri wrote: > > These are the errors from the logs of iDrac M630. > I uninstalled the Postgres 15.8 , once I got the connection. But sadly the > server disconnected again and crashed, even after uninstalling. > No I am not able to boot it. I'm going to go out on a limb and say that this isn't a postgres bug, and highly unlikely to be caused by one. This smells like broken hardware and/or corrupted BIOS/firmware, you should contact your hardware vendor for support. -- Daniel Gustafsson
Re: proposal: schema variables
Hi

On Wed, Nov 20, 2024 at 14:29, Marcos Pegoraro wrote:
> On Tue, Nov 19, 2024 at 16:15, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I wrote POC of VARIABLE(varname) syntax support
>
> Not related to the POC of VARIABLE, but seeing your patches ...
>
> Wouldn't it be better to use just one syntax and message for what to do ON COMMIT?
>
> When creating a new variable you use
> CREATE VARIABLE ... ON COMMIT DROP | ON TRANSACTION END RESET
>
> In psql, \dV+ shows
> Transactional end action
>
> Maybe all of them could be just ON COMMIT
> CREATE VARIABLE ... [ON COMMIT {NO ACTION, DROP, RESET}] and \dV+ just "on commit" in the title column

ON COMMIT DROP is related to temporary objects. In that case, you don't need to think about ROLLBACK, because the temp objects are removed implicitly. ON TRANSACTION END RESET can be used for non-temporary objects too, so it is a slightly different feature. And the reset is executed when the transaction ends with ROLLBACK as well, so a syntax of just ON COMMIT could be a little bit messy. TRANSACTION END is more intuitive, I think. If I remember correctly, there was a proposal for ON COMMIT OR ROLLBACK, but I think TRANSACTION END is better and more intuitive, and better describes what is implemented. I can imagine supporting ON COMMIT RESET or ON ROLLBACK RESET clauses that can be used independently, but at this time I don't want to increase the complexity - the reset happens at transaction end regardless of whether the transaction was committed or rolled back.

Regards

Pavel

> regards
> Marcos
Re: proposal: schema variables
On Wed, Nov 20, 2024 at 10:52, Pavel Stehule wrote:
> COMMIT can be a little bit messy. TRANSACTION END is more intuitive, I
> think.

Exactly to avoid being messy, I would use just ON COMMIT for all, and the docs can explain that this option is ignored for temp objects and does the same thing at the end of the transaction, regardless of whether it is committed or rolled back
Re: proposal: schema variables
On Wed, Nov 20, 2024 at 15:15, Marcos Pegoraro wrote:
> On Wed, Nov 20, 2024 at 10:52, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> COMMIT can be a little bit messy. TRANSACTION END is more intuitive, I
>> think.
>
> Exactly to be not messy I would just ON COMMIT for all, and DOCs can
> explain that this option is ignored for temp objects and do the same at the
> end of transaction, independently if commited or rolled back

I feel it differently - when I read ON COMMIT, I really expect just a COMMIT event, not ROLLBACK. Attention - the metadata about variables is transactional, but the variables themselves are not transactional (by default). But this feeling may be subjective. The objective argument against using ON COMMIT to mean ON TRANSACTION END is that we would lose the possibility of a more precise setting. I can imagine scenarios with ON COMMIT RESET - the variable could hold some last value from the transaction - or ON ROLLBACK RESET - the variable could hold some value computed by a successfully ended transaction, like a last inserted id. In that case, I don't see a problem with reopening the discussion about syntax or postponing this feature. I think the possibility to reset variables at some specified time can be an interesting feature (it can increase safety, because the application doesn't need to handle the initial setup), but in the list of implemented features for session variables, it is not too far from the end. If TRANSACTION END is not intuitive - exactly the same functionality can be described as RESET AT TRANSACTION START - so ON TRANSACTION END could be transformed to ON BEGIN RESET, and this syntax may be better, because it signals that, for every transaction, the variable will be initialized to the default. What do you think? Would ON BEGIN RESET be acceptable to you?

Regards

Pavel
Re: Enhancing Memory Context Statistics Reporting
On Wed, Nov 20, 2024 at 2:39 PM Rahila Syed wrote: > > Hi, > >> To achieve both completeness and avoid writing to a file, I can consider >> displaying the numbers for the remaining contexts as a cumulative total >> at the end of the output. >> >> Something like follows: >> ``` >> postgres=# select * from pg_get_process_memory_contexts('237244', false); >> name | ident >>| type | path | total_bytes | tot >> al_nblocks | free_bytes | free_chunks | used_bytes | pid >> ---++--+--+-+ >> ---++-++ >> TopMemoryContext | >>| AllocSet | {0} | 97696 | >> 5 | 14288 | 11 | 83408 | 237244 >> search_path processing cache | >>| AllocSet | {0,1}|8192 | >> 1 | 5328 | 7 | 2864 | 237244 >> Remaining contexts total: 23456 bytes (total_bytes) , 12345(used_bytes), >> 11,111(free_bytes) >> >> ``` > > > Please find attached an updated patch with this change. The file previously > used to > store spilled statistics has been removed. Instead, a cumulative total of the > remaining/spilled context statistics is now stored in the DSM segment, which > is > displayed as follows. > > postgres=# select * from pg_get_process_memory_contexts('352966', false); > name | ident | type | path | total_bytes | > total_nblocks | free_bytes | free_chunks | used_bytes | pi > d > --+---+--++-+---++-++ > > TopMemoryContext | | AllocSet | {0}| 97696 | > 5 | 14288 | 11 | 83408 | 352 > 966 > . > . > . > MdSmgr | | AllocSet | {0,18} |8192 | > 1 | 7424 | 0 |768 | 352 > 966 > Remaining Totals | | || 1756016 | > 188 | 658584 | 132 |1097432 | 352 > 966 > (7129 rows) > - > > I believe this serves as a good compromise between completeness > and avoiding the overhead of file handling. However, I am open to > reintroducing file handling if displaying the complete statistics of the > remaining contexts prove to be more important. > > All the known bugs in the patch have been fixed. > > In summary, one DSA per PostgreSQL process is used to share > the statistics of that process. 
> A DSA is created by the first client backend that requests memory
> context statistics, and it is pinned for all future requests to that
> process.
> A handle to this DSA is shared between the client and the publishing
> process using fixed shared memory. The fixed shared memory consists
> of an array of size MaxBackends + auxiliary processes, indexed
> by procno. Each element in this array is less than 100 bytes in size.
>
> A PostgreSQL process uses a condition variable to signal a waiting client
> backend once it has finished publishing the statistics. If, for some reason,
> the signal is not sent, the waiting client backend will time out.

How does the process know that the client backend has finished reading the stats so they can be refreshed? What happens if the next request for memory context stats comes before the first requester has consumed the statistics it requested? Does the shared memory get deallocated when the backend which allocated it exits?

> When statistics of a local backend is requested, this function returns the
> following WARNING and exits, since this can be handled by an existing
> function which doesn't require a DSA.
>
> WARNING: cannot return statistics for local backend
> HINT: Use pg_get_backend_memory_contexts instead

How about using pg_get_backend_memory_contexts() for both - the local backend as well as other backends? Let the PID argument default to NULL, which would indicate the local backend, otherwise some other backend?

--
Best Wishes,
Ashutosh Bapat
Re: proposal: schema variables
On Tue, Nov 19, 2024 at 16:15, Pavel Stehule wrote:
> I wrote POC of VARIABLE(varname) syntax support

Not related to the POC of VARIABLE, but seeing your patches ...

Wouldn't it be better to use just one syntax and message for what to do ON COMMIT?

When creating a new variable you use
CREATE VARIABLE ... ON COMMIT DROP | ON TRANSACTION END RESET

In psql, \dV+ shows
Transactional end action

Maybe all of them could be just ON COMMIT
CREATE VARIABLE ... [ON COMMIT {NO ACTION, DROP, RESET}] and \dV+ just "on commit" in the title column

regards
Marcos
Re: sunsetting md5 password support
On Wed, Nov 20, 2024 at 10:56:11AM -0500, Greg Sabino Mullane wrote: > On Tue, Nov 19, 2024 at 8:55 PM Nathan Bossart > wrote: > >> * Expand the documentation. Perhaps we could add a step-by-step guide >> for migrating to SCRAM-SHA-256 since more users will need to do so when >> MD5 password support is removed. >> * Remove the hint. It's arguably doing little more than pointing out the >> obvious, and it doesn't actually tell users where in the documentation >> to look for this information, anyway. >> > > I think both ideally, but maybe just the hint removal for this patch? > > On the other hand, "change your password and update pg_hba.conf" is pretty > much all you need, so not sure how detailed we want to get. :) After thinking about this some more, I'm actually finding myself leaning towards leaving the hint and potentially adding more detail to the documentation as a follow-up patch. While the hint arguably points out the obvious, it should at least nudge users in the right direction instead of just telling them to stop using MD5 passwords. I've always found it incredibly frustrating when something is marked deprecated but there's zero information about what to do instead. I also see a few existing cases where we refer users to the documentation, so it's not without precedent. -- nathan
Re: cannot freeze committed xmax
> On 20 Nov 2024, at 15:58, Andrey M. Borodin wrote:
>
> PFA the patch doing so.

Ugh. The patch is simply dysfunctional, sorry. xmax_status is being checked uninitialized. But, well, it highlights the idea: make verify_heapam() aware of such corruptions. What do you think?

Best regards, Andrey Borodin.
Re: Add a write_to_file member to PgStat_KindInfo
Hi,

On Wed, Nov 20, 2024 at 05:45:55PM +0300, Nazir Bilal Yavuz wrote:
> I think this is a good idea. +1 for the $SUBJECT.

Thanks for looking at it!

> There are duplicated codes in the injection_stats_fixed.c file. Do you
> think that 'modifying existing functions to take an argument to
> differentiate whether the kind is default or no-write' would be
> better?

Some of the new functions are callbacks, so we don't control the parameter list that the core is expecting. That leaves:

pgstat_register_inj_fixed_no_write()
pgstat_report_inj_fixed_no_write()
injection_points_stats_fixed_no_write()

for which I think we could add an extra "write_to_file" argument to their original counterparts. Not sure how much code reduction we would get, so out of curiosity I did the exercise in v2 attached, and that gives us:

v1: 8 files changed, 211 insertions(+), 2 deletions(-)
v2: 8 files changed, 152 insertions(+), 22 deletions(-)

I don't have a strong opinion for this particular case (I think the code is harder to read, but yeah, there is some code reduction): so I'm fine with v2 too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 53f4afaed64d6975b022da22db7f5e7fe0134ca7 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Wed, 20 Nov 2024 08:35:13 +
Subject: [PATCH v2] Add a write_to_file member to PgStat_KindInfo

This new member allows the statistics to configure whether or not they
want to be written to the file on disk. That gives more flexibility to
the custom statistics.
---
 src/backend/utils/activity/pgstat.c           | 21 +++-
 src/include/utils/pgstat_internal.h           |  3 +
 .../injection_points--1.0.sql                 |  3 +-
 .../injection_points/injection_points.c       | 18 ++--
 .../injection_points/injection_stats.c        |  1 +
 .../injection_points/injection_stats.h        |  6 +-
 .../injection_points/injection_stats_fixed.c  | 100 --
 .../modules/injection_points/t/001_stats.pl   | 22 +++-
 8 files changed, 152 insertions(+), 22 deletions(-)
 10.4% src/backend/utils/activity/
 18.1% src/test/modules/injection_points/t/
 70.5% src/test/modules/injection_points/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ea8c5691e8..4b8c57bb8e 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -12,7 +12,8 @@
  * Statistics are loaded from the filesystem during startup (by the startup
  * process), unless preceded by a crash, in which case all stats are
  * discarded. They are written out by the checkpointer process just before
- * shutting down, except when shutting down in immediate mode.
+ * shutting down (if the stat kind allows it), except when shutting down in
+ * immediate mode.
  *
  * Fixed-numbered stats are stored in plain (non-dynamic) shared memory.
 *
@@ -281,6 +282,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "database",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		/* so pg_stat_database entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -297,6 +299,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "relation",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		.shared_size = sizeof(PgStatShared_Relation),
 		.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -311,6 +314,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "function",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		.shared_size = sizeof(PgStatShared_Function),
 		.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -324,6 +328,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "replslot",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		.accessed_across_databases = true,
@@ -340,6 +345,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "subscription",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		/* so pg_stat_subscription_stats entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -359,6 +365,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "archiver",
 		.fixed_amount = true,
+		.write_to_file = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -374,6 +381,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "bgwriter",
 		.fixed_amount = true,
+		.write_to_file = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -389,6 +397,7 @@ static const PgStat_KindInfo pgstat_kind_builti
Re: per backend I/O statistics
Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:
> + /*
> +  * Do serialize or not this kind of stats.
> +  */
> + bool to_serialize:1;
>
> Not sure that "serialize" is the best term that applies here. For
> pgstats entries, serialization refers to the matter of writing their
> entries with a "serialized" name because they have an undefined number
> when stored locally after a reload. I'd suggest to split this concept
> into its own patch, rename the flag as "write_to_file"

Yeah, and this could also be useful for custom statistics. So I created a dedicated thread and a patch proposal (see [1]).

[1]: https://www.postgresql.org/message-id/Zz3skBqzBncSFIuY%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: fix deprecation mention for age() and mxid_age()
On Tue, Nov 19, 2024 at 06:52:40AM +, Bertrand Drouvot wrote: > Thanks for the feedback! Done. The section mistake in REL_16_STABLE was.. Interesting. -- Michael signature.asc Description: PGP signature
Re: Remove useless casts to (void *)
On Thu, Nov 14, 2024 at 09:59:07AM +0100, Peter Eisentraut wrote: > On 29.10.24 15:20, Tom Lane wrote: > > Peter Eisentraut writes: > > > There are a bunch of (void *) casts in the code that don't make sense to > > > me. I think some of these were once necessary because char * was used > > > in place of void * for some function arguments. And some of these were > > > probably just copied around without further thought. I went through and > > > cleaned up most of these. I didn't find any redeeming value in these. > > > They are just liable to hide actual problems such as incompatible types. > > >But maybe there are other opinions. > > > > I don't recall details, but I'm fairly sure some of these prevented > > compiler warnings on some (old?) compilers. Hard to be sure if said > > compilers are all gone. > > > > Looking at the sheer size of the patch, I'm kind of -0.1, just > > because I'm afraid it's going to create back-patching gotchas. > > I don't really find that it's improving readability, though > > clearly that's a matter of opinion. > > I did a bit of archeological research on these. None of these casts were > ever necessary, and in many cases even the original patch that introduced an > API used the coding style inconsistently. So I'm very confident that there > are no significant backward compatibility or backpatching gotchas here. > > I'm more concerned that many of these just keep getting copied around > indiscriminately, and this is liable to hide actual type mismatches or > silently discard qualifiers. So I'm arguing in favor of a more restrictive > style in this matter. I agree. I realize this will cause backpatch complexities, but I think removing these will be a net positive. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Add a write_to_file member to PgStat_KindInfo
Hi hackers,

Working on [1] produced the need to give the statistics the ability to decide whether or not they want to be written to the file on disk. Indeed, there is no need to write the per-backend I/O stats to disk (there is no point in keeping stats for backends that no longer exist after a restart), hence $SUBJECT.

This new member could also be useful for custom statistics, so I am creating this dedicated thread.

The attached patch also adds a test for the fixed injection points statistics. It does not add one for the variable injection points statistics, as I think adding one test is enough (the code changes are simple enough).

[1]: https://www.postgresql.org/message-id/Zy4bmvgHqGjcK1pI%40ip-10-97-1-34.eu-west-3.compute.internal

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 229cbce525382c36d461246dccc0ab6a71b31c17 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Wed, 20 Nov 2024 08:35:13 +
Subject: [PATCH v1] Add a write_to_file member to PgStat_KindInfo

This new member allows the statistics to configure whether or not they want
to be written to the file on disk. That gives more flexibility to the custom
statistics.
---
 src/backend/utils/activity/pgstat.c           | 21 ++-
 src/include/utils/pgstat_internal.h           |  3 +
 .../injection_points--1.0.sql                 | 13 ++
 .../injection_points/injection_points.c       |  6 +
 .../injection_points/injection_stats.c        |  1 +
 .../injection_points/injection_stats.h        |  6 +
 .../injection_points/injection_stats_fixed.c  | 149 ++
 .../modules/injection_points/t/001_stats.pl   | 14 ++
 8 files changed, 211 insertions(+), 2 deletions(-)
  9.7% src/backend/utils/activity/
 11.1% src/test/modules/injection_points/t/
 78.3% src/test/modules/injection_points/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ea8c5691e8..4b8c57bb8e 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -12,7 +12,8 @@
  * Statistics are loaded from the filesystem during startup (by the startup
  * process), unless preceded by a crash, in which case all stats are
  * discarded. They are written out by the checkpointer process just before
- * shutting down, except when shutting down in immediate mode.
+ * shutting down (if the stat kind allows it), except when shutting down in
+ * immediate mode.
  *
  * Fixed-numbered stats are stored in plain (non-dynamic) shared memory.
 *
@@ -281,6 +282,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "database",
 		.fixed_amount = false,
+		.write_to_file = true,
 		/* so pg_stat_database entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -297,6 +299,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "relation",
 		.fixed_amount = false,
+		.write_to_file = true,
 		.shared_size = sizeof(PgStatShared_Relation),
 		.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -311,6 +314,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "function",
 		.fixed_amount = false,
+		.write_to_file = true,
 		.shared_size = sizeof(PgStatShared_Function),
 		.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -324,6 +328,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "replslot",
 		.fixed_amount = false,
+		.write_to_file = true,
 		.accessed_across_databases = true,
@@ -340,6 +345,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "subscription",
 		.fixed_amount = false,
+		.write_to_file = true,
 		/* so pg_stat_subscription_stats entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -359,6 +365,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "archiver",
 		.fixed_amount = true,
+		.write_to_file = true,
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -374,6 +381,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "bgwriter",
 		.fixed_amount = true,
+		.write_to_file = true,
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -389,6 +397,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "checkpointer",
 		.fixed_amount = true,
+		.write_to_file = true,
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, checkpointer),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, checkpointer),
@@ -404,6 +413,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZ
Re: Add a write_to_file member to PgStat_KindInfo
Hi,

Thank you for working on this!

On Wed, 20 Nov 2024 at 17:05, Bertrand Drouvot wrote:
>
> Hi hackers,
>
> Working on [1] produced the need to give to the statistics the ability to
> decide whether or not they want to be written to the file on disk.
>
> Indeed, there is no need to write the per backend I/O stats to disk (no point to
> see stats for backends that do not exist anymore after a re-start), so
> $SUBJECT.

I think this is a good idea. +1 for the $SUBJECT.

> This new member could also be useful for custom statistics, so creating this
> dedicated thread.
>
> The attached patch also adds a test in the fixed injection points statistics.
> It does not add one for the variable injection points statistics as I think adding
> one test is enough (the code changes are simple enough).

There is duplicated code in the injection_stats_fixed.c file. Do you think that modifying the existing functions to take an argument to differentiate whether the kind is default or no-write would be better?

--
Regards,
Nazir Bilal Yavuz
Microsoft
Re: scalability bottlenecks with (many) partitions (and more)
On Wed, 4 Sept 2024 at 17:32, Tomas Vondra wrote:
>
> On 9/4/24 16:25, Matthias van de Meent wrote:
> > On Tue, 3 Sept 2024 at 18:20, Tomas Vondra wrote:
> >> FWIW the actual cost is somewhat higher, because we seem to need ~400B
> >> for every lock (not just the 150B for the LOCK struct).
> >
> > We do indeed allocate two PROCLOCKs for every LOCK, and allocate those
> > inside dynahash tables. That amounts to (152+2*64+3*16=) 328 bytes in
> > dynahash elements, and (3 * 8-16) = 24-48 bytes for the dynahash
> > buckets/segments, resulting in 352-376 bytes * NLOCKENTS() being
> > used[^1]. Does that align with your usage numbers, or are they
> > significantly larger?
>
> I see more like ~470B per lock. If I patch CalculateShmemSize to log the
> shmem allocated, I get this:
>
> max_connections=100 max_locks_per_transaction=1000 => 194264001
> max_connections=100 max_locks_per_transaction=2000 => 241756967
>
> and (((241756967-194264001)/100/1000)) = 474
>
> Could be alignment of structs or something, not sure.

NLOCKENTS is calculated based on MaxBackends, which is the sum of MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders; by default these add 22 more slots. After adjusting for that, we get 388 bytes per lock, which is approximately in line with the calculation.

> >> At least based on a quick experiment. (Seems a bit high, right?).
> >
> > Yeah, that does seem high, thanks for nerd-sniping me.
[...]
> > Altogether that'd save 40 bytes/lock entry on size, and ~35
> > bytes/lock on "safety margin", for a saving of (up to) 19% of our
> > current allocation. I'm not sure if these tricks would benefit
> > performance or even be a demerit, apart from smaller structs usually
> > fitting better in CPU caches.
>
> Not sure either, but it seems worth exploring. If you do an experimental
> patch for the LOCK size reduction, I can get some numbers.
It took me some time to get back to this, and a few hours to experiment, but here's that experimental patch. Attached are 4 patches, which together reduce the size of the shared lock tables by about 34% on my 64-bit system.

1/4 implements the MAX_LOCKMODES changes to LOCK I mentioned before, saving 16 bytes.
2/4 packs the LOCK struct more tightly, for another 8 bytes saved.
3/4 reduces the PROCLOCK struct size by 8 bytes with a PGPROC* -> ProcNumber substitution, allowing packing with fields previously reduced in size in patch 2/4.
4/4 reduces the size of the PROCLOCK table by limiting the average number of per-backend locks to max_locks_per_transaction (rather than the current 2*max_locks_per_transaction when getting locks that other backends also requested), and makes the shared lock tables fully pre-allocated.

1-3 together save 11% on the lock tables in 64-bit builds, and 4/4 saves another ~25%, for a total of ~34% on per-lock-entry shared memory usage; from ~360 bytes to ~240 bytes. Note that this doesn't include the ~4.5 bytes added per PGPROC entry per mlpxid for fastpath locking; I've ignored those for now.

Not implemented, but technically possible: the PROCLOCK table _could_ be further reduced in size by acknowledging that each of those structs is always stored after a dynahash HASHELEMENT, which has 4 bytes of padding on 64-bit systems. By changing PROCLOCKTAG's myProc to ProcNumber, one could pack that field into the padding of the hash element header, reducing the effective size of the hash table's entries by 8 bytes, and thus the total size of the tables by another few %. I don't think that trade-off is worth it though, given the complexity and trickery required to get that to work well.

> I'm not sure about the safety margins. 10% sure seems like quite a bit
> of memory (it might not have in the past, but as the instances are
> growing, that probably changed).

I have not yet touched this safety margin.
Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

v0-0002-Reduce-size-of-LOCK-by-8-more-bytes.patch Description: Binary data
v0-0001-Reduce-size-of-LOCK-by-16-bytes.patch Description: Binary data
v0-0003-Reduce-size-of-PROCLOCK-by-8-bytes-on-64-bit-syst.patch Description: Binary data
v0-0004-Reduce-PROCLOCK-hash-table-size.patch Description: Binary data
Re: Statistics Import and Export
On Tue, Nov 19, 2024 at 05:40:20PM -0500, Bruce Momjian wrote: > On Tue, Nov 19, 2024 at 03:47:20PM -0500, Corey Huinker wrote: > > * create a pg_stats_health_check script that lists tables missing stats, > > with > > --fix/--fix-in-stages options, effectively replacing vacuumdb for those > > purposes, and then crank up the messaging about that change. The "new shiny" > > effect of a new utility that has "stats", "health", and "check" in the name > > may > > be the search/click-bait we need to get the word out effectively. That last > > sentence may sound facetious, but it isn't, it's just accepting how search > > engines and eyeballs currently function. With that in place, we can then > > change > > the vacuumdb documentation to be deter future use in post-upgrade > > situations. > > We used to create a script until the functionality was added to > vacuumdb. Since 99% of users will not need to do anything after > pg_upgrade, it would make sense to output the script only for the 1% of > users who need it and tell users to run it, rather than giving > instructions that are a no-op for 99% of users. One problem with the above approach is that it gives users upgrading or loading via pg_dump no way to know which tables need analyze statistics, right? I think that is why we ended up putting the pg_upgrade statistics functionality in vacuumdb --analyze-in-stages. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Consider pipeline implicit transaction as a transaction block
On Tue, Nov 5, 2024 at 6:50 AM Michael Paquier wrote:
> It would be nice to document all these behaviors with regression
> tests in pgbench as it is the only place where we can control that
> with error pattern checks.

It's not the first time I've wanted to be able to do pipelining in psql, as relying on pgbench and TAP tests is not the most flexible, so I took a stab at adding it.

0001: This is a small bug I've stumbled upon. The query buffer is not cleared on a backslash error. For example, "SELECT 1 \parse" would fail due to a missing statement name but would leave "SELECT 1\n" in the query buffer, which would impact the next command.

0002: This adds pipeline support to psql by adding the same meta-commands as pgbench: \startpipeline, \endpipeline and \syncpipeline. Once a pipeline is started, all extended protocol queries are piped until \endpipeline is called. To keep things simple, meta-commands like \gx, \gdesc and \gexec are forbidden inside a pipeline. The tests cover the existing pipelining behaviour with the SET LOCAL, SAVEPOINT, REINDEX CONCURRENTLY and LOCK commands, depending on whether it's the first command or not. The \syncpipeline tests show that this "first command" behaviour also happens after a synchronisation point within a pipeline.

0003: The initial change to use an implicit transaction block when pipelining is detected. The tests reflect the change in behaviour, like the LOCK command working if it's the second command in the pipeline. I've updated the pipeline documentation to provide more details about the use of implicit transaction blocks with pipelines.

v03-0001-Reset-query-buffer-on-a-backslash-error.patch Description: Binary data
v03-0003-Consider-pipeline-implicit-transaction-as-a-tran.patch Description: Binary data
v03-0002-Add-pipeline-support-in-psql.patch Description: Binary data
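For readers unfamiliar with the proposed syntax, a session using the new meta-commands might look roughly like the sketch below. This is illustrative only (the table name is a placeholder, and exact semantics are as described for patches 0002/0003); \bind is used here because the patch pipes extended-protocol queries:

```
\startpipeline
SELECT 1 \bind \g
SELECT 2 \bind \g
\syncpipeline                -- starts a new "first command" segment
LOCK TABLE mytable \bind \g  -- second command after the sync point
\endpipeline
```

Nothing is sent to the server for execution results until the pipeline is ended (or synced), which is what makes the implicit-transaction-block question in 0003 interesting.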
Re: [PATCH] Fixed assertion issues in "pg_get_viewdef"
"=?gb18030?B?emVuZ21hbg==?=" writes: > Thanks for your guidance, you are right, I looked at your patch > and combined it with the example to generate a new patch, > which is really better. I pushed the code fix, but I can't really convince myself that the test case is worth the cycles it'd eat forevermore. If we had a way to reach the situation where there's setops but not any of the other clauses in a leaf query, perhaps that would be worth checking ... but we don't. It's just belt-and-suspenders-too programming. regards, tom lane
Re: Add reject_limit option to file_fdw
On 2024/11/19 21:40, torikoshia wrote: These messages may be unexpected for some users because the documentation of file_fdw does not explicitly describe that file_fdw uses COPY internally. (I can find several wordings like "as COPY", though.) However, since the current file_fdw already has such messages, I came to think making the messages on "reject_limit" consistent with COPY is reasonable. I mean, the previous version of the message seems fine.

Agreed. Thanks for your investigation. I've pushed the 0001 patch.

Thanks! As another comment, do we need to add validator tests for on_error and reject_limit, similar to other options? That might be better.

Added some queries which had wrong options to cause errors. I was unsure whether to add them to the "validator test" section or the "on_error, log_verbosity and reject_limit tests" section, but I chose "validator test" as I believe it makes things clearer. Added a 0002 patch since some of the tests are not related to reject_limit but just on_error.

How about adding validator tests for log_verbosity and reject_limit=0, similar to the tests in the copy2.sql regression test?

I've added those two tests, and the latest version of the patch is attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

>From b73cf7b8aec5d93490f0d303eda3207bd4acbf8c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Tue, 19 Nov 2024 21:12:52 +0900
Subject: [PATCH v6] file_fdw: Add regression tests for ON_ERROR and other options.

This commit introduces regression tests to validate incorrect settings for
the ON_ERROR, LOG_VERBOSITY, and REJECT_LIMIT options in file_fdw.
---
 contrib/file_fdw/expected/file_fdw.out | 8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      | 4 ++++
 2 files changed, 12 insertions(+)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 4f63c047ec..df8d43b374 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -90,8 +90,16 @@
 ERROR:  COPY delimiter cannot be newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 '); -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'unsupported'); -- ERROR
+ERROR:  COPY ON_ERROR "unsupported" not recognized
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore'); -- ERROR
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (log_verbosity 'unsupported'); -- ERROR
+ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1'); -- ERROR
 ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'ignore', reject_limit '0'); -- ERROR
+ERROR:  REJECT_LIMIT (0) must be greater than zero
 CREATE FOREIGN TABLE tbl () SERVER file_server; -- ERROR
 ERROR:  either filename or program is required for file_fdw foreign tables
 \set filename :abs_srcdir '/data/agg.data'
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 4805ca8c04..2cdbe7a8a4 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -77,7 +77,11 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', delimiter
 '); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 '); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'unsupported'); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore'); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (log_verbosity 'unsupported'); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1'); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'ignore', reject_limit '0'); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server; -- ERROR
 \set filename :abs_srcdir '/data/agg.data'
--
2.47.0
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Wed, Nov 20, 2024 at 1:26 AM Guillaume Lelarge wrote:
> OK, but I'm not sure which example I should pick to add dirtied and
> written shared buffers. It looks kinda artificial. Should I choose one
> randomly?

It will be artificial, but I think that's ok: anyone running it on their own will be getting different numbers anyway. I was looking at the "14.1.2 EXPLAIN ANALYZE" section in perform.sgml. Here are some actual numbers I got from playing around with concurrent updates:

   Recheck Cond: (unique1 < 10)
   Heap Blocks: exact=5
   Buffers: shared hit=2 read=5 written=4
 ...
 Planning:
   Buffers: shared hit=289 dirtied=9

Cheers,
Greg
Add Option To Check All Addresses For Matching target_session_attr
Hi,

I was attempting to set up a high availability system using DNS and target_session_attrs. I was using a DNS setup similar to the one below, and was trying to use the connection string `psql postgresql://u...@pg.database.com/db_name?target_session_attrs=read-write` to have clients dynamically connect to the primary, or `psql postgresql://u...@pg.database.com/db_name?target_session_attrs=read-only` to have clients connect to a read replica.

The problem that I found with this setup is that if libpq is unable to get a matching target_session_attr on the first connection attempt, it does not consider any further addresses for the given host. This patch provides an option that allows libpq to look at additional addresses for a given host if the target_session_attr check fails for previous addresses.

Would appreciate any feedback on the applicability/relevancy of the goal here or the implementation.

Example DNS setup:

  Name            | Type | Record
  ----------------|------|-------------
  pg.database.com | A    | ip_address_1
  pg.database.com | A    | ip_address_2
  pg.database.com | A    | ip_address_3
  pg.database.com | A    | ip_address_4

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 51083dcfd8..865eb74fd7 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -365,6 +365,11 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 	"Load-Balance-Hosts", "", 8,	/* sizeof("disable") = 8 */
 	offsetof(struct pg_conn, load_balance_hosts)},

+	{"check_all_addrs", "PGCHECKALLADDRS",
+	DefaultLoadBalanceHosts, NULL,
+	"Check-All-Addrs", "", 1,
+	offsetof(struct pg_conn, check_all_addrs)},
+
 	/* Terminating entry --- MUST BE LAST */
 	{NULL, NULL, NULL, NULL, NULL, NULL, 0}

@@ -4030,11 +4035,11 @@ keep_going:	/* We will come back to here until there is
 				conn->status = CONNECTION_OK;
 				sendTerminateConn(conn);

-				/*
-				 * Try next host if any, but we don't want to consider
-				 * additional addresses for this host.
-				 */
-				conn->try_next_host = true;
+				if (conn->check_all_addrs && conn->check_all_addrs[0] == '1')
+					conn->try_next_addr = true;
+				else
+					conn->try_next_host = true;
+
 				goto keep_going;
 			}
 		}
@@ -4085,11 +4090,11 @@ keep_going:	/* We will come back to here until there is
 				conn->status = CONNECTION_OK;
 				sendTerminateConn(conn);

-				/*
-				 * Try next host if any, but we don't want to consider
-				 * additional addresses for this host.
-				 */
-				conn->try_next_host = true;
+				if (conn->check_all_addrs && conn->check_all_addrs[0] == '1')
+					conn->try_next_addr = true;
+				else
+					conn->try_next_host = true;
+
 				goto keep_going;
 			}
 		}
@@ -4703,6 +4708,7 @@ freePGconn(PGconn *conn)
 	free(conn->rowBuf);
 	free(conn->target_session_attrs);
 	free(conn->load_balance_hosts);
+	free(conn->check_all_addrs);

 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 08cc391cbd..c51ec28234 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -427,6 +427,7 @@ struct pg_conn
 	char	   *target_session_attrs;	/* desired session properties */
 	char	   *require_auth;	/* name of the expected auth method */
 	char	   *load_balance_hosts; /* load balance over hosts */
+	char	   *check_all_addrs;	/* whether to check all ips within a host or terminate on failure */
 	bool		cancelRequest;	/* true if this connection is used to send a
 								 * cancel request, instead of being a normal
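For illustration, connecting with the proposed option enabled would presumably look like the sketch below. The option name, the environment variable, and the accepted value '1' follow the patch; the user and database names are placeholders:

```
# try remaining A-record addresses when the target_session_attrs check fails
psql "postgresql://user@pg.database.com/db_name?target_session_attrs=read-write&check_all_addrs=1"

# or via the environment variable registered by the patch
PGCHECKALLADDRS=1 psql "postgresql://user@pg.database.com/db_name?target_session_attrs=read-write"
```

With the option unset, the behavior stays as today: a failed target_session_attrs check moves on to the next host entry, skipping that host's remaining addresses.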
Re: sunsetting md5 password support
On Tue, Nov 19, 2024 at 8:55 PM Nathan Bossart wrote: > * Expand the documentation. Perhaps we could add a step-by-step guide > for migrating to SCRAM-SHA-256 since more users will need to do so when > MD5 password support is removed. > * Remove the hint. It's arguably doing little more than pointing out the > obvious, and it doesn't actually tell users where in the documentation > to look for this information, anyway. > I think both ideally, but maybe just the hint removal for this patch? On the other hand, "change your password and update pg_hba.conf" is pretty much all you need, so not sure how detailed we want to get. :) Cheers, Greg
Re: RFC: Additional Directory for Extensions
On Nov 20, 2024, at 04:05, Peter Eisentraut wrote: > The path is only consulted if the specified name does not contain a slash. > So if you do LOAD 'foo', the path is consulted, but if you do LOAD > '$libdir/foo', it is not. The problem I'm describing is that most extensions > use the latter style, per current recommendation in the documentation. I see; some details here: https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C-DYNLOAD And I suppose the `directory` control file variable and `MODULEDIR` make variable make that necessary. Maybe $libdir should be stripped out when installing extensions to work with this patch? Best, David
Re: Replace current implementations in crypt() and gen_salt() to OpenSSL
> On 19 Nov 2024, at 18:30, Joe Conway wrote: > Any other opinions out there? Couldn't installations who would be satisfied with a GUC gate revoke privileges from the relevant functions already today and achieve almost the same result? -- Daniel Gustafsson
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 09:51:09PM -0500, Jonathan Katz wrote: > On 11/20/24 9:50 PM, Jonathan S. Katz wrote: > > On 11/20/24 9:48 PM, Tom Lane wrote: > > > "David G. Johnston" writes: > > > > On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: > > > > > so when we decided to remove the downloads > > > > > > > Can you elaborate on who "we" is here? > > > > > > More to the point, what downloads were removed? I still see the > > > source tarballs in the usual place [1]. If some packager(s) removed > > > or never posted derived packages, that's on them not the project. > > > > Downloads weren't removed, and I don't see why we'd want to do so in > > this case. > > Maybe here's the confusion - EDB doesn't have the downloads for the latest > released posted on the Windows installer: > > https://www.enterprisedb.com/downloads/postgres-postgresql-downloads Yes, or for MacOS. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: proposal: schema variables
On Wed, Nov 20, 2024 at 21:14, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> > Hi
> >
> > I wrote POC of VARIABLE(varname) syntax support
>
> Thanks, the results look good. I'm still opposed to the idea of having a
> warning, and think it has to be an error -- but it's my subjective
> opinion. Let's see if there are more votes on that topic.

An error would break the possibility of easily using session variables like Oracle's package variables. It increases the effort of transformation or porting, because you would have to identify the variables inside queries and wrap them in the fence. On the other hand, extensions that can read a query after transformation can easily detect unwrapped variables and raise an error. It would be very easy to implement this check in plpgsql_check, for example, or as a plpgsql.extra_check.

In my ideal world, the shadowing warning would be enabled by default and the unfencing warning disabled by default. But I have no problem with activating both warnings by default. I think warnings are enough, because if there is some risk, the shadowing warning is activated. And my experience with warnings is rather positive; people do read them.

Regards,

Pavel
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 10:12:55PM -0500, Jonathan Katz wrote: > On 11/20/24 10:08 PM, Bruce Momjian wrote: > > Yes, or for MacOS. > > Well, why did EDB remove them? We didn't issue any guidance to remove > downloads. We only provided guidance to users on decision making about > whether to wait or not around the upgrade. All of the other packages hosted > on community infrastructure (and AFAICT other OS distros) are all available. I don't know. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Fri, Nov 15, 2024 at 06:28:42PM -0500, Jonathan Katz wrote:
> We're scheduling an out-of-cycle release on November 21, 2024 to address two
> regressions that were released as part of the November 14, 2024 update
> release[1]. As part of this release, we will issue fixes for all supported
> versions (17.2, 16.6, 15.10, 14.15, 13.20), and for 12.22, even though
> PostgreSQL 12 is now EOL.
>
> A high-level description of the regressions is as follows.
>
> 1. The fix for CVE-2024-10978 prevented `ALTER USER ... SET ROLE ...` from
> having any effect[2]. This will be fixed in the upcoming release.
>
> 2. Certain PostgreSQL extensions took a dependency on an Application Binary
> Interface (ABI) that was modified in this release and caused them to
> break[3]. Currently, this can be mitigated by rebuilding the extensions
> against the updated definition.
>
> Please follow all standard guidelines for commits ahead of the release.
> Thanks for your help in assisting with this release,

I want to point out a complexity of this out-of-cycle release. Our 17.1, etc. releases had four CVEs:

https://www.postgresql.org/message-id/173159332163.1547975.13346191756810493...@wrigleys.postgresql.org

so when we decided to remove the downloads and encourage people to wait for the 17.2, etc. releases, we had the known CVEs in Postgres releases with no recommended way to fix them. I am not sure what we could have done differently, but I am surprised we didn't get more complaints about the security situation we put users in.

--
Bruce Momjian https://momjian.us
EDB https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: 039_end_of_wal: error in "xl_tot_len zero" test
On Fri, Nov 15, 2024 at 11:21 AM Thomas Munro wrote: > Not sure yet what is different in this environment or why you're > suddenly noticing on 13.17. The logic has been there since 13.13 (ie > it was backpatched then). Hi Christoph, Also why only this branch, when they all have it? Reproducible or one-off? Do you have any more clues?
Re: pg_rewind WAL segments deletion pitfall
Alvaro Herrera wrote: > Oh wow, thanks for noticing that. I had already rewritten the commit > message to some extent, but "master" had remained. Now I pushed the > patch to branches 14+, having replaced it as you suggested. When doing some unrelated work I noticed that in the new test 010_keep_recycled_wals.pl the server fails to reload the configuration file. The line it complains about is archive_command = '/usr/bin/perl -e 'exit(1)'' The test still succeeds for some reason. -- Antonin Houska Web: https://www.cybertec-postgresql.com
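The line quoted above fails to parse because, in postgresql.conf, a single quote inside a quoted value must be doubled (or backslash-escaped), so the inner quotes terminate the string early. A fixed version of the test's setting could presumably look like this (a sketch; using /bin/false is just an alternative way to get a command that always exits non-zero):

```
# broken: the embedded quotes end the string early
archive_command = '/usr/bin/perl -e 'exit(1)''

# valid: double the embedded single quotes
archive_command = '/usr/bin/perl -e ''exit(1)'''

# or avoid nested quoting entirely
archive_command = '/bin/false'
```

This would also explain why the test still succeeds: the reload fails, so the server simply keeps the previous (working) configuration.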
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 9:50 PM, Jonathan S. Katz wrote: On 11/20/24 9:48 PM, Tom Lane wrote: "David G. Johnston" writes: On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: so when we decided to remove the downloads Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. Downloads weren't removed, and I don't see why we'd want to do so in this case. Maybe here's the confusion - EDB doesn't have the downloads for the latest release posted on the Windows installer page: https://www.enterprisedb.com/downloads/postgres-postgresql-downloads Jonathan
Re: POC, WIP: OR-clause support for indexes
On Wed, Nov 20, 2024 at 8:20 AM Andrei Lepikhov wrote:
> On 18/11/2024 06:19, Alexander Korotkov wrote:
> > On Fri, Nov 15, 2024 at 3:27 PM Alexander Korotkov wrote:
> > Here is the next revision of this patch. No material changes,
> > adjustments for comments and commit message.
> I have passed through the code and found no issues. Maybe only the phrase:
> "eval_const_expressions() will be simplified if there is more than one."
> which is used in both patches: here, the 'will' may be removed, as for me.

Exactly the same wording is used in match_index_to_operand(). So, I think we can keep this.

> Also, I re-read the thread, and AFAICS, no other issues remain. So, I
> think it would be OK to move the status of this feature to 'ready for
> committer'.

Yes, I also re-read the thread. One thing that caught my eye is that Robert didn't answer my point that, as we generally don't care about lazy parameter evaluation while pushing quals as index conds, we don't have to do this in this patch either. I think there was ample time to express disagreement, if any. Even if this question comes up again, it's a well-isolated issue which could be nailed down later. I'm going to push this if there are no objections.

Links.
1. https://www.postgresql.org/message-id/CAPpHfdt8kowRDUkmOnO7_WJJQ1uk%2BO379JiZCk_9_Pt5AQ4%2B0w%40mail.gmail.com

-- Regards, Alexander Korotkov Supabase
Redundant initialization of table AM oid in relcache
Hi,

I just found that the 'relam' field is assigned twice during relcache initialization. We have a specific area for initializing AM-related stuff several lines later:

/*
 * initialize the table am handler
 */
relation->rd_rel->relam = HEAP_TABLE_AM_OID;
relation->rd_tableam = GetHeapamTableAmRoutine();

so the previous assignment seems unneeded and can be removed.

-- Regards, Jingtang

0001-Remove-redundant-initialization-of-table-AM-oid-in-r.patch Description: Binary data
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: > so when we decided to remove the downloads Can you elaborate on who "we" is here? I don't recall this event happening. I suppose "encouraging people to wait" is arguably a bad position to take compared to directing them to a page on our wiki where the risk factors are laid out so they can make an informed decision based upon their situation. But that seems like a person-to-person matter and not something the project can take responsibility for or control. So, "immediately create a wiki page when PR-level problems arise" could be added to the "could have done better" list, so people have a URL to send instead of off-the-cuff advice. Obviously "alter role set role" is quite a common usage in our community, yet we lack any regression or TAP tests exercising it. There we could have done better and caught the bug in the CVE fix. If the CVEs do have mitigations available, those should probably be noted even if we expect people to apply the minor updates that remove the vulnerability. If we didn't reason through and write out such mitigations for any of these 4, that would be something to consider going forward. David J.
Re: Add a write_to_file member to PgStat_KindInfo
On Wed, Nov 20, 2024 at 05:13:18PM +, Bertrand Drouvot wrote:
> I don't have a strong opinion for this particular case here (I think the code
> is harder to read but yeah there is some code reduction): so I'm fine with
> v2 too.

Well, I like the enthusiasm of having tests, but injection_points stats being persistent on disk is the only write property I want to keep in the module, because it can be useful across clean restarts. Your patch results in a weird mix where you now need a total of four stats IDs rather than two: one more for the fixed-numbered case and one more for the variable-numbered case, to be able to control the write/no-write property set in shared memory.

The correct design, if we were to control the write/no-write for individual entries, would be to store a state flag in each individual entry and decide if an entry should be flushed or not, which would require an additional callback to check the state of an individual entry at write time. Not sure if we really need that, TBH, until someone comes up with a case for it.

Your patch adding backend statistics would be a much better fit to add a test for this specific property, so let's keep injection_points out of it, even if it means that we would only have coverage for variable-numbered stats. So let's keep it simple here, and just set .write_to_file to true for both injection_points stats kinds per the scope of this thread. Another point is that injection_points acts as a template for custom pgstats; this change would make the code harder to use as a reference. Point taken that the tests added in v2 show that the patch works.

-- Michael

signature.asc Description: PGP signature
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 20 Nov 2024 14:14:27 -0800, Masahiko Sawada wrote: > I've extracted the changes to refactor COPY TO/FROM to use the format > callback routines from v23 patch set, which seems to be a better patch > split to me. Also, I've reviewed these changes and made some changes > on top of them. The attached patches are: > > 0001: make COPY TO use CopyToRoutine. > 0002: minor changes to 0001 patch. will be fixed up. > 0003: make COPY FROM use CopyFromRoutine. > 0004: minor changes to 0003 patch. will be fixed up. > > I've confirmed that v24 has a similar performance improvement to v23. > Please check these extractions and minor change suggestions. Thanks. Here are my comments: 0002: +/* TEXT format */ "text" may be better than "TEXT". We use "text" not "TEXT" in other places. +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; +/* BINARY format */ "binary" may be better than "BINARY". We use "binary" not "BINARY" in other places. +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; +/* Return COPY TO routines for the given option */ option -> options +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) 0003: diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 99981b1579..224fda172e 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -56,4 +56,46 @@ typedef struct CopyToRoutine void(*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +/* + * API structure for a COPY FROM format implementation. Note this must be Should we remove a tab character here? 
0004: diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index e77986f9a9..7f1de8a42b 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,31 +106,65 @@ typedef struct CopyMultiInsertInfo /* non-export function prototypes */ static void ClosePipeFromProgram(CopyFromState cstate); - /* - * CopyFromRoutine implementations for text and CSV. + * built-in format-specific routines. One-row callbacks are defined in built-in -> Built-in /* - * CopyFromTextLikeInFunc - * - * Assign input function data for a relation's attribute in text/CSV format. + * COPY FROM routines for built-in formats. ++ "+" -> " *" +/* TEXT format */ TEXT -> text? +/* BINARY format */ BINARY -> binary? +/* Return COPY FROM routines for the given option */ option -> options +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 0447c4df7e..5416583e94 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c +static bool CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls, bool is_csv); Oh, I didn't know that we don't need inline in a function declaration. @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) /* * CopyReadLineText - inner loop of CopyReadLine for text mode */ -static pg_attribute_always_inline bool +static bool CopyReadLineText(CopyFromState cstate, bool is_csv) Is this an intentional change? CopyReadLineText() has "bool in_csv". Thanks, -- kou
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
"David G. Johnston" writes: > On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: >> so when we decided to remove the downloads > Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. regards, tom lane [1] https://www.postgresql.org/ftp/source/
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 9:18 PM, Bruce Momjian wrote: On Fri, Nov 15, 2024 at 06:28:42PM -0500, Jonathan Katz wrote: We're scheduling an out-of-cycle release on November 21, 2024 to address two regressions that were released as part of the November 14, 2024 update release[1]. As part of this release, we will issue fixes for all supported versions (17.2, 16.6, 15.10, 14.15, 13.20), and for 12.22, even though PostgreSQL 12 is now EOL. A high-level description of the regressions is as follows. 1. The fix for CVE-2024-10978 prevented `ALTER USER ... SET ROLE ...` from having any effect[2]. This will be fixed in the upcoming release. 2. Certain PostgreSQL extensions took a dependency on an Application Binary Interface (ABI) that was modified in this release and caused them to break[3]. Currently, this can be mitigated by rebuilding the extensions against the updated definition. Please follow all standard guidelines for commits ahead of the release. Thanks for your help in assisting with this release, I want to point out a complexity of this out-of-cycle release. Our 17.1, etc. releases had four CVEs: https://www.postgresql.org/message-id/173159332163.1547975.13346191756810493...@wrigleys.postgresql.org so when we decided to remove the downloads and encourage people to wait for the 17.2 etc. releases, we had the known CVEs in Postgres releases with no recommended way to fix them. I am not sure what we could have done differently, but I am surprised we didn't get more complaints about the security situation we put them in. The announcement[1] specified the issues and advised waiting if users were impacted by them directly (and tried to be as specific as possible), and gave guidance to help users avoid upgrading and then ending up in a situation where they're broken, regardless of whether they're impacted by the CVE or not (e.g. they don't have PL/Perl installed).
That said, while it's certainly advisable to upgrade based on having CVEs in a release, many upgrade patterns are determined by the CVE score[2]. For example, a HIGH score (7.0 - 8.9 - our highest for this release was 8.8; 3 of them were less than 5.0) often dictates upgrading within 14-30 days of announcing the CVE, and lower scores having more time. This could be why people didn't complain, particularly because we got the announcement out 36 hours after the release, and stated the updates would be available within the next week. Thanks, Jonathan [1] https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-november-21-2024-2958/ [2] https://www.first.org/cvss/v3.1/specification-document
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 07:40:36PM -0700, David G. Johnston wrote: > On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: > > so when we decided to remove the downloads > > > Can you elaborate on who "we" is here? > > I don't recall this event happening. Uh, I only see 17.0 available for Windows, MacOS, and all EDB downloads, not 17.1: https://www.enterprisedb.com/downloads/postgres-postgresql-downloads I am not sure if other distributions removed 17.1. > I suppose "encouraging people to wait" is arguably a bad position to take > compared to directing them to a page on our wiki where the risk factors are > laid out so they can make an informed decision based upon their situation. > But > that seems like a person-to-person matter and not something the project can > take responsibility for or control. So, "immediately create a wiki page when > PR-level problems arise" could be added to the "could have done better" list, > so people have a URL to send instead of off-the-cuff advice. Interesting. > Obviously "alter role set role" is a quite common usage in our community yet > we > lack any regression or tap tests exercising it. That we could have done > better > and caught the bug in the CVE fix. Yes, I saw a lot of reports about this failure. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 09:49:27PM -0500, Jonathan Katz wrote: > That said, while it's certainly advisable to upgrade based on having CVEs in > a release, many upgrade patterns are determined by the CVE score[2]. For > example, a HIGH score (7.0 - 8.9 - our highest for this release was 8.8; 3 > of them were less than 5.0) often dictates upgrading within 14-30 days of > announcing the CVE, and lower scores having more time. This could be why > people didn't complain, particularly because we got the announcement out 36 > hours after the release, and stated the updates would be available within > the next week. Makes sense. This is the discussion I wanted to have. Thanks. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, 19 Nov 2024 at 12:51, Nisha Moond wrote: > > On Thu, Nov 14, 2024 at 9:14 AM vignesh C wrote: > > > > On Wed, 13 Nov 2024 at 15:00, Nisha Moond wrote: > > > > > > Please find the v48 patch attached. > > > > > 2) Currently it allows a minimum value of less than 1 second like in > > milliseconds, I feel we can have some minimum value at least something > > like checkpoint_timeout: > > diff --git a/src/backend/utils/misc/guc_tables.c > > b/src/backend/utils/misc/guc_tables.c > > index 8a67f01200..367f510118 100644 > > --- a/src/backend/utils/misc/guc_tables.c > > +++ b/src/backend/utils/misc/guc_tables.c > > @@ -3028,6 +3028,18 @@ struct config_int ConfigureNamesInt[] = > > NULL, NULL, NULL > > }, > > > > + { > > + {"replication_slot_inactive_timeout", PGC_SIGHUP, > > REPLICATION_SENDING, > > + gettext_noop("Sets the amount of time a > > replication slot can remain inactive before " > > +"it will be invalidated."), > > + NULL, > > + GUC_UNIT_S > > + }, > > + &replication_slot_inactive_timeout, > > + 0, 0, INT_MAX, > > + NULL, NULL, NULL > > + }, > > > > Currently, the feature is disabled by default when > replication_slot_inactive_timeout = 0. However, if we set a minimum > value, the default_val cannot be less than min_val, making it > impossible to use 0 to disable the feature. > Thoughts or any suggestions? We could implement this similarly to how the vacuum_buffer_usage_limit GUC is handled. Setting the value to 0 would allow the operation to use any amount of shared_buffers. Otherwise, valid sizes would range from 128 kB to 16 GB. Similarly, we can modify check_replication_slot_inactive_timeout to behave in the same way as check_vacuum_buffer_usage_limit function. Regards, Vignesh
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 10:08 PM, Bruce Momjian wrote: On Wed, Nov 20, 2024 at 09:51:09PM -0500, Jonathan Katz wrote: On 11/20/24 9:50 PM, Jonathan S. Katz wrote: On 11/20/24 9:48 PM, Tom Lane wrote: "David G. Johnston" writes: On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: so when we decided to remove the downloads Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. Downloads weren't removed, and I don't see why we'd want to do so in this case. Maybe here's the confusion - EDB doesn't have the downloads for the latest released posted on the Windows installer: https://www.enterprisedb.com/downloads/postgres-postgresql-downloads Yes, or for MacOS. Well, why did EDB remove them? We didn't issue any guidance to remove downloads. We only provided guidance to users on decision making about whether to wait or not around the upgrade. All of the other packages hosted on community infrastructure (and AFAICT other OS distros) are all available. Jonathan
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 9:48 PM, Tom Lane wrote: "David G. Johnston" writes: On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: so when we decided to remove the downloads Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. Downloads weren't removed, and I don't see why we'd want to do so in this case. Jonathan
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Wed, 2024-11-20 at 22:54 +0100, Vik Fearing wrote: > On 20/11/2024 22:22, David Rowley wrote: > > On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge > > wrote: > > > OK, I'm fine with this. v4 patch attached with one plan showing read, > > > written, and dirtied buffers. > > I think this might be a good time for anyone out there who is against > > turning on BUFFERS when ANALYZE is on to speak up. > > > > Votes for changing this so far seem to be: Me, Michael Christofides, > > Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from > > 2020) [1]. > > Add me to the pro list, please. > > https://www.postgresql.org/message-id/b3197ba8-225f-f53c-326d-5b1756c77...@postgresfriends.org Me, too! https://postgr.es/m/790e175d16cca11244907d3366a6fa376c33e882.ca...@cybertec.at Yours, Laurenz Albe
Re: Logical replication timeout
On Wed, 6 Nov 2024 at 13:07, RECHTÉ Marc wrote:
>
> Hello,
>
> For some unknown reason (probably a very big transaction at the source), we
> experienced a logical decoding breakdown, due to a timeout from the
> subscriber side (either wal_receiver_timeout or a connection drop by network
> equipment due to inactivity).
>
> The problem is that, due to that failure, the wal_receiver process stops.
> When the wal_sender is ready to send some data, it finds the connection
> broken and exits. A new wal_sender process is created that restarts from the
> beginning (restart LSN). This is an endless loop.
>
> Checking the network connection between wal_sender and wal_receiver, we
> found that no traffic occurs for hours.
>
> We first increased wal_receiver_timeout up to 12h and still got a
> disconnection on the receiver side:
>
> 2024-10-17 16:31:58.645 GMT [1356203:2] user=,db=,app=,client= ERROR: terminating logical replication worker due to timeout
> 2024-10-17 16:31:58.648 GMT [849296:212] user=,db=,app=,client= LOG: background worker "logical replication worker" (PID 1356203) exited with exit code 1
>
> Then we set this parameter to 0, but then got disconnected by the network
> (note the slight difference in the message):
>
> 2024-10-21 11:45:42.867 GMT [1697787:2] user=,db=,app=,client= ERROR: could not receive data from WAL stream: could not receive data from server: Connection timed out
> 2024-10-21 11:45:42.869 GMT [849296:40860] user=,db=,app=,client= LOG: background worker "logical replication worker" (PID 1697787) exited with exit code 1
>
> The message is generated in the libpqrcv_receive function
> (replication/libpqwalreceiver/libpqwalreceiver.c), which calls
> pqsecure_raw_read (interfaces/libpq/fe-secure.c).
>
> The last message, "Connection timed out", is the errno translation from the
> recv system function:
>
> ETIMEDOUT Connection timed out (POSIX.1-2001)
>
> When those timeouts occurred, the sender was still busy deleting files from
> data/pg_replslot/bdcpb21_sene, accumulating more than 6 million small
> ".spill" files. It seems this very long pause happens at the cleanup stage,
> where PG is blindly trying to delete those files.
>
> strace on the wal sender shows tons of calls like:
>
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-0.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-100.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-200.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-300.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-400.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-500.spill") = -1 ENOENT (No such file or directory)
>
> This occurs in ReorderBufferRestoreCleanup
> (backend/replication/logical/reorderbuffer.c). The call stack suggests this
> probably occurs in DecodeCommit or DecodeAbort
> (backend/replication/logical/decode.c):
>
> unlink("pg_replslot/bdcpb21_sene/xid-2730444214-lsn-43A6-8800.spill") = -1 ENOENT (No such file or directory)
>
> /usr/lib64/libc-2.17.so(unlink+0x7) [0xf12e7]
> /usr/pgsql-15/bin/postgres(ReorderBufferRestoreCleanup.isra.17+0x5d) [0x769e3d]
> /usr/pgsql-15/bin/postgres(ReorderBufferCleanupTXN+0x166) [0x76aec6] <=== replication/logical/reorderbuffer.c:1480 (but this (static) function is only used within this module ...)
> /usr/pgsql-15/bin/postgres(xact_decode+0x1e7) [0x75f217] <=== replication/logical/decode.c:175
> /usr/pgsql-15/bin/postgres(LogicalDecodingProcessRecord+0x73) [0x75eee3] <=== replication/logical/decode.c:90, calls the rmgr.rm_decode(ctx, &buf) function, one of the 6 resource manager methods
> /usr/pgsql-15/bin/postgres(XLogSendLogical+0x4e) [0x78294e]
> /usr/pgsql-15/bin/postgres(WalSndLoop+0x151) [0x785121]
> /usr/pgsql-15/bin/postgres(exec_replication_command+0xcba) [0x785f4a]
> /usr/pgsql-15/bin/postgres(PostgresMain+0xfa8) [0x7d0588]
> /usr/pgsql-15/bin/postgres(ServerLoop+0xa8a) [0x493b97]
> /usr/pgsql-15/bin/postgres(PostmasterMain+0xe6c) [0x74d66c]
> /usr/pgsql-15/bin/postgres(main+0x1c5) [0x494a05]
> /usr/lib64/libc-2.17.so(__libc_start_main+0xf4) [0x22554]
> /usr/pgsql-15/bin/postgres(_start+0x28) [0x494fb8]
>
> We did not find any other option than deleting the subscription to stop that
> loop and start a new one (thus losing transactions).
>
> The publisher is PostgreSQL 15.6
> The subscriber is PostgreSQL 14.5
>
> Thanks

Hi,

Do you have a reproducible test case for the above scenario? Please share the same. I am also trying to reproduce the above issue by generating a large number of spill files.

Thanks and Regards,
Meson rebuilds and reinstalls autoinc and refint libraries during regression tests.
Hello!

I accidentally found that the shared libraries autoinc.so and refint.so are overwritten in the "tmp_install" folder during the regression tests. This happens because these libraries are built again for regression test purposes and overwritten by the "install_test_files" step [0]. I tried to change this behavior, but there are a few problems:

- It seems Meson doesn't have a function to just copy a file from one build directory to another.
- There is no way to install the product of the "shared_module()" function into two different locations.
- The "configure_file()" function in "copy mode" runs at setup time and cannot be used.
- The "fs.copyfile()" function runs at build time, but I couldn't find a way to run it after the necessary modules have been built, so this function fails with a "file not found" error. Also, it needs Meson version >= 0.64.
- And there is no single cross-platform copy program. I tried to use "cp" on Linux and "cmd /c copy..." on Windows, but the code grew very much in size, especially after fixing the slash in the return value of the "meson.current_build_dir()" function on Windows. That slash arrives from the "subdir('src/test')" call [1].

To avoid double building of the autoinc and refint libraries, I suggest using the "custom_target()" Meson function and our own copy program. The "custom_target()" function can depend on contrib modules and use any script to just copy libraries to the regression tests build directory. I included a patch which adds a copy.py script to the regression tests directory and uses it to "build" the autoinc and refint modules.

What do you think? Thanks!

Best regards, Roman Zharkov.
[0] https://github.com/postgres/postgres/blob/f95da9f0e091ddbc5d50ec6c11be772da009b24e/src/test/regress/meson.build#L46
[1] https://github.com/postgres/postgres/blob/f95da9f0e091ddbc5d50ec6c11be772da009b24e/meson.build#L3095

diff --git a/src/test/regress/copy.py b/src/test/regress/copy.py
new file mode 100644
index ..9996e64474649855606a8968cf2f3129e792042f
--- /dev/null
+++ b/src/test/regress/copy.py
@@ -0,0 +1,9 @@
+#! /usr/bin/env python3
+
+import sys
+import shutil
+
+if len(sys.argv) == 3:
+    shutil.copy(sys.argv[1], sys.argv[2])
+else:
+    raise Exception("this homemade copy program accepts two arguments")
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index 5a9be73531e9c38595d7aabbd1b17d0d94c2a8fd..df010b7dc02bf46cfb2bc8c1a2df47fd83f1b598 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -43,23 +43,19 @@ regress_module = shared_module('regress',
 )
 test_install_libs += regress_module
 
-# Get some extra C modules from contrib/spi but mark them as not to be
-# installed.
-# FIXME: avoid the duplication.
-
-autoinc_regress = shared_module('autoinc',
-  ['../../../contrib/spi/autoinc.c'],
-  kwargs: pg_test_mod_args,
+autoinc_regress = custom_target('autoinc',
+  command: ['copy.py', autoinc.full_path(), meson.current_build_dir()],
+  output: fs.name(autoinc.full_path()),
+  depends: [autoinc],
+  build_by_default: true,
 )
-test_install_libs += autoinc_regress
 
-refint_regress = shared_module('refint',
-  ['../../../contrib/spi/refint.c'],
-  c_args: refint_cflags,
-  kwargs: pg_test_mod_args,
+refint_regress = custom_target('refint',
+  command: ['copy.py', refint.full_path(), meson.current_build_dir()],
+  output: fs.name(refint.full_path()),
+  depends: [refint],
+  build_by_default: true,
 )
-test_install_libs += refint_regress
-
 tests += {
   'name': 'regress',