Re: CREATE SUBSCRIPTION - add missing test case
On Tue, 20 Aug 2024 at 08:21, Peter Smith wrote: > > On Fri, Aug 16, 2024 at 2:15 PM vignesh C wrote: > > > > Thanks for the review. > > > > > I agree currently there is no test to hit this code. I'm not sure if > > this is the correct location for the test, should it be included in > > the 008_diff_schema.pl file? > > Yes, that is a better home for this test. Done as suggested in > attached patch v2. Thanks, this looks good to me. Regards, Vignesh
Re: Fix memory counter update in reorderbuffer
Hi, On Fri, Aug 16, 2024 at 12:22 AM Shlok Kyal wrote: > > On Wed, 7 Aug 2024 at 11:48, Amit Kapila wrote: > > > > On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada > > wrote: > > > > > > On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila > > > wrote: > > > > > > > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada > > > > wrote: > > > > > > > > > > I found a bug in the memory counter update in reorderbuffer. It was > > > > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > > > > > > > > > In ReorderBufferCleanupTXN() we zero the transaction size and then > > > > > free the transaction entry as follows: > > > > > > > > > > /* Update the memory counter */ > > > > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > > > > > > > > > /* deallocate */ > > > > > ReorderBufferReturnTXN(rb, txn); > > > > > > > > > > > > > Why do we need to zero the transaction size explicitly? Shouldn't it > > > > automatically become zero after freeing all the changes? > > > > > > It will become zero after freeing all the changes. However, since > > > updating the max-heap when freeing each change could bring some > > > overhead, we freed the changes without updating the memory counter, > > > and then zeroed it. > > > > > > > I think this should be covered in comments as it is not apparent. > > > > > > > > > BTW, commit 5bec1d6bc5e also introduced > > > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > > > is also worth considering while fixing the reported problem. It may > > > > not have the same problem as you have reported but we can consider > > > > whether setting txn size as zero explicitly is required or not. > > > > > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > > > same as I mentioned above. And yes, as you mentioned, it doesn't have > > > the same problem that I reported here. 
> > > > > > > I checked again and found that ReorderBufferResetTXN() first calls > > ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After > > that, it also tries to free spec_insert change which should have the > > same problem. So, what saves this path from the same problem? > > I tried testing this scenario and I was able to reproduce the crash in > HEAD with this scenario. I have created a patch for the testcase. > I also tested the same scenario with the latest patch shared by > Sawada-san in [1]. And confirm that this resolves the issue. Thank you for testing the patch! I'm slightly hesitant to add a test under src/test/subscription since it's a bug in ReorderBuffer and not specific to logical replication. If we reasonably cannot add a test under contrib/test_decoding, I'm okay with adding it under src/test/subscription. I've attached the updated patch with the commit message (but without a test case for now). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com v3-0001-Fix-memory-counter-update-in-ReorderBuffer.patch Description: Binary data
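For readers following along, the batched counter update being discussed can be illustrated with a simplified sketch. The names below are illustrative stand-ins, not the real reorderbuffer API (which also maintains a max-heap keyed on transaction size):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins for the change list and the buffer-wide
 * memory counter (rb->size in the real code). */
typedef struct Change
{
	size_t		size;
	struct Change *next;
} Change;

static size_t rb_total_size;	/* buffer-wide memory counter */

static void
cleanup_txn(Change *head, size_t txn_size)
{
	/*
	 * Free every change WITHOUT touching rb_total_size: updating the
	 * counter (and re-adjusting the max-heap) once per change would
	 * add per-change overhead.
	 */
	while (head != NULL)
	{
		Change *next = head->next;

		free(head);
		head = next;
	}

	/* ...then subtract the transaction's whole size in one step. */
	rb_total_size -= txn_size;
}
```

The bug being fixed is essentially one of ordering: the counter must be adjusted for the transaction exactly once, before the transaction entry itself is returned, on every path that frees changes in bulk — including the ReorderBufferResetTXN() path found above.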
Re: define PG_REPLSLOT_DIR
On Tue, 20 Aug 2024 17:47:57 +0900 Michael Paquier wrote: > On Mon, Aug 19, 2024 at 02:11:55PM +, Bertrand Drouvot wrote: > > I made the changes for pg_tblspc in pg_combinebackup.c as the number of occurrences > > are greater than the "pg_wal" ones and we were to define PG_TBLSPC_DIR in any > > case. > > > > Please find attached the related patches. > > No real objection about the replslot and pg_logical bits. > > - *$PGDATA/pg_tblspc/spcoid/PG_MAJORVER_CATVER/dboid/relfilenumber > + * $PGDATA/PG_TBLSPC_DIR/spcoid/PG_MAJORVER_CATVER/dboid/relfilenumber > > For the tablespace parts, I am not sure that I would update the > comments to reflect the variables, TBH. Somebody reading the comments > would need to refer back to pg_tblspc/ in the header.

I also think that it is not necessary to change the comments even for pg_replslot.

- * Each replication slot gets its own directory inside the $PGDATA/pg_replslot
+ * Each replication slot gets its own directory inside the $PGDATA/PG_REPLSLOT_DIR

For example, I found that comments in xlog.c use "pg_wal" even though XLOGDIR is used in the code, as below, and I don't see any problem with this.

> static void
> ValidateXLOGDirectoryStructure(void)
> {
>     char        path[MAXPGPATH];
>     struct stat stat_buf;
>
>     /* Check for pg_wal; if it doesn't exist, error out */
>     if (stat(XLOGDIR, &stat_buf) != 0 ||
>         !S_ISDIR(stat_buf.st_mode))

Should the following also be rewritten using sizeof(PG_REPLSLOT_DIR)?

    struct stat statbuf;
    char        path[MAXPGPATH * 2 + 12];

Regards, Yugo Nagata -- Yugo Nagata
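To make the question concrete: sizeof on a string-literal macro counts the terminating NUL, which is exactly where the hand-counted "+ 12" comes from (strlen("pg_replslot") + 1). A minimal sketch, with MAXPGPATH and PG_REPLSLOT_DIR as stand-ins for the real definitions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXPGPATH       1024            /* stand-in for pg_config_manual.h */
#define PG_REPLSLOT_DIR "pg_replslot"   /* stand-in for the proposed define */

/*
 * sizeof(PG_REPLSLOT_DIR) counts the terminating NUL, so
 * "MAXPGPATH * 2 + sizeof(PG_REPLSLOT_DIR)" replaces the magic
 * "MAXPGPATH * 2 + 12" without any hand-counted constant.
 */
static void
build_slot_path(char *buf, size_t buflen, const char *slotname)
{
	snprintf(buf, buflen, "%s/%s", PG_REPLSLOT_DIR, slotname);
}
```

If the directory name ever changed length, a sizeof-based declaration would track it automatically, whereas the literal "12" would silently become wrong.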
Re: Track the amount of time waiting due to cost_delay
Hi, On Mon, Jul 01, 2024 at 04:59:25AM +, Bertrand Drouvot wrote: > Hi, > > On Fri, Jun 28, 2024 at 08:07:39PM +, Imseih (AWS), Sami wrote: > > > 46ebdfe164 will interrupt the leaders sleep every time a parallel workers > > > reports > > > progress, and we currently don't handle interrupts by restarting the > > > sleep with > > > the remaining time. nanosleep does provide the ability to restart with > > > the remaining > > > time [1], but I don't think it's worth the effort to ensure more accurate > > > vacuum delays for the leader process. > > > > After discussing offline with Bertrand, it may be better to have > > a solution to deal with the interrupts and allows the sleep to continue to > > completion. This will simplify this patch and will be useful > > for other cases in which parallel workers need to send a message > > to the leader. This is the thread [1] for that discussion. > > > > [1] > > https://www.postgresql.org/message-id/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-00%40email.amazonses.com > > > > Yeah, I think it would make sense to put this thread on hold until we know > more > about [1] (you mentioned above) outcome. As it looks like we have a consensus not to wait on [0] (as reducing the number of interrupts makes sense on its own), then please find attached v4, a rebase version (that also makes clear in the doc that that new field might show slightly old values, as mentioned in [1]). 
[0]: https://www.postgresql.org/message-id/flat/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-00%40email.amazonses.com [1]: https://www.postgresql.org/message-id/ZruMe-ppopQX4uP8%40nathan Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com

From 90196125d1262095d02f0df74bb6cab0d03c75ff Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 24 Jun 2024 08:43:26 + Subject: [PATCH v4] Report the total amount of time that vacuum has been delayed due to cost delay

This commit adds one column: time_delayed to the pg_stat_progress_vacuum system view to show the total amount of time in milliseconds that vacuum has been delayed. This uses the new parallel message type for progress reporting added by f1889729dd. In the case of a parallel worker, to avoid the leader being interrupted too frequently (while it might be sleeping for cost delay), the report is done only if the last report was done more than 1 second ago. Having a time-based-only approach to throttle the reporting of the parallel workers sounds reasonable. Indeed, when deciding about the throttling: 1. The number of parallel workers should not come into play: 1.1) the more parallel workers are used, the less the impact of the leader on the vacuum index phase duration/workload is (because the work is distributed over more processes). 1.2) the fewer parallel workers there are, the less the leader will be interrupted (fewer parallel workers would report their delayed time). 2. The cost limit should not come into play as that value is distributed proportionally among the parallel workers (so we're back to the previous point). 3. The cost delay does not come into play as the leader could be interrupted at the beginning, the middle or whatever part of the wait and we are more interested in the frequency of the interrupts. 3. 
A 1 second reporting "throttling" looks like a reasonable threshold as: 3.1 the idea is to have a significant impact when the leader could have been interrupted say hundred/thousand times per second. 3.2 it does not make that much sense for any tools to sample pg_stat_progress_vacuum multiple times per second (so a one second reporting granularity seems ok). Bump catversion because this changes the definition of pg_stat_progress_vacuum. --- doc/src/sgml/monitoring.sgml | 13 src/backend/catalog/system_views.sql | 2 +- src/backend/commands/vacuum.c | 49 src/include/catalog/catversion.h | 2 +- src/include/commands/progress.h | 1 + src/test/regress/expected/rules.out | 3 +- 6 files changed, 67 insertions(+), 3 deletions(-) 23.5% doc/src/sgml/ 4.2% src/backend/catalog/ 63.4% src/backend/commands/ 4.6% src/include/ 4.0% src/test/regress/expected/ diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 55417a6fa9..d87604331a 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -6307,6 +6307,19 @@ FROM pg_stat_get_backend_idset() AS backendid; cleaning up indexes. + + + + time_delayed bigint + + + Total amount of time spent in milliseconds waiting due to vacuum_cost_delay + or autovacuum_vacuum_cost_delay. In case of parallel + vacuum the reported time is across all the workers and the leader. This + column is updated at a 1 Hz frequency (one time per second) so could show + slightly old values.
Re: Restart pg_usleep when interrupted
Hi, On Tue, Aug 13, 2024 at 11:40:27AM -0500, Nathan Bossart wrote: > On Tue, Aug 13, 2024 at 11:07:46AM -0500, Imseih (AWS), Sami wrote: > > Having to add special handling to space out instrumentation > > directly in vacuum_delay_point seems very odd to me. I don't > > think vacuum_delay_point should have to worry about this. > > > > Also, > > 1/ what is an appropriate interval to collect these stats? > > 2/ What if there are other callers in the future that wish > > to instrument parallel vacuum workers? they will need to implement > > similar logic. > > None of this seems intractable to me. 1 Hz seems like an entirely > reasonable place to start, and it is very easy to change (or to even make > configurable). pg_stat_progress_vacuum might show slightly old values in > this column, but that should be easy enough to explain in the docs if we > are really concerned about it. If other callers want to do something > similar, maybe we should add a more generic implementation in > backend_progress.c. > As it looks like we have a consensus that reducing the number of interrupts also makes sense, I just provided a rebase version of the 1 Hz version (see [0], that also makes clear in the doc that the new field might show slightly old values). [0]: https://www.postgresql.org/message-id/ZsSQnS9OW9EWSOk4%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
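The restart-with-remaining-time behavior referenced in the quoted thread can be sketched with nanosleep's second argument. This is a hypothetical interrupt-resistant sleep, not the committed pg_usleep code:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical interrupt-resistant sleep: nanosleep(2) reports the
 * unslept time in 'rem' when it fails with EINTR, so looping on that
 * value sleeps for the full requested duration despite signals.
 */
static void
sleep_full_usec(long microsec)
{
	struct timespec req;
	struct timespec rem;

	req.tv_sec = microsec / 1000000L;
	req.tv_nsec = (microsec % 1000000L) * 1000L;

	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;		/* restart with whatever time is left */
}
```

Note the trade-off discussed in the thread: with this shape the sleep always runs to completion, so any instrumentation (such as progress reports from parallel workers) no longer shortens the leader's cost-delay sleeps.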
Re: generic plans and "initial" pruning
On Tue, Aug 20, 2024 at 1:39 AM Robert Haas wrote: > On Fri, Aug 16, 2024 at 8:36 AM Amit Langote wrote: > > So it is possible for the executor to try to run a plan that has > > become invalid since it was created, so... > > I'm not sure what the "so what" here is. I meant that if the executor has to deal with broken plans anyway, we might as well lean into that fact by choosing not to handle only the cached plan case in a certain way. Yes, I understand that that's not a good justification. > > One perhaps crazy idea [1]: > > > > What if we remove AcquireExecutorLocks() and move the responsibility > > of taking the remaining necessary locks into the executor (those on > > any inheritance children that are added during planning and thus not > > accounted for by AcquirePlannerLocks()), like the patch already does, > > but don't make it also check if the plan has become invalid, which it > > can't do anyway unless it's from a CachedPlan. That means we instead > > let the executor throw any errors that occur when trying to > > initialize the plan because of the changes that have occurred to the > > objects referenced in the plan, like what is happening in the above > > example. If that case is going to be rare anyway, why spend energy on > > checking the validity and replanning, especially if that's not an easy > > thing to do as we're finding out. In the above example, we could say > > that it's a user error to create a rule like that, so it should not > > happen in practice, but when it does, the executor seems to deal with > > it correctly by refusing to execute a broken plan. Perhaps it's more > > worthwhile to make the executor behave correctly in the face of plan > > invalidation than teach the rest of the system to deal with the > > executor throwing its hands up when it runs into an invalid plan? > > Again, I think this may be a crazy line of thinking but just wanted to > > get it out there. > > I don't know whether this is crazy or not. 
I think there are two > issues. One, the set of checks that we have right now might not be > complete, and we might just not have realized that because it happens > infrequently enough that we haven't found all the bugs. If that's so, > then a change like this could be a good thing, because it might force > us to fix stuff we should be fixing anyway. I have a feeling that some > of the checks you hit there were added as bug fixes long after the > code was written originally, so my confidence that we don't have more > bugs isn't especially high. This makes sense. > And two, it matters a lot how frequent the errors will be in practice. > I think we normally try to replan rather than let a stale plan be used > because we want to not fail, because users don't like failure. If the > design you propose here would make failures more (or less) frequent, > then that's a problem (or awesome). I think we'd modify plancache.c to postpone the locking of only prunable relations (i.e., partitions), so we're looking at only a handful of concurrent modifications that are going to cause execution errors. That's because we disallow many DDL modifications of partitions unless they are done via recursion from the parent, so the space of errors in practice would be smaller compared to if we were to postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX a_partion_only_index comes to mind as something that might cause an error. I've not tested whether other partition-only constraints can cause unsafe behaviors. Perhaps we can add the check for CachedPlan.is_valid after every table_open() and index_open() in the executor that takes a lock, or at all the places we discussed previously, and throw an error (say: "cached plan is no longer valid") if it's false. That's better than running into some random error by soldiering ahead with the plan's initialization / execution, but still a loss in terms of user experience because we're adding a new failure mode, however rare. 
-- Thanks, Amit Langote
Re: Remaining reference to _PG_fini() in ldap_password_func
Heikki Linnakangas writes: > On 20/08/2024 07:46, Michael Paquier wrote: >> How about removing it like in the attached to be consistent? > +1. There's also a prototype for _PG_fini() in fmgr.h, let's remove that > too. +1. I think the fmgr.h prototype may have been left there deliberately to avoid breaking extension code, but it's past time to clean it up. regards, tom lane
Re: generic plans and "initial" pruning
On Tue, Aug 20, 2024 at 3:21 AM Robert Haas wrote: > On Mon, Aug 19, 2024 at 1:52 PM Tom Lane wrote: > > Robert Haas writes: > > > But that seems somewhat incidental to what this thread is about. > > > > Perhaps. But if we're running into issues related to that, it might > > be good to set aside the long-term goal for a bit and come up with > > a cleaner answer for intra-session locking. That could allow the > > pruning problem to be solved more cleanly in turn, and it'd be > > an improvement even if not. > > Maybe, but the pieces aren't quite coming together for me. Solving > this would mean that if we execute a stale plan, we'd be more likely > to get a good error and less likely to get a bad, nasty-looking > internal error, or a crash. That's good on its own terms, but we don't > really want user queries to produce errors at all, so I don't think > we'd feel any more free to rearrange the order of operations than we > do today. Yeah, it's unclear whether executing a potentially stale plan is an acceptable tradeoff compared to replanning, especially if it occurs rarely. Personally, I would prefer that it is. > > > Do you have a view on what the way forward might be? > > > > I'm fresh out of ideas at the moment, other than having a hope that > > divide-and-conquer (ie, solving subproblems first) might pay off. > > Fair enough, but why do you think that the original approach of > creating a data structure from within the plan cache mechanism > (probably via a call into some new executor entrypoint) and then > feeding that through to ExecutorRun() time can't work? That would be ExecutorStart(). The data structure need not be referenced after ExecInitNode(). > Is it possible > you latched onto some non-optimal decisions that the early versions of > the patch made, rather than there being a fundamental problem with the > concept? 
> > I actually thought the do-it-at-executorstart-time approach sounded > pretty good, even though we might have to abandon planstate tree > initialization partway through, right up until Amit started talking > about moving ExecutorStart() from PortalRun() to PortalStart(), which > I have a feeling is going to create a bigger problem than we can > solve. I think if we want to save that approach, we should try to > figure out if we can teach the plancache to replan one query from a > list without replanning the others, which seems like it might allow us > to keep the order of major operations unchanged. Otherwise, it makes > sense to me to have another go at the other approach, at least to make > sure we understand clearly why it can't work. +1 -- Thanks, Amit Langote
Re: why is pg_upgrade's regression run so slow?
On 2024-08-19 Mo 8:00 AM, Alexander Lakhin wrote: Hello Andrew, 29.07.2024 13:54, Andrew Dunstan wrote: On 2024-07-29 Mo 4:00 AM, Alexander Lakhin wrote: And another interesting fact is that TEMP_CONFIG is apparently ignored by `meson test regress/regress`. For example, with temp.config containing invalid settings, `meson test pg_upgrade/002_pg_upgrade` fails, but `meson test regress/regress` passes just fine. Well, that last fact explains the discrepancy I originally complained about. How the heck did that happen? It looks like we just ignored its use in Makefile.global.in :-( Please look at the attached patch (partially based on ideas from [1]) for meson.build, that aligns it with `make` in regard to use of TEMP_CONFIG. Maybe it could be implemented via a separate meson option, but that would also require modifying at least the buildfarm client... [1] https://www.postgresql.org/message-id/CAN55FZ304Kp%2B510-iU5-Nx6hh32ny9jgGn%2BOB5uqPExEMK1gQQ%40mail.gmail.com I think this is the way to go. The patch LGTM. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Some questions about PostgreSQL’s design.
On 20/08/2024 11:46, 陈宗志 wrote:

I've recently started exploring PostgreSQL implementation. I used to be a MySQL InnoDB developer, and I find the PostgreSQL community feels a bit strange. There are some areas where they've done really well, but there are also some obvious issues that haven't been improved. For example, the B-link tree implementation in PostgreSQL is particularly elegant, and the code is very clean. But there are some clear areas that could be improved but haven't been addressed, like the double memory problem where the buffer pool and page cache store the same page, or using full-page writes to deal with torn page writes instead of something like InnoDB's double write buffer. It seems like these issues have clear solutions, such as using direct I/O like InnoDB instead of buffered I/O, or using a double write buffer instead of relying on the full-page write approach. Can anyone explain why?

There are pros and cons. With direct I/O, you cannot take advantage of the kernel page cache anymore, so it becomes important to tune shared_buffers more precisely. That's a downside: the system requires more tuning. For many applications, squeezing the last ounce of performance just isn't that important. There are also scaling issues with the Postgres buffer cache, which might need to be addressed first.

With double write buffering, there are also pros and cons. It also requires careful tuning. And replaying WAL that contains full-page images can be much faster, because you can write new page images "blindly" without reading the old pages first. We have WAL prefetching now, which alleviates that, but it's no panacea.

In summary, those are good solutions but they're not obviously better in all circumstances.

However, the PostgreSQL community's mailing list is truly a treasure trove, where you can find really interesting discussions. For instance, this discussion on whether lock coupling is needed for B-link trees, etc. 
https://www.postgresql.org/message-id/flat/CALJbhHPiudj4usf6JF7wuCB81fB7SbNAeyG616k%2Bm9G0vffrYw%40mail.gmail.com Yep, there are old threads and patches for double write buffers and direct IO too :-). -- Heikki Linnakangas Neon (https://neon.tech)
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Mon, Aug 19, 2024 at 1:54 PM Jelte Fennema-Nio wrote: > My point is that the code that breaks, actually wants to be broken in this > case. I'll turn this around then and assume for a moment that this is true: no matter what the use cases are, they all want to be broken for correctness. If this version change is allowed to break both the endpoints and any intermediaries on the connection, why have we chosen 30001 as the new reported version as opposed to, say, 4? Put another way: for a middlebox on the connection (which may be passively observing, but also maybe actively adding new messages to the stream), what is guaranteed to remain the same in the protocol across a minor version bump? Hopefully the answer isn't "nothing"? --Jacob
Re: define PG_REPLSLOT_DIR
On 2024-Aug-19, Bertrand Drouvot wrote: > diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h > index 6f006d5a93..a6cb091635 100644 > --- a/src/include/common/relpath.h > +++ b/src/include/common/relpath.h > @@ -33,6 +33,10 @@ typedef Oid RelFileNumber; > #define TABLESPACE_VERSION_DIRECTORY "PG_" PG_MAJORVERSION "_" \ > > CppAsString2(CATALOG_VERSION_NO) > > +#define PG_TBLSPC_DIR "pg_tblspc" This one is missing some commentary along the lines of "This must not be changed, unless you want to break every tool in the universe". As is, it's quite tempting. > +#define PG_TBLSPC_DIR_SLASH PG_TBLSPC_DIR "/" I would make this simply "pg_tblspc/", since it's not really possible to change pg_tblspc anyway. Also, have a comment explaining why we have it. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Los dioses no protegen a los insensatos. Éstos reciben protección de otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)
Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error
Hi On Mon, Aug 19, 2024 at 10:12 PM Tom Lane wrote: > > We have had multiple instances of code "return"ing out of a PG_TRY, > so I fully agree that some better way to detect that would be good. > But maybe we ought to think about static analysis for that. I have some static analysis scripts for detecting this kind of problem (of mis-using PG_TRY). Not sure if my scripts are helpful here but I would like to share them. - A clang plugin for detecting unsafe control flow statements in PG_TRY. https://github.com/higuoxing/clang-plugins/blob/main/lib/ReturnInPgTryBlockChecker.cpp - Same as above, but in CodeQL[^1] script. https://github.com/higuoxing/postgres.ql/blob/main/return-in-PG_TRY.ql - A CodeQL script for detecting the missing of volatile qualifiers (objects have been changed between the setjmp invocation and longjmp call should be qualified with volatile). https://github.com/higuoxing/postgres.ql/blob/main/volatile-in-PG_TRY.ql Andres also has some compiler hacking to detect return statements in PG_TRY[^2]. [^1]: https://codeql.github.com/ [^2]: https://www.postgresql.org/message-id/20230113054900.b7onkvwtkrykeu3z%40awork3.anarazel.de
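For context, the hazard these checkers look for can be modeled with plain sigsetjmp. This is a simplified stand-in for the real elog.h macros: PG_TRY() saves the previous exception-stack pointer and PG_END_TRY() restores it, so a plain return inside the block skips the restore and leaves PG_exception_stack pointing at a dead stack frame:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of elog.h's PG_TRY machinery; not the real macros. */
static sigjmp_buf *exception_stack = NULL;

static bool
leaky_function(void)
{
	sigjmp_buf	local_sigjmp_buf;
	sigjmp_buf *save_exception_stack = exception_stack;

	if (sigsetjmp(local_sigjmp_buf, 0) == 0)	/* "PG_TRY()" */
	{
		exception_stack = &local_sigjmp_buf;
		return true;	/* BUG: skips the restore below ("PG_END_TRY()") */
	}
	exception_stack = save_exception_stack;		/* "PG_CATCH()/PG_END_TRY()" */
	return false;
}
```

After leaky_function() returns, the next error that longjmps through exception_stack would land on a frame that no longer exists — which is why both runtime checks and static analysis for this pattern are attractive.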
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 15:48, Jacob Champion wrote: > Put another way: for a middlebox on the connection (which may be > passively observing, but also maybe actively adding new messages to > the stream), what is guaranteed to remain the same in the protocol > across a minor version bump? Hopefully the answer isn't "nothing"? I think primarily we do a minor version bump because a major version bump would cause existing Postgres servers to throw an error for the connection attempt (and we don't want that). While for a minor version bump they will instead send a NegotiateProtocolVersion message. In practical terms I think that means for a minor version bump the format of the StartupMessage cannot be changed. Changing anything else is fair game for a minor protocol version bump. I think we probably would not want to change the format of ErrorResponse and NoticeResponse, since those can be sent by the server before the NegotiateProtocolVersion message. But I don't even think that is strictly necessary, as long as clients would be able to parse both the old and new versions. Note that this definition arises from code and behaviour introduced in ae65f6066dc3 in 2017. And PQprotocolVersion was introduced in efc3a25bb0 in 2003. So anyone starting to use the PQprotocolVersion function in between 2003 and 2017 had no way of knowing that there would ever be a thing called a "minor" version, in which anything about the protocol could be changed except for the StartupMessage.
Re: generic plans and "initial" pruning
On Tue, Aug 20, 2024 at 9:00 AM Amit Langote wrote: > I think we'd modify plancache.c to postpone the locking of only > prunable relations (i.e., partitions), so we're looking at only a > handful of concurrent modifications that are going to cause execution > errors. That's because we disallow many DDL modifications of > partitions unless they are done via recursion from the parent, so the > space of errors in practice would be smaller compared to if we were to > postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX > a_partion_only_index comes to mind as something that might cause an > error. I've not tested if other partition-only constraints can cause > unsafe behaviors. This seems like a valid point to some extent, but in other contexts we've had discussions about how we don't actually guarantee all that much uniformity between a partitioned table and its partitions, and it's been questioned whether we made the right decisions there. So I'm not entirely sure that the surface area for problems here will be as narrow as you're hoping -- I think we'd need to go through all of the ALTER TABLE variants and think it through. But maybe the problems aren't that bad. It does seem like constraints can change the plan. Imagine the partition had a CHECK(false) constraint before and now doesn't, or something. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
Maxim Orlov writes: > So, my point here: installcheck-world doesn't > work on the current master branch until I explicitly install > the injection_points extension. In my view, it's a bit weird, since > neither test_decoding, pg_stat_statements nor pg_prewarm demand it. Ugh. The basic issue here is that "make install-world" doesn't install anything from underneath src/test/modules, which I recall as being an intentional decision. Rather than poking a hole in that policy for injection_points, I wonder if we should move it to contrib. regards, tom lane
Re: Fsync (flush) all inserted WAL records
Dear All, I would propose a new function like GetXLogInsertRecPtr(), but with some modifications (please see the attached patch). The result LSN can be passed to XLogFlush() safely. I believe it will not raise an error in any case. XLogFlush(GetXLogLastInsertEndRecPtr()) will flush (fsync) all records already inserted at that moment. It is what I would like to get. I'm not sure we need a SQL function counterpart for this new C function, but it is not a big deal to implement. With best regards, Vitaly On Monday, August 19, 2024 09:35 MSK, Michael Paquier wrote: On Wed, Aug 07, 2024 at 06:00:45PM +0300, Aleksander Alekseev wrote: > Assuming the function has value, as you claim, I see no reason not to > expose it similarly to pg_current_wal_*(). On top of that you will > have to test-cover it anyway. The easiest way to do it will be to have > an SQL-wrapper. I cannot be absolutely sure without seeing a patch, but adding SQL functions in this area is usually very useful for monitoring purposes of external solutions. -- Michael From ba82d6c6f8570fbbff14b4b52fa7720122bfb8ad Mon Sep 17 00:00:00 2001 From: Vitaly Davydov Date: Tue, 20 Aug 2024 18:03:11 +0300 Subject: [PATCH] Add function to return the end LSN of the last inserted WAL record --- src/backend/access/transam/xlog.c | 19 +++ src/include/access/xlog.h | 1 + 2 files changed, 20 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ee0fb0e28f..1430aea6d5 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9425,6 +9425,25 @@ GetXLogWriteRecPtr(void) return LogwrtResult.Write; } +/* + * Get the end pointer of the last inserted WAL record. + * The returned value will differ from the current insert pointer + * returned by GetXLogInsertRecPtr() if the last WAL record ends + * up at a page boundary. 
+ */ +XLogRecPtr +GetXLogLastInsertEndRecPtr(void) +{ + XLogCtlInsert *Insert = &XLogCtl->Insert; + uint64 current_bytepos; + + SpinLockAcquire(&Insert->insertpos_lck); + current_bytepos = Insert->CurrBytePos; + SpinLockRelease(&Insert->insertpos_lck); + + return XLogBytePosToEndRecPtr(current_bytepos); +} + /* * Returns the redo pointer of the last checkpoint or restartpoint. This is * the oldest point in WAL that we still need, if we have to restart recovery. diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 083810f5b4..e98a825642 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -226,6 +226,7 @@ extern RecoveryState GetRecoveryState(void); extern bool XLogInsertAllowed(void); extern XLogRecPtr GetXLogInsertRecPtr(void); extern XLogRecPtr GetXLogWriteRecPtr(void); +extern XLogRecPtr GetXLogLastInsertEndRecPtr(void); extern uint64 GetSystemIdentifier(void); extern char *GetMockAuthenticationNonce(void); -- 2.34.1
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On 20/08/2024 16:48, Jacob Champion wrote: On Mon, Aug 19, 2024 at 1:54 PM Jelte Fennema-Nio wrote: My point is that the code that breaks, actually wants to be broken in this case. I'll turn this around then and assume for a moment that this is true: no matter what the use cases are, they all want to be broken for correctness. If this version change is allowed to break both the endpoints and any intermediaries on the connection, why have we chosen 30001 as the new reported version as opposed to, say, 4? That's not a completely crazy idea, it crossed my mind too. And since we already decided to skip protocol number 3.1, how about we jump directly to 3.4. That way:

protocol version | PQProtocolVersion()
2                | 2 (in old unsupported library versions)
3.0              | 3
3.4              | 4
3.5              | 5

and so forth. This kind of assumes we'll never bump the major protocol version again. But if we do, we could jump to 4 at that point. Put another way: for a middlebox on the connection (which may be passively observing, but also maybe actively adding new messages to the stream), what is guaranteed to remain the same in the protocol across a minor version bump? Hopefully the answer isn't "nothing"? I don't think we can give any future guarantees like that. If you have a middlebox on the connection, it needs to fully understand all the protocol versions it supports. It cannot safely pass through protocol version 3.5 without knowing what changed between 3.4 and 3.5. If the middlebox only knows about protocol version 3.4, it should respond with a NegotiateProtocolVersion packet to downgrade to 3.4, even if both ends of the connection could speak 3.5. That seems a bit tangential to the PQprotocolVersion() function though. A middlebox like that would probably not use libpq. I'm actually not sure exactly what an application would use PQprotocolVersion() for. To check if a feature exists or not? 
None of the features discussed so far really need an application to check that, but if we introduce one, I think we'd want to add a better feature-check function for that purpose. Something like "bool PQsupportsFeature(conn, const char *feature_name)" perhaps. If we introduce optional protocol features rather than bump protocol version in the future, we'll need a different mechanism than PQprotocolVersion() anyway. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 11:24 AM Heikki Linnakangas wrote:
> That's not a completely crazy idea, it crossed my mind too. And since we
> already decided to skip protocol number 3.1, how about we jump directly
> to 3.4. That way:
>
> protocol |
> version  | PQProtocolVersion()
>
> 2        | 2 (in old unsupported library versions)
> 3.0      | 3
> 3.4      | 4
> 3.5      | 5
>
> and so forth.
>
> This kind of assumes we'll never bump the major protocol version again.
> But if we do, we could jump to 4 at that point.

I personally like this less than both (a) adding a new function and (b) redefining the existing function as Jelte proposes. It just seems too clever to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 17:46, Robert Haas wrote: > I personally like this less than both (a) adding a new function and > (b) redefining the existing function as Jelte proposes. It just seems > too clever to me. Agreed, I'm not really seeing a benefit of returning 4 instead of 30004. Both are new numbers that are higher than 3, so on existing code they would have the same impact. But any new code would be more readable when using version >= 30004 imho.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 11:53 AM Jelte Fennema-Nio wrote: > On Tue, 20 Aug 2024 at 17:46, Robert Haas wrote: > > I personally like this less than both (a) adding a new function and > > (b) redefining the existing function as Jelte proposes. It just seems > > too clever to me. > > Agreed, I'm not really seeing a benefit of returning 4 instead of > 30004. Both are new numbers that are higher than 3, so on existing > code they would have the same impact. But any new code would be more > readable when using version >= 30004 imho. Yes. And the major * 10000 + minor convention is used in other places already, for PG versions, so it might already be familiar to some people. I think if we're going to redefine an existing function, we might as well just redefine it as you propose -- or perhaps even redefine it to return major * 10000 + minor always, instead of having the strange exception for 3.0. I think I'm still on the side of not redefining it, but if we're going to redefine it, I think we should do what seems most elegant/logical and just accept that some code may break. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
I wrote: > Ugh. The basic issue here is that "make install-world" doesn't > install anything from underneath src/test/modules, which I recall > as being an intentional decision. Rather than poking a hole in > that policy for injection_points, I wonder if we should move it > to contrib. ... which would also imply writing documentation and so forth, and it'd mean that injection_points starts to show up in end-user installations. (That would happen with the alternative choice of hacking install-world to include src/test/modules/injection_points, too.) While you could argue that that'd be helpful for extension authors who'd like to use injection_points in their own tests, I'm not sure that it's where we want to go with that module. It's only meant as test scaffolding, and I don't think we've analyzed the implications of some naive user installing it. We do, however, need to preserve the property that installcheck works after install-world. I'm starting to think that maybe the 041 test should be hacked to silently skip if it doesn't find injection_points available. (We could then remove some of the makefile hackery that's supporting the current behavior.) Probably the same needs to happen in each other test script that's using injection_points --- I imagine that Maxim's test is simply failing here first. regards, tom lane
Re: define PG_REPLSLOT_DIR
Hi,

On Tue, Aug 20, 2024 at 10:15:44AM -0400, Alvaro Herrera wrote:
> On 2024-Aug-19, Bertrand Drouvot wrote:
>
> > diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h
> > index 6f006d5a93..a6cb091635 100644
> > --- a/src/include/common/relpath.h
> > +++ b/src/include/common/relpath.h
> > @@ -33,6 +33,10 @@ typedef Oid RelFileNumber;
> >  #define TABLESPACE_VERSION_DIRECTORY "PG_" PG_MAJORVERSION "_" \
> >  							CppAsString2(CATALOG_VERSION_NO)
> >
> > +#define PG_TBLSPC_DIR "pg_tblspc"
>
> This one is missing some commentary along the lines of "This must not be
> changed, unless you want to break every tool in the universe". As is,
> it's quite tempting.

Yeah, makes sense, thanks.

> > +#define PG_TBLSPC_DIR_SLASH PG_TBLSPC_DIR "/"
>
> I would make this simply "pg_tblspc/", since it's not really possible to
> change pg_tblspc anyway. Also, have a comment explaining why we have it.

Please find attached v3 that:

- takes care of your comments (and also removed the use of PG_TBLSPC_DIR in RELATIVE_PG_TBLSPC_DIR).
- removes the new macros from the comments (see Michael's and Yugo-San's comments in [0] resp. [1]).
- adds a missing sizeof() (see [1]).
- implements Ashutosh's idea of adding a new SLOT_DIRNAME_ARGS (see [2]). It's done in 0002 (I used REPLSLOT_DIR_ARGS though).
- fixed a macro usage in ReorderBufferCleanupSerializedTXNs() (was not at the right location, discovered while implementing 0002).
[0]: https://www.postgresql.org/message-id/ZsRYPcOtoqbWzjGG%40paquier.xyz
[1]: https://www.postgresql.org/message-id/20240820213048.207aade6a75e0dc1fe4d1067%40sraoss.co.jp
[2]: https://www.postgresql.org/message-id/CAExHW5vkjxuvyQ1fPPnuDW4nAT5jqox09ie36kciOV2%2BrhjbHA%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 12398b01fccb7437c4fb35282c576c317b9a7578 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Wed, 14 Aug 2024 09:16:21 +0000
Subject: [PATCH v3 1/6] Define PG_REPLSLOT_DIR

Replace most of the places where "pg_replslot" is used in .c files with
a new PG_REPLSLOT_DIR define. The places where it is not done is for
consistency with the existing PG_STAT_TMP_DIR define.
---
 src/backend/backup/basebackup.c               |  3 +-
 .../replication/logical/reorderbuffer.c       | 17 -
 src/backend/replication/slot.c                | 36 +--
 src/backend/utils/adt/genfile.c               |  3 +-
 src/bin/initdb/initdb.c                       |  2 +-
 src/bin/pg_rewind/filemap.c                   |  2 +-
 src/include/replication/slot.h                |  2 ++
 7 files changed, 35 insertions(+), 30 deletions(-)
 27.5% src/backend/replication/logical/
 60.8% src/backend/replication/
  3.9% src/backend/utils/adt/
  4.2% src/bin/
  3.3% src/

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 01b35e26bd..de16afac74 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -36,6 +36,7 @@
 #include "port.h"
 #include "postmaster/syslogger.h"
 #include "postmaster/walsummarizer.h"
+#include "replication/slot.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 #include "storage/bufpage.h"
@@ -161,7 +162,7 @@ static const char *const excludeDirContents[] =
 	 * even if the intention is to restore to another primary. See backup.sgml
 	 * for a more detailed description.
	 */
-	"pg_replslot",
+	PG_REPLSLOT_DIR,
 
	/* Contents removed on startup, see dsm_cleanup_for_mmap(). */
	PG_DYNSHMEM_DIR,
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 00a8327e77..78166b6ab7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -4571,9 +4571,9 @@ ReorderBufferCleanupSerializedTXNs(const char *slotname)
 	DIR		   *spill_dir;
 	struct dirent *spill_de;
 	struct stat statbuf;
-	char		path[MAXPGPATH * 2 + 12];
+	char		path[MAXPGPATH * 2 + sizeof(PG_REPLSLOT_DIR)];
 
-	sprintf(path, "pg_replslot/%s", slotname);
+	sprintf(path, "%s/%s", PG_REPLSLOT_DIR, slotname);
 
 	/* we're only handling directories here, skip if it's not ours */
 	if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
@@ -4586,14 +4586,14 @@ ReorderBufferCleanupSerializedTXNs(const char *slotname)
 		if (strncmp(spill_de->d_name, "xid", 3) == 0)
 		{
 			snprintf(path, sizeof(path),
-					 "pg_replslot/%s/%s", slotname,
+					 "%s/%s/%s", PG_REPLSLOT_DIR, slotname,
 					 spill_de->d_name);
 
 			if (unlink(path) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not remove file \"%s\" during removal of pg_replslot/%s/xid*: %m",
-								path, slotname)));
+						 errmsg("could not remove file \"%s\" during removal of %s/%s/xid*: %m",
+								path, PG_REPLSLOT_DIR,
Re: define PG_REPLSLOT_DIR
Hi, On Tue, Aug 20, 2024 at 09:30:48PM +0900, Yugo Nagata wrote: > Should be the follwing also rewritten using sizeof(PG_REPLSLOT_DIR)? > >struct stat statbuf; > charpath[MAXPGPATH * 2 + 12]; > > Yeah, done in v3 that I just shared up-thread (also removed the macros from the comments). Thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: ANALYZE ONLY
Hi Michael, Thanks for starting this thread. I've also spent a bit of time on this after reading your first thread on this issue [1] Michael Harris wrote on Tue, 20 Aug 2024 at 08:52: > The problem is that giving an ANALYZE command targeting a partitioned table > causes it to update statistics for the partitioned table AND all the > individual > partitions. There is currently no option to prevent it from including the > partitions. > > This is wasteful for our application: for one thing the autovacuum > has already analyzed the individual partitions; for another most of > the partitions > will have had no changes, so they don't need to be analyzed repeatedly. > I agree that it's a waste to analyze partitions when they're already analyzed by autovacuum. It would be nice to have a way to run analyze only on a partitioned table without its partitions. > I took some measurements when running ANALYZE on one of our tables. It > took approx > 4 minutes to analyze the partitioned table, then 29 minutes to analyze the > partitions. We have hundreds of these tables, so the cost is very > significant. > I quickly tweaked the code a bit to exclude partitions when a partitioned table is being analyzed. I can confirm that there is a significant gain even on a simple case like a partitioned table with 10 partitions and 1M rows in each partition. 1. Would such a feature be welcomed? Are there any traps I might not > have thought of? > > 2. The existing ANALYZE command has the following structure: > > ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] > > It would be easiest to add ONLY as another option, but that > doesn't look quite > right to me - surely the ONLY should be attached to the table name. > An alternative would be: > > ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ] > I lean toward adding this as an option instead of a new keyword in the ANALYZE grammar.
To me, it would be easier to have this option and then give the names of partitioned tables, as opposed to typing ONLY before each partitioned table. But we should think of another name, as ONLY is used differently (attached to the table name, as you mentioned) in other contexts. I've also been thinking about how this new option should affect inheritance tables. Should it have no impact on them, or only analyze the parent table without taking child tables into account? There are two records for an inheritance parent table in pg_statistic, one row for only the parent table and a second row including children. We might only analyze the parent table if this new "ONLY" option is specified. I'm not sure if that would be something users would need or not, but I think this option should behave similarly for both partitioned tables and inheritance tables. If we decide to go with only partitioned tables and not care about inheritance, then naming this option SKIP_PARTITIONS as Jelte suggested sounds fine. But that name wouldn't work if this option affects inheritance tables. Thanks, -- Melih Mutlu Microsoft
Re: define PG_REPLSLOT_DIR
Hi, On Tue, Aug 20, 2024 at 12:06:52PM +, Bertrand Drouvot wrote: > Hi, > > On Tue, Aug 20, 2024 at 05:41:48PM +0900, Michael Paquier wrote: > > On Tue, Aug 20, 2024 at 11:10:46AM +0530, Ashutosh Bapat wrote: > > > Since these are all related changes, doing them at once might make it > > > faster. You may use multiple commits (one for each change) > > > > Doing multiple commits with individual definitions for each path would > > be the way to go for me. All that is mechanical, still that feels > > slightly cleaner. > > Right, that's what v2 has done. If there is a need for v3 then I'll add one > dedicated patch for Ashutosh's proposal in the v3 series. Ashutosh's idea is implemented in v3 that I just shared up-thread (I used REPLSLOT_DIR_ARGS though). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: ANALYZE ONLY
Melih Mutlu wrote on Tue, 20 Aug 2024 at 19:26: > Hi Michael, > > Thanks for starting this thread. I've also spent a bit of time on this after > reading your first thread on this issue [1] > Forgot to add the reference [1] [1] https://www.postgresql.org/message-id/cadofcaxvbd0ygp_eac9chmzsoosai3jcfbcnyva3j0rrdrv...@mail.gmail.com
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
On 2024-Aug-20, Tom Lane wrote: > We do, however, need to preserve the property that installcheck > works after install-world. I'm starting to think that maybe > the 041 test should be hacked to silently skip if it doesn't > find injection_points available. Yeah, I like this option. Injection points have to be explicitly enabled in configure, so skipping that test when injection_points can't be found seems reasonable. This also suggests that EXTRA_INSTALL should have injection_points only when the option is enabled. I've been curious about what exactly this Makefile line achieves:

    export enable_injection_points enable_injection_points

-- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 9:02 AM Robert Haas wrote: > Yes. And the major * 1 + minor convention is used in other places > already, for PG versions, so it might already be familiar to some > people. > I'm wondering why we are indicating that minor versions of the protocol are even a real thing. We should just use integer version numbers. We are on 3. The next one is 4 (the trailing .0 is just historical cruft just like with our 3-digit PostgreSQL version number). v18 libpq-based clients, if they attempt to connect using v4 and fail, will try again using the v3 connection. That will retain status quo behavior when something like a connection pooler doesn't understand the new reality. We can add a libpq option to prevent this auto-downgrade behavior. At some point users will want to use something other than the v3 current tooling supports and will put pressure on those tools to change. In the mean-time our out-of-the-box behavior continues to work using the v3 protocol. Feature detection sounds great, and maybe we want to go there eventually, but everyone understands progressive enhancement represented by version numbering. A given major server version would only support a fixed and unchanging set of protocol versions between 3 and N. On the client, if N = 7 then libpq would be able to choose both 7 and 3 as the version it tries out-of-the-box. We can use a libpq parameter to allow the client to specify something between 4 and 6 (which may fail depending on poolers and what-not). If the chain of servers supports protocol version negotiation then the attempt to connect using 7 can be auto-downgraded to anything between 3 and 6 (saving effort of a failed attempt and establishing a new one.) Leaving the client the option to specify a minimum version of the protocol it can accept. David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 12:42 PM David G. Johnston wrote: > I'm wondering why we are indicating that minor versions of the protocol are > even a real thing. Because that concept is already a part of the existing wire protocol. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Bug Fix]standby may crash when switching-over in certain special cases
On Mon, 19 Aug 2024 16:43:09 +0800 px shi wrote: > Hi, hackers, > > I've recently encountered an issue where a standby crashes when > reconnecting to a new primary after a switchover under certain conditions. > Here's a procedure of the crash scenario: > > > 1) We have three instances: one primary and two standbys (s1 and s2, both > using streaming replication). > > > 2) The primary crashed when the standby’s `flushed_lsn` was at the > beginning of a WAL segment (e.g., `0/2200`). Both s1 and s2 logged the > following: > >``` > >FATAL: could not connect to the primary server... > >LOG: waiting for WAL to become available at 0/22ED > >``` > > > 3) s1 was promoted to the new primary, s1 logged the following: > >``` > >LOG: received promote request > >LOG: redo done at 0/21FFFEE8 > >LOG: selected new timeline ID: 2 > >``` > > > 4) s2's `primary_conninfo` was updated to point to s1, s2 logged the > following: > >``` > >LOG: received SIGHUP, reloading configuration files > >LOG: parameter "primary_conninfo" changed to ... > >``` > > > 5) s2 began replication with s1 and attempted to fetch `0/2200` on > timeline 2, s2 logged the following: > >``` > >LOG: fetching timeline history file for timeline 2 from primary server > >FATAL: could not start WAL streaming: ERROR: requested starting point > 0/2200 on timeline 1 is not this server's history > >DETAIL: This server's history forked from timeline 1 at 0/21E0. > >LOG: started streaming WAL from primary at 0/2200 on timeline 2 > >``` > > > 6) WAL record mismatch caused the walreceiver process to terminate, s2 > logged the following: > >``` > >LOG: invalid contrecord length 10 (expected 213) at 0/21E0 > >FATAL: terminating walreceiver process due to administrator command > >``` > > > 7) s2 then attempted to fetch `0/2100` on timeline 2. 
However, the > startup process failed to open the WAL file before it was created, leading > to a crash: > >``` > >PANIC: could not open file "pg_wal/00020021": No such > file or directory > >LOG: startup process was terminated by signal 6: Aborted > >``` > > > In this scenario, s2 attempts replication twice. First, it starts from > `0/2200` on timeline 2, setting `walrcv->flushedUpto` to `0/2200`. > But when a mismatch occurs, the walreceiver process is terminated. On the > second attempt, replication starts from `0/2100` on timeline 2. The > startup process expects the WAL file to exist because WalRcvFlushRecPtr() > returns `0/2200`, but the file is not found, causing s2's startup > process to crash. > > I think it should check recptr and walrcv->flushedUpto in > RequestXLogStreaming, so I created a patch for it. Is s1 a cascading standby of s2? If instead s1 and s2 are both standbys of the primary server, it is not surprising that s2 has progressed further than s1 when the primary fails. I believe this is a case where you should use pg_rewind. Even if flushedUpto is reset as proposed in your patch, s2 might already have applied a WAL record that s1 has not processed yet, and there would be no guarantee that subsequent applies succeed. Regards, Yugo Nagata -- Yugo Nagata
Re: Partial aggregates pushdown
On Tue, Aug 20, 2024 at 10:07:32AM +0200, Jelte Fennema-Nio wrote: > On Thu, 15 Aug 2024 at 23:12, Bruce Momjian wrote: > > Third, I would like to show a more specific example to clarify what is > > being considered above. If we look at MAX(), we can have FDWs return > > the max for each FDW, and the coordinator can chose the highest value. > > This is the patch 1 listed above. These can return the > > pg_aggregate.aggtranstype data type using the pg_type.typoutput text > > output. > > > > The second case is for something like AVG(), which must return the SUM() > > and COUNT(), and we currently have no way to return multiple text values > > on the wire. For patch 0002, we have the option of creating functions > > that can do this and record them in new pg_attribute columns, or we can > > create a data type with these functions, and assign the data type to > > pg_aggregate.aggtranstype. > > > > Is that accurate? > > It's close to accurate, but not entirely. Patch 1 would actually > solves some AVG cases too, because some AVG implementations use an SQL > array type to store the transtype instead of an internal type. And by > using an SQL array type we *can* send multiple text values on the > wire. See below for a list of those aggregates: Okay, so we can do MAX easily, and AVG if the count can be represented as the same data type as the sum? Is that correct? Our only problem is that something like AVG(interval) can't use an array because arrays have to have the same data type for all array elements, and an interval can't represent a count? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Some questions about PostgreSQL’s design.
On Tue, Aug 20, 2024 at 04:46:54PM +0300, Heikki Linnakangas wrote: > There are pros and cons. With direct I/O, you cannot take advantage of the > kernel page cache anymore, so it becomes important to tune shared_buffers > more precisely. That's a downside: the system requires more tuning. For many > applications, squeezing the last ounce of performance just isn't that > important. There are also scaling issues with the Postgres buffer cache, > which might need to be addressed first. > > With double write buffering, there are also pros and cons. It also requires > careful tuning. And replaying WAL that contains full-page images can be much > faster, because you can write new page images "blindly" without reading the > old pages first. We have WAL prefetching now, which alleviates that, but > it's no panacea. 陈宗志, you might find this blog post helpful: https://momjian.us/main/blogs/pgblog/2017.html#June_5_2017 -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 18:42, David G. Johnston wrote: > v18 libpq-based clients, if they attempt to connect using v4 and fail, will > try again using the v3 connection. That will retain status quo behavior when > something like a connection pooler doesn't understand the new reality. Having connection latency double when connecting to an older Postgres is something I'd very much like to avoid. Reconnecting functionally retains the status quo, but it doesn't retain the expected perf characteristics. By using a minor protocol version we can easily avoid this connection latency issue.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 9:44 AM Robert Haas wrote: > On Tue, Aug 20, 2024 at 12:42 PM David G. Johnston > wrote: > > I'm wondering why we are indicating that minor versions of the protocol > are even a real thing. > > Because that concept is already a part of the existing wire protocol. > > Right... " If the major version requested by the client is not supported by the server, the connection will be rejected ... If the minor version requested by the client is not supported by the server ... the server may either reject the connection or may respond with a NegotiateProtocolVersion message containing the highest minor protocol version which it supports. The client may then choose either to continue with the connection using the specified protocol version or to abort the connection. " So basically my proposal amounted to making every update a "major version update" and changing the behavior surrounding NegotiateProtocolVersion so it applies to major version differences. I'll stand by that change in definition. The current one doesn't seem all that useful anyway, and as we only have a single version, definitely hasn't been materially implemented. Otherwise, at some point a client that knows both v3 and v4 will exist and its connection will be rejected instead of downgraded by a v3-only server even though such a downgrade would be possible. I suspect we'd go ahead and change the rule then - so why not just do so now, while getting rid of the idea that minor versions are a thing. I suppose we could leave minor versions for patch releases of the main server version - which still leaves the first new feature of a release incrementing the major version. That would be incidental to changing how we handle major versions. David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 7:26 AM Jelte Fennema-Nio wrote: > In practical terms I think that means for a minor version bump the > format of the StartupMessage cannot be changed. Changing anything else > is fair game for a minor protocol version bump. I may be in a tiny minority here, but when I combine that statement with your opinion from way upthread that > IMHO, we > should get to a state where protocol minor version bumps are so > low-risk that we can do them whenever we add message types then I don't see this effort ending up in a healthy place or with a happy ecosystem. Pick any IETF-managed protocol, add on the statement "we get to change anything we want in a minor version, and we reserve the right to do it every single year", and imagine the chaos for anyone who doesn't have power over both servers and clients. To me it seems that what you're proposing is indistinguishable from what most other protocols would consider a major version bump; it's just that you (reasonably) want existing clients to be able to negotiate multiple major versions in one round trip. --Jacob
Re: Partial aggregates pushdown
On Tue, 20 Aug 2024 at 18:50, Bruce Momjian wrote: > Okay, so we can do MAX easily, and AVG if the count can be represented > as the same data type as the sum? Is that correct? Our only problem is > that something like AVG(interval) can't use an array because arrays have > to have the same data type for all array elements, and an interval can't > represent a count? Close, but still not completely correct. AVG(bigint) can also not be supported by patch 1, because the sum and the count for that are both stored using an int128. So we'd need an array of int128, and there's currently no int128 SQL type.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 1:02 PM David G. Johnston wrote: > So basically my proposal amounted to making every update a "major version > update" and changing the behavior surrounding NegotiateProtocolVersion so it > applies to major version differences. I'll stand by that change in > definition. The current one doesn't seem all that useful anyway, and as we > only have a single version, definitely hasn't been materially implemented. > Otherwise, at some point a client that knows both v3 and v4 will exist and > its connection will be rejected instead of downgraded by a v3-only server > even though such a downgrade would be possible. I suspect we'd go ahead and > change the rule then - so why not just do so now, while getting rid of the > idea that minor versions are a thing. > > I suppose we could leave minor versions for patch releases of the main server > version - which still leaves the first new feature of a release incrementing > the major version. That would be incidental to changing how we handle major > versions. I don't see how this makes life any better for anyone. At some point in the future we may decide to make a protocol change that is big and breaks a lot of stuff, but the current goals are all to make minor changes that break as little stuff as possible. I think it's appropriate to call the latter a "minor" change and the former a "major" change. If we adopted this proposal, then we could end up in a situation where versions 3 through 17 are all mostly compatible and then version 18 is something totally different. It sounds much better to me to have versions 3.0 through 3.14 and then eventually 4.0. This is also what the person who designed the current protocol version numbering scheme seems to have had in mind, even if the implementation to make it a reality has been a bit lacking. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:03 AM Jacob Champion < jacob.champ...@enterprisedb.com> wrote: > On Tue, Aug 20, 2024 at 7:26 AM Jelte Fennema-Nio > wrote: > > In practical terms I think that means for a minor version bump the > > format of the StartupMessage cannot be changed. Changing anything else > > is fair game for a minor protocol version bump. > > I may be in a tiny minority here, but when I combine that statement > with your opinion from way upthread that > > > IMHO, we > > should get to a state where protocol minor version bumps are so > > low-risk that we can do them whenever we add message types > > To me it seems that what you're proposing is indistinguishable from > what most other protocols would consider a major version bump; it's > just that you (reasonably) want existing clients to be able to > negotiate multiple major versions in one round trip. > > This makes more sense to me - a major version change is one where the server fails to understand the incoming message(s) to the point that it cannot make decisions based upon contents. Framed up this way the two-part versioning works just fine and I concur that PQprotocolVersion() should go ahead and report 30000 + minor (starting at 30002) with 3.0 remaining as-is since apparently negotiation down to 3.0 is possible here if the intermediate and/or final server have such ability. Still, instead of just failing immediately if 30002 is specified and rejected, falling back to trying 3.0 - unless configured to either not do that or to only do 3.0 - is advised to help with the transition. David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 8:24 AM Heikki Linnakangas wrote: > > Put another way: for a middlebox on the connection (which may be > > passively observing, but also maybe actively adding new messages to > > the stream), what is guaranteed to remain the same in the protocol > > across a minor version bump? Hopefully the answer isn't "nothing"? > > I don't think we can give any future guarantees like that. If you have a > middlebox on the connection, it needs to fully understand all the > protocol versions it supports. (GMail has catastrophically unthreaded this conversation for me, so apologies if I start responding out of order) Many protocols provide the list of assumptions that intermediates are allowed to make within a single group of compatible versions, even as the protocol gets extended. If we choose to provide those, then our "major version" gains really useful semantics. See also the brief "criticality" tangent upthread. > That seems a bit tangential to the PQprotocolVersion() function though. > A middlebox like that would probably not use libpq. It's applicable to the use case I was talking about with Jelte. A libpq client dropping down to the socket level is relying on (implicit, currently undocumented/undecided, possibly incorrect!) intermediary guarantees that the protocol provides for a major version. I'm hoping we can provide some, since we haven't broken anything yet. If we decide we can't, then so be it -- things will break either way -- but it's still strange to me that we'd be okay with literally zero forward compatibility and still call that a "minor version". --Jacob
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:31 AM Jacob Champion < jacob.champ...@enterprisedb.com> wrote:
> If we decide we can't, then so be it -- things will
> break either way -- but it's still strange to me that we'd be okay
> with literally zero forward compatibility and still call that a "minor
> version".

Semantic versioning guidelines are not something we are following, especially here. Our protocol version is really just two-part, just like our server major version used to be. We just happen to have named both parts here, unlike with the historical server major version. We have never implemented a protocol change during a minor server version update; the protocol doesn't have (though maybe it needs?) a patch version part.

David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:42 AM David G. Johnston wrote: > Semantic versioning guidelines are not something we are following, especially > here. I understand; the protocol is ours, and we'll do whatever we do in the end. I'm claiming that we can choose to provide semantics, and if we do, those semantics will help people who are not here on the list to defend their use cases. --Jacob
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:46 AM Jacob Champion < jacob.champ...@enterprisedb.com> wrote: > On Tue, Aug 20, 2024 at 10:42 AM David G. Johnston > wrote: > > Semantic versioning guidelines are not something we are following, > especially here. > > I understand; the protocol is ours, and we'll do whatever we do in the > end. I'm claiming that we can choose to provide semantics, and if we > do, those semantics will help people who are not here on the list to > defend their use cases. > > I was mostly just responding to your surprise given that we have a track-record here. I agree that our existing effective policy isn't all that well documented, namely as to when the major component might change, and the fact that the minor component does not represent a "bug fix release". David J.
Re: Improving the notation for ecpg.addons rules
I wrote:
> Michael Paquier writes:
>> It looks like %replace_line expects all its elements to have one space
>> between each token, still this is not enforced with a check across its
>> hardcoded elements?

> Yeah, I was wondering about that. I wouldn't do it exactly like
> that, but with a check that the entry gets matched somewhere.

Here's a patch for that (again based on the other patch series). This did not turn up anything interesting, but it's probably worth keeping.

			regards, tom lane

diff --git a/src/interfaces/ecpg/preproc/parse.pl b/src/interfaces/ecpg/preproc/parse.pl
index 98d44d4bf2..998822ce73 100644
--- a/src/interfaces/ecpg/preproc/parse.pl
+++ b/src/interfaces/ecpg/preproc/parse.pl
@@ -33,7 +33,9 @@ GetOptions(
 
 # These hash tables define additional transformations to apply to
-# grammar rules.
+# grammar rules.  For bug-detection purposes, we count usages of
+# each hash table entry in a second hash table, and verify that
+# all the entries get used.
 
 # Substitutions to apply to tokens whenever they are seen in a rule.
 my %replace_token = (
@@ -44,6 +46,8 @@ my %replace_token = (
 	'IDENT' => 'ecpg_ident',
 	'PARAM' => 'ecpg_param',);
 
+my %replace_token_used;
+
 # This hash can provide a result type to override "void" for nonterminals
 # that need that, or it can specify 'ignore' to cause us to skip the rule
 # for that nonterminal.  (In either case, ecpg.trailer had better provide
@@ -68,6 +72,8 @@ my %replace_types = (
 	'plassign_target' => 'ignore',
 	'plassign_equals' => 'ignore',);
 
+my %replace_types_used;
+
 # This hash provides an "ignore" option or substitute expansion for any
 # rule or rule alternative.  The hash key is the same "concattokens" tag
 # used for lookup in ecpg.addons.
@@ -111,6 +117,8 @@ my %replace_line = (
 		'PREPARE prepared_name prep_type_clause AS PreparableStmt',
 	'var_nameColId' => 'ECPGColId');
 
+my %replace_line_used;
+
 # Declare assorted state variables.
@@ -198,6 +206,30 @@ foreach (keys %addons)
 	die "addon rule $_ was matched multiple times\n" if $addons{$_}{used} > 1;
 }
 
+# Likewise cross-check that entries in our internal hash tables match something.
+foreach (keys %replace_token)
+{
+	die "replace_token entry $_ was never used\n"
+	  if !defined($replace_token_used{$_});
+	# multiple use of a replace_token entry is fine
+}
+
+foreach (keys %replace_types)
+{
+	die "replace_types entry $_ was never used\n"
+	  if !defined($replace_types_used{$_});
+	die "replace_types entry $_ was matched multiple times\n"
+	  if $replace_types_used{$_} > 1;
+}
+
+foreach (keys %replace_line)
+{
+	die "replace_line entry $_ was never used\n"
+	  if !defined($replace_line_used{$_});
+	die "replace_line entry $_ was matched multiple times\n"
+	  if $replace_line_used{$_} > 1;
+}
+
 # Read the backend grammar.
 
 sub main
@@ -400,6 +432,7 @@ sub main
 			# Apply replace_token substitution if we have one.
 			if (exists $replace_token{ $arr[$fieldIndexer] })
 			{
+				$replace_token_used{ $arr[$fieldIndexer] }++;
 				$arr[$fieldIndexer] = $replace_token{ $arr[$fieldIndexer] };
 			}
@@ -425,6 +458,7 @@ sub main
 				&& $replace_types{$non_term_id} eq 'ignore')
 			{
 				# We'll ignore this nonterminal and rule altogether.
+				$replace_types_used{$non_term_id}++;
 				$copymode = 0;
 				next line;
 			}
@@ -451,6 +485,7 @@ sub main
 				  . $replace_types{$non_term_id} . ' '
 				  . $non_term_id;
 				add_to_buffer('types', $tstr);
+				$replace_types_used{$non_term_id}++;
 			}
 
 			# Emit the target part of the rule.
@@ -616,8 +651,10 @@ sub emit_rule
 	# apply replace_line substitution if any
 	my $rep = $replace_line{$tag};
-	if ($rep)
+	if (defined $rep)
 	{
+		$replace_line_used{$tag}++;
+
 		if ($rep eq 'ignore')
 		{
 			return 0;
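[Editor's note: the cross-checking pattern in the patch above (count each hit on a substitution table, then complain about entries that never matched) is a generally useful way to catch dead or mistyped table entries. An illustrative Python sketch with hypothetical table contents, not the parse.pl code:]

```python
from collections import Counter

# A substitution table plus a usage counter, mirroring the
# %replace_token / %replace_token_used pairing in the patch.
replace_token = {"IDENT": "ecpg_ident", "PARAM": "ecpg_param"}
used = Counter()

def substitute(tokens):
    out = []
    for t in tokens:
        if t in replace_token:
            used[t] += 1          # record that this entry matched
            t = replace_token[t]
        out.append(t)
    return out

result = substitute(["SELECT", "IDENT", "FROM", "IDENT"])

# Cross-check: an entry that never matched is probably a typo in the table.
unused = [k for k in replace_token if used[k] == 0]

assert result == ["SELECT", "ecpg_ident", "FROM", "ecpg_ident"]
assert unused == ["PARAM"]  # would trigger a "never used" error
```

The payoff is the same as in the patch: a table entry that silently stops matching (e.g. after a grammar change upstream) is reported instead of rotting unnoticed.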
Re: MultiXact\SLRU buffers configuration
On 2024-Aug-19, Andrey M. Borodin wrote: > > On 5 Jul 2024, at 23:18, Andrey M. Borodin wrote: > > > > Alvaro, please find attached the test. > > I’ve addressed some of Michael’s comments in a nearby thread: removed > > extra load, made injection point names lowercase, fixed some grammar > > issues. > > I’ve made several runs on Github to test stability [0, 1, 2, 4]. CI seems to be stable. OK, I've made some minor adjustments and pushed. CI seemed OK for me; let's see what the BF has to say. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: ANALYZE ONLY
On Tue, Aug 20, 2024 at 1:52 AM Michael Harris wrote: > 2. The existing ANALYZE command has the following structure: > > ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] > > It would be easiest to add ONLY as another option, but that > doesn't look quite > right to me - surely the ONLY should be attached to the table name? > An alternative would be: > > ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ] > > Any feedback or advice would be great. I like trying to use ONLY somehow. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Partial aggregates pushdown
On Tue, Aug 20, 2024 at 07:03:56PM +0200, Jelte Fennema-Nio wrote: > On Tue, 20 Aug 2024 at 18:50, Bruce Momjian wrote: > > Okay, so we can do MAX easily, and AVG if the count can be represented > > as the same data type as the sum? Is that correct? Our only problem is > > that something like AVG(interval) can't use an array because arrays have > > to have the same data type for all array elements, and an interval can't > > represent a count? > > Close, but still not completely correct. AVG(bigint) can also not be > supported by patch 1, because the sum and the count for that both > stored using an int128. So we'd need an array of int128, and there's > currently no int128 SQL type. Okay. Have we considered having the FDW return a record: SELECT (oid, relname) FROM pg_class LIMIT 1; row - (2619,pg_statistic) SELECT pg_typeof((oid, relname)) FROM pg_class LIMIT 1; pg_typeof --- record SELECT pg_typeof(oid) FROM pg_class LIMIT 1; pg_typeof --- oid SELECT pg_typeof(relname) FROM pg_class LIMIT 1; pg_typeof --- name -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
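[Editor's note: to make the partial-aggregate idea in this thread concrete: for AVG, the partial state each remote server ships back is a (sum, count) pair, and the requesting server combines those pairs. A toy Python sketch of that combine step, not the FDW mechanism itself:]

```python
def avg_partial(values):
    # Partial state for AVG: a (sum, count) pair, analogous to the
    # record/composite transmission being discussed in the thread.
    return (sum(values), len(values))

def avg_combine(states):
    # Combine partial states from all shards into the final average.
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

shards = [[1, 2, 3], [4, 5]]
states = [avg_partial(s) for s in shards]
assert states == [(6, 3), (9, 2)]
assert avg_combine(states) == 3.0
```

This also illustrates why AVG(bigint) is awkward for the array-based approach mentioned above: the sum and count components would both need an int128 element type, which has no SQL-level representation, whereas a record can in principle carry differently typed fields.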
Re: Use read streams in pg_visibility
On Tue, Aug 13, 2024 at 03:22:27PM +0300, Nazir Bilal Yavuz wrote: > 2- collect_corrupt_items() > > This one is more complicated. The read stream callback function loops > until it finds a suitable block to read. So, if the callback returns > an InvalidBlockNumber; it means that the stream processed all possible > blocks and the stream should be finished. There is ~3% timing > improvement with this change. I started the server with the default > settings and created a 6 GB table. Then run 100 times > pg_check_visible() by clearing the OS cache between each run. > > The downside of this approach is there are too many "vmbuffer is valid > but BufferGetBlockNumber(*vmbuffer) is not equal to mapBlock, so > vmbuffer needs to be read again" cases in the read stream version (700 > vs 20 for the 6 GB table). This is caused by the callback function of > the read stream reading a new vmbuf while getting next block numbers. > So, vmbuf is wrong when we are checking visibility map bits that might > have changed while we were acquiring the page lock. Did the test that found 700 "read again" use a different procedure than the one shared as setup.sh down-thread? Using that script alone, none of the vm bits are set, so I'd not expect any heap page reads. The "vmbuffer needs to be read again" regression fits what I would expect the v1 patch to do with a table having many vm bits set. In general, I think the fix is to keep two heap Buffer vars, so the callback can work on one vmbuffer while collect_corrupt_items() works on another vmbuffer. Much of the time, they'll be the same buffer. It could be as simple as that, but you could consider further optimizations like these: - Use ReadRecentBuffer() or a similar technique, to avoid a buffer mapping lookup when the other Buffer var already has the block you want. - [probably not worth it] Add APIs for pg_visibility.c to tell read_stream.c to stop calling the ReadStreamBlockNumberCB for awhile. 
This could help if nonzero vm bits are infrequent, causing us to visit few heap blocks per vm block. For example, if one block out of every 33000 is all-visible, every heap block we visit has a different vmbuffer. It's likely not optimal for the callback to pin and unpin 20 vmbuffers, then have collect_corrupt_items() pin and unpin the same 20 vmbuffers. pg_visibility could notice this trend and request a stop of the callbacks until more of the heap block work completes. If pg_visibility is going to be the only place in the code with this use case, it's probably not worth carrying the extra API just for pg_visibility. However, if we get a stronger use case later, pg_visibility could be another beneficiary. > +/* > + * Callback function to get next block for read stream object used in > + * collect_visibility_data() function. > + */ > +static BlockNumber > +collect_visibility_data_read_stream_next_block(ReadStream *stream, > + >void *callback_private_data, > + >void *per_buffer_data) > +{ > + struct collect_visibility_data_read_stream_private *p = > callback_private_data; > + > + if (p->blocknum < p->nblocks) > + return p->blocknum++; > + > + return InvalidBlockNumber; This is the third callback that just iterates over a block range, after pg_prewarm_read_stream_next_block() and copy_storage_using_buffer_read_stream_next_block(). While not a big problem, I think it's time to have a general-use callback for block range scans. The quantity of duplicate code is low, but the existing function names are long and less informative than a behavior-based name. 
> +static BlockNumber > +collect_corrupt_items_read_stream_next_block(ReadStream *stream, > + > void *callback_private_data, > + > void *per_buffer_data) > +{ > + struct collect_corrupt_items_read_stream_private *p = > callback_private_data; > + > + for (; p->blocknum < p->nblocks; p->blocknum++) > + { > + boolcheck_frozen = false; > + boolcheck_visible = false; > + > + if (p->all_frozen && VM_ALL_FROZEN(p->rel, p->blocknum, > p->vmbuffer)) > + check_frozen = true; > + if (p->all_visible && VM_ALL_VISIBLE(p->rel, p->blocknum, > p->vmbuffer)) > + check_visible = true; > + if (!check_visible && !check_frozen) > + continue; If a vm has zero bits set, this loop will scan the entire vm before returning. Hence, this loop deserves a CHECK_FOR_INTERRUPTS() or a comment about how VM_ALL_FROZEN/VM_ALL_VISIBLE reaches a CHECK_FOR_INTERRUPTS(). > @@ -687,6 +734,20 @@ collect_corrupt_i
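[Editor's note: the "block range scan" callback contract Noah describes (return successive block numbers, then InvalidBlockNumber to end the stream) can be sketched briefly. Illustrative Python, not the read_stream.c API:]

```python
INVALID_BLOCK_NUMBER = 0xFFFFFFFF  # sentinel, mirroring InvalidBlockNumber

def make_block_range_callback(start, nblocks):
    """Return a callback that yields successive block numbers in
    [start, start + nblocks), then the sentinel.  This is the shared
    shape of the three per-caller callbacks mentioned above."""
    state = {"next": start}

    def callback():
        if state["next"] < start + nblocks:
            b = state["next"]
            state["next"] += 1
            return b
        return INVALID_BLOCK_NUMBER

    return callback

cb = make_block_range_callback(0, 3)
assert [cb(), cb(), cb(), cb()] == [0, 1, 2, INVALID_BLOCK_NUMBER]
```

A single general-purpose callback of this shape would replace the duplicated per-caller versions, as suggested in the review.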
Re: Adding clarification to description of IPC wait events XactGroupUpdate and ProcArrayGroupUpdate
On Tue, Aug 20, 2024 at 02:12:25PM +0530, Amit Kapila wrote: > +1 for Nathan's version. It is quite close to the previous version, > for which we haven't heard any complaints since they were introduced. Committed, thanks. -- nathan
Re: type cache cleanup improvements
On Mon, Aug 5, 2024 at 4:16 AM Alexander Korotkov wrote: > I've revised the patchset. First of all, I've re-ordered the patches. > > 0001-0002 (former 0002-0003) > Comprises hash_search_with_hash_value() function and its application > to avoid full hash iteration in InvalidateAttoptCacheCallback() and > TypeCacheTypCallback(). I think this is quite a straightforward > optimization without negative side effects. I've revised comments, > commit message and did some code beautification. I'm going to push > this if no objections. > > 0003 (former 0001) > I've revised this patch. I think the main concern expressed in the > thread about this patch is that we don't have an invalidation mechanism > for the relid => typid map. Finally, due to oid wraparound, the same relids > could get reused. That could lead to invalid entries in the map about > existing relids and typeids. This is rather messy, but I don't think > this could cause a material bug. The map's items are used only for > cache invalidation. Extra invalidation doesn't cause a bug. If a type > with the same relid gets cached, then the corresponding map item will be > overridden, so no missing invalidation. However, I see the following > reasons for keeping the relid => typid map in a consistent state. > > 1) As the main use-case for this optimization is a flood of temporary > tables, it would be nice not to let the relid => typid map bloat in this > case. I see that TypeCacheHash would get bloated, because its entries > are never deleted. However, I would prefer not to make this situation > even worse. > 2) In the future we may find some more use-cases for the relid => typid map > besides cache invalidation. Keeping it in a consistent state could be > an advantage then. > > In the attached patch, I'm keeping the relid => typid map when the > corresponding typentry has either TCFLAGS_HAVE_PG_TYPE_DATA, or > TCFLAGS_OPERATOR_FLAGS, or tupdesc. Thus, when a temporary table gets > deleted, we invalidate the map item.
> > It will be also nice to get rid of iteration over all the cached > domain types in TypeCacheRelCallback(). However, this typically > shouldn't be a problem since domain types are less tended to bloat. > Domain types are created manually, unlike composite types which are > automatically created for every temporary table. We will probably > need to optimize this in future, but I don't feel this to be necessary > in present patch. > > I think the revised 0003 requires review. The rebased remaining patch is attached. -- Regards, Alexander Korotkov Supabase v8-0001-Avoid-looping-over-all-type-cache-entries-in-Type.patch Description: Binary data
Re: Restart pg_usleep when interrupted
> As it looks like we have a consensus that reducing the number of interrupts also
> makes sense, I just provided a rebase version of the 1 Hz version (see [0], that
> also makes clear in the doc that the new field might show slightly old values).

That makes sense. However, I suspect the "1 Hz" code will no longer be needed if CF entry 5118 [1], mentioned by Thomas [2] a few days back, goes through. Maybe this extra work can be removed if [1] goes through. What do you think?

With regards to CF 5118 and what it means to the current discussion, below are my thoughts.

I tested with both CF 5118 [1] and the cost-delay tracking patch. With that in place, pg_usleep is able to sleep for the full requested duration, as mentioned by Thomas [3]. This is because certain interrupts like Parallel Message and others are no longer signaled by SIGUSR1, but with latches.

From this discussion, there is a desire for a sleep function that:
1/ Sleeps for the full duration of the requested time
2/ Continues to handle important interrupts during the sleep

While something like CF 5118 will take care of point #1, it will not deal with #2. Also, the v11 [4] patch does not do #2 either. So I think in the sleep loop, we need a C_F_I call. The same type of loop can also be used to call WaitForSingleObject. If CF 5118 gets committed, we will still need a similar loop that calls C_F_I, but the function will need to call WaitLatchUs [5].

Thoughts?

-- Sami

[1] https://commitfest.postgresql.org/49/5118/
[2] https://www.postgresql.org/message-id/CA%2BhUKG%2Bf-nEc_SowDLW1JMUa6Of5sCK-JZ%3Dv-KhL1xgXk83fiw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CA%2BhUKGKpo3fM%3DrnfdxHqt%2BjNGh_zUNcL1ob4hMsb%3DjFfKn9Aag%40mail.gmail.com
[4] https://www.postgresql.org/message-id/e3297b5e-0b33-4f45-afcd-4b00ba0b3547%40gmail.com
[5] https://www.postgresql.org/message-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=lufwpds...@mail.gmail.com
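[Editor's note: the sleep-until-deadline loop with an interrupt check described above can be sketched as follows. Illustrative Python with an injectable clock so the example runs instantly; handle_interrupts stands in for CHECK_FOR_INTERRUPTS, and none of the names come from the actual patch:]

```python
def sleep_full_duration(duration, now, sleep_once, handle_interrupts):
    """Sleep for `duration` in total, resuming after interruptions.

    `now()` returns the current time, `sleep_once(t)` sleeps up to t but
    may return early if interrupted, and `handle_interrupts()` services
    pending interrupts each time around the loop."""
    deadline = now() + duration
    while True:
        handle_interrupts()
        remaining = deadline - now()
        if remaining <= 0:
            break
        sleep_once(remaining)

# Simulated clock: every sleep_once call is "interrupted" after 1 unit,
# so a 3-unit sleep takes three resumed legs.
t = [0.0]
calls = []
def now(): return t[0]
def sleep_once(d):
    step = min(d, 1.0)
    calls.append(step)
    t[0] += step
def handle_interrupts(): pass

sleep_full_duration(3.0, now, sleep_once, handle_interrupts)
assert t[0] == 3.0
assert calls == [1.0, 1.0, 1.0]
```

Computing the remaining time from a fixed deadline (rather than re-sleeping the original duration) is what keeps the total sleep equal to the request, no matter how many interrupts arrive.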
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 19:02, David G. Johnston wrote:
> So basically my proposal amounted to making every update a "major version
> update" and changing the behavior surrounding NegotiateProtocolVersion so it
> applies to major version differences. I'll stand by that change in
> definition. The current one doesn't seem all that useful anyway, and as we
> only have a single version, definitely hasn't been materially implemented.
> Otherwise, at some point a client that knows both v3 and v4 will exist and
> its connection will be rejected instead of downgraded by a v3-only server
> even though such a downgrade would be possible. I suspect we'd go ahead and
> change the rule then - so why not just do so now, while getting rid of the
> idea that minor versions are a thing.

If we decide never to change the format of the StartupMessage again (which may be an okay thing to decide), then I agree it would make sense to update the existing supported servers ASAP to be able to send back a NegotiateProtocolVersion message if they receive a 4.x StartupMessage and the server only supports up to 3.x.

However, even if we do that, I don't think it makes sense to start using the 4.0 version straight away, because many older postgres servers would still throw an error when receiving the 4.x request. By using a 3.x version we are able to avoid those errors in the existing ecosystem. Basically, I think we should probably wait ~5 years again until we actually use a 4.0 version. I.e., I don't see serious benefits to using 4.0. The main benefit you seem to describe is: "it's theoretically cleaner to use major version bumps". And there is a serious downside: "seriously breaking the existing ecosystem".

> I suppose we could leave minor versions for patch releases of the main server
> version - which still leaves the first new feature of a release incrementing
> the major version. That would be incidental to changing how we handle major
> versions.
Having a Postgres server patch update change the protocol version that the server supports sounds pretty scary to me.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 19:31, Jacob Champion wrote: > It's applicable to the use case I was talking about with Jelte. A > libpq client dropping down to the socket level is relying on > (implicit, currently undocumented/undecided, possibly incorrect!) > intermediary guarantees that the protocol provides for a major > version. I'm hoping we can provide some, since we haven't broken > anything yet. If we decide we can't, then so be it -- things will > break either way -- but it's still strange to me that we'd be okay > with literally zero forward compatibility and still call that a "minor > version". I think one compatibility guarantee that we might want to uphold is something like the following: After completing the initial connection setup, a server should only send new message types or new fields on existing message types when the client has specifically advertised support for that message type in one of two ways: 1. By configuring a specific protocol parameter 2. By sending a new message type or using a new field that implicitly advertises support for the new message type/fields. In this case the message should be of a request-response style, the server cannot assume that after the request-response communication happened this new message type is still supported by the client. The reasoning for this was discussed a while back upthread: This would be to allow a connection pooler (like PgBouncer) to have a pool of the highest minor version that the pooler supports e.g 3.8, but then it could still hand out these connections to clients that connected to the pooler using a lower version. Without having these clients receive messages that they do not support. Another way of describing this guarantee: If a client connects using 3.8 and configures no protocol parameters, the client needs to handle anything 3.8 specific that the handshake requires (such as longer cancel token). 
But then after that handshake it can choose to send only 3.0 packets and expect to receive only 3.0 packets back.
Re: Taking into account syncrep position in flush_lsn reported by apply worker
On 14/08/2024 16:54, Arseny Sher wrote:
> On 8/13/24 06:35, Amit Kapila wrote:
>> On Mon, Aug 12, 2024 at 3:43 PM Arseny Sher wrote:
>>> Sorry for the poor formatting of the message above, this should be better:
>>>
>>> Hey. Currently synchronous_commit is disabled for the logical apply worker
>>> on the ground that the reported flush_lsn includes only locally flushed
>>> data, so the slot (publisher) preserves everything higher than this, and
>>> so in case of subscriber restart no data is lost. However, imagine that
>>> the subscriber is made highly available by a standby to which synchronous
>>> replication is enabled. Then the reported flush_lsn is ignorant of this
>>> synchronous replication progress, and in case of failover data loss may
>>> occur if the subscriber managed to ack flush_lsn ahead of syncrep.
>>
>> Won't the same be achieved by enabling the synchronous_commit
>> parameter for a subscription?
>
> Nope, because it would force WAL flush and wait for replication to the
> standby in the apply worker, slowing it down. The logic missing currently
> is not to wait for the synchronous commit, but still mind its progress in
> the flush_lsn reporting.

I think this patch makes sense. I'm not sure we've actually made any promises on it, but it feels wrong that the slot's LSN might be advanced past the LSN that has been acknowledged by the replica, if synchronous replication is configured. I see little downside in making that promise.

+	/*
+	 * If synchronous replication is configured, take into account its position.
+	 */
+	if (SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
+	{
+		LWLockAcquire(SyncRepLock, LW_SHARED);
+		local_flush = Min(local_flush, WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH]);
+		LWLockRelease(SyncRepLock);
+	}
+

Should probably use the SyncStandbysDefined() macro here. Or check WalSndCtl->sync_standbys_defined like SyncRepWaitForLSN() does; not sure which would be more appropriate here.

Should the synchronous_commit setting also affect this?
Please also check if the docs need to be updated, or if a paragraph should be added somewhere on this behavior. A TAP test case would be nice. Not sure how complicated it will be, but if not too complicated, it'd be nice to include it in check-world. -- Heikki Linnakangas Neon (https://neon.tech)
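[Editor's note: the core of the proposed behavior is clamping the reported flush position with Min(). A toy sketch of that invariant, in illustrative Python rather than the patch's C:]

```python
def reported_flush_lsn(local_flush, syncrep_flush, syncrep_enabled):
    # Report the smaller of the locally flushed LSN and the LSN
    # acknowledged by the synchronous standby, so the publisher's slot
    # never advances past what the standby has durably received.
    if syncrep_enabled:
        return min(local_flush, syncrep_flush)
    return local_flush

# Standby lags behind the local flush: report the standby's position.
assert reported_flush_lsn(0x5000, 0x4000, True) == 0x4000
# No synchronous replication configured: report the local flush as before.
assert reported_flush_lsn(0x5000, 0x4000, False) == 0x5000
```

If the subscriber fails over to its standby, the publisher's slot then still retains everything past the standby's acknowledged position, which is exactly the data-loss window the patch closes.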
Re: Remove dependence on integer wrapping
I've combined all the current proposed changes into one patch. I've also introduced signed versions of the negation functions into int.h to avoid relying on multiplication.

-- 
nathan

From 2364ba4028f879a22b9f69f999aee3ea9c013ec0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 20 Aug 2024 16:12:39 -0500
Subject: [PATCH v26 1/1] Remove dependence on -fwrapv semantics in more places.

---
 src/backend/nodes/bitmapset.c          |  2 +-
 src/backend/utils/adt/date.c           |  6 +++-
 src/backend/utils/adt/formatting.c     | 28 +--
 src/backend/utils/hash/dynahash.c      |  6 ++--
 src/include/common/int.h               | 48 ++
 src/test/regress/expected/date.out     |  2 ++
 src/test/regress/expected/horology.out |  4 +++
 src/test/regress/expected/union.out    | 43 +++
 src/test/regress/sql/date.sql          |  1 +
 src/test/regress/sql/horology.sql      |  2 ++
 src/test/regress/sql/union.sql         |  9 +
 11 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index cd05c642b0..d37a997c0e 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -67,7 +67,7 @@
  * we get zero.
  *--
  */
-#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))
+#define RIGHTMOST_ONE(x) ((bitmapword) (x) & (~((bitmapword) (x)) + 1))
 
 #define HAS_MULTIPLE_ONES(x)	((bitmapword) RIGHTMOST_ONE(x) != (x))
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 9c854e0e5c..0782e84776 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -257,7 +257,11 @@ make_date(PG_FUNCTION_ARGS)
 	if (tm.tm_year < 0)
 	{
 		bc = true;
-		tm.tm_year = -tm.tm_year;
+		if (pg_neg_s32_overflow(tm.tm_year, &tm.tm_year))
+			ereport(ERROR,
+					(errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
+					 errmsg("date field value out of range: %d-%02d-%02d",
+							tm.tm_year, tm.tm_mon, tm.tm_mday)));
 	}
 
 	dterr = ValidateDate(DTK_DATE_M, false, false, bc, &tm);
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 68069fcfd3..76bb3a79b5 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -77,6 +77,7 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/int.h"
 #include "common/unicode_case.h"
 #include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
@@ -3809,7 +3810,12 @@ DCH_from_char(FormatNode *node, const char *in, TmFromChar *out,
 					ereturn(escontext,,
 							(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
 							 errmsg("invalid input string for \"Y,YYY\"")));
-				years += (millennia * 1000);
+				if (pg_mul_s32_overflow(millennia, 1000, &millennia) ||
+					pg_add_s32_overflow(years, millennia, &years))
+					ereturn(escontext,,
+							(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+							 errmsg("invalid input string for \"Y,YYY\"")));
+
 				if (!from_char_set_int(&out->year, years, n, escontext))
 					return;
 				out->yysz = 4;
@@ -4797,11 +4803,27 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std,
 		if (tmfc.bc)
 			tmfc.cc = -tmfc.cc;
 		if (tmfc.cc >= 0)
+		{
 			/* +1 because 21st century started in 2001 */
-			tm->tm_year = (tmfc.cc - 1) * 100 + 1;
+			/* tm->tm_year = (tmfc.cc - 1) * 100 + 1 */
+			if (pg_mul_s32_overflow(tmfc.cc - 1, 100, &tm->tm_year) ||
+				pg_add_s32_overflow(tm->tm_year, 1, &tm->tm_year))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+						 errmsg("date out of range: \"%s\"",
+								text_to_cstring(date_txt))));
+		}
 		else
+		{
 			/* +1 because year == 599 is 600 BC */
+			/* tm->tm_year = tmfc.cc * 10
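[Editor's note: the reason a dedicated overflow-checking negation like pg_neg_s32_overflow is needed at all is that INT32_MIN is the one int32 value whose negation is not representable. A sketch that simulates the int32 contract in Python, where integers don't overflow, so the bound must be checked explicitly; the function name and return shape here are illustrative, not the int.h API:]

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def neg_s32_overflow(a):
    """Return (overflowed, result) for 32-bit signed negation.

    Only INT32_MIN has no 32-bit negation: -(-2**31) == 2**31,
    which exceeds INT32_MAX by one in two's complement."""
    if a == INT32_MIN:
        return True, None
    return False, -a

assert neg_s32_overflow(5) == (False, -5)
assert neg_s32_overflow(INT32_MIN) == (True, None)
assert neg_s32_overflow(INT32_MAX) == (False, -INT32_MAX)
```

This is exactly the case the make_date() hunk above guards: a year of INT32_MIN would previously rely on wraparound when negated, whereas the patched code reports a datetime field overflow instead.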
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 12:55 PM Jelte Fennema-Nio wrote: > Another way of describing this guarantee: If a client connects using > 3.8 and configures no protocol parameters, the client needs to handle > anything 3.8 specific that the handshake requires (such as longer > cancel token). But then after that handshake it can choose to send > only 3.0 packets and expect to receive only 3.0 packets back. That guarantee (if adopted) would also make it possible for my use case to proceed correctly, since a libpq client can still speak 3.0 packets on the socket safely. But in that case, PQprotocolVersion should keep returning 3, because there's an explicit reason to care about the major version by itself. --Jacob
configure failures on chipmunk
Hi. I noticed chipmunk is failing in configure: checking whether the C compiler works... no configure: error: in `/home/pgbfarm/buildroot/HEAD/pgsql.build': configure: error: C compiler cannot create executables You may want to give it a look. Thanks! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)" Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bind
On Thu, 25 Jul 2024 at 08:45, Anthonin Bonnefoy wrote: > +1 keeping this as a separate command and using \bind_named. \bind has > a different behaviour as it also parses the query so keeping them as > separate commands would probably avoid some confusion. +1 on naming it \bind_named @Anthonin are you planning to update the patch accordingly?
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 23:48, Jacob Champion wrote: > That guarantee (if adopted) would also make it possible for my use > case to proceed correctly, since a libpq client can still speak 3.0 > packets on the socket safely. Not necessarily (at least not how I defined it). If a protocol parameter has been configured (possibly done by default by libpq), then that might not be the case anymore. So, you'd also need to compare the current values of the protocol parameters to their expected value. > But in that case, PQprotocolVersion > should keep returning 3, because there's an explicit reason to care > about the major version by itself. I agree that there's a reason to care about the major version then, but it might still be better to fail hard for people that care about protocol details. Because when writing the code that checked PQprotocolVersion there was no definition at all of what could change during a minor version bump. So there was no possibility to reason for that person if they should be notified of a minor version bump or not. So notifying the author of the code by failing the check would be erring on the safe side, because maybe they would need to update their code that depends on the protocol details. If not, and they then realize that our guarantee is strong enough for their use case, then they could replace their check with something like: PQprotocolVersion() >= 3 && PQprotocolVersion() < 4 To be clear, the argument for changing PQprotocolVersion() is definitely less strong if we'd provide such a guarantee. But I don't think the problem is completely gone either.
Re: Remaining reference to _PG_fini() in ldap_password_func
On Tue, Aug 20, 2024 at 09:05:54AM -0400, Tom Lane wrote: > Heikki Linnakangas writes: >> +1. There's also a prototype for _PG_fini() in fmgr.h, let's remove that >> too. > > +1. I think the fmgr.h prototype may have been left there > deliberately to avoid breaking extension code, but it's past > time to clean it up. Okay, I've removed all that then. Thanks for the feedback. -- Michael signature.asc Description: PGP signature
Re: Vacuum statistics
Hi! Thank you very much for your review! Sorry for my late response I was overwhelmed by tasks. On 16.08.2024 14:12, jian he wrote: On Thu, Aug 15, 2024 at 4:49 PM Alena Rybakina wrote: Hi! I've applied all the v5 patches. 0002 and 0003 have white space errors. + +Number of times blocks of this index were already found +in the buffer cache by vacuum operations, so that a read was not necessary +(this only includes hits in the +&project; buffer cache, not the operating system's file system cache) + +Number of times blocks of this table were already found +in the buffer cache by vacuum operations, so that a read was not necessary +(this only includes hits in the +&project; buffer cache, not the operating system's file system cache) + "&project;" represents a sgml file placeholder name as "project" and puts all the content of "project.sgml" to system-views.sgml. but you don't have "project.sgml". you may check doc/src/sgml/filelist.sgml or doc/src/sgml/ref/allfiles.sgml for usage of "&place_holder;". so you can change it to "project", otherwise doc cannot build. src/backend/commands/dbcommands.c we have: /* * If built with appropriate switch, whine when regression-testing * conventions for database names are violated. But don't complain during * initdb. */ #ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS if (IsUnderPostmaster && strstr(dbname, "regression") == NULL) elog(WARNING, "databases created by regression test cases should have names including \"regression\""); #endif so in src/test/regress/sql/vacuum_tables_and_db_statistics.sql you need to change dbname: CREATE DATABASE statistic_vacuum_database; CREATE DATABASE statistic_vacuum_database1; + + The view pg_stat_vacuum_indexes will contain + one row for each index in the current database (including TOAST + table indexes), showing statistics about vacuuming that specific index. 
+ TOAST should TOAST + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, "return type must be a row type"); maybe change to ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("return type must be a row type"))); Later I found out "InitMaterializedSRF(fcinfo, 0);" already did all the work. Much of the code can be gotten rid of; please check the attached. I agree with your suggestions for improving the code. I will add this in the next version of the patch.

#define EXTVACHEAPSTAT_COLUMNS 27
#define EXTVACIDXSTAT_COLUMNS 19
#define EXTVACDBSTAT_COLUMNS 15
#define EXTVACSTAT_COLUMNS Max(EXTVACHEAPSTAT_COLUMNS, EXTVACIDXSTAT_COLUMNS)
static Oid CurrentDatabaseId = InvalidOid;

We already defined MyDatabaseId in src/include/miscadmin.h, so why do we need "static Oid CurrentDatabaseId = InvalidOid;"? Also, src/backend/utils/adt/pgstatfuncs.c already included "miscadmin.h". Hmm, Tom Lane added "miscadmin.h", or I didn't notice something. Could you point this out, please? We used CurrentDatabaseId to output statistics on tables from another database, so we need to replace it with a different default value. But I want to rewrite this patch to display table statistics only for the current database, that is, this part will be removed in the future. In my opinion, it would be more correct. In the following code, one function has 2 return statements?

/*
 * Get the vacuum statistics for the heap tables.
 */
Datum
pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);
    PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the indexes.
 */
Datum
pg_stat_vacuum_indexes(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_INDEX, EXTVACIDXSTAT_COLUMNS);
    PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the database. 
 */
Datum
pg_stat_vacuum_database(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_DB, EXTVACDBSTAT_COLUMNS);
    PG_RETURN_NULL();
}

You are right - the second return is superfluous. I'll fix it. in pg_stats_vacuum:

if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
{
    Oid relid = PG_GETARG_OID(1);

    /* Load table statistics for specified database. */
    if (OidIsValid(relid))
    {
        tabentry = fetch_dbstat_tabentry(dbid, relid);
        if (tabentry == NULL || tabentry->vacuum_ext.type != type)
            /* Table don't exists or isn't an heap relation. */
            PG_RETURN_NULL();

        tuplestore_put_for_relation(relid, tupstore, tupdesc, tabentry, ncolumns);
    }
Re: Vacuum statistics
We check it there: "tabentry->vacuum_ext.type != type". Or were you talking about something else? On 19.08.2024 12:32, jian he wrote: in pg_stats_vacuum

if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
{
    Oid relid = PG_GETARG_OID(1);

    /* Load table statistics for specified database. */
    if (OidIsValid(relid))
    {
        tabentry = fetch_dbstat_tabentry(dbid, relid);
        if (tabentry == NULL || tabentry->vacuum_ext.type != type)
            /* Table don't exists or isn't an heap relation. */
            PG_RETURN_NULL();

        tuplestore_put_for_relation(relid, rsinfo, tabentry);
    }
    else
    {
    }

So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables, it seems you didn't check "relid"'s relkind; you may need to use get_rel_relkind. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
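As a quick SQL-level illustration of the relkind distinction being pointed at here (the C code would use get_rel_relkind(); the values below are the standard pg_class.relkind codes — 'r' for an ordinary table, 'i' for an index, with 'p'/'I' for the partitioned variants):

```sql
-- A table-statistics function should reject OIDs whose relkind is not a
-- table, and an index-statistics function should reject non-index OIDs.
SELECT relname, relkind FROM pg_class WHERE oid = 'pg_class'::regclass;
--  relname  | relkind
--  pg_class | r

SELECT relname, relkind FROM pg_class WHERE oid = 'pg_class_oid_index'::regclass;
--       relname       | relkind
--  pg_class_oid_index | i
```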
Re: Vacuum statistics
I think the count above also includes the system tables of the database, but I'll double-check it. Thank you for your review! On 19.08.2024 19:28, Ilia Evdokimov wrote: Are you certain that all tables are included in `pg_stat_vacuum_tables`? I'm asking because of the following:

SELECT count(*) FROM pg_stat_all_tables;
 count
-------
   108
(1 row)

SELECT count(*) FROM pg_stat_vacuum_tables;
 count
-------
    20
(1 row)

-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
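One way to verify that hypothesis (assuming the patch's pg_stat_vacuum_tables view exposes schemaname and relname columns the way pg_stat_all_tables does) is to diff the two views and see whether the missing rows are all catalog relations:

```sql
-- List relations present in pg_stat_all_tables but absent from the
-- patch's pg_stat_vacuum_tables; if the system-tables hypothesis is
-- right, these should all be pg_catalog / information_schema relations.
SELECT schemaname, relname
FROM pg_stat_all_tables
EXCEPT
SELECT schemaname, relname
FROM pg_stat_vacuum_tables
ORDER BY 1, 2;
```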
Re: configure failures on chipmunk
On Wed, Aug 21, 2024 at 9:48 AM Alvaro Herrera wrote: > Hi. I noticed chipmunk is failing in configure: > > checking whether the C compiler works... no > configure: error: in `/home/pgbfarm/buildroot/HEAD/pgsql.build': > configure: error: C compiler cannot create executables One of the runs shows: ./configure: line 4202: 28268 Bus error $CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5
Re: ANALYZE ONLY
On Wed, 21 Aug 2024 at 06:41, Robert Haas wrote: > I like trying to use ONLY somehow. Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table; or as a table modifier like gram.y's extended_relation_expr? Making it a command option means that the option would apply to all tables listed, whereas if it was more like an extended_relation_expr, the option would be applied per table listed in the command. 1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its partitions but get stats on ptab2 and stats on its partitions too. 2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2 without doing that on any of their partitions. Whereas: "ANALYZE (ONLY) ptab, ptab2;" would always give you the behaviour of #2. If we did it as a per-table option, then we'd need to consider what should happen if someone did: "VACUUM ONLY parttab;". Probably silently doing nothing wouldn't be good. Maybe a warning, akin to what's done in: postgres=# analyze (skip_locked) a; WARNING: skipping analyze of "a" --- lock not available David
Re: ANALYZE ONLY
David Rowley writes: > On Wed, 21 Aug 2024 at 06:41, Robert Haas wrote: >> I like trying to use ONLY somehow. > Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table; > or as a table modifier like gram.y's extended_relation_expr? > Making it a command option means that the option would apply to all > tables listed, whereas if it was more like an extended_relation_expr, > the option would be applied per table listed in the command. > 1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its > partitions but get stats on ptab2 and stats on its partitions too. > 2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2 > without doing that on any of their partitions. FWIW, I think that's the right approach, for consistency with the way that ONLY works in DML. regards, tom lane
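For reference, a sketch of the existing per-table ONLY behavior in DML and queries that this consistency argument appeals to (table names here are illustrative):

```sql
CREATE TABLE ptab (a int) PARTITION BY LIST (a);
CREATE TABLE ptab_p1 PARTITION OF ptab FOR VALUES IN (1);
INSERT INTO ptab VALUES (1);

SELECT count(*) FROM ptab;       -- 1: the parent's result includes its partitions
SELECT count(*) FROM ONLY ptab;  -- 0: the partitioned parent itself stores no rows

UPDATE ONLY ptab SET a = a;      -- UPDATE 0: partitions are skipped
```

Under the proposed syntax, "ANALYZE ONLY ptab;" would follow the same per-table convention: gather statistics on ptab itself without recursing into its partitions.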
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
On Tue, Aug 20, 2024 at 12:10:08PM -0400, Tom Lane wrote: > ... which would also imply writing documentation and so forth, > and it'd mean that injection_points starts to show up in end-user > installations. (That would happen with the alternative choice of > hacking install-world to include src/test/modules/injection_points, > too.) While you could argue that that'd be helpful for extension > authors who'd like to use injection_points in their own tests, I'm > not sure that it's where we want to go with that module. It's only > meant as test scaffolding, and I don't think we've analyzed the > implications of some naive user installing it. The original line of thought here is that if I were to write a test that relies on injection points for a critical bug fix, then I'd rather not have to worry about the hassle of maintaining user-facing documentation to get the work done. The second is that we should be able to tweak the module or even break its ABI as much as we want in stable branches to accommodate the tests, which is what a test module is good for. The same ABI argument kind of stands as well for the backend portion, but we'll see where it goes. > We do, however, need to preserve the property that installcheck > works after install-world. I'm starting to think that maybe > the 041 test should be hacked to silently skip if it doesn't > find injection_points available. (We could then remove some of > the makefile hackery that's supporting the current behavior.) > Probably the same needs to happen in each other test script > that's using injection_points --- I imagine that Maxim's > test is simply failing here first. Yeah, we could do that. -- Michael
Re: MultiXact\SLRU buffers configuration
On Tue, Aug 20, 2024 at 02:37:34PM -0400, Alvaro Herrera wrote: > OK, I've made some minor adjustments and pushed. CI seemed OK for me, > let's see what does the BF have to say. I see that you've gone the way with the SQL function doing a load(). Would it be worth switching the test to rely on the two macros for load and caching instead? I've mentioned that previously but never got down to present a patch for the sake of this test. This requires some more tweaks in the module to disable the stats when loaded through a GUC, and two shmem callbacks, then the test is able to work correctly. Please see attached. Thoughts? -- Michael From db4b4601a6e930a374f6d03b622f66589d620fe2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 21 Aug 2024 08:17:19 +0900 Subject: [PATCH] Refactor injection point test with loading This relies on the two macros INJECTION_POINT_LOAD and INJECTION_POINT_CACHED to trigger the test. This adds to the test module injection_points a GUC to be able to disable injection point stats, in case a cached point is run in a critical section, and a code path to initialize the shmem state data of the module when loading the module. 
--- src/backend/access/transam/multixact.c| 5 +- .../injection_points/injection_points.c | 80 ++- .../injection_points/injection_stats.c| 8 +- .../injection_points/injection_stats.h| 3 + .../injection_points/injection_stats_fixed.c | 4 +- src/test/modules/test_slru/t/001_multixact.pl | 5 +- 6 files changed, 93 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 14c2b929e2..d2d0298ac3 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -855,6 +855,9 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members) } } + /* Load the injection point before entering the critical section */ + INJECTION_POINT_LOAD("multixact-create-from-members"); + /* * Assign the MXID and offsets range to use, and make sure there is space * in the OFFSETs and MEMBERs files. NB: this routine does @@ -869,7 +872,7 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members) */ multi = GetNewMultiXactId(nmembers, &offset); - INJECTION_POINT("multixact-create-from-members"); + INJECTION_POINT_CACHED("multixact-create-from-members"); /* Make an XLOG entry describing the new MXID. */ xlrec.mid = multi; diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c index 4e775c7ec6..8b14f5fc19 100644 --- a/src/test/modules/injection_points/injection_points.c +++ b/src/test/modules/injection_points/injection_points.c @@ -28,6 +28,7 @@ #include "storage/lwlock.h" #include "storage/shmem.h" #include "utils/builtins.h" +#include "utils/guc.h" #include "utils/injection_point.h" #include "utils/memutils.h" #include "utils/wait_event.h" @@ -68,7 +69,12 @@ typedef struct InjectionPointCondition */ static List *inj_list_local = NIL; -/* Shared state information for injection points. */ +/* + * Shared state information for injection points. 
+ * + * This state data can be initialized in two ways: dynamically with a DSM + * or when loading the module. + */ typedef struct InjectionPointSharedState { /* Protects access to other fields */ @@ -97,8 +103,16 @@ extern PGDLLEXPORT void injection_wait(const char *name, /* track if injection points attached in this process are linked to it */ static bool injection_point_local = false; +/* GUC variable */ +bool inj_stats_enabled = true; + +/* Shared memory init callbacks */ +static shmem_request_hook_type prev_shmem_request_hook = NULL; +static shmem_startup_hook_type prev_shmem_startup_hook = NULL; + /* - * Callback for shared memory area initialization. + * Routine for shared memory area initialization, used as a callback + * when initializing dynamically with a DSM or when loading the module. */ static void injection_point_init_state(void *ptr) @@ -111,8 +125,51 @@ injection_point_init_state(void *ptr) ConditionVariableInit(&state->wait_point); } +/* Shared memory initialization when loading module */ +static void +injection_shmem_request(void) +{ + Size size; + + if (prev_shmem_request_hook) + prev_shmem_request_hook(); + + size = MAXALIGN(sizeof(InjectionPointSharedState)); + RequestAddinShmemSpace(size); +} + +static void +injection_shmem_startup(void) +{ + bool found; + + if (prev_shmem_startup_hook) + prev_shmem_startup_hook(); + + /* reset in case this is a restart within the postmaster */ + inj_state = NULL; + + /* Create or attach to the shared memory state */ + LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE); + + inj_state = ShmemInitStruct("injection_points", +sizeof(InjectionPointSharedState), +&found); + + if (!found) + { + /* + * First time through, so initialize. This is shared with the + * dynami
Re: MultiXact\SLRU buffers configuration
On 2024-Aug-21, Michael Paquier wrote: > I see that you've gone the way with the SQL function doing a load(). > Would it be worth switching the test to rely on the two macros for > load and caching instead? I've mentioned that previously but never > got down to present a patch for the sake of this test. Hmm, I have no opinion on which way is best. You probably have a better sense of what's better for the injections point interface, so I'm happy to defer to you on this. > + /* reset in case this is a restart within the postmaster */ > + inj_state = NULL; I'm not sure that this assignment actually accomplishes anything ... I don't understand what do the inj_stats_enabled stuff have to do with this patch. I suspect it's a git operation error, ie., you seem to have squashed two different things together. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Industry suffers from the managerial dogma that for the sake of stability and continuity, the company should be independent of the competence of individual employees." (E. Dijkstra)
Re: POC, WIP: OR-clause support for indexes
Hi! On Thu, Aug 15, 2024 at 10:13 PM Alena Rybakina wrote: > On 07.08.2024 04:11, Alexander Korotkov wrote: > > On Mon, Aug 5, 2024 at 11:24 PM Alena Rybakina > > wrote: > >> Ok, thank you for your work) > >> > >> I think we can leave only the two added libraries in the first patch, > >> others are superfluous. > > Thank you. > > I also have fixed some grammar issues. > > While reviewing the patch, I can't understand one part of the code where > we check the comparability of restrictinfos. > > /* RestrictInfo parameters dmust match parent */ > if (subRinfo->is_pushed_down != rinfo->is_pushed_down || > subRinfo->is_clone != rinfo->is_clone || > subRinfo->security_level != rinfo->security_level || > !bms_equal(subRinfo->required_relids, > rinfo->required_relids) || > !bms_equal(subRinfo->incompatible_relids, > rinfo->incompatible_relids) || > !bms_equal(subRinfo->outer_relids, rinfo->outer_relids)) > return NULL; > > I didn't find a place in the optimizer where required_relids, > incompatible_relids and outer_relids become different. Each > make_restrictinfo function takes arguments from > parent data. > > I disabled this check and the regression tests passed. This code is > needed for security verification, may I clarify? Thank you for pointing this out. I've rechecked the life cycle of those parameters. make_restrictinfo() makes them initially equal (except required_relids, which might be narrower for sub-clauses). Later changes like adjust_appendrel_attrs_mutator() apply equally to both parent and children. So, I've turned this into an assert check. > In the last patch I corrected the libraries - one of them was not in > alphabetical order. Thank you! Also, I converted the check you introduced in the previous message to op_in_opfamily(), and introduced a collation check similar to match_opclause_to_indexcol(). 
-- Regards, Alexander Korotkov Supabase v35-0002-Teach-bitmap-path-generation-about-transforming-.patch Description: Binary data v35-0001-Transform-OR-clauses-to-SAOP-s-during-index-matc.patch Description: Binary data
Re: Buf fix: update-po for PGXS does not work
On 2023-Oct-27, Ryo Matsumura (Fujitsu) wrote: > Hi hackers, > > I found that 'make update-po' for PGXS does not work. > Even if execute 'make update-po', but xx.po.new is not generated. > I don't test and check for meson build system, but I post it tentatively. > > I attached patch and test set. > 'update-po' tries to find *.po files $top_srcdir, but there is no po-file in > PGXS system because $top_srcdir is install directory. Thanks. I think you have the order of the ifdef nest backwards; even in the PGXS case we should have "ALL_LANGUAGES = $(AVAIL_LANGUAGES)" unless we're making update-po. Here I present it the other way around. Regards -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "World domination is proceeding according to plan"(Andrew Morton) >From 7c1e3d06f4ad09500cc233c91fc4108ef58ae638 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20Herrera?= Date: Tue, 20 Aug 2024 20:38:15 -0400 Subject: [PATCH v2] fix update-po for the PGXS case Author: Ryo Matsumura Discussion: https://postgr.es/m/tycpr01mb113164770fb0b0be6ed21e68ee8...@tycpr01mb11316.jpnprd01.prod.outlook.com --- src/nls-global.mk | 5 + 1 file changed, 5 insertions(+) diff --git a/src/nls-global.mk b/src/nls-global.mk index dfff472cb3..73a6db10a1 100644 --- a/src/nls-global.mk +++ b/src/nls-global.mk @@ -142,8 +142,13 @@ init-po: po/$(CATALOG_NAME).pot # For performance reasons, only calculate these when the user actually # requested update-po or a specific file. ifneq (,$(filter update-po %.po.new,$(MAKECMDGOALS))) +ifdef PGXS +ALL_LANGUAGES := $(shell find . -name '*.po' -print | sed 's,^.*/\([^/]*\).po$$,\1,' | LC_ALL=C sort -u) +all_compendia := $(shell find . 
-name '*.po' -print | LC_ALL=C sort) +else ALL_LANGUAGES := $(shell find $(top_srcdir) -name '*.po' -print | sed 's,^.*/\([^/]*\).po$$,\1,' | LC_ALL=C sort -u) all_compendia := $(shell find $(top_srcdir) -name '*.po' -print | LC_ALL=C sort) +endif else ALL_LANGUAGES = $(AVAIL_LANGUAGES) all_compendia = FORCE -- 2.39.2
Re: [Bug Fix]standby may crash when switching-over in certain special cases
Yugo Nagata wrote on Wed, Aug 21, 2024 at 00:49: > > > > Is s1 a cascading standby of s2? If otherwise s1 and s2 is the standbys > of > > the primary server respectively, it is not surprising that s2 has > progressed > > far than s1 when the primary fails. I believe that this is the case you > should > > use pg_rewind. Even if flushedUpto is reset as proposed in your patch, > s2 might > > already have applied a WAL record that s1 has not processed yet, and > there > > would be no gurantee that subsecuent applys suceed. > > Thank you for your response. In my scenario, s1 and s2 are both standbys of the primary server; s1 is a synchronous standby and s2 is an asynchronous standby. You mentioned that if s2's replay progress is ahead of s1's, pg_rewind should be used. However, what I'm trying to address is an issue where s2 crashes during replay after s1 has been promoted to primary, even though s2's progress hasn't surpassed s1's. Regards, Pixian Shi
Re: Conflict detection and logging in logical replication
On 8/6/24 4:15 AM, Zhijie Hou (Fujitsu) wrote: Thanks for the idea! I thought about few styles based on the suggested format, what do you think about the following ? Thanks for proposing formats. Before commenting on the specifics, I do want to ensure that we're thinking about the following for the log formats: 1. For the PostgreSQL logs, we'll want to ensure we do it in a way that's as convenient as possible for people to parse the context from scripts. 2. Semi-related, I still think the simplest way to surface this info to a user is through a "pg_stat_..." view or similar catalog mechanism (I'm less opinionated on the how outside of we should make it available via SQL). 3. We should ensure we're able to convey to the user these details about the conflict: * What time it occurred on the local server (which we'd have in the logs) * What kind of conflict it is * What table the conflict occurred on * What action caused the conflict * How the conflict was resolved (ability to include source/origin info) With that said: --- Version 1 --- LOG: CONFLICT: insert_exists; DESCRIPTION: remote INSERT violates unique constraint "uniqueindex" on relation "public.test". DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) xid=123,origin="pub",timestamp=xxx; remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_differ; DESCRIPTION: updating a row with key (a, b) = (2, 4) on relation "public.test" was modified by a different source. DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) xid=123,origin="pub",timestamp=xxx; remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_missing; DESCRIPTION: did not find the row with key (a, b) = (2, 4) on "public.test" to update. DETAIL: remote tuple (a, b, c) = (2, 4, 5). I agree with Amit's downthread comment, I think this tries to put much too much info on the LOG line, and it could be challenging to parse. --- Version 2 It moves most the details to the DETAIL line compared to version 1. 
--- LOG: CONFLICT: insert_exists on relation "public.test". DETAIL: Key (a)=(1) already exists in unique index "uniqueindex", which was modified by origin "pub" in transaction 123 at 2024xxx; Existing local tuple (a, b, c) = (1, 3, 4), remote tuple (a, b, c) = (1, 4, 5). LOG: CONFLICT: update_differ on relation "public.test". DETAIL: Updating a row with key (a, b) = (2, 4) that was modified by a different origin "pub" in transaction 123 at 2024xxx; Existing local tuple (a, b, c) = (2, 3, 4); remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_missing on relation "public.test". DETAIL: Did not find the row with key (a, b) = (2, 4) to update; Remote tuple (a, b, c) = (2, 4, 5). I like the brevity of the LOG line, while it still provides a lot of info. I think we should choose the words/punctuation in the DETAIL carefully so it's easy for scripts to ultimately parse (even if we expose info in pg_stat, people may need to refer to older log files to understand how a conflict resolved). --- Version 3 It is similar to the style in the current patch, I only added the key value for differ and missing conflicts without outputting the complete remote/local tuple value. --- LOG: conflict insert_exists detected on relation "public.test". DETAIL: Key (a)=(1) already exists in unique index "uniqueindex", which was modified by origin "pub" in transaction 123 at 2024xxx. LOG: conflict update_differ detected on relation "public.test". DETAIL: Updating a row with key (a, b) = (2, 4), which was modified by a different origin "pub" in transaction 123 at 2024xxx. LOG: conflict update_missing detected on relation "public.test" DETAIL: Did not find the row with key (a, b) = (2, 4) to update. I think outputting the remote/local tuple value may be a parameter we need to think about (with the desired outcome of trying to avoid another parameter). 
I have a concern about unintentionally leaking data (and I understand that someone with access to the logs does have a broad ability to view data); I'm less concerned about the size of the logs, as conflicts in a well-designed system should be rare (though a conflict storm could fill up the logs; at that point there are likely other issues to contend with). Thanks, Jonathan
optimize hashjoin
Avoid unnecessarily forming and deforming tuples. In TPC-H tests, this speeds HashJoin up by up to ~2x. diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index 61480733a1..2dad0c8a55 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -1627,6 +1627,23 @@ ExecHashTableInsert(HashJoinTable hashtable, { boolshouldFree; MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); + + ExecHashTableInsertTuple(hashtable, tuple, hashvalue); + + if (shouldFree) + heap_free_minimal_tuple(tuple); +} + +/* + * ExecHashTableInsert + * insert a tuple into the hash table depending on the hash value + * it may just go to a temp file for later batches + */ +void +ExecHashTableInsertTuple(HashJoinTable hashtable, +MinimalTuple tuple, +uint32 hashvalue) +{ int bucketno; int batchno; @@ -1701,9 +1718,6 @@ ExecHashTableInsert(HashJoinTable hashtable, &hashtable->innerBatchFile[batchno], hashtable); } - - if (shouldFree) - heap_free_minimal_tuple(tuple); } /* @@ -1777,12 +1791,10 @@ retry: * tuples that belong in the current batch once growth has been disabled. 
*/ void -ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable, - TupleTableSlot *slot, - uint32 hashvalue) +ExecParallelHashTableInsertCurrentBatchTuple(HashJoinTable hashtable, + MinimalTuple tuple, + uint32 hashvalue) { - boolshouldFree; - MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); HashJoinTuple hashTuple; dsa_pointer shared; int batchno; @@ -1798,6 +1810,21 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable, HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple)); ExecParallelHashPushTuple(&hashtable->buckets.shared[bucketno], hashTuple, shared); +} + +/* + * like ExecParallelHashTableInsertCurrentBatchTuple, + * but this function accept a TupleTableSlot + */ +void +ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable, + TupleTableSlot *slot, + uint32 hashvalue) +{ + boolshouldFree; + MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); + + ExecParallelHashTableInsertCurrentBatchTuple(hashtable, tuple, hashvalue); if (shouldFree) heap_free_minimal_tuple(tuple); diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index 5f4073eabd..002098f129 100644 --- a/src/backend/executor/nodeHashjoin.c +++ b/src/backend/executor/nodeHashjoin.c @@ -194,10 +194,10 @@ static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode, static TupleTableSlot *ExecParallelHashJoinOuterGetTuple(PlanState *outerNode, HashJoinState *hjstate, uint32 *hashvalue); -static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate, - BufFile *file, - uint32 *hashvalue, - TupleTableSlot *tupleSlot); +static MinimalTuple ExecHashJoinGetSavedTuple(HashJoinState *hjstate, + BufFile *file, + uint32 *hashvalue, + StringInfo buf); static bool ExecHashJoinNewBatch(HashJoinState *hjstate); static bool ExecParallelHashJoinNewBatch(HashJoinState *hjstate); static void ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate); @@ -831,6 +831,7 @@ ExecInitHashJoin(HashJoin *node, 
EState *estate, int eflag
Re: ANALYZE ONLY
On Tue, Aug 20, 2024 at 6:53 PM David Rowley wrote: > On Wed, 21 Aug 2024 at 06:41, Robert Haas wrote: > > I like trying to use ONLY somehow. > > Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table; > or as a table modifier like gram.y's extended_relation_expr? The table modifier idea seems nice to me. > If we did it as a per-table option, then we'd need to consider what > should happen if someone did: "VACUUM ONLY parttab;". Probably > silently doing nothing wouldn't be good. Maybe a warning, akin to > what's done in: > > postgres=# analyze (skip_locked) a; > WARNING: skipping analyze of "a" --- lock not available Perhaps. I'm not sure about this part. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Logical Replication of sequences
Hi Vignesh, Here are my only review comments for the latest patch set. v20240820-0003. nit - missing period for comment in FetchRelationStates nit - typo in function name 'ProcessSyncingTablesFoSync' == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index b8f9300..705a330 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -96,7 +96,7 @@ SyncProcessRelations(XLogRecPtr current_lsn) break; case WORKERTYPE_TABLESYNC: - ProcessSyncingTablesFoSync(current_lsn); + ProcessSyncingTablesForSync(current_lsn); break; case WORKERTYPE_APPLY: @@ -143,7 +143,7 @@ FetchRelationStates(bool *started_tx) *started_tx = true; } - /* Fetch tables that are in non-ready state */ + /* Fetch tables that are in non-ready state. */ rstates = GetSubscriptionRelations(MySubscription->oid, true); /* Allocate the tracking info in a permanent memory context. */ diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index ad92b84..c753f45 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -237,7 +237,7 @@ wait_for_worker_state_change(char expected_state) * SYNCDONE and finish. 
*/ void -ProcessSyncingTablesFoSync(XLogRecPtr current_lsn) +ProcessSyncingTablesForSync(XLogRecPtr current_lsn) { SpinLockAcquire(&MyLogicalRepWorker->relmutex); diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h index 24f74ab..6504b70 100644 --- a/src/include/replication/worker_internal.h +++ b/src/include/replication/worker_internal.h @@ -264,7 +264,7 @@ extern void UpdateTwoPhaseState(Oid suboid, char new_state); extern bool FetchRelationStates(bool *started_tx); extern bool WaitForRelationStateChange(Oid relid, char expected_state); -extern void ProcessSyncingTablesFoSync(XLogRecPtr current_lsn); +extern void ProcessSyncingTablesForSync(XLogRecPtr current_lsn); extern void ProcessSyncingTablesForApply(XLogRecPtr current_lsn); extern void SyncProcessRelations(XLogRecPtr current_lsn); extern void SyncInvalidateRelationStates(Datum arg, int cacheid,
RE: Conflict detection and logging in logical replication
On Wednesday, August 21, 2024 9:33 AM Jonathan S. Katz wrote: > On 8/6/24 4:15 AM, Zhijie Hou (Fujitsu) wrote: > > > Thanks for the idea! I thought about few styles based on the suggested > > format, what do you think about the following ? > > Thanks for proposing formats. Before commenting on the specifics, I do want to > ensure that we're thinking about the following for the log formats: > > 1. For the PostgreSQL logs, we'll want to ensure we do it in a way that's as > convenient as possible for people to parse the context from scripts. Yeah. And I personally think the current log format is OK for parsing purposes. > > 2. Semi-related, I still think the simplest way to surface this info to a > user is > through a "pg_stat_..." view or similar catalog mechanism (I'm less > opinionated > on the how outside of we should make it available via SQL). We have a patch (v19-0002) in this thread to collect conflict stats and display them in the view, and the patch is under review. Storing it into a catalog needs more analysis, as we may need to add additional logic to clean up old conflict data in that catalog table. I think we can consider it as a future improvement. > > 3. We should ensure we're able to convey to the user these details about the > conflict: > > * What time it occurred on the local server (which we'd have in the logs) > * What kind of conflict it is > * What table the conflict occurred on > * What action caused the conflict > * How the conflict was resolved (ability to include source/origin info) I think all of the above are already covered in the current conflict log. The exception is that we do not yet support resolving the conflict, so we don't log the resolution. > > > I think outputting the remote/local tuple value may be a parameter we need to > think about (with the desired outcome of trying to avoid another parameter). 
I > have a concern about unintentionally leaking data (and I understand that > someone with access to the logs does have a broad ability to view data); I'm > less concerned about the size of the logs, as conflicts in a well-designed > system should be rare (though a conflict storm could fill up the logs, likely > there are other issues to contend with at that point). We could use an option to control this, but the tuple value is already output in some existing cases (e.g. partition checks, table constraint checks, views with check constraints, unique violations), and those cases test the current user's privileges to decide whether to output the tuple or not. So, I think it's OK to display the tuple for conflicts. Best Regards, Hou zj
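For context on the "already output in some existing cases" point, the tuple key already appears today in, for example, a plain unique-violation error (a minimal reproduction):

```sql
CREATE TABLE t (a int PRIMARY KEY);
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (1);
-- ERROR:  duplicate key value violates unique constraint "t_pkey"
-- DETAIL:  Key (a)=(1) already exists.
```

So logging the conflicting key for replication conflicts would be consistent with the level of detail the server already emits for constraint violations.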
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
On 2024-Aug-20, Jehan-Guillaume de Rorthais wrote: > I'm back on this issue as well. I've started poking at this patch to review it, > test it, challenge it and then report here. > > I'll try to check if some other issues might have been lost or forgotten along > the way as well. Thanks, much appreciated, looking forward to your feedback. -- Álvaro Herrera, Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Conflict detection and logging in logical replication
On Tue, Aug 20, 2024 at 4:45 PM Zhijie Hou (Fujitsu) wrote: > > Here are the remaining patches. > > 0001 adds additional doc to explain the log format. Thanks for the patch. Please find a few comments on 0001: 1) +Key (column_name, ...)=(column_name, ...); existing local tuple (column_name, ...)=(column_name, ...); remote tuple (column_name, ...)=(column_name, ...); replica identity (column_name, ...)=(column_name, ...). -- column_name --> column_value everywhere to the right of '='? 2) + Note that for an + update operation, the column value of the new tuple may be NULL if the + value is unchanged. -- Shall we mention the toast value here? In all other cases, we get a full new tuple. 3) + The key section in the second sentence of the DETAIL line includes the key values of the tuple that already exists in the local relation for insert_exists or update_exists conflicts. -- Shall we mention that the key is the column value violating a unique constraint? Something like this: The key section in the second sentence of the DETAIL line includes the key values of the local tuple that violates the unique constraint for insert_exists or update_exists conflicts. 4) Shall we give an example LOG message at the end? thanks Shveta
Re: MultiXact\SLRU buffers configuration
On Tue, Aug 20, 2024 at 08:13:12PM -0400, Alvaro Herrera wrote: > I don't understand what do the inj_stats_enabled stuff have to do with > this patch. I suspect it's a git operation error, ie., you seem to have > squashed two different things together. Sorry, I should have split that for clarity (one patch for the GUC, one to change the test to use CACHED/LOAD). It is not an error though: if we don't have a controlled way to disable the stats of the module, then the test would fail when calling the cached callback because we'd try to allocate some memory for the dshash entry in pgstats. The second effect of initializing the shmem state of the module with shared_preload_libraries is that condition variables are set up for the sake of the test, removing the dependency on the SQL load() call. Both are OK, but I'd prefer introducing one use case for these two macros in the tree, so that these can be used as a reference in the future when developing new tests. -- Michael signature.asc Description: PGP signature
Re: Fix memory counter update in reorderbuffer
On Tue, Aug 20, 2024 at 5:55 PM Masahiko Sawada wrote: > > On Fri, Aug 16, 2024 at 12:22 AM Shlok Kyal wrote: > > > > Thank you for testing the patch! > > I'm slightly hesitant to add a test under src/test/subscription since > it's a bug in ReorderBuffer and not specific to logical replication. > If we reasonably cannot add a test under contrib/test_decoding, I'm > okay with adding it under src/test/subscription. > Sounds like a reasonable approach. > I've attached the updated patch with the commit message (but without a > test case for now). > The patch LGTM except for one minor comment. + /* All changes must be returned */ + Assert(txn->size == 0); Isn't it better to say: "All changes must be deallocated." in the above comment? -- With Regards, Amit Kapila.
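[Editor's note] The trade-off discussed earlier in this thread, freeing all of a transaction's changes without per-change counter updates and then zeroing the counter once, can be sketched in Python. All names here are made up for illustration; the real logic lives in C in reorderbuffer.c:

```python
# Hypothetical sketch of the counter-update strategy discussed above:
# instead of decrementing the memory counter (and re-adjusting the
# max-heap) once per change, free all changes first and then update the
# counter in a single step for the whole transaction.

class Txn:
    def __init__(self, change_sizes):
        self.changes = list(change_sizes)
        self.size = sum(change_sizes)   # tracked memory usage

def cleanup_txn(txn):
    # Free the changes WITHOUT touching the counter per change (cheap)...
    txn.changes.clear()
    # ...then zero the counter once for the whole transaction.
    txn.size = 0
    # All changes must be deallocated (mirrors the Assert in the patch).
    assert txn.size == 0

txn = Txn([100, 250, 50])
cleanup_txn(txn)
print(txn.size)  # 0
```

The single batched update is what makes the explicit zeroing (and the Assert that Amit comments on) necessary, since the counter is no longer maintained change by change during cleanup.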
Re: define PG_REPLSLOT_DIR
If this isn't in commitfest already, please add it there. On Tue, Aug 20, 2024 at 9:58 PM Bertrand Drouvot wrote: > > Hi, > > On Tue, Aug 20, 2024 at 12:06:52PM +, Bertrand Drouvot wrote: > > Hi, > > > > On Tue, Aug 20, 2024 at 05:41:48PM +0900, Michael Paquier wrote: > > > On Tue, Aug 20, 2024 at 11:10:46AM +0530, Ashutosh Bapat wrote: > > > > Since these are all related changes, doing them at once might make it > > > > faster. You may use multiple commits (one for each change) > > > > > > Doing multiple commits with individual definitions for each path would > > > be the way to go for me. All that is mechanical, still that feels > > > slightly cleaner. > > > > Right, that's what v2 has done. If there is a need for v3 then I'll add one > > dedicated patch for Ashutosh's proposal in the v3 series. > > Ashutosh's idea is implemented in v3 that I just shared up-thread (I used > REPLSLOT_DIR_ARGS though). > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com -- Best Wishes, Ashutosh Bapat
Re: define PG_REPLSLOT_DIR
Hi, On Wed, Aug 21, 2024 at 10:14:20AM +0530, Ashutosh Bapat wrote: > If this isn't in commitfest already, please add it there. > Done in [0]. [0]: https://commitfest.postgresql.org/49/5193/ Thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Conflict detection and logging in logical replication
Dear Hou, Thanks for updating the patch! I think the patch is mostly good. Here are minor comments. 0001: 01. ``` + +LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type +DETAIL: detailed explaination. ... + ``` I don't think the label is correct. The label should be used for the actual example output, not for explaining the format. I checked several files like amcheck.sgml and auto-explain.sgml and func.sgml and they seemed to follow the rule. 02. ``` + + The key section in the second sentence of the ... ``` I would prefer that the section name be quoted. 0002: 03. ``` -#include "replication/logicalrelation.h" ``` Just to confirm - this removal is not related to the feature but is just an improvement, right? Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Conflict detection and logging in logical replication
On Tue, Aug 20, 2024 at 4:45 PM Zhijie Hou (Fujitsu) wrote: > > Here are the remaining patches. > > 0001 adds additional doc to explain the log format. > 0002 collects statistics about conflicts in logical replication. > 0002 has not changed since I last reviewed it. It seems all my old comments are addressed. One trivial thing: I feel that in the doc, we should mention that any conflict resulting in an apply error will be counted in both apply_error_count and the corresponding _count. What do you think? thanks Shveta
Disallow USING clause when altering type of generated column
A USING clause when altering the type of a generated column does not make sense. It would write the output of the USING clause into the converted column, which would violate the generation expression. This patch adds a check to error out if this is specified. There was a test for this, but that test errored out for a different reason, so it was not effective. discovered by Jian He at [0] [0]: https://www.postgresql.org/message-id/cacjufxegpytfe79hbsmeoboivfnnprsw7gjvk67m1x2mqgg...@mail.gmail.com
Re: Taking into account syncrep position in flush_lsn reported by apply worker
On Wed, Aug 21, 2024 at 2:25 AM Heikki Linnakangas wrote: > > On 14/08/2024 16:54, Arseny Sher wrote: > > On 8/13/24 06:35, Amit Kapila wrote: > >> On Mon, Aug 12, 2024 at 3:43 PM Arseny Sher wrote: > >>> > >>> Sorry for the poor formatting of the message above, this should be > >>> better: > >>> > >>> Hey. Currently synchronous_commit is disabled for logical apply worker > >>> on the ground that reported flush_lsn includes only locally flushed data > >>> so slot (publisher) preserves everything higher than this, and so in > >>> case of subscriber restart no data is lost. However, imagine that > >>> subscriber is made highly available by standby to which synchronous > >>> replication is enabled. Then reported flush_lsn is ignorant of this > >>> synchronous replication progress, and in case of failover data loss may > >>> occur if subscriber managed to ack flush_lsn ahead of syncrep. > >> > >> Won't the same can be achieved by enabling the synchronous_commit > >> parameter for a subscription? > > > > Nope, because it would force WAL flush and wait for replication to the > > standby in the apply worker, slowing down it. The logic missing > > currently is not to wait for the synchronous commit, but still mind its > > progress in the flush_lsn reporting. > > I think this patch makes sense. I'm not sure we've actually made any > promises on it, but it feels wrong that the slot's LSN might be advanced > past the LSN that's been has been acknowledged by the replica, if > synchronous replication is configured. I see little downside in making > that promise. > One possible downside of such a promise could be that the publisher may slow down for sync replication because it has to wait for all the configured sync_standbys of subscribers to acknowledge the LSN. I don't know how many applications can be impacted due to this if we do it by default but if we feel there won't be any such cases or they will be in the minority then it is okay to proceed with this. 
-- With Regards, Amit Kapila.
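[Editor's note] The logic being proposed in this thread, reporting a flush_lsn that minds syncrep progress without actually waiting on it, might look roughly like the following. This is purely illustrative: LSNs are modeled as integers and every name is made up:

```python
def reportable_flush_lsn(local_flush_lsn, syncrep_ack_lsn, syncrep_enabled):
    """Return the LSN the apply worker may safely report to the publisher.

    With synchronous replication configured on the subscriber, the
    publisher should not discard WAL beyond what the subscriber's standby
    has acknowledged, so report the minimum of the two positions. Note
    that nothing here blocks: the worker does not wait for the standby,
    it merely caps what it reports.
    """
    if syncrep_enabled:
        return min(local_flush_lsn, syncrep_ack_lsn)
    return local_flush_lsn

# Local flush is ahead of the standby's ack: report only up to the ack.
print(reportable_flush_lsn(0x1000, 0x0800, True))   # 2048
# Without syncrep, report the local flush position as before.
print(reportable_flush_lsn(0x1000, 0x0800, False))  # 4096
```

This also makes Amit's concern visible: with syncrep enabled, the reported LSN can lag behind local flush, so the publisher retains WAL (and slots advance) only as fast as the slowest configured sync standby acknowledges.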
Re: MultiXact\SLRU buffers configuration
On Wed, Aug 21, 2024 at 12:46:31PM +0900, Michael Paquier wrote: > Sorry, I should have split that for clarity (one patch for the GUC, > one to change the test to use CACHED/LOAD). It is not an error > though: if we don't have a controlled way to disable the stats of the > module, then the test would fail when calling the cached callback > because we'd try to allocate some memory for the dshash entry in > pgstats. > > The second effect of initializing the shmem state of the module with > shared_preload_libraries is that condition variables are set up for the > sake of the test, removing the dependency on the SQL load() call. > Both are OK, but I'd prefer introducing one use case for these two > macros in the tree, so that these can be used as a reference in the > future when developing new tests. In short, here is a better patch set, with 0001 and 0002 introducing the pieces that the test would need to be able to use the LOAD() and CACHED() macros in 0003: - 0001: Add shmem callbacks to initialize shmem state of injection_points with shared_preload_libraries. - 0002: Add a GUC to control if the stats of the module are enabled. By default, they are disabled as they are only needed in the TAP test of injection_points for the stats. - 0003: Update the SLRU test to use INJECTION_POINT_LOAD and INJECTION_POINT_CACHED with injection_points loaded via shared_preload_libraries, removing the call to injection_points_load() in the perl test. What do you think? -- Michael From 48a3b8b469ec52d184999d2617134ec680db378f Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 21 Aug 2024 15:12:08 +0900 Subject: [PATCH 1/3] injection_points: Add initialization of shmem state at loading This commit adds callbacks to initialize the shared memory state of the module when loaded with shared_preload_libraries. This is necessary for an upcoming change for a SLRU test, that relies on a critical section where no memory allocation can happen. 
Initializing the shared memory state of the module while loading provides a control on the timing where this data is set up. --- .../injection_points/injection_points.c | 65 ++- 1 file changed, 62 insertions(+), 3 deletions(-) diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c index 4e775c7ec6..8d83d8c401 100644 --- a/src/test/modules/injection_points/injection_points.c +++ b/src/test/modules/injection_points/injection_points.c @@ -68,7 +68,12 @@ typedef struct InjectionPointCondition */ static List *inj_list_local = NIL; -/* Shared state information for injection points. */ +/* + * Shared state information for injection points. + * + * This state data can be initialized in two ways: dynamically with a DSM + * or when loading the module. + */ typedef struct InjectionPointSharedState { /* Protects access to other fields */ @@ -97,8 +102,13 @@ extern PGDLLEXPORT void injection_wait(const char *name, /* track if injection points attached in this process are linked to it */ static bool injection_point_local = false; +/* Shared memory init callbacks */ +static shmem_request_hook_type prev_shmem_request_hook = NULL; +static shmem_startup_hook_type prev_shmem_startup_hook = NULL; + /* - * Callback for shared memory area initialization. + * Routine for shared memory area initialization, used as a callback + * when initializing dynamically with a DSM or when loading the module. 
*/ static void injection_point_init_state(void *ptr) @@ -111,8 +121,51 @@ injection_point_init_state(void *ptr) ConditionVariableInit(&state->wait_point); } +/* Shared memory initialization when loading module */ +static void +injection_shmem_request(void) +{ + Size size; + + if (prev_shmem_request_hook) + prev_shmem_request_hook(); + + size = MAXALIGN(sizeof(InjectionPointSharedState)); + RequestAddinShmemSpace(size); +} + +static void +injection_shmem_startup(void) +{ + bool found; + + if (prev_shmem_startup_hook) + prev_shmem_startup_hook(); + + /* reset in case this is a restart within the postmaster */ + inj_state = NULL; + + /* Create or attach to the shared memory state */ + LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE); + + inj_state = ShmemInitStruct("injection_points", +sizeof(InjectionPointSharedState), +&found); + + if (!found) + { + /* + * First time through, so initialize. This is shared with the + * dynamic initialization using a DSM. + */ + injection_point_init_state(inj_state); + } + + LWLockRelease(AddinShmemInitLock); +} + /* - * Initialize shared memory area for this module. + * Initialize shared memory area for this module through DSM. */ static void injection_init_shmem(void) @@ -463,6 +516,12 @@ _PG_init(void) if (!process_shared_preload_libraries_in_progress) return; + /* Shared memory initialization */ + prev_shmem_request_hook = shmem_request_hook; + shmem_request_hook = injection_shmem_request; + prev_shmem_startup_hook = shmem_startup_hook; + shmem_startup
Re: configure failures on chipmunk
On 21/08/2024 01:42, Thomas Munro wrote: On Wed, Aug 21, 2024 at 9:48 AM Alvaro Herrera wrote: Hi. I noticed chipmunk is failing in configure: checking whether the C compiler works... no configure: error: in `/home/pgbfarm/buildroot/HEAD/pgsql.build': configure: error: C compiler cannot create executables One of the runs shows: ./configure: line 4202: 28268 Bus error $CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5 Yeah, I've been slowly upgrading it to a new debian/raspbian version, and "ccache" started segfaulting, not sure why. I compiled ccache from sources, and now it seems to work when I run it on its own. Not sure why the buildfarm run still failed. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Conflict detection and logging in logical replication
Here are some review comments for the v19-0001 docs patch. The content seemed reasonable, but IMO it should be presented quite differently. 1. Use sub-sections I expect this logical replication "Conflicts" section is going to evolve into something much bigger. Surely, it's not going to be one humongous page of details, so it will be a section with lots of subsections like all the other in Chapter 29. IMO, you should be writing the docs in that kind of structure from the beginning. For example, I'm thinking something like below (this is just an example - surely lots more subsections will be needed for this topic): 29.6 Conflicts 29.6.1. Conflict types 29.6.2. Logging format 29.6.3. Examples Specifically, this v19-0001 patch information should be put into a subsection like the 29.6.2 shown above. ~~~ 2. Markup + +LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type +DETAIL: detailed explaination. +Key (column_name, ...)=(column_name, ...); existing local tuple (column_name, ...)=(column_name, ...); remote tuple (column_name, ...)=(column_name, ...); replica identity (column_name, ...)=(column_name, ...). + IMO this should be using markup more like the SQL syntax references. - e.g. I suggest instead of - e.g. I suggest all the substitution parameters (e.g. detailed explanation, conflict_type, column_name, ...) in the log should use and use those markups again later in these docs instead of ~ nit - typo /explaination/explanation/ ~ nit - The amount of scrolling needed makes this LOG format too hard to see. Try to wrap it better so it can fit without being so wide. ~~~ 3. Restructure the list + + I suggest restructuring all this to use a nested list like: LOG - conflict_type DETAIL - detailed_explanation - key - existing_local_tuple - remote_tuple - replica_identity Doing this means you can remove a great deal of the unnecessary junk words like "of the first sentence in the DETAIL", and "sentence of the DETAIL line" etc. 
The result will be much less text but much simpler text too. == Kind Regards, Peter Smith. Fujitsu Australia
The json_table function returns an incorrect column type
When testing the json_table function, it was discovered that specifying FORMAT JSON in the column definition clause and applying this column to the JSON_OBJECT function results in an output that differs from Oracle's output. The SQL statement is as follows: SELECT JSON_OBJECT('config' VALUE config) FROM JSON_TABLE( '[{"type":1, "order":1, "config":{"empno":1001, "ename":"Smith", "job":"CLERK", "sal":1000}}]', '$[*]' COLUMNS ( config varchar(100) FORMAT JSON PATH '$.config' ) ); The execution results of PostgreSQL are as follows: json_object --- {"config" : "{\"job\": \"CLERK\", \"sal\": 1000, \"empno\": 1001, \"ename\": \"Smith\"}"} (1 row) The execution results of Oracle are as follows: JSON_OBJECT('CONFIG'VALUECONFIG) - {"config":{"empno":1001,"ename":"Smith","job":"CLERK","sal":1000}} 1 row selected. Elapsed: 00:00:00.00 In PostgreSQL, the return value of the json_table function is treated as plain text, and double quotes are escaped with a backslash. In Oracle, the return value of the json_table function is treated as a JSON document, and the double quotes within it are not escaped with a backslash. Based on the above observation, if the FORMAT JSON option is specified in the column definition clause of the json_table function, the return type should be JSON, rather than a specified type like VARCHAR(100).
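[Editor's note] The difference between the two outputs above comes down to whether the extracted value is embedded as a JSON fragment or as plain text that must be re-escaped. Python's json module shows the same effect; this is only an illustration of the escaping behavior, unrelated to the server code:

```python
import json

config = {"empno": 1001, "ename": "Smith", "job": "CLERK", "sal": 1000}

# Treating the extracted value as plain text: the inner JSON document is
# just a string, so its double quotes get backslash-escaped when it is
# embedded (this mirrors the PostgreSQL output above).
as_text = json.dumps({"config": json.dumps(config)})

# Treating it as a JSON value: it is embedded as a nested object with no
# escaping (this mirrors the Oracle output above).
as_json = json.dumps({"config": config})

print(as_text)  # {"config": "{\"empno\": 1001, ...}"}
print(as_json)  # {"config": {"empno": 1001, ...}}
```

In the first case, a consumer must parse the payload twice (once to get the string, once to parse the string as JSON); in the second, the nested object is directly usable, which is what FORMAT JSON is expected to deliver.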
Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns
On Mon, Aug 12, 2024 at 11:01 AM John Naylor wrote: > > Extra explicitness might be good. "Response time" seems like a > networking concept, so possibly ", in turn slowing down INSERT/UPDATE > statements." I'm inclined to commit that way in a couple days, barring > further comments. This is done.
Re: Incremental View Maintenance, take 2
On Tue, 20 Aug 2024 at 02:14, Kirill Reshke wrote: > > > == Major suggestions. > > 1) At first glance, working with this IVM/IMMV infrastructure feels > really unintuitive about what servers actually do for query execution. > I do think it will be much better for the user experience to add more > EXPLAIN output about the IVM work done inside IVM triggers. This way it is much > clearer which part is working slowly, and hence which index should be created, > etc. > > 2) The kernel code for IVM lacks the ability to be extended for > further IVM optimizations. One example is the foreign key optimization > described here [1]. I'm not saying we should implement this within this > patchset, but we surely should pave the way for it. I don't have any > good suggestions for how to do this though. > > 3) I don't really think the SQL design is good. CREATE [INCREMENTAL] M.V. > is too ad-hoc. I would prefer CREATE M.V. with (maintain_incr=true). > (The reloption name is just an example.) > This way we can change a regular M.V. to IVM and vice versa via ALTER > M.V. SET *reloptions* - a type of syntax that is already present in the > PostgreSQL core. > One little follow-up here. Why do we handle prepstate visibility the way it is currently done? Can we instead export the snapshot in the BEFORE trigger, save it somewhere, and use it afterwards? -- Best regards, Kirill Reshke
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
On Mon, 19 Aug 2024 at 19:10, Tom Lane wrote: > src/test/recovery/README points out that > > Also, to use "make installcheck", you must have built and installed > contrib/pg_prewarm, contrib/pg_stat_statements and contrib/test_decoding > in addition to the core code. > > I suspect this needs some additional verbiage about also installing > src/test/modules/injection_points if you've enabled injection points. > > (I think we haven't noticed because most people just use "make check" > instead.) > OK, many thanks for a comprehensive explanation! -- Best regards, Maxim Orlov.
Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns
On Tue, Aug 20, 2024 at 9:03 AM John Naylor wrote: > This is done. Cool! Thanks John and Robert! :) -J.
Re: GUC names in messages
CFBot reported some failures, so I have attached the rebased patch set v9*. I'm hopeful the majority of these might be pushed to avoid more rebasing... == Kind Regards, Peter Smith. Fujitsu Australia v9-0001-Add-quotes-for-GUCs-bool.patch Description: Binary data v9-0005-Add-quotes-for-GUCs-enum.patch Description: Binary data v9-0003-Add-quotes-for-GUCs-real.patch Description: Binary data v9-0004-Add-quotes-for-GUCs-string.patch Description: Binary data v9-0002-Add-quotes-for-GUCs-int.patch Description: Binary data v9-0008-GUC-names-make-common-translatable-messages.patch Description: Binary data v9-0006-GUC-names-fix-case-intervalstyle.patch Description: Binary data v9-0007-GUC-names-fix-case-datestyle.patch Description: Binary data
Re: ANALYZE ONLY
On Tue, 20 Aug 2024 at 07:52, Michael Harris wrote: > 1. Would such a feature be welcomed? Are there any traps I might not > have thought of? I think this sounds like a reasonable feature. > 2. The existing ANALYZE command has the following structure: > > ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] > > It would be easiest to add ONLY as another option, but that > doesn't look quite > right to me - surely the ONLY should be attached to the table name? > An alternative would be: > > ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ] > > Any feedback or advice would be great. Personally I'd prefer a new option to be added. But I agree ONLY isn't a good name then. Maybe something like SKIP_PARTITIONS.
Re: Provide a pg_truncate_freespacemap function
On Sunday, 21 July 2024 at 07:39:13 UTC+2, Tom Lane wrote: > Fujii Masao writes: > >> On Wednesday, 6 March 2024 at 20:28:44 CET, Stephen Frost wrote: > >>> I agree that this would generally be a useful thing to have. > Sorry for the late reply, as I was not available during the late summer. > Personally, I want to push back on whether this has any legitimate > use-case. Even if the FSM is corrupt, it should self-heal over > time, and I'm not seeing the argument why truncating it would > speed convergence towards correct values. Worse, in the interim > where you don't have any FSM, you will suffer table bloat because > insertions will be appended at the end of the table. So this > looks like a foot-gun, and the patch's lack of user-visible > documentation surely does nothing to make it safer. > (The analogy to pg_truncate_visibility_map seems forced. > If you are in a situation with a trashed visibility map, > you are probably getting wrong query answers, and truncating > the map will make that better. But a trashed FSM doesn't > result in incorrect output, and zeroing it will make things > worse not better.) Now that the other patch for self-healing is in, I agree it may not be that useful. I'm withdrawing the patch and will keep it in mind if we encounter other FSM issues in the future. Best regards, -- Ronan Dunklau
Re: Partial aggregates pushdown
On Thu, 15 Aug 2024 at 23:12, Bruce Momjian wrote: > Third, I would like to show a more specific example to clarify what is > being considered above. If we look at MAX(), we can have FDWs return > the max for each FDW, and the coordinator can chose the highest value. > This is the patch 1 listed above. These can return the > pg_aggregate.aggtranstype data type using the pg_type.typoutput text > output. > > The second case is for something like AVG(), which must return the SUM() > and COUNT(), and we currently have no way to return multiple text values > on the wire. For patch 0002, we have the option of creating functions > that can do this and record them in new pg_attribute columns, or we can > create a data type with these functions, and assign the data type to > pg_aggregate.aggtranstype. > > Is that accurate? It's close to accurate, but not entirely. Patch 1 would actually solve some AVG cases too, because some AVG implementations use an SQL array type to store the transtype instead of an internal type. And by using an SQL array type we *can* send multiple text values on the wire. 
See below for a list of those aggregates: > select p.oid::regprocedure from pg_aggregate a join pg_proc p on a.aggfnoid = p.oid where aggfinalfn != 0 and aggtranstype::regtype not in ('internal', 'anyenum', 'anyelement', 'anyrange', 'anyarray', 'anymultirange'); oid ─── avg(integer) avg(smallint) avg(real) avg(double precision) avg(interval) var_pop(real) var_pop(double precision) var_samp(real) var_samp(double precision) variance(real) variance(double precision) stddev_pop(real) stddev_pop(double precision) stddev_samp(real) stddev_samp(double precision) stddev(real) stddev(double precision) regr_sxx(double precision,double precision) regr_syy(double precision,double precision) regr_sxy(double precision,double precision) regr_avgx(double precision,double precision) regr_avgy(double precision,double precision) regr_r2(double precision,double precision) regr_slope(double precision,double precision) regr_intercept(double precision,double precision) covar_pop(double precision,double precision) covar_samp(double precision,double precision) corr(double precision,double precision) (28 rows) And to be clear, these are in addition to the MAX type of aggregates you were describing: > select p.oid::regprocedure from pg_aggregate a join pg_proc p on a.aggfnoid = p.oid where aggfinalfn = 0 and aggtranstype::regtype not in ('internal', 'anyenum', 'anyelement', 'anyrange', 'anyarray', 'anymultirange'); oid ─── sum(integer) sum(smallint) sum(real) sum(double precision) sum(money) sum(interval) max(bigint) max(integer) max(smallint) max(oid) max(real) max(double precision) max(date) max(time without time zone) max(time with time zone) max(money) max(timestamp without time zone) max(timestamp with time zone) max(interval) max(text) max(numeric) max(character) max(tid) max(inet) max(pg_lsn) max(xid8) min(bigint) min(integer) min(smallint) min(oid) min(real) min(double precision) min(date) min(time without time zone) min(time with time zone) min(money) min(timestamp without time 
zone) min(timestamp with time zone) min(interval) min(text) min(numeric) min(character) min(tid) min(inet) min(pg_lsn) min(xid8) count("any") count() regr_count(double precision,double precision) bool_and(boolean) bool_or(boolean) every(boolean) bit_and(smallint) bit_or(smallint) bit_xor(smallint) bit_and(integer) bit_or(integer) bit_xor(integer) bit_and(bigint) bit_or(bigint) bit_xor(bigint) bit_and(bit) bit_or(bit) bit_xor(bit) xmlagg(xml) (65 rows)
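[Editor's note] The AVG() case described above, where each remote node returns a partial (sum, count) transition state that the coordinator combines, can be sketched as follows. All names are hypothetical; the real pushdown would happen in postgres_fdw and the executor:

```python
def partial_avg(values):
    # Each FDW/remote node computes a transition state, not the final avg.
    # For AVG this is a (sum, count) pair, which fits in an SQL array type
    # and can therefore be sent as text on the wire.
    return (sum(values), len(values))

def combine_avg(states):
    # The coordinator merges the partial states and applies the final
    # function (division) exactly once.
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

# Two "shards" of data; the combined result equals AVG over all rows.
shard_states = [partial_avg([10, 20]), partial_avg([30, 40, 50])]
print(combine_avg(shard_states))  # 30.0
```

This is why MAX-style aggregates (no finalfn, combinable by simple comparison) are the easy case, while AVG-style aggregates only work when their transition state can be serialized, which the SQL-array-transtype implementations listed above already allow.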
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
On Wed, 7 Aug 2024 18:50:10 -0400 Alvaro Herrera wrote: > On 2024-Jul-26, Tender Wang wrote: > > > On Friday, 26 July 2024 at 14:57, Junwang Zhao wrote: > > > > > There is a bug report[0] Tender comments might be the same issue as > > > this one, but I tried Alvaro's and mine patch, neither could solve > > > that problem, I did not tried Tender's earlier patch thought. I post > > > the test script below in case you are interested. > > > > My earlier patch should handle Alexander reported case. But I did not > > do more test. I'm not sure that wether or not has dismatching between > > pg_constraint and pg_trigger. > > > > I aggred with Alvaro said that "requires a much more invasive > > solution". > > Here's the patch which, as far as I can tell, fixes all the reported > problems (other than the one in bug 18541, for which I proposed an > unrelated fix in that thread[1]). If you can double-check, I would very > much appreciate that. Also, I think the test cases the patch adds > reflect the provided examples sufficiently, but if we're still failing > to cover some, please let me know. I'm back on this issue as well. I've started poking at this patch to review it, test it, challenge it and then report here. I'll try to check if some other issues might have been lost or forgotten along the way as well. In the meantime, thank you Alvaro, Tender and Junwang for your work, time, research and patches!