On Wed, Jan 25, 2023 at 11:49:27AM -0500, Tom Lane wrote:
> Right. I fixed some other infelicities and pushed it.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Jan 25, 2023 at 04:12:00PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart
> wrote in
>> Here is a first attempt at a patch. I scanned through all the existing
>> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
>> else tha
Kyotaro Horiguchi writes:
> At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart
> wrote in
>> Here is a first attempt at a patch. I scanned through all the existing
>> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
>> else that needed adjusting.
> There seems to be
At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart
wrote in
> On Tue, Jan 24, 2023 at 01:13:55PM -0500, Tom Lane wrote:
> > Either that comment needs to be rewritten or we need to invent some
> > more macros.
>
> Here is a first attempt at a patch. I scanned through all the existing
> uses of
On Tue, Jan 24, 2023 at 01:13:55PM -0500, Tom Lane wrote:
> Either that comment needs to be rewritten or we need to invent some
> more macros.
Here is a first attempt at a patch. I scanned through all the existing
uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
else th
Nathan Bossart writes:
> IMO ideally there should be a DSA_HANDLE_INVALID and DSHASH_HANDLE_INVALID
> for use with dsa_handle and dshash_table_handle, respectively. But your
> patch does seem like an improvement.
Yeah, particularly given that dsa.h says
/*
* The handle for a dsa_area is curren
On Tue, Jan 24, 2023 at 02:55:07AM +, houzj.f...@fujitsu.com wrote:
> I noticed one minor thing in this commit.
>
> -
> LogicalRepCtx->last_start_dsh = DSM_HANDLE_INVALID;
> -
>
> The code takes the last_start_dsh as dsm_handle, but it seems it is a
> dsa_pointer.
> " typedef dsa_pointer ds
On Monday, January 23, 2023 3:13 AM Tom Lane wrote:
Hi,
>
> Nathan Bossart writes:
> > On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote:
> >> I haven't looked in detail but isn't it better to explain somewhere
> >> in the comments that it achieves to rate limit the restart of worker
On Sun, Jan 22, 2023 at 02:12:54PM -0500, Tom Lane wrote:
> I pushed v17 with some mostly-cosmetic changes, including more comments.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart writes:
> On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote:
>> I haven't looked in detail but isn't it better to explain somewhere in
>> the comments that it achieves to rate limit the restart of workers in
>> case of error and allows them to restart immediately in case o
On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote:
> I haven't looked in detail but isn't it better to explain somewhere in
> the comments that it achieves to rate limit the restart of workers in
> case of error and allows them to restart immediately in case of
> subscription parameter ch
On Sat, Jan 7, 2023 at 6:15 AM Nathan Bossart wrote:
>
> On Fri, Jan 06, 2023 at 05:31:26PM -0500, Tom Lane wrote:
>
> > Attached is a rebased 0003, just to keep the cfbot happy.
> > I'm kind of wondering whether 0003 is worth the complexity TBH,
> > but in any case I ran out of time to look at it
rebased for cfbot
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index cf220c3bcb..7661c0c86e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1996,6 +1996,16 @@ postgres 2709
On Fri, Jan 06, 2023 at 05:31:26PM -0500, Tom Lane wrote:
> I've pushed 0001 and 0002, which seem pretty uncontroversial.
Thanks!
> Attached is a rebased 0003, just to keep the cfbot happy.
> I'm kind of wondering whether 0003 is worth the complexity TBH,
> but in any case I ran out of time to lo
Nathan Bossart writes:
> I found some additional places that should remove the last-start time from
> the hash table. I've added those in v14.
I've pushed 0001 and 0002, which seem pretty uncontroversial.
Attached is a rebased 0003, just to keep the cfbot happy.
I'm kind of wondering whether 000
I found some additional places that should remove the last-start time from
the hash table. I've added those in v14.
On Fri, Jan 06, 2023 at 10:30:18AM +0530, Amit Kapila wrote:
> On Thu, Jan 5, 2023 at 10:49 PM Nathan Bossart
> wrote:
>> On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wro
On Thu, Jan 5, 2023 at 10:49 PM Nathan Bossart wrote:
>
> On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wrote:
> > On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart
> > wrote:
> >> In v12, I moved the restart for two_phase mode to the end of
> >> process_syncing_tables_for_apply() so that w
On Thu, Jan 05, 2023 at 09:29:24AM -0800, Nathan Bossart wrote:
> On Thu, Jan 05, 2023 at 10:57:58AM +0530, Amit Kapila wrote:
>> True, if we want we can use dshash for this.
>
> I'll look into this.
Here is an attempt at using dshash. This is quite a bit cleaner since we
don't need garbage coll
On Thu, Jan 05, 2023 at 10:57:58AM +0530, Amit Kapila wrote:
> True, if we want we can use dshash for this.
I'll look into this.
> The garbage collection
> mechanism used in the patch seems odd to me as that will remove/add
> entries to the hash table even when the corresponding subscription is
>
On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wrote:
> On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart
> wrote:
>> In v12, I moved the restart for two_phase mode to the end of
>> process_syncing_tables_for_apply() so that we don't need to rely on another
>> iteration of the loop.
>
> This
On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart wrote:
>
> On Wed, Jan 04, 2023 at 08:12:37PM -0800, Nathan Bossart wrote:
> > On Thu, Jan 05, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
> >> But there doesn't appear to be any guarantee that the result for
> >> AllTablesyncsReady() will change bet
On Thu, Jan 5, 2023 at 6:19 AM Nathan Bossart wrote:
>
> On Wed, Jan 04, 2023 at 10:12:19AM -0800, Nathan Bossart wrote:
> > From the discussion thus far, it sounds like the alternatives are to 1) add
> > a global flag that causes wal_retrieve_retry_interval to be bypassed for
> > all workers or t
On Wed, Jan 04, 2023 at 08:12:37PM -0800, Nathan Bossart wrote:
> On Thu, Jan 05, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
>> But there doesn't appear to be any guarantee that the result for
>> AllTablesyncsReady() will change between the time it is invoked
>> earlier in the function and at the
On Thu, Jan 05, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
> On Wed, Jan 4, 2023 at 11:03 PM Nathan Bossart
> wrote:
>> On Wed, Jan 04, 2023 at 09:41:47AM +0530, Amit Kapila wrote:
>> > If so, we probably also need to
>> > ensure that table_states_valid is marked false probably via
>> > invalid
On Wed, Jan 4, 2023 at 11:03 PM Nathan Bossart wrote:
>
> On Wed, Jan 04, 2023 at 09:41:47AM +0530, Amit Kapila wrote:
> > I am not sure if I understand the problem you are trying to solve with
> > this part of the patch. Are you worried that after we mark some of the
> > relation's state as READY
On Wed, Jan 04, 2023 at 10:12:19AM -0800, Nathan Bossart wrote:
> From the discussion thus far, it sounds like the alternatives are to 1) add
> a global flag that causes wal_retrieve_retry_interval to be bypassed for
> all workers or to 2) add a hash map in the launcher and a
> restart_immediately
On Wed, Jan 04, 2023 at 10:57:43AM +0530, Amit Kapila wrote:
> On Tue, Jan 3, 2023 at 11:40 PM Nathan Bossart
> wrote:
>> My approach was to add a variable to LogicalRepWorker that indicated
>> whether a worker needed to be restarted immediately. While this is a
>> little weird because the worke
On Wed, Jan 04, 2023 at 09:41:47AM +0530, Amit Kapila wrote:
> I am not sure if I understand the problem you are trying to solve with
> this part of the patch. Are you worried that after we mark some of the
> relation's state as READY, all the table syncs are in the READY state
> but we will not im
On Tue, Jan 3, 2023 at 11:40 PM Nathan Bossart wrote:
>
> On Tue, Jan 03, 2023 at 11:03:32AM +0530, Amit Kapila wrote:
> > On Thu, Dec 15, 2022 at 4:47 AM Nathan Bossart
> > wrote:
> >> On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
> >> > Maybe we could have workers that are exiting
On Tue, Jan 3, 2023 at 11:51 PM Nathan Bossart wrote:
>
> On Tue, Jan 03, 2023 at 11:43:59AM +0530, Amit Kapila wrote:
> > On Wed, Dec 7, 2022 at 11:42 PM Nathan Bossart
> > wrote:
> >> After sleeping on this, I think we can do better. IIUC we can simply check
> >> for AllTablesyncsReady() at t
On Tue, Jan 03, 2023 at 11:43:59AM +0530, Amit Kapila wrote:
> On Wed, Dec 7, 2022 at 11:42 PM Nathan Bossart
> wrote:
>> After sleeping on this, I think we can do better. IIUC we can simply check
>> for AllTablesyncsReady() at the end of process_syncing_tables_for_apply()
>> and wake up the log
On Tue, Jan 03, 2023 at 11:03:32AM +0530, Amit Kapila wrote:
> On Thu, Dec 15, 2022 at 4:47 AM Nathan Bossart
> wrote:
>> On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
>> > Maybe we could have workers that are exiting for that reason set a
>> > flag saying "please restart me without d
On Wed, Dec 7, 2022 at 11:42 PM Nathan Bossart wrote:
>
> On Wed, Dec 07, 2022 at 02:07:11PM +0300, Melih Mutlu wrote:
> > Do we also need to wake up all sync workers too? Even if not, I'm not
> > actually sure whether doing that would harm anything though.
> > Just asking since currently the patc
On Thu, Dec 15, 2022 at 4:47 AM Nathan Bossart wrote:
>
> On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
> > Maybe we could have workers that are exiting for that reason set a
> > flag saying "please restart me without delay"?
>
> That helps a bit, but there are still delays when starti
On Sun, Dec 18, 2022 at 03:36:07PM -0800, Nathan Bossart wrote:
> This seems to have somehow broken the archiving tests on Windows, so
> obviously I owe some better analysis here. I didn't see anything obvious
> in the logs, but I will continue to dig.
On Windows, WaitForWALToBecomeAvailable() se
On Thu, Dec 15, 2022 at 02:47:21PM -0800, Nathan Bossart wrote:
> I tried setting wal_retrieve_retry_interval to 1ms for all TAP tests
> (similar to what was done in 2710ccd), and I noticed that the recovery
> tests consistently took much longer. Upon further inspection, it looks
> like the same (
I tried setting wal_retrieve_retry_interval to 1ms for all TAP tests
(similar to what was done in 2710ccd), and I noticed that the recovery
tests consistently took much longer. Upon further inspection, it looks
like the same (or a very similar) race condition described in e5d494d's
commit message
On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
> Maybe we could have workers that are exiting for that reason set a
> flag saying "please restart me without delay"?
That helps a bit, but there are still delays when starting workers for new
subscriptions. I think we'd need to create a n
Nathan Bossart writes:
> On Wed, Dec 14, 2022 at 01:23:18PM -0500, Tom Lane wrote:
>> Oh. What in the world is the rationale for that?
> My assumption is that this is meant to avoid starting workers as fast as
> possible if they repeatedly crash.
I can see the point of rate-limiting if the work
On Wed, Dec 14, 2022 at 01:23:18PM -0500, Tom Lane wrote:
> Nathan Bossart writes:
>> I'm reasonably certain the launcher is already signaled like you describe.
>> It'll just wait to start new workers if it's been less than
>> wal_retrieve_retry_interval milliseconds since the last time it started
Nathan Bossart writes:
> I'm reasonably certain the launcher is already signaled like you describe.
> It'll just wait to start new workers if it's been less than
> wal_retrieve_retry_interval milliseconds since the last time it started
> workers.
Oh. What in the world is the rationale for that?
On Wed, Dec 14, 2022 at 12:42:32PM -0500, Tom Lane wrote:
> Nathan Bossart writes:
>> My first thought is that the latter two uses should be moved to a new
>> parameter, and the apply launcher should store the last start time for each
>> apply worker like the apply workers do for the table-sync wo
Nathan Bossart writes:
> My first thought is that the latter two uses should be moved to a new
> parameter, and the apply launcher should store the last start time for each
> apply worker like the apply workers do for the table-sync workers. In any
> case, it probably makes sense to lower this pa
On Tue, Dec 13, 2022 at 04:41:05PM -0800, Nathan Bossart wrote:
> On Tue, Dec 13, 2022 at 07:20:14PM -0500, Tom Lane wrote:
>> I certainly don't think that "wake the apply launcher every 1ms"
>> is a sane configuration. Unless I'm missing something basic about
>> its responsibilities, it should se
On Tue, Dec 13, 2022 at 07:20:14PM -0500, Tom Lane wrote:
> I certainly don't think that "wake the apply launcher every 1ms"
> is a sane configuration. Unless I'm missing something basic about
> its responsibilities, it should seldom need to wake at all in
> normal operation.
This parameter appea
Nathan Bossart writes:
> On Tue, Dec 13, 2022 at 06:32:08PM -0500, Tom Lane wrote:
>> I've not chased it further than that, but I venture that the apply
>> launcher also needs a kick in the pants, and/or there needs to be
>> an interlock to ensure that it doesn't wake until after the old
>> apply
On Tue, Dec 13, 2022 at 06:32:08PM -0500, Tom Lane wrote:
> Before, there was up to 1 second (with multiple "SELECT count(1) = 0"
> probes from the test script) between the ALTER SUBSCRIPTION command
> and the "apply worker will restart" log entry. That wait is pretty
> well zapped, but instead no
Nathan Bossart writes:
> After sleeping on this, I think we can do better. IIUC we can simply check
> for AllTablesyncsReady() at the end of process_syncing_tables_for_apply()
> and wake up the logical replication workers (which should just consiѕt of
> setting the current process's latch) if we
On Wed, Dec 07, 2022 at 02:07:11PM +0300, Melih Mutlu wrote:
> Do we also need to wake up all sync workers too? Even if not, I'm not
> actually sure whether doing that would harm anything though.
> Just asking since currently the patch wakes up all workers including sync
> workers if any still exis
Hi,
> Actually, that's not quite right. The sync worker will wake up the apply
> worker to change the state from SYNCDONE to READY. AllTablesyncsReady()
> checks that all tables are READY, so we need to wake up all the workers
> when an apply worker changes the state to READY. Each worker will
On Tue, Dec 06, 2022 at 11:25:51AM -0800, Nathan Bossart wrote:
> On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote:
>> - When the state is SYNCDONE and the apply worker has to wake up to change
>> the state to READY.
>>
>> I think we already call logicalrep_worker_wakeup_ptr wherever it
Thanks for reviewing!
On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote:
> Is it really necessary to wake logical workers up when renaming other than
> subscription or publication? address.objectId will be a valid subid only
> when renaming a subscription.
Oops, that is a mistake. I on
Hi Nathan,
@@ -410,6 +411,12 @@ ExecRenameStmt(RenameStmt *stmt)
> stmt->newname);
> table_close(catalog, RowExclusiveLock);
>
> + /*
> + * Wake up the logical replication workers to handle this
> + * change quickly.
> + */
> + LogicalRepWorkersWakeupAtCommit(address.objectId);
Is it reall
On Thu, Dec 01, 2022 at 04:21:30PM -0800, Nathan Bossart wrote:
> Okay, here is a new version of the patch. This seems to clear up
> everything that I could find via the tests.
I cleaned up the patch a bit.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 72adfb291ee84b8a20f6
ttps://aws.amazon.com
>From 79db8837e5b3054fb6007326d230aef50f3cda87 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 21 Nov 2022 16:01:01 -0800
Subject: [PATCH v4 1/1] wake up logical workers after ALTER SUBSCRIPTION and
when two_phase mode can be enabled
---
src/backend/access/tr
On Wed, Nov 30, 2022 at 05:27:40PM +1300, Thomas Munro wrote:
> On Wed, Nov 30, 2022 at 5:23 PM Thomas Munro wrote:
>> On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart
>> wrote:
>> > I spent some more time on the prevent-unnecessary-wakeups patch for
>> > logical/worker.c that I've been alluding t
Dear Nathan,
> I spent some more time on the prevent-unnecessary-wakeups patch for
> logical/worker.c that I've been alluding to in this thread, and I found a
> few more places where we depend on the worker periodically waking up. This
> seems to be a common technique, so I'm beginning to wonder
On Wed, Nov 30, 2022 at 5:23 PM Thomas Munro wrote:
> On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart
> wrote:
> > I spent some more time on the prevent-unnecessary-wakeups patch for
> > logical/worker.c that I've been alluding to in this thread, and I found a
> > few more places where we depend
On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart wrote:
> I spent some more time on the prevent-unnecessary-wakeups patch for
> logical/worker.c that I've been alluding to in this thread, and I found a
> few more places where we depend on the worker periodically waking up. This
> seems to be a comm
I spent some more time on the prevent-unnecessary-wakeups patch for
logical/worker.c that I've been alluding to in this thread, and I found a
few more places where we depend on the worker periodically waking up. This
seems to be a common technique, so I'm beginning to wonder whether these
changes
On Thu, Nov 24, 2022 at 05:26:27AM +, Hayato Kuroda (Fujitsu) wrote:
> I have no comments for the v3 patch.
Thanks for reviewing! Does anyone else have thoughts on the patch?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Dear Nathan,
Thank you for updating the patch!
> In v3, I moved the call to LogicalRepWorkersWakeupAtCommit() to the end of
> the function. This should avoid waking up workers in some cases where it's
> unnecessary (e.g., if ALTER SUBSCRIPTION ERRORs in a subtransaction), but
> there are still c
On Tue, Nov 22, 2022 at 04:59:28PM +0530, Amit Kapila wrote:
> On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart
> wrote:
>> While working on avoiding unnecessary wakeups in logical/worker.c (as was
>> done for walreceiver.c in 05a7be9), I noticed that the tests began taking
>> much longer. This se
subid
>> will
>> be recorded twice.
>> I'm not sure whether it may be caused some issued, but list_member_oid() can
>> be used to avoid that.
>
> +1, list_append_unique_oid might be better.
Done in v3.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon
On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart wrote:
>
> While working on avoiding unnecessary wakeups in logical/worker.c (as was
> done for walreceiver.c in 05a7be9), I noticed that the tests began taking
> much longer. This seems to be caused by the reduced frequency of calls to
> maybe_rerea
On Tuesday, November 22, 2022 2:49 PM Hayato Kuroda (Fujitsu)
>
> Dear Nathan,
>
> > I think you are correct. I did it this way in v2. I've also moved
> > the bulk of the logic to logical/worker.c.
>
> Thanks for updating! It becomes better. Further comments:
>
> 01. AlterSubscription()
>
On Tuesday, November 22, 2022 1:39 PM Nathan Bossart
wrote:
> On Tue, Nov 22, 2022 at 03:03:52AM +, Hayato Kuroda (Fujitsu) wrote:
> > Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be
> > executed inside the transaction. So if two subscriptions are altered
> > in the same tran
Dear Nathan,
> I think you are correct. I did it this way in v2. I've also moved the
> bulk of the logic to logical/worker.c.
Thanks for updating! It becomes better. Further comments:
01. AlterSubscription()
```
+ LogicalRepWorkersWakeupAtCommit(subid);
+
```
Currently subids will be r
e
bulk of the logic to logical/worker.c.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 57bbd9e8d9da72c50d9a41704f3f42b37aba33ab Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 21 Nov 2022 16:01:01 -0800
Subject: [PATCH v2 1/1] wake up logical workers after ALTER SUB
Hi Nathan,
I have done almost same thing locally for [1], but I thought your code seemed
better.
Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be executed
inside the transaction. So if two subscriptions are altered in the same
transaction, only one of them will awake. Is it expec
On Tue, Nov 22, 2022 at 03:16:05PM +1300, Thomas Munro wrote:
> Maybe a comment to explain why a single variable is enough?
This crossed my mind shortly after sending my previous message. Looking
closer, I see that several types of ALTER SUBSCRIPTION do not call
PreventInTransactionBlock(), so a
On Tue, Nov 22, 2022 at 1:41 PM Nathan Bossart wrote:
> On my machine, the attached patch
> improved 'check-world -j8' run time by ~12 seconds (from 3min 8sec to 2min
> 56 sec) and src/test/subscription test time by ~17 seconds (from 139
> seconds to 122 seconds).
Nice!
Maybe a comment to explai
Thoughts?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 80452ebbadfe80f237ebf5c7cdf8fa5eb2a61e35 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 21 Nov 2022 16:01:01 -0800
Subject: [PATCH v1 1/1] wake up logical workers after ALTER SUBSCRIPTION
---
src/backend/a
73 matches
Mail list logo