On Thu, Jan 16, 2025 at 09:42:49AM +0900, Michael Paquier wrote:
> I've applied the first refactoring bits down to v13 (see for example a
> s/emit_message/emit_wal/ tweaked for consistency, with more comment
> tweaks). Attached are patches for each branch for the bug fix, that
> I'm still testing
On Wed, Jan 15, 2025 at 10:35:42AM +0100, Alexander Kukushkin wrote:
> Thank you for picking it up. I briefly looked at both patches. The actual
> fix in XLogPageRead() looks good to me.
> I also agree with the suggested refactoring, where there is certainly some room
> for improvement - $WAL_SEGMENT_
Hi Michael,
On Wed, 15 Jan 2025 at 05:45, Michael Paquier wrote:
> The new regression test is something I really want to keep around,
> to be able to emulate the infinite loop, but I got annoyed with the
> amount of duplication between the new test and the existing
> 039_end_of_wal.pl as there
On Wed, Dec 25, 2024 at 12:00:59PM +0900, Michael Paquier wrote:
> All of them refer to an infinite loop reachable in the startup process
> when we read an incorrect incomplete record just after a failover or
> when a WAL receiver restarts. Not sure which way is best in order to
> fix all of them
On Fri, Mar 01, 2024 at 01:16:37PM +0900, Kyotaro Horiguchi wrote:
> This code intends to prevent a page header error from causing a record
> reread, when a record is required to be read from multiple sources. We
> could restrict this to only fire at segment boundaries. At segment
> boundaries, we
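A minimal standalone C sketch of the segment-boundary test being suggested here; the
function names are made up for illustration, while PostgreSQL itself expresses the
same check with the XLogSegmentOffset() macro in xlog_internal.h, relying on segment
sizes being powers of two:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;

    /*
     * Byte offset of an LSN within its WAL segment.  Segment sizes are
     * powers of two, so masking with (size - 1) is a cheap modulo; this
     * mirrors what XLogSegmentOffset() does in xlog_internal.h.
     */
    static uint64_t
    segment_offset(XLogRecPtr recptr, uint64_t wal_segment_size)
    {
        return recptr & (wal_segment_size - 1);
    }

    /* The proposed restriction: only special-case page-header errors here. */
    static int
    at_segment_boundary(XLogRecPtr recptr, uint64_t wal_segment_size)
    {
        return segment_offset(recptr, wal_segment_size) == 0;
    }

    int
    main(void)
    {
        const uint64_t seg = 16 * 1024 * 1024;   /* default 16MB segment */

        /* 0/3000000 starts a segment; 0/3002000 is a page in the middle. */
        printf("%d\n", at_segment_boundary((XLogRecPtr) 0x3000000, seg));
        printf("%d\n", at_segment_boundary((XLogRecPtr) 0x3002000, seg));
        return 0;
    }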
On Wed, Nov 13, 2024 at 02:18:06PM +0100, Alexander Kukushkin wrote:
> Now that v17 is released and we have a few months before the v18 feature
> freeze, I hope you will find some time to look at it.
My apologies for taking a couple of weeks before coming back to this
thread. I have been informed a c
Hi Michael,
Now that v17 is released and we have a few months before the v18 feature
freeze, I hope you will find some time to look at it.
On Wed, 5 Jun 2024 at 07:09, Michael Paquier wrote:
> On Tue, Jun 04, 2024 at 04:16:43PM +0200, Alexander Kukushkin wrote:
> > Now that beta1 was released I hop
On Tue, Jun 04, 2024 at 04:16:43PM +0200, Alexander Kukushkin wrote:
> Now that beta1 was released, I hope you are not so busy, and hence I would
> like to follow up on this problem.
I am still working on something for the v18 cycle that I'd like to
present before the beginning of the next commit fest
Hi Michael and Kyotaro,
Now that beta1 was released, I hope you are not so busy, and hence I would
like to follow up on this problem.
Regards,
--
Alexander Kukushkin
On Wed, 13 Mar 2024 at 04:56, Kyotaro Horiguchi wrote:
>
> At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi
> wrote in
> > Oh, I once saw the fix work, but it seems to have stopped working at some
> > point. The new issue was a corruption of received WAL records on the
> > first standby, an
Hi Kyotaro,
On Wed, 13 Mar 2024 at 03:56, Kyotaro Horiguchi
wrote:
> I identified the cause of the second issue. When I tried to replay the
> issue, the second standby accidentally received the old timeline's
> last page-spanning record till the end while the first standby was
> promoting (but it
At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi
wrote in
> Oh, I once saw the fix work, but it seems to have stopped working at some
> point. The new issue was a corruption of received WAL records on the
> first standby, and it may be related to the setting.
I identified the cause of the
On Mon, Mar 11, 2024 at 04:43:32PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 6 Mar 2024 11:34:29 +0100, Alexander Kukushkin
> wrote in
>> Thank you for spending your time on it!
>
> You're welcome, but I apologize for the delay in the work.
Thanks for spending time on this. Everybody is busy
At Wed, 6 Mar 2024 11:34:29 +0100, Alexander Kukushkin
wrote in
> Hmm, I think you meant to use wal_segment_size, because 0x10 is just
> 1MB. As a result, currently it works for you by accident.
Oh, I once saw the fix work, but it seems to have stopped working at some
point. The new issue was a c
Hi Kyotaro,
Oh, now I understand what you mean. Is the retry supposed to happen only
when we are reading the very first page from the WAL file?
On Wed, 6 Mar 2024 at 09:57, Kyotaro Horiguchi
wrote:
>
> xlogrecovery.c:
> @@ -3460,8 +3490,10 @@ retry:
> * responsible for the validation.
At Tue, 5 Mar 2024 09:36:44 +0100, Alexander Kukushkin
wrote in
> Please find attached the patch fixing the problem and the updated TAP test
> that addresses Nit.
Record-level retries happen when the upper layer detects errors. In my
previous mail, I cited code that is intended to prevent this
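A rough, self-contained C illustration of that layering; the names below are
invented for the sketch and are not the xlogreader API, but they show the point
that the page layer only validates and reports, while the caller is what restarts
the whole read from the record's start position:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;
    #define PAGE_SIZE 8192        /* stands in for XLOG_BLCKSZ */

    /* Lower layer: fetch one page and validate its header. */
    static bool
    read_page(XLogRecPtr page_start, char *buf)
    {
        (void) page_start;
        (void) buf;
        return true;              /* pretend validation succeeded */
    }

    /*
     * Upper layer: assemble a record that may span several pages.  On any
     * page failure the whole record is re-read from rec_start -- this is
     * the record-level retry driven by the caller, not by the page layer.
     */
    static bool
    read_record(XLogRecPtr rec_start, XLogRecPtr rec_end)
    {
        char        buf[PAGE_SIZE];

        for (;;)
        {
            bool        ok = true;

            for (XLogRecPtr p = rec_start - rec_start % PAGE_SIZE;
                 p < rec_end && ok;
                 p += PAGE_SIZE)
                ok = read_page(p, buf);

            if (ok)
                return true;      /* record fully assembled */

            /* a page failed: pick another source, then retry from rec_start */
        }
    }

    int
    main(void)
    {
        /* A record starting near the end of one page and ending on the next. */
        printf("%d\n", read_record((XLogRecPtr) 0x3001F00, (XLogRecPtr) 0x3002300));
        return 0;
    }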
Hello Michael, Kyotaro,
Please find attached the patch fixing the problem and the updated TAP test
that addresses Nit.
--
Regards,
--
Alexander Kukushkin
042_no_contrecord_switch.pl
Description: Perl program
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xl
At Fri, 01 Mar 2024 12:37:55 +0900 (JST), Kyotaro Horiguchi
wrote in
> Anyway, our current policy here is to avoid record-rereads beyond
> source switches. However, fixing this seems to require that source
> switches cause record rereads unless some additional information is
> available to know
At Fri, 01 Mar 2024 12:04:31 +0900 (JST), Kyotaro Horiguchi
wrote in
> At Fri, 01 Mar 2024 10:29:12 +0900 (JST), Kyotaro Horiguchi
> wrote in
> > After reading this, I came up with a possibility that walreceiver
> > recovers more quickly than the calling interval to
> > WaitForWALtoBecomeAvai
At Fri, 01 Mar 2024 10:29:12 +0900 (JST), Kyotaro Horiguchi
wrote in
> After reading this, I came up with a possibility that walreceiver
> recovers more quickly than the calling interval to
> WaitForWALtoBecomeAvailable(). If walreceiver disconnects after a call
> to the function WaitForWAL...()
At Fri, 1 Mar 2024 08:17:04 +0900, Michael Paquier wrote
in
> On Thu, Feb 29, 2024 at 05:44:25PM +0100, Alexander Kukushkin wrote:
> > On Thu, 29 Feb 2024 at 08:18, Kyotaro Horiguchi
> > wrote:
> >> In the first place, it's important to note that we do not guarantee
> >> that an async standby c
On Thu, Feb 29, 2024 at 05:44:25PM +0100, Alexander Kukushkin wrote:
> On Thu, 29 Feb 2024 at 08:18, Kyotaro Horiguchi
> wrote:
>> In the first place, it's important to note that we do not guarantee
>> that an async standby can always switch its replication connection to
>> the old primary or anot
Hi Kyotaro,
On Thu, 29 Feb 2024 at 08:18, Kyotaro Horiguchi
wrote:
> In the first place, it's important to note that we do not guarantee
> that an async standby can always switch its replication connection to
> the old primary or another sibling standby. This is due to the
> variations in replicat
Hi Michael,
On Thu, 29 Feb 2024 at 06:05, Michael Paquier wrote:
>
> Wow. Have you seen that in an actual production environment?
>
Yes, we see it regularly, and it is reproducible in test environments as
well.
> my $start_page = start_of_page($end_lsn);
> my $wal_file = write_wal($primary,
At Thu, 29 Feb 2024 14:05:15 +0900, Michael Paquier wrote
in
> On Wed, Feb 28, 2024 at 11:19:41AM +0100, Alexander Kukushkin wrote:
> > I spent some time debugging an issue with a standby not being able to
> > continue streaming after failover.
> >
> > The problem happens when standbys received on
On Wed, Feb 28, 2024 at 11:19:41AM +0100, Alexander Kukushkin wrote:
> I spent some time debugging an issue with a standby not being able to
> continue streaming after failover.
>
> The problem happens when standbys received only the first part of the WAL
> record that spans multiple pages.
> In this
Hello hackers,
I spent some time debugging an issue with a standby not being able to
continue streaming after failover.
The problem manifests itself by the following messages in the log:
LOG: received SIGHUP, reloading configuration files
LOG: parameter "primary_conninfo" changed to "port=58669
host=
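For context, a tiny standalone C sketch of the page-spanning situation described in
this thread; the numbers are illustrative and assume the default 8kB WAL page, with
the continuation page marked by the XLP_FIRST_IS_CONTRECORD flag and the remaining
byte count stored in its header, as defined in xlog_internal.h:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        const uint32_t page_size  = 8192;   /* XLOG_BLCKSZ default */
        const uint32_t rec_offset = 8000;   /* record starts near the page end */
        const uint32_t rec_len    = 300;    /* total record length in bytes */

        uint32_t    first_part = page_size - rec_offset;   /* bytes on this page */
        uint32_t    remaining  = rec_len - first_part;     /* continues on next page */

        /*
         * The next page starts with a header whose XLP_FIRST_IS_CONTRECORD
         * flag is set and whose xlp_rem_len gives the remaining bytes.  A
         * standby that has received only the first 192 bytes at failover is
         * left waiting for the continuation.
         */
        printf("first page carries %u bytes, next page continues with %u\n",
               first_part, remaining);
        return 0;
    }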