Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-07-18 Thread Alexander Korotkov

On Fri, Jul 18, 2025 at 1:48 PM Amit Kapila wrote: > On Fri, Jul 18, 2025 at 4:15 PM Alexander Korotkov > wrote: > > > > On Sun, Jun 29, 2025 at 9:22 AM Hayato Kuroda (Fujitsu) > > wrote: > > > Thanks everyone who are working on the bug. IIUC the remained task is > > > to add code comments for

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-07-18 Thread Amit Kapila

On Fri, Jul 18, 2025 at 4:15 PM Alexander Korotkov wrote: > > On Sun, Jun 29, 2025 at 9:22 AM Hayato Kuroda (Fujitsu) > wrote: > > Thanks everyone who are working on the bug. IIUC the remained task is > > to add code comments for avoiding the same mistake again described here: > > > > > Sounds re

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-07-18 Thread Alexander Korotkov

On Sun, Jun 29, 2025 at 9:22 AM Hayato Kuroda (Fujitsu) wrote: > Thanks everyone who are working on the bug. IIUC the remained task is > to add code comments for avoiding the same mistake again described here: > > > Sounds reasonable. As per analysis till now, it seems removal of new > > assert is

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-07-18 Thread Alexander Korotkov

On Wed, Jul 2, 2025 at 8:20 PM vignesh C wrote: > On Sun, 29 Jun 2025 at 11:52, Hayato Kuroda (Fujitsu) > wrote: > > > > Dear hackers, > > > > Thanks everyone who are working on the bug. IIUC the remained task is > > to add code comments for avoiding the same mistake again described here: > > > >

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-28 Thread Hayato Kuroda (Fujitsu)

Dear hackers, Thanks everyone who are working on the bug. IIUC the remained task is to add code comments for avoiding the same mistake again described here: > Sounds reasonable. As per analysis till now, it seems removal of new > assert is correct and we just need to figure out the reason in all

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-26 Thread Dilip Kumar

On Thu, Jun 26, 2025 at 3:20 AM Alexander Korotkov wrote: > > On Wed, Jun 25, 2025 at 11:25 AM Dilip Kumar wrote: > > On Wed, Jun 25, 2025 at 1:18 PM Hayato Kuroda (Fujitsu) > > wrote: > > > Another idea is to call ReplicationSlotsComputeRequiredLSN() when at > > > least one > > > of the restar

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-26 Thread Hayato Kuroda (Fujitsu)

Dear Alexander, > Regarding last_saved_restart_lsn_updated, I think the opposite. I > think we should check if last_saved_restart_lsn_updated is set already > only if it could promise us some economy of resources. In our case > the main check only compares two fields of slot. And that fields ar

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-26 Thread Alexander Korotkov

Dear Kuroda-san, On Thu, Jun 26, 2025 at 6:46 AM Hayato Kuroda (Fujitsu) wrote: > > Dear Alexander, > > > > Good idea. But I think we should associate the "updated" flag > > directly to the fact that one slot (no matter logical or physical) > > changed its last_saved_restart_lsn. See the attach

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-25 Thread Hayato Kuroda (Fujitsu)

Dear Alexander, > > Good idea. But I think we should associate the "updated" flag > directly to the fact that one slot (no matter logical or physical) > changed its last_saved_restart_lsn. See the attached patch. I'm > going to push it if no objections. + /* +* Tr
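The "updated flag" idea quoted above can be sketched in C. This is a minimal toy model, not the committed patch. `ToySlot`, `checkpoint_slots()` and `recompute_required_lsn()` are hypothetical stand-ins; the last one represents `ReplicationSlotsComputeRequiredLSN()`. The sketch shows two things. The global recomputation runs only when at least one slot's `last_saved_restart_lsn` actually changed. It also runs *after* the save, so the new on-disk values are the ones that feed WAL removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* mirrors PostgreSQL's 64-bit LSN type */

/* Hypothetical, heavily simplified slot. */
typedef struct
{
    XLogRecPtr restart_lsn;             /* in-memory value */
    XLogRecPtr last_saved_restart_lsn;  /* value last written to disk */
} ToySlot;

/* Counts how often the global recomputation runs. */
static int recompute_calls = 0;

/* Stand-in for ReplicationSlotsComputeRequiredLSN(). */
static void
recompute_required_lsn(void)
{
    recompute_calls++;
}

/*
 * Toy checkpoint step: "save" every slot, note whether any slot's
 * last_saved_restart_lsn changed, and recompute the global required LSN
 * only in that case, after all saves are done.
 */
static void
checkpoint_slots(ToySlot *slots, size_t nslots)
{
    bool last_saved_restart_lsn_updated = false;

    for (size_t i = 0; i < nslots; i++)
    {
        /* Stand-in for SaveSlotToPath(): the in-memory value becomes durable. */
        if (slots[i].last_saved_restart_lsn != slots[i].restart_lsn)
        {
            slots[i].last_saved_restart_lsn = slots[i].restart_lsn;
            last_saved_restart_lsn_updated = true;
        }
    }

    if (last_saved_restart_lsn_updated)
        recompute_required_lsn();
}
```

With slots `{100, 100}` and `{200, 150}`, the first call saves the second slot and recomputes once. A second call finds nothing changed and skips the recomputation. With zero slots it never recomputes. The comparison of two fields per slot is cheap, which matches the argument quoted in the thread.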

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-25 Thread Alexander Korotkov

On Wed, Jun 25, 2025 at 11:25 AM Dilip Kumar wrote: > On Wed, Jun 25, 2025 at 1:18 PM Hayato Kuroda (Fujitsu) > wrote: > > Another idea is to call ReplicationSlotsComputeRequiredLSN() when at least > > one > > of the restart_lsn is updated, like attached. I feel this could reduce the > > comput

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-25 Thread Hayato Kuroda (Fujitsu)

Dear Dilip, Another idea is to call ReplicationSlotsComputeRequiredLSN() when at least one of the restart_lsn is updated, like attached. I feel this could reduce the computation bit more. Best regards, Hayato Kuroda FUJITSU LIMITED Attachment: tmp.diffs

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-25 Thread Dilip Kumar

On Wed, Jun 25, 2025 at 1:18 PM Hayato Kuroda (Fujitsu) wrote: > > Dear Dilip, > > Another idea is to call ReplicationSlotsComputeRequiredLSN() when at least one > of the restart_lsn is updated, like attached. I feel this could reduce the > computation > bit more. Right, that makes sense, if the

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-24 Thread Dilip Kumar

On Wed, Jun 25, 2025 at 10:57 AM Zhijie Hou (Fujitsu) wrote: > > Hi, > > After commit ca307d5, I noticed another crash when testing > some other logical replication features. > > The server with max_replication_slots set to 0 would crash when executing > CHECKPOINT. > > TRAP: failed Assert("Repli

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-24 Thread Zhijie Hou (Fujitsu)

Hi, After commit ca307d5, I noticed another crash when testing some other logical replication features. The server with max_replication_slots set to 0 would crash when executing CHECKPOINT. TRAP: failed Assert("ReplicationSlotCtl != NULL"), File: "slot.c", Line: 1162, PID: 577315 postgres: che
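The failed assertion above is easier to follow once you know that PostgreSQL does not allocate the replication-slot control structure in shared memory when `max_replication_slots` is 0. In that configuration, any checkpoint-time code that reaches slot bookkeeping unconditionally finds a NULL pointer. The toy C model below is a hedged sketch of that failure mode and of one possible guard. `ToySlotCtl`, `toy_shmem_init()` and `toy_checkpoint_slots()` are hypothetical names, and the thread itself goes on to discuss recomputing the required LSN only when a saved `restart_lsn` changed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the GUC of the same name. */
static int max_replication_slots = 0;

/* Toy stand-in for the shared-memory slot control struct. */
typedef struct
{
    int nslots;
} ToySlotCtlData;

static ToySlotCtlData toy_shmem;
static ToySlotCtlData *ToySlotCtl = NULL;   /* stays NULL when no slots are configured */

/* Toy shared-memory init: nothing is set up when slots are disabled. */
static void
toy_shmem_init(void)
{
    if (max_replication_slots == 0)
        return;
    toy_shmem.nslots = max_replication_slots;
    ToySlotCtl = &toy_shmem;
}

/* Toy checkpoint step; returns the number of slots it would process. */
static int
toy_checkpoint_slots(void)
{
    /* Guard first: without slots there is no control struct to inspect. */
    if (max_replication_slots <= 0)
        return 0;

    /* Mirrors the Assert("ReplicationSlotCtl != NULL") seen in the report. */
    assert(ToySlotCtl != NULL);
    return ToySlotCtl->nslots;
}
```

Without the early return, calling `toy_checkpoint_slots()` with `max_replication_slots = 0` would trip the assertion, just as the CHECKPOINT in the report did.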

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-23 Thread Alexander Korotkov

On Mon, Jun 23, 2025 at 8:04 PM vignesh C wrote: > > On Mon, 23 Jun 2025 at 04:36, Alexander Korotkov wrote: > > > > On Fri, Jun 20, 2025 at 2:24 PM vignesh C wrote: > > > On Fri, 20 Jun 2025 at 05:54, Alexander Korotkov > > > wrote: > > > > Dear Kuroda-san, > > > > > > > > On Thu, Jun 19, 202

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-23 Thread vignesh C

On Mon, 23 Jun 2025 at 04:36, Alexander Korotkov wrote: > > On Fri, Jun 20, 2025 at 2:24 PM vignesh C wrote: > > On Fri, 20 Jun 2025 at 05:54, Alexander Korotkov > > wrote: > > > Dear Kuroda-san, > > > > > > On Thu, Jun 19, 2025 at 2:05 PM Hayato Kuroda (Fujitsu) > > > wrote: > > > > > > Regar

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-22 Thread Alexander Korotkov

On Fri, Jun 20, 2025 at 2:24 PM vignesh C wrote: > On Fri, 20 Jun 2025 at 05:54, Alexander Korotkov wrote: > > Dear Kuroda-san, > > > > On Thu, Jun 19, 2025 at 2:05 PM Hayato Kuroda (Fujitsu) > > wrote: > > > > > Regarding assertion failure, I've found that assert in > > > > > PhysicalConfirmRec

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-21 Thread Alexander Korotkov

Hi, Vignesh! On Fri, Jun 20, 2025 at 3:42 PM vignesh C wrote: > On Fri, 20 Jun 2025 at 05:54, Alexander Korotkov wrote: > > On Thu, Jun 19, 2025 at 2:05 PM Hayato Kuroda (Fujitsu) > > wrote: > > > > > Regarding assertion failure, I've found that assert in > > > > > PhysicalConfirmReceivedLocati

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-19 Thread Amit Kapila

On Fri, Jun 20, 2025 at 5:48 AM Alexander Korotkov wrote: > > > > > If what I said above is correct, then the following part of the commit > > message will be incorrect: > > "As stated in the ReplicationSlotReserveWal() comment, this is not > > always true. Additionally, this issue has been spotte

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-19 Thread Alexander Korotkov

Dear Kuroda-san, On Thu, Jun 19, 2025 at 2:05 PM Hayato Kuroda (Fujitsu) wrote: > > > Regarding assertion failure, I've found that assert in > > > PhysicalConfirmReceivedLocation() conflicts with restart_lsn > > > previously set by ReplicationSlotReserveWal(). As I can see, > > > ReplicationSlot

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-19 Thread Alexander Korotkov

On Thu, Jun 19, 2025 at 1:29 PM Amit Kapila wrote: > On Wed, Jun 18, 2025 at 10:17 PM Alexander Korotkov > wrote: > > > > On Wed, Jun 18, 2025 at 6:50 PM Vitaly Davydov > > wrote: > > > > I think, it is a good idea. Once we do not use the generated data, it > > > > is ok > > > > just to genera

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-19 Thread Hayato Kuroda (Fujitsu)

Dear Amit, Alexander, > > Regarding assertion failure, I've found that assert in > > PhysicalConfirmReceivedLocation() conflicts with restart_lsn > > previously set by ReplicationSlotReserveWal(). As I can see, > > ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn. > > So, it d
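The point quoted above, that `restart_lsn` is not guaranteed to be monotonic, can be illustrated with a small C sketch. It assumes one scenario in line with the discussion. `ReplicationSlotReserveWal()` initializes the slot at the current redo pointer. Later, the standby reports a flush position older than that, and `PhysicalConfirmReceivedLocation()` stores it. The functions below are hypothetical toy stand-ins, not the real implementations. They show that the invariant `restart_lsn >= last_saved_restart_lsn` can be violated, while a removal horizon that takes the older of the two values does not depend on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* mirrors PostgreSQL's 64-bit LSN type */

typedef struct
{
    XLogRecPtr restart_lsn;             /* in-memory value */
    XLogRecPtr last_saved_restart_lsn;  /* value last written to disk */
} ToySlot;

/* Toy stand-in for ReplicationSlotReserveWal(): start at the redo pointer. */
static void
reserve_wal(ToySlot *slot, XLogRecPtr redo_rec_ptr)
{
    slot->restart_lsn = redo_rec_ptr;
}

/* Toy stand-in for PhysicalConfirmReceivedLocation(): take the standby's flush LSN as-is. */
static void
confirm_received(ToySlot *slot, XLogRecPtr standby_flush)
{
    slot->restart_lsn = standby_flush;
}

/* Toy checkpoint save of the slot. */
static void
save_slot(ToySlot *slot)
{
    slot->last_saved_restart_lsn = slot->restart_lsn;
}

/* The invariant that the removed assertion relied on. */
static bool
invariant_holds(const ToySlot *slot)
{
    return slot->restart_lsn >= slot->last_saved_restart_lsn;
}

/* A horizon that does not assume monotonicity: the older of the two values. */
static XLogRecPtr
removal_horizon(const ToySlot *slot)
{
    return slot->restart_lsn < slot->last_saved_restart_lsn
        ? slot->restart_lsn
        : slot->last_saved_restart_lsn;
}
```

Reserving at 1000, saving, and then confirming 800 leaves `invariant_holds()` false and `removal_horizon()` at 800. In that sequence an assertion built on the invariant would fire even though nothing is wrong with the slot.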

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-19 Thread Amit Kapila

On Wed, Jun 18, 2025 at 10:17 PM Alexander Korotkov wrote: > > On Wed, Jun 18, 2025 at 6:50 PM Vitaly Davydov > wrote: > > > I think, it is a good idea. Once we do not use the generated data, it is > > > ok > > > just to generate WAL segments using the proposed function. I've tested > > > this

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-18 Thread Alexander Korotkov

On Wed, Jun 18, 2025 at 6:50 PM Vitaly Davydov wrote: > > I think, it is a good idea. Once we do not use the generated data, it is ok > > just to generate WAL segments using the proposed function. I've tested this > > function. The tests worked as expected with and without the fix. The > > attach

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-18 Thread vignesh C

On Wed, 18 Jun 2025 at 14:35, Vitaly Davydov wrote: > > Dear Hayato, > > > To confirm, can you tell me the theory why the walsender received old LSN? > > It is sent by the walreceiver, so is there a case that > > LogstreamResult.Flush can go backward? > > Not sure we can accept the situation. > >

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-18 Thread Hayato Kuroda (Fujitsu)

Dear Vitaly, I've been working on the bug... > This assert was introduced in the patch. Now, I think, it is a wrong one. Let > me > please explain one of the possible scenarios when it can be triggered. In case > of physical replication, when walsender receives a standby reply message, it > call

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-17 Thread Tom Lane

vignesh C writes: > While tracking buildfarm for one of other commits, I noticed this failure: > TRAP: failed Assert("s->data.restart_lsn >= > s->last_saved_restart_lsn"), File: > "../pgsql/src/backend/replication/slot.c", Line: 1813, PID: 3945797 My animal mamba is also showing this assertion fa

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-17 Thread Alexander Korotkov

Hi, Vitaly! On Tue, Jun 17, 2025 at 6:02 PM Vitaly Davydov wrote: > Thank you for reporting the issue. > > >While tracking buildfarm for one of other commits, I noticed this failure: > >TRAP: failed Assert("s->data.restart_lsn >= > >s->last_saved_restart_lsn"), File: > >"../pgsql/src/backend/repl

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-16 Thread vignesh C

Hi Alexander, While tracking buildfarm for one of other commits, I noticed this failure: TRAP: failed Assert("s->data.restart_lsn >= s->last_saved_restart_lsn"), File: "../pgsql/src/backend/replication/slot.c", Line: 1813, PID: 3945797 postgres: standby: checkpointer (ExceptionalCondition+0x83) [0

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-16 Thread Hayato Kuroda (Fujitsu)

Dear Alexander, > Thank you! All of these totally make sense. The updated patch is attached. Thanks for the update. I found another point. ``` -# Another 2M rows; that's about 260MB (~20 segments) worth of WAL. +# Another 50K rows; that's about 86MB (~5 segments) worth of WAL. $node->safe_psq
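As a quick check of the figure in the quoted test comment, assuming the default `wal_segment_size` of 16MB:

$$
\frac{86\ \text{MB}}{16\ \text{MB per segment}} = 5.375 \approx 5\ \text{segments}
$$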

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-16 Thread Alexander Korotkov

Dear Kuroda-san, On Mon, Jun 16, 2025 at 12:11 PM Hayato Kuroda (Fujitsu) wrote: > Thanks for pushing the fix patch! BTW, I have few comments for your commits. > Can you check and include them if needed? > > 01. > ``` > $node->append_conf('postgresql.conf', > "shared_preload_libraries = '

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-16 Thread Hayato Kuroda (Fujitsu)

Dear Alexander, Thanks for pushing the fix patch! BTW, I have few comments for your commits. Can you check and include them if needed? 01. ``` $node->append_conf('postgresql.conf', "shared_preload_libraries = 'injection_points'"); ``` No need to set shared_preload_libraries in 046/047. I

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-15 Thread Alexander Korotkov

Hi, Tom! On Sun, Jun 15, 2025 at 7:05 PM Tom Lane wrote: > BTW, while you're cleaning up this commit, could you remove the > excess newlines in some of the "note" commands in 046 and 047, like > > note('starting checkpoint\n'); > > This produces bizarre output, as shown in the buildfarm logs: Th

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-15 Thread Tom Lane

BTW, while you're cleaning up this commit, could you remove the excess newlines in some of the "note" commands in 046 and 047, like note('starting checkpoint\n'); This produces bizarre output, as shown in the buildfarm logs: [04:04:38.953](603.550s) # starting checkpoint\\n

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-15 Thread Alexander Lakhin

15.06.2025 14:02, Alexander Korotkov wrote: Could you, please, check this patch? On my system it makes 046 and 047 execute in 140 secs with -O0 and -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE. Thank you for the patch! It decreases the test's duration significantly: # +++ tap check in sr

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-15 Thread Alexander Korotkov

On Sun, Jun 15, 2025 at 12:00 PM Alexander Lakhin wrote: > > Hello Alexander, > > 10.06.2025 23:14, Alexander Korotkov wrote: > > So, my proposal is to commit the attached patchset to the HEAD, and > commit [1] to the back branches. Any objections? > > > As the buildfarm animal prion shows [1], t

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-15 Thread Alexander Korotkov

Hi, Alexander! On Sun, Jun 15, 2025 at 12:00 PM Alexander Lakhin wrote: > > Hello Alexander, > > 10.06.2025 23:14, Alexander Korotkov wrote: > > So, my proposal is to commit the attached patchset to the HEAD, and > commit [1] to the back branches. Any objections? > > > As the buildfarm animal pr

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-15 Thread Alexander Lakhin

Hello Alexander, 10.06.2025 23:14, Alexander Korotkov wrote: So, my proposal is to commit the attached patchset to the HEAD, and commit [1] to the back branches. Any objections? As the buildfarm animal prion shows [1], the 046_checkpoint_logical_slot test fails with "-DRELCACHE_FORCE_RELEASE

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-12 Thread Amit Kapila

On Wed, Jun 11, 2025 at 1:44 AM Alexander Korotkov wrote: > > On Mon, Jun 9, 2025 at 7:09 PM Vitaly Davydov > wrote: > > > I think we can use this approach for HEAD and probably keep the > > > previous idea for backbranches. Keeping some value in shared_memory > > > per slot sounds risky to me i

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-10 Thread Alexander Korotkov

On Mon, Jun 9, 2025 at 7:09 PM Vitaly Davydov wrote: > > I think we can use this approach for HEAD and probably keep the > > previous idea for backbranches. Keeping some value in shared_memory > > per slot sounds risky to me in terms of introducing new bugs. > > Not sure, what kind of problems may

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-09 Thread Vitaly Davydov

Hi Amit, > I think we can use this approach for HEAD and probably keep the > previous idea for backbranches. Keeping some value in shared_memory > per slot sounds risky to me in terms of introducing new bugs. Not sure, what kind of problems may occur. I propose to allocate in shmem an array of la

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-05 Thread Amit Kapila

On Tue, Jun 3, 2025 at 6:51 PM Alexander Korotkov wrote: > > > > > As per my understanding, for logical slots, effective_xmin is only set > > during the initial copy phase (or say if one has to export a > > snapshot), after that, its value won't change. Please read the > > comments in CreateInitDe

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-05 Thread Amit Kapila

On Thu, Jun 5, 2025 at 8:51 PM Vitaly Davydov wrote: > > Dear Alexander, Amit > > Alexander Korotkov wrote: > > Also, I've changed ReplicationSlotsComputeRequiredLSN() call to > > CheckPointReplicationSlots() to update required LSN after > > SaveSlotToPath() updated last_saved_restart_lsn. This h

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-05 Thread Vitaly Davydov

Dear Alexander, Amit Alexander Korotkov wrote: > Also, I've changed ReplicationSlotsComputeRequiredLSN() call to > CheckPointReplicationSlots() to update required LSN after > SaveSlotToPath() updated last_saved_restart_lsn. This helps to pass > checks in 001_stream_rep.pl without additional hacks

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-03 Thread Alexander Korotkov

On Mon, Jun 2, 2025 at 2:53 PM Amit Kapila wrote: > > On Thu, May 29, 2025 at 5:29 PM Alexander Korotkov > wrote: > > > > On Tue, May 27, 2025 at 2:26 PM Amit Kapila wrote: > > > Yeah, we should be able to change ABI during beta, but I can't comment > > > on the idea of effective_restart_lsn wi

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-02 Thread Amit Kapila

On Thu, May 29, 2025 at 5:29 PM Alexander Korotkov wrote: > > On Tue, May 27, 2025 at 2:26 PM Amit Kapila wrote: > > Yeah, we should be able to change ABI during beta, but I can't comment > > on the idea of effective_restart_lsn without seeing the patch or a > > detailed explanation of this idea.

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-29 Thread Alexander Korotkov

On Tue, May 27, 2025 at 2:26 PM Amit Kapila wrote: > Yeah, we should be able to change ABI during beta, but I can't comment > on the idea of effective_restart_lsn without seeing the patch or a > detailed explanation of this idea. Could you, please, check the patch [1]. It implements this idea ex

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-27 Thread Amit Kapila

On Tue, May 27, 2025 at 2:48 PM Alexander Korotkov wrote: > > On Tue, May 27, 2025 at 12:12 PM Alexander Korotkov > wrote: > > > > On Tue, May 27, 2025 at 7:08 AM Amit Kapila wrote: > > > On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov > > > wrote: > > > > > > > > On Mon, May 26, 2025 at 2:

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-27 Thread Alexander Korotkov

On Tue, May 27, 2025 at 12:12 PM Alexander Korotkov wrote: > > On Tue, May 27, 2025 at 7:08 AM Amit Kapila wrote: > > On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov > > wrote: > > > > > > On Mon, May 26, 2025 at 2:43 PM Amit Kapila > > > wrote: > > > > > > > > On Mon, May 26, 2025 at 3:52

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-27 Thread Alexander Korotkov

On Tue, May 27, 2025 at 7:08 AM Amit Kapila wrote: > On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov > wrote: > > > > On Mon, May 26, 2025 at 2:43 PM Amit Kapila wrote: > > > > > > On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov > > > wrote: > > > > > OTOH, if we don't want to adjust physic

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-26 Thread Amit Kapila

On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov wrote: > > On Mon, May 26, 2025 at 2:43 PM Amit Kapila wrote: > > > > On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov > > wrote: > > > OTOH, if we don't want to adjust physical > > slot machinery, it seems saving the logical slots to disk immed

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-26 Thread Alexander Korotkov

On Mon, May 26, 2025 at 2:43 PM Amit Kapila wrote: > > On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov > wrote: > > > > Dear Alexander, Amit, All > > > > > Amit wrote: > > > > Is my understanding correct that we need 0001 because > > > > PhysicalConfirmReceivedLocation() doesn't save the slot to

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-26 Thread Vitaly Davydov

Dear Amit, > OTOH, if we don't want to adjust physical > slot machinery, it seems saving the logical slots to disk immediately > when its restart_lsn is updated is a waste of effort after your patch, > no? If so, why are we okay with that? I agree, that saving logical slots at advance is a possib

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-26 Thread Alexander Korotkov

On Mon, May 26, 2025 at 9:49 AM Amit Kapila wrote: > > On Sat, May 24, 2025 at 6:59 PM Alexander Korotkov > wrote: > > > > Hi, Amit! > > > > Thank you for your attention to this patchset! > > > > On Sat, May 24, 2025 at 2:15 PM Amit Kapila wrote: > > > On Sat, May 24, 2025 at 4:08 PM Alexander

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-26 Thread Amit Kapila

On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov wrote: > > Dear Alexander, Amit, All > > > Amit wrote: > > > Is my understanding correct that we need 0001 because > > > PhysicalConfirmReceivedLocation() doesn't save the slot to disk after > > > changing the slot's restart_lsn? > > > > Yes. Also, e

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-26 Thread Vitaly Davydov

Dear Alexander, Amit, All > Amit wrote: > > Is my understanding correct that we need 0001 because > > PhysicalConfirmReceivedLocation() doesn't save the slot to disk after > > changing the slot's restart_lsn? > > Yes. Also, even if it would save slot to the disk, there is still > race condition t

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-25 Thread Amit Kapila

On Sat, May 24, 2025 at 6:59 PM Alexander Korotkov wrote: > > Hi, Amit! > > Thank you for your attention to this patchset! > > On Sat, May 24, 2025 at 2:15 PM Amit Kapila wrote: > > On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov > > wrote: > > > > > > I spend more time on this. The next re

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-24 Thread Alexander Korotkov

Hi, Amit! Thank you for your attention to this patchset! On Sat, May 24, 2025 at 2:15 PM Amit Kapila wrote: > On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov > wrote: > > > > I spend more time on this. The next revision is attached. It > > contains revised comments and other cosmetic chan

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-24 Thread Amit Kapila

On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov wrote: > > I spend more time on this. The next revision is attached. It > contains revised comments and other cosmetic changes. I'm going to > backpatch 0001 to all supported branches, > Is my understanding correct that we need 0001 because Ph

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-24 Thread Alexander Korotkov

On Fri, May 23, 2025 at 12:10 AM Alexander Korotkov wrote: > > Hi, Vitaly! > > On Tue, May 20, 2025 at 6:44 PM Vitaly Davydov > wrote: > > > > Thank you very much for the review! > > > > > The patchset doesn't seem to build after 371f2db8b0, which adjusted > > > the signature of the INJECTION_PO

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-22 Thread Alexander Korotkov

Hi, Vitaly! On Tue, May 20, 2025 at 6:44 PM Vitaly Davydov wrote: > > Thank you very much for the review! > > > The patchset doesn't seem to build after 371f2db8b0, which adjusted > > the signature of the INJECTION_POINT() macro. Could you, please, > > update the patchset accordingly. > > I've u

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-20 Thread Vitaly Davydov

Hi Alexander, Thank you very much for the review! > The patchset doesn't seem to build after 371f2db8b0, which adjusted > the signature of the INJECTION_POINT() macro. Could you, please, > update the patchset accordingly. I've updated the patch (see attached). Thanks. > I see in 0004 patch we'

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-18 Thread Alexander Korotkov

Hi Vitaly! On Fri, May 2, 2025 at 8:47 PM Vitaly Davydov wrote: > Thank you for the attention to the patch. I updated a patch with a better > solution for the master branch which can be easily backported to the other > branches as we agree on the final solution. > > Two tests are introduced which

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-02 Thread Vitaly Davydov

Dear All, Thank you for the attention to the patch. I updated a patch with a better solution for the master branch which can be easily backported to the other branches as we agree on the final solution. Two tests are introduced which are based on Tomas Vondra's test for logical slots with inject

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-29 Thread Masahiko Sawada

On Mon, Apr 28, 2025 at 6:39 PM Alexander Korotkov wrote: > > On Tue, Apr 29, 2025 at 4:03 AM Masahiko Sawada wrote: > > > > On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov > > wrote: > > > > > > > I have a question - is there any interest to backport the solution into > > > > existing major

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-28 Thread Alexander Korotkov

On Tue, Apr 29, 2025 at 4:03 AM Masahiko Sawada wrote: > > On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov > wrote: > > > > > I have a question - is there any interest to backport the solution into > > > existing major releases? > > > > As long as this is the bug, it should be backpatched to

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-28 Thread Masahiko Sawada

On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov wrote: > > > I have a question - is there any interest to backport the solution into > > existing major releases? > > As long as this is the bug, it should be backpatched to all supported > affected releases. Yes, but I think we cannot back-patch

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-28 Thread Alexander Korotkov

On Thu, Apr 24, 2025 at 5:32 PM Vitaly Davydov wrote: > Thank you for the review. I apologize for a late reply. I missed your email. > > > 1) As ReplicationSlotsComputeRequiredLSN() is called each time we need > > to advance the position of WAL needed by replication slots, the usage > > pattern pr

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-24 Thread Vitaly Davydov

Hi Alexander, Thank you for the review. I apologize for a late reply. I missed your email. > 1) As ReplicationSlotsComputeRequiredLSN() is called each time we need > to advance the position of WAL needed by replication slots, the usage > pattern probably could be changed. Thus, we probably need

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-03 Thread Alexander Korotkov

Hi, Vitaly! On Mon, Mar 3, 2025 at 5:12 PM Vitaly Davydov wrote: > The slot data is flushed to the disk at the beginning of checkpoint. If > an existing slot is advanced in the middle of checkpoint execution, its > advanced restart LSN is taken to calculate the oldest LSN for WAL > segments remov

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-03-03 Thread Vitaly Davydov

Dear Hackers, Let me please introduce a new version of the patch. Patch description: The slot data is flushed to the disk at the beginning of checkpoint. If an existing slot is advanced in the middle of checkpoint execution, its advanced restart LSN is taken to calculate the oldest LSN for WAL s

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-12-13 Thread Vitaly Davydov

> On 11/21/24 14:59, Tomas Vondra wrote: > > I don't have a great idea how to improve this. It seems wrong for > ReplicationSlotsComputeRequiredLSN() to calculate the LSN using values > from dirty slots, so maybe it should simply retry if any slot is dirty? > Or retry on that one slot? But various

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Tomas Vondra

On 11/21/24 14:59, Tomas Vondra wrote: > > ... > > But then there's the SQL API - pg_logical_slot_get_changes(). And it > turns out it ends up syncing the slot to disk pretty often, because for > RUNNING_XACTS we call LogicalDecodingProcessRecord() + standby_decode(), > which ends up calling SaveS

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Давыдов Виталий (Vitaly Davydov)

On Thursday, November 21, 2024 17:56 MSK, "Vitaly Davydov" wrote: > I'm trying to create a perl test to reproduce it. Please, give me some time > to create the test script. Attached is the test script which reproduces my problem. It should be run on a patched postgresql with the following cha

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Vitaly Davydov

Hi Tomas, Thank you for the reply and your interest to the investigation. On Wednesday, November 20, 2024 20:24 MSK, Tomas Vondra wrote: > If an existing physical slot is advanced in the middle of checkpoint > execution, WAL segments, which are related to saved on disk restart LSN > may be r

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Tomas Vondra

On 11/20/24 23:19, Tomas Vondra wrote: > On 11/20/24 18:24, Tomas Vondra wrote: >> >> ... >> >> What confuses me a bit is that we update the restart_lsn (and call >> ReplicationSlotsComputeRequiredLSN() to recalculate the global value) >> all the time. Walsender does that in PhysicalConfirmRecei

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Tomas Vondra

On 10/31/24 11:18, Vitaly Davydov wrote: > Dear Hackers, > > > > I'd like to discuss a problem with replication slots's restart LSN. > Physical slots are saved to disk at the beginning of checkpoint. At the > end of checkpoint, old WAL segments are recycled or removed from disk, > if they are n

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Tomas Vondra

On 11/20/24 18:24, Tomas Vondra wrote: > > ... > > What confuses me a bit is that we update the restart_lsn (and call > ReplicationSlotsComputeRequiredLSN() to recalculate the global value) > all the time. Walsender does that in PhysicalConfirmReceivedLocation for > example. So we actually see the

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Tomas Vondra

On 11/20/24 14:40, Vitaly Davydov wrote: > Dear Hackers, > > > > To ping the topic, I'd like to clarify what may be wrong with the idea > described here, because I do not see any interest from the community. > The topic is related to physical replication. The primary idea is to > define the

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Vitaly Davydov

Dear Hackers, To ping the topic, I'd like to clarify what may be wrong with the idea described here, because I do not see any interest from the community. The topic is related to physical replication. The primary idea is to define the horizon of WAL segments (files) removal based on saved on

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-07 Thread Vitaly Davydov

Dear Hackers, I'd like to introduce an improved version of my patch (see the attached file). My original idea was to take into account saved on disk restart_lsn (slot->restart_lsn_flushed) for persistent slots when removing WAL segment files. It helps tackle errors like: ERROR: requested WAL s
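The idea in the message above can be sketched as a minimum over slots: take each persistent slot's on-disk `restart_lsn` into account when deciding which WAL segments may be removed. This is a hedged toy model, not the patch itself. `ToySlot` and `compute_required_lsn()` are hypothetical, and `restart_lsn_flushed` is the field name used in the message. Persistent slots contribute the older of their in-memory and flushed values, because the flushed value is what gets restored after a crash. Non-persistent slots do not survive a crash, so they contribute only their in-memory value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;                 /* mirrors PostgreSQL's 64-bit LSN type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)   /* 0 means "not set", as in PostgreSQL */

/* Hypothetical, heavily simplified slot. */
typedef struct
{
    bool       in_use;
    bool       persistent;           /* saved to disk, survives a crash */
    XLogRecPtr restart_lsn;          /* in-memory value */
    XLogRecPtr restart_lsn_flushed;  /* value last saved to disk */
} ToySlot;

/* Oldest LSN that WAL removal must preserve, or InvalidXLogRecPtr if none. */
static XLogRecPtr
compute_required_lsn(const ToySlot *slots, size_t nslots)
{
    XLogRecPtr min_required = InvalidXLogRecPtr;

    for (size_t i = 0; i < nslots; i++)
    {
        const ToySlot *s = &slots[i];
        XLogRecPtr lsn;

        if (!s->in_use || s->restart_lsn == InvalidXLogRecPtr)
            continue;

        lsn = s->restart_lsn;

        /* After a crash a persistent slot restarts from its on-disk value. */
        if (s->persistent &&
            s->restart_lsn_flushed != InvalidXLogRecPtr &&
            s->restart_lsn_flushed < lsn)
            lsn = s->restart_lsn_flushed;

        if (min_required == InvalidXLogRecPtr || lsn < min_required)
            min_required = lsn;
    }
    return min_required;
}
```

For example, take three slots: a persistent slot with in-memory 500 and flushed 300, a non-persistent slot at 400 (its flushed value is ignored), and an unused slot. `compute_required_lsn()` returns 300.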

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-10-31 Thread Vitaly Davydov

Sorry, attached the missed patch. On Thursday, October 31, 2024 13:18 MSK, "Vitaly Davydov" wrote: Dear Hackers, I'd like to discuss a problem with replication slots's restart LSN. Physical slots are saved to disk at the beginning of checkpoint. At the end of checkpoint, old WAL segments a
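The race described in the original report can be modeled in a few lines of C. A slot is flushed to disk at the start of a checkpoint. It is then advanced in memory while the checkpoint runs. At the end of the checkpoint, WAL segments older than the horizon are removed. If that horizon comes from the advanced in-memory value, the slot restored from disk after a crash points into a removed segment. This is a minimal sketch, assuming the default 16MB segment size. `ToySlot`, `simulate_checkpoint()` and `wal_available_after_crash()` are hypothetical names, and real checkpoints involve much more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* mirrors PostgreSQL's 64-bit LSN type */

/* Assumes the default 16MB WAL segment size. */
#define SEG_SIZE ((XLogRecPtr) 16 * 1024 * 1024)

typedef struct
{
    XLogRecPtr restart_lsn;        /* in-memory value */
    XLogRecPtr saved_restart_lsn;  /* value written at the start of the last checkpoint */
} ToySlot;

/* Segment number that contains an LSN (the idea behind XLByteToSeg). */
static uint64_t
lsn_to_seg(XLogRecPtr lsn)
{
    return lsn / SEG_SIZE;
}

/*
 * One toy checkpoint; returns the oldest segment kept afterwards.
 * horizon_from_disk = false models the behavior reported in the thread.
 */
static uint64_t
simulate_checkpoint(ToySlot *slot, XLogRecPtr advance_to, bool horizon_from_disk)
{
    XLogRecPtr horizon;

    /* This toy only models a slot moving forward during the checkpoint. */
    assert(advance_to >= slot->restart_lsn);

    /* 1. Start of checkpoint: the slot is flushed to disk. */
    slot->saved_restart_lsn = slot->restart_lsn;

    /* 2. While the checkpoint runs, the slot is advanced in memory. */
    slot->restart_lsn = advance_to;

    /* 3. End of checkpoint: segments before the horizon's segment are removed. */
    horizon = horizon_from_disk ? slot->saved_restart_lsn : slot->restart_lsn;
    return lsn_to_seg(horizon);
}

/* After a crash the slot is restored from disk; is its segment still present? */
static bool
wal_available_after_crash(const ToySlot *slot, uint64_t oldest_kept_seg)
{
    return lsn_to_seg(slot->saved_restart_lsn) >= oldest_kept_seg;
}
```

Suppose a slot starts in segment 1 and is advanced to segment 5 mid-checkpoint. With the in-memory horizon, segments before 5 are removed, but the restored slot still needs segment 1. With the on-disk horizon, segment 1 is kept.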
