Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-02 Thread Vitaly Davydov
Dear All, Thank you for the attention to the patch. I updated the patch with a better solution for the master branch, which can be easily backported to the other branches once we agree on the final solution. Two tests are introduced which are based on Tomas Vondra's test for logical slots with inject

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-29 Thread Masahiko Sawada
On Mon, Apr 28, 2025 at 6:39 PM Alexander Korotkov wrote: > > On Tue, Apr 29, 2025 at 4:03 AM Masahiko Sawada wrote: > > > > On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov > > wrote: > > > > > > > I have a question - is there any interest to backport the solution into > > > > existing major

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-28 Thread Alexander Korotkov
On Tue, Apr 29, 2025 at 4:03 AM Masahiko Sawada wrote: > > On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov > wrote: > > > > > I have a question - is there any interest to backport the solution into > > > existing major releases? > > > > As long as this is the bug, it should be backpatched to

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-28 Thread Masahiko Sawada
On Mon, Apr 28, 2025 at 8:17 AM Alexander Korotkov wrote: > > > I have a question - is there any interest to backport the solution into > > existing major releases? > > As long as this is the bug, it should be backpatched to all supported > affected releases. Yes, but I think we cannot back-patch

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-28 Thread Alexander Korotkov
On Thu, Apr 24, 2025 at 5:32 PM Vitaly Davydov wrote: > Thank you for the review. I apologize for a late reply. I missed your email. > > > 1) As ReplicationSlotsComputeRequiredLSN() is called each time we need > > to advance the position of WAL needed by replication slots, the usage > > pattern pr

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-24 Thread Vitaly Davydov
Hi Alexander, Thank you for the review. I apologize for a late reply. I missed your email. > 1) As ReplicationSlotsComputeRequiredLSN() is called each time we need > to advance the position of WAL needed by replication slots, the usage > pattern probably could be changed. Thus, we probably need

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-03 Thread Alexander Korotkov
Hi, Vitaly! On Mon, Mar 3, 2025 at 5:12 PM Vitaly Davydov wrote: > The slot data is flushed to the disk at the beginning of checkpoint. If > an existing slot is advanced in the middle of checkpoint execution, its > advanced restart LSN is taken to calculate the oldest LSN for WAL > segments remov

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-03-03 Thread Vitaly Davydov
Dear Hackers, Let me please introduce a new version of the patch. Patch description: The slot data is flushed to the disk at the beginning of checkpoint. If an existing slot is advanced in the middle of checkpoint execution, its advanced restart LSN is taken to calculate the oldest LSN for WAL s

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-12-13 Thread Vitaly Davydov
> On 11/21/24 14:59, Tomas Vondra wrote: > > I don't have a great idea how to improve this. It seems wrong for > ReplicationSlotsComputeRequiredLSN() to calculate the LSN using values > from dirty slots, so maybe it should simply retry if any slot is dirty? > Or retry on that one slot? But various

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Tomas Vondra
On 11/21/24 14:59, Tomas Vondra wrote: > > ... > > But then there's the SQL API - pg_logical_slot_get_changes(). And it > turns out it ends up syncing the slot to disk pretty often, because for > RUNNING_XACTS we call LogicalDecodingProcessRecord() + standby_decode(), > which ends up calling SaveS

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Давыдов Виталий
On Thursday, November 21, 2024 17:56 MSK, "Vitaly Davydov" wrote: > I'm trying to create a perl test to reproduce it. Please, give me some time > to create the test script. Attached is the test script which reproduces my problem. It should be run on a patched postgresql with the following cha

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Vitaly Davydov
Hi Tomas, Thank you for the reply and your interest in the investigation. On Wednesday, November 20, 2024 20:24 MSK, Tomas Vondra wrote: > If an existing physical slot is advanced in the middle of checkpoint > execution, WAL segments, which are related to saved on disk restart LSN > may be r

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-21 Thread Tomas Vondra
On 11/20/24 23:19, Tomas Vondra wrote: > On 11/20/24 18:24, Tomas Vondra wrote: >> >> ... >> >> What confuses me a bit is that we update the restart_lsn (and call >> ReplicationSlotsComputeRequiredLSN() to recalculate the global value) >> all the time. Walsender does that in PhysicalConfirmRecei

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Tomas Vondra
On 10/31/24 11:18, Vitaly Davydov wrote: > Dear Hackers, > >   > > I'd like to discuss a problem with replication slots's restart LSN. > Physical slots are saved to disk at the beginning of checkpoint. At the > end of checkpoint, old WAL segments are recycled or removed from disk, > if they are n

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Tomas Vondra
On 11/20/24 18:24, Tomas Vondra wrote: > > ... > > What confuses me a bit is that we update the restart_lsn (and call > ReplicationSlotsComputeRequiredLSN() to recalculate the global value) > all the time. Walsender does that in PhysicalConfirmReceivedLocation for > example. So we actually see the

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Tomas Vondra
On 11/20/24 14:40, Vitaly Davydov wrote: > Dear Hackers, > >   > > To ping the topic, I'd like to clarify what may be wrong with the idea > described here, because I do not see any interest from the community. > The topic is related to physical replication. The primary idea is to > define the

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-20 Thread Vitaly Davydov
Dear Hackers,   To ping the topic, I'd like to clarify what may be wrong with the idea described here, because I do not see any interest from the community. The topic is related to physical replication. The primary idea is to define the horizon of WAL segments (files) removal based on saved on

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-11-07 Thread Vitaly Davydov
Dear Hackers, I'd like to introduce an improved version of my patch (see the attached file). My original idea was to take into account the restart_lsn saved on disk (slot->restart_lsn_flushed) for persistent slots when removing WAL segment files. It helps tackle errors like: ERROR: requested WAL s

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-10-31 Thread Vitaly Davydov
Sorry, attaching the missing patch. On Thursday, October 31, 2024 13:18 MSK, "Vitaly Davydov" wrote: Dear Hackers, I'd like to discuss a problem with replication slots's restart LSN. Physical slots are saved to disk at the beginning of checkpoint. At the end of checkpoint, old WAL segments a

Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2024-10-31 Thread Vitaly Davydov
Dear Hackers, I'd like to discuss a problem with replication slots' restart LSN. Physical slots are saved to disk at the beginning of a checkpoint. At the end of the checkpoint, old WAL segments are recycled or removed from disk if they are not retained by the slots' restart_lsn values. If an existing
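
The ordering problem described in this thread (slot saved at the start of a checkpoint, WAL removed at the end using the in-memory restart_lsn) can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL code: the Slot class, the checkpoint() and crash_and_restart() helpers, the 16-unit segment size, and the "take the minimum of the in-memory and flushed values" rule are all hypothetical simplifications. Only the field name restart_lsn_flushed is taken from the messages above, and the attached patches may well work differently.

```python
from dataclasses import dataclass

# Toy segment size in LSN units. Hypothetical value, chosen for readability.
SEGMENT_SIZE = 16


@dataclass
class Slot:
    restart_lsn: int          # in-memory value, advanced while the server runs
    restart_lsn_flushed: int  # value last written to disk


def segment_of(lsn: int) -> int:
    """Return the number of the toy WAL segment that contains lsn."""
    return lsn // SEGMENT_SIZE


def checkpoint(slot, wal_segments, advance_to=None, use_flushed=False):
    """Simulate one checkpoint and return the set of segments that survive.

    advance_to: if given, the slot is advanced in memory mid-checkpoint.
    use_flushed: if True, the removal horizon also honors the on-disk value
                 (an assumed reading of the idea discussed in the thread).
    """
    # Step 1: slot data is saved to disk at the beginning of the checkpoint.
    slot.restart_lsn_flushed = slot.restart_lsn

    # Step 2: the slot is advanced in memory only; nothing is flushed here.
    if advance_to is not None:
        slot.restart_lsn = advance_to

    # Step 3: old segments are removed at the end of the checkpoint.
    if use_flushed:
        horizon = min(slot.restart_lsn, slot.restart_lsn_flushed)
    else:
        horizon = slot.restart_lsn
    return {seg for seg in wal_segments if seg >= segment_of(horizon)}


def crash_and_restart(slot):
    """After a hard restart only the on-disk value is available."""
    slot.restart_lsn = slot.restart_lsn_flushed
```

For example, with segments 0..5, a slot at LSN 20, and an advance to LSN 70 during the checkpoint, the default path keeps only segments 4 and 5, so after crash_and_restart() the slot points into segment 1, which no longer exists. With use_flushed=True the same scenario keeps segments 1..5.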