Re: Review for GetWALAvailability()

2020-06-26 Thread Alvaro Herrera
On 2020-Jun-25, Kyotaro Horiguchi wrote: > It is error-prone restriction, as discussed before. > > Without changing updator-side, invalid restart_lsn AND valid > invalidated_at can be regarded as the lost state. With the following > change suggested by Fujii-san we can avoid the confusing status.

Re: Review for GetWALAvailability()

2020-06-25 Thread Kyotaro Horiguchi
At Thu, 25 Jun 2020 14:35:34 +0900, Fujii Masao wrote in > > > On 2020/06/25 12:57, Alvaro Herrera wrote: > > On 2020-Jun-25, Fujii Masao wrote: > > > >>/* > >> * Find the oldest extant segment file. We get 1 until checkpoint > >> removes > >> * the first WAL segment file since s

Re: Review for GetWALAvailability()

2020-06-24 Thread Fujii Masao
On 2020/06/25 12:57, Alvaro Herrera wrote: On 2020-Jun-25, Fujii Masao wrote: /* * Find the oldest extant segment file. We get 1 until checkpoint removes * the first WAL segment file since startup, which causes the status being * wrong under certain abnor

Re: Review for GetWALAvailability()

2020-06-24 Thread Alvaro Herrera
On 2020-Jun-25, Fujii Masao wrote: > /* >* Find the oldest extant segment file. We get 1 until checkpoint > removes >* the first WAL segment file since startup, which causes the status > being >* wrong under certain abnormal conditions but that doesn't actually > h

Re: Review for GetWALAvailability()

2020-06-24 Thread Fujii Masao
On 2020/06/25 3:27, Alvaro Herrera wrote: Thanks for those corrections. I have pushed this. I think all problems Masao-san reported have been dealt with, so we're done here. Sorry for my late to reply here... Thanks for committing the patch and improving the feature! /*

Re: Review for GetWALAvailability()

2020-06-24 Thread Alvaro Herrera
Thanks for those corrections. I have pushed this. I think all problems Masao-san reported have been dealt with, so we're done here. Thanks! -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Review for GetWALAvailability()

2020-06-23 Thread Kyotaro Horiguchi
At Tue, 23 Jun 2020 19:06:25 -0400, Alvaro Herrera wrote in > On 2020-Jun-16, Kyotaro Horiguchi wrote: > > > I saw that the "reserved" is the state where slots are working to > > retain segments, and "normal" is the state to indicate that "WAL > > segments are within max_wal_size", which is ort

Re: Review for GetWALAvailability()

2020-06-23 Thread Alvaro Herrera
On 2020-Jun-16, Kyotaro Horiguchi wrote: > I saw that the "reserved" is the state where slots are working to > retain segments, and "normal" is the state to indicate that "WAL > segments are within max_wal_size", which is orthogonal to the notion > of "reserved". So it seems to me useless when th

Re: Review for GetWALAvailability()

2020-06-23 Thread Kyotaro Horiguchi
Thanks for looking this. At Fri, 19 Jun 2020 18:23:59 -0400, Alvaro Herrera wrote in > On 2020-Jun-17, Kyotaro Horiguchi wrote: > > > @@ -9524,7 +9533,7 @@ GetWALAvailability(XLogRecPtr targetLSN) > > * the first WAL segment file since startup, which causes the status > > being > >

Re: Review for GetWALAvailability()

2020-06-19 Thread Alvaro Herrera
On 2020-Jun-17, Kyotaro Horiguchi wrote: > @@ -9524,7 +9533,7 @@ GetWALAvailability(XLogRecPtr targetLSN) >* the first WAL segment file since startup, which causes the status > being >* wrong under certain abnormal conditions but that doesn't actually > harm. >*/ > -

Re: Review for GetWALAvailability()

2020-06-19 Thread Fujii Masao
On 2020/06/18 16:36, Kyotaro Horiguchi wrote: Mmm. I hurried too much.. At Thu, 18 Jun 2020 16:32:59 +0900 (JST), Kyotaro Horiguchi wrote in If name is specified (so slot is NULL) to ReplicationSlotAcquireInternal and the slot is not found, the ereport in following code dereferences NULL.

Re: Review for GetWALAvailability()

2020-06-18 Thread Kyotaro Horiguchi
Mmm. I hurried too much.. At Thu, 18 Jun 2020 16:32:59 +0900 (JST), Kyotaro Horiguchi wrote in > If name is specified (so slot is NULL) to > ReplicationSlotAcquireInternal and the slot is not found, the ereport > in following code dereferences NULL. That's bogus. It is using name in that case.

Re: Review for GetWALAvailability()

2020-06-18 Thread Kyotaro Horiguchi
At Thu, 18 Jun 2020 14:54:47 +0900, Fujii Masao wrote in > Sorry, this caused compiler failure. So I fixed that and > attached the updated version of the patch. At Thu, 18 Jun 2020 14:40:55 +0900, Fujii Masao wrote in > > + errmsg("replication slot \"%s\" does not exist", name))); > > The er

Re: Review for GetWALAvailability()

2020-06-17 Thread Fujii Masao
On 2020/06/18 14:40, Fujii Masao wrote: On 2020/06/18 11:44, Kyotaro Horiguchi wrote: At Wed, 17 Jun 2020 20:13:01 +0900, Fujii Masao wrote in ReplicationSlotAcquireInternal.  I think we should call ConditionVariablePrepareToSleep before the sorrounding for statement block. OK, so what

Re: Review for GetWALAvailability()

2020-06-17 Thread Fujii Masao
On 2020/06/18 11:44, Kyotaro Horiguchi wrote: At Wed, 17 Jun 2020 20:13:01 +0900, Fujii Masao wrote in ReplicationSlotAcquireInternal. I think we should call ConditionVariablePrepareToSleep before the sorrounding for statement block. OK, so what about the attached patch? I added Condition

Re: Review for GetWALAvailability()

2020-06-17 Thread Kyotaro Horiguchi
At Wed, 17 Jun 2020 20:13:01 +0900, Fujii Masao wrote in > > ReplicationSlotAcquireInternal. I think we should call > > ConditionVariablePrepareToSleep before the sorrounding for statement > > block. > > OK, so what about the attached patch? I added > ConditionVariablePrepareToSleep() > just b

Re: Review for GetWALAvailability()

2020-06-17 Thread Fujii Masao
On 2020/06/18 3:04, Alvaro Herrera wrote: I think passing the slot name when the slot is also passed is useless and wasteful; it'd be better to pass NULL for the name and ignore the strcmp() in that case -- in fact I suggest to forbid passing both name and slot. (Any failure there would risk

Re: Review for GetWALAvailability()

2020-06-17 Thread Alvaro Herrera
On 2020-Jun-17, Kyotaro Horiguchi wrote: > @@ -342,7 +351,14 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) > else > nulls[i++] = true; > > - walstate = GetWALAvailability(slot_contents.data.restart_lsn); > + /* use last_invalidated_lsn

Re: Review for GetWALAvailability()

2020-06-17 Thread Alvaro Herrera
I think passing the slot name when the slot is also passed is useless and wasteful; it'd be better to pass NULL for the name and ignore the strcmp() in that case -- in fact I suggest to forbid passing both name and slot. (Any failure there would risk raising an error during checkpoint, which is un

Re: Review for GetWALAvailability()

2020-06-17 Thread Fujii Masao
On 2020/06/17 17:30, Kyotaro Horiguchi wrote: At Wed, 17 Jun 2020 17:01:11 +0900, Fujii Masao wrote in On 2020/06/17 12:10, Kyotaro Horiguchi wrote: At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera wrote in On 2020-Jun-17, Fujii Masao wrote: On 2020/06/17 3:50, Alvaro Herrera wrote:

Re: Review for GetWALAvailability()

2020-06-17 Thread Kyotaro Horiguchi
At Wed, 17 Jun 2020 17:01:11 +0900, Fujii Masao wrote in > > > On 2020/06/17 12:10, Kyotaro Horiguchi wrote: > > At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera > > wrote in > >> On 2020-Jun-17, Fujii Masao wrote: > >>> On 2020/06/17 3:50, Alvaro Herrera wrote: > >> > >>> So InvalidateObso

Re: Review for GetWALAvailability()

2020-06-17 Thread Fujii Masao
On 2020/06/17 12:10, Kyotaro Horiguchi wrote: At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera wrote in On 2020-Jun-17, Fujii Masao wrote: On 2020/06/17 3:50, Alvaro Herrera wrote: So InvalidateObsoleteReplicationSlots() can terminate normal backends. But do we want to do this? If we w

Re: Review for GetWALAvailability()

2020-06-16 Thread Kyotaro Horiguchi
At Wed, 17 Jun 2020 10:17:07 +0900 (JST), Kyotaro Horiguchi wrote in > At Tue, 16 Jun 2020 14:31:43 -0400, Alvaro Herrera > wrote in > > On 2020-Jun-16, Kyotaro Horiguchi wrote: > > > > > I noticed the another issue. If some required WALs are removed, the > > > slot will be "invalidated", th

Re: Review for GetWALAvailability()

2020-06-16 Thread Kyotaro Horiguchi
At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera wrote in > On 2020-Jun-17, Fujii Masao wrote: > > On 2020/06/17 3:50, Alvaro Herrera wrote: > > > So InvalidateObsoleteReplicationSlots() can terminate normal backends. > > But do we want to do this? If we want, we should add the note about thi

Re: Review for GetWALAvailability()

2020-06-16 Thread Alvaro Herrera
On 2020-Jun-17, Fujii Masao wrote: > On 2020/06/17 3:50, Alvaro Herrera wrote: > So InvalidateObsoleteReplicationSlots() can terminate normal backends. > But do we want to do this? If we want, we should add the note about this > case into the docs? Otherwise the users would be surprised at termina

Re: Review for GetWALAvailability()

2020-06-16 Thread Fujii Masao
On 2020/06/17 3:50, Alvaro Herrera wrote: On 2020-Jun-17, Fujii Masao wrote: While reading InvalidateObsoleteReplicationSlots() code, I found another issue. ereport(LOG, (errmsg("terminating walsender %d because replication sl

Re: Review for GetWALAvailability()

2020-06-16 Thread Kyotaro Horiguchi
At Tue, 16 Jun 2020 14:31:43 -0400, Alvaro Herrera wrote in > On 2020-Jun-16, Kyotaro Horiguchi wrote: > > > I noticed the another issue. If some required WALs are removed, the > > slot will be "invalidated", that is, restart_lsn is set to invalid > > value. As the result we hardly see the "los

Re: Review for GetWALAvailability()

2020-06-16 Thread Kyotaro Horiguchi
At Wed, 17 Jun 2020 00:46:38 +0900, Fujii Masao wrote in > >> 2. InvalidateObsoleteReplicationSlots() uses the loop to scan > >> replication > >> slots array and uses the "for" loop in each scan. Also it calls > >> ReplicationSlot

Re: Review for GetWALAvailability()

2020-06-16 Thread Alvaro Herrera
On 2020-Jun-17, Fujii Masao wrote: > While reading InvalidateObsoleteReplicationSlots() code, I found another > issue. > > ereport(LOG, > (errmsg("terminating walsender %d > because replication slot \"%s\" is too far behind", >

Re: Review for GetWALAvailability()

2020-06-16 Thread Alvaro Herrera
On 2020-Jun-16, Kyotaro Horiguchi wrote: > I noticed the another issue. If some required WALs are removed, the > slot will be "invalidated", that is, restart_lsn is set to invalid > value. As the result we hardly see the "lost" state. > > It can be "fixed" by remembering the validity of a slot se

Re: Review for GetWALAvailability()

2020-06-16 Thread Fujii Masao
On 2020/06/16 14:00, Kyotaro Horiguchi wrote: At Tue, 16 Jun 2020 01:46:21 +0900, Fujii Masao wrote in In short, it is known behavior but it was judged as useless to prevent that. That can happen when checkpointer removes up to the segment that is being read by walsender. I think that that

Re: Review for GetWALAvailability()

2020-06-15 Thread Kyotaro Horiguchi
At Tue, 16 Jun 2020 01:46:21 +0900, Fujii Masao wrote in > > In short, it is known behavior but it was judged as useless to prevent > > that. > > That can happen when checkpointer removes up to the segment that is > > being read by walsender. I think that that doesn't happen (or > > happenswith

Re: Review for GetWALAvailability()

2020-06-15 Thread Kyotaro Horiguchi
At Mon, 15 Jun 2020 18:59:49 +0900, Fujii Masao wrote in > > It was a kind of hard to decide. Even when max_slot_wal_keep_size is > > smaller than max_wal_size, the segments more than > > max_slot_wal_keep_size are not guaranteed to be kept. In that case > > the state transits as NORMAL->LOST s

Re: Review for GetWALAvailability()

2020-06-15 Thread Fujii Masao
On 2020/06/15 13:42, Kyotaro Horiguchi wrote: At Sat, 13 Jun 2020 01:38:49 +0900, Fujii Masao wrote in Hi, The document explains that "lost" value that pg_replication_slots.wal_status reports means some WAL files are definitely lost and this slot cannot be used to resume replicat

Re: Review for GetWALAvailability()

2020-06-15 Thread Fujii Masao
On 2020/06/15 13:42, Kyotaro Horiguchi wrote: At Sat, 13 Jun 2020 01:38:49 +0900, Fujii Masao wrote in Hi, The document explains that "lost" value that pg_replication_slots.wal_status reports means some WAL files are definitely lost and this slot cannot be used to resume replica

Re: Review for GetWALAvailability()

2020-06-15 Thread Kyotaro Horiguchi
At Mon, 15 Jun 2020 13:42:25 +0900 (JST), Kyotaro Horiguchi wrote in > Oops! I don't want to believe I did that but it's definitely wrong. Hmm. Quite disappointing. The patch was just a crap. This is the right patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a

Re: Review for GetWALAvailability()

2020-06-14 Thread Kyotaro Horiguchi
At Sat, 13 Jun 2020 01:38:49 +0900, Fujii Masao wrote in > Hi, > > The document explains that "lost" value that > pg_replication_slots.wal_status reports means > > some WAL files are definitely lost and this slot cannot be used to > resume replication anymore. > > However, I observed

Review for GetWALAvailability()

2020-06-12 Thread Fujii Masao
Hi, The document explains that "lost" value that pg_replication_slots.wal_status reports means some WAL files are definitely lost and this slot cannot be used to resume replication anymore. However, I observed "lost" value while inserting lots of records, but replication could continue nor