The full setup is:

Before migration:
11 primary -> 11 hot standby via binary

During migration:
11 primary -> 11 hot standby via binary
  |-> 12 new instance via logical
        |-> 12 new replica via binary

After migration:
12 primary
  |-> 12 replica via binary
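
For anyone reproducing a setup like this, here is a minimal sketch of queries that could confirm which replication links exist at each stage. These were not run in this thread. The sketch assumes a PostgreSQL 11 publisher and a PostgreSQL 12 subscriber, uses only the standard system views and catalogs, and which rows appear depends on how the standbys are actually configured.

```sql
-- On the 11 primary (publisher): one row per replication slot.
-- A logical slot serves the subscription on the 12 instance;
-- physical slots appear only if the binary standby uses one.
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

-- On either primary: currently connected WAL senders and their progress.
SELECT application_name, state, sent_lsn, replay_lsn
FROM pg_stat_replication;

-- On the 12 instance (subscriber): per-table sync state of each subscription.
-- srsubstate: i = initialize, d = data is being copied,
--             s = synchronized, r = ready (normal replication).
SELECT s.subname,
       sr.srrelid::regclass AS table_name,
       sr.srsubstate,
       sr.srsublsn
FROM pg_subscription_rel AS sr
JOIN pg_subscription AS s ON s.oid = sr.srsubid;
```

A table still in state `i` or `d` has not finished its initial copy.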


On Tue, Dec 22, 2020 at 7:16 PM Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 12/22/20 9:10 AM, Lars Vonk wrote:
> >     Did you have some other replication running on the 11 instance?
> >
> >
> > Yes, the 11 instance also had another (11) replica running (but these
> > logs are from the 12 instance).
>
> The 11 instance had the data that went missing in the 12 instance, so
> what shows up in logs for the 11 instance during this period that is
> relevant?
>
> >
> > The new 12 instance also had a replica running.
>
> So the setup was?:
>
> 1) 11 primary --> 11 standby via what replication, logical or binary?
>      | --> 12 new instance via logical
>
> 2) 12 (new) primary --> 12(?) standby via what replication, logical or
> binary?
>
> >
> >     In any case, what was the command logged just before the ERROR?
> >
> >
> > There is nothing logged.
> >
> > These are the only log statements just before the error message; one
> > second later the ERROR is logged:
> >
> > 2020-12-10 13:26:43 UTC::@:[5537]:LOG:  checkpoints are occurring too frequently (20 seconds apart)
> > 2020-12-10 13:26:43 UTC::@:[5537]:HINT:  Consider increasing the configuration parameter "max_wal_size".
> > 2020-12-10 13:26:43 UTC::@:[5537]:LOG:  checkpoint starting: wal
> >
> > Lars
> >
> > On Mon, Dec 21, 2020 at 11:51 PM Adrian Klaver
> > <adrian.kla...@aklaver.com> wrote:
> >
> >     On 12/21/20 2:42 PM, Lars Vonk wrote:
> >      >     What was being run when the above ERROR was triggered?
> >      >
> >      >
> >      > The initial copy of a table. Other than that we ran select
> >      > pg_size_pretty(pg_relation_size('table_name')) to see the current
> >      > size of the table being copied to get a feeling for progress.
> >      >
> >      > And whenever we added a new table to the publication we ran ALTER
> >      > SUBSCRIPTION migration REFRESH PUBLICATION; to add any new table to
> >      > the subscription. But not around that timestamp: it was about 50
> >      > minutes before the first occurrence of that ERROR (no ERRORs after
> >      > prior ALTER SUBSCRIPTIONs).
> >      >
> >      > But after the initial copies ended there are more ERRORs on
> >      > different WAL segments missing. Each missing WAL segment is logged
> >      > as ERROR a couple of times and then no more. After a couple of
> >      > hours no errors are logged.
> >
> >     Something was looking for the WAL segment.
> >
> >     Did you have some other replication running on the 11 instance?
> >
> >     In any case, what was the command logged just before the ERROR?
> >
> >      >
> >      > Lars
> >      >
> >
> >
> >     --
> >     Adrian Klaver
> >     adrian.kla...@aklaver.com
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
