> What does `patronictl list` show during that interval?

Well, I can't reproduce the situation anymore. Replication now starts
immediately after I start Patroni on the secondary. I have run several
switchover commands in the meantime, though.

In the meantime I ran another test: a Java app executes a large number of
*short* transactions (inserts), and while the app is running I issue the
Patroni switchover command:

patronictl -c /etc/patroni/patroni.yml switchover

It turned out that the records were not replicated to the secondary, and
when I tried to execute the switchover command on the primary I got the
following error:
Error: This cluster has no master

When I executed the switchover command on the secondary instead, it worked.
However, because the primary and the secondary had diverged, the records on
the old primary were rolled back: the number of records on both nodes became
the same, equal to what it had been on the old secondary.

Apparently there is something wrong with my cluster. How can I debug it? Do
I need to configure anything to make the replication synchronous?
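
One way to put a number on such a discrepancy, offered as a sketch under
assumptions rather than something taken from this cluster: run
`SELECT pg_current_wal_lsn()` on the primary and
`SELECT pg_last_wal_replay_lsn()` on the secondary (both functions exist in
PostgreSQL 10 and later) and compare the two positions. An LSN such as
0/7000000 is two hexadecimal numbers, the high and low 32 bits of a byte
position in the WAL. The helper names `parse_lsn` and `lag_bytes` and the
sample values below are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/7000000' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after it the low 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Number of WAL bytes the replica is behind the primary (0 means caught up)."""
    return parse_lsn(primary_lsn) - parse_lsn(replica_lsn)


if __name__ == "__main__":
    # Hypothetical values: paste the results of the two queries here.
    print(lag_bytes("0/7000000", "0/6FFFF00"))
```

If the difference keeps growing while the Java app is inserting, the
secondary is not receiving or replaying WAL. The server-side function
pg_wal_lsn_diff() performs the same subtraction.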





On Fri, 29 Apr 2022 at 22:33, Peter J. Holzer <hjp-pg...@hjp.at> wrote:

> On 2022-04-28 11:09:12 +0200, Zb B wrote:
> > > When the secondary starts up it should continue replicating from where
> > > it stopped. However, it can only do this if the necessary information
> > > is still available. If WAL files have been deleted in the meantime, it
> > > can't replay them. There should be error messages in your logs on what
> > > went wrong.
> >
> > I did another test using a different wal_sender_timeout value, as the
> > secondary had been shut down for longer than the default 60s for this
> > parameter.
>
> I don't think this will help. It will just make the primary slower in
> noticing that the secondary is gone.
>
>
> > I was hoping it would help but the result was the same (records were not
> > replicated to the secondary after the patroni start). Well, I just
> > verified again that the records were replicated after about 15 minutes
> > to the secondary, so probably the timeout setting helped, or I was not
> > patient enough before.
>
> The latter, I suspect. Although I'm surprised that it takes so long. In
> my experience, that takes only a few seconds, certainly less than a
> minute for replication to start (how long it takes to finish depends on
> the amount of data, of course).
>
> Patroni can nuke the secondary database and create a fresh copy
> (using basebackup). That might take 15 minutes (depending on the
> database size). I don't think it does that automatically, though. Also I
> think you would have noticed that.
>
> What does `patronictl list` show during that interval?
>
>
> > Is it normal to wait so long for the replication? (the original
> > transaction in primary took about 5 minutes and was about 3000 small
> > records). I am providing more details for completeness below:
> >
> > I get the following errors on the primary DB:
> > 2022-04-28 04:36:50.544 EDT [13794] WARNING:  archive_mode enabled, yet archive_command is not set
> > 2022-04-28 04:37:34.893 EDT [14755] ERROR:  replication slot "xyzd3riardb05" does not exist
> > 2022-04-28 04:37:34.893 EDT [14755] STATEMENT:  START_REPLICATION SLOT "xyzd3riardb05" 0/7000000 TIMELINE 18
> ...
> > and after some time these errors stop appearing.
>
> So the replication slot is probably created after some time and then
> replication starts to work.
>
> I think that replication slot is managed by Patroni. So the question
> would be: Why does Patroni take so long to create it? Did it log
> anything?
>
>         hp
>
> --
>    _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |       challenge!"
>
