> What does `patronictl list` show during that interval?

Well, I can't reproduce the situation anymore. Now replication starts immediately after starting Patroni on the secondary. I did several switchovers in the meantime, though.

Meanwhile I did another test where I ran a Java app with a large number of *short* transactions (inserts), and while the app was running I issued the Patroni switchover command:

patronictl -c /etc/patroni/patroni.yml switchover

It turned out the records were not replicated to the secondary, and when I tried to execute the switchover command on the primary I got the following error:

Error: This cluster has no master

When I tried to execute the switchover command on the secondary it worked, but because there was a discrepancy between the primary and the secondary, the records on the old primary were rolled back (the number of records on the primary and the secondary became the same, equal to what it had been on the old secondary).

Apparently there is something wrong with my cluster. How do I debug it? Do I need to configure anything so that the replication is synchronous?

On Fri, 29 Apr 2022 at 22:33, Peter J. Holzer <hjp-pg...@hjp.at> wrote:

> On 2022-04-28 11:09:12 +0200, Zb B wrote:
> > > When the secondary starts up it should continue replicating from where
> > > it stopped. However, it can only do this if the necessary information is
> > > still available. If WAL files have been deleted in the mean time, it
> > > can't replay them. There should be error messages in your logs on what
> > > went wrong
> >
> > I did another test using different wal_sender_timeout parameter, as the time of
> > the secondary being shut down was longer than the default 60s for this
> > parameter.
>
> I don't think this will help. It will just make the primary slower in
> noticing that the secondary is gone.
>
> > I was hoping it would help but the result was the same (records were not
> > replicated to the secondary after the patroni start). Well, I just verified
> > again that the records were replicated after about 15 minutes to the secondary,
> > so probably the timeout setting helped, or I was not patient enough before.
>
> The latter, I suspect. Although I'm surprised that it takes so long. In
> my experience, that takes only a few seconds, certainly less than a
> minute for replication to start (how long it takes to finish depends on
> the amount of data, of course).
>
> Patroni can nuke the secondary database and create a fresh copy
> (using basebackup). That might take 15 minutes (depending on the
> database size). I don't think it does that automatically, though. Also I
> think you would have noticed that.
>
> What does `patronictl list` show during that interval?
>
> > Is it normal to wait so long for the replication? (the original
> > transaction in primary took about 5 minutes and was about 3000 small
> > records). I am providing more details for completeness below:
> >
> > I get the following errors on the primary DB:
> > 2022-04-28 04:36:50.544 EDT [13794] WARNING: archive_mode enabled, yet archive_command is not set
> > 2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot "xyzd3riardb05" does not exist
> > 2022-04-28 04:37:34.893 EDT [14755] STATEMENT: START_REPLICATION SLOT "xyzd3riardb05" 0/7000000 TIMELINE 18
> ...
> > and after some time such errors stop to appear.
>
> So the replication slot is probably created after some time and then
> replication starts to work.
>
> I think that replication slot is managed by Patroni. So the question
> would be: Why does Patroni take so long to create it? Did it log
> anything?
>
>         hp
>
> --
>    _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |       challenge!"