On Tue, Jun 18, 2024 at 12:41 PM Euler Taveira <eu...@eulerto.com> wrote: > > On Tue, Jun 18, 2024, at 12:59 AM, Amit Kapila wrote: > > On Mon, May 20, 2024 at 12:12 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > On Sun, May 19, 2024 at 11:20 PM Euler Taveira <eu...@eulerto.com> wrote: > > > > > > On Sun, May 19, 2024, at 2:30 PM, Tom Lane wrote: > > > > > > I'm fairly disturbed about the readiness of pg_createsubscriber. > > > The 040_pg_createsubscriber.pl TAP test is moderately unstable > > > in the buildfarm [1], and there are various unaddressed complaints > > > at the end of the patch thread (pretty much everything below [2]). > > > I guess this is good enough to start beta with, but it's far from > > > being good enough to ship, IMO. If there were active work going > > > on to fix these things, I'd feel better, but neither the C code > > > nor the test script have been touched since 1 April. > > > > > > > > > My bad. :( I'll post patches soon to address all of the points. > > > > > > > Just to summarize, apart from BF failures for which we had some > > discussion, I could recall the following open points: > > > > 1. After promotion, the pre-existing replication objects should be > > removed (either optionally or always), otherwise, it can lead to a new > > subscriber not being able to restart or getting some unwarranted data. > > [1][2]. > > > > 2. Retaining synced slots on new subscribers can lead to unnecessary > > WAL retention and dead rows [3]. > > > > 3. We need to consider whether some of the messages displayed in > > --dry-run mode are useful or not [4]. > > > > The recent commits should silence BF failures and resolve point number > 2. But we haven't done anything yet for 1 and 3. For 3, we have a > patch in email [1] (v3-0005-Avoid*) which can be reviewed and > committed but point number 1 needs discussion. Apart from that > somewhat related to point 1, Kuroda-San has raised a point in an email > [2] for replication slots. Shlok has presented a case in this thread > [3] where the problem due to point 1 can cause ERRORs or can cause > data inconsistency. > > > I read v3-0005 and it seems to silence (almost) all "write" messages. Does it > intend to avoid the misinterpretation that the dry run mode is writing > something? It is dry run mode! If I run a tool in dry run mode, I expect it to > execute some verifications and print useful messages so I can evaluate if it > is > ok to run it. Maybe it is not your expectation for dry run mode. >
I haven't studied the patch so can't comment but the intention was to not print some unrelated write messages. I have shared my observation in the email [1]. > I think if it not clear, let's inform that it changes nothing in dry run mode. > > pg_createsubscriber: no modifications are done > > as a first message in dry run mode. I agree with you when you pointed out that > some messages are misleading. > > pg_createsubscriber: hint: If pg_createsubscriber fails after this > point, you must recreate the physical replica before continuing. > > Maybe for this one, we omit the fake information, like: > > pg_createsubscriber: setting the replication progress on database "postgres" > I think we don't need to display this message as we are not going to do anything for this in the --dry-run mode. We can even move the related code in (!dry_run) check. > I will post a patch to address the messages once we agree what needs to be > changed. > I suggest we can start a new thread with the messages shared in the email [1] and your response for each one of those. > Regarding 3, publications and subscriptions are ok to remove. You are not > allowed to create them on standby, hence, all publications and subscriptions > are streamed from primary. However, I'm wondering if you want to remove the > publications. > I am not so sure of publications but we should remove subscriptions as there are clear problems with those as shown by Shlok in this thread. > Replication slots on a standby server are "invalidated" despite > of the wal_status is saying "reserved" (I think it is an oversight in the > design that implements slot invalidation), however, required WAL files have > already been removed because of pg_resetwal (see modify_subscriber_sysid()). > The scenario is to promote a standby server, run pg_resetwal on it and check > pg_replication_slots. > Ideally, invalidated slots shouldn't create any problems but it is better that we discuss this also as a separate problem in new thread. > Do you have any other scenarios in mind? > No, so we have three issues to discuss (a) some unwarranted messages in --dry-run mode; (b) whether to remove pre-existing subscriptions during conversion; (c) whether to remove pre-existing replication slots. Would you like to start three new threads for each of these or would you like Kuroda-San or me to start some or all? [1] - https://www.postgresql.org/message-id/CAA4eK1J2fAvsJ2HihbWJ_GxETd6sdqSMrZdCVJEutRZRpm1MEQ%40mail.gmail.com -- With Regards, Amit Kapila.