Re: Single transaction in the tablesync worker?

Peter Smith Thu, 04 Feb 2021 17:39:58 -0800

On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
...


> Thanks. I have fixed one of the issues reported by me earlier [1]
> wherein the tablesync worker can repeatedly fail if after dropping the
> slot there is an error while updating the SYNCDONE state in the
> database. I have moved the drop of the slot just before commit of the
> transaction where we are marking the state as SYNCDONE. Additionally,
> I have removed unnecessary includes in tablesync.c, updated the docs
> for Alter Subscription, and updated the comments at various places in
> the patch. I have also updated the commit message this time.
>

Below are my feedback comments for V17 (nothing functional)

~~

1.
V27 Commit message:
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronizes the
position in the stream with the main apply worker.

Typo:
"synchronizes" -> "synchronize"

~~

2.
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable
class="parameter">name</replaceable> RENAME TO <
    (Currently, all subscription owners must be superusers, so the owner checks
    will be bypassed in practice.  But this might change in the future.)
   </para>
+
+  <para>
+   When refreshing a publication we remove the relations that are no longer
+   part of the publication and we also remove the tablesync slots if there are
+   any. It is necessary to remove tablesync slots so that the resources
+   allocated for the subscription on the remote host are released. If due to
+   network breakdown or some other error, we are not able to remove the slots,
+   we give WARNING and the user needs to manually remove such slots later as
+   otherwise, they will continue to reserve WAL and might eventually cause
+   the disk to fill up. See also <xref
linkend="logical-replication-subscription-slot"/>.
+  </para>

I think the content is good, but the 1st-person wording seemed strange.
e.g.
"we are not able to remove the slots, we give WARNING and the user needs..."
Maybe it should be like:
"... PostgreSQL is unable to remove the slots, so a WARNING is
reported. The user needs... "

~~

3.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub,
bool copy_data)
...
+ * XXX If there is a network break down while dropping the

"network break down" -> "network breakdown"

~~

4.
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
  (errmsg("could not connect to the publisher: %s", err)));
...
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.

"network break down" -> "network breakdown"

~~

5.
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int
cacheid, uint32 hashvalue)
...
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+

Should this comment also describe why the missing_ok is false for this case?

----
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Single transaction in the tablesync worker?

Reply via email to