On 06/02/2021 06:07, Amit Kapila wrote:
> On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2...@gmail.com> wrote:
>> On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
>> <petr.jeli...@enterprisedb.com> wrote:
>>> +ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
>>> +                                char syncslotname[NAMEDATALEN])
>>> +{
>>> +    if (syncslotname)
>>> +        sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
>>> +    else
>>> +        syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
>>> +
>>> +    return syncslotname;
>>> +}
>>> Given that we are now explicitly dropping slots, what happens here if we
>>> have 2 different downstreams that happen to get the same suboid and relid;
>>> will one of them drop the slot of the other one? Previously, with the
>>> cleanup being left to the temporary slot, we'd at most have gotten an
>>> error when creating it, but with the new logic in LogicalRepSyncTableStart
>>> it feels like we could get into a situation where 2 downstreams are
>>> fighting over the slot, no?
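To make the hypothetical collision concrete, here is a small standalone
sketch that only mimics the naming format from the quoted patch; the helper
name and the OID values are made up for illustration:

    #include <stdio.h>
    #include <string.h>

    #define NAMEDATALEN 64    /* as in src/include/pg_config_manual.h */

    /* Mimics the slot-name format from the patch above (illustration only). */
    static void
    slot_name_for_tablesync(unsigned int suboid, unsigned int relid,
                            char out[NAMEDATALEN])
    {
        snprintf(out, NAMEDATALEN, "pg_%u_sync_%u", suboid, relid);
    }

    int
    main(void)
    {
        char cluster_a[NAMEDATALEN];
        char cluster_b[NAMEDATALEN];

        /*
         * Two independent subscriber clusters can both hand out e.g. OID
         * 16398 for the subscription and 16402 for the table, because OIDs
         * are plain per-cluster counters.
         */
        slot_name_for_tablesync(16398, 16402, cluster_a);   /* cluster A */
        slot_name_for_tablesync(16398, 16402, cluster_b);   /* cluster B */

        /*
         * Both names come out as "pg_16398_sync_16402", so a
         * DROP_REPLICATION_SLOT issued by one cluster's cleanup removes the
         * slot the other cluster is still using on the shared publisher.
         */
        printf("cluster A slot: %s\n", cluster_a);
        printf("cluster B slot: %s\n", cluster_b);
        printf("identical: %s\n",
               strcmp(cluster_a, cluster_b) == 0 ? "yes" : "no");
        return 0;
    }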
> I think so. See if the alternative suggested below works, or let me know
> if you have any other suggestions for the same.
>> The PG docs [1] say "there is only one copy of pg_subscription per
>> cluster, not one per database". IIUC, that means it is not possible for
>> 2 different subscriptions to have the same suboid.
> I think he is talking about two different clusters having separate
> subscriptions but pointing to the same publisher. In different clusters,
> we can get the same subid/relid. I think we need a cluster-wide unique
> identifier to distinguish among different subscribers. How about using
> the system_identifier stored in the control file (we can use
> GetSystemIdentifier to retrieve it)? One concern could be that adding it
> to the slot name could exceed the maximum slot name length
> (NAMEDATALEN - 1), but I don't think that is the case here:
> pg_%u_sync_%u_UINT64_FORMAT is at most 3 + 10 + 6 + 10 + 1 + 20 + 1 = 51
> bytes, including the separating underscore and the trailing '\0'. Note
> that the last component in this scheme is the system_identifier.
Yep, that's what I mean, and system_identifier seems like a good choice to me.
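For concreteness, here is a minimal standalone sketch of what such a naming
scheme could look like; the helper name, the example OIDs, and the example
system identifier are made up, and only GetSystemIdentifier and the format
components come from the discussion above:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    #define NAMEDATALEN 64    /* as in src/include/pg_config_manual.h */

    /*
     * Illustration only, not the actual patch: append the subscriber's
     * system identifier (in the backend this would come from
     * GetSystemIdentifier()) so that two subscriber clusters cannot clash
     * on the publisher even when they hand out the same suboid/relid.
     *
     * Worst case: "pg_" (3) + suboid (10) + "_sync_" (6) + relid (10) +
     * "_" (1) + uint64 system identifier (20) + '\0' (1) = 51 bytes,
     * comfortably under NAMEDATALEN (64).
     */
    static void
    slot_name_for_tablesync(uint64_t sysid, unsigned int suboid,
                            unsigned int relid, char out[NAMEDATALEN])
    {
        snprintf(out, NAMEDATALEN, "pg_%u_sync_%u_%" PRIu64,
                 suboid, relid, sysid);
    }

    int
    main(void)
    {
        char slotname[NAMEDATALEN];

        /*
         * Hypothetical values; the system identifier differs per cluster
         * even when the OIDs happen to match.
         */
        slot_name_for_tablesync(UINT64_C(7002511597132414842), 16398, 16402,
                                slotname);
        printf("%s\n", slotname);   /* pg_16398_sync_16402_7002511597132414842 */
        return 0;
    }

With the system identifier appended, dropping a stale sync slot can only
ever affect the subscriber cluster that created it.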
> Do you guys think that works, or let me know if you have any better
> idea? Petr, is there a reason why such an identifier was not considered
> originally? Is there any risk in it?
Originally it was not considered, likely because it's all based on the
pglogical/BDR work, where ids are hashes of stuff that's unique across a
group of instances, not counter-based like OIDs in PostgreSQL, and I
simply didn't realize it could be a problem until reading this patch :)
--
Petr Jelinek