RE: Initial Schema Sync for Logical Replication

Kumar, Sachin Wed, 22 Mar 2023 03:56:01 -0700
> From: Amit Kapila <[email protected]>
> Sent: Wednesday, March 22, 2023 5:16 AM
> To: Masahiko Sawada <[email protected]>
> Cc: Euler Taveira <[email protected]>; Kumar, Sachin
> <[email protected]>; Alvaro Herrera <[email protected]>; pgsql-
> [email protected]; Jonathan S. Katz <[email protected]>
> Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> 
> CAUTION: This email originated from outside of the organization. Do not click
> links or open attachments unless you can confirm the sender and know the
> content is safe.
> 
> 
> 
> On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada
> <[email protected]> wrote:
> >
> > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <[email protected]>
> wrote:
> > >
> > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <[email protected]>
> wrote:
> > >
> > > > You should
> > > > exclude them removing these objects from the TOC before running
> > > > pg_restore or adding a few pg_dump options to exclude these
> > > > objects. Another issue is related to different version. Let's say
> > > > the publisher has a version ahead of the subscriber version, a new
> > > > table syntax can easily break your logical replication setup. IMO
> > > > pg_dump doesn't seem like a good solution for initial synchronization.
> > > >
> > > > Instead, the backend should provide infrastructure to obtain the
> > > > required DDL commands for the specific (set of) tables. This can
> > > > work around the issues from the previous paragraph:
> > > >
> > > ...
> > > > * don't need to worry about different versions.
> > > >
> > >
> > > AFAICU some of the reasons why pg_dump is not allowed to dump from
> > > the newer version are as follows: (a) there could be more columns in
> > > the newer version of the system catalog and then Select * type of
> > > stuff won't work because the client won't have knowledge of
> > > additional columns. (b) the newer version could have new features
> > > (represented by say new columns in existing catalogs or new
> > > catalogs) that the older version of pg_dump has no knowledge of and
> > > will fail to get that data and hence an inconsistent dump. The
> > > subscriber will easily be not in sync due to that.
> > >
> > > Now, how do we avoid these problems even if we have our own version
> > > of functionality similar to pg_dump for selected objects? I guess we
> > > will face similar problems.
> >
> > Right. I think that such functionality needs to return DDL commands
> > that can be executed on the requested version.
> >
> > > If so, we may need to deny schema sync in any such case.
> >
> > Yes. Do we have any concrete use case where the subscriber is an older
> > version, in the first place?
> >
> 
> As per my understanding, it is mostly due to the reason that it can work
> today. Today, during an off-list discussion with Jonathan on this point, he
> pointed me to a similar incompatibility in MySQL replication. See the "SQL
> incompatibilities" section in doc[1]. Also, please note that this applies not
> only to initial sync but also to schema sync during replication. I don't 
> think it
> would be feasible to keep such cross-version compatibility for DDL
> replication.
> 
> Having said above, I don't intend that we must use pg_dump from the
> subscriber for the purpose of initial sync. I think the idea at this stage is 
> to
> primarily write a POC patch to see what difficulties we may face. The other
> options that we could try out are (a) try to duplicate parts of pg_dump code
> in some way (by extracting required
> code) for the subscription's initial sync, or (b) have a common code (probably
> as a library or some other way) for the required functionality. There could be
> more possibilities that we may not have thought of yet. But the main point is
> that for approaches other than using pg_dump, we should consider ways to
> avoid duplicity of various parts of its code. Due to this, I think before 
> ruling
> out using pg_dump, we should be clear about its risks and limitations.
> 
> Thoughts?
There is one more thing which needs to be consider even if we use 
pg_dump/pg_restore
We still need to have a way to get the create table for tables , if we want to 
support
concurrent DDLs on the publisher.
>8. TableSync process should check the state of table , if it is 
>SUBREL_STATE_CREATE it should
>get the latest definition from the publisher and recreate the table. (We have 
>to recreate
>the table even if there are no changes). Then it should go into copy table 
>mode as usual.
Unless there is different way to support concurrent DDLs or we going for 
blocking publisher
till initial sync is completed.
Regards
Sachin
> 
> [1] - https://dev.mysql.com/doc/refman/8.0/en/replication-
> compatibility.html
> [2] - https://www.postgresql.org/message-
> id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-
> uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com
> 
> --
> With Regards,
> Amit Kapila.
RE: Initial Schema Sync for Logical Replication

Reply via email to