On Wednesday, March 22, 2023 1:16 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada
> <sawada.m...@gmail.com> wrote:
> >
> > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <eu...@eulerto.com> wrote:
> > > >
> > > > You should
> > > > exclude them removing these objects from the TOC before running
> > > > pg_restore or adding a few pg_dump options to exclude these
> > > > objects. Another issue is related to different version. Let's say
> > > > the publisher has a version ahead of the subscriber version, a new
> > > > table syntax can easily break your logical replication setup. IMO
> > > > pg_dump doesn't seem like a good solution for initial synchronization.
> > > >
> > > > Instead, the backend should provide infrastructure to obtain the
> > > > required DDL commands for the specific (set of) tables. This can
> > > > work around the issues from the previous paragraph:
> > > ...
> > > > * don't need to worry about different versions.
> > > >
> > >
> > > AFAICU some of the reasons why pg_dump is not allowed to dump from
> > > the newer version are as follows: (a) there could be more columns in
> > > the newer version of the system catalog and then Select * type of
> > > stuff won't work because the client won't have knowledge of
> > > additional columns. (b) the newer version could have new features
> > > (represented by say new columns in existing catalogs or new
> > > catalogs) that the older version of pg_dump has no knowledge of and
> > > will fail to get that data and hence an inconsistent dump. The
> > > subscriber will easily be not in sync due to that.
> > >
> > > Now, how do we avoid these problems even if we have our own version
> > > of functionality similar to pg_dump for selected objects? I guess we
> > > will face similar problems.
> >
> > Right. I think that such functionality needs to return DDL commands
> > that can be executed on the requested version.
> >
> > > If so, we may need to deny schema sync in any such case.
> >
> > Yes. Do we have any concrete use case where the subscriber is an older
> > version, in the first place?
> >
>
> As per my understanding, it is mostly due to the reason that it can work
> today. Today, during an off-list discussion with Jonathan on this point,
> he pointed me to a similar incompatibility in MySQL replication. See the
> "SQL incompatibilities" section in doc[1]. Also, please note that this
> applies not only to initial sync but also to schema sync during
> replication. I don't think it would be feasible to keep such
> cross-version compatibility for DDL replication.
>
> Having said above, I don't intend that we must use pg_dump from the
> subscriber for the purpose of initial sync. I think the idea at this
> stage is to primarily write a POC patch to see what difficulties we may
> face. The other options that we could try out are (a) try to duplicate
> parts of pg_dump code in some way (by extracting required code) for the
> subscription's initial sync, or (b) have a common code (probably as a
> library or some other way) for the required functionality. There could
> be more possibilities that we may not have thought of yet. But the main
> point is that for approaches other than using pg_dump, we should
> consider ways to avoid duplicity of various parts of its code. Due to
> this, I think before ruling out using pg_dump, we should be clear about
> its risks and limitations.
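Just to make that part of the discussion a bit more concrete, the TOC-editing
flow Euler mentioned would look roughly like the following. This is only a
sketch; the table names, connection strings and the particular set of
exclusion switches are made up for illustration:

  # schema-only dump of the published tables from the publisher
  pg_dump -s -Fc -t public.tab1 -t public.tab2 \
          --no-publications --no-subscriptions \
          -f schema.dump 'host=publisher dbname=postgres'

  # list the TOC, hand-edit the list to drop unwanted entries, then restore
  pg_restore -l schema.dump > schema.list
  pg_restore -L schema.list -d 'host=subscriber dbname=postgres' schema.dump

A similar result could be had purely with pg_dump switches such as
--exclude-table, but either way it is external client tooling that
CREATE SUBSCRIPTION would have to drive, which leads to the points below.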
I thought about some possible problems with the design of using pg_dump.

1) According to the design, pg_dump will be called internally when creating
a subscription, but calling pg_dump requires a powerful user. Currently this
may not be a problem, because CREATE SUBSCRIPTION also requires superuser.
However, there has been a recent discussion about allowing non-superusers to
create subscriptions[1]; if that is accepted, it seems not great to
internally use a superuser to call pg_dump while the user creating the
subscription is a non-superuser.

2) It's possible that some cloud DB services don't allow users to run the
client commands (pg_dump, ...) directly, and the user who logs in to the
database may not have permission to execute those client commands.

[1] https://www.postgresql.org/message-id/flat/20230308194743.23rmgjgwahh4i4rg%40awork3.anarazel.de

Best Regards,
Hou zj