On Fri, Apr 28, 2023 at 4:16 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> Yes, in this approach, we need to dump/restore objects while
> specifying with fine granularity. Ideally, the table sync worker dumps
> and restores the table schema, does copy the initial data, and then
> creates indexes, and triggers and table-related objects are created
> after that. So if we go with the pg_dump approach to copy the schema
> of individual tables, we need to change pg_dump (or libpgdump needs to
> be able to do) to support it.

We have been discussing how to sync the schema, but I'd like to step back a bit and discuss the use cases and requirements of this feature.

Suppose that a table belongs to a publication. Which objects related to the table do we want to sync with the initial schema sync feature? IOW, do we want to sync the table's ACLs, tablespace settings, triggers, and security labels too?

If we want to replicate the whole database, e.g. when using logical replication for a major version upgrade, it would be convenient if it synchronized all table-related objects. However, if this were the only option, the feature could be useless in some cases. For example, users whose database users on the subscriber differ from those on the publisher might want to sync only CREATE TABLE and set ACLs etc. by themselves. In this case, it would not be necessary to sync ACLs and security labels.

What use cases do we want to support with this feature? I think the implementation could vary depending on how we select which objects to sync.

One possible idea is to select the objects to sync depending on how DDL replication is set up in the publisher. It's straightforward, but I'm not sure the design of the DDL replication syntax has been decided. Also, even if we create a publication with the ddl = 'table' option, it's not clear to me that we want the initial sync feature to sync table-dependent triggers, indexes, and rules too.

The second idea is to make it configurable so that users can specify which objects to sync. But that would make the feature complex, and I'm not sure users could use it properly.

The third idea is that, since synchronizing the whole database is already achievable with pg_dump(all), the initial sync feature supports synchronizing only tables (+ indexes), which is not achievable with pg_dump.

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com