Initial Schema Sync for Logical Replication

2023-03-15 Thread Kumar, Sachin
Hi Everyone, I am working on the initial schema sync for Logical replication. Currently, user have to manually create a schema on subscriber side. Aim of this feature is to add an option in create subscription, so that schema sync can be automatic. I am sharing Design Doc below, but there are s

RE: Initial Schema Sync for Logical Replication

2023-03-16 Thread Kumar, Sachin
Hi Peter, > Hi, > > I have a couple of questions. > > Q1. > > What happens if the subscriber already has some tables present? For > example, I did not see the post saying anything like "Only if the table does > not already exist then it will be created". > My assumption was the if subscriber i

RE: Initial Schema Sync for Logical Replication

2023-03-20 Thread Kumar, Sachin
Hi Amit, > From: Amit Kapila > > > Hi, > > > > > > I have a couple of questions. > > > > > > Q1. > > > > > > What happens if the subscriber already has some tables present? For > > > example, I did not see the post saying anything like "Only if the > > > table does not already exist then it will

RE: Initial Schema Sync for Logical Replication

2023-03-20 Thread Kumar, Sachin
Hi Alvaro, > From: Alvaro Herrera > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication > On 2023-Mar-15, Kumar, Sachin wrote: > > > 1. In CreateSubscription() when we create replication > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, S

RE: Initial Schema Sync for Logical Replication

2023-03-22 Thread Kumar, Sachin
> From: Amit Kapila > Sent: Wednesday, March 22, 2023 5:16 AM > To: Masahiko Sawada > Cc: Euler Taveira ; Kumar, Sachin > ; Alvaro Herrera ; pgsql- > hack...@lists.postgresql.org; Jonathan S. Katz > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication >

RE: Initial Schema Sync for Logical Replication

2023-03-23 Thread Kumar, Sachin
> From: Amit Kapila > IIUC, this is possible only if tablesync process uses a snapshot different > than the > snapshot we have used to perform the initial schema sync, otherwise, this > shouldn't be a problem. Let me try to explain my understanding with an example > (the LSNs used are just explai

RE: Initial Schema Sync for Logical Replication

2023-03-24 Thread Kumar, Sachin
> From: Amit Kapila > > I think we won't be able to use same snapshot because the transaction will > > be committed. > > In CreateSubscription() we can use the transaction snapshot from > > walrcv_create_slot() till walrcv_disconnect() is called.(I am not sure > > about this part maybe walrcv_disc

RE: Initial Schema Sync for Logical Replication

2023-03-24 Thread Kumar, Sachin
> I am not if it's feasible to support the use case the replicate DDL to old > subscriber. > +1 > First, I think the current publisher doesn't know the version number of > client(subscriber) so we need to check the feasibility of same. Also, having > client's version number checks doesn't seem

RE: Initial Schema Sync for Logical Replication

2023-03-29 Thread Kumar, Sachin
> > > > From: Amit Kapila > > > > > I think we won't be able to use same snapshot because the > > > > > transaction will be committed. > > > > > In CreateSubscription() we can use the transaction snapshot from > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am > > > > > not su

RE: Initial Schema Sync for Logical Replication

2023-03-29 Thread Kumar, Sachin
> From: Masahiko Sawada > > > > One related idea is that currently, we fetch the table list > > corresponding to publications in subscription and create the entries > > for those in pg_subscription_rel during Create Subscription, can we > > think of postponing that work till after the initial sch

RE: Initial Schema Sync for Logical Replication

2023-04-02 Thread Kumar, Sachin
> -Original Message- > From: Masahiko Sawada > > > I was thinking each TableSync process will call pg_dump --table, > > > This way if we have N tableSync process, we can have N pg_dump -- > table=table_name called in parallel. > > > In fact we can use --schema-only to get schema and then

RE: Initial Schema Sync for Logical Replication

2023-04-05 Thread Kumar, Sachin
> From: Masahiko Sawada > > > > > > 3. The apply worker launches the tablesync workers for tables that > > > need to be synchronized. > > > > > > There might be DDLs executed on the publisher for tables before the > > > tablesync worker starts. But the apply worker needs to apply DDLs > > > for pr

RE: Initial Schema Sync for Logical Replication

2023-04-18 Thread Kumar, Sachin
> From: Masahiko Sawada > > > While writing a PoC patch, I found some difficulties in this idea. > > > First, I tried to add schemaname+relname to pg_subscription_rel but > > > I could not define the primary key of pg_subscription_rel. The > > > primary key on (srsubid, srrelid) doesn't work since

RE: Initial Schema Sync for Logical Replication

2023-04-20 Thread Kumar, Sachin
I am working on a prototype with above Idea , and will send it for review by Sunday/Monday Regards Sachin

Re: Initial Schema Sync for Logical Replication

2023-04-20 Thread Kumar, Sachin
I am working on a prototype with above discussed idea, I think I will send it for initial review by Monday. Regards Sachin

RE: Initial Schema Sync for Logical Replication

2023-10-20 Thread Kumar, Sachin
> From: vignesh C > Sent: Thursday, October 19, 2023 10:41 AM > Can you rebase the patch and post the complete set of required changes for > the concurrent DDL, I will have a look at them. Sure , I will try to send the complete rebased patch within a week. Regards Sachin

Re: pg_upgrade failing for 200+ million Large Objects

2023-11-09 Thread Kumar, Sachin
 Hi Everyone , I want to continue this thread , I have rebased the patch to latest master and fixed an issue when pg_restore prints to file. ` ╰─$ pg_restore dump_small.custom --restore-blob-batch-size=2 --file=a -- -- End BLOB restore batch -- COMMIT; ` > On 09/11/2023, 17:05, "Jacob Cham

Re: pg_upgrade failing for 200+ million Large Objects

2023-11-13 Thread Kumar, Sachin
> On 09/11/2023, 18:41, "Tom Lane" > wrote: > Um ... you didn't attach the patch? Sorry , patch attached Regards Sachin pg_upgrade_improvements_v6.diff Description: pg_upgrade_improvements_v6.diff

Re: pg_upgrade failing for 200+ million Large Objects

2023-12-04 Thread Kumar, Sachin
> "Tom Lane" mailto:t...@sss.pgh.pa.us>> wrote: > FWIW, I agree with Jacob's concern about it being a bad idea to let > users of pg_upgrade pass down arbitrary options to pg_dump/pg_restore. > I think we'd regret going there, because it'd hugely expand the set > of cases pg_upgrade has to deal

Re: pg_upgrade failing for 200+ million Large Objects

2023-12-07 Thread Kumar, Sachin
> I have updated the patch to use heuristic, During pg_upgrade we count > Large objects per database. During pg_restore execution if db large_objects > count is greater than LARGE_OBJECTS_THRESOLD (1k) we will use > --restore-blob-batch-size. I think both SECTION_DATA and SECTION_POST_DATA can

Re: pg_upgrade failing for 200+ million Large Objects

2024-01-02 Thread Kumar, Sachin
> On 11/12/2023, 01:43, "Tom Lane" > wrote: > I had initially supposed that in a parallel restore we could > have child workers also commit after every N TOC items, but was > soon disabused of that idea. After a worker processes a TOC > item, any dependent items (such a

RE: Initial Schema Sync for Logical Replication

2023-08-31 Thread Kumar, Sachin
Hi Everyone, based on internal discussion with Masahiko I have implemented concurrent DDL support for initial schema sync. Concurrent Patch workflow 1. When TableSync worker creates a replicaton slot, It will save the slot lsn into pg_subscription_rel with SUBREL_SYNC_SCHEMA_DATA_SYNC state, and

RE: Initial Schema Sync for Logical Replication

2023-07-10 Thread Kumar, Sachin
> From: Amit Kapila > On Wed, Jul 5, 2023 at 7:45 AM Masahiko Sawada > wrote: > > > > On Mon, Jun 19, 2023 at 5:29 PM Peter Smith > wrote: > > > > > > Hi, > > > > > > Below are my review comments for the PoC patch 0001. > > > > > > In addition, the patch needed rebasing, and, after I rebased

RE: Initial Schema Sync for Logical Replication

2023-07-07 Thread Kumar, Sachin
> From: Masahiko Sawada > So I've implemented a different approach; doing schema synchronization at a > CREATE SUBSCRIPTION time. The backend executing CREATE SUBSCRIPTION > uses pg_dump and restores the table schemas including both partitioned tables > and their partitions regardless of publish_v