On Wed, Oct 20, 2021 at 3:03 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
>
> On Wed, Oct 20, 2021 at 7:59 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > > > Actually, at least with the scenario I gave steps for, after looking
> > > > at it again and debugging, I think that the behavior is understandable
> > > > and not a bug.
> > > > The reason is that the INSERTed data is first published through the
> > > > partitions, since initially there is no partitioned table in the
> > > > publication (so publish_via_partition_root=true doesn't have any
> > > > effect). But then adding the partitioned table to the publication and
> > > > refreshing the publication in the subscriber, the data is then
> > > > published "using the identity and schema of the partitioned table" due
> > > > to publish_via_partition_root=true. Note that the corresponding table
> > > > in the subscriber may well be a non-partitioned table (or the
> > > > partitions arranged differently) so the data does need to be
> > > > replicated again.
> > >
> >
> > Even if the partitions are arranged differently why would the user
> > expect the same data to be replicated twice?
> >
>
> It's the same data, but published in different ways because of changes
> the user made to the publication.
> I am not talking in general, I am specifically referring to the
> scenario I gave steps for.
> In the example scenario I gave, initially when the subscription was
> made, the publication just explicitly included the partitions, but
> publish_via_partition_root was true. So in this case it publishes
> through the individual partitions (as no partitioned table is present
> in the publication). Then on the publisher side, the partitioned table
> was then added to the publication and then ALTER SUBSCRIPTION ...
> REFRESH PUBLICATION done on the subscriber side. Now that the
> partitioned table is present in the publication and
> publish_via_partition_root is true, it is "published using the
> identity and schema of the partitioned table rather than that of the
> individual partitions that are actually changed". So the data is
> replicated again.
>
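
For reference, the sequence described in the quoted text might be sketched roughly as follows. This is only an illustration, not the exact steps posted upthread: the table names (sales, sales_east, sales_west), the publication and subscription names (pub, sub), and the connection string are all hypothetical, and the subscriber tables are assumed to be plain, non-partitioned tables.

```sql
-- Publisher: a partitioned table with two partitions (hypothetical names).
CREATE TABLE sales (id int, region text) PARTITION BY LIST (region);
CREATE TABLE sales_east PARTITION OF sales FOR VALUES IN ('east');
CREATE TABLE sales_west PARTITION OF sales FOR VALUES IN ('west');

-- Publisher: the publication initially lists only the partitions,
-- with publish_via_partition_root already set to true.
CREATE PUBLICATION pub FOR TABLE sales_east, sales_west
    WITH (publish_via_partition_root = true);

-- Subscriber: target tables must already exist; here they are assumed
-- to be ordinary tables with matching names and columns.
CREATE TABLE sales (id int, region text);
CREATE TABLE sales_east (id int, region text);
CREATE TABLE sales_west (id int, region text);
CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher dbname=postgres'  -- hypothetical connection string
    PUBLICATION pub;

-- Publisher: insert rows, then add the partitioned table itself.
INSERT INTO sales VALUES (1, 'east'), (2, 'west');
ALTER PUBLICATION pub ADD TABLE sales;

-- Subscriber: pick up the newly added table.
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
-- Variant that skips copying existing data for newly added tables:
-- ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = false);
```

Comparing the row counts in the subscriber's tables before and after the refresh would show whether the previously inserted rows arrive a second time.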
I don't see why data need to be replicated again even in that case.
Can you see any such duplicate data replicated for non-partitioned
tables?

> This scenario didn't use initial table data, so initial table sync
> didn't come into play
>

It will be equivalent to initial sync because the tablesync worker
would copy the entire data again in this case unless during refresh we
pass copy_data as false.

--
With Regards,
Amit Kapila.