On Friday, December 17, 2021 1:55 AM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > On 2021-Dec-16, houzj.f...@fujitsu.com wrote: > > > The patch ensures all columns of RT are in column list when > > CREATE/ALTER publication, but it seems doesn't prevent user from > > changing the replica identity or dropping the index used in replica > > identity. Do we also need to check those cases ? > > Yes, we do. As it happens, I spent a couple of hours yesterday writing code > for > that, at least partially. I haven't yet checked what happens with cases like > REPLICA NOTHING, or REPLICA INDEX <xyz> and then dropping that index. > > My initial ideas were a bit wrong BTW: I thought we should check the > combination of column lists in all publications (a bitwise-OR of column > bitmaps, > so to speak). But conceptually that's wrong: we need to check the column list > of each publication individually instead. Otherwise, if you wanted to hide a > column from some publication but that column was part of the replica identity, > there'd be no way to identify the tuple in the replica. (Or, if the pgouput > code > disobeys the column list and sends the replica identity even if it's not in > the > column list, then you'd be potentially publishing data that you wanted to > hide.)
Thanks for the explanation. Apart from ALTER REPLICA IDENTITY and DROP INDEX, I think there could be some other cases we need to handle for the replica identity check: 1) When adding a partitioned table with column list to the publication, I think we need to check the RI of all its leaf partition. Because the RI on the partition is the one actually takes effect. 2) ALTER TABLE ADD PRIMARY KEY; ALTER TABLE DROP CONSTRAINT "PRIMAEY KEY"; If the replica identity is default, it will use the primary key. we might also need to prevent user from adding or removing primary key in this case. Based on the above cases, the RI check seems could bring considerable amount of code. So, how about we follow what we already did in CheckCmdReplicaIdentity(), we can put the check for RI in that function, so that we can cover all the cases and reduce the code change. And if we are worried about the cost of do the check for UPDATE and DELETE every time, we can also save the result in the relcache. It's safe because every operation change the RI will invalidate the relcache. We are using this approach in row filter patch to make sure all columns in row filter expression are part of RI. Best regards, Hou zj