On Wed, Dec 11, 2019 at 11:46 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Mon, Dec 2, 2019 at 3:32 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> > I have rebased the patch set on the latest head.
>
> 0001 looks like a clever approach, but are you sure it doesn't hurt
> performance when many small XLOG records are being inserted? I think
> XLogRecordAssemble() can get pretty hot in some workloads.
>
I don't think we have evaluated it yet, but we should. The point to
note is that this code runs only when wal_level is 'logical' (see
IsSubTransactionAssignmentPending), in which case we already log more
WAL, so this might not have much impact. I guess it might be better to
have that check in XLogRecordAssemble for the sake of clarity.

> Regarding 0005, it seems to me that this is no good:
>
> + errmsg("improper heap_getnext call")));
>
> I think we should be using elog() rather than ereport() here, because
> this should only happen if there's a bug in a logical decoding plugin.
> At first, I thought maybe this should just be an Assert(), but since
> there are third-party logical decoding plugins available, checking
> this even in non-assert builds seems like a good idea. However, I
> think making it translatable is overkill; users should never see this,
> only developers.
>

Makes sense. I think we should change it.

> + if (prev_lsn != InvalidXLogRecPtr)
> + Assert(prev_lsn <= change->lsn);
>
> There is no reason to ever write an if statement that contains only an
> Assert, and it's bad style. Write Assert(prev_lsn == InvalidXLogRecPtr
> || prev_lsn <= change->lsn), or better yet, use XLogRecPtrIsInvalid.
>

Agreed.

> The purpose and mechanism of the is_schema_sent flag is not clear to
> me. The word "schema" here seems to be being used to mean "snapshot,"
> which is rather confusing.
>

I have explained this flag below, along with invalidations, as the two
are slightly related.

> I'm also somewhat unclear on what's happening here with invalidations.
> Perhaps that's as much a defect in my understanding as it is
> reflective of any problem with the patch, but I also don't see any
> comments either in 0002 or later patches explaining the theory of
> operation. If I've missed some, please point me in the right
> direction.
> Hypothetically speaking, it seems to me that if you just
> did InvalidateSystemCaches() every time the snapshot changed, you
> wouldn't need anything else (unless we're concerned with
> non-transactional invalidation messages like smgr and relmapper
> invalidations; not quite sure how those are handled). And, on the
> other hand, if we don't do InvalidateSystemCaches() every time the
> snapshot changes, then I don't understand why this works now, even
> without streaming.
>

The way invalidations work for logical replication is that we normally
start a new transaction before decoding each commit, which allows us to
accept the invalidations (via AtStart_Cache). However, if there are
catalog changes within the transaction being decoded, we need to
reflect those before trying to decode the WAL of any operation that
happened after the catalog change. As we do not log WAL for each
individual invalidation, we need to execute all of the transaction's
invalidation messages at each catalog change. We are able to do that
today because we decode a transaction's entire WAL only once we get its
commit record, which contains all the invalidation messages. So we
queue them up and execute them at each catalog change, which we
identify by the WAL record XLOG_HEAP2_NEW_CID.

The second, related concept is that before sending each change
downstream (via pgoutput), we check whether we need to send the schema.
We decide this based on a local map entry (RelationSyncEntry) which
records whether the schema for the relation has already been sent. Once
the schema of a relation is sent, its entry in the map indicates that.
During invalidation processing we also blow this map away, so it always
reflects the correct state.

Now, to decode an in-progress transaction, we need to ensure that we
have received the WAL for all the invalidations before decoding the WAL
of any action that happened immediately after the catalog change.
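To make the current (commit-time) scheme above concrete, here is a
simplified, standalone model of it; this is not the actual
reorderbuffer code, and the names (DecodedXact, queue_invalidation,
execute_invalidations) are invented for illustration:

```c
/*
 * Simplified standalone model of the current scheme: invalidation
 * messages accumulate per transaction (taken from the commit record),
 * and at every catalog change (XLOG_HEAP2_NEW_CID) the whole queue is
 * replayed so the caches reflect the catalog state at that point.
 */
#include <assert.h>

#define MAX_INVALS 128

typedef struct DecodedXact
{
    int ninvalidations;
    int invalidations[MAX_INVALS];  /* stand-in for SharedInvalidationMessage */
    int ncache_resets;              /* how often we replayed the queue */
} DecodedXact;

/* Queued from the invalidation list carried by the commit record. */
static void
queue_invalidation(DecodedXact *txn, int msg)
{
    txn->invalidations[txn->ninvalidations++] = msg;
}

/* Called each time decoding hits a catalog change. */
static void
execute_invalidations(DecodedXact *txn)
{
    for (int i = 0; i < txn->ninvalidations; i++)
    {
        /* in PostgreSQL this would dispatch the sinval message */
    }
    txn->ncache_resets++;
}
```

The point of the model is only the cost shape: the whole queue is
replayed at every catalog change, so N messages and M catalog changes
cost O(N * M), which is tolerable precisely because we only do this
once we have the complete transaction at commit.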
This is the reason we started WAL-logging individual invalidations.
With this change, we don't need to execute all the invalidations at
each catalog change; instead, we execute them as and when their WAL is
decoded.

The current mechanism for sending schema changes won't work for
streamed transactions, because after we send a change, the
subtransaction might abort. On subtransaction abort, the downstream
will simply discard the changes, and we lose track of the schema we
already sent. There is no such problem currently because we process all
the aborts before sending any change. So the current idea of having a
schema_sent flag in each map entry (RelationSyncEntry) won't work for
streamed transactions.

To solve this problem, the patch initially kept an 'is_schema_sent'
flag for each top-level transaction (in ReorderBufferTXN) so that we
always send the schema for each (sub)transaction of a streamed
transaction, but that won't work if we access multiple relations in the
same subtransaction.

So we are now thinking of keeping a list/array of top-level xids in
each RelationSyncEntry. Whenever we send the schema for any
transaction, we note its xid in the RelationSyncEntry, and at
abort/commit time we remove the xid from the list. Then, whenever we
check whether to send the schema for an operation in a transaction, we
look for our xid in that RelationSyncEntry's list: if the xid is
present, we don't send the schema; otherwise, we send it. During
decoding, we should not have that many open transactions, so the search
in the array should be cheap enough, but we can consider some other
data structure, such as a hash table, as well.

I will think some more and respond to your remaining
comments/suggestions.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
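To sketch the per-entry xid-list idea, here is a standalone model (not
actual pgoutput code; the fixed-size array, the helper names, and the
MAX_STREAMED_XACTS bound are all invented for illustration):

```c
/*
 * Sketch of tracking, per RelationSyncEntry, the set of top-level xids
 * for which the schema has already been streamed downstream.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define MAX_STREAMED_XACTS 64   /* arbitrary bound for the sketch */

typedef struct RelationSyncEntry
{
    uint32_t      relid;        /* relation OID */
    TransactionId schema_sent_xids[MAX_STREAMED_XACTS];
    int           nxids;        /* in-use entries */
} RelationSyncEntry;

/* Was the schema already sent within this top-level transaction? */
static bool
schema_already_sent(RelationSyncEntry *entry, TransactionId xid)
{
    for (int i = 0; i < entry->nxids; i++)
        if (entry->schema_sent_xids[i] == xid)
            return true;
    return false;
}

/* Remember that we sent the schema for this transaction. */
static void
mark_schema_sent(RelationSyncEntry *entry, TransactionId xid)
{
    if (!schema_already_sent(entry, xid))
        entry->schema_sent_xids[entry->nxids++] = xid;
}

/* On commit or abort of the transaction, forget the xid again. */
static void
forget_xid(RelationSyncEntry *entry, TransactionId xid)
{
    for (int i = 0; i < entry->nxids; i++)
    {
        if (entry->schema_sent_xids[i] == xid)
        {
            /* swap-remove: move the last element into the hole */
            entry->schema_sent_xids[i] =
                entry->schema_sent_xids[--entry->nxids];
            return;
        }
    }
}
```

The key property this models is that an abort (forget_xid) makes the
next change in a new transaction send the schema again, which is
exactly what the downstream needs after it discarded the streamed
changes. The linear scan mirrors the "few open transactions" assumption
in the text; a hash table would be the fallback if that assumption
fails.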