On Mon, Dec 26, 2022 at 1:22 PM houzj.f...@fujitsu.com <houzj.f...@fujitsu.com> wrote: > > On Friday, December 23, 2022 5:20 PM houzj.f...@fujitsu.com > <houzj.f...@fujitsu.com> wrote: > > > > I noticed a CFbot failure in one of the new testcases in 015_stream.pl which > > comes from old 032_xx.pl. It's because I slightly adjusted the change size > > in a > > transaction in last version which cause the transaction's size not to > > exceed the > > decoding work mem, so the transaction is not being applied as expected as > > streaming transactions(it is applied as a non-stremaing transaction) which > > cause > > the failure. Attach the new version patch which fixed this miss. > > > > Since the GUC used to force stream changes has been committed, I removed that > patch from the patch set here and rebased the testcases based on that commit. > Here is the rebased patch set. >
Thank you for updating the patches. Here are some comments for 0001 and 0002 patches: I think it'd be better to write logs when the leader enters the serialization mode. It would be helpful for investigating issues. --- + if (!pa_can_start(xid)) + return; + + /* First time through, initialize parallel apply worker state hashtable. */ + if (!ParallelApplyTxnHash) + { + HASHCTL ctl; + + MemSet(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(TransactionId); + ctl.entrysize = sizeof(ParallelApplyWorkerEntry); + ctl.hcxt = ApplyContext; + + ParallelApplyTxnHash = hash_create("logical replication parallel apply workershash", + 16, &ctl, + HASH_ELEM |HASH_BLOBS | HASH_CONTEXT); + } + + /* + * It's necessary to reread the subscription information before assigning + * the transaction to a parallel apply worker. Otherwise, the leader may + * not be able to reread the subscription information if streaming + * transactions keep coming and are handled by parallel apply workers. + */ + maybe_reread_subscription(); pa_can_start() checks if the skiplsn is an invalid xid or not, and then maybe_reread_subscription() could update the skiplsn to a valid value. As the comments in pa_can_start() says, it won't work. I think we should call maybe_reread_subscription() in apply_handle_stream_start() before calling pa_allocate_worker(). --- +static inline bool +am_leader_apply_worker(void) +{ + return (!OidIsValid(MyLogicalRepWorker->relid) && + !isParallelApplyWorker(MyLogicalRepWorker)); +} How about using !am_tablesync_worker() instead of !OidIsValid(MyLogicalRepWorker->relid) for better readability? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com