On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote:
>
> On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Tue, Oct 13, 2020 at 11:05 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > >
> > > Amit Kapila <amit.kapil...@gmail.com> writes:
> > > >> It is possible that MAXALIGN stuff is playing a role here and or the
> > > >> background transaction stuff. I think if we go with the idea of
> > > >> testing spill_txns and spill_count being positive then the results
> > > >> will be stable. I'll write a patch for that.
> > >
> > > Here's our first failure on a MAXALIGN-8 machine:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08
> > >
> > > So this is just plain not stable. It is odd though. I can
> > > easily think of mechanisms that would cause the WAL volume
> > > to occasionally be *more* than the "typical" case. What
> > > would cause it to be *less*, if MAXALIGN is ruled out?
> > >
> >
> > The original theory I have given above [1] which is an interleaved
> > autovacuum transaction. Let me try to explain in a bit more detail.
> > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test
> > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000)
> > g(i);') a parallel autovacuum transaction occurs. The problem as seen
> > in buildfarm will happen when autovacuum transaction happens after 80%
> > or more of the Insert is done.
> >
> > In such a situation we will start decoding 'Insert' first and need to
> > spill multiple times due to the amount of changes (more than threshold
> > logical_decoding_work_mem) and then before we encounter Commit of
> > transaction that performed Insert (and probably some more changes from
> > that transaction) we will encounter a small transaction (autovacuum
> > transaction). The decode of that small transaction will send the
> > stats collected till now which will lead to the problem shown in
> > buildfarm.
>
> That seems a possible scenario.
>
> I think probably this also explains the reason why spill_count
> slightly varied and spill_txns was still 1. The spill_count value
> depends on how much the process spilled out transactions before
> encountering the commit of an autovacuum transaction. Since we have
> the spill statistics per reorder buffer, not per transactions, it's
> possible.
>
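To make the scenario quoted above easier to follow, here is a small toy
model in Python. It is only a sketch under assumptions taken from this
thread: spill counters are kept per reorder buffer, and they become
visible only when the commit of some transaction is decoded. The names
first_visible_stats, n_changes, work_mem and small_commit_at are
hypothetical, and work_mem is counted in changes rather than bytes, so
this is not how the server code is written.

```python
from dataclasses import dataclass


@dataclass
class SpillStats:
    spill_txns: int = 0
    spill_count: int = 0


def first_visible_stats(n_changes, work_mem, small_commit_at=None):
    """Return the first non-empty stats snapshot a monitoring query would see.

    Toy model (assumption): counters are per reorder buffer and are only
    published when the commit of *any* transaction is decoded.
    """
    live = SpillStats()       # counters accumulated inside the decoder
    in_memory = 0             # changes of the big transaction held in memory
    big_txn_spilled = False

    for i in range(1, n_changes + 1):
        in_memory += 1
        if in_memory >= work_mem:
            # Memory limit reached: spill the big transaction to disk.
            live.spill_count += 1
            if not big_txn_spilled:
                live.spill_txns += 1
                big_txn_spilled = True
            in_memory = 0
        if small_commit_at == i and live.spill_txns > 0:
            # The commit of the small interleaved transaction publishes the
            # counters collected so far, before the big transaction commits.
            return SpillStats(live.spill_txns, live.spill_count)

    # The commit of the big transaction publishes the final counters.
    return SpillStats(live.spill_txns, live.spill_count)


if __name__ == "__main__":
    quiet = first_visible_stats(5000, 100)
    racy = first_visible_stats(5000, 100, small_commit_at=4000)
    print("no interleaving:  ", quiet)
    print("small txn at 80%: ", racy)
    # A check on exact values is timing dependent; positivity is not.
    print("both positive:", racy.spill_txns > 0 and racy.spill_count > 0)
```

In this toy run spill_txns is 1 in both cases, while the first visible
spill_count drops from 50 to 40 when the small transaction commits
after 80% of the insert. A check that both counters are positive holds
either way, which is the reason for testing positivity instead of exact
values.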
Okay, here is an updated version (changed some comments) of the patch
I posted some time back. What do you think? I have tested this on both
Windows and Linux environments. I think it is a bit tricky to
reproduce the exact scenario, so if you are fine with it we can push
this and check, or let me know if you have any better idea.

--
With Regards,
Amit Kapila.
fix_stats_test_2.patch