Hi,

On 2021-10-13 11:03:39 -0700, Andres Freund wrote:
> It seems that 026_overwrite_contrecord.pl test often fails under valgrind. I
> first thought that related failures on skink were due to me migrating the
> animal to a new host (and then running out of space due to a mistake in ccache
> config). But it happened again after I fixed those, and I just reproduced the
> issue locally.
>
> It's a bit odd that we didn't start to see these failures immediately, but
> only in the last few days. I'd blame skink being migrated to a new home,
> except that I can see the issue locally.
>
> FWIW, the way skink runs all postgres instances through valgrind is by
> replacing the postgres binary. Here's my local version of that:
>
> andres@awork3:~/build/postgres/dev-assert/vpath$ cat tmp_install/home/andres/build/postgres/dev-assert/install/bin/postgres
> #!/bin/bash
>
> exec /usr/bin/valgrind \
>     --quiet \
>     --error-exitcode=128 \
>     --suppressions=/home/andres/src/postgresql/src/tools/valgrind.supp \
>     --trace-children=yes --track-origins=yes --read-var-info=no \
>     --leak-check=no \
>     --run-libc-freeres=no \
>     --vgdb=no \
>     --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
>     /home/andres/build/postgres/dev-assert/vpath/tmp_install/home/andres/build/postgres/dev-assert/install/bin/postgres.orig \
>     "$@"
>
> make -C src/test/recovery/ check PROVE_FLAGS='-v' PROVE_TESTS='t/026_overwrite_contrecord.pl' NO_TEMP_INSTALL=1
> ...
>
> not ok 1 - 000000010000000000000002 differs from 000000010000000000000002
>
> #   Failed test '000000010000000000000002 differs from 000000010000000000000002'
I added LSNs to the error message:

not ok 1 - 000000010000000000000002 (0/2002350) differs from 000000010000000000000002 (0/2099600)

It appears that the problem is that, in between the determination of
rows_walsize and the insertion, the insert LSN moved into the next segment
independently of the insertion itself, presumably due to autovacuum/analyze
or such.

<retries, with log_autovacuum_min_duration=0, log_min_duration_statement=0>

2021-10-13 11:23:25.659 PDT [1491455] 026_overwrite_contrecord.pl LOG:  statement: insert into filler select * from generate_series(1, 1000)
2021-10-13 11:23:26.467 PDT [1491455] 026_overwrite_contrecord.pl LOG:  duration: 861.112 ms
2021-10-13 11:23:27.055 PDT [1491458] 026_overwrite_contrecord.pl LOG:  statement: select pg_current_wal_insert_lsn() - '0/0'
2021-10-13 11:23:27.357 PDT [1491458] 026_overwrite_contrecord.pl LOG:  duration: 347.888 ms
2021-10-13 11:23:27.980 PDT [1491461] 026_overwrite_contrecord.pl LOG:  statement: WITH setting AS (
            SELECT setting::int AS wal_segsize
              FROM pg_settings WHERE name = 'wal_segment_size'
        )
        INSERT INTO filler
        SELECT g FROM setting,
          generate_series(1, 1000 * (wal_segsize - ((pg_current_wal_insert_lsn() - '0/0') % wal_segsize)) / 64232) g
2021-10-13 11:24:25.173 PDT [1491550] LOG:  automatic analyze of table "postgres.public.filler"
        avg read rate: 3.185 MB/s, avg write rate: 0.039 MB/s
        buffer usage: 96 hits, 731 misses, 9 dirtied
        system usage: CPU: user: 1.79 s, system: 0.00 s, elapsed: 1.81 s
2021-10-13 11:24:26.255 PDT [1491461] 026_overwrite_contrecord.pl LOG:  duration: 58360.811 ms
2021-10-13 11:24:26.857 PDT [1491557] 026_overwrite_contrecord.pl LOG:  statement: SELECT pg_current_wal_insert_lsn()
2021-10-13 11:24:27.120 PDT [1491557] 026_overwrite_contrecord.pl LOG:  duration: 300.562 ms

Hm. I guess we could disable autovacuum. But that's not a great solution;
there might be extra WAL due to catalog access etc. too.

Seems like it might be worth doing the "filling" of the segment with a loop
in a DO block instead, where the end condition is being within some distance
of the end of the segment? With plenty of headroom?

Another thing: filling a segment by inserting lots of very tiny rows is
pretty expensive. Can't we use something a bit wider? Perhaps even
pg_logical_emit_message()?

Greetings,

Andres Freund
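
PS: Something along these lines is roughly what I have in mind - entirely
untested, and the 1kB message size, the 'filler' prefix and the 100kB of
headroom are just placeholder values:

do $$
declare
    -- wal_segment_size is reported in bytes in pg_settings
    segsize int := (select setting::int from pg_settings
                    where name = 'wal_segment_size');
begin
    -- keep emitting ~1kB logical messages until we are within 100kB of
    -- the end of the current WAL segment (both sizes are placeholders)
    while segsize - ((pg_current_wal_insert_lsn() - '0/0') % segsize) > 100 * 1024
    loop
        perform pg_logical_emit_message(false, 'filler', repeat('x', 1024));
    end loop;
end
$$;

That would also avoid relying on a hardcoded estimate of the WAL volume
generated per inserted row.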