Hello,

At Fri, 2 Feb 2018 19:52:02 -0300, Claudio Freire <klaussfre...@gmail.com> wrote in <cagtbqpainqsnjc8y4w82ubtapsvsqrrg++yei5wre1mfe2i...@mail.gmail.com>
> On Thu, Jan 25, 2018 at 6:21 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > On Fri, Jan 26, 2018 at 9:38 AM, Claudio Freire <klaussfre...@gmail.com>
> > wrote:
> >> I had the tests running in a loop all day long, and I cannot reproduce
> >> that variance.
> >>
> >> Can you share your steps to reproduce it, including configure flags?
> >
> > Here are two build logs where it failed:
> >
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/332968819
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/332592511
> >
> > Here's one where it succeeded:
> >
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/333139855
> >
> > The full build script used is:
> >
> > ./configure --enable-debug --enable-cassert --enable-coverage
> > --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap
> > --with-icu && make -j4 all contrib docs && make -Otarget -j3
> > check-world
> >
> > This is a virtualised 4 core system.  I wonder if "make -Otarget -j3
> > check-world" creates enough load on it to produce some weird timing
> > effect that you don't see on your development system.
>
> I can't reproduce it, not even with the same build script.

I got the same error with "make -j3 check-world", but only twice in
many trials.

> It's starting to look like a timing effect indeed.

It seems that the truncation was skipped, maybe because of a
concurrent autovacuum. See lazy_truncate_heap() for details. Updates
of pg_stat_*_tables can be delayed, so checking them can also fail.
Although I haven't looked at the patch closely, "SELECT
pg_relation_size()" doesn't seem to give anything meaningful anyway.

> I get a similar effect if there's an active snapshot in another
> session while vacuum runs. I don't know how the test suite ends up in
> that situation, but it seems to be the case.
>
> How do you suggest we go about fixing this? The test in question is
> important, I've caught actual bugs in the implementation with it,
> because it checks that vacuum effectively frees up space.
>
> I'm thinking this vacuum test could be put on its own parallel group
> perhaps? Since I can't reproduce it, I can't know whether that will
> fix it, but it seems sensible.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center