On Mon, Jul 22, 2024 at 1:17 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 11:48 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > I'm a little suspicious
> > of using it for tests that merely take an unreasonable amount of
> > time --- to me, that indicates laziness on the part of the test
> > author.
>
> Laziness would have been not bothering to develop a TAP test for this
> at all. Going to the trouble of creating one and not being able to
> make it as fast or as stable as everybody would like is just being
> human.
>
> I never quite know what to do about TAP testing for issues like this.
> Ideally, we want a test case that runs quickly, is highly stable, is
> perfectly sensitive to the bug being fixed, and has a reasonable
> likelihood of being sensitive to future bugs of the same ilk. But such
> a test case need not exist, and even if it does, it need not be the
> case that any of us are able to find it. Or maybe finding it is
> possible but will take an unreasonable amount of time: if it took a
> committer six months to come up with such a test case for this bug,
> would that be worth it, or just overkill? I'd say overkill: I'd rather
> have that committer working on other stuff than spending six months
> trying to craft the perfect test case for a bug that's already fixed.
>
> Also, this particular bug seems to require a very specific combination
> of circumstances in order to trigger it. So the test gets complicated.
> As mentioned, that makes it harder to get the test case fast and
> stable, but it also reduces the chances that the test case will ever
> find anything. I don't think that this will be the last time we make a
> mistake around VACUUM's xmin handling, but the next mistake may well
> require an equally baroque but *different* setup to cause a problem. I
> hate to come to the conclusion that we just shouldn't test for this,
> but I don't think it's fair to send Melanie off on a wild goose chase
> looking for a perfect test case that may not realistically exist,
> either.

So, I've just gone through all the test failures on master and 17 for
mamba, gull, mereswine, and copperhead. I wanted to confirm that the
test was always failing for the same reason and also whether it had
any failures pre-TIDStore.

We've only run tests with this commit on some of the back branches for
some of these animals. Of those, I don't see any failures so far. So,
it seems the test instability is just related to trying to get
multiple passes of index vacuuming reliably with TIDStore.

AFAICT, all the 32-bit machine failures are timeouts waiting for the
standby to catch up (mamba, gull, mereswine). Unfortunately, the
failures on copperhead (a 64-bit machine) are because we don't
actually succeed in triggering a second vacuum pass. This would not be
fixed by a longer timeout.

Because of this, I'm inclined to revert the test on 17 and master to
avoid distracting folks committing other work and seeing those animals
go red.

I wonder if Sawada-san or John have a test case minimally reproducing
a case needing multiple index vacuuming rounds. You can't do it with
my example and just more dead rows per page. If you just increase the
number of dead tuples, it doesn't increase the size of the TIDStore
unless those dead tuples are at different offsets. And I couldn't find
DDL which would cause the TIDStore to be > 1MB without using a low
fill-factor and many rows. Additionally, the fact that the same number
of rows does not trigger the multiple passes on two different 64-bit
machines worries me and makes me think that we will struggle to
trigger these conditions without overshooting the minimum by quite a
bit.

- Melanie

