Andres Freund <and...@anarazel.de> writes:
> I do see that the LSN that ends up on the page is the same across a few runs
> of the test on serinus. Which presumably differs between different
> animals. Surprised that it's this predictable - but I guess the run is short
> enough that there's no variation due to autovacuum, checkpoints etc.

Uh-huh.  I'm not surprised that it's repeatable on a given animal.
What remains to be explained:

1. Why'd it start failing now?  I'm guessing that ce95c5437 *was* the
culprit after all, by slightly changing the amount of catalog data
written during initdb, and thus moving the initial LSN.

2. Why just these two animals?  If initial LSN is the critical thing,
then the results of "locale -a" would affect it, so platform
dependence is hardly surprising ... but I'd have thought that all
the animals on that host would use the same initial set of
collations.  OTOH, I see petalura and pogona just fell over too.
Do you have some of those animals --with-icu and others not?

> 16bit checksums for the win.

Yay :-(

As for a fix, would damaging more of the page help?  I guess
it'd just move around the one-in-64K chance of failure.  Maybe
we have to intentionally corrupt (e.g. invert) the checksum field
specifically.

			regards, tom lane
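
A minimal sketch of that last point, under stated assumptions. The checksum here is a stand-in (CRC-32 folded to 16 bits), not PostgreSQL's page checksum algorithm, and the page layout (an 8-byte LSN followed by a 16-bit checksum at offset 8) plus all function names are hypothetical, chosen only for illustration. It shows that inverting the stored checksum field cannot verify by accident, whereas damaging page data relies on the recomputed checksum not happening to match the stored one.

```python
import struct
import zlib

PAGE_SIZE = 8192
CHECKSUM_OFFSET = 8  # assumed: 16-bit checksum field right after an 8-byte LSN


def toy_checksum(page: bytes) -> int:
    """Stand-in 16-bit checksum (NOT PostgreSQL's algorithm).

    Computed over the page with the checksum field zeroed, so the result
    does not depend on whatever is currently stored in that field.
    """
    scratch = bytearray(page)
    scratch[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = b"\x00\x00"
    crc = zlib.crc32(bytes(scratch))
    return (crc ^ (crc >> 16)) & 0xFFFF


def make_page(lsn: int) -> bytearray:
    """Build a page whose contents vary only with the LSN, then stamp it."""
    page = bytearray(PAGE_SIZE)
    struct.pack_into("<Q", page, 0, lsn)
    page[64:] = bytes(i % 251 for i in range(PAGE_SIZE - 64))
    struct.pack_into("<H", page, CHECKSUM_OFFSET, toy_checksum(page))
    return page


def verifies(page: bytes) -> bool:
    """True if the stored checksum matches the recomputed one."""
    (stored,) = struct.unpack_from("<H", page, CHECKSUM_OFFSET)
    return stored == toy_checksum(page)


def damage_data(page: bytearray) -> None:
    """Zero part of the data area; detection depends on the checksum changing."""
    page[4000:4100] = bytes(100)


def invert_checksum(page: bytearray) -> None:
    """Flip every bit of the stored checksum; it can never equal its old value."""
    (stored,) = struct.unpack_from("<H", page, CHECKSUM_OFFSET)
    struct.pack_into("<H", page, CHECKSUM_OFFSET, stored ^ 0xFFFF)


if __name__ == "__main__":
    page = make_page(lsn=0x16B3748)  # arbitrary illustrative LSN
    damaged = bytearray(page)
    damage_data(damaged)
    print("data damage detected:", not verifies(damaged))

    inverted = bytearray(page)
    invert_checksum(inverted)
    print("inverted checksum detected:", not verifies(inverted))
```

With damage_data, whether the corruption is detected depends on the page contents, which include the LSN, so an unlucky LSN can make the damaged page verify. With invert_checksum the page data is untouched, the recomputed value still equals the original stored value, and the inverted field differs from it in every bit.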