On Fri, Jun 05, 2020 at 05:20:52PM +0200, Tomas Vondra wrote:
...
which is not particularly great, I guess. However, something seems to
be wrong, because with prefetching enabled I see this in the log:
prefetch:
2020-06-05 02:47:25.970 CEST 1591318045.970 [22961] LOG: recovery no
longer prefetching: unexpected pageaddr 108/E8000000 in log segment
0000000100000108000000FF, offset 0
prefetch2:
2020-06-05 15:29:23.895 CEST 1591363763.895 [26676] LOG: recovery no
longer prefetching: unexpected pageaddr 108/E8000000 in log segment
000000010000010900000001, offset 0
Which seems pretty suspicious, but I have no idea what's wrong. I admit
the archive/restore commands are a bit hacky, but I've only seen this
with prefetching on the SATA storage, while all other cases seem to be
just fine. I haven't seen it on NVMe (which processes much more WAL).
And the SATA baseline (no prefetching) also worked fine.
Moreover, the pageaddr value is the same in both cases, but the WAL
segments are different (though only a couple of segments apart). Seems
strange.
I suspected it might be due to a somewhat hackish restore_command that
prefetches some of the WAL segments, so I tried again with a much
simpler restore_command - essentially just:
restore_command = 'cp /archive/%f %p.tmp && mv %p.tmp %p'
which I think should be fine for testing purposes. And I got this:
LOG: recovery no longer prefetching: unexpected pageaddr 108/57000000
in log segment 0000000100000108000000FF, offset 0
LOG: restored log file "0000000100000108000000FF" from archive
which is the same segment as in the earlier examples, but with a
different pageaddr value. Of course, there's no such pageaddr in the WAL
segment (and recovery of that segment succeeds).
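To make the mismatch concrete, here is a minimal sketch of the arithmetic, not anything taken from the patch. It assumes the default 16 MB WAL segment size (the segment size of the test cluster is not stated here) and timeline 1; the helper names parse_lsn, format_lsn, segment_start_lsn and segment_name_for are made up for illustration. It maps a segment file name to the LSN its first page should carry, and maps a reported pageaddr back to the segment it would belong to:

```python
# Assumption: default 16 MB WAL segments; adjust if the cluster was
# initialized with a different --wal-segsize.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Turn an LSN such as '108/E8000000' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Format a 64-bit LSN as two hex halves, like the server log does."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def segment_start_lsn(name, seg_size=WAL_SEGMENT_SIZE):
    """LSN of the first page of a 24-character WAL segment file name."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    segs_per_log = 0x100000000 // seg_size
    log_id = int(name[8:16], 16)   # middle 8 hex digits
    seg_id = int(name[16:24], 16)  # last 8 hex digits
    return (log_id * segs_per_log + seg_id) * seg_size


def segment_name_for(lsn, timeline=1, seg_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that would contain the given LSN."""
    segs_per_log = 0x100000000 // seg_size
    segno = lsn // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_log,
                             segno % segs_per_log)


# (segment being read, pageaddr reported at offset 0) from the logs above
observations = [
    ("0000000100000108000000FF", "108/E8000000"),
    ("000000010000010900000001", "108/E8000000"),
    ("0000000100000108000000FF", "108/57000000"),
]

for segment, pageaddr in observations:
    expected = format_lsn(segment_start_lsn(segment))
    owner = segment_name_for(parse_lsn(pageaddr))
    print("%s: expected pageaddr %s, got %s (would belong to %s)"
          % (segment, expected, pageaddr, owner))
```

Under the 16 MB assumption, each reported pageaddr is the start address of a different, older segment than the one being read, which is consistent with the observation that the restored segment itself contains no such page.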
So I think there's something broken ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services