Hi, xlogreader.c has a pointer overflow bug, as revealed by the combination of -fsanitize=undefined -m32, the new 039_end_of_wal.pl test and Robert's incremental backup patch[1]. The bad code tests whether an object could fit using something like base + size <= end, which can be converted to something like size <= end - base to avoid the overflow. See experimental fix patch, attached.
I took a while to follow up because I wanted to understand exactly why it doesn't break in practice despite being that way since v15. I think I have it now: 1. In the case of a bad/garbage size at end-of-WAL, the following-page checks will fail first before anything bad happens as a result of the overflow. 2. In the case of a real oversized record on current 64 bit architectures including amd64, aarch64, power and riscv64, the pointer can't really overflow anyway because the virtual address space is < 64 bit, typically around 48, and record lengths are 32 bit. 3. In the case of the 32 bit kernels I looked at including Linux, FreeBSD and cousins, Solaris and Windows the top 1GB plus a bit more of virtual address space is reserved for system use*, so I think a real oversized record shouldn't be able to overflow the pointer there either. A 64 bit kernel running a 32 bit process could run into trouble, though :-(. Those don't need to block out that high memory segment. You'd need to have the WAL buffer in that address range and decode large enough real WAL records and then things could break badly. I guess 32/64 configurations must be rare these days outside developers testing 32 bit code, and that is what happened here (ie CI); and with some minor tweaks to the test it can be reached without Robert's patch too. There may of course be other more exotic systems that could break, but I don't know specifically what. TLDR; this is a horrible bug, but all damage seems to be averted on "normal" systems. The best thing I can say about all this is that the new test found a bug, and the fix seems straightforward. I will study and test this some more, but wanted to share what I have so far. (*I think the old 32 bit macOS kernels might have been an exception to this pattern but 32 bit kernels and even processes are history there.) [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGLCuW_CE9nDAbUNV40G2FkpY_kcPZkaORyVBVive8FQHQ%40mail.gmail.com#d0d00ca5cc3f756656466adc9f2ec186
0001-Fix-pointer-overflow-in-xlogreader.c.patch
Description: Binary data