On Sat, Nov 27, 2021 at 12:34 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> One thing that's not clear to me is what happened to the reasons why
> this feature was reverted in the PG14 cycle?

Reasons for reverting:

1. A bug in commit 323cbe7c, "Remove read_page callback from
XLogReader.". I couldn't easily revert just that piece. This new
version doesn't depend on that change anymore, to try to keep things
simple. (That particular bug has been fixed in a newer version of
that patch[1], which I still think was a good idea incidentally.)

2. A bug where allocation for large records happened before
validation. Concretely, you can see that this patch does
XLogReadRecordAlloc() after validating the header (usually, same as
master), but commit f003d9f8 did it first. (Though Andres pointed
out[2] that more work is needed on that to make that logic more
robust, and I'm keen to look into that, but that's independent of
this work).

3. A wild goose chase for bugs on Tom Lane's antique 32 bit PPC
machine. Tom eventually reproduced it with the patches reverted,
which seemed to exonerate them but didn't leave a good feeling: what
was happening, and why did the patches hugely increase the likelihood
of the failure mode? I have no new information on that, but I know
that several people spent a huge amount of time and effort trying to
reproduce it on various types of systems, as did I, so despite not
reaching a conclusion of a bug, this certainly contributed to a
feeling that the patch had run out of steam for the 14 cycle.

This week I'll have another crack at getting that TAP test I proposed
that runs the regression tests with a streaming replica to work on
Windows. That does approximately what Tom was doing when he saw
problem #3, which I'd like to have as standard across the build farm.

[1] https://www.postgresql.org/message-id/20211007.172820.1874635561738958207.horikyota.ntt%40gmail.com
[2] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg%40alap3.anarazel.de