On Mon, 20 Jun 2022 at 07:02, Michael Paquier <mich...@paquier.xyz> wrote: > > On Sat, Jun 11, 2022 at 09:25:48PM +0200, Matthias van de Meent wrote: > > Did you initiate a new cluster or otherwise skip the invalid record > > you generated when running the instance based on master? It seems to > > me you're trying to replay the invalid record (len > MaxAllocSize), > > and this patch does not try to fix that issue. This patch just tries > > to forbid emitting records larger than MaxAllocSize, as per the check > > in XLogRecordAssemble, so that we wont emit unreadable records into > > the WAL anymore. > > > > Reading unreadable records still won't be possible, but that's also > > not something I'm trying to fix. > > As long as you cannot generate such WAL records that should be fine as > wAL is not reused across upgrades, so this kind of restriction is a > no-brainer on HEAD. The back-patching argument is not on the table > anyway, as some of the routine signatures change with the unsigned > arguments, because of those safety checks.
The signature change is mostly ornamental, see attached v4.backpatch. The main reason for changing the signature is to make sure nobody can provide a negative value, but it's not important to the patch. > > + if (unlikely(num_rdatas >= max_rdatas) || > + unlikely(!AllocSizeIsValid((uint64) mainrdata_len + (uint64) len))) > + XLogErrorDataLimitExceeded(); > [...] > +inline void > +XLogErrorDataLimitExceeded() > +{ > + elog(ERROR, "too much WAL data"); > +} > The three checks are different, OK.. They each check slightly different things, but with the same error. In RegisterData, it checks that the data can still be allocated and does not overflow the register, in RegisterBlock it checks that the total length of data registered to the block does not exceed the max value of XLogRecordBlockHeader->data_length. I've updated the comments above the checks so that this distinction is more clear. > Note that static is missing. Fixed in attached v4.patch > + if (unlikely(!AllocSizeIsValid(total_len))) > + XLogErrorDataLimitExceeded(); > Rather than a single check at the end of XLogRecordAssemble(), you'd > better look after that each time total_len is added up? I was doing so previously, but there were some good arguments against that: - Performance of XLogRecordAssemble should be impacted as little as possible. XLogRecordAssemble is in many hot paths, and it is highly unlikely this check will be hit, because nobody else has previously reported this issue. Any check, however unlikely, will add some overhead, so removing check counts reduces overhead of this patch. - The user or system is unlikely to care about which specific check was hit, and only needs to care _that_ the check was hit. An attached debugger will be able to debug the internals of the xlog machinery and find out the specific reasons for the error, but I see no specific reason why the specific reason would need to be reported to the connection. Kind regards, Matthias van de Meent
v4-0001-Add-protections-in-xlog-record-APIs-against-large.patch
Description: Binary data
v4-0001-Add-protections-in-xlog-record-APIs-against-large.backpatch
Description: Binary data