Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

Matthias van de Meent Mon, 20 Jun 2022 02:02:20 -0700

On Mon, 20 Jun 2022 at 07:02, Michael Paquier <[email protected]> wrote:
>
> On Sat, Jun 11, 2022 at 09:25:48PM +0200, Matthias van de Meent wrote:
> > Did you initiate a new cluster or otherwise skip the invalid record
> > you generated when running the instance based on master? It seems to
> > me you're trying to replay the invalid record (len > MaxAllocSize),
> > and this patch does not try to fix that issue. This patch just tries
> > to forbid emitting records larger than MaxAllocSize, as per the check
> > in XLogRecordAssemble, so that we wont emit unreadable records into
> > the WAL anymore.
> >
> > Reading unreadable records still won't be possible, but that's also
> > not something I'm trying to fix.
>
> As long as you cannot generate such WAL records that should be fine as
> wAL is not reused across upgrades, so this kind of restriction is a
> no-brainer on HEAD.  The back-patching argument is not on the table
> anyway, as some of the routine signatures change with the unsigned
> arguments, because of those safety checks.


The signature change is mostly ornamental, see attached v4.backpatch.
The main reason for changing the signature is to make sure nobody can
provide a negative value, but it's not important to the patch.

>
> +   if (unlikely(num_rdatas >= max_rdatas) ||
> +       unlikely(!AllocSizeIsValid((uint64) mainrdata_len + (uint64) len)))
> +       XLogErrorDataLimitExceeded();
> [...]
> +inline void
> +XLogErrorDataLimitExceeded()
> +{
> +   elog(ERROR, "too much WAL data");
> +}
> The three checks are different, OK..

They each check slightly different things, but with the same error. In
RegisterData, it checks that the data can still be allocated and does
not overflow the register, in RegisterBlock it checks that the total
length of data registered to the block does not exceed the max value
of XLogRecordBlockHeader->data_length. I've updated the comments above
the checks so that this distinction is more clear.

> Note that static is missing.

Fixed in attached v4.patch

> +   if (unlikely(!AllocSizeIsValid(total_len)))
> +       XLogErrorDataLimitExceeded();
> Rather than a single check at the end of XLogRecordAssemble(), you'd
> better look after that each time total_len is added up?

I was doing so previously, but there were some good arguments against that:

- Performance of XLogRecordAssemble should be impacted as little as
possible. XLogRecordAssemble is in many hot paths, and it is highly
unlikely this check will be hit, because nobody else has previously
reported this issue. Any check, however unlikely, will add some
overhead, so removing check counts reduces overhead of this patch.

- The user or system is unlikely to care about which specific check
was hit, and only needs to care _that_ the check was hit. An attached
debugger will be able to debug the internals of the xlog machinery and
find out the specific reasons for the error, but I see no specific
reason why the specific reason would need to be reported to the
connection.

Kind regards,

Matthias van de Meent

v4-0001-Add-protections-in-xlog-record-APIs-against-large.patch
Description: Binary data

v4-0001-Add-protections-in-xlog-record-APIs-against-large.backpatch
Description: Binary data

Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

Reply via email to