> On Jul 15, 2021, at 3:32 PM, Tomas Vondra <tomas.von...@enterprisedb.com> 
> wrote:
> 
> I think it's mostly futile to list all the possible issues this might have 
> caused - if you skip arbitrary decoded changes, that can trigger pretty much 
> any bug in reorder buffer. But those bugs can be triggered by various other 
> issues, of course.

I thought that at first, too, but when I searched for bug reports with the 
given error messages, they all had to do with logical replication or logical 
decoding.  That seems a bit fishy to me.  If these bugs were coming from all 
over the system, why would that be so?

> It's hard to say what was the cause, but the "logic" bugs are probably 
> permanent, while the issues triggered by I/O probably disappear after a 
> restart?

If you mean "logic" bugs like passing an invalid file descriptor to close(), 
then yes, those would be permanent, but I don't believe we have any bugs of 
that kind.

The consequences of I/O errors are not going to go away after a restart.  On
the subscriber side, if logical replication has replayed less than a full
transaction's worth of changes without raising an error, the corruption will
be permanent.

> That being said, I agree this seems like an issue and we should not ignore 
> I/O errors.

I agree.

> I'd bet other places using transient files (like sorting or hashagg spilling 
> to disk) has the same issue, although in that case the impact is likely 
> limited to a single query.

I'm not so convinced.  In most places, the result of a close()-type operation
is ignored only when the prior operation already failed and we're closing the
handle in preparation for raising an error.  A small number of other places,
such as pg_import_system_collations and perform_base_backup, ignore the result
of closing a handle, but those are reading from the handle, not writing to it,
so the situation is not comparable.
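
To illustrate the distinction, here is a hand-written sketch of the two
patterns (not code quoted from the tree; fd, buf, len, and path are
placeholders):

    /*
     * Harmless pattern: the close() result is ignored only because we are
     * already on an error path and about to report the earlier failure.
     */
    if (write(fd, buf, len) != len)
    {
        int         save_errno = errno;

        close(fd);              /* result deliberately ignored */
        errno = save_errno;
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not write to file \"%s\": %m", path)));
    }

    /*
     * Problematic pattern: everything written so far appeared to succeed,
     * but an error reported by close() is silently discarded.
     */
    close(fd);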

I believe the oversight in reorderbuffer.c really is a special case.

> I wonder if sync before the close is an appropriate solution, though. It 
> seems rather expensive, and those files are meant to be "temporary" (i.e. we 
> don't keep them over restart). So maybe we could ensure the consistency is a 
> cheaper way - perhaps tracking some sort of checksum for each file, or 
> something like that?

I'm open to suggestions about what to do, but thinking of these files as
temporary may be exactly what misleads developers into believing they don't
have to be treated carefully.  The code was first committed in 2014, and as
far as I am aware nobody else has complained about this before.  In part that
might be because CloseTemporaryFile() is less familiar than close() to most
developers; they may assume it contains its own error handling and not realize
that it returns an error code just as regular close() does.
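
To surface such an error, the caller has to check the result explicitly, along
the lines of the idiom used elsewhere in the tree for transient files (a
sketch only; the exact close call in reorderbuffer.c may differ):

    if (CloseTransientFile(fd) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not close file \"%s\": %m", path)));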

The point of the reorder buffer is to sort the changes from a single
transaction so that they can be replayed in order.  The files used for the
sorting are temporary, but there is nothing temporary about failing to include
all the changes in them, as that has permanent consequences for whoever
replays them.  If we're lucky, the attempt to replay the changes will abort
because they don't make sense, but I've demonstrated to my own satisfaction
that nothing prevents silent data loss if the failed write happens to destroy
a complete change rather than a partial one.
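
For reference, the sync-before-close idea amounts to roughly the following at
the point where a spill file is closed after writing (again just a sketch, not
the actual patch; fd and path stand for the spill file):

    /*
     * Flush the spilled changes before closing, so that any I/O error is
     * reported here rather than silently dropped.
     */
    if (pg_fsync(fd) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not fsync file \"%s\": %m", path)));

    if (CloseTransientFile(fd) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not close file \"%s\": %m", path)));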

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company