On Wed, Dec 06, 2000 at 06:53:37PM -0600, Bruce Guenter wrote:
> On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote:
> > On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
> > >
> > > I don't know how pgsql does it, but the only safe way I know of
> > > is to include an "end" marker after each record.
> >
> > An "end" marker is not sufficient, unless all writes are done in
> > one-sector units with an fsync between, and the drive buffering
> > is turned off.
>
> That's why an end marker must follow all valid records. When you write
> records, you don't touch the marker, and add an end marker to the end of
> the records you've written. After writing and syncing the records, you
> rewrite the end marker to indicate that the data following it is valid,
> and sync again. There is no state in that sequence in which partially-
> written data could be confused as real data, assuming either your drives
> aren't doing write-back caching or you have a UPS, and fsync doesn't
> return until the drives return success.

That requires an extra out-of-sequence write.

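To make the cost concrete, here is a rough sketch of the sequence Bruce
describes, as I read it. It is not how pgsql or anything else actually
does it. The one-byte markers, the length prefix, and the names
append_batch and read_valid are all made up for illustration.

```python
import os
import struct
import tempfile

END = b"E"   # hypothetical marker: nothing valid follows
MORE = b"M"  # hypothetical marker: a valid record follows

def append_batch(f, marker_pos, payload):
    """Append one record after the END marker at marker_pos.

    Returns the position of the new END marker.
    """
    body = struct.pack(">I", len(payload)) + payload + END
    # Step 1: write the record and a fresh END marker past the old marker.
    f.seek(marker_pos + 1)
    f.write(body)
    f.flush()
    os.fsync(f.fileno())
    # Step 2: the out-of-sequence write: seek back and flip the old marker.
    f.seek(marker_pos)
    f.write(MORE)
    f.flush()
    os.fsync(f.fileno())
    return marker_pos + len(body)

def read_valid(f):
    """Return every record reachable through a chain of MORE markers."""
    f.seek(0)
    records = []
    while f.read(1) == MORE:
        (n,) = struct.unpack(">I", f.read(4))
        records.append(f.read(n))
    return records

with tempfile.TemporaryFile() as f:
    f.write(END)  # an empty log is a single END marker
    pos = append_batch(f, 0, b"rec1")
    pos = append_batch(f, pos, b"rec2")
    print(read_valid(f))
```

Each append costs two syncs and a seek backward, and it is only safe
under the assumptions Bruce lists: no write-back caching (or a UPS), and
an fsync that really waits for the drive.
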
> > > Any other way I've seen discussed (here and elsewhere) either
> > > - Assume that a CRC is a guarantee.
> >
> > We are already assuming a CRC is a guarantee.
> >
> > The drive computes a CRC for each sector, and if the CRC is OK the
> > drive is happy. CRC errors within the drive are quite frequent, and
> > the drive re-reads when a bad CRC comes up.
>
> The kind of data failures that a CRC is guaranteed to catch (N-bit
> errors) are almost precisely those that a mis-read on a hardware sector
> would cause.

They catch a single mis-read, but not necessarily the quite likely
double mis-read.

> > > ... A CRC would be a good addition to
> > > help ensure the data wasn't broken by flakey drive firmware, but
> > > doesn't guarantee consistency.
> > No, a CRC would be a good addition to compensate for sector write
> > reordering, which is done both by the OS and by the drive, even for
> > "atomic" writes.
>
> But it doesn't guarantee consistency, even in that case. There is a
> possibility (however small) that the random data that was located in
> the sectors before the write will match the CRC.

Generally, there are no guarantees, only reasonable expectations. A
64-bit CRC would give sufficient confidence without the out-of-sequence
write, and would also detect corruption from any source, including a
power outage.

(I'd also like to see CRCs on all the table blocks; is there a place
to put them?)
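
As a rough sketch of what I mean (not a proposal for the actual log
format): each record carries a CRC over its length and payload, and
recovery stops at the first record that fails the check. Stale data
matching a 64-bit CRC by chance has probability about 2^-64. The framing
and the names crc64, frame, and read_valid are hypothetical. The
polynomial is the ECMA-182 one, and the all-ones initial value is my own
choice so that a zero-filled region does not verify.

```python
import struct

POLY = 0x42F0E1EBA9EA3693  # ECMA-182 CRC-64 polynomial
MASK = (1 << 64) - 1
TOP = 1 << 63

def crc64(data):
    """Bitwise, MSB-first CRC-64 with an all-ones initial value (assumed)."""
    crc = MASK
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & TOP:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc

def frame(payload):
    """Lay out one record as: 4-byte length, payload, 8-byte CRC."""
    body = struct.pack(">I", len(payload)) + payload
    return body + struct.pack(">Q", crc64(body))

def read_valid(buf):
    """Return payloads up to the first torn, truncated, or corrupt record."""
    records = []
    pos = 0
    while pos + 4 <= len(buf):
        (n,) = struct.unpack_from(">I", buf, pos)
        end = pos + 4 + n
        if end + 8 > len(buf):
            break  # record runs past the end of the log
        (stored,) = struct.unpack_from(">Q", buf, end)
        if crc64(buf[pos:end]) != stored:
            break  # CRC mismatch: treat this as the end of the log
        records.append(buf[pos + 4:end])
        pos = end + 8
    return records

log = frame(b"first") + frame(b"second")
print(read_valid(log))
```

All writes here are sequential appends with a single sync per batch. The
price is that validity is a matter of probability rather than of
protocol.
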
Nathan Myers
[EMAIL PROTECTED]