Re: Postgres, fsync, and OSs (specifically linux)

Thomas Munro Sat, 17 Nov 2018 18:27:02 -0800

On Fri, Nov 9, 2018 at 9:06 AM Robert Haas <[email protected]> wrote:
> On Thu, Nov 8, 2018 at 3:04 PM Thomas Munro
> <[email protected]> wrote:
> > My reasoning for choosing bms_join() is that it cannot fail, assuming
> > the heap is not corrupted.  It simply ORs the two bit-strings into
> > whichever is the longer input string, and frees the shorter input
> > string.  (In an earlier version I used bms_union(), this function's
> > non-destructive sibling, but then realised that it could fail to
> > allocate() causing us to lose track of a 1 bit).
>
> Oh, OK.  I was assuming it was allocating.


I did some more testing using throw-away fault injection patch 0003.
I found one extra problem:  fsync_fname() needed data_sync_elevel()
treatment, because it is used in eg CheckPointCLOG().

With data_sync_retry = on, if you update a row, touch
/tmp/FileSync_EIO and try to checkpoint then the checkpoint fails, and
the cluster keeps running.  Future checkpoint attempts report the same
error about the same file, showing that patch 0001 works (we didn't
forget about the dirty file).  Then rm /tmp/FileSync_EIO, and the next
checkpoint should succeed.

With data_sync_retry = off (the default), the same test produces a
PANIC, showing that patch 0002 works.

It's similar if you touch /tmp/pg_sync_EIO instead.  That shows that
cases like fsync_fname("pg_xact") also cause PANIC when
data_sync_retry = off, but it hides the bug that 0001 fixes when
data_sync_retry = on, hence my desire to test the two different fault
injection points.

I think these patches are looking good now.  If I don't spot any other
problems or hear any objections, I will commit them tomorrow-ish.

-- 
Thomas Munro
http://www.enterprisedb.com

0001-Don-t-forget-about-failed-fsync-requests-v5.patch
Description: Binary data

0002-PANIC-on-fsync-failure-v5.patch
Description: Binary data

0003-fsync-fault-injection.-For-testing-only-v5.patch
Description: Binary data

Re: Postgres, fsync, and OSs (specifically linux)

Reply via email to