Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Anthony Iliopoulos
Hi Robert, On Tue, Apr 10, 2018 at 11:15:46AM -0400, Robert Haas wrote: > On Mon, Apr 9, 2018 at 3:13 PM, Andres Freund wrote: > > Let's lower the pitchforks a bit here. Obviously a grand rewrite is > > absurd, as is some of the proposed ways this is all supposed to > > work. But I think the cas

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 12:37:03PM -0700, Andres Freund wrote: > > > On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos > wrote: > > >I honestly do not expect that keeping around the failed pages will > >be an acceptable change for most kernels, and as such th

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 12:29:16PM -0700, Andres Freund wrote: > On 2018-04-09 21:26:21 +0200, Anthony Iliopoulos wrote: > > What about having buffered IO with implied fsync() atomicity via > > O_SYNC? > > You're kidding, right? We could also just add sleep(30)'

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 04:29:36PM +0100, Greg Stark wrote: > Honestly I don't think there's *any* way to use the current interface > to implement reliable operation. Even that embedded database using a > single process and keeping every file open all the time (which means > file descriptor limits

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: > > We already have dirty_bytes and dirty_background_bytes, for example. I > don't see why there couldn't be another limit defining how much dirty > data to allow before blocking writes altogether. I'm sure it's not that > simple, but yo

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 08:16:38PM +0800, Craig Ringer wrote: > > I'd like a middle ground where the kernel lets us register our interest and > tells us if it lost something, without us having to keep eight million FDs > open for some long period. "Tell us about anything that happens under > pgdat

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote: > On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: > > > What you seem to be asking for is the capability of dropping > > buffers over the (kernel) fence and indemnifying the application > > from any furthe

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 09:45:40AM +0100, Greg Stark wrote: > On 8 April 2018 at 22:47, Anthony Iliopoulos wrote: > > On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: > >> On 8 April 2018 at 04:27, Craig Ringer wrote: > >> > On 8 April 2018 at 10:16, Thom

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Anthony Iliopoulos
On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: > On 8 April 2018 at 04:27, Craig Ringer wrote: > > On 8 April 2018 at 10:16, Thomas Munro > > wrote: > > > > If the kernel does writeback in the middle, how on earth is it supposed to > > know we expect to reopen the file and check back

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-03 Thread Anthony Iliopoulos
On Tue, Apr 03, 2018 at 03:37:30PM +0100, Greg Stark wrote: > On 3 April 2018 at 14:36, Anthony Iliopoulos wrote: > > > If EIO persists between invocations until explicitly cleared, a process > > cannot possibly make any decision as to if it should clear the error > >

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-03 Thread Anthony Iliopoulos
On Tue, Apr 03, 2018 at 12:26:05PM +0100, Greg Stark wrote: > On 3 April 2018 at 11:35, Anthony Iliopoulos wrote: > > Hi Robert, > > > > Fully agree, and the errseq_t fixes have dealt exactly with the issue > > of making sure that the error is reported to all file desc

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-03 Thread Anthony Iliopoulos
Hi Robert, On Mon, Apr 02, 2018 at 10:54:26PM -0400, Robert Haas wrote: > On Mon, Apr 2, 2018 at 2:53 PM, Anthony Iliopoulos wrote: > > Given precisely that the dirty pages which cannot be written out are > > practically thrown away, the semantics of fsync() (after the 4.13

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-02 Thread Anthony Iliopoulos
Hi Stephen, On Mon, Apr 02, 2018 at 04:58:08PM -0400, Stephen Frost wrote: > > fsync() doesn't reflect the status of given pages, however, it reflects > the status of the file descriptor, and as such the file, on which it's > called. This notion that fsync() is actually only responsible for the >

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-02 Thread Anthony Iliopoulos
On Mon, Apr 02, 2018 at 12:32:45PM -0700, Andres Freund wrote: > On 2018-04-02 20:53:20 +0200, Anthony Iliopoulos wrote: > > On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote: > > > Throwing away the dirty pages *and* persisting the error seems a lot > > > mo

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-02 Thread Anthony Iliopoulos
On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote: > Hi, > > On 2018-04-01 03:14:46 +0200, Anthony Iliopoulos wrote: > > On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote: > > > Craig Ringer writes: > > > > So we should just use the bi

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-02 Thread Anthony Iliopoulos
On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote: > Craig Ringer writes: > > So we should just use the big hammer here. > > And bitch, loudly and publicly, about how broken this kernel behavior is. > If we make enough of a stink maybe it'll get fixed. It is not likely to be fixed (beyond

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-02 Thread Anthony Iliopoulos
On Sun, Apr 01, 2018 at 12:13:09AM +0800, Craig Ringer wrote: > On 31 March 2018 at 21:24, Anthony Iliopoulos <ail...@altatus.com> > wrote: > > On Fri, Mar 30, 2018 at 10:18:14AM +1300, Thomas Munro wrote: > > > >>

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-02 Thread Anthony Iliopoulos
On Fri, Mar 30, 2018 at 10:18:14AM +1300, Thomas Munro wrote: > >> Yeah, I see why you want to PANIC. > > > > Indeed. Even doing that leaves question marks about all the kernel > > versions before v4.13, which at this point is pretty much everything > > out there, not even detecting this reliably.