Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-29 Thread Craig Ringer
For archive readers, this thread is continued as https://www.postgresql.org/message-id/20180427222842.in2e4mibx45zd...@alap3.anarazel.de and there's a follow-up LWN article at https://lwn.net/Articles/752613/ too.

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-26 Thread Thomas Munro
On Tue, Apr 24, 2018 at 12:09 PM, Bruce Momjian wrote: > On Mon, Apr 23, 2018 at 01:14:48PM -0700, Andres Freund wrote: >> Hi, >> >> On 2018-03-28 10:23:46 +0800, Craig Ringer wrote: >> > TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at >> > least on Linux. When fsync()

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-25 Thread Craig Ringer
On 24 April 2018 at 04:14, Andres Freund wrote: > I'm LSF/MM to discuss future behaviour of linux here, but that's how it > is right now. Interim LWN.net coverage of that can be found here: https://lwn.net/Articles/752613/ -- Craig Ringer http://www.2ndQuadrant.com/ Postgr

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-23 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 01:14:48PM -0700, Andres Freund wrote: > Hi, > > On 2018-03-28 10:23:46 +0800, Craig Ringer wrote: > > TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at > > least on Linux. When fsync() returns success it means "all writes since the > > last fsync

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-23 Thread Andres Freund
Hi, On 2018-03-28 10:23:46 +0800, Craig Ringer wrote: > TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at > least on Linux. When fsync() returns success it means "all writes since the > last fsync have hit disk" but we assume it means "all writes since the last > SUCCESSF
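The TL;DR quoted above can be illustrated with a minimal sketch. It assumes the failure policy is "escalate, do not retry"; the names write_durably and FatalFsyncError and the injectable fsync parameter are hypothetical and are not PostgreSQL code.

```python
import os


class FatalFsyncError(RuntimeError):
    """Hypothetical error type: the caller should stop and recover from its log."""


def write_durably(path, data, fsync=os.fsync):
    """Write data, then fsync exactly once; a failed fsync is never retried."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # A later fsync() could report success even though the failed
            # pages never reached disk, so escalate instead of retrying.
            raise FatalFsyncError(f"fsync failed on {path}: {exc}") from exc
    finally:
        os.close(fd)
```

The fsync parameter exists only so that a failure can be simulated; a real caller would leave it at its default and treat FatalFsyncError as a reason to stop and replay from a trusted log.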

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-21 Thread Gasper Zejn
Just for the record, I tried the test case with ZFS on Ubuntu 17.10 host with ZFS on Linux 0.6.5.11. ZFS does not swallow the fsync error, but the system does not handle the error nicely: the test case program hangs on fsync, the load jumps up and there's a bunch of z_wr_iss and z_null_int kernel

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-20 Thread Bruce Momjian
On Wed, Apr 18, 2018 at 08:45:53PM +0800, Craig Ringer wrote: > On 18 April 2018 at 19:46, Bruce Momjian wrote: > > > So, if sync mode passes the write to NFS, and NFS pre-reserves write > > space, and throws an error on reservation failure, that means that NFS > > will not corrupt a cluster on

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Craig Ringer
On 19 April 2018 at 07:31, Mark Kirkwood wrote: > On 19/04/18 00:45, Craig Ringer wrote: > >> >> I guarantee you that when you create a 100GB EBS volume on AWS EC2, >> you don't get 100GB of storage preallocated. AWS are probably pretty >> good about not running out of backing store, though. >> >>

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Mark Kirkwood
On 19/04/18 00:45, Craig Ringer wrote: I guarantee you that when you create a 100GB EBS volume on AWS EC2, you don't get 100GB of storage preallocated. AWS are probably pretty good about not running out of backing store, though. Some db folks (used to anyway) advise dd'ing to your freshly a

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Craig Ringer
On 18 April 2018 at 19:46, Bruce Momjian wrote: > So, if sync mode passes the write to NFS, and NFS pre-reserves write > space, and throws an error on reservation failure, that means that NFS > will not corrupt a cluster on out-of-space errors. Yeah. I need to verify in a concrete test case.

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Bruce Momjian
On Tue, Apr 17, 2018 at 02:41:42PM -0700, Andres Freund wrote: > On 2018-04-17 17:32:45 -0400, Bruce Momjian wrote: > > On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote: > > > That doesn't seem like a very practical way. It's better than nothing, > > > of course, but I wonder how would

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Bruce Momjian
On Wed, Apr 18, 2018 at 06:04:30PM +0800, Craig Ringer wrote: > On 18 April 2018 at 05:19, Bruce Momjian wrote: > > On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote: > >> On 10 April 2018 at 02:59, Craig Ringer wrote: > >> > >> > Nitpick: In most cases the kernel reserves disk space imm

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Craig Ringer
On 10 April 2018 at 20:15, Craig Ringer wrote: > On 10 April 2018 at 14:10, Michael Paquier wrote: > >> Well, I think that there is place for improving reporting of failure >> in file_utils.c for frontends, or at worst have an exit() for any kind >> of critical failures equivalent to a PANIC. > >

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Craig Ringer
On 18 April 2018 at 05:19, Bruce Momjian wrote: > On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote: >> On 10 April 2018 at 02:59, Craig Ringer wrote: >> >> > Nitpick: In most cases the kernel reserves disk space immediately, >> > before returning from write(). NFS seems to be the main e

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-18 Thread Bruce Momjian
On Tue, Apr 17, 2018 at 02:34:53PM -0700, Andres Freund wrote: > On 2018-04-17 17:29:17 -0400, Bruce Momjian wrote: > > Also, if we are relying on WAL, we have to make sure WAL is actually > > safe with fsync, and I am betting only the O_DIRECT methods actually > > are safe: > > > > #wal_sync_

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-17 Thread Bruce Momjian
On Mon, Apr 9, 2018 at 12:25:33PM -0700, Peter Geoghegan wrote: > On Mon, Apr 9, 2018 at 12:13 PM, Andres Freund wrote: > > Let's lower the pitchforks a bit here. Obviously a grand rewrite is > > absurd, as is some of the proposed ways this is all supposed to > > work. But I think the case we're

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-17 Thread Andres Freund
On 2018-04-17 17:32:45 -0400, Bruce Momjian wrote: > On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote: > > That doesn't seem like a very practical way. It's better than nothing, > > of course, but I wonder how would that work with containers (where I > > think you may not have access to

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-17 Thread Andres Freund
On 2018-04-17 17:29:17 -0400, Bruce Momjian wrote: > Also, if we are relying on WAL, we have to make sure WAL is actually > safe with fsync, and I am betting only the O_DIRECT methods actually > are safe: > > #wal_sync_method = fsync # the default is the first > option >

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-17 Thread Bruce Momjian
On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote: > On 04/09/2018 12:29 AM, Bruce Momjian wrote: > > > > A crazy idea would be to have a daemon that checks the logs and > > stops Postgres when something seems wrong. > > > > That doesn't seem like a very practical way. It's better

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-17 Thread Bruce Momjian
On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote: > On 10 April 2018 at 02:59, Craig Ringer wrote: > > > Nitpick: In most cases the kernel reserves disk space immediately, > > before returning from write(). NFS seems to be the main exception > > here. > > I'm kind of puzzled by this. S

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-11 Thread Jonathan Corbet
On Wed, 11 Apr 2018 07:29:09 -0700 Andres Freund wrote: > If that room can be found, I might be able to make it. Being in SF, I'm > probably the physically closest PG dev involved in the discussion. OK, I've dropped the PC a note; hopefully you'll be hearing from them. jon

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-11 Thread Andres Freund
Hi, On 2018-04-11 06:05:27 -0600, Jonathan Corbet wrote: > The event is April 23-25 in Park City, Utah. I bet that room could be > found for somebody from the postgresql community, should there be > somebody who would like to represent the group on this issue. Let me > know if an introduction or

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-11 Thread Greg Stark
On 10 April 2018 at 19:58, Joshua D. Drake wrote: > You can't unmount the file system --- that requires writing out all of the > pages > such that the dirty bit is turned off. I always wondered why Linux didn't implement umount -f. It's been in BSD since forever and it's a major annoyance that i

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-11 Thread Jonathan Corbet
On Tue, 10 Apr 2018 17:40:05 +0200 Anthony Iliopoulos wrote: > LSF/MM'18 is upcoming and it would > have been the perfect opportunity but it's past the CFP deadline. > It may still worth contacting the organizers to bring forward > the issue, and see if there is a chance to have someone from > Pg

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Joshua D. Drake
On 04/10/2018 12:51 PM, Joshua D. Drake wrote: -hackers, The thread is picking up over on the ext4 list. They don't update their archives as often as we do, so I can't link to the discussion. What would be the preferred method of sharing the info? Thanks to Anthony for this link: http://lis

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Joshua D. Drake
-hackers, The thread is picking up over on the ext4 list. They don't update their archives as often as we do, so I can't link to the discussion. What would be the preferred method of sharing the info? Thanks, JD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc ***

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Joshua D. Drake
-hackers, I reached out to the Linux ext4 devs, here is ty...@mit.edu response: """ Hi Joshua, This isn't actually an ext4 issue, but a long-standing VFS/MM issue. There are going to be multiple opinions about what the right thing to do. I'll try to give as unbiased a description as possible,

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Greg Stark
On 10 April 2018 at 02:59, Craig Ringer wrote: > Nitpick: In most cases the kernel reserves disk space immediately, > before returning from write(). NFS seems to be the main exception > here. I'm kind of puzzled by this. Surely NFS servers store the data in the filesystem using write(2) or the i

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Greg Stark
On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: > On Mon, Apr 09, 2018 at 09:45:40AM +0100, Greg Stark wrote: >> On 8 April 2018 at 22:47, Anthony Iliopoulos wrote: > To make things a bit simpler, let us focus on EIO for the moment. > The contract between the block layer and the filesystem l

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Anthony Iliopoulos
Hi Robert, On Tue, Apr 10, 2018 at 11:15:46AM -0400, Robert Haas wrote: > On Mon, Apr 9, 2018 at 3:13 PM, Andres Freund wrote: > > Let's lower the pitchforks a bit here. Obviously a grand rewrite is > > absurd, as is some of the proposed ways this is all supposed to > > work. But I think the cas

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Robert Haas
On Tue, Apr 10, 2018 at 1:37 AM, Craig Ringer wrote: > ... but *only if they hit an I/O error* or they're on a FS that > doesn't reserve space and hit ENOSPC. > > It still does 99% of the job. It still flushes all buffers to > persistent storage and maintains write ordering. It may not detect and

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Robert Haas
On Mon, Apr 9, 2018 at 3:13 PM, Andres Freund wrote: > Let's lower the pitchforks a bit here. Obviously a grand rewrite is > absurd, as is some of the proposed ways this is all supposed to > work. But I think the case we're discussing is much closer to a near > irresolvable corner case than anyth

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-10 Thread Craig Ringer
On 10 April 2018 at 14:10, Michael Paquier wrote: > Well, I think that there is place for improving reporting of failure > in file_utils.c for frontends, or at worst have an exit() for any kind > of critical failures equivalent to a PANIC. Yup. In the mean time, speaking of PANIC, here's the fi

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Michael Paquier
On Tue, Apr 10, 2018 at 01:37:19PM +0800, Craig Ringer wrote: > On 10 April 2018 at 13:04, Michael Paquier wrote: >> And pg_basebackup. And pg_dump. And pg_dumpall. Anything using initdb >> -S or fsync_pgdata would enter in those waters. > > ... but *only if they hit an I/O error* or they're o

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Craig Ringer
On 10 April 2018 at 13:04, Michael Paquier wrote: > On Mon, Apr 09, 2018 at 03:02:11PM -0400, Robert Haas wrote: >> Another consequence of this behavior that initdb -S is never reliable, >> so pg_rewind's use of it doesn't actually fix the problem it was >> intended to solve. It also means that i

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Michael Paquier
On Mon, Apr 09, 2018 at 03:02:11PM -0400, Robert Haas wrote: > Another consequence of this behavior that initdb -S is never reliable, > so pg_rewind's use of it doesn't actually fix the problem it was > intended to solve. It also means that initdb itself isn't crash-safe, > since the data file cha

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Craig Ringer
On 10 April 2018 at 08:41, Andreas Karlsson wrote: > On 04/09/2018 02:16 PM, Craig Ringer wrote: >> >> I'd like a middle ground where the kernel lets us register our interest >> and tells us if it lost something, without us having to keep eight million >> FDs open for some long period. "Tell us ab

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
On April 9, 2018 6:59:03 PM PDT, Craig Ringer wrote: >On 10 April 2018 at 04:37, Andres Freund wrote: >> Hi, >> >> On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote: >>> Maybe. I'd certainly prefer automated recovery from temporary I/O >>> issues (like full disk on thin-provisioning) without

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Craig Ringer
On 10 April 2018 at 04:37, Andres Freund wrote: > Hi, > > On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote: >> Maybe. I'd certainly prefer automated recovery from temporary I/O >> issues (like full disk on thin-provisioning) without the database >> crashing and restarting. But I'm not sure it's

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Craig Ringer
On 10 April 2018 at 04:25, Mark Dilger wrote: > I was reading this thread up until now as meaning that the standby could > receive corrupt WAL data and become corrupted. Yes, it can, but not directly through the first error. What can happen is that we think a block got written when it didn't.

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Thomas Munro
On Tue, Apr 10, 2018 at 1:44 PM, Craig Ringer wrote: > On 10 April 2018 at 03:59, Andres Freund wrote: >> I don't think that's as hard as some people argued in this thread. We >> could very well open a pipe in postmaster with the write end open in >> each subprocess, and the read end open only i

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Craig Ringer
On 10 April 2018 at 03:59, Andres Freund wrote: > On 2018-04-09 14:41:19 -0500, Justin Pryzby wrote: >> On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote: >> > You could make the argument that it's OK to forget if the entire file >> > system goes away. But actually, why is that ok? >> >

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andreas Karlsson
On 04/09/2018 02:16 PM, Craig Ringer wrote: I'd like a middle ground where the kernel lets us register our interest and tells us if it lost something, without us having to keep eight million FDs open for some long period. "Tell us about anything that happens under pgdata/" or an inotify-style p

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Thomas Munro
On Tue, Apr 10, 2018 at 10:33 AM, Thomas Munro wrote: > I wonder if anyone can tell us what Windows, AIX and HPUX do here. I created a wiki page to track what we know (or think we know) about fsync() on various operating systems: https://wiki.postgresql.org/wiki/Fsync_Errors If anyone has more

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Thomas Munro
On Tue, Apr 10, 2018 at 2:22 AM, Anthony Iliopoulos wrote: > On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: >> Well, there seem to be kernels that seem to do exactly that already. At >> least that's how I understand what this thread says about FreeBSD and >> Illumos, for example. So

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Mark Dilger
> On Apr 9, 2018, at 2:25 PM, Tomas Vondra wrote: > > > > On 04/09/2018 11:08 PM, Andres Freund wrote: >> Hi, >> >> On 2018-04-09 13:55:29 -0700, Mark Dilger wrote: >>> I can also imagine a master and standby that are similarly provisioned, >>> and thus hit an out of disk error at around the

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 11:08 PM, Andres Freund wrote: > Hi, > > On 2018-04-09 13:55:29 -0700, Mark Dilger wrote: >> I can also imagine a master and standby that are similarly provisioned, >> and thus hit an out of disk error at around the same time, resulting in >> corruption on both, even if not the sam

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
Hi, On 2018-04-09 13:55:29 -0700, Mark Dilger wrote: > I can also imagine a master and standby that are similarly provisioned, > and thus hit an out of disk error at around the same time, resulting in > corruption on both, even if not the same corruption. I think it's a grave mistake conflating E

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Mark Dilger
> On Apr 9, 2018, at 1:43 PM, Tomas Vondra wrote: > > > > On 04/09/2018 10:25 PM, Mark Dilger wrote: >> >>> On Apr 9, 2018, at 12:13 PM, Andres Freund wrote: >>> >>> Hi, >>> >>> On 2018-04-09 15:02:11 -0400, Robert Haas wrote: I think the simplest technological solution to this proble

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 10:25 PM, Mark Dilger wrote: > >> On Apr 9, 2018, at 12:13 PM, Andres Freund wrote: >> >> Hi, >> >> On 2018-04-09 15:02:11 -0400, Robert Haas wrote: >>> I think the simplest technological solution to this problem is to >>> rewrite the entire backend and all supporting processes to

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
Hi, On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote: > Maybe. I'd certainly prefer automated recovery from temporary I/O > issues (like full disk on thin-provisioning) without the database > crashing and restarting. But I'm not sure it's worth the effort. Oh, I agree on that one. But that's m

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
Hi, On 2018-04-09 13:25:54 -0700, Mark Dilger wrote: > I was reading this thread up until now as meaning that the standby could > receive corrupt WAL data and become corrupted. I don't see that as a real problem here. For one the problematic scenarios shouldn't readily apply, for another WAL is c

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 10:04 PM, Andres Freund wrote: > Hi, > > On 2018-04-09 21:54:05 +0200, Tomas Vondra wrote: >> Isn't the expectation that when a fsync call fails, the next one will >> retry writing the pages in the hope that it succeeds? > > Some people expect that, I personally don't think it's a

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Mark Dilger
> On Apr 9, 2018, at 12:13 PM, Andres Freund wrote: > > Hi, > > On 2018-04-09 15:02:11 -0400, Robert Haas wrote: >> I think the simplest technological solution to this problem is to >> rewrite the entire backend and all supporting processes to use >> O_DIRECT everywhere. To maintain adequate p

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
Hi, On 2018-04-09 21:54:05 +0200, Tomas Vondra wrote: > Isn't the expectation that when a fsync call fails, the next one will > retry writing the pages in the hope that it succeeds? Some people expect that, I personally don't think it's a useful expectation. We should just deal with this by cras

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
On 2018-04-09 14:41:19 -0500, Justin Pryzby wrote: > On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote: > > You could make the argument that it's OK to forget if the entire file > > system goes away. But actually, why is that ok? > > I was going to say that it'd be okay to clear error f

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 09:37 PM, Andres Freund wrote: > > > On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos > wrote: > >> I honestly do not expect that keeping around the failed pages will >> be an acceptable change for most kernels, and as such the >> recommendation >> will probably be to coord

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 12:37:03PM -0700, Andres Freund wrote: > > > On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos > wrote: > > >I honestly do not expect that keeping around the failed pages will > >be an acceptable change for most kernels, and as such the > >recommendation > >will prob

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 04:22 PM, Anthony Iliopoulos wrote: > On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: >> >> We already have dirty_bytes and dirty_background_bytes, for example. I >> don't see why there couldn't be another limit defining how much dirty >> data to allow before blocking wr

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 12:29:16PM -0700, Andres Freund wrote: > On 2018-04-09 21:26:21 +0200, Anthony Iliopoulos wrote: > > What about having buffered IO with implied fsync() atomicity via > > O_SYNC? > > You're kidding, right? We could also just add sleep(30)'s all over the > tree, and hope tha

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Justin Pryzby
On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote: > You could make the argument that it's OK to forget if the entire file > system goes away. But actually, why is that ok? I was going to say that it'd be okay to clear error flag on umount, since any opened files would prevent unmountin

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos wrote: >I honestly do not expect that keeping around the failed pages will >be an acceptable change for most kernels, and as such the >recommendation >will probably be to coordinate in userspace for the fsync(). Why is that required? You cou

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
On 2018-04-09 21:26:21 +0200, Anthony Iliopoulos wrote: > What about having buffered IO with implied fsync() atomicity via > O_SYNC? You're kidding, right? We could also just add sleep(30)'s all over the tree, and hope that that'll solve the problem. There's a reason we don't permanently fsync e

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 04:29:36PM +0100, Greg Stark wrote: > Honestly I don't think there's *any* way to use the current interface > to implement reliable operation. Even that embedded database using a > single process and keeping every file open all the time (which means > file descriptor limits

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Peter Geoghegan
On Mon, Apr 9, 2018 at 12:13 PM, Andres Freund wrote: > Let's lower the pitchforks a bit here. Obviously a grand rewrite is > absurd, as is some of the proposed ways this is all supposed to > work. But I think the case we're discussing is much closer to a near > irresolvable corner case than anyt

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 08:29 PM, Mark Dilger wrote: > >> On Apr 9, 2018, at 10:26 AM, Joshua D. Drake wrote: > >> We have plenty of YEARS of people not noticing this issue > > I disagree. I have noticed this problem, but blamed it on other things. > For over five years now, I have had to tell custome

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Andres Freund
Hi, On 2018-04-09 15:02:11 -0400, Robert Haas wrote: > I think the simplest technological solution to this problem is to > rewrite the entire backend and all supporting processes to use > O_DIRECT everywhere. To maintain adequate performance, we'll have to > write a complete I/O scheduling system

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Robert Haas
On Mon, Apr 9, 2018 at 12:45 PM, Robert Haas wrote: > Ouch. If a process exits -- say, because the user typed \q into psql > -- then you're talking about potentially calling fsync() on a really > large number of file descriptor flushing many gigabytes of data to > disk. And it may well be that y

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Mark Dilger
> On Apr 9, 2018, at 10:26 AM, Joshua D. Drake wrote: > We have plenty of YEARS of people not noticing this issue I disagree. I have noticed this problem, but blamed it on other things. For over five years now, I have had to tell customers not to use thin provisioning, and I have had to add co

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Gasper Zejn
On 09. 04. 2018 15:42, Tomas Vondra wrote: > On 04/09/2018 12:29 AM, Bruce Momjian wrote: >> A crazy idea would be to have a daemon that checks the logs and >> stops Postgres when something seems wrong. >> > That doesn't seem like a very practical way. It's better than nothing, > of course, but

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Joshua D. Drake
On 04/09/2018 09:45 AM, Robert Haas wrote: On Mon, Apr 9, 2018 at 8:16 AM, Craig Ringer wrote: In the mean time, I propose that we fsync() on close() before we age FDs out of the LRU on backends. Yes, that will hurt throughput and cause stalls, but we don't seem to have many better options. At

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Robert Haas
On Mon, Apr 9, 2018 at 8:16 AM, Craig Ringer wrote: > In the mean time, I propose that we fsync() on close() before we age FDs out > of the LRU on backends. Yes, that will hurt throughput and cause stalls, but > we don't seem to have many better options. At least it'll only flush what we > actuall
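The interim proposal quoted above (fsync before a descriptor ages out of the LRU) might look roughly like the following sketch. FdCache, its methods, and the injectable fsync parameter are hypothetical names chosen for illustration; this is not how PostgreSQL's own descriptor cache is written.

```python
import os
from collections import OrderedDict


class FdCache:
    """Hypothetical LRU cache of file descriptors that fsyncs before eviction."""

    def __init__(self, capacity, fsync=os.fsync):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.fsync = fsync
        self.fds = OrderedDict()  # path -> fd, least recently used first

    def get(self, path):
        """Return an open descriptor for path, evicting the oldest if full."""
        if path in self.fds:
            self.fds.move_to_end(path)
            return self.fds[path]
        while len(self.fds) >= self.capacity:
            self._evict_oldest()
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        self.fds[path] = fd
        return fd

    def _evict_oldest(self):
        _old_path, old_fd = self.fds.popitem(last=False)
        try:
            # Flush while the descriptor is still open, so that any error
            # is reported to this process before the descriptor goes away.
            self.fsync(old_fd)
        finally:
            os.close(old_fd)

    def close_all(self):
        """Flush and close every cached descriptor."""
        while self.fds:
            self._evict_oldest()
```

As the quoted message itself concedes, the cost is throughput: every eviction now waits for a flush.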

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Greg Stark
On 9 April 2018 at 15:22, Anthony Iliopoulos wrote: > On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: >> > Sure, there could be knobs for limiting how much memory such "zombie" > pages may occupy. Not sure how helpful it would be in the long run > since this tends to be highly applic

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: > > We already have dirty_bytes and dirty_background_bytes, for example. I > don't see why there couldn't be another limit defining how much dirty > data to allow before blocking writes altogether. I'm sure it's not that > simple, but yo
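For readers unfamiliar with the two limits named above: on Linux they are exposed as files under /proc/sys/vm. A minimal sketch for reading them follows; read_vm_dirty_limits is a hypothetical helper, and its base parameter exists only so the function can be pointed at another directory.

```python
import os


def read_vm_dirty_limits(base="/proc/sys/vm"):
    """Return the two writeback thresholds as integers, keyed by sysctl name."""
    limits = {}
    for name in ("dirty_bytes", "dirty_background_bytes"):
        try:
            with open(os.path.join(base, name)) as f:
                limits[name] = int(f.read().strip())
        except FileNotFoundError:
            # Not Linux, or /proc is not mounted here.
            limits[name] = None
    return limits
```

A value of 0 in either file means the kernel is using the corresponding ratio-based setting instead of a byte count.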

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 04:00 AM, Craig Ringer wrote: > On 9 April 2018 at 07:16, Andres Freund > wrote: > > > > I think the danger presented here is far smaller than some of the > statements in this thread might make one think. > > > Clearly it's not happening a hug

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Abhijit Menon-Sen
At 2018-04-09 15:42:35 +0200, tomas.von...@2ndquadrant.com wrote: > > On 04/09/2018 12:29 AM, Bruce Momjian wrote: > > > > A crazy idea would be to have a daemon that checks the logs and > > stops Postgres when something seems wrong. > > > > That doesn't seem like a very practical way. Not

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 12:29 AM, Bruce Momjian wrote: > > A crazy idea would be to have a daemon that checks the logs and > stops Postgres when it seems something wrong. > That doesn't seem like a very practical way. It's better than nothing, of course, but I wonder how would that work with containers (

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Tomas Vondra
On 04/09/2018 02:31 PM, Anthony Iliopoulos wrote: > On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote: >> On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: >> >>> What you seem to be asking for is the capability of dropping >>> buffers over the (kernel) fence and indemnifying the app

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 08:16:38PM +0800, Craig Ringer wrote: > > I'd like a middle ground where the kernel lets us register our interest and > tells us if it lost something, without us having to keep eight million FDs > open for some long period. "Tell us about anything that happens under > pgdat

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote: > On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: > > > What you seem to be asking for is the capability of dropping > > buffers over the (kernel) fence and indemnifying the application > > from any further responsibility, i.e. a

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Craig Ringer
On 9 April 2018 at 18:50, Anthony Iliopoulos wrote: > > There is a clear responsibility of the application to keep > its buffers around until a successful fsync(). The kernels > do report the error (albeit with all the complexities of > dealing with the interface), at which point the application

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Geoff Winkless
On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: > What you seem to be asking for is the capability of dropping > buffers over the (kernel) fence and indemnifying the application > from any further responsibility, i.e. a hard assurance > that either the kernel will persist the pages or it will

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Anthony Iliopoulos
On Mon, Apr 09, 2018 at 09:45:40AM +0100, Greg Stark wrote: > On 8 April 2018 at 22:47, Anthony Iliopoulos wrote: > > On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: > >> On 8 April 2018 at 04:27, Craig Ringer wrote: > >> > On 8 April 2018 at 10:16, Thomas Munro > > > > The question

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Greg Stark
On 8 April 2018 at 22:47, Anthony Iliopoulos wrote: > On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: >> On 8 April 2018 at 04:27, Craig Ringer wrote: >> > On 8 April 2018 at 10:16, Thomas Munro > > The question is, what should the kernel and application do in cases > where this is s

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Craig Ringer
On 9 April 2018 at 10:06, Andres Freund wrote: > > > And in many failure modes there's no reason to expect any data loss at > all, > > like: > > > > * Local disk fills up (seems to be safe already due to space reservation > at > > write() time) > > That definitely should be treated separately. >

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Andres Freund
On 2018-04-09 10:00:41 +0800, Craig Ringer wrote: > I suspect we've written off a fair few issues in the past as "it's bad > hardware" when actually, the hardware fault was the trigger for a Pg/kernel > interaction bug. And blamed containers for things that weren't really the > container's fault. B

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Craig Ringer
On 9 April 2018 at 07:16, Andres Freund wrote: > > I think the danger presented here is far smaller than some of the > statements in this thread might make one think. Clearly it's not happening a huge amount or we'd have a lot of noise about Pg eating people's data, people shouting about how u

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Andres Freund
Hi, On 2018-04-08 16:27:57 -0700, Christophe Pettus wrote: > > On Apr 8, 2018, at 16:16, Andres Freund wrote: > > We don't panic that way when getting IO > > errors during reads either, and they're more likely to be persistent > > than errors during writes (because remapping on storage layer can

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Craig Ringer
On 9 April 2018 at 06:29, Bruce Momjian wrote: > > I think the big problem is that we don't have any way of stopping > Postgres at the time the kernel reports the errors to the kernel log, so > we are then returning potentially incorrect results and committing > transactions that might be wrong

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Craig Ringer
On 9 April 2018 at 05:28, Christophe Pettus wrote: > > > On Apr 8, 2018, at 14:23, Greg Stark wrote: > > > > They consider dirty filesystem buffers when there's > > hardware failure preventing them from being written "a memory leak". > > That's not an irrational position. File system buffers ar

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Christophe Pettus
> On Apr 8, 2018, at 16:16, Andres Freund wrote: > We don't panic that way when getting IO > errors during reads either, and they're more likely to be persistent > than errors during writes (because remapping on storage layer can fix > issues, but not during reads). There is a distinction to be

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Andres Freund
On 2018-04-08 18:29:16 -0400, Bruce Momjian wrote: > On Sun, Apr 8, 2018 at 09:38:03AM -0700, Christophe Pettus wrote: > > > > > On Apr 8, 2018, at 03:30, Craig Ringer > > > wrote: > > > > > > These are way more likely than bit flips or other storage level > > > corruption, and things that we prev

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Christophe Pettus
> On Apr 8, 2018, at 15:29, Bruce Momjian wrote: > I think the big problem is that we don't have any way of stopping > Postgres at the time the kernel reports the errors to the kernel log, so > we are then returning potentially incorrect results and committing > transactions that might be wrong o

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Bruce Momjian
On Sun, Apr 8, 2018 at 09:38:03AM -0700, Christophe Pettus wrote: > > > On Apr 8, 2018, at 03:30, Craig Ringer > > wrote: > > > > These are way more likely than bit flips or other storage level > > corruption, and things that we previously expected to detect and > > fail gracefully for. > > This i

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Anthony Iliopoulos
On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: > On 8 April 2018 at 04:27, Craig Ringer wrote: > > On 8 April 2018 at 10:16, Thomas Munro > > wrote: > > > > If the kernel does writeback in the middle, how on earth is it supposed to > > know we expect to reopen the file and check back

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Christophe Pettus
> On Apr 8, 2018, at 14:23, Greg Stark wrote: > > They consider dirty filesystem buffers when there's > hardware failure preventing them from being written "a memory leak". That's not an irrational position. File system buffers are *not* dedicated memory for file system caching; they're being

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Greg Stark
On 8 April 2018 at 04:27, Craig Ringer wrote: > On 8 April 2018 at 10:16, Thomas Munro > wrote: > > If the kernel does writeback in the middle, how on earth is it supposed to > know we expect to reopen the file and check back later? > > Should it just remember "this file had an error" forever, an

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Christophe Pettus
> On Apr 8, 2018, at 03:30, Craig Ringer wrote: > > These are way more likely than bit flips or other storage level corruption, > and things that we previously expected to detect and fail gracefully for. This is definitely bad, and it explains a few otherwise-inexplicable corruption issues we

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-08 Thread Craig Ringer
On 8 April 2018 at 17:41, Andreas Karlsson wrote: > On 04/08/2018 05:27 AM, Craig Ringer wrote:> More below, but here's an > idea #5: decide InnoDB has the right idea, and > >> go to using a single massive blob file, or a few giant blobs. >> > > FYI: MySQL has by default one file per table these
