Re: Postgres, fsync, and OSs (specifically linux)

2018-11-18 Thread Thomas Munro
On Fri, Nov 9, 2018 at 9:03 AM Thomas Munro wrote: > On Fri, Nov 9, 2018 at 7:07 AM Robert Haas wrote: > > On Wed, Nov 7, 2018 at 9:41 PM Thomas Munro > > wrote: > > > My plan is to do a round of testing and review of this stuff next week > > > once the dust is settled on the current minor releases

Re: Postgres, fsync, and OSs (specifically linux)

2018-11-17 Thread Thomas Munro
On Fri, Nov 9, 2018 at 9:06 AM Robert Haas wrote: > On Thu, Nov 8, 2018 at 3:04 PM Thomas Munro > wrote: > > My reasoning for choosing bms_join() is that it cannot fail, assuming > > the heap is not corrupted. It simply ORs the two bit-strings into > > whichever is the longer input string, and f

Re: Postgres, fsync, and OSs (specifically linux)

2018-11-08 Thread Robert Haas
On Thu, Nov 8, 2018 at 3:04 PM Thomas Munro wrote: > My reasoning for choosing bms_join() is that it cannot fail, assuming > the heap is not corrupted. It simply ORs the two bit-strings into > whichever is the longer input string, and frees the shorter input > string. (In an earlier version I us
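For readers following the bms_join() argument, here is a minimal sketch of the "OR into the longer input, free the shorter" idea. WordSet and the wordset_* functions are hypothetical stand-ins invented for this illustration; this is not PostgreSQL's Bitmapset code. The point it tries to show is that the join step itself allocates nothing, which is why it cannot fail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bit-string type for illustration; not PostgreSQL's Bitmapset. */
typedef struct WordSet
{
    size_t    nwords;   /* number of 64-bit words in use */
    uint64_t *words;    /* the bit-string itself */
} WordSet;

/* Allocate a zeroed set; this is the only step that can fail. */
static WordSet *
wordset_create(size_t nwords)
{
    WordSet *s;

    assert(nwords > 0);
    s = malloc(sizeof(WordSet));
    if (s == NULL)
        return NULL;
    s->words = calloc(nwords, sizeof(uint64_t));
    if (s->words == NULL)
    {
        free(s);
        return NULL;
    }
    s->nwords = nwords;
    return s;
}

static void
wordset_free(WordSet *s)
{
    if (s != NULL)
    {
        free(s->words);
        free(s);
    }
}

/*
 * OR the shorter input into the longer one, free the shorter one, and
 * return the longer one.  Both inputs are consumed.  No memory is
 * allocated here, so there is no failure path.
 */
static WordSet *
wordset_join(WordSet *a, WordSet *b)
{
    WordSet *longer;
    WordSet *shorter;

    if (a == NULL)
        return b;
    if (b == NULL)
        return a;

    longer = (a->nwords >= b->nwords) ? a : b;
    shorter = (longer == a) ? b : a;

    for (size_t i = 0; i < shorter->nwords; i++)
        longer->words[i] |= shorter->words[i];

    wordset_free(shorter);
    return longer;
}
```

A caller would treat both arguments as consumed and keep only the returned pointer, for example: merged = wordset_join(pending, incoming);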

Re: Postgres, fsync, and OSs (specifically linux)

2018-11-08 Thread Thomas Munro
On Fri, Nov 9, 2018 at 7:07 AM Robert Haas wrote: > On Wed, Nov 7, 2018 at 9:41 PM Thomas Munro > wrote: > > My plan is to do a round of testing and review of this stuff next week > > once the dust is settled on the current minor releases (including > > fixing a few typos I just spotted and some wor

Re: Postgres, fsync, and OSs (specifically linux)

2018-11-08 Thread Robert Haas
On Wed, Nov 7, 2018 at 9:41 PM Thomas Munro wrote: > My plan is to do a round of testing and review of this stuff next week > once the dust is settled on the current minor releases (including > fixing a few typos I just spotted and some word-smithing). All going > well, I will then push the resultin

Re: Postgres, fsync, and OSs (specifically linux)

2018-11-07 Thread Thomas Munro
On Fri, Oct 19, 2018 at 6:42 PM Craig Ringer wrote: > On Fri, 19 Oct 2018 at 07:27, Thomas Munro > wrote: >> 2. I am +1 on back-patching Craig's PANIC-on-failure logic. Doing >> nothing is not an option I like. I have some feedback and changes to >> propose though; see attached. > > Thanks ve

Re: Postgres, fsync, and OSs (specifically linux)

2018-10-18 Thread Craig Ringer
On Fri, 19 Oct 2018 at 07:27, Thomas Munro wrote: > > 2. I am +1 on back-patching Craig's PANIC-on-failure logic. Doing > nothing is not an option I like. I have some feedback and changes to > propose though; see attached. > Thanks very much for the work on reviewing and revising this. > I

Re: Postgres, fsync, and OSs (specifically linux)

2018-10-18 Thread Thomas Munro
Hello hackers, Let's try to get this issue resolved. Here is my position on the course of action we should take in back-branches: 1. I am -1 on back-patching the fd-transfer code. It's a significant change, and even when sufficiently debugged (I don't think it's there yet), we have no idea wha

Re: Postgres, fsync, and OSs (specifically linux)

2018-10-01 Thread Thomas Munro
On Fri, Sep 28, 2018 at 9:37 PM Thomas Munro wrote: > The other patches in this tarball are all as posted already, but are > now rebased and assembled in one place. Also pushed to > https://github.com/macdice/postgres/tree/fsyncgate . Here is a new version that fixes an assertion failure during

Re: Postgres, fsync, and OSs (specifically linux)

2018-09-28 Thread Thomas Munro
On Fri, Sep 28, 2018 at 9:37 PM Thomas Munro wrote: > The 0013 patch also fixes a mistake in the 0010 patch: it is not > appropriate to call CFI() while waiting to notify the checkpointer of > a dirty segment, because then ^C could cause the following checkpoint > not to flush dirty data. (Though

Re: Postgres, fsync, and OSs (specifically linux)

2018-09-28 Thread Thomas Munro
On Thu, Aug 30, 2018 at 2:44 PM Craig Ringer wrote: > On 15 August 2018 at 07:32, Thomas Munro > wrote: >> I will soon post some more fix-up patches that add EXEC_BACKEND >> support, Windows support, and a counting scheme to fix the timing >> issue that I mentioned in my first review. I will pr

Re: Postgres, fsync, and OSs (specifically linux)

2018-08-29 Thread Craig Ringer
On 15 August 2018 at 07:32, Thomas Munro wrote: > On Wed, Aug 15, 2018 at 11:08 AM, Asim R P wrote: > > I was looking at the commitfest entry for feature > > (https://commitfest.postgresql.org/19/1639/) for the most recent list > > of patches to try out. The list doesn't look correct/complete.

Re: Postgres, fsync, and OSs (specifically linux)

2018-08-14 Thread Thomas Munro
On Wed, Aug 15, 2018 at 11:08 AM, Asim R P wrote: > I was looking at the commitfest entry for feature > (https://commitfest.postgresql.org/19/1639/) for the most recent list > of patches to try out. The list doesn't look correct/complete. Can > someone please check? Hi Asim, This thread is a b

Re: Postgres, fsync, and OSs (specifically linux)

2018-08-14 Thread Asim R P
I was looking at the commitfest entry for feature (https://commitfest.postgresql.org/19/1639/) for the most recent list of patches to try out. The list doesn't look correct/complete. Can someone please check? Asim

Re: Postgres, fsync, and OSs (specifically linux)

2018-08-10 Thread Thomas Munro
On Sun, Jul 29, 2018 at 6:14 PM, Thomas Munro wrote: > As a way of poking this thread, here are some more thoughts. I am keen to move this forward, not only because it is something we need to get fixed, but also because I have some other pending patches in this area and I want this sorted out fir

Re: Postgres, fsync, and OSs (specifically linux)

2018-07-28 Thread Thomas Munro
On Thu, Jun 14, 2018 at 5:30 PM, Thomas Munro wrote: > On Wed, May 23, 2018 at 8:02 AM, Andres Freund wrote: >> [patches] > > A more interesting question is: how will you cap the number of file > handles you send through that pipe? On that OS you call > DuplicateHandle() to fling handles into anoth

Re: Postgres, fsync, and OSs (specifically linux)

2018-07-18 Thread Thomas Munro
On Thu, Jul 19, 2018 at 7:23 AM, Robert Haas wrote: > 2. I don't like promote_ioerr_to_panic() very much, partly because the > same pattern gets repeated over and over, and partly because it would > be awkwardly-named if we discovered that another 2 or 3 errors needed > similar handling (or some o
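To make the naming concern concrete, here is a rough sketch of one way the repeated pattern could be centralised so that adding "another 2 or 3 errors" means extending a table rather than touching every call site. Everything here is hypothetical: SketchLevel, sketch_fatal_errnos and sketch_promote_level are invented for this illustration and are not the patch's promote_ioerr_to_panic() or PostgreSQL's elevel constants. EIO and ENOSPC are placeholder entries borrowed from elsewhere in this thread, not a claim about which errors the patch treats as fatal.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical severity levels; stand-ins for real elevel values. */
typedef enum SketchLevel
{
    SKETCH_WARNING,
    SKETCH_ERROR,
    SKETCH_PANIC
} SketchLevel;

/* Placeholder list of errno values that must not be retried. */
static const int sketch_fatal_errnos[] = { EIO, ENOSPC };

/*
 * Return SKETCH_PANIC if saved_errno is in the table above, otherwise
 * return the level the caller asked for unchanged.
 */
static SketchLevel
sketch_promote_level(int saved_errno, SketchLevel requested)
{
    size_t n = sizeof(sketch_fatal_errnos) / sizeof(sketch_fatal_errnos[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (saved_errno == sketch_fatal_errnos[i])
            return SKETCH_PANIC;
    }
    return requested;
}
```

A call site would capture errno immediately after the failing call and pass it in, e.g. sketch_promote_level(saved_errno, SKETCH_ERROR), so the decision lives in one place.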

Re: Postgres, fsync, and OSs (specifically linux)

2018-07-18 Thread Robert Haas
On Tue, May 29, 2018 at 4:53 AM, Craig Ringer wrote: > I've revised the fsync patch with the cleanups discussed and gone through > the close() calls. > > AFAICS either socket closes, temp file closes, or (for WAL) already PANIC on > close. It's mainly fd.c that needs amendment. Which I've done pe

Re: Postgres, fsync, and OSs (specifically linux)

2018-06-13 Thread Thomas Munro
On Wed, May 23, 2018 at 8:02 AM, Andres Freund wrote: > [patches] Hi Andres, Obviously there is more work to be done here but the basic idea in your clone-fd-checkpointer branch as of today seems OK to me. I think Craig and I both had similar ideas (sending file descriptors that have an old eno

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-29 Thread Craig Ringer
On 21 May 2018 at 15:50, Craig Ringer wrote: > On 21 May 2018 at 12:57, Craig Ringer wrote: > >> On 18 May 2018 at 00:44, Andres Freund wrote: >> >>> Hi, >>> >>> On 2018-05-10 09:50:03 +0800, Craig Ringer wrote: >>> > while ((src = (RewriteMappingFile *) >>> hash_seq_search(&seq_status))

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-22 Thread Andres Freund
On 2018-05-22 21:58:06 +0200, Dmitry Dolgov wrote: > > On 22 May 2018 at 20:59, Andres Freund wrote: > > On 2018-05-22 20:54:46 +0200, Dmitry Dolgov wrote: > > Huh? Checkpointer was in SendFsyncRequest()? Could you share the > > backtrace? > > Well, that's what I've got from gdb: > #3 0x000

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-22 Thread Dmitry Dolgov
> On 22 May 2018 at 20:59, Andres Freund wrote: > On 2018-05-22 20:54:46 +0200, Dmitry Dolgov wrote: >> > On 22 May 2018 at 18:47, Andres Freund wrote: >> > On 2018-05-22 08:57:18 -0700, Andres Freund wrote: >> >> Hi, >> >> >> >> >> >> On 2018-05-22 17:37:28 +0200, Dmitry Dolgov wrote: >> >> > Th

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-22 Thread Andres Freund
On 2018-05-22 20:54:46 +0200, Dmitry Dolgov wrote: > > On 22 May 2018 at 18:47, Andres Freund wrote: > > On 2018-05-22 08:57:18 -0700, Andres Freund wrote: > >> Hi, > >> > >> > >> On 2018-05-22 17:37:28 +0200, Dmitry Dolgov wrote: > >> > Thanks for the patch. Out of curiosity I tried to play with

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-22 Thread Dmitry Dolgov
> On 22 May 2018 at 18:47, Andres Freund wrote: > On 2018-05-22 08:57:18 -0700, Andres Freund wrote: >> Hi, >> >> >> On 2018-05-22 17:37:28 +0200, Dmitry Dolgov wrote: >> > Thanks for the patch. Out of curiosity I tried to play with it a bit. >> >> Thanks. >> >> >> > `pgbench -i -s 100` actually h

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-22 Thread Andres Freund
On 2018-05-22 08:57:18 -0700, Andres Freund wrote: > Hi, > > > On 2018-05-22 17:37:28 +0200, Dmitry Dolgov wrote: > > Thanks for the patch. Out of curiosity I tried to play with it a bit. > > Thanks. > > > > `pgbench -i -s 100` actually hang on my machine, because the > > copy process ended up

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-22 Thread Andres Freund
Hi, On 2018-05-22 17:37:28 +0200, Dmitry Dolgov wrote: > Thanks for the patch. Out of curiosity I tried to play with it a bit. Thanks. > `pgbench -i -s 100` actually hang on my machine, because the > copy process ended up with waiting after `pg_uds_send_with_fd` > had Hm, that had worked at s

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-22 Thread Dmitry Dolgov
> On 22 May 2018 at 03:08, Andres Freund wrote: > On 2018-05-19 18:12:52 +1200, Thomas Munro wrote: >> On Sat, May 19, 2018 at 4:51 PM, Thomas Munro >> wrote: >> > Next, make check hangs in initdb on both of my pet OSes when md.c >> > raises an error (fseek fails) and we raise an error while rai

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-21 Thread Andres Freund
On 2018-05-19 18:12:52 +1200, Thomas Munro wrote: > On Sat, May 19, 2018 at 4:51 PM, Thomas Munro > wrote: > > Next, make check hangs in initdb on both of my pet OSes when md.c > > raises an error (fseek fails) and we raise an error while raising an > > error and deadlock against ourselves. Bac

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-21 Thread Craig Ringer
On 21 May 2018 at 12:57, Craig Ringer wrote: > On 18 May 2018 at 00:44, Andres Freund wrote: > >> Hi, >> >> On 2018-05-10 09:50:03 +0800, Craig Ringer wrote: >> > while ((src = (RewriteMappingFile *) >> hash_seq_search(&seq_status)) != NULL) >> > { >> > if (FileSync(src

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-20 Thread Craig Ringer
On 18 May 2018 at 00:44, Andres Freund wrote: > Hi, > > On 2018-05-10 09:50:03 +0800, Craig Ringer wrote: > > while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) > != NULL) > > { > > if (FileSync(src->vfd, WAIT_EVENT_LOGICAL_REWRITE_SYNC) > != 0) > > -

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-20 Thread Ashutosh Bapat
On Sat, May 19, 2018 at 6:31 AM, Stephen Frost wrote: > Greetings, > > * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: >> At 2018-05-18 20:27:57 -0400, sfr...@snowman.net wrote: >> > >> > I don't agree with the general notion that we can't have a function >> > which handles the complicated bits

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-18 Thread Thomas Munro
On Sat, May 19, 2018 at 4:51 PM, Thomas Munro wrote: > Next, make check hangs in initdb on both of my pet OSes when md.c > raises an error (fseek fails) and we raise an error while raising an > error and deadlock against ourselves. Backtrace here: > https://paste.debian.net/1025336/ Ah, I see

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-18 Thread Thomas Munro
On Sat, May 19, 2018 at 9:03 AM, Andres Freund wrote: > I've written a patch series for this. Took me quite a bit longer than I > had hoped. Great. > I plan to switch to working on something else for a day or two next > week, and then polish this further. I'd greatly appreciate comments till > t

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-18 Thread Stephen Frost
Greetings, * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: > At 2018-05-18 20:27:57 -0400, sfr...@snowman.net wrote: > > > > I don't agree with the general notion that we can't have a function > > which handles the complicated bits about the kind of error because > > someone grep'ing the source

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-18 Thread Abhijit Menon-Sen
At 2018-05-18 20:27:57 -0400, sfr...@snowman.net wrote: > > I don't agree with the general notion that we can't have a function > which handles the complicated bits about the kind of error because > someone grep'ing the source for PANIC might have to do an additional > lookup. Or we could just nam

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-18 Thread Stephen Frost
Greetings, * Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote: > On Thu, May 17, 2018 at 11:45 PM, Robert Haas wrote: > > On Thu, May 17, 2018 at 12:44 PM, Andres Freund wrote: > >> Hi, > >> > >> On 2018-05-10 09:50:03 +0800, Craig Ringer wrote: > >>> while ((src = (RewriteMappingFil

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-18 Thread Andres Freund
Hi, On 2018-04-27 15:28:42 -0700, Andres Freund wrote: > == Potential Postgres Changes == > > Several operating systems / file systems behave differently (See > e.g. [2], thanks Thomas) than we expected. Even the discussed changes to > e.g. linux don't get to where we thought we are. There's obvi

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-18 Thread Ashutosh Bapat
On Thu, May 17, 2018 at 11:45 PM, Robert Haas wrote: > On Thu, May 17, 2018 at 12:44 PM, Andres Freund wrote: >> Hi, >> >> On 2018-05-10 09:50:03 +0800, Craig Ringer wrote: >>> while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != >>> NULL) >>> { >>> if

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 12:44 PM, Andres Freund wrote: > Hi, > > On 2018-05-10 09:50:03 +0800, Craig Ringer wrote: >> while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != >> NULL) >> { >> if (FileSync(src->vfd, WAIT_EVENT_LOGICAL_REWRITE_SYNC) != 0) >> -

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-17 Thread Andres Freund
Hi, On 2018-05-10 09:50:03 +0800, Craig Ringer wrote: > while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != > NULL) > { > if (FileSync(src->vfd, WAIT_EVENT_LOGICAL_REWRITE_SYNC) != 0) > - ereport(ERROR, > + erepor

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-09 Thread Craig Ringer
On 10 May 2018 at 06:55, Andres Freund wrote: > Do you have a patchset including that? I didn't find anything after a > quick search... There was an earlier rev on the other thread but without msync checks. I've added panic for msync in the attached, and tidied the comments a bit. I didn't ad

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-09 Thread Andres Freund
On 2018-05-01 09:38:03 +0800, Craig Ringer wrote: > On 1 May 2018 at 00:09, Andres Freund wrote: > > > It's not. Only SYNC_FILE_RANGE_WAIT_{BEFORE,AFTER} eat errors. Which > > seems sensible, because they could be considered data integrity > > operations. > > Ah, I misread that. Thank you. > > >

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-01 Thread Andres Freund
Hi, On 2018-04-27 15:28:42 -0700, Andres Freund wrote: > I went to LSF/MM 2018 to discuss [0] and related issues. Overall I'd say > it was a very productive discussion. I'll first try to recap the > current situation, updated with knowledge I gained. Secondly I'll try to > discuss the kernel chan

Re: Postgres, fsync, and OSs (specifically linux)

2018-05-01 Thread Catalin Iacob
On Sat, Apr 28, 2018 at 12:28 AM, Andres Freund wrote: > Before linux v4.13 errors in kernel writeback would be reported at most > once, without a guarantee that that'd happen (IIUC memory pressure could > lead to the relevant information being evicted) - but it was pretty > likely. After v4.13 (

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-30 Thread Craig Ringer
On 1 May 2018 at 00:09, Andres Freund wrote: > It's not. Only SYNC_FILE_RANGE_WAIT_{BEFORE,AFTER} eat errors. Which > seems sensible, because they could be considered data integrity > operations. Ah, I misread that. Thank you. >> I'm very suspicious about the safety of the msync() path too. > >

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-30 Thread Andres Freund
On 2018-04-30 13:03:24 +0800, Craig Ringer wrote: > Hrm, something else that just came up. On 9.6+ we use sync_file_range. > It's surely going to eat errors: > > rc = sync_file_range(fd, offset, nbytes, > SYNC_FILE_RANGE_WRITE); > > /* don't error out,

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Craig Ringer
Hrm, something else that just came up. On 9.6+ we use sync_file_range. It's surely going to eat errors: rc = sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE); /* don't error out, this is just a performance optimization */ if (rc != 0)

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Craig Ringer
> Not quite sure what you're getting at with "a file we don't fsync" - if > we don't, we don't care about durability anyway, no? Or do you mean > where we fsync in a different process? Right. > Either way, the answer is mostly no: On NFS et al where close() implies > an fsync you'll get the error

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Andres Freund
On 2018-04-30 10:14:23 +0800, Craig Ringer wrote: > Meanwhile, do we know if, on Linux 4.13+, if we get a buffered write > error due to dirty writeback before we close() a file we don't > fsync(), we'll get the error on close()? Not quite sure what you're getting at with "a file we don't fsync" -

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Craig Ringer
On 30 April 2018 at 09:09, Thomas Munro wrote: > Considering the variety in interpretation and liberties taken, I > wonder if fsync() is underspecified and someone should file an issue > over at http://www.opengroup.org/austin/ about that. All it's going to achieve is adding an "is implementatio

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Thomas Munro
On Sun, Apr 29, 2018 at 1:58 PM, Craig Ringer wrote: > On 28 April 2018 at 23:25, Simon Riggs wrote: >> On 27 April 2018 at 15:28, Andres Freund wrote: >>> While I'm a bit concerned adding user-code before a checkpoint, if >>> we'd do it as a shell command it seems pretty reasonable. And use

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Thomas Munro
On Mon, Apr 30, 2018 at 11:02 AM, Thomas Munro wrote: > MySQL: The default is still buffered Someone pulled me up on this off-list: the default is buffered (fsync) on Unix, but it's unbuffered on Windows. That's quite interesting. https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#s

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Thomas Munro
On Sun, Apr 29, 2018 at 10:42 PM, Simon Riggs wrote: > On 28 April 2018 at 09:15, Andres Freund wrote: >> On 2018-04-28 08:25:53 -0700, Simon Riggs wrote: >>> The people I've spoken to so far have encouraged us to continue >>> working with the filesystem layer, offering encouragement of our >>> d

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Simon Riggs
On 28 April 2018 at 09:15, Andres Freund wrote: > Hi, > > On 2018-04-28 08:25:53 -0700, Simon Riggs wrote: >> > - Use direct IO. Due to architectural performance issues in PG and the >> > fact that it'd not be applicable for all installations I don't think >> > this is a reasonable fix for the

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-29 Thread Simon Riggs
On 28 April 2018 at 08:25, Simon Riggs wrote: > On 27 April 2018 at 15:28, Andres Freund wrote: > >> - Add a pre-checkpoint hook that checks for filesystem errors *after* >> fsyncing all the files, but *before* logging the checkpoint completion >> record. Operating systems, filesystems, etc.

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Craig Ringer
On 28 April 2018 at 23:25, Simon Riggs wrote: > On 27 April 2018 at 15:28, Andres Freund wrote: > >> - Add a pre-checkpoint hook that checks for filesystem errors *after* >> fsyncing all the files, but *before* logging the checkpoint completion >> record. Operating systems, filesystems, etc.

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Craig Ringer
On 29 April 2018 at 00:15, Andres Freund wrote: > Hi, > > On 2018-04-28 08:25:53 -0700, Simon Riggs wrote: >> > - Use direct IO. Due to architectural performance issues in PG and the >> > fact that it'd not be applicable for all installations I don't think >> > this is a reasonable fix for the

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Andres Freund
Hi, On 2018-04-28 20:00:25 +0800, Craig Ringer wrote: > On 28 April 2018 at 06:28, Andres Freund wrote: > > The second major type of proposal was using direct-IO. That'd generally > > be a desirable feature, but a) would require some significant changes to > > postgres to be performant, b) isn't

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Andres Freund
Hi, On 2018-04-28 08:25:53 -0700, Simon Riggs wrote: > > - Use direct IO. Due to architectural performance issues in PG and the > > fact that it'd not be applicable for all installations I don't think > > this is a reasonable fix for the issue presented here. Although it's > > independently

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Andres Freund
Hi, On 2018-04-28 17:35:48 +0200, Michael Banck wrote: > This dmesg-checking has been mentioned several times now, but IME > enterprise distributions (or server ops teams?) seem to tighten access > to dmesg and /var/log to non-root users, including postgres. > > Well, or just vanilla Debian stabl

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Andres Freund
Hi, On 2018-04-28 11:10:54 -0400, Stephen Frost wrote: > When we crash-restart, we also go through and clean things up some, no? > Seems like that gives us the potential to end up fixing things ourselves > and allowing the crash-restart to succeed. Sure, there's the potential for that. But it's q

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Michael Banck
Hi, On Sat, Apr 28, 2018 at 11:21:20AM -0400, Stephen Frost wrote: > * Craig Ringer (cr...@2ndquadrant.com) wrote: > > On 28 April 2018 at 06:28, Andres Freund wrote: > > > - Add a pre-checkpoint hook that checks for filesystem errors *after* > > > fsyncing all the files, but *before* logging t

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Simon Riggs
On 27 April 2018 at 15:28, Andres Freund wrote: > - Add a pre-checkpoint hook that checks for filesystem errors *after* > fsyncing all the files, but *before* logging the checkpoint completion > record. Operating systems, filesystems, etc. all log the error format > differently, but for lar

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Stephen Frost
Greetings, * Craig Ringer (cr...@2ndquadrant.com) wrote: > On 28 April 2018 at 06:28, Andres Freund wrote: > > - Add a pre-checkpoint hook that checks for filesystem errors *after* > > fsyncing all the files, but *before* logging the checkpoint completion > > record. Operating systems, filesy

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Stephen Frost
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-04-27 19:38:30 -0400, Bruce Momjian wrote: > > On Fri, Apr 27, 2018 at 04:10:43PM -0700, Andres Freund wrote: > > > On 2018-04-27 19:04:47 -0400, Bruce Momjian wrote: > > > > On Fri, Apr 27, 2018 at 03:28:42PM -0700, Andres Freund wr

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-28 Thread Craig Ringer
On 28 April 2018 at 06:28, Andres Freund wrote: > Hi, > > I thought I'd send this separately from [0] as the issue has become more > general than what was mentioned in that thread, and it went off into > various weeds. Thanks very much for going and for the great summary. > - Actually marking fi

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-27 Thread Andres Freund
On 2018-04-27 19:38:30 -0400, Bruce Momjian wrote: > On Fri, Apr 27, 2018 at 04:10:43PM -0700, Andres Freund wrote: > > Hi, > > > > On 2018-04-27 19:04:47 -0400, Bruce Momjian wrote: > > > On Fri, Apr 27, 2018 at 03:28:42PM -0700, Andres Freund wrote: > > > > - We need more aggressive error checki

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-27 Thread Bruce Momjian
On Fri, Apr 27, 2018 at 04:10:43PM -0700, Andres Freund wrote: > Hi, > > On 2018-04-27 19:04:47 -0400, Bruce Momjian wrote: > > On Fri, Apr 27, 2018 at 03:28:42PM -0700, Andres Freund wrote: > > > - We need more aggressive error checking on close(), for ENOSPC and > > > EIO. In both cases afaics

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-27 Thread Andres Freund
Hi, On 2018-04-27 19:04:47 -0400, Bruce Momjian wrote: > On Fri, Apr 27, 2018 at 03:28:42PM -0700, Andres Freund wrote: > > - We need more aggressive error checking on close(), for ENOSPC and > > EIO. In both cases afaics we'll have to trigger a crash recovery > > cycle. It's entirely possible

Re: Postgres, fsync, and OSs (specifically linux)

2018-04-27 Thread Bruce Momjian
On Fri, Apr 27, 2018 at 03:28:42PM -0700, Andres Freund wrote: > - We need more aggressive error checking on close(), for ENOSPC and > EIO. In both cases afaics we'll have to trigger a crash recovery > cycle. It's entirely possible to end up in a loop on NFS etc, but I > don't think there's a
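As a rough illustration of what "more aggressive error checking on close()" could look like, the sketch below captures errno right after a failed close() and classifies ENOSPC and EIO as needing a crash recovery cycle, as proposed above. CloseOutcome, classify_close_errno() and checked_close() are hypothetical names made up for this example, not PostgreSQL functions. It does not retry close() on failure, on the assumption (true on Linux, as far as I know) that the descriptor may already be released even when close() reports an error.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
typedef enum CloseOutcome
{
    CLOSE_OK,               /* close() succeeded */
    CLOSE_FAILED,           /* failed, but not in a way that loses data */
    CLOSE_NEEDS_RECOVERY    /* ENOSPC or EIO: treat as data loss */
} CloseOutcome;

/* Map an errno value captured after close() to an outcome. */
static CloseOutcome
classify_close_errno(int saved_errno)
{
    if (saved_errno == 0)
        return CLOSE_OK;
    if (saved_errno == ENOSPC || saved_errno == EIO)
        return CLOSE_NEEDS_RECOVERY;
    return CLOSE_FAILED;
}

/*
 * Close fd once and report what happened.  errno is saved immediately
 * so that later calls cannot clobber it; close() is never retried.
 */
static CloseOutcome
checked_close(int fd)
{
    int saved_errno = 0;

    if (close(fd) != 0)
        saved_errno = errno;
    return classify_close_errno(saved_errno);
}
```

A caller that gets CLOSE_NEEDS_RECOVERY back would escalate (in PostgreSQL terms, PANIC and go through crash recovery) rather than carry on as if the data had reached disk.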

Postgres, fsync, and OSs (specifically linux)

2018-04-27 Thread Andres Freund
Hi, I thought I'd send this separately from [0] as the issue has become more general than what was mentioned in that thread, and it went off into various weeds. I went to LSF/MM 2018 to discuss [0] and related issues. Overall I'd say it was a very productive discussion. I'll first try to recap t