Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Andres Freund
On 2014-01-23 13:56:49 +0100, Simon Riggs wrote: > IMHO we need to resolve the deadlock inherent in the > disk-full/WALlock-up/checkpoint situation. My view is that can be > solved in a similar way to the way the buffer pin deadlock was > resolved for Hot Standby. I don't think that approach works

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Simon Riggs
On 23 January 2014 01:19, Jim Nasby wrote: > On 1/21/14, 6:46 PM, Andres Freund wrote: >> >> On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: >>> >>> >On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund >>> > wrote: > >I personally think this isn't worth complicating the code for. >>> >>>

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Andres Freund
On 2014-01-22 18:19:25 -0600, Jim Nasby wrote: > On 1/21/14, 6:46 PM, Andres Freund wrote: > >On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: > >>>On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund > >>>wrote: > >I personally think this isn't worth complicating the code for. > >>> > >>>You'

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Jim Nasby
On 1/21/14, 6:46 PM, Andres Freund wrote: On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: >On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund wrote: > >I personally think this isn't worth complicating the code for. > >You're probably right. However, I don't see why the bar has to be very >hi

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 14:25, Simon Riggs wrote: > On 22 January 2014 13:14, Heikki Linnakangas wrote: >> On 01/22/2014 02:10 PM, Simon Riggs wrote: >>> >>> As Jeff points out, the blocks being modified would be locked until >>> space is freed up. Which could make other users wait. The code >>> requi

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Tom Lane
Andres Freund writes: > On 2014-01-21 21:42:19 -0500, Tom Lane wrote: >> Uh, what? The behavior I'm talking about is *exactly the same* >> as what happens now. The only change is that the data sent to the >> WAL file is laid out a bit differently, and the replay logic has >> to work harder to re

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Andres Freund
On 2014-01-21 21:42:19 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-01-21 19:45:19 -0500, Tom Lane wrote: > >> I don't think that's a comparable case. Incomplete actions are actions > >> to be taken immediately, and which the replayer then has to complete > >> somehow if it doesn't

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Kevin Grittner
Tom Lane wrote: > Well, PANIC is certainly bad, but what I'm suggesting is that we > just focus on getting that down to ERROR and not worry about > trying to get out of the disk-shortage situation automatically. > Nor do I believe that it's such a good idea to have the database > freeze up until

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 13:14, Heikki Linnakangas wrote: > On 01/22/2014 02:10 PM, Simon Riggs wrote: >> >> As Jeff points out, the blocks being modified would be locked until >> space is freed up. Which could make other users wait. The code >> required to avoid that wait would be complex and not worth

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Heikki Linnakangas
On 01/22/2014 02:10 PM, Simon Riggs wrote: As Jeff points out, the blocks being modified would be locked until space is freed up. Which could make other users wait. The code required to avoid that wait would be complex and not worth any overhead. Checkpoint also acquires the content lock of eve

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 01:30, Tom Lane wrote: > Andres Freund writes: >> How are we supposed to wait while e.g. ProcArrayLock? Aborting >> transactions doesn't work either, that writes abort records which can >> get significantly large. > > Yeah, that's an interesting point ;-). We can't *either* com

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 01:23, Tom Lane wrote: > Andres Freund writes: >> On 2014-01-21 18:59:13 -0500, Tom Lane wrote: >>> Another thing to think about is whether we couldn't put a hard limit on >>> WAL record size somehow. Multi-megabyte WAL records are an abuse of the >>> design anyway, when you g

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund writes: > On 2014-01-21 19:45:19 -0500, Tom Lane wrote: >> I don't think that's a comparable case. Incomplete actions are actions >> to be taken immediately, and which the replayer then has to complete >> somehow if it doesn't find the rest of the action in the WAL sequence. >> The

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 19:45:19 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-01-21 19:23:57 -0500, Tom Lane wrote: > >> I'm not suggesting that we stop providing that information! I'm just > >> saying that we perhaps don't need to store it all in one WAL record, > >> if instead we put the on

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: > On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund wrote: > > I personally think this isn't worth complicating the code for. > > You're probably right. However, I don't see why the bar has to be very > high when we're considering the trade-off be

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund writes: > On 2014-01-21 19:23:57 -0500, Tom Lane wrote: >> I'm not suggesting that we stop providing that information! I'm just >> saying that we perhaps don't need to store it all in one WAL record, >> if instead we put the onus on WAL replay to be able to reconstruct what >> it ne

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 19:23:57 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-01-21 18:59:13 -0500, Tom Lane wrote: > >> Another thing to think about is whether we couldn't put a hard limit on > >> WAL record size somehow. Multi-megabyte WAL records are an abuse of the > >> design anyway, whe

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Peter Geoghegan
On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund wrote: > I personally think this isn't worth complicating the code for. You're probably right. However, I don't see why the bar has to be very high when we're considering the trade-off between taking some emergency precaution against having a PANIC s

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund writes: > How are we supposed to wait while e.g. ProcArrayLock? Aborting > transactions doesn't work either, that writes abort records which can > get significantly large. Yeah, that's an interesting point ;-). We can't *either* commit or abort without emitting some WAL, possibly qu

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund writes: > On 2014-01-21 18:59:13 -0500, Tom Lane wrote: >> Another thing to think about is whether we couldn't put a hard limit on >> WAL record size somehow. Multi-megabyte WAL records are an abuse of the >> design anyway, when you get right down to it. So for example maybe we >>

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-22 01:18:36 +0100, Simon Riggs wrote: > > My understanding is that if it runs out of buffer space while in an > > XLogInsert, it will be holding one or more buffer content locks exclusively, > > and unless it can complete the xlog (or scrounge up the info to return that > > buffer to its

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Simon Riggs
On 21 January 2014 23:01, Jeff Janes wrote: > On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane wrote: >> >> Simon Riggs writes: >> > On 6 June 2013 16:00, Heikki Linnakangas >> > wrote: >> >> The current situation is that if you run out of disk space while >> >> writing >> >> WAL, you get a PANIC, and

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 18:59:13 -0500, Tom Lane wrote: > Another thing to think about is whether we couldn't put a hard limit on > WAL record size somehow. Multi-megabyte WAL records are an abuse of the > design anyway, when you get right down to it. So for example maybe we > could split up commit records
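
A minimal sketch of the record-splitting idea Tom Lane raises here, under loudly stated assumptions: the snippet does not say how commit records would be split, so the chunking scheme, the function names `split_commit_payload` and `reassemble`, and the byte limit are all hypothetical. It only shows that a list of 32-bit transaction IDs can be spread over several size-capped records and rebuilt at replay time.

```python
from typing import Iterable, Iterator, List

XID_SIZE = 4  # bytes per 32-bit transaction ID


def split_commit_payload(xids: List[int], max_record_bytes: int) -> Iterator[List[int]]:
    """Yield chunks of xids so that no chunk exceeds max_record_bytes (hypothetical limit)."""
    per_record = max_record_bytes // XID_SIZE
    if per_record < 1:
        raise ValueError("max_record_bytes is too small to hold a single xid")
    for start in range(0, len(xids), per_record):
        yield xids[start:start + per_record]


def reassemble(chunks: Iterable[List[int]]) -> List[int]:
    """What a replayer would have to do: concatenate the chunks in order."""
    out: List[int] = []
    for chunk in chunks:
        out.extend(chunk)
    return out
```

The cost of such a scheme, as the replies in this thread point out, is that replay has to cope with a sequence that is cut short before its final record.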

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund writes: > On 2014-01-21 18:24:39 -0500, Tom Lane wrote: >> Maybe we could get some mileage out of the fact that very approximate >> techniques would be good enough. For instance, I doubt anyone would bleat >> if the system insisted on having 10MB or even 100MB of future WAL space >>

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 18:24:39 -0500, Tom Lane wrote: > Jeff Janes writes: > > On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane wrote: > >> My preference would be that we simply start failing writes with ERRORs > >> rather than PANICs. I'm not real sure ATM why this has to be a PANIC > >> condition. Probably

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Peter Geoghegan
On Tue, Jan 21, 2014 at 3:24 PM, Tom Lane wrote: > Maybe we could get some mileage out of the fact that very approximate > techniques would be good enough. For instance, I doubt anyone would bleat > if the system insisted on having 10MB or even 100MB of future WAL space > always available. But I

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Jeff Janes writes: > On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane wrote: >> My preference would be that we simply start failing writes with ERRORs >> rather than PANICs. I'm not real sure ATM why this has to be a PANIC >> condition. Probably the cause is that it's being done inside a critical >> s

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Jeff Janes
On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane wrote: > Simon Riggs writes: > > On 6 June 2013 16:00, Heikki Linnakangas > wrote: > >> The current situation is that if you run out of disk space while writing > >> WAL, you get a PANIC, and the server shuts down. That's awful. > > > I don't see we nee

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Greg Stark writes: > Fwiw I think "all transactions lock up until space appears" is *much* > better than PANICing. Often disks fill up due to other transient > storage or people may have options to manually increase the amount of > space. It's much better if the database just continues to function

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Greg Stark
Fwiw I think "all transactions lock up until space appears" is *much* better than PANICing. Often disks fill up due to other transient storage or people may have options to manually increase the amount of space. It's much better if the database just continues to function after that rather than need

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Simon Riggs
On 21 January 2014 18:35, Tom Lane wrote: > Simon Riggs writes: >> On 6 June 2013 16:00, Heikki Linnakangas wrote: >>> The current situation is that if you run out of disk space while writing >>> WAL, you get a PANIC, and the server shuts down. That's awful. > >> I don't see we need to prevent W

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Simon Riggs writes: > On 6 June 2013 16:00, Heikki Linnakangas wrote: >> The current situation is that if you run out of disk space while writing >> WAL, you get a PANIC, and the server shuts down. That's awful. > I don't see we need to prevent WAL insertions when the disk fills. We > still have

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Simon Riggs
On 6 June 2013 16:00, Heikki Linnakangas wrote: > In the "Redesigning checkpoint_segments" thread, many people opined that > there should be a hard limit on the amount of disk space used for WAL: > http://www.postgresql.org/message-id/CA+TgmoaOkgZb5YsmQeMg8ZVqWMtR=6s4-ppd+6jiy4oq78i...@mail.gmail.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-22 Thread Bruce Momjian
On Mon, Jun 10, 2013 at 07:28:24AM +0800, Craig Ringer wrote: > (I'm still learning the details of Pg's WAL, WAL replay and recovery, so > the below's just my understanding): > > The problem is that WAL for all tablespaces is mixed together in the > archives. If you lose your tablespace then you h

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-17 Thread Dimitri Fontaine
Peter Eisentraut writes: > I suspect that there are actually only about 5 or 6 common ways to do > archiving (say, local, NFS, scp, rsync, S3, ...). There's no reason why > we can't fully specify and/or script what to do in each of these cases. And provide either fully reliable contrib scripts o

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Claudio Freire
On Wed, Jun 12, 2013 at 6:03 PM, Joshua D. Drake wrote: > >> Right now you have to be a rocket >> scientist no matter what configuration you're running. > > > This is quite a bit overblown. Assuming your needs are simple. Archiving is > as it is now, a relatively simple process to set up, even wi

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Joshua D. Drake
On 06/12/2013 08:49 AM, Robert Haas wrote: Sure, remote archiving is great, and I'm glad you've been working on it. In general, I think that's a cleaner approach, but there are still enough people using archive_command that we can't throw them under the bus. Correct. I guess archiving to

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Wed, Jun 12, 2013 at 12:07 PM, Peter Eisentraut wrote: > On 6/12/13 10:55 AM, Robert Haas wrote: >> But it's got to be pretty common to archive to a local >> path that happens to be a remote mount, or to a local directory whose >> contents are subsequently copied off by a batch job. Making tha

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Peter Eisentraut
On 6/12/13 10:55 AM, Robert Haas wrote: > But it's got to be pretty common to archive to a local > path that happens to be a remote mount, or to a local directory whose > contents are subsequently copied off by a batch job. Making that work > nicely with near-zero configuration would be a signific

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Wed, Jun 12, 2013 at 11:32 AM, Magnus Hagander wrote: > Wouldn't that encourage people to do local archiving, which is almost always > a bad idea? Maybe, but refusing to improve the UI because people might then use the feature seems wrong-headed. > I'd rather improve the experience with pg_re

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Sat, Jun 8, 2013 at 7:20 PM, Jeff Janes wrote: > If archiving is on and failure is due to no space, could we just keep trying > XLogFileInit again for a couple minutes to give archiving a chance to do its > things? Doing that while holding onto locks and a critical section would be > unfortuna

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Magnus Hagander
On Jun 12, 2013 4:56 PM, "Robert Haas" wrote: > > On Sat, Jun 8, 2013 at 10:36 AM, MauMau wrote: > > Yes, I feel designing reliable archiving, even for the simplest case - copy > > WAL to disk, is very difficult. I know there are following three problems > > if you just follow the PostgreSQL man

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Tatsuo Ishii
> On Sat, Jun 8, 2013 at 10:36 AM, MauMau wrote: >> Yes, I feel designing reliable archiving, even for the simplest case - copy >> WAL to disk, is very difficult. I know there are following three problems >> if you just follow the PostgreSQL manual. Average users won't notice them. >> I guess ev

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Claudio Freire
On Wed, Jun 12, 2013 at 11:55 AM, Robert Haas wrote: >> I hope PostgreSQL will provide a reliable archiving facility that is ready >> to use. > > +1. I think we should have a way to set an archive DIRECTORY, rather > than an archive command. And if you set it, then PostgreSQL should > just do al

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Sat, Jun 8, 2013 at 10:36 AM, MauMau wrote: > Yes, I feel designing reliable archiving, even for the simplest case - copy > WAL to disk, is very difficult. I know there are following three problems > if you just follow the PostgreSQL manual. Average users won't notice them. > I guess even pro

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Josh Berkus
> Not a bad idea. One that supports rsync and another that supports > robocopy. That should cover every platform we support. Example script: = #!/usr/bin/env bash # Simple script to copy WAL archives from one server to another # to be called as archive_command (call

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Joshua D. Drake
On 06/10/2013 04:42 PM, Josh Berkus wrote: Actually we describe what archive_command needs to fulfill, and tell them to use something that accomplishes that. The example with cp is explicitly given as an example, not a recommendation. If we offer cp as an example, we *are* recommending it.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Daniel Farina
On Mon, Jun 10, 2013 at 4:42 PM, Josh Berkus wrote: > Daniel, Jeff, > >> I don't doubt this, that's why I do have a no-op fallback for >> emergencies. The discussion was about defaults. I still think that >> drop-wal-from-archiving-whenever is not a good one. > > Yeah, we can argue defaults for

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Josh Berkus
Daniel, Jeff, > I don't doubt this, that's why I do have a no-op fallback for > emergencies. The discussion was about defaults. I still think that > drop-wal-from-archiving-whenever is not a good one. Yeah, we can argue defaults for a long time. What would be better is some way to actually det

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Daniel Farina
On Mon, Jun 10, 2013 at 11:59 AM, Josh Berkus wrote: > Anyway, what I'm pointing out is that this is a business decision, and > there is no way that we can make a decision for the users what to do > when we run out of WAL space. And that the "stop archiving" option > needs to be there for users,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:07 AM, Joshua D. Drake wrote: > > On 06/08/2013 07:36 AM, MauMau wrote: > > 1. If the machine or postgres crashes while archive_command is copying a >> WAL file, later archive recovery fails. >> This is because cp leaves a file of less than 16MB in archive area, and >> p

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Josh Berkus
Josh, Daniel, >> Right now, what we're telling users is "You can have continuous backup >> with Postgres, but you'd better hire an expensive consultant to set it >> up for you, or use this external tool of dubious provenance which >> there's no packages for, or you might accidentally cause your d

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread MauMau
From: "Craig Ringer" The problem is that WAL for all tablespaces is mixed together in the archives. If you lose your tablespace then you have to keep *all* WAL around and replay *all* of it again when the tablespace comes back online. This would be very inefficient, would require a lot of tricks

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread Craig Ringer
On 06/10/2013 06:39 AM, MauMau wrote: > The problem is that the reliability of the database system decreases > with more disks, because failure of any one of those disks would result > in a database PANIC shutdown More specifically, with more independent sets of disks / file systems. >> I'd rath

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread MauMau
From: "Craig Ringer" On 06/09/2013 08:32 AM, MauMau wrote: - Failure of a disk containing data directory or tablespace If checkpoint can't write buffers to disk because of disk failure, checkpoint cannot complete, thus WAL files accumulate in pg_xlog/. This means that one disk failure will lea

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread Andres Freund
On 2013-06-08 13:26:56 -0700, Joshua D. Drake wrote: > >At the points where the XLogInsert()s happen we're in critical sections > >out of which we *cannot* ERROR out because we already may have made > >modifications that cannot be allowed to be performed > >partially/unlogged. That's why we're thr

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/09/2013 03:02 AM, Jeff Janes wrote: > It would be nice to have the ability to specify multiple log destinations > with different log_min_messages for each one. I'm sure syslog already must > implement some kind of method for doing that, but I've been happy enough > with the text logs that I

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/09/2013 08:32 AM, MauMau wrote: > > - Failure of a disk containing data directory or tablespace > If checkpoint can't write buffers to disk because of disk failure, > checkpoint cannot complete, thus WAL files accumulate in pg_xlog/. > This means that one disk failure will lead to postgres sh

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/08/2013 10:57 AM, Daniel Farina wrote: > >> At which point most sensible users say "no thanks, I'll use something else". > [snip] > > I have a clear bias in experience here, but I can't relate to someone > who sets up archives but is totally okay losing a segment unceremoniously, > because it

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/06/2013 10:00 PM, Heikki Linnakangas wrote: > > I've seen a case, where it was even worse than a PANIC and shutdown. > pg_xlog was on a separate partition that had nothing else on it. The > partition filled up, and the system shut down with a PANIC. Because > there was no space left, it could

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: "Josh Berkus" There's actually three potential failure cases here: - One Volume: WAL is on the same volume as PGDATA, and that volume is completely out of space. - XLog Partition: WAL is on its own partition/volume, and fills it up. - Archiving: archiving is failing or too slow, causing

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: "Joshua D. Drake" On 06/08/2013 11:27 AM, Andres Freund wrote: You know, the PANIC isn't there just because we like to piss off users. There's actual technical reasons that don't just go away by judging the PANIC as stupid. Yes I know we aren't trying to piss off users. What I am saying

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: "Joshua D. Drake" On 06/08/2013 07:36 AM, MauMau wrote: 3. You cannot know the reason of archive_command failure (e.g. archive area full) if you don't use PostgreSQL's server logging. This is because archive_command failure is not logged in syslog/eventlog. Wait, what? Is this true (som

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:27 AM, Andres Freund wrote: > > You know, the PANIC isn't there just because we like to piss off > users. There's actual technical reasons that don't just go away by > judging the PANIC as stupid. > At the points where the XLogInsert()s happen we're in critical sections >

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Simon Riggs
On 7 June 2013 10:02, Heikki Linnakangas wrote: > On 07.06.2013 00:38, Andres Freund wrote: >> >> On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: >>> >>> * Heikki Linnakangas wrote: >>> The current situation is that if you run out of disk space while writing WAL, you get a PANIC,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/08/2013 11:27 AM, Andres Freund wrote: On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote: To me, a more pragmatic approach makes sense. Obviously having some kind of code that checks the space makes sense but I don't know that it needs to be around any operation other than we are creat

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:15 AM, Joshua D. Drake wrote: > > On 06/06/2013 07:52 AM, Heikki Linnakangas wrote: > >> I think it can be made fairly robust otherwise, and the performance >> impact should be pretty easy to measure with e.g pgbench. >> > > Once upon a time in a land far, far away, we ex

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus wrote: > > >> The archive command can be made a shell script (or that matter a > >> compiled program) which can do anything it wants upon failure, including > >> emailing people. > > You're talking about using external tools -- frequently hackish, > wo

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Andres Freund
On 2013-06-07 12:02:57 +0300, Heikki Linnakangas wrote: > On 07.06.2013 00:38, Andres Freund wrote: > >On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: > >>* Heikki Linnakangas wrote: > >> > >>>The current situation is that if you run out of disk space while writing > >>>WAL, you get a PANIC,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Andres Freund
On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote: > To me, a more pragmatic approach makes sense. Obviously having some kind of > code that checks the space makes sense but I don't know that it needs to be > around any operation other than we are creating a segment. What do we care > why the seg

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/06/2013 07:52 AM, Heikki Linnakangas wrote: I think it can be made fairly robust otherwise, and the performance impact should be pretty easy to measure with e.g pgbench. Once upon a time in a land far, far away, we expected users to manage their own systems. We had things like soft and

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/08/2013 07:36 AM, MauMau wrote: 1. If the machine or postgres crashes while archive_command is copying a WAL file, later archive recovery fails. This is because cp leaves a file of less than 16MB in archive area, and postgres refuses to start when it finds such a small archive WAL file. T
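
A minimal sketch of one common way to avoid the partial-file problem MauMau describes with a plain `cp`: copy to a temporary name in the archive directory, flush it to disk, then rename it into place. This is an illustration, not anything proposed in the thread. The function name `archive_wal_segment` and the `.partial-` prefix are hypothetical, and the sketch assumes the restore side only ever looks for the exact segment file name, so a leftover temporary file from a crash is ignored rather than mistaken for a segment.

```python
import os
import shutil
import tempfile


def archive_wal_segment(src_path: str, archive_dir: str) -> str:
    """Copy one WAL segment into archive_dir without exposing a partial file
    under the segment's final name."""
    final_path = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(final_path):
        # Refuse to overwrite a segment that was already archived.
        raise FileExistsError(final_path)

    # Temporary file lives in the same directory so the rename stays on one
    # file system and does not degrade into a second copy.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as dst, open(src_path, "rb") as src:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # data is on disk before the name appears
        os.rename(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file on any failure, then re-raise.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A script built around this would exit non-zero on any exception, so that the server retries the segment later. The existence check and the rename are not atomic with respect to each other, which is acceptable only if a single archiver writes to the directory.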

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/07/2013 12:14 PM, Josh Berkus wrote: Right now, what we're telling users is "You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of dubious provenance which there's no packages for, or you might accid

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: "Daniel Farina" On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus wrote: Right now, what we're telling users is "You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of dubious provenance which there's no

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Daniel Farina
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus wrote: > Right now, what we're telling users is "You can have continuous backup > with Postgres, but you'd better hire an expensive consultant to set it > up for you, or use this external tool of dubious provenance which > there's no packages for, or y

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Josh Berkus
>> I would oppose that as the solution, either an unconditional one, or >> configurable with is it as the default. Those segments are not >> unneeded. I need them. That is why I set up archiving in the first >> place. If you need to shut down the database rather than violate my >> established

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Tom Lane
Heikki Linnakangas writes: > On 07.06.2013 19:33, Tom Lane wrote: >> Not only is that a horrible layering/modularity violation, but surely >> LockBuffer can have no idea how much WAL space will be needed. > It can be just a conservative guess, like, 32KB. That should be enough > for almost all W

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Heikki Linnakangas
On 07.06.2013 19:33, Tom Lane wrote: Heikki Linnakangas writes: On 06.06.2013 17:00, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at the beginning of heap_ins

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Tom Lane
Heikki Linnakangas writes: > On 06.06.2013 17:00, Heikki Linnakangas wrote: >> A more workable idea is to sprinkle checks in higher-level code, before >> you hold any critical locks, to check that there is enough preallocated >> WAL. Like, at the beginning of heap_insert, heap_update, etc., and al

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Heikki Linnakangas
On 06.06.2013 17:00, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at the beginning of heap_insert, heap_update, etc., and all similar indexam entry points. Act
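
A toy model of the idea Heikki describes here: check for WAL space before taking any locks, so that running out of space surfaces as an ordinary error instead of a failure inside a critical section. Everything in it is hypothetical (the `WalSpace` class, its methods, the Python stand-in for `heap_insert`); the 32KB figure is the conservative guess mentioned elsewhere in this thread. It is not how PostgreSQL implements anything.

```python
class OutOfWalSpace(Exception):
    """Stands in for an ordinary ERROR raised outside a critical section."""


class WalSpace:
    def __init__(self, preallocated_bytes: int):
        self.preallocated_bytes = preallocated_bytes  # space known to exist
        self.reserved_bytes = 0                       # promised to callers

    def reserve(self, nbytes: int) -> None:
        """Called before any locks are held; allowed to fail."""
        if self.reserved_bytes + nbytes > self.preallocated_bytes:
            raise OutOfWalSpace(f"need {nbytes} bytes of preallocated WAL")
        self.reserved_bytes += nbytes

    def insert_record(self, record_len: int, reserved: int) -> None:
        """Called inside the critical section; consumes the reservation."""
        assert record_len <= reserved, "conservative guess was too small"
        self.preallocated_bytes -= record_len
        self.reserved_bytes -= reserved


CONSERVATIVE_GUESS = 32 * 1024  # bytes, per the figure quoted in the thread


def heap_insert(wal: WalSpace, record_len: int) -> None:
    wal.reserve(CONSERVATIVE_GUESS)   # may raise; nothing is locked yet
    # ... lock the buffer and enter the critical section here ...
    wal.insert_record(record_len, CONSERVATIVE_GUESS)
```

The weak point of the model is the same one raised in the replies: a record larger than the guess trips the assertion, which corresponds to the case the pre-check cannot cover.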

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Bernd Helmle
--On 6. Juni 2013 16:25:29 -0700 Josh Berkus wrote: Archiving - In some ways, this is the simplest case. Really, we just need a way to know when the available WAL space has become 90% full, and abort archiving at that stage. Once we stop attempting to archive, we can clean up the u
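
A minimal sketch of the 90% check Josh Berkus describes, assuming the fill level of the volume holding WAL is an acceptable proxy for "available WAL space". The function names are hypothetical, and the thread does not say how the threshold would actually be measured.

```python
import shutil


def wal_volume_fill_ratio(path: str) -> float:
    """Fraction of the file system containing path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_stop_archiving(fill_ratio: float, threshold: float = 0.90) -> bool:
    """True once the volume has reached the threshold (90% by default)."""
    return fill_ratio >= threshold
```

A caller would combine the two, for example `should_stop_archiving(wal_volume_fill_ratio(wal_dir))`, where `wal_dir` is whatever directory holds the WAL segments. Whether dropping unarchived segments at that point is acceptable is exactly what the replies dispute.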

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Heikki Linnakangas
On 07.06.2013 00:38, Andres Freund wrote: On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: * Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can So we need to somehow s

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Daniel Farina
On Thu, Jun 6, 2013 at 9:30 PM, Jeff Janes wrote: > I would oppose that as the solution, either an unconditional one, or > configurable with it as the default. Those segments are not unneeded. I > need them. That is why I set up archiving in the first place. If you need > to shut down the d

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Joshua D. Drake
On 06/06/2013 09:30 PM, Jeff Janes wrote: Archiving - In some ways, this is the simplest case. Really, we just need a way to know when the available WAL space has become 90% full, and abort archiving at that stage. Once we stop attempting to archive, we can cl

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Jeff Janes
On Thursday, June 6, 2013, Josh Berkus wrote: > Let's talk failure cases. > > There's actually three potential failure cases here: > > - One Volume: WAL is on the same volume as PGDATA, and that volume is > completely out of space. > > - XLog Partition: WAL is on its own partition/volume, and fill

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Jaime Casanova
On Thu, Jun 6, 2013 at 4:28 PM, Christian Ullrich wrote: > * Heikki Linnakangas wrote: > >> The current situation is that if you run out of disk space while writing >> WAL, you get a PANIC, and the server shuts down. That's awful. We can > > >> So we need to somehow stop new WAL insertions from ha

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Josh Berkus
Let's talk failure cases. There's actually three potential failure cases here: - One Volume: WAL is on the same volume as PGDATA, and that volume is completely out of space. - XLog Partition: WAL is on its own partition/volume, and fills it up. - Archiving: archiving is failing or too slow, cau

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Greg Stark
On Thu, Jun 6, 2013 at 10:38 PM, Andres Freund wrote: > That's not a bad technique. I wonder how reliable it would be in > postgres. Do all filesystems allow a rename() to succeed if there isn't > actually any space left? E.g. on btrfs I wouldn't be sure. We need to > rename because WAL files nee

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Andres Freund
On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: > * Heikki Linnakangas wrote: > > >The current situation is that if you run out of disk space while writing > >WAL, you get a PANIC, and the server shuts down. That's awful. We can > > >So we need to somehow stop new WAL insertions from happe

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Christian Ullrich
* Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can So we need to somehow stop new WAL insertions from happening, before it's too late. A naive idea is to check if there's

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Heikki Linnakangas
On 06.06.2013 17:17, Andres Freund wrote: On 2013-06-06 17:00:30 +0300, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at the beginning of heap_insert, heap_updat

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Andres Freund
On 2013-06-06 17:00:30 +0300, Heikki Linnakangas wrote: > A more workable idea is to sprinkle checks in higher-level code, before you > hold any critical locks, to check that there is enough preallocated WAL. > Like, at the beginning of heap_insert, heap_update, etc., and all similar > indexam entr