Re: [HACKERS] Spread checkpoint sync

2011-02-10 Thread Robert Haas
On Thu, Feb 10, 2011 at 10:30 PM, Greg Smith wrote: > 3) The existing write spreading code in the background writer needs to be > overhauled, too, before spreading the syncs around is going to give the > benefits I was hoping for. I've been thinking about this problem a bit. It strikes me that t

Re: [HACKERS] Spread checkpoint sync

2011-02-10 Thread Greg Smith
Looks like it's time to close the book on this one for 9.1 development...the unfortunate results are at http://www.2ndquadrant.us/pgbench-results/index.htm Test set #12 is the one with spread sync I was hoping would turn out better than #9, the reference I was trying to improve on. TPS is abo

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Greg Smith
Kevin Grittner wrote: There are occasional posts from those wondering why their read-only queries are so slow after a bulk load, and why they are doing heavy writes. (I remember when I posted about that, as a relative newbie, and I know I've seen others.) Sure; I created http://wiki.postgre

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Kevin Grittner
Greg Smith wrote: > As a larger statement on this topic, I'm never very excited about > redesigning here starting from any point other than "saw a > bottleneck doing on a production system". There's a long list > of such things already around waiting to be addressed, and I've > never seen any g

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Greg Smith
Cédric Villemain wrote: Is it worth a new thread with the different IO improvements done so far or on-going and how we may add new GUC(if required !!!) with intelligence between those patches ? ( For instance, hint bit IO limit needs probably a tunable to define something similar to hint_write_co

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Cédric Villemain
2011/2/7 Greg Smith : > Robert Haas wrote: >> >> With the fsync queue compaction patch applied, I think most of this is >> now not needed.  Attached please find an attempt to isolate the >> portion that looks like it might still be useful.  The basic idea of >> what remains here is to make the back

Re: [HACKERS] Spread checkpoint sync

2011-02-06 Thread Greg Smith
Robert Haas wrote: With the fsync queue compaction patch applied, I think most of this is now not needed. Attached please find an attempt to isolate the portion that looks like it might still be useful. The basic idea of what remains here is to make the background writer still do its normal stu

Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Robert Haas
On Fri, Feb 4, 2011 at 2:08 PM, Greg Smith wrote: > -The total number of buffers I'm computing based on the checkpoint writes > being sorted is not a perfect match to the number reported by the > "checkpoint complete" status line.  Sometimes they are the same, sometimes > not.  Not sure why yet.

Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Greg Smith
As already mentioned in the broader discussion at http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com , I'm seeing no solid performance swing in the checkpoint sorting code itself. Better sometimes, worse others, but never by a large amount. Here's what the statistics p

Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Greg Smith
Michael Banck wrote: On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote: For example, the pre-release Squeeze numbers we're seeing are awful so far, but it's not really done yet either. Unfortunately, it does not look like Debian squeeze will change any more (or has changed mu

Re: [HACKERS] Spread checkpoint sync

2011-02-03 Thread Michael Banck
On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote: > For example, the pre-release Squeeze numbers we're seeing are awful so > far, but it's not really done yet either. Unfortunately, it does not look like Debian squeeze will change any more (or has changed much since your post) at this p

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian writes: > > My trivial idea was: let's assume we checkpoint every 10 minutes, and > > it takes 5 minutes for us to write the data to the kernel. If no one > > else is writing to those files, we can safely wait maybe 5 more minutes > > before issuing the fsync. I

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Tom Lane
Bruce Momjian writes: > My trivial idea was: let's assume we checkpoint every 10 minutes, and > it takes 5 minutes for us to write the data to the kernel. If no one > else is writing to those files, we can safely wait maybe 5 more minutes > before issuing the fsync. If, however, hundreds of wr

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Kevin Grittner wrote: > Robert Haas wrote: > > > I also think Bruce's idea of calling fsync() on each relation just > > *before* we start writing the pages from that relation might have > > some merit. > > What bothers me about that is that you may have a lot of the same > dirty pages in the O

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Robert Haas
On Tue, Feb 1, 2011 at 12:58 PM, Kevin Grittner wrote: > Robert Haas wrote: > >> I also think Bruce's idea of calling fsync() on each relation just >> *before* we start writing the pages from that relation might have >> some merit. > > What bothers me about that is that you may have a lot of the

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Greg Smith wrote: > Greg Smith wrote: > > I think the right way to compute "relations to sync" is to finish the > > sorted writes patch I sent over a not quite right yet update to already > > Attached update now makes much more sense than the misguided patch I > submitted two weeks ago. This ta

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Robert Haas wrote: > Back to your idea: One problem with trying to bound the unflushed data > is that it's not clear what the bound should be. I've had this mental > model where we want the OS to write out pages to disk, but that's not > always true, per Greg Smith's recent posts about Linux kerne

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Kevin Grittner
Robert Haas wrote: > I also think Bruce's idea of calling fsync() on each relation just > *before* we start writing the pages from that relation might have > some merit. What bothers me about that is that you may have a lot of the same dirty pages in the OS cache as the PostgreSQL cache, and y

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Robert Haas
On Mon, Jan 31, 2011 at 4:28 PM, Tom Lane wrote: > Robert Haas writes: >> Back to the idea at hand - I proposed something a bit along these >> lines upthread, but my idea was to proactively perform the fsyncs on >> the relations that had gone the longest without a write, rather than >> the ones w

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Greg Smith
Greg Smith wrote: I think the right way to compute "relations to sync" is to finish the sorted writes patch I sent over a not quite right yet update to already Attached update now makes much more sense than the misguided patch I submitted two weeks ago. This takes the original sorted write co

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Greg Smith
Tom Lane wrote: Robert Haas writes: 3. Pause for 3 seconds after every fsync. I think something along the lines of #3 is probably a good idea, Really? Any particular delay is guaranteed wrong. '3 seconds' is just a placeholder for whatever comes out of a "total time

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas writes: > Back to the idea at hand - I proposed something a bit along these > lines upthread, but my idea was to proactively perform the fsyncs on > the relations that had gone the longest without a write, rather than > the ones with the most dirty data. Yeah. What I meant to suggest

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Greg Smith
Tom Lane wrote: I wonder whether it'd be useful to keep track of the total amount of data written-and-not-yet-synced, and to issue fsyncs often enough to keep that below some parameter; the idea being that the parameter would limit how much dirty kernel disk cache there is. Of course, ideally th
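
The bookkeeping Tom Lane describes here (track written-but-not-yet-synced data and fsync often enough to stay under a limit) might be sketched roughly as follows. This is only an illustration, not code from any patch in the thread: the class name, the `limit_bytes` parameter, and the choice to sync the file with the most unsynced data first are all assumptions, and no real I/O is performed.

```python
class UnsyncedTracker:
    """Hypothetical tracker of bytes written but not yet fsync'd, per file."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes  # assumed tunable: max unsynced bytes
        self.pending = {}               # file name -> bytes written since last sync
        self.total = 0                  # sum of all values in self.pending

    def record_write(self, file_name, nbytes):
        """Account for a write; return the files the caller should fsync now."""
        self.pending[file_name] = self.pending.get(file_name, 0) + nbytes
        self.total += nbytes
        to_sync = []
        # Assumed policy: sync the file holding the most unsynced data first,
        # and keep going until the total is back at or under the limit.
        while self.total > self.limit_bytes and self.pending:
            victim = max(self.pending, key=self.pending.get)
            self.total -= self.pending.pop(victim)
            to_sync.append(victim)
        return to_sync


tracker = UnsyncedTracker(limit_bytes=16384)
first = tracker.record_write("a", 8192)   # under the limit, nothing to sync
second = tracker.record_write("b", 8192)  # exactly at the limit, still nothing
third = tracker.record_write("a", 8192)   # over the limit, "a" gets synced
```

The thread also floats syncing the relation that has gone longest without a write instead; swapping the `max(...)` line for a least-recently-written lookup would model that variant.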

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Bruce Momjian
Robert Haas wrote: > Back to the idea at hand - I proposed something a bit along these > lines upthread, but my idea was to proactively perform the fsyncs on > the relations that had gone the longest without a write, rather than > the ones with the most dirty data. I'm not sure which is better. >

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 12:11 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane wrote: >>> I wonder whether it'd be useful to keep track of the total amount of >>> data written-and-not-yet-synced, and to issue fsyncs often enough to >>> keep that below some

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas writes: > On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane wrote: >> I wonder whether it'd be useful to keep track of the total amount of >> data written-and-not-yet-synced, and to issue fsyncs often enough to >> keep that below some parameter; the idea being that the parameter would >> lim

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 12:01 PM, Tom Lane wrote: > Robert Haas writes: >> 3. Pause for 3 seconds after every fsync. > >> I think something along the lines of #3 is probably a good idea, > > Really?  Any particular delay is guaranteed wrong. What I was getting at was - I think it's probably a go

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas writes: > 3. Pause for 3 seconds after every fsync. > I think something along the lines of #3 is probably a good idea, Really? Any particular delay is guaranteed wrong. regards, tom lane

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane wrote: >>> That sounds like you have an entirely wrong mental model of where the >>> cost comes from.  Those times are not independent. > >> Yeah, Greg Smith made the same point

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas writes: > On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane wrote: >> That sounds like you have an entirely wrong mental model of where the >> cost comes from.  Those times are not independent. > Yeah, Greg Smith made the same point a week or three ago. But it > seems to me that there is p

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane wrote: > Heikki Linnakangas writes: >> IMHO we should re-consider the patch to sort the writes. Not so much >> because of the performance gain that gives, but because we can then >> re-arrange the fsyncs so that you write one file, then fsync it, then >>

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Heikki Linnakangas writes: > IMHO we should re-consider the patch to sort the writes. Not so much > because of the performance gain that gives, but because we can then > re-arrange the fsyncs so that you write one file, then fsync it, then > write the next file and so on. Isn't that going to m

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Heikki Linnakangas
On 31.01.2011 16:44, Robert Haas wrote: On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro wrote: On Mon, Jan 31, 2011 at 13:41, Robert Haas wrote: 1. Absorb fsync requests a lot more often during the sync phase. 2. Still try to run the cleaning scan during the sync phase. 3. Pause for 3 seco

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro wrote: > On Mon, Jan 31, 2011 at 13:41, Robert Haas wrote: >> 1. Absorb fsync requests a lot more often during the sync phase. >> 2. Still try to run the cleaning scan during the sync phase. >> 3. Pause for 3 seconds after every fsync. >> >> So if

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 13:41, Robert Haas wrote: > 1. Absorb fsync requests a lot more often during the sync phase. > 2. Still try to run the cleaning scan during the sync phase. > 3. Pause for 3 seconds after every fsync. > > So if we want the checkpoint > to finish in, say, 20 minutes, we can't

Re: [HACKERS] Spread checkpoint sync

2011-01-30 Thread Robert Haas
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote: > I've attached an updated version of the initial sync spreading patch here, > one that applies cleanly on top of HEAD and over top of the sync > instrumentation patch too.  The conflict that made that hard before is gone > now. With the fsync qu

Re: [HACKERS] Spread checkpoint sync

2011-01-29 Thread Robert Haas
On Fri, Jan 28, 2011 at 12:53 AM, Greg Smith wrote: > Where there are still very ugly maximum latency figures here in every case, > these periods just aren't as wide with the patch in place. OK, committed the patch, with some additional commenting, and after fixing the compiler warning Chris Brow

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Greg Smith
Robert Haas wrote: During each cluster, the system probably slows way down, and then recovers when the queue is emptied. So the TPS improvement isn't at all a uniform speedup, but simply relief from the stall that would otherwise result from a full queue. That does seem to be the case here.

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Greg Smith
Robert Haas wrote: Based on what I saw looking at this, I'm thinking that the backend fsyncs probably happen in clusters - IOW, it's not 2504 backend fsyncs spread uniformly throughout the test, but clusters of 100 or more that happen in very quick succession, followed by relief when the backgrou

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Robert Haas
On Thu, Jan 27, 2011 at 12:18 PM, Greg Smith wrote: > Greg Smith wrote: >> >> I think a helpful next step here would be to put Robert's fsync compaction >> patch into here and see if that helps.  There are enough backend syncs >> showing up in the difficult workloads (scale>=1000, clients >=32) th

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Greg Smith
Greg Smith wrote: I think a helpful next step here would be to put Robert's fsync compaction patch into here and see if that helps. There are enough backend syncs showing up in the difficult workloads (scale>=1000, clients >=32) that its impact should be obvious. Initial tests show everythin

Re: [HACKERS] Spread checkpoint sync

2011-01-18 Thread Josh Berkus
> To be frank, I really don't care about fixing this behavior on ext3, > especially in the context of that sort of hack. That filesystem is not > the future, it's not possible to ever really make it work right, and > every minute spent on pandering to its limitations would be better spent > elsew

Re: [HACKERS] Spread checkpoint sync

2011-01-18 Thread Greg Smith
Robert Haas wrote: Idea #4: For ext3 filesystems that like to dump the entire buffer cache instead of only the requested file, write a little daemon that runs alongside of (and completely independently of) PostgreSQL. Every 30 s, it opens a 1-byte file, changes the byte, fsyncs the file, and close
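
For readers who want to picture the "metronome" daemon from Idea #4, a minimal sketch follows. It assumes a Unix-like system and a file on the same filesystem as the data directory; the function names and the `interval_s` parameter are hypothetical, and nothing like this was committed as a result of the thread.

```python
import os
import time


def metronome_tick(path):
    """Open the 1-byte file, change the byte, fsync it, close it.

    Returns the byte value that was written.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        old = os.read(fd, 1)                  # b"" if the file is still empty
        new = b"\x00" if old == b"\x01" else b"\x01"
        os.lseek(fd, 0, os.SEEK_SET)          # rewrite the same single byte
        os.write(fd, new)
        os.fsync(fd)                          # the fsync is the whole point
    finally:
        os.close(fd)
    return new


def run_metronome(path, interval_s=30.0, ticks=None):
    """Call metronome_tick every interval_s seconds (forever if ticks is None)."""
    done = 0
    while ticks is None or done < ticks:
        metronome_tick(path)
        done += 1
        if ticks is None or done < ticks:
            time.sleep(interval_s)
```

Running `run_metronome("/path/on/the/same/ext3/volume/metronome", 30.0)` would match the 30-second cadence in the quoted idea; the path shown is a placeholder.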

Re: [HACKERS] Spread checkpoint sync

2011-01-18 Thread Cédric Villemain
2011/1/18 Greg Smith : > Bruce Momjian wrote: >> >> Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00? >> > > The idea of having a dead period doing no work at all between write phase > and sync phase may have some merit.  I don't have enough test data yet on > some more funda

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Greg Smith
Bruce Momjian wrote: Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00? The idea of having a dead period doing no work at all between write phase and sync phase may have some merit. I don't have enough test data yet on some more fundamental issues in this area to com

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Greg Smith
Jim Nasby wrote: Wow, that's the kind of thing that would be incredibly difficult to figure out, especially while your production system is in flames... Can we change ereport that happens in that case from DEBUG1 to WARNING? Or provide some other means to track it That's why we already added

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Robert Haas
On Mon, Jan 17, 2011 at 6:07 PM, Jim Nasby wrote: > On Jan 15, 2011, at 8:15 AM, Robert Haas wrote: >> Well, the point of this is not to save time in the bgwriter - I'm not >> surprised to hear that wasn't noticeable.  The point is that when the >> fsync request queue fills up, backends start perf

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Jim Nasby
On Jan 15, 2011, at 8:15 AM, Robert Haas wrote: > Well, the point of this is not to save time in the bgwriter - I'm not > surprised to hear that wasn't noticeable. The point is that when the > fsync request queue fills up, backends start performing an fsync *for > every block they write*, and that

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Greg Smith
Jeff Janes wrote: Have you ever tested Robert's other idea of having a metronome process do a periodic fsync on a dummy file which is located on the same ext3fs as the table files? I think that that would be interesting to see. To be frank, I really don't care about fixing this behavior on

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Jeff Janes
On Sun, Jan 16, 2011 at 7:13 PM, Greg Smith wrote: > I have finished a first run of benchmarking the current 9.1 code at various > sizes.  See http://www.2ndquadrant.us/pgbench-results/index.htm for many > details.  The interesting stuff is in Test Set 3, near the bottom.  That's > the first one t

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Bruce Momjian
Greg Smith wrote: > One of the components to the write queue is some notion that writes that > have been waiting longest should eventually be flushed out. Linux has > this number called dirty_expire_centiseconds which suggests it enforces > just that, set to a default of 30 seconds. This is wh

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Robert Haas
On Sun, Jan 16, 2011 at 10:13 PM, Greg Smith wrote: > I have finished a first run of benchmarking the current 9.1 code at various > sizes.  See http://www.2ndquadrant.us/pgbench-results/index.htm for many > details.  The interesting stuff is in Test Set 3, near the bottom.  That's > the first one

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Greg Smith
I have finished a first run of benchmarking the current 9.1 code at various sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm for many details. The interesting stuff is in Test Set 3, near the bottom. That's the first one that includes buffers_backend_fsync data. This is all on ex

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Robert Haas
On Sun, Jan 16, 2011 at 7:32 PM, Jeff Janes wrote: > But since you already wrote a patch to do the whole thing, I figured > I'd time it. Thanks! > I arranged to test an instrumented version of your patch under large > shared_buffers of 4GB, conditions that would maximize the opportunity > for it

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Jeff Janes
On Tue, Jan 11, 2011 at 5:27 PM, Robert Haas wrote: > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote: >> One of the ideas Simon and I had been considering at one point was adding >> some better de-duplication logic to the fsync absorb code, which I'm >> reminded by the pattern here might be he

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: What is the basis for thinking that the sync should get the same amount of time as the writes? That seems pretty arbitrary. Right now, you're allowing 3 seconds per fsync, which could be a lot more or a lot less than 40% of the total checkpoint time... Just that it's where

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 5:57 PM, Greg Smith wrote: > I was just giving an example of how I might do an initial split.  There's a > checkpoint happening now at time T; we have a rough idea that it needs to be > finished before some upcoming time T+D.  Currently with default parameters > this become

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Marti Raudsepp
On Sat, Jan 15, 2011 at 14:05, Robert Haas wrote: > Idea #4: For ext3 filesystems that like to dump the entire buffer > cache instead of only the requested file, write a little daemon that > runs alongside of (and completely independently of) PostgreSQL.  Every > 30 s, it opens a 1-byte file, change

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: That seems like a bad idea - don't we routinely recommend that people crank this up to 0.9? You'd be effectively bounding the upper range of this setting to a value to the less than the lowest value we recommend anyone use today. I was just giving an example of how I migh

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 10:31 AM, Greg Smith wrote: > That's going to give worse performance than the current code in some cases. OK. >> How does the checkpoint target give you any time to sync them?  Unless >> you squeeze the writes together more tightly, but that seems sketchy. > > Obviously t

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Simon Riggs
On Sat, 2011-01-15 at 09:15 -0500, Robert Haas wrote: > On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs wrote: > > On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote: > >> Robert Haas wrote: > >> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote: > >> > > >> > > One of the ideas Simon and I had b

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: I'll believe it when I see it. How about this: a 1 a 2 sync a b 1 b 2 sync b c 1 c 2 sync c Or maybe some variant, where we become willing to fsync a file a certain number of seconds after writing the last block, or when all the writes are done, whichever comes first. That
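
The ordering Robert Haas writes out above ("a 1, a 2, sync a, b 1, b 2, sync b, ...") amounts to sorting the dirty pages by file and syncing each file right after its last block is written. A small sketch of that schedule, with made-up names and no real I/O, purely to make the ordering concrete:

```python
from itertools import groupby


def interleaved_schedule(dirty_pages):
    """Return checkpoint operations with each file synced right after its writes.

    dirty_pages is an iterable of (file_name, block_number) pairs in any order.
    """
    ops = []
    ordered = sorted(dirty_pages)  # sort by file, then by block within the file
    for file_name, pages in groupby(ordered, key=lambda page: page[0]):
        for _, block in pages:
            ops.append(("write", file_name, block))
        ops.append(("sync", file_name))  # sync once this file's writes are done
    return ops


schedule = interleaved_schedule(
    [("b", 2), ("a", 1), ("c", 1), ("a", 2), ("b", 1), ("c", 2)]
)
```

The variant mentioned in the same message, syncing a file some number of seconds after its last write, would need a timer per file and is not modeled here.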

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 9:25 AM, Greg Smith wrote: > Once upon a time we got a patch from Itagaki Takahiro whose purpose was to > sort writes before sending them out: > > http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php Ah, a fine idea! > Which has very low odds of the sync on "

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: Idea #2: At the beginning of a checkpoint when we scan all the buffers, count the number of buffers that need to be synced for each relation. Use the same hashtable that we use for tracking pending fsync requests. Then, interleave the writes and the fsyncs... Idea #3: Stick

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs wrote: > On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote: >> Robert Haas wrote: >> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote: >> > >> > > One of the ideas Simon and I had been considering at one point was adding >> > > some better de-dupl

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Simon Riggs
On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote: > Robert Haas wrote: > > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote: > > > > > One of the ideas Simon and I had been considering at one point was adding > > > some better de-duplication logic to the fsync absorb code, which I'm > > >

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 5:47 AM, Greg Smith wrote: > No toe damage, this is great, I hadn't gotten to coding for this angle yet > at all.  Suffering from an overload of ideas and (mostly wasted) test data, > so thanks for exploring this concept and proving it works. Yeah - obviously I want to mak

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote: One of the ideas Simon and I had been considering at one point was adding some better de-duplication logic to the fsync absorb code, which I'm reminded by the pattern here might be helpful independently of other improvemen

Re: [HACKERS] Spread checkpoint sync

2011-01-11 Thread Robert Haas
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote: > Having the pg_stat_bgwriter.buffers_backend_fsync patch available all the > time now has made me reconsider how important one potential bit of > refactoring here would be.  I managed to catch one of the situations where > really popular relation

Re: [HACKERS] Spread checkpoint sync

2010-12-08 Thread Simon Riggs
On Mon, 2010-12-06 at 23:26 -0300, Alvaro Herrera wrote: > Why would multiple bgwriter processes worry you? Because it complicates the tracking of files requiring fsync. As Greg says, the last attempt to do that was a lot of code. -- Simon Riggs http://www.2ndQuadrant.com/books/ Po

Re: [HACKERS] Spread checkpoint sync

2010-12-07 Thread Greg Smith
Alvaro Herrera wrote: Why would multiple bgwriter processes worry you? Of course, it wouldn't work to have multiple processes trying to execute a checkpoint simultaneously, but what if we separated the tasks so that one process is in charge of checkpoints, and another one is in charge of the LRU

Re: [HACKERS] Spread checkpoint sync

2010-12-06 Thread Alvaro Herrera
Excerpts from Greg Smith's message of Sun Dec 05 20:02:48 -0300 2010: > What ends up happening if you push toward fully sync I/O is the design > you see in some other databases, where you need multiple writer > processes. Then requests for new pages can continue to allocate as > needed, while

Re: [HACKERS] Spread checkpoint sync

2010-12-05 Thread Greg Smith
Rob Wultsch wrote: Forgive me, but is all of this a step on the slippery slope to direct io? And is this a bad thing I don't really think so. There's an important difference in my head between direct I/O, where the kernel is told "write this immediately!", and what I'm trying to achieve. I w

Re: [HACKERS] Spread checkpoint sync

2010-12-05 Thread Rob Wultsch
On Sun, Dec 5, 2010 at 2:53 PM, Greg Smith wrote: > Heikki Linnakangas wrote: >> >> If you fsync() a file with one dirty page in it, it's going to return very >> quickly, but a 1GB file will take a while. That could be problematic if you >> have a thousand small files and a couple of big ones, as

Re: [HACKERS] Spread checkpoint sync

2010-12-05 Thread Greg Smith
Heikki Linnakangas wrote: If you fsync() a file with one dirty page in it, it's going to return very quickly, but a 1GB file will take a while. That could be problematic if you have a thousand small files and a couple of big ones, as you would want to reserve more time for the big ones. I'm not

Re: [HACKERS] Spread checkpoint sync

2010-12-04 Thread Greg Smith
Greg Stark wrote: Using sync_file_range you can specify the set of blocks to sync and then block on them only after some time has passed. But there's no documentation on how this relates to the I/O scheduler so it's not clear it would have any effect on the problem. I believe this is the exact

Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 2:24 PM, Greg Stark wrote: > On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith wrote: >>> I ask because I don't have a mental model of how the pause can help. >>> Given that this dirty data has been hanging around for many minutes >>> already, what is a 3 second pause going to hea

Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Josh Berkus
> Using sync_file_range you can specify the set of blocks to sync and > then block on them only after some time has passed. But there's no > documentation on how this relates to the I/O scheduler so it's not > clear it would have any effect on the problem. We might still have to > delay the begini

Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Greg Stark
On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith wrote: >> I ask because I don't have a mental model of how the pause can help. >> Given that this dirty data has been hanging around for many minutes >> already, what is a 3 second pause going to heal? >> > > The difference is that once an fsync call is m

Re: [HACKERS] Spread checkpoint sync

2010-12-01 Thread Heikki Linnakangas
On 01.12.2010 23:30, Greg Smith wrote: Heikki Linnakangas wrote: Do you have any idea how to autotune the delay between fsyncs? I'm thinking to start by counting the number of relations that need them at the beginning of the checkpoint. Then use the same basic math that drives the spread write

Re: [HACKERS] Spread checkpoint sync

2010-12-01 Thread Greg Smith
Heikki Linnakangas wrote: Do you have any idea how to autotune the delay between fsyncs? I'm thinking to start by counting the number of relations that need them at the beginning of the checkpoint. Then use the same basic math that drives the spread writes, where you assess whether you're on

Re: [HACKERS] Spread checkpoint sync

2010-12-01 Thread Heikki Linnakangas
On 01.12.2010 06:25, Greg Smith wrote: Jeff Janes wrote: I ask because I don't have a mental model of how the pause can help. Given that this dirty data has been hanging around for many minutes already, what is a 3 second pause going to heal? The difference is that once an fsync call is made,

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Greg Smith
Jeff Janes wrote: Have you tested out this "absorb during syncing phase" code without the sleep between the syncs? I.e. so that it is still a tight loop, but the loop alternates between sync and absorb, with no intentional pause? Yes; that's how it was developed. It helped to have just the ext

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Jeff Janes
On Sun, Nov 14, 2010 at 3:48 PM, Greg Smith wrote: ... > One change that turned out to be necessary rather than optional--to get good > performance from the system under tuning--was to make regular background > writer activity, including fsync absorb checks, happen during these sync > pauses.  The

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Josh Berkus
> Maybe, but it's hard to argue that the current implementation--just > doing all of the sync calls as fast as possible, one after the other--is > going to produce worst-case behavior in a lot of situations. Given that > it's not a huge amount of code to do better, I'd rather do some work in > th

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Greg Smith
Ron Mayer wrote: Might smoother checkpoints be better solved by talking to the OS vendors & virtual-memory-tuning-knob-authors to work with them on exposing the ideal knobs; rather than saying that our only tool is a hammer(fsync) so the problem must be handled as a nail. Maybe, but it's ha

Re: [HACKERS] Spread checkpoint sync

2010-11-26 Thread Ron Mayer
Josh Berkus wrote: > On 11/20/10 6:11 PM, Jeff Janes wrote: >> True, but I think that changing these from their defaults is not >> considered to be a dark art reserved for kernel hackers, i.e they are >> something that sysadmins are expected to tweak to suit their work >> load, just like the shmma

Re: [HACKERS] Spread checkpoint sync

2010-11-23 Thread Cédric Villemain
2010/11/21 Andres Freund : > On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote: >> For a similar problem we had (kernel buffering too much) we had success >> using the fadvise and madvise DONTNEED syscalls to force the data to >> exit the cache much sooner than it would otherwise. Th

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 4:54 PM, Greg Smith wrote: > Let me throw some numbers out [...] Interesting. > Ultimately what I want to do here is some sort of smarter write-behind sync > operation, perhaps with an LRU on relations with pending fsync requests.  The > idea would be to sync relations tha

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Josh Berkus
On 11/20/10 6:11 PM, Jeff Janes wrote: > True, but I think that changing these from their defaults is not > considered to be a dark art reserved for kernel hackers, i.e. they are > something that sysadmins are expected to tweak to suit their work > load, just like the shmmax and such. I disagree.

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Andres Freund
On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote: > For a similar problem we had (kernel buffering too much) we had success > using the fadvise and madvise DONTNEED syscalls to force the data to > exit the cache much sooner than it would otherwise. This was on Linux > and it had the

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Martijn van Oosterhout
On Sun, Nov 21, 2010 at 04:54:00PM -0500, Greg Smith wrote: > Ultimately what I want to do here is some sort of smarter write-behind > sync operation, perhaps with an LRU on relations with pending fsync > requests. The idea would be to sync relations that haven't been touched > in a while in

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Greg Smith
Robert Haas wrote: Doing all the writes and then all the fsyncs meets this requirement trivially, but I'm not so sure that's a good idea. For example, given files F1 ... Fn with dirty pages needing checkpoint writes, we could do the following: first, do any pending fsyncs for files not among F1

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Greg Smith
Jeff Janes wrote: And for very large memory systems, even 1% may be too much to cache (dirty*_ratio can only be set in integer percentage points), so recent kernels introduced dirty*_bytes parameters. I like these better because they do what they say. With the dirty*_ratio, I could never figure ou

Re: [HACKERS] Spread checkpoint sync

2010-11-20 Thread Jeff Janes
On Sat, Nov 20, 2010 at 5:17 PM, Robert Haas wrote: > On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes wrote: >>> Doing all the writes and then all the fsyncs meets this requirement >>> trivially, but I'm not so sure that's a good idea.  For example, given >>> files F1 ... Fn with dirty pages needing

Re: [HACKERS] Spread checkpoint sync

2010-11-20 Thread Robert Haas
On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes wrote: >>> The thing to realize >>> that complicates the design is that the actual sync execution may take a >>> considerable period of time.  It's much more likely for that to happen than >>> in the case of an individual write, as the current spread che

Re: [HACKERS] Spread checkpoint sync

2010-11-20 Thread Jeff Janes
On Mon, Nov 15, 2010 at 6:15 PM, Robert Haas wrote: > On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith wrote: >> The second issue is that the delay between sync calls is currently >> hard-coded, at 3 seconds.  I believe the right path here is to consider the >> current checkpoint_completion_target to

Re: [HACKERS] Spread checkpoint sync

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith wrote: > The second issue is that the delay between sync calls is currently > hard-coded, at 3 seconds.  I believe the right path here is to consider the > current checkpoint_completion_target to still be valid, then work back from > there.  That raises

[HACKERS] Spread checkpoint sync

2010-11-14 Thread Greg Smith
Final patch in this series for today spreads out the individual checkpoint fsync calls over time, and was written by myself and Simon Riggs. Patch is based against a system that's already had the two patches I sent over earlier today applied, rather than HEAD, as both are useful for measuring