On Thu, Feb 10, 2011 at 10:30 PM, Greg Smith wrote:
> 3) The existing write spreading code in the background writer needs to be
> overhauled, too, before spreading the syncs around is going to give the
> benefits I was hoping for.
I've been thinking about this problem a bit. It strikes me that t
Looks like it's time to close the book on this one for 9.1
development...the unfortunate results are at
http://www.2ndquadrant.us/pgbench-results/index.htm Test set #12 is the
one with spread sync I was hoping would turn out better than #9, the
reference I was trying to improve on. TPS is abo
Kevin Grittner wrote:
There are occasional posts from those wondering why their read-only
queries are so slow after a bulk load, and why they are doing heavy
writes. (I remember when I posted about that, as a relative newbie,
and I know I've seen others.)
Sure; I created http://wiki.postgre
Greg Smith wrote:
> As a larger statement on this topic, I'm never very excited about
> redesigning here starting from any point other than "saw a
> bottleneck doing on a production system". There's a long list
> of such things already around waiting to be addressed, and I've
> never seen any g
Cédric Villemain wrote:
Is it worth starting a new thread on the different I/O improvements done so
far or ongoing, and how we might add new GUCs (if required!) with some
intelligence between those patches? (For instance, the hint bit I/O limit
probably needs a tunable to define something similar to
hint_write_co
2011/2/7 Greg Smith :
> Robert Haas wrote:
>>
>> With the fsync queue compaction patch applied, I think most of this is
>> now not needed. Attached please find an attempt to isolate the
>> portion that looks like it might still be useful. The basic idea of
>> what remains here is to make the back
Robert Haas wrote:
With the fsync queue compaction patch applied, I think most of this is
now not needed. Attached please find an attempt to isolate the
portion that looks like it might still be useful. The basic idea of
what remains here is to make the background writer still do its normal
stu
On Fri, Feb 4, 2011 at 2:08 PM, Greg Smith wrote:
> -The total number of buffers I'm computing based on the checkpoint writes
> being sorted is not a perfect match to the number reported by the
> "checkpoint complete" status line. Sometimes they are the same, sometimes
> not. Not sure why yet.
As already mentioned in the broader discussion at
http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com
, I'm seeing no solid performance swing in the checkpoint sorting code
itself. Better sometimes, worse others, but never by a large amount.
Here's what the statistics p
Michael Banck wrote:
On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote:
For example, the pre-release Squeeze numbers we're seeing are awful so
far, but it's not really done yet either.
Unfortunately, it does not look like Debian squeeze will change any more
(or has changed mu
On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote:
> For example, the pre-release Squeeze numbers we're seeing are awful so
> far, but it's not really done yet either.
Unfortunately, it does not look like Debian squeeze will change any more
(or has changed much since your post) at this p
Tom Lane wrote:
> Bruce Momjian writes:
> > My trivial idea was: let's assume we checkpoint every 10 minutes, and
> > it takes 5 minutes for us to write the data to the kernel. If no one
> > else is writing to those files, we can safely wait maybe 5 more minutes
> > before issuing the fsync. I
Bruce Momjian writes:
> My trivial idea was: let's assume we checkpoint every 10 minutes, and
> it takes 5 minutes for us to write the data to the kernel. If no one
> else is writing to those files, we can safely wait maybe 5 more minutes
> before issuing the fsync. If, however, hundreds of wr
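A minimal sketch of that deferral rule, assuming the truncated sentence continues along the lines of "if many writes are still hitting the file, sync it sooner rather than later"; the struct and helper below are invented for illustration and are not PostgreSQL code:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct
{
    time_t last_backend_write;    /* last time any backend wrote this file */
    time_t checkpoint_write_done; /* when the checkpointer finished its writes */
} FileSyncState;

static bool
ready_to_fsync(const FileSyncState *f, time_t now, int grace_seconds)
{
    /* Backends are still dirtying the file: don't defer the sync. */
    if (f->last_backend_write > f->checkpoint_write_done)
        return true;

    /* File is quiet: give the kernel's own writeback a grace period
     * before forcing the issue with fsync. */
    return (now - f->checkpoint_write_done) >= grace_seconds;
}

int
main(void)
{
    time_t now = time(NULL);
    FileSyncState quiet = { now - 400, now - 120 }; /* untouched since checkpoint wrote it */
    FileSyncState busy  = { now - 10,  now - 120 }; /* still being written by backends */

    printf("quiet file: %s\n", ready_to_fsync(&quiet, now, 300) ? "fsync now" : "wait");
    printf("busy file:  %s\n", ready_to_fsync(&busy, now, 300) ? "fsync now" : "wait");
    return 0;
}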
Kevin Grittner wrote:
> Robert Haas wrote:
>
> > I also think Bruce's idea of calling fsync() on each relation just
> > *before* we start writing the pages from that relation might have
> > some merit.
>
> What bothers me about that is that you may have a lot of the same
> dirty pages in the O
On Tue, Feb 1, 2011 at 12:58 PM, Kevin Grittner wrote:
> Robert Haas wrote:
>
>> I also think Bruce's idea of calling fsync() on each relation just
>> *before* we start writing the pages from that relation might have
>> some merit.
>
> What bothers me about that is that you may have a lot of the
Greg Smith wrote:
> Greg Smith wrote:
> > I think the right way to compute "relations to sync" is to finish the
> > sorted writes patch I sent over a not quite right yet update to already
>
> Attached update now makes much more sense than the misguided patch I
> submitted two weeks ago. This ta
Robert Haas wrote:
> Back to your idea: One problem with trying to bound the unflushed data
> is that it's not clear what the bound should be. I've had this mental
> model where we want the OS to write out pages to disk, but that's not
> always true, per Greg Smith's recent posts about Linux kerne
Robert Haas wrote:
> I also think Bruce's idea of calling fsync() on each relation just
> *before* we start writing the pages from that relation might have
> some merit.
What bothers me about that is that you may have a lot of the same
dirty pages in the OS cache as the PostgreSQL cache, and y
On Mon, Jan 31, 2011 at 4:28 PM, Tom Lane wrote:
> Robert Haas writes:
>> Back to the idea at hand - I proposed something a bit along these
>> lines upthread, but my idea was to proactively perform the fsyncs on
>> the relations that had gone the longest without a write, rather than
>> the ones w
Greg Smith wrote:
I think the right way to compute "relations to sync" is to finish the
sorted writes patch I sent over a not quite right yet update to already
Attached update now makes much more sense than the misguided patch I
submitted two weeks ago. This takes the original sorted write co
Tom Lane wrote:
Robert Haas writes:
3. Pause for 3 seconds after every fsync.
I think something along the lines of #3 is probably a good idea,
Really? Any particular delay is guaranteed wrong.
'3 seconds' is just a placeholder for whatever comes out of a "total
time
Robert Haas writes:
> Back to the idea at hand - I proposed something a bit along these
> lines upthread, but my idea was to proactively perform the fsyncs on
> the relations that had gone the longest without a write, rather than
> the ones with the most dirty data.
Yeah. What I meant to suggest
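A toy illustration of that ordering: sort the files with pending fsync requests so the ones that have gone longest without a write get synced first, on the theory that the kernel has probably finished writing their data back already. The PendingFsync structure, the helper, and the sample paths are all invented for the sketch:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct
{
    const char *path;
    time_t      last_write;     /* most recent write seen for this file */
} PendingFsync;

static int
by_oldest_write(const void *a, const void *b)
{
    const PendingFsync *pa = a;
    const PendingFsync *pb = b;

    if (pa->last_write < pb->last_write) return -1;
    if (pa->last_write > pb->last_write) return 1;
    return 0;
}

int
main(void)
{
    time_t now = time(NULL);
    PendingFsync pending[] = {
        { "base/1/16384", now - 5   },  /* written moments ago */
        { "base/1/16390", now - 240 },  /* idle for four minutes */
        { "base/1/16402", now - 60  },
    };
    int n = sizeof(pending) / sizeof(pending[0]);

    qsort(pending, n, sizeof(PendingFsync), by_oldest_write);

    for (int i = 0; i < n; i++)
        printf("fsync %s (idle for %ld s)\n", pending[i].path,
               (long) (now - pending[i].last_write));
    return 0;
}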
Tom Lane wrote:
I wonder whether it'd be useful to keep track of the total amount of
data written-and-not-yet-synced, and to issue fsyncs often enough to
keep that below some parameter; the idea being that the parameter would
limit how much dirty kernel disk cache there is. Of course, ideally th
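One way to read that idea, as a rough sketch: keep a running total of bytes handed to the kernel but not yet synced, and force a sync pass whenever it crosses a ceiling. The names and the 256 MB limit below are assumptions for illustration, not anything proposed in the thread:

#include <stdint.h>
#include <stdio.h>

#define UNSYNCED_LIMIT (256 * 1024 * 1024)  /* illustrative ceiling only */

static uint64_t unsynced_bytes = 0;
static int      sync_passes = 0;

/* Stand-in for issuing fsyncs on the files written so far. */
static void
sync_pending_files(void)
{
    sync_passes++;
    unsynced_bytes = 0;
}

/* Called for every block the checkpoint hands to the kernel. */
static void
note_checkpoint_write(size_t nbytes)
{
    unsynced_bytes += nbytes;
    if (unsynced_bytes >= UNSYNCED_LIMIT)
        sync_pending_files();
}

int
main(void)
{
    /* Simulate a checkpoint writing 800,000 8 kB blocks (about 6 GB). */
    for (int i = 0; i < 800000; i++)
        note_checkpoint_write(8192);

    printf("sync passes forced by the limit: %d\n", sync_passes);
    return 0;
}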
Robert Haas wrote:
> Back to the idea at hand - I proposed something a bit along these
> lines upthread, but my idea was to proactively perform the fsyncs on
> the relations that had gone the longest without a write, rather than
> the ones with the most dirty data. I'm not sure which is better.
>
On Mon, Jan 31, 2011 at 12:11 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane wrote:
>>> I wonder whether it'd be useful to keep track of the total amount of
>>> data written-and-not-yet-synced, and to issue fsyncs often enough to
>>> keep that below some
Robert Haas writes:
> On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane wrote:
>> I wonder whether it'd be useful to keep track of the total amount of
>> data written-and-not-yet-synced, and to issue fsyncs often enough to
>> keep that below some parameter; the idea being that the parameter would
>> lim
On Mon, Jan 31, 2011 at 12:01 PM, Tom Lane wrote:
> Robert Haas writes:
>> 3. Pause for 3 seconds after every fsync.
>
>> I think something along the lines of #3 is probably a good idea,
>
> Really? Any particular delay is guaranteed wrong.
What I was getting at was - I think it's probably a go
Robert Haas writes:
> 3. Pause for 3 seconds after every fsync.
> I think something along the lines of #3 is probably a good idea,
Really? Any particular delay is guaranteed wrong.
regards, tom lane
On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane wrote:
>>> That sounds like you have an entirely wrong mental model of where the
>>> cost comes from. Those times are not independent.
>
>> Yeah, Greg Smith made the same point
Robert Haas writes:
> On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane wrote:
>> That sounds like you have an entirely wrong mental model of where the
>> cost comes from. Those times are not independent.
> Yeah, Greg Smith made the same point a week or three ago. But it
> seems to me that there is p
On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane wrote:
> Heikki Linnakangas writes:
>> IMHO we should re-consider the patch to sort the writes. Not so much
>> because of the performance gain that gives, but because we can then
>> re-arrange the fsyncs so that you write one file, then fsync it, then
>>
Heikki Linnakangas writes:
> IMHO we should re-consider the patch to sort the writes. Not so much
> because of the performance gain that gives, but because we can then
> re-arrange the fsyncs so that you write one file, then fsync it, then
> write the next file and so on.
Isn't that going to m
On 31.01.2011 16:44, Robert Haas wrote:
On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro wrote:
On Mon, Jan 31, 2011 at 13:41, Robert Haas wrote:
1. Absorb fsync requests a lot more often during the sync phase.
2. Still try to run the cleaning scan during the sync phase.
3. Pause for 3 seco
On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro wrote:
> On Mon, Jan 31, 2011 at 13:41, Robert Haas wrote:
>> 1. Absorb fsync requests a lot more often during the sync phase.
>> 2. Still try to run the cleaning scan during the sync phase.
>> 3. Pause for 3 seconds after every fsync.
>>
>> So if
On Mon, Jan 31, 2011 at 13:41, Robert Haas wrote:
> 1. Absorb fsync requests a lot more often during the sync phase.
> 2. Still try to run the cleaning scan during the sync phase.
> 3. Pause for 3 seconds after every fsync.
>
> So if we want the checkpoint
> to finish in, say, 20 minutes, we can't
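A skeleton of the sync-phase loop those three points describe, with placeholder functions standing in for the real checkpointer and bgwriter work; this illustrates the control flow only and is not PostgreSQL source:

#include <stdio.h>
#include <unistd.h>

typedef struct
{
    const char *path;
} PendingSync;

static void
absorb_fsync_requests(void)
{
    /* drain the shared fsync request queue so backends never find it full */
}

static void
run_cleaning_scan(void)
{
    /* write out buffers ahead of the clock sweep, as the bgwriter normally does */
}

static void
sync_one_file(const PendingSync *p)
{
    printf("fsync %s\n", p->path);
}

static void
checkpoint_sync_phase(const PendingSync *files, int nfiles, int pause_seconds)
{
    for (int i = 0; i < nfiles; i++)
    {
        sync_one_file(&files[i]);

        /* Points 1 and 2: keep servicing backends between syncs. */
        absorb_fsync_requests();
        run_cleaning_scan();

        /* Point 3: the thread argues this pause should be derived from the
         * checkpoint schedule rather than hard-coded at 3 seconds. */
        if (i < nfiles - 1)
            sleep(pause_seconds);
    }
}

int
main(void)
{
    PendingSync files[] = { { "base/16384/16385" }, { "base/16384/16386" } };

    checkpoint_sync_phase(files, 2, 3);
    return 0;
}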
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote:
> I've attached an updated version of the initial sync spreading patch here,
> one that applies cleanly on top of HEAD and over top of the sync
> instrumentation patch too. The conflict that made that hard before is gone
> now.
With the fsync qu
On Fri, Jan 28, 2011 at 12:53 AM, Greg Smith wrote:
> While there are still very ugly maximum latency figures here in every case,
> these periods just aren't as wide with the patch in place.
OK, committed the patch, with some additional commenting, and after
fixing the compiler warning Chris Brow
Robert Haas wrote:
During each cluster, the system probably slows way down, and then recovers when
the queue is emptied. So the TPS improvement isn't at all a uniform
speedup, but simply relief from the stall that would otherwise result
from a full queue.
That does seem to be the case here.
Robert Haas wrote:
Based on what I saw looking at this, I'm thinking that the backend
fsyncs probably happen in clusters - IOW, it's not 2504 backend fsyncs
spread uniformly throughout the test, but clusters of 100 or more that
happen in very quick succession, followed by relief when the
backgrou
On Thu, Jan 27, 2011 at 12:18 PM, Greg Smith wrote:
> Greg Smith wrote:
>>
>> I think a helpful next step here would be to put Robert's fsync compaction
>> patch into here and see if that helps. There are enough backend syncs
>> showing up in the difficult workloads (scale>=1000, clients >=32) th
Greg Smith wrote:
I think a helpful next step here would be to put Robert's fsync
compaction patch into here and see if that helps. There are enough
backend syncs showing up in the difficult workloads (scale>=1000,
clients >=32) that its impact should be obvious.
Initial tests show everythin
> To be frank, I really don't care about fixing this behavior on ext3,
> especially in the context of that sort of hack. That filesystem is not
> the future, it's not possible to ever really make it work right, and
> every minute spent on pandering to its limitations would be better spent
> elsew
Robert Haas wrote:
Idea #4: For ext3 filesystems that like to dump the entire buffer
cache instead of only the requested file, write a little daemon that
runs alongside of (and completely independently of) PostgreSQL. Every
30 s, it opens a 1-byte file, changes the byte, fsyncs the file, and
close
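Idea #4 is simple enough to sketch as a complete stand-alone program; the file path below is a placeholder and would need to live on the same ext3 filesystem as the data directory:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* "Metronome" daemon: every 30 seconds, rewrite and fsync a tiny file so the
 * ext3 journal flush (which drags everyone's dirty data along with it)
 * happens in small, frequent doses instead of one giant burst at checkpoint
 * time. */
int
main(void)
{
    const char *path = "/var/lib/pgsql/metronome.tmp";  /* hypothetical path */

    for (;;)
    {
        int fd = open(path, O_WRONLY | O_CREAT, 0600);

        if (fd < 0)
        {
            perror("open");
            return EXIT_FAILURE;
        }
        if (write(fd, "x", 1) != 1)
            perror("write");
        if (fsync(fd) != 0)
            perror("fsync");
        close(fd);
        sleep(30);
    }
}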
2011/1/18 Greg Smith :
> Bruce Momjian wrote:
>>
>> Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00?
>>
>
> The idea of having a dead period doing no work at all between write phase
> and sync phase may have some merit. I don't have enough test data yet on
> some more funda
Bruce Momjian wrote:
Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00?
The idea of having a dead period doing no work at all between write
phase and sync phase may have some merit. I don't have enough test data
yet on some more fundamental issues in this area to com
Jim Nasby wrote:
Wow, that's the kind of thing that would be incredibly difficult to figure out,
especially while your production system is in flames... Can we change ereport
that happens in that case from DEBUG1 to WARNING? Or provide some other means
to track it
That's why we already added
On Mon, Jan 17, 2011 at 6:07 PM, Jim Nasby wrote:
> On Jan 15, 2011, at 8:15 AM, Robert Haas wrote:
>> Well, the point of this is not to save time in the bgwriter - I'm not
>> surprised to hear that wasn't noticeable. The point is that when the
>> fsync request queue fills up, backends start perf
On Jan 15, 2011, at 8:15 AM, Robert Haas wrote:
> Well, the point of this is not to save time in the bgwriter - I'm not
> surprised to hear that wasn't noticeable. The point is that when the
> fsync request queue fills up, backends start performing an fsync *for
> every block they write*, and that
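A toy version of what the compaction buys: the queue holds one entry per dirtied block, but only the set of distinct files matters, so duplicates can be squeezed out when the queue fills instead of pushing backends into fsync-per-write. Real PostgreSQL keys these requests by relfilenode and segment; a string id stands in for that here:

#include <stdio.h>
#include <string.h>

#define QUEUE_MAX 8

static const char *queue[QUEUE_MAX];
static int         queue_len = 0;

/* Remove duplicate requests in place and return the new length. */
static int
compact_queue(void)
{
    int out = 0;

    for (int i = 0; i < queue_len; i++)
    {
        int seen = 0;

        for (int j = 0; j < out; j++)
            if (strcmp(queue[i], queue[j]) == 0)
            {
                seen = 1;
                break;
            }
        if (!seen)
            queue[out++] = queue[i];
    }
    queue_len = out;
    return out;
}

int
main(void)
{
    const char *reqs[] = { "16384.1", "16384.1", "16390.0", "16384.1",
                           "16390.0", "16402.2", "16384.1", "16390.0" };

    for (int i = 0; i < QUEUE_MAX; i++)
        queue[queue_len++] = reqs[i];

    printf("before compaction: %d entries\n", queue_len);
    printf("after compaction:  %d entries\n", compact_queue());
    return 0;
}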
Jeff Janes wrote:
Have you ever tested Robert's other idea of having a metronome process
do a periodic fsync on a dummy file which is located on the same ext3fs
as the table files? I think that that would be interesting to see.
To be frank, I really don't care about fixing this behavior on
On Sun, Jan 16, 2011 at 7:13 PM, Greg Smith wrote:
> I have finished a first run of benchmarking the current 9.1 code at various
> sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm for many
> details. The interesting stuff is in Test Set 3, near the bottom. That's
> the first one t
Greg Smith wrote:
> One of the components to the write queue is some notion that writes that
> have been waiting longest should eventually be flushed out. Linux has
> this number called dirty_expire_centisecs, which suggests it enforces
> just that, set to a default of 30 seconds. This is wh
On Sun, Jan 16, 2011 at 10:13 PM, Greg Smith wrote:
> I have finished a first run of benchmarking the current 9.1 code at various
> sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm for many
> details. The interesting stuff is in Test Set 3, near the bottom. That's
> the first one
I have finished a first run of benchmarking the current 9.1 code at
various sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm
for many details. The interesting stuff is in Test Set 3, near the
bottom. That's the first one that includes buffer_backend_fsync data.
This is all on ex
On Sun, Jan 16, 2011 at 7:32 PM, Jeff Janes wrote:
> But since you already wrote a patch to do the whole thing, I figured
> I'd time it.
Thanks!
> I arranged to test an instrumented version of your patch under large
> shared_buffers of 4GB, conditions that would maximize the opportunity
> for it
On Tue, Jan 11, 2011 at 5:27 PM, Robert Haas wrote:
> On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote:
>> One of the ideas Simon and I had been considering at one point was adding
>> some better de-duplication logic to the fsync absorb code, which I'm
>> reminded by the pattern here might be he
Robert Haas wrote:
What is the basis for thinking that the sync should get the same
amount of time as the writes? That seems pretty arbitrary. Right
now, you're allowing 3 seconds per fsync, which could be a lot more or
a lot less than 40% of the total checkpoint time...
Just that it's where
On Sat, Jan 15, 2011 at 5:57 PM, Greg Smith wrote:
> I was just giving an example of how I might do an initial split. There's a
> checkpoint happening now at time T; we have a rough idea that it needs to be
> finished before some upcoming time T+D. Currently with default parameters
> this become
On Sat, Jan 15, 2011 at 14:05, Robert Haas wrote:
> Idea #4: For ext3 filesystems that like to dump the entire buffer
> cache instead of only the requested file, write a little daemon that
> runs alongside of (and completely independently of) PostgreSQL. Every
> 30 s, it opens a 1-byte file, change
Robert Haas wrote:
That seems like a bad idea - don't we routinely recommend that people
crank this up to 0.9? You'd be effectively bounding the upper range
of this setting to a value less than the lowest value we
recommend anyone use today.
I was just giving an example of how I migh
On Sat, Jan 15, 2011 at 10:31 AM, Greg Smith wrote:
> That's going to give worse performance than the current code in some cases.
OK.
>> How does the checkpoint target give you any time to sync them? Unless
>> you squeeze the writes together more tightly, but that seems sketchy.
>
> Obviously t
On Sat, 2011-01-15 at 09:15 -0500, Robert Haas wrote:
> On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs wrote:
> > On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
> >> Robert Haas wrote:
> >> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote:
> >> >
> >> > > One of the ideas Simon and I had b
Robert Haas wrote:
I'll believe it when I see it. How about this:
a 1
a 2
sync a
b 1
b 2
sync b
c 1
c 2
sync c
Or maybe some variant, where we become willing to fsync a file a
certain number of seconds after writing the last block, or when all
the writes are done, whichever comes first.
That
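Assuming the writes have first been sorted by file, the schedule quoted above amounts to the following loop; the types and helpers are made up for the illustration:

#include <stdio.h>

typedef struct
{
    const char *file;
    int         nblocks;        /* dirty blocks belonging to this file */
} SortedFile;

static void
write_block(const char *file, int blockno)
{
    printf("write %s block %d\n", file, blockno);
}

static void
sync_file(const char *file)
{
    printf("sync  %s\n", file);
}

static void
checkpoint_sorted(const SortedFile *files, int nfiles)
{
    for (int i = 0; i < nfiles; i++)
    {
        for (int b = 1; b <= files[i].nblocks; b++)
            write_block(files[i].file, b);

        /* This file's dirty data is still fresh in the kernel's queue, so
         * the fsync should not sweep up writes belonging to other files. */
        sync_file(files[i].file);
    }
}

int
main(void)
{
    SortedFile files[] = { { "a", 2 }, { "b", 2 }, { "c", 2 } };

    checkpoint_sorted(files, 3);
    return 0;
}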
On Sat, Jan 15, 2011 at 9:25 AM, Greg Smith wrote:
> Once upon a time we got a patch from Itagaki Takahiro whose purpose was to
> sort writes before sending them out:
>
> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php
Ah, a fine idea!
> Which has very low odds of the sync on "
Robert Haas wrote:
Idea #2: At the beginning of a checkpoint when we scan all the
buffers, count the number of buffers that need to be synced for each
relation. Use the same hashtable that we use for tracking pending
fsync requests. Then, interleave the writes and the fsyncs...
Idea #3: Stick
On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs wrote:
> On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
>> Robert Haas wrote:
>> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote:
>> >
>> > > One of the ideas Simon and I had been considering at one point was adding
>> > > some better de-dupl
On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
> Robert Haas wrote:
> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote:
> >
> > > One of the ideas Simon and I had been considering at one point was adding
> > > some better de-duplication logic to the fsync absorb code, which I'm
> > >
On Sat, Jan 15, 2011 at 5:47 AM, Greg Smith wrote:
> No toe damage, this is great, I hadn't gotten to coding for this angle yet
> at all. Suffering from an overload of ideas and (mostly wasted) test data,
> so thanks for exploring this concept and proving it works.
Yeah - obviously I want to mak
Robert Haas wrote:
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote:
One of the ideas Simon and I had been considering at one point was adding
some better de-duplication logic to the fsync absorb code, which I'm
reminded by the pattern here might be helpful independently of other
improvemen
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith wrote:
> Having the pg_stat_bgwriter.buffers_backend_fsync patch available all the
> time now has made me reconsider how important one potential bit of
> refactoring here would be. I managed to catch one of the situations where
> really popular relation
On Mon, 2010-12-06 at 23:26 -0300, Alvaro Herrera wrote:
> Why would multiple bgwriter processes worry you?
Because it complicates the tracking of files requiring fsync.
As Greg says, the last attempt to do that was a lot of code.
Alvaro Herrera wrote:
Why would multiple bgwriter processes worry you?
Of course, it wouldn't work to have multiple processes trying to execute
a checkpoint simultaneously, but what if we separated the tasks so that
one process is in charge of checkpoints, and another one is in charge of
the LRU
Excerpts from Greg Smith's message of Sun Dec 05 20:02:48 -0300 2010:
> What ends up happening if you push toward fully sync I/O is the design
> you see in some other databases, where you need multiple writer
> processes. Then requests for new pages can continue to allocate as
> needed, while
Rob Wultsch wrote:
Forgive me, but is all of this a step on the slippery slope to
direct io? And is this a bad thing
I don't really think so. There's an important difference in my head
between direct I/O, where the kernel is told "write this immediately!",
and what I'm trying to achieve. I w
On Sun, Dec 5, 2010 at 2:53 PM, Greg Smith wrote:
> Heikki Linnakangas wrote:
>>
>> If you fsync() a file with one dirty page in it, it's going to return very
>> quickly, but a 1GB file will take a while. That could be problematic if you
>> have a thousand small files and a couple of big ones, as
Heikki Linnakangas wrote:
If you fsync() a file with one dirty page in it, it's going to return
very quickly, but a 1GB file will take a while. That could be
problematic if you have a thousand small files and a couple of big
ones, as you would want to reserve more time for the big ones. I'm not
Greg Stark wrote:
Using sync_file_range you can specify the set of blocks to sync and
then block on them only after some time has passed. But there's no
documentation on how this relates to the I/O scheduler so it's not
clear it would have any effect on the problem.
I believe this is the exact
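For reference, the two-step pattern being described looks roughly like this on Linux; how it interacts with the I/O scheduler remains the open question, so this only shows the calls involved:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    char buf[8192];
    int  fd = open("sfr_demo.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
    {
        perror("open");
        return EXIT_FAILURE;
    }
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        perror("write");

    /* Step 1: ask the kernel to start writeback of the range, without waiting. */
    if (sync_file_range(fd, 0, sizeof(buf), SYNC_FILE_RANGE_WRITE) != 0)
        perror("sync_file_range (start)");

    /* ... the checkpointer could go write other files here ... */

    /* Step 2: later, block until that writeback has completed. */
    if (sync_file_range(fd, 0, sizeof(buf),
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) != 0)
        perror("sync_file_range (wait)");

    close(fd);
    unlink("sfr_demo.tmp");
    return 0;
}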
On Thu, Dec 2, 2010 at 2:24 PM, Greg Stark wrote:
> On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith wrote:
>>> I ask because I don't have a mental model of how the pause can help.
>>> Given that this dirty data has been hanging around for many minutes
>>> already, what is a 3 second pause going to hea
> Using sync_file_range you can specify the set of blocks to sync and
> then block on them only after some time has passed. But there's no
> documentation on how this relates to the I/O scheduler so it's not
> clear it would have any effect on the problem. We might still have to
> delay the begini
On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith wrote:
>> I ask because I don't have a mental model of how the pause can help.
>> Given that this dirty data has been hanging around for many minutes
>> already, what is a 3 second pause going to heal?
>>
>
> The difference is that once an fsync call is m
On 01.12.2010 23:30, Greg Smith wrote:
Heikki Linnakangas wrote:
Do you have any idea how to autotune the delay between fsyncs?
I'm thinking to start by counting the number of relations that need them
at the beginning of the checkpoint. Then use the same basic math that
drives the spread write
Heikki Linnakangas wrote:
Do you have any idea how to autotune the delay between fsyncs?
I'm thinking to start by counting the number of relations that need them
at the beginning of the checkpoint. Then use the same basic math that
drives the spread writes, where you assess whether you're on
On 01.12.2010 06:25, Greg Smith wrote:
Jeff Janes wrote:
I ask because I don't have a mental model of how the pause can help.
Given that this dirty data has been hanging around for many minutes
already, what is a 3 second pause going to heal?
The difference is that once an fsync call is made,
Jeff Janes wrote:
Have you tested out this "absorb during syncing phase" code without
the sleep between the syncs?
I.e. so that it still a tight loop, but the loop alternates between
sync and absorb, with no intentional pause?
Yes; that's how it was developed. It helped to have just the ext
On Sun, Nov 14, 2010 at 3:48 PM, Greg Smith wrote:
...
> One change that turned out be necessary rather than optional--to get good
> performance from the system under tuning--was to make regular background
> writer activity, including fsync absorb checks, happen during these sync
> pauses. The
> Maybe, but it's hard to argue that the current implementation--just
> doing all of the sync calls as fast as possible, one after the other--is
> going to produce worst-case behavior in a lot of situations. Given that
> it's not a huge amount of code to do better, I'd rather do some work in
> th
Ron Mayer wrote:
Might smoother checkpoints be better solved by talking
to the OS vendors & virtual-memory-tuning-knob-authors
to work with them on exposing the ideal knobs; rather than
saying that our only tool is a hammer (fsync) so the problem
must be handled as a nail.
Maybe, but it's ha
Josh Berkus wrote:
> On 11/20/10 6:11 PM, Jeff Janes wrote:
>> True, but I think that changing these from their defaults is not
>> considered to be a dark art reserved for kernel hackers, i.e. they are
>> something that sysadmins are expected to tweak to suit their
>> workload, just like the shmma
2010/11/21 Andres Freund :
> On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
>> For a similar problem we had (kernel buffering too much) we had success
>> using the fadvise and madvise WONTNEED syscalls to force the data to
>> exit the cache much sooner than it would otherwise. Th
On Sun, Nov 21, 2010 at 4:54 PM, Greg Smith wrote:
> Let me throw some numbers out [...]
Interesting.
> Ultimately what I want to do here is some sort of smarter write-behind sync
> operation, perhaps with a LRU on relations with pending fsync requests. The
> idea would be to sync relations tha
On 11/20/10 6:11 PM, Jeff Janes wrote:
> True, but I think that changing these from their defaults is not
> considered to be a dark art reserved for kernel hackers, i.e. they are
> something that sysadmins are expected to tweak to suit their
> workload, just like the shmmax and such.
I disagree.
On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
> For a similar problem we had (kernel buffering too much) we had success
> using the fadvise and madvise WONTNEED syscalls to force the data to
> exit the cache much sooner than it would otherwise. This was on Linux
> and it had the
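The mechanism being referred to, in miniature; whether calling this on PostgreSQL data files is wise is exactly what is being debated here, so treat it purely as an illustration of the syscalls:

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    char buf[8192];
    int  fd = open("fadvise_demo.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    int  rc;

    if (fd < 0)
    {
        perror("open");
        return EXIT_FAILURE;
    }
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        perror("write");

    /* The pages must be clean before DONTNEED can actually drop them. */
    if (fdatasync(fd) != 0)
        perror("fdatasync");

    /* Hint that the whole file can be evicted from the page cache. */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    unlink("fadvise_demo.tmp");
    return 0;
}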
On Sun, Nov 21, 2010 at 04:54:00PM -0500, Greg Smith wrote:
> Ultimately what I want to do here is some sort of smarter write-behind
> sync operation, perhaps with a LRU on relations with pending fsync
> requests. The idea would be to sync relations that haven't been touched
> in a while in
Robert Haas wrote:
Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea. For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
Jeff Janes wrote:
And for very large memory
systems, even 1% may be too much to cache (dirty*_ratio can only be
set in integer percent points), so recent kernels introduced
dirty*_bytes parameters. I like these better because they do what
they say. With the dirty*_ratio, I could never figure ou
On Sat, Nov 20, 2010 at 5:17 PM, Robert Haas wrote:
> On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes wrote:
>>> Doing all the writes and then all the fsyncs meets this requirement
>>> trivially, but I'm not so sure that's a good idea. For example, given
>>> files F1 ... Fn with dirty pages needing
On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes wrote:
>>> The thing to realize
>>> that complicates the design is that the actual sync execution may take a
>>> considerable period of time. It's much more likely for that to happen than
>>> in the case of an individual write, as the current spread che
On Mon, Nov 15, 2010 at 6:15 PM, Robert Haas wrote:
> On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith wrote:
>> The second issue is that the delay between sync calls is currently
>> hard-coded, at 3 seconds. I believe the right path here is to consider the
>> current checkpoint_completion_target to
On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith wrote:
> The second issue is that the delay between sync calls is currently
> hard-coded, at 3 seconds. I believe the right path here is to consider the
> current checkpoint_completion_target to still be valid, then work back from
> there. That raises
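As a back-of-the-envelope example of "work back from checkpoint_completion_target": give the sync phase some share of the checkpoint's time budget and divide by the number of files needing fsync. The 40% sync share here is purely an assumed figure for the example, not something settled in the thread:

#include <stdio.h>

int
main(void)
{
    double checkpoint_timeout = 300.0;  /* seconds between checkpoints */
    double completion_target  = 0.5;    /* fraction of the interval to spread over */
    double sync_share         = 0.4;    /* assumed share of the budget for fsyncs */
    int    files_to_sync      = 40;     /* counted at the start of the checkpoint */

    double budget    = checkpoint_timeout * completion_target;
    double sync_time = budget * sync_share;
    double pause     = sync_time / files_to_sync;

    printf("budget %.0f s, sync phase %.0f s, pause between fsyncs %.1f s\n",
           budget, sync_time, pause);
    return 0;
}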
Final patch in this series for today spreads out the individual
checkpoint fsync calls over time, and was written by myself and Simon
Riggs. Patch is based against a system that's already had the two
patches I sent over earlier today applied, rather than HEAD, as both are
useful for measuring