On 01/03/2012 06:22 PM, Jim Nasby wrote:
> On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
>> I think that our current freelist is practically useless, because it
>> is almost always empty, and the cases where it's not empty (startup,
>> and after a table or database drop) are so narrow that we don't really
>> get any benefit out of having it. However, I'm not opposed to the
>> idea of a freelist in general: I think that if we actually put in some
>> effort to keep the freelist in a non-empty state it would help a lot,
>> because backends would then have much less work to do at buffer
>> allocation time.
>
> This is exactly what the FreeBSD VM system does (which is at least one of the
> places where the idea of a clock sweep for PG came from ages ago). There is a
> process that does nothing but attempt to keep X amount of memory on the free
> list, where it can immediately be grabbed by anything that needs memory. Pages
> on the freelist are guaranteed to be clean (as in not dirty), but not zeroed.
> In fact, IIRC if a page on the freelist gets referenced again it can be pulled
> back out of the free list and put back into an active state.
>
> The one downside I see to this is that we'd need some heuristic to determine
> how many buffers we want to keep on the free list.

http://wiki.postgresql.org/wiki/Todo#Background_Writer has "Consider
adding buffers the background writer finds reusable to the free list"
and "Automatically tune bgwriter_delay based on activity rather than
using a fixed interval", which both point to my 8.3-era musings and other
suggestions starting at
<http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php>.
I could write both of those in an afternoon. The auto-tuning code already
in the background writer was originally intended to tackle this issue,
but I dropped that part in favor of shipping something simpler first.
There's even a prototype somewhere on an old drive here.
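
As a rough illustration of the first TODO item, here is a minimal Python sketch of a toy clock-sweep pool, not PostgreSQL's buffer manager. A background pass advances the clock hand until a target number of clean, unreferenced buffers sit on a free list. A backend allocation pops from that list first and only falls back to running the sweep itself when the list is empty. A listed buffer that gets referenced again is simply dropped from the list when it is popped, which is one way to get the "pulled back into an active state" behavior Jim describes. All class and method names, the usage cap, and the two-lap cutoff are hypothetical.

```python
from dataclasses import dataclass, field

MAX_USAGE = 5  # illustrative cap on usage_count in this toy clock sweep


@dataclass
class Buffer:
    """Hypothetical toy buffer header; not PostgreSQL's BufferDesc."""
    usage_count: int = 0    # bumped on access, decremented by the sweep
    dirty: bool = False     # must be written out before the buffer is reused
    on_freelist: bool = False


@dataclass
class BufferPool:
    """Hypothetical toy pool: clock sweep plus a free list of clean buffers."""
    buffers: list
    clock_hand: int = 0
    freelist: list = field(default_factory=list)  # buffer ids, FIFO order

    def touch(self, buf_id):
        """A backend references a buffer, making it active again."""
        b = self.buffers[buf_id]
        b.usage_count = min(b.usage_count + 1, MAX_USAGE)

    def _advance(self):
        buf_id = self.clock_hand
        self.clock_hand = (self.clock_hand + 1) % len(self.buffers)
        return buf_id

    def bgwriter_fill_freelist(self, target):
        """Background writer: sweep until `target` clean buffers are listed.

        Dirty victims are "written" first so only clean buffers get listed.
        Gives up after two full laps of the clock.
        """
        for _ in range(2 * len(self.buffers)):
            if len(self.freelist) >= target:
                break
            buf_id = self._advance()
            b = self.buffers[buf_id]
            if b.on_freelist:
                continue
            if b.usage_count > 0:
                b.usage_count -= 1
                continue
            b.dirty = False          # stand-in for writing the page out
            b.on_freelist = True
            self.freelist.append(buf_id)

    def backend_get_buffer(self):
        """Backend allocation: free list first, clock sweep as fallback."""
        while self.freelist:
            buf_id = self.freelist.pop(0)
            b = self.buffers[buf_id]
            b.on_freelist = False
            if b.usage_count == 0 and not b.dirty:
                return buf_id        # fast path: clean and unreferenced
            # Re-referenced after being listed: leave it in service.
        while True:                  # slow path: the backend sweeps by itself
            buf_id = self._advance()
            b = self.buffers[buf_id]
            if b.usage_count > 0:
                b.usage_count -= 1
                continue
            b.dirty = False          # here the backend pays for the write
            return buf_id
```

When the fast path hits, the backend neither decrements usage counts nor pays for a write. How large `target` should be, the heuristic Jim asks about, is exactly what this sketch leaves as a parameter.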
The first missing piece needed before this was useful was separating out
the background writer and checkpointer processes. Once I realized the
checkpoints were monopolizing so much time, especially when they hit bad
states, it was obvious the writer couldn't be relied upon for this job.
That's much better now since Simon's
806a2aee3791244bf0f916729bfdb5489936e068 "Split work of bgwriter between
2 processes: bgwriter and checkpointer", which just became available in
November to build on.
The second missing piece blocking this work in my mind was how exactly
we're going to benchmark the result, mainly to prove it doesn't hurt
some workloads. I haven't fully internalized the implications of
Robert's upthread comments, in terms of being able to construct a
benchmark stressing both the best and worst case situation here. That's
really the hardest part of this whole thing, by a lot. Recent spending
has brought me an 8 HyperThread core laptop that can also run DTrace, so
I expect to have better visibility into this soon too.
I think here in 2011 the idea of having a background writer process that
could potentially occupy most of a whole core, doing work so that backends
don't have to, is an increasingly attractive one. So long as that comes
along with an auto-tuning delay, it shouldn't hurt the work toward
lower power consumption either. It might even help, by allowing
larger values of bgwriter_delay than you'd want to use during busy
periods. I was planning to mimic the sort of fast attack/slow decay
logic already used for the auto-tuned timing, so that you won't fall
behind by more than one bgwriter_delay worth of activity. At that point
the writer should realize a burst has arrived and start moving faster.
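
The delay auto-tuning could look something like the following minimal Python sketch, which assumes the writer counts buffer allocations since its last wakeup. A rise in demand is adopted immediately (fast attack), while a drop only moves the estimate 1/16 of the way per cycle (slow decay). The next sleep is sized so that expected demand during it roughly matches the free-list target. DelayTuner, the 1/16 factor, the 10 ms and 10 s bounds, and the target/rate sizing rule are all hypothetical choices, not existing bgwriter code.

```python
class DelayTuner:
    """Hypothetical auto-tuner for the background writer's sleep time."""

    def __init__(self, freelist_target, min_ms=10, max_ms=10_000, samples=16):
        self.target = freelist_target  # buffers we want to keep listed
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.samples = samples         # decay spreads over ~this many cycles
        self.smoothed_rate = 0.0       # buffer allocations per ms
        self.delay_ms = max_ms

    def observe(self, allocs_since_wake, slept_ms):
        """Update the demand estimate and return the next sleep in ms.

        slept_ms must be positive (it is the delay actually slept).
        """
        recent = allocs_since_wake / slept_ms
        if recent > self.smoothed_rate:
            self.smoothed_rate = recent  # fast attack: follow bursts at once
        else:                            # slow decay: ease off gradually
            self.smoothed_rate += (recent - self.smoothed_rate) / self.samples
        if self.smoothed_rate <= 0:
            self.delay_ms = self.max_ms  # idle: sleep as long as allowed
        else:
            wanted = self.target / self.smoothed_rate
            self.delay_ms = int(min(self.max_ms, max(self.min_ms, wanted)))
        return self.delay_ms
```

After a burst the delay drops within a single cycle. During a quiet spell it stretches back out gradually instead of jumping straight to the maximum, which is what should let an idle server sleep longer without falling far behind when load returns.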
--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us