On 01/03/2012 06:22 PM, Jim Nasby wrote:
> On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
>> I think that our current freelist is practically useless, because it
>> is almost always empty, and the cases where it's not empty (startup,
>> and after a table or database drop) are so narrow that we don't really
>> get any benefit out of having it.  However, I'm not opposed to the
>> idea of a freelist in general: I think that if we actually put in some
>> effort to keep the freelist in a non-empty state it would help a lot,
>> because backends would then have much less work to do at buffer
>> allocation time.
>
> This is exactly what the FreeBSD VM system does (which is at least one of the
> places where the idea of a clock sweep for PG came from ages ago). There is a
> process that does nothing but attempt to keep X amount of memory on the free
> list, where it can immediately be grabbed by anything that needs memory. Pages
> on the freelist are guaranteed to be clean (as in not dirty), but not zero'd.
> In fact, IIRC if a page on the freelist gets referenced again it can be pulled
> back out of the free list and put back into an active state.
>
> The one downside I see to this is that we'd need some heuristic to determine
> how many buffers we want to keep on the free list.

http://wiki.postgresql.org/wiki/Todo#Background_Writer has "Consider adding buffers the background writer finds reusable to the free list" and "Automatically tune bgwriter_delay based on activity rather than using a fixed interval", which both point to my musings from the 8.3 cycle and other suggestions starting at http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php. I could write both of those in an afternoon. The auto-tuning code already in the background writer was originally intended to tackle this issue as well, but that part was dropped in favor of shipping something simpler first. There's even a prototype somewhere on an old drive here.
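
To make the first TODO item concrete, here is a minimal C sketch of a writer that runs the clock sweep and parks clean, unused buffers on a freelist, combined with the FreeBSD-style behavior Jim describes: everything on the freelist is already clean, and a buffer that gets referenced again is pulled back off the list. Everything here is hypothetical: the Pool/Buffer types, NBUFFERS, MAX_USAGE and all function names are illustrative only, pins, locking and real I/O are left out, and this is not how PostgreSQL's freelist.c/bufmgr.c are actually structured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified buffer pool. NBUFFERS, MAX_USAGE and every
 * struct/function name here are illustrative, not PostgreSQL's bufmgr API.
 * Pins, locks and actual I/O are deliberately left out. */
#define NBUFFERS  8
#define MAX_USAGE 5
#define NONE      (-1)

typedef struct {
    int  usage_count;  /* bumped on each hit, decremented by the clock sweep */
    bool dirty;        /* must be written before the buffer can be reused */
    bool on_freelist;  /* currently linked into the freelist */
    int  prev, next;   /* freelist links, NONE when unlinked */
} Buffer;

typedef struct {
    Buffer buf[NBUFFERS];
    int    free_head, free_tail, free_count;
    int    clock_hand;
    int    writes;     /* count of simulated writes done by the writer */
} Pool;

static void pool_init(Pool *p) {
    for (int i = 0; i < NBUFFERS; i++)
        p->buf[i] = (Buffer){0, false, false, NONE, NONE};
    p->free_head = p->free_tail = NONE;
    p->free_count = p->clock_hand = p->writes = 0;
}

static void freelist_append(Pool *p, int id) {
    Buffer *b = &p->buf[id];
    assert(!b->on_freelist);
    b->prev = p->free_tail;
    b->next = NONE;
    if (p->free_tail != NONE) p->buf[p->free_tail].next = id;
    else p->free_head = id;
    p->free_tail = id;
    b->on_freelist = true;
    p->free_count++;
}

static void freelist_unlink(Pool *p, int id) {
    Buffer *b = &p->buf[id];
    assert(b->on_freelist);
    if (b->prev != NONE) p->buf[b->prev].next = b->next;
    else p->free_head = b->next;
    if (b->next != NONE) p->buf[b->next].prev = b->prev;
    else p->free_tail = b->prev;
    b->prev = b->next = NONE;
    b->on_freelist = false;
    p->free_count--;
}

/* Backend: a hit on a page already in buffer id. A page referenced while it
 * sits on the freelist is pulled back out and becomes active again. */
static void buffer_hit(Pool *p, int id, bool modify) {
    Buffer *b = &p->buf[id];
    if (b->on_freelist) freelist_unlink(p, id);
    if (b->usage_count < MAX_USAGE) b->usage_count++;
    if (modify) b->dirty = true;
}

/* Writer: run the clock sweep until `target` clean buffers are on the
 * freelist. The step limit allows every usage_count to reach zero and the
 * buffer to then be appended. */
static void bgwriter_fill(Pool *p, int target) {
    for (int steps = 0;
         p->free_count < target && steps < NBUFFERS * (MAX_USAGE + 1);
         steps++) {
        int id = p->clock_hand;
        Buffer *b = &p->buf[id];
        p->clock_hand = (id + 1) % NBUFFERS;
        if (b->on_freelist) continue;
        if (b->usage_count > 0) { b->usage_count--; continue; }
        if (b->dirty) { b->dirty = false; p->writes++; } /* simulated write */
        freelist_append(p, id);
    }
}

/* Backend: take a buffer for a new page. Anything on the freelist is
 * already clean, so no write is needed here. Returns NONE when the list is
 * empty; a real backend would then fall back to sweeping on its own. */
static int buffer_alloc(Pool *p) {
    int id = p->free_head;
    if (id == NONE) return NONE;
    freelist_unlink(p, id);
    p->buf[id].usage_count = 1;
    return id;
}
```

The freelist is an intrusive doubly linked list so that reclaiming a re-referenced buffer from the middle of the list is O(1); with a singly linked list that step would need a scan.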

The first piece missing before this could be useful was separating the background writer and checkpointer processes. Once I realized the checkpoints were monopolizing so much time, especially when they hit bad states, it was obvious the writer couldn't be relied upon for this job. That's much better now thanks to Simon's commit 806a2aee3791244bf0f916729bfdb5489936e068, "Split work of bgwriter between 2 processes: bgwriter and checkpointer", which only became available to build on in November.

The second missing piece blocking this work, in my mind, was how exactly we're going to benchmark the result, mainly to prove it doesn't hurt some workloads. I haven't fully internalized the implications of Robert's upthread comments in terms of being able to construct a benchmark that stresses both the best and worst case situations here. That's really the hardest part of this whole thing, by a lot. A recent purchase brought me an 8 HyperThread core laptop that can also run DTrace, so I expect to have better visibility into this soon too.

I think here in 2011, the idea of a background writer process that could occupy most of a core doing work so that backends don't have to is an increasingly attractive one. As long as that comes with an auto-tuning delay, it shouldn't hurt the power management work either. It might even help, by allowing larger values of bgwriter_delay during quiet periods than you'd want to use during busy ones. I was planning to mimic the sort of fast attack/slow decline logic already used for the auto-tuned timing, so that the writer won't fall behind by more than one bgwriter_delay worth of activity. At that point it should realize a burst has arrived and that it has to start moving faster.
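
A rough illustration of that "fast attack" behavior, applied both to the allocation estimate that would size the freelist and to the writer's sleep interval, might look like the minimal C sketch below. It assumes the writer can count buffer allocations per cycle and how often backends found the freelist empty; the WriterTuning struct, its fields, and the additive back-off on the delay are hypothetical choices for illustration, not existing bgwriter code or GUCs.

```c
#include <assert.h>

/* Hypothetical tuning state for the writer loop. None of these fields are
 * PostgreSQL GUCs; they only illustrate the fast-attack/slow-decline idea. */
typedef struct {
    double smoothed_alloc;    /* estimated buffer allocations per cycle */
    int    smoothing_samples; /* larger = slower decline of the estimate */
    int    delay_ms;          /* current sleep between writer cycles */
    int    min_delay_ms;
    int    max_delay_ms;
} WriterTuning;

/* Fast attack: jump straight to any higher allocation count.
 * Slow decline: move only 1/smoothing_samples of the way toward a lower one. */
static double update_estimate(WriterTuning *t, int recent_alloc) {
    assert(t->smoothing_samples > 0);
    if (recent_alloc > t->smoothed_alloc)
        t->smoothed_alloc = recent_alloc;
    else
        t->smoothed_alloc +=
            (recent_alloc - t->smoothed_alloc) / t->smoothing_samples;
    return t->smoothed_alloc;
}

/* Freelist target for the next cycle: enough buffers to cover one cycle's
 * worth of estimated demand, rounded up (estimate is never negative). */
static int freelist_target(const WriterTuning *t) {
    int n = (int)t->smoothed_alloc;
    return (n < t->smoothed_alloc) ? n + 1 : n;
}

/* Same asymmetry for the sleep interval: if backends found the freelist
 * empty during the last cycle, a burst has arrived, so drop straight to the
 * minimum delay; otherwise lengthen the delay gradually, up to the maximum. */
static void update_delay(WriterTuning *t, int freelist_misses) {
    if (freelist_misses > 0)
        t->delay_ms = t->min_delay_ms;
    else if (t->delay_ms + t->min_delay_ms <= t->max_delay_ms)
        t->delay_ms += t->min_delay_ms;
    else
        t->delay_ms = t->max_delay_ms;
}
```

Sizing the freelist to one cycle's estimated demand is what bounds the lag to roughly one delay interval's worth of allocations; smoothing_samples trades responsiveness to falling load against stability of the target.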

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)