Re: [HACKERS] Spread checkpoint sync

Greg Smith Sat, 15 Jan 2011 07:31:33 -0800

Robert Haas wrote:

> I'll believe it when I see it.  How about this:
>
> a 1
> a 2
> sync a
> b 1
> b 2
> sync b
> c 1
> c 2
> sync c
>
> Or maybe some variant, where we become willing to fsync a file a
> certain number of seconds after writing the last block, or when all
> the writes are done, whichever comes first.

That's going to give worse performance than the current code in some cases. The goal of what's in there now is that you get a sequence like this:


a1
b1
a2
[Filesystem writes a1]
b2
[Filesystem writes b1]
sync a [Only has to write a2]
sync b [Only has to write b2]

This idea works until you get to where the filesystem write cache is so large that it becomes lazier about writing things. The fundamental idea--push writes out some time before the sync, in hopes the filesystem will get to them before then--is not unsound. On some systems, doing the sync more aggressively than that will be a regression. This approach just breaks down in some cases, and those cases are happening more now because their likelihood scales with total RAM. I don't want to screw the people with smaller systems, who may be getting considerable benefit from the existing sequence. Today's little systems--which are very similar to the high-end ones the spread checkpoint stuff was developed on during 8.3--do get some benefit from it as far as I know.
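The difference between the two orderings above can be shown with a toy model. This is only a sketch: the event names (`write`, `writeback`, `sync`) and the function are made up for illustration, and a "writeback" here stands for the filesystem flushing one dirty block on its own, as in the bracketed lines of the sequence. It counts how many blocks each sync still has to write itself.

```python
# Toy model, not PostgreSQL code: count the dirty blocks left for each sync.
# Event kinds are hypothetical labels:
#   ("write", f)     - the checkpoint writes one block of file f into the OS cache
#   ("writeback", f) - the filesystem flushes one dirty block of f on its own
#   ("sync", f)      - fsync of f, which must flush whatever is still dirty

def blocks_flushed_at_sync(events):
    dirty = {}    # file -> number of dirty blocks in the OS cache
    flushed = {}  # file -> blocks the sync call itself had to write
    for kind, name in events:
        if kind == "write":
            dirty[name] = dirty.get(name, 0) + 1
        elif kind == "writeback":
            if dirty.get(name, 0) > 0:
                dirty[name] -= 1
        elif kind == "sync":
            flushed[name] = dirty.get(name, 0)
            dirty[name] = 0
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return flushed

# Write a file, then sync it immediately (the proposed ordering).
write_then_sync = [
    ("write", "a"), ("write", "a"), ("sync", "a"),
    ("write", "b"), ("write", "b"), ("sync", "b"),
]

# Interleave writes so the filesystem gets time to flush before the syncs.
spread = [
    ("write", "a"), ("write", "b"), ("write", "a"), ("writeback", "a"),
    ("write", "b"), ("writeback", "b"), ("sync", "a"), ("sync", "b"),
]

print(blocks_flushed_at_sync(write_then_sync))  # {'a': 2, 'b': 2}
print(blocks_flushed_at_sync(spread))           # {'a': 1, 'b': 1}
```

If the `writeback` events never happen before the syncs, which is the large-cache case, the second ordering leaves each sync with two blocks as well, and the benefit disappears.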

Anyway, now that the ability to get logging on all this stuff went in during the last CF, it's way easier to just set up a random system to run tests in this area than it used to be. Whatever testing does happen should include, say, a 2GB laptop with a single hard drive in it. I think that's the low end of what is reasonable to consider as a target for tweaking write performance, given the hardware 9.1 is likely to be deployed on.

> How does the checkpoint target give you any time to sync them?  Unless
> you squeeze the writes together more tightly, but that seems sketchy.

Obviously the checkpoint target idea needs to be shuffled around some too. I was thinking of making the new default 0.8, and having it split the time in half for write and sync. That will make the write phase close to the speed people are seeing now, at the default of 0.5, while giving some window for spread sync too. The exact way to redistribute that around I'm not so concerned about yet. When I get to where that's the most uncertain thing left I'll benchmark the TPS vs. latency trade-off and see what happens. If the rest of the code is good enough but this just needs to be tweaked, that's a perfect thing to get beta feedback to finalize.
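To put numbers on that split, here is a small sketch of the arithmetic. The function name and the `write_share` parameter are made up for illustration, and the 300 second figure assumes the stock checkpoint_timeout of 5 minutes; none of this is actual patch code.

```python
# Hypothetical helper: divide the checkpoint window between write and sync.
def split_checkpoint_window(checkpoint_timeout_s, completion_target=0.8,
                            write_share=0.5):
    """Return (write_seconds, sync_seconds) for one checkpoint interval."""
    budget = checkpoint_timeout_s * completion_target
    write_s = budget * write_share
    sync_s = budget - write_s
    return write_s, sync_s

# Proposed: target 0.8, split in half between write and sync.
write_s, sync_s = split_checkpoint_window(300)
print(f"write {write_s:.0f}s, sync {sync_s:.0f}s")  # write 120s, sync 120s

# Current behavior: target 0.5, all of it spent on the write phase.
old_write_s, old_sync_s = split_checkpoint_window(300, completion_target=0.5,
                                                  write_share=1.0)
print(f"write {old_write_s:.0f}s, sync {old_sync_s:.0f}s")  # write 150s, sync 0s
```

So the write phase gets 0.4 of the interval instead of 0.5, which is what "close to the speed people are seeing now" amounts to.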

> Well you don't have to put it in shared memory on account of any of
> that.  You can just hang it on a global variable.

Hmm. Because it's so similar to other things being allocated in shared memory, I just automatically pushed it over to there. But you're right; it doesn't need to be that complicated. Nobody is touching it but the background writer.

> If we can find something that's a modest improvement on the
> status quo and we can be confident in quickly, good, but I'd rather
> have 9.1 go out the door on time without fully fixing this than delay
> the release.

I'm not somebody who needs to be convinced of that. There are two near-commit-quality pieces of this out there now:

1) Keep some BGW cleaning and fsync absorption going while sync is happening, rather than starting it and ignoring everything else until it's done.


2) Compact fsync requests when the queue fills

If that's all we can get for 9.1, it will still be a major improvement. I realize I only have a very short period of time to complete a major integration breakthrough on the pieces floating around before the goal here has to drop to something less ambitious. I head to the West Coast for a week on the 23rd; I'll be forced to throw in the towel at that point if I can't get the better ideas we have in pieces here all assembled well by then.
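For anyone who hasn't looked at piece 2, the basic idea of compacting the fsync request queue can be sketched roughly like this. This is an illustration only, not the patch: requests are modeled as hypothetical (relation, segment) tuples, and it assumes a repeated request for the same file is redundant because one fsync covers all the earlier writes to it.

```python
# Sketch of queue compaction: drop duplicate fsync requests, keep first-seen order.
# The (relation, segment) tuples are a made-up stand-in for real request entries.
def compact_fsync_requests(queue):
    seen = set()
    compacted = []
    for request in queue:
        if request not in seen:
            seen.add(request)
            compacted.append(request)
    return compacted

queue = [("a", 0), ("b", 0), ("a", 0), ("a", 1), ("b", 0)]
print(compact_fsync_requests(queue))  # [('a', 0), ('b', 0), ('a', 1)]
```

In this example five queued requests shrink to three, which frees queue slots without losing any file that still needs a sync.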


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books