Robert Haas wrote:
> I'll believe it when I see it.  How about this:
>
> a 1
> a 2
> sync a
> b 1
> b 2
> sync b
> c 1
> c 2
> sync c
>
> Or maybe some variant, where we become willing to fsync a file a
> certain number of seconds after writing the last block, or when all
> the writes are done, whichever comes first.

That's going to give worse performance than the current code in some cases. The goal of what's in there now is that you get a sequence like this:

a1
b1
a2
[Filesystem writes a1]
b2
[Filesystem writes b1]
sync a [Only has to write a2]
sync b [Only has to write b2]

This idea works until the filesystem write cache gets so large that the filesystem becomes lazier about writing things out. The fundamental idea--push writes out some time before the sync, in hopes the filesystem will get to them before the sync arrives--is not unsound. On some systems, doing the sync more aggressively than that will be a regression. The approach just breaks down in some cases, and those cases are happening more often now because their likelihood scales with total RAM. I don't want to screw the people with smaller systems, who may be getting considerable benefit from the existing sequence. Today's little systems--which are very similar to the high-end ones the spread checkpoint stuff was developed on during 8.3--do get some benefit from it as far as I know.

Anyway, now that the ability to get logging on all this stuff went in during the last CF, it's way easier to just set up a random system to run tests in this area than it used to be. Whatever testing does happen should include, say, a 2GB laptop with a single hard drive in it. I think that's the low end of what is reasonable to consider as a target for tweaking write performance, given the hardware 9.1 is likely to be deployed on.

> How does the checkpoint target give you any time to sync them?  Unless
> you squeeze the writes together more tightly, but that seems sketchy.

Obviously the checkpoint target idea needs to be shuffled around some too. I was thinking of making the new default 0.8, and having it split the time in half for write and sync. That will make the write phase close to the speed people are seeing now, at the default of 0.5, while giving some window for spread sync too. The exact way to redistribute that around I'm not so concerned about yet. When I get to where that's the most uncertain thing left I'll benchmark the TPS vs. latency trade-off and see what happens. If the rest of the code is good enough but this just needs to be tweaked, that's a perfect thing to get beta feedback to finalize.
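
The arithmetic behind that split can be worked through quickly. The sketch below is hypothetical Python, not anything in the patch: it assumes "checkpoint target" means checkpoint_completion_target, that checkpoints are purely time-driven at the default 5 minute checkpoint_timeout, and that the budget is divided by a simple `write_share` fraction (a name invented here).

```python
def split_checkpoint_budget(checkpoint_timeout_s, completion_target,
                            write_share=0.5):
    """Return (write_seconds, sync_seconds) for one checkpoint.

    write_share is a hypothetical knob: the fraction of the checkpoint
    budget given to the write phase, with the rest left for spread sync.
    """
    budget = checkpoint_timeout_s * completion_target
    write_s = budget * write_share
    sync_s = budget - write_s
    return write_s, sync_s


if __name__ == "__main__":
    timeout = 300  # seconds; the default checkpoint_timeout of 5 minutes

    # Current behavior: target 0.5, the whole budget goes to writes.
    print(split_checkpoint_budget(timeout, 0.5, write_share=1.0))

    # Idea from this thread: target 0.8, split evenly between write and sync.
    print(split_checkpoint_budget(timeout, 0.8, write_share=0.5))
```

Under those assumptions the write phase shrinks from about 150 seconds to about 120 seconds, and the sync phase gets the same 120 seconds where it had no window of its own before.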

> Well you don't have to put it in shared memory on account of any of
> that.  You can just hang it on a global variable.

Hmm. Because it's so similar to other things being allocated in shared memory, I just automatically pushed it over to there. But you're right; it doesn't need to be that complicated. Nobody is touching it but the background writer.

> If we can find something that's a modest improvement on the
> status quo and we can be confident in quickly, good, but I'd rather
> have 9.1 go out the door on time without fully fixing this than delay
> the release.

I'm not somebody who needs to be convinced of that. There are two near-commit-quality pieces of this out there now:

1) Keep some BGW cleaning and fsync absorption going while sync is happening, rather than starting it and ignoring everything else until it's done.

2) Compact fsync requests when the queue fills.
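
As a rough illustration of item 2, compaction amounts to dropping duplicate requests for the same file once the queue is full, so a new request can still fit. The sketch below is hypothetical Python, not the patch itself: the (relation, segment) tuples, the function names, and the list-based queue are all assumptions made for the example.

```python
def compact_fsync_requests(queue):
    """Return the queue with duplicates dropped, keeping first occurrences."""
    seen = set()
    compacted = []
    for request in queue:
        if request not in seen:
            seen.add(request)
            compacted.append(request)
    return compacted


def enqueue_fsync_request(queue, request, capacity):
    """Append a request, compacting the queue in place if it is full.

    Returns False when the queue is still full after compaction, in which
    case the caller has to fall back to some other way of syncing.
    """
    if len(queue) >= capacity:
        queue[:] = compact_fsync_requests(queue)
        if len(queue) >= capacity:
            return False
    queue.append(request)
    return True


if __name__ == "__main__":
    # Hypothetical requests: (relation name, segment number).
    pending = [("a", 0), ("b", 0), ("a", 0), ("a", 0)]
    accepted = enqueue_fsync_request(pending, ("c", 0), capacity=4)
    print(accepted, pending)
```

A full queue made up mostly of repeated requests for a handful of files shrinks a lot under this scheme, which is the case where compaction helps most.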

If that's all we can get for 9.1, it will still be a major improvement. I realize I only have a very short period of time to complete a major integration breakthrough on the pieces floating around before the goal here has to drop to something less ambitious. I head to the West Coast for a week on the 23rd; I'll be forced to throw in the towel at that point if I can't get the better ideas we have in pieces here all assembled well by then.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books