Greg Smith wrote:
> I think a helpful next step here would be to put Robert's fsync
> compaction patch into here and see if that helps. There are enough
> backend syncs showing up in the difficult workloads (scale>=1000,
> clients >=32) that its impact should be obvious.
Initial tests show everything expected from this change and more. This
took me a while to isolate because the filesystem involved degraded over
time, which biased results heavily toward a faster first test run, before
anything was fragmented. In the end I had to do a fresh mkfs on the
database/xlog disks when switching between test sets in order to
eliminate that.
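For anyone who wants to replicate this, the measurement loop is
conceptually simple. Here is a rough sketch in Python (not my actual
harness; the database name, thread count, and run length are
placeholders), assuming a server that exposes buffers_backend_fsync in
pg_stat_bgwriter. Each pass records pgbench's reported TPS along with
how much that counter grew, which is where the backend-fsync column
below comes from:

    import re
    import subprocess

    def backend_fsyncs():
        # Read the running total of fsyncs done by regular backends.
        out = subprocess.run(
            ["psql", "-At", "-c",
             "SELECT buffers_backend_fsync FROM pg_stat_bgwriter", "pgbench"],
            capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    for clients in (16, 32, 64, 128, 256):
        before = backend_fsyncs()
        run = subprocess.run(
            ["pgbench", "-c", str(clients), "-j", "4", "-T", "600", "pgbench"],
            capture_output=True, text=True, check=True)
        # pgbench prints a "tps = ..." summary line at the end of each run.
        tps = re.search(r"tps = ([0-9.]+)", run.stdout).group(1)
        print(clients, tps, backend_fsyncs() - before)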
At a scale of 500, I see the following average behavior:
Clients    TPS   backend-fsync
     16    557             155
     32    587             572
     64    628             843
    128    621            1442
    256    632            2504
On one run through with the fsync compaction patch applied, this turned into:
Clients    TPS   backend-fsync
     16    637               0
     32    621               0
     64    721               0
    128    716               0
    256    841               0
So not only are all the backend fsyncs gone, but there is a very clear
TPS improvement too. The changes in results at >=64 clients are well
above the usual noise threshold in these tests.
The problem where individual fsync calls during checkpoints can take a
long time is not appreciably better. But I think this will greatly
reduce the odds of hitting the truly dysfunctional breakdown, where
checkpoint and backend fsync calls compete with one another, which
caused the worst-case situation that kicked off this whole line of
research.
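To clarify what the backend-fsync column is counting: normally a backend
hands each fsync request to the background writer through a fixed-size
shared queue, and only calls fsync itself when that hand-off fails
because the queue is full. The compaction patch squeezes duplicate
entries out of a full queue before giving up, so that fallback path
almost never fires. A toy sketch of the idea, in plain Python rather
than anything resembling the real shared-memory code:

    def forward_fsync_request(queue, capacity, request):
        # Try to hand an fsync request off to the background writer.
        if len(queue) >= capacity:
            # Queue full: compact it first.  Duplicate entries are common,
            # and one fsync of a file covers every request queued for it.
            seen = set()
            deduped = []
            for r in queue:
                if r not in seen:
                    seen.add(r)
                    deduped.append(r)
            queue[:] = deduped
        if len(queue) < capacity:
            queue.append(request)
            return True
        # Still no room: the backend has to call fsync itself, which is
        # what shows up in the backend-fsync column above.
        return False

Squeezing out the duplicates is evidently enough here, which is
consistent with the backend fsync counts dropping to zero above.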
--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books