On Tue, Jan 11, 2011 at 5:27 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith <g...@2ndquadrant.com> wrote:
>> One of the ideas Simon and I had been considering at one point was adding
>> some better de-duplication logic to the fsync absorb code, which I'm
>> reminded by the pattern here might be helpful independently of other
>> improvements.
>
> Hopefully I'm not stepping on any toes here, but I thought this was an
> awfully good idea and had a chance to take a look at how hard it would
> be today while en route from point A to point B. The answer turned
> out to be "not very", so PFA a patch that seems to work. I tested it
> by attaching gdb to the background writer while running pgbench, and
> it eliminated the backend fsyncs without even breaking a sweat.

I had been concerned about how long the lock would be held, and I was pondering ways to do only partial deduplication to reduce the time. But since you already wrote a patch to do the whole thing, I figured I'd time it.

I arranged to test an instrumented version of your patch under a large shared_buffers setting of 4GB, conditions that would maximize the opportunity for it to take a long time. Running your compaction to go from 524288 entries to a handful (14 to 29, depending on the run) took between 36 and 39 milliseconds. For comparison, doing just the memcpy part of AbsorbFsyncRequests on a full queue took from 24 to 27 milliseconds.

They are close enough to each other that I am no longer interested in partial deduplication. But both are long enough that I wonder whether a hash table in shared memory, kept unique automatically at each update, might be worthwhile.

Cheers,

Jeff
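
For anyone following along, here is a rough standalone sketch of the kind of full-queue compaction being timed above. It is not the code from the patch: FsyncRequest, request_hash and compact_requests are hypothetical names, the two-field request is a simplified stand-in for the real queue entry, and the scratch table is plain malloc'd memory rather than anything in shared memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified request; the real queue entry carries more fields. */
typedef struct
{
    uint32_t rel;    /* stand-in for the relation file identifier */
    uint32_t segno;  /* segment number within that relation */
} FsyncRequest;

/* Cheap mixing hash over both fields (an assumption, not the server's hash). */
static uint32_t
request_hash(const FsyncRequest *r)
{
    uint32_t h = r->rel * 2654435761u;

    h ^= r->segno + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
}

/*
 * Compact queue[0..n) in place, keeping the first occurrence of each
 * distinct request in its original order. Returns the new length, or n
 * unchanged if the scratch table cannot be allocated.
 */
size_t
compact_requests(FsyncRequest *queue, size_t n)
{
    size_t  nslots = 16;
    size_t  kept = 0;
    size_t *slots;
    size_t  i;

    assert(queue != NULL || n == 0);

    /* Power-of-two table sized so the load factor stays at or below 0.5. */
    while (nslots < 2 * n)
        nslots *= 2;

    slots = malloc(nslots * sizeof(size_t));
    if (slots == NULL)
        return n;
    for (i = 0; i < nslots; i++)
        slots[i] = SIZE_MAX;            /* SIZE_MAX marks an empty slot */

    for (i = 0; i < n; i++)
    {
        size_t pos = request_hash(&queue[i]) & (nslots - 1);

        /* Linear probing: stop at an empty slot or at an equal request. */
        while (slots[pos] != SIZE_MAX)
        {
            const FsyncRequest *seen = &queue[slots[pos]];

            if (seen->rel == queue[i].rel && seen->segno == queue[i].segno)
                break;
            pos = (pos + 1) & (nslots - 1);
        }

        if (slots[pos] == SIZE_MAX)
        {
            /* First time we see this request: keep it in the prefix. */
            slots[pos] = kept;
            queue[kept++] = queue[i];
        }
    }

    free(slots);
    return kept;
}
```

The whole pass is a single scan of the queue, so its cost grows with the queue length rather than with the number of distinct requests, which fits with the compaction and the plain memcpy landing in the same range of tens of milliseconds. A table that is kept unique at insert time would avoid the scan entirely, at the price of a hash lookup on every request.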