On Tue, 01 Nov 2005 07:32:32 +0000 Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Mon, 2005-10-31 at 16:10 -0800, Mark Wong wrote:
> > On Thu, 20 Oct 2005 23:03:47 +0100
> > Simon Riggs <[EMAIL PROTECTED]> wrote:
> >
> > > On Wed, 2005-10-19 at 14:07 -0700, Mark Wong wrote:
> > > > >
> > > > > This isn't exactly elegant coding, but it provides a useful
> > > > > improvement on an 8-way SMP box when run on 8.0 base. OK, let's be
> > > > > brutal: this looks pretty darn stupid. But it does follow the CPU
> > > > > optimization handbook advice and I did see a noticeable improvement
> > > > > in performance and a reduction in context switching.
> > > > >
> > > > > I'm not in a position to try this again now on 8.1beta, but I'd
> > > > > welcome a performance test result from anybody that is. I'll supply
> > > > > a patch against 8.1beta for anyone wanting to test this.
> > > >
> > > > Ok, I've produced a few results on a 4 way (8 core) POWER 5 system,
> > > > which I've just set up and probably needs a bit of tuning. I don't
> > > > see much difference but I'm wondering if the cacheline sizes are
> > > > dramatically different from Intel/AMD processors. I still need to
> > > > take a closer look to make sure I haven't grossly mistuned anything,
> > > > but I'll let everyone take a look:
> > >
> > > Well, the Power 5 architecture probably has the lowest overall memory
> > > delay you can get currently so in some ways that would negate the
> > > effects of the patch. (Cacheline is still 128 bytes, AFAICS). But it's
> > > clear the patch isn't significantly better (like it was with 8.0 when we
> > > tried this on the 8-way Itanium in Feb).
> > >
> > > > cvs 20051013
> > > > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/19/
> > > > 2501 notpm
> > > >
> > > > cvs 20051013 w/ lw.patch
> > > > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/20/
> > > > 2519 notpm
> > >
> > > Could you re-run with wal_buffers = 32 ? (Without patch) Thanks
> >
> > Ok, sorry for the delay. I've bumped up the wal_buffers to 2048 and
> > redid the disk layout. Here's where I'm at now:
> >
> > cvs 20051013
> > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/40/
> > 3257 notpm
> >
> > cvs 20051013 w/ lw.patch
> > http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/42/
> > 3285 notpm
> >
> > Still not much of a difference with the patch. A quick glance over the
> > iostat data suggests I'm still not i/o bound, but the i/o wait is rather
> > high according to vmstat. Will try to see if there's anything else
> > obvious to get the load up higher.
>
> OK, that's fine. I'm glad there's some gain, but not much yet. I think we
> should leave out doing any more tests on lw.patch for now.
>
> Concerned about the awful checkpointing. Can you bump wal_buffers to
> 8192 just to make sure? That's way too high, but just to prove it.
>
> We need to reduce the number of blocks to be written at checkpoint.
>
> bgwriter_all_maxpages 5 -> 15
> bgwriter_all_percent 0.333
> bgwriter_delay 200
> bgwriter_lru_maxpages 5 -> 7
> bgwriter_lru_percent 1
>
> shared_buffers set lower to 100000
> (which should cause some amusement on-list)

Okay, here goes, all with the same source base w/ the lw.patch:

http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/44/
only increased wal_buffers to 8192 from 2048
3242 notpm

http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/43/
only increased bgwriter_all_maxpages to 15, and bgwriter_lru_maxpages to 7
3019 notpm (but more interesting graph)

http://www.testing.osdl.org/projects/dbt2dev/results/dev4-014/45/
Same as the previously listed run with shared_buffers lowered to 10000
2503 notpm

Mark
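
For anyone comparing the runs in this thread, the relative differences can be worked out with a small sketch like the one below. The notpm figures are copied from the results reported above; the `pct_change` helper and the run labels are illustrative names only, and treating run 42 (lw.patch, wal_buffers at 2048) as the baseline for runs 44, 43, and 45 is an assumption, not something stated in the thread.

```python
def pct_change(baseline, result):
    """Relative change of result against baseline, in percent."""
    return (result - baseline) / baseline * 100.0


# (label, baseline notpm, result notpm)
# Labels are shorthand for this sketch; the baseline choice for the
# last three rows (run 42) is an assumption.
comparisons = [
    ("run 19 -> 20: lw.patch added", 2501, 2519),
    ("run 40 -> 42: lw.patch added", 3257, 3285),
    ("run 42 -> 44: wal_buffers 8192", 3285, 3242),
    ("run 42 -> 43: bgwriter maxpages raised", 3285, 3019),
    ("run 42 -> 45: shared_buffers lowered too", 3285, 2503),
]

for label, baseline, result in comparisons:
    print(f"{label}: {pct_change(baseline, result):+.2f}%")
```

On these numbers the patch is worth under 1% in both configurations, while the bgwriter and shared_buffers changes cost roughly 8% and 24% against run 42.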