Re: IO queueing and complete affinity w/ threads: Some results

2008-02-20 Thread Jens Axboe
On Tue, Feb 19 2008, Mike Travis wrote: > Paul Jackson wrote: > > Jens wrote: > >> My main worry with the current code is the ->lock in the per-cpu > >> completion structure. > > > > Drive-by-comment here: Does the patch posted later this same day by Mike > > Travis: > > > > [PATCH 0/2] percp

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-19 Thread Mike Travis
Paul Jackson wrote: > Jens wrote: >> My main worry with the current code is the ->lock in the per-cpu >> completion structure. > > Drive-by-comment here: Does the patch posted later this same day by Mike > Travis: > > [PATCH 0/2] percpu: Optimize percpu accesses v3 > > help with this lock is

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-19 Thread Paul Jackson
Jens wrote: > My main worry with the current code is the ->lock in the per-cpu > completion structure. Drive-by-comment here: Does the patch posted later this same day by Mike Travis: [PATCH 0/2] percpu: Optimize percpu accesses v3 help with this lock issue any? (I have no real clue here --
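
For readers not following the patch series, a rough picture of the structure under discussion may help. The sketch below is plain userspace C, not the actual patch code; the names cpu_completion and complete_remote() are invented for illustration. The point is simply that every remote completion has to take the target CPU's lock, which is where the worry about the ->lock comes from.

/*
 * Hedged sketch (userspace C, invented names): the shape of a per-cpu
 * completion structure whose ->lock is being discussed above.
 */
#include <pthread.h>
#include <stdio.h>

#define NR_CPUS 4

struct request {
	int tag;
	struct request *next;
};

struct cpu_completion {
	pthread_spinlock_t lock;	/* the per-cpu lock under discussion */
	struct request *list;		/* completed requests queued to this CPU */
};

static struct cpu_completion completions[NR_CPUS];

/* Queue a finished request on the CPU that originally issued it. */
static void complete_remote(int cpu, struct request *rq)
{
	struct cpu_completion *cc = &completions[cpu];

	pthread_spin_lock(&cc->lock);	/* cross-CPU contention happens here */
	rq->next = cc->list;
	cc->list = rq;
	pthread_spin_unlock(&cc->lock);
}

int main(void)
{
	struct request rq = { .tag = 1, .next = NULL };
	int i;

	for (i = 0; i < NR_CPUS; i++)
		pthread_spin_init(&completions[i].lock, PTHREAD_PROCESS_PRIVATE);

	complete_remote(2, &rq);
	printf("queued request %d on cpu 2\n", completions[2].list->tag);
	return 0;
}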

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-18 Thread Nick Piggin
On Mon, Feb 18, 2008 at 02:33:17PM +0100, Andi Kleen wrote: > Jens Axboe <[EMAIL PROTECTED]> writes: > > > and that scrapping the remote > > softirq trigger stuff is sanest. > > I actually liked Nick's queued smp_function_call_single() patch. So even > if it was not used for block I would still l

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-18 Thread Jens Axboe
On Mon, Feb 18 2008, Andi Kleen wrote: > Jens Axboe <[EMAIL PROTECTED]> writes: > > > and that scrapping the remote > > softirq trigger stuff is sanest. > > I actually liked Nick's queued smp_function_call_single() patch. So even > if it was not used for block I would still like to see it being m

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-18 Thread Andi Kleen
Jens Axboe <[EMAIL PROTECTED]> writes: > and that scrapping the remote > softirq trigger stuff is sanest. I actually liked Nick's queued smp_function_call_single() patch. So even if it was not used for block I would still like to see it being merged in some form to speed up all the other IPI use
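
For context, here is a minimal sketch of the queued-IPI idea being praised above. It is plain userspace C with invented names (call_entry, queue_call_single(), drain_call_queue()), not the patch's actual API: the sender appends a small call record to the target CPU's queue and would then send a single IPI; the target runs everything queued from its IPI handler, instead of each caller spinning on a shared call structure.

/*
 * Hedged sketch of a queued cross-CPU function call (userspace C).
 * The kernel version would raise an IPI where noted; names are invented.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS 4

struct call_entry {
	void (*func)(void *info);
	void *info;
	struct call_entry *next;
};

struct call_queue {
	pthread_mutex_t lock;
	struct call_entry *head;
};

static struct call_queue queues[NR_CPUS];

/* Sender side: enqueue the call, then (in the kernel) IPI 'cpu'. */
static void queue_call_single(int cpu, void (*func)(void *), void *info)
{
	struct call_entry *e = malloc(sizeof(*e));

	e->func = func;
	e->info = info;
	pthread_mutex_lock(&queues[cpu].lock);
	e->next = queues[cpu].head;
	queues[cpu].head = e;
	pthread_mutex_unlock(&queues[cpu].lock);
	/* real implementation would send a single IPI to 'cpu' here */
}

/* Target side: what the IPI handler would do - run every queued call. */
static void drain_call_queue(int cpu)
{
	struct call_entry *e;

	pthread_mutex_lock(&queues[cpu].lock);
	e = queues[cpu].head;
	queues[cpu].head = NULL;
	pthread_mutex_unlock(&queues[cpu].lock);

	while (e) {
		struct call_entry *next = e->next;

		e->func(e->info);
		free(e);
		e = next;
	}
}

static void say_hello(void *info)
{
	printf("%s\n", (const char *)info);
}

int main(void)
{
	int i;

	for (i = 0; i < NR_CPUS; i++)
		pthread_mutex_init(&queues[i].lock, NULL);

	queue_call_single(1, say_hello, "hello from a queued remote call");
	drain_call_queue(1);
	return 0;
}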

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-18 Thread Jens Axboe
On Thu, Feb 14 2008, Alan D. Brunelle wrote: > Taking a step back, I went to a very simple test environment: > > o 4-way IA64 > o 2 disks (on separate RAID controller, handled by separate ports on the > same FC HBA - generates different IRQs). > o Using write-cached tests - keep all IOs inside

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-14 Thread Alan D. Brunelle
Taking a step back, I went to a very simple test environment:

o 4-way IA64
o 2 disks (on separate RAID controller, handled by separate ports on the same FC HBA - generates different IRQs).
o Using write-cached tests - keep all IOs inside of the RAID controller's cache, so no perturbations due

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-13 Thread Alan D. Brunelle
Comparative results between the original affinity patch and the kthreads-based patch on the 32-way running the kernel make sequence. It may be easier to compare/contrast with the graphs provided at http://free.linux.hp.com/~adb/jens/kernmk.png (kernmk.agr also provided, if you want to run xmgr

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-12 Thread Alan D. Brunelle
Alan D. Brunelle wrote: > > Hopefully, the first column is self-explanatory - these are the settings > applied to the queue_affinity, completion_affinity and rq_affinity tunables. > Due to the fact that the standard deviations are so large coupled with the > very close average results, I'm not
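
For anyone wanting to reproduce the settings in that first column, here is a minimal sketch of poking the tunables from C. The sysfs paths are assumptions based on the patch discussion (/sys/block/<dev>/queue/{queue_affinity,completion_affinity,rq_affinity}); only rq_affinity later became a standard block-queue attribute, so on an unpatched kernel the other two files will simply not exist.

/*
 * Hedged sketch: write the affinity tunables discussed above.
 * Paths and values are assumptions from the patch thread, not mainline API.
 */
#include <stdio.h>

static int set_tunable(const char *dev, const char *name, int value)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, name);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%d\n", value);
	fclose(f);
	return 0;
}

int main(void)
{
	/* e.g. queue and complete on CPU 0, complete on the submitting CPU */
	set_tunable("sda", "queue_affinity", 0);
	set_tunable("sda", "completion_affinity", 0);
	set_tunable("sda", "rq_affinity", 1);
	return 0;
}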

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-12 Thread Alan D. Brunelle
Back on the 32-way, in this set of tests we're running 12 disks spread out through the 8 cells of the 32-way. Each disk will have an Ext2 FS placed on it, a clean Linux kernel source untarred onto it, then a full make (-j4) and then a make clean performed. The 12 series are done in parallel - s

Re: IO queueing and complete affinity w/ threads: Some results

2008-02-12 Thread Alan D. Brunelle
Whilst running a series of file system related loads on our 32-way*, I dropped down to a 16-way w/ only 24 disks, and ran two kernels: the original set of Jens' patches and then his subsequent kthreads-based set. Here are the results:

Original:
A Q C | MBPS Avg Lat StdDev | Q-local Q-remote

IO queueing and complete affinity w/ threads: Some results

2008-02-11 Thread Alan D. Brunelle
The test case chosen may not be a very good start, but anyway, here are some initial test results with the "nasty arch bits". This was performed on a 32-way ia64 box with 1 terabyte of RAM and 144 FC disks (contained in 24 HP MSA1000 RAID controllers attached to 12 dual-port adapters). Each te