Hi Robert,
I got inspired by your question to run a few more tests. They are
crude, and I don't have actual CPU timing information because of a
library mismatch. However:
Setup:
Xserve, 2 x 2.26 GHz quad-core Intel Xeon
6.0 GB of 1067 MHz DDR3 memory
Mac OS X 10.5.6
Nodes are connected with a dedicated gigabit ethernet switch.
I'm running the MITgcm, a nonhydrostatic general circulation model.
The grid size is modest (10x150x1600), so bear that in mind. Message
passing happens across the 150x10 face of the domain and is typically
3 grid cells deep in either direction. I'm not sure how many variables
are exchanged, but I would guess on the order of 24.
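For a sense of scale, here is a back-of-the-envelope sketch of how much data one process sends per halo exchange. It assumes three things that are not from my measurements: the domain is split only along the 1600-cell axis, so every exchanged face is 150x10; each process has one neighbour on each side; and values are 8-byte doubles. The 24 variables is the same guess as above.

```python
# Rough halo-exchange volume per process, per full exchange.
# ASSUMPTIONS (illustrative, not measured): split only along the 1600 axis,
# so each face is 150 x 10 cells; two neighbours; 8-byte doubles; 24 fields.

FACE_CELLS = 150 * 10        # cells on one exchanged face
HALO_WIDTH = 3               # exchange depth in grid cells
N_VARS = 24                  # guessed number of exchanged fields
BYTES_PER_VALUE = 8          # assumes double precision
N_NEIGHBOURS = 2             # one neighbour on each side of the split axis

GIGABIT_BYTES_PER_S = 1e9 / 8  # nominal link rate, ignoring protocol overhead


def halo_bytes(face_cells=FACE_CELLS, halo=HALO_WIDTH, n_vars=N_VARS,
               bytes_per_value=BYTES_PER_VALUE, neighbours=N_NEIGHBOURS):
    """Bytes one process sends during a single full halo exchange."""
    return face_cells * halo * n_vars * bytes_per_value * neighbours


if __name__ == "__main__":
    b = halo_bytes()
    print(f"{b} bytes (~{b / 1e6:.2f} MB) per process per exchange")
    # Only exchanges that cross between nodes actually go over the switch.
    print(f"~{b / GIGABIT_BYTES_PER_S * 1e3:.1f} ms at nominal gigabit rate")
```

Within one node these exchanges presumably go through shared memory. So this figure mainly matters for the processes on either side of the node boundary in the 2-node run.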
I turned off all the I/O I knew of to reduce disk latency.
1 node, 8 processes: 54 minutes
1 node, 16 processes (oversubscribed): 40 minutes
2 nodes, 16 processes: 29 minutes
So, in this case, oversubscribing was faster (54 to 40 minutes,
about 1.35x), but it came nowhere near doubling the speed. Spreading
the same 16 processes across a second node was much faster still
(29 minutes, about 1.86x).
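To put the three timings on a common footing, this sketch computes speedup relative to the 1-node, 8-process run. It also computes parallel efficiency, meaning speedup divided by the ratio of process counts. Using the 8-process run as the baseline is my own choice for illustration. The times are the ones listed above.

```python
# Speedup and parallel efficiency relative to the 1-node, 8-process run.
# Baseline choice (8 processes, 54 min) is an illustrative assumption.

RUNS = [
    # (label, processes, wall-clock minutes)
    ("1 node, 8 processes", 8, 54),
    ("1 node, 16 processes (oversubscribed)", 16, 40),
    ("2 nodes, 16 processes", 16, 29),
]
BASE_PROCS, BASE_MINUTES = 8, 54


def speedup(minutes, baseline=BASE_MINUTES):
    """How many times faster than the baseline run."""
    return baseline / minutes


def efficiency(minutes, procs, base_procs=BASE_PROCS, baseline=BASE_MINUTES):
    """Speedup divided by the ideal (linear) speedup for this process count."""
    return speedup(minutes, baseline) / (procs / base_procs)


if __name__ == "__main__":
    for label, procs, minutes in RUNS:
        print(f"{label:40s} speedup {speedup(minutes):.2f}x  "
              f"efficiency {efficiency(minutes, procs):.2f}")
```

With 16 processes the ideal speedup would be 2x. The oversubscribed single node falls well short of that. The 2-node run gets much closer.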
I haven't had a chance yet to try Warner's suggestion of turning
Hyper-Threading off to see what effect that has on the speed.
Cheers, Jody