It depends on the characteristics of the nodes in question. You
mention the CPU speeds and the RAM, but there are other factors as
well: cache size, memory architecture, how many MPI processes you're
running, etc. Memory access patterns, particularly across UMA
machines like clovertown and follow-in intel architectures can really
get bogged down by the RAM bottlneck (all 8 cores hammering on memory
simultaneously via a single memory bus).
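One quick way to see that contention is a memory-bandwidth micro-benchmark. The sketch below is not from this thread; it is a minimal STREAM-triad-style loop over MPI, with arbitrary array size and iteration count. Running it with 1, 4, and 8 ranks on one node typically shows per-rank bandwidth dropping as more cores share the single memory bus.

/* Minimal sketch (not from the original thread): a STREAM-triad-style
 * bandwidth test per MPI rank.  Array size and iteration count are
 * arbitrary; they just need to be well outside the caches. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N (8 * 1024 * 1024)   /* ~64 MB per array of doubles, far larger than cache */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int iter = 0; iter < 10; iter++)
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];          /* triad: three memory streams per element */
    double t = MPI_Wtime() - t0;
    if (a[0] < 0.0) printf("unreachable\n");   /* keep the compiler from dropping the loop */

    /* bytes moved per rank: 3 arrays * 8 bytes * N elements * 10 iterations */
    double gb = 3.0 * 8.0 * (double)N * 10 / 1e9;
    double bw = gb / t, total;
    MPI_Reduce(&bw, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks: %.2f GB/s per rank, %.2f GB/s aggregate\n",
               size, bw, total);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}

If the aggregate number stops growing somewhere between 1 and 8 ranks, the node is memory-bandwidth bound, and adding more CPMD processes per node will not help.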
On Mar 9, 2009, at 10:30 AM, Sangamesh B wrote:
Dear Open MPI team,
With Open MPI 1.3, the Fortran application CPMD is installed on a
Rocks 4.3 cluster of dual-processor, quad-core Xeon nodes @ 3 GHz
(8 cores per node).
Two jobs (4 processes each) are run separately on two nodes: one node
has an InfiniBand connection (4 GB RAM) and the other has a Gigabit
Ethernet connection (8 GB RAM).
Note that the network connectivity should not be needed, as the two
jobs run in standalone mode.
Since each job runs on a single node, with no communication between
nodes, the performance of the two jobs should be the same irrespective
of network connectivity. But that is not the case here:
the Gigabit job takes double the time of the InfiniBand job.
Following are the details of two jobs:
Infiniband Job:
CPU TIME : 0 HOURS 10 MINUTES 21.71 SECONDS
ELAPSED TIME : 0 HOURS 10 MINUTES 23.08 SECONDS
*** CPMD| SIZE OF THE PROGRAM IS 301192/ 571044 kBYTES ***
Gigabit Job:
CPU TIME : 0 HOURS 12 MINUTES 7.93 SECONDS
ELAPSED TIME : 0 HOURS 21 MINUTES 0.07 SECONDS
*** CPMD| SIZE OF THE PROGRAM IS 123420/ 384344 kBYTES ***
More details are attached here in a file.
Why is there such a large difference between CPU TIME and ELAPSED TIME
for the Gigabit job?
This could be an issue with Open MPI itself. What could be the reason?
Are there any flags that need to be set?
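For reference, the two timers measure different things: CPU TIME counts processor time the process actually consumed, while ELAPSED TIME is wall-clock time and also includes time spent waiting (for memory, I/O, paging, or other processes). A minimal illustration, not specific to CPMD or Open MPI, using standard C library calls:

/* Minimal illustration (not from the thread): CPU time vs. elapsed time.
 * A process that spends most of its life waiting shows a large gap. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    clock_t c0 = clock();          /* CPU time consumed by this process */
    time_t  w0 = time(NULL);       /* wall-clock (elapsed) time */

    sleep(5);                      /* waiting, not computing */

    printf("CPU time:     %.2f s\n", (double)(clock() - c0) / CLOCKS_PER_SEC);
    printf("Elapsed time: %.0f s\n", difftime(time(NULL), w0));
    return 0;   /* CPU time ~0 s, elapsed ~5 s */
}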
Thanks in advance,
Sangamesh
<cpmd_gb_ib_1node><ATT3915213.txt>
--
Jeff Squyres
Cisco Systems