Hello Ralph & Jeff,

This is the same issue, but this time the job is running on a single node.
The two systems on which the jobs are run have the same hardware/OS configuration. The only differences are: one node has 4 GB RAM and is part of the InfiniBand-connected nodes; the other node has 8 GB RAM and is part of the gigabit-connected nodes. For both jobs only 4 processes are used, and all of the processes run on a single node. So why does the gigabit node take more time than the InfiniBand node? (ELAPSED TIME = wall-clock time.) A minimal timing sketch is appended after the quoted thread below.

Hope the problem is clear now.

Thanks,
Sangamesh

On Mon, Mar 9, 2009 at 10:56 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> It depends on the characteristics of the nodes in question. You mention the
> CPU speeds and the RAM, but there are other factors as well: cache size,
> memory architecture, how many MPI processes you're running, etc. Memory
> access patterns, particularly across UMA machines like Clovertown and
> follow-on Intel architectures, can really get bogged down by the RAM
> bottleneck (all 8 cores hammering on memory simultaneously via a single
> memory bus).
>
>
> On Mar 9, 2009, at 10:30 AM, Sangamesh B wrote:
>
>> Dear Open MPI team,
>>
>> With Open MPI 1.3, the Fortran application CPMD is installed on a
>> Rocks-4.3 cluster - dual-processor quad-core Xeon @ 3 GHz (8 cores
>> per node).
>>
>> Two jobs (4-process jobs) are run separately on two nodes - one node
>> has an IB connection (4 GB RAM) and the other node has a gigabit
>> connection (8 GB RAM).
>>
>> Note that the network connectivity is not used (and is not required),
>> as the two jobs run in stand-alone mode.
>>
>> Since each job runs on a single node - with no communication between
>> nodes - the performance of the two jobs should be the same irrespective
>> of network connectivity. But that is not the case here: the gigabit job
>> takes double the time of the InfiniBand job.
>>
>> Following are the details of the two jobs:
>>
>> InfiniBand job:
>>
>> CPU TIME :     0 HOURS 10 MINUTES 21.71 SECONDS
>> ELAPSED TIME : 0 HOURS 10 MINUTES 23.08 SECONDS
>> *** CPMD| SIZE OF THE PROGRAM IS 301192/ 571044 kBYTES ***
>>
>> Gigabit job:
>>
>> CPU TIME :     0 HOURS 12 MINUTES  7.93 SECONDS
>> ELAPSED TIME : 0 HOURS 21 MINUTES  0.07 SECONDS
>> *** CPMD| SIZE OF THE PROGRAM IS 123420/ 384344 kBYTES ***
>>
>> More details are attached here in a file.
>>
>> Why is there such a large difference between CPU TIME and ELAPSED TIME
>> for the gigabit job?
>>
>> Could this be an issue with Open MPI itself? What could be the reason?
>>
>> Are there any flags that need to be set?
>>
>> Thanks in advance,
>> Sangamesh
>>
>> <cpmd_gb_ib_1node><ATT3915213.txt>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
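To help pin down where the extra wall-clock time goes on the gigabit node, here is a minimal, self-contained sketch in C. It is not taken from CPMD; the triad kernel, array sizes, and iteration count are arbitrary assumptions chosen only for illustration. Each rank compares MPI_Wtime(), which measures elapsed (wall-clock) time, against clock(), which measures CPU time actually spent on a core. On a single-node run, a large wall-vs-CPU gap suggests the ranks are being scheduled off the CPU (e.g. paging to swap or oversubscription), while similar wall and CPU times that are both long point at contention for the shared memory bus, as Jeff describes.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Memory-bound triad kernel, chosen only to stress the shared memory bus. */
static void triad(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + 3.0 * c[i];
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t n = 16 * 1024 * 1024;     /* three 128 MB arrays per rank */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) {
        fprintf(stderr, "allocation failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double wall0 = MPI_Wtime();             /* wall-clock (elapsed) time   */
    clock_t cpu0 = clock();                 /* CPU time of this process    */

    for (int iter = 0; iter < 20; iter++)
        triad(a, b, c, n);

    double wall = MPI_Wtime() - wall0;
    double cpu  = (double)(clock() - cpu0) / CLOCKS_PER_SEC;

    /* wall >> cpu : the rank spent time off the CPU (paging, oversubscription)
       wall ~  cpu : the rank stayed on the CPU; if both are large, the ranks
                     are contending for the single memory bus                */
    printf("rank %d: wall = %.2f s, cpu = %.2f s\n", rank, wall, cpu);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}

Building with mpicc and running it with something like "mpirun -np 4 ./probe" on each of the two nodes should show, per rank, whether the elapsed time is going into computation or into waiting, without involving the interconnect at all.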