Hello INK,

   I've run a couple of jobs with different mpirun options.

CASE 1:

On one of the nodes connected to the InfiniBand network:

Job No 1:

mpirun command:

/opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl ^openib -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x job.in $PP_LIBRARY \
    > job_nn_out_omp_$JOB_ID



       CPU TIME :    0 HOURS 10 MINUTES 11.58 SECONDS
   ELAPSED TIME :    0 HOURS 10 MINUTES 30.51 SECONDS
 ***      CPMD| SIZE OF THE PROGRAM IS  123384/ 384344 kBYTES ***

 PROGRAM CPMD ENDED AT:   Wed Mar 11 12:38:48 2009


 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE              116817. BYTES                891.  =
 = BROADCAST                 123195. BYTES                284.  =
 = GLOBAL SUMMATION           32926. BYTES                404.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM          2799401. BYTES               1226.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE             1040.965  MB/S           0.100 SEC  =
 = BROADCAST                 388.748  MB/S           0.090 SEC  =
 = GLOBAL SUMMATION            0.924  MB/S          28.780 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM           121.233  MB/S          28.310 SEC  =
 = SYNCHRONISATION                                   0.010 SEC  =
 ================================================================
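
(As an aside - not one of the runs above, just a sketch of how the transport
selection could be verified: Open MPI can print which BTL components it opens
and selects if the btl framework verbosity is raised, e.g.

    /opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl ^openib --mca btl_base_verbose 30 \
        -np $NSLOTS -hostfile $TMPDIR/machines /bin/hostname

so one can confirm that openib is really excluded and that the on-node traffic
goes over the sm/self BTLs. The level 30 is just an assumed value high enough
to show the selection messages.)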


Job No 2:

/opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl ^tcp -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x job.in $PP_LIBRARY \
    > job_nn_omp_tcp$JOB_ID

       CPU TIME :    0 HOURS 10 MINUTES 42.46 SECONDS
   ELAPSED TIME :    0 HOURS 10 MINUTES 43.76 SECONDS
 ***      CPMD| SIZE OF THE PROGRAM IS  300480/ 567860 kBYTES ***

 PROGRAM CPMD ENDED AT:   Wed Mar 11 12:43:06 2009


 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE              116817. BYTES                891.  =
 = BROADCAST                 123195. BYTES                284.  =
 = GLOBAL SUMMATION           32926. BYTES                404.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM          2799401. BYTES               1226.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE             1487.163  MB/S           0.070 SEC  =
 = BROADCAST                 388.751  MB/S           0.090 SEC  =
 = GLOBAL SUMMATION            1.899  MB/S          14.010 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM           264.404  MB/S          12.980 SEC  =
 = SYNCHRONISATION                                   0.001 SEC  =
 ================================================================


Job No 3:

/opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x job.in $PP_LIBRARY \
    > job_nn_out_omp_$JOB_ID

       CPU TIME :    0 HOURS  9 MINUTES 31.99 SECONDS
   ELAPSED TIME :    0 HOURS  9 MINUTES 33.37 SECONDS
 ***      CPMD| SIZE OF THE PROGRAM IS  301192/ 571044 kBYTES ***

 PROGRAM CPMD ENDED AT:   Wed Mar 11 20:25:12 2009


 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE              116817. BYTES                891.  =
 = BROADCAST                 123195. BYTES                284.  =
 = GLOBAL SUMMATION           32926. BYTES                404.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM          2799401. BYTES               1226.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE             2600.799  MB/S           0.040 SEC  =
 = BROADCAST                 349.872  MB/S           0.100 SEC  =
 = GLOBAL SUMMATION            3.811  MB/S           6.980 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM           286.729  MB/S          11.970 SEC  =
 = SYNCHRONISATION                                   0.010 SEC  =
 ================================================================


CASE 2:

On one of the nodes connected to the Gigabit Ethernet network:

Job No 1:

/opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x job.in $PP_LIBRARY \
    > job_nn_GB_out_omp_$JOB_ID

       CPU TIME :    0 HOURS  5 MINUTES 57.45 SECONDS
   ELAPSED TIME :    0 HOURS  6 MINUTES 10.21 SECONDS
 ***      CPMD| SIZE OF THE PROGRAM IS  123392/ 384344 kBYTES ***

 PROGRAM CPMD ENDED AT:   Wed Mar 11 20:07:52 2009


 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE              116817. BYTES                891.  =
 = BROADCAST                 123195. BYTES                284.  =
 = GLOBAL SUMMATION           32926. BYTES                404.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM          2799401. BYTES               1226.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE             2081.711  MB/S           0.050 SEC  =
 = BROADCAST                 583.121  MB/S           0.060 SEC  =
 = GLOBAL SUMMATION            3.514  MB/S           7.570 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM           438.891  MB/S           7.820 SEC  =
 = SYNCHRONISATION                                   0.010 SEC  =
 ================================================================

Job No 2:

/opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl sm,self,tcp -np $NSLOTS \
    -hostfile $TMPDIR/machines \
    /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x job.in $PP_LIBRARY \
    > job_nn_GB_out_omp_$JOB_ID

       CPU TIME :    0 HOURS  6 MINUTES 37.24 SECONDS
   ELAPSED TIME :    0 HOURS  6 MINUTES 49.97 SECONDS
 ***      CPMD| SIZE OF THE PROGRAM IS  123416/ 384344 kBYTES ***

 PROGRAM CPMD ENDED AT:   Wed Mar 11 20:09:32 2009


 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE              116817. BYTES                891.  =
 = BROADCAST                 123195. BYTES                284.  =
 = GLOBAL SUMMATION           32926. BYTES                404.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM          2799401. BYTES               1226.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE             2080.441  MB/S           0.050 SEC  =
 = BROADCAST                 583.130  MB/S           0.060 SEC  =
 = GLOBAL SUMMATION            2.043  MB/S          13.020 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM           338.792  MB/S          10.130 SEC  =
 = SYNCHRONISATION                                   0.001 SEC  =
 ================================================================


Observations:

For all jobs, 4 processes were used, and they all ran on a single node.
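
(Since everything runs inside one node, one check that could take the network
out of the picture completely - this is only a sketch, not a run I have done -
is to restrict Open MPI to the shared-memory and self BTLs and print the
process map:

    /opt/mpi/openmpi/1.3/intel/bin/mpirun --mca btl sm,self --display-map \
        -np $NSLOTS -hostfile $TMPDIR/machines \
        /opt/apps/cpmd/3.11/ompi-atlas/SOURCE/cpmd311-ompi-atlas.x job.in $PP_LIBRARY

If the timings on the two nodes still differ with sm,self only, the network
itself cannot be the cause.)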

This time the Gigabit jobs are performing far better than the InfiniBand jobs:
the Gigabit jobs took approximately 6 minutes and the InfiniBand jobs
approximately 10 minutes.

What factors may be causing this change?

While these jobs were running, there were no other jobs on the Gigabit nodes -
they were completely free. The InfiniBand nodes, however, were almost fully
occupied by other jobs. Could this be causing the lower performance of the IB
jobs?
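
(For what it's worth, the load on the execution node could be recorded from the
job script just before mpirun starts - a trivial sketch:

    uptime
    cat /proc/loadavg

A load average at or above the number of cores would mean other jobs are
competing for the CPUs and the memory bus.)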

Note that all jobs were submitted through Grid Engine from the master node. In
this case, even though the 4 processes run on a single node, will there be any
communication/link between the master node and the execution node?
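
(One thing that could be checked - again only a sketch using the same files as
above - is what the SGE-generated hostfile actually contains and where the
ranks land:

    cat $TMPDIR/machines
    /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS -hostfile $TMPDIR/machines hostname

If only the execution node is listed, then, as far as I understand, all of the
MPI traffic stays on that node and only the Grid Engine submission and
stdout/stderr spooling involve the master node.)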

Thanks,
Sangamesh

On Tue, Mar 10, 2009 at 4:46 PM, Igor Kozin <i.n.ko...@googlemail.com> wrote:
> Hi Sangamesh,
> As far as I can tell there should be no difference if you run CPMD on a
> single node whether with or without ib. One easy thing that you could do is
> to repeat your runs on the infiniband node(s) with and without infiniband
> using --mca btl ^tcp and --mca btl ^openib respectively. But since you are
> using a single node I doubt it will make any difference.
>
> I agree with Jeff that there are many factors you need to be sure of. Please
> note that not only your elapsed times but also your CPU times are different.
> Furthermore the difference in communication times as indicated in your CPMD
> outputs cannot be the only reason for the difference in the elapsed times.
> CPMD, MKL, and compiler versions, memory bandwidth, i/o and rogue processes
> running on a node could be the differentiating factors.
>
> The standard wat32 benchmark is a good test for a single node. You can find
> our benchmarking results here if you want to compare yours:
> http://www.cse.scitech.ac.uk/disco/dbd/index.html
>
> Regards,
>
> INK
>
> 2009/3/10 Sangamesh B <forum....@gmail.com>
>>
>> Hello Ralph & Jeff,
>>
>>    This is the same issue - but this time the job is running on a single
>> node.
>>
>> The two systems on which the jobs are run have the same hardware/OS
>> configuration. The only differences are:
>>
>> One node has 4 GB of RAM and is part of the InfiniBand-connected nodes.
>>
>> The other node has 8 GB of RAM and is part of the Gigabit-connected nodes.
>>
>> For both jobs only 4 processes are used.
>>
>> All the processes are run on a single node.
>>
>> But why is the GB node taking more time than the IB node?
>>
>> {ELAPSED TIME = WALL CLOCK TIME}
>>
>> I hope the problem is clear now.
>>
>> Thanks,
>> Sangamesh
>> On Mon, Mar 9, 2009 at 10:56 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>> > It depends on the characteristics of the nodes in question.  You mention
>> > the
>> > CPU speeds and the RAM, but there are other factors as well: cache size,
>> > memory architecture, how many MPI processes you're running, etc.  Memory
>> > access patterns, particularly across UMA machines like Clovertown and
>> > follow-on Intel architectures, can really get bogged down by the RAM
>> > bottleneck (all 8 cores hammering on memory simultaneously via a single
>> > memory bus).
>> >
>> >
>> >
>> > On Mar 9, 2009, at 10:30 AM, Sangamesh B wrote:
>> >
>> >> Dear Open MPI team,
>> >>
>> >>      With Open MPI 1.3, the Fortran application CPMD is installed on a
>> >> Rocks 4.3 cluster - dual-processor quad-core Xeon @ 3 GHz (8 cores
>> >> per node).
>> >>
>> >> Two jobs (4 processes each) are run on two nodes, separately - one node
>> >> has an IB connection (4 GB RAM) and the other node has a Gigabit
>> >> connection (8 GB RAM).
>> >>
>> >> Note that the network connectivity may not be used, and is not required,
>> >> since the two jobs are running in stand-alone mode.
>> >>
>> >> Since the jobs are running on a single node - with no communication
>> >> between nodes - the performance of both jobs should be the same
>> >> irrespective of network connectivity. But here this is not the case:
>> >> the Gigabit job is taking double the time of the InfiniBand job.
>> >>
>> >> Following are the details of two jobs:
>> >>
>> >> Infiniband Job:
>> >>
>> >>      CPU TIME :    0 HOURS 10 MINUTES 21.71 SECONDS
>> >>   ELAPSED TIME :    0 HOURS 10 MINUTES 23.08 SECONDS
>> >>  ***      CPMD| SIZE OF THE PROGRAM IS  301192/ 571044 kBYTES ***
>> >>
>> >> Gigabit Job:
>> >>
>> >>       CPU TIME :    0 HOURS 12 MINUTES  7.93 SECONDS
>> >>   ELAPSED TIME :    0 HOURS 21 MINUTES  0.07 SECONDS
>> >>  ***      CPMD| SIZE OF THE PROGRAM IS  123420/ 384344 kBYTES ***
>> >>
>> >> More details are attached here in a file.
>> >>
>> >> Why is there such a large difference between CPU TIME and ELAPSED TIME
>> >> for the Gigabit job?
>> >>
>> >> This could be an issue with Open MPI itself. What could be the reason?
>> >>
>> >> Are there any flags that need to be set?
>> >>
>> >> Thanks in advance,
>> >> Sangamesh
>> >>
>> >> <cpmd_gb_ib_1node><ATT3915213.txt>
>> >
>> >
>> > --
>> > Jeff Squyres
>> > Cisco Systems
>> >
