Hi Jeff,

Thanks for your response.
Is there any requirement on the size of the data buffers
I should use in these warmup broadcasts? If I use a small
warmup buffer, say 1000 real values, the subsequent actual,
timed MPI_BCAST over IB still takes a long time (more than it
does over GigE). If I use a bigger warmup buffer of 10000 real
values, the subsequent timed MPI_BCAST is quick.
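
Concretely, the warmup pattern I'm testing looks roughly like this
(just a sketch; wbuf and nwarm are illustrative names and would need
a declaration like dbuf's, while the timed part matches the program
at the end of this mail):

! Sketch of the warmup pattern described above.
! wbuf is declared like dbuf: real*8, allocatable:: wbuf(:)
nwarm = 1000                      ! or 10000 in the other test
allocate(wbuf(nwarm))
wbuf = 0.0d0
! Untimed warmup broadcast of the small buffer.
call MPI_BCAST(wbuf,nwarm, &
   MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierror)
! Timed broadcast of the real payload.
t1 = MPI_WTIME()
call MPI_BCAST(dbuf,ndat, &
   MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierror)
t2 = MPI_WTIME()
write(*,*)'time for bcast',t2-t1
deallocate(wbuf)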

Surprisingly, just doing two consecutive 80K-byte MPI_BCASTs
(with no separate warmup and timed phases) is very quick,
whereas a single 80K-byte broadcast is slow. Not sure if I'm
missing something!
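
For reference, that two-broadcast experiment is just the test program
at the end of this mail with the timed region changed to something like
the following sketch (t3 is an extra local; ndat = 10000 for the
80K-byte case):

! Sketch: time two back-to-back broadcasts of the same buffer
! separately, instead of a single timed MPI_BCAST.
t1 = MPI_WTIME()
call MPI_BCAST(dbuf,ndat, &
   MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierror)
t2 = MPI_WTIME()
call MPI_BCAST(dbuf,ndat, &
   MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierror)
t3 = MPI_WTIME()
write(*,*)'time for first bcast ',t2-t1
write(*,*)'time for second bcast',t3-t2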

Thanks for your time and suggestions,
--Krishna.

On Mon, 12 Jan 2009, Jeff Squyres wrote:

You might want to do some "warmup" bcasts before doing your timing measurements.

Open MPI makes network connections lazily, meaning that we only make connections upon the first send (e.g., the sends underneath the MPI_BCAST). So the first MPI_BCAST is likely to be quite slow, while all the IB network connections are being made. Subsequent bcasts are likely to be much faster.


On Jan 9, 2009, at 8:47 PM, kmur...@lbl.gov wrote:


Hello there,

We have a DDR IB cluster running Open MPI version 1.2.8.
I'm testing on two nodes with two processors each; the
nodes are adjacent (two hops apart) on the same leaf
of the tree interconnect.

I observe that an MPI_BCAST among the four MPI tasks
takes a lot of time over the IB network (more than over
the GigE network) when the payload size ranges from
24K bytes to 800K bytes.

For payloads below 8K bytes and above 200K bytes the performance
is acceptable.

Any suggestions on how I can debug this and locate the source of
the problem? (More info below.) Please let me know if you need
any more information from my side.

Thanks for your time,
Krishna Muriki,
HPC User Services,
Scientific Cluster Support,
Lawrence Berkeley National Laboratory.

I) Payload size 8M bytes over the IB network:

[kmuriki@n0005 pub]$ mpirun -v -display-map --mca btl openib,self -np 4 -hostfile hostfile.lr ./testbcast.8000000
[n0005.scs00:13902]  Map for job: 1     Generated by mapping mode: byslot
      Starting vpid: 0        Vpid range: 4   Num app_contexts: 1
      Data for app_context: index 0   app: ./testbcast.8000000
              Num procs: 4
              Argv[0]: ./testbcast.8000000
              Env[0]: OMPI_MCA_btl=openib,self
              Env[1]: OMPI_MCA_rmaps_base_display_map=1
              Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
Env[3]: OMPI_MCA_orte_precondition_transports=1405b3b501aa4086-00dbc7151c7348e1
              Env[4]: OMPI_MCA_rds=proxy
              Env[5]: OMPI_MCA_ras=proxy
              Env[6]: OMPI_MCA_rmaps=proxy
              Env[7]: OMPI_MCA_pls=proxy
              Env[8]: OMPI_MCA_rmgr=proxy
Working dir: /global/home/users/kmuriki/sample_executables/pub (user: 0)
              Num maps: 0
      Num elements in nodes list: 2
      Mapped node:
Cell: 0 Nodename: n0172.lr Launch id: -1 Username: NULL
              Daemon name:
                      Data type: ORTE_PROCESS_NAME    Data Value: NULL
              Oversubscribed: False   Num elements in procs list: 2
              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,0]
                      Proc Rank: 0    Proc PID: 0     App_context index: 0

              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,1]
                      Proc Rank: 1    Proc PID: 0     App_context index: 0

      Mapped node:
Cell: 0 Nodename: n0173.lr Launch id: -1 Username: NULL
              Daemon name:
                      Data type: ORTE_PROCESS_NAME    Data Value: NULL
              Oversubscribed: False   Num elements in procs list: 2
              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,2]
                      Proc Rank: 2    Proc PID: 0     App_context index: 0

              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,3]
                      Proc Rank: 3    Proc PID: 0     App_context index: 0
About to call broadcast           3
About to call broadcast           1
About to call broadcast           2
About to call broadcast           0
Done with call to broadcast           2
time for bcast  0.133496046066284
Done with call to broadcast           3
time for bcast  0.148098945617676
Done with call to broadcast           0
time for bcast  0.113168954849243
Done with call to broadcast           1
time for bcast  0.145189046859741
[kmuriki@n0005 pub]$


II) Payload size 80K bytes over the GigE network:

[kmuriki@n0005 pub]$ mpirun -v -display-map --mca btl tcp,self -np 4 -hostfile hostfile.lr ./testbcast.80000
[n0005.scs00:13928]  Map for job: 1     Generated by mapping mode: byslot
      Starting vpid: 0        Vpid range: 4   Num app_contexts: 1
      Data for app_context: index 0   app: ./testbcast.80000
              Num procs: 4
              Argv[0]: ./testbcast.80000
              Env[0]: OMPI_MCA_btl=tcp,self
              Env[1]: OMPI_MCA_rmaps_base_display_map=1
              Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
Env[3]: OMPI_MCA_orte_precondition_transports=305b93d4acc82685-12bbf20d2e6d250b
              Env[4]: OMPI_MCA_rds=proxy
              Env[5]: OMPI_MCA_ras=proxy
              Env[6]: OMPI_MCA_rmaps=proxy
              Env[7]: OMPI_MCA_pls=proxy
              Env[8]: OMPI_MCA_rmgr=proxy
Working dir: /global/home/users/kmuriki/sample_executables/pub (user: 0)
              Num maps: 0
      Num elements in nodes list: 2
      Mapped node:
Cell: 0 Nodename: n0172.lr Launch id: -1 Username: NULL
              Daemon name:
                      Data type: ORTE_PROCESS_NAME    Data Value: NULL
              Oversubscribed: False   Num elements in procs list: 2
              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,0]
                      Proc Rank: 0    Proc PID: 0     App_context index: 0

              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,1]
                      Proc Rank: 1    Proc PID: 0     App_context index: 0

      Mapped node:
Cell: 0 Nodename: n0173.lr Launch id: -1 Username: NULL
              Daemon name:
                      Data type: ORTE_PROCESS_NAME    Data Value: NULL
              Oversubscribed: False   Num elements in procs list: 2
              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,2]
                      Proc Rank: 2    Proc PID: 0     App_context index: 0

              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,3]
                      Proc Rank: 3    Proc PID: 0     App_context index: 0
About to call broadcast           0
About to call broadcast           2
About to call broadcast           1
Done with call to broadcast           2
time for bcast  7.137393951416016E-002
About to call broadcast           3
Done with call to broadcast           3
time for bcast  1.110005378723145E-002
Done with call to broadcast           0
time for bcast  7.121706008911133E-002
Done with call to broadcast           1
time for bcast  3.379988670349121E-002
[kmuriki@n0005 pub]$

III) Payload size 80K bytes over the IB network:


[kmuriki@n0005 pub]$ mpirun -v -display-map --mca btl openib,self -np 4 -hostfile hostfile.lr ./testbcast.80000
[n0005.scs00:13941]  Map for job: 1     Generated by mapping mode: byslot
      Starting vpid: 0        Vpid range: 4   Num app_contexts: 1
      Data for app_context: index 0   app: ./testbcast.80000
              Num procs: 4
              Argv[0]: ./testbcast.80000
              Env[0]: OMPI_MCA_btl=openib,self
              Env[1]: OMPI_MCA_rmaps_base_display_map=1
              Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
Env[3]: OMPI_MCA_orte_precondition_transports=4cdb5ae2babe9010-709842ac574605f9
              Env[4]: OMPI_MCA_rds=proxy
              Env[5]: OMPI_MCA_ras=proxy
              Env[6]: OMPI_MCA_rmaps=proxy
              Env[7]: OMPI_MCA_pls=proxy
              Env[8]: OMPI_MCA_rmgr=proxy
Working dir: /global/home/users/kmuriki/sample_executables/pub (user: 0)
              Num maps: 0
      Num elements in nodes list: 2
      Mapped node:
Cell: 0 Nodename: n0172.lr Launch id: -1 Username: NULL
              Daemon name:
                      Data type: ORTE_PROCESS_NAME    Data Value: NULL
              Oversubscribed: False   Num elements in procs list: 2
              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,0]
                      Proc Rank: 0    Proc PID: 0     App_context index: 0

              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,1]
                      Proc Rank: 1    Proc PID: 0     App_context index: 0

      Mapped node:
Cell: 0 Nodename: n0173.lr Launch id: -1 Username: NULL
              Daemon name:
                      Data type: ORTE_PROCESS_NAME    Data Value: NULL
              Oversubscribed: False   Num elements in procs list: 2
              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,2]
                      Proc Rank: 2    Proc PID: 0     App_context index: 0

              Mapped proc:
                      Proc Name:
                      Data type: ORTE_PROCESS_NAME    Data Value: [0,1,3]
                      Proc Rank: 3    Proc PID: 0     App_context index: 0
About to call broadcast           0
About to call broadcast           3
About to call broadcast           1
Done with call to broadcast           1
time for bcast  2.550005912780762E-002
About to call broadcast           2
Done with call to broadcast           2
time for bcast  2.154898643493652E-002
Done with call to broadcast           3
Done with call to broadcast           0
time for bcast   38.1956140995026
time for bcast   38.2115209102631
[kmuriki@n0005 pub]$

Finally, here is the Fortran code I'm playing with; I modify the
payload size by changing the value of the variable 'ndat':

[kmuriki@n0005 pub]$ more testbcast.f90
program em3d
  implicit real*8 (a-h,o-z)
  include 'mpif.h'
  ! em3d_inv main driver
  !  INITIALIZE MPI AND DETERMINE BOTH INDIVIDUAL PROCESSOR #
  !   AND THE TOTAL NUMBER OF PROCESSORS
  !
  integer:: Proc
  real*8, allocatable:: dbuf(:)

  call MPI_INIT(ierror)
  call MPI_COMM_RANK(MPI_COMM_WORLD,Proc,IERROR)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,Num_Proc,IERROR)

  ndat=1000000

  !print*,'bcasting to no of tasks',num_proc
  allocate(dbuf(ndat))
  do i=1,ndat
     dbuf(i)=dble(i)
  enddo

  print*, 'About to call broadcast',proc
  t1=MPI_WTIME()
  call MPI_BCAST(dbuf,ndat, &
     MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierror)
  print*, 'Done with call to broadcast',proc
  t2=MPI_WTIME()
  write(*,*)'time for bcast',t2-t1

  deallocate(dbuf)
  call MPI_FINALIZE(IERROR)
end program em3d
[kmuriki@n0005 pub]$
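
(For reference, the payload size in bytes is ndat times 8, since the
buffer holds double-precision values: ndat = 10000 gives the 80K-byte
runs above and ndat = 1000000 gives the 8M-byte run.)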



--
Jeff Squyres
Cisco Systems
