Hello there,
We have a DDR InfiniBand cluster running Open MPI version 1.2.8.
I'm testing on two nodes with two processors each; the two nodes
are adjacent (two hops apart) on the same leaf of the tree
interconnect.
I observe that an MPI_BCAST among the four MPI tasks takes much
longer over the IB network than over the GigE network when the
payload size is roughly in the 24K-byte to 800K-byte range. For
payloads below 8K bytes and above that range the performance is
acceptable. (The note just after my signature shows how these
payload sizes map onto the test program's 'ndat'.)
Any suggestions on how I can debug this and locate the source of
the problem? (More info below.) Please let me know if you need
any more information from my side.
Thanks for your time,
Krishna Muriki,
HPC User Services,
Scientific Cluster Support,
Lawrence Berkeley National Laboratory.
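For reference, the broadcast buffer in the test code at the end of this
mail is real*8, so the payload in bytes is 8 * ndat; the suffix on the
executable names below appears to be that byte count. A tiny worked
example of the arithmetic (the ndat values here are just illustrations):
program payload_sizes
  implicit none
  integer, parameter :: cases(3) = (/ 1000, 10000, 1000000 /)  ! example ndat values
  integer :: i
  do i = 1, size(cases)
     ! 1000 -> 8K bytes, 10000 -> 80K bytes (testbcast.80000),
     ! 1000000 -> 8M bytes (testbcast.8000000)
     write(*,'(a,i8,a,i9,a)') 'ndat = ', cases(i), ', payload = ', 8*cases(i), ' bytes'
  end do
end program payload_sizes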
I) Payload size 8M bytes over IB:
[kmuriki@n0005 pub]$ mpirun -v -display-map --mca btl openib,self -np 4
-hostfile hostfile.lr ./testbcast.8000000
[n0005.scs00:13902] Map for job: 1 Generated by mapping mode: byslot
Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
Data for app_context: index 0 app: ./testbcast.8000000
Num procs: 4
Argv[0]: ./testbcast.8000000
Env[0]: OMPI_MCA_btl=openib,self
Env[1]: OMPI_MCA_rmaps_base_display_map=1
Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
Env[3]:
OMPI_MCA_orte_precondition_transports=1405b3b501aa4086-00dbc7151c7348e1
Env[4]: OMPI_MCA_rds=proxy
Env[5]: OMPI_MCA_ras=proxy
Env[6]: OMPI_MCA_rmaps=proxy
Env[7]: OMPI_MCA_pls=proxy
Env[8]: OMPI_MCA_rmgr=proxy
Working dir:
/global/home/users/kmuriki/sample_executables/pub (user: 0)
Num maps: 0
Num elements in nodes list: 2
Mapped node:
Cell: 0 Nodename: n0172.lr Launch id: -1 Username:
NULL
Daemon name:
Data type: ORTE_PROCESS_NAME Data Value: NULL
Oversubscribed: False Num elements in procs list: 2
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,0]
Proc Rank: 0 Proc PID: 0 App_context index: 0
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,1]
Proc Rank: 1 Proc PID: 0 App_context index: 0
Mapped node:
Cell: 0 Nodename: n0173.lr Launch id: -1 Username:
NULL
Daemon name:
Data type: ORTE_PROCESS_NAME Data Value: NULL
Oversubscribed: False Num elements in procs list: 2
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,2]
Proc Rank: 2 Proc PID: 0 App_context index: 0
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,3]
Proc Rank: 3 Proc PID: 0 App_context index: 0
About to call broadcast 3
About to call broadcast 1
About to call broadcast 2
About to call broadcast 0
Done with call to broadcast 2
time for bcast 0.133496046066284
Done with call to broadcast 3
time for bcast 0.148098945617676
Done with call to broadcast 0
time for bcast 0.113168954849243
Done with call to broadcast 1
time for bcast 0.145189046859741
[kmuriki@n0005 pub]$
II) Payload size 80K bytes over GigE:
[kmuriki@n0005 pub]$ mpirun -v -display-map --mca btl tcp,self -np 4
-hostfile hostfile.lr ./testbcast.80000
[n0005.scs00:13928] Map for job: 1 Generated by mapping mode: byslot
Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
Data for app_context: index 0 app: ./testbcast.80000
Num procs: 4
Argv[0]: ./testbcast.80000
Env[0]: OMPI_MCA_btl=tcp,self
Env[1]: OMPI_MCA_rmaps_base_display_map=1
Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
Env[3]:
OMPI_MCA_orte_precondition_transports=305b93d4acc82685-12bbf20d2e6d250b
Env[4]: OMPI_MCA_rds=proxy
Env[5]: OMPI_MCA_ras=proxy
Env[6]: OMPI_MCA_rmaps=proxy
Env[7]: OMPI_MCA_pls=proxy
Env[8]: OMPI_MCA_rmgr=proxy
Working dir:
/global/home/users/kmuriki/sample_executables/pub (user: 0)
Num maps: 0
Num elements in nodes list: 2
Mapped node:
Cell: 0 Nodename: n0172.lr Launch id: -1 Username:
NULL
Daemon name:
Data type: ORTE_PROCESS_NAME Data Value: NULL
Oversubscribed: False Num elements in procs list: 2
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,0]
Proc Rank: 0 Proc PID: 0 App_context index: 0
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,1]
Proc Rank: 1 Proc PID: 0 App_context index: 0
Mapped node:
Cell: 0 Nodename: n0173.lr Launch id: -1 Username:
NULL
Daemon name:
Data type: ORTE_PROCESS_NAME Data Value: NULL
Oversubscribed: False Num elements in procs list: 2
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,2]
Proc Rank: 2 Proc PID: 0 App_context index: 0
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,3]
Proc Rank: 3 Proc PID: 0 App_context index: 0
About to call broadcast 0
About to call broadcast 2
About to call broadcast 1
Done with call to broadcast 2
time for bcast 7.137393951416016E-002
About to call broadcast 3
Done with call to broadcast 3
time for bcast 1.110005378723145E-002
Done with call to broadcast 0
time for bcast 7.121706008911133E-002
Done with call to broadcast 1
time for bcast 3.379988670349121E-002
[kmuriki@n0005 pub]$
III) Payload size 80K bytes over IB:
[kmuriki@n0005 pub]$ mpirun -v -display-map --mca btl openib,self -np 4
-hostfile hostfile.lr ./testbcast.80000
[n0005.scs00:13941] Map for job: 1 Generated by mapping mode: byslot
Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
Data for app_context: index 0 app: ./testbcast.80000
Num procs: 4
Argv[0]: ./testbcast.80000
Env[0]: OMPI_MCA_btl=openib,self
Env[1]: OMPI_MCA_rmaps_base_display_map=1
Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
Env[3]:
OMPI_MCA_orte_precondition_transports=4cdb5ae2babe9010-709842ac574605f9
Env[4]: OMPI_MCA_rds=proxy
Env[5]: OMPI_MCA_ras=proxy
Env[6]: OMPI_MCA_rmaps=proxy
Env[7]: OMPI_MCA_pls=proxy
Env[8]: OMPI_MCA_rmgr=proxy
Working dir:
/global/home/users/kmuriki/sample_executables/pub (user: 0)
Num maps: 0
Num elements in nodes list: 2
Mapped node:
Cell: 0 Nodename: n0172.lr Launch id: -1 Username:
NULL
Daemon name:
Data type: ORTE_PROCESS_NAME Data Value: NULL
Oversubscribed: False Num elements in procs list: 2
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,0]
Proc Rank: 0 Proc PID: 0 App_context index: 0
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,1]
Proc Rank: 1 Proc PID: 0 App_context index: 0
Mapped node:
Cell: 0 Nodename: n0173.lr Launch id: -1 Username:
NULL
Daemon name:
Data type: ORTE_PROCESS_NAME Data Value: NULL
Oversubscribed: False Num elements in procs list: 2
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,2]
Proc Rank: 2 Proc PID: 0 App_context index: 0
Mapped proc:
Proc Name:
Data type: ORTE_PROCESS_NAME Data Value: [0,1,3]
Proc Rank: 3 Proc PID: 0 App_context index: 0
About to call broadcast 0
About to call broadcast 3
About to call broadcast 1
Done with call to broadcast 1
time for bcast 2.550005912780762E-002
About to call broadcast 2
Done with call to broadcast 2
time for bcast 2.154898643493652E-002
Done with call to broadcast 3
Done with call to broadcast 0
time for bcast 38.1956140995026
time for bcast 38.2115209102631
[kmuriki@n0005 pub]$
Finally, here is the Fortran code I'm playing with; I change the
payload size by modifying the value of the variable 'ndat' (payload
bytes = 8 * ndat, since the buffer holds real*8 values). A
barrier-synchronized variant of the timing section is sketched after
the listing:
[kmuriki@n0005 pub]$ more testbcast.f90
program em3d
  ! em3d_inv main driver
  ! Initialize MPI and determine both the individual processor
  ! number and the total number of processors.
  implicit real*8 (a-h,o-z)
  include 'mpif.h'

  integer :: Proc
  real*8, allocatable :: dbuf(:)

  call MPI_INIT(ierror)
  call MPI_COMM_RANK(MPI_COMM_WORLD, Proc, ierror)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, Num_Proc, ierror)

  ndat = 1000000
  !print*, 'bcasting to no of tasks', num_proc

  allocate(dbuf(ndat))
  do i = 1, ndat
     dbuf(i) = dble(i)
  enddo

  print*, 'About to call broadcast', proc
  t1 = MPI_WTIME()
  call MPI_BCAST(dbuf, ndat, &
       MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierror)
  print*, 'Done with call to broadcast', proc
  t2 = MPI_WTIME()
  write(*,*) 'time for bcast', t2 - t1

  deallocate(dbuf)
  call MPI_FINALIZE(ierror)
end program em3d
[kmuriki@n0005 pub]$
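If it helps, here is a minimal, barrier-synchronized variant of the
timing section (just a sketch of how I could tighten the measurement,
not the code that produced the output above): the barrier starts all
ranks together and t2 is taken before any I/O, so the four per-rank
times are directly comparable. ndat = 10000 (80K bytes) is only an
example value.
program bcast_timed
  implicit none
  include 'mpif.h'
  integer :: ierror, proc, num_proc, ndat, i
  real*8  :: t1, t2
  real*8, allocatable :: dbuf(:)

  call MPI_INIT(ierror)
  call MPI_COMM_RANK(MPI_COMM_WORLD, proc, ierror)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, num_proc, ierror)

  ndat = 10000                    ! 10000 real*8 values = 80K bytes
  allocate(dbuf(ndat))
  do i = 1, ndat
     dbuf(i) = dble(i)
  end do

  call MPI_BARRIER(MPI_COMM_WORLD, ierror)   ! start all ranks together
  t1 = MPI_WTIME()
  call MPI_BCAST(dbuf, ndat, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierror)
  t2 = MPI_WTIME()                           ! stop the clock before any I/O
  write(*,*) 'rank', proc, 'bcast time', t2 - t1

  deallocate(dbuf)
  call MPI_FINALIZE(ierror)
end program bcast_timed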