Hi Jeff,
Here is the code with a warmup broadcast of 10K real values and
actual broadcast of 100K real*8 values (different buffers):
[kmuriki@n pub]$ more testbcast.f90
program em3d
implicit real*8 (a-h,o-z)
include 'mpif.h'
! em3d_inv main driver
! INITIALIZE MPI AND DETERMINE BOTH INDIVID
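(The listing above is cut off by the digest. For readers following along, here is a minimal self-contained sketch of such a test; the buffer sizes match the description above, but the variable names and structure are illustrative, not the actual testbcast.f90.)

```fortran
program testbcast
  implicit none
  include 'mpif.h'
  integer, parameter :: nwarm = 10000, nbig = 100000
  real*8 :: warmbuf(nwarm), bigbuf(nbig)
  real*8 :: t0, t1
  integer :: ierr, myrank, nprocs

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  warmbuf = 1.0d0
  bigbuf  = 2.0d0

  ! Warmup bcast: forces Open MPI to establish its lazy connections
  ! (and, over IB, to warm the registered-memory pools).
  call MPI_BCAST(warmbuf, nwarm, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  ! Synchronize so the timing below measures only the bcast itself.
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  t0 = MPI_WTIME()
  call MPI_BCAST(bigbuf, nbig, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  t1 = MPI_WTIME()

  if (myrank == 0) print *, 'bcast of', nbig, 'real*8 took', t1 - t0, 's'
  call MPI_FINALIZE(ierr)
end program testbcast
```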
On Jan 13, 2009, at 3:32 PM, kmur...@lbl.gov wrote:
With IB, there's also the issue of registered memory. Open MPI
v1.2.x defaults to copy in/copy out semantics (with pre-registered
memory) until the message reaches a certain size, and then it uses
a pipelined register/RDMA protocol.
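One way to see the eager-to-rendezvous protocol switch described above is to sweep the broadcast size and look for a change in the per-byte cost. This is a hedged sketch, not from the original thread; the crossover point is an Open MPI tunable, so the exact size at which it appears will vary.

```fortran
program bcast_sweep
  implicit none
  include 'mpif.h'
  integer, parameter :: nmax = 1048576
  real*8 :: buf(nmax), t0, t1
  integer :: ierr, myrank, n

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  buf = 0.0d0

  ! Warmup to establish connections before any timing.
  call MPI_BCAST(buf, nmax, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  n = 1024
  do while (n <= nmax)
     call MPI_BARRIER(MPI_COMM_WORLD, ierr)
     t0 = MPI_WTIME()
     call MPI_BCAST(buf, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
     t1 = MPI_WTIME()
     ! A jump in seconds-per-byte as n grows suggests the protocol crossover.
     if (myrank == 0) print *, n, ' doubles: ', t1 - t0, ' s'
     n = n * 2
  end do

  call MPI_FINALIZE(ierr)
end program bcast_sweep
```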
Hi Jeff,
Please read below:
On Jan 12, 2009, at 2:50 PM, kmur...@lbl.gov wrote:
Is there any requirement on the size of the data buffers
I should use in these warmup broadcasts? If I use small
buffers like 1000 real values during warmup, the following
actual and timed MPI_BCAST over IB is taking a lot of time
(more than that on GigE).
Hi Jeff,
Thanks for your response.
Is there any requirement on the size of the data buffers
I should use in these warmup broadcasts? If I use small
buffers like 1000 real values during warmup, the following
actual and timed MPI_BCAST over IB is taking a lot of time
(more than that on GigE).
You might want to do some "warmup" bcasts before doing your timing
measurements.
Open MPI makes network connections lazily, meaning that we only make
connections upon the first send (e.g., the sends underneath the
MPI_BCAST). So the first MPI_BCAST is likely to be quite slow, while
all the subsequent ones should be much faster.
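A common way to make the post-warmup numbers trustworthy is to time several repetitions and report the average (or minimum), so that a single slow iteration, or the lazy connection setup described above, does not dominate the result. A sketch, with illustrative names and counts:

```fortran
program bcast_avg
  implicit none
  include 'mpif.h'
  integer, parameter :: n = 100000, reps = 20
  real*8 :: buf(n), t0, tsum
  integer :: ierr, myrank, i

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  buf = 1.0d0

  ! First bcast is untimed: it pays the lazy connection-setup cost.
  call MPI_BCAST(buf, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  tsum = 0.0d0
  do i = 1, reps
     call MPI_BARRIER(MPI_COMM_WORLD, ierr)
     t0 = MPI_WTIME()
     call MPI_BCAST(buf, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
     tsum = tsum + (MPI_WTIME() - t0)
  end do

  if (myrank == 0) print *, 'mean bcast time over', reps, 'reps:', tsum / reps, 's'
  call MPI_FINALIZE(ierr)
end program bcast_avg
```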
Hey Krishna,
Is this part of the reason that our users are seeing a significant
slowdown when they go beyond 2 nodes with espresso? You should try
that as an example. It's surprising that using more than 2 nodes can
lead to a longer wall time for calculations than using 2 nodes alone.
David
Hello there,
We have a DDR IB cluster with Open MPI ver 1.2.8.
I'm testing on two nodes with two processors each and both
the nodes are adjacent (2 hops distant) on the same leaf
of the tree interconnect.
I observe that when I try to MPI_BCAST among the four MPI
tasks, it takes a lot of time with IB.