We can't seem to run across TCP. We did a default 'configure'. Shared
memory seems to work, but trying tcp gives us:
[0,1,1][btl_tcp_endpoint.c:557:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
I'm assuming that the tcp backend is the most thoroughly tested, so I th[…]
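(For reference, errno 113 on Linux is EHOSTUNREACH, "No route to host", which usually points at a firewall rule or at the TCP BTL picking an interface/subnet the peer can't reach. A trivial standalone check, nothing Open MPI specific:)

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* 113 is the errno reported by the TCP BTL above. */
    printf("EHOSTUNREACH = %d\n", EHOSTUNREACH);   /* 113 on Linux */
    printf("errno 113 -> %s\n", strerror(113));    /* "No route to host" */
    return 0;
}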
When only sending a few messages, we get reasonably good IB performance,
~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of
messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL
mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3
firmware on PCI-X (Cougar) IB.
Thanks,
george.
On Oct 31, 2005, at 10:50 AM, Mike Houston wrote:
When only sending a few messages, we get reasonably good IB performance,
~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of
messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL
mpi_bandwidth […]
I'll give it a go. Attached is the code.
Thanks!
-Mike
Tim S. Woodall wrote:
Hello Mike,
Mike Houston wrote:
When only sending a few messages, we get reasonably good IB performance,
~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of
messages up, we drop to 3MB/s(!!!). […]
[…]able to tweak up performance. Now if I can get the tcp layer working,
I'm pretty much good to go.
Any word on an SDP layer? I can probably modify the tcp layer quickly
to do SDP, but I thought I would ask.
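(For what it's worth, the usual shortcuts for SDP are either preloading libsdp to convert AF_INET stream sockets transparently, or opening the socket with the SDP address family directly. A rough sketch of the latter; the AF_INET_SDP value of 27 follows the OFED convention and is an assumption on my part, not anything in the Open MPI tree:)

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27   /* assumption: address family used by OFED's SDP stack */
#endif

int main(void)
{
    /* Addressing stays plain sockaddr_in; only the family passed to socket() changes. */
    int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket(AF_INET_SDP)");   /* EAFNOSUPPORT if no SDP support is loaded */
        return 1;
    }
    close(fd);
    return 0;
}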
-Mike
Tim S. Woodall wrote:
Hello Mike,
Mike Houston wrote:
When only sending a few […]
[…] Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18
Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720
repeated many times.
-Mike
Mike Houston wrote:
That seems to work with the pinning option enabled. THANKS!
Now I'll go back to testing my real code. I'm getting 7[…]
[…mca_btl_mvapi_component_progress] Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73412fc
repeated until it eventually hangs.
-Mike
Mike Houston wrote:
Whoops, spoke too soon. The performance quoted was not actually going
between nodes. Actually using the network with the pinned option gives […]
What's the ETA, or should I try grabbing from cvs?
-Mike
Tim S. Woodall wrote:
Mike,
I believe this was probably corrected today and should be in the
next release candidate.
Thanks,
Tim
Mike Houston wrote:
Whoops, spoke too soon. The performance quoted was not actually going
between […]
I have things working now. I needed to limit OpenMPI to the actual working
interfaces (thanks for the tip). It still seems like that should be figured
out automatically... Now I've moved on to stress testing with the bandwidth
testing app I posted earlier in the Infiniband thread:
mpirun -mca btl_tcp_if_[…]
[…] this:
/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 25 131072
131072 801.580272 (MillionBytes/sec) 764.446518 (MegaBytes/sec)
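(For anyone reading along without the attachment: below is a rough sketch of a windowed bandwidth test in the same spirit; it is not the actual ./bw code, and the 25/131072 defaults simply mirror the command line above.)

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, w, window = 25, size = 131072, iters = 100;
    char *buf, ack = 0;
    MPI_Request *reqs;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (argc > 1) window = atoi(argv[1]);
    if (argc > 2) size   = atoi(argv[2]);

    buf  = malloc(size);
    reqs = malloc(window * sizeof(MPI_Request));

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            /* Sender: keep a window of nonblocking sends in flight, then wait for an ack. */
            for (w = 0; w < window; w++)
                MPI_Isend(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &reqs[w]);
            MPI_Waitall(window, reqs, MPI_STATUSES_IGNORE);
            MPI_Recv(&ack, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            for (w = 0; w < window; w++)
                MPI_Irecv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &reqs[w]);
            MPI_Waitall(window, reqs, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d  %f (MegaBytes/sec)\n", size,
               (double)size * window * iters / (t1 - t0) / (1024.0 * 1024.0));

    free(buf);
    free(reqs);
    MPI_Finalize();
    return 0;
}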
Mike Houston wrote:
What's the ETA, or should I try grabbing from cvs?
-Mike
Tim S. Woodall wrote:
Mike,
I believe […]
At least with 1.1.4, I'm having a heck of a time with enabling
multi-threading. Configuring with --with-threads=posix
--enable-mpi-threads --enable-progress-threads leads to mpirun just
hanging, even when not launching MPI apps, e.g. mpirun -np 1 hostname,
and I can't ctrl-c to kill it; I have […]
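(If it helps narrow this down: a tiny program like the sketch below, built against that installation, shows whether the library actually grants MPI_THREAD_MULTIPLE; a hang before it even prints anything would point at the runtime/progress threads rather than the application.)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for the most permissive level; the library may grant less. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    printf("got MPI_THREAD_MULTIPLE: %s\n",
           provided == MPI_THREAD_MULTIPLE ? "yes" : "no");
    MPI_Finalize();
    return 0;
}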
I've been having similar issues with brand new FC5/6 and RHEL5 machines,
but our FC4/RHEL4 machines are just fine. On the FC5/6/RHEL5 machines,
I can get things to run as root. There must be some ACL or security
setting that's enabled by default on the newer distros. If I
figure it out […]
If I only do gets/puts, things seem to be working correctly with version
1.2. However, if I have a posted Irecv on the target node and issue an
MPI_Get against that target, MPI_Test on the posted Irecv causes a segfault:
[expose:21249] *** Process received signal ***
[expose:21249] Signal: Segmentation fault […]
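(In case a self-contained reproduction helps, here is roughly the pattern being described, reconstructed from the text above rather than taken from the actual test case: the target posts an Irecv, the origin does an MPI_Get into an exposed window, and the target then calls MPI_Test on the still-pending Irecv.)

/* Run with -np 2; this is my own sketch, not the original test program. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, flag = 0;
    int winbuf[16] = {0}, getbuf[16], recvbuf[16];
    MPI_Win win;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_create(winbuf, (MPI_Aint)sizeof(winbuf), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 1)
        /* Target: a receive that stays pending while the one-sided traffic happens. */
        MPI_Irecv(recvbuf, 16, MPI_INT, 0, 42, MPI_COMM_WORLD, &req);

    MPI_Win_fence(0, win);
    if (rank == 0)
        /* Origin: one-sided read from rank 1's window. */
        MPI_Get(getbuf, 16, MPI_INT, 1, 0, 16, MPI_INT, win);
    MPI_Win_fence(0, win);

    if (rank == 1) {
        /* The MPI_Test on the still-posted Irecv is where the segfault was reported. */
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        if (!flag) {
            MPI_Cancel(&req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}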
Brian Barrett wrote:
On Mar 20, 2007, at 3:15 PM, Mike Houston wrote:
If I only do gets/puts, things seem to be working correctly with version
1.2. However, if I have a posted Irecv on the target node and issue an
MPI_Get against that target, MPI_Test on the posted Irecv causes a […]
I did notice that single-sided transfers seem to be a
little slower than explicit send/recv, at least on GigE. Once I do some
more testing, I'll bring things up on IB and see how things are going.
-Mike
Mike Houston wrote:
Brian Barrett wrote:
On Mar 20, 2007, at 3:15 PM, Mike Houston wrote: […]
That's pretty cool. The main issue with this, which is addressed at the end
of the report, is that code size is going to be a problem as data
and code must live in the same 256KB in each SPE. They mention dynamic
overlay loading, which is also how we deal with large code size, but
things get t[…]
Marcus G. Daniels wrote:
Marcus G. Daniels wrote:
Mike Houston wrote:
The main issue with this, which is addressed at the end
of the report, is that code size is going to be a problem as data
and code must live in the same 256KB in each SPE. They mention dynamic
overlay loading […]
Also make sure that /tmp is user writable. By default, that is where
openmpi likes to stick some files.
-Mike
David Burns wrote:
Could also be a firewall problem. Make sure all nodes in the cluster
accept tcp packets from all others.
Dave
Walker, David T. wrote:
I am presently trying to […]
[…] similar to what I
was seeing, so hopefully I can make some progress on a real solution.
Brian
On Mar 20, 2007, at 8:54 PM, Mike Houston wrote:
Well, I've managed to get a working solution, but I'm not sure how I got
there. I built a test case that looked like a nice simple version […]
Well, mpich2 and mvapich2 are working smoothly for my app. mpich2 under
GigE is also giving ~2X the performance of openmpi in the cases where
openmpi works. After the paper deadline, I'll attempt to package up
a simple test case and send it to the list.
Thanks!
-Mike
Mike Houston wrote: […]