We can't seem to run across TCP. We did a default 'configure'. Shared
memory seems to work, but trying tcp gives us:
[0,1,1][btl_tcp_endpoint.c:557:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
I'm assuming that the tcp backend is the most thoroughly tested, so I [...]
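For reference, errno 113 on Linux is EHOSTUNREACH ("No route to host"):
the address the peer advertised is not routable from this node. One quick
way to confirm the errno mapping on your own system (perl here is just an
arbitrary choice):

perl -MPOSIX -e 'print strerror(113), "\n"'
# prints: No route to host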
When only sending a few messages, we get reasonably good IB performance,
~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of
messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL
mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3
firmware on PCI-X (Cougar [...]
Mike,
If your nodes have more than one network interface, it can happen
that we do not select the right one. There is a simple way to ensure
that this does not happen. Create a directory named .openmpi in your
home area. In this directory edit the file mca-params.conf. This file
is loaded [...]
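A sketch of what that file could contain, assuming eth0 is the interface
your nodes can actually reach each other on (the interface name is a
placeholder; see the corrected parameter names later in this thread):

# ~/.openmpi/mca-params.conf
btl_tcp_if_include = eth0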
Mike,
Mike Houston wrote:
We can't seem to run across TCP. We did a default 'configure'. Shared
memory seems to work, but trying tcp gives us:
[0,1,1][btl_tcp_endpoint.c:557:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
This error indicates the IP address exported by the peer is not reachable. [...]
This error indicates the IP address exported by the peer is not reachable.
You can use the tcp btl parameters:
-mca btl_tcp_include eth0,eth1
or
-mca btl_tcp_exclude eth1
To specify the set of interfaces to use/not use.
George was correct - these should be btl_tcp_if_include/btl_tcp_if_exclude.
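With the corrected names, a run restricted to a single interface would
look something like this (the interface, host file, and test arguments
are placeholders):

mpirun -np 2 -mca btl_tcp_if_include eth0 -hostfile myhosts mpi_bandwidth 25 131072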
On Oct 31, 2005, at 8:50 AM, Mike Houston wrote:
When only sending a few messages, we get reasonably good IB
performance,
~500MB/s (MVAPICH is 850MB/s).
What is your message size? Are you using the leave pinned option? If
not, specify the -mca mpi_leave_pinned 1 option to mpirun. This tells
Open MPI [...]
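Roughly speaking, mpi_leave_pinned keeps memory registered (pinned)
across messages, so the registration cost is not paid on every send.
It is just one more -mca pair on the command line, e.g. (host file and
test arguments are placeholders):

mpirun -np 2 -mca mpi_leave_pinned 1 -hostfile myhosts mpi_bandwidth 1000 131072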
Same performance problems with that fix. In fact, if I ever use tcp
currently, OpenMPI crashes...
-Mike
George Bosilca wrote:
If there are several networks available between 2 nodes they will get
selected. That can lead to poor performance in the case when the
second network is a high-latency [...]
Hello Mike,
Mike Houston wrote:
When only sending a few messages, we get reasonably good IB performance,
~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of
messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL
mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3 [...]
I'll give it a go. Attached is the code.
Thanks!
-Mike
Tim S. Woodall wrote:
Hello Mike,
Mike Houston wrote:
When only sending a few messages, we get reasonably good IB performance,
~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of
messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL [...]
That seems to work with the pinning option enabled. THANKS!
Now I'll go back to testing my real code. I'm getting 700MB/s for
messages >=128KB. This is a little bit lower than MVAPICH, 10-20%, but
still pretty darn good. My guess is that I can play with the settings
more to tweak up performance [...]
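For finding those settings, the ompi_info trick mentioned later in this
thread for tcp works for the mvapi component as well, assuming it is
built into your installation:

ompi_info --param btl mvapi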
Whoops, spoke too soon. The performance quoted was not actually going
between nodes. Actually using the network with the pinned option gives:
[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress]
[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got
error: VAPI [...]
mpirun -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 -np 2
-hostfile /u/mhouston/mpihosts mpi_bandwidth 21 131072
131072 519.922184 (MillionBytes/sec) 495.836433 (MegaBytes/sec)
mpirun -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 -np 2
-hostfile /u/mhouston/mpihosts mpi_bandwidth [...]
Mike,
I believe this was probably corrected today and should be in the
next release candidate.
Thanks,
Tim
Mike Houston wrote:
Whoops, spoke too soon. The performance quoted was not actually going
between nodes. Actually using the network with the pinned option gives:
[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] [...]
What's the ETA, or should I try grabbing from cvs?
-Mike
Tim S. Woodall wrote:
Mike,
I believe this was probably corrected today and should be in the
next release candidate.
Thanks,
Tim
Mike Houston wrote:
Whoops, spoke too soon. The performance quoted was not actually going
between nodes. Actually using the network with the pinned option gives: [...]
I have things working now. I needed to limit OpenMPI to the actual working
interfaces (thanks for the tip). It still seems like that should be figured
out correctly... Now I've moved on to stress testing with the bandwidth
testing app I posted earlier in the Infiniband thread:
mpirun -mca btl_tcp_if_ [...]
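The full invocation presumably looked something along these lines (the
interface name and test arguments here are guesses):

mpirun -mca btl_tcp_if_include eth0 -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 1000 131072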
Mike,
Let me confirm this was the issue and look at the TCP problem as well.
Will let you know.
Thanks,
Tim
Mike Houston wrote:
What's the ETA, or should I try grabbing from cvs?
-Mike
Tim S. Woodall wrote:
Mike,
I believe this was probably corrected today and should be in the
next release candidate. [...]
FWIW, you can grab from Subversion (see http://www.open-mpi.org/svn/)
or grab a nightly snapshot tarball (Tim's changes went into the trunk
-- they have not yet been ported over to the 1.0 release branch; he
wants to verify before porting: http://www.open-mpi.org/nightly/trunk/
)
On Oct 31, [...]
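Assuming the repository layout the project used at the time (the URL
below is an assumption, not taken from the message), a trunk checkout
is a one-liner:

svn checkout http://svn.open-mpi.org/svn/ompi/trunk ompi-trunk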
Mike,
There appears to be an issue in our mvapi get protocol. To temporarily
disable this:
/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw
25 131072
131072 801.580272 (MillionBytes/sec) 764.446518 (MegaBytes/sec)
Mike Houston wrote:
What's the ETA, or should I try grabbing from cvs? [...]
Better, but still having issues at lots of outstanding messages:
mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2
mpi_bandwidth 1000 131072
131072 669.574904 (MillionBytes/sec) 638.556389 (MegaBytes/sec)
mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2
mpi_bandwidth 10[...]
Sometimes getting crashes:
mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 -hostfile
/u/mhouston/mpihosts mpi_bandwidth 25 131072
mpirun noticed that job rank 0 with PID 10611 on node
"spire-2.stanford.edu" exited on signal 11.
1 process killed (possibly by Open MPI).
The backtrace [...]
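To capture that backtrace, the usual recipe applies: enable core dumps,
re-run, then point gdb at the binary and the core file left in the
crashed rank's working directory (paths and arguments are placeholders):

ulimit -c unlimited
mpirun -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 25 131072
gdb mpi_bandwidth core
(gdb) bt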
On Oct 28, 2005, at 3:08 PM, Jeff Squyres wrote:
1. I'm concerned about the MPI_Reduce error -- that one shouldn't be
happening at all. We have table lookups for the MPI_Op/MPI_Datatype
combinations that are supposed to work; the fact that you're getting
this error means that HPCC is using a combination [...]
On Oct 31, 2005, at 11:05 AM, George Bosilca wrote:
For TCP you can get the list of available MCA parameters using
"ompi_info --param btl tcp". The ones involved in selecting the
network are:
btl_tcp_if_include
btl_tcp_if_exclude
You just have to set one of them, as they are exclusive. So if you [...]
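For example, to see just the interface-selection parameters and their
current values (the grep pattern is illustrative):

ompi_info --param btl tcp | grep tcp_if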