[O-MPI users] TCP problems with 1.0rc4

2005-10-31 Thread Mike Houston
We can't seem to run across TCP. We did a default 'configure'. Shared memory seems to work, but trying tcp gives us: [0,1,1][btl_tcp_endpoint.c:557:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113 I'm assuming that the tcp backend is the most thoroughly tested, so I th
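
For reference, errno 113 on Linux is EHOSTUNREACH ("No route to host"), which points at interface/routing selection rather than the BTL itself; the later "TCP problems" post below resolves this by limiting Open MPI to the working interfaces. A one-liner to decode the errno (plain C, nothing Open MPI specific):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Decode the errno reported by the tcp BTL above. */
        printf("errno 113: %s\n", strerror(113));   /* "No route to host" on Linux */
        return 0;
    }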

[O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
When only sending a few messages, we get reasonably good IB performance, ~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3 firmware on PCI-X (Couger
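
The benchmark in question is the OSU bandwidth test, which keeps a window of nonblocking sends outstanding before waiting on them; the slowdown reported here shows up once many large messages are in flight. A rough sketch of that pattern follows (window depth, message size, and iteration count are placeholders, not the real OSU parameters):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define WINDOW  64            /* outstanding messages per iteration (placeholder) */
    #define MSGSIZE (128 * 1024)  /* message size in bytes (placeholder) */
    #define ITERS   100

    int main(int argc, char **argv)
    {
        int rank;
        char *buf = malloc(MSGSIZE);
        MPI_Request req[WINDOW];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                /* Post a whole window of sends before waiting on any of them. */
                for (int w = 0; w < WINDOW; w++)
                    MPI_Isend(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                MPI_Recv(buf, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                for (int w = 0; w < WINDOW; w++)
                    MPI_Irecv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[w]);
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
                MPI_Send(buf, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);  /* ack */
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("%.2f MB/s\n",
                   (double)ITERS * WINDOW * MSGSIZE / (t1 - t0) / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }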

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
IB. Thanks, george. On Oct 31, 2005, at 10:50 AM, Mike Houston wrote: When only sending a few messages, we get reasonably good IB performance, ~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of messages up, we drop to 3MB/s(!!!). This is with the OSU NBCL mpi_bandwidth

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
I'll give it a go. Attached is the code. Thanks! -Mike Tim S. Woodall wrote: Hello Mike, Mike Houston wrote: When only sending a few messages, we get reasonably good IB performance, ~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of messages up, we drop to

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
e to tweak up performance. Now if I can get the tcp layer working, I'm pretty much good to go. Any word on an SDP layer? I can probably modify the tcp layer quickly to do SDP, but I thought I would ask. -Mike Tim S. Woodall wrote: Hello Mike, Mike Houston wrote: When only sending a fe

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
: VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18 Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720 repeated many times. -Mike Mike Houston wrote: That seems to work with the pinning option enabled. THANKS! Now I'll go back to testing my real code. I'm getting 7

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
:mca_btl_mvapi_component_progress] Got error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73412fc repeated until it eventually hangs. -Mike Mike Houston wrote: Woops, spoke too soon. The performance quoted was not actually going between nodes. Actually using the network with the pinned option gives

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
What's the ETA, or should I try grabbing from cvs? -Mike Tim S. Woodall wrote: Mike, I believe this was probably corrected today and should be in the next release candidate. Thanks, Tim Mike Houston wrote: Woops, spoke too soon. The performance quoted was not actually going between

[O-MPI users] TCP problems

2005-10-31 Thread Mike Houston
I have things working now. I needed to limit OpenMPI to actual working interfaces (thanks for the tip). It still seems like that should be figured out correctly... Now I've moved on to stress testing with the bandwidth testing app I posted earlier in the Infiniband thread: mpirun -mca btl_tcp_if_

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
this: /u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 25 131072 131072 801.580272 (MillionBytes/sec) 764.446518 (MegaBytes/sec) Mike Houston wrote: What's the ETA, or should I try grabbing from cvs? -Mike Tim S. Woodall wrote: Mike, I bel

Re: [O-MPI users] Infiniband performance problems (mvapi)

2005-10-31 Thread Mike Houston
vapi_flags 2 ./bw 25 131072 131072 801.580272 (MillionBytes/sec) 764.446518 (MegaBytes/sec) Mike Houston wrote: What's the ETA, or should I try grabbing from cvs? -Mike Tim S. Woodall wrote: Mike, I believe this was probably corrected today and should be in the next release can

[OMPI users] Fun with threading

2007-03-13 Thread Mike Houston
At least with 1.1.4, I'm having a heck of a time with enabling multi-threading. Configuring with --with-threads=posix --enable-mpi-threads --enable-progress-threads leads to mpirun just hanging, even when not launching MPI apps, e.g. mpirun -np 1 hostname, and I can't ctrl-c to kill it, I have
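
A quick way to check what thread support a build actually provides, independent of the configure flags, is to ask MPI_Init_thread directly; this is a generic probe, not specific to the 1.1.4 build discussed here:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Request full multi-threading and report what the library grants. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        if (provided == MPI_THREAD_MULTIPLE)
            printf("MPI_THREAD_MULTIPLE supported\n");
        else
            printf("thread support limited: provided level = %d\n", provided);

        MPI_Finalize();
        return 0;
    }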

Re: [OMPI users] Signal 13

2007-03-15 Thread Mike Houston
I've been having similar issues with brand new FC5/6 and RHEL5 machines, but our FC4/RHEL4 machines are just fine. On the FC5/6 RHEL5 machines, I can get things to run as root. There must be some ACL or security setting issue that's enabled by default on the newer distros. If I figure it out

[OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston
If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted Irecv causes a segfault: [expose:21249] *** Process received signal *** [expose:21249] Signal: Segm
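
For anyone trying to reproduce this, the pattern as described is roughly the sketch below: rank 1 posts an MPI_Irecv, rank 0 does an MPI_Get against a window exposed by rank 1, and rank 1 then calls MPI_Test on the still-pending receive. Fence synchronization, the buffer sizes, and the tag are my own placeholders; the original report doesn't say which one-sided synchronization mode was used.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, flag = 0, msg = 0;
        int winbuf[1024] = {0}, scratch[1024];
        MPI_Win win;
        MPI_Request req = MPI_REQUEST_NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes a window; rank 1 is the target of the MPI_Get. */
        MPI_Win_create(winbuf, sizeof(winbuf), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 1)    /* target: post a receive that will stay pending */
            MPI_Irecv(&msg, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, &req);

        MPI_Win_fence(0, win);
        if (rank == 0)    /* origin: read from the target's window */
            MPI_Get(scratch, 1024, MPI_INT, 1, 0, 1024, MPI_INT, win);
        MPI_Win_fence(0, win);

        if (rank == 1) {  /* poll the pending receive, as in the report */
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
            printf("MPI_Test returned, flag = %d\n", flag);
            if (!flag) {  /* clean up the unmatched receive */
                MPI_Cancel(&req);
                MPI_Request_free(&req);
            }
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }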

Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston
Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Houston wrote: If I only do gets/puts, things seem to be working correctly with version 1.2. However, if I have a posted Irecv on the target node and issue an MPI_Get against that target, MPI_Test on the posted Irecv causes a

Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-20 Thread Mike Houston
I did notice that single-sided transfers seem to be a little slower than explicit send/recv, at least on GigE. Once I do some more testing, I'll bring things up on IB and see how things are going. -Mike Mike Houston wrote: Brian Barrett wrote: On Mar 20, 2007, at 3:15 PM, Mike Hou

Re: [OMPI users] Cell EIB support for OpenMPI

2007-03-22 Thread Mike Houston
That's pretty cool. The main issue with this, which is addressed at the end of the report, is that code size is going to be a problem, as data and code must live in the same 256KB in each SPE. They mention dynamic overlay loading, which is also how we deal with large code size, but things get t

Re: [OMPI users] Cell EIB support for OpenMPI

2007-03-23 Thread Mike Houston
Marcus G. Daniels wrote: Marcus G. Daniels wrote: Mike Houston wrote: The main issue with this, and addressed at the end of the report, is that the code size is going to be a problem as data and code must live in the same 256KB in each SPE. They mention dynamic overlay loading

Re: [OMPI users] Failure to launch on a remote node. SSH problem?

2007-03-24 Thread Mike Houston
Also make sure that /tmp is user writable. By default, that is where openmpi likes to stick some files. -Mike David Burns wrote: Could also be a firewall problem. Make sure all nodes in the cluster accept tcp packets from all others. Dave Walker, David T. wrote: I am presently trying t

Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-26 Thread Mike Houston
lar to what I was seeing, so hopefully I can make some progress on a real solution. Brian On Mar 20, 2007, at 8:54 PM, Mike Houston wrote: Well, I've managed to get a working solution, but I'm not sure how I got there. I built a test case that looked like a nice simple version

Re: [OMPI users] Issues with Get/Put and IRecv

2007-03-27 Thread Mike Houston
Well, mpich2 and mvapich2 are working smoothly for my app. mpich2 under GigE is also giving ~2X the performance of openmpi in the cases where openmpi works. After the paper deadline, I'll attempt to package up a simple test case and send it to the list. Thanks! -Mike Mike Ho