Mike --

We've been unable to reproduce this problem, but Tim just noticed that we had a patch on the trunk from several days ago that we forgot to apply to the v1.0 branch (Tim just applied it now).

Could you give the nightly v1.0 tarball a whirl tomorrow morning? It should contain the patch, and may fix your problem.

    http://www.open-mpi.org/nightly/v1.0/

Thanks!


On Oct 31, 2005, at 2:00 PM, Mike Houston wrote:

I have things working now. I needed to limit OpenMPI to the interfaces that actually work (thanks for the tip). It still seems like that should be figured out correctly on its own... Now I've moved on to stress testing with the bandwidth testing app I posted earlier in the Infiniband thread:

mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 3750 262144

262144  109.697279 (MillionBytes/sec)   104.615478 (MegaBytes/sec)
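
As a quick sanity check on the two figures above, here is a minimal sketch of the conversion, assuming "MillionBytes" means 10^6 bytes and "MegaBytes" means 2^20 bytes. The function name is made up for illustration and is not part of mpi_bandwidth.

```python
def million_bytes_to_mebibytes(rate_million_bytes_per_sec: float) -> float:
    """Convert a rate in 10^6 bytes/sec to 2^20 bytes/sec.

    Assumes the unit meanings stated above; the benchmark's own
    definitions are not shown in this thread.
    """
    bytes_per_sec = rate_million_bytes_per_sec * 1_000_000
    return bytes_per_sec / (1 << 20)


if __name__ == "__main__":
    # First figure reported for the 3750-message run.
    print(f"{million_bytes_to_mebibytes(109.697279):.6f}")
```

Under that assumption the result agrees with the second reported figure to six decimal places, so the two columns appear to be the same measurement in different units.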

mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 4000 262144
[spire-2.Stanford.EDU:06645] mca_btl_tcp_frag_send: writev failed with errno=104
mpirun noticed that job rank 1 with PID 21281 on node "spire-3.stanford.edu" exited on signal 11.
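
For anyone reading the errno above: on Linux, 104 is ECONNRESET (connection reset by peer), which would fit the other rank dying on signal 11 (a segmentation fault) and dropping the socket. Errno numbering is platform-specific, so the small lookup sketch below (the helper name is hypothetical) falls back to a placeholder where the code is not defined.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value on the current platform."""
    # errno.errorcode maps numeric values to symbolic names for this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 104 is the value reported by mca_btl_tcp_frag_send in the output above.
    print(describe_errno(104))
```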

Cranking up the number of messages in flight makes things really
unhappy.  I haven't seen this behavior with LAM or MPICH so I thought
I'd mention it.

Thanks!

-Mike
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

