Mike --
We've been unable to reproduce this problem, but Tim just noticed
that we had a patch on the trunk from several days ago that we forgot
to apply to the v1.0 branch (Tim just applied it now).
Could you give the nightly v1.0 tarball a whirl tomorrow morning? It
should contain the patch, and may fix your problem.
http://www.open-mpi.org/nightly/v1.0/
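If it helps, building it is just the usual tarball dance (the exact file name changes nightly, so check the directory listing first -- the name below is only a placeholder):

  wget http://www.open-mpi.org/nightly/v1.0/openmpi-1.0-<datestamp>.tar.gz   # placeholder name
  tar -xzf openmpi-1.0-<datestamp>.tar.gz
  cd openmpi-1.0-<datestamp>
  ./configure --prefix=$HOME/openmpi-nightly
  make all install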
Thanks!
On Oct 31, 2005, at 2:00 PM, Mike Houston wrote:
I have things working now. I needed to limit OpenMPI to actual working interfaces (thanks for the tip). It still seems like that should be figured out correctly, though... Now I've moved on to stress testing with the bandwidth testing app I posted earlier in the Infiniband thread:
mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile
/u/mhouston/mpihosts mpi_bandwidth 3750 262144
262144  109.697279 (MillionBytes/sec)  104.615478 (MegaBytes/sec)
mpirun -mca btl_tcp_if_include eth0 -mca btl tcp -np 2 -hostfile
/u/mhouston/mpihosts mpi_bandwidth 4000 262144
[spire-2.Stanford.EDU:06645] mca_btl_tcp_frag_send: writev failed with errno=104
mpirun noticed that job rank 1 with PID 21281 on node "spire-3.stanford.edu" exited on signal 11.
Cranking up the number of messages in flight makes things really unhappy. I haven't seen this behavior with LAM or MPICH, so I thought I'd mention it.
Thanks!
-Mike
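(Side note for anyone following along: the same settings can also be exported once as environment variables instead of being repeated on every mpirun command line. A rough sketch, assuming the standard OMPI_MCA_ parameter prefix:

  export OMPI_MCA_btl=tcp
  export OMPI_MCA_btl_tcp_if_include=eth0
  mpirun -np 2 -hostfile /u/mhouston/mpihosts mpi_bandwidth 3750 262144
)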