Brian,
Thanks for your help. The hanging problem came back a day ago. I can
now run only if I use either "-mca btl_tcp_if_include en0" or "-mca
btl_tcp_if_include en1"; using btl_tcp_if_exclude on either en0 or
en1 still hangs.
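For reference, the forms I'm trying look roughly like this (X and
<app> stand in for my actual process count and HPL binary):

mpirun -np X -mca btl_tcp_if_include en0 <app>   # runs
mpirun -np X -mca btl_tcp_if_include en1 <app>   # runs
mpirun -np X -mca btl_tcp_if_exclude en0 <app>   # hangs
mpirun -np X -mca btl_tcp_if_exclude en1 <app>   # hangs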
Regarding the TCP performance, I ran the HPL benchmark again and see
typically 85% to 90% of the LAM-MPI speed, provided the problem size
isn't too small.
Lee
On Apr 16, 2006, at 12:21 PM, Brian Barrett wrote:
On Apr 14, 2006, at 9:33 AM, Lee D. Peterson wrote:
This problem went away yesterday. There was no intervening reboot of
my cluster or recompile of the code, so all I can surmise is that
something got cleaned up by a cron script. Weird.
Very strange. Could there have been a networking issue (switch
restart or something)?
Anyway, I've now benchmarked HPL using Open MPI vs. LAM-MPI. Open MPI
runs about 13% to, in some cases, 50% slower than LAM-MPI. I'm
running over TCP and using SSH.
Our TCP performance on Open MPI is not as good as it is in LAM/MPI,
so it's not totally surprising. 50% is, however, much more than we
expected. There are some pathologically bad cases that can occur
with multi-NIC configurations (especially with our unoptimized
multi-NIC support). It would be interesting to see what the
performance is if you use only one NIC. You can specify the NIC with the
btl_tcp_if_include MCA parameter:
mpirun -np X -mca btl_tcp_if_include en0 <app>
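If you would rather exclude interfaces instead, the equivalent form
would look something like this (if I remember correctly, setting
btl_tcp_if_exclude replaces the default exclude list, so the loopback
interface, lo0 on OS X, should normally be listed as well):

mpirun -np X -mca btl_tcp_if_exclude lo0,en1 <app>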
Hope this helps,
Brian
--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/