Brian,

Thanks for your help. The hanging problem came back a day ago. However, I can now run successfully only if I use either "-mca btl_tcp_if_include en0" or "-mca btl_tcp_if_include en1". Using btl_tcp_if_exclude on either en0 or en1 still doesn't work.
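
To be concrete, here is roughly what I'm running (the process count and
application name are placeholders, not my exact command line):

   # completes normally:
   mpirun -np X -mca btl_tcp_if_include en0 <app>
   mpirun -np X -mca btl_tcp_if_include en1 <app>

   # doesn't work:
   mpirun -np X -mca btl_tcp_if_exclude en0 <app>
   mpirun -np X -mca btl_tcp_if_exclude en1 <app>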

Regarding TCP performance, I ran the HPL benchmark again and typically see 85% to 90% of the LAM-MPI speed, provided the problem size isn't too small.

Lee

On Apr 16, 2006, at 12:21 PM, Brian Barrett wrote:

On Apr 14, 2006, at 9:33 AM, Lee D. Peterson wrote:

This problem went away yesterday. There was no intervening reboot of
my cluster or recompile of the code. So all I can surmise is that
something got cleaned up by a cron script. Weird.

Very strange.  Could there have been a networking issue (switch
restart or something)?

Anyway, now I've benchmarked HPL using Open MPI vs. LAM-MPI. Open MPI
runs about 13% to sometimes 50% slower than LAM-MPI. I'm running over
TCP and using SSH.

Our TCP performance in Open MPI is not as good as it is in LAM/MPI,
so that's not totally surprising.  50% is, however, much more than we
expected.  There are some pathologically bad cases that can occur
with multi-NIC (especially our unoptimized multi-NIC support).  It
would be interesting to see what the performance would be if you use
only one NIC.  You can specify the NIC to use with the
btl_tcp_if_include MCA parameter:

   mpirun -np X -mca btl_tcp_if_include en0 <app>
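
If it's useful, you can also list the TCP BTL parameters (and their
current values) with:

   ompi_info --param btl tcp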


Hope this helps,

Brian

--
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/

