On Apr 14, 2006, at 9:33 AM, Lee D. Peterson wrote:
This problem went away yesterday. There was no intervening reboot of my cluster or recompile of the code, so all I can surmise is that something got cleaned up by a cron script. Weird.
Very strange. Could there have been a networking issue (switch restart or something)?
Anyway, now I've benchmarked HPL using Open MPI vs. LAM/MPI. Open MPI runs about 13% to sometimes 50% slower than LAM/MPI. I'm running over TCP and using SSH.
Our TCP performance in Open MPI is not as good as it is in LAM/MPI, so some slowdown is not totally surprising. 50% is, however, much more than we expected. There are some pathologically bad cases that can occur with multiple NICs (especially with our unoptimized multi-NIC support). It would be interesting to see what the performance is if you use only one NIC. You can specify the NIC to use with the btl_tcp_if_include MCA parameter:
mpirun -np X -mca btl_tcp_if_include en0 <app>

Hope this helps,

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
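P.S. In case it's more convenient than editing the mpirun command line, here is a quick sketch of two other ways to set the same MCA parameter. The interface name en0 is just the example from above; substitute whatever device your nodes actually use (ifconfig will list them):

# set it in the environment before launching
export OMPI_MCA_btl_tcp_if_include=en0
mpirun -np X <app>

# or make it persistent in the per-user MCA parameter file
echo "btl_tcp_if_include = en0" >> $HOME/.openmpi/mca-params.conf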