We are investigating a problem that occurs when running a particular
code on more than 120 nodes.  That number, 120, was arrived at purely
from empirical testing.  We have tried several versions of Open MPI,
including 1.0.2, 1.1, and 1.1.2, and they all fail the same way.  The
list archives suggest this may have been a problem in 1.0.2 that was
resolved in later versions, but we get the same error with 1.1 and
1.1.2.

This is an LNXI 64bit bproc cluster w/ IB interconnect.

Attached is a tgz file containing a snippet of the stderr output, the output
from /opt/OpenMPI/openmpi-1.1/ib/bin/ompi_info, and 
/usr/share/doc/openmpi-ib-1.1/config.log.

Please let me know what other info you may want.  Any feedback will be
appreciated.


-- 
=============================================
Susan Coulter
Scientific Computing Resources
HPC-3 High Performance Computing 
Los Alamos National Laboratory
505-667-8425 - voice
505-665-7793 - fax
=============================================
Increase the Peace ...
An eye for an eye makes the whole world blind
