We are investigating a problem that occurs when running a particular code on more than 120 nodes. The 120-node threshold was found purely by empirical testing. We have tried several versions of Open MPI, including 1.0.2, 1.1, and 1.1.2, and they all fail the same way. The archives suggest this may have been a problem in 1.0.2 that was resolved in later versions, but we get the same error with the later versions.
This is an LNXI 64-bit bproc cluster with an IB interconnect. Attached is a tgz file containing a snippet of stderr output, the output from /opt/OpenMPI/openmpi-1.1/ib/bin/ompi_info, and /usr/share/doc/openmpi-ib-1.1/config.log. Please let me know what other information you may want. Any feedback will be appreciated.

--
=============================================
Susan Coulter
Scientific Computing Resources
HPC-3 High Performance Computing
Los Alamos National Laboratory
505-667-8425 - voice
505-665-7793 - fax
=============================================
Increase the Peace ...
An eye for an eye makes the whole world blind