For those following this thread: there was off-list discussion about
this topic -- restarting the Torque daemons *seemed* to fix the
problem.
On Oct 20, 2006, at 6:00 PM, Ogden, Jeffry Brandon wrote:
We don't actually have the capability to test the mpiexec + MVAPICH
launch at the moment.
Some more background information:
1) The entire environment runs inside an initrd with a static pbs_mom.
2) The file we change in the Torque distributions is:
torque-2.1.2/src/include/dis.h
---
255 /* NOTE: increase THE_BUF_SIZE to 131072 for systems > 5k nodes */
256
257 /* OLD: #define T