On Thu, 15 Mar 2012 at 12:44am, Reuti wrote

> Which version of SGE are you using? The traditional rsh startup was replaced by the builtin startup some time ago (although it should still work).

We're currently running the rather ancient 6.1u4 (due to the "If it ain't broke..." philosophy). The hardware for our new queue master recently arrived and I'll soon be upgrading to the most recent Open Grid Scheduler release. Are you saying that the upgrade with the new builtin startup method should avoid this problem?

> Maybe this already shows the problem: there are two `qrsh -inherit` calls, because Open MPI thinks these are different machines (I ran with only one slot on each host, so I didn't see it at first, but I can reproduce it now). For SGE, however, both may end up in the same queue, overwriting the openmpi-session in $TMPDIR.

> Although it's running, do you get all the output? If I request 4 slots and get one from each queue on both machines, mpihello outputs only 3 lines: the "Hello World from Node 3" line is always missing.

I do seem to get all the output -- there are indeed 64 Hello World lines.
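For anyone who wants to check this without counting lines by hand, here is a rough Python sketch that reports which ranks never printed. It assumes the output lines look like "Hello World from Node N" with N counted from 0 (which matches the missing "Node 3" in the 4-slot case above); the function name `missing_ranks` is made up for illustration and is not part of SGE or Open MPI.

```python
import re

# Assumed output format, taken from the "Hello World from Node 3" line
# quoted above; adjust the pattern if your mpihello prints something else.
LINE_RE = re.compile(r"Hello World from Node (\d+)")


def missing_ranks(output, expected):
    """Return the sorted list of ranks in 0..expected-1 that never appear in output."""
    seen = {int(m.group(1)) for m in LINE_RE.finditer(output)}
    return [rank for rank in range(expected) if rank not in seen]


# Small demo with made-up output: 4 slots requested, only ranks 0-2 reported.
sample = "\n".join("Hello World from Node %d" % i for i in range(3))
print(missing_ranks(sample, 4))  # prints [3]
```

In practice you would pass the job's captured stdout (for example, the contents of the job's output file) and the number of slots requested; an empty list means every rank reported in.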

Thanks again for all the help on this. This is one of the most productive exchanges I've had on a mailing list in far too long.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
