On Wed, 14 Mar 2012 at 5:50pm, Ralph Castain wrote:

> On Mar 14, 2012, at 5:44 PM, Reuti wrote:
>
>> (I was just typing when Ralph's message came in: I can confirm this. Avoiding it would require Open MPI to merge all lines in the hostfile that refer to the same machine, because SGE creates a separate entry for each queue/host pair in the machine file.)
>
> Hmmm…I can take a look at the allocator module and see why we aren't doing it. Would the host names be the same for the two queues?

I can't speak authoritatively like Reuti can, but here's what a hostfile
looks like on my cluster (note that all our name resolution is done via /etc/hosts -- there's no DNS involved):

iq103 8 lab.q@iq103 <NULL>
iq103 1 test.q@iq103 <NULL>
iq104 8 lab.q@iq104 <NULL>
iq104 1 test.q@iq104 <NULL>
opt221 2 lab.q@opt221 <NULL>
opt221 1 test.q@opt221 <NULL>
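
For illustration only, here is a minimal sketch of the merging Reuti describes. It is not Open MPI's allocator code. It assumes SGE's PE_HOSTFILE layout shown above (host, slots, queue@host, processor range) and treats the first column as the machine name. The function name merge_pe_hostfile is hypothetical.

```python
def merge_pe_hostfile(lines):
    """Sum the slot counts of all entries that name the same host.

    Assumes SGE's PE_HOSTFILE layout: host, slots, queue@host, processor
    range. Hosts come back in first-seen order (dicts keep insertion order).
    """
    slots = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        # One entry per machine, no matter how many queues it appears in.
        slots[host] = slots.get(host, 0) + count
    return slots


if __name__ == "__main__":
    sample = """\
iq103 8 lab.q@iq103 <NULL>
iq103 1 test.q@iq103 <NULL>
iq104 8 lab.q@iq104 <NULL>
iq104 1 test.q@iq104 <NULL>
opt221 2 lab.q@opt221 <NULL>
opt221 1 test.q@opt221 <NULL>
"""
    for host, n in merge_pe_hostfile(sample.splitlines()).items():
        print(host, n)
```

Run against the hostfile above, it prints "iq103 9", "iq104 9" and "opt221 3". That is one entry per machine, so only one orted would need to be started on each host.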

>> @Ralph: it could work if SGE had a facility to request the desired queue in `qrsh -inherit ...`, because then $TMPDIR would again be unique for each orted (assuming each one uses a different port).
>
> Gotcha! I suspect getting the allocator to handle this cleanly is the better solution, though.

If I can help (testing patches, e.g.), let me know.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
