On Thu, 15 Mar 2012 at 4:41pm, Reuti wrote:
Am 15.03.2012 um 15:50 schrieb Ralph Castain:
On Mar 15, 2012, at 8:46 AM, Reuti wrote:
Am 15.03.2012 um 15:37 schrieb Ralph Castain:
FWIW: I see the problem. Our parser was apparently written assuming
every line was a unique host, so it doesn't even check to see if
there is duplication. Easy fix - can shoot it to you today.
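The parser bug described above can be sketched in a few lines. This is not Open MPI's actual parser, just an illustrative model: a hostfile reader that merges duplicate host lines by summing their slot counts, instead of assuming every line names a unique host (the hostfile format with `slots=N` follows mpirun's documented convention; the node names are hypothetical).

```python
def parse_hostfile(lines):
    """Parse hostfile lines of the form 'hostname [slots=N]'.

    Duplicate host lines are merged by summing their slot counts,
    rather than assuming every line names a unique host.
    """
    hosts = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split()
        name = parts[0]
        slots = 1  # default when no slots= token is given
        for token in parts[1:]:
            if token.startswith("slots="):
                slots = int(token.split("=", 1)[1])
        # Aggregate instead of overwrite: the same node may appear
        # once per queue it is scheduled in.
        hosts[name] = hosts.get(name, 0) + slots
    return hosts

# A machinefile as SGE might generate it, listing the same node
# once per queue instance:
example = [
    "node01 slots=2",   # e.g. from the low-priority queue
    "node01 slots=2",   # e.g. from the high-priority queue
    "node02 slots=4",
]
print(parse_hostfile(example))  # {'node01': 4, 'node02': 4}
```

A parser that instead treats each line as a fresh host would either double-count node01 or choke on the duplicate, which matches the failure described above.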
But even with the fix the nice value will be the same for all
processes forked there. Either all get the nice value of his low-priority
queue, or all get that of the high-priority queue.
Agreed - nothing I can do about that, though. We only do the one qrsh
call, so the daemons are going to fall into a single queue, and so will
all their children. In this scenario, it isn't clear to me (from this
discussion) that I can control which queue gets used.
Correct.
Which I understand. Our queue setup is admittedly a bit wonky (which is
probably why I'm the first one to have this issue). I'm much more
concerned with things not crashing than with them absolutely having the
"right" nice levels. :)
Should I?
I can't speak for the community. Personally I would say: don't
distribute parallel jobs among different queues at all, as some
applications use internal communication to pass the environment
variables of the master process on to the slaves (even if SGE's
`qrsh -inherit ...` is called without -V, and even if Open MPI is not
told to forward any specific environment variables). If you have a
custom application this can of course work, but with closed-source ones
you can only test and learn from experience whether it works or not.
Not to mention the timing issue of differently niced processes.
Adjusting the SGE setup of the OP would be the smarter way IMO.
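For reference, the nice level in SGE comes from the `priority` attribute of the queue configuration, so adjusting the setup on the SGE side would look roughly like this (the queue name `low.q` is hypothetical; `qconf -sq` and `qconf -mq` are the standard show/modify commands):

```shell
# Show the current priority (nice value) of a queue:
qconf -sq low.q | grep priority

# Modify the queue configuration in an editor; jobs in this queue
# then run at the nice level given by the 'priority' attribute,
# e.g.:  priority  19
qconf -mq low.q
```

Keeping a parallel job inside a single queue, each queue with its own `priority`, avoids mixing differently niced processes within one job.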
And I agree with that as well. I understand if the decision is made to
leave the parser the way it is, given that my setup is outside the norm.
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF