While core binding itself should work with such a topology (I have never tried it)
in 6.2u5, the reporting of the topology string will be wrong. As you might have
noticed, string-based load values are only reported up to a length of 1024 bytes,
which means that with 1000 nodes the full topology string will not be …
It is not needed for Linux hosts with UGE 8.0.0 and above, but it is still needed for
Solaris hosts, since they use processor sets, which could be misused.
Cheers
Daniel
On 26.04.2012 at 15:49, Rayson Ho wrote:
> On Thu, Apr 26, 2012 at 7:40 AM, Pablo Escobar wrote:
>> ¿maybe I am missing something in the …
On 19.05.2012 at 19:16, Farkas, Illes wrote:
> Hello,
>
> Is there a command (or an argument/switch of qsub) that tells the queue
> manager to write into a file the maximum amount of memory used by one of the
> jobs during its entire lifetime? To the best of my knowledge, after a job
> finishes …
The expected behavior would be that when there is
never a host with 12 slots free, your cluster
will be filled up with 4-slot jobs, even when they have
lower priorities. The reservation you gave the 12-slot
jobs will be attached at the end of that year. Hence
I would assume that your 12-slot …
> On 25 May 2012 10:31, Daniel Gruber wrote:
>> The expected behavior would be that when there is
>> never a host with 12 slots free, your cluster
>> will be filled up with 4-slot jobs, even when they have
>> lower priorities. The reservation you gave the 12-slot
>
On 25.05.2012 at 12:35, Richard Ems wrote:
> On 05/25/2012 12:27 PM, Daniel Gruber wrote:
>> Exactly, it looks like your runtime estimation for your slot4 jobs
>> is smaller than for your slot12 jobs. Backfilling must be active
>> here. Did you submit both jobs in exac…
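For illustration, a hedged sketch of submissions where backfilling can kick in; the -R y reservation flag, PE name, script names, and runtime values are assumptions, not taken from the thread:

    # large job, with a reservation so it is not starved
    qsub -R y -pe mpi 12 -l h_rt=96:00:00 big_job.sh
    # small jobs with a short runtime estimate can be backfilled around it
    qsub -pe mpi 4 -l h_rt=01:00:00 small_job.sh

Resource reservation itself has to be enabled by setting max_reservation to a value greater than 0 in the scheduler configuration (qconf -msconf).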
With the fill_up allocation rule the scheduler tries to maximize the
number of slots that can be collected on any host. The host
selection order usually does *not* depend on the number of free slots
(although this could be configured).
It looks like you either already have some smaller jobs running
on …
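For reference, a minimal parallel environment sketch with this allocation rule; the PE name and slot count are placeholders:

    qconf -sp fillup.pe
    pe_name            fillup.pe
    slots              9999
    allocation_rule    $fill_up
    ...

The host order can be influenced, for instance, via the queue_sort_method and load_formula settings in the scheduler configuration (qconf -msconf).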
Try "qacct -b 120101 -pe" without anything.
Daniel
On 24.07.2012 at 13:52, Nick Holway wrote:
> Dear all,
>
> I'm trying to get some aggregate stats for all our parallel
> environments using qacct. I'm using "qacct -b 120101 -pe \*" and I
> also tried it with the * in double quotes. Th…
What you could do is create a queue for each GPU you
have on a host and assign each of them a queue-exclusive GPU complex.
The number of GPU queues then limits the number of
GPU jobs. The total number of CPU cores must then be limited
separately by an RQS on a per-host basis.
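A rough sketch of the two pieces; the complex, queue, and limit names as well as the slot count are assumptions:

    # an exclusive boolean complex, attached to each GPU queue
    qconf -sc | grep gpu
    gpu   gpu   BOOL   EXCL   YES   YES   0   1000
    # attach it per queue, e.g.:
    qconf -mattr queue complex_values gpu=true gpu0.q

    # cap the CPU cores per host with a resource quota set
    qconf -srqs cores_per_host
    {
       name     cores_per_host
       enabled  TRUE
       limit    hosts {*} to slots=16
    }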
Daniel
On 23.08.2…
You can set arbitrary signals to be sent when suspension
is triggered (like SIGKILL). See: man queue_conf
section "suspend_method"
Daniel
On 24.08.2012 at 03:13, Joseph Farran wrote:
> Howdy.
>
> Is there a flag one can set on a job so that it will be killed instead of
> being suspended for …
On 24.08.2012 at 08:52, Daniel Gruber wrote:
> You can set arbitrary signals to be sent when suspension
> is triggered (like SIGKILL). See: man queue_conf
> section "suspend_method"
>
> Daniel
>
> On 24.08.2012 at 03:13, Joseph Farran wrote:
>
>> H…
The easiest way would be to give the job a name with qsub -N job1 (or use -terse
to get the job id) and
then use -hold_jid for the second job. You will find more details in the qsub
man page. Of course you
can also use DRMAA, or, more unusually, an array job with task throttling (-tc 1).
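For illustration (script names are placeholders):

    # -terse makes qsub print only the job id
    jid=$(qsub -terse job1.sh)
    qsub -hold_jid "$jid" job2.sh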
Daniel
On 26.10.2012 at 07:58, Joseph Farran wrote:
> Howdy.
>
> One of my queues has a wall time hard limit of 4 days ( 96 hours ):
> # qconf -sq queue | grep h_rt
> h_rt 96:00:00
>
> There is a job which has been running much longer than 4 days and I am not
> sure how to get the …
You need to configure a fixed allocation rule of 8 in your
parallel environment and then request that PE on the command line.
It is common to have multiple parallel environments for
the same job type with different allocation rules.
qconf -mp yourpe_08
...
allocation_rule    8
With a wildcard …
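The wildcard request would then presumably look like this (PE names and slot count are examples):

    # matches yourpe_08, yourpe_04, ...; the scheduler picks a fitting one
    qsub -pe "yourpe_*" 32 job.sh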
On 26.03.2013 at 17:10, Reuti wrote:
> Hi,
>
> On 26.03.2013 at 12:17, Arnau Bria wrote:
>
>> I'm migrating a bash jsv script to perl and adding some
>> modifications, but I have some doubts:
>>
>> 1) jsv_correct vs jsv_accept. From man:
>>
>> If the result_type is ACCEPTED the job will be …
Since queue requests are not part of DRMAA 1 you
should use "DRMAA_NATIVE_SPECIFICATION", which allows
you to set (almost) any qsub command line parameter
available. You can also use job categories, but then
you have to configure them in the qtask file.
DRMAA version 2 specifies "queueName" in the …
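For reference, a hedged example of what such a DRMAA 1 native specification string contains; queue name, runtime, and PE are assumptions:

    -q gpu.q -l h_rt=3600 -pe smp 4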
Hi,
Please notice the difference between "set linear:1:0,0" and
"set linear:1". The first one means: give me one core starting
at socket 0, core 0 (which means here you are obviously
requesting core 0 on socket 0). The second means that
you want one core on the host and the execution daemon
takes …
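The corresponding submission commands would presumably look like this:

    # pin the job to exactly socket 0, core 0
    qsub -binding linear:1:0,0 job.sh
    # request one core and let the execution daemon pick it
    qsub -binding linear:1 job.sh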
There is unfortunately no way in SGE to limit main memory.
h_rss / s_rss does not work with the rlimit call in Linux kernel versions above
2.4.
Hence in Univa Grid Engine we introduced multiple ways of doing main memory
limitation. If you have cgroups support turned on, then the cgroup takes care …
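As a hedged example, with the cgroup integration enabled a memory-capped job would be requested roughly like this (m_mem_free is the complex Univa Grid Engine uses for main memory; the value is arbitrary):

    qsub -l m_mem_free=4G job.sh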
Hi Atul,
We included Intel Xeon Phi support for Univa Grid Engine
in 2012, in Univa Grid Engine 8.1.3. For that it was necessary
to add some new functionality which was missing in Sun Grid Engine.
So basically what we did was:
- create a new resource type which allows you to do a mapping … (sketched below)
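That resource type is presumably what became the RSMAP complex; a rough sketch of such a mapping, with the complex name, host name, and device ids as assumptions:

    # a per-host RSMAP consumable
    qconf -sc | grep phi
    phi   phi   RSMAP   <=   YES   HOST   0   0

    # map two Phi devices on a host
    qconf -me node001
    complex_values   phi=2(mic0 mic1)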
Hi Joe,
Univa Grid Engine 8.3 added such functionality to its APIs (WebService API)
so that you can submit on behalf of another user. The intention is to
simplify building web portals. But this is restricted to users listed in the
new sudoers Grid Engine ACL.
We can chat privately about that i…
Hi Bill,
You changed the global configuration (qconf -mconf or qconf -mconf global).
This is most likely overridden by the host-local configuration.
Try changing it in the host-local configuration (qconf -mconf <hostname>).
You are right that it takes a few seconds for the changes to be propagated, but
it is …
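A hedged sketch, with the host name as a placeholder:

    # show the host-local configuration to see which parameters shadow the global one
    qconf -sconf node001
    # edit the host-local configuration
    qconf -mconf node001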
Hi Mikhail,
That is indeed strange, and the support request is being handled properly in the
support portal. Things I can imagine: you are using host resources which
implicitly request cores when requested (having cores attached with
topology masks), or you are running into a rare strtok() issue …
If you are referring to SGE_EXECD_PORT and SGE_QMASTER_PORT, for example, they
are not really Univa specific. They are installation specific. If you installed UGE
with self-set ports then they are required (set by the settings.sh file). If you
install it taking the ports from the services f…
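For reference, the services-file alternative means entries roughly like the following (6444 and 6445 are the commonly used defaults; adjust to your site):

    # /etc/services
    sge_qmaster   6444/tcp
    sge_execd     6445/tcp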
No direct support for that in SGE.
When a job is released from hold (like when another one starts), that does not
mean it is executed. Hence you would not have any guarantee that
both are running at the same point in time.
You could submit the successor before the other one and give it the job id
of t…
Just to add to what Ondrej said: there are two different settings implemented in
the initial cgroup integration.
One allows memory to be over-committed as long as there is no memory pressure in
the kernel. But the actual
behavior depends on the Linux kernel. For debugging what Grid Engine set, you
can …
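Presumably this maps to the hard and soft limits of the cgroup memory controller; the values that were set can be inspected directly in the cgroup filesystem (the path below is an assumption and depends on the cgroup mount and the job/task id):

    cat /sys/fs/cgroup/memory/UGE/<jobid>.<taskid>/memory.limit_in_bytes
    cat /sys/fs/cgroup/memory/UGE/<jobid>.<taskid>/memory.soft_limit_in_bytes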
On 03.08.2011 at 10:28, William Hay wrote:
> On 2 August 2011 17:58, Rayson Ho wrote:
>> It's a bug introduced by another bug fix in SGE 6.2u5, and Oracle was
>> first who fixed the bug in Oracle Grid Engine. Then we added a
>> workaround in SGE 6.2u5p1 in Open Grid Scheduler, and Son of Grid
>>
On 08.08.2011 at 18:41, William Deegan wrote:
> On 8/6/2011 12:59 AM, Daniel Gruber wrote:
>> On 03.08.2011 at 10:28, William Hay wrote:
>>
>>> On 2 August 2011 17:58, Rayson Ho wrote:
>>>> It's a bug introduced by another bug fix in SGE 6.2u5, and O…