On 16.05.2012 at 09:44, Arturo wrote:

> What I want to do is that a user could request 128 slots in just 2 nodes of 
> 64 cores. Using the parallel environment
> $ qsub -q conmat -pe foobar 5 submit.sh
> makes the script to be executed in 5 slots, but they could be assigned to 5 
> different nodes.

The selection of nodes depends on the allocation_rule in the PE setting. It's 
true that this selection criterion is set up by the admin, not by the user as 
in Torque.
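For reference, the allocation_rule lives in the PE definition and can be inspected or edited with qconf (the PE name "foobar" is taken from your example and may differ on your cluster):

```shell
# Show the current PE definition, including its allocation_rule line
qconf -sp foobar

# Open the PE definition in $EDITOR; change allocation_rule there,
# e.g. to $pe_slots, $fill_up, $round_robin, or a fixed integer
qconf -mp foobar
```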

But a second variable slots_free won't help - how would such a variable be used 
to select any particular number of hosts?


> If I try to use the built-in complex slots:
> $ qsub -q conmat -l slots=5 submit.sh

To get 5 slots on a single machine, the allocation_rule can be set to $pe_slots. 
If you have 64-core machines and want to be sure to get 2 and only 2 machines 
for a request of 128 slots, the "exclusive" complex feature can be used, 
requested in addition to the allocation_rule $fill_up. 
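A rough sketch of that setup (the complex name "exclusive", its shortcut, and the host name node045 are illustrative; adjust to your site):

```shell
# 1) Define a boolean consumable with relation EXCL via "qconf -mc";
#    add a line with these columns (name shortcut type relop
#    requestable consumable default urgency):
#
#      exclusive   excl   BOOL   EXCL   YES   YES   0   1000

# 2) Attach it to every exec host that may be handed out whole,
#    via "qconf -me node045":
#
#      complex_values   exclusive=true

# 3) With allocation_rule $fill_up in the PE, a 128-slot request
#    then fills whole nodes and excludes other jobs from them:
qsub -q conmat -pe foobar 128 -l exclusive=true submit.sh
```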

Another option would be a fixed allocation_rule of 64. But then you are limited 
to multiples of 64, of course.
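Assuming 64-core hosts, the fixed-rule variant might look like this (the PE name "foobar64" is made up for the example):

```shell
# Create a PE whose jobs always receive exactly 64 slots per host
# (fill in the template that "qconf -ap foobar64" opens):
#
#   pe_name            foobar64
#   slots              99999
#   allocation_rule    64
#   control_slaves     TRUE
#   job_is_first_task  FALSE

# A 128-slot request then lands on exactly 2 hosts:
qsub -q conmat -pe foobar64 128 submit.sh
```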

-- Reuti


> gives an error so I can't use it:
> 
> Unable to run job: "job" denied: use parallel environments instead of 
> requesting slots explicitly.
> Exiting.
> 
> So, I have created a new complex_value, slotsfree similar to slots, but I can 
> use it as a requestable and consumable.
> 
> But now, thanks to William, we have observed that slotsfree consumption is 
> multiplied by the slots configured in the PE.
> 
> Do you understand what my problem is?
> 
> Ideally I would like to specify how many slots to use (128, for example) and 
> across how many different nodes, but without explicitly specifying which nodes.
> 
> Many thanks for your help!!!
> 
> On 15/05/12 16:32, Reuti wrote:
>> On 15.05.2012 at 16:23, Arturo wrote:
>> 
>>> <snip>
>>> 
>>> 
>>> It doesn't matter to which queue I submit the script.
>>> 
>>> I would use the built in slot complex, but when I use it gives me this 
>>> error:
>>> 
>>> qsub -q conmat -l slots=5 submit.sh
>>> Unable to run job: "job" denied: use parallel environments instead of 
>>> requesting slots explicitly.
>>> Exiting.
>> As the message says: you need to request a PE with the proper slot count:
>> 
>> $ qsub  -q conmat -pe foobar 5 submit.sh
>> 
>> -- Reuti
>> 
>> 
>>> Regards
>>> 
>>> On 15/05/12 16:12, William Hay wrote:
>>>> Ok that makes more sense.  The queue instance on node045 is called
>>>> conmat not test.   If test only exists as a single slot on each of
>>>> node046 and node047
>>>> then when you request -q test you are restricting it to those two
>>>> slots which isn't enough for a 4 slot job.
>>>> We would really need the full output of qstat -f to be sure though.
>>>> 
>>>> 
>>>> William
>>>> On 15 May 2012 14:42, Arturo
>>>> <[email protected]>
>>>>  wrote:
>>>> 
>>>>> More info:
>>>>> 
>>>>> output of qstat -f
>>>>> 
>>>>> ---------------------------------------------------------------------------------
>>>>> 
>>>>> [email protected]
>>>>>       BIP   0/0/64         0.00     lx26-amd64
>>>>> ---------------------------------------------------------------------------------
>>>>> 
>>>>> ############################################################################
>>>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING 
>>>>> JOBS
>>>>> ############################################################################
>>>>>   74550 0.60500 test     arturo       qw    05/15/2012 15:26:50     4
>>>>> 
>>>>> qconf -sq test |grep slot
>>>>> 
>>>>>     slots                 64
>>>>> 
>>>>> 
>>>>> qconf -sp openmpi |grep slots
>>>>> 
>>>>> slots              99999
>>>>> urgency_slots      min
>>>>> 
>>>>> Regards
>>>>> 
>>>>> On 15/05/12 15:39, Arturo wrote:
>>>>> 
>>>>> Hi William,
>>>>> 
>>>>> you were right, it was running on various nodes:
>>>>> 
>>>>>   74545 0.60500 test     arturo       r     05/15/2012 15:17:46
>>>>> 
>>>>> [email protected]
>>>>>       MASTER
>>>>> 
>>>>> 
>>>>> [email protected]
>>>>>       SLAVE
>>>>>   74545 0.60500 test     arturo       r     05/15/2012 15:17:46
>>>>> 
>>>>> [email protected]
>>>>>         SLAVE
>>>>>   74545 0.60500 test     arturo       r     05/15/2012 15:17:46
>>>>> 
>>>>> [email protected]
>>>>>         SLAVE
>>>>> 
>>>>> Well, looking more deeply, the problem is that I created a consumable and
>>>>> requestable complex value "slotsfree" and assigned it to node045 with the
>>>>> value slotsfree=8 (for example).
>>>>> 
>>>>> If I submit a job to this node using a parallel environment but without
>>>>> this complex_value configured, it works perfectly.
>>>>> And when I submit a job to this node without using a PE, but with this
>>>>> complex_value configured, it also works.
>>>>> But when I submit the same job using both the PE and the complex_value, it
>>>>> doesn't work, and the output only says this:
>>>>> 
>>>>> cannot run in PE "openmpi" because it only offers 2 slots
>>>>> 
>>>>> 
>>>>> Is it clearer now? Why does it not work if the PE is configured without a
>>>>> slot limitation, the node has 64 slots, and the slotsfree value is greater
>>>>> than 4?
>>>>> 
>>>>> Thanks for your help.
>>>>> 
>>>>> Regards
>>>>> Arturo
>>>>> 
>>>>> 
>>>>> On 15/05/12 14:33, William Hay wrote:
>>>>> 
>>>>> On 15 May 2012 13:05, Arturo
>>>>> <[email protected]>
>>>>>   wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am seeing very strange behaviour when I try to use a parallel environment
>>>>> with the hard_queue_list option.
>>>>> 
>>>>> In my script I have a parallel configuration:
>>>>> 
>>>>>      #$ -pe openmpi 4
>>>>> 
>>>>> and if submit the script in the following way it works and runs in node
>>>>> test@node045
>>>>> 
>>>>>      qsub script.sh
>>>>> 
>>>>> But if I submit the script using the hard_queue_list, it doesn't run:
>>>>> 
>>>>>      qsub -q test script.sh
>>>>> 
>>>>> With this error:
>>>>> 
>>>>>      cannot run in PE "openmpi" because it only offers 2 slots
>>>>> 
>>>>> Obviously, the node is always empty. What may be wrong?
>>>>> 
>>>>> It's hard to diagnose what's going on without knowing more about your
>>>>> configuration.
>>>>> Are you certain the entire job is running in the queue instance
>>>>> test@node045 when you submit without a queue list?
>>>>> One possibility is that queue test@node045 has only two slots.  The
>>>>> master slot of the job plus one slave runs
>>>>> in test@node045 while the remaining slots run elsewhere.
>>>>> 
>>>>> When the job is running what output do you get from qstat -g t?
>>>>> 
>>>>> William
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
> 
> 

