Dear Ran,

you can only ask for GPUs PER NODE, as GRES are resources per node.

So, if you ask for 5 GPUs, you will get 5 GPUs on each of the two nodes.
At the moment it is not possible to ask for 8 GPUs on one node and 2 on another. That MIGHT change with Slurm 19.05, since SchedMD is overhauling, among other things, the GPU handling within Slurm.
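To illustrate the per-node semantics, a minimal batch-script sketch (the job body and node count are placeholders):

```shell
#!/bin/bash
# Per-node GRES: this requests 5 GPUs on EACH of the 2 nodes,
# i.e. 10 GPUs in total -- not 5 GPUs spread across the job.
#SBATCH --nodes=2
#SBATCH --gres=gpu:5
srun hostname
```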


Best
Marcus

On 4/16/19 9:15 AM, Ran Du wrote:
Dear Antony,

      It worked!

      I checked the allocation, and here is the record:

      Nodes=gpu012 CPU_IDs=0-2 Mem=3072 GRES_IDX=gpu:v100(IDX:0-7) Nodes=gpu013 CPU_IDs=0 Mem=1024 GRES_IDX=gpu:v100(IDX:0-7)

      The job got exactly what it requested.

      And another question: how can we request a number of cards that is not divisible by 8? For example, 10 GPU cards: 8 cards on one node and 2 cards on another?

     Thanks a lot again for your kind help.

Best regards,
Ran

On Mon, Apr 15, 2019 at 8:25 PM Ran Du <bella.ran...@gmail.com <mailto:bella.ran...@gmail.com>> wrote:

    Dear Antony,

           Thanks a lot for your reply. I submitted a job following
    your advice, and there were no more sbatch errors.

           But because our cluster is under maintenance, I have to
    wait until tomorrow to see whether the GPU cards are allocated
    correctly.  I will let you know as soon as the job runs
    successfully.

           Thanks a lot for your kind help.

    Best regards,
    Ran

    On Mon, Apr 15, 2019 at 4:40 PM Antony Cleave
    <antony.cle...@gmail.com <mailto:antony.cle...@gmail.com>> wrote:

        Ask for 8 gpus on 2 nodes instead.

        In your script, just change the 16 to 8 and it should do
        what you want.

        You are currently asking for 2 nodes with 16 GPUs each,
        because GRES resources are counted per node.
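        Concretely, the corrected request might look like this (a sketch based on the options quoted below; only the GRES count changes):

```shell
#SBATCH --nodes=2
#SBATCH --gres=gpu:v100:8    # 8 GPUs per node x 2 nodes = 16 GPUs total
```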

        Antony

        On Mon, 15 Apr 2019, 09:08 Ran Du, <bella.ran...@gmail.com
        <mailto:bella.ran...@gmail.com>> wrote:

            Dear all,

                 Does anyone know how to set #SBATCH options to get
            multiple GPU cards from different worker nodes?

                 One of our users would like to request 16 NVIDIA
            V100 cards for his job, but there are only 8 GPU cards
            on each worker node.  I have tried the following #SBATCH
            options:

                  #SBATCH --partition=gpu
                  #SBATCH --qos=normal
                  #SBATCH --account=u07
                  #SBATCH --job-name=cross
                  #SBATCH --nodes=2
                  #SBATCH --mem-per-cpu=1024
                  #SBATCH --output=test.32^4.16gpu.log
                  #SBATCH --gres=gpu:v100:16

                  but got the following sbatch error message:
                  sbatch: error: Batch job submission failed:
            Requested node configuration is not available

                  And I found a similar question on Stack Overflow:
            
https://stackoverflow.com/questions/45200926/how-to-access-to-gpus-on-different-nodes-in-a-cluster-with-slurm

                  It says that allocating multiple GPU cards across
            different worker nodes is not possible. The post is from
            2017; is that still true at present?

                  Thanks a lot for your help.

            Best regards,
            Ran


--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
