This might be nothing, but I usually call --gres with an equals sign:

    srun --gres=gpu:k10:8

I'm not sure whether the equals sign is optional or not.

On Wed, Nov 16, 2016 at 4:34 AM, Dmitrij S. Kryzhevich <kryz...@ispms.ru> wrote:
>
> Hi,
>
> I have some issues with gres usage. I'm running slurm of 16.05.4 version and
> I have a small stand with 4 nodes+master. The best description of it would
> be to paste confs:
> slurm.conf: http://paste.org.ru/?m8v7ca
> gres.conf: http://paste.org.ru/?ouspnz
> They are populated on each node.
>
> And the problem is following:
>
> [dkryzhevich@gpu ~]$ srun -N 1 --gres gpu:c2050 <whatever>
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
> [dkryzhevich@gpu ~]$
>
> Relevant logs: http://paste.org.ru/?mj4dfs
> Whatever I did with --gres flag it just does not start. What am I missing
> here?
>
> I tried to remove Type column from gres.conf and all nodes have gone into
> "drain" state. I tried to remove all details from Gres column in slurm.conf
> in addition (i.e. "NodeName=node2 Gres=gpu:1 CoresPerSocket=2
> ThreadsPerCore=2 State=UNKNOWN") and task was submitted but I want the
> ability to specify type of card in case I really need it.
>
> And two small unrelevant questions.
> 1. Is it possible to submit a job from any node, or is it master only? Start
> secondary slurmctl daemon on each node may be, I don't know.
> 2. Is it possible to start a job on two separate nodes with nvidia cards in
> a way something like
> $ srun --gres gpu:2
> ? The point is to use 2-3-4 cards installed on different nodes with some MPI
> connection between threads.
>
> BR,
> Dmitrij