Your slurm.conf line doesn't specify the node's physical memory:

NodeName=ozd2485u Gres=gpu:2 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 State=UNKNOWN

See "man slurm.conf":

   RealMemory
Size of real memory on the node in megabytes (e.g. "2048"). The default value is 1.
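
As a sketch of the fix, the node line could be extended with RealMemory. The value 128000 below is only a placeholder, not your node's actual memory. Running "slurmd -C" on the compute node prints the hardware that Slurm detects, including a RealMemory value you can copy:

```
# slurm.conf: RealMemory=128000 is a placeholder value (in MB).
# Replace it with the number reported by "slurmd -C" on the node.
NodeName=ozd2485u Gres=gpu:2 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=128000 State=UNKNOWN
```

After editing slurm.conf, the file typically has to be copied to all nodes and the slurmctld and slurmd daemons restarted before the change takes effect.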

On 4/9/19 8:47 AM, sudhagar s wrote:
Attaching my slurm.conf file. Can you please help me find the issue?

On Tue, Apr 9, 2019 at 12:08 PM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

    On 09-04-2019 08:33, sudhagar s wrote:
     > Thanks Ole,
     >
     > When I run "scontrol show node" it lists the details, where I can
     > see RealMemory=1. Will this be a problem?

    In your "scontrol show node" image I read RealMemory=1 (units of MB)
    and mem=1M.  I think you configured slurm.conf incorrectly.

     > On Tue, Apr 9, 2019 at 11:53 AM Ole Holm Nielsen
     > <ole.h.niel...@fysik.dtu.dk> wrote:
     >
     >     On 09-04-2019 07:37, sudhagar s wrote:
     >      > Hi, I am a newbie to Slurm, trying to set up a cluster for ML
     >      > training purposes. I created a control node and a compute node.
     >      > Both are up and running.
     >      >
     >      > When I enter "srun -N 1 hostname" it says
     >      > "srun error memory specification can not be satisfied"
     >      > "unable to allocate resources: requested node configuration
     >      > is not available"
     >      >
     >      > How do I fix this?
     >
     >     Probably you made some errors in configuring slurm.conf. Look at your
     >     NodeName and PartitionName definitions to figure out why the
     >     resources are incorrect.
     >
     >     /Ole

