[slurm-users] Having issue in running Job using tensorflow

2019-04-16 Thread sudhagar s
sh-4.3# srun -N 2 -n 40 -t 24:00:00 job.sh srun: error: timeout waiting for task launch, started 0 of 40 tasks srun: Job step 13.0 aborted before step completely launched. srun: Job step aborted: Waiting up to 32 seconds for job step to finish. slurmstepd: error: *** STEP 13.0 ON ozd2485u CANCELLE

Re: [slurm-users] Having Issue in Slurm cluster setup

2019-04-08 Thread sudhagar s
Attaching my slurm.conf file. Can you please help me find the issue? On Tue, Apr 9, 2019 at 12:08 PM Ole Holm Nielsen wrote: > On 09-04-2019 08:33, sudhagar s wrote: > > Thanks Ole, > > > > when i give "scontrol show node" it list down the details. where i c

Re: [slurm-users] Having Issue in Slurm cluster setup

2019-04-08 Thread sudhagar s
I didn't install any additional GPU card. I run this Z840 workstation with the default GPU (P2000), which is used for display (VGA). Could this be the reason for the error? On Tue, Apr 9, 2019 at 12:01 PM Ole Holm Nielsen wrote: > On 09-04-2019 08:25, sudhagar s wrote: > > Thank

Re: [slurm-users] Having Issue in Slurm cluster setup

2019-04-08 Thread sudhagar s
Thanks Ole, when I run "scontrol show node" it lists the details, where I can see RealMemory=1. Will this be a problem? On Tue, Apr 9, 2019 at 11:53 AM Ole Holm Nielsen wrote: > On 09-04-2019 07:37, sudhagar s wrote: > > Hi, Iam newbee in slurm. trying to set

[slurm-users] Having Issue in Slurm cluster setup

2019-04-08 Thread sudhagar s
Hi, I am a newbie in Slurm, trying to set up a cluster for ML training purposes. I created a control node and a compute node; both are up and running. When I enter "srun -N 1 hostname" it says "srun error memory specification can not be satisfied" "unable to allocate resources: requested node configurati