Dear Mahmood,

I'm not aware of any nodes that have 32, or even 10, sockets. Are you sure you want to use the cluster like that?
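
A quick sanity check (just a sketch, assuming slurmd is installed on the compute nodes and you can log in to them) would be to compare what the hardware actually reports against the node definitions in slurm.conf:

$ ssh compute-0-0 slurmd -C
$ ssh compute-0-0 'lscpu | grep -E "Socket|Core|Thread"'

If slurmd -C reports far fewer sockets than scontrol shows, the node definitions in slurm.conf probably do not match the real topology.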

Best
Marcus

On 12/17/19 10:03 AM, Mahmood Naderan wrote:
Please see the latest update

# for i in {0..2}; do scontrol show node compute-0-$i | grep RealMemory; done && scontrol show node hpc | grep RealMemory
   RealMemory=64259 AllocMem=1024 FreeMem=57163 Sockets=32 Boards=1
   RealMemory=120705 AllocMem=1024 FreeMem=97287 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=40045 Sockets=32 Boards=1
   RealMemory=64259 AllocMem=1024 FreeMem=24154 Sockets=10 Boards=1



$ sbatch slurm_qe.sh
Submitted batch job 125
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               125       SEA    qe-fb  mahmood PD       0:00      4 (Resources)
               124       SEA   U1phi1   abspou  R       3:52      4 compute-0-[0-2],hpc
$ scontrol show -d job 125
JobId=125 JobName=qe-fb
   UserId=mahmood(1000) GroupId=mahmood(1000) MCS_label=N/A
   Priority=1751 Nice=0 Account=fish QOS=normal WCKey=*default
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=30-00:00:00 TimeMin=N/A
   SubmitTime=2019-12-17T12:29:08 EligibleTime=2019-12-17T12:29:08
   AccrueTime=2019-12-17T12:29:08
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T12:29:09
   Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:22742
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=4-4 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,mem=40G,node=4,billing=20
   Socks/Node=* NtasksPerN:B:S:C=5:0:*:* CoreSpec=*
   MinCPUsNode=5 MinMemoryNode=10G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/mahmood/qe/f_borophene/slurm_qe.sh
   WorkDir=/home/mahmood/qe/f_borophene
   StdErr=/home/mahmood/qe/f_borophene/my_fb.log
   StdIn=/dev/null
   StdOut=/home/mahmood/qe/f_borophene/my_fb.log
   Power=

$ cat slurm_qe.sh
#!/bin/bash
#SBATCH --job-name=qe-fb
#SBATCH --output=my_fb.log
#SBATCH --partition=SEA
#SBATCH --account=fish
#SBATCH --mem=10GB
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=5
mpirun -np $SLURM_NTASKS /share/apps/q-e-qe-6.5/bin/pw.x -in f_borophene_scf.in
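
For reference, the script requests 4 nodes with 5 tasks each (20 CPUs) and 10 GB of memory per node. To check how many CPUs and how much memory each node currently has free, I can run (a sketch using standard sinfo format options):

$ sinfo -N -o "%N %C %m %e"

Here %C shows CPUs as allocated/idle/other/total, %m the configured memory and %e the free memory.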




You can also see the details of job 124:


$ scontrol show -d job 124
JobId=124 JobName=U1phi1
   UserId=abspou(1002) GroupId=abspou(1002) MCS_label=N/A
   Priority=958 Nice=0 Account=fish QOS=normal WCKey=*default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:06:17 TimeLimit=30-00:00:00 TimeMin=N/A
   SubmitTime=2019-12-17T12:25:17 EligibleTime=2019-12-17T12:25:17
   AccrueTime=2019-12-17T12:25:17
   StartTime=2019-12-17T12:25:17 EndTime=2020-01-16T12:25:17 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-17T12:25:17
   Partition=SEA AllocNode:Sid=hpc.scu.ac.ir:20085
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=compute-0-[0-2],hpc
   BatchHost=compute-0-0
   NumNodes=4 NumCPUs=24 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=24,mem=4G,node=4,billing=24
   Socks/Node=* NtasksPerN:B:S:C=6:0:*:* CoreSpec=*
     Nodes=compute-0-[0-2],hpc CPU_IDs=0-5 Mem=1024 GRES=
   MinCPUsNode=6 MinMemoryNode=1G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/slurm_script.sh
   WorkDir=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1
   StdErr=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
   StdIn=/dev/null
   StdOut=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
   Power=
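
Job 124 already holds CPU_IDs 0-5 and 1024 MB on each of those nodes (see the Nodes= line above). The per-node allocations can also be checked directly, e.g. (a sketch reusing the node names from above):

$ for n in compute-0-0 compute-0-1 compute-0-2 hpc; do scontrol show node $n | grep -E 'CPUAlloc|AllocMem'; done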


I cannot figure out the root of the problem.



Regards,
Mahmood




On Tue, Dec 17, 2019 at 11:18 AM Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:

    Dear Mahmood,

    could you please show the output of

    scontrol show -d job 119

    Best
    Marcus


--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
