[slurm-users] Finding submitted job script

2018-07-10 Thread Mahmood Naderan
Hi, how can I check the submitted script of a running job based on its jobid? Regards, Mahmood

Re: [slurm-users] Finding submitted job script

2018-07-10 Thread Mahmood Naderan
a Nettelblad, UPPMAX > On Tue, Jul 10, 2018 at 6:16 PM, Shenglong Wang wrote: >> scontrol show job -dd JOBID >> then search >> Command= >> Best, Shenglong >> On Jul 10, 2018, at 12:02 PM, Mahmood Naderan wrote: >> Hi, how can I check the submitted script of a running job based on its jobid? >> Regards, Mahmood
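For reference, the suggested check looks roughly like this (a minimal sketch; JOBID stands for the real job id, and the batch_script subcommand only exists in newer Slurm releases):

# Print the detailed job record and pick out the submitted script path
$ scontrol show job -dd JOBID | grep Command=

# Newer releases can also dump the submitted batch script to a file
$ scontrol write batch_script JOBID JOBID.sh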

[slurm-users] cpu limit issue

2018-07-10 Thread Mahmood Naderan
Hi, I see that although I have specified a cpu limit of 12 for a user, his job only utilizes 8 cores. [root@rocks7 ~]# sacctmgr list association format=partition,account,user,grptres,maxwall Partition Account User GrpTRES MaxWall -- -- -- - -
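For context, a per-association CPU cap of this kind is normally set and verified along these lines (a sketch; the user and account names are placeholders, not taken from the thread):

# Cap the association at 12 CPUs in total
$ sacctmgr modify user name=someuser account=someaccount set GrpTRES=cpu=12

# Verify the limit
$ sacctmgr list association format=partition,account,user,grptres,maxwall

Note that GrpTRES only caps what Slurm will allocate; whether the application actually uses all allocated cores is up to the application, which is the point made in the replies below.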

Re: [slurm-users] cpu limit issue

2018-07-11 Thread Mahmood Naderan
>Check the Gaussian log file for mention of its using just 8 CPUs-- just because there are 12 CPUs available doesn't mean the program uses all of >them. It will scale-back if 12 isn't a good match to the problem as I recall. Well, in the log file, it says *

Re: [slurm-users] cpu limit issue

2018-07-11 Thread Mahmood Naderan
>Try running ps -eaf --forest while a job is running. noor 30907 30903 0 Jul10 ? 00:00:00 \_ /bin/bash /var/spool/slurmd/job00749/slurm_script noor 30908 30907 0 Jul10 ? 00:00:00 \_ g09 trimmer.gjf noor 30909 30908 99 Jul10 ? 4-13:00:21 \_ /usr/local/chem

Re: [slurm-users] cpu limit issue

2018-07-11 Thread Mahmood Naderan
My fault. I had one of the other nodes in mind! The node which is running g09 is [root@compute-0-3 ~]# ps aux | grep l502 root 11198 0.0 0.0 112664 968 pts/0 S+ 13:31 0:00 grep --color=auto l502 nooriza+ 30909 803 1.4 21095004 947968 ? Rl Jul10 6752:47 /usr/local/chem/g

[slurm-users] siesta jobs with slurm, an issue

2018-07-22 Thread Mahmood Naderan
Hi, I don't know why siesta jobs are aborted by the slurm. [mahmood@rocks7 sie]$ cat slurm_script.sh #!/bin/bash #SBATCH --output=siesta.out #SBATCH --job-name=siesta #SBATCH --ntasks=8 #SBATCH --mem=4G #SBATCH --account=z3 #SBATCH --partition=EMERALD mpirun /share/apps/chem/siesta-4.0.2/spar/sies

Re: [slurm-users] siesta jobs with slurm, an issue

2018-07-22 Thread Mahmood Naderan
I am able to directly run the command on the node. Please note in the following output that I have pressed ^C after some minutes. So, the errors are related to ^C. [mahmood@compute-0-3 ~]$ mpirun -np 4 /share/apps/chem/siesta-4.0.2/spar/siesta dimer1prime.fdf dimer1prime.out Siesta Version : v4.

Re: [slurm-users] siesta jobs with slurm, an issue

2018-07-22 Thread Mahmood Naderan
Yes. Since I cannot log in to the nodes with my user account, I first ssh to the node as root and then su there. [root@rocks7 ~]# ssh compute-0-3 Warning: untrusted X11 forwarding setup failed: xauth key data not generated Last login: Sun Jul 22 21:40:09 2018 from rocks7.local Rocks Compute Node Rock

Re: [slurm-users] siesta jobs with slurm, an issue

2018-07-22 Thread Mahmood Naderan
Thanks for the hint. In fact the siesta user wasted my time too!! :/ Regards, Mahmood On Sun, Jul 22, 2018 at 11:13 PM, Renfro, Michael wrote: > You’re getting the same fundamental error in both the interactive and > batch version, though. > > The ‘reinit: Reading from standard input’ line se

[slurm-users] Unable to contact slurm controller

2018-07-31 Thread Mahmood Naderan
Hi, It seems that squeue is broken due to the following error: [root@rocks7 ~]# squeue slurm_load_jobs error: Unable to contact slurm controller (connect failure) [root@rocks7 ~]# systemctl restart slurmd [root@rocks7 ~]# systemctl restart slurmctld [root@rocks7 ~]# squeue slurm_load_jobs error:

Re: [slurm-users] Unable to contact slurm controller

2018-07-31 Thread Mahmood Naderan
I don't know what happened. It seems that it had crashed earlier. [root@rocks7 ~]# systemctl status slurmctld -l ● slurmctld.service - Slurm controller daemon Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) si

Re: [slurm-users] Unable to contact slurm controller

2018-07-31 Thread Mahmood Naderan
Thank you very much. It seems that there was an unknown control character in one of the config files which I couldn't see that in the editor. Regards, Mahmood On Tue, Jul 31, 2018 at 10:22 PM, Hadrian Djohari wrote: > Look at /var/log/slurm/slurmctld.log > >

[slurm-users] squeue column width

2018-08-29 Thread Mahmood Naderan
Hi I want to make the user column larger than the default display format. The following command fails # squeue --format="user%20" squeue: error: Invalid job format specification: user%20 user%20 Regards, Mahmood
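The format string expects % conversion characters rather than column names, so a widened user column would look something like this (a sketch based on the squeue man page; the field widths are arbitrary):

# %u is the user name; a width can be placed between % and the letter
$ squeue --format="%.10i %.9P %.20u %.8T %.10M %R"

# The long-format option uses field:width pairs instead
$ squeue --Format="jobid,partition,username:20,state,timeused,reasonlist"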

[slurm-users] Defining constraints for job dispatching

2018-09-01 Thread Mahmood Naderan
Hi, I have found that when user A is running a fluent job (some 100% processes in top) and user B decides to run a fluent job for his own, the console window of fluent shows some messages that another fluent process is running and it can not set affinity. This is not an error, but I see that the sp

Re: [slurm-users] Defining constraints for job dispatching

2018-09-20 Thread Mahmood Naderan
my (currently few) Fluent users do all their GUI work off the cluster, > and just submit batch jobs using the generated case and data files. > > -- > Mike Renfro / HPC Systems Administrator, Information Technology Services > 931 372-3601 / Tennessee Tech University > > > O

[slurm-users] Dealing with wrong things that users do

2018-09-20 Thread Mahmood Naderan
Hi, Since users are not experts with the cluster environment and Slurm, their mistakes take up my time. It seems that when their fluent job crashes for some reason, or they decide to close the fluent window without terminating the job or closing the terminal suddenly or ... the fluent processes r

Re: [slurm-users] x11 forwarding not available?

2018-10-15 Thread Mahmood Naderan
Dave, With previous versions, I followed some steps with the help of guys here. Don't know about newer versions. Please send me a reminder in the next 24 hours and I will send you the instructions. At the moment, I don't have access to the server. Regards, Mahmood Sent from Gmail on Android

Re: [slurm-users] x11 forwarding not available?

2018-10-16 Thread Mahmood Naderan
Dave, My platform is Rocks with CentOS 7.0. It may not be exactly your case, but it may help you with some ideas on what to do. I used https://github.com/hautreux/slurm-spank-x11 and here is the guide that Ian Mortimer gave me: There should be a binary slurm-spank-x11 and a library x11.so whic

[slurm-users] Removing a node

2018-10-17 Thread Mahmood Naderan
Hi, I have removed a node, but the squeue command doesn't work and it seems that it still searches for the missing node. [root@rocks7 home]# > /var/log/slurm/slurmctld.log [root@rocks7 home]# systemctl restart slurmctld [root@rocks7 home]# systemctl restart slurmd [root@rocks7 home]# rocks sync s

[slurm-users] About x11 support

2018-11-15 Thread Mahmood Naderan
Hi, Is there any update about native support of x11 in slurm v18? Prior to that, I used spank-x11 where an rpm file was installed on the nodes to support x11. Now that I removed the rpm, I can not use srun with x11 support. [mahmood@rocks7 ~]$ srun --nodelist=rocks7 -n 1 -c 4 --mem=4G --x11 -A y8

Re: [slurm-users] About x11 support

2018-11-15 Thread Mahmood Naderan
>You can (apparently) still use the external plugin if you build Slurm without >its internal X11 support. Is there any way to query slurm to see if the x11 module has been compiled? Currently, I am using the slurm roll on rocks 7. Previously, I was able to use spank with slurm roll 17. While, the

Re: [slurm-users] About x11 support

2018-11-16 Thread Mahmood Naderan
So, is it still possible to use spank even when the code is compiled for x11? It seems that Rocks uses RSA keys. It also uses hostbasedauthentication. [root@rocks7 ~]# cd /etc/ssh/ [root@rocks7 ssh]# ls authorized_keys shosts.equiv ssh_host_ecdsa_key ssh_host_ed25519_key.pub ssh_known_hosts mod

Re: [slurm-users] About x11 support

2018-11-17 Thread Mahmood Naderan
>What does this command say? >scontrol show config | fgrep PrologFlags [root@rocks7 ~]# scontrol show config | fgrep PrologFlags PrologFlags = Alloc,Contain,X11 That means x11 has been compiled in the code (while Werner created the roll). >Check your slurmd logs on the compute n
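For anyone checking the same thing, the relevant settings can be pulled out of the running configuration like this (a sketch; the grep patterns just filter scontrol's output):

# The built-in forwarding requires PrologFlags to contain X11
$ scontrol show config | grep -i prologflags

# Any other X11-related settings that made it into slurm.conf
$ scontrol show config | grep -i x11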

Re: [slurm-users] About x11 support

2018-11-19 Thread Mahmood Naderan
ms to be right Regards, Mahmood On Mon, Nov 19, 2018 at 3:27 PM Chris Samuel wrote: > On Sunday, 18 November 2018 4:24:08 AM AEDT Mahmood Naderan wrote: > > > >What does this command say? > > > > > >scontrol show config | fgrep PrologFlags > >

Re: [slurm-users] About x11 support

2018-11-19 Thread Mahmood Naderan
means the terminal was blocked while xclock was running. Regards, Mahmood On Mon, Nov 19, 2018 at 7:21 PM Mahmood Naderan wrote: > With and without --x11, I am not able to see xclock on a compute node. > > [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-3 -n 1 -c 6 --mem=8G &

Re: [slurm-users] About x11 support

2018-11-20 Thread Mahmood Naderan
8-November/072367.html [2] https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2018-November/072380.html Regards, Mahmood On Tue, Nov 20, 2018 at 2:58 PM Chris Samuel wrote: > On Tuesday, 20 November 2018 2:51:26 AM AEDT Mahmood Naderan wrote: > > > With and without --x11, I

Re: [slurm-users] About x11 support

2018-11-21 Thread Mahmood Naderan
>The 'fix' for Mahmood would be to ssh to another host and then submit >the X11 job. The idea is to have a job manager that finds the best node for a newly submitted job. If the user has to manually ssh to a node, why should one use Slurm or any other job manager at all? Regards, Mahmood

Re: [slurm-users] About x11 support

2018-11-23 Thread Mahmood Naderan
Hi Gareth, Thanks for the info. My cluster is not a big one and I have configured it in the following way. 1- A frontend which runs Rocks 7 (based on CentOS 7) with gnome. Users log in to this node *only* via vncviewer. 2- While a user is connected to his gnome desktop, he opens a terminal and may r

Re: [slurm-users] About x11 support

2018-11-23 Thread Mahmood Naderan
!/bin/bash > xterm > > You should get an xterm in a batch session. > head mydomain.com must be network accessible from the compute nodes. > > Gareth > > Get Outlook for Android <https://aka.ms/ghei36> > > -- > *From:* slurm-users on

Re: [slurm-users] About x11 support

2018-11-23 Thread Mahmood Naderan
network accessible from the compute nodes. > > Gareth > > Get Outlook for Android <https://aka.ms/ghei36> > > -- > *From:* slurm-users on behalf of > Mahmood Naderan > *Sent:* Friday, November 23, 2018 6:34:42 PM > *To:* Slurm User Com

Re: [slurm-users] About x11 support

2018-11-23 Thread Mahmood Naderan
>You would need to manipulate the xauth and DISPLAY settings to make them take a different form (hostname:number or IP:number). This is not hard >when you know the trick... Can you give me a keyword to search for? I cannot understand what needs to be done. Regards, Mahmood

Re: [slurm-users] About x11 support

2018-11-23 Thread Mahmood Naderan
>I suspect if you do: >echo $DISPLAY >it will say something like :0 and Slurm doesn't allow that at present. Actually that is not applicable here. Please see below [mahmood@rocks7 ~]$ echo $DISPLAY :1 [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-3 -n 1 -c 6 --mem=8G -A y8 -p RUBY xclock

[slurm-users] Account not permitted to use this partition

2018-12-01 Thread Mahmood Naderan
Hi Although I have created an account and associated it with a partition, the submitted job remains in PD with an error that the account is not allowed in this partition. Please see the output below: [root@rocks7 mahmood]# sacctmgr list association format=account,user,partition,association,g

[slurm-users] About srun command

2018-12-26 Thread Mahmood Naderan
Hi I use the following command to bring up a qemu guest supported by spankx11 $ srun -n 1 -c 20 --mem=10G -p QEMU -A q20_10 --spankx11 ./run_qemu.sh where run_qemu.sh is a script like this $ cat run_qemu.sh #!/bin/bash USERN=`whoami` qemu-system-x86_64 -m 8192 -smp cores=20 -hda win7_64_snap.i

Re: [slurm-users] About srun command

2018-12-26 Thread Mahmood Naderan
m. You should not copy the messsage nor > disclose its contents to anyone. Thanks. > > > El mié., 26 dic. 2018 a las 14:31, Mahmood Naderan () > escribió: > >> Hi >> I use the following command to bring up a qmeu guest supported by spankx11 >> >> $ srun -n 1

[slurm-users] salloc unable to find the file path

2018-12-26 Thread Mahmood Naderan
Hi, Although the command exists on the node, salloc says there is no such file. Please see below [mahmood@rocks7 ~]$ cat workbench.sh #!/bin/bash #SBATCH --nodes=1 #SBATCH --cores=4 #SBATCH --mem=4G #SBATCH --partition=RUBY #SBATCH --account=y4 unset SLURM_GTIDS /state/partition1/v190/Framework/bi

Re: [slurm-users] salloc unable to find the file path

2018-12-26 Thread Mahmood Naderan
information that is > privileged or not authorized to be disclosed. If you have received it by > mistake, delete it from your system. You should not copy the messsage nor > disclose its contents to anyone. Thanks. > > > El mié., 26 dic. 2018 a las 16:37, Mahmood Naderan () > esc

Re: [slurm-users] salloc unable to find the file path

2018-12-26 Thread Mahmood Naderan
it the job. Regards, Mahmood On Wed, Dec 26, 2018 at 11:44 PM Mahmood Naderan wrote: > But the problem is that it doesn't find the file path. That is not related > to slurm parameters AFAIK. > > Regards, > Mahmood > > > > > On Wed, Dec 26, 2018 at 11:37

[slurm-users] salloc doesn't care about GrpTRES

2018-12-27 Thread Mahmood Naderan
Hi I have noticed that an interactive job via salloc doesn't respect the core limit I have set with sacctmgr. The script is: #!/bin/bash #SBATCH --nodes=1 #SBATCH --cores=16 #SBATCH --mem=8G #SBATCH --partition=QEMU #SBATCH --account=q20_8 qemu-system-x86_64 -m 4096 -smp cores=1

[slurm-users] salloc with bash scripts problem

2018-12-30 Thread Mahmood Naderan
Hi I have read that salloc has some problem running bash scripts while it is OK with binary files. The following script works fine from a bash terminal, but salloc is unable to do that. $ cat slurm.sh #!/bin/bash ./script.sh files_android.txt report/android.txt $ salloc -n 1 -c 1 --mem=4G -p RUBY

Re: [slurm-users] salloc with bash scripts problem

2018-12-30 Thread Mahmood Naderan
oc ... >> >> After you have the node you run >> >> $ hostname >> >> $ srun hostname >> >> Check that difference then do the same with script >> >> >> El dom., 30 de dic. de 2018 07:17, Mahmood Naderan >> escribió: >> >>> Hi

Re: [slurm-users] salloc with bash scripts problem

2018-12-30 Thread Mahmood Naderan
r some of them, srun doesn't work while salloc works. On the other hand with srun I can choose a target node while I can't do that with salloc. Has anybody faced such issues? On Sun, Dec 30, 2018, 20:15 Chris Samuel wrote: > On 30/12/18 7:16 am, Mahmood Naderan wrote: > >

[slurm-users] exporting PATH inside sbatch script

2019-01-01 Thread Mahmood Naderan
Hi, May I know why the exported path in my sbatch script doesn't work? [mahmood@rocks7 qe]$ cat slurm_script.sh #!/bin/bash #SBATCH --ntasks=4 #SBATCH --mem=2G #SBATCH --partition=RUBY #SBATCH --account=y4 export PATH=$PATH:/share/apps/softwares/q-e-qe-6.2.1/bin mpirun pw.x mos2.in output.out [mah
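One simple way to see whether the exported PATH actually reaches the job environment is to print it from inside the batch script before launching the solver (a minimal debugging sketch; the directives and QE path are taken from the script above, the input file name is illustrative):

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --mem=2G
export PATH=$PATH:/share/apps/softwares/q-e-qe-6.2.1/bin
echo "PATH on $(hostname): $PATH"   # confirm the export took effect
which pw.x                          # confirm the binary is actually found
mpirun pw.x -i mos2.in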

Re: [slurm-users] exporting PATH inside sbatch script

2019-01-01 Thread Mahmood Naderan
Excuse me. That email was sent wrongly. Please ignore it. Regards, Mahmood On Tue, Jan 1, 2019 at 2:34 PM Mahmood Naderan wrote: > Hi, > May I know why the exported path in my sbatch script doesn't work? > > [mahmood@rocks7 qe]$ cat slurm_script.sh > #!/bin/bash

[slurm-users] salloc job is not shown in squeue

2019-01-02 Thread Mahmood Naderan
Hi, A user has submitted a QEMU job with "salloc --spankx11 ./run.sh" where run.sh contains some SBATCH directives and a qemu-system-x86_64 command. While in the terminal we see slurm message that job allocation is granted, we don't see any entry in the squeue. On the other hand, a qemu process is

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
ted. So, I expected to override the default command. Regards, Mahmood On Sun, Dec 30, 2018 at 9:11 PM Mahmood Naderan wrote: > So, isn't possible to override that "default"? I mean the target node. In > the faq page it is possible to change the default command for

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
llocate the resources AND initiate a > shell on one of the allocated nodes. > Best > Andreas > > Am 02.01.2019 um 14:43 schrieb Mahmood Naderan : > > Chris, > Can you explain why I can not get a prompt on a specific node while I have > passed the node name to salloc? > > [

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
On Wed, Jan 2, 2019 at 6:54 PM Mahmood Naderan wrote: > I want to know if there is any way to push the node selection part onto > slurm rather than a manual thing that is done by the user. > Currently, I have to manually ssh to a node and try to "allocate > resources" using

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
> -- > Mike Renfro, PhD / HPC Systems Administrator, Information Technology > Services > 931 372-3601 / Tennessee Tech University > > > On Jan 2, 2019, at 9:24 AM, Mahmood Naderan > wrote: > > > > I want to know if there any any way to push the n

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
I have included my login node in the list of nodes. Not all cores are included though. Please see the output of "scontrol" below [mahmood@rocks7 ~]$ scontrol show nodes NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1 CPUAlloc=0 CPUTot=32 CPULoad=31.96 AvailableFeatures=rack-0,32CPUs Ac

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
So, I get [mahmood@rocks7 ~]$ salloc --spankx11 srun ./run_qemu.sh salloc: Granted job allocation 281 srun: error: Bad value for --x11: (null) srun: error: Invalid argument ((null)) for environment variable: SLURM_SPANK__SLURM_SPANK_OPTION_spankx11_spankx11 salloc: Relinquishing job allocation 281

Re: [slurm-users] salloc with bash scripts problem

2019-01-03 Thread Mahmood Naderan
p cores=1 -hda win7_x64_snap.img -boot c -usbdevice tablet -enable-kvm -device e1000,netdev=host_files -netdev user,net= 10.0.2.0/24,id=host_files,restrict=off,smb=/home/$USERN,smbserver=10.0.2.4 Regards, Mahmood On Thu, Jan 3, 2019 at 6:21 AM Chris Samuel wrote: > On 30/12/18 9:41 am

[slurm-users] Analyzing a stuck job

2019-02-14 Thread Mahmood Naderan
Hi, One job is in RH state which means JobHoldMaxRequeue. The output file, specified by --output shows nothing suspicious. Is there any way to analyze the stuck job? Regards, Mahmood
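A job held in RH (requeue-hold) can usually be inspected and, once the cause is fixed, released along these lines (a sketch; <jobid> is a placeholder):

# The Reason, ExitCode and Restarts fields usually explain the hold
$ scontrol show job <jobid> | grep -E 'Reason|ExitCode|Restarts'
$ sacct -j <jobid> --format=JobID,State,ExitCode,DerivedExitCode

# Lift the hold once the underlying problem is resolved
$ scontrol release <jobid>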

[slurm-users] Multinode MPI job

2019-03-25 Thread Mahmood Naderan
Hi Is it possible to submit a multinode mpi job with the following config: Node1: 16 cpu, 90GB Node2: 8 cpu, 20GB ? Regards, Mahmood

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
>If your SLURM version is at least 18.08 then you should be able to do it with a heterogeneous job. See https://slurm.schedmd.com/heterogeneous_jobs.html From the example on that page, I have written this #!/bin/bash #SBATCH --job-name=myQE
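Pieced together from this thread, a heterogeneous job of the requested shape would look roughly like this under 18.08, where the separator directive was still called packjob (a sketch; the partition name and input file are placeholders, and later releases renamed the directive to hetjob):

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --ntasks=16 --mem=90G --partition=QUARTZ
#SBATCH packjob
#SBATCH --ntasks=8 --mem=20G --partition=QUARTZ

# one set of options per pack group, separated by ':'
srun --pack-group=0 --ntasks=16 : --pack-group=1 --ntasks=8 pw.x -i mos2.rlx.in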

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
2 PM Christopher Samuel wrote: > On 3/27/19 8:39 AM, Mahmood Naderan wrote: > > > mpirun pw.x -imos2.rlx.in <http://mos2.rlx.in> > > You will need to read the documentation for this: > > https://slurm.schedmd.com/heterogeneous_jobs.html > > Especially note bo

[slurm-users] Remove memory limit from GrpTRES

2019-03-27 Thread Mahmood Naderan
Hi, I want to remove a user's memory limit. Currently, I see # sacctmgr list association format=account,user,partition,grptres,maxwall | grep ghatee local ghatee cpu=16 z5 ghatee quartz cpu=16,mem=1+ 30-00:00:00 I have modified with different number of

Re: [slurm-users] Remove memory limit from GrpTRES

2019-03-27 Thread Mahmood Naderan
dify user ghatee set GrpTRES=mem=-1 > > Similar for other TRES settings > > On Wed, Mar 27, 2019 at 1:44 PM Mahmood Naderan > wrote: > >> Hi, >> I want to remove a user's memory limit. Currently, I see >> >> # sacctmgr list association format=accou
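The quoted answer amounts to setting the TRES value to -1, which clears it while leaving the CPU cap in place (a sketch; the names are those used in the thread):

# -1 removes the memory component of GrpTRES, keeping cpu=16
$ sacctmgr modify user ghatee set GrpTRES=mem=-1

# Confirm that only the CPU limit remains
$ sacctmgr list association format=account,user,partition,grptres,maxwall | grep ghatee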

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
partition, whether > intentionally or not. > > Use > scontrol show partition CLUSTER > to view it. > > On Wed, Mar 27, 2019 at 1:44 PM Mahmood Naderan > wrote: > >> So, it seems that it is not an easy thing at the moment! >> >> >Partitions are defined by

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
t forward. Regards, Mahmood On Wed, Mar 27, 2019 at 11:03 PM Christopher Samuel wrote: > On 3/27/19 11:29 AM, Mahmood Naderan wrote: > > > Thank you very much. you are right. I got it. > > Cool, good to hear. > > I'd love to hear whether you get heterogenou

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
>srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in Still only one node is running the processes $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 755+1QUARTZ myQE ghatee R 0:47 1 rocks7

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
Thu, Mar 28, 2019 at 11:09 AM Chris Samuel wrote: > On Wednesday, 27 March 2019 11:33:30 PM PDT Mahmood Naderan wrote: > > > Still only one node is running the processes > > What does "srun --version" say? > > Do you get any errors in your output file from the

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
The run is not consistent. I have manually tested "mpirun -np 4 pw.x -i mos2.rlx.in" on the compute-0-2 and rocks7 nodes and it works fine. However, with the script "srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in" I see some errors in the output file which result in the job aborting

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
gards, Mahmood On Thu, Mar 28, 2019 at 8:23 PM Mahmood Naderan wrote: > The run is not consistent. I have manually test "mpirun -np 4 pw.x -i > mos2.rlx.in" on compute-0-2 and rocks7 nodes and it is fine. > However, with the script "srun --pack-group=0 --ntasks=2 : --

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
Yes that works. $ grep "Parallel version" big-mem Parallel version (MPI), running on 1 processors Parallel version (MPI), running on 1 processors Parallel version (MPI), running on 1 processors Parallel version (MPI), running on 1 processors $ squeue

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
I tested with env strace srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in in the slurm script and everything is fine now!! This is going to be a nasty bug to find... Regards, Mahmood On Thu, Mar 28, 2019 at 9:18 PM Mahmood Naderan wrote: > Yes that wo

Re: [slurm-users] Multinode MPI job

2019-03-29 Thread Mahmood Naderan
I found out that the standard script that specifies the number of tasks and memory per cpu will do the same thing that I was expecting from packjob (heterogeneous job). #SBATCH --job-name=myQE #SBATCH --output=big-mem #SBATCH --ntasks=14 #SBATCH --mem-per-cpu=17G #SBATCH --nodes=6 #SBATCH --partit

[slurm-users] Getting current memory size of a job

2019-03-29 Thread Mahmood Naderan
Hi, Is there any way to view current memory allocation of a running job? With 'sstat' I can get only MAX values, including MaxVMSize, MaxRSS. Any idea? Regards, Mahmood
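sstat only reports high-water marks, so the closest one can get with Slurm's own tools is something like the following (a sketch; <jobid>, <node> and <user> are placeholders, and the job must still be running for sstat to return data):

# Peak (not instantaneous) memory per step
$ sstat --allsteps -j <jobid> --format=JobID,MaxRSS,MaxVMSize,AveRSS

# For the truly current value, fall back to ps on the node running the job
$ ssh <node> ps -o pid,rss,vsz,comm -u <user>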

Re: [slurm-users] Getting current memory size of a job

2019-04-07 Thread Mahmood Naderan
Hi again and sorry for the delay >When I was at Swinburne we asked for this as an enhancement here: >https://bugs.schedmd.com/show_bug.cgi?id=4966 The output of sstat shows the following error # squeue -j 821 JOBID PARTITION

[slurm-users] MPI job termination

2019-04-07 Thread Mahmood Naderan
Hi, A multinode MPI job terminated with the following messages in the log file =--= JOB DONE. =--= STOP 2 STOP 2 STOP 2 STOP 2 STOP 2 STOP 2 ST

Re: [slurm-users] Getting current memory size of a job

2019-04-10 Thread Mahmood Naderan
On Sunday, 7 April 2019 10:13:49 AM PDT Mahmood Naderan wrote: > > > The output of sstat shows the following error > > > > # squeue -j 821 > > JOBID PARTITION NAME USER ST TIME NODES > > NODELIST(REASON) 821 EMERALD g09-test shakerza R

[slurm-users] Pending with resource problems

2019-04-17 Thread Mahmood Naderan
Hi, Although it was fine for previous job runs, the following script is now stuck in PD with a resources-related reason. $ cat slurm_script.sh #!/bin/bash #SBATCH --output=test.out #SBATCH --job-name=g09-test #SBATCH --ntasks=20 #SBATCH --nodelist=compute-0-0 #SBATCH --mem=40GB #SBATCH --account=z7 #
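When a job sits in PD with Reason=Resources, comparing what it asked for with what the requested node can still provide usually pinpoints the problem (a sketch; the node name comes from the script above, <jobid> is a placeholder):

# What the job requested and why it is pending
$ scontrol show job <jobid> | grep -E 'Reason|NumCPUs|MinMemory|ReqNodeList'

# What compute-0-0 currently has allocated and free
$ scontrol show node compute-0-0 | grep -E 'CPUAlloc|CPUTot|RealMemory|AllocMem'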

Re: [slurm-users] Pending with resource problems

2019-04-17 Thread Mahmood Naderan
> And your job wants another 40G although the node only has 63G in total. > Best, > Andreas > > Am 17.04.2019 um 16:45 schrieb Mahmood Naderan : > > Hi, > Although it was fine for previous job runs, the following script now stuck > as PD with the reason about resour

[slurm-users] Job dispatching policy

2019-04-22 Thread Mahmood Naderan
Hi, How can I change the job distribution policy? Since some nodes are running non-slurm jobs, it seems that the dispatcher isn't aware of system load. Therefore, it assumes that the node is free. I want to change the policy based on the system load. Regards, Mahmood

Re: [slurm-users] Job dispatching policy

2019-04-23 Thread Mahmood Naderan
ugly, but there are some X11 applications that are not Slurm friendly. The number of non-Slurm nodes, though, is small. On Tue, Apr 23, 2019, 18:45 Prentice Bisbal wrote: > > On 4/23/19 2:47 AM, Mahmood Naderan wrote: > > Hi, > How can I change the job distribution policy? Since so

Re: [slurm-users] Job dispatching policy

2019-04-27 Thread Mahmood Naderan
>More constructively - maybe the list can help you get the X11 applications to run using Slurm. >Could you give some details please? For example, I can not run this GUI program with salloc [mahmood@rocks7 ~]$ cat workbench.sh #!/bin/bash unset SLURM_GTIDS /state/partition1/ans190/v190/Framewor

Re: [slurm-users] Job dispatching policy

2019-04-29 Thread Mahmood Naderan
ompute-0-1 ~]$ On the node, the program opened and I saw the GUI. Then I closed it. This is not the only problem. I also have problems with qemu runs. Regards, Mahmood On Sat, Apr 27, 2019 at 8:18 PM Chris Samuel wrote: > On 27/4/19 2:20 am, Mahmood Naderan wrote: > > >

Re: [slurm-users] Job dispatching policy

2019-04-30 Thread Mahmood Naderan
>Also why aren't you using the Slurm commands to run things? Which command? Regards, Mahmood

[slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
Hi I think I have asked this question before, but wasn't able to fix it. While the "xclock" command works via "ssh -Y", srun with the x11 option fails to open xclock. [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-0 --account y4 --partition RUBY -n 1 -c 4 --mem=1GB xclock srun: error: Cannot forwa

Re: [slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
>What does this say? >echo $DISPLAY On frontend of compute-0-0? [mahmood@rocks7 ~]$ echo $DISPLAY :1 >To get native X11 working with SLURM, we had to add this config to sshd_config on the login node (your rocks7 host) >X11UseLocalhost no >You'll then need to restart sshd I checked that and it

Re: [slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
>No, but you'll need to logout of rocks7 and ssh back into it. >Are you physically logged into rocks7? Or are you connecting via SSH? $DISPLAY = :1 kind of means that you are physically logged into the machine I am connecting through a vnc session. Right now, I have access to the desktop of the f

Re: [slurm-users] Issue with x11

2019-05-15 Thread Mahmood Naderan
>please open a console in the VNC session, do a ssh -Y rocks7 in the console (yes, relogin to the console) and try it again. >SLURM does not want to use local displays, and a VNC session is a "local" display, as far as it concerns linux and the X11 >subsystem. >So, you need to relogin or login to a

Re: [slurm-users] Issue with x11

2019-05-15 Thread Mahmood Naderan
>Can you try ssh'ing with X11 forwarding to rocks7 (i.e. ssh -X user@rocks7) from a different machine, and then try your srun >--x11 command? No... This doesn't work either. The error is X11 forwarding not available. Please see the picture at https://pasteboard.co/IeQGNOx.png Regards, Mahmood

Re: [slurm-users] Issue with x11

2019-05-15 Thread Mahmood Naderan
>A lot of interactive software works a lot better over things like NX, so >why this limitation? Agreed... Slurm is a very powerful job manager and I really appreciate its capabilities. However, I don't know why X11 has always been such a pain with it. spank-x11 was good but that was not a builtin fe

Re: [slurm-users] Issue with x11

2019-05-16 Thread Mahmood Naderan
Can I ask what the expected release date for 19 is? It seems that rc1 was released in May? Regards, Mahmood On Thu, May 16, 2019 at 4:48 PM Marcus Wagner wrote: > Hi Alan, > > we are also seeing this, but that has nothing to do with X11 support, > since we compile atm. SLURM without

[slurm-users] Access/permission denied

2019-05-20 Thread Mahmood Naderan
Hi Although proper configuration has been defined as below [root@rocks7 software]# grep RUBY /etc/slurm/parts PartitionName=RUBY AllowAccounts=y4,y8 Nodes=compute-0-[1-4] [root@rocks7 software]# sacctmgr list association format=account,"user%20",partition,grptres,maxwall | grep kouhikamali3 l

[slurm-users] Counting total number of cores specified in the sbatch file

2019-06-08 Thread Mahmood Naderan
Hi, A genetics program uses -num_threads on the command line for a parallel run. I use the following directives in the slurm batch file #SBATCH --ntasks-per-node=6 #SBATCH --nodes=2 #SBATCH --mem-per-cpu=2G for 12 processes and 24GB of memory. Is there any slurm variable that counts all threads from the dir

Re: [slurm-users] Counting total number of cores specified in the sbatch file

2019-06-08 Thread Mahmood Naderan
get the total tasks, $SLURM_NTASKS is probably > what you are looking for > > > > Brian Andrus > > > > On 6/8/2019 2:46 AM, Mahmood Naderan wrote: > > Hi, > > A genetic program uses -num_threads in command line for parallel run. I > use the following directives
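As the reply notes, the product of the two directives is already exposed to the job as SLURM_NTASKS, so it can be passed straight to the program (a sketch; the program and input names are placeholders):

#!/bin/bash
#SBATCH --ntasks-per-node=6
#SBATCH --nodes=2
#SBATCH --mem-per-cpu=2G

# SLURM_NTASKS = nodes * ntasks-per-node = 12 here
my_program -num_threads "$SLURM_NTASKS" input.txt

Keep in mind that a purely multithreaded program only uses the cores of the single node it runs on, so spreading the allocation over two nodes only helps if the program can itself distribute work across nodes.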

[slurm-users] salloc not able to run bash script

2019-06-17 Thread Mahmood Naderan
Hi, May I know why the user is not able to run a qemu interactive job? According to the configuration I made, everything should be fine. Isn't it? [valipour@rocks7 ~]$ salloc run_qemu.sh salloc: Granted job allocation 1209 salloc: error: Unable to exec command "run_qemu.sh" salloc: Relinqu

[slurm-users] Job not running on the specified node

2019-07-09 Thread Mahmood Naderan
Hi, I use the following script for qemu run #!/bin/bash #SBATCH --nodelist=compute-0-1 #SBATCH --cores=8 #SBATCH --mem=40G #SBATCH --partition=QEMU #SBATCH --account=q20_8 USERN=`whoami` qemu-system-x86_64 -m 4 -smp cores=8 -hda win7_sp1_x64.img -boot c -usbdevice tablet -enable-kvm -device e

[slurm-users] No error/output/run

2019-07-24 Thread Mahmood Naderan
Hi, I don't know why no error/output file is generated after the job submission. $ ls -l total 8 -rw-r--r-- 1 montazeri montazeri 472 Jul 24 12:52 in.lj -rw-rw-r-- 1 montazeri montazeri 254 Jul 24 12:53 slurm_script.sh $ cat slurm_script.sh #!/bin/bash #SBATCH --job-name=my_lammps #SBATCH --output

Re: [slurm-users] No error/output/run

2019-07-24 Thread Mahmood Naderan
> why not use sacct? squeue is only for queued and running jobs. $ sacct -j 1277 JobIDJobName PartitionAccount AllocCPUS State ExitCode -- -- -- -- -- 1277 my_lammpsEMERALDz55 12 F

Re: [slurm-users] No error/output/run

2019-07-24 Thread Mahmood Naderan
Indeed, the problem was disk space on one of the compute nodes that the job used. Thank you very much. which implies that the job failed before creating the output file. > could you have a problem accessing the working directory on the compute > nodes? over-quota even? I would certainly e

[slurm-users] Changing node core count

2019-07-29 Thread Mahmood Naderan
Hi, I want to change the number of cores of a node. In sview, I right click on "update available features" and in the text box, I write "cpu=12". However, it seems that this is not correct as it shows an error at the bottom of the sview window. Any guide? Regards, Mahmood
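For what it's worth, "available features" is a free-form tag list, not the CPU count; the core count comes from the node definition in slurm.conf, so a change would normally look something like this (a sketch; the node name is illustrative and the socket/core/thread and memory values depend on the hardware):

# in /etc/slurm/slurm.conf, on the controller and all nodes
NodeName=compute-0-1 CPUs=12 Sockets=1 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=64000 State=UNKNOWN

# then restart the daemons so the new definition is read
$ systemctl restart slurmctld    # on the head node
$ systemctl restart slurmd       # on compute-0-1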

[slurm-users] Heterogeneous HPC

2019-09-19 Thread Mahmood Naderan
Hi The question is not directly related to Slurm, but is actually related to the people in this community. For heterogeneous environments, where different operating systems, application and library versions are needed for HPC users, I would like to know if using docker/containers is better than yi

Re: [slurm-users] Heterogeneous HPC

2019-09-19 Thread Mahmood Naderan
what outdated... > > Best, > Christoph > > > On 19/09/2019 10.08, Mahmood Naderan wrote: > > Hi > > The question is not directly related to Slurm, but is actually related > > to the people in this community. > > > > For heterogeneous environments, wher

Re: [slurm-users] Heterogeneous HPC

2019-09-19 Thread Mahmood Naderan
Thanks for the replies. Matlab was an example. I would also like to create two containers for OpenFoam with different versions. Then a user can choose what he actually wants. I would also like to know if the technologies you mentioned can be deployed in multinode clusters. Currently, we use Rocks 7. Shou

Re: [slurm-users] Heterogeneous HPC

2019-09-20 Thread Mahmood Naderan
I appreciate the replies. I will try to test Charliecloud to see what is what... On Fri, Sep 20, 2019, 10:37 Fulcomer, Samuel wrote: > > > Thanks! and I'll watch the video... > > Privileged containers! never! > > On Thu, Sep 19, 2019 at 9:06 PM Michael Jennings wrote: >> On Thursday

[slurm-users] Removing user from slurm configuration

2019-10-10 Thread Mahmood Naderan
Hi I had created multiple test users and then removed them. However, I see they are still present in the Slurm accounting database. How can I remove them? # sacctmgr list association format=account,user Account User -- -- root root root local local mahmood
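Leftover test accounts can normally be cleaned out of the accounting database with sacctmgr delete (a sketch; testuser and testacct are placeholders, and any jobs running under the association should be finished first):

# Remove the user and its associations
$ sacctmgr delete user name=testuser

# Remove the account itself if it is no longer needed
$ sacctmgr delete account name=testacct

# Verify
$ sacctmgr list association format=account,user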
