Re: [slurm-users] ntasks-per-node for mpirun

2020-09-28 Thread Mahmood Naderan
Excuse me, my fault... Please ignore that email. Regards, Mahmood On Mon, Sep 28, 2020 at 11:34 PM Mahmood Naderan wrote: > Hi > With "cpu=48,mem=40G" limits and the following script > > #!/bin/bash > #SBATCH --job-name=gr > #SBATCH --output=my_gr.log > #S

[slurm-users] ntasks-per-node for mpirun

2020-09-28 Thread Mahmood Naderan
Hi With "cpu=48,mem=40G" limits and the following script #!/bin/bash #SBATCH --job-name=gr #SBATCH --output=my_gr.log #SBATCH --partition=SEA #SBATCH --account=fish #SBATCH --mem=6GB #SBATCH --nodes=4 #SBATCH --ntasks-per-node=20 mpirun -np $SLURM_NTASKS /share/apps/gromacs-2019.6/single/bin/gmx_m

Re: [slurm-users] Internet connection loss with srun to a node

2020-08-04 Thread Mahmood Naderan
lurm issue , > pretty sure! > > Matt > > On 8/2/20, 7:53 AM, "slurm-users on behalf of Mahmood Naderan" < > slurm-users-boun...@lists.schedmd.com on behalf of mahmood...@gmail.com> > wrote: > > Hi > A frontend machine is connected to the intern

Re: [slurm-users] Internet connection loss with srun to a node

2020-08-02 Thread Mahmood Naderan
aditional router? > > Get Outlook for iOS <https://aka.ms/o0ukef> > -- > *From:* slurm-users on behalf of > Mahmood Naderan > *Sent:* Sunday, August 2, 2020 7:52:52 AM > *To:* Slurm User Community List > *Subject:* [slurm-users] Internet connection loss wit

[slurm-users] Internet connection loss with srun to a node

2020-08-02 Thread Mahmood Naderan
Hi A frontend machine is connected to the internet and from that machine, I use srun to get a bash on another node. But it seems that the node is unable to access the internet. The http_proxy and https_proxy are defined in ~/.bashrc mahmood@main-proxy:~$ ping google.com PING google.com (216.58.215

[slurm-users] Yet another issue with AssocGrpMemLimit

2020-05-12 Thread Mahmood Naderan
Hi With the following memory stats on two nodes [root@hpc slurm]# scontrol show node compute-0-0 | grep Memory RealMemory=64259 AllocMem=0 FreeMem=63429 Sockets=32 Boards=1 [root@hpc slurm]# scontrol show node compute-0-1 | grep Memory RealMemory=120705 AllocMem=1024 FreeMem=103051 Sockets=3

[slurm-users] One node is not used by slurm

2020-04-19 Thread Mahmood Naderan
Hi, Although compute-0-0 is included in a partition, I have noticed that no job is offloaded there automatically. If someone intentionally writes --nodelist=compute-0-0, it will be fine. # grep -r compute-0-0 . ./nodenames.conf.new:NodeName=compute-0-0 NodeAddr=10.1.1.254 CPUs=32 Weight=20511900 Fea

Re: [slurm-users] Virtual memory size requested by slurm

2020-01-28 Thread Mahmood Naderan
>If you want the virtual memory size to be unrestricted by slurm, set VSizeFactor to 0 in slurm.conf, which according >to the documentation disables virtual memory limit enforcement. > >https://slurm.schedmd.com/slurm.conf.html#OPT_VSizeFactor
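
A minimal slurm.conf sketch of the quoted suggestion (only the relevant line is shown; the rest of the file is unchanged):

    # slurm.conf
    VSizeFactor=0        # 0 disables virtual memory limit enforcement

    # distribute slurm.conf to all nodes, then re-read the configuration
    scontrol reconfigure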

Re: [slurm-users] Virtual memory size requested by slurm

2020-01-27 Thread Mahmood Naderan
>This line is probably what is limiting you to around 40gb. >#SBATCH --mem=38GB Yes. If I change that value, the "ulimit -v" also changes. See below [shams@hpc ~]$ cat slurm_blast.sh | grep mem #SBATCH --mem=50GB [shams@hpc ~]$ cat my_blast.log virtual memory (kbytes, -v) 57671680 /var/
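
The numbers in the snippet are consistent with VSizeFactor=110 (an assumption; the thread does not show the actual value):

    50 GB requested      = 50 x 1024 x 1024 KB = 52428800 KB
    52428800 KB x 1.10   = 57671680 KB  (the "ulimit -v" value in my_blast.log)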

Re: [slurm-users] Print slurm cgroup parameters

2020-01-27 Thread Mahmood Naderan
> scontrol show config | tail -22 Thank you. Regards, Mahmood

Re: [slurm-users] Virtual memory size requested by slurm

2020-01-27 Thread Mahmood Naderan
>You may check here and see if it helps you: >https://unix.stackexchange.com/questions/345595/how-to-set-ulimits-on-service-with-systemd I guess these settings are related to the system. As I said, outsid

Re: [slurm-users] Virtual memory size requested by slurm

2020-01-27 Thread Mahmood Naderan
Excuse me, I see that "ulimit -a" shows unlimited virtual memory size when the user runs that on terminal. However, when he puts the command in sbatch script, the value is limited to about 40GB. [shams@hpc ~]$ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) un

[slurm-users] Print slurm cgroup parameters

2020-01-27 Thread Mahmood Naderan
Hi Is there any command to print current cgroup parameters or configurations that are used by Slurm? Regards, Mahmood
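
A short sketch of two ways to inspect these settings; the /etc/slurm path is an assumption based on the other snippets on this page:

    scontrol show config | grep -i -E 'cgroup|constrain'
    cat /etc/slurm/cgroup.conf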

Re: [slurm-users] Virtual memory size requested by slurm

2020-01-26 Thread Mahmood Naderan
>alternatively, you can ask slurm not to limit VSZ: in cgroup.conf, have >ConstrainSwapSpace=no >this does not actually permit arbitrary VSZ, since there are mechanisms >outside the cgroup limit that affect max VSZ (overcommit sysctls, swap space) Hi Mark, ConstrainSwapSpace=no or ConstrainSwapSp
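
A cgroup.conf sketch combining the quoted advice with the file shown later in this thread (whether slurmd must be restarted afterwards is an assumption worth checking):

    # /etc/slurm/cgroup.conf
    CgroupAutomount=yes
    CgroupReleaseAgentDir="/etc/slurm/cgroup"
    ConstrainCores=no
    ConstrainRAMSpace=no
    ConstrainSwapSpace=no    # per the quoted advice: do not cap swap/VSZ via the cgroup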

[slurm-users] Virtual memory size requested by slurm

2020-01-26 Thread Mahmood Naderan
Hi, As a follow-up to my last problem, I would like to know how I can tell slurm to increase the virtual memory size for a process. The program has no problem when I run it outside of slurm via the terminal/bash. Regards, Mahmood

Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
I added that line and restarted the service via # systemctl restart slurmctld However, I still get the same error. Moreover, when I salloc, I don't see slurm/ in the cgroup path [shams@hpc ~]$ salloc salloc: Granted job allocation 293 [shams@hpc ~]$ bin/show_my_cgroup --debug bash: bin/show_my_cgrou

Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
>depends on whether "ConstrainSwapSpace=yes" appears in cgroup.conf. Thanks for the detail. On the head node, mine is # cat cgroup.conf CgroupAutomount=yes CgroupReleaseAgentDir="/etc/slurm/cgroup" ConstrainCores=no ConstrainRAMSpace=no Is that the root of the problem? Regards, Mahmood

Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
>you have it backwards. slurm creates a cgroup for the job (step) >and uses the cgroup control to tell the kernel how much memory to >permit the job-step to use. I would like to know how I can increase the threshold in the slurm config files. I can not find it. According to [1], " No value is provi

Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
>how much memory are you requesting from Slurm in your job? #SBATCH --mem=38GB also, # sacctmgr list association format=user,grptres%30 | grep shams shams cpu=10,mem=40G Regards, Mahmood
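
If the intent is to let this user's jobs use more memory, a hedged sketch of raising the association limit (the 60G value is an arbitrary example):

    sacctmgr modify user shams set GrpTRES=mem=60G
    sacctmgr list association format=user,grptres%30 | grep shams    # verify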

Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
Excuse me, I was confused about that. While the cgroup value is 68GB, I run on the terminal and see the VSZ is about 80GB and the program runs normally. However, with slurm on that node, I can not run it. Why can I run it on the terminal but not via slurm? I wonder if slurm gets the right value from

Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
ood On Fri, Jan 24, 2020 at 7:35 PM Mahmood Naderan wrote: > Yes, it uses a large value for virtual size. > Since I can run it via terminal (outside of slurm), I think kernel > parameters are OK. > In other words, I have to configure slurm for that purpose. > Which slurm configur

Re: [slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
Yes, it uses a large value for virtual size. Since I can run it via terminal (outside of slurm), I think kernel parameters are OK. In other words, I have to configure slurm for that purpose. Which slurm configuration parameter is in charge of that? Regards, Mahmood On Fri, Jan 24, 2020 at 5:22

[slurm-users] blastx fails with "Error memory mapping"

2020-01-24 Thread Mahmood Naderan
Hi, Although I can run the blastx command on the terminal on all nodes, I can not use slurm for that due to a so-called "memory map error". Please see below; I pressed ^C after a few seconds when running via the terminal. Fri Jan 24 15:29:57 +0330 2020 [shams@hpc ~]$ blastx -db ~/ncbi-blast-2.9.0+/bin/

[slurm-users] Multinode blast run

2020-01-24 Thread Mahmood Naderan
Hi, Has anyone run blast on multiple nodes via slurm? The question should really be asked of the blast developers, but I didn't find their discussion mailing list. I see the example at [1], which uses "-N 1" and "--ntasks-per-node", so that limits the run to one node only. Thanks for any comment. [1] http://hpc.medi

Re: [slurm-users] Question about memory allocation

2019-12-17 Thread Mahmood Naderan
>Did you do an scontrol reconfigure? Thank you. That solved the issue. Regards, Mahmood

Re: [slurm-users] Question about memory allocation

2019-12-17 Thread Mahmood Naderan
>Your running job is requesting 6 CPUs per node (4 nodes, 6 CPUs per node). That means 6 CPUs are being used on node hpc. >Your queued job is requesting 5 CPUs per node (4 nodes, 5 CPUs per node). In total, if it was running, that would require 11 CPUs on node hpc. But hpc only has 10 cores, so it

Re: [slurm-users] Question about memory allocation

2019-12-17 Thread Mahmood Naderan
Please see the latest update # for i in {0..2}; do scontrol show node compute-0-$i | grep RealMemory; done && scontrol show node hpc | grep RealMemory RealMemory=64259 AllocMem=1024 FreeMem=57163 Sockets=32 Boards=1 RealMemory=120705 AllocMem=1024 FreeMem=97287 Sockets=32 Boards=1 RealMem

Re: [slurm-users] Question about memory allocation

2019-12-16 Thread Mahmood Naderan
Excuse me, I still have a problem. Although I freed memory on the nodes as below RealMemory=64259 AllocMem=1024 FreeMem=61882 Sockets=32 Boards=1 RealMemory=120705 AllocMem=1024 FreeMem=115257 Sockets=32 Boards=1 RealMemory=64259 AllocMem=26624 FreeMem=61795 Sockets=32 Boards=1 RealMemor

Re: [slurm-users] Small FreeMem is reported by scontrol

2019-12-16 Thread Mahmood Naderan
OK. It takes some time for scontrol to update the values. I can now see more free memory as below RealMemory=120705 AllocMem=1024 FreeMem=115290 Sockets=32 Boards=1 Thank you William. Regards, Mahmood On Mon, Dec 16, 2019 at 7:55 PM Mahmood Naderan wrote: > >Memory may be bein

Re: [slurm-users] Small FreeMem is reported by scontrol

2019-12-16 Thread Mahmood Naderan
>Memory may be being used by jobs running, or tasks outside the control of >Slurm running, or possibly NFS buffer cache or similar. You may need to >start an ssh session on the node and look. I checked that. For example, on compute-0-1, I see RealMemory=120705 AllocMem=1024 FreeMem=8442 Sock

[slurm-users] Small FreeMem is reported by scontrol

2019-12-16 Thread Mahmood Naderan
Hi, With the following output RealMemory=64259 AllocMem=1024 FreeMem=38620 Sockets=32 Boards=1 RealMemory=120705 AllocMem=1024 FreeMem=309 Sockets=32 Boards=1 RealMemory=64259 AllocMem=1024 FreeMem=59334 Sockets=32 Boards=1 RealMemory=64259 AllocMem=1024 FreeMem=282 Sockets=10 Boards=1

Re: [slurm-users] Question about memory allocation

2019-12-15 Thread Mahmood Naderan
--ntasks-per-node=5 the total requested memory should be 40GB and not 200GB. Regards, Mahmood On Mon, Dec 16, 2019 at 10:19 AM Mahmood Naderan wrote: > >No, this indicates the amount of residual/real memory as reqeusted per > node. Your job will be only runnable on nodes >that offe

Re: [slurm-users] Question about memory allocation

2019-12-15 Thread Mahmood Naderan
> Institut für Chemie > Sekretariat C3 > Straße des 17. Juni 135 > 10623 Berlin > > Email: sebastian.kr...@tu-berlin.de > > -- > *From:* slurm-users on behalf of > Mahmood Naderan > *Sent:* Monday, December 16, 2019 07:19 > *To:* Slu

Re: [slurm-users] Question about memory allocation

2019-12-15 Thread Mahmood Naderan
AllocMem=1024 FreeMem=282 Sockets=10 Boards=1 However, a script with #SBATCH --mem=10GB #SBATCH --nodes=4 #SBATCH --ntasks-per-node=5 stuck in PD state with (resources) reason. Any idea about that? Regards, Mahmood On Mon, Dec 16, 2019 at 9:49 AM Mahmood Naderan wrote: > Hi, > If I

[slurm-users] Question about memory allocation

2019-12-15 Thread Mahmood Naderan
Hi, If I write #SBATCH --mem=10GB #SBATCH --nodes=4 #SBATCH --ntasks-per-node=5 will it reserve (look for) 200GB of memory for the job? Or is this the hard limit of the memory required by the job? Regards, Mahmood
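
As the replies in this thread point out, --mem is a per-node request, so the arithmetic works out as follows (a worked restatement of the answer, not new information):

    #SBATCH --mem=10GB         with --nodes=4           -> 4 x 10 GB = 40 GB total, not 200 GB
    #SBATCH --mem-per-cpu=2GB  with 4 nodes x 5 tasks   -> 20 x 2 GB = 40 GB total (per-CPU alternative)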

[slurm-users] Issue with my hello mpi toy program

2019-10-17 Thread Mahmood Naderan
Hi, I used to run a hello mpi job for testing purposes. Now, I see that it doesn't work. While the log file shows a memory allocation problem, squeue shows that the job is in the R state endlessly. [mahmood@hpc ~]$ cat slurm_script1.sh #!/bin/bash #SBATCH --job-name=hello_mpi #SBATCH --output=hellompi.log #SBAT

Re: [slurm-users] Removing user from slurm configuration

2019-10-12 Thread Mahmood Naderan
> sacctmgr delete user XXX Thanks. This is much better than manually changing the db. Regards, Mahmood On Fri, Oct 11, 2019 at 7:00 PM Christopher Samuel wrote: > On 10/10/19 8:53 AM, Marcus Wagner wrote: > > > if you REALLY want to get rid of that user, you might need to manipulate > > th
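
A sketch of cleaning up leftover test entries (the names are hypothetical; deleting the account is only needed if it was created purely for testing):

    sacctmgr delete user name=testuser1
    sacctmgr delete account name=testaccount
    sacctmgr list association format=account,user      # verify they are gone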

[slurm-users] Removing user from slurm configuration

2019-10-10 Thread Mahmood Naderan
Hi I had created multiple test users, and then removed them. However, I see they are still present in the slurm database. How can I remove them? # sacctmgr list association format=account,user Account User -- -- root root root local local mahmood

Re: [slurm-users] Heterogeneous HPC

2019-09-20 Thread Mahmood Naderan
I appreciate the replies. I will try to test Charliecloud to see what is what... On Fri, Sep 20, 2019, 10:37 Fulcomer, Samuel wrote: > > > Thanks! and I'll watch the video... > > Privileged containers! never! > > On Thu, Sep 19, 2019 at 9:06 PM Michael Jennings wrote: > >> On Thursday

Re: [slurm-users] Heterogeneous HPC

2019-09-19 Thread Mahmood Naderan
For the replies. Matlab was an example. I would also like to create two containers for OpenFoam with different versions. Then a user can choose what he actually wants. I would also like to know if the technologies you mentioned can be deployed in multinode clusters. Currently, we use Rocks 7. Shou

Re: [slurm-users] Heterogeneous HPC

2019-09-19 Thread Mahmood Naderan
what outdated... > > Best, > Christoph > > > On 19/09/2019 10.08, Mahmood Naderan wrote: > > Hi > > The question is not directly related to Slurm, but is actually related > > to the people in this community. > > > > For heterogeneous environments, wher

[slurm-users] Heterogeneous HPC

2019-09-19 Thread Mahmood Naderan
Hi The question is not directly related to Slurm, but is actually related to the people in this community. For heterogeneous environments, where different operating systems, application and library versions are needed for HPC users, I would like to know if using docker/containers is better than yi

[slurm-users] Changing node core count

2019-07-29 Thread Mahmood Naderan
Hi, I want to change the number of cores of a node. In sview, I right-click on "update available features" and in the text box, I write "cpu=12". However, it seems that this is not correct, as it writes an error at the bottom of the sview window. Any guidance? Regards, Mahmood
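
sview's "update available features" box edits node Features, not CPU counts, so "cpu=12" is rejected there. A sketch of the usual approach, assuming the node is defined in slurm.conf (or nodenames.conf on this Rocks cluster); the trailing "..." stands for the node's other attributes:

    NodeName=compute-0-1 CPUs=12 ...
    # push the updated file to all nodes, then
    scontrol reconfigure     # a slurmctld/slurmd restart may be required for core-count changes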

Re: [slurm-users] No error/output/run

2019-07-24 Thread Mahmood Naderan
Indeed the problem was disk space on one of the compute nodes that the job used. Thank you very much. which implies that the job failed before creating the output file. > could you have a problem accessing the working directory on the compute > nodes? over-quota even? I would certainly e

Re: [slurm-users] No error/output/run

2019-07-24 Thread Mahmood Naderan
> why not use sacct? squeue is only for queued and running jobs. $ sacct -j 1277 JobID JobName Partition Account AllocCPUS State ExitCode -- -- -- -- -- 1277 my_lammps EMERALD z55 12 F

[slurm-users] No error/output/run

2019-07-24 Thread Mahmood Naderan
Hi, I don't know why no error/output file is generated after the job submission. $ ls -l total 8 -rw-r--r-- 1 montazeri montazeri 472 Jul 24 12:52 in.lj -rw-rw-r-- 1 montazeri montazeri 254 Jul 24 12:53 slurm_script.sh $ cat slurm_script.sh #!/bin/bash #SBATCH --job-name=my_lammps #SBATCH --output

[slurm-users] Job not running of the specified node

2019-07-09 Thread Mahmood Naderan
Hi, I use the following script for qemu run #!/bin/bash #SBATCH --nodelist=compute-0-1 #SBATCH --cores=8 #SBATCH --mem=40G #SBATCH --partition=QEMU #SBATCH --account=q20_8 USERN=`whoami` qemu-system-x86_64 -m 4 -smp cores=8 -hda win7_sp1_x64.img -boot c -usbdevice tablet -enable-kvm -device e

[slurm-users] salloc not able to run bash script

2019-06-17 Thread Mahmood Naderan
Hi, May I know why the user is not able to run a qemu interactive job? According to the configuration which I made, everything should be fine, shouldn't it? [valipour@rocks7 ~]$ salloc run_qemu.sh salloc: Granted job allocation 1209 salloc: error: Unable to exec command "run_qemu.sh" salloc: Relinqu

Re: [slurm-users] Counting total number of cores specified in the sbatch file

2019-06-08 Thread Mahmood Naderan
get the total tasks, $SLURM_NTASKS is probably > what you are looking for > > > > Brian Andrus > > > > On 6/8/2019 2:46 AM, Mahmood Naderan wrote: > > Hi, > > A genetic program uses -num_threads in command line for parallel run. I > use the following directives
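
A sketch of using the runtime variables inside the batch script; note that a purely threaded program cannot spread its threads across two nodes, so the per-node count may be the safer value for -num_threads (the program name is hypothetical):

    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=6
    #SBATCH --mem-per-cpu=2G
    echo "total tasks: $SLURM_NTASKS"               # 2 x 6 = 12 here
    echo "tasks per node: $SLURM_NTASKS_PER_NODE"   # 6 here
    my_program -num_threads $SLURM_NTASKS_PER_NODE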

[slurm-users] Counting total number of cores specified in the sbatch file

2019-06-08 Thread Mahmood Naderan
Hi, A genetic program uses -num_threads in command line for parallel run. I use the following directives in slurm batch file #SBATCH --ntasks-per-node=6 #SBATCH --nodes=2 #SBATCH --mem-per-cpu=2G for 12 processes and 24GB of memory. Is there any slurm variable that counts all threads from the dir

[slurm-users] Access/permission denied

2019-05-20 Thread Mahmood Naderan
Hi Although proper configuration has been defined as below [root@rocks7 software]# grep RUBY /etc/slurm/parts PartitionName=RUBY AllowAccounts=y4,y8 Nodes=compute-0-[1-4] [root@rocks7 software]# sacctmgr list association format=account,"user%20",partition,grptres,maxwall | grep kouhikamali3 l

Re: [slurm-users] Issue with x11

2019-05-16 Thread Mahmood Naderan
Can I ask what the expected release date for 19 is? It seems that rc1 was released in May? Regards, Mahmood On Thu, May 16, 2019 at 4:48 PM Marcus Wagner wrote: > Hi Alan, > > we are also seeing this, but that has nothing to do with X11 support, > since we compile atm. SLURM without

Re: [slurm-users] Issue with x11

2019-05-15 Thread Mahmood Naderan
>A >lot of interactive software works a lot better over things like NX, so >why this limitation? Agreed... Slurm is a very powerful job manager and I really appreciate its capabilities. However, I don't know why x11 has always been such a pain. spank-x11 was good but that was not a builtin fe

Re: [slurm-users] Issue with x11

2019-05-15 Thread Mahmood Naderan
>Can you try ssh'ing with X11 forwarding to rocks7 (i.e. ssh -X user@rocks7) from a different machine, and then try your srun >--x11 command? No... This doesn't work either. The error is X11 forwarding not available. Please see the picture at https://pasteboard.co/IeQGNOx.png Regards, Mahmood

Re: [slurm-users] Issue with x11

2019-05-15 Thread Mahmood Naderan
>please open a console in the VNC session, do a ssh -Y rocks7 in the console (yes, relogin to the console) and try it again. >SLURM does not want to use local displays, and a VNC session is a "local" display, as far as it concerns linux and the X11 >subsystem. >So, you need to relogin or login to a

Re: [slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
>No, but you'll need to logout of rocks7 and ssh back into it. >Are you physically logged into rocks7? Or are you connecting via SSH? $DISPLAY = :1 kind of means that you are physically logged into the machine I am connecting through a vnc session. Right now, I have access to the desktop of the f

Re: [slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
>What does this say? >echo $DISPLAY On frontend of compute-0-0? [mahmood@rocks7 ~]$ echo $DISPLAY :1 >To get native X11 working with SLURM, we had to add this config to sshd_config on the login node (your rocks7 host) >X11UseLocalhost no >You'll then need to restart sshd I checked that and it
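
The quoted change as a sketch (the restart command assumes a systemd-based login node):

    # /etc/ssh/sshd_config on the login node (rocks7)
    X11UseLocalhost no

    systemctl restart sshd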

[slurm-users] Issue with x11

2019-05-14 Thread Mahmood Naderan
Hi I think I have asked this question before, but wasn't able to fix it. While the "xclock" command works via "ssh -Y", srun with the x11 option fails to open xclock. [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-0 --account y4 --partition RUBY -n 1 -c 4 --mem=1GB xclock srun: error: Cannot forwa

Re: [slurm-users] Job dispatching policy

2019-04-30 Thread Mahmood Naderan
>Also why aren't you using the Slurm commands to run things? Which command? Regards, Mahmood

Re: [slurm-users] Job dispatching policy

2019-04-29 Thread Mahmood Naderan
ompute-0-1 ~]$ On the node, the program opened and I saw the GUI. Then I closed it. This is not the only problem. I also have problems with qemu runs. Regards, Mahmood On Sat, Apr 27, 2019 at 8:18 PM Chris Samuel wrote: > On 27/4/19 2:20 am, Mahmood Naderan wrote: > > >

Re: [slurm-users] Job dispatching policy

2019-04-27 Thread Mahmood Naderan
>More constructively - maybe the list can help you get the X11 applications to run using Slurm. >Could you give some details please? For example, I can not run this GUI program with salloc [mahmood@rocks7 ~]$ cat workbench.sh #!/bin/bash unset SLURM_GTIDS /state/partition1/ans190/v190/Framewor

Re: [slurm-users] Job dispatching policy

2019-04-23 Thread Mahmood Naderan
ugly, but there are some X11 applications that are not slurm friendly. The number of non-slurm nodes, though, is small. On Tue, Apr 23, 2019, 18:45 Prentice Bisbal wrote: > > On 4/23/19 2:47 AM, Mahmood Naderan wrote: > > Hi, > How can I change the job distribution policy? Since so

[slurm-users] Job dispatching policy

2019-04-22 Thread Mahmood Naderan
Hi, How can I change the job distribution policy? Since some nodes are running non-slurm jobs, it seems that the dispatcher isn't aware of system load. Therefore, it assumes that the node is free. I want to change the policy based on the system load. Regards, Mahmood

Re: [slurm-users] Pending with resource problems

2019-04-17 Thread Mahmood Naderan
> And your job wants another 40G although the node only has 63G in total. > Best, > Andreas > > Am 17.04.2019 um 16:45 schrieb Mahmood Naderan : > > Hi, > Although it was fine for previous job runs, the following script now stuck > as PD with the reason about resour

[slurm-users] Pending with resource problems

2019-04-17 Thread Mahmood Naderan
Hi, Although it was fine for previous job runs, the following script is now stuck as PD with a reason about resources. $ cat slurm_script.sh #!/bin/bash #SBATCH --output=test.out #SBATCH --job-name=g09-test #SBATCH --ntasks=20 #SBATCH --nodelist=compute-0-0 #SBATCH --mem=40GB #SBATCH --account=z7 #

Re: [slurm-users] Getting current memory size of a job

2019-04-10 Thread Mahmood Naderan
On Sunday, 7 April 2019 10:13:49 AM PDT Mahmood Naderan wrote: > > > The output of sstat shows the following error > > > > # squeue -j 821 > > JOBID PARTITION NAME USER ST TIME NODES > > NODELIST(REASON) 821 EMERALD g09-test shakerza R

Re: [slurm-users] Getting current memory size of a job

2019-04-07 Thread Mahmood Naderan
Hi again and sorry for the delay >When I was at Swinburne we asked for this as an enhancement here: >https://bugs.schedmd.com/show_bug.cgi?id=4966 The output of sstat shows the following error # squeue -j 821 JOBID PARTITION

[slurm-users] MPI job termination

2019-04-07 Thread Mahmood Naderan
Hi, A multinode MPI job terminated with the following messages in the log file =--= JOB DONE. =--= STOP 2 STOP 2 STOP 2 STOP 2 STOP 2 STOP 2 ST

[slurm-users] Getting current memory size of a job

2019-03-29 Thread Mahmood Naderan
Hi, Is there any way to view current memory allocation of a running job? With 'sstat' I can get only MAX values, including MaxVMSize, MaxRSS. Any idea? Regards, Mahmood
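
A sketch of the sstat call; as noted, it reports values sampled so far (Max*/Ave*) rather than an instantaneous figure, and a step suffix such as .batch may be needed:

    sstat -j <jobid>.batch --format=JobID,AveRSS,MaxRSS,MaxVMSize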

Re: [slurm-users] Multinode MPI job

2019-03-29 Thread Mahmood Naderan
I found out that the standard script that specifies the number of tasks and memory per cpu will do the same thing that I was expecting from packjob (heterogeneous job). #SBATCH --job-name=myQE #SBATCH --output=big-mem #SBATCH --ntasks=14 #SBATCH --mem-per-cpu=17G #SBATCH --nodes=6 #SBATCH --partit

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
I test with env strace srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in in the slurm script and everything is fine now!! This is going to be a nasty bug to find... Regards, Mahmood On Thu, Mar 28, 2019 at 9:18 PM Mahmood Naderan wrote: > Yes that wo

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
Yes that works. $ grep "Parallel version" big-mem Parallel version (MPI), running on 1 processors Parallel version (MPI), running on 1 processors Parallel version (MPI), running on 1 processors Parallel version (MPI), running on 1 processors $ squeue

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
gards, Mahmood On Thu, Mar 28, 2019 at 8:23 PM Mahmood Naderan wrote: > The run is not consistent. I have manually test "mpirun -np 4 pw.x -i > mos2.rlx.in" on compute-0-2 and rocks7 nodes and it is fine. > However, with the script "srun --pack-group=0 --ntasks=2 : --

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
The run is not consistent. I have manually tested "mpirun -np 4 pw.x -i mos2.rlx.in" on the compute-0-2 and rocks7 nodes and it is fine. However, with the script "srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in" I see some errors in the output file which result in abortion

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
Thu, Mar 28, 2019 at 11:09 AM Chris Samuel wrote: > On Wednesday, 27 March 2019 11:33:30 PM PDT Mahmood Naderan wrote: > > > Still only one node is running the processes > > What does "srun --version" say? > > Do you get any errors in your output file from the

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
>srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in Still only one node is running the processes $ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 755+1QUARTZ myQE ghatee R 0:47 1 rocks7

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
t forward. Regards, Mahmood On Wed, Mar 27, 2019 at 11:03 PM Christopher Samuel wrote: > On 3/27/19 11:29 AM, Mahmood Naderan wrote: > > > Thank you very much. you are right. I got it. > > Cool, good to hear. > > I'd love to hear whether you get heterogenou

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
partition, whether > intentionally or not. > > Use > scontrol show partition CLUSTER > to view it. > > On Wed, Mar 27, 2019 at 1:44 PM Mahmood Naderan > wrote: > >> So, it seems that it is not an easy thing at the moment! >> >> >Partitions are defined by

Re: [slurm-users] Remove memory limit from GrpTRES

2019-03-27 Thread Mahmood Naderan
dify user ghatee set GrpTRES=mem=-1 > > Similar for other TRES settings > > On Wed, Mar 27, 2019 at 1:44 PM Mahmood Naderan > wrote: > >> Hi, >> I want to remove a user's memory limit. Currently, I see >> >> # sacctmgr list association format=accou
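
The quoted command in full, plus a verification step (a sketch; mem=-1 clears only the memory TRES and leaves cpu=16 in place):

    sacctmgr modify user ghatee set GrpTRES=mem=-1
    sacctmgr list association format=account,user,partition,grptres,maxwall | grep ghatee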

[slurm-users] Remove memory limit from GrpTRES

2019-03-27 Thread Mahmood Naderan
Hi, I want to remove a user's memory limit. Currently, I see # sacctmgr list association format=account,user,partition,grptres,maxwall | grep ghatee local ghatee cpu=16 z5 ghatee quartz cpu=16,mem=1+ 30-00:00:00 I have modified with different number of

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
2 PM Christopher Samuel wrote: > On 3/27/19 8:39 AM, Mahmood Naderan wrote: > > > mpirun pw.x -imos2.rlx.in <http://mos2.rlx.in> > > You will need to read the documentation for this: > > https://slurm.schedmd.com/heterogeneous_jobs.html > > Especially note bo

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
>If your SLURM version is at least 18.08 then you should be able to do it with a heterogeneous job. See https://slurm.schedmd.com/heterogeneous_jobs.html From the example on that page, I have written this #!/bin/bash #SBATCH --job-name=myQE
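
A sketch of a heterogeneous batch job for the sizes asked about in the original question (16 CPUs/90GB plus 8 CPUs/20GB), using the 18.08-era "packjob" separator; the srun line mirrors the one used later in this thread:

    #!/bin/bash
    #SBATCH --job-name=myQE
    #SBATCH --ntasks=16 --mem=90G
    #SBATCH packjob
    #SBATCH --ntasks=8 --mem=20G
    srun --pack-group=0 : --pack-group=1 pw.x -i mos2.rlx.in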

[slurm-users] Multinode MPI job

2019-03-25 Thread Mahmood Naderan
Hi Is it possible to submit a multinode mpi job with the following config: Node1: 16 cpu, 90GB Node2: 8 cpu, 20GB ? Regards, Mahmood

[slurm-users] Analyzing a stuck job

2019-02-14 Thread Mahmood Naderan
Hi, One job is in RH state which means JobHoldMaxRequeue. The output file, specified by --output shows nothing suspicious. Is there any way to analyze the stuck job? Regards, Mahmood
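
RH means the job was held after hitting its requeue limit. A sketch of the usual inspection steps (the job id is a placeholder):

    scontrol show job <jobid>       # check Reason=, Restarts=, and the StdOut/StdErr paths
    sacct -j <jobid> --format=JobID,State,ExitCode,NodeList
    scontrol release <jobid>        # clear the hold once the underlying cause is fixed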

Re: [slurm-users] salloc with bash scripts problem

2019-01-03 Thread Mahmood Naderan
p cores=1 -hda win7_x64_snap.img -boot c -usbdevice tablet -enable-kvm -device e1000,netdev=host_files -netdev user,net= 10.0.2.0/24,id=host_files,restrict=off,smb=/home/$USERN,smbserver=10.0.2.4 Regards, Mahmood On Thu, Jan 3, 2019 at 6:21 AM Chris Samuel wrote: > On 30/12/18 9:41 am

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
So, I get [mahmood@rocks7 ~]$ salloc --spankx11 srun ./run_qemu.sh salloc: Granted job allocation 281 srun: error: Bad value for --x11: (null) srun: error: Invalid argument ((null)) for environment variable: SLURM_SPANK__SLURM_SPANK_OPTION_spankx11_spankx11 salloc: Relinquishing job allocation 281

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
I have included my login node in the list of nodes. Not all cores are included though. Please see the output of "scontrol" below [mahmood@rocks7 ~]$ scontrol show nodes NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1 CPUAlloc=0 CPUTot=32 CPULoad=31.96 AvailableFeatures=rack-0,32CPUs Ac

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
> -- > Mike Renfro, PhD / HPC Systems Administrator, Information Technology > Services > 931 372-3601 / Tennessee Tech University > > > On Jan 2, 2019, at 9:24 AM, Mahmood Naderan > wrote: > > > > I want to know if there any any way to push the n

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
On Wed, Jan 2, 2019 at 6:54 PM Mahmood Naderan wrote: > I want to know if there any any way to push the node selection part on > slurm and not a manual thing that is done by user. > Currently, I have to manually ssh to a node and try to "allocate > resources" using

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
llocate the resources AND initiate a > shell on one of the allocated nodes. > Best > Andreas > > Am 02.01.2019 um 14:43 schrieb Mahmood Naderan : > > Chris, > Can you explain why I can not get a prompt on a specific node while I have > passed the node name to salloc? > > [

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Mahmood Naderan
ted. So, I expected to override the default command. Regards, Mahmood On Sun, Dec 30, 2018 at 9:11 PM Mahmood Naderan wrote: > So, isn't possible to override that "default"? I mean the target node. In > the faq page it is possible to change the default command for

[slurm-users] salloc job is not shown in squeue

2019-01-02 Thread Mahmood Naderan
Hi, A user has submitted a QEMU job with "salloc --spankx11 ./run.sh" where run.sh contains some SBATCH directives and a qemu-system-x86_64 command. While in the terminal we see the slurm message that the job allocation is granted, we don't see any entry in squeue. On the other hand, a qemu process is

Re: [slurm-users] exporting PATH inside sbatch script

2019-01-01 Thread Mahmood Naderan
Excuse me. That email was sent by mistake. Please ignore it. Regards, Mahmood On Tue, Jan 1, 2019 at 2:34 PM Mahmood Naderan wrote: > Hi, > May I know why the exported path in my sbatch script doesn't work? > > [mahmood@rocks7 qe]$ cat slurm_script.sh > #!/bin/bash

[slurm-users] exporting PATH inside sbatch script

2019-01-01 Thread Mahmood Naderan
Hi, May I know why the exported path in my sbatch script doesn't work? [mahmood@rocks7 qe]$ cat slurm_script.sh #!/bin/bash #SBATCH --ntasks=4 #SBATCH --mem=2G #SBATCH --partition=RUBY #SBATCH --account=y4 export PATH=$PATH:/share/apps/softwares/q-e-qe-6.2.1/bin mpirun pw.x mos2.in output.out [mah

Re: [slurm-users] salloc with bash scripts problem

2018-12-30 Thread Mahmood Naderan
r some of them, srun doesn't work while salloc works. On the other hand, with srun I can choose a target node while I can't do that with salloc. Has anybody faced such issues? On Sun, Dec 30, 2018, 20:15 Chris Samuel wrote: > On 30/12/18 7:16 am, Mahmood Naderan wrote: > >

Re: [slurm-users] salloc with bash scripts problem

2018-12-30 Thread Mahmood Naderan
oc ... >> >> After you have the node you run >> >> $ hostname >> >> $ srun hostname >> >> Check that difference then do the same with the script >> >> >> On Sun., Dec. 30, 2018, 07:17, Mahmood Naderan >> wrote: >> >>> Hi

[slurm-users] salloc with bash scripts problem

2018-12-30 Thread Mahmood Naderan
Hi I have read that salloc has some problems running bash scripts while it is OK with binary files. The following script works fine from the bash terminal, but salloc is unable to do that. $ cat slurm.sh #!/bin/bash ./script.sh files_android.txt report/android.txt $ salloc -n 1 -c 1 --mem=4G -p RUBY

[slurm-users] salloc doesn't care about GrpTRES

2018-12-27 Thread Mahmood Naderan
Hi I have noticed that an interactive job via salloc doesn't respect the number of cores that I have limited via sacctmgr. The script is: #!/bin/bash #SBATCH --nodes=1 #SBATCH --cores=16 #SBATCH --mem=8G #SBATCH --partition=QEMU #SBATCH --account=q20_8 qemu-system-x86_64 -m 4096 -smp cores=1

Re: [slurm-users] salloc unable to find the file path

2018-12-26 Thread Mahmood Naderan
it the job. Regards, Mahmood On Wed, Dec 26, 2018 at 11:44 PM Mahmood Naderan wrote: > But the problem is that it doesn't find the file path. That is not related > to slurm parameters AFAIK. > > Regards, > Mahmood > > > > > On Wed, Dec 26, 2018 at 11:37

Re: [slurm-users] salloc unable to find the file path

2018-12-26 Thread Mahmood Naderan
information that is > privileged or not authorized to be disclosed. If you have received it by > mistake, delete it from your system. You should not copy the message nor > disclose its contents to anyone. Thanks. > > On Wed., Dec 26, 2018 at 16:37, Mahmood Naderan () > wrote:
