Re: [slurm-users] Job dispatching policy
> More constructively - maybe the list can help you get the X11 applications
> to run using Slurm.
> Could you give some details please?

For example, I cannot run this GUI program with salloc:

[mahmood@rocks7 ~]$ cat workbench.sh
#!/bin/bash
unset SLURM_GTIDS
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
[mahmood@rocks7 ~]$ rocks run host compute-0-1 "ls /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2"
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
[mahmood@rocks7 ~]$ salloc -w compute-0-1 -c 2 --mem=4G -p RUBY -A y4 ./workbench.sh
salloc: Granted job allocation 938
./workbench.sh: line 4: /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2: No such file or directory
salloc: Relinquishing job allocation 938

Regards,
Mahmood


On Wed, Apr 24, 2019 at 12:33 PM John Hearns wrote:
> I would suggest that if those applications really are not possible with
> Slurm - then reserve a set of nodes for interactive use and disable the
> Slurm daemon on them.
> Direct users to those nodes.
>
> More constructively - maybe the list can help you get the X11 applications
> to run using Slurm.
> Could you give some details please?
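(On the X11 question itself: Slurm 17.11 and later have native X11 forwarding
in srun, which needs PrologFlags=X11 set in slurm.conf; older versions can use
the spank-x11 plugin. A minimal sketch, assuming one of those is in place at
this site:

# hypothetical invocation - assumes Slurm 17.11+ with PrologFlags=X11,
# or the spank-x11 plugin; partition/account copied from the transcript above
srun --x11 -w compute-0-1 -c 2 --mem=4G -p RUBY -A y4 ./workbench.sh

Also worth noting: salloc runs the supplied command on the submit host, not on
the allocated node, so srun is the usual tool for actually launching a GUI on
compute-0-1.)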
Re: [slurm-users] Job dispatching policy
On 27/4/19 2:20 am, Mahmood Naderan wrote:

> ./workbench.sh: line 4: /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2: No such file or directory

That doesn't look like it's related to Slurm to me. If the file itself
exists, then my suspicion is that it's a script and the interpreter named
in its first #! line does not exist.

What does this command say on that node?

file /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
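(A quick way to test that theory, assuming runwb2 is indeed a script:

# show the interpreter line of the script on the compute node
head -1 /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
# if it prints e.g. "#!/bin/sh" or a path into the ANSYS tree, check that
# path with "ls -l" on the same node - a missing interpreter produces
# exactly this "No such file or directory" error against the script's name.)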
Re: [slurm-users] job startup timeouts?
"time srun hostname" reports on the order of 0.2 seconds, so at least single node requests are handled expediently! it might be useful to simply collect the scaling curve for that test - is it linear, superlinear, fast to a point then blows up, etc? have you already looked at the difference between 1 task/node and all threads/node? regards, mark hahn.
[slurm-users] Setting NodeAddr dynamically and not in slurm.conf
I’m trying to set my cloud nodes' addresses dynamically. In my slurm.conf, I do not specify a NodeAddr:

PartitionName=cloud Nodes=ALL Default=YES MaxTime=INFINITE State=UP
NodeName=CPRuby1 CPUs=2 State=Cloud

My PowerSave script will then update the slurm controller via scontrol with an IP address that AWS assigns me:

$ scontrol update nodename=CPRuby1 nodeaddr=<ip> state=POWER_UP
$ sinfo
cloud* up infinite 1 idle CPRuby1

Awesome! I can see from both log files that the controller and the slurmd are indeed communicating and waiting for a job. However, when I try:

srun "echo hello world"

srun: error: fwd_tree_thread: can't find address for host CPRuby1, check slurm.conf
srun: error: Task launch for 7.0 failed on node CPRuby1: Can't find an address, check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

It seems like the slurm controller is hell-bent on me declaring the IP address ahead of time in slurm.conf. Is what I’m trying to do not possible?

Using slurm 15.08.7

Thank you,
Jordan
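(For reference, a sketch of the sort of PowerSave resume script being
described - the AWS CLI lookup is an assumption about how a node name might
map to an instance, not necessarily what this script actually does:

#!/bin/bash
# ResumeProgram sketch: for each node Slurm asks us to power up, look up
# the instance's private IP and tell the controller about it. Assumes each
# cloud node is an EC2 instance tagged Name=<slurm node name>.
for node in $(scontrol show hostnames "$1"); do
    ip=$(aws ec2 describe-instances \
             --filters "Name=tag:Name,Values=${node}" \
                       "Name=instance-state-name,Values=running" \
             --query 'Reservations[0].Instances[0].PrivateIpAddress' \
             --output text)
    scontrol update nodename="${node}" nodeaddr="${ip}" nodehostname="${node}"
done
)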
Re: [slurm-users] Setting NodeAddr dynamically and not in slurm.conf
On 27/4/19 10:07 pm, J.R. W wrote:

> Using slurm 15.08.7

Is that a typo for 18.08.7?

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA