Re: [slurm-users] Job dispatching policy

2019-04-27 Thread Mahmood Naderan
 >More constructively - maybe the list can help you get the X11
applications to run using Slurm.
>Could you give some details please?



For example, I cannot run this GUI program with salloc:


[mahmood@rocks7 ~]$ cat workbench.sh
#!/bin/bash
unset SLURM_GTIDS
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
[mahmood@rocks7 ~]$ rocks run host compute-0-1 "ls
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2"
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
[mahmood@rocks7 ~]$ salloc -w compute-0-1 -c 2 --mem=4G -p RUBY -A y4
./workbench.sh
salloc: Granted job allocation 938
./workbench.sh: line 4:
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2: No such file or
directory
salloc: Relinquishing job allocation 938
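(A hedged note on the transcript above: salloc only creates the allocation and then runs its command on the submit host, rocks7, where that path apparently does not exist — hence "No such file or directory". To land the script on compute-0-1, something like the following sketch might work, assuming a Slurm new enough (17.11+) to have built-in X11 support:)

```shell
# Sketch, not tested here; same options as the original salloc line.
# srun inside the allocation launches the script on the allocated node,
# and --x11 forwards the display (requires Slurm built with X11 support).
salloc -w compute-0-1 -c 2 --mem=4G -p RUBY -A y4 \
    srun --x11 ./workbench.sh
```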



Regards,
Mahmood




On Wed, Apr 24, 2019 at 12:33 PM John Hearns  wrote:

> I would suggest that if those applications really are not possible with
> Slurm - then reserve a set of nodes for interactive use and disable the
> Slurm daemon on them.
> Direct users to those nodes.
>
> More constructively - maybe the list can help you get the X11 applications
> to run using Slurm.
> Could you give some details please?
>
>


Re: [slurm-users] Job dispatching policy

2019-04-27 Thread Chris Samuel

On 27/4/19 2:20 am, Mahmood Naderan wrote:

./workbench.sh: line 4: 
/state/partition1/ans190/v190/Framework/bin/Linux64/runwb2: No such file 
or directory


That doesn't look like it's related to Slurm to me, if the file itself 
exists then my suspicion is that it's a script and the interpreter it 
has in the first #! line does not exist.


What does this command say on that node?

file /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
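(If `file` reports a script, the first line can be inspected directly — a shebang pointing at a nonexistent interpreter produces exactly this misleading "No such file or directory" error for the script itself. A small sketch, using the path from the original post:)

```shell
# Show the #! interpreter line; if that interpreter path does not exist
# on compute-0-1, executing the script fails with "No such file or directory".
head -1 /state/partition1/ans190/v190/Framework/bin/Linux64/runwb2
```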

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] job startup timeouts?

2019-04-27 Thread Mark Hahn
"time srun hostname" reports on the order of 0.2 seconds, so at least single 
node requests are handled expediently!


it might be useful to simply collect the scaling curve for that test - 
is it linear, superlinear, fast to a point then blows up, etc?

have you already looked at the difference between 1 task/node and all
threads/node?
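(A minimal sketch of collecting that scaling curve — node counts and the 0.2 s baseline are assumptions, adjust to the cluster's size:)

```shell
# Time "srun hostname" at increasing node counts to see where launch
# latency stops being linear. Output discarded; only wall time matters.
for n in 1 2 4 8 16 32; do
    /usr/bin/time -f "$n nodes: %e s" srun -N "$n" hostname > /dev/null
done
```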

regards, mark hahn.



[slurm-users] Setting NodeAddr dynamically and not in slurm.conf

2019-04-27 Thread J.R. W
I’m trying to set my cloud nodes dynamically. In my slurm.conf, I do not 
specify a NodeAddr.

PartitionName=cloud Nodes=ALL Default=YES MaxTime=INFINITE State=UP
NodeName=CPRuby1 CPUs=2 State=Cloud

My PowerSave script will then update the slurm controller via scontrol with an 
IP address that AWS assigns me.

$scontrol update nodename=CPRuby1 nodeaddr= state=POWER_UP
$sinfo 
cloud*   up   infinite      1   idle CPRuby1

Awesome! I can see by both the log files that the controller and the slurmd are 
indeed communicating and waiting for a job. However, when I try:

srun "echo hello world" 
   
srun: error: fwd_tree_thread: can't find address for host CPRuby1, check 
slurm.conf
srun: error: Task launch for 7.0 failed on node CPRuby1: Can't find an address, 
check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

It seems like the slurm controller is hell-bent on my declaring the IP address 
ahead of time in slurm.conf. Is what I’m trying to do not possible?
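(One workaround commonly suggested for cloud nodes — an assumption relative to this thread, drawn from Slurm's cloud scheduling documentation — is that the "fwd_tree_thread" error comes from srun's fanout tree, which requires nodes to resolve each other's addresses. Disabling the tree makes launch messages go directly to each node:)

```
# slurm.conf fragment (a sketch, not from the thread): effectively disable
# the message fanout tree so task-launch traffic goes point-to-point and
# no per-node address lookup is needed on the compute side.
TreeWidth=65533
```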

Using slurm 15.08.7

Thank you,
Jordan

Re: [slurm-users] Setting NodeAddr dynamically and not in slurm.conf

2019-04-27 Thread Chris Samuel

On 27/4/19 10:07 pm, J.R. W wrote:


Using slurm 15.08.7


Is that a typo for 18.08.7 ?

--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA