[slurm-users] Correct way to do sbcast with sbatch

2020-04-18 Thread Hector Yuen
Hello, I have a very basic question about using sbcast.

I start from the master node (the one running slurmctld), and my binary is
there.

When I submit a job, I do this through sbatch. The first command I want to
run is an sbcast, but by then I am already on one of the nodes allocated to
the job, which doesn't have the binary in the first place.

What is the correct way to use sbcast inside sbatch?

Thanks

-- 
-h


Re: [slurm-users] Correct way to do sbcast with sbatch

2020-04-18 Thread William Brown
I will admit that I have not used sbcast, but from reading the man page I
think that it does not do what you hope.

The sbcast command will indeed run on the first allocated node, so the source
file must be accessible from there. The man page does say that shared file
systems may be a better solution than sbcast, and I think that is the clue:
sbcast is designed for clusters where no file system is shared between all
the nodes. If there were such a shared file system, the job could just copy
any file from shared storage to local storage on each node. Note that the
batch script itself executes only on the first allocated node, so the copy
would have to be launched with srun to run on every node. That might be
common where the shared storage is e.g. NFS, which might not be very fast,
and the job wants to make use of fast local storage on each node.
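
For reference, a minimal sketch of the pattern the sbcast man page describes,
assuming the binary sits somewhere the first allocated node can read (for
example a shared home directory). The path $HOME/bin/my_app, the /tmp
destination, and the stage_and_run helper are placeholders for illustration,
not anything confirmed in this thread:

```shell
#!/bin/bash
#SBATCH --job-name=sbcast-demo
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

# Hypothetical source path: it must be readable on the first allocated node,
# because that is where this batch script (and therefore sbcast) runs.
SRC="${SRC:-$HOME/bin/my_app}"

stage_and_run() {
    local src="$1"
    # Node-local destination, made unique per job (assumed layout).
    local dest="/tmp/$(basename "$src").${SLURM_JOB_ID}"
    # Copy the file from this node to the same path on every allocated node.
    sbcast "$src" "$dest" || return 1
    # Launch the local copy on the nodes of the allocation.
    srun "$dest"
}

# Only act inside a Slurm allocation, where SLURM_JOB_ID is set.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    stage_and_run "$SRC"
fi
```

Submitted with sbatch, the script body runs on the first allocated node only;
sbcast then pushes the file to every node in the allocation and srun starts
the local copies. If the binary exists only on a host that shares no file
system with the compute nodes, this sketch does not solve that by itself.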

 

It sounds like sbcast also has some useful options for copying different files
to different subsets of allocated nodes; there is an example for heterogeneous
jobs: https://slurm.schedmd.com/heterogeneous_jobs.html

Hopefully someone else has actually used it and can give an idea how to do
what you need to do.

William

 

From: slurm-users On Behalf Of Hector Yuen
Sent: 18 April 2020 09:06
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Correct way to do sbcast with sbatch

 




Re: [slurm-users] Alternative to munge for use with slurm?

2020-04-18 Thread Daniel Letai

  
  
In v20.02 you can use JWT, as per
https://slurm.schedmd.com/jwt.html

The only issue is getting libjwt for most RPM-based distros.
The current libjwt 'configure; make dist-all' doesn't work.
I had to cd into dist and run 'make rpm' to create the spec file,
then run 'rpmbuild -ba' after placing the tar.gz file in the SOURCES dir
of the rpmbuild tree.
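
The steps above could be scripted roughly as follows. This is only a sketch
under assumptions: the function name, its three arguments, and the
RPMBUILD_TOPDIR override are placeholders, and the message does not say
whether 'make rpm' exits cleanly after writing the spec file, so the error
check on that step may need relaxing.

```shell
#!/bin/bash
# Sketch of the manual libjwt RPM build described above.
# Every path is a placeholder supplied by the caller.
build_libjwt_rpm() {
    local src_dir="$1"   # libjwt source tree, already configured
    local tarball="$2"   # libjwt source tar.gz
    local spec="$3"      # spec file produced by 'make rpm'
    # Hypothetical override; ~/rpmbuild is the usual default tree.
    local topdir="${RPMBUILD_TOPDIR:-$HOME/rpmbuild}"

    # 'make rpm' inside dist/ is what generates the spec file.
    ( cd "$src_dir/dist" && make rpm ) || return 1

    # rpmbuild looks for the source tarball under SOURCES/.
    mkdir -p "$topdir/SOURCES" || return 1
    cp "$tarball" "$topdir/SOURCES/" || return 1

    # Build binary and source packages from the generated spec.
    rpmbuild --define "_topdir $topdir" -ba "$spec"
}
```

A call would look like build_libjwt_rpm followed by the source directory, the
tarball, and the generated spec file, in that order.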


Possibly just installing libjwt manually is easier for image-based
clusters.
HTH.





On 17/04/2020 22:42, Dean Schulze wrote:

Is there an alternative to munge when running slurm? Munge issues are a
common problem in slurm, and munge doesn't give any useful information when
a problem occurs. An alternative that at least gave some useful information
when a problem occurs would be a big improvement.


Thanks.