Hello,
On 09/24/2017 08:35 AM, Nadav Toledo wrote:
defaults, passwd and data
Hey all,
We are trying to set up a Slurm cluster with both CPU and GPU
partitions for research and education (courses) in the computer
science faculty at my university.
Everything seems to work fine, and we have managed to accomplish
almost everything needed except a few things:
A. Is it possible to set up global default values for
srun/sbatch (e.g. number of cores, email, etc.)? If so, how can it
be done?
You might create a job submit plugin (Lua), or else use template sbatch
scripts and a wrapper script that fills out the templates
according to the user input.
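
A minimal sketch of the template-plus-wrapper idea, in Python. The default values, the template text, and the name render_sbatch are all hypothetical placeholders; only the #SBATCH options themselves (--cpus-per-task, --mail-user, --mail-type) are standard sbatch options. A real wrapper would go on to hand the rendered script to sbatch.

```python
from string import Template

# Hypothetical site-wide defaults; adjust to local policy.
SITE_DEFAULTS = {"cpus": "1", "mail_type": "END,FAIL"}

# Hypothetical template; the #SBATCH options used are standard sbatch options.
SBATCH_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --cpus-per-task=$cpus\n"
    "#SBATCH --mail-user=$email\n"
    "#SBATCH --mail-type=$mail_type\n"
    "$command\n"
)


def render_sbatch(command, email, **overrides):
    """Fill the template with site defaults, letting user input override them."""
    if not email:
        # The wrapper is also a place to reject submissions missing a required field.
        raise ValueError("an email address is required")
    values = dict(SITE_DEFAULTS, **overrides)
    values.update(command=command, email=email)
    return SBATCH_TEMPLATE.substitute(values)
```

For example, render_sbatch("python train.py", "user@example.org", cpus="4") yields a script that requests four cores and keeps the default mail type.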
A2. Is it possible
to make some srun/sbatch parameters required (i.e. a user cannot run
a job via srun without specifying an email)? If so, how?
B. We have Active Directory (AD) in our faculty, and we prefer to
manage users/groups from there. Is it possible? Is there a guide
available somewhere?
Search this mailing list; this question pops up every now and again.
There is no built-in solution.
You should consider using accounting, but if you decide to
incorporate AD into Slurm accounting, you will have to decide how to
map users to accounts (create the correct rules).
C. What is the recommended way to handle data files? That is, a
user wants his data/code files (for example, a data set of pictures
for GPU deep learning) to be accessible to the nodes allocated to
him and to get the results back easily without sshing to those nodes (I
want to close the nodes to ssh if possible). So far we have
investigated NFS (low performance vs. files stored locally on the server) and
Nextcloud (file syncing back and forth). Is there a better way we
overlooked?
Some form of shared storage, with an HTTP file server and a post
script (epilog in Slurm speak) that would automatically send a URI
to the user's email? This will mean each job must create its own
path for the server to publish.
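
A rough sketch of the per-job path and URI, in Python. The storage root, the base URL, and the name job_publish_location are hypothetical; an epilog would supply the user name and the job id (Slurm exposes the latter as SLURM_JOB_ID) and then mail the resulting URI.

```python
import posixpath
from urllib.parse import quote


def job_publish_location(base_dir, base_url, user, job_id):
    """Return (directory, uri) where one job's results would be published."""
    # One subdirectory per user and job keeps jobs from overwriting each other.
    rel = posixpath.join(str(user), str(job_id))
    directory = posixpath.join(base_dir, rel)
    # Percent-encode the relative path so the URI stays valid.
    uri = base_url.rstrip("/") + "/" + quote(rel) + "/"
    return directory, uri
```

The job writes its output under the returned directory on the shared storage, and the file server is assumed to export base_dir at base_url.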
D. We need to give a specific known user the ability to run his
jobs on specific nodes during specific hours while no other jobs
are allowed to run concurrently (exclusion).
We saw there are reservations, but a reservation holds the resources even if
that user doesn't end up using it. Another solution
was to create a partition with a higher priority than all the others,
put this partition in the down state, give only that user the right
to submit jobs to it, and then put a script in crontab to change the
state of the partition during the time window needed.
What do you think? Is there a more elegant way?
Be evil. Use fair share. Once that user's credit goes down, they
will pay more attention and cancel or use the reservation.
Our most common OS is Ubuntu, and we are using Slurm 17.02.7.
Thanks in advance for your time and effort, Nadav