Re: [slurm-users] gres with docker problem

2019-01-04 Thread Marcin Stolarek
I think the main reason is the lack of access to some /dev "files" inside your Docker container. For Singularity the NVIDIA plugin is required; maybe there is something similar for Docker... Cheers, Marcin - https://funinit.wordpress.com On Wed, 2 Jan 2019, 05:53 허웅 Hi Chris. > > > > T

Re: [slurm-users] Restore Last JOBID After Reinstall of Slurm Master Node?

2018-12-23 Thread Marcin Stolarek
There is an option for that in slurm.conf; check the man page, but it's something like FirstJobId ;) Cheers, Marcin funinit.wordpress.com On Mon, 24 Dec 2018, 06:13 Hanby, Mike Howdy, > > > > We installed a new server to take over the duties of the Slurm master. I > imported our accounting database int
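FirstJobId tells slurmctld which number to use for the first submitted job when it has no saved state to continue from. A minimal sketch of choosing the value, assuming you already have the job IDs from the old controller or the imported accounting database (how you query them is left open; `first_job_id_line` is a hypothetical helper):

```python
def first_job_id_line(previous_job_ids):
    """Return a slurm.conf line so numbering resumes after the highest
    job ID already present in the imported accounting data."""
    if not previous_job_ids:
        raise ValueError("no previous job IDs supplied")
    return f"FirstJobId={max(previous_job_ids) + 1}"


# Hypothetical IDs, e.g. collected from the restored accounting database.
print(first_job_id_line([1200345, 1200350, 1200349]))
```

The resulting line goes into slurm.conf on the new controller before slurmctld is started there for the first time.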

Re: [slurm-users] bug 2119 with slurm 18.08.2

2018-11-08 Thread Marcin Stolarek
I have had a very similar issue for quite some time and I was unable to find its root cause. Are you using sssd with AD as the data source, searching only a subtree of entries? That is my case. Did you disable user enumeration? That is also what I have. I didn't find any evidence that it's related, but... may

Re: [slurm-users] Accounting: set default account with no access

2018-11-07 Thread Marcin Stolarek
I had exactly the same requirement; you can find my notes on it here: https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/ cheers, Marcin On Tue, 6 Nov 2018 at 20:48, Sam Hawarden wrote: > Hi Yair, > > > You can set maxsubmitjob=0 on an account. > > > The error mes

Re: [slurm-users] sprio/sacct priority question

2018-10-18 Thread Marcin Stolarek
As far as I remember, sprio does the calculation on its own when executed, while the priority in the job structure stored by slurmctld is updated only periodically... maybe this is the answer? cheers, Marcin On Wed, 17 Oct 2018 at 00:42, Glen MacLachlan wrote: > > Hi all, > > I'm using slurm 17.02.8 and when I
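The priority/multifactor plugin computes a job's priority as a weighted sum of normalized factors (each between 0.0 and 1.0) minus the job's nice value. A simplified sketch of that sum, omitting the TRES terms, with hypothetical weights; since factors such as age keep changing, a value computed at the moment sprio runs can legitimately differ from the one slurmctld stored at its last recalculation:

```python
# Hypothetical PriorityWeight* values; use the ones from your slurm.conf.
WEIGHTS = {
    "age": 1000,        # PriorityWeightAge
    "fairshare": 10000, # PriorityWeightFairshare
    "jobsize": 500,     # PriorityWeightJobSize
    "partition": 1000,  # PriorityWeightPartition
    "qos": 2000,        # PriorityWeightQOS
}


def job_priority(factors, weights=WEIGHTS, nice=0):
    """Weighted sum of normalized factors minus nice (TRES terms omitted).
    Rounded down here for simplicity."""
    total = sum(weights[name] * factors.get(name, 0.0) for name in weights)
    return int(total) - nice


factors = {"age": 0.5, "fairshare": 0.736368, "jobsize": 0.1,
           "partition": 1.0, "qos": 0.0}
print(job_priority(factors))
```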

Re: [slurm-users] Help with developing a lua job submit script

2018-10-09 Thread Marcin Stolarek
This should be quite easy: if job_desc.min_cpus and job_desc.min_cpus < YOUR_NUMBER then job_desc.partition = "YourPartition" end Check the slurm.h definition of job_descriptor and (a small self-advert, but maybe helpful...) you can also check my blog post on job_submit/lua ( https://funinit.wordpress.

Re: [slurm-users] Spec-ing a Slurm DB server

2018-07-21 Thread Marcin Stolarek
From my experience, it's a question of your future sacct queries. If you are not going to list a lot of jobs that were executed a long time ago, a VM with a few GB of RAM should be fine. It depends on the number of jobs you expect: a lot of small ones or a few big ones. Nevertheless, if you think that you'll b

[slurm-users] how-to use job_submit_lua (notes from Centos 6)

2018-06-07 Thread Marcin Stolarek
I spent some time debugging issues I had while working on a Lua script for job_submit_lua, and ended up with notes in the form of a blog post. Sharing them for those who may have similar issues: https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/ cheers, Marcin

Re: [slurm-users] sacctmgr - Use-case for 'Organisation' != 'Parent'

2018-04-26 Thread Marcin Stolarek
For me it's a shortcut for a description. On Thu, 26 Apr 2018 at 15:29, Loris Bennett wrote: > Hi, > > I'm currently looking at ironing out a few crinkles in my account > hierarchy and was looking at the attributes 'Parent' and 'Organisation' > again. I use 'Parent' to set up the account hierarchy,

Re: [slurm-users] srun not allowed in a partition

2018-03-22 Thread Marcin Stolarek
Check config.log: is pkg-config aware of the paths to your Lua shared libraries? cheers, Marcin

[slurm-users] How to map slurm node state to "meaningful state"

2018-03-20 Thread Marcin Stolarek
In our environment we're sending various statistics to Grafana, where we have dashboards designed for the IT team (either displayed on a TV or used from time to time to foresee future limitations or unused resources), but we also have dashboards for our management to help th
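One way to get a "meaningful state" is to collapse the compact state names printed by `sinfo -h -N -o "%N %t"` into a few categories. The grouping below is a hypothetical example, not a Slurm convention; adjust it to what your management dashboards need:

```python
# Hypothetical grouping of sinfo's compact state names into categories
# that are easier to explain on a management dashboard.
CATEGORY = {
    "idle": "available",
    "alloc": "in use",
    "mix": "in use",
    "comp": "in use",
    "drng": "in use",      # draining: still running jobs, accepts no new ones
    "down": "unavailable",
    "drain": "unavailable",
    "fail": "unavailable",
    "maint": "unavailable",
}


def meaningful_state(sinfo_state):
    """Map a compact sinfo state such as 'mix' or 'down*' to a category."""
    # sinfo appends flag characters such as '*' (not responding) or
    # '~' (powered down); strip them before the lookup.
    base = sinfo_state.rstrip("*~#%$@").lower()
    return CATEGORY.get(base, "unknown")
```

Counting nodes per category then gives one number per colour on the dashboard.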

Re: [slurm-users] Fairshare factor not reflected in job priority

2018-03-15 Thread Marcin Stolarek
r him. cheers, Marcin 2018-03-15 11:00 GMT+01:00 Marcin Stolarek : > I'm working on a priority multifactor plugin configuration and I'm not > sure if I'm missing something or the behaviour I see is the result of bug. > Basically > > # sshare | grep XX > X

[slurm-users] Fairshare factor not reflected in job priority

2018-03-15 Thread Marcin Stolarek
I'm working on a priority/multifactor plugin configuration and I'm not sure whether I'm missing something or the behaviour I see is the result of a bug. Basically: # sshare | grep XX XX 1 0.071429 4367 0.031536 0.736368 which I read as fairshare factor = 0.
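Assuming the classic fair-share algorithm (not Fair Tree) and reading the sshare columns as RawShares=1, NormShares=0.071429, RawUsage=4367, EffectvUsage=0.031536, FairShare=0.736368, the reported factor is consistent with F = 2^(-EffectvUsage/NormShares). A small check of that arithmetic (the zero-shares guard is an assumption of this sketch):

```python
def classic_fairshare(effective_usage, norm_shares):
    """Classic (pre-Fair-Tree) Slurm fair-share factor: 2 ** (-U / S)."""
    if norm_shares <= 0:
        return 0.0  # guard against division by zero in this sketch
    return 2 ** (-effective_usage / norm_shares)


# NormShares and EffectvUsage as read from the sshare line in the post.
print(round(classic_fairshare(0.031536, 0.071429), 6))
```

So the fair-share factor itself looks right; the question is why it does not show up in the job's priority.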

Re: [slurm-users] Problem with nodes appear as DOWN (Not responding) slurm 17.02.9

2018-02-06 Thread Marcin Stolarek
Check the ReturnToService parameter in slurm.conf. On Mon, 5 Feb 2018 at 20:30, Guy - wrote: > Hi, > I've compiled and installed slurm on ubuntu. it works great but if I take > a node down by running slurmd stop and start, it keeps appearing as DOWN > (Not responding) > The only fix is restarting slu
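A sketch of the ReturnToService semantics as documented in the slurm.conf man page of that era: 0 (the default) keeps a DOWN node down until an administrator resumes it, 1 returns it automatically only if it was set DOWN for being non-responsive, and 2 returns it on any valid registration. Newer Slurm releases refine these rules, so check the man page for your version; the function is a hypothetical model, not Slurm code:

```python
def node_returns_automatically(return_to_service, down_because_not_responding,
                               valid_config=True):
    """Hypothetical model: does a DOWN node become usable again when
    slurmd registers, for a given ReturnToService value?"""
    if not valid_config:
        return False
    if return_to_service == 0:
        # Stays DOWN until e.g. `scontrol update nodename=<node> state=resume`.
        return False
    if return_to_service == 1:
        return down_because_not_responding
    if return_to_service == 2:
        return True
    raise ValueError("ReturnToService must be 0, 1 or 2 in this model")
```

With the default of 0, a node marked DOWN (Not responding) stays down after slurmd is restarted, which matches the behaviour described in the question.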

Re: [slurm-users] Nagios or Other Monitoring Plugins

2018-01-18 Thread Marcin Stolarek
We're using Icinga 2, storing accounting data in InfluxDB for Grafana dashboards. In terms of monitoring I prefer checking end-user functionality, so apart from service checks we also have a plugin that submits jobs to the cluster (to idle nodes, with a deadline of a few minutes); the job simply creates files on shared
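A minimal sketch of such an end-to-end check, assuming a marker path on the shared filesystem and a node name chosen elsewhere (for example from the list of idle nodes). The helper names and paths are hypothetical; `--nodelist`, `--deadline` and `--wrap` are standard sbatch options, and the exit codes follow the usual Nagios/Icinga plugin convention:

```python
import os
import shlex

# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def probe_command(marker_path, node, deadline="now+5minutes"):
    """Build an sbatch command that touches a marker file on the shared
    filesystem from one node; Slurm drops the job if it cannot finish
    before the deadline."""
    return [
        "sbatch",
        f"--nodelist={node}",
        f"--deadline={deadline}",
        f"--wrap=touch {shlex.quote(marker_path)}",
    ]


def evaluate(marker_path):
    """Return (exit_code, message) depending on whether the job wrote the marker."""
    if os.path.exists(marker_path):
        return OK, f"test job wrote {marker_path}"
    return CRITICAL, f"marker not found: {marker_path}"
```

A real plugin would run the command (e.g. with `subprocess.run`), wait until shortly after the deadline, then call `evaluate` and exit with the returned code.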

Re: [slurm-users] Best practice: How much node memory to specify in slurm.conf?

2018-01-16 Thread Marcin Stolarek
I think it depends on your kernel and the way the cluster is booted (for instance, the initrd size). You can check the memory used by the kernel in the dmesg output: search for the line starting with "Memory:". This amount is fixed. It may also be a good idea to "reserve" some space for cache and buffers - check h
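The MemTotal value in /proc/meminfo already excludes most of the memory the kernel reserves at boot (roughly what the dmesg "Memory:" line reports), so a starting point for RealMemory is MemTotal minus whatever you want to keep free for cache and buffers. A sketch, where the 2048 MB reserve is an arbitrary assumption:

```python
def real_memory_mb(meminfo_text, reserve_mb=2048):
    """Suggest a RealMemory value (MB) for slurm.conf: MemTotal from
    /proc/meminfo minus a hypothetical reserve for OS, cache and buffers."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports kB (KiB)
            return max(kib // 1024 - reserve_mb, 0)
    raise ValueError("MemTotal not found")
```

On a node, `real_memory_mb(open("/proc/meminfo").read())` gives the number to put in that node's RealMemory= entry.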

Re: [slurm-users] Stagein/Stageout

2018-01-06 Thread Marcin Stolarek
Unless something has changed recently, a shared filesystem such as NFS/GPFS/Lustre is a requirement for a normal cluster configuration. You can work around it with prolog scripts or SPANK plugins, but honestly I haven't seen a real HPC cluster without a shared filesystem. cheers, Marcin 2018-01-05 23:25 GMT+01:00 Andrew Mel

Re: [slurm-users] Show job command after completion with sacct

2017-12-10 Thread Marcin Stolarek
You can use a slurmctld prolog (PrologSlurmctld) to save it the way you want. cheers, Marcin 2017-11-30 0:25 GMT+01:00 Chris Samuel : > On 30/11/17 8:57 am, Jacob Chappell wrote: > > Using "scontrol show jobid X" I can see info about running jobs, including >> the command used to launch the job, the user's worki
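Whatever mechanism you use to capture the job record while slurmctld still knows the job, the interesting fields (such as Command and WorkDir) can be pulled out of `scontrol show job` output with a small parser. A simplified sketch that splits on whitespace, so values containing spaces are not handled; keep in mind the Slurm documentation recommends keeping prolog scripts short and not calling Slurm commands from them:

```python
def parse_scontrol_record(text):
    """Parse `scontrol show job <id>` output into a dict of Key=Value pairs.
    Only the first '=' splits a token, so TRES=cpu=1,... keeps its value."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields
```

Storing just `Command`, `WorkDir` and the job ID per line in a log file is usually enough to answer "what did job X run" after it has left slurmctld's memory.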

Re: [slurm-users] Invoke squeue sort on submit time?

2017-12-10 Thread Marcin Stolarek
I don't see any reason. You can try the attached lines; I've also sent them to SchedMD to check whether there is any reason not to do that: https://bugs.schedmd.com/show_bug.cgi?id=4496 cheers, Marcin 2017-12-08 23:08 GMT+01:00 E.M. Dragowsky : > Greetings -- > > According to the documenta

Re: [slurm-users] Increasing MaxArraySize

2017-11-28 Thread Marcin Stolarek
I think it's more related to your configuration than to general Slurm capabilities. For example, if you have quite long prolog/epilog scripts, it may be a good idea to discourage users from submitting huge job arrays (with very short tasks?). In my case it's quite common to see users submitting arrays wi
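The concern about long prolog/epilog scripts can be made concrete with a little arithmetic: every array task pays the prolog and epilog cost, so for very short tasks that overhead can rival the useful work. A sketch with hypothetical timings (it ignores details such as exactly when the prolog runs):

```python
def overhead_fraction(task_seconds, prolog_seconds, epilog_seconds):
    """Fraction of each array task's node time spent in prolog + epilog."""
    overhead = prolog_seconds + epilog_seconds
    return overhead / (overhead + task_seconds)


def total_overhead_hours(n_tasks, prolog_seconds, epilog_seconds):
    """Node-hours spent in prolog + epilog across the whole array."""
    return n_tasks * (prolog_seconds + epilog_seconds) / 3600
```

For example, 10,000 tasks of 30 s each with a 15 s prolog and a 15 s epilog spend half their node time in the scripts, roughly 83 node-hours of overhead in total.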