partially practical (to ensure users explicitly request slow nodes
rather than having jobs just dumped on ancient Opterons). Also, each user
gets their own Account, so the QoS Grp limits apply to each human
separately. Accounts would also have absolute core limits.
Thank you for your thoughts!
Corey
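A sketch of that layout with sacctmgr (the account name, user name, and limit value are hypothetical):

sacctmgr add account alice_acct parent=root GrpCPUs=64
sacctmgr add user alice account=alice_acct

Each human then gets their own account, so per-account Grp* limits bind that one user.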
…interactively)
SLURM version: 17.02.5, compiled from source (after installing
Lua) using ./configure --prefix=/usr --sysconfdir=/etc/slurm
Any guidance to get me up and running would be greatly appreciated!
Thanks,
Nathan
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
Could you set AllowedRAMSpace/AllowedSwapSpace in
/etc/slurm/cgroup.conf to some big number? That way the job's memory
limit will be the cgroup soft limit, and the cgroup hard limit (the
point at which the kernel will OOM-kill the job) would be
job_memory_limit * AllowedRAMSpace, i.e., some large value.
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi
I'm sure someone has already blazed this trail before, but this is how
I am going about it.
[truncated sample output] 22.2 20.4 17 16.9799
* denotes the node where the batch script executes (node 0)
CPU usage is cumulative since the start of the job
Ryan
On 09/19/2016 11:13 AM, Ryan Cox wrote:
We use this script that we cobbled together:
https://github.com/BYUHPC/slurm-random/blob/master/rjobstat. It assumes
that you're using cgroups. It uses ssh to connect to each node so it's
not very scalable but it works well enough for us.
Ryan
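For a sense of the approach, a rough sketch of the same idea (not the actual rjobstat code; assumes cgroup v1 and the default slurm cgroup hierarchy):

JOBID=12345   # hypothetical job id
for node in $(scontrol show hostnames "$(squeue -h -j "$JOBID" -o %N)"); do
    # cpuacct.usage is the job's cumulative CPU time on that node, in nanoseconds
    ssh "$node" "cat /sys/fs/cgroup/cpuacct/slurm/uid_*/job_${JOBID}/cpuacct.usage"
done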
On 09/18/2016 06:42 PM, Igor Yakushin wrote:
…searched the documentation and I
just can't seem to find any switch to enable that. Help me Obiwan
Kenobi, you're my only hope!
--
Nick Eggleston
Missouri S&T
IT Research Support Services
The George Washington University
725 21st Street
Washington, DC 20052
Suite 211, Corcoran Hall
==
On Fri, Apr 15, 2016 at 1:07 PM, Ryan Cox <ryan_...@byu.edu> wrote:
Did you try this: --reservation=root_13
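For example (job script name hypothetical):

sbatch --reservation=root_13 test.sh
srun --reservation=root_13 -N1 hostname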
On 04/15/2016 08:10 AM, Glen MacLachlan wrote:
scontrol update not allowing jobs
Dear all,
Wrapping up a maintenance period and I want to run some test jobs
before I release the reservation and allow regular user jobs to start
running. I've modified th
Coincidentally, I asked about that yesterday in a bug report:
http://bugs.schedmd.com/show_bug.cgi?id=2465. The short answer is to use
SchedulerParameters=assoc_limit_continue, which was introduced in
15.08.8. It only works if the job's Reason is something like
Assoc*Limit.
Ryan
On 02
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
What am I missing to get more than one job to run on a node?
Thanks in advance,
Brian Andrus
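One common cause, for what it's worth (general cons_res behavior, not something stated in this excerpt): with CR_CORE_MEMORY every job consumes memory as well as cores, and a job that requests no memory may default to the whole node's worth. A quick test that two small jobs can coexist on a 16-core node:

sbatch -n 4 --mem=4G job.sh
sbatch -n 4 --mem=4G job.sh   # should start on the same node if cores and memory remain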
, and slurmctld decided the data was invalid and killed all jobs.
(I don't know if this is still a problem.)
We have seen similar issues on 14.11.8 but haven't bothered to diagnose
or report it. I think I've seen it twice so far out of dozens of new users.
Ryan
On 09/07/2015 09:16 AM, Loris Bennett wrote:
Hi,
This problem occurs with 14.11.8.
A user I set up today got the following error when su
Be sure to test it first before trying anything else:
https://stackoverflow.com/questions/18661976/reading-dev-cpu-msr-from-userspace-operation-not-permitted.
We ran into this issue once when we had a "trusted" person and we
couldn't easily grant him access to the MSRs. We couldn't find a goo
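The snippet is cut off, but for reference, the approach in the linked StackOverflow discussion is roughly this (illustrative commands; test on one node first):

sudo modprobe msr                              # expose /dev/cpu/*/msr
sudo chmod o+r /dev/cpu/*/msr                  # or grant a dedicated group read access
sudo setcap cap_sys_rawio=ep /usr/sbin/rdmsr   # newer kernels also require CAP_SYS_RAWIO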
Thanks in advance for your assistance.
Jackie Scoggins
=
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu
Rutgers Biomedical and Health Sciences | Ryan Novosielski - Senior Technologist
novos...@rutgers.edu - 973/972.0922 (2x0922)
OIRT/High Perf & Res Comp - MSB C630, Newark
On Apr 6, 2015, at 20:17, Ryan Cox wrote:
Chris,
Just have GPU users request the number of CPU cores that they need and
don't lie to Slurm about the number of cores. If a GPU user needs 4
cores and 4 GPUs, have them request that. That leaves 20 cores for
others to use.
Ryan
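A request along those lines (assuming GPUs are configured as a gres named "gpu"):

#SBATCH -n 4
#SBATCH --gres=gpu:4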
On 04/06/2015 03:43 PM, Christopher B Coffey wrote:
On 01/21/2015 09:23 AM, Bill Wichser wrote:
A user underneath gets the expected 0.009091 normalized shares since
there are a lot of fairshare=1 users there. user3 gets basically
25x this value, as the fairshare for user3 is 25.
Yet the normalized shares is actually MORE than the normalized
…caught out by a typo on
http://slurm.schedmd.com/gres.html, where the example has GresType=gpu,bandwith
rather than GresTypes=...
Could you please fix the doc!
BTW, Slurm was quite ungracious about having that bad entry in slurm.conf.
Regards,
Gareth
if you just want to see what was used, you can get the raw
usage using sacct. For example, for a given job, you can do something
like:
>
> sacct -X -a -j 1182128 --format
Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>
> -
> Gary Skouson
November 25, 2014 9:51 AM
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage
Thanks Ryan,
Is this value stored anywhere in the SLURM accounting DB? I could not
find any value for the JOB that corresponds to this RAW usage.
Roshan
-----
Raw usage is a long double and the time added by jobs can be off by a
few seconds. You can take a look at _apply_new_usage() in
src/plugins/priority/multifactor/priority_multifactor.c to see exactly
what happens.
Ryan
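The decayed total is visible per association with sshare (the RawUsage column):

sshare -a -l

As far as I know there is no per-job raw-usage field in the accounting DB; sacct's cputimeraw (as in Gary's example above) is the closest per-job figure.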
On 11/25/2014 10:34 AM, Roshan Mathew wrote:
Hello SLURM users,
http://s
Dave,
I have done testing on 5-6 year old hardware with 100,000 users randomly
distributed in 10,000 accounts with semi-random depths with most being
between 1-4 levels from root but some much deeper than that, plus
100,000 jobs pending. slurmctld startup time was really long but, after
gett
George,
Wouldn't a QOS with GrpNodes=10 accomplish that?
Ryan
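A sketch of that setup (QOS and account names hypothetical; GrpNodes caps the total nodes used by all jobs running under the QOS):

sacctmgr add qos floating10
sacctmgr modify qos floating10 set GrpNodes=10
sacctmgr modify account george_acct set qos+=floating10

Jobs submitted with --qos=floating10 would then float across the cluster but never occupy more than 10 nodes at once.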
On 10/30/2014 11:47 AM, Brown George Andrew wrote:
Hi,
I would like to have a partition of N nodes without statically
defining which nodes should belong to a partition and I'm trying to
work out the best way to achieve this.
Trey,
I'm not sure why your jobs aren't starting. Someone else will have to
answer that question.
You can model an organizational hierarchy a lot better in 14.11 due to
changes in Fairshare=parent for accounts. If you only want fairshare to
matter at the research group and user levels but
I was hoping/wishing the values would be between 0.0 and
1.0, but I can work with 0.5 as the max value. It just means that I
need to double the PriorityWeightFairshare factor in order to achieve
the intended relative weighting between Fairshare, QOS, Partitions,
JobSize, Age.
Ed
From: Ryan Cox
I assume you are using the default fairshare algorithm since you didn't
specify otherwise. F = 2**(-U/S), where U is effective usage (often
displayed in documentation as UE) and S is normalized shares. See
http://slurm.schedmd.com/priority_multifactor.html under the heading
"The SLURM Fair-Share Formula
On 09/23/2014 11:27 AM, Trey Dockendorf wrote:
Has anyone used the Lua job_submit plugin while also allowing multiple partitions? I'm not
even sure what the partition value would be in the Lua code when a job is submitted with
"--partition=general,background", for example.
We do. We use the a
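The excerpt cuts off there, but as a minimal sketch of inspecting that value (my guess, assuming the job_submit Lua API's job_desc.partition carries the partition string exactly as submitted):

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- For --partition=general,background this should log the literal
    -- comma-separated string "general,background".
    if job_desc.partition ~= nil then
        slurm.log_info("requested partition(s): %s", job_desc.partition)
    end
    return slurm.SUCCESS
end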
It was great to see so many of you at Slurm User Group Meeting. We
received several questions after our presentation and wanted to clarify
some things.
One of our "possible concerns" was about a "tiny user in a very active
account". To clarify, this is the scenario we were mentioning. We
So is there a way to achieve this using the config file? Do I have to
use accounting to enforce the limits? Or is there another way that I
don't see?
Best regards,
Uwe Sauter
-- Sends "Test message" back to the submitting srun/salloc/sbatch's stderr
-- (per the 14.03 NEWS quoted below) and rejects the job:
slurm.user_msg("Test message")
return slurm.ERROR
On 08/07/2014 04:40 AM, Bjørn-Helge Mevik wrote:
I read in the NEWS for 14.03.0pre1:
-- Add mechanism for job_submit plugin to generate error message for srun,
salloc or sbatch to stderr. New argument added to job_submit function in
All,
There has been more conversation on
http://bugs.schedmd.com/show_bug.cgi?id=858. It might be good to post
future comments there so we have just one central location for
everything. No worries if you'd rather reply on the list.
Once a solution is ready I'll post something to the list
Thanks. I can certainly call it that. My understanding is that this
would be a slightly different implementation from Moab/Maui, but I don't
know those as well so I could be wrong. Either way, the concept is
similar enough that a more recognizable term might be good.
Does anyone else have
…more flexible than DRF in that it allows arbitrary charge
rates to be specified, but I'm not sure it makes sense to specify rates different
from the DRF ones? Or if one does specify different rates, it might end up
breaking some of the fairness properties that are described in the DRF paper
:
#SBATCH -J NAG_int_tip3p_rep2
#SBATCH -o NAG_int_tip3p_rep2.out
#SBATCH -e NAG_int_tip3p_rep2.err
#SBATCH -n 2
#SBATCH -p debug
#SBATCH -D /home/gordon/cpgh89/autodock/NAG_DNAP
#SBATCH -w riddley
Can anyone explain what I'm doing wrong in this setup?
-- max(∫(εὐδαιμονία)dt)
o take. The patch currently
implements charging for CPUs, memory (GB), and nodes.
Note: I saw a similar idea in a bug report from the University of
Chicago: http://bugs.schedmd.com/show_bug.cgi?id=858.
Ryan
On 07/25/2014 10:31 AM, Ryan Cox wrote:
Bill and Don,
We have wondered about this
sed and so the process is never ending.
Another solution is to simply trust the users and just keep reminding
them about allocations. They are usually a smart bunch and are quite
creative when it comes to getting jobs to run! So maybe I am concerned
over nothing at all and things will just
e value is
correct and here is why. Or do I just need to figure out a database
query to cull this information?
Thanks,
Bill
apply
to our use case), see
http://tech.ryancox.net/2014/06/problems-with-slurm-prioritization.html.
It would be nice for the
grad student to have administrative control over the subaccount, since he
actually knows the students, but not have it affect priority calculations.
Ryan
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
… => 'sched/builtin',
'SelectTypeParameters' => 'CR_Core_Memory',
'SelectType'           => 'select/cons_res',
--
Perfection is just a word I use occasionally with mustard.
--Atom Powers--
…cgroup feature to start swapping out the excess
50 MB or so... they would actually fit in the swap area
and the job should not be killed...
What am I missing here?
Should the code itself be aware of the given "mem.limit=9000MB"?
Thanks for any explanation.
MG
opment Manager
Computing Platforms
CSC - IT Center for Science Ltd.
E-Mail: olli-pekka.le...@csc.fi
Tel: +358 50 381 8604
skype: oplehto // twitter: ople
limit is being imposed about 5 minutes into the job.
Thanks
--
andy wettstein
hpc system administrator
research computing center
university of chicago
773.702.1104
mance. Is this
amount realistic?
Is there a more efficient method to control memory usage on nodes which
are shared?
Thank you for any advice,
Kevin
vements. Thanks.
' to expand each
> NodeList.
>
> This gets... suboptimal at installations with large numbers of jobs in
> flight. Is there a better way?
>
> john
> On 06/19/2013 12:15 PM, Ryan Cox wrote:
>> Paul,
>>
>> We were discussing this yesterday due to a user not limiting the amount
>> of jobs hammering our storage. A QOS with a GrpJobs limit sounds like
>> the best approach for both us and you.
>>
>> Ryan
first before
> putting a nail in it. From my look at the documentation I don't see
> anyway to do this other than what I stated above.
>
> -Paul Edmon-
> Hi,
> I am configuring a cluster with computing nodes and two administration
> nodes (with slurmctld and slurmdbd). But I want users to use another
> server for job submission. How can I do that?
> It may be easy to do, but I can't find how in the documentation.
>
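The usual answer (general Slurm practice, not part of this excerpt) is that any host with the client commands, the cluster's slurm.conf, and the munge key can submit. A sketch, with "login1" as a hypothetical submit host:

scp /etc/slurm/slurm.conf login1:/etc/slurm/slurm.conf
scp /etc/munge/munge.key  login1:/etc/munge/munge.key
ssh login1 'systemctl start munge && sinfo'   # a working sinfo confirms connectivity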
>> …process IDs(?) In any event, I'm guessing I'm not the first
>> person to run into this. Is there a recommended solution to
>> configure SLURM to track codes like this?
>>
>> Thanks,
>> ~Mike C.
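For what it's worth (a general Slurm note, not from this excerpt): processes that daemonize out of their session are usually caught with cgroup-based process tracking, set in slurm.conf:

ProctrackType=proctrack/cgroup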