[slurm-dev] Re: node selection

2017-10-19 Thread Steffen Grunewald
On Wed, 2017-10-18 at 09:36:42 -0600, Michael Di Domenico wrote: > > is there any way after a job starts to determine why the scheduler > chose the series of nodes it did? > > for some reason on an empty cluster when i spin up a large job it's > staggering the allocation across a seemingly rando

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Chris Samuel
On Thursday, 19 October 2017 4:43:13 PM AEDT Nadav Toledo wrote: > so adding manually only works if I don't restart slurmctld... That usually points to a communication problem for slurmdbd trying to tell the slurmctld about these changes via an RPC. What does this say? sacctmgr list clusters

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Nadav Toledo
slurmdbd and slurmctld are on the same machine... sacctmgr list clusters format=cluster,controlhost,controlport Cluster ControlHost ControlPort -- --- imageproc+ 127.0.0.1 6817 It's still in beta stage for testing, so
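The check discussed in this thread (does slurmdbd have a controller address on record for the cluster?) can be scripted. The following is only a sketch: it assumes `sacctmgr` is on the PATH, uses its `-n` (no header) and `-P` (parsable2, `|`-delimited) options, and treats an empty ControlHost or a ControlPort of 0 as "slurmctld has not registered with slurmdbd", which is an assumption about how to read the output, not something stated in the thread. The function names are hypothetical.

```python
import subprocess


def parse_clusters(text):
    """Parse '|'-delimited rows of cluster, controlhost, controlport."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        cluster, host, port = line.split("|")
        rows.append({"cluster": cluster, "host": host, "port": int(port or 0)})
    return rows


def unregistered(rows):
    """Return cluster names with no controller address on record.

    Assumption: a blank host or port 0 means slurmctld never registered.
    """
    return [r["cluster"] for r in rows if not r["host"] or r["port"] == 0]


def query_clusters():
    """Run sacctmgr and return the parsed rows (requires a Slurm install)."""
    result = subprocess.run(
        ["sacctmgr", "-n", "-P", "list", "clusters",
         "format=cluster,controlhost,controlport"],
        check=True, capture_output=True, text=True,
    )
    return parse_clusters(result.stdout)
```

On a host with Slurm installed, `unregistered(query_clusters())` returning an empty list would suggest the registration side is fine and the problem lies elsewhere.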

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Douglas Jacobsen
Are they both running as the same user? (They need to to trust each other's messages) On Oct 19, 2017 01:12, "Nadav Toledo" wrote: > slurmdbd and slurmctld are on the same machine... > > sacctmgr list clusters format=cluster,controlhost,controlport > Cluster ControlHost ControlPort >

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Nadav Toledo
yeah, slurmuser=slurmduser=root On 19/10/2017 11:17, Douglas Jacobsen wrote: Are they both running as the same user? (They need to to trust each other's messages) On Oct 19, 2017 01:12, "Nadav Toledo" <1nadavtol...@cs.techn

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Nadav Toledo
also running with sudo both slurmctld and slurmdbd. perhaps i should "sudo su -" first? On 19/10/2017 11:17, Douglas Jacobsen wrote: Are they both running as the same user? (They need to to trust each other's messages) O

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Nadav Toledo
I found a hint, running sacctmgr show problem: Cluster Account User Problem -- -- -- domain_name a

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Chris Samuel
On Thursday, 19 October 2017 7:41:37 PM AEDT Nadav Toledo wrote: > running : id -u domain_name\\username , does return its uid So your system is not finding users as just "username", but instead only as domain_name\\username which is probably not ideal. You probably want to see if you can find
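The distinction drawn here, between a user resolving as plain "username" versus only as domain_name\username, can be checked from Python through the same name service that `id -u` consults. This is a minimal sketch using the standard `pwd` module (Unix only); `resolve_uid` and `compare_names` are hypothetical helper names, and "username" and "domain_name" are the placeholders from the thread, not real accounts.

```python
import pwd


def resolve_uid(name, lookup=pwd.getpwnam):
    """Return the uid the name service reports for name, or None if unknown."""
    try:
        return lookup(name).pw_uid
    except KeyError:
        # getpwnam raises KeyError when no passwd entry matches the name.
        return None


def compare_names(short_name, qualified_name, lookup=pwd.getpwnam):
    """Report which of the two spellings of a user name can be resolved."""
    return {
        short_name: resolve_uid(short_name, lookup),
        qualified_name: resolve_uid(qualified_name, lookup),
    }
```

For example, `compare_names("username", "domain_name\\username")` (a single backslash once Python has processed the string literal) would show `None` for the short form on a system behaving as described in the message above.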

[slurm-dev] Re: mysql job_table and step_table growth

2017-10-19 Thread Douglas Meyer
Thank you for your help Chris! sacctmgr list config | fgrep Purge PurgeEventAfter= 11 days PurgeJobAfter = 61 days PurgeResvAfter = 11 days PurgeStepAfter = 11 days PurgeSuspendAfter = 11 days [2017-10-18T09:04:52.098] slurmdbd version 15.08.3 started [2017

[slurm-dev] Re: "unrecognized key: OverSubscribe" for partition

2017-10-19 Thread Christian Leitold
Hi Benjamin, On 18 October 2017 at 15:13, Benjamin Redling wrote: oversubscribe fka. shared Thanks, I'll try that, so, Shared=YES in the partition config. > Search in the archive documentation for 15.08: > https://slurm.schedmd.com/archive/ Great, thank you. It seems that --share is the old

[slurm-dev] Slurm version 17.02.8 is now available

2017-10-19 Thread Tim Wickberg
Slurm version 17.02.8 contains about 42 bug fixes developed over the past two months. Slurm downloads are available from https://www.schedmd.com/downloads.php Details about the changes are listed below. * Changes in Slurm 17.02.8 == -- Add 'slurmdbd:' to the account

[slurm-dev] Slurm User Group 2017 presentations online

2017-10-19 Thread Tim Wickberg
Many thanks to all the attendees, and especially to all those who presented at the Slurm User Group 2017 meeting in Berkeley. Thank you to NERSC as well for hosting, and I hope to see many of you at SLUG18 at CIEMAT in Madrid, Spain. PDFs of the presentations are online at http://slurm.schedm

[slurm-dev] Re: Qos limits associations and AD auth

2017-10-19 Thread Lachlan Musicman
On 19 October 2017 at 20:37, Chris Samuel wrote: > > On Thursday, 19 October 2017 7:41:37 PM AEDT Nadav Toledo wrote: > > > running : id -u domain_name\\username , does return its uid > > So your system is not finding users as just "username", but instead only as > domain_name\\username which is