[slurm-dev] Re: question about federation

2017-11-01 Thread Ole Holm Nielsen
I'm pretty sure that a single, central slurmdbd service is required for multiple, federated clusters. I think that's what ties multiple clusters together into a single "federation". You mention a problem with squeue, but you don't list the error messages. Are you sure that all nodes have i

[slurm-dev] Re: what does File=/dev/nvidai0 actually do?

2017-11-01 Thread Allan Streib
"Hwa, George" writes: > In example gres.conf, > >Name=gpu File=/dev/nvidia0 > > Does slurm actually read the device file and get information from it > for configuration/control? It seems to me that it at least will do an existence check. If the GPU device files are not there (e.g. drivers n
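
The kind of existence check described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_gres_line` and `device_file_present` are hypothetical helper names, not part of Slurm, and the sketch does not reproduce what Slurm's gres plugin actually does with the device file.

```python
import os


def parse_gres_line(line):
    """Split a gres.conf-style line such as 'Name=gpu File=/dev/nvidia0' into a dict.

    Hypothetical helper for illustration; not Slurm's own parser.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # keep only key=value tokens
            fields[key] = value
    return fields


def device_file_present(line):
    """Return True if the line names a File= path that exists on this host."""
    path = parse_gres_line(line).get("File")
    return path is not None and os.path.exists(path)


if __name__ == "__main__":
    entry = "Name=gpu File=/dev/nvidia0"
    print(parse_gres_line(entry))
    # Result depends on whether the GPU driver has created the device file.
    print(device_file_present(entry))
```

On a host without the driver loaded, the second call would report the device file as missing, which matches the situation the reply mentions.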

[slurm-dev] RE: : Re: what does File=/dev/nvidai0 actually do?

2017-11-01 Thread Hwa, George
Allan, thanks for the explanation. So the name "gpu" is not that generic. It actually means something to SLURM, or its plugin. This leads me to another related question: how do I define a generic resource that isn't associated with any real device? Regards, George > -Original Message

[slurm-dev] Slurm versions 16.05.11, 17.02.9 and 17.11.0rc2 are now available (CVE-2017-15566)

2017-11-01 Thread Tim Wickberg
Slurm versions 16.05.11, 17.02.9 and 17.11.0rc2 are now available, and include a series of recent bug fixes as well as a fix for a recently discovered security vulnerability (CVE-2017-15566). Downloads are available at https://www.schedmd.com/downloads.php . Ryan Day (LLNL) reported an issue

[slurm-dev] How to strictly limit the memory per CPU

2017-11-01 Thread 马银萍
hello, I want to limit the memory per cpu in my cluster, some settings in slurm.conf are like this: NodeName=c[01-10] CPUs=32 RealMemory=127360 PartitionName=C032 Nodes=c[01-10] MaxMemPerCPU=3980 DefMemPerCPU=3980 MaxCPUsPerNode=32 I have 10 nodes, each node has 127360M memory, and 32 CP
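
The per-CPU figure in this post follows from the node size: 127360 MB spread over 32 CPUs gives 3980 MB each. A minimal sketch of that arithmetic is below; the function names are hypothetical, and the second helper only shows how many CPU-sized shares a memory request spans. It makes no claim about how Slurm itself enforces MaxMemPerCPU.

```python
def mem_per_cpu_mb(real_memory_mb, cpus):
    """Even share of node memory per CPU, in MB (integer division)."""
    return real_memory_mb // cpus


def cpus_to_cover(request_mb, per_cpu_mb):
    """Number of per-CPU shares needed to cover a memory request (ceiling)."""
    return -(-request_mb // per_cpu_mb)


if __name__ == "__main__":
    # Values taken from the slurm.conf excerpt in the post.
    share = mem_per_cpu_mb(127360, 32)
    print(share)  # 3980
    # A request for the whole node's memory spans all 32 shares.
    print(cpus_to_cover(127360, share))  # 32
```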

[slurm-dev] Re: How to strictly limit the memory per CPU

2017-11-01 Thread Christopher Samuel
On 02/11/17 14:34, 马银萍 wrote: > It means that he used only one cpu and asked for 125G memory, so he used > most of the memory on that node, then it will affect other users' jobs, > this is invalid. > So is there any way to strictly limit the average memory per CPU and > users can't override it? o

[slurm-dev] Re: question about federation

2017-11-01 Thread zhangtao102...@126.com
hi, I'll try to test it again. Thank you for your help, Ole Best regards. zhangtao102...@126.com From: Ole Holm Nielsen Date: 2017-11-01 15:36 To: slurm-dev Subject: [slurm-dev] Re: question about federation I'm pretty sure that a single, central slurmdbd service is required for multiple,