Hi Peter,
On 3/20/19 11:19 AM, Peter Steinbach
wrote:
[root@ernie /]# scontrol show node -dd g1
NodeName=g1 CoresPerSocket=4
CPUAlloc=3 CPUTot=4 CPULoad=N/A
AvailableFeatures=(null)
ActiveFeat
After more tests, the situation has become a bit clearer.
If "Cores=0,1" (etc.) is present in the `gres.conf` file, then one can
only get a single-core gres job onto the node by using
`--gres-flags=disable-binding` if a non-gres job is already running on the same node.
If "Cores=0,1" is NOT present in `gres.conf`, then any
On 3/20/19 9:09 AM, Peter Steinbach wrote:
Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf
file, everything stops working again. :/ Should I send around scontrol
outputs? And yes, I made sure to set the --mem flag for the job
submission this time.
Well there you've sa
Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf
file, everything stops working again. :/ Should I send around scontrol
outputs? And yes, I made sure to set the --mem flag for the job
submission this time.
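A sketch of the kind of submission meant here (core counts and memory values are just placeholders):

$ sbatch --gres=gpu:1 -c 1 --mem=500 --wrap="sleep 10 && hostname"
$ sbatch -c 3 --mem=500 --wrap="sleep 10 && hostname"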
Best,
Peter
Hi Philippe,
thanks for spotting this. This indeed appears to solve the first issue.
Now I can try to make the GPUs available and play with pinning etc.
Superb - if you happen to be at ISC, let me know. I'd buy you a
coffee/beer! ;)
Peter
SelectTypeParameters=CR_Core_Memory
Therefore, there's no more memory available for incoming jobs (jobs 15 and 16).
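As far as I understand it, with CR_Core_Memory memory is a consumable resource, and a job that does not request memory is allocated the node's default, i.e. the whole node's memory unless DefMemPerCPU or DefMemPerNode is set. A rough sketch of the two ways to avoid that (values are placeholders):

# slurm.conf
DefMemPerCPU=500

# or request memory explicitly at submission time
$ sbatch -c 3 --mem=500 --wrap="sleep 10 && hostname"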
Regards,
Philippe DS
- Original Message -
From: "Peter Steinbach"
To: slurm-users@lists.schedmd.com
Sent: Wednesday, 20 March 2019 10:19:07
Subject: Re: [slur
Hi Chris,
I changed the initial state a bit (the number of cores per node was
misconfigured):
https://raw.githubusercontent.com/psteinb/docker-centos7-slurm/18.08.5-with-gres/slurm.conf
But that doesn't change things. Initially, I see this:
# sinfo -N -l
Wed Mar 20 09:03:26 2019
NODELIST NO
On 3/19/19 5:31 AM, Peter Steinbach wrote:
For example, let's say I have a 4-core GPU node called gpu1. A non-GPU job is submitted:
$ sbatch --wrap="sleep 10 && hostname" -c 3
Can you share the output for "scontrol show job [that job id]" once you
submit this please?
Also please share "scontrol show node
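That is, roughly (the job id is a placeholder):

$ sbatch --wrap="sleep 10 && hostname" -c 3
Submitted batch job <jobid>
$ scontrol show job <jobid>
$ scontrol show node -dd gpu1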
Hi Benson,
As you can perhaps see from our slurm.conf, we have task affinity and similar
switches turned off. Along the same lines, I also removed the core binding of the
GPUs. That is why I am quite surprised that Slurm doesn't allow new jobs in.
I am aware of the PCIe bandwidth implications of a GP
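Concretely, the switches meant here look roughly like this (a sketch, not the exact lines from our configs; device paths are placeholders):

# slurm.conf
TaskPlugin=task/none

# gres.conf, GPUs listed without a Cores= restriction
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1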
Hi,
Many MPI implementations will have some sort of core binding allocation
policy, which may impact such node sharing. Would these only be limited
to single-CPU jobs? Can users request a particular core? For example, for
a GPU-based job some cores will have better memory transfer rates to the
I've read through the parameters. I am not sure if any of those would
help in our situation. What suggestions would you make? Note, it's not
the scheduler policy that appears to hinder us. It's about how slurm
keeps track of the generic resource and (potentially) binds it to
available cores. Th
Dear Eli,
thanks for your reply. The slurm.conf file I suggested lists this
parameter. We use
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
See also:
https://github.com/psteinb/docker-centos7-slurm/blob/18.08.5-with-gres/slurm.conf#L60
I'll check if that makes a difference.
On Tue, Mar 19, 2019 at 8:34 AM Peter Steinbach wrote:
>
> Hi,
>
> we are struggling with a Slurm 18.08.5 installation of ours. We are in a
> situation where our GPU nodes have a considerable number of cores but
> "only" 2 GPUs inside. While people run jobs using the GPUs, non-GPU jobs
> can ente
Hi,
we are struggling with a Slurm 18.08.5 installation of ours. We are in a
situation where our GPU nodes have a considerable number of cores but
"only" 2 GPUs inside. While people run jobs using the GPUs, non-GPU jobs
can enter alright. However, we found out the hard way that the inverse
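For concreteness, such a node would be defined roughly like this (core count, memory and device paths are placeholders, not our real values):

# slurm.conf
GresTypes=gpu
NodeName=gpu1 CPUs=32 RealMemory=256000 Gres=gpu:2 State=UNKNOWN

# gres.conf on the node
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1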