This is a strange one
I have built a Slurm cluster using AWS ParallelCluster and noticed that the
permissions of my /etc/sysconfig directory are broken!
I have found log entries that support this finding, but I have no idea why it
is happening nor where to find the necessary script/config file.
Hi list
I have built a small cluster and have attached a few clients to it.
My clients can submit jobs, so I am confident that the service is set up
sufficiently.
What I would like to do is deploy the Slurm client into a Docker container.
Inside the Docker container, I have set up munge and
-wise) and how do you want to use
them?
Brian Andrus
On 2/28/2023 9:49 AM, Jake Jellinek wrote:
> Hi all
>
> I come from an SGE/UGE background and am used to the convention that I can
> qrsh to a node and, from there, start a new qrsh to a different node with
> different parameters.
Hi all
I come from an SGE/UGE background and am used to the convention that I can qrsh
to a node and, from there, start a new qrsh to a different node with different
parameters.
I've tried this with Slurm and found that it doesn't work the same way.
For example, if I issue an 'srun' command, I get
I cannot think of any way to do this within the Slurm configuration
I would solve this with a wrapper that starts a new sshd process on a
different port, secured so that only that user can connect, and launch it as
part of your boot-time scripts; a rough sketch is below.
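Something along these lines could serve as a starting point (untested sketch;
the port number, user name and file paths are all just placeholders):

    #!/bin/bash
    # Boot-time wrapper sketch: start a second sshd on its own port and only
    # allow the one user who should be able to connect.
    # /etc/ssh/sshd_config_alt is a placeholder config containing e.g.:
    #   Port 2222
    #   AllowUsers jake
    #   PasswordAuthentication no
    #   PidFile /var/run/sshd_alt.pid

    # Launch a second sshd instance with its own config so it does not clash
    # with the system sshd on port 22
    /usr/sbin/sshd -f /etc/ssh/sshd_config_alt

Dropping that into an init script or a systemd unit would bring it up with the
node.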
If your scrip
, such
that Slurm is assigning all of the memory to each job as it runs; you can
verify this with 'scontrol show job'. If that is what is happening, try
setting a DefMemPerCPU value for your partition(s).
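Something like this is what I mean (the job id and the 2048 MB figure are only
examples):

    # See how much memory Slurm actually allocated to a running job
    scontrol show job 12345 | grep -i mem

    # slurm.conf: give jobs a sensible per-CPU default on the partition
    # (value is in MB; pick something that fits your nodes)
    PartitionName=compute Nodes=compute001 Default=YES State=UP DefMemPerCPU=2048

With a DefMemPerCPU in place, jobs that do not request memory explicitly no
longer grab the whole node.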
Best of luck,
Lyn
On Thu, May 26, 2022 at 1:39 PM Jake Jellinek
jakejelli...@outlook.c
> 000 is extremely low and might prevent jobs from starting!
> Run "slurmd -C" on the nodes to read appropriate node parameters for
> slurm.conf.
>
> I hope this helps.
>
> /Ole
>
>
>> On 26-05-2022 21:12, Jake Jellinek wrote:
>> Hi
>>
Hi
I am just building my first Slurm setup and have got everything running - well,
almost.
I have a two-node configuration. Everything runs on a single Hyper-V server,
and I have divided up its resources to create my VMs.
One node I will use for heavy-duty work; this is called compute001.
O
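For reference, the node/partition part of slurm.conf for a setup like this
would look roughly as follows; the second hostname and all CPU/memory figures
below are made-up placeholders, and running 'slurmd -C' on each VM prints the
real values to paste in:

    # On each VM, let slurmd report what it detects:
    slurmd -C

    # slurm.conf sketch (second hostname and all sizes are placeholders)
    NodeName=compute001 CPUs=8 RealMemory=16000 State=UNKNOWN
    NodeName=node002    CPUs=2 RealMemory=4000  State=UNKNOWN
    PartitionName=debug Nodes=compute001,node002 Default=YES MaxTime=INFINITE State=UP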