to a
secondary MariaDB instance, but that train has passed.
The format of the archive files is not well documented. Does anyone have a
program (python/C/whatever) that will read a job_table_archive file and decode
it into a parsable structure?
Douglas O'Neal, Ph.D. (contractor)
Manager
Hello,
I need to use SLURM for a project. I installed it following this quick start
guide ( https://ibmimaster.cs.uni-tuebingen.de/quickstart_admin.html ). First
I just want to run it on one cluster.
- I did steps 1 to 7: created the slurm user, with my slurm binaries as its
home dir
- created the necessary d
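Presumably the directories in question are the ones named in slurm.conf. A
minimal sketch of the path-related settings (all paths here are placeholders,
not the guide's exact values):

    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurmctld
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmctldLogFile=/var/log/slurm/slurmctld.log
    SlurmdLogFile=/var/log/slurm/slurmd.log
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmdPidFile=/var/run/slurmd.pid

Each path has to exist and be writable by the daemon that uses it
(StateSaveLocation and the slurmctld log by the slurm user, the slurmd paths
by root) before the daemons will start cleanly.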
Script,
> Multiple Datasets). We eventually wrote an abstract utility to try to help
> them with the process:
>
> https://github.com/jtfrey/job-templating-tool
>
> May be of some use to you.
>
> On Jul 15, 2020, at 16:13, c b wrote:
>
> I'
I'm trying to run an embarrassingly parallel experiment, with 500+ tasks
that all differ in one parameter. e.g.:
job 1 - script.py foo
job 2 - script.py bar
job 3 - script.py baz
and so on.
This seems like a case where having a slurm array hold all of these jobs
would help, so I could just submi
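A minimal sketch of that as a job array, assuming the 500 parameters sit one
per line in a file called params.txt (a made-up name) and that script.py
takes the parameter as its only argument:

    #!/bin/bash
    #SBATCH --array=1-500
    # Map this array task's index to one line of the parameter file.
    PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
    python script.py "$PARAM"

Submitted once with sbatch, this expands into 500 array tasks that Slurm
schedules independently and shows in squeue as jobid_1, jobid_2, and so on.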
Hi,
I have a bunch of jobs that, according to Slurm's status, have been running
for 30+ minutes, but in reality aren't running. When I go to the node
where the job is supposed to be, the processes aren't there (not showing up
in top or ps) and the job's stdout/stderr logs are empty. I know it's
running
simultaneously on each machine.
thanks
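When Slurm's view and the node's view disagree like this, a few things worth
comparing (job and node names below are placeholders):

    # what slurmctld believes about the job and where it put it
    scontrol show job <jobid>
    scontrol show node <nodename>
    # on that node: is there any slurmstepd left for the job?
    ssh <nodename> 'ps -ef | grep slurmstepd'
    # if the processes really are gone, requeue or clear the job
    scontrol requeue <jobid>
    scancel <jobid>

The slurmd log on the node around the job's start time is also worth a look
for failed step launches.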
> Best regards
> Jürgen
>
> --
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50
ailable as far as slurm is concerned.
>
> Brian
> On 11/1/2019 10:52 AM, c b wrote:
>
> yes, there is enough memory for each of these jobs, and there is enough
> memory to run the high resource and low resource jobs at the same time.
>
> On Fri, Nov 1, 2019 at 1:37 PM Brian Andrus
e isn't enough memory available for it.
>
> Brian Andrus
> On 11/1/2019 7:42 AM, c b wrote:
>
> I have:
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
>
> On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn wrote:
>
>> > In theory, these sm
I tried setting a 5-minute time limit on some low-resource jobs, and one
hour on high-resource jobs, but my 5-minute jobs are still waiting behind
the hour-long jobs.
Can you suggest some combination of time limits that would work here?
On Fri, Nov 1, 2019 at 11:08 AM c b wrote:
> On my
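Time limits on their own only help if the backfill scheduler is active; a
hedged sketch of the pieces involved (values are illustrative, not
recommendations). In slurm.conf:

    SchedulerType=sched/backfill
    SchedulerParameters=bf_window=1440,bf_continue

and on the jobs themselves:

    sbatch --time=00:05:00 small_job.sh
    sbatch --time=01:00:00 big_job.sh

With backfill enabled, a 5-minute job can be started on otherwise idle
resources whenever it is guaranteed to finish before the waiting large jobs'
expected start times.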
rm knows,
> the low priority jobs will take longer to finish than just waiting for the
> current running jobs to finish.
>
> John
>
> From: slurm-users on behalf of
> c b
> Reply-To: Slurm User Community List
> Date: Friday, November 1,
I have:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn wrote:
> > In theory, these small jobs could slip in and run alongside the large
> jobs,
>
> what are your SelectType and SelectTypeParameters settings?
> ExclusiveUser=YES on partitio
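For the small jobs to slip into the leftover cores they also have to request
only what they need; a rough sketch with made-up sizes:

    # high-priority job: 6 of the 8 cores
    sbatch --cpus-per-task=6 --mem=20G big.sh
    # low-priority job: should fit in the remaining 2 cores
    sbatch --cpus-per-task=2 --mem=4G small.sh

With CR_CPU_Memory both CPUs and memory are consumable, so the small job
should only be held back if the node's remaining CPUs or memory are
insufficient, or if a partition setting (ExclusiveUser=YES,
OverSubscribe=EXCLUSIVE) forbids sharing the node.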
Hi,
Apologies for the weird subject line...I don't know how else to describe
what I'm seeing.
Suppose my cluster has machines with 8 cores each. I have many large high
priority jobs that each require 6 cores, so each machine in my cluster runs
one of each of these jobs at a time. However, I als
he
cluster, and on some other machines just restrict the cores allocated to
slurm. For example, I want machine A to be unavailable to slurm from
9am-5pm Monday-Friday, machine B to only have 50% of its cores
available during this time, but machine C to be 100% available at all
times.
It sounds lik
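For the all-or-nothing machines, one hedged approach is to drain and resume
the node on a schedule from cron (node name is a placeholder); the
partial-core case is messier and is only mentioned below:

    # root's crontab on the head node
    # 09:00 Mon-Fri: stop scheduling new work onto machineA
    0 9 * * 1-5   scontrol update NodeName=machineA State=DRAIN Reason="daytime use"
    # 17:00 Mon-Fri: return it to service
    0 17 * * 1-5  scontrol update NodeName=machineA State=RESUME

DRAIN lets running jobs finish but accepts nothing new, so jobs on that node
need time limits short enough to be done by 9am. For taking only half of
machine B's cores, a recurring reservation over part of the node (scontrol
create reservation ... CoreCnt=...) is one possibility, but it is
considerably more fiddly.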
son, SLURM doesn't allow access to the devices
Jul 10 13:54:23 imk-dl-01 slurmstepd[2232]: debug: Not allowing
access to device c 195:0 rwm(/dev/nvidia0) for job
Jul 10 13:54:23 imk-dl-01 slurmstepd[2232]: debug: Not allowing
access to device c 195:1 rwm(/dev/nvidia1) for job
Jul 10 13:
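Those "Not allowing access to device" lines are what the cgroup device
constraint logs for every GPU that was not allocated to the job, so if the
job was submitted without a GPU request they are expected rather than an
error. A hedged sketch of the two sides (counts are examples):

    # cgroup.conf
    ConstrainDevices=yes

    # in the job script
    #SBATCH --gres=gpu:2
    nvidia-smi    # should list only the two allocated GPUs

If the job did request GPUs and still gets denied, the node's gres.conf and
the Gres= entry for that node in slurm.conf are the next things to check.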
Why do you have
SchedulerParameters = (null)
Is that even allowed?
https://slurm.schedmd.com/sched_config.html
On Thu, Jan 11, 2018 at 1:39 PM, Colas Rivière wrote:
> Hello,
>
> I'm managing a small cluster (one head node, 24 workers, 1160 total worker
> threads). The head node has t
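As far as I know, SchedulerParameters = (null) in scontrol show config simply
means no options are set and Slurm is running on defaults, which is allowed.
A hedged example of setting it explicitly in slurm.conf (values are
illustrative only):

    SchedulerType=sched/backfill
    SchedulerParameters=bf_interval=60,bf_max_job_user=20,default_queue_depth=200

After editing slurm.conf, scontrol reconfigure should pick the new values up
without restarting the daemons.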