Hi,
I was perhaps a bit imprecise, sorry about that. The point of the "datasync"
tool and the "datasync-reaper" cronjob would be to replace or augment the
per-job /tmp that is cleaned at the end of each job. Datasets would then be
left on the node-local disks until they are deleted by datasync-reaper.
But rsync -a will only help you if people are using identical or at
least overlapping data sets? And you don't need rsync to prune out old
files.
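For the pruning itself a plain find is enough. A rough sketch, assuming a
made-up layout in which each dataset directory under /local/datasets carries
a LAST_SYNCED marker file (the paths and the 30-day age are purely
illustrative):

  # delete dataset directories whose marker has not been touched in 30 days
  find /local/datasets -mindepth 2 -maxdepth 2 -name LAST_SYNCED -mtime +30 \
      -printf '%h\0' | xargs -0 -r rm -rf --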
On 2/26/19 1:53 AM, Janne Blomqvist wrote:
> On 22/02/2019 18.50, Will Dennis wrote:
>> Hi folks,
>>
>> Not directly Slurm-related, but... We have a
All,
So I am using sacct to generate daily reports of job run times, which are
imported into an external DB for cost and projected-use planning.
One thing I have noticed is that the END field for jobs with a state of
FAILED is "Unknown" but the ELAPSED field has the time it ran.
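For reference, a minimal query that shows this behaviour -- the dates,
options and field list are chosen purely for illustration:

  sacct -a -X -S 2019-02-25 -E 2019-02-26 \
      --format=JobID,State,Start,End,Elapsed --parsable2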
It seems to me
On 2/26/19 5:13 AM, Daniel Letai wrote:
I couldn't find any documentation regarding which APIs from PMIx or UCX
Slurm is using, and how stable those APIs are.
There is information about PMIx at least on the SchedMD website:
https://slurm.schedmd.com/mpi_guide.html#pmix
For UCX I'd suggest test
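A quick sanity check on an installed build is to ask Slurm which MPI plugin
types it was built with:

  srun --mpi=list

The pmix entries listed there should match the PMIx generation installed on
the compute nodes.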
Hi Chris,
I had
JobAcctGatherType=jobacct_gather/linux
TaskPlugin=task/affinity
ProctrackType=proctrack/cgroup
ProctrackType was actually unset but cgroup is the default.
I have now changed the settings to
JobAcctGatherType=jobacct_gather/cgroup
TaskPlugin=task/affinity,task/cgroup
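To double-check what the running daemons actually picked up, something like
this is handy:

  scontrol show config | grep -E 'JobAcctGatherType|TaskPlugin|ProctrackType'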
Hi Jeffrey,
thanks for the hint regarding scontrol reconfig. That one drove me nuts
again.
I changed it to MaxArraySize=10. I restarted slurmctld, since I also
changed some features of the nodes.
I soon realized that I could only submit --array=1-9; I then
already myself increased
Hi Merlin,
thanks for the answer, but our user does not just need a high maximum index;
they in fact need 100k task IDs.
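Once the limits allow it, the submission itself would just be something like

  sbatch --array=0-99999%500 jobscript.sh

where the %500 throttle caps how many of the 100k tasks run at the same time;
the script name and throttle value are of course made up.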
Best
Marcus
On 2/26/19 3:50 PM, Merlin Hartley wrote:
*max_array_tasks*
Specify the maximum number of tasks that can be included in a job
array. The default limit is MaxA
max_array_tasks
Specify the maximum number of tasks that can be included in a job array. The
default limit is MaxArraySize, but this option can be used to set a lower
limit. For example, max_array_tasks=1000 and MaxArraySize=11 would permit a
maximum task ID of 10, but limit the number of ta
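Note that max_array_tasks is one of the SchedulerParameters options, so
allowing e.g. 100k task IDs would look roughly like this in slurm.conf
(values purely illustrative):

  MaxArraySize=100001
  SchedulerParameters=max_array_tasks=100000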
Hi Loris,
Odd, we never saw that issue with the memory efficiency being out of whack,
just the CPU efficiency. We are running 18.08.5-2, and here is a 512-core job
run last night:
Job ID: 18096693
Array Job ID: 18096693_5
Cluster: monsoon
User/Group: abc123/cluster
State: COMPLETED (exit code 0)
Nod
Also see https://slurm.schedmd.com/slurm.conf.html for
MaxArraySize/MaxJobCount.
We just went through a user-requested adjustment to MaxArraySize to bump it
from 1000 to 1; as the documentation states, since each index of an array
job is essentially "a job," you must be sure to also adju
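In other words, something along these lines in slurm.conf, with the numbers
invented purely for illustration:

  MaxArraySize=100001   # maximum task ID 100000
  MaxJobCount=500000    # keep this well above MaxArraySize

since every array index eventually becomes its own job record.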
Hi Loris,
OK, THAT really is a lot.
What do you use for gathering these values? jobacct_gather/cgroup?
If I remember right, there was recently a discussion on this list
regarding the JobAcctGatherType, but I do not remember the outcome. I
do remember, though, that someone pointed to SLUG18 (or 17?
Hi all,
Is there any issue regarding which versions of PMIx or UCX Slurm
is compiled with? Should I require installation of the same versions
on the compute nodes?
I couldn't find any documentation regarding which APIs from PMIx
or UCX Slurm is using
Hi Marcus,
Thanks for the response, but that doesn't seem to be the issue. The
problem seems to be that the raw data are incorrect:
Slurm data: ... Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: ... 50 2 1 10240 0 503611
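For comparison, the corresponding raw values can be pulled straight from
sacct (the job ID is a placeholder):

  sacct -j <jobid> -p \
      --format=JobID,NCPUS,NNodes,NTasks,ReqMem,TotalCPU,Elapsed,MaxRSS,ExitCode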
Hi Loris,
I assume this job used fairly little memory, in the KB range; might that
be true?
replace
sub kbytes2str {
    my $kbytes = shift;
    if ($kbytes == 0) {
        return sprintf("%.2f %sB", 0.0, 'M');
    }
    my $mul = 1024;
    my $exp = int(log($kbytes) / log($mul));
    my @pre
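The preview cuts the function off at this point; presumably it goes on to
index a unit-prefix table, roughly as follows (a sketch, not necessarily the
exact code that was posted):

    my @pre = ('K', 'M', 'G', 'T', 'P');
    $exp = $#pre if ($exp > $#pre);   # clamp to the known prefixes
    return sprintf("%.2f %sB", $kbytes / $mul**$exp, $pre[$exp]);
}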
Hi,
With seff 18.08.5-2 we have been getting spurious results regarding
memory usage:
$ seff 1230_27
Job ID: 1234
Array Job ID: 1230_27
Cluster: curta
User/Group: x/x
State: COMPLETED (exit code 0)
Nodes: 4
Cores per node: 25
CPU Utilized: 9-16:49:18
CPU Effici
Hi,
I'd like to share our set-up as well, even though it's very
specialized and thus probably won't work in most places. However, it's
also very efficient in terms of budget when it does.
Our users don't usually have shared data sets, so we don't need high
bandwidth at any particular point -- the
On 26.02.19 at 09:20, Tru Huynh wrote:
> On Fri, Feb 22, 2019 at 04:46:33PM -0800, Christopher Samuel wrote:
>> On 2/22/19 3:54 PM, Aaron Jackson wrote:
>>
>>> Happy to answer any questions about our setup.
>>
>>
>
>> Email me directly to get added (I had to disable the Mailman web
> Coul
Hi Janne,
On Tue, Feb 26, 2019 at 3:56 PM Janne Blomqvist
wrote:
> When reaping, it searches for these special .datasync directories (up to
> a configurable recursion depth, say 2 by default), and based on the
> LAST_SYNCED timestamps, deletes entire datasets starting with the oldest
> LAST_SYNC
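So per dataset the sync step would essentially boil down to something like
this, with the paths invented for illustration:

  rsync -a /shared/datasets/foo/ /local/datasets/foo/
  mkdir -p /local/datasets/foo/.datasync
  touch /local/datasets/foo/.datasync/LAST_SYNCED

and the reaper then works off those LAST_SYNCED timestamps.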
On Fri, Feb 22, 2019 at 04:46:33PM -0800, Christopher Samuel wrote:
> On 2/22/19 3:54 PM, Aaron Jackson wrote:
>
> >Happy to answer any questions about our setup.
>
>
>
> Email me directly to get added (I had to disable the Mailman web
Could you add me to that list?
Thanks
Tru
--
Dr Tr
On 2/26/19 9:07 AM, Marcus Wagner wrote:
Does anyone know why, by default, the number of array elements is
limited to 1000?
We have one user who would like to have 100k array elements!
Which is more difficult for the scheduler: one array job with 100k
elements, or 100k non-array jobs?
Where
Hello everyone,
I have another question ;)
Does anyone know why, by default, the number of array elements is
limited to 1000?
We have one user who would like to have 100k array elements!
Which is more difficult for the scheduler: one array job with 100k
elements, or 100k non-array jobs?