Hi,
I was perhaps a bit imprecise, sorry about that. The point of the "datasync"
tool and the "datasync-reaper" cronjob would be to replace or augment the
per-job /tmp that is cleaned at the end of each job. Datasets would then be
left on the node-local disks until they are deleted by datasync-reaper.
But rsync -a will only help you if people are using identical or at
least overlapping data sets? And you don't need rsync to prune out old
files.
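For the pruning itself a plain find is enough. A rough sketch, assuming a
made-up layout in which each dataset directory under /local/datasets carries
a LAST_SYNCED marker file (the paths and the 30-day age are purely
illustrative):

  # delete dataset directories whose marker has not been touched in 30 days
  find /local/datasets -mindepth 2 -maxdepth 2 -name LAST_SYNCED -mtime +30 \
      -printf '%h\0' | xargs -0 -r rm -rf --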
On 2/26/19 1:53 AM, Janne Blomqvist wrote:
> On 22/02/2019 18.50, Will Dennis wrote:
>> Hi folks,
>>
>> Not directly Slurm-related, but... We have a
All,
So I am using sacct to generate daily reports of job run times, which are
imported into an external DB for cost and projected-use planning.
One thing I have noticed is that the END field for jobs with a state of
FAILED is "Unknown" but the ELAPSED field has the time it ran.
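For reference, a minimal query that shows this behaviour -- the dates,
options and field list are chosen purely for illustration:

  sacct -a -X -S 2019-02-25 -E 2019-02-26 \
      --format=JobID,State,Start,End,Elapsed --parsable2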
It seems to me
On 2/26/19 5:13 AM, Daniel Letai wrote:
I couldn't find any documentation regarding which APIs from PMIx or UCX
Slurm is using, and how stable those APIs are.
There is information about PMIx at least on the SchedMD website:
https://slurm.schedmd.com/mpi_guide.html#pmix
For UCX I'd suggest test
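A quick sanity check on an installed build is to ask Slurm which MPI plugin
types it was built with:

  srun --mpi=list

The pmix entries listed there should match the PMIx generation installed on
the compute nodes.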
Hi Chris,
I had
JobAcctGatherType=jobacct_gather/linux
TaskPlugin=task/affinity
ProctrackType=proctrack/cgroup
ProctrackType was actually unset but cgroup is the default.
I have now changed the settings to
JobAcctGatherType=jobacct_gather/cgroup
TaskPlugin=task/affinity,task/cgroup
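To double-check what the running daemons actually picked up, something like
this is handy:

  scontrol show config | grep -E 'JobAcctGatherType|TaskPlugin|ProctrackType'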
Hi Jeffrey,
thanks for the hint regarding scontrol reconfig. That one drove me nuts
again.
I changed it to MaxArraySize=10. I restarted slurmctld, since I also
changed some features of the nodes.
I soon realized that I could only submit --array=1-9; I then
already myself increased
Hi Merlin,
thanks for the answer, but our user does not just need a high maximum index;
they in fact need 100k task IDs.
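Once the limits allow it, the submission itself would just be something like

  sbatch --array=0-99999%500 jobscript.sh

where the %500 throttle caps how many of the 100k tasks run at the same time;
the script name and throttle value are of course made up.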
Best
Marcus
On 2/26/19 3:50 PM, Merlin Hartley wrote:
*max_array_tasks*
Specify the maximum number of tasks that can be included in a job
array. The default limit is MaxA
max_array_tasks
Specify the maximum number of tasks that can be included in a job array. The
default limit is MaxArraySize, but this option can be used to set a lower
limit. For example, max_array_tasks=1000 and MaxArraySize=11 would permit a
maximum task ID of 10, but limit the number of ta
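Note that max_array_tasks is one of the SchedulerParameters options, so
allowing e.g. 100k task IDs would look roughly like this in slurm.conf
(values purely illustrative):

  MaxArraySize=100001
  SchedulerParameters=max_array_tasks=100000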
Hi Loris,
Odd, we never saw that issue with the memory efficiency being out of whack,
just the CPU efficiency. We are running 18.08.5-2, and here is a 512-core job
run last night:
Job ID: 18096693
Array Job ID: 18096693_5
Cluster: monsoon
User/Group: abc123/cluster
State: COMPLETED (exit code 0)
Nod
Also see https://slurm.schedmd.com/slurm.conf.html for
MaxArraySize/MaxJobCount.
We just went through a user-requested adjustment to MaxArraySize to bump it
from 1000 to 1; as the documentation states, since each index of an array
job is essentially "a job," you must be sure to also adju
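In other words, something along these lines in slurm.conf, with the numbers
invented purely for illustration:

  MaxArraySize=100001   # maximum task ID 100000
  MaxJobCount=500000    # keep this well above MaxArraySize

since every array index eventually becomes its own job record.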
Hi Loris,
OK, THAT really is a lot.
What do you use for gathering these values? jobacct_gather/cgroup?
If I remember right, there was recently a discussion on this list
regarding the JobAcctGatherType, but I do not remember the outcome. I
do remember, though, that someone pointed to SLUG18 (or 17?
Hi all,
Is there any issue regarding which versions of PMIx or UCX Slurm
is compiled with? Should I require installation of the same versions
on the compute nodes?
I couldn't find any documentation regarding which APIs from PMIx
or UCX Slurm is using
Hi Marcus,
Thanks for the response, but that doesn't seem to be the issue. The
problem seems to be that the raw data are incorrect:
Slurm data: ... Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: ... 50 2 1 10240 0 503611
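For comparison, the corresponding raw values can be pulled straight from
sacct (the job ID is a placeholder):

  sacct -j <jobid> -p \
      --format=JobID,NCPUS,NNodes,NTasks,ReqMem,TotalCPU,Elapsed,MaxRSS,ExitCode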
Hi Loris,
I assume this job used fairly little memory, in the KB range; might that
be true?
replace
sub kbytes2str {
    my $kbytes = shift;
    if ($kbytes == 0) {
        return sprintf("%.2f %sB", 0.0, 'M');
    }
    my $mul = 1024;
    my $exp = int(log($kbytes) / log($mul));
    my @pre
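The preview cuts the function off at this point; presumably it goes on to
index a unit-prefix table, roughly as follows (a sketch, not necessarily the
exact code that was posted):

    my @pre = ('K', 'M', 'G', 'T', 'P');
    $exp = $#pre if ($exp > $#pre);   # clamp to the known prefixes
    return sprintf("%.2f %sB", $kbytes / $mul**$exp, $pre[$exp]);
}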
Hi,
With seff 18.08.5-2 we have been getting spurious results regarding
memory usage:
$ seff 1230_27
Job ID: 1234
Array Job ID: 1230_27
Cluster: curta
User/Group: x/x
State: COMPLETED (exit code 0)
Nodes: 4
Cores per node: 25
CPU Utilized: 9-16:49:18
CPU Effici
Hi,
I'd like to share our set-up as well, even though it's very
specialized and thus probably won't work in most places. However, it's
also very efficient in terms of budget when it does.
Our users don't usually have shared data sets, so we don't need high
bandwidth at any particular point -- the
On 26.02.19 at 09:20, Tru Huynh wrote:
> On Fri, Feb 22, 2019 at 04:46:33PM -0800, Christopher Samuel wrote:
>> On 2/22/19 3:54 PM, Aaron Jackson wrote:
>>
>>> Happy to answer any questions about our setup.
>>
>>
>
>> Email me directly to get added (I had to disable the Mailman web
> Coul
Hi Janne,
On Tue, Feb 26, 2019 at 3:56 PM Janne Blomqvist
wrote:
> When reaping, it searches for these special .datasync directories (up to
> a configurable recursion depth, say 2 by default), and based on the
> LAST_SYNCED timestamps, deletes entire datasets starting with the oldest
> LAST_SYNC
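So per dataset the sync step would essentially boil down to something like
this, with the paths invented for illustration:

  rsync -a /shared/datasets/foo/ /local/datasets/foo/
  mkdir -p /local/datasets/foo/.datasync
  touch /local/datasets/foo/.datasync/LAST_SYNCED

and the reaper then works off those LAST_SYNCED timestamps.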
On Fri, Feb 22, 2019 at 04:46:33PM -0800, Christopher Samuel wrote:
> On 2/22/19 3:54 PM, Aaron Jackson wrote:
>
> >Happy to answer any questions about our setup.
>
>
>
> Email me directly to get added (I had to disable the Mailman web
Could you add me to that list?
Thanks
Tru
--
Dr Tr
On 2/26/19 9:07 AM, Marcus Wagner wrote:
Does anyone know why, by default, the number of array elements is
limited to 1000?
We have one user who would like to have 100k array elements!
Which is more difficult for the scheduler: one array job with 100k
elements, or 100k non-array jobs?
Where
Hello everyone,
I have another question ;)
Does anyone know why, by default, the number of array elements is
limited to 1000?
We have one user who would like to have 100k array elements!
Which is more difficult for the scheduler: one array job with 100k
elements, or 100k non-array jobs?