Hello,
sorry for answering late…
Thank you to everybody!
I will try to increase NFS performance.
Best regards,
Brigitte Selch
From: John Hearns [mailto:hear...@googlemail.com]
Sent: Thursday, September 28, 2017 4:45 PM
To: slurm-dev
Subject: [slurm-dev] Re: MPI-Jobs on cluster - how to
Hello everybody,
On 10/10/17 8:25 AM, Marcus Wagner wrote:
For a quick view, manually starting the controller
slurmctld -D -vvv
Good advice; for beginners (or a tired help-seeker) a hint about "-f" might be necessary.
Without the current configuration running the central management daemon
is I
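For reference, a minimal sketch of such a manual start (the configuration path is an assumption; adjust it to your installation):
# run the controller in the foreground with verbose logging,
# pointing it explicitly at the configuration file to be tested
slurmctld -D -vvv -f /etc/slurm/slurm.conf
The same -f option works for slurmd when debugging a compute node.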
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test
or productive phase. Currently I am still trying to find out where to
create which files and directories, on the host or in a network directory.
I'm a little confused about the description in the manpage o
Arghh,
On 10/10/2017 10:32 AM, Marcus Wagner wrote:
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test
or productive phase. Currently I am still trying to find out where to
create which files and directories, on the host or in a network
directory.
I'm a
Hello,
I have a problem with 1 node in our cluster. It is configured exactly like all the other
nodes (200 GB of temporary storage).
Here is what I have in slurm.conf:
# COMPUTES
TmpFS=/local/scratch
# NODES
GresTypes=disk,gpu
ReturnToService=2
NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0 TmpDisk=2
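To compare what the controller records for the node with what is actually mounted, something like the following might help (the node name is a placeholder):
# what the controller currently knows about the node
scontrol show node nodeXX | grep -E 'TmpDisk|Gres|Reason'
# actual size of the scratch file system, in MB (the unit TmpDisk uses)
df -m /local/scratch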
Thanks!
I'm probably missing something basic, but I don't see any difference after
applying the changes you suggest - the signals still do not seem to take
effect until after the grace time is over.
Could it be something wrong with how my partitions are defined?
PartitionName=cheap Nodes=A
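One way to rule the batch step in or out, assuming the job ID is at hand: send the signal to the batch shell explicitly and see whether the trap fires right away.
# deliver SIGTERM to the batch script itself rather than only to the job steps;
# <jobid> is a placeholder
scancel --signal=TERM --batch <jobid>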
Hi Véronique,
Did you check the result of slurmd -C on tars-XXX?
Regards,
Pierre-Marie Le Biot
From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
Sent: Tuesday, October 10, 2017 12:02 PM
To: slurm-dev
Subject: [slurm-dev] Node always going to DRAIN state with reason=Low TmpDisk
Hel
Hello Pierre-Marie,
First, thank you for your hint.
I just tried.
>slurmd -C
NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6
ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
UpTime=0-20:50:54
The value for TmpDisk is erroneous. I do not know what can be the cause of this
si
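If I remember correctly, slurmd derives TmpDisk from the size of the file system that TmpFS points to, so comparing the two candidate mounts directly may show where the 500 comes from:
# sizes in MB, the unit slurmd reports for TmpDisk
df -m /tmp /local/scratch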
Hi,
see the man page for slurm.conf:
TmpFS
Fully qualified pathname of the file system available to user jobs for
temporary storage. This parameter is used in
establishing a node's TmpDisk space. The default value is "/tmp".
So it is using /tmp. You need to change that parameter to /local
Hello Uwe,
This is already done. Please have a look at my first email. In slurm.conf I
have:
# COMPUTES
TmpFS=/local/scratch
Regards,
Véronique
--
Véronique Legrand
IT engineer – scientific calculation & software development
https://research.pasteur.fr/en/member/veronique-legrand/
Cluster
Véronique,
This is not what I expected; I was thinking slurmd -C would return TmpDisk=204000,
or more probably 129186 as seen in the slurmctld log.
I suppose that you already checked the slurmd logs on tars-XXX?
Regards,
Pierre-Marie Le Biot
From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
S
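In case it helps, a quick way to pull the relevant lines out of the node's log (the log path is an assumption; 'scontrol show config | grep SlurmdLogFile' shows the real location):
grep -i 'tmpdisk' /var/log/slurm/slurmd.log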
writes:
> Thanks!
>
> I'm probably missing something basic, but I don't see any difference by
> applying the changes you
> suggest - the signals does still not seem to be effectuated until after the
> grace time is over.
I could remember the details wrong. You could write a shell script that
Pierre-Marie,
Here is what I have in slurmd.log on tars-XXX
-sh-4.1$ sudo cat slurmd.log
2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message aggregation
enabled: WindowMsgs=24, WindowTime=200
2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency setting
not configu
I think Uwe was on the right track.
It looks to me like the problem node is somehow thinking
TmpFS=/tmp rather than /local/scratch.
That seems to be consistent with what is being reported
(TmpDisk=500).
I would check the slurm.conf/scontrol show config output
on the problem node and confi
Here is what I get:
-sh-4.1$ scontrol show config|grep TmpFS
TmpFS = /local/scratch
Véronique
--
Véronique Legrand
IT engineer – scientific calculation & software development
https://research.pasteur.fr/en/member/veronique-legrand/
Cluster and computing group
IT department
Inst
Véronique,
So that’s the culprit:
2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1
Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186 Uptime=74
CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
For a reason you have to determine, when slurmd starts on
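A few checks on the node, as a sketch (paths are assumptions; adjust to your installation):
# is the node reading the same slurm.conf as the controller? compare checksums
md5sum /etc/slurm/slurm.conf
# what does slurmd report when pointed explicitly at that file?
slurmd -C -f /etc/slurm/slurm.conf
# actual size of the scratch area, in MB
df -m /local/scratch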
Hi Marcus,
Marcus Wagner writes:
> Hello, everyone.
>
> I'm also fairly new to slurm, still in a conceptual rather than a test or
> productive phase. Currently I am still trying to find out where to create
> which
> files and directories, on the host or in a network directory.
> I'm a little c
Thx Loris!
On 10/11/2017 08:17 AM, Loris Bennett wrote:
Hi Marcus,
Marcus Wagner writes:
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test or
productive phase. Currently I am still trying to find out where to create which
files and directories, on the
New thread since I have narrowed down the problem.
Consider the script:
**
#!/bin/bash
#SBATCH -p cheap
#SBATCH -n 32
#SBATCH -t 12:00:00
sig_term()
{
echo "function sig_term called. Exiting"
echo 'sig_term' > slask_term
echo $(date) >> slask_term
}
# associate the func
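For reference, a self-contained sketch of how such a handler is typically wired up; the trap line and the wait loop are assumptions about the truncated part, not the original script:
#!/bin/bash
#SBATCH -p cheap
#SBATCH -n 32
#SBATCH -t 12:00:00

sig_term()
{
echo "function sig_term called. Exiting"
echo 'sig_term' > slask_term
echo "$(date)" >> slask_term
}

# associate the function with SIGTERM
trap 'sig_term' TERM

# keep the batch shell in 'wait' so the trap can fire as soon as the signal arrives;
# a signal interrupts 'wait' immediately, whereas a foreground 'sleep' would not be
# interrupted until it finishes
sleep 600 &
wait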