Hello,
sorry for answering late…
Thank you to everybody!
I will try to increase NFS performance.
Best regards,
Brigitte Selch
From: John Hearns [mailto:hear...@googlemail.com]
Sent: Thursday, September 28, 2017 4:45 PM
To: slurm-dev
Subject: [slurm-dev] Re: MPI-Jobs on cluster - how to
Hello everybody,
On 10/10/17 8:25 AM, Marcus Wagner wrote:
For a quick view, manually starting the controller
slurmctld -D -vvv
Good advice; for beginners (or a tired help-seeker) a hint about "-f" might be necessary.
Without the current configuration running the central management daemon
is I
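For reference, a minimal sketch of such a manual start (the configuration path is an assumption; adjust it to your installation):
# run the controller in the foreground with verbose logging,
# pointing it explicitly at the configuration file to be tested
slurmctld -D -vvv -f /etc/slurm/slurm.conf
The same -f option works for slurmd when debugging a compute node.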
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test
or productive phase. Currently I am still trying to find out where to
create which files and directories, on the host or in a network directory.
I'm a little confused about the description in the manpage o
Arghh,
On 10/10/2017 10:32 AM, Marcus Wagner wrote:
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test
or productive phase. Currently I am still trying to find out where to
create which files and directories, on the host or in a network
directory.
I'm a
Hello,
I have a problem with 1 node in our cluster. It is configured exactly like all the other
nodes (200 GB of temporary storage).
Here is what I have in slurm.conf:
# COMPUTES
TmpFS=/local/scratch
# NODES
GresTypes=disk,gpu
ReturnToService=2
NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0 TmpDisk=2
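To compare what the controller records for the node with what is actually mounted, something like the following might help (the node name is a placeholder):
# what the controller currently knows about the node
scontrol show node nodeXX | grep -E 'TmpDisk|Gres|Reason'
# actual size of the scratch file system, in MB (the unit TmpDisk uses)
df -m /local/scratch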
Thanks!
I'm probably missing something basic, but I don't see any difference after
applying the changes you suggest - the signals still do not seem to take
effect until after the grace time is over.
Could it be something wrong with how my partitions are defined?
PartitionName=cheap Nodes=A
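One way to rule the batch step in or out, assuming the job ID is at hand: send the signal to the batch shell explicitly and see whether the trap fires right away.
# deliver SIGTERM to the batch script itself rather than only to the job steps;
# <jobid> is a placeholder
scancel --signal=TERM --batch <jobid>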
Hi Véronique,
Did you check the result of slurmd -C on tars-XXX?
Regards,
Pierre-Marie Le Biot
From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
Sent: Tuesday, October 10, 2017 12:02 PM
To: slurm-dev
Subject: [slurm-dev] Node always going to DRAIN state with reason=Low TmpDisk
Hel
Hello Pierre-Marie,
First, thank you for your hint.
I just tried.
>slurmd -C
NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6
ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
UpTime=0-20:50:54
The value for TmpDisk is erroneous. I do not know what can be the cause of this
si
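If I remember correctly, slurmd derives TmpDisk from the size of the file system that TmpFS points to, so comparing the two candidate mounts directly may show where the 500 comes from:
# sizes in MB, the unit slurmd reports for TmpDisk
df -m /tmp /local/scratch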
Hi,
see the man page for slurm.conf:
TmpFS
Fully qualified pathname of the file system available to user jobs for
temporary storage. This parameter is used in
establishing a node's TmpDisk space. The default value is "/tmp".
So it is using /tmp. You need to change that parameter to /local
Hello Uwe,
This is already done. Please have a look at my first email. In slurm.conf I
have:
# COMPUTES
TmpFS=/local/scratch
Regards,
Véronique
--
Véronique Legrand
IT engineer – scientific calculation & software development
https://research.pasteur.fr/en/member/veronique-legrand/
Cluster
Véronique,
This is not what I expected; I was thinking slurmd -C would return TmpDisk=204000,
or more probably 129186 as seen in the slurmctld log.
I suppose that you already checked the slurmd logs on tars-XXX?
Regards,
Pierre-Marie Le Biot
From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
S
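In case it helps, a quick way to pull the relevant lines out of the node's log (the log path is an assumption; 'scontrol show config | grep SlurmdLogFile' shows the real location):
grep -i 'tmpdisk' /var/log/slurm/slurmd.log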
writes:
> Thanks!
>
> I'm probably missing something basic, but I don't see any difference by
> applying the changes you
> suggest - the signals does still not seem to be effectuated until after the
> grace time is over.
I could remember the details wrong. You could write a shell script that
Pierre-Marie,
Here is what I have in slurmd.log on tars-XXX
-sh-4.1$ sudo cat slurmd.log
2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message aggregation
enabled: WindowMsgs=24, WindowTime=200
2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency setting
not configu
I think Uwe was on the right track.
It looks to me like the problem node is somehow thinking
TmpFS=/tmp rather than /local/scratch.
That seems to be consistent with what is being reported
(TmpDisk=500).
I would check the slurm.conf/scontrol show config output
on the problem node and confi
Here is what I get:
-sh-4.1$ scontrol show config|grep TmpFS
TmpFS = /local/scratch
Véronique
--
Véronique Legrand
IT engineer – scientific calculation & software development
https://research.pasteur.fr/en/member/veronique-legrand/
Cluster and computing group
IT department
Inst
Véronique,
So that’s the culprit:
2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1
Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186 Uptime=74
CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
For a reason you have to determine, when slurmd starts on
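A few checks on the node, as a sketch (paths are assumptions; adjust to your installation):
# is the node reading the same slurm.conf as the controller? compare checksums
md5sum /etc/slurm/slurm.conf
# what does slurmd report when pointed explicitly at that file?
slurmd -C -f /etc/slurm/slurm.conf
# actual size of the scratch area, in MB
df -m /local/scratch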
Hi Marcus,
Marcus Wagner writes:
> Hello, everyone.
>
> I'm also fairly new to slurm, still in a conceptual rather than a test or
> productive phase. Currently I am still trying to find out where to create
> which
> files and directories, on the host or in a network directory.
> I'm a little c
Thx Loris!
On 10/11/2017 08:17 AM, Loris Bennett wrote:
Hi Marcus,
Marcus Wagner writes:
Hello, everyone.
I'm also fairly new to slurm, still in a conceptual rather than a test or
productive phase. Currently I am still trying to find out where to create which
files and directories, on the
New thread since I have narrowed down the problem.
Consider the script:
**
#!/bin/bash
#SBATCH -p cheap
#SBATCH -n 32
#SBATCH -t 12:00:00
sig_term()
{
echo "function sig_term called. Exiting"
echo 'sig_term' > slask_term
echo $(date) >> slask_term
}
# associate the func
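For reference, a self-contained sketch of how such a handler is typically wired up; the trap line and the wait loop are assumptions about the truncated part, not the original script:
#!/bin/bash
#SBATCH -p cheap
#SBATCH -n 32
#SBATCH -t 12:00:00

sig_term()
{
echo "function sig_term called. Exiting"
echo 'sig_term' > slask_term
echo "$(date)" >> slask_term
}

# associate the function with SIGTERM
trap 'sig_term' TERM

# keep the batch shell in 'wait' so the trap can fire as soon as the signal arrives;
# a signal interrupts 'wait' immediately, whereas a foreground 'sleep' would not be
# interrupted until it finishes
sleep 600 &
wait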