#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
env
Jeff (ChaoFeng Zhang, 张超锋) PMP(r)
zhang...@lenovo.com
HPC&AI | Cloud Software Architect (+86) - 18116117420
Software solution development(+8621) - 20590223
Shanghai, China
On 14 September 2017 at 19:41, Chaofeng Zhang <zhang...@lenovo.com> wrote:
On node A, I submit a job file using the sbatch command and the job runs on
node B; you will find that the output is not the env of node B, but the env
of node A.
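A minimal way to reproduce this (a sketch; the node names nodeA and nodeB and the file name env.sh are assumptions, not from the original thread):

```shell
#!/bin/bash
# env.sh - hypothetical job file that just dumps its environment
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
env
```

Submitting from node A with `sbatch --nodelist=nodeB env.sh` and inspecting the resulting slurm-<jobid>.out shows node A's environment, because by default sbatch propagates the submitting shell's environment to the job.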
#!/bin/bash
#SBATCH --export=NONE

Adding #SBATCH --export=NONE to the job file solved my problem, thanks.
-----Original Message-----
From: Dr. Thomas Orgis [mailto:thomas.or...@uni-hamburg.de]
Sent: Friday, September 15, 2017 3:11 PM
To: slurm-dev
Subject: [slurm-dev] Re: why the env is the env of submit node, not the env of
job running node.
Hi Zhang,
I met this error when using Slurm:
srun: error: _server_read: fd 18 error reading header: Connection reset by peer
srun: error: step_launch_notify_io_failure: aborting, io error with slurmstepd
on node 1
0: slurmstepd: error: execve(): singularity: No such file or directory
srun: error: master: ta
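The execve() error above means slurmstepd could not find the `singularity` binary on the compute node. One common workaround (a sketch under assumptions: the install path /usr/local/bin/singularity and the image name my_image.sif are hypothetical) is to invoke it by absolute path:

```shell
# slurmstepd exec's the command directly on the compute node, so it must
# be resolvable there; an absolute path avoids depending on PATH.
srun -N1 /usr/local/bin/singularity exec my_image.sif hostname
```

Alternatively, ensure the directory containing singularity is on PATH in the environment the job actually runs with (which, per the --export discussion above, may not be the submit node's PATH).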
How can I run multiple jobs on the same GPUs?
I noticed :no_consume can be added to the Gres; with it I can run multiple
jobs, but no CUDA_VISIBLE_DEVICES can be found in the job env.
slurm.conf:
NodeName=node1 Gres=gpu:1 CPUs=4 State=UNKNOWN
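With the :no_consume flag mentioned above, the node line might look like this (a sketch; the exact placement of the flag is an assumption based on this thread, so check the slurm.conf documentation for your Slurm version):

```shell
# slurm.conf fragment: mark the gpu GRES as non-consumable so that
# multiple jobs can be allocated it concurrently.
NodeName=node1 Gres=gpu:no_consume:1 CPUs=4 State=UNKNOWN
```

Note that with a non-consumable GRES, Slurm no longer assigns specific devices to each job, which would explain why CUDA_VISIBLE_DEVICES is absent from the job environment.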
Thanks.
The below worked for CPU: with OverSubscribe, I can have more than 4 processes
in the running state, but if I add #SBATCH --gres=gpu:2 to the job file, only
1 process is running and the others are pending.
OverSubscribe seems to apply only to the CPU resource; can it also apply to GPUs?
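For reference, a partition-level OverSubscribe setting might look like this (a sketch; the partition and node names are assumptions):

```shell
# slurm.conf fragment: OverSubscribe lets up to 4 jobs share each
# allocatable CPU resource on the partition. GRES such as gpu remain
# consumable, so a job holding --gres=gpu:2 blocks others wanting
# those devices regardless of this setting.
PartitionName=debug Nodes=node1 OverSubscribe=FORCE:4 Default=YES State=UP
```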
challenge first. Can you run multiple GPU jobs from the command line,
without Slurm? GPU sharing between multiple independent tasks has been tough.
Thank you,
Doug
On Fri, Oct 20, 2017 at 12:34 AM, Chaofeng Zhang <zhang...@lenovo.com> wrote:
First, the GPU is already set to shared mode.
Hi Guys
When we submit a Slurm job on the login node, the job uses the env of the login
node on the compute nodes, so we add #SBATCH --export=NONE to the job file;
then the job uses the env of the compute node.
We want to get real-time output in the job's out file, so we use this command
to submit the job file:
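The command itself is cut off above; one common approach for real-time output (an assumption, not necessarily the poster's command; job.sh is a hypothetical job script) is to launch through srun with unbuffered I/O:

```shell
# srun streams task output back as it is produced; --unbuffered (-u)
# disables line buffering so output appears immediately rather than
# in flushed chunks.
srun --unbuffered bash job.sh
```

With sbatch, by contrast, output goes to the slurm-<jobid>.out file as the tasks flush it, which is why interactive-style streaming usually goes through srun.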