[slurm-dev] Why is the env the env of the submit node, not the env of the job-running node?

2017-09-14 Thread Chaofeng Zhang
--ntasks-per-node=1 #SBATCH --cpus-per-task=1 env Jeff (ChaoFeng Zhang, 张超锋) PMP® zhang...@lenovo.com HPC&AI | Cloud Software Architect (+86) - 18116117420 Software solution development (+8621) - 20590223 Shanghai, China
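The preview above garbles the attached job script; a minimal reconstruction from the directives it mentions (the exact script is an assumption) would be:

    #!/bin/bash
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    env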

[slurm-dev] Re: Why is the env the env of the submit node, not the env of the job-running node?

2017-09-14 Thread Chaofeng Zhang
of the job-running node. On 14 September 2017 at 19:41, Chaofeng Zhang <zhang...@lenovo.com> wrote: On node A, I submit a job file using the sbatch command and the job runs on node B, but you will find that the output is the env of node A, not the env of node B. #!/bin/bash #SBATCH
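For context, sbatch propagates the submitting shell's environment into the job by default (the equivalent of --export=ALL), which is why the env printed on node B matches node A. A hypothetical reproduction, with the node name and script name assumed (print-env.sh being the script reconstructed in the first message):

    # run on node A; force the job onto node B
    sbatch --nodelist=nodeB print-env.sh
    # the resulting slurm-<jobid>.out shows node A's environment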

[slurm-dev] Re: Why is the env the env of the submit node, not the env of the job-running node?

2017-09-15 Thread Chaofeng Zhang
#SBATCH --export=NONE solved my problem, thanks. -----Original Message----- From: Dr. Thomas Orgis <thomas.or...@uni-hamburg.de> Sent: Friday, September 15, 2017 3:11 PM To: slurm-dev Subject: [slurm-dev] Re: Why is the env the env of the submit node, not the env of the job-running node? Hi Zha
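A minimal sketch of the corrected job file, assuming the same directives as the original post:

    #!/bin/bash
    #SBATCH --export=NONE
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    # with --export=NONE the submit node's environment is not propagated,
    # so env now reflects the compute node
    env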

[slurm-dev] srun: error: _server_read: fd 18 error reading header: Connection reset by peer

2017-10-17 Thread Chaofeng Zhang
I hit this error when using Slurm. srun: error: _server_read: fd 18 error reading header: Connection reset by peer srun: error: step_launch_notify_io_failure: aborting, io error with slurmstepd on node 1 0: slurmstepd: error: execve(): singularity: No such file or directory srun: error: master: ta
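The execve() line points at the likely root cause: the singularity binary is missing (or not on PATH) on the compute node, and the connection-reset errors are fallout from the failed launch. A hedged sanity check, with the node name assumed:

    # confirm singularity resolves on the node reported in the error
    srun --nodelist=node1 which singularity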

[slurm-dev] How can I run multiple jobs on one GPU?

2017-10-20 Thread Chaofeng Zhang
multiple jobs on the same GPUs? I noticed that :no_consume can be added to the Gres; with it I can run multiple jobs, but then no CUDA_VISIBLE_DEVICES can be found in the job env. slurm.conf: NodeName=node1 Gres=gpu:1 CPUs=4 State=UNKNOWN Thanks. Jeff (ChaoFeng Zhang, 张超锋) PMP® zhang
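A sketch of the configuration being discussed, combining the snippet in the post with the no_consume flag the author mentions (GresTypes and the gres.conf entry are assumptions about the rest of the setup):

    # slurm.conf
    GresTypes=gpu
    NodeName=node1 Gres=gpu:no_consume:1 CPUs=4 State=UNKNOWN

    # gres.conf on node1; the File= binding to a device is what normally
    # lets Slurm populate CUDA_VISIBLE_DEVICES for each job
    Name=gpu File=/dev/nvidia0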

[slurm-dev] OverSubscribe works for the CPU resource; can it also be used for GPUs?

2017-10-20 Thread Chaofeng Zhang
The setup below works for CPUs: with OverSubscribe, I can have more than 4 processes in the running state, but if I add #SBATCH --gres=gpu:2 to the job file, only 1 process is running and the others are pending. OverSubscribe seems to apply just to the CPU resource; whether it c
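For reference, OverSubscribe is a partition-level setting that governs sharing of CPUs and nodes; a typical line looks like the sketch below (partition and node names are assumptions). GRES such as GPUs are tracked as consumable resources, so a job requesting GPUs that are already allocated pends regardless of this setting, which matches the behavior described:

    PartitionName=debug Nodes=node1 Default=YES OverSubscribe=FORCE:4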

[slurm-dev] Re: How can I run multiple jobs on one GPU?

2017-10-21 Thread Chaofeng Zhang
challenge first. Can you run multiple GPU jobs from the command line without Slurm? GPU sharing between multiple independent tasks has been tough. Thank you, Doug On Fri, Oct 20, 2017 at 12:34 AM, Chaofeng Zhang <zhang...@lenovo.com> wrote: First, the GPU is already set to shared mode.
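Doug's suggestion, sketched outside Slurm (gpu_app is a hypothetical CUDA program): launch two processes against the same device and see whether they can share it at all:

    CUDA_VISIBLE_DEVICES=0 ./gpu_app &
    CUDA_VISIBLE_DEVICES=0 ./gpu_app &
    wait    # if either process fails here, the sharing problem is below Slurm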

[slurm-dev] How to get real-time output in the job output file

2017-10-26 Thread Chaofeng Zhang
Hi guys, when we submit a Slurm job on the login node, the job uses the env of the login node on the compute nodes, so we add #SBATCH --export=NONE to the job file; then the job uses the env of the compute node. We want to get real-time output in the job output file, so we use this command to submit the job file:
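The submit command itself is truncated above. One common way to get line-by-line output (not necessarily what the author used) is to force line buffering inside the job script; long_running_app is a placeholder:

    #!/bin/bash
    #SBATCH --export=NONE
    #SBATCH --output=job.out
    # stdbuf makes stdout/stderr line-buffered so job.out updates as lines
    # are produced instead of when the application's buffer flushes
    stdbuf -oL -eL ./long_running_app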