[slurm-dev] Why is the env the env of the submit node, not the env of the job-running node?

2017-09-14 Thread Chaofeng Zhang
--ntasks-per-node=1 #SBATCH --cpus-per-task=1 env Jeff (ChaoFeng Zhang, 张超锋) PMP® zhang...@lenovo.com HPC&AI | Cloud Software Architect (+86) - 18116117420 Software solution development (+8621) - 20590223 Shanghai, China
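The preview above garbles the attached job script; a minimal reconstruction from the directives it mentions (the exact script is an assumption) would be:

    #!/bin/bash
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    env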

[slurm-dev] Re: Why is the env the env of the submit node, not the env of the job-running node?

2017-09-14 Thread Chaofeng Zhang
of the job-running node. On 14 September 2017 at 19:41, Chaofeng Zhang <zhang...@lenovo.com> wrote: On node A, I submit a job file using the sbatch command and the job runs on node B, but you will find that the output is the env of node A, not the env of node B. #!/bin/bash #SBATCH
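For context, sbatch propagates the submitting shell's environment into the job by default (the equivalent of --export=ALL), which is why the env printed on node B matches node A. A hypothetical reproduction, with the node name and script name assumed (print-env.sh being the script reconstructed in the first message):

    # run on node A; force the job onto node B
    sbatch --nodelist=nodeB print-env.sh
    # the resulting slurm-<jobid>.out shows node A's environment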

[slurm-dev] Re: Why is the env the env of the submit node, not the env of the job-running node?

2017-09-15 Thread Chaofeng Zhang
#SBATCH --export=NONE solved my problem, thanks. -----Original Message----- From: Dr. Thomas Orgis <thomas.or...@uni-hamburg.de> Sent: Friday, September 15, 2017 3:11 PM To: slurm-dev Subject: [slurm-dev] Re: Why is the env the env of the submit node, not the env of the job-running node? Hi Zha
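A minimal sketch of the corrected job file, assuming the same directives as the original post:

    #!/bin/bash
    #SBATCH --export=NONE
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    # with --export=NONE the submit node's environment is not propagated,
    # so env now reflects the compute node
    env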

[slurm-dev] srun: error: _server_read: fd 18 error reading header: Connection reset by peer

2017-10-17 Thread Chaofeng Zhang
I hit this error when using Slurm. srun: error: _server_read: fd 18 error reading header: Connection reset by peer srun: error: step_launch_notify_io_failure: aborting, io error with slurmstepd on node 1 0: slurmstepd: error: execve(): singularity: No such file or directory srun: error: master: ta
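The execve() line points at the likely root cause: the singularity binary is missing (or not on PATH) on the compute node, and the connection-reset errors are fallout from the failed launch. A hedged sanity check, with the node name assumed:

    # confirm singularity resolves on the node reported in the error
    srun --nodelist=node1 which singularity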

[slurm-dev] How can I run multiple jobs on one GPU?

2017-10-20 Thread Chaofeng Zhang
multiple jobs on the same GPUs? I noticed that :no_consume can be added to the Gres; with it I can run multiple jobs, but then no CUDA_VISIBLE_DEVICES can be found in the job env. slurm.conf: NodeName=node1 Gres=gpu:1 CPUs=4 State=UNKNOWN Thanks. Jeff (ChaoFeng Zhang, 张超锋) PMP® zhang
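A sketch of the configuration being discussed, combining the snippet in the post with the no_consume flag the author mentions (GresTypes and the gres.conf entry are assumptions about the rest of the setup):

    # slurm.conf
    GresTypes=gpu
    NodeName=node1 Gres=gpu:no_consume:1 CPUs=4 State=UNKNOWN

    # gres.conf on node1; the File= binding to a device is what normally
    # lets Slurm populate CUDA_VISIBLE_DEVICES for each job
    Name=gpu File=/dev/nvidia0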

[slurm-dev] OverSubscribe works for the CPU resource; can it also be used for GPUs?

2017-10-20 Thread Chaofeng Zhang
The setup below works for CPUs: with OverSubscribe, I can have more than 4 processes in the running state, but if I add #SBATCH --gres=gpu:2 to the job file, only 1 process is running and the others are pending. OverSubscribe seems to apply just to the CPU resource; whether it c
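For reference, OverSubscribe is a partition-level setting that governs sharing of CPUs and nodes; a typical line looks like the sketch below (partition and node names are assumptions). GRES such as GPUs are tracked as consumable resources, so a job requesting GPUs that are already allocated pends regardless of this setting, which matches the behavior described:

    PartitionName=debug Nodes=node1 Default=YES OverSubscribe=FORCE:4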

[slurm-dev] Re: How can I run multiple jobs on one GPU?

2017-10-21 Thread Chaofeng Zhang
challenge first. Can you run multiple GPU jobs from the command line without Slurm? GPU sharing between multiple independent tasks has been tough. Thank you, Doug On Fri, Oct 20, 2017 at 12:34 AM, Chaofeng Zhang <zhang...@lenovo.com> wrote: First, the GPU is already set to shared mode.
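Doug's suggestion, sketched outside Slurm (gpu_app is a hypothetical CUDA program): launch two processes against the same device and see whether they can share it at all:

    CUDA_VISIBLE_DEVICES=0 ./gpu_app &
    CUDA_VISIBLE_DEVICES=0 ./gpu_app &
    wait    # if either process fails here, the sharing problem is below Slurm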

[slurm-dev] How to get real-time output in the job output file

2017-10-26 Thread Chaofeng Zhang
Hi guys, when we submit a Slurm job on the login node, the job uses the env of the login node on the compute nodes, so we add #SBATCH --export=NONE to the job file; then the job uses the env of the compute node. We want to get real-time output in the job output file, so we use this command to submit the job file:
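The submit command itself is truncated above. One common way to get line-by-line output (not necessarily what the author used) is to force line buffering inside the job script; long_running_app is a placeholder:

    #!/bin/bash
    #SBATCH --export=NONE
    #SBATCH --output=job.out
    # stdbuf makes stdout/stderr line-buffered so job.out updates as lines
    # are produced instead of when the application's buffer flushes
    stdbuf -oL -eL ./long_running_app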