At some point when we were experimenting with MIG, I was entirely
frustrated trying to get it to work until I finally removed the autodetect
from gres.conf and explicitly listed the devices instead. THEN it worked. I
think you can find the list of files that are the device files using nvidia
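For anyone else stuck there, a minimal sketch of what "explicitly listed" can
look like in gres.conf (the hostname, type, and device paths below are
illustrative, not from my cluster; MIG instances each need their own entry
matching your actual layout):

    # gres.conf: explicit device list instead of AutoDetect=nvml
    NodeName=gpunode01 Name=gpu Type=a100 File=/dev/nvidia[0-3]

with a matching Gres=gpu:a100:4 on that node's line in slurm.conf.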
Dear Slurm Mailing List,
I am experiencing a problem which affects our cluster and for which I am
completely out of ideas by now, so I would like to ask the community for
hints or ideas.
We run a partition on our cluster containing multiple nodes with Nvidia
A100 GPUs (40GB), which we have s
I found that this is actually a known bug in Slurm so I'll note it here in case
anyone comes across this thread in the future:
https://bugs.schedmd.com/show_bug.cgi?id=10598
Steve
From: slurm-users on behalf of Wilson,
Steven M
Sent: Tuesday, July 18, 202
Hi Hermann,
Count doesn't make a difference, but I noticed that when I reconfigure
Slurm and do reloads afterwards, the error "gpu count lower than
configured" no longer appears - so maybe a reconfigure is simply needed
after reloading slurmctld - or maybe it doesn't show the error an
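For the record, the sequence I'm referring to is roughly this (standard
commands; the systemd unit name may differ on your distro):

    # after changing slurm.conf / gres.conf
    systemctl restart slurmctld   # on the controller
    scontrol reconfigure          # have the daemons re-read the config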
Hello everyone,
Has anyone here ever run an MCNP6.2 parallel job via the Slurm scheduler?
I am looking for a simple test job to test my software compilation.
Thank you,
Vlad Ozeryan
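Not an MCNP expert, but as a starting point, here is a minimal MPI-style
batch script sketch; mcnp6.mpi and test.inp are assumed names, so substitute
whatever your build and input deck are actually called:

    #!/bin/bash
    #SBATCH --job-name=mcnp-test
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=8
    #SBATCH --time=00:30:00

    # one MPI rank per Slurm task; i= and o= are MCNP's input/output keywords
    srun mcnp6.mpi i=test.inp o=test.out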
On 19/07/2023 15:04, Jan Andersen wrote:
Hmm, OK - but that is the only nvml.h I can find, as shown by the find
command. I downloaded the official NVIDIA-Linux-x86_64-535.54.03.run and
ran it successfully; do I need to install something else besides? A
Google search for 'CUDA SDK' leads directly to NVIDIA's page:
https://docs.nvidia.co
In case you're developing the plugin in C and not Lua: behind the scenes, the
Lua mechanism concatenates all log_user() strings into a single variable
(user_msg). When the Lua code completes, the C code sets the *err_msg argument
of the job_submit()/job_modify() function to that string, then
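For the C route, a bare-bones sketch (assumes it is built in the Slurm source
tree like the bundled job_submit plugins; the time-limit check and the message
text are purely illustrative):

    /* job_submit_require_time.c - sketch, not a drop-in plugin */
    #include "slurm/slurm_errno.h"
    #include "src/common/xstring.h"
    #include "src/slurmctld/slurmctld.h"

    const char plugin_name[] = "require_time job submit plugin (sketch)";
    const char plugin_type[] = "job_submit/require_time";
    const uint32_t plugin_version = SLURM_VERSION_NUMBER;

    extern int job_submit(job_desc_msg_t *job_desc, uint32_t submit_uid,
                          char **err_msg)
    {
        if (job_desc->time_limit == NO_VAL) {
            /* whatever lands in *err_msg is printed on the user's
             * terminal by sbatch/srun/salloc when the job is rejected */
            *err_msg = xstrdup("jobs must request a time limit (--time=...)");
            return ESLURM_INVALID_TIME_LIMIT;
        }
        return SLURM_SUCCESS;
    }

    extern int job_modify(job_desc_msg_t *job_desc, job_record_t *job_ptr,
                          uint32_t submit_uid, char **err_msg)
    {
        return SLURM_SUCCESS;
    }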
Worth a try, but the documentation says that by default the count is the same
as the number of files specified... so it should automatically be 1.
If you want to stop the node from going to INVAL, you can always set
config_overrides in slurm.conf. That will tell the node what it has, instead
of w
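A sketch of what that looks like (node line illustrative):

    # slurm.conf: trust the configured values instead of what slurmd detects
    SlurmdParameters=config_overrides
    NodeName=gpunode01 CPUs=64 RealMemory=512000 Gres=gpu:1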
Hi Xaver,
I think you are missing the "Count=..." part in gres.conf
It should read
NodeName=NName Name=gpu File=/dev/tty0 Count=1
in your case.
Regards,
Hermann
On 7/19/23 14:19, Xaver Stiensmeier wrote:
Okay,
thanks to S. Zhang I was able to figure out why nothing changed. While I
did restart slurmctld at the beginning of my tests, I didn't do so later.
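As a quick way to verify a gres.conf line like the one Hermann suggested,
these standard commands show what Slurm actually picked up (NName as in the
example above):

    slurmd -G                                # GRES configuration slurmd parsed on the node
    scontrol show node NName | grep -i Gres  # what the controller believes the node has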
Hmm, OK - but that is the only nvml.h I can find, as shown by the find
command. I downloaded the official NVIDIA-Linux-x86_64-535.54.03.run and
ran it successfully; do I need to install something else besides? A
Google search for 'CUDA SDK' leads directly to NVIDIA's page:
https://docs.nvidia.co
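In case it saves someone a search: the .run driver installer provides
libnvidia-ml.so but not the nvml.h header; the header ships with the CUDA
toolkit. Something along these lines (package name and paths vary with the
CUDA version, so treat them as an example, not gospel):

    apt-get install cuda-nvml-dev-12-2        # from NVIDIA's CUDA repo; name tracks the CUDA version
    find /usr/local/cuda* -name nvml.h        # confirm where the header landed
    ./configure --with-nvml=/usr/local/cuda   # point Slurm's configure at that tree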
Oops, I found my error: I forgot to remove JobCompHost. I found it after
reading this:
https://bugs.schedmd.com/show_bug.cgi?id=2322#c5
Sorry for the noise.
On 19/07/2023 14:51, Gérard Henry (AMU) wrote:
Hello all,
Is it possible to have this configuration? I installed Slurm on Ubuntu
20 LTS, but slurmctld refuses to start.
Hello all,
Is it possible to have this configuration? I installed Slurm on Ubuntu
20 LTS, but slurmctld refuses to start with messages:
[2023-07-19T14:37:59.563] Job completion MYSQL plugin loaded
[2023-07-19T14:37:59.563] debug: /var/log/slurm/jobcomp doesn't look
like a database name using
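For reference, the two jobcomp setups that message mixes up, as a sketch
(DB and host names illustrative). JobCompLoc is a file path for
jobcomp/filetxt but a database name for jobcomp/mysql, which is why a
leftover JobCompHost from one mode confuses the other:

    # file-based job completion logging
    JobCompType=jobcomp/filetxt
    JobCompLoc=/var/log/slurm/jobcomp

    # ...or MySQL-based; JobCompLoc is a DB name here
    #JobCompType=jobcomp/mysql
    #JobCompLoc=slurm_jobcomp_db
    #JobCompHost=dbserver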
Hello Lorenzo,
Lorenzo Bosio writes:
> I'm developing a job submit plugin to check if some conditions are met before
> a job runs.
> I'd need a way to notify the user about the plugin actions (i.e. why its job
> was killed and what to do), but after a lot of research I could only write to
>
Hi Lorenzo,
On 7/19/23 14:22, Lorenzo Bosio wrote:
> I'm developing a job submit plugin to check if some conditions are met
> before a job runs.
> I'd need a way to notify the user about the plugin actions (i.e. why its
> job was killed and what to do), but after a lot of research I could only
Hello everyone,
I'm developing a job submit plugin to check if some conditions are met
before a job runs.
I'd need a way to notify the user about the plugin actions (i.e. why its
job was killed and what to do), but after a lot of research I could
only write to logs and not the user shell.
The
Okay,
thanks to S. Zhang I was able to figure out why nothing changed. While I
did restart slurmctld at the beginning of my tests, I didn't do so
later, because I felt like it was unnecessary, but it is right there in
the fourth line of the log that this is needed. Somehow I misread it and
thoug
On 19/07/2023 11:47, Jan Andersen wrote:
I'm trying to build slurm with nvml support, but configure doesn't find it:
root@zorn:~/slurm-23.02.3# ./configure --with-nvml
...
checking for hwloc installation... /usr
checking for nvml.h... no
checking for nvmlInit in -lnvidia-ml... yes
configure: error: unable to locate libnvidia-ml.so and/or nvml.h
I'm trying to build slurm with nvml support, but configure doesn't find it:
root@zorn:~/slurm-23.02.3# ./configure --with-nvml
...
checking for hwloc installation... /usr
checking for nvml.h... no
checking for nvmlInit in -lnvidia-ml... yes
configure: error: unable to locate libnvidia-ml.so and/or nvml.h
Alright,
I tried a few more things, but I still wasn't able to get past: srun:
error: Unable to allocate resources: Invalid generic resource (gres)
specification.
I should mention that the node I am trying to test GPU with doesn't
really have a GPU, but Rob was so kind as to find out that you do n