[slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
4 billing 5 fsdisk 6 vmem 7 pages 8 gres gpu 1001 gres gpu:k20 1002 gres gpu:1080gtx 1003 Can anyone point out what I am missing? Thanks! Lou -- *Lou Nicotra* IT Systems Engi

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
using my login name. The default account for all users is "slt" Is this the cause of my problems? root@panther02 slurm# getent passwd lnicotra lnicotra:*:1498:1152:Lou Nicotra:/home/lnicotra:/bin/bash If so, how is this resolved as we use multiple servers and there are no local accounts for

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
fig` from the machine > having trouble > On Mon, Dec 3, 2018 at 12:10 PM Lou Nicotra > wrote: > > > > I'm running slurmd version 18.08.0... > > > > It seems that the system recognizes the GPUs after a slurmd restart. I > turned debug to 5, restarted and the

Re: [slurm-users] GRES GPU issues

2018-12-03 Thread Lou Nicotra
wrote: > Is that a lowercase k in k20 specified in the batch script and nodename > and an uppercase K specified in gres.conf? > > On 12/03/2018 09:13 AM, Lou Nicotra wrote: > > Hi All, I have recently set up a slurm cluster with my servers and I'm > running into an issue whi
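
The case question raised in this reply can be checked mechanically. Below is a minimal sketch, assuming the GRES type names have already been copied by hand out of gres.conf and out of the node definition or batch script. The helper name `find_case_mismatches` and the sample values are illustrative only and are not taken from the poster's files.

```python
def find_case_mismatches(gres_conf_types, other_types):
    """Return (gres_conf_name, other_name) pairs that differ only by letter case."""
    mismatches = []
    for g in gres_conf_types:
        for o in other_types:
            # Same spelling once case is ignored, but not byte-for-byte equal.
            if g != o and g.lower() == o.lower():
                mismatches.append((g, o))
    return mismatches


# Hypothetical sample values: "K20" in one file, "k20" in the other.
print(find_case_mismatches(["K20", "1080gtx"], ["k20", "1080gtx"]))
```

Any pair reported here is spelled the same apart from case, which is the situation the reply asks about and is worth ruling out first.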

Re: [slurm-users] GRES GPU issues

2018-12-04 Thread Lou Nicotra
ee anything specifically wrong. The one thing I might try > is backing the software down to a 17.x release series. I recently > tried 18.x and had some issues. I can't say whether it'll be any > different, but you might be exposing an undiagnosed bug in the 18.x > branch > On Mon

Re: [slurm-users] GRES GPU issues

2018-12-04 Thread Lou Nicotra
, 2018 at 9:31 AM Lou Nicotra wrote: > Thanks Michael. I will try 17.x as I also could not see anything wrong > with my settings... Will report back afterwards... > > Lou > > On Tue, Dec 4, 2018 at 9:11 AM Michael Di Domenico > wrote: > >> unfortunately, someone sma

Re: [slurm-users] GRES GPU issues

2018-12-04 Thread Lou Nicotra
Cores=0 > NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=k80 > File=/dev/nvidia1 Cores=1 > NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080gtx File=/dev/nvidia0 > Cores=0 > NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080gtx File=/dev/nvidia1 > Cores=1 > > which ca
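
The `NodeName=tiger[...]` entries quoted above use Slurm's bracketed host-range notation. On a machine with Slurm installed, `scontrol show hostnames` expands such an expression. The sketch below is a simplified stand-in for checking by hand which nodes a gres.conf line covers: it assumes a single bracket group containing zero-padded numbers and ranges, and the function name `expand_hostlist` is hypothetical, not part of Slurm.

```python
import re


def expand_hostlist(expr):
    """Expand a simple 'prefix[a-b,c,...]' expression into a list of host names.

    Handles only one bracket group; anything else is returned unchanged.
    """
    m = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if m is None:
        return [expr]
    prefix, body = m.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            width = len(lo)  # keep the zero padding used in the expression
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
        else:
            hosts.append(prefix + part)
    return hosts


k80_nodes = expand_hostlist("tiger[02-04,06-09,11-14,16-19,21-22]")
gtx_nodes = expand_hostlist("tiger[01,05,10,15,20]")
print(len(k80_nodes), len(gtx_nodes))
# An empty intersection means no node is claimed by both Type= lines.
print(sorted(set(k80_nodes) & set(gtx_nodes)))
```

Expanding both expressions makes it easy to confirm that the two `Type=` groups do not overlap and that together they cover every node intended.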

Re: [slurm-users] GRES GPU issues

2018-12-05 Thread Lou Nicotra
t recent release is 18.08.3. NEWS packed in the > > tarballs gives the fixes in the versions. I don't see any that would > > fit your case. > > > > > > On 12/04/2018 02:11 PM, Lou Nicotra wrote: > >> Brian, I used a single gres.conf file and distribute

Re: [slurm-users] GRES GPU issues

2018-12-05 Thread Lou Nicotra
ned a node that has two different nvidia > cards, so what was on what port became important, not because the > 'range' configuration caused problems. > > This wasn't a fresh install of 18.x - it was a 17.x installation that I > upgraded to 18.x. Not sure if tha

[slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-08 Thread Lou Nicotra
ild... My LD_LIBRARY_PATH is /usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/: Can anyone provide suggestions on working out this issue? Thanks. -- LOU NICOTRA IT Systems Engineer - SLT Interactions LLC o: 908-673-1833 <781-405-5114> m: 908-451-6983 <78

Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-15 Thread Lou Nicotra
1.el7.centos # [100%] Oh, well... Lou On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec wrote: > What if you try to run ldconfig manually before building the rpm? > > Cheers, > > Barbara > On 8/8/19 5:57 PM, Lou Nicotra wrote: > > I am running int

Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-16 Thread Lou Nicotra
Are the nvidia libraries installed by RPM or a 'make install' on the box > you compiled it on? > > Brian Andrus > On 8/15/2019 7:53 AM, Lou Nicotra wrote: > > I have tried running ldconfig manually as suggested with > slurm-19.05.1-2 and it fails the same way... > erro

Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-16 Thread Lou Nicotra
an into that trying to install tensorflow. > > If you can, downgrade to 10.0, which does a better job of installing > itself. > > Brian > On 8/16/2019 5:47 AM, Lou Nicotra wrote: > > Brian, the package is being built and installed on the master server. I > am testing b