[slurm-users] GPU configuration

2021-12-10 Thread Giuseppe G. A. Celano
Hi, My cluster has 2 nodes, with the first having 2 GPUs and the second 1 GPU. The state of both nodes is "drained" because "gres/gpu count reported lower than configured": any idea why this happens? Thanks. My .conf files are: slurm.conf AccountingStorageTRES=gres/gpu GresTypes=gpu NodeName=t

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-06 Thread Giuseppe G. A. Celano
Thanks Gennaro, It's working! On Mon, Dec 6, 2021 at 9:41 AM Gennaro Oliva wrote: > Hi Giuseppe, > > On Mon, Dec 06, 2021 at 03:46:02AM +0100, Giuseppe G. A. Celano wrote: > > sinfo: symbol lookup error: sinfo: undefined symbol: slurm_conf > > srun: symbol loo

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-05 Thread Giuseppe G. A. Celano
xfree_ptr sacct: symbol lookup error: sacct: undefined symbol: slurm_destroy_selected_step Does anyone know the reason for that? Thanks. Best, Giuseppe On Sat, Dec 4, 2021 at 5:31 PM Giuseppe G. A. Celano < giuseppegacel...@gmail.com> wrote: > Hi Gennaro, > &

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-04 Thread Giuseppe G. A. Celano
am not sure whether I should try to uninstall my previous installation and reinstall slurm-wlm... On Sat, Dec 4, 2021 at 12:38 PM Gennaro Oliva wrote: > Hi Giuseppe, > > On Sat, Dec 04, 2021 at 02:30:40AM +0100, Giuseppe G. A. Celano wrote: > > I have installed almost all

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
ent.so", whereas > libmariadb-dev provides "libmariadb.so" > -- > *From:* slurm-users on behalf of > Giuseppe G. A. Celano > *Sent:* Saturday, 4 December 2021 11:40 > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] [

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
10.4.22 On Sat, Dec 4, 2021 at 1:35 AM Brian Andrus wrote: > Which version of Mariadb are you using? > > Brian Andrus > On 12/3/2021 4:20 PM, Giuseppe G. A. Celano wrote: > > After installation of libmariadb-dev, I have reinstalled the entire slurm > with ./configure + op

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
normally use) > make > make install > > on your DBD server after you installed the mariadb-devel package? > > -- > *From:* slurm-users on behalf of > Giuseppe G. A. Celano > *Sent:* Saturday, 4 December 2021 10:07 > *To:* Slurm User Commun

Re: [slurm-users] slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
The problem is the lack of /usr/lib/slurm/accounting_storage_mysql.so I have installed many mariadb-related packages, but that file is not created by slurm after installation: is there a point in the documentation where the installation procedure for the database is made explicit? On Fri, Dec

Re: [slurm-users] slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
:36:41.022] error: _slurm_persist_recv_msg: only read 0 of 2613 bytes [2021-12-03T15:36:41.022] error: Sending PersistInit msg: No error [2021-12-03T15:36:41.022] error: DBD_GET_RES failure: No error [2021-12-03T15:36:41.022] fatal: You are running with a database but for some reason we have no TRES

[slurm-users] slurmdbd does not work

2021-12-02 Thread Giuseppe G. A. Celano
OND failure: Unspecified error* Does anyone have a suggestion to solve this problem? Thank you very much. Best, Giuseppe

Re: [slurm-users] EXTERNAL: Re: Memory per CPU

2020-09-30 Thread Luecht, Jeff A
| Ryan Novosielski - novos...@rutgers.edu<mailto:novos...@rutgers.edu> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 30, 2020, at 09:38, Luecht, Jeff A mail

Re: [slurm-users] EXTERNAL: Re: Memory per CPU

2020-09-30 Thread Luecht, Jeff A
and used 'scontrol --defaults job ' command. The CPU allocation now works as expected. I do have one question though - what is the benefit/recommendation of using srun to execute a process within SBATCH. We are running primarily python jobs, but need to also support R jobs. ---

Re: [slurm-users] EXTERNAL: Re: Memory per CPU

2020-09-29 Thread Luecht, Jeff A
HadoopTest UserId=** GroupId=** MCS_label=N/A Priority=4294901604 Nice=0 Account=(null) QOS=(null) JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0 RunTime=00:00:06 TimeLimit=08:00:00 TimeMin=N/A SubmitTime=2020-0

Re: [slurm-users] EXTERNAL: Re: Memory per CPU

2020-09-29 Thread Luecht, Jeff A
before clicking on links, opening attachments, or responding. ** what leads you to believe that you're getting 2 CPU's instead of 1? 'scontrol show job ' would be a helpful first start. On Tue, Sep 29, 2020 at 9:56 AM Luecht, Jeff A wrote: > > I am working on my first ev

Re: [slurm-users] EXTERNAL: Re: Memory per CPU

2020-09-29 Thread Luecht, Jeff A
** GroupId=** MCS_label=N/A Priority=4294901604 Nice=0 Account=(null) QOS=(null) JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0 RunTime=00:00:06 TimeLimit=08:00:00 TimeMin=N/A SubmitTime=2020-09-29T10:40:09 EligibleTime=2020-09-2

[slurm-users] Memory per CPU

2020-09-29 Thread Luecht, Jeff A
I am working on my first ever SLURM cluster build for use as a resource manager in a JupyterHub Development environment. I have configured the cluster for SelectType of 'select/cons_res' with DefMemPerCPU and MaxMemPerCPU of 16 GB. The idea is to essentially provide for jobs that run

[slurm-users] slurm password -what is impact when changing it

2020-09-14 Thread Braun, Ruth A
Is there any issue if I set/change the slurm account password? I'm running 19.05.x. Current state is locked but I have to reset it periodically: # passwd --status slurm slurm LK 2014-02-03 -1 -1 -1 -1 (Password locked.) Best Regards, RB

[slurm-users] Installing GPU Features of Slurm 20

2020-06-22 Thread Petrillo, Neale A. (Contractor)
Hi! I'm trying to install Slurm 20.02 on my cluster with the GPU features. However, only my compute nodes have GPUs attached and so when I try to install the slurm-slurmctld RPM on my head node it fails saying it requires the NVIDIA control software. How do other folks work around this? Do you

Re: [slurm-users] SLURM in Virtual Machine

2019-09-12 Thread Jose A.
Dear all, thank you for your fast feedback. My initial idea was to run slurmctld and slurmdb in respective KVMs while keeping the worker nodes physical. From what I see, that is a setup that works without problems. However, I also find interesting some of the suggestions that you

[slurm-users] SLURM in Virtual Machine

2019-09-12 Thread Jose A
Dear all, In the expansion of our Cluster we are considering installing SLURM within a virtual machine in order to simplify updates and reconfigurations. Do any of you have experience running SLURM in VMs? I would really appreciate it if you could share your ideas and experiences. Thanks a

Re: [slurm-users] Changing node weights in partitions

2019-03-26 Thread Jose A
from one partition to another. That will allow each job type, associated with an account, to start differently in different partitions. 4. Once a job starts in one partition, the other submitted jobs are killed and removed from SLURM. It’s a bit more work but gets the effect I am looking for: that

Re: [slurm-users] Changing node weights in partitions

2019-03-23 Thread Jose A
Hello Chris, You got my point. I want a way in which a partition influences the priority with which a node takes new jobs. Any tip will be really appreciated. Thanks a lot. Cheers, José > On 23. Mar 2019, at 03:38, Chris Samuel wrote: > >> On 22/3/19 12:51 pm, Ole Holm N

Re: [slurm-users] Changing node weights in partitions

2019-03-22 Thread Jose A
Dear Ole, Thanks for your fast reply. I really appreciate that. I had a look at your website and googled about “weight masks” but still have some questions. From your example I see that the mask definition is commented out. How do I define what the mask means? If it helps, I’ll put an easy

[slurm-users] Changing node weights in partitions

2019-03-22 Thread José A .
Dear all, I would like to create two partitions, A and B, in which node1 has a certain weight in partition A and a different one in partition B. Does anyone know how to implement it? Thanks very much for the help! Cheers, José

Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-09 Thread Miguel A . Sánchez
Oh, thanks Paddy for your patch, it works very well !! Miguel A. Sánchez Gómez System Administrator Research Programme on Biomedical Informatics - GRIB (IMIM-UPF) Barcelona Biomedical Research Park (office 4.80) Doctor Aiguader 88 | 08003 Barcelona (Spain) Phone: +34/ 93 316 0522 | Fax: +34/ 93

Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-08 Thread Miguel A . Sánchez
Hi and thanks for all your answers and sorry for the delay in my answer. Yesterday I installed Slurm-18.08.3 on the controller machine to check whether the Seff command works fine with this latest release. The behavior has improved but I still receive an error message: # /usr/local/slurm

[slurm-users] Seff error with Slurm-18.08.1

2018-10-23 Thread Miguel A . Sánchez
y the seff that was compiled in the 17.11.0 version works fine. To compile the seff tool, from the source Slurm tree: cd contrib make make install I think the problem is in the perlapi. Could it be a bug? Any idea how I can fix this problem? Thanks a lot. -- Miguel A. Sánchez Gó

Re: [slurm-users] swap size

2018-09-23 Thread A
Ray I'm also on Ubuntu. I'll try the same test, but do it with and without swap on (e.g. by running the swapoff and swapon commands first). To complicate things I also don't know if the swappiness level makes a difference. Thanks Ashton On Sun, Sep 23, 2018, 7:48 AM Raymond Wan

Re: [slurm-users] swap size

2018-09-21 Thread A
Hi John! Thanks for the reply, lots to think about. In terms of suspending/resuming, my situation might be a bit different from other people's. As I mentioned this is an install on a single node workstation. This is my daily office machine. I run a lot of python processing scripts that have low CPU

[slurm-users] swap size

2018-09-21 Thread A
I have a single node slurm config on my workstation (18 cores, 256 GB RAM, 40 TB disk space). I recently just extended the array size to its current config and am reconfiguring my LVM logical volumes. I'm curious about people's thoughts on swap sizes for a node. Red Hat these days recommend

[slurm-users] ubuntu 16.04 > 18.04

2018-09-12 Thread A
Thinking about upgrading to Ubuntu 18.04 on my workstation, where I am running a single node slurm setup. Any issues anyone has run across in the update? Thanks! ashton

Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-20 Thread Juan A. Cordero Varelaq
I am just running an interactive job with "srun -I --pty /bin/bash" and then run "echo $SLURM_MEM_PER_NODE", but it shows nothing. Does it have to be defined in any conf file? On 20/08/18 09:59, Chris Samuel wrote: On Monday, 20 August 2018 4:43:57 PM AEST Juan A. C

Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-19 Thread Juan A. Cordero Varelaq
That variable does not exist somehow on my environment. Is it possible my Slurm version (17.02.3) does not include it? Thanks On 17/08/18 11:04, Bjørn-Helge Mevik wrote: Yes. It is documented in sbatch(1): SLURM_MEM_PER_CPU Same as --mem-per-cpu SLURM_MEM_PER_N
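A minimal shell sketch of how a job script might read the requested memory, assuming the `SLURM_MEM_PER_NODE` and `SLURM_MEM_PER_CPU` variables named in this thread. Either may be unset, for instance when the job did not request memory explicitly, so the sketch falls back rather than assuming a value. The function name `requested_mem` is hypothetical and not part of Slurm.

```shell
# Hypothetical helper: report which memory variable Slurm exported, if any.
requested_mem() {
    if [ -n "${SLURM_MEM_PER_NODE:-}" ]; then
        # Set when memory was requested per node (e.g. with --mem).
        echo "per-node ${SLURM_MEM_PER_NODE}"
    elif [ -n "${SLURM_MEM_PER_CPU:-}" ]; then
        # Set when memory was requested per CPU (e.g. with --mem-per-cpu).
        echo "per-cpu ${SLURM_MEM_PER_CPU}"
    else
        # Neither variable is present in this environment.
        echo "unset"
    fi
}
```

Calling `requested_mem` inside the interactive shell would print `unset` in the situation described above, which at least distinguishes "variable not exported" from an empty value.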

[slurm-users] Slurm Environment Variable for Memory

2018-08-17 Thread Juan A. Cordero Varelaq
Dear Community, does anyone know whether there is an environment variable, such as $SLURM_CPUS_ON_NODE, but for the requested RAM (by using --mem argument)? Thanks

[slurm-users] allocate more resources for a current interactive job

2018-06-18 Thread Juan A. Cordero Varelaq
Dear Slurm users, Is it possible to allocate more resources for a current job on an interactive shell? I just allocate (by default) 1 core and 2 GB RAM: srun -I -p main --pty /bin/bash The node and queue where the job is located have 120 GB and 4 cores available. I just want to use more

Re: [slurm-users] Getting nodes in a partition

2018-05-18 Thread SLIM, HENK A.
Subject: [slurm-users] Getting nodes in a partition Hi, Is there any slurm variable to read the node names of a partition? There is an MPI option --hostfile to which we can write the node names. I want to use something like this in the sbatch script: #SBATCH --partition=MYPART ... --hostfile

Re: [slurm-users] srun seg faults immediately from within sbatch but not salloc

2018-05-09 Thread a . vitalis
nd isolated this function as the culprit: static void _setup_env_working_cluster(void) With my configuration, this routine ended up performing a strcmp of two NULL pointers, which seg-faults on our system (and is not language-compliant

Re: [slurm-users] srun seg faults immediately from within sbatch but not salloc

2018-05-08 Thread a . vitalis
_setup_env_working_cluster(void) With my configuration, this routine ended up performing a strcmp of two NULL pointers, which seg-faults on our system (and is not language-compliant I would think?). My current understanding is that this is a slurm bug. The issue is rectifiable by simply giving the cluster

[slurm-users] Areas for improvement on our site's cluster scheduling

2018-05-07 Thread Jonathon A Anderson
their scheduling targets; however, every now and again, we have a user who has a relatively high-throughput (not HPC) workload that they're willing to wait a significant period of time for. They're low-priority work, but they put a few thousand jobs into the queue, and just sit and wait.

[slurm-users] srun seg faults immediately from within sbatch but not salloc

2018-05-07 Thread a . vitalis
Dear all, I am trying to set up a small cluster running slurm on Ubuntu 16.04. I installed slurm-17.11.5 along with pmix-2.1.1 on an NFS-shared partition. Installation seems fine. Munge is taken from the system package. Something like this: ./configure --prefix=/software/slurm/slurm-17.11.5

[slurm-users] constrain partition to a unique shell

2018-01-24 Thread Juan A. Cordero Varela
Dear users, I would like to force the use of only one type of shell, let's say, bash, on a partition that shares a node with another one. Do you know if it's possible to do it? What I actually want to do is to install a limited shell (lshell) on one node and force a given parti

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-23 Thread James A. Peltier
We put SSSD caches on a RAMDISK which helped a little bit with performance. - On 22 Jan, 2018, at 02:38, Alessandro Federico a.feder...@cineca.it wrote: | Hi John, | | just an update... | we do not have a solution for the SSSD issue yet, but we changed the ACL | on the 2 partitions from

Re: [slurm-users] restrict application to a given partition

2018-01-16 Thread Juan A. Cordero Varelaq
I ended up with a simpler solution: I tweaked the program executable (a bash script), so that it inspects which partition it is running on, and if it's the wrong one, it exits. Just added the following lines: if [ $SLURM_JOB_PARTITION == 'big' ]; then exi
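The guard described above could be sketched roughly as follows. This is an illustration, not the poster's actual lines, which are cut off in the archive. It assumes `big` is the disallowed partition, as in the post. The function name `partition_allowed` is hypothetical. Quoting the variable and using `=` keeps the test valid when `SLURM_JOB_PARTITION` is unset, for example when the script runs outside a Slurm job.

```shell
# Hypothetical helper: succeed unless the job runs in the disallowed partition.
partition_allowed() {
    # "big" is the partition name taken from the post above.
    if [ "${SLURM_JOB_PARTITION:-}" = "big" ]; then
        return 1
    fi
    return 0
}

# At the top of the wrapped program one might then write:
#   partition_allowed || { echo "not permitted on this partition" >&2; exit 1; }
```

A check like this only stops the wrapper script itself; a user who calls the underlying binary directly bypasses it, which is the concern raised earlier in the thread.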

Re: [slurm-users] restrict application to a given partition

2018-01-15 Thread Juan A. Cordero Varelaq
But what if the user knows the path to such an application (let's say the python command) and executes it on the partition he/she should not be allowed to? Is it possible through lua scripts to set constraints on software usage such as a limited shell, for instance? In fact, what I'

[slurm-users] restrict application to a given partition

2018-01-12 Thread Juan A. Cordero Varelaq
Dear Community, I have a node (20 Cores) on my HPC with two different partitions: big (16 cores) and small (4 cores). I have installed software X on this node, but I want only one partition to have rights to run it. Is it then possible to restrict the execution of a specific application to a

Re: [slurm-users] Changing resource limits while running jobs

2018-01-04 Thread Juan A. Cordero Varelaq
put in place. -Paul Edmon- On 1/4/2018 6:44 AM, Juan A. Cordero Varelaq wrote: Hi, A couple of jobs have been running for almost one month and I would like to change resource limits to prevent users from running for so long. Besides, I'd like to set AccountingStorageEnforce to qos

[slurm-users] Changing resource limits while running jobs

2018-01-04 Thread Juan A. Cordero Varelaq
Hi, A couple of jobs have been running for almost one month and I would like to change resource limits to prevent users from running for so long. Besides, I'd like to set AccountingStorageEnforce to qos,safe. If I make such changes would the running jobs be stopped (the user runnin

[slurm-users] which daemons should I restart when editing slurm.conf

2018-01-04 Thread Juan A. Cordero Varelaq
Hi, I have the following configuration: * head node: hosts the slurmctld and the slurmdbd daemons. * compute nodes (4): host the slurmd daemons. I need to change a couple of lines of the slurm.conf corresponding to the slurmctld. If I restart its service, do I also have to restart the

[slurm-users] slurm.spec-legacy - how to invoke

2017-12-21 Thread Braun, Ruth A
Can someone provide an example of using the rpmbuild command while specifying the slurm.spec-legacy file? I need to build the new version of slurm for RHEL6 and need to invoke the slurm.spec-legacy file (if possible) on this command line: # rpmbuild -tb slurm-17.11.1.tar.bz2 Regards, Ruth R. B

Re: [slurm-users] Can't start slurmdbd

2017-11-21 Thread Juan A. Cordero Varelaq
I guess mariadb-devel was not installed by the time another person installed slurm. I have a bunch of slurm-* rpms I installed using "yum localinstall ...". Should I install them in another way or remove slurm? The file accounting_storage_mysql.so is by the way absent on the machin

Re: [slurm-users] Can't start slurmdbd

2017-11-20 Thread Juan A. Cordero Varelaq
0/11/17 12:11, Lachlan Musicman wrote: On 20 November 2017 at 20:50, Juan A. Cordero Varelaq mailto:bioinformatica-i...@us.es>> wrote: $ systemctl start slurmdbd Job for slurmdbd.service failed because the control process exited with error code. See "systemctl st

[slurm-users] Can't start slurmdbd

2017-11-20 Thread Juan A. Cordero Varelaq
Hi, Slurm 17.02.3 was installed on my cluster some time ago but recently I decided to use SlurmDBD for the accounting. After installing several packages (slurm-devel, slurm-munge, slurm-perlapi, slurm-plugins, slurm-slurmdbd and slurm-sql) and MariaDB in CentOS 7, I created an SQL database:

Re: [slurm-users] Priority wait

2017-11-13 Thread A
I'm guessing you should have sent them to cluster Decepticon, instead In all seriousness though, provide the conf file. You might have accidentally set a maximum number of running jobs somewhere On Nov 13, 2017 7:28 AM, "Benjamin Redling" wrote: > Hi Roy, > >

Re: [slurm-users] Quick hold on all partitions, all jobs

2017-11-08 Thread Jonathon A Anderson
he IT team sent an email saying "complete network wide network outage tomorrow night from 10pm across the whole institute". Our plan is to put all queued jobs on hold, suspend all running jobs, and turn off the login node. I've just discovered that the partitions have a state,