[slurm-users] error no error

2025-02-12 Thread Ricardo Román-Brenes via slurm-users
Hello. Could someone enlighten me as to what this error message is? Feb 13 10:02:00 gpu1 slurmd[573705]: slurmd: error: slurm_msg_sendto: address:port=192.168.9.1:36698 msg_type=8001: No error

[slurm-users] error and output files

2024-12-08 Thread michaelmorgan937--- via slurm-users
Hi all, I have a program (for example x.x). When I run it as "x.x -I input -o output", I will get an "output" file, as well as output and error files from slurm. However, all the content of slurm output file is in the "output" file (and there are some extra content), so it is a waste to print

[slurm-users] error: Unable to contact slurm controller (connect failure)

2024-11-18 Thread Daniel Rodriguez Lopez (ext) via slurm-users
Dear all, We recently tried to fix our version of slurm in every node of our cluster. After the installation (slurm 20.11.9) in one of the compute nodes, most of the commands (squeue, sinfo, scontrol show config etc) return this error:  error: Unable to contact slurm controller (connect failu

[slurm-users] "error: slurm_auth_get_host: Lookup failed: Unknown host" in slurmctld.log

2024-08-06 Thread Chris Taylor via slurm-users
I keep getting this logged on my Slurm control host: [2024-08-06T20:31:40.196] error: slurm_auth_get_host: Lookup failed: Unknown host I don't see an identifiable pattern and I'm not sure how to troubleshoot. The jobs being submitted at or around that time seem fine and nobody's complained abou

[slurm-users] Error binding slurm stream socket: Address already in use, and GPU GRES verification

2024-07-23 Thread Shooktija S N via slurm-users
Hi, I am trying to set up Slurm with GPUs as GRES on a 3 node configuration (hostnames: server1, server2, server3). For a while everything looked fine and I was able to run srun --label --nodes=3 hostname which is what I use to test if Slurm is working correctly, and then it randomly stops. Tu

[slurm-users] error: unpack_header: protocol_version 9472 not supported

2024-06-05 Thread Arnuld via slurm-users
I have built Slurm 23.11.7 on two machines. Both are running Ubuntu 22.04. While Slurm runs fine on one machine, on the 2nd machine it does not. First machine is both a controller and a node while the 2nd machine is just a node. On both machines, I built the Slurm debian package as per the Slurm do

Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Ryan Novosielski
Ah, I see; no, it’s 24.08. That’s why I didn’t find any reference to it. Carry on! :-D

Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Jesse Aiton
Yeah, 24.0.8 is the bleeding edge version. I wanted to try the latest in case it was a bug in 20.x.x. I’m happy to go back to any older Slurm version but I don’t think that will matter much if the issue occurs on both Slurm 20 and Slurm 24. git clone https://github.com/SchedMD/slurm.git Thank

Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Ryan Novosielski
On Jan 23, 2024, at 18:14, Jesse Aiton wrote: This is on Ubuntu 20.04 and happens both with Slurm 20.11.09 and 24.0.8 Thank you, Jesse I’m not sure what version you’re actually running, but I don’t believe there is a 24.0.8. The latest version I’m aware of is 23.11.2.

[slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Jesse Aiton
Hello Slurm Folks, I have a weird issue where on the same server, which acts as both a controller and a node, slurmctld can’t find cred_munge.so slurmctld: debug3: Trying to load plugin /app/slurm-24.0.8/lib/slurm/cred_munge.so slurmctld: debug4: /app/slurm-24.0.8/lib/slurm/cred_munge.so: Does

Re: [slurm-users] error

2024-01-18 Thread Ole Holm Nielsen
On 1/18/24 17:42, Felix wrote: I started a new AMD node, and the error is as follows: "CPU frequency setting not configured for this node" extended looks like this: [2024-01-18T18:28:06.682] CPU frequency setting not configured for this node [2024-01-18T18:28:06.691] slurmd started on Thu, 18

[slurm-users] error

2024-01-18 Thread Felix
Hello I started a new AMD node, and the error is as follows: "CPU frequency setting not configured for this node" extended looks like this: [2024-01-18T18:28:06.682] CPU frequency setting not configured for this node [2024-01-18T18:28:06.691] slurmd started on Thu, 18 Jan 2024 18:28:06 +0200 [

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-04-03 Thread Dr. Thomas Orgis
On Wed, 29 Mar 2023 15:51:51 +0200, Ole Holm Nielsen wrote: > As for job scheduling, slurmctld may allocate a job to some powered-off > nodes and then calls the ResumeProgram defined in slurm.conf. From this > point it may indeed take 2-3 minutes before a node is up and running > slurmd, dur

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Ole Holm Nielsen
Hi Thomas, I think the Slurm power_save is not problematic for us with bare-metal on-premise nodes, in contrast to the problems you're having. We use power_save with on-premise nodes where we control the power down/up by means of IPMI commands as provided in the scripts which you will find i

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Dr. Thomas Orgis
On Wed, 29 Mar 2023 14:42:33 +0200, Ben Polman wrote: > I'd be interested in your kludge, we face a similar situation where the > slurmctld node > does not have access to the ipmi network and can not ssh to machines > that have access. > We are thinking on creating a rest interface to a contro

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Ben Polman
I'd be interested in your kludge, we face a similar situation where the slurmctld node does not have access to the ipmi network and can not ssh to machines that have access. We are thinking on creating a rest interface to a control server which would be running the ipmi commands Ben On 29-

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Dr. Thomas Orgis
On Mon, 27 Mar 2023 13:17:01 +0200, Ole Holm Nielsen wrote: > FYI: Slurm power_save works very well for us without the issues that you > describe below. We run Slurm 22.05.8, what's your version? I'm sure that there are setups where it works nicely;-) For us, it didn't, and I was faced with h

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Ole Holm Nielsen
Hi Thomas, FYI: Slurm power_save works very well for us without the issues that you describe below. We run Slurm 22.05.8, what's your version? I've documented our setup in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving T

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Dr. Thomas Orgis
On Mon, 06 Mar 2023 13:35:38 +0100, Stefan Staeglich wrote: > But this fixed not the main error but might have reduced the frequency of > occurring. Has someone observed similar issues? We will try a higher > SuspendTimeout. We had issues with power saving. We powered the idle nodes off, caus

[slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-06 Thread Stefan Staeglich
Hi, for half a year we have been using the suspend/resume support for Slurm. This works quite well but sometimes it breaks and no nodes are suspended or resumed anymore. In this case we see the following message in the log: error: power_save module disabled, NULL SuspendProgram A restart of slurmctld

[slurm-users] ERROR: slurmctld: auth/munge: _print_cred: DECODED

2022-12-01 Thread Nousheen
Hello Everyone, I am using slurm version 21.08.5 and Centos 7. I successfully start slurmd on all compute nodes but when I start slurmctld on server node it gives the following error: *(base) [nousheen@nousheen ~]$ systemctl status slurmctld.service -l* ● slurmctld.service - Slurm controller da

Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-12-01 Thread Christopher Samuel
On 12/1/21 5:51 am, Gestió Servidors wrote: I can’t syncronize before with “ntpdate” because when I run “ntpdate -s my_NTP_server”, I only received message “ntpdate: no server suitable for synchronization found”… Yeah, you'll need to make sure your NTP infrastructure is working first. There

Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-12-01 Thread Gestió Servidors
Hi, I can't synchronize beforehand with "ntpdate" because when I run "ntpdate -s my_NTP_server", I only receive the message "ntpdate: no server suitable for synchronization found"... Thanks.

Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-11-30 Thread Nicolas Greneche
Hi, I had the same issue with ntpd. My ntp service on clients did not synchronize because the drift with the ntp server was too large. Maybe you can synchronize with ntpdate before using ntp service on your clients. Regards, On 30 Nov 2021 12:23, Gestió Servidors wrote: Hello, In last days,

[slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-11-30 Thread Gestió Servidors
Hello, In the last few days, my nodes have been showing the error "slurm_receive_msg_and_forward: Zero Bytes were transmitted or received". After reviewing all the configuration, I have noticed that the problem is the time difference between nodes and server. If nodes are badly configured (time in the future or in the pa
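The clock-difference diagnosis in the post above can be sketched as a simple comparison. This is only a minimal illustration in Python, not anything taken from the thread: the function name skew_exceeds_limit and the 300-second limit are assumptions chosen for the example, and the timestamps are made up.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance picked for illustration; not a value from the thread.
MAX_SKEW = timedelta(seconds=300)

def skew_exceeds_limit(controller_time, node_time, limit=MAX_SKEW):
    """Return True when the two clocks differ by more than `limit`.

    The absolute difference is used, so a node clock that is ahead
    ("in the future") or behind ("in the past") is treated the same way.
    """
    return abs(controller_time - node_time) > limit

# Made-up sample timestamps for a controller and two nodes.
controller = datetime(2021, 11, 30, 12, 0, 0, tzinfo=timezone.utc)
node_close = controller + timedelta(seconds=2)
node_far = controller - timedelta(minutes=10)

print(skew_exceeds_limit(controller, node_close))
print(skew_exceeds_limit(controller, node_far))
```

In practice the two timestamps would have to be collected from the controller and the node at roughly the same moment for the comparison to mean anything.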

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
Ok, a fresh start after installing the two recommended packages and things appear to be working. Thanks for the help! On 9/23/21, 3:04 PM, "slurm-users on behalf of Hoot Thompson" wrote: Do I need to specify the json path in the configure process? On 9/23/21, 2:45 PM, "slurm-users o

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
Do I need to specify the json path in the configure process? On 9/23/21, 2:45 PM, "slurm-users on behalf of Hoot Thompson" wrote: If this is useful, note that there's no attempt to build anything in the serializer/json directory. Making all in serializer make[4]: Entering directory

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
If this is useful, note that there's no attempt to build anything in the serializer/json directory. Making all in serializer make[4]: Entering directory '/home/ubuntu/slurm-21.08.1/src/plugins/serializer' Making all in url-encoded make[5]: Entering directory '/home/ubuntu/slurm-21.08.1/src/plugins/

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
What's getting built is serializer_url_encoded.a serializer_url_encoded.la serializer_url_encoded.so if this helps. On 9/23/21, 2:10 PM, "slurm-users on behalf of Hoot Thompson" wrote: On Ubuntu 20.04 I installed ... libjson-c-dev libhttp-parser-dev That work? No joy

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
On Ubuntu 20.04 I installed ... libjson-c-dev libhttp-parser-dev That work? No joy if so. On 9/23/21, 1:30 PM, "slurm-users on behalf of Ole Holm Nielsen" wrote: On 23-09-2021 16:01, Hoot Thompson wrote: > In upgrading to 21.08.1, slurmctld status reports: > > Sep 23 13:49

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Ole Holm Nielsen
On 23-09-2021 16:01, Hoot Thompson wrote: In upgrading to 21.08.1, slurmctld status reports: Sep 23 13:49:52 ip-10-10-7-17 systemd[1]: Started Slurm controller daemon. Sep 23 13:49:52 ip-10-10-7-17 slurmctld[1323]: fatal: Unable to find plugin: serializer/json Sep 23 13:49:52 ip-10-10-7-17 s

[slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
In upgrading to 21.08.1, slurmctld status reports: Sep 23 13:49:52 ip-10-10-7-17 systemd[1]: Started Slurm controller daemon. Sep 23 13:49:52 ip-10-10-7-17 slurmctld[1323]: fatal: Unable to find plugin: serializer/json Sep 23 13:49:52 ip-10-10-7-17 systemd[1]: slurmctld.service: Main process

[slurm-users] error: DBD_SEND_MULT_MSG message from invalid uid 9920

2021-01-15 Thread Michael Smith
I’m new to SLURM and attempting to setup a new installation. I’ve built the 20.11.2 tools on CentOS 7, and now I’ve got the MariaDB running but the slurmdbd log file is full of: [2021-01-15T09:34:25.002] error: Processing last message from connection 10(192.168.1.16) uid(9920) [2021-01-15T09:3

Re: [slurm-users] error: user not found

2020-09-30 Thread Diego Zuccato
On 30/09/20 12:33, Marcus Wagner wrote: > the submission process runs on the slurmctld, so the user must be known > there. It is. The frontend is the node users use to submit jobs and it's where slurmctld runs. The user is known (he's logged in via ssh). His home is available (NFS share visib

Re: [slurm-users] error: user not found

2020-09-30 Thread Marcus Wagner
Hi Diego, the submission process runs on the slurmctld, so the user must be known there. Best Marcus On 30.09.2020 at 08:37, Diego Zuccato wrote: On 30/09/20 03:49, Brian Andrus wrote: Thanks for the answer. That means the system has no idea who that user is. But which system? Being a

Re: [slurm-users] error: user not found

2020-09-29 Thread Diego Zuccato
On 30/09/20 03:49, Brian Andrus wrote: Thanks for the answer. > That means the system has no idea who that user is. But which system? Being a message generated by slurmctld, I thought it must be the frontend node. But, as I wrote, that system correctly identifies the user (he's logged in, 'id'

Re: [slurm-users] error: user not found

2020-09-29 Thread Brian Andrus
That means the system has no idea who that user is. If you are using /etc/passwd, that file is not synched on the slurm master node(s) If you are part of a domain or other shared directory (ldap, etc), your master is likely not configured right. If you are using SSSD, it is also possible yo

[slurm-users] error: user not found

2020-09-29 Thread Diego Zuccato
Hello all. One of the users is unable to submit jobs to our cluster. The first time he tries, he gets $ sbatch test.job sbatch: fatal: Invalid user id: 621049927 then: $ sbatch test.job sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified In slur

Re: [slurm-users] Error when running srun: error: task X launch failed: Invalid MPI plugin name

2020-04-27 Thread Josep Guerrero
Hi again, > > So does someone have any suggestion about what I could try? > > Please have a look at: > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954272 This seems to have worked. Thanks a lot! Just in case someone else is interested, that debian bug thread suggests the following wor

Re: [slurm-users] Error when running srun: error: task X launch failed: Invalid MPI plugin name

2020-04-27 Thread Gennaro Oliva
Hi Josep, On Mon, Apr 27, 2020 at 12:26:56PM +0200, Josep Guerrero wrote: > does not seem to have support for pmix. There seems to be an "openmpi" > option, > but I haven't been able to find documentation on how it is supposed to work. > So, as I understand the situation, Debian openmpi package

[slurm-users] Error when running srun: error: task X launch failed: Invalid MPI plugin name

2020-04-27 Thread Josep Guerrero
Dear all, I'm trying to install slurm, for the first time, as a queue managing system in a computing cluster. All of the nodes are using Debian 10, and for OpenMPI I'm using the distribution packages (openmpi 3.1.3): === $ ompi_info Package: Debian OpenMPI

Re: [slurm-users] Error buildind rpm on Centos 7

2020-04-08 Thread Alfonso Núñez Slagado
Thanks guys, it took me a while to check the solutions you proposed and both of them work. The mariadb downgrade is a bit tricky using "rpm -e --nodeps" and the solution Ole proposed keeps the system updated to MariaDB 10.4. @Ole, thanks for the guide, it is really useful Alfonso On 7/4/20

Re: [slurm-users] Error buildind rpm on Centos 7

2020-04-07 Thread Ole Holm Nielsen
Hi Alfonso, You just need to get the CentOS 7 prerequisites right, check out my Slurm installation Wiki page: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms HTH, Ole On 07-04-2020 13:07, Alfonso Núñez Slagado wrote:     I'm trying to build rpm packages running follow

Re: [slurm-users] Error buildind rpm on Centos 7

2020-04-07 Thread William Brown
Search the list archive, I had the same and it was because I had MariaDB installed but as the packaging of MariaDB changed I was missing a required RPM. They split it differently and there is another RPM prerequisite. Can't recall the name just now, but search the archive. William On Tue, 7 Apr

[slurm-users] Error buildind rpm on Centos 7

2020-04-07 Thread Alfonso Núñez Slagado
Hi all, I'm trying to build rpm packages by running the following commands but I always get the same error. I've checked the BUILD config.log and /usr/bin/mysql_config seems to be found... Any clue? rpmbuild -ta slurm-20.02.1.tar.bz2 or rpmbuild -ta slurm-19.05.6.tar.bz2 RPM build errors:

[slurm-users] error: _find_node_record: lookup failure in Slurm 20.02.0

2020-03-28 Thread Giovanni Torres
Hello Everyone, In 19.05 and previous versions, I was able to run multiple nodes on the same virtual machine or container. While upgrading to 20.02.0, when I run sbatch to kick off a job, it is stuck in the CF (Configuring) state. [root@slurmcluster log]# squeue JOBID PARTITION N

Re: [slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

2020-03-13 Thread Steininger, Herbert
Does somebody know what could be done to get slurmdbd up? Is there an option to prevent this upgrade to mysql? Would I have to rebuild slurm? Thanks in Advance, Herbert -Original Message- From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On behalf of Steininger

[slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

2020-03-12 Thread Steininger, Herbert
Hello, while upgrading slurm from 19.05 to 20.02 an error occurred while trying to upgrade slurmdbd first. The error is: slurmdbd: debug: Munge authentication plugin loaded slurmdbd: debug2: mysql_connect() called for db slurm_acct_db slurmdbd: debug2: Attempting to connect to slurmmaster:3306

[slurm-users] error: _x11_socket_read: slurm_open_msg_conn: Connection refused

2020-03-12 Thread Weijun Gao
Hi all, I compiled Slurm 18.08.7, 19.05.5 and 20.02.0 on Ubuntu 18.04 server. X-Forwarding works fine with 18.08.7. * (19.05.5 / 20.02.0) However, slurmd logs the following error message for version 19.05.5 and 20.02.0 when start a GUI application, for example:     [2020-03-11T11:42:55.673] [

Re: [slurm-users] error: persistent connection experienced an error

2019-12-13 Thread Chris Samuel
On 13/12/19 12:19 pm, Christopher Benjamin Coffey wrote: error: persistent connection experienced an error Looking at the source code that comes from here: if (ufds.revents & POLLERR) { error("persistent connection experienced an error");

[slurm-users] error: persistent connection experienced an error

2019-12-13 Thread Christopher Benjamin Coffey
Hi All, I wonder if any of you have seen these errors in slurmdbd.log error: persistent connection experienced an error When we see these errors, we are seeing job errors with some kind of accounting in slurm like: slurmstepd: error: _prec_extra: Could not find task_memory_cg, this should nev

Re: [slurm-users] Error when the stdout or stderror path does not exist

2019-03-25 Thread Antonio Knight
I have created a small group of 4 nodes using my lab mates computers to perform calculations overnight. The algorithm has a random component. I have to run the same program with the same input data several thousand times. To distinguish the executions I have tried to create a folder with the co

[slurm-users] Error when the stdout or stderror path does not exist

2019-03-20 Thread Andrés Marín Díaz
Hello, When I redirect stdout or stderr to files in a folder that does not exist, the job fails instead of creating the folder. I do not know if this is the expected behavior and, if so, whether it can be changed (with some config parameter or environment variable). I have another installatio
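One way around the behaviour described in the post above is to create the output folder before the job is submitted. This is a minimal sketch in Python under stated assumptions: the helper name ensure_output_dir, the run_0001 folder, and the file name are hypothetical and not taken from the thread, and the sketch only prepares the directory; it does not call sbatch.

```python
import os
import tempfile

def ensure_output_dir(output_path):
    """Create the parent directory of `output_path` if it is missing.

    Returns the absolute path of that directory. exist_ok=True makes the
    call safe to repeat, which matters when many runs share one folder.
    """
    parent = os.path.dirname(os.path.abspath(output_path))
    os.makedirs(parent, exist_ok=True)
    return parent

# Demonstration inside a throwaway temporary directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "run_0001", "job.out")  # hypothetical layout
created = ensure_output_dir(target)
print(os.path.isdir(created))
```

The directory would need to exist on whichever filesystem the job writes its output to, so on a cluster this step belongs on the shared storage used for job output.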

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-13 Thread Prentice Bisbal
I managed to figure out why this conditional wasn't working: shortly before this conditional,  I had another conditional that checked for my user_id. If I was submitting a job, it would skip the rest of the job_submit.lua file. I had added this so I could test some new features out that would h
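The cause described in the post above, an earlier check that returns before a later conditional is reached, can be modelled as follows. This is a Python analogue written for illustration only: the real script is Lua, the uid 1000 and the bypass_admin switch are hypothetical, and SUCCESS merely stands in for slurm.SUCCESS.

```python
SUCCESS = 0       # stand-in for slurm.SUCCESS
ADMIN_UID = 1000  # hypothetical uid of the admin testing new features

def job_submit(job, bypass_admin=True):
    """Toy model of a submit filter with an early-return bypass."""
    # Earlier check: the admin's own jobs skip the rest of the script.
    if bypass_admin and job["user_id"] == ADMIN_UID:
        return SUCCESS
    # Later conditional: never reached for the admin while the bypass is on,
    # so it appears "not to work" when the admin tests it.
    if job["user_id"] in (ADMIN_UID, 41266) and job["partition"] in (
        "general",
        "interruptible",
    ):
        job["qos"] = job["partition"]
    return SUCCESS

job = {"user_id": ADMIN_UID, "partition": "general", "qos": None}
job_submit(job)
print(job["qos"])  # still None: the early return hid the later branch
job_submit(job, bypass_admin=False)
print(job["qos"])  # now set from the partition
```

The conditional itself is correct in both calls; only the early return decides whether it runs, which is why testing it with the bypassed account gave misleading results.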

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Prentice Bisbal
Thanks! Prentice On 2/6/19 11:00 AM, Marcus Wagner wrote: Hi Prentice, there, I might help. I've created a table, e.g.: local userflags = {    --  "" = {    -- "bypass"  = 1, # optional, if you want to bypass the submit_plugin    -- "debug"   = 1, # optional, if you want to

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Marcus Wagner
Hi Prentice, there, I might help. I've created a table, e.g.: local userflags = {    --  "" = {    -- "bypass"  = 1, # optional, if you want to bypass the submit_plugin    -- "debug"   = 1, # optional, if you want to get debug messages    -- "param"   = 1, # optional,

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Prentice Bisbal
"Dirty debugging" I like that. I'm going to use that from now on. I have tried that method in the past while debugging other issues. I try not to use it too much, since I don't want these "dirty debugging" messages being seen by users (I don't have a test environment, so I have to test debug in

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Prentice Bisbal
Whew! I have used 'user_id' in a dozen other conditionals that I tested exhaustively. After reading your first e-mail, I thought I was going crazy. I suspect the issue is some sort of subtle typo or syntax error. I use similar conditionals throughout my job_submit.lua script, and they all work

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-05 Thread mercan
Hi; I think dirty debugging is required using printf (slurm.log_user), because the lua of our slurm installation returns a lot of variables as nil. You can limit the output to a specific user as below: if job_desc.user_name == "mercan" then     slurm.log_user("job_desc.user_id=")     slurm.l

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-05 Thread Marcus Wagner
Hmm..., no, I was wrong. IT IS 'user_id'. Now I'm a bit dazzled Marcus On 2/4/19 11:27 PM, Prentice Bisbal wrote: Can anyone see an error in this conditional in my job_submit.lua?     if ( job_desc.user_id == 28922 or job_desc.user_id == 41266 ) and ( job_desc.partition == 'general' or job

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-05 Thread Marcus Wagner
Hi Prentice, I also hate lua sometimes, as it does not complain, when you hope it would complain. It is called 'userid', not 'user_id', so the first part is all the time false ;) Best Marcus On 2/4/19 11:27 PM, Prentice Bisbal wrote: Can anyone see an error in this conditional in my jo

[slurm-users] Error in job_submit.lua conditional?

2019-02-04 Thread Prentice Bisbal
Can anyone see an error in this conditional in my job_submit.lua?     if ( job_desc.user_id == 28922 or job_desc.user_id == 41266 ) and ( job_desc.partition == 'general' or job_desc.partition == 'interruptible' ) then     job_desc.qos = job_desc.partition     return slurm.SUCCESS     e

Re: [slurm-users] Error running jobs with srun

2017-11-09 Thread Elisabetta Falivene
I'll surely produce documentation as soon as I understand how the whole cluster works. (It was something like "Here is the root password and the key to the room. You don't need anything else, do you?" :) ) Thanks to your precious suggestions I was able to get that the common shared space

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 10:54, Elisabetta Falivene wrote: > I am the admin and I have no documentation :D I'll try The third option. > Thank you very much > Ah. Yes. Well, you will need some sort of drive shared between all the nodes so that they can read and write from a common space. Also, I re

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
I am the admin and I have no documentation :D I'll try the third option. Thank you very much On Thursday, 9 November 2017, Lachlan Musicman wrote: > On 9 November 2017 at 10:35, Elisabetta Falivene > wrote: > >> Wow, thank you. There's a way to check which directories the master and >> The n

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 10:35, Elisabetta Falivene wrote: > Wow, thank you. There's a way to check which directories the master and > The nodes share? > There's no explicit way. 1. Check the cluster documentation written by the cluster admins 2. Ask the cluster admins 3. Run "mount" or "cat /etc/m

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
Wow, thank you. Is there a way to check which directories the master and the nodes share? On Wednesday, 8 November 2017, Lachlan Musicman wrote: > On 9 November 2017 at 09:19, Elisabetta Falivene > wrote: > >> I'm getting this message anytime I try to execute any job on my cluster. >> (node

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 09:19, Elisabetta Falivene wrote: > I'm getting this message anytime I try to execute any job on my cluster. > (node01 is the name of my first of eight nodes and is up and running) > > Trying a python simple script: > *root@mycluster:/tmp# srun python test.py * > *slurmd[nod

[slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
I'm getting this message anytime I try to execute any job on my cluster. (node01 is the name of my first of eight nodes and is up and running) Trying a python simple script: *root@mycluster:/tmp# srun python test.py * *slurmd[node01]: error: task/cgroup: unable to build job physical cores* */usr/b