[slurm-users] Re: Background tasks in Slurm scripts?

2024-07-26 Thread mercan via slurm-users
Good Morning; This is not a Slurm issue. This is default shell script behavior. If you want the script to wait until all background processes have finished, you should use the wait command after starting them all. Regards; C. Ahmet Mercan On 26.07.2024 10:23, Steffen Grunewald via slurm-users wrote: Good morning
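The advice above can be sketched in a few lines of shell. This is only an illustration, not the original poster's script: the `task` function, the marker files, and the temporary directory are hypothetical stand-ins for real work.

```shell
#!/bin/bash
# Hypothetical batch-script body: start two background tasks, then wait.
workdir=$(mktemp -d)

task() {
    # Stand-in for real work; writes a marker file when it is done.
    sleep 1
    echo "task $1 done" > "$workdir/task_$1.out"
}

task 1 &
task 2 &

# Without this line the script would end here, and the batch job
# could finish before the background tasks do.
wait

# Count the marker files to confirm both tasks completed.
finished=$(ls "$workdir" | wc -l | tr -d ' ')
echo "finished tasks: $finished"
```

A bare `wait` blocks until every background child of the script has exited, so it belongs after the last `&` line.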

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread mercan via slurm-users
accounting-configuration-before-build C. Ahmet Mercan On 30.05.2024 16:53, Radhouane Aniba via slurm-users wrote: Yes I can connect to my database using mysql --user=slurm --password=slurmdbpass  slurm_acct_db and there is no firewall blocking mysql after checking the firewall question Also h

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread mercan via slurm-users
Did you try to connect to the database using the mysql command? mysql --user=slurm --password=slurmdbpass  slurm_acct_db C. Ahmet Mercan On 30.05.2024 14:48, Radhouane Aniba via slurm-users wrote: Thank you Ahmet, I don't have a firewall active. And because slurmdbd cannot connect to the database I am

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-29 Thread mercan via slurm-users
. Ahmet Mercan On 30.05.2024 00:05, Radhouane Aniba via slurm-users wrote: Hi everyone I am trying to get slurmdbd to run on my local home server but I am really struggling. Note : am a novice slurm user my slurmdbd always times out even though all the details in the conf file are correct My log

Re: [slurm-users] How to hold a job until a feature is available?

2022-09-29 Thread mercan
Why not use a specific queue instead of a specific feature? A queue is an object for waiting on resources, so it is ready to use for this purpose. When the required resources are ready, the jobs will start. Regards; Ahmet M. On 29.09.2022 22:27, Groner, Rob wrote: I'm trying to se

Re: [slurm-users] Epilog script does not execute

2022-07-18 Thread mercan
Hi; The Epilog script will be invoked by the slurm user on the job's node. Who is your slurm user? Does the slurm user have the right to read & execute your epilog script? Did you check the slurmctld logs? Also, instead of using the /tmp directory, if you can use a shared directory, you can look for the

Re: [slurm-users] detailed worker state with sinfo

2022-06-27 Thread mercan
Hi; You can look at the Slurm code for information. https://github.com/SchedMD/slurm/blob/master/src/common/slurm_protocol_defs.c#L3838 The "ALLOCATED + DRAIN" and "MIX + DRAIN" states are the same. The others are different. Also, there are some other flags which can change the status keywords. Regards; Ahmet M.

Re: [slurm-users] Setting up a reactivity margin with SLURM

2022-05-24 Thread mercan
Hi; We don't modify or use the SuspendExcNodes parameter, or even the Slurm power-saving feature, at all. Because of this, we don't reconfigure Slurm. We use our script as a separate solution. You can find the script on my GitHub page: https://github.com/mercanca/powerSave But I did not add enoug

Re: [slurm-users] Setting up a reactivity margin with SLURM

2022-05-23 Thread mercan
Hi; For the same reasons you mentioned, I don't use the Slurm power-saving features. I want to keep a certain number of nodes always powered on and ready to run. The Slurm settings are very limited; only the SuspendExcNodes and SuspendExcParts parameters exist. But SuspendExcNodes totally usel

Re: [slurm-users] How to run a job at the end of a set of jobs

2022-05-09 Thread mercan
Instead of using the epilog script, you can use the -d (--dependency) option of sbatch: https://slurm.schedmd.com/sbatch.html It supports running a job after multiple jobs have finished. Regards; Ahmet M. On 10.05.2022 00:50, David Henkemeyer wrote: Prologue is a feature whereby

Re: [slurm-users] Problem with job allocation

2022-03-30 Thread mercan
Hi; The Slurm log says that your prolog did not finish within 300 seconds. The only possible cause that I see is the line starting with "sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh". You can put a timeout command at the beginning of the sudo line to test: timeout 150  sudo /usr/bin/beeond start -
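To show what the suggested timeout test reports, here is a minimal sketch that assumes the GNU coreutils `timeout` command. The `sleep 5` is a hypothetical stand-in for the slow prolog line, and the one-second limit is chosen only to keep the demonstration short.

```shell
#!/bin/bash
# Run a deliberately slow command under a 1-second limit.
# GNU timeout kills the command and returns 124 when the limit is hit.
rc=0
timeout 1 sleep 5 || rc=$?

if [ "$rc" -eq 124 ]; then
    echo "command was killed by timeout (exit status $rc)"
else
    echo "command finished on its own (exit status $rc)"
fi
```

Logging this status from inside the prolog would show whether the wrapped line is the one that hangs.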

Re: [slurm-users] List only available and up partitions

2022-01-26 Thread mercan
Hi; You can use the spart to list only partitions a user has access to that are in the 'UP' state (and with other limiting factors such as partition limits, allow/deny groups and qos etc.) : https://github.com/mercanca/spart It is a user-oriented partition info command for slurm. Also, it gi

Re: [slurm-users] TimeLimit parameter

2021-12-02 Thread mercan
Hi; The EnforcePartLimits parameter in slurm.conf should be set to ALL or ANY to enforce the partition time limit. Regards. Ahmet M. On 2.12.2021 16:18, Gestió Servidors wrote: Hello, I’m going a problema I have detected in my SLURM cluster. If I configure a partition with a “Tim

Re: [slurm-users] random allocation of resources

2021-12-01 Thread mercan
Hi; Slurm selects nodes according to the weight parameter of the nodes. I don't know of any setting that changes how nodes are selected, other than changing the values of the weights. But that is not suitable for selecting nodes randomly. Fortunately, absolutely there is not a

Re: [slurm-users] sreport question when specifying partitions=

2021-11-10 Thread mercan
The Partitions= option is only valid for "sreport job". Ref: https://slurm.schedmd.com/sreport.html Ahmet M. On 10.11.2021 18:56, Bill Wichser wrote: I can't seem to figure out how to do a query against a partition. sreport cluster AccountUtilizationByUser user=bill cluster=della, no

Re: [slurm-users] restarting slurmctld restarts jobs???

2021-09-20 Thread mercan
Hi; Please check the StateSaveLocation directory, which should be readable and writable by both slurmctld nodes. It should be a shared directory, not two local directories. The explanation below is taken from the Slurm web site: "The backup controller recovers state information from the StateSa

Re: [slurm-users] 4 sockets but "

2021-07-20 Thread mercan
Hi; Did you check the slurmctld log for a complaint about the host line? If slurmctld cannot recognize a parameter, it may give up processing the whole host line. Ahmet M. On 20.07.2021 13:49, Diego Zuccato wrote: Hello all. It's been since yesterday that I'm facing this issue. I'm co

Re: [slurm-users] What is an easy way to prevent users run programs on the master/login node.

2021-05-20 Thread mercan
Hi; We use a bash script to watch and kill users' processes if they exceed our CPU and memory limits. This solution also ensures that total CPU or memory usage cannot be exceeded, whether because of many well-behaved users or a single bad user: https://github.com/mercanca/kill_for_loginnode.sh A

Re: [slurm-users] Slurmdbd purge settings

2021-02-23 Thread mercan
Hi; Maybe the database no longer fits in the InnoDB buffer. If there is enough room to increase this value (innodb_buffer_pool_size), you can try increasing it to find the reason. Ahmet M. On 23.02.2021 17:03, Luke Sudbery wrote: That great, thanks. We were thinking about staging it lik

Re: [slurm-users] prolog not passing env var to job

2021-02-12 Thread mercan
Hi; Prolog and TaskProlog are different parameters and scripts. You should use the TaskProlog script to set environment variables. Regards; Ahmet M. On 13.02.2021 00:12, Herc Silverstein wrote: Hi, I have a prolog script that is being run via the slurm.conf Prolog= setting.  I've verifie

Re: [slurm-users] Exclude Slurm packages from the EPEL yum repository

2021-01-25 Thread mercan
Hi; We use the yumlock feature of yum to protect some packages from unwanted upgrades. Also, Ole mentioned the "exclude=slurm" option of the repo file. It is not an unsolvable problem. But a package maintainer is a valued resource that is hard to find. Regards, Ahmet M. 25.01.202

Re: [slurm-users] Running slurmd without enabling jobs on a node

2021-01-06 Thread mercan
Hi; I don't know the best way, but if you do not put a login node's name into any partition, sinfo will not show this node and no job will run on it just because the node has a running slurmd. Ahmet M. On 6.01.2021 19:45, Steve Brasier wrote: Hi all, For a cluster i

Re: [slurm-users] slurmctld daemon error

2020-12-15 Thread mercan
that variable defined as a hostname, not localhost. Thanks, Avery On Tue, Dec 15, 2020, 1:51 PM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote: Hi; I don't know whether this is the problem, but I think setting "ControlMachine=localhost" and not setting a hostname for

Re: [slurm-users] slurmctld daemon error

2020-12-15 Thread mercan
Hi; I don't know whether this is the problem, but I think setting "ControlMachine=localhost" and not setting a hostname for the Slurm master node are not good decisions. How would compute nodes determine the IP address of the Slurm master node from "localhost"? Also, I suggest not using capital letters for any

Re: [slurm-users] Slurmdbd - error: cannot find auth plugin for auth/none

2020-12-08 Thread mercan
Hi; There is an explanation at https://slurm.schedmd.com/quickstart_admin.html "The configure script in the top-level directory of this distribution will determine which authentication plugins may be built." If you have munge, maybe the configure script decided not to compile the auth/none

Re: [slurm-users] [EXTERNAL] Re: trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes

2020-11-30 Thread mercan
Hi; Did you test the munge connection? If not, would you test it like this: munge -n | ssh  SRVGRIDSLURM02  unmunge Ahmet M. On 30.11.2020 14:43, Steve Bland wrote: Thanks Diego actually, nothing at all in the hosts file, did not seem to need to modify it to see the nodes. the differe

Re: [slurm-users] Users can't scancel

2020-11-18 Thread mercan
bug:  Waiting for job 110's prolog to complete [2020-11-18T10:21:10.121] debug:  Finished wait for job 110's prolog to complete [2020-11-18T10:21:10.121] debug:  [job 110] attempting to run epilog [/cm/local/apps/cmd/scripts/epilog] [2020-11-18T10:21:10.124] debug:  completed epilog fo

Re: [slurm-users] Users can't scancel

2020-11-18 Thread mercan
Hi; Check the epilog return value, which comes from the return value of the last line of the epilog script. Also, for testing purposes, you can add an "exit 0" line as the last line of the epilog script to ensure a zero return value. Ahmet M. On 18.11.2020 20:00, William Markuske wrote:
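The point about the last line deciding the return value can be checked with a small sketch. The two functions below are hypothetical stand-ins for an epilog script, each run in a subshell so that `exit` behaves as it would in a standalone script; `false` plays the role of a failing last command.

```shell
#!/bin/bash
# Stand-in epilog whose last command fails: its status becomes the
# status of the whole script.
epilog_without_exit() (
    echo "cleaning up" > /dev/null
    false
)

# Same body, but with an explicit "exit 0" as the last line.
epilog_with_exit() (
    echo "cleaning up" > /dev/null
    false
    exit 0
)

rc_without=0
epilog_without_exit || rc_without=$?
rc_with=0
epilog_with_exit || rc_with=$?

echo "without exit 0: $rc_without, with exit 0: $rc_with"
```

The explicit `exit 0` hides real failures, so it fits the testing use suggested above rather than permanent use.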

Re: [slurm-users] sbatch overallocation

2020-10-10 Thread mercan
Hi; You can submit each pimplefoam as a separate job. Or, if you really want to submit them as a single job, you can use a program such as GNU parallel to run as many of them at a time as the CPU count allows: https://www.gnu.org/software/parallel/ regards; Ahmet M. On 10.10.2020 14:05, Max Quast wrote: Dear sl

Re: [slurm-users] Working with local licenses

2020-09-25 Thread mercan
duplicate records - direct insert is working and case sensitive, but scontrol doesn't see change until slurmctld restart Regards Alexey -Original Message- From: mercan Sent: Friday, September 25, 2020 11:16 AM To: Slurm User Community List ; Tager, Alexey Subject: RE: [EXTERNAL] [

Re: [slurm-users] Working with local licenses

2020-09-25 Thread mercan
Hi; You don't need to modify slurm.conf and reconfigure. There is a remote/dynamic licenses feature: https://slurm.schedmd.com/licenses.html You can add licenses using the sacctmgr command, such as: sacctmgr add resource name=matlab count=50 server=rlm_host \   servertype=rlm type=license Regard

Re: [slurm-users] Option for remote licenses to be case sensitive in slurm ?

2020-09-16 Thread mercan
Hi; The Slurm license feature is just a simple counter, nothing more. It cannot connect to the license server to read or update the licenses. Slurm only counts the licenses in use and subtracts them from the configured license count. If the result is zero, it does not run new jobs. The license feature names and sl

Re: [slurm-users] Alternatives for MailProg

2020-08-27 Thread mercan
Hi; If you want, you can use our script; it is very simple, but it works: https://github.com/mercanca/slurmmail Regards; Ahmet M. On 27.08.2020 08:02, Andrew Elwell wrote: Hi folks, I'm getting fed up receiving out-of-office replies to slurm job state mails. Given that by default slurmctl

Re: [slurm-users] Update users partitions

2020-08-21 Thread mercan
esult using: sacctmgr show assoc where user=foo Ahmet M. On 21.08.2020 23:51, mercan wrote: Hi; I think you cannot update a user's partition: https://slurm-dev.schedmd.narkive.com/2UnWaNQJ/setting-a-users-s-partition-with-sacctmgr It is a part of the association and it can be set a

Re: [slurm-users] Update users partitions

2020-08-21 Thread mercan
Hi; I think you cannot update a user's partition: https://slurm-dev.schedmd.narkive.com/2UnWaNQJ/setting-a-users-s-partition-with-sacctmgr It is a part of the association and it can be set when creating the user, as an option: https://slurm.schedmd.com/accounting.html#database-configuration     'Ac

Re: [slurm-users] Nodes going into drain because of "Kill task failed"

2020-07-23 Thread mercan
Hi; Are you sure this is a job task completion issue? When the epilog script fails, Slurm will set the node to the DRAIN state: "If the Epilog fails (returns a non-zero exit code), this will result in the node being set to a DRAIN state" https://slurm.schedmd.com/prolog_epilog.html You can test th

Re: [slurm-users] [External] Fwd: Slurm MySQL database configuration

2020-07-23 Thread mercan
Hi; I think you can use a pacemaker cluster for a virtual slurmdb server: a virtual slurmdb server which runs both the slurmdb and mysql services on the active slurmctl server. When the active slurmctl server dies, you can try to start them on the passive one. Regards; Ahmet M. 23.07.2020 19:12 tarih

Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-10 Thread mercan
Hi Janna; It sounds like an ARP cache table problem to me. If your Slurm head node can reach ~1000 or more network devices (all connected network cards, switches etc., even if they are reachable via different ports of the server), you need to increase some network settings on the head node and serve

Re: [slurm-users] howto list/get all scripts run by a job?

2020-06-19 Thread mercan
But don't forget, if there isn't a script, as with salloc jobs, you cannot get the running script. Ahmet M. On 19.06.2020 12:39, Adrian Sevcenco wrote: On 6/19/20 12:35 PM, mercan wrote: Hi; For running jobs, you can get the running script using: scontrol write ba

Re: [slurm-users] howto list/get all scripts run by a job?

2020-06-19 Thread mercan
Hi; For running jobs, you can get the running script using the: scontrol write batch_script  "$SLURM_JOBID" - command. The - parameter is required for screen output. Ahmet M. On 19.06.2020 12:25, Adrian Sevcenco wrote: On 6/18/20 9:35 AM, Loris Bennett wrote: Hi Adrain, Hi Adrian Sevcenco

Re: [slurm-users] Slurm 20.02.3 error: CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.

2020-06-16 Thread mercan
Hi; Did you check the /var/log/messages file for errors? Systemctl logs to this file instead of the slurmctl log file. Ahmet M. On 16.06.2020 11:12, Ole Holm Nielsen wrote: Today we upgraded the controller node from 19.05 to 20.02.3, and immediately all Slurm commands (on the controller nod

Re: [slurm-users] Show detailed information from a finished job

2020-04-23 Thread mercan
Sorry, I mistakenly cropped the "mkdir" line below: mkdir -p $JDIR It should be after the "JDIR=/okyanus/..." line. Regards; Ahmet M. On 23.04.2020 12:31, mercan wrote: Hi; I prefer to use epilog script to store the job information to a top directory owned by the sl

Re: [slurm-users] Show detailed information from a finished job

2020-04-23 Thread mercan
Hi; I prefer to use an epilog script to store the job information in a top directory owned by the slurm user. To avoid a directory with a lot of files, it creates a sub-directory for every thousand job files. For a job whose jobid is 230988, it creates a directory named 230XXX. Also the SLURM_
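The bucketing described above (230988 goes into 230XXX) amounts to an integer division by 1000. The sketch below is not the poster's epilog; the `job_bucket` function is a hypothetical helper, and mapping job IDs below 1000 to `0XXX` is an assumption made here for illustration.

```shell
#!/bin/bash
# Map a numeric job ID to its bucket directory name, e.g. 230988 -> 230XXX.
job_bucket() {
    local jobid=$1
    # Integer division drops the last three digits; IDs below 1000
    # therefore land in "0XXX" (an assumption of this sketch).
    echo "$(( jobid / 1000 ))XXX"
}

bucket=$(job_bucket 230988)
echo "job 230988 is stored under: $bucket"
```

An epilog could then run `mkdir -p` on the bucket path before writing the job record, which is the step the follow-up message in this thread says was cropped from the original post.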

Re: [slurm-users] clarification on slurm scheduler and the "nice" parameter

2020-04-14 Thread mercan
Hi; Did you restart slurmctld after changing "PriorityType=priority/multifactor"? Also, your nice values are too small. It is not the Unix nice. Its range is +/-2147483645, and it competes with the other priority factors in the priority formula. Look at the priority factor formula at https://slurm.schedmd.c

Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread mercan
Hi; If you have a working job_submit.lua script, you can add a block that rejects new jobs from the specific user: if job_desc.user_name == "baduser" then     return 2045 end That's all! Regards; Ahmet M. On 1.04.2020 16:22, Mark Dixon wrote: Hi David, Thanks for this, it sounds like I'

[slurm-users] The spart command version 1.0.0 is released.

2020-03-28 Thread mercan
Hi; The spart command version 1.0.0 is available: https://github.com/mercanca/spart The spart is a user-oriented info command for slurm. It shows the user specific brief partition info with core count of available nodes and pending jobs. It hides unnecessary information for users in the out

Re: [slurm-users] Slurm 17.11 and configuring backfill and oversubscribe to allow concurrent processes

2020-02-27 Thread mercan
Hi; In your partition definition, there is "Shared=NO". This means "do not share nodes between jobs". This parameter conflicts with the "OverSubscribe=FORCE:12" parameter. According to the Slurm documentation, the Shared parameter has been replaced by the OverSubscribe parameter. But, I suppose

Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt

2020-01-30 Thread mercan
Hi; From the slurm.conf documentation web page: Note: The filetxt plugin records only a limited subset of accounting information and will prevent some sacct options from proper operation. Regards; Ahmet M. On 29.01.2020 21:47, Dr. Thomas Orgis wrote: Hi, I happen to run a small cl

Re: [slurm-users] Question about networks and connectivity

2019-12-05 Thread mercan
Hi; Your MPI and NAMD use your second network because your applications were not compiled for InfiniBand. There are many compiled NAMD versions; the verb and ibverb versions are for using InfiniBand. Also, when compiling the MPI source, you should check that the configure script detects the infin

Re: [slurm-users] OverMemoryKill Not Working?

2019-10-25 Thread mercan
f they are exceeded), then I could extract what I need from that. Again, thanks for the assistance. Mike On Thu, Oct 24, 2019 at 11:27 PM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote: Hi; You should set SelectType=select/cons_res and plus one of these: S

Re: [slurm-users] OverMemoryKill Not Working?

2019-10-24 Thread mercan
Hi; You should set SelectType=select/cons_res plus one of these: SelectTypeParameters=CR_Memory SelectTypeParameters=CR_Core_Memory SelectTypeParameters=CR_CPU_Memory SelectTypeParameters=CR_Socket_Memory to enable memory allocation tracking, according to the documentation: https://slurm.schedm

Re: [slurm-users] Interlocking / Concurrent runner

2019-10-22 Thread mercan
Hi; You can use the "--dependency=afterok:jobid:jobid ..." parameter of sbatch to ensure the newly submitted job waits until all older jobs have finished. Simply put, you can submit the new job even while older jobs are running; the new job will not start before the old jobs have finished.
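As a sketch of how the dependency value above might be assembled in a script: the `build_dependency` helper, the job IDs 101 to 103, and the script name `next_job.sh` are all hypothetical, and the sbatch command is only printed, not executed.

```shell
#!/bin/bash
# Join job IDs into the "afterok:id:id..." form that
# sbatch --dependency accepts.
build_dependency() {
    local dep="afterok"
    local id
    for id in "$@"; do
        dep="$dep:$id"
    done
    echo "$dep"
}

# Hypothetical IDs of jobs that were submitted earlier.
dep=$(build_dependency 101 102 103)

# Print the command rather than run it, so the sketch works
# on a machine without Slurm.
echo "sbatch --dependency=$dep next_job.sh"
```

With afterok, the dependent job starts only if every listed job finishes successfully.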

Re: [slurm-users] Sacct selecting jobs outside range

2019-10-16 Thread mercan
Hi; Starttime and Endtime apply to jobs in any state, including PENDING. If you want to restrict the results to jobs that were running between the start and end time, you should specify which states you want using the -s parameter. Ahmet M. On 16.10.2019 20:31, Brian Andrus wrote: All, When running a report to try and get j

Re: [slurm-users] Creating a partition with memory and CPU limits

2019-07-19 Thread mercan
Hi; I think you should set SelectType=select/cons_res plus one of these: SelectTypeParameters=CR_Memory SelectTypeParameters=CR_Core_Memory SelectTypeParameters=CR_CPU_Memory SelectTypeParameters=CR_Socket_Memory to enable memory allocation tracking, according to the documentation: https://slur

Re: [slurm-users] number of tasks that can run on a node without oversubscribing

2019-07-12 Thread mercan
Hi; If you want to use the threads as CPUs, you should set CR_CPU instead of CR_Core. Regards; Ahmet M. On 12.07.2019 21:29, mercan wrote: Hi; You can find the definitions of Socket, Core, & Thread at: https://slurm.schedmd.com/mc_support.html Your status: CPUs=COREs=Soc

Re: [slurm-users] number of tasks that can run on a node without oversubscribing

2019-07-12 Thread mercan
Hi; You can find the definitions of Socket, Core, & Thread at: https://slurm.schedmd.com/mc_support.html Your status: CPUs=COREs=Sockets*CoresPerSocket=1*4=4 Threads=COREs*ThreadsPerCore=4*2=8 Regards; Ahmet M. On 12.07.2019 20:15, Hanu Pathuri wrote: Hi, Here is my node informa
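The arithmetic in the reply above can be reproduced with shell integer arithmetic. The values come from the message (1 socket, 4 cores per socket, 2 threads per core); the variable names are chosen here for illustration.

```shell
#!/bin/bash
# Node layout as given in the reply.
sockets=1
cores_per_socket=4
threads_per_core=2

# CPUs counted as cores: Sockets * CoresPerSocket.
cores=$(( sockets * cores_per_socket ))

# Hardware threads: cores * ThreadsPerCore.
threads=$(( cores * threads_per_core ))

echo "cores=$cores threads=$threads"
```

Whether Slurm schedules by the core count or the thread count depends on the SelectTypeParameters setting (CR_Core versus CR_CPU) discussed in the follow-up message.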

Re: [slurm-users] Hints, Cheatsheets, etc

2019-07-08 Thread mercan
Hi; There is an official page which gives a lot of links to third-party solutions you can use: https://slurm.schedmd.com/download.html In my opinion, the best Slurm page for system administration is: https://wiki.fysik.dtu.dk/niflheim/SLURM On this page, you can find a lot of links and inf

Re: [slurm-users] slurm configuration for reprise license mangagement - cloud version

2019-06-25 Thread mercan
Hi; As far as I know, Slurm is not able to work (communicate) with the Reprise license manager or any other license manager. Slurm just sums the used licenses according to the -L parameter of the jobs, and subtracts this sum from the total license count which is given by using "sacctmgr add/modi

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread mercan
2019 at 12:24 PM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote: Hi; Sorry, as you can see, I made a mistake again.  I wrote two different directories: "The owner of the /var/run/slurm-llnl directory and the slurmctld.pid and slurmd.pid files should be "

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread mercan
R noki:root /var/run/slurm-llnl Regards; Ahmet M. On 19.06.2019 05:55, Noki Lee wrote: Hi, slurm-users and mercan. I tried what you said. |noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/ |noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l total 92 -r

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread mercan
Hi; I did not notice the SlurmUser=noki line. The owner of the /var/run/slurm-llnl directory and the slurmctld.pid and slurmd.pid files should be the "noki" user. chown -R noki:root /var/spool/slurm-llnl Regards; Ahmet M. On 18.06.2019 15:15, mercan wrote: Hi; The owner of the /var

Re: [slurm-users] How to fix “slurmd.service: Can't open PID file” error

2019-06-18 Thread mercan
Hi; The owner of the /var/run/slurm-llnl directory and the slurmctld.pid and slurmd.pid files should be the "slurm" user. Your files are owned by root and noki. chown -R slurm:slurm /var/spool/slurm-llnl Regards; Ahmet M. On 18.06.2019 15:03, Noki Lee wrote: Though SLURM works fine for job su

Re: [slurm-users] salloc not able to run sbash script

2019-06-17 Thread mercan
Hi; Try: salloc ./run_qemu.sh Regards; Ahmet M. On 17.06.2019 20:28, Mahmood Naderan wrote: Hi, May I know why the user is not able to run a qemu interactive job? According to the configuration which I made, everything should be fine. Isn't that? [valipour@rocks7 ~]$ salloc run_qe

Re: [slurm-users] strigger on CG, completing state

2019-05-28 Thread mercan
Hi; If you do not already use an epilog script, you can set up an epilog script to clean up all residues from finished jobs: https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-prolog-and-epilog-scripts Ahmet M. On 28.05.2019 19:03, Matthew BETTINGER wrote: We use triggers f

Re: [slurm-users] Nodes not responding... how does slurm track it?

2019-05-15 Thread mercan
Hi; Do not think of "the number of devices" as "the number of servers". If a device has a MAC address and is connected to your node's local networks, it counts as a device. For example, if your BMC ports (iLO, iDRAC etc.) are connected to one of the networks of your nodes, it doubles the number

[slurm-users] the slurm_load_partitions2 function from slurm api

2019-05-08 Thread mercan
Hi; I am trying to use the slurm_load_partitions2 function from slurm api. It is defined as: extern int slurm_load_partitions2(time_t update_time,   partition_info_msg_t **resp,   uint16_t show_flags,  

Re: [slurm-users] How to get a summary of the use of compute nodes and/or partition of a cluster in real time ?

2019-04-30 Thread mercan
Hi; For a summary of the partitions, you can use spart: https://github.com/mercanca/spart Regards, Ahmet M. On 30.04.2019 15:47, Jean-mathieu CHANTREIN wrote: Hello, Do you know a command to get a summary of the use of compute nodes and/or partition of a cluster in real time ? Something wi

Re: [slurm-users] Increasing job priority based on resources requested.

2019-04-19 Thread mercan
Hi; We use the node weight parameter to do that. When you set high-memory nodes to a high weight and low-memory nodes to a low weight, Slurm will select the lowest-weight nodes which have enough memory for the job's request. So, if there are free low-memory nodes, the high-memory nodes will stay free. At our cluster, low mem

Re: [slurm-users] spart: A user-oriented partition info command for slurm

2019-04-02 Thread mercan
multiple partitions, the program will work fine. On Mar 27, 2019, at 5:51 AM, mercan <ahmet.mer...@uhem.itu.edu.tr> wrote: Hi; Except for the sjstat script, Slurm does not contain a command to show user-oriented partition info. I wrote a command. I hope you wil

[slurm-users] spart: A user-oriented partition info command for slurm

2019-03-27 Thread mercan
Hi; Except for the sjstat script, Slurm does not contain a command to show user-oriented partition info. I wrote a command. I hope you will find it useful. https://github.com/mercanca/spart Regards, Ahmet M.

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-05 Thread mercan
Hi; I think dirty debugging is required using printf (slurm.log_user), because the lua of our slurm installation returns a lot of variables as nil. You can limit the output to a specific user as below: if job_desc.user_name == "mercan" then     slurm.log_user("j

Re: [slurm-users] Question: How to see all srun/sbatch commands in one place?

2019-01-24 Thread mercan
Hi; You can use a job submit plugin for logging. We use the lua job_submit plugin. The slurm.log_info() function writes a string to the slurmctl log file. But we use a separate file as a user activity log file. The logging lua code is something like the below:     dt = os.date()     jaccount = job_des

Re: [slurm-users] 18.08.4 - batch scripts named "batch" getting rejected.

2018-12-19 Thread mercan
Hi; We upgraded from 18.08.3 to 18.08.4 and there is a job_submit.lua script also. And nearly same issue at our cluster: $ sbatch batch sbatch: error: Batch job submission failed: Unspecified error $ mv batch nobatchy $ sbatch nobatchy Submitted batch job 172174 I hope this helps. Ahmet M.

Re: [slurm-users] new user; ExitCode reporting

2018-11-23 Thread mercan
Hi; As far as I know exit code 141 and 13 are the same. Signal + 128 gives exit code: https://slurm-dev.schedmd.narkive.com/MYGH56EW/job-exit-codes Ahmet M. On 23.11.2018 14:36, Matthew Goulden wrote: A confirmation re-run yielded the same outcome but the correct outcome was available
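The "signal + 128" rule above can be checked with a short sketch. The `signal_from_exit` function is a hypothetical helper written for this illustration; it only does the subtraction and treats any status of 128 or below as "not killed by a signal".

```shell
#!/bin/bash
# Recover the signal number from a shell-style exit status.
# Statuses above 128 mean "terminated by signal (status - 128)".
signal_from_exit() {
    local code=$1
    if [ "$code" -gt 128 ]; then
        echo $(( code - 128 ))
    else
        echo 0
    fi
}

sig=$(signal_from_exit 141)

# kill -l translates a signal number into its name.
echo "exit code 141 corresponds to signal $sig ($(kill -l "$sig"))"
```

Signal 13 is SIGPIPE on Linux, which is why 141 and 13 describe the same event.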

Re: [slurm-users] srun problem -- Can't find an address, check slurm.conf

2018-11-13 Thread mercan
Hi; Are these typos, or are they really different paths: /opt/exp_soft/slurm/bin/srun vs. which srun /opt/exp_soft/bin/srun Ahmet Mercan On 13.11.2018 11:24, Scott Hazelhurst wrote: Dear all I still haven’t found the cause to the problem I raised last week where srun -w