[slurm-dev] Re: backfill scheduler look ahead?

2014-02-25 Thread Moe Jette
Quoting Yuri D'Elia : On 02/20/2014 07:21 PM, Moe Jette wrote: Slurm uses what is known as a conservative backfill scheduling algorithm. No job will be started that adversely impacts the expected start time of _any_ higher priority job. The scheduling can also be affected by a
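To illustrate the conservative backfill rule described above, here is a toy model in C. It is not Slurm's backfill plugin: it assumes discrete time slots, a single resource (whole nodes), exact run times, and a fixed planning horizon (`SLOTS`, a hypothetical constant). Each job, taken in priority order, books a reservation at the earliest slot where it fits, so a lower-priority job can start immediately only in capacity that no higher-priority job has already claimed.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOTS 16 /* hypothetical planning horizon, in discrete time slots */

/* avail[t] = nodes expected to be idle during slot t, after subtracting
 * everything already running or reserved for higher-priority jobs. */
static bool fits(const int avail[SLOTS], int start, int nodes, int slots)
{
    if (start + slots > SLOTS)
        return false;
    for (int t = start; t < start + slots; t++)
        if (avail[t] < nodes)
            return false;
    return true;
}

/* Earliest slot at which the job fits, or -1 if not within the horizon. */
static int earliest_start(const int avail[SLOTS], int nodes, int slots)
{
    for (int s = 0; s < SLOTS; s++)
        if (fits(avail, s, nodes, slots))
            return s;
    return -1;
}

/* Book the job's nodes so later (lower-priority) jobs cannot use them. */
static void reserve(int avail[SLOTS], int start, int nodes, int slots)
{
    for (int t = start; t < start + slots; t++)
        avail[t] -= nodes;
}

/* Jobs are given in descending priority order. Every job, whether it can
 * start now or not, books a reservation at its earliest feasible slot, so
 * a lower-priority job only ever uses leftover capacity and can never
 * delay a higher-priority one. starts[i] == 0 means "start job i now". */
static void conservative_backfill(int avail[SLOTS], int n_jobs,
                                  const int nodes[], const int slots[],
                                  int starts[])
{
    for (int i = 0; i < n_jobs; i++) {
        starts[i] = earliest_start(avail, nodes[i], slots[i]);
        if (starts[i] >= 0)
            reserve(avail, starts[i], nodes[i], slots[i]);
    }
}
```

For example, with 2 idle nodes now and 4 from slot 4 onward, a high-priority 4-node job is planned for slot 4, while a lower-priority 2-node, 3-slot job still starts at slot 0 because it finishes before that reservation begins.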

[slurm-dev] Re: Job resizing with cons_res/CR_CORE_MEMORY

2014-03-04 Thread Moe Jette
That would be of value, but is currently not supported. Quoting Damien François : Hello all, Slurm allows resizing jobs with 'scontrol update jobid=$SLURM_JOB_ID NumNodes=1' as described here: http://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf . That works without prob

[slurm-dev] Re: gres-related slurmctld crash

2014-03-04 Thread Moe Jette
Thank you for the problem analysis and patch. It will be included in the version 2.6.7 release. The commit is here: https://github.com/SchedMD/slurm/commit/f005e5086aa9461e5accac6ef812b92c9b0b8bf7 Quoting Carlos Bederián : Here's a patch to avoid the overrun assert on bit_test: diff --gi

[slurm-dev] Re: Added spank_item.

2014-03-04 Thread Moe Jette
Thank you for your contribution. It will be included in version 2.6.7 when released. The commit is here: https://github.com/SchedMD/slurm/commit/96673989ed2c62e4f9fa7730eba9bae4be54b08e Quoting Magnus Jonsson : I have made a patch for spank to allow to fetch the SLURM_RESTART_COUNT into

[slurm-dev] Re: HA : not switching fast from master to backup server

2014-03-05 Thread Moe Jette
See the configuration parameter SlurmctldTimeout as described here: http://slurm.schedmd.com/slurm.conf.html Quoting Marc Vecsys : Hi, It takes 5 minutes for the backup controller to start after the master fails; is there any setting for a faster switchover? Thanks Marc slurm.conf file ControlMach

[slurm-dev] Re: Prevent interactive session longer than x minutes

2014-03-05 Thread Moe Jette
The job submit data structure does not have a "batch_flag". Check if "script" is NULL or not. Quoting Oriol Mula-Valls : Hi, I am creating a LUA plugin and I am trying to prevent the interactive jobs longer than 8h. How can I know if a job is interactive or not? I've tried to use job_desc.
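Following the advice above (test whether the script is NULL rather than looking for a batch flag), a minimal C sketch of the policy check might look like this. The struct is a hypothetical stand-in loosely modeled on the job descriptor's `script` and `time_limit` fields, with the limit in minutes; in job_submit.lua the equivalent test would presumably compare `job_desc.script` with `nil`. A real plugin would also have to decide what to do when no time limit was requested at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the job descriptor a job_submit
 * plugin receives; only the two fields used here are modeled. */
struct job_desc_sketch {
    const char *script;  /* batch script text; NULL for salloc/srun jobs */
    uint32_t time_limit; /* requested time limit, in minutes */
};

#define MAX_INTERACTIVE_MINUTES (8 * 60) /* example site policy: 8 hours */

/* Returns 0 to accept the job, -1 to reject it. Handling of an unset
 * time limit is deliberately left out of this sketch. */
static int check_interactive_limit(const struct job_desc_sketch *job)
{
    int interactive = (job->script == NULL); /* no script: not sbatch */

    if (interactive && job->time_limit > MAX_INTERACTIVE_MINUTES)
        return -1;
    return 0;
}
```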

[slurm-dev] Re: preemption/suspend problems

2014-03-10 Thread Moe Jette
Are the jobs allocated different nodes or even different cores on the nodes? If so, they don't need to preempt each other. Also see: http://slurm.schedmd.com/preempt.html Quoting "Ryan M. Bergmann" : Hi dev list, I’m having trouble getting preemption to work in suspend mode. I have thr
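The point about allocations can be made concrete with a small core-counting sketch: preemption only has to suspend anything when the idle cores on a node cannot cover the new job's request. The model is hypothetical and simplified: one node, at most 64 cores tracked in a bitmask, and every busy core assumed to belong to a preemptable lower-priority job.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits (one bit per core). */
static int count_cores(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Cores of lower-priority jobs that would have to be suspended so that a
 * job needing `want` cores can start on a node with `total` cores, of
 * which the cores set in `busy` are in use. Zero means the jobs can simply
 * run side by side on different cores. */
static int cores_to_preempt(int total, uint64_t busy, int want)
{
    uint64_t all = (total >= 64) ? UINT64_MAX
                                 : ((UINT64_C(1) << total) - 1);
    int idle = count_cores(all & ~busy);

    return (want > idle) ? want - idle : 0;
}
```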

[slurm-dev] Re: job_submit.lua and custom user message

2014-03-11 Thread Moe Jette
I don't have time to test this right now, but believe the commit below will fix the problem by initializing a variable to NULL. https://github.com/SchedMD/slurm/commit/e3363b95b0cedd4972c8c7b8dc87a1750f6bc3dd Quoting Marco Passerini : Hi, I'm trying Slurm 14.03.0, and in particular the n

[slurm-dev] Re: job_submit.lua and custom user message

2014-03-15 Thread Moe Jette
== 0x42) failed Best Regards, Tommi Tervo CSC On Tuesday, March 11, 2014 4:35 PM, Moe Jette wrote: I don't have time to test this right now, but believe the commit below  will fix the problem by initializing a variable to NULL. https://github.com/

[slurm-dev] Re: slurm_jobid2pid

2014-03-24 Thread Moe Jette
The closest thing available today is the "scontrol listpids" command described on the scontrol man page. Quoting Ulf Markwardt : Dear developers, in the API, I can find a function slurm_pid2jobid, that's fine. For our monitoring, we need the inverse function, which gives a list of proc

[slurm-dev] Re: fix select_nodeinfo_set_all in select/linear

2014-03-25 Thread Moe Jette
Your patch will be in the next release of version 14.03. Thank you! https://github.com/SchedMD/slurm/commit/18ca8adf9437cb1d7756537785ee6ee573249f66 Quoting Hongjia Cao : allocated but drained node will be shown mixed by sinfo.

[slurm-dev] Slurm version 14.03.0 is now available

2014-03-26 Thread Moe Jette
Slurm version 14.03.0 is now available. This is a major Slurm release with many new features. See the RELEASE_NOTES and NEWS files in the distribution for detailed descriptions of the changes, a few of which are noted below. Upgrading from Slurm versions 2.5 or 2.6 should proceed without

[slurm-dev] Re: scontrol ignores config ID

2014-03-26 Thread Moe Jette
On the scontrol man page where it describes "show ENTITY ID", the "ID" can refer to a specific job ID, node name, partition name, etc. but not specific configuration parameters. You will need to use "grep" to print specific configuration parameters (e.g. "scontrol show config | grep Clust

[slurm-dev] Re: slurm 14.03 and flexlm

2014-03-27 Thread Moe Jette
Web page for this and a couple of other things is still under development. For now see: http://slurm.schedmd.com/SUG13/license_management.pdf Quoting Albert Solernou : Hi all, I was very happy to read that the new stable version of Slurm supports FLEXlm. However, I cannot find any document

[slurm-dev] Re: DRMAA job submission mis-calculates NumCPUs

2014-03-27 Thread Moe Jette
It works for me with version 2.6.2, but the results might vary depending upon your configuration.
$ sbatch -n1 --cpus-per-task=2 tmp
Submitted batch job 3
$ scontrol show job
JobId=3 Name=tmp UserId=jette(1001) GroupId=jette(1001) Priority=4294901759 Account=root QOS=normal JobStat

[slurm-dev] Re: gres hierarchy

2015-02-27 Thread Moe Jette
/export/home/s_user1/slurm/sleep.py 0.1' an sbatch like the following would use Tmp3 since 2500 is less than 1 and greater than 2000 : sbatch --comment="job_id.prog_id:6162.5" --gres=Process,TmpX:2500 /export/home/s_user1/slurm/sleep.py 0.1' -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Prolog for requeued jobs not run on all nodes

2015-02-27 Thread Moe Jette
at all. I have not done any detailed analysis of this case, but I would guess something similar is causing this. Regards, Pär Lindfors, NSC -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: mixing mpi and per node tasks

2015-03-02 Thread Moe Jette
or each core). I'd also like any solution to work with hybrid mpi/openmp with one openmp task per node or per socket. Thanks, Gareth -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Heterogenous GRES nodes

2015-03-02 Thread Moe Jette
allocated and used? Thanks, Mike Robbert -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Exporting a variable in a TaskProlog

2015-03-02 Thread Moe Jette
o do the thing? Would you have a better / more elegant solution? -- DANY TELLO -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: mixing mpi and per node tasks

2015-03-02 Thread Moe Jette
advice as it was the default. However, in that case to use srun and use all the cores, extra options were needed. I know that, but wanted to provide you with a more general solution. -----Original Message- From: Moe Jette [mailto:je...@schedmd.com] Sent: Tuesday, 3 March 2015 3:42 AM

[slurm-dev] Re: Bug in Slurm (14.11.3) when running under debugger and executable not existing?

2015-03-02 Thread Moe Jette
un: error: step_launch_notify_io_failure: aborting, io error with slurmstepd on node 0 Regards, Dirk -- Dirk Schubert - Lead Software Developer || Allinea Software -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: mixing mpi and per node tasks

2015-03-02 Thread Moe Jette
Quoting gareth.willi...@csiro.au: -Original Message- From: Moe Jette [mailto:je...@schedmd.com] Sent: Tuesday, 3 March 2015 9:54 AM -snip- The options for srun, sbatch, and salloc are almost identical with respect to specification of a job's allocation requirements. Yes. Pa

[slurm-dev] Re: Prolog for requeued jobs not run on all nodes

2015-03-05 Thread Moe Jette
delaying job restarts by setting begin_time. Unfortunately I probably will not have time to look at this more myself. Pär Lindfors, NSC -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Prolog for requeued jobs not run on all nodes

2015-03-05 Thread Moe Jette
parameters and delaying job restarts by setting begin_time. Unfortunately I probably will not have time to look at this more myself. Pär Lindfors, NSC -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Separate slurm-realdev list

2015-03-10 Thread Moe Jette
"more advanced topics". Any news on this? cheers, marcin -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: confused by some values in `scontrol show job`

2015-03-12 Thread Moe Jette
: * SecsPreSuspend * ReqB:S:C:T * NtasksPerN:B:S:C * Socks/Node (I think I get it; but it’s not present in scontrol manpage) * CoreSpec Could someone explain these to me? ~jonathon -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Array job: get number of array tasks in batch script

2015-03-13 Thread Moe Jette
handling a different slice. Thanks for your help, jc -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: limits for GRES resources?

2015-03-13 Thread Moe Jette
Sorry, not with the current code. Quoting Bill Wichser : Is there any way to set a limit in a QOS for GRES resources? Bill -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Array job: get number of array tasks in batch script

2015-03-13 Thread Moe Jette
sbatch/opt.c seems beyond my reach to be frank. Is there a way I can submit a feature request? On Fri, Mar 13, 2015 at 4:48 PM, Moe Jette <je...@schedmd.com> wrote: That information is
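A common way for each array task to pick its slice, once the task count is known, is an even block split. This sketch assumes the script obtains the number of tasks by some other means (for example as an argument, since the thread suggests it was not exposed at the time) and derives its zero-based index from `SLURM_ARRAY_TASK_ID`; the function name `array_slice` is hypothetical.

```c
#include <assert.h>

/* Split `n_items` work items across `n_tasks` array tasks as evenly as
 * possible. `index` is the task's zero-based position in the array
 * (e.g. SLURM_ARRAY_TASK_ID minus the first task ID). On return, the
 * task should process items [*begin, *end). */
static void array_slice(long n_items, long n_tasks, long index,
                        long *begin, long *end)
{
    long base = n_items / n_tasks;
    long extra = n_items % n_tasks; /* the first `extra` tasks get one more */

    *begin = index * base + (index < extra ? index : extra);
    *end = *begin + base + (index < extra ? 1 : 0);
}
```

With 10 items and 3 tasks, the slices are [0, 4), [4, 7) and [7, 10), so every item is handled exactly once and slice sizes differ by at most one.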

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread Moe Jette
:58.988] debug3: Success. [2015-03-16T15:39:58.988] debug2: No acct_gather.conf file (/etc/slurm-llnl/acct_gather.conf) -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: query only submission times

2015-03-17 Thread Moe Jette
treet Cambridge, MA Office: 211A | Phone: 617-496-7468 ====== -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Propagation of limits using ulimit for SLURM daemons

2015-03-17 Thread Moe Jette
pt on the compute node. Curiosity is piqued as to why the SLURM daemon doesn't obtain the correct values from the environment. Does anyone know? Kelly -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm versions 14.11.5 and 15.08.0-pre3 are now available

2015-03-19 Thread Moe Jette
o a single user. Added "--exclusive=user" option to salloc, sbatch and srun commands. Added "owner" field to node record, visible using the scontrol and sview commands. Added new partition configuration parameter "ExclusiveUser=yes|no". -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support
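The rule behind `--exclusive=user` and the new node "owner" field can be sketched as a placement check. This is a simplified, hypothetical model rather than Slurm's code: a node held by an `--exclusive=user` job accepts only jobs from the same user, and an `--exclusive=user` job can only land on a node where every running job already belongs to that user. Clearing the owner when the last job ends is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1) /* hypothetical marker: node not held by --exclusive=user */
#define MAX_JOBS 8    /* hypothetical per-node job capacity */

struct node_sketch {
    int owner;             /* uid from an --exclusive=user job, or NO_OWNER */
    int n_jobs;            /* jobs currently running on the node */
    int job_uid[MAX_JOBS]; /* uid of each running job */
};

static bool can_place(const struct node_sketch *node, int uid, bool excl_user)
{
    if (node->n_jobs >= MAX_JOBS)
        return false;
    if (node->owner != NO_OWNER)
        return node->owner == uid; /* only the owner may share it */
    if (excl_user)
        for (int i = 0; i < node->n_jobs; i++)
            if (node->job_uid[i] != uid)
                return false;      /* another user's job is already here */
    return true;
}

/* Precondition: can_place() returned true for the same arguments. */
static void place(struct node_sketch *node, int uid, bool excl_user)
{
    node->job_uid[node->n_jobs++] = uid;
    if (excl_user)
        node->owner = uid;
}
```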

[slurm-dev] Re: Array tasks log files

2015-03-25 Thread Moe Jette
u for your help ! Philippe -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Summary of Slurm commands and options online

2015-03-25 Thread Moe Jette
There is a two-page summary of Slurm commands and options available online: http://slurm.schedmd.com/pdfs/summary.pdf We plan to make cards available with this information at upcoming conferences. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Error connecting slurm stream socket at IP:6817: Connection refused

2015-03-30 Thread Moe Jette
ebug2: _slurm_connect failed: Connection refused slurmd: debug2: Error connecting slurm stream socket at 172.16.40.42:6817: Connection refused slurmd: debug: Failed to contact primary controller: Connection refused . . . Can someone help? Best, Jorge Góis -- Morris "Moe" Jette CTO, SchedMD

[slurm-dev] A way to abuse the priority option in Slurm?

2015-03-31 Thread Moe Jette
/commit/4454316ef527b8700743d94c958811a39609e7d5.patch -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: job req fields in lua plugin

2015-04-02 Thread Moe Jette
ne: +43 1 58801 420759 email: samuel.seno...@tuwien.ac.at -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Upgrade Rollbacks

2015-04-02 Thread Moe Jette
e for rolling back a minor and major release of slurm in case something goes wrong? -Paul Edmon- -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Handle cancel from job submit plugin

2015-04-02 Thread Moe Jette
catch that in a job submit plugin? Or some other plugin type? Thanks Martins -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: "CG" state forever?

2015-04-02 Thread Moe Jette
ot seem to have any effect. I am planning on updating versions to the latest but is there anything I can do to prevent or circumvent this? Thanks, ~Mike C. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: "CG" state forever?

2015-04-02 Thread Moe Jette
f man page. Quoting Moe Jette : Slurm can't kill the process, so does not reallocate those resources. See: http://slurm.schedmd.com/troubleshoot.html#completing Quoting Michael Colonno : Hi ~ I've run into this issue with several different versions (currently 14.0.3) and I'v

[slurm-dev] Re: --reboot

2015-04-06 Thread Moe Jette
computing center university of chicago 773.702.1104 -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: scancel and old data.

2015-04-07 Thread Moe Jette
Configure an Epilog script to do that. Quoting Anatoliy Kovalenko : Is there a way to clean all user data on a Slurm node after a task is cancelled via the "scancel job_id" command? We don't have a shared directory, so we copy input/output files to/from the nodes every time. Thank

[slurm-dev] Re: scancel and old data.

2015-04-07 Thread Moe Jette
email to the user when the task is finished. That is why we need a somewhat different tool to do it. Thank you for your help. 2015-04-07 20:47 GMT+03:00 Moe Jette : Configure an Epilog script to do that. Quoting Anatoliy Kovalenko : Is there a way to clean all user data on a Slurm node

[slurm-dev] Re: requeue all jobs

2015-04-08 Thread Moe Jette
please immediately notify the sender via telephone or return mail. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Some work and thoughts about the backfill scheduler in Slurm.

2015-04-13 Thread Moe Jette
major has to be done with the backfill scheduler for Slurm to be usable for us in the future. Not just small patches that add some tweaks to get us through the day. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: What is SPANK logging function slurm_debug?

2015-04-17 Thread Moe Jette
iffer? The man page and Schedmd documentation are silent on these. Thanks, Bob -- Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227 -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: What is SPANK logging function slurm_debug?

2015-04-20 Thread Moe Jette
rminal, but not the slurm_debug. Neither show up in slurmctld.log, where I was expecting to see the slurm_debug. Any idea what I am doing wrong? Thanks, Bob On Fri, 17 Apr 2015, Moe Jette wrote: Messages printed with those functions are only seen if someone has the daemons configured to p

[slurm-dev] Re: Single node configuration- CPU resources

2015-04-20 Thread Moe Jette
pecify the number of jobs that can be run concurrently on a single shared node (assuming available resources)? Thanks, Peter. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: prevent slurm from parsing the full script

2015-04-21 Thread Moe Jette
endungen Deutsches Klimarechenzentrum GmbH (DKRZ) Bundesstraße 45 a, D-20146 Hamburg, Germany Phone: +49 40 460094-144 FAX: +49 40 460094-270 Email: bockelm...@dkrz.de URL: www.dkrz.de Managing Director: Prof. Dr. Thomas Ludwig Registered office: Hamburg, Hamburg District Court (Amtsgericht Hamburg) HRB 39784 -

[slurm-dev] Re: What is SPANK logging function slurm_debug?

2015-04-21 Thread Moe Jette
_local_user_init(). I have not done anything to change config for "DebugFlags". "scontrol show config" says it is (null). So that doesn't seem to explain how the other debug entries have been enabled. Any idea what else I am missing? Thanks, Bob On Mon, 20 Apr 2015, Mo

[slurm-dev] Slurm version 14.11.6 is now available

2015-04-23 Thread Moe Jette
hift when loading job archives. -- ALPS - Added new SchedulerParameters=inventory_interval to specify how often an inventory request is handled. -- ALPS - Don't run a release on a reservation on the slurmctld for a batch job. This is already handled on the stepd when the script finishes. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Truncated array_task_str from slurm_job_info_t api

2015-04-27 Thread Moe Jette
ethod to obtain the full list of tasks actually scheduled via the slurm_job_info_t struct? Cheers, -JX -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: sbcast, prolog and SPANK.

2015-04-29 Thread Moe Jette
run until you actually send a job to the node. I.e. you can send data to a node with sbcast before the prolog runs; this might not be expected or wanted behaviour. Best, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: since version 14.11.6 srun takes 2 cpu by default

2015-04-29 Thread Moe Jette
nique.legr...@pasteur.fr Tel: 01 44 38 95 03 -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: How to ship a SPANK plugin

2015-05-01 Thread Moe Jette
ise empty. -- Bob Moench Voice: (651) 605-9034 Programming Environment -- Debugger Dev. FAX: (651) 605-8972 Cray Inc. 380 Jackson St. Suite 210 Email: r...@cray.com St. Paul, MN 55101 URL: http://www.cray.com/ -- Morris "Moe" Jett

[slurm-dev] Re: How to ship a SPANK plugin

2015-05-01 Thread Moe Jette
ments: slurm_spank_local_user_init slurm_spank_exit and uses: slurm_debug slurm_error spank_get_item spank_context // perhaps unnecessarily Is that still simple enough? Bob On Fri, 1 May 2015, Moe Jette wrote: If your plugin is as simple as you describe, then you probably

[slurm-dev] Re: Spreading Job Array Across Different Nodes

2015-05-01 Thread Moe Jette
motivation for this is network-related. It could be advantageous to spread job arrays across multiple nodes (and, more importantly, multiple racks/broods connected to different switches) if the tasks are network-bound. Thanks, Will -- Morris "Moe" Jette CTO, SchedMD LLC Commer

[slurm-dev] Slurm User Group Meeting 2015, CFP

2015-05-05 Thread Moe Jette
2015: Slurm User Group Meeting 2015 *Program Committee:* Yiannis Georgiou (Bull) Brian Gilmer (Cray) Matthieu Hautreux (CEA) Morris Jette (SchedMD) Bruce Pfaff (NASA Goddard Space Flight Center) Tim Wickberg (The George Washington University) -- Morris "Moe" Jette CTO, SchedMD LLC Commer

[slurm-dev] Re: slurmd on first node not responding and is running

2015-05-06 Thread Moe Jette
available (down or drained). The issue is in Munge's configuration, which Slurm uses for authentication. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Unexpected change between versions 14.11.3 and 14.11.6

2015-05-07 Thread Moe Jette
n Nodes=h1 Default=YES Shared=NO DefaultTime=00:00:01 MaxTime=14400 MaxNodes=1 State=UP -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Issues with cons_res

2015-05-07 Thread Moe Jette
edulerPort=7321 SelectType=select/cons_res SelectTypeParameters=CR_Core_Memory Is there some configuration I'm missing? Thank you! David -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: bug fix in task error reporting

2015-05-08 Thread Moe Jette
(): %s: %m", task->argv[0]); exit(errno); } -- Jon Nelson Dyn / Senior Software Engineer p. +1 (603) 263-8029 -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: (Custom) warnings from job_submit.lua?

2015-05-11 Thread Moe Jette
ng: foobar Is that possible? (This is slurm 14.03.7, btw.) -- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of Oslo -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Dynamically change job over time

2015-05-12 Thread Moe Jette
configured within SLURM. Thanks so much, Alejandro -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Elastic Computing Question

2015-05-14 Thread Moe Jette
do this simply. Thanks! Eric -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Elastic Computing Question

2015-05-14 Thread Moe Jette
I should be able to try that out once I get the latest code installed. Eric On 5/14/15 3:10 PM, Moe Jette wrote: There were some changes made to Slurm version 15.08 to support this type of problem, but they are not available with earlier versions. With the new version (not yet released), you

[slurm-dev] Re: How to set a resource limit for every user in a partition

2015-05-22 Thread Moe Jette
be helpful. Many thanks. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Polling resource allocations on compute nodes locally

2015-05-22 Thread Moe Jette
regards, Olli-Pekka= -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm User Group Meeting 2015, Abstracts for talks due 1 June

2015-05-25 Thread Moe Jette
) -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: scancel job_id.step_id fails in Slurm 14.11.3

2015-05-27 Thread Moe Jette
Riebs Hewlett-Packard Company High Performance Computing +1 404 648 9024 My opinions are not necessarily those of HP -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm User Group Meeting, CFP extension

2015-06-01 Thread Moe Jette
Due to several requests, the deadline for submitting abstracts to the Slurm User Group meeting has been extended to June 5. Meeting information is available here: http://slurm.schedmd.com/slurm_ug_cfp.html -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Messing with job checkpointing

2015-06-02 Thread Moe Jette
ID SPAIN -- Dr. Manuel Rodríguez-Pascual skype: manuel.rodriguez.pascual phone: (+34) 913466173 // (+34) 679925108 CIEMAT-Moncloa Edificio 22, desp. 1.25 Avenida Complutense, 40 28040- MADRID SPAIN -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Limit user to use not more than N number of ntasks/cpu's on specific partition

2015-06-08 Thread Moe Jette
like that, so that no single user can take all the resources in the partition. I want my partition to prevent a user from using more than N CPUs per day or per partition. Thanks, Igor. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Set 1 job per core.

2015-06-09 Thread Moe Jette
/slurmctld.log SlurmdDebug=3 SlurmdLogFile=/site/slurm/log/slurmd.log -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Slow powering up nodes seen as rebooted nodes (ReturnToService=1)

2015-06-10 Thread Moe Jette
f (reg_msg->job_count) { node_ptr->node_state = NODE_STATE_ALLOCATED | node_flags; Didier -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Question about running interactive job on all cores on a list of heterogeneous nodes.

2015-06-11 Thread Moe Jette
ive job on all cores on a list of heterogeneous nodes (meaning that the nodes in the list may have different numbers of cores). Anyone out there know how to accomplish this? Thanks in advance... -- I may be inconsistent, but at least I'm consistently inconsistent. -- Morris "Moe"

[slurm-dev] Re: Scheduling of GPU resources

2015-06-22 Thread Moe Jette
/etc/slurm-llnl/gres.conf Any help with how to achieve my desired setup would be greatly appreciated, because I am unsure how to carry on with troubleshooting. Best, Antonia -- Dr. Antonia Mey University of Edinburgh Department of Chemistry Joseph Black Building -- Morris "Moe"

[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Moe Jette
uting, University of Oslo -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Job name truncated in email

2015-06-24 Thread Moe Jette
al Genomics Tomtebodavägen 23A SE-171 65 Solna, Sweden Email: kenny.bill...@scilifelab.se -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Job name truncated in email

2015-06-24 Thread Moe Jette
e's output, it is truncated in the same way, but not when I issue `scontrol show jobid -d` wkr, Kenny On 24 June 2015 at 16:57, Moe Jette wrote: There is no truncation in current code (src/slurmctld/agent.c): if (job_ptr->array_task_id != NO_VAL) { mi->mes

[slurm-dev] Re: [RFC PATCH] srun: Enable output processing on stdout in pty mode

2015-07-01 Thread Moe Jette
. Dr. Sebastian M. Schmidt -------- -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: How to use job steps for set of single-core runs?

2015-07-03 Thread Moe Jette
allocation of 2 cores.
#! /bin/bash
#SBATCH -n2
srun -n1 prog1 &
srun -n1 prog2 &
srun -n1 prog3 &
srun -n1 prog4 &
wait
Do the 3rd and 4th job steps wait to start until the 1st and/or 2nd step is completed? Thanks -- Morris "Moe" Jette CTO, Sc
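The question can be reasoned about with a small simulation. The sketch below assumes each step is given a dedicated core, waits until one is free, and starts in launch order; the srun documentation of this period describes that kind of deferral for steps launched with `--exclusive` inside an existing allocation, while without that option steps may share CPUs. The function `step_start_times` is hypothetical.

```c
#include <assert.h>

#define MAX_CORES 64 /* hypothetical upper bound on cores in the allocation */

/* Start times of single-core job steps all launched at t=0 inside an
 * allocation of `cores` cores (1 <= cores <= MAX_CORES). Each step takes
 * the core that becomes idle first and holds it for duration[i]. */
static void step_start_times(int cores, int n_steps, const int duration[],
                             int start[])
{
    int free_at[MAX_CORES] = {0}; /* time at which each core becomes idle */

    for (int i = 0; i < n_steps; i++) {
        int c = 0;
        for (int k = 1; k < cores; k++)
            if (free_at[k] < free_at[c])
                c = k;
        start[i] = free_at[c];
        free_at[c] += duration[i];
    }
}
```

With two cores and steps lasting 10, 5, 3 and 4 time units, the first two steps start immediately and the 3rd and 4th start at times 5 and 8, each as soon as a core is released.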

[slurm-dev] Re: job requeued in held state

2015-07-07 Thread Moe Jette
esume) from uid=0 [2015-07-06T20:31:18.469] _slurm_rpc_suspend(resume) for 8 Job is pending execution What can we do to continue execution without breaking or cancelling the job? -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: PATCH - update job QOS before partition

2015-07-07 Thread Moe Jette
ademy for Advanced Telecommunications and Learning Technologies Phone: (979)458-2396 Email: treyd...@tamu.edu Jabber: treyd...@tamu.edu -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: PATCHES - job_submit/lua plugin

2015-07-07 Thread Moe Jette
Only valid for master (15.08pre) branch. - Trey = Trey Dockendorf Systems Analyst I Texas A&M University Academy for Advanced Telecommunications and Learning Technologies Phone: (979)458-2396 Email: treyd...@tamu.edu Jabber: treyd...@tamu.edu -- Morris "Moe"

[slurm-dev] Slurm versions 14.11.8 and 15.08.0-pre6 are now available

2015-07-07 Thread Moe Jette
for authentication. -- job_submit/lua: Add default_qos fields. Add job record qos. Add partition record allow_qos and qos_char fields. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Slurm versions 14.11.8 and 15.08.0-pre6 are now available

2015-07-08 Thread Moe Jette
e, On 7/7/15 7:04 PM, Moe Jette wrote: -- Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all jobs
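The new backfill_interval semantics amount to a time budget on each scheduling pass. A minimal sketch, assuming an injectable clock so the behavior can be checked deterministically; `clock_fn`, `fake_clock`, and `backfill_pass` are hypothetical names, not Slurm internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clock callback returning seconds. */
typedef double (*clock_fn)(void *ctx);

/* Deterministic test clock: each call advances time by `step` seconds. */
struct fake_clock { double t, step; };

static double fake_now(void *ctx)
{
    struct fake_clock *c = ctx;
    double t = c->t;
    c->t += c->step;
    return t;
}

/* Visit queued jobs in priority order until the queue is exhausted or
 * `max_runtime` seconds have elapsed. Returns the number of jobs examined;
 * *timed_out tells the caller to rebuild the job queue and start a fresh
 * pass instead of resuming the stale one. */
static int backfill_pass(int n_jobs, double max_runtime, clock_fn now,
                         void *ctx, bool *timed_out)
{
    double t0 = now(ctx);

    *timed_out = false;
    for (int i = 0; i < n_jobs; i++) {
        if (now(ctx) - t0 >= max_runtime) {
            *timed_out = true;
            return i;
        }
        /* ... attempt to backfill job i here ... */
    }
    return n_jobs;
}
```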

[slurm-dev] Re: Subject: Avoid segfault in delete_resv() caused by invalid RPC

2015-07-27 Thread Moe Jette
of. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------ -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: --comment on srun within salloc

2015-07-31 Thread Moe Jette
ks fine, and "salloc --comment" also works fine. However, "srun --comment" from within an salloc appears as an empty field when viewed with sacct. Martin -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Non-default settings of AuthInfo not being consistently propagated

2015-08-13 Thread Moe Jette
_cred;
- auth_cred = g_slurm_auth_create(NULL, 2, NULL);
+ auth_cred = g_slurm_auth_create(NULL, 2, slurm_get_auth_info());
if (auth_cred == NULL) {
error("authentication: %s", g_slurm_auth_errstr(g_slurm_auth_errno(NULL)) );
Best regards, Daniel Ahli

[slurm-dev] Re: Viewing Status of Inactive Cloud Nodes (sinfo?)

2015-08-14 Thread Moe Jette
to go get some nodes in order to have them) but it is confusing to my users. Is there some way that users can ask sinfo to report on cloud nodes that are powered down? Eric -- Morris "Moe" Jette CTO, SchedMD LLC Commercial

[slurm-dev] Slurm version 15.08.0-rc1 is now available

2015-08-20 Thread Moe Jette
pos If you would like to find out more about these new features and others, please join us at the Slurm User Group meeting: http://slurm.schedmd.com/slurm_ug_agenda.html -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: sview: Unselect job?

2015-08-21 Thread Moe Jette
corresponding to the job are shown. With Slurm 2.4.5 I could unselect the job by pressing either Alt or Ctrl while clicking the job. With Slurm 14.11.08 this doesn't seem to work. Any ideas? Loris -- This signature is currently under construction. -- Morris "Moe" Jette C

[slurm-dev] Re: Non-default settings of AuthInfo not being consistently propagated

2015-08-21 Thread Moe Jette
tions. However, I have not tested these two patches and probably each replacement should be checked by someone who knows more of the surrounding context. Best regards, Daniel On Fri, Aug 14, 2015 at 12:57 AM, Moe Jette wrote: Hi Daniel, You seem to have found two places where the AuthInf

[slurm-dev] Re: srun unable to start in tight loop

2015-08-24 Thread Moe Jette
leep 2". The exact number of successful runs varies by 10 or 20. Am I using up some resource with each run? For completeness, I am running on a Cray system with SLURM 14.11.8 Thanks, Bob -- Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227 -- Morris "Moe"

[slurm-dev] Re: scontrol command allows all the users to see all the job detail

2015-08-27 Thread Moe Jette
s. It is being hijacked for political and monetary gains." -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support === Slurm User Group Meeting, 15-16 September 2015, Washington D.C. http://slurm.schedmd.com/slurm_ug_agenda.html

[slurm-dev] Re: MPI-OpenMP jobs on SLURM fail ORTE_ERROR_LOG: Not found in file ess_slurmd_module.c

2015-08-27 Thread Moe Jette
CoresPerSocket=8 ThreadsPerCore=1 State=UNKNOWN NodeName=DEFAULT Procs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=129009 State=UNKNOWN NodeName=erik[001-044] PartitionName=erik Nodes=erik[001-044] Default=YES MaxTime=INFINITE State=UP -- Morr
