[slurm-users] unable to kill namd3 process

2023-04-25 Thread Shaghuf Rahman
Hi,

We are facing an issue in my environment and the behaviour looks strange
to me. It is specifically associated with the namd3 application.
The issue is described below, along with the cases I have observed.

I am trying to understand how to kill the processes of a namd3
application submitted through sbatch without putting the node into a drain state.

What I observed is that when a user submits a single job on a node and then
runs scancel on the namd3 job, the job is killed, the node returns to the idle
state, and everything looks as expected.
But when the user submits multiple jobs on a single node and runs scancel on
one of them, the node is put into the drain state. The other jobs, however,
keep running without any issue.

Because of this, multiple nodes end up in the drain state whenever a user
runs scancel on a namd3 job.

Note: when the user does not run scancel, all jobs complete successfully and
the node states are also fine.

We do not see this behaviour with any other application, so we suspect the
issue is specific to namd3.
Kindly suggest a solution or any ideas on how to fix this issue.

Thanks in advance,
Shaghuf Rahman


Re: [slurm-users] unable to kill namd3 process

2023-04-25 Thread Shaghuf Rahman
Hi,

I also forgot to mention that the process keeps running after the user does
scancel, and the epilog does not clean it up when one job finishes while
multiple jobs have been submitted.
We tried the unkillable option but it did not work; the process still
remains until it is killed manually.
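
In case it helps, this is roughly what we put in slurm.conf for the unkillable
option (the values and the script path are only our own guesses, not a
recommendation):

  # give slurmd longer to clean up before it declares the step unkillable and drains the node
  UnkillableStepTimeout=180
  # optional hook that runs when a step still cannot be killed (placeholder path)
  UnkillableStepProgram=/usr/local/sbin/report_unkillable.sh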





[slurm-users] scanceling a job puts the node in a draining state

2023-04-25 Thread Patrick Goetz

Hi -

This was a known bug:  https://bugs.schedmd.com/show_bug.cgi?id=3941

However, the bug report says this was fixed in version 17.02.7.

The problem is that we're running version 17.11.2 but still appear to be
hitting this bug:


[2023-04-18T17:09:42.482] _slurm_rpc_kill_job: REQUEST_KILL_JOB job 163837 uid 38879
[2023-04-18T17:09:42.482] email msg to sim...@gmail.com: SLURM Job_id=163837 Name=clip_v3_1view_s3dis_mink_crop_075 Ended, Run time 00:37:37, CANCELLED, ExitCode 0
[2023-04-18T17:09:45.104] _slurm_rpc_submit_batch_job: JobId=163843 InitPrio=43243 usec=267
[2023-04-18T17:10:33.057] Resending TERMINATE_JOB request JobId=163837 Nodelist=dgx-4
[2023-04-18T17:10:48.244] error: slurmd error running JobId=163837 on node(s)=dgx-4: Kill task failed
[2023-04-18T17:10:48.244] drain_nodes: node dgx-4 state set to DRAIN
[2023-04-18T17:10:53.524] cleanup_completing: job 163837 completion process took 71 seconds



That particular node is still in a draining state a week later. Just 
wondering if I'm missing something.
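
For now I have been clearing the flag by hand once I confirm nothing from the
job is left on the node, along these lines (node name taken from the log above):

  scontrol update NodeName=dgx-4 State=RESUME

I have also been wondering whether raising UnkillableStepTimeout in slurm.conf
(the default is 60 seconds, as far as I know) would avoid the "Kill task
failed" drain in the first place, but I have not tried that yet.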




Re: [slurm-users] Terminating Jobs based on GrpTRESMins

2023-04-25 Thread Hoot Thompson
So Ole, any thoughts on the config info I sent? 

I’m still not certain if terminating a running job based on GrpTRESMins is even 
possible or supposed to work.
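
For reference, this is roughly how I set the limit and how I have been checking
it (the user name is a placeholder, and I am assuming scontrol show assoc_mgr
reflects the controller's live view of the association):

  # set a deliberately tiny limit on the test association
  sacctmgr modify user where name=testuser set GrpTRESMins=cpu=1
  # inspect what the controller currently knows about that association's usage and limits
  scontrol show assoc_mgr users=testuser
  sshare -l -u testuser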

Hoot


> On Apr 24, 2023, at 3:21 PM, Hoot Thompson  wrote:
> 
> See below…...
> 
>> On Apr 24, 2023, at 1:55 PM, Ole Holm Nielsen  
>> wrote:
>> 
>> On 24-04-2023 18:33, Hoot Thompson wrote:
>>> In my reading of the Slurm documentation, it seems that exceeding the 
>>> limits set in GrpTRESMins should result in terminating a running job. 
>>> However, in testing this, the ‘current value’ of GrpTRESMins only 
>>> updates upon job completion and is not updated as the job progresses. 
>>> Therefore jobs aren’t being stopped. On the positive side, no new jobs are 
>>> started if the limit is exceeded. Here’s the documentation that is 
>>> confusing me…..
>> 
>> I think the job's resource usage will only be added to the Slurm database 
>> upon job completion.  I believe that Slurm doesn't update the resource usage 
>> continually as you seem to expect.
>> 
>>> If any limit is reached, all running jobs with that TRES in this group will 
>>> be killed, and no new jobs will be allowed to run.
>>> Perhaps there is a setting or misconfiguration on my part.
>> 
>> The sacctmgr manual page states:
>> 
>>> GrpTRESMins=TRES=[,TRES=,...]
>>> The total number of TRES minutes that can possibly be used by past, present 
>>> and future jobs running from this association and its children.  To clear a 
>>> previously set value use the modify command with a new value of -1 for each 
>>> TRES id.
>>> NOTE: This limit is not enforced if set on the root association of a 
>>> cluster.  So even though it may appear in sacctmgr output, it will not be 
>>> enforced.
>>> ALSO NOTE: This limit only applies when using the Priority Multifactor 
>>> plugin.  The time is decayed using the value of PriorityDecayHalfLife or 
>>> PriorityUsageResetPeriod as set in the slurm.conf.  When this limit is 
>>> reached all associated jobs running will be killed and all future jobs 
>>> submitted with associations in the group will be delayed until they are 
>>> able to run inside the limit.
>> 
>> Can you please confirm that you have configured the "Priority Multifactor" 
>> plugin?
> Here are the relevant items from slurm.conf
> 
> 
> # Activate the Multifactor Job Priority Plugin with decay
> PriorityType=priority/multifactor
>  
> # apply no decay
> PriorityDecayHalfLife=0
>  
> # reset usage after 1 month
> PriorityUsageResetPeriod=MONTHLY
>  
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
>  
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
>  
> # This next group determines the weighting of each of the
> # components of the Multifactor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=1000
> PriorityWeightFairshare=1
> PriorityWeightJobSize=1000
> PriorityWeightPartition=1000
> PriorityWeightQOS=0 # don't use the qos factor
> 
> 
>> 
>> Your jobs should not be able to start if the user's GrpTRESMins has been 
>> exceeded.  Hence they won't be killed!
> 
> Yes, this works fine
>> 
>> Can you explain step by step what you observe?  It may be that the above 
>> documentation of killing jobs is in error, in which case we should make a 
>> bug report to SchedMD.
> 
> I set the GrpTRESMins limit to a very small number and then ran a sleep job 
> that exceeded the limit. The job continued to run past the limit until I 
> killed it. It was the only job in the queue. And if it makes any difference, 
> this testing is being done in AWS on a parallel cluster.
>> 
>> /Ole



[slurm-users] Questions about SLURM configure options

2023-04-25 Thread Elliott Slaughter
Hi,

I have some questions about SLURM configuration options:

--with-pmix

I was confused about this because I thought that SLURM had its own
first-party PMIx implementation, but I can't see a configuration option to
control it. (And also, building without this option does not appear to
generate a PMIx library.)

I was going to build against https://github.com/openpmix/openpmix; is that
how people normally do this?
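
For concreteness, this is roughly what I was planning to do (install prefixes
are placeholders):

  # build PMIx first, then point Slurm's configure at that install
  cd openpmix && ./configure --prefix=/opt/pmix && make -j install
  cd ../slurm && ./configure --prefix=/opt/slurm --with-pmix=/opt/pmix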

--with-nvml

Is this required for GPU binding, or does it do something else? What would
I lose if I don't use this?
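
For context, what I ultimately want is NVML-based GPU autodetection, which I
assume would look something like this in gres.conf:

  # gres.conf
  AutoDetect=nvml

plus GresTypes=gpu in slurm.conf.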

--with-hwloc

Similarly, I know what hwloc does, but what specific impact does this have
on SLURM? Do I lose CPU core binding if I don't compile with it?

Thanks.

-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it." - Alan Kay


Re: [slurm-users] Questions about SLURM configure options

2023-04-25 Thread Elliott Slaughter
Oh, and also, does it matter if hwloc itself is built against CUDA/NVML or
not? Will SLURM take advantage of that functionality if available, and if
so what capabilities does it add?



-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it." - Alan Kay


Re: [slurm-users] Questions about SLURM configure options

2023-04-25 Thread Paul H. Hargrove
Elliott,

The proper use of `--with-pmix` is documented at
https://slurm.schedmd.com/mpi_guide.html
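
Roughly speaking (from memory, so double-check against that page), you point
configure at an external PMIx install and then select it at launch time:

  ./configure --with-pmix=/opt/pmix   # path is just an example
  srun --mpi=list                     # should list a pmix plugin once it is built in
  srun --mpi=pmix -n 4 ./my_mpi_app   # my_mpi_app is a placeholder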

I don't know anything about your other questions.

-Paul



-- 
Paul H. Hargrove 
Pronouns: he, him, his
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory