On 4/10/24 10:41 pm, archisman.pathak--- via slurm-users wrote:
> In our case, that node has been removed from the cluster and cannot be
> added back right now (it is being used for some other work). What can we
> do in such a case?
Mark the node as "DOWN" in Slurm; this is what we do when we get jobs stuck
on a node we cannot bring back right away.
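For example, something along these lines (the node name is just a
placeholder):

  # tell slurmctld to stop considering the node; Reason is free text
  scontrol update NodeName=node001 State=DOWN Reason="removed from cluster"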
Could you give more details about this and how you debugged it?
Yes, you can build the EL8 RPMs on EL9; look at 'mock' to do so. I did
something similar when I still had to support EL7.
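Roughly like this on the EL9 build host (the package version and config
name below are only examples, adjust to what you actually build):

  dnf install -y mock mock-core-configs
  # rebuild the Slurm source RPM inside an AlmaLinux 8 chroot
  mock -r almalinux-8-x86_64 --rebuild slurm-23.11.5-1.src.rpm
  # the resulting binary RPMs land under /var/lib/mock/almalinux-8-x86_64/result/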
It's a fairly generic plan; the devil is in the details and in verifying
each step, but those are the basic bases you need to touch.
Brian Andrus
On 4/10/2024 1:48 PM, Steve Berg via slurm-users wrote:
> I just finished migrating a few dozen blade servers from Torque to
> Slurm. They're all running Alma 8 currently with the Slurm that is
> available from EPEL. I do want to get it all upgraded to running Alma 9
> and the current version of Slurm. Got one system set up as the
> slurmctld system running
We have Weka filesystems on one of our clusters and saw this. We discovered
we had slightly misconfigured the Weka client, and the result was that
Weka's and Slurm's cgroups were fighting with each other, which seemed to
cause this behaviour. Fixing the Weka cgroups config improved the problem
for us.
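If it helps anyone hitting the same thing, one sanity check is to look at
which cgroup a job's tasks actually land in (the job ID below is a
placeholder):

  # on the compute node, list the PIDs belonging to the job's steps
  scontrol listpids 12345
  # then inspect the cgroup membership of one of those PIDs
  cat /proc/<pid>/cgroup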
Usually to clear jobs like this you have to reboot the node they are on.
That will then force the scheduler to clear them.
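If the node is still up and reachable, something like this usually does it
(node name is a placeholder):

  # drain the node, reboot it once it is free, then return it to service
  scontrol reboot ASAP nextstate=RESUME reason="clear stuck jobs" node001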
-Paul Edmon-
On 4/10/2024 2:56 AM, archisman.pathak--- via slurm-users wrote:
We are running a slurm cluster with version `slurm 22.05.8`. One of our users
has reported t
On Tue, 2024-04-09 at 11:07:32 -0700, Slurm users wrote:
> Hi everyone, I'm conducting some tests. I've just set up SLURM on the head
> node and haven't added any compute nodes yet. I'm trying to test it to
> ensure it's working, but I'm encountering an error: 'Nodes required for the
> job are DOWN
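A quick way to see why Slurm has marked the node(s) DOWN (node name is a
placeholder):

  sinfo -R                     # list down/drained nodes with the recorded reason
  scontrol show node node001   # check State, Reason and whether slurmd has registered
  # once slurmd on the node is reachable and healthy, return it to service
  scontrol update NodeName=node001 State=RESUME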
Various options might help reduce job fragmentation.

Turn up debugging on slurmctld and add DebugFlags like TraceJobs,
SelectType, and Steps. With debugging set high enough you can see a good
bit of the logic behind node selection.

Also look at CR_LLN in SelectTypeParameters: schedule resources to jobs on
the least loaded nodes.
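A sketch of the relevant slurm.conf lines (the debug level and the
CR_Core_Memory base value are just examples):

  SlurmctldDebug=debug3
  DebugFlags=TraceJobs,SelectType,Steps
  # CR_LLN = schedule jobs onto the least loaded nodes
  SelectType=select/cons_tres
  SelectTypeParameters=CR_Core_Memory,CR_LLN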
Does anybody here have a nice visualization of JobComp and JobAcctGather
data in Grafana?

I save JobComp data in Elasticsearch and JobAcctGather data in InfluxDB,
and am thinking about how to provide meaningful insights to $users.

Things I'd like to show: especially memory & CPU utilization, job r
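For context, the plumbing is roughly the standard plugin setup (host names,
index and database names below are placeholders):

  # slurm.conf
  JobCompType=jobcomp/elasticsearch
  JobCompLoc=http://elastic.example.com:9200/slurm/_doc
  AcctGatherProfileType=acct_gather_profile/influxdb

  # acct_gather.conf
  ProfileInfluxDBHost=influx.example.com:8086
  ProfileInfluxDBDatabase=slurm_profile
  ProfileInfluxDBDefault=ALL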