[slurm-users] Building Slurm debian package vs building from source

2024-05-22 Thread Arnuld via slurm-users
We have several nodes, most of which have different Linux distributions
(distro for short). The controller has a different distro as well. The only
thing the controller and all the nodes have in common is that all of them
are x86_64.

I can install Slurm using the package manager on all the machines, but this
will not work because the controller would end up with a different version
of Slurm than the nodes (21.08 vs 23.11).

If I build from source then I see two solutions:
 - build a deb package
 - build a custom package (./configure, make, make install)

Building a Debian package on the controller and then distributing the
binaries to the nodes won't work either, because those binaries will look
for the shared libraries they were built against, and those don't exist on
the nodes.

So the only solution I have is to build a static binary using a custom
package. Am I correct or is there another solution here?

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Shunran Zhang via slurm-users
Hi Arnuld

It is most important to keep the Slurm version the same across the board.

As you mention the "deb" package, I am assuming all of your nodes run
Debian-based distributions that should be close enough to each other.
However, Debian-based distros are not as "binary compatible" as RHEL-based
distros (say RHEL, Alma, Rocky, CentOS, Oracle, Fedora, etc.), so even
though they all use the "deb" format, it is better to avoid sharing one deb
across different distros.

If all of your distros have similar package versions for the dependencies
(at least at the glibc level) and differ only in how packages are named
(e.g. apache2 vs httpd), you could potentially run the same Slurm build on
all of them. In that case, you can work around the naming differences by
listing all of the potential names for each dependency as alternatives in
the DEBIAN/control Depends field.
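
For example, a sketch of such a Depends line, using "|" to separate
alternatives (the package names here are illustrative and vary per distro):

    Package: slurm-smd
    Depends: libmunge2, libyaml-0-2 | libyaml, libhttp-parser2.9 | libhttp-parser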

Statically linked packages or a conda-like environment may help you more if
the distros differ enough to otherwise require a rebuild per distro.
Otherwise, it probably makes more sense to just build on each and every
node with the features that node needs (say, ROCm or NVML support makes no
sense on a node without such devices).

A more complex structure does indeed require more maintenance work. I got
quite tired of it and decided to just ship a RHEL-family OS on all compute
nodes, and let those who are more familiar with some other distro start one
up with Singularity or Docker by themselves.

Sincerely,

S. Zhang


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] SLUG Call for Papers Deadline

2024-05-22 Thread Victoria Hobson via slurm-users
Slurm User Group (SLUG) 2024 is set for September 12-13 at the
University of Oslo in Oslo, Norway.

Registration information and a high-level schedule can be found here:
https://slug24.splashthat.com/

The deadline to submit a presentation abstract is Friday, May 31st. We
do not intend to extend this deadline.

If you are interested in presenting your own usage, developments, site
report, tutorial, etc. about Slurm, please fill out the following form:
https://forms.gle/N7bFo5EzwuTuKkBN7

Notifications of accepted presentations will go out by Friday, June 14th.

--
Victoria Hobson
SchedMD LLC
Vice President of Marketing

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Brian Andrus via slurm-users
Not that I recommend it much, but you can build packages for each
environment and install the ones needed in each.


A simple example is when you have nodes with and without GPUs. You can
build slurmd packages without GPU support for those nodes, and with it for
the ones that have them.


Generally, so long as versions are compatible, they can work together. 
You will need to be aware of differences for jobs and configs, but it is 
possible.
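
As a sketch of the per-environment build (assuming the debian packaging
bundled with Slurm 23.11+; paths and flags are illustrative):

    # on each node type, from the unpacked Slurm source tree:
    sudo apt-get build-dep .   # install build dependencies from debian/control
    debuild -b -uc -us         # produce the slurm-smd*.deb packages for this distro

    # or the plain autotools route:
    ./configure --prefix=/usr --sysconfdir=/etc/slurm
    make && sudo make install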


Brian Andrus


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: memory high water mark reporting

2024-05-22 Thread Cutts, Tim via slurm-users
Users can, of course, always just wrap the job itself in time to record the
maximum memory usage. A bit of a naïve approach, but it does work. I agree
the polling of current usage is not very satisfactory.
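
For example, with GNU time rather than the shell builtin (a sketch; the
application name is a placeholder):

    # -v makes GNU time print "Maximum resident set size (kbytes)" at exit
    srun /usr/bin/time -v ./my_app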

Tim

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca

Find out more about R&D IT Data, Analytics & AI and how we can support you
by visiting our Service Catalogue.


From: greent10--- via slurm-users 
Date: Monday, 20 May 2024 at 12:10
To: Emyr James , Davide DelVento 
Cc: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Re: memory high water mark reporting
Hi,

We have had similar questions from users about how best to find the peak
memory usage of a job: they may run a job and get a not very useful value
in sacct fields such as MaxRSS, because Slurm didn't happen to poll at the
moment of maximum memory usage.

With cgroup v1, from what I can find online, memory.max_usage_in_bytes
takes caches into account, so it can vary with how much I/O is done, whilst
total_rss in memory.stat looks more useful. Maybe memory.peak (cgroup v2)
is clearer?

It's not clear in the documentation how a user should use the sacct values
to infer the actual usage of jobs and correct their behaviour in future
submissions.
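
For reference, this is roughly where those counters live while a job is
still running (a sketch; the exact paths depend on the distro and on
Slurm's cgroup setup, and the uid/job IDs are illustrative):

    # cgroup v1: peak including page cache, and RSS from memory.stat
    cat /sys/fs/cgroup/memory/slurm/uid_1000/job_12345/memory.max_usage_in_bytes
    grep total_rss /sys/fs/cgroup/memory/slurm/uid_1000/job_12345/memory.stat

    # cgroup v2 (kernel 5.19+): high water mark of memory.current
    cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_12345/memory.peak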

I would be keen to see improvements in high water mark reporting. I noticed
that the jobacctgather plugin documentation was deleted back in Slurm 21.08
– a SPANK plugin does possibly look like the way to go. It also seems to be
a common problem across technologies, e.g.
https://github.com/google/cadvisor/issues/3286

Tom

From: Emyr James via slurm-users 
Date: Monday, 20 May 2024 at 10:50
To: Davide DelVento , Emyr James 
Cc: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Re: memory high water mark reporting

Looking here :

https://slurm.schedmd.com/spank.html#SECTION_SPANK-PLUGINS

It looks like it's possible to hook something in at the right place using
the slurm_spank_task_exit or slurm_spank_exit callbacks. Does anyone have
any experience or examples of doing this? Is there any more documentation
available on this functionality?
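
Something like this is what I have in mind – an untested sketch, assuming
getrusage(RUSAGE_CHILDREN) inside slurmstepd reflects the task that was
just reaped:

    /* peakmem_spank.c - sketch: log peak RSS as each task exits.
     * Build: gcc -shared -fPIC -o peakmem.so peakmem_spank.c
     * Load:  add "required /path/to/peakmem.so" to plugstack.conf
     */
    #include <sys/resource.h>
    #include <slurm/spank.h>

    SPANK_PLUGIN(peakmem, 1);

    int slurm_spank_task_exit(spank_t sp, int ac, char **av)
    {
        struct rusage ru;
        /* the task has just been reaped by slurmstepd */
        if (getrusage(RUSAGE_CHILDREN, &ru) == 0)
            slurm_info("peakmem: peak RSS %ld KiB", ru.ru_maxrss);
        return 0;
    }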

Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation


From: Emyr James via slurm-users 
Sent: 17 May 2024 01:15
To: Davide DelVento 
Cc: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Re: memory high water mark reporting

Hi,

I have got a very simple LD_PRELOAD library that can do this. Maybe I
should see if I can force slurmstepd to run with that LD_PRELOAD and then
see if that does it.

Ultimately I am trying to get all the useful accounting metrics into a
ClickHouse database. If the LD_PRELOAD on slurmstepd works, then I can
expand it to insert the relevant row into the ClickHouse DB from the C code
of the preload library.
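
For reference, the core of such a preload library is tiny – a sketch of the
version that just logs to stderr (the ClickHouse insert would replace the
fprintf):

    /* peakmem.c - sketch: report peak RSS when the preloaded process exits.
     * Build: gcc -shared -fPIC -o libpeakmem.so peakmem.c
     * Use:   LD_PRELOAD=/path/to/libpeakmem.so <command>
     */
    #include <stdio.h>
    #include <sys/resource.h>

    __attribute__((destructor))
    static void report_peak_rss(void)
    {
        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0)
            /* ru_maxrss is reported in KiB on Linux */
            fprintf(stderr, "peak RSS: %ld KiB\n", ru.ru_maxrss);
    }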

But still... this seems like a very basic thing to do, and I am very
surprised that it is so difficult to do with the standard accounting
recording out of the box.

Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation


From: Davide DelVento 
Sent: 17 May 2024 01:02
To: Emyr James 
Cc: slurm-users@lists.schedmd.com 
Subject: Re: [slurm-users] memory high water mark reporting

Not exactly the answer to your question (which I don't know), but if you
can prefix whatever is executed with
https://github.com/NCAR/peak_memusage (which also uses getrusage), or a
variant of it, you will be able to do that.

On Thu, May 16, 2024 at 4:10 PM Emyr James via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hi,

We are trying out Slurm, having been running Grid Engine for a long while.
In Grid Engine, the cgroup peak memory and max_rss are generated at the end
of a job and recorded. It logs the information from the cgroup hierarchy,
as well as doing a getrusage call right at the end on the parent pid of the
whole job "container", before cleaning up.
With Slurm, it seems that the only way memory is recorded is by the acct
gather polling. I am trying to add something in an epilog script to get
memory.peak, but it looks like the cgroup hierarchy has been destroyed by
the time the epilog runs.
Where in the code is the cgroup hierarchy cleaned up? Is there no way to
hook something in so that the accounting is updated during the job cleanup
process and peak memory usage can be accurately logged?

I can reduce the polling interval from 30s to 5s, but I don't know if this
causes a lot of overhead.
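
(For reference, that interval is the task= value of JobAcctGatherFrequency
in slurm.conf; the 5 s here is illustrative:)

    # slurm.conf - poll task memory every 5 s instead of the 30 s default
    JobAcctGatherType=jobacct_gather/cgroup
    JobAcctGatherFrequency=task=5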

[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Arnuld via slurm-users
> Not that I recommend it much, but you can build them for each
> environment and install the ones needed in each.

Oh cool, I will download the latest version (23.11.7) and build Debian
packages on every machine then.


> A simple example is when you have nodes with and without GPUs.
> You can build slurmd packages without for those nodes and with for the
> ones that have them.

I do have non-GPU machines. I guess I need to learn to modify the Debian
control files for this.


> Generally, so long as versions are compatible, they can work together.
> You will need to be aware of differences for jobs and configs, but it is
> possible.

You mean the versions of the dependencies are compatible? That is true for
most (like munge) but might not be true for others (like yaml or
http-parser). I need to check on that.



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Shunran Zhang via slurm-users

Hi Arnuld,

What I would probably do is build one for each distro and install it either
directly into /usr/local or via a deb package.


DEBIAN/control is used by apt to manage a couple of things, such as
indexing (so apt search shows what the package is for), which packages it
can replace, which packages are recommended to be installed with it, and
which packages need to be installed before it can work.


For those machines with a certain brand of GPU, you would need a Slurm that
is configured and compiled with that option ON, and the corresponding
device driver listed in DEBIAN/control, so that apt can check that the
driver on the machine meets the requirements of your deb package. You can
forget about the second part if you are not using deb packages and just
compile and run Slurm on the client machine.
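
A sketch of both parts (the configure option is Slurm's --with-nvml; the
driver package name is illustrative and distro-specific):

    # build the GPU flavour against NVML
    ./configure --with-nvml=/usr/local/cuda

    # and gate it on the driver in DEBIAN/control
    Depends: munge, libnvidia-compute-535 | nvidia-driver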


The last thing Brian mentioned is about Slurm versions. A Slurm client of a
lower version (say 23.02) should be able to talk to a slurmctld of a higher
version (say 23.11) just fine, though the reverse does not apply. As for
dependency management, it is complex enough that maintaining a Linux
distribution is quite some work – I know, as I am a maintainer of a Linux
distro that uses dpkg packages but without a Debian root, and uses a
different CLI tool, etc.


In fact I am more worried about how the users would benefit from such a
mixture of execution environments – with a misstep in configuration, or a
user submitting a job without specifying enough about what it asks for, the
job would work or not work purely by chance, depending on which node it
gets executed on and which environment the job's executables were built
against. You probably need at least a couple of "similar" nodes so that
users benefit from the job queue sending their jobs wherever capacity is
available.


Good luck with your setup

Sincerely,

S. Zhang



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Arnuld via slurm-users
> In fact I am more worried about how the users would benefit
> from such a mixture of execution environments
> ...SNIP

So what is an ideal setup? Keep the same deb-based distro on all machines
and use apt to install Slurm on every machine?


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com