[slurm-users] The issue in the distribution of job

2024-08-09 Thread Sundaram Kumaran via slurm-users
Dear All,
May I have your suggestions on an issue I am facing?
When a job is launched using "salloc -N4 --mem 4000 -p active", it runs on 
only one compute node while the other 3 machines stay idle; the job is not 
distributed evenly.
squeue and scontrol both show the 4 machines allocated to the job, but when I 
check the individual machines, the job is running on only one of them, and it 
takes that whole node.
Is there an issue in my conf file, or is something else needed?

FYI, the command was: salloc -N4 --mem 4000 -p active
In top, I find that only Debussy is heavily used; the job is not evenly 
distributed. May I have your guidance, please?

Regards,
KumaranS


This e-mail and any attachments are only for the use of the intended recipient 
and may contain material that is confidential, privileged and/or protected by 
the Official Secrets Act. If you are not the intended recipient, please delete 
it or notify the sender immediately. Please do not copy or use it for any 
purpose or disclose the contents to any other person.
#
# See the slurm.conf man page for more information.
#
# Legacy configuration
#ControlMachine=wagner
#ControlAddr=10.218.28.8
#BackupController=brahms
#BackupAddr=10.218.28.7

# New configuration
#SlurmctldHost=wagner
#ControlAddr=wagner:10.218.28.8
#SlurmctldHost=brahms
#ControlAddr=brahms:10.218.28.7
#SlurmctldHost=ravel
#ControlAddr=ravel:10.218.28.73
#SlurmctldHost=verdi
#ControlAddr=verdi:10.218.28.74

# New configuration
SlurmctldHost=wagner(10.218.28.8)
SlurmctldHost=brahms(10.218.28.7)
#SlurmctldHost=ravel(10.218.28.73)
#SlurmctldHost=verdi(10.218.28.74)
#SlurmctldHost=debussy(10.218.28.208)
#SlurmctldHost=schubert(10.218.28.207)
#SlurmctldHost=vivaldi(10.218.28.205)


AuthType=auth/munge
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=99
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/true
MaxJobCount=1
MaxStepCount=4
MaxTasksPerNode=512  # Maximum tasks per node (this is a count, not a memory unit)
#MaxTasksPerNode=128  # Maximum tasks per node (this is a count, not a memory unit)
#MpiDefault=pmix
MpiDefault=pmi2
#MpiParams=ports=#-#
PluginDir=/usr/local/lib:/usr/local/lib/slurm:/usr/lib:/lib
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
PrologFlags=x11
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
PropagateResourceLimitsExcept=MEMLOCK
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/d
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurm/ctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
# Resource Limits and Defaults
DefMemPerCPU=2048   # 2048 MB = 2 GB per CPU
MaxMemPerCPU=8192   # 8192 MB = 8 GB per CPU
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
#SelectType=select/linear
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
#OverSubscribe=FORCE:5
LaunchParameters=use_interactive_step
#
#
# JOB PRIORITY
#PriorityFlags=
PriorityType=priority/multifactor
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
# This next group determines the weighting of each of the
# components of the Multifactor Job Priority Plugin.
# The default value for each of the following is 1.
PriorityWeightAge=1000
PriorityWeightFairshare=1
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=0 # don't use the qos factor
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
#AccountingStoreJobComment=YES
ClusterName=cluster
DebugFlags=NO_CONF_HASH
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/

[slurm-users] Re: The issue in the distribution of job

2024-08-09 Thread Renfro, Michael via slurm-users
It may be difficult to narrow down the problem without knowing what commands 
you're running inside the salloc session. For example, if it's a pure OpenMP 
program, it can't use more than one node.
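
One quick check, sketched here under assumptions: salloc only reserves the 
nodes, and commands typed at the resulting prompt run in a single shell on one 
node (with LaunchParameters=use_interactive_step, as in the posted slurm.conf, 
that shell sits on a node of the allocation). Work reaches the other nodes only 
when it is started as a job step, for example with srun. The helper name 
show_step_nodes is made up for illustration.

```shell
# Sketch: run inside the shell that "salloc -N4 --mem 4000 -p active" opens.
# show_step_nodes is a hypothetical helper name, not a Slurm command.
show_step_nodes() {
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        # srun starts a job step across the allocation; with -N4 and no -n
        # it launches one task per node by default, so every allocated
        # node should print its own hostname.
        srun hostname | sort
    else
        echo "not inside a Slurm allocation"
    fi
}

show_step_nodes
```

If this prints four different hostnames while the real workload still loads 
only one node, the workload itself is probably not being started through srun 
or an MPI launcher.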

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Print Slurm Stats on Login

2024-08-09 Thread Paul Edmon via slurm-users
We are working to make our users more aware of their usage. One of the 
ideas we came up with was to have some basic usage stats printed at 
login (usage over the past day, fairshare, job efficiency, etc.). Does anyone 
have any scripts or methods that they use to do this? Before baking my 
own I was curious what other sites do and whether they would be willing to 
share their scripts and methodology.


-Paul Edmon-




[slurm-users] Re: Print Slurm Stats on Login

2024-08-09 Thread Jeffrey T Frey via slurm-users
You'd have to do this within e.g. the system's bashrc infrastructure.  The 
simplest idea would be to add to e.g. /etc/profile.d/zzz-slurmstats.sh and have 
some canned commands/scripts running.  That does introduce load to the system 
and Slurm on every login, though, and slows the startup of login shells based 
on how responsive slurmctld/slurmdbd are at that moment.

Another option would be to run the commands/scripts for all users on some timed 
schedule — e.g. produce per-user stats every 30 minutes.  So long as the stats 
are publicly-visible anyway, put those summaries in a shared file system with 
open read access.  Name the files by uid number.  Now your /etc/profile.d 
script just cat's ${STATS_DIR}/$(id -u).
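
The login-side half of that second option could look roughly like the sketch 
below. It assumes some separate timed job has already written one summary file 
per uid; the default path /shared/slurm-stats and the function name 
show_slurm_stats are placeholders, not anything from an existing site setup.

```shell
# /etc/profile.d/zzz-slurmstats.sh (sketch)
# STATS_DIR is a hypothetical shared directory holding one file per uid.
STATS_DIR="${STATS_DIR:-/shared/slurm-stats}"

show_slurm_stats() {
    stats_file="${STATS_DIR}/$(id -u)"
    # Print the pre-generated summary if it exists; otherwise stay silent
    # so a missing file or an unmounted file system never blocks a login.
    if [ -r "$stats_file" ]; then
        cat "$stats_file"
    fi
}

# Only print for interactive shells, so scp/rsync sessions stay clean.
case $- in
    *i*) show_slurm_stats ;;
esac
```

Because the script only reads a file, login speed no longer depends on how 
responsive slurmctld or slurmdbd are at that moment.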






[slurm-users] Re: Print Slurm Stats on Login

2024-08-09 Thread Paul Edmon via slurm-users
Yeah, I was contemplating doing that so I didn't have a dependency on 
the scheduler being up or down or busy.


What I was more curious about is whether anyone had any prebaked scripts for 
that.


-Paul Edmon-



[slurm-users] Re: Print Slurm Stats on Login

2024-08-09 Thread Reid, Andrew C.E. (Fed) via slurm-users

  Maybe a heavier lift than you had in mind, but check
out xdmod, open.xdmod.org.

  It was developed by the NSF as part of the now-shuttered
XSEDE program, and is useful for both system and user monitoring.

  -- A.

-- 
Dr. Andrew C. E. Reid
Physical Scientist, Computer Operations Administrator
Center for Theoretical and Computational Materials Science
National Institute of Standards and Technology, Mail Stop 8555
Gaithersburg MD 20899 USA
andrew.r...@nist.gov



[slurm-users] Re: Print Slurm Stats on Login

2024-08-09 Thread Paul Edmon via slurm-users
Yup, we have that installed already. It's been very beneficial for 
overall monitoring.


-Paul Edmon-



[slurm-users] Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-09 Thread Jeffrey Layton via slurm-users
Good afternoon,

I know this question has been asked a million times, but what is the
canonical way to convert the list of nodes for a job that is contained in a
Slurm variable (I use SLURM_JOB_NODELIST) to a host list appropriate for
mpirun in OpenMPI (perhaps MPICH as well)?

Before anyone says, compile OpenMPI with Slurm, I can't change the Slurm
installation.

I have a script that does the conversion on a single node, but when I try a
cluster that does not include the single node, I get an error:

scontrol: error: host list is empty

The line in the script corresponding to this is,

list=$(scontrol show hostname $SLURM_NODELIST)

I've tried using the env variable SLURM_JOB_NODELIST and I get the same
error message.

Thanks!

Jeff



[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-09 Thread Paul Edmon via slurm-users
As I recall, OpenMPI needs a list that has one entry per line, 
rather than one separated by spaces. See:


[root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
holy7c[26401-26405]
[root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

[root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
[root@holy7c26401 ~]# echo $list
holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405

The first form would be fine for OpenMPI (though usually you also need 
slots=numranks on each entry, where numranks is the number of ranks per 
host you are trying to set up). The second I don't think would be 
interpreted properly, so you will need to make sure the list is passed 
in a form that mpirun can read. I usually just dump it to a file and 
read that file in, rather than holding it in an environment variable.
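
A minimal sketch of the dump-to-file approach, under assumptions: the function 
name make_hostfile, the file name hostfile.txt, the binary ./my_app and the 
value of 4 ranks per host are all placeholders. The guard on an empty 
SLURM_JOB_NODELIST is there because I would expect an unset variable (for 
example, a script run outside a job) to be one way to end up with an empty 
host list.

```shell
# Sketch: build an Open MPI hostfile from the nodes of the current job.
# make_hostfile and hostfile.txt are hypothetical names.
make_hostfile() {
    slots="$1"   # ranks per host, i.e. "numranks" in the message above
    # Read one hostname per line from stdin and append the slot count.
    while IFS= read -r host; do
        if [ -n "$host" ]; then
            printf '%s slots=%s\n' "$host" "$slots"
        fi
    done
}

if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    # Quote the variable so the compressed list reaches scontrol intact.
    scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile 4 > hostfile.txt
    # mpirun --hostfile hostfile.txt ./my_app   # ./my_app is a placeholder
else
    echo "SLURM_JOB_NODELIST is empty or scontrol is missing" >&2
fi
```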


-Paul Edmon-



[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-09 Thread Hermann Schwärzler via slurm-users

Hi Paul,

On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
> As I recall I think OpenMPI needs a list that has an entry on each line, 
> rather than one separated by a space. See:
> 
> [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
> holy7c[26401-26405]
> [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
> holy7c26401
> holy7c26402
> holy7c26403
> holy7c26404
> holy7c26405
> 
> [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
> [root@holy7c26401 ~]# echo $list
> holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405


Proper quoting does wonders here (please consult the bash man page).
If you try

echo "$list"

you will see that you will get

holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

So you *can* pass this around in a variable if you use "$variable" 
whenever you provide it to a utility.
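
The difference can be reproduced without Slurm at all. The sketch below 
hard-codes a three-line value in place of the scontrol output; the hostnames 
are simply borrowed from the transcript above.

```shell
# A three-line value, like the output captured from "scontrol show hostnames".
list='holy7c26401
holy7c26402
holy7c26403'

# Unquoted: the shell splits the value on whitespace, so echo receives
# three separate words and joins them with spaces on one line.
echo $list

# Quoted: the value is passed as a single argument, newlines intact.
echo "$list"
```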


Regards,
Hermann



[slurm-users] FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
Simple question:

Does FairShare still work if every user is under one account? E.g.:

$ sacctmgr show assoc format=Account,User
   Account       User
---------- ----------
      root
      root       root
       mic
       mic     asmith
       mic     bsmith
       mic     csmith
       mic     djones
       mic     ejones
       mic    frubble


Will it divide time up fairly between the users? I have:

PriorityType=priority/multifactor
PriorityFavorSmall=YES
PriorityWeightAge=5
PriorityWeightFairshare=10
PriorityWeightJobSize=0
PriorityWeightQOS=0

In 21.08.8.



--
Daniel M. Drucker, Ph.D.
Director of IT, MGB Imaging at Belmont
McLean Hospital, a Harvard Medical School Affiliate

The information in this e-mail is intended only for the person to whom it is 
addressed.  If you believe this e-mail was sent to you in error and the e-mail 
contains patient information, please contact the Mass General Brigham 
Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline 
 .
Please note that this e-mail is not secure (encrypted).  If you do not wish to 
continue communication over unencrypted e-mail, please notify the sender of 
this message immediately.  Continuing to send or respond to e-mail after 
receiving this message means you understand and accept this risk and wish to 
continue to communicate over unencrypted e-mail. 



[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Renfro, Michael via slurm-users
I don’t have any 21.08 systems to verify with, but that’s how I remember it. 
Use “sshare -a -A mic” to verify. You should see both a RawShares and a 
NormShares column for each user. By default they’ll all have the same value, 
but they can be adjusted if needed.



[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
Looks like this:

$ sshare -a -A mic
Account  User    RawShares  NormShares  RawUsage  EffectvUsage  FairShare
-------  ------  ---------  ----------  --------  ------------  ---------
mic                    120    0.991736  55524598          1.00
 mic     asmith     parent    0.991736   2532311      0.045607   0.983871
 mic     bsmith     parent    0.991736         0          0.00   0.983871
 mic     csmith     parent    0.991736   3265529      0.058805   0.983871
 mic     djones     parent    0.991736         0          0.00   0.983871
 mic     ejones     parent    0.991736   2210952      0.039820   0.983871
...etc etc etc...

Does that look right?






[slurm-users] Jobs distribution over CPUs

2024-08-09 Thread Rafał Lalik via slurm-users
Hi,

I have a very simple computing farm on a single PC with AMD Ryzen 7950X (2x16 
cores).

I have configured my slurm to use up to 25 CPUs:

NodeName=palmer CPUs=25 RealMemory=4 State=UNKNOWN # Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=2
PartitionName=main Nodes=ALL Default=YES MaxTime=INFINITE State=UP

But using htop I see that with all 25 jobs running I use at most 16 cores. It 
seems that:

  * 6 jobs are using 100% of CPU
  * 20 jobs use 50% CPU each

I use Slurm 24.05.2, but I am pretty sure that in the past, when I was using (I 
think) one of the 22.x versions, the distribution was 25 CPUs for 25 jobs at 
100% each.

Is there anything I can do in the configuration of my farm to get better 
utilisation of the CPU cores? Right now about half of them are not really used.

Regards,
Rafał



[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Renfro, Michael via slurm-users
The format has changed a bit, since none of the entries in our RawShares column are ‘parent’.

But you can test this to be certain.

If your cluster already has jobs pending, have bsmith (who has zero usage) and 
csmith (who has a lot of usage, relatively) each submit several jobs into the 
pending queue. Alternatively, have bsmith and csmith submit jobs with larger 
resource requests: jobs that are large enough to automatically go into a 
pending state due to lack of resources. Those might be jobs that request the 
whole cluster, even.

bsmith’s jobs should get a higher priority as seen from sprio, and bsmith’s 
jobs should start earlier than csmith’s.
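
A rough sketch of how the comparison could be run once both users have jobs 
pending. The helper name compare_priorities is hypothetical; bsmith and csmith 
are the users named above.

```shell
# Sketch: list pending-job priorities for the two test users.
# compare_priorities is a hypothetical helper name, not a Slurm command.
compare_priorities() {
    if command -v sprio >/dev/null 2>&1; then
        # -u restricts the listing to the given users; -l (long) shows each
        # job's user alongside its priority factors.
        sprio -l -u "$1"
    else
        echo "sprio not found; run this on a Slurm login node"
    fi
}

compare_priorities bsmith,csmith
```

Comparing the FAIRSHARE column between the two users' jobs shows whether the 
fairshare factor is what separates their priorities, or whether another factor 
such as age dominates.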

From: Drucker, Daniel 
Date: Friday, August 9, 2024 at 3:11 PM
To: Renfro, Michael 
Cc: slurm-users@lists.schedmd.com 
Subject: Re: [slurm-users] FairShare if there's only one account?

External Email Warning

This email originated from outside the university. Please use caution when 
opening attachments, clicking links, or responding to requests.


Looks like this:

$ sshare -a -A mic
AccountUser  RawShares  NormSharesRawUsage  
EffectvUsage  FairShare
 -- -- --- --- 
- --
mic1200.99173655524598  1.00
 mic   asmith parent0.991736 2532311  0.045607  
 0.983871
 mic  bsmith parent0.991736   0  0.00   
0.983871
 mic   csmith parent0.991736 3265529  0.058805  
 0.983871
 mic djones parent0.991736   0  0.00   
0.983871
 mic ejones parent0.991736 2210952  
0.039820   0.983871
...etc etc etc...

Does that look right?





On Aug 9, 2024, at 4:05 PM, Renfro, Michael via slurm-users 
 wrote:


External Email - Use Caution

I don’t have any 21.08 systems to verify with, but that’s how I remember it. 
Use “sshare -a -A mic” to verify. You should see both a RawShares and a 
NormShares column for each user. By default they’ll all have the same value, 
but they can be adjusted if needed.

From: Drucker, Daniel via slurm-users 
mailto:slurm-users@lists.schedmd.com>>
Date: Friday, August 9, 2024 at 1:39 PM
To: slurm-users@lists.schedmd.com 
mailto:slurm-users@lists.schedmd.com>>
Subject: [slurm-users] FairShare if there's only one account?
External Email Warning

This email originated from outside the university. Please use caution when 
opening attachments, clicking links, or responding to requests.


Simple question:

Does FairShare still work if every user is under one account? E.g.:

$ sacctmgr show assoc format=Account,User
   Account   User
-- --
  root
  root   root
   mic
   mic   asmith
   mic  bsmith
   mic   csmith
   mic djones
   mic ejones
   mic frubble


Will it divide time up fairly between the users? I have:

PriorityType=priority/multifactor
PriorityFavorSmall=YES
PriorityWeightAge=5
PriorityWeightFairshare=10
PriorityWeightJobSize=0
PriorityWeightQOS=0

In 21.08.8.



--
Daniel M. Drucker, Ph.D.
Director of IT, MGB Imaging at Belmont
McLean Hospital, a Harvard Medical School Affiliate

The information in this e-mail is intended only for the person to whom it is 
addressed.  If you believe this e-mail was sent to you in error and the e-mail 
contains patient information, please contact the Mass General Brigham 
Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline .

Please note that this e-mail is not secure (encrypted).  If you do not wish to 
continue communication over unencrypted e-mail, please notify the sender of 
this message immediately.  Continuing to send or respond to e-mail after 
receiving this message means you understand and accept this risk and wish to 
continue to communicate over unencrypted e-mail.

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
I got the opposite result. When I submitted a job as bsmith, they got a lower 
priority (the number was smaller) than the job submitted as csmith.

bsmith (who has never submitted a job before) got a priority of 98387 (which is 
100000 times the 0.983871 FairShare), whereas csmith (who is already running a 
huge number of jobs and has been for days now) got a priority of 103749.
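
The arithmetic behind these two numbers can be checked with a small sketch. The multifactor plugin adds up weight times factor for each priority factor, so the sketch splits each reported priority into a fair-share part and a remainder. The fair-share weight of 100000 is an assumption inferred from 98387 / 0.983871 and is not taken from the quoted configuration. The function name is hypothetical.

```python
# Assumed weight, inferred from 98387 / 0.983871; not read from slurm.conf.
ASSUMED_FAIRSHARE_WEIGHT = 100_000


def fairshare_component(fairshare_factor, weight=ASSUMED_FAIRSHARE_WEIGHT):
    """Fair-share contribution to the job priority, truncated to an integer."""
    return int(weight * fairshare_factor)


# Values reported in the thread.
shared_factor = 0.983871   # FairShare column, identical for bsmith and csmith
bsmith_priority = 98387
csmith_priority = 103749

fs_part = fairshare_component(shared_factor)
other_part_bsmith = bsmith_priority - fs_part
other_part_csmith = csmith_priority - fs_part

if __name__ == "__main__":
    print("fair-share part:", fs_part)
    print("bsmith remainder:", other_part_bsmith)
    print("csmith remainder:", other_part_csmith)
```

Under that assumption both jobs get the same fair-share contribution, and the whole difference between them comes from some other factor, presumably age. That is consistent with both users sharing one FairShare value rather than being ranked individually.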



On Aug 9, 2024, at 5:11 PM, Renfro, Michael  wrote:



The format has changed a bit, since none of the entries in our RawShares column are ‘parent’.

But you can test this to be certain.

If your cluster already has jobs pending, have bsmith (who has zero usage) and 
csmith (who has a lot of usage, relatively) each submit several jobs into the 
pending queue. Alternatively, have bsmith and csmith submit jobs with larger 
resource requests: jobs that are large enough to automatically go into a 
pending state due to lack of resources. Those might be jobs that request the 
whole cluster, even.

bsmith’s jobs should get a higher priority as seen from sprio, and bsmith’s 
jobs should start earlier than csmith’s.


[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Paul Raines via slurm-users


This depends on how you have assigned fairshare in sacctmgr when creating
the accounts and users.  At our site we want fairshare only on accounts
and not users, just like you are seeing, so we create accounts with

  sacctmgr -i add account $acct Description="$descr" \
 fairshare=200 GrpJobsAccrue=8

and users with

  sacctmgr -i add user "$u" account=$acct fairshare=parent

If you want users to have their own independent fairshare, you
do not use fairshare=parent but assign a real number.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)





[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
Hi Paul from over at mclean.harvard.edu!

I have never added any users using sacctmgr - I've always just had everyone I 
guess automatically join the default account, mic. Are you saying that is what 
is causing my problem?

I'm confused I guess because I would have expected that within an account - 
even if there is only one - users would get their 'fair share' of resources, 
rather than just defaulting to FIFO or something. But that doesn't seem to be 
the case.

I do not want any particular user to start out with more priority than any 
other particular user - I just want to make sure that if user A submits a 
million jobs at noon, and user B submits one job at 12:01, user B doesn't have 
to wait until those million jobs finish.

Daniel
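
To make the goal above concrete, here is a minimal sketch of how a per-user fair-share factor would order two pending jobs. It uses the classic fair-share formula from the Slurm documentation, F = 2^(-(U/S)/d), purely as an illustration. Slurm 21.08 defaults to the Fair Tree algorithm, which ranks associations differently, although a heavy user should still end up with the lower factor. The usage and share numbers are hypothetical.

```python
def classic_fairshare_factor(effective_usage, norm_shares, dampening=1.0):
    """Classic Slurm fair-share factor: 2 ** (-(usage / shares) / dampening)."""
    if norm_shares <= 0:
        return 0.0
    return 2.0 ** (-(effective_usage / norm_shares) / dampening)


# Hypothetical numbers: user A has consumed 90% of recent usage, user B none.
# Each user is entitled to half of the account (norm_shares = 0.5).
jobs = [
    ("userA", 0.9, 0.5),
    ("userB", 0.0, 0.5),
]

# Higher factor means higher priority, so sort in descending order.
ranked = sorted(
    jobs,
    key=lambda job: classic_fairshare_factor(job[1], job[2]),
    reverse=True,
)
order = [name for name, _, _ in ranked]

if __name__ == "__main__":
    for name, usage, shares in ranked:
        print(name, round(classic_fairshare_factor(usage, shares), 3))
```

With individual shares, user B's single job sorts ahead of user A's backlog. With fairshare=parent, both users would carry the same factor and only the other priority terms would separate them.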




[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Fulcomer, Samuel via slurm-users
I don't think fairshare use is updated until jobs finish...



[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
Well, let's say user A has completed a million jobs in the last few days as 
well, and user A has never submitted any before.


[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
Er, user B has never.


[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Paul Raines via slurm-users

I have never used Slurm without adding users explicitly first, so I
am not sure what happens in that case.  But from your sshare output it
certainly seems to default to fairshare=parent.

Try modifying the users with

  sacctmgr modify user $username set fairshare=200

and then run sshare -a -A mic to see what has changed.


-- Paul Raines (http://help.nmr.mgh.harvard.edu)
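
If the change is wanted for every user in the account, the commands can be generated rather than typed one by one. The sketch below only builds and prints the command strings as a dry run and does not call sacctmgr. The user list is taken from the sacctmgr listing earlier in the thread, while the helper name, the share value of 200, and the "where ... set" spelling are assumptions to adapt to your site.

```python
import shlex

# User names as they appear in the `sacctmgr show assoc` listing in this thread.
USERS = ["asmith", "bsmith", "csmith", "djones", "ejones", "frubble"]


def fairshare_commands(users, account="mic", shares=200):
    """Build one `sacctmgr modify user` command line per user (dry run only)."""
    return [
        shlex.join([
            "sacctmgr", "-i", "modify", "user", "where",
            f"name={user}", f"account={account}",
            "set", f"fairshare={shares}",
        ])
        for user in users
    ]


if __name__ == "__main__":
    # Print the commands for review; nothing is executed here.
    for command in fairshare_commands(USERS):
        print(command)
```

Giving every user the same numeric value keeps them equal to start with, while still letting usage pull each user's fair-share factor up or down individually.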





[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
NormShares changes to '1' for any user I modify like that. Everyone else has 
0.991736. The "FairShare" column does not change.
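
One possible reading of the NormShares value, offered as an assumption rather than a confirmed description of what sshare does here: if shares are normalized only among siblings that hold an explicit number, a single modified user owns the whole total and shows 1. The sketch below works through that arithmetic. The function name is hypothetical.

```python
def normalized_shares(raw_shares):
    """Divide each explicit share value by the sum over all siblings."""
    total = sum(raw_shares.values())
    return {user: shares / total for user, shares in raw_shares.items()}


# Only one user has an explicit share; users left at 'parent' are not counted.
one_modified = normalized_shares({"bsmith": 200})

# All six users from the thread given the same explicit share.
all_modified = normalized_shares(
    {user: 200 for user in
     ["asmith", "bsmith", "csmith", "djones", "ejones", "frubble"]}
)

if __name__ == "__main__":
    print(one_modified)
    print(all_modified)
```

Under this reading, once every user has the same explicit value each would hold one sixth of the account, and the FairShare column would only start to differ between users as their recorded usage diverges.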


> On Aug 9, 2024, at 6:35 PM, Paul Raines  wrote:
> 
> I have never used Slurm where I have not added users explicitly first so I
> am not sure what happens in that case.  But from your sshare output it
> certainly seems it default to fairshare=parent
> 
> Trying modify the users with
> 
>  sacctmgr modify user $username fairshare=200
> 
> and then run sshare -a -A mic to see what has changed.
> 
> 
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
> 
> 
> 
> On Fri, 9 Aug 2024 5:57pm, Drucker, Daniel wrote:
> 
>> Hi Paul from over at mclean.harvard.edu!
>> 
>> I have never added any users using sacctmgr - I've always just had everyone 
>> I guess automatically join the default account, mic. Are you saying that is 
>> what is causing my problem?
>> 
>> I'm confused I guess because I would have expected that within an account - 
>> even if there is only one - users would get their 'fair share' of resources, 
>> rather than just defaulting to FIFO or something. But that doesn't seem to 
>> be the case.
>> 
>> I do not want any particular user to start out with more priority than any 
>> other particular user - I just want to make sure that if user A submits a 
>> million jobs at noon, and user B submits one job at 12:01, user B doesn't 
>> have to wait until those million jobs finish.
>> 
>> Daniel
>> 
>> 
>> On Aug 9, 2024, at 5:47 PM, Paul Raines  wrote:
>> 
>> 
>> This depends on how you have assigned fairshare in sacctmgr when creating
>> the accounts and users.  At our site we want fairshare only on accounts
>> and not users, just like you are seeing, so we create accounts with
>> 
>> sacctmgr -i add account $acct Description="$descr" \
>>   fairshare=200 GrpJobsAccrue=8
>> 
>> and users with
>> 
>> sacctmgr -i add user "$u" account=$acct fairshare=parent
>> 
>> If you want users to have their own independent fairshare, you
>> do not use fairshare=parent but assign a real number.
>> 
>> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>> 
>> 
>> 
>> On Fri, 9 Aug 2024 5:20pm, Drucker, Daniel via slurm-users wrote:
>> 
>> External Email - Use Caution
>> I got the opposite result. When I submitted a job as bsmith, they got a 
>> lower priority (the number was smaller) than the job submitted as csmith.
>> 
>> bsmith (who has never submitted a job before) got a priority of 98387 (which 
>> is 100000 times the 0.983871 FairShare), whereas csmith (who is already 
>> running a huge number of jobs and has been for days now) got a priority of 
>> 103749.
>> 
>> 
>> 
>> On Aug 9, 2024, at 5:11 PM, Renfro, Michael  wrote:
>> 
>> 
>> The format has changed a bit, since none of our RawShares column is ‘parent’.
>> 
>> But you can test this to be certain.
>> 
>> If your cluster already has jobs pending, have bsmith (who has zero usage) 
>> and csmith (who has a lot of usage, relatively) each submit several jobs 
>> into the pending queue. Alternatively, have bsmith and csmith submit jobs 
>> with larger resource requests: jobs that are large enough to automatically 
>> go into a pending state due to lack of resources. Those might be jobs that 
>> request the whole cluster, even.
>> 
>> bsmith’s jobs should get a higher priority as seen from sprio, and bsmith’s 
>> jobs should start earlier than csmith’s.
>> The information in this e-mail is intended only for the person to whom it is 
>> addressed.  If you believe this e-mail was sent to you in error and the 
>> e-mail contains patient information, please contact the Mass General Brigham 
>> Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline 
>>  .
>> Please note that this e-mail is not secure (encrypted).  If you do not wish 
>> to continue communication over unencrypted e-mail, please notify the sender 
>> of this message immediately.  Continuing to send or respond to e-mail after 
>> receiving this message means you understand and accept this risk and wish to 
>> continue to communicate over unencrypted e-mail.
>> 

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscri

[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Fulcomer, Samuel via slurm-users
Yes, well, in that case, it should work as you desire, modulo your
slurm.conf settings. What are the relevant lines in yours?

On Fri, Aug 9, 2024 at 6:09 PM Drucker, Daniel 
wrote:

> Er, user B has never.
>
> On Aug 9, 2024, at 6:08 PM, Daniel M. Drucker 
> wrote:
>
> Well, let's say user A has completed a million jobs in the last few days
> as well, and user A has never submitted any before.
>
> On Aug 9, 2024, at 6:03 PM, Fulcomer, Samuel 
> wrote:
>
> I don't think fairshare use is updated until jobs finish...
>
> On Fri, Aug 9, 2024 at 5:59 PM Drucker, Daniel via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Hi Paul from over at mclean.harvard.edu!
>>
>> I have never added *any* users using sacctmgr - I've always just had
>> everyone I guess automatically join the default account, *mic*. Are you
>> saying that is what is causing my problem?
>>
>> I'm confused I guess because I would have expected that *within* an
>> account - even if there is only one - users would get their 'fair share' of
>> resources, rather than just defaulting to FIFO or something. But that
>> doesn't seem to be the case.
>>
>> I do not want any particular user to start out with more priority than
>> any other particular user - I just want to make sure that if user A submits
>> a million jobs at noon, and user B submits one job at 12:01, user B doesn't
>> have to wait until those million jobs finish.
>>
>> Daniel
>>
>>
>> On Aug 9, 2024, at 5:47 PM, Paul Raines 
>> wrote:
>>
>>
>> This depends on how you have assigned fairshare in sacctmgr when creating
>> the accounts and users.  At our site we want fairshare only on accounts
>> and not users, just like you are seeing, so we create accounts with
>>
>>  sacctmgr -i add account $acct Description="$descr" \
>>    fairshare=200 GrpJobsAccrue=8
>>
>> and users with
>>
>>  sacctmgr -i add user "$u" account=$acct fairshare=parent
>>
>> If you want users to have their own independent fairshare, you
>> do not use fairshare=parent but assign a real number.
>>
>> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>
>>
>>
>> On Fri, 9 Aug 2024 5:20pm, Drucker, Daniel via slurm-users wrote:
>>
>> I got the opposite result. When I submitted a job as bsmith, they got a
>> lower priority (the number was smaller) than the job submitted as csmith.
>>
>> bsmith (who has never submitted a job before) got a priority of 98387
>> (which is 100000 times the 0.983871 FairShare), whereas csmith (who is
>> already running a huge number of jobs and has been for days now) got a
>> priority of 103749.
>>
>>
>>
>> On Aug 9, 2024, at 5:11 PM, Renfro, Michael  wrote:
>>
>>
>> The format has changed a bit, since none of our RawShares column is
>> ‘parent’.
>>
>> But you can test this to be certain.
>>
>> If your cluster already has jobs pending, have bsmith (who has zero
>> usage) and csmith (who has a lot of usage, relatively) each submit several
>> jobs into the pending queue. Alternatively, have bsmith and csmith submit
>> jobs with larger resource requests: jobs that are large enough to
>> automatically go into a pending state due to lack of resources. Those might
>> be jobs that request the whole cluster, even.
>>
>> bsmith’s jobs should get a higher priority as seen from sprio, and
>> bsmith’s jobs should start earlier than csmith’s.

[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
PriorityType=priority/multifactor
PriorityFavorSmall=YES
PriorityWeightAge=5
PriorityWeightFairshare=10
PriorityWeightJobSize=0
PriorityWeightQOS=0

In 21.08.8.
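
For reference, the multifactor plugin combines these weights as a weighted sum of factors that each lie between 0.0 and 1.0. Below is a minimal Python sketch of that sum, not Slurm's actual implementation. The function name `job_priority` and the dictionary keys are illustrative, and the fairshare weight of 100000 is an assumption inferred from the 98387 priority and 0.983871 FairShare factor reported earlier in the thread (98387 / 0.983871 is about 100000).

```python
def job_priority(weights, factors):
    """Weighted sum in the style of priority/multifactor.

    Sketch only: the site factor, TRES terms and nice value are ignored.
    weights: PriorityWeight* values keyed by factor name (hypothetical keys).
    factors: per-job factors in [0.0, 1.0], keyed the same way.
    """
    total = sum(weights.get(name, 0) * factors.get(name, 0.0) for name in weights)
    return int(total)  # reported as an integer, as in sprio output


# Assumed weights: 100000 for fairshare, everything else 0.
weights = {"age": 0, "fairshare": 100000, "jobsize": 0, "qos": 0}

# A user with no recorded usage and a FairShare factor of 0.983871.
bsmith = job_priority(weights, {"fairshare": 0.983871})
print(bsmith)
```

With only the fairshare term contributing, a job can never score above the fairshare weight, so any priority larger than that weight has to come from another factor such as age.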



On Aug 9, 2024, at 8:36 PM, Fulcomer, Samuel  wrote:


Yes, well, in that case, it should work as you desire, modulo your slurm.conf 
settings. What are the relevant lines in yours?

On Fri, Aug 9, 2024 at 6:09 PM Drucker, Daniel
<ddruc...@mclean.harvard.edu> wrote:
Er, user B has never.

On Aug 9, 2024, at 6:08 PM, Daniel M. Drucker
<ddruc...@mclean.harvard.edu> wrote:

Well, let's say user A has completed a million jobs in the last few days as 
well, and user A has never submitted any before.

On Aug 9, 2024, at 6:03 PM, Fulcomer, Samuel
<samuel_fulco...@brown.edu> wrote:


I don't think fairshare use is updated until jobs finish...

On Fri, Aug 9, 2024 at 5:59 PM Drucker, Daniel via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hi Paul from over at mclean.harvard.edu!

I have never added any users using sacctmgr - I've always just had everyone I 
guess automatically join the default account, mic. Are you saying that is what 
is causing my problem?

I'm confused I guess because I would have expected that within an account - 
even if there is only one - users would get their 'fair share' of resources, 
rather than just defaulting to FIFO or something. But that doesn't seem to be 
the case.

I do not want any particular user to start out with more priority than any 
other particular user - I just want to make sure that if user A submits a 
million jobs at noon, and user B submits one job at 12:01, user B doesn't have 
to wait until those million jobs finish.

Daniel


On Aug 9, 2024, at 5:47 PM, Paul Raines
<rai...@nmr.mgh.harvard.edu> wrote:


This depends on how you have assigned fairshare in sacctmgr when creating
the accounts and users.  At our site we want fairshare only on accounts
and not users, just like you are seeing, so we create accounts with

 sacctmgr -i add account $acct Description="$descr" \
   fairshare=200 GrpJobsAccrue=8

and users with

 sacctmgr -i add user "$u" account=$acct fairshare=parent

If you want users to have their own independent fairshare, you
do not use fairshare=parent but assign a real number.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Fri, 9 Aug 2024 5:20pm, Drucker, Daniel via slurm-users wrote:

I got the opposite result. When I submitted a job as bsmith, they got a lower 
priority (the number was smaller) than the job submitted as csmith.

bsmith (who has never submitted a job before) got a priority of 98387 (which is 
100000 times the 0.983871 FairShare), whereas csmith (who is already running a 
huge number of jobs and has been for days now) got a priority of 103749.



On Aug 9, 2024, at 5:11 PM, Renfro, Michael
<ren...@tntech.edu> wrote:


The format has changed a bit, since none of our RawShares column is ‘parent’.

But you can test this to be certain.

If your cluster already has jobs pending, have bsmith (who has zero usage) and 
csmith (who has a lot of usage, relatively) each submit several jobs into the 
pending queue. Alternatively, have bsmith and csmith submit jobs with larger 
resource requests: jobs that are large enough to automatically go into a 
pending state due to lack of resources. Those might be jobs that request the 
whole cluster, even.

bsmith’s jobs should get a higher priority as seen from sprio, and bsmith’s 
jobs should start earlier than csmith’s.

[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Fulcomer, Samuel via slurm-users
...and what are the top 10-15 lines in your share output?...

On Fri, Aug 9, 2024 at 9:07 PM Drucker, Daniel 
wrote:

> PriorityType=priority/multifactor
> PriorityFavorSmall=YES
> PriorityWeightAge=5
> PriorityWeightFairshare=10
> PriorityWeightJobSize=0
> PriorityWeightQOS=0
>
> In 21.08.8.
>
>
>
> On Aug 9, 2024, at 8:36 PM, Fulcomer, Samuel 
> wrote:
>
> Yes, well, in that case, it should work as you desire, modulo your
> slurm.conf settings. What are the relevant lines in yours?
>
> On Fri, Aug 9, 2024 at 6:09 PM Drucker, Daniel <
> ddruc...@mclean.harvard.edu> wrote:
>
>> Er, user B has never.
>>
>> On Aug 9, 2024, at 6:08 PM, Daniel M. Drucker <
>> ddruc...@mclean.harvard.edu> wrote:
>>
>> Well, let's say user A has completed a million jobs in the last few days
>> as well, and user A has never submitted any before.
>>
>> On Aug 9, 2024, at 6:03 PM, Fulcomer, Samuel 
>> wrote:
>>
>> I don't think fairshare use is updated until jobs finish...
>>
>> On Fri, Aug 9, 2024 at 5:59 PM Drucker, Daniel via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>>> Hi Paul from over at mclean.harvard.edu!
>>>
>>> I have never added *any* users using sacctmgr - I've always just had
>>> everyone I guess automatically join the default account, *mic*. Are you
>>> saying that is what is causing my problem?
>>>
>>> I'm confused I guess because I would have expected that *within* an
>>> account - even if there is only one - users would get their 'fair share' of
>>> resources, rather than just defaulting to FIFO or something. But that
>>> doesn't seem to be the case.
>>>
>>> I do not want any particular user to start out with more priority than
>>> any other particular user - I just want to make sure that if user A submits
>>> a million jobs at noon, and user B submits one job at 12:01, user B doesn't
>>> have to wait until those million jobs finish.
>>>
>>> Daniel
>>>
>>>
>>> On Aug 9, 2024, at 5:47 PM, Paul Raines 
>>> wrote:
>>>
>>>
>>> This depends on how you have assigned fairshare in sacctmgr when creating
>>> the accounts and users.  At our site we want fairshare only on accounts
>>> and not users, just like you are seeing, so we create accounts with
>>>
>>>  sacctmgr -i add account $acct Description="$descr" \
>>>    fairshare=200 GrpJobsAccrue=8
>>>
>>> and users with
>>>
>>>  sacctmgr -i add user "$u" account=$acct fairshare=parent
>>>
>>> If you want users to have their own independent fairshare, you
>>> do not use fairshare=parent but assign a real number.
>>>
>>> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>
>>>
>>>
>>> On Fri, 9 Aug 2024 5:20pm, Drucker, Daniel via slurm-users wrote:
>>>
>>> I got the opposite result. When I submitted a job as bsmith, they got a
>>> lower priority (the number was smaller) than the job submitted as csmith.
>>>
>>> bsmith (who has never submitted a job before) got a priority of 98387
>>> (which is 100000 times the 0.983871 FairShare), whereas csmith (who is
>>> already running a huge number of jobs and has been for days now) got a
>>> priority of 103749.
>>>
>>>
>>>
>>> On Aug 9, 2024, at 5:11 PM, Renfro, Michael  wrote:
>>>
>>>
>>> The format has changed a bit, since none of our RawShares column is
>>> ‘parent’.
>>>
>>> But you can test this to be certain.
>>>
>>> If your cluster already has jobs pending, have bsmith (who has zero
>>> usage) and csmith (who has a lot of usage, relatively) each submit several
>>> jobs into the pending queue. Alternatively, have bsmith and csmith submit
>>> jobs with larger resource requests: jobs that are large enough to
>>> automatically go into a pending state due to lack of resources. Those might
>>> be jobs that request the whole cluster, even.
>>>
>>> bsmith’s jobs should get a higher priority as seen from sprio, and
>>> bsmith’s jobs should start earlier than csmith’s.

[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Fulcomer, Samuel via slurm-users
"sshare", not share

And note that the high PriorityWeightAge may be complicating things. We set
it to 0. With it set so high, it allows users to gain priority by flooding
the queue if you allow high numbers of job submissions and they age up in
priority while they're waiting to run.
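
A rough Python sketch of that effect follows. It assumes the age factor grows linearly with the time a job has been eligible in the queue and saturates at PriorityMaxAge (7 days by default). The weights, FairShare factors and wait times are made-up illustration values, not taken from anyone's cluster.

```python
def age_factor(wait_days, max_age_days=7.0):
    """Age factor: rises linearly with queue wait and is capped at 1.0."""
    return min(wait_days / max_age_days, 1.0)


def priority(weight_age, weight_fairshare, wait_days, fairshare_factor):
    """Age and fairshare terms only; the other factors are left out of this sketch."""
    return weight_age * age_factor(wait_days) + weight_fairshare * fairshare_factor


# Hypothetical users: a heavy user whose jobs have queued for a week,
# and a new user who has just submitted.
heavy = {"wait_days": 7.0, "fairshare_factor": 0.60}
new = {"wait_days": 0.0, "fairshare_factor": 0.98}

for weight_age in (50000, 0):  # with and without an age weight (assumed values)
    p_heavy = priority(weight_age, 100000, **heavy)
    p_new = priority(weight_age, 100000, **new)
    first = "heavy user" if p_heavy > p_new else "new user"
    print(f"PriorityWeightAge={weight_age}: {first} ranks first")
```

In this toy setup the aged jobs of the heavy user outrank the newcomer as long as the age weight is large, and the order flips once the age weight is 0.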

On Fri, Aug 9, 2024 at 9:15 PM Fulcomer, Samuel 
wrote:

> ...and what are the top 10-15 lines in your share output?...
>
> On Fri, Aug 9, 2024 at 9:07 PM Drucker, Daniel <
> ddruc...@mclean.harvard.edu> wrote:
>
>> PriorityType=priority/multifactor
>> PriorityFavorSmall=YES
>> PriorityWeightAge=5
>> PriorityWeightFairshare=10
>> PriorityWeightJobSize=0
>> PriorityWeightQOS=0
>>
>> In 21.08.8.
>>
>>
>>
>> On Aug 9, 2024, at 8:36 PM, Fulcomer, Samuel 
>> wrote:
>>
>> Yes, well, in that case, it should work as you desire, modulo your
>> slurm.conf settings. What are the relevant lines in yours?
>>
>> On Fri, Aug 9, 2024 at 6:09 PM Drucker, Daniel <
>> ddruc...@mclean.harvard.edu> wrote:
>>
>>> Er, user B has never.
>>>
>>> On Aug 9, 2024, at 6:08 PM, Daniel M. Drucker <
>>> ddruc...@mclean.harvard.edu> wrote:
>>>
>>> Well, let's say user A has completed a million jobs in the last few days
>>> as well, and user A has never submitted any before.
>>>
>>> On Aug 9, 2024, at 6:03 PM, Fulcomer, Samuel 
>>> wrote:
>>>
>>> I don't think fairshare use is updated until jobs finish...
>>>
>>> On Fri, Aug 9, 2024 at 5:59 PM Drucker, Daniel via slurm-users <
>>> slurm-users@lists.schedmd.com> wrote:
>>>
>>>> Hi Paul from over at mclean.harvard.edu!
>>>>
>>>> I have never added *any* users using sacctmgr - I've always just had
>>>> everyone I guess automatically join the default account, *mic*. Are
>>>> you saying that is what is causing my problem?
>>>>
>>>> I'm confused I guess because I would have expected that *within* an
>>>> account - even if there is only one - users would get their 'fair share' of
>>>> resources, rather than just defaulting to FIFO or something. But that
>>>> doesn't seem to be the case.
>>>>
>>>> I do not want any particular user to start out with more priority than
>>>> any other particular user - I just want to make sure that if user A submits
>>>> a million jobs at noon, and user B submits one job at 12:01, user B doesn't
>>>> have to wait until those million jobs finish.
>>>>
>>>> Daniel
>>>>
>>>>
>>>> On Aug 9, 2024, at 5:47 PM, Paul Raines
>>>> wrote:
>>>>
>>>>
>>>> This depends on how you have assigned fairshare in sacctmgr when creating
>>>> the accounts and users.  At our site we want fairshare only on accounts
>>>> and not users, just like you are seeing, so we create accounts with
>>>>
>>>>  sacctmgr -i add account $acct Description="$descr" \
>>>>    fairshare=200 GrpJobsAccrue=8
>>>>
>>>> and users with
>>>>
>>>>  sacctmgr -i add user "$u" account=$acct fairshare=parent
>>>>
>>>> If you want users to have their own independent fairshare, you
>>>> do not use fairshare=parent but assign a real number.
>>>>
>>>> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>>>>
>>>>
>>>>
>>>> On Fri, 9 Aug 2024 5:20pm, Drucker, Daniel via slurm-users wrote:
>>>>
>>>> I got the opposite result. When I submitted a job as bsmith, they got a
>>>> lower priority (the number was smaller) than the job submitted as csmith.
>>>>
>>>> bsmith (who has never submitted a job before) got a priority of 98387
>>>> (which is 100000 times the 0.983871 FairShare), whereas csmith (who is
>>>> already running a huge number of jobs and has been for days now) got a
>>>> priority of 103749.
>>>>
>>>>
>>>>
>>>> On Aug 9, 2024, at 5:11 PM, Renfro, Michael  wrote:
>>>>
>>>>
>>>> The format has changed a bit, since none of our RawShares column is
>>>> ‘parent’.
>>>>
>>>> But you can test this to be certain.
>>>>
>>>> If your cluster already has jobs pending, have bsmith (who has zero
>>>> usage) and csmith (who has a lot of usage, relatively) each submit several
>>>> jobs into the pending queue. Alternatively, have bsmith and csmith submit
>>>> jobs with larger resource requests: jobs that are large enough to
>>>> automatically go into a pending state due to lack of resources. Those might
>>>> be jobs that request the whole cluster, even.
>>>>
>>>> bsmith’s jobs should get a higher priority as seen from sprio, and
>>>> bsmith’s jobs should start earlier than csmith’s.

[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users

> On Aug 9, 2024, at 9:15 PM, Fulcomer, Samuel  
> wrote:
> ...and what are the top 10-15 lines in your share output?... 

See the 4:10PM message in this thread.



[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Drucker, Daniel via slurm-users
On Aug 9, 2024, at 9:21 PM, Fulcomer, Samuel  wrote:
> And note that the high PriorityWeightAge may be complicating things. We set 
> it to 0. With it set so high, it allows users to gain priority by flooding 
> the queue if you allow high numbers of job submissions and they age up in 
> priority while they're waiting to run. 

That's a great point. Changed to 0.



[slurm-users] Re: FairShare if there's only one account?

2024-08-09 Thread Fulcomer, Samuel via slurm-users
For users with a parent account of "mic", I'd expect the RawShares to be
listed as "1", not "parent".
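
To make the difference concrete, here is a small Python sketch of how sibling users under one account could be ranked when each has a numeric share. It follows the Fair Tree idea of comparing each user's share of the account against that user's portion of the account's usage. The function name and the usage numbers are hypothetical, and the real plugin goes on to convert the ranking into a fairshare factor, which is omitted here.

```python
def rank_siblings(raw_shares, raw_usage):
    """Order sibling users from most to least deserving (sketch of one Fair Tree level).

    raw_shares: RawShares per user; raw_usage: accumulated usage per user.
    """
    total_shares = sum(raw_shares.values())
    total_usage = sum(raw_usage.values())

    def level_fairshare(user):
        share = raw_shares[user] / total_shares  # share normalized among siblings
        usage = raw_usage[user] / total_usage if total_usage else 0.0
        # A user with no usage is treated as maximally under-served.
        return float("inf") if usage == 0 else share / usage

    return sorted(raw_shares, key=level_fairshare, reverse=True)


# Two users in account "mic", each with RawShares=1 (usage values are made up).
shares = {"bsmith": 1, "csmith": 1}
usage = {"bsmith": 0, "csmith": 5_000_000}
print(rank_siblings(shares, usage))
```

With fairshare=parent the users have no share of their own to compare in this way, so they would be expected to inherit the account's standing rather than be ranked against each other.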

What's the "sprio" output for two jobs of users A and B, and which of them
hasn't run any jobs?

Also, the first 15 lines of output for "sshare" (no arguments) would be
useful for me.



On Fri, Aug 9, 2024 at 9:52 PM Drucker, Daniel 
wrote:

> On Aug 9, 2024, at 9:21 PM, Fulcomer, Samuel 
> wrote:
> > And note that the high PriorityWeightAge may be complicating things. We
> set it to 0. With it set so high, it allows users to gain priority by
> flooding the queue if you allow high numbers of job submissions and they
> age up in priority while they're waiting to run.
>
> That's a great point. Changed to 0.
