It's recommended to round RealMemory down to the next lower gigabyte value, to prevent nodes from entering a drain state after rebooting with a BIOS or kernel update.

Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node configuration"
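A minimal sketch of that rounding (illustrative Python only, not a Slurm tool; the function name is my own):

```python
# Illustrative sketch: round a node's usable memory down to the next
# lower whole gigabyte, then express it in MB as RealMemory expects.
def real_memory_mb(usable_bytes: int) -> int:
    whole_gb = usable_bytes // (1024 ** 3)  # round down to whole GiB
    return whole_gb * 1024                  # RealMemory is given in MB

# Example: a "256 GB" node where the OS reports ~251.5 GiB usable
print(real_memory_mb(270_080_000_000))  # prints 257024
```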

Stephan

On 10.07.20 13:46, Sarlo, Jeffrey S wrote:
If you run  slurmd -C  on the compute node, it should tell you what Slurm thinks the RealMemory number is.
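For reference, slurmd -C prints a node definition line you can paste into slurm.conf. The output below is illustrative only (hostname taken from this thread, memory value invented; exact fields vary by Slurm version):

```
$ slurmd -C
NodeName=deda1x1591 CPUs=20 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=257662
UpTime=12-03:45:10
```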

Jeff

------------------------------------------------------------------------
*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of navin srivastava <navin.alt...@gmail.com>
*Sent:* Friday, July 10, 2020 6:24 AM
*To:* Slurm User Community List <slurm-users@lists.schedmd.com>
*Subject:* Re: [slurm-users] changes in slurm.
Thank you for the answers.

Will RealMemory be based on the total memory value or the total usable memory value?

I mean, if a node has 256 GB of RAM, free -g reports only 251 GB:
deda1x1591:~ # free -g
              total       used       free     shared    buffers     cached
Mem:           251         67        184          6          0         47

So should we set the value to 251*1024 MB or 256*1024 MB? Or is there a Slurm command that will provide the value to use?

Regards
Navin.



On Thu, Jul 9, 2020 at 8:01 PM Brian Andrus <toomuc...@gmail.com <mailto:toomuc...@gmail.com>> wrote:

    Navin,

    1. you will need to restart slurmctld when you make changes to the
    physical definition of a node. This can be done without affecting
    running jobs.

    2. You can have a node in more than one partition. That will not hurt
    anything. Jobs are allocated to nodes, not partitions; the partition is
    used to determine which node(s) to use and to filter/order jobs. You
    should add the node to the new partition, but also leave it in the
    'test' partition. If you are looking to remove the 'test' partition,
    set it to DOWN and, once all the running jobs in it finish, remove it.
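    A slurm.conf sketch of that arrangement (partition/node names are
    illustrative), with the node listed in both partitions and 'test' set
    DOWN so it drains out:

```
# Node appears in both partitions; running jobs keep their allocations.
PartitionName=test   Nodes=Node[1-12] State=DOWN
PartitionName=normal Nodes=Node[1-12] State=UP Default=YES
```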

    Brian Andrus

    On 7/8/2020 10:57 PM, navin srivastava wrote:
     > Hi Team,
     >
     > I have 2 small queries. Because of the lack of a testing
     > environment I am unable to test the scenarios; I am working on
     > setting up a test environment.
     >
     > 1. In my environment I am unable to pass the #SBATCH --mem=2GB
     > option. I found the reason is that there is no RealMemory entry in
     > the node definition in slurm.conf.
     >
     > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12]
     > Sockets=2 CoresPerSocket=10 State=UNKNOWN
     >
     > If I add RealMemory, it should be able to pick it up. So my query
     > here is: is it possible to add RealMemory to the definition at any
     > time while jobs are in progress, then run scontrol reconfigure and
     > reload the daemon on the client nodes? Or do we need to take
     > downtime? (which I don't think so)
     >
     > 2. Also, I would like to know what will happen if some jobs are
     > running in a partition (say test) and I move the associated node to
     > some other partition (say normal) without draining the node. Or if
     > I suspend the job, change the node's partition, and then resume the
     > job. I am not deleting the partition here.
     >
     > Regards
     > Navin.
     >



-------------------------------------------------------------------
Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch
+4144 632 30 59  |  ETF D 104  |  Sternwartstrasse 7  | 8092 Zurich
-------------------------------------------------------------------
