Re: [slurm-users] Custom Gres for SSD

Matthias Loose Mon, 24 Jul 2023 00:52:52 -0700

Hi Shunran,

We do something very similar. I have nodes with 2 SSDs in a RAID 1 mounted on /local. We defined a gres resource just like you and called it local. We define the resource in gres.conf like this:


  # LOCAL
  NodeName=hpc-node[01-10] Name=local

and add the resource, counted in GB, to the slurm.nodes.conf:

  NodeName=hpc-node01  CPUs=256 RealMemory=... Gres=local:3370

So in this case node01 has 3370 counts (GB) of the gres "local" available for reservation. Now Slurm tracks that resource for you and users can reserve counts of /local space. But there is still one big problem: Slurm has no idea what local is and, as you correctly noted, others can just use it. I solved this the following way:


- /local is owned by root, so no user can just write to it

- the node prolog creates a folder in /local in this form: /local/job_<SLURM_JOB_ID> and gives ownership of it to the job's user

- the node epilog deletes that folder
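
From the user's side, a job would then reserve space and work inside its own folder. The following is only a sketch of such a batch script, assuming the prolog described above has created the folder; the 100 GB request and the helper name job_scratch_dir are made up for illustration:

```shell
#!/bin/bash
#SBATCH --gres=local:100   # reserve 100 counts (GB) of the "local" gres

# Hypothetical helper: path of the per-job folder the prolog is described
# as creating (/local/job_<SLURM_JOB_ID>).
job_scratch_dir() {
  echo "/local/job_${SLURM_JOB_ID:-}"
}

scratch=$(job_scratch_dir)
if [[ -d "${scratch}" && -w "${scratch}" ]]; then
  # Do the I/O-heavy work inside the job's own folder.
  cd "${scratch}" && echo "working in ${scratch}"
else
  # Jobs that did not request the gres get no folder and cannot write to /local.
  echo "no job scratch folder at ${scratch}" >&2
fi
```

A job that did not request local ends up in the else branch, because /local itself is only writable by root.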

This alone solves the problem of people/jobs using /local without having reserved any. But there is still no enforcement of limits. For that I use quotas. My /local is XFS formatted, and XFS has a nifty feature called project quotas, where you can set the quota for a folder.
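
XFS only tracks project quotas if they were enabled when the filesystem was mounted, so it can be worth checking that on each node before relying on the limits. A minimal sketch, assuming the mount point is /local and that findmnt (util-linux) is available; has_project_quota is a made-up helper name:

```shell
#!/bin/bash
# Hypothetical helper: succeed if a comma-separated mount option string
# contains pquota or prjquota (both spellings enable XFS project quotas).
has_project_quota() {
  local opts=",$1,"
  [[ "${opts}" == *,pquota,* || "${opts}" == *,prjquota,* ]]
}

# Read the active mount options of /local; empty if it is not mounted.
opts=$(findmnt -no OPTIONS /local 2>/dev/null || true)
if has_project_quota "${opts}"; then
  echo "project quotas enabled on /local"
else
  echo "project quotas NOT enabled on /local"
fi
```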


This is my node prolog script for this purpose:

  #!/bin/bash
  PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

  local_dir="/local"
  local_job=0

  ## DETERMINE GRES:LOCAL
  # get job gres
  JOB_TRES=$(scontrol show JobID=${SLURM_JOBID} | grep "TresPerNode=" | cut -d '=' -f 2 | tr ',' ' ')

  # parse for local
  for gres in ${JOB_TRES}; do
    key=$(echo ${gres} | cut -d ':' -f 2 | tr '[:upper:]' '[:lower:]')
    if [[ ${key} == "local" ]]; then
      local_job=$(echo ${gres} | cut -d ':' -f 3)
      break
    fi
  done

  # make job local-dir if requested
  if [[ ${local_job} -ne 0 ]]; then
    # make local-dir for job
    SLURM_TMPDIR="${local_dir}/job_${SLURM_JOBID}"
    mkdir ${SLURM_TMPDIR}

    # convert GB to KiB (the limits below use the k suffix)
    local_job=$((local_job * 1024 * 1024))

    # set hard limit to requested size + 5%
    hard_limit=$((local_job * 105 / 100))

    # create project quota and set limits
    xfs_quota -x -c "project -s -p ${SLURM_TMPDIR} ${SLURM_JOBID}" ${local_dir}
    xfs_quota -x -c "limit -p bsoft=${local_job}k bhard=${hard_limit}k ${SLURM_JOBID}" ${local_dir}

    chown ${SLURM_JOB_USER}:0 ${SLURM_TMPDIR}
    chmod 750 ${SLURM_TMPDIR}
  fi

  exit 0
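
The gres lookup in the prolog can be tried out without a running job by isolating the parsing step. This is only a sketch: it assumes the TresPerNode value looks like gres:local:100, as the cut fields in the script imply, and the exact scontrol output may differ between Slurm versions. The function name parse_local_gres is made up:

```shell
#!/bin/bash
# Hypothetical helper: print the count requested for the gres "local" from a
# TresPerNode value such as "gres:gpu:2,gres:local:100", or 0 if absent.
parse_local_gres() {
  local tres="$1" entry name
  local IFS=','            # split the value on commas inside this function only
  for entry in ${tres}; do
    # field 2 is the gres name, field 3 its count
    name=$(echo "${entry}" | cut -d ':' -f 2 | tr '[:upper:]' '[:lower:]')
    if [[ "${name}" == "local" ]]; then
      echo "${entry}" | cut -d ':' -f 3
      return 0
    fi
  done
  echo 0
}

parse_local_gres "gres:gpu:2,gres:local:100"   # prints 100
parse_local_gres "gres:gpu:2"                  # prints 0
```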

This is my epilog:

  #!/bin/bash
  PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

  local_dir="/local"
  SLURM_TMPDIR="${local_dir}/job_${SLURM_JOBID}"

  # remove the quota
  xfs_quota -x -c "limit -p bsoft=0m bhard=0m ${SLURM_JOBID}" ${local_dir}

  # remove the folder
  if [[ -d ${SLURM_TMPDIR} ]]; then
    rm -rf --one-file-system ${SLURM_TMPDIR}
  fi

  exit 0

In order to use project quotas you need to activate them with the mount flag pquota in the fstab. I give the user 5% more than was requested. You just have to make sure that you configure available space - 5% in the nodes.conf.
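
The numbers involved can be checked with plain shell arithmetic. The sketch below mirrors the conversion in the prolog (GB to KiB, hard limit 5% above the soft limit) and the "available space - 5%" rule for the Gres count. The function names and the example sizes are made up for illustration:

```shell
#!/bin/bash
# Soft and hard block limits in KiB for a request of N GB (hard = soft + 5%).
quota_limits_kib() {
  local requested_gb="$1"
  local soft=$((requested_gb * 1024 * 1024))
  local hard=$((soft * 105 / 100))
  echo "${soft} ${hard}"
}

# Gres count to configure for N GB of available space (available - 5%).
advertised_gres_gb() {
  local capacity_gb="$1"
  echo $((capacity_gb * 95 / 100))
}

quota_limits_kib 100      # prints "104857600 110100480"
advertised_gres_gb 1000   # prints "950"
```

With 1000 GB available and 950 configured, even a fully reserved node whose jobs all run up to their hard limits needs 950 * 105 / 100 = 997 GB, which still fits.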


This is what we do and it works great.

Kind regards, Matt


On 2023-07-24 05:48, Shunran Zhang wrote:

Hi all,

I am attempting to set up a gres to manage jobs that need a
scratch space, but only a few of our computational nodes are
equipped with SSDs for such scratch space. Originally I set up a new
partition for those IO-bound jobs, but it turned out that those jobs
might be allocated to the same node and thus fight each other for
IO.

Looking over the other settings, gres looks promising. However,
I am having some difficulty figuring out how to limit access to
such space to those who requested --gres=ssd:1.

For now I am using Flags=CountOnly and trusting users who use the
SSD to request it, but apparently any job submitted to a node with
an SSD can just use that space. Our scratch space implementation is 2
disks (sda and sdb) formatted as btrfs in RAID 0. What should I
do to enforce which jobs can use that space?

Related configurations for ref:

gres.conf:
  NodeName=scratch-1 Name=ssd Flags=CountOnly

cgroup.conf:
  ConstrainDevices=yes

slurm.conf:
  GresTypes=gpu,ssd
  NodeName=scratch-1 CPUs=88 Sockets=2 CoresPerSocket=22 ThreadsPerCore=2 RealMemory=180000 Gres=ssd:1 State=UNKNOWN

Sincerely,
S. Zhang
