Hi all,
On our setup we are using job_container/tmpfs to give each job its own
temp space. Since our compute nodes have reasonably sized disks, for
tasks that do a lot of disk I/O on users' data, we have asked users to
copy their data to the local disk at the beginning of the task and (if
need be) copy the results back at the end.
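Roughly, the kind of job script we have in mind looks like this (the program
name and paths are just placeholders; with job_container/tmpfs the job's /tmp
is private, node-local space that is cleaned up when the job ends):

    #!/bin/bash
    #SBATCH --job-name=io_heavy
    #SBATCH --time=04:00:00

    # stage input from shared storage onto the job's private local /tmp
    cp -r /shared/project/input /tmp/input

    # run the I/O-heavy step against the local copy
    my_analysis --in /tmp/input --out /tmp/output

    # copy results back to shared storage before the job (and its /tmp) goes away
    cp -r /tmp/output /shared/project/results/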
You could enable debug logging on your slurm controllers to see if that
provides some more useful info. I'd also check your firewall settings to make
sure you're not blocking some traffic that you shouldn't. iptables -F will clear
your local Linux firewall.
I'd also triple check that the UIDs match on all the nodes.
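Something like the following is where I'd start (the slurm/munge account names
are just the usual defaults, adjust to whatever your site uses):

    # raise slurmctld logging while you reproduce the problem, then put it back
    scontrol setdebug debug
    # ... reproduce the issue ...
    scontrol setdebug info

    # temporarily flush the local firewall rules to rule them out
    iptables -F

    # confirm the service account UIDs agree on every node
    id slurm
    id munge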
I'm just learning about slurm. I understand that different
partitions can be prioritized separately, and can have different max time
limits. I was wondering whether or not there was a way to have a
finer-grained prioritization based on the time limit specified by a job,
within a single partition.
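For reference, this is the kind of per-partition setup I mean (partition
names, nodes and limits below are made up):

    PartitionName=short Nodes=node[01-20] MaxTime=04:00:00   PriorityTier=10
    PartitionName=long  Nodes=node[01-20] MaxTime=7-00:00:00 PriorityTier=1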
Yeah, that's sort of the job of the backfill scheduler, as smaller jobs
will fit better into the gaps. There are several options within the
priority framework that you can use to dial in which jobs get which
priority. I recommend reading through all those and finding the options
that will work best for your site.
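As a rough sketch of the sort of knobs I mean (slurm.conf, values are
placeholders rather than a recommendation and will need tuning for your
workload):

    SchedulerType=sched/backfill
    # how far ahead backfill plans (minutes) and its planning granularity (seconds)
    SchedulerParameters=bf_window=4320,bf_resolution=300

    PriorityType=priority/multifactor
    # the job-size factor favors jobs that ask for fewer resources
    PriorityFavorSmall=YES
    PriorityWeightJobSize=1000
    PriorityWeightAge=1000
    PriorityWeightFairshare=10000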