Please see the latest update
# for i in {0..2}; do scontrol show node compute-0-$i | grep RealMemory;
done && scontrol show node hpc | grep RealMemory
RealMemory=64259 AllocMem=1024 FreeMem=57163 Sockets=32 Boards=1
RealMemory=120705 AllocMem=1024 FreeMem=97287 Sockets=32 Boards=1
RealMem
Hi Mahmood,
Your running job is requesting 6 CPUs per node (4 nodes, 6 CPUs per node). That
means 6 CPUs are being used on node hpc.
Your queued job is requesting 5 CPUs per node (4 nodes, 5 CPUs per node). In
total, if it were running, that would require 11 CPUs on node hpc. But hpc only
has 10 cores.
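A quick way to confirm the arithmetic on the node itself (a general sketch; adjust the node name to your cluster) is to compare allocated versus total CPUs and list the jobs occupying it:
$ scontrol show node hpc | grep -E -o 'CPU(Alloc|Tot)=[0-9]+'
$ squeue -w hpc -o "%.10i %.9u %.4C %.10T"    # jobs on hpc with their CPU counts (%C)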
Hi Tina,
The problem was that Slurm was able to create the user directory but
later it wasn't able to create the job_id directory... In the prolog script
I added a chown command and it worked! In the epilog script Slurm deletes the
job_id directory, so it works fine for me.
Thanks!
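For reference, a minimal sketch of what such a prolog/epilog pair could look like (the /scratch base path is an assumption, not the poster's actual setup; SLURM_JOB_USER and SLURM_JOB_ID are provided by slurmd in the prolog/epilog environment):
#!/bin/bash
# prolog.sh -- run as root by slurmd before the job starts
JOBDIR="/scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}"   # assumed base path
mkdir -p "$JOBDIR"
chown "$SLURM_JOB_USER" "$JOBDIR"

#!/bin/bash
# epilog.sh -- run as root by slurmd after the job ends
rm -rf "/scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}"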
Dear Mahmood,
I'm not aware of any nodes that have 32, or even 10, sockets. Are you
sure you want to use the cluster like that?
Best
Marcus
On 12/17/19 10:03 AM, Mahmood Naderan wrote:
> Please see the latest update
> # for i in {0..2}; do scontrol show node compute-0-$i | grep
> RealMemory; done
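As a general way to see what the node hardware actually provides, and to get a matching NodeName line for slurm.conf, slurmd can report the detected topology on the node itself (a suggestion, not something from the thread):
# slurmd -C     # prints the node's detected sockets, cores, threads and RealMemory in slurm.conf NodeName syntax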
> Your running job is requesting 6 CPUs per node (4 nodes, 6 CPUs per
> node). That means 6 CPUs are being used on node hpc.
> Your queued job is requesting 5 CPUs per node (4 nodes, 5 CPUs per node).
> In total, if it were running, that would require 11 CPUs on node hpc. But
> hpc only has 10 cores, so it
What services did you restart after changing the slurm.conf? Did you do an
scontrol reconfigure?
Do you have any reservations? scontrol show res
Sean
On Tue, 17 Dec. 2019, 10:35 pm Mahmood Naderan,
<mailto:mahmood...@gmail.com> wrote:
> Your running job is requesting 6 CPUs per node (4 nodes, 6
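As a general illustration of the checks Sean is suggesting (the systemd service names assume a typical installation):
# scontrol reconfigure              # push slurm.conf changes to the running daemons
# systemctl restart slurmctld       # or restart the controller (and slurmd on the nodes) outright
# scontrol show reservation         # list any reservations that could be holding nodes back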
Greetings,
We are upgrading from 18.x to 19.05.4, but the build process appears a
bit different for us now.
1) There doesn't appear to be a 19.x OSU mvapich2 patch as there was for
previous Slurm releases. Should we use the previous patch, or skip patching?
2) The acct_gather_profile_hdf5 plugin appears
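On point 2, one general thing to check (the question is cut off here, and the install path below is an assumption) is whether configure found the HDF5 development files, since the acct_gather_profile_hdf5 plugin is only built when they are present:
$ grep -i hdf5 config.log | head                     # did configure detect HDF5?
$ ls /usr/lib64/slurm/ | grep acct_gather_profile    # was the plugin actually built and installed?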
These are the tests that we use:
The following steps can be performed to verify that the software has been
properly installed and configured. These should be done as a
non-privileged user:
• Generate a credential on stdout:
$ munge -n
• Check if a credential can be locally decoded:
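For completeness, the usual MUNGE verification sequence looks roughly like this (the remote host name is a placeholder):
$ munge -n                            # generate a credential on stdout
$ munge -n | unmunge                  # check that a credential can be decoded locally
$ munge -n | ssh otherhost unmunge    # check that a credential decodes on a remote node
$ remunge                             # run a quick encode/decode benchmark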
>Did you do an scontrol reconfigure?
Thank you. That solved the issue.
Regards,
Mahmood
Hi,
I would like to know if it is possible to limit the size of the output
file generated by a job using a Lua script. I have seen the "job_descriptor"
structure in slurm.h, but I have not seen anything there to limit that.
...I need this because a user submitted a job that generated a 500
GB output file.
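Since, as noted above, job_descriptor has no field for output size, one possible workaround (a sketch, not something from this thread; the application name is hypothetical) is to cap file size with RLIMIT_FSIZE from inside the batch script:
#!/bin/bash
#SBATCH --output=job_%j.out
# In bash, ulimit -f takes 1024-byte blocks, so this caps any file the job
# writes at roughly 10 GB; writes past the cap receive SIGXFSZ.
ulimit -f $((10 * 1024 * 1024))
./my_application    # hypothetical application binary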
Hello friends,
We are running Slurm 19.05.1-2 with an HA setup consisting of one primary and
one backup controller. However, we are observing that when the backup takes
over, for some reason AllocNodes is getting set to “none” on all of our
partitions. We can remedy this by manually setting AllocNodes back on each
partition.
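The manual fix presumably looks something like this (the partition name is a placeholder):
# scontrol update PartitionName=mypart AllocNodes=ALL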
Thanks for the response.
I have confirmed that the slurm.conf files are the same and that the shared
StateSaveLocation is working; we see logs like the following on the backup
controller:
Recovered state of 9 partitions
Recovered JobId=124 Assoc=6
Recovered JobId=125 Assoc=6
Recovered JobId=126 Assoc=6
Recovered
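One generic way to double-check that both controllers agree on the relevant settings, and to see the partition state after a failover, is:
$ scontrol show config | grep -E 'SlurmctldHost|StateSaveLocation'
$ scontrol show partition | grep AllocNodes    # run against each controller after a takeover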
Greetings --
From the Accounting and Resource Limits documentation, there is the
suggestion to make use of both 'Job Accounting' and 'Job Completion' data.
There is also the following statement, in the Slurm JobComp Configuration:
If you are running with the accounting storage plugin, use o
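For anyone comparing the two mechanisms on a running cluster, the currently configured plugins can be listed with (a general illustration):
$ scontrol show config | grep -E 'AccountingStorageType|JobAcctGatherType|JobCompType'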