On Sunday, 14 October 2018 3:30:39 PM AEDT Steven Dick wrote:
> I've found that when creating a new cluster, slurmdbd does not
> function correctly right away. It may be necessary to restart
> slurmdbd at several points during the slurm installation process to
> get everything working correctly.
So the only mention of this I can find is a reference to "rollup" in
https://slurm.schedmd.com/slurmdbd.conf.html , now that I specifically
googled for "slurmdbd rollup".
So if this hourly summary is the right behaviour, I'd request it be
better documented -- nothing at all is mentioned in
https://slur
Not following. Both are running on the same host.
Thanks.
On Sun, Oct 14, 2018 at 10:23:03PM +0100, Nathan Harper wrote:
> Check firewall rules or network comms in both directions. We had an issue
> with asymmetric routing between our slurmdbd and slurmctld and so connections
> could only be in
Check firewall rules or network comms in both directions. We had an issue with
asymmetric routing between our slurmdbd and slurmctld and so connections could
only be initiated one way. However, restarting slurmdbd would restart the
connection and resync the latest state (or something like that,
sreport shows data that is summarized hourly. Restarting slurmdbd can delay
this process. If some jobs are missing end records, it can massively slow
the process because it may need to pick a much earlier start time in the
past to summarize.
sacctmgr show runawayjobs can help identify if you are i
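
The hourly summary described above can be illustrated with a small sketch. It assumes the rollup runs once at the top of every hour, which matches what posters in this thread observed but is not taken from the Slurm source; the function name next_hourly_rollup is hypothetical and not part of any Slurm tool.

```python
from datetime import datetime, timedelta


def next_hourly_rollup(job_end: datetime) -> datetime:
    """Return the first top-of-hour boundary strictly after job_end.

    Assumption (not from the Slurm source): the rollup that feeds
    sreport runs once per hour, on the hour.
    """
    # Truncate to the start of the hour the job ended in ...
    hour_start = job_end.replace(minute=0, second=0, microsecond=0)
    # ... then step forward to the next hour boundary.
    return hour_start + timedelta(hours=1)


# A job ending at 15:30:39 would first show up in the hourly summary at 16:00.
end = datetime(2018, 10, 14, 15, 30, 39)
print(next_hourly_rollup(end))  # 2018-10-14 16:00:00
```

Under that assumption, a job can be visible in sacct immediately while sreport lags by up to an hour, and running sacctmgr show runawayjobs (mentioned above) is the check for jobs whose missing end records would slow the summary further.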
This seems to reflect what I am seeing. Someone earlier mentioned
multiple restarts of slurmdbd... those restarts never made data appear
unless they happened right around the top of the hour.
It's as if instead of data getting sent right through slurmdbd that
something in slurmdbd is just doing an hourly check of the t
On 14-10-2018 12:54, Steven Dick wrote:
It is documented that you need to create the cluster in the database.
It is not documented that the accounting system won't work until you
restart slurmdbd multiple times before it starts collecting accounting
records.
Also, none of the necessary restarts
I have noticed on several clusters that sreport can be up to one hour out of
date, i.e. it will update on the hour every hour.
sacct does not behave this way and is always up to date.
I cannot see this stated in the docs or see any config settings to control
this but it happens on the last 17.02 cl
It is documented that you need to create the cluster in the database.
It is not documented that the accounting system won't work until you
restart slurmdbd multiple times before it starts collecting accounting
records.
Also, none of the necessary restarts are needed on an upgrade -- only
when slu
On 14-10-2018 06:30, Steven Dick wrote:
I've found that when creating a new cluster, slurmdbd does not
function correctly right away. It may be necessary to restart
slurmdbd at several points during the slurm installation process to
get everything working correctly.
Also, slurmctld will buffer
I've found that when creating a new cluster, slurmdbd does not
function correctly right away. It may be necessary to restart
slurmdbd at several points during the slurm installation process to
get everything working correctly.
Also, slurmctld will buffer the accounting data until slurmdbd starts
Hi.
I am setting up a new slurm cluster instance. And I just went through
what I thought were the right steps to get job accounting going with
slurmdbd.
So I know that slurmdbd itself works as I can use the sacctmgr commands
to add users and accounts, and the users cannot run jobs unless I first