Hi,

We have been having some problems with nodes dropping NFS mounts over
InfiniBand.  We ended up moving our main admin server, which provides
NFS and Slurm, from one machine to another.

Now, however, if slurmdbd is running, it segfaults as soon as
slurmctld starts.  In slurmdbd.log we have

  slurmdbd: error: We have more allocated time than is possible (7724741 > 7012800) for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
  slurmdbd: error: We have more time than is possible (7012800+36720+0)(7049520) > 7012800 for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
  slurmdbd: Warning: Note very large processing time from hourly_rollup for soroban: usec=46390426 began=17:08:17.777
  Segmentation fault (core dumped)

and the corresponding output of strace is

  fstat(3, {st_mode=S_IFREG|0600, st_size=871270, ...}) = 0
  write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132
  +++ killed by SIGSEGV (core dumped) +++
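As far as I can tell, the limit in the two errors is the cluster's CPU
count times the length of the rollup window.  Assuming the "(1948)"
after the cluster name is the number of CPUs and "tres 1" means CPU,
1948 x 3600 s = 7012800 CPU-seconds.  The Python sketch below just
reproduces that arithmetic.  Clamping the allocated time to the limit
and reading the three summed terms as allocated, down and planned-down
seconds are my guesses from the log, not taken from the Slurm source.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"


def possible_seconds(count, start, end):
    """Upper bound on TRES-seconds for `count` units over [start, end)."""
    window = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return count * int(window.total_seconds())


def check_rollup(alloc, down, planned_down, limit):
    """Return (clamped_alloc, total, messages) for one rollup window.

    ASSUMPTION: alloc is clamped to the limit after the first error,
    because the first term of the second log line is exactly the limit.
    ASSUMPTION: down and planned_down are the 2nd and 3rd summed terms.
    """
    messages = []
    if alloc > limit:
        messages.append(f"more allocated time than is possible ({alloc} > {limit})")
        alloc = limit
    total = alloc + down + planned_down
    if total > limit:
        messages.append(
            f"more time than is possible ({alloc}+{down}+{planned_down})({total}) > {limit}"
        )
    return alloc, total, messages


# Values copied from the slurmdbd.log excerpt above.
limit = possible_seconds(1948, "2017-10-17T16:00:00", "2017-10-17T17:00:00")
alloc, total, messages = check_rollup(7724741, 36720, 0, limit)
print(limit)   # 7012800
print(total)   # 7049520
for m in messages:
    print(m)
```

So the hourly rollup thinks about 10% more CPU time was allocated in
that hour than the cluster could possibly supply.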

We're running 17.02.7.  Any ideas?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de