Hi,
We have been having some problems with NFS mounts via Infiniband getting
dropped by nodes. We ended up moving our main admin server, which provides
NFS and Slurm, from one machine to another.
Now, however, if slurmdbd is running, it segfaults as soon as slurmctld
starts. In slurmdbd.log we have
slurmdbd: error: We have more allocated time than is possible (7724741 > 7012800) for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
slurmdbd: error: We have more time than is possible (7012800+36720+0)(7049520) > 7012800 for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
slurmdbd: Warning: Note very large processing time from hourly_rollup for soroban: usec=46390426 began=17:08:17.777
Segmentation fault (core dumped)
and the corresponding output of strace is
fstat(3, {st_mode=S_IFREG|0600, st_size=871270, ...}) = 0
write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132
+++ killed by SIGSEGV (core dumped) +++
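As a sanity check on the numbers in the two errors above, the limit of
7012800 matches 1948 CPUs times the 3600 seconds of the one-hour rollup
window. This is only a sketch of my reading of the log: I am assuming that
tres 1 is the CPU TRES and that the number in parentheses after the cluster
name is the CPU count, and the variable names are my own.

```python
# Values copied from the slurmdbd.log errors above.
cpu_count = 1948        # assumption: the number in "soroban(1948)" is the CPU count
window_seconds = 3600   # rollup window 16:00:00 - 17:00:00

# Maximum CPU-seconds that could be allocated in the window.
possible = cpu_count * window_seconds

allocated = 7724741                   # from the first error
second_total = 7012800 + 36720 + 0    # the three terms from the second error

print("possible:", possible)
print("first error excess:", allocated - possible)
print("second error excess:", second_total - possible)
```

So the first error reports roughly 10% more allocated time than the window
allows, and the second is over by exactly the 36720 term.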
We're running Slurm 17.02.7. Any ideas?
Cheers,
Loris
--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email [email protected]