You probably have a core file in the directory that slurmdbd logs to. A backtrace from gdb would be the most telling.
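In case it helps, here is a minimal sketch of pulling that backtrace non-interactively. The function name `slurmdbd_backtrace` and both paths in the usage comment are assumptions for illustration, not taken from your setup; substitute the slurmdbd binary you actually run and the core file you find. It only relies on gdb's `-batch` and `-ex` options and the `thread apply all bt full` command.

```shell
# Print a full backtrace of every thread recorded in a core file.
# $1 = path to the slurmdbd binary that crashed (assumed, adjust as needed)
# $2 = path to the core file it left behind (assumed, adjust as needed)
slurmdbd_backtrace() {
    if [ ! -x "$1" ] || [ ! -r "$2" ]; then
        echo "need an executable binary and a readable core file" >&2
        return 1
    fi
    # -batch exits once the -ex commands have run; bt full includes local variables.
    gdb -batch -ex "thread apply all bt full" "$1" "$2"
}

# Example invocation (hypothetical paths):
#   slurmdbd_backtrace /usr/sbin/slurmdbd /var/log/slurm/core > slurmdbd-backtrace.txt
```

The backtrace is much more useful if slurmdbd was built with debug symbols; without them the frames may show only addresses.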
On Oct 17, 2017 08:17, "Loris Bennett" <loris.benn...@fu-berlin.de> wrote:
>
> Hi,
>
> We have been having some problems with NFS mounts via Infiniband getting
> dropped by nodes. We ended up switching our main admin server, which
> provides NFS and Slurm, from one machine to another.
>
> Now, however, if slurmdbd is started, as soon as slurmctld starts,
> slurmdbd seg faults. In the slurmdbd.log we have
>
> slurmdbd: error: We have more allocated time than is possible (7724741 > 7012800) for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
> slurmdbd: error: We have more time than is possible (7012800+36720+0)(7049520) > 7012800 for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
> slurmdbd: Warning: Note very large processing time from hourly_rollup for soroban: usec=46390426 began=17:08:17.777
> Segmentation fault (core dumped)
>
> and the corresponding output of strace is
>
> fstat(3, {st_mode=S_IFREG|0600, st_size=871270, ...}) = 0
> write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132
> +++ killed by SIGSEGV (core dumped) +++
>
> We're running 17.02.7. Any ideas?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de