You probably have a core file in the directory where slurmdbd writes its
log. A backtrace from gdb would be most telling.
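
A minimal sketch of how that backtrace could be collected non-interactively. The binary and core file paths below are assumptions for illustration only, so substitute wherever slurmdbd is installed and wherever the core actually landed on your system. `-batch` and `-ex` are standard gdb options, and `thread apply all bt full` is a standard gdb command that dumps the stack of every thread.

```python
import shlex


def gdb_backtrace_command(binary, core):
    """Build the argv for a batch-mode gdb run that prints every thread's backtrace."""
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]


# Hypothetical paths, not taken from the original report.
cmd = gdb_backtrace_command("/usr/sbin/slurmdbd", "/var/log/slurm/core")

# Print a shell-quoted command line that can be pasted into a terminal,
# or pass `cmd` to subprocess.run(cmd, capture_output=True, text=True).
print(shlex.join(cmd))
```

The backtrace is far more readable if debug symbols for slurmdbd are available.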

On Oct 17, 2017 08:17, "Loris Bennett" <loris.benn...@fu-berlin.de> wrote:

>
> Hi,
>
> We have been having some problems with NFS mounts via Infiniband getting
> dropped by nodes.  We ended up switching our main admin server, which
> provides NFS and Slurm, from one machine to another.
>
> Now, however, if slurmdbd is started, as soon as slurmctld starts,
> slurmdbd seg faults.  In the slurmdbd.log we have
>
>   slurmdbd: error: We have more allocated time than is possible (7724741 > 7012800) for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
>   slurmdbd: error: We have more time than is possible (7012800+36720+0)(7049520) > 7012800 for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
>   slurmdbd: Warning: Note very large processing time from hourly_rollup for soroban: usec=46390426 began=17:08:17.777
>   Segmentation fault (core dumped)
>
> and the corresponding output of strace is
>
>   fstat(3, {st_mode=S_IFREG|0600, st_size=871270, ...}) = 0
>   write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132
>   +++ killed by SIGSEGV (core dumped) +++
>
> We're running 17.02.7.  Any ideas?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
>
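
As a side note on the two "more time than is possible" errors quoted above, the figures can be checked by hand. The sketch below assumes that the "(1948)" after the cluster name is the CPU count and that "tres 1" refers to CPUs; neither is stated in the log itself. Under that reading, the limit of 7012800 is simply the number of CPU-seconds available in the one-hour rollup window.

```python
# Figures copied from the slurmdbd.log excerpt in the quoted message.
CPUS = 1948              # assumption: "(1948)" is the cluster's CPU count
SECONDS_PER_HOUR = 3600  # the rollup window is 16:00:00 - 17:00:00

capacity = CPUS * SECONDS_PER_HOUR   # CPU-seconds available in the hour
allocated = 7724741                  # first error: allocated time
overcommit = allocated - capacity    # how far the allocation exceeds capacity

# Second error: the three terms printed as (7012800+36720+0).
total = 7012800 + 36720 + 0

print(capacity, overcommit, total)
```

So the first error reports roughly 10% more allocated CPU time than the hour can hold, and the second total exceeds the limit by exactly the 36720 term.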
