On Fri 26/Oct/2018 11:27:36 +0200 Reco wrote:
> On Fri, Oct 26, 2018 at 11:23:39AM +0200, Alessandro Vesely wrote:
>> The problem is that the server froze. I don't think that's what it is
>> supposed to do when a card fails.
>
> It's my impression too.

In general, it is too difficult to know if a link is good, at least on the
local side. I found nothing better than running pings by cron.

>> Contrast that with log lines about anything else, from non-redundant power
>> supplies to failed GPG signatures. In part, the missing precise diagnosis
>> must be a shortcoming on part of the card vendor. However, how come the
>> kernel didn't realize that the link had to go down, log something, and just
>> fail any subsequent call on that interface, instead of freezing? Or did it
>> freeze for an unrelated reason?
>
> I believe that it's impossible to answer this question. It's highly
> likely that it was kernel panic. Whenever it was related to failed NIC,
> or no - it's impossible to tell since there's no kernel backtrace.

Right. I should have tried Ctrl-Alt-F1 or some of the SysRq hacks[*], but I
was too upset by services not responding...

[*] https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html

> I'd install, say, kdump-tools for the future incidents like this.

Just installed, thank you! (I'll reboot when the new card arrives).

Best
Ale
--