On Tue, Aug 22, 2006 at 06:44:52PM +0900, Pyun YongHyeon wrote:
> On Tue, Aug 22, 2006 at 12:22:37PM +0400, Ruslan Ermilov wrote:
> > I agree this is a less painful way to recover, but it's still a
> > watchdog and it slows down the performance when it happens. After
> > this commit, if there's a moderate number of missing Tx completion
> > interrupts (for some reason), even a diagnostic message won't be
> > printed. This is bad -- users will "seem" to have working but
> > slow systems, without any indication of what causes this slowness.
> >
> It just reinvokes the txeof handler and checks whether there are pending
> Tx descriptors in the driver queue. If there are no pending Tx
> descriptors, it's a false watchdog timeout and it just returns without
> resetting the hardware.
>
This is all clear.

> So there is no performance drop. Of course, if we are out of
> Tx descriptors and missed Tx completion interrupts it would slow down
> the Tx process.
>
Yes, that's what I was talking about.

> ATM I don't know what caused this missing Tx completion interrupt.
> (chipset bug/Tx interrupt moderation or other bug)
>
> > I think a diagnostic message should still be printed in this case,
>
> I have no objections to printing a diagnostic message. But if missing
> Tx completion interrupts are a normal consequence for this hardware,
> it would give a negative impression to users.
>
It would tell the truth, like

    em0: watchdog timeout (missed Tx interrupt) -- recovering

(Maybe under bootverbose only.)

> > and adapter->watchdog_events should still be incremented, we just
> > don't need to reinit the chip in this case.
> >
> adapter->watchdog_events is used to count output errors (if_oerrors).
> If we know the watchdog timeout is false we should not increment the
> counter, as we successfully transmitted without errors.
>
It's still a watchdog event. We can make it a separate counter,
like watchdog_tx_event, and not add it to oerrors, but still
show it in em_print_hw_stats(). It'd be useful to have these
statistics available.

> Because it's hard to reproduce, I guess it only happens under
> certain conditions. In addition, we don't know how many Tx completion
> interrupts are lost. If you think it should recover fast from the
> above condition without waiting for a watchdog timeout, we could
> embed an em_txeof() into em_local_timer() to sweep up Tx
> descriptors that were successfully transmitted.
>
That would make it look more like polling. :-)

I'm pretty sure this problem is not unique to em(4). Adding
these quirks to all drivers known to be subject to this issue
and gathering the statistics would be a good thing IMO.

Cheers,
-- 
Ruslan Ermilov
[EMAIL PROTECTED]
FreeBSD committer
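
For illustration only, the recovery path discussed above could be modelled roughly as follows. This is a standalone sketch, not the actual em(4) code: struct tx_model, model_txeof(), model_reinit() and model_watchdog() are hypothetical names, the "verbose" field merely stands in for bootverbose, and the counter watchdog_tx_events follows the separate counter suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of driver Tx state; NOT the real em(4) softc. */
struct tx_model {
	int pending;                      /* descriptors queued and not yet reclaimed */
	int completed;                    /* finished by hardware, not yet swept up */
	bool verbose;                     /* stand-in for bootverbose */
	unsigned long watchdog_events;    /* real timeouts (would feed if_oerrors) */
	unsigned long watchdog_tx_events; /* false timeouts: missed Tx interrupts */
	unsigned long resets;             /* how often the chip was reinitialized */
};

/* Stand-in for em_txeof(): reclaim descriptors the hardware has completed. */
static void
model_txeof(struct tx_model *sc)
{
	assert(sc->completed <= sc->pending);
	sc->pending -= sc->completed;
	sc->completed = 0;
}

/* Stand-in for a full reinit: drop everything and count the reset. */
static void
model_reinit(struct tx_model *sc)
{
	sc->pending = 0;
	sc->completed = 0;
	sc->resets++;
}

/*
 * Watchdog handler: sweep completed descriptors first. If nothing is left
 * pending, the timeout was false (a Tx completion interrupt was missed), so
 * count it separately and skip the reset. Returns true if the chip was reset.
 */
static bool
model_watchdog(struct tx_model *sc)
{
	model_txeof(sc);
	if (sc->pending == 0) {
		sc->watchdog_tx_events++;
		if (sc->verbose)
			printf("em0: watchdog timeout (missed Tx interrupt) -- recovering\n");
		return (false);
	}
	sc->watchdog_events++;
	printf("em0: watchdog timeout -- resetting\n");
	model_reinit(sc);
	return (true);
}
```

With this split, a false timeout leaves the output-error counter untouched while still being visible in the statistics; only a timeout with descriptors genuinely stuck triggers the reset.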