On Sat, Dec 29, 2012 at 1:32 AM, epsilon <epsilo...@t-online.de> wrote:
> recently we read a lot of total system freezes. Let me try to
> summarize:
>
> Common in many cases is: The system totally freezes. No keyboard
> interaction possible. No kernel panic. No coredump. Nothing in the
> logs. Network (ICMP, routing) looks up. But no userland action.
Hmm, oddly, the person who started that thread, frantisek holop, figured
out that their system *was* panicking, and provided enough information
that I've started bouncing around an idea about a possible cause with
some other developers.

> Different are the situations: Some users observe this during boot,
> others in X during night, some see a high diskio just before the
> freeze, others see heavy network load. Some systems run in a VM,
> others on real hardware. Sometimes the issue is reproducible at the
> same time during night, in other cases it occurs randomly.
>
> So we have a wide variety of situations, but often the same result:
> Total freeze without any log or coredump.
>
> Let's assume all these cases have something in common. Then something
> very fundamental is broken.
>
> On the other hand, is it really likely all these cases are different
> bugs?

Your case, as far as you have described it, is not the same as
frantisek holop's.

> To the developers: What is to provide if users did not have anything
> in their logs, no coredump, nothing. Only a total frozen system? Maybe
> dmesg and config files, right? And a verbal description what happens,
> right?

Most of the descriptions I've seen have been too imprecise to help in
diagnosis. Take this one:

  "It freezes somewhere after 'starting network daemons' and 'starting
  local daemons'. I tried to disable services I do not essentially need
  or to substitute them with other solutions. So far no findings here."

Freezes 'somewhere'? It's hard to form hypotheses about the cause when
we're not told which processes were started, or whether the freeze
happens at the same point every time.

If you turn on ddb.console=1 in sysctl.conf, can you break into ddb
when it hangs? What do trace and ps show in that case? What about
show bcstats?

If you've performed tests of various sorts, what did they show?
Negative results are sometimes _more_ important than positive results;
why bother doing a test if you're going to throw out the result?
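For anyone who hasn't used ddb(4) before, here is a minimal sketch of the ddb.console suggestion, assuming an i386 or amd64 machine; the key sequence to enter the debugger differs by platform and console type, and the ddb prompt may not be visible if the hang happens while X is running, so a text or serial console is the safer place to try this:

```
# /etc/sysctl.conf (read at boot): allow breaking into ddb(4)
# from the console
ddb.console=1

# After a hang, enter ddb: Ctrl-Alt-Esc on an i386/amd64 keyboard
# console, or send a BREAK on a serial console. Then, at the prompt:
ddb> trace
ddb> ps
ddb> show bcstats
```

Copying the output of those commands (by hand, or from a serial console log) into the report is what gives developers something concrete to work from.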
What hypotheses have been *excluded* by your test results?

The title of the original thread was "snapshots total freeze", but there
were dmesgs in the thread showing August kernel builds. For those who
haven't tried running a (recent) snapshot: does your problem reproduce,
or change symptoms, when you do?

Is this consistent across hardware? Drop another machine into place
where the freezing one is; does it freeze too?

Philip Guenther