Hello, dear Debian fellows! Please forgive my paranoid anonymity, in view of the last section of this message.

1) My problem:

I have happily used Debian since 1995 (0.93R6, if I recall correctly). But since I installed 2.1 on my new PC at work about a year ago, that machine has crashed about once a month on average. Nothing to scare a windblows user, of course, but unbearable for someone who knows it should not be so. Especially as these crashes are unrecoverable: the screen, mouse and keyboard are frozen (no VT switching, no clean reboot) and the machine cannot even be reached over the network. That leaves no alternative to a brutal power-off, followed by an fsck of the whole disk.

When does this happen? Always under heavy load (2-3 users on a 128 MB Pentium 400, each with several windows, netscape, emacs etc., plus some compilation or latex2html going on), and always with at least one remote ssh login. I also sometimes had the impression that the mouse froze temporarily before the total crash, but you know how easily the human brain violates short-time causality.

2) Software problems?

In the beginning, I attributed this to the network interface card (3C905C = Tornado), which was not officially supported by Donald Becker's 3C59x driver. Indeed, a twin machine (same install, same hardware except for an officially supported 3C905B) had no problem whatsoever. So I fetched the official driver from the 3Com site and tried a precompiled Mandrake kernel with this 3Com driver, but the problem remained. I then tried various kernel + 3C59x combinations, precompiled or compiled at home (2.2.13, 2.2.15, 2.2.17), but with each I endured at least one crash. I checked the NIC's autoconfigured network parameters with ether-diag and found that half duplex generated fewer error messages, but nothing more serious. I installed the GNU accounting package (acct) to see the last commands executed before each crash, but found nothing special.

3) Hardware problems?

Bored with switching kernels, I followed the hardware track. Despite a successful memory test at boot, maybe one of the 2 memory modules was bad?
I ran on half the memory during the summer, but it ended in a crash again. I swapped in the other memory module: a crash again a month later. Maybe the NIC slot was bad? I swapped the NIC with the sound card last week. No crash yet, but I have reasons to believe it won't help.

4) Hacker/virus problems?

During the very first hour of the very first install, I got port-scanned (see the log in Annex 1). Bad point for Debian, I thought: what is the probability of a PC being attacked in the first hour of its connection to the net? It looks more probable that the attack was triggered by the install process! Anyway, the ports for telnet (23) and ftp (21) were filtered by the local router (except for local machines) and I was not running any daemon, so I was not too scared. After watching the logs for about a week, I opened the machine to the full internet, exclusively through ssh connections, which are not filtered by the local router.

Until last week, I had no reason to suspect a hacker behind my crashes. But last week I got 2 crashes, and I noticed something very curious in the accounting logs. Among the last processes that finished less than 5 minutes before the crash, there was a bunch of NAMELESS root processes that started at Unix time 0 (Jan 1 1970) and lasted 0 seconds (!?). E.g.:

    # lastcomm
    S20acct  S  root   ??  0.01 secs Thu Oct 19 19:40
    accton   S  root   ??  0.00 secs Thu Oct 19 19:40
    ---> reboot
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
                root   ??  0.00 secs Thu Jan 1 01:00
    perl        user1  ??  0.18 secs Thu Oct 19 18:57
    sh          user1  ??  0.01 secs Thu Oct 19 18:57

Suspicious, no?
Such nameless processes occurred in both crashes last week. I had kept the acct logs of the crash before those too: there they were, less than 5 minutes before the crash. Unfortunately, I could not check the earlier ones.

Even more curious is what happened on my previous machine (call it PC2), which has the same install but totally different (older) hardware. Users of my machine (call it PC1) often ssh into PC2 and vice versa. Furthermore, during the port-scanned install, the new PC1 was bearing the name and address of the previously existing PC2. Anyway, for the first time last week, PC2 endured 2 crashes too. In one of these crashes, there were also nameless, timeless root processes just before the crash. But there was no sign of a remote login during that whole day: does this look more like a virus than a hacker? Unless the nameless root processes were in fact erasing the footprints of the remote login in the various logs, and their names got erased by the process that crashed the machine and thus left no trace.

All this is a bit loose: there are not enough statistics to be sure. But I am not too keen on accumulating statistics! So I would like to get as much information out of each crash as I can. Acct is better than nothing, but I have no way to know which processes were actually active *during* the crash, nor how much memory was in use and things like that.

5) Questions:

0) Is anybody experiencing the same kind of flakiness? If it really was the install that triggered it for me, it should have triggered it for others!
1) How can one generate nameless processes in acct logs? Can this be normal?
2) What tools could I use to help pinpoint the problem? E.g. a process accounting that would log processes when they begin (instead of when they end)...
3) Can a network driver really freeze the whole kernel?
4) How can the kernel be frozen? Is there a kernel bug that propagated through 2.2.13-17?

Many thanks for any help!

PS: you can privately reply to this mail.

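To make question 2 concrete, here is a minimal sketch of the kind of logger meant there: every few seconds it appends the list of live processes and a few memory figures to a file and forces the data to disk, so that the last snapshot before a freeze should survive the power-off. Everything in it is an assumption for illustration: the function names, the log path, the 30-second interval, and the use of Python at all. It only relies on the /proc/<pid>/stat and /proc/meminfo files of a Linux system.

```python
import os
import time


def list_processes(proc_root="/proc"):
    """Return sorted (pid, command name) pairs for each numeric entry in proc_root."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # In the stat file the command name sits between "(" and the last ")".
        name = stat[stat.find("(") + 1:stat.rfind(")")]
        procs.append((int(entry), name))
    return sorted(procs)


def read_meminfo(path, keys=("MemTotal", "MemFree", "SwapFree")):
    """Return the selected "Key: value" lines of a meminfo file as a dict."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in keys:
                info[key] = rest.strip()
    return info


def write_snapshot(log_path, proc_root="/proc"):
    """Append one snapshot to log_path and fsync it; return the process count."""
    mem = read_meminfo(os.path.join(proc_root, "meminfo"))
    procs = list_processes(proc_root)
    lines = [time.strftime("=== %Y-%m-%d %H:%M:%S ===")]
    lines += ["%s: %s" % (k, v) for k, v in sorted(mem.items())]
    lines += ["%d %s" % (pid, name) for pid, name in procs]
    with open(log_path, "a") as log:
        log.write("\n".join(lines) + "\n")
        log.flush()
        os.fsync(log.fileno())  # ask the kernel to write the data to disk now
    return len(procs)


def snapshot_loop(log_path, interval=30, rounds=None, proc_root="/proc"):
    """Take a snapshot every `interval` seconds; rounds=None means run forever."""
    done = 0
    while rounds is None or done < rounds:
        write_snapshot(log_path, proc_root)
        done += 1
        time.sleep(interval)
```

Something like snapshot_loop("/var/log/snapshot.log") would be started as root at boot (the path is hypothetical). A short interval narrows the window before the freeze, at the price of more disk writes; whether the fsync'ed data always reaches the disk before a hard freeze is not something this sketch can guarantee.
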
Annex 1: Portscan attack (November 99)

9:07:13 tcplogd: port 1114 connection attempt from [EMAIL PROTECTED] [123.4.576.89]
9:07:13 tcplogd: port 1116 (idem)
9:07:15 tcplogd: port 1171 "
9:07:18 tcplogd: port 1174 "
9:07:20 tcplogd: port 1183 "
9:07:24 tcplogd: port 1186
9:07:26 tcplogd: port 1192
9:08:34 tcplogd: port 1195
9:09:10 tcplogd: port 1203
9:09:42 tcplogd: port 1206
9:10:05 tcplogd: port 1212
9:10:27 tcplogd: port 1215
9:11:03 tcplogd: port 1282
9:13:15 tcplogd: port 1371
9:14:07 tcplogd: port 1430
9:14:37 tcplogd: port 1433
9:14:48 tcplogd: port 1503
9:15:00 tcplogd: port 1506
9:18:23 tcplogd: port 1599
9:19:05 tcplogd: port 1634
9:19:13 tcplogd: port 1667
9:19:15 tcplogd: port 1794
9:19:17 tcplogd: port 1888
9:19:18 tcplogd: port 2042
9:19:20 tcplogd: port 2089
9:19:22 tcplogd: port 2093
9:21:20 tcplogd: port 2098
9:21:33 tcplogd: port 2103
9:21:35 tcplogd: port 2106
9:21:37 tcplogd: port 2146
9:21:38 tcplogd: port 2149
9:21:40 tcplogd: port 2153
9:21:42 tcplogd: port 2157
9:22:05 tcplogd: port 2160
9:24:01 tcplogd: port 2166
9:24:09 tcplogd: port 2169
9:26:10 tcplogd: port 2174
9:27:57 tcplogd: port 2213
9:27:57 tcplogd: port 2216
9:28:27 tcplogd: port 2221
9:31:17 tcplogd: port 2224
9:31:29 tcplogd: port 2232
9:31:48 tcplogd: port 2235
9:32:03 tcplogd: port 2243
9:32:16 tcplogd: port 2252
9:32:29 tcplogd: port 2255
9:32:42 tcplogd: port 2258
9:32:59 tcplogd: port 2266
9:33:23 tcplogd: port 2308
9:34:39 tcplogd: port 2377
9:34:41 tcplogd: port 2383
9:34:42 tcplogd: port 2386
9:34:45 tcplogd: port 2456
9:34:48 tcplogd: port 2465
9:35:29 tcplogd: port 2480
9:35:34 tcplogd: port 2545
9:35:38 tcplogd: port 2662
9:35:42 tcplogd: port 2666
9:35:46 tcplogd: port 2670
9:35:51 tcplogd: port 2857
9:35:58 tcplogd: port 2904
9:36:11 tcplogd: port 3084
9:36:13 tcplogd: port 3138
9:36:22 tcplogd: port 3141
9:36:36 tcplogd: port 3146
9:36:40 tcplogd: port 3203
9:36:51 tcplogd: port 3271
9:37:03 tcplogd: port 3329
9:37:15 tcplogd: port 3388
9:37:23 tcplogd: port 3444
9:37:26 tcplogd: port 3631
9:37:29 tcplogd: port 3689
9:37:32 tcplogd: port 3695
9:37:34 tcplogd: port 3755
9:37:38 tcplogd: port 3879
9:37:41 tcplogd: port 4003
9:37:43 tcplogd: port 4126
9:37:45 tcplogd: port 4129
9:37:54 tcplogd: port 4136
9:37:57 tcplogd: port 4142
9:38:03 tcplogd: port 4147
9:38:10 tcplogd: port 4152
9:38:19 tcplogd: port 4156
9:38:26 tcplogd: port 4159
9:38:28 tcplogd: port 4163
9:38:30 tcplogd: port 4169
9:38:41 tcplogd: port 4174
9:38:45 tcplogd: port 4180
9:38:54 tcplogd: port 4183
9:38:58 tcplogd: port 4188
9:39:03 tcplogd: port 4191
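
Annex 2: Sketch for summarizing the log of Annex 1

For anyone who wants to look at the pattern in the log above rather than read it entry by entry, here is a minimal sketch that extracts the time and port of each tcplogd entry and reports how many attempts there were, over how many seconds, and whether the port numbers only ever go up. The function names and the regular expression are assumptions made for this illustration, based solely on the entry layout shown in Annex 1; SAMPLE holds its first four entries.

```python
import re

# Matches entries such as "9:07:13 tcplogd: port 1114 ...", whether they are
# one per line or run together on a single line.
ENTRY = re.compile(r"(\d{1,2}):(\d{2}):(\d{2}) tcplogd: port (\d+)")


def parse_entries(text):
    """Return a (seconds since midnight, port) pair for every entry in text."""
    entries = []
    for h, m, s, port in ENTRY.findall(text):
        entries.append((int(h) * 3600 + int(m) * 60 + int(s), int(port)))
    return entries


def summarize(entries):
    """Summarize a list of (seconds, port) pairs, in log order."""
    times = [t for t, _ in entries]
    ports = [p for _, p in entries]
    return {
        "attempts": len(entries),
        "duration_s": times[-1] - times[0] if entries else 0,
        "lowest_port": min(ports) if ports else None,
        "highest_port": max(ports) if ports else None,
        # True when each port is higher than the one logged before it.
        "ports_strictly_increasing": all(a < b for a, b in zip(ports, ports[1:])),
    }


SAMPLE = '''
9:07:13 tcplogd: port 1114 connection attempt from [EMAIL PROTECTED] [123.4.576.89]
9:07:13 tcplogd: port 1116 (idem)
9:07:15 tcplogd: port 1171 "
9:07:18 tcplogd: port 1174 "
'''

if __name__ == "__main__":
    print(summarize(parse_entries(SAMPLE)))
```

Feeding it the whole annex instead of SAMPLE gives the same summary for the full log. The sketch only describes the log; it says nothing about who made the connections or why.
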