Hello, dear debian fellows!

Please forgive my paranoid anonymity, in view of the last section of
this message.

1) My problem:

I have happily used Debian since 1995 (0.93R6, if I recall correctly).
But since I installed 2.1 on my new PC at work, about a year ago, that
machine has crashed about once a month on average. Nothing to scare a
windblows user, of course, but unbearable for someone who knows this
should not be so. Especially as these crashes are unrecoverable: screen
frozen, mouse/keyboard frozen (no VT switching or clean reboot
possible) and no access even from outside through the network. Thus
there is no alternative to a brutal power-off, with subsequent fsck'ing
of the whole disk.

When does this happen? Always under heavy load (2-3 users on a 128 MB
Pentium 400, each with several windows, netscape, emacs etc., plus some
compilation or latex2html going on); always with at least one remote
ssh login. I also sometimes had the impression that the mouse froze
temporarily before the total crash, but you know how short-term
causality can be violated in the human brain.

2) Software problems?

In the beginning, I attributed this to the network interface card
(3C905C = Tornado), which was not officially supported by Donald
Becker's 3C59x driver. Indeed, a twin sister machine (same install, same
hardware except for an officially supported 3C905B) had no problem
whatsoever. So I fetched the official driver from the 3Com site and
tried a precompiled Mandrake kernel with this 3Com driver, but the
problem remained. I then tried various kernel + 3C59x versions,
precompiled or home-compiled (2.2.13, 2.2.15, 2.2.17), but with each I
endured at least one crash. I checked the NIC's autoconfigured network
parameters with ether-diag and found that half duplex generated fewer
error messages, but nothing more serious. I installed the GNU accounting
package (acct) to see the last commands that were executed before the
crash, but found nothing special.

3) Hardware problems?

Bored with switching kernels, I followed the hardware track.
Despite a successful memory test at boot, maybe one of the 2 memory
modules was bad? I ran over the summer on half the memory, but it ended
in a crash again. I swapped in the other module: a crash again a month
later. Maybe the NIC slot was bad? I swapped it with the sound card's
slot last week. No crash yet, but I have reasons to believe it won't
help.

4) Hacker/virus problems?

During the very first hour of the very first install, I got
port-scanned (see log below). A bad mark for Debian, I thought: what is
the probability of a PC being attacked in the first hour of its
connection to the net? It seems more likely that the attack was
triggered by the install process! Anyway, the ports for telnet (23) and
ftp (21) were filtered by the local router (except for local machines)
and I was not running any daemon, so I was not too scared. After
watching the logs for about a week, I opened the machine to the full
internet exclusively through ssh connections, which are not filtered by
the local router.

Until last week, I had no reason to suspect a hacker behind my
crashes. But last week, I got 2 crashes. And I noticed something very
curious in the accounting logs. Among the last processes that finished
less than 5 minutes before the crash, there was a bunch of NAMELESS root
processes that started at Unix time 0 (Jan 1 1970) and lasted 0 seconds
(!?). E.g.:

# lastcomm

S20acct           S     root     ??         0.01 secs Thu Oct 19 19:40
accton            S     root     ??         0.00 secs Thu Oct 19 19:40
---> reboot
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
                        root     ??         0.00 secs Thu Jan  1 01:00
perl                    user1    ??         0.18 secs Thu Oct 19 18:57
sh                      user1    ??         0.01 secs Thu Oct 19 18:57

Suspicious, no? Such nameless processes occurred in both crashes last
week. I had kept the acct logs of the earlier one too: there they were,
less than 5 minutes before the crash. Unfortunately, I could not check
the crashes before that.
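
To make this check repeatable after each crash, the scan for nameless
records could be automated. What follows is only a minimal sketch in
Python, assuming lastcomm keeps the column layout shown above (command
name starting in the first column, then optional flags, user, tty, CPU
time and date). The names `parse_lastcomm_line` and `find_nameless` are
made up for illustration and are not part of the acct package.

```python
from typing import Iterable, List, NamedTuple, Optional


class AcctRecord(NamedTuple):
    command: str     # empty string when lastcomm printed no name
    user: str
    tty: str
    cpu_secs: float
    when: str        # date as printed by lastcomm (no year in this layout)


def parse_lastcomm_line(line: str) -> Optional[AcctRecord]:
    """Parse one line of lastcomm output; return None for anything else."""
    tokens = line.split()
    # Read from the right: ... user tty <cpu> secs <dow> <mon> <day> <hh:mm>
    if len(tokens) < 8 or tokens[-5] != "secs":
        return None
    try:
        cpu = float(tokens[-6])
    except ValueError:
        return None
    head = tokens[:-8]  # command name and flags, if any
    # Assumption: the command is printed flush left, so a line that starts
    # with whitespace has an empty command field.
    command = head[0] if head and not line[0].isspace() else ""
    return AcctRecord(command, tokens[-8], tokens[-7], cpu,
                      " ".join(tokens[-4:]))


def find_nameless(lines: Iterable[str]) -> List[AcctRecord]:
    """Return the records whose command field is empty."""
    records = (parse_lastcomm_line(line) for line in lines)
    return [r for r in records if r is not None and r.command == ""]
```

Saving the lastcomm output to a file after a reboot and passing its
lines to `find_nameless` would then list the suspicious entries. Since
this layout prints no year, the "Jan 1" date alone cannot prove a zero
timestamp, so the sketch keys only on the missing name.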

But even more curious is my previous machine (call it PC2), which has
the same install but totally different (older) hardware. Users of my
machine (call it PC1) often log in to PC2 via ssh and vice versa.
Furthermore, during the port-scanned install, the new PC1 was bearing
the name and address of the previously existing PC2. Anyway, for the
first time last week, PC2 endured 2 crashes too. In one of these
crashes, there were also nameless, timeless root processes just before
the crash. But there was no sign of a remote login the whole day: looks
more like a virus than a hacker? Unless the nameless root processes were
in fact erasing the footprints of the remote login from the various
logs, and their name got erased by the process that crashed the machine
and thus left no track.

All this is a bit loose: too few crashes for real statistics. But I am
not too keen on accumulating statistics! So I would like to get as much
info out of each crash as I can. Acct is better than nothing, but in
fact I have no way of knowing which processes were actually active
*during* the crash, nor how much memory was in use, and things like
that.
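
One way to see what was running during a freeze, rather than what had
already finished, would be to record the process table and memory
figures at short intervals and force each record to disk, so that the
last one written before the freeze survives the power-off. Below is a
rough sketch of that idea in Python, not a tested tool: it assumes a
Linux /proc with a `meminfo` file and per-process `stat` files, and the
function names and any log path passed to it are hypothetical.

```python
import os
import time


def parse_meminfo(text):
    """Map each 'Name: <number> ...' line to its first integer value."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def parse_stat(text):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command sits in parentheses and may itself contain spaces or ')'.
    left = text.index("(")
    right = text.rindex(")")
    state = text[right + 1:].split()[0]
    return int(text[:left]), text[left + 1:right], state


def take_snapshot(proc_root="/proc"):
    """Build one text record: timestamp, free memory, then every process."""
    lines = [time.strftime("=== %Y-%m-%d %H:%M:%S")]
    with open(os.path.join(proc_root, "meminfo")) as f:
        mem = parse_meminfo(f.read())
    lines.append("MemFree=%s SwapFree=%s"
                 % (mem.get("MemFree"), mem.get("SwapFree")))
    pids = sorted(int(e) for e in os.listdir(proc_root) if e.isdigit())
    for pid in pids:
        try:
            with open(os.path.join(proc_root, str(pid), "stat")) as f:
                _, command, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading
        lines.append("%d %s %s" % (pid, state, command))
    return "\n".join(lines) + "\n"


def append_snapshot(log_path, proc_root="/proc"):
    """Append one snapshot and ask the kernel to push it to disk now."""
    with open(log_path, "a") as log:
        log.write(take_snapshot(proc_root))
        log.flush()
        os.fsync(log.fileno())
```

Calling `append_snapshot` in a loop with a `time.sleep` of, say, 30
seconds between calls would leave a trail ending shortly before the
freeze. The fsync call only requests the write; whether the very last
record reaches the platter also depends on the disk's own cache.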

5) Questions:

0) Is anybody experiencing the same kind of flakiness? If it is really
the install that triggered it for me, it should have triggered it for
others!
1) How can one generate nameless processes in acct logs? Can this be
normal?
2) What tools could I use to help pinpoint the problem? E.g. a
process accounting that would log the beginning (instead of the end)
of processes...
3) Can a network driver really freeze the full kernel?
4) How can the kernel be frozen? Is there a kernel bug that propagated
through 2.2.13-17?

Many thanks for any help!

PS: you can privately reply to this mail.

Annex 1: Portscan attack (November 99)

9:07:13 tcplogd: port 1114 connection attempt from [EMAIL PROTECTED] [123.4.576.89]
9:07:13 tcplogd: port 1116 (idem)
9:07:15 tcplogd: port 1171 "
9:07:18 tcplogd: port 1174 "
9:07:20 tcplogd: port 1183 "
9:07:24 tcplogd: port 1186 
9:07:26 tcplogd: port 1192 
9:08:34 tcplogd: port 1195 
9:09:10 tcplogd: port 1203 
9:09:42 tcplogd: port 1206 
9:10:05 tcplogd: port 1212 
9:10:27 tcplogd: port 1215 
9:11:03 tcplogd: port 1282 
9:13:15 tcplogd: port 1371 
9:14:07 tcplogd: port 1430 
9:14:37 tcplogd: port 1433 
9:14:48 tcplogd: port 1503 
9:15:00 tcplogd: port 1506 
9:18:23 tcplogd: port 1599 
9:19:05 tcplogd: port 1634 
9:19:13 tcplogd: port 1667 
9:19:15 tcplogd: port 1794 
9:19:17 tcplogd: port 1888 
9:19:18 tcplogd: port 2042 
9:19:20 tcplogd: port 2089 
9:19:22 tcplogd: port 2093 
9:21:20 tcplogd: port 2098 
9:21:33 tcplogd: port 2103 
9:21:35 tcplogd: port 2106 
9:21:37 tcplogd: port 2146 
9:21:38 tcplogd: port 2149 
9:21:40 tcplogd: port 2153 
9:21:42 tcplogd: port 2157 
9:22:05 tcplogd: port 2160 
9:24:01 tcplogd: port 2166 
9:24:09 tcplogd: port 2169 
9:26:10 tcplogd: port 2174 
9:27:57 tcplogd: port 2213 
9:27:57 tcplogd: port 2216 
9:28:27 tcplogd: port 2221 
9:31:17 tcplogd: port 2224 
9:31:29 tcplogd: port 2232 
9:31:48 tcplogd: port 2235 
9:32:03 tcplogd: port 2243 
9:32:16 tcplogd: port 2252 
9:32:29 tcplogd: port 2255 
9:32:42 tcplogd: port 2258 
9:32:59 tcplogd: port 2266 
9:33:23 tcplogd: port 2308 
9:34:39 tcplogd: port 2377 
9:34:41 tcplogd: port 2383 
9:34:42 tcplogd: port 2386 
9:34:45 tcplogd: port 2456 
9:34:48 tcplogd: port 2465 
9:35:29 tcplogd: port 2480 
9:35:34 tcplogd: port 2545 
9:35:38 tcplogd: port 2662 
9:35:42 tcplogd: port 2666 
9:35:46 tcplogd: port 2670 
9:35:51 tcplogd: port 2857 
9:35:58 tcplogd: port 2904 
9:36:11 tcplogd: port 3084 
9:36:13 tcplogd: port 3138 
9:36:22 tcplogd: port 3141 
9:36:36 tcplogd: port 3146 
9:36:40 tcplogd: port 3203 
9:36:51 tcplogd: port 3271 
9:37:03 tcplogd: port 3329 
9:37:15 tcplogd: port 3388 
9:37:23 tcplogd: port 3444 
9:37:26 tcplogd: port 3631 
9:37:29 tcplogd: port 3689 
9:37:32 tcplogd: port 3695 
9:37:34 tcplogd: port 3755 
9:37:38 tcplogd: port 3879 
9:37:41 tcplogd: port 4003 
9:37:43 tcplogd: port 4126 
9:37:45 tcplogd: port 4129 
9:37:54 tcplogd: port 4136 
9:37:57 tcplogd: port 4142 
9:38:03 tcplogd: port 4147 
9:38:10 tcplogd: port 4152 
9:38:19 tcplogd: port 4156 
9:38:26 tcplogd: port 4159 
9:38:28 tcplogd: port 4163 
9:38:30 tcplogd: port 4169 
9:38:41 tcplogd: port 4174 
9:38:45 tcplogd: port 4180 
9:38:54 tcplogd: port 4183 
9:38:58 tcplogd: port 4188 
9:39:03 tcplogd: port 4191
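
The annex can be summarised mechanically, for instance to count the
attempts and to check whether the port numbers only ever go up. A small
Python sketch, assuming the `H:MM:SS tcplogd: port N` layout used above;
`parse_tcplog` and `summarize` are illustrative names, not part of
tcplogd.

```python
import re

# One log entry: time of day, the literal "tcplogd:", then the port number.
ENTRY = re.compile(r"^\s*(\d{1,2}):(\d{2}):(\d{2})\s+tcplogd:\s+port\s+(\d+)")


def parse_tcplog(lines):
    """Return a list of (seconds_since_midnight, port) for matching lines."""
    events = []
    for line in lines:
        match = ENTRY.match(line)
        if match:
            hours, minutes, seconds, port = map(int, match.groups())
            events.append((hours * 3600 + minutes * 60 + seconds, port))
    return events


def summarize(events):
    """Summarise the attempts; assumes the log does not cross midnight."""
    if not events:
        return None
    times = [t for t, _ in events]
    ports = [p for _, p in events]
    return {
        "attempts": len(events),
        "duration_secs": times[-1] - times[0],
        "lowest_port": min(ports),
        "highest_port": max(ports),
        "ports_strictly_increasing": all(
            a < b for a, b in zip(ports, ports[1:])
        ),
    }
```

Lines that do not match the layout, such as a wrapped address, are
simply skipped.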
