Due to a faulty fan, one CPU overheated and brought the system down. On restart, fsck indicated that some filesystem corruption had occurred.
On startup, gdm would not start. After I entered my username at the console, the login prompt came back without giving me the opportunity to enter my password. The logical next step was to boot in single user mode.

In single user mode, it quickly became apparent that a few programs segfault. Among them: su, apache, gdm, smbd, nmbd, cron, pppd, and login. Almost everything else superficially seems to work, with a few exceptions.

So I tried to find out what these programs could have in common, apart from creating tasks under a different user than the one they run as. I suspected that they all depended on a library whose file the crash had corrupted. So off I went with ldd. Apart from the omnipresent libc6 (without which not much does anything at all), the prime suspect was libcrypt. It seems that anything that uses libcrypt crashes the moment it calls it. I only say "it seems" because I was unable to be more conclusive from the strace output. But that may be because I am not familiar with strace.

I observed one exception: makepasswd. Strace shows it calling something from libcrypt, but it does its job with no problem. I compared /lib/libcrypt.so.1 between the broken server and another machine with the same OS, and the file sizes were identical. So I have no proof that libcrypt is guilty, and my feelings toward this hypothesis may be completely wrong.

Here is an example of strace output. The program studied is "login" (the one that generates the console login prompt). It begins with calls in

    /lib/libcrypt.so.1
    /lib/libpam.so.0
    /lib/libpam_misc.so.0
    /lib/libdl.so.2

Then, on the sane system, it goes like the following. It's the same on the broken system, except that the memory addresses are not the same.

    open("/lib/libc.so.6", O_RDONLY) = 3
    read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\230\327"..., 1024) = 1
    fstat64(3, {st_mode=S_IFREG|0755, st_size=1170492, ...}) = 0
    old_mmap(NULL, 1187296, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4005c000
    mprotect(0x40174000, 40416, PROT_NONE) = 0
    old_mmap(0x40174000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1
    old_mmap(0x4017a000, 15840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANO
    close(3) = 0
    munmap(0x40016000, 40843) = 0

Here, login on the broken machine segfaults:

    --- SIGSEGV (Segmentation fault) ---
    +++ killed by SIGSEGV +++

Except that on the broken machine the last line before the segfault is different (the address is the same, but the second argument is not):

    munmap(0x40016000, 35897) = 0

I don't know if that detail is relevant, but since some (but not all) of the segfaulting programs end the same way, I thought it might be.

On the sane system, here is the beginning of what follows in the strace after the point where it segfaulted on the broken system.

    brk(0) = 0x80546dc
    brk(0x8054704) = 0x8054704
    brk(0x8055000) = 0x8055000
    getuid32() = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
    ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
    brk(0x8057000) = 0x8057000
    readlink("/proc/self/fd/0", "/dev/pts/2", 4095) = 10
    socket(PF_UNIX, SOCK_STREAM, 0) = 3
    connect(3, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT
    close(3) = 0
    open("/etc/nsswitch.conf", O_RDONLY) = 3

I thought it might give some elements of context.

If anyone has read this far, thank you. At this point, I am somewhat out of my depth, to say the least. Any hint that can help me pin down the cause of my misery is more than welcome. And yes, I do have backups of my data, but not of the operating system.
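
For anyone wanting to retrace the two checks described above (finding the libraries that the crashing programs share, and comparing a library file between two machines), here is a rough sketch in Python. It is only an illustration under some assumptions: that ldd prints its usual "name => path (address)" lines, and that Python itself still runs on the machine in question. All function names are hypothetical. Note that a content hash is swapped in for the file-size comparison, since two files of identical size can still have different contents.

```python
import hashlib
import subprocess


def parse_ldd(output):
    """Return the set of library names found in the text output of one ldd run."""
    libs = set()
    for line in output.splitlines():
        # Typical line: "libcrypt.so.1 => /lib/libcrypt.so.1 (0x40021000)"
        # The loader line has no "=>": "/lib/ld-linux.so.2 (0x40000000)"
        name = line.split("=>")[0].split("(")[0].strip()
        # Skip blank lines and messages such as "statically linked".
        if ".so" in name:
            libs.add(name)
    return libs


def common_libraries(ldd_outputs):
    """Intersect the library sets of several programs.

    ldd_outputs maps a program name to the text that ldd printed for it.
    """
    sets = [parse_ldd(text) for text in ldd_outputs.values()]
    return set.intersection(*sets) if sets else set()


def run_ldd(path):
    """Run ldd on one binary and return its standard output as text."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return result.stdout


def sha256_of(path):
    """Hash a file's contents so copies on two machines can be compared."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

To use it, one would call run_ldd on each segfaulting binary, pass the collected outputs to common_libraries, and then compare sha256_of("/lib/libcrypt.so.1") (and the same for any other shared suspect) between the broken server and the healthy machine. A differing hash with an identical size would point to corrupted contents.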