Karl,

First off, thank you very much for looking so closely at my logs! I truly appreciate your time.


I've been running Debian testing for about a year and a half. It's been quite stable. I performed a dist-upgrade about two weeks ago, and it's been unstable since. By unstable I mean that applications may crash (disappear) and the system may freeze. The system is freezing about once a day.

Did the upgrade include the kernel?

Yes, it included a kernel upgrade from 2.6.12.


My processor is an Athlon 64 3200+, but I'm running the 686 kernel (2.6.18-4). I've posted a few logs here:

  http://robinsonhome.org/logs1/dmesg.txt

According to this, you're using the NVidia kernel module, which is closed source. If (and that's a _big_ if so far) the cause of your instability is a kernel problem, see if you can reproduce it without loading the nvidia module; the kernel maintainers tend to ignore kernel bug reports when closed-source modules are loaded...

Yes, I reproduced it again today without the closed-source nVidia modules.


You're also using the ivtv modules - is this a mythbox by any chance? I noticed lirc too somewhere... I'm running roughly the same configuration without problems (AMD 1.8GHz, 2.6.18 kernel, 512MB memory, ivtv 0.8.2), but I'm using the -k7 kernel.

Yes, it serves as a Myth backend and frontend...as well as an LTSP server. It did a mighty good job of it all until this happened :(



  http://robinsonhome.org/logs1/kern.log
  http://robinsonhome.org/logs1/messages

According to this you're running out of memory!? At least the oom-killer is (disturbingly) active before the reboot, but the messages are pretty definite: your 2G swap is maxed out. That's bad and will cause all sorts of problems...

I *think* these occurred when I was hunting for a memory size to test with 'memtest.' But now I know to look for this error specifically...thanks.
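
For reference, a rough sketch of how one might search a saved kernel log for this error. It assumes the log lines contain either "oom-killer" or "out of memory" (the exact wording varies between kernel versions), and the function name find_oom_lines is made up for illustration:

```python
import re

# Assumption: OOM killer activity shows up in the log with one of these
# two phrases. The exact wording differs between kernel versions, so the
# match is deliberately loose and case-insensitive.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return (line_number, line) pairs that mention the OOM killer."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if OOM_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Possible usage against a real log (the path is an assumption):
#   with open("/var/log/kern.log", errors="replace") as handle:
#       for number, line in find_oom_lines(handle.read()):
#           print(number, line)
```

Comparing the timestamps of any hits with the times of the freezes should show whether the two are related or whether the messages only come from the memtest experiments.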


One thing I noticed while recompiling various applications is that gcc would display the following error:

dsputil.c: In function 'pix_abs8_y2_c':
dsputil.c:3048: internal compiler error: Segmentation fault

This *could* be an unhandled out-of-memory error...

Do you know of any way to get more information from gcc when this happens? It may lead me to the problem. I Googled around but couldn't find anything.


But if I just continued the build by typing 'make' again, it would pick up where it left off and eventually complete. For a large application, I may run into this problem a few times. On closer inspection of the build logs I noticed this snippet:

gcc -c -pipe -march=k8

MythTV again? It has its own processor detection code. And it does take a while to compile...

Yup, it's Myth. I really don't think I had to recompile...I was just reaching at that point. But now it's become the quick litmus test for system stability.



Next time you compile things, start a couple of sessions (=separate windows):
- vmstat 5 - to keep track of free memory and swapping
- top - sorted so the most memory hungry processes are on top
- tail -f /var/log/syslog - to see when oom-killer fires up
- a compile session

and keep an eye on what happens in the other sessions when gcc fails.
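
If juggling several windows is awkward, the memory part of that monitoring could also be logged by a small script. This is only a sketch, assuming a Linux /proc/meminfo with MemFree and SwapFree entries; the helper names parse_meminfo and watch_memory are hypothetical, and it does not replace vmstat or top:

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def watch_memory(interval=5, samples=3, path="/proc/meminfo"):
    """Print free memory and free swap every `interval` seconds."""
    for _ in range(samples):
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        print(time.strftime("%H:%M:%S"),
              "MemFree:", info.get("MemFree"), "kB",
              "SwapFree:", info.get("SwapFree"), "kB")
        time.sleep(interval)
```

Running watch_memory(interval=5, samples=1000) in one window while the compile runs in another leaves a timestamped record that can be compared with the moment gcc fails.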

I'll try it out and let you know how it goes.  Thanks again!

-Mike

