Helmut Jarausch wrote:
> On 24 Nov, Stefan G. Weichinger wrote:
>> Stefan G. Weichinger wrote:
>>> Stefan G. Weichinger wrote:
>>>
>>>> Since then no crashes, but I would have to test clicking some more stuff
>>>> to really believe ...
>>> As always, after hitting SEND ... one more crash ...
>> Sometimes it crashes after clicking opera, sometimes after clicking
>> thunderbird, so far never when clicking/starting a gnome-terminal.
>>
>> I am still looking for a pattern or an error-message somewhere ...
>>
>
> This reminds me of a problem we had just recently.
> Have you got a multi-core CPU?
> If yes, read on.
>
> We have 6 machines here running an identical Gentoo system
> (just different hostname and IP number)
> with an AMD Phenom II quad core CPU and identical motherboards.
> One of them had these random crashes you reported.
> I've tortured memory by running up to 3 memtester processes
> overnight - not a single fault. Our dealer has replaced the motherboard -
> again no change. Then I suspected the CPU itself although it has stood
> a burnK7 run for several hours.
>
> After the CPU has been replaced the spook has gone.
> I suspect a cache coherence problem. The normal memory tests
> assign a given window of the physical storage to a given core -
> even if run in parallel. But a typical usage under Linux switches
> the core which executes a given thread quite frequently.
> Now the Phenom II has 4 cores, each with a private 0.5 MB primary cache,
> but a 6 MB second level cache common to all 4 cores.
> In the BIOS one can opt for all 4 cores using this secondary cache
> or for only a single core using it.
> When a core writes to this cache or to memory all other cores must be
> informed that their private cache is invalid. If this doesn't happen or
> happens a bit too late, a core will fetch invalid (old) memory contents
> which may result in a crash.
> So, if you can, set the BIOS switch so that only a single core
> can use the secondary cache. If the problem disappears
> the CPU is broken.
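
The access pattern described above, where data is written while running on one core and read back after the thread has moved to another, can be sketched roughly as follows. This is only an illustration of the idea, not a diagnostic tool: the names `pin_to_cpu` and `migrate_and_verify` are made up for this example, the pinning relies on `os.sched_setaffinity` (Linux only), and a Python loop is far too gentle to prove or disprove a hardware fault.

```python
import os


def pin_to_cpu(cpu):
    """Best effort: restrict the calling thread to one CPU (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # not permitted here; continue without pinning


def migrate_and_verify(rounds=100, size=1 << 20):
    """Write a pattern on one core, read it back on the next one.

    Returns the number of bytes that did not match. On healthy
    hardware this should be 0.
    """
    if hasattr(os, "sched_getaffinity"):
        original = os.sched_getaffinity(0)
    else:
        original = {0}  # assumption: no affinity API, treat as one CPU
    cpus = sorted(original)
    buf = bytearray(size)
    mismatches = 0
    try:
        for r in range(rounds):
            pattern = (r * 37 + 1) % 256
            # Fill the buffer while running on the "writer" core.
            pin_to_cpu(cpus[r % len(cpus)])
            buf[:] = bytes([pattern]) * size
            # Move to the next core and check what it sees.
            pin_to_cpu(cpus[(r + 1) % len(cpus)])
            mismatches += size - buf.count(bytes([pattern]))
    finally:
        # Restore the affinity mask we started with.
        if hasattr(os, "sched_setaffinity"):
            try:
                os.sched_setaffinity(0, original)
            except OSError:
                pass
    return mismatches


if __name__ == "__main__":
    print("mismatched bytes:", migrate_and_verify())
```

A memory tester that keeps each worker on its own window of memory never exercises this hand-over between cores, which would be consistent with memtester finding nothing on the faulty machine.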

Phew, quite some theory ... do you positively know that this was the
reason?

I think I haven't seen such a setting in my BIOS. I use an Intel
Core2Duo E6600 on an Intel DP965LT board here, 8 gigs of RAM lately ...

BUT my issues really only started after completely going to ~amd64; I
never saw such a crash before when I used a mixed setup (most pkgs
stable, some unstable ...).

I will have a look at my BIOS now.

Thanks anyway for that information, greets to Aachen (from Austria) ...

Stefan