Martin Mokrejs added the comment:

Hi,
  I think I should report back what I found on the hardware side. While memory
testing tools like memtest86+ and others did not find any errors, the built-in
Dell ePSA test suite, which likely computes checksums of the tested memory
regions, reported some addresses/regions as failed. Sadly, nobody seems to know
the details of the failing tests. Repeated testing reported different memory
regions, so I never understood whether the culprit was a bad CPU cache or
something else randomizing the issue. At least only one of the two SO-DIMMs
caused the problems, so let's conclude it had been partly baked and was failing
randomly. At that time it seemed the cause was either a bad CPU producing just
too much heat or the fan. The fan was replaced, and max temperatures went down
from 92 °C to 82 °C. Two months later I found more and more often that an
external HDMI-connected display did not receive a signal, so even the CPU got
replaced. I got another drop in max temperatures; they now peak at about
70 °C. Cool!

Back to Python: the random crashes of my apps stopped after the memory module
was replaced (actually, the whole pair was replaced). I started to dream about
the Linux kernel mirroring memory contents for failure resiliency, but there is
nothing like that available.

In summary, this lesson was hard and showed that there are no good tools to
test hardware. Checksums should always be used, and stored bits should be
tested for fading over time. The mirroring trick could also have uncovered the
failing memory or CPU. It seems there is still a long way to go to a perfect
computer.

Thanks to everybody for their efforts on this issue. Whether Python takes
anything away from this lesson is up to you.

----------
status: pending -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18843>
_______________________________________
_______________________________________________
Python-bugs-list mailing list