On 3/17/20 10:14 AM, Rich Freeman wrote:
> On Tue, Mar 17, 2020 at 1:59 AM <tu...@posteo.de> wrote:
>
> Finally, ALL DRIVES FAIL. It doesn't matter what the underlying
> storage technology is. I've seen hard drives fail in less than a
> year, with the warranty replacement drive failing less than a year
> after that. I think next warranty replacement (still in the original
> warranty period) lasted 5+ years of near-continuous use. The typical
> failure modes of hard drives and solid state storage are different,
> but they all fail. You can't perfectly predict WHEN they will fail
> either. Most drives have SMART and sometimes it can detect failure
> conditions before failure, but not always.
Hello Rich, et al.
I have trimmed most of the quoted text because I agree with the thread
details: you get what you pay for, but paying extra is rarely rewarded...
HEAT is the enemy of all electronics and mechanical things, and computer
drives/memory are no exception. There are a myriad of interfaces/codes
on modern motherboards, and quite a few on legacy motherboards, that
track heat. Some are not very accurate, but most are reasonable.
Hopefully, you kept your mobo book. A section somewhere talks about
temperature sensors. If the CPU is loaded, the drives are most likely
getting hot. If the fans are running at a relatively high speed, the
system is generating tons of heat. If the GPU(s) are running hot, the
drives are hot. Tools that scan the hardware for sensors are great; use
them!
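As a rough sketch of what such sensor-scanning tools read underneath: on
Linux the kernel exposes temperatures under /sys/class/hwmon, with each
tempN_input file holding millidegrees Celsius. The helper name hwmon_temps
and its optional directory argument are my own inventions for illustration,
and which chips show up depends entirely on the drivers your kernel has
loaded.

```shell
#!/bin/sh
# hwmon_temps [ROOT]: print every temperature found under a hwmon tree.
# hwmon_temps is a hypothetical helper name; ROOT defaults to /sys/class/hwmon.
# The kernel reports tempN_input values in millidegrees Celsius.
hwmon_temps() {
    root="${1:-/sys/class/hwmon}"
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue                     # unmatched glob or unreadable file
        chip=$(cat "$(dirname "$f")/name" 2>/dev/null || echo unknown)
        milli=$(cat "$f" 2>/dev/null) || continue   # some sensors refuse to read
        # Convert millidegrees C to degrees C and degrees F.
        awk -v chip="$chip" -v s="$(basename "$f" _input)" -v m="$milli" \
            'BEGIN { c = m / 1000; printf "%s %s %.1fC %.1fF\n", chip, s, c, c * 9 / 5 + 32 }'
    done
}

hwmon_temps "$@"
```

Run it with no arguments to scan the live system. If it prints nothing, the
relevant sensor modules are probably not loaded.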
I now install 'water coolers' from Thermaltake on all my chassis-based
systems. New or large video cards have tons of processing going on inside
the GPUs, and are thus a large source of heat. Systems with lots of GPU
cards are like ovens. All of this heat, regardless of source, KILLS all
forms of memory, especially 'drives'. Keep everything monitored, well
vented, and in a room kept as cool as possible. Many server farm rooms run
below 50 degrees F to extend the performance and life of electronics,
particularly HDDs and other forms of memory. Many chipsets scale down
auto-magically as heat increases.
Another (indirect) way to monitor heat is to monitor the power
consumption of a component. A (relatively) large power draw goes hand in
hand with heat production. Heat kills drives and memory... no exceptions!
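One possible way to watch power draw, sketched here under the assumption
that your CPU and kernel expose the RAPL powercap counter at
/sys/class/powercap/intel-rapl:0/energy_uj (not every system does, and
reading it often needs root). The counter is cumulative energy in
microjoules, so average watts is the difference between two readings
divided by the elapsed seconds. The name avg_watts and the 5 second
interval are arbitrary choices of mine, and this covers the CPU package
only, not the drives.

```shell
#!/bin/sh
# avg_watts START_UJ END_UJ SECONDS: average power between two energy readings.
# avg_watts is a hypothetical helper name. Counter wraparound is NOT handled.
avg_watts() {
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", (b - a) / 1000000 / t }'
}

# Assumed path: package 0 of an Intel-style RAPL powercap zone.
rapl=/sys/class/powercap/intel-rapl:0/energy_uj
if [ -r "$rapl" ]; then
    e1=$(cat "$rapl")
    sleep 5
    e2=$(cat "$rapl")
    echo "package-0: $(avg_watts "$e1" "$e2" 5) W"
else
    echo "no readable RAPL counter at $rapl" >&2
fi
```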
Here are a few one-liners I use to monitor (use/load == heat):
watch -n12 sensors -f
dstat -tcndylp --top-cpu 10
htop
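For the drives themselves, a minimal sketch that pulls the temperature out
of smartctl (from smartmontools). It assumes an ATA/SATA drive that reports
SMART attribute 194 (Temperature_Celsius) with the raw value in the tenth
column of "smartctl -A" output. NVMe drives print a different layout and
are not handled here. The helper name smart_temp and the default /dev/sda
are placeholders, and smartctl normally needs root.

```shell
#!/bin/sh
# smart_temp: read "smartctl -A" output on stdin and print the raw value of
# attribute 194 (Temperature_Celsius), which ATA drives report in degrees C.
# smart_temp is a hypothetical helper name.
smart_temp() {
    awk '$1 == 194 { print $10; exit }'
}

dev="${1:-/dev/sda}"    # placeholder device; pass your own as the first argument
if command -v smartctl >/dev/null 2>&1; then
    temp=$(smartctl -A "$dev" 2>/dev/null | smart_temp)
    echo "$dev: ${temp:-unknown} C"
else
    echo "smartctl not installed (package smartmontools)" >&2
fi
```

Wrapping the script in watch, the same way as the sensors one-liner, gives a
running view of drive temperature.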
What would be great is if folks just listed what they use to monitor the
workload (and therefore heat, indirectly) or the actual temperatures of
given chipsets and "smart drives". Perhaps we can then cull the
responses and update the Gentoo help pages online with more detailed
examples, scripts and tools to better organize heat, current and other
related performance parameters.
hth,
James