One last thought: it could be a problem with the thermal regulation itself. If
you can figure out the fan wiring, you could set the fans to run at full speed
all the time. That might help, or it might not. You could also fit fans of the
same size but with a higher current rating/airflow. As this is a server,
apparently rack-mounted, you probably can't do much else about the cooling.

mad.scientist.at.large (a good madscientist)
--
Read, Scream, Fight <https://www.eff.org/>



20 Apr 2018 06:21 by michaelkintz...@gmail.com:


> On Friday, 20 April 2018 12:55:13 BST Corbin Bird wrote:
>> Oak Ridge National Laboratory uses these processors ( Rhea Cluster ) and
>> has numerous heat failures.
>>
>> Due to poor cooling ... surprised?
>>
>> The cooling is not working right. Something is still wrong.
>>
>> On 04/19/2018 09:33 PM, R0b0t1 wrote:
> >> > Dell Precision T7600, two 16-thread Xeons, 192GB of RAM, two Quadro
>> > cards and a Tesla card.
>> > 
>> > The system is a few years old at this point. Old enough that the
>> > thermal compound could have hardened, which is why I replaced it.
>
> If the problem started suddenly, rather than getting progressively worse over 
> time, it may have something to do with kernel drivers, or some change in 
> firmware.
>
> If the cause is mechanical, I'd also suggest checking the heat sink contact
> surface.  Some heat sinks are poorly manufactured and need flattening with
> wet-and-dry sandpaper to get a flat enough surface and improve their contact
> with the CPU.  I've seen a 15°C improvement in a Zalman CPU cooler after excess
> metal was removed from its copper pipes, which had been manufactured standing
> proud of the contact surface.  Hardcore overclockers flatten the CPU too, but
> I'd avoid anything that radical, because it can go badly wrong if you remove
> more than the surface varnish from the chip.
>
> In the interim, opening the side panel may also help in hot weather.
>
> -- 
> Regards,
> Mick
