On 04/11/2017 01:10, Jeff Gilbert wrote:
> Clock speed and core count matter much more than ECC. I wouldn't chase
> ECC support for general dev machines.

The Xeon-W SKUs I posted in the previous thread all support ECC and have clock speeds identical to or higher than the equivalent Core i9 SKUs. The sole exception is the i9-7980XE, whose peak turbo clock is slightly higher (100MHz) than that of the Xeon W-2195. There is IMHO no performance-related reason to skimp on ECC support, especially for machines that will sport a significant amount of memory.

The importance of ECC memory is IMHO underestimated, mostly because it's not common, so users do not realize they may be hitting memory errors more often than they think. My main workstation is now 5 years old and has accumulated 24 memory errors. That may not seem like much, but if one happens at a bad time, or in a bad place, it can ruin your day or permanently corrupt your data.

As another example of the importance of ECC: my laptop (obviously) doesn't have ECC support, and two years ago a single bit went bad in its second DIMM. The issue manifested itself as internal compiler errors while building Fennec. The first time I just pulled again from central, thinking it was a fluke. The second time I updated the build dependencies, which I hadn't done in a while, thinking that an old GCC might have been the cause. It was not until the third day with a failure that I realized what was happening. A two-hour memory test showed me the second DIMM was bad, so I removed it, ordered a new one and went on to check my machine. I had to purge my compilation cache because garbage had accumulated in there, run an hg verify on my repo, and verify all the installed packages for errors.

Since I didn't have access to my main workstation at the time, I wasted 3 days chasing the issue, and my workflow was slowed down by a cold compilation cache and a gimped machine (until I could replace the DIMM). This is not common, but it's not rare either, and we now have hundreds of developers within Mozilla, so people are going to run into issues that could easily be prevented by having ECC memory.

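As an aside on where error counts like the one above can come from: on Linux machines with ECC memory and an EDAC driver loaded, the kernel exposes per-memory-controller counters of corrected and uncorrected errors under sysfs. The following is only a minimal sketch under those assumptions. The message does not say how the 24 errors were counted, and the function name `read_edac_counts` is made up for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs.

    Assumes a Linux system with an EDAC driver loaded; returns an empty
    dict when the directory does not exist (no ECC, no driver, other OS).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    # Each memory controller appears as a directory named mc0, mc1, ...
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A non-zero corrected count means ECC silently fixed errors that would otherwise have reached software. On a machine without ECC, as with the laptop above, no such counter exists and the first symptom is typically a crash or corrupted data.
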
ECC memory also makes machines less susceptible to Rowhammer-like attacks and makes such attacks detectable while they are happening. For a more in-depth treatment of the matter I suggest reading "Memory Errors in Modern Systems - The Good, The Bad, and The Ugly" [1], in which the authors analyze memory errors on live systems over two years. They argue that SEC-DED ECC (the type of protection you usually get on workstations) is often insufficient, and that even chipkill ECC (now common on most servers) is not enough to catch all errors happening during real-world use.

 Gabriele

[1] https://www.cs.virginia.edu/~gurumurthi/papers/asplos15.pdf
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform