On Sat, 25 Nov 2006 [EMAIL PROTECTED] wrote:

.... reformatted ...

> First thing is I would like to thank everyone for their replies/help.
> This machine has been running for two years under Linux, but for last
                                    ^^^^^^^^^
Uh oh - possible CPU fan "fatigue" time... more below.

> two or three months has had Nexenta Solaris on it. This machine has
> never once crashed. I rebooted with a Knoppix disk in and ran memtest86.
> Within 30 minutes it counted several hundred errors which after cleaning
> the connections still occurred in the same locations. I replaced the RAM
> module and retested with no errors. My md5sums all verified no data was
> lost making me very happy. I did a zpool scrub which came back 100%
> clean. I still don't understand how the machine ran reliably with bad
> ram. That being said, a few days later I did a zpool status and saw 20
> checksum errors on one drive and 30 errors on the other.

You're still chasing a hardware issue (or issues), IMHO.

First, ensure that you are blowing air over the HDA (Head Disk Assembly)
of your installed hard drives. The drives don't care whether the airflow
is from back to front, left to right, right to left, front to back, etc.
And it does not have to be a lot of air, as long as there is positive
airflow over the HDA and the disk drive controller electronics.
Otherwise, the disk drives are likely to overheat while there is a lot
of head movement taking place.

My suggestion is to get one or more 92mm fans with a hard-disk-type
power connector and jury-rig them to blow air across the drives. Do
whatever it takes to secure the fans in position - bent wire hangers
secured to the case will work! It may not look pretty, but it'll get the
job done. Or... mount the drives in drive canisters with built-in fans.

Next, check for hotspots within the box. Check that the memory DIMMs are
getting good airflow. A great way to resolve this type of issue is to
use the Zalman Fan Bracket (FB123) and one or more 92mm fans. The bracket
itself is hard to explain, but it allows you to attach up to 4 fans in
slots and position them above anything that is a hotspot, including the
motherboard chipset, RAM DIMMs, graphics boards, gigabit Ethernet cards,
etc.
A picture is worth a thousand words:

http://www.endpcnoise.com/cgi-bin/e/std/sku=fb123

Note: this is not an endorsement of this site, just a good picture, since
the Zalman site (zalmanusa.com) is a pain to navigate.

Still on the cooling thread: the Seasonic PSUs are highly rated and very
quiet. But... they don't move enough air through your box and should be
supplemented with an intake fan (if your box has provision to add one)
and a rear-panel-mounted exhaust fan. Many PC users have upgraded their
PSUs and been careful to select a quiet one, without realizing that the
quiet PSU, with its slow-moving fan, greatly reduced the existing airflow
through the box. The PSU can run effectively with the reduced airflow,
but the other components in the system cannot.

If you want to apply science and actually measure your box for hotspots,
I suggest you run the box at the usual ambient temperature, with the
usual active workload, then carefully but quickly remove the side cover
(while the box is still running) and use a Fluke IR (infrared) thermal
probe[1] to measure the hotspots. Record the temperatures of the CPU
heatsink, RAM DIMMs, HDA, motherboard chipset, etc. You can also load up
the box by running SETI and/or beating up on the disk drives, and take
more measurements[2]. Then, after you apply the fixes... retest.

A couple of pointers that may help. If your box has an 80mm exhaust fan,
replace it with a 92mm (or 120mm) fan and use a plastic 92mm-to-80mm
adapter. This'll increase airflow without increasing the noise. Also,
Zalman makes a small "gizmo" that you put inline with a fan; it lets you
vary the fan speed and set it for the best noise/cooling tradeoff for
your box. It's called the "Fan Mate 2".

Last item on cooling (sorry): in many older systems, the small fan-based
CPU coolers die after only 2 years. But in many cases, the fan does not
actually stop turning; it slows down dramatically. And sometimes it'll
slow down only after it heats up a little.
So if you take the side cover off after the system has been running for
a couple of hours, you'll see the fan turning slowly, and touching the
CPU heatsink will probably burn your finger. If you check it a minute
after first powering up the system, it'll look normal and completely
fool you. When this happens (the fan slows down), the CPU temperature
will increase, its effective electrical resistance will drop, and it'll
draw more current ... which will generate even more heat. This is the
classic symptom of what we call "thermal runaway".

A slightly more subtle variant of this issue occurs with the AMD factory
coolers. After you remove the CPU heatsink fan, you'll notice a lot of
dirt/dust blocking up to half the area of the heatsink and greatly
reducing the airflow. But... you *have to* remove the fan to actually see
this. [3] If you have this issue, I suggest you replace the (AMD) factory
cooler with a Zalman product. [4]

In general, (Open)Solaris is a great system *exerciser*. It will usually
flush out marginal hardware that appears to work just fine with other,
"impaired" operating systems.

> Does anyone have any idea why I have to do "zpool export amber" followed
> by "zpool import amber" for my zpool to be mounted on reboot? zfs set
> mountpoint does nothing.

This may be an issue unique to Nexenta - I don't know. First get the
hardware completely rock-solid, then look for the software issues.

> BTW to answer some other concerns, the Seasonic supply is 400Watts with
> a guaranteed minimum efficiency of 80%. Using a kill-o-watt meter I have
> about 120Watts power consumption. The machine is on a UPS.

[1] I use an older model which requires a separate DMM (digital
multimeter) with 1/10 of a millivolt resolution. A Fluke DMM, of course!
But the "Fluke 60 Series Handheld Infrared Thermometers" are now accurate
and affordable.
For example: http://www.testequipmentdepot.com/fluke/thermometers/62.htm

[2] But don't do this until you've determined that you have reasonable
airflow within the box, or you'll probably damage something.

[3] Email me offlist with your motherboard and CPU type and I can
probably make a recommendation.

[4] I proposed this solution to a user on the [EMAIL PROTECTED] list,
and it resolved his problem. His problem: the system would reset after
getting about halfway through a Solaris install. The installer was simply
acting as a good system exerciser and heating up his CPU until it
glitched out. After he removed the CPU fan and cleaned up his heatsink,
he loaded up Solaris successfully.

Regards,

Al Hopper
Logical Approach Inc, Plano, TX.
[EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss