On Tue, Dec 2, 2008 at 11:42 AM, Brian Hechinger <[EMAIL PROTECTED]> wrote:
> I was not in front of the machine, I had remote hands working with me,
> so I apologize in advance for any lack of detail I'm about to give.
>
> The server in question is running snv_81 booting ZFS Root using Tim's
> scripts to "convert" it over to ZFS Root.
>
> My server in colo stopped responding. I had a screen session open and
> I could switch between screen windows and create new windows but I
> could not run any commands. I also could not log into the box.
>
> The hands on person saw this on the console (transcribed from a video
> console):
>
> SYNCHRONIZE CACHE command failed (5)
> scsi: WARNING: /[EMAIL PROTECTED],0/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd1)
>
> sd1 is one of two SATA disks connected to the machine via a SiL3124
> controller.
>
> I had the remote hands pull sd1 and reboot the machine. It came right
> up and has been running fine since. Lacking its mirrored disks,
> however.
>
> Due to other issues I've had with this box (If you think you can get
> away with running ZFS on a 32-bit machine, you are mistaken) I'm
> looking to replace it anyway. What concerns me is that a single disk
> having gone bad like that can take out the whole machine. This is not
> what I would consider an ideal or acceptable setup for a machine that
> is in colo that doesn't have 24x7 onsite support.
>
> What was to blame for this disk failure causing my machine to become
> unresponsive? Was it the SiL3124? Is it something else? Is this what
> I should expect from SATA?
>
> I ask all these questions as I want to make sure that if this is
> indeed connected to the use of a SATA controller, or the use of a
> specific SATA controller that I certainly avoid that with this next
> machine.
>
> I've got a very slim budget on this, and based on that I found what
> looks like a pretty nice little server that is in my budget. It's an
> ASUS RS161-E2/PA2 which is based on the nForce Professional 2200,
> which from what I can tell is what the Ultra 40 is based on, so I
> would expect it to pretty much just work.
>
> Will the nv_sata driver behave in a more sane fashion in a case like
> what I've just gone through? If this is a shortcoming of SATA, does
> anyone have any recommendations on a not too expensive setup based on
> a SAS controller?
>
> As much as I would like this thing to do a great job in the
> performance arena, stability is definitely higher on the list of
> what's really important to me.
>
> Thanks,
>
> -brian

I believe the issue you're running into is the failmode you currently
have set. Take a look at this:

http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/

--Tim
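
For reference, failmode is a per-pool property with three values: wait
(the default, which blocks I/O until the device comes back and the
errors are cleared), continue (which returns EIO to new writes but still
allows reads from healthy devices), and panic. A minimal sketch of
checking and changing it follows. The pool name rpool, the helper
function names, and the ZPOOL override variable are assumptions for
illustration only, and whether changing failmode would have avoided the
hang described above is not established in this thread.

```shell
# Hypothetical helpers around the real `zpool get` / `zpool set` commands.
# ZPOOL can be overridden (e.g. ZPOOL=echo) to preview the command
# without touching a pool.

show_failmode() {
    # $1 = pool name (assumed; take the real one from `zpool list`)
    ${ZPOOL:-zpool} get failmode "$1"
}

set_failmode() {
    # $1 = pool name, $2 = wait | continue | panic
    case "$2" in
        wait|continue|panic) ;;
        *) echo "unknown failmode: $2" >&2; return 1 ;;
    esac
    ${ZPOOL:-zpool} set "failmode=$2" "$1"
}

# Example (run as root on the affected machine):
#   show_failmode rpool
#   set_failmode rpool continue
```

The value check is there only to catch typos before zpool is invoked;
zpool itself also rejects an invalid value.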
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss