On Tue, Dec 2, 2008 at 11:42 AM, Brian Hechinger <[EMAIL PROTECTED]> wrote:

> I was not in front of the machine, I had remote hands working with me, so I
> apologize in advance for any lack of detail I'm about to give.
>
> The server in question is running snv_81 booting ZFS Root using Tim's
> scripts to "convert" it over to ZFS Root.
>
> My server in colo stopped responding.  I had a screen session open and I
> could switch between screen windows and create new windows but I could not
> run any commands.  I also could not log into the box.
>
> The hands-on person saw this on the console (transcribed from a video
> console):
>
> SYNCHRONIZE CACHE command failed (5)
> scsi: WARNING: /[EMAIL PROTECTED],0/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd1)
>
> sd1 is one of two SATA disks connected to the machine via a SiL3124 controller.
>
> I had the remote hands pull sd1 and reboot the machine.  It came right up
> and has been running fine since, though without its mirrored disk.
>
> Due to other issues I've had with this box (if you think you can get away
> with running ZFS on a 32-bit machine, you are mistaken) I'm looking to
> replace it anyway.  What concerns me is that a single disk going bad like
> that can take out the whole machine.  This is not what I would consider an
> ideal or acceptable setup for a machine in colo that doesn't have 24x7
> onsite support.
>
> What was to blame for this disk failure causing my machine to become
> unresponsive?  Was it the SiL3124?  Is it something else?  Is this what I
> should expect from SATA?
>
> I ask all these questions because I want to make sure that if this is
> indeed connected to the use of a SATA controller, or of a specific SATA
> controller, I avoid that with this next machine.
>
> I've got a very slim budget on this, and based on that I found what looks
> like a pretty nice little server that is in my budget.  It's an ASUS
> RS161-E2/PA2 which is based on the nForce Professional 2200, which from
> what I can tell is what the Ultra 40 is based on, so I would expect it to
> pretty much just work.
>
> Will the nv_sata driver behave in a more sane fashion in a case like what
> I've just gone through?  If this is a shortcoming of SATA, does anyone have
> any recommendations on a not too expensive setup based on a SAS controller?
>
> As much as I would like this thing to do a great job in the performance
> arena, stability is definitely higher on the list of what's really
> important to me.
>
> Thanks,
>
> -brian
>



I believe the issue you're running into is the failmode you currently have
set.  Take a look at this:
http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/
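
For reference, failmode is a pool property that takes one of wait (the
default), continue, or panic. Below is a minimal Python sketch of checking
and changing it by wrapping the zpool CLI. The pool name "rpool" and the
helper names are placeholders of mine, not anything taken from your setup,
and I can't promise that a different failmode would have kept this
particular box responsive.

```python
import subprocess

# Values the ZFS "failmode" pool property accepts.
FAILMODES = ("wait", "continue", "panic")


def failmode_get_cmd(pool):
    """Build the command that prints a pool's current failmode."""
    return ["zpool", "get", "failmode", pool]


def failmode_set_cmd(pool, mode):
    """Build the command that changes a pool's failmode."""
    if mode not in FAILMODES:
        raise ValueError("failmode must be one of: " + ", ".join(FAILMODES))
    return ["zpool", "set", "failmode=" + mode, pool]


def run(cmd):
    """Run a zpool command (needs privileges on the ZFS host); return stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # "rpool" is a placeholder pool name; substitute your own.
    # Only print the commands here; pass them to run() on the real host.
    print(" ".join(failmode_get_cmd("rpool")))
    print(" ".join(failmode_set_cmd("rpool", "continue")))
```

Building the argument list separately from running it lets you review the
exact command before anything touches the pool.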


--Tim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
