Todd H. Poole wrote:
>> But you're not attempting hotswap, you're doing hot plug....
>
> Do you mean hot UNplug? Because I'm not trying to get this thing to recognize any new disks without a restart... Honest. I'm just trying to prevent the machine from freezing up when a drive fails. I have no problem restarting the machine with a new drive in it later so that it recognizes the new disk.
>
>> and unless you're using the onboard bios' concept of an actual RAID array, you don't have an array, you've got a JBOD and it's not a real JBOD - it's a PC motherboard which does _not_ have the same electronic and electrical protections that a JBOD has *by design*.
>
> I'm confused by what your definition of a RAID array is, and for that matter, what a JBOD is... I've got plenty of experience with both, but just to make sure I wasn't off my rocker, I consulted the demigod:
>
> http://en.wikipedia.org/wiki/RAID
> http://en.wikipedia.org/wiki/JBOD
>
> and I think what I'm doing is indeed RAID... I'm not using some sort of controller card, or any specialized hardware, so it's certainly not hardware RAID (and thus doesn't contain any of the fancy electronic or electrical protections you mentioned), but lacking said protections doesn't preclude the machine from being considered a RAID. All the disks are the same capacity, the OS still sees the zpool I've created as one large volume, and since I'm using RAID-Z (RAID5), it should be redundant... What other qualifiers are necessary before a system can be called RAID compliant?
>
> If it's hot-swappable technology, or a controller hiding the details from the OS and instead presenting a single volume, then I would argue those things are extras - not a fundamental prerequisite for a system to be called a RAID.
>
> Furthermore, while I'm not sure what the difference between a "real JBOD" and a plain old JBOD is, this set-up certainly wouldn't qualify as either. I mean, there is no concatenation going on, redundancy should be present (but due to this issue, I haven't been able to verify that yet), and all the drives are the same size... Am I missing something in the definition of a JBOD?
>
> I don't think so...
>
>> And you're right, it can. But what you've been doing is outside the bounds of what IDE hardware on a PC motherboard is designed to cope with.
>
> Well, yes, you're right, but it's not like I'm making some sort of radical departure outside the bounds of the hardware... It really shouldn't be a problem so long as it's not an unreasonable departure, because that's where software comes in. When the hardware can't cut it, that's where software picks up the slack.
>
> Now, obviously, I'm not saying software can do anything with any piece of hardware you give it - no matter how many lines of code you write, your keyboard isn't going to turn into a speaker - but when it comes to reasonable stuff like ensuring a machine doesn't crash because a user did something with the hardware that he or she wasn't supposed to do? Prime target for software.
>
> And that's the way it's always been... The whole push behind that ZFS promise (or, if you want to make it less specific, the attractiveness of RAID in general) was that "RAID-Z [wouldn't] require any special hardware. It doesn't need NVRAM for correctness, and it doesn't need write buffering for good performance. With RAID-Z, ZFS makes good on the original RAID promise: it provides fast, reliable storage using cheap, commodity disks." (http://blogs.sun.com/bonwick/entry/raid_z)
>
>> Well sorry, it does. Welcome to an OS which does care.
>
> The half-hearted apology wasn't necessary... I understand that OpenSolaris cares about the method those disks use to plug into the motherboard, but what I don't understand is why that limitation exists in the first place. It would seem much better to me to have an OS that doesn't care (but developers that do) and just finds a way to work, versus one that does care (but developers that don't) and instead isn't as flexible and gets picky... I'm not saying OpenSolaris is the latter, but I'm not getting the impression it's the former either...
>
>> If the controlling electronics for your disk can't handle it, then you're hosed. That's why FC, SATA (in SATA mode) and SAS are much more likely to handle this out of the box. Parallel SCSI requires funky hardware, which is why those old 6- or 12-disk multipacks are so useful to have.
>>
>> Of the failure modes that you suggest above, only one is going to give you anything other than catastrophic failure (drive motor degradation) - and that is because the drive's electronics will realise this, and send warnings to the host.... which should have its drivers written so that these messages are logged for the sysadmin to act upon.
>>
>> The other failure modes are what we call catastrophic. And where your hardware isn't designed with certain protections around drive connections, you're hosed. No two ways about it. If your system suffers that sort of failure, would you seriously expect that non-hardened hardware would survive it?
>
> Yes, I would. At the risk of sounding repetitive, I'll summarize what I've been getting at in my previous responses: I certainly _do_ think it's reasonable to expect non-hardened hardware to survive this type of failure. In fact, I think it's unreasonable _not_ to expect it to. The Linux kernel, the BSD kernels, and the NT kernel (or whatever chunk of code runs Windows) all provide this type of functionality, and have for some time. Granted, they may all do it in different ways, but at the end of the day, unplugging an IDE hard drive from a software RAID5 array in OpenSuSE, RedHat, FreeBSD, or Windows XP Professional will not bring the machine down. And it shouldn't in OpenSolaris either. There might be some sort of noticeable bump (Windows, for example, pauses for a few seconds while it tries to figure out what the hell just happened to one of its disks), but there isn't anything show-stopping...
>
>> If you've got newer hardware, which can support SATA in native SATA mode, USE IT.
>
> I'll see what I can do - this might be some sort of BIOS setting that can be configured.
>
>>> I'm grateful for your help, but is there another way that you can think of to get this to work?
>>
>> You could start by taking us seriously when we tell you that what you've been doing is not a good idea, and find other ways to simulate drive failures.
>
> Let's drop the confrontational attitude - I'm not trying to dick around with you here. I've done my due diligence in researching this issue on Google, these forums, and Sun's documentation before making a post, I've provided any clarifying information that has been requested by those kind enough to post a response, and I've yet to resort to any witty or curt remarks in my correspondence with you, tcook, or myxiplx. Whatever is causing you to think I'm not taking anyone seriously, let me reassure you, I am.
>
> The only thing I'm doing is testing a system by applying the worst-case scenario of survivable torture to it and seeing how it recovers. If that's not a good idea, then I guess we disagree. But that's ok - you're James C. McPherson, Senior Kernel Software Engineer, Solaris, and I'm just some user who's trying to find a solution to his problem. My bad for expecting the same level of respect I've given two other members of this community to be returned in kind by one of its leaders.
>
> So aside from telling me to "[never] try this sort of thing with IDE," does anyone else have any other ideas on how to prevent OpenSolaris from locking up whenever an IDE drive is abruptly disconnected from a ZFS RAID-Z array?
>
> -Todd
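
For anyone skimming the thread: as far as I can tell, the setup being discussed is a single RAID-Z vdev built directly on the motherboard's IDE disks, with no controller card in between - something along the lines of the commands below. The pool name and device names are placeholders I've made up, not Todd's actual layout:

  # one RAID-Z vdev across four same-size whole disks - no hardware RAID involved
  zpool create tank raidz c0d0 c0d1 c1d0 c1d1

  # the OS sees a single pool, with parity-style redundancy handled inside ZFS
  zpool status tank
  zfs list tank
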
I'm far from being an expert on this subject, but this is what I understand: unplugging a drive (actually pulling the cable out) does not simulate a drive failure, it simulates a drive getting unplugged, which is something the hardware is not capable of dealing with. If your drive were to suffer something more realistic, along the lines of how you would normally expect a drive to die, then the system should cope with it a whole lot better.

Unfortunately, hard drives don't come with a big button saying "simulate head crash now" or "make me some bad sectors", so it's going to be difficult to simulate those failures. All I can say is that unplugging a drive yourself will not simulate a failure, it merely causes the disk to disappear. Dying or dead disks will still normally be able to communicate with the driver to some extent, so they are still "there".

If you were using dedicated hot-swappable hardware, then I wouldn't expect to see the problem, but AFAIK off-the-shelf SATA hardware doesn't support this fully, so unexpected results will occur.

I hope this has been of some small help, even just to explain why the system didn't cope as you expected.

Matt
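
P.S. If you want to put RAID-Z's redundancy through its paces without yanking cables, one gentler approach is to build a throwaway pool on plain files, then offline or damage one of them and watch what zpool status and a scrub report. A rough sketch follows - the file paths, sizes and pool name are made-up examples, not anything from your real pool:

  # four small files standing in for disks
  mkfile 128m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4

  # a disposable RAID-Z pool built on those files
  zpool create testpool raidz /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4

  # take one "disk" out of service the polite way and watch the pool go DEGRADED
  zpool offline testpool /var/tmp/d3
  zpool status testpool

  # bring it back, then scribble over part of it (past the front labels)
  # and let a scrub find and repair the damage from parity
  zpool online testpool /var/tmp/d3
  dd if=/dev/urandom of=/var/tmp/d3 bs=1024k seek=16 count=16 conv=notrunc
  zpool scrub testpool
  zpool status -v testpool

  # clean up afterwards
  zpool destroy testpool
  rm /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4

The zpool offline/online step should work on the real IDE pool too, since it degrades the pool without the IDE controller ever seeing a cable disappear out from under it.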