Re: Software raid - controller options

2007-11-05 Thread Alberto Alonso
ATA controllers cards? > > Are there any specific chipsets/brands of motherboards or controller > cards that you software raid veterans prefer? > > Thank you for your time and any info you are able to give me! > > Lyle

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 15:15 -0400, Doug Ledford wrote: > It was tested, it simply obviously had a bug you hit. Assuming that > your particular failure situation is the only possible outcome for all > the other people that used it would be an invalid assumption. There are > lots of code paths in a

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 11:45 -0400, Doug Ledford wrote: > The key word here being "supported". That means if you run across a > problem, we fix it. It doesn't mean there will never be any problems. On hardware specs I normally read "supported" as "tested within that OS version to work within spe

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Fri, 2007-11-02 at 11:09 +0000, David Greaves wrote: > David > PS I can't really contribute to your list - I'm only using cheap desktop > hardware. > - If you had failures and it properly handled them, then you can contribute to the good combinations, so far that's the list that is kind of e

Re: Software RAID when it works and when it doesn't

2007-11-02 Thread Alberto Alonso
On Sat, 2007-10-27 at 11:26 -0400, Bill Davidsen wrote: > Alberto Alonso wrote: > > On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote: > > > > > >> Depending on the hardware you can still access a different disk while > >> another one is reseti

Re: Implementing low level timeouts within MD

2007-11-02 Thread Alberto Alonso
On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote: > I wasn't belittling them. I was trying to isolate the likely culprit in > the situations. You seem to want the md stack to time things out. As > has already been commented by several people, myself included, that's a > band-aid and not a f

Re: Implementing low level timeouts within MD

2007-10-31 Thread Alberto Alonso
On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote: > > Really, you've only been bitten by three so far. Serverworks PATA > (which I tend to agree with the other person, I would probably chock 3 types of bugs is too many, it basically affected all my customers with multi-terabyte arrays. Hec

Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote: > I agree with Doug: nothing prevents you from using md above very slow > drivers (such as remote disks or even a filesystem implemented over a > tape device to make it extreme). Only the low-level drivers know when > it is appropriate to tim

Re: Implementing low level timeouts within MD

2007-10-29 Thread Alberto Alonso
On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote: > OK, these you don't get to count. If you run raid over USB...well...you > get what you get. IDE never really was a proper server interface, and > SATA is much better, but USB was never anything other than a means to > connect simple device

Re: Implementing low level timeouts within MD

2007-10-27 Thread Alberto Alonso
On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote: > On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote: > > Regardless of the fact that it is not MD's fault, it does make > > software raid an invalid choice when combined with those drivers. A > > single disk fa

RocketRAID 2220 firmware raid experience

2007-10-27 Thread Alberto Alonso
Has anybody used the RocketRAID 2220 to build "hardware" raid and lived through failures? As some of you may know from my previous posts, I've been having problems with software raid. Unfortunately, this was the only card available to me to add to my server so I haven't been able to test anything

Re: Implementing low level timeouts within MD

2007-10-27 Thread Alberto Alonso
On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote: > > This isn't an md problem, this is a low level disk driver problem. Yell > at the author of the disk driver in question. If that driver doesn't > time things out and return errors up the stack in a reasonable time, > then it's broken. Md

Implementing low level timeouts within MD

2007-10-26 Thread Alberto Alonso
I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? For me this year shall be known as "the year the array stood still" (bad scifi reference :-) After 4 different array failures a

Re: Software RAID when it works and when it doesn't

2007-10-26 Thread Alberto Alonso
On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote: > Depending on the hardware you can still access a different disk while > another one is resetting. But since there is no timeout in md it won't > try to use any other disk while one is stuck. > > That is exactly what I miss. > > MfG

Re: Software RAID when it works and when it doesn't

2007-10-24 Thread Alberto Alonso
On Wed, 2007-10-24 at 16:04 -0400, Bill Davidsen wrote: > I think what you really want is to notice how long the drive and driver > took to recover or fail, and take action based on that. In general "kick > the drive" is not optimal for a few bad spots, even if the drive > recovery sucks. The

Re: Software RAID when it works and when it doesn't

2007-10-23 Thread Alberto Alonso
On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote: > I'm not sure the timeouts are the problem, even if md did its own > timeout, it then needs a way to tell the driver (or device) to stop > retrying. I don't believe that's available, certainly not everywhere, > and anything other than eve

Re: Software RAID when it works and when it doesn't

2007-10-19 Thread Alberto Alonso
On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: > Mike Accetta <[EMAIL PROTECTED]> writes: > What I would like to see is a timeout driven fallback mechanism. If > one mirror does not return the requested data within a certain time > (say 1 second) then the request should be duplicat
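The timeout-driven fallback idea quoted above can be illustrated in user space. The sketch below is only an illustration and is not how md works: it assumes each mirror is represented by a hypothetical zero-argument callable that returns the data or raises, and that a request which has not answered within the timeout is re-issued to the next mirror while the first is left running. The name `read_with_fallback` and the 1 second default are assumptions made for the example.

```python
import concurrent.futures as cf


def read_with_fallback(readers, timeout=1.0):
    """Ask mirrors in order; move on to the next one when the current
    set of outstanding requests has not answered within `timeout`.

    `readers` is a list of hypothetical zero-argument callables, one per
    mirror. The first successful result wins.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(readers))
    pending = set()
    try:
        for reader in readers:
            pending.add(pool.submit(reader))
            # Wait up to `timeout` for any outstanding request to finish.
            done, pending = cf.wait(
                pending, timeout=timeout, return_when=cf.FIRST_COMPLETED
            )
            for fut in done:
                if fut.exception() is None:
                    return fut.result()
        # Every mirror has been asked; take whichever answers first.
        # Note: this blocks for as long as the slowest mirror is stuck.
        while pending:
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                if fut.exception() is None:
                    return fut.result()
        raise OSError("all mirrors failed")
    finally:
        # Do not block on readers that are still stuck.
        pool.shutdown(wait=False)
```

A caller would pass one callable per mirror, for example `read_with_fallback([read_sda, read_sdb], timeout=1.0)`, where both names are placeholders. The sketch does not cancel the stuck request, which is the same limitation raised elsewhere in the thread: a timeout above the driver cannot by itself stop the driver from retrying.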

Re: Software RAID when it works and when it doesn't

2007-10-14 Thread Alberto Alonso
On Sun, 2007-10-14 at 10:21 -0600, Maurice Hilarius wrote: > Alberto Alonso wrote: > > > PATA (IDE) with > Master and Slave drives is a "bad idea" as, when one drive fails, the > other of the Master & Slave pair often is no longer usable. > On discrete int

Re: Software RAID when it works and when it doesn't

2007-10-13 Thread Alberto Alonso
On Sun, 2007-10-14 at 08:50 +1000, Neil Brown wrote: > On Saturday October 13, [EMAIL PROTECTED] wrote: > > Over the past several months I have encountered 3 > > cases where the software RAID didn't work in keeping > > the servers up and running. > > > > In all cases, the failure has been on a sin

Software RAID when it works and when it doesn't

2007-10-13 Thread Alberto Alonso
Over the past several months I have encountered 3 cases where the software RAID didn't work in keeping the servers up and running. In all cases, the failure has been on a single drive, yet the whole md device and server become unresponsive. (usb-storage) In one situation a RAID 0 across 2 USB dri

Kicking the right drive out

2007-10-13 Thread Alberto Alonso
I have a need to kick a disk out of a RAID 5 array. I can do a fdisk on 2 out of the 3 devices that form part of the array, so I suspect I know which one is bad. The problem is that mdstat shows the array as follows: md3 : active raid5 sda6[0] sdc6[2] sdb6[1] 960863488 blocks level 5, 64k
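When deciding which member to remove, it can help to read the member list from the mdstat line programmatically. The following is a minimal sketch that parses a line of the shape shown above; the function name `parse_mdstat_line` is made up for the example, and it only extracts the device names, the bracketed numbers, and an optional one-letter flag such as `(F)`. It does not decide which disk is actually bad.

```python
import re

# A member looks like "sda6[0]" or "sdb6[1](F)".
MEMBER = re.compile(r"(\w+)\[(\d+)\](?:\((\w)\))?")


def parse_mdstat_line(line):
    """Return (array_name, {bracket_number: (device, flag)}) for one
    'mdN : active ...' line taken from /proc/mdstat."""
    name, _, rest = line.partition(" : ")
    members = {
        int(number): (device, flag)
        for device, number, flag in MEMBER.findall(rest)
    }
    return name.strip(), members


name, members = parse_mdstat_line("md3 : active raid5 sda6[0] sdc6[2] sdb6[1]")
for number in sorted(members):
    device, flag = members[number]
    print(name, number, device, flag or "-")
```

The flag field is empty for members that mdstat does not mark, which is the situation described in the post: all three devices are still listed as active.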

Re: When does a disk get flagged as bad?

2007-06-02 Thread Alberto Alonso
-- Alberto Alonso Global Gate Systems LLC. (512) 351-7233 http://www.ggsys.net Hardware, consulting, sysadmin, monitoring and remote backups

Re: When does a disk get flagged as bad?

2007-05-30 Thread Alberto Alonso
On Wed, 2007-05-30 at 22:28 -0400, Mike Accetta wrote: > Alberto Alonso writes: > > OK, lets see if I can understand how a disk gets flagged > > as bad and removed from an array. I was under the impression > > that any read or write operation failure flags the drive as

Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-28 Thread Alberto Alonso
On Tue, 2007-05-29 at 13:28 +1000, David Chinner wrote: > On Mon, May 28, 2007 at 05:45:27PM -0500, Alberto Alonso wrote: > > On Fri, 2007-05-25 at 18:36 +1000, David Chinner wrote: > > > I consider the possibility of serving out bad data (i.e after > > > a remount t

Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-28 Thread Alberto Alonso
On Fri, 2007-05-25 at 18:36 +1000, David Chinner wrote: > On Fri, May 25, 2007 at 12:43:51AM -0500, Alberto Alonso wrote: > > I think his point was that going into a read only mode causes a > > less catastrophic situation (ie. a web server can still serve > > pages). >

Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-24 Thread Alberto Alonso
for my needs (except issues with NFS interaction, where the bug report never got answered), but that doesn't mean it can not be improved. Just my 2 cents, Alberto > Cheers, > > Dave.

When does a disk get flagged as bad?

2007-05-24 Thread Alberto Alonso
where the array is never degraded. Does an error of type: end_request: I/O error, dev sdb, sector not count as a read/write error? Thanks, Alberto

I/O errors, server unresponsive, array NOT-degraded

2007-05-22 Thread Alberto Alonso
ince the array rebuilds that the disks themselves should be OK? * Why is the md device not being downgraded? Thanks, Alberto