qla2xxx BUG: workqueue leaked lock or atomic

2007-02-26 Thread Andre Noll
Hi On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems connected to a qla2xxx card and used as a single volume via lvm. The system seems to lock up only if data gets written to both raid systems at the same time. On a standard kernel nothing makes it to the log, the system just freez

Re: end to end error recovery musings

2007-02-26 Thread Theodore Tso
On Mon, Feb 26, 2007 at 04:33:37PM +1100, Neil Brown wrote: > Do we want a path in the other direction to handle write errors? The > file system could say "Don't worry too much if this block cannot be > written, just return an error and I will write it somewhere else"? > This might allow md not to

Re: end to end error recovery musings

2007-02-26 Thread Alan
> the new location. I believe this should be always true, so presumably > with all modern disk drives a write error should mean something very > serious has happened. Not quite that simple. If you write a block aligned size the same size as the physical media block size maybe this is true. If yo

Re: end to end error recovery musings

2007-02-26 Thread James Bottomley
On Mon, 2007-02-26 at 08:25 -0500, Theodore Tso wrote: > Somewhat off-topic, but my one big regret with how the dm vs. evms > competition settled out was that evms had the ability to perform block > device snapshots using a non-LVM volume as the base --- and that EVMS > allowed a single drive to be

Re: end to end error recovery musings

2007-02-26 Thread Ric Wheeler
Alan wrote: the new location. I believe this should be always true, so presumably with all modern disk drives a write error should mean something very serious has happened. Not quite that simple. I think that write errors are normally quite serious, but there are exceptions which might be

Re: end to end error recovery musings

2007-02-26 Thread Alan
> I think that this is mostly true, but we also need to balance this against > the > need for higher levels to get a timely response. In a really large IO, a > naive > retry of a very large write could lead to a non-responsive system for a very > large time... And losing the I/O could result

Re: aacraid not detecting drives if compiled into kernel

2007-02-26 Thread pcaldes
Thanx for your quick reply on Friday. I was able to get the aacraid driver statically linked into the kernel to recognize the drive. I found the 1.1-5-2420 drivers from the Adaptec site (aacraid_drv_1.1.5-2420.rpm) Since there were no patches specifically for my kernel RHEL3 (kernel-2.4.21-4

RE: aacraid not detecting drives if compiled into kernel

2007-02-26 Thread Salyzyn, Mark
1.1.5-2433 is available on the website for other controllers that belong to the aacraid family. I know this can be confusing, but products are placed on the website after the combination of controller, driver and Linux distribution are tested and are thusly associated. Untested or limited tested dr

Re: Please help if u can.

2007-02-26 Thread Luben Tuikov
--- Douglas Gilbert <[EMAIL PROTECTED]> wrote: > This code was effectively removed from Luben's control > about 18 months ago and has passed through several sets Or rather, it was "forked off" of my main development tree. I guess, bottomley felt more comfortable with controlling it, that way. > F

Re: Please help if u can.

2007-02-26 Thread Luben Tuikov
--- "Darrick J. Wong" <[EMAIL PROTECTED]> wrote: > Laziness, in my case. I suppose it would be useful to document the fact > that I've made changes to libsas/aic94xx. Though the "what has been > done" part ... I was hoping the commit messages would suffice. Doing a rev list on drivers/scsi/aic94

Re: Please help if u can.

2007-02-26 Thread Luben Tuikov
--- John Scarpa <[EMAIL PROTECTED]> wrote: > Dear Luben, > > I am trying to compile the aic94xx for my aic9410 directly into my > kernel (fc5_64bit-2.6.20)... Is this possible or must it be loaded as a > module? I am really not wanting to add modular support to my nice neat > monolithic kerne

Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-26 Thread Andrew Vasquez
On Mon, 26 Feb 2007, Andre Noll wrote: > On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems > connected to a qla2xxx card and used as a single volume via lvm. > The system seems to lock up only if data gets written to both raid > systems at the same time. > > On a standard kernel no

SCSI devices with 256-byte sectors don't work?

2007-02-26 Thread Chuck Ebbert
Apparently there really are such devices: Sep 28 20:05:42 localhost kernel: scsi4 : SCSI emulation for USB Mass Storage devices Sep 28 20:05:42 localhost kernel: Vendor: Sandisk Model: ImageMate SDDR09 Rev: 0100 Sep 28 20:05:42 localhost kernel: Type: Direct-Access ANSI SCSI

Re: end to end error recovery musings

2007-02-26 Thread Ric Wheeler
Alan wrote: I think that this is mostly true, but we also need to balance this against the need for higher levels to get a timely response. In a really large IO, a naive retry of a very large write could lead to a non-responsive system for a very large time... And losing the I/O could resul

Re: end to end error recovery musings

2007-02-26 Thread H. Peter Anvin
Theodore Tso wrote: In any case, the reason why I bring this up is that it would be really nice if there was a way with a single laptop drive to be able to do snapshots and background fsck's without having to use initrd's with device mapper. This is a major part of why I've been trying to pus

Re: end to end error recovery musings

2007-02-26 Thread Jeff Garzik
Theodore Tso wrote: Can someone with knowledge of current disk drive behavior confirm that for all drives that support bad block sparing, if an attempt to write to a particular spot on disk results in an error due to bad media at that spot, the disk drive will automatically rewrite the sector to

Re: end to end error recovery musings

2007-02-26 Thread Ric Wheeler
Jeff Garzik wrote: Theodore Tso wrote: Can someone with knowledge of current disk drive behavior confirm that for all drives that support bad block sparing, if an attempt to write to a particular spot on disk results in an error due to bad media at that spot, the disk drive will automatically

Re: end to end error recovery musings

2007-02-26 Thread Alan
> One interesting counter example is a smaller write than a full page - say 512 > bytes out of 4k. > > If we need to do a read-modify-write and it just so happens that 1 of the 7 > sectors we need to read is flaky, will this "look" like a write failure? The current core kernel code can't handle
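
As a rough illustration of the arithmetic in the question quoted above (not code from the thread): with 512-byte sectors a 4k page holds 8 sectors, so a sector-aligned 512-byte write leaves 7 sectors that a read-modify-write would have to read first. The function name and the sector-aligned restriction are assumptions made only for this sketch.

```python
SECTOR_SIZE = 512   # bytes per sector, as in the quoted example
PAGE_SIZE = 4096    # bytes per page ("4k")


def sectors_to_read(write_offset, write_len,
                    page_size=PAGE_SIZE, sector_size=SECTOR_SIZE):
    """Return the sector indices within one page that must be read
    before a partial-page write can be merged in (read-modify-write).

    Hypothetical helper: it only handles sector-aligned writes that
    fit inside a single page.
    """
    if write_offset % sector_size or write_len % sector_size:
        raise ValueError("sketch only handles sector-aligned writes")
    if write_len <= 0 or write_offset < 0 or write_offset + write_len > page_size:
        raise ValueError("write must be non-empty and lie within one page")

    total_sectors = page_size // sector_size
    first = write_offset // sector_size
    last = (write_offset + write_len) // sector_size  # exclusive bound
    overwritten = set(range(first, last))

    # Every sector not fully overwritten has to come from the medium,
    # so a read error on any of them blocks the write of the page.
    return [s for s in range(total_sectors) if s not in overwritten]
```

For a 512-byte write at the start of the page, this returns the other 7 sector indices; a failure reading any one of them would surface while servicing what the caller issued as a write.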

RE: end to end error recovery musings

2007-02-26 Thread Moore, Eric
On Monday, February 26, 2007 9:42 AM, Ric Wheeler wrote: > Which brings us back to a recent discussion at the file > system workshop on being > more repair oriented in file system design so we can survive > situations like > this a bit more reliably ;-) > On the second day of the workshop, t