Re: raid5:md3: read error corrected , followed by , Machine Check Exception: .

2007-07-14 Thread Alan Cox
On Sat, 14 Jul 2007 17:08:27 -0700 (PDT) "Mr. James W. Laferriere" <[EMAIL PROTECTED]> wrote: > Hello All , I was under the impression that a 'machine check' would be > caused by some near to the CPU hardware failure , Not a bad disk ? It indicates a hardware failure > Jul 14 23:00:26 f

Re: Linux Software RAID is really RAID?

2007-07-04 Thread Alan Cox
> > A hard(ware) lockup, not software. > > That's why Intel says ICH5 doesn't do hotplug. > > OIC. I don't think there's much left to do from the driver side then. > Or is there any workaround? I'm not familiar with the ICH5 SATA side but on the PATA side we also need to run code to fix up chips

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Alan Stern
n's kernel is built from and just use it, although it would take a long time to build because it includes so many drivers. Whittling it down to just the drivers you need would be tedious but not very difficult. Alan Stern - To unsubscribe from this list: send the line "unsubscribe l

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Alan Stern
nd EHCI. Links to their specifications are available here: http://www.usb.org/developers/resources/ Specifications for various classes of USB devices are available here: http://www.usb.org/developers/devclass_docs Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-17 Thread Alan Stern
is liable to miss bits and pieces of the kernel log when a lot of information comes along all at once. You're much better off getting the stack trace data directly from dmesg. (And when you do, you don't end up with 30 columns of wasted data added to the beginning of each line.) Ala

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-17 Thread Alan Stern
don't know which they would be. > > I have no copy khubd running. That in itself is a very bad sign. You need to look at the dmesg log. Alan Stern

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-17 Thread Alan Stern
matter which. But you haven't tried using different hubs, different USB cables, or different computers. > Nonetheless, I'm beginning to think I'm dealing with a hardware issue, not > a kernel issue, just because it is so consistent. People have reported problems in which the har

Re: end to end error recovery musings

2007-02-27 Thread Alan
he only one with this problem (and a workaround) which is SATA capable 8) Alan

Re: end to end error recovery musings

2007-02-27 Thread Alan
t dangly bits associated with a sequence of inodes with the same upper bits. More problematic is losing indirect blocks, and being able to keep some kind of [inode low bits/block index] would help put stuff back together. Alan

Re: end to end error recovery musings

2007-02-26 Thread Alan
> One interesting counter example is a smaller write than a full page - say 512 > bytes out of 4k. > > If we need to do a read-modify-write and it just so happens that 1 of the 7 > sectors we need to read is flaky, will this "look" like a write failure? The current core kernel code can't handle
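The arithmetic in the quoted question can be sketched as follows. This is only an illustration, assuming a whole-page read-modify-write with the 512-byte sector and 4k page sizes mentioned in the message; the function and constant names are hypothetical and not taken from any kernel code.

```python
SECTOR_SIZE = 512   # bytes per sector, as in the quoted example
PAGE_SIZE = 4096    # bytes per page ("4k"), as in the quoted example


def sectors_to_read(write_offset, write_len,
                    page_size=PAGE_SIZE, sector_size=SECTOR_SIZE):
    """Return the indices of the sectors in one page that a write does not
    fully overwrite, and that a read-modify-write would therefore read first."""
    write_end = write_offset + write_len
    to_read = []
    for index in range(page_size // sector_size):
        sector_start = index * sector_size
        sector_end = sector_start + sector_size
        fully_overwritten = write_offset <= sector_start and write_end >= sector_end
        if not fully_overwritten:
            to_read.append(index)
    return to_read


# A 512-byte write at the start of a 4k page leaves the other 7 sectors to be read.
print(len(sectors_to_read(0, 512)))
```

If any one of those seven reads fails, the page cannot be assembled, which is how a read problem could surface in what the caller issued as a write.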

Re: end to end error recovery musings

2007-02-26 Thread Alan
> I think that this is mostly true, but we also need to balance this against > the > need for higher levels to get a timely response. In a really large IO, a > naive > retry of a very large write could lead to a non-responsive system for a very > large time... And losing the I/O could result

Re: end to end error recovery musings

2007-02-26 Thread Alan
true. If you write a sector on a device with physical sector size larger than logical block size (as allowed by say ATA7) then it's less clear what happens. I don't know if the drive firmware implements multiple "tails" in this case. On a read error it is worth trying the other

Re: sata badness in 2.6.20-rc1? [Was: Re: md patches in -mm]

2006-12-15 Thread Alan
On Fri, 15 Dec 2006 13:39:27 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Fri, 15 Dec 2006 13:05:52 -0800 > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > Jeff, I shall send all the sata patches which I have at you one single time > > and I shall then drop the lot. So please don't flub th

Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations)

2006-09-19 Thread Alan Cox
complexities - the callback wants to take locks to guard the object it works on but if it is called synchronously - eg if hardware is busy and we fall back - it might deadlock with the caller of dmaa_async_foo() who also needs to hold the lock. Alan
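The locking hazard described here can be illustrated with a small user-space sketch. It is not the dmaengine API: submit_async(), completion_callback() and the pending queue are hypothetical stand-ins, and a timeout is used on the lock so that the example reports the problem instead of hanging.

```python
import threading

obj_lock = threading.Lock()   # non-reentrant lock guarding the shared object
pending = []                  # callbacks deferred for later completion


def completion_callback(result):
    """Callback that takes the lock to guard the object it works on."""
    # Time out instead of blocking forever so the hazard is observable.
    if not obj_lock.acquire(timeout=0.1):
        return "would deadlock: lock already held by caller"
    try:
        return "handled " + result
    finally:
        obj_lock.release()


def submit_async(callback, hardware_busy):
    """Hypothetical submit call: falls back to synchronous completion
    when the hardware is busy, otherwise defers the callback."""
    if hardware_busy:
        return callback("copy")      # callback runs inline, in the caller's context
    pending.append(callback)
    return None


def run_pending():
    """Run deferred callbacks, as a later completion context would."""
    results = [cb("copy") for cb in pending]
    pending.clear()
    return results


def caller(hardware_busy):
    """Caller that holds the lock while submitting, as in the message."""
    with obj_lock:
        inline = submit_async(completion_callback, hardware_busy)
    return inline, run_pending()
```

With hardware_busy set, the callback runs while the caller still holds the lock and cannot take it; when the callback is deferred, the caller has already released the lock and the callback completes normally.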

large copy to mdadm array fails, marks readonly

2006-09-05 Thread Alan Gibson
): ext3_journal_start_sb: Detected aborted journal [17181702.96] Remounting filesystem read-only im stumped, any ideas? thanks much, alan

Re: Linux: Why software RAID?

2006-08-24 Thread Alan Cox
On Thu, 2006-08-24 at 07:31 -0700, Marc Perkel wrote: > So - the bottom line answer to my question is that unless you are > running raid 5 and you have a high powered raid card with cache and > battery backup that there is no significant speed increase to use > hardware raid. For raid 0 ther

Re: Linux: Why software RAID?

2006-08-24 Thread Alan Cox
On Thu, 2006-08-24 at 09:07 -0400, Adam Kropelin wrote: > Jeff Garzik <[EMAIL PROTECTED]> wrote: > with sw RAID of course if the builder is careful to use multiple PCI > cards, etc. Sw RAID over your motherboard's onboard controllers leaves > you vulnerable. Generally speaking the channels

Re: [PATCH 000 of 5] md: Introduction

2006-01-18 Thread Alan Cox
On Wed, 2006-01-18 at 09:14 +0100, Sander wrote: > If the (harddisk internal) remap succeeded, the OS doesn't see the bad > sector at all I believe. True for ATA, in the SCSI case you may be told about the remap having occurred but its a "by the way" type message not an error proper. > If you (th

Re: Where is the performance bottleneck?

2005-08-31 Thread Dr. David Alan Gilbert
sdf (which seems to be the slowest and fastest drives respectively). I guess if everyone was running at sdf's speed you would be pretty happy. If you physically swap f and g does the performance follow the drive or the letter? Dave -- -Open up your eyes, open up your mind, open up your

Re: Where is the performance bottleneck?

2005-08-31 Thread Dr. David Alan Gilbert
your code --- / Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy \ \ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex / \ _|_ http://www.treblig.org |___/

Re: [PATCH] RAID5 NULL Checking Bug Fixt

2001-05-16 Thread Alan Cox
> On Wednesday May 16, [EMAIL PROTECTED] wrote: > > > > (more patches to come. They will go to Linus, Alan, and linux-raid only). > > This is the next one, which actually addresses the "NULL Checking > Bug". Thanks. As Linus merges I'll switch over

Re: Proposed RAID5 design changes.

2001-03-21 Thread Alan Cox
> Umm. Isn't RAID implemented as the md device? That implies that it is > responsible for some kind of error management. Bluntly, the file systems > don't declare a file system kaput until they've retried the critical > I/O operations. Why should RAID5 be any less tolerant? File systems give up t

Re: Proposed RAID5 design changes.

2001-03-21 Thread Alan Cox
> any data, but under normal default drive setup the sector will not be > reallocated. If testing the failing sector is too much effort, a > simple overwrite with the corrected data, at worst, improves the > chances of the drive firmware being able to reallocate the sector. > This works just f
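For context on where the "corrected data" in the quoted proposal would come from: in RAID5 a block that cannot be read can be rebuilt by XORing the corresponding blocks of the remaining members, parity included. The sketch below shows only that reconstruction step, with hypothetical helper names; it is not the md driver's code and does not model the overwrite or the drive's reallocation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks):
    """Rebuild the one unreadable block of a stripe from the surviving
    data blocks plus the parity block."""
    return xor_blocks(surviving_blocks)


# Three data blocks and their parity; pretend the second data block is unreadable.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)
rebuilt = reconstruct_missing([data[0], data[2], parity])
```

The rebuilt block is what would be written back over the failing sector under the scheme the quoted text describes.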

Re: Proposed RAID5 design changes.

2001-03-21 Thread Alan Cox
> > 1) Read and write errors should be retried at least once before kicking > >the drive out of the array. > > This doesn't seem unreasonable on the face of it. Device level retries are the job of the device level driver > > 2) On more persistent read errors, the failed block (or whatever u