asynchronous write

2007-07-18 Thread Peter T. Breuer
Did the asynchronous write stuff (as it was in fr1) ever get into kernel software raid? I see from the raid acceleration ("ioat") patching going on that some sort of asynchronicity is being contemplated, but blessed if I can make head or tail of the descriptions I've read. It looks vaguely like p

Re: possible deadlock through raid5/md

2006-10-15 Thread Peter T. Breuer
While travelling the last few days, a theory has occurred to me to explain this sort of thing ... > A user has sent me a ps ax output showing an enbd client daemon > blocked in get_active_stripe (I presume in raid5.c). > > ps ax -o f,uid,pid,ppid,pri,ni,vsz,rss,wchan:30,stat,tty,time,comman

possible deadlock through raid5/md

2006-10-05 Thread Peter T. Breuer
A user has sent me a ps ax output showing an enbd client daemon blocked in get_active_stripe (I presume in raid5.c).

ps ax -o f,uid,pid,ppid,pri,ni,vsz,rss,wchan:30,stat,tty,time,command
F   UID   PID  PPID PRI NI  VSZ RSS WCHAN                          STAT TT TIME COMMAND
5     0 26540     1  23

Re: remark and RFC

2006-08-19 Thread Peter T. Breuer
"Also sprach Gabor Gombas:" > On Thu, Aug 17, 2006 at 08:28:07AM +0200, Peter T. Breuer wrote: > > > 1) if the network disk device has decided to shut down wholesale > >(temporarily) because of lack of contact over the net, then > >retries and writes a

Re: remark and RFC

2006-08-18 Thread Peter T. Breuer
"Also sprach ptb:" > 4) what the network device driver wants to do is be able to identify >the difference between primary requests and retries, and delay >retries (or repeat them internally) with some reasonable backoff >scheme to give them more chance of working in the face of a >

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
Hi Neil .. "Also sprach Neil Brown:" > On Wednesday August 16, [EMAIL PROTECTED] wrote: > > 1) I would like raid request retries to be done with exponential > >delays, so that we get a chance to overcome network brownouts. > > > > 2) I would like some channel of communication to be available

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
"Also sprach Molle Bestefich:" > > > See above. The problem is generic to fixed bandwidth transmission > > channels, which, in the abstract, is "everything". As soon as one > > does retransmits one has a kind of obligation to keep retransmissions > > down to a fixed maximum percentage of the poten

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
"Also sprach Molle Bestefich:" [Charset ISO-8859-1 unsupported, filtering to ASCII...] > Peter T. Breuer wrote: > > > You want to hurt performance for every single MD user out there, just > > > > There's no performance drop! Exponentially staged retries on

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
"Also sprach Molle Bestefich:" [Charset ISO-8859-1 unsupported, filtering to ASCII...] > Peter T. Breuer wrote: > > We can't do a HOT_REMOVE while requests are outstanding, as far as I > > know. > > Actually, I'm not quite sure which kind of requests you

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
"Also sprach Molle Bestefich:" > Peter T. Breuer wrote: > > I would like raid request retries to be done with exponential > > delays, so that we get a chance to overcome network brownouts. > > Hmm, I don't think MD even does retries of requests. I had a "

Re: remark and RFC

2006-08-16 Thread Peter T. Breuer
"Also sprach Molle Bestefich:" [Charset ISO-8859-1 unsupported, filtering to ASCII...] > Peter T. Breuer wrote: > > 1) I would like raid request retries to be done with exponential > >delays, so that we get a chance to overcome network brownouts. > > > > I

remark and RFC

2006-08-16 Thread Peter T. Breuer
Hello - I believe the current kernel raid code retries failed reads too quickly and gives up too soon for operation over a network device. Over the enbd device (mine), the default mode of operation was formerly to have the enbd device time out requests after 30s of net stalemate and maybe even
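
A minimal sketch of the staged-retry idea being proposed (hypothetical names, plain user-space C, not md code): double the delay after each failed attempt, so that a network brownout has a chance to pass before the array gives up and faults the component.

    #include <unistd.h>

    /* Hypothetical staged retry: back off exponentially so a network
     * brownout can clear before the component is faulted. */
    static int retry_with_backoff(int (*submit)(void *req), void *req)
    {
        unsigned int delay = 1;           /* seconds */
        int attempt;

        for (attempt = 0; attempt < 6; attempt++) {
            if (submit(req) == 0)
                return 0;                 /* request succeeded */
            sleep(delay);                 /* wait before retrying */
            delay *= 2;                   /* 1+2+4+8+16+32s: about a minute */
        }
        return -1;                        /* still failing: fault the device */
    }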

Re: remove resyncing disk

2005-04-20 Thread Peter T. Breuer
Robbie Hughes <[EMAIL PROTECTED]> wrote:
>    Number   Major   Minor   RaidDevice State
>       0       0       0       -1       removed
>       1      22      66        1       active sync   /dev/hdd2
>       2       3       3        0       spare         /dev/hda3
> The main problem i have now is

Re: Questions about software RAID

2005-04-20 Thread Peter T. Breuer
Molle Bestefich <[EMAIL PROTECTED]> wrote: > There seems to be an obvious lack of a properly thought out interface > to notify userspace applications of MD events (disk failed --> go > light a LED, etc). Well, that's probably truish. I've been meaning to ask for a per-device sysctl interface for s

Re: Questions about software RAID

2005-04-18 Thread Peter T. Breuer
tmp <[EMAIL PROTECTED]> wrote: > I've read "man mdadm" and "man mdadm.conf" but I certainly doesn't have > an overview of software RAID. Then try using it as well as reading about it, and you will obtain a more comprehensive understanding. > OK. The HOWTO describes mostly a raidtools conte

Re: Questions about software RAID

2005-04-18 Thread Peter T. Breuer
tmp <[EMAIL PROTECTED]> wrote: > 1) I have a RAID-1 setup with one spare disk. A disk crashes and the > spare disk takes over. Now, when the crashed disk is replaced with a new > one, what is then happening with the role of the spare disk? Is it > reverting to its old role as spare disk? Try it an

Re: RAID1 and data safety?

2005-04-10 Thread Peter T. Breuer
Doug Ledford <[EMAIL PROTECTED]> wrote: > > > Now, if I recall correctly, Peter posted a patch that changed this > > > semantic in the raid1 code. The raid1 code does not complete a write to > > > the upper layers of the kernel until it's been completed on all devices > > > and his patch made it s

Re: RAID1 and data safety?

2005-04-08 Thread Peter T. Breuer
I forgot to say "thanks"! Thanks for the breakdown. Doug Ledford <[EMAIL PROTECTED]> wrote: (of event count increment) > I think the best explanation is this: any change in array state that OK .. > would necessitate kicking a drive out of the array if it didn't also > make this change in state

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Luca Berra <[EMAIL PROTECTED]> wrote: > On Tue, Mar 29, 2005 at 01:29:22PM +0200, Peter T. Breuer wrote: > >Neil Brown <[EMAIL PROTECTED]> wrote: > >> Due to the system crash the data on hdb is completely ignored. Data > > > >Neil - can you explain the

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Neil Brown <[EMAIL PROTECTED]> wrote: > On Tuesday March 29, [EMAIL PROTECTED] wrote: > > > > Don't put the journal on the raid device, then - I'm not ever sure why > > people do that! (they probably have a reason that is good - to them). > > Not good advice. DO put the journal on a raid device

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Neil Brown <[EMAIL PROTECTED]> wrote: > Due to the system crash the data on hdb is completely ignored. Data Neil - can you explain the algorithm that stamps the superblocks with an event count, once and for all? (until further amendment :-). It goes without saying that sb's are not stamped at ev
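
The gist as I understand it (a sketch with invented names, not the actual md superblock code): each superblock carries a monotonically increasing event count, and at assembly time the member with the highest count is trusted, while members with lower counts are resynced from it.

    /* Invented types: pick the freshest member by event count. */
    struct sb { unsigned long long events; };

    static int freshest(const struct sb *sbs, int n)
    {
        int i, best = 0;

        for (i = 1; i < n; i++)
            if (sbs[i].events > sbs[best].events)
                best = i;     /* higher count = more recent array state */
        return best;          /* members with lower counts get resynced */
    }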

Re: RAID1 and data safety?

2005-03-29 Thread Peter T. Breuer
Schuett Thomas EXT <[EMAIL PROTECTED]> wrote: > And here the fault happens: > By chance, it reads the transaction log from hda, then sees, that the > transaction was finished, and clears the overall unclean bit. > This cleaning is a write, so it goes to *both* HDs. Don't put the journal on the ra

[PATCH] md sleeps under spinlock on exit

2005-03-27 Thread Peter T. Breuer
md_exit calls mddev_put on each mddev during module exit. mddev_put calls blk_put_queue under spinlock, although it can sleep (it clearly calls kblockd_flush). This patch lifts the spinlock to do the flush.
--- md.c.orig	Fri Dec 24 22:34:29 2004
+++ md.c	Sun Mar 27 14:14:22 2005
@@ -173,7
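
Schematically, the shape of the fix (abbreviated, invented names; not the literal patch): drop the lock around the call that can sleep, then retake it.

    /* Schematic fragment only. */
    spin_lock(&all_mddevs_lock);
    list_del(&mddev->all_mddevs);      /* detach while protected    */
    spin_unlock(&all_mddevs_lock);     /* release before sleeping   */
    blk_put_queue(mddev->queue);       /* may sleep (kblockd flush) */
    spin_lock(&all_mddevs_lock);       /* retake for the remainder  */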

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-25 Thread Peter T. Breuer
Luca Berra <[EMAIL PROTECTED]> wrote: > we can have a series of failures which must be accounted for and dealt > with according to a policy that might be site specific. > > A) Failure of the standby node > A.1) the active is allowed to continue in the absence of a data replica > A.2) disk writ

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-22 Thread Peter T. Breuer
Paul Clements <[EMAIL PROTECTED]> wrote:
> system A
> [raid1]
>  /    \
> [disk] [nbd] --> system B
>
> 2) you're writing, say, block 10 to the raid1 when A crashes (block 10
> is dirty in the bitmap, and you don't know whether it got written to the
> disk on A or B, neither, o

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-22 Thread Peter T. Breuer
Luca Berra <[EMAIL PROTECTED]> wrote: > If we want to do data-replication, access to the data-replicated device > should be controlled by the data replication process (*), md does not > guarantee this. Well, if one writes to the md device, then md does guarantee this - but I find it hard to parse

Re: [PATCH 0/3] md bitmap-based asynchronous writes

2005-03-22 Thread Peter T. Breuer
Neil Brown <[EMAIL PROTECTED]> wrote: > However I want to do raid5 first. I think it would be much easier > because of the stripe cache. Any 'stripe' with a bad read would be There's the FR5 patch (fr5.sf.net) which adds a bitmap to raid5. It doesn't do "robust read" for raid5, however. > flagg

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-21 Thread Peter T. Breuer
Paul Clements <[EMAIL PROTECTED]> wrote: OK - thanks for the reply, Paul ... > Peter T. Breuer wrote: > > But why don't we already know from the _single_ bitmap on the array > > node ("the node with the array") what to rewrite in total? All writes > > m

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-21 Thread Peter T. Breuer
Paul Clements <[EMAIL PROTECTED]> wrote: > At any rate, this is all irrelevant given the second part of that email > reply that I gave. You still have to do the bitmap combining, regardless > of whether two systems were active at the same time or not. As I understand it, you want both bitmaps in
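
The combining itself is cheap, a bitwise OR: every bit set in either bitmap marks a chunk that may differ between the mirrors and must be copied during resync. A sketch with invented names:

    #include <stddef.h>

    /* Union of the two "possibly dirty" maps (invented names). */
    static void merge_bitmaps(unsigned char *dst, const unsigned char *a,
                              const unsigned char *b, size_t nbytes)
    {
        size_t i;

        for (i = 0; i < nbytes; i++)
            dst[i] = a[i] | b[i];   /* set in either => resync that chunk */
    }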

Re: "Robust Read"

2005-03-19 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > The point a) is moot, because this whole structure is used in raid1.c ONLY. > (I don't know why it is placed into raid1.h header file instead of into > raid1.c directly, but that's a different topic). Hmm. I'm a little surprised. I would be worried that

Re: "Robust Read"

2005-03-19 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > > Uh OK. As I recall one only needs to count, one doesn't need a bitwise > > map of what one has dealt with. > > Well. I see read_balance() is now used to resubmit reads. There's > a reason to use it instead of choosing "next" disk, I think. I can't

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Guy <[EMAIL PROTECTED]> wrote: > I agree, but I don't think a block device can do a re-sync without > corrupting both. How do you merge a superset at the block level? AND the 2 Don't worry - it's just a one-way copy done efficiently (i.e., leaving out all the blocks known to be unmodified both s

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > On 2005-03-19T16:06:29, "Peter T. Breuer" <[EMAIL PROTECTED]> wrote: > > I'm cutting out those parts of the discussion which are irrelevant (or > which I don't consider worth pursuing; maybe you'll

Re: "Robust Read"

2005-03-19 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > Peter T. Breuer wrote: > [] > > The patch was originally developed for 2.4, then ported to 2.6.3, and > > then to 2.6.8.1. Neil has recently been doing

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Mario Holbe <[EMAIL PROTECTED]> wrote: > Peter T. Breuer <[EMAIL PROTECTED]> wrote: > > Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > >> Split-brain is a well studied subject, and while many prevention > >> strategies exist, errors occur even i

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > On 2005-03-19T14:27:45, "Peter T. Breuer" <[EMAIL PROTECTED]> wrote: > > > > Which one of the datasets you choose you could either arbitrate via some > > > automatic mechanisms (drbd-0.8 has a couple) o

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > On 2005-03-19T12:43:41, "Peter T. Breuer" <[EMAIL PROTECTED]> wrote: > > > Well, there is the "right data" from our point of view, and it is what > > should be on (one/both?) device by now. O

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > Ok, you intrigued me enough already.. what's the FR1 patch? I want > to give it a try... ;) Especially I'm interested in the "Robust Read" > thing... That was published on this list a few weeks ago (probably needs updating, but I am sure you can help

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > Luca Berra wrote: > > On Fri, Mar 18, 2005 at 02:42:55PM +0100, Lars Marowsky-Bree wrote: > > > >> The problem is for multi-nodes, both sides have their own bitmap. When a > >> split scenario occurs, and both sides begin modifying the data, that > >> bi

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Paul Clements <[EMAIL PROTECTED]> wrote: > Peter T. Breuer wrote: > > I don't see that this solves anything. If you had both sides going at > > once, receiving different writes, then you are sc&**ed, and no > > resolution of bitmaps will help you, since bo

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-19 Thread Peter T. Breuer
Mario Holbe <[EMAIL PROTECTED]> wrote: > Peter T. Breuer <[EMAIL PROTECTED]> wrote: > > Yes, you can "sync" them by writing any one of the two mirrors to the > > other one, and need do so only on the union of the mapped data areas, > > As far as I unders

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Peter T. Breuer
Mario Holbe <[EMAIL PROTECTED]> wrote: > Peter T. Breuer <[EMAIL PROTECTED]> wrote: > > different (legitimate) data. It doesn't seem relevant to me to consider > > if they are equally up to date wrt the writes they have received. They > > will be in the wrong

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Peter T. Breuer
Paul Clements <[EMAIL PROTECTED]> wrote: > [ptb] > > Could you set out the scenario very exactly, please, for those of us at > > the back of the class :-). I simply don't see it. I'm not saying it's > > not there to be seen, but that I have been unable to build a mental > > image of the situation f

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Peter T. Breuer
Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > On 2005-03-18T13:52:54, "Peter T. Breuer" <[EMAIL PROTECTED]> wrote: > > > (proviso - I didn't read the post where you set out the error > > situations, but surely, on theoretical grounds, all that can ha

Re: [patch] md superblock update failures

2005-03-18 Thread Peter T. Breuer
Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote:
> Minor cleanup:
>
> > @@ -1325,24 +1336,24 @@ repeat:
> >
> > 	dprintk("%s ", bdevname(rdev->bdev,b));
> > 	if (!rdev->faulty) {
> > -		err += write_disk_sb(rdev);
> > +		md_super_write

Re: [PATCH 1/2] md bitmap bug fixes

2005-03-18 Thread Peter T. Breuer
Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > On 2005-03-15T09:54:52, Neil Brown <[EMAIL PROTECTED]> wrote: > > I think any scheme that involved multiple bitmaps would be introducing > > too much complexity. Certainly your examples sound very far fetched > > (as I think you admitted yourself).

Re: [PATCH md 0 of 4] Introduction

2005-03-09 Thread Peter T. Breuer
Neil Brown <[EMAIL PROTECTED]> wrote: > On Tuesday March 8, [EMAIL PROTECTED] wrote: > > Have you remodelled the md/raid1 make_request() fn? > > Somewhat. Write requests are queued, and raid1d submits them when > it is happy that all bitmap updates have been done. OK - so a slight modification o

Re: [PATCH md 0 of 4] Introduction

2005-03-08 Thread Peter T. Breuer
Paul Clements <[EMAIL PROTECTED]> wrote: > Peter T. Breuer wrote: > > Neil - can you describe for me (us all?) what is meant by > > intent-logging here. > > Since I wrote a lot of the code, I guess I'll try... Hi, Paul. Thanks. > > Well, I can guess -

Re: [PATCH md 0 of 4] Introduction

2005-03-08 Thread Peter T. Breuer
NeilBrown <[EMAIL PROTECTED]> wrote: > The second two fix bugs that were introduced by the recent > bitmap-based-intent-logging patches and so are not relevant Neil - can you describe for me (us all?) what is meant by intent-logging here. Well, I can guess - I suppose the driver marks the bitmap
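
That guess roughly matches the bitmap patches as published (a schematic, kernel-flavoured sketch with invented names): the bit must be durable on disk before the data write starts, so that after a crash the set bits cover every write that might be incomplete.

    /* Schematic intent-logged write path (invented names). */
    static void intent_logged_write(struct bitmap *bmp, struct bio *bio)
    {
        bitmap_set_bit(bmp, bio_chunk(bio)); /* mark the region dirty ... */
        bitmap_flush(bmp);                   /* ... and push that to disk */
        submit_data_write(bio);              /* only now may the data go  */
        /* when ALL mirrors complete: bitmap_clear_bit(), done lazily */
    }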

Re: Write Order Restrictions

2005-03-08 Thread Peter T. Breuer
Can Sar <[EMAIL PROTECTED]> wrote: > the driver just cycles through all devices that make up a soft raid > device and just calls generic_make_request on them. Is this correct, or > does some other function involved in the write process (starting from > the soft raid level down) actually wait on

Re: Joys of spare disks!

2005-03-07 Thread Peter T. Breuer
[EMAIL PROTECTED] wrote: > I've been going through the MD driver source, and to tell the truth, can't > figure out where the read error is detected and how to "hook" that event and > force a re-write of the failing sector. I would very much appreciate it if I did that for RAID1, or at least most

Re: Creating RAID1 with "missing" - mdadm 1.90

2005-03-05 Thread Peter T. Breuer
berk walker <[EMAIL PROTECTED]> wrote: > What might the proper [or functional] syntax be to do this? > > I'm running 2.6.10-1.766-FC3, and mdadm 1.90. Substitute the word "missing" for the corresponding device in the mdadm create command. (quotes manual page) To create a "degraded" array i

Re: Severe, huge data corruption with softraid

2005-03-02 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > >>Unable to handle kernel paging request at virtual address f8924690 > > > > That address is bogus. Looks more like a negative integer. I suppose > > ram corruption is a possibility too. > > Ram corruption in what sense? Faulty DIMM? Anything. > Well

Re: Severe, huge data corruption with softraid

2005-03-02 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > And finally I managed to get an OOPs. What CPU? SMP? How many? Which kernel? Is it preemptive? > Created fresh raid5 array out of 4 partitions, > chunk size = 4kb. > Created ext3fs on it. > Tested write speed (direct-io) - it was terrible, > about

Re: RAID1 robust read and read/write correct and EVMS-BBR

2005-02-23 Thread Peter T. Breuer
[EMAIL PROTECTED] wrote: > We are waiting for the one day where the same block on all mirrors has > read problems. Ok, we're now waiting for about 15 years because the > HPUX mirror strategy is the same. Quite a long time without disaster > but it will happen (till today Murphy was right in any cas

Re: RAID1 robust read and read/write correct patch

2005-02-23 Thread Peter T. Breuer
J. David Beutel <[EMAIL PROTECTED]> wrote: > Peter T. Breuer wrote, on 2005-Feb-23 1:50 AM: > > > Quite possibly - I never tested the rewrite part of the patch, just > > > >wrote it to indicate how it should go and stuck it in to encourage > >others to go on f

Re: RAID1 robust read and read/write correct and EVMS-BBR

2005-02-23 Thread Peter T. Breuer
In gmane.linux.raid Nagpure, Dinesh <[EMAIL PROTECTED]> wrote: > I noticed the discussion about robust read on the RAID list and similar one > on the EVMS list so I am sending this mail to both the lists. Latent media > faults which prevent data from being read from portions of a disk have always >

Re: *terrible* direct-write performance with raid5

2005-02-23 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > (note raid5 performs faster than a single drive; that is expected, > as it is possible to write to several drives in parallel). Each raid5 write must include at least ONE write to a target. I think you're saying that the writes go to different targets fr
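
For a small (sub-stripe) write the cost is actually worse than one write per target, because the parity update needs the old data and old parity first: read old data, read old parity, write new data, write new parity. The parity rule itself, byte-wise:

    /* RAID5 read-modify-write parity update for a small write. */
    unsigned char new_parity(unsigned char old_data, unsigned char old_parity,
                             unsigned char new_data)
    {
        return old_parity ^ old_data ^ new_data; /* cancel old, add new */
    }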

Re: RAID1 robust read and read/write correct patch

2005-02-23 Thread Peter T. Breuer
J. David Beutel <[EMAIL PROTECTED]> wrote: > I'd like to try this patch > http://marc.theaimsgroup.com/?l=linux-raid&m=110704868115609&w=2 with > EVMS BBR. > > Has anyone tried it on 2.6.10 (with FC2 1.9 and EVMS patches)? Has > anyone tried the rewrite part at all? I don't know md or the ker

Re: *terrible* direct-write performance with raid5

2005-02-22 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > Peter T. Breuer wrote: > > Michael Tokarev <[EMAIL PROTECTED]> wrote: > > > >>When debugging some other problem, I noticied that > >>direct-io (O_DIRECT) write speed on a software raid5 > > > &

Re: *terrible* direct-write performance with raid5

2005-02-22 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > When debugging some other problem, I noticed that > direct-io (O_DIRECT) write speed on a software raid5 And normal write speed (over 10 times the size of ram)? > is terribly slow. Here's a small table just to show > the idea (not numbers by itself a
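
For anyone wanting to reproduce, a minimal O_DIRECT writer (assumptions: Linux, 4096-byte alignment satisfies the device, and the target path is just an example - it gets overwritten):

    #define _GNU_SOURCE               /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        int i, fd = open("/tmp/testfile", O_WRONLY | O_CREAT | O_DIRECT, 0600);

        if (fd < 0 || posix_memalign(&buf, 4096, 4096))
            return 1;
        memset(buf, 0xaa, 4096);
        for (i = 0; i < 25600; i++)           /* 100MB, bypassing page cache */
            if (write(fd, buf, 4096) != 4096)
                return 1;
        return close(fd);
    }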

Re: > 2TB ?

2005-02-10 Thread Peter T. Breuer
No email <[EMAIL PROTECTED]> wrote: > > Forgive me as this is probably a silly question and one that has been > answered many times, I have tried to search for the answers but have > ended up more confused than when I started. So thought maybe I could > ask the community to put me out of my miser

Re: RAID1 detection of faulty disks

2005-02-09 Thread Peter T. Breuer
[EMAIL PROTECTED] wrote: > just for my understanding of RAID1. When is a partition set faulty? > As soon as a read hits a bad block or only when a write attempts to > write to a bad block? > > I'm a little bit confused as I read the thread 'Robust read patch for raid1'. > Does it mean that a read

Re: Robust read patch for raid1

2005-02-01 Thread Peter T. Breuer
Peter T. Breuer <[EMAIL PROTECTED]> wrote: > Allow me to remind what the patch does: it allows raid1 to proceed > smoothly after a read error on a mirror component, without faulting the > component. If the information is on another component, it will be > returned. If all com

Robust read patch for raid1

2005-01-29 Thread Peter T. Breuer
I've had the opportunity to test the "robust read" patch that I posted earlier in the month (10 Jan, Subject: Re: Spares and partitioning huge disks), and it needs one more change ... I assumed that the raid1 map function would move a (retried) request to another disk, but it does not, it always move
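
The one-line idea of the fix, schematically (invented names, not the real r1_bio layout): on a retry, step past the disk that just failed instead of asking map() to choose again, since it chooses the same disk.

    /* Advance to the next usable mirror after a failed read (sketch). */
    static int next_read_disk(int failed_disk, int raid_disks,
                              const int *usable /* 1 if mirror readable */)
    {
        int d = failed_disk;

        do {
            d = (d + 1) % raid_disks;   /* try mirrors in turn        */
            if (usable[d])
                return d;               /* another copy to read from  */
        } while (d != failed_disk);
        return -1;                      /* no mirror left: real error */
    }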

corruption on disk

2005-01-23 Thread Peter T. Breuer
Just a followup ... Neil said he has never seen disks corrupt spontaneously. I'm just making the rounds of checking the daily md5sums on one group of machines with a view to estimating the corruption rates. Here's one of the typical (one bit) corruptions: doc013:/usr/oboe/ptb% cmp --verbose /tm

Re: patches for mdadm 1.8.0 (auto=dev and stacking of devices)

2005-01-23 Thread Peter T. Breuer
Luca Berra <[EMAIL PROTECTED]> wrote: > On Sun, Jan 23, 2005 at 07:52:53PM +0100, Peter T. Breuer wrote: > >Making special device files "on demand" requires the cooperation of the > >driver and devfs (and since udev apparently replaces devfs, udev). One > >wou

Re: patches for mdadm 1.8.0 (auto=dev and stacking of devices)

2005-01-23 Thread Peter T. Breuer
Luca Berra <[EMAIL PROTECTED]> wrote: > I believe the correct solution to this would be implementing a char-misc > /dev/mdadm device that mdadm would use instead of the block device, > like device-mapper does. Alas I have no time for this in the foreseeable > future. It's a generic problem (or non-p

Re: patches for mdadm 1.8.0 (auto=dev and stacking of devices)

2005-01-23 Thread Peter T. Breuer
Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > On 2005-01-23T16:13:05, Luca Berra <[EMAIL PROTECTED]> wrote: > > > the first one adds an auto=dev parameter > > rationale: udev does not create /dev/md* device files, so we need a way > > to create them when assembling the md device. > > Am I missi

Re: No response?

2005-01-20 Thread Peter T. Breuer
David Dougall <[EMAIL PROTECTED]> wrote:
> Jan 10 11:56:06 linux-sg2 kernel: SCSI disk error : host 0 channel 0 id 0 lun 47
> return code = 802
That is sda.
> Jan 10 11:56:08 linux-sg2 kernel: I/O error: dev 08:10, sector 343219280
Well, I don't really understand - that is sdb, no? No? (

Re: No response?

2005-01-20 Thread Peter T. Breuer
David Dougall <[EMAIL PROTECTED]> wrote: > If I am running software raid1 and a disk device starts throwing I/O > errors, Is the filesystem supposed to see any indication of this? I No - not if the error is on only one disk. The first error will fault the disk from the array and the driver will r

Re: Checking if RAID does work?

2005-01-19 Thread Peter T. Breuer
maarten <[EMAIL PROTECTED]> wrote: > On Wednesday 19 January 2005 21:19, Peter T. Breuer wrote: > > Poonam Dalya <[EMAIL PROTECTED]> wrote: > > > I mounted my /dev/md1 on /mnt/raid. and then wrote a > > > file on it. Then I tried to mount the raid disks >

Re: Checking if RAID does work?

2005-01-19 Thread Peter T. Breuer
Poonam Dalya <[EMAIL PROTECTED]> wrote: > I mounted my /dev/md1 on /mnt/raid. and then wrote a > file on it. Then I tried to mount the raid disks > /dev/hda10 on some other mount point and checked that > mount point. Could you please help me with

Re: RAID1 & 2.6.9 performance problem

2005-01-18 Thread Peter T. Breuer
Hans Kristian Rosbach <[EMAIL PROTECTED]> wrote: > On Mon, 2005-01-17 at 17:46, Peter T. Breuer wrote: > > Interesting. How did you measure latency? Do you have a script you > > could post? > > It's part of another application we use internally at work. I'll c

Re: RAID1 & 2.6.9 performance problem

2005-01-17 Thread Peter T. Breuer
Hans Kristian Rosbach <[EMAIL PROTECTED]> wrote: > -It selects the disk that is closest to the wanted sector by remembering > what sector was last requested and what disk was used for it. > -For sequential reads (such as hdparm) it will override and use the > same disk anyways. (sector = lastsec
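
A sketch of that selection rule (invented names): sequential reads stick with the previous disk, otherwise the mirror whose head is thought nearest the target sector wins.

    #include <stdlib.h>

    static int pick_disk(long long sector, long long last_sector,
                         int last_disk, const long long *head_pos, int ndisks)
    {
        int d, best = 0;
        long long dist, best_dist = llabs(sector - head_pos[0]);

        if (sector == last_sector + 1)
            return last_disk;           /* sequential: keep one head still */
        for (d = 1; d < ndisks; d++) {
            dist = llabs(sector - head_pos[d]);
            if (dist < best_dist) {
                best_dist = dist;       /* nearest head wins */
                best = d;
            }
        }
        return best;
    }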

Re: Spares and partitioning huge disks

2005-01-15 Thread Peter T. Breuer
Mikael Abrahamsson <[EMAIL PROTECTED]> wrote:
> if read error then
>    recreate the block from parity
>    write to sector that had read error
>    wait until write has completed
>    flush buffers
>    read back block from drive
>    if block still bad
>       fail disk
>    log result
Well, I haven'
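
A schematic of that handler (entirely invented names; the real thing would live inside the md personality):

    /* Rewrite-on-read-error, per the recipe above (invented names). */
    static int handle_read_error(struct disk *d, long long sector,
                                 unsigned char *block, int blocksize)
    {
        reconstruct_from_parity(sector, block);       /* rebuild good data */
        if (disk_write(d, sector, block, blocksize) < 0)
            goto bad;                                 /* write failed too  */
        disk_flush(d);                                /* force it to media */
        if (disk_read(d, sector, block, blocksize) < 0)
            goto bad;                                 /* still unreadable  */
        log_event("sector rewritten OK");
        return 0;
    bad:
        fail_disk(d);                                 /* kick from array   */
        log_event("sector unrecoverable");
        return -1;
    }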

Re: Spares and partitioning huge disks

2005-01-15 Thread Peter T. Breuer
Michael Tokarev <[EMAIL PROTECTED]> wrote: > That all to say: yes indeed, this lack of "smart error handling" is > a noticeable omission in linux software raid. There are quite some > (sometimes fatal to the data) failure scenarios that'd not have happened > provided the smart error handling were