Re: [BUG] OOPS 2.6.24.2 raid5 write with ioatdma

2008-02-15 Thread Dan Williams
e, I'll try to reproduce this locally. Regards, Dan ioat: fix 'ack' handling, driver must ensure that 'ack' is zero From: Dan Williams <[EMAIL PROTECTED]> Initialize 'ack' to zero in case the descriptor has been recycled. Signed-off-by: Dan Williams <
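The recycling hazard named in this patch description can be illustrated with a small user-space sketch. It is only an illustration under assumptions: `toy_desc`, `toy_put`, and `toy_get` are hypothetical names, not the ioatdma driver's structures or functions, and the snippet above does not show the actual patch body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor; the real driver's layout is not shown above. */
struct toy_desc {
    int ack;                /* nonzero once the client has acknowledged it */
    struct toy_desc *next;  /* free-list link */
};

/* Return a descriptor to the free list. Its fields keep their old values. */
void toy_put(struct toy_desc **free_list, struct toy_desc *d)
{
    assert(d != NULL);
    d->next = *free_list;
    *free_list = d;
}

/* Take a descriptor from the free list, or NULL if the list is empty. */
struct toy_desc *toy_get(struct toy_desc **free_list)
{
    struct toy_desc *d = *free_list;

    if (d == NULL)
        return NULL;
    *free_list = d->next;
    d->next = NULL;
    d->ack = 0;   /* a recycled descriptor must not inherit a stale ack */
    return d;
}
```

Resetting the flag at allocation time, rather than trusting whoever released the descriptor, is one way to make sure every user starts from a known state.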

Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread Dan Williams
> heheh. > > it's really easy to reproduce the hang without the patch -- i could > hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB. > i'll try with ext3... Dan's experiences suggest it won't happen with ext3 > (or is even more rare), which would explain why this has is overall a

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-10 Thread Dan Williams
submits the i/o on a stripe is non-deterministic. So I do not see this change making the situation any worse. In fact, it may make it a bit better since there is a higher chance for the thread submitting i/o to MD to do its own i/o to the backing disks. Reviewed-by: Dan Williams <[EMAIL PROTEC

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-09 Thread Dan Williams
On Wed, 2008-01-09 at 20:57 -0700, Neil Brown wrote: > So I'm inclined to leave it as "do as much work as is available to be > done" as that is simplest. But I can probably be talked out of it > with a convincing argument Well, in an age of CFS and CFQ it smacks of 'unfairness'. But does that

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-09 Thread Dan Williams
On Jan 9, 2008 5:09 PM, Neil Brown <[EMAIL PROTECTED]> wrote: > On Wednesday January 9, [EMAIL PROTECTED] wrote: > > On Sun, 2007-12-30 at 10:58 -0700, dean gaudet wrote: > > > i have evidence pointing to d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1 > > > > > > http://git.kernel.org/?p=linux/kernel/git

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-09 Thread Dan Williams
On Sun, 2007-12-30 at 10:58 -0700, dean gaudet wrote: > On Sat, 29 Dec 2007, Dan Williams wrote: > > > On Dec 29, 2007 1:58 PM, dean gaudet <[EMAIL PROTECTED]> wrote: > > > On Sat, 29 Dec 2007, Dan Williams wrote: > > > > > > > On Dec 29, 200

Re: Raid 1, new disk can't be added after replacing faulty disk

2008-01-07 Thread Dan Williams
On Jan 7, 2008 6:44 AM, Radu Rendec <[EMAIL PROTECTED]> wrote: > I'm experiencing trouble when trying to add a new disk to a raid 1 array > after having replaced a faulty disk. > [..] > # mdadm --version > mdadm - v2.6.2 - 21st May 2007 > [..] > However, this happens with both mdadm 2.6.2 and 2.6.4

Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Dan Williams
needed as we > > don't delay data block recovery in the same way for raid6 yet. > > But making the change now is safer long-term. > > > > This bug exists in 2.6.23 and 2.6.24-rc > > > > Cc: [EMAIL PROTECTED] > > Cc: Dan Williams <[EMAIL PROTECTED]>

Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Dan Williams
r long-term. > > This bug exists in 2.6.23 and 2.6.24-rc > > Cc: [EMAIL PROTECTED] > Cc: Dan Williams <[EMAIL PROTECTED]> > Signed-off-by: Neil Brown <[EMAIL PROTECTED]> > Acked-by: Dan Williams <[EMAIL PROTECTED]> - To unsubscribe from this list: sen

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-29 Thread Dan Williams
On Dec 29, 2007 1:58 PM, dean gaudet <[EMAIL PROTECTED]> wrote: > On Sat, 29 Dec 2007, Dan Williams wrote: > > > On Dec 29, 2007 9:48 AM, dean gaudet <[EMAIL PROTECTED]> wrote: > > > hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on

Re: 2.6.24-rc6 reproducible raid5 hang

2007-12-29 Thread Dan Williams
On Dec 29, 2007 9:48 AM, dean gaudet <[EMAIL PROTECTED]> wrote: > hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on > the same 64k chunk array and had raised the stripe_cache_size to 1024... > and got a hang. this time i grabbed stripe_cache_active before bumping > the siz

mdadm: unable to add a disk to degraded raid1 array

2007-12-29 Thread Dan Williams
In case someone else happens upon this I have found that mdadm >= v2.6.2 cannot add a disk to a degraded raid1 array created with mdadm < 2.6.2. I bisected the problem down to mdadm git commit 2fb749d1b7588985b1834e43de4ec5685d0b8d26 which appears to make an incompatible change to the super block'

Re: HELP! New disks being dropped from RAID 6 array on every reboot

2007-11-23 Thread Dan Williams
On Nov 23, 2007 11:19 AM, Joshua Johnson <[EMAIL PROTECTED]> wrote: > Greetings, long time listener, first time caller. > > I recently replaced a disk in my existing 8 disk RAID 6 array. > Previously, all disks were PATA drives connected to the motherboard > IDE and 3 promise Ultra 100/133 controll

Re: PROBLEM: raid5 hangs

2007-11-14 Thread Dan Williams
On Nov 14, 2007 5:05 PM, Justin Piszcz <[EMAIL PROTECTED]> wrote: > On Wed, 14 Nov 2007, Bill Davidsen wrote: > > Justin Piszcz wrote: > >> This is a known bug in 2.6.23 and should be fixed in 2.6.23.2 if the RAID5 > >> bio* patches are applied. > > > > Note below he's running 2.6.22.3 which doesn'

Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-13 Thread Dan Williams
ndle_stripe6. Diffing the patches shows the changes for hunk #3: -@@ -2903,6 +2907,13 @@ static void handle_stripe6(struct stripe +@@ -2630,6 +2634,13 @@ static void handle_stripe5(struct stripe_head *sh) raid5-fix-unending-write-sequence.patch is in -mm and I believe is waiting on an Acked

Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-13 Thread Dan Williams
On Nov 13, 2007 5:23 PM, Greg KH <[EMAIL PROTECTED]> wrote: > On Tue, Nov 13, 2007 at 04:22:14PM -0800, Greg KH wrote: > > On Mon, Oct 22, 2007 at 05:15:27PM +1000, NeilBrown wrote: > > > > > > It appears that a couple of bugs slipped in to md for 2.6.23. > > > These two patches fix them and are ap

Re: kernel panic (2.6.23.1-fc7) in drivers/md/raid5.c:144

2007-11-13 Thread Dan Williams
[ Adding Neil, stable@, DaveJ, and GregKH to the cc ] On Nov 13, 2007 11:20 AM, Peter <[EMAIL PROTECTED]> wrote: > Hi > > I had a 3 disc raid5 array running fine with Fedora 7 (32bit) kernel > 2.6.23.1-fc7 on an old Athlon XP using a two sata_sil cards. > > I replaced the hardware with an Athlon6

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread Dan Williams
On 11/8/07, Bill Davidsen <[EMAIL PROTECTED]> wrote: > Jeff Lessem wrote: > > Dan Williams wrote: > > > The following patch, also attached, cleans up cases where the code > > looks > > > at sh->ops.pending when it should be looking at the consistent >

[PATCH] raid5: fix unending write sequence

2007-11-08 Thread Dan Williams
From: Dan Williams <[EMAIL PROTECTED]> handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0 check 5: state 0x6 toread read write f800ffcffcc0 written check 4: state 0x6 toread read

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Dan Williams
tx = ops_run_biodrain(sh, tx); + tx = ops_run_biodrain(sh, tx, pending); overlap_clear++; } if (test_bit(STRIPE_OP_POSTXOR, &pending)) - ops_run_postxor(sh, tx); + ops_run_postxor(sh, tx, pending); if (test_bit(STRIPE_OP_C

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread Dan Williams
On 11/5/07, Justin Piszcz <[EMAIL PROTECTED]> wrote: [..] > > Are you seeing the same "md thread takes 100% of the CPU" that Joël is > > reporting? > > > > Yes, in another e-mail I posted the top output with md3_raid5 at 100%. > This seems too similar to Joël's situation for them not to be correla

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread Dan Williams
On 11/4/07, Justin Piszcz <[EMAIL PROTECTED]> wrote: > > > On Mon, 5 Nov 2007, Neil Brown wrote: > > > On Sunday November 4, [EMAIL PROTECTED] wrote: > >> # ps auxww | grep D > >> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > >> root 273 0.0 0.0 0 0 ?

Re: Bug in processing dependencies by async_tx_submit() ?

2007-11-01 Thread Dan Williams
On 11/1/07, Yuri Tikhonov <[EMAIL PROTECTED]> wrote: > > Hi Dan, > > Honestly I tried to fix this quickly using the approach similar to proposed > by you, with one addition though (in fact, deletion of BUG_ON(chan == > tx->chan) in async_tx_run_dependencies()). And this led to "Kernel stack >

Re: Bug in processing dependencies by async_tx_submit() ?

2007-10-31 Thread Dan Williams
check if the ->parent field of the dependency is non-NULL. This also requires that the parent field be cleared at dependency submission time. Found-by: Yuri Tikhonov <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- crypto/async_tx/async_tx.c |6 +- 1

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-27 Thread Dan Williams
On 10/27/07, BERTRAND Joël <[EMAIL PROTECTED]> wrote: > Dan Williams wrote: > > Can you collect some oprofile data, as Ming suggested, so we can maybe > > see what md_d0_raid5 and istd1 are fighting about? Hopefully it is as > > painless to run on sparc as it is on IA

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread Dan Williams
On 10/24/07, BERTRAND Joël <[EMAIL PROTECTED]> wrote: > Hello, > > Any news about this trouble ? Any idea ? I'm trying to fix it, but I > don't see any specific interaction between raid5 and istd. Does anyone > try to reproduce this bug on another arch than sparc64 ? I only use > sp

Re: MD driver document

2007-10-24 Thread Dan Williams
On 10/24/07, tirumalareddy marri <[EMAIL PROTECTED]> wrote: > > Hi, >I am looking for best way of understanding MD > driver(including raid5/6) architecture. I am > developing driver for one of the PPC based SOC. I have > done some code reading and tried to use HW debugger to > walk through the

Re: async_tx: get best channel

2007-10-23 Thread Dan Williams
On Fri, 2007-10-19 at 05:23 -0700, Yuri Tikhonov wrote: > > Hello Dan, Hi Yuri, sorry it has taken me so long to get back to you... > > I have a suggestion regarding the async_tx_find_channel() procedure. > > First, a little introduction. Some processors (e.g. ppc440spe) have several > DMA

Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Dan Williams
On Fri, 2007-10-19 at 14:04 -0700, BERTRAND Joël wrote: > > Sorry for this last mail. I have found another mistake, but I > don't > know if this bug comes from iscsi-target or raid5 itself. iSCSI target > is disconnected because istd1 and md_d0_raid5 kernel threads use 100% > of > CPU each

[PATCH -stable, 2.6.24-rc] raid5: fix clearing of biofill operations (try2)

2007-10-19 Thread Dan Williams
rom: Dan Williams <[EMAIL PROTECTED]> ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the 'pending' and 'ack' bits. Since the test_and_ack_op() macro only checks against 'complete' it can get an inconsistent snapshot of pen

Re: [BUG] Raid5 trouble

2007-10-19 Thread Dan Williams
On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: > I never see any oops with this patch. But I cannot create a > RAID1 array > with a local RAID5 volume and a foreign RAID5 array exported by iSCSI. > iSCSI seems to works fine, but RAID1 creation randomly aborts due to a > unknown SCS

Re: [BUG] Raid5 trouble

2007-10-17 Thread Dan Williams
rnel bad sw trap 5 [#1] > > If that can help you... > > JKB This gives more evidence that it is probably mishandling of STRIPE_OP_BIOFILL. The attached patch (replacing the previous) moves the clearing of these bits into handle_stripe5 and adds some debug informatio

Re: [BUG] Raid5 trouble

2007-10-17 Thread Dan Williams
On 10/17/07, Dan Williams <[EMAIL PROTECTED]> wrote: > On 10/17/07, BERTRAND Joël <[EMAIL PROTECTED]> wrote: > > BERTRAND Joël wrote: > > > Hello, > > > > > > I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each > > > serve

Re: [BUG] Raid5 trouble

2007-10-17 Thread Dan Williams
On 10/17/07, BERTRAND Joël <[EMAIL PROTECTED]> wrote: > BERTRAND Joël wrote: > > Hello, > > > > I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each > > server has a partitionable raid5 array (/dev/md/d0) and I have to > > synchronize both raid5 volumes by raid1. Thus, I have trie

Re: experiences with raid5: stripe_queue patches

2007-10-16 Thread Dan Williams
On Mon, 2007-10-15 at 08:03 -0700, Bernd Schubert wrote: > Hi, > > in order to tune raid performance I did some benchmarks with and > without the > stripe queue patches. 2.6.22 is only for comparison to rule out other > effects, e.g. the new scheduler, etc. Thanks for testing! > It seems there i

Re: mdadm: /dev/sda1 is too small: 0K

2007-10-13 Thread Dan Williams
On 10/13/07, Hod Greeley <[EMAIL PROTECTED]> wrote: > Hello, > > I tried to create a raid device starting with > > foo:~ 1032% mdadm --create -l1 -n2 /dev/md1 /dev/sda1 missing > mdadm: /dev/sda1 is too small: 0K > mdadm: create aborted > Quick sanity check, is /dev/sda1 still a block device node

Re: [PATCH -mm 0/4] raid5: stripe_queue (+20% to +90% write performance)

2007-10-09 Thread Dan Williams
On Mon, 2007-10-08 at 23:21 -0700, Neil Brown wrote: > On Saturday October 6, [EMAIL PROTECTED] wrote: > > Neil, > > > > Here is the latest spin of the 'stripe_queue' implementation. > Thanks to > > raid6+bitmap testing done by Mr. James W. Laferriere there have been > > several cleanups and fixes

Re: [PATCH -mm 0/4] raid5: stripe_queue (+20% to +90% write performance)

2007-10-07 Thread Dan Williams
On 10/6/07, Justin Piszcz <[EMAIL PROTECTED]> wrote: > > > On Sat, 6 Oct 2007, Dan Williams wrote: > > > Neil, > > > > Here is the latest spin of the 'stripe_queue' implementation. Thanks to > > raid6+bitmap testing done by Mr. James W. Lafer

[PATCH -mm 4/4] raid5: use stripe_queues to prioritize the "most deserving" requests (rev7)

2007-10-06 Thread Dan Williams
_queue_bio' and object allocation changes into separate patches * fix release_stripe/release_queue ordering * refactor handle_queue and release_queue to remove STRIPE_QUEUE_HANDLE and sq->sh back references * kill init_sh and allocate init_sq on the stack Tested-by: Mr. James W. L

[PATCH -mm 3/4] raid5: convert add_stripe_bio to add_queue_bio

2007-10-06 Thread Dan Williams
l indicate whether a stripe_head is attached. Tested-by: Mr. James W. Laferriere <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 53 include/linux/raid/raid5.h |6 + 2 files changed

[PATCH -mm 1/4] raid5: add the stripe_queue object for tracking raid io requests (rev3)

2007-10-06 Thread Dan Williams
mes W. Laferriere <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 564 +++- include/linux/raid/raid5.h | 28 +- 2 files changed, 364 insertions(+), 228 deletions(-) diff --git a/driver

[PATCH -mm 2/4] raid5: split allocation of stripe_heads and stripe_queues

2007-10-06 Thread Dan Williams
loop. Tested-by: Mr. James W. Laferriere <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 316 include/linux/raid/raid5.h | 11 +- 2 files changed, 239 insertions(+), 88 deletions(-)

[PATCH -mm 0/4] raid5: stripe_queue (+20% to +90% write performance)

2007-10-06 Thread Dan Williams
ce numbers. Andrew, These are updated in the git-md-accel tree, but I will work the finalized versions through Neil's 'Signed-off-by' path. Dan Williams (4): raid5: add the stripe_queue object for tracking raid io requests (rev3) raid5: split allocation of strip

[GIT PULL] async-tx/md-accel fixes and documentation for 2.6.23

2007-09-24 Thread Dan Williams
Linus, please pull from: git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus to receive: Dan Williams (3): async_tx: usage documentation and developer notes (v2) async_tx: fix dma_wait_for_async_tx raid5: fix 2 bugs in ops_complete_biofill The raid5

[PATCH 2.6.23-rc7 2/3] async_tx: fix dma_wait_for_async_tx

2007-09-20 Thread Dan Williams
Fix dma_wait_for_async_tx to not loop forever in the case where a dependency chain is longer than two entries. This condition will not happen with current in-kernel drivers, but fix it for future drivers. Found-by: Saeed Bishara <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL
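The failure mode described here, a wait loop that only copes with short dependency chains, can be illustrated in user space. This is a sketch under assumptions, not the kernel's `dma_wait_for_async_tx`: `toy_tx` and `toy_wait` are hypothetical names, and marking a descriptor done stands in for polling real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: each operation may depend on one parent. */
struct toy_tx {
    struct toy_tx *parent; /* operation this one depends on, or NULL */
    int done;              /* nonzero once the operation has completed */
};

/* Wait for tx by completing its ancestors oldest-first.
 * Returns the number of operations that had to be waited on. */
int toy_wait(struct toy_tx *tx)
{
    int polls = 0;

    assert(tx != NULL);
    while (!tx->done) {
        struct toy_tx *iter = tx;

        /* Restart from tx on every pass and walk up to the oldest
         * ancestor that has not finished, however long the chain is. */
        while (iter->parent != NULL && !iter->parent->done)
            iter = iter->parent;

        iter->done = 1;  /* stand-in for polling the hardware */
        polls++;
    }
    return polls;
}
```

Because the outer loop re-walks the chain from `tx` each time and every pass completes exactly one operation, the loop terminates for any acyclic chain, not just chains of one or two entries.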

[PATCH 2.6.23-rc7 3/3] raid5: fix ops_complete_biofill

2007-09-20 Thread Dan Williams
ead). ops_complete_biofill can run in tasklet context, so rather than upgrading all the stripe locks from spin_lock to spin_lock_bh this patch just moves read completion handling back into handle_stripe. Found-by: Yuri Tikhonov <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- d

[PATCH 2.6.23-rc7 1/3] async_tx: usage documentation and developer notes

2007-09-20 Thread Dan Williams
Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- Documentation/crypto/async-tx-api.txt | 217 + 1 files changed, 217 insertions(+), 0 deletions(-) diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt new file mode

[PATCH 2.6.23-rc7 0/3] async_tx and md-accel fixes for 2.6.23

2007-09-20 Thread Dan Williams
Fix a couple bugs and provide documentation for the async_tx api. Neil, please 'ack' patch #3. git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus Dan Williams (3): async_tx: usage documentation and developer notes async_tx: fix dma_wait_for_async_tx

Re: md raid acceleration and the async_tx api

2007-09-13 Thread Dan Williams
On 9/13/07, Yuri Tikhonov <[EMAIL PROTECTED]> wrote: > > Hi Dan, > > On Friday 07 September 2007 20:02, you wrote: > > You need to fetch from the 'md-for-linus' tree. But I have attached > > them as well. > > > > git fetch git://lost.foo-projects.org/~dwillia2/git/iop > > md-for-linus:md-for-linu

Re: [md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

2007-08-30 Thread Dan Williams
On 8/30/07, saeed bishara <[EMAIL PROTECTED]> wrote: > you are right, I've another question regarding the function > dma_wait_for_async_tx from async_tx.c, here is the body of the code: >/* poll through the dependency chain, return when tx is complete */ > 1.do { > 2.

Re: md raid acceleration and the async_tx api

2007-08-30 Thread Dan Williams
ead of the other proposed patches) . > Regards, Yuri Thanks, Dan [1] git fetch -f git://lost.foo-projects.org/~dwillia2/git/iop md-for-linus:refs/heads/md-for-linus raid5: fix the 'more_to_read' case in ops_complete_biofill From: Dan Williams <[EMAIL PRO

Re: raid5:md3: kernel BUG , followed by , Silent halt .

2007-08-27 Thread Dan Williams
On 8/25/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote: > Hello Dan , > > On Mon, 20 Aug 2007, Dan Williams wrote: > > On 8/18/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote: > >> Hello All , Here we go again . Again attempting to

Re: Patch for boot-time assembly of v1.x-metadata-based soft (MD) arrays: reasoning and future plans

2007-08-26 Thread Dan Williams
On 8/26/07, Abe Skolnik <[EMAIL PROTECTED]> wrote: > Dear Mr./Dr. Williams, > Just "Dan" is fine :-) > > Because you can rely on the configuration file to be certain about > > which disks to pull in and which to ignore. Without the config file > > the auto-detect routine may not always do the rig

Re: Patch for boot-time assembly of v1.x-metadata-based soft (MD) arrays

2007-08-26 Thread Dan Williams
On 8/26/07, Justin Piszcz <[EMAIL PROTECTED]> wrote: > > > On Sun, 26 Aug 2007, Abe Skolnik wrote: > > > Dear Mr./Dr./Prof. Brown et al, > > > > I recently had the unpleasant experience of creating an MD array for > > the purpose of booting off it and then not being able to do so. Since > > I had

Re: raid5:md3: kernel BUG , followed by , Silent halt .

2007-08-20 Thread Dan Williams
ential write performance > (stripe-queue) , Dan Williams <[EMAIL PROTECTED]> Hello James, Thanks for the report. I tried to reproduce this on my system, no luck. However, it looks like there is a potential race between 'handle_queue' and 'add_queue_bio'. The attache

Re: [RFT] 2.6.22.1-iop1 for improved sequential write performance (stripe-queue)

2007-08-06 Thread Dan Williams
On 8/4/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote: > Hello Dan , > > On Thu, 19 Jul 2007, Dan Williams wrote: > > Per Bill Davidsen's request I have made available a 2.6.22.1 based > > kernel with the current raid5 performance changes I have bee

Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5

2007-07-30 Thread Dan Williams
[trimmed all but linux-raid from the cc] On 7/30/07, Justin Piszcz <[EMAIL PROTECTED]> wrote: > CONFIG: > > Software RAID 5 (400GB x 6): Default mkfs parameters for all filesystems. > Kernel was 2.6.21 or 2.6.22, did these awhile ago. Can you give 2.6.22.1-iop1 a try to see what effect it has on s

[GIT PATCH 1/2] raid5: add the stripe_queue object for tracking raid io requests (take2)

2007-07-22 Thread Dan Williams
/s without. Pre-patch throughput hovers at ~85MB/s for this dd command. Changes in take2: * leave the flags with the buffers, prevents a data corruption issue whereby stale buffer state flags are attached to newly initialized buffers Signed-off-by: Dan Williams <[EMAIL PROTECTED]> ---

[GIT PATCH 0/2] stripe-queue for 2.6.23 consideration

2007-07-22 Thread Dan Williams
insertions(+), 407 deletions(-) Dan Williams (2): raid5: add the stripe_queue object for tracking raid io requests (take2) raid5: use stripe_queues to prioritize the "most deserving" requests (take4) I initially considered them 2.6.24 material but after fixing the sync+io data

[RFT] 2.6.22.1-iop1 for improved sequential write performance (stripe-queue)

2007-07-19 Thread Dan Williams
Per Bill Davidsen's request I have made available a 2.6.22.1 based kernel with the current raid5 performance changes I have been working on: 1/ Offload engine acceleration (recently merged for the 2.6.23 development cycle) 2/ Stripe-queue, an evolutionary change to the raid5 queuing model (take4)

[-mm PATCH 1/2] raid5: add the stripe_queue object for tracking raid io requests (take2)

2007-07-13 Thread Dan Williams
at ~98MB/s compared to ~120MB/s without. Pre-patch throughput hovers at ~85MB/s for this dd command. Changes in take2: * leave the flags with the buffers, prevents a data corruption issue whereby stale buffer state flags are attached to newly initialized buffers Signed-off-by: Dan Williams <[EM

[-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3

2007-07-13 Thread Dan Williams
Neil, Andrew, The following patches replace the stripe-queue patches currently in -mm. Following your suggestion, Neil, I gathered blktrace data on the number of reads generated by sequential write stimulus. It turns out that reduced pre-reading is not the cause of the performance increase, but r

[GIT PULL] ioat fixes, raid5 acceleration, and the async_tx api

2007-07-13 Thread Dan Williams
_dma_copybreak sysctl I/OAT: Only offload copies for TCP when there will be a context switch Dan Aloni (1): I/OAT: fix I/OAT for kexec Dan Williams (20): dmaengine: refactor dmaengine around dma_async_tx_descriptor dmaengine: make clients responsible for managing channels

Re: [RFC PATCH 0/2] raid5: 65% sequential-write performance improvement, stripe-queue take2

2007-07-05 Thread Dan Williams
On 04 Jul 2007 13:41:26 +0200, Andi Kleen <[EMAIL PROTECTED]> wrote: Dan Williams <[EMAIL PROTECTED]> writes: > The write performance numbers are better than I expected and would seem > to address the concerns raised in the thread "Odd (slow) RAID > performance"

[RFC PATCH 0/2] raid5: 65% sequential-write performance improvement, stripe-queue take2

2007-07-03 Thread Dan Williams
The first take of the stripe-queue implementation[1] had a performance limiting bug in __wait_for_inactive_queue. Fixing that issue drastically changed the performance characteristics. The following data from tiobench shows the relative performance difference of the stripe-queue patchset. Unit i

[RFC PATCH 0/2] An evolutionary change to the raid456 queuing model

2007-06-27 Thread Dan Williams
Raz's stripe-deadline patch illuminated the fact that the current queuing model leaves write performance on the table in some cases. The following patches introduce a new queuing model which attempts to recover this performance. On an ARM based iop13xx platform I see an averaged 14.7% increase in

Re: [md-accel PATCH 03/19] xor: make 'xor_blocks' a library routine for use with async_tx

2007-06-27 Thread Dan Williams
[ trimmed the cc ] On 6/26/07, Satyam Sharma <[EMAIL PROTECTED]> wrote: Hi Dan, [ Minor thing ... ] Not a problem, thanks for taking a look... On 6/27/07, Dan Williams <[EMAIL PROTECTED]> wrote: > The async_tx api tries to use a dma engine for an operation, but will fa

Re: [md-accel PATCH 00/19] md raid acceleration and the async_tx api

2007-06-26 Thread Dan Williams
On 6/26/07, Mr. James W. Laferriere <[EMAIL PROTECTED]> wrote: Hello Dan , On Tue, 26 Jun 2007, Dan Williams wrote: > Greetings, > > Per Andrew's suggestion this is the md raid5 acceleration patch set > updated with more thorough changelogs to lower the barrier t

[md-accel PATCH 19/19] ARM: Add drivers/dma to arch/arm/Kconfig

2007-06-26 Thread Dan Williams
Cc: Russell King <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- arch/arm/Kconfig |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 50d9f3e..0cb2d4f 100644 --- a/arch/arm/Kconfig +++ b/arch

[md-accel PATCH 17/19] iop13xx: surface the iop13xx adma units to the iop-adma driver

2007-06-26 Thread Dan Williams
x) * build error fix from Kirill A. Shutemov * rebase for async_tx changes * add interrupt support * do not call platform register macros in driver code * remove unnecessary ARM assembly statement * checkpatch.pl fixes * gpl v2 only correction Cc: Russell King <[EMAIL PROTECTED]> Signed-off-

[md-accel PATCH 18/19] iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver

2007-06-26 Thread Dan Williams
TECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- arch/arm/mach-iop32x/glantank.c|2 arch/arm/mach-iop32x/iq31244.c |5 arch/arm/mach-iop32x/iq80321.c |3 arch/arm/mach-iop32x/n2100.c |2 arch/arm/mach-iop33x/iq80331.c

[md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

2007-06-26 Thread Dan Williams
efore iop_chan_enable * checkpatch.pl fixes * gpl v2 only correction * move set_src, set_dest, submit to async_tx methods Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/dma/Kconfig |8 drivers/dma/Makefile|1 drivers/dma

[md-accel PATCH 14/19] md: handle_stripe5 - request io processing in raid5_run_ops

2007-06-26 Thread Dan Williams
I/O submission requests were already handled outside of the stripe lock in handle_stripe. Now that handle_stripe is only tasked with finding work, this logic belongs in raid5_run_ops. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL PROTECTED]> --- driver

[md-accel PATCH 15/19] md: remove raid5 compute_block and compute_parity5

2007-06-26 Thread Dan Williams
replaced by raid5_run_ops Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 124 1 files changed, 0 insertions(+), 124 deletions(-) diff --git a/drivers/md/raid5.c

[md-accel PATCH 12/19] md: handle_stripe5 - add request/completion logic for async read ops

2007-06-26 Thread Dan Williams
arrive while raid5_run_ops is running they will not be handled until handle_stripe is scheduled to run again. Changelog: * cleanup to_read and to_fill accounting * do not fail reads that have reached the cache Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL

[md-accel PATCH 13/19] md: handle_stripe5 - add request/completion logic for async expand ops

2007-06-26 Thread Dan Williams
differentiate expand operations from normal write operations. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 50 ++ 1 files changed, 38 insertions(+), 12 deletions(-) diff --git

[md-accel PATCH 10/19] md: handle_stripe5 - add request/completion logic for async compute ops

2007-06-26 Thread Dan Williams
R5_Wantcompute flag there is no facility to pass the async_tx dependency chain across successive calls to raid5_run_ops. The req_compute variable protects against this case. Changelog: * remove the req_compute BUG_ON Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL

[md-accel PATCH 11/19] md: handle_stripe5 - add request/completion logic for async check ops

2007-06-26 Thread Dan Williams
g is added. Changelog: * remove test_and_set/test_and_clear BUG_ONs, Neil Brown Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 84 1 files changed, 65 insertions(+), 19 de

[md-accel PATCH 08/19] md: common infrastructure for running operations with raid5_run_ops

2007-06-26 Thread Dan Williams
tted. This enables batching of the submission and completion of operations. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 67 +--- 1 files changed, 58 insertions(+),

[md-accel PATCH 09/19] md: handle_stripe5 - add request/completion logic for async write ops

2007-06-26 Thread Dan Williams
imple flag, Neil Brown * remove test_and_set/test_and_clear BUG_ONs, Neil Brown Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 161 +--- 1 files changed, 138 inser

[md-accel PATCH 07/19] md: raid5_run_ops - run stripe operations outside sh->lock

2007-06-26 Thread Dan Williams
sary spin_lock from ops_complete_biofill * remove test_and_set/test_and_clear BUG_ONs, Neil Brown * remove explicit interrupt handling for channel switching, this feature was absorbed (i.e. it is now implicit) by the async_tx api Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: Ne

[md-accel PATCH 05/19] raid5: refactor handle_stripe5 and handle_stripe6 (v2)

2007-06-26 Thread Dan Williams
s in code duplication between raid5 and raid6. The following new routines are shared between raid5 and raid6: handle_completed_write_requests handle_requests_to_failed_array handle_stripe_expansion Changes in v2: * fixed 'conf->raid_disk-1' for the raid6 

[md-accel PATCH 06/19] raid5: replace custom debug PRINTKs with standard pr_debug

2007-06-26 Thread Dan Williams
Replaces PRINTK with pr_debug, and kills the RAID5_DEBUG definition in favor of the global DEBUG definition. To get local debug messages just add '#define DEBUG' to the top of the file. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/m
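The compile-time switch described in this patch can be mimicked in user space. The sketch below is an assumption-laden illustration, not the kernel's `pr_debug` definition: `toy_pr_debug`, `TOY_DEBUG_ENABLED`, and `toy_handle_stripe` are hypothetical names, and `fprintf` stands in for the kernel's logging.

```c
#include <stdio.h>

/* Uncomment to compile the debug messages in, mirroring the
 * '#define DEBUG' at the top of the file that the patch text mentions. */
/* #define DEBUG */

#ifdef DEBUG
#define TOY_DEBUG_ENABLED 1
#define toy_pr_debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define TOY_DEBUG_ENABLED 0
#define toy_pr_debug(...) do { } while (0)
#endif

/* Returns how many times the debug arguments were evaluated:
 * 1 when DEBUG is defined, 0 when the macro compiles to nothing. */
int toy_handle_stripe(unsigned long sector)
{
    int evals = 0;

    toy_pr_debug("handling stripe %lu (eval %d)\n", sector, ++evals);
    return evals;
}
```

In this sketch the message and its arguments disappear entirely when `DEBUG` is not defined, so debug statements can stay in the source without a driver-specific switch such as `RAID5_DEBUG`.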

[md-accel PATCH 04/19] async_tx: add the async_tx api

2007-06-26 Thread Dan Williams
ionality, and the two may share algorithms in the future * move large inline functions into c files * checkpatch.pl fixes * gpl v2 only correction Cc: Herbert Xu <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> Acked-By: NeilBrown <[EMAIL PROTECTED]> --- crypto

[md-accel PATCH 03/19] xor: make 'xor_blocks' a library routine for use with async_tx

2007-06-26 Thread Dan Williams
ECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- crypto/Kconfig |6 ++ crypto/Makefile |6 ++ crypto/xor.c | 156 ++ drivers/md/Kconfig |1 drivers/md/Makefile |4 + driv

[md-accel PATCH 01/19] dmaengine: refactor dmaengine around dma_async_tx_descriptor

2007-06-26 Thread Dan Williams
ke set_src, set_dest, and tx_submit descriptor specific methods Cc: Jeff Garzik <[EMAIL PROTECTED]> Cc: Chris Leech <[EMAIL PROTECTED]> Cc: Shannon Nelson <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/dma/dmaengine.c | 182 +

[md-accel PATCH 02/19] dmaengine: make clients responsible for managing channels

2007-06-26 Thread Dan Williams
w return dma_state_client to dmaengine (ack, dup, nak) * checkpatch.pl fixes Cc: Chris Leech <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/dma/dmaengine.c | 217 +++-- drivers/dma/ioatdma.c |1 drivers/dma/

[md-accel PATCH 00/19] md raid acceleration and the async_tx api

2007-06-26 Thread Dan Williams
ady for Linus to pull for 2.6.23. git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus Dan Williams (19): dmaengine: refactor dmaengine around dma_async_tx_descriptor dmaengine: make clients responsible for managing channels xor: make 'xor_blocks' a libra

Re: stripe_cache_size and performance

2007-06-25 Thread Dan Williams
7. And now, the question: the best absolute 'write' performance comes with a stripe_cache_size value of 4096 (for my setup). However, any value of stripe_cache_size above 384 really, really hurts 'check' (and rebuild, one can assume) performance. Why? Question: After performance goes "bad" does

Re: [PATCH git-md-accel 1/2] raid5: refactor handle_stripe5 and handle_stripe6

2007-06-18 Thread Dan Williams
On 6/18/07, Dan Williams <[EMAIL PROTECTED]> wrote: ... +static void handle_stripe_expansion(raid5_conf_t *conf, struct stripe_head *sh, + struct r6_state *r6s) +{ + int i; + + /* We have read all the blocks in this stripe and now we n

[PATCH git-md-accel 2/2] raid5: replace custom debug print with standard pr_debug

2007-06-18 Thread Dan Williams
Replaces PRINTK with pr_debug, and kills the RAID5_DEBUG definition in favor of the global DEBUG definition. To get local debug messages just add '#define DEBUG' to the top of the file. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/m

[PATCH git-md-accel 1/2] raid5: refactor handle_stripe5 and handle_stripe6

2007-06-18 Thread Dan Williams
s in code duplication between raid5 and raid6. The following new routines are shared between raid5 and raid6: handle_completed_write_requests handle_requests_to_failed_array handle_stripe_expansion Signed-off-by: Dan Williams <[EMAI

[PATCH git-md-accel 0/2] raid5 refactor, and pr_debug cleanup

2007-06-18 Thread Dan Williams
I address Andrew's concerns about the commit messages. Dan Williams (2): raid5: refactor handle_stripe5 and handle_stripe6 raid5: replace custom debug print with standard pr_debug

Re: raid5: coding style cleanup / refactor

2007-06-15 Thread Dan Williams
On 6/15/07, Neil Brown <[EMAIL PROTECTED]> wrote: Good idea... Am I asking too much to have separate things in separate patches? It makes review easier. ...yeah I got a little bit carried away after the refactoring. I will spin the refactoring out into a separate patch and handle the coding

Re: raid5: coding style cleanup / refactor

2007-06-14 Thread Dan Williams
On 6/14/07, Bill Davidsen <[EMAIL PROTECTED]> wrote: When you are ready for wider testing, if you have a patch against a released kernel it makes testing easy, characteristics are pretty well known already. Thanks I went ahead and put a separate snapshot up on SourceForge: http://downloads.sour

Re: raid5: coding style cleanup / refactor

2007-06-13 Thread Dan Williams
In other words, it seemed like a good idea at the time, but I am open to suggestions. I went ahead and added the cleanup patch to the front of the git-md-accel.patch series. A few more whitespace cleanups, but no major changes from what I posted earlier. The new rebased series is still passin

Re: raid5: coding style cleanup / refactor

2007-06-12 Thread Dan Williams
> I assume that you're prepared to repair all that damage to your tree, but > it seems a bit masochistic? It's either this or have an inconsistent coding style throughout raid5.c. I figure it is worth it to have reduced code duplication between raid5 and raid6, and it makes it easier to add new

Re: Paranoid read mode for raid5/6

2007-06-11 Thread Dan Williams
On 6/11/07, Mattias Wadenstein <[EMAIL PROTECTED]> wrote: Hi *, Is there a way to tell md to be paranoid and verify the parity for raid5/6 on every read? I guess this would come with a (significant) performance hit, but sometimes that's not a big deal (unlike disks scrambling your data). Also, re

[PATCH] md: comment add_stripe_bio

2007-06-05 Thread Dan Williams
From: Dan Williams <[EMAIL PROTECTED]> Document the overloading of struct bio fields. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- [ drop this if you think it is too much commenting/unnecessary, but I figured I would leave some breadcrumbs for the next guy. ] driver

[PATCH 16/16] iop3xx: Surface the iop3xx DMA and AAU units to the iop-adma driver

2007-05-01 Thread Dan Williams
219 boards * do not call platform register macros in driver code * remove switch() statements for compatible register offsets/layouts * change over to bitmap based capabilities Cc: Russell King <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- arch/arm/mach-io
