Hi Jens!
On 25 Apr 2007, at 12:18, Jens Axboe wrote:
On Wed, Apr 25 2007, Brad Campbell wrote:
Jens Axboe wrote:
It looks to be extremely rare. Aliases are extremely rare, front merges
are rare. And you need both to happen with the details you outlined. But
it's a large user base, and we'
Neil Brown wrote:
I wonder if we should avoid bypassing the stripe cache if the needed stripes
are already in the cache... or if at least one needed stripe is or
if the array is degraded...
Probably in the degraded case we should never bypass the cache, as if
we do, then a sequential read of
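A rough sketch of the policy floated above, taking the most conservative of the variants Neil lists (never bypass when degraded, and never bypass when at least one needed stripe is already cached). The struct, its fields, and the function name are hypothetical and are not taken from the md/raid5 code; this only illustrates the decision, not how the kernel implements it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a read request would need to know.
 * None of these names come from drivers/md. */
struct read_ctx {
    bool   array_degraded;  /* array is running with a missing member */
    size_t stripes_needed;  /* stripes this read touches */
    size_t stripes_cached;  /* how many of those are already in the stripe cache */
};

/* Returns true only when bypassing the stripe cache looks safe under the
 * assumptions stated above. */
static bool may_bypass_stripe_cache(const struct read_ctx *r)
{
    if (r->array_degraded)
        return false;       /* degraded: always go through the cache */
    if (r->stripes_cached > 0)
        return false;       /* at least one needed stripe is already cached */
    return true;
}
```

A looser variant would refuse the bypass only when all needed stripes are cached, i.e. compare stripes_cached against stripes_needed instead of against zero.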
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> [ 756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
> [ 756.329615] [] cfq_merged_request+0x71/0x80
> [ 756.345046] [] cfq_merged_request+0x0/0x80
> [ 756.360216] [] elv_merged_request+0x4e/0x50
> [ 756.375647] [] __make
On Wed, Apr 25 2007, Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> >
> > Here's a fix for it, confirmed.
> >
>
> Patch looks good to me.
Good! And thanks for taking the time to look at this bug btw, it's had
me puzzled a bit this week. I should not have ruled out the ali
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> Here's a fix for it, confirmed.
>
Patch looks good to me.
Hopefully Brad can still wait for the WARN_ON to fire - it might give
useful clues to why this is happening. It might be interesting.
Thanks,
NeilBrown
On Wed, Apr 25 2007, Brad Campbell wrote:
> Jens Axboe wrote:
>
> >It looks to be extremely rare. Aliases are extremely rare, front merges
> >are rare. And you need both to happen with the details you outlined. But
> >it's a large user base, and we've had 3-4 reports on this in the past
> >months.
Neil Brown wrote:
You could test this theory by putting a
WARN_ON(cfqq->next_rq == NULL);
at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.
[ 756.311074] BUG: at block/cfq-iosched.c:543 cfq_reposition_rq_rb()
[ 756.329615] [] cfq_merged_request+0x71/0x80
[ 756.34
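For readers following along, here is a userspace sketch of the invariant that the suggested WARN_ON is probing: if the queue still holds sorted requests, its cached next_rq must not be NULL. The WARN_ON macro below is a hypothetical stand-in for the kernel's, and struct cfq_queue_model is a simplified model with only two assumed fields, not the real struct cfq_queue.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's WARN_ON(): report the condition,
 * count it, and keep running. Evaluates to 1 if the condition was true. */
static int warn_count;
#define WARN_ON(cond) \
    ((cond) ? (fprintf(stderr, "WARNING: %s at %s:%d\n", \
                       #cond, __FILE__, __LINE__), ++warn_count, 1) : 0)

struct request {
    unsigned long sector;
};

/* Simplified model of a CFQ queue (assumed fields, for illustration only). */
struct cfq_queue_model {
    size_t nr_sorted;         /* requests currently in the sort tree */
    struct request *next_rq;  /* cached "next request to service" */
};

/* Returns 1 if the invariant is violated: tree not empty but next_rq NULL. */
static int check_next_rq(const struct cfq_queue_model *cfqq)
{
    return WARN_ON(cfqq->nr_sorted > 0 && cfqq->next_rq == NULL);
}
```

Calling check_next_rq() right after the point where a request is re-added to the sort tree mirrors where the WARN_ON was suggested to go. Unlike a BUG, it only logs and lets the system keep running.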
Jens Axboe wrote:
It looks to be extremely rare. Aliases are extremely rare, front merges
are rare. And you need both to happen with the details you outlined. But
it's a large user base, and we've had 3-4 reports on this in the past
months. So it obviously does happen. I could not make it trigge
Neil Brown wrote:
On Wednesday April 25, [EMAIL PROTECTED] wrote:
BUT... That may explain why we are only seeing it on md. Would md
ever be issuing such requests that trigger this condition?
Can someone remind me which raid level(s) was/were involved?
Raid-5 degraded here, but I've had it
On Wed, Apr 25 2007, Neil Brown wrote:
> On Wednesday April 25, [EMAIL PROTECTED] wrote:
> >
> > That's pretty close to where I think the problem is (the front merging
> > and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> > aliases for O_DIRECT and/or raw IO, and that doesn
On Wednesday April 25, [EMAIL PROTECTED] wrote:
>
> That's pretty close to where I think the problem is (the front merging
> and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> here. Given that front m
On Wed, Apr 25 2007, Jens Axboe wrote:
> On Wed, Apr 25 2007, Neil Brown wrote:
> > On Tuesday April 24, [EMAIL PROTECTED] wrote:
> > > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> > > report the issue to
> > > [EMAIL PROTECTED]
> > > [105449.683646] cfq: busy=1,drv=
Neil Brown wrote:
How likely it would be to get two requests with the same sector number
I don't know. I wouldn't expect it to ever happen - I have seen it
before, but it was due to a bug in ext3. Maybe XFS does it
intentionally some times?
It certainly sounds like an odd thing to occur.
Ev
On Wed, Apr 25 2007, Neil Brown wrote:
> On Tuesday April 24, [EMAIL PROTECTED] wrote:
> > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> > report the issue to
> > [EMAIL PROTECTED]
> > [105449.683646] cfq: busy=1,drv=0,timer=0
> > [105449.694871] cfq rr_list:
> > [105
On Tuesday April 24, [EMAIL PROTECTED] wrote:
> [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up,
> report the issue to
> [EMAIL PROTECTED]
> [105449.683646] cfq: busy=1,drv=0,timer=0
> [105449.694871] cfq rr_list:
> [105449.702715] 3108: sort=0,next=,q=0/1,a=1/0,
Jens Axboe wrote:
Ok, can you try and reproduce with this one applied? It'll keep the
system running (unless there are other corruptions going on), so it
should help you a bit as well. It will dump some cfq state info when the
condition triggers that can perhaps help diagnose this. So if you can
On Tue, Apr 24 2007, Roland Kuhn wrote:
> Hi Jens!
>
> On 24 Apr 2007, at 14:32, Jens Axboe wrote:
>
> >On Tue, Apr 24 2007, Roland Kuhn wrote:
> >>Hi Jens!
> >>
> >>[I made a typo in the Cc: list so that lkml is only included as of
> >>now. Actually I copied the typo from you ;-) ]
> >
> >Well n
Hi Jens!
On 24 Apr 2007, at 14:32, Jens Axboe wrote:
On Tue, Apr 24 2007, Roland Kuhn wrote:
Hi Jens!
[I made a typo in the Cc: list so that lkml is only included as of
now. Actually I copied the typo from you ;-) ]
Well no, you started the typo, I merely propagated it and forgot to fix
On Tue, Apr 24 2007, Roland Kuhn wrote:
> Hi Jens!
>
> [I made a typo in the Cc: list so that lkml is only included as of
> now. Actually I copied the typo from you ;-) ]
Well no, you started the typo, I merely propagated it and forgot to fix
it up :-)
> >On Tue, Apr 24 2007, Jens Axboe wrote:
Hi Jens!
[I made a typo in the Cc: list so that lkml is only included as of
now. Actually I copied the typo from you ;-) ]
On 24 Apr 2007, at 11:40, Jens Axboe wrote:
On Tue, Apr 24 2007, Jens Axboe wrote:
On Tue, Apr 24 2007, Roland Kuhn wrote:
Hi Jens!
On 24 Apr 2007, at 11:18, Jens Ax
On Sun, Apr 22 2007, Brad Campbell wrote:
> Jens Axboe wrote:
> >
> >Thanks for testing Brad, be sure to use the next patch I sent instead.
> >The one from this mail shouldn't even get you booted. So double check
> >that you are still using CFQ :-)
> >
>
> [184901.576773] BUG: unable to handle ker
Jens Axboe wrote:
Thanks for testing Brad, be sure to use the next patch I sent instead.
The one from this mail shouldn't even get you booted. So double check
that you are still using CFQ :-)
[184901.576773] BUG: unable to handle kernel NULL pointer dereference at
virtual address 005c
[1
On Wed, Apr 18 2007, Brad Campbell wrote:
> Jens Axboe wrote:
>
> >I had something similar for generic_unplug_request() as well, but didn't
> >see/hear any reports of it being tried out. Here's a complete debugging
> >patch for this and other potential dangers.
> >
>
> I had a clean 2.6.21-rc7 th
On Wed, Apr 18 2007, Jens Axboe wrote:
> I had something similar for generic_unplug_request() as well, but didn't
> see/hear any reports of it being tried out. Here's a complete debugging
> patch for this and other potential dangers.
Which had a bug (do the check _after_ deleting from the rbtree,
Jens Axboe wrote:
I had something similar for generic_unplug_request() as well, but didn't
see/hear any reports of it being tried out. Here's a complete debugging
patch for this and other potential dangers.
I had a clean 2.6.21-rc7 that I forgot to change the default sched on take down my mai
On Tue, Apr 17 2007, Neil Brown wrote:
> On Monday April 16, [EMAIL PROTECTED] wrote:
> >
> > cfq_dispatch_insert() was called with rq == 0. This one is getting really
> > annoying... and md is involved again (RAID0 this time.)
>
> Yeah... weird.
> RAID0 is so light-weight and so different from R
Hi,
On Tuesday 17 April 2007, Neil Brown wrote:
> On Monday April 16, [EMAIL PROTECTED] wrote:
> >
> > cfq_dispatch_insert() was called with rq == 0. This one is getting really
> > annoying... and md is involved again (RAID0 this time.)
>
> Yeah... weird.
> RAID0 is so light-weight and so diffe
Neil Brown wrote:
On Monday April 16, [EMAIL PROTECTED] wrote:
cfq_dispatch_insert() was called with rq == 0. This one is getting really
annoying... and md is involved again (RAID0 this time.)
Yeah... weird.
RAID0 is so light-weight and so different from RAID1 or RAID5 that I
feel fairly safe
On Monday April 16, [EMAIL PROTECTED] wrote:
>
> cfq_dispatch_insert() was called with rq == 0. This one is getting really
> annoying... and md is involved again (RAID0 this time.)
Yeah... weird.
RAID0 is so light-weight and so different from RAID1 or RAID5 that I
feel fairly safe concluding that
Brad Campbell wrote:
> Brad Campbell wrote:
>> G'day all,
>>
>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>> serial console configured, but Murphy is watching me carefully and I
>> just can't seem to reproduce it while logging the console output.
>>
>
> And as usual, after t
Adrian Bunk wrote:
[ Cc's added, additional information is in http://lkml.org/lkml/2007/4/15/32 ]
On Sun, Apr 15, 2007 at 02:49:29PM +0400, Brad Campbell wrote:
Brad Campbell wrote:
G'day all,
All I have is a digital photo of this oops. (It's 3.5mb). I have serial
console configured, but Mur
[ Cc's added, additional information is in http://lkml.org/lkml/2007/4/15/32 ]
On Sun, Apr 15, 2007 at 02:49:29PM +0400, Brad Campbell wrote:
> Brad Campbell wrote:
> >G'day all,
> >
> >All I have is a digital photo of this oops. (It's 3.5mb). I have serial
> >console configured, but Murphy is wa
Brad Campbell wrote:
G'day all,
All I have is a digital photo of this oops. (It's 3.5mb). I have serial
console configured, but Murphy is watching me carefully and I just can't
seem to reproduce it while logging the console output.
And as usual, after trying to capture one for 4 days, I ge
G'day all,
All I have is a digital photo of this oops. (It's 3.5mb). I have serial console configured, but
Murphy is watching me carefully and I just can't seem to reproduce it while logging the console output.
http://www.fnarfbargle.com/CIMG0736.JPG
I had it die the same way using plain 2.6.