Re: Test feedback 2.6.17.4+libata-tj-stable (EH, hotplug)

2006-07-10 Thread Tejun Heo
Christian Pernegger wrote: The fact that the disk had changed minor numbers after it was plugged back in bugs me a bit. (was sdc before, sde after). Additionally udev removed the sdc device file, so I had to manually recreate it to be able to remove the 'faulty' disk from its md array. That's b
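
A minimal recovery sketch for this case, assuming the departed member was sdc (block device 8:32, its conventional numbers) and the array is /dev/md0 (the array name is not given above); the node udev deleted is recreated by hand so mdadm can still address it:

  # recreate the device node udev removed (8,32 is sdc; adjust to the real member)
  mknod /dev/sdc b 8 32
  # mark the vanished member failed and drop it from the array
  mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc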

Re: libata hotplug and md raid?

2006-09-13 Thread Tejun Heo
Ric Wheeler wrote: (Adding Tejun & Greg KH to this thread) Adding linux-ide to this thread. Leon Woestenberg wrote: [--snip--] In short, I use ext3 over /dev/md0 over 4 SATA drives /dev/sd[a-d] each driven by libata ahci. I unplug then replug the drive that is rebuilding in RAID-5. When I u
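
A sketch of putting a replugged member back, assuming libata reattaches it as /dev/sde (hypothetical name), the array is /dev/md0, and mdadm is recent enough to accept the 'failed' keyword:

  # drop the stale failed slot, then add the reattached disk back
  mdadm /dev/md0 --remove failed
  # md starts a full rebuild onto the new member
  mdadm /dev/md0 --add /dev/sde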

Re: Problem booting linux 2.6.19-rc5, 2.6.19-rc5-git6, 2.6.19-rc5-mm2 with md raid 1 over lvm root

2006-11-15 Thread Tejun Heo
Nicolas Mailhot wrote: The failing kernels (I tried -rc5, -rc5-git6, -rc5-mm2) only print: %< device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: [EMAIL PROTECTED] md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. %<- (I didn't bother copying the rest of the f

Re: nonzero mismatch_cnt with no earlier error

2007-03-04 Thread Tejun Heo
Eyal Lebedinsky wrote: > I CC'ed linux-ide to see if they think the reported error was really innocent: > > Question: does this error report suggest that a disk could be corrupted? > > This SATA disk is part of an md raid and no error was reported by md. > > [937567.332751] ata3.00: exception Em
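
For reference, md exposes this through sysfs: a "check" pass re-reads the whole array and refreshes the counter. A sketch assuming the array is md0:

  # start a read-only consistency scan of the array
  echo check > /sys/block/md0/md/sync_action
  # once the scan is idle again, the count of mismatched sectors found
  cat /sys/block/md0/md/mismatch_cnt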

Re: 2.6.20.3 AMD64 oops in CFQ code

2007-04-02 Thread Tejun Heo
[resending. my mail service was down for more than a week and this message didn't get delivered.] [EMAIL PROTECTED] wrote: > > Anyway, what's annoying is that I can't figure out how to bring the > > drive back on line without resetting the box. It's in a hot-swap enclosure, > > but power cycling
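
A hedged sketch of making the SCSI/libata stack rediscover such a drive without a reboot, assuming the wedged disk is sdb on host1 (both hypothetical names):

  # make the SCSI layer forget the dead device
  echo 1 > /sys/block/sdb/device/delete
  # rescan every channel/target/LUN on that host
  echo "- - -" > /sys/class/scsi_host/host1/scan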

Re: 2.6.20.3 AMD64 oops in CFQ code

2007-04-03 Thread Tejun Heo
[EMAIL PROTECTED] wrote: > [EMAIL PROTECTED] wrote: >>> Anyway, what's annoying is that I can't figure out how to bring the >>> drive back on line without resetting the box. It's in a hot-swap enclosure, >>> but power cycling the drive doesn't seem to help. I thought libata hotplug >>> was workin

Re: 2.6.20.3 AMD64 oops in CFQ code

2007-04-04 Thread Tejun Heo
Lee Revell wrote: > On 4/4/07, Bill Davidsen <[EMAIL PROTECTED]> wrote: >> I won't say that's voodoo, but if I ever did it I'd wipe down my >> keyboard with holy water afterward. ;-) >> >> Well, I did save the message in my tricks file, but it sounds like a >> last ditch effort after something get

Re: Kernel 2.6.20.4: Software RAID 5: ata13.00: (irq_stat 0x00020002, failed to transmit command FIS)

2007-04-09 Thread Tejun Heo
Justin Piszcz wrote: > > > On Thu, 5 Apr 2007, Justin Piszcz wrote: > >> Had a quick question: this is the first time I have seen this happen, >> and it was not even under heavy I/O; hardly anything was going >> on with the box at the time. > > .. snip .. > > # /usr/bin/time badblocks -
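
The badblocks invocation above is cut off; a typical destructive surface scan of a disk that holds no data yet looks like this (device name hypothetical):

  # -w write-mode test (destroys data!), -s show progress, -v verbose
  /usr/bin/time badblocks -wsv /dev/sdc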

Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-26 Thread Tejun Heo
Hello, Neil Brown. Please cc me on blkdev barriers and, if you haven't yet, reading Documentation/block/barrier.txt can be helpful too. Neil Brown wrote: [--snip--] > 1/ SAFE. With a SAFE device, there is no write-behind cache, or if > there is it is non-volatile. Once a write complet
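
One blunt way to move an ordinary ATA disk into the SAFE class described above is to switch off its volatile write-back cache, trading throughput for safety; e.g. with hdparm (device name hypothetical):

  # disable the drive's write-back cache; a completed write
  # is then actually on the platter
  hdparm -W 0 /dev/sdX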

Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-28 Thread Tejun Heo
Hello, Neil Brown wrote: > 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP. > > This is certainly a very attractive position - it makes the interface > cleaner and makes life easier for filesystems and other clients of > the block interface. > Currently filesystems handle -EOPNO
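
When that fallback triggers, the journaling code logs it; a quick check for a filesystem that has given up on barriers (the quoted message is the ext3/JBD wording of this era):

  dmesg | grep -i barrier
  # e.g. "JBD: barrier-based sync failed on md0 - disabling barriers"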

Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-31 Thread Tejun Heo
Jens Axboe wrote: > On Thu, May 31 2007, David Chinner wrote: >> On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote: >>> On Thu, May 31 2007, David Chinner wrote: IOWs, there are two parts to the problem: 1 - guaranteeing I/O ordering 2 - guaranteeing blocks are on

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-31 Thread Tejun Heo
Stefan Bader wrote: > 2007/5/30, Phillip Susi <[EMAIL PROTECTED]>: >> Stefan Bader wrote: >> > >> > Since drive a supports barrier request we don't get -EOPNOTSUPP but >> > the request with block y might get written before block x since the >> > disk are independent. I guess the chances of this are

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-01 Thread Tejun Heo
[ cc'ing Ric Wheeler for storage array thingie. Hi, whole thread is at http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/3344 ] Hello, [EMAIL PROTECTED] wrote: > but when you consider the self-contained disk arrays it's an entirely > different story. you can easily have a few gig of

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-01 Thread Tejun Heo
[EMAIL PROTECTED] wrote: > On Fri, 01 Jun 2007 16:16:01 +0900, Tejun Heo said: >> Don't those thingies usually have NV cache or backed by battery such >> that ORDERED_DRAIN is enough? > > Probably *most* do, but do you really want to bet the user's data on it? Tho

Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-02 Thread Tejun Heo
Hello, Jens Axboe wrote: >> Would that be very different from issuing barrier and not waiting for >> its completion? For ATA and SCSI, we'll have to flush write back cache >> anyway, so I don't see how we can get performance advantage by >> implementing separate WRITE_ORDERED. I think zero-lengt
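
For a feel of the flush cost under discussion, a SYNCHRONIZE CACHE can be issued from userspace, e.g. with sdparm (device name hypothetical):

  # send SYNCHRONIZE CACHE to the device and time it
  time sdparm --command=sync /dev/sda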

Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-04 Thread Tejun Heo
Jens Axboe wrote: > On Sat, Jun 02 2007, Tejun Heo wrote: >> Hello, >> >> Jens Axboe wrote: >>>> Would that be very different from issuing barrier and not waiting for >>>> its completion? For ATA and SCSI, we'll have to flush write back cache >

Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]

2007-06-18 Thread Tejun Heo
Hello, Mikael Pettersson wrote: > On Sat, 16 Jun 2007 15:52:33 +0400, Brad Campbell wrote: >> I've got a box here based on current Debian Stable. >> It's got 15 Maxtor SATA drives in it on 4 Promise TX4 controllers. >> >> Using kernel 2.6.21.x it shuts down, but of course with a huge "clack" as 15
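
The "clack" is the heads doing an emergency unload at power-off; the orderly spin-down that newer kernels issue at shutdown can be approximated per drive by hand (device name hypothetical):

  # flush and put the drive into standby (spun down) before power-off
  hdparm -y /dev/sdX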

Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]

2007-06-18 Thread Tejun Heo
Mikael Pettersson wrote: > FWIW, I'm seeing scsi layer accesses (cache flushes) after things > like rmmod sata_promise. They error out and don't seem to cause > any harm, but the fact that they occur at all makes me nervous. That's okay. On rmmod, as the low level device (ATA) goes away first jus

Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

2007-06-19 Thread Tejun Heo
Hello, David Greaves wrote: >> Good :) > Now, not so good :) Oh, crap. :-) > So I hibernated last night and resumed this morning. > Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry > Dave) > > Here are some photos of the screen during resume. This is not 100% > reproduc
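
The freeze/thaw sequence referred to, assuming the filesystem is mounted at /mnt/xfs (hypothetical path):

  # quiesce the filesystem before hibernating
  xfs_freeze -f /mnt/xfs
  # ... hibernate and resume ...
  # thaw it afterwards
  xfs_freeze -u /mnt/xfs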

Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

2007-06-20 Thread Tejun Heo
David Greaves wrote: > Tejun Heo wrote: >> Your controller is repeatedly reporting PHY readiness changed exception. >> Are you reading the system image from the device attached to the first >> SATA port? > > Yes if you mean 1st as in the one after the zero-th ... I

Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

2007-07-02 Thread Tejun Heo
David Greaves wrote: >> Tejun Heo wrote: >>> It's really weird tho. The PHY RDY status changed events are coming >>> from the device which is NOT used while resuming > > There is an obvious problem there though Tejun (the errors even when sda > isn't

Re: Linux Software RAID is really RAID?

2007-07-03 Thread Tejun Heo
Brad Campbell wrote: > Johny Mail list wrote: >> Hello list, >> I have a little question about software RAID on Linux. >> I have installed software RAID on all my Dell SC1425 servers, >> believing that md raid was a robust driver. >> And recently I ran some tests on a server and tried to view i

Re: Linux Software RAID is really RAID?

2007-07-03 Thread Tejun Heo
Mark Lord wrote: > I believe he said it was ICH5 (different post/thread). > > My observation on ICH5 is that if one unplugs a drive, > then the chipset/cpu locks up hard when toggling SRST > in the EH code. > > Specifically, it locks up at the instruction > which restores SRST back to the non-ass

Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-05 Thread Tejun Heo
Hello, Jens. Jens Axboe wrote: > On Mon, May 28 2007, Neil Brown wrote: >> I think the implementation priorities here are: >> >> 1/ implement a zero-length BIO_RW_BARRIER option. >> 2/ Use it (or otherwise) to make all dm and md modules handle >> barriers (and loop?). >> 3/ Devise and implement

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-10 Thread Tejun Heo
[EMAIL PROTECTED] wrote: > On Tue, 10 Jul 2007 14:39:41 EDT, Ric Wheeler said: > >> All of the high end arrays have non-volatile cache (read, on power loss, it >> is a >> promise that it will get all of your data out to permanent storage). You >> don't >> need to ask this kind of array to drai

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-10 Thread Tejun Heo
Ric Wheeler wrote: >> Don't those thingies usually have NV cache or backed by battery such >> that ORDERED_DRAIN is enough? > > All of the high end arrays have non-volatile cache (read, on power loss, > it is a promise that it will get all of your data out to permanent > storage). You don't need t

Re: Possible data corruption sata_sil24?

2007-07-18 Thread Tejun Heo
David Shaw wrote: >>> It fails whether I use a raw /dev/sdd or partition it into one large >>> /dev/sdd1, or partition into multiple partitions. sata_sil24 seems to >>> work by itself, as does dm, but as soon as I mix sata_sil24+dm, I get >>> corruption. >> Hmmm... Can you reproduce the corrupti
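
A minimal reproduction of the kind being asked for, assuming a dm target at /dev/mapper/test (hypothetical) at least as large as the pattern: write known data through dm, read it back, compare:

  dd if=/dev/urandom of=/tmp/pattern bs=1M count=256
  dd if=/tmp/pattern of=/dev/mapper/test bs=1M oflag=direct
  dd if=/dev/mapper/test of=/tmp/readback bs=1M count=256 iflag=direct
  md5sum /tmp/pattern /tmp/readback   # the two sums must match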

[PATCH] block: cosmetic changes

2007-07-18 Thread Tejun Heo
Cosmetic changes. This is taken from Jens' zero-length barrier patch. Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> Cc: Jens Axboe <[EMAIL PROTECTED]> --- block/ll_rw_blk.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Index: work/

[PATCH] block: factor out bio_check_eod()

2007-07-18 Thread Tejun Heo
End of device check is done twice in __generic_make_request() and it's fully inlined each time. Factor out bio_check_eod(). This is taken from Jens' zero-length barrier patch. Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> Cc: Jens Axboe <[EMAIL PROTECTED]> --- bl

Re: [PATCH] block: factor out bio_check_eod()

2007-07-18 Thread Tejun Heo
Jens Axboe wrote: > On Wed, Jul 18 2007, Tejun Heo wrote: >> End of device check is done twice in __generic_make_request() and it's >> fully inlined each time. Factor out bio_check_eod(). > > Tejun, yeah I should separate the cleanups and put them in the upstream

Re: [PATCH] block: factor out bio_check_eod()

2007-07-18 Thread Tejun Heo
Jens Axboe wrote: > On Wed, Jul 18 2007, Tejun Heo wrote: >> Jens Axboe wrote: >>> On Wed, Jul 18 2007, Tejun Heo wrote: >>>> End of device check is done twice in __generic_make_request() and it's >>>> fully inlined each time. Factor out bio_che

Re: [PATCH] block: factor out bio_check_eod()

2007-07-18 Thread Tejun Heo
Jens Axboe wrote: > On Wed, Jul 18 2007, Tejun Heo wrote: >> Jens Axboe wrote: >>> On Wed, Jul 18 2007, Tejun Heo wrote: >>>> Jens Axboe wrote: >>>>> On Wed, Jul 18 2007, Tejun Heo wrote: >>>>>> End of device check is done twice in _

Re: [PATCH] block: factor out bio_check_eod()

2007-07-18 Thread Tejun Heo
Jens Axboe wrote: >> somewhat annoying, I'll see if I can prefix it with git-daemon in the >> future. > > OK, now skip the /data/git/ stuff and just use > > git://git.kernel.dk/linux-2.6-block.git Alright, it works like a charm now. Thanks. -- tejun
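
That is, the tree can now be fetched directly:

  git clone git://git.kernel.dk/linux-2.6-block.git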

Re: Possible data corruption sata_sil24?

2007-07-19 Thread Tejun Heo
David Shaw wrote: >> I'm not sure whether this is problem of sata_sil24 or dm layer. Cc'ing >> linux-raid for help. How much memory do you have? One big difference >> between ata_piix and sata_sil24 is that sil24 can handle 64bit DMA. >> Maybe dma mapping or something interacts weirdly with dm t
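
A standard way to test the 64-bit DMA theory is to clamp RAM below 4 GB and retry, so that no DMA address needs more than 32 bits; mem= is a stock kernel boot parameter:

  # append to the kernel command line, then rerun the test
  mem=2G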

Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-10 Thread Tejun Heo
Bill Davidsen wrote: > Jan Engelhardt wrote: >> On Dec 1 2007 06:26, Justin Piszcz wrote: >>> I ran the following: >>> >>> dd if=/dev/zero of=/dev/sdc >>> dd if=/dev/zero of=/dev/sdd >>> dd if=/dev/zero of=/dev/sde >>> >>> (as it is always a very good idea to do this with any new disk) >> >> Why wo

Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-10 Thread Tejun Heo
Justin Piszcz wrote: > The badblocks did not do anything; however, when I built a software raid > 5 and then performed a dd: > > /usr/bin/time dd if=/dev/zero of=fill_disk bs=1M > > [42332.936615] ata5.00: exception Emask 0x2 SAct 0x7000 SErr 0x0 action > 0x2 frozen > [42332.936706] ata5.00: spuri