Hello list,
----- Original Message -----
From: "David Chinner" <[EMAIL PROTECTED]>
To: "Janos Haar" <[EMAIL PROTECTED]>
Cc: <linux-kernel@vger.kernel.org>
Sent: Tuesday, December 25, 2007 10:13 AM
Subject: Re: xfs|loop|raid: attempt to access beyond end of device

> On Sun, Dec 23, 2007 at 08:21:08PM +0100, Janos Haar wrote:
> > Hello, list,
> >
> > I have a little problem on one of my production systems.
> >
> > The system sometimes crashes, like this:
> >
> > Dec 23 08:53:05 Albohacen-global kernel: attempt to access beyond end of
> > device
> > Dec 23 08:53:05 Albohacen-global kernel: loop0: rw=1, want=50552830649176,
> > limit=3085523200
> > Dec 23 08:53:05 Albohacen-global kernel: Buffer I/O error on device loop0,
> > logical block 6319103831146
> > Dec 23 08:53:05 Albohacen-global kernel: lost page write due to I/O error on
> > loop0
>
> So a long way beyond the end of the device.
>
> [snip soft lockup warnings]
>
> > Dec 23 09:08:19 Albohacen-global kernel: Filesystem "loop0": Access to block
> > zero in inode 397821447 start_block: 0 start_off: 0 blkcnt: 0 extent-state:
> > 0 lastx: e4
>
> And that's to block zero of the filesystem. Sure signs of a corrupted inode
> extent btree. We've seen a few of these corruptions on loopback devices
> reported recently.
>
> You'll need to unmount and repair the filesystem to make this go away,
> but it's hard to know what is causing the btree corruption.

Yes, if I do that, the fs is clean again and works well until we move a
large amount of data onto the storage. But then the problem comes back,
again and again.

If I can, I will try the latest stable kernel, which has some loop
fixes, but that is not easy: this is a production system.

> > Dec 23 09:08:22 Albohacen-global last message repeated 19 times
> >
> > Some more info:
> >
> > [EMAIL PROTECTED] ~]# uname -a
> > Linux Albohacen-global 2.6.21.1 #3 SMP Thu May 3 04:33:36 CEST 2007 x86_64
> > x86_64 x86_64 GNU/Linux
> > [EMAIL PROTECTED] ~]# cat /proc/mdstat
> > Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> > [multipath] [faulty]
> > md1 : active raid4 sdf2[1] sde2[0] sdd2[5] sdc2[4] sdb2[3] sda2[2]
> >       19558720 blocks level 4, 64k chunk, algorithm 0 [6/6] [UUUUUU]
> >       bitmap: 8/239 pages [32KB], 8KB chunk
> >
> > md2 : active raid4 sdf3[1] sde3[0] sdd3[5] sdc3[4] sdb3[3] sda3[2]
> >       1542761600 blocks level 4, 64k chunk, algorithm 0 [6/6] [UUUUUU]
> >       bitmap: 0/148 pages [0KB], 1024KB chunk
> >
> > md0 : active raid1 sdb1[1] sda1[0]
> >       104320 blocks [2/2] [UU]
> >
> > unused devices: <none>
> > [EMAIL PROTECTED] ~]# losetup /dev/loop0
> > /dev/loop0: [0010]:6598 (/dev/md2), encryption blowfish (type 18)
>
> You're using an encrypted block device?

Yes.

> What mechanism are you using for
> encryption (doesn't appear to be dmcrypt)?

No, this is the "old way", not dm-crypt. This is cryptoloop.

> Does it handle readahead bio
> cancellation correctly?

I don't know exactly, but in my log these are always write failures!
(I can tell from rw=1 and the "lost page write" messages.) Is readahead
important in this case?

> We had similar XFS corruption problems on dmcrypt
> between 2.6.14 and ~2.6.20 due to a bug in dmcrypt's failure to handle
> aborted readahead I/O correctly....

OK, what is the next step? :-)
(I will try the latest kernel, if I can....)

Can I switch to dm-crypt from cryptoloop without rebuilding the fs?
Below I have put together the details as I understand them; please
correct me where I am wrong.
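First, to put the log numbers in perspective, here is my own arithmetic
on the want/limit values (assuming the usual 512-byte sectors):

    $ echo '50552830649176 * 512' | bc   # requested byte offset: ~25.9 PB
    25883049292378112
    $ echo '3085523200 * 512' | bc       # loop0 size in bytes: ~1.58 TB
    1579787878400

The limit also matches md2's 1542761600 1-KiB blocks exactly, so the
loop device size itself looks right; only the requested block number is
garbage, which fits the corrupt-extent-btree theory.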
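If it helps the debugging, I could also dump the suspect inode with
xfs_db. This is only a sketch of what I would run (read-only mode, the
inode number taken from the log above):

    xfs_db -r -c 'inode 397821447' -c 'print' /dev/loop0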
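For reference, the repair procedure I follow each time it happens (the
/storage mount point is just my local example):

    umount /storage              # wherever loop0 is mounted
    xfs_repair -n /dev/loop0     # dry run first: report, don't modify
    xfs_repair /dev/loop0        # the real repair
    mount /dev/loop0 /storage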
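And about the dm-crypt switch, this is only my guess at what the
migration would look like. The cipher spec, key size and passphrase
hash below are assumptions (I do not know which parameters exactly
match cryptoloop's blowfish mode), so I would mount read-only until the
data is verified:

    losetup -d /dev/loop0                       # detach the cryptoloop mapping
    cryptsetup create --cipher blowfish-cbc-plain --key-size 256 \
            --hash ripemd160 cryptmd2 /dev/md2  # plain dm-crypt, no on-disk header
    mount -o ro /dev/mapper/cryptmd2 /mnt/test  # read-only until verified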
Thanks a lot,
Janos Haar

> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group