Hi all,

In testing large (>4TB) device support on 2.6, I've been using a simple write/verify test to check both block device and regular file correctness.
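In outline the test does roughly the following (a simplified sketch, not the actual tool -- the poison byte, output and error handling here are made up):

    /*
     * Sketch of the write/verify test: fill a 1MB buffer with a
     * poison pattern, append it until the target returns a short
     * write or an error, then rewind and verify every byte.
     * Compile with -D_FILE_OFFSET_BITS=64 so off_t is 64 bits.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define CHUNK (1024 * 1024)

    static unsigned char wbuf[CHUNK], rbuf[CHUNK];

    int main(int argc, char **argv)
    {
        off_t written = 0, verified = 0;
        ssize_t n;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file-or-blockdev>\n", argv[0]);
            return 1;
        }
        memset(wbuf, 0xdb, CHUNK);              /* poison pattern */

        fd = open(argv[1], O_RDWR | O_CREAT, 0600);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Write phase: a short write or error means EOF/limit hit. */
        while ((n = write(fd, wbuf, CHUNK)) == CHUNK)
            written += n;
        if (n > 0)
            written += n;                       /* final short write */

        /* Verify phase: read back and compare against the pattern. */
        if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
            perror("lseek");
            return 1;
        }
        while ((n = read(fd, rbuf, CHUNK)) > 0) {
            if (memcmp(rbuf, wbuf, n) != 0) {
                fprintf(stderr, "mismatch near byte %lld\n",
                        (long long)verified);
                return 1;
            }
            verified += n;
        }

        printf("wrote %lld bytes, verified %lld bytes\n",
               (long long)written, (long long)verified);
        return 0;
    }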
Set to write 1MB poison patterns across the whole of a file until EOF is encountered, it worked just fine on ext3: the last write came up short, leaving the file at its largest permitted size of 0x1fffffff000 bytes (2^32 sectors minus a page), and verify passed.

This 2^32-sector limit is set in ext3_max_size(), which carries the comment:

    /*
     * Maximal file size. There is a direct, and {,double-,triple-}indirect
     * block limit, and also a limit of (2^32 - 1) 512-byte sectors in i_blocks.
     * We need to be 1 filesystem block less than the 2^32 sector limit.
     */

Trouble is, that limit *should* be an i_blocks limit, because i_blocks is still 32 bits and (more importantly) is multiplied by the fs blocksize / 512 in stat(2) to return st_blocks in 512-byte units. Overflow 2^32 sectors in i_blocks and stat(2) wraps.

But i_blocks includes indirect blocks as well as data, so for a non-sparse file we wrap st_blocks well before the file is 2^32 * 512 bytes long. Yet ext3_max_size() doesn't take this into account: it simply caps the size with

    if (res > (512LL << 32) - (1 << bits))
            res = (512LL << 32) - (1 << bits);

so write() keeps writing past the wrap, resulting in a file which looks like:

    [EMAIL PROTECTED] scratch]# ls -lh verif-file9.tmp
    -rw-r--r--  1 root  root  2.0T  Feb 10 05:49  verif-file9.tmp
    [EMAIL PROTECTED] scratch]# du -h verif-file9.tmp
    2.1G    verif-file9.tmp

Worse comes at e2fsck time: near the end of walking the indirect tree, e2fsck decides that the file has grown too large, as in this "fsck -n" output:

    Pass 1: Checking inodes, blocks, and sizes
    Inode 20 is too big.  Truncate? no

    Block #536346622 (980630816) causes file to be too big.  IGNORED.
    Block #536346623 (980630817) causes file to be too big.  IGNORED.
    Block #536346624 (980630818) causes file to be too big.  IGNORED.
    ...

Whoops. e2fsck sees that st_blocks is too large at this point and decides to truncate the file to make it fit. So if a user has legitimately created such a file, the next fsck will effectively attempt to corrupt it.

So who is right? Should ext3 let the file grow that large?

For now, I think we need to constrain ext2/3 files so that i_blocks never overflows: in fs blocks, that's a limit of 2^32 * 512 / blocksize, data and indirect blocks together. Even if we fix up all the stat() code to pass back a 64-bit st_blocks, every e2fsck in existence will still be unable to deal with such files. A 64-bit st_blocks would be good to have eventually, but it needs a filesystem feature flag so that e2fsck knows about it.

--Stephen
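P.S. The sort of cap I have in mind would look something like this in ext3_max_size() -- a rough, untested sketch, not a patch (loff_t and EXT3_NDIR_BLOCKS == 12 as in the kernel tree): convert the 2^32-sector i_blocks budget into fs blocks, charge the worst-case indirect-tree overhead of a dense file, and use what's left as the size limit.

    static loff_t ext3_max_size(int bits)   /* bits = log2(blocksize) */
    {
        loff_t res = EXT3_NDIR_BLOCKS;      /* direct blocks: 12 */
        loff_t upper_limit;
        loff_t meta_blocks;

        /* i_blocks is a __u32 counting 512-byte sectors, so a dense
         * file may charge at most 2^32 - 1 sectors in total, data
         * plus indirect metadata. */
        upper_limit = (1LL << 32) - 1;

        /* convert to filesystem blocks */
        upper_limit >>= (bits - 9);

        /* worst-case metadata for a fully populated indirect tree */
        meta_blocks = 1;                                /* indirect */
        meta_blocks += 1 + (1LL << (bits - 2));         /* double ind. */
        meta_blocks += 1 + (1LL << (bits - 2))
                         + (1LL << (2 * (bits - 2)));   /* triple ind. */

        upper_limit -= meta_blocks;
        upper_limit <<= bits;                           /* back to bytes */

        /* the existing direct/indirect addressing limit, in bytes */
        res += 1LL << (bits - 2);
        res += 1LL << (2 * (bits - 2));
        res += 1LL << (3 * (bits - 2));
        res <<= bits;

        if (res > upper_limit)
            res = upper_limit;
        return res;
    }

With 4K blocks that caps a dense file at a shade under 2TB, which keeps i_blocks, stat(2) and every existing e2fsck happy.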