On 2009-07-28 15:42 +0200, Josh Kelley wrote:

> We use Debian for some embedded devices that use off-the-shelf flash
> drives for their primary storage. Since upgrading from etch to lenny
> and tweaking our partition layout, we've started seeing filesystem
> corruption occur very rapidly after we clone the filesystem (via
> partimage and resize2fs). While investigating, I've been able to
> reproduce the corruption with both etch's and lenny's partimage, with
> both etch's and lenny's e2fsprogs, with both the realtime-patched
> kernel we used under etch and lenny's stock amd64 kernel, with flash
> drives of different sizes, with different flash drive partition
> layouts, and with one of our embedded devices, an off-the-shelf lenny
> server, and an off-the-shelf etch server. This doesn't make any sense
> to me.
>
> While trying to figure all of this out, I've found that I can
> reproduce filesystem corruption 100% of the time simply by executing
> these commands:
>
> mke2fs -O has_journal,resize_inode,dir_index,filetype,sparse_super,large_file /dev/sdb2
> tune2fs -c 29 /dev/sdb2 # /dev/sdb is an external flash drive
> mount /dev/sdb2 /mnt/image
> cd /mnt/image
> tar xf ~/data.tar # data.tar is a 71MB archive of the /var partition
> cd
> umount /mnt/image
> e2fsck -f /dev/sdb2
>
> At this point, e2fsck starts complaining with errors like this:
> Symlink /lib/python-support/python2.5/_dbus_glib_bindings.so (inode
> #113416) is invalid.
> Clear<y>?
>
> Turning off has_journal or adding -o data=journal fixes the
> immediately preceding problem. (I haven't tested it for our cloning
> procedure.) However, I don't want to go back to ext2, and
> data=journal seems to be barely documented. (What exactly does it
> do?)

Quoting mount.8:

    All data is committed into the journal prior to being written
    into the main file system.

In other words, your data are written to disk twice: first to the
journal, then to the main file system.

> We've seen other errors after cloning (subdirectories that point to
> their parents, "resize inode not valid", etc.), but these particular
> errors are completely reproducible. The corruption occurs on more
> than one flash drive. badblocks -w /dev/sdb reports no errors
> (although I seem to remember one of disks being bigger running
> badblocks - do flash drives remap bad sectors?).

I think so.

> I can't imagine that Linux or Debian would be released with this sort
> of potentially severe reproducible bug but am at a loss to figure out
> what I might be doing wrong or what's specific to my setup. And I
> can't figure out why we're only seeing it since upgrading to lenny
> when I can currently reproduce the problem under etch.
>
> Any help would be greatly appreciated. Thanks.

I would suggest testing the flash drives with different filesystems
under different operating systems. Fill each drive completely, re-plug
it, read the data back and compare it to the original.

There have been cases of USB memory sticks with manipulated controllers
produced by fraudulent manufacturers. These sticks reported a higher
capacity than they really had. They never reported read or write
errors, but once you filled more than half of the reported capacity,
all writes would go to the same sectors, producing massive data and
filesystem corruption. I bought such a scam product myself, and it
cost me many hours of grief.

Sven
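
The fill-and-compare test suggested above could be sketched in shell
roughly as follows. This is only an illustration, not a tested tool:
the function names fill_and_record and verify_copy and the checksum
file path are hypothetical, and the sketch assumes GNU coreutils
(sha256sum, mktemp, dd) and a drive that is already formatted and
mounted. Each chunk is hashed from a local temporary copy before it is
written, so the recorded checksums never depend on what the drive
returns. The verify step only means something after the drive has been
unmounted and re-plugged, since otherwise the reads may be served from
the page cache.

```shell
#!/bin/sh
# Sketch only: fill_and_record and verify_copy are hypothetical helpers.

# fill_and_record DIR SUMFILE [MAX_CHUNKS]
# Write 1 MiB chunks of random data into DIR until the filesystem is
# full (or MAX_CHUNKS is reached; 0 means no limit) and record the
# SHA-256 of each chunk in SUMFILE. Prints the number of chunks written.
fill_and_record() {
    dir=$1; sums=$2; max=${3:-0}
    tmp=$(mktemp) || return 1
    : > "$sums"
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        # Generate and hash the chunk locally, before it touches the drive.
        dd if=/dev/urandom of="$tmp" bs=1048576 count=1 2>/dev/null || break
        sum=$(sha256sum < "$tmp" | cut -d' ' -f1)
        # A failed copy normally means the drive is full: stop filling.
        if ! cp "$tmp" "$dir/chunk.$i" 2>/dev/null; then
            rm -f "$dir/chunk.$i"
            break
        fi
        printf '%s  %s\n' "$sum" "chunk.$i" >> "$sums"
        i=$((i + 1))
    done
    rm -f "$tmp"
    sync
    echo "$i"
}

# verify_copy DIR SUMFILE
# Re-read every chunk in DIR and compare it with the recorded checksum.
# Returns non-zero if any chunk is missing or differs.
verify_copy() {
    sums=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    (cd "$1" && sha256sum -c "$sums")
}

# Possible usage (SUMFILE must live on a different disk than the drive):
#   fill_and_record /mnt/image /root/chunks.sha256
#   umount /mnt/image      # then unplug and re-plug the drive, remount
#   verify_copy /mnt/image /root/chunks.sha256
```

On a drive that reports more capacity than it has, the later chunks
would be expected to overwrite earlier ones, which should show up as
FAILED lines in the sha256sum output.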