On Fri, May 11, 2007 at 07:13:48AM +1000, David Chinner wrote: > On Thu, May 10, 2007 at 07:46:33AM -0700, Jeremy Fitzhardinge wrote: > > David Chinner wrote: > > > On Wed, May 09, 2007 at 05:54:09PM -0700, Jeremy Fitzhardinge wrote: > > > > > >> David Chinner wrote: > > >> > > >>> Suspend-resume, eh? > > >>> > > >>> There's an immediate suspect. Can you test this specifically for us? > > >>> i.e. download a known good file set, do some stuff, suspend, resume, > > >>> then check the files? If it doesn't show up the first time, can > > >>> you do it a few times just to rule it out? > > >>> > > >> Well, I've been doing suspend-resume with xfs for a while without > > >> problems; the problems seem to be recent and easily repeatable. Which > > >> just means that it could be a new suspend-resume problem, of course. > > >> > > > > > > Ok. I'm just trying to find a relatively simple test case for the > > > problem - seeing as you seem to be able to reliably reproduce this > > > we should be able to work out the trigger... > > > > > > > OK, I was able to reproduce it reliably with a script with did basically: > > > > for i in `seq 20`; do > > hg clone -U --pull a b-$i > > hg verify b-$i # always OK > > umount /home > > sleep 5 > > mount /home > > hg verify b-$i # often found truncated files > > done > > > > > > No suspend/resumes involved. The trees are linux kernel ones, so fairly > > large, but small enough to fit entirely in core. My script also > > captured xfs_bmap before/after output for files which had tended to be > > corrupted in the past, but unfortunately none of them got corrupted in > > these tests. But I do have all the trees lying around to extract more > > detail for if you like. > > Ok, so most of the of the integrity errors are processed by an > error like this: > > drivers/scsi/sata_sil24.c index contains -98 extra bytes > unpacking file drivers/scsi/sata_sil24.c 5715cdfceaca: Error -5 while > decompressing data > > That's an -EIO and not a normal error to report. Are there any > errors in dmesg or syslog corresponding to this? > > The errors tend to imply problems decompressing and patching files, > not that truncates are occurring once the files have been patched. > Can you check that what is being pulled from the repository is correct > before it gets uncompressed?
Notice that verify gets run twice. Before unmount, it's fine, after remount, it's not. That message saying that the file contains -98 extra bytes is Mercurial detecting the truncation before if tries to read and decompress the truncated bit. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/