Hi Jim, Congratulation - the patch works for me. Result looks good - no errors. Tomorror I will test more extensive (copy all files and compare all files). The debug output from the first patch is now identical when using the bulk copy and the single copy.
Best regards, Mike Von: Jim Meyering <j...@meyering.net> An: "Mike Gerth" <m.ge...@avm.de> Kopie: "Alan Curry" <pacman...@kosh.dhis.org>, 12...@debbugs.gnu.org Datum: 16.10.2012 17:46 Betreff: Re: bug#12656: Re[3]: bug#12656: cp since 8.11 corrupts files Mike Gerth wrote: > Hi Alan, > the difference between bulk copy (then the corruption comes) and single > file copy (everything ok) is: > > diff /PatchLogBulk.txt /PatchLogSingle.txt > 635c635 > < Julius.nsf merged extent#3 off=18239488 len=155648 flags=0x1000 > --- >> Julius.nsf merged extent#3 off=18239488 len=159744 flags=0x1000 ... > Von: "Alan Curry" <pacman...@kosh.dhis.org> > An: m.ge...@avm.de > Kopie: j...@meyering.net (Jim Meyering), 12...@debbugs.gnu.org > Datum: 16.10.2012 03:14 > Betreff: Re: bug#12656: Re[2]: bug#12656: cp since 8.11 corrupts > files > > The critical piece of the trace is here: > > lseek(3, 18239488, SEEK_SET) = 18239488 > lseek(4, 18239488, SEEK_SET) = 18239488 > read(3, "\0\0\0\0\0\0\0\0"..., 65536) = 65536 > write(4, "\0\0\0\0\0\0\0\0"..., 65536) = 65536 > read(3, "?\0\r\0\20\0\0\37"..., 65536) = 65536 > write(4, "?\0\r\0\20\0\0\37"..., 65536) = 65536 > read(3, "\20\0\0\377\0\20\0\0"..., 24576) = 24576 > write(4, "\20\0\0\377\0\20\0\0"..., 24576) = 24576 > lseek(3, 18403328, SEEK_SET) = 18403328 > lseek(4, 18403328, SEEK_SET) = 18403328 > > The 24576-byte read and write should be 4096 bytes longer. cp has got the > extents mixed up somehow. > > Here are some quick and dirty patches to make cp dump what it thinks the > extents are at a couple of different stages. It would be interesting to > see > what they say about the troublesome file. > > --- src/extent-scan.c 2012-10-15 19:24:55.112096929 -0500 > +++ src/extent-scan.c 2012-10-15 19:56:22.533209742 -0500 > @@ -131,6 +131,12 @@ > sizeof (struct extent_info)); > > unsigned int i = 0; > + for(i=0; i<fiemap->fm_mapped_extents; ++i) > + fprintf(stderr, "fd %d extent#%d log=%llu len=%llu > flags=0x%llx\n", > + scan->fd, i, > + (unsigned long long)fm_extents[i].fe_logical, > + (unsigned long long)fm_extents[i].fe_length, > + (unsigned long long)fm_extents[i].fe_flags); > for (i = 0; i < fiemap->fm_mapped_extents; i++) > { > assert (fm_extents[i].fe_logical > --- src/copy.c 2012-10-15 18:55:39.397899266 -0500 > +++ src/copy.c 2012-10-15 19:56:23.417209716 -0500 > @@ -317,6 +317,12 @@ > > unsigned int i; > bool empty_extent = false; > + for(i=0; i<scan.ei_count; ++i) > + fprintf(stderr, "%s merged extent#%d off=%llu len=%llu > flags=0x%llx\n", > + src_name, i, > + (unsigned long long)scan.ext_info[i].ext_logical, > + (unsigned long long)scan.ext_info[i].ext_length, > + (unsigned long long)scan.ext_info[i].ext_flags); > for (i = 0; i < scan.ei_count || empty_extent; i++) > { > off_t ext_start; Thanks for the additional info, Alan. With that, I have deduced that cp was using freed memory in this unusual case. How? When a file was so fragmented that it took two or more FIEMAP ioctls to obtain the list of extents, our "last_ei" pointer would be pointing into a malloc'd buffer from the first ioctl, when that buffer has to be enlarged (realloc'd) to accommodate the results of the second call. If that realloc actually freed the first buffer, our last_ei pointer (which we're about to use) was left pointing to that freed memory. Mike, can you test this preliminary patch? If it does the job, I'll add something like the above to the log, write a NEWS entry and work on adding a test. This can happen only with a FIEMAP-enabled copy. >From 25046bda75bba889e800a2df10c8c4393b106990 Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@redhat.com> Date: Tue, 16 Oct 2012 17:43:49 +0200 Subject: [PATCH] cp: avoid data-corrupting free-memory-read * src/extent-scan.c (extent_scan_read): Reset our last_ei pointer whenever the parent buffer might have just been freed. Reported by Mike Gerth in http://bugs.gnu.org/12656, and with help from Alan Curry. --- src/extent-scan.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/src/extent-scan.c b/src/extent-scan.c index 0c25c57..5b47295 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -89,7 +89,7 @@ extern bool extent_scan_read (struct extent_scan *scan) { unsigned int si = 0; - struct extent_info *last_ei IF_LINT ( = scan->ext_info); + struct extent_info *last_ei = scan->ext_info; while (true) { @@ -127,8 +127,12 @@ extent_scan_read (struct extent_scan *scan) assert (scan->ei_count <= SIZE_MAX - fiemap->fm_mapped_extents); scan->ei_count += fiemap->fm_mapped_extents; - scan->ext_info = xnrealloc (scan->ext_info, scan->ei_count, - sizeof (struct extent_info)); + { + size_t prev_idx = last_ei - scan->ext_info; + scan->ext_info = xnrealloc (scan->ext_info, scan->ei_count, + sizeof (struct extent_info)); + last_ei = scan->ext_info + prev_idx; + } unsigned int i = 0; for (i = 0; i < fiemap->fm_mapped_extents; i++) -- 1.8.0.rc2