Hi Ted!

On Sun, 2010-11-28 at 23:11:52 -0500, Ted Ts'o wrote:
> I did some experimenting, and I figured out what was going on. You're
> right, (c) doesn't quite work, because delayed allocation meant that
> the writeout didn't take place until the fsync() for each file
> happened. I didn't see this at first; my apologies.

Thanks for the analysis.

> However, this *does* work:
>
> 	extract(a);
> 	sync_file_range(fd.a, 0, 0, SYNC_FILE_RANGE_WRITE);
> 	extract(b.dpkg-new);
> 	sync_file_range(fd.b, 0, 0, SYNC_FILE_RANGE_WRITE);
> 	extract(c.dpkg-new);
> 	sync_file_range(fd.c, 0, 0, SYNC_FILE_RANGE_WRITE);

The man page and the kernel sources seem to indicate that this might
block, depending on the size of the request queue?

> 	sync_file_range(fd.a, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);
> 	sync_file_range(fd.b, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);
> 	sync_file_range(fd.c, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);
> 	fdatasync(a);
> 	fdatasync(b.dpkg-new);
> 	fdatasync(c.dpkg-new);
>
> 	rename(b.dpkg-new, b);
> 	rename(c.dpkg-new, c);
>
> What's going on here? sync_file_range() is a Linux specific system
> call that has been around for a while. It allows program to control
> when writeback happens in a very low-level fashion. The first set of
> sync_file_range() system calls causes the system to start writing back
> each file once it has finished being extracted. It doesn't actually
> wait for the write to finish; it just starts the writeback.

Hmm, OK, so what about using posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED)
instead? Skimming over the kernel source, it seems it might end up doing
more or less the same thing, but in a portable way?

Could someone with ext4, btrfs, xfs, etc. test dpkg with and without the
attached patch?

thanks,
guillem
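For anyone who wants to try the quoted sequence outside of dpkg, here is a minimal sketch of it in C. It assumes Linux and glibc (sync_file_range() is Linux-specific); extract_file(), install_files(), close_all() and MAX_FILES are hypothetical names made up for this illustration, not dpkg code. Treating the sync_file_range() calls as best-effort hints is a choice of this sketch: fdatasync() is what actually guarantees the data is on disk.

```c
/* Minimal sketch of the two-pass writeback sequence quoted above.
 * Assumes Linux + glibc: sync_file_range() is a Linux-specific call. */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_FILES 16 /* hypothetical limit, only for this sketch */

/* Hypothetical stand-in for extract(): create `path`, write `data` to it
 * and return the still-open fd, or -1 on error. */
static int extract_file(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t left = strlen(data);
    while (left > 0) {
        ssize_t n = write(fd, data, left);
        if (n < 0 && errno == EINTR)
            continue; /* interrupted before writing anything; retry */
        if (n < 0) {
            close(fd);
            return -1;
        }
        data += n;
        left -= (size_t)n;
    }
    return fd;
}

static void close_all(const int fds[], int count)
{
    for (int i = 0; i < count; i++)
        close(fds[i]);
}

/* paths[i] receives contents[i]; if final_names[i] is non-NULL the file
 * is renamed to it once all files are on disk. Returns 0 or -1. */
int install_files(int n, const char *const paths[],
                  const char *const final_names[],
                  const char *const contents[])
{
    int fds[MAX_FILES];
    assert(n > 0 && n <= MAX_FILES);

    /* Pass 1: extract each file and immediately start its writeback,
     * without waiting for it. This is only an early hint, so errors are
     * ignored; fdatasync() below provides the actual guarantee. */
    for (int i = 0; i < n; i++) {
        fds[i] = extract_file(paths[i], contents[i]);
        if (fds[i] < 0) {
            close_all(fds, i);
            return -1;
        }
        (void)sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);
    }

    /* Pass 2: wait for the writeback started above, then flush anything
     * left, including metadata such as the file size. */
    int ret = 0;
    for (int i = 0; i < n; i++)
        (void)sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);
    for (int i = 0; i < n; i++)
        if (fdatasync(fds[i]) < 0)
            ret = -1;
    close_all(fds, n);
    if (ret < 0)
        return -1;

    /* Pass 3: only now move the new versions into place. */
    for (int i = 0; i < n; i++)
        if (final_names[i] != NULL && rename(paths[i], final_names[i]) < 0)
            return -1;
    return 0;
}
```

Like the quoted sequence, this does not fsync() the containing directory after rename(), which would be needed for the renames themselves to be guaranteed durable.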

diff --git a/src/archives.c b/src/archives.c
index a2cba6a..a94096f 100644
--- a/src/archives.c
+++ b/src/archives.c
@@ -683,6 +683,9 @@ tarobject(void *ctx, struct tar_entry *ti)
                _("backend dpkg-deb during `%.255s'"),
                path_quote_filename(fnamebuf, ti->name, 256));
     }
+
+    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
+
     r = ti->size % TARBLKSZ;
     if (r > 0)
       if (safe_read(tc->backendpipe, databuf, TARBLKSZ - r) == -1)
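To try the idea behind the patch on its own, here is a minimal sketch outside of dpkg. It assumes a POSIX system that provides posix_fadvise() and fdatasync(); write_all(), extract_with_hint() and finish_file() are hypothetical helpers, not dpkg functions. On Linux, DONTNEED appears to start non-blocking writeback of dirty pages before dropping the clean ones from the page cache, which is why it might stand in for SYNC_FILE_RANGE_WRITE. POSIX only defines it as advice, so the later fdatasync() is still what guarantees durability.

```c
/* Sketch of the patch's approach: write a file, hint with
 * POSIX_FADV_DONTNEED, and make it durable later with fdatasync(). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical stand-in for the extraction step: create `path`, write
 * `data`, then issue the DONTNEED hint. Returns the open fd or -1. */
int extract_with_hint(const char *path, const char *data)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, strlen(data)) < 0) {
        close(fd);
        return -1;
    }
    /* posix_fadvise() returns an error number rather than setting errno.
     * It is only advice, so a failure is not treated as fatal here. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return fd;
}

/* Second pass, after all files are extracted: wait until the data is
 * on disk, then close. Returns 0 on success, -1 on error. */
int finish_file(int fd)
{
    int ret = fdatasync(fd);
    if (close(fd) < 0)
        ret = -1;
    return ret;
}
```

In a dpkg-like flow, extract_with_hint() would run once per file during unpacking, and finish_file() for each fd only afterwards, just before the renames.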