Hi Ted!
On Sun, 2010-11-28 at 23:11:52 -0500, Ted Ts'o wrote:
> I did some experimenting, and I figured out what was going on. You're
> right, (c) doesn't quite work, because delayed allocation meant that
> the writeout didn't take place until the fsync() for each file
> happened. I didn't see this at first; my apologies.
Thanks for the analysis.
> However, this *does* work:
>
> extract(a);
> sync_file_range(fd.a, 0, 0, SYNC_FILE_RANGE_WRITE);
> extract(b.dpkg-new);
> sync_file_range(fd.b, 0, 0, SYNC_FILE_RANGE_WRITE);
> extract(c.dpkg-new);
> sync_file_range(fd.c, 0, 0, SYNC_FILE_RANGE_WRITE);
The man page and the kernel sources seem to indicate this might block
depending on the size of the request queue?
> sync_file_range(fd.a, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);
> sync_file_range(fd.b, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);
> sync_file_range(fd.c, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);
> fdatasync(a);
> fdatasync(b.dpkg-new);
> fdatasync(c.dpkg-new);
>
> rename(b.dpkg-new, b);
> rename(c.dpkg-new, c);
> What's going on here? sync_file_range() is a Linux specific system
> call that has been around for a while. It allows program to control
> when writeback happens in a very low-level fashion. The first set of
> sync_file_range() system calls causes the system to start writing back
> each file once it has finished being extracted. It doesn't actually
> wait for the write to finish; it just starts the writeback.
Hmm, ok so what about posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED)
instead, skimming over the kernel source seems to indicate it might
end up doing more or less the same thing but in a portable way?
Could someone with ext4/btrfs/xfs/etc test w/ and w/o the attached patch
against dpkg?
thanks,
guillem
diff --git a/src/archives.c b/src/archives.c
index a2cba6a..a94096f 100644
--- a/src/archives.c
+++ b/src/archives.c
@@ -683,6 +683,9 @@ tarobject(void *ctx, struct tar_entry *ti)
_("backend dpkg-deb during `%.255s'"),
path_quote_filename(fnamebuf, ti->name, 256));
}
+
+ posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
+
r = ti->size % TARBLKSZ;
if (r > 0)
if (safe_read(tc->backendpipe, databuf, TARBLKSZ - r) == -1)