Hi,
While working on a backup script that runs --update on a tar archive
(stored on disk) with --multi-volume, I discovered that tar does not seek
through the archive and is much slower than expected. Are there any
technical reasons for that, other than outdated, silent assumptions?
While trying to read the code and documentation, I stumbled upon this
code in buffer.c:

    if (!multi_volume_option && !use_compress_program_option
        && fstat (archive, &st) == 0)
      seekable_archive = S_ISREG (st.st_mode);
    else
      seekable_archive = false;

The documentation does not say that --multi-volume (multi_volume_option)
makes the archive non-seekable (see below). Is this just a silent and
incorrect assumption that --multi-volume always implies non-seekable tapes?
From the man page:
    -n, --seek
           Assume the archive is seekable. Normally tar determines
           automatically whether the archive can be seeked or not.
           This option is intended for use in cases when such
           recognition fails. It takes effect only if the archive
           is open for reading (e.g. with --list or --extract
           options).

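For illustration, the regular-file test from buffer.c can be written without the multi_volume_option condition. This is a minimal sketch, not tar's code: is_seekable is a hypothetical name, and it assumes that nothing else in tar's multi-volume handling depends on the archive being treated as non-seekable.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper, not tar's code: report a descriptor as seekable
   whenever it refers to a regular file, whether or not that file is one
   volume of a multi-volume archive. */
static bool is_seekable (int fd)
{
  struct stat st;

  /* Pipes, sockets and tape (character) devices are not regular files. */
  return fstat (fd, &st) == 0 && S_ISREG (st.st_mode);
}
```

On a pipe or a tape device this returns false; on a volume file stored on disk it returns true.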
As far as I understand it, --update is mostly a combination of --compare
(which is a seeking read operation) and --append for files whose size and
date differ. According to my tests, however, the --compare part of --update
does not seek between headers, even without --multi-volume and on an
uncompressed .tar file. Could we please get a big performance boost in
--update by making it jump from header to header (i.e. seek) during the
compare phase? The file contents streamed in between do not seem to be
needed for anything and only slow the process down.
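A rough sketch of what jumping from header to header could look like on an uncompressed archive stored in a regular file. It assumes the plain ustar layout (512-byte blocks, a 12-byte octal size field at byte offset 124, member data padded to a multiple of 512 bytes, an all-zero block at the end). walk_headers and parse_octal are hypothetical names; this is not tar's implementation, and it ignores GNU base-256 sizes, sparse members and multi-volume continuation headers.

```c
#include <sys/types.h>
#include <unistd.h>

#define BLOCKSIZE 512

/* Hypothetical helper: parse an octal ustar numeric field, skipping
   leading spaces and stopping at the first non-octal character
   (the NUL or space terminator). GNU base-256 values are not handled. */
static unsigned long long parse_octal (const char *p, size_t len)
{
  unsigned long long v = 0;
  size_t i = 0;

  while (i < len && p[i] == ' ')
    i++;
  for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
    v = v * 8 + (unsigned long long) (p[i] - '0');
  return v;
}

/* Hypothetical sketch: visit every header of an uncompressed tar archive
   on a seekable descriptor, reading only the 512-byte header blocks and
   lseek()ing over member data. Returns the number of headers seen
   (extended headers such as GNU long names count too), or -1 on error. */
static long walk_headers (int fd)
{
  char hdr[BLOCKSIZE];
  long count = 0;

  for (;;)
    {
      ssize_t n = read (fd, hdr, BLOCKSIZE);
      if (n == 0)
        return count;           /* EOF without an end-of-archive block */
      if (n != BLOCKSIZE)
        return -1;              /* read error or truncated header */

      /* An all-zero block marks the end of the archive. */
      int all_zero = 1;
      for (int i = 0; i < BLOCKSIZE; i++)
        if (hdr[i] != 0)
          {
            all_zero = 0;
            break;
          }
      if (all_zero)
        return count;
      count++;

      /* Member data is padded up to a whole number of 512-byte blocks. */
      unsigned long long size = parse_octal (hdr + 124, 12);
      off_t skip = (off_t) ((size + BLOCKSIZE - 1) / BLOCKSIZE * BLOCKSIZE);

      /* Jump over the data instead of reading it. */
      if (lseek (fd, skip, SEEK_CUR) == (off_t) -1)
        return -1;
    }
}
```

Each member then costs one 512-byte read plus one lseek, instead of reading all of its data.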
Best regards,
Johannes Nieß