Rodrigo Queiro wrote:
> I discovered this because Python's tarfile module fails to open such files with "invalid header", since it expects this field to contain an ASCII number, as described in the docs:
That's a shortcoming in tar's documentation. Tar uses the GNU format by default, which has a base-256 extension that supports negative timestamps. If you want GNU tar to refuse to use this GNU extension, please use '-H ustar'.
I fixed the documentation bug by installing the attached patches. Most of them bring Tar up to date with recently released compilers and whatnot; patch 0003 fixes the documentation bug in question.
From 1bdc8c22f59d44fcb7be2e735b4758e6d4b2dd8a Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 18 Nov 2017 08:39:33 -0800
Subject: [PATCH 1/5] Fix typo caught by GCC 7.2.1

* lib/wordsplit.c (wordsplit_perror): Add missing "break;".
---
 lib/wordsplit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/wordsplit.c b/lib/wordsplit.c
index 07d0f8a..f2ecada 100644
--- a/lib/wordsplit.c
+++ b/lib/wordsplit.c
@@ -1584,6 +1584,7 @@ wordsplit_perror (struct wordsplit *wsp)
 
     case WRDSE_NOSUPP:
       wsp->ws_error (_("command substitution is not yet supported"));
+      break;
 
     case WRDSE_USAGE:
       wsp->ws_error (_("invalid wordsplit usage"));
-- 
2.14.3
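To illustrate what the missing "break;" in patch 1 did, here is a minimal C sketch. The names demo_perror, DEMO_NOSUPP and DEMO_USAGE are hypothetical stand-ins, not part of the real wordsplit API; the sketch only mimics the shape of the switch in wordsplit_perror, where a WRDSE_NOSUPP error without the break would also report the unrelated "invalid wordsplit usage" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for two of wordsplit's WRDSE_* codes.  */
enum demo_error { DEMO_NOSUPP, DEMO_USAGE };

/* Store in OUT the messages that a wordsplit_perror-style switch
   reports for ERR, and return how many were stored.  FIXED says
   whether the "break;" added by patch 1 is present.  */
static size_t
demo_perror (enum demo_error err, bool fixed, char const *out[], size_t cap)
{
  size_t n = 0;
  assert (2 <= cap);            /* at most two messages can be stored */
  switch (err)
    {
    case DEMO_NOSUPP:
      out[n++] = "command substitution is not yet supported";
      if (fixed)
        break;                  /* the statement patch 1 adds */
      /* Otherwise control runs on into the next case.  */
    case DEMO_USAGE:
      out[n++] = "invalid wordsplit usage";
      break;
    }
  return n;
}
```

Calling demo_perror (DEMO_NOSUPP, false, msgs, 2) stores both messages, which is the double report the missing break produced; with FIXED set, only the first message is stored.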
From 75de6ec6425ae6fd76f7a9fe92f8426f213d120b Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 18 Nov 2017 08:39:33 -0800
Subject: [PATCH 2/5] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index e210a3c..91e8348 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit e210a3cbaec0ee82a67ff8fc427e21bdd64dba1b
+Subproject commit 91e834891d14dd73788290293bde0e42415c9bdb
-- 
2.14.3
From 063790ca3cca7824276c447f3d5914a9a761579f Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 18 Nov 2017 08:39:33 -0800
Subject: [PATCH 3/5] Document base-256 representation in GNU format

Problem reported by Rodrigo Queiro in:
https://lists.gnu.org/r/bug-tar/2017-11/msg00018.html
* doc/intern.texi (Standard, Extensions): Document base-256
representations.
---
 doc/intern.texi | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/doc/intern.texi b/doc/intern.texi
index 8d20567..5ef0ee8 100644
--- a/doc/intern.texi
+++ b/doc/intern.texi
@@ -87,6 +87,8 @@ The @code{name}, @code{linkname}, @code{magic}, @code{uname}, and
 @code{gname} are null-terminated character strings. All other fields
 are zero-filled octal numbers in ASCII. Each numeric field of width
 @var{w} contains @var{w} minus 1 digits, and a null.
+(In the extended @acronym{GNU} format, the numeric fields can take
+other forms.)
 
 The @code{name} field is the file name of the file, with directory names
 (if any) preceding the file name, separated by slashes.
@@ -112,14 +114,12 @@ be ignored.
 The @code{size} field is the size of the file in bytes; linked files
 are archived with this field specified as zero.
 
-The @code{mtime} field is the data modification time of the file at
-the time it was archived. It is the ASCII representation of the octal
-value of the last time the file's contents were modified, represented
-as an integer number of
+The @code{mtime} field represents the data modification time of the file at
+the time it was archived. It represents the integer number of
 seconds since January 1, 1970, 00:00 Coordinated Universal Time.
 
-The @code{chksum} field is the ASCII representation of the octal value
-of the simple sum of all bytes in the header block. Each 8-bit
+The @code{chksum} field represents
+the simple sum of all bytes in the header block. Each 8-bit
 byte in the header is added to an unsigned integer, initialized to
 zero, the precision of which shall be no less than seventeen bits.
 When calculating the checksum, the @code{chksum} field is treated as
@@ -310,6 +310,18 @@ of an archive should have this type.
 
 @end table
 
+For fields containing numbers or timestamps that are out of range for
+the basic format, the @acronym{GNU} format uses a base-256
+representation instead of an ASCII octal number. If the leading byte
+is 0xff (255), all the bytes of the field (including the leading byte)
+are concatenated in big-endian order, with the result being a negative
+number expressed in two's complement form. If the leading byte is
+0x80 (128), the non-leading bytes of the field are concatenating in
+big-endian order, with the result being a positive number expressed in
+binary form. Leading bytes other than 0xff, 0x80 and ASCII octal
+digits are reserved for future use, as are base-256 representations of
+values that would be in range for the basic format.
+
 You may have trouble reading a @acronym{GNU} format archive on a
 non-@acronym{GNU} system if the options @option{--incremental} (@option{-G}),
 @option{--multi-volume} (@option{-M}), @option{--sparse} (@option{-S}), or @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) were
-- 
2.14.3
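A minimal C sketch of how a reader might decode a numeric header field under the rules patch 3 documents: ASCII octal in the basic format, a leading 0x80 for a positive base-256 number, and a leading 0xff for a negative two's complement number. The function name decode_tar_number and the int64_t result type are assumptions for illustration, not code taken from GNU tar. For brevity it does not reject base-256 encodings of values that would fit the basic format (which the documentation reserves), and it also tolerates a space terminator after the octal digits, an assumption beyond the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the WIDTH-byte numeric header field FIELD into *VALUE.
   Return false if the field is malformed, starts with a reserved
   byte, or holds a value outside the int64_t range.  */
static bool
decode_tar_number (unsigned char const *field, size_t width, int64_t *value)
{
  assert (0 < width);

  if (field[0] == 0x80)
    {
      /* Positive base-256: concatenate the non-leading bytes,
         most significant first.  */
      uint64_t v = 0;
      for (size_t i = 1; i < width; i++)
        {
          if ((uint64_t) INT64_MAX >> 8 < v)
            return false;       /* next shift would exceed INT64_MAX */
          v = v << 8 | field[i];
        }
      *value = (int64_t) v;
      return true;
    }

  if (field[0] == 0xff)
    {
      /* Negative base-256: all bytes, including the leading one, form
         a two's complement number.  Accumulate its bitwise complement C;
         the value is then -C - 1.  */
      uint64_t c = 0;
      for (size_t i = 0; i < width; i++)
        {
          if ((uint64_t) INT64_MAX >> 8 < c)
            return false;       /* value would not fit in int64_t */
          c = c << 8 | (uint64_t) (field[i] ^ 0xff);
        }
      *value = -(int64_t) c - 1;
      return true;
    }

  /* Basic format: zero-filled ASCII octal digits ending in NUL
     (a space is tolerated too).  Any other leading byte is reserved.  */
  uint64_t v = 0;
  size_t i = 0;
  for (; i < width && '0' <= field[i] && field[i] <= '7'; i++)
    {
      if ((uint64_t) INT64_MAX >> 3 < v)
        return false;
      v = v << 3 | (uint64_t) (field[i] - '0');
    }
  if (i == 0 || (i < width && field[i] != '\0' && field[i] != ' '))
    return false;
  *value = (int64_t) v;
  return true;
}
```

For example, a 12-byte mtime field holding nine 0xff bytes followed by 0xfe 0xae 0x80 decodes to -86400, one day before the epoch, which is the kind of negative timestamp that the basic octal format cannot express.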
From 07d1f8185d4056d2db4ca125fc2808f3cffb63d8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 18 Nov 2017 08:39:33 -0800
Subject: [PATCH 4/5] Port to Texinfo 6.4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* doc/tar.texi (Sparse Recovery): Omit ‘.’ from anchor name,
as ‘makeinfo’ now complains about it. All uses changed.
---
 doc/sparse.texi |  4 ++--
 doc/tar.texi    | 26 +++++++++++++-------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/doc/sparse.texi b/doc/sparse.texi
index d607fb7..84fe2b0 100644
--- a/doc/sparse.texi
+++ b/doc/sparse.texi
@@ -143,7 +143,7 @@ format, it will also extract a file containing extension header
 attributes. This file can be used to expand the file to its original
 state. However, posix-aware @command{tar}s will usually ignore the
 unknown variables, which makes restoring the file more
-difficult. @xref{extracting sparse v.0.x, Extraction of sparse
+difficult. @xref{extracting sparse v0x, Extraction of sparse
 members in v.0.0 format}, for the detailed description of how to
 restore such members using non-GNU @command{tar}s.
 @end enumerate
@@ -175,7 +175,7 @@ The real name of the sparse file is stored in the variable
 @code{GNU.sparse.name}. Thus, those @command{tar} implementations
 that are not aware of GNU extensions will at least extract the files
 into separate directories, giving the user a possibility to expand it
-afterwards. @xref{extracting sparse v.0.x, Extraction of sparse
+afterwards. @xref{extracting sparse v0x, Extraction of sparse
 members in v.0.1 format}, for the detailed description of how to
 restore such members using non-GNU @command{tar}s.
 
diff --git a/doc/tar.texi b/doc/tar.texi
index ec657f5..ce8508f 100644
--- a/doc/tar.texi
+++ b/doc/tar.texi
@@ -1521,7 +1521,7 @@ the error message is produced because there is no member named
 @file{folk}, only one named @file{home/myself/folk}.
 
 If you are not sure of the exact file name, use @dfn{globbing
-patterns}, for example:
+patterns}, for example:
 
 @smallexample
 $ @kbd{tar --list --file=practice.tar --wildcards '*/folk'}
@@ -3155,7 +3155,7 @@ Disable extended attributes support. @xref{Extended File Attributes, xattrs}.
 
 When @command{tar} is using the @option{--files-from} option, this
 option instructs @command{tar} to expect file names terminated with
-@acronym{NUL}, and to process file names verbatim.
+@acronym{NUL}, and to process file names verbatim.
 This means that @command{tar} correctly works with file names that
 contain newlines or begin with a dash.
 
@@ -3896,7 +3896,7 @@ tar: Exiting with failure status due to previous errors
 @end group
 @end example
 
-The following table summarizes all position-sensitive options.
+The following table summarizes all position-sensitive options.
 
 @table @option
 @item --directory=@var{dir}
@@ -3915,24 +3915,24 @@ The following table summarizes all position-sensitive options.
 @itemx --no-verbatim-files-from
 @xref{verbatim-files-from}.
 
-@item --recursion
+@item --recursion
 @itemx --no-recursion
 @xref{recurse}.
 
-@item --anchored
-@itemx --no-anchored
+@item --anchored
+@itemx --no-anchored
 @xref{anchored patterns}.
 
-@item --ignore-case
-@itemx --no-ignore-case
+@item --ignore-case
+@itemx --no-ignore-case
 @xref{case-insensitive matches}.
 
 @item --wildcards
 @itemx --no-wildcards
 @xref{controlling pattern-matching}.
 
-@item --wildcards-match-slash
-@itemx --no-wildcards-match-slash
+@item --wildcards-match-slash
+@itemx --no-wildcards-match-slash
 @xref{controlling pattern-matching}.
 
 @item --exclude
@@ -5596,7 +5596,7 @@ map file.
 @subsection Extended File Attributes
 
 Extended file attributes are name-value pairs that can be
-associated with each node in a file system. Despite the fact that
+associated with each node in a file system. Despite the fact that
 POSIX.1e draft which proposed them has been withdrawn, the extended
 file attributes are supported by many file systems. @GNUTAR{} can
 store extended file attributes along with the files. This feature is
@@ -5711,7 +5711,7 @@ listing of ACL is printed after each file entry:
 @group
 -rw-r--r--+ smith/users 110 2016-03-16 16:07 file
 a: user::rw-,user:gray:-w-,group::r--,mask::rw-,other::r--
-@end group
+@end group
 @end example
 
 @dfn{Security-Enhanced Linux} (@dfn{SELinux} for short) is a Linux
@@ -10856,7 +10856,7 @@ Done
 @end group
 @end smallexample
 
-@anchor{extracting sparse v.0.x}
+@anchor{extracting sparse v0x}
 @cindex sparse files v.0.1, extracting with non-GNU tars
 @cindex sparse files v.0.0, extracting with non-GNU tars
 An @dfn{extended header} is a special @command{tar} archive header
-- 
2.14.3
From 6b493e913328b1bc8f7cc5a913a9d2f4d3af59eb Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 18 Nov 2017 08:39:33 -0800
Subject: [PATCH 5/5] Port to gcc -Wimplicit-fallthrough=5

* src/common.h (FALLTHROUGH): New macro, for use with
gcc -Wimplicit-fallthrough=5, which is now the default when
used with Gnulib after commit 2017-05-16T16:23:52!egg...@cs.ucla.edu
and with --enable-gcc-warnings
---
 src/buffer.c  | 12 +++++-------
 src/common.h  |  6 ++++++
 src/compare.c |  3 +--
 src/delete.c  | 11 ++++-------
 src/extract.c | 12 +++++-------
 src/list.c    |  9 ++++-----
 src/names.c   | 24 ++++++++++++------------
 src/sparse.c  |  3 ++-
 src/tar.c     |  9 ++++-----
 src/update.c  |  6 ++----
 10 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/src/buffer.c b/src/buffer.c
index 6f96c2f..51f299f 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -106,7 +106,7 @@ bool write_archive_to_stdout;
 /* When creating a multi-volume archive, each 'bufmap' represents
    a member stored (perhaps partly) in the current record buffer.
    Bufmaps are form a single-linked list in chronological order.
-
+
    After flushing the record to the output media, all bufmaps that
    represent fully written members are removed from the list, the
    nblocks and sizeleft values in the bufmap_head and start values
@@ -1004,7 +1004,7 @@ void
 flush_archive (void)
 {
   size_t buffer_level;
-
+
   if (access_mode == ACCESS_READ && time_to_start_writing)
     {
       access_mode = ACCESS_WRITE;
@@ -1296,8 +1296,7 @@ change_tape_menu (FILE *read_file)
           sys_spawn_shell ();
           break;
         }
-      /* FALL THROUGH */
-
+      FALLTHROUGH;
     default:
       fprintf (stderr, _("Invalid input. Type ? for help.\n"));
     }
@@ -1506,8 +1505,7 @@ try_new_volume (void)
       header = find_next_block ();
       if (header->header.typeflag != GNUTYPE_MULTIVOL)
         break;
-      /* FALL THROUGH */
-
+      FALLTHROUGH;
     case GNUTYPE_MULTIVOL:
       if (!read_header0 (&dummy))
         return false;
@@ -1532,7 +1530,7 @@ try_new_volume (void)
                       quote (bufmap_head->file_name)));
       return false;
     }
-
+
   if (strcmp (continued_file_name, bufmap_head->file_name))
     {
       if ((archive_format == GNU_FORMAT || archive_format == OLDGNU_FORMAT)
diff --git a/src/common.h b/src/common.h
index 964f0b6..bbe167e 100644
--- a/src/common.h
+++ b/src/common.h
@@ -44,6 +44,12 @@
 # define GLOBAL extern
 #endif
 
+#if 7 <= __GNUC__
+# define FALLTHROUGH __attribute__ ((__fallthrough__))
+#else
+# define FALLTHROUGH ((void) 0)
+#endif
+
 #define TAREXIT_SUCCESS PAXEXIT_SUCCESS
 #define TAREXIT_DIFFERS PAXEXIT_DIFFERS
 #define TAREXIT_FAILURE PAXEXIT_FAILURE
diff --git a/src/compare.c b/src/compare.c
index 8d609e8..fee1372 100644
--- a/src/compare.c
+++ b/src/compare.c
@@ -480,8 +480,7 @@ diff_archive (void)
       ERROR ((0, 0, _("%s: Unknown file type '%c', diffed as normal file"),
               quotearg_colon (current_stat_info.file_name),
               current_header->header.typeflag));
-      /* Fall through. */
-
+      FALLTHROUGH;
     case AREGTYPE:
     case REGTYPE:
     case GNUTYPE_SPARSE:
diff --git a/src/delete.c b/src/delete.c
index 0b3c27f..9044ba2 100644
--- a/src/delete.c
+++ b/src/delete.c
@@ -187,8 +187,7 @@ delete_archive_members (void)
               skip_member ();
               break;
             }
-
-          /* Fall through. */
+          FALLTHROUGH;
         case HEADER_SUCCESS_EXTENDED:
           logical_status = status;
           break;
@@ -199,7 +198,7 @@ delete_archive_members (void)
               set_next_block_after (current_header);
               break;
             }
-          /* Fall through. */
+          FALLTHROUGH;
         case HEADER_END_OF_FILE:
           logical_status = HEADER_END_OF_FILE;
           break;
@@ -210,14 +209,12 @@ delete_archive_members (void)
             {
             case HEADER_STILL_UNREAD:
               WARN ((0, 0, _("This does not look like a tar archive")));
-              /* Fall through. */
-
+              FALLTHROUGH;
             case HEADER_SUCCESS:
             case HEADER_SUCCESS_EXTENDED:
             case HEADER_ZERO_BLOCK:
               ERROR ((0, 0, _("Skipping to next header")));
-              /* Fall through. */
-
+              FALLTHROUGH;
             case HEADER_FAILURE:
               break;
 
diff --git a/src/extract.c b/src/extract.c
index 36919da..395db55 100644
--- a/src/extract.c
+++ b/src/extract.c
@@ -394,7 +394,7 @@ set_stat (char const *file_name,
 }
 
 /* Find the direct ancestor of FILE_NAME in the delayed_set_stat list.
-   */
+   */
 static struct delayed_set_stat *
 find_direct_ancestor (char const *file_name)
 {
@@ -758,10 +758,9 @@ maybe_recoverable (char *file_name, bool regular, bool *interdir_made)
             break;
           stp = &st;
         }
-
       /* The caller tried to open a symbolic link with O_NOFOLLOW.
          Fall through, treating it as an already-existing file.  */
-
+      FALLTHROUGH;
     case EEXIST:
       /* Remove an old file, if the options allow this.  */
 
@@ -778,8 +777,7 @@ maybe_recoverable (char *file_name, bool regular, bool *interdir_made)
         case KEEP_NEWER_FILES:
           if (file_newer_p (file_name, stp, &current_stat_info))
             break;
-          /* FALL THROUGH */
-
+          FALLTHROUGH;
         case DEFAULT_OLD_FILES:
         case NO_OVERWRITE_DIR_OLD_FILES:
         case OVERWRITE_OLD_FILES:
@@ -939,7 +937,7 @@ is_directory_link (const char *file_name)
   struct stat st;
   int e = errno;
   int res;
-
+
   res = (fstatat (chdir_fd, file_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
          S_ISLNK (st.st_mode) &&
          fstatat (chdir_fd, file_name, &st, 0) == 0 &&
@@ -1011,7 +1009,7 @@ extract_dir (char *file_name, int typeflag)
 
   if (keep_directory_symlink_option && is_directory_link (file_name))
     return 0;
-
+
   if (deref_stat (file_name, &st) == 0)
     {
       current_mode = st.st_mode;
diff --git a/src/list.c b/src/list.c
index 84e73ac..14388a5 100644
--- a/src/list.c
+++ b/src/list.c
@@ -120,7 +120,7 @@ enforce_one_top_level (char **pfile_name)
 {
   char *file_name = *pfile_name;
   char *p;
-
+
   for (p = file_name; *p && (ISSLASH (*p) || *p == '.'); p++)
     ;
 
@@ -132,7 +132,7 @@ enforce_one_top_level (char **pfile_name)
       if (ISSLASH (p[pos]) || p[pos] == 0)
         return;
     }
-
+
   *pfile_name = make_file_name (one_top_level_dir, file_name);
   normalize_filename_x (*pfile_name);
 }
@@ -218,7 +218,7 @@ read_and (void (*do_something) (void))
               if (show_omitted_dirs_option)
                 WARN ((0, 0, _("%s: Omitting"),
                        quotearg_colon (current_stat_info.file_name)));
-              /* Fall through. */
+              FALLTHROUGH;
             default:
               skip_member ();
               continue;
@@ -273,8 +273,7 @@ read_and (void (*do_something) (void))
         {
         case HEADER_STILL_UNREAD:
           ERROR ((0, 0, _("This does not look like a tar archive")));
-          /* Fall through. */
-
+          FALLTHROUGH;
         case HEADER_ZERO_BLOCK:
         case HEADER_SUCCESS:
           if (block_number_option)
diff --git a/src/names.c b/src/names.c
index bd1e44e..f6ad9fe 100644
--- a/src/names.c
+++ b/src/names.c
@@ -146,7 +146,7 @@ static struct argp_option names_options[] = {
   {"no-wildcards-match-slash", NO_WILDCARDS_MATCH_SLASH_OPTION, 0, 0,
    N_("wildcards do not match '/'"), GRID+1 },
 #undef GRID
-
+
   {NULL}
 };
 
@@ -160,7 +160,7 @@ file_selection_option (int key)
     if (p->key == key)
       return p;
   return NULL;
-}
+}
 
 static char const *
 file_selection_option_name (int key)
@@ -173,7 +173,7 @@ static bool
 is_file_selection_option (int key)
 {
   return file_selection_option (key) != NULL;
-}
+}
 
 /* Either NL or NUL, as decided by the --null option.  */
 static char filename_terminator = '\n';
@@ -221,7 +221,7 @@ static int matching_flags = 0; /* exclude_fnmatch options */
 static int include_anchored = EXCLUDE_ANCHORED;
   /* Pattern anchoring options used for
      file inclusion */
-
+
 #define EXCLUDE_OPTIONS \
   (((wildcards != disable_wildcards) ? EXCLUDE_WILDCARDS : 0) \
   | matching_flags \
@@ -696,7 +696,7 @@ name_list_adjust (void)
 
    For simplicity, only a tail pointer of the list is
    maintained.  */
-
+
 struct name_elt *unconsumed_option_tail;
 
 /* Push an option to the list */
@@ -729,7 +729,7 @@ unconsumed_option_report (void)
   if (unconsumed_option_tail)
     {
       struct name_elt *elt;
-
+
       ERROR ((0, 0, _("The following options were used after any non-optional arguments in archive create or update mode. These options are positional and affect only arguments that follow them. Please, rearrange them properly.")));
 
       elt = unconsumed_option_tail;
@@ -753,13 +753,13 @@ unconsumed_option_report (void)
               ERROR ((0, 0, _("--%s has no effect"),
                       file_selection_option_name (elt->v.opt.option)));
               break;
-
+
             default:
               break;
             }
           elt = elt->next;
         }
-
+
       unconsumed_option_free ();
     }
 }
@@ -967,7 +967,7 @@ handle_option (const char *str, struct name_elt const *ent)
   struct wordsplit ws;
   int i;
   struct option_locus loc;
-
+
   while (*str && isspace (*str))
     ++str;
   if (*str != '-')
@@ -1025,7 +1025,7 @@ read_next_name (struct name_elt *ent, struct name_elt *ret)
             (0, 0, N_("%s: file name read contains nul character"),
              quotearg_colon (ent->v.file.name)));
           ent->v.file.term = 0;
-          /* fall through */
+          FALLTHROUGH;
         case file_list_success:
           if (!ent->v.file.verbatim)
             {
@@ -1110,7 +1110,7 @@ name_next_elt (int change_dirs)
               name_list_advance ();
               break;
             }
-          /* fall through */
+          FALLTHROUGH;
         case NELT_NAME:
           copy_name (ep);
           if (unquote_option)
@@ -1128,7 +1128,7 @@ name_next_elt (int change_dirs)
     }
 
   unconsumed_option_report ();
-
+
   return NULL;
 }
 
diff --git a/src/sparse.c b/src/sparse.c
index b3a3fd3..d41c0ea 100644
--- a/src/sparse.c
+++ b/src/sparse.c
@@ -361,11 +361,12 @@ sparse_scan_file (struct tar_sparse_file *file)
       /* fall back to "raw" for this and all other files */
       hole_detection = HOLE_DETECTION_RAW;
 #endif
+      FALLTHROUGH;
     case HOLE_DETECTION_RAW:
       if (sparse_scan_file_raw (file))
         return true;
     }
-
+
   return false;
 }
 
diff --git a/src/tar.c b/src/tar.c
index 07a6995..3f844a8 100644
--- a/src/tar.c
+++ b/src/tar.c
@@ -1241,7 +1241,7 @@ parse_owner_group (char *arg, uintmax_t field_max, char const **name_option)
               u = u1;
               break;
             }
-          /* Fall through. */
+          FALLTHROUGH;
         case LONGINT_OVERFLOW:
           invalid_num = arg;
           break;
@@ -1396,8 +1396,7 @@ parse_opt (int key, char *arg, struct argp_state *state)
       optloc_save (OC_LISTED_INCREMENTAL, args->loc);
       listed_incremental_option = arg;
       after_date_option = true;
-      /* Fall through. */
-
+      FALLTHROUGH;
     case 'G':
       /* We are making an incremental dump (FIXME: are we?); save
          directories at the beginning of the archive, and include in each
@@ -1522,8 +1521,7 @@ parse_opt (int key, char *arg, struct argp_state *state)
 
     case 'N':
       after_date_option = true;
-      /* Fall through. */
-
+      FALLTHROUGH;
     case NEWER_MTIME_OPTION:
       if (TIME_OPTION_INITIALIZED (newer_mtime_option))
         USAGE_ERROR ((0, 0, _("More than one threshold date")));
@@ -2066,6 +2064,7 @@ parse_opt (int key, char *arg, struct argp_state *state)
 
       argp_error (state,
                   _("Options '-[0-7][lmh]' not supported by *this* tar"));
+      exit (EX_USAGE);
 
 #endif /* not DEVICE_PREFIX */
 
diff --git a/src/update.c b/src/update.c
index 73fbbe1..2f823e4 100644
--- a/src/update.c
+++ b/src/update.c
@@ -186,13 +186,11 @@ update_archive (void)
             {
             case HEADER_STILL_UNREAD:
               WARN ((0, 0, _("This does not look like a tar archive")));
-              /* Fall through. */
-
+              FALLTHROUGH;
             case HEADER_SUCCESS:
             case HEADER_ZERO_BLOCK:
               ERROR ((0, 0, _("Skipping to next header")));
-              /* Fall through. */
-
+              FALLTHROUGH;
             case HEADER_FAILURE:
               break;
 
-- 
2.14.3
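A self-contained sketch of the pattern patch 5 adopts: the same FALLTHROUGH definition that the patch adds to src/common.h, used in a toy switch. The function steps_for_level is hypothetical and exists only for illustration. With GCC 7 or later the macro expands to the fallthrough attribute, which gcc -Wimplicit-fallthrough=5 accepts; at that level GCC no longer treats comments such as /* Fall through. */ as fallthrough markers, which is why the patch replaces them.

```c
#include <assert.h>

/* Same definition as the FALLTHROUGH macro patch 5 adds to
   src/common.h: GCC 7 and later get the fallthrough attribute,
   other compilers a harmless no-op expression.  */
#if 7 <= __GNUC__
# define FALLTHROUGH __attribute__ ((__fallthrough__))
#else
# define FALLTHROUGH ((void) 0)
#endif

/* Hypothetical example: return how many setup steps run for LEVEL
   (0, 1 or 2), where each level also performs the steps of every
   lower level by deliberately falling through.  */
static int
steps_for_level (int level)
{
  int steps = 0;
  assert (0 <= level && level <= 2);
  switch (level)
    {
    case 2:
      steps++;
      FALLTHROUGH;              /* level 2 also does level 1's step */
    case 1:
      steps++;
      FALLTHROUGH;              /* level 1 also does level 0's step */
    case 0:
      steps++;
      break;
    }
  return steps;
}
```

Compiled with GCC 7 or later and -Wimplicit-fallthrough=5, this should produce no fallthrough warnings; replacing either FALLTHROUGH; with a /* Fall through. */ comment should make GCC warn at that level.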