Dear René ,

René Scharfe wrote:
Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.

In which situations do you use the new options?

(The sender would need to know the names and/or IDs on the receiving
end.  And the receiver would need to be root to set both IDs, or be a
group member to set the group ID; I guess the latter is more common.)

Thank you for asking the background.

In the case that additional contents are appended to the tar file
generated by git-archive, the part by git-archive and the part
appended by common tar would have different UID/GID, because common
tar preserves the UID/GID of the original files.

Of cource, both of GNU tar and bsdtar have the options to set
UID/GID manually, but their syntax are different.

In the recent source package of poppler (poppler.freedesktop.org),
there are 2 sets of UID/GIDs are found:
https://poppler.freedesktop.org/poppler-0.62.0.tar.xz

I've discussed with the maintainers of poppler, and there was a
suggestion to propose a feature to git.
https://lists.freedesktop.org/archives/poppler/2017-December/012739.html

So now I'm trying.

Would it make sense to support the new options for ZIP files as well?

I was not aware of the availability of UID/GID in pkzip file format...
Oh, checking APPNOTE.TXT ( https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
there is a storage! (see 4.5.7-Unix Extra Field). But it seems
that current git-archive emits pkzip without the field.

The background why I propose the options for tar format was described
in above. Similar things are hoped by pkzip users? If it's required,
I will try.

+--owner=<name>[:<uid>]::
+       Force <name> as owner and <uid> as uid for the files in the tar
+       archive.  If <uid> is not supplied, <name> can be either a user
+       name or numeric UID.  In this case the missing part (UID or
+       name) will be inferred from the current host's user database.
+
+--group=<name>[:<gid>]::
+       Force <name> as group and <gid> as gid for the files in the tar
+       archive.  If <gid> is not supplied, <name> can be either a group
+       name or numeric GID.  In this case the missing part (GID or
+       name) will be inferred from the current host's group database.
+

IIUC the default behavior is kept, i.e. without these options the
archive entries appear to be owned by root:root.  I think it's a good
idea to mention this here.

Indeed. The default behaviour of git-archive without these options
(root:root) would be different from that of (common) tar (preserving
uid/gid of the files to be archived), it should be clarified.

bsdtar has --uname, --uid, --gname, and -gid, which seem simpler.  At
least you could use OPT_STRING and OPT_INTEGER with them (plus a range
check).  And they should be easier to explain.

Thank you very much for proposing good alternative. Indeed, such well-
separated options make the code simple & stable. However, according
to the manual search systems of FreeBSD ( https://www.freebsd.org/cgi/man.cgi ),
the options for such functionalities are not always same.

FreeBSD 8.2 and earlier: --uname, --gname, --uid, --gid are unavailable.
it seems that using "mtree" was the preferred way to specify such).

FreeBSD 8.3 and later: --uname, --gname, --uid, --gid are available.
the manual says follows:

     --uid id
             Use the provided user id number and ignore the user name from the
             archive.  On create, if --uname is not also specified, the user
             name will be set to match the user id.

     --uname name
             Use the provided user name.  On extract, this overrides the user
             name in the archive; if the provided user name does not exist on
             the system, it will be ignored and the user id (from the archive
             or from the --uid option) will be used instead.  On create, this
             sets the user name that will be stored in the archive; the name
             is not verified against the system user database.

Thus, to emulate (post 2012-) bsdtar perfectly, getpwnam(), getpwuid() etc
would be still needed to implement "--uid" X-(.

Tracking the history of bsdtar, maybe I should track the history of GNU
tar. According to ChangeLog, even --owner --group are rather newer option
since 1.13.18 (released on 2000-10-29). The original syntax was like this.

`--owner=USER'
     Specifies that `tar' should use USER as the owner of members when
     creating archives, instead of the user associated with the source
     file.  USER is first decoded as a user symbolic name, but if this
     interpretation fails, it has to be a decimal numeric user ID.

     There is no value indicating a missing number, and `0' usually
     means `root'.  Some people like to force `0' as the value to offer
     in their distributions for the owner of files, because the `root'
     user is anonymous anyway, so that might as well be the owner of
     anonymous archives.

     This option does not affect extraction from archives.

Oh, there is no colon separated syntax! According to ChangeLog, the
introduction of colon separated syntax was on 2011-08-13 and
released as GNU tar-1.27 (2013-10-06).

`--owner=USER'
     Specifies that `tar' should use USER as the owner of members when
     creating archives, instead of the user associated with the source
     file.  USER can specify a symbolic name, or a numeric ID, or both
     as NAME:ID.  *Note override::.

     This option does not affect extraction from archives.

Comparing the original --owner and current --owner description, a
strange point is that the original description says "USER is first
decoded as a user symbolic name, but if this interpretation fails,
it has to be a decimal numeric user ID." What? It seems that
"checking whether the specified username is known by the host system
and its numerical uid is resolvable - if unresolvable, try to
parse as decimal value - if failed, take it as fatal error". Here
I quote the related part.

tar-1.14/src/names.c

119 /* Given UNAME, set the corresponding UID and return 1, or else, return 0. */
    120 int
    121 uname_to_uid (char const *uname, uid_t *uidp)
    122 {
    123   struct passwd *passwd;
    124
    125   if (cached_no_such_uname
    126       && strcmp (uname, cached_no_such_uname) == 0)
    127     return 0;
    128
    129   if (!cached_uname
    130       || uname[0] != cached_uname[0]
    131       || strcmp (uname, cached_uname) != 0)
    132     {
    133       passwd = getpwnam (uname);
    134       if (passwd)
    135         {
    136           cached_uid = passwd->pw_uid;
    137           assign_string (&cached_uname, passwd->pw_name);
    138         }
    139       else
    140         {
    141           assign_string (&cached_no_such_uname, uname);
    142           return 0;
    143         }
    144     }
    145   *uidp = cached_uid;
    146   return 1;
    147 }   1087       case OWNER_OPTION:

tar-1.14/src/tar.c
   1088         if (! (strlen (optarg) < UNAME_FIELD_SIZE
   1089                && uname_to_uid (optarg, &owner_option)))
   1090           {
   1091             uintmax_t u;
   1092             if (xstrtoumax (optarg, 0, 10, &u, "") == LONGINT_OK
   1093                 && u == (uid_t) u)
   1094               owner_option = u;
   1095             else
   1096               FATAL_ERROR ((0, 0, "%s: %s", quotearg_colon (optarg),
   1097                             _("Invalid owner")));
   1098           }
   1099         break;

In summary, there are following types.

a) older GNU tar
--owner must match with the host database, no option to set
uname & uid separately.

b) newer GNU tar
--owner accepts unknown username and/or uid.
if only one part is given and known by the host system,
the missing part is deduced by it.
if only one part is given and unknown by the host system,
the missing part is unchanged from the file to be archived.

c) newer bsd tar
--uname/--uid accept unknown username and/or uid.
username is just used to override uname entry of the archive,
but uid is used to override both of uid and uname entries,
if uid is known and username is not specified.
If uid is unknown, uid is overriden, but the username entry
is unchanged from the file to be archived.

which behaviour is to be simulated? I want to propose
yet another one, similar to c) but incompatble.

d) --uname, --uid, --gname, --gid check only the syntax
(to kick the username starting with digit, non-digit
uid, etc) and no check for known/unknown.

+#if ULONG_MAX > 0xFFFFFFFFUL
+       /*
+        * --owner, --group rejects uid/gid greater than 32-bit
+        * limits, even on 64-bit platforms.
+        */
+       if (ul > 0xFFFFFFFFUL)
+               return STR_IS_DIGIT_TOO_LARGE;
+#endif

The #if is not really necessary, is it?  Compilers should be able to
optimize the conditional out on 32-bit platforms.

Thanks for finding this, I'm glad to have a chance to ask a
question; git is not needed to care for 16-bit platforms?

+       /* the operand is known to be non-digit */
+
+       args->uname = xstrdup(tar_owner);
+       pw = getpwnam(tar_owner);

How well does this work on Windows?  In daemon.c we avoid calling
getpwnam(3), getgrnam(3) etc. if NO_POSIX_GOODIES is not defined.

OK, I can enclose them by ifdefs of NO_POSIX_GOODIES. But,
maybe the design the options would be discussed for first.

Both of latest GNU and BSD tar call getpwnam() or getpwuid(),
but designing as all of --uname --uid --gname --gid as "only syntax
is checked (non-digit UID/GID should be refused), but known/unknown
is not checked" would be the most portable.

GNU tar and bsdtar show the names of owner and group with -t -v at
least, albeit in slightly different formats.  Can this help avoid
parsing the archive on our own?

Yeah, writing yet another tar archive parser in C, to avoid the additional
dependency to Python or newer Perl (Archive::Tar since perl-5.10), is
painful, I feel (not only for me but also for the maintainers).
If tar command itself works well, it would be the best.

But, I'm not sure whether the format of "tar tv" output is stably
standardized. It's the reason why I wrote Python tool. If I execute
git-archive with sufficently long randomized username & uid in
several times, it would be good test?

But getting a short program like zipdetails for tar would be nice as
well of course. :)

I wrote something in C:
https://github.com/mpsuzuki/git/blob/pullreq-20171227-c/t/helper/test-parse-tar-file.c

but if somebody wants the support of other tar variants,
he/she would have some headache :-)

Regards,
mpsuzuki

Reply via email to