Re: [PATCH] findprog: Support searching in a specified path string

2019-09-08 Thread Bruno Haible
Hi Paul,

> I'm addressing https://savannah.gnu.org/bugs/?56834
> 
> I want to invoke a child process, and I need the program I'm going to
> invoke to be searched for in a PATH which is (potentially) different
> than the PATH which is active in the parent.
> 
> When you use execlp(), you can just assign the child's environment
> before you invoke the exec.  Since you've already forked you're not
> modifying the parent's environment.
> 
> However, with posix_spawnp() that doesn't work

Thanks for explaining.

> the path search is done
> on the _parent's_ PATH environment value not the one passed in to
> posix_spawnp().  I don't want that, I need the search done in the
> child's environment.  I can't decide if this behavior by posix_spawnp()
> is correct or not, and the POSIX spec isn't clear, but it's how it
> works (at least on GNU/Linux).

For me, the POSIX spec [1] is clear: envp "constitute[s] the environment
for the new process image". It does not say that the search for the program
is done through envp.

All implementations agree on this: In the attached test, I get the output
  Callee found in parent's $PATH.
in all tested platforms (glibc/Linux, glibc/kFreeBSD, glibc/Hurd, Mac OS X,
FreeBSD 11, NetBSD 7, AIX, Solaris 10, Solaris 11, Solaris OpenIndiana,
Haiku) and with gnulib's replacement (tested on Solaris 9). So, there is
no portability issue here.

> At the same time, I don't want to be running putenv() to set then
> reset the PATH environment variable in the parent every time I invoke
> a command.
> 
> So what I want to do is run find_in_path_str() passing in the PATH from
> the child's environment then pass the result to posix_spawn.

Makes sense.

> I suppose an even simpler interface, for me, would be something like:
> 
> find_in_path_envp (const char *prog, const char **envp);
> 
> where it would look up PATH in the environment passed in, so I don't
> have to do the lookup of PATH, and use that instead of getenv().  But I
> don't know if that's as generally useful an interface.

If there were many libc functions that take a '[const] char**' environment
as argument, I would agree that this was the way to go. But there are no
getenv, setenv, putenv variants that take a 'char **' arguments; so it appears
that POSIX and libc APIs don't recognize 'char **' as a valuable data type.
I therefore find it better to define the function with just the PATH value
as argument.

> Because that's not my use-case :).  I already have a PATH string, it's
> just not in my current environment.

OK.

> prog = find_in_path_str ("myprog", lookup_path ());
> 
> and if lookup_path() returns NULL it defaults to the environment PATH,

I don't think NULL should be interpreted as "look in the default PATH".
Rather, the convention is that an empty or null PATH means the current
directory. Try
  $ (cd /tmp; PATH=; cat --version)
  $ (cd /bin; PATH=; cat --version)
  $ (cd /tmp; unset PATH; cat --version)
  $ (cd /bin; unset PATH; cat --version)

> rather than having to do something like:
> 
> const char *path = lookup_path ();
> if (path)
>   prog = find_in_path_str ("myprog", path);
> else
>   prog = find_in_path ("myprog");

If the caller needs this logic, it should better be coded explicitly like this.

Note also that what you need is not merely an *attempt* to determine the
filename of the executable, but you need it always - since you _don't_ want
the program to be looked up in the parent's PATH. Thus you need also a
return value that indicates that the program was not found in PATH. Otherwise,
assume
  parent PATH = "/bin:/usr/bin"
  child PATH = "/tmp"
  program = "cat"
the find_in_path_str search would do a lookup in the child PATH, not find it,
return "cat", and the posix_spawnp would then find "cat" in the parent PATH
and invoke /bin/cat.

This, in turn, means that we need to provide also an implementation for
Windows, Cygwin, EMX, DOS.

And this means that it can't really share much with the existing findprog.c.
So the implementation should go into a different .c file.

Bruno

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html


spawn-test.tar.gz
Description: application/compressed-tar


sc_prohibit_magic_number_exit false positive on string

2019-09-08 Thread Darshit Shah
I just realized that the syntax check rule sc_prohibit_magic_number_exit will
cause a false positive when it finds the relevant tokens within a string as
well.

For example, in Wget, we have the following snippet in our tests which trips
this rule:

> WGET_TEST_EXPECTED_FILES, &(wget_test_file_t []) {
>   { "exit-status.txt", "exit(8)\n" },
>   {   NULL } },

I made a very tiny change to the rule in maint.mk(L408) to account for this:

-exclude='exit \(77\)|error ?\(((0|77),|[^,]*)' \
+exclude='exit \(77\)|error ?\(((0|77),|[^,]*)|"(usage|exit|error).*"'  
\

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [PATCH] findprog: Support searching in a specified path string

2019-09-08 Thread Paul Smith
On Sun, 2019-09-08 at 13:38 +0200, Bruno Haible wrote:
> > the path search is done
> > on the _parent's_ PATH environment value not the one passed in to
> > posix_spawnp().  I don't want that, I need the search done in the
> > child's environment.  I can't decide if this behavior by
> > posix_spawnp() is correct or not, and the POSIX spec isn't clear,
> > but it's how it works (at least on GNU/Linux).
> 
> For me, the POSIX spec [1] is clear: envp "constitute[s] the
> environment for the new process image". It does not say that the
> search for the program is done through envp.

It doesn't make clear where the search is done, one way or the other. 
But I agree there's no portability problem here.

> If there were many libc functions that take a '[const] char**'
> environment as argument, I would agree that this was the way to go.
> But there are no getenv, setenv, putenv variants that take a 'char
> **' arguments; so it appears that POSIX and libc APIs don't recognize
> 'char **' as a valuable data type.

Well, none of those functions need a char**.  But char** appears in a
number of other POSIX functions (any function that takes an environment
for example!  Which is what this function would do.)  However either
way is fine with me.

> > prog = find_in_path_str ("myprog", lookup_path ());
> > 
> > and if lookup_path() returns NULL it defaults to the environment
> > PATH,
> 
> I don't think NULL should be interpreted as "look in the default
> PATH".  Rather, the convention is that an empty or null PATH means
> the current directory.

I find that VERY odd, and possibly a security risk.  According to
POSIX, the behavior if PATH is not present or is empty is
implementation-defined and my preference/expectation would be to behave
as if the program is not found.  However, that's a different
discussion.

> > rather than having to do something like:
> > 
> > const char *path = lookup_path ();
> > if (path)
> >   prog = find_in_path_str ("myprog", path);
> > else
> >   prog = find_in_path ("myprog");
> 
> If the caller needs this logic, it should better be coded explicitly
> like this.
> 
> Note also that what you need is not merely an *attempt* to determine
> the filename of the executable, but you need it always - since you
> _don't_ want the program to be looked up in the parent's PATH. Thus
> you need also a return value that indicates that the program was not
> found in PATH.

There is already a return value to that effect in find_prog_in_path():
it returns the same pointer that was passed in.  I don't see why we
should make the interface any different for the new function.  It
allows the returned value to always be usable, avoids memory allocation
if not needed, and provides a simple way to check whether the file was
found or not.

> Otherwise,
> assume
>   parent PATH = "/bin:/usr/bin"
>   child PATH = "/tmp"
>   program = "cat"
> the find_in_path_str search would do a lookup in the child PATH, not
> find it, return "cat", and the posix_spawnp would then find "cat" in
> the parent PATH and invoke /bin/cat.

Sorry, I should have called it out more clearly, but in my email I said
that after this lookup we would invoke posix_spawn() (no "p").  You're
right, but if the user doesn't want the system to do the lookup she
simply won't use the posix_spawnp() method.  I wrote:

> > So what I want to do is run find_in_path_str() passing in the PATH
> > from the child's environment then pass the result to posix_spawn.

> This, in turn, means that we need to provide also an implementation
> for Windows, Cygwin, EMX, DOS.

Yes, clearly that would be ideal--it's not needed for my use-case since
I use it with posix_spawn() which obviously doesn't exist on these
other platforms, but additional portability would be a good thing.

The current implementation of find_prog_in_path() seems to use that
same reasoning to avoid implementing path search on these other
platforms: it apparently assumes that the calling code can just leave
it to the platform to resolve instead.

> And this means that it can't really share much with the existing
> findprog.c.  So the implementation should go into a different .c
> file.

I don't see why.  Path lookup would use identical methods so if it's
useful to have find_prog_in_path_str() implemented for all these other
platforms, why wouldn't that same code be used to allow
find_prog_in_path() to work properly for all these other platforms as
well?




Re: [PATCH] findprog: Support searching in a specified path string

2019-09-08 Thread Bruno Haible
Hi Paul,

> > > prog = find_in_path_str ("myprog", lookup_path ());
> > > 
> > > and if lookup_path() returns NULL it defaults to the environment
> > > PATH,
> > 
> > I don't think NULL should be interpreted as "look in the default
> > PATH".  Rather, the convention is that an empty or null PATH means
> > the current directory.
> 
> I find that VERY odd, and possibly a security risk.  According to
> POSIX, the behavior if PATH is not present or is empty is
> implementation-defined and my preference/expectation would be to behave
> as if the program is not found.  However, that's a different
> discussion.

Your preference does not match what systems do.

Things are most robust if - the other differences set aside -
  find_in_path (prog)
is equivalent to
  find_in_path_str (prog, getenv ("PATH")).

I wouldn't want the two to make different interpretations of the
empty or absent PATH.

> > Note also that what you need is not merely an *attempt* to determine
> > the filename of the executable, but you need it always - since you
> > _don't_ want the program to be looked up in the parent's PATH. Thus
> > you need also a return value that indicates that the program was not
> > found in PATH.
> 
> There is already a return value to that effect in find_prog_in_path():
> it returns the same pointer that was passed in.

No, the returned pointer is the same as the argument not only in the
"not found" case, but also in the "already contains a slash" case.

> > Otherwise,
> > assume
> >   parent PATH = "/bin:/usr/bin"
> >   child PATH = "/tmp"
> >   program = "cat"
> > the find_in_path_str search would do a lookup in the child PATH, not
> > find it, return "cat", and the posix_spawnp would then find "cat" in
> > the parent PATH and invoke /bin/cat.
> 
> Sorry, I should have called it out more clearly, but in my email I said
> that after this lookup we would invoke posix_spawn() (no "p").

Makes sense.

On the other hand, if the function already determined that the program
cannot be found, there's no point in calling posix_spawn() or execl().
To allow the caller to do this optimization, a NULL return value is useful.

> > This, in turn, means that we need to provide also an implementation
> > for Windows, Cygwin, EMX, DOS.
> 
> Yes, clearly that would be ideal

OK, I just spent 1 hour determining the suffixes that the different
platforms use in their search.

> it's not needed for my use-case since
> I use it with posix_spawn() which obviously doesn't exist on these
> other platforms

I still hope someone will write a posix_spawn for native Windows...

> > And this means that it can't really share much with the existing
> > findprog.c.  So the implementation should go into a different .c
> > file.
> 
> I don't see why.

The original findprog.c does not need platform-specific knowledge,
since by definition it's merely an optimization. The code with
platform-specific knowledge will be significantly different.

Bruno




Re: sc_prohibit_magic_number_exit false positive on string

2019-09-08 Thread Jim Meyering
On Sun, Sep 8, 2019 at 6:06 AM Darshit Shah  wrote:
> I just realized that the syntax check rule sc_prohibit_magic_number_exit will
> cause a false positive when it finds the relevant tokens within a string as
> well.
>
> For example, in Wget, we have the following snippet in our tests which trips
> this rule:
>
> > WGET_TEST_EXPECTED_FILES, &(wget_test_file_t []) {
> >   { "exit-status.txt", "exit(8)\n" },
> >   {   NULL } },
>
> I made a very tiny change to the rule in maint.mk(L408) to account for this:
>
> -exclude='exit \(77\)|error ?\(((0|77),|[^,]*)' \
> +exclude='exit \(77\)|error ?\(((0|77),|[^,]*)|"(usage|exit|error).*"'
>   \

Hi Darshit, that feels a little too specific.
Did you consider exempting that file from this one check?
You can do that by adding a line like the following to cfg.mk:

exclude_file_name_regexp--sc_prohibit_magic_number_exit =
offending-file-regexp\.c$$



Re: [PATCH] findprog: Support searching in a specified path string

2019-09-08 Thread Bruno Haible
Hi Paul,

Here's the implementation I'm committing.

I feel that merging the two functions in a single file would add more
contortions than benefit.


2019-09-08  Bruno Haible  

findprog-in: New module.
Suggested by Paul Smith .
* lib/findprog.h (find_in_given_path): New declaration.
* lib/findprog-in.c: New file, based on lib/findprog.c.
* m4/findprog-in.m4: New file, based on m4/findprog.m4.
* modules/findprog-in: New file.

diff --git a/lib/findprog.h b/lib/findprog.h
index a354f67..9bc8a60 100644
--- a/lib/findprog.h
+++ b/lib/findprog.h
@@ -21,16 +21,29 @@ extern "C" {
 #endif
 
 
-/* Look up a program in the PATH.
-   Attempt to determine the pathname that would be called by execlp/execvp
-   of PROGNAME.  If successful, return a pathname containing a slash
-   (either absolute or relative to the current directory).  Otherwise,
-   return PROGNAME unmodified.
+/* Looks up a program in the PATH.
+   Attempts to determine the pathname that would be called by execlp/execvp
+   of PROGNAME.  If successful, it returns a pathname containing a slash
+   (either absolute or relative to the current directory).  Otherwise, it
+   returns PROGNAME unmodified.
Because of the latter case, callers should use execlp/execvp, not
execl/execv on the returned pathname.
The returned string is freshly malloc()ed if it is != PROGNAME.  */
 extern const char *find_in_path (const char *progname);
 
+/* Looks up a program in the given PATH-like string.
+   The PATH argument consists of a list of directories, separated by ':' or
+   (on native Windows) by ';'.  An empty PATH element designates the current
+   directory.  A null PATH is equivalent to an empty PATH, that is, to the
+   singleton list that contains only the current directory.
+   Determines the pathname that would be called by execlp/execvp of PROGNAME.
+   - If successful, it returns a pathname containing a slash (either absolute
+ or relative to the current directory).  The returned string can be used
+ with either execl/execv or execlp/execvp.  It is freshly malloc()ed if it
+ is != PROGNAME.
+   - Otherwise, it returns NULL.  */
+extern const char *find_in_given_path (const char *progname, const char *path);
+
 
 #ifdef __cplusplus
 }
diff --git a/lib/findprog-in.c b/lib/findprog-in.c
new file mode 100644
index 000..3d70b7b
--- /dev/null
+++ b/lib/findprog-in.c
@@ -0,0 +1,178 @@
+/* Locating a program in a given path.
+   Copyright (C) 2001-2004, 2006-2019 Free Software Foundation, Inc.
+   Written by Bruno Haible , 2001, 2019.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+
+#include 
+
+/* Specification.  */
+#include "findprog.h"
+
+#include 
+#include 
+#include 
+#include 
+
+#include "filename.h"
+#include "concat-filename.h"
+#include "xalloc.h"
+
+#if (defined _WIN32 && !defined __CYGWIN__) || defined __EMX__ || defined 
__DJGPP__
+  /* Native Windows, OS/2, DOS */
+# define NATIVE_SLASH '\\'
+#else
+  /* Unix */
+# define NATIVE_SLASH '/'
+#endif
+
+/* Separator in PATH like lists of pathnames.  */
+#if (defined _WIN32 && !defined __CYGWIN__) || defined __EMX__ || defined 
__DJGPP__
+  /* Native Windows, OS/2, DOS */
+# define PATH_SEPARATOR ';'
+#else
+  /* Unix */
+# define PATH_SEPARATOR ':'
+#endif
+
+/* The list of suffixes that the execlp/execvp function tries when searching
+   for the program.  */
+static const char * const suffixes[] =
+  {
+#if defined _WIN32 && !defined __CYGWIN__ /* Native Windows */
+"", ".com", ".exe", ".bat", ".cmd"
+/* Note: Files without any suffix are not considered executable.  */
+/* Note: The cmd.exe program does a different lookup: It searches according
+   to the PATHEXT environment variable.
+   See .
+   Also, it executes files ending .bat and .cmd directly without letting 
the
+   kernel interpret the program file.  */
+#elif defined __CYGWIN__
+"", ".exe", ".com"
+#elif defined __EMX__
+"", ".exe"
+#elif defined __DJGPP__
+"", ".com", ".exe", ".bat"
+#else /* Unix */
+""
+#endif
+  };
+
+const char *
+find_in_given_path (const char *progname, const char *path)
+{
+  {
+bool has_slash = false;
+const char *p;
+
+for (p = progname; *p != '\0'; p++)
+  if (ISSLASH (*p))
+{
+  has_slash

Re: critique of gnulib - string allocation

2019-09-08 Thread Bruno Haible
Jonas Termansen wrote:
> > Buffer allocation and string production
> > should not be decoupled as it leads to bugs. (Modern languages, such as
> > the one I develop at work, does simply not have these problems.) In C,
> > strdup should be used instead of strlen+malloc+strcpy/memcpy because
> > it's much less error prone. Generally asprintf should be used instead of
> > sprintf/snprintf because it does the buffer allocation for you and
> > significantly reduces the risk.
> > ...
> > The
> > correct resolution is to not even have this class of problems in the
> > first place by not decoupling buffer allocation from the string
> > creation.

This is something I can support. The problem is that asprintf is often
considered overkill, because it has to parse a format string first.
Application programmers don't care about this, by system programmers do.

What can we do about it? If gnulib offers some convenience functions

   char * concatenated_string2 (const char *string1, const char *string2);
   char * concatenated_string3 (const char *string1, const char *string2,
const char *string3);
   char * concatenated_string4 (const char *string1, const char *string2,
const char *string3, const char *string4);

it only solves half of the problem, because often a concatenation involves
a substring, and system programmers don't want to do an intermediate
memory allocation for the substring.

So, what we would need is are functions

   char * substring (const char *string, size_t length);
   char * concatenated_string2 (const char *string1, size_t length1,
const char *string2, size_t length2);
   char * concatenated_string3 (const char *string1, size_t length1,
const char *string2, size_t length2,
const char *string3, size_t length3);
   ...

where the length arguments are set to SIZE_MAX to designate the entire
string.

How does that sound? Should we add something like that to gnulib?

Paul Eggert wrote:
> asprintf is often a good way to go, and Gnulib uses it. However, like sprintf 
> and snprintf, asprintf mishandles large buffers and so careful code cannot 
> use 
> it to both allocate and format large results - which means that careful code 
> must do the "risky" stuff anyway since asprintf is not up to the task.

What do you mean by "asprintf mishandles large buffers"? The fact that an error
occurs if the result is longer than INT_MAX characters? Really, we have data
larger than 2 GB in many places, but *strings* larger than 2 GB? You can't send
a mail of that size, nor open a file with a line that is that long in any text
editor except Emacs.

In GNU gettext, I make plenty of use of xasprintf. What are the risks?

Bruno




Re: critique of gnulib - stdioext

2019-09-08 Thread Bruno Haible
Jonas Termansen wrote:
> It's been a while since I looked at the stdio-ext functions, although
> I'm not really sure why they need to exist. At least there is a way to
> satisfy gnulib.

These functions exist
  a) to enable important functionality that has been forgotten by the ISO C
 standards authors, e.g.
   - fseterr (why should a custom/extended printf implementation not be
 able to set the error indicator of a stream?)
   - fpending (needed when, at the end of a program, we want to flush
 stdout in a safe way, but without introducing unneeded system calls).
  b) as an optimization, so that e.g. GNU m4 does not need to copy the contents
 of the FILE buffers into its own buffers, but can already do searches
 for newlines and parentheses directly in the FILE buffers.

Bruno




Re: critique of gnulib - malloc wrapper

2019-09-08 Thread Bruno Haible
Jonas Termansen wrote:
> I object to the attitude that code analysis tools should only really be
> supported on glibc systems. A lot of security features are being
> pioneered on other systems and making it easier for everyone to use
> these tools benefits everyone
> 
> "Exploit mitigation counter-measures" is whenever a system has an
> exploit mitigation and software goes out of its way to not take benefit.
> A good example is the 2014 Heartbleed vulnerability where there was a
> good old buffer overflow. OpenSSL was wrapping malloc with its own
> allocation layer, which made use-after-free bugs worse and did not
> support zeroing freed allocations. That meant that systems with a
> hardened malloc (an exploit mitigation) such as OpenBSD, which would
> have reduced the data leakage a lot, did not benefit from the exploit
> mitigation. ...

The gnulib malloc wrapper is not an as severe problem as you might think.
It is only enabled
  - on AIX, because malloc(0) -> NULL on this platform,
  - on native Windows, because malloc does not errno upon failure,
  - in cross-compiles - a problem for which we are searching a solution.

When you say "A lot of security features are being pioneered on other systems",
these are mostly BSD and research OSes, not AIX nor native Windows.

Bruno




Re: [PATCH] findprog: Support searching in a specified path string

2019-09-08 Thread Paul Smith
On Sun, 2019-09-08 at 16:59 +0200, Bruno Haible wrote:
> Hi Paul,
> 
> > > > prog = find_in_path_str ("myprog", lookup_path ());
> > > > 
> > > > and if lookup_path() returns NULL it defaults to the environment
> > > > PATH,
> > > 
> > > I don't think NULL should be interpreted as "look in the default
> > > PATH".  Rather, the convention is that an empty or null PATH means
> > > the current directory.
> > 
> > I find that VERY odd, and possibly a security risk.  According to
> > POSIX, the behavior if PATH is not present or is empty is
> > implementation-defined and my preference/expectation would be to behave
> > as if the program is not found.  However, that's a different
> > discussion.
> 
> Your preference does not match what systems do.
> 
> Things are most robust if - the other differences set aside -
>   find_in_path (prog)
> is equivalent to
>   find_in_path_str (prog, getenv ("PATH")).

My suggestion was that BOTH these functions should not assume the CWD
if PATH is empty or missing, not that they should behave differently.

> On the other hand, if the function already determined that the program
> cannot be found, there's no point in calling posix_spawn() or execl().
> To allow the caller to do this optimization, a NULL return value is useful.

Mm.  In my situation I invoked the exec/spawn function anyway, even if
the program was not found.  That ensures that the error handling etc.
is identical in all situations.  I'll need to consider whether the
optimization is sufficiently large, at least in my case, to warrant the
potential difference in behavior.

> > > This, in turn, means that we need to provide also an implementation
> > > for Windows, Cygwin, EMX, DOS.
> > 
> > Yes, clearly that would be ideal
> 
> OK, I just spent 1 hour determining the suffixes that the different
> platforms use in their search.
> 
> > it's not needed for my use-case since
> > I use it with posix_spawn() which obviously doesn't exist on these
> > other platforms
> 
> I still hope someone will write a posix_spawn for native Windows...

That would be nice; posix_spawn is much closer to native Windows API
behavior than fork/exec (from what little I know of it).

> > > And this means that it can't really share much with the existing
> > > findprog.c.  So the implementation should go into a different .c
> > > file.
> > 
> > I don't see why.
> 
> The original findprog.c does not need platform-specific knowledge,
> since by definition it's merely an optimization. The code with
> platform-specific knowledge will be significantly different.

My personal opinion is that it's not difficult to come up with ways
findprog can be useful _in addition_ to simply being a precursor to
exec, and when used like that it would be beneficial to have the full
lookup.  However, I guess now that findprog-in exists that could be
used instead (with getenv ("PATH")) rather than findprog.

Thanks for working on this Bruno!




Re: critique of gnulib - string allocation

2019-09-08 Thread Ben Pfaff
On Sun, Sep 8, 2019 at 10:08 AM Bruno Haible  wrote:
> So, what we would need is are functions
>
>char * substring (const char *string, size_t length);
>char * concatenated_string2 (const char *string1, size_t length1,
> const char *string2, size_t length2);
>char * concatenated_string3 (const char *string1, size_t length1,
> const char *string2, size_t length2,
> const char *string3, size_t length3);
>...
>
> where the length arguments are set to SIZE_MAX to designate the entire
> string.

I think that substring() is the same as xstrndup().



Re: [PATCH] findprog: Support searching in a specified path string

2019-09-08 Thread Bruno Haible
Hi Paul,

> > > I find that VERY odd, and possibly a security risk.  According to
> > > POSIX, the behavior if PATH is not present or is empty is
> > > implementation-defined and my preference/expectation would be to behave
> > > as if the program is not found.  However, that's a different
> > > discussion.
> > 
> > Your preference does not match what systems do.
> > 
> > Things are most robust if - the other differences set aside -
> >   find_in_path (prog)
> > is equivalent to
> >   find_in_path_str (prog, getenv ("PATH")).
> 
> My suggestion was that BOTH these functions should not assume the CWD
> if PATH is empty or missing, not that they should behave differently.

OK. But what, do you suggest, should the functions do when confronted to
an empty path? What is 'make' supposed to do when the Makefile defines
  PATH =
?

> My personal opinion is that it's not difficult to come up with ways
> findprog can be useful _in addition_ to simply being a precursor to
> exec

In this case we should probably add a flag argument that tells the
function to do the complete lookup also when the progname contains a
slash. For now, until someone claims that this functionality would
be actually useful, I'll leave it as is.

Bruno




Re: [PATCH] findprog: Support searching in a specified path string

2019-09-08 Thread Paul Smith
On Sun, 2019-09-08 at 19:48 +0200, Bruno Haible wrote:
> > My suggestion was that BOTH these functions should not assume the CWD
> > if PATH is empty or missing, not that they should behave differently.
> 
> OK. But what, do you suggest, should the functions do when confronted to
> an empty path? What is 'make' supposed to do when the Makefile defines
>   PATH =
> ?

Very simply, in both the "unset PATH" and "PATH=" cases, it should
report all unqualified (containing no slashes) programs as not found.

That is clearly allowed by POSIX and IMO is the most expected, and
secure, behavior.  Having PATH= be equivalent to PATH=., and even
moreso "unset PATH" be equivalent to PATH=., is quite odd IMO.

> > My personal opinion is that it's not difficult to come up with ways
> > findprog can be useful _in addition_ to simply being a precursor to
> > exec
> 
> In this case we should probably add a flag argument that tells the
> function to do the complete lookup also when the progname contains a
> slash. For now, until someone claims that this functionality would
> be actually useful, I'll leave it as is.

I think this will not be needed.  In the event that someone does need
this they can simply use the findprog-in module instead.




Re: critique of gnulib - string allocation

2019-09-08 Thread Paul Eggert

On 9/8/19 10:08 AM, Bruno Haible wrote:

What do you mean by "asprintf mishandles large buffers"? The fact that an error
occurs if the result is longer than INT_MAX characters? Really, we have data
larger than 2 GB in many places, but *strings* larger than 2 GB?


Sure, in Emacs:

(length (make-string (ash 1 31) ?x))
2147483648

or in regular-expression matchers:

$ truncate -s 2GiB big
$ printf '\nx\n' >>big
$ ls -l big
-rw-r--r-- 1 eggert eggert 2147483651 Sep  8 13:15 big
$ grep -a x big
x

or in other places where GNU tools use strings to represent arbitrary 
user-specified data. The GNU coding standards say to avoid arbitrary limits, and 
on 64-bit platforms we should avoid arbitrary 32-bit limits on the lengths of 
strings.




Re: critique of gnulib - string allocation

2019-09-08 Thread Bruno Haible
Paul Eggert wrote:
> The GNU coding standards say to avoid arbitrary limits, and 
> on 64-bit platforms we should avoid arbitrary 32-bit limits on the lengths of 
> strings.

Well, then we need variants of the *printf functions that return an 'ssize_t'
instead of an 'int'.

Do you happen to know the opinion of the glibc people on this topic?

Bruno





chown: fix configure output

2019-09-08 Thread Bruno Haible
While cross-compiling a gnulib testdir, I see this line in the
configure output:

../configure: line 12368: test: too many arguments

It comes from this line:

if test $gl_cv_func_chown_follows_symlink = no; then

This patch should fix it.


2019-09-08  Bruno Haible  

chown: Fix configure output (regression from 2019-03-23).
* m4/chown.m4 (gl_FUNC_CHOWN): Fix reference to
gl_cv_func_chown_follows_symlink variable.

diff --git a/m4/chown.m4 b/m4/chown.m4
index b798325..b693686 100644
--- a/m4/chown.m4
+++ b/m4/chown.m4
@@ -1,4 +1,4 @@
-# serial 32
+# serial 33
 # Determine whether we need the chown wrapper.
 
 dnl Copyright (C) 1997-2001, 2003-2005, 2007, 2009-2019 Free Software
@@ -80,9 +80,10 @@ AC_DEFUN_ONCE([gl_FUNC_CHOWN],
 HAVE_CHOWN=0
   else
 dnl Some old systems treated chown like lchown.
-if test $gl_cv_func_chown_follows_symlink = no; then
-  REPLACE_CHOWN=1
-fi
+case "$gl_cv_func_chown_follows_symlink" in
+  *yes) ;;
+  *) REPLACE_CHOWN=1 ;;
+esac
 
 dnl Some old systems tried to use uid/gid -1 literally.
 case "$ac_cv_func_chown_works" in




Why does gnulib use makefile rules rather than configure?

2019-09-08 Thread Paul Smith
I'm looking at allowing GNU make to use more gnulib facilities, but
I've run into a serious problem.

It seems that a lot of gnulib modules rely on makefile rules added to
Makefile.in to construct files, rather than using traditional configure
replacement .in file conversions.

This is a real issue for me because I've always provided a shell
script, build.sh, which can be used to bootstrap an instance of make if
the user doesn't already have one.

The build.sh script relies on the user first running configure to
detect all the normal system-specific behaviors and generate the
standard configure output files such as config.h etc., but then instead
of running "make" (which they don't have), they run ./build.sh.

However, when using gnulib this no longer works because many gnulib
modules seem to delegate various configuration operations to the
generated makefile, rather than using configure to do it.

As a simple example, consider alloca-opt.  gnulib provides alloca.in.h
then adds a Makefile.am rule to convert it to alloca.h, that uses sed
to replace one value:

  sed -e 's|@''HAVE_ALLOCA_H''@|$(HAVE_ALLOCA_H)|g'

The prevalence of this type of behavior in gnulib means that I either
have to give up on using gnulib with GNU make, or else give up on
providing a bootstrapping script.


I don't see why these replacements couldn't instead be done via
configure and its built-in replacement facilities.  Why can't we add
these headers as AC_CONFIG_FILES() and allow them to be generated by
the configure script, instead of requiring makefile rules to do it?




Re: critique of gnulib - string allocation

2019-09-08 Thread Paul Eggert

On 9/8/19 1:58 PM, Bruno Haible wrote:

Well, then we need variants of the *printf functions that return an 'ssize_t'
instead of an 'int'.

Do you happen to know the opinion of the glibc people on this topic?


Sorry, no. I imagine it's come up.

I would suggest ptrdiff_t rather than ssize_t, as the latter was 'int' on some 
old 64-bit platforms whereas ptrdiff_t never had that problem, and Glibc malloc 
now prohibits objects larger than PTRDIFF_MAX bytes.




Re: Why does gnulib use makefile rules rather than configure?

2019-09-08 Thread Paul Eggert

On 9/8/19 2:06 PM, Paul Smith wrote:

Why can't we add
these headers as AC_CONFIG_FILES() and allow them to be generated by
the configure script, instead of requiring makefile rules to do it?


Makefile rules can do things that an Autoconf substitition can't, or at least 
can't do easily. For example, the arpa_inet module (the 2nd one I looked at in 
response to your email) doesn't merely substitute; it also copies three files' 
contents into lib/arpa_inet.h.


Typically 'make' beats 'configure' to do this sort of thing, since 'make' is 
more powerful - e.g., it can run in parallel.


Admittedly GNU Make is a special case, since you want to build it without having 
'make'. And if it's just a few Gnulib modules and they're not doing anything 
fancy, perhaps we can modify the modules to use 'configure' substitutions 
instead of 'make' rules. On the other hand perhaps it's time to stop worrying 
about building GNU make without a 'make'. We don't worry that GCC can't be built 
on a system without a C compiler, so why worry about building GNU Make on a 
system without 'make'?




Re: critique of gnulib - cross-compilation guesses

2019-09-08 Thread Bruno Haible
Jonas Termansen wrote:
> > A good example is e.g.
> > cross-compilation. For instance, an old Unix might not have had a broken
> > fork() system call or something. When cross-compiling, gnulib might be
> > pessimistic and default to assuming the system call is broken, which may
> > be handled with a poor or inefficient fallback, disabling functionality
> > at runtime, or a compile time error. There is usually a whitelist of
> > systems without the problem and an environment variable to inject the
> > true answer. That means that it's harder to compete with a new unknown
> > operating system because I must set the environment variable, while
> > other operating systems just work, including the buggy one. That means
> > my good operating system is paying for the complexity caused by a bad
> > operating system. I'd rather the extra work of cross-compiling is moved
> > to the buggy operating systems.
> > 
> > Cross-compilation is inherently a bit more difficult when the host
> > system is buggy, so a more reasonable design would could be to assume
> > the best about unknown operating systems, and to only invoke the
> > workarounds for known buggy systems (and forcing them to set the
> > environment variable instead of me). That means the buggy operating
> > systems pay the cost instead of the good ones (making it harder for new
> > systems). Making cross-compilation nice helps the development of new
> > operating systems, and not just for established things like glibc/musl.
> > 
> > As you saw in my gnulib wiki page, I literally inject 120 environment
> > variables to make gnulib assume the very best about my operating system.
> > I'd rather be confronted with bugs up front than have then be secretly
> > hidden by a portability layer ...

> > 3) Defaulting to assume the best when cross-compiling to unknown systems.

Paul Eggert wrote:
> (3) shouldn't be that hard to do, though it would be tedious.

Here's the patch that implements a configure option to "assume the best"
instead of "assume the worst". I've called it 'risky', so people know that
it's risky to enable it.

I've put it into gnulib-common.m4, although a couple of macros in Autoconf's
functions.m4 could also profit from it. The reason is that an Autoconf
release is likely not going to happen soon.

It handles all your 120 variables, and more.


2019-09-08  Bruno Haible  

Add option to assume the best, not the worst, when cross-compiling.
Suggested by Jonas Termansen .
* m4/gnulib-common.m4 (gl_COMMON_BODY): Add --enable-cross-guesses=...
option. Set gl_cross_guess_normal and gl_cross_guess_inverted.
* m4/argz.m4 (gl_FUNC_ARGZ): Obey --enable-cross-guesses for
lt_cv_sys_argz_works.
* m4/calloc.m4 (_AC_FUNC_CALLOC_IF): Obey --enable-cross-guesses for
ac_cv_func_calloc_0_nonnull.
* m4/canonicalize.m4 (gl_FUNC_REALPATH_WORKS): Obey
--enable-cross-guesses for gl_cv_func_realpath_works.
* m4/cbrtl.m4 (gl_FUNC_CBRTL): Obey --enable-cross-guesses for
gl_cv_func_cbrtl_ieee.
* m4/ceil.m4 (gl_FUNC_CEIL): Obey --enable-cross-guesses for
gl_cv_func_ceil_ieee.
* m4/ceilf.m4 (gl_FUNC_CEILF): Obey --enable-cross-guesses for
gl_cv_func_ceilf_ieee.
* m4/ceill.m4 (gl_FUNC_CEILL): Obey --enable-cross-guesses for
gl_cv_func_ceill_ieee.
* m4/chown.m4 (AC_FUNC_CHOWN): Obey --enable-cross-guesses for
ac_cv_func_chown_works.
(gl_FUNC_CHOWN): Obey --enable-cross-guesses for
gl_cv_func_chown_slash_works, gl_cv_func_chown_ctime_works.
* m4/d-ino.m4 (gl_CHECK_TYPE_STRUCT_DIRENT_D_INO): Obey
--enable-cross-guesses for gl_cv_struct_dirent_d_ino.
* m4/exp2l.m4 (gl_FUNC_EXP2L): Obey --enable-cross-guesses for
gl_cv_func_exp2l_works, gl_cv_func_exp2l_ieee.
* m4/expl.m4 (gl_FUNC_EXPL): Obey --enable-cross-guesses for
gl_cv_func_expl_works.
* m4/expm1.m4 (gl_FUNC_EXPM1): Obey --enable-cross-guesses for
gl_cv_func_expm1_ieee.
* m4/expm1l.m4 (gl_FUNC_EXPM1L): Obey --enable-cross-guesses for
gl_cv_func_expm1l_works.
* m4/fchdir.m4 (gl_FUNC_FCHDIR): Obey --enable-cross-guesses for
gl_cv_func_open_directory_works.
* m4/fchownat.m4 (gl_FUNC_FCHOWNAT_DEREF_BUG): Obey
--enable-cross-guesses for gl_cv_func_fchownat_nofollow_works.
(gl_FUNC_FCHOWNAT_EMPTY_FILENAME_BUG): Obey --enable-cross-guesses for
gl_cv_func_fchownat_empty_filename_works.
* m4/fdopendir.m4 (gl_FUNC_FDOPENDIR): Obey --enable-cross-guesses for
gl_cv_func_fdopendir_works.
* m4/floor.m4 (gl_FUNC_FLOOR): Obey --enable-cross-guesses for
gl_cv_func_floor_ieee.
* m4/floorf.m4 (gl_FUNC_FLOORF): Obey --enable-cross-guesses for
gl_cv_func_floorf_ieee.
* m4/fma.m4 (gl_FUNC_FMA_WORKS): Obey --enable-cross-guesses for
gl_cv_func_fma_works.
* m4/fmaf.m4 (gl_FUNC_FMAF_WO

Re: critique of gnulib - disabling workarounds

2019-09-08 Thread Bruno Haible
Jonas Termansen wrote:
> To recap, my primary requests are:
> 
> 1) Categorizing gnulib into three parts (replacement functions for when
> they don't exist, workarounds for bugs, and utility functions).
> 
> 2) Making it possible to disable the gnulib bug replacements with a
> configure command line option.
> 
> 3) Defaulting to assume the best when cross-compiling to unknown systems.

Now that 3) is implemented, I don't see the utility of 2). If someone is
NOT cross-compiling, and a configure test has determined that a certain
system function is buggy or missing, what would be the point of disabling
the workaround? Even for your situation as an OS developer, Gnulib doesn't
hide the issues: It mentions the test results in the configure output,
and you even a sample program to reproduce each problem (embedded in the
config.log).

Regarding 1): This categorization exists in the documentation [1][2].
Should we go as far as splitting gnulib into 3 git repositories? I think
it would complicate things too much, because
  - most packages borrow modules from all three kinds,
  - there are dependencies not only from the utility modules to the
POSIX emulation modules, but also the other way around (e.g. to
the modules assure, c-ctype, filenamecat, flexmember, gettext-h,
integer-length, localcharset, localename, minmax, scratch-buffer,
verify).

Bruno

[1] https://www.gnu.org/software/gnulib/manual/html_node/index.html
[2] https://www.gnu.org/software/gnulib/MODULES.html




hash: provide hash_xinitialize

2019-09-08 Thread Akim Demaille
Hi Jim,

Recently in Bison I had to check all my calls to hash_initialize for memory 
exhaustion.  Coreutils do it by hand for some of the calls, but we could just 
as well provide a simple wrapper?

commit 73f0aa2e58b1dabbe075ab6bf5644da36d7c72d2
Author: Akim Demaille 
Date:   Mon Sep 9 08:31:33 2019 +0200

hash: provide hash_xinitialize.

Suggested by Egor Pugin 
https://lists.gnu.org/archive/html/bison-patches/2019-09/msg00026.html

* modules/hash (Depends-on): Add xalloc.
* lib/hash.h, lib/hash.c (hash_xinitialize): New.

diff --git a/ChangeLog b/ChangeLog
index 5c226ce43..ca12b170b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2019-09-09  Akim Demaille  
+
+   hash: provide hash_xinitialize.
+   Suggested by Egor Pugin 
+   https://lists.gnu.org/archive/html/bison-patches/2019-09/msg00026.html
+   * modules/hash (Depends-on): Add xalloc.
+   * lib/hash.h, lib/hash.c (hash_xinitialize): New.
+
 2019-09-06  Akim Demaille  
 
bitset: style changes
diff --git a/lib/hash.c b/lib/hash.c
index 9e1f8e841..ca1d04044 100644
--- a/lib/hash.c
+++ b/lib/hash.c
@@ -27,6 +27,7 @@
 #include "hash.h"
 
 #include "bitrotate.h"
+#include "xalloc.h"
 #include "xalloc-oversized.h"
 
 #include 
@@ -645,6 +646,22 @@ hash_initialize (size_t candidate, const Hash_tuning 
*tuning,
   return NULL;
 }
 
+
+/* Same as hash_initialize, but invokes xalloc_die on memory
+   exhaustion.  */
+
+Hash_table *
+hash_xinitialize (size_t candidate, const Hash_tuning *tuning,
+  Hash_hasher hasher, Hash_comparator comparator,
+  Hash_data_freer data_freer)
+{
+  Hash_table *res =
+hash_initialize (candidate, tuning, hasher, comparator, data_freer);
+  if (!res)
+xalloc_die ();
+  return res;
+}
+
 /* Make all buckets empty, placing any chained entries on the free list.
Apply the user-specified function data_freer (if any) to the datas of any
affected entries.  */
diff --git a/lib/hash.h b/lib/hash.h
index a1a483a35..8f2e4591f 100644
--- a/lib/hash.h
+++ b/lib/hash.h
@@ -89,6 +89,9 @@ void hash_reset_tuning (Hash_tuning *);
 Hash_table *hash_initialize (size_t, const Hash_tuning *,
  Hash_hasher, Hash_comparator,
  Hash_data_freer) _GL_ATTRIBUTE_WUR;
+Hash_table *hash_xinitialize (size_t, const Hash_tuning *,
+  Hash_hasher, Hash_comparator,
+  Hash_data_freer) _GL_ATTRIBUTE_WUR;
 void hash_clear (Hash_table *);
 void hash_free (Hash_table *);
 
diff --git a/modules/hash b/modules/hash
index 42502e749..31303c1a8 100644
--- a/modules/hash
+++ b/modules/hash
@@ -9,6 +9,7 @@ Depends-on:
 bitrotate
 stdbool
 stdint
+xalloc
 xalloc-oversized
 
 configure.ac: