Re: [PATCH 5/6] fts: three levels of leaf optimization

2018-04-11 Thread Kamil Dudka
On Tuesday, April 10, 2018 9:54:26 PM CEST Bernhard Voelker wrote:
> Hi Kamil,
> 
> On 04/06/2018 09:29 AM, Kamil Dudka wrote:
> > On Friday, April 6, 2018 8:08:55 AM CEST Bernhard Voelker wrote:
> >> On 04/05/2018 02:44 PM, Kamil Dudka wrote:
> >>> The same 'find' command with
> >>> the -noleaf option does not abort:
> >>> 
> >>> https://bugzilla.redhat.com/1558249#c20
> >> 
> >> This bug report is non-public.  Can you make it available, or
> >> post a reproducer, please?
> > 
> > I did not know it was private.  It should be publicly available now.
> 
> Thanks.
> 
> Although current upstream (v4.6.0-172-gb94be7a0) has already the
> non-CIFS-aware FTS change from gnulib, I couldn't reproduce the crash with
> a quick test here. I'll try again tomorrow.
> 
> Did you try with a build from upstream as well?
> 
> Thanks & have a nice day,
> Berny

I have not tried to reproduce the crash myself.  I was just analyzing the 
strace outputs attached to that bug.  I hope I will have some time next week 
to debug it deeper (in case nobody finds the cause till then).

Kamil





Re: [PATCH 5/6] fts: three levels of leaf optimization

2018-04-11 Thread James Youngman
On Thu, Apr 5, 2018 at 4:49 PM, Paul Eggert  wrote:
>
> On 04/05/2018 05:44 AM, Kamil Dudka wrote:
>>
>> Does this change (intentionally?) enable leaf optimization for CIFS?
>
>
> No, I wasn't thinking about CIFS when I wrote that. Thanks for reporting
it. I installed the attached to try to fix this. I have not tested it with
CIFS.
>

Where is the abort actually happening?   If the abort is in fts itself,
then this patch is a reasonable fix (though generally I prefer not to allow
aborts in library code).   But if the abort is in find itself (i.e. not in
the gnulib code at all) then we should fix find (either as well, or
instead).

James.


Re: [PATCH 5/6] fts: three levels of leaf optimization

2018-04-11 Thread Kamil Dudka
On Wednesday, April 11, 2018 3:38:40 PM CEST James Youngman wrote:
> On Thu, Apr 5, 2018 at 4:49 PM, Paul Eggert  wrote:
> > On 04/05/2018 05:44 AM, Kamil Dudka wrote:
> >> Does this change (intentionally?) enable leaf optimization for CIFS?
> > 
> > No, I wasn't thinking about CIFS when I wrote that. Thanks for reporting
> 
> it. I installed the attached to try to fix this. I have not tested it with
> CIFS.
> 
> 
> Where is the abort actually happening?   If the abort is in fts itself,
> then this patch is a reasonable fix (though generally I prefer not to allow
> aborts in library code).   But if the abort is in find itself (i.e. not in
> the gnulib code at all) then we should fix find (either as well, or
> instead).
> 
> James.

The abort() is in gnulib/fts code.  It is triggered on this line:

http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/fts.c;h=3814e58f#l1383

As far as I know, there is currently no fix available for that.

Kamil





Gnulib on Windows (native / mingw32) / VMS / etc.

2018-04-11 Thread Paul Smith
Hi all.  I spent a bit of time this weekend looking into what it would
take to update GNU make to use the glob/fnmatch implementations from
the current gnulib, rather than the ancient (and buggy) implementation
that GNU make has embedded, more or less unchanged, for at least 25
years.

This, unfortunately, turns out to be quite a daunting task.

Gnulib glob/fnmatch rely on a huge number of other gnulib packages so
instead of the simple two-file implementation GNU make has now it
expands to a lib directory containing 20+ files (I didn't actually
count but it's quite a lot).

This matters for a few reasons: first, GNU make provides a
bootstrapping script that will let you compile make on systems which
don't already have make... that means that I need to be able to build
all these extra files without the assistance of automake (I do run
configure).  I'm not sure how easy that will be.

More worrying, GNU make is compile-able on Windows and does not require
Cygwin: it can be built using native MSVC and/or MingW32.  For
bootstrapping GNU make provides a "build_w32.bat" file that compiles
make using either gcc or MSVC.  In this setup I have a hard-coded
config.h for Windows that I install since I don't run configure. 
Making this work with the many packages added to lib seems complex.

Finally, GNU make can also be compiled on other less common
architectures, such as VMS and these ports are quite active.  Is there
any information about all these different glob/fnmatch requirements in
gnulib, insofar as portability to other environments like VMS?

I would really like to use native gnulib implementations but the
overhead involved has me considering whether it would be feasible to
try to extract the latest implementation and somehow fencing it in to
avoid all these extra dependencies.

Any comments anyone has would be welcome, cheers!



Re: [PATCH 5/6] fts: three levels of leaf optimization

2018-04-11 Thread Paul Eggert

On 04/11/2018 07:03 AM, Kamil Dudka wrote:

As far as I know, there is currently no fix available for that.


I looked into this and found some bugs in relevant code (bugs that I 
introduced last summer; sorry!). I installed the attached into Gnulib to 
fix what I found. Please give it a try and see whether it fixes your 
problem.


>From be9b95a8f5a3ddbb722099df10d19c74a341fff9 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Wed, 11 Apr 2018 12:50:35 -0700
Subject: [PATCH] fts: fix bug in find across filesystems

This fixes a bug I introduced last summer.
Problem reported by Kamil Dudka in:
https://lists.gnu.org/r/bug-gnulib/2018-04/msg00033.html
* lib/fts.c (filesystem_type, dirent_inode_sort_may_be_useful)
(leaf_optimization):
New arg for file descriptor.  All callers changed.
(fts_build): Check for whether inodes should be sorted
before closing the directory.
---
 ChangeLog | 12 
 lib/fts.c | 55 +++
 2 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index ea30952f5..3702f75c1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2018-04-11  Paul Eggert  
+
+	fts: fix bug in find across filesystems
+	This fixes a bug I introduced last summer.
+	Problem reported by Kamil Dudka in:
+	https://lists.gnu.org/r/bug-gnulib/2018-04/msg00033.html
+	* lib/fts.c (filesystem_type, dirent_inode_sort_may_be_useful)
+	(leaf_optimization):
+	New arg for file descriptor.  All callers changed.
+	(fts_build): Check for whether inodes should be sorted
+	before closing the directory.
+
 2018-04-07  Bruno Haible  
 
 	unicase/u*-context: Fix link errors with libunistring <= 0.9.9.
diff --git a/lib/fts.c b/lib/fts.c
index 3814e58fc..a26c6824a 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -726,11 +726,12 @@ dev_type_compare (void const *x, void const *y)
   return ax->st_dev == ay->st_dev;
 }
 
-/* Return the file system type of P, or 0 if not known.
+/* Return the file system type of P with file descriptor FD, or 0 if not known.
+   If FD is negative, P's file descriptor is unavailable.
Try to cache known values.  */
 
 static fsword
-filesystem_type (FTSENT const *p)
+filesystem_type (FTSENT const *p, int fd)
 {
   FTS *sp = p->fts_fts;
   Hash_table *h = sp->fts_leaf_optimization_works_ht;
@@ -756,7 +757,7 @@ filesystem_type (FTSENT const *p)
 }
 
   /* Look-up failed.  Query directly and cache the result.  */
-  if (fstatfs (p->fts_fts->fts_cwd_fd, &fs_buf) != 0)
+  if (fd < 0 || fstatfs (fd, &fs_buf) != 0)
 return 0;
 
   if (h)
@@ -778,12 +779,12 @@ filesystem_type (FTSENT const *p)
   return fs_buf.f_type;
 }
 
-/* Return false if it is easy to determine the file system type of the
-   directory P, and sorting dirents on inode numbers is known not to
-   improve traversal performance with that type of file system.
-   Otherwise, return true.  */
+/* Return true if sorting dirents on inode numbers is known to improve
+   traversal performance for the directory P with descriptor DIR_FD.
+   Return false otherwise.  When in doubt, return true.
+   DIR_FD is negative if unavailable.  */
 static bool
-dirent_inode_sort_may_be_useful (FTSENT const *p)
+dirent_inode_sort_may_be_useful (FTSENT const *p, int dir_fd)
 {
   /* Skip the sort only if we can determine efficiently
  that skipping it is the right thing to do.
@@ -791,7 +792,7 @@ dirent_inode_sort_may_be_useful (FTSENT const *p)
  while the cost of *not* performing it can be O(N^2) with
  a very large constant.  */
 
-  switch (filesystem_type (p))
+  switch (filesystem_type (p, dir_fd))
 {
 case S_MAGIC_CIFS:
 case S_MAGIC_NFS:
@@ -805,16 +806,17 @@ dirent_inode_sort_may_be_useful (FTSENT const *p)
 }
 }
 
-/* Given an FTS entry P for a directory D,
+/* Given an FTS entry P for a directory with descriptor DIR_FD,
return true if it is both useful and valid to apply leaf optimization.
The optimization is useful only for file systems that lack usable
dirent.d_type info.  The optimization is valid if an st_nlink value
of at least MIN_DIR_NLINK is an upper bound on the number of
-   subdirectories of D, counting "." and ".."  as subdirectories.  */
+   subdirectories of D, counting "." and ".."  as subdirectories.
+   DIR_FD is negative if unavailable.  */
 static enum leaf_optimization
-leaf_optimization (FTSENT const *p)
+leaf_optimization (FTSENT const *p, int dir_fd)
 {
-  switch (filesystem_type (p))
+  switch (filesystem_type (p, dir_fd))
 {
   /* List here the file system types that may lack usable dirent.d_type
  info, yet for which the optimization does apply.  */
@@ -851,12 +853,13 @@ leaf_optimization (FTSENT const *p)
 
 #else
 static bool
-dirent_inode_sort_may_be_useful (FTSENT const *p _GL_UNUSED)
+dirent_inode_sort_may_be_useful (FTSENT const *p _GL_UNUSED,
+ int dir_fd _GL_UNUSED)
 {
   return true;
 }
 static enum leaf_optimization
-leaf_optimization (F

Re: Gnulib on Windows (native / mingw32) / VMS / etc.

2018-04-11 Thread Paul Eggert
Except for VMS, GNU Emacs is in a boat similar to GNU Make. It uses 
Gnulib (but not Automake), it is buildable on MS-Windows and does not 
need Cygwin, it has its own configuration batch script for MS-Windows 
(and MS-DOS!). So I would look to Emacs for inspiration here. I'm mostly 
responsible for the Emacs Gnulib usage and can help and/or answer 
questions in this area.


Gnulib does have some VMS support but it's on an as-needed basis, and 
quite possibly you'd need to hack it in (this was most recently done for 
mktime in September). As it happens, in August we removed some VMS code 
from the glob module, since nobody had used it for years and glibc was 
removing it. I suppose we could put it back in. Alternatively, GNU Make 
could continue to use the old glob.c for VMS, and use Gnulib glob.c for 
everything else. (Not ideal, I know.)





Re: [PATCH 5/6] fts: three levels of leaf optimization

2018-04-11 Thread Bernhard Voelker
On 04/11/2018 09:53 PM, Paul Eggert wrote:
> @@ -1589,6 +1593,15 @@ mem1:   saved_errno = errno;
>  tail->fts_link = p;
>  tail = p;
>  }
> +
> +/* If there are many entries, no sorting function has been
> +   specified, and this file system is of a type that may be
> +   slow with a large number of entries, arrange to sort the
> +   directory entries on increasing inode numbers.  */
> +if (nitems == _FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD
__^^
> +&& !sp->fts_compar)
> +  sort_by_inode = dirent_inode_sort_may_be_useful (cur, 
> dir_fd);
> +

This looks wrong: didn't you mean the '>' operator?
Thanks for digging into this issue.

Have a nice day,
Berny



Re: [PATCH 5/6] fts: three levels of leaf optimization

2018-04-11 Thread Paul Eggert

On 04/11/2018 02:58 PM, Bernhard Voelker wrote:

didn't you mean the '>' operator?


No, '==' should be right. I added the attached to try to explain this 
better. Thanks for the nudge.


>From 09ca545d5c0af8760ce27b1e9cbe300e64f73746 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Wed, 11 Apr 2018 16:27:03 -0700
Subject: [PATCH] fts: add comment

* lib/fts.c (fts_build): Explain why ==, not >.
See remark by Bernhard Voelker in:
https://lists.gnu.org/r/bug-gnulib/2018-04/msg00041.html
---
 ChangeLog | 5 +
 lib/fts.c | 6 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 3702f75c1..f5752a217 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2018-04-11  Paul Eggert  
 
+	fts: add comment
+	* lib/fts.c (fts_build): Explain why ==, not >.
+	See remark by Bernhard Voelker in:
+	https://lists.gnu.org/r/bug-gnulib/2018-04/msg00041.html
+
 	fts: fix bug in find across filesystems
 	This fixes a bug I introduced last summer.
 	Problem reported by Kamil Dudka in:
diff --git a/lib/fts.c b/lib/fts.c
index a26c6824a..d54351059 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -1597,7 +1597,11 @@ mem1:   saved_errno = errno;
 /* If there are many entries, no sorting function has been
specified, and this file system is of a type that may be
slow with a large number of entries, arrange to sort the
-   directory entries on increasing inode numbers.  */
+   directory entries on increasing inode numbers.
+
+   The NITEMS comparison uses ==, not >, because the test
+   needs to be tried at most once once, and NITEMS will exceed
+   the threshold after it is incremented below.  */
 if (nitems == _FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD
 && !sp->fts_compar)
   sort_by_inode = dirent_inode_sort_may_be_useful (cur, dir_fd);
-- 
2.14.3



Re: [PATCH 5/6] fts: three levels of leaf optimization

2018-04-11 Thread Bernhard Voelker
On 04/12/2018 01:28 AM, Paul Eggert wrote:
> No, '==' should be right. I added the attached to try to explain this 
> better. Thanks for the nudge.

ah, indeed, running dirent_inode_sort_may_be_useful only once
in the loop may/will save quite some time, thanks.

Have a nice day,
Berny