fdopendir closes the file descriptor on MinGW

David Grayson Mon, 16 Mar 2015 08:04:45 -0700

Hello.  When running under MinGW on Windows, there seems to be a bug
in Gnulib's fdopendir implementation.  Gnulib's fdopendir closes the
file descriptor that was passed to it as an argument.  It then tries
to reopen the directory using the same file descriptor, but that
doesn't seem to work in MinGW, and so the file descriptor remains
closed after fdopendir returns.


Here is an example of some code that exhibits the bug:

  int fd = open("emptydir", O_RDONLY | O_DIRECTORY | O_NOCTTY | O_NONBLOCK);
  printf("dup(fd) = %d\n", dup(fd));
  fdopendir(fd);
  printf("dup(fd) = %d\n", dup(fd));

Under MinGW, the second call to dup will fail and return -1 because
fdopendir closes the file descriptor.

Ultimately, I am trying to compile Grep 2.21 in Windows with MinGW so
that I can have a good tool for searching files.  (I can't find any
recent version of Grep compiled for Windows which doesn't have extra
dependencies.)  The version I compiled does not work properly with the
recursive options (-r and -R).  I wrote about this on the Grep bug
tracker here:

  http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16444#18

Grep uses the FTS implementation from Gnulib.  Gnulib's FTS
implementation uses fdopendir.  After calling fdopendir, FTS tries to
use the same file descriptor for other purposes, but since fdopendir
already closed it, grep ends up printing a fatal error that says "Bad
file descriptor".  (This was "Bug A" in my post on the grep bug
tracker.)


== How I reproduced the bug ==

I reproduced the bug by making a very simple C project that just uses
Gnulib.  The code does some operations on an empty directory named
"emptydir", which is assumed to be inside the current working
directory, and prints the results.  You can get the source here:

  http://www.davidegrayson.com/keep/gnulib_150315/test_gnulib-1.0.0-src.tar.gz

I used autoconf to generate a source distribution, which you can get here:

  http://www.davidegrayson.com/keep/gnulib_150315/test_gnulib-1.0.0.tar.gz

The main source file (main.c) is attached to this message, and you can
also read it here:

  http://www.davidegrayson.com/keep/gnulib_150315/main.c

When I run the code in Arch Linux, it gives the expected output:

  $ ./src/testgnulib
  dup(fd) = 4
  dup(fd) = 5
  emptydir
  emptydir

When I run the code in Windows 8.1 64-bit, it gives bad results.  I
compiled the code with MinGW (gcc.exe (i686-posix-dwarf-rev1, Built by
MinGW-W64 project) 4.9.2) and a patched version of make (3.82-pololu2
from https://github.com/pololu/make).  Most of the other utilities on
my PATH in Windows come from Msysgit.  I ran "./configure" in Git
Bash.  Then I invoked "make" with the following command, because
CreateProcess does not recognize paths like /c/git/bin/mkdir:

  make MKDIR_P='mkdir -p' SED=sed

When I run the resulting executable in Windows, it gives this output:

  $ ./src/testgnulib.exe
  dup(fd) = 4
  dup(fd) = -1
  emptydir
  emptydir: error: Bad file descriptor

This shows that the file descriptor was duplicable before fdopendir
was called, but not duplicable afterwards (because fdopendir closed
the descriptor).  It also shows the resulting problem in FTS, where
FTS returns an error when it tries to traverse the empty directory.

If I comment out the "close (fd);" line in fdopendir.c, then the
program behaves as expected under MinGW, returning the same output
that it did in Arch Linux.


== Discussion of the bug ==

The documentation for fdopendir that I was reading can be found here:

  http://pubs.opengroup.org/onlinepubs/9699919799/functions/fdopendir.html
  https://www.gnu.org/software/gnulib/manual/html_node/fdopendir.html

The prototype of fdopendir is:

  DIR *fdopendir(int fd);

It takes a file descriptor representing an opened directory, and
creates a DIR pointer representing that directory, to be used by the
dirent system.

Neither the POSIX documentation nor the Gnulib documentation for
fdopendir mentions that the file descriptor might get closed by
fdopendir, so it seems like a bug for that to happen.

The Gnulib documentation says that its fdopendir "does not guarantee
that 'dirfd(fdopendir(n))==n'".  And indeed, when I call
dirfd(fdopendir(fd)) under MinGW it returns -1.

The POSIX documentation for fdopendir says that when closedir() is
called on the returned pointer, the file descriptor (fd) shall be
closed.  That means that somehow, some information about fd has to be
associated with the returned DIR pointer.
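
As a minimal sketch of that POSIX requirement, the following checks
whether a descriptor is still open before and after closedir.  It
assumes a POSIX.1-2008 system such as Linux, and the helper names
fd_is_open and closedir_closes_fd are made up for illustration; they
are not part of Gnulib or of the test program attached below.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopendir and O_DIRECTORY */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd is an open file descriptor, 0 otherwise. */
int fd_is_open(int fd)
{
  return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Opens path as a directory, wraps it with fdopendir, then calls
   closedir.  Returns 1 if fd was open before closedir and closed
   after it, 0 if not, and -1 if the setup itself failed. */
int closedir_closes_fd(const char *path)
{
  int fd = open(path, O_RDONLY | O_DIRECTORY);
  if (fd < 0)
    return -1;

  DIR *dir = fdopendir(fd);
  if (!dir)
  {
    close(fd);
    return -1;
  }

  int open_before = fd_is_open(fd);
  closedir(dir);
  int open_after = fd_is_open(fd);
  return open_before && !open_after;
}
```

On a conforming system, closedir_closes_fd("emptydir") should return 1.
A result of 0 would mean that fd was either closed too early or left
open after closedir.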

I looked at the source code of Gnulib's fdopendir and tried to figure
out what is going on, but I can't say I totally understand it.
Gnulib's fdopendir first closes the passed file descriptor, and then
it uses a non-thread-safe strategy involving duplication and recursion
to reopen the directory in such a way that it happens to reuse the
same file descriptor number.  This works on Linux but apparently does
not work on MinGW.
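
The following is a rough illustration of the POSIX rule that any
close-and-reopen strategy of this kind has to rely on: open returns the
lowest-numbered descriptor that is not currently in use.  This is not
Gnulib's code; close_and_reopen is a hypothetical helper, and the
sketch assumes a single-threaded program on a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L  /* for O_DIRECTORY */
#include <fcntl.h>
#include <unistd.h>

/* Closes fd and immediately reopens path as a directory.  Returns the
   new descriptor, or -1 on failure.  If fd was itself obtained from
   open and no descriptor has been closed since, then every lower
   number is still in use, so the new descriptor gets the same number
   as the one that was just closed. */
int close_and_reopen(int fd, const char *path)
{
  if (close(fd) != 0)
    return -1;
  return open(path, O_RDONLY | O_DIRECTORY);
}
```

If another thread opens a file between the close and the open, the
number can be taken by that thread instead, which is why a strategy
like this is not thread-safe.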

This is speculation, but I suspect that this dup/recursive strategy in
fdopendir is there in order to make sure that the returned DIR pointer
is using the right file descriptor, so that closedir will end up
closing the file descriptor as POSIX requires.  But on MinGW+Gnulib,
dirfd(DIR *) returns -1, so it makes me think that DIR pointers don't
actually contain file descriptors or close them when closedir is
called; they might use some kind of native Win32 HANDLE instead.


== Possible solutions ==

I would appreciate any tips on how to fix this problem.  Ideally we
would fix Gnulib, but if that is not going to happen then I would
still like to find a good workaround that I can apply to the software
that I build.  My workaround of commenting out the "close (fd);" line
in fdopendir.c is not a good long term solution because it will leave
unused directory handles open in any program that assumes that
"closedir(fdopendir(fd))" actually closes fd, as specified by POSIX.

To make fdopendir POSIX-compliant on MinGW, it seems like you would
want to somehow associate the fd with the DIR pointer even if the fd
isn't used by typical dirent functions.  Then when closedir is called,
you would want to retrieve that fd and close it, as a side effect.

Alternatively, if full POSIX compliance is too hard, we can instead
say that any Gnulib program that calls fdopendir should avoid using
the supplied file descriptor afterwards since it might have been
closed.  We would need to fix FTS and any other code that does that.
I suspect this would be a radical change that breaks lots of programs
though.

I don't fully understand what is going on here and I might have missed
something.

--David Grayson

/* Gnulib requires config.h to be included before any other header. */
#include "config.h"

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>   /* declares fdopendir */

#include "fts_.h"
#include "progname.h"
#include "error.h"

int main(void)
{
  set_program_name("testgnulib");

  // This code demonstrates a bug in fdopendir on MinGW.
  {
    int fd = open("emptydir", O_RDONLY | O_DIRECTORY | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
    {
      fprintf(stderr, "open returned %d\n", fd);
      return 1;
    }

    printf("dup(fd) = %d\n", dup(fd));
    fdopendir(fd);
    printf("dup(fd) = %d\n", dup(fd));
  }

  // This code shows how the bug affects fts.
  {
    char filename[] = "emptydir";
    char * fts_args[] = { filename, NULL };
    int fts_opts = FTS_PHYSICAL;
    FTS * fts = fts_open(fts_args, fts_opts, NULL);
    if (!fts)
    {
      fprintf(stderr, "fts_open returned NULL\n");
      return 2;
    }

    while(1)
    {
      FTSENT * ent = fts_read(fts);
      if (!ent) { break; }
      if (ent->fts_info == FTS_ERR)
      {
        printf("%s: error: %s\n", ent->fts_path, strerror(ent->fts_errno));
      }
      else
      {
        printf("%s\n", ent->fts_path);
      }
    }
  }
  return 0;
}
