https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

            Bug ID: 109485
           Summary: Feature request: More efficient include path handling
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dani.borg at outlook dot com
  Target Milestone: ---

When preprocessing, the method to lookup include paths is inefficient and cause
more file system calls than needed.
The current method simply tries each include path in order, for every unique
#include directive. Basically O(n*n) in complexity.
The method scales poorly when the number of include paths increase which can
cause high system load and long build times.

A smarter approach, which clang appears to be using, is to keep track of which
include paths doesn't contain the path seen in the include directive. Then file
system queries for impossible paths can be avoided.

To give a concrete example that compares gcc and clang, the following can be
run in bash on a Linux system. The example below only shows the relevant output
from strace.

#prepare - create 2 include paths, 3 headers and a source file including the
headers
mkdir -p a b/b && touch b/b/a.h b/b/b.h b/b/c.h && echo -e '#include
"b/a.h"\n#include "b/b.h"\n#include "b/c.h"' > a.c

#gcc
strace -f -e open,stat gcc -Ia -Ib -nostdinc a.c -E -o/dev/null
  [pid 12] open("a/b/a.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or
directory)
  [pid 12] open("b/b/a.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] open("a/b/b.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or
directory)
  [pid 12] open("b/b/b.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] open("a/b/c.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or
directory)
  [pid 12] open("b/b/c.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] +++ exited with 0 +++

#clang
strace -f -e open,stat clang -Ia -Ib -nostdinc a.c -E -o/dev/null
  stat("a/b", 0x7ffd8d11ac20)             = -1 ENOENT (No such file or
directory)
  stat("b/b", {st_mode=S_IFDIR|0755, st_size=55, ...}) = 0
  open("b/b/a.h", O_RDONLY|O_CLOEXEC)     = 3
  open("b/b/b.h", O_RDONLY|O_CLOEXEC)     = 3
  open("b/b/c.h", O_RDONLY|O_CLOEXEC)     = 3
  +++ exited with 0 +++

Note how clang first determines if the partial path exist, before testing the
full path. This way all open("a/b/... calls can be avoided.

Reply via email to