On 30.01.2014 23:10, Rob Weir wrote:
On Wed, Jan 29, 2014 at 4:18 AM, Andre Fischer <awf....@gmail.com> wrote:
I would like to report some observations that I made when thinking about how
to make building OpenOffice with one global makefile feasible. It will
probably the last of build related mails in the near future.
Traditional make uses a top-down approach. It starts with a target, 'all'
by default, and looks at its dependencies. When one of those has to be made
or is newer then the target then the target also has to be made. This is
done recursively and depth-first. Every file on which 'all' has a direct or
indirect dependency has to be checked. If we would build OpenOffice with
one makefile (included makefiles don't count) then that are a lot of files
to check. There are about 9800 cxx files, 3500 c files, 12000 hxx files,
and lot of external headers. Checking the modification time of so many
files is one of the reasons for the delay in , say, sw/ between starting
make and its first action.
As I don't have all global dependencies in a format that would allow
experimation, I tried how long it would take to get the mtime (time of last
modification) for all files, source, generated, compiled, linked, about
120000. I wrote a small Java program for that. With a warm cache that
takes about 23s. When run in 4 threads this reduced to less than 8s. Could
be worse.
But it also could be better because in general there are only a few files
modified, usually files that you modified yourself in an editor. There is
another approach, introduced, as far as I know, by the tup [1] build tool,
that is bottom up. If you had something similar to the oracle of complexity
theory, that gave you the list of modified files since the last build, you
could find the depending files much faster. Faster for two reasons.
Firstly, there is only one path in the dependency tree up towards the root
(while there are many down from the root). Only targets on this path are
affected by the modified file. Secondly, the dependency analysis is
comparatively cheap. The expensive part is to determine the file
modification times. If they where miraculously given then even the top-down
approach would not take noticably longer.
So I made another experiment to see if such an oracle can be created. Java
7 has the java.nio.file.WatchService that lets you monitor file
modfifications in one directory. I registered it to all directories in our
source tree (some 16000 directories). With the WatchService in place every
file modification can be recorded and stored for later. On the next build
you only have to check the set of modified files, not all files.
Registering the directory watchers takes a couple of seconds but after that
it does not cause any noticeable CPU activity. Any file modifications are
reported almost at once. I do not have the framework in place to start a
build with this information but I would expect it to be as fast as compiling
the modified files and linking takes.
The tup website references a paper [2] in which the established top-down
approaches are called alpha alogithms and the new bottom-up approach is
called beta algorithm. Tup has implemented a file modification watcher (in C
or C++) only for Linux. On Windows it just scans all files (for which it
needs a little more time than my Java program, maybe it does not use more
than one thread).
This is something that we should keep in mind for when we ever should get a
build solution with global dependencies and this build tool would turn out
to be too slow.
If can find the source code of my Java experiments at [3]. If nothing else
you can see an application of the ForkJoinPool that allowed my to write the
parallel file system scan in just a few lines. There is also an alternative
implementation that uses the ExecutorService (with a fixed thread pool)
which needs a few more lines of code. And there is of course the use of the
WatchService.
Has anyone read this book?
http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620
It was on my list to read for many years. From what I've seen it
suggests design approaches to the improve build times. So things that
go beyond what you can do by just changing build files, more
fundamental changes to how interfaces are defined.
I have not but I might, thanks for the hint.
I agree that improving our software design is always a good idea but I
would not change our design just to make the build faster. Many of our
code related problems are caused by the design and the limitations of
the build system. Examples are the individual building of directories
(in dmake modules) and creation of one library per directory or by the
many ugly tricks that avoid becoming "incompatible" (the need to compile
files in other modules).
Otherwise I wonder if we're trying to optimize a bubble sort?
I would smile about this if I had not seen a handcrafted sort algorithm
in our build scripts that is even inferior than bubble sort :-)
Improving the build system for me is more than just an interesting way
to pass time. It is one of my major pain points (not sure how to phrase
that in proper english). The Windows build takes about three times the
time of the Linux build. The dmake builds in the larger modules (sc,
sd) take so long that when they finish I have sometimes forgotten what
changes I made. The gbuild builds are not entirely reliable (change an
SDI file and make does not compile one file).
-Andre
-Rob
Regards,
Andre
[1] http://gittup.org/tup/
[2] http://gittup.org/tup/build_system_rules_and_algorithms.pdf
[3] http://people.apache.org/~af/test.zip
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org