A recent discussion (``Default software in the base'') suggests using
Clang/LLVM as the system compiler in OpenBSD in the short-term future.
This discussion hasn't really gone anywhere, yet I thought I could waste
bandwidth with my thoughts as the current de-facto compiler maintainer
in OpenBSD.

Mind you, I did not ask to end up maintaining the system compiler in
OpenBSD. I have earned this position because I have had to fix or
work around too many bugs in gcc, as a port maintainer. And I wish I
hadn't needed to do this.



A long time ago, in the first few years of the *BSD projects, the only
free software compiler spanning the various platforms BSD systems were
targeting was gcc. pcc was orphaned, TenDRA still used a cumbersome
build system and did not support enough platforms, and that was about
everything in the free software land.

Also, gcc 2.5 (at the time) had a few bugs, but not many. You could
trust it to produce working code at any optimization level, and forget
about it. In other words: there was no need to put any effort into
maintaining the compiler, because it was (almost) bug-free.

This state of mind was valid up to the 2.7 days. The accepted wisdom was
that -O2 was supposed to be followed by -fno-strength-reduce, because
2.7 had bugs in the strength reduction code, which mostly affected i386
code. And then you could trust the compiler.

And then C++98 came out, as well as C99, and it was time for serious
work in gcc, if only to attempt to support the new features of these
standards. One may remember the schism between gcc 2.8, conservative but
trying to catch up with C++98, and the ``Pentium gcc'' group, attempting
to produce faster code by stretching the optimizer code beyond its
limits.

These projects eventually merged as gcc 2.95. From then on, a few things
changed forever:
- many more people were working on specific optimizations
- these optimizations, unlike the 2.5/2.7 optimizer, were no longer
  ``almost platform-independent'', but would benefit from
  particularities of the target platform, leading to more code
  attempting to decide whether a given optimization recipe was worth
  applying or not.

As an unavoidable consequence of this, something very important in the
world order changed: gcc had bugs, and you were expected to accept that
and cope with them.



When I write `gcc', you can read `the compiler'. As Arthur C. Clarke
would have said, ``any sufficiently optimizing compiler is
indistinguishable from magic.''



So what does this tale teach us?



First, compilers are fragile. While one would like to expect a minimum
level of correctness and trustworthiness from a modern compiler, we
can't, regardless of the compiler we use.


Second, compilers are a moving target. Architectures without enough
testers and developers start misbehaving (because they are the only ones
to subtly break assumptions of the newly added optimization passes, yet
95% of the time end up producing working code, after all), and
eventually get dropped. The prime example of this has been m88k, which
got broken in gcc 2.95 because a target-specific macro suddenly
needed its arguments to be brace-protected, and no one had fixed the m88k
backend because no one had tested it or cared.



This is the reason why OpenBSD ships with different compilers, depending
upon the platform you are running OpenBSD on: a given release of gcc
might not be suitable on a given, less popular, platform (which is not
surprising for gcc since, due to benchmark^Wcompetition with other
compilers, from gcc 3 onwards, the gcc developers have been eager to
release ``bug free'' new versions by enforcing a policy that only
``regressions'' would get fixed, and spending more time changing their
definition of ``regression'' or trying to explain why regressions
weren't, so as not to need to fix them in any stable release). And it is
very unfortunate that gcc 2.95 does not completely implement C99, for we
would have happily kept it for the older platforms, those which are not
supported, or fubar (does it make any difference?), with later versions.



Switching from gcc to clang is worth considering, and the truth is that some
developers have been tinkering with that idea. This is something that
may (and probably will) happen on some platforms (since llvm does not
support as many platforms as OpenBSD does); but switching a subset of
OpenBSD's supported platforms is not a trivial task, and a lot of work
needs to happen first (such as replacing libgcc with compiler-rt, and
porting it to the missing platforms).

And if/when such a switch happens, bugs will trigger and problems will
need fixing; and we cannot risk being naive enough to expect llvm
developers to handle bug reports and bugfix releases any better than the
gcc developers do (although we hope they will).

If the upstream developers fail to deliver, it's up to us to fix
or work around compiler problems as we encounter them; sometimes it's as
easy as finding out which patch has been committed upstream, but not
backported to the version we use; and sometimes it's a genuine issue
which may or may not have been reported in the latest compiler version,
and we are on our own. When this happens, we can only rely upon our
developer skills and intimacy with the compiler.

A few of our developers have, over the years, become unafraid of gcc,
and able to investigate issues, backport fixes, and fix or work around
bugs: I'll only mention niklas@, espie@, etoh@ and otto@, and hope the
few others will forgive me for not listing their names. This has not
been an easy road, to say the least. Now, another few of our developers
are working on building a similar knowledge of llvm. I wish them a lot
of luck, and I will try to join them in the near future.

In the meantime I am not sure they feel confident enough to support
switching the most popular OpenBSD platforms from gcc to llvm.

In a few months or years from now, things will be different...



...but there is something I wish would happen first.

An LTS release of an open source compiler.

Because all compilers nowadays are full of subtle bugs, so many of
them that you can't avoid them as soon as you compile any nontrivial
piece of code, and because we can't afford to go back to assembly, we
need a compiler we can trust.

Both GCC and LLVM have Fortune 500 companies backing them, paying
smart developers to work full-time on these projects.

Yet none of them dares to provide a long-term support version. Bugs in
version N are fixed in version N+1, but new bugs are introduced. And
no one cares about trying to settle things down and produce a compiler
one can trust (because version N+1 runs 3.14% faster in the loonystones
benchmark which doesn't match any real life use case). Who cares?
Tomorrow's compiler will generate code which will complete an infinite
loop in less than 5 seconds; stay tuned for more accomplishments!



The free software world needs an LTS compiler. The last de facto LTS
compiler we have had was gcc 2.7.2.1, and it is too old to compile
modern C and C++ code.



Should a free software LTS compiler appear (be it a gcc fork, or an llvm
fork, or something else), then OpenBSD would consider using it, very
seriously. And we probably wouldn't be the only free software project
doing so.



Miod (a.k.a. ``Don Quixote de La Compiladora'')
