I would be interested in knowing more about both Dan Bernstein's
and Mark Crispin's views about how software should be engineered,
e.g. pointers to web pages with their views boiled down to the
salient points. (Someday I intend to investigate at least Dan's
views for myself via web/USENET searches and such, but that'll wait
until I can really devote enough time to start and finish such a
search...so, in the meantime, URLs to summaries would be wonderful.)
From my (very limited) perspective, both Mark and Dan eschew some of
the "common wisdom" about Unix and related matters, and both have some
good, or at least plausible, reasons. For example, I recall Mark
explaining to me various reasons why MIT's ITS operating system was
superior to DEC's TOPS-10 back around 1975 -- he claimed the
dispatch method of ITS' kernel-call-from-user-program (to use a bit
of recognizable modern lingo; back then, it was "UUO", IIRC) was
actually *faster* than TOPS-10's, despite the fact that ITS' API was
(sixchar-)name-based vs. TOPS-10's being number-based (aka an offset
into what one would assume was a table of jump targets), making the
advantage of ITS' early version of dynamic linking even more useful.
More recently, ISTR seeing Mark complain about Unix reliability
and scalability, especially vis-a-vis NFS (locking and all that),
perhaps on USENET.
Looking at what Dan's done with qmail, he seems to have seen at least
some of the same problems (or, call them "tradeoffs" if that offends
you) and, like Mark, taken a similarly critical (by which I mean
objective, not necessarily gratuitously insulting) view of various
commonly accepted components of the general Unix programming model
and methodology.
(My vague impression is that Mark has addressed some of the problems
by promoting other OSes and/or developing his own, while Dan has
developed more robust Unix tools -- both new ones and new versions of
existing ones -- so as to improve Unix's foundation. In the abstract,
I appreciate both approaches, and assume each of their efforts in
these directions has been at least somewhat beneficial, in terms of
education if not actual product deployment.)
So, while I can accept that Mark and Dan don't like each other (and
won't comment on why I don't find that surprising on either side of
the equation, since lots of people don't like me either ;-), and while
I wouldn't be surprised to find that they have fundamental disagreements
about what's right or wrong about the predominant Unix-etc. software-
development model, it seems to me they might well have points of strong
*and* important agreement, which I'd someday like to explore.
This has become more important to me since I "discovered" qmail and
then went to Linux Expo and learned about things like the IBM Jikes
compiler. The upshot has been that I've pretty much decided to learn
to write (useful) software *correctly*, or not at all (maybe do music
or other stuff instead). That's the thrust behind my recent decision
to stop working on g77, or GCC for that matter -- there are plenty of
people (as I discovered at Linux Expo) who are willing and enthusiastic
to step in and contribute to the GNU/Linux pool of software, for
which fundamental (vs. distribute/collect-bug-reports/debug/fix/repeat)
correctness is barely in the top five goals, much less #1. So
the lack of *willing* developers of open-source software is no longer
the problem (if it ever was) -- it's the lack of developers of *high-
quality* open-source software, or, more precisely, the acceptability
of low-quality solutions, that seems to be the problem.
Problem is, I have basically no background in writing correct software,
or even in determining its correctness. (I don't count code reviews
as anything other than modestly helpful components of these activities.
I've had a thorough code-review done of one of my products by two
more seasoned professionals. It was a great experience, but, after
all their concerns were addressed, there were still bugs. Sure, not
"too many" according to management, but still more than I'm sure could
have been designed out of existence up front if I'd known how.)
Oh, sure, like so many of us, I've managed to write lots of software that,
after sufficient debugging, testing, and tweaking, worked well enough
that many people claimed it was "correct", when really it was just "good
enough". But the science of writing software so that, from the outset,
it's highly likely to work correctly without a lengthy debugging
process is still being revealed to me. (Think of what it's taken to get
qmail to not have security holes, vs. sendmail. I'd like to be able to
write something, e.g., an optimizing compiler, that matches qmail's
record in this area, much as GCC today matches sendmail's. Imagine an
open-source compiler with the #1 goal of never generating incorrect
code, #2 of never crashing, #3 of making the code run fast, or similar,
with stuff like supporting whatever extensions Linux programmers want
today further down the list than those three.)
Though I've sometimes disagreed with, or doubted, claims by various
"enthusiasts" like Mark and Dan over the years, I've usually kept
their claims, or at least the salient points as I've seen them, in
mind while observing how I've done my software (also tech-writing) work
to see if they had a point after all. Often enough, I've gained
new insight on an old claim while working on a program, especially
when fixing a bug (maybe the Nth bug of its "class", a class distinction
some enthusiast's "old" point helped me to make), and realized that
they *did* have a valid point. (My perspective on good language
design, for example, has changed vastly. What people say makes C++
and Perl such good languages today is exactly the sort of stuff I
said was important back in the late '70s. Now I know how wrong it
is, when it comes to writing correct software.)
The upshot is, I have a lot more to learn than I ever believed I did
back when I took my first full-time job programming in 1978, and I
see in products like qmail and people like Dan some indication of the
sort of stuff I have to learn, understand, and be able to either put
into practice or recognize people who can (in case I want to hire
them).
Especially if I undertake some long-cherished plans to research and
design (from scratch if necessary) a new OS that could actually be
useful for more than just 10 years and more than just a few million
people, I had better understand some of the fundamental aspects of
OS design that contribute to, or perhaps hinder, the development
and deployment of quality applications.
Some salient summaries of the issues Dan and others, like Mark, have
wrestled with, like reliability, scalability, portability, security,
and so on, could be valuable resources to someone like myself, who'd
like to at least "take a breather" (in the midst of what seems like
pell-mell writing of marginally acceptable code worldwide) and figure
out how it *should* be done before we resume adding to the problem(s).
(Yes, I'm likely to collect pointers to such documents on my own
web page, if I don't find such a resource on someone else's first,
or until I do find such a site.)
tq vm, (burley)
P.S. Just so it's clear, I'm focusing on writing correct *open-source*
software. That's at least plausibly a very different task than writing
some sorts of closed-source, especially one-shot, software. It requires
writing software that is difficult for newcomers to the *source code*
to misunderstand sufficiently that they think they know what they're doing
when they change it, and in doing so inadvertently break it. It's
impossible to meet that standard 100%, even in isolation, and clearly
it's more important to have the software work when deployed *today* than
to be sure some fool won't break tomorrow's version during development.
But I want to stress that I'm not talking about building conceptually
wonderful, crystalline monoliths that shatter the first time someone
else comes along and modifies them. In fact, I'm talking about building
software in such a way that the "crystalline" portions of it are nearly
as small as possible, though exactly what that means, I still don't
entirely know.
As a trivial example, which only approximates a real-world one, I'll
"pick on" qmail. Assume it's open source (I guess it isn't, but never
mind). I see discussions of the bounce message -- "can I change it?"
In my view, the code that *produces* the bounce message should be
distinct from a *description* of an acceptable bounce message. Part
of building qmail should include checking the bounce message that'll
be produced against that description, and rejecting the code (just as
if there were a compile-time error) if there's a mismatch. Maybe that's
even how it's designed, I don't know, and, in isolation, it's probably
not tremendously important. (Commentary in the code is a simple, if not
100% reliable, way to address such problems. Its reliability can still
exceed that of mechanical solutions such as I've proposed, especially
if the code-development infrastructure doesn't directly support and
endorse their use.)
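To make the idea concrete, here's a minimal sketch of such a build-time
check, in Python. Everything here -- the bounce-message template, the
description patterns, the function names -- is invented for
illustration; qmail's actual code and messages are not like this.

```python
# Sketch: the code that *produces* the bounce message is kept distinct
# from a *description* of an acceptable bounce message, and a check run
# at build time rejects the code if the two disagree.
import re
import sys

# The producing code (hypothetical; stands in for the real generator).
def make_bounce(recipient, reason):
    return (
        "Hi. This is the mail delivery program.\n"
        f"I wasn't able to deliver your message to {recipient}.\n"
        f"Reason: {reason}\n"
    )

# The description: patterns that must appear, in order, in any
# acceptable bounce message.  This lives apart from the producer.
DESCRIPTION = [
    r"mail delivery",
    r"wasn't able to deliver",
    r"Reason: .+",
]

def check_bounce():
    """Run from 'make'; a mismatch stops the build like a compile error."""
    sample = make_bounce("<user@example.org>", "connection timed out")
    pos = 0
    for pattern in DESCRIPTION:
        m = re.search(pattern, sample[pos:])
        if m is None:
            sys.exit(f"bounce message fails description: missing {pattern!r}")
        pos += m.end()  # require patterns to match in order

if __name__ == "__main__":
    check_bounce()
```

The point isn't the mechanism (regexps here, but it could be anything);
it's that changing the producer without changing the description, or
vice versa, fails at build time rather than in deployment.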
But I've noticed that when I take such issues more seriously in designing
software, then, when other people work on it, the results of their failure
to completely grasp what I was doing -- say, forgetting to change all
three source files to reflect something new they're adding -- are more
likely to be up-front, rather than way-down-stream, failures. (BTW, "other
people" includes myself, an hour, a day, a week, even years later. I no
longer take any enjoyment in writing software with unique architectural
concepts that I have to memorize to continue working on it.)
And that not only saves lots of time for developers, it can save end
users the trouble of tracking down apparent bugs on their end only to
find they're "upstream" bugs.
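One simple way to get that up-front failure is to keep a single source
of truth and check every derived table against it when the program
starts (or builds). A minimal Python sketch, with all names invented:

```python
# Sketch: one table (COMMANDS) is the single source of truth; the
# handler and help tables are the "other source files" someone might
# forget to update.  The consistency check fails immediately, not when
# a user happens to exercise the forgotten entry way downstream.
COMMANDS = ["deliver", "bounce", "defer"]

HANDLERS = {
    "deliver": lambda msg: ("delivered", msg),
    "bounce":  lambda msg: ("bounced", msg),
    "defer":   lambda msg: ("deferred", msg),
}

HELP_TEXT = {
    "deliver": "attempt immediate delivery",
    "bounce":  "return the message to its sender",
    "defer":   "queue the message for a later attempt",
}

def check_consistency():
    """Verify every derived table covers exactly the known commands."""
    for name, table in (("HANDLERS", HANDLERS), ("HELP_TEXT", HELP_TEXT)):
        missing = [c for c in COMMANDS if c not in table]
        extra = [c for c in table if c not in COMMANDS]
        if missing or extra:
            raise AssertionError(
                f"{name} out of sync: missing={missing} extra={extra}")

check_consistency()  # run at import: an errant change can't even deploy
```

Add a fourth command to COMMANDS but not to HELP_TEXT, and the program
refuses to start at all -- the failure surfaces before any user sees it.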
What I have in mind, though, is much more comprehensive than such
cross-checking. That's just an illustration of what I mean by
"crystalline" -- you change one thing and not another, and the system
(as a whole, through deployment) breaks or even shatters. If the
system doesn't even let you *deploy* the (errant) change in the first
place, and especially if it appropriately leads you to better understand
what you're doing, the system as a whole should, at least in theory,
end up being more robust.