:>
:> He didn't say this until after the situation had started to degrade.
:>
:> Besides, he's right. 3.x has serious problems.
:
:All running software has serious problems, that's why it is never considered
:done. Taking the time to enumerate specific problems that are currently
:plaguing an installation is the only way anyone can possibly hope to help.
:Problem reports of "It don't work" are helpful to absolutely no one.

This simply isn't true. I have written plenty of software, including
large projects, that has no serious problems; in fact, some of it has no
known problems at all. I have written several operating systems that are
bug-free (that is, there have been no recorded crashes and, except for
feature updates, they have never been rebooted). One of them is at least
as complex as the FreeBSD core, though not as complex if you count
drivers.

FreeBSD can become 'bug free' insofar as anything can become bug free.
You have to believe that it can happen or it won't. I believe it can. My
personal goal for the project is to make the core bug free and
uncrashable (and here I mean only the core with a network and disk
driver, not all the other drivers out there, which would be an
impossible task). Since I've actually *written* bug-free and uncrashable
OS cores, I am confident that it is possible to do with FreeBSD.

Many of the issues relating to FreeBSD's instability and the many bugs
in the core have nothing to do with continuing development work per se.
Instead they stem from two things: an attitude that allows major
pollution to be introduced into the code to optimize very specific
cases (destabilizing the source at the same time), and the lack of
proper documentation within the source code. It is precisely these two
things that I have concentrated on the most: rewriting where necessary,
generalizing optimizations (and ripping quite a few out of the VM system
entirely), and documenting the hell out of any procedure I modify with
succinct comments.

There are two good examples of code pollution, and needless to say they
have been responsible for a huge number of bugs over the years, hundreds
at least. The first example is all the VM hacking that was done to
accommodate partial cache instantiation and, most notably, partial
byte-range writes for NFS. So far this year I have managed to rip about
half of those hacks out at relatively little cost: a few esoteric NFS
write cases will be slower, and buffer cache writing is slightly slower
due to the extra system process, but that is hopefully made up for by
the move to an O(1) algorithm (previously an O(N^2) algorithm).

The second example is the VFS layer implementation and, most especially,
VOP_LOOKUP(). VOP_LOOKUP() has caused no end of trouble, and the VFS
layer implementation as a whole, with all of its locking assumptions and
return requirements, has made filesystem design problematic at best.
There is enormous complexity in the lookup, directory scanning, and VFS
cache code that hides bugs and could be removed with a rewrite.

In general, it is possible to fix these problems, but some of those
fixes require significant rewriting. You have to be willing to rewrite
and take your lumps up front, or you may find new problems turning up in
a subsystem for years to come. The best example of this in my case is
the getnewbuf() code. It was originally optimized with so many 'hacks'
that it created at least half a dozen serious bugs in the system. When I
first rewrote it I encountered a huge amount of resistance from certain
people who believed (wrongly) that rewriting would create more bugs than
it fixed. While a few bugs were introduced (that's the 'taking your
lumps' part), generalizing the code made finding and fixing them much,
much easier, and this will ultimately lead to a better track record down
the road.

I applaud the removal of dead code that has been going on, though I have
major problems with how some of it has been done. Compared to what some
committers have been doing recently, the dead code removal that Alan and
I did in the VM system earlier in the year was a walk in the park. I am
dead set against 'hiding' bugs by trying to cache around them instead of
fixing them, which is essentially the category in which I put most of the
recent changes to procfs and /bin/ps.

It may seem counter-productive, but in order to fix bugs and make the
system stable we actually need bugs to come to light sooner, and in a
manner so blazingly obvious that we can fix them quickly. That is why I
put KASSERT()s all throughout the VM system (which, among other things,
revealed that VM pages were being put on the cache queue while still
dirty and led to a fix for a serious filesystem corruption bug). When I
did that some people screamed at me because they thought it would make
the system unstable, but how many panics have we ever seen from it?
I am happy to see other people start to do the same thing.
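
To make the fail-fast idea concrete, here is a minimal user-space sketch in C. It mimics the calling convention of FreeBSD's KASSERT(exp, msg) macro, where msg is a parenthesized printf-style argument list. The panic() stand-in, struct page, PQ_CACHE, page_cache() and cache_queue_len are hypothetical simplifications that only loosely echo kernel naming; they are not the actual vm_page code. The sketch also assumes that pages on the cache queue may be reclaimed without being written back. Under that assumption, asserting the invariant at the exact point where it could be violated turns silent data loss into an immediate, precise panic. In the real kernel KASSERT is compiled in only with the INVARIANTS option; this sketch always checks.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's panic(): report the message and
 * stop immediately so the bug cannot be missed. */
static void panic(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    fputs("panic: ", stderr);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    va_end(ap);
    abort();
}

/* Same calling convention as FreeBSD's KASSERT(exp, msg): msg is a
 * parenthesized printf-style argument list, e.g. ("bad %d", x).
 * Unlike the kernel version, this sketch is always compiled in. */
#define KASSERT(exp, msg) do {      \
        if (!(exp))                 \
            panic msg;              \
    } while (0)

/* Hypothetical, heavily simplified page and queue; not FreeBSD's vm_page. */
enum page_queue { PQ_NONE, PQ_INACTIVE, PQ_CACHE };

struct page {
    int id;
    int dirty;                /* nonzero: page holds unwritten modifications */
    enum page_queue queue;
};

static int cache_queue_len;   /* number of pages on the cache queue */

/* Move a page onto the cache queue. In this model cached pages may be
 * reclaimed without being written back, so a dirty page here would lose
 * data silently; the assertion turns that into an immediate panic. */
static void page_cache(struct page *m)
{
    KASSERT(m->dirty == 0,
        ("page_cache: page %d is still dirty", m->id));
    m->queue = PQ_CACHE;
    cache_queue_len++;
}
```

A clean page simply lands on the cache queue. Calling page_cache() on a page whose dirty field is set, say page 7, prints "panic: page_cache: page 7 is still dirty" and aborts right at the faulty call. You never have to trace a corrupted file back to its cause weeks later.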

So, I think it *IS* possible to make FreeBSD sufficiently bug-free that
people become 'surprised' when they are able to crash a box running it.

-Matt