Re: Knuth's literate programming "tangle" function in Clojure

Tim Daly Thu, 06 Jan 2011 20:07:40 -0800


On 1/6/2011 9:15 PM, Ken Wesson wrote:

I've now had a quick look at this too.

Aside from the grammatical and spelling nitty-gritty, there are some
larger scale concerns, ones that unfortunately go to the heart of the
entire "literate programming" concept.

Namely, at the start it jumps very quickly into a red-black tree
implementation without really motivating it.

Actually the document was released because it contained a complete
working version of Clojure so it was the initial starting point.

I started on the Red Black tree section so I could show an example
of how to reorganize the code presentation so that it fit the
presentation in text rather than fitting the requirements of Java.

Red Black trees are mentioned in one of Rich's lectures as an
implementation detail so I decided to try and reverse engineer
their use.

Clearly there needs to be some discussion about Persistence and
Okasaki and Rich's research references motivating why they were
chosen.

I did this with the section on Bit-partitioned hash tries.
That version has not been published yet.

Now, it mentions wanting a binary search tree, and that a red-black
tree maintains certain invariants that limit how unbalanced it can
get, but the explicit motivation for this (avoiding the linear
worst-case performance of a fully unbalanced tree in favor of a tree
that gives guaranteed logarithmic bounds on its operations, other than
the obviously-linear-best-case full traversal of course).

More significantly, there's no explanation for introducing a binary
search tree in the first place -- and mention of Java but no direct
motivation for not using java.util.TreeMap or similar instead of
rolling our own. There's a brief mention of immutability and
persistence, without the connecting thoughts -- namely we want clojure
to have an immutable sorted-map type, and while we could use a factory
function that spits out (Collections/unmodifiableMap (doto (TreeMap.)
#(doseq [[k v] kvs] (.put % k v)))), every assoc would require copying
the entire map, in linear time and space, etc. etc. etc., vs. using a
"persistent" data structure specifically designed for structure
sharing with modified versions.

And then of course the overarching purpose of Clojure itself isn't
even given, let alone why it has sorted-map, and why it's an immutable
sorted-map ...

There is very little that needs to be said about the overarching
purpose of Clojure for the expected target audience. Since this
includes detailed source code in two languages it is reasonable to
assume that it is being read by a programmer and, more specifically,
it is being read by a programmer who is interested in the details
of Clojure's implementation.

I know, I know, tall order, right? I hope you're at least thinking
about this stuff for the eventual final version, though. (And why
start with the red-black tree, anyway? After introducing Clojure I'd
probably start with getting a minimalist implementation that is
runnable first

The included implementation is runnable. I was not advocating
literate program at the introduction of Clojure 1.0 so the first
version is 1.3 which is not minimalist in any sense.

  -- so the compiler and evaluator and repl and
s-expressions, lists and any other infrastructure these need under the
hood, and whatever else is needed for this -- some of the numerics
stuff, symbols and symbol interning, etc.; then with already a basic
Lisp environment in play, adding the other functions and data
structures in core that weren't needed just to get things up and
running with a basically functional repl.

I wish Clojure was "factored" in terms of the implementation
to suit the I2I (ideas-to-implementation) approach but it isn't.
The literate goal here isn't to rewrite or refactor it but to
explain how the ideas are realized in code. A different goal
would be to factor Clojure in layers but that's someone else's book.

  Maybe sorted-map (or a
persistent immutable binary search tree implemented in Java) is one of
the things that is needed to get the compiler working, but if so, I'd
think a truly literate version is going to explain what Clojure is,
and then how we're gonna implement it, and then details of the planned
compiler implementation, and why this needs a persistent immutable
binary search tree, and how that will also be useful later for making
the user-visible sorted-map and sorted-set features, and THEN get down
into the weeds of how we make a persistent immutable binary search
tree, how we can avoid the linear worst-time behavior of unbalanced
trees by ensuring the tree never gets too far out of balance (and even
what "balanced" means in this context), and that red-black trees are
among several kinds that have such properties, and then why red-black
trees were chosen over, say, two-three trees, etc.

Of course, you might need to ask Rich some questions to get some of
this information.

Rich obviously knows the answers to these questions but he is
a valuable and highly constrained resource. It would be nice if
he provided some motivating material but I think that we, as programmers,
can figure out how bit-partitioning works and how the code implements
it. Every new developer needs to do that to make changes. Literate
source code would decrease the time it takes to get up to speed.
So while it would be nice if Rich contributed I think he has better
things to do.

  It's pretty clear how a persistent immutable
balanced search tree is needed to make sorted-map and sorted-set work
well and in a pure-functional way; it may or may not be obvious on
close examination that a red-black tree is a superior choice over the
alternatives; maybe it isn't and Rich's choice was more or less
arbitrary, or based on what he already knew how to implement or had a
reference handy for; or Rich tried several alternatives and found that
he could make a Java implementation of a persistent, immutable
red-black tree faster than any of the others, or better at
structure-sharing; to really be sure, unless it really does become
obvious on looking into the code itself why, you'd have to ask Rich
why he chose red-black trees for this job. And there will probably be
many other similar points in the code where there's an implementation
choice that may not be obvious without Rich's input. And all of them
will need some explanation, if the program is to be truly literate --
even if there turn out to be cases where that explanation will be
something like "tests showed that X and Y were equally good
alternatives for implementing this, so Rich just flipped a coin and it
came up tails so he used Y". :)


Rich has provided a lot of discussion about his motivations
in his videos. There was a discussion a while back about providing
this same information in textual form and as a result of that
I ended up advocating literate programming.... which led to this.

I am hoping that showing an example of literate Clojure will
motivate others to pick a section (e.g. agents) and work through
the details. Once a section is written it can benefit from the
"many eyes" of open source to improve the style and accuracy
of presentation. In that way we all end up developing a document
that quickly brings up the average quality material for developers.

Tim Daly

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Knuth's literate programming "tangle" function in Clojure

Reply via email to