Re: Insertion - The clojure way

Ken Wesson Wed, 29 Dec 2010 19:16:50 -0800

On Wed, Dec 29, 2010 at 8:13 PM, Mark Engelberg
<mark.engelb...@gmail.com> wrote:
> 2.  Your new insertion function is a reasonable way to get rid of the
> laziness that breaks the function for large inputs.
> Another, slightly clearer and faster way:
> (defn insert [n alon]
>  (loop [alon alon, result []]
>    (cond
>     (empty? alon) (conj result n)
>     (<= n (first alon)) (into result (cons n alon))
>     :else (recur (rest alon) (conj result (first alon))))))


I don't agree that this is necessarily clearer; shorter, yes, but
that's not always the same thing.

> (Also, as you point out, the above could be further sped up with transients).

Can conj be sped up with transients, without some hint as to the
vector's eventual size? I know repeated assoces on a vector are
significantly sped up by using transients.

> However, it should be pointed out that even this latest version
> doesn't share all the performance characteristics of the classic
> functional insertion sort given at the Htdp link.  Specifically, one
> of the (few) great things about insertion sort is that it is blazingly
> fast on input lists that are already sorted, or nearly sorted.  As a
> challenge (to you or Andreas, or anyone who is interested in this
> thread), figure out why the original insertion sort works so quickly
> on sorted lists, why the above code does not, and then try to
> duplicate that performance characteristic with an insertion sort
> algorithm in Clojure.

I'd think the key thing to making it performant for almost-sorted data
is to have a bidirectional linked list with a cursor in it; when
adding a new element you'd move from the current position left or
right until the comparison sign changed and then splice a new node in.
With sorted data it would boil down to just appending nodes at the end
of the list in O(n) and with almost-sorted data there'd be the
occasional excursion back into the list, then forwards again to the
tail.

A big difficulty with this in Clojure is that its seqs don't support
efficient backward traversal while its vectors don't support efficient
in-place insertion. Using a mutable java.util.LinkedList as a
"transient" inside the insertion-sort function makes it easy enough to
do this; the insertion operation starts with a ListIterator, sees if
the element to add is greater-equal or lesser, and then starts walking
the Iterator one way until the sign changes or one end of the list is
hit, then inserts there. Obviously the innards are not functional, but
if the LinkedList is copied into a Clojure list or vector before being
returned, the sort itself can be functional.

In theory this form of insertion sort could also be parallelized; the
concurrent operations on the linked list are all insertions. For this
you'd want to use Clojure refs holding explicit list nodes rather than
a java.util collection. I might try knocking something like this
together later. (Of course, mergesort is eminently parallelizable
after the first pivot is chosen and before the final merge. The final
merge might be, hypothetically, also parallelized with 2 threads
working toward the middle from each end, as well, given again a
bidirectional linked list as the underlying data structure. Naturally,
larger numbers of threads than 2 could only be involved after the
first several pivot selections and before the last several merges.)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Insertion - The clojure way

Reply via email to