Re: Enhanced Primitive Support

Tim Daly Tue, 22 Jun 2010 10:13:17 -0700

We could always take the FORTRAN approach and make
identifiers that start with I through N default to
contagious behavior :-)


Rich Hickey wrote:

On Jun 22, 2010, at 12:44 AM, Mark Engelberg wrote:
The new uber-loop is fantastic.

So I guess the main point still to be finalized is whether the default
arithmetic ops will auto-promote or error when long addition
overflows.

Playing around with the latest equals branch:

user=> (def n 9223372036854775810)
#'user/n

user=> (* (/ n 3) 3)
9223372036854775810N

user=> (* (/ n 2) 2)
java.lang.ArithmeticException: integer overflow

user=> (def x (/ n 4))
#'user/x

user=> (+ x x x x)
9223372036854775810N

user=> (+ (+ x x) (+ x x))
java.lang.ArithmeticException: integer overflow

user=> (range (- n 2) n)
(9223372036854775808N 9223372036854775809N)

user=> (range (- n 3) n)
java.lang.ArithmeticException: integer overflow


I understand exactly why some of these work and some of these don't.
My main point here is to illustrate that without the full numeric
tower supported by the default ops, there can certainly be some
surprises.  There is a "pathway" with the standard ops from longs to
rational numbers to bigints, but you can't cross directly from longs
to bigints.  Similarly, you can roundtrip from bigints to rationals
and back, but can't roundtrip from bigints to longs and back.  So the
results of computations depend on the path you follow.  Maybe we can
live with these idiosyncrasies for the speed benefits, but this is
worth being aware of.

The range example is something that is easily enough fixed.  Probably
if this branch becomes the standard, range should be modified so that
if the *upper bound* is a bigint, inc' is used rather than inc to
generate the range.  But I think this illustrates the kinds of issues
we're headed towards, regardless of which default is chosen -- people
who write libraries will be choosing between overflow and auto-boxing
primitives, and it might not always be clear from documentation what
the consequences are.  In the current implementation of range, it
works perfectly fine with longs, and it works perfectly fine with
bigints, but it breaks when your lower and upper bounds cross the
boundary.  This is exactly the kind of thing that might not be thought
of when making test cases, so errors like this could lurk for quite a
while without being spotted.

The people on the side of overflow-error-as-default feel that these
sorts of runtime errors are no more problematic than the many other
sorts of runtime errors that can result in Clojure, such as an
out-of-bounds exception when accessing a vector.  But I see these
types of errors as very different.  An out-of-bounds exception is easy
enough to prevent -- there is a simple test you can include in your
code to make sure your index is in bounds before you access your
vector.  But I think it's much harder to determine in advance whether
a sequence of computations will "cross the long boundary" for all the
possible inputs.

This is probably the main reason I continue to advocate for
auto-promoting ops as the default.  Error-upon-overflow adds an
element of run-time risk, and requires careful thought and additional
testing to achieve the same level of reliability.  I *want*
error-upon-overflow operations to be slightly harder to use so that
library writers will use them judiciously and consciously, and be very
aware of the extra effort they need to go to test their functions for
all numbers, clearly documenting any restrictions on the kinds of
numbers that are permitted.

Like I said before, Clojure's built-in range can easily be adjusted to
work well for speed *and* handle both longs and bigints gracefully.
But it serves as a good example of how the two defaults will affect
library writers.  If auto-promotion is the default, most library
writers will just use the +,*,-,inc,dec operators and it would work
for all numbers right out of the box.  A library writer who wants to
optimize for speed would have to go to a bit of extra effort to add
the apostrophes, and would hopefully at that point give some careful
thought as to what the consequences will be, catching the fact that
this will break ranges that span from longs to bigints, and adjusting
the code accordingly.  On the other hand, if error-upon-overflow is the
default, this is what most people will use, and we'll end up with a
lot of code that breaks when crossing the long boundary.

I don't use any bigints, or anything even close to overflowing a long,
in the kind of code that I write for work.  If error-upon-overflow
wins as the default, I'll gain performance benefits with no immediate
downside.  But ultimately, I feel that anything that helps me reason
about and trust my code, and helps me trust the robustness of code
written by others that I rely upon, is a principle worth fighting for.
It is precisely for these reasons that the first iteration was based upon BigInteger contagion. Everyone has to understand this is not a black and white, preference based decision. It is a multi-dimensional problem involving promotion, reduction, contagion, loop types, literals, boxing types, interop, equality and more.
For instance, you can't keep the benefits of 'uber loop' and move the default to auto-promoting ops, because auto-promoting ops cannot return primitives. There is no free lunch in this.
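
As a rough illustration of why an auto-promoting op cannot return a primitive, here is a minimal Java sketch of the two behaviours under discussion. It is not the code in the equals branch; the class and method names (OverflowPolicies, addChecked, addPromoting) are hypothetical, and only Math.addExact and BigInteger are real JDK APIs.

```java
import java.math.BigInteger;

final class OverflowPolicies {
    // Error-upon-overflow: the result stays a primitive long,
    // and Math.addExact throws ArithmeticException if the sum overflows.
    static long addChecked(long a, long b) {
        return Math.addExact(a, b);
    }

    // Auto-promotion (hypothetical sketch): the result may be a Long or a
    // BigInteger, so the return type has to be a boxed Number, never a long.
    static Number addPromoting(long a, long b) {
        try {
            return Math.addExact(a, b); // boxed to Long on return
        } catch (ArithmeticException overflow) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        System.out.println(addPromoting(Long.MAX_VALUE, 1L)); // 9223372036854775808
        try {
            addChecked(Long.MAX_VALUE, 1L);
        } catch (ArithmeticException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

The return type is the point: addChecked can hand a long straight back to a primitive loop, while addPromoting has to allocate a box for every result, even when nothing overflows.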
It would be easy, and is likely, to have bigint literals require the N suffix, in that way there is no hiding the use of bigints as you have.
The issues with contagion are as you mentioned previously - performance with smaller numbers and equality. The former can be addressed with a better bigint, the latter only partially addressed by returning to equivalence based equality. Equivalence-based equality is still in play, and impacts contagion, as then (= 42 42N) can be true. More subtly, it impacts the choice of box type for longs-and-smaller. Right now they are 'packed', using Integers when they fit, but this will probably hurt us when escape analysis is in full swing, as it puts a branch in the boxing process and bifurcates the resulting types (Integer and Long). However, always boxing to Long (including on interop boundaries), while yielding highly-consistent box types, sans bigints, opens up a possible box type mismatch when someone in Java says:
someClojureFn.invoke(42); //autobox is Integer
With equivalence based equality, that is less of a problem. But there is still a tradeoff regarding keys, the algorithm of the host requiring .equals() for map keys and set members. There are also hashCode divergences between Longs and BigIntegers of the equivalent value.
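
The box type mismatch can be seen from plain Java, independent of Clojure. The sketch below assumes nothing beyond the JDK; the class name BoxTypeMismatch and the helper lookup are made up for illustration.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

final class BoxTypeMismatch {
    // Looks up a key in a map whose only entry is stored under the boxed Long 42.
    static String lookup(Object key) {
        Map<Object, String> m = new HashMap<>();
        m.put(Long.valueOf(42L), "found");
        return m.get(key);
    }

    public static void main(String[] args) {
        Object asInteger = 42;                  // autoboxes to java.lang.Integer
        Object asLong = 42L;                    // autoboxes to java.lang.Long
        Object asBig = BigInteger.valueOf(42);  // same value, third box type

        System.out.println(asInteger.equals(asLong)); // false
        System.out.println(asLong.equals(asBig));     // false
        System.out.println(lookup(asLong));           // found
        System.out.println(lookup(asInteger));        // null
    }
}
```

Because java.util.HashMap compares keys with .equals(), a value stored under a Long key is not found when the caller passes the same number boxed as an Integer or a BigInteger.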
BigInteger contagion would make all but the range case above work just fine.
-----------
The claim that this primitive stuff is just for numeric-intensive applications is outrageous and false, and ignores the implementation of Clojure itself to an embarrassing degree. I've worked my tail off to reduce the number of allocations inside things like sequences etc to the absolute minimum. Now down to 1 per step, and with chunks 1/32 per step. Moving from 1 to 2 or 3 per step would result in a 2x to 3x slowdown for every consumer of these fns.
Everyone has to realize the math you are advocating for the default, on non-tagged architectures like the JVM and CLR, *must* do an allocation on every +/-/* etc operation. And such ops are littered throughout non-numeric data structure code, for indexes, offsets, bounds etc. Allocating on every math op in something like the persistent vector would make it completely unusably slow.
The languages being pointed to (Ruby, Python, Mathematica) write their hard bits in C, or, for the J versions, Java.
Well, guess what, all of the things I've written in my career have been hard bits. Those languages were unusable for any of my production work, for performance reasons. I wrote Clojure so I could stop writing Java and C#, not Ruby, because I couldn't have used Ruby in the first place.
And, I think some people are considering moving from Ruby to Clojure, for the hard bits of their systems, in part *because* Clojure has a better performance profile, their alternatives being Java or Scala, not Python or Groovy.
Now you're all sitting on the end of the food chain, eating gazelles and saying, 'who needs photosynthesis to be easy'?
;-)
I do. And so do you if you appreciate fast gazelles. You wouldn't be able to use a Clojure written in the default Clojure you are advocating.
Rich

