We could always take the FORTRAN approach and make
identifiers that start with I through N default to
contagious behavior :-)
Rich Hickey wrote:
On Jun 22, 2010, at 12:44 AM, Mark Engelberg wrote:
The new uber-loop is fantastic.
So I guess the main point still to be finalized is whether the default
arithmetic ops will auto-promote or error when long addition
overflows.
Playing around with the latest equals branch:
user=> (def n 9223372036854775810)
#'user/n
user=> (* (/ n 3) 3)
9223372036854775810N
user=> (* (/ n 2) 2)
java.lang.ArithmeticException: integer overflow
user=> (def x (/ n 4))
#'user/x
user=> (+ x x x x)
9223372036854775810N
user=> (+ (+ x x) (+ x x))
java.lang.ArithmeticException: integer overflow
user=> (range (- n 2) n)
(9223372036854775808N 9223372036854775809N)
user=> (range (- n 3) n)
java.lang.ArithmeticException: integer overflow
I understand exactly why some of these work and some of these don't.
My main point here is to illustrate that without the full numeric
tower supported by the default ops, there can certainly be some
surprises. There is a "pathway" with the standard ops from longs to
rational numbers to bigints, but you can't cross directly from longs
to bigints. Similarly, you can roundtrip from bigints to rationals
and back, but can't roundtrip from bigints to longs and back. So the
result of a computation depends on the path you follow. Maybe we can
live with these idiosyncrasies for the speed benefits, but this is
worth being aware of.
The range example is something that is easily enough fixed. Probably
if this branch becomes the standard, range should be modified so that
if the *upper bound* is a bigint, inc' is used rather than inc to
generate the range. But I think this illustrates the kinds of issues
we're headed towards, regardless of which default is chosen -- people
who write libraries will be choosing between overflow and auto-boxing
primitives, and it might not always be clear from documentation what
the consequences are. In the current implementation of range, it
works perfectly fine with longs, and it works perfectly fine with
bigints, but it breaks when your lower and upper bounds cross the
boundary. This is exactly the kind of thing that might not be thought
of when making test cases, so errors like this could lurk for quite a
while without being spotted.
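As a rough illustration of that boundary-crossing failure on the host side, here is a minimal Java sketch. It is not how Clojure's range is implemented; the class and method names are made up for this example, and it only assumes that an error-on-overflow step behaves like Math.incrementExact while a promoting step behaves like BigInteger.add.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Clojure's actual range implementation.
final class RangeBoundaryDemo {

    // Steps with a checked long increment. Throws ArithmeticException if the
    // counter would have to pass Long.MAX_VALUE before reaching `end`.
    static List<BigInteger> checkedRange(long start, BigInteger end) {
        List<BigInteger> out = new ArrayList<>();
        long i = start;
        while (BigInteger.valueOf(i).compareTo(end) < 0) {
            out.add(BigInteger.valueOf(i));
            i = Math.incrementExact(i); // error-on-overflow step
        }
        return out;
    }

    // Steps with BigInteger arithmetic, so the counter can cross the boundary.
    static List<BigInteger> promotingRange(long start, BigInteger end) {
        List<BigInteger> out = new ArrayList<>();
        BigInteger i = BigInteger.valueOf(start);
        while (i.compareTo(end) < 0) {
            out.add(i);
            i = i.add(BigInteger.ONE); // promoting step
        }
        return out;
    }

    public static void main(String[] args) {
        BigInteger n = new BigInteger("9223372036854775810");
        // Lower bound fits in a long, upper bound does not.
        System.out.println(promotingRange(Long.MAX_VALUE, n));
        try {
            checkedRange(Long.MAX_VALUE, n);
        } catch (ArithmeticException e) {
            System.out.println("checked range failed: " + e.getMessage());
        }
    }
}
```

Both versions behave identically as long as every value stays within long range, which is why a test suite that never straddles the boundary would not notice the difference.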
The people on the side of overflow-error-as-default feel that these
sorts of runtime errors are no more problematic than the many other
sorts of runtime errors that can result in Clojure, such as an
out-of-bounds exception when accessing a vector. But I see these
types of errors as very different. An out-of-bounds exception is easy
enough to prevent -- there is a simple test you can include in your
code to make sure your index is in bounds before you access your
vector. But I think it's much harder to determine in advance whether
a sequence of computations will "cross the long boundary" for all the
possible inputs.
This is probably the main reason I continue to advocate for
auto-promoting ops as the default. Error-upon-overflow adds an
element of run-time risk, and requires careful thought and additional
testing to achieve the same level of reliability. I *want*
error-upon-overflow operations to be slightly harder to use so that
library writers will use them judiciously and consciously, and be very
aware of the extra effort they need to go to test their functions for
all numbers, clearly documenting any restrictions on the kinds of
numbers that are permitted.
Like I said before, Clojure's built-in range can easily be adjusted to
work well for speed *and* handle both longs and bigints gracefully.
But it serves as a good example of how the two defaults will affect
library writers. If auto-promotion is the default, most library
writers will just use the +, *, -, inc, dec operators and they would work
for all numbers right out of the box. A library writer who wants to
optimize for speed would have to go to a bit of extra effort to add
the apostrophes, and would hopefully at that point give some careful
thought as to what the consequences will be, catching the fact that
this will break ranges that span from longs to bigints, and adjusting
the code accordingly. On the other hand, if error-on-overflow is the
default, this is what most people will use, and we'll end up with a
lot of code that breaks when crossing the long boundary.
I don't use any bigints, or anything even close to overflowing a long,
in the kind of code that I write for work. If error-upon-overflow
wins as the default, I'll gain performance benefits with no immediate
downside. But ultimately, I feel that anything that helps me reason
about and trust my code, and helps me trust the robustness of code
written by others that I rely upon, is a principle worth fighting for.
It is precisely for these reasons that the first iteration was based
upon BigInteger contagion. Everyone has to understand this is not a
black and white, preference based decision. It is a multi-dimensional
problem involving promotion, reduction, contagion, loop types,
literals, boxing types, interop, equality and more.
For instance, you can't keep the benefits of 'uber loop' and move the
default to auto-promoting ops, because auto-promoting ops cannot
return primitives. There is no free lunch in this.
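The constraint can be sketched in plain Java. This is only an illustration with made-up names, not Clojure's arithmetic code: a checked add can keep a primitive long return type, while an add that may promote has to return an object reference, because its result is sometimes a Long and sometimes a BigInteger.

```java
import java.math.BigInteger;

// Hypothetical illustration only; not Clojure's actual arithmetic code.
final class PromotingAddDemo {

    // Error-on-overflow: the result is a primitive long, or an exception.
    static long checkedAdd(long a, long b) {
        return Math.addExact(a, b);
    }

    // Auto-promoting: the result may be a Long or a BigInteger, so the
    // return type cannot be a primitive and the long case gets boxed.
    static Number promotingAdd(long a, long b) {
        try {
            return Math.addExact(a, b); // autoboxed to Long
        } catch (ArithmeticException overflow) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        System.out.println(promotingAdd(1, 2).getClass().getSimpleName());
        System.out.println(promotingAdd(Long.MAX_VALUE, 1).getClass().getSimpleName());
        try {
            checkedAdd(Long.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("checked add failed: " + e.getMessage());
        }
    }
}
```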
It would be easy, and is likely, to have bigint literals require the N
suffix; that way there is no hiding the use of bigints as you have.
The issues with contagion are as you mentioned previously -
performance with smaller numbers and equality. The former can be
addressed with a better bigint, the latter only partially addressed by
returning to equivalence based equality. Equivalence-based equality is
still in play, and impacts contagion, as then (= 42 42N) can be true.
More subtly, it impacts the choice of box type for longs-and-smaller.
Right now they are 'packed', using Integers when they fit, but this
will probably hurt us when escape analysis is in full swing, as it
puts a branch in the boxing process and bifurcates the resulting types
(Integer and Long). However, always boxing to Long (including on
interop boundaries), while yielding highly-consistent box types, sans
bigints, opens up a possible box type mismatch when someone in Java says:
someClojureFn.invoke(42); //autobox is Integer
With equivalence based equality, that is less of a problem. But there
is still a tradeoff regarding keys, the algorithm of the host
requiring .equals() for map keys and set members. There are also
hashCode divergences between Longs and BigIntegers of the equivalent
value.
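For anyone who wants to see the host behavior directly, here is a small Java sketch of the key and hashCode issues. The class and method names are invented for the example; the behavior shown is that of java.util.HashMap, Long, Integer and BigInteger, not of any Clojure collection.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; uses plain java.util collections.
final class BoxTypeDemo {

    // A map whose single key was boxed as a Long.
    static Map<Object, String> longKeyedMap() {
        Map<Object, String> m = new HashMap<>();
        m.put(42L, "answer");
        return m;
    }

    // True when Long and BigInteger agree on the hash of the same value.
    static boolean sameHash(long v) {
        return Long.valueOf(v).hashCode() == BigInteger.valueOf(v).hashCode();
    }

    public static void main(String[] args) {
        Map<Object, String> m = longKeyedMap();
        // An int literal autoboxes to Integer, and Integer.equals(Long) is
        // false, so the lookup misses even though the value is the same.
        System.out.println(m.get(42));
        System.out.println(m.get(42L));
        // The hash codes happen to agree for some values and not for others.
        System.out.println(sameHash(42L));
        System.out.println(sameHash(-1L));
    }
}
```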
BigInteger contagion would make all but the range case above work just
fine.
-----------
The claim that this primitive stuff is just for numeric-intensive
applications is outrageous and false, and ignores the implementation
of Clojure itself to an embarrassing degree. I've worked my tail off
to reduce the number of allocations inside things like sequences etc
to the absolute minimum. Now down to 1 per step, and with chunks 1/32
per step. Moving from 1 to 2 or 3 per step would result in a 2x to 3x
slowdown for every consumer of these fns.
Everyone has to realize the math you are advocating for the default,
on non-tagged architectures like the JVM and CLR, *must* do an
allocation on every +/-/* etc operation. And such ops are littered
throughout non-numeric data structure code, for indexes, offsets,
bounds etc. Allocating on every math op in something like the
persistent vector would make it completely unusably slow.
The languages being pointed to (Ruby, Python, Mathematica) write their
hard bits in C, or, for the J versions, Java.
Well, guess what, all of the things I've written in my career have
been hard bits. Those languages were unusable for any of my production
work, for performance reasons. I wrote Clojure so I could stop writing
Java and C#, not Ruby, because I couldn't have used Ruby in the first
place.
And, I think some people are considering moving from Ruby to Clojure,
for the hard bits of their systems, in part *because* Clojure has a
better performance profile, their alternatives being Java or Scala,
not Python or Groovy.
Now you're all sitting on the end of the food chain, eating gazelles
and saying, 'who needs photosynthesis to be easy'?
;-)
I do. And so do you if you appreciate fast gazelles. You wouldn't be
able to use a Clojure written in the default Clojure you are advocating.
Rich