Dealing with numeric ranges is error-prone, so I welcome these. Another operation on ranges is membership. This is easy to derive from the clamp methods, I think:
```
checkRange(x, min, max) := clamp(x, min, max) == x
```

That idiom works even for NaNs, which are never in any range. I wonder about having a separate method for this. The idiom does fail for empty ranges (where clamp throws), so it's almost worth making a method that returns false for empty ranges instead of throwing. Probably not, though.

But I think there is no harm, and real benefit, in widening the value parameter, so that value is always either long or double. That will save casts (which also cause bugs) in some use cases, such as doing 32x32-to-64-bit arithmetic followed by overflow checks and/or saturation. That's a common use case for clamping.

```
public static int clamp(long value, int min, int max)
public static long clamp(long value, long min, long max)
public static double clamp(double value, double min, double max)
public static float clamp(double value, float min, float max)
```

I wonder if there is a use case for these methods (either clamp or the range-membership method) with 64-bit overflow detection for complex expressions. The idea is that you compute the expression in two domains, double and long. The double domain correctly tracks magnitude (but not precision), while long tracks 64 bits of precision. At the end, if the double value is in the range representable by longs, you take the long value, with full confidence.

```
long x = …, y = a*x*x + b*x + c;
double xf = x, yf = a*xf*xf + b*xf + c;
double yc = clamp(yf, Long.MIN_VALUE, Long.MAX_VALUE);
boolean saturated = false;
if (yc != yf) {       // error processing
    saturated = true; // or throw an exception
    y = (long) yc;    // saturate towards the required magnitude
}
```

That use case doesn't require a membership operation per se, just clamp. I also thought about cross-domain clamps, which preserve 64-bit precision and double range at the same time in different places:

```
public static long clamp(long value, double min, double max)
public static long clamp(double value, long min, long max)
```

Those are perhaps overly subtle.
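As an aside, the two-domain overflow check can be written as a self-contained sketch. This is only an illustration under my own assumptions: the class and method names are mine, the coefficients are arbitrary, and the local `clamp` is a stand-in for the proposed method, built from `Math.min`/`Math.max`.

```java
// Sketch of the two-domain overflow check: evaluate a*x*x + b*x + c in
// both long (exact 64-bit precision) and double (correct magnitude),
// then trust the long result only if the double stayed in long range.
public class TwoDomainCheck {
    // Stand-in for the proposed clamp(double, double, double).
    static double clamp(double value, double min, double max) {
        return Math.min(Math.max(value, min), max);
    }

    // Returns the exact result when it fits in a long, otherwise
    // saturates to Long.MIN_VALUE or Long.MAX_VALUE.
    static long evalSaturating(long a, long b, long c, long x) {
        long y = a * x * x + b * x + c;               // may wrap silently
        double xf = x, yf = a * xf * xf + b * xf + c; // tracks magnitude, not precision
        double yc = clamp(yf, (double) Long.MIN_VALUE, (double) Long.MAX_VALUE);
        if (yc != yf) {
            return (long) yc; // overflowed: the cast saturates toward the required magnitude
        }
        return y;             // in range: the long result is trustworthy
    }

    public static void main(String[] args) {
        System.out.println(evalSaturating(1, 0, 0, 1L << 20));        // fits: prints 1099511627776
        System.out.println(evalSaturating(1, 0, 0, Long.MAX_VALUE));  // overflows: prints Long.MAX_VALUE
    }
}
```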
I think it would be OK for nearly all use cases to cast the min/max values to the appropriate type (the same as value) and then call one of the clamp functions originally proposed. Note that (long)(double)Long.MAX_VALUE is the same as Long.MAX_VALUE, and the same is true for MIN_VALUE, so these critical parameters would be processed as the same limits by both the regular and the cross-typed overloadings.

Now for another topic. There is another common range operation which is very often a source of bugs, and I'd like us to consider it for this group. That is the midpoint operation, naively coded as (min+max)/2, more subtly as min+(max-min)/2, and even more subtly as what our friend Mr. Stackoverflow recommends, (min+max)>>>1. All three of these idioms hide bugs for some values of min or max. The fully correct idiom (for two's complement numbers) is:

```
midpoint(min, max) := min + ((max - min) >>> 1)
```

The subexpression max-min never overflows when it is regarded as an unsigned number, and the ">>>1" immediately converts that unsigned number into the correctly signed positive number. That's the only way to avoid overflow for all inputs. The third idiom above (the "smartest" of the three) will fail, for example, if you feed it negative values. For some uses that is OK, but for others it isn't.

```
public static int midpoint(int min, int max)
public static long midpoint(long min, long max)
public static double midpoint(double min, double max)
public static float midpoint(float min, float max)
```

All kinds of divide-and-conquer algorithms need a robust midpoint operation, and many of them are coded wrongly, because affine ranges are hard to reason about. The int and long midpoints are more useful than the floating-point ones, because ints and longs are used as indexes into arrays (including Panama native arrays), where subdivision and binary search are common (and very often buggy).
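To make the failure mode concrete, here is a small runnable comparison of the naive idiom against the correct one, near the top of the int range. The class and method names are mine, for illustration only.

```java
// Compares the naive midpoint (min+max)/2 with the overflow-safe idiom
// min + ((max - min) >>> 1) on values near Integer.MAX_VALUE.
public class Midpoints {
    static int naiveMidpoint(int min, int max) {
        return (min + max) / 2;           // min + max can wrap around to a negative value
    }

    static int midpoint(int min, int max) {
        return min + ((max - min) >>> 1); // max - min is exact as an unsigned number
    }

    public static void main(String[] args) {
        int min = Integer.MAX_VALUE - 10, max = Integer.MAX_VALUE;
        System.out.println(naiveMidpoint(min, max)); // prints -6: the sum wrapped
        System.out.println(midpoint(min, max));      // prints 2147483642, the true midpoint
        System.out.println(midpoint(-10, 10));       // prints 0: negative inputs are fine too
    }
}
```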
More generally, this midpoint operation performs an affine weighted sum within the given numeric range, so one might think that a floating-point "alpha" parameter would be interesting. The midpoint operations hardwire alpha=0.5, and they use a clever fast fixed-point division to divide by 2. There are applications for alphas other than 0.5, such as a 10x subdivision of a range using alpha=0.1, 0.2, and so on. I don't recommend an "affine midpoint" method, because I think the clamp methods combined with floating-point math can get the right answers. For example, given a 64-bit range, you can use double arithmetic to compute affine inner points and then clamp them, like this:

```
final double ALPHA = 0.1; // float would be OK too
final long min = …, max = …;
final double incr = ((double)max - min) * ALPHA;
for (int i = 1; i < 10; i++)
    addPoint(clamp((long)(min + i*incr), min, max));
```

Doing the grid measurements in double (instead of long) is good enough for most purposes, and doesn't cause anomalies even if the difference between min and max is large. Using a long for the difference would lead to bugs when max-min cannot be represented as a long.

Anyway, the above examples tell me that we don't need a general affine range splitter, if we can use floating point to compute the splits and clamp to prevent bugs at the range boundaries. But we do need the optimized midpoint operations, because people always insist on them taking a couple of cycles, and then they insist on coding them dangerously.

Besides clamp and midpoint, is there another operation on affine ranges that is commonly coded, and with bugs? I don't think so. Range algebra such as union, intersection, difference, and convex hull comes to mind, but I don't think those operations are as bug-prone (or as frequent) as clamp and midpoint. Range translation (moving a range to another base point) is subject to overflow, but I think that can be handled by using higher precision and/or overflow checks and/or clamping.
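For the record, here is a self-contained version of that grid-subdivision sketch, with the elided min/max filled in by arbitrary illustrative values and the points collected into a list. The names are mine, and the local `clamp` stands in for the proposed clamp(long, long, long).

```java
import java.util.ArrayList;
import java.util.List;

// Splits [min, max] into tenths using double arithmetic for the increment,
// clamping each computed point back into the range.
public class GridSplit {
    // Stand-in for the proposed clamp(long, long, long).
    static long clamp(long value, long min, long max) {
        return Math.min(Math.max(value, min), max);
    }

    static List<Long> innerPoints(long min, long max, double alpha) {
        // Double difference: well-behaved even when max - min overflows long.
        double incr = ((double) max - min) * alpha;
        List<Long> points = new ArrayList<>();
        for (int i = 1; i < 10; i++)
            points.add(clamp((long) (min + i * incr), min, max));
        return points;
    }

    public static void main(String[] args) {
        System.out.println(innerPoints(0, 100, 0.1));
        // prints [10, 20, 30, 40, 50, 60, 70, 80, 90]

        // A range whose width overflows long; the double difference copes.
        System.out.println(innerPoints(Long.MIN_VALUE, Long.MAX_VALUE, 0.1));
    }
}
```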
— John