If you forget to implement RandomAccess, you'll get a very subtle performance bug.
Just because it's easy if you know how to do it doesn't make it easy for a random developer to do correctly. On Wed, May 15, 2024 at 3:21 PM Mateusz Romanowski < romanowski.mate...@gmail.com> wrote: > Hi, > I would say it is not worth any effort. > > One can easily write an ad-hoc *local* adapter extending > `java.util.AbstractList`.. > .. and immediately invoke existing `java.util.Collections::binarySearch` > method. > > Cheers, > MR > > On Thu, Apr 25, 2024 at 9:09 PM Pavel Rappo <pavel.ra...@oracle.com> > wrote: > >> >> >> > On 25 Apr 2024, at 19:41, David Lloyd <david.ll...@redhat.com> wrote: >> > >> > The JDK contains a few collection- and array-oriented implementations >> of binary search. For the most part, they all work the same way: search the >> target "thing" by index using the obvious bisection strategy, returning >> either /index/ or /-index-1/ depending on whether it was found at the end >> of the search. >> > >> > However, if you're doing a binary search on a thing that is not a list >> or an array, you have two choices: try to make the thing you're searching >> on implement List (often infeasible) or write your own binary search. >> > >> > I'm really tired of writing my own binary search. I've probably done it >> 50 times by now, each one using a slightly different access and/or compare >> strategy, and every time is a new chance to typo something or get something >> backwards, leading to irritating debugging sessions and higher dentist >> bills. >> >> Can we safely say that it sets your teeth on edge? >> >> > It got me to thinking that it wouldn't be too hard to make a "general" >> binary search which can search on anything, so that's what I did. I was >> thinking that it might be interesting to try adding this into the JDK >> somehow. >> > >> > This implementation is more or less what I now copy & paste to >> different projects at the moment: >> > >> > public static <C, T> int binarySearch(C collection, int start, int >> end, T key, Comparator<? super T> cmp, IntGetter<C, T> getter) { >> > int low = start; >> > int high = end - 1; >> > while (low <= high) { >> > int mid = low + high >>> 1; >> > int res = cmp.compare(getter.get(collection, mid), key); >> > if (res < 0) { >> > low = mid + 1; >> > } else if (res > 0) { >> > high = mid - 1; >> > } else { >> > return mid; >> > } >> > } >> > return -low - 1; >> > } >> > // (Plus variations for `Comparable` keys and long indices) >> > >> > A key problem with this approach is that in the JDK, there is no >> `ObjIntFunction<T, R>` or `ObjLongFunction<T, R>` which would be necessary >> to implement the "getter" portion of the algorithm (despite the existence >> of the analogous `ObjIntConsumer<T>` and `ObjLongConsumer<T>`). So, I also >> have to copy/paste a `IntGetter<T, R>`/`LongGetter<T, R>` as well, which is >> annoying. >> > >> > A slightly less-good approach is for the general `binarySearch` method >> to accept a `IntFunction<T>`/`LongFunction<T>`, and drop the `C collection` >> parameter, like this: >> > >> > public static <T> int binarySearch(int start, int end, T key, >> Comparator<? super T> cmp, IntFunction<T> getter) { ... } >> > >> > In this case, we can use the function types that the JDK already >> provides, but we would very likely have to also create a capturing lambda >> (e.g. `myList::get` instead of `List::get`). Maybe this isn't that bad of a >> compromise. >> > >> > It would be possible to replace the existing `binarySearch` >> implementations with delegating calls to a generalized implementation. For >> `Collections`, the indexed version uses `List::get` and the iterator >> version uses a helper lambda to move the iterator and get the result. For >> arrays, a lambda would be provided which gets the corresponding array >> element. If the non-capturing variant is used, then (on paper at least) >> this version should perform similarly to the existing implementations, with >> less code needed overall. However, if a capturing lambda is required (due >> to the aforementioned lack of `ObjXFunction`), then this could be slightly >> worse-performing than the existing implementation due to the construction >> (and maybe dispatch) overhead of the lambda. Some real-world benchmarks >> would have to be written with various-sized data sets. >> > >> > It would also be possible to produce primitive variations which operate >> on int, float, long, and double values, using existing functions if >> capturing is deemed "OK". It is also possible to produce a variation which >> uses a `long` for the index, for huge data sets (for example, very large >> mapped files using `MemorySegment`). >> > >> > Also unclear is: where would it live? `Collections`? Somewhere else? >> > >> > Any thoughts/opinions would be appreciated (even if they are along the >> lines of "it's not worth the effort"). Particularly, any insight would be >> appreciated as to whether or not this kind of hypothetical enhancement >> would warrant a JEP (I wouldn't think so, but I'm no expert at such >> assessments). >> > >> > -- >> > - DML • he/him >> >> Have a look at this recently filed issue: >> https://bugs.openjdk.org/browse/JDK-8326330 >> >> -Pavel >> >> >> -- Louis Wasserman (he/they)