On Sun, 5 Nov 2023 17:55:39 GMT, Tagir F. Valeev <tval...@openjdk.org> wrote:
>> This is a Draft PR for [JEP-461](https://openjdk.org/jeps/461) > > Very solid work, thank you! See some minor comments inline. I actually have > much more ideas of specific gatherers, but they could be discussed separately. > > One thing that bothers me is that Gatherer is similar to Collector, yet the > APIs are totally different, so users may be confused. I like the Gatherer API > much more and I see that evolving current Collector API might be hard, given > tons of third-party implementations, so I don't see a good way to fix this. > But probably we should provide Collector-to-Gatherer adapter (producing > one-element stream)? And in general, it would be nice to use > exactly-one-element-producing-gatherers as terminal operations, without > explicit findFirst().get(). Thanks for the feedback @amaembo! I hope you find the updates reasonable :) > src/java.base/share/classes/java/util/stream/AbstractPipeline.java line 191: > >> 189: * existing pipeline. >> 190: * >> 191: * The previous stage must be unlinked and unconsumed. > > Note that linebreak in Javadoc is normally ignored. It's probably better to > add an explicit `<p>`. This way it will be rendered better in IDE > documentation view. @amaembo That's true, but given that this is internal API which won't be a part of the JDK Javadoc I think it is safe to address this at a later point. 👍 > src/java.base/share/classes/java/util/stream/Gatherer.java line 99: > >> 97: * public static <T, R> Gatherer<T, ?, R> map(Function<? super T, ? >> extends R> mapper) { >> 98: * return Gatherer.of( >> 99: * (unused, element, downstream) -> // integrator > > As [JEP 456](https://openjdk.org/jeps/456) is already integrated, probably > it's reasonable to use `(_, element, downstream) ->` here? I thought this over for a while, and given that most developers won't be used to that feature yet I think it is clearer for the first preview to keep `unused` here. > src/java.base/share/classes/java/util/stream/Gatherer.java line 119: > >> 117: * >> 118: * AS an example, in order to create a gatherer to implement a >> sequential >> 119: * Prefix Scan as a Gatherer, it could be done the following way: > > Do we need 'Prefix Scan' in title case? The reason for capitalizing it was that it is a specific term with specific meaning. Capitalizing it makes it stand out and let people do their own research if they want/need to learn more about Prefix Scans. > src/java.base/share/classes/java/util/stream/Gatherer.java line 308: > >> 306: */ >> 307: static <T, R> Gatherer<T, Void, R> ofSequential( >> 308: Integrator<Void, T, R> integrator) { > > Probably PECS signature `? super T, ? extends R` could be useful here? I've thought about that, and since we don't use that for the other parameters (like `initializer`) I don't think it helps changing it here. And the reason this has little effect is that the code calling the factory typically controls the components/parameters/return type. > src/java.base/share/classes/java/util/stream/Gatherers.java line 389: > >> 387: * @throws IllegalArgumentException when windowSize is less than 1 >> 388: */ >> 389: public static <TR> Gatherer<TR, ?, List<TR>> windowSliding(int >> windowSize) { > > From my experience, sliding window with windowSize = 2 is the most important > and deserves the explicit support (I have such an operation in my StreamEx > library and it's used quite often). Please consider adding (name subject to > discussions): > > `public static <T, R> Gatherer<T, ?, R> pairMap(BiFunction<T, T, R> > functionToApplyToTwoAdjacentElements)`. It could be more optimal, as you only > need to store the previous element. Moreover, I believe one can implement a > proper combiner to make it parallel-friendly. Could be added separately after > this PR is merged. I can contribute it. @amaembo pairMap should be straight-forward to implement. And regarding parallelization I don't think that will necessarily work since it would not be able to run incrementally (depth-first). > src/java.base/share/classes/java/util/stream/Gatherers.java line 403: > >> 401: // Integrator >> 402: Integrator.ofGreedy((window, e, downstream) -> { >> 403: window.addLast(e); > > Too bad it will reject null elements in the current implementation. This > should be documented, or better fixed. We can use ArrayList and keep track of > the current start index, and export slides manually using arraycopy twice and > `listFromTrustedArrayNullsAllowed`. It's only a little bit trickier and > should not be slower than the current implementation. I can write it if you > don't want to do it by yourself :-) Took a stab at reimplementing both windowFixed and windowSliding to allow null elements in the stream: https://github.com/openjdk/jdk/pull/16420/commits/57fea44ce71ae94107f3fb6d7f104d35839371d9 > src/java.base/share/classes/java/util/stream/Gatherers.java line 436: > >> 434: * Stream.of(1,2,3,4,5,6,7,8,9) >> 435: * .gather( >> 436: * Gatherers.fold(() -> "", (string, number) -> >> string + number) > > This particular sample is not great, as it's totally replaceable with > reduce() (`map(String::valueOf).reduce(String::concat)`), so users may wonder > why using fold instead. I can suggest another sample which I show everywhere: > `anyList.stream().gather(Gatherers.fold(() -> 1, (acc, e) -> acc * 31 + > Objects.hashCode(e))).findFirst().get();` > (haven't tested, but the idea is like this) > It computes the hashCode of the list, according to the list specification. > Here, the accumulator is inherently non-associative, so if you replace it > with `reduce()` and make the stream parallel, it will yield the wrong result. > But `Gatherers.fold` will work! > > If you don't like it, we can think up something else, but concatenation > sample is certainly not good. Perhaps the best thing here is to change such that the supplier supplies something non-zero like "String: "? > src/java.base/share/classes/java/util/stream/Gatherers.java line 450: > >> 448: * @throws NullPointerException if any of the parameters are null >> 449: */ >> 450: public static <T, R> Gatherer<T, ?, R> fold( > > Should it be explicitly named as `foldLeft`? Because `foldRight` is also > possible (though will require complete stream buffering). Also see > [JDK-8133680](https://bugs.openjdk.org/browse/JDK-8133680) and > [JDK-8292845](https://bugs.openjdk.org/browse/JDK-8292845) (probably should > be linked to gatherer's issue, and read the comment about naming) TBH I don't think `foldRight` makes much sense for potentially unbounded structures such as Stream. In the case you need it, it might be better to export it to a List and then reversing it. > src/java.base/share/classes/java/util/stream/Gatherers.java line 482: > >> 480: * @throws NullPointerException if any of the parameters are null >> 481: */ >> 482: public static <T, R> Gatherer<T, ?, R> scan( > > Similarly, probably should be named `scanLeft`? That's a good question, and here's my thinking—`scanRight` doesn't make any sense for a construct which supports unboundedness, so if we were discussing *Collections* I'd be more inclined to agree (but it is more likely that reversing the collection and then calling scanLeft would be better). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16420#issuecomment-1798197056 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1382925138 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1382927064 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383126657 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383135200 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383597640 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383603310 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383147855 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383605841 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383151321