Re: RFR: JDK-8319123 : Implementation of JEP-461: Stream Gatherers (Preview)

Viktor Klang Wed, 08 Nov 2023 07:43:00 -0800

On Sun, 5 Nov 2023 17:55:39 GMT, Tagir F. Valeev <tval...@openjdk.org> wrote:


>> This is a Draft PR for [JEP-461](https://openjdk.org/jeps/461)
>
> Very solid work, thank you! See some minor comments inline. I actually have 
> much more ideas of specific gatherers, but they could be discussed separately.
> 
> One thing that bothers me is that Gatherer is similar to Collector, yet the 
> APIs are totally different, so users may be confused. I like the Gatherer API 
> much more and I see that evolving current Collector API might be hard, given 
> tons of third-party implementations, so I don't see a good way to fix this. 
> But probably we should provide Collector-to-Gatherer adapter (producing 
> one-element stream)? And in general, it would be nice to use 
> exactly-one-element-producing-gatherers as terminal operations, without 
> explicit findFirst().get().

Thanks for the feedback @amaembo! I hope you find the updates reasonable :)

> src/java.base/share/classes/java/util/stream/AbstractPipeline.java line 191:
> 
>> 189:      * existing pipeline.
>> 190:      *
>> 191:      * The previous stage must be unlinked and unconsumed.
> 
> Note that linebreak in Javadoc is normally ignored. It's probably better to 
> add an explicit `<p>`. This way it will be rendered better in IDE 
> documentation view.

@amaembo That's true, but given that this is internal API which won't be a part 
of the JDK Javadoc I think it is safe to address this at a later point. 👍

> src/java.base/share/classes/java/util/stream/Gatherer.java line 99:
> 
>> 97:  * public static <T, R> Gatherer<T, ?, R> map(Function<? super T, ? 
>> extends R> mapper) {
>> 98:  *     return Gatherer.of(
>> 99:  *         (unused, element, downstream) -> // integrator
> 
> As [JEP 456](https://openjdk.org/jeps/456) is already integrated, probably 
> it's reasonable to use `(_, element, downstream) ->` here?

I thought this over for a while, and given that most developers won't be used 
to that feature yet I think it is clearer for the first preview to keep 
`unused` here.

> src/java.base/share/classes/java/util/stream/Gatherer.java line 119:
> 
>> 117:  *
>> 118:  * AS an example, in order to create a gatherer to implement a 
>> sequential
>> 119:  * Prefix Scan as a Gatherer, it could be done the following way:
> 
> Do we need 'Prefix Scan' in title case?

The reason for capitalizing it was that it is a specific term with specific 
meaning. Capitalizing it makes it stand out and let people do their own 
research if they want/need to learn more about Prefix Scans.

> src/java.base/share/classes/java/util/stream/Gatherer.java line 308:
> 
>> 306:      */
>> 307:     static <T, R> Gatherer<T, Void, R> ofSequential(
>> 308:             Integrator<Void, T, R> integrator) {
> 
> Probably PECS signature `? super T, ? extends R` could be useful here?

I've thought about that, and since we don't use that for the other parameters 
(like `initializer`) I don't think it helps changing it here. And the reason 
this has little effect is that the code calling the factory typically controls 
the components/parameters/return type.

> src/java.base/share/classes/java/util/stream/Gatherers.java line 389:
> 
>> 387:      * @throws IllegalArgumentException when windowSize is less than 1
>> 388:      */
>> 389:     public static <TR> Gatherer<TR, ?, List<TR>> windowSliding(int 
>> windowSize) {
> 
> From my experience, sliding window with windowSize = 2 is the most important 
> and deserves the explicit support (I have such an operation in my StreamEx 
> library and it's used quite often). Please consider adding (name subject to 
> discussions):
> 
> `public static <T, R> Gatherer<T, ?, R> pairMap(BiFunction<T, T, R> 
> functionToApplyToTwoAdjacentElements)`. It could be more optimal, as you only 
> need to store the previous element. Moreover, I believe one can implement a 
> proper combiner to make it parallel-friendly. Could be added separately after 
> this PR is merged. I can contribute it.

@amaembo pairMap should be straight-forward to implement. And regarding 
parallelization I don't think that will necessarily work since it would not be 
able to run incrementally (depth-first).

> src/java.base/share/classes/java/util/stream/Gatherers.java line 403:
> 
>> 401:                 // Integrator
>> 402:                 Integrator.ofGreedy((window, e, downstream) -> {
>> 403:                     window.addLast(e);
> 
> Too bad it will reject null elements in the current implementation. This 
> should be documented, or better fixed. We can use ArrayList and keep track of 
> the current start index, and export slides manually using arraycopy twice and 
>  `listFromTrustedArrayNullsAllowed`. It's only a little bit trickier and 
> should not be slower than the current implementation. I can write it if you 
> don't want to do it by yourself :-)

Took a stab at reimplementing both windowFixed and windowSliding to allow null 
elements in the stream: 
https://github.com/openjdk/jdk/pull/16420/commits/57fea44ce71ae94107f3fb6d7f104d35839371d9

> src/java.base/share/classes/java/util/stream/Gatherers.java line 436:
> 
>> 434:      *     Stream.of(1,2,3,4,5,6,7,8,9)
>> 435:      *           .gather(
>> 436:      *               Gatherers.fold(() -> "", (string, number) -> 
>> string + number)
> 
> This particular sample is not great, as it's totally replaceable with 
> reduce() (`map(String::valueOf).reduce(String::concat)`), so users may wonder 
> why using fold instead. I can suggest another sample which I show everywhere:
> `anyList.stream().gather(Gatherers.fold(() -> 1, (acc, e) -> acc * 31 + 
> Objects.hashCode(e))).findFirst().get();`
> (haven't tested, but the idea is like this)
> It computes the hashCode of the list, according to the list specification. 
> Here, the accumulator is inherently non-associative, so if you replace it 
> with `reduce()` and make the stream parallel, it will yield the wrong result. 
> But `Gatherers.fold` will work!
> 
> If you don't like it, we can think up something else, but concatenation 
> sample is certainly not good.

Perhaps the best thing here is to change such that the supplier supplies 
something non-zero like "String: "?

> src/java.base/share/classes/java/util/stream/Gatherers.java line 450:
> 
>> 448:      * @throws NullPointerException if any of the parameters are null
>> 449:      */
>> 450:     public static <T, R> Gatherer<T, ?, R> fold(
> 
> Should it be explicitly named as `foldLeft`? Because `foldRight` is also 
> possible (though will require complete stream buffering). Also see 
> [JDK-8133680](https://bugs.openjdk.org/browse/JDK-8133680) and 
> [JDK-8292845](https://bugs.openjdk.org/browse/JDK-8292845) (probably should 
> be linked to gatherer's issue, and read the comment about naming)

TBH I don't think `foldRight` makes much sense for potentially unbounded 
structures such as Stream. In the case you need it, it might be better to 
export it to a List and then reversing it.

> src/java.base/share/classes/java/util/stream/Gatherers.java line 482:
> 
>> 480:      * @throws NullPointerException if any of the parameters are null
>> 481:      */
>> 482:     public static <T, R> Gatherer<T, ?, R> scan(
> 
> Similarly, probably should be named `scanLeft`?

That's a good question, and here's my thinking—`scanRight` doesn't make any 
sense for a construct which supports unboundedness, so if we were discussing 
*Collections* I'd be more inclined to agree (but it is more likely that 
reversing the collection and then calling scanLeft would be better).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16420#issuecomment-1798197056
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1382925138
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1382927064
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383126657
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383135200
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383597640
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383603310
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383147855
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383605841
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1383151321

Re: RFR: JDK-8319123 : Implementation of JEP-461: Stream Gatherers (Preview)

Reply via email to