Re: java.util.stream.Stream: API for user-extensible intermediate operations

Viktor Klang Thu, 29 Jun 2023 06:25:29 -0700


Over the past 6+ months I've been thinking about, and tinkering with, how we'd 
be able to expose a user-facing API for extensible intermediate 
java.util.stream.Stream operations―a feature envisioned all the way back when 
Streams were created.


I'm now at a point where I have a viable design and implementation, and so I'm 
turning to you for your feedback: on the direction taken; the API concepts; 
and, in particular, is there anything which I have overlooked/missed?

>I think this API is overly generic and hard to reason about, both for users
>and for IDEs.

The API is, for all intents and purposes, Collector with a boolean return type
for the accumulator and a downstream handle parameter added to the
accumulator and the finisher.
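For comparison, a minimal sketch of the two shapes side by side. It assumes the `java.util.stream.Gatherer` API as finalized in JDK 24 (JEP 485), whose names may differ from the prototype under discussion here; `ShapeDemo`, `DOUBLED_LIST` and `DOUBLED` are illustrative names. The Collector's accumulator is a `BiConsumer<A, T>`, whereas the Gatherer's integrator takes the same state and element plus a `Downstream` handle, and returns a `boolean` saying whether it wants more input.

```java
// Sketch assuming the JDK 24 java.util.stream.Gatherer API; names are illustrative.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

public class ShapeDemo {
    // Terminal operation: the accumulator folds each element into the state
    // (here a List) and returns nothing.
    static final Collector<Integer, ?, List<Integer>> DOUBLED_LIST =
        Collector.of(
            ArrayList::new,
            (List<Integer> acc, Integer x) -> acc.add(x * 2),
            (left, right) -> { left.addAll(right); return left; });

    // Intermediate operation: the integrator gets the state (unused, hence Void),
    // the element and a Downstream handle; returning false would stop the upstream.
    static final Gatherer<Integer, Void, Integer> DOUBLED =
        Gatherer.of((state, x, downstream) -> downstream.push(x * 2));

    public static void main(String[] args) {
        List<Integer> viaCollector = Stream.of(1, 2, 3).collect(DOUBLED_LIST);
        List<Integer> viaGatherer = Stream.of(1, 2, 3).gather(DOUBLED).toList();
        System.out.println(viaCollector + " " + viaGatherer); // [2, 4, 6] [2, 4, 6]
    }
}
```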

>The main issue is that the same API is used for both stateless and stateful 
>operations, which means that as a user, we have no idea if a call to 
>stream.gather() is stateful or not.

How is this different from any of the other pre-existing Stream operations?

Most of the operations are stateless; only a handful of well-known operations
are stateful, and all other stateful operations are done by collectors.
But stream.gather() allows both stateless and stateful gatherers.
I believe that instead of having an intermediate operation that can be
stateless or stateful, it's better to have a Collector that starts a new stream.

Instead of
  stream.gather(Gatherers.foo())

A collector/gatherer can propagate the elements into a new stream
  stream.collect(Gatherers.foo(stream -> ...))
 or
  stream.collect(Gatherers.foo(), stream -> ...)

This allows better control over parallelization (both streams are
independent) and a clear path to retrofit the Collectors as Gatherers (instead
of having two overly similar APIs side by side).
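The proposed `stream.collect(Gatherers.foo(), stream -> ...)` form is a proposal rather than an existing API, so the closest one can get today is to finish one pipeline and start another. A rough sketch, assuming JDK 24's `Gatherers.windowFixed` (class and method names are illustrative), contrasting a single pipeline with two back-to-back pipelines. It relies on one documented detail: `parallel()` and `sequential()` set the execution mode for the whole pipeline, so splitting into two streams is how one would run only part of the work in parallel.

```java
// Sketch assuming the JDK 24 java.util.stream.Gatherers API; names are illustrative.
import java.util.List;
import java.util.stream.Gatherers;

public class TwoStreamsDemo {
    static int sum(List<Integer> window) {
        return window.stream().mapToInt(Integer::intValue).sum();
    }

    // One pipeline: windowing is an intermediate step. Calling parallel() here
    // would apply to every stage, including the windowing.
    static List<Integer> inline(List<Integer> input) {
        return input.stream()
            .gather(Gatherers.windowFixed(3)) // Stream<List<Integer>>
            .map(TwoStreamsDemo::sum)
            .toList();
    }

    // Two pipelines: the first runs sequentially to completion, then a second,
    // independent stream is started from its result and run in parallel.
    static List<Integer> twoStreams(List<Integer> input) {
        List<List<Integer>> windows = input.stream()
            .gather(Gatherers.windowFixed(3))
            .toList();
        return windows.parallelStream()
            .map(TwoStreamsDemo::sum)
            .toList();
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(inline(input));     // [6, 15, 7]
        System.out.println(twoStreams(input)); // [6, 15, 7]
    }
}
```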

Parallelization would be possible in exactly the same way: parallel Streams
already perform multi-stage parallel evaluation when stateful stages exist in
the same pipeline.
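A small sketch of that, assuming the JDK 24 API (`ParallelStagesDemo` and `EVENS` are made-up names): a user-defined gatherer sits between the built-in stateful stages `sorted` and `limit` in a parallel pipeline, and because the source is ordered the result is the same however the work is split. It shows the observable behaviour only, not how the runtime schedules the stages.

```java
// Sketch assuming the JDK 24 java.util.stream.Gatherer API; names are illustrative.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.IntStream;

public class ParallelStagesDemo {
    // A stateless, parallelizable gatherer that behaves like filter(x -> x % 2 == 0).
    static final Gatherer<Integer, Void, Integer> EVENS =
        Gatherer.of((state, x, downstream) -> {
            if (x % 2 == 0) {
                return downstream.push(x);
            }
            return true; // odd: drop it, keep consuming
        });

    static List<Integer> run() {
        return IntStream.rangeClosed(1, 20).boxed()
            .parallel()
            .sorted(Comparator.reverseOrder()) // built-in stateful stage
            .gather(EVENS)                     // user-defined stage
            .limit(3)                          // built-in stateful, short-circuiting stage
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [20, 18, 16]
    }
}
```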



>Which is a departure from the current API, which cleanly separates stateless
>and stateful operations. Here, we are left in the dark. In a sense, this API is
>too powerful: it can do too many things, so as users we cannot reason about it.

A Gatherer encodes its input and output types; in what sense would that not
be enough to reason about it?
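To make that concrete, a sketch against the JDK 24 `Gatherer` API with illustrative names: `Gatherer<T, A, R>` spells out the input type `T`, the state type `A` and the output type `R`, and composing gatherers with `andThen` is type-checked by the compiler.

```java
// Sketch assuming the JDK 24 java.util.stream.Gatherer API; names are illustrative.
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

public class TypedGathererDemo {
    // Input String, no state (Void), output Integer: all visible in the type.
    static final Gatherer<String, Void, Integer> LENGTHS =
        Gatherer.of((state, s, downstream) -> downstream.push(s.length()));

    // Input Integer, output String.
    static final Gatherer<Integer, Void, String> STARS =
        Gatherer.of((state, n, downstream) -> downstream.push("*".repeat(n)));

    // andThen checks that LENGTHS' output type fits STARS' input type.
    // STARS.andThen(STARS) would not compile: a String is not an Integer.
    static final Gatherer<String, ?, String> LENGTHS_AS_STARS = LENGTHS.andThen(STARS);

    public static void main(String[] args) {
        List<String> out = Stream.of("a", "abc", "").gather(LENGTHS_AS_STARS).toList();
        System.out.println(out); // [*, ***, ]
    }
}
```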

The initial idea of the API was for almost all intermediate operations to be
stateless, so by default we know that the complexity of the stream is linear,
that it will parallelize quite well, etc.
Once you have an intermediate operation like a gatherer, all these "good"
properties are null and void.

Streams have had quite a few stateful operations for a long time: the
slicing operations, the distinct operations, the sorting operations, the limit
operation, the while-operations, etc. My bet is that most developers using
streams will not know at a glance which of those are to be considered
stateful and have an impact on evaluation, and personally I think that is a
great thing, as it is an implementation detail.
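For example, in this sketch (JDK 24 API; `distinctBy` is a hypothetical helper, not a JDK method) the built-in stateful stages and a user-defined stateful gatherer look alike at the call site. `Gatherer.ofSequential` is used because the `HashSet` state is not thread-safe and no combiner is provided.

```java
// Sketch assuming the JDK 24 java.util.stream.Gatherer API; distinctBy is hypothetical.
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

public class StatefulDemo {
    // Keeps the first element seen for each key.
    // The Set of seen keys is fresh per evaluation (created by the initializer).
    static <T, K> Gatherer<T, ?, T> distinctBy(Function<? super T, ? extends K> key) {
        return Gatherer.ofSequential(
            () -> new HashSet<K>(),
            (Set<K> seen, T element, Gatherer.Downstream<? super T> downstream) -> {
                if (seen.add(key.apply(element))) {
                    return downstream.push(element); // first element with this key
                }
                return true; // duplicate key: drop it, keep consuming
            });
    }

    public static void main(String[] args) {
        // Built-in stateful stages: nothing at the call site says "stateful".
        List<Integer> builtIn = Stream.of(5, 3, 5, 1, 4, 2)
            .distinct().sorted().limit(3).toList();
        // A user-defined stateful stage reads the same way.
        List<String> custom = Stream.of("apple", "avocado", "banana", "blueberry", "cherry")
            .gather(StatefulDemo.<String, Character>distinctBy(s -> s.charAt(0)))
            .toList();
        System.out.println(builtIn); // [1, 2, 3]
        System.out.println(custom);  // [apple, banana, cherry]
    }
}
```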



>I like the idea of a Collector 2.0, i.e. using the Gatherer API at the end of
>the stream (not in the middle), but currently the Gatherer API is not a
>Collector, so we now have two different APIs doing partially the same job.
>I wonder if the Collector API can be retrofitted to act as a Gatherer API,
>so that users don't have to choose which one to use, a gatherer being the
>equivalent of a "flat-collector" + short-circuit.

Collector serves a very important role: getting information out of a Stream
and delivering that information in a certain shape. A Gatherer does not
provide any facility for this.
A collector that gets information out of a Stream and into a new one is, at
that point, quite similar to a Gatherer.
It sounds like you're describing Gatherer: it's a Collector-like construct
which can output into a new stream.
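A sketch of such a Collector-like construct that outputs into a new stream, assuming the JDK 24 API and a hypothetical `firstNReversed` operation. The initializer and integrator play the roles of a Collector's supplier and accumulator, returning `false` short-circuits the upstream, and the finisher pushes the buffered elements downstream instead of returning a single result.

```java
// Sketch assuming the JDK 24 java.util.stream.Gatherer API; firstNReversed is hypothetical.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

public class FlatCollectorDemo {
    // Take the first n elements and emit them in reverse order.
    static <T> Gatherer<T, ?, T> firstNReversed(int n) {
        return Gatherer.ofSequential(
            // like Collector.supplier(): a fresh buffer per evaluation
            () -> new ArrayDeque<T>(),
            // like Collector.accumulator(), but returning false stops the upstream
            (Deque<T> buffer, T element, Gatherer.Downstream<? super T> downstream) -> {
                if (buffer.size() < n) {
                    buffer.push(element);
                }
                return buffer.size() < n;
            },
            // like Collector.finisher(), but pushes results into the downstream
            (buffer, downstream) -> {
                while (!buffer.isEmpty()) {
                    if (!downstream.push(buffer.pop())) {
                        break; // downstream wants no more elements
                    }
                }
            });
    }

    public static void main(String[] args) {
        List<Integer> finite = Stream.of(1, 2, 3, 4, 5).gather(firstNReversed(3)).toList();
        // Short-circuiting is what lets this terminate on an infinite source.
        List<Integer> infinite = Stream.iterate(1, i -> i + 1).gather(firstNReversed(3)).toList();
        System.out.println(finite);   // [3, 2, 1]
        System.out.println(infinite); // [3, 2, 1]
    }
}
```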


>The idea of unsupportedCombiner() seems out of place, like a patch to be able
>to cobble different things together. I'm not sure I understand why it's
>needed for a Gatherer, and why it is not needed for Collectors?

Nothing prevents us from treating a `null` combiner the same way. My primary
reason for making it a dedicated thing was to be able to differentiate a
possible bug (a user inadvertently passing in a null reference) from
explicitly stating that a combiner does not exist for this operation.

unsupportedCombiner() as an artifact can be completely hidden if desired, as
Gatherer.of() can have overloads that do not take a combiner, and the
default method Gatherer.combiner() could return unsupportedCombiner(). I
opted not to do this initially, because I felt that being explicit about not
having a combiner means that it is a conscious decision by the implementor of
the Gatherer.
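For reference, the API that eventually shipped in JDK 24 has no `unsupportedCombiner()`; it went roughly the way described above, with `Gatherer.combiner()` defaulting to `Gatherer.defaultCombiner()`, which signals that the gatherer must only be evaluated sequentially. A sketch with illustrative names contrasting a gatherer that relies on that default with one that supplies a real combiner:

```java
// Sketch assuming the JDK 24 java.util.stream.Gatherer API; names are illustrative.
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Gatherer;
import java.util.stream.IntStream;

public class CombinerDemo {
    // Sequential-only: combiner() is not overridden, so it returns
    // Gatherer.defaultCombiner(), which tells the stream this stage must be
    // evaluated sequentially, even inside a parallel pipeline.
    static final class RunningSum implements Gatherer<Integer, long[], Long> {
        @Override
        public Supplier<long[]> initializer() {
            return () -> new long[1];
        }

        @Override
        public Integrator<long[], Integer, Long> integrator() {
            return (sum, x, downstream) -> {
                sum[0] += x;
                return downstream.push(sum[0]);
            };
        }
    }

    // Parallelizable: an explicit combiner merges partial states, so the stream
    // may evaluate this stage on several threads. The finisher emits one count.
    static final Gatherer<Integer, long[], Long> COUNT = Gatherer.of(
        () -> new long[1],
        (count, x, downstream) -> { count[0]++; return true; },
        (left, right) -> { left[0] += right[0]; return left; },
        (count, downstream) -> downstream.push(count[0]));

    public static void main(String[] args) {
        List<Long> sums = IntStream.rangeClosed(1, 5).boxed().parallel()
            .gather(new RunningSum()).toList();
        List<Long> count = IntStream.rangeClosed(1, 1000).boxed().parallel()
            .gather(COUNT).toList();
        System.out.println(sums);  // [1, 3, 6, 10, 15]
        System.out.println(count); // [1000]
    }
}
```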
My question is rather: why do we need this unsupportedCombiner on a Gatherer
and not on a Collector?
We can definitely investigate adding that to Collector as well, but it is 
outside of the scope of this proposal as it deals with intermediate, not 
terminal, operations.



>So I would prefer that API to extend the current Collector API rather than the
>intermediate operations. Yes, it's less powerful.
It means that instead of using one stream with a collect-like operation in the
middle, users will have to use two streams, one after the other, but it makes
the code easier to understand (also, having two streams gives users better
control over which part should run in parallel).

That would be something completely different from the goal of providing
user-extensible intermediate operations, which is what this proposal is
explicitly trying to address.
User-extensible intermediate operations and a gatherer as a better collector
have similar semantics, so it's not completely different.
The difference in semantics between Gatherer and Collector is outlined in the 
initial post of this document. With that said, I think your position has been 
made clear―you prefer augmenting the terminal operation rather than introducing 
an intermediate operation.

Cheers,
√

regards,
Rémi

>Rémi


(If you, like myself, prefer reading pre-rendered markdown, click
here: <https://cr.openjdk.org/~vklang/Gatherers.html>)
