Hi Viktor

Thanks for bearing with me.

> [...] are you suggesting that we add a section in the Gatherer (and Collector) in the form of "Implementor Notes" that are a bit more high-level than the specification?

Yes, exactly, explicitly mentioning reusability (I actually read through the Javadoc at the time, but didn't connect the dots) and providing guidance/recommending alternatives for use cases where one might be tempted to write a non-reusable Gatherer (I think these use cases can be summarized as: combining a Stream with another Stream, where both Streams might be infinite).

Kind regards, Anthony

September 23, 2024 at 5:19 PM, "Viktor Klang" <viktor.kl...@oracle.com> wrote:

> Hi Anthony,
>
> > The idea is to collect feedback, to see how many people report their Gatherers being broken (i.e. their Gatherers being non-compliant without realizing it), so enforcing it in `Stream::gather` is sufficient for this purpose.
>
> Even if this is well-intentioned, my experience tells me that this feedback will not materialize, and trying to provoke conformance at runtime will have a noticeable performance impact not encumbering other intermediate operations, especially for processing the bulk majority of streams (which tend to be less than 10 elements in size).
>
> > This hasn't come up before because it requires people (a) to read the Javadoc, (b) to connect the dots and conclude "thus, a Gatherer must be reusable", and (c) to be willing to invest their time in asking the question, rather than moving on since their Gatherers "just work".
>
> Adding clarifications to the Javadoc may be the most balanced path forward, in doing so we're talking updating the documentation for both Gatherer and Collector.
>
> > Not sure I understand this argument? I'd argue that increasing those odds would be done by allowing an additional category of Gatherers, not by prohibiting it?
>
> No, that would be "moving the goalposts" i.e. making Gatherers specified more loosely.
> Developers will write the code that they write, but if something isn't behaving as expected, it is important to know which side to debug: the library or the user code.
>
> > I've written a `concat` Gatherer being blissfully unaware that it was not compliant, others have written non-reusable Gatherers as well: they exist and things like `concat` and `zip` are natural/intuitive use cases for Gatherers.
>
> The ability for developers to implement interfaces (knowingly or unknowingly) in non-spec-conforming ways aside, are you suggesting that we add a section in the Gatherer (and Collector) in the form of "Implementor Notes" that are a bit more high-level than the specification?
>
> Cheers,
>
> √
>
> **Viktor Klang**
> Software Architect, Java Platform Group
>
> Oracle
>
> ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
>
> **From:** Anthony Vanelverdinghe <d...@anthonyv.be>
> **Sent:** Thursday, 19 September 2024 20:57
> **To:** Viktor Klang <viktor.kl...@oracle.com>; core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>
> **Subject:** [External] : Re: Stream Gatherers (JEP 473) feedback
>
> Hi Viktor
>
> > Alas, there's no place where this could be enforced, users could have their own implementations of Stream (so cannot be enforced in Stream::gather).
>
> The idea is to collect feedback, to see how many people report their Gatherers being broken (i.e. their Gatherers being non-compliant without realizing it), so enforcing it in `Stream::gather` is sufficient for this purpose.
>
> This hasn't come up before because it requires people (a) to read the Javadoc, (b) to connect the dots and conclude "thus, a Gatherer must be reusable", and (c) to be willing to invest their time in asking the question, rather than moving on since their Gatherers "just work".
>
> > java.util.stream.Stream does not explain the rationale for why it is single-use, Collector does not explain why they are reusable, why would Gatherers be held to a different standard?
>
> For `Stream` the package Javadoc has statements like "No storage. A stream is not a data structure" and "Possibly unbounded.", which is sufficient rationale to me.
>
> For `Collector`, unless I'm missing something, it does not actually specify that it must be reusable, so it does not have to provide a rationale for it either. Even if I did miss something and reusability is implied from the specification: the question would likely never come up, because a Collector will in practice always be reusable anyway (read: I can't readily think of a sensible non-reusable Collector). This is unlike Gatherer, where some obvious use cases such as `concat` and `zip` exist and people like me wonder why such use cases are, apparently needlessly, prohibited by the Gatherer specification.
>
> > Think of it more like increasing the odds that users are given spec-conformant Gatherers.
>
> Not sure I understand this argument? I'd argue that increasing those odds would be done by allowing an additional category of Gatherers, not by prohibiting it? I've written a `concat` Gatherer being blissfully unaware that it was not compliant, others have written non-reusable Gatherers as well: they exist and things like `concat` and `zip` are natural/intuitive use cases for Gatherers. Gunnar wrote a blog post [https://www.morling.dev/blog/zipping-gatherer/] about his `zip` Gatherer saying "Java 22 [...] promises to improve the situation here." and none of his readers pointed out that his Gatherer is not compliant either (nor complained that his Gatherer is not reusable).
>
> Kind regards, Anthony
>
> September 19, 2024 at 11:30 AM, "Viktor Klang" <viktor.kl...@oracle.com> wrote:
>
> > Hi Anthony,
> >
> > Bear with me for a moment,
> >
> > in the same vein as there's nothing which *enforces* equals(…) or hashCode() to be conformant to their specs, or any interface-implementation for that matter, I don't see how we could make any stronger enforcement of Gatherers.
> >
> > > My belief is that the subject of reusability hasn't come up before because non-reusable Gatherers "just work": as long as instances of such Gatherers are not reused, they don't lead to unexpected results or observable differences in behavior. And so people have been implementing non-reusable Gatherers such as `concat` and `zip` without realizing they aren't compliant. Or maybe they did realize it, but didn't see the downside of being non-compliant.
> >
> > Alas, there's no place where this could be enforced, users could have their own implementations of Stream (so cannot be enforced in Stream::gather). Ultimately, it all boils down to specification: if an equals(…)-method implementation leads to surprising behavior when used with a collection, one typically needs to first ensure that the equals(…)-method conforms to its expected specification before one can presume that the collection has a bug.
> >
> > For the "just work" scenario, one can only make claims about things which have been proven. So in this case, what tests have passed for the implementation in question?
> >
> > > Which brings me to my next point: in case of (b), the Javadoc and/or JEP should explain the rationale. Even to me it still seems like a needless restriction.
> >
> > java.util.stream.Stream does not explain the rationale for why it is single-use, Collector does not explain why they are reusable, why would Gatherers be held to a different standard?
> >
> > > "protecting the users from being given non-reusable Gatherers"
> >
> > Think of it more like increasing the odds that users are given spec-conformant Gatherers.
> >
> > Cheers,
> >
> > √
> >
> > **Viktor Klang**
> > Software Architect, Java Platform Group
> >
> > Oracle
> >
> > ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
> >
> > **From:** Anthony Vanelverdinghe <d...@anthonyv.be>
> > **Sent:** Wednesday, 18 September 2024 18:27
> > **To:** Viktor Klang <viktor.kl...@oracle.com>; core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>
> > **Subject:** [External] : Re: Stream Gatherers (JEP 473) feedback
> >
> > Hi Viktor
> >
> > Let me start with a question: is the requirement (a) "a Gatherer SHOULD be reusable", or (b) "a Gatherer MUST be reusable"?
> >
> > As of today the specification says (b), whereas the implementation matches (a).
> >
> > In case of (a), I propose to align the specification to allow for compliant, non-reusable Gatherers.
> >
> > In case of (b), I propose to align the implementation to enforce compliance. Something like:
> >
> > (1) invoke `initializer()` twice, giving `i1` and `i2`. Discard `i1` and invoke `i2` twice, giving `state1` and `state2`.
> >
> > (2) invoke `finisher()` twice, giving `f1` and `f2`. Discard `f1` and invoke `f2` twice, the first time with `state1` and a dummy Downstream, the second time with the actual final state, i.e. `state2` after all elements were integrated, and the actual Downstream.
> >
> > Then backport this change to JDK 23 & 22 and/or do another round of preview in JDK 24.
> >
> > I'm confident that enforcing compliance would result in significant amounts of feedback questioning the requirement.
> >
> > My belief is that the subject of reusability hasn't come up before because non-reusable Gatherers "just work": as long as instances of such Gatherers are not reused, they don't lead to unexpected results or observable differences in behavior. And so people have been implementing non-reusable Gatherers such as `concat` and `zip` without realizing they aren't compliant. Or maybe they did realize it, but didn't see the downside of being non-compliant.
> >
> > Which brings me to my next point: in case of (b), the Javadoc and/or JEP should explain the rationale. Even to me it still seems like a needless restriction. You say:
> >
> > > And I think the worst of all worlds would be a scenario where you, as a user, are given a Gatherer<X,Y,Z> and you have no idea whether you can re-use it or not.
> >
> > so I'd guess the rationale is "protecting the users from being given non-reusable Gatherers".
> >
> > However, I can't readily think of a situation where this would be essential.
> >
> > If a user creates a Gatherer by invoking a factory method, the factory method can specify whether its result is reusable.
> >
> > And if a user is given a Gatherer as a method argument, and they need the Gatherer to be reusable, they could change the parameter to a `Supplier<Gatherer>` instead.
> >
> > > > In a previous response you proposed using `Gatherer concat(Supplier<Stream<T>>)` instead, but then I'd just pass `() -> aStream`, wonder why the parameter isn't just a `Stream<T>`, and the Gatherer would still not be reusable.
> > >
> > > There's a very important, to me, difference between the two. In the Stream-case, there exists 0 reusable usages.
> > > For the Supplier<Stream>-case the implementation does not restrict re-usability, but rather it is up to the caller to actively opt-out of reusability (which could of course also be declared to be undefined behavior of the implementor of said Gatherer). Local non-reusability decided by the caller > Global non-reusability decided by the callee.
> >
> > We agree, just that I'd provide 2 factory methods, `concat(Stream<T>)` (non-reusable) and `append(List<T>)` (reusable), whereas you'd provide a 2-in-1 `concat(Supplier<Stream<T>>)`.
> >
> > Kind regards, Anthony
> >
> > September 12, 2024 at 11:55 PM, "Viktor Klang" <viktor.kl...@oracle.com> wrote:
> >
> > > Hi Anthony
> > >
> > > Great questions! I had typed up a long response when my email client decided the email was too large, crashed, and deleted my draft, so I'll try to recreate what I wrote from memory.
> > >
> > > > While I understand that most Gatherers will be reusable, and that it's a desirable characteristic, surely there will also be non-reusable Gatherers?
> > >
> > > To me, this is governed by the following parts of the Gatherer specification https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/util/stream/Gatherer.html :
> > >
> > > "Each invocation of initializer(), integrator(), combiner(), and finisher() must return a semantically identical result."
> > >
> > > and
> > >
> > > "Implementations of Gatherer must not capture, retain, or expose to other threads, the references to the state instance, or the downstream Gatherer.Downstream for longer than the invocation duration of the method which they are passed to."
> > >
> > > And I think the worst of all worlds would be a scenario where you, as a user, are given a Gatherer<X,Y,Z> and you have no idea whether you can re-use it or not.
> > >
> > > For Stream, the assumption is that they are NOT reusable at all.
> > > For Gatherer, I think the only reasonable assumption is that they are reusable.
> > >
> > > > In particular, any Gatherer that is the result of a factory method with a `Stream<T>` parameter which supports infinite Streams, will be non-reusable, won't it?
> > >
> > > Not necessarily, if the factory method **consumes** the Stream and creates a stable result which is reusable, then the resulting Gatherer is reusable.
> > >
> > > > In a previous response you proposed using `Gatherer concat(Supplier<Stream<T>>)` instead, but then I'd just pass `() -> aStream`, wonder why the parameter isn't just a `Stream<T>`, and the Gatherer would still not be reusable.
> > >
> > > There's a very important, to me, difference between the two. In the Stream-case, there exists 0 reusable usages.
> > > For the Supplier<Stream>-case the implementation does not restrict re-usability, but rather it is up to the caller to actively opt-out of reusability (which could of course also be declared to be undefined behavior of the implementor of said Gatherer). Local non-reusability decided by the caller > Global non-reusability decided by the callee.
> > >
> > > > As another example, take Gunnar Morling's zip Gatherers:
> > >
> > > I don't see how Gatherers like this could be made reusable, or why that would even be desirable.
> > >
> > > Having been R&D-ing in the Stream-space more than a decade, I'm convinced that there's no universally safe way to implement `zip` for push-style stream designs. I'm happy to be proven wrong though, as that would open up some interesting possibilities for things like Stream::iterator() and Stream::spliterator().
> > >
> > > > My use case was about a pipeline where the concatenation comes somewhere in the middle of the pipeline.
> > >
> > > My apologies, I misunderstood. To me, the operation you describe is called `inject`.
> > > Given a stable (reusable) source of elements you can definitely implement Gatherers which do before, during, or after-injections of elements to a stream.
> > >
> > > Thanks again for the great questions and conversation, it's valuable!
> > > Cheers,
> > >
> > > √
> > >
> > > **Viktor Klang**
> > > Software Architect, Java Platform Group
> > >
> > > Oracle
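
For reference, the two factory styles weighed in this thread, `concat(Supplier<Stream<T>>)` and `append(List<T>)`, could look roughly like the sketch below. It is only an illustration under stated assumptions: it targets JDK 24 or later (on JDK 22/23 Gatherers are a preview feature and need `--enable-preview`), and the class name `ConcatSketch` and both method bodies are hypothetical, not code from any of the messages above.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

// Illustrative only: ConcatSketch, concat and append are hypothetical names, not JDK API.
final class ConcatSketch {

    // Passes upstream elements through unchanged, then appends the elements of a
    // Stream obtained from the supplier. The supplier is asked for a Stream on every
    // use, so the Gatherer is reusable as long as the supplier returns a fresh Stream.
    static <T> Gatherer<T, ?, T> concat(Supplier<Stream<T>> tail) {
        Gatherer.Integrator<Void, T, T> passThrough =
                (state, element, downstream) -> downstream.push(element);
        BiConsumer<Void, Gatherer.Downstream<? super T>> appendTail = (state, downstream) -> {
            if (downstream.isRejecting()) {
                return; // downstream wants nothing more, so do not even open the tail
            }
            try (Stream<T> s = tail.get()) {
                Iterator<T> it = s.iterator();
                // Stop as soon as push(...) reports that downstream is done.
                while (it.hasNext() && downstream.push(it.next())) { }
            }
        };
        return Gatherer.ofSequential(passThrough, appendTail);
    }

    // Reusable by construction: a List can hand out a new Stream each time.
    static <T> Gatherer<T, ?, T> append(List<T> tail) {
        return concat(tail::stream);
    }

    public static void main(String[] args) {
        Gatherer<Integer, ?, Integer> reusable = append(List.of(3, 4));
        System.out.println(Stream.of(1, 2).gather(reusable).toList()); // [1, 2, 3, 4]
        System.out.println(Stream.of(0).gather(reusable).toList());    // [0, 3, 4]

        // The caller opts out of reusability by always supplying the same Stream.
        Stream<Integer> once = Stream.of(9);
        Gatherer<Integer, ?, Integer> singleUse = concat(() -> once);
        System.out.println(Stream.of(1).gather(singleUse).toList());   // [1, 9]
        try {
            Stream.of(2).gather(singleUse).toList();
        } catch (IllegalStateException e) {
            System.out.println("second use failed: " + e.getMessage());
        }
    }
}
```

In this sketch, passing `() -> aStream` compiles but yields a Gatherer that only works once: the second use fails with an `IllegalStateException` from the already-consumed Stream. That is the caller-side opt-out described above, whereas a `concat(Stream<T>)` factory would make every instance single-use. Because the loop stops when `push` returns `false`, a supplied infinite Stream should also be fine as long as something downstream short-circuits.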