Hi Rémi,

Thanks for your thoughts!

Improving the documentation of Gatherer / Gatherers is definitely something I 
am intending to have inspired by feedback received as people try it out, so 
your thoughts here are most welcome.

For the longest time Gatherer had Characteristics (just like Collector, which 
Gatherer borrows its design from) but in the end it didn't carry its weight and 
tracking the characteristics separately from the actual behavior turned out to 
be a source of both poor performance (since characteristics need to be 
composed) and a source of subtle defects.

The concern around the identity-equals of a lambda is a bit of a non-issue 
since we are comparing instances of the interfaces, and the default values are 
not 
lambdaform:https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L453

PS. Forgetting ofGreedy shouldn't affect semantics, it's just an evaluation 
hint which can result in higher performance.

Cheers,
√

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Remi Forax 
<fo...@univ-mlv.fr>
Sent: Wednesday, 17 January 2024 16:38
To: core-libs-dev <core-libs-...@openjdk.java.net>
Subject: Improving the Gatherer public API

Hello,
i've played quite a lot with the Gatherer API and overall, i quite like how 
things work conceptually.
But i think the public API can be improved.


First, the documentation is not fully clear that a Gatherer has 3 
characteristics,
1/ Is it sequential or parallelizable (can be parallel is ask)
2/ Is it stateless or stateful
3/ Is the integrator greedy or not.

There is also, is there a finisher or not, but this is less important.

When creating a Gatherer, a user can find the right method "Gatherer.of" by 
following this state diagram

    // if sequential
      // if stateless
        // if greedy
        Gatherer.ofSequential(Gatherer.Integrator.ofGreedy(integrator), 
finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(integrator, finisher?)
      // otherwise statefull
        // if greedy
        Gatherer.ofSequential(initializer, 
Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(initializer, integrator, finisher?)
    // otherwise parallelizable
      // if stateless
        // if greedy
        Gatherer.of(Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.of(integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.of(initializer, Gatherer.Integrator.ofGreedy(integrator), 
combiner, finisher)
        // otherwise short-circuit
        Gatherer.of(initializer, integrator, combiner, finisher)

The first issue I stumble several time is that i've forgotten to use 
Integrator.ofGreedy(). I'm not the only one, at least both Viktor and Nicolai 
Parlog had the same issue. It's to easy to forget to call Integrator.ofGreedy().

I think the API should be slightly modified to force the user to make a choice 
between greedy or short-cuircuit.

I'm proposing to add a new parameter for all the factory methods, in front of 
the integrator, forcing the user to make a choice
  enum IntegratorKind {
    GREEDY, SHORT_CIRCUIT
  }

so the state diagram becomes

    // if sequential
      // if stateless
        // if greedy
        Gatherer.ofSequential(GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(SHORT_CIRCUIT, integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.ofSequential(initializer, GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(initializer, SHORT_CIRCUIT, integrator, finisher?)
    // otherwise parallelizable
      // if stateless
        // if greedy
        Gatherer.of(GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.of(SHORT_CIRCUIT, integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.of(initializer, GREEDY, integrator, combiner, finisher)
        // otherwise short-circuit
        Gatherer.of(initializer, SHORT_CIRCUIT, integrator, combiner, finisher)

The second issue is that it's hard to implement a Gatherer that itself need a 
Gatherer as parameter (like andThen/compose). This is due to the fact that 
querying if a Gatherer is SEQUENTIAL, STATELESS or GREEDY is far for obvious. 
Instead of having a method characteristics() like a Collector, the way to query 
the characteristics is very add-hoc, using either the default 
combiner/initializer or instanceof on the integrator.

  SEQUENTIAL: combiner == defaultCombiner
  STATELESS:  initializer == defaultInitializer
  GREEDY: integrator instanceof Greedy

I think the API should be simple if the default combiner/initializer and 
finisher were removed a method characteristics (and a default method 
hasCHaracteristics) were added.
It would be more like a Collector, simpler to user, and the default 
implementation of the combiner/initializer/finisher could be an impleemntation 
detail instead of being part of the API.

As a side note: the exact semantics of == on a lambda is not specified so the 
actual implementation ask users to rely on the way the current OpenJDK 
implementation works.

They are more mails to come on two other issues not fully realated to the 
public API of the Gatherer, so i'm trying to keep the mail short.

regards,
Rémi

Reply via email to