Hi Viktor,

thanks for your clarifications.

I agree that from a performance point of view, there isn't all that much to be
gained. I thought more about parallelizing distinctBy and windowSliding.
Perhaps one can squeeze out a modest gain, but I am not excited by the
potential.

AFAIK, the "greedy/short-circuiting" decision point doesn't have a major impact 
on performance either. Or am I wrong there?

In my mind, given that performance is not worth more than maybe a tweak, this 
amplifies my first issue with the surface API.

I started out thinking that almost nobody is going to write a gatherer, so why worry? But 
I found myself writing a couple in the last few days. And I wonder whether the current 
API is at "peak complexity".

If I use the factory methods, I have to make a choice between of/ofGreedy and 
ofSequential/of.
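
To make the two choices concrete, here is a minimal sketch of how they combine with the factory methods as they stand in the finalized API (JDK 24). The class and the three gatherers are made up for illustration; only the Gatherer and Integrator factory methods are real.

```java
import java.util.stream.Gatherer;
import java.util.stream.Gatherer.Integrator;
import java.util.stream.Stream;

// Hypothetical demo class; the gatherers below are illustrative only.
class FactoryChoices {
    // ofSequential + ofGreedy: needs the previous total, never stops early.
    static Gatherer<Integer, ?, Integer> runningSum() {
        return Gatherer.ofSequential(
            () -> new int[1],
            Integrator.<int[], Integer, Integer>ofGreedy((total, element, downstream) -> {
                total[0] += element;
                return downstream.push(total[0]);
            }));
    }

    // ofSequential + Integrator.of: returning false asks the stream to stop.
    static Gatherer<Integer, ?, Integer> untilNegative() {
        return Gatherer.ofSequential(
            Integrator.<Void, Integer, Integer>of((unused, element, downstream) ->
                element >= 0 && downstream.push(element)));
    }

    // of + ofGreedy: stateless, so it can be evaluated in parallel.
    static Gatherer<String, ?, Integer> lengths() {
        return Gatherer.of(
            Integrator.<Void, String, Integer>ofGreedy((unused, element, downstream) ->
                downstream.push(element.length())));
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4).gather(runningSum()).toList());     // [1, 3, 6, 10]
        System.out.println(Stream.of(3, 1, -1, 5).gather(untilNegative()).toList()); // [3, 1]
        System.out.println(Stream.of("a", "bb", "ccc").parallel().gather(lengths()).toList()); // [1, 2, 3]
    }
}
```

So every gatherer written this way answers two independent yes/no questions, each through a differently named factory.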

And if I don't use the factory methods, I have to mess with a marker interface 
or a method yielding a magic default.

Is there some virtuous collapse?

First off, I think factory methods should be the favored approach. And "of"
should be the safe choice. That's why I would rename ofSequential to of, and
introduce ofParallel for optimization, just like of/ofGreedy.

I have seen some gatherer implementations that don't use the factory methods,
even though they could have. Is this flexibility useful? The details are fussy,
with the marker interface and the magic default combiner. That's where I
thought the characteristics approach is a better API. It has precedent, and it
is unfussy.

I realize it's not a big deal, and I was going to let it slide, until I heard
Brian's Devoxx talk where he mentioned "peak complexity", and I felt that this
is, in a small way, what is present here.

Cheers,

Cay

--

(Moving this to core-libs-dev)

Cay,

Regarding 1, Characteristics was a part of the Gatherers-contract for a very
long time, but alas, it didn't end up being worth its cost. There's a longer
discussion on the topic here:
https://mail.openjdk.org/pipermail/core-libs-dev/2024-January/118138.html (and
I'm sure that there were more, but that's the one which comes to mind)

Regarding 2, I did have a prototype which had a Downstream in the Combiner, but
it created a new dimension of complexity which made it even harder for people to
move from sequential to parallelizable. The door isn't closed on it, but I
remain unconvinced it's worth the surface area for performance reasons.

As a bit of a side-note, it's worth knowing that in the reference stream
implementation, it is not unusual that parallel-capable stages are executed as
"islands", which means that short-circuiting signals cannot travel across those
islands. Since parallel-capable Gatherers can be fused to execute in the same
"island", if we get to a place where "all" intermediate operations are
parallel-capable Gatherers, there'd be a single end-to-end "island", and hence
the ability to propagate the short-circuiting would be preserved in all modes
of execution. It is also worth knowing that a `gather(…)` immediately followed
by a `collect(…)` can also be fused to run together.

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle

________________________________
From: jdk-dev <jdk-dev-retn at openjdk.org> on behalf of Cay Horstmann 
<cay.horstmann at gmail.com>
Sent: Friday, 4 October 2024 19:58
To: jdk-dev at openjdk.org <jdk-dev at openjdk.org>
Subject: Re: New candidate JEP: 485: Stream Gatherers

Hi, I have some belated questions about the design choices in this API that I 
could not find addressed in the JEP.

1. Why aren't characteristics used to express "greediness/short-circuiting" or 
"sequentialness/parallelizability"?

I understand that for the former I use ofGreedy/of, or implement
Gatherer.Integrator.Greedy/Gatherer.Integrator. And for the latter, I use
ofSequential/of, or, if I implement the Gatherer interface, have the combiner
return defaultCombiner() or not.
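
For comparison, here is a rough sketch of the non-factory route, written against the finalized API (JDK 24). RunningMax is a made-up example class. Greediness is signalled by the integrator's runtime type, and sequentialness by leaving combiner() at its inherited default.

```java
import java.util.function.Supplier;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

// Hypothetical example: emits the largest value seen so far for each element.
final class RunningMax implements Gatherer<Integer, int[], Integer> {
    @Override
    public Supplier<int[]> initializer() {
        return () -> new int[] { Integer.MIN_VALUE };
    }

    @Override
    public Gatherer.Integrator<int[], Integer, Integer> integrator() {
        // Returning an Integrator.Greedy instance is the marker that this
        // gatherer never initiates short-circuiting on its own.
        Gatherer.Integrator.Greedy<int[], Integer, Integer> greedy =
            (max, element, downstream) -> {
                max[0] = Math.max(max[0], element);
                return downstream.push(max[0]);
            };
        return greedy;
    }

    // combiner() is deliberately not overridden: the inherited default returns
    // Gatherer.defaultCombiner(), which marks the gatherer as sequential.

    public static void main(String[] args) {
        System.out.println(Stream.of(3, 1, 4, 1, 5).gather(new RunningMax()).toList()); // [3, 3, 4, 4, 5]
    }
}
```

Neither signal shows up in a method name or a declared flag; a reader has to know both conventions to see that this gatherer is greedy and sequential.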

But it seems a bit complex and less familiar than the characteristics mechanism 
that exists for spliterators, streams, and collectors.

The original design document (https://cr.openjdk.org/~vklang/Gatherers.html) 
used characteristics, so I wonder what motivated the change.

2. Why wasn't the combiner() designed to allow pushing of elements to the end 
of the first range's sink? Then distinctBy could be parallelized without 
buffering the elements. More generally, with some state fiddling one can then 
handle the elements around range splits.

As it is, I don't see how to parallelize such computations other than to buffer 
all elements.
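
To illustrate what I mean by buffering, here is a minimal sketch of a parallel-capable distinctBy written against the finalized API (JDK 24). The class and helper names are made up. Nothing is pushed downstream until the finisher runs, because the combiner only gets the two states and has no downstream to push to.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Gatherer;
import java.util.stream.Gatherer.Integrator;

// Hypothetical demo class; distinctBy is not part of the JDK.
class DistinctByDemo {
    // Assumes non-null elements: putIfAbsent treats a null value as absent.
    static <T, K> Gatherer<T, ?, T> distinctBy(Function<? super T, ? extends K> keyOf) {
        return Gatherer.of(
            // State: first element seen for each key, in encounter order.
            () -> new LinkedHashMap<K, T>(),
            // Integrator: only records; it cannot emit yet, since an earlier
            // range might hold an element with the same key.
            Integrator.<LinkedHashMap<K, T>, T, T>ofGreedy((seen, element, downstream) -> {
                seen.putIfAbsent(keyOf.apply(element), element);
                return true;
            }),
            // Combiner: the left range comes first, so its entries win.
            (left, right) -> {
                right.forEach(left::putIfAbsent);
                return left;
            },
            // Finisher: the first point at which elements can be pushed.
            (seen, downstream) -> {
                for (T element : seen.values()) {
                    if (!downstream.push(element)) {
                        break;
                    }
                }
            });
    }

    static List<String> distinctByInitial(List<String> words) {
        Gatherer<String, ?, String> byInitial = distinctBy(word -> word.charAt(0));
        return words.parallelStream().gather(byInitial).toList();
    }

    public static void main(String[] args) {
        System.out.println(distinctByInitial(
            List.of("apple", "avocado", "banana", "blueberry", "cherry"))); // [apple, banana, cherry]
    }
}
```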

I looked at the project at https://github.com/jhspetersson/packrat that 
implements a number of gatherers. Only one uses a combiner, to join buffers 
until their contents can be pushed in the finisher.

Cheers,

Cay
--

Cay S. Horstmann | https://horstmann.com
