On Tue, 16 Nov 2021 20:53:26 GMT, kabutz <d...@openjdk.org> wrote:

>> This is a draft proposal for how we could improve stream performance for the 
>> case where the streams are empty. Empty collections are common-place. If we 
>> iterate over them with an Iterator, we would have to create one small 
>> Iterator object (which could often be eliminated) and if it is empty we are 
>> done. However, with Streams we first have to build up the entire pipeline, 
>> until we realize that there is no work to do. With this example, we change 
>> Collection#stream() to first check if the collection is empty, and if it is, 
>> we simply return an EmptyStream. We also have EmptyIntStream, 
>> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to 
>> have the same characteristics and behaviour as the streams returned by 
>> Stream.empty(), IntStream.empty(), etc. 
>> 
>> Some of the JDK tests fail with this, due to ClassCastExceptions (our 
>> EmptyStream is not an AbstractPipeline) and AssertionError, since we can 
>> call some methods repeatedly on the stream without it failing. On the plus 
>> side, creating a complex stream on an empty stream gives us upwards of 50x 
>> increase in performance due to a much smaller object allocation rate. This 
>> PR includes the code for the change, unit tests and also a JMH benchmark to 
>> demonstrate the improvement.
>
> kabutz has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Refactored empty stream implementations to reduce duplicate code and 
> improved unordered()
>   Added StreamSupport.empty for primitive spliterators and use that in 
> Arrays.stream()
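
(For reference, the quoted proposal's core check boils down to something like the following sketch. This is not the actual patch; `StreamShortCircuit` is a hypothetical name, and the real PR returns a dedicated `EmptyStream` rather than `Stream.empty()`.)

```java
import java.util.Collection;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class StreamShortCircuit {
    // Hypothetical stand-in for the proposed Collection#stream():
    // short-circuit before any pipeline objects are built.
    static <E> Stream<E> stream(Collection<E> c) {
        if (c.isEmpty()) {
            // The PR returns an EmptyStream with the same characteristics
            // as Stream.empty(); Stream.empty() stands in for it here.
            return Stream.empty();
        }
        // Otherwise, the usual path (what Collection#stream() does today).
        return StreamSupport.stream(c.spliterator(), false);
    }
}
```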

I agree it’s the “kind of” optimization that would be nice.  “Kind of”.  
Personally, I would be happier to see this kind of complexity added in a way 
that helps a larger class of common streams.

It’s a harder problem, and I know this is a case of “the best is the enemy of the 
good”, but I think a stream which has less content bulk than pipeline phases 
(according to some heuristic weighting) might possibly be handled better by 
dumping the elements into an Object array and running each phase in sequence 
over that array, rather than composing a “net result of all phases” object and 
then running it over the few elements.  Stream object creation can be reduced, 
perhaps, by building the stream around a small internal buffer that collects 
pipeline phases (and their lambdas), by side effect.  The terminal operation 
runs this Rube-Goldberg contraption in an interpretive manner over the 
elements.  An advantage would arise if the contraption were smaller and simpler 
than a fully-composed stream of today, and the optimizations lost by having an 
interpreter instead of a specialized object were insignificant due to the 
small bulk of the stream source.
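
A toy sketch of that buffered-interpreter shape might look like this (all names are hypothetical, only `map` is shown, and `filter`, primitive specializations, and the inflation path are omitted; it is not a proposed design, just an illustration of recording phases by side effect and interpreting them at the terminal op):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class TinyStream<T> {
    private final Object[] source;
    // Small internal buffer that collects pipeline phases (and their
    // lambdas) by side effect, instead of composing stage objects.
    private final List<Function<Object, Object>> phases = new ArrayList<>();

    TinyStream(Object[] source) { this.source = source; }

    @SuppressWarnings("unchecked")
    <R> TinyStream<R> map(Function<? super T, ? extends R> f) {
        phases.add((Function<Object, Object>) f); // record, don't run
        return (TinyStream<R>) this;              // same buffer, retyped
    }

    // Terminal operation: interpretively run each recorded phase in
    // sequence over the few elements.
    List<Object> toList() {
        List<Object> out = new ArrayList<>(source.length);
        for (Object e : source) {
            for (Function<Object, Object> f : phases) {
                e = f.apply(e);
            }
            out.add(e);
        }
        return out;
    }
}
```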

If such an optimization really works, it would automatically handle the 
zero-elements case, but would also cover lots of use cases (in the JDK code 
even) where streams are used for their notational convenience, rather than 
their ability to process many elements efficiently.

I looked at this for a few days, several months ago.  I solved enough problems 
to see that there is a long tail of difficulties in stream implementation!  
(Many are noted in this PR thread.)  I noticed that some pipeline phases can 
expand the “bulk” (flatMap and friends).  Bulk-reducing phases (filter) are not 
a problem, for already-small streams.  (These issues do not arise for truly 
empty streams.)  For expansion there would have to be an option to inflate a 
previously-small stream to use the general paths.  Another issue is avoiding 
running any lambdas until the terminal operation, which means capturing them in 
some orderly fashion.  Again, if a bizarre pipeline structure shows up, 
inflation is an option.  And for truly empty streams some or all of the 
pipeline structure can be just dropped on the floor, as this PR proposes.
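
The inflation escape hatch could be as simple as this sketch (again, hypothetical names; the idea is only that a bulk-expanding phase hands the buffered elements back to the general stream machinery):

```java
import java.util.Arrays;
import java.util.function.Function;
import java.util.stream.Stream;

final class SmallStream<T> {
    private final T[] elems;

    SmallStream(T[] elems) { this.elems = elems; }

    // flatMap can expand the bulk arbitrarily, so inflate: give the
    // elements to an ordinary Stream and use the general paths from
    // here on, rather than trying to interpret the expansion.
    <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> f) {
        return Arrays.stream(elems).flatMap(f);
    }
}
```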

In the end, I think the best leverage will come from a completely different set 
of techniques, from whatever allows us to do “loop customization”.  By loop 
customization I mean the ability to compile the loop in the terminal 
operation separately for each “interestingly distinct” combination of pipeline 
components, in such a way that the loop can constant-fold their lambdas, the 
shape of the stream source, and anything else that affects loop code quality.   
That technique should apply well regardless of “bulk”, since the most complex 
object allocation should happen during specialization, which is a link-time 
operation, instead of every time a stream is created.

Current state of the art uses mega-inlining, which has complex limitations and 
failure modes.  (It utterly fails for parallel streams.)  When we get to 
specialized generics, I hope to take another run at the problem, so that the 
type of each pipeline lambda “feeds” into a specialization syndrome that the 
JVM “sees” distinct from other specializations, and can optimize separately.  
(Making streams be value classes could help also.)  I guess all *that* might be 
“the impossible dream being the enemy of the best”.

Anyway, FWIW…

-------------

PR: https://git.openjdk.org/jdk/pull/6275
