----- Original Message -----
> From: "John R Rose" <jr...@openjdk.org>
> To: core-libs-dev@openjdk.org
> Sent: Thursday, July 21, 2022 4:12:14 AM
> Subject: Re: RFR: JDK-8277095 : Empty streams create too many objects [v2]
> On Tue, 16 Nov 2021 20:53:26 GMT, kabutz <d...@openjdk.org> wrote:
>
>>> This is a draft proposal for how we could improve stream performance for the
>>> case where the streams are empty. Empty collections are common-place. If we
>>> iterate over them with an Iterator, we would have to create one small
>>> Iterator object (which could often be eliminated) and if it is empty we are
>>> done. However, with Streams we first have to build up the entire pipeline,
>>> until we realize that there is no work to do. With this example, we change
>>> Collection#stream() to first check if the collection is empty, and if it
>>> is, we simply return an EmptyStream. We also have EmptyIntStream,
>>> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to
>>> have the same characteristics and behaviour as the streams returned by
>>> Stream.empty(), IntStream.empty(), etc.
>>>
>>> Some of the JDK tests fail with this, due to ClassCastExceptions (our
>>> EmptyStream is not an AbstractPipeline) and AssertionError, since we can
>>> call some methods repeatedly on the stream without it failing. On the plus
>>> side, creating a complex stream on an empty stream gives us upwards of 50x
>>> increase in performance due to a much smaller object allocation rate. This
>>> PR includes the code for the change, unit tests and also a JMH benchmark to
>>> demonstrate the improvement.
>>
>> kabutz has updated the pull request incrementally with one additional commit
>> since the last revision:
>>
>> Refactored empty stream implementations to reduce duplicate code and improved unordered()
>> Added StreamSupport.empty for primitive spliterators and use that in Arrays.stream()
>
> I agree it’s the “kind of” optimization that would be nice. “Kind of”.
> Personally I would be happier to see complexity like this added that would
> help a larger class of common streams.
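A minimal sketch of the empty-source shortcut kabutz describes, assuming a hypothetical three-operation MiniStream interface instead of the real java.util.stream.Stream (which has far too many methods to reproduce here). MiniStream, EmptyMiniStream, PipelineMiniStream and MiniStream.of are illustrative names, not the classes from the PR. An empty source gets a shared singleton whose intermediate operations return that same singleton, so chaining stages allocates nothing; a non-empty source falls back to a real JDK pipeline.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical stand-in for java.util.stream.Stream, reduced to three
// operations so the empty-source shortcut fits in a short example.
interface MiniStream<T> {
    MiniStream<T> filter(Predicate<? super T> predicate);

    <R> MiniStream<R> map(Function<? super T, ? extends R> mapper);

    long count();

    // Hypothetical factory mirroring the proposed Collection#stream() change:
    // check isEmpty() first and hand out the shared empty stream if so.
    static <T> MiniStream<T> of(Collection<T> source) {
        if (source.isEmpty()) {
            return EmptyMiniStream.instance();
        }
        return new PipelineMiniStream<>(source.stream());
    }
}

// Stateless empty stream: every intermediate operation returns the singleton.
final class EmptyMiniStream<T> implements MiniStream<T> {
    private static final EmptyMiniStream<?> INSTANCE = new EmptyMiniStream<>();

    private EmptyMiniStream() {}

    @SuppressWarnings("unchecked")
    static <T> EmptyMiniStream<T> instance() {
        return (EmptyMiniStream<T>) INSTANCE;
    }

    @Override
    public MiniStream<T> filter(Predicate<? super T> predicate) {
        return this; // nothing to filter, so no new stage is allocated
    }

    @Override
    public <R> MiniStream<R> map(Function<? super T, ? extends R> mapper) {
        return instance(); // the same singleton, viewed with the new element type
    }

    @Override
    public long count() {
        return 0;
    }
}

// Non-empty case: delegate to a real JDK pipeline, one wrapper per stage.
final class PipelineMiniStream<T> implements MiniStream<T> {
    private final Stream<T> stream;

    PipelineMiniStream(Stream<T> stream) {
        this.stream = stream;
    }

    @Override
    public MiniStream<T> filter(Predicate<? super T> predicate) {
        return new PipelineMiniStream<>(stream.filter(predicate));
    }

    @Override
    public <R> MiniStream<R> map(Function<? super T, ? extends R> mapper) {
        return new PipelineMiniStream<>(stream.map(mapper));
    }

    @Override
    public long count() {
        return stream.count();
    }
}

public class EmptyStreamShortcutDemo {
    public static void main(String[] args) {
        MiniStream<String> empty = MiniStream.of(List.<String>of());
        // Every stage returns the shared singleton: no per-stage allocation.
        MiniStream<Integer> chained = empty.filter(s -> !s.isEmpty()).map(String::length);
        System.out.println((Object) chained == empty); // true
        System.out.println(chained.count()); // 0
        // Reusing the empty stream does not fail, unlike a JDK pipeline stage.
        System.out.println(empty.count() + empty.count()); // 0
        // A non-empty source still goes through a real JDK pipeline.
        System.out.println(MiniStream.of(List.of("a", "bb")).map(String::length).count()); // 2
    }
}
```

Because the empty singleton just returns itself, it tolerates being reused after a terminal operation. JDK pipeline stages instead throw IllegalStateException ("stream has already been operated upon or closed") when a stage is linked or consumed twice, which is consistent with the thread's report that some JDK tests hit AssertionError against the EmptyStream implementations.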
>
> It’s a harder problem, and I know this is case of “the best is the enemy of
> the good”, but I think a stream which has less content bulk than pipeline
> phases (according to some heuristic weighting) might possibly be handled
> better by dumping the elements into an Object array and running each phase
> in sequence over that array, rather than composing a “net result of all
> phases” object and then running it over the few elements. Stream object
> creation can be reduced, perhaps, by building the stream around a small
> internal buffer that collects pipeline phases (and their lambdas), by side
> effect. The terminal operation runs this Rube-Goldberg contraption in an
> interpretive manner over the elements. An advantage would arise if the
> contraption were smaller and simpler than a fully-composed stream of today,
> and the optimizations lost by having an interpreter instead of a specialized
> object ness were insignificant due to the small bulk of the stream source.

I don't think it will ever work in real life, because a lot of streams only
work by luck and because of how streams are currently implemented.
Last week, when grading a student project, I saw a stream that can be
simplified to
  Arrays.asList(3, null).stream().map(Object::toString).count()

>
> -------------
>
> PR: https://git.openjdk.org/jdk/pull/6275

Rémi
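Rémi's example appears to work only because of an implementation detail: since JDK 9, count() may compute its result directly from a SIZED source without evaluating intermediate operations (the Stream.count() Javadoc explicitly permits this). The sketch below contrasts that with a pipeline where filter() clears the SIZED flag and forces traversal, and with a hypothetical interpreter in the spirit of the "run each phase in sequence over an Object array" idea, restricted to map-like phases. CountElisionDemo and its method names are illustrative assumptions, not anything from the thread or the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class CountElisionDemo {
    // Rémi's example. On JDK 9 and later, count() may be computed from the size
    // of a SIZED source alone; map() preserves SIZED, so Object::toString is
    // never invoked and the null element goes unnoticed.
    static long countWithMap() {
        return Arrays.asList(3, null).stream().map(Object::toString).count();
    }

    // filter() clears the SIZED flag, so count() has to traverse the elements,
    // and map(Object::toString) then throws NullPointerException on null.
    static long countWithMapAndFilter() {
        return Arrays.asList(3, null).stream().map(Object::toString).filter(s -> true).count();
    }

    // Hypothetical interpreter: recorded map-like phases are applied eagerly,
    // one phase at a time, to a copy of the elements. Only size-preserving
    // phases are supported, so the count is simply the buffer length.
    static long interpretedCount(Object[] elements, List<Function<Object, Object>> phases) {
        Object[] buffer = elements.clone();
        for (Function<Object, Object> phase : phases) {
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = phase.apply(buffer[i]);
            }
        }
        return buffer.length;
    }

    public static void main(String[] args) {
        System.out.println(countWithMap()); // 2 on JDK 9 and later
        try {
            countWithMapAndFilter();
        } catch (NullPointerException e) {
            System.out.println("filter() forces traversal: NullPointerException");
        }
        Function<Object, Object> toStr = Object::toString;
        List<Function<Object, Object>> phases = List.of(toStr);
        try {
            interpretedCount(new Object[] {3, null}, phases);
        } catch (NullPointerException e) {
            System.out.println("eager interpretation: NullPointerException");
        }
    }
}
```

On a current JDK, countWithMap() returns 2 while the other two paths throw NullPointerException on the null element, so an evaluation strategy that actually runs every phase would change the observable behaviour of code like the student's.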