On Thu, Feb 13, 2025 at 3:06 PM Viktor Klang <viktor.kl...@oracle.com> wrote:
> While it may look enticing to merely propagate expected element count as an
> input parameter to the supplier function,
> I think it deserves some extra thought, specifically if it may make more
> sense to pass some sort of StreamInfo type which can provide more metadata in
> the future.

I could see that being useful for properties such as non-nullness, which would allow collections such as ImmutableList to skip the null check in the end.

> Another open question is how to propagate this information through Gatherers
> (i.e. a bigger scope than Collector-augmentation) to enable more
> sophisticated optimizations, because ultimately the availability of the
> information throughout the pipeline is going to be important for Collector.

Do you think that there could be a need to pass stream information to anything other than the Gatherer's state initializer? From a cursory glance, it looks straightforward to pass it the same info as the Collector. If that's true and we go with a more extensible design than a plain long, support for Gatherers could be added in follow-up work.

Best,
Fabian

>
> Cheers,
> √
>
> Viktor Klang
> Software Architect, Java Platform Group
> Oracle
> ________________________________
> From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Fabian
> Meumertzheim <fab...@buildbuddy.io>
> Sent: Wednesday, 12 February 2025 11:09
> To: core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>
> Subject: JDK-8072840: Presizing for Stream Collectors
>
> As an avid user of Guava's ImmutableCollections, I have been
> interested in ways to close the efficiency gap between the built-in
> `Stream#toList()` and third-party `Collector` implementations such as
> `ImmutableList#toImmutableList()`. I've found the biggest problem to
> be the lack of sizing information in `Collector`s, which led me to
> draft a solution to JDK-8072840:
> https://github.com/openjdk/jdk/pull/23461
>
> The benchmark shows pretty significant gains for sized streams that
> mostly reshape data (e.g. slice records or turn a list into a map by
> associating keys), which I've found to be a pretty common use case.
>
> Before I formally send out the PR for review, I would like to gather
> feedback on the design aspects of it (rather than the exact
> implementation).
> I will thus leave it in draft mode for now, but
> invite anyone to comment on it or on this thread.
>
> Fabian
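To make the design question in this thread concrete, here is a minimal sketch, under stated assumptions, of a collector whose supplier receives stream metadata instead of taking no arguments. `StreamInfo`, `SizedCollector`, `ListState`, `collect` and `toUnmodifiableList` are hypothetical names invented for this illustration. They are not the API of the draft PR or of `java.util.stream`. The `collect` helper only simulates a terminal operation; a real pipeline would derive the size from the stream's SIZED characteristic. The sketch requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.OptionalLong;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class PresizingSketch {

    // HYPOTHETICAL metadata type, not part of the JDK or the draft PR.
    // A plain-long design would pass only the size; a record can grow later.
    record StreamInfo(OptionalLong exactSize, boolean knownNonNull) {}

    // HYPOTHETICAL collector shape: unlike Collector#supplier(), the
    // supplier here receives what the pipeline knows about its input.
    record SizedCollector<T, A, R>(
            Function<StreamInfo, A> supplier,
            BiConsumer<A, T> accumulator,
            Function<A, R> finisher) {}

    // Accumulation state: the (possibly presized) list and whether
    // elements still have to be checked for null.
    record ListState<T>(ArrayList<T> list, boolean checkNulls) {}

    // Stand-in for a terminal operation. A real stream would take the size
    // from its SIZED characteristic rather than from a List.
    static <T, A, R> R collect(List<T> source, boolean knownNonNull,
                               SizedCollector<T, A, R> collector) {
        StreamInfo info = new StreamInfo(OptionalLong.of(source.size()), knownNonNull);
        A state = collector.supplier().apply(info);
        for (T element : source) {
            collector.accumulator().accept(state, element);
        }
        return collector.finisher().apply(state);
    }

    // Unmodifiable-list collector that presizes its buffer when the exact
    // size is known and skips null checks when elements are known non-null.
    static <T> SizedCollector<T, ListState<T>, List<T>> toUnmodifiableList() {
        return new SizedCollector<T, ListState<T>, List<T>>(
                info -> new ListState<T>(
                        // Fall back to a capacity of 10 (ArrayList's default) if unsized.
                        new ArrayList<T>(Math.toIntExact(info.exactSize().orElse(10))),
                        !info.knownNonNull()),
                (state, element) -> {
                    if (state.checkNulls()) {
                        Objects.requireNonNull(element, "null element");
                    }
                    state.list().add(element);
                },
                state -> Collections.unmodifiableList(state.list()));
    }

    public static void main(String[] args) {
        List<String> result = collect(List.of("a", "b", "c"), true, toUnmodifiableList());
        System.out.println(result); // prints [a, b, c]
    }
}
```

With a plain `long` parameter, only the presizing half of this example would be expressible. A metadata record leaves room for properties such as the non-nullness mentioned above without another signature change. The same kind of object could in principle also be handed to a Gatherer's state initializer.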