Hi Viktor, Thanks for the response. I concur with all of your points below - nice summary/analysis. Looking forward to this new use case.
-Archie On Mon, Sep 23, 2024 at 10:38 AM Viktor Klang <viktor.kl...@oracle.com> wrote: > Hi Archie! > > >I recently encountered a real-world use case for Stream Gatherers and > thought I'd mention it (apologies if you've seen these ideas already). > These might even be worth considering including as standard offerings. > > No need to preface your email with an apology 🙂 I'm happy to get > thoughts, feedback, and ideas! > > You have a pretty interesting use-case, and performing a "sub-group > reorder" is an interesting use-case. > Pulling on that string a bit more, it would seem like its primary > use-cases would lie in scenarios where you're consuming things like data > files (think CSVs etc where there's a predefined order/grouping to the > entries) or consuming things like database results, would that be a fair > characterization? > > Interestingly, one of the exercises I went through during the design of > Gatherer was to reimplement the java.util.stream.Stream interface only > using Gatherer and Collector, so I took a stab at things like sorted() etc. > As is common in the situation of operating on top of pre-ordered data, > parallelization tends to be costly to implement (either by having to forego > it, or by needing to perform much of the operation in the finisher after > aggregating elements in the integrator). > > Quickly thinking about the operation, I presume you'd need to be able to > track the current "primary group", collect elements into something like an > ArrayList, and upon switching to a new primary group, you sort the list > according to the secondary group, emit all elements for as long as > Downstream::push allows you to, clear the list, enqueue the new element > which had the new primary group. Then in the finisher you perform the same > operation, presuming that that's the signal that there's no more elements > in the last primary group. > > But I think you can relax the requirement that the primary group must be > Comparable, you only need to have a BiPredicate and test (prev, next) for > sameness, and if you get false, you presume that `next` belongs to a new > primary group. But since you're going to sort the group on the secondary, > that'll need to be comparable. > > Cheers, > √ > > > *Viktor Klang* > Software Architect, Java Platform Group > Oracle > > ------------------------------ > *From:* Archie Cobbs <archie.co...@gmail.com> > *Sent:* Friday, 20 September 2024 19:20 > *To:* Viktor Klang <viktor.kl...@oracle.com> > *Cc:* core-libs-dev@openjdk.org <core-libs-dev@openjdk.org> > *Subject:* [External] : Re: Stream Gatherers (JEP 473) feedback > > Hi Viktor, > > I recently encountered a real-world use case for Stream Gatherers and > thought I'd mention it (apologies if you've seen these ideas already). > These might even be worth considering including as standard offerings. > > Motivating example: Suppose you have a large list of people sorted by > LastName. Your goal is to print the list ordered by LastName, then by > FirstName. The list is too large to re-sort the whole thing. > > Instead, you only need to "fixup" the ordering, by re-sorting just the > groups of people with the same last name. > > If you happen to know that there are no more than 100 people with the > exact same last nam, then one option would be a Gatherer that re-sorts a > stream, but only up to some maximum window size (if two elements were out > of order and separated by more than the window size, then sorted output > would no longer be guaranteed). > > In the above example, you would set the window size to 100 and it might > look something like this: > > listOfPeople.stream() // sorted by lastName only > .gather(Gatherers.windowSort(100, > Comparator.comparing(Person::lastName).thenComparing(Person::firstName))); > > Another (probably better) option would be a "secondary sort" Gatherer. > This would take an already sorted stream and then apply a secondary sort. > It would collect items having the same primary sort key (however many there > were), and then re-sort those groups all at once using the secondary key: > > listOfPeople.stream() // sorted by lastName only > .gather(Gatherers.secondarySort(Comparator.comparing(Person::lastName), > Comparator.comparing(Person::firstName))); > > This kind of thing is common when querying a database that doesn't have > the required composite index corresponding to a desired composite sort, > e.g., there is an index on LastName and an index on FirstName, but no > composite index on LastName+FirstName, etc. > > -- Archie L. Cobbs