Hi Archie!

>I recently encountered a real-world use case for Stream Gatherers and thought 
>I'd mention it (apologies if you've seen these ideas already). These might 
>even be worth considering including as standard offerings.

No need to preface your email with an apology 🙂 I'm happy to get thoughts, 
feedback, and ideas!

You have a pretty interesting use-case, and performing a "sub-group reorder" is 
an interesting use-case.
Pulling on that string a bit more, it would seem like its primary use-cases 
would lie in scenarios where you're consuming things like data files (think 
CSVs etc where there's a predefined order/grouping to the entries) or consuming 
things like database results, would that be a fair characterization?

Interestingly, one of the exercises I went through during the design of 
Gatherer was to reimplement the java.util.stream.Stream interface only using 
Gatherer and Collector, so I took a stab at things like sorted() etc. As is 
common in the situation of operating on top of pre-ordered data, 
parallelization tends to be costly to implement (either by having to forego it, 
or by needing to perform much of the operation in the finisher after 
aggregating elements in the integrator).

Quickly thinking about the operation, I presume you'd need to be able to track 
the current "primary group", collect elements into something like an ArrayList, 
and upon switching to a new primary group, you sort the list according to the 
secondary group, emit all elements for as long as Downstream::push allows you 
to, clear the list, enqueue the new element which had the new primary group. 
Then in the finisher you perform the same operation, presuming that that's the 
signal that there's no more elements in the last primary group.

But I think you can relax the requirement that the primary group must be 
Comparable, you only need to have a BiPredicate and test (prev, next) for 
sameness, and if you get false, you presume that `next` belongs to a new 
primary group. But since you're going to sort the group on the secondary, 
that'll need to be comparable.

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle

________________________________
From: Archie Cobbs <archie.co...@gmail.com>
Sent: Friday, 20 September 2024 19:20
To: Viktor Klang <viktor.kl...@oracle.com>
Cc: core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>
Subject: [External] : Re: Stream Gatherers (JEP 473) feedback

Hi Viktor,

I recently encountered a real-world use case for Stream Gatherers and thought 
I'd mention it (apologies if you've seen these ideas already). These might even 
be worth considering including as standard offerings.

Motivating example: Suppose you have a large list of people sorted by LastName. 
Your goal is to print the list ordered by LastName, then by FirstName. The list 
is too large to re-sort the whole thing.

Instead, you only need to "fixup" the ordering, by re-sorting just the groups 
of people with the same last name.

If you happen to know that there are no more than 100 people with the exact 
same last nam, then one option would be a Gatherer that re-sorts a stream, but 
only up to some maximum window size (if two elements were out of order and 
separated by more than the window size, then sorted output would no longer be 
guaranteed).

In the above example, you would set the window size to 100 and it might look 
something like this:

listOfPeople.stream()   // sorted by lastName only
  .gather(Gatherers.windowSort(100, 
Comparator.comparing(Person::lastName).thenComparing(Person::firstName)));

Another (probably better) option would be a "secondary sort" Gatherer. This 
would take an already sorted stream and then apply a secondary sort. It would 
collect items having the same primary sort key (however many there were), and 
then re-sort those groups all at once using the secondary key:

listOfPeople.stream()   // sorted by lastName only
  .gather(Gatherers.secondarySort(Comparator.comparing(Person::lastName), 
Comparator.comparing(Person::firstName)));

This kind of thing is common when querying a database that doesn't have the 
required composite index corresponding to a desired composite sort, e.g., there 
is an index on LastName and an index on FirstName, but no composite index on 
LastName+FirstName, etc.

-Archie

--
Archie L. Cobbs

Reply via email to