On Tue, 11 Apr 2023 01:18:53 GMT, Stuart Marks <sma...@openjdk.org> wrote:

>> Adds overrides for common Map operations to avoid having to allocate the 
>> entrySet for Collectors$Partition maps.
>
> I don't think there is any useful relationship between this work at 
> SequencedCollections / SequencedMap. The return type of this collector is 
> simply `Map<...>` so the type information won't be visible and the new 
> methods won't be visible either.
> 
> It would be useful to examine actual code to see how this is used. 
> SourceGraph to the rescue:
> 
> https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ajava+Collectors.partitioningBy&patternType=standard&sm=1&groupBy=repo
> 
> I looked at most of the results on the first page of hits, probably about 
> 20-30 or so. Almost every single one was followed by a `map.get(true)` or 
> `map.get(false)`, sometimes immediately, sometimes later in the method, or 
> sometimes in a caller which accepted the partition map as a return value. 
> (The only exception was Google Error Prone, which iterated over the map's 
> keys not using its `keySet()` but using a for-each loop over 
> `ImmutableList.of(false, true)` or `ImmutableList.of(true, false)` and then 
> getting the value, which seemed kind of ... overkill to me.)
> 
> Anyway what this tells me is that `get()` is the most important thing to 
> optimize, and filling out things like the various Map views is probably 
> wasted effort.
> 
> As an aside, it's interesting to look at how code uses `partitioningBy`. For 
> example, I saw some graph algorithm that was processing edges based on 
> whether the edge was incoming or outgoing. The code looked something like 
> this:
> 
>     Map<Boolean, List<Edge>> partitionedEdges = 
> edges.stream()....collect(partitioningBy(Edge::isOutgoing));
>     for (var edge : partitionedEdges.get(Boolean.TRUE)) { // outgoing edges
>         ... many lines of code ...
>     }
>     ... many lines of code ...
>     for (var edge : partitionedEdges.get(Boolean.FALSE)) { // incoming edges
>         ...
>     }
> 
> (I note that lots of code used `Boolean.TRUE` and `Boolean.FALSE`, presumably 
> to avoid boxing overhead, which is small, because it doesn't allocate 
> anything, but which is a method call instead of a field load.)
> 
> Anyway though my conclusion is that having a boolean map is a lousy way to 
> represent the return value. If you have a map, which elements are under true 
> and which are false, if there's no obvious correlation between true/false and 
> the property you're partitioning by, such as incoming/outgoing? Maybe you'd 
> want something like this:
> 
>     enum Direction { INCOMING, OUTGOING }
> 
> and then you could do a `groupingBy` to get a `Map<Direction, List<Edge>>` 
> but dealing with a map is still kind of clunky.
> 
> Maybe what you want is a record instead!
> 
>     record EdgesByDirection(List<Edge> incoming, List<Edge> outgoing) { }
> 
> OK, enough digression. If anybody is going to put effort in this area, I 
> think it would be better to explore alternative directions such as the above 
> instead of optimizing the "map" that's returned by the `partitioningBy` 
> collector.
> 
> --
> 
> For this change, I'd definitely keep the `get` override. You have to override 
> `entrySet` because it's abstract in `AbstractMap`, so something like this 
> would work:
> 
>     return Set.of(Map.entry(false, forFalse), Map.entry(true, forTrue));
> 
> I'm not sure it's worth overriding even `size` and `isEmpty`; those just get 
> delegated to the `entrySet` anyway (so there is some creation overhead) but 
> nobody seems to use them anyway.

@stuart-marks Yes, it would perhaps make more sense to have a dedicated 
structure to represent the return value. But as you say, it'd be a different 
issue altogether. I'll leave it at the proposed changes and hope we can 
integrate this. :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13347#issuecomment-1502924810

Reply via email to