On Tue, 11 Apr 2023 01:18:53 GMT, Stuart Marks <sma...@openjdk.org> wrote:
>> Adds overrides for common Map operations to avoid having to allocate the >> entrySet for Collectors$Partition maps. > > I don't think there is any useful relationship between this work at > SequencedCollections / SequencedMap. The return type of this collector is > simply `Map<...>` so the type information won't be visible and the new > methods won't be visible either. > > It would be useful to examine actual code to see how this is used. > SourceGraph to the rescue: > > https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ajava+Collectors.partitioningBy&patternType=standard&sm=1&groupBy=repo > > I looked at most of the results on the first page of hits, probably about > 20-30 or so. Almost every single one was followed by a `map.get(true)` or > `map.get(false)`, sometimes immediately, sometimes later in the method, or > sometimes in a caller which accepted the partition map as a return value. > (The only exception was Google Error Prone, which iterated over the map's > keys not using its `keySet()` but using a for-each loop over > `ImmutableList.of(false, true)` or `ImmutableList.of(true, false)` and then > getting the value, which seemed kind of ... overkill to me.) > > Anyway what this tells me is that `get()` is the most important thing to > optimize, and filling out things like the various Map views is probably > wasted effort. > > As an aside, it's interesting to look at how code uses `partitioningBy`. For > example, I saw some graph algorithm that was processing edges based on > whether the edge was incoming or outgoing. The code looked something like > this: > > Map<Boolean, List<Edge>> partitionedEdges = > edges.stream()....collect(partitioningBy(Edge::isOutgoing)); > for (var edge : partitionedEdges.get(Boolean.TRUE)) { // outgoing edges > ... many lines of code ... > } > ... many lines of code ... > for (var edge : partitionedEdges.get(Boolean.FALSE)) { // incoming edges > ... > } > > (I note that lots of code used `Boolean.TRUE` and `Boolean.FALSE`, presumably > to avoid boxing overhead, which is small, because it doesn't allocate > anything, but which is a method call instead of a field load.) > > Anyway though my conclusion is that having a boolean map is a lousy way to > represent the return value. If you have a map, which elements are under true > and which are false, if there's no obvious correlation between true/false and > the property you're partitioning by, such as incoming/outgoing? Maybe you'd > want something like this: > > enum Direction { INCOMING, OUTGOING } > > and then you could do a `groupingBy` to get a `Map<Direction, List<Edge>>` > but dealing with a map is still kind of clunky. > > Maybe what you want is a record instead! > > record EdgesByDirection(List<Edge> incoming, List<Edge> outgoing) { } > > OK, enough digression. If anybody is going to put effort in this area, I > think it would be better to explore alternative directions such as the above > instead of optimizing the "map" that's returned by the `partitioningBy` > collector. > > -- > > For this change, I'd definitely keep the `get` override. You have to override > `entrySet` because it's abstract in `AbstractMap`, so something like this > would work: > > return Set.of(Map.entry(false, forFalse), Map.entry(true, forTrue)); > > I'm not sure it's worth overriding even `size` and `isEmpty`; those just get > delegated to the `entrySet` anyway (so there is some creation overhead) but > nobody seems to use them anyway. @stuart-marks Yes, it would perhaps make more sense to have a dedicated structure to represent the return value. But as you say, it'd be a different issue altogether. I'll leave it at the proposed changes and hope we can integrate this. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13347#issuecomment-1502924810