On Wed, 5 Apr 2023 09:19:57 GMT, Viktor Klang <d...@openjdk.org> wrote:

> Adds overrides for common Map operations to avoid having to allocate the 
> entrySet for Collectors$Partition maps.

I don't think there is any useful relationship between this work at 
SequencedCollections / SequencedMap. The return type of this collector is 
simply `Map<...>` so the type information won't be visible and the new methods 
won't be visible either.

It would be useful to examine actual code to see how this is used. SourceGraph 
to the rescue:

https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ajava+Collectors.partitioningBy&patternType=standard&sm=1&groupBy=repo

I looked at most of the results on the first page of hits, probably about 20-30 
or so. Almost every single one was followed by a `map.get(true)` or 
`map.get(false)`, sometimes immediately, sometimes later in the method, or 
sometimes in a caller which accepted the partition map as a return value. (The 
only exception was Google Error Prone, which iterated over the map's keys not 
using its `keySet()` but using a for-each loop over `ImmutableList.of(false, 
true)` or `ImmutableList.of(true, false)` and then getting the value, which 
seemed kind of ... overkill to me.)

Anyway what this tells me is that `get()` is the most important thing to 
optimize, and filling out things like the various Map views is probably wasted 
effort.

As an aside, it's interesting to look at how code uses `partitioningBy`. For 
example, I saw some graph algorithm that was processing edges based on whether 
the edge was incoming or outgoing. The code looked something like this:

    Map<Boolean, List<Edge>> partitionedEdges = 
edges.stream()....collect(partitioningBy(Edge::isOutgoing));
    for (var edge : partitionedEdges.get(Boolean.TRUE)) { // outgoing edges
        ... many lines of code ...
    }
    ... many lines of code ...
    for (var edge : partitionedEdges.get(Boolean.FALSE)) { // incoming edges
        ...
    }

(I note that lots of code used `Boolean.TRUE` and `Boolean.FALSE`, presumably 
to avoid boxing overhead, which is small, because it doesn't allocate anything, 
but which is a method call instead of a field load.)

Anyway though my conclusion is that having a boolean map is a lousy way to 
represent the return value. If you have a map, which elements are under true 
and which are false, if there's no obvious correlation between true/false and 
the property you're partitioning by, such as incoming/outgoing? Maybe you'd 
want something like this:

    enum Direction { INCOMING, OUTGOING }

and then you could do a `groupingBy` to get a `Map<Direction, List<Edge>>` but 
dealing with a map is still kind of clunky.

Maybe what you want is a record instead!

    record EdgesByDirection(List<Edge> incoming, List<Edge> outgoing) { }

OK, enough digression. If anybody is going to put effort in this area, I think 
it would be better to explore alternative directions such as the above instead 
of optimizing the "map" that's returned by the `partitioningBy` collector.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13347#issuecomment-1502548705

Reply via email to