On 10/27/17 1:56 PM, Chris Dennis wrote:
I’m very confused about what was intended with the ‘UNORDERED’ Collector
characteristic. My logical inference was that unordered as a Collector
characteristic meant that the result of this collector applied to any stream
was independent of the order of element delivery. This means that the stream
backend could safely elide any ordered characteristic of the stream driving the
collector and open up more possible execution pathways (parallelism for
example). The javadoc for unordered is two sentences:
/**
* Indicates that the collection operation does not commit to preserving
* the encounter order of input elements. (This might be true if the
* result container has no intrinsic order, such as a {@link Set}.)
*/
The first sentence makes sense, although it is phrased in a way that also
explains the behavior seen if you flag an order sensitive collector as
unordered. The second sentence is weird, it implies a relation between
collectors, encounter order, and an inherent ordering or iteration order of a
collection object that might be the target of a collection. To me the important
property of a Collection object with regards to ordering of a Collector that
fills it is whether reordering the insertions to the collection can change the
resultant state, for a Set this is clearly true since only the first presented
element that is equal will be stored.
I think you need to be very careful making any assumptions about which of equal
elements will end up in a set. Consider the spec of Set.addAll:
Adds all of the elements in the specified collection to this set
if they're not already present.
Suppose the source collection has several non-identical elements that are equal
to each other. This says nothing about which of them is preserved, and indeed
there is no mention whatsoever about the identity of equal objects. The entire
Set spec is defined in terms of equals, not identity. So I think it's mistaken
to assume that the first of equal but non-identical objects will be stored.
(This is also the reason I changed the specification of Set.copyOf in my recent
draft, to remove the requirement that the first such element be preserved.
Set.addAll and Collectors.toSet make no such stipulation.)
There is also a paragraph in the Collector javadoc:
* <p>For collectors that do not have the {@code UNORDERED} characteristic,
* two accumulated results {@code a1} and {@code a2} are equivalent if
* {@code finisher.apply(a1).equals(finisher.apply(a2))}. For unordered
* collectors, equivalence is relaxed to allow for non-equality related to
* differences in order. (For example, an unordered collector that accumulated
* elements to a {@code List} would consider two lists equivalent if they
* contained the same elements, ignoring order.)
This seems weird, why would we try to define correctness of a parallel
reduction against a collector that was sensitive to ordering.
This is indeed a bit odd, but it makes sense. You might have a collector that
collects into a List, because you want to collect duplicates. But you might not
actually care about the order; you just want all the elements. The fact that
List stores elements in some order is irrelevant. In this case such a collector
would quite reasonably have the UNORDERED characteristic.
Finally, when investigating the properties of the collectors returned from the
Collectors static factory I was surprised to discover that none of the
collectors that are truly unordered were marked as such (counting, min, max,
averaging, summing, summarizing), and the only collector that was marked as
unordered was Collectors.toSet(), which although it is explicitly marked as
unordered seems like it really shouldn’t be.
Whats going on here? Which parts of all this are intended and which (if any)
are bugs?
Well I think it does make sense for the toSet() collector to be UNORDERED. It
may be the case, though, that other collectors are lacking the UNORDERED
characteristic that they should have. That sounds like it might be a bug.
s'marks
Thanks in advance for any enlightenment,
Chris Dennis
P.S. Coincidentally the unordered toSet() collector works perfectly at the
moment due to the happy interaction of the Spliterator.trySplit() contract and
the folding/combining behavior of the stream implementations parallel reduction
algorithm.