> From: "Viktor Klang" <viktor.kl...@oracle.com> > To: "Remi Forax" <fo...@univ-mlv.fr> > Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" > <psan...@openjdk.java.net> > Sent: Monday, January 22, 2024 4:22:11 PM > Subject: Re: Gatherer: spliterator characteristics are not propagated
> Hi Rémi, Hello, > For instance, stateless is neither recessive nor dominant, since the > composition > of two stateless operations is only ever stateless if they both are greedy as > well: [ > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L588 > | > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L588 > ] Okay, so choosing SEQUENTIAL vs PARALELLIZABLE is not that important given that the combination is ad-hoc, reflecting the characterristics is enough. > So even if making it represented as ints (more like Spliterator, rather than > Collector) makes things faster, it's still both work to track, propagate, and > also becomes a side-channel that needs to remain in sync with the actual > implementation of the logic. The flags are in sync with the implementation because the only way to create a Gatherer if through the factory methods and those factory methods (and only them) compute the proper combination of SEQUENTIAL | STATELESS | GREEDY. A user should not be able to set those flags. Only the flags KEEP_* are settable by a user. > One could argue that logic such as: someCollection.stream().map(…).count() is > a > performance bug/inefficiency in an of itself as it would be faster to do > someCollection.size(). The stream implementation has a whole mechanism in place to propagate/preverse flags like SIZED, DISTINCT or SORTED. For me, discussing about the merit of this mechanism seems a little off topic. I would prefer the Gatherer to be a good citizen and works seemlessly with the other intermediary operations. > Cheers, > √ regards, Rémi > Viktor Klang > Software Architect, Java Platform Group > Oracle > From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> > Sent: Saturday, 20 January 2024 17:40 > To: Viktor Klang <viktor.kl...@oracle.com> > Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz > <psan...@openjdk.java.net> > Subject: [External] : Re: Gatherer: spliterator characteristics are not > propagated >> From: "Viktor Klang" <viktor.kl...@oracle.com> >> To: "Remi Forax" <fo...@univ-mlv.fr> >> Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" >> <psan...@openjdk.java.net> >> Sent: Thursday, January 18, 2024 5:14:38 PM >> Subject: Re: Gatherer: spliterator characteristics are not propagated >>> And for A.andThen(B), A.flags & B.flags should work, the stream is sorted if >>> both gatherers keep it sorted. >> That is unfortunately not the case. That would presume that you can implement >> the composition such that it can preserve all the common flags. Some flags >> could be "dominant" and some "recessive" to use genetics nomenclature. > Some flags of the stream pipeline are "recessive", mainly SHORT_CIRCUIT, but > not > the characteristics of the Gatherer which can have the corresponding > "dominant" > flag, GREEDY, in this case. > And the same for sequential, the flag should be PARALELIZABLE and not > SEQUENTIAL. > The idea is that the Gatherer characteristics can have the same bit set at the > same position as the stream op flags (as defined by StreamOpFlag). > So KEEP_DISTINCT is in position 0, KEEP_SORTED in in position 2 and KEEP_SIZED > is in position 3. > For GREEDY, we use the same position as SHORT_CIRCUIT and we will flip the bit > (using XOR) when we want to extract the stream op flags from the > characteristics > All the factory methods call the generic of() with a combination of > PARALELIZABLE and STATELESS and the user can adds the characteristics GREEDY, > KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED (otherwise an exception should be > raised). > In StreamOpFlag, there are two unused positions (14 and 15), that's perfect > for > our two new states PARALELIZABLE and STATELESS, so no problem here > (technically > we can also reuse positions of the Spliterator characteristic given that those > flags are masked before being sent to the GathererOp super constructor). > The way to transform a Gatherer characteristics op to a stream flags op is to > flip the bits corresponding to SHORT_CIRCUIT, add the highter bit of all flags > but SHORT-CIRCUIT (because stream op flags are encoded using 2 bits) and mask > to only retain SHORT_CIRCUIT, KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED. > public static int toOpFlags ( int characteristics ) { > return (( characteristics ^ SHORT_CIRCUIT_MASK ) | HIGHER_BITS ) & > STREAM_OP_MASK ; > } > see below for a full script. >>> I suppose that if those flags already exist, it's because they have a >>> purpose >>> and i do not understand how it can make the other operations slower. >> Extra invocations, extra storage, and extra composition overhead is not free. >> Since Stream is one-shot you need to include the construction cost with the >> execution cost. For something like an empty Stream construction cost scan be >> 90+% of the total costs. > If you create a Gatherer, the characteristics is a constant (so the validity > check is removed, it's just a mask and a test) so the result of calling > toOpFlags() is a constant too. > If the factory method is not inlined, the cost is 3 bitwise operations which > is > I believe faster than the "instanceof Greedy" used in the current code. >> Cheers, >> √ > regards, > Rémi > --- > public class GathererFlag { > // cut and paste from StreamOpFlag > /** > * The bit pattern for setting/injecting a flag. > */ > private static final int SET_BITS = 0b01 ; > /** > * The bit pattern for clearing a flag. > */ > private static final int CLEAR_BITS = 0b10 ; > /** > * The bit pattern for preserving a flag. > */ > private static final int PRESERVE_BITS = 0b11 ; > private static int position ( int opFlagSet ) { > return Integer . numberOfTrailingZeros ( opFlagSet ) >> 1 ; > } > private static final int DISTINCT_POSITION = position ( StreamOpFlag . > IS_DISTINCT ); > private static final int SORTED_POSITION = position ( StreamOpFlag . IS_SORTED > ); > private static final int SIZED_POSITION = position ( StreamOpFlag . IS_SIZED > ); > private static final int SHORT_CIRCUIT_POSITION = position ( StreamOpFlag . > IS_SHORT_CIRCUIT ); > private static final int STATELESS_POSITION = 14 ; > private static final int PARELLIZABLE_POSITION = 15 ; > public static final int PARELLIZABLE = SET_BITS << ( PARELLIZABLE_POSITION << > 1 > ); > public static final int STATELESS = SET_BITS << ( STATELESS_POSITION << 1 ); > public static final int GREEDY = SET_BITS << ( SHORT_CIRCUIT_POSITION << 1 ); > public static final int KEEP_DISTINCT = SET_BITS << ( DISTINCT_POSITION << 1 > ); > public static final int KEEP_SORTED = SET_BITS << ( SORTED_POSITION << 1 ); > public static final int KEEP_SIZED = SET_BITS << ( SIZED_POSITION << 1 ); > private static final int SHORT_CIRCUIT_MASK = SET_BITS << ( > SHORT_CIRCUIT_POSITION << 1 ); > // no GREEDY here > private static final int HIGHER_BITS = ( PARELLIZABLE | STATELESS | > KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ) << 1 ; > private static final int STREAM_OP_MASK = > (( GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ) << 1 ) | > GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ; > public static String toString ( int characteristics ) { > return Stream . of ( characteristics ) > .< String >mapMulti(( f , consumer ) -> { > if (( f & PARELLIZABLE ) == PARELLIZABLE ) { > consumer .accept( "PARELLIZABLE" ); > } > if (( f & STATELESS ) == STATELESS ) { > consumer .accept( "STATELESS" ); > } > if (( f & GREEDY ) == GREEDY ) { > consumer .accept( "GREEDY" ); > } > if (( f & KEEP_DISTINCT ) == KEEP_DISTINCT ) { > consumer .accept( "KEEP_DISTINCT" ); > } > if (( f & KEEP_SORTED ) == KEEP_SORTED ) { > consumer .accept( "KEEP_SORTED" ); > } > if (( f & KEEP_SIZED ) == KEEP_SIZED ) { > consumer .accept( "KEEP_SIZED" ); > } > }) > .collect( Collectors . joining ( ", " )); > } > public static int toOpFlags ( int characteristics ) { > return (( characteristics ^ SHORT_CIRCUIT_MASK ) | HIGHER_BITS ) & > STREAM_OP_MASK ; > } > public static String toOpFlagsString ( int opFlags ) { > return Arrays . stream ( StreamOpFlag . values ()) > .map( op -> { > if ( op .isPreserved( opFlags )) { > return "preserved " + op ; > } > if ( op .isCleared( opFlags )) { > return "cleared " + op ; > } > if ( op .isKnown( opFlags )) { > return "set " + op ; > } > return "?? " + op ; > }) > .collect( Collectors . joining ( ", " )); > } > void main () { > var characteristics = PARELLIZABLE | STATELESS | GREEDY | KEEP_DISTINCT | > KEEP_SORTED | KEEP_SIZED ; > System . out .println( toOpFlagsString ( toOpFlags ( characteristics ))); > var characteristics2 = STATELESS | KEEP_DISTINCT | KEEP_SIZED ; > System . out .println( toOpFlagsString ( toOpFlags ( characteristics2 ))); > } > } >> Viktor Klang >> Software Architect, Java Platform Group >> Oracle >> From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> >> Sent: Thursday, 18 January 2024 16:17 >> To: Viktor Klang <viktor.kl...@oracle.com> >> Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz >> <psan...@openjdk.java.net> >> Subject: [External] : Re: Gatherer: spliterator characteristics are not >> propagated >>> From: "Viktor Klang" <viktor.kl...@oracle.com> >>> To: "Remi Forax" <fo...@univ-mlv.fr> >>> Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" >>> <psan...@openjdk.java.net> >>> Sent: Thursday, January 18, 2024 3:36:07 PM >>> Subject: Re: Gatherer: spliterator characteristics are not propagated >>> I suspect that it is a rather slippery slope, once KEEP-flags are added, >>> then >>> others will want to be able to have INJECT-flags, and then people might have >>> different opinions w.r.t. the default should be to clear all flags etc. >>> And that's even before one looks at the composition-part of it, what are the >>> flags for A.andThen(B)? (then extrapolate to N compositions and the >>> available >>> set of flags always approaches 0) >>> I spent quite a bit of time on this and in the end tracking all this info, >>> and >>> making sure that the flags of implementations correspond to the actual >>> behavior, just ended up costing performance for most streams and introduced >>> an >>> extra dimension to creation and maintenance which I had a hard time >>> justifying. >> It can be a slippery slope if we were designing from the ground up but the >> stream implementation already exists and SORTED, DISTINCT and SIZED are the >> flags that are already tracked by the current implementation. >> Currently only SHORT_CIRCUIT is set (if not greedy), >> see [ >> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java*L209__;Iw!!ACWV5N9M2RV99hQ!PhMxqlDzLWPRuwYc7ECRKNPVs0BtnoE-RdT-Jdkng7S-iFuERAHYcWvJ-OMKGLrkPdSrUl3xj1R9ypyeqeWI$ >> | >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java#L209 >> ] >> And for A.andThen(B), A.flags & B.flags should work, the stream is sorted if >> both gatherers keep it sorted. >>> Making specific, rare, combinations of operations faster at the expense of >>> making 99% of all others slower is a hard pill for most to swallow. >> I suppose that if those flags already exist, it's because they have a purpose >> and i do not understand how it can make the other operations slower. >>> Cheers, >>> √ >> regards, >> Rémi >>> Viktor Klang >>> Software Architect, Java Platform Group >>> Oracle >>> From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> >>> Sent: Thursday, 18 January 2024 10:28 >>> To: Viktor Klang <viktor.kl...@oracle.com> >>> Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz >>> <psan...@openjdk.java.net> >>> Subject: [External] : Re: Gatherer: spliterator characteristics are not >>> propagated >>>> From: "Viktor Klang" <viktor.kl...@oracle.com> >>>> To: "Remi Forax" <fo...@univ-mlv.fr>, "core-libs-dev" >>>> <core-libs-...@openjdk.java.net> >>>> Sent: Wednesday, January 17, 2024 8:48:07 PM >>>> Subject: Re: Gatherer: spliterator characteristics are not propagated >>>> Hi Rémi, >>>> You can find some of my benches here: [ >>>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_au5zyje2l$ >>>> | >>>> https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref >>>> ] >>>> Initially I had Characteristics such as ORDERED etc on Gatherer but it just >>>> didn't end up worth it when looking at the bench results over a wide array >>>> of >>>> stream sizes and number of operations. >>> I think there are 3 gatherer characteristics that make sense: KEEP_SORTED, >>> KEEP_DISTINCT and KEEP_SIZED, >>> all of them say that if the stream was sorted/distinct/sized then the stream >>> returned by a call to gather() is still sorted (with the same comparator), >>> distinct or sized. >>> As examples, map() is KEEP_SIZED, filter() is KEEP_SORTED | KEEP_DISTINCT >>> and >>> windowFixed is KEEP_DISTINCT. >>> [CC Paul, so he can correct me if i'm saying something stupid] >>> Now for the benchmarks, it depends what you want to measure, benchmarking >>> streams is tricky. This is what i know about benchmarking streams. >>> First, the JIT has two ways to profile types at runtime, >>> Either a method takes a function as parameter >>> void map(Function function) { >>> function.apply(...) >>> } >>> and when map is called with a subtype of Function, the JIT will propagate >>> the >>> exact type when map is inlined, >>> Or a method use a field >>> class Op { >>> Function function; >>> void map() { >>> function.apply(...) >>> } >>> } >>> in that case, the VM records the profile of function.apply() and if there >>> are >>> more than two different profiles, the VM declare profile poluttion and do >>> not >>> try to optimize. >>> The Stream implementation tries very hard to use only parameters instead of >>> fields, that's why it does not use classical Iterator that are pull >>> iterator (a >>> filter iterator requires a field) but a Spliterator which is a push >>> iterator, >>> the element is sent as parameter of the consumer.That's also why collect >>> does >>> not use the builder pattern (that accumulate values in fields) but a >>> Collector >>> that publish the functions to be called as parameter. >>> Obvisously, this is more complex than that, a Collector stores the >>> functions in >>> fields so it should not work well but the implementation uses a record that >>> plays well with escape analysis. Escape analysis see the fields of an >>> instance >>> as parameters so the functions of a Collector are correctly propagated (if >>> the >>> escape analysis works). And lambdas are using invokedynamic, and the VM >>> tries >>> really hard to inline invokedynamic, so lambdas (that captures value or not) >>> are routinely fully inlined with the intermediate operation of a stream. >>> In your tests, i've not seen comparaisons between an existing method like >>> map() >>> or filter() followed by a sorted()/distinct()/count()/toList(), i.e. >>> operations >>> where the characteristics KEEP_* have an impact and their equivalent using a >>> Gatherer. >>>> Cheers, >>>> √ >>> regards, >>> Rémi >>>> Viktor Klang >>>> Software Architect, Java Platform Group >>>> Oracle >>>> From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Remi >>>> Forax >>>> <fo...@univ-mlv.fr> >>>> Sent: Wednesday, 17 January 2024 16:48 >>>> To: core-libs-dev <core-libs-...@openjdk.java.net> >>>> Subject: Gatherer: spliterator characteristics are not propagated >>>> While doing some benchmarking of the Gatherer API, i've found that the >>>> characteristics of the spliterator was not propagated by the method >>>> Stream.gather() making the stream slower than it should. >>>> As an example, there is no way when reimplementing map() using a Gatherer >>>> to say >>>> that this intermediate operation keep the size, which is important if the >>>> terminal operation is toList() because if the size is known, toList() will >>>> presize the List and avoid the creation of an intermediary ArrayList. >>>> See [ >>>> https://urldefense.com/v3/__https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_auzwTY8aB$ >>>> | >>>> https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java >>>> ] >>>> I think that adding a way to propagate the spliterator characteristics >>>> through a >>>> Gatherer would greatly improve the performance of commons streams (at >>>> least all >>>> the ones that end with a call to toList). >>>> I have some idea of how to do that, but I prefer first to hear if i've >>>> overlook >>>> something and if improving the performance is something worth changing the >>>> API. >>>> regards, >>>> Rémi