> From: "Viktor Klang" <viktor.kl...@oracle.com> > To: "Remi Forax" <fo...@univ-mlv.fr> > Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" > <psan...@openjdk.java.net> > Sent: Monday, January 22, 2024 10:06:27 PM > Subject: Re: Gatherer: spliterator characteristics are not propagated
>> The flags are in sync with the implementation because the only way to create >> a >> Gatherer if through the factory methods and those factory methods (and only >> them) compute the proper combination of SEQUENTIAL | STATELESS | GREEDY. A >> user >> should not be able to set those flags. Only the flags KEEP_* are settable by >> a >> user. > But I presume this also requires to have a `int characteristics()`-method on > the > Gatherer interfacewhich means that users who are not using the factory methods > will have full possibility of not only returning the flags, but returning any > int. The current implementation suffers the same kind of issue, it's easy to write a mutable Gatherer and change the functions after creation, worst, right in the middle of a call to stream.gather(...). Perhaps the Gatherer interface should be sealed ? We did not have that option during the 1.8 timeframe, when the Collector API was created. >> The stream implementation has a whole mechanism in place to >> propagate/preverse >> flags like SIZED, DISTINCT or SORTED. For me, discussing about the merit of >> this mechanism seems a little off topic. I would prefer the Gatherer to be a >> good citizen and works seemlessly with the other intermediary operations. > I can see where you're coming from here, but to me, adding API surface needs > to > pull its weight. > In this case I wasn't convinced that it did, hence we're having this > conversation. \uD83D\uDE42 > Cheers, > √ regards, Rémi > Viktor Klang > Software Architect, Java Platform Group > Oracle > From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> > Sent: Monday, 22 January 2024 19:56 > To: Viktor Klang <viktor.kl...@oracle.com> > Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz > <psan...@openjdk.java.net> > Subject: [External] : Re: Gatherer: spliterator characteristics are not > propagated >> From: "Viktor Klang" <viktor.kl...@oracle.com> >> To: "Remi Forax" <fo...@univ-mlv.fr> >> Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" >> <psan...@openjdk.java.net> >> Sent: Monday, January 22, 2024 4:22:11 PM >> Subject: Re: Gatherer: spliterator characteristics are not propagated >> Hi Rémi, > Hello, >> For instance, stateless is neither recessive nor dominant, since the >> composition >> of two stateless operations is only ever stateless if they both are greedy as >> well: [ >> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java*L588__;Iw!!ACWV5N9M2RV99hQ!Lm52jd6kovd5t-cmrqqSLiRcIajBGXLxh85LO3eeiL6UxbKZuNPcUnO6z2i0FzMEoNr7U-cOBuWPCjo57FVW$ >> | >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L588 >> ] > Okay, so choosing SEQUENTIAL vs PARALELLIZABLE is not that important given > that > the combination is ad-hoc, reflecting the characterristics is enough. >> So even if making it represented as ints (more like Spliterator, rather than >> Collector) makes things faster, it's still both work to track, propagate, and >> also becomes a side-channel that needs to remain in sync with the actual >> implementation of the logic. > The flags are in sync with the implementation because the only way to create a > Gatherer if through the factory methods and those factory methods (and only > them) compute the proper combination of SEQUENTIAL | STATELESS | GREEDY. A > user > should not be able to set those flags. Only the flags KEEP_* are settable by a > user. >> One could argue that logic such as: someCollection.stream().map(…).count() >> is a >> performance bug/inefficiency in an of itself as it would be faster to do >> someCollection.size(). > The stream implementation has a whole mechanism in place to propagate/preverse > flags like SIZED, DISTINCT or SORTED. For me, discussing about the merit of > this mechanism seems a little off topic. I would prefer the Gatherer to be a > good citizen and works seemlessly with the other intermediary operations. > Cheers, > √ > Viktor Klang > Software Architect, Java Platform Group > Oracle > From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> > Sent: Monday, 22 January 2024 19:56 > To: Viktor Klang <viktor.kl...@oracle.com> > Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz > <psan...@openjdk.java.net> > Subject: [External] : Re: Gatherer: spliterator characteristics are not > propagated >> From: "Viktor Klang" <viktor.kl...@oracle.com> >> To: "Remi Forax" <fo...@univ-mlv.fr> >> Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" >> <psan...@openjdk.java.net> >> Sent: Monday, January 22, 2024 4:22:11 PM >> Subject: Re: Gatherer: spliterator characteristics are not propagated >> Hi Rémi, > Hello, >> For instance, stateless is neither recessive nor dominant, since the >> composition >> of two stateless operations is only ever stateless if they both are greedy as >> well: [ >> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java*L588__;Iw!!ACWV5N9M2RV99hQ!Lm52jd6kovd5t-cmrqqSLiRcIajBGXLxh85LO3eeiL6UxbKZuNPcUnO6z2i0FzMEoNr7U-cOBuWPCjo57FVW$ >> | >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L588 >> ] > Okay, so choosing SEQUENTIAL vs PARALELLIZABLE is not that important given > that > the combination is ad-hoc, reflecting the characterristics is enough. >> So even if making it represented as ints (more like Spliterator, rather than >> Collector) makes things faster, it's still both work to track, propagate, and >> also becomes a side-channel that needs to remain in sync with the actual >> implementation of the logic. > The flags are in sync with the implementation because the only way to create a > Gatherer if through the factory methods and those factory methods (and only > them) compute the proper combination of SEQUENTIAL | STATELESS | GREEDY. A > user > should not be able to set those flags. Only the flags KEEP_* are settable by a > user. >> One could argue that logic such as: someCollection.stream().map(…).count() >> is a >> performance bug/inefficiency in an of itself as it would be faster to do >> someCollection.size(). > The stream implementation has a whole mechanism in place to propagate/preverse > flags like SIZED, DISTINCT or SORTED. For me, discussing about the merit of > this mechanism seems a little off topic. I would prefer the Gatherer to be a > good citizen and works seemlessly with the other intermediary operations. >> Cheers, >> √ > regards, > Rémi >> Viktor Klang >> Software Architect, Java Platform Group >> Oracle >> From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> >> Sent: Saturday, 20 January 2024 17:40 >> To: Viktor Klang <viktor.kl...@oracle.com> >> Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz >> <psan...@openjdk.java.net> >> Subject: [External] : Re: Gatherer: spliterator characteristics are not >> propagated >>> From: "Viktor Klang" <viktor.kl...@oracle.com> >>> To: "Remi Forax" <fo...@univ-mlv.fr> >>> Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" >>> <psan...@openjdk.java.net> >>> Sent: Thursday, January 18, 2024 5:14:38 PM >>> Subject: Re: Gatherer: spliterator characteristics are not propagated >>>> And for A.andThen(B), A.flags & B.flags should work, the stream is sorted >>>> if >>>> both gatherers keep it sorted. >>> That is unfortunately not the case. That would presume that you can >>> implement >>> the composition such that it can preserve all the common flags. Some flags >>> could be "dominant" and some "recessive" to use genetics nomenclature. >> Some flags of the stream pipeline are "recessive", mainly SHORT_CIRCUIT, but >> not >> the characteristics of the Gatherer which can have the corresponding >> "dominant" >> flag, GREEDY, in this case. >> And the same for sequential, the flag should be PARALELIZABLE and not >> SEQUENTIAL. >> The idea is that the Gatherer characteristics can have the same bit set at >> the >> same position as the stream op flags (as defined by StreamOpFlag). >> So KEEP_DISTINCT is in position 0, KEEP_SORTED in in position 2 and >> KEEP_SIZED >> is in position 3. >> For GREEDY, we use the same position as SHORT_CIRCUIT and we will flip the >> bit >> (using XOR) when we want to extract the stream op flags from the >> characteristics >> All the factory methods call the generic of() with a combination of >> PARALELIZABLE and STATELESS and the user can adds the characteristics GREEDY, >> KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED (otherwise an exception should be >> raised). >> In StreamOpFlag, there are two unused positions (14 and 15), that's perfect >> for >> our two new states PARALELIZABLE and STATELESS, so no problem here >> (technically >> we can also reuse positions of the Spliterator characteristic given that >> those >> flags are masked before being sent to the GathererOp super constructor). >> The way to transform a Gatherer characteristics op to a stream flags op is to >> flip the bits corresponding to SHORT_CIRCUIT, add the highter bit of all >> flags >> but SHORT-CIRCUIT (because stream op flags are encoded using 2 bits) and mask >> to only retain SHORT_CIRCUIT, KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED. >> public static int toOpFlags ( int characteristics ) { >> return (( characteristics ^ SHORT_CIRCUIT_MASK ) | HIGHER_BITS ) & >> STREAM_OP_MASK ; >> } >> see below for a full script. >>>> I suppose that if those flags already exist, it's because they have a >>>> purpose >>>> and i do not understand how it can make the other operations slower. >>> Extra invocations, extra storage, and extra composition overhead is not >>> free. >>> Since Stream is one-shot you need to include the construction cost with the >>> execution cost. For something like an empty Stream construction cost scan be >>> 90+% of the total costs. >> If you create a Gatherer, the characteristics is a constant (so the validity >> check is removed, it's just a mask and a test) so the result of calling >> toOpFlags() is a constant too. >> If the factory method is not inlined, the cost is 3 bitwise operations which >> is >> I believe faster than the "instanceof Greedy" used in the current code. >>> Cheers, >>> √ >> regards, >> Rémi >> --- >> public class GathererFlag { >> // cut and paste from StreamOpFlag >> /** >> * The bit pattern for setting/injecting a flag. >> */ >> private static final int SET_BITS = 0b01 ; >> /** >> * The bit pattern for clearing a flag. >> */ >> private static final int CLEAR_BITS = 0b10 ; >> /** >> * The bit pattern for preserving a flag. >> */ >> private static final int PRESERVE_BITS = 0b11 ; >> private static int position ( int opFlagSet ) { >> return Integer . numberOfTrailingZeros ( opFlagSet ) >> 1 ; >> } >> private static final int DISTINCT_POSITION = position ( StreamOpFlag . >> IS_DISTINCT ); >> private static final int SORTED_POSITION = position ( StreamOpFlag . >> IS_SORTED >> ); >> private static final int SIZED_POSITION = position ( StreamOpFlag . IS_SIZED >> ); >> private static final int SHORT_CIRCUIT_POSITION = position ( StreamOpFlag . >> IS_SHORT_CIRCUIT ); >> private static final int STATELESS_POSITION = 14 ; >> private static final int PARELLIZABLE_POSITION = 15 ; >> public static final int PARELLIZABLE = SET_BITS << ( PARELLIZABLE_POSITION >> << 1 >> ); >> public static final int STATELESS = SET_BITS << ( STATELESS_POSITION << 1 ); >> public static final int GREEDY = SET_BITS << ( SHORT_CIRCUIT_POSITION << 1 ); >> public static final int KEEP_DISTINCT = SET_BITS << ( DISTINCT_POSITION << 1 >> ); >> public static final int KEEP_SORTED = SET_BITS << ( SORTED_POSITION << 1 ); >> public static final int KEEP_SIZED = SET_BITS << ( SIZED_POSITION << 1 ); >> private static final int SHORT_CIRCUIT_MASK = SET_BITS << ( >> SHORT_CIRCUIT_POSITION << 1 ); >> // no GREEDY here >> private static final int HIGHER_BITS = ( PARELLIZABLE | STATELESS | >> KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ) << 1 ; >> private static final int STREAM_OP_MASK = >> (( GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ) << 1 ) | >> GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ; >> public static String toString ( int characteristics ) { >> return Stream . of ( characteristics ) >> .< String >mapMulti(( f , consumer ) -> { >> if (( f & PARELLIZABLE ) == PARELLIZABLE ) { >> consumer .accept( "PARELLIZABLE" ); >> } >> if (( f & STATELESS ) == STATELESS ) { >> consumer .accept( "STATELESS" ); >> } >> if (( f & GREEDY ) == GREEDY ) { >> consumer .accept( "GREEDY" ); >> } >> if (( f & KEEP_DISTINCT ) == KEEP_DISTINCT ) { >> consumer .accept( "KEEP_DISTINCT" ); >> } >> if (( f & KEEP_SORTED ) == KEEP_SORTED ) { >> consumer .accept( "KEEP_SORTED" ); >> } >> if (( f & KEEP_SIZED ) == KEEP_SIZED ) { >> consumer .accept( "KEEP_SIZED" ); >> } >> }) >> .collect( Collectors . joining ( ", " )); >> } >> public static int toOpFlags ( int characteristics ) { >> return (( characteristics ^ SHORT_CIRCUIT_MASK ) | HIGHER_BITS ) & >> STREAM_OP_MASK ; >> } >> public static String toOpFlagsString ( int opFlags ) { >> return Arrays . stream ( StreamOpFlag . values ()) >> .map( op -> { >> if ( op .isPreserved( opFlags )) { >> return "preserved " + op ; >> } >> if ( op .isCleared( opFlags )) { >> return "cleared " + op ; >> } >> if ( op .isKnown( opFlags )) { >> return "set " + op ; >> } >> return "?? " + op ; >> }) >> .collect( Collectors . joining ( ", " )); >> } >> void main () { >> var characteristics = PARELLIZABLE | STATELESS | GREEDY | KEEP_DISTINCT | >> KEEP_SORTED | KEEP_SIZED ; >> System . out .println( toOpFlagsString ( toOpFlags ( characteristics ))); >> var characteristics2 = STATELESS | KEEP_DISTINCT | KEEP_SIZED ; >> System . out .println( toOpFlagsString ( toOpFlags ( characteristics2 ))); >> } >> } >>> Viktor Klang >>> Software Architect, Java Platform Group >>> Oracle >>> From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> >>> Sent: Thursday, 18 January 2024 16:17 >>> To: Viktor Klang <viktor.kl...@oracle.com> >>> Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz >>> <psan...@openjdk.java.net> >>> Subject: [External] : Re: Gatherer: spliterator characteristics are not >>> propagated >>>> From: "Viktor Klang" <viktor.kl...@oracle.com> >>>> To: "Remi Forax" <fo...@univ-mlv.fr> >>>> Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" >>>> <psan...@openjdk.java.net> >>>> Sent: Thursday, January 18, 2024 3:36:07 PM >>>> Subject: Re: Gatherer: spliterator characteristics are not propagated >>>> I suspect that it is a rather slippery slope, once KEEP-flags are added, >>>> then >>>> others will want to be able to have INJECT-flags, and then people might >>>> have >>>> different opinions w.r.t. the default should be to clear all flags etc. >>>> And that's even before one looks at the composition-part of it, what are >>>> the >>>> flags for A.andThen(B)? (then extrapolate to N compositions and the >>>> available >>>> set of flags always approaches 0) >>>> I spent quite a bit of time on this and in the end tracking all this info, >>>> and >>>> making sure that the flags of implementations correspond to the actual >>>> behavior, just ended up costing performance for most streams and >>>> introduced an >>>> extra dimension to creation and maintenance which I had a hard time >>>> justifying. >>> It can be a slippery slope if we were designing from the ground up but the >>> stream implementation already exists and SORTED, DISTINCT and SIZED are the >>> flags that are already tracked by the current implementation. >>> Currently only SHORT_CIRCUIT is set (if not greedy), >>> see [ >>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java*L209__;Iw!!ACWV5N9M2RV99hQ!PhMxqlDzLWPRuwYc7ECRKNPVs0BtnoE-RdT-Jdkng7S-iFuERAHYcWvJ-OMKGLrkPdSrUl3xj1R9ypyeqeWI$ >>> | >>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java#L209 >>> ] >>> And for A.andThen(B), A.flags & B.flags should work, the stream is sorted if >>> both gatherers keep it sorted. >>>> Making specific, rare, combinations of operations faster at the expense of >>>> making 99% of all others slower is a hard pill for most to swallow. >>> I suppose that if those flags already exist, it's because they have a >>> purpose >>> and i do not understand how it can make the other operations slower. >>>> Cheers, >>>> √ >>> regards, >>> Rémi >>>> Viktor Klang >>>> Software Architect, Java Platform Group >>>> Oracle >>>> From: fo...@univ-mlv.fr <fo...@univ-mlv.fr> >>>> Sent: Thursday, 18 January 2024 10:28 >>>> To: Viktor Klang <viktor.kl...@oracle.com> >>>> Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz >>>> <psan...@openjdk.java.net> >>>> Subject: [External] : Re: Gatherer: spliterator characteristics are not >>>> propagated >>>>> From: "Viktor Klang" <viktor.kl...@oracle.com> >>>>> To: "Remi Forax" <fo...@univ-mlv.fr>, "core-libs-dev" >>>>> <core-libs-...@openjdk.java.net> >>>>> Sent: Wednesday, January 17, 2024 8:48:07 PM >>>>> Subject: Re: Gatherer: spliterator characteristics are not propagated >>>>> Hi Rémi, >>>>> You can find some of my benches here: [ >>>>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_au5zyje2l$ >>>>> | >>>>> https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref >>>>> ] >>>>> Initially I had Characteristics such as ORDERED etc on Gatherer but it >>>>> just >>>>> didn't end up worth it when looking at the bench results over a wide >>>>> array of >>>>> stream sizes and number of operations. >>>> I think there are 3 gatherer characteristics that make sense: KEEP_SORTED, >>>> KEEP_DISTINCT and KEEP_SIZED, >>>> all of them say that if the stream was sorted/distinct/sized then the >>>> stream >>>> returned by a call to gather() is still sorted (with the same comparator), >>>> distinct or sized. >>>> As examples, map() is KEEP_SIZED, filter() is KEEP_SORTED | KEEP_DISTINCT >>>> and >>>> windowFixed is KEEP_DISTINCT. >>>> [CC Paul, so he can correct me if i'm saying something stupid] >>>> Now for the benchmarks, it depends what you want to measure, benchmarking >>>> streams is tricky. This is what i know about benchmarking streams. >>>> First, the JIT has two ways to profile types at runtime, >>>> Either a method takes a function as parameter >>>> void map(Function function) { >>>> function.apply(...) >>>> } >>>> and when map is called with a subtype of Function, the JIT will propagate >>>> the >>>> exact type when map is inlined, >>>> Or a method use a field >>>> class Op { >>>> Function function; >>>> void map() { >>>> function.apply(...) >>>> } >>>> } >>>> in that case, the VM records the profile of function.apply() and if there >>>> are >>>> more than two different profiles, the VM declare profile poluttion and do >>>> not >>>> try to optimize. >>>> The Stream implementation tries very hard to use only parameters instead of >>>> fields, that's why it does not use classical Iterator that are pull >>>> iterator (a >>>> filter iterator requires a field) but a Spliterator which is a push >>>> iterator, >>>> the element is sent as parameter of the consumer.That's also why collect >>>> does >>>> not use the builder pattern (that accumulate values in fields) but a >>>> Collector >>>> that publish the functions to be called as parameter. >>>> Obvisously, this is more complex than that, a Collector stores the >>>> functions in >>>> fields so it should not work well but the implementation uses a record that >>>> plays well with escape analysis. Escape analysis see the fields of an >>>> instance >>>> as parameters so the functions of a Collector are correctly propagated (if >>>> the >>>> escape analysis works). And lambdas are using invokedynamic, and the VM >>>> tries >>>> really hard to inline invokedynamic, so lambdas (that captures value or >>>> not) >>>> are routinely fully inlined with the intermediate operation of a stream. >>>> In your tests, i've not seen comparaisons between an existing method like >>>> map() >>>> or filter() followed by a sorted()/distinct()/count()/toList(), i.e. >>>> operations >>>> where the characteristics KEEP_* have an impact and their equivalent using >>>> a >>>> Gatherer. >>>>> Cheers, >>>>> √ >>>> regards, >>>> Rémi >>>>> Viktor Klang >>>>> Software Architect, Java Platform Group >>>>> Oracle >>>>> From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Remi >>>>> Forax >>>>> <fo...@univ-mlv.fr> >>>>> Sent: Wednesday, 17 January 2024 16:48 >>>>> To: core-libs-dev <core-libs-...@openjdk.java.net> >>>>> Subject: Gatherer: spliterator characteristics are not propagated >>>>> While doing some benchmarking of the Gatherer API, i've found that the >>>>> characteristics of the spliterator was not propagated by the method >>>>> Stream.gather() making the stream slower than it should. >>>>> As an example, there is no way when reimplementing map() using a Gatherer >>>>> to say >>>>> that this intermediate operation keep the size, which is important if the >>>>> terminal operation is toList() because if the size is known, toList() will >>>>> presize the List and avoid the creation of an intermediary ArrayList. >>>>> See [ >>>>> https://urldefense.com/v3/__https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_auzwTY8aB$ >>>>> | >>>>> https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java >>>>> ] >>>>> I think that adding a way to propagate the spliterator characteristics >>>>> through a >>>>> Gatherer would greatly improve the performance of commons streams (at >>>>> least all >>>>> the ones that end with a call to toList). >>>>> I have some idea of how to do that, but I prefer first to hear if i've >>>>> overlook >>>>> something and if improving the performance is something worth changing >>>>> the API. >>>>> regards, >>>>> Rémi