Hi Rémi,

For instance, stateless is neither recessive nor dominant, since the 
composition of two stateless operations is only ever stateless if they both are 
greedy as well: 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L588

So even if making it represented as ints (more like Spliterator, rather than 
Collector) makes things faster, it's still both work to track, propagate, and 
also becomes a side-channel that needs to remain in sync with the actual 
implementation of the logic.

One could argue that logic such as: someCollection.stream().map(…).count() is a 
performance bug/inefficiency in an of itself as it would be faster to do 
someCollection.size().

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: fo...@univ-mlv.fr <fo...@univ-mlv.fr>
Sent: Saturday, 20 January 2024 17:40
To: Viktor Klang <viktor.kl...@oracle.com>
Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz 
<psan...@openjdk.java.net>
Subject: [External] : Re: Gatherer: spliterator characteristics are not 
propagated



________________________________
From: "Viktor Klang" <viktor.kl...@oracle.com>
To: "Remi Forax" <fo...@univ-mlv.fr>
Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" 
<psan...@openjdk.java.net>
Sent: Thursday, January 18, 2024 5:14:38 PM
Subject: Re: Gatherer: spliterator characteristics are not propagated

And for A.andThen(B), A.flags & B.flags should work, the stream is sorted if 
both gatherers keep it sorted.
That is unfortunately not the case. That would presume that you can implement 
the composition such that it can preserve all the common flags. Some flags 
could be "dominant" and some "recessive" to use genetics nomenclature.

Some flags of the stream pipeline are "recessive", mainly SHORT_CIRCUIT, but 
not the characteristics of the Gatherer which can have the corresponding 
"dominant" flag, GREEDY, in this case.
And the same for sequential, the flag should be PARALELIZABLE and not 
SEQUENTIAL.

The idea is that the Gatherer characteristics can have the same bit set at the 
same position as the stream op flags (as defined by StreamOpFlag).
So KEEP_DISTINCT is in position 0, KEEP_SORTED in in position 2 and KEEP_SIZED 
is in position 3.
For GREEDY, we use the same position as SHORT_CIRCUIT and we will flip the bit 
(using XOR) when we want to extract the stream op flags from the characteristics

All the factory methods call the generic of() with a combination of 
PARALELIZABLE and STATELESS and the user can adds the characteristics GREEDY, 
KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED (otherwise an exception should be 
raised).

In StreamOpFlag, there are two unused positions (14 and 15), that's perfect for 
our two new states PARALELIZABLE and STATELESS, so no problem here (technically 
we can also reuse positions of the Spliterator characteristic given that those 
flags are masked before being sent to the GathererOp super constructor).

The way to transform a Gatherer characteristics op to a stream flags op is to 
flip the bits corresponding to SHORT_CIRCUIT, add the highter bit of all flags 
but SHORT-CIRCUIT (because stream op flags are encoded using 2 bits) and mask 
to only retain SHORT_CIRCUIT, KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED.


public static int toOpFlags(int characteristics) {
  return  ((characteristics ^ SHORT_CIRCUIT_MASK) | HIGHER_BITS) & 
STREAM_OP_MASK;
}

see below for a full script.



I suppose that if those flags already exist, it's because they have a purpose 
and i do not understand how it can make the other operations slower.

Extra invocations, extra storage, and extra composition overhead is not free. 
Since Stream is one-shot you need to include the construction cost with the 
execution cost. For something like an empty Stream construction cost scan be 
90+% of the total costs.

If you create a Gatherer, the characteristics is a constant (so the validity 
check is removed, it's just a mask and a test) so the result of calling 
toOpFlags() is a constant too.

If the factory method is not inlined, the cost is 3 bitwise operations which is 
I believe faster than the "instanceof Greedy" used in the current code.


Cheers,
√

regards,
Rémi

---

public class GathererFlag {
  // cut and paste from StreamOpFlag
  /**
   * The bit pattern for setting/injecting a flag.
   */
  private static final int SET_BITS = 0b01;

  /**
   * The bit pattern for clearing a flag.
   */
  private static final int CLEAR_BITS = 0b10;

  /**
   * The bit pattern for preserving a flag.
   */
  private static final int PRESERVE_BITS = 0b11;


  private static int position(int opFlagSet) {
    return Integer.numberOfTrailingZeros(opFlagSet) >> 1;
  }

  private static final int DISTINCT_POSITION = 
position(StreamOpFlag.IS_DISTINCT);
  private static final int SORTED_POSITION = position(StreamOpFlag.IS_SORTED);
  private static final int SIZED_POSITION = position(StreamOpFlag.IS_SIZED);

  private static final int SHORT_CIRCUIT_POSITION = 
position(StreamOpFlag.IS_SHORT_CIRCUIT);
  private static final int STATELESS_POSITION = 14;
  private static final int PARELLIZABLE_POSITION = 15;

  public static final int PARELLIZABLE = SET_BITS << (PARELLIZABLE_POSITION << 
1);
  public static final int STATELESS = SET_BITS << (STATELESS_POSITION << 1);
  public static final int GREEDY = SET_BITS << (SHORT_CIRCUIT_POSITION << 1);

  public static final int KEEP_DISTINCT = SET_BITS << (DISTINCT_POSITION << 1);
  public static final int KEEP_SORTED = SET_BITS << (SORTED_POSITION << 1);
  public static final int KEEP_SIZED = SET_BITS << (SIZED_POSITION << 1);

  private static final int SHORT_CIRCUIT_MASK = SET_BITS << 
(SHORT_CIRCUIT_POSITION << 1);

  // no GREEDY here
  private static final int HIGHER_BITS = (PARELLIZABLE | STATELESS | 
KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED) << 1;

  private static final int STREAM_OP_MASK =
      ((GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED) << 1) |
      GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED;

  public static String toString(int characteristics) {
    return Stream.of(characteristics)
        .<String>mapMulti((f, consumer) -> {
          if ((f & PARELLIZABLE) == PARELLIZABLE) {
            consumer.accept("PARELLIZABLE");
          }
          if ((f & STATELESS) == STATELESS) {
            consumer.accept("STATELESS");
          }
          if ((f & GREEDY) == GREEDY) {
            consumer.accept("GREEDY");
          }
          if ((f & KEEP_DISTINCT) == KEEP_DISTINCT) {
            consumer.accept("KEEP_DISTINCT");
          }
          if ((f & KEEP_SORTED) == KEEP_SORTED) {
            consumer.accept("KEEP_SORTED");
          }
          if ((f & KEEP_SIZED) == KEEP_SIZED) {
            consumer.accept("KEEP_SIZED");
          }
        })
        .collect(Collectors.joining(", "));
  }

  public static int toOpFlags(int characteristics) {
    return  ((characteristics ^ SHORT_CIRCUIT_MASK) | HIGHER_BITS) & 
STREAM_OP_MASK;
}

public static String toOpFlagsString(int opFlags) {
return Arrays.stream(StreamOpFlag.values())
.map(op -> {
if (op.isPreserved(opFlags)) {
return "preserved " + op;
}
if (op.isCleared(opFlags)) {
return "cleared " + op;
}
if (op.isKnown(opFlags)) {
return "set " + op;
}
return "?? " + op;
})
.collect(Collectors.joining(", "));
}

void main() {
var characteristics = PARELLIZABLE | STATELESS | GREEDY | KEEP_DISTINCT | 
KEEP_SORTED | KEEP_SIZED;
System.out.println(toOpFlagsString(toOpFlags(characteristics)));

var characteristics2 = STATELESS | KEEP_DISTINCT | KEEP_SIZED;
System.out.println(toOpFlagsString(toOpFlags(characteristics2)));
}
}



Viktor Klang
Software Architect, Java Platform Group
Oracle

________________________________
From: fo...@univ-mlv.fr <fo...@univ-mlv.fr>
Sent: Thursday, 18 January 2024 16:17
To: Viktor Klang <viktor.kl...@oracle.com>
Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz 
<psan...@openjdk.java.net>
Subject: [External] : Re: Gatherer: spliterator characteristics are not 
propagated



________________________________
From: "Viktor Klang" <viktor.kl...@oracle.com>
To: "Remi Forax" <fo...@univ-mlv.fr>
Cc: "core-libs-dev" <core-libs-...@openjdk.java.net>, "Paul Sandoz" 
<psan...@openjdk.java.net>
Sent: Thursday, January 18, 2024 3:36:07 PM
Subject: Re: Gatherer: spliterator characteristics are not propagated
I suspect that it is a rather slippery slope, once KEEP-flags are added, then 
others will want to be able to have INJECT-flags, and then people might have 
different opinions w.r.t. the default should be to clear all flags etc.

And that's even before one looks at the composition-part of it, what are the 
flags for A.andThen(B)? (then extrapolate to N compositions and the available 
set of flags always approaches 0)

I spent quite a bit of time on this and in the end tracking all this info, and 
making sure that the flags of implementations correspond to the actual 
behavior, just ended up costing performance for most streams and introduced an 
extra dimension to creation and maintenance which I had a hard time justifying.

It can be a slippery slope if we were designing from the ground up but the 
stream implementation already exists and SORTED, DISTINCT and SIZED are the 
flags that are already tracked by the current implementation.

Currently only SHORT_CIRCUIT is set (if not greedy),
see 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java#L209<https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java*L209__;Iw!!ACWV5N9M2RV99hQ!PhMxqlDzLWPRuwYc7ECRKNPVs0BtnoE-RdT-Jdkng7S-iFuERAHYcWvJ-OMKGLrkPdSrUl3xj1R9ypyeqeWI$>

And for A.andThen(B), A.flags & B.flags should work, the stream is sorted if 
both gatherers keep it sorted.


Making specific, rare, combinations of operations faster at the expense of 
making 99% of all others slower is a hard pill for most to swallow.

I suppose that if those flags already exist, it's because they have a purpose 
and i do not understand how it can make the other operations slower.



Cheers,
√

regards,
Rémi



Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: fo...@univ-mlv.fr <fo...@univ-mlv.fr>
Sent: Thursday, 18 January 2024 10:28
To: Viktor Klang <viktor.kl...@oracle.com>
Cc: core-libs-dev <core-libs-...@openjdk.java.net>; Paul Sandoz 
<psan...@openjdk.java.net>
Subject: [External] : Re: Gatherer: spliterator characteristics are not 
propagated



________________________________
From: "Viktor Klang" <viktor.kl...@oracle.com>
To: "Remi Forax" <fo...@univ-mlv.fr>, "core-libs-dev" 
<core-libs-...@openjdk.java.net>
Sent: Wednesday, January 17, 2024 8:48:07 PM
Subject: Re: Gatherer: spliterator characteristics are not propagated
Hi Rémi,

You can find some of my benches here: 
https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref<https://urldefense.com/v3/__https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_au5zyje2l$>

Initially I had Characteristics such as ORDERED etc on Gatherer but it just 
didn't end up worth it when looking at the bench results over a wide array of 
stream sizes and number of operations.

I think there are 3 gatherer characteristics that make sense: KEEP_SORTED, 
KEEP_DISTINCT and KEEP_SIZED,
all of them say that if the stream was sorted/distinct/sized then the stream 
returned by a call to gather() is still sorted (with the same comparator), 
distinct or sized.

As examples, map() is KEEP_SIZED, filter() is KEEP_SORTED | KEEP_DISTINCT and 
windowFixed is KEEP_DISTINCT.

[CC Paul, so he can correct me if i'm saying something stupid]

Now for the benchmarks, it depends what you want to measure, benchmarking 
streams is tricky. This is what i know about benchmarking streams.
First, the JIT has two ways to profile types at runtime,
Either a method takes a function as parameter
  void map(Function function) {
    function.apply(...)
  }
and when map is called with a subtype of Function, the JIT will propagate the 
exact type when map is inlined,
Or a method use a field
  class Op {
    Function function;

    void map() {
       function.apply(...)
    }
  }
in that case, the VM records the profile of function.apply() and if there are 
more than two different profiles, the VM declare profile poluttion and do not 
try to optimize.

The Stream implementation tries very hard to use only parameters instead of 
fields, that's why it does not use classical Iterator that are pull iterator (a 
filter iterator requires a field) but a Spliterator which is a push iterator, 
the element is sent as parameter of the consumer.That's also why collect does 
not use the builder pattern (that accumulate values in fields) but a Collector 
that publish the functions to be called as parameter.

Obvisously, this is more complex than that, a Collector stores the functions in 
fields so it should not work well but the implementation uses a record that 
plays well with escape analysis. Escape analysis see the fields of an instance 
as parameters so the functions of a Collector are correctly propagated (if the 
escape analysis works). And lambdas are using invokedynamic, and the VM tries 
really hard to inline invokedynamic, so lambdas (that captures value or not) 
are routinely fully inlined with the intermediate operation of a stream.

In your tests, i've not seen comparaisons between an existing method like map() 
or filter() followed by a sorted()/distinct()/count()/toList(), i.e. operations 
where the characteristics KEEP_* have an impact and their equivalent using a 
Gatherer.



Cheers,
√

regards,
Rémi



Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Remi Forax 
<fo...@univ-mlv.fr>
Sent: Wednesday, 17 January 2024 16:48
To: core-libs-dev <core-libs-...@openjdk.java.net>
Subject: Gatherer: spliterator characteristics are not propagated

While doing some benchmarking of the Gatherer API, i've found that the 
characteristics of the spliterator was not propagated by the method 
Stream.gather() making the stream slower than it should.

As an example, there is no way when reimplementing map() using a Gatherer to 
say that this intermediate operation keep the size, which is important if the 
terminal operation is toList() because if the size is known, toList() will 
presize the List and avoid the creation of an intermediary ArrayList.

See 
https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java<https://urldefense.com/v3/__https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_auzwTY8aB$>

I think that adding a way to propagate the spliterator characteristics through 
a Gatherer would greatly improve the performance of commons streams (at least 
all the ones that end with a call to toList).

I have some idea of how to do that, but I prefer first to hear if i've overlook 
something and if improving the performance is something worth changing the API.

regards,
Rémi



Reply via email to