# Analysis ## Obervationally ### IGVN During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are:
in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4)) in(2): java/lang/Object * (speculative=null) We compute the join (HS' meet): https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1310-L1317 t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) But the current `_type` (of the `PhiNode` as a `TypeNode`) is _type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *) We filter `t` by `_type` https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1332 and we get ft=java/lang/Object * which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`. https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 and https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/node.cpp#L1127-L1133 ### Verification On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time _type=java/lang/Object * and so after filtering `t` by (new) `_type` and we get ft=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) which is retuned. Verification gets angry because the new `ft` is not the same as the previous one. ## But why?! ### Details on type computation In short, we are doing t = typeof(in(1)) / typeof(in(2)) ft = t /\ _type (* IGVN *) ft' = t /\ ft (* Verification *) and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verfication that will just "try again and see if something change". To me, the surprising fact was that the intersection java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *) /\ _type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *) ~> java/lang/Object * What happened to the speculative type? Both `MyValue2` and `MyValue3` are inheriting `MyAbstract` (and implementing `MyInterface`). So the code correctly find that the intersection of these speculative type is compiler/valhalla/inlinetypes/MyAbstract (compiler/valhalla/inlinetypes/MyInterface):AnyNull * (flat in array),iid=top The interesting part is that it's `AnyNull`: indeed, what else is a `MyValue2` and `MyValue3` at the same time? And then, `above_centerline` decides it's not useful enough (too precise, too clone from HS' top/normal bottom) and remove the speculative type. https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/type.cpp#L2886-L2888 But on the verification run, `compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *` is intersected with the speculative type of `java/lang/Object *`, which is unknown (HS' bottom/normal top), so we are simply getting `MyValue2`. If we did not discard `AnyNull` using `above_centerline`, we would have the intersection of `MyValue2` and `AnyNull`, giving `AnyNull`, which is indeed stable. ## Ok, but the types are weird? Indeed, they are! How can we get a speculative type `MyValue3` on the `PhiNode` when inputs are both `Object`, and one is speculated to be a `MyValue2`? This comes from incremental inlining. It seems that we have some profiling information on the returned type of a callee, that happens to be `MyValue3`, which propagate to the `PhiNode`. Later, the callee is inlined, and we get new type information (`MyValue2`) from its body (from the returned type of a callee of our callee, if I remember well), that reaches the input of our `PhiNode`. # Reproducing ## In Valhalla This crash is quite rare because: 1. it needs a specific speculative type setup, which depends heavily on timing 2. if `PhiNode::Value` is called a second time, it will stabilize the `_type` field before verification. To limitate the influence of 2., I've tested with an additional assert that would immediately do const Type* ft_ = t->filter_speculative(ft); in `PhiNode::Value` and compare `ft` and `ft_`. Indeed, we are never sure a run of `Value` is not the last one: it should always be legal to stop anywhere (even if in a particular case), it was going to run further. With this extra check, the crash a bit more common, but still pretty rare. Tests that have been witness to crash then at least once: - `compiler/valhalla/inlinetypes/TestCallingConvention.java` - `compiler/valhalla/inlinetypes/TestIntrinsics.java` - `compiler/valhalla/inlinetypes/TestArrays.java` - `compiler/valhalla/inlinetypes/TestBasicFunctionality.java` All in `compiler/valhalla/inlinetypes` while I was also testing with mainline tests. Suspicious, uh. ## In mainline With the aforementioned extra check, I've tried to see if it could happen on mainline since the involved code seems not to be valhalla-specific. As we could expect given that only valhalla tests have been seen to crash, no such crash at all. ## Crafting an example I've tried to craft an example that would create a similar situation, but without luck. I never managed to reach a correct setup of incremental inlining, conflicting type profiling... # Fixing I think changing the type system would be quite risky: it is all over the place. Also, fixing would require not to drop the speculative type when `above_centerline`. This is not like something missing or a corner case that one didn't think of, it's rather removing a feature that is explicitly here, so probably on purpose. As a first approach, one could simply run `filter_speculative` twice, that should be enough as the second filter will simply select the non empty speculative type if there is only one, and this one won't be `above_centerline`, or it would not exist as a speculative type already. To try to be a bit less aggressive, we can rather do that in case where we know it cannot be useful. If `ft` obtained from `filter_speculative` has no speculative type, but `t` has, we can know that it might be because it has been dropped, and computing `t->filter_speculative(ft)` could pick the speculative type of `t`. The speculative type can still be removed if the non-speculative type of `ft` is exact and non null for instance, but we've still reached a fixpoint, so it's correct, but a little bit too much work. That being said, I'm not claiming it's a great solution: it seems that many parts do their job as expected, but the result is unfortunate. Ignoring speculative types in verification (or maybe of `PhiNode`s only) would also work. Anyway, the problem is an unprecise type, but it is still sound: working with it shouldn't be an issue. I was told that maybe @rwestrel would have an opinion, or an idea to do differently. Thanks, Marc ------------- Commit messages: - fix assert message - Assert after each filter - without the assert - With instrumentation - Tentative fix Changes: https://git.openjdk.org/valhalla/pull/1717/files Webrev: https://webrevs.openjdk.org/?repo=valhalla&pr=1717&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367245 Stats: 76 lines in 1 file changed: 76 ins; 0 del; 0 mod Patch: https://git.openjdk.org/valhalla/pull/1717.diff Fetch: git fetch https://git.openjdk.org/valhalla.git pull/1717/head:pull/1717 PR: https://git.openjdk.org/valhalla/pull/1717
