Re: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2]

Jatin Bhateja Wed, 23 Oct 2024 19:12:22 -0700

On Wed, 23 Oct 2024 17:39:22 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:


>> Because lowering is a transformation that increases the complexity of the 
>> graph.
>> 
>> - A `d = ExtractD(z, 4)` expanded into
>> `x = VectorExtract(z, 2); d = ExtractD(x, 0)` increases the number of
>> nodes by 1.
>> - A logic cone transformed into a `MacroLogicV` introduces another kind of 
>> node that may not be recognized by other nodes.
>> 
>> As a result, we should do this as the last step, once the other 
>> transformations have finished their jobs. For concerns about loop body 
>> size, we still have a function in `Matcher` for that purpose.
>
> Another reason is that doing lowering late gives us more freedom to break 
> some invariants of the nodes, for example by looking through 
> `VectorReinterpret`. An example is this (admittedly contrived) case:
> 
>     Int256Vector v;
>     int a = v.lane(5);
>     float b = v.reinterpretAsFloats().lane(7);
> 
> This would be transformed into:
> 
>     vector<i,8> v;
>     vector<i,4> v0 = VectorExtract(v, 1);
>     int a = ExtractI(v0, 1);
>     vector<f,8> v1 = VectorReinterpret(v, <f,8>);
>     vector<f,4> v2 = VectorExtract(v1, 1);
>     float b = ExtractF(v2, 3);
> 
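The chunk and lane indices in the lowered form above follow from simple index arithmetic. A minimal sketch, assuming extraction works on 128-bit chunks; the `LaneSplit` class and its `split` helper are hypothetical names used only for illustration, not C2 code:

```java
// Hypothetical helper, not part of C2: splits a lane index of a wide vector
// into (index of the 128-bit chunk, lane index inside that chunk).
class LaneSplit {
    // Assumption: the backend extracts 128-bit chunks before extracting a lane.
    static final int CHUNK_BITS = 128;

    static int[] split(int lane, int elementBits) {
        int lanesPerChunk = CHUNK_BITS / elementBits;
        return new int[] { lane / lanesPerChunk, lane % lanesPerChunk };
    }

    public static void main(String[] args) {
        int[] a = split(5, 32); // int lane 5    -> VectorExtract(v, 1),  ExtractI(v0, 1)
        int[] b = split(7, 32); // float lane 7  -> VectorExtract(v1, 1), ExtractF(v2, 3)
        int[] d = split(4, 64); // double lane 4 -> VectorExtract(z, 2),  ExtractD(x, 0)
        System.out.println(a[0] + "," + a[1]);
        System.out.println(b[0] + "," + b[1]);
        System.out.println(d[0] + "," + d[1]);
    }
}
```

Both lanes 5 and 7 of a 256-bit vector with 32-bit elements land in chunk 1, which is why the two `VectorExtract` nodes become candidates for sharing.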
> By allowing lowering to look through `VectorReinterpret` and break the 
> invariant of `Extract` nodes that the element types of their inputs and 
> outputs must be the same, we can `gvn` `v1` with `v` and `v2` with `v0`, 
> which simplifies the graph to:
> 
>     vector<i,8> v;
>     vector<i,4> v0 = VectorExtract(v, 1);
>     int a = ExtractI(v0, 1);
>     float b = ExtractF(v0, 3);
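The effect of looking through `VectorReinterpret` can be illustrated with a toy value-numbering table. This is only a sketch under loud assumptions: the `ToyGvn` class, its `Node` record and the `lookThrough` flag are hypothetical and ignore element types entirely; this is not how C2's GVN is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy value numbering, not C2's GVN. Node names mirror the example above.
class ToyGvn {
    // input is the id of the single input node, or -1 if there is none.
    record Node(String op, int input, int index) {}

    final List<Node> nodes = new ArrayList<>();
    final Map<Node, Integer> table = new HashMap<>();
    final boolean lookThrough; // hypothetical switch: treat a reinterpret as its input

    ToyGvn(boolean lookThrough) { this.lookThrough = lookThrough; }

    int add(String op, int input, int index) {
        if (lookThrough && op.equals("VectorReinterpret")) {
            return input; // same bits, so reuse the input node
        }
        Node n = new Node(op, input, index);
        Integer existing = table.get(n);
        if (existing != null) {
            return existing; // an identical node already exists: share it
        }
        nodes.add(n);
        table.put(n, nodes.size() - 1);
        return nodes.size() - 1;
    }

    // Builds the graph from the example and returns the number of distinct nodes.
    static int build(boolean lookThrough) {
        ToyGvn g = new ToyGvn(lookThrough);
        int v  = g.add("Vector", -1, 0);
        int v0 = g.add("VectorExtract", v, 1);
        g.add("ExtractI", v0, 1);
        int v1 = g.add("VectorReinterpret", v, 0);
        int v2 = g.add("VectorExtract", v1, 1);
        g.add("ExtractF", v2, 3);
        return g.nodes.size();
    }

    public static void main(String[] args) {
        System.out.println(build(false)); // 6: v, v0, a, v1, v2, b
        System.out.println(build(true));  // 4: v, v0, a, b
    }
}
```

With `lookThrough` disabled the two `VectorExtract` nodes have different inputs and cannot be shared; with it enabled they hash to the same entry, matching the simplified graph above.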

Yes, as you rightly pointed out, lowering may in some cases significantly 
change the graph shape, so loop optimizations should account for it.

Unrolling decisions are based on loop body size and a rudimentary cost model. 
For example, the macro logic optimization, which folds an entire logic tree 
into a single x86-specific lowered IR node, should promote unrolling.
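
To make the folding concrete, here is a minimal sketch of how a three-input logic tree collapses into one 8-bit truth table, in the style of the immediate operand of x86 `vpternlog`. The `MacroLogicSketch` class, the `Logic3` interface and both helper methods are hypothetical names for illustration; this is not the C2 implementation.

```java
// Illustrative only: folds a three-input bitwise logic tree into an 8-bit
// truth table and checks that applying the table matches the original tree.
class MacroLogicSketch {
    interface Logic3 { int apply(int a, int b, int c); }

    // Evaluating the tree on the patterns 0xF0, 0xCC, 0xAA enumerates all
    // eight combinations of (a, b, c), one per bit of the result.
    static int truthTable(Logic3 f) {
        return f.apply(0xF0, 0xCC, 0xAA) & 0xFF;
    }

    // Applies the table bit by bit: the table index is (a << 2) | (b << 1) | c.
    static int applyTable(int table, int a, int b, int c) {
        int r = 0;
        for (int bit = 0; bit < 32; bit++) {
            int idx = (((a >>> bit) & 1) << 2)
                    | (((b >>> bit) & 1) << 1)
                    | ((c >>> bit) & 1);
            r |= ((table >>> idx) & 1) << bit;
        }
        return r;
    }

    public static void main(String[] args) {
        Logic3 tree = (a, b, c) -> (a & b) ^ c; // two logic nodes: And, then Xor
        int imm = truthTable(tree);             // one table replaces both nodes
        int a = 0x12345678, b = 0x0F0F0F0F, c = 0xDEADBEEF;
        System.out.println(Integer.toHexString(imm));                     // 6a
        System.out.println(applyTable(imm, a, b, c) == tree.apply(a, b, c)); // true
    }
}
```

Here two logic nodes become one, so a cost model that counts nodes in the loop body would see a smaller body after the fold.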

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1814134951
