On Tue, Sep 8, 2020 at 10:11 AM luoxhu <luo...@linux.ibm.com> wrote:
>
> Hi Richi,
>
> On 2020/9/7 19:57, Richard Biener wrote:
> > +      if (TREE_CODE (to) == ARRAY_REF)
> > +       {
> > +         tree op0 = TREE_OPERAND (to, 0);
> > +         if (TREE_CODE (op0) == VIEW_CONVERT_EXPR
> > +             && expand_view_convert_to_vec_set (to, from, to_rtx))
> > +           {
> > +             pop_temp_slots ();
> > +             return;
> > +           }
> > +       }
> >
> > you're placing this at an awkward spot IMHO, after to_rtx expansion
> > but disregading parts of  it and compensating just with 'to' matching.
> > Is the pieces (offset, bitpos) really too awkward to work with for
> > matching?
> >
> > Because as written you'll miscompile
> >
> > struct X { _vector signed int v; _vector singed int u; } x;
> >
> > test(int i, int a)
> > {
> >    x.u[i] = a;
> > }
> >
> > as I think you'll end up assigning to x.v.
>
> Thanks for pointing out, this case will be a problem for the patch.
> I checked with optimize_bitfield_assignment_op, it will return very early
> as the mode1 is not VOIDmode, and this is actually not "FIELD op= VAL"
> operation?
>
> To be honest, I am not quite familiar with this part of code, I put the new
> function expand_view_convert_to_vec_set just after to_rtx expansion because
> adjust_address will change the V4SImode memory to SImode memory, but I need
> keep target to_rtx V4SImode to save the vector after calling
> rs6000_vector_set_var, so seems paradoxical here?
>
>  p to_rtx
> $264 = (rtx_def *) (mem/c:V4SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 
> S16 A128])
>
> => to_rtx = adjust_address (to_rtx, mode1, 0);
>
> p to_rtx
> $265 = (rtx_def *) (mem/c:SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 S4 
> A128])
>
>
> >
> > Are we just interested in the case were we store to a
> > pseudo or also when the destination is memory?  I guess
> > only when it's a pseudo - correct?  In that case
> > handling this all in optimize_bitfield_assignment_op
> > is probably the best thing to try.
> >
> > Note we possibly refrain from assigning a pseudo to
> > such vector because we see a variable array-ref to it.
>
> Seems not only pseudo, for example "v = vec_insert (i, v, n);"
> the vector variable will be store to stack first, then [r112:DI] is a
> memory here to be processed.  So the patch loads it from stack(insn #10) to
> temp vector register first, and store to stack again(insn #24) after
> rs6000_vector_set_var.

Hmm, yeah - I guess that's what should be addressed first then.
I'm quite sure that in case 'v' is not on the stack but in memory like
in my case a SImode store is better than what we get from
vec_insert - in fact vec_insert will likely introduce a RMW cycle
which is prone to inserting store-data-races?

So - what we need to "fix" is cfgexpand.c marking variably-indexed
decls as not to be expanded as registers (see
discover_nonconstant_array_refs).

I guess one way forward would be to perform instruction
selection on GIMPLE here and transform

VIEW_CONVERT_EXPR<int[4]>(D.3185)[_1] = i_6(D)

to a (direct) internal function based on the vec_set optab.  But then
in GIMPLE D.3185 is also still memory (we don't have a variable
index partial register set operation - BIT_INSERT_EXPR is
currently specified to receive a constant bit position only).

At which point after your patch is the stack storage elided?

>
> optimized:
>
> D.3185 = v_3(D);
> _1 = n_5(D) & 3;
> VIEW_CONVERT_EXPR<int[4]>(D.3185)[_1] = i_6(D);
> v_8 = D.3185;
> return v_8;
>
> => expand without the patch:
>
>     2: r119:V4SI=%2:V4SI
>     3: r120:DI=%5:DI
>     4: r121:DI=%6:DI
>     5: NOTE_INSN_FUNCTION_BEG
>     8: [r112:DI]=r119:V4SI
>
>     9: r122:DI=r121:DI&0x3
>    10: r123:DI=r122:DI<<0x2
>    11: r124:DI=r112:DI+r123:DI
>    12: [r124:DI]=r120:DI#0
>
>    13: r126:V4SI=[r112:DI]
>    14: r118:V4SI=r126:V4SI
>    18: %2:V4SI=r118:V4SI
>    19: use %2:V4SI
>
> => expand with the patch (replace #9~#12 to #10~#24):
>
>     2: r119:V4SI=%2:V4SI
>     3: r120:DI=%5:DI
>     4: r121:DI=%6:DI
>     5: NOTE_INSN_FUNCTION_BEG
>     8: [r112:DI]=r119:V4SI
>     9: r122:DI=r121:DI&0x3
>
>    10: r123:V4SI=[r112:DI]         // load from stack
>    11: {r125:SI=0x3-r122:DI#0;clobber ca:SI;}
>    12: r125:SI=r125:SI<<0x2
>    13: {r125:SI=0x14-r125:SI;clobber ca:SI;}
>    14: r128:DI=unspec[`*.LC0',%2:DI] 47
>       REG_EQUAL `*.LC0'
>    15: r127:V2DI=[r128:DI]
>       REG_EQUAL const_vector
>    16: r126:V16QI=r127:V2DI#0
>    17: r129:V16QI=unspec[r120:DI#0] 61
>    18: r130:V16QI=unspec[r125:SI] 151
>    19: r131:V16QI=unspec[r129:V16QI,r129:V16QI,r130:V16QI] 232
>    20: r132:V16QI=unspec[r126:V16QI,r126:V16QI,r130:V16QI] 232
>    21: r124:V16QI=r123:V4SI#0
>    22: r124:V16QI={(r132:V16QI!=const_vector)?r131:V16QI:r124:V16QI}
>    23: r123:V4SI=r124:V16QI#0
>    24: [r112:DI]=r123:V4SI       // store to stack.
>
>    25: r134:V4SI=[r112:DI]
>    26: r118:V4SI=r134:V4SI
>    30: %2:V4SI=r118:V4SI
>    31: use %2:V4SI
>
>
> Thanks,
> Xionghu

Reply via email to