On Tue, Sep 8, 2020 at 10:11 AM luoxhu <luo...@linux.ibm.com> wrote:
>
> Hi Richi,
>
> On 2020/9/7 19:57, Richard Biener wrote:
> > +  if (TREE_CODE (to) == ARRAY_REF)
> > +    {
> > +      tree op0 = TREE_OPERAND (to, 0);
> > +      if (TREE_CODE (op0) == VIEW_CONVERT_EXPR
> > +          && expand_view_convert_to_vec_set (to, from, to_rtx))
> > +        {
> > +          pop_temp_slots ();
> > +          return;
> > +        }
> > +    }
> >
> > you're placing this at an awkward spot IMHO, after to_rtx expansion
> > but disregarding parts of it and compensating just with 'to' matching.
> > Are the pieces (offset, bitpos) really too awkward to work with for
> > matching?
> >
> > Because as written you'll miscompile
> >
> > struct X { _vector signed int v; _vector signed int u; } x;
> >
> > test (int i, int a)
> > {
> >   x.u[i] = a;
> > }
> >
> > as I think you'll end up assigning to x.v.
>
> Thanks for pointing that out; this case will be a problem for the patch.
> I checked optimize_bitfield_assignment_op: it returns very early because
> mode1 is not VOIDmode, and this is actually not a "FIELD op= VAL"
> operation.
>
> To be honest, I am not very familiar with this part of the code.  I put
> the new function expand_view_convert_to_vec_set just after the to_rtx
> expansion because adjust_address changes the V4SImode memory to SImode
> memory, but I need to keep the target to_rtx in V4SImode to save the
> vector after calling rs6000_vector_set_var, so things seem paradoxical
> here:
>
> p to_rtx
> $264 = (rtx_def *) (mem/c:V4SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 S16 A128])
>
> => to_rtx = adjust_address (to_rtx, mode1, 0);
>
> p to_rtx
> $265 = (rtx_def *) (mem/c:SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 S4 A128])
>
> > Are we just interested in the case where we store to a
> > pseudo or also when the destination is memory?  I guess
> > only when it's a pseudo - correct?  In that case
> > handling this all in optimize_bitfield_assignment_op
> > is probably the best thing to try.
>
> > Note we possibly refrain from assigning a pseudo to
> > such a vector because we see a variable array-ref to it.
>
> It seems to be not only a pseudo: for example, with
> "v = vec_insert (i, v, n);" the vector variable is stored to the
> stack first, so [r112:DI] here is memory to be processed.  The patch
> therefore loads it from the stack (insn #10) into a temporary vector
> register first, and stores it back to the stack (insn #24) after
> rs6000_vector_set_var.
Hmm, yeah - I guess that's what should be addressed first then.  I'm
quite sure that in case 'v' is not on the stack but in memory, as in my
example, a SImode store is better than what we get from vec_insert - in
fact vec_insert will likely introduce a RMW cycle, which is prone to
introducing store data races.

So what we need to "fix" is cfgexpand.c marking variably-indexed decls
as not to be expanded as registers (see discover_nonconstant_array_refs).

I guess one way forward would be to perform instruction selection on
GIMPLE here and transform

  VIEW_CONVERT_EXPR<int[4]>(D.3185)[_1] = i_6(D);

to a (direct) internal function based on the vec_set optab.  But then
in GIMPLE D.3185 is also still memory (we don't have a variable-index
partial register set operation - BIT_INSERT_EXPR is currently specified
to take a constant bit position only).

At which point after your patch is the stack storage elided?

> optimized:
>
> D.3185 = v_3(D);
> _1 = n_5(D) & 3;
> VIEW_CONVERT_EXPR<int[4]>(D.3185)[_1] = i_6(D);
> v_8 = D.3185;
> return v_8;
>
> => expand without the patch:
>
>     2: r119:V4SI=%2:V4SI
>     3: r120:DI=%5:DI
>     4: r121:DI=%6:DI
>     5: NOTE_INSN_FUNCTION_BEG
>     8: [r112:DI]=r119:V4SI
>
>     9: r122:DI=r121:DI&0x3
>    10: r123:DI=r122:DI<<0x2
>    11: r124:DI=r112:DI+r123:DI
>    12: [r124:DI]=r120:DI#0
>
>    13: r126:V4SI=[r112:DI]
>    14: r118:V4SI=r126:V4SI
>    18: %2:V4SI=r118:V4SI
>    19: use %2:V4SI
>
> => expand with the patch (replace #9~#12 with #10~#24):
>
>     2: r119:V4SI=%2:V4SI
>     3: r120:DI=%5:DI
>     4: r121:DI=%6:DI
>     5: NOTE_INSN_FUNCTION_BEG
>     8: [r112:DI]=r119:V4SI
>     9: r122:DI=r121:DI&0x3
>
>    10: r123:V4SI=[r112:DI]              // load from stack
>    11: {r125:SI=0x3-r122:DI#0;clobber ca:SI;}
>    12: r125:SI=r125:SI<<0x2
>    13: {r125:SI=0x14-r125:SI;clobber ca:SI;}
>    14: r128:DI=unspec[`*.LC0',%2:DI] 47
>        REG_EQUAL `*.LC0'
>    15: r127:V2DI=[r128:DI]
>        REG_EQUAL const_vector
>    16: r126:V16QI=r127:V2DI#0
>    17: r129:V16QI=unspec[r120:DI#0] 61
>    18: r130:V16QI=unspec[r125:SI] 151
>    19: r131:V16QI=unspec[r129:V16QI,r129:V16QI,r130:V16QI] 232
>    20: r132:V16QI=unspec[r126:V16QI,r126:V16QI,r130:V16QI] 232
>    21: r124:V16QI=r123:V4SI#0
>    22: r124:V16QI={(r132:V16QI!=const_vector)?r131:V16QI:r124:V16QI}
>    23: r123:V4SI=r124:V16QI#0
>    24: [r112:DI]=r123:V4SI              // store to stack
>
>    25: r134:V4SI=[r112:DI]
>    26: r118:V4SI=r134:V4SI
>    30: %2:V4SI=r118:V4SI
>    31: use %2:V4SI
>
> Thanks,
> Xionghu