On Jun 1, 2018, at 10:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On June 1, 2018 5:15:58 PM GMT+02:00, Bill Schmidt <wschm...@linux.ibm.com> wrote:
>> On Jun 1, 2018, at 10:11 AM, Will Schmidt <will_schm...@vnet.ibm.com> wrote:
>>> 
>>> On Fri, 2018-06-01 at 08:53 +0200, Richard Biener wrote:
>>>> On Thu, May 31, 2018 at 9:59 PM Will Schmidt <will_schm...@vnet.ibm.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> Add support for gimple folding of unaligned vector loads and stores.
>>>>> Testcases posted separately in this thread.
>>>>> 
>>>>> Regtest completed across a variety of systems: P6, P7, P8, P9.
>>>>> 
>>>>> OK for trunk?
>>>>> Thanks,
>>>>> -Will
>>>>> 
>>>>> [gcc]
>>>>> 
>>>>> 2018-05-31  Will Schmidt  <will_schm...@vnet.ibm.com>
>>>>> 
>>>>>         * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
>>>>>         vec_xst variants to the list.
>>>>>         (rs6000_gimple_fold_builtin): Add support for folding unaligned
>>>>>         vector loads and stores.
>>>>> 
>>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>>> index d62abdf..54b7de2 100644
>>>>> --- a/gcc/config/rs6000/rs6000.c
>>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>>> @@ -15360,10 +15360,16 @@ rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
>>>>>      case ALTIVEC_BUILTIN_STVX_V8HI:
>>>>>      case ALTIVEC_BUILTIN_STVX_V4SI:
>>>>>      case ALTIVEC_BUILTIN_STVX_V4SF:
>>>>>      case ALTIVEC_BUILTIN_STVX_V2DI:
>>>>>      case ALTIVEC_BUILTIN_STVX_V2DF:
>>>>> +    case VSX_BUILTIN_STXVW4X_V16QI:
>>>>> +    case VSX_BUILTIN_STXVW4X_V8HI:
>>>>> +    case VSX_BUILTIN_STXVW4X_V4SF:
>>>>> +    case VSX_BUILTIN_STXVW4X_V4SI:
>>>>> +    case VSX_BUILTIN_STXVD2X_V2DF:
>>>>> +    case VSX_BUILTIN_STXVD2X_V2DI:
>>>>>        return true;
>>>>>      default:
>>>>>        return false;
>>>>>      }
>>>>> }
>>>>> @@ -15869,10 +15875,77 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>>>>         gimple_set_location (g, loc);
>>>>>         gsi_replace (gsi, g, true);
>>>>>         return true;
>>>>>       }
>>>>> 
>>>>> +    /* Unaligned vector loads.  */
>>>>> +    case VSX_BUILTIN_LXVW4X_V16QI:
>>>>> +    case VSX_BUILTIN_LXVW4X_V8HI:
>>>>> +    case VSX_BUILTIN_LXVW4X_V4SF:
>>>>> +    case VSX_BUILTIN_LXVW4X_V4SI:
>>>>> +    case VSX_BUILTIN_LXVD2X_V2DF:
>>>>> +    case VSX_BUILTIN_LXVD2X_V2DI:
>>>>> +      {
>>>>> +        arg0 = gimple_call_arg (stmt, 0);  /* Offset.  */
>>>>> +        arg1 = gimple_call_arg (stmt, 1);  /* Address.  */
>>>>> +        lhs = gimple_call_lhs (stmt);
>>>>> +        location_t loc = gimple_location (stmt);
>>>>> +        /* Since arg1 may be cast to a different type, just use ptr_type_node
>>>>> +           here instead of trying to enforce TBAA on pointer types.  */
>>>>> +        tree arg1_type = ptr_type_node;
>>>>> +        tree lhs_type = TREE_TYPE (lhs);
>>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.
>>>>> +           Create the tree using the value from arg0.  The resulting type
>>>>> +           will match the type of arg1.  */
>>>>> +        gimple_seq stmts = NULL;
>>>>> +        tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>>>>> +        tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>>> +                                       arg1_type, arg1, temp_offset);
>>>>> +        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could
>>>>> +           also take an offset, but since we've already incorporated the
>>>>> +           offset above, here we just pass in a zero.  */
>>>>> +        gimple *g;
>>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, temp_addr,
>>>>> +                                              build_int_cst (arg1_type, 0)));
>>>> 
>>>> So in GIMPLE the type of the MEM_REF specifies the alignment, so my question
>>>> is what type does the lhs usually have here?  I'd simply guess V4SF, etc.?  In
>>> 
>>> Yes (double-checking).  My reference for the intrinsic signatures
>>> shows the lhs is a vector of type.  The rhs can be either *type or
>>> *vector of type:
>>> 
>>>   vector double vec_vsx_ld (int, const vector double *);
>>>   vector double vec_vsx_ld (int, const double *);
>>> 
>>> with similar/same for the assorted other types.
>>> 
>>> These are also on my list as 'unaligned' vector loads.  I'm not certain
>>> if that adds a twist to how I should answer the below.
>>> 
>>> Bill?
>> 
>> 'Unaligned' means not necessarily aligned on a vector boundary.
>> They are guaranteed to be aligned on an element boundary.
>>> 
>>>> this case you are missing a
>>>> 
>>>>   tree ltype = build_aligned_type (lhs_type, desired-alignment);
>>>> 
>>>> and use that ltype for building the MEM_REF.  I suppose in this case the
>>>> known alignment is either BITS_PER_UNIT or element alignment (thus
>>>> TYPE_ALIGN (TREE_TYPE (lhs_type)))?
>>> 
>>> I'd think element alignment, but I'm no longer certain. :-)
>> 
>> Yep, element alignment.
> 
> Note the x86 unaligned intrinsics support arbitrary unaligned loads.  So that's
> not available for power?  Does the HW implementation require element
> alignment?
I had to go look this up again...  Actually, the required alignment is 4 bytes
regardless of the data type.  I thought it was 8 bytes for V2DF/V2DI accesses,
but that's not correct.  But we don't support arbitrary alignment at the byte
level.

Thanks!
Bill

> 
> Richard.
> 
>> Thanks,
>> Bill
>>> 
>>>> Or is the type of the load the element types?
>>> 
>>> So, in any case I'll build up / modify some tests to look at the data
>>> being loaded, and see if I can spot alignment issues here.
>>> 
>>> Thanks,
>>> -Will
>>> 
>>>> Richard.
>>>> 
>>>>> +        gimple_set_location (g, loc);
>>>>> +        gsi_replace (gsi, g, true);
>>>>> +        return true;
>>>>> +      }
>>>>> +
>>>>> +    /* Unaligned vector stores.  */
>>>>> +    case VSX_BUILTIN_STXVW4X_V16QI:
>>>>> +    case VSX_BUILTIN_STXVW4X_V8HI:
>>>>> +    case VSX_BUILTIN_STXVW4X_V4SF:
>>>>> +    case VSX_BUILTIN_STXVW4X_V4SI:
>>>>> +    case VSX_BUILTIN_STXVD2X_V2DF:
>>>>> +    case VSX_BUILTIN_STXVD2X_V2DI:
>>>>> +      {
>>>>> +        arg0 = gimple_call_arg (stmt, 0);  /* Value to be stored.  */
>>>>> +        arg1 = gimple_call_arg (stmt, 1);  /* Offset.  */
>>>>> +        tree arg2 = gimple_call_arg (stmt, 2);  /* Store-to address.  */
>>>>> +        location_t loc = gimple_location (stmt);
>>>>> +        tree arg0_type = TREE_TYPE (arg0);
>>>>> +        /* Use ptr_type_node (no TBAA) for the arg2_type.  */
>>>>> +        tree arg2_type = ptr_type_node;
>>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.
>>>>> +           Create the tree using the value from arg1.  The resulting type
>>>>> +           will match the type of arg2.  */
>>>>> +        gimple_seq stmts = NULL;
>>>>> +        tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
>>>>> +        tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>>> +                                       arg2_type, arg2, temp_offset);
>>>>> +        /* The access is unaligned, so unlike the STVX path there is no
>>>>> +           need to mask off the lower bits of the address.  */
>>>>> +        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>> +        gimple *g;
>>>>> +        g = gimple_build_assign (build2 (MEM_REF, arg0_type, temp_addr,
>>>>> +                                         build_int_cst (arg2_type, 0)),
>>>>> +                                 arg0);
>>>>> +        gimple_set_location (g, loc);
>>>>> +        gsi_replace (gsi, g, true);
>>>>> +        return true;
>>>>> +      }
>>>>> +
>>>>>     /* Vector Fused multiply-add (fma).  */
>>>>>     case ALTIVEC_BUILTIN_VMADDFP:
>>>>>     case VSX_BUILTIN_XVMADDDP:
>>>>>     case ALTIVEC_BUILTIN_VMLADDUHM:
>>>>>       {