On Fri, Apr 4, 2014 at 7:49 AM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote:
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
>> On Behalf Of Rainer Orth
>>
>> Just omit the { target *-*-* } completely, also a few more times.
>
> Please find attached an updated patch.
@@ -1733,6 +1743,51 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
      to initialize the symbolic number.  */
   if (!source_expr1)
     {
+      n->base_addr = n->offset = NULL_TREE;
+      if (is_gimple_assign (rhs1_stmt))

You want

  gimple_assign_load_p (rhs1_stmt) && !gimple_has_volatile_ops (rhs1_stmt)

here.

+	    case ARRAY_RANGE_REF:

For ARRAY_RANGE_REF this doesn't make much sense IMHO.

+	    case ARRAY_REF:
+	      n->base_addr = TREE_OPERAND (var, 0);
+	      elt_size = array_ref_element_size (var);
+	      if (TREE_CODE (elt_size) != INTEGER_CST)
+		return NULL_TREE;
+	      index = TREE_OPERAND (var, 1);
+	      if (TREE_THIS_VOLATILE (var) || TREE_THIS_VOLATILE (index))
+		return NULL_TREE;
+	      n->offset = fold_build2 (MULT_EXPR, sizetype,
+				       index, elt_size);

You fail to honor array_ref_low_bound.  By handling only the outermost
handled component, and then only a selected subset of codes, you'll catch
many but not all cases.  Why not simply use get_inner_reference () here
(plus stripping the constant offset from an innermost MEM_REF) and get the
best of both worlds (not duplicating parts of its logic, and handling more
cases)?  Eventually using tree-affine.c and get_inner_reference_aff is
even more appropriate, so you can compute the address differences without
decomposing them yourself.

Btw, I think for the recursion to work properly you need to handle loads
for the toplevel stmt, not for rhs1_stmt.  Eventually you need to split
out a find_bswap_2 that handles the recursion case that allows loads,
separate from the case that doesn't, called from find_bswap.

+  /* Compute address to load from and cast according to the size
+     of the load.  */
+  load_ptr_type = build_pointer_type (load_type);
+  addr_expr = build1 (ADDR_EXPR, load_ptr_type, bswap_src);
+  addr_tmp = make_temp_ssa_name (load_ptr_type, NULL, "load_src");
+  addr_stmt = gimple_build_assign_with_ops
+	      (NOP_EXPR, addr_tmp, addr_expr, NULL);
+  gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT);
+
+  /* Perform the load.  */
+  load_offset_ptr = build_int_cst (load_ptr_type, 0);
+  val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst");
+  val_expr = build2 (MEM_REF, load_type, addr_tmp, load_offset_ptr);
+  load_stmt = gimple_build_assign_with_ops
+	      (MEM_REF, val_tmp, val_expr, NULL);

This is unnecessarily complex and has TBAA issues.  You don't need to
create a "correct" pointer type, so doing

  addr_expr = fold_build_addr_expr (bswap_src);

is enough.  Now, to fix the TBAA issues, you either need to remember and
"combine" the reference_alias_ptr_type of each original load and use that
for the load_offset_ptr value, or decide that isn't worth it and use
alias-set zero (use ptr_type_node).

Can you also expand the comment about "size vs. range"?  Is it that range
can be bigger than size if you have (short)a[0] | ((short)a[3] << 1) so
far, where size == 2 but range == 3?  And that range can also be smaller
than size, for example for (short)a[0] | ((short)a[0] << 1), where range
would be 1 and size == 2?  I suppose adding two examples like this to the
comment, together with the expected value of 'n', would help here.

Otherwise the patch looks good.  Now we're only missing the addition of
trying to match to a VEC_PERM_EXPR with a constant mask using
can_vec_perm_p ;)

Thanks,
Richard.

> Best regards,
>
> Thomas