On Thu, Apr 24, 2014 at 3:34 AM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote:
> See updated part 2 of the patch in attachment. Part 1 is unchanged. New
> ChangeLogs are as follows:
>
> *** gcc/ChangeLog ***
>
> 2014-04-23  Thomas Preud'homme  <thomas.preudho...@arm.com>
>
>       PR tree-optimization/54733
>       * tree-ssa-math-opts.c (find_bswap_load): New.
>       (find_bswap_1): Add support for memory source.
>       (find_bswap): Likewise.
>       (execute_optimize_bswap): Likewise.
>       * expr.c (get_inner_reference): Renamed to ...
>       (get_inner_reference_1): This. Also add a parameter to control whether
>       a MEM_REF should be split into base + offset.
>       (get_inner_reference): New wrapper around get_inner_reference_1 for
>       compatibility.
>       (get_inner_reference_base): New wrapper around get_inner_reference_1
>       to access its new facet.
>       * tree.h (get_inner_reference_base): Declare.
>
>
> *** gcc/testsuite/ChangeLog ***
>
> 2014-04-23  Thomas Preud'homme  <thomas.preudho...@arm.com>
>
>       PR tree-optimization/54733
>       * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
>       optimization to support memory sources.
>       * gcc.dg/optimize-bswaphi-1.c: Likewise.
>       * gcc.dg/optimize-bswapsi-2.c: Likewise.
>       * gcc.c-torture/execute/bswap-2.c: Likewise.
>
> Bootstrapped on x86_64-linux-gnu with no testsuite regression. Also did an
> arm-none-eabi cross build with no regression after running the testsuite via
> qemu.
+/* Given an expression EXP that is a handled_component_p,
+   look for the base of the ultimate containing object, which is returned and
+   specify the access position and size.  */
+
+tree
+get_inner_reference_base (tree exp, HOST_WIDE_INT *pbitsize,
+                          HOST_WIDE_INT *pbitpos, tree *poffset,
+                          enum machine_mode *pmode, int *punsignedp,
+                          int *pvolatilep, bool keep_aligning)
+{

This name isn't exactly spot-on, and its behavior doesn't appear to be
consistent for TARGET_MEM_REF (which get_inner_reference_1 simply returns
unchanged). It returns the base object or address of the ultimate containing
object (that it doesn't handle TARGET_MEM_REF should simply be documented),
so instead of inventing a new name I'd add an overload (with the extra flag).
I'd name the flag 'allow_address', not 'ptr_base' (as you don't return a
pointer to a decl but the decl itself if the base is a decl).

+      bitpos /= BITS_PER_UNIT;
+      bitpos_expr = build_int_cst (sizetype, bitpos);

  bitpos_expr = size_int (bitpos);

and it's really bytepos_expr now, no?

+      if (n->offset)
+        {
+          bitpos_expr = build_int_cst (sizetype, bitpos);
+          n->offset = fold_build2 (PLUS_EXPR, sizetype, n->offset, bitpos_expr);
+        }
+      else
+        n->offset = build_int_cst (sizetype, bitpos);

  n->offset = bitpos_expr;

btw, I wonder why you glob together bitpos and offset again here. I'd simply
record and compare n->bitpos. Which means

+      off_cmp_expr = fold_build2 (LT_EXPR, TREE_TYPE (n1.offset),
+                                  n2.offset, n1.offset);
+      if (TREE_CODE (off_cmp_expr) != INTEGER_CST)
+        return NULL_TREE;

you'd simply compare n1.offset and n2.offset the same way you compare the
bases (use operand_equal_p ()) and reject differences; the constant offset
part (tracked in bitpos) can then be operated on with integer math instead
of trees.

+      /* Compute address to load from and cast according to the size
+         of the load.  */
+      addr_expr = build_fold_addr_expr (unshare_expr (bswap_src));
+      addr_tmp = make_temp_ssa_name (TREE_TYPE (addr_expr), NULL,
+                                     "load_src");
+      addr_stmt = gimple_build_assign (addr_tmp, addr_expr);
+      gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT);
+
+      /* Perform the load.  */
+      load_offset_ptr = build_int_cst (alias_type, 0);
+      val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst");
+      val_expr = build2 (MEM_REF, load_type, addr_tmp, load_offset_ptr);
+      load_stmt = gimple_build_assign_with_ops
+                                 (MEM_REF, val_tmp, val_expr, NULL);

To avoid causing spurious TREE_ADDRESSABLE bases, you should avoid the
load_src temporary if possible:

  if (is_gimple_min_invariant (addr_expr))
    addr_tmp = addr_expr;
  else
    {
      addr_tmp = make_temp ....

and then use fold_build2 to build the MEM_REF. Also use gimple_build_assign
for the load, not the with_ops variant.

The rest of the patch looks good to me now; sorry for the delay in reviewing.

Thanks,
Richard.

> Best regards,
>
> Thomas
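For concreteness, the overload suggested above might be declared roughly as
follows in tree.h (a sketch only, not part of the patch; the 'allow_address'
name comes from the review, and the parameter order mirrors the quoted hunk):

  /* Sketch of the suggested overload: same as the existing
     get_inner_reference, but the extra ALLOW_ADDRESS flag controls whether
     a MEM_REF should be split into base + offset.  */
  extern tree get_inner_reference (tree, HOST_WIDE_INT *, HOST_WIDE_INT *,
                                   tree *, enum machine_mode *, int *, int *,
                                   bool keep_aligning, bool allow_address);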
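Putting the last suggestion together with the quoted hunk, the load emission
could end up roughly as below (an untested sketch only; bswap_src, load_type,
alias_type and gsi are taken from the quoted hunk, and the final
gsi_insert_before for the load is an assumption about the surrounding code):

  addr_expr = build_fold_addr_expr (unshare_expr (bswap_src));

  /* Avoid the load_src temporary when the address is already invariant,
     so the base is not spuriously marked TREE_ADDRESSABLE.  */
  if (is_gimple_min_invariant (addr_expr))
    addr_tmp = addr_expr;
  else
    {
      addr_tmp = make_temp_ssa_name (TREE_TYPE (addr_expr), NULL, "load_src");
      addr_stmt = gimple_build_assign (addr_tmp, addr_expr);
      gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT);
    }

  /* Perform the load, building the MEM_REF with fold_build2 and using
     plain gimple_build_assign rather than the with_ops variant.  */
  load_offset_ptr = build_int_cst (alias_type, 0);
  val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst");
  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, load_offset_ptr);
  load_stmt = gimple_build_assign (val_tmp, val_expr);
  gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT);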