Hi, On Thu, Jan 12, 2012 at 03:23:31PM +0100, Richard Guenther wrote: > On Thu, 12 Jan 2012, Martin Jambor wrote: > > > Hi, > > > > I tend to believe that this is the SRA part of a fix for PR 50444 (and > > possibly some strict alignment SRA-related bugs too. What it does is > > that it decreases the alignment of the built MEM_REF type if it seems > > necessary. That is in cases when the alignment of the type as it > > arrived in the parameter is larger than the alignment of the base we > > are building the MEM_REF on top of or the MEM_REF we are replacing or > > when it is larger than the biggest power of two that divides the > > offset we are adding, if it is non zero). > > > > This patch alone fixes the testcase on i686, on x86_64 the testcase > > stops segfaulting but the final comparison does not go well and the > > testcase returns non-zero exit code. However, I belive that is most > > probably an expansion problem, rather than SRA's. Perhaps it should > > be also noted that if I revert SRA's build_ref_for_model to what it > > was in revision 164280 and SRA creates only a MEM_REF rather than a > > COMPONENT_REF on top of a MEM_REF which it does now, the assignment > > expansion takes the "optab_handler (movmisalign_optab, mode)" route > > and the testcase is compiled well, with both the segfault gone and > > comparison yielding the correct result. I plan to have a look at what > > goes on in expansion with the current build_ref_for_model shortly but > > thought I'd post this for review and comments before that. > > > > A similar patch has passed bootstrap and testsuite on x86_64-linux, > > this exact one is undergoing the same as I'm writing this. Is it the > > good direction, should I commit the patch (to trunk and the 4.6 > > branch) if it passes? > > I don't think this is a suitable place to discover the alignment > for the access. In fact what we should do is use the same alignment > that was used for the access we replicate/decompose (or is the > 'base' argument to build_ref_for_offset always the original access?)
No, it often it is not. And I am not sure what that same alignment above is. Let me explain the crux of the problem with the testcase. The data structures look like this: typedef struct { uint32_t v[4]; } a4x32; typedef struct { __m128i m; } a1xm128i; typedef struct { a1xm128i key; a4x32 c; size_t elem; a4x32 v; } Engine; And we have a local aggregate of type a4x32 but which is accessed as __m128i, therefore its scalar replacement has type __m128i. Any store of that type requires 128bit alignment. The problem is that the original a4x32 aggregate is stored into the v field of a structure of type Engine which is pointed to by a pointer parameter. Field v has offset 40 and therefore any store to it in the __m128i type without fiddling with its alignment is illegal (the Engine structure itself is 128bit aligned because of field key). Specifically, before SRA it all looks like this. D.4781 and D.4792 are local a4x32 structures, e points to a struct Engine): MEM[(struct *)&D.4792].m = vector_ssa_name; D.4781 = D.4792; e_1(D)->v = D.4781; If I want to replace the local aggregates D.4781 and D.4792 with vectors and make them go away completely (and I do believe people would complain if I did not, especially if they are used in more computations elsewhere in the function), I have to store the vecor to the unaligned position. If we want to make D.4781 disappear completely, there is no original access we are replicating/decomposing and we have to take into account the offset 40, which however is a property of this particular memory access and nothing else. Therefore I thought this was the proper place to deal with it. > > Also ... > > > Thanks, > > > > Martin > > > > > > 2012-01-12 Martin Jambor <mjam...@suse.cz> > > > > PR tree-optimization/50444 > > * tree-sra.c (build_ref_for_offset): Decrease the alignment of the > > type of the created MEM_REF if necessary. > > > > Index: src/gcc/tree-sra.c > > =================================================================== > > --- src.orig/gcc/tree-sra.c > > +++ src/gcc/tree-sra.c > > @@ -1458,9 +1458,10 @@ build_ref_for_offset (location_t loc, tr > > tree exp_type, gimple_stmt_iterator *gsi, > > bool insert_after) > > { > > + HOST_WIDE_INT base_offset, final_offset; > > + unsigned int align; > > tree prev_base = base; > > tree off; > > - HOST_WIDE_INT base_offset; > > > > gcc_checking_assert (offset % BITS_PER_UNIT == 0); > > > > @@ -1488,24 +1489,34 @@ build_ref_for_offset (location_t loc, tr > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > update_stmt (stmt); > > > > + final_offset = offset / BITS_PER_UNIT; > > off = build_int_cst (reference_alias_ptr_type (prev_base), > > - offset / BITS_PER_UNIT); > > + final_offset); > > base = tmp; > > + align = get_object_alignment (prev_base); > > } > > else if (TREE_CODE (base) == MEM_REF) > > { > > - off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), > > - base_offset + offset / BITS_PER_UNIT); > > + final_offset = base_offset + offset / BITS_PER_UNIT; > > + off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), > > final_offset); > > off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off); > > + align = get_object_or_type_alignment (base); > > base = unshare_expr (TREE_OPERAND (base, 0)); > > } > > else > > { > > - off = build_int_cst (reference_alias_ptr_type (base), > > - base_offset + offset / BITS_PER_UNIT); > > + final_offset = base_offset + offset / BITS_PER_UNIT; > > + off = build_int_cst (reference_alias_ptr_type (base), final_offset); > > + align = get_object_alignment (base); > > I think you always want to use get_object_alignment_1 to be able to > factor in offset and thus knowledge about explicit misalignment, > otherwise you'd pessimize quite some cases. Then the question > is whether to factor in TYPE_ALIGN of the original access > (see get_object_or_type_alignment). I think the MEM_REF case needs to use get_object_or_type_alignment like my patch does because quite likely its base is a pointer SSA_NAME. I do not know how I could make any use of the value returned in get_object_alignment_1's bitposp other than exactly what get_object_alignment does. Can you give me a further hint? What would be the reason for pessimizing things? Thanks a lot, Martin > > Richard. > > > base = build_fold_addr_expr (unshare_expr (base)); > > } > > > > + if (final_offset != 0) > > + align = MIN (align, > > + (1u << ctz_hwi (absu_hwi (final_offset))) * > > BITS_PER_UNIT); > > + if (align < TYPE_ALIGN (exp_type)) > > + exp_type = build_aligned_type (exp_type, align); > > + > > return fold_build2_loc (loc, MEM_REF, exp_type, base, off); > > }