On 23 February 2016 at 21:49, Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> wrote: > On 23 February 2016 at 17:31, Richard Biener <rguent...@suse.de> wrote: >> On Tue, 23 Feb 2016, Prathamesh Kulkarni wrote: >> >>> On 22 February 2016 at 17:36, Richard Biener <rguent...@suse.de> wrote: >>> > On Mon, 22 Feb 2016, Prathamesh Kulkarni wrote: >>> > >>> >> Hi Richard, >>> >> As discussed in private mail, this version of patch attempts to >>> >> increase alignment >>> >> of global struct decl if it contains an an array field(s) and array's >>> >> offset is a multiple of the alignment of vector type corresponding to >>> >> it's scalar type and recursively checks for nested structs. >>> >> eg: >>> >> static struct >>> >> { >>> >> int a, b, c, d; >>> >> int k[4]; >>> >> float f[10]; >>> >> }; >>> >> k is a candidate array since it's offset is 16 and alignment of >>> >> "vector (4) int" is 8. >>> >> Similarly for f. >>> >> >>> >> I haven't been able to create a test-case where there are >>> >> multiple candidate arrays and vector alignment of arrays are different. >>> >> I suppose in this case we will have to increase alignment >>> >> of the struct by the max alignment ? >>> >> eg: >>> >> static struct >>> >> { >>> >> <fields> >>> >> T1 k[S1] >>> >> <fields> >>> >> T2 f[S2] >>> >> <fields> >>> >> }; >>> >> >>> >> if V1 is vector type corresponding to T1, and V2 corresponding vector >>> >> type to T2, >>> >> offset (k) % align(V1) == 0 and offset (f) % align(V2) == 0 >>> >> and align (V1) > align(V2) then we will increase alignment of struct >>> >> by align(V1). >>> >> >>> >> Testing showed FAIL for g++.dg/torture/pr31863.C due to program timeout. >>> >> Initially it appeared to me, it went in infinite loop. However >>> >> on second thoughts I think it's probably not an infinite loop, rather >>> >> taking (extraordinarily) large amount of time >>> >> to compile the test-case with the patch. >>> >> The test-case builds quickly for only 2 instantiations of ClassSpec >>> >> (ClassSpec<Key, A001, 1>, >>> >> ClassSpec<Key, A002, 2>) >>> >> Building with 22 instantiations (upto ClassSpec<Key, A0023, 22>) takes up >>> >> to ~1m to compile. >>> >> with: >>> >> 23 instantiations: ~2m >>> >> 24 instantiations: ~5m >>> >> For 30 instantiations I terminated cc1plus after 13m (by SIGKILL). >>> >> >>> >> I guess it shouldn't go in an infinite loop because: >>> >> a) structs cannot have circular references. >>> >> b) works for lower number of instantiations >>> >> However I have no sound evidence that it cannot be in infinite loop. >>> >> I don't understand why a decl node is getting visited more than once >>> >> for that test-case. >>> >> >>> >> Using a hash_map to store alignments of decl's so that decl node gets >>> >> visited >>> >> only once prevents the issue. >>> > >>> > Maybe aliases. Try not walking vnode->alias == true vars. >>> Hi, >>> I have a hypothesis why decl node gets visited multiple times. >>> >>> Consider the test-case: >>> template <typename T, unsigned N> >>> struct X >>> { >>> T a; >>> virtual int foo() { return N; } >>> }; >>> >>> typedef struct X<int, 1> x_1; >>> typedef struct X<int ,2> x_2; >>> >>> static x_1 obj1 __attribute__((used)); >>> static x_2 obj2 __attribute__((used)); >>> >>> Two additional structs are created by C++FE, c++filt shows: >>> _ZTI1XIiLj1EE -> typeinfo for X<int, 1u> >>> _ZTI1XIiLj2EE -> typeinfo for X<int, 2u> >>> >>> Both of these structs have only one field D.2991 and it appears it's >>> *shared* between them: >>> struct D.2991; >>> const void * D.2980; >>> const char * D.2981; >>> >>> Hence the decl node D.2991 and it's fields (D.2890, D.2981) get visited >>> twice: >>> once when walking _ZTI1XIiLj1EE and 2nd time when walking _ZTI1XIiLj2EE >>> >>> Dump of walking over the global structs for above test-case: >>> http://pastebin.com/R5SABW0c >>> >>> So it appears to me to me a DAG (interior node == struct decl, leaf == >>> non-struct field, >>> edge from node1 -> node2 if node2 is field of node1) is getting >>> created when struct decl >>> is a type-info object. >>> >>> I am not really clear on how we should proceed: >>> a) Keep using hash_map to store alignments so that every decl gets >>> visited only once. >>> b) Skip walking artificial record decls. >>> I am not sure if skipping all artificial struct decls would be a good >>> idea, but I don't >>> think it's possible to identify if a struct-decl is typeinfo struct at >>> middle-end ? >> >> You shouldn't end up walking those when walking the type of >> global decls. That is, don't walk typeinfo decls - yes, practically >> that means just not walking DECL_ARTIFICIAL things. > Hi, > I have done the changes in the patch (attached) and cross-tested > on arm*-*-* and aarch64*-*-* without failures. > Is it OK for stage-1 ? Hi, Is the attached patch OK for trunk ? Bootstrapped and tested on aarch64-linux-gnu, ppc64le-linux-gnu. Cross-tested on arm*-*-* and aarch64*-*-*.
Thanks, Prathamesh > > Thanks, > Prathamesh >> >> Richard.
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c new file mode 100644 index 0000000..7010a52 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target section_anchors } */ +/* { dg-require-effective-target vect_int } */ + +#define N 32 + +/* Increase alignment of struct if an array's offset is multiple of alignment of + vector type corresponding to it's scalar type. + For the below test-case: + offsetof(e) == 8 bytes. + i) For arm: let x = alignment of vector type corresponding to int, + x == 8 bytes. + Since offsetof(e) % x == 0, set DECL_ALIGN(a, b, c) to x. + ii) For aarch64, ppc: x == 16 bytes. + Since offsetof(e) % x != 0, don't increase alignment of a, b, c. +*/ + +static struct A { + int p1, p2; + int e[N]; +} a, b, c; + +int foo(void) +{ + for (int i = 0; i < N; i++) + a.e[i] = b.e[i] + c.e[i]; + + return a.e[0]; +} + +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */ +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */ +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 3 "increase_alignment" { target arm*-*-* } } } */ diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 2b25b45..0dd0d24 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -794,6 +794,88 @@ make_pass_slp_vectorize (gcc::context *ctxt) This should involve global alignment analysis and in the future also array padding. */ +static unsigned get_vec_alignment_for_decl (tree); + +/* Return alignment of array's vector type corresponding to scalar type. + 0 if no vector type exists. */ +static unsigned +get_vec_alignment_for_array_decl (tree array_decl) +{ + tree type = TREE_TYPE (array_decl); + gcc_assert (TREE_CODE (type) == ARRAY_TYPE); + + tree vectype = get_vectype_for_scalar_type (strip_array_types (type)); + return (vectype) ? TYPE_ALIGN (vectype) : 0; +} + +/* Return alignment of field having maximum alignment of vector type + corresponding to it's scalar type. For now, we only consider fields whose + offset is a multiple of it's vector alignment. + 0 if no suitable field is found. */ +static unsigned +get_vec_alignment_for_record_decl (tree record_decl) +{ + tree type = TREE_TYPE (record_decl); + gcc_assert (TREE_CODE (type) == RECORD_TYPE); + unsigned max_align = 0, alignment; + HOST_WIDE_INT offset; + + /* Skip artificial decls like typeinfo decls or if + record is packed. */ + if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type)) + return 0; + + for (tree field = first_field (type); + field != NULL_TREE; + field = DECL_CHAIN (field)) + { + /* C++FE puts node "._0" of code TYPE_DECL. skip that. */ + if (TREE_CODE (field) != FIELD_DECL) + continue; + + offset = int_byte_position (field); + alignment = get_vec_alignment_for_decl (field); + + if (alignment + && (offset % (alignment / BITS_PER_UNIT) == 0) + && (alignment > max_align)) + max_align = alignment; + } + + return max_align; +} + +/* Return alignment of vector type corresponding to decl's scalar type + or 0 if it doesn't exist or the vector alignment is lesser than + decl's alignment. */ +static unsigned +get_vec_alignment_for_decl (tree decl) +{ + if (decl == NULL_TREE) + return 0; + + gcc_assert (DECL_P (decl)); + + static unsigned alignment = 0; + tree type = TREE_TYPE (decl); + + switch (TREE_CODE (type)) + { + case ARRAY_TYPE: + alignment = get_vec_alignment_for_array_decl (decl); + break; + case RECORD_TYPE: + alignment = get_vec_alignment_for_record_decl (decl); + break; + default: + alignment = 0; + break; + } + + return (alignment > DECL_ALIGN (decl)) ? alignment : 0; +} + +/* Entry point to increase_alignment pass. */ static unsigned int increase_alignment (void) { @@ -804,23 +886,14 @@ increase_alignment (void) /* Increase the alignment of all global arrays for vectorization. */ FOR_EACH_DEFINED_VARIABLE (vnode) { - tree vectype, decl = vnode->decl; - tree t; + tree decl = vnode->decl; unsigned int alignment; - t = TREE_TYPE (decl); - if (TREE_CODE (t) != ARRAY_TYPE) - continue; - vectype = get_vectype_for_scalar_type (strip_array_types (t)); - if (!vectype) - continue; - alignment = TYPE_ALIGN (vectype); - if (DECL_ALIGN (decl) >= alignment) - continue; - - if (vect_can_force_dr_alignment_p (decl, alignment)) + alignment = get_vec_alignment_for_decl (decl); + + if (alignment && vect_can_force_dr_alignment_p (decl, alignment)) { - vnode->increase_alignment (TYPE_ALIGN (vectype)); + vnode->increase_alignment (alignment); dump_printf (MSG_NOTE, "Increasing alignment of decl: "); dump_generic_expr (MSG_NOTE, TDF_SLIM, decl); dump_printf (MSG_NOTE, "\n");
ChangeLog
Description: Binary data