ping https://gcc.gnu.org/ml/gcc/2016-05/msg00120.html
Thanks, Prathamesh On 11 May 2016 at 15:39, Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> wrote: > On 6 May 2016 at 17:20, Richard Biener <rguent...@suse.de> wrote: >> On Wed, 4 May 2016, Prathamesh Kulkarni wrote: >> >>> On 23 February 2016 at 21:49, Prathamesh Kulkarni >>> <prathamesh.kulka...@linaro.org> wrote: >>> > On 23 February 2016 at 17:31, Richard Biener <rguent...@suse.de> wrote: >>> >> On Tue, 23 Feb 2016, Prathamesh Kulkarni wrote: >>> >> >>> >>> On 22 February 2016 at 17:36, Richard Biener <rguent...@suse.de> wrote: >>> >>> > On Mon, 22 Feb 2016, Prathamesh Kulkarni wrote: >>> >>> > >>> >>> >> Hi Richard, >>> >>> >> As discussed in private mail, this version of patch attempts to >>> >>> >> increase alignment >>> >>> >> of global struct decl if it contains an an array field(s) and array's >>> >>> >> offset is a multiple of the alignment of vector type corresponding to >>> >>> >> it's scalar type and recursively checks for nested structs. >>> >>> >> eg: >>> >>> >> static struct >>> >>> >> { >>> >>> >> int a, b, c, d; >>> >>> >> int k[4]; >>> >>> >> float f[10]; >>> >>> >> }; >>> >>> >> k is a candidate array since it's offset is 16 and alignment of >>> >>> >> "vector (4) int" is 8. >>> >>> >> Similarly for f. >>> >>> >> >>> >>> >> I haven't been able to create a test-case where there are >>> >>> >> multiple candidate arrays and vector alignment of arrays are >>> >>> >> different. >>> >>> >> I suppose in this case we will have to increase alignment >>> >>> >> of the struct by the max alignment ? >>> >>> >> eg: >>> >>> >> static struct >>> >>> >> { >>> >>> >> <fields> >>> >>> >> T1 k[S1] >>> >>> >> <fields> >>> >>> >> T2 f[S2] >>> >>> >> <fields> >>> >>> >> }; >>> >>> >> >>> >>> >> if V1 is vector type corresponding to T1, and V2 corresponding vector >>> >>> >> type to T2, >>> >>> >> offset (k) % align(V1) == 0 and offset (f) % align(V2) == 0 >>> >>> >> and align (V1) > align(V2) then we will increase alignment of struct >>> >>> >> by align(V1). >>> >>> >> >>> >>> >> Testing showed FAIL for g++.dg/torture/pr31863.C due to program >>> >>> >> timeout. >>> >>> >> Initially it appeared to me, it went in infinite loop. However >>> >>> >> on second thoughts I think it's probably not an infinite loop, rather >>> >>> >> taking (extraordinarily) large amount of time >>> >>> >> to compile the test-case with the patch. >>> >>> >> The test-case builds quickly for only 2 instantiations of ClassSpec >>> >>> >> (ClassSpec<Key, A001, 1>, >>> >>> >> ClassSpec<Key, A002, 2>) >>> >>> >> Building with 22 instantiations (upto ClassSpec<Key, A0023, 22>) >>> >>> >> takes up >>> >>> >> to ~1m to compile. >>> >>> >> with: >>> >>> >> 23 instantiations: ~2m >>> >>> >> 24 instantiations: ~5m >>> >>> >> For 30 instantiations I terminated cc1plus after 13m (by SIGKILL). >>> >>> >> >>> >>> >> I guess it shouldn't go in an infinite loop because: >>> >>> >> a) structs cannot have circular references. >>> >>> >> b) works for lower number of instantiations >>> >>> >> However I have no sound evidence that it cannot be in infinite loop. >>> >>> >> I don't understand why a decl node is getting visited more than once >>> >>> >> for that test-case. >>> >>> >> >>> >>> >> Using a hash_map to store alignments of decl's so that decl node >>> >>> >> gets visited >>> >>> >> only once prevents the issue. >>> >>> > >>> >>> > Maybe aliases. Try not walking vnode->alias == true vars. >>> >>> Hi, >>> >>> I have a hypothesis why decl node gets visited multiple times. >>> >>> >>> >>> Consider the test-case: >>> >>> template <typename T, unsigned N> >>> >>> struct X >>> >>> { >>> >>> T a; >>> >>> virtual int foo() { return N; } >>> >>> }; >>> >>> >>> >>> typedef struct X<int, 1> x_1; >>> >>> typedef struct X<int ,2> x_2; >>> >>> >>> >>> static x_1 obj1 __attribute__((used)); >>> >>> static x_2 obj2 __attribute__((used)); >>> >>> >>> >>> Two additional structs are created by C++FE, c++filt shows: >>> >>> _ZTI1XIiLj1EE -> typeinfo for X<int, 1u> >>> >>> _ZTI1XIiLj2EE -> typeinfo for X<int, 2u> >>> >>> >>> >>> Both of these structs have only one field D.2991 and it appears it's >>> >>> *shared* between them: >>> >>> struct D.2991; >>> >>> const void * D.2980; >>> >>> const char * D.2981; >>> >>> >>> >>> Hence the decl node D.2991 and it's fields (D.2890, D.2981) get visited >>> >>> twice: >>> >>> once when walking _ZTI1XIiLj1EE and 2nd time when walking _ZTI1XIiLj2EE >>> >>> >>> >>> Dump of walking over the global structs for above test-case: >>> >>> http://pastebin.com/R5SABW0c >>> >>> >>> >>> So it appears to me to me a DAG (interior node == struct decl, leaf == >>> >>> non-struct field, >>> >>> edge from node1 -> node2 if node2 is field of node1) is getting >>> >>> created when struct decl >>> >>> is a type-info object. >>> >>> >>> >>> I am not really clear on how we should proceed: >>> >>> a) Keep using hash_map to store alignments so that every decl gets >>> >>> visited only once. >>> >>> b) Skip walking artificial record decls. >>> >>> I am not sure if skipping all artificial struct decls would be a good >>> >>> idea, but I don't >>> >>> think it's possible to identify if a struct-decl is typeinfo struct at >>> >>> middle-end ? >>> >> >>> >> You shouldn't end up walking those when walking the type of >>> >> global decls. That is, don't walk typeinfo decls - yes, practically >>> >> that means just not walking DECL_ARTIFICIAL things. >>> > Hi, >>> > I have done the changes in the patch (attached) and cross-tested >>> > on arm*-*-* and aarch64*-*-* without failures. >>> > Is it OK for stage-1 ? >>> Hi, >>> Is the attached patch OK for trunk ? >>> Bootstrapped and tested on aarch64-linux-gnu, ppc64le-linux-gnu. >>> Cross-tested on arm*-*-* and aarch64*-*-*. >> >> You can't simply use >> >> + offset = int_byte_position (field); >> >> as it can be placed at variable offset which will make int_byte_position >> ICE. Note it also returns a truncated byte position (bit position >> stripped) which may be undesirable here. I think you want to use >> bit_position and if and only if DECL_FIELD_OFFSET and >> DECL_FIELD_BIT_OFFSET are INTEGER_CST. > oops, I didn't realize offsets could be variable. > Will that be the case only for VLA member inside struct ? >> >> Your observation about the expensiveness of the walk still stands I guess >> and eventually you should at least cache the >> get_vec_alignment_for_record_decl cases. Please make those workers >> _type rather than _decl helpers. > Done >> >> You seem to simply get at the maximum vectorized field/array element >> alignment possible for all arrays - you could restrict that to >> arrays with at least vector size (as followup). > Um sorry, I didn't understand this part. >> >> + /* Skip artificial decls like typeinfo decls or if >> + record is packed. */ >> + if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type)) >> + return 0; >> >> I think we should honor DECL_USER_ALIGN as well and not mess with those >> decls. > Done >> >> Given the patch now does quite some extra work it might make sense >> to split the symtab part out of the vect_can_force_dr_alignment_p >> predicate and call that early. > In the patch I call symtab_node::can_increase_alignment_p early. > I tried moving that to it's callers - vect_compute_data_ref_alignment and > increase_alignment::execute, however that failed some tests in vect, > and hence I didn't add the following hunk in the patch. Did I miss some check > ? > > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index 7652e21..2c1acee 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -795,7 +795,10 @@ vect_compute_data_ref_alignment (struct data_reference > *dr) > && TREE_CODE (TREE_OPERAND (base, 0)) == ADDR_EXPR) > base = TREE_OPERAND (TREE_OPERAND (base, 0), 0); > > - if (!vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype))) > + if (!(TREE_CODE (base) == VAR_DECL > + && decl_in_symtab_p (base) > + && symtab_node::get (base)->can_increase_alignment_p () > + && vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype)))) > { > if (dump_enabled_p ()) > { > > Thanks, > Prathamesh >> >> Richard.