On Fri, Nov 1, 2013 at 4:04 AM, Aldy Hernandez <al...@redhat.com> wrote: > Hello gentlemen. I'm CCing all of you, because each of you can provide > valuable feedback to various parts of the compiler which I touch. I have > sprinkled love notes with your names throughout the post :). > > This is a patch against the gomp4 branch. It provides initial support for > simd-enabled functions which are "#pragma omp declare simd" in the OpenMP > world and elementals in Cilk Plus nomenclature. The parsing bits for OpenMP > are already in trunk, but they are silently ignored. This patch aims to > remedy the situation. The Cilk Plus parsing bits, OTOH, are not ready, but > could trivially be adapted to use this infrastructure (see below). > > I would like to at least get this into the gomp4 branch for now, because I > am accumulating far too many changes locally. > > The main idea is that for a simd annotated function, we can create one or > more cloned vector variants of a scalar function that can later be used by > the vectorizer. > > For a simple example with multiple returns... > > #pragma omp declare simd simdlen(4) notinbranch > int foo (int a, int b) > { > if (a == b) > return 555; > else > return 666; > } > > ...we would generate with this patch (unoptimized):
Just a quick question, queueing the thread for later review (aww, queue back at >100 threads again - I can't get any work done anymore :(). What does #pragma omp declare simd guarantee about memory side-effects and memory use in the function? That is, unless the function can be safely annotated with the 'const' attribute the whole thing is useless for the vectorizer. Thanks, Richard. > foo.simdclone.0 (vector(4) int simd.4, vector(4) int simd.5) > { > unsigned int iter.6; > int b.3[4]; > int a.2[4]; > int retval.1[4]; > int _3; > int _5; > int _6; > vector(4) int _7; > > <bb 2>: > a.2 = VIEW_CONVERT_EXPR<int[4]>(simd.4); > b.3 = VIEW_CONVERT_EXPR<int[4]>(simd.5); > iter.6_12 = 0; > > <bb 3>: > # iter.6_9 = PHI <iter.6_12(2), iter.6_14(6)> > _5 = a.2[iter.6_9]; > _6 = b.3[iter.6_9]; > if (_5 == _6) > goto <bb 5>; > else > goto <bb 4>; > > <bb 4>: > > <bb 5>: > # _3 = PHI <555(3), 666(4)> > retval.1[iter.6_9] = _3; > iter.6_14 = iter.6_9 + 1; > if (iter.6_14 < 4) > goto <bb 6>; > else > goto <bb 7>; > > <bb 6>: > goto <bb 3>; > > <bb 7>: > _7 = VIEW_CONVERT_EXPR<vector(4) int>(retval.1); > return _7; > > } > > The new loop is properly created and annotated with loop->force_vect=true > and loop->safelen set. > > A possible use may be: > > int array[1000]; > void bar () > { > int i; > for (i=0; i < 1000; ++i) > array[i] = foo(i, 123); > } > > In which case, we would use the simd clone if available: > > bar () > { > vector(4) int vect_cst_.21; > vector(4) int vect_i_6.20; > vector(4) int * vectp_array.19; > vector(4) int * vectp_array.18; > vector(4) int vect_cst_.17; > vector(4) int vect__4.16; > vector(4) int vect_vec_iv_.15; > vector(4) int vect_cst_.14; > vector(4) int vect_cst_.13; > int stmp_var_.12; > int i; > unsigned int ivtmp_1; > int _4; > unsigned int ivtmp_7; > unsigned int ivtmp_20; > unsigned int ivtmp_21; > > <bb 2>: > vect_cst_.13_8 = { 0, 1, 2, 3 }; > vect_cst_.14_2 = { 4, 4, 4, 4 }; > vect_cst_.17_13 = { 123, 123, 123, 123 }; > vectp_array.19_15 = &array; > vect_cst_.21_5 = { 1, 1, 1, 1 }; > goto <bb 4>; > > <bb 3>: > > <bb 4>: > # i_9 = PHI <i_6(3), 0(2)> > # ivtmp_1 = PHI <ivtmp_7(3), 1000(2)> > # vect_vec_iv_.15_11 = PHI <vect_vec_iv_.15_12(3), vect_cst_.13_8(2)> > # vectp_array.18_16 = PHI <vectp_array.18_17(3), vectp_array.19_15(2)> > # ivtmp_20 = PHI <ivtmp_21(3), 0(2)> > vect_vec_iv_.15_12 = vect_vec_iv_.15_11 + vect_cst_.14_2; > vect__4.16_14 = foo.simdclone.0 (vect_vec_iv_.15_11, vect_cst_.17_13); > _4 = 0; > MEM[(int *)vectp_array.18_16] = vect__4.16_14; > vect_i_6.20_19 = vect_vec_iv_.15_11 + vect_cst_.21_5; > i_6 = i_9 + 1; > ivtmp_7 = ivtmp_1 - 1; > vectp_array.18_17 = vectp_array.18_16 + 16; > ivtmp_21 = ivtmp_20 + 1; > if (ivtmp_21 < 250) > goto <bb 3>; > else > goto <bb 5>; > > <bb 5>: > return; > > } > > That's the idea. > > Some of the ABI issues still need to be resolved (mangling for avx-512, what > to do with non x86 architectures, what (if any) default clones will be > created when no vector length is specified, etc etc), but the main > functionality can be seen above. > > Uniform and linear parameters (which are passed as scalars) are still not > handled. Also, Jakub mentioned that with the current vectorizer we probably > can't make good use of the inbranch/masked clones. I have a laundry list of > missing things prepended by // FIXME if anyone is curious. > > I'd like some feedback from y'all in your respective areas, since this > touches a few places besides OpenMP. For instance... > > [Honza] Where do you suggest I place a list of simd clones for a particular > (scalar) function? Right now I have added a simdclone_of field in > cgraph_node and am (temporarily) serially scanning all functions in > get_simd_clone(). This is obviously inefficient. I didn't know whether to > use the current next_sibling_clone/etc fields or create my own. I tried > using clone_of, and that caused some havoc so I'd like some feedback. > > [Martin] I have adapted the ipa_parm_adjustment infrastructure to allow > adding new arguments out of the blue like you mentioned was missing in > ipa-prop.h. I have also added support for creating vectors of arguments. > Could you take a look at my changes to ipa-prop.[ch]? > > [Martin] I need to add new arguments in the case of inbranch clones, which > add an additional vector with a mask as the last argument: For the > following: > > #pragma omp declare simd simdlen(4) inbranch > int foo (int a) > { > return a + 1234; > } > > ...we would generate a clone with: > > vector(4) int > foo.simdclone.0 (vector(4) int simd.4, vector(4) int mask.5) > > I thought it best to enhance ipa_modify_formal_parameters() and associated > machinery than to add the new argument ad-hoc. We already have enough ways > of doing tree and cgraph versioning in the compiler ;-). > > [Richi] I would appreciate feedback on the vectorizer and the infrastructure > as a whole. Do keep in mind that this is a work in progress :). > > [Balaji] This patch would provide the infrastructure that can be used by the > Cilk Plus elementals. When this is complete, all that would be missing is > the parser. You would have to tag the original function with "omp declare > simd" and "cilk plus elemental" attributes. See simd_clone_clauses_extract. > > [Jakub/rth]: As usual, valuable feedback on OpenMP and everything else is > greatly appreciated. > > Oh yeah, there are many more changes that would ideally be needed in the > vectorizer. > > Fire away!