On Wed, 22 Jul 2020, zhoukaipeng (A) wrote:

> Hi,
>
> It is the patch to add "#pragma GCC no_reduc_chain" for pr96053. It
> only completes the front end of C language.
>
> For the testcase, it successfully skipped doing slp by finding sequences
> from reduction chains. Without "#pragma GCC no_reduc_chain", it will
> fail to do vectorization.
>
> Please help to check if there is any problem. If there is no problem, I
> will continue to complete the front end of the remaining languages.
First of all, I think giving users more control over vectorization is good.

Now, as for "#pragma GCC no_reduc_chain", I'd like to avoid negatives and
terms internal to GCC. I would also like to see vectorization pragmas
grouped somehow, partly to avoid a bit explosion in struct loop. There's
already annot_expr_no_vector_kind and annot_expr_vector_kind, both
currently used only by the Fortran FE. Note ANNOTATE_EXPR already allows
an extra argument, so only annot_expr_vector_kind needs to remain, with
its argument specifying a bitmask of vectorizer hints. We'd have an extra
enum for those, like

  enum annot_vector_subkind {
    annot_vector_never = 0,
    annot_vector_auto = 1, // this is the default
    annot_vector_always = 3,
    /* your new flag */
  };

and the user would specify it via

  #pragma GCC vect [(never|always|auto)] [your new flag]

Now, I honestly have difficulty suggesting a better name than
no_reduc_chain. Quoting the testcase:

+double f(double *a, double *b)
+{
+  double res1 = 0;
+  double res0 = 0;
+#pragma GCC no_reduc_chain
+  for (int i = 0 ; i < 1000; i+=4) {
+    res0 += a[i] * b[i];
+    res1 += a[i+1] * b[i*1];
+    res0 += a[i+2] * b[i+2];
+    res1 += a[i+3] * b[i+3];
+  }
+  return res0 + res1;
+}

For your case, with V2DF vectors IIRC, using a reduction chain results in
a vectorization factor of two, while an SLP reduction gives a
vectorization factor of one. So maybe it is better to give the user
control over the vectorization factor? That's also desirable in other
cases, for example where the user wants to force a larger VF to get extra
unrolling. For the testcase above you'd use

  #pragma GCC vect vf(1)

or so (syntax to be discussed). The side effect would be that with a
reduction chain the VF request cannot be fulfilled, but with an SLP
reduction it can. Of course no_reduc_chain is much easier to implement in
a strict way, while a VF request will likely need to be documented as a
hint (possibly with a diagnostic if it isn't fulfilled).

Richard/Jakub, any thoughts?
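To make the bitmask idea a bit more concrete, here is a rough sketch in plain C, standalone and not GCC code, of how the pragma options could be decoded into one hint word plus a VF request. The bit chosen for the new flag (annot_vector_no_reduc_chain = 4), struct vect_hints, parse_vect_pragma and the vf(N) spelling are all made up for illustration; attaching the result as the ANNOTATE_EXPR argument is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Low two bits: bit 0 = "may vectorize", bit 1 = "force vectorization",
   so always = 3.  Bit 2 is a hypothetical slot for the new flag.  */
enum annot_vector_subkind {
  annot_vector_never = 0,
  annot_vector_auto = 1,            /* the default */
  annot_vector_always = 3,
  annot_vector_no_reduc_chain = 4   /* hypothetical new flag */
};

#define ANNOT_VECTOR_MODE_MASK 3u   /* bits holding never/auto/always */

/* Hypothetical record of what "#pragma GCC vect ..." requested.  */
struct vect_hints {
  unsigned kind;  /* bitmask of annot_vector_subkind values */
  int vf;         /* requested vectorization factor, 0 = none */
};

/* Decode the options following "#pragma GCC vect", e.g. "always vf(1)".
   Returns 0 on success and -1 on an unrecognized option.  Input longer
   than the local buffer is silently truncated; fine for a sketch.  */
int
parse_vect_pragma (const char *text, struct vect_hints *out)
{
  char buf[256];

  out->kind = annot_vector_auto;
  out->vf = 0;
  strncpy (buf, text, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';

  for (char *tok = strtok (buf, " \t"); tok; tok = strtok (NULL, " \t"))
    {
      int n, len = -1;

      if (strcmp (tok, "never") == 0)
        out->kind = (out->kind & ~ANNOT_VECTOR_MODE_MASK) | annot_vector_never;
      else if (strcmp (tok, "auto") == 0)
        out->kind = (out->kind & ~ANNOT_VECTOR_MODE_MASK) | annot_vector_auto;
      else if (strcmp (tok, "always") == 0)
        out->kind = (out->kind & ~ANNOT_VECTOR_MODE_MASK) | annot_vector_always;
      else if (strcmp (tok, "no_reduc_chain") == 0)
        out->kind |= annot_vector_no_reduc_chain;
      /* vf(N): %n is stored only if the closing ')' matched, and the
         length check rejects trailing junk such as "vf(2)x".  */
      else if (sscanf (tok, "vf(%d)%n", &n, &len) == 1
               && len == (int) strlen (tok) && n > 0)
        out->vf = n;
      else
        return -1;
    }
  return 0;
}
```

For example, "always vf(1)" decodes to kind 3 with vf 1, and "no_reduc_chain" alone gives kind 5 (the default auto bit plus the new flag). Keeping never/auto/always in the low two bits means further hints only need to claim additional bits of the same argument.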
Thanks,
Richard.