On Tue, Sep 15, 2015 at 5:32 PM, Alan Hayward <alan.hayw...@arm.com> wrote: > > > On 15/09/2015 13:09, "Richard Biener" <richard.guent...@gmail.com> wrote: > >> >>Now comments on the patch itself. >> >>+ if (code == COND_EXPR) >>+ *v_reduc_type = COND_REDUCTION; >> >>so why not simply use COND_EXPR instead of the new v_reduc_type? > > v_reduc_type is also dependant on check_reduction (which comes from > !nested_cycle in vectorizable_reduction). > It seemed messy to keep on checking for both those things throughout. > > In my patch to catch simpler condition reductions, I’ll be adding another > value to this enum too. v_reduc_type will be set to this new value based > on the same properties for COND_REDUCTION plus some additional constraints. > >> >>+ if (check_reduction && code != COND_EXPR && >>+ vect_is_slp_reduction (loop_info, phi, def_stmt)) >> >>&&s go to the next line > > ok > >> >>+ /* Reduction of the max index and a reduction of the found >>+ values. */ >>+ epilogue_cost += add_stmt_cost (target_cost_data, 1, >>+ vec_to_scalar, stmt_info, 0, >>+ vect_epilogue); >> >>vec_to_scalar once isn't what the comment suggests. Instead the >>comment suggests twice what a regular reduction would do >>but I guess we can "hide" the vec_to_scalar cost and "merge" it >>with the broadcast. Thus make the above two vector_stmt costs? >> >>+ /* A broadcast of the max value. */ >>+ epilogue_cost += add_stmt_cost (target_cost_data, 2, >>+ scalar_to_vec, stmt_info, 0, >>+ vect_epilogue); >> >>comment suggests a single broadcast. > > I’ve made a copy/paste error here. Just need to swap the 1 and the 2. > > >> >>@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi) >> the final vector of induction results: */ >> exit_phi = NULL; >> FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg) >>- { >>+ { >> gimple use_stmt = USE_STMT (use_p); >> if (is_gimple_debug (use_stmt)) >> continue; >> >>please avoid unrelated whitespace changes. > > Ok. I was changing “8 spaces” to a tab, but happy to revert. > >> >>+ case COND_EXPR: >>+ if (v_reduc_type == COND_REDUCTION) >>+ { >>... >>+ /* Fall through. */ >>+ >> case MIN_EXPR: >> case MAX_EXPR: >>- case COND_EXPR: >> >>aww, so we could already handle COND_EXPR reductions? How do they >>differ from what you add? Did you see if that path is even exercised >>today? > > Today, COND_EXPRs are only supported when they are nested inside a loop. > See the vect-cond-*.c tests. > For example: > > for (j = 0; j < M; j++) > { > x = x_in[j]; > curr_a = a[0]; > > for (i = 0; i < N; i++) > { > next_a = a[i+1]; > curr_a = x > c[i] ? curr_a : next_a; > } > x_out[j] = curr_a; > } > > > In that case, only the outer loop is vectorised. > >> >>+ /* Create a vector of {init_value, 0, 0, 0...}. */ >>+ vec<constructor_elt, va_gc> *v; >>+ vec_alloc (v, nunits); >>+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val); >>+ if (SCALAR_FLOAT_TYPE_P (scalar_type)) >>+ for (i = 1; i < nunits; ++i) >>+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, >>+ build_real (scalar_type, >>dconst0)); >>+ else >>+ for (i = 1; i < nunits; ++i) >>+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, >>+ build_int_cst (scalar_type, 0)); >>+ init_def = build_constructor (vectype, v); >> >>you can unify the float/int case by using build_zero_cst (scalar_type). >>Note that you should build a vector constant instead of a constructor >>if init_val is a constant. The convenient way is to build the vector >>elements into a tree[] array and use build_vector_stat in that case. > > Ok, will switch to build_zero_cst. > Also, I will switch my vector to {init_value, init_value, init_value…}. > I had {init_value, 0, 0, 0…} because I was going have the option of > ADD_REDUC_EXPR, > But that got removed along the way.
You can then simply use build_vector_from_val (). >> >>+ /* Find maximum value from the vector of found indexes. */ >>+ tree max_index = make_temp_ssa_name (index_scalar_type, NULL, ""); >> >>just use make_ssa_name (index_scalar_type); > > Ok > >> >>+ /* Convert the vector of data to the same type as the EQ. */ >>+ tree vec_data_cast; >>+ if ( TYPE_UNSIGNED (index_vec_type)) >>+ { >> >>How come it never happens the element >>sizes do not match? (int index type and double data type?) > > This was a little unclear. > The induction index is originally created as an unsigned version of the > type as the data vector. > (see the definition of cr_index_vector_type in vectorizable_reduction(), > which is then used to create cond_name) > > I will remove the if and replace with a gcc_checking_assert(TYPE_UNSIGNED > (index_vec_type)) > > >> >>+ /* Where the max index occured, use the value from the data >>vector. */ >>+ tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, >>""); >>+ gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR, >>+ vec_compare, >>vec_data_cast); >> >>that is, don't you need to do some widening/shortening on the comparison >>result? >>(also what happens in the case "or all of the values"?) >> >>Definitely too much VIEW_CONVERT_MAGIC here for my taste ;) > > Given the induction index is the same size as the data, this will be ok. > > I don’t like the VIEW_CONVERT_MAGIC either! But I couldn’t see any other > way of making it work. > > > The case of "all of the values” is when there were no matches in the loop. > When this happens, the data vector will contain only the default values > {init_value, 0, 0, 0…} > (… once I change my code due to the previous comment, we’ll have > {init_value, init_value, init_value…} ). > The REDUC_MAX_EXPR will turn either of those vectors into a “init_value”. > >> >>+ This function also handles reduction of condition expressions, for >>example: >>+ for (int i = 0; i < N; i++) >>+ if (a[i] < value) >>+ last = a[i]; >>+ This is handled by vectorising the loop and creating an additional >>vector >>+ containing the loop indexes for which "a[i] < value" was true. In the >>+ function epilogue this is reduced to a single max value and then used >>to >>+ index into the vector of results. >> >>I miss a comment that shows the kind of code we transform this to. >>"an additional vector containing the loop indexes" can't work - the vector >>will not be large enough ;) Naiively I would have made 'last' a vector, >>performing the reduction element-wise and in the epilogue reduce >>'last' itself. And it looks like we are already doing that for >> >>int foo (int *a) >>{ >> int val = 0; >> for (int i = 0; i < 1024; ++i) >> if (a[i] > val) >> val = a[i]; >> return val; >>} >> >>I must be missing something. Yes, I think we can't do index reduction >>yet, >>but pr65947-10.c is alrady handled? > > > I think an easier way of looking thinking about this is to see what > happens each iteration of the loop: > > Assuming the example above and a vector length of 4. > The index vector starts as (0,0,0,0). > On the first iteration, the condition passes for, lets say, i=2 and i=3, > then index vector is (0,0,2,3) > on the next iteration, the condition passes for i=4, then index vector is > now (4,0,2,3) > on the next iteration, the condition passes for i=8 and i=9, then index > vector is now (8,9,2,3) > Etc > In the end we might end up with something like index vector=(26,2,54,14). > The “2” has not been set since the first iteration. > The only value we care about is the last one set - the 54. The rest are > meaningless to us. > At the same time the data vector has been storing values at the same time > as the index vector, and so now contains (a[26], a[2], a[54], a[14]) > > In the epilogue, we reduce the index vector down to “54”, then create a > vector (54,54,54,54). > This is matched against the index vector to give us > (false,false,true,false). > Then we use that vector to index the data vector, giving us (0, 0, a[54], > 0). > Which we then reduce down to a single value. Ok, I see. That this case is already vectorized is because it implements MAX_EXPR, modifying it slightly to int foo (int *a) { int val = 0; for (int i = 0; i < 1024; ++i) if (a[i] > val) val = a[i] + 1; return val; } makes it no longer handled by current code. > > > > >> >>Stopping here. > > Thanks for the comments so far. > I’ll put together a new patch with the changes above. > > > Alan. > >