https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #5 from sergey.shalnov at intel dot com ---
(In reply to Richard Biener from comment #2)
> The strange code is because we perform basic-block vectorization resulting in
>
>   vect_cst__249 = {_251, _251, _251, _251, _334, _334, _334, _334, _417,
> _417, _417, _417, _48, _48, _48, _48};
>   MEM[(unsigned int *)&tmp] = vect_cst__249;
>   _186 = tmp[0][0];
>   _185 = tmp[1][0];
> ...
>
> which for some reason is deemed profitable:
>
> t.c:32:12: note: Cost model analysis:
>   Vector inside of basic block cost: 24
>   Vector prologue cost: 64
>   Vector epilogue cost: 0
>   Scalar cost of basic block: 192
> t.c:32:12: note: Basic block will be vectorized using SLP
>
> what is odd is that the single vector store is costed 24 while the 16 scalar
> int stores are costed 192.  The vector build from scalar costs 64.
>
> I guess Honzas cost-model tweaks might have gone wrong here or we're hitting
> an oddity in the SLP costing.
>
> Even if it looks strange maybe the sequence _is_ profitable?
>
> The second loop would be vectorized if 'sum' was unsigned.

Richard,
No, the sequence is not profitable. If we don't use any vector registers here
the performance will be better for all architectures.
I'm talking about vectorized code only here.

I'm trying to look into vect_stmt_relevant_p() function to implement
additional limitations and avoid block vectorization if the loop is not
vectorized.

If you have any idea how to avoid vectorization in this particular place -
please let me know.

Sergey