https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106022
--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #11)
> (In reply to H.J. Lu from comment #9)
> > (In reply to Richard Biener from comment #8)
> > > (In reply to H.J. Lu from comment #6)
> > > > Created attachment 53169 [details]
> > > > A patch
> > > >
> > > > This patch multiplies the vector store cost by the number of scalar
> > > > elements in a word to properly compare scalar store cost against
> > > > vector store cost.
> > >
> > > But that's not "properly" but "wrong" ...
> > >
> > > Note we already cost the vector load from the constant pool so the vector
> > > side costing is correct.
> > >
> > > What's eventually imprecise is the scalar cost where you could anticipate
> > > store merging, but adjusting the vector cost side is just wrong.
> >
> > I tried to adjust the scalar cost.  When the scalar cost of storing a byte
> > is 6, dividing it by 8 (the number of scalar elements in a word) becomes 0.
> > Will it work?
>
> No, I think you would need to pattern match an actual store sequence,
> for example by looking at
>
>   if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
>       && pow2p_hwi (DR_GROUP_STORE_COUNT (stmt_info)))
>     /* cost a possibly merged store only once (but with larger mode?) */
>     if (DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info)
>       ...

That information isn't available in add_stmt_cost.  I will count the number
of scalar stores and vector stores.  Then I will compare them in finish_cost.

> So costing the whole sequence of scalar stores a single time, with
> adjusted mode.
>
> store-merging also handles non-QImode stores btw.
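
For reference, the arithmetic behind the "becomes 0" remark in comment #9 can be sketched in plain C. This is only an illustration, not the attached patch and not GCC code: `store_counts`, `record_store`, `per_store_adjusted_cost` and `deferred_scalar_cost` are hypothetical names, and the cost value 6 and the 8 elements per word are the numbers quoted above. It shows that dividing each store's cost truncates to zero, while counting the stores first and dividing the total once does not.

```c
/* Hypothetical accumulator for illustration; not a GCC data structure.  */
struct store_counts
{
  unsigned scalar_stores;  /* number of scalar stores recorded */
  unsigned vector_stores;  /* number of vector stores recorded */
};

/* Record one store while walking statements (assumed to happen
   once per store statement).  */
void
record_store (struct store_counts *c, int is_vector)
{
  if (is_vector)
    c->vector_stores++;
  else
    c->scalar_stores++;
}

/* Per-statement adjustment: unsigned integer division truncates,
   so a cost smaller than elts_per_word becomes 0.  */
unsigned
per_store_adjusted_cost (unsigned store_cost, unsigned elts_per_word)
{
  return store_cost / elts_per_word;
}

/* Deferred adjustment: multiply by the recorded scalar store count
   first and divide once at the end.  */
unsigned
deferred_scalar_cost (const struct store_counts *c, unsigned store_cost,
                      unsigned elts_per_word)
{
  return (c->scalar_stores * store_cost) / elts_per_word;
}
```

With a byte store cost of 6 and 8 elements per word, the per-statement form gives 0, whereas recording eight scalar stores and dividing the total gives 6. Whether such a deferred total is the right quantity to compare against the vector cost is a separate question that this sketch does not answer.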