On 9/9/21 12:19 PM, Bill Schmidt wrote:
On 9/9/21 11:11 AM, Segher Boessenkool wrote:
Hi!

On Wed, Sep 08, 2021 at 02:57:14PM +0800, Kewen.Lin wrote:
+      /* If we have strided or elementwise loads into a vector, it's
+        possible to be bounded by latency and execution resources for
+        many scalar loads.  Try to account for this by scaling the
+        construction cost by the number of elements involved, when
+        handling each matching statement we record the possible extra
+        penalized cost into target cost, in the end of costing for
+        the whole loop, we do the actual penalization once some load
+        density heuristics are satisfied.  */
The above comment is quite hard to read.  Can you please break up the last
sentence into at least two sentences?
How about the below:

+      /* If we have strided or elementwise loads into a vector, it's
"strided" is not a word: it properly is "stridden", which does not read
very well either.  "Have loads by stride, or by element, ..."?  Is that
good English, and easier to understand?
No, this is OK.  "Strided loads" is a term of art used by the
vectorizer; whether or not it was the Queen's English, it's what we
have...  (And I think you might only find "bestridden" in some 18th or
19th century English poetry... :-)
+        possible to be bounded by latency and execution resources for
+        many scalar loads.  Try to account for this by scaling the
+        construction cost by the number of elements involved.  For
+        each matching statement, we record the possible extra
+        penalized cost into the relevant field in target cost.  When
+        we want to finalize the whole loop costing, we will check if
+        those related load density heuristics are satisfied, and add
+        this accumulated penalized cost if yes.  */

Otherwise this looks good to me, and I recommend maintainers approve with
that clarified.
Does that text look good to you now, Bill?  It is still kinda complex;
maybe you see a way to make it simpler.
I think it's OK now.  The complexity at least matches the code now
instead of exceeding it. :-P  j/k...

Well, let me not be lazy, and see whether I can help:

"Power processors do not currently have instructions for strided and elementwise loads, and instead we must generate multiple scalar loads.  This leads to undercounting of the cost.  We account for this by scaling the construction cost by the number of elements involved, and saving this as extra cost that we may or may not need to apply.  When finalizing the cost of the loop, the extra penalty is applied when the load density heuristics are satisfied."

Something like that?

Bill
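
For readers following the thread, here is a minimal C sketch of the two-phase scheme the proposed comment describes: record a scaled penalty per matching statement, then apply it once at finalization if a load density check passes. It is not the code from the patch. The struct, the function names, and both thresholds are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-loop cost record.  Field names are illustrative
   assumptions, not taken from the patch under review.  */
struct loop_cost
{
  unsigned body_cost;        /* Cost accumulated for the loop body.  */
  unsigned extra_ctor_cost;  /* Penalty recorded but not yet applied.  */
  unsigned nloads;           /* Number of load statements seen.  */
  unsigned nstmts;           /* Number of statements seen.  */
};

/* Phase 1: account for one statement.  For a strided or elementwise
   load that builds a vector from NELTS scalar loads, scale CTOR_COST
   by NELTS and save it as a pending penalty instead of applying it.  */
static void
record_stmt (struct loop_cost *c, unsigned stmt_cost, bool is_load,
             bool strided_or_elementwise, unsigned nelts,
             unsigned ctor_cost)
{
  c->body_cost += stmt_cost;
  c->nstmts++;
  if (is_load)
    c->nloads++;
  if (is_load && strided_or_elementwise)
    c->extra_ctor_cost += nelts * ctor_cost;
}

/* Phase 2: finalize the loop cost.  The pending penalty is added only
   when the assumed density heuristics hold: more than LOAD_THRESHOLD
   loads, and loads making up more than DENSITY_PCT percent of all
   statements.  Both thresholds are placeholders.  */
static unsigned
finish_loop_cost (const struct loop_cost *c, unsigned density_pct,
                  unsigned load_threshold)
{
  unsigned total = c->body_cost;
  if (c->nstmts > 0
      && c->nloads > load_threshold
      && c->nloads * 100 / c->nstmts > density_pct)
    total += c->extra_ctor_cost;
  return total;
}
```

Keeping the penalty in a separate field until finalization means a loop with only a few such loads among many other statements is not charged for them, which matches the "may or may not need to apply" wording above.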

