On 11/7/24 4:34 PM, Li, Pan2 wrote:
Thanks Tamar and Jeff for the comments.

I'm not sure it's that simple.  It'll depend on the micro-architecture:
things like the strength of the branch predictors and how fetch blocks are
handled (can you have embedded not-taken branches, short-forward-branch
optimizations, etc.).

After:

.L.sat_add_u_1(unsigned int, unsigned int):
          add 4,3,4
          rldicl 9,4,0,32
          subf 3,3,9
          sradi 3,3,63
          or 3,3,4
          rldicl 3,3,0,32
          blr

And before:

.L.sat_add_u_1(unsigned int, unsigned int):
          add 4,3,4
          cmplw 0,4,3
          bge 0,.L2
          li 4,-1
.L2:
          rldicl 3,4,0,32
          blr
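
For context, a plausible C source for this function might look like the
following.  This is an assumption on my part (the original testcase isn't
quoted in this excerpt), and the _branchless name is mine:

#include <limits.h>

/* Branchy form, matching the "before" assembly: compare and
   conditionally skip the saturation.  */
unsigned int
sat_add_u_1 (unsigned int x, unsigned int y)
{
  unsigned int sum = x + y;
  if (sum < x)         /* the addition wrapped around */
    sum = UINT_MAX;    /* saturate to the type maximum */
  return sum;
}

/* Branchless form, matching the "after" assembly: OR the sum with an
   all-ones mask that exists only when the addition wrapped.  */
unsigned int
sat_add_u_1_branchless (unsigned int x, unsigned int y)
{
  unsigned int sum = x + y;
  return sum | -(unsigned int) (sum < x);  /* mask is ~0U on overflow */
}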

I am not familiar with branch prediction, but the branch should be 50% taken
and 50% not-taken according to the range of sat add inputs.  Is that the
worst case for branch prediction?  I mean, if we call it 100 times with a
taken, not-taken, taken, not-taken... sequence, will the branch version
still be faster?
Feel free to correct me if I'm wrong.
It's less about the range of values in the type and more about what values actually occur in practice. I would generally expect that, most of the time, these operations don't actually need to saturate. *If* that is true, then the branch predictors should do a fairly good job.

If on the other hand the actual distribution is random-ish, then the predictors are really going to struggle.
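
To make that concrete, here is a hypothetical micro-benchmark sketch (not
from the thread; the names and sizes are mine) that times the branchy form
once with inputs that never saturate and once with inputs that saturate
randomly about half the time:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* The branchy saturating add.  Note a modern compiler may turn the
   ternary into branchless code on its own, so inspect the generated
   assembly before trusting the numbers.  */
static unsigned int
sat_add_branchy (unsigned int x, unsigned int y)
{
  unsigned int sum = x + y;
  return sum < x ? UINT_MAX : sum;
}

static double
time_loop (const unsigned int *a, const unsigned int *b, size_t n)
{
  volatile unsigned int sink = 0;  /* keep the loop from being elided */
  clock_t t0 = clock ();
  for (size_t i = 0; i < n; i++)
    sink += sat_add_branchy (a[i], b[i]);
  return (double) (clock () - t0) / CLOCKS_PER_SEC;
}

int
main (void)
{
  size_t n = 1u << 22;
  unsigned int *a = malloc (n * sizeof *a);
  unsigned int *b = malloc (n * sizeof *b);
  if (!a || !b)
    return 1;

  /* Case 1: never saturates, so the branch is always not-taken.  */
  for (size_t i = 0; i < n; i++)
    {
      a[i] = i & 0xffff;
      b[i] = 1;
    }
  printf ("predictable: %.3fs\n", time_loop (a, b, n));

  /* Case 2: saturates about half the time, in random order, which is
     the distribution the predictor struggles with.  */
  srand (42);
  for (size_t i = 0; i < n; i++)
    {
      a[i] = (rand () & 1) ? UINT_MAX : 0;
      b[i] = 1;
    }
  printf ("random:      %.3fs\n", time_loop (a, b, n));

  free (a);
  free (b);
  return 0;
}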

Jeff

