On 11/7/24 4:34 PM, Li, Pan2 wrote:
Thanks Tamar and Jeff for the comments.

I'm not sure it's that simple.  It'll depend on the micro-architecture:
things like the strength of the branch predictors and how fetch blocks are
handled (can you have embedded not-taken branches, short-forward-branch
optimizations, etc.).

After:

.L.sat_add_u_1(unsigned int, unsigned int):
          add 4,3,4
          rldicl 9,4,0,32
          subf 3,3,9
          sradi 3,3,63
          or 3,3,4
          rldicl 3,3,0,32
          blr

And before:

.L.sat_add_u_1(unsigned int, unsigned int):
          add 4,3,4
          cmplw 0,4,3
          bge 0,.L2
          li 4,-1
.L2:
          rldicl 3,4,0,32
          blr
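
For context, a plausible C source for this function might look like the
following.  This is an assumption on my part (the original testcase isn't
quoted in this excerpt), and the _branchless name is mine:

#include <limits.h>

/* Branchy form, matching the "before" assembly: compare and
   conditionally skip the saturation.  */
unsigned int
sat_add_u_1 (unsigned int x, unsigned int y)
{
  unsigned int sum = x + y;
  if (sum < x)         /* the addition wrapped around */
    sum = UINT_MAX;    /* saturate to the type maximum */
  return sum;
}

/* Branchless form, matching the "after" assembly: OR the sum with an
   all-ones mask that exists only when the addition wrapped.  */
unsigned int
sat_add_u_1_branchless (unsigned int x, unsigned int y)
{
  unsigned int sum = x + y;
  return sum | -(unsigned int) (sum < x);  /* mask is ~0U on overflow */
}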

I am not familiar with branch prediction, but the branch should be 50% taken
and 50% not-taken according to the range of sat add inputs.  Is that the
worst case for branch prediction?  I mean, if we call it 100 times with a
taken, not-taken, taken, not-taken... sequence, will the branch version
still be faster?
Feel free to correct me if I'm wrong.
It's less about the range of values in the type and more about what values actually occur in practice. I would generally expect that, most of the time, these operations don't actually need to saturate. *If* that is true, then the branch predictors should do a fairly good job.

If on the other hand the actual distribution is random-ish, then the predictors are really going to struggle.
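
To make that concrete, here is a hypothetical micro-benchmark sketch (not
from the thread; the names and sizes are mine) that times the branchy form
once with inputs that never saturate and once with inputs that saturate
randomly about half the time:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* The branchy saturating add.  Note a modern compiler may turn the
   ternary into branchless code on its own, so inspect the generated
   assembly before trusting the numbers.  */
static unsigned int
sat_add_branchy (unsigned int x, unsigned int y)
{
  unsigned int sum = x + y;
  return sum < x ? UINT_MAX : sum;
}

static double
time_loop (const unsigned int *a, const unsigned int *b, size_t n)
{
  volatile unsigned int sink = 0;  /* keep the loop from being elided */
  clock_t t0 = clock ();
  for (size_t i = 0; i < n; i++)
    sink += sat_add_branchy (a[i], b[i]);
  return (double) (clock () - t0) / CLOCKS_PER_SEC;
}

int
main (void)
{
  size_t n = 1u << 22;
  unsigned int *a = malloc (n * sizeof *a);
  unsigned int *b = malloc (n * sizeof *b);
  if (!a || !b)
    return 1;

  /* Case 1: never saturates, so the branch is always not-taken.  */
  for (size_t i = 0; i < n; i++)
    {
      a[i] = i & 0xffff;
      b[i] = 1;
    }
  printf ("predictable: %.3fs\n", time_loop (a, b, n));

  /* Case 2: saturates about half the time, in random order, which is
     the distribution the predictor struggles with.  */
  srand (42);
  for (size_t i = 0; i < n; i++)
    {
      a[i] = (rand () & 1) ? UINT_MAX : 0;
      b[i] = 1;
    }
  printf ("random:      %.3fs\n", time_loop (a, b, n));

  free (a);
  free (b);
  return 0;
}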

Jeff

