On 8/29/19 6:34 AM, Stefan Brankovic wrote:
> Then I run my performance tests and I got following results(test is calling
> vpkpx 100000 times):
> 
> 1) Current helper implementation: ~ 157 ms
> 
> 2) helper implementation you suggested: ~94 ms
> 
> 3) tcg implementation: ~75 ms

I assume you tested in a loop.  If you have just the one expansion, you'll not
see the penalty for the icache expansion.  To show the other extreme, you'd
want to test as separate sequential invocations.

That said, I'd be more interested in a real test case that isn't just calling
one instruction over and over.  Is there a real test case that shows vpkpx in
the top 25 of the profile?  With more than 0.5% of runtime?


r~

Reply via email to