On Fri, 16 Sep 2022 09:27:39 GMT, Andrew Haley <a...@openjdk.org> wrote:

>> Interesting, I had not considered that.  Thanks for pointing that out.  I'm 
>> honestly not sure how to evaluate the impact of the generated code on the 
>> icache.  I'll look at the logic surrounding the ghash processBlocks(_wide) 
>> code to see how that decision is made.  I don't have an aversion to going 
>> back to an assembly-based loop using the suggestions that @dchuyko made and 
>> maybe that's the right choice if it means more compact code.
>
> It's not so complicated. If you can make the code smaller with negligible 
> impact on throughput, do so. If not, don't.

I didn't see a noticeable performance difference with the loop unrolled, 
so I'm going with the SUB/CBNZ approach. It seems to do the best job of 
keeping the generated stub small while still being slightly more efficient 
than what I started with. As always, I appreciate the suggestions.

-------------

PR: https://git.openjdk.org/jdk/pull/7702
