On Fri, 16 Sep 2022 09:27:39 GMT, Andrew Haley <a...@openjdk.org> wrote:
>> Interesting, I had not considered that. Thanks for pointing that out. I'm
>> honestly not sure how to evaluate the impact of the generated code on the
>> icache. I'll look at the logic surrounding the ghash processBlocks(_wide)
>> code to see how that decision is made. I don't have an aversion to going
>> back to an assembly-based loop using the suggestions that @dchuyko made and
>> maybe that's the right choice if it means more compact code.
>
> It's not so complicated. If you can make the code smaller with negligible
> impact on throughput, do so. If not, don't.

I really didn't see a noticeable performance difference with the loop unrolled, so I'm going with the SUB/CBNZ approach. It seems to do the best job of keeping the generated stub small while still being a tiny bit more efficient than what I started with.

As always, I appreciate the suggestions.

-------------

PR: https://git.openjdk.org/jdk/pull/7702