On 11/14/22 14:49, Christoph Müllner wrote:
> We can take this further, but then the following questions pop up:
> * how much data processing per loop iteration?
I have no idea because I don't have any real data. Last time I gathered any data on this issue was circa 1988 :-)
> * what about unaligned strings?
I'd punt. I don't think we can depend on having high-performance unaligned access. You could do a dynamic check of alignment, but you'd really need to know that the strings are aligned often enough that the cost of the dynamic check is usually recovered.
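To make the dynamic check concrete, here is a rough C-level sketch of the idea, not what the patch emits. The names strcmp_sketch, cmp_bytes and has_zero_byte are made up for illustration, and the 8-byte word size is an assumption. Note that the word loads can read a few bytes past the terminator within the same aligned word, which ISO C does not strictly permit, so treat this as a model of the generated code rather than portable C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of w is zero. */
static int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* Hypothetical helper: plain byte-by-byte comparison. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b)
{
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}

/* Sketch only: word-at-a-time compare guarded by a dynamic alignment check. */
int strcmp_sketch(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    /* The dynamic check: one OR, one AND, one branch. This is the cost
       that has to be recovered by the faster loop below. */
    if ((((uintptr_t)a | (uintptr_t)b) & (sizeof(uint64_t) - 1)) == 0) {
        for (;;) {
            uint64_t wa, wb;
            memcpy(&wa, a, sizeof wa);   /* aligned word loads */
            memcpy(&wb, b, sizeof wb);
            if (wa != wb || has_zero_byte(wa))
                break;                   /* difference or NUL is in this word */
            a += sizeof wa;
            b += sizeof wb;
        }
    }

    /* Unaligned inputs, or the final word: finish one byte at a time. */
    return cmp_bytes(a, b);
}
```

If either pointer is misaligned, the check is pure overhead on top of the byte loop, which is why it only pays off when aligned inputs are common.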
> Happy to get suggestions/opinions for improvement.
I think this is pretty good as is, absent additional data indicating that handling unaligned cases or a different number of loop peels would be a notable improvement.
Jeff