James Greenhalgh wrote:
> I haven't seen a follow-up to Andrew's point regarding other
> read-modify-write operations.
>
> Did you investigate the cost of these?
I looked at whether there are other similar cases, but SHA1 appears to be unique because of its odd dataflow, the mismatch in latencies and the high degree of repetition. So it seems best to handle it as a special case.

What does seem useful is teaching GCC to prefer using the same register for accumulators. That is a general issue, and fixing it would improve performance in many cases.

Wilco