------- Comment #50 from potswa at mac dot com 2009-11-03 17:53 -------
The current RAI algo uses a temporary regardless of size or class. We could put in a "&& sizeof(_ValueType) < __MAX_TEMP_SIZE" or something... but stack overflow from a single temporary doesn't seem to have been a concern in the past.
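To make the idea concrete, here is a rough sketch of a rotate-by-one that uses a single temporary, with the kind of size guard floated above. This is not the libstdc++ code: `rotate_left_by_one` and `kMaxTempSize` are made-up names (the latter standing in for `__MAX_TEMP_SIZE`), the 256-byte limit is arbitrary, and the sketch only handles raw pointers to trivially copyable types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical limit standing in for the __MAX_TEMP_SIZE mentioned above.
// The value 256 is an arbitrary assumption for illustration.
constexpr std::size_t kMaxTempSize = 256;

// Left-rotate [first, last) by one element.
// Assumption: T is trivially copyable, so memmove is a legal way to shift.
template <typename T>
void rotate_left_by_one(T* first, T* last) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only covers trivially copyable types");
    assert(first <= last);
    if (first == last) return;

    if (sizeof(T) < kMaxTempSize) {
        // The temporary is written once and read once, so its size
        // does not cost repeated register traffic.
        T tmp = *first;
        const std::size_t count = static_cast<std::size_t>(last - first) - 1;
        std::memmove(first, first + 1, count * sizeof(T));
        *(last - 1) = tmp;
    } else {
        // Too large for a stack temporary under this policy:
        // fall back to the general-purpose algorithm.
        std::rotate(first, first + 1, last);
    }
}
```

Calling `rotate_left_by_one(a, a + 5)` on `int a[5] = {1, 2, 3, 4, 5}` moves the leading 1 to the back. Whether the guard is worth having at all is exactly the open question, since the fallback branch only matters for very large element types.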
I don't see how being register-size in particular is important. If we were swapping the temporary every time, we would want it to fit in a reasonable number of registers so the compiler could optimize out read-after-writes. But the __tmp here is only written and read once. The larger it is, the more acceleration. Proposed performance is very good for small k > 1, compared to current. Using memmove is simply even faster. It's not clear such rotate operations are popular enough to warrant a framework for optimization, though. If we ensure it's a non-move type then I also favor reverting the _GLIBCXX_MOVE[3]().

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41351