------- Comment #23 from potswa at mac dot com 2009-09-15 15:29 -------
With the new special case, I get a 3x speedup over the current implementation for n = 100, k = 99. It now weighs in at 45 lines in my style, before conversion to the official style, and not coincidentally I don't really feel like posting it again :vP . I'll do the legal stuff next.
Note that a significant speedup is available if std::copy is used for values of k other than 1 and n-1. I just observed 4x over my algorithm and 7.5x over the current one for n = 100, k = n-2, which seems disproportionate, but the behavior is correct. (The advantage disappears for large n.) This strategy generally requires constructing min( k, n-k ) temporaries. What's the policy on that kind of optimization? The temporaries can only go on the stack, which makes things hairy. Although "isolated cases" benefit, it's most reasonable to only special-case left and right shift by 1, right?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41351
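
For illustration only, here is a minimal sketch of the std::copy strategy discussed in the comment above. It is not the patch attached to this bug and not the libstdc++ implementation. The names rotate_with_copy and MaxTemps are hypothetical. The sketch assumes random-access iterators and a default-constructible, assignable value type, and it uses std::copy_backward for the right-hand case. It buffers the shorter side, min( k, n-k ) elements, in a fixed-size stack array and declines (returns false) when that would not fit, leaving the caller to fall back to the general algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper (assumption, not from the bug report or libstdc++).
// Rotates [first, last) so that *middle becomes the first element.
// Returns false and leaves the range untouched if more than MaxTemps
// temporaries would be needed. MaxTemps must be greater than zero.
template <std::size_t MaxTemps, typename RandomIt>
bool rotate_with_copy(RandomIt first, RandomIt middle, RandomIt last)
{
    typedef typename std::iterator_traits<RandomIt>::value_type value_type;
    typedef typename std::iterator_traits<RandomIt>::difference_type diff_t;

    const diff_t k = middle - first;  // length of the left part
    const diff_t m = last - middle;   // length of the right part, n - k
    assert(k >= 0 && m >= 0);

    if (k == 0 || m == 0)
        return true;                  // nothing to move
    if (std::min(k, m) > static_cast<diff_t>(MaxTemps))
        return false;                 // shorter side does not fit on the stack

    value_type tmp[MaxTemps];         // the min(k, n-k) temporaries live here

    if (k <= m) {
        std::copy(first, middle, tmp);            // save the short left part
        std::copy(middle, last, first);           // slide the right part down
        std::copy(tmp, tmp + k, last - k);        // saved part goes to the end
    } else {
        std::copy(middle, last, tmp);             // save the short right part
        std::copy_backward(first, middle, last);  // slide the left part up
        std::copy(tmp, tmp + m, first);           // saved part goes to the front
    }
    return true;
}
```

A call such as rotate_with_copy<4>(a, a + 2, a + 8) would handle k = 2 directly, while a request needing more than four temporaries returns false so the caller can use std::rotate instead. The fixed bound is one way to keep the stack usage predictable. Whether such a bound is acceptable is exactly the policy question raised above.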