https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52473
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |tkoenig at gcc dot gnu.org

--- Comment #5 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Created attachment 41394
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41394&action=edit
Benchmark of straight DO loops vs. library version

Some numbers from

https://groups.google.com/forum/#!topic/comp.lang.fortran/AI0F1Vpkc3I

show that using straight DO loops could both help and hurt:

$ ./a.out
 Testing explicit DO loops
 Dim = 1 Elapsed CPU time = 2.82861114
 Dim = 2 Elapsed CPU time = 2.93245506
 Dim = 3 Elapsed CPU time = 2.94523525
 Testing built-in cshift
 Dim = 1 Elapsed CPU time = 1.65619278
 Dim = 2 Elapsed CPU time = 2.80988693
 Dim = 3 Elapsed CPU time = 7.13671684

I'll see what could be done using an explicit call to memcpy
for the innermost loops.
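
To make the memcpy idea concrete, here is a minimal sketch in C of a circular
shift over one contiguous run of elements, done with two block copies instead
of an element-by-element loop. It is an illustration only, not the libgfortran
code or the eventual patch: the function names cshift_run_memcpy and
cshift_run_loop are hypothetical, and the sketch assumes the run is contiguous
in memory and that source and destination do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: circular shift of one contiguous run of n elements,
   each `size` bytes wide, following CSHIFT semantics:
   dst[i] = src[(i + shift) mod n], so a positive shift moves data left. */
void cshift_run_memcpy(void *dst, const void *src,
                       size_t n, size_t size, ptrdiff_t shift)
{
    if (n == 0)
        return;
    assert(dst != src);           /* memcpy requires non-overlapping buffers */

    /* Normalize the shift into the range [0, n). */
    ptrdiff_t m = shift % (ptrdiff_t)n;
    if (m < 0)
        m += (ptrdiff_t)n;
    size_t s = (size_t)m;

    const char *in = src;
    char *out = dst;

    /* Tail of the source becomes the head of the destination ... */
    memcpy(out, in + s * size, (n - s) * size);
    /* ... and the first s elements wrap around to the end. */
    memcpy(out + (n - s) * size, in, s * size);
}

/* Reference version with an explicit element loop, for comparison. */
void cshift_run_loop(int *dst, const int *src, size_t n, ptrdiff_t shift)
{
    if (n == 0)
        return;
    ptrdiff_t m = shift % (ptrdiff_t)n;
    if (m < 0)
        m += (ptrdiff_t)n;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[(i + (size_t)m) % n];
}
```

For a shift along the first (contiguous) dimension, a caller would apply
cshift_run_memcpy once per column. Whether this helps for shifts along the
higher dimensions, where the benchmark above shows the largest slowdown,
would still have to be measured.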