On 30 March 2017 at 12:46, Naveen N. Rao <naveen.n....@linux.vnet.ibm.com> wrote:
> Also, with a simple module to memset64() a 1GB vmalloc'ed buffer, here
> are the results:
> generic: 0.245315533 seconds time elapsed ( +- 1.83% )
> optimized: 0.169282701 seconds time elapsed ( +- 1.96% )

I wonder what keeps gcc from producing efficient assembly code here. Could you please post the disassembly of the C implementation of memset64()? Just for information.

Thanks,
Prasanna