Jeremy Hall <gcc.h...@gmail.com> writes:

> I wonder if its possible to improve the code generation for inline
> stringops when
> the length is known to be a multiple of 4 bytes?

The selection of the algorithm is fairly complex and depends on the specific processor you are tuning for. See decide_alg in config/i386/i386.c. There has been quite a lot of work in this area based on benchmarking on a range of processors. I'm sure there is plenty of room for improvement, but it should be based on real benchmarks doing memcpy of various sizes.

To be clear: the gcc@gcc.gnu.org mailing list is for discussion about the development of gcc itself. If you want to make suggestions for improvements without digging into the gcc code, I recommend filing an enhancement request at http://gcc.gnu.org/bugzilla/ , ideally, in a case like this, with benchmarks. On this mailing list we'll be happy to tell you what to modify to fix the compiler yourself. In this case, that is decide_alg. Note in particular the comments there that a loop performs better for small values.

Ian
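
As a starting point for the kind of benchmark suggested above, here is a minimal sketch in C. It is not taken from gcc or from the original post: the names DEFINE_COPY, copy_N, bench_case, cases and time_copy are hypothetical, and the chosen sizes are only an assumption about what "multiple of 4 bytes" might cover. Each routine calls memcpy with a compile-time constant length, so the compiler is free to expand it inline. Whether it actually does so depends on the target and the tuning flags.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper macro: one copy routine per compile-time length.
   noinline (a GCC extension) keeps each routine separate so its generated
   code can be inspected and timed on its own. */
#define DEFINE_COPY(N)                                               \
    static __attribute__((noinline)) void copy_##N(void *dst,        \
                                                   const void *src)  \
    {                                                                \
        memcpy(dst, src, (N));                                       \
    }

DEFINE_COPY(4)
DEFINE_COPY(8)
DEFINE_COPY(16)
DEFINE_COPY(32)
DEFINE_COPY(64)
DEFINE_COPY(128)

#define MAX_LEN 128

typedef void (*copy_fn)(void *, const void *);

struct bench_case {
    size_t len;   /* number of bytes the routine copies */
    copy_fn fn;   /* routine generated by DEFINE_COPY */
};

static const struct bench_case cases[] = {
    {4, copy_4},   {8, copy_8},   {16, copy_16},
    {32, copy_32}, {64, copy_64}, {128, copy_128},
};

/* Call one routine iters times. Returns the CPU time used in seconds,
   or -1.0 if the destination does not match the source afterwards. */
static double time_copy(const struct bench_case *c, long iters)
{
    static unsigned char src[MAX_LEN], dst[MAX_LEN];

    for (size_t i = 0; i < MAX_LEN; i++)
        src[i] = (unsigned char)(i * 7 + 1);
    memset(dst, 0, sizeof dst);

    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        c->fn(dst, src);
    clock_t end = clock();

    if (memcmp(dst, src, c->len) != 0)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

#ifdef BENCH_MAIN
int main(void)
{
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        double secs = time_copy(&cases[i], 100000000L);
        assert(secs >= 0.0);
        printf("%3zu bytes: %.3f s\n", cases[i].len, secs);
    }
    return 0;
}
#endif
```

Building with -DBENCH_MAIN enables the driver. Comparing runs built with different -mtune and -mstringop-strategy settings, and reading the assembly produced by -S for each copy_N, would show which expansion was chosen for each size. The timings include the cost of an indirect call per copy, so they are only meaningful relative to one another.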