https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140
--- Comment #9 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #8)
> (In reply to Wilco from comment #7)
> > (In reply to Martin Liška from comment #6)
> > > Created attachment 41772 [details]
> > > Patch candidate
> > >
> > > I'm going to prepare some test-cases for that. Does it look good?
> >
> > Yes, it now inlines small constant sizes. However large and variable sized
> > copies have the wrong return value:
> >
> > void *f1(void *p, void *q) { return __builtin_mempcpy(p, q, 256); }
> >
> > f1:
> >         mov     x2, 256
> >         b       memcpy
>
> Yep, I've noticed. It's strange for me why it's not working. I've just asked
> at GCC ML: https://gcc.gnu.org/ml/gcc/2017-07/msg00144.html

It's marked as a tailcall so anything you generate afterwards will be ignored:

(call_insn/j 13 12 14 2 (parallel [
            (set (reg:DI 0 x0)
                (call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41] <function_decl 0xffffb7acc700 __builtin_memcpy>) [0 __builtin_memcpy S8 A8])
                    (const_int 0 [0])))
            (return)
        ]) "mempcpy.c":3 -1

Also check this case:

void f4(void *p, void *q, int i) { __builtin_mempcpy(p, q, i); }
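
For reference, here is a small sketch of why a bare tail call to memcpy gives f1 the wrong result. mempcpy returns the destination plus the byte count, while memcpy returns the destination itself, so the pointer adjustment has to happen after the call, which a sibling call leaves no room for. The helper names ref_mempcpy, tailcall_memcpy and return_value_gap are made up for this illustration and are not part of GCC or the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference semantics of mempcpy: copy n bytes and return a pointer
   one past the last byte written (dst + n). Hypothetical helper. */
static void *ref_mempcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* What the f1 code quoted above effectively computes: the call is a
   plain jump to memcpy, so the caller receives dst, not dst + n. */
static void *tailcall_memcpy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Byte distance between the correct and the tail-called result.
   It equals n, so the two only agree when n is 0. */
static ptrdiff_t return_value_gap(size_t n)
{
    char src[256] = {0};
    char dst[256];

    assert(n <= sizeof dst);
    char *good = ref_mempcpy(dst, src, n);
    char *bad = tailcall_memcpy(dst, src, n);
    return good - bad;
}
```

In f4 the result of __builtin_mempcpy is discarded, so there the difference in return value is not observable.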