On 24 November 2016 at 17:48, Richard Biener <rguent...@suse.de> wrote: > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> On 24 November 2016 at 14:07, Richard Biener <rguent...@suse.de> wrote: >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> Hi, >> >> Consider following test-case: >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> { >> >> __builtin_memcpy (a1, a2, a3); >> >> return a1; >> >> } >> >> >> >> return a1 can be considered equivalent to return value of memcpy, >> >> and the call could be emitted as a tail-call. >> >> gcc doesn't emit the above call to memcpy as a tail-call, >> >> but if it is changed to: >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> return t1; >> >> >> >> Then memcpy is emitted as a tail-call. >> >> The attached patch tries to handle the former case. >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> Cross tested on arm*-*-*, aarch64*-*-* >> >> Does this patch look OK ? >> > >> > +/* Return arg, if function returns it's argument or NULL if it doesn't. >> > */ >> > +tree >> > +gimple_call_return_arg (gcall *call_stmt) >> > +{ >> > >> > >> > Please just inline it at the single use - the name is not terribly >> > informative. >> > >> > I'm not sure you can rely on code-generation working if you not >> > effectively change the IL to >> > >> > a1 = __builtin_memcpy (a1, a2, a3); >> > return a1; >> > >> > someone more familiar with RTL expansion plus tail call emission on >> > RTL needs to chime in. >> Well I was trying to copy-propagate function's argument into uses of >> it's return value if >> function returned that argument, so the assignment to lhs of call >> could be made redundant. >> >> eg: >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> { >> void *t1 = __builtin_memcpy (a1, a2, a3); >> return t1; >> } >> >> After patch, copyprop transformed it into: >> t1 = __builtin_memcpy (a1, a2, a3); >> return a1; > > But that's a bad transform -- if we know that t1 == a1 then it's > better to use t1 as that's readily available in the return register > while the register for a1 might have been clobbered and thus we > need to spill it for the later return. Oh I didn't realize this could possibly pessimize RA. For test-case:
void *t1 = memcpy (dest, src, n); if (t1 != dest) __builtin_abort (); we could copy-propagate t1 into cond_expr and make the condition redundant. However I suppose this particular case could be handled with VRP instead (t1 and dest should be marked equivalent) ? Thanks, Prathamesh > >> But this now interferes with tail-call optimization, because it is not >> able to emit memcpy >> as tail-call anymore due to which the patch regressed 20050503-1.c. >> I am not sure how to workaround this. >> >> Thanks, >> Prathamesh >> > >> > Richard. >> > > -- > Richard Biener <rguent...@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)