Second, it seems that by design, LTO prefers builtins to user-provided
versions of them.  In particular, lto_symtab_prevailing_decl() stipulates
that builtins are their own prevailing decl.  So even if we lowered TM
before LTO streaming, user-provided builtins wouldn't be preferred (and thus
inlined) as we would expect into application code.

Hmm, so you say you have something like

void *memcpy(void *dst, void *src, size_t n) { ...implementation... }
void foo()
{
   memcpy (...);
}

and expect it to be inlined from the supplied body instead of using the
builtin expander?

Yes.  Ultimately we want to do exactly that with TM instrumented code.

I think we could make this work ... at least under a sort-of ODR, that
all bodies (from different TUs) and the builtin have the same behavior.
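To make the "same behavior" condition concrete, here is a minimal sketch of a user-provided memcpy that copies exactly like the builtin and only adds a counter so one can tell whether the body ran. The names user_memcpy_calls and probe are hypothetical and appear nowhere in GCC or in the test case in this thread. Defining memcpy yourself is formally outside what the C standard allows in a hosted program, so treat this as an experiment, not a recommendation.

```c
#include <stddef.h>

/* Hypothetical counter: bumped only when the user-provided body runs. */
volatile int user_memcpy_calls;

/* User-provided memcpy that copies bytes like the builtin would.
   The volatile accesses are there so the compiler does not recognize the
   loop as a memcpy idiom and turn it back into a call to memcpy. */
void *memcpy(void *dst, const void *src, size_t n)
{
  volatile unsigned char *d = (volatile unsigned char *) dst;
  const volatile unsigned char *s = (const volatile unsigned char *) src;

  user_memcpy_calls++;
  while (n--)
    *d++ = *s++;
  return dst;
}

static unsigned char from[123], to[123];

/* Copy 123 bytes and return 1 if the destination matches the source.
   The result is the same whether the builtin or the body above is used. */
int probe(void)
{
  size_t i;

  for (i = 0; i < sizeof from; i++)
    from[i] = (unsigned char) (i + 1);
  memcpy(to, from, sizeof from);
  for (i = 0; i < sizeof to; i++)
    if (to[i] != from[i])
      return 0;
  return 1;
}
```

Calling probe() from main and then looking at user_memcpy_calls should hint at which version prevailed: the copy is correct either way, but a count that stays at zero would suggest the builtin expansion was used instead of the supplied body. Moving the memcpy definition into its own file and comparing builds with and without -flto would be the interesting experiment.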

Mind filing an enhancement bug?  Does it work without LTO?

Without LTO the memcpy gets inlined correctly.  This is what I am using:

houston:/build/t/gcc$ cat a.c
char *dst, *src;

void *memcpy(void *, const void *, __SIZE_TYPE__);

main()
{
  memcpy(dst, src, 123);
}
houston:/build/t/gcc$ cat b.c
extern int putchar(int);

void *memcpy(void *dst,
             const void *src,
             __SIZE_TYPE__ n)
{
  putchar(13);
}
houston:/build/t/gcc$ ./xgcc -B./ -flto -O3 a.c b.c -save-temps -o a.out

However, with LTO, somewhere around constant propagation (ccp2), we decide the call to memcpy is no longer needed and remove it altogether.  So it looks like the builtin was preferred over the user-provided body.

I will file an enhancement PR with the above example.

Thanks for looking into this.
