gchatelet added a comment.

In D61634#1493927 <https://reviews.llvm.org/D61634#1493927>, @efriedma wrote:

> I would be careful about trying to over-generalize here.  There are a few 
> different related bits of functionality which seem to be interesting, given 
> the discussion in the llvm-dev thread, here, and in related patches:


Thanks for the feedback @efriedma. I don't fully understand what you're
suggesting here, so I will try to reply inline.

> 1. The ability to specify -fno-builtin* on a per-function level, using 
> function attributes.

`-fno-builtin*` is about preventing clang/llvm from recognizing that a piece of
code has the same semantics as a particular IR intrinsic. It has nothing to do
with preventing the compiler from generating runtime calls.

- `-fno-builtin` is about the transformation from code to IR (frontend)
- The RFC is about the transformation from IR to runtime calls (backend)

> 2. Improved optimization when -fno-builtin-memcpy is specified.

I don't see this happening because if `-fno-builtin-memcpy` is used, clang
(frontend) might already have unrolled and vectorized the loop. It is then very
hard, by simply looking at the IR, to recognize that it's a `memcpy` and
generate good code (e.g. https://godbolt.org/z/JZ-mR0).
Here we really want the compiler to understand that we are copying memory (i.e.
this really has `@llvm.memcpy` semantics), but we want to prevent it from
calling the runtime.
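
To make the concern concrete, here is a minimal sketch of the kind of code in question. The function name `naive_copy` is made up for illustration. An optimizer that recognizes the loop as a memory copy may lower it to a call to the runtime `memcpy`; if this function is itself meant to be the implementation of `memcpy`, that call is exactly what has to be avoided. Whether a given compiler version and flag combination actually does this is not something the sketch asserts.

```c
#include <stddef.h>

/* Hypothetical name: a hand-written byte copy, as one might write when
   implementing memcpy directly in C. */
void *naive_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* Semantically this loop is a memory copy. An optimizer may recognize
       that and replace the loop with a call to the runtime memcpy. */
    for (size_t i = 0; i < n; ++i)
        d[i] = s[i];
    return dst;
}
```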

> 3. The ability to avoid calls to memcpy for certain C constructs which would 
> naturally be lowered to a memcpy call, like struct assignment of large 
> structs, or explicit calls to __builtin_memcpy().  Maybe also some 
> generalization of this involving other libc/libm/compiler-rt calls.

I believe very few people will use the attribute described in the RFC. It will
most probably be library maintainers who already know a good deal about how the
compiler is allowed to transform the code.
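
For reference, the constructs mentioned in point 3 look roughly like the following. This is only an illustrative sketch: the type `struct Big` and both function names are hypothetical, and `__builtin_memcpy` is a GCC/Clang extension. Either function may end up as a call to the runtime `memcpy`, depending on the compiler, the target and the optimization level.

```c
#include <stddef.h>

/* Hypothetical large struct, big enough that a compiler is unlikely to
   copy it with a handful of inline loads and stores. */
struct Big {
    char bytes[4096];
};

/* Struct assignment: no memcpy appears in the source, but the compiler
   may lower the copy to a memcpy call. */
void assign_struct(struct Big *dst, const struct Big *src) {
    *dst = *src;
}

/* Explicit builtin: asks for memcpy semantics directly. */
void explicit_builtin(struct Big *dst, const struct Big *src) {
    __builtin_memcpy(dst, src, sizeof *dst);
}
```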

> 4. The ability to force the compiler to generate "rep; movs" on x86 without 
> inline asm.

This is not strictly required; at least, it is not too useful for the
purpose of building memcpy functions (more on this a few lines below).

> It's not clear to me that all of this should be tied together.  In 
> particular, I'm not sure -fno-builtin-memcpy should imply the compiler never 
> generates a call to memcpy().

As a matter of fact, those are not tied together. There are different use cases
with different solutions. The one I'm focusing on here is about preventing the
compiler from synthesizing runtime calls, because we want to be able to
implement them directly in C / C++.
It is orthogonal to having the compiler recognize a piece of code as an IR
intrinsic.

> On recent x86 chips, you might be able to get away with unconditionally using 
> "rep movs", but generally an efficient memcpy for more than a few bytes is a 
> lot longer than one instruction, and is not something reasonable for the 
> compiler to synthesize inline.

Well, it depends. On Haswell and particularly Skylake it's hard to beat
`rep;movsb` for anything bigger than 1k, be it aligned or not.
On other architectures, and especially on the ones without ERMSB, you have
different strategies. Actually this is the very goal of this RFC: if you can
inline or use PGO you can do a much better job for small sizes than calling
libc's memcpy or inserting `rep;movsb`.
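
As a rough illustration of the "different strategies" point, the sketch below dispatches on the copy size. Everything in it is an assumption made for illustration: the names `sketch_memcpy` and `copy_large`, the 1024-byte threshold (loosely taken from the "1k" figure above) and the small-size strategy. It relies on GCC/Clang extensions (`__builtin_memcpy` and inline asm) and falls back to a byte loop on targets other than x86-64. It is not a tuned implementation.

```c
#include <stddef.h>

/* Hypothetical threshold, loosely based on the "1k" figure above. */
#define LARGE_COPY_THRESHOLD 1024

/* Large copies: use rep movsb on x86-64, a plain byte loop elsewhere. */
static void copy_large(char *dst, const char *src, size_t n) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rep movsb copies RCX bytes from [RSI] to [RDI]. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
#endif
}

/* Hypothetical name; the buffers must not overlap. */
void *sketch_memcpy(void *restrict dst_, const void *restrict src_, size_t n) {
    char *dst = dst_;
    const char *src = src_;
    if (n >= LARGE_COPY_THRESHOLD) {
        copy_large(dst, src, n);
    } else if (n >= 8) {
        /* Fixed-size 8-byte copies are typically expanded inline as a
           single load/store pair. */
        for (size_t i = 0; i + 8 <= n; i += 8)
            __builtin_memcpy(dst + i, src + i, 8);
        /* Copy the last 8 bytes again to cover the tail. This may overlap
           bytes already written, which is harmless. */
        __builtin_memcpy(dst + n - 8, src + n - 8, 8);
    } else {
        /* Fewer than 8 bytes: plain byte loop. */
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[i];
    }
    return dst_;
}
```

Note that nothing in this sketch stops the compiler from turning the byte loops, or even the fixed-size copies, back into calls to `memcpy`, which is the problem the RFC is trying to address.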

> If we're adding new IR attributes here, we should also consider the 
> interaction with LTO.

Yes, this is a very different story; that's why I'm not exploring this route.
It's quite possible that it would come with a high maintenance cost as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
