gchatelet added a subscriber: tejohnson. gchatelet added a comment. In D61634#1500453 <https://reviews.llvm.org/D61634#1500453>, @efriedma wrote:
> My main blocker is that I want to make sure we're moving in the right > direction: towards LLVM IR with clear semantics, towards straightforward > rules for writing freestanding C code, and towards solutions which behave > appropriately for all targets. There's clearly a problem here that's worth > solving, but I want to make sure we're adding small features that interact > with existing features in an obvious way. The patch as written is adding a > new attribute that changes the semantics of C and LLVM IR in a subtle way > that interacts with existing optimizations and lowering in ways that are > complex and hard to understand. This makes a lot of sense, I'm totally on board to reduce entropy and confusion along the way. > I don't want to mix general restrictions on calling C library functions, with > restrictions that apply specifically to memcpy: clang generally works on the > assumption that a "memcpy" symbol is available in freestanding environments, > and we don't really want to change that. > > With -fno-builtin, specifically, we currently apply the restriction that > optimizations will not introduce memcpy calls that would not exist with > optimization disabled. This is sort of artificial, and and maybe a bit > confusing, but it seems to work well enough in practice. gcc does something > similar. > > I don't really want to add an attribute that has a different meaning from > -fno-builtin. An attribute that has the same meaning is a lot less > problematic: it reduces potential confusion, and solves a related problem > that -fno-builtin currently doesn't really work with LTO, because we don't > encode it into the IR. Adding @tejohnson to the conversation. IIUC you're offering to introduce something like `__attribute__((no-builtin-memcpy))` or `__attribute__((no-builtin("memcpy")))`. As attributes they would compose nicely with (Thin)LTO. I believe we still want to turn `-fno-builtin` flags into attributes so there's no impedance mismatch between the flag and the attribute right? > Your current patch is using the "AlwaysInline" flag for > SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline > implementation. In general, this flag is only supported for constant-size > memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not > have some special lowering, like the x86 "rep;movs", you'll hit an assertion > failure. And we don't really want to add an implementation of > variable-length memcpy for a lot of targets; it's very inefficient on targets > which don't have unaligned memory access. You could try to narrowly fix this > to only apply "AlwaysInline" if the size is a constant integer, but there's a > related problem: even if a memcpy is constant size in C, optimizations don't > promise to preserve that, in general. We could theoretically add such a > promise, I guess, but that's awkward: it would conditionally apply to > llvm.memcpy where the parameter is already const, so it's not something we > can easily verify. Fair enough. This patch was really to get the conversation started : I was myself not especially convinced about the approach. Hijacking the `AlwaysInline` parameter was a shortcut that would not work for other mem function anyways. > If we added a new builtin `llvm.memcpy.inline`, or something like that, the > end result ends up being simpler in many ways. It has obvious rules for both > generating it and lowering it, which don't depend on attributes in the parent > function. We could mark the size as "immarg", so we don't have to add > special optimization rules for it. And if we have it, we have a "complete" > solution for writing memcpy implementations in C: we make `__builtin_memcpy`, > or a new `__builtin_inline_memcpy`, produce it, and it can be combined with > -fno-builtin to ensure we don't produce any calls to the "real" memcpy. The > downside of a new builtin, of course, is that we now have two functions with > similar semantics, and potentially have to fix a bunch of places to check for > both of them. This was one of the approaches I envisioned, it's much cleaner and it's also a lot more work : ) I'm willing to go that route knowing that it would also work for @theraven's use case. > MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops > into llvm.memcpy). If we want it to perform some transform in circumstances > where the call "memcpy" isn't available, we have to make sure to restrict it > based on the target: in the worst case, on some targets without unaligned > memory access, it could just act as a low-quality loop unroller. This > applies no matter how the result is actually represented in IR. Yes if we have to generate loops it need to happen before SelectionDAG. ---- In this framework `-ffreestanding` stays as it is - namely it implies `-fno-builtin`. I still think that the semantic is somewhat surprising and unclear to the newcomer but I guess we can't do anything about it at this point - apart adding more documentation. Lastly, if we are to introduce new IR intrinsics, how about adding some for `memcmp` and `bcmp`? It does not have to be part of this patch but I think it's worth considering for consistency. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D61634/new/ https://reviews.llvm.org/D61634 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits