serge-sans-paille added a comment.

In D71374#1783032 <https://reviews.llvm.org/D71374#1783032>, @Jim wrote:

> I am curious what is difference of code generation after applying your changes?

Before, when compiling

  #define _GNU_SOURCE
  #include <string.h>

  void* foo(void* to, void* from, unsigned n) {
    return mempcpy(mempcpy(to, from, n), from, n);
  }

We get (clang -O3)

  define i8* @foo(i8*, i8*, i32) #0 {
    %4 = alloca i8*, align 8
    %5 = alloca i8*, align 8
    %6 = alloca i32, align 4
    store i8* %0, i8** %4, align 8
    store i8* %1, i8** %5, align 8
    store i32 %2, i32* %6, align 4
    %7 = load i8*, i8** %4, align 8
    %8 = load i8*, i8** %5, align 8
    %9 = load i32, i32* %6, align 4
    %10 = zext i32 %9 to i64
    %11 = call i8* @mempcpy(i8* %7, i8* %8, i64 %10) #2
    %12 = load i8*, i8** %5, align 8
    %13 = load i32, i32* %6, align 4
    %14 = zext i32 %13 to i64
    %15 = call i8* @mempcpy(i8* %11, i8* %12, i64 %14) #2
    ret i8* %15
  }

And we now get

  define dso_local i8* @foo(i8* %to, i8* nocapture readonly %from, i32 %n) local_unnamed_addr #0 {
  entry:
    %conv = zext i32 %n to i64
    tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %to, i8* align 1 %from, i64 %conv, i1 false)
    %0 = getelementptr i8, i8* %to, i64 %conv
    tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %0, i8* align 1 %from, i64 %conv, i1 false)
    %1 = getelementptr i8, i8* %0, i64 %conv
    ret i8* %1
  }

Which looks much better to me, especially as it unlocks memcpy-specific optimisations.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71374/new/

https://reviews.llvm.org/D71374

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
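For readers less used to IR, the new output in this comment can be read back as plain C: each `mempcpy` call becomes a `memcpy` followed by a pointer bump (the `getelementptr`). The sketch below is only an illustration of that reading, not the code from the patch; the names `mempcpy_lowered`, `foo_lowered` and `lowering_demo` are hypothetical, and only standard C is used so it does not depend on glibc's `mempcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: one mempcpy written the way the new IR spells it,
   i.e. a memcpy followed by pointer arithmetic (the getelementptr). */
void *mempcpy_lowered(void *to, const void *from, size_t n) {
    memcpy(to, from, n);
    return (char *)to + n; /* one past the last byte written */
}

/* Same shape as foo() in the comment, built from the lowered form. */
void *foo_lowered(void *to, const void *from, unsigned n) {
    return mempcpy_lowered(mempcpy_lowered(to, from, n), from, n);
}

/* Small self-check: copying 4 bytes twice fills 8 bytes and returns the
   end pointer. Returns 1 when the buffer holds the expected contents. */
int lowering_demo(void) {
    const char src[4] = {'a', 'b', 'c', 'd'};
    char buf[8] = {0};
    char *end = foo_lowered(buf, src, 4);
    assert(end == buf + 8);
    return memcmp(buf, "abcdabcd", 8) == 0;
}
```

Written this way, the copy itself is an ordinary `memcpy`, which is the form the memcpy-specific optimisations mentioned in the comment operate on.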