lxfind created this revision.
Herald added subscribers: ChuanqiXu, hoy, modimo, wenlei, hiraditya.
lxfind requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added projects: clang, LLVM.
A coroutine has the following structure in LLVM IR:

  entry:
    alloca ..
    %promise = alloca ...
    %0 = call token @llvm.coro.id(..., %promise
    %1 = call i1 @llvm.coro.alloc(token %0)
    br i1 %1, label %coro.alloc, label %coro.init

  coro.alloc:
    %2 = call i64 @llvm.coro.size.i64()
    %call = call noalias nonnull i8* @_Znwm(i64 %2)
    br label %coro.init

  coro.init:                               ; preds = %coro.alloc, %entry
    %3 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
    %4 = call i8* @llvm.coro.begin(token %0, i8* %3)
    ... move parameters to stack allocas ...
    ... create the promise object ...
    ... actual coroutine body ...

It uses coro.id to uniquely identify the coroutine (which also refers to the promise object), uses coro.alloc to decide whether to create the frame on the heap, and uses coro.begin to mark the frame object. After that, it always moves all parameters to the stack (stored in allocas) and creates the promise object by calling its constructor. Finally, it emits the actual coroutine body.

Having all of this in the same function creates problems: optimization passes blend the initialization code with the coroutine body and move code around, which later violates some of the requirements of coroutines. Two examples:

1. Frame objects accessed before coro.begin. coro.begin returns the frame pointer, that is, the frame is only ready after coro.begin. If a value is used across a coroutine suspension and needs to be put on the frame, it has to be accessed through the frame instead of through an alloca. This is easy if the value is first accessed after coro.begin: we can simply replace all its references with a pointer into the frame. However, if a value is accessed before coro.begin but also needs to live on the frame, we are in trouble. D66230 <https://reviews.llvm.org/D66230> made an initial attempt to fix this, but it wasn't complete. I made the fix more robust in D86859 <https://reviews.llvm.org/D86859>, which introduced a lot of complexity into AllocaUseVisitor. The basic idea there is to track every alloca use (both explicit and implicit through aliases) before coro.begin, and if an alloca is touched, copy it into the frame after coro.begin. This is, however, not bullet-proof: with complicated phi nodes we may end up having to copy every single alloca into the frame. This patch separates the code before coro.begin from the code after it, making it impossible for optimization passes to mix the two; there can be no complicated access to the frame before frame creation.

2. By-value parameters captured through MemCpyOptPass: https://bugs.llvm.org/show_bug.cgi?id=48857. To summarize the problem: in the coroutine IR, a first memcpy copies a passed-by-value parameter into a local alloca, and later (after a coroutine suspension) a second memcpy copies that local alloca into another local alloca. MemCpyOptPass merges the two and turns the second copy into a copy directly from the parameter into the second local alloca. This leads to a crash because the passed-by-value parameter pointer is dead after the coroutine suspension. This patch separates the parameter-copy code from the coroutine body, making this kind of optimization impossible.

Overall, we want to split the coroutine as much as possible, as early as possible, to avoid any violation of coroutine properties by optimization passes. To split the coroutine early, this patch splits it right after the parameter moves, during the CoroEarly pass. Everything before that point remains in the original function (called the init function), and the rest is moved into a new function (called the ramp function).
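To make problem 2 concrete, here is a minimal C++ sketch of the kind of source that produces the problematic IR pattern (the task and Payload types are hypothetical and only keep the example self-contained; the exact lowering depends on the target ABI):

  #include <coroutine>

  // Hypothetical minimal coroutine return type; not part of this patch.
  struct task {
    struct promise_type {
      task get_return_object() { return {}; }
      std::suspend_never initial_suspend() { return {}; }
      std::suspend_never final_suspend() noexcept { return {}; }
      void return_void() {}
      void unhandled_exception() {}
    };
  };

  // Large aggregate passed by value; typically lowered to an indirect
  // argument plus a parameter copy inside the coroutine.
  struct Payload { int data[16]; };

  task consume(Payload p) {
    co_await std::suspend_always{};  // original argument memory may be gone here
    Payload local = p;               // copy of the parameter copy; MemCpyOptPass may
                                     // try to forward it from the dead argument memory
    (void)local;
  }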
The split is done in three steps:

1. In CGCoroutine, we emit a few new intrinsic calls that CoroEarly can use to correctly split the function. First, the parameter moves should happen only once, in the init function. To achieve this, a new intrinsic coro.init() is introduced that returns a boolean: it is true in the init function and false in the ramp function, which lets us control the behavior difference between the two. Second, we need a marker that tells the CoroEarly pass that the init part is done and the rest belongs to the ramp function. This is achieved by a new intrinsic, coro.init.end(), which marks the splitting point for CoroEarly. Finally, every alloca that stores a parameter copy is annotated with metadata indicating that it is a parameter and will be used in the ramp function. The same is done for the promise object. These should be the only allocas that need to be used across the init and ramp functions. The metadata lets us properly tag these values during CoroEarly so that they are not DCE'd, even though they may not be used in the rest of the init function.

2. In the CoroEarly pass, if the coroutine is a switch-lowering coroutine (i.e. it has a coro.id), we split it. The split works like this: we first go through every alloca that carries the metadata and generate a call to a new intrinsic, coro.frame.get(), which returns the pointer into the frame for that specific alloca. It captures the alloca, indicates whether it is the promise, and carries a unique slot ID. All uses of the alloca are then replaced with the value returned by coro.frame.get(). Next, the function is cloned into a new function (the ramp function), which takes a single parameter, the coroutine frame. We then walk all intrinsics in the new function, replacing coro.begin with the parameter and replacing both coro.init() and coro.alloc() with false (only the init function needs to allocate the frame). Finally, we walk all intrinsics in the init function, replace coro.init() with true, and generate a call to the ramp function at the location of coro.init.end(), removing the rest of the function.

3. In the CoroSplit pass, the idea is to inline the ramp function back into the init function so that we can reuse the existing CoroSplit logic. To do so, we introduce a few more CORO_PRESPLIT_ATTR values to tag the different states of the init and ramp functions. When the ramp function is ready to split and we are processing the init function, we inline the ramp function into the init function, replacing every coro.frame.get() with the original alloca and setting the promise field of coro.id to the promise object. After inlining, we update the CGSCC and delete the ramp function.

Examples:

Coroutine IR emitted from the Clang front-end will look like this:

  define @foo(i32 %val...)
  entry:
    %val1 = alloca i32, align 4, !coroutine_frame_alloca !2
    ...
    %promise = alloca... !coroutine_frame_alloca !5
    %0 = call token @llvm.coro.id(..., null...
    %1 = call i1 @llvm.coro.alloc(token %0)
    br i1 %1, label %coro.alloc, label %coro.begin

  coro.alloc:
    %2 = call i64 @llvm.coro.size.i64()
    %call = call noalias nonnull i8* @_Znwm(i64 %2)
    br label %coro.begin

  coro.begin:                              ; preds = %coro.alloc, %entry
    %3 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
    %4 = call i8* @llvm.coro.begin(token %0, i8* %3)
    %5 = call i1 @llvm.coro.init()
    br i1 %5, label %coro.init, label %coro.init.ready

  coro.init:                               ; preds = %coro.begin
    %6 = bitcast i32* %val1 to i8*
    %7 = load i32, i32* %val.addr, align 4
    store i32 %7, i32* %val1, align 4
    ...
    call void @llvm.coro.init.end()
    br label %coro.init.ready

  coro.init.ready:
    ...
  ...
  !2 = !{i1 false, i32 0}
  !5 = !{i1 true, i32 3}
  ...

CoroEarly will then split this into two functions:

  define @foo(i32 %val...)
  entry:
    %val1 = alloca i32, align 4
    ...
    %promise = alloca...
    %0 = call token @llvm.coro.id(..., null...
    %1 = call i1 @llvm.coro.alloc(token %0)
    br i1 %1, label %coro.alloc, label %coro.begin

  coro.alloc:
    %2 = call i64 @llvm.coro.size.i64()
    %call = call noalias nonnull i8* @_Znwm(i64 %2)
    br label %coro.begin

  coro.begin:                              ; preds = %coro.alloc, %entry
    %3 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
    %4 = call i8* @llvm.coro.begin(token %0, i8* %3)
    %5 = bitcast i32* %val1 to i8*
    %6 = call i8* @llvm.coro.frame.get(i8* %4, i8* %5, i1 false, i32 0)
    %7 = bitcast i8* %6 to i32*
    ...
    br label %coro.init

  coro.init:                               ; preds = %coro.begin
    %17 = bitcast i32* %7 to i8*
    %18 = load i32, i32* %val.addr, align 4
    store i32 %18, i32* %7, align 4
    ...
    call void @_Z1fi8MoveOnly11MoveAndCopy.ramp(i8* %4)
    ret void

  define @foo.ramp(i8* %0)
  entry:
    %val1 = alloca i32, align 4
    ...
    %promise = alloca...
    %1 = call token @llvm.coro.id(..., null...
    br label %coro.begin

  coro.begin:
    %2 = bitcast i32* %val1 to i8*
    %3 = call i8* @llvm.coro.frame.get(i8* %0, i8* %2, i1 false, i32 0)
    %4 = bitcast i8* %3 to i32*
    br label %coro.init.ready

  coro.init.ready:
    ...

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D100415

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroEarly.cpp
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
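For reference while reading the diff below, the CORO_PRESPLIT_ATTR states used by the new flow can be summarized as follows. The enum is only an illustrative sketch (it is not part of the patch); the passes keep these values as strings on the "coroutine.presplit" function attribute, as defined in CoroInternal.h:

  // Sketch of the CORO_PRESPLIT_ATTR state values after this patch.
  enum class CoroPresplitState {
    DoNotProcess = 0,           // "0": set by CoroEarly on the init function, and by
                                //      CoroSplit on a ramp function it has already
                                //      scheduled; CoroSplit skips such functions.
    PreparedForSplit = 1,       // "1": pre-existing value, kept for the legacy pipeline.
    AsyncRestartAfterSplit = 2, // "2": pre-existing async-lowering value.
    UnpreparedForSplitRamp = 3, // "3": set by CoroEarly on the new ramp function.
    PreparedForSplitInit = 4,   // "4": set by CoroSplit on the init function once the
                                //      ramp has been seen; the ramp is then inlined
                                //      back and the coroutine is split.
  };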
Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===================================================================
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -2049,6 +2049,74 @@
     Fns.push_back(PrepareFn);
 }
 
+static Function *getCoroInitFunction(Function &RampFunc) {
+  StringRef RampName = RampFunc.getName();
+  assert(RampName.endswith(".ramp") && "Ramp function must ends with .ramp");
+  StringRef InitName = RampName.substr(0, RampName.size() - 5);
+  return RampFunc.getParent()->getFunction(InitName);
+}
+
+static Function *inlineRampFunction(Function &F) {
+  CallInst *RampCall = cast<CallInst>(
+      &*llvm::find_if(instructions(F), [&](const Instruction &I) {
+        if (const CallInst *CI = dyn_cast<CallInst>(&I))
+          return CI->getCalledFunction()->getName().startswith(F.getName());
+        return false;
+      }));
+  InlineFunctionInfo IFI;
+  InlineFunction(*RampCall, IFI);
+
+  SmallVector<IntrinsicInst *, 2> CoroIds;
+  CoroBeginInst *CoroBegin = nullptr;
+  SmallVector<IntrinsicInst *, 8> CoroFrameGets;
+  for (Instruction &I : instructions(F)) {
+    auto *II = dyn_cast<IntrinsicInst>(&I);
+    if (!II)
+      continue;
+    switch (II->getIntrinsicID()) {
+    default:
+      break;
+    case Intrinsic::coro_id:
+      CoroIds.push_back(II);
+      break;
+    case Intrinsic::coro_begin:
+      CoroBegin = cast<CoroBeginInst>(II);
+      break;
+    case Intrinsic::coro_frame_get:
+      CoroFrameGets.push_back(II);
+      break;
+    }
+  }
+  assert(CoroIds.size() == 2 && "There must be two coro.id calls, from the "
+                                "init function and ramp function respectively");
+  CoroIdInst *RealId = cast<CoroIdInst>(CoroBegin->getId());
+  for (IntrinsicInst *I : CoroIds)
+    if (I != RealId)
+      I->replaceAllUsesWith(RealId);
+  DenseMap<uint32_t, Instruction *> FrameSlotMap;
+  for (IntrinsicInst *FrameGet : CoroFrameGets) {
+    bool IsPromise = cast<ConstantInt>(FrameGet->getOperand(2))->getZExtValue();
+    uint32_t SlotID =
+        cast<ConstantInt>(FrameGet->getOperand(3))->getZExtValue();
+    auto Itr = FrameSlotMap.find(SlotID);
+    Instruction *Ptr;
+    if (Itr == FrameSlotMap.end()) {
+      Ptr = cast<Instruction>(FrameGet->getOperand(1));
+      FrameSlotMap[SlotID] = Ptr;
+    } else {
+      Ptr = Itr->second;
+    }
+    FrameGet->replaceAllUsesWith(Ptr);
+    FrameGet->eraseFromParent();
+    if (IsPromise) {
+      RealId->setOperand(1, new BitCastInst(Ptr->stripPointerCasts(),
+                                            Ptr->getType(), "", RealId));
+    }
+  }
+
+  return RampCall->getCalledFunction();
+}
+
 PreservedAnalyses CoroSplitPass::run(LazyCallGraph::SCC &C,
                                      CGSCCAnalysisManager &AM,
                                      LazyCallGraph &CG, CGSCCUpdateResult &UR) {
@@ -2082,6 +2150,8 @@
     }
   }
 
+  SmallVector<Function *, 1> UnpreparedInitFuncs;
+  SmallVector<Function *, 1> InlinedRampFuncs;
   // Split all the coroutines.
   for (LazyCallGraph::Node *N : Coroutines) {
     Function &F = N->getFunction();
@@ -2089,12 +2159,24 @@
     StringRef Value = Attr.getValueAsString();
     LLVM_DEBUG(dbgs() << "CoroSplit: Processing coroutine '" << F.getName()
                       << "' state: " << Value << "\n");
-    if (Value == UNPREPARED_FOR_SPLIT) {
+    if (Value == DO_NOT_PROCESS)
+      continue;
+    if (Value == UNPREPARED_FOR_SPLIT_RAMP) {
       // Enqueue a second iteration of the CGSCC pipeline on this SCC.
       UR.CWorklist.insert(&C);
-      F.addFnAttr(CORO_PRESPLIT_ATTR, PREPARED_FOR_SPLIT);
+      // Once we allow the ramp function to be optimized, we will split
+      // the init function directly and ignore the ramp function.
+      F.addFnAttr(CORO_PRESPLIT_ATTR, DO_NOT_PROCESS);
+      UnpreparedInitFuncs.push_back(getCoroInitFunction(F));
       continue;
     }
+    if (Value == PREPARED_FOR_SPLIT_INIT) {
+      Function *RampFunc = inlineRampFunction(F);
+      InlinedRampFuncs.push_back(RampFunc);
+      RampFunc->removeDeadConstantUsers();
+      RampFunc->dropAllReferences();
+      updateCGAndAnalysisManagerForCGSCCPass(CG, C, *N, AM, UR, FAM);
+    }
     F.removeFnAttr(CORO_PRESPLIT_ATTR);
 
     SmallVector<Function *, 4> Clones;
@@ -2109,6 +2191,23 @@
       UR.RCWorklist.insert(CG.lookupRefSCC(CG.get(*Clones[0])));
     }
   }
+  for (Function *F : UnpreparedInitFuncs)
+    F->addFnAttr(CORO_PRESPLIT_ATTR, PREPARED_FOR_SPLIT_INIT);
+  for (Function *DeadF : InlinedRampFuncs) {
+    auto &DeadC = *CG.lookupSCC(*CG.lookup(*DeadF));
+    FAM.clear(*DeadF, DeadF->getName());
+    AM.clear(DeadC, DeadC.getName());
+    auto &DeadRC = DeadC.getOuterRefSCC();
+    CG.removeDeadFunction(*DeadF);
+
+    // Mark the relevant parts of the call graph as invalid so we don't visit
+    // them.
+    UR.InvalidatedSCCs.insert(&DeadC);
+    UR.InvalidatedRefSCCs.insert(&DeadRC);
+
+    DeadF->getBasicBlockList().clear();
+    M.getFunctionList().remove(DeadF);
+  }
 
   if (!PrepareFns.empty()) {
     for (auto *PrepareFn : PrepareFns) {
@@ -2179,6 +2278,7 @@
   createDevirtTriggerFunc(CG, SCC);
 
   // Split all the coroutines.
+  // FIXME: adapt to the new split model
   for (Function *F : Coroutines) {
     Attribute Attr = F->getFnAttribute(CORO_PRESPLIT_ATTR);
     StringRef Value = Attr.getValueAsString();
@@ -2190,7 +2290,7 @@
       F->removeFnAttr(CORO_PRESPLIT_ATTR);
       continue;
     }
-    if (Value == UNPREPARED_FOR_SPLIT) {
+    if (Value == UNPREPARED_FOR_SPLIT_RAMP) {
       prepareForSplit(*F, CG);
       continue;
     }
Index: llvm/lib/Transforms/Coroutines/CoroInternal.h
===================================================================
--- llvm/lib/Transforms/Coroutines/CoroInternal.h
+++ llvm/lib/Transforms/Coroutines/CoroInternal.h
@@ -37,9 +37,11 @@
 // Async lowering similarily triggers a restart of the pipeline after it has
 // split the coroutine.
 #define CORO_PRESPLIT_ATTR "coroutine.presplit"
-#define UNPREPARED_FOR_SPLIT "0"
+#define DO_NOT_PROCESS "0"
 #define PREPARED_FOR_SPLIT "1"
 #define ASYNC_RESTART_AFTER_SPLIT "2"
+#define UNPREPARED_FOR_SPLIT_RAMP "3"
+#define PREPARED_FOR_SPLIT_INIT "4"
 
 #define CORO_DEVIRT_TRIGGER_FN "coro.devirt.trigger"
Index: llvm/lib/Transforms/Coroutines/CoroEarly.cpp
===================================================================
--- llvm/lib/Transforms/Coroutines/CoroEarly.cpp
+++ llvm/lib/Transforms/Coroutines/CoroEarly.cpp
@@ -8,10 +8,15 @@
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
 #include "CoroInternal.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/IR/Dominators.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Module.h"
+#include "llvm/IR/Type.h"
 #include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Cloning.h"
+#include "llvm/Transforms/Utils/Local.h"
 
 using namespace llvm;
 
@@ -145,6 +150,121 @@
   CB->setCannotDuplicate();
 }
 
+static void splitRampFunction(Function &F) {
+  Module *M = F.getParent();
+  LLVMContext &C = M->getContext();
+  {
+    CoroBeginInst *CoroBegin = cast<CoroBeginInst>(
+        &*llvm::find_if(instructions(F),
+                        [](Instruction &I) { return isa<CoroBeginInst>(&I); }));
+    Instruction *InsertPoint = CoroBegin->getNextNode();
+
+    for (Instruction &I : make_early_inc_range(instructions(F))) {
+      auto *AI = dyn_cast<AllocaInst>(&I);
+      if (!AI)
+        continue;
+      auto *MD = AI->getMetadata("coroutine_frame_alloca");
+      if (!MD)
+        continue;
+
+      auto *IsPromise = cast<ConstantAsMetadata>(MD->getOperand(0))->getValue();
+      auto *SlotID = cast<ConstantAsMetadata>(MD->getOperand(1))->getValue();
+      auto *VoidPt =
+          new BitCastInst(AI, llvm::Type::getInt8PtrTy(C), "", InsertPoint);
+      auto *FrameGet = CallInst::Create(
+          Intrinsic::getDeclaration(M, Intrinsic::coro_frame_get),
+          {CoroBegin, VoidPt, IsPromise, SlotID}, "", InsertPoint);
+      auto *NewPtr = new BitCastInst(FrameGet, AI->getType(), "", InsertPoint);
+      AI->replaceUsesWithIf(NewPtr,
+                            [&](Use &U) { return U.getUser() != VoidPt; });
+    }
+  }
+
+  Function *NewF;
+  {
+    // Create the split ramp function, and clone.
+    llvm::Type *NewFArgTypes[] = {llvm::Type::getInt8PtrTy(C)};
+    auto newFuncType =
+        FunctionType::get(F.getReturnType(), NewFArgTypes, false);
+    NewF = Function::Create(newFuncType,
+                            GlobalValue::LinkageTypes::ExternalLinkage,
+                            F.getName() + ".ramp");
+    NewF->addFnAttr(Attribute::NoInline);
+    M->getFunctionList().push_back(NewF);
+    ValueToValueMapTy VMap;
+    for (Argument &A : F.args())
+      VMap[&A] = UndefValue::get(A.getType());
+    SmallVector<ReturnInst *, 4> Returns;
+    CloneFunctionInto(NewF, &F, VMap, CloneFunctionChangeType::LocalChangesOnly,
+                      Returns);
+  }
+
+  {
+    // Process the init function.
+    IntrinsicInst *CoroBegin = nullptr;
+    IntrinsicInst *CoroInitEnd = nullptr;
+    for (Instruction &I : make_early_inc_range(instructions(F))) {
+      auto *II = dyn_cast<IntrinsicInst>(&I);
+      if (!II)
+        continue;
+      switch (II->getIntrinsicID()) {
+      default:
+        break;
+      case Intrinsic::coro_begin:
+        CoroBegin = II;
+        break;
+      case Intrinsic::coro_init:
+        II->replaceAllUsesWith(
+            llvm::ConstantInt::get(llvm::Type::getInt1Ty(C), 1));
+        II->eraseFromParent();
+        break;
+      case Intrinsic::coro_init_end:
+        CoroInitEnd = II;
+        break;
+      }
+    }
+    assert(CoroInitEnd->getNextNode() ==
+               CoroInitEnd->getParent()->getTerminator() &&
+           "coro.init.end call should be at the end of the init block");
+    CoroInitEnd->getNextNode()->eraseFromParent();
+    CallInst *Ret = CallInst::Create(NewF, {CoroBegin}, "", CoroInitEnd);
+    if (F.getReturnType()->isVoidTy())
+      ReturnInst::Create(C, nullptr, CoroInitEnd);
+    else
+      ReturnInst::Create(C, Ret, CoroInitEnd);
+    CoroInitEnd->eraseFromParent();
+    removeUnreachableBlocks(F);
+    F.addFnAttr(CORO_PRESPLIT_ATTR, DO_NOT_PROCESS);
+  }
+
+  {
+    // Process the ramp function.
+    for (Instruction &I : make_early_inc_range(instructions(*NewF))) {
+      auto *II = dyn_cast<IntrinsicInst>(&I);
+      if (!II)
+        continue;
+      switch (II->getIntrinsicID()) {
+      default:
+        continue;
+      case Intrinsic::coro_begin:
+        II->replaceAllUsesWith(NewF->getArg(0));
+        break;
+      case Intrinsic::coro_init:
+        II->replaceAllUsesWith(
+            llvm::ConstantInt::get(llvm::Type::getInt1Ty(C), 0));
+        break;
+      case Intrinsic::coro_alloc:
+        II->replaceAllUsesWith(
+            llvm::ConstantInt::get(llvm::Type::getInt1Ty(C), 0));
+        break;
+      }
+      II->eraseFromParent();
+    }
+    removeUnreachableBlocks(*NewF);
+    NewF->addFnAttr(CORO_PRESPLIT_ATTR, UNPREPARED_FOR_SPLIT_RAMP);
+  }
+}
+
 bool Lowerer::lowerEarlyIntrinsics(Function &F) {
   bool Changed = false;
   CoroIdInst *CoroId = nullptr;
@@ -179,7 +299,6 @@
         // with a coroutine attribute.
         if (auto *CII = cast<CoroIdInst>(&I)) {
           if (CII->getInfo().isPreSplit()) {
-            F.addFnAttr(CORO_PRESPLIT_ATTR, UNPREPARED_FOR_SPLIT);
             setCannotDuplicate(CII);
             CII->setCoroutineSelf();
             CoroId = cast<CoroIdInst>(&I);
@@ -210,9 +329,11 @@
   // Make sure that all CoroFree reference the coro.id intrinsic.
   // Token type is not exposed through coroutine C/C++ builtins to plain C, so
   // we allow specifying none and fixing it up here.
-  if (CoroId)
+  if (CoroId) {
     for (CoroFreeInst *CF : CoroFrees)
       CF->setArgOperand(0, CoroId);
+    splitRampFunction(F);
+  }
 
   return Changed;
 }
@@ -226,6 +347,10 @@
 }
 
 PreservedAnalyses CoroEarlyPass::run(Function &F, FunctionAnalysisManager &) {
+  if (F.getFnAttribute(CORO_PRESPLIT_ATTR).getValueAsString() ==
+      UNPREPARED_FOR_SPLIT_RAMP)
+    return PreservedAnalyses::all();
+
   Module &M = *F.getParent();
   if (!declaresCoroEarlyIntrinsics(M) || !Lowerer(M).lowerEarlyIntrinsics(F))
     return PreservedAnalyses::all();
Index: llvm/include/llvm/IR/Intrinsics.td
===================================================================
--- llvm/include/llvm/IR/Intrinsics.td
+++ llvm/include/llvm/IR/Intrinsics.td
@@ -1274,6 +1274,12 @@
                       ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>]>;
 
+def int_coro_frame_get : Intrinsic<[llvm_ptr_ty],
+    [llvm_ptr_ty, llvm_ptr_ty, llvm_i1_ty, llvm_i32_ty],
+    [IntrNoMem]>;
+def int_coro_init: Intrinsic<[llvm_i1_ty], [], []>;
+def int_coro_init_end: Intrinsic<[], [], []>;
+
 ///===-------------------------- Other Intrinsics --------------------------===//
 //
 def int_trap : Intrinsic<[], [], [IntrNoReturn, IntrCold]>,
Index: clang/lib/CodeGen/CGCoroutine.cpp
===================================================================
--- clang/lib/CodeGen/CGCoroutine.cpp
+++ clang/lib/CodeGen/CGCoroutine.cpp
@@ -547,7 +547,7 @@
 
   auto *EntryBB = Builder.GetInsertBlock();
   auto *AllocBB = createBasicBlock("coro.alloc");
-  auto *InitBB = createBasicBlock("coro.init");
+  auto *BeginBB = createBasicBlock("coro.begin");
   auto *FinalBB = createBasicBlock("coro.final");
   auto *RetBB = createBasicBlock("coro.ret");
 
@@ -564,7 +564,7 @@
   auto *CoroAlloc = Builder.CreateCall(
       CGM.getIntrinsic(llvm::Intrinsic::coro_alloc), {CoroId});
-  Builder.CreateCondBr(CoroAlloc, AllocBB, InitBB);
+  Builder.CreateCondBr(CoroAlloc, AllocBB, BeginBB);
 
   EmitBlock(AllocBB);
   auto *AllocateCall = EmitScalarExpr(S.getAllocate());
@@ -577,17 +577,17 @@
     // See if allocation was successful.
     auto *NullPtr = llvm::ConstantPointerNull::get(Int8PtrTy);
     auto *Cond = Builder.CreateICmpNE(AllocateCall, NullPtr);
-    Builder.CreateCondBr(Cond, InitBB, RetOnFailureBB);
+    Builder.CreateCondBr(Cond, BeginBB, RetOnFailureBB);
 
     // If not, return OnAllocFailure object.
     EmitBlock(RetOnFailureBB);
     EmitStmt(RetOnAllocFailure);
   } else {
-    Builder.CreateBr(InitBB);
+    Builder.CreateBr(BeginBB);
   }
 
-  EmitBlock(InitBB);
+  EmitBlock(BeginBB);
 
   // Pass the result of the allocation to coro.begin.
   auto *Phi = Builder.CreatePHI(VoidPtrTy, 2);
@@ -606,12 +606,36 @@
     CodeGenFunction::RunCleanupsScope ResumeScope(*this);
     EHStack.pushCleanup<CallCoroDelete>(NormalAndEHCleanup, S.getDeallocate());
 
+    // Wrap around the parameter copy with a coro.init() check.
+    // This will allows us to perform parameter copy in the init function, but
+    // not in the ramp function.
+    auto *InitBB = createBasicBlock("coro.init");
+    auto *InitReadyBB = createBasicBlock("coro.init.ready");
+    auto *CoroInit =
+        Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::coro_init));
+    Builder.CreateCondBr(CoroInit, InitBB, InitReadyBB);
+
+    EmitBlock(InitBB);
+    SmallVector<llvm::AllocaInst *, 4> FrameAllocas;
     // Create parameter copies. We do it before creating a promise, since an
     // evolution of coroutine TS may allow promise constructor to observe
    // parameter copies.
+    int ID = 0;
     for (auto *PM : S.getParamMoves()) {
       EmitStmt(PM);
       ParamReplacer.addCopy(cast<DeclStmt>(PM));
+      llvm::AllocaInst *Alloca = cast<llvm::AllocaInst>(
+          GetAddrOfLocalVar(cast<VarDecl>(cast<DeclStmt>(PM)->getSingleDecl()))
+              .getPointer());
+      Alloca->setMetadata(
+          "coroutine_frame_alloca",
+          llvm::MDNode::get(
+              getLLVMContext(),
+              {
+                  llvm::ConstantAsMetadata::get(
+                      Builder.getInt1(false)) /*IsPromise*/,
+                  llvm::ConstantAsMetadata::get(Builder.getInt32(ID++)),
+              }));
       // TODO: if(CoroParam(...)) need to surround ctor and dtor
       // for the copy, so that llvm can elide it if the copy is
       // not needed.
@@ -619,12 +643,23 @@
 
     EmitStmt(S.getPromiseDeclStmt());
 
+    Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::coro_init_end));
+    Builder.CreateBr(InitReadyBB);
+    EmitBlock(InitReadyBB);
+
     Address PromiseAddr = GetAddrOfLocalVar(S.getPromiseDecl());
-    auto *PromiseAddrVoidPtr =
-        new llvm::BitCastInst(PromiseAddr.getPointer(), VoidPtrTy, "", CoroId);
-    // Update CoroId to refer to the promise. We could not do it earlier because
-    // promise local variable was not emitted yet.
-    CoroId->setArgOperand(1, PromiseAddrVoidPtr);
+    llvm::AllocaInst *PromiseAlloca =
+        cast<llvm::AllocaInst>(PromiseAddr.getPointer());
+
+    PromiseAlloca->setMetadata(
+        "coroutine_frame_alloca",
+        llvm::MDNode::get(
+            getLLVMContext(),
+            {
+                llvm::ConstantAsMetadata::get(
+                    Builder.getInt1(true)) /*IsPromise*/,
+                llvm::ConstantAsMetadata::get(Builder.getInt32(ID++)),
+            }));
 
     // Now we have the promise, initialize the GRO
     GroManager.EmitGroInit();
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits