ChuanqiXu added a comment. In D97915#2860916 <https://reviews.llvm.org/D97915#2860916>, @ychen wrote:
> In D97915#2859237 <https://reviews.llvm.org/D97915#2859237>, @ChuanqiXu wrote: > >> In D97915#2848816 <https://reviews.llvm.org/D97915#2848816>, @ychen wrote: >> >>>> Thanks for clarifying. Let's solve the semantics problem first. >>>> With the introduction about 'raw frame', I think it's necessary to >>>> introduce this concept in the section 'Switched-Resume Lowering' or even >>>> the section 'Introduction' in the document. Add a section to tell the >>>> terminology is satisfied too. >>> >>> Done. >>> >>>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and >>>> 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same >>>> value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying >>>> to solve the problem about memory leak. But I think we could use >>>> llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame >>>> (Maybe we need to add an intrinsic `llvm.coro.raw.size`). Then we can >>>> omit a field in the frame to save space. >>> >>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address >>> instead of raw frame pointer) >>> >>> Apologies for the confusion. I've briefly explained it here >>> https://reviews.llvm.org/D102145#2752445 I think it is not clear. >>> "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine >>> frame field storing the `raw frame pointer`" only after `insertSpills` in >>> CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an >>> alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in >>> this review page). Using "llvm.coro.raw.frame.ptr.offset" instead of >>> "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please >>> check line 31. The downside is that the write to coroutine frame is not >>> through an alloca but a direct write. It is unusual because all fields in >>> the frame are stored as 1. special/header fields 2. alloca 3. splills. >>> Doing the write indirectly as Alloca makes me comfortable. The tradeoff is >>> one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think? >>> >>> 19 coro.alloc.align: ; preds = >>> %coro.alloc.check.align >>> 20 %3 = sub nsw i64 64, 16 >>> 21 %4 = add i64 128, %3 >>> 22 %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13 >>> 23 %mask = sub i64 64, 1 >>> 24 %intptr = ptrtoint i8* %call1 to i64 >>> 25 %over_boundary = add i64 %intptr, %mask >>> 26 %inverted_mask = xor i64 %mask, -1 >>> 27 %aligned_intptr = and i64 %over_boundary, %inverted_mask >>> 28 %diff = sub i64 %aligned_intptr, %intptr >>> 29 %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff >>> 30 call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 >>> 64) ] >>> 31 store i8* %call1, i8** %alloc.frame.ptr, align 8 >>> >>> ; Replace line 31 with below, and must makes sure line 46~line 48 is >>> skipped. >>> ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32() >>> ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff >>> ; %addr1 = bitcast i8* %addr to i8** >>> ; store i8* %call1, i8** %addr1, align 8 >>> >>> >>> 32 br label %coro.init.from.coro.alloc.align >>> 33 >>> 34 coro.init.from.coro.alloc.align: ; preds = >>> %coro.alloc.align >>> 35 %aligned_result.coro.init = phi i8* [ %aligned_result, >>> %coro.alloc.align ] >>> 36 br label %coro.init >>> 37 >>> 38 coro.init: ; preds = >>> %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor >>> o.init.from.coro.alloc >>> 39 %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ >>> %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result >>> .coro.init, %coro.init.from.coro.alloc.align ] >>> 40 %FramePtr = bitcast i8* %5 to %f0.Frame* >>> 41 %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>> %FramePtr, i32 0, i32 0 >>> 42 store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** >>> %resume.addr, align 8 >>> 43 %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void >>> (%f0.Frame*)* @f0.cleanup >>> 44 %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>> %FramePtr, i32 0, i32 1 >>> 45 store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, >>> align 8 >>> 46 %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, >>> i32 2 >>> 47 %8 = load i8*, i8** %alloc.frame.ptr, align 8 >>> 48 store i8* %8, i8** %7, align 8 >>> 49 br label %AllocaSpillBB >>> 50 >>> 51 AllocaSpillBB: ; preds = %coro.init >>> 52 %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>> %FramePtr, i32 0, i32 4 >>> 53 %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>> %FramePtr, i32 0, i32 5 >>> 54 %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>> %FramePtr, i32 0, i32 6 >>> 55 %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>> %FramePtr, i32 0, i32 7 >>> 56 %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* >>> %FramePtr, i32 0, i32 8 >>> 57 %__promise.reload.addr = getelementptr inbounds %f0.Frame, >>> %f0.Frame* %FramePtr, i32 0, i32 10 >>> 58 br label %PostSpill >>> >>> >>> >>> >>>> Then I am a little confused for the design again, since we would treat the >>>> value for CoroBegin as the address of coroutine frame in the past and it >>>> looks like to be the raw frame now. Let me reconsider if it is OK. >>> >>> The returned value of CoroBegin is still coroutine frame not a raw frame >>> even if the frame is overaligned. You could check the above code. >> >> Thanks for clarifying! >> >> I don't understand why we need to store the address for coroutine raw frame >> in the coroutine frame. For example, `%call1` in your example marks the >> address for the raw frame. Then can we use the value `%call1` in every place >> where we want to use the address for coroutine frame? >> If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' in >> the frontend if we need to use the address for the raw frame. Then in the >> middle end, we could replace `llvm.coro.raw.frame` with `%call1` simply. >> Similarly, we could define intrinsic `llvm.coro.raw.frame.size`. As far as I >> know from the codes, the address for the coroutine frame is mainly used for >> deallocation. So it should be fine I guess. >> >> --- >> >> Then the code generated now looks roughly like: >> >> if (should over align) { >> /// ... >> mem = ... >> } else { >> /// ... >> mem = ... >> } >> coro.begin(id, mem); >> >> It looks redundant since the `then` part and `else` part looks very >> similar. I understand it would be eliminated in the middle end. But another >> problem is that the redundant implementation in clang. Maybe we could solve >> it by refactoring. >> But I am wondering if it is possible to use another pattern (assume >> `llvm.coro.alloc` returns true): >> >> %raw.frame.ptr = new(call @llvm.coro.raw.frame.size()) >> %true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we >> need a better name >> call @llvm.coro.begin(coro.id, %true.frame.ptr) >> >> Then for `llvm.coro.frame`, we could return `@raw.frame.ptr ` simply if the >> alignment could be satisfied (alignment needed is less than NEW_ALIGN). Or >> we could do simply to align up for the coroutine frame. There are many APIs >> in Align.h. >> And for the destruction, we could emit: >> >> call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size()) >> >> In this way, I guess we would get simpler implementation and generated codes. >> >> BTW, if we choose to do so, the semantics for llvm.coro.raw.frame.ptr and >> llvm.coro.size would change slightly. They would stands for the address and >> size for the coroutine frame if we don't need over alignment. >> >> How do you think about this? > > I was confused by this and @rjmccall explained it here > https://reviews.llvm.org/D97915/new/#2604871. Basically, we could not recover > "raw frame pointer" (`%call1`) from coroutine frame pointer statically at > deallocation time. Oh, I understand why we need to store the address for raw frame now. Another question is that how do you think combine the pattern: if (should over align) { /// ... mem = ... } else { /// ... mem = ... } coro.begin(id, mem); into this one: %true.frame.ptr = call @llvm.coro.create.frame(new(call @llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name ; It would be lowered to store the address of the raw frame to the alloca in the middle end if needed call @llvm.coro.begin(coro.id, %true.frame.ptr) and this one: call @delete(call @llvm.coro.raw.frame.ptr(), call @llvm.coro.raw.frame.size()) ; Then use `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly whenever we want. It looks like we could generate the same code in the front for normal and over aligned coroutine. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D97915/new/ https://reviews.llvm.org/D97915 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits