[PATCH] D96033: [clang-repl] Land initial infrastructure for incremental parsing

Stefan Gränitz via Phabricator via cfe-commits Tue, 16 Feb 2021 07:39:04 -0800

sgraenitz added a subscriber: anarazel.
sgraenitz added a comment.

Hi Vassil, thanks for upstreaming this! I think it goes into a good direction.

The last time I looked at the Cling sources downstream, it was based on LLVM 
release 5.0. The IncrementalJIT class was based on what we call OrcV1 today. 
OrcV1 is long deprecated and even though it's still in tree today, it will very 
likely be removed in the upcoming release cycle. So I guess, one of the 
challenges will be porting the Cling internals to OrcV2 -- a lot has changed, 
mostly to the better :) Not all of this is relevant for this patch, but maybe 
it's worth mentioning for upcoming additions.

OrcV2 works with Dylibs, basically symbol namespaces. When you add a module to 
a Dylib, all its symbols will be added. Respectively, if you want to remove 
something from a Dylib, you have to remove the symbols (for fine-tuning it you 
can reach for a Dylib's ResourceTracker). Symbols won't be materialized until 
you look them up. I guess for incremental compilation you would keep on adding 
symbols, one increment at a time.

  int var1 = 42; int f() { return var1; }
  int var2 = f();

Let's say these are two inputs. The first line only adds definitions for 
symbols `var1` and `f()` but won't materialize anything. The second line would 
have to lookup `f()`, execute it and emit a new definition for `var2`. I never 
got into Cling deep enough to find out how it works, but I assume it's 
high-level enough that it won't require large changes. One thing I'd recommend 
to double-check: if there is a third line that adds a static constructor, will 
LLJIT only run this one or will it run all previous static ctors again when 
calling `initialize()`? I assume the former but I wouldn't bet on it.

Another aspect is that downstream Cling is based on RuntimeDyld for linking 
Orc's output object files. I remember RemovableObjectLinkingLayer adding some 
object file removal code. Upstream OrcV2 grew it's own linker in the meantime. 
It's called JITLink and gets pulled into LLJIT via `ObjectLinkingLayer`. 
RuntimeDyld-based linking is still supported with the 
`RTDyldObjectLinkingLayer`. JITLink is not complete for all platforms yet. 
Thus, LLJITBuilder defaults to JITLink on macOS and RuntimeDyld otherwise. 
Chances are that JITLink gets good enough for ELF to enable it by default on 
Linux (at least x86-64). I guess that's your platform of concern? The related 
question is whether you are aiming for JITLink straight away or staying with 
RuntimeDyld for now.

For the moment, I added a few pointers inline. Some are referring to my general 
comments above.

================
Comment at: clang/lib/CodeGen/CodeGenAction.cpp:908

+CodeGenerator *CodeGenAction::getCodeGenerator() const {
+  return BEConsumer->getCodeGenerator();
----------------
v.g.vassilev wrote:
> @rjmccall, we were wondering if there is a better way to ask CodeGen to start 
> a new module. The current approach seems to be drilling hole in a number of 
> abstraction layers.
> 
> In the past we have touched that area a little in 
> https://reviews.llvm.org/D34444 and the answer may be already there but I 
> fail to connect the dots.
> 
> Recently, we thought about having a new FrontendAction callback for beginning 
> a new phase when compiling incremental input. We need to keep track of the 
> created objects (needed for error recovery) in our Transaction. We can have a 
> map of `Transaction*` to `llvm::Module*` in CodeGen. The issue is that new 
> JITs take ownership of the `llvm::Module*` which seems to make it impossible 
> to support jitted code removal with that model (cc: @lhames, @rsmith).
When compiling incrementally, doeas a "new phase" mean that all subsequent code 
will go into a new module from then on? How will dependencies to previous 
symbols be handled? Are all symbols external?

> The issue is that new JITs take ownership of the llvm::Module*

That's true, but you can still keep a raw pointer to it, which will be valid at 
least as long as the module wasn't linked. Afterwards it depends on the linker:
* RuntimeDyld can return ownership of the object's memory range via 
`NotifyEmittedFunction`
* JITLink provides the `ReturnObjectBufferFunction` for the same purpose

> seems to make it impossible to support jitted code removal with that model

Can you figure out what symbols are affected and remove these? A la: 
https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020

I think @anarazel has ported a client with code removal to OrcV2 successfully 
in the past. Maybe there's something we can learn from it.

================
Comment at: clang/lib/Interpreter/IncrementalExecutor.h:36
+  llvm::Error addModule(std::unique_ptr<llvm::Module> M);
+  llvm::Error runCtors() const;
+};
----------------
v.g.vassilev wrote:
> teemperor wrote:
> > Should we maybe merge `runCtors` and `addModule`? Not sure if there is a 
> > use case for adding a Module but not running Ctors. Also documentation.
> The case we have is when there is no JIT -- currently we have such a case in 
> IncrementalProcessingTest I think. Another example, which will show up in 
> future patches, is the registration of atexit handlers. That is, before we 
> `runCtors` we make a pass over the LLVM IR and collect some specific details 
> and (possibly change the IR and then run).
> 
> I'd rather keep it separate for now if that's okay.
> Should we maybe merge runCtors and addModule?

+1 even though there may be open questions regarding incremental initialization.

> The case we have is when there is no JIT -- currently we have such a case in 
> IncrementalProcessingTest

Can you run anything if there is no JIT? I think what you have in 
`IncrementalProcessing.EmitCXXGlobalInitFunc` is 
`getGlobalInit(llvm::Module*)`, which checks for symbol names with a specific 
prefix.

> before we runCtors we make a pass over the LLVM IR and collect some specific 
> details and (possibly change the IR and then run).

The idiomatic solution for such modifications would use an IRTransformLayer as 
in:
https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/examples/OrcV2Examples/LLJITWithOptimizingIRTransform/LLJITWithOptimizingIRTransform.cpp#L108

> Another example, which will show up in future patches, is the registration of 
> atexit handlers

`atexit` handlers as well as global ctors/dtors should be covered by LLJIT 
PlatformSupport. The LLJITBuilder will inject a GenericLLVMIRPlatformSupport 
instance by default:
https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp#L125

It's not as comprehensive as e.g. the MachO implementation, but should be 
sufficient for your use-case as you have IR for all your JITed code. (It would 
NOT work if you cached object files, reloaded them in a subsequent session and 
wanted to run their ctors.) So, your below call to `initialize()` should do it 
already.

================
Comment at: clang/unittests/Interpreter/InterpreterTest.cpp:29
+  auto CI = cantFail(clang::IncrementalCompilerBuilder::create(ClangArgs));
+  return std::move(cantFail(clang::Interpreter::create(std::move(CI))));
+}
----------------
Warning: `std::move` prevents copy elision

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96033/new/

https://reviews.llvm.org/D96033

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D96033: [clang-repl] Land initial infrastructure for incremental parsing

Reply via email to