On 26.01.2018 22:38, Andres Freund wrote:
>> And without it perf is not able to unwind stack trace for generated
>> code.
>
> You can work around that by using --call-graph lbr with a sufficiently
> new perf. That'll not know function names et al, but at least the parent
> will be associated correctly.

With --call-graph lbr the result is slightly different (see the attached
profile), but there is still an "unknown" bar.

>> But you are compiling code using LLVMOrcAddEagerlyCompiledIR
>> and I find no way to pass no-omit-frame pointer option here.
>
> It shouldn't be too hard to open code support for it, encapsulated in a
> function:
>   // Set function attribute "no-frame-pointer-elim" based on
>   // NoFramePointerElim.
>   for (auto &F : *Mod) {
>     auto Attrs = F.getAttributes();
>     StringRef Value(options.NoFramePointerElim ? "true" : "false");
>     Attrs = Attrs.addAttribute(F.getContext(), AttributeList::FunctionIndex,
>                                "no-frame-pointer-elim", Value);
>     F.setAttributes(Attrs);
>   }
> that's all that option did for mcjit.

I have implemented the following function:
    void
    llvm_no_frame_pointer_elimination(LLVMModuleRef mod)
    {
        llvm::Module *module = llvm::unwrap(mod);
        for (auto &F : *module) {
            auto Attrs = F.getAttributes();
            Attrs = Attrs.addAttribute(F.getContext(),
                                       llvm::AttributeList::FunctionIndex,
                                       "no-frame-pointer-elim", "true");
            F.setAttributes(Attrs);
        }
    }
and I call it before LLVMOrcAddEagerlyCompiledIR in llvm_compile_module:
    llvm_no_frame_pointer_elimination(context->module);
    smod = LLVMOrcMakeSharedModule(context->module);
    if (LLVMOrcAddEagerlyCompiledIR(compile_orc, &orc_handle, smod,
                                    llvm_resolve_symbol, NULL))
    {
        elog(ERROR, "failed to jit module");
    }
... but it has no effect: the produced profile is the same (with
--call-graph dwarf).
Maybe you can point out my mistake...
Actually, I am trying to find out why your version of JIT provides
a ~2x speedup on Q1, while the ISPRAS version
(https://www.pgcon.org/2017/schedule/attachments/467_PGCon%202017-05-26%2015-00%20ISPRAS%20Dynamic%20Compilation%20of%20SQL%20Queries%20in%20PostgreSQL%20Using%20LLVM%20JIT.pdf)
reports a 5.5x speedup on Q1.
Maybe it is because they use the double type to calculate aggregates,
while, as far as I understand, you use the standard Postgres aggregate
functions?
Or maybe it is because the ISPRAS version does not check for NULL values...
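
To make the last two guesses more concrete, here is a minimal sketch of the
kind of difference meant. It is only an illustration: the Column struct and
both function names are hypothetical and are not taken from either JIT
implementation. It contrasts a sum that tests a NULL flag on every row with a
plain double accumulation that assumes there are no NULLs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column layout: one value and one NULL flag per row.
struct Column {
    std::vector<double> values;
    std::vector<bool>   isnull;
};

// Sum that tests the NULL flag for every row (one extra branch per row).
double sum_checking_nulls(const Column &col)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < col.values.size(); i++) {
        if (!col.isnull[i])
            sum += col.values[i];
    }
    return sum;
}

// Sum that assumes no row is NULL: a plain double accumulation.
// On a column that does contain NULLs this gives a different result.
double sum_assuming_not_null(const Column &col)
{
    double sum = 0.0;
    for (double v : col.values)
        sum += v;
    return sum;
}
```

The second loop does less work per row, but it is only correct when the
column is known to contain no NULLs, so the two variants are not
interchangeable in general.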
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company