On 26.01.2018 22:38, Andres Freund wrote:
>> And without it perf is not able to unwind stack trace for generated
>> code.
> You can work around that by using --call-graph lbr with a sufficiently
> new perf. That'll not know function names et al, but at least the parent
> will be associated correctly.

With --call-graph lbr the result is slightly different (see attached profile), but there is still an "unknown" bar.

>> But you are compiling code using LLVMOrcAddEagerlyCompiledIR
>> and I find no way to pass no-omit-frame pointer option here.
> It shouldn't be too hard to open code support for it, encapsulated in a
> function:
>      // Set function attribute "no-frame-pointer-elim" based on
>      // NoFramePointerElim.
>      for (auto &F : *Mod) {
>        auto Attrs = F.getAttributes();
>        StringRef Value(options.NoFramePointerElim ? "true" : "false");
>        Attrs = Attrs.addAttribute(F.getContext(), AttributeList::FunctionIndex,
>                                   "no-frame-pointer-elim", Value);
>        F.setAttributes(Attrs);
>      }
> that's all that option did for mcjit.

I have implemented the following function:

void
llvm_no_frame_pointer_elimination(LLVMModuleRef mod)
{
    llvm::Module *module = llvm::unwrap(mod);
    for (auto &F : *module) {
        auto Attrs = F.getAttributes();
        Attrs = Attrs.addAttribute(F.getContext(), llvm::AttributeList::FunctionIndex,
                                   "no-frame-pointer-elim", "true");
        F.setAttributes(Attrs);
    }
}

and call it before LLVMOrcAddEagerlyCompiledIR in llvm_compile_module:

        llvm_no_frame_pointer_elimination(context->module);
        smod = LLVMOrcMakeSharedModule(context->module);

        if (LLVMOrcAddEagerlyCompiledIR(compile_orc, &orc_handle, smod,
                                        llvm_resolve_symbol, NULL))
        {
            elog(ERROR, "failed to jit module");
        }


... but it has no effect: the produced profile is the same (with --call-graph dwarf).
Maybe you can point out my mistake...


Actually, I am trying to find out why your version of JIT provides a ~2x speedup on Q1, while the ISPRAS version (https://www.pgcon.org/2017/schedule/attachments/467_PGCon%202017-05-26%2015-00%20ISPRAS%20Dynamic%20Compilation%20of%20SQL%20Queries%20in%20PostgreSQL%20Using%20LLVM%20JIT.pdf)
reports a 5.5x speedup on Q1.
Maybe it is because they use the double type to calculate aggregates, while, as far as I understand, you are using the standard Postgres aggregate functions?
Or maybe it is because the ISPRAS version does not check for NULL values...
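
To make the two guesses above concrete, here is a minimal C++ sketch of the difference I have in mind. It is not taken from either JIT implementation: the Column struct and both function names are hypothetical, and it only contrasts a bare double accumulation with one that tests a NULL flag per row. It says nothing about how large the difference is in practice.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column layout, for illustration only:
// the values plus a parallel array of NULL flags.
struct Column {
    std::vector<double> values;
    std::vector<bool>   isnull;
};

// Variant A: accumulate in a plain double and never look at NULL flags.
// The loop body is a single floating-point add.
double sum_no_null_check(const Column &col) {
    double acc = 0.0;
    for (double v : col.values)
        acc += v;
    return acc;
}

// Variant B: same accumulation, but every row is tested for NULL first,
// which adds a branch per row and skips NULL inputs.
double sum_with_null_check(const Column &col) {
    double acc = 0.0;
    for (std::size_t i = 0; i < col.values.size(); ++i) {
        if (!col.isnull[i])
            acc += col.values[i];
    }
    return acc;
}
```

The two variants return the same sum only when no row is flagged as NULL, so they are not interchangeable in general.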

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
