[Crossposting to both gcc-patches and binutils lists, since this
patch kit touches both source trees].

Binutils devs: GCC 5 gained a way to build GCC as a shared library,
libgccjit.so.

I've been experimenting with ways of optimizing libgccjit, and the
following patch kit (touching both gcc and binutils) achieves a 5x
speedup of
  gcc/testsuite/jit.dg/test-benchmark.c
on this x86_64 box (Fedora 20).

The benchmark constructs IR for a simple function in memory, compiles
it, and runs it, 100 times in a row, in the hope of simulating the
workload of an interpreter/VM/language runtime, where bytecode
functions gradually become "hot" (e.g. interpretation count exceeds
a threshold) and are compiled to machine code, all within one
process.

gcc's backend code emits .s files, and libgccjit currently uses pex to
invoke the gcc driver to turn each .s file into a .so file (the driver
in turn invokes "as" and "ld").

These invocations dominate the time taken by libgccjit, so the patch
series attempts to time them, and to move them in-process; doing
so largely eliminates their cost.

Here are the performance gains:

jit.dg/test-benchmark.c, 100 iterations at optlevel 0:
 Without embedded driver:      wallclock of 5.300s (0.053s per iteration)
 With embedded driver:         wallclock of 4.630s (0.046s per iteration)
 With embedded driver & gas:   wallclock of 3.510s (0.035s per iteration)
 With embedded driver&as&ld:   wallclock of 2.130s (0.021s per iteration)
 As above, hacking up ld args: wallclock of 1.030s (0.010s per iteration)

i.e. about 5x speedup.

There are some memory leaks, FIXMEs, etc, and it hasn't been fully
tested yet, but I thought it was time to post this for discussion.

The patch kit also generalizes gcc's timevar mechanism in such a way
that it can be used both by jit client code, and by "as" and "ld".  An
example of a combined report on the accumulated timings of 100
iterations of jit.dg/test-benchmark.c at optlevel 0:

Execution times (seconds)
Client items:
 test_jit                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 create_code             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 compile                 :   0.21 (30%) usr   0.13 (45%) sys   0.25 (25%) wall   14939 kB (74%) ggc
 verify_code             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
GCC items:
 phase setup             :   0.15 (22%) usr   0.02 ( 7%) sys   0.15 (15%) wall   10661 kB (53%) ggc
 phase parsing           :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall     653 kB ( 3%) ggc
 callgraph construction  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall     242 kB ( 1%) ggc
 callgraph optimization  :   0.01 ( 1%) usr   0.01 ( 3%) sys   0.01 ( 1%) wall     142 kB ( 1%) ggc
 cfg construction        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      17 kB ( 0%) ggc
 cfg cleanup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 df live regs            :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      23 kB ( 0%) ggc
 register information    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     199 kB ( 1%) ggc
 tree eh                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 tree CFG construction   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     196 kB ( 1%) ggc
 tree operand scan       :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.00 ( 0%) wall     100 kB ( 0%) ggc
 out of ssa              :   0.00 ( 0%) usr   0.02 ( 7%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 expand                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     398 kB ( 2%) ggc
 loop init               :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      67 kB ( 0%) ggc
 integrated RA           :   0.07 (10%) usr   0.02 ( 7%) sys   0.02 ( 2%) wall    2468 kB (12%) ggc
 LRA virtuals elimination:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      56 kB ( 0%) ggc
 machine dep reorg       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 shorten branches        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 final                   :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     216 kB ( 1%) ggc
 initialize rtl          :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      12 kB ( 0%) ggc
 rest of compilation     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall     232 kB ( 1%) ggc
 unaccounted todo        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 replay of JIT client activity:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     309 kB ( 2%) ggc
 driver                  :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 driver: setup           :   0.04 ( 6%) usr   0.00 ( 0%) sys   0.06 ( 6%) wall       0 kB ( 0%) ggc
 driver: do spec on infiles:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 driver: run linker      :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 driver: embedded assembler:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 driver: embedded linker :   0.04 ( 6%) usr   0.02 ( 7%) sys   0.04 ( 4%) wall       0 kB ( 0%) ggc
 load JIT result         :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
Embedded 'as':
 gas_main                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 before pass             :   0.03 ( 4%) usr   0.02 ( 7%) sys   0.13 (13%) wall       0 kB ( 0%) ggc
 perform_an_assembly_pass:   0.06 ( 9%) usr   0.01 ( 3%) sys   0.06 ( 6%) wall       0 kB ( 0%) ggc
 after pass              :   0.04 ( 6%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
 cleanup                 :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
Embedded 'ld':
 ld_internal_main: init  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldmain.c: lang_final    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldmain.c: lang_process  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 lang_process: 1st half  :   0.00 ( 0%) usr   0.02 ( 7%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 open_output             :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 open_input_bfds         :   0.01 ( 1%) usr   0.02 ( 7%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 lang_input_statement_enum:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 open_input_bfds:load_symbols:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 load_symbols: ldfile_open_file:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldlang_add_file         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 load_symbols: bfd_link_add_symbols:   0.02 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 lang_process: 2nd half  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 4%) wall       0 kB ( 0%) ggc
 ldmain.c: ldwrite       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
 ld_main cleanup         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.69             0.29             0.99              20298 kB

Thoughts?

-- 
1.8.5.3
