Hi Andres,

I spent some time poring over the JIT README, and I've attached a patch with some additional corrections as well as some stylistic suggestions. The latter may be debatable, but I'm sure you can pick and choose as you see fit. If there are cases where I misunderstood your intent, maybe that's also useful information. :-)
-John Naylor
diff --git a/src/backend/jit/README b/src/backend/jit/README
index bfed319..7924127 100644
--- a/src/backend/jit/README
+++ b/src/backend/jit/README
@@ -13,12 +13,12 @@ the CPU that just handles that expression, yielding a speedup.
 
 That this is done at query execution time, possibly even only in cases
 the relevant task is done a number of times, makes it JIT, rather than
 ahead-of-time (AOT). Given the way JIT compilation is used in
-postgres, the lines between interpretation, AOT and JIT are somewhat
+PostgreSQL, the lines between interpretation, AOT and JIT are somewhat
 blurry.
 
 Note that the interpreted program turned into a native program does
 not necessarily have to be a program in the classical sense. E.g. it
-is highly beneficial JIT compile tuple deforming into a native
+is highly beneficial to JIT compile tuple deforming into a native
 function just handling a specific type of table, despite tuple
 deforming not commonly being understood as a "program".
@@ -26,7 +26,7 @@ deforming not commonly being understood as a "program".
 Why JIT?
 ========
 
-Parts of postgres are commonly bottlenecked by comparatively small
+Parts of PostgreSQL are commonly bottlenecked by comparatively small
 pieces of CPU intensive code. In a number of cases that is because the
 relevant code has to be very generic (e.g. handling arbitrary SQL
 level expressions, over arbitrary tables, with arbitrary extensions
@@ -49,11 +49,11 @@ particularly beneficial for removing branches during tuple deforming.
 How to JIT
 ==========
 
-Postgres, by default, uses LLVM to perform JIT. LLVM was chosen
+PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
 because it is developed by several large corporations and therefore
 unlikely to be discontinued, because it has a license compatible with
-PostgreSQL, and because its LLVM IR can be generated from C
-using the clang compiler.
+PostgreSQL, and because its IR can be generated from C using the Clang
+compiler.
 
 
 Shared Library Separation
@@ -68,20 +68,20 @@ An additional benefit of doing so is that it is relatively easy to
 evaluate JIT compilation that does not use LLVM, by changing out the
 shared library used to provide JIT compilation.
 
-To achieve this code, e.g. expression evaluation, intending to perform
-JIT, calls a LLVM independent wrapper located in jit.c to do so. If
-the shared library providing JIT support can be loaded (i.e. postgres
-was compiled with LLVM support and the shared library is installed),
-the task of JIT compiling an expression gets handed of to shared
-library. This obviously requires that the function in jit.c is allowed
-to fail in case no JIT provider can be loaded.
+To achieve this, code intending to perform JIT (e.g. expression evaluation)
+calls an LLVM independent wrapper located in jit.c to do so. If the
+shared library providing JIT support can be loaded (i.e. PostgreSQL was
+compiled with LLVM support and the shared library is installed), the task
+of JIT compiling an expression gets handed off to the shared library. This
+obviously requires that the function in jit.c is allowed to fail in case
+no JIT provider can be loaded.
 
 Which shared library is loaded is determined by the jit_provider GUC,
 defaulting to "llvmjit".
 
 Cloistering code performing JIT into a shared library unfortunately
 also means that code doing JIT compilation for various parts of code
-has to be located separately from the code doing so without
+has to be located separately from the code that executes without
 JIT. E.g. the JITed version of execExprInterp.c is located in
 jit/llvm/ rather than executor/.
 
@@ -105,17 +105,21 @@ implementations.
 
 Emitting individual functions separately is more expensive than
 emitting several functions at once, and emitting them together can
-provide additional optimization opportunities. To facilitate that the
-LLVM provider separates function definition from emitting them in an
-executable way.
+provide additional optimization opportunities. To facilitate that, the
+LLVM provider separates function definition (LLVM IR) from function
+emission (executable mmap()ed segments).
 
 Creating functions into the current mutable module (a module
 essentially is LLVM's equivalent of a translation unit in C) is done
 using
+
 extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);
+
 in which it then can emit as much code using the LLVM APIs as it
 wants. Whenever a function actually needs to be called
+
 extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);
+
 returns a pointer to it.
 
 E.g. in the expression evaluation case this setup allows most
@@ -127,12 +131,12 @@ used.
 Error Handling
 --------------
 
-There are two aspects to error handling. Firstly, generated (LLVM IR)
-and emitted functions (mmap()ed segments) need to be cleaned up both
-after a successful query execution and after an error. This is done by
-registering each created JITContext with the current resource owner,
-and cleaning it up on error / end of transaction. If it is desirable
-to release resources earlier, jit_release_context() can be used.
+There are two aspects of error handling. Firstly, generated and
+emitted functions need to be cleaned up both after a successful query
+execution and after an error. This is done by registering each created
+JITContext with the current resource owner, and cleaning it up on error /
+end of transaction. If it is desirable to release resources earlier,
+jit_release_context() can be used.
 
 The second, less pretty, aspect of error handling is OOM handling
 inside LLVM itself. The above resowner based mechanism takes care of
@@ -140,14 +144,18 @@ cleaning up emitted code upon ERROR, but there's also the chance that
 LLVM itself runs out of memory. LLVM by default does *not* use any
 C++ exceptions. Its allocations are primarily funneled through the
 standard "new" handlers, and some direct use of malloc() and
-mmap(). For the former a 'new handler' exists
-http://en.cppreference.com/w/cpp/memory/new/set_new_handler for the
-latter LLVM provides callback that get called upon failure
-(unfortunately mmap() failures are treated as fatal rather than OOM
-errors). What we've, for now, chosen to do, is to have two functions
-that LLVM using code must use:
+mmap(). For the former a 'new handler' exists:
+
+http://en.cppreference.com/w/cpp/memory/new/set_new_handler
+
+For the latter LLVM provides callbacks that get called upon failure
+(unfortunately mmap() failures are treated as fatal rather than OOM errors).
+What we've chosen to do for now is have two functions that LLVM using code
+must use:
+
 extern void llvm_enter_fatal_on_oom(void);
 extern void llvm_leave_fatal_on_oom(void);
+
 before interacting with LLVM code.
 
 When a libstdc++ new or LLVM error occurs, the handlers set up by the
@@ -156,11 +164,11 @@ than ERROR, as we *cannot* reliably throw ERROR inside a foreign
 library without risking corrupting its internal state.
 
 Users of the above sections do *not* have to use PG_TRY/CATCH blocks,
-the handlers instead are reset on toplevel sigsetjmp() level.
+the handlers are reset at the toplevel sigsetjmp() level instead.
 
 Using a relatively small enter/leave protected section of code, rather
 than setting up these handlers globally, avoids negative interactions
-with extensions that might use C++ like e.g. postgis. As LLVM code
+with extensions that might use C++ such as PostGIS. As LLVM code
 generation should never execute arbitrary code, just setting these
 handlers temporarily ought to suffice.
 
@@ -168,9 +176,9 @@ handlers temporarily ought to suffice.
 
 Type Synchronization
 --------------------
 
-To able to generate code performing tasks that are done in "interpreted"
-postgres, it obviously is required that code generation knows about at
-least a few postgres types. While it is possible to inform LLVM about
+To be able to generate code that can perform tasks done by "interpreted"
+PostgreSQL, it is obviously required that code generation knows about at
+least a few PostgreSQL types. While it is possible to inform LLVM about
 type definitions by recreating them manually in C code, that is failure
 prone and labor intensive.
@@ -178,13 +186,15 @@ Instead there is one small file (llvmjit_types.c) which references each
 of the types required for JITing. That file is translated to
 bitcode at compile time, and loaded when LLVM is initialized in a
 backend.
 
-That works very well to synchronize the type definition, unfortunately
-it does *not* synchronize offsets as the IR level representation doesn't
-know field names. Instead required offsets are maintained as defines in
-the original struct definition. E.g.
+That works very well to synchronize the type definition, but unfortunately
+it does *not* synchronize offsets as the IR level representation doesn't
+know field names. Instead, required offsets are maintained as defines in
+the original struct definition, like so:
+
 #define FIELDNO_TUPLETABLESLOT_NVALID 9
 int tts_nvalid; /* # of valid values in tts_values */
-while that still needs to be defined, it's only required for a
+
+While that still needs to be defined, it's only required for a
 relatively small number of fields, and it's bunched together with the
 struct definition, so it's easily kept synchronized.
@@ -193,23 +203,25 @@ Inlining
 --------
 
 One big advantage of JITing expressions is that it can significantly
-reduce the overhead of postgres's extensible function/operator
-mechanism, by inlining the body of called functions / operators.
+reduce the overhead of PostgreSQL's extensible function/operator
+mechanism, by inlining the body of called functions/operators.
 
-It obviously is undesirable to maintain a second implementation of
+It is obviously undesirable to maintain a second implementation of
 commonly used functions, just for inlining purposes. Instead we take
-advantage of the fact that the clang compiler can emit LLVM IR.
+advantage of the fact that the Clang compiler can emit LLVM IR.
 
 The ability to do so allows us to get the LLVM IR for all operators
 (e.g. int8eq, float8pl etc), without maintaining two copies. These
 bitcode files get installed into the server's
 
   $pkglibdir/bitcode/postgres/
-Using existing LLVM functionality (for parallel LTO compilation),
-additionally an index is over these is stored to
-$pkglibdir/bitcode/postgres.index.bc
+To use existing LLVM functionality for parallel LTO compilation,
+additionally an index over these is stored in
+
+  $pkglibdir/bitcode/postgres.index.bc
 
 Similarly extensions can install code into
   $pkglibdir/bitcode/[extension]/
+
 accompanied by
   $pkglibdir/bitcode/[extension].index.bc
@@ -225,7 +237,7 @@ Caching
 Currently it is not yet possible to cache generated functions, even
 though that'd be desirable from a performance point of view. The
 problem is that the generated functions commonly contain pointers into
-per-execution memory. The expression evaluation functionality needs to
+per-execution memory. The expression evaluation machinery needs to
 be redesigned a bit to avoid that. Basically all per-execution memory
 needs to be referenced as an offset to one block of memory stored in
 an ExprState, rather than absolute pointers into memory.
@@ -247,13 +259,13 @@ What to JIT
 ===========
 
 Currently expression evaluation and tuple deforming are JITed. Those
-were chosen because they commonly are major CPU bottlenecks in
+were chosen because they are commonly major CPU bottlenecks in
 analytics queries, but are by no means the only potentially
 beneficial cases.
 
 For JITing to be beneficial a piece of code first and foremost has to
 be a CPU bottleneck. But also importantly, JITing can only be
-beneficial if overhead can be removed by doing so. E.g. in the tuple
-deforming case the knowledge about the number of columns and their
+beneficial if overhead can be removed by doing so. In the tuple
+deforming case the knowledge about the number of columns and their
 types can remove a significant number of branches, and in the
 expression evaluation case a lot of indirect jumps/calls can be
@@ -278,11 +290,11 @@ Currently there are a number of GUCs that influence JITing:
 - jit_inline_above_cost = -1, 0-DBL_MAX - inlining is tried if query
   has higher cost.
 
-whenever a query's total cost is above these limits, JITing is
+Whenever a query's total cost is above these limits, JITing is
 performed.
 
 Alternative costing models, e.g. by generating separate paths for
-parts of a query with lower cpu_* costs, are also a possibility, but
+parts of a query with lower CPU costs, are also a possibility, but
 it's doubtful the overhead of doing so is sufficient. Another
 alternative would be to count the number of times individual
 expressions are estimated to be evaluated, and perform JITing of these
@@ -291,5 +303,5 @@ individual expressions.
 The obvious seeming approach of JITing expressions individually after
 a number of execution turns out not to work too well. Primarily
 because emitting many small functions individually has significant
-overhead. Secondarily because the time till JITing occurs causes
+overhead. Secondarily because the time until JITing occurs causes
 relative slowdowns that eat into the gain of JIT compilation.