> BACKGROUND
> ========================================
> 
> We want to realize parallel INSERT SELECT in the following steps:
> 1) INSERT + parallel SELECT
> 2) Parallel INSERT + parallel SELECT
> 
> Below are example use cases.  We don't expect high concurrency or an empty
> data source.
> * Data loading (ETL or ELT) into an analytics database, typically a data ware
> house.
> * Batch processing in an OLTP database.
> 2) Enabling users to declare that the table allows parallel data modification 
> Add
> a table property that represents parallel safety of the table for DML 
> statement
> execution.  Users specify it as follows:
> 
> CREATE TABLE table_name (...) PARALLEL { UNSAFE | RESTRICTED | SAFE };
>     ALTER TABLE table_name PARALLEL { UNSAFE | RESTRICTED | SAFE };
> 
> This property is recorded in pg_class's relparallel column as 'u', 'r', or 
> 's', just
> like pg_proc's proparallel.  The default is UNSAFE.
> 
> The planner assumes that all of the table, its descendant partitions, and 
> their
> ancillary objects have the specified parallel safety or safer one.  The user 
> is
> responsible for its correctness.  If the parallel processes find an object 
> that is
> less safer than the assumed parallel safety during statement execution, it
> throws an ERROR and abort the statement execution.
> 
> When the parallel safety of some of these objects is changed, it's costly to
> reflect it on the parallel safety of tables that depend on them.  So, we 
> don't do
> it.  Instead, we provide a utility function 
> pg_get_parallel_safety('table_name')
> that returns records of (objid, classid, parallel_safety) that represent the
> parallel safety of objects that determine the parallel safety of the specified
> table.  The function only outputs objects that are not parallel safe.  
> Otherwise,
> it will consume excessive memory while accumulating the output.  The user
> can use this function to identify problematic objects when a parallel DML 
> fails
> or is not parallelized in an expected manner.
> 
> How does the executor detect parallel unsafe objects?  There are two ways:
> 
> 1) At loading time
> ...
> 2) At function execution time
> All related objects come down to some function execution.  So, add a parallel
> safety check there when in a parallel worker.  If the current process is a 
> parallel
> worker and the function is parallel unsafe, error out with ereport(ERROR).  
> This
> approach eliminates the oversight of parallel safety check with the additional
> bonus of tiny code change!
> 
> The place would be FunctionCallInvoke().  It's a macro in fmgr.h now.  Perhaps
> we should make it a function in fmgr.c, so that fmgr.h does not have to 
> include
> header files for parallelism-related definitions.
> 
> We have to evaluate the performance effect of converting FunctionCallInvoke()
> into a function and adding an if statement there, because it's a relatively
> low-level function.

Based on above, we plan to move forward with the apporache 2) (declarative 
idea).

Attatching the POC patchset which including the following:

0001: provide a utility function pg_get_parallel_safety('table_name').

  The function returns records of (objid, classid, parallel_safety) that 
represent
  the parallel safety of objects that determine the parallel safety of the 
specified table.
  Note: The function only outputs objects that are not parallel safe.
  (Thanks a lot for greg's previous work, most of the safety check code here is 
based on it)

0002: allow user use "ALTER TABLE PARALLEL SAFE/UNSAFE/RESTRICTED".

  Add proparallel column in pg_class and allow use to change its.

0003: detect parallel unsafe objects in executor.
      
  Currently we choose to check function's parallel safety at function execution 
time.
  We add safety check at FunctionCallInvoke(), but it may be better to check in 
fmgr_info_cxt_security.
  we are still discussing it in another thread[1].

  TODO: we currently skip checking built-in function's parallel safety, because 
we lack the information about built-in
  function's parallel safety, we cannot access pg_proc.proparallel in a low 
level because it could result in infinite recursion.
  Adding parallel property in fmgrBuiltin will enlarge the frequently accessed 
fmgr_builtins and lock down the value of the 
  parallel-safety flag. The solution is still under discussion. Suggestions and 
comments are welcome.
  
0004: fix some mislabeled function in testcase

  Since we check parallel safety of function at a low level, we found some 
functions marked as parallel unsafe will be
  executed in parallel mode in regression test when setting 
force_parallel_mode=regress. After checking, these functions
  are parallel safe, So , we plan to fix these function's parallel label.
  Note: we plan to take 0004 as a separate patch , see[2], I post 0004 here 
just to prevent some testcase failures.

The above are the POC patches, it could be imperfect for now and I am still 
working on improving it. 
Suggestions and comments about the design or code are very welcome and 
appreciated.

Best regards,
houzj

Attachment: v1-POC-0003-check-parallel-safety-in-fmgr.patch
Description: v1-POC-0003-check-parallel-safety-in-fmgr.patch

Attachment: v1-POC-0004-fix-testcase-with-wrong-parallel-safety-flag.patch
Description: v1-POC-0004-fix-testcase-with-wrong-parallel-safety-flag.patch

Attachment: v1-POC-0001-get-parallel-safety-function.patch
Description: v1-POC-0001-get-parallel-safety-function.patch

Attachment: v1-POC-0002-ALTER-TABLE-PARALLEL.patch
Description: v1-POC-0002-ALTER-TABLE-PARALLEL.patch

Reply via email to