Note: Changing the subject as I felt the topic has diverted from the original reported case and also it might help others to pay attention.
On Wed, Jul 28, 2021 at 8:22 AM houzj.f...@fujitsu.com <houzj.f...@fujitsu.com> wrote: > > > > Consider below ways to allow the user to specify the parallel-safety option: > > > > (a) > > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } > > ... > > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } .. > > > > OR > > > > (b) > > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true) > > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true) > > Personally, I think the approach (a) might be better. Since it's similar to > ALTER FUNCTION PARALLEL XXX which user might be more familiar with. > Okay, and I think for (b) true/false won't be sufficient because one might want to specify restricted. > Besides, I think we need a new default value about parallel dml safety. Maybe > 'auto' or 'null'(different from safe/restricted/unsafe). Because, user is > likely to alter the safety to the default value to get the automatic safety > check, a independent default value can make it more clear. > Hmm, but auto won't work for partitioned tables, right? If so, that might appear like an inconsistency to the user and we need to document the same. Let me summarize the discussion so far in this thread so that it is helpful to others. We would like to parallelize INSERT SELECT (first step INSERT + parallel SELECT and then Parallel (INSERT + SELECT)) and for that, we have explored a couple of ways. The first approach is to automatically detect if it is safe to parallelize insert and then do it without user intervention. To detect automatically, we need to determine the parallel-safety of various expressions (like default column expressions, check constraints, index expressions, etc.) at the planning time which can be costly but we can avoid most of the cost if we cache the parallel safety for the relation. So, the cost needs to be paid just once. Now, we can't cache this for partitioned relations because it can be very costly (as we need to lock all the partitions) and has deadlock risks (while processing invalidation), this has been explained in email [1]. Now, as we can't think of a nice way to determine parallel safety automatically for partitioned relations, we thought of providing an option to the user. The next thing to decide here is that if we are providing an option to the user in one of the ways as mentioned above in the email, what should we do if the user uses that option for non-partitioned relations, shall we just ignore it or give an error that this is not a valid syntax/option? The one idea which Dilip and I are advocating is to respect the user's input for non-partitioned relations and if it is not given then compute the parallel safety and cache it. To facilitate users for providing a parallel-safety option, we are thinking to provide a utility function "pg_get_table_parallel_dml_safety(regclass)" that returns records of (objid, classid, parallel_safety) for all parallel unsafe/restricted table-related objects from which the table's parallel DML safety is determined. This will allow user to identify unsafe objects and if the required user can change the parallel safety of required functions and then use the parallel safety option for the table. Thoughts? Note - This topic has been discussed in another thread as well [2] but as many of the key technical points have been discussed here I thought it is better to continue here. [1] - https://www.postgresql.org/message-id/CAA4eK1Jwz8xGss4b0-33eyX0i5W_1CnqT16DjB9snVC--DoOsQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709%40TYAPR01MB2990.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.