Hi,

Currently, every component that wants multi inserts (such as COPY, CTAS[1], CREATE/REFRESH Mat View[1], INSERT INTO SELECT[2]) has to implement the multi insert logic, such as buffer slot allocation, buffer maintenance, the decision to flush, and cleanup, outside the table_multi_insert() API. The main problem is that this caller-side logic cannot take the capabilities of the underlying storage engine into account; for more details on this point, refer to the discussion in the multi inserts in CTAS thread[1]. It also creates a lot of duplicate code, which is error prone and hard to maintain.
More importantly, in another thread[3] Andres Freund suggested shaping the table insert APIs so that they look more like the 'scan' APIs, i.e. insert_begin, insert, insert_end. The main advantage of doing this is (quoting from his statement in [3]): "more importantly it'd allow an AM to optimize operations across multiple inserts, which is important for column stores."

I propose to introduce new table access methods for both multi and single inserts, based on the prototype suggested by Andres in [3]. The main design goal of these new APIs is to give tableam developers the flexibility to implement multi insert logic that depends on the underlying storage engine.

Below are the APIs. I suggest having a look at v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details of the new data structure and the API functionality. Note that I have temporarily used the XX_v2 names; we can change them later.

    TableInsertState* table_insert_begin(initial_args);
    void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
    void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
    void table_multi_insert_flush(TableInsertState *state);
    void table_insert_end(TableInsertState *state);

I'm attaching a few patches, just to show that these APIs work, avoid a lot of duplicate code, and make life easier. Better commenting can be added later. If these APIs and patches look okay, we can even consider using them in other places such as nodeModifyTable.c.

v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch ---> introduces new table access methods for multi and single inserts. It also moves the existing caller-side code for heap AM into these new APIs.

v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch ---> adds the new multi insert table access methods to CREATE TABLE AS, CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.

v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds the new single insert table access method to the ALTER TABLE rewrite table code.

v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds the new single and multi insert table access methods to the COPY code.

Thoughts?

Many thanks to Robert, Vignesh and Dilip for offlist discussion.

[1] - https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
[2] - https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
[3] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
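
To make the begin / insert / flush / end lifecycle of the APIs listed above easier to picture, here is a minimal, self-contained toy sketch. It is not the code from the patches and does not use any PostgreSQL headers: every toy_* name, the ToyTuple and ToyInsertState structs, and the flush threshold of 4 are assumptions made up for illustration. The only point it tries to show is that the buffer and the flush decision live inside the insert state rather than in each caller.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_BUFFERED 4      /* hypothetical flush threshold */

/* Toy stand-in for a tuple; the proposed API passes TupleTableSlot pointers. */
typedef struct ToyTuple
{
    int         value;
} ToyTuple;

/* Toy stand-in for TableInsertState: the AM side owns the buffer. */
typedef struct ToyInsertState
{
    ToyTuple    buffered[TOY_MAX_BUFFERED];
    int         nbuffered;      /* tuples waiting to be flushed */
    int         nflushes;       /* batch writes performed so far */
    long        nwritten;       /* tuples handed to "storage" so far */
} ToyInsertState;

/* Allocate a zeroed insert state (loosely mirrors table_insert_begin). */
static ToyInsertState *
toy_insert_begin(void)
{
    ToyInsertState *state = calloc(1, sizeof(ToyInsertState));

    if (state == NULL)
        abort();
    return state;
}

/* Write out whatever is buffered; a no-op when the buffer is empty. */
static void
toy_multi_insert_flush(ToyInsertState *state)
{
    if (state->nbuffered == 0)
        return;
    state->nwritten += state->nbuffered;
    state->nflushes++;
    state->nbuffered = 0;
}

/* Buffer one tuple; the state, not the caller, decides when to flush. */
static void
toy_multi_insert(ToyInsertState *state, const ToyTuple *tuple)
{
    assert(state->nbuffered < TOY_MAX_BUFFERED);
    state->buffered[state->nbuffered++] = *tuple;
    if (state->nbuffered == TOY_MAX_BUFFERED)
        toy_multi_insert_flush(state);
}

/* Flush any remaining tuples and release the state. */
static void
toy_insert_end(ToyInsertState *state)
{
    toy_multi_insert_flush(state);
    free(state);
}

/*
 * Caller side: no buffer management at all. Inserts ntuples tuples,
 * stores the number of batch writes in *nflushes and returns the
 * number of tuples written.
 */
static long
toy_load(int ntuples, int *nflushes)
{
    ToyInsertState *state = toy_insert_begin();
    long        written;

    for (int i = 0; i < ntuples; i++)
    {
        ToyTuple    tuple = {.value = i};

        toy_multi_insert(state, &tuple);
    }

    /* Explicit flush so the counters can be read before the state is freed. */
    toy_multi_insert_flush(state);
    written = state->nwritten;
    *nflushes = state->nflushes;
    toy_insert_end(state);
    return written;
}
```

In this toy, a caller such as toy_load() only calls begin, insert and end. A different storage engine could swap in a different buffering or flushing policy behind the same calls without touching any caller, which is the flexibility the proposal is aiming for.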
Attachments:
v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch
v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch
v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch