On Tue, Dec 8, 2020 at 6:27 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> Hi,
>
> Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH
> Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer
> slots allocation, maintenance, decision to flush and clean up, need to
> be implemented outside the table_multi_insert() API. The main problem
> is that it fails to take into consideration the underlying storage
> engine capabilities, for more details of this point refer to a
> discussion in multi inserts in CTAS thread[1]. This also creates a lot
> of duplicate code which is more error prone and not maintainable.
>
> More importantly, in another thread [3] @Andres Freund suggested to
> have table insert APIs in such a way that they look more like 'scan'
> APIs i.e. insert_begin, insert, insert_end. The main advantages doing
> this are(quoting from his statement in [3]) - "more importantly it'd
> allow an AM to optimize operations across multiple inserts, which is
> important for column stores."
>
> I propose to introduce new table access methods for both multi and
> single inserts based on the prototype suggested by Andres in [3]. Main
> design goal of these new APIs is to give flexibility to tableam
> developers in implementing multi insert logic dependent on the
> underlying storage engine.
>
> Below are the APIs. I suggest to have a look at
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details
> of the new data structure and the API functionality. Note that
> temporarily I used XX_v2, we can change it later.
>
> TableInsertState* table_insert_begin(initial_args);
> void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_flush(TableInsertState *state);
> void table_insert_end(TableInsertState *state);
>
> I'm attaching a few patches(just to show that these APIs work, avoids
> a lot of duplicate code and makes life easier). Better commenting can
> be added later. If these APIs and patches look okay, we can even
> consider replacing them in other places such as nodeModifyTable.c and
> so on.
>
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch --->
> introduces new table access methods for multi and single inserts. Also
> implements/rearranges the outside code for heap am into these new
> APIs.
> v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
> ---> adds new multi insert table access methods to CREATE TABLE AS,
> CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.
> v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds
> new single insert table access method to ALTER TABLE rewrite table
> code.
> v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds
> new single and multi insert table access method to COPY code.
>
> Thoughts?
>
> [1] - https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
> [2] - https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
> [3] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de
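
For anyone skimming the thread, here is a rough, standalone sketch of the begin / multi-insert / flush / end calling pattern described above. It is not code from the patches and does not use any PostgreSQL headers: DemoInsertState, the demo_* functions and the use of plain ints in place of TupleTableSlot are all made-up stand-ins, and "flush when the buffer is full" is just one assumed policy. The only point is that the buffer and the flush decision live behind the API, so the caller just loops over tuples.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for TableInsertState; NOT the struct in the patch. */
typedef struct DemoInsertState
{
    int    *buffered;   /* buffered "tuples" (plain ints in this sketch) */
    int     nbuffered;  /* number of tuples currently buffered */
    int     capacity;   /* buffer size, chosen by the (pretend) AM */
    int     nflushes;   /* how many non-empty flushes happened */
    long    ninserted;  /* tuples handed to "storage" so far */
} DemoInsertState;

/* Allocate the state and its buffer; returns NULL on allocation failure. */
DemoInsertState *
demo_insert_begin(int capacity)
{
    DemoInsertState *state;

    assert(capacity > 0);
    state = malloc(sizeof(DemoInsertState));
    if (state == NULL)
        return NULL;
    state->buffered = malloc(sizeof(int) * (size_t) capacity);
    if (state->buffered == NULL)
    {
        free(state);
        return NULL;
    }
    state->nbuffered = 0;
    state->capacity = capacity;
    state->nflushes = 0;
    state->ninserted = 0;
    return state;
}

/* Write out whatever is buffered; a no-op when the buffer is empty. */
void
demo_multi_insert_flush(DemoInsertState *state)
{
    if (state->nbuffered == 0)
        return;
    state->ninserted += state->nbuffered;   /* pretend these hit storage */
    state->nflushes++;
    state->nbuffered = 0;
}

/* Buffer one tuple; the flush decision is made here, not in the caller. */
void
demo_multi_insert(DemoInsertState *state, int tuple)
{
    state->buffered[state->nbuffered++] = tuple;
    if (state->nbuffered == state->capacity)
        demo_multi_insert_flush(state);
}

/* Flush any leftovers and release the state. */
void
demo_insert_end(DemoInsertState *state)
{
    demo_multi_insert_flush(state);
    free(state->buffered);
    free(state);
}

/*
 * What a caller such as a bulk load would look like in this sketch.
 * Returns the number of tuples inserted (or -1 on allocation failure)
 * and reports the flush count through *nflushes.
 */
long
demo_load(int ntuples, int capacity, int *nflushes)
{
    DemoInsertState *state = demo_insert_begin(capacity);
    long        total;
    int         i;

    if (state == NULL)
        return -1;
    for (i = 0; i < ntuples; i++)
        demo_multi_insert(state, i);
    demo_multi_insert_flush(state);     /* read counters before freeing */
    total = state->ninserted;
    *nflushes = state->nflushes;
    demo_insert_end(state);
    return total;
}
```

In this toy version the caller (demo_load) never touches the buffer or decides when to flush, which is the property the proposed APIs are after; a different AM could swap in a different flush policy without changing the caller.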

Added this to commitfest to get it reviewed further.

https://commitfest.postgresql.org/31/2871/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com