> > It is quite possible what you are saying is correct but I feel that is > > not this patch's fault. So, won't it better to discuss this in a > > separate thread? > > > > Good use case but again, I think this can be done as a separate patch. > > Agreed. > I think even the current patch offers great benefits and can be committed in > PG > 14, even if all my four feedback comments are left unaddressed. I just > touched > on them for completeness in terms of typically expected use cases. They will > probably be able to be implemented along the current design. > > > > > I think here you are talking about the third patch (Parallel Inserts). > > I guess if one has run inserts parallelly from psql then also similar > > behavior would have been observed. For tables, it might lead to better > > results once we have the patch discussed at [1]. Actually, this needs > > more investigation. > > > > [1] - > > > https://www.postgresql.org/message-id/20200508072545.GA9701%40telsas > o > > ft.com > > That looks interesting and worth a try.
Hi, I test the bitmapscan with both multi-insert patch and parallel insert patch applied. But the performance degradation and table size increased still happened in my machine. To better analyze this issue, I did some more research on it (only applied parallel insert patch) I add some code to track the time spent in index operation. From the results[1], we can see more workers will bring more cost in _bt_search_insert() in each worker. After debugged, the most cost part is the following: ----- /* drop the read lock on the page, then acquire one on its child */ *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access); ----- It seems the order of parallel bitmap scan's result will result in more lock time in parallel insert. [1]---------------parallel bitmap scan------------------ worker 0: psql:test.sql:10: INFO: insert index _bt_search_insert time:834735 psql:test.sql:10: INFO: insert index total time:1895330 psql:test.sql:10: INFO: insert tuple time:628064 worker 2: psql:test.sql:10: INFO: insert index _bt_search_insert time:1552242 psql:test.sql:10: INFO: insert index total time:2374741 psql:test.sql:10: INFO: insert tuple time:314571 worker 4: psql:test.sql:10: INFO: insert index _bt_search_insert time:2496424 psql:test.sql:10: INFO: insert index total time:3016150 psql:test.sql:10: INFO: insert tuple time:211741 ---------------------------- Based on above, I tried to change the order of results that bitmapscan return. In the original test, we prepare data in order (like: generate_series(1,10000,1)), If we change the order we insert the data in the source table, the performance degradation will not always happen[2]. And table size difference will be small. -------------------out of order source table----------------------------- insert into x(a,b,c) select i,i+1,i+2 from generate_series(1,600000000) as t(i) order by random(); ---------------------------------------------------------------------------- Test results when source table out of order(using bitmap heap scan): [2]-------------------------------------------------------- Worker 0: Execution Time: 37028.006 ms Worker 2: Execution Time: 11355.153 ms Worker 4: Execution Time: 9273.398 ms -------------------------------------------------------- So, this performance degradation issue seems related on the order of the data in the source table. It does not always happen. Do we need to do some specific fix for it ? For multi-insert, I guess the reason why it does not solve the performance problem is that we do not actually have a api for multi-index insert, Like the api for tableam rd_tableam->multi_insert(), so we still execute ExecInsertIndexTuples in a loop for the multi index insert. I plan to do some more test for multi-insert and parallel insert with out of order source table. Best regards, houzj