Hi, while looking at the other thread related to postgres_fdw batching [1] and testing with very large batches, I noticed this disappointing behavior when inserting 1M rows (just integers, nothing fancy):
   no batching:  64782 ms
   100 rows:      2118 ms
   32767 rows:   41115 ms

Pretty nice improvement when batching 100 rows, but then it all goes wrong for some reason. The problem is pretty obvious from a perf profile:

    --100.00%--ExecModifyTable
               |
                --99.70%--ExecInsert
                          |
                          |--50.87%--MakeSingleTupleTableSlot
                          |          |
                          |           --50.85%--MakeTupleTableSlot
                          |                     |
                          |                      --50.70%--IncrTupleDescRefCount
                          |                                |
                          |                                 --50.69%--ResourceOwnerRememberTupleDesc
                          |                                           |
                          |                                            --50.69%--ResourceArrayAdd
                          |
                          |--48.18%--ExecBatchInsert
                          |          |
                          |           --47.92%--ExecDropSingleTupleTableSlot
                          |                     |
                          |                     |--47.17%--DecrTupleDescRefCount
                          |                     |          |
                          |                     |           --47.15%--ResourceOwnerForgetTupleDesc
                          |                     |                     |
                          |                     |                      --47.14%--ResourceArrayRemove
                          |                     |
                          |                      --0.53%--ExecClearTuple
                          |
                           --0.60%--ExecCopySlot

There are two problems at play here.

Firstly, the way it's coded now, the slots are pretty much re-created for each batch. So with 1M rows and batches of 32k rows, that's ~30 create/drop cycles. That seems rather wasteful, and it shouldn't be too difficult to keep the slots across batches. (We can't initialize all the slots in advance, because we don't know how many will be needed, but we don't have to release them between batches.) There's a rough sketch of this in the PS at the end of this message.

The other problem is that ResourceArrayAdd/Remove seem to behave rather poorly with very many elements - I'm not sure if it's O(N^2) or worse, but growing the array and the linear searches seem to be quite expensive. (The PPS has a toy model of that add/remove pattern.)

I'll take a look at fixing the first point, but I'm not entirely sure how much that will improve the situation.

regards

[1] https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
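
PS: To illustrate the first point, here's roughly what I have in mind - an untested sketch, assuming we can stash the slots in ResultRelInfo (ri_Slots / ri_NumSlots / ri_NumSlotsInitialized are made-up names here, and growing the ri_Slots array is left out). The idea is to create a slot only when a batch grows past the number of slots created so far, and to clear slots after flushing instead of dropping them:

    /*
     * In ExecInsert(), when adding a row to the pending batch: reuse
     * the slot at this batch position if one exists already, else
     * create it once and keep it for all future batches.
     */
    if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_NumSlotsInitialized)
    {
        resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
            MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
                                     slot->tts_ops);
        resultRelInfo->ri_NumSlotsInitialized++;
    }

    /* copy the new tuple into the (possibly reused) slot */
    ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots], slot);
    resultRelInfo->ri_NumSlots++;

and in ExecBatchInsert(), instead of the current ExecDropSingleTupleTableSlot() loop:

    /* keep the slots for the next batch, just forget the tuples */
    for (i = 0; i < numSlots; i++)
        ExecClearTuple(slots[i]);

That way each slot is created (and registered with the resource owner) once per query instead of once per batch, which should make most of the ResourceOwner traffic disappear from the profile.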
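
PPS: And a toy model of the second point. This is not the actual resowner.c logic, just the generic "append, then linear-search remove" pattern, with the removals in the same FIFO order the slots get dropped in ExecBatchInsert. It shows how the number of comparisons blows up with the batch size:

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        int     n = 32767;          /* one batch worth of slots */
        int    *arr = malloc(n * sizeof(int));
        long    comparisons = 0;
        int     len = 0;

        /* "remember" the resources, one append each */
        for (int i = 0; i < n; i++)
            arr[len++] = i;

        /* "forget" them in the same order, searching from the end */
        for (int i = 0; i < n; i++)
        {
            for (int j = len - 1; j >= 0; j--)
            {
                comparisons++;
                if (arr[j] == i)
                {
                    arr[j] = arr[--len];    /* fill hole with last item */
                    break;
                }
            }
        }

        /* ~268M comparisons for n = 32767, vs ~2.6K for n = 100 */
        printf("comparisons: %ld\n", comparisons);
        free(arr);
        return 0;
    }

Obviously resowner.c is more complicated than this, but the shape matches what the profile shows - with 100-row batches this is noise, with 32k-row batches it dominates.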