On 18 July 2018 at 20:29, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > Let me know what you think of the code in the updated patch.
Thanks for sending the updated patch. I looked over it tonight and made a number of changes: 1) Got rid of PARTITION_ROUTING_MAXSIZE. The code using this was useless since the int would have wrapped long before it reached UINT_MAX. There's no shortage of other code doubling the size of an array by multiplying it by 2 unconditionally without considering overflowing an int. Unsure why you considered this more risky. 2) Fixed a series of bugs regarding the size of the arrays in PartitionTupleRouting. The map arrays and the partitions array could differ in size despite your comment that claimed child_parent_tupconv_maps was the same size as 'partitions' when non-NULL. The map arrays being a different size than the partitions array caused the following two cases to segfault. I've included two cases as it was two seperate bugs that caused them. -- case 1 drop table listp; create table listp (a int, b int) partition by list (a); create table listp1 partition of listp for values in (1); create table listp2 partition of listp for values in (2); create table listp3 partition of listp for values in (3); create table listp4 partition of listp for values in (4); create table listp5 partition of listp for values in (5); create table listp6 partition of listp for values in (6); create table listp7 partition of listp for values in (7); create table listp8 partition of listp for values in (8); create table listp9 (b int, a int); alter table listp attach partition listp9 for values in(9); insert into listp select generate_series(1,9); -- case 2 drop table listp; create table listp (a int, b int) partition by list (a); create table listp1 (b int, a int); alter table listp attach partition listp1 for values in(1); create table listp1 partition of listp for values in (1); create table listp2 partition of listp for values in (2); create table listp3 partition of listp for values in (3); create table listp4 partition of listp for values in (4); create table listp5 partition of listp for values in (5); create table listp6 partition of listp for values in (6); create table listp7 partition of listp for values in (7); create table listp8 partition of listp for values in (8); create table listp9 partition of listp for values in (9); insert into listp select generate_series(1,9); 3) Got rid of ExecUseUpdateResultRelForRouting. I started to change this to remove references to UPDATE in order to make it more friendly towards other possible future node types that it would get used for (aka MERGE). In the end, I found that performance could regress when in cases like: drop table listp; create table listp (a int) partition by list(a); \o /dev/null \timing off select 'create table listp'||x::Text||' partition of listp for values in('||x::Text||');' from generate_series(1,1000) x; \gexec \o insert into listp select x from generate_series(1,999) x; \timing on update listp set a = a+1; It's true that UPDATEs with a large number of subplans performance is quite terrible today in the planner, but this code made the performance of planning+execution a bit worse. If we get around to fixing the inheritance planner then I think ExecUseUpdateResultRelForRouting() could easily appear in profiles. I ended up rewriting it to just get called once and build a hash table by Oid storing a ResultRelInfo pointer. This also gets rid of the slow nested loop in the cleanup operation inside ExecCleanupTupleRouting(). 4) Did some tuning work in ExecFindPartition() getting rid of a redundant check after the loop completion. Also added some likely() and unlikely() decorations around some conditions. 5) Updated some newly out-dated comments since your patch in execPartition.h. 6) Replaced the palloc0() in ExecSetupPartitionTupleRouting() with a palloc() updating the few fields that were not initialised. This might save a few TPS (at least once we get rid of the all partition locking) in the single-row INSERT case, but I've not tested the performance of this yet. 7) Also moved and edited some comments above ExecSetupPartitionTupleRouting() that I felt explained a little too much about some internal implementation details. One thing that I thought of, but didn't do was just having ExecFindPartition() return the ResultRelInfo. I think it would be much nicer in both call sites to not have to check the ->partitions array to get that. The copy.c call site would need a few modifications around the detection code to see if the partition has changed, but it all looks quite possible to change. I left this for now as I have another patch which touches all that code that I feel is closer to commit than this patch is. I've attached a delta of the changes I made since your v2 delta and also a complete updated patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
v2-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch
Description: Binary data
v2_insert_speedups_delta.patch
Description: Binary data