On Sat, 9 Mar 2019 at 13:18, David Rowley <david.row...@2ndquadrant.com> wrote: > This is something I'd like to look into for v13. I think it'll be > much easier to do if we can get your List reimplementation patch in > first. That's going to allow list_nth on PlannerInfo->eq_classes to > work quickly and will remove the need for having to build an array to > store the classes, and as you mention, RelOptInfo could store a > Bitmapset to store which ECs have members for this rel. I've > modified the CF entry to target v13.
I started looking at this again and I came up with the attached eclass_indexes_v2.patch. This modifies the original patch significantly so that we no longer build this big eclass index when we think we might start to need it. Instead, I've added a Bitmapset field to RelOptInfo named "eclass_indexes". During add_eq_member() we just note down the PlannerInfo->eq_classes index of the eclass we're adding to in each RelOptInfo->eclass_indexes that's involved in the new eclass member being added. This means we can use the Bitmapset lookup anytime we like. That allowed me to get rid of those special cases I had added to make use of the index when it exists. I've now modified all the existing code to make use of the RelOptInfo->eclass_indexes field. As mentioned above, my thoughts on this patch were that this new method would only be any good once we got Tom's list implementation patch as that makes list_nth() O(1) instead of O(N). On benchmarking this I was quite surprised that it still improves performance quite a bit even without the list reimplementation patch. Using Tom's test case [1], I get the following: * master (82a5649fb) latency average = 6992.320 ms latency average = 6990.297 ms latency average = 7030.619 ms * master + eclass_indexes_v2 latency average = 2537.536 ms latency average = 2530.824 ms latency average = 2543.770 ms If I add in Tom's list reimplementation too, I get: * master + Tom's list reimplementation v3 + eclass_indexes latency average = 1225.910 ms latency average = 1209.565 ms latency average = 1207.326 ms I really didn't expect just this patch alone to speed this up as it calls list_nth() in a loop. I can only imagine that with this workload that these list_nth() loops are not performing many loops, otherwise, the performance would die off quickly. I thought about how we could defend against workloads where there's a large number of eclasses and most match a given relation. I ended up with a small function named list_skip_forward() that simply keeps looping for N times eating N ListCells into the given ListCell. This does improve performance a little bit more, but probably, more importantly, should be a good defence against the worst case. As is, the function would become completely redundant with Tom's list reimplementation patch, so for now, I just snuck it in as a static function in equivclass.c. I've attached this list_skip_forward.patch too. This patch is a bit rough around the edges as I only just started playing with it in the past 30 mins or so. Perhaps it shows that we might be able to fix this in PG12 and not have to wait for the list reimplementation at all. The performance with both attached patches is: * master + eclass_indexes + list_skip_forward_v0 latency average = 2375.383 ms latency average = 2351.450 ms latency average = 2339.259 ms Still nowhere near as good as with the list reimplementation patch, but way better than master. [1] https://www.postgresql.org/message-id/6970.1545327...@sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
eclass_indexes_v2.patch
Description: Binary data
list_skip_forward_v0.patch
Description: Binary data