On Thu, Aug 10, 2017 at 6:23 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> Your patch didn't improve planning time without partition-wise join,
> so it's something good to have along with partition-wise join. Given
> that Bitmapsets are used in other parts of the code as well, the
> optimization may affect those parts too, esp. the overhead of
> maintaining first_non_empty_wordnum.

Maybe, but if you consider that this container already deals with the
upper bound moving up by reallocating and copying the whole thing,
adjusting an int when the lower bound moves down doesn't seem like
anything to worry about...
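
To make that concrete, here is a minimal standalone sketch of the idea
(illustrative only, not the actual patch: SketchBitmapset,
sketch_add_member and sketch_overlap are made-up names, and it uses a
fixed-size word array rather than PostgreSQL's reallocating scheme;
first_non_empty_wordnum is the field from the discussion above):

    #include <stdbool.h>
    #include <stdio.h>

    #define BITS_PER_WORD 64
    #define SKETCH_NWORDS 32            /* enough for members 0..2047 */

    typedef unsigned long long bitmapword;

    typedef struct
    {
        /* Lowest index of a non-zero word; SKETCH_NWORDS when empty. */
        int        first_non_empty_wordnum;
        bitmapword words[SKETCH_NWORDS];
    } SketchBitmapset;

    static void
    sketch_add_member(SketchBitmapset *set, int x)
    {
        int wordnum = x / BITS_PER_WORD;

        set->words[wordnum] |= ((bitmapword) 1) << (x % BITS_PER_WORD);
        /* Maintaining the lower bound is one comparison and maybe
         * one int assignment. */
        if (wordnum < set->first_non_empty_wordnum)
            set->first_non_empty_wordnum = wordnum;
    }

    static bool
    sketch_overlap(const SketchBitmapset *a, const SketchBitmapset *b)
    {
        /*
         * Start at the higher of the two lower bounds instead of word
         * 0: every word below that is zero in at least one input.  For
         * two singleton sets near the top of the range, the loop runs
         * only a few times, no matter how large the member values are.
         */
        int i = a->first_non_empty_wordnum > b->first_non_empty_wordnum
            ? a->first_non_empty_wordnum
            : b->first_non_empty_wordnum;

        for (; i < SKETCH_NWORDS; i++)
        {
            if (a->words[i] & b->words[i])
                return true;
        }
        return false;
    }

    int
    main(void)
    {
        SketchBitmapset a = {SKETCH_NWORDS, {0}};
        SketchBitmapset b = {SKETCH_NWORDS, {0}};

        sketch_add_member(&a, 1999);    /* singletons near the top */
        sketch_add_member(&b, 1998);
        printf("overlap: %d\n", (int) sketch_overlap(&a, &b));  /* 0 */
        sketch_add_member(&b, 1999);
        printf("overlap: %d\n", (int) sketch_overlap(&a, &b));  /* 1 */
        return 0;
    }

The point is just that tracking the low end costs a single int
comparison per membership change, while saving whole-word scans in
every loop over the set.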
> The comment at the beginning of the file bitmapset.c says
>
>  3  * bitmapset.c
>  4  *      PostgreSQL generic bitmap set package
>  5  *
>  6  * A bitmap set can represent any set of nonnegative integers, although
>  7  * it is mainly intended for sets where the maximum value is not large,
>  8  * say at most a few hundred.
>
> When we create thousands of children, we have certainly crossed the
> few-hundred threshold. So, there may be other optimizations possible
> there. Probably we should leave that out of the partition-wise join
> patches.

+1

> Do you think solving this problem is a prerequisite for
> partition-wise join? Or should we propose that patch as a separate
> enhancement?

No, I'm not proposing anything yet.  For now I just wanted to share
this observation about where hot CPU time goes in simple tests, and
since it turned out to be a loop within a loop that I could see an
easy way to fix for singleton sets and sets with a small range, I
couldn't help trying it...  But I'm still trying to understand the
bigger picture.  I'll be interested to compare profiles with the
ordered append_rel_list version you have mentioned, to see how that
moves the hot spots.

I guess one very practical question to ask is: can we plan queries
with realistic numbers of partitioned tables and partitions in
reasonable times?  Well, it certainly looks very good for hundreds of
partitions so far...  My own experience of partitioning with other
RDBMSs has been on that order, 'monthly partitions covering the past
10 years' and similar, but on the other hand it wouldn't be surprising
to learn that people want to go to many thousands, especially for
schemas which just keep adding partitions over time and don't want to
drop them.  As for hash partitioning, that seems to be typically done
with numbers like 16, 32 or 64 in other products, from what I can
glean.  Speculation: perhaps hash partitioning is motivated more by
parallelism than by data maintenance, and is thus somehow anchored to
the ground by core counts; if so, no planning performance worries
there, I guess (until core counts double quite a few more times).

One nice thing is that restrictions on the partition key cut down
planning time: where I measure ~7 seconds to plan SELECT * FROM foofoo
JOIN barbar USING (a, b) with 2k partitions, if I add WHERE a > 50
it's ~4 seconds, and with WHERE a > 99 it's ~0.8s.  So if someone has
a keep-adding-more-partitions-over-time model, then at least their
prunable current day/week/whatever queries will not suffer quite so
badly.  (Yeah, my computer seems to be a lot slower than yours for
these tests; clang -O2, no asserts, on a mid 2014 MBP with an i7 @
2.2GHz.)

Curious: would you consider joins between partitioned tables and
non-partitioned tables where the join is pushed down to be a kind of
"partition-wise join", or something else?  If so, would that be a
special case, or just the logical extreme case for
0014-WIP-Partition-wise-join-for-1-1-1-0-0-1-partition-ma.patch,
where one single "partition" on the non-partitioned side maps to all
the partitions on the partitioned side?

--
Thomas Munro
http://www.enterprisedb.com