On 27 September 2017 at 14:22, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> - 0001 includes refactoring that Dilip proposed upthread [1] (added him as
> an author). I slightly tweaked his patch -- renamed the function
> get_matching_clause to match_clauses_to_partkey, similar to
> match_clauses_to_index.
Hi Amit,

I've made a pass over the 0001 patch while trying to get myself up to speed with the various developments that are going on in partitioning right now.

These are just my thoughts from reading over the patch. I understand that there's quite a bit up in the air right now about how all this is going to work, but I have some thoughts about that too, which I wouldn't mind some feedback on, as my line of thought may be off.

Anyway, I'll start with some small typos that I noticed while reading:

partition.c:

+ * telling what kind of NullTest has been applies to the corresponding

"applies" should be "applied"

plancat.c:

 * might've occurred to satisfy the query. Rest of the fields are set in

"Rest of the" should be "The remaining"

Now onto more serious stuff:

allpaths.c:

get_rel_partitions()

I wonder if this function doesn't belong in partition.c. I'd have expected a function to exist per partition type: RANGE and LIST, I'd imagine, should each have their own function in partition.c to eliminate partitions which cannot possibly match, and return the remainder. I see people speaking of HASH partitioning, but we might one day also want something like RANDOM or ROUNDROBIN (I'm not really kidding; imagine routing records to be processed into foreign tables where you never need to query them again). It would be good if we could easily expand this list and not have to touch files all over the optimizer to do that. Of course, there would be other work to do in the executor to implement any new partitioning method too.

I know there's mention somewhere of get_rel_partitions() not being so smart about WHERE partkey > 20 AND partkey > 10; the code does not understand which qual is more restrictive. I think you could probably follow the same rules here as are used in eval_const_expressions(). Over there I see that evaluate_function() will call anything that's not marked as volatile.
I'd imagine, for each partition key, you'd want to store a Datum with the minimum and maximum possible values based on the quals processed. If either the minimum or maximum is still set to NULL, then it's unbounded at that end. If you encounter partkey = Const, then the minimum and maximum can both be set to the value of that Const. From there you can likely ignore any other quals for that partition key, since if the query did contain another qual with partkey = SomeOtherConst, that qual would have become a gating qual during the constant-folding process.

This way, if the user had written WHERE partkey >= 1 AND partkey <= 1, the evaluation would end up the same as if they'd written WHERE partkey = 1, since the minimum and maximum would be the same value in both cases; and when those two values are the same, it would mean just one theoretical binary search on a partition range to find the correct partition instead of two.

I see in get_partitions_for_keys() you've crafted the function to return something to identify which partitions need to be scanned. I think it would be nice to see a special element in the partition array for the NULL and DEFAULT partitions. I imagine 0 and 1, but obviously these would be defined constants somewhere. The signature of that function is not so pretty, and that would likely tidy it up a bit. The matching partition indexes could then be returned as a Bitmapset. Yet I don't really see any code which handles adding the NULL and DEFAULT partitions in get_rel_partitions() either; maybe I've just not looked hard enough yet...

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers