On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostg...@gmail.com> wrote:
> > The new list bound binary search and related comparison support > > function look a bit too verbose to me. I was expecting > > partition_list_bsearch() to look very much like > > partition_range_datum_bsearch(), but that is not the case. The > > special case code that you wrote in partition_list_bsearch() seems > > unnecessary, at least in that function. I'm talking about the code > > fragment starting with this comment: > > > > I will look at other parts of the patch next week hopefully. For > > now, attached is a delta patch that applies on top of your v1, which > > does: > > > > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp() > > * Make qsort_partition_list_value_cmp simply call > > partition_lbound_datum_cmp() instead of having its own logic to > > compare input bounds > > * Move partition_lbound_datum_cmp() into partbounds.c as a static > > function (export seems unnecessary) > > * Add a comment for PartitionBoundInfo.isnulls and remove that for > null_index > > Yes. You are right. The extra code added in partition_list_bsearch() > is not required and thanks for sharing the delta patch. It looks good > to me and I have incorporated the changes in the attached patch. > > > I guess you're perhaps trying to address the case where the caller > > does not specify the values for all of the partition key columns, > > which can happen when the partition pruning code needs to handle a set > > of clauses matching only some of the partition key columns. But > > that's a concern of the partition pruning code and so the special case > > should be handled there (if at all), not in the binary search function > > that is shared with other callers. Regarding that, I'm wondering if > > we should require clauses matching all of the partition key columns to > > be found for the pruning code to call the binary search, so do > > something like get_matching_hash_bounds() does: > > > > Do you think that trying to match list partitions even with fewer keys > > is worth the complexity of the implementation? That is, is the use > > case to search for only a subset of partition key columns common > > enough with list partitioning? > > > > If we do decide to implement the special case, remember that to do > > that efficiently, we'd need to require that the subset of matched key > > columns constitutes a prefix, because of the way the datums are > > sorted. That is, match all partitions when the query only contains a > > clause for b when the partition key is (a, b, c), but engage the > > special case of pruning if the query contains clauses for a, or for a > > and b. > > Thanks for the suggestion. Below is the implementation details for the > partition pruning for multi column list partitioning. > > In the existing code (For single column list partitioning) > 1. In gen_partprune_steps_internal(), we try to match the where > clauses provided by the user with the partition key data using > match_clause_to_partition_key(). Based on the match, this function can > return many values like PARTCLAUSE_MATCH_CLAUSE, > PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc. > 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using > gen_prune_steps_from_opexps() (strategy-2) which generate and return a > list of PartitionPruneStepOp that are based on OpExpr and BooleanTest > clauses that have been matched to the partition key and it also takes > care handling prefix of the partition keys. > 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using > gen_prune_step_op() (strategy-1) which generates single > PartitionPruneStepOp since the earlier list partitioning supports > single column and there can be only one NULL value. In > get_matching_list_bounds(), if the nullkeys is not empty, we fetch the > partition index which accepts null and we used to return from here. > > In case of multi column list partitioning, we have columns more than > one and hence there is a possibility of more than one NULL values in > the where clauses. The above mentioned steps are modified like below. > > 1. Modified the match_clause_to_partition_key() to generate an object > of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in > case of clauses related to NULL. The information required to generate > PartClauseInfo is populated here like the constant expression > consisting of (Datum) 0, op_strategy, op_is_ne, etc. > 2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2 > (gen_prune_steps_from_opexps) to generate partition pruning steps. > This function takes care of generating a list of pruning steps if > there are multiple clauses and also takes care of handling prefixes. > 3. Modified perform_pruning_base_step() to generate the datum values > and isnulls data of the where clauses. In case if any of the key > contains NULL value then the corresponding datum value is 0. > 4. Modified get_matching_list_bounds() to generate the minimum offset > and/or maximum offset of the matched values based on the difference > operation strategies. Now since the NULL containing bound values are > part of 'boundinfo', changed the code accordingly to include the NULL > containing partitions or not in different scenarios like > InvalidStrategy, etc. > > I have done some cosmetic changes to > v1_multi_column_list_partitioning.patch. So all the above code changes > related to partition pruning are merged with the previous patch and > also included the delta patch shared by you. Hence sharing a single > patch. > > Kindly have a look and share your thoughts. > > > Hi, bq. Supported new syantx to allow mentioning multiple key information. syantx -> syntax + isDuplicate = checkForDuplicates(result, values); + if (isDuplicate) + continue; It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates(). + //TODO: Handle for multi-column cases + for (j = 0; j < 1; j++) Is this part going to be updated in the next patch? Cheers