Hi Nitin. I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where update row movement is not working properly. Please find the test case below.
postgres=# create table p0 (a int, b text, c bool) partition by list (a,b,c); CREATE TABLE postgres=# create table p01 partition of p0 for values in ((1,1,true)); CREATE TABLE postgres=# create table p02 partition of p0 for values in ((1,NULL,false)); CREATE TABLE postgres=# insert into p0 values (1,'1',true); INSERT 0 1 postgres=# insert into p0 values (1,NULL,false); INSERT 0 1 postgres=# select tableoid::regclass,* from p0; tableoid | a | b | c ----------+---+---+--- p01 | 1 | 1 | t p02 | 1 | | f (2 rows) postgres=# update p0 set b = NULL; UPDATE 2 postgres=# select tableoid::regclass,* from p0; tableoid | a | b | c ----------+---+---+--- p01 | 1 | | t p02 | 1 | | f (2 rows) I think this update should fail as there is no partition satisfying update row (1,NULL,true). Thanks & Regards, Rajkumar Raghuwanshi On Fri, Aug 27, 2021 at 12:53 PM Nitin Jadhav <nitinjadhavpostg...@gmail.com> wrote: > > + * isnulls is an array of boolean-tuples with key->partnatts booleans > values > > + * each. Currently only used for list partitioning, it stores whether a > > > > I think 'booleans' should be 'boolean'. > > The trailing word 'each' is unnecessary. > > > bq. Supported new syantx to allow mentioning multiple key information. > > > > syantx -> syntax > > > + isDuplicate = checkForDuplicates(result, values); > > + if (isDuplicate) > > + continue; > > > > It seems the variable isDuplicate is not needed. The if statement can > directly check the return value from checkForDuplicates(). > > I agree that isDuplicate is not required. > Thanks for sharing the comments. I will take care of these comments in > the next patch. > > > + //TODO: Handle for multi-column cases > > + for (j = 0; j < 1; j++) > > > > Is this part going to be updated in the next patch? > > Yes. The code changes related to partition-wise join are in progress. > I will handle these in the next patch. > > Thanks & Regards, > Nitin Jadhav > > On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu <z...@yugabyte.com> wrote: > > > > > > > > On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav < > nitinjadhavpostg...@gmail.com> wrote: > >> > >> > The new list bound binary search and related comparison support > >> > function look a bit too verbose to me. I was expecting > >> > partition_list_bsearch() to look very much like > >> > partition_range_datum_bsearch(), but that is not the case. The > >> > special case code that you wrote in partition_list_bsearch() seems > >> > unnecessary, at least in that function. I'm talking about the code > >> > fragment starting with this comment: > >> > > >> > I will look at other parts of the patch next week hopefully. For > >> > now, attached is a delta patch that applies on top of your v1, which > >> > does: > >> > > >> > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp() > >> > * Make qsort_partition_list_value_cmp simply call > >> > partition_lbound_datum_cmp() instead of having its own logic to > >> > compare input bounds > >> > * Move partition_lbound_datum_cmp() into partbounds.c as a static > >> > function (export seems unnecessary) > >> > * Add a comment for PartitionBoundInfo.isnulls and remove that for > null_index > >> > >> Yes. You are right. The extra code added in partition_list_bsearch() > >> is not required and thanks for sharing the delta patch. It looks good > >> to me and I have incorporated the changes in the attached patch. > >> > >> > I guess you're perhaps trying to address the case where the caller > >> > does not specify the values for all of the partition key columns, > >> > which can happen when the partition pruning code needs to handle a set > >> > of clauses matching only some of the partition key columns. But > >> > that's a concern of the partition pruning code and so the special case > >> > should be handled there (if at all), not in the binary search function > >> > that is shared with other callers. Regarding that, I'm wondering if > >> > we should require clauses matching all of the partition key columns to > >> > be found for the pruning code to call the binary search, so do > >> > something like get_matching_hash_bounds() does: > >> > > >> > Do you think that trying to match list partitions even with fewer keys > >> > is worth the complexity of the implementation? That is, is the use > >> > case to search for only a subset of partition key columns common > >> > enough with list partitioning? > >> > > >> > If we do decide to implement the special case, remember that to do > >> > that efficiently, we'd need to require that the subset of matched key > >> > columns constitutes a prefix, because of the way the datums are > >> > sorted. That is, match all partitions when the query only contains a > >> > clause for b when the partition key is (a, b, c), but engage the > >> > special case of pruning if the query contains clauses for a, or for a > >> > and b. > >> > >> Thanks for the suggestion. Below is the implementation details for the > >> partition pruning for multi column list partitioning. > >> > >> In the existing code (For single column list partitioning) > >> 1. In gen_partprune_steps_internal(), we try to match the where > >> clauses provided by the user with the partition key data using > >> match_clause_to_partition_key(). Based on the match, this function can > >> return many values like PARTCLAUSE_MATCH_CLAUSE, > >> PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc. > >> 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using > >> gen_prune_steps_from_opexps() (strategy-2) which generate and return a > >> list of PartitionPruneStepOp that are based on OpExpr and BooleanTest > >> clauses that have been matched to the partition key and it also takes > >> care handling prefix of the partition keys. > >> 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using > >> gen_prune_step_op() (strategy-1) which generates single > >> PartitionPruneStepOp since the earlier list partitioning supports > >> single column and there can be only one NULL value. In > >> get_matching_list_bounds(), if the nullkeys is not empty, we fetch the > >> partition index which accepts null and we used to return from here. > >> > >> In case of multi column list partitioning, we have columns more than > >> one and hence there is a possibility of more than one NULL values in > >> the where clauses. The above mentioned steps are modified like below. > >> > >> 1. Modified the match_clause_to_partition_key() to generate an object > >> of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in > >> case of clauses related to NULL. The information required to generate > >> PartClauseInfo is populated here like the constant expression > >> consisting of (Datum) 0, op_strategy, op_is_ne, etc. > >> 2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2 > >> (gen_prune_steps_from_opexps) to generate partition pruning steps. > >> This function takes care of generating a list of pruning steps if > >> there are multiple clauses and also takes care of handling prefixes. > >> 3. Modified perform_pruning_base_step() to generate the datum values > >> and isnulls data of the where clauses. In case if any of the key > >> contains NULL value then the corresponding datum value is 0. > >> 4. Modified get_matching_list_bounds() to generate the minimum offset > >> and/or maximum offset of the matched values based on the difference > >> operation strategies. Now since the NULL containing bound values are > >> part of 'boundinfo', changed the code accordingly to include the NULL > >> containing partitions or not in different scenarios like > >> InvalidStrategy, etc. > >> > >> I have done some cosmetic changes to > >> v1_multi_column_list_partitioning.patch. So all the above code changes > >> related to partition pruning are merged with the previous patch and > >> also included the delta patch shared by you. Hence sharing a single > >> patch. > >> > >> Kindly have a look and share your thoughts. > >> > >> > > Hi, > > > > bq. Supported new syantx to allow mentioning multiple key information. > > > > syantx -> syntax > > > > + isDuplicate = checkForDuplicates(result, values); > > + if (isDuplicate) > > + continue; > > > > It seems the variable isDuplicate is not needed. The if statement can > directly check the return value from checkForDuplicates(). > > > > + //TODO: Handle for multi-column cases > > + for (j = 0; j < 1; j++) > > > > Is this part going to be updated in the next patch? > > > > Cheers > > >