On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangot...@gmail.com> wrote: > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav > <nitinjadhavpostg...@gmail.com> wrote: > > The attached patch also fixes the above comments. > > I noticed that multi-column list partitions containing NULLs don't > work correctly with partition pruning yet. > > create table p0 (a int, b text, c bool) partition by list (a, b, c); > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, > false)); > create table p02 partition of p0 for values in ((1, NULL, false)); > explain select * from p0 where a is null; > QUERY PLAN > -------------------------------------------------------- > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37) > Filter: (a IS NULL) > (2 rows) > > I guess that may be due to the following newly added code being incomplete: > > +/* > + * get_partition_bound_null_index > + * > + * Returns the partition index of the partition bound which accepts NULL. > + */ > +int > +get_partition_bound_null_index(PartitionBoundInfo boundinfo) > +{ > + int i = 0; > + int j = 0; > + > + if (!boundinfo->isnulls) > + return -1; > > - if (!val->constisnull) > - count++; > + for (i = 0; i < boundinfo->ndatums; i++) > + { > + //TODO: Handle for multi-column cases > + for (j = 0; j < 1; j++) > + { > + if (boundinfo->isnulls[i][j]) > + return boundinfo->indexes[i]; > } > } > > + return -1; > +} > > Maybe this function needs to return a "bitmapset" of indexes, because > multiple partitions can now contain NULL values. > > Some other issues I noticed and suggestions for improvement: > > +/* > + * checkForDuplicates > + * > + * Returns TRUE if the list bound element is already present in the list of > + * list bounds, FALSE otherwise. > + */ > +static bool > +checkForDuplicates(List *source, List *searchElem) > > This function name may be too generic. Given that it is specific to > implementing list bound de-duplication, maybe the following signature > is more appropriate: > > static bool > checkListBoundDuplicated(List *list_bounds, List *new_bound) > > Also, better if the function comment mentions those parameter names, like: > > "Returns TRUE if the list bound element 'new_bound' is already present > in the target list 'list_bounds', FALSE otherwise." > > +/* > + * transformPartitionListBounds > + * > + * Converts the expressions of list partition bounds from the raw grammar > + * representation. > > A sentence about the result format would be helpful, like: > > The result is a List of Lists of Const nodes to account for the > partition key possibly containing more than one column. > > + int i = 0; > + int j = 0; > > Better to initialize such loop counters closer to the loop. > > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char)); > + colname[i] = get_attname(RelationGetRelid(parent), > + key->partattrs[i], false); > > The palloc in the 1st statement is wasteful, because the 2nd statement > overwrites its pointer by the pointer to the string palloc'd by > get_attname(). > > + ListCell *cell2 = NULL; > > No need to explicitly initialize the loop variable. > > + RowExpr *rowexpr = NULL; > + > + if (!IsA(expr, RowExpr)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), > + errmsg("Invalid list bound specification"), > + parser_errposition(pstate, exprLocation((Node > *) spec)))); > + > + rowexpr = (RowExpr *) expr; > > It's okay to assign rowexpr at the top here instead of the dummy > NULL-initialization and write the condition as: > > if (!IsA(rowexpr, RowExpr)) > > + if (isDuplicate) > + continue; > + > + result = lappend(result, values); > > I can see you copied this style from the existing code, but how about > writing this simply as: > > if (!isDuplicate) > result = lappend(result, values); > > -/* One value coming from some (index'th) list partition */ > +/* One bound of a list partition */ > typedef struct PartitionListValue > { > int index; > - Datum value; > + Datum *values; > + bool *isnulls; > } PartitionListValue; > > Given that this is a locally-defined struct, I wonder if it makes > sense to rename the struct while we're at it. Call it, say, > PartitionListBound? > > Also, please keep part of the existing comment that says that the > bound belongs to index'th partition. > > Will send more comments in a bit...
+ * partition_bound_accepts_nulls + * + * Returns TRUE if partition bound has NULL value, FALSE otherwise. */ I suggest slight rewording, as follows: "Returns TRUE if any of the partition bounds contains a NULL value, FALSE otherwise." - PartitionListValue *all_values; + PartitionListValue **all_values; ... - all_values = (PartitionListValue *) - palloc(ndatums * sizeof(PartitionListValue)); + ndatums = get_list_datum_count(boundspecs, nparts); + all_values = (PartitionListValue **) + palloc(ndatums * sizeof(PartitionListValue *)); I don't see the need to redefine all_values's pointer type. No need to palloc PartitionListValue repeatedly for every datum as done further down as follows: + all_values[j] = (PartitionListValue *) palloc(sizeof(PartitionListValue)); You do need the following two though: + all_values[j]->values = (Datum *) palloc0(key->partnatts * sizeof(Datum)); + all_values[j]->isnulls = (bool *) palloc0(key->partnatts * sizeof(bool)); If you change the above the way I suggest, you'd also need to revert the following change: - qsort_arg(all_values, ndatums, sizeof(PartitionListValue), + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *), qsort_partition_list_value_cmp, (void *) key); + int orig_index = all_values[i]->index; + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum)); Missing a newline between these two statements. BTW, I noticed that the boundDatums variable is no longer used in create_list_bounds. I traced back its origin and found that a recent commit 53d86957e98 introduced it to implement an idea to reduce the finer-grained pallocs that were being done in create_list_bounds(). I don't think that this patch needs to throw away that work. You can make it work as the attached delta patch that applies on top of v3. Please check. @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval, if (b1->nindexes != b2->nindexes) return false; - if (b1->null_index != b2->null_index) + if (get_partition_bound_null_index(b1) != get_partition_bound_null_index(b2)) As mentioned in the last message, this bit in partition_bounds_equal() needs to be comparing "bitmapsets" of null bound indexes, that is after fixing get_partition_bound_null_index() as previously mentioned. But... @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval, * context. datumIsEqual() should be simple enough to be * safe. */ - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j], + if (b1->isnulls) + b1_isnull = b1->isnulls[i][j]; + if (b2->isnulls) + b2_isnull = b2->isnulls[i][j]; + + /* + * If any of the partition bound has NULL value, then check + * equality for the NULL value instead of comparing the datums + * as it does not contain valid value in case of NULL. + */ + if (b1_isnull || b2_isnull) + { + if (b1_isnull != b2_isnull) + return false; + } ...if you have this in the main loop, I don't think we need the above code stanza which appears to implement a short-cut for this long-form logic. + (key->strategy != PARTITION_STRATEGY_LIST || + !src->isnulls[i][j])) I think it's better to write this condition as follows just like the accompanying condition involving src->kind: (src->nulls == NULL || !src->isnulls[i][j]) (Skipped looking at merge_list_bounds() and related changes for now as I see a lot of TODOs remain to be done.) In check_new_partition_bound(): + Datum *values = (Datum *) palloc0(key->partnatts * sizeof(Datum)); + bool *isnulls = (bool *) palloc0(key->partnatts * sizeof(bool)); Doesn't seem like a bad idea to declare these as: Datum values[PARTITION_MAX_KEYS]; bool isnulls[PARTITION_MAX_KEYS]; I looked at get_qual_for_list_multi_column() and immediately thought that it may be a bad idea. I think it's better to integrate the logic for multi-column case into the existing function even if that makes the function appear more complex. Having two functions with the same goal and mostly the same code is not a good idea mainly because it becomes a maintenance burden. I have attempted a rewrite such that get_qual_for_list() now handles both the single-column and multi-column cases. Changes included in the delta patch. The patch updates some outputs of the newly added tests for multi-column list partitions, because the new code emits the IS NOT NULL tests a bit differently than get_qual_for_list_mutli_column() would. Notably, the old approach would emit IS NOT NULL for every non-NULL datum matched to a given column, not just once for the column. However, the patch makes a few other tests fail, mainly because I had to fix partition_bound_accepts_nulls() to handle the multi-column case, though didn't bother to update all callers of it to also handle the multi-column case correctly. I guess that's a TODO you're going to deal with at some point anyway. :) I still have more than half of v3 left to look at, so will continue looking. In the meantime, please check the changes I suggested, including the delta patch, and let me know your thoughts. -- Amit Langote EDB: http://www.enterprisedb.com
v3_delta_amit.patch
Description: Binary data