Re: Multi-Column List Partitioning

Rajkumar Raghuwanshi Sun, 10 Oct 2021 22:36:28 -0700

Thanks for the patch, it applied cleanly and fixed the reported issue.  I
observed another case where
In case of multi-col list partition on the same column query is not picking
partition wise join. Is this expected?


CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
(('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN
(('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt1_p3 PARTITION OF plt1 DEFAULT;
INSERT INTO plt1 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM
generate_series(0, 500) i WHERE i % 11 NOT  IN (0,10);
ANALYSE plt1;
CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN
(('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN
(('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
INSERT INTO plt2 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM
generate_series(0, 500) i WHERE i % 11 NOT  IN (0,10);
ANALYSE plt2;
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN
plt2 t2 ON t1.c = t2.c;

postgres=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1
INNER JOIN plt2 t2 ON t1.c = t2.c;
                 QUERY PLAN
--------------------------------------------
 Hash Join
   Hash Cond: ((t1.c)::text = (t2.c)::text)
   ->  Append
         ->  Seq Scan on plt1_p1 t1_1
         ->  Seq Scan on plt1_p2 t1_2
         ->  Seq Scan on plt1_p3 t1_3
   ->  Hash
         ->  Append
               ->  Seq Scan on plt2_p1 t2_1
               ->  Seq Scan on plt2_p2 t2_2
               ->  Seq Scan on plt2_p3 t2_3
(11 rows)

Thanks & Regards,
Rajkumar Raghuwanshi



On Thu, Oct 7, 2021 at 6:03 PM Nitin Jadhav <[email protected]>
wrote:

> Thanks Rajkumar for testing.
>
> > I think it should throw an error as the partition by list has only 1
> column but we are giving 2 values.
>
> I also agree that it should throw an error in the above case. Fixed the
> issue in the attached patch. Also added related test cases to the
> regression test suite.
>
>
> > also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’
> instead of ('0001','0001').
>
> Now throwing errors in the initial stage, this case doesn't arise.
>
> Please share if you find any other issues.
>
> Thanks & Regards,
> Nitin Jadhav
>
>
>
>
>
> On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi <
> [email protected]> wrote:
>
>> Thanks Nitin,
>>
>> v4 patches applied cleanly and make check is passing now. While testing
>> further I observed that if multiple values are given for a single
>> column list partition it is not giving error instead it is changing
>> values itself. Please find the example below.
>>
>> postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
>> CREATE TABLE
>> postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
>> (('0001','0001'),('0002','0002'));
>> CREATE TABLE
>> postgres=# \d+ plt1;
>>                                           Partitioned table "public.plt1"
>>  Column |       Type        | Collation | Nullable | Default | Storage  |
>> Compression | Stats target | Description
>>
>> --------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
>>  a      | integer           |           |          |         | plain    |
>>             |              |
>>  b      | character varying |           |          |         | extended |
>>             |              |
>> Partition key: LIST (b)
>> Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')
>>
>> I think it should throw an error as the partition by list has only 1
>> column but we are giving 2 values.
>> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’
>> instead of ('0001','0001').
>>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>>
>>
>> On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <
>> [email protected]> wrote:
>>
>>> > > On PG head + Nitin's v3 patch + Amit's Delta patch.  Make check is
>>> failing with below errors.
>>> >
>>> > Thanks Rajkumar for testing.
>>> >
>>> > Here's a v2 of the delta patch that should fix both of these test
>>> > failures.  As I mentioned in my last reply, my delta patch fixed what
>>> > I think were problems in Nitin's v3 patch but were not complete by
>>> > themselves.  Especially, I hadn't bothered to investigate various /*
>>> > TODO: handle multi-column list partitioning */ sites to deal with my
>>> > own changes.
>>>
>>> Thanks Rajkumar for testing and Thank you Amit for working on v2 of
>>> the delta patch. Actually I had done the code changes related to
>>> partition-wise join and I was in the middle of fixing the review
>>> comments, So I could not share the patch. Anyways thanks for your
>>> efforts.
>>>
>>> > I noticed that multi-column list partitions containing NULLs don't
>>> > work correctly with partition pruning yet.
>>> >
>>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL,
>>> 1, false));
>>> > create table p02 partition of p0 for values in ((1, NULL, false));
>>> > explain select * from p0 where a is null;
>>> >                        QUERY PLAN
>>> > --------------------------------------------------------
>>> > Seq Scan on p01 p0  (cost=0.00..22.50 rows=6 width=37)
>>> >  Filter: (a IS NULL)
>>> > (2 rows)
>>> >
>>> > In the attached updated version, I've dealt with some of those such
>>> > that at least the existing cases exercising partition pruning and
>>> > partition wise joins now pass.
>>>
>>> wrt partition pruning, I have checked the output of the above case
>>> with the v2 version of the delta patch and without that. The output
>>> remains same. Kindly let me know if I am missing something. But I feel
>>> the above output is correct as the partition p01 is the only partition
>>> which contains NULL value for column a, hence it is showing "Seq scan
>>> on p01" in the output. Kindly correct me if I am wrong. I feel the
>>> code changes related to 'null_keys' is not required, hence not
>>> incorporated that in the attached patch.
>>>
>>> wrt partition-wise join, I had run the regression test (with new cases
>>> related to partition-wise join) on v2 of the delta patch and observed
>>> the crash. Hence I have not incorporated the partition-wise join
>>> related code from v2 of delta patch to main v4 patch. Instead I have
>>> added the partition-wise join related code done by me in the attached
>>> patch. Please share your thoughts and if possible we can improvise the
>>> code. Rest of the changes looks good to me and I have incorporated
>>> that in the attached patch.
>>>
>>>
>>> > I guess that may be due to the following newly added code being
>>> incomplete:
>>> > Maybe this function needs to return a "bitmapset" of indexes, because
>>> > multiple partitions can now contain NULL values.
>>>
>>> I feel this function is not required at all as we are not separating
>>> the non null and null partitions now. Removed in the attached patch.
>>> Also removed the "scan_null' variable from the structure
>>> "PruneStepResult" and cleaned up the corresponding code blocks.
>>>
>>>
>>> > This function name may be too generic.  Given that it is specific to
>>> > implementing list bound de-duplication, maybe the following signature
>>> > is more appropriate:
>>> >
>>> > static bool
>>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>>>
>>> Yes. The function name looks more generic. How about using
>>> "isListBoundDuplicated()"? I have used this name in the patch. Please
>>> let me know if that does not look correct.
>>>
>>>
>>> > Also, better if the function comment mentions those parameter names,
>>> like:
>>> >
>>> > "Returns TRUE if the list bound element 'new_bound' is already present
>>> > in the target list 'list_bounds', FALSE otherwise."
>>>
>>> Fixed.
>>>
>>>
>>> > +/*
>>> > + * transformPartitionListBounds
>>> > + *
>>> > + * Converts the expressions of list partition bounds from the raw
>>> grammar
>>> > + * representation.
>>> >
>>> > A sentence about the result format would be helpful, like:
>>> >
>>> > The result is a List of Lists of Const nodes to account for the
>>> > partition key possibly containing more than one column.
>>>
>>> Fixed.
>>>
>>>
>>> > +   int             i = 0;
>>> > +   int             j = 0;
>>> >
>>> > Better to initialize such loop counters closer to the loop.
>>>
>>> Fixed in all the places.
>>>
>>>
>>> > +           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>>> > +           colname[i] = get_attname(RelationGetRelid(parent),
>>> > +                                    key->partattrs[i], false);
>>> >
>>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>>> > overwrites its pointer by the pointer to the string palloc'd by
>>> > get_attname().
>>>
>>> Removed the 1st statement as it is not required.
>>>
>>>
>>> > +           ListCell   *cell2 = NULL;
>>> >
>>> > No need to explicitly initialize the loop variable.
>>>
>>> Fixed in all the places.
>>>
>>>
>>> > +           RowExpr     *rowexpr = NULL;
>>> > +
>>> > +           if (!IsA(expr, RowExpr))
>>> > +               ereport(ERROR,
>>> > +                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>> > +                       errmsg("Invalid list bound specification"),
>>> > +                       parser_errposition(pstate, exprLocation((Node
>>> > *) spec))));
>>> > +
>>> > +           rowexpr = (RowExpr *) expr;
>>> >
>>> > It's okay to assign rowexpr at the top here instead of the dummy
>>> > NULL-initialization and write the condition as:
>>> >
>>> >    if (!IsA(rowexpr, RowExpr))
>>>
>>> Fixed.
>>>
>>>
>>> > +       if (isDuplicate)
>>> > +           continue;
>>> > +
>>> > +       result = lappend(result, values);
>>> >
>>> > I can see you copied this style from the existing code, but how about
>>> > writing this simply as:
>>> >
>>> >    if (!isDuplicate)
>>> >        result = lappend(result, values);
>>>
>>> This looks good. I have changed in the patch.
>>>
>>>
>>> > -/* One value coming from some (index'th) list partition */
>>> > +/* One bound of a list partition */
>>> > typedef struct PartitionListValue
>>> > {
>>> >    int         index;
>>> > -   Datum       value;
>>> > +   Datum      *values;
>>> > +   bool       *isnulls;
>>> >  } PartitionListValue;
>>> >
>>> > Given that this is a locally-defined struct, I wonder if it makes
>>> > sense to rename the struct while we're at it.  Call it, say,
>>> > PartitionListBound?
>>>
>>> Yes. PartitionListBound looks more appropriate and it also matches the
>>> similar structures of the other partition strategies.
>>>
>>> > Also, please keep part of the existing comment that says that the
>>> > bound belongs to index'th partition.
>>>
>>> Retained the old comment.
>>>
>>>
>>> > + * partition_bound_accepts_nulls
>>> > + *
>>> > + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>>> >  */
>>> >
>>> > I suggest slight rewording, as follows:
>>> >
>>> > "Returns TRUE if any of the partition bounds contains a NULL value,
>>> > FALSE otherwise."
>>>
>>> Fixed.
>>>
>>>
>>> > -   PartitionListValue *all_values;
>>> > +   PartitionListValue **all_values;
>>> > ...
>>> > -   all_values = (PartitionListValue *)
>>> > -       palloc(ndatums * sizeof(PartitionListValue));
>>> > +   ndatums = get_list_datum_count(boundspecs, nparts);
>>> > +   all_values = (PartitionListValue **)
>>> > +       palloc(ndatums * sizeof(PartitionListValue *));
>>> >
>>> > I don't see the need to redefine all_values's pointer type.  No need
>>> > to palloc PartitionListValue repeatedly for every datum as done
>>> > further down as follows:
>>> >
>>> > +           all_values[j] = (PartitionListValue *)
>>> > palloc(sizeof(PartitionListValue));
>>> >
>>> > You do need the following two though:
>>> >
>>> > +           all_values[j]->values = (Datum *) palloc0(key->partnatts *
>>> > sizeof(Datum));
>>> > +           all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>>> > sizeof(bool));
>>> >
>>> > If you change the above the way I suggest, you'd also need to revert
>>> > the following change:
>>> >
>>> > -   qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>>> > +   qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>>> >              qsort_partition_list_value_cmp, (void *) key);
>>> >
>>> > +       int         orig_index = all_values[i]->index;
>>> > +       boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
>>> sizeof(Datum));
>>> >
>>> > Missing a newline between these two statements.
>>>
>>> Fixed. Made necessary changes to keep the intent of existing code.
>>>
>>>
>>> > @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>>> > *parttyplen, bool *parttypbyval,
>>> >    if (b1->nindexes != b2->nindexes)
>>> >        return false;
>>> >
>>> > -   if (b1->null_index != b2->null_index)
>>> > +   if (get_partition_bound_null_index(b1) !=
>>> > get_partition_bound_null_index(b2))
>>> >
>>> > As mentioned in the last message, this bit in partition_bounds_equal()
>>> > needs to be comparing "bitmapsets" of null bound indexes, that is
>>> > after fixing get_partition_bound_null_index() as previously mentioned.
>>>
>>> As mentioned earlier, removed the functionality of
>>> get_partition_bound_null_index(), hence the above condition is not
>>> required and removed.
>>>
>>> > But...
>>> >
>>> > @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>>> > *parttyplen, bool *parttypbyval,
>>> >                 * context.  datumIsEqual() should be simple enough to
>>> be
>>> >                 * safe.
>>> >                 */
>>> > -               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>>> > +               if (b1->isnulls)
>>> > +                   b1_isnull = b1->isnulls[i][j];
>>> > +               if (b2->isnulls)
>>> > +                   b2_isnull = b2->isnulls[i][j];
>>> > +
>>> > +               /*
>>> > +                * If any of the partition bound has NULL value, then
>>> check
>>> > +                * equality for the NULL value instead of comparing
>>> the datums
>>> > +                * as it does not contain valid value in case of NULL.
>>> > +                */
>>> > +               if (b1_isnull || b2_isnull)
>>> > +               {
>>> > +                   if (b1_isnull != b2_isnull)
>>> > +                       return false;
>>> > +               }
>>> >
>>> > ...if you have this in the main loop, I don't think we need the above
>>> > code stanza which appears to implement a short-cut for this long-form
>>> > logic.
>>>
>>> Yes. May be we could have ignored the above code stanza if we would
>>> have comparing the null indexes using get_partition_bound_null_index()
>>> in the beginning of the function. But hence we are not separating the
>>> non null partitions and null partitions, I would like to keep the
>>> logic in the inner loop as we are doing it for non null bound values
>>> in the above code stanza, just to give a feel that null bound values
>>> are also handled the same way as non null values. Please correct me if
>>> I am wrong.
>>>
>>>
>>> > +               (key->strategy != PARTITION_STRATEGY_LIST ||
>>> > +                !src->isnulls[i][j]))
>>> >
>>> > I think it's better to write this condition as follows just like the
>>> > accompanying condition involving src->kind:
>>> >
>>> >    (src->nulls == NULL || !src->isnulls[i][j])
>>>
>>> Fixed.
>>>
>>>
>>> > In check_new_partition_bound():
>>> >
>>> > +                       Datum      *values = (Datum *)
>>> > palloc0(key->partnatts * sizeof(Datum));
>>> > +                       bool       *isnulls = (bool *)
>>> > palloc0(key->partnatts * sizeof(bool));
>>> >
>>> > Doesn't seem like a bad idea to declare these as:
>>> >
>>> >     Datum    values[PARTITION_MAX_KEYS];
>>> >    bool        isnulls[PARTITION_MAX_KEYS];
>>>
>>> Thanks for the suggestion. I have changed as above.
>>>
>>> > I looked at get_qual_for_list_multi_column() and immediately thought
>>> > that it may be a bad idea.  I think it's better to integrate the logic
>>> > for multi-column case into the existing function even if that makes
>>> > the function appear more complex.  Having two functions with the same
>>> > goal and mostly the same code is not a good idea mainly because it
>>> > becomes a maintenance burden.
>>>
>>> Actually I had written a separate function because of the complexity.
>>> Now I have understood that since the objective is same, it should be
>>> done in a single function irrespective of complexity.
>>>
>>> > I have attempted a rewrite such that get_qual_for_list() now handles
>>> > both the single-column and multi-column cases.  Changes included in
>>> > the delta patch.  The patch updates some outputs of the newly added
>>> > tests for multi-column list partitions, because the new code emits the
>>> > IS NOT NULL tests a bit differently than
>>> > get_qual_for_list_mutli_column() would.  Notably, the old approach
>>> > would emit IS NOT NULL for every non-NULL datum matched to a given
>>> > column, not just once for the column.  However, the patch makes a few
>>> > other tests fail, mainly because I had to fix
>>> > partition_bound_accepts_nulls() to handle the multi-column case,
>>> > though didn't bother to update all callers of it to also handle the
>>> > multi-column case correctly.  I guess that's a TODO you're going to
>>> > deal with at some point anyway. :)
>>>
>>> Thank you very much for your efforts. The changes looks good to me and
>>> I have incorporated these changes in the attached patch.
>>>
>>> I have completed the coding for all the TODOs and hence removed in the
>>> patch. The naming conventions used for function/variable names varies
>>> across the files. Some places it is like 'namesLikeThis' and in some
>>> place it is like 'names_like_this'. I have used the naming conventions
>>> based on the surrounding styles used. I am happy to change those if
>>> required.
>>>
>>> I have verified 'make check' with the attached patch and it is working
>>> fine.
>>>
>>>
>>> Thanks & Regards,
>>> Nitin Jadhav
>>>
>>>
>>> On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
>>> <[email protected]> wrote:
>>> >
>>> > On PG head + Nitin's v3 patch + Amit's Delta patch.  Make check is
>>> failing with below errors.
>>> >
>>> > --inherit.sql is failing with error :"ERROR:  negative bitmapset
>>> member not allowed"
>>> > update mlparted_tab mlp set c = 'xxx'
>>> > from
>>> >   (select a from some_tab union all select a+1 from some_tab) ss (a)
>>> > where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
>>> > ERROR:  negative bitmapset member not allowed
>>> >
>>> > --partition_join.sql is crashing with enable_partitionwise_join set to
>>> true.
>>> > CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
>>> > CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001',
>>> '0003');
>>> > CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004',
>>> '0006');
>>> > CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008',
>>> '0009');
>>> > INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
>>> generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
>>> > ANALYZE plt1_adv;
>>> > CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
>>> > CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002',
>>> '0003');
>>> > CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004',
>>> '0006');
>>> > CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007',
>>> '0009');
>>> > INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
>>> generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
>>> > ANALYZE plt2_adv;
>>> > -- inner join
>>> > EXPLAIN (COSTS OFF)
>>> > SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2
>>> ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
>>> > server closed the connection unexpectedly
>>> >         This probably means the server terminated abnormally
>>> >         before or while processing the request.
>>> > connection to server was lost
>>> >
>>> >
>>> > --stack-trace
>>> > Core was generated by `postgres: edb regression [local] EXPLAIN
>>>                               '.
>>> > Program terminated with signal 6, Aborted.
>>> > #0  0x00007f7d339ba277 in raise () from /lib64/libc.so.6
>>> > Missing separate debuginfos, use: debuginfo-install
>>> glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
>>> krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
>>> libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64
>>> openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64
>>> zlib-1.2.7-17.el7.x86_64
>>> > (gdb) bt
>>> > #0  0x00007f7d339ba277 in raise () from /lib64/libc.so.6
>>> > #1  0x00007f7d339bb968 in abort () from /lib64/libc.so.6
>>> > #2  0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10
>>> "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe
>>> "partbounds.c", lineNumber=1957)
>>> >     at assert.c:69
>>> > #3  0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0,
>>> part_index=-1) at partbounds.c:1957
>>> > #4  0x00000000008919bd in merge_list_bounds (partnatts=1,
>>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
>>> inner_rel=0x1922938, jointype=JOIN_INNER,
>>> >     outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
>>> partbounds.c:1529
>>> > #5  0x00000000008910de in partition_bounds_merge (partnatts=1,
>>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
>>> inner_rel=0x1922938, jointype=JOIN_INNER,
>>> >     outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
>>> partbounds.c:1223
>>> > #6  0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0,
>>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
>>> parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
>>> >     parts2=0x7fffd67751a8) at joinrels.c:1644
>>> > #7  0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0,
>>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
>>> parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
>>> >     at joinrels.c:1402
>>> > #8  0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0,
>>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0,
>>> restrictlist=0x1ab3318)
>>> >     at joinrels.c:926
>>> > #9  0x000000000082a8f5 in make_join_rel (root=0x1a19ed0,
>>> rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
>>> > #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0,
>>> old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at
>>> joinrels.c:312
>>> > #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0,
>>> level=2) at joinrels.c:123
>>> > #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0,
>>> levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
>>> > #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0,
>>> joinlist=0x199d538) at allpaths.c:2951
>>> > #14 0x000000000080816b in make_one_rel (root=0x1a19ed0,
>>> joinlist=0x199d538) at allpaths.c:228
>>> > #15 0x000000000084491d in query_planner (root=0x1a19ed0,
>>> qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at
>>> planmain.c:276
>>> > #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0,
>>> tuple_fraction=0) at planner.c:1447
>>> > #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8,
>>> parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at
>>> planner.c:1025
>>> > #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
>>> >     query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>>> boundParams=0x0) at planner.c:406
>>> > #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
>>> >     query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>>> boundParams=0x0) at planner.c:277
>>> > #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
>>> >     query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>>> boundParams=0x0) at postgres.c:847
>>> > #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290,
>>> cursorOptions=2048, into=0x0, es=0x19b36f0,
>>> >     queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>>> >     params=0x0, queryEnv=0x0) at explain.c:397
>>> > #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410,
>>> stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
>>> > #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
>>> >     queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>>> >     readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
>>> queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
>>> > #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
>>> >     queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>>> >     readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
>>> queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
>>> > #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40,
>>> pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378,
>>> qc=0x7fffd6775f90) at pquery.c:1147
>>> > #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40,
>>> isTopLevel=true) at pquery.c:1026
>>> > #27 0x000000000097ed11 in PortalRun (portal=0x1893b40,
>>> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8,
>>> altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
>>> > #28 0x0000000000978aa5 in exec_simple_query (
>>> >
>>> > Thanks & Regards,
>>> > Rajkumar Raghuwanshi
>>> >
>>> >
>>> > On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <[email protected]>
>>> wrote:
>>> >>
>>> >> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <[email protected]>
>>> wrote:
>>> >> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>>> >> > <[email protected]> wrote:
>>> >> > > The attached patch also fixes the above comments.
>>> >> >
>>> >> > I noticed that multi-column list partitions containing NULLs don't
>>> >> > work correctly with partition pruning yet.
>>> >> >
>>> >> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>>> >> > create table p01 partition of p0 for values in ((1, 1, true),
>>> (NULL, 1, false));
>>> >> > create table p02 partition of p0 for values in ((1, NULL, false));
>>> >> > explain select * from p0 where a is null;
>>> >> >                        QUERY PLAN
>>> >> > --------------------------------------------------------
>>> >> >  Seq Scan on p01 p0  (cost=0.00..22.50 rows=6 width=37)
>>> >> >    Filter: (a IS NULL)
>>> >> > (2 rows)
>>> >> >
>>> >> > I guess that may be due to the following newly added code being
>>> incomplete:
>>> >> >
>>> >> > +/*
>>> >> > + * get_partition_bound_null_index
>>> >> > + *
>>> >> > + * Returns the partition index of the partition bound which
>>> accepts NULL.
>>> >> > + */
>>> >> > +int
>>> >> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>>> >> > +{
>>> >> > +   int i = 0;
>>> >> > +   int j = 0;
>>> >> > +
>>> >> > +   if (!boundinfo->isnulls)
>>> >> > +       return -1;
>>> >> >
>>> >> > -           if (!val->constisnull)
>>> >> > -               count++;
>>> >> > +   for (i = 0; i < boundinfo->ndatums; i++)
>>> >> > +   {
>>> >> > +       //TODO: Handle for multi-column cases
>>> >> > +       for (j = 0; j < 1; j++)
>>> >> > +       {
>>> >> > +           if (boundinfo->isnulls[i][j])
>>> >> > +               return boundinfo->indexes[i];
>>> >> >         }
>>> >> >     }
>>> >> >
>>> >> > +   return -1;
>>> >> > +}
>>> >> >
>>> >> > Maybe this function needs to return a "bitmapset" of indexes,
>>> because
>>> >> > multiple partitions can now contain NULL values.
>>> >> >
>>> >> > Some other issues I noticed and suggestions for improvement:
>>> >> >
>>> >> > +/*
>>> >> > + * checkForDuplicates
>>> >> > + *
>>> >> > + * Returns TRUE if the list bound element is already present in
>>> the list of
>>> >> > + * list bounds, FALSE otherwise.
>>> >> > + */
>>> >> > +static bool
>>> >> > +checkForDuplicates(List *source, List *searchElem)
>>> >> >
>>> >> > This function name may be too generic.  Given that it is specific to
>>> >> > implementing list bound de-duplication, maybe the following
>>> signature
>>> >> > is more appropriate:
>>> >> >
>>> >> > static bool
>>> >> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>>> >> >
>>> >> > Also, better if the function comment mentions those parameter
>>> names, like:
>>> >> >
>>> >> > "Returns TRUE if the list bound element 'new_bound' is already
>>> present
>>> >> > in the target list 'list_bounds', FALSE otherwise."
>>> >> >
>>> >> > +/*
>>> >> > + * transformPartitionListBounds
>>> >> > + *
>>> >> > + * Converts the expressions of list partition bounds from the raw
>>> grammar
>>> >> > + * representation.
>>> >> >
>>> >> > A sentence about the result format would be helpful, like:
>>> >> >
>>> >> > The result is a List of Lists of Const nodes to account for the
>>> >> > partition key possibly containing more than one column.
>>> >> >
>>> >> > +   int             i = 0;
>>> >> > +   int             j = 0;
>>> >> >
>>> >> > Better to initialize such loop counters closer to the loop.
>>> >> >
>>> >> > +           colname[i] = (char *) palloc0(NAMEDATALEN *
>>> sizeof(char));
>>> >> > +           colname[i] = get_attname(RelationGetRelid(parent),
>>> >> > +                                    key->partattrs[i], false);
>>> >> >
>>> >> > The palloc in the 1st statement is wasteful, because the 2nd
>>> statement
>>> >> > overwrites its pointer by the pointer to the string palloc'd by
>>> >> > get_attname().
>>> >> >
>>> >> > +           ListCell   *cell2 = NULL;
>>> >> >
>>> >> > No need to explicitly initialize the loop variable.
>>> >> >
>>> >> > +           RowExpr     *rowexpr = NULL;
>>> >> > +
>>> >> > +           if (!IsA(expr, RowExpr))
>>> >> > +               ereport(ERROR,
>>> >> > +                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>> >> > +                       errmsg("Invalid list bound specification"),
>>> >> > +                       parser_errposition(pstate,
>>> exprLocation((Node
>>> >> > *) spec))));
>>> >> > +
>>> >> > +           rowexpr = (RowExpr *) expr;
>>> >> >
>>> >> > It's okay to assign rowexpr at the top here instead of the dummy
>>> >> > NULL-initialization and write the condition as:
>>> >> >
>>> >> >     if (!IsA(rowexpr, RowExpr))
>>> >> >
>>> >> > +       if (isDuplicate)
>>> >> > +           continue;
>>> >> > +
>>> >> > +       result = lappend(result, values);
>>> >> >
>>> >> > I can see you copied this style from the existing code, but how
>>> about
>>> >> > writing this simply as:
>>> >> >
>>> >> >     if (!isDuplicate)
>>> >> >         result = lappend(result, values);
>>> >> >
>>> >> > -/* One value coming from some (index'th) list partition */
>>> >> > +/* One bound of a list partition */
>>> >> >  typedef struct PartitionListValue
>>> >> >  {
>>> >> >     int         index;
>>> >> > -   Datum       value;
>>> >> > +   Datum      *values;
>>> >> > +   bool       *isnulls;
>>> >> >  } PartitionListValue;
>>> >> >
>>> >> > Given that this is a locally-defined struct, I wonder if it makes
>>> >> > sense to rename the struct while we're at it.  Call it, say,
>>> >> > PartitionListBound?
>>> >> >
>>> >> > Also, please keep part of the existing comment that says that the
>>> >> > bound belongs to index'th partition.
>>> >> >
>>> >> > Will send more comments in a bit...
>>> >>
>>> >> + * partition_bound_accepts_nulls
>>> >> + *
>>> >> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>>> >>   */
>>> >>
>>> >> I suggest slight rewording, as follows:
>>> >>
>>> >> "Returns TRUE if any of the partition bounds contains a NULL value,
>>> >> FALSE otherwise."
>>> >>
>>> >> -   PartitionListValue *all_values;
>>> >> +   PartitionListValue **all_values;
>>> >> ...
>>> >> -   all_values = (PartitionListValue *)
>>> >> -       palloc(ndatums * sizeof(PartitionListValue));
>>> >> +   ndatums = get_list_datum_count(boundspecs, nparts);
>>> >> +   all_values = (PartitionListValue **)
>>> >> +       palloc(ndatums * sizeof(PartitionListValue *));
>>> >>
>>> >> I don't see the need to redefine all_values's pointer type.  No need
>>> >> to palloc PartitionListValue repeatedly for every datum as done
>>> >> further down as follows:
>>> >>
>>> >> +           all_values[j] = (PartitionListValue *)
>>> >> palloc(sizeof(PartitionListValue));
>>> >>
>>> >> You do need the following two though:
>>> >>
>>> >> +           all_values[j]->values = (Datum *) palloc0(key->partnatts *
>>> >> sizeof(Datum));
>>> >> +           all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>>> >> sizeof(bool));
>>> >>
>>> >> If you change the above the way I suggest, you'd also need to revert
>>> >> the following change:
>>> >>
>>> >> -   qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>>> >> +   qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>>> >>               qsort_partition_list_value_cmp, (void *) key);
>>> >>
>>> >> +       int         orig_index = all_values[i]->index;
>>> >> +       boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
>>> sizeof(Datum));
>>> >>
>>> >> Missing a newline between these two statements.
>>> >>
>>> >> BTW, I noticed that the boundDatums variable is no longer used in
>>> >> create_list_bounds.  I traced back its origin and found that a recent
>>> >> commit 53d86957e98 introduced it to implement an idea to reduce the
>>> >> finer-grained pallocs that were being done in create_list_bounds().  I
>>> >> don't think that this patch needs to throw away that work.  You can
>>> >> make it work as the attached delta patch that applies on top of v3.
>>> >> Please check.
>>> >>
>>> >> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>>> >> *parttyplen, bool *parttypbyval,
>>> >>     if (b1->nindexes != b2->nindexes)
>>> >>         return false;
>>> >>
>>> >> -   if (b1->null_index != b2->null_index)
>>> >> +   if (get_partition_bound_null_index(b1) !=
>>> >> get_partition_bound_null_index(b2))
>>> >>
>>> >> As mentioned in the last message, this bit in partition_bounds_equal()
>>> >> needs to be comparing "bitmapsets" of null bound indexes, that is
>>> >> after fixing get_partition_bound_null_index() as previously mentioned.
>>> >>
>>> >> But...
>>> >>
>>> >> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>>> >> *parttyplen, bool *parttypbyval,
>>> >>                  * context.  datumIsEqual() should be simple enough
>>> to be
>>> >>                  * safe.
>>> >>                  */
>>> >> -               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>>> >> +               if (b1->isnulls)
>>> >> +                   b1_isnull = b1->isnulls[i][j];
>>> >> +               if (b2->isnulls)
>>> >> +                   b2_isnull = b2->isnulls[i][j];
>>> >> +
>>> >> +               /*
>>> >> +                * If any of the partition bound has NULL value, then
>>> check
>>> >> +                * equality for the NULL value instead of comparing
>>> the datums
>>> >> +                * as it does not contain valid value in case of NULL.
>>> >> +                */
>>> >> +               if (b1_isnull || b2_isnull)
>>> >> +               {
>>> >> +                   if (b1_isnull != b2_isnull)
>>> >> +                       return false;
>>> >> +               }
>>> >>
>>> >> ...if you have this in the main loop, I don't think we need the above
>>> >> code stanza which appears to implement a short-cut for this long-form
>>> >> logic.
>>> >>
>>> >> +               (key->strategy != PARTITION_STRATEGY_LIST ||
>>> >> +                !src->isnulls[i][j]))
>>> >>
>>> >> I think it's better to write this condition as follows just like the
>>> >> accompanying condition involving src->kind:
>>> >>
>>> >>     (src->nulls == NULL || !src->isnulls[i][j])
>>> >>
>>> >> (Skipped looking at merge_list_bounds() and related changes for now as
>>> >> I see a lot of TODOs remain to be done.)
>>> >>
>>> >> In check_new_partition_bound():
>>> >>
>>> >> +                       Datum      *values = (Datum *)
>>> >> palloc0(key->partnatts * sizeof(Datum));
>>> >> +                       bool       *isnulls = (bool *)
>>> >> palloc0(key->partnatts * sizeof(bool));
>>> >>
>>> >> Doesn't seem like a bad idea to declare these as:
>>> >>
>>> >>     Datum    values[PARTITION_MAX_KEYS];
>>> >>     bool        isnulls[PARTITION_MAX_KEYS];
>>> >>
>>> >>
>>> >> I looked at get_qual_for_list_multi_column() and immediately thought
>>> >> that it may be a bad idea.  I think it's better to integrate the logic
>>> >> for multi-column case into the existing function even if that makes
>>> >> the function appear more complex.  Having two functions with the same
>>> >> goal and mostly the same code is not a good idea mainly because it
>>> >> becomes a maintenance burden.
>>> >>
>>> >> I have attempted a rewrite such that get_qual_for_list() now handles
>>> >> both the single-column and multi-column cases.  Changes included in
>>> >> the delta patch.  The patch updates some outputs of the newly added
>>> >> tests for multi-column list partitions, because the new code emits the
>>> >> IS NOT NULL tests a bit differently than
>>> >> get_qual_for_list_mutli_column() would.  Notably, the old approach
>>> >> would emit IS NOT NULL for every non-NULL datum matched to a given
>>> >> column, not just once for the column.  However, the patch makes a few
>>> >> other tests fail, mainly because I had to fix
>>> >> partition_bound_accepts_nulls() to handle the multi-column case,
>>> >> though didn't bother to update all callers of it to also handle the
>>> >> multi-column case correctly.  I guess that's a TODO you're going to
>>> >> deal with at some point anyway. :)
>>> >>
>>> >> I still have more than half of v3 left to look at, so will continue
>>> >> looking.   In the meantime, please check the changes I suggested,
>>> >> including the delta patch, and let me know your thoughts.
>>> >>
>>> >> --
>>> >> Amit Langote
>>> >> EDB: http://www.enterprisedb.com
>>>
>>

Re: Multi-Column List Partitioning

Reply via email to