Thanks for the patch, it applied cleanly and fixed the reported issue. I observed another case where In case of multi-col list partition on the same column query is not picking partition wise join. Is this expected?
CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c,c); CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003')); CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006')); CREATE TABLE plt1_p3 PARTITION OF plt1 DEFAULT; INSERT INTO plt1 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10); ANALYSE plt1; CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c,c); CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003')); CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006')); CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT; INSERT INTO plt2 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10); ANALYSE plt2; SET enable_partitionwise_join TO true; EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c; postgres=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c; QUERY PLAN -------------------------------------------- Hash Join Hash Cond: ((t1.c)::text = (t2.c)::text) -> Append -> Seq Scan on plt1_p1 t1_1 -> Seq Scan on plt1_p2 t1_2 -> Seq Scan on plt1_p3 t1_3 -> Hash -> Append -> Seq Scan on plt2_p1 t2_1 -> Seq Scan on plt2_p2 t2_2 -> Seq Scan on plt2_p3 t2_3 (11 rows) Thanks & Regards, Rajkumar Raghuwanshi On Thu, Oct 7, 2021 at 6:03 PM Nitin Jadhav <nitinjadhavpostg...@gmail.com> wrote: > Thanks Rajkumar for testing. > > > I think it should throw an error as the partition by list has only 1 > column but we are giving 2 values. > > I also agree that it should throw an error in the above case. Fixed the > issue in the attached patch. Also added related test cases to the > regression test suite. > > > > also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ > instead of ('0001','0001'). > > Now throwing errors in the initial stage, this case doesn't arise. > > Please share if you find any other issues. > > Thanks & Regards, > Nitin Jadhav > > > > > > On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi < > rajkumar.raghuwan...@enterprisedb.com> wrote: > >> Thanks Nitin, >> >> v4 patches applied cleanly and make check is passing now. While testing >> further I observed that if multiple values are given for a single >> column list partition it is not giving error instead it is changing >> values itself. Please find the example below. >> >> postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b); >> CREATE TABLE >> postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN >> (('0001','0001'),('0002','0002')); >> CREATE TABLE >> postgres=# \d+ plt1; >> Partitioned table "public.plt1" >> Column | Type | Collation | Nullable | Default | Storage | >> Compression | Stats target | Description >> >> --------+-------------------+-----------+----------+---------+----------+-------------+--------------+------------- >> a | integer | | | | plain | >> | | >> b | character varying | | | | extended | >> | | >> Partition key: LIST (b) >> Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)') >> >> I think it should throw an error as the partition by list has only 1 >> column but we are giving 2 values. >> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ >> instead of ('0001','0001'). >> >> Thanks & Regards, >> Rajkumar Raghuwanshi >> >> >> >> On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav < >> nitinjadhavpostg...@gmail.com> wrote: >> >>> > > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is >>> failing with below errors. >>> > >>> > Thanks Rajkumar for testing. >>> > >>> > Here's a v2 of the delta patch that should fix both of these test >>> > failures. As I mentioned in my last reply, my delta patch fixed what >>> > I think were problems in Nitin's v3 patch but were not complete by >>> > themselves. Especially, I hadn't bothered to investigate various /* >>> > TODO: handle multi-column list partitioning */ sites to deal with my >>> > own changes. >>> >>> Thanks Rajkumar for testing and Thank you Amit for working on v2 of >>> the delta patch. Actually I had done the code changes related to >>> partition-wise join and I was in the middle of fixing the review >>> comments, So I could not share the patch. Anyways thanks for your >>> efforts. >>> >>> > I noticed that multi-column list partitions containing NULLs don't >>> > work correctly with partition pruning yet. >>> > >>> > create table p0 (a int, b text, c bool) partition by list (a, b, c); >>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, >>> 1, false)); >>> > create table p02 partition of p0 for values in ((1, NULL, false)); >>> > explain select * from p0 where a is null; >>> > QUERY PLAN >>> > -------------------------------------------------------- >>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37) >>> > Filter: (a IS NULL) >>> > (2 rows) >>> > >>> > In the attached updated version, I've dealt with some of those such >>> > that at least the existing cases exercising partition pruning and >>> > partition wise joins now pass. >>> >>> wrt partition pruning, I have checked the output of the above case >>> with the v2 version of the delta patch and without that. The output >>> remains same. Kindly let me know if I am missing something. But I feel >>> the above output is correct as the partition p01 is the only partition >>> which contains NULL value for column a, hence it is showing "Seq scan >>> on p01" in the output. Kindly correct me if I am wrong. I feel the >>> code changes related to 'null_keys' is not required, hence not >>> incorporated that in the attached patch. >>> >>> wrt partition-wise join, I had run the regression test (with new cases >>> related to partition-wise join) on v2 of the delta patch and observed >>> the crash. Hence I have not incorporated the partition-wise join >>> related code from v2 of delta patch to main v4 patch. Instead I have >>> added the partition-wise join related code done by me in the attached >>> patch. Please share your thoughts and if possible we can improvise the >>> code. Rest of the changes looks good to me and I have incorporated >>> that in the attached patch. >>> >>> >>> > I guess that may be due to the following newly added code being >>> incomplete: >>> > Maybe this function needs to return a "bitmapset" of indexes, because >>> > multiple partitions can now contain NULL values. >>> >>> I feel this function is not required at all as we are not separating >>> the non null and null partitions now. Removed in the attached patch. >>> Also removed the "scan_null' variable from the structure >>> "PruneStepResult" and cleaned up the corresponding code blocks. >>> >>> >>> > This function name may be too generic. Given that it is specific to >>> > implementing list bound de-duplication, maybe the following signature >>> > is more appropriate: >>> > >>> > static bool >>> > checkListBoundDuplicated(List *list_bounds, List *new_bound) >>> >>> Yes. The function name looks more generic. How about using >>> "isListBoundDuplicated()"? I have used this name in the patch. Please >>> let me know if that does not look correct. >>> >>> >>> > Also, better if the function comment mentions those parameter names, >>> like: >>> > >>> > "Returns TRUE if the list bound element 'new_bound' is already present >>> > in the target list 'list_bounds', FALSE otherwise." >>> >>> Fixed. >>> >>> >>> > +/* >>> > + * transformPartitionListBounds >>> > + * >>> > + * Converts the expressions of list partition bounds from the raw >>> grammar >>> > + * representation. >>> > >>> > A sentence about the result format would be helpful, like: >>> > >>> > The result is a List of Lists of Const nodes to account for the >>> > partition key possibly containing more than one column. >>> >>> Fixed. >>> >>> >>> > + int i = 0; >>> > + int j = 0; >>> > >>> > Better to initialize such loop counters closer to the loop. >>> >>> Fixed in all the places. >>> >>> >>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char)); >>> > + colname[i] = get_attname(RelationGetRelid(parent), >>> > + key->partattrs[i], false); >>> > >>> > The palloc in the 1st statement is wasteful, because the 2nd statement >>> > overwrites its pointer by the pointer to the string palloc'd by >>> > get_attname(). >>> >>> Removed the 1st statement as it is not required. >>> >>> >>> > + ListCell *cell2 = NULL; >>> > >>> > No need to explicitly initialize the loop variable. >>> >>> Fixed in all the places. >>> >>> >>> > + RowExpr *rowexpr = NULL; >>> > + >>> > + if (!IsA(expr, RowExpr)) >>> > + ereport(ERROR, >>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), >>> > + errmsg("Invalid list bound specification"), >>> > + parser_errposition(pstate, exprLocation((Node >>> > *) spec)))); >>> > + >>> > + rowexpr = (RowExpr *) expr; >>> > >>> > It's okay to assign rowexpr at the top here instead of the dummy >>> > NULL-initialization and write the condition as: >>> > >>> > if (!IsA(rowexpr, RowExpr)) >>> >>> Fixed. >>> >>> >>> > + if (isDuplicate) >>> > + continue; >>> > + >>> > + result = lappend(result, values); >>> > >>> > I can see you copied this style from the existing code, but how about >>> > writing this simply as: >>> > >>> > if (!isDuplicate) >>> > result = lappend(result, values); >>> >>> This looks good. I have changed in the patch. >>> >>> >>> > -/* One value coming from some (index'th) list partition */ >>> > +/* One bound of a list partition */ >>> > typedef struct PartitionListValue >>> > { >>> > int index; >>> > - Datum value; >>> > + Datum *values; >>> > + bool *isnulls; >>> > } PartitionListValue; >>> > >>> > Given that this is a locally-defined struct, I wonder if it makes >>> > sense to rename the struct while we're at it. Call it, say, >>> > PartitionListBound? >>> >>> Yes. PartitionListBound looks more appropriate and it also matches the >>> similar structures of the other partition strategies. >>> >>> > Also, please keep part of the existing comment that says that the >>> > bound belongs to index'th partition. >>> >>> Retained the old comment. >>> >>> >>> > + * partition_bound_accepts_nulls >>> > + * >>> > + * Returns TRUE if partition bound has NULL value, FALSE otherwise. >>> > */ >>> > >>> > I suggest slight rewording, as follows: >>> > >>> > "Returns TRUE if any of the partition bounds contains a NULL value, >>> > FALSE otherwise." >>> >>> Fixed. >>> >>> >>> > - PartitionListValue *all_values; >>> > + PartitionListValue **all_values; >>> > ... >>> > - all_values = (PartitionListValue *) >>> > - palloc(ndatums * sizeof(PartitionListValue)); >>> > + ndatums = get_list_datum_count(boundspecs, nparts); >>> > + all_values = (PartitionListValue **) >>> > + palloc(ndatums * sizeof(PartitionListValue *)); >>> > >>> > I don't see the need to redefine all_values's pointer type. No need >>> > to palloc PartitionListValue repeatedly for every datum as done >>> > further down as follows: >>> > >>> > + all_values[j] = (PartitionListValue *) >>> > palloc(sizeof(PartitionListValue)); >>> > >>> > You do need the following two though: >>> > >>> > + all_values[j]->values = (Datum *) palloc0(key->partnatts * >>> > sizeof(Datum)); >>> > + all_values[j]->isnulls = (bool *) palloc0(key->partnatts * >>> > sizeof(bool)); >>> > >>> > If you change the above the way I suggest, you'd also need to revert >>> > the following change: >>> > >>> > - qsort_arg(all_values, ndatums, sizeof(PartitionListValue), >>> > + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *), >>> > qsort_partition_list_value_cmp, (void *) key); >>> > >>> > + int orig_index = all_values[i]->index; >>> > + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * >>> sizeof(Datum)); >>> > >>> > Missing a newline between these two statements. >>> >>> Fixed. Made necessary changes to keep the intent of existing code. >>> >>> >>> > @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16 >>> > *parttyplen, bool *parttypbyval, >>> > if (b1->nindexes != b2->nindexes) >>> > return false; >>> > >>> > - if (b1->null_index != b2->null_index) >>> > + if (get_partition_bound_null_index(b1) != >>> > get_partition_bound_null_index(b2)) >>> > >>> > As mentioned in the last message, this bit in partition_bounds_equal() >>> > needs to be comparing "bitmapsets" of null bound indexes, that is >>> > after fixing get_partition_bound_null_index() as previously mentioned. >>> >>> As mentioned earlier, removed the functionality of >>> get_partition_bound_null_index(), hence the above condition is not >>> required and removed. >>> >>> > But... >>> > >>> > @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16 >>> > *parttyplen, bool *parttypbyval, >>> > * context. datumIsEqual() should be simple enough to >>> be >>> > * safe. >>> > */ >>> > - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j], >>> > + if (b1->isnulls) >>> > + b1_isnull = b1->isnulls[i][j]; >>> > + if (b2->isnulls) >>> > + b2_isnull = b2->isnulls[i][j]; >>> > + >>> > + /* >>> > + * If any of the partition bound has NULL value, then >>> check >>> > + * equality for the NULL value instead of comparing >>> the datums >>> > + * as it does not contain valid value in case of NULL. >>> > + */ >>> > + if (b1_isnull || b2_isnull) >>> > + { >>> > + if (b1_isnull != b2_isnull) >>> > + return false; >>> > + } >>> > >>> > ...if you have this in the main loop, I don't think we need the above >>> > code stanza which appears to implement a short-cut for this long-form >>> > logic. >>> >>> Yes. May be we could have ignored the above code stanza if we would >>> have comparing the null indexes using get_partition_bound_null_index() >>> in the beginning of the function. But hence we are not separating the >>> non null partitions and null partitions, I would like to keep the >>> logic in the inner loop as we are doing it for non null bound values >>> in the above code stanza, just to give a feel that null bound values >>> are also handled the same way as non null values. Please correct me if >>> I am wrong. >>> >>> >>> > + (key->strategy != PARTITION_STRATEGY_LIST || >>> > + !src->isnulls[i][j])) >>> > >>> > I think it's better to write this condition as follows just like the >>> > accompanying condition involving src->kind: >>> > >>> > (src->nulls == NULL || !src->isnulls[i][j]) >>> >>> Fixed. >>> >>> >>> > In check_new_partition_bound(): >>> > >>> > + Datum *values = (Datum *) >>> > palloc0(key->partnatts * sizeof(Datum)); >>> > + bool *isnulls = (bool *) >>> > palloc0(key->partnatts * sizeof(bool)); >>> > >>> > Doesn't seem like a bad idea to declare these as: >>> > >>> > Datum values[PARTITION_MAX_KEYS]; >>> > bool isnulls[PARTITION_MAX_KEYS]; >>> >>> Thanks for the suggestion. I have changed as above. >>> >>> > I looked at get_qual_for_list_multi_column() and immediately thought >>> > that it may be a bad idea. I think it's better to integrate the logic >>> > for multi-column case into the existing function even if that makes >>> > the function appear more complex. Having two functions with the same >>> > goal and mostly the same code is not a good idea mainly because it >>> > becomes a maintenance burden. >>> >>> Actually I had written a separate function because of the complexity. >>> Now I have understood that since the objective is same, it should be >>> done in a single function irrespective of complexity. >>> >>> > I have attempted a rewrite such that get_qual_for_list() now handles >>> > both the single-column and multi-column cases. Changes included in >>> > the delta patch. The patch updates some outputs of the newly added >>> > tests for multi-column list partitions, because the new code emits the >>> > IS NOT NULL tests a bit differently than >>> > get_qual_for_list_mutli_column() would. Notably, the old approach >>> > would emit IS NOT NULL for every non-NULL datum matched to a given >>> > column, not just once for the column. However, the patch makes a few >>> > other tests fail, mainly because I had to fix >>> > partition_bound_accepts_nulls() to handle the multi-column case, >>> > though didn't bother to update all callers of it to also handle the >>> > multi-column case correctly. I guess that's a TODO you're going to >>> > deal with at some point anyway. :) >>> >>> Thank you very much for your efforts. The changes looks good to me and >>> I have incorporated these changes in the attached patch. >>> >>> I have completed the coding for all the TODOs and hence removed in the >>> patch. The naming conventions used for function/variable names varies >>> across the files. Some places it is like 'namesLikeThis' and in some >>> place it is like 'names_like_this'. I have used the naming conventions >>> based on the surrounding styles used. I am happy to change those if >>> required. >>> >>> I have verified 'make check' with the attached patch and it is working >>> fine. >>> >>> >>> Thanks & Regards, >>> Nitin Jadhav >>> >>> >>> On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi >>> <rajkumar.raghuwan...@enterprisedb.com> wrote: >>> > >>> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is >>> failing with below errors. >>> > >>> > --inherit.sql is failing with error :"ERROR: negative bitmapset >>> member not allowed" >>> > update mlparted_tab mlp set c = 'xxx' >>> > from >>> > (select a from some_tab union all select a+1 from some_tab) ss (a) >>> > where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3; >>> > ERROR: negative bitmapset member not allowed >>> > >>> > --partition_join.sql is crashing with enable_partitionwise_join set to >>> true. >>> > CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c); >>> > CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', >>> '0003'); >>> > CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', >>> '0006'); >>> > CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', >>> '0009'); >>> > INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM >>> generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9); >>> > ANALYZE plt1_adv; >>> > CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c); >>> > CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', >>> '0003'); >>> > CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', >>> '0006'); >>> > CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', >>> '0009'); >>> > INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM >>> generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9); >>> > ANALYZE plt2_adv; >>> > -- inner join >>> > EXPLAIN (COSTS OFF) >>> > SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 >>> ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a; >>> > server closed the connection unexpectedly >>> > This probably means the server terminated abnormally >>> > before or while processing the request. >>> > connection to server was lost >>> > >>> > >>> > --stack-trace >>> > Core was generated by `postgres: edb regression [local] EXPLAIN >>> '. >>> > Program terminated with signal 6, Aborted. >>> > #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6 >>> > Missing separate debuginfos, use: debuginfo-install >>> glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 >>> krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 >>> libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 >>> openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 >>> zlib-1.2.7-17.el7.x86_64 >>> > (gdb) bt >>> > #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6 >>> > #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6 >>> > #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 >>> "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe >>> "partbounds.c", lineNumber=1957) >>> > at assert.c:69 >>> > #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, >>> part_index=-1) at partbounds.c:1957 >>> > #4 0x00000000008919bd in merge_list_bounds (partnatts=1, >>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, >>> inner_rel=0x1922938, jointype=JOIN_INNER, >>> > outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at >>> partbounds.c:1529 >>> > #5 0x00000000008910de in partition_bounds_merge (partnatts=1, >>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, >>> inner_rel=0x1922938, jointype=JOIN_INNER, >>> > outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at >>> partbounds.c:1223 >>> > #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, >>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, >>> parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0, >>> > parts2=0x7fffd67751a8) at joinrels.c:1644 >>> > #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, >>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, >>> parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318) >>> > at joinrels.c:1402 >>> > #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, >>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, >>> restrictlist=0x1ab3318) >>> > at joinrels.c:926 >>> > #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, >>> rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760 >>> > #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, >>> old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at >>> joinrels.c:312 >>> > #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, >>> level=2) at joinrels.c:123 >>> > #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, >>> levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020 >>> > #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, >>> joinlist=0x199d538) at allpaths.c:2951 >>> > #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, >>> joinlist=0x199d538) at allpaths.c:228 >>> > #15 0x000000000084491d in query_planner (root=0x1a19ed0, >>> qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at >>> planmain.c:276 >>> > #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, >>> tuple_fraction=0) at planner.c:1447 >>> > #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, >>> parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at >>> planner.c:1025 >>> > #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290, >>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, >>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c >>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, >>> boundParams=0x0) at planner.c:406 >>> > #19 0x0000000000844ce9 in planner (parse=0x1aaa290, >>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, >>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c >>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, >>> boundParams=0x0) at planner.c:277 >>> > #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290, >>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, >>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c >>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, >>> boundParams=0x0) at postgres.c:847 >>> > #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, >>> cursorOptions=2048, into=0x0, es=0x19b36f0, >>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, >>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c >>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", >>> > params=0x0, queryEnv=0x0) at explain.c:397 >>> > #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, >>> stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281 >>> > #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8, >>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, >>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c >>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", >>> > readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, >>> queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845 >>> > #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8, >>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, >>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c >>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", >>> > readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, >>> queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527 >>> > #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, >>> pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, >>> qc=0x7fffd6775f90) at pquery.c:1147 >>> > #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, >>> isTopLevel=true) at pquery.c:1026 >>> > #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, >>> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, >>> altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758 >>> > #28 0x0000000000978aa5 in exec_simple_query ( >>> > >>> > Thanks & Regards, >>> > Rajkumar Raghuwanshi >>> > >>> > >>> > On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangot...@gmail.com> >>> wrote: >>> >> >>> >> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangot...@gmail.com> >>> wrote: >>> >> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav >>> >> > <nitinjadhavpostg...@gmail.com> wrote: >>> >> > > The attached patch also fixes the above comments. >>> >> > >>> >> > I noticed that multi-column list partitions containing NULLs don't >>> >> > work correctly with partition pruning yet. >>> >> > >>> >> > create table p0 (a int, b text, c bool) partition by list (a, b, c); >>> >> > create table p01 partition of p0 for values in ((1, 1, true), >>> (NULL, 1, false)); >>> >> > create table p02 partition of p0 for values in ((1, NULL, false)); >>> >> > explain select * from p0 where a is null; >>> >> > QUERY PLAN >>> >> > -------------------------------------------------------- >>> >> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37) >>> >> > Filter: (a IS NULL) >>> >> > (2 rows) >>> >> > >>> >> > I guess that may be due to the following newly added code being >>> incomplete: >>> >> > >>> >> > +/* >>> >> > + * get_partition_bound_null_index >>> >> > + * >>> >> > + * Returns the partition index of the partition bound which >>> accepts NULL. >>> >> > + */ >>> >> > +int >>> >> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo) >>> >> > +{ >>> >> > + int i = 0; >>> >> > + int j = 0; >>> >> > + >>> >> > + if (!boundinfo->isnulls) >>> >> > + return -1; >>> >> > >>> >> > - if (!val->constisnull) >>> >> > - count++; >>> >> > + for (i = 0; i < boundinfo->ndatums; i++) >>> >> > + { >>> >> > + //TODO: Handle for multi-column cases >>> >> > + for (j = 0; j < 1; j++) >>> >> > + { >>> >> > + if (boundinfo->isnulls[i][j]) >>> >> > + return boundinfo->indexes[i]; >>> >> > } >>> >> > } >>> >> > >>> >> > + return -1; >>> >> > +} >>> >> > >>> >> > Maybe this function needs to return a "bitmapset" of indexes, >>> because >>> >> > multiple partitions can now contain NULL values. >>> >> > >>> >> > Some other issues I noticed and suggestions for improvement: >>> >> > >>> >> > +/* >>> >> > + * checkForDuplicates >>> >> > + * >>> >> > + * Returns TRUE if the list bound element is already present in >>> the list of >>> >> > + * list bounds, FALSE otherwise. >>> >> > + */ >>> >> > +static bool >>> >> > +checkForDuplicates(List *source, List *searchElem) >>> >> > >>> >> > This function name may be too generic. Given that it is specific to >>> >> > implementing list bound de-duplication, maybe the following >>> signature >>> >> > is more appropriate: >>> >> > >>> >> > static bool >>> >> > checkListBoundDuplicated(List *list_bounds, List *new_bound) >>> >> > >>> >> > Also, better if the function comment mentions those parameter >>> names, like: >>> >> > >>> >> > "Returns TRUE if the list bound element 'new_bound' is already >>> present >>> >> > in the target list 'list_bounds', FALSE otherwise." >>> >> > >>> >> > +/* >>> >> > + * transformPartitionListBounds >>> >> > + * >>> >> > + * Converts the expressions of list partition bounds from the raw >>> grammar >>> >> > + * representation. >>> >> > >>> >> > A sentence about the result format would be helpful, like: >>> >> > >>> >> > The result is a List of Lists of Const nodes to account for the >>> >> > partition key possibly containing more than one column. >>> >> > >>> >> > + int i = 0; >>> >> > + int j = 0; >>> >> > >>> >> > Better to initialize such loop counters closer to the loop. >>> >> > >>> >> > + colname[i] = (char *) palloc0(NAMEDATALEN * >>> sizeof(char)); >>> >> > + colname[i] = get_attname(RelationGetRelid(parent), >>> >> > + key->partattrs[i], false); >>> >> > >>> >> > The palloc in the 1st statement is wasteful, because the 2nd >>> statement >>> >> > overwrites its pointer by the pointer to the string palloc'd by >>> >> > get_attname(). >>> >> > >>> >> > + ListCell *cell2 = NULL; >>> >> > >>> >> > No need to explicitly initialize the loop variable. >>> >> > >>> >> > + RowExpr *rowexpr = NULL; >>> >> > + >>> >> > + if (!IsA(expr, RowExpr)) >>> >> > + ereport(ERROR, >>> >> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), >>> >> > + errmsg("Invalid list bound specification"), >>> >> > + parser_errposition(pstate, >>> exprLocation((Node >>> >> > *) spec)))); >>> >> > + >>> >> > + rowexpr = (RowExpr *) expr; >>> >> > >>> >> > It's okay to assign rowexpr at the top here instead of the dummy >>> >> > NULL-initialization and write the condition as: >>> >> > >>> >> > if (!IsA(rowexpr, RowExpr)) >>> >> > >>> >> > + if (isDuplicate) >>> >> > + continue; >>> >> > + >>> >> > + result = lappend(result, values); >>> >> > >>> >> > I can see you copied this style from the existing code, but how >>> about >>> >> > writing this simply as: >>> >> > >>> >> > if (!isDuplicate) >>> >> > result = lappend(result, values); >>> >> > >>> >> > -/* One value coming from some (index'th) list partition */ >>> >> > +/* One bound of a list partition */ >>> >> > typedef struct PartitionListValue >>> >> > { >>> >> > int index; >>> >> > - Datum value; >>> >> > + Datum *values; >>> >> > + bool *isnulls; >>> >> > } PartitionListValue; >>> >> > >>> >> > Given that this is a locally-defined struct, I wonder if it makes >>> >> > sense to rename the struct while we're at it. Call it, say, >>> >> > PartitionListBound? >>> >> > >>> >> > Also, please keep part of the existing comment that says that the >>> >> > bound belongs to index'th partition. >>> >> > >>> >> > Will send more comments in a bit... >>> >> >>> >> + * partition_bound_accepts_nulls >>> >> + * >>> >> + * Returns TRUE if partition bound has NULL value, FALSE otherwise. >>> >> */ >>> >> >>> >> I suggest slight rewording, as follows: >>> >> >>> >> "Returns TRUE if any of the partition bounds contains a NULL value, >>> >> FALSE otherwise." >>> >> >>> >> - PartitionListValue *all_values; >>> >> + PartitionListValue **all_values; >>> >> ... >>> >> - all_values = (PartitionListValue *) >>> >> - palloc(ndatums * sizeof(PartitionListValue)); >>> >> + ndatums = get_list_datum_count(boundspecs, nparts); >>> >> + all_values = (PartitionListValue **) >>> >> + palloc(ndatums * sizeof(PartitionListValue *)); >>> >> >>> >> I don't see the need to redefine all_values's pointer type. No need >>> >> to palloc PartitionListValue repeatedly for every datum as done >>> >> further down as follows: >>> >> >>> >> + all_values[j] = (PartitionListValue *) >>> >> palloc(sizeof(PartitionListValue)); >>> >> >>> >> You do need the following two though: >>> >> >>> >> + all_values[j]->values = (Datum *) palloc0(key->partnatts * >>> >> sizeof(Datum)); >>> >> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts * >>> >> sizeof(bool)); >>> >> >>> >> If you change the above the way I suggest, you'd also need to revert >>> >> the following change: >>> >> >>> >> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue), >>> >> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *), >>> >> qsort_partition_list_value_cmp, (void *) key); >>> >> >>> >> + int orig_index = all_values[i]->index; >>> >> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * >>> sizeof(Datum)); >>> >> >>> >> Missing a newline between these two statements. >>> >> >>> >> BTW, I noticed that the boundDatums variable is no longer used in >>> >> create_list_bounds. I traced back its origin and found that a recent >>> >> commit 53d86957e98 introduced it to implement an idea to reduce the >>> >> finer-grained pallocs that were being done in create_list_bounds(). I >>> >> don't think that this patch needs to throw away that work. You can >>> >> make it work as the attached delta patch that applies on top of v3. >>> >> Please check. >>> >> >>> >> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16 >>> >> *parttyplen, bool *parttypbyval, >>> >> if (b1->nindexes != b2->nindexes) >>> >> return false; >>> >> >>> >> - if (b1->null_index != b2->null_index) >>> >> + if (get_partition_bound_null_index(b1) != >>> >> get_partition_bound_null_index(b2)) >>> >> >>> >> As mentioned in the last message, this bit in partition_bounds_equal() >>> >> needs to be comparing "bitmapsets" of null bound indexes, that is >>> >> after fixing get_partition_bound_null_index() as previously mentioned. >>> >> >>> >> But... >>> >> >>> >> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16 >>> >> *parttyplen, bool *parttypbyval, >>> >> * context. datumIsEqual() should be simple enough >>> to be >>> >> * safe. >>> >> */ >>> >> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j], >>> >> + if (b1->isnulls) >>> >> + b1_isnull = b1->isnulls[i][j]; >>> >> + if (b2->isnulls) >>> >> + b2_isnull = b2->isnulls[i][j]; >>> >> + >>> >> + /* >>> >> + * If any of the partition bound has NULL value, then >>> check >>> >> + * equality for the NULL value instead of comparing >>> the datums >>> >> + * as it does not contain valid value in case of NULL. >>> >> + */ >>> >> + if (b1_isnull || b2_isnull) >>> >> + { >>> >> + if (b1_isnull != b2_isnull) >>> >> + return false; >>> >> + } >>> >> >>> >> ...if you have this in the main loop, I don't think we need the above >>> >> code stanza which appears to implement a short-cut for this long-form >>> >> logic. >>> >> >>> >> + (key->strategy != PARTITION_STRATEGY_LIST || >>> >> + !src->isnulls[i][j])) >>> >> >>> >> I think it's better to write this condition as follows just like the >>> >> accompanying condition involving src->kind: >>> >> >>> >> (src->nulls == NULL || !src->isnulls[i][j]) >>> >> >>> >> (Skipped looking at merge_list_bounds() and related changes for now as >>> >> I see a lot of TODOs remain to be done.) >>> >> >>> >> In check_new_partition_bound(): >>> >> >>> >> + Datum *values = (Datum *) >>> >> palloc0(key->partnatts * sizeof(Datum)); >>> >> + bool *isnulls = (bool *) >>> >> palloc0(key->partnatts * sizeof(bool)); >>> >> >>> >> Doesn't seem like a bad idea to declare these as: >>> >> >>> >> Datum values[PARTITION_MAX_KEYS]; >>> >> bool isnulls[PARTITION_MAX_KEYS]; >>> >> >>> >> >>> >> I looked at get_qual_for_list_multi_column() and immediately thought >>> >> that it may be a bad idea. I think it's better to integrate the logic >>> >> for multi-column case into the existing function even if that makes >>> >> the function appear more complex. Having two functions with the same >>> >> goal and mostly the same code is not a good idea mainly because it >>> >> becomes a maintenance burden. >>> >> >>> >> I have attempted a rewrite such that get_qual_for_list() now handles >>> >> both the single-column and multi-column cases. Changes included in >>> >> the delta patch. The patch updates some outputs of the newly added >>> >> tests for multi-column list partitions, because the new code emits the >>> >> IS NOT NULL tests a bit differently than >>> >> get_qual_for_list_mutli_column() would. Notably, the old approach >>> >> would emit IS NOT NULL for every non-NULL datum matched to a given >>> >> column, not just once for the column. However, the patch makes a few >>> >> other tests fail, mainly because I had to fix >>> >> partition_bound_accepts_nulls() to handle the multi-column case, >>> >> though didn't bother to update all callers of it to also handle the >>> >> multi-column case correctly. I guess that's a TODO you're going to >>> >> deal with at some point anyway. :) >>> >> >>> >> I still have more than half of v3 left to look at, so will continue >>> >> looking. In the meantime, please check the changes I suggested, >>> >> including the delta patch, and let me know your thoughts. >>> >> >>> >> -- >>> >> Amit Langote >>> >> EDB: http://www.enterprisedb.com >>> >>