On Tue, 4 Jul 2023 at 06:56, James Coleman <jtc...@gmail.com> wrote: > > On Sun, Jun 11, 2023 at 10:23 PM James Coleman <jtc...@gmail.com> wrote: > > > > ... > > > And while trying the v9 patch I came across a crash with the query > > > below. > > > > > > set min_parallel_table_scan_size to 0; > > > set parallel_setup_cost to 0; > > > set parallel_tuple_cost to 0; > > > > > > explain (costs off) > > > select * from pg_description t1 where objoid in > > > (select objoid from pg_description t2 where t2.description = > > > t1.description); > > > QUERY PLAN > > > -------------------------------------------------------- > > > Seq Scan on pg_description t1 > > > Filter: (SubPlan 1) > > > SubPlan 1 > > > -> Gather > > > Workers Planned: 2 > > > -> Parallel Seq Scan on pg_description t2 > > > Filter: (description = t1.description) > > > (7 rows) > > > > > > select * from pg_description t1 where objoid in > > > (select objoid from pg_description t2 where t2.description = > > > t1.description); > > > WARNING: terminating connection because of crash of another server > > > process > > > > > > Seems something is wrong when extracting the argument from the Param in > > > parallel worker. > > > > With what I'm trying to change I don't think this plan should ever be > > generated since it means we'd have to pass a param from the outer seq > > scan into the parallel subplan, which we can't do (currently). > > > > I've attached the full backtrace to the email, but as you hinted at > > the parallel worker is trying to get the param (in this case > > detoasting it), but the param doesn't exist on the worker, so it seg > > faults. Looking at this further I think there's an existing test case > > that exposes the misplanning here (the one right under the comment > > "Parallel Append is not to be used when the subpath depends on the > > outer param" in select_parallel.sql), but it doesn't seg fault because > > the param is an integer, doesn't need to be detoasted, and therefore > > (I think) we skate by (but probably with wrong results in depending on > > the dataset). > > > > Interestingly this is one of the existing test queries my original > > patch approach didn't change, so this gives me something specific to > > work with improving the path. Thanks for testing this and bringing > > this to my attention! > > Here's what I've found debugging this: > > There's only a single gather path ever created when planning this > query, making it easy to know which one is the problem. That gather > path is created with this stacktrace: > > frame #0: 0x0000000105291590 > postgres`create_gather_path(root=0x000000013081ae78, > rel=0x000000013080c8e8, subpath=0x000000013081c080, > target=0x000000013081c8c0, required_outer=0x0000000000000000, > rows=0x0000000000000000) at pathnode.c:1971:2 > frame #1: 0x0000000105208e54 > postgres`generate_gather_paths(root=0x000000013081ae78, > rel=0x000000013080c8e8, override_rows=false) at allpaths.c:3097:4 > frame #2: 0x00000001052090ec > postgres`generate_useful_gather_paths(root=0x000000013081ae78, > rel=0x000000013080c8e8, override_rows=false) at allpaths.c:3241:2 > frame #3: 0x0000000105258754 > postgres`apply_scanjoin_target_to_paths(root=0x000000013081ae78, > rel=0x000000013080c8e8, scanjoin_targets=0x000000013081c978, > scanjoin_targets_contain_srfs=0x0000000000000000, > scanjoin_target_parallel_safe=true, tlist_same_exprs=true) at > planner.c:7696:3 > frame #4: 0x00000001052533cc > postgres`grouping_planner(root=0x000000013081ae78, tuple_fraction=0.5) > at planner.c:1611:3 > frame #5: 0x0000000105251e9c > postgres`subquery_planner(glob=0x00000001308188d8, > parse=0x000000013080caf8, parent_root=0x000000013080cc38, > hasRecursion=false, tuple_fraction=0.5) at planner.c:1062:2 > frame #6: 0x000000010526b134 > postgres`make_subplan(root=0x000000013080cc38, > orig_subquery=0x000000013080ff58, subLinkType=ANY_SUBLINK, > subLinkId=0, testexpr=0x000000013080d848, isTopQual=true) at > subselect.c:221:12 > frame #7: 0x0000000105268b8c > postgres`process_sublinks_mutator(node=0x000000013080d6d8, > context=0x000000016b0998f8) at subselect.c:1950:10 > frame #8: 0x0000000105268ad8 > postgres`SS_process_sublinks(root=0x000000013080cc38, > expr=0x000000013080d6d8, isQual=true) at subselect.c:1923:9 > frame #9: 0x00000001052527b8 > postgres`preprocess_expression(root=0x000000013080cc38, > expr=0x000000013080d6d8, kind=0) at planner.c:1169:10 > frame #10: 0x0000000105252954 > postgres`preprocess_qual_conditions(root=0x000000013080cc38, > jtnode=0x000000013080d108) at planner.c:1214:14 > frame #11: 0x0000000105251580 > postgres`subquery_planner(glob=0x00000001308188d8, > parse=0x0000000137010d68, parent_root=0x0000000000000000, > hasRecursion=false, tuple_fraction=0) at planner.c:832:2 > frame #12: 0x000000010525042c > postgres`standard_planner(parse=0x0000000137010d68, > query_string="explain (costs off)\nselect * from pg_description t1 > where objoid in\n (select objoid from pg_description t2 where > t2.description = t1.description);", cursorOptions=2048, > boundParams=0x0000000000000000) at planner.c:411:9 > > There aren't any lateral markings on the rels. Additionally the > partial path has param_info=null (I found out from Tom in a separate > thread [1] that this is only set for outer relations from the same > query level). > > The only param that I could easily find at first was a single param of > type PARAM_EXTERN in root->plan_params in make_subplan(). > > I spent a lot of time trying to figure out where we could find the > PARAM_EXEC param that's being fed into the subplan, but it doesn't > seem like we have access to any of these things at the point in the > path creation process that it's interesting to us when inserting the > gather nodes. > > Given all of that I settled on this approach: > 1. Modify is_parallel_safe() to by default ignore PARAM_EXEC params. > 2. Add is_parallel_safe_with_params() that checks for the existence of > such params. > 3. Store the required params in a bitmapset on each base rel. > 4. Union the bitmapset on join rels. > 5. Only insert a gather node if that bitmapset is empty. > > I have an intuition that there's some spot (e.g. joins) that we should > be removing params from this set (e.g., when we've satisfied them), > but I haven't been able to come up with such a scenario as yet. > > The attached v11 fixes the issue you reported.
One of the tests has failed in CFBot at [1] with: +++ /tmp/cirrus-ci-build/build/testrun/pg_upgrade/002_pg_upgrade/data/results/select_parallel.out 2023-12-20 20:08:42.480004000 +0000 @@ -137,23 +137,24 @@ explain (costs off) select (select max((select pa1.b from part_pa_test pa1 where pa1.a = pa2.a))) from part_pa_test pa2; - QUERY PLAN --------------------------------------------------------------- - Aggregate + QUERY PLAN +-------------------------------------------------------------------- + Finalize Aggregate -> Gather Workers Planned: 3 - -> Parallel Append - -> Parallel Seq Scan on part_pa_test_p1 pa2_1 - -> Parallel Seq Scan on part_pa_test_p2 pa2_2 + -> Partial Aggregate + -> Parallel Append + -> Parallel Seq Scan on part_pa_test_p1 pa2_1 + -> Parallel Seq Scan on part_pa_test_p2 pa2_2 + SubPlan 1 + -> Append + -> Seq Scan on part_pa_test_p1 pa1_1 + Filter: (a = pa2.a) + -> Seq Scan on part_pa_test_p2 pa1_2 + Filter: (a = pa2.a) SubPlan 2 -> Result - SubPlan 1 - -> Append - -> Seq Scan on part_pa_test_p1 pa1_1 - Filter: (a = pa2.a) - -> Seq Scan on part_pa_test_p2 pa1_2 - Filter: (a = pa2.a) -(14 rows) +(15 rows) More details of the failure is available at [2]. [1] - https://cirrus-ci.com/task/5685696451575808 [2] - https://api.cirrus-ci.com/v1/artifact/task/5685696451575808/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade Regards, Vignesh