On 8 September 2017 at 19:17, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Sep 8, 2017 at 3:59 PM, Amit Khandekar <amitdkhan...@gmail.com> wrote:
>> On 7 September 2017 at 11:05, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> On Thu, Aug 31, 2017 at 12:47 PM, Amit Khandekar <amitdkhan...@gmail.com>
>>> wrote:
>>> 3.
>>> +/* ----------------------------------------------------------------
>>> + * exec_append_leader_next
>>> + *
>>> + * To be used only if it's a parallel leader. The backend should scan
>>> + * backwards from the last plan. This is to prevent it from taking up
>>> + * the most expensive non-partial plan, i.e. the first subplan.
>>> + * ----------------------------------------------------------------
>>> + */
>>> +static bool
>>> +exec_append_leader_next(AppendState *state)
>>>
>>> From above explanation, it is clear that you don't want backend to
>>> pick an expensive plan for a leader, but the reason for this different
>>> treatment is not clear.
>>
>> Explained it, saying that for more workers, a leader spends more work
>> in processing the worker tuples, and less work contributing to
>> parallel processing. So it should not take expensive plans, otherwise
>> it will affect the total time to finish Append plan.
>>
>
> In that case, why can't we keep the workers also process in same
> order, what is the harm in that?
Because of the way the logic of queuing works, the workers finish
earlier if they start with the expensive plans first. For example, with
3 plans of costs 8, 4 and 4, and 2 workers w1 and w2, the workers
finish in 8 time units: w1 finishes plan 1 in 8 units, while in
parallel w2 finishes the remaining 2 plans in 8 units. Whereas if the
plans are ordered as 4, 4, 8, the workers finish in 12 time units: w1
and w2 each finish one of the first two plans in 4 units, then one of
them takes up plan 3 and finishes it in another 8 units, while the
other worker remains idle. (There is a small standalone sketch of this
scheduling logic further below.)

> Also, the leader will always scan
> the subplans from last subplan even if all the subplans are partial
> plans.

Since we already need to have two different code paths, I think we can
use the same code paths for any subplans.

> I think this will be the unnecessary difference in the
> strategy of leader and worker especially when all paths are partial.
> I think the selection of next subplan might become simpler if we use
> the same strategy for worker and leader.

Yeah, if we had a common method for both it would have been better, but
anyway we have different logic to maintain for the two.
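To make the numbers above concrete, here is a small standalone sketch
(purely illustrative, not code from the patch; the two workers and the
costs 8, 4, 4 are just the figures from the example). Each worker
simply picks up the next remaining subplan as soon as it becomes free:

/*
 * Standalone sketch (not part of the patch): simulate handing whole
 * subplans to NWORKERS workers, where each worker grabs the next
 * remaining subplan as soon as it becomes free.
 */
#include <stdio.h>

#define NWORKERS 2

static double
append_finish_time(const double *costs, int nplans)
{
    double  busy_until[NWORKERS] = {0, 0};
    double  finish = 0;
    int     i;

    for (i = 0; i < nplans; i++)
    {
        int     w = 0;
        int     j;

        /* Pick the worker that becomes free earliest. */
        for (j = 1; j < NWORKERS; j++)
            if (busy_until[j] < busy_until[w])
                w = j;

        /* That worker runs the next subplan to completion. */
        busy_until[w] += costs[i];
        if (busy_until[w] > finish)
            finish = busy_until[w];
    }
    return finish;
}

int
main(void)
{
    double  expensive_first[] = {8, 4, 4};
    double  cheap_first[] = {4, 4, 8};

    printf("expensive first: %.0f units\n",
           append_finish_time(expensive_first, 3));
    printf("cheap first:     %.0f units\n",
           append_finish_time(cheap_first, 3));
    return 0;
}

With the plans ordered 8, 4, 4 it reports 8 time units, and with
4, 4, 8 it reports 12, matching the explanation above.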
>
> Few more comments:
>
> 1.
> + else if (IsA(subpath, MergeAppendPath))
> + {
> + MergeAppendPath *mpath = (MergeAppendPath *) subpath;
> +
> + /*
> + * If at all MergeAppend is partial, all its child plans have to be
> + * partial : we don't currently support a mix of partial and
> + * non-partial MergeAppend subpaths.
> + */
> + if (is_partial)
> + return list_concat(partial_subpaths, list_copy(mpath->subpaths));
>
> In which situation partial MergeAppendPath is generated? Can you
> provide one example of such path?

Actually, currently we don't support partial paths for MergeAppendPath.
That code just has that (is_partial) condition, but at present the
condition won't be true for a MergeAppendPath.

>
> 2.
> add_paths_to_append_rel()
> {
> ..
> + /* Consider parallel append path. */
> + if (pa_subpaths_valid)
> + {
> + AppendPath *appendpath;
> + int parallel_workers;
> +
> + parallel_workers = get_append_num_workers(pa_partial_subpaths,
> + pa_nonpartial_subpaths);
> + appendpath = create_append_path(rel, pa_nonpartial_subpaths,
> + pa_partial_subpaths,
> + NULL, parallel_workers, true,
> + partitioned_rels);
> + add_partial_path(rel, (Path *) appendpath);
> + }
> +
> /*
> - * Consider an append of partial unordered, unparameterized partial paths.
> + * Consider non-parallel partial append path. But if the parallel append
> + * path is made out of all partial subpaths, don't create another partial
> + * path; we will keep only the parallel append path in that case.
> */
> - if (partial_subpaths_valid)
> + if (partial_subpaths_valid && !pa_all_partial_subpaths)
> {
> AppendPath *appendpath;
> ListCell *lc;
> int parallel_workers = 0;
>
> /*
> - * Decide on the number of workers to request for this append path.
> - * For now, we just use the maximum value from among the members. It
> - * might be useful to use a higher number if the Append node were
> - * smart enough to spread out the workers, but it currently isn't.
> + * To decide the number of workers, just use the maximum value from
> + * among the children.
> */
> foreach(lc, partial_subpaths)
> {
> @@ -1421,9 +1502,9 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
> }
> Assert(parallel_workers > 0);
>
> - /* Generate a partial append path. */
> - appendpath = create_append_path(rel, partial_subpaths, NULL,
> - parallel_workers, partitioned_rels);
> + appendpath = create_append_path(rel, NIL, partial_subpaths,
> + NULL, parallel_workers, false,
> + partitioned_rels);
> add_partial_path(rel, (Path *) appendpath);
> }
> ..
> }
>
> I think it might be better to add a sentence why we choose a different
> way to decide a number of workers in the second case
> (non-parallel-aware append).

Yes, I agree. Will do that after we conclude on your next point below ...

> Do you think non-parallel-aware Append
> will be better in any case when there is a parallel-aware append? I
> mean to say let's try to create non-parallel-aware append only when
> parallel-aware append is not possible.

By non-parallel-aware append, I am assuming you mean the partial
non-parallel-aware Append. Yes, if the parallel-aware Append path ends
up with *all* partial subpaths chosen, then we do omit the partial
non-parallel Append path, as seen in this code in the patch:

/*
 * Consider non-parallel partial append path. But if the parallel append
 * path is made out of all partial subpaths, don't create another partial
 * path; we will keep only the parallel append path in that case.
 */
if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
......
}

But if the parallel Append path has a mix of partial and non-partial
subpaths, then we can't really tell which of the two would be cheapest
until we calculate the cost; the non-parallel-aware partial Append can
turn out to be cheaper as well.

>
> 3.
> + * evaluates to a value just a bit greater than max(w1,w2, w3). So, we
>
> The spacing between w1, w2, w3 is not same.

Right, will note this down for the next updated patch.

>
> 4.
> - select count(*) from a_star;
> -select count(*) from a_star;
> + select round(avg(aa)), sum(aa) from a_star;
> +select round(avg(aa)), sum(aa) from a_star;
>
> Why you have changed the existing test. It seems count(*) will also
> give what you are expecting.

Needed this to cover some data testing with Parallel Append execution.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers