On Thu, Mar 16, 2017 at 6:27 AM, Amit Khandekar <amitdkhan...@gmail.com> wrote: > Attached is an updated patch v7, which does the above.
Some comments: - You've added a GUC (which is good) but not documented it (which is bad) or added it to postgresql.conf.sample (also bad). - You've used a loop inside a spinlock-protected critical section, which is against project policy. Use an LWLock; define and document a new builtin tranche ID. - The comment for pa_finished claims that it is the number of workers executing the subplan, but it's a bool, not a count; I think this comment is just out of date. - paths_insert_sorted_by_cost() is a hand-coded insertion sort. Can't we find a way to use qsort() for this instead of hand-coding a slower algorithm? I think we could just create an array of the right length, stick each path into it from add_paths_to_append_rel, and then qsort() the array based on <is-partial, total-cost>. Then the result can be turned into a list. - Maybe the new helper functions in nodeAppend.c could get names starting with exec_append_, to match the style of exec_append_initialize_next(). - There's a superfluous whitespace change in add_paths_to_append_rel. - The substantive changes in add_paths_to_append_rel don't look right either. It's not clear why accumulate_partialappend_subpath is getting called even in the non-enable_parallelappend case. I don't think the logic for the case where we're not generating a parallel append path needs to change at all. - When parallel append is enabled, I think add_paths_to_append_rel should still consider all the same paths that it does today, plus one extra. The new path is a parallel append path where each subpath is the cheapest subpath for that childrel, whether partial or non-partial. If !enable_parallelappend, or if all of the cheapest subpaths are partial, then skip this. (If all the cheapest subpaths are non-partial, it's still potentially useful.) In other words, don't skip consideration of parallel append just because you have a partial path available for every child rel; it could be - I think the way cost_append() works is not right. What you've got assumes that you can just multiply the cost of a partial plan by the parallel divisor to recover the total cost, which is not true because we don't divide all elements of the plan cost by the parallel divisor -- only the ones that seem like they should be divided. Also, it could be smarter about what happens with the costs of non-partial paths. I suggest the following algorithm instead. 1. Add up all the costs of the partial paths. Those contribute directly to the final cost of the Append. This ignores the fact that the Append may escalate the parallel degree, but I think we should just ignore that problem for now, because we have no real way of knowing what the impact of that is going to be. 2. Next, estimate the cost of the non-partial paths. To do this, make an array of Cost of that length and initialize all the elements to zero, then add the total cost of each non-partial plan in turn to the element of the array with the smallest cost, and then take the maximum of the array elements as the total cost of the non-partial plans. Add this to the result from step 1 to get the total cost. - In get_append_num_workers, instead of the complicated formula with log() and 0.693, just add the list lengths and call fls() on the result. Integer arithmetic FTW! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers