On Sat, Apr 25, 2020 at 7:25 AM David Rowley <dgrowle...@gmail.com> wrote:
> I looked at this again and I don't think what I've got is right.
Apart from the considerations which you raised, I think there's also a
costing issue here. For instance, suppose we have an Append with three
children. Two of them have partial paths which will complete after
consuming 1 second of CPU time, and using a partial path is
unquestionably the best strategy for those children. The third one has
a partial path which will complete after using 10 seconds of CPU time,
and a non-partial path which will complete after using 8 seconds of
CPU time. What strategy is best? If we pick the non-partial path, the
time that the whole thing takes to complete will be limited by the
non-partial path, which, since it cannot use parallelism, will take 8
seconds of wall clock time. If we pick the partial path, we have a
total of 12 seconds of CPU time, which we can finish in 6 wall clock
seconds with 2 participants, 4 seconds with 3 participants, or 3
seconds with 4 participants. This is clearly a lot better.

Incidentally, the planner knows about the fact that you can't finish
an Append faster than you can finish its non-partial participants. See
append_nonpartial_cost().

Now, I don't think anything necessarily gets broken here by your patch
as written. Planner decisions are made on estimated cost, which is a
proxy for wall clock time, not CPU time. Right now, we only prefer the
non-partial path if we think it can be executed in less time by ONE
worker than the partial path could be executed by ALL workers:

 * Either we've got only a non-partial path, or we think that
 * a single backend can execute the best non-partial path
 * faster than all the parallel backends working together can
 * execute the best partial path.

So, in the above example, we wouldn't even consider the non-partial
path. Even say we just have 2 workers. The partial path will have a
cost of 6, which is less than 8, so the non-partial path gets
rejected. But as the comment goes on to note:

 * It might make sense to be more aggressive here.
 * Even if
 * the best non-partial path is more expensive than the best
 * partial path, it could still be better to choose the
 * non-partial path if there are several such paths that can
 * be given to different workers. For now, we don't try to
 * figure that out.

This kind of strategy is particularly appealing when the number of
Append children is large compared to the number of workers. For
instance, if you have an Append with 100 children and you are planning
with 4 workers, it's probably a pretty good idea to be very aggressive
about picking the path that uses the least resources, which the
current algorithm would not do. You're unlikely to end up with idle
workers, because you have so many plans to which they can be
allocated. However, it's possible: it could be that there's one child
which is way bigger than all the others, and the really important
thing is to get a partial path for that child, so that it can be
attacked in parallel, even if that means that overall resource
consumption is higher. As the number of children decreases relative to
the number of workers, having partial paths becomes increasingly
appealing. To take a degenerate case, suppose you have 4 workers but
only 2 children. Partial paths should look really appealing, because
the alternative is leaving workers unused.

I *think* your patch is based around the idea of merging the
turn-the-append-of-partial-paths-into-a-parallel-append case with the
consider-a-mix-of-partial-and-nonpartial-paths-for-parallel-append
case. That seems possible to do given that the current heuristic is to
compare the raw path costs, but I think that it wouldn't work if we
wanted to be more aggressive about considering the mixed strategy and
letting the costing machinery sort out which way is better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
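[Editor's note: the wall-clock arithmetic in the example above can be
sketched with a tiny model. This is hypothetical illustration code,
not PostgreSQL's costing machinery; the function name `wall_clock`,
the child CPU costs, and the worker counts are taken from or invented
for the example in the message.]

```python
def wall_clock(partial_cpu_secs, nonpartial_cpu_secs, workers):
    """Lower-bound wall-clock estimate for a parallel Append.

    Partial children's CPU work is assumed to divide evenly across all
    workers, but the Append can never finish before its slowest
    non-partial child, which is pinned to a single worker (cf. the
    observation about append_nonpartial_cost()).
    """
    divisible = sum(partial_cpu_secs) / workers
    serial_floor = max(nonpartial_cpu_secs, default=0.0)
    return max(divisible, serial_floor)


# The example from the message: two children need 1 CPU second each,
# and a third needs 10s via its partial path or 8s via its non-partial path.

# Strategy 1: pick the non-partial path for the third child.
# Wall clock is limited by the 8-second serial child.
print(wall_clock([1, 1], [8], workers=2))     # -> 8.0

# Strategy 2: pick partial paths everywhere (12 CPU seconds total).
print(wall_clock([1, 1, 10], [], workers=2))  # -> 6.0
print(wall_clock([1, 1, 10], [], workers=3))  # -> 4.0
print(wall_clock([1, 1, 10], [], workers=4))  # -> 3.0
```

Under this model, the partial-everywhere strategy wins at every worker
count shown, which is the point of the example: the non-partial path
looks cheaper in isolation (8 < 10) but caps the whole Append at 8
seconds of wall clock time.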