On Fri, Nov 20, 2015 at 1:25 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Thu, Nov 19, 2015 at 2:59 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > Won't it be useful to consider parameterized paths for the below kind of
> > plans, where we can push the jointree to the worker and each worker can
> > scan the complete outer relation A and then the rest of the work is
> > divided among workers (of course there can be other ways to parallelize
> > such joins, but still the way described also seems to be possible)?
> >
> > NestLoop
> > -> Seq Scan on A
> > Hash Join
> >    Join Condition: B.Y = C.W
> >    -> Seq Scan on B
> >    -> Index Scan using C_Z_IDX on C
> >         Index Condition: C.Z = A.X
>
> I had thought that this sort of plan wouldn't actually occur in real
> life, but it seems that it does. What you've written here is a little
> muddled - the hash join has no hash underneath, for example, and
> there'd have to be some sort of join order restriction in order to
> consider a plan of this type. However, somewhat to my surprise, I was
> able to get a plan much like this by doing this:
..
>
> So, all in all, I think this isn't a very promising type of plan -
> both because we haven't got the infrastructure to make it safe to
> execute today, and because even if we did have that infrastructure it
> wouldn't be the right choice except in narrow circumstances.
>
I think it would be helpful to parallelize not only the above type of plan
but some other forms of joins as well (refer to the "Parameterized Paths"
section in optimizer/README), for which the parameterized-path concept will
be required. I am not sure we can say that such cases will be narrow and
leave them aside, but we certainly don't have enough infrastructure at the
moment to parallelize them.

> We can
> of course revise that decision in the future if things look different
> then.

No issues. The main reason I brought up this discussion is to explore the
possibility of keeping the logic of add_partial_path() and add_path() the
same, so that it is easier to maintain. There is no correctness issue here,
so I defer to you.

> > Because of the way the code is written, it assumes that for each
> > inheritance-child relation which has fewer pages than the threshold,
> > half of a worker's share of the work will be done by the master
> > backend, which doesn't seem to be the right distribution. Consider a
> > case where there are three such children, each having cost 100 to
> > scan; now it will cost them as 100/1.5 + 100/1.5 + 100/1.5, which
> > means that for each child it is counting 0.5 of the master backend's
> > work, which seems to be wrong.
> >
> > I think for the Append case, we should consider this cost during
> > Append path creation in create_append_path(). Basically, we can make
> > cost_seqscan ignore the cost reduction due to parallel_degree for
> > inheritance relations, and then during Append path creation we can
> > apply it, counting the master backend's work unit as 0.5 with respect
> > to the overall work.
>
> No, I don't think that's right. It's true that the way we're
> calculating parallel_degree for each relation is unprincipled right
> now, and we need to improve that. But if it were correct, then what
> we're doing here would also be correct.
> If the number of workers
> chosen for each child plan reflected the maximum number that could be
> used effectively by that child plan, then any extras wouldn't speed
> things up even if they were present,

Okay, but I think that's not what I am talking about. I am talking about
the below code in cost_seqscan:

-	if (nworkers > 0)
-		run_cost = run_cost / (nworkers + 0.5);
+	if (path->parallel_degree > 0)
+		run_cost = run_cost / (path->parallel_degree + 0.5);

It credits 50% of the master backend's effort to the scan of each child
relation separately; does that look correct to you? Shouldn't the master
backend's 0.5 share instead be counted once against the scan of all the
child relations together?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
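P.S. To make the accounting question concrete, here is a small arithmetic
sketch (Python used only for the arithmetic; the costs and degrees are
hypothetical, and append_level_cost is only my reading of the proposal
above, not actual planner code):

```python
# Sketch of the two cost accountings discussed above. Hypothetical
# numbers; this is not PostgreSQL planner code.

def per_child_cost(children):
    """Per-child discounting, as in the patched cost_seqscan: every
    child divides its run_cost by (parallel_degree + 0.5), so the
    leader is credited with half a worker's effort for each child
    separately."""
    return sum(cost / (degree + 0.5) if degree > 0 else cost
               for cost, degree in children)

def append_level_cost(children, append_degree):
    """Append-level discounting (my reading of the proposal): children
    keep their undiscounted cost, and the leader's 0.5 share is applied
    once to the total during Append path creation."""
    total = sum(cost for cost, _ in children)
    return total / (append_degree + 0.5)

# Three children costing 100 each, one worker per child.
children = [(100.0, 1), (100.0, 1), (100.0, 1)]
print(per_child_cost(children))        # ~200.0
print(append_level_cost(children, 1))  # 200.0 (identical when uniform)

# With non-uniform per-child degrees the two accountings diverge.
children2 = [(100.0, 2), (100.0, 1), (100.0, 0)]
print(per_child_cost(children2))        # 40 + 100/1.5 + 100 ~ 206.67
print(append_level_cost(children2, 2))  # 300 / 2.5 = 120.0
```

The divergence in the non-uniform case is the point of contention: per-child
discounting effectively credits the leader with 0.5 of a worker once per
child, while the proposal credits it once for the whole Append.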