On Tue, Feb 14, 2017 at 4:24 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On further evaluation, it seems this patch has one big problem which
> is that it will allow forming parallel plans which can't be supported
> with current infrastructure.  For ex. marking immediate level params
> as parallel safe can generate below type of plan:
>
> Seq Scan on t1
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Gather
>            Workers Planned: 1
>            ->  Result
>                  One-Time Filter: (t1.k = 0)
>                  ->  Parallel Seq Scan on t2
>
>
> In this plan, we can't evaluate one-time filter (that contains
> correlated param) unless we have the capability to pass all kind of
> PARAM_EXEC param to workers.  I don't want to invest too much time in
> this patch unless somebody can see some way using current parallel
> infrastructure to implement correlated subplans.

I don't think this approach has much chance of working; it just seems
too simplistic.  I'm not entirely sure what the right approach is.
Unfortunately, the current query planner code seems to compute the
sets of parameters that are set and used quite late, and really only
on a per-subquery level.  Here we need to know whether there is
anything that's set below the Gather node and used above it, or the
other way around, and we need to know it much earlier, while we're
still doing path generation.  There doesn't seem to be any simple way
of getting that information, but I think you need it.

What's more, I think you would still need it even if you had the
ability to pass parameter values between processes.  For example,
consider:

Gather
-> Parallel Seq Scan
     Filter: (Correlated Subplan Reference Goes Here)

Of course, the Param in the filter condition *can't* be a shared Param
across all processes.  It needs to be private to each process
participating in the parallel sequential scan -- and the params
passing data down from the Parallel Seq Scan to the correlated subplan
also need to be private.  On the other hand, in your example quoted
above, you do need to share across processes: the value for t1.k needs
to get passed down.  So it seems to me that we somehow need to
identify, for each parameter that gets used, whether it's provided by
something beneath the Gather node (in which case it should be private
to the worker) or whether it's provided from higher up (in which case
it should be passed down to the worker, or if we can't do that, then
don't use parallelism there).

(There are also possibly a couple of other cases, like an initPlan
that needs to get executed only once, and also maybe a case where a
parameter is set below the Gather and later used above the Gather.
Not sure if that latter one can happen, or how to deal with it.)

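To make the classification concrete, here is a rough sketch in Python.
It is a toy model, not planner code: PlanNode, collect(),
classify_params() and the param name "$0" are all made up for
illustration, and it assumes each param id has exactly one setter.  It
only shows the bookkeeping (what is set or used below a Gather versus
above it) and ignores the initPlan case entirely.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Toy plan node; 'sets' and 'uses' hold hypothetical param ids."""
    name: str
    sets: frozenset = frozenset()
    uses: frozenset = frozenset()
    children: list = field(default_factory=list)


def collect(node):
    """Return (params set, params used) anywhere in the subtree at node."""
    set_here, used_here = set(node.sets), set(node.uses)
    for child in node.children:
        s, u = collect(child)
        set_here |= s
        used_here |= u
    return set_here, used_here


def classify_params(gather, set_above, used_above):
    """Classify params relative to one Gather node.

    set_above / used_above are the params set / used by plan nodes
    outside the Gather's subtree (supplied by the caller in this sketch).
    """
    set_below, used_below = set(), set()
    for child in gather.children:
        s, u = collect(child)
        set_below |= s
        used_below |= u
    return {
        # set and used beneath the Gather: private to each process
        "private": used_below & set_below,
        # used beneath, provided from higher up: must be passed down
        "pass_down": (used_below - set_below) & set_above,
        # set beneath, used above: the unclear case
        "set_below_used_above": set_below & used_above,
    }


# Plan 1 (from the quoted mail): t1.k is set by the outer Seq Scan on t1,
# above the Gather, and used by the Result's one-time filter below it.
gather1 = PlanNode("Gather", children=[
    PlanNode("Result", uses=frozenset({"t1.k"}), children=[
        PlanNode("Parallel Seq Scan on t2")])])
plan1 = classify_params(gather1, set_above={"t1.k"}, used_above=set())

# Plan 2: the parallel scan sets a hypothetical param "$0" for its
# correlated subplan, so both setter and user sit below the Gather.
gather2 = PlanNode("Gather", children=[
    PlanNode("Parallel Seq Scan", sets=frozenset({"$0"}), children=[
        PlanNode("SubPlan", uses=frozenset({"$0"}))])])
plan2 = classify_params(gather2, set_above=set(), used_above=set())
```

In this model plan1 puts t1.k in the "pass_down" bucket and plan2 puts
"$0" in the "private" bucket.  The hard part in the real planner is
that set_above and used_above would have to be known during path
generation, which is exactly the information that isn't available that
early today.
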
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company