On Tue, Feb 17, 2015 at 11:22 AM, Andres Freund <and...@2ndquadrant.com> wrote: >> I still think this belongs in heapam.c somehow or other. If the logic >> is all in the executor, then it becomes impossible for any code that >> doensn't use the executor to do a parallel heap scan, and that's >> probably bad. It's not hard to imagine something like CLUSTER wanting >> to reuse that code, and that won't be possible if the logic is up in >> some higher layer. > > Yea. > >> If the logic we want is to start with a large chunk size and then >> switch to a small chunk size when there's not much of the relation >> left to scan, there's still no reason that can't be encapsulated in >> heapam.c. > > I don't mind having some logic in there, but I think you put in too > much. The snapshot stuff should imo go, and the next page logic should > be caller provided.
If we need to provide a way for the caller to provide the next-page logic, then I think that should be done via configuration arguments or flags, not a callback. There's just no way that the needs of the executor are going to be so radically different from a utility command that only a callback will do. >> I think it makes sense to think of a set of tasks in which workers can >> assist. So you a query tree which is just one query tree, with no >> copies of the nodes, and then there are certain places in that query >> tree where a worker can jump in and assist that node. To do that, it >> will have a copy of the node, but that doesn't mean that all of the >> stuff inside the node becomes shared data at the code level, because >> that would be stupid. > > My only "problem" with that description is that I think workers will > have to work on more than one node - it'll be entire subtrees of the > executor tree. Amit and I had a long discussion about this on Friday while in Boston together. I previously argued that the master and the slave should be executing the same node, ParallelSeqScan. However, Amit argued persuasively that what the master is doing is really pretty different from what the worker is doing, and that they really ought to be running two different nodes. This led us to cast about for a better design, and we came up with something that I think will be much better. The basic idea is to introduce a new node called Funnel. A Funnel node will have a left child but no right child, and its job will be to fire up a given number of workers. Each worker will execute the plan which is the left child of the funnel. The funnel node itself will pull tuples from all of those workers, and can also (if there are no tuples available from any worker) execute the plan itself. So a parallel sequential scan will look something like this: Funnel Workers: 4 -> Partial Heap Scan on xyz What this is saying is that each worker is going to scan part of the heap for xyz; together, they will scan the whole thing. The neat thing about this way of separating things out is that we can eventually write code to push more stuff down into the funnel. For example, consider this: Nested Loop -> Seq Scan on foo -> Index Scan on bar Index Cond: bar.x = foo.x Now, if a parallel sequential scan is cheaper than a regular sequential scan, we can instead do this: Nested Loop -> Funnel -> Partial Heap Scan on foo -> Index Scan on bara Index Cond: bar.x = foo.x The problem with this is that the nested loop/index scan is happening entirely in the master. But we can have logic that fixes that by knowing that a nested loop can be pushed through a funnel, yielding this: Funnel -> Nested Loop -> Partial Heap Scan on foo -> Index Scan on bar Index Cond: bar.x = foo.x Now that's pretty neat. One can also imagine doing this with aggregates. Consider: HashAggregate -> Funnel -> Partial Heap Scan on foo Filter: x = 1 Here, we can't just push the HashAggregate through the filter, but given infrastructure for we could convert that to something like this: HashAggregateFinish -> Funnel -> HashAggregatePartial -> Partial Heap Scan on foo Filter: x = 1 That'd be swell. You can see that something like this will also work for breaking off an entire plan tree and shoving it down into a worker. The master can't participate in the computation in that case, but it's otherwise the same idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers