On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
> On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund <and...@2ndquadrant.com>
> wrote:
> > A query whose runtime is dominated by a sequential scan (+ attached
> > filter) is certainly going to require a bigger prefetch size than one
> > that does other expensive stuff.
> >
> > Imagine parallelizing
> > SELECT * FROM largetable WHERE col = low_cardinality_value;
> > and
> > SELECT *
> > FROM largetable JOIN gigantic_table ON (index_nestloop_condition)
> > WHERE col = high_cardinality_value;
> >
> > The first query will be a simple sequential scan, and disk reads on
> > largetable will be the major cost of executing it. In contrast, the
> > second query might very well sensibly be planned as a parallel
> > sequential scan with the nested loop executing in the same worker. But
> > the cost of the sequential scan itself will likely be completely
> > drowned out by the nestloop execution - index probes are
> > expensive/unpredictable.
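To illustrate the second case, a hand-drawn sketch of the plan shape being
described (not actual planner output; the node names follow later
PostgreSQL parallel-query releases and are only indicative):

    Gather
      -> Nested Loop
           -> Parallel Seq Scan on largetable
                Filter: (col = high_cardinality_value)
           -> Index Scan on gigantic_table
                Index Cond: (index_nestloop_condition)

Each worker would execute the whole Nested Loop subtree over its share of
largetable, so its runtime is dominated by the per-tuple index probes into
gigantic_table rather than by the scan's I/O.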
> I think the work/task given to each worker should be as granular
> as possible to make it more predictable.
> I think the better way to parallelize such work (a join query) is that
> the first worker does the sequential scan and filtering on the large
> table and then passes it to the next worker for doing the join with
> gigantic_table.

I'm pretty sure that'll result in rather horrible performance. IPC is
rather expensive; you want to do as little of it as possible.

> > > I think it makes sense to think of a set of tasks in which workers can
> > > assist.  So you have a query tree which is just one query tree, with no
> > > copies of the nodes, and then there are certain places in that query
> > > tree where a worker can jump in and assist that node.  To do that, it
> > > will have a copy of the node, but that doesn't mean that all of the
> > > stuff inside the node becomes shared data at the code level, because
> > > that would be stupid.
> >
> > My only "problem" with that description is that I think workers will
> > have to work on more than one node - it'll be entire subtrees of the
> > executor tree.
>
> There could be some cases where it could be beneficial for a worker
> to process a sub-tree, but I think there will be more cases where
> it will just work on a part of a node and send the result back to either
> the master backend or another worker for further processing.

I think many parallelism projects start out that way, and then notice
that it doesn't parallelize very efficiently. The most extreme, but
common, example is aggregation over large amounts of data: unless you
want to ship huge amounts of data between processes, parallelizing it
means doing the sequential scan and the pre-aggregate step (which
e.g. computes count() and sum() to implement an avg() over all the
workers) inside one worker.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
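As a rough sketch of that pre-aggregation pattern (illustrative SQL only;
the price column and the worker_partials table are invented for the
example, and the per-worker chunking is only indicated in a comment):

    -- Each worker pre-aggregates its own chunk of largetable locally, e.g.
    --   SELECT count(price) AS cnt, sum(price) AS total
    --   FROM largetable WHERE <this worker's chunk>;
    -- so only the tiny partial rows cross the process boundary.
    CREATE TEMP TABLE worker_partials (cnt bigint, total numeric);
    INSERT INTO worker_partials VALUES (1000000, 42000000), (998000, 41500000);

    -- The leader then combines the handful of partials into the final avg:
    SELECT sum(total) / sum(cnt) AS avg_price FROM worker_partials;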