On Mon, Nov 30, 2015 at 7:47 AM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> "Asynchronous execution" is a feature to start substantial work
> of nodes before doing Exec*. This can reduce total startup time
> by folding startup time of multiple execution nodes. Especially
> effective for the combination of joins or appends and their
> multiple children that needs long time to startup.
>
> This patch does that by inserting another phase "Start*" between
> ExecInit* and Exec* to launch parallel processing including
> pgworker and FDWs before requesting the very first tuple of the
> result.

I have thought about this, too, but I'm not very convinced that this is the right model. In a typical case involving parallelism, you hope to have the Gather node as close to the top of the plan tree as possible. Therefore, the start phase will not happen much before the first execution of the node, and you don't get much benefit. Moreover, I think that prefetching can be useful not only at the start of the query - which is the only thing that your model supports - but also in mid-query. For example, consider an Append of two ForeignScan nodes. Ideally we'd like to return the results in the order that they become available, rather than serially. This model might help with that for the first batch of rows you fetch, but not after that.

There are a couple of other problems here that are specific to this example. You get a benefit here because you've got two Gather nodes that both get kicked off before we try to read tuples from either, but that's generally something to avoid - you can only use 3 processes and typically at most 2 of those will actually be running (as opposed to sleeping) at the same time: the workers will run to completion, and then the leader will wake up and do its thing. I'm not saying our current implementation of parallel query scales well to a large number of workers (it doesn't) but I think that's more about improving the implementation than any theoretical problem, so this seems a little worse. Also, currently, both merge and hash joins have an optimization wherein if the outer side of the join turns out to be empty, we avoid paying the startup cost for the inner side of the join; kicking off the work on the inner side of the merge join asynchronously before we've gotten any tuples from the outer side loses the benefit of that optimization.

I suspect there is no single paradigm that will help with all of the cases where asynchronous execution is useful. We're going to need a series of changes that are targeted at specific problems.
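
To make the empty-outer optimization mentioned above concrete, here is a minimal sketch in Python rather than executor C. It is only an illustration under stated assumptions: hash_join and start_inner are hypothetical names, rows are plain tuples joined on their first column, and nothing here mirrors the actual PostgreSQL executor code. The point is just that the inner side is started lazily, after the outer side has produced its first tuple, which is exactly what an eager asynchronous start of the inner side would give up.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def hash_join(
    outer: Iterable[Row],
    start_inner: Callable[[], Iterable[Row]],
) -> Iterator[Tuple[Row, Row]]:
    """Join on the first column (hypothetical sketch, not executor code).

    The inner side is started only after the outer side has produced
    at least one tuple.
    """
    outer_iter = iter(outer)
    try:
        first = next(outer_iter)
    except StopIteration:
        # Outer side is empty: the inner startup cost is never paid.
        return

    # Only now start the inner side and build the hash table from it.
    table: Dict[int, List[Row]] = {}
    for row in start_inner():
        table.setdefault(row[0], []).append(row)

    # Probe the table with every outer tuple, including the first one.
    for o in chain([first], outer_iter):
        for i in table.get(o[0], []):
            yield (o, i)
```

With an empty outer input, start_inner is never called. If the inner side were kicked off asynchronously before the first outer tuple arrived, that call (and its cost) would have happened regardless.
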
For example, here it would be useful to have one side of the join confirm at the earliest possible stage that it will definitely return at least one tuple eventually, but then return control to the caller so that we can kick off the other side of the join. The sort node never eliminates anything, so as soon as the sequential scan underneath it coughs up a tuple, we're definitely getting a return value eventually. At that point it's safe to kick off the other Gather node. I don't quite know how to design a signalling system for that, but it could be done. But is it important enough to be worthwhile? Maybe, maybe not. I think we should be working toward a world where the Gather is at the top of the plan tree as often as possible, in which case asynchronously kicking off a Gather node won't be that exciting any more - see notes on the "parallelism + sorting" thread where I talk about primitives that would allow massively parallel merge joins, rather than 2 or 3 way parallel.

From my point of view, the case where we really need some kind of asynchronous execution solution is a ForeignScan, and in particular a ForeignScan which is the child of an Append. In that case it's obviously really useful to be able to kick off all the foreign scans and then return a tuple from whichever one coughs it up first. Is that the ONLY case where asynchronous execution is useful? Probably not, but I bet it's the big one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
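
As a rough illustration of the Append-over-ForeignScan case described above, the following Python asyncio sketch kicks off every child scan and collects rows from whichever child produces one first, throughout the scan and not only for the first batch. Everything in it is an assumption made for illustration: foreign_scan, next_row and async_append are hypothetical names, the per-row delays stand in for remote-server latency, and none of this corresponds to an existing executor or FDW API.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Sequence, Tuple

_DONE = object()  # sentinel: the child scan has no more rows


async def foreign_scan(
    name: str, delays: Sequence[float]
) -> AsyncIterator[Tuple[str, int]]:
    """Hypothetical stand-in for a ForeignScan: each row arrives after a delay."""
    for i, delay in enumerate(delays):
        await asyncio.sleep(delay)
        yield (name, i)


async def next_row(scan: AsyncIterator[Any]) -> Any:
    """Fetch one row from a child scan, or _DONE when it is exhausted."""
    try:
        return await scan.__anext__()
    except StopAsyncIteration:
        return _DONE


async def async_append(children: List[AsyncIterator[Any]]) -> List[Any]:
    """Collect rows from all children in the order they become available."""
    # Kick off one outstanding fetch per child before reading anything.
    pending: Dict[asyncio.Task, AsyncIterator[Any]] = {
        asyncio.create_task(next_row(c)): c for c in children
    }
    rows: List[Any] = []
    while pending:
        done, _ = await asyncio.wait(
            set(pending), return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            child = pending.pop(task)
            row = task.result()
            if row is _DONE:
                continue  # this child is finished; do not re-arm it
            rows.append(row)
            # Re-arm the child so it keeps working while others are read.
            pending[asyncio.create_task(next_row(child))] = child
    return rows
```

Running it with, say, asyncio.run(async_append([foreign_scan("A", [0.05, 0.1]), foreign_scan("B", [0.01, 0.08])])) should interleave rows from the two scans, whereas a serial Append would return all of A's rows before any of B's. Note that this sketch always keeps one fetch outstanding per child, which is a design choice of the example and not something the discussion above prescribes.
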