On Fri, Mar 10, 2017 at 6:01 AM, Tels <nospam-pg-ab...@bloodgate.com> wrote:
> Just a question for me to understand the implementation details vs. the
> strategy:
>
> Have you considered how the scheduling decision might impact performance
> due to "inter-plan parallelism vs. in-plan parallelism"?
>
> So what would be the scheduling strategy? And should there be a fixed one
> or user-influencable? And what could be good ones?
>
> A simple example:
>
> E.g. if we have 5 subplans, and each can have at most 5 workers and we
> have 5 workers overall.
>
> So, do we:
>
>   Assign 5 workers to plan 1. Let it finish.
>   Then assign 5 workers to plan 2. Let it finish.
>   and so on
>
> or:
>
>   Assign 1 workers to each plan until no workers are left?

Currently, we do the first of those, but I'm pretty sure the second is
way better. For example, suppose each subplan has a startup cost. If
you have all the workers pile on each plan in turn, every worker pays
the startup cost for every subplan. If you spread them out, then
subplans can get finished without being visited by all workers, and
then the other workers never pay those costs. Moreover, you reduce
contention for spinlocks, condition variables, etc.

It's not impossible to imagine a scenario where having all workers pile
on one subplan at a time works out better: for example, suppose you
have a table with lots of partitions all of which are on the same disk,
and it's actually one physical spinning disk, not an SSD or a disk
array or anything, and the query is completely I/O-bound. Well, it
could be, in that scenario, that spreading out the workers is going to
turn sequential I/O into random I/O and that might be terrible.

In most cases, though, I think you're going to be better off spreading
the workers out. If the partitions are on different spindles or if
there's some slack I/O capacity for prefetching, you're going to come
out ahead, maybe way ahead. If you come out behind, then you're
evidently totally I/O-bound and have no capacity for I/O parallelism;
in that scenario, you should probably just turn parallel query off
altogether, because you're not going to benefit from it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
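
To put rough numbers on the startup-cost argument, here is a toy
simulation of the two strategies from the quoted example (5 subplans,
5 workers). It is only a sketch under simplifying assumptions and is
not how the executor works: every name in it (simulate, pile_on,
spread_out) is hypothetical, work is counted in abstract ticks, each
worker pays a fixed startup cost whenever it joins a subplan, and
I/O effects and lock contention are not modeled at all.

```python
def pile_on(remaining, assigned):
    """Strategy 1: every worker goes to the first unfinished subplan."""
    for plan, left in enumerate(remaining):
        if left > 0:
            return plan
    return None


def spread_out(remaining, assigned):
    """Strategy 2: go to the unfinished subplan with the fewest workers."""
    candidates = [p for p, left in enumerate(remaining) if left > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda p: assigned[p])


def simulate(work, num_workers, startup, choose):
    """Return (startups_paid, ticks_elapsed) for a scheduling strategy.

    work        -- work units per subplan (hypothetical numbers)
    startup     -- ticks a worker spends warming up when it joins a subplan
    choose      -- function picking the next subplan for an idle worker
    """
    remaining = list(work)             # work units left per subplan
    assigned = [0] * len(work)         # workers currently on each subplan
    current = [None] * num_workers     # subplan each worker is attached to
    warmup = [0] * num_workers         # startup ticks each worker still owes
    startups = 0
    ticks = 0
    while any(left > 0 for left in remaining):
        for w in range(num_workers):
            plan = current[w]
            if plan is None or remaining[plan] <= 0:
                # Detach from a finished subplan, then look for a new one.
                if plan is not None:
                    assigned[plan] -= 1
                    current[w] = None
                plan = choose(remaining, assigned)
                if plan is None:
                    continue           # nothing left for this worker
                current[w] = plan
                assigned[plan] += 1
                warmup[w] = startup
                startups += 1
            if warmup[w] > 0:
                warmup[w] -= 1         # still paying the startup cost
            else:
                remaining[plan] -= 1   # one unit of real work this tick
        ticks += 1
    return startups, ticks


if __name__ == "__main__":
    work = [10] * 5                    # 5 subplans, 10 work units each
    for name, strategy in (("pile on", pile_on), ("spread out", spread_out)):
        paid, elapsed = simulate(work, num_workers=5, startup=2, choose=strategy)
        print(f"{name}: {paid} startups, {elapsed} ticks")
```

With these made-up numbers, piling on pays 25 startups (every worker
visits every subplan) while spreading out pays 5, and the spread-out
run also finishes in fewer ticks. The single-spinning-disk case
described above is exactly what this model leaves out, so it says
nothing about when sequential I/O would turn into random I/O.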