On 10 November 2014 15:57, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Oct 15, 2014 at 2:55 PM, Simon Riggs <si...@2ndquadrant.com> wrote: >> Something usable, with severe restrictions, is actually better than we >> have now. I understand the journey this work represents, so don't be >> embarrassed by submitting things with heuristics and good-enoughs in >> it. Our mentor, Mr.Lane, achieved much by spreading work over many >> releases, leaving others to join in the task. > > It occurs to me that, now that the custom-scan stuff is committed, it > wouldn't be that hard to use that, plus the other infrastructure we > already have, to write a prototype of parallel sequential scan.
+1 > Given > where we are with the infrastructure, there would be a number of > unhandled problems, such as deadlock detection (needs group locking or > similar), assessment of quals as to parallel-safety (needs > proisparallel or similar), general waterproofing to make sure that > pushing down a qual we shouldn't does do anything really dastardly > like crash the server (another written but yet-to-be-published patch > adds a bunch of relevant guards), and snapshot sharing (likewise). > But if you don't do anything weird, it should basically work. If we build something fairly restricted, but production ready we can still do something useful to users. * only functions marked as "CONTAINS NO SQL" * parallel_workers = 2 or fixed * only one Plan type, designed to maximise benefit and minimize optimizer change Why? * The above is all that is needed to test any infrastructure changes we accept. We need both infrastructure and tests. We could do this as a custom scan plugin, but why not make it work in core - that doesn't prevent a plugin version also if you desire it. * only functions marked as "CONTAINS NO SQL" We don't really know what proisparallel is, but we do know what CONTAINS NO SQL means and can easily check for it. Plus I already have a patch for this, slightly bitrotted. * parallel_workers = 2 (or at least not make it user settable) By fixing the number of workers at 2 we avoid any problems caused by having N variable, such as how to vary N fairly amongst users and other such considerations. We get the main benefit of parallelism, without causing other issues across the server. * Fixed Plan: aggregate-scan To make everything simpler, allow only plans of a single type. SELECT something, list of aggregates FROM foo WHERE filters GROUP BY something because we know that passing large amounts of data from worker to master process will be slow, so focusing only on seq scan is not sensible; we should focus on plans that significantly reduce the number of rows passed upwards. We could just do this for very selective WHERE clauses, but that is not an important class of query. As soon as include aggregates, we reduce data passing significantly AND we hit a very important subset of queries: This plan type is widely used in reporting queries, so will hit the mainline of BI applications and many Mat View creations. This will allow SELECT count(*) FROM foo to go faster also. The execution plan for that query type looks like this... Hash Aggregate Gather From Workers {Worker Nodes workers = 2 HashAggregate PartialScan} Which is simple enough that we include a mechanism for the Gather operation, plus it is simple enough to not need extensive optimizer changes - just changes to set_plain_rel_pathlist() to consider parallel plans. Plan costs are easily to calculate for the above... cpu_worker_startup_cost = 100 -- Startup cost is easy to calculate by observation, but a reasonably large default will be OK cpu_ipc_tuple_cost = 0.1 -- assume x10 normal cost of cpu_tuple_cost Partial scan costs are just same as SeqScan, just with fewer blocks. All other costs are the same We can submit the main patch by Dec 15, fix all the problems by Feb 15. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers