On Wed, May 21, 2025, at 10:55 AM, Scott Mead wrote:
>
> On Wed, May 21, 2025, at 3:50 AM, Laurenz Albe wrote:
> > On Tue, 2025-05-20 at 16:58 -0400, Scott Mead wrote:
> > > On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote:
> > > > On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote:
> > > > > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> > > > > > On Tue, May 13, 2025 at 4:37 PM Scott Mead <sc...@meads.us> wrote:
> > > > > > > I'll open by proposing that we prevent the planner from
> > > > > > > automatically selecting parallel plans by default.
> > > > > > >
> > > > > > > What is the fallout?  When a high-volume, low-latency query flips
> > > > > > > to parallel execution on a busy system, we end up in a situation
> > > > > > > where the database is effectively DDoSing itself with a very high
> > > > > > > rate of connection establishment and tear-down requests.
> > > >
> > > > You are painting a bleak picture indeed.  I get to see PostgreSQL
> > > > databases in trouble regularly, but I have not seen anything like
> > > > what you describe.
> > > >
> > > > With an argument like that, you may as well disable nested loop joins.
> > > > I have seen enough cases where disabling nested loop joins, without
> > > > any deeper analysis, made very slow queries reasonably fast.
> > >
> > > My argument is that parallel query should not be allowed to be invoked
> > > without user intervention.  Yes, nested loops can have a similar impact,
> > > but let's take a look at the breakdown at scale of PQ:
> > >
> > > [pgbench run that shows that parallel query is bad for throughput]
> >
> > I think that your experiment is somewhat misleading.  Sure, if you
> > overload the machine with parallel workers, that will eventually also
> > harm the query response time.  But many databases out there are not
> > overloaded, and the shorter response time that parallel query offers
> > makes many users happy.
>
> It's not intended to be misleading, sorry for that.  I agree that PQ can
> have a positive effect; the point is that our current defaults will take
> a basic workload on a modest (16-CPU) box and quickly swamp it at a
> concurrency of 5, which is counter-intuitive, hard to debug, and usually
> not desired (again, in the case of a plan that silently invokes
> parallelism).
>
> FWIW, setting max_parallel_workers_per_gather to 0 by default only
> disables automatic PQ selection behind a SIGHUP (or with a user context);
> users can easily re-enable it if they want without having to restart
> (similar to parallel_setup_cost, but without the uncertainty).
>
> During my testing, I actually found (again, at concurrency = 5) that the
> default max_parallel_workers and max_worker_processes of 8 is not high
> enough.  If the default max_parallel_workers_per_gather is 0, then we'd
> be able to crank those defaults up (especially max_worker_processes,
> which requires a restart).
>
> > It is well known that what is beneficial for response time is
> > detrimental for the overall throughput and vice versa.
>
> It is well-known.  What's not is that the Postgres defaults will quickly
> swamp a machine with parallelism.  That's a lesson that many only learn
> after it's happened to them.  ISTM that the better path is to let someone
> try to optimize with parallelism rather than have to fight with it during
> an emergent event.
>
> IOW: I'd rather know that I'm walking into a marsh with rattlesnakes than
> find out after I've been bitten.
>
> > Now parallel query clearly is a feature that is good for response time
> > and bad for throughput, but that is not necessarily wrong.
>
> Agreed, I do like and use parallel query.  I just don't think it's wise
> that we allow the planner to make that decision on a user's behalf when
> the overhead is this high and the concurrency behavior falls apart so
> spectacularly fast.
>
> > Essentially, you are arguing that the default configuration should
> > favor throughput over response time.
>
> That's one take on it.  I'm actually saying that the default
> configuration should protect medium-sized systems from unintended
> behavior that quickly degrades performance while being very hard to
> identify and quantify.
>
> > > Going back to the original commit which enabled PQ by default [1], it
> > > was done so that the feature would be tested during beta.  I think
> > > it's time that we limit the accidental impact this can have on users
> > > by disabling the feature by default.
> >
> > I disagree.
> > My experience is that parallel query often improves the user
> > experience.  Sure, there are cases where I recommend disabling it, but
> > I think that disabling it by default would be a move in the wrong
> > direction.
> >
> > On the other hand, I have also seen cases where bad estimates trigger
> > parallel query by mistake, making queries slower.  So I'd support an
> > effort to increase the default value for "parallel_setup_cost".
>
> I'm open to discussing a value for parallel_setup_cost that protects
> users from runaway behavior; I just haven't been able to find a value
> that protects users while simultaneously allowing those who want
> automatic parallel-plan selection to take advantage of it.
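To make the opt-in concrete, here is an illustrative sketch of what re-enabling would look like under the proposed default (the role name below is hypothetical; no server restart is needed because max_parallel_workers_per_gather is a user-settable GUC):

```sql
-- Proposed shipped default (postgresql.conf):
--   max_parallel_workers_per_gather = 0

-- A user can opt back in for the current session only:
SET max_parallel_workers_per_gather = 2;

-- Or persist the opt-in for a particular role
-- ("reporting" is a hypothetical role name):
ALTER ROLE reporting SET max_parallel_workers_per_gather = 2;
```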
I'd like to re-open the discussion for this commitfest item.  I still have
not been able to find a value for parallel_setup_cost that makes good
decisions about parallelism on a user's behalf.  I believe that setting
the SIGHUP-able max_parallel_workers_per_gather to 0 by default is still
the best way to prevent runaway parallel execution behavior.

> What I've found (and it sounds somewhat similar to what you are saying)
> is that if you use parallelism intentionally and design for it (hardware,
> concurrency model, etc.) it's very, very powerful.  In cases where it
> 'just kicks in', I haven't seen an example that makes users happy.
>
> > Yours,
> > Laurenz Albe
>
> --
> Scott Mead
> Amazon Web Services
> sc...@meads.us

--
Scott Mead
sc...@meads.us