On 1/10/25 19:00, James Hunter wrote:
> ...
>
> **Proposal:**
>
> I propose that we add a “query_work_mem” GUC, which works by
> distributing (using some algorithm to be described in a follow-up
> email) the entire “query_work_mem” to the query’s operators. And then
> each operator will spill when it exceeds its own work_mem limit. So
> we’ll preserve the existing “spill” logic as much as possible.
>
> To enable this to-be-described algorithm, I would add an “nbytes”
> field to the Path struct, and display this (and related info) in
> EXPLAIN PLAN. So the customer will be able to see how much work_mem
> the SQL compiler thinks they’ll need, per operator; and so will the
> algorithm.
>
> I wouldn’t change the existing planning logic (at least not in the
> initial implementation). So the existing planning logic would choose
> between different SQL operators, still on the assumption that every
> operator that needs working memory will get work_mem [*
> hash_mem_multiplier].
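Before commenting, let me write down the naive version of the proposed
distribution, just so we have something concrete to point at. This is
entirely my invention (the proposal deliberately leaves the actual
algorithm for a follow-up email): guarantee each operator the current
work_mem, and split whatever is left of query_work_mem in proportion to
the nbytes estimates.

/*
 * Toy sketch, not from any patch: distribute query_work_mem across the
 * query's operators, given their per-operator nbytes estimates (taken
 * from the Path nodes). All values are in kB, and hash_mem_multiplier
 * is ignored for simplicity. Every operator gets at least min_work_mem
 * (i.e. today's work_mem); any remaining budget is split in proportion
 * to nbytes. If the budget doesn't even cover the minimums, this
 * degenerates to the current per-node behavior.
 */
static void
distribute_query_work_mem(const double *nbytes, int nops,
                          long query_work_mem, long min_work_mem,
                          long *limits)
{
    double  total_nbytes = 0.0;
    long    remaining = query_work_mem - (long) nops * min_work_mem;
    int     i;

    for (i = 0; i < nops; i++)
        total_nbytes += nbytes[i];

    for (i = 0; i < nops; i++)
    {
        /* the backward-compatible minimum */
        limits[i] = min_work_mem;

        /* split the leftover budget in proportion to the estimates */
        if (remaining > 0 && total_nbytes > 0)
            limits[i] += (long) (remaining * (nbytes[i] / total_nbytes));
    }
}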
All this seems generally feasible, but it hinges on whether the
to-be-described algorithm can distribute the memory in a sensible way
that doesn't affect the costing too much. If we plan a hash join with
nbatch=1, and then come back and make it use nbatch=1024, maybe we
wouldn't have used a hash join at all. Not sure.

The fundamental issue seems to be not having any information about how
much memory might be available to the operator. And in principle we
can't have that during the bottom-up part of planning, until after we
construct the whole plan. Only at that point do we know how many
operators will need work_mem.

Could we get at least some initial estimate of how many such operators
the query *might* end up using? Maybe that'd be just enough a priori
information to set an effective work_mem for the planning part and make
this practical (see the PS below for the kind of thing I have in mind).

> This assumption might understate or overstate
> the actual working memory we’ll give the operator, at runtime. If it
> understates, the planner will be biased in favor of operators that
> don’t use much working memory. If it overstates, the planner will be
> biased in favor of operators that use too much working memory.
>

I'm not quite sure I understand this part. Could you elaborate?

> (We could add a feedback loop to the planner, or even something simple
> like generating multiple paths, at different “work_mem” limits, but
> everything I can think of here adds complexity without much potential
> benefit. So I would defer any changes to the planner behavior until
> later, if ever.)
>

What would be the feedback? I can imagine improving the estimate of how
much memory a given operator needs during the bottom-up phase, but it
doesn't quite help with knowing what will happen above the current
node.

> The to-be-described algorithm would look at a query’s Paths’ “nbytes”
> fields, as well as the session “work_mem” GUC (which would, now, serve
> as a hint to the SQL compiler), and decide how much of
> “query_work_mem” to assign to the corresponding Plan node.
>
> It would assign that limit to a new “work_mem” field, on the Plan
> node. And this limit would also be exposed, of course, in EXPLAIN
> ANALYZE, along with the actual work_mem usage, which might very well
> exceed the limit. This will let the customer know when a query spills,
> and why.
>
> I would write the algorithm to maintain the existing work_mem
> behavior, as much as possible. (Backward compatibility is good!) Most
> likely, it would treat “work_mem” (and “hash_mem_multiplier”) as a
> *minimum* work_mem. Then, so long as query_work_mem exceeds the sum of
> work_mem [* hash_mem_multiplier], for all operators in the query, all
> operators would be assigned at least work_mem, which would make my
> proposal a Pareto improvement.
>
> Last, at runtime, each PlanState would check its plan -> work_mem
> field, rather than the global work_mem GUC. Execution would otherwise
> be the same as today.
>
> What do you think?
>

I find it a bit hard to discuss an abstract proposal without knowing
the really crucial ingredient. It might be helpful to implement some
sort of PoC of this approach; I'm sure that'd give us a lot of insights
and the means to experiment with it (instead of just speculating about
what might happen).

regards

--
Tomas Vondra
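PS: By "set an effective work_mem for the planning part" I mean
something as crude as the two lines below. The estimate function is
entirely hypothetical; it stands for whatever rough a priori count of
memory-consuming operators (joins, aggregates, sorts, ...) we could
derive from the parse tree before costing starts.

/* hypothetical, just to illustrate the amount of up-front information needed */
int     n_ops = Max(1, estimate_mem_consuming_operators(parse));
int     planning_work_mem = Max(64, query_work_mem / n_ops);    /* in kB */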