Hi,

On Thu, May 20, 2021 at 5:02 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:

> Hi,
>
> On 5/20/21 5:43 AM, Andy Fan wrote:
> > Currently we are using a custom/generic strategy to handle the data skew
> > issue. However, it doesn't work well all the time. For example: SELECT *
> > FROM t WHERE a BETWEEN $1 AND $2. We assume the selectivity is 0.0025,
> > but users may provide a large range every time. Per our current strategy,
> > a generic plan will be chosen, with an index scan on a. Oops.
>
> Yeah, the current logic is rather simple, which is however somewhat on
> purpose, as it makes the planning very cheap. But it also means there's
> very little info to check/compare and so we may make mistakes.
>
> > I think Oracle's Adaptive Cursor Sharing should work. First it calculates
> > the selectivity with the real bind values and generates/reuses different
> > plans based on the similarity of the selectivities. The challenges I can
> > think of now are: a) how to define the similarity; and b) how to adjust
> > the similarity during the real run. For example, suppose we say
> > [1% ~ 10%] is similar, but then we find that selectivity 20% uses the
> > same plan as 10%. What should be done here?
>
> IMO the big question is how expensive this would be. Calculating the
> selectivities for real values (i.e. for each query) is not expensive,
> but it's not free either. So even if we compare the selectivities in
> some way and skip the actual query planning, it's still going to impact
> the prepared statements.

That's true if we need to do this every time. But we may only need to do
it in cases where the estimation is likely to be wrong, like a > $1 or
a BETWEEN $1 AND $2. In such cases we currently just use a hard-coded
default selectivity.

> Also, we currently don't have any mechanism to extract the selectivities
> from the whole query - not sure how complex that would be, as it may
> involve e.g. join selectivities.

The idea in my mind is to check only the quals on base relations, like
t1.a > $1. So something like t1.a + t2.a > $1 would be ignored.

> As for how to define the similarity, I doubt there's a simple and
> sensible/reliable way to do that :-(
>
> I remember reading a paper about query planning in which the parameter
> space was divided into regions with the same plan. In this case the
> parameters are selectivities for all the query operations. So what we
> might do is this:
>
> 1) Run the first N queries and extract the selectivities / plans.
>
> 2) Build "clusters" of selectivities with the same plan.
>
> 3) Before running a query, see if its selectivities fall into one of
> the existing clusters. If yes, use the plan. If not, do regular
> planning, add it to the data set, and repeat (2).
>
> I have no idea how expensive this would be, and I assume the "clusters"
> may have fairly complicated shapes (not simple convex regions).
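To make the matching step concrete, here is a minimal Python sketch of how
I imagine steps 1) - 3) fitting together. Everything in it is invented for
illustration (the PlanCluster class, match_plan/add_plan, and the EPS
radius, plus the assumption that a query has already been reduced to a
vector of selectivities for its parameterized base-rel quals), and it
models a cluster naively as a centroid plus a fixed radius, which clearly
cannot capture the complicated shapes you mention:

# Hypothetical sketch only; none of these names exist in PostgreSQL.
import math

EPS = 0.1  # assumed similarity radius in selectivity space

class PlanCluster:
    """A cached plan plus the selectivity vectors it was chosen for."""
    def __init__(self, plan_id, sels):
        self.plan_id = plan_id
        self.points = [sels]

    def centroid(self):
        return [sum(dim) / len(self.points) for dim in zip(*self.points)]

    def distance(self, sels):
        return math.dist(self.centroid(), sels)

clusters = []

def match_plan(sels):
    """Step 3: return a cached plan id if this selectivity vector falls
    into an existing cluster, else None (caller plans normally)."""
    best = min(clusters, key=lambda c: c.distance(sels), default=None)
    if best is not None and best.distance(sels) <= EPS:
        return best.plan_id
    return None

def add_plan(plan_id, sels):
    """Steps 1) and 2): record the plan the regular planner picked,
    growing the cluster for that plan if one already exists."""
    for c in clusters:
        if c.plan_id == plan_id:
            c.points.append(sels)
            return
    clusters.append(PlanCluster(plan_id, sels))

For the earlier example, a BETWEEN $1 AND $2 would map to a one-element
vector [sel], and the 20%-behaves-like-10% problem from above shows up
here as the question of when add_plan should let a cluster's centroid and
effective radius drift.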
Thanks for sharing this; we do have lots of things to do here. Your idea
should be a good place to start with.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)