On 03/17/2018 07:28 PM, Dean Rasheed wrote:
> On 16 March 2018 at 15:26, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>> Actually, one question - when deciding whether to keep the item in the
>> MCV list, analyze_mcv_list only compares its frequency with an average
>> of the rest. But as we're removing items from the MCV list, the average
>> frequency of the non-MCV items is growing (we're removing items with
>> higher and higher frequencies). That means the estimates for the least
>> common items will get higher and higher - essentially overestimates.
>> So, couldn't/shouldn't analyze_mcv_list consider this too?
>>
>
> Yes, that's the intention. At the start, sumcount is set to the count
> of all but the last (least common) MCV item, so it can estimate the
> frequency of the non-MCV items if the last MCV item were to be
> removed. Then each time through the loop, sumcount is decreased by the
> removed item's count, increasing the estimated frequency of the
> non-MCV items accordingly.
>
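For illustration, the trimming loop Dean describes can be sketched roughly as follows. This is a simplified Python sketch, not the actual implementation (which is C in analyze.c); the function and variable names other than sumcount are hypothetical, the statistical error term used by the real code is omitted, and the significance factor K is a made-up stand-in:

```python
def trim_mcv_list(counts, samplerows, ndistinct, K=2.0):
    """Sketch of the MCV trimming loop (simplified, hypothetical names).

    counts: per-value sample counts, sorted most to least common.
    samplerows: total number of sampled rows.
    ndistinct: estimated number of distinct values in the column.
    Returns the number of items to keep in the MCV list.
    """
    num_mcv = len(counts)
    # sumcount starts as the count of all but the last (least common)
    # MCV item, i.e. what would remain if that item were removed.
    sumcount = sum(counts[:-1])
    while num_mcv > 0:
        # Estimated average frequency of a non-MCV value, assuming the
        # last item were dropped from the list.
        ndistinct_rest = ndistinct - (num_mcv - 1)
        avg_rest = (samplerows - sumcount) / samplerows / ndistinct_rest
        freq_last = counts[num_mcv - 1] / samplerows
        # Keep the item only if it is significantly more common than a
        # non-MCV value would then be (K is a stand-in threshold).
        if freq_last > K * avg_rest:
            break
        # Otherwise drop it; note that each removal raises avg_rest for
        # the next iteration, which is the effect Tomas asks about below.
        num_mcv -= 1
        if num_mcv > 0:
            sumcount -= counts[num_mcv - 1]
    return num_mcv
```

With, say, counts [40, 30, 2, 2, 1] over 100 sampled rows and 30 distinct values, the loop drops the three rare items one by one and keeps only the two clearly dominant values.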
I know it's updated like that, but that's not quite what I meant. Sorry
for not making the question clearer, so let me rephrase it.

Currently, analyze_mcv_list only checks whether the frequency of the
current item is significantly higher than the non-MCV selectivity. My
question is whether it shouldn't also consider whether removing the item
from the MCV list would increase the non-MCV selectivity too much. But
perhaps that would be a silly idea, because the non-MCV items may also
be seen as random noise.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services