On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Alexander Korotkov <aekorot...@gmail.com> writes: > > True. If (max count - min count + 1) is small, enumerating of frequencies > > is both more compact and more precise representation. Simultaneously, > > if (max count - min count + 1) is large, we can run out of > > statistics_target with such representation. We can use same > representation > > of count distribution as for scalar column value: MCV and HISTOGRAM, but > it > > would require additional statkind and statistics slot. Probably, you've > > better ideas? > > I wasn't thinking of introducing two different representations, > but just trimming the histogram length when it's larger than necessary. > > On reflection my idea above is wrong; for example assume that we have a > column with 900 arrays of length 1 and 100 arrays of length 2. Going by > what I said, we'd reduce the histogram to {1,2}, which might accurately > capture the set of lengths present but fails to show that 1 is much more > common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries) > would capture the situation perfectly in one-tenth the space that the > current logic does. > > More generally, by limiting the histogram to statistics_target entries, > we are already accepting errors of up to 1/(2*statistics_target) in the > accuracy of the bin-boundary representation. What the above example > shows is that sometimes we could meet the same accuracy requirement with > fewer entries. I'm not sure how this could be mechanized but it seems > worth thinking about. >
I can propose following representation of histogram. If (max_count - min_count + 1) <= statistics_target then 1) store max_count and min_count in stavalues 2) store frequencies from min_count ot max_count in numvalues If (max_count - min_count + 1) > statistics_target then store histogram in current manner. I think in this case it's unlikely to be many repeating values. I can propose patch which change histogram representation to this. Comments? ------ With best regards, Alexander Korotkov.