On Thu, Mar 08, 2012 at 11:30:52AM -0500, Tom Lane wrote:
> Noah Misch <n...@leadboat.com> writes:
> > On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
> >> On reflection my idea above is wrong; for example assume that we have a
> >> column with 900 arrays of length 1 and 100 arrays of length 2. Going by
> >> what I said, we'd reduce the histogram to {1,2}, which might accurately
> >> capture the set of lengths present but fails to show that 1 is much more
> >> common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries)
> >> would capture the situation perfectly in one-tenth the space that the
> >> current logic does.
>
> > Granted. When the next sample finds 899/101 instead, though, the
> > optimization vanishes.
>
> No, you missed my next point. That example shows that sometimes a
> smaller histogram can represent the situation with zero error, but in
> all cases a smaller histogram can represent the situation with perhaps
> more error than a larger one. Since we already have a defined error
> tolerance, we should try to generate a histogram that is as small as
> possible while still not exceeding the error tolerance.
>
> Now, it might be that doing that is computationally impractical, or
> too complicated to be reasonable. But it seems to me to be worth
> looking into.
Yes, I did miss your point. One characteristic favoring this approach is
its equal applicability to both STATISTIC_KIND_HISTOGRAM and
STATISTIC_KIND_DECHIST.

> > If you want to materially narrow typical statistics, Alexander's
> > proposal looks like the way to go. I'd guess array columns always
> > having DEC <= default_statistics_target are common enough to make that
> > representation the dominant representation, if not the only necessary
> > representation.
>
> Well, I don't want to have two representations; I don't think it's worth
> the complexity. But certainly we could consider switching to a
> different representation if it seems likely to usually be smaller.

Perhaps some heavy array users could provide input: what are some typical
length ranges among arrays in your applications on which you use
"arr && const", "arr @> const", or "arr <@ const" searches?
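
For concreteness, here is a minimal sketch (in Python rather than the
backend's C) of the search Tom describes above: grow an equi-depth
histogram of the sampled array lengths until the distribution it implies
stays within a fixed error tolerance of the sample. The tolerance value,
the CDF-based error metric, and every name below are illustrative
assumptions, not anything in the tree; the real estimator interpolates
within histogram bins, so its error would have to be measured differently.

from bisect import bisect_right

def equi_depth_bounds(sorted_vals, nbounds):
    # Pick nbounds entries at evenly spaced quantiles of the sample.
    n = len(sorted_vals)
    return [sorted_vals[i * (n - 1) // (nbounds - 1)]
            for i in range(nbounds)]

def max_cdf_error(sorted_vals, bounds):
    # Worst gap, over the distinct sample values v, between the empirical
    # CDF and the fraction of histogram entries <= v (a crude stand-in
    # for the estimator's within-bin interpolation).
    n, k = len(sorted_vals), len(bounds)
    return max(abs(bisect_right(sorted_vals, v) / n -
                   bisect_right(bounds, v) / k)
               for v in set(sorted_vals))

def smallest_histogram(vals, tolerance):
    # Return the fewest equi-depth bounds whose error fits the tolerance.
    sorted_vals = sorted(vals)
    for nbounds in range(2, len(sorted_vals) + 1):
        bounds = equi_depth_bounds(sorted_vals, nbounds)
        if max_cdf_error(sorted_vals, bounds) <= tolerance:
            return bounds
    return sorted_vals

# Tom's example: 900 arrays of length 1 and 100 of length 2.
lengths = [1] * 900 + [2] * 100
print(smallest_histogram(lengths, tolerance=0.01))
# -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]  (ten entries, zero error)

With a 1% tolerance this reproduces the ten-entry {1,1,1,1,1,1,1,1,1,2}
histogram from the quoted example, and the same ten entries still fit the
tolerance for the 899/101 sample, so the compression does not vanish
there; the open question is whether such a search is cheap enough to run
during ANALYZE.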