Hi Alex, Thanks for excellent research.
I've ran your queries against Trustly's production database and I can confirm your findings, the results are similar: WITH ... SELECT count(1), min(hist_ratio)::real, avg(hist_ratio)::real, max(hist_ratio)::real, stddev(hist_ratio)::real FROM stats2 WHERE histogram_bounds IS NOT NULL; -[ RECORD 1 ]---- count | 2814 min | 0.193548 avg | 0.927357 max | 1 stddev | 0.164134 WHERE distinct_hist < num_hist -[ RECORD 1 ]---- count | 624 min | 0.193548 avg | 0.672407 max | 0.990099 stddev | 0.194901 WITH .. SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname, n_distinct, null_frac, num_mcv, most_common_vals, most_common_freqs, mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac, distinct_hist, num_hist, hist_ratio, histogram_bounds FROM stats2 ORDER BY hist_ratio LIMIT 1; -[ RECORD 1 ]-----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- columnname | public.x.y n_distinct | 103 null_frac | 0 num_mcv | 10 most_common_vals | {0,1,2,3,4,5,6,7,8,9} most_common_freqs | {0.4765,0.141733,0.1073,0.0830667,0.0559667,0.0373333,0.0251,0.0188,0.0141,0.0113667} mcv_frac | 0.971267 nonnull_mcv_frac | 0.971267 distinct_hist | 18 num_hist | 93 hist_ratio | 0.193548387096774 histogram_bounds | {10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285} On Mon, Jan 18, 2016 at 4:46 PM, Shulgin, Oleksandr <oleksandr.shul...@zalando.de> wrote: > On Wed, Dec 2, 2015 at 10:20 AM, Shulgin, Oleksandr > <oleksandr.shul...@zalando.de> wrote: >> >> On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> >>> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes: >>> > This post summarizes a few weeks of research of ANALYZE statistics >>> > distribution on one of our bigger production databases with some >>> > real-world >>> > data and proposes a patch to rectify some of the oddities observed. >>> >>> Please add this to the 2016-01 commitfest ... >> >> >> Added: https://commitfest.postgresql.org/8/434/ > > > It would be great if some folks could find a moment to run the queries I was > showing on their data to confirm (or refute) my findings, or to contribute > to the picture in general. > > As I was saying, the queries were designed in such a way that even > unprivileged user can run them (the results will be limited to the stats > data available to that user, obviously; and for custom-tailored statstarget > one still needs superuser to join the pg_statistic table directly). Also, > on the scale of ~30k attribute statistics records, the queries take only a > few seconds to finish. > > Cheers! > -- > Alex > -- Joel Jacobson Mobile: +46703603801 Trustly.com | Newsroom | LinkedIn | Twitter -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers