Hi Alex,

Thanks for excellent research.

I've ran your queries against Trustly's production database and I can
confirm your findings, the results are similar:

WITH ...
SELECT count(1),
       min(hist_ratio)::real,
       avg(hist_ratio)::real,
       max(hist_ratio)::real,
       stddev(hist_ratio)::real
  FROM stats2
 WHERE histogram_bounds IS NOT NULL;

-[ RECORD 1 ]----
count  | 2814
min    | 0.193548
avg    | 0.927357
max    | 1
stddev | 0.164134


WHERE distinct_hist < num_hist
-[ RECORD 1 ]----
count  | 624
min    | 0.193548
avg    | 0.672407
max    | 0.990099
stddev | 0.194901


WITH ..
SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
       n_distinct, null_frac,
       num_mcv, most_common_vals, most_common_freqs,
       mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
       distinct_hist, num_hist, hist_ratio,
       histogram_bounds
  FROM stats2
 ORDER BY hist_ratio
 LIMIT 1;

 -[ RECORD 1 
]-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
columnname        | public.x.y
n_distinct        | 103
null_frac         | 0
num_mcv           | 10
most_common_vals  | {0,1,2,3,4,5,6,7,8,9}
most_common_freqs |
{0.4765,0.141733,0.1073,0.0830667,0.0559667,0.0373333,0.0251,0.0188,0.0141,0.0113667}
mcv_frac          | 0.971267
nonnull_mcv_frac  | 0.971267
distinct_hist     | 18
num_hist          | 93
hist_ratio        | 0.193548387096774
histogram_bounds  |
{10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}



On Mon, Jan 18, 2016 at 4:46 PM, Shulgin, Oleksandr
<oleksandr.shul...@zalando.de> wrote:
> On Wed, Dec 2, 2015 at 10:20 AM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
>>
>> On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>
>>> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
>>> > This post summarizes a few weeks of research of ANALYZE statistics
>>> > distribution on one of our bigger production databases with some
>>> > real-world
>>> > data and proposes a patch to rectify some of the oddities observed.
>>>
>>> Please add this to the 2016-01 commitfest ...
>>
>>
>> Added: https://commitfest.postgresql.org/8/434/
>
>
> It would be great if some folks could find a moment to run the queries I was
> showing on their data to confirm (or refute) my findings, or to contribute
> to the picture in general.
>
> As I was saying, the queries were designed in such a way that even
> unprivileged user can run them (the results will be limited to the stats
> data available to that user, obviously; and for custom-tailored statstarget
> one still needs superuser to join the pg_statistic table directly).  Also,
> on the scale of ~30k attribute statistics records, the queries take only a
> few seconds to finish.
>
> Cheers!
> --
> Alex
>



-- 
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to