[PERFORM] How to tell ANALYZE to collect statistics from the whole table?

2015-01-24 Thread AlexK987
The documentation states that "The extent of analysis can be controlled by
adjusting the default_statistics_target configuration variable". It looks
like I can tell Postgres to create more histograms with more bins, and more
distinct values. This implicitly means that Postgres will use a larger
random subset to calculate statistics. 

However, this is not what I want. My data may be quite skewed, and I want
full control over the size of the sample. I want to explicitly tell Postgres
to analyze the whole table. How can I accomplish that?



--
View this message in context: 
http://postgresql.nabble.com/How-to-tell-ANALYZE-to-collect-statistics-from-the-whole-table-tp5835339.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to tell ANALYZE to collect statistics from the whole table?

2015-01-24 Thread AlexK987
Tomas,

Thank you for a very useful reply. Right now I do not have a case of poor
performance caused by strong data skew which is not properly reflected in
statistics. I was being defensive, trying to prevent every possible thing
that might go wrong.



--
View this message in context: 
http://postgresql.nabble.com/How-to-tell-ANALYZE-to-collect-statistics-from-the-whole-table-tp5835339p5835344.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Why is GIN index slowing down my query?

2015-02-01 Thread AlexK987
I've created a GIN index on an INT[] column, but it slows down the selects.
Here is my table:

create table talent(person_id INT NOT NULL,
skills INT[] NOT NULL);

insert into talent(person_id, skills)
select generate_series, array[0, 1] || generate_series
from generate_series(3, 1048575);

create index talent_skills on talent using gin(skills);

analyze talent;

Here is my select:


explain analyze 
select * from talent 
where skills <@ array[1, 15]

"Bitmap Heap Scan on talent  (cost=52.00..56.01 rows=1 width=37) (actual
time=590.022..590.022 rows=0 loops=1)"
"  Recheck Cond: (skills <@ '{1,15}'::integer[])"
"  Rows Removed by Index Recheck: 1048573"
"  Heap Blocks: exact=8739"
"  ->  Bitmap Index Scan on talent_skills  (cost=0.00..52.00 rows=1 width=0)
(actual time=207.661..207.661 rows=1048573 loops=1)"
"Index Cond: (skills <@ '{1,15}'::integer[])"
"Planning time: 1.310 ms"
"Execution time: 590.078 ms"


If I drop my GIN index, my select is faster:


drop index talent_skills

explain analyze 
select * from talent 
where skills <@ array[1, 15]

"Seq Scan on talent  (cost=0.00..21846.16 rows=1 width=37) (actual
time=347.442..347.442 rows=0 loops=1)"
"  Filter: (skills <@ '{1,15}'::integer[])"
"  Rows Removed by Filter: 1048573"
"Planning time: 0.130 ms"
"Execution time: 347.470 ms"

Am I missing something?




--
View this message in context: 
http://postgresql.nabble.com/Why-is-GIN-index-slowing-down-my-query-tp5836319.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why is GIN index slowing down my query?

2015-02-01 Thread AlexK987
Tom,

This is a realistic case: everyone have Python and Java skills, but PostGis
and Haskell and Closure are rare. If we are looking for a person that has
all the skills required for a task (array[1, 15]), that is "skills <@
array[1, 15] " and not the opposite, right?

Also can you explain why " entries for "0" and "1" swamp everything else so
that the planner 
doesn't know that eg "15" is really rare. " I thought that if a value is not
found in the histogram, than clearly that value is rare, correct? What am I
missing here?

I hear what you are saying about "don't keep both extremely common and 
extremely rare entries in the same array", but I cannot predict the future,
so I do not know which values are going to be common next year, or two years
later. So I think it would be very difficult to follow this advice.

What do you think?



--
View this message in context: 
http://postgresql.nabble.com/Why-is-GIN-index-slowing-down-my-query-tp5836319p5836323.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why is GIN index slowing down my query?

2015-02-01 Thread AlexK987
Tom,

Oops, you were absolutely right: I needed to use @> instead of <@. Thanks
again!



--
View this message in context: 
http://postgresql.nabble.com/Why-is-GIN-index-slowing-down-my-query-tp5836319p5836327.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance