[PERFORM] How to tell ANALYZE to collect statistics from the whole table?
The documentation states that "The extent of analysis can be controlled by adjusting the default_statistics_target configuration variable". It looks like I can tell Postgres to create more histograms with more bins, and more distinct values. This implicitly means that Postgres will use a larger random subset to calculate statistics. However, this is not what I want. My data may be quite skewed, and I want full control over the size of the sample. I want to explicitly tell Postgres to analyze the whole table. How can I accomplish that? -- View this message in context: http://postgresql.nabble.com/How-to-tell-ANALYZE-to-collect-statistics-from-the-whole-table-tp5835339.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to tell ANALYZE to collect statistics from the whole table?
Tomas, Thank you for a very useful reply. Right now I do not have a case of poor performance caused by strong data skew which is not properly reflected in statistics. I was being defensive, trying to prevent every possible thing that might go wrong. -- View this message in context: http://postgresql.nabble.com/How-to-tell-ANALYZE-to-collect-statistics-from-the-whole-table-tp5835339p5835344.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Why is GIN index slowing down my query?
I've created a GIN index on an INT[] column, but it slows down the selects. Here is my table: create table talent(person_id INT NOT NULL, skills INT[] NOT NULL); insert into talent(person_id, skills) select generate_series, array[0, 1] || generate_series from generate_series(3, 1048575); create index talent_skills on talent using gin(skills); analyze talent; Here is my select: explain analyze select * from talent where skills <@ array[1, 15] "Bitmap Heap Scan on talent (cost=52.00..56.01 rows=1 width=37) (actual time=590.022..590.022 rows=0 loops=1)" " Recheck Cond: (skills <@ '{1,15}'::integer[])" " Rows Removed by Index Recheck: 1048573" " Heap Blocks: exact=8739" " -> Bitmap Index Scan on talent_skills (cost=0.00..52.00 rows=1 width=0) (actual time=207.661..207.661 rows=1048573 loops=1)" "Index Cond: (skills <@ '{1,15}'::integer[])" "Planning time: 1.310 ms" "Execution time: 590.078 ms" If I drop my GIN index, my select is faster: drop index talent_skills explain analyze select * from talent where skills <@ array[1, 15] "Seq Scan on talent (cost=0.00..21846.16 rows=1 width=37) (actual time=347.442..347.442 rows=0 loops=1)" " Filter: (skills <@ '{1,15}'::integer[])" " Rows Removed by Filter: 1048573" "Planning time: 0.130 ms" "Execution time: 347.470 ms" Am I missing something? -- View this message in context: http://postgresql.nabble.com/Why-is-GIN-index-slowing-down-my-query-tp5836319.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why is GIN index slowing down my query?
Tom, This is a realistic case: everyone have Python and Java skills, but PostGis and Haskell and Closure are rare. If we are looking for a person that has all the skills required for a task (array[1, 15]), that is "skills <@ array[1, 15] " and not the opposite, right? Also can you explain why " entries for "0" and "1" swamp everything else so that the planner doesn't know that eg "15" is really rare. " I thought that if a value is not found in the histogram, than clearly that value is rare, correct? What am I missing here? I hear what you are saying about "don't keep both extremely common and extremely rare entries in the same array", but I cannot predict the future, so I do not know which values are going to be common next year, or two years later. So I think it would be very difficult to follow this advice. What do you think? -- View this message in context: http://postgresql.nabble.com/Why-is-GIN-index-slowing-down-my-query-tp5836319p5836323.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why is GIN index slowing down my query?
Tom, Oops, you were absolutely right: I needed to use @> instead of <@. Thanks again! -- View this message in context: http://postgresql.nabble.com/Why-is-GIN-index-slowing-down-my-query-tp5836319p5836327.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance