[PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specified
Some info:
PostgreSQL version: 9.1.2
Table "cache": row count 3,471,081
Column "tsv": tsvector
Index "cache_tsv": USING gin (tsv)

If I run a query like this:

SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*');

it uses the index and returns results immediately:

explain analyze
'Bitmap Heap Scan on cache  (cost=1441.78..63802.63 rows=19843 width=4) (actual time=29.309..31.518 rows=1358 loops=1)'
'  Recheck Cond: (tsv @@ to_tsquery(''test:*''::text))'
'  ->  Bitmap Index Scan on cache_tsv  (cost=0.00..1436.82 rows=19843 width=0) (actual time=28.966..28.966 rows=1559 loops=1)'
'        Index Cond: (tsv @@ to_tsquery(''test:*''::text))'
'Total runtime: 31.789 ms'

But the performance problem starts when I run the same query with a LIMIT:

SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') LIMIT 20;

For some reason the index is not used:

explain analyze
'Limit  (cost=0.00..356.23 rows=20 width=4) (actual time=7.984..765.550 rows=20 loops=1)'
'  ->  Seq Scan on cache  (cost=0.00..353429.50 rows=19843 width=4) (actual time=7.982..765.536 rows=20 loops=1)'
'        Filter: (tsv @@ to_tsquery(''test:*''::text))'
'Total runtime: 765.620 ms'

Some more debugging notes:
1) If I set SET enable_seqscan=off; the query uses the index correctly.
2) I also noticed that with to_tsquery('test'), i.e. without the :* wildcard, the index is used correctly in both queries, with or without LIMIT.

Any ideas how to fix the problem? Thank you
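For readers who want to reproduce the setup, it can be sketched roughly like this (any columns beyond id/tsv in the real table are unknown; this is an assumed minimal schema, not the poster's actual DDL):

```sql
CREATE TABLE cache (
    id  serial PRIMARY KEY,
    tsv tsvector
);

CREATE INDEX cache_tsv ON cache USING gin (tsv);

-- Fast on 9.1 (bitmap index scan):
SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*');

-- Slow on 9.1 (planner picks a seqscan once LIMIT is added):
SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') LIMIT 20;
```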
[PERFORM] Partitioning by status?
We have a set of large tables. One of the columns is a status indicator (active / archived). The queries against these tables almost always include the status, so partitioning on it seems to make sense from a logical standpoint, especially given that most of the data is "archived" and most of the processes want active records. Is it practical to partition on the status column and, e.g., use triggers to move a row between the two partitions when the status is updated? Any surprises to watch for, given the status column is actually NULL for active data and contains a value when archived? Mike
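The trigger-based move Mike describes could be sketched like this, using 9.1-era inheritance partitioning. All names here (big_table, big_table_active, big_table_archived, status, id) are assumptions for illustration, not the poster's real schema, and concurrent updates during the move would need careful testing:

```sql
-- Parent "big_table" with two children; rows live in big_table_active
-- while status IS NULL and in big_table_archived once status is set.
CREATE OR REPLACE FUNCTION big_table_archive_move() RETURNS trigger AS $$
BEGIN
    IF OLD.status IS NULL AND NEW.status IS NOT NULL THEN
        -- Row is being archived: copy it to the archive partition,
        -- remove it from the active one, and suppress the UPDATE itself.
        INSERT INTO big_table_archived VALUES (NEW.*);
        DELETE FROM big_table_active WHERE id = OLD.id;
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_archive_move_trg
    BEFORE UPDATE ON big_table_active
    FOR EACH ROW EXECUTE PROCEDURE big_table_archive_move();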
Re: [PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specified
darklow writes:
> But the performance problems starts when i do the same query specifying
> LIMIT.
> *SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
> By some reason index is not used.

It apparently thinks there are enough matches that it might as well just seqscan the table and expect to find some matches at random, in less time than using the index would take. The estimate seems to be off quite a bit, so maybe raising the stats target for this column would help.

			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
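Tom's suggestion can be tried like this. The target value 1000 is an illustrative choice, not a recommendation from the thread (the 9.1 default is 100, the maximum 10000):

```sql
ALTER TABLE cache ALTER COLUMN tsv SET STATISTICS 1000;
ANALYZE cache;

-- Re-check whether the row estimate (and hence the plan) improves:
EXPLAIN ANALYZE
SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') LIMIT 20;
```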
Re: [PERFORM] Partitioning by status?
Mike Blackwell wrote:
> We have a set of large tables. One of the columns is a status indicator
> (active / archived). The queries against these tables almost always include
> the status, so partitioning against that seems to makes sense from a logical
> standpoint, especially given most of the data is "archived" and most of the
> processes want active records.
>
> Is it practical to partition on the status column and, eg, use triggers to
> move a row between the two partitions when status is updated? Any surprises
> to watch for, given the status column is actually NULL for active data and
> contains a value when archived?

If I were you, I would try a partial index where status is null. But yes, partitioning is another option; it depends on your workload.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
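Andreas's partial-index alternative could look like this. The table and column names are assumptions for illustration; the point is that the index covers only the active (status IS NULL) rows, so it stays small even when most of the table is archived:

```sql
-- Index only the active rows, on whatever column the hot queries filter by.
CREATE INDEX big_table_active_idx
    ON big_table (customer_id)
    WHERE status IS NULL;

-- The planner can use it for queries that repeat the same predicate:
SELECT *
FROM big_table
WHERE status IS NULL
  AND customer_id = 42;
```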
Re: [PERFORM] pg_upgrade failure "contrib" issue?
On Wed, Dec 7, 2011 at 6:53 PM, Tory M Blue wrote:
> Well thought it was maybe just going from 8.4.4 to 9.1.1 so upgraded
> to 8.4.9 and tried pg_upgrade again (this is 64bit) and it's failing
>
> -bash-4.0$ /tmp/pg_upgrade --check --old-datadir "/data/db"
> --new-datadir "/data1/db" --old-bindir "/ipix/pgsql/bin" --new-bindir
> "/ipix/pgsql9/bin"
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Checking database user is a superuser                       ok
> Checking for prepared transactions                          ok
> Checking for reg* system oid user data types                ok
> Checking for contrib/isn with bigint-passing mismatch       ok
> Checking for large objects                                  ok
>
> There were problems executing "/ipix/pgsql/bin/pg_ctl" -w -l
> "/dev/null" -D "/data/db" stop >> "/dev/null" 2>&1
> Failure, exiting
>
> I've read some re pg_migrator and issues with contribs, but wondered
> if there is something "Else" I need to know here

I'm not sure that this is on-topic for pgsql-performance, and my reply here is horribly behind-the-times anyway, but my experience with pg_upgrade is that it's entirely willing to send all the critically important information you need to solve the problem to the bit bucket, as in your example. If you knew WHY it was having trouble running pg_ctl, you would probably be able to fix it easily, but since everything's been redirected to /dev/null, you can't. I believe that this gets considerably better if you run pg_upgrade with the "-l logfile" option, and then check the log file.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
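Robert's suggestion applied to the quoted session would look like this (same paths as the original invocation; the log file location is an arbitrary choice):

```shell
# Same check as before, plus "-l" so pg_upgrade's diagnostics -- including
# the real pg_ctl error -- survive instead of going to /dev/null.
/tmp/pg_upgrade --check \
    --old-datadir "/data/db"        --new-datadir "/data1/db" \
    --old-bindir  "/ipix/pgsql/bin" --new-bindir  "/ipix/pgsql9/bin" \
    -l /tmp/pg_upgrade.log

# Then look for the pg_ctl failure details:
less /tmp/pg_upgrade.log
```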
Re: [PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specified
On 2012-01-10 18:04, Tom Lane wrote:
> darklow writes:
>> But the performance problems starts when i do the same query specifying
>> LIMIT.
>> *SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
>> By some reason index is not used.
>
> It apparently thinks there are enough matches that it might as well just
> seqscan the table and expect to find some matches at random, in less time
> than using the index would take. The estimate seems to be off quite a bit,
> so maybe raising the stats target for this column would help.

The cost of matching ts_match_vq against a toasted column is not calculated correctly. This is completely parallel with:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

Try raising the cost for ts_match_vq(tsvector, tsquery). That helps a bit, but it's hard to get the cost high enough. Raising the statistics target helps too.

--
Jesper
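Jesper's cost adjustment can be expressed like this. The value 500 is purely illustrative; functions default to COST 1, which is what makes the seqscan's per-row filter look cheap to the planner:

```sql
-- Tell the planner that evaluating tsv @@ tsquery per row is expensive,
-- nudging it back toward the GIN index even under a LIMIT.
ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 500;
```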
Re: [PERFORM] Subquery flattening causing sequential scan
On Tue, Dec 27, 2011 at 12:29 PM, Jim Crate wrote:
> My question is why does it do a seq scan when it flattens this subquery
> into a JOIN? Is it because the emsg_messages table is around 1M rows? Are
> there some guidelines to when the planner will prefer not to use an
> available index? I just had a look through postgresql.conf and noticed
> that I forgot to set effective_cache_size to something reasonable for a
> machine with 16GB of memory. Would the default setting of 128MB cause
> this behavior? I can't bounce the production server midday to test that
> change.

You wouldn't need to bounce the production server to test that. You could just use SET in the session you were testing from.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
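Robert's session-local test could look like this. The '12GB' figure is an illustrative value for a 16GB machine, not a recommendation from the thread; effective_cache_size only influences planning, so SET takes effect immediately with no restart:

```sql
SET effective_cache_size = '12GB';

-- ...re-run EXPLAIN ANALYZE on the problem query in this same session
-- and compare the plan...

RESET effective_cache_size;  -- back to the postgresql.conf value
```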
Re: [PERFORM] partitioned table: differents plans, slow on some situations
2011/12/30 Matteo Sgalaberni:
> I'm sorry, I pasted the wrong ones, but the results are the same, here A and B again:
>
> Query A
>
> # EXPLAIN ANALYZE SELECT sms.id AS id_sms
>   FROM sms_messaggio AS sms,
>        sms_messaggio_dlr AS dlr
>   WHERE sms.id = dlr.id_sms_messaggio
>     AND sms.timestamp_todeliver >= '1/3/2010'::timestamp
>     AND sms.timestamp_todeliver < '30/4/2010'::timestamp
>     AND dlr.timestamp_todeliver >= '1/3/2010'::timestamp
>     AND dlr.timestamp_todeliver < '30/4/2010'::timestamp
>     AND sms.id_cliente = '13'
>   ORDER BY dlr.timestamp_todeliver DESC LIMIT 50;
>
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Limit  (cost=0.02..943.11 rows=50 width=16) (actual time=0.603..79.729 rows=50 loops=1)
>    ->  Nested Loop  (cost=0.02..107279143.34 rows=5687651 width=16) (actual time=0.601..79.670 rows=50 loops=1)
>          Join Filter: (sms.id = dlr.id_sms_messaggio)
>          ->  Merge Append  (cost=0.02..20289460.70 rows=5687651 width=16) (actual time=0.048..14.556 rows=5874 loops=1)
>                Sort Key: dlr.timestamp_todeliver
>                ->  Index Scan Backward using sms_messaggio_dlr_todeliver on sms_messaggio_dlr dlr  (cost=0.00..8.27 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=1)
>                      Index Cond: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>                ->  Index Scan Backward using sms_messaggio_dlr_timestamp_todeliver_201003 on sms_messaggio_dlr_201003 dlr  (cost=0.00..12428664.98 rows=3502530 width=16) (actual time=0.018..0.018 rows=1 loops=1)
>                      Index Cond: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>                ->  Index Scan Backward using sms_messaggio_dlr_timestamp_todeliver_201004 on sms_messaggio_dlr_201004 dlr  (cost=0.00..7756421.17 rows=2185120 width=16) (actual time=0.022..8.408 rows=5874 loops=1)
>                      Index Cond: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone))
>          ->  Append  (cost=0.00..15.26 rows=3 width=8) (actual time=0.010..0.010 rows=0 loops=5874)
>                ->  Index Scan using sms_messaggio_pkey1 on sms_messaggio sms  (cost=0.00..0.28 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=5874)
>                      Index Cond: (id = dlr.id_sms_messaggio)
>                      Filter: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone) AND (id_cliente = 13))
>                ->  Index Scan using sms_messaggio_201003_pkey on sms_messaggio_201003 sms  (cost=0.00..7.54 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=5874)
>                      Index Cond: (id = dlr.id_sms_messaggio)
>                      Filter: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone) AND (id_cliente = 13))
>                ->  Index Scan using sms_messaggio_201004_pkey on sms_messaggio_201004 sms  (cost=0.00..7.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=5874)
>                      Index Cond: (id = dlr.id_sms_messaggio)
>                      Filter: ((timestamp_todeliver >= '2010-03-01 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < '2010-04-30 00:00:00'::timestamp without time zone) AND (id_cliente = 13))
>  Total runtime: 79.821 ms
> (22 rows)
>
> Query B:
> # EXPLAIN ANALYZE SELECT sms.id AS id_sms
>   FROM sms_messaggio AS sms,
>        sms_messaggio_dlr AS dlr
>   WHERE sms.id = dlr.id_sms_messaggio
>     AND sms.timestamp_todeliver >= '1/3/2010'::timestamp
>     AND sms.timestamp_todeliver < '30/4/2010'::timestamp
>     AND dlr.timestamp_todeliver >= '1/3/2010'::timestamp
>     AND dlr.timestamp_todeliver < '30/4/2010'::timestamp
>     AND sms.id_cliente = '7'
>   ORDER BY dlr.timestamp_todeliver DESC LIMIT 50;
>