[PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specifiedQuery planner doesn't use index scan on tsvector GIN index if LIMIT is specified

2012-01-10 Thread darklow
Some info:
PostgreSQL version: 9.1.2

Table "cache":
Rows count: 3 471 081
Column "tsv" tsvector
Index "cache_tsv" USING gin (tsv)

If i do query like THIS:

*SELECT id FROM table WHERE tsv @@ to_tsquery('test:*');*
It uses index and returns results immediately:

explain analyze
'Bitmap Heap Scan on cache  (cost=1441.78..63802.63 rows=19843 width=4)
(actual time=29.309..31.518 rows=1358 loops=1)'
'  Recheck Cond: (tsv @@ to_tsquery('test:*'::text))'
'  ->  Bitmap Index Scan on cache_tsv  (cost=0.00..1436.82 rows=19843
width=0) (actual time=28.966..28.966 rows=1559 loops=1)'
'Index Cond: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 31.789 ms'


But the performance problems starts when i do the same query specifying
LIMIT.
*SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*

By some reason index is not used.

explain analyze
'Limit  (cost=0.00..356.23 rows=20 width=4) (actual time=7.984..765.550
rows=20 loops=1)'
'  ->  Seq Scan on cache  (cost=0.00..353429.50 rows=19843 width=4) (actual
time=7.982..765.536 rows=20 loops=1)'
'Filter: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 765.620 ms'


Some more debug notes:
1) If i set SET enable_seqscan=off; then query uses indexes correctly
2) Also i notified, if i use: to_tsquery('test') without wildcard search
:*, then index is used correctly in both queries, with or without LIMIT

Any ideas how to fix the problem?
Thank you


[PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specified

2012-01-10 Thread darklow
Some info:
PostgreSQL version: 9.1.2

Table "cache":
Rows count: 3 471 081
Column "tsv" tsvector
Index "cache_tsv" USING gin (tsv)

If i do query like THIS:

*SELECT id FROM table WHERE tsv @@ to_tsquery('test:*');*
It uses index and returns results immediately:

explain analyze
'Bitmap Heap Scan on cache  (cost=1441.78..63802.63 rows=19843 width=4)
(actual time=29.309..31.518 rows=1358 loops=1)'
'  Recheck Cond: (tsv @@ to_tsquery('test:*'::text))'
'  ->  Bitmap Index Scan on cache_tsv  (cost=0.00..1436.82 rows=19843
width=0) (actual time=28.966..28.966 rows=1559 loops=1)'
'Index Cond: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 31.789 ms'


But the performance problems starts when i do the same query specifying
LIMIT.
*SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*

By some reason index is not used.

explain analyze
'Limit  (cost=0.00..356.23 rows=20 width=4) (actual time=7.984..765.550
rows=20 loops=1)'
'  ->  Seq Scan on cache  (cost=0.00..353429.50 rows=19843 width=4) (actual
time=7.982..765.536 rows=20 loops=1)'
'Filter: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 765.620 ms'


Some more debug notes:
1) If i set SET enable_seqscan=off; then query uses indexes correctly
2) Also i notified, if i use: to_tsquery('test') without wildcard search
:*, then index is used correctly in both queries, with or without LIMIT

Any ideas how to fix the problem?
Thank you


[PERFORM] Partitioning by status?

2012-01-10 Thread Mike Blackwell
We have a set of large tables.  One of the columns is a status indicator
(active / archived).  The queries against these tables almost always
include the status, so partitioning against that seems to makes sense from
a logical standpoint, especially given most of the data is "archived" and
most of the processes want active records.

Is it practical to partition on the status column and, eg, use triggers to
move a row between the two partitions when status is updated?  Any
surprises to watch for, given the status column is actually NULL for active
data and contains a value when archived?

Mike


Re: [PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specifiedQuery planner doesn't use index scan on tsvector GIN index if LIMIT is specified

2012-01-10 Thread Tom Lane
darklow  writes:
> But the performance problems starts when i do the same query specifying
> LIMIT.
> *SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
> By some reason index is not used.

It apparently thinks there are enough matches that it might as well just
seqscan the table and expect to find some matches at random, in less
time than using the index would take.

The estimate seems to be off quite a bit, so maybe raising the stats
target for this column would help.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitioning by status?

2012-01-10 Thread Andreas Kretschmer
Mike Blackwell  wrote:

> We have a set of large tables.  One of the columns is a status indicator
> (active / archived).  The queries against these tables almost always include
> the status, so partitioning against that seems to makes sense from a logical
> standpoint, especially given most of the data is "archived" and most of the
> processes want active records.
> 
> Is it practical to partition on the status column and, eg, use triggers to 
> move
> a row between the two partitions when status is updated?  Any surprises to
> watch for, given the status column is actually NULL for active data and
> contains a value when archived?

If i where you, i would try a partial index where status is null. But
yes, partitioning is an other option, depends on your workload.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_upgrade failure "contrib" issue?

2012-01-10 Thread Robert Haas
On Wed, Dec 7, 2011 at 6:53 PM, Tory M Blue  wrote:
> Well thought it was maybe just going from 8.4.4 to 9.1.1 so upgraded
> to 8.4.9 and tried pg_upgrade again (this is 64bit) and it's failing
>
> -bash-4.0$ /tmp/pg_upgrade --check --old-datadir "/data/db"
> --new-datadir "/data1/db" --old-bindir "/ipix/pgsql/bin" --new-bindir
> "/ipix/pgsql9/bin"
> Performing Consistency Checks
> -
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Checking database user is a superuser                       ok
> Checking for prepared transactions                          ok
> Checking for reg* system oid user data types                ok
> Checking for contrib/isn with bigint-passing mismatch       ok
> Checking for large objects                                  ok
>
> There were problems executing "/ipix/pgsql/bin/pg_ctl" -w -l
> "/dev/null" -D "/data/db"  stop >> "/dev/null" 2>&1
> Failure, exiting
>
>
> I've read some re pg_migrator and issues with contribs, but wondered
> if there is something "Else" I need to know here

I'm not sure that this is on-topic for pgsql-performance, and my reply
here is horribly behind-the-times anyway, but my experience with
pg_upgrade is that it's entirely willing to send all the critically
important information you need to solve the problem to the bit bucket,
as in your example.  If you knew WHY it was having trouble running
pg_ctl, you would probably be able to fix it easily, but since
everything's been redirected to /dev/null, you can't.  I believe that
this gets considerably better if you run pg_upgrade with the "-l
logfile" option, and then check the log file.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specifiedQuery planner doesn't use index scan on tsvector GIN index if LIMIT is specified

2012-01-10 Thread Jesper Krogh

On 2012-01-10 18:04, Tom Lane wrote:

darklow  writes:

But the performance problems starts when i do the same query specifying
LIMIT.
*SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
By some reason index is not used.

It apparently thinks there are enough matches that it might as well just
seqscan the table and expect to find some matches at random, in less
time than using the index would take.

The estimate seems to be off quite a bit, so maybe raising the stats
target for this column would help.

The cost of matching ts_match_vq against a toasted column
is not calculated correctly. This is completely parallel with
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

Try raising the cost for ts_match_vq(tsvector,tsquery) that help a bit, but
its hard to get the cost high enough.

Raising statistics target  helps too..

--
Jesper

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery flattening causing sequential scan

2012-01-10 Thread Robert Haas
On Tue, Dec 27, 2011 at 12:29 PM, Jim Crate  wrote:
> My question is why does it do a seq scan when it flattens this subquery into 
> a JOIN?  Is it because the emsg_messages table is around 1M rows?  Are there 
> some guidelines to when the planner will prefer not to use an available 
> index?  I just had a look through postgresql.conf and noticed that I forgot 
> to set effective_cache_size to something reasonable for a machine with 16GB 
> of memory.  Would the default setting of 128MB cause this behavior?  I can't 
> bounce the production server midday to test that change.

You wouldn't need to bounce the production server to test that.  You
could just use SET in the session you were testing from.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] partitioned table: differents plans, slow on some situations

2012-01-10 Thread Robert Haas
2011/12/30 Matteo Sgalaberni :
> I'm sorry, I pasted the wrong ones, but the results are the same, here A and 
> B again:
>
> Query A
>
> # EXPLAIN  ANALYZE SELECT sms.id AS id_sms
>
>                      FROM
>                       sms_messaggio AS sms,
>                       sms_messaggio_dlr AS dlr
>                      WHERE sms.id = dlr.id_sms_messaggio
>                        AND sms.timestamp_todeliver >= '1/3/2010'::timestamp
>                        AND sms.timestamp_todeliver < '30/4/2010'::timestamp
>                        AND dlr.timestamp_todeliver >= '1/3/2010'::timestamp
>                        AND dlr.timestamp_todeliver < '30/4/2010'::timestamp
>                         AND sms.id_cliente = '13'
>                      ORDER BY dlr.timestamp_todeliver DESC LIMIT 50;
>                                                                               
>                        QUERY PLAN
> --
>  Limit  (cost=0.02..943.11 rows=50 width=16) (actual time=0.603..79.729 
> rows=50 loops=1)
>   ->  Nested Loop  (cost=0.02..107279143.34 rows=5687651 width=16) (actual 
> time=0.601..79.670 rows=50 loops=1)
>         Join Filter: (sms.id = dlr.id_sms_messaggio)
>         ->  Merge Append  (cost=0.02..20289460.70 rows=5687651 width=16) 
> (actual time=0.048..14.556 rows=5874 loops=1)
>               Sort Key: dlr.timestamp_todeliver
>               ->  Index Scan Backward using sms_messaggio_dlr_todeliver on 
> sms_messaggio_dlr dlr  (cost=0.00..8.27 rows=1 width=16) (actual 
> time=0.005..0.005 rows=0 loops=1)
>                     Index Cond: ((timestamp_todeliver >= '2010-03-01 
> 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < 
> '2010-04-30 00:00:00'::timestamp without time zone))
>               ->  Index Scan Backward using 
> sms_messaggio_dlr_timestamp_todeliver_201003 on sms_messaggio_dlr_201003 dlr  
> (cost=0.00..12428664.98 rows=3502530 width=16) (actual time=0.018..0.018 
> rows=1 loops=1)
>                     Index Cond: ((timestamp_todeliver >= '2010-03-01 
> 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < 
> '2010-04-30 00:00:00'::timestamp without time zone))
>               ->  Index Scan Backward using 
> sms_messaggio_dlr_timestamp_todeliver_201004 on sms_messaggio_dlr_201004 dlr  
> (cost=0.00..7756421.17 rows=2185120 width=16) (actual time=0.022..8.408 
> rows=5874 loops=1)
>                     Index Cond: ((timestamp_todeliver >= '2010-03-01 
> 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < 
> '2010-04-30 00:00:00'::timestamp without time zone))
>         ->  Append  (cost=0.00..15.26 rows=3 width=8) (actual 
> time=0.010..0.010 rows=0 loops=5874)
>               ->  Index Scan using sms_messaggio_pkey1 on sms_messaggio sms  
> (cost=0.00..0.28 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=5874)
>                     Index Cond: (id = dlr.id_sms_messaggio)
>                     Filter: ((timestamp_todeliver >= '2010-03-01 
> 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < 
> '2010-04-30 00:00:00'::timestamp without time zone) AND (id_cliente = 13))
>               ->  Index Scan using sms_messaggio_201003_pkey on 
> sms_messaggio_201003 sms  (cost=0.00..7.54 rows=1 width=8) (actual 
> time=0.002..0.002 rows=0 loops=5874)
>                     Index Cond: (id = dlr.id_sms_messaggio)
>                     Filter: ((timestamp_todeliver >= '2010-03-01 
> 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < 
> '2010-04-30 00:00:00'::timestamp without time zone) AND (id_cliente = 13))
>               ->  Index Scan using sms_messaggio_201004_pkey on 
> sms_messaggio_201004 sms  (cost=0.00..7.45 rows=1 width=8) (actual 
> time=0.004..0.004 rows=0 loops=5874)
>                     Index Cond: (id = dlr.id_sms_messaggio)
>                     Filter: ((timestamp_todeliver >= '2010-03-01 
> 00:00:00'::timestamp without time zone) AND (timestamp_todeliver < 
> '2010-04-30 00:00:00'::timestamp without time zone) AND (id_cliente = 13))
>  Total runtime: 79.821 ms
> (22 rows)
>
> Query B:
> # EXPLAIN  ANALYZE SELECT sms.id AS id_sms
>
>                      FROM
>                       sms_messaggio AS sms,
>                       sms_messaggio_dlr AS dlr
>                      WHERE sms.id = dlr.id_sms_messaggio
>                        AND sms.timestamp_todeliver >= '1/3/2010'::timestamp
>                        AND sms.timestamp_todeliver < '30/4/2010'::timestamp
>                        AND dlr.timestamp_todeliver >= '1/3/2010'::timestamp
>                        AND dlr.timestamp_todeliver < '30/4/2010'::timestamp
>                         AND sms.id_cliente = '7'
>                      ORDER BY dlr.timestamp_todeliver DESC LIMIT 50;
>