Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
"Pavel Stehule"  writes:

> Hello
>
> 2008/9/1 David West <[EMAIL PROTECTED]>:
>> Thanks for your suggestion but the result is the same.
>>
>> Here is the explain analyse output from different queries.
>> Select * from my_table where A is null and B = '21' limit 15
>>
>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual 
>> time=85837.043..85896.140 rows=15 loops=1)"
>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 
>> width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>> "Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>> "Total runtime: 85896.214 ms"
>>
>
> I see it - the problem is in the statistics - the system expects 1055580
> rows, but there are only 15 values.

Aren't you rather seeing the effect of the limit clause?

gc=# create table foo ( bar int );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 1000) / 100 );
INSERT 0 1001
gc=# analyze foo;
ANALYZE
gc=# explain analyze select * from foo where bar = 8 limit 15;
                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=0.00..2.30 rows=15 width=4) (actual time=2379.878..2379.921 rows=15 loops=1)
   ->  Seq Scan on foo  (cost=0.00..164217.00 rows=1070009 width=4) (actual time=2379.873..2379.888 rows=15 loops=1)
         Filter: (bar = 8)
 Total runtime: 2379.974 ms

(on 8.3.1)

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Pavel Stehule
2008/9/2 Guillaume Cottenceau <[EMAIL PROTECTED]>:
> "Pavel Stehule"  writes:
>
>> Hello
>>
>> 2008/9/1 David West <[EMAIL PROTECTED]>:
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
>>>
>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual 
>>> time=85837.043..85896.140 rows=15 loops=1)"
>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 
>>> width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>> "Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>> "Total runtime: 85896.214 ms"
>>>
>>
>> I see it - the problem is in the statistics - the system expects 1055580
>> rows, but there are only 15 values.
>
> Aren't you rather seeing the effect of the limit clause?

yes, true, my mistake

Pavel

>
> gc=# create table foo ( bar int );
> CREATE TABLE
> gc=# insert into foo ( select generate_series(0, 1000) / 100 );
> INSERT 0 1001
> gc=# analyze foo;
> ANALYZE
> gc=# explain analyze select * from foo where bar = 8 limit 15;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Limit  (cost=0.00..2.30 rows=15 width=4) (actual time=2379.878..2379.921 rows=15 loops=1)
>    ->  Seq Scan on foo  (cost=0.00..164217.00 rows=1070009 width=4) (actual time=2379.873..2379.888 rows=15 loops=1)
>          Filter: (bar = 8)
>  Total runtime: 2379.974 ms
>
> (on 8.3.1)
>
> --
> Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
> Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
>



Re: [PERFORM] slow update of index during insert/copy

2008-09-02 Thread Greg Smith

On Tue, 2 Sep 2008, Thomas Finneid wrote:

>> You should try setting this to open_sync, that can be considerably faster
>> for some write-heavy situations.  Make sure to test that thoroughly though,
>> there are occasional reports of issues with that setting under Linux.
>
> The production machine is Solaris 10 running on a Sun v980. Do you know if it
> has any issues like these?


On Solaris you can safely use open_datasync which is a bit better than 
open_sync.  For best results, you need to separate the xlog onto a 
separate partition and mount it using forcedirectio, because Postgres 
doesn't know how to use direct I/O directly on Solaris yet.


> Additionally, would I need to do any config changes when going from Linux to
> Solaris?


Assuming the same amount of memory, the postgresql.conf should be 
basically the same, except for the wal_sync_method change mentioned above. 
If there's more RAM in the production server you can ramp up 
shared_buffers, effective_cache_size, and possibly work_mem 
proportionately.  The settings I suggested for maintenance_work_mem and 
wal_buffers are already near the useful upper limits for those parameters.
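
Putting those pieces together, a minimal sketch of the Solaris-specific changes (device names, mount point, and values below are placeholders, not recommendations):

```
# postgresql.conf -- Solaris 10, per the advice above (placeholder value)
wal_sync_method = open_datasync

# /etc/vfstab -- put pg_xlog on its own UFS partition mounted forcedirectio,
# e.g. (hypothetical device names):
# /dev/dsk/c0t1d0s0  /dev/rdsk/c0t1d0s0  /pgdata/pg_xlog  ufs  2  yes  forcedirectio
```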


There are a few operating system level things you should consider tweaking 
on Solaris 10 for better PostgreSQL performance.  You need to be a bit 
more careful about the parameters used for the filesystem than on Linux, 
and the settings there vary considerably depending on whether you're using 
UFS or ZFS.  The best intro to that I know of is at 
http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best ; I added 
some clarification to a few points in there and some other Solaris notes 
at http://notemagnet.blogspot.com/2008_04_01_archive.html  Those should 
get you started.


I hope you're already looking into some sort of repeatable benchmark, 
representative of your application, that you can run.  You'll go crazy 
playing with all these settings without something like that.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
Russell Smith  writes:

> Pavel Stehule wrote:
>> Hello
>>
>> 2008/9/1 David West <[EMAIL PROTECTED]>:
>>   
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
>>>
>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual 
>>> time=85837.043..85896.140 rows=15 loops=1)"
>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 
>>> width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>> "Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>> "Total runtime: 85896.214 ms"
>>>
>>> 
> [snip]
>
> Further to Pavel's comments;
>
> (actual time=85837.038..85896.091 rows=15 loops=1)
>
> That's 85 seconds on a sequence scan to return the first tuple.  The table is
> not bloated by any chance, is it?

Wouldn't this be normal if, e.g., the distribution of values were
uneven, i.e. the A IS NULL AND B = '21' rows are not near the
beginning of the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the distribution of the
data, am I wrong? E.g. in that situation, when a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seqscan becomes a totally different story.. (the
correlation changes, but may not change a lot if only a few rows
move).

However, I cannot reproduce a similar situation to David's.

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 1000) / 100, case when 
random() < 0.05 then 'Today Alcatel-Lucent has announced that P*** C is 
appointed non-executive Chairman and B** V is appointed Chief Executive 
Officer.' else null end );
INSERT 0 1001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                              QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
 ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=5 width=0)
   Index Cond: (bar = 8)
 ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=5 width=0)
   Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
  QUERY PLAN  
--
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgot to tell:

- any other columns in your table?
- is table bloated?
- has table never been analyzed?
- what version of postgresql? what overriden configuration?

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36



Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West
Yes I inserted values in big batches according to a single value of 'B', so
indeed a sequence scan may have to scan forward many millions of rows before
finding the required value.

I have been doing regular analyse commands on my table.  I don't think my
table is bloated, I haven't been performing updates.  However I'm doing a
vacuum analyse now and I'll see if that makes any difference.

I am using postgres 8.3.1 with a default install on windows - no tweaks to
the configuration at all.

There are many other columns in my table, but none of them are used in this
query.

Guillaume in your example you didn't add the limit clause?  Postgres chooses
the correct index in my case without the limit clause, the problem is with
the limit clause.  One other difference with your example is both my columns
are varchar columns, not integer and text, I don't know if that would make a
difference.

From looking at the plans, it seems that postgres is assuming it will only
have to sequentially scan 15 rows, which is not true in my case because
column B is not distributed randomly (nor will it be in production).  Would
postgres not be best to ignore the limit when deciding the best index to use
-  in this simple query wouldn't the best plan to use always be the same
with or without a limit?

Thanks to all of you for your interest in my problem
David

-Original Message-
From: Guillaume Cottenceau [mailto:[EMAIL PROTECTED] 
Sent: 02 September 2008 10:07
To: David West; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?

Wouldn't this be normal if, e.g., the distribution of values were
uneven, i.e. the A IS NULL AND B = '21' rows are not near the
beginning of the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the distribution of the
data, am I wrong? E.g. in that situation, when a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seqscan becomes a totally different story.. (the
correlation changes, but may not change a lot if only a few rows
move).

However, I cannot reproduce a similar situation to David's.

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 1000) / 100, case
when random() < 0.05 then 'Today Alcatel-Lucent has announced that P***
C is appointed non-executive Chairman and B** V is appointed
Chief Executive Officer.' else null end );
INSERT 0 1001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                              QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
         ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=5 width=0)
               Index Cond: (bar = 8)
         ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=5 width=0)
               Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgot to tell:

- any other columns in your table?
- is table bloated?
- has table never been analyzed?
- what version of postgresql? what overriden configuration?

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36




Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Gregory Stark


 "Limit  (cost=0.00..3.68 rows=15 width=128) (actual 
 time=85837.043..85896.140 rows=15 loops=1)"
 "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 
 width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
 "Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
 "Total runtime: 85896.214 ms"

Postgres does collect and use statistics about what fraction of the "A" column
is null. It also collects and uses statistics about what fraction of the "B"
column is 21 (using a histogram). And it does take the LIMIT into account.
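
As a rough sanity check (back-of-the-envelope arithmetic, not the planner's actual code path), the Limit node's cost in the quoted plan is simply the Seq Scan's total cost scaled by the fraction of its estimated rows the LIMIT expects to fetch, on the assumption that matches are spread evenly through the scan:

```sql
-- Seq Scan: cost=0.00..258789.88, rows=1055580; LIMIT 15
-- Limit cost shown in the quoted plan: 0.00..3.68
SELECT round((258789.88 * 15 / 1055580)::numeric, 2);  -- 3.68
```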

I think the other poster might well be right about this table being extremely
bloated. You could test by running and posting the results of:

VACUUM VERBOSE my_table

What it doesn't collect is where in the table those records are -- so if there
are a lot of them then it might use a sequential scan regardless of whether
they're at the beginning or end of the table. That seems unlikely to be the
problem though.

The other thing it doesn't collect is how many of the B=21 records have null
As. So if a large percentage of the table has A as null then it will assume
that's true for the B=21 records and if there are a lot of B=21 records then
it will assume a sequential scan will find matches quickly. If in fact the two
columns are highly correlated and B=21 records almost never have A null
whereas records with other values of B have lots of null values then Postgres
might make a bad decision here.

Also, it only has statistics for B=21 via a histogram. If the distribution
of B is highly skewed so that, for example values between 20 and 25 are very
common but B=21 happens to be quite rare then Postgres might get a bad
estimate here. You could improve this by raising the statistics target for the
B column and re-analyzing.
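
To act on that, the per-column sample size can be raised and the statistics rebuilt (identifiers below are the placeholders used in this thread; in 8.3 the default target is 10 and the maximum is 1000):

```sql
-- Collect a larger MCV list / finer histogram for column B, then re-analyze.
ALTER TABLE my_table ALTER COLUMN B SET STATISTICS 1000;
ANALYZE my_table;
```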

That brings up another question -- when was the last time this table was
analyzed?

What estimates and actual results does postgres get for simple queries like:

EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE B=21;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL AND B=21;

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
"David West"  writes:

> Yes I inserted values in big batches according to a single value of 'B', so
> indeed a sequence scan may have to scan forward many millions of rows before
> finding the required value.

That may well be why the seqscan is so slow to give your results;
that said, it doesn't explain why the index scan is not
preferred.

> I have been doing regular analyse commands on my table.  I don't think my

Like, recently? Can you post the stats?

gc=# select * from pg_stats where tablename = 'foo';

You should try to ANALYZE again and see if that makes a
difference, to be sure.
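
A narrower version of that query, pulling just the columns relevant to this thread (the attribute names 'a' and 'b' are stand-ins for the real ones; pg_stats stores names lower-cased unless the columns were created quoted):

```sql
SELECT attname,
       null_frac,          -- estimated fraction of NULLs, drives "A IS NULL"
       n_distinct,         -- estimated number of distinct values
       most_common_vals,   -- MCV list used for "B = '21'"
       most_common_freqs,
       correlation         -- physical vs. logical ordering, -1..1
  FROM pg_stats
 WHERE tablename = 'my_table'
   AND attname IN ('a', 'b');
```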

> table is bloated, I haven't been performing updates.  However I'm doing a

Maybe you've been DELETE'ing then INSERT'ing some? That creates
bloat too. Btw, don't forget to prefer TRUNCATE to remove
everything from the table, and ANALYZE after large INSERT's.

> vacuum analyse now and I'll see if that makes any difference.

A single VACUUM may not report how bloated your table is, if it's
been VACUUM'ed some before, but not frequently enough. If you
have time for it, and you can afford a full lock on the table,
only a VACUUM FULL VERBOSE will tell you the previous bloat (the
"table .. truncated to .." line IIRC).
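
As a lighter-weight alternative that takes no table lock, the 8.3 statistics collector tracks dead tuples, so a quick indicator is (a sketch; requires stats collection to be enabled):

```sql
-- A large n_dead_tup relative to n_live_tup suggests bloat worth a VACUUM FULL.
SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'my_table';
```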

> I am using postgres 8.3.1 with a default install on windows - no tweaks to
> the configuration at all.

With a default install, the free space map settings may well be
too small for tracking free space on a table as large as 10M
rows. Performing VACUUM VERBOSE on database 'template1' will show
you interesting information about the current and ideal FSM
settings, at the end of the output. Something like:

   INFO:  free space map contains 37709 pages in 276 relations
   DETAIL:  A total of 42080 page slots are in use (including overhead).
   42080 page slots are required to track all free space.
   Current limits are:  204800 page slots, 1000 relations, using 1265 kB.

Of course, this also depends on the frequency of your VACUUMing
(if autovacuuming is not configured or badly configured) against
the frequency of your UPDATE's and DELETE's.
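
If the VACUUM VERBOSE tail reports more page slots required than the current limit, the two FSM parameters need raising in postgresql.conf, followed by a restart (the values below are illustrative; these parameters were removed in 8.4):

```
max_fsm_pages = 500000      # must exceed the "page slots ... required" figure
max_fsm_relations = 1000    # must exceed the number of relations reported
```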

> There are many other columns in my table, but none of them are used in this
> query.

Can you show us the table definition? I am too ignorant in PG to
know if that would make a difference, but it might ring a bell
for others.. AFAIK, more column data may mean larger resultsets
and may change the plan (though "width=128" in the log of your
explanation wouldn't mean a lot of data are stored per row).

> Guillaume in your example you didn't add the limit clause?  Postgres chooses
> the correct index in my case without the limit clause, the problem is with
> the limit clause.

Duh, forgot about that, sorry! But I did try it and it was the same.

gc=# explain select * from foo where baz is null and bar = '8' limit 15;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=0.00..0.42 rows=15 width=154)
   ->  Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
 Index Cond: (bar = 8)
 Filter: (baz IS NULL)
(4 rows)

> One other difference with your example is both my columns are
> varchar columns, not integer and text, I don't know if that
> would make a difference.

It is always useful to know as much as possible about the actual
table definition and data when isolating a performance problem... I
know it may clash with privacy :/ but that kind of information
probably will not, will it?

With:

gc=# create table foo ( bar varchar(64), baz varchar(256) );

it doesn't make a difference yet:

gc=# explain select * from foo where baz is null and bar = '8';
 QUERY PLAN  
-
 Index Scan using foobar on foo  (cost=0.00..27450.05 rows=982092 width=149)
   Index Cond: ((bar)::text = '8'::text)
   Filter: (baz IS NULL)
(3 rows)

gc=# explain select * from foo where baz is null and bar = '8' limit 15;
                                QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=0.00..0.42 rows=15 width=149)
   ->  Index Scan using foobar on foo  (cost=0.00..27450.05 rows=982092 width=149)
 Index Cond: ((bar)::text = '8'::text)
 Filter: (baz IS NULL)
(4 rows)

Btw, it would help if you could reproduce my test scenario and
see if PG uses "correctly" the indexscan. It is better to try on
your installation, to take care of any configuration/whatever
variation which may create your problem.

> From looking at the plans, it seems that postgres is assuming it will only
> have to sequentially scan 15 rows, which is not true in my case because
> column B is not distributed randomly (nor will it be in production).  Would

Why do you say that? The explanation seems to rather tell that it

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West
Here are the results of 'vacuum analyse verbose' on my table:

INFO:  vacuuming "public.jbpm_taskinstance"
INFO:  scanned index "jbpm_taskinstance_pkey" to remove 928153 row versions
DETAIL:  CPU 0.70s/2.40u sec elapsed 46.49 sec.
INFO:  scanned index "idx_tskinst_tminst" to remove 928153 row versions
DETAIL:  CPU 0.78s/2.34u sec elapsed 88.99 sec.
INFO:  scanned index "idx_tskinst_slinst" to remove 928153 row versions
DETAIL:  CPU 0.63s/2.37u sec elapsed 92.54 sec.
INFO:  scanned index "idx_taskinst_tokn" to remove 928153 row versions
DETAIL:  CPU 0.99s/2.30u sec elapsed 110.29 sec.
INFO:  scanned index "idx_taskinst_tsk" to remove 928153 row versions
DETAIL:  CPU 0.92s/2.63u sec elapsed 89.16 sec.
INFO:  scanned index "idx_pooled_actor" to remove 928153 row versions
DETAIL:  CPU 0.32s/1.65u sec elapsed 2.56 sec.
INFO:  scanned index "idx_task_actorid" to remove 928153 row versions
DETAIL:  CPU 0.09s/1.88u sec elapsed 2.69 sec.
INFO:  "jbpm_taskinstance": removed 928153 row versions in 13685 pages
DETAIL:  CPU 0.84s/0.82u sec elapsed 26.42 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  927985 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages

Afterwards I ran a 'vacuum full verbose'

INFO:  vacuuming "public.jbpm_taskinstance"
INFO:  "jbpm_taskinstance": found 0 removable, 7555748 nonremovable row
versions in 166156 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 88 to 209 bytes long.
There were 8470471 unused item pointers.
Total free space (including removable row versions) is 208149116 bytes.
9445 pages are or will become empty, including 0 at the end of the table.
119104 pages containing 206008504 free bytes are potential move
destinations.
CPU 2.44s/1.60u sec elapsed 127.89 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.87s/2.16u sec elapsed 120.81 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  0 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 0.79s/1.95u sec elapsed 103.52 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  0 index row versions were removed.
28343 index pages have been deleted, 28343 are currently reusable.
CPU 0.62s/1.93u sec elapsed 99.21 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  0 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.10s/2.18u sec elapsed 108.29 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.01s/1.73u sec elapsed 64.73 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 20896
pages
DETAIL:  0 index row versions were removed.
136 index pages have been deleted, 136 are currently reusable.
CPU 0.26s/1.57u sec elapsed 3.01 sec.
INFO:  index "idx_task_actorid" now contains 7555748 row versions in 20885
pages
DETAIL:  0 index row versions were removed.
121 index pages have been deleted, 121 are currently reusable.
CPU 0.23s/1.52u sec elapsed 2.77 sec.
INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
140279 pages
DETAIL:  CPU 26.50s/138.35u sec elapsed 735.02 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.04s/1.38u sec elapsed 117.72 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  1374243 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 1.37s/1.01u sec elapsed 123.56 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  1374243 index row versions were removed.
28560 index pages have been deleted, 28560 are currently reusable.
CPU 1.20s/1.27u sec elapsed 105.67 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  1374243 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.29s/0.96u sec elapsed 112.62 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.48s/1.12u sec elapsed 70.56 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 25534
pages
DETAIL:  13

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West

>A single VACUUM may not report how bloated your table is, if it's
>been VACUUM'ed some before, but not frequently enough. If you
>have time for it, and you can afford a full lock on the table,
>only a VACUUM FULL VERBOSE will tell you the previous bloat (the
>"table .. truncated to .." line IIRC).

Here's the output of vacuum full verbose (after running a plain vacuum
verbose)

INFO:  vacuuming "public.jbpm_taskinstance"
INFO:  "jbpm_taskinstance": found 0 removable, 7555748 nonremovable row
versions in 166156 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 88 to 209 bytes long.
There were 8470471 unused item pointers.
Total free space (including removable row versions) is 208149116 bytes.
9445 pages are or will become empty, including 0 at the end of the table.
119104 pages containing 206008504 free bytes are potential move
destinations.
CPU 2.44s/1.60u sec elapsed 127.89 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.87s/2.16u sec elapsed 120.81 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  0 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 0.79s/1.95u sec elapsed 103.52 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  0 index row versions were removed.
28343 index pages have been deleted, 28343 are currently reusable.
CPU 0.62s/1.93u sec elapsed 99.21 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  0 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.10s/2.18u sec elapsed 108.29 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.01s/1.73u sec elapsed 64.73 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 20896
pages
DETAIL:  0 index row versions were removed.
136 index pages have been deleted, 136 are currently reusable.
CPU 0.26s/1.57u sec elapsed 3.01 sec.
INFO:  index "idx_task_actorid" now contains 7555748 row versions in 20885
pages
DETAIL:  0 index row versions were removed.
121 index pages have been deleted, 121 are currently reusable.
CPU 0.23s/1.52u sec elapsed 2.77 sec.
INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
140279 pages
DETAIL:  CPU 26.50s/138.35u sec elapsed 735.02 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.04s/1.38u sec elapsed 117.72 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  1374243 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 1.37s/1.01u sec elapsed 123.56 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  1374243 index row versions were removed.
28560 index pages have been deleted, 28560 are currently reusable.
CPU 1.20s/1.27u sec elapsed 105.67 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  1374243 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.29s/0.96u sec elapsed 112.62 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.48s/1.12u sec elapsed 70.56 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 25534
pages
DETAIL:  1374243 index row versions were removed.
3769 index pages have been deleted, 3769 are currently reusable.
CPU 0.48s/0.82u sec elapsed 6.89 sec.
INFO:  index "idx_task_actorid" now contains 7555748 row versions in 25545
pages
DETAIL:  1374243 index row versions were removed.
3790 index pages have been deleted, 3790 are currently reusable.
CPU 0.37s/1.24u sec elapsed 7.93 sec.
INFO:  vacuuming "pg_toast.pg_toast_560501"
INFO:  "pg_toast_560501": found 0 removable, 0 nonremovable row versions in
0 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_560501_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00

Re: [PERFORM] slow update of index during insert/copy

2008-09-02 Thread Matthew Wakeling

On Mon, 1 Sep 2008, Scott Carey wrote:
> Thanks for the info on the patch to support it -- however the versions
> posted there are rather old...


Over here, we're using an extremely old patched version of the JDBC 
driver. That's the patch I sent to some mailing list a couple of years 
ago. It works very well, but I would be very eager to see the COPY support 
make it to the mainstream driver with a consistent interface.



> On the performance impact of using COPY instead of INSERT.


Over here, we use the binary form of COPY, and it is *really* fast. It's 
quite easy to saturate the discs.


Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are. -- Kyle Hearn



Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
"David West"  writes:

> INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
> 140279 pages

nothing that would explain the planner being so far off :/

> Yep, the table is from the jboss jbpm (business process management) schema.

I went for that kind of test then, but it didn't help much:

  create table foo ( bar character varying(255), baz character varying(255),
id_ bigint NOT NULL,
class_ character(1) NOT NULL,
version_ integer NOT NULL,
name_ character varying(255),
description_ character varying(4000),
create_ timestamp without time zone,
start_ timestamp without time zone,
end_ timestamp without time zone,
duedate_ timestamp without time zone,
priority_ integer,
iscancelled_ boolean,
issuspended_ boolean,
isopen_ boolean,
issignalling_ boolean,
isblocking_ boolean,
task_ bigint,
token_ bigint,
procinst_ bigint,
swimlaninstance_ bigint,
taskmgmtinstance_ bigint,
processname_ character varying(255) );
  
  insert into foo ( select generate_series(0, 1000) / 100, case when 
random() < 0.05 then 'Today Alcatel-Lucent has announced that Philippe Camus is 
appointed non-executive Chairman and Ben Verwaayen is appointed Chief Executive 
Officer.' else null end, 1, 'a', 1  );
  
  create index foobaz on foo(baz);
  create index foobar on foo(bar);
  analyze foo;

Estimated costs still look correct on my side:

  gc=# explain select * from foo where baz is null and bar in ('8') limit 15;
                                        QUERY PLAN
  -------------------------------------------------------------------------------------
   Limit  (cost=0.00..0.46 rows=15 width=1795)
     ->  Index Scan using foobar on foo  (cost=0.00..26311.70 rows=860238 width=1795)
           Index Cond: ((bar)::text = '8'::text)
           Filter: (baz IS NULL)
  (4 rows)
  
  gc=# set enable_indexscan = off;
  SET
  gc=# explain select * from foo where baz is null and bar in ('8') limit 15;
                               QUERY PLAN
  ----------------------------------------------------------------------
   Limit  (cost=0.00..3.46 rows=15 width=1795)
     ->  Seq Scan on foo  (cost=0.00..198396.62 rows=860238 width=1795)
           Filter: ((baz IS NULL) AND ((bar)::text = '8'::text))
  (3 rows)


>>Btw, it would help if you could reproduce my test scenario and
>>see if PG uses "correctly" the indexscan. It is better to try on
>>your installation, to take care of any configuration/whatever
>>variation which may create your problem.
>
> I have tried your example and I get the same results as you.
>
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.53 rows=15 width=154)
>    ->  Index Scan using foobar on foo  (cost=0.00..33159.59 rows=934389 width=154)
>          Index Cond: (bar = 8)
>          Filter: (baz IS NULL)
> (4 rows)
>
> db=# drop index foobar;
> DROP INDEX
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>                             QUERY PLAN
> ---------------------------------------------------------------------
>  Limit  (cost=0.00..2.87 rows=15 width=154)
>    ->  Seq Scan on foo  (cost=0.00..178593.35 rows=934389 width=154)
>          Filter: ((baz IS NULL) AND (bar = 8))
> (3 rows)
>
> It's choosing the index because of a cost of 0.53 vs a cost of 2.87 for
> sequential scan.  I wonder why in my real tables the index scan cost is
> higher than the sequential scan cost.  Perhaps because of the extra width of
> my rows?

You may try to crosscheck with the new test I've put above, but
I'm skeptical :/

I think I've unfortunately more than reached my level of
incompetence on that subject, sorry I wasn't able to better
locate your problem :/

>>> From looking at the plans, it seems postgres is assuming it will only
>>> have to sequentially scan 15 rows, which is not true in my case
>>> because column B is not distributed randomly (nor will it be in
>>> production).  Would
>>
>>Why do you say that? The explanation seems to rather tell that it
>>(correctly) assumes that the seqscan would bring up about 1M rows for the
>>selected values of A and B, and then it will limit to 15 rows.
>
> I say that because the plan gives a really really low number (3.21) for the
> estimated cost after the limit on sequential scan:
>
> Select * from JBPM_TASKINSTANCE this_ where actorid_ is null and
> this_.POOLEDACTOR_ in ('21') limit 15
> "Limit  (cost=0.00..3.21 rows=15 width=128) (actual
> time=84133.211..84187.247 rows=15 loops=1)"
> "  ->  Seq Scan on jbpm_taskinstance this_  (cost=0.00..234725.85
> rows=1095365 width=128) (actual time=84133.205..84187.186 rows=15 loops=1)"
> "Filter: ((actorid_ IS NULL) AND ((poole

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West
Thanks very much for your help Guillaume, I appreciate you spending time on
this.

> Well, if you have 95% of NULL actorid_ and 10% for each value of
> pooledactor_, then it makes sense to assume it will have to fetch
> about 150 rows to find the 15 awaited ones...

This is only true if the data is randomly distributed, which it isn't
unfortunately.

To any postgres developers reading this, two words: planning hints :-).  In
the face of imperfect information it's not possible to write a perfect
planner, please give us the ability to use the heuristic information we have
as developers, and the planner will never know about.  Even if we force
postgres to use queries that give sub-optimal performance some of the time,
once we can write sufficiently performant queries, we're happy.  In cases
like this where postgres gets it very wrong (and this is just a very simple
query after all), well, we're screwed.

I'm going to try partitioning my database along the pooledactor_ column to
see if I can get reasonable performance for my purposes, even if I can't
reach 10 million rows.
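For what it's worth, in 8.x that kind of partitioning is done with inheritance plus CHECK constraints; a hypothetical sketch (child-table and index names are only illustrative):

```sql
-- one child table per pooledactor_ value (or range of values)
CREATE TABLE jbpm_taskinstance_p21 (
    CHECK (pooledactor_ = '21')
) INHERITS (jbpm_taskinstance);

CREATE INDEX jbpm_ti_p21_actorid ON jbpm_taskinstance_p21 (actorid_);

-- let the planner prune children whose CHECK constraints exclude the query
SET constraint_exclusion = on;
```

Inserts then have to be routed to the right child table, typically with a trigger or by the application.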

Thanks
David

-Original Message-
From: Guillaume Cottenceau [mailto:[EMAIL PROTECTED] 
Sent: 02 September 2008 14:56
To: David West
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?


Re: [PERFORM] too many clog files

2008-09-02 Thread Guillaume Lelarge
Greg Smith wrote:
> [...]
>> - When, or in what case is  a new clog file produced?
> 
> Every 32K transactions.

Are you sure about this?

My clog files get up to 262144 bytes, which means 1M transactions'
status: 262144 bytes are 2 Mb (megabits), so if a status takes 2 bits, a
file holds 1M transactions' statuses.

AFAICT, 32K transactions' status are available on a single (8KB) page.

Or am I wrong?
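The arithmetic is easy to double-check, assuming 2 bits per transaction status (i.e. 4 statuses per byte):

```sql
SELECT 8192 * 4   AS xacts_per_8kb_page,   -- 32768, i.e. 32K
       262144 * 4 AS xacts_per_clog_file;  -- 1048576, i.e. 1M
```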


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] too many clog files

2008-09-02 Thread Greg Smith

On Tue, 2 Sep 2008, Guillaume Lelarge wrote:


AFAICT, 32K transactions' status are available on a single (8KB) page.


You're right, I had that right on the referred-to page but mangled it
when writing the e-mail.


262144 bytes are 2 Mb (megabits), so if a status is 2 bits, [a clog
file] holds 1M transactions' status.


Exactly.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] logging options...

2008-09-02 Thread Jessica Richard
for a short test purpose, I would like to see what queries are running and how
long each of them takes, by reconfiguring postgresql.conf on the server level.

log_statement = 'all' is giving me the query statements, but I don't know
where I can turn "timing" on, just like the "\timing" I can run from the
command line to measure how long each of the queries takes to finish...

Thanks,
Jessica

Re: [PERFORM] logging options...

2008-09-02 Thread Guillaume Lelarge
Jessica Richard wrote:
> for a short test purpose, I would like to see what queries are running
> and how long each of them takes, by reconfiguring postgresql.conf on
> the server level.
> 
> log_statement = 'all' is giving me the query statements, but I don't
> know where I can turn "timing" on, just like the "\timing" I can run
> from the command line to measure how long each of the queries takes
> to finish...
> 

Either you configure log_statement to all, ddl or mod and log_duration
to on, or you configure log_min_duration_statement to 0.
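A minimal postgresql.conf sketch of the latter option (0 is the log-everything setting; raise the threshold, in milliseconds, to filter):

```
# log each completed statement together with its duration if it ran for
# at least this many milliseconds; 0 logs every statement, -1 disables
log_min_duration_statement = 0
```

It can also be set per session (SET log_min_duration_statement = 0;), which is handy for a short test.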


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Matt Smiley
Hi David,

Early in this thread, Pavel suggested:

> you should use a partial index
> 
> create index foo on mytable (b) where a is null;

Rather, you might try the opposite partial index (where a is NOT null) as a 
replacement for the original unqualified index on column A.  This new index 
will be ignored by the query you're trying to tune, but it'll be available to 
the other queries that filter to a non-null value of column A.  (Omitting NULL 
from that index should be ok because you normally wouldn't want to use an index 
when 95% of the table's rows match the filtered key.)

Then you can temporarily disable Seq Scans in your session for just this one 
query, as follows:

SQL> create table my_table ( a int, b int ) ;
CREATE TABLE

SQL> create index idx_a_not_null on my_table ( a ) where a is not null ;
CREATE INDEX

SQL> create index idx_b on my_table ( b ) ;
CREATE INDEX

SQL> insert into my_table (a, b)
select
  case when random() <= 0.95 then null else i end as a,
  mod(i, 10) as b
from generate_series(1, 1000) s(i)
;
INSERT 0 1000

SQL> analyze my_table ;
ANALYZE


Review the statistics available to the optimizer:

SQL> select attname, null_frac, n_distinct, most_common_vals, 
most_common_freqs, histogram_bounds, correlation
from pg_stats
where tablename = 'my_table'
order by attname
;
 attname | null_frac | n_distinct |   most_common_vals    |                                most_common_freqs                                 |                                    histogram_bounds                                     | correlation
---------+-----------+------------+-----------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+-------------
 a       |     0.945 |         -1 |                       |                                                                                  | {2771,1301755,2096051,3059786,3680728,4653531,5882434,6737141,8240245,9428702,9875768}  |           1
 b       |         0 |         10 | {9,4,3,1,2,6,8,5,7,0} | {0.110333,0.104,0.102333,0.100333,0.100333,0.0996667,0.0986667,0.098,0.096,0.09} |                                                                                         |    0.127294
(2 rows)

SQL> select relname, reltuples, relpages from pg_class where relname in
('my_table', 'idx_a_not_null', 'idx_b') order by relname ;
    relname     | reltuples | relpages
----------------+-----------+----------
 idx_a_not_null |    499955 |     1100
 idx_b          |     1e+07 |    21946
 my_table       |     1e+07 |    39492
(3 rows)


Run the test query, first without disabling Seq Scan to show this example 
reproduces the plan you're trying to avoid.

SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.66 rows=15 width=8) (actual time=0.070..0.263 rows=15 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.00 rows=929250 width=8) (actual time=0.061..0.159 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 5))
 Total runtime: 0.371 ms
(4 rows)


Now run the same query without the Seq Scan option.

SQL> set enable_seqscan = false ;
SET

SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..46.33 rows=15 width=8) (actual time=0.081..0.232 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..2869913.63 rows=929250 width=8) (actual time=0.072..0.130 rows=15 loops=1)
         Index Cond: (b = 5)
         Filter: (a IS NULL)
 Total runtime: 0.341 ms
(5 rows)

SQL> reset enable_seqscan ;
RESET


Yes, it's unsavory to temporarily adjust a session-level parameter to tune a 
single query, but I don't know of a less intrusive way to avoid the SeqScan.  
Here's why I think it might be your simplest option:

As far as I can tell, the plan nodes for accessing the table/index are unaware 
of the LIMIT.  The cost of the Limit node is estimated as the cost of its input 
row-source multiplied by the ratio of requested/returned rows.  For example, 
from the preceding plan output:
2869913.63 for "Index Scan" upper cost * (15 row limit / 929250 returned 
rows) = 46.326 upper cost for the "Limit" node
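That ratio can be verified directly:

```sql
-- Limit's upper cost = input node's total cost scaled by requested/expected rows
SELECT 2869913.63 * (15 / 929250.0) AS limit_upper_cost;  -- ≈ 46.326
```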
The underlying plan nodes each assume that all the rows matching their filter 
predicates will be returned up the pipeline; the cost estimate is only reduced 
at the Limit node.  A Seq Scan and an Index Scan (over a complete index) will 
both expect the same number of input rows (pg_class.reltuples).  They also 
produce the same estimate

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Tom Lane
"Matt Smiley" <[EMAIL PROTECTED]> writes:
>  So an Index Scan is always going to have a higher cost estimate than
>  an equivalent Seq Scan returning the same result rows (unless
>  random_page_cost is < 1).  That's why I think the planner is always
>  preferring the plan that uses a Seq Scan.

If that were the case, we'd never choose an indexscan at all...

It's true that a plain indexscan is not preferred for queries that will
return a large fraction of the table.  However, it should be willing to
use a bitmap scan for this query, given default cost settings (the
default cost settings will cause it to prefer bitmap scan for retrieving
up to about a third of the table, in my experience).  I too am confused
about why it doesn't prefer that choice in the OP's example.  It would
be interesting to alter the random_page_cost setting and see if he gets
different results.
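For the record, that experiment is a quick session-level change (4.0 is the default; the table and column names below follow the earlier test-case examples, not the OP's real schema):

```sql
SET random_page_cost = 2;  -- lower values make index/bitmap scans look cheaper
EXPLAIN ANALYZE
SELECT * FROM my_table WHERE a IS NULL AND b = 5 LIMIT 15;
```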

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Samsung 32GB SATA SSD tested

2008-09-02 Thread behrangs

Jeff,

Some off topic questions:

Is it possible to boot the OS from the ioDrive? If so, is the difference in
boot up time noticeable?

Also, how does ioDrive impact compilation time for a moderately large code
base? What about application startup times?

Cheers,
Behrang


Jeffrey Baker wrote:
> 
> For background, please read the thread "Fusion-io ioDrive", archived at
> 
> http://archives.postgresql.org/pgsql-performance/2008-07/msg00010.php
> 
> To recap, I tested an ioDrive versus a 6-disk RAID with pgbench on an
> ordinary PC.  I now also have a 32GB Samsung SATA SSD, and I have tested
> it in the same machine with the same software and configuration.  I
> tested it connected to the NVIDIA CK804 SATA controller on the
> motherboard, and as a pass-through disk on the Areca RAID controller,
> with write-back caching enabled.
> 
>              Service Time Percentile, millis
>             R/W TPS   R-O TPS   50th   80th   90th   95th
> RAID          182       673      18     32     42     64
> Fusion        971      4792       8      9     10     11
> SSD+NV        442      4399      12     18     36     43
> SSD+Areca     252      5937      12     15     17     21
> 
> As you can see, there are tradeoffs.  The motherboard's ports are
> substantially faster on the TPC-B type of workload.  This little, cheap
> SSD achieves almost half the performance of the ioDrive (i.e. similar
> performance to a 50-disk SAS array.)  The RAID controller does a better
> job on the read-only workload, surpassing the ioDrive by 20%.
> 
> Strangely the RAID controller behaves badly on the TPC-B workload.  It
> is faster than disk, but not by a lot, and it's much slower than the
> other flash configurations.  The read/write benchmark did not vary when
> changing the number of clients between 1 and 8.  I suspect this is some
> kind of problem with Areca's kernel driver or firmware.
> 
> On the bright side, the Samsung+Areca configuration offers excellent
> service time distribution, comparable to that achieved by the ioDrive.
> Using the motherboard's SATA ports gave service times comparable to the
> disk RAID.
> 
> The performance is respectable for a $400 device.  You get about half
> the tps and half the capacity of the ioDrive, but for one fifth the
> price and in the much more convenient SATA form factor.
> 
> Your faithful investigator,
> jwb
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Samsung-32GB-SATA-SSD-tested-tp18601508p19282698.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance