[PERFORM] switchover between index and sequential scans
Hi. I have a table with 1.8M rows on a Postgres 8.1.4 server, and I'm executing a query which looks like this:

    select count(*) from header_fields
    where message in (select message from mailbox_messages limit N);

I've found that when N=75, the query uses a fast index scan, but when N=100, it switches to a seqscan instead. Here are the plans, first the fast query (which retrieves 1306 rows):

> explain analyse select count(*) from header_fields where message in
> (select message from mailbox_messages limit 75);

 Aggregate  (cost=84873.57..84873.58 rows=1 width=0) (actual time=940.513..940.516 rows=1 loops=1)
   ->  Nested Loop  (cost=2.25..84812.59 rows=24391 width=0) (actual time=53.235..935.743 rows=1306 loops=1)
         ->  HashAggregate  (cost=2.25..3.00 rows=75 width=4) (actual time=1.351..1.969 rows=75 loops=1)
               ->  Limit  (cost=0.00..1.31 rows=75 width=4) (actual time=0.096..0.929 rows=75 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.087..0.513 rows=75 loops=1)
         ->  Index Scan using header_fields_message_key on header_fields  (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 loops=75)
               Index Cond: (header_fields.message = "outer".message)
 Total runtime: 942.535 ms

And the slow query (which fetches 1834 rows):

> explain analyse select count(*) from header_fields where message in
> (select message from mailbox_messages limit 100);

 Aggregate  (cost=95175.20..95175.21 rows=1 width=0) (actual time=36670.432..36670.435 rows=1 loops=1)
   ->  Hash IN Join  (cost=3.00..95093.89 rows=32522 width=0) (actual time=27.620..36662.768 rows=1834 loops=1)
         Hash Cond: ("outer".message = "inner".message)
         ->  Seq Scan on header_fields  (cost=0.00..85706.78 rows=1811778 width=4) (actual time=22.505..29281.553 rows=1812184 loops=1)
         ->  Hash  (cost=2.75..2.75 rows=100 width=4) (actual time=1.708..1.708 rows=100 loops=1)
               ->  Limit  (cost=0.00..1.75 rows=100 width=4) (actual time=0.033..1.182 rows=100 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.023..0.633 rows=100 loops=1)
 Total runtime: 36670.732 ms

(If I set enable_seqscan=off, just to see what happens, then it uses the first plan, and executes much faster.)

I'd like to understand why this happens, although the problem doesn't seem to exist with 8.3. The number of rows retrieved in each case is a tiny fraction of the table size, so what causes the decision to change between 75 and 100?

This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048. Changing the last two doesn't seem to have any effect on the plan.

Thanks.

-- ams

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] switchover between index and sequential scans
"Abhijit Menon-Sen" <[EMAIL PROTECTED]> writes:

> ->  Index Scan using header_fields_message_key on header_fields
>     (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 loops=75)
>       Index Cond: (header_fields.message = "outer".message)
>
> ->  Seq Scan on header_fields  (cost=0.00..85706.78 rows=1811778 width=4)
>     (actual time=22.505..29281.553 rows=1812184 loops=1)

It looks to me like it's overestimating the number of rows in the index scan by 20x, and overestimating the cost of random accesses by about 100%. Combined, it's overestimating the cost of the index scan by about 40x.

> This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has
> shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048.
> Changing the last two doesn't seem to have any effect on the plan.

You could try dramatically increasing effective_cache_size to try to convince it that most of the random accesses are cached. Or you could reach for the bigger hammer and reduce random_page_cost by about half.

Also, if this box is dedicated, you could make use of more than 24MB for shared buffers -- probably something in the region of 64MB-128MB, if your database is large enough to warrant it.

And increase the statistics target on header_fields and re-analyze?

--
 Gregory Stark
 EnterpriseDB          http://www.enterprisedb.com
 Ask me about EnterpriseDB's RemoteDBA services!
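For reference, the suggestions above correspond roughly to the following commands. This is only a sketch: the values are guesses for a 512MB machine and would need testing, and the session-level SETs make it easy to back the changes out.

```sql
-- Convince the planner that random index probes are mostly cached:
SET effective_cache_size = 50000;   -- ~400MB in 8kB pages, up from 25000
SET random_page_cost = 2;           -- default is 4

-- Raise the statistics target on the join column, then re-gather stats:
ALTER TABLE header_fields ALTER COLUMN message SET STATISTICS 100;
ANALYZE header_fields;

-- Re-check the plan:
EXPLAIN ANALYZE
SELECT count(*) FROM header_fields
WHERE message IN (SELECT message FROM mailbox_messages LIMIT 100);
```

If the settings help, they can be made permanent in postgresql.conf.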
[PERFORM] cursor/Fetch mechanisms under postgreSQL
Hi, I have some questions about the cursor/FETCH mechanism in PostgreSQL. Does a cursor only use a sequential-scan approach when FETCH (NEXT) is executed, or does it take an index into account when the index would improve performance? I did an experiment in which I measured the wall-clock time of the same cursor/FETCH loop with and without an index on one field, and I noticed that the response time was the same.

I also compared the plans for the same query (Q1) run directly and encapsulated in a cursor (c1).

For the query on its own:

explain analyze SELECT id_ville, taux_crim, temp_ann FROM ville
WHERE (pop between 5 AND 10) AND (taux_crim between 13 AND 16);

I get the following plan:

 Bitmap Heap Scan on ville  (cost=253.82..1551.48 rows=915 width=12) (actual time=12.863..24.354 rows=1309 loops=1)
   Recheck Cond: ((taux_crim >= 13) AND (taux_crim <= 16))
   Filter: ((pop >= 5) AND (pop <= 10))
   ->  Bitmap Index Scan on taux_crim_idx  (cost=0.00..253.59 rows=1 width=0) (actual time=12.482..12.482 rows=13381 loops=1)
         Index Cond: ((taux_crim >= 13) AND (taux_crim <= 16))
 Total runtime: 27.464 ms

For the same query encapsulated in a cursor:

explain analyze declare c1 cursor for SELECT id_ville, taux_crim, temp_ann FROM ville
WHERE (pop between 5 AND 10) AND (taux_crim between 13 AND 16);

I get another plan:

 Seq Scan on ville  (cost=0.00..3031.00 rows=915 width=12)
   Filter: ((pop >= 5) AND (pop <= 10) AND (taux_crim >= 13) AND (taux_crim <= 16))

If the index can improve performance, why is it not used by the cursor? And how is the threshold calculated at which the planner decides an index is worth using?

REMARK: this experiment was done with a little program written in ECPG.

--
Mr. Amine Mokhtari, PhD Student, IRISA Lab.
Addr: Irisa / Enssat, Technopole Anticipa, 6, rue de Kerampont,
BP 80518-22305 Lannion Cedex
Email: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
Fix: +33 (0)2 96 46 91 00  Mob: +33 (0)6 70 87 58 72  Fax: +33 (0)2 96 37 01 99
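One likely explanation for the plan difference: when planning a query for a cursor, PostgreSQL deliberately biases toward plans with a low startup cost, on the assumption that the client may fetch only part of the result -- which can make it prefer a plain seq scan over a bitmap scan whose startup cost is higher. A minimal sketch of the cursor usage in question (cursors must run inside a transaction):

```sql
BEGIN;
DECLARE c1 CURSOR FOR
    SELECT id_ville, taux_crim, temp_ann
    FROM ville
    WHERE (pop BETWEEN 5 AND 10) AND (taux_crim BETWEEN 13 AND 16);
FETCH 10 FROM c1;   -- planner assumed only a fraction of rows would be fetched
CLOSE c1;
COMMIT;
```

If the whole result is always consumed, running the query directly (or fetching everything at once) lets the planner optimize for total cost instead.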
Re: [PERFORM] switchover between index and sequential scans
Hi Greg.

At 2008-07-03 11:05:46 +0100, [EMAIL PROTECTED] wrote:
>
> And increase the statistics target on header_fields and re-analyze?

Aha! Thanks for the tip. I just changed the default_statistics_target to 100 (from 10) and ANALYSEd header_fields and mailbox_messages, and now it ALWAYS uses the index scan if I specify a LIMIT. That is,

    select count(*) from header_fields
    where message in (select message from mailbox_messages limit N)

always uses the index scan on header_fields_message_key, even when N is equal to the number of rows in mailbox_messages (109410).

 Aggregate  (cost=30779.98..30779.99 rows=1 width=0) (actual time=175040.923..175040.926 rows=1 loops=1)
   ->  Nested Loop  (cost=3279.73..30760.93 rows=7617 width=0) (actual time=2114.426..169137.088 rows=1771029 loops=1)
         ->  HashAggregate  (cost=3279.73..3281.73 rows=200 width=4) (actual time=2076.662..2649.541 rows=109365 loops=1)
               ->  Limit  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.029..1386.128 rows=109410 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.022..744.190 rows=109410 loops=1)
         ->  Index Scan using header_fields_message_key on header_fields  (cost=0.00..136.92 rows=38 width=4) (actual time=0.678..1.416 rows=16 loops=109365)
               Index Cond: (header_fields.message = "outer".message)
 Total runtime: 175041.496 ms

Note the massive _under_estimation in the hash aggregate and the nestloop. If I don't specify a limit, it'll use a seq scan again. Very interesting.

-- ams
Re: [PERFORM] [QUESTION]Concurrent Access
> I want to know if the PostGree has limitations about the concurrent
> access, because a lot of people will access this database at the same
> time.

PostgreSQL has excellent concurrency, provided you use it correctly. But what do you mean by concurrent access?

* Number of open Postgres connections at the same time?
  => Each one of those uses a little bit of RAM (see manual), but idle connections don't use CPU.

* Number of open transactions at the same time (between BEGIN and COMMIT)?
  If your transactions are long and you have many transactions at the same time, you can get lock problems. For instance, transaction A updates row X and transaction B updates the same row X: one will have to wait for the other to commit or rollback, of course. If your transactions last 1 ms there is no problem; if they last 5 minutes, you will suffer.

* Number of queries executing at the same time?
  This is different from the above: each query will eat some CPU and IO resources, and memory too.

* Number of concurrent HTTP connections to your website?
  If you have a website, you will probably use some form of connection pooling, or lighttpd/fastcgi, or a proxy, whatever, so the number of open database connections at the same time won't be that high. Unless you use mod_php without connection pooling, in which case it will suck of course, but that's normal.

* Number of people using your client?
  See the number of idle connections above. Or use a connection pool.

> I want to know about the limitations, like how much memory do i have to use

That depends on what you want to do ;)

> How big could be my database ?

That depends on what you do with it ;) Working-set size is more relevant than total database size. For instance, if your database contains orders from the last 10 years, but only current orders (say, orders from this month) are accessed all the time, with old orders being rarely accessed, you want the last 1-2 months' worth of orders to fit in RAM for fast access (caching), but you don't need RAM to fit your entire database. So, think about working sets, not total sizes.

And there is no limit on the table size (well, there is, but you'll never hit it). People have terabytes in postgres and it seems to work ;)
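The row-level lock wait described above can be sketched as two concurrent sessions. The table and column names here are made up purely for illustration:

```sql
-- Session A:
BEGIN;
UPDATE orders SET status = 'shipped' WHERE id = 42;
-- (not yet committed; session A now holds a row lock on id = 42)

-- Session B, at the same time:
BEGIN;
UPDATE orders SET status = 'cancelled' WHERE id = 42;
-- blocks here until session A commits or rolls back

-- Session A:
COMMIT;
-- session B's UPDATE now proceeds against the committed row
```

The shorter the transactions, the shorter these waits are, which is why 1 ms transactions are painless and 5 minute transactions are not.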
[PERFORM] slow delete
I have a table with 29K rows total and I need to delete about 80K out of it. I have a b-tree index on column cola (varchar(255)) for my where clause to use.

My "select count(*) from test where cola = 'abc'" runs very fast, but my actual "delete from test where cola = 'abc';" takes forever, never can finish, and I haven't figured out why.

In my explain output, what is that "Bitmap Heap Scan on test"? Is it a table scan? Is my index being used?

How does delete work? To delete 80K rows that meet my condition, does Postgres find them all and delete them all together, or one at a time?

By the way, there is a foreign key on another table that references the primary key col0 on table test.

Could someone help me out here?

Thanks a lot,
Jessica

testdb=# select count(*) from test;
 count
--------
 295793     -- total 295,793 rows
(1 row)
Time: 155.079 ms

testdb=# select count(*) from test where cola = 'abc';
 count
-------
 80998      -- need to delete 80,998 rows
(1 row)

testdb=# explain delete from test where cola = 'abc';
                                  QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=2110.49..10491.57 rows=79766 width=6)
   Recheck Cond: ((cola)::text = 'abc'::text)
   ->  Bitmap Index Scan on test_cola_idx  (cost=0.00..2090.55 rows=79766 width=0)
         Index Cond: ((cola)::text = 'abc'::text)
(4 rows)
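One thing worth checking here (an assumption, since the schema isn't shown): if the referencing column in the other table has no index, every row deleted from test forces a scan of that table to verify the foreign key, which alone can make an 80K-row delete crawl. A hypothetical fix, with made-up names:

```sql
-- Hypothetical names: suppose orders.test_id REFERENCES test (col0).
-- Without an index on orders.test_id, each of the ~80K deleted rows
-- triggers a sequential scan of orders during FK checking.
CREATE INDEX orders_test_id_idx ON orders (test_id);
```

EXPLAIN on the DELETE won't show this cost, because the FK checks happen in triggers, not in the plan.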
[PERFORM] Define all IP's in the world in pg_hba.conf
Hi everybody,

I know that this group deals with performance, but it is the only one I'm subscribed to, so my apologies in advance for the question.

I want to allow everybody in the world, all IPs, to connect to my server. How do I accomplish that? Definitely, it's not a good solution to enter them all manually in pg_hba.conf :).

Currently, if the above question cannot be answered, I want to allow the IPs of the Hamachi network, which are all of the form 5.*.*.* - but in the future it could expand to all IPs.

Thank you,
Danny
Re: [PERFORM] slow delete
Jessica Richard wrote:
> I have a table with 29K rows total and I need to delete about 80K out of it.

I assume you meant 290K or something.

> I have a b-tree index on column cola (varchar(255)) for my where clause to
> use. my "select count(*) from test where cola = 'abc'" runs very fast, but
> my actual "delete from test where cola = 'abc';" takes forever, never can
> finish and I haven't figured why

When you delete, the database server must:

- check all foreign keys referencing the data being deleted;
- update all indexes on the data being deleted;
- and actually flag the tuples as deleted by your transaction.

All of which takes time. It's a much slower operation than a query that just has to find out how many tuples match the search criteria, like your SELECT does.

How many indexes do you have on the table you're deleting from? How many foreign key constraints are there to the table you're deleting from?

If you find that it just takes too long, you could drop the indexes and foreign key constraints, do the delete, then recreate the indexes and foreign key constraints. This can sometimes be faster, depending on just what proportion of the table must be deleted.

Additionally, remember to VACUUM ANALYZE the table after that sort of big change. AFAIK you shouldn't really have to if autovacuum is doing its job, but it's not a bad idea anyway.

-- Craig Ringer
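The drop/delete/recreate approach described above might look like the following. This is only a sketch with made-up constraint and column names -- adjust to the real schema -- and note that VACUUM cannot run inside a transaction block, so it goes after the COMMIT:

```sql
BEGIN;
ALTER TABLE orders DROP CONSTRAINT orders_test_fk;  -- hypothetical FK name
DROP INDEX test_cola_idx;

DELETE FROM test WHERE cola = 'abc';

CREATE INDEX test_cola_idx ON test (cola);
ALTER TABLE orders
    ADD CONSTRAINT orders_test_fk FOREIGN KEY (test_id) REFERENCES test (col0);
COMMIT;

VACUUM ANALYZE test;
```

Doing it all in one transaction means the table is never visible without its index and constraint, but it also means other sessions touching these tables will block until the COMMIT.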
Re: [PERFORM] Define all IP's in the world in pg_hba.conf
idc danny wrote:
> I want to allow everybody in the world, all IP's, to connect to my server.
> How do I accomplish that? Definitely, it's not a good solution to enter all
> them manually in pg_hba.conf :).

What's wrong with 0.0.0.0/0 ?

> Currently, if above question cannot be answered, I want to achieve to allow
> the IP's of Hamachi network, which all are of the form 5.*.*.* - but in the
> future it can expand to all IP's.
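For completeness, the corresponding pg_hba.conf entries might look like this (use a real authentication method such as md5, never trust, on a world-reachable server, and make sure listen_addresses in postgresql.conf accepts remote connections):

```
# TYPE  DATABASE  USER  ADDRESS        METHOD
host    all       all   0.0.0.0/0      md5

# Or, for just the Hamachi 5.*.*.* range:
host    all       all   5.0.0.0/8      md5
```

Reload the server (pg_ctl reload) for pg_hba.conf changes to take effect.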