[PERFORM] switchover between index and sequential scans
Hi. I have a table with 1.8M rows on a Postgres 8.1.4 server, and I'm executing a query which looks like this:

    select count(*) from header_fields
    where message in (select message from mailbox_messages limit N);

I've found that when N=75, the query uses a fast index scan, but when N=100, it switches to a seqscan instead. Here are the plans, first the fast query (which retrieves 1306 rows):

> explain analyse select count(*) from header_fields where message in
> (select message from mailbox_messages limit 75);

 Aggregate  (cost=84873.57..84873.58 rows=1 width=0) (actual time=940.513..940.516 rows=1 loops=1)
   ->  Nested Loop  (cost=2.25..84812.59 rows=24391 width=0) (actual time=53.235..935.743 rows=1306 loops=1)
         ->  HashAggregate  (cost=2.25..3.00 rows=75 width=4) (actual time=1.351..1.969 rows=75 loops=1)
               ->  Limit  (cost=0.00..1.31 rows=75 width=4) (actual time=0.096..0.929 rows=75 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.087..0.513 rows=75 loops=1)
         ->  Index Scan using header_fields_message_key on header_fields  (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 loops=75)
               Index Cond: (header_fields.message = "outer".message)
 Total runtime: 942.535 ms

And the slow query (which fetches 1834 rows):

> explain analyse select count(*) from header_fields where message in
> (select message from mailbox_messages limit 100);

 Aggregate  (cost=95175.20..95175.21 rows=1 width=0) (actual time=36670.432..36670.435 rows=1 loops=1)
   ->  Hash IN Join  (cost=3.00..95093.89 rows=32522 width=0) (actual time=27.620..36662.768 rows=1834 loops=1)
         Hash Cond: ("outer".message = "inner".message)
         ->  Seq Scan on header_fields  (cost=0.00..85706.78 rows=1811778 width=4) (actual time=22.505..29281.553 rows=1812184 loops=1)
         ->  Hash  (cost=2.75..2.75 rows=100 width=4) (actual time=1.708..1.708 rows=100 loops=1)
               ->  Limit  (cost=0.00..1.75 rows=100 width=4) (actual time=0.033..1.182 rows=100 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.023..0.633 rows=100 loops=1)
 Total runtime: 36670.732 ms

(If I set enable_seqscan=off, just to see what happens, then it uses the first plan, and executes much faster.)

I'd like to understand why this happens, although the problem doesn't seem to exist with 8.3. The number of rows retrieved in each case is a tiny fraction of the table size, so what causes the decision to change between 75 and 100?

This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048. Changing the last two doesn't seem to have any effect on the plan.

Thanks.

-- ams

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] switchover between index and sequential scans
"Abhijit Menon-Sen" <[EMAIL PROTECTED]> writes:

> ->  Index Scan using header_fields_message_key on header_fields
>     (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 loops=75)
>       Index Cond: (header_fields.message = "outer".message)
>
> ->  Seq Scan on header_fields  (cost=0.00..85706.78 rows=1811778 width=4)
>     (actual time=22.505..29281.553 rows=1812184 loops=1)

It looks to me like it's overestimating the number of rows in the index scan by 20x, and overestimating the cost of random accesses by about 100%. Combined, it's overestimating the cost of the index scan by about 40x.

> This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has
> shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048.
> Changing the last two doesn't seem to have any effect on the plan.

You could try dramatically increasing effective_cache_size to try to convince it that most of the random accesses are cached. Or you could reach for the bigger hammer and reduce random_page_cost by about half.

Also, if this box is dedicated, you could make use of more than 24MB for shared buffers -- probably something in the region of 64MB-128MB, if your database is large enough to warrant it.

And increase the statistics target on header_fields and re-analyze?

--
 Gregory Stark
 EnterpriseDB          http://www.enterprisedb.com
 Ask me about EnterpriseDB's RemoteDBA services!
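For reference, the suggestions above correspond roughly to the following commands. This is only a sketch: the values are guesses for a 512MB machine and would need testing, and the session-level SETs make it easy to back the changes out.

```sql
-- Convince the planner that random index probes are mostly cached:
SET effective_cache_size = 50000;   -- ~400MB in 8kB pages, up from 25000
SET random_page_cost = 2;           -- default is 4

-- Raise the statistics target on the join column, then re-gather stats:
ALTER TABLE header_fields ALTER COLUMN message SET STATISTICS 100;
ANALYZE header_fields;

-- Re-check the plan:
EXPLAIN ANALYZE
SELECT count(*) FROM header_fields
WHERE message IN (SELECT message FROM mailbox_messages LIMIT 100);
```

If the settings help, they can be made permanent in postgresql.conf.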
[PERFORM] cursor/Fetch mechanisms under postgreSQL
Hi, I have some questions about the cursor/FETCH mechanism in PostgreSQL. Does a cursor only use a sequential-scan approach when FETCH (NEXT) is executed, or does it take an index into account when the index would improve performance? I did an experiment in which I measured the wall-clock time of the same cursor/FETCH loop with and without an index on one field, and I noticed that the response time was the same.

I also compared the plans for the same query (Q1) run directly and encapsulated in a cursor (c1).

For the query on its own:

explain analyze SELECT id_ville, taux_crim, temp_ann FROM ville
WHERE (pop between 5 AND 10) AND (taux_crim between 13 AND 16);

I get the following plan:

 Bitmap Heap Scan on ville  (cost=253.82..1551.48 rows=915 width=12) (actual time=12.863..24.354 rows=1309 loops=1)
   Recheck Cond: ((taux_crim >= 13) AND (taux_crim <= 16))
   Filter: ((pop >= 5) AND (pop <= 10))
   ->  Bitmap Index Scan on taux_crim_idx  (cost=0.00..253.59 rows=1 width=0) (actual time=12.482..12.482 rows=13381 loops=1)
         Index Cond: ((taux_crim >= 13) AND (taux_crim <= 16))
 Total runtime: 27.464 ms

For the same query encapsulated in a cursor:

explain analyze declare c1 cursor for SELECT id_ville, taux_crim, temp_ann FROM ville
WHERE (pop between 5 AND 10) AND (taux_crim between 13 AND 16);

I get another plan:

 Seq Scan on ville  (cost=0.00..3031.00 rows=915 width=12)
   Filter: ((pop >= 5) AND (pop <= 10) AND (taux_crim >= 13) AND (taux_crim <= 16))

If the index can improve performance, why is it not used by the cursor? And how is the threshold calculated at which the planner decides an index is worth using?

REMARK: this experiment was done with a little program written in ECPG.

--
Mr. Amine Mokhtari, PhD Student, IRISA Lab.
Addr: Irisa / Enssat, Technopole Anticipa, 6, rue de Kerampont,
BP 80518-22305 Lannion Cedex
Email: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
Fix: +33 (0)2 96 46 91 00  Mob: +33 (0)6 70 87 58 72  Fax: +33 (0)2 96 37 01 99
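One likely explanation for the plan difference: when planning a query for a cursor, PostgreSQL deliberately biases toward plans with a low startup cost, on the assumption that the client may fetch only part of the result -- which can make it prefer a plain seq scan over a bitmap scan whose startup cost is higher. A minimal sketch of the cursor usage in question (cursors must run inside a transaction):

```sql
BEGIN;
DECLARE c1 CURSOR FOR
    SELECT id_ville, taux_crim, temp_ann
    FROM ville
    WHERE (pop BETWEEN 5 AND 10) AND (taux_crim BETWEEN 13 AND 16);
FETCH 10 FROM c1;   -- planner assumed only a fraction of rows would be fetched
CLOSE c1;
COMMIT;
```

If the whole result is always consumed, running the query directly (or fetching everything at once) lets the planner optimize for total cost instead.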
Re: [PERFORM] switchover between index and sequential scans
Hi Greg.

At 2008-07-03 11:05:46 +0100, [EMAIL PROTECTED] wrote:
>
> And increase the statistics target on header_fields and re-analyze?

Aha! Thanks for the tip. I just changed the default_statistics_target to 100 (from 10) and ANALYSEd header_fields and mailbox_messages, and now it ALWAYS uses the index scan if I specify a LIMIT. That is,

    select count(*) from header_fields
    where message in (select message from mailbox_messages limit N)

always uses the index scan on header_fields_message_key, even when N is equal to the number of rows in mailbox_messages (109410).

 Aggregate  (cost=30779.98..30779.99 rows=1 width=0) (actual time=175040.923..175040.926 rows=1 loops=1)
   ->  Nested Loop  (cost=3279.73..30760.93 rows=7617 width=0) (actual time=2114.426..169137.088 rows=1771029 loops=1)
         ->  HashAggregate  (cost=3279.73..3281.73 rows=200 width=4) (actual time=2076.662..2649.541 rows=109365 loops=1)
               ->  Limit  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.029..1386.128 rows=109410 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.022..744.190 rows=109410 loops=1)
         ->  Index Scan using header_fields_message_key on header_fields  (cost=0.00..136.92 rows=38 width=4) (actual time=0.678..1.416 rows=16 loops=109365)
               Index Cond: (header_fields.message = "outer".message)
 Total runtime: 175041.496 ms

Note the massive _under_estimation in the hash aggregate and the nestloop. If I don't specify a limit, it'll use a seq scan again. Very interesting.

-- ams
Re: [PERFORM] [QUESTION]Concurrent Access
> I want to know if the PostGree has limitations about the concurrent
> access, because a lot of people will access this database at the same
> time.

PostgreSQL has excellent concurrency, provided you use it correctly. But what do you mean by concurrent access?

* Number of open Postgres connections at the same time?
  => Each one of those uses a little bit of RAM (see manual), but idle connections don't use CPU.

* Number of open transactions at the same time (between BEGIN and COMMIT)?
  If your transactions are long and you have many transactions at the same time, you can get lock problems. For instance, transaction A updates row X and transaction B updates the same row X: one will have to wait for the other to commit or rollback, of course. If your transactions last 1 ms there is no problem; if they last 5 minutes, you will suffer.

* Number of queries executing at the same time?
  This is different from the above: each query will eat some CPU and IO resources, and memory too.

* Number of concurrent HTTP connections to your website?
  If you have a website, you will probably use some form of connection pooling, or lighttpd/fastcgi, or a proxy, whatever, so the number of open database connections at the same time won't be that high. Unless you use mod_php without connection pooling, in which case it will suck of course, but that's normal.

* Number of people using your client?
  See the number of idle connections above. Or use a connection pool.

> I want to know about the limitations, like how much memory do i have to use

That depends on what you want to do ;)

> How big could be my database ?

That depends on what you do with it ;) Working-set size is more relevant than total database size. For instance, if your database contains orders from the last 10 years, but only current orders (say, orders from this month) are accessed all the time, with old orders being rarely accessed, you want the last 1-2 months' worth of orders to fit in RAM for fast access (caching), but you don't need RAM to fit your entire database. So, think about working sets, not total sizes.

And there is no limit on the table size (well, there is, but you'll never hit it). People have terabytes in postgres and it seems to work ;)
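The row-level lock wait described above can be sketched as two concurrent sessions. The table and column names here are made up purely for illustration:

```sql
-- Session A:
BEGIN;
UPDATE orders SET status = 'shipped' WHERE id = 42;
-- (not yet committed; session A now holds a row lock on id = 42)

-- Session B, at the same time:
BEGIN;
UPDATE orders SET status = 'cancelled' WHERE id = 42;
-- blocks here until session A commits or rolls back

-- Session A:
COMMIT;
-- session B's UPDATE now proceeds against the committed row
```

The shorter the transactions, the shorter these waits are, which is why 1 ms transactions are painless and 5 minute transactions are not.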
[PERFORM] slow delete
I have a table with 29K rows total and I need to delete about 80K out of it. I have a b-tree index on column cola (varchar(255)) for my where clause to use.

My "select count(*) from test where cola = 'abc'" runs very fast, but my actual "delete from test where cola = 'abc';" takes forever, never can finish, and I haven't figured out why.

In my explain output, what is that "Bitmap Heap Scan on test"? Is it a table scan? Is my index being used?

How does delete work? To delete 80K rows that meet my condition, does Postgres find them all and delete them all together, or one at a time?

By the way, there is a foreign key on another table that references the primary key col0 on table test.

Could someone help me out here?

Thanks a lot,
Jessica

testdb=# select count(*) from test;
 count
--------
 295793     -- total 295,793 rows
(1 row)
Time: 155.079 ms

testdb=# select count(*) from test where cola = 'abc';
 count
-------
 80998      -- need to delete 80,998 rows
(1 row)

testdb=# explain delete from test where cola = 'abc';
                                  QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=2110.49..10491.57 rows=79766 width=6)
   Recheck Cond: ((cola)::text = 'abc'::text)
   ->  Bitmap Index Scan on test_cola_idx  (cost=0.00..2090.55 rows=79766 width=0)
         Index Cond: ((cola)::text = 'abc'::text)
(4 rows)
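One thing worth checking here (an assumption, since the schema isn't shown): if the referencing column in the other table has no index, every row deleted from test forces a scan of that table to verify the foreign key, which alone can make an 80K-row delete crawl. A hypothetical fix, with made-up names:

```sql
-- Hypothetical names: suppose orders.test_id REFERENCES test (col0).
-- Without an index on orders.test_id, each of the ~80K deleted rows
-- triggers a sequential scan of orders during FK checking.
CREATE INDEX orders_test_id_idx ON orders (test_id);
```

EXPLAIN on the DELETE won't show this cost, because the FK checks happen in triggers, not in the plan.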
[PERFORM] Define all IP's in the world in pg_hba.conf
Hi everybody,

I know that this group deals with performance, but it is the only one I'm subscribed to, so my apologies in advance for the question.

I want to allow everybody in the world, all IPs, to connect to my server. How do I accomplish that? Definitely, it's not a good solution to enter them all manually in pg_hba.conf :).

Currently, if the above question cannot be answered, I want to allow the IPs of the Hamachi network, which are all of the form 5.*.*.* - but in the future it could expand to all IPs.

Thank you,
Danny
Re: [PERFORM] slow delete
Jessica Richard wrote:
> I have a table with 29K rows total and I need to delete about 80K out of it.

I assume you meant 290K or something.

> I have a b-tree index on column cola (varchar(255)) for my where clause to
> use. my "select count(*) from test where cola = 'abc'" runs very fast, but
> my actual "delete from test where cola = 'abc';" takes forever, never can
> finish and I haven't figured why

When you delete, the database server must:

- check all foreign keys referencing the data being deleted;
- update all indexes on the data being deleted;
- and actually flag the tuples as deleted by your transaction.

All of which takes time. It's a much slower operation than a query that just has to find out how many tuples match the search criteria, like your SELECT does.

How many indexes do you have on the table you're deleting from? How many foreign key constraints are there to the table you're deleting from?

If you find that it just takes too long, you could drop the indexes and foreign key constraints, do the delete, then recreate the indexes and foreign key constraints. This can sometimes be faster, depending on just what proportion of the table must be deleted.

Additionally, remember to VACUUM ANALYZE the table after that sort of big change. AFAIK you shouldn't really have to if autovacuum is doing its job, but it's not a bad idea anyway.

-- Craig Ringer
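The drop/delete/recreate approach described above might look like the following. This is only a sketch with made-up constraint and column names -- adjust to the real schema -- and note that VACUUM cannot run inside a transaction block, so it goes after the COMMIT:

```sql
BEGIN;
ALTER TABLE orders DROP CONSTRAINT orders_test_fk;  -- hypothetical FK name
DROP INDEX test_cola_idx;

DELETE FROM test WHERE cola = 'abc';

CREATE INDEX test_cola_idx ON test (cola);
ALTER TABLE orders
    ADD CONSTRAINT orders_test_fk FOREIGN KEY (test_id) REFERENCES test (col0);
COMMIT;

VACUUM ANALYZE test;
```

Doing it all in one transaction means the table is never visible without its index and constraint, but it also means other sessions touching these tables will block until the COMMIT.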
Re: [PERFORM] Define all IP's in the world in pg_hba.conf
idc danny wrote:
> I want to allow everybody in the world, all IP's, to connect to my server.
> How do I accomplish that? Definitely, it's not a good solution to enter all
> them manually in pg_hba.conf :).

What's wrong with 0.0.0.0/0 ?

> Currently, if above question cannot be answered, I want to achieve to allow
> the IP's of Hamachi network, which all are of the form 5.*.*.* - but in the
> future it can expand to all IP's.
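For completeness, the corresponding pg_hba.conf entries might look like this (use a real authentication method such as md5, never trust, on a world-reachable server, and make sure listen_addresses in postgresql.conf accepts remote connections):

```
# TYPE  DATABASE  USER  ADDRESS        METHOD
host    all       all   0.0.0.0/0      md5

# Or, for just the Hamachi 5.*.*.* range:
host    all       all   5.0.0.0/8      md5
```

Reload the server (pg_ctl reload) for pg_hba.conf changes to take effect.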