[PERFORM] performance discrepancy indexing one column versus two columns
All,

In the psql output below, I would expect the second query to run faster, because the b-tree index on two columns knows the values of 'b' for any given value of 'a', and hence does not need to fetch a row from the actual table. I am not seeing a speed-up, however, so I expect my understanding of the index mechanism is wrong. Could anyone enlighten me?

Specifically, I would expect the first query to walk the b-tree looking for values of 'a' equal to 1, 11, etc., and then dereference over to the main table to fetch the value for column 'b'. But I would expect the second query to walk the b-tree looking for values of 'a' equal to 1, 11, etc., and then find the value of 'b' on that same page in the b-tree, thereby avoiding the dereference and the extra page fetch.

Is the problem that the two-column b-tree contains more data, is spread across more disk pages, and is hence slower to access, canceling out the performance gain of not having to fetch from the main table? Or is the query system not using the second column's information from the index, and doing the table fetch anyway? Or does the index store the entire row from the main table regardless of the column being indexed?

I am running PostgreSQL 8.0.3 on a Pentium 4 with IDE hard drives and the default configuration file settings.

Thanks in advance,

mark

mark=# create sequence test_id_seq;
CREATE SEQUENCE
mark=# create table test (a integer not null default nextval('test_id_seq'), b integer not null);
CREATE TABLE
mark=# create function testfunc () returns void as $$
mark$# declare
mark$#     i integer;
mark$# begin
mark$#     for i in 1..100 loop
mark$#         insert into test (b) values (i);
mark$#     end loop;
mark$#     return;
mark$# end;
mark$# $$ language plpgsql;
CREATE FUNCTION
mark=# select * from testfunc();
 testfunc
----------

(1 row)

mark=# select count(*) from test;
 count
-------
   100
(1 row)

mark=# create index test_single_idx on test(a);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 11, 21, 31, 41, 51, 61, 71, 81, 91);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx, test_single_idx on test  (cost=0.00..30.36 rows=10 width=4) (actual time=0.145..0.917 rows=10 loops=1)
   Index Cond: ((a = 1) OR (a = 11) OR (a = 21) OR (a = 31) OR (a = 41) OR (a = 51) OR (a = 61) OR (a = 71) OR (a = 81) OR (a = 91))
 Total runtime: 1.074 ms
(3 rows)

mark=# drop index test_single_idx;
DROP INDEX
mark=# create index test_double_idx on test(a,b);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 11, 21, 31, 41, 51, 61, 71, 81, 91);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx, test_double_idx on test  (cost=0.00..43.48 rows=10 width=4) (actual time=0.283..1.119 rows=10 loops=1)
   Index Cond: ((a = 1) OR (a = 11) OR (a = 21) OR (a = 31) OR (a = 41) OR (a = 51) OR (a = 61) OR (a = 71) OR (a = 81) OR (a = 91))
 Total runtime: 1.259 ms
(3 rows)
Re: [PERFORM] performance discrepancy indexing one column versus two columns
On Sun, 11 Sep 2005, Mark Dilger wrote:

> All,
>
> In the psql output below, I would expect the second query to run faster,
> because the b-tree index on two columns knows the values of 'b' for any
> given value of 'a', and hence does not need to fetch a row from the
> actual table. I am not seeing a speed-up, however, so I expect my
> understanding of the index mechanism is wrong. Could anyone enlighten
> me?

A common but incorrect assumption. We must consult the underlying table when we do an index scan so that we can check visibility information. The reason it is stored in the table is so that we have only one place to check for tuple visibility, and therefore avoid race conditions.

A brief explanation of this system can be found here:

http://www.postgresql.org/docs/8.0/static/mvcc.html

and this page shows what information we store in the tuple header to do visibility checks:

http://www.postgresql.org/docs/8.0/static/storage-page-layout.html

Thanks,

Gavin
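As an aside, the per-tuple visibility information Gavin refers to can be seen through PostgreSQL's system columns. The query below is only illustrative, run against the test table from the original post; ctid, xmin and xmax are system columns stored with every heap tuple:

-- The visibility metadata lives with the heap tuple, not in the index,
-- which is why the index scan still has to visit the table:
SELECT ctid, xmin, xmax, a, b
  FROM test
 WHERE a IN (1, 11, 21);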
[PERFORM] Index not used with prepared statement
Hi.

I have a performance problem with prepared statements (JDBC prepared statements).

This query:

PreparedStatement st = conn.prepareStatement("SELECT id FROM dga_dienstleister WHERE plz like '45257'");

does use an index.

This query:

String plz = "45257";
PreparedStatement st = conn.prepareStatement("SELECT id FROM dga_dienstleister WHERE plz like ?");
st.setString(1, plz);

does NOT use an index.

As it should in the end result in absolutely the same statement, the index should be used all the time. I have to set protocolVersion=2 and use the JDBC2 driver to get it working (but then the statements are created like in the first query, so that is no solution, only a workaround).

I'm not sure whether this is a bug (I think it is) or a problem of understanding. Known problem?

I have tried PG 8.0.1, 8.0.3 and 8.1beta with these JDBC drivers:

- postgresql-8.0-312.jdbc2.jar --> okay with protocolVersion=2 in the URL
- postgresql-8.0-312.jdbc3.jar --> not okay whatever I do

I'm on Mac OS X, if that matters.

cug
Re: [PERFORM] Index not used with prepared statement
Guido Neitzer wrote:

> I have a performance problem with prepared statements (JDBC prepared
> statements).
>
> This query:
>
> PreparedStatement st = conn.prepareStatement("SELECT id FROM
> dga_dienstleister WHERE plz like '45257'");
>
> does use an index.
>
> This query:
>
> String plz = "45257";
> PreparedStatement st = conn.prepareStatement("SELECT id FROM
> dga_dienstleister WHERE plz like ?");
> st.setString(1, plz);
>
> does NOT use an index.
>
> As it should in the end result in absolutely the same statement, the
> index should be used all the time.

I'm not perfectly sure, but since the index could only be used with a subset of all possible parameters (the pattern for LIKE has to be left-anchored), I could imagine the planner has to avoid the index in order to produce a universal plan (the thing behind a prepared statement).

Is there a reason you are using the LIKE operator at all? IMO using the = operator instead in your example should produce an "index-using prepared statement".

HTH
Andreas
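For what it's worth, the same behaviour can be reproduced outside JDBC with a server-side prepared statement. The sketch below is only illustrative, reusing the table and column from Guido's post and assuming a btree index on plz:

PREPARE find_by_plz(text) AS
    SELECT id FROM dga_dienstleister WHERE plz LIKE $1;

-- The prepared query is planned once for any possible parameter value,
-- so the left-anchored-LIKE index optimization cannot be applied:
EXPLAIN ANALYZE EXECUTE find_by_plz('45257');

-- With the literal in the query text, the planner can see that the
-- pattern is left-anchored and may choose the index:
EXPLAIN ANALYZE SELECT id FROM dga_dienstleister WHERE plz LIKE '45257';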
Re: [PERFORM] Index not used with prepared statement
On 11.09.2005, at 11:03, Andreas Seltenreich wrote:

> I'm not perfectly sure, but since the index could only be used with a
> subset of all possible parameters (the pattern for LIKE has to be
> left-anchored), I could imagine the planner has to avoid the index in
> order to produce a universal plan (the thing behind a prepared
> statement).

Hmm. Now I get it. So I have to make sure that my framework doesn't produce a PreparedStatement and instead builds a complete statement string. Weird.

> Is there a reason you are using the LIKE operator at all? IMO using
> the = operator instead in your example should produce an "index-using
> prepared statement".

Yes, you are right, but then I can't pass anything like '45%' to the query; it will just return nothing. I use "like" because I build the queries on the fly and add a % at the end where necessary. And, to be clear: this is a minimal example; most of my queries are generated by a framework. This was just an example to test the behaviour.

Okay, I had problems understanding prepared statements on the client and the server side. What I thought was that I get a PreparedStatement from JDBC which also inserts the values into the string, and that this is then executed on the server side.

cug
Re: [PERFORM] shared buffers
Martin Nickel wrote:

> Chris,
> Would you say that 30,000 pages is a good maximum for a Postgres install?
> We're running 8.0.3 on 64-bit SUSE on a dual Opteron box with 4G and have
> shared_buffers set at 120,000. I've moved it up and down (it was 160,000
> when I got here) without any measurable performance difference.

From what I've read on the mailing list, the sweet spot is usually around 10k pages; 120k seems far too high. I believe that the major fixes to the buffer manager are in 8.1 rather than 8.0, so you probably are hitting some problems. (The biggest problem was that there were places that required doing a complete scan through shared memory looking for dirty pages, or some such.)

> The reason I ask is because I occasionally see large-ish queries take
> forever (like cancel-after-12-hours forever) and wondered if this could
> result from shared_buffers being too large.

There are lots of possibilities for why these take so long; perhaps you could post them, and we can try to help. For instance, if you have a foreign key reference from one table to another, and don't have indexes on both sides, then deleting from the referenced table will cause a sequential scan on the referring table for *each* deleted row (IIRC).

John
=:->

> Thanks for your (and anyone else's) help!
> Martin Nickel
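A minimal sketch of the foreign-key pitfall John mentions (table and column names here are invented for illustration): the FK constraint by itself does not create an index on the referring column, so without one, every row deleted from the referenced table triggers a sequential scan of the referring table.

CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child  (id integer PRIMARY KEY,
                     parent_id integer NOT NULL REFERENCES parent(id));

-- Without this index, each DELETE FROM parent has to scan child
-- sequentially to verify that no rows still reference the deleted id:
CREATE INDEX child_parent_id_idx ON child (parent_id);

DELETE FROM parent WHERE id = 1;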
[PERFORM] LEFT JOIN optimization
Hi list,

I don't have much experience with Postgres optimization; somehow I was happily avoiding anything more difficult than a simple select statement, and it was working all right. Now a LEFT JOIN must be used, and I am not happy with the performance:

It takes about 5 seconds to run a very simple LEFT JOIN query on a table "user_" with ~13,000 records left joined to a table "church" with ~300 records, on a Powerbook PPC 1.67 GHz with 1.5 GB RAM. Is it normal?

Some details:

test=# explain select * from user_ left join church on user_.church_id = church.id;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Left Join  (cost=6.44..7626.69 rows=12763 width=325)
   Hash Cond: ("outer".church_id = "inner".id)
   ->  Seq Scan on user_  (cost=0.00..7430.63 rows=12763 width=245)
   ->  Hash  (cost=5.75..5.75 rows=275 width=80)
         ->  Seq Scan on church  (cost=0.00..5.75 rows=275 width=80)
(5 rows)

From what I understand, it doesn't use the foreign key index on the user_ table. So I tried:

mydb=# set enable_seqscan='false';
SET
mydb=# explain select * from user_ left join church on user_.church_id = church.id;
                               QUERY PLAN
-------------------------------------------------------------------------
 Merge Right Join  (cost=0.00..44675.77 rows=12763 width=325)
   Merge Cond: ("outer".id = "inner".church_id)
   ->  Index Scan using chirch_pkey on church  (cost=0.00..17.02 rows=275 width=80)
   ->  Index Scan using user__church_id on user_  (cost=0.00..44500.34 rows=12763 width=245)
(4 rows)

It's my first time reading query plans, but from what I understand, it doesn't make the query faster...

Any tips are greatly appreciated.

-- 
Ksenia
Re: [PERFORM] LEFT JOIN optimization
* Ksenia Marasanova ([EMAIL PROTECTED]) wrote:
> Any tips are greatly appreciated.

EXPLAIN ANALYZE of the same queries would be much more useful.

Thanks,

Stephen
Re: [PERFORM] LEFT JOIN optimization
2005/9/12, Stephen Frost <[EMAIL PROTECTED]>:
> * Ksenia Marasanova ([EMAIL PROTECTED]) wrote:
> > Any tips are greatly appreciated.
>
> EXPLAIN ANALYZE of the same queries would be much more useful.

Thanks, here it is:

test=# explain analyze select * from user_ left join church on user_.church_id = church.id;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Left Join  (cost=6.44..7626.69 rows=12763 width=325) (actual time=388.573..2016.929 rows=12763 loops=1)
   Hash Cond: ("outer".church_id = "inner".id)
   ->  Seq Scan on user_  (cost=0.00..7430.63 rows=12763 width=245) (actual time=360.431..1120.012 rows=12763 loops=1)
   ->  Hash  (cost=5.75..5.75 rows=275 width=80) (actual time=27.985..27.985 rows=0 loops=1)
         ->  Seq Scan on church  (cost=0.00..5.75 rows=275 width=80) (actual time=0.124..26.953 rows=275 loops=1)
 Total runtime: 2025.946 ms
(6 rows)

test=# set enable_seqscan='false';
SET
test=# explain analyze select * from user_ left join church on user_.church_id = church.id;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Merge Right Join  (cost=0.00..44675.77 rows=12763 width=325) (actual time=0.808..2119.099 rows=12763 loops=1)
   Merge Cond: ("outer".id = "inner".church_id)
   ->  Index Scan using chirch_pkey on church  (cost=0.00..17.02 rows=275 width=80) (actual time=0.365..5.471 rows=275 loops=1)
   ->  Index Scan using user__church_id on user_  (cost=0.00..44500.34 rows=12763 width=245) (actual time=0.324..1243.348 rows=12763 loops=1)
 Total runtime: 2131.364 ms
(5 rows)

I followed some tips on the web and vacuumed the database. I think the query is faster now, almost acceptable, but it would still be interesting to know if it is possible to optimize it...

Thanks again,

-- 
Ksenia
Re: [PERFORM] LEFT JOIN optimization
* Ksenia Marasanova ([EMAIL PROTECTED]) wrote:
> test=# explain analyze select * from user_ left join church on
> user_.church_id = church.id;
> [...]
> Total runtime: 2025.946 ms
> (6 rows)
>
> test=# set enable_seqscan='false';
> SET
> test=# explain analyze select * from user_ left join church on
> user_.church_id = church.id;
> [...]
> Total runtime: 2131.364 ms
> (5 rows)
>
> I followed some tips on the web and vacuumed the database. I think the
> query is faster now, almost acceptable, but it would still be interesting
> to know if it is possible to optimize it...

I have to say that it does seem a bit slow for only 12,000 rows. What are the datatypes of user_.church_id and church.id?

Are you sure you really want all 12,000 rows every time you run that query? Perhaps there's a 'where' clause you could apply, with an associated index, to limit the query to just what you actually need.

You'll notice from the output above, though, that the non-index plan is faster, which is what I'd expect for a query like this: with an open left join you have to go through the entire table anyway, and a sequential scan is the best way to do that. The fact that church.id is hashed makes the plan the planner came up with almost certainly the best one possible.

Are you sure a left join is what you want? It sounds like maybe you've moved (for some reason) from a regular join to a left join with filtering in the application, which is probably a bad move... If you can do at least some of the filtering in the database, I expect that'd help.

Thanks,

Stephen
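To illustrate the kind of filtering Stephen suggests, here is a sketch against the tables from this thread; the WHERE condition and its value are purely hypothetical, the point being that a selective filter plus the existing user__church_id index lets the planner avoid scanning all ~13,000 rows:

SELECT u.*, c.*
  FROM user_ u
  LEFT JOIN church c ON c.id = u.church_id
 WHERE u.church_id = 42;    -- hypothetical filter value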
Re: [PERFORM] Prepared statement not using index
Guido Neitzer wrote:

> Hi.
>
> I have an interesting problem with the JDBC drivers. When I use a
> select like this:
>
> "SELECT t0.aktiv, t0.id, t0.ist_teilnehmer, t0.nachname, t0.plz,
> t0.vorname FROM public.dga_dienstleister t0 WHERE t0.plz like
> ?::varchar(256) ESCAPE '|'" withBindings: 1:"53111"(plz)>
>
> the existing index on the plz column is not used.
>
> When I run the same select with a concrete value, the index IS used.
>
> I use PostgreSQL 8.0.3 on Mac OS X and the JDBC driver 8.0-312 JDBC 3.
>
> After a lot of other things, I tried using a 7.4 driver and with this,
> the index is used in both cases.
>
> Why can this happen? Is there a setting I might have not seen?
> Something I do wrong?
>
> cug

I've had this problem in the past. In my case, the issue was that the column I was searching had a mixed blend of possible values. For example, with 1M rows, the number 3 occurred 100 times, but the number 18 occurred 700,000 times. So when I manually did a search for 3, the planner naturally realized that it could use an index scan, because it had the statistics to say the value was very selective. If I manually did a search for 18, it switched to a sequential scan, because that value was not very selective (both are the correct plans).

But if you create a prepared statement, parameterized on this number, postgres has no way of knowing ahead of time whether you will be asking about 3 or 18, so when the query is prepared it has to be pessimistic and avoid worst-case behavior, so it chooses to always use a sequential scan.

The only way I got around this was by writing a plpgsql function which used the EXECUTE syntax to dynamically re-plan part of the query.

Hope this makes sense. This may or may not be your problem, without knowing more about your setup, but the symptoms seem similar.

John
=:->
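For reference, a minimal sketch of the plpgsql EXECUTE workaround John describes, reusing the table and column from Guido's post; the function name and signature are made up for illustration:

CREATE OR REPLACE FUNCTION ids_for_plz(p_plz text) RETURNS SETOF integer AS $$
DECLARE
    r record;
BEGIN
    -- Because the query string is built at call time, the planner sees
    -- the literal value and can pick an index scan when it is selective.
    FOR r IN EXECUTE 'SELECT id FROM dga_dienstleister WHERE plz = '
                     || quote_literal(p_plz)
    LOOP
        RETURN NEXT r.id;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- Usage: SELECT * FROM ids_for_plz('45257');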
Re: [PERFORM] Advise about how to delete entries
Kevin wrote:

> Arnau wrote:
>
>> Hi all,
>>
>>> COPY FROM a file with all the IDs to delete into a temporary
>>> table, and do a joined delete to your main table (thus, only one query).
>>
>> I already did this, but I don't have any idea about how to do this join,
>> could you give me a hint ;-) ?
>>
>> Thank you very much
>
> Maybe something like this:
>
> DELETE FROM statistics_sasme s
>     LEFT JOIN temp_table t ON (s.statistic_id = t.statistic_id)
>     WHERE t.statistic_id IS NOT NULL

Why can't you do:

DELETE FROM statistics_sasme s
    JOIN temp_table t ON (s.statistic_id = t.statistic_id);

Or possibly:

DELETE FROM statistics_sasme s
    WHERE s.statistic_id IN (SELECT t.statistic_id FROM temp_table t);

I'm not sure exactly how DELETE works with joins, but the IN form should be approximately correct.

John
=:->
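For reference, PostgreSQL's DELETE does not accept a JOIN against the target table, so the JOIN-style statements above will not parse; the IN form is fine. A sketch of two variants that should work against the same tables: the USING form in newer releases (8.1 and later, if I recall correctly), and an EXISTS subquery that also works on 8.0:

-- Joined delete using the USING clause (8.1+):
DELETE FROM statistics_sasme
 USING temp_table t
 WHERE statistics_sasme.statistic_id = t.statistic_id;

-- Equivalent subquery form:
DELETE FROM statistics_sasme
 WHERE EXISTS (SELECT 1
                 FROM temp_table t
                WHERE t.statistic_id = statistics_sasme.statistic_id);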