[PERFORM] performance discrepancy indexing one column versus two columns

2005-09-11 Thread Mark Dilger

All,

In the psql output below, I would expect the second query to run faster, because 
the b-tree index on two columns knows the values of 'b' for any given value of 
'a', and hence does not need to fetch a row from the actual table.  I am not 
seeing a speed-up, however, so I expect my understanding of the index mechanism 
is wrong.  Could anyone enlighten me?


Specifically, I would expect the first query to walk the b-tree looking for 
values of 'a' equal to 1, 11, etc., and then a dereference over to the main 
table to fetch the value for column 'b'.


But I would expect the second query to walk the b-tree looking for values of 'a' 
equal to 1, 11, etc., and then find on that same page in the b-tree the 
value of 'b', thereby avoiding the dereference and extra page fetch.


Is the problem that the two-column b-tree contains more data, is spread across 
more disk pages, and is hence slower to access, canceling out the performance 
gain of not having to fetch from the main table?  Or is the query system not 
using the second column information from the index and doing the table fetch 
anyway?  Or does the index store the entire row from the main table regardless 
of the column being indexed?


I am running postgresql 8.0.3 on a Pentium 4 with ide hard drives and the 
default configuration file settings.


Thanks in advance,

mark



mark=# create sequence test_id_seq;
CREATE SEQUENCE
mark=# create table test (a integer not null default nextval('test_id_seq'), b 
integer not null);

CREATE TABLE
mark=# create function testfunc () returns void as $$
mark$# declare
mark$# i integer;
mark$# begin
mark$# for i in 1..100 loop
mark$# insert into test (b) values (i);
mark$# end loop;
mark$# return;
mark$# end;
mark$# $$ language plpgsql;
CREATE FUNCTION
mark=# select * from testfunc();
 testfunc
--

(1 row)

mark=# select count(*) from test;
  count
-
 100
(1 row)

mark=# create index test_single_idx on test(a);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 11, 21, 31, 
41, 51, 61, 71, 81, 91);


 QUERY PLAN
--
 Index Scan using test_single_idx, test_single_idx, test_single_idx, 
test_single_idx, test_single_idx, test_single_idx, test_single_idx, 
test_single_idx, test_single_idx, test_single_idx on test  (cost=0.00..30.36 
rows=10 width=4) (actual time=0.145..0.917 rows=10 loops=1)
   Index Cond: ((a = 1) OR (a = 11) OR (a = 21) OR (a = 31) OR (a = 
41) OR (a = 51) OR (a = 61) OR (a = 71) OR (a = 81) OR(a = 
91))

 Total runtime: 1.074 ms
(3 rows)

mark=# drop index test_single_idx;
DROP INDEX
mark=# create index test_double_idx on test(a,b);
CREATE INDEX
mark=# vacuum full;
VACUUM
mark=# analyze;
ANALYZE
mark=# explain analyze select b from test where a in (1, 11, 21, 31, 
41, 51, 61, 71, 81, 91);


 QUERY PLAN
--
 Index Scan using test_double_idx, test_double_idx, test_double_idx, 
test_double_idx, test_double_idx, test_double_idx, test_double_idx, 
test_double_idx, test_double_idx, test_double_idx on test  (cost=0.00..43.48 
rows=10 width=4) (actual time=0.283..1.119 rows=10 loops=1)
   Index Cond: ((a = 1) OR (a = 11) OR (a = 21) OR (a = 31) OR (a = 
41) OR (a = 51) OR (a = 61) OR (a = 71) OR (a = 81) OR(a = 
91))

 Total runtime: 1.259 ms
(3 rows)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] performance discrepancy indexing one column versus

2005-09-11 Thread Gavin Sherry
On Sun, 11 Sep 2005, Mark Dilger wrote:

> All,
>
> In the psql output below, I would expect the second query to run faster,
> because the b-tree index on two columns knows the values of 'b' for any
> given value of 'a', and hence does not need to fetch a row from the
> actual table.  I am not seeing a speed-up, however, so I expect my
> understanding of the index mechanism is wrong.  Could anyone enlighten
> me?

A common but incorrect assumption. We must consult the underlying table
when we do an index scan so that we can check visibility information. The
reason it is stored there in the table is so that we have only one place
to check for tuple visibility and therefore avoid race conditions.

A brief explanation of this system is described here:
http://www.postgresql.org/docs/8.0/static/mvcc.html.

and this page shows what information we store in the to do visibility
checks:

http://www.postgresql.org/docs/8.0/static/storage-page-layout.html

Thanks,

Gavin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Index not used with prepared statement

2005-09-11 Thread Guido Neitzer

Hi.

I have a performance problem with prepared statements (JDBC prepared  
statement).


This query:

PreparedStatement st = conn.prepareStatement("SELECT id FROM  
dga_dienstleister WHERE plz like '45257'");


does use an index.

This query:

String plz = "45257";
PreparedStatement st = conn.prepareStatement("SELECT id FROM  
dga_dienstleister WHERE plz like ?");

st.setString(1, plz);

does NOT use an index.

As it should in the end result in absolutely the same statement, the  
index should be used all the time. I have to set the  
protocolVersion=2 and use the JDBC2 driver to get it working (but  
then the statements are created like in the first query, so no  
solution, only a workaround).


I'm not sure whether this is a bug (I think it is) or a problem of  
understanding.


Known problem?

I have tried PG 8.0.1, 8.0.3, 8.1beta with the JDBC-drivers

- postgresql-8.0-312.jdbc2.jar --> okay with protocolVersion=2 in the  
URL

- postgresql-8.0-312.jdbc3.jar --> not okay whatever I do

I'm on Mac OS X, if that matters.

cug

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Index not used with prepared statement

2005-09-11 Thread Andreas Seltenreich
Guido Neitzer schrob:

> I have a performance problem with prepared statements (JDBC prepared  
> statement).
>
> This query:
>
> PreparedStatement st = conn.prepareStatement("SELECT id FROM  
> dga_dienstleister WHERE plz like '45257'");
>
> does use an index.
>
> This query:
>
> String plz = "45257";
> PreparedStatement st = conn.prepareStatement("SELECT id FROM  
> dga_dienstleister WHERE plz like ?");
> st.setString(1, plz);
>
> does NOT use an index.
>
> As it should in the end result in absolutely the same statement, the  
> index should be used all the time.

I'm not perfectly sure, but since the index could only be used with a
subset of all possible parameters (the pattern for like has to be
left-anchored), I could imagine the planner has to avoid the index in
order to produce an universal plan (the thing behind a prepared
statement).

Is there a reason you are using the like operator at all? IMO using
the =-operator instead in your example should produce an "index-using
prepared statement".

HTH
Andreas
-- 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index not used with prepared statement

2005-09-11 Thread Guido Neitzer

On 11.09.2005, at 11:03 Uhr, Andreas Seltenreich wrote:


I'm not perfectly sure, but since the index could only be used with a
subset of all possible parameters (the pattern for like has to be
left-anchored), I could imagine the planner has to avoid the index in
order to produce an universal plan (the thing behind a prepared
statement).


Hmm. Now I get it. So I have to look that my framework doesn't  
produce a preparedStatement, instead build a complete statement  
string. Weird.



Is there a reason you are using the like operator at all? IMO using
the =-operator instead in your example should produce an "index-using
prepared statement".


Yes, you are right, but then I can't pass anything like '45%' to the  
query. It will just return nothing.


I use the "like" because I build the queries on the fly and add a %  
at the end where necessary.


And, to be clear: this is a minimal example, most of my queries are  
generated by a framework. This was an example to test the behaviour.


Okay, I had problems with the understanding of prepared statements on  
the client and the server side. What I thought was, that I get a  
preparedStatement by JDBC which also inserts the values into the  
string and this is executed on the server side.


cug

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] shared buffers

2005-09-11 Thread John A Meinel
Martin Nickel wrote:
> Chris,
> Would you say that 3 pages is a good maximum for a Postgres install?
> We're running 8.0.3 on 64-bit SUSE on a dual Opteron box with 4G and have
> shared_buffers set at 12.  I've moved it up and down (it was 16
> when I got here) without any measurable performance difference.

What I've read on the mailing list, is that usually the sweet spot is
actually around 10k pages. 120k seems far too high.

I believe that the major fixes to the buffer manager are more in 8.1
rather than 8.0, so you probably are hitting some problems. (The biggest
problem was that there were places that require doing a complete scan
through shared memory looking for dirty pages, or some such).

>
> The reason I ask is because I occasionally see large-ish queries take
> forever (like cancel-after-12-hours forever) and wondered if this could
> result from shared_buffers being too large.

There are lots of possibilities for why these take so long, perhaps you
would want to post them, and we can try to help.
For instance, if you have a foreign key reference from one table to
another, and don't have indexes on both sides, then deleting from the
referenced table, will cause a sequential scan on the referring table
for *each* deleted row. (IIRC).

John
=:->

>
> Thanks for your (and anyone else's) help!
> Martin Nickel


signature.asc
Description: OpenPGP digital signature


[PERFORM] LEFT JOIN optimization

2005-09-11 Thread Ksenia Marasanova
Hi list,

I don't have much experience with Postgres optimization, somehow I was
happily avoiding anything more difficult than simple select statement,
and it was working all right.

Now LEFT JOIN must be used, and I am not happy with the performance:
It takes about 5 seconds to run very simple LEFT JOIN query on a table
"user_" with ~ 13.000 records left joined to table "church" with ~ 300
records on Powerbook PPC 1.67 GHz with 1.5 GB ram.
Is it normal?

Some details:

test=# explain select * from user_ left join church on user_.church_id
= church.id;
 QUERY PLAN  
-
 Hash Left Join  (cost=6.44..7626.69 rows=12763 width=325)
   Hash Cond: ("outer".church_id = "inner".id)
   ->  Seq Scan on user_  (cost=0.00..7430.63 rows=12763 width=245)
   ->  Hash  (cost=5.75..5.75 rows=275 width=80)
 ->  Seq Scan on church  (cost=0.00..5.75 rows=275 width=80)
(5 rows)


From what I understand, it doesn't use foreign key index on user_
table. So I tried:

mydb=# set enable_seqscan='false';
SET
mydb=# explain select * from user_ left join church on user_.church_id
= church.id;
 QUERY PLAN
-
 Merge Right Join  (cost=0.00..44675.77 rows=12763 width=325)
   Merge Cond: ("outer".id = "inner".church_id)
   ->  Index Scan using chirch_pkey on church  (cost=0.00..17.02
rows=275 width=80)
   ->  Index Scan using user__church_id on user_  (cost=0.00..44500.34
rows=12763 width=245)
(4 rows)


It's my first time reading Query plans, but from wat I understand, it
doesn't make the query faster..

Any tips are greatly appreciated.
-- 
Ksenia

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] LEFT JOIN optimization

2005-09-11 Thread Stephen Frost
* Ksenia Marasanova ([EMAIL PROTECTED]) wrote:
> Any tips are greatly appreciated.

EXPLAIN ANALYZE of the same queries would be much more useful.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] LEFT JOIN optimization

2005-09-11 Thread Ksenia Marasanova
2005/9/12, Stephen Frost <[EMAIL PROTECTED]>:
> * Ksenia Marasanova ([EMAIL PROTECTED]) wrote:
> > Any tips are greatly appreciated.
> 
> EXPLAIN ANALYZE of the same queries would be much more useful.

Thanks, here it is:

test=# explain analyze select * from user_ left join church on
user_.church_id = church.id;
  QUERY PLAN
---
 Hash Left Join  (cost=6.44..7626.69 rows=12763 width=325) (actual
time=388.573..2016.929 rows=12763 loops=1)
   Hash Cond: ("outer".church_id = "inner".id)
   ->  Seq Scan on user_  (cost=0.00..7430.63 rows=12763 width=245)
(actual time=360.431..1120.012 rows=12763 loops=1)
   ->  Hash  (cost=5.75..5.75 rows=275 width=80) (actual
time=27.985..27.985 rows=0 loops=1)
 ->  Seq Scan on church  (cost=0.00..5.75 rows=275 width=80)
(actual time=0.124..26.953 rows=275 loops=1)
 Total runtime: 2025.946 ms
(6 rows)

test=# set enable_seqscan='false';
SET
test=# explain analyze select * from user_ left join church on
user_.church_id = church.id;
 
QUERY PLAN
--
 Merge Right Join  (cost=0.00..44675.77 rows=12763 width=325) (actual
time=0.808..2119.099 rows=12763 loops=1)
   Merge Cond: ("outer".id = "inner".church_id)
   ->  Index Scan using chirch_pkey on church  (cost=0.00..17.02
rows=275 width=80) (actual time=0.365..5.471 rows=275 loops=1)
   ->  Index Scan using user__church_id on user_  (cost=0.00..44500.34
rows=12763 width=245) (actual time=0.324..1243.348 rows=12763 loops=1)
 Total runtime: 2131.364 ms
(5 rows)


I followed some tips on the web and vacuum-ed database, I think the
query is  faster now, almost acceptable, but still interesting to know
if it possible to optimize it...

Thanks again,
-- 
Ksenia

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] LEFT JOIN optimization

2005-09-11 Thread Stephen Frost
* Ksenia Marasanova ([EMAIL PROTECTED]) wrote:
> test=# explain analyze select * from user_ left join church on
> user_.church_id = church.id;
[...]
>  Total runtime: 2025.946 ms
> (6 rows)
> 
> test=# set enable_seqscan='false';
> SET
> test=# explain analyze select * from user_ left join church on
> user_.church_id = church.id;
>  
[...]
>  Total runtime: 2131.364 ms
> (5 rows)
> 
> 
> I followed some tips on the web and vacuum-ed database, I think the
> query is  faster now, almost acceptable, but still interesting to know
> if it possible to optimize it...

I have to say that it does seem a bit slow for only 12,000 rows..
What's the datatype of user_.church_id and church.id?  Are you sure you
really want all 12,000 rows every time you run that query?  Perhaps
there's a 'where' clause you could apply with an associated index to
limit the query to just what you actually need?

You'll noticed from above, though, that the non-index scan is faster.
I'd expect that when using a left-join query: you have to go through the
entire table on an open left-join like that, a sequencial scan is going
to be the best way to do that.  The fact that church.id is hashed makes
the solution the planner came up with almost certainly the best one
possible.

Are you sure a left-join is what you want?  Sounds like maybe you've
moved (for some reason) from a regular join to a left join with a
filtering in the application which is probably a bad move...  If you can
use at least some filtering in the database I expect that'd help..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Prepared statement not using index

2005-09-11 Thread John A Meinel
Guido Neitzer wrote:
> Hi.
>
> I have an interesting problem with the JDBC drivers. When I use a
> select like this:
>
> "SELECT t0.aktiv, t0.id, t0.ist_teilnehmer, t0.nachname, t0.plz,
> t0.vorname FROM public.dga_dienstleister t0 WHERE t0.plz  like
> ?::varchar(256) ESCAPE '|'" withBindings: 1:"53111"(plz)>
>
> the existing index on the plz column is not used.
>
> When I the same select with a concrete value, the index IS used.
>
> I use PostgreSQL 8.0.3 on Mac OS X and the JDBC driver 8.0-312 JDBC 3.
>
> After a lot of other things, I tried using a 7.4 driver and with  this,
> the index is used in both cases.
>
> Why can this happen? Is there a setting I might have not seen?
> Something I do wrong?
>
> cug

I've had this problem in the past. In my case, the issue was that the
column I was searching had a mixed blend of possible values. For
example, with 1M rows, the number 3 occurred 100 times, but the number
18 occurred 700,000 times.

So when I manually did a search for 3, it naturally realized that it
could use an index scan, because it had the statistics to say it was
very selective. If I manually did a search for 18, it switched to
sequential scan, because it was not very selective (both are the correct
plans).

But if you create a prepared statement, parameterized on this number,
postgres has no way of knowing ahead of time, whether you will be asking
about 3 or 18, so when the query is prepared, it has to be pessimistic,
and avoid worst case behavior, so it choses to always use a sequential scan.

The only way I got around this was with writing a plpgsql function which
used the EXECUTE syntax to dynamically re-plan part of the query.

Hope this makes sense. This may or may not be your problem, without
knowing more about you setup. But the symptoms seem similar.

John
=:->


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Advise about how to delete entries

2005-09-11 Thread John A Meinel
Kevin wrote:
> Arnau wrote:
>
>> Hi all,
>>
>> >
>> > COPY FROM a file with all the ID's to delete, into a temporary
>> table, and  do a joined delete to your main table (thus, only one query).
>>
>>
>>   I already did this, but I don't have idea about how to do this join,
>> could you give me a hint ;-) ?
>>
>> Thank you very much
>
>
> maybe something like this:
>
> DELETE FROM statistics_sasme s
> LEFT JOIN temp_table t ON (s.statistic_id = t.statistic_id)
> WHERE t.statistic_id IS NOT NULL
>

Why can't you do:
DELETE FROM statistics_sasme s JOIN temp_table t ON (s.statistic_id =
t.statistic_id);

Or possibly:

DELETE FROM statistics_sasme s
  WHERE s.id IN (SELECT t.statistic_id FROM temp_table t);

I'm not sure how delete exactly works with joins, but the IN form should
be approximately correct.

John
=:->



signature.asc
Description: OpenPGP digital signature