-----Original Message-----
From: Mike Christensen [mailto:m...@kitchenpc.com] 
Sent: Tuesday, June 05, 2012 4:28 PM
To: Dann Corbit
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] I'd like to learn a bit more about how indexes work

I'm aware of how "B-Trees" work, but I only understand them on the level of 
traversing a single tree at a time.  I'm curious how Postgres combines multiple 
indexes with OR and AND clauses.

I've done some Google searches, however I can't find anything basic.
Everything I've found assumes you already have knowledge of terms such as "hash 
join, aggregate join, etc".

At this point I'm not looking at learning how the optimizer works..
>>
"How the optimizer works" is the answer to your question.
The plan of attack for forming a query is a function of the optimizer.

One possible plan  for " WHERE data = key1 OR data = key2 " is something along 
the lines of:
SEEK("key1")
While key == key1 accumulate rows into the result set
   GetNextRow()
SEEK("key2")
While key == key2 accumulate rows into the result set
   GetNextRow()

However, if the table is tiny (suppose it is ten rows and fits into memory) 
then a table scan might be cheaper.

Here at CONNX, I have written a hashed btree search that tends to be cheaper 
than using a clustered index if there are no qualifiers on the join.
For instance
SELECT a.*, b.* from table1 a, table2 b WHERE a.unique_index = b.foreign_key

It will be faster to actually not use the index.  Whereas if there are 
additional where clause criteria such as:
SELECT a.*, b.* from table1 a, table2 b WHERE a.unique_index = b.foreign_key 
AND a.unique_index IN (k1, k2, k3, k4,..., kn, kn+1)
It will probably be faster to use the index unless the list of items is a 
substantial proportion of the possible data values.

The point is that there is not a simple formula that describes how data values 
are retrieved from the database.  The method of collection is decided by the 
optimizer.

It isn't always cheaper to use an index.  In fact, sometimes building an index 
is a complete waste of time.
For instance, suppose that you have a column named 'sex' that can contain the 
values 'F', 'M', and 'U'
If you built an index on that column it won't help you to find all the males 
faster than a table scan because the data is not specific enough so that the 
total number of pages of disk that are read would be more with the index than 
if the index were not used.
So, I suggest that possibly the articles you do not want to read are the very 
ones that will answer your questions.

On the other hand, it is not unlikely that I simply do not understand the 
questions that you are asking.
<<

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to