Can't it just scan the index to get that? I assumed the index has an
entry for every fileid in the table. In my over-simplified mental
picture, the table looks like this:
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
etc.
While the index looks like this:
fileid|ctid
fileid|ctid
fileid|ctid
fileid|ctid
...
So I expected scanning the index to be faster, since it still has
everything needed to do the count. Or is it because I said COUNT(*), so
it needs to look at the other columns in the table? I really just want
the number of "hits", not the number of records with distinct values or
anything like that. My understanding was that COUNT(*) does exactly
that and doesn't actually look at the columns themselves.
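For reference, here is a quick way to see which plan the planner
actually picks (same table and column names as in the query quoted
below; this is just a sketch I have not run against the real data):

-- Show the plan chosen for the duplicate-finding query
EXPLAIN
SELECT fileid, COUNT(*)
FROM file
GROUP BY fileid
HAVING COUNT(*) > 1;

-- Temporarily discourage sequential scans in this session only,
-- to see what an index-based plan would cost by comparison
SET enable_seqscan = off;
EXPLAIN
SELECT fileid, COUNT(*)
FROM file
GROUP BY fileid
HAVING COUNT(*) > 1;
RESET enable_seqscan;

Comparing the estimated costs of the two plans should show whether the
planner even considers an index scan here, or whether it really does
think the sequential scan is cheaper.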
Adrian Klaver wrote:
-------------- Original message ----------------------
From: William Garrison <[EMAIL PROTECTED]>
I am looking for records with duplicate keys, so I am running this query:
SELECT fileid, COUNT(*)
FROM file
GROUP BY fileid
HAVING COUNT(*) > 1;
The table has a (non-unique) index on fileid, so I am surprised that
Postgres is doing a table scan. The database is >15GB, and there are a
number of fairly large string columns in the table, so I am very
surprised that scanning the index is not faster than scanning the
table. Any thoughts on that? Is scanning the table really faster than
scanning the index? Is there a reason it needs anything other than
the index?
I may be missing something, but it would have to scan the entire table to get
all the occurrences of each fileid in order to do the count(*).
--
Adrian Klaver
[EMAIL PROTECTED]