Re: [HACKERS] [PERFORM] not using index for select min(...)

Josh Berkus Sun, 02 Feb 2003 12:25:13 -0800

Tom,

> In the end, the only reasonable way to handle this kind of thing is
> to teach the query planner about it.  Considering the small number
> of cases that are usefully optimizable (basically only MIN and MAX
> on a single table without any WHERE or GROUP clauses), and the ready
> availability of a SQL-level workaround, it strikes me as a very
> low-priority TODO item.


Low priority for you, Tom.  For some of us, it's one of the three most 
high-priority "bugs" in PostgreSQL.

I constantly try to sell my clients, and potential clients, on PostgreSQL.  
And the two things that trip me up the most frequently are lack of 
replication and our dog-slow aggregates.  I can usually sell Postgres on our 
strong points, but the aggregate issue is *always* a problem.   And the "slow 
aggregate" problem comes up about twice a week on Performance and three times 
a week on SQL.

Regardless of the technical reason, among MSSQL, Oracle, MySQL and PostgreSQL, 
we have the slowest performing simple aggregates.  It's very well to explain 
this is due to our system of extensible aggregates, but if a potential 
Postgres developer doesn't want to create custom aggregates, but does want to 
use MIN() in a correlated subquery, then they will go to a different RDBMS.

As I said before, I'm absolutely thrilled that you came up with a solution for 
COUNT(*) ... GROUP BY queries through Hash Aggregates.   That's half the 
picture, now we need a way to speed up MIN() and MAX() for simple one-column 
expressions.   While there is a "workaround" using ORDER BY & LIMIT, this 
doesn't work for correlated subqueries or if one wants to evaluate the result 
of MAX() in the query.  For example, the following query is not possible to 
"workaround" in PostgreSQL:

select teams_desc.team_id, team_name, team_code, notes,
min(teams_tree.treeno) as lnode, max(teams_tree.treeno) as rnode,
parent.team_id as parent_id, count(*)/2 as tlevel
from teams_desc JOIN teams_tree USING (team_id)
join teams_tree parent ON parent.treeno < teams_tree.treeno
join teams_tree parents on parents.treeno < teams_tree.treeno
WHERE parent.treeno = (SELECT max(p1.treeno) from teams_tree p1
                where p1.treeno < teams_tree.treeno
                and exists (select treeno from teams_tree p2
                        where p2.treeno > teams_tree.treeno
                        and p2.team_id = p1.team_id))
AND EXISTS (select parents2.team_id from teams_tree parents2
        where parents2.treeno > teams_tree.treeno
        AND parents2.team_id = parents.team_id)
group by teams_desc.team_id, team_name, team_code, notes, parent.team_id;

While one would hardly expect the above query to be fast, it is dissapointing 
that it takes about 8-10 times as long to execute on PostgreSQL as on MSSQL, 
since MSSQL seems to be able to use indexes to evaluate all three MIN() and 
MAX() expressions.

Further, assigning such a common query function to a Postgres-specific 
workaround hardly upholds our project's dedication to standards.   The fact 
that we are telling new users to use non-SQL-compliant code to do a query 
type present in 90% of databases bothers me every single time I give a newbie 
that advice.

It still seems to me that if a query's WHERE expression can be evaluated using 
an index, then any related MIN() or MAX() expression should be evaluable 
using an index.   That is, if you are selecting:
SELECT MAX(team_id) FROM teams WHERE team_id BETWEEN 100 and 200;
... with an index on team_id then this entire query should be able to return 
trough an index scan.   We've discussed the particular planner problems this 
presents for PostgreSQL, but I still believe that these are solvable ... and 
moreover, that we *need* to solve them if we're going to be competitive with 
other SQL RDBMSes.

I do realize that it's my job to find something to do about this issue since 
I'm the one so worked up about it.   What I'm concerned about is the 
possibility of having any idea or fix I come up with dismissed out of hand 
because it's a "low-priority todo".   Please add up the questions and 
complaints of the users on SQL, NOVICE, and PERFORMANCE ... I know you read 
them.

Thanks for reading, Tom.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] [PERFORM] not using index for select min(...)

Reply via email to