On 2011-04-09 18:54, Tom Lane wrote:
I think that would be a positive disimprovement.  The current design
guarantees that volatile sort expressions are evaluated exactly once,
in the order the rows are read from the data source.  There would be no
guarantees at all, either as to the number of evaluations or the order
in which they happen, if we tried to do evaluation only during the
actual sort.
If I could "only evaluate" the rows needed, then that would also
be a huge win, below case shifts what "definitely shouldn't be a seqscan"
into one due to a secondary sort key.

Another small problem is that any such thing would require carrying
along some kind of closure (ie, the expression and all its input
values), not just the final sort key value, in tuples being sorted.
The ensuing complexity, speed penalty, and increase in data volume
to be sorted would be paid by everybody, making this probably a net
performance loss when considered across all applications.
Getting the value for the first sortkey and carrying on a closure
for the rest would mostly (very often) be "optimal" ?

It would also enable a select that has to sortkeys to utilize an
index that only contains the primary sortkey, which is a huge
negative effect of what's being done today.

2011-04-18 07:12:43.931 testdb=# explain select id from testdb.testtable order by id limit 500;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..262.75 rows=500 width=4)
-> Index Scan using testtable_pkey on testtable (cost=0.00..15015567.84 rows=28573400 width=4)
(2 rows)

Time: 1.363 ms
2011-04-18 07:12:52.498 testdb=# explain select id from testdb.testtable order by id,name limit 500;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=5165456.70..5165457.95 rows=500 width=35)
   ->  Sort  (cost=5165456.70..5236890.20 rows=28573400 width=35)
         Sort Key: id, name
-> Seq Scan on testtable (cost=0.00..3741675.00 rows=28573400 width=35)
(4 rows)

Time: 1.420 ms

Enabling any users to sort using multiple keys, without ending in Seq Scans somewhere down the line seems to require multi dimension indexes on all combinations of colums

--
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to