2010/1/5 Milan Zamazal <p...@brailcom.org>

> My problem is that retrieving sorted data from large tables is sometimes
> very slow in PostgreSQL (8.4.1, FWIW).
>
> I typically retrieve the data using cursors, to display them in UI:
>
>  BEGIN;
>  DECLARE ... SELECT ... ORDER BY ...;
>  FETCH ...;
>  ...
>
> On a newly created table of about 10 million rows, the FETCH command
> takes about one minute by default, with an additional delay during any
> subsequent COMMIT command.  This is because PostgreSQL uses a
> sequential scan on the table even when there is an index on the ORDER
> BY column.  When I force PostgreSQL to perform an index scan (e.g. by
> setting one of the options enable_seqscan or enable_sort to off), the
> FETCH response is immediate.
>
> The PostgreSQL manual explains the motivation for sequential scans of
> large tables, and I can understand it.  Nevertheless, this behavior
> leads to unacceptably poor performance in my particular case.  It is
> important to get the first resulting rows quickly, to display them to
> the user without delay.
>
> My questions are:
>
> - What is your experience with using ORDER BY + indexes on large tables?
>

Without a WHERE clause, Postgres will almost always choose a sequential
scan: for a whole-table ORDER BY, a seq scan plus sort is usually
estimated to be cheaper than a full index scan.



> - Is there a way to convince PostgreSQL to use index scans automatically
>  in cases where it is much more efficient?  I tried using ANALYZE,
>  VACUUM and SET STATISTICS, but without success.
>
By using cursors you take some responsibility away from the planner: it
has no idea that you want the first 100 rows quickly, so it just tries
to optimize the whole operation.
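
To illustrate with a hedged sketch (bigtable and idxcol are the
hypothetical names from the tips below; the plan shapes are typical,
not guaranteed):

EXPLAIN SELECT * FROM bigtable ORDER BY idxcol;
-- planned for total cost: typically a Sort on top of a Seq Scan

EXPLAIN SELECT * FROM bigtable ORDER BY idxcol LIMIT 100;
-- with an explicit LIMIT the planner knows only the first rows matter,
-- so it can pick an ordered Index Scan and drop the Sort entirely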
> - Is it a good idea to set enable_seqscan or enable_sort to "off"
>  globally in my case?  Or to set them to "off" just before working with
>  large tables?  My databases contain short and long tables, often
>  connected through REFERENCES or joined into views, and many of the
>  shorter tables serve as codebooks.  Can setting one of the parameters
>  to off have clearly negative impacts?
>

IMHO: no, no, and yes.  These planner options affect every query in the
session, so plans for the joins against your small codebook tables would
suffer too.


>
> - Is there a recommended way to keep indexes in good shape so that the
>  performance of initial row retrievals remains good?  The large tables
>  are typically append-only tables with a SERIAL primary key.
>
>
Use partitioning.
If that's not possible, REINDEX periodically to avoid a sub-optimal
btree layout, but that's only a half-solution.
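
A minimal sketch of both options, with hypothetical table/column names
(in 8.4, partitioning is inheritance-based):

-- periodic index rebuild, e.g. from a maintenance job:
REINDEX TABLE bigtable;

-- one child table per range of the SERIAL key; with
-- constraint_exclusion = on, queries filtered on idxcol can skip
-- the other children entirely:
CREATE TABLE bigtable_p10m (
    CHECK (idxcol >= 10000000 AND idxcol < 20000000)
) INHERITS (bigtable);
CREATE INDEX bigtable_p10m_idxcol_idx ON bigtable_p10m (idxcol);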
> Thanks for any tips.
>


Tips:

1. Get rid of cursors, unless you have a strong need for them (e.g.
seeking back and forth, or updating through the cursor).

2. Switch to "chunked" processing, like this:

SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;
(process the records)
SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch]
ORDER BY idxcol LIMIT 1000;
... and so on.
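
Note that this assumes idxcol is unique (your SERIAL primary key
qualifies); with a non-unique column, rows sharing the boundary value
could be skipped between chunks.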



regards,
Filip


-- 
Filip Rembiałkowski
JID,mailto:filip.rembialkow...@gmail.com
http://filip.rembialkowski.net/
