2010/1/5 Milan Zamazal <p...@brailcom.org>:

> My problem is that retrieving sorted data from large tables is
> sometimes very slow in PostgreSQL (8.4.1, FWIW).
>
> I typically retrieve the data using cursors, to display them in a UI:
>
>   BEGIN;
>   DECLARE ... SELECT ... ORDER BY ...;
>   FETCH ...;
>   ...
>
> On a newly created table of about 10 million rows the FETCH command
> takes about one minute by default, with an additional delay during
> the subsequent COMMIT command. This is because PostgreSQL uses a
> sequential scan on the table even when there is an index on the
> ORDER BY column. When I force PostgreSQL to perform an index scan
> (e.g. by setting one of the options enable_seqscan or enable_sort
> to off), the FETCH response is immediate.
>
> The PostgreSQL manual explains the motivation for sequential scans
> of large tables and I can understand it. Nevertheless, such behavior
> leads to unacceptably poor performance in my particular case. It is
> important to get the first resulting rows quickly, to display them
> to the user without delay.
>
> My questions are:
>
> - What is your experience with using ORDER BY + indexes on large
>   tables?

Without a WHERE clause, Postgres will almost always choose a
sequential scan.

> - Is there a way to convince PostgreSQL to use index scans
>   automatically in cases where it is much more efficient? I tried
>   using ANALYZE, VACUUM and SET STATISTICS, but without success.

By using cursors you take some responsibility away from the planner.
It has no idea that you want the first 100 rows quickly; it just
tries to optimize the whole operation.

> - Is it a good idea to set enable_seqscan or enable_sort to "off"
>   globally in my case? Or to set them to "off" just before working
>   with large tables? My databases contain short and long tables,
>   often connected through REFERENCES or joined into views, and many
>   of the shorter tables serve as codebooks. Can setting one of the
>   parameters to off have clearly negative impacts?

IMHO: no, no and yes.

> - Is there a recommended way to keep indexes in good shape so that
>   the performance of initial row retrieval remains good? The large
>   tables are typically append-only tables with a SERIAL primary key.

Use partitioning. If that's not possible, REINDEX periodically to
avoid a sub-optimal btree layout. But that's only a half-solution.

> Thanks for any tips.

Tips:

1. Get rid of cursors, unless you have a strong need for them
   (e.g. seeking back and forth and updating); if they have to stay,
   see the cursor sketch below.

2. Switch to "chunked" processing, like this (made concrete in the
   sketch below):

     SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;
     (process the records)
     SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch]
     ORDER BY idxcol LIMIT 1000;
     ... and so on.
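A minimal sketch of the chunked (keyset) approach from tip 2,
assuming bigtable has a unique, indexed column idxcol (both names
hypothetical) and that the last row of the previous chunk had
idxcol = 1000:

  SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;

  -- process the rows, remember the last idxcol seen (say, 1000),
  -- then fetch the next chunk starting strictly after it:
  SELECT * FROM bigtable WHERE idxcol > 1000 ORDER BY idxcol LIMIT 1000;

Each query is a short index range scan, so the first rows come back
immediately no matter how deep into the table you are; an
OFFSET-based approach would instead re-read all the skipped rows on
every chunk.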
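And if the cursors must stay: 8.4 exposes (if I remember right, it
became user-settable in that release) the cursor_tuple_fraction
setting, the planner's estimate of what fraction of a cursor's rows
will actually be fetched; lowering it biases the planner toward
fast-start plans such as an index scan. A sketch, again with
hypothetical table and column names:

  SET cursor_tuple_fraction = 0.001;  -- assume only the first rows are read

  BEGIN;
  DECLARE big_cur CURSOR FOR SELECT * FROM bigtable ORDER BY idxcol;
  FETCH 100 FROM big_cur;  -- first rows should now arrive via the index
  CLOSE big_cur;
  COMMIT;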
pozdrawiam,
Filip

--
Filip Rembiałkowski
JID,mailto:filip.rembialkow...@gmail.com
http://filip.rembialkowski.net/