Oh - if you do this then make sure that you have the primary key index on
overview too.
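
A sketch of what that looks like (assuming overview carries the same _id
column used in the queries below):

-- in PostgreSQL the primary key constraint creates the backing unique index
ALTER TABLE overview ADD PRIMARY KEY (_id);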

Alex

On Jan 14, 2008 12:53 AM, Alex Turner <[EMAIL PROTECTED]> wrote:

> If you combine it with Tom Lane's suggestion - it will go even better,
> something like:
>
> select * from t_documentcontent where _id in (select _id from overview
> where _id>x order by _id limit 50);
>
> Alex
>
>
> On Jan 13, 2008 11:59 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:
>
> >
> > Thanks Alex
> >
> > I tested your solution and it is really much faster.
> >
> > Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=101.695..106.178 rows=50 loops=1)
> >   ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual time=101.509..101.567 rows=50 loops=1)
> >         ->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50 width=108) (actual time=101.327..101.456 rows=50 loops=1)
> >               ->  Limit  (cost=1741.60..1742.69 rows=50 width=108) (actual time=101.313..101.383 rows=50 loops=1)
> >                     ->  Seq Scan on overview  (cost=0.00..3283.07 rows=150807 width=108) (actual time=0.036..72.249 rows=80050 loops=1)
> >   ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
> >         Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
> > Total runtime: 106.323 ms
> >
> > I now need to see what triggers I need to add, and test the insertions.
> > Thanks again
> >
> >
> > On Jan 14, 2008 5:54 AM, Alex Turner <[EMAIL PROTECTED]> wrote:
> >
> > > Here is a table I threw together to demonstrate the approximate speed
> > > of a materialized view in this case:
> > >
> > > trend=# explain analyze select property_id from overview order by
> > > property_id limit 50 offset 50000;
> > >                                                          QUERY PLAN
> > > ----------------------------------------------------------------------------------------------------------------------------
> > >  Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=446.048..446.125 rows=50 loops=1)
> > >    ->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=384.788..424.433 rows=50050 loops=1)
> > >          Sort Key: property_id
> > >          ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094 width=8) (actual time=0.012..88.691 rows=173409 loops=1)
> > >  Total runtime: 447.578 ms
> > > (5 rows)
> > >
> > > trend=# select count(*) from overview;
> > >  count
> > > --------
> > >  173409
> > > (1 row)
> > >
> > > trend=#
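> > >
> > > The overview table is just a hand-rolled materialized view. A sketch
> > > of how such a table could be built (the source table name here is
> > > made up):
> > >
> > > -- hypothetical: keep an id-only copy of the main table
> > > CREATE TABLE overview AS SELECT property_id FROM property;
> > > CREATE UNIQUE INDEX i_overview_property_id ON overview (property_id);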
> > >
> > > It's not great - but it's better than 47 seconds (The machine I'm
> > > running it on is far from big iron, so these results should be fairly
> > > typical for any modern x86 box - also this materialized view is almost
> > > certainly in RAM, and therefore IO speed is irrelevant).
> > >
> > > Tom Lane has already suggested another approach, whereby you order
> > > your results, then select the next page from the set where the id is
> > > greater than the greatest id of the previous page:
> > >
> > > select id from overview order by id limit 50;
> > >
> > > x = get row['id']  // for row 1
> > > do something
> > > x = get row['id']  // for row 2
> > > do something
> > > ...
> > > x = get row['id']  // for row 50
> > >
> > > select id from overview where id > x order by id limit 50;
> > >
> > > The ORDER BY is relevant and in fact imperative, because you must
> > > order your results somehow; otherwise your pagination will produce
> > > different results each time you try it, as database updates will
> > > affect the order rows come back in when there is no ORDER BY clause.
> > >
> > > Let me say that again to be clear: the order rows come back in when
> > > you don't specify an ORDER BY can change! Pulling rows without an
> > > ORDER BY is a REALLY bad idea, and it will break your pagination if a
> > > database update happens between someone viewing a page and hitting
> > > next to view the next page.
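> > >
> > > If you ever paginate on a column that isn't unique, add the key as a
> > > tie-breaker so the ordering is total (the sorting column name here is
> > > hypothetical):
> > >
> > > select id from overview order by created_at, id limit 50;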
> > >
> > > Alex
> > >
> > >
> > > On Jan 13, 2008 11:43 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> > >
> > > > If you have to access the data this way (with no where clause at
> > > > all - which sometimes you do), then I have already provided a
> > > > solution that will work reasonably well. If you create what is
> > > > essentially a materialized view of just the id field, the sequence
> > > > scan will return many fewer pages than when you do it on the main
> > > > table. Then you join it to the indexed main table and page in just
> > > > the rows you need. Voila - much faster result. Of course, we
> > > > haven't really talked about how that will affect insert and delete
> > > > speed if you keep it up to date with triggers, but you haven't
> > > > really talked about any requirements there.
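> > > >
> > > > The sync triggers would look something like this (a sketch only -
> > > > the function and trigger names are made up, and overview is assumed
> > > > to hold just the _id column):
> > > >
> > > > -- hypothetical trigger function keeping the id-only copy in sync
> > > > CREATE FUNCTION sync_overview() RETURNS trigger AS $$
> > > > BEGIN
> > > >     IF TG_OP = 'INSERT' THEN
> > > >         INSERT INTO overview (_id) VALUES (NEW._id);
> > > >     ELSIF TG_OP = 'DELETE' THEN
> > > >         DELETE FROM overview WHERE _id = OLD._id;
> > > >     END IF;
> > > >     RETURN NULL;  -- return value of an AFTER row trigger is ignored
> > > > END;
> > > > $$ LANGUAGE plpgsql;
> > > >
> > > > CREATE TRIGGER trg_documentcontent_sync
> > > >     AFTER INSERT OR DELETE ON t_documentcontent
> > > >     FOR EACH ROW EXECUTE PROCEDURE sync_overview();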
> > > >
> > > > Alex
> > > >
> > > >
> > > > On Jan 13, 2008 11:27 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Sorry Alex, I forgot to mention that I had enable_seqscan off in
> > > > > my last test.
> > > > >
> > > > > Now I have set enable_seqscan on and enable_indexscan on, and
> > > > > added ORDER BY _id.
> > > > >
> > > > > The table has an index on the _id field.
> > > > >
> > > > > CREATE INDEX i_documentcontent_document
> > > > >   ON t_documentcontent
> > > > >   USING btree
> > > > >   (_document);
> > > > >
> > > > > The database was recently VACUUM ANALYZEd, but not VACUUM FULLed.
> > > > >
> > > > > Here is the EXPLAIN ANALYZE output of two different queries with
> > > > > a large OFFSET:
> > > > >
> > > > >  EXPLAIN ANALYZE SELECT
> > > > >     t_documentcontent._id AS _id
> > > > >     FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000
> > > > >
> > > > > "Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=
> > > > > 41119.702..41119.792 rows=50 loops=1)"
> > > > > "  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58)
> > > > > (actual time=41064.802..41100.424 rows=50050 loops=1)"
> > > > > "        Sort Key: _id"
> > > > > "        ->  Seq Scan on t_documentcontent  (cost= 
> > > > > 0.00..110772.07rows=150807 width=58) (actual time=
> > > > > 106.679..33267.194 rows=150807 loops=1)"
> > > > > "Total runtime: 41120.015 ms"
> > > > >
> > > > >  EXPLAIN ANALYZE SELECT
> > > > >     t_documentcontent._id AS _id
> > > > >     FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000
> > > > >
> > > > > "Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=
> > > > > 1172.969..1194.228 rows=50 loops=1)"
> > > > > "  ->  Index Scan using i_documentcontent_id on t_documentcontent
> > > > > (cost=0.00..600805.54 rows=150807 width=58) (actual time=
> > > > > 0.077..1189.688 rows=10050 loops=1)"
> > > > > "Total runtime: 1194.316 ms"
> > > > >
> > > > > Tom,
> > > > > I am using uuid for the _id field, which is the primary key, so
> > > > > adding a WHERE id > ? doesn't apply.
> > > > > The cursor approach is also not suitable for some of my queries.
> > > > >
> > > > > I use this query to paginate the contents of a filesystem with
> > > > > lots of documents; avoiding OFFSET is not always possible.
> > > >
> > > >
> > > >
> > >
> >
>
