Re: Mnogosearch (Was: Re: [GENERAL] website doc search is ... )

Marc G. Fournier Thu, 01 Jan 2004 20:35:20 -0800

On Thu, 1 Jan 2004, Tom Lane wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> > On Thu, 1 Jan 2004, Tom Lane wrote:
> >> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> >>> what sort of impact does CLUSTER have on the system?  For instance, an
> >>> index run happens nightly, so I'm guessing that I'll have to CLUSTER
> >>> right after each one?
> >>
> >> Depends; what does the "index" process do --- are ndict8 and friends
> >> rebuilt from scratch?
>
> > nope, but heavily updated ... basically, the indexer checks the url table
> > for which urls need to be 're-indexed' ... if one does, it removes all words
> > from the ndict# tables that belong to that url, and re-adds them accordingly ...
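The delete-then-re-add cycle described above can be sketched as follows. This is a minimal illustration, not mnogosearch's actual code: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the simplified two-column ndict8 schema and the reindex_url helper are hypothetical.

```python
import sqlite3


def reindex_url(conn, url_id, word_ids):
    """Hypothetical helper: drop every word entry for url_id, then re-add the new set."""
    with conn:  # sqlite3: commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM ndict8 WHERE url_id = ?", (url_id,))
        conn.executemany(
            "INSERT INTO ndict8 (url_id, word_id) VALUES (?, ?)",
            [(url_id, w) for w in word_ids],
        )


# Simplified, hypothetical schema: one ndict table with an index on word_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ndict8 (url_id INTEGER, word_id INTEGER)")
conn.execute("CREATE INDEX n8_word ON ndict8 (word_id)")

reindex_url(conn, 1, [417851441, 12, 99])
reindex_url(conn, 2, [417851441, 7])
reindex_url(conn, 1, [417851441, 55])  # page 1 changed: its old words are replaced
```

In PostgreSQL, the re-added rows go wherever free space happens to be, not back into n8_word order, so each nightly run gradually undoes the physical ordering that CLUSTER produced.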
>
> Hmm, but in practice only a small fraction of the pages on the site
> change in any given day, no?  I'd think the typical nightly run changes
> only a small fraction of the entries in the tables, if it is smart
> enough not to re-index pages that did not change.
>
> My guess is that it'd be enough to re-cluster once a week or so.
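A rough model behind the "once a week" guess, offered as a sketch under loudly stated assumptions: suppose (hypothetically) each nightly run rewrites an independent, random fraction f of the rows, so after n days about (1 - f)^n of them are still where CLUSTER put them. The 85% "still clustered" threshold and the days_until_recluster name are illustrative choices, not anything from the thread.

```python
def days_until_recluster(daily_fraction, min_clustered=0.85, max_days=365):
    """Days until fewer than min_clustered of the rows remain in CLUSTER order.

    Hypothetical model: each day an independent fraction daily_fraction of the
    rows is deleted and re-inserted out of order, so after n days
    (1 - daily_fraction) ** n of them are still in clustered position.
    Returns None if the threshold is not crossed within max_days.
    """
    clustered = 1.0
    for day in range(1, max_days + 1):
        clustered *= 1.0 - daily_fraction
        if clustered < min_clustered:
            return day
    return None
```

Under this model, re-indexing 2% of the rows a night drops below 85% clustered after 9 days, and 1% a night after 17 days, which is in the same ballpark as a weekly re-cluster.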
>
> But this is pointless speculation until we find out whether clustering
> helps enough to make it worth maintaining clustered-ness at all.  Did
> you get any results yet?


Here is the EXPLAIN ANALYZE output post-CLUSTER:

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..19470.40 rows=1952 width=8) (actual time=39.639..4200.376 rows=13415 loops=1)
   ->  Index Scan using n8_word on ndict8  (cost=0.00..70.90 rows=3253 width=8) (actual time=37.047..2802.400 rows=15533 loops=1)
         Index Cond: (word_id = 417851441)
   ->  Index Scan using url_rec_id on url  (cost=0.00..5.95 rows=1 width=4) (actual time=0.061..0.068 rows=1 loops=15533)
         Index Cond: (url.rec_id = "outer".url_id)
         Filter: (url ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 4273.799 ms
(7 rows)
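To see where the 4.2 secs go, note that EXPLAIN ANALYZE reports "actual time" and "rows" per loop, so a node's total time is roughly its end time multiplied by loops. The sketch below pulls those numbers out of plan lines like the ones above; the regexes and the node_stats helper are hypothetical and assume the single-line node format shown in this plan.

```python
import re

# Hypothetical patterns for one EXPLAIN ANALYZE node line in the format above.
ACTUAL = re.compile(
    r"actual time=(?P<start>[\d.]+)\.\.(?P<end>[\d.]+) "
    r"rows=(?P<rows>\d+) loops=(?P<loops>\d+)"
)
ESTIMATE = re.compile(r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<rows>\d+)")


def node_stats(line):
    """Return estimated rows, actual rows per loop, and total ms for one node."""
    act = ACTUAL.search(line)
    est = ESTIMATE.search(line)
    if act is None or est is None:
        return None  # not a plan node line (e.g. "Index Cond:" or "Filter:")
    loops = int(act["loops"])
    return {
        "est_rows": int(est["rows"]),
        "actual_rows": int(act["rows"]),
        "total_ms": float(act["end"]) * loops,  # per-loop time times loop count
    }


NDICT8 = ("->  Index Scan using n8_word on ndict8  (cost=0.00..70.90 rows=3253 width=8) "
          "(actual time=37.047..2802.400 rows=15533 loops=1)")
URL = ("->  Index Scan using url_rec_id on url  (cost=0.00..5.95 rows=1 width=4) "
       "(actual time=0.061..0.068 rows=1 loops=15533)")

outer = node_stats(NDICT8)
inner = node_stats(URL)
```

Applied to this plan, the ndict8 scan accounts for about 2.8 secs, and the 15533 probes into url add roughly 15533 × 0.068 ms ≈ 1.06 secs. The planner also estimated 3253 ndict8 rows where 15533 came back, an underestimate of almost 5x.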

And ... shit ... just tried a search on 'security invoker', and the results
came back in 2 secs ... 'multi version', 18 secs ... 'mnogosearch', 0.32 secs ...
'mnogosearch performance', 18 secs ...

this is closer to what I expect from PostgreSQL ...

I'm still loading the 'WITHOUT OIDS' database ... should I expect that,
with CLUSTERing, its performance would be slightly better yet, or would
the difference be negligible?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]           Yahoo!: yscrappy              ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
