On Thu, 1 Jan 2004, Tom Lane wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> > On Thu, 1 Jan 2004, Tom Lane wrote:
> >> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> >>> what sort of impact does CLUSTER have on the system?  For instance, an
> >>> index happens nightly, so I'm guessing that I'll have to CLUSTER each
> >>> right after?
> >>
> >> Depends; what does the "index" process do --- are ndict8 and friends
> >> rebuilt from scratch?
>
> > nope, but heavily updated ... basically, the indexer looks at url for what
> > urls need to be 're-indexed' ... if it does, it removed all words from the
> > ndict# tables that belong to that url, and re-adds accordingly ...
>
> Hmm, but in practice only a small fraction of the pages on the site
> change in any given day, no?  I'd think the typical nightly run changes
> only a small fraction of the entries in the tables, if it is smart
> enough not to re-index pages that did not change.
>
> My guess is that it'd be enough to re-cluster once a week or so.
>
> But this is pointless speculation until we find out whether clustering
> helps enough to make it worth maintaining clustered-ness at all.  Did
> you get any results yet?

Here is post-CLUSTER:

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..19470.40 rows=1952 width=8) (actual time=39.639..4200.376 rows=13415 loops=1)
   ->  Index Scan using n8_word on ndict8  (cost=0.00..70.90 rows=3253 width=8) (actual time=37.047..2802.400 rows=15533 loops=1)
         Index Cond: (word_id = 417851441)
   ->  Index Scan using url_rec_id on url  (cost=0.00..5.95 rows=1 width=4) (actual time=0.061..0.068 rows=1 loops=15533)
         Index Cond: (url.rec_id = "outer".url_id)
         Filter: (url ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 4273.799 ms
(7 rows)

And ... shit ... just tried a search on 'security invoker', and results
back in 2 secs ... 'multi version', 18 secs ... 'mnogosearch', .32sec ...
'mnogosearch performance', 18secs ...

this is closer to what I expect from PostgreSQL ...

I'm still loading the 'WITHOUT OIDS' database ... should I expect that,
with CLUSTERing, its performance would be slightly better yet, or would
the difference be negligible?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]           Yahoo!: yscrappy              ICQ: 7615664
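
For anyone reading the plan above, a rough breakdown of where the 4.2 seconds go can be sketched with a few lines of arithmetic. This is only an illustration: the variable names are made up here, the figures are copied from the post-CLUSTER output, and it assumes the per-loop "actual time" on the inner scan is an average, so multiplying it by the loop count gives an approximation rather than an exact total.

```python
# Figures copied from the post-CLUSTER EXPLAIN ANALYZE output (times in ms).
# Variable names are illustrative only; they are not PostgreSQL identifiers.
nested_loop_total_ms = 4200.376   # upper bound of actual time on the Nested Loop
outer_scan_total_ms = 2802.400    # Index Scan using n8_word on ndict8, loops=1
outer_rows_actual = 15533         # rows actually returned by the outer scan
outer_rows_estimated = 3253       # planner's row estimate for the outer scan
inner_scan_per_loop_ms = 0.068    # Index Scan using url_rec_id on url, per loop
inner_loops = 15533               # one inner probe per outer row

# Approximate total time spent probing url (per-loop average times loop count).
inner_scan_total_ms = inner_scan_per_loop_ms * inner_loops

# Average cost of fetching one ndict8 row through n8_word.
outer_per_row_ms = outer_scan_total_ms / outer_rows_actual

# What is left over once both child scans are subtracted from the join total.
join_overhead_ms = nested_loop_total_ms - outer_scan_total_ms - inner_scan_total_ms

# How far off the planner's row estimate was for the outer scan.
estimate_error = outer_rows_actual / outer_rows_estimated

print(f"outer scan (ndict8): {outer_scan_total_ms:.1f} ms, "
      f"{outer_per_row_ms:.3f} ms per row")
print(f"inner scan (url):    {inner_scan_total_ms:.1f} ms over {inner_loops} loops")
print(f"join overhead:       {join_overhead_ms:.1f} ms")
print(f"outer row estimate off by a factor of {estimate_error:.2f}")
```

Read this way, roughly two thirds of the runtime is still the ndict8 index scan, and the planner underestimated that scan's row count by a factor of nearly five.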