Best Practice for Lucene Search

Konstantyn Smirnov Wed, 11 Feb 2009 06:05:41 -0800

In the beginning of the development, I was also facing a choice to mirror the
documents in DB/index.


But when the number of raws reached the mark of 7 mio, the query like 

        "select count(id) from documentz" 

(using PostgresQL) would take ages (ok, about 10 minutes!!! ), it became
clear to me, that something is not right with that approach :confused:.

The other reasons to name a few, would be the need to run the data import
(almost) twice for the index and DB, and then synchronize them in case of
changes.

At the moment, I have a set-up of 6 physical indieces, 15 GB each and in
total I have like 46 mio documents, and can say that I'm pretty happy with
the search performance.

Ah, almost forgot! Those 46 mio documents represent around 100 different
sources (field-structures), and would need to be persisted in 100 different
DB-tables. Also a whole lot of new sources are expected to come, and be
added into the stack ONLINE w/o the server restart! 

Using mixed DB/index solution it would be nightmare to maintain that, but a
single Lucene index copes with the task fast and easy.

So, my vote for text-only datas goes clearly to Lucene-only solution :)
-- 
View this message in context: 
http://www.nabble.com/Best-Practice-for-Lucene-Search-tp21748839p21955474.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Best Practice for Lucene Search

Reply via email to