At the beginning of development, I also faced the choice of whether to mirror the documents in both a DB and the index.
But once the number of rows reached 7 million, a query like "select count(id) from documentz" (on PostgreSQL) took ages (OK, about 10 minutes!), and it became clear to me that something was not right with that approach :confused:. Other reasons, to name a few, were the need to run the data import (almost) twice, once for the index and once for the DB, and then to keep the two synchronized whenever something changes.

At the moment I have a setup of 6 physical indices, 15 GB each, holding about 46 million documents in total, and I can say that I'm pretty happy with the search performance.

Ah, almost forgot! Those 46 million documents come from around 100 different sources (field structures) and would need to be persisted in 100 different DB tables. A whole lot of new sources are also expected, and they have to be added to the stack ONLINE, without a server restart! With a mixed DB/index solution that would be a nightmare to maintain, but a single Lucene index copes with the task quickly and easily.

So, for text-only data, my vote goes clearly to the Lucene-only solution :)

--
View this message in context: http://www.nabble.com/Best-Practice-for-Lucene-Search-tp21748839p21955474.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.