Pros of keeping content only in the database:
* Only one stored copy of the data is needed (saves disk space)

Pros of storing a copy of the content in Lucene:

* A match is more easily explained
If you collapse multiple DB fields into a single searchable field, e.g. the customer first name and surname database fields into a single Lucene "name" field, it is easier to get highlighting to work against the actual data that was searched ("name") than to piece together what it was made from in the DB and apply highlighting to that. This is not impossible to overcome, just more work to bear in mind.

* A match result is guaranteed consistent
If there is a time lag between a database update and search indexing (which there invariably is), a search could match on a value that is no longer stored in the database. Content displayed from the index always agrees with what was matched.

* Speed of content retrieval
Lucene doc retrievals using internal Lucene doc ids and selective field loading may prove to be faster than hitting the DB with "select X from table where key in (matchKey1, matchKey2....)". Remember that you have to read the Lucene docs anyway to get the "matchKeyX" values for your SQL statement. Only benchmarking will tell how much faster this is, if at all. It depends on your doc sizes, the number of fields shown, etc.
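
To make the database side of that comparison concrete, here is a minimal sketch of building the "where key in (...)" statement from the match keys read out of the search hits. The class name, the "documents" table and the "content" and "id" columns are all hypothetical, and the keys are assumed to be numeric; it only shows the shape of the query, not how fast it runs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or JDBC.
final class MatchKeyQuerySketch {

    // Builds "SELECT <column> FROM <table> WHERE <keyColumn> IN (?, ?, ...)"
    // with one placeholder per match key taken from the search hits.
    // Identifiers are concatenated as-is, so they must come from trusted
    // code and never from user input.
    static String buildInQuery(String column, String table,
                               String keyColumn, List<Long> matchKeys) {
        if (matchKeys.isEmpty()) {
            throw new IllegalArgumentException("at least one match key is required");
        }
        String placeholders =
                String.join(", ", Collections.nCopies(matchKeys.size(), "?"));
        return "SELECT " + column + " FROM " + table
                + " WHERE " + keyColumn + " IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Assumed example keys, as if read from three matching Lucene docs.
        List<Long> matchKeys = Arrays.asList(17L, 42L, 99L);
        System.out.println(buildInQuery("content", "documents", "id", matchKeys));
    }
}
```

The resulting string would then go to a JDBC PreparedStatement, binding each key with setLong (parameter indexes start at 1). Timing that round trip against loading the same fields from the stored Lucene docs is the benchmark to run.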

Cheers
Mark




agatone wrote:
Hi, I already asked this question on the "lucene-general" list but was advised
to ask here too.

I'm working on a project that has a big database in the background (some
tables have about 1500000 rows). We decided to use Lucene for "faster"
search. Our search works like most searches: you enter a search string and
get a list of hits, each with a details link. But there is a dilemma over
whether we should store more data in the index than is needed.
One side of the development team insists that we should use the Lucene index
as some kind of storage for data, so when you get a hit and go to the details,
you use Lucene again to find the document that matches the selected ID and
take the data from the Lucene index. So in the end you end up copying complete
database tables into the Lucene index.

The other side insists on storing in the index only the data that is displayed
directly to the user in the search results list and the data needed for search
criteria. When you go to the details, you have the matching ID, so you can
pick up that row from the database by that ID rather than search for it inside
the Lucene index.
Can someone please describe the drawbacks and advantages of both approaches?
Actually, can someone write down what the actual benefit of Lucene itself is,
and where and when it shows, in a real production environment?
It would be great if anyone could share their experience with
indexing and searching large amounts of data.


Thank you