On 13-11-2012 4:15 selvakumar netaji wrote:
Hi All,
We are using Lucene to search data from the database in our enterprise
application.
All searches run against a single index whose documents are indexed from
two different databases, A and B. Database A is updated continuously (new
rows are inserted every minute), whereas database B is updated only once a
week.
The problem is with indexing database A. For example, if indexing
completes at second t and a record d1 is inserted at second t+1, a search
will not find d1 because it is not yet in the index.
Is it really a problem that there is a window where an update from the
database is not yet visible? Or do you just perceive it as a problem?
I.e. is it something an end-user will (or did) complain about?
To avoid this data loss, searching could be performed both in the index
and in the database (for records that are not yet in the index). The
problem here is that we wouldn't get score-based ordering from the
database, and combining the results from the database and the index would
be difficult. Is there any way to get a Lucene score for search results
that come from the database?
The other alternative would be to update the index every 30 seconds (or
even more often), so that the index follows the database updates closely.
Is there any other way to update the index directly whenever the database
is updated? Can you please suggest something?
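A minimal sketch of the 30-second polling alternative, assuming database A has a monotonically increasing primary key that can serve as a high-water mark. `PollingIndexer`, `Row`, and both callbacks are hypothetical: `fetchRowsAfter` would run something like `SELECT ... WHERE id > ? ORDER BY id`, and `indexRow` would wrap Lucene's `IndexWriter.updateDocument`. Only the scheduling uses real JDK API (`ScheduledExecutorService`).

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongFunction;

/** Hypothetical poller: indexes rows of database A newer than the last indexed id. */
public class PollingIndexer {

    /** A row from database A (hypothetical shape; assumes ids increase with inserts). */
    public static final class Row {
        public final long id;
        public final String text;

        public Row(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    private final LongFunction<List<Row>> fetchRowsAfter; // e.g. SELECT ... WHERE id > ? ORDER BY id
    private final Consumer<Row> indexRow;                  // e.g. wraps IndexWriter.updateDocument
    private long lastIndexedId;                            // high-water mark

    public PollingIndexer(LongFunction<List<Row>> fetchRowsAfter, Consumer<Row> indexRow, long startId) {
        this.fetchRowsAfter = fetchRowsAfter;
        this.indexRow = indexRow;
        this.lastIndexedId = startId;
    }

    /** One polling pass: index every new row and advance the watermark. Returns rows indexed. */
    public synchronized int runOnce() {
        List<Row> rows = fetchRowsAfter.apply(lastIndexedId);
        for (Row r : rows) {
            indexRow.accept(r);
            lastIndexedId = Math.max(lastIndexedId, r.id);
        }
        return rows.size();
    }

    public synchronized long lastIndexedId() {
        return lastIndexedId;
    }

    /** Runs runOnce() every periodSeconds; an uncaught exception would suppress later runs. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::runOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Calling `start(30)` would give the interval from the question. One caveat with an id watermark: a transaction that received a lower id but commits after a higher one is skipped, so a timestamp column with some overlap, or a change-log table, may be safer.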
Perhaps you can convert it into something event-based, for instance with
a message queue (JMS), which allows you to stream the updates as soon as
you know they're made. Combined with NRT (near-real-time search), you
should be able to access the changes fairly quickly after they're made.
But there will still be a window where the database is ahead of the
search index.
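As a rough illustration of the event-based idea, the sketch below uses a JDK `BlockingQueue` as a stand-in for the JMS queue; `EventDrivenIndexer`, `ChangeEvent`, and the callbacks are hypothetical. The application publishes an event right after its database transaction commits, and a consumer applies the queued changes as a batch and refreshes the searcher once per batch. In Lucene terms, `indexChange` might wrap `IndexWriter.updateDocument` and `refreshSearcher` might call `SearcherManager.maybeRefresh()`, which is where NRT comes in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical event-driven indexer; the BlockingQueue stands in for a JMS queue. */
public class EventDrivenIndexer {

    /** A change to database A, published after the transaction commits (hypothetical shape). */
    public static final class ChangeEvent {
        public final long id;
        public final String text;

        public ChangeEvent(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    private final BlockingQueue<ChangeEvent> queue = new LinkedBlockingQueue<>();
    private final Consumer<ChangeEvent> indexChange; // e.g. wraps IndexWriter.updateDocument
    private final Runnable refreshSearcher;          // e.g. calls SearcherManager.maybeRefresh()

    public EventDrivenIndexer(Consumer<ChangeEvent> indexChange, Runnable refreshSearcher) {
        this.indexChange = indexChange;
        this.refreshSearcher = refreshSearcher;
    }

    /** Called by the application right after its database transaction commits. */
    public void publish(ChangeEvent event) {
        queue.add(event);
    }

    /** Synchronous alternative to start(): applies every queued change, then refreshes once. */
    public int drainOnce() {
        List<ChangeEvent> batch = new ArrayList<>();
        queue.drainTo(batch);
        return applyBatch(batch);
    }

    /** Starts a daemon consumer that waits for events and applies them in batches. */
    public Thread start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    List<ChangeEvent> batch = new ArrayList<>();
                    batch.add(queue.take()); // block until at least one change arrives
                    queue.drainTo(batch);    // then take whatever else is already waiting
                    applyBatch(batch);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interruption stops the consumer
            }
        }, "index-consumer");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    private int applyBatch(List<ChangeEvent> batch) {
        for (ChangeEvent event : batch) {
            indexChange.accept(event);
        }
        if (!batch.isEmpty()) {
            refreshSearcher.run(); // one NRT refresh per batch, not per event
        }
        return batch.size();
    }
}
```

Refreshing once per batch rather than once per event keeps reopen overhead down; the remaining window is roughly the time an event waits in the queue plus one refresh.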
The final solution I've thought of would be to have two indexes: a
file-system index and an in-memory index. The file-system index would be
rebuilt or updated daily, and the in-memory index would be updated
whenever the database changes. We would then search both indexes and
combine the results, since both carry Lucene scores, so no data would be
lost.
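A sketch of the merge step for the two-index design, assuming every document carries a unique key (here a hypothetical `id`) and that `memoryIndexIds` is the set of keys currently held in the in-memory index. A row changed since the last daily rebuild has a stale copy in the file-system index, so that copy is dropped before re-sorting. One caution: Lucene scores from two separate indexes are computed from different term statistics, so they are not strictly comparable; searching both through a Lucene `MultiReader` with one `IndexSearcher` gives a single consistent set of statistics, though duplicates would still need handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical merge of hits from a file-system index and an in-memory index. */
public class TwoIndexMerge {

    /** A search hit keyed by the document's unique id (assumed to exist in both indexes). */
    public static final class Hit {
        public final String id;
        public final float score;

        public Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns the top n hits across both indexes, ignoring superseded file-system copies. */
    public static List<Hit> merge(List<Hit> diskHits, List<Hit> memoryHits,
                                  Set<String> memoryIndexIds, int n) {
        List<Hit> all = new ArrayList<>();
        for (Hit h : diskHits) {
            // Skip file-system hits that have a newer copy in the in-memory index,
            // even if that newer copy no longer matches the query.
            if (!memoryIndexIds.contains(h.id)) {
                all.add(h);
            }
        }
        all.addAll(memoryHits);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()); // highest score first
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```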
This scenario will also have a window where the database is ahead,
unless you start a transaction on the database and make it wait for the
search index to be updated.
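One way to read the transactional suggestion, sketched with hypothetical types: `Transaction` stands in for something like a JDBC `Connection` with auto-commit turned off, and `Indexer.indexAndRefresh` stands in for an index update followed by an NRT refresh. The database commit happens only after the change is searchable, and a failed index update rolls the insert back.

```java
/** Hypothetical sketch: make the database commit wait for the search index update. */
public class IndexBeforeCommit {

    /** Stand-in for a database transaction (e.g. a JDBC Connection with auto-commit off). */
    public interface Transaction {
        void insert(long id, String text);
        void commit();
        void rollback();
    }

    /** Stand-in for "update the Lucene index, then refresh the NRT searcher". */
    public interface Indexer {
        void indexAndRefresh(long id, String text);
    }

    /** Returns true if the row was indexed and committed, false if it was rolled back. */
    public static boolean insertAndIndex(Transaction tx, Indexer indexer, long id, String text) {
        tx.insert(id, text);                   // write inside the still-open transaction
        try {
            indexer.indexAndRefresh(id, text); // make the row searchable before committing
        } catch (RuntimeException e) {
            tx.rollback();                     // index failed: keep the database from getting ahead
            return false;
        }
        tx.commit();
        return true;
    }
}
```

The trade-off is that every insert holds its database transaction open for the duration of indexing and refresh. Between the refresh and the commit the index is briefly ahead of the database instead of behind it, and if the commit itself fails, the indexed row would have to be removed again.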
You could also try some tricks with the user interface, if the number of
results from the database is very small compared to the normal result set.
Say you have 100 results from the index and 3 from the database that are
not yet in the index.
You could present that as 'These 3 results are so new, they're not
properly processed yet, and here are the 100 results that are fully
processed.'
That way, you leave the 'scoring' to the user.
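A sketch of that presentation trick, with hypothetical inputs: `rankedIndexIds` are the Lucene hits in score order, `dbMatchIds` come from a plain SQL query against database A, and `lastIndexedId` is assumed to be the highest id the indexer has processed (i.e. ids increase with inserts). Database rows above that mark that Lucene did not already return are listed first, unranked, followed by the ranked hits.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical presenter that keeps unindexed database rows apart from ranked hits. */
public class SplitResults {

    /**
     * rankedIndexIds: ids of Lucene hits, already in score order.
     * dbMatchIds: ids matched by a plain SQL query against database A (unranked).
     * lastIndexedId: highest id known to be in the index (assumes increasing ids).
     */
    public static List<String> render(List<Long> rankedIndexIds, List<Long> dbMatchIds, long lastIndexedId) {
        Set<Long> returnedByIndex = new HashSet<>(rankedIndexIds);
        List<Long> fresh = new ArrayList<>();
        for (Long id : dbMatchIds) {
            // Only rows inserted after the last indexing pass that Lucene did not already return.
            if (id > lastIndexedId && !returnedByIndex.contains(id)) {
                fresh.add(id);
            }
        }
        List<String> lines = new ArrayList<>();
        if (!fresh.isEmpty()) {
            lines.add("These " + fresh.size() + " results are so new, they're not properly processed yet:");
            for (Long id : fresh) {
                lines.add("  " + id);
            }
        }
        lines.add("Here are the " + rankedIndexIds.size() + " results that are fully processed:");
        for (Long id : rankedIndexIds) {
            lines.add("  " + id);
        }
        return lines;
    }
}
```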
Best regards,
Arjen