I am just going to wax philosophical for a minute. I am trying to
understand Lucene's merging algorithm in depth.
Let's say I create an index of 25M web pages on a single machine. While
creating this index I am searching and indexing/re-indexing at the same
time, a bit like Technorat
I have a timer task that periodically optimizes my search index.
However, this process is failing:
java.io.IOException: Cannot overwrite: C:\04950_04959\deleteable.new
    at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory
Andrzej, I think you did a great job elucidating my thoughts as well. I
heartily concur with everything you said.
Andrzej Bialecki wrote:
> Hmm... Please define what "adequate" means. :-) IMHO,
> "adequate" is when for any query the response time is well
> below 1 second. Otherwise the serv
It depends on the application. Depending on the access pattern of your
system you might be able to use Lucene. It's been done ;-).
If you have very few tables with very simple relationships, it might
be an answer -- perhaps not the best one though. If you want to use
advanced RDBMS feature
You mentioned that "it will scale well in the future". Does this imply
that it doesn't scale well now? What are the current limitations of the
Lucene Highlighter? How does it perform under high query load?
This is just a curiosity of mine, but Nutch has a separate Summarizer:
net.nutch.sear