Hi,
I am trying to build a search utility that looks for 'similarities' between
documents.
In other words, for every document listed in the search results for a
phrase, I want to be able to list documents that are similar to it (but do
not necessarily match the same search criterion). For exampl
Hi,
I know Lucene does not have transaction support at this stage.
However, I want to know what will happen if there is an operating
system crash during the indexing process. Will the Lucene index get
corrupted?
Thanks,
Jian
On Jul 15, 2005, at 3:12 PM, [EMAIL PROTECTED] wrote:
If Microsoft Search does as you describe, isn't it just:
1) Open file
2) Determine file type
3) Convert file content to UTF-8, if it is text based and you have the
API to read it (.html, .txt, .doc, .excel, etc.)
4) Perform string search, rege
As somebody already said, you can have an in-memory index with
RAMDirectory. You can also pre-build a Lucene index on that CD - CD is
"static", you can't add/remove/change files on it, so you can build an
index and burn it onto the CD at the same time when you put the Word
files on it.
As for get
I imagine you could index the info you wanted to quickly search on into a
RAMDirectory (assuming it wasn't too much info), then run simple or complex
searches on that, but I think that might take longer to do than simple regex
searching on files. That would only give you a gain if you were going to run
r
If Microsoft Search does as you describe, isn't it just:
1) Open file
2) Determine file type
3) Convert file content to UTF-8, if it is text based and you have the API
to read it (.html, .txt, .doc, .excel, etc.)
4) Perform string search, regex.
5) Continue to next file
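The five steps above can be sketched in plain Java, without Lucene. This is only a rough illustration under some assumptions that are not in the original message: the file type is guessed from the extension alone, only a few plain-text extensions are accepted, and files are assumed to be stored as UTF-8 already (binary formats such as .doc would need a separate reader). The class name BruteForceSearch and its methods are made up for this example.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: scans every file on each search, no index involved.
class BruteForceSearch {
    // Assumption: only these extensions are treated as readable text.
    static final Set<String> TEXT_EXTENSIONS = Set.of("txt", "html", "htm");

    // Step 2: determine the file type (here, naively, from the extension).
    static boolean isTextFile(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && TEXT_EXTENSIONS.contains(name.substring(dot + 1));
    }

    // Returns the files under root whose content matches the pattern.
    static List<Path> search(Path root, Pattern pattern) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths.filter(Files::isRegularFile)
                              .filter(BruteForceSearch::isTextFile)
                              .sorted()
                              .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path file : candidates) {                 // Step 5: next file
            try {
                // Steps 1 and 3: open the file and decode it as UTF-8.
                String content = Files.readString(file, StandardCharsets.UTF_8);
                // Step 4: regex search over the whole content.
                if (pattern.matcher(content).find()) {
                    hits.add(file);
                }
            } catch (CharacterCodingException e) {
                // Not valid UTF-8: skip the file rather than guess an encoding.
            }
        }
        return hits;
    }
}
```

A call such as BruteForceSearch.search(Path.of("/media/cdrom"), Pattern.compile("lucene")) would rescan the whole tree each time (the path is just an example). Every query reads every file again, so the cost grows with the amount of data rather than with the number of hits.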
As far as I know, Lucene is n
How can you use Lucene like the very limited but fast search that
Microsoft Windows Search provides?
The use case is that the users have a CD with a lot of files. I provide
them a nice user interface. They have the option to generate the full
text search index but they should also be able to search
Paul Smith wrote:
I'm not sure how generic or Nutch-specific Doug and Mike's MapReduce
code is in Nutch, I haven't been paying close enough attention.
Me too.. :) I didn't even know Nutch was now fully in the ASF, and I'm
a Member... :-$
Let me pipe in on behalf of the Nutch project... T