Searching for similar documents

2005-07-15 Thread Kadlabalu, Hareesh
Hi, I am trying to build a search utility that looks for 'similarities' between documents. In other words, for every document listed as part of the search results for a phrase, I want to be able to list documents that are similar to it (but do not necessarily match the same search criterion). For exampl

Lucene index integrity during a system crash

2005-07-15 Thread jian chen
Hi, I know Lucene does not have transaction support at this stage. However, I want to know what will happen if the operating system crashes during the indexing process. Will the Lucene index get corrupted? Thanks, Jian

Re: Runtime full text search like in Microsoft Windows Search

2005-07-15 Thread Erik Hatcher
On Jul 15, 2005, at 3:12 PM, [EMAIL PROTECTED] wrote: If Microsoft Search does as you describe, isn't it just: 1) Open file 2) Determine file type 3) Convert file content to UTF8, if text based, and you have the API to read it. .html, .txt, .doc, .excel, etc. 4) Perform string search, rege

Re: Runtime full text search like in Microsoft Windows Search

2005-07-15 Thread Otis Gospodnetic
As somebody already said, you can have an in-memory index with RAMDirectory. You can also pre-build a Lucene index on that CD - a CD is "static", you can't add/remove/change files on it, so you can build an index and burn it onto the CD at the same time as you put the Word files on it. As for get

RE: Runtime full text search like in Microsoft Windows Search

2005-07-15 Thread Nathan Brackett
I imagine you could index the info you wanted to quickly search on into a RAMDirectory (assuming it wasn't too much info), then run simple or complex searches on that, but I think that might take longer to do than simple regex searching on files. That would only give you a gain if you were going to run r

Re: Runtime full text search like in Microsoft Windows Search

2005-07-15 Thread [EMAIL PROTECTED]
If Microsoft Search does as you describe, isn't it just: 1) Open file 2) Determine file type 3) Convert file content to UTF8, if text based, and you have the API to read it. .html, .txt, .doc, .excel, etc. 4) Perform string search, regex. 5) Continue to next file As far as I know, Lucene is n
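
The five steps listed in this message amount to an index-free linear scan. A minimal Java sketch of that idea follows, using only the standard library. The class name `LinearFileSearch`, the helper `isPlainText`, and the extension list are hypothetical and chosen for illustration; binary formats such as .doc would need a separate text extractor, which is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of the "open, detect type, decode, regex, next file" loop.
class LinearFileSearch {

    // Assumption: only these extensions are treated as directly readable text.
    private static final Set<String> TEXT_EXTENSIONS = Set.of("txt", "html", "htm");

    // Step 2: determine the file type (here, naively, from the extension).
    static boolean isPlainText(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && TEXT_EXTENSIONS.contains(name.substring(dot + 1));
    }

    // Steps 1, 3, 4, 5: open each file, decode it, run the regex, move on.
    static List<Path> search(Path root, Pattern pattern) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths
                    .filter(Files::isRegularFile)
                    .filter(LinearFileSearch::isPlainText)
                    .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path file : candidates) {
            // Decode as UTF-8; malformed bytes are replaced rather than rejected.
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (pattern.matcher(content).find()) {
                hits.add(file);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LinearFileSearch <root-dir> <regex>");
            return;
        }
        Pattern pattern = Pattern.compile(args[1], Pattern.CASE_INSENSITIVE);
        for (Path hit : search(Paths.get(args[0]), pattern)) {
            System.out.println(hit);
        }
    }
}
```

Every query rereads every file, so the cost grows with the total size of the files on the CD. That is the trade-off against building an index once and searching it repeatedly.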

Runtime full text search like in Microsoft Windows Search

2005-07-15 Thread Tardif, Sebastien
How can you use Lucene like the very limited but fast search that Microsoft Windows Search provides? The use case is that the users have a CD with a lot of files. I provide them a nice user interface. They have the option to generate the full text search index but they should also be able to search

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-15 Thread Andrzej Bialecki
Paul Smith wrote: I'm not sure how generic or Nutch-specific Doug and Mike's MapReduce code is in Nutch, I haven't been paying close enough attention. Me too.. :) I didn't even know Nutch was now fully in the ASF, and I'm a Member... :-$ Let me pipe in on behalf of the Nutch project... T