Re: Search while indexing

2009-03-07 Thread sonfon
Erick: Thanks for your suggestion. I think another solution would be keeping a list of keywords that could uniquely identify a document in a database, and searching for those keywords before adding a new document. As querying a database is fast, this probably wouldn't cost much time. But this would req
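
The check-before-add idea in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory HashSet stands in for the database the post mentions, and the class name SeenKeys, the method markIfNew, and the sample keys are all hypothetical, not part of Lucene or of the poster's code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: remembers the unique key of every document already added. */
public class SeenKeys {
    // Assumption: an in-memory set stands in for the database table of keys.
    private final Set<String> keys = new HashSet<>();

    /**
     * Records the key and reports whether it was new.
     * Returns true if the caller should index the document, false for a duplicate.
     */
    public boolean markIfNew(String uniqueKey) {
        return keys.add(uniqueKey);
    }

    public static void main(String[] args) {
        SeenKeys seen = new SeenKeys();
        String[] incoming = {"doc-a", "doc-b", "doc-a"};
        for (String key : incoming) {
            if (seen.markIfNew(key)) {
                // This is where the document would be handed to the IndexWriter.
                System.out.println("index " + key);
            } else {
                System.out.println("skip duplicate " + key);
            }
        }
    }
}
```

Keeping the duplicate check outside the index avoids having to reopen a reader before every add, at the cost of maintaining a second store of keys.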

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi Got it working! Thanks again for your help! Amin On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman wrote: > Thanks! The final piece that I needed to do for the project! > Cheers > > Amin > > On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler wrote: > >> > cool. i will use compression an

Re: doubt in adding a field in document

2009-03-07 Thread Erick Erickson
Didn't you post this already? Have you really looked at the Field documentation as was suggested last time? The short form is that there is no Field constructor like new Field("path",textFiles[i].getPath()); For plain strings, use the form: *Field

Re: Search while indexing

2009-03-07 Thread Erick Erickson
First, you'll probably want to search the user list archive for this issue, as it's been discussed and you'll find more information than I can remember off the top of my head. That said: 1> changes to an index are not visible until you reopen the reader. You probably have to flush the writer

doubt in adding a field in document

2009-03-07 Thread nitin gopi
hi all, I am getting an error in my code. The line giving the error is in bold in the code. The error is "cannot find symbol". thank you nitin import java.io.File; import java.io.FileReader; import java.io.Reader; import java.util.Date; // import org.apache.lucene; import org.apache.lucene.analysis.Analyze

Search while indexing

2009-03-07 Thread sonfon
Dear All, I'm considering building an index for my application with Lucene. However, as the document sources I'm going to index contain many duplicates, before adding a document to an IndexWriter I would like to search the index first to see if a copy of the same document has already been ad

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Thanks! The final piece that I needed to do for the project! Cheers Amin On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler wrote: > > cool. i will use compression and store in index. is there anything > > special > > i need to for decompressing the text? i presume i can just do > > doc.get("cont

RE: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Uwe Schindler
> cool. i will use compression and store in index. is there anything > special > i need to for decompressing the text? i presume i can just do > doc.get("content")? > thanks for your advice all! No, just use Field.Store.COMPRESS when adding to index and Document.get() when fetching. The decompress

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
cool. i will use compression and store in index. is there anything special i need to for decompressing the text? i presume i can just do doc.get("content")? thanks for your advice all! On Sat, Mar 7, 2009 at 11:50 AM, Uwe Schindler wrote: > You could store the text contents compressed; I think

RE: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Uwe Schindler
You could store the text contents compressed; I think extracting text from PDF files is much more time-intensive than decompressing a stored field. And text-only contents often compress very well. In my opinion, if the (uncompressed) contents of the docs are not very large (so I mean several megaby
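
To get a feel for how well plain text compresses and how cheap the round trip is, here is a small sketch using the JDK's java.util.zip classes. It is an assumption-laden illustration only: it does not use Lucene, it does not claim to reproduce what Field.Store.COMPRESS does internally, and the class name TextCompressionDemo and the sample text are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical demo: compress and decompress a text string with the JDK's deflate support. */
public class TextCompressionDemo {

    /** Compresses the UTF-8 bytes of the text. */
    public static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original text; fails loudly on corrupt or truncated input. */
    public static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample: repetitive prose, as extracted document text often is.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("The quick brown fox jumps over the lazy dog. ");
        }
        String text = sb.toString();
        byte[] packed = compress(text);
        System.out.println("original bytes:   " + text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + decompress(packed).equals(text));
    }
}
```

Real document text is less repetitive than this sample, so the ratio will be lower in practice; measuring it on a few representative documents is the only reliable guide.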

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Erik Hatcher
It depends :) It's a trade-off. If storing is not prohibitive, I recommend that as it makes life easier for highlighting. Erik On Mar 7, 2009, at 6:37 AM, Amin Mohammed-Coleman wrote: hi that's what i was thinking about. i would need to get the file and extract the text again a

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
hi that's what i was thinking about. i would need to get the file and extract the text again and then pass through the highlighter. The other option is storing the content in the index, the downside being that the index is going to be large. Which would be the recommended approach? Cheers Amin On Sat,

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Erik Hatcher
With the caveat that if you're not storing the text you want highlighted, you'll have to retrieve it somehow and send it into the Highlighter yourself. Erik On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote: You should look at contrib/highlighter, which does exactly this. Mike

Re: deletion of index-files fails

2009-03-07 Thread Michael McCandless
OK I'll go make IndexReader.getRefCount() public for 2.9. Mike rolaren...@earthlink.net wrote: FWIW, +1 from me on all this: when I started poking at my little problem I found as you said that there was really no way to trace the issue (one can use the debugger of course and I did, which i

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Michael McCandless
You should look at contrib/highlighter, which does exactly this. Mike Amin Mohammed-Coleman wrote: Hi I am currently indexing documents (pdf, ms word, etc) that are uploaded, these documents can be searched and what the search returns to the user are summaries of the documents. Currently

Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi I am currently indexing documents (pdf, ms word, etc) that are uploaded, these documents can be searched and what the search returns to the user are summaries of the documents. Currently the summaries are extracted when indexing the file (summary constructed by taking the first 10 lines of the