dear, Thanks for you reply Mr. simon, I found it very useful. I have another doubt, I create the index in a clustered environment (2 physical systems and 2 virtual). A shared system among the nodes is where this index will be created. The scheduler runs in another remote system which will create and save the pdfs to a remote file server. Is it advisable to create a local index and add it the main index shared by the nodes? Or create a local index and copy it to the nodes?
Thanks Vijay On 5/3/10, Simon Willnauer <simon.willna...@googlemail.com> wrote: > Hey there, > > you might have to implement a some kind of unique identifier using an > indexed lucene field. When you are indexing you should fire a query with the > uuid of your document (maybe the path to you pdf document) and check if the > document is in the index already. You could also do a boolean query > combining UUID, timestamp and / or a hash value to see if the document has > been changed. if so you can simply update the document by its UUID > (something like indexwriter.updateDocument(new Term("uuid", > value),document);) > > Unfortunately you have to implement this yourself but it should not be that > much of a deal. > > simon > > On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan < > vijay.raghava...@gmail.com> wrote: > >> Dear all, >> I am using lucene 3.0 to index the pdf reports that I generate >> dynamically. I index the pdf file name (without extension), file path >> and its absolute path as fields. I search with the file name without >> extension; it retrieves a list, as usually 2 or more files are present >> in the same name in different sub directories. As I create the index >> for the first time it updates, assuming 100 pdf files in different >> directories, the files meta info. If again I do indexing, while my >> report generator scheduler has the produced 500 more pdf files >> totaling to 600 files in different directories, I wish to index only >> the new files to the index. But presently it’s doing the whole thing >> again (600 files). How to implement this functionality? Think of the >> thousands of pdf files created on each run. >> >> P.S: I cannot keep the meta-info of generated pdf files in the java >> memory, as it exceeds thousands in a single run, and update the index >> looping this list. >> >> new IndexWriter(FSDirectory.open(this.indexDir), new StandardAnalyzer( >> Version.LUCENE_CURRENT), true, >> >> IndexWriter.MaxFieldLength.LIMITED); >> >> is the boolean parameter is for this purpose? Please guide me. >> >> -- >> Thanks >> Vijay Veeraraghavan >> >> >> >> -- >> Thanks & Regards >> Vijay Veeraraghavan >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > -- Thanks & Regards Vijay Veeraraghavan --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org