Re: Indexing only newly created files

2010-05-03 Thread Vijay Veeraraghavan
Dear all, as replied below: if I search the index again for each document and skip it when found, otherwise index it, is this not similar to indexing all the PDF documents once again? Is this not overhead? As I am not going to index the details of the pdf (so if an indexed pdf was recreated i n

Re: Indexing only newly created files

2010-05-03 Thread Vijay Veeraraghavan
Dear Simon, thanks for your reply, I found it very useful. I have another question. I create the index in a clustered environment (2 physical systems and 2 virtual). The index will be created on a system shared among the nodes. The scheduler runs in another remote system which will create an

Re: Indexing only newly created files

2010-05-03 Thread Simon Willnauer
Hey there, you might have to implement some kind of unique identifier using an indexed Lucene field. When you are indexing, you should fire a query with the uuid of your document (maybe the path to your pdf document) and check if the document is in the index already. You could also do a boolean qu
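
The check-before-index idea described above can be sketched roughly as follows. This is only an illustration and not Lucene code: the class IncrementalIndexer and its methods are hypothetical, and a plain in-memory map stands in for the index, keyed by the file path as the unique identifier. In a real setup the lookup would be a query against the indexed identifier field instead of a map lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a map stands in for the Lucene index.
// Key = unique identifier (here the file path), value = indexed content.
final class IncrementalIndexer {
    private final Map<String, String> index = new HashMap<>();

    // Returns true if a document with this unique key is already in the index.
    boolean isIndexed(String path) {
        return index.containsKey(path);
    }

    // Indexes only the files whose path is not yet present.
    // Returns the paths that were newly added, in iteration order of the input.
    List<String> indexNewFiles(Map<String, String> files) {
        List<String> added = new ArrayList<>();
        for (Map.Entry<String, String> entry : files.entrySet()) {
            if (isIndexed(entry.getKey())) {
                continue; // already indexed: skip the expensive parse/index step
            }
            index.put(entry.getKey(), entry.getValue());
            added.add(entry.getKey());
        }
        return added;
    }
}
```

The point of the design is that the lookup touches only the identifier, so an already indexed file is skipped before its content is parsed or indexed again.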