Dear all,
As replied below, if I search the index again for each document and
skip it if found, or index it otherwise, is this not similar to
indexing all the PDF documents once again? Is this not overhead? As I am
not going to index the details of the PDF (so if an indexed PDF was
recreated in
Dear,
Thanks for your reply, Mr. Simon. I found it very useful.
I have another question. I create the index in a clustered environment (2
physical systems and 2 virtual). The index will be created on a system
shared among the nodes. The scheduler runs on another remote
system which will create an
Hey there,
you might have to implement some kind of unique identifier using an
indexed Lucene field. When you are indexing, you should fire a query with the
unique identifier of your document (maybe the path to your PDF document) and check if the
document is in the index already. You could also do a boolean qu
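
A rough sketch of the check-before-index idea, in case it helps. This is not Lucene code: a plain HashMap stands in for the index, and the class name, the method name, and the use of the file's last-modified time to spot a recreated PDF are all assumptions made for illustration. The point is only that looking up one identifier is cheap compared with parsing and indexing the PDF again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the index: maps a document's unique identifier
// (here, its path) to the last-modified time recorded when it was indexed.
class SkipIfIndexed {
    private final Map<String, Long> indexed = new HashMap<>();

    /** Returns true if the document was (re)indexed, false if it was skipped. */
    boolean indexIfNeeded(String path, long lastModified) {
        Long seen = indexed.get(path);   // cheap lookup by unique identifier
        if (seen != null && seen == lastModified) {
            return false;                // already indexed and unchanged: skip
        }
        // The expensive step (parsing and indexing the PDF) would go here.
        indexed.put(path, lastModified);
        return true;
    }
}
```

With a real Lucene index the lookup would be a term query on the identifier field, and IndexWriter.updateDocument(Term, Document) can replace an existing document by first deleting whatever matches the term and then adding the new one.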