I am really sorry if my earlier message was confusing. As I said, I am indexing a folder which contains mylogs.log, mylogs1.log, mylogs2.log, etc. I am not indexing them as flat files: I tokenize each line of text with a regex and store the pieces as fields such as "messageType", "timeStamp", and "message".
So I don't care which of those 4 files holds a particular row; I just want to insert only new records. My job routine updates these log files every 30 minutes, and I store each row as a document. When I read the files again after 30 minutes for indexing, mylogs1.log will contain the previous version of the mylogs.log content, so a row with the same data may already be in the index. If I want to avoid writing the same record again (coming from another of those 4 files), could you please suggest what I need to do when calling addDocument or updateDocument? Do I need to run a search before inserting each row, or is there a better way to avoid the duplicate write? I really appreciate your time reading this, and thanks for responding.
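To make the question concrete, here is a minimal sketch of one approach I could imagine, not something I have confirmed is the recommended way: derive a stable key from the three fields of each row, so that the same row produces the same key no matter which of the files it is read from. The class name `LogRowKey` and the method `keyFor` are made up for illustration, and the choice of SHA-256 is my own assumption. It uses only the JDK, with no Lucene calls.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Lucene): builds a per-row key from the parsed fields.
public final class LogRowKey {
    private LogRowKey() {}

    /** Returns a 64-character hex SHA-256 digest computed over the three fields. */
    public static String keyFor(String messageType, String timeStamp, String message) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String field : new String[] {messageType, timeStamp, message}) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                // Length prefix keeps ("ab", "c") distinct from ("a", "bc").
                digest.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                digest.update((byte) ':');
                digest.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every conforming Java platform.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The idea would be to store this key in an untokenized field (say "rowKey", a name I am assuming) and pass it as the Term to IndexWriter.updateDocument(Term, doc), which deletes any document containing that term before adding the new one, so no separate search would be needed. One caveat: two log lines that are genuinely identical in all three fields would collapse into a single document.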