Hello, I tried floating this question on the Lucene list as well, but thought the answer could also lie in Solr's handling of IndexReader, hence posting here too.
I am trying to execute a program that reads documents segment-by-segment and reindexes them into the same index. I read with the Lucene APIs and index through the Solr API (into a core that is currently loaded) so that field analysis is preserved and deletions are handled automatically. The segments being processed *do not* participate in merges. The reindexing code executes in the same JVM, *inside* Solr.

What I am observing is that even after a segment has been fully processed and an autoCommit (as well as an autoSoftCommit) has kicked in, the segment with 0 live docs is left behind, bloating the index. *Upon Solr restart, the segment gets cleared successfully.*

I tried to replicate the same thing without my code by indexing 3 docs into an empty test core and then reindexing the same docs. In that case the older segment gets deleted as soon as the softCommit interval hits or an explicit commit=true is called.

Here is the high-level code. It leaves undeleted segments behind until I restart Solr and load the core again:

    try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
         IndexReader reader = DirectoryReader.open(dir)) {
      for (LeafReaderContext lrc : reader.leaves()) {
        // read live docs from each leaf, create a SolrInputDocument
        // from each Document, and index it using the Solr API
      }
    } catch (Exception e) {
      // log and handle; swallowing the exception here would hide indexing failures
    }

Would opening an IndexReader this way interfere with how Solr manages its IndexReader and file refCounts, thereby preventing file deletion? What am I missing?

Happy to provide more details if required, code or otherwise. Help would be much appreciated!

Thanks,
Rahul
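For context, the loop body summarized in the comment above could be fleshed out roughly as below. This is a sketch of the approach described in the post, not the actual code: the `solrClient` handle and the naive `stringValue()` field copy are my assumptions (a real conversion would need to handle numeric, binary, and multi-valued fields per the schema).

```java
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;
import org.apache.solr.common.SolrInputDocument;

// Sketch: read live docs segment-by-segment and reindex via the Solr API.
try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
     DirectoryReader reader = DirectoryReader.open(dir)) {
  for (LeafReaderContext lrc : reader.leaves()) {
    LeafReader leaf = lrc.reader();
    Bits liveDocs = leaf.getLiveDocs(); // null when the segment has no deletions
    for (int docId = 0; docId < leaf.maxDoc(); docId++) {
      if (liveDocs != null && !liveDocs.get(docId)) {
        continue; // skip deleted documents
      }
      Document stored = leaf.document(docId); // stored fields only
      SolrInputDocument sdoc = new SolrInputDocument();
      for (IndexableField f : stored.getFields()) {
        // assumption: all fields round-trip as strings; lossy for binary/numeric
        sdoc.addField(f.name(), f.stringValue());
      }
      solrClient.add(core.getName(), sdoc); // assumed SolrClient handle
    }
  }
} catch (Exception e) {
  // log and handle
}
```

Note that only stored field values are recoverable this way; indexed-but-not-stored content cannot be reconstructed from the segment.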