Hello,
I tried floating this question on the Lucene list as well, but thought the answer might also lie in how Solr handles its IndexReader, hence posting here.

I am trying to run a program that reads documents segment-by-segment and
reindexes them into the same index. I read using the Lucene APIs and index
using the Solr API (into a core that is currently loaded) so that field
analysis is preserved and deletions are handled automatically. The segments
being processed *do not* participate in merges.

The reindexing code is executing in the same JVM *inside* of Solr.

What I am observing is that even after a segment has been fully processed
and an autoCommit (as well as an autoSoftCommit) has kicked in, the segment
with 0 live docs is left behind, bloating the index. *Upon Solr restart,
the segment gets cleared successfully.*

I tried to replicate this without my code by indexing 3 docs into an empty
test core and then reindexing the same docs. In that case the older segment
gets deleted as soon as the softCommit interval hits or an explicit
commit=true is issued.

Here is the high-level code. It leaves undeleted segments behind until I
restart Solr and load the core again.

try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
     IndexReader reader = DirectoryReader.open(dir)) {
    for (LeafReaderContext lrc : reader.leaves()) {
        // read live docs from each leaf, create a SolrInputDocument
        // out of each Document, and index it using the Solr API
    }
} catch (Exception e) {
    log.error("Reindexing pass failed", e); // do not swallow the exception
}
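For completeness, the elided loop body is roughly the following. This is only a sketch of what my code does, simplified: it assumes the older `leaf.document(int)` stored-fields API, assumes every stored field can be copied via `stringValue()`, and uses an assumed `SolrClient` handle named `client` for the add.

```java
for (LeafReaderContext lrc : reader.leaves()) {
    LeafReader leaf = lrc.reader();
    Bits liveDocs = leaf.getLiveDocs();   // null means this leaf has no deletions
    for (int i = 0; i < leaf.maxDoc(); i++) {
        if (liveDocs != null && !liveDocs.get(i)) {
            continue;                     // skip deleted docs
        }
        Document doc = leaf.document(i);  // stored fields only
        SolrInputDocument sdoc = new SolrInputDocument();
        for (IndexableField f : doc.getFields()) {
            // simplified: real code must handle numeric/binary stored fields too
            sdoc.addField(f.name(), f.stringValue());
        }
        client.add(core.getName(), sdoc); // 'client' is an assumed SolrClient
    }
}
```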

Would opening an IndexReader this way interfere with how Solr manages
IndexReader and file refCounts, thereby preventing file deletion? What am I
missing?

Happy to provide more details if required, code or otherwise. Help would be
much appreciated!

Thanks,
Rahul
