Hello,

I'm a novice Lucene user and just started using it to do some prototyping
for my project.

I noticed that SortedSetDocValues, introduced in 4.3.0, allows faceted
search without a dedicated taxonomy index.  I've successfully used it to
perform faceting on a small index (~3000 documents, ~400 bytes per doc).
But when I loaded a bigger index (~50000 documents), I started getting an
ArrayIndexOutOfBoundsException when SortedSetDocValuesAccumulator performs
aggregation.

Specifically, it fails on line 139, where it tries to map segment ordinals
to global ordinals.  I poked around and did some debugging; here is what I
found.

The smaller index only had one segment when initially loaded, while the
bigger one had multiple.  My test suite consists of some searches on the
index with occasional updates to the index.  The error only happens when I
do a faceted search immediately following an update to the index.

Then I tried forcing a merge of the segments for the larger index as the
final step of initial indexing.  So when I initially loaded the index
afterwards, there was only one segment.  This time there were no errors,
even though it was the same set of documents.  Interestingly, even though
new segments are created as my test suite updates the index, no errors
crop up afterwards.  I should add that I've only seen the problem with 3 or
more segments, while 2 seems to work.  I don't know why this would be the
case, but these are my observations.
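
To make the ordinal mapping I mentioned concrete, here is a minimal sketch
in plain Java of how I understand it: each segment numbers its own sorted
terms from 0, and a per-segment array translates those numbers into
positions in the sorted union of all terms.  The class and method names
(OrdinalMapSketch, buildOrdinalMaps, toGlobalOrd) are hypothetical and this
is not Lucene's code.  The last part of main shows one way an
ArrayIndexOutOfBoundsException could arise, namely a mapping built for
fewer segments than are being looked up.  That is an assumption on my
part, not a confirmed diagnosis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Lucene.
class OrdinalMapSketch {

    // For each segment, build an array mapping segment ordinal -> global ordinal.
    // Each inner list holds one segment's unique terms in sorted order, so a
    // term's index in that list is its segment ordinal.
    static int[][] buildOrdinalMaps(List<List<String>> segmentTerms) {
        // The global term list is the sorted union of every segment's terms.
        TreeSet<String> union = new TreeSet<String>();
        for (List<String> terms : segmentTerms) {
            union.addAll(terms);
        }
        List<String> globalTerms = new ArrayList<String>(union);

        int[][] maps = new int[segmentTerms.size()][];
        for (int seg = 0; seg < segmentTerms.size(); seg++) {
            List<String> terms = segmentTerms.get(seg);
            maps[seg] = new int[terms.size()];
            for (int ord = 0; ord < terms.size(); ord++) {
                // Position of this segment's term in the global sorted list.
                maps[seg][ord] = Collections.binarySearch(globalTerms, terms.get(ord));
            }
        }
        return maps;
    }

    // Translate one segment-local ordinal into its global ordinal.
    static int toGlobalOrd(int[][] maps, int segment, int segmentOrd) {
        return maps[segment][segmentOrd];
    }

    public static void main(String[] args) {
        // Mapping built while the index has two segments.
        List<List<String>> twoSegments = Arrays.asList(
            Arrays.asList("blue", "red"),
            Arrays.asList("green", "red"));
        int[][] maps = buildOrdinalMaps(twoSegments);
        System.out.println(Arrays.deepToString(maps));

        // Assumed failure mode: a third segment appears after an update, but
        // the old two-segment mapping is still used for the lookup.
        try {
            toGlobalOrd(maps, 2, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("stale mapping failed: " + e.getMessage());
        }
    }
}
```

In this sketch, rebuilding the mapping whenever the set of segments changes
avoids the out-of-bounds lookup.  Whether something similar applies to the
real accumulator is exactly what I'm unsure about.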

Let me know if there is some standard way to report bugs that I should
follow.  I've checked out the JIRA page for Lucene, but it looked more like
a "find bugs, create issue, fix it, upload patch" workflow, where the issue
creator fixes the bug.  I have a long way to go before I understand the
low-level implementation well enough to apply a fix :(

Thanks
