Likely what happened is you had a bunch of smaller segments, and then
suddenly they got merged into that one big segment (_aiaz) in your
index.

The representation for norms in particular is not sparse, so the
size of the norms file for a given segment will be
(number of unique indexed fields) x (number of documents) bytes:
one byte per field per document, even for documents that don't have
that field.

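So this count grows quadratically on merge: the merged segment holds
the union of the field names and the sum of the documents, so its
norms can be far larger than the norms of the original segments added
together.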
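
As a rough back-of-the-envelope sketch of that growth (assuming one
norm byte per document per indexed field, and the worst case where the
segments being merged share no field names; the class, method names and
segment sizes below are made up for illustration and are not part of
Lucene):

```java
// Back-of-the-envelope estimate only; not Lucene code.
// Assumption: norms take one byte per document for every indexed field
// present in a segment, whether or not a given document has that field.
final class NormsSizeSketch {

    /** Estimated norms size in bytes for one segment. */
    static long normsBytes(long uniqueIndexedFields, long docs) {
        return uniqueIndexedFields * docs;
    }

    /**
     * Estimated norms size after merging `segments` equal-sized segments,
     * assuming (worst case) that no field name is shared between segments.
     */
    static long mergedNormsBytes(int segments, long fieldsPerSegment, long docsPerSegment) {
        return normsBytes(segments * fieldsPerSegment, segments * docsPerSegment);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 10 segments, each with 30,000 fields and 100,000 docs.
        int segments = 10;
        long fields = 30_000L;
        long docs = 100_000L;

        long before = segments * normsBytes(fields, docs);
        long after = mergedNormsBytes(segments, fields, docs);

        System.out.println("total norms before merge: " + before + " bytes");
        System.out.println("norms after merge:        " + after + " bytes");
        System.out.println("growth factor:            " + (after / before) + "x");
    }
}
```

Under those assumptions the merged segment's norms come out at
`segments` times the pre-merge total. With a single shared field the
estimate would just be one byte per document, regardless of merging.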

Do these fields really need to be indexed? If so, it'd be better to
use a single field, shared by all users, for the indexable text if you
can.

Failing that, a simple workaround is to set maxMergeMB/maxMergeDocs on
the merge policy; this'd prevent big segments from being produced.
Disabling norms should also work around this, though that will affect
hit scores...

Mike

On Wed, Nov 3, 2010 at 7:37 PM, Mark Kristensson
<mark.kristens...@smartsheet.com> wrote:
> Yes, we do have a large number of unique field names in that index, because 
> they are driven by user named fields in our application (with some cleaning 
> to remove illegal chars).
>
> This slowness problem has appeared very suddenly in the last couple of weeks 
> and the number of unique field names has not spiked in the last few weeks. 
> Have we crept over some threshold with our linear growth in the number of 
> unique field names? Perhaps there is a limit driven by the amount of RAM in 
> the machine that we are violating? Are there any guidelines for the maximum 
> number, or suggested number, of unique field names in an index or segment? 
> Any suggestions for potentially mitigating the problem?
>
> Thanks,
> Mark
>
>
> On Nov 3, 2010, at 2:02 PM, Michael McCandless wrote:
>
>> On Wed, Nov 3, 2010 at 4:27 PM, Mark Kristensson
>> <mark.kristens...@smartsheet.com> wrote:
>>>
>>> I've run checkIndex against the index and the results are below. The net 
>>> is that it's telling me nothing is wrong with the index.
>>
>> Thanks.
>>
>>> I did not have any instrumentation around the opening of the IndexSearcher 
>>> (we don't use an IndexReader), just around the actual query execution so I 
>>> had to add some additional logging. What I found surprised me: opening a 
>>> search against this index takes the same 6 to 8 seconds that closing the 
>>> indexWriter takes.
>>
>> IndexWriter opens a SegmentReader for each segment in the index, to
>> apply deletions, so I think this is the source of the slowness.
>>
>> From the CheckIndex output, it looks like you have many (296,713)
>> unique field names on that one large segment -- does that sound
>> right?  I suspect such a very high field count is the source of the
>> slowness...
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
