Re: IndexWriter.close() performance issue

Mark Kristensson Tue, 23 Nov 2010 14:23:11 -0800

I've tried the suggestion below, but it really doesn't seem to have any impact. 
I guess that's not surprising since 80% of the CPU time when I ran hprof was in 
String.intern(), not in the StringHelper class.


Clearly, if I'm going to hack things up at this level, I've got some work do 
to, including:
- Synchronizing access to the storage mechanism I use for the strings
- Using something else (String?) for the hashCode (instead of the String's 
built-in integer hashcode).

Could I not use the HashMap solution I proposed to ensure that all field name 
strings that are the same refer to the same instance of the String? Does a 
HashMap.get() not return the same instance with every call? 

I'm also intrigued by a comment from Mike about FieldInfos and using those to 
control the uniqueness. Any suggestions on how I go about doing that? Would 
that be instead of monkeying with StringHelper or in addition to it?

Thanks,
Mark



On Nov 20, 2010, at 5:44 AM, Yonik Seeley wrote:

> On Fri, Nov 19, 2010 at 5:41 PM, Mark Kristensson
> <mark.kristens...@smartsheet.com> wrote:
>> Here's the changes I made to org.apache.lucene.util.StringHelper:
>> 
>>  //public static StringInterner interner = new SimpleStringInterner(1024,8);
> 
> As Mike said, the real fix for trunk is to get rid of interning.
> But for your version, you could try making the string intern cache larger.
> 
> StringHelper.interner = new SimpleStringInterner(300000,8);
> 
> -Yonik
> http://www.lucidimagination.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: IndexWriter.close() performance issue

Reply via email to