I've tried the suggestion below, but it really doesn't seem to have any impact. I guess that's not surprising since 80% of the CPU time when I ran hprof was in String.intern(), not in the StringHelper class.
Clearly, if I'm going to hack things up at this level, I've got some work do to, including: - Synchronizing access to the storage mechanism I use for the strings - Using something else (String?) for the hashCode (instead of the String's built-in integer hashcode). Could I not use the HashMap solution I proposed to ensure that all field name strings that are the same refer to the same instance of the String? Does a HashMap.get() not return the same instance with every call? I'm also intrigued by a comment from Mike about FieldInfos and using those to control the uniqueness. Any suggestions on how I go about doing that? Would that be instead of monkeying with StringHelper or in addition to it? Thanks, Mark On Nov 20, 2010, at 5:44 AM, Yonik Seeley wrote: > On Fri, Nov 19, 2010 at 5:41 PM, Mark Kristensson > <mark.kristens...@smartsheet.com> wrote: >> Here's the changes I made to org.apache.lucene.util.StringHelper: >> >> //public static StringInterner interner = new SimpleStringInterner(1024,8); > > As Mike said, the real fix for trunk is to get rid of interning. > But for your version, you could try making the string intern cache larger. > > StringHelper.interner = new SimpleStringInterner(300000,8); > > -Yonik > http://www.lucidimagination.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org