Thanks for the suggestion, Erick. I created a JIRA and moved the patch
to SVN, just to be safe. [1]

--Gregg

[1] https://issues.apache.org/jira/browse/SOLR-3514

On Wed, Jun 6, 2012 at 2:35 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Hmmm, it would be better to open a Solr JIRA and attach this as a patch.
> Although we've had some folks provide a Git-based rather than an SVN-based
> patch.
>
> Anyone can open a JIRA, but you must create a signon to do that. It'd get more
> attention that way....
>
> Best
> Erick
>
> On Tue, Jun 5, 2012 at 2:19 PM, Gregg Donovan <gregg...@gmail.com> wrote:
> > We've encountered GC spikes at Etsy after adding new
> > ExternalFileFields a decent number of times. I was always a little
> > confused by this behavior -- isn't it just one big float[]? why does
> > that cause problems for the GC? -- but looking at the FileFloatSource
> > code a little more carefully, I wonder if this is due to using a
> > WeakHashMap that is only cleaned by GC or manual invocation of a
> > request handler.
> >
> > FileFloatSource stores a WeakHashMap containing <IndexReader,<float[]
> > or CreationPlaceholder>>. In the code[1], it mentions that the
> > implementation is modeled after the FieldCache implementation.
> > However, the FieldCacheImpl adds listeners for IndexReader close
> > events and uses those to purge its caches. [2] Should we be doing the
> > same in FileFloatSource?
> >
> > Here's a mostly untested patch[3] with a possible implementation.
> > There are probably better ways to do it (e.g. I don't love using
> > another WeakHashMap), but I found it tough to hook into the
> > IndexReader lifecycle without a) relying on classes other than
> > FileFloatSource b) changing the public API of FIleFloatSource or c)
> > changing the implementation too much.
> >
> > There is a RequestHandler inside of FileFloatSource
> > (ReloadCacheRequestHandler) that can be used to clear the cache
> > entirely[4], but this is sub-optimal for us for a few reasons:
> >
> > --It clears the entire cache. ExternalFileFields often take some
> > non-trivial time to load and we prefer to do so during SolrCore
> > warmups. Clearing the entire cache while serving traffic would likely
> > cause user-facing requests to timeout.
> > --It forces an extra commit with its consequent cache cycling, etc..
> >
> > I'm thinking of ways to monitor the size of FileFloatSource's cache to
> > track its size against GC pause times, but it seems tricky because
> > even calling WeakHashMap#size() has side-effects. Any ideas?
> >
> > Overall, what do you think? Does relying on GC to clean this cache
> > make sense as a possible cause of GC spikiness? If so, does the patch
> > [3] look like a decent approach?
> >
> > Thanks!
> >
> > --Gregg
> >
> > [1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135
> > [2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166
> > [3] https://gist.github.com/2876371
> > [4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310

Reply via email to