[jira] [Commented] (SOLR-8944) Improve geospatial garbage generation

Jeff Wartes (JIRA) Tue, 05 Apr 2016 12:16:37 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226948#comment-15226948
 ]


Jeff Wartes commented on SOLR-8944:
-----------------------------------

I hadn't refreshed and didn't see this comment before I added mine, but thanks 
for the info, I appreciate the references and context. I'll take a look at what 
would be involved with DocIdSetBuilder.

I also feel like I should mention though, that class will be the third case of 
a hardcoded magic fraction of maxDoc I've come across in the context of 
investigating allocations this last week. It might be worth considering whether 
the gyrations around avoiding the creation of these BitSets is more or less 
complicated than managing a pool would be.

> Improve geospatial garbage generation
> -------------------------------------
>
>                 Key: SOLR-8944
>                 URL: https://issues.apache.org/jira/browse/SOLR-8944
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Jeff Wartes
>              Labels: spatialrecursiveprefixtreefieldtype
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index. 
> (5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal 
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the 
> backtraces, it appears the biggest source of FixBitSet creation in my case 
> (by two orders of magnitude) is my use of queries that involve geospatial 
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
> which presumably changes less frequently than queries are issued. If an 
> existing FixedBitSet were not available from a pool, the worst case (create a 
> new one) would be no worse than the current behavior. The complication would 
> be enforcement around when to return the object to the pool, but it looks 
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts 
> considerable effort into allocating smaller chunks only as necessary. Is this 
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little 
> more data around the current choices before choosing an approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-8944) Improve geospatial garbage generation

Reply via email to