[jira] [Commented] (SOLR-8944) Improve geospatial garbage generation

David Smiley (JIRA) Tue, 05 Apr 2016 09:21:57 -0700

    [ https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226561#comment-15226561 ]


David Smiley commented on SOLR-8944:
------------------------------------

What should be done is to enhance this query (and most other predicates) to use
{{DocIdSetBuilder}}. File an issue if you wish to pursue it; I think it'd be
easy. In LUCENE-6645, some performance testing of new spatial approaches that
also needed to build up a BitSet was done, and it showed that SparseFixedBitSet
caused a significant performance hit. DocIdSetBuilder has an internal sparse
sorted-array mode, which is used when the number of docs is less than 1/128th
of the total docs in a segment.
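The sparse-then-dense idea can be illustrated with a small, self-contained sketch. This is a simplified stand-in, not Lucene's actual DocIdSetBuilder, whose API and internals differ. The class name SparseThenDenseDocSet is hypothetical. It buffers doc IDs in an int[] until their count exceeds maxDoc/128, then moves them into a FixedBitSet-style long[] with one bit per document.

Here is a rough estimate using the reporter's numbers (86M docs/core, about 56k hits at the 99.9th percentile), assuming hits are spread roughly evenly across segments. The dense bitsets cost about 86,000,000 / 64 = 1,343,750 longs, roughly 10.75 MB per query. By contrast, 56k int doc IDs take about 224 KB. Also, 56k is far below the 1/128 threshold, which is about 671,875 docs for 86M.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified illustration of a sparse-then-dense doc ID collector.
 * NOT Lucene's DocIdSetBuilder: doc IDs go into a growable int[] until there are
 * more than maxDoc/128 of them, then everything moves into a long[] bitset laid
 * out like FixedBitSet (one bit per document in the segment).
 */
final class SparseThenDenseDocSet {
    private final int maxDoc;
    private final int threshold;   // maxDoc >>> 7, i.e. 1/128th of the segment
    private int[] buffer = new int[16];
    private int bufferSize = 0;
    private long[] bits;           // null while still in sparse mode

    SparseThenDenseDocSet(int maxDoc) {
        this.maxDoc = maxDoc;
        this.threshold = maxDoc >>> 7;
    }

    /** Number of 64-bit words needed for maxDoc bits. */
    static int wordsFor(int maxDoc) {
        return ((maxDoc - 1) >> 6) + 1;
    }

    void add(int doc) {
        if (doc < 0 || doc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + doc);
        }
        if (bits != null) {                 // dense mode: set one bit
            bits[doc >> 6] |= 1L << doc;    // Java masks a long shift count to 6 bits
            return;
        }
        if (bufferSize == buffer.length) {  // sparse mode: grow the int[] by doubling
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[bufferSize++] = doc;
        if (bufferSize > threshold) {
            upgradeToDense();
        }
    }

    private void upgradeToDense() {
        bits = new long[wordsFor(maxDoc)];
        for (int i = 0; i < bufferSize; i++) {
            int doc = buffer[i];
            bits[doc >> 6] |= 1L << doc;
        }
        buffer = null;                      // the sparse buffer becomes garbage here
        bufferSize = 0;
    }

    boolean isDense() {
        return bits != null;
    }

    /** Sorted, de-duplicated doc IDs in either mode. */
    int[] toSortedDocs() {
        if (bits == null) {
            int[] sorted = Arrays.copyOf(buffer, bufferSize);
            Arrays.sort(sorted);
            int n = 0;
            for (int doc : sorted) {
                if (n == 0 || doc != sorted[n - 1]) {
                    sorted[n++] = doc;      // compact duplicates in place
                }
            }
            return Arrays.copyOf(sorted, n);
        }
        int count = 0;
        for (long w : bits) {
            count += Long.bitCount(w);
        }
        int[] out = new int[count];
        int k = 0;
        for (int word = 0; word < bits.length; word++) {
            long w = bits[word];
            while (w != 0) {
                out[k++] = (word << 6) + Long.numberOfTrailingZeros(w);
                w &= w - 1;                 // clear the lowest set bit
            }
        }
        return out;
    }
}
```

In this sketch, duplicates count toward the threshold while in sparse mode. They are only removed when the doc IDs are read back, which keeps `add` cheap.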

I hope that helps enough and we can stop there. I don't like the idea of
adding complexity to re-use FixedBitSets. Instead, perhaps more could be done
to enhance the cacheability of your spatial queries. I've thought of perhaps
using {{TermQueryPrefixTreeStrategy}} with a very coarse, approximate, and thus
more cacheable filter, combined with a non-cached Solr post-filter (perhaps
using LatLonType) for the precise check. LatLonType _can_ be slow, but using
projected (2D) space instead of the surface of a sphere might help a lot if
your data isn't world-wide.
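Here is one way to picture the coarse-filter-plus-post-filter suggestion, as a plain-Java sketch rather than Solr configuration. The class TwoPhaseGeoFilter, the degree-based grid, and the equirectangular projection are all assumptions made for illustration. None of this reflects TermQueryPrefixTreeStrategy or LatLonType internals.

- **Phase 1** snaps the query circle's bounding box outward to whole grid cells. Many nearby queries therefore produce the same key, so a filter built from that key could be cached.
- **Phase 2** runs per candidate document and is not cached. It compares squared Euclidean distance on coordinates projected once (for example at index time) around a fixed reference latitude, so no trigonometry runs per document.

```java
import java.util.List;

/**
 * Hypothetical two-phase spatial filter (plain Java, not a Solr/Lucene API):
 * phase 1 is a coarse cell-range test whose key is shared by many queries
 * (cacheable); phase 2 is an uncached distance check done in a projected
 * 2D plane instead of on the sphere.
 */
final class TwoPhaseGeoFilter {
    static final double EARTH_RADIUS_KM = 6371.0;

    final double cellDeg;    // coarse grid cell size in degrees, e.g. 1.0
    final double refLatRad;  // fixed reference latitude of the (regional) data set

    TwoPhaseGeoFilter(double cellDeg, double refLatDeg) {
        this.cellDeg = cellDeg;
        this.refLatRad = Math.toRadians(refLatDeg);
    }

    /** Equirectangular projection to km; only reasonable for data near refLat. */
    double[] project(double latDeg, double lonDeg) {
        double x = EARTH_RADIUS_KM * Math.toRadians(lonDeg) * Math.cos(refLatRad);
        double y = EARTH_RADIUS_KM * Math.toRadians(latDeg);
        return new double[] {x, y};
    }

    int cell(double deg) {
        return (int) Math.floor(deg / cellDeg);
    }

    /**
     * Coarse key: the circle's bounding box snapped outward to whole cells.
     * Ignores the antimeridian; latitude is capped at 89 degrees so the
     * longitude widening never divides by ~0 near the poles.
     */
    List<Integer> coarseKey(double latDeg, double lonDeg, double radiusKm) {
        double dLat = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        double maxAbsLat = Math.min(89.0, Math.abs(latDeg) + dLat); // widest row of the box
        double dLon = dLat / Math.cos(Math.toRadians(maxAbsLat));
        return List.of(cell(latDeg - dLat), cell(latDeg + dLat),
                       cell(lonDeg - dLon), cell(lonDeg + dLon));
    }

    /** Phase 1: does the point's cell fall inside the key's cell range? */
    boolean coarseMatch(List<Integer> key, double latDeg, double lonDeg) {
        int latCell = cell(latDeg);
        int lonCell = cell(lonDeg);
        return latCell >= key.get(0) && latCell <= key.get(1)
            && lonCell >= key.get(2) && lonCell <= key.get(3);
    }

    /** Phase 2: squared Euclidean distance on pre-projected coordinates. */
    static boolean withinProjected(double[] docXY, double[] centerXY, double radiusKm) {
        double dx = docXY[0] - centerXY[0];
        double dy = docXY[1] - centerXY[1];
        return dx * dx + dy * dy <= radiusKm * radiusKm;
    }
}
```

The projection distorts distances increasingly as points move away from the reference latitude. That is why this trade only pays off for regional data, not world-wide data.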

> Improve geospatial garbage generation
> -------------------------------------
>
>                 Key: SOLR-8944
>                 URL: https://issues.apache.org/jira/browse/SOLR-8944
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Jeff Wartes
>              Labels: spatialrecursiveprefixtreefieldtype
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index.
> (Solr 5.4, 86M docs/core, 99.9th-percentile hit count of 56k with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the
> backtraces, it appears the biggest source of FixedBitSet creation in my case
> (by two orders of magnitude) is my use of queries that involve geospatial
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
> which presumably changes less frequently than queries are issued. If an 
> existing FixedBitSet were not available from a pool, the worst case (create a 
> new one) would be no worse than the current behavior. The complication would 
> be enforcement around when to return the object to the pool, but it looks 
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts 
> considerable effort into allocating smaller chunks only as necessary. Is this 
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little 
> more data around the current choices before choosing an approach.
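Option 1 could look roughly like the following sketch. BitSetWordsPool is hypothetical, not a Lucene or Solr class. It pools the long[] arrays that back FixedBitSet-style bitsets, keyed by maxDoc, and a pool miss simply allocates, which matches current behavior.

The sketch also shows where the costs move:

- Each reuse must clear O(maxDoc/64) words.
- Pooled arrays stay live, likely in the old generation.
- Correctness depends on every caller releasing an array exactly once and never touching it afterwards. This is the lifecycle enforcement the report mentions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical pool of long[] bitset backing arrays keyed by maxDoc.
 * Not a Lucene/Solr API; illustrates the "persistent object pool" option only.
 */
final class BitSetWordsPool {
    private final Map<Integer, ArrayDeque<long[]>> free = new HashMap<>();
    private final int maxPerSize;  // cap on retained arrays per maxDoc

    BitSetWordsPool(int maxPerSize) {
        this.maxPerSize = maxPerSize;
    }

    /** Number of 64-bit words needed for maxDoc bits. */
    static int wordsFor(int maxDoc) {
        return ((maxDoc - 1) >> 6) + 1;
    }

    /** Borrow a zeroed array for maxDoc; allocates only when none is pooled. */
    synchronized long[] borrow(int maxDoc) {
        ArrayDeque<long[]> q = free.get(maxDoc);
        long[] words = (q == null) ? null : q.pollFirst();
        if (words == null) {
            return new long[wordsFor(maxDoc)];  // worst case: same as allocating today
        }
        Arrays.fill(words, 0L);                 // reuse is not free: O(maxDoc/64) clearing
        return words;
    }

    /** Return an array; the caller must never touch it again (the hard part). */
    synchronized void release(int maxDoc, long[] words) {
        if (words.length != wordsFor(maxDoc)) {
            throw new IllegalArgumentException("array size does not match maxDoc");
        }
        ArrayDeque<long[]> q = free.computeIfAbsent(maxDoc, k -> new ArrayDeque<>());
        if (q.size() < maxPerSize) {
            q.addFirst(words);                  // otherwise drop it and let GC reclaim it
        }
    }

    synchronized int pooled(int maxDoc) {
        ArrayDeque<long[]> q = free.get(maxDoc);
        return q == null ? 0 : q.size();
    }
}
```

Segment maxDoc values change as segments merge, so entries for retired sizes would also need eviction. This sketch omits that.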



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
