[
https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226561#comment-15226561
]
David Smiley commented on SOLR-8944:
------------------------------------
What should be done is to enhance this query (and most other predicates) to use
{{DocIdSetBuilder}}. File an issue if you wish to pursue it; I think it would be
easy. In LUCENE-6645, performance testing of some new spatial approaches
that also needed to build up a BitSet showed that
SparseFixedBitSet caused a significant performance hit. DocIdSetBuilder has an
internal sparse sorted array mode which is used when the number of docs is less
than 1/128th of the total docs in a segment.
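To make the sparse-then-dense idea concrete, here is a minimal, self-contained sketch. It is not Lucene's {{DocIdSetBuilder}} and does not use its API; the class name SparseThenDenseSketch and all of its methods are hypothetical. The only detail taken from the comment above is the 1/128th switch-over point. Everything else (initial buffer size, doubling growth, sorting on read) is an assumption made for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene's DocIdSetBuilder): buffer doc IDs in an
 * int[] while the set is small, and switch to a dense long[] bitset once the
 * number of buffered IDs reaches maxDoc / 128.
 */
final class SparseThenDenseSketch {
    private final int maxDoc;
    private final int threshold;        // maxDoc / 128
    private int[] buffer = new int[16]; // sparse mode storage; sorted on read
    private int count;                  // buffered IDs, duplicates included
    private long[] bits;                // dense mode storage; null while sparse

    SparseThenDenseSketch(int maxDoc) {
        this.maxDoc = maxDoc;
        this.threshold = maxDoc >>> 7;
    }

    void add(int doc) {
        if (bits == null && count >= threshold) {
            upgradeToDense();
        }
        if (bits != null) {
            bits[doc >>> 6] |= 1L << (doc & 63);
            return;
        }
        if (count == buffer.length) {
            // Double the buffer, but never beyond the switch-over point.
            buffer = Arrays.copyOf(buffer, Math.min(threshold, count * 2));
        }
        buffer[count++] = doc;
    }

    private void upgradeToDense() {
        bits = new long[(maxDoc + 63) >>> 6]; // one bit per document
        for (int i = 0; i < count; i++) {
            bits[buffer[i] >>> 6] |= 1L << (buffer[i] & 63);
        }
        buffer = null;
    }

    boolean isDense() {
        return bits != null;
    }

    /** Bytes held by the backing array only (object headers ignored). */
    long payloadBytes() {
        return bits != null ? 8L * bits.length : 4L * buffer.length;
    }

    /** Returns the collected doc IDs, sorted and without duplicates. */
    int[] toSortedArray() {
        if (bits == null) {
            int[] copy = Arrays.copyOf(buffer, count);
            Arrays.sort(copy);
            int n = 0;
            for (int i = 0; i < copy.length; i++) {
                if (n == 0 || copy[i] != copy[n - 1]) {
                    copy[n++] = copy[i];
                }
            }
            return Arrays.copyOf(copy, n);
        }
        int cardinality = 0;
        for (long word : bits) {
            cardinality += Long.bitCount(word);
        }
        int[] out = new int[cardinality];
        int n = 0;
        for (int i = 0; i < bits.length; i++) {
            long word = bits[i];
            while (word != 0) {
                out[n++] = (i << 6) + Long.numberOfTrailingZeros(word);
                word &= word - 1; // clear the lowest set bit
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sizes taken from the issue description: 86M docs, 56k hits.
        SparseThenDenseSketch sketch = new SparseThenDenseSketch(86_000_000);
        for (int i = 0; i < 56_000; i++) {
            sketch.add(i * 1000);
        }
        System.out.println("dense=" + sketch.isDense()
            + " payloadBytes=" + sketch.payloadBytes());
    }
}
```

With the figures from the issue description, a dense bitset over 86M docs needs roughly 10.75 MB of long[] per allocation, while 56k hits is well under the 1/128th switch-over point (about 672k docs), so a builder along these lines would never allocate the full-size array for such a query.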
I hope that helps enough and we can stop there. I don't like the idea of
adding complexity to re-use FixedBitSets. Instead... perhaps more could be
done to enhance the cache-ability of your spatial queries. One idea I've had is
to use {{TermQueryPrefixTreeStrategy}} as a very coarse/approximate, and thus
more cacheable, filter, combined with a non-cached Solr post-filter, perhaps
using LatLonType. LatLonType _can_ be slow, but using projected space
(2D) instead of surface-of-sphere might help a lot if your data isn't
world-wide.
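As a rough illustration of why projected 2D space can be cheaper for regional data, the sketch below compares a surface-of-sphere (haversine) distance with a planar distance after a simple equirectangular projection. This is plain math, not how LatLonType or any Solr field type is implemented; the class name DistanceSketch, the earth radius constant, and the choice of a fixed reference latitude are all assumptions for the example.

```java
/**
 * Hypothetical sketch: spherical vs. projected planar distance.
 * Not Solr or Lucene code.
 */
final class DistanceSketch {
    static final double EARTH_RADIUS_KM = 6371.0088; // assumed mean radius

    /** Great-circle distance; several trig calls per pair of points. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
            + Math.cos(phi1) * Math.cos(phi2)
            * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Equirectangular projection around a fixed reference latitude.
     * This can be done once per document, for example at index time.
     */
    static double[] project(double lat, double lon, double refLat) {
        double x = EARTH_RADIUS_KM * Math.toRadians(lon) * Math.cos(Math.toRadians(refLat));
        double y = EARTH_RADIUS_KM * Math.toRadians(lat);
        return new double[] {x, y};
    }

    /** Planar distance between two projected points; no trig at query time. */
    static double planarKm(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    public static void main(String[] args) {
        double refLat = 46.5; // assumed centre latitude of a regional data set
        double[] p = project(47.6062, -122.3321, refLat);
        double[] q = project(45.5152, -122.6784, refLat);
        System.out.println("spherical km: "
            + haversineKm(47.6062, -122.3321, 45.5152, -122.6784));
        System.out.println("planar km:    " + planarKm(p, q));
    }
}
```

The planar form only stays accurate while the data sits reasonably close to the reference latitude, which is why this only helps if your data isn't world-wide.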
> Improve geospatial garbage generation
> -------------------------------------
>
> Key: SOLR-8944
> URL: https://issues.apache.org/jira/browse/SOLR-8944
> Project: Solr
> Issue Type: Improvement
> Reporter: Jeff Wartes
> Labels: spatialrecursiveprefixtreefieldtype
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index.
> (5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the
> backtraces, it appears the biggest source of FixedBitSet creation in my case
> (by two orders of magnitude) is my use of queries that involve geospatial
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc,
> which presumably changes less frequently than queries are issued. If an
> existing FixedBitSet were not available from a pool, the worst case (create a
> new one) would be no worse than the current behavior. The complication would
> be enforcement around when to return the object to the pool, but it looks
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts
> considerable effort into allocating smaller chunks only as necessary. Is this
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little
> more data around the current choices before choosing an approach.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)