[
https://issues.apache.org/jira/browse/LUCENE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285809#comment-14285809
]
David Smiley commented on LUCENE-6191:
--------------------------------------
I have some performance numbers taken while working on SOLR-7005. I took a
geonames data set of 8,552,952 docs and I indexed the latitude & longitude into
a quad prefixTree with a maximum resolution of a meter, with geo=false and
world bounds of -180 to 180, -90 to 90 (the standard geodetic degree boundaries).
That's a screw-up on my part; I forgot to use 360x360 to get square grid boxes
instead of rectangular ones. But that's not pertinent here. The index size is
2.6GB, which is kind of large. Coarsening the maximum resolution to something
larger than a meter would decrease the index size a lot. This reminds me of how beneficial
the forthcoming "flex" prefixTree will be, but I digress. This data is all
points.
Base stats:
* Machine: my SSD based recent MacBook Pro, Java 8
* Lucene/Solr: trunk as of last night
* Docs: 8,552,952
* Segments: 1
* Disk index size: 2.6GB
* QuadTree:
** precision: 26 (better than a meter)
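As a rough check on the "better than a meter" figure, the sketch below estimates the cell size at level 26. It assumes each quad tree level halves a cell along both axes (so 2^level cells per axis) and uses roughly 111,320 meters per degree at the equator. The class and method names are made up for illustration and are not part of Lucene.

```java
public class QuadCellSize {
    // Assumption: approximate meters per degree at the equator.
    public static final double METERS_PER_DEGREE = 111_320.0;

    /** Extent in degrees of one cell along an axis of the given span at the given level. */
    public static double cellDegrees(double worldSpanDegrees, int level) {
        // Assumption: every level splits a cell in two along each axis.
        return worldSpanDegrees / Math.pow(2, level);
    }

    /** Same extent converted to meters, measured at the equator. */
    public static double cellMeters(double worldSpanDegrees, int level) {
        return cellDegrees(worldSpanDegrees, level) * METERS_PER_DEGREE;
    }

    public static void main(String[] args) {
        int level = 26;
        System.out.printf("width  (360 degree span): %.3f m%n", cellMeters(360, level));
        System.out.printf("height (180 degree span): %.3f m%n", cellMeters(180, level));
    }
}
```

Under those assumptions a level-26 cell comes out at around 0.6 m across a 360-degree span and around 0.3 m across a 180-degree span, which is why the -180 to 180, -90 to 90 bounds give rectangular rather than square cells.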
512x512 heatmap, (_note: this is a whopping 262,144 cells_): 248ms (PNG to be
attached to SOLR-7005 soon).
Now filtered with an additional query down to 165 docs: 105ms (I figure this
fast number is due to a particular optimization in the prefix tree facet
counter for highly discriminating filters).
64x64 heatmap (4,096 cells): 105ms
Filtered to 165 docs: 21ms
I took one measurement when the index was un-optimized at 38 segments,
including 10K deleted docs (512x512 query all): 1800ms roughly. I should try
this again after I re-index with the square grid cells I want.
> Spatial 2D faceting (heatmaps)
> ------------------------------
>
> Key: LUCENE-6191
> URL: https://issues.apache.org/jira/browse/LUCENE-6191
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/spatial
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 5.1
>
> Attachments: LUCENE-6191__Spatial_heatmap.patch
>
>
> Lucene spatial's PrefixTree (grid) based strategies index data in a way
> highly amenable to faceting on grid cells to compute a so-called _heatmap_.
> The underlying code in this patch uses the PrefixTreeFacetCounter utility
> class, which was recently refactored out of faceting for NumberRangePrefixTree
> (LUCENE-5735). At a low level, the terms (== grid cells) are navigated
> per-segment, forward only with TermsEnum.seek, so it's pretty quick and
> furthermore requires no extra caches & no docvalues. Ideally you should use
> QuadPrefixTree (or Flex once it comes out) to maximize the number of grid levels,
> which in turn maximizes the fidelity of choices when you ask for a grid
> covering a region. Conveniently, the provided capability returns the data in
> a 2-D grid of counts, so the caller needn't know a thing about how the data
> is encoded in the prefix tree. Well almost... at this point they need to
> provide a grid level, but I'll soon provide a means of deriving the grid
> level based on a min/max cell count.
> I recommend QuadPrefixTree with geo=false so that you can provide a square
> world-bounds (360x360 degrees), which means square grid cells which are more
> desirable to display than rectangular cells.
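To make the "2-D grid of counts" and the idea of deriving a grid level from a maximum cell count more concrete, here is a minimal standalone sketch. It is not the patch's code and does not use PrefixTreeFacetCounter or any Lucene API: the counting is brute-force binning of raw points rather than a walk over indexed terms, and the level selection is just one possible approach that ignores how cells align with the region's edges. All names (`HeatmapSketch`, `levelForMaxCells`, `countPoints`) are hypothetical.

```java
public class HeatmapSketch {

    /**
     * Returns the deepest level (at least 1) whose cells covering the region
     * number at most maxCells. Assumes a square world of worldSpan degrees and
     * 2^level cells per axis; cell alignment with the region is ignored.
     */
    public static int levelForMaxCells(double worldSpan, double regionWidth,
                                       double regionHeight, int maxCells, int maxLevels) {
        int best = 1;
        for (int level = 1; level <= maxLevels; level++) {
            double cellSize = worldSpan / Math.pow(2, level);
            long cols = (long) Math.ceil(regionWidth / cellSize);
            long rows = (long) Math.ceil(regionHeight / cellSize);
            if (cols * rows > maxCells) {
                break; // deeper levels only produce more cells
            }
            best = level;
        }
        return best;
    }

    /**
     * Bins points (each {x, y}) into counts[row][col] over the given region,
     * with row 0 at minY. Points outside the region are skipped.
     */
    public static int[][] countPoints(double[][] points, double minX, double minY,
                                      double maxX, double maxY, int cols, int rows) {
        int[][] counts = new int[rows][cols];
        double cellW = (maxX - minX) / cols;
        double cellH = (maxY - minY) / rows;
        for (double[] p : points) {
            double x = p[0];
            double y = p[1];
            if (x < minX || x > maxX || y < minY || y > maxY) {
                continue;
            }
            // Clamp so points exactly on the max edge land in the last cell.
            int col = Math.min((int) ((x - minX) / cellW), cols - 1);
            int row = Math.min((int) ((y - minY) / cellH), rows - 1);
            counts[row][col]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical request: whole 360x360 world, at most 4,096 cells.
        int level = levelForMaxCells(360, 360, 360, 4096, 26);
        System.out.println("chosen level: " + level);

        double[][] points = { {0.5, 0.5}, {0.6, 0.4}, {1.5, 1.5} };
        int[][] counts = countPoints(points, 0, 0, 2, 2, 2, 2);
        System.out.println("bottom-left cell count: " + counts[0][0]);
    }
}
```

With a square 360x360 world and a region covering all of it, a cap of 4,096 cells selects the level that yields a 64x64 grid, and a cap of 262,144 cells selects the one that yields 512x512.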