[
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated LUCENE-6422:
---------------------------------
Attachment: LUCENE-6422_with_SPT_factory_and_benchmark.patch
See this new patch.
* Fixed compilation against Java 7
* Added SpatialPrefixTreeFactory support so this tree can be chosen by-name (in
e.g. the benchmark module or Solr for that matter)
* Added benchmark module support by enhancing SpatialDocMaker (which uses the
aforementioned factory).
* Included a tweaked spatial.alg
I did some benchmarking; it's pretty quick & rough right now. I set max levels
to 20 (chosen arbitrarily; with 27 it ran out of memory on a 2GB heap), used a
distErrPct of 0.0, indexed random circles up to 3 decimal degrees (a few
hundred km or so), and disabled leafy branch pruning to compare apples to
apples.
I ran it with "quad" and "packedQuad" with the same settings otherwise.
* Index size: Quad: 1.4GB, PackedQuad: 1.6GB
* Index time: Quad: 1.46 rec/sec, PackedQuad: 1.91 rec/sec
* Query time: Quad: 6.35 rec/sec, PackedQuad: 7.21 rec/sec
* The benchmark module shows average memory use, but I always take that with
a grain of salt. It seems PackedQuad *might* use a little more memory during
indexing and less during search. Shrug.
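
For a rough sense of scale, the relative differences can be worked out from the
figures above. This is only a sketch: the class and method names are made up
for illustration, and it assumes rec/sec is a throughput where higher is
better.

```java
/** Hypothetical helper; not part of the patch or the benchmark module. */
final class BenchmarkDelta {
    /** Percent change going from the baseline (Quad) to the candidate (PackedQuad). */
    static double pctChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the list above: Quad first, PackedQuad second.
        System.out.printf("index size (GB):      %+.1f%%%n", pctChange(1.4, 1.6));
        System.out.printf("index rate (rec/sec): %+.1f%%%n", pctChange(1.46, 1.91));
        System.out.printf("query rate (rec/sec): %+.1f%%%n", pctChange(6.35, 7.21));
    }
}
```

Under that assumption, PackedQuad comes out about 14% larger on disk, with about
31% higher indexing throughput and about 14% higher query throughput.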
I was skeptical there would be index size savings and the benchmark shows there
aren't any. Please prove me wrong, Nick! I like the indexing & query speed
improvements -- not surprised given the nice code here without the ugly
recursion that was in legacy Quad.
off to bed now...
> Add StreamingQuadPrefixTree
> ---------------------------
>
> Key: LUCENE-6422
> URL: https://issues.apache.org/jira/browse/LUCENE-6422
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Affects Versions: 5.x
> Reporter: Nicholas Knize
> Attachments: LUCENE-6422.patch,
> LUCENE-6422_with_SPT_factory_and_benchmark.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to
> represent QuadCells and GeoHash cells, yielding 1 byte per QuadCell and 5
> bits per GeoHash cell, respectively. To create the terms representing a
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to
> invert() which creates a second lexicographically sorted array of Terms. This
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to
> accomplish the following:
> 1. Create a packed 8-byte representation for a QuadCell
> 2. Build the Packed cells 'on demand' when incrementToken is called
> Possible improvements to this approach include generating the packed cells
> using an AutoPrefixAutomaton.
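
To make the two numbered points in the description concrete, here is a minimal
sketch of packing a quad cell into one 8-byte long and producing the terms one
at a time instead of collecting them in an ArrayList first. The bit layout (2
bits per level, 5 level bits, 1 leaf bit) and all names are assumptions chosen
for illustration; this is not the code in the attached patch and does not use
any Lucene classes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Illustrative only: one possible 64-bit layout, not necessarily the patch's. */
final class PackedQuadSketch {
    // 29 levels * 2 bits = 58 path bits, plus 5 level bits and 1 leaf bit = 64.
    static final int MAX_LEVELS = 29;

    /** Packs the first {@code length} quadrant choices (0..3, root first) into a long. */
    static long pack(int[] quadrants, int length, boolean leaf) {
        if (length < 0 || length > MAX_LEVELS || length > quadrants.length) {
            throw new IllegalArgumentException("bad length: " + length);
        }
        long term = 0L;
        for (int i = 0; i < length; i++) {
            int q = quadrants[i];
            if (q < 0 || q > 3) {
                throw new IllegalArgumentException("bad quadrant: " + q);
            }
            term |= ((long) q) << (62 - 2 * i); // path fills the high bits, root first
        }
        term |= ((long) length) << 1;           // level stored in bits 1..5
        return leaf ? (term | 1L) : term;       // leaf flag stored in bit 0
    }

    static int level(long term) { return (int) ((term >>> 1) & 0x1F); }

    static boolean isLeaf(long term) { return (term & 1L) != 0; }

    static int quadrantAt(long term, int depth) {
        return (int) ((term >>> (62 - 2 * depth)) & 3L);
    }

    /** Yields the term for each level of the path lazily, one per next() call. */
    static Iterator<Long> streamTerms(final int[] quadrants) {
        return new Iterator<Long>() {
            private int nextLevel = 1;

            public boolean hasNext() { return nextLevel <= quadrants.length; }

            public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                boolean leaf = nextLevel == quadrants.length; // only the deepest cell is a leaf
                return pack(quadrants, nextLevel++, leaf);
            }

            public void remove() { throw new UnsupportedOperationException(); }
        };
    }
}
```

A consumer shaped like incrementToken would call next() once per token, so at
most one packed cell from this iterator is live at a time; how the real token
stream is wired up is outside the scope of this sketch.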