[
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicholas Knize updated LUCENE-6422:
-----------------------------------
Attachment: LUCENE-6422.patch
Developed and tested against branch_5x. I still need to run more rigorous (and
systematic) benchmarks, but preliminary tests show a 90% reduction in index
size on exotic (high-precision) cases.
For one particular shape (the political boundary of Wales), QuadPrefixTree
(w/ RecursivePrefixTreeStrategy) consumed 1 GB of memory at TreeLevel 26 with
distance_error_pct: 0. The new PackedQuadPrefixTree brought this down to just
over 80 MB at the same precision.
Many improvements remain (including using a variable-length byte array
instead of a fixed 8 bytes for even the lowest levels), but this provides
initial progress that should open the door to better precision on extreme
shapes.
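
To make the 8-byte idea concrete, here is a minimal sketch of packing a
quad cell path into a single long. The bit layout (2 bits per level for up to
29 levels, then 5 bits for the level and 1 leaf bit) and all names are
assumptions made for illustration; they are not taken from the attached patch.
If a string term spends one byte per tree level, a level-26 cell costs 26
bytes, whereas a packed cell like this one always costs 8.

```java
// Illustrative sketch only: the layout below is an assumption, not the patch's.
//   bits 63..6 : up to 29 quadrant codes, 2 bits each, first level highest
//   bits 5..1  : level of the cell (0..29)
//   bit  0     : leaf flag
final class PackedQuadCellSketch {
    static final int MAX_LEVELS = 29;

    // Packs a path of quadrant codes (each 0..3) into one 64-bit term.
    static long pack(int[] quadrants, boolean leaf) {
        if (quadrants.length > MAX_LEVELS) {
            throw new IllegalArgumentException("too many levels: " + quadrants.length);
        }
        long term = 0L;
        for (int i = 0; i < quadrants.length; i++) {
            int q = quadrants[i];
            if (q < 0 || q > 3) {
                throw new IllegalArgumentException("quadrant out of range: " + q);
            }
            term |= ((long) q) << (62 - 2 * i);
        }
        term |= ((long) quadrants.length) << 1;
        if (leaf) {
            term |= 1L;
        }
        return term;
    }

    // Number of levels encoded in the term.
    static int level(long term) {
        return (int) ((term >>> 1) & 0x1F);
    }

    static boolean isLeaf(long term) {
        return (term & 1L) != 0;
    }

    // Quadrant code at the given depth (0-based).
    static int quadrantAt(long term, int depth) {
        return (int) ((term >>> (62 - 2 * depth)) & 3L);
    }
}
```

A fixed layout like this wastes space on shallow cells, which is the motivation
for the variable-length encoding mentioned above.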
> Add StreamingQuadPrefixTree
> ---------------------------
>
> Key: LUCENE-6422
> URL: https://issues.apache.org/jira/browse/LUCENE-6422
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Affects Versions: 5.x
> Reporter: Nicholas Knize
> Attachments: LUCENE-6422.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to
> represent QuadCells and GeoHash cells, yielding 1 byte per QuadCell and 5
> bits per GeoHash cell, respectively. To create the terms representing a
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to
> invert(), which creates a second, lexicographically sorted array of Terms.
> This doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to
> accomplish the following:
> 1. Create a packed 8-byte representation for a QuadCell
> 2. Build the packed cells 'on demand' when incrementToken is called
> Possible improvements over this approach include generating the packed cells
> using an AutoPrefixAutomaton.
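
The 'on demand' idea can be sketched with a plain java.util.Iterator standing
in for the token stream. This is an illustration under stated assumptions, not
the patch's implementation: the class name, the string cell paths, and the
intersects predicate (a placeholder for the real shape relation test) are all
hypothetical. Cells are produced one at a time by a depth-first walk, so only a
small stack of pending cells is held in memory rather than a full ArrayList.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Illustrative sketch only: lazily yields quad cell paths such as "1", "10".
final class LazyQuadCellIterator implements Iterator<String> {
    private final int maxLevel;
    private final Predicate<String> intersects; // hypothetical shape test
    private final Deque<String> stack = new ArrayDeque<>();
    private String next;

    LazyQuadCellIterator(int maxLevel, Predicate<String> intersects) {
        this.maxLevel = maxLevel;
        this.intersects = intersects;
        pushChildren("");
        advance();
    }

    // Push in reverse so quadrant '0' is popped first; a pre-order walk with
    // ordered children yields the paths in lexicographic order.
    private void pushChildren(String parent) {
        for (char q = '3'; q >= '0'; q--) {
            stack.push(parent + q);
        }
    }

    // Finds the next intersecting cell, pruning subtrees that do not match.
    private void advance() {
        next = null;
        while (!stack.isEmpty()) {
            String cell = stack.pop();
            if (!intersects.test(cell)) {
                continue;
            }
            if (cell.length() < maxLevel) {
                pushChildren(cell);
            }
            next = cell;
            return;
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String result = next;
        advance();
        return result;
    }
}
```

Because at most four sibling cells are pending per level, the stack stays
proportional to the tree depth instead of to the number of cells emitted.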