[
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated LUCENE-6422:
---------------------------------
Attachment: LUCENE-6422.patch
Latest patch:
* Removed some redundant casts
* Moved PrefixTreeIterator.leaves to a local variable of pruned() since that's
the only place it was used.
* Implemented toString() so that the leafyPrune state could be printed
* Removed modifications to RecursivePrefixTreeStrategy -- an instanceof check
related to leafy branch pruning. PackedQuadPrefixTree does in fact support RPT
doing the leaf pruning, so no instanceof check is needed. It's up to the
user to have this tree do it or have RPT do it. The two aren't equivalent:
RPT's impl is recursive, whereas PQPT's impl applies only at the last level
(the level where it has the most benefit). I also modified the fuzzy test to
set the leafy branch prune option on each independently, just to test that it
works.
* Enhanced getTreeCellIterator/PrefixTreeIterator to honor the "detailLevel"
parameter instead of always going to maxLevels. This was a simple matter of
passing through this parameter to the iterator and renaming maxLevels to
detailLevel there.
* PQPT.Cell.toString: moved Long.numberOfLeadingZeros out of the loop, and
changed the O(N^2) String append to use a StringBuilder.
* Optimized compareToNoLeaf to simply compare the longs instead of converting
to bytes and comparing byte by byte. I put in an assert to check for parity
with the old algorithm.
* Added back in Benchmark SpatialDocMaker stuff, but with the leafy branch
prune option set on PQPT grid if it's of that type instead of the strategy.
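To illustrate the compareToNoLeaf item above, here is a minimal sketch of
comparing two packed terms as longs while asserting parity with a byte-by-byte
comparison. It is not the code from the patch. The class name, the big-endian
byte layout, and the choice of bit 0 as the leaf flag are all assumptions made
for illustration. The sketch uses Long.compareUnsigned because a signed
comparison would disagree with unsigned byte order whenever the top bit is set.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and bit layout are assumptions, not Lucene's API.
class PackedTermCompare {
    // Assumed: the lowest bit of the packed term is the leaf flag.
    static final long LEAF_MASK = 1L;

    // Big-endian 8-byte encoding of a term (ByteBuffer is big-endian by default).
    static byte[] toBytes(long term) {
        return ByteBuffer.allocate(Long.BYTES).putLong(term).array();
    }

    // Reference algorithm: unsigned byte-by-byte comparison.
    static int compareBytes(long a, long b) {
        byte[] x = toBytes(a);
        byte[] y = toBytes(b);
        for (int i = 0; i < x.length; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // Fast path: mask out the leaf bit, then do a single unsigned long comparison.
    static int compareNoLeaf(long a, long b) {
        long x = a & ~LEAF_MASK;
        long y = b & ~LEAF_MASK;
        int fast = Long.compareUnsigned(x, y);
        // Parity check against the reference algorithm (active only with -ea).
        assert Integer.signum(fast) == Integer.signum(compareBytes(x, y));
        return fast;
    }

    public static void main(String[] args) {
        System.out.println(compareNoLeaf(0x10L, 0x11L));   // differ only in the leaf bit
        System.out.println(compareNoLeaf(1L << 63, 1L));   // top bit set sorts last
    }
}
```

The assert only compares signs, since the two algorithms may return different
magnitudes for the same ordering.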
I did some testing, notably running the fuzzy test for 10k iterations while it
was temporarily set to test just the packed quad.
I think it's ready from my point of view, but I'd like to get your input on my
changes, Nick.
> Add StreamingQuadPrefixTree
> ---------------------------
>
> Key: LUCENE-6422
> URL: https://issues.apache.org/jira/browse/LUCENE-6422
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Affects Versions: 5.x
> Reporter: Nicholas Knize
> Attachments: LUCENE-6422.patch, LUCENE-6422.patch, LUCENE-6422.patch,
> LUCENE-6422.patch, LUCENE-6422_with_SPT_factory_and_benchmark.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to
> represent QuadCells and GeoHash cells, yielding 1 byte per QuadCell and 5
> bits per GeoHash cell, respectively. To create the terms representing a
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to
> invert() which creates a second lexicographically sorted array of Terms. This
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to
> accomplish the following:
> 1. Create a packed 8-byte representation for a QuadCell
> 2. Build the Packed cells 'on demand' when incrementToken is called
> Improvements over this approach include the generation of the packed cells
> using an AutoPrefixAutomaton
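
As a rough illustration of the two goals in the description above (a packed
8-byte cell, and cells built on demand), here is a small sketch. It is not the
PackedQuadPrefixTree implementation. The class name, the method names, and the
bit layout (2 bits per level in the high bits, 5 bits for the level, 1 leaf
bit) are assumptions chosen so that everything fits in one long.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; the layout below is an assumption, not Lucene's format.
class PackedQuadSketch {
    // 29 levels * 2 bits + 5 level bits + 1 leaf bit = 64 bits.
    static final int MAX_LEVELS = 29;

    // Packs a path of quadrants (each 0..3) into a single long.
    static long pack(int[] quadrants, boolean leaf) {
        if (quadrants.length > MAX_LEVELS) {
            throw new IllegalArgumentException("too many levels");
        }
        long term = 0L;
        for (int i = 0; i < quadrants.length; i++) {
            term |= ((long) (quadrants[i] & 0x3)) << (62 - 2 * i);
        }
        term |= ((long) quadrants.length) << 1; // level in bits 1..5
        if (leaf) {
            term |= 1L;                         // leaf flag in bit 0
        }
        return term;
    }

    static int level(long term) {
        return (int) ((term >>> 1) & 0x1F);
    }

    static boolean isLeaf(long term) {
        return (term & 1L) != 0;
    }

    static int quadrantAt(long term, int i) {
        return (int) ((term >>> (62 - 2 * i)) & 0x3);
    }

    // Yields the four children lazily: each child is computed inside next(),
    // so no intermediate list of cells is held in memory.
    static Iterator<Long> children(final long parent) {
        final int lvl = level(parent);
        return new Iterator<Long>() {
            private int q = 0;

            @Override
            public boolean hasNext() {
                return lvl < MAX_LEVELS && q < 4;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                long quads = parent & ~0x3FL; // clear level and leaf bits
                long child = quads
                        | ((long) q << (62 - 2 * lvl))
                        | ((long) (lvl + 1) << 1);
                q++;
                return child;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

A token stream could pull from such an iterator one cell per incrementToken
call instead of first collecting every cell into an ArrayList.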