[ 
https://issues.apache.org/jira/browse/LUCENE-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-6422:
---------------------------------
    Attachment: LUCENE-6422_with_SPT_factory_and_benchmark.patch

See this new patch.
* Fixed compilation against Java 7.
* Added SpatialPrefixTreeFactory support so this tree can be chosen by name 
(e.g. in the benchmark module, or in Solr for that matter).
* Added benchmark module support by enhancing SpatialDocMaker (which uses the 
aforementioned factory).
* Included a tweaked spatial.alg.

I did some benchmarking; it's pretty quick & rough right now.  I set max levels 
to 20 (chosen arbitrarily; with 27 it ran out of memory on a 2GB heap), set 
distErrPct to 0.0, indexed random circles of up to 3 decimal degrees (a few 
hundred km or so), and disabled leafy branch pruning to compare apples to 
apples.

I ran it with "quad" and "packedQuad" with the same settings otherwise.
* Index size: Quad: 1.4GB, PackedQuad 1.6GB
* Index time: Quad: 1.46 rec/sec, PackedQuad: 1.91 rec/sec
* Query time: Quad: 6.35 rec/sec, PackedQuad: 7.21 rec/sec
* The benchmark module shows average memory use, but I always take that with 
a grain of salt.  It seems PackedQuad *might* use a little more memory during 
indexing and less during search.  Shrug.
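
To put the figures above in relative terms, here is a minimal sketch that computes the percentage change from Quad to PackedQuad for each metric. The class and method names are illustrative only and are not part of the patch or the benchmark module; the inputs are the numbers reported in the list above.

```java
import java.util.Locale;

// Illustrative helper, not part of the patch or of the benchmark module.
final class BenchmarkDelta {
    /** Percentage change going from the baseline value to the candidate value. */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Inputs are the Quad (baseline) and PackedQuad (candidate) figures reported above.
        System.out.printf(Locale.ROOT, "Index size: %+.1f%%%n", percentChange(1.4, 1.6));   // +14.3%
        System.out.printf(Locale.ROOT, "Index rate: %+.1f%%%n", percentChange(1.46, 1.91)); // +30.8%
        System.out.printf(Locale.ROOT, "Query rate: %+.1f%%%n", percentChange(6.35, 7.21)); // +13.5%
    }
}
```

Since this was a single rough run, these percentages are indicative at best.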

I was skeptical there would be index size savings and the benchmark shows there 
aren't any.  Please prove me wrong, Nick!  I like the indexing & query speed 
improvements -- not surprised given the nice code here without the ugly 
recursion that was in legacy Quad.

off to bed now...

> Add StreamingQuadPrefixTree
> ---------------------------
>
>                 Key: LUCENE-6422
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6422
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>    Affects Versions: 5.x
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-6422.patch, 
> LUCENE-6422_with_SPT_factory_and_benchmark.patch
>
>
> To conform to Lucene's inverted index, SpatialStrategies use strings to 
> represent QuadCells and GeoHash cells, yielding 1 byte per QuadCell and 5 
> bits per GeoHash cell, respectively.  To create the terms representing a 
> Shape, the BytesRefIteratorTokenStream first builds all of the terms into an 
> ArrayList of Cells in memory, then passes the ArrayList.Iterator back to 
> invert() which creates a second lexicographically sorted array of Terms. This 
> doubles the memory consumption when indexing a shape.
> This task introduces a PackedQuadPrefixTree that uses a StreamingStrategy to 
> accomplish the following:
> 1.  Create a packed 8-byte representation for a QuadCell
> 2.  Build the packed cells 'on demand' when incrementToken is called
> Improvements over this approach include the generation of the packed cells 
> using an AutoPrefixAutomaton.
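
The two points in the description above (a packed 8-byte cell, and cells produced on demand rather than buffered in a list) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the method names, and the bit layout (2 bits per level in the high bits, the level in the low 6 bits) are hypothetical and are not taken from the attached patch. It avoids lambdas so that it compiles against Java 7.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: names and bit layout are assumptions, not the patch's code.
final class PackedQuadSketch {
    // 29 levels * 2 bits = 58 path bits (bits 63..6); bits 5..0 hold the level.
    static final int MAX_LEVELS = 29;
    static final long LEVEL_MASK = 0x3FL;

    /** Bit offset of the 2-bit quadrant stored for the given depth (0 = first level). */
    static int shiftFor(int depth) {
        return 62 - 2 * depth;
    }

    /** Packs a quadrant path (each entry 0..3, root first) into a single long. */
    static long pack(int[] quadrants) {
        if (quadrants.length > MAX_LEVELS) {
            throw new IllegalArgumentException("at most " + MAX_LEVELS + " levels");
        }
        long term = 0L;
        for (int i = 0; i < quadrants.length; i++) {
            int q = quadrants[i];
            if (q < 0 || q > 3) {
                throw new IllegalArgumentException("quadrant must be 0..3: " + q);
            }
            term |= ((long) q) << shiftFor(i);
        }
        return term | quadrants.length;
    }

    static int level(long term) {
        return (int) (term & LEVEL_MASK);
    }

    static int quadrantAt(long term, int depth) {
        return (int) ((term >>> shiftFor(depth)) & 0x3L);
    }

    /** Yields the four child cells one at a time; nothing is buffered in a list. */
    static Iterator<Long> children(final long parent) {
        final int parentLevel = level(parent);
        if (parentLevel >= MAX_LEVELS) {
            throw new IllegalArgumentException("cell is already at the maximum level");
        }
        final long pathBits = parent & ~LEVEL_MASK;
        return new Iterator<Long>() {
            private int nextQuadrant = 0;

            @Override
            public boolean hasNext() {
                return nextQuadrant < 4;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Each child is computed only when the consumer asks for it.
                long child = pathBits
                        | (((long) nextQuadrant) << shiftFor(parentLevel))
                        | (parentLevel + 1);
                nextQuadrant++;
                return child;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

With this assumed layout, `pack(new int[] {2, 1})` is a level-2 cell, and `children(...)` plays the role that an on-demand `incrementToken` would play in a token stream: the caller pulls one cell at a time instead of receiving a pre-built ArrayList.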



