[ 
https://issues.apache.org/jira/browse/LUCENE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288661#comment-14288661
 ] 

David Smiley commented on LUCENE-6196:
--------------------------------------

bq. If the bounding box is used for determining detail level, what I've done in 
the past is to set bounds for the number of geohash tokens, and descend until 
either you hit the maximum depth, or you hit the minimum number of desired 
tokens.

That makes sense.  It hasn't been an issue thus far because all of the existing 
shapes have a bounding box.  Right now the traversal done in TreeCellIterator 
(used with CellTokenStream, via PrefixTreeStrategy) is depth-first, so it 
doesn't know how many cells there are going to be as it walks (it's given a 
target/max depth level a priori).  If it were breadth-first, then it could 
instead be given a minimum number of cells, and it would go to the level just 
beyond if it reached this threshold mid-way through tokenizing a level.  The 
rationale behind it being depth-first is that I intend to reuse 
TreeCellIterator on the search side in a refactor of 
AbstractVisitingPrefixTreeFilter (LUCENE-5745), which underpins Intersects, 
Within, and more recently heatmap & time faceting.  That code needs to consume 
the tokens in sorted order to leap-frog with TermsEnum, and so the traversal 
must be depth-first.
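
To make the trade-off concrete, here is a rough sketch in Python rather than 
anything from the spatial module.  It assumes a toy quadtree over the unit 
square whose tokens are strings of the digits 0-3, and a query shape that is 
just a rectangle; token_rect, children, depth_first and breadth_first are 
hypothetical names, not Lucene API.  The breadth-first variant stops at the 
first complete level whose cell count meets the minimum, which is only one 
possible reading of the threshold rule.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def token_rect(token: str) -> Rect:
    """Rectangle covered by a quadtree token; '' is the whole unit square."""
    x = y = 0.0
    size = 1.0
    for digit in token:
        size /= 2
        quadrant = int(digit)
        x += (quadrant & 1) * size   # low bit selects the right half
        y += (quadrant >> 1) * size  # high bit selects the upper half
    return (x, y, x + size, y + size)


def intersects(a: Rect, b: Rect) -> bool:
    """Open-interval overlap: rectangles that only share an edge do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def children(token: str, shape: Rect) -> List[str]:
    """Child tokens of `token` that intersect the shape, in ascending order."""
    return [token + d for d in "0123" if intersects(token_rect(token + d), shape)]


def depth_first(shape: Rect, max_depth: int, token: str = "") -> Iterator[str]:
    """Pre-order walk to a depth fixed up front; tokens come out sorted."""
    for child in children(token, shape):
        yield child
        if len(child) < max_depth:
            yield from depth_first(shape, max_depth, child)


def breadth_first(shape: Rect, min_cells: int, max_depth: int) -> List[str]:
    """Descend whole levels until one has at least min_cells cells."""
    level = [""]
    depth = 0
    while depth < max_depth and len(level) < min_cells:
        next_level = [c for t in level for c in children(t, shape)]
        if not next_level:
            return []  # the shape lies outside the unit square
        level = next_level
        depth += 1
    return level
```

The depth-first generator never knows the total cell count until it is done, 
but what it yields is already in sorted token order.  The breadth-first 
version knows each level's count and can pick the depth adaptively, but it 
produces tokens level by level rather than in one sorted stream.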

The concept of using the cells for fast, approximate filtering and then 
looking up the vector geometry in, say, DocValues totally makes sense.  As of 
last summer the spatial module has had SerializedDVStrategy to generalize the 
latter check.  It would further be awesome to have a derivative of RPT that is 
able to detect which cells are _within_ the query geometry so that those 
documents aren't double-checked; likewise, leaf cells _containing_ the query 
geometry need not be double-checked either.  It's a wish-list feature: 
LUCENE-5579.  Ideally the leaf cells would be differentiated as 
edge-approximated or within, but it's not essential.  The optimization I'm 
talking about here wouldn't be appropriate when you want to incorporate the 
distance from the mid-point of a point-radius shape in the ranking, since you 
might as well filter at the collector as you describe.
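
As a rough illustration of the "skip the double-check for within cells" idea, 
here is a sketch in Python, not Lucene code.  It assumes point documents on a 
fixed n x n grid over the unit square and a circular query; the dict of points 
stands in for whatever holds the exact geometry (DocValues, say), and cell_of, 
classify and search are hypothetical names.  It only covers the _within_ case, 
not leaf cells containing the query geometry.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, n: int) -> Cell:
    """Grid cell (n x n over the unit square) that holds point p."""
    return (min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1))


def classify(cell: Cell, n: int, center: Point, radius: float) -> str:
    """Relate a grid cell to the query circle: disjoint, within, or edge."""
    size = 1.0 / n
    x0, y0 = cell[0] * size, cell[1] * size
    x1, y1 = x0 + size, y0 + size
    cx, cy = center
    # Nearest point of the cell to the circle center.
    nx = min(max(cx, x0), x1)
    ny = min(max(cy, y0), y1)
    if (nx - cx) ** 2 + (ny - cy) ** 2 > radius * radius:
        return "disjoint"
    # Farthest corner of the cell from the circle center.
    fx = max(abs(cx - x0), abs(cx - x1))
    fy = max(abs(cy - y0), abs(cy - y1))
    if fx * fx + fy * fy <= radius * radius:
        return "within"
    return "edge"


def search(docs: Dict[str, Point], n: int, center: Point,
           radius: float) -> Tuple[List[str], int]:
    """Return matching doc ids and how many exact geometry checks were needed."""
    hits: List[str] = []
    exact_checks = 0
    for doc_id, p in docs.items():
        relation = classify(cell_of(p, n), n, center, radius)
        if relation == "disjoint":
            continue
        if relation == "edge":
            # Only edge cells pay for the exact (stored geometry) check.
            exact_checks += 1
            dx, dy = p[0] - center[0], p[1] - center[1]
            if dx * dx + dy * dy > radius * radius:
                continue
        hits.append(doc_id)
    return hits, exact_checks
```

A real implementation would walk the indexed cells (terms) rather than loop 
over documents; the loop here just keeps the sketch short.  The point is that 
only documents in edge cells reach the exact check.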

bq. We've got some WIP for indexing both 3&4D space-time using a hilbert-curve

Sweet!  2015 is shaping up to be an awesome year for Lucene spatial.  I wish I 
had the ability to contribute at the levels I was supported at a couple years 
ago.

> Include geo3d package, along with Lucene integration to make it useful
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-6196
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6196
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: Karl Wright
>            Assignee: David Smiley
>         Attachments: ShapeImpl.java, geo3d.zip
>
>
> I would like to explore contributing a geo3d package to Lucene.  This can be 
> used in conjunction with Lucene search, both for generating geohashes (via 
> spatial4j) for complex geographic shapes, as well as limiting results 
> resulting from those queries to those results within the exact shape in 
> highly performant ways.
> The package uses 3d planar geometry to do its magic, which basically limits 
> computation necessary to determine membership (once a shape has been 
> initialized, of course) to only multiplications and additions, which makes it 
> feasible to construct a performant BoostSource-based filter for geographic 
> shapes.  The math is somewhat more involved when generating geohashes, but is 
> still more than fast enough to do a good job.
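
For readers wondering how membership can come down to multiplications and 
additions, here is a small Python sketch of the general idea for the simplest 
case, a circle on the unit sphere.  It is not the geo3d code: the class and 
function names are made up, and it assumes a perfect sphere.  The trig is done 
once when the shape is built; after that, testing a point that is already a 
unit vector is a single dot product compared against a precomputed constant.

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def to_unit_vector(lat_deg: float, lon_deg: float) -> Vector:
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


class GeoCircleSketch:
    """Hypothetical circle on the unit sphere, bounded by a cutting plane."""

    def __init__(self, lat_deg: float, lon_deg: float, radius_deg: float):
        # All trigonometry happens here, once, at shape initialization.
        self.center = to_unit_vector(lat_deg, lon_deg)
        self.min_dot = math.cos(math.radians(radius_deg))

    def contains(self, point: Vector) -> bool:
        # Three multiplications, two additions, one comparison.
        cx, cy, cz = self.center
        x, y, z = point
        return cx * x + cy * y + cz * z >= self.min_dot
```

Points within the angular radius sit on the center's side of the plane that 
cuts the sphere at that radius, which is why a dot product against the 
precomputed cosine is enough.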



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
