[
https://issues.apache.org/jira/browse/LUCENE-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-6477:
---------------------------------------
Attachment: LUCENE-6477.patch
Another iteration, improving the pivot selection on each recursion to try to
make closer-to-square-shaped leaf cells. All I do is pick the longer dimension
to slice, instead of just alternating lat/lon like before.
This gives a ~7% speedup on the "bboxes around London over OpenStreetMap
points" benchmark.
The same number of leaf cells must be visited as before (since it's a fixed
512-1024 points per leaf cell), but because the cells are more square shaped,
there are fewer cells that cross the boundary and those cells are slower
because we must pull the lat/lon for each point and check it against the query
shape (vs cells we know are fully enclosed, where we just blindly add all
docIDs).
I made another video, and you can definitely see that the cells are "less
slivery": https://plus.google.com/+MichaelMcCandless/posts/Daj9FgYPdtv
> Add BKD tree for spatial shape query intersecting indexed points
> ----------------------------------------------------------------
>
> Key: LUCENE-6477
> URL: https://issues.apache.org/jira/browse/LUCENE-6477
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: Trunk, 5.2
>
> Attachments: LUCENE-6477.patch, LUCENE-6477.patch, LUCENE-6477.patch,
> LUCENE-6477.patch, LUCENE-6477.patch
>
>
> I'd like to explore using dedicated spatial trees for faster shape
> intersection filters than postings-based implementations.
> I implemented the tree data structure from
> https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
> The idea is simple: it builds a full binary tree, partitioning 2D
> space, alternately on lat and then lon, into smaller and smaller
> rectangles until a leaf has <= N (default 1024) points.
> It cannot index shapes (just points), and can then do fast shape
> intersection queries. Multi-valued fields are supported.
> I only implemented the "point is contained in this bounding box" query
> for now, but I think polygon shape querying should be easy to
> implement using the same approach from LUCENE-6450.
> For indexing, you add BKDPointField (takes lat, lon) to your doc, and
> must set up your Codec use BKDTreeDocValuesFormat for that field.
> This DV format wraps Lucene50DVFormat, but then builds the disk-based
> BKD tree structure on the side. BKDPointInBBoxQuery then requires this
> DVFormat, and casts it to gain access to the tree.
> I quantize each incoming double lat/lon to 32 bits precision (so 64
> bits per point) = ~9 milli-meter lon precision at the equator, I
> think.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]