[
https://issues.apache.org/jira/browse/LUCENE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356757#comment-16356757
]
Ignacio Vera edited comment on LUCENE-8126 at 2/8/18 3:35 PM:
--------------------------------------------------------------
I have added the performance test classes on the branch in case you want to
have a look to check results are not bias.
I have done two test (They do not include index size as I have no found a
reliable way of getting that info), I can only show you for now some tipical
results.
1) Indexing 5k random polygons and executing 50 Random queries. Trees have
precision set to 1e-4 and distErrPct to 5%:
load geohash : 132,15 indexed shapes per second
query geohash recursive : 10,88 queries per second
query geohash composite : 7,29 queries per second
load quad : 299,04 indexed shapes per second
query quad recursive : 15,13 queries per second
query quad composite : 10,89 queries per second
load s2 arity 4 : 623,67 indexed shapes per second
query s2 arity 4 recursive : 40,00 queries per second
query s2 arity 4 composite : 21,51 queries per second
load s2 arity 16 : 283,46 indexed shapes per second
query s2 arity 16 recursive : 12,22 queries per second
query s2 arity 16 composite : 9,99 queries per second
load s2 arity 64 : 159,05 indexed shapes per second
query s2 arity 64 recursive : 5,13 queries per second
query s2 arity 64 composite : 3,58 queries per second
1) Indexing 50k random points and executing 50 Random queries. Trees have
precision set to 1e-6 and distErrPct to 0%:
load geohash : 14898,69 indexed shapes per second
query geohash recursive : 2,31 queries per second
query geohash composite : 2,29 queries per second
load quad : 5068,42 indexed shapes per second
query quad recursive : 5,10 queries per second
query quad composite : 5,11 queries per second
load s2 arity 4 : 9748,49 indexed shapes per second
query s2 arity 4 recursive : 11,34 queries per second
query s2 arity 4 composite : 11,38 queries per second
load s2 arity 16 : 15513,50 indexed shapes per second
query s2 arity 16 recursive : 3,86 queries per second
query s2 arity 16 composite : 3,83 queries per second
load s2 arity 64 : 16886,19 indexed shapes per second
query s2 arity 64 recursive : 1,56 queries per second
query s2 arity 64 composite : 1,56 queries per second
Some data of my use case: I need to index ~3M documents. 2M are points, 0.5M
polygons and 0.5M multi-polygons with an averge size of 20. They need to be
indexed on the celestial sphere (unit sphere). All polygons are squared (4
points) with different sizes, from 1 square degree to very tiny ones, all
distributed mainly on the south hemisphere of the sphere. Moving from Geohash
to S2 has provided the following benefits:
1) 20% reduccion of index size.
2) 75% reduccion on indexing time (yuhu!).
3) 2.5 times faster queries.
Anyway what we need is a benchmark for SPT the same way that there is one for
BKD tree. Probably my next mini-project.
Conclusion: If you use Geo3D, you probably want to use S2 :)
was (Author: ivera):
I have added the performance test classes on the branch in case you want to
have a look to check results are not bias.
I have done two test (They do not include index size as I have no found a
reliable way of getting that info), I can only show you for now some tipical
results.
1) Indexing 5k random polygons and executing 50 Random queries. Trees have
precision set to 1e-4 and distErrPct to 5%:
load geohash : 132,15 indexed shapes per second
query geohash recursive : 10,88 queries per second
query geohash composite : 7,29 queries per second
load quad : 299,04 indexed shapes per second
query quad recursive : 15,13 queries per second
query quad composite : 10,89 queries per second
load s2 arity 4 : 623,67 indexed shapes per second
query s2 arity 4 recursive : 40,00 queries per second
query s2 arity 4 composite : 21,51 queries per second
load s2 arity 16 : 283,46 indexed shapes per second
query s2 arity 16 recursive : 12,22 queries per second
query s2 arity 16 composite : 9,99 queries per second
load s2 arity 64 : 159,05 indexed shapes per second
query s2 arity 64 recursive : 5,13 queries per second
query s2 arity 64 composite : 3,58 queries per second
1) Indexing 50k random points and executing 50 Random queries. Trees have
precision set to 1e-6 and distErrPct to 0%:
load geohash : 14898,69 indexed shapes per second
query geohash recursive : 2,31 queries per second
query geohash composite : 2,29 queries per second
load quad : 5068,42 indexed shapes per second
query quad recursive : 5,10 queries per second
query quad composite : 5,11 queries per second
load s2 arity 4 : 9748,49 indexed shapes per second
query s2 arity 4 recursive : 11,34 queries per second
query s2 arity 4 composite : 11,38 queries per second
load s2 arity 16 : 15513,50 indexed shapes per second
query s2 arity 16 recursive : 3,86 queries per second
query s2 arity 16 composite : 3,83 queries per second
load s2 arity 64 : 16886,19 indexed shapes per second
query s2 arity 64 recursive : 1,56 queries per second
query s2 arity 64 composite : 1,56 queries per second
Some data of my use case: I need to index ~3M documents. 2M are points, 0.5M
polygons and 0.5M multi-polygons with an averge size of 20. They need to be
indexed on the celestial sphere (unit sphere). All polygons are squared (4
points) with different sizes, from 1 square degree to very tiny ones, all
distributed mainly on the south hemisphere of the sphere. Moving from Geohash
to S2 has provided the following benefits:
1) 20% reduccion of index size.
2) 75% reduccion on indexing time (yuhu!).
3) 2.5 times faster queries.
Anyway what we need is a benchmark for SPT the same way that there is one for
BKD tree. Probably my next mini-project.
Conclusion: If you use Geo3D, you probably want to use S2 :)
> Spatial prefix tree based on S2 geometry
> ----------------------------------------
>
> Key: LUCENE-8126
> URL: https://issues.apache.org/jira/browse/LUCENE-8126
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/spatial-extras
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hi [~dsmiley],
> I have been working on a prefix tree based on goggle S2 geometry
> (https://s2geometry.io/) to be used mainly with Geo3d shapes with very
> promising results, in particular for complex shapes (e.g polygons). Using
> this pixelization scheme reduces the size of the index, improves the
> performance of the queries and reduces the loading time for non-point shapes.
> If you are ok with this contribution and before providing any code I would
> like to understand what is the correct/prefered approach:
> 1) Add new depency to the S2 library
> (https://mvnrepository.com/artifact/io.sgr/s2-geometry-library-java). It has
> Apache 2.0 license so it should be ok.
> 2) Create a utility class with all methods necessary to navigate the S2 tree
> and create shapes from S2 cells (basically port what we need from the library
> into Lucene).
> What do you think?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]