[
https://issues.apache.org/jira/browse/LUCENE-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578730#comment-16578730
]
Nicholas Knize commented on LUCENE-8452:
----------------------------------------
+1 [~jpountz] I'm toying around with that approach a bit and can post some
benchmark numbers when I have them.
As a side note that may be of interest: I went ahead and extracted all
linestrings, multilinestrings, and multipolygons from the latest planet OSM
snapshot to run some local benchmarks at scale and general tests with real-world
shape data. I converted the data from .pbf to WKT for easy ingest in luceneutil
(I already have a WKT parser for {{LatLonShape}}, covering lines and polygons,
that I can commit to luceneutil separately if there is interest). The data set
is quite large and of good quality (real-world shapes with varying spatial
extents, vertex counts, etc.). If there is any interest, I can extract a smaller
set (e.g., 60M shapes to complement the 60M points in geobench) and make it
available for the geo nightly benchmarks.
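For illustration only, a smaller subset could be drawn from a large WKT dump in a single streaming pass with reservoir sampling. This is a sketch, not the extraction actually used: it assumes one WKT shape per line, and {{sample_shapes}} is a hypothetical helper that does not exist in luceneutil.

```python
import random
from typing import Iterable, List, Optional


def sample_shapes(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k non-blank lines chosen uniformly at random (reservoir sampling).

    Assumption: each input line holds exactly one WKT shape.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    seen = 0  # number of non-blank lines read so far
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seen += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k shapes.
            reservoir.append(line)
        else:
            # Keep the new shape with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = line
    return reservoir
```

Passing an open file handle as {{lines}} streams the dump without loading it whole, but the reservoir itself holds k lines in memory, so a very large k may call for sampling line offsets instead.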
Here are the numbers for the entire corpus of data:
||Type||Count||File Size||
|{{LINESTRING}}|157,075,680|88GB|
|{{MULTILINESTRING}}|532,043|7.1GB|
|{{MULTIPOLYGON}}|351,975,024|164GB|
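
As a rough aid for sizing a subset, the totals and the average size per shape can be derived from the table above. The sketch below assumes the reported sizes are decimal gigabytes (10^9 bytes), which the table does not state; {{summarize}} is a hypothetical helper.

```python
# Counts and file sizes (GB) copied from the table above.
CORPUS = {
    "LINESTRING": (157_075_680, 88.0),
    "MULTILINESTRING": (532_043, 7.1),
    "MULTIPOLYGON": (351_975_024, 164.0),
}

BYTES_PER_GB = 1e9  # assumption: decimal gigabytes


def summarize(corpus):
    """Return (total shape count, total size in GB, average WKT bytes per shape by type)."""
    total_count = sum(count for count, _ in corpus.values())
    total_gb = sum(gb for _, gb in corpus.values())
    avg_bytes = {
        name: gb * BYTES_PER_GB / count for name, (count, gb) in corpus.items()
    }
    return total_count, total_gb, avg_bytes


if __name__ == "__main__":
    total_count, total_gb, avg_bytes = summarize(CORPUS)
    print(f"total shapes: {total_count:,}  total size: {total_gb:.1f} GB")
    for name, size in avg_bytes.items():
        print(f"{name}: about {size:,.0f} bytes per shape")
```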
> BKD-based shape indexing benchmarks
> -----------------------------------
>
> Key: LUCENE-8452
> URL: https://issues.apache.org/jira/browse/LUCENE-8452
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/sandbox
> Reporter: Ignacio Vera
> Priority: Major
> Attachments: BKDperf.pdf
>
>
> Initial benchmarking of the new BKD-based shape indexing suggests that
> searches can be somewhat under-performing. I am opening this ticket to share
> the findings and to start a discussion on how to speed up the solution.
>
> The first benchmark uses the current benchmark in luceneutil for
> indexing points and searching by bounding box. We would expect {{LatLonShape}}
> to be slower than {{LatLonPoint}} but still perform well. The
> results of running this benchmark on my computer look like:
>
> LatLonPoint:
> 89.717239531 sec to index
> INDEX SIZE: 0.5087761553004384 GB
> READER MB: 0.6098232269287109
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 72.91056132596746
> BEST QPS: 74.19031323419311
>
> LatLonShape:
> 89.388678805 sec to index
> INDEX SIZE: 1.3028179928660393 GB
> READER MB: 0.8827085494995117
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 1.0053836784184809
> BEST QPS: 1.0230305276205143
>
> A second benchmark indexes around 10 million 4-sided
> polygons and around 3 million points. Searches are performed using bounding
> boxes. The results are compared with spatial tree alternatives. The spatial
> trees use a composite strategy with precision=0.001 degrees and distErrPct=0.25:
>
> s2 (Geo3d):
> 1191.732124301 sec to index part 0
> INDEX SIZE: 3.2086284114047885 GB
> READER MB: 19.453557014465332
> maxDoc=12949519
> totHits=705758537
> BEST M hits/sec: 13.311369588840462
> BEST QPS: 4.243743434150063
>
> quad (JTS):
> 3252.62925159 sec to index part 0
> INDEX SIZE: 4.5238002222031355 GB
> READER MB: 41.15725612640381
> maxDoc=12949519
> totHits=705758357
> BEST M hits/sec: 35.54591930673003
> BEST QPS: 11.332252412866938
>
> LatLonShape:
> 30.32712009 sec to index part 0
> INDEX SIZE: 0.5627057952806354 GB
> READER MB: 0.29498958587646484
> maxDoc=12949519
> totHits=705758228
> BEST M hits/sec: 3.4130465326433357
> BEST QPS: 1.0880999177593018
>
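
To put the search gap in the quoted description into numbers, the BEST QPS figures can be compared directly. This is a small sketch using only the values reported above; {{qps_ratio_vs}} is a hypothetical helper, and the ratios describe these particular runs only.

```python
# BEST QPS values copied from the issue description.
POINT_BENCH = {
    "LatLonPoint": 74.19031323419311,
    "LatLonShape": 1.0230305276205143,
}
POLYGON_BENCH = {
    "s2 (Geo3d)": 4.243743434150063,
    "quad (JTS)": 11.332252412866938,
    "LatLonShape": 1.0880999177593018,
}


def qps_ratio_vs(results, baseline="LatLonShape"):
    """Return each entry's QPS divided by the baseline's QPS (baseline excluded)."""
    base = results[baseline]
    return {name: qps / base for name, qps in results.items() if name != baseline}


if __name__ == "__main__":
    for label, bench in (("points", POINT_BENCH), ("polygons", POLYGON_BENCH)):
        for name, ratio in qps_ratio_vs(bench).items():
            print(f"{label}: {name} answers {ratio:.1f}x the queries/sec of LatLonShape")
```

On these figures, {{LatLonPoint}} serves roughly 72 times the queries per second of {{LatLonShape}} in the first benchmark and the quad tree roughly 10 times in the second, while {{LatLonShape}} indexes far faster (about 30 sec versus about 1,192 sec and 3,253 sec) and produces a much smaller index.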