Re: Maintaining sorting order (stored fields vs DocValue fields) while upgrading Lucene version

2017-06-29 Thread Erick Erickson
1> Is it correct that stored fields can only be sorted on if they become a DocValue field in 5.x? No. Indexed-only fields can still be used to sort. DocValues are just more efficient at load time and don't consume as much of the Java heap. Essentially this latter can be thought of as moving the "u

Maintaining sorting order (stored fields vs DocValue fields) while upgrading Lucene version

2017-06-29 Thread Florian Buetow
Hi, I am in the process of updating a large index from Lucene 4.x to 5.x and have two questions related to the sorting order. 1. Is it correct that stored fields can only be sorted on if they become a DocValue field in 5.x? 2. When "updating" stored fields to DocValue fields, is it required to

Re: Lucene GeoNear Search and Sort Performance

2017-06-29 Thread sc
For now, I used LatLonPoint.newDistanceQuery(...) and SloppyMath.haversinMeters to get the distance between center point and all other points within 50m radius circle and sort them using Comparator. I am able to get accurate results in < 10ms which is fine with me. Code: Query l
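
The filter-then-sort approach described in this message can be sketched in plain Java. This is only an illustration under stated assumptions: it does not call Lucene, the class and method names (NearestSketch, haversineMeters, withinRadiusSorted) are hypothetical, the Earth radius constant is an assumed mean value, and the exact haversine formula here stands in for SloppyMath.haversinMeters, which is an approximation and may differ slightly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for "radius filter, then sort by distance with a Comparator".
// Not Lucene code: in the thread, the filtering is done by LatLonPoint.newDistanceQuery.
public class NearestSketch {
    // Assumed mean Earth radius in meters.
    static final double EARTH_RADIUS_METERS = 6_371_008.8;

    // Great-circle distance between two lat/lon points (degrees), in meters.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Keeps the points ({lat, lon} pairs) within radiusMeters of the center,
    // ordered from nearest to farthest.
    static List<double[]> withinRadiusSorted(double centerLat, double centerLon,
                                             List<double[]> points, double radiusMeters) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            if (haversineMeters(centerLat, centerLon, p[0], p[1]) <= radiusMeters) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingDouble(
                (double[] p) -> haversineMeters(centerLat, centerLon, p[0], p[1])));
        return hits;
    }
}
```

In an actual index the candidate set would come from the distance query's hits rather than from a scan over all points, so only the sorting step would carry over as written.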

RE: Is it possible to normalise BM25 scores in the query level?

2017-06-29 Thread Rifat
Hi, How can we normalize BM25 scores by the query length (number of tokens) in Lucene or elasticsearch? I can access document fields by scripting or lucene expressions in elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-expression.html) but I could no

Re: Lucene GeoNear Search and Sort Performance

2017-06-29 Thread sc
Thank you so much. With LatLonPoint.nearest(..), I am getting results in 6ms, with accurate results, when I compared against MongoDB.geoNear(). I also tried LatLonPoint.newDistanceQuery, and I got results in 6ms, but the results are NOT the nearest points. A few more questions: Is there an API that u

Re: Lucene GeoNear Search and Sort Performance

2017-06-29 Thread Michael McCandless
To make the distance filter faster you should use LatLonPoint. And, if you really just need the N nearest points, you could try LatLonPoint.nearest(...) ... that might be faster than filtering + sorting separately. Mike McCandless http://blog.mikemccandless.com On Thu, Jun 29, 2017 at 12:06 PM,

Lucene GeoNear Search and Sort Performance

2017-06-29 Thread sc
Hi, I have a similar requirement of searching points within a radius of 50m. Loaded 100M latlon, indexed/searching with LatLonDocValuesField. Currently, I am testing it on my MacBook Pro. I have tested with all Directory (RAM/FS/MMap) types, but it takes > 3-4 secs to do search/sort to return of

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-06-29 Thread sc
Hi, I have a similar requirement of searching points within a radius of 50m. Loaded 100M latlon, indexed/searching with LatLonDocValuesField. I am testing it on my MacBook Pro. I have used all Directory (RAM/FS/MMap) types, but it takes 3-4 secs to do search/sort to return of 5 points within rad