Search returning the correct number of hits but wrong stored data

2016-05-30 Thread Conny Gyllendahl
Field (for example, the date is converted to an Integer like 20160530 and then stored). The above allows me to do quick range queries like: +subscriberId:[12345 TO 12345] +date:[20160501 TO 20160531]. I have written my own Collector that extends SimpleCollector and just adds the document ids to a Set.
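To make the encoding described above concrete, here is a minimal sketch in plain Python rather than Lucene itself. It assumes the yyyyMMdd integer layout from the message; the in-memory `docs` table and the `encode_date` and `collect` helpers are hypothetical stand-ins for the index and the custom Collector, not the poster's code.

```python
from datetime import date


def encode_date(d: date) -> int:
    """Encode a date as a yyyyMMdd integer, e.g. 2016-05-30 -> 20160530."""
    return d.year * 10000 + d.month * 100 + d.day


# Hypothetical in-memory "index": doc id -> (subscriberId, encoded date).
docs = {
    0: (12345, encode_date(date(2016, 4, 30))),
    1: (12345, encode_date(date(2016, 5, 1))),
    2: (12345, encode_date(date(2016, 5, 30))),
    3: (67890, encode_date(date(2016, 5, 15))),
    4: (12345, encode_date(date(2016, 6, 1))),
}


def collect(subscriber_id: int, first: int, last: int) -> set:
    """Return the ids of documents that satisfy both inclusive ranges."""
    hits = set()
    for doc_id, (sub, day) in docs.items():
        if sub == subscriber_id and first <= day <= last:
            hits.add(doc_id)
    return hits


# Mirrors +subscriberId:[12345 TO 12345] +date:[20160501 TO 20160531]
may_hits = collect(12345, 20160501, 20160531)
```

Because the encoded integers sort in the same order as the calendar dates, an inclusive integer range selects exactly the days of the month in question.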

Re: How can Docvalues so efficient

2016-05-30 Thread Adrien Grand
When executing queries, Lucene has an abstraction called Scorer, which is responsible for returning matching documents in doc id order. Since doc values are stored on disk in doc id order, reads are sequential. There is an adversarial case when few documents match since you might need to jump over la
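The access pattern described in this reply can be sketched as follows. This is a simplified illustration in plain Python, not Lucene's actual on-disk format: it assumes a hypothetical column with one fixed-width 8-byte value per document, laid out in doc id order, and the names `column` and `read_values` are invented for the example.

```python
import struct

# Hypothetical column: one 8-byte big-endian value per document, in doc id order.
values = [10, 20, 30, 40, 50, 60]
column = b"".join(struct.pack(">q", v) for v in values)


def read_values(matching_doc_ids):
    """Read the value of each match and record the byte offset that was touched.

    The matches are expected in ascending doc id order, as a Scorer returns them.
    """
    out, offsets = [], []
    for doc_id in matching_doc_ids:
        offset = doc_id * 8  # fixed-width layout: the offset grows with the doc id
        offsets.append(offset)
        out.append(struct.unpack_from(">q", column, offset)[0])
    return out, offsets


matches = [1, 2, 5]  # doc ids in ascending order
read, offsets = read_values(matches)
```

Since the matches arrive in ascending doc id order, the offsets only ever move forward through the column. When matches are sparse, the gaps between consecutive offsets grow, which is the case the reply flags as unfavorable.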

Re: How can Docvalues so efficient

2016-05-30 Thread Ting Yao
Thank you very much for answering me. But could you explain how Lucene reads the doc values files sequentially?
2016-05-30 18:15 GMT+08:00 Adrien Grand:
> Doc values indeed need to read from disk. However, the fact that Lucene reads the doc values files sequentially (disks perform better at s

Re: How can Docvalues so efficient

2016-05-30 Thread Adrien Grand
Doc values indeed need to read from disk. However, the fact that Lucene reads the doc values files sequentially (disks perform better at sequential access than random access) and that the filesystem cache helps keep hot regions of the doc values files in memory usually helps keep performance close

How can Docvalues so efficient

2016-05-30 Thread Ting Yao
Hi all, I have been reading the Lucene source code recently, and we also use Elasticsearch as our search engine. As far as I know, Elasticsearch performance is pretty good, and Elasticsearch is based on Lucene. So I am wondering how it can search words so fast when the field data (uninve