On Tue, May 28, 2013 at 8:55 PM, Arun Kumar K <arunk...@gmail.com> wrote:
> Thanks for clarifying the things.
> I have some doubts regarding sorting :
>>
>> While you can do that, I don't recommend it. For example, if you have
>> 5 fields, loading all fields from stored fields requires at most 1
>> disk seek while loading all fields from doc values requires at least 5
>> disk seeks for disk-based doc values.
>
> 1> I am assuming those mentioned 5 fields are sortable fields upon which
> sorting is done.
> In my understanding, loading stored fields takes 1 disk seek for finding file
> pointer & 1 disk seek for getting all those fields.

This was correct until Lucene 4.0, but since 4.1, Lucene stores the
doc ID -> file pointer mapping in memory, ensuring at most 1 disk seek.

> Since different file is maintained for a particular doc value field. We get 5
> disk seeks + 1 disk seek for file pointer.

There is no general rule since this depends on the doc values type and
the codec implementation, but you get the idea.

> If we have only one sortable field , which could be better ? I guess no diff.

Just to make things clear: before Lucene had doc values, sorting was
performed based on the inverted index (which was uninverted and stored
in memory using FieldCache), not on stored fields. Stored fields are bad
for sorting because they are usually large and don't play nicely with
the file system cache.

Doc values are very similar to FieldCache except that the hard work is
done at indexing time instead of search time. This is a good trade-off
because it allows for faster loading of indexes and for off-loading
data to disk. It is never a bad idea to use doc values for sorting.

> Also, I vaguely remember that there is some performance loss for sorting
> based on string in lucene 4.0
> Then, will the decision change for String field or based on type of field ?

I don't see why String sorting would be slower. However, it is true
that String sorting requires a lot of memory. If your field is a
number, you should definitely use a numeric field cache.

> 2> Also, In my understanding, if we need to use parser based queries for
> docvalues, we need to have a storedfield for a doc with same name & value of
> the doc's docvalue.
> Even term queries won't work. Am i right here?

QueryParser is completely unaware of your schema. If you want
QueryParser to use doc-values-based queries, you can override
QueryParser.newRangeQuery and/or QueryParser.newFieldQuery to return a
new ConstantScoreQuery that wraps a FieldCacheRangeFilter.
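
To illustrate why a per-document column (FieldCache or doc values) suits
sorting, here is a minimal sketch in plain Java. It is a toy model under
stated assumptions, not Lucene code: the names ColumnSortSketch and
sortDocsByColumn are hypothetical, and the long[] array merely stands in
for a numeric column that holds one value per doc ID.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Toy model of sorting on a column-stride field. Not Lucene API. */
class ColumnSortSketch {

    /**
     * Returns doc IDs ordered by ascending column value.
     * values[docId] is assumed to hold the sort key of that document,
     * the way a numeric FieldCache array or doc values column would.
     */
    static int[] sortDocsByColumn(long[] values) {
        Integer[] docIds = new Integer[values.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }

        // Each comparison is two array lookups; no stored document is decoded.
        Comparator<Integer> byValue = Comparator.comparingLong(docId -> values[docId]);
        // Arrays.sort on objects is stable, so ties keep doc ID order.
        Arrays.sort(docIds, byValue);

        int[] result = new int[docIds.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = docIds[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] price = {30L, 10L, 20L}; // hypothetical values for docs 0, 1, 2
        System.out.println(Arrays.toString(sortDocsByColumn(price))); // [1, 2, 0]
    }
}
```

The point of the model is that the comparator only touches the sort
column. Sorting on stored fields would instead mean reading each
document's whole stored record just to extract one value.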

--
Adrien