Lucene Taxonomy Faceting

2016-11-02 Thread sidhant92
How can I achieve the following? Suppose I have the following set of documents: { "id": "1", "type": "abc" }, { "id": "2", "type": "abc" }, { "id": "2", "type": "abc" }, { "id": "3", "type": "abc" }. Using a taxonomy index in Lucene, with facets I can get the count during search that gives

RE: Understanding performance characteristics of the new point types

2016-11-02 Thread Uwe Schindler
Hi, FYI, the old NumericRangeQuery is fast here, because it rewrites to a constant score BooleanQuery for this low-cardinality case! If you have no real range, then it rewrites to a TermQuery! Points are different, they are not so good for simple term-based lookups. Uwe - Uwe Schindler H.

Re: Understanding performance characteristics of the new point types

2016-11-02 Thread Florian Hopf
Thank you both for the explanation, we will switch to StringField with a TermQuery instead. On 02.11.2016 20:09, Michael McCandless wrote: > Yeah it's best to use StringField for low-cardinality use cases. > > When cardinality is low (4 unique values in your case), legacy > numerics would rewrite

Re: Understanding performance characteristics of the new point types

2016-11-02 Thread Michael McCandless
Yeah it's best to use StringField for low-cardinality use cases. When cardinality is low (4 unique values in your case), legacy numerics would rewrite to a BooleanQuery, which is much more performant for MUST clauses, vs dimensional points which will always need to construct an up front bitset for
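
To make the trade-off described in this message concrete, here is a toy model in plain Java. It is only a sketch under stated assumptions, not Lucene's code: the class `FilterCostSketch`, its methods, and the "doc IDs touched" counter are all hypothetical names invented for illustration. It contrasts a conjunction that leapfrogs between two sorted doc-ID lists (roughly how a MUST clause over term postings can behave) with a filter that first materialises every matching doc ID into a bitset before the conjunction starts.

```java
import java.util.BitSet;

// Toy model only: hypothetical names, not Lucene classes or APIs.
class FilterCostSketch {

    /** Returns the first index >= from whose value is >= target.
     *  Binary search stands in for the skip data of a real postings list. */
    static int advance(int[] sorted, int from, int target) {
        int lo = from, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Leapfrog intersection of two sorted doc-ID lists.
     *  Returns {hits, docIdsTouched}; binary-search probes are not counted. */
    static int[] leapfrog(int[] lead, int[] filter) {
        int hits = 0, touched = 0, j = 0;
        for (int doc : lead) {
            touched++;                      // doc ID read from the selective clause
            j = advance(filter, j, doc);    // skip ahead in the filter list
            if (j == filter.length) break;  // filter list exhausted
            touched++;                      // doc ID landed on in the filter list
            if (filter[j] == doc) hits++;
        }
        return new int[] {hits, touched};
    }

    /** Same intersection, but the filter is first expanded into a bitset.
     *  Returns {hits, docIdsTouched}. */
    static int[] upFrontBitSet(int[] lead, int[] filter, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int hits = 0, touched = 0;
        for (int doc : filter) {            // every filter match is visited once
            bits.set(doc);
            touched++;
        }
        for (int doc : lead) {
            touched++;
            if (bits.get(doc)) hits++;
        }
        return new int[] {hits, touched};
    }
}
```

With a filter that matches half of the index and a selective lead clause, both methods return the same hit count, but the bitset variant touches every doc ID the filter matches while the leapfrog variant touches only a few per lead document. That is the intuition behind preferring a term-based field for a type filter with only four distinct values.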

Re: Understanding performance characteristics of the new point types

2016-11-02 Thread Fuad Efendi
Hi Florian, if my understanding is correct, you are using IntPoint to index 4 different document types, which is overkill; why not try a classic “non-tokenized” keyword field (a.k.a. “legacy string”) for document types? Cardinality is only four for document types. -- Fuad Efendi (416) 993-2060

Understanding performance characteristics of the new point types

2016-11-02 Thread Florian Hopf
Hi, we are indexing different types of documents in one Lucene index. They have most fields in common but we need to filter some types for certain queries. We are using numeric values to determine the types of documents (1-4). Now, when querying these documents we see that the performance degrades

Re: an overview of the index format on Javadoc

2016-11-02 Thread Shinichiro Abe
I saw the commit you made. Thank you! Shinichiro Abe 2016-11-02 18:38 GMT+09:00 Michael McCandless : > Thank you, I pushed your patch on that issue! > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Nov 1, 2016 at 1:51 AM, Shinichiro Abe > wrote: >> I opened https://issues.apa

Re: an overview of the index format on Javadoc

2016-11-02 Thread Michael McCandless
Thank you, I pushed your patch on that issue! Mike McCandless http://blog.mikemccandless.com On Tue, Nov 1, 2016 at 1:51 AM, Shinichiro Abe wrote: > I opened https://issues.apache.org/jira/browse/LUCENE-7532, attached a patch. > > Thank you, > Shinichiro Abe > > 2016-10-29 18:50 GMT+09:00 Mich