Bit-set implementation for a low cardinality custom field

2013-09-27 Thread Marcos Juarez Lopez
I've been looking at multiple optimizations we could do in our Lucene indexes (currently ~8B documents in total, spread across indexes of ~250M documents each) for querying fields that have very low cardinality (usually true/false, or in some cases, fewer than 10 categories). I would have thought Lucene o
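
A minimal sketch of the bit-set idea named in the subject line, assuming each document has a dense integer id and exactly one category value. The class CategoryBitSets and its methods are hypothetical and are not part of Lucene; the sketch uses only java.util.BitSet. At ~250M documents per index, one bit per document is roughly 31 MB per category value.

```java
import java.util.BitSet;

/** Hypothetical helper (not a Lucene class): one BitSet per category value. */
final class CategoryBitSets {
    private final BitSet[] sets;

    /** @param categories number of distinct values; @param maxDoc number of documents */
    CategoryBitSets(int categories, int maxDoc) {
        sets = new BitSet[categories];
        for (int i = 0; i < categories; i++) {
            sets[i] = new BitSet(maxDoc);
        }
    }

    /** Records that document docId carries the given category. */
    void set(int docId, int category) {
        sets[category].set(docId);
    }

    /** Returns the candidates that also belong to the category; inputs are not modified. */
    BitSet filter(int category, BitSet candidates) {
        BitSet result = (BitSet) candidates.clone();
        result.and(sets[category]);
        return result;
    }
}
```

The filter step is a single bitwise AND over the candidate set, so its cost depends on the number of documents rather than on how many documents share the category.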

Re: Query performance in Lucene 4.x

2013-09-27 Thread Desidero
Erick, Thank you for responding. I ran tests using both compressed fields and uncompressed fields, and it was significantly slower with uncompressed fields. I looked into the lazy field loading per your suggestion, but we don't get any values from the returned Documents until the result set has b

Indexing documents with multiple field values

2013-09-27 Thread Igor Shalyminov
Hello! I have really long document field values. Tokens of these fields are of the form: word|payload|position_increment. (I need to control position increments and payload manually.) I collect these compound tokens for the entire document, then join them with a '\t', and then pass this string
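
A minimal sketch of splitting such a joined string back into its parts, assuming that neither the word nor the payload contains '|' or a tab and that the position increment is a decimal integer. CompoundTokens and Token are hypothetical names and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for tab-joined word|payload|position_increment tokens. */
final class CompoundTokens {

    static final class Token {
        final String word;
        final String payload;
        final int positionIncrement;

        Token(String word, String payload, int positionIncrement) {
            this.word = word;
            this.payload = payload;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Splits on tabs, then splits each token on '|' into exactly three parts. */
    static List<Token> parse(String joined) {
        List<Token> out = new ArrayList<>();
        if (joined.isEmpty()) {
            return out;
        }
        for (String raw : joined.split("\t")) {
            // Limit -1 keeps empty trailing parts so malformed tokens are detected.
            String[] parts = raw.split("\\|", -1);
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed token: " + raw);
            }
            out.add(new Token(parts[0], parts[1], Integer.parseInt(parts[2])));
        }
        return out;
    }
}
```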

RE: Multiphrase Query in Lucene 4.3

2013-09-27 Thread Allison, Timothy B.
1) An alternate method to your original question would be to do something like this (I haven't compiled or tested this!): Query q = new PrefixQuery(new Term("field", "app")); q = q.rewrite(indexReader); Set<Term> terms = new HashSet<Term>(); q.extractTerms(terms); Term[] arr = terms.toArray(new Term[terms.

Re: Query performance in Lucene 4.x

2013-09-27 Thread Erick Erickson
Hmmm, since 4.1, fields have been stored compressed by default. I suppose it's possible that this is a result of compressing/uncompressing. What happens if 1> you enable lazy field loading 2> don't load any fields? FWIW, Erick On Thu, Sep 26, 2013 at 10:55 AM, Desidero wrote: > A quick update:

why different type attributes?

2013-09-27 Thread Bernd Fehling
This question might be stupid, but why are there different type attributes? We have , , , ... but also "word", "shingle", ... Why not , , ...??? Is there a deeper logic behind this or just historically grown and not yet unified? Regards Bernd --

Re: Multiphrase Query in Lucene 4.3

2013-09-27 Thread VIGNESH S
Hi, The word I am giving is "Romer Geoffrey". The word is in the field. I call trm.seekCeil(new BytesRef("Geoffrey")) and then String s = trm.term().utf8ToString(); and it gives a different word. I think this is why my MultiPhraseQuery is not giving the desired results. What may be the
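
One possible explanation, not confirmed by the thread: seekCeil positions the enumeration on the smallest term greater than or equal to the target, so when the exact term is absent (for example because the analyzer lowercased it at index time, which is an assumption here) term() returns a neighbouring term. In Lucene 4.x, seekCeil returns a SeekStatus that distinguishes FOUND from NOT_FOUND and END. The sketch below imitates that ceiling behaviour with a plain java.util.TreeSet; SeekCeilSketch is a hypothetical name, and String ordering matches Lucene's byte ordering only for ASCII terms.

```java
import java.util.TreeSet;

/** Hypothetical stand-in for a sorted term dictionary; not Lucene code. */
final class SeekCeilSketch {

    /** Returns the smallest term >= target, or null when no such term exists. */
    static String seekCeil(TreeSet<String> terms, String target) {
        return terms.ceiling(target);
    }

    /** True only when the target itself is present, analogous to SeekStatus.FOUND. */
    static boolean found(TreeSet<String> terms, String target) {
        return target.equals(seekCeil(terms, target));
    }
}
```

With the terms "apple", "geoffrey" and "romer", seeking "Geoffrey" lands on "apple", because uppercase 'G' sorts before every lowercase letter; seeking "geoffrey" lands on the term itself.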