I've been looking at multiple optimizations we could make in our Lucene
indexes (currently ~8B documents total, spread across indexes of ~250M
documents each) for querying fields that have very low cardinality (usually
true/false, or in some cases fewer than 10 categories). I would have
thought Lucene o
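For context on why low-cardinality fields are cheap to filter on: a cached filter in Lucene ultimately boils down to a per-segment bitset with one bit per document, and applying it to a query's results is a bitwise AND. Below is a minimal stdlib sketch of that idea using plain java.util.BitSet (this is an analogy, not the Lucene API; the doc IDs are made up):

```java
import java.util.BitSet;

public class BoolFilterSketch {
    public static void main(String[] args) {
        int maxDoc = 16;

        // Precomputed filter: which docs have flag=true (one bit per doc).
        BitSet flagTrue = new BitSet(maxDoc);
        flagTrue.set(2);
        flagTrue.set(5);
        flagTrue.set(9);

        // Docs matching the main query.
        BitSet queryHits = new BitSet(maxDoc);
        queryHits.set(5);
        queryHits.set(9);
        queryHits.set(12);

        // Applying the cached filter is just a bitwise AND.
        BitSet result = (BitSet) queryHits.clone();
        result.and(flagTrue);
        System.out.println(result); // prints {5, 9}
    }
}
```

Because the bitset for a true/false field costs roughly one bit per document and can be cached across queries, the cardinality of the field barely matters at query time.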
Erick,
Thank you for responding.
I ran tests using both compressed fields and uncompressed fields, and it
was significantly slower with uncompressed fields. I looked into the lazy
field loading per your suggestion, but we don't get any values from the
returned Documents until the result set has b
Hello!
I have really long document field values. Tokens of these fields are of the
form: word|payload|position_increment. (I need to control position increments
and payloads manually.)
I collect these compound tokens for the entire document, then join them with a
'\t', and then pass this string
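As a concrete illustration of the token format described above (only the word|payload|position_increment shape comes from the original mail; the words and payload values here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class CompoundTokens {
    // Build one "word|payload|positionIncrement" compound token.
    static String token(String word, String payload, int posInc) {
        return word + "|" + payload + "|" + posInc;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<String>();
        tokens.add(token("quick", "p1", 1));
        tokens.add(token("fast", "p2", 0));  // increment 0: same position (synonym)
        tokens.add(token("fox", "p3", 1));

        // Join the whole document's tokens with '\t' before handing
        // the string to the analyzer.
        String field = String.join("\t", tokens);
        System.out.println(field);
    }
}
```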
1) An alternate method for your original question would be to do something like
this (I haven't compiled or tested this!):
Query q = new PrefixQuery(new Term("field", "app"));
q = q.rewrite(indexReader);
Set<Term> terms = new HashSet<Term>();
q.extractTerms(terms);
Term[] arr = terms.toArray(new Term[terms.size()]);
Hmmm, since 4.1, fields have been stored compressed by default.
I suppose it's possible that this is a result of compressing/decompressing.
What happens if
1> you enable lazy field loading
2> you don't load any fields?
FWIW,
Erick
On Thu, Sep 26, 2013 at 10:55 AM, Desidero wrote:
> A quick update:
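To illustrate what lazy field loading buys you: the stored (possibly compressed) value is only decoded on first access, so hits whose fields are never actually read cost nothing. A minimal stdlib sketch of the lazy pattern (this is an analogy for the behavior, not Lucene's actual field-loading API):

```java
import java.util.function.Supplier;

public class LazyField {
    private final Supplier<String> loader;
    private String value;      // decoded value, cached after first access
    private boolean loaded;

    LazyField(Supplier<String> loader) { this.loader = loader; }

    String get() {
        if (!loaded) {         // decompress/decode only on first access
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    public static void main(String[] args) {
        final int[] decompressions = {0};
        LazyField f = new LazyField(() -> {
            decompressions[0]++;   // stands in for a costly decompress
            return "stored value";
        });
        // No work has been done yet; hits that never read the field pay nothing.
        System.out.println(decompressions[0]); // prints 0
        f.get();
        f.get();
        System.out.println(decompressions[0]); // prints 1
    }
}
```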
This question might be stupid, but why are there different type attributes?
We have <ALPHANUM>, <NUM>, ... but also "word", "shingle", ...
Why not <WORD>, <SHINGLE>, ...???
Is there a deeper logic behind this, or has it just grown historically and
not yet been unified?
Regards
Bernd
--
Hi,
The word I am giving is "Romer Geoffrey ", and the word is in the field.
I call trm.seekCeil(new BytesRef("Geoffrey")) and then String s =
trm.term().utf8ToString(), but it gives a different word. I think this is
why my MultiPhraseQuery is not giving the desired results.
What may be the
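For what it's worth, TermsEnum.seekCeil positions the enum at the smallest term greater than or equal to the target, not at an exact match. So if the indexed token is the whole string "Romer Geoffrey ", seeking to "Geoffrey" lands on a different term, since the terms dictionary is sorted by the full term bytes. The stdlib TreeSet.ceiling method has the same semantics and makes a handy analogy (the term dictionary below is made up for illustration):

```java
import java.util.TreeSet;

public class SeekCeilAnalogy {
    public static void main(String[] args) {
        // Sorted by full term string, like Lucene's terms dictionary.
        TreeSet<String> terms = new TreeSet<String>();
        terms.add("Romer Geoffrey ");   // the token actually indexed
        terms.add("Smith John");

        // ceiling("Geoffrey") returns the smallest term >= "Geoffrey",
        // which here is "Romer Geoffrey ", not an exact match.
        String hit = terms.ceiling("Geoffrey");
        System.out.println(hit); // prints Romer Geoffrey 
    }
}
```

If the whole name was indexed as a single untokenized term, seeking to just "Geoffrey" can never find it exactly; checking the SeekStatus returned by seekCeil (FOUND vs NOT_FOUND) would make the mismatch visible.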