RE: Avoid memory issues when indexing terms with multiplicity

2014-04-07 Thread Dávid Nemeskey
Hi Uwe, thanks for your reply, too. :) I must admit that I got a bit ahead of myself in my mail, because I am not using a TokenFilter yet; instead, I expand the tokens manually before sending them to Lucene. It is good to know that it makes a difference. I will definitely try the TokenStream-based solu

Re: Avoid memory issues when indexing terms with multiplicity

2014-04-07 Thread Dávid Nemeskey
nk you could override your TermsConsumer's implementation of finishTerm() to rewrite "dog:3" as "dog" and multiply Term Frequency by 3, right before the term is written to the postings. This is not for the faint of heart, and I wouldn't recommend trying unl
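
The "dog:3" to "dog" rewrite quoted above can be illustrated outside Lucene. The following is only a sketch in plain Java, not the TermsConsumer override itself: it assumes a "term:multiplicity" encoding with a trailing positive integer, and the names MultiplicityTerms, TermCount, parse and collapse are hypothetical helpers invented for this illustration. It shows the arithmetic only: strip the suffix and multiply the raw term frequency by the multiplicity, summing when several encoded terms collapse to the same plain term.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: illustrates the frequency arithmetic only.
class MultiplicityTerms {

    /** A plain term together with the multiplicity that was encoded in its suffix. */
    static final class TermCount {
        final String term;
        final int count;

        TermCount(String term, int count) {
            this.term = term;
            this.count = count;
        }
    }

    /**
     * Splits "dog:3" into ("dog", 3). Anything without a valid positive
     * integer suffix is treated as a literal term with multiplicity 1.
     * The "term:N" encoding is an assumption made for this sketch.
     */
    static TermCount parse(String raw) {
        int idx = raw.lastIndexOf(':');
        if (idx <= 0 || idx == raw.length() - 1) {
            return new TermCount(raw, 1);
        }
        try {
            int n = Integer.parseInt(raw.substring(idx + 1));
            if (n < 1) {
                return new TermCount(raw, 1);
            }
            return new TermCount(raw.substring(0, idx), n);
        } catch (NumberFormatException e) {
            return new TermCount(raw, 1);
        }
    }

    /**
     * Maps raw per-document frequencies of encoded terms to effective
     * frequencies of plain terms: freq("dog") += freq("dog:3") * 3.
     */
    static Map<String, Integer> collapse(Map<String, Integer> rawFreqs) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : rawFreqs.entrySet()) {
            TermCount tc = parse(e.getKey());
            out.merge(tc.term, tc.count * e.getValue(), Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = new LinkedHashMap<>();
        raw.put("dog:3", 2); // the token "dog:3" occurred twice in the document
        raw.put("cat", 1);
        System.out.println(collapse(raw));
    }
}
```

In an actual codec the same multiplication would have to happen where the postings are written, which is the part the quoted reply warns about; this sketch does not attempt that.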

Avoid memory issues when indexing terms with multiplicity

2014-04-04 Thread Dávid Nemeskey
gh the index, remove the payloads and positions and writing the posting lists myself? Thank you very much. Best, Dávid Nemeskey