On Sun, May 23, 2010 at 12:34 PM, Shai Erera <[email protected]> wrote:
> I want to stress that not all ngram-based languages are affected by this
> behavior, especially those for which we do ngram just because of a lack of
> good tokenizer.
>

They are also affected! Do you understand how the queryparser treats
whitespace?

You cannot currently use "normal" word-spanning n-grams with Lucene because
of this:
1) you can only use word-internal n-grams, because each whitespace-separated
   word gets its own tokenstream
2) all queries here are also made into phrasequeries automatically, which is
   stupid, as n-grams already contain the 'positional information'

-- 
Robert Muir
[email protected]
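
To illustrate point 1, here is a toy sketch in plain Python. It is not Lucene code: the function names (char_ngrams, analyze_whole, analyze_per_word) are made up for illustration, and the whitespace split only approximates what the queryparser does before it hands text to the analyzer.

```python
def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def analyze_whole(text, n=2):
    # The analyzer sees the entire string, so n-grams can span the space.
    return char_ngrams(text, n)


def analyze_per_word(text, n=2):
    # Hypothetical stand-in for a parser that splits on whitespace first
    # and analyzes each word separately: no n-gram can cross a word boundary.
    grams = []
    for word in text.split():
        grams.extend(char_ngrams(word, n))
    return grams


query = "ab cd"
whole = analyze_whole(query)
per_word = analyze_per_word(query)
print(whole)     # includes the spanning bigrams "b " and " c"
print(per_word)  # only the word-internal bigrams
```

With the per-word split, the bigrams that cross the space never reach the analyzer, so only word-internal n-grams survive.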
