On Sun, May 23, 2010 at 12:34 PM, Shai Erera <[email protected]> wrote:

> I want to stress that not all ngram-based languages are affected by this
> behavior, especially those for which we do ngrams just because we lack a
> good tokenizer.
>

They are also affected! Do you understand how the query parser treats
whitespace? You cannot currently use "normal" word-spanning n-grams
with Lucene because of this:

1) you can only use word-internal n-grams, because each
whitespace-separated word gets its own TokenStream
2) all queries here are also turned into PhraseQueries automatically,
which is stupid, since n-grams already contain the positional
information
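To make point 1 concrete, here is a small language-agnostic sketch (plain Python, not Lucene's actual API): if the query parser splits on whitespace before the analyzer runs, an n-gram tokenizer only ever sees one word at a time, so the cross-word grams never exist at query time.

```python
# Illustrative sketch only -- this is NOT Lucene code, just a model of
# the behavior described above.

def ngrams(text, n=2):
    """Character n-grams over a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

query = "foo bar"

# What the classic query parser effectively does: split on whitespace
# first, then analyze each whitespace-separated word on its own.
per_word = [g for word in query.split() for g in ngrams(word)]

# What word-spanning n-grams would require: analyzing the whole string,
# so grams crossing the word boundary are produced too.
whole = ngrams(query)

print(per_word)  # no gram contains a space
print(whole)     # includes cross-word grams like 'o ' and ' b'
```

Note that `per_word` can never contain a gram with a space in it, while `whole` does; an index built with word-spanning n-grams therefore contains terms the query parser can never generate.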

-- 
Robert Muir
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
