Re: A full-text tokenizer for the NGramTokenFilter

2010-07-17 Thread Martin
Ahh, I knew I saw it somewhere, then I lost it again... :) I guess the name is not quite intuitive, but anyway, thanks a lot!

> > and I'm just wondering if there is a tokenizer that would return me the whole text.
> KeywordTokenizer does this.

Re: A full-text tokenizer for the NGramTokenFilter

2010-07-17 Thread Ahmet Arslan
> and I'm just wondering if there is a tokenizer
> that would return me the whole text.

KeywordTokenizer does this.
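A minimal sketch of that chain, written against a recent Lucene API (the 3.x API current in this thread differs slightly, and the 2-3 gram sizes are arbitrary): KeywordTokenizer emits the entire input as a single token, and NGramTokenFilter then breaks that token into n-grams, so nothing gets cut off by the tokenizer's own read limit.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.ngram.NGramTokenFilter;

    // Hand the whole field value to NGramTokenFilter as one token.
    Analyzer ngramAnalyzer = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new KeywordTokenizer();                      // entire input as a single token
            TokenStream grams = new NGramTokenFilter(source, 2, 3, false);  // 2- and 3-grams, originals not kept
            return new TokenStreamComponents(source, grams);
        }
    };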

A full-text tokenizer for the NGramTokenFilter

2010-07-17 Thread Martin
Hi there, I have recently been trying to build a Lucene index out of n-grams and seem to have stumbled onto a number of issues. I first tried to use the NGramTokenizer, but that apparently only takes the first 1024 characters to tokenize. Having searched around the web, I came upon this
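For what it's worth, a hypothetical indexing snippet using an analyzer like the one sketched in the reply above (the index path and field name are made-up names, and a recent Lucene API is assumed): the terms that reach the index are the small n-grams, not the whole text, so the single long token from KeywordTokenizer is never indexed as-is.

    import java.nio.file.Paths;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    // Index one document with the n-gram analyzer sketched above.
    try (FSDirectory dir = FSDirectory.open(Paths.get("ngram-index"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(ngramAnalyzer))) {
        Document doc = new Document();
        doc.add(new TextField("body", "text well beyond 1024 characters ...", Field.Store.NO));
        writer.addDocument(doc);
    }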

Re: Scoring exact matches higher in a stemmed field

2010-07-17 Thread Itamar Syn-Hershko
Shai, you got it right. I want to be able to send "b bb" through the QP with my custom analyzer, and get back "(b b$) (b bb$)" -- 2 terms with 2 tokens in the same position for each. I want this to be a native product of the engine, as opposed to forcing this from the query end. I'm using diff
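One way to get that natively from the analysis chain, sketched against a recent Lucene API (the filter name and the '$' marker are illustrative, not the poster's actual code): duplicate every token before the stemmer, suffix the copy with '$', give it position increment 0, and set the keyword flag so a KeywordAttribute-aware stemmer such as PorterStemFilter leaves the copy untouched. With a chain of Tokenizer -> ExactFormMarkerFilter -> PorterStemFilter, every position then carries the stemmed form plus the exact surface form marked with '$'.

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    import org.apache.lucene.util.AttributeSource;

    // For every incoming token, also emit a '$'-suffixed copy of the surface form
    // at the same position, flagged as a keyword so a following stemmer skips it.
    public final class ExactFormMarkerFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
        private final KeywordAttribute keywordAtt = addAttribute(KeywordAttribute.class);
        private AttributeSource.State pending;   // surface form saved before the stemmer sees it

        public ExactFormMarkerFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (pending != null) {
                restoreState(pending);               // back to the unstemmed surface form
                pending = null;
                termAtt.append('$');                 // mark the exact form
                posIncAtt.setPositionIncrement(0);   // stack it on the stemmed token's position
                keywordAtt.setKeyword(true);         // tell the stemmer to leave it alone
                return true;
            }
            if (!input.incrementToken()) {
                return false;
            }
            pending = captureState();                // remember this token; emit the marked copy next
            return true;
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            pending = null;
        }
    }

This is essentially the duplicate-and-flag idea behind Lucene's KeywordRepeatFilter, with the '$' marker added in the same step; since the same analyzer runs at query time, the query parser sees both forms at one position, which matches the "(b b$) (b bb$)" shape described above.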